tehom package
Submodules
tehom.downloads module
Create data samples of labeled acoustics
Uses hydrophone data from Ocean Networks Canada (ONC) and Automatic Identification System (AIS) records from Marine Cadastre to label acoustic recordings for machine learning.
With this module, you can:
Download blocks of AIS records
Download hydrophone recordings
Show the data either available for download or already downloaded
Sample downloaded data into a labeled format, ready for model.fit
- class tehom.downloads.SampleParams(interval: Union[str, Timedelta] = Timedelta('0 days 00:05:00'), duration: Union[str, Timedelta] = Timedelta('0 days 00:00:01'), extension: str = 'mp3')
Bases: object
Settings for repeatable data samples. For you probability nerds, this should control the X in X(omega), while the other parameters to sample() control the omega.
- duration: Union[str, Timedelta] = Timedelta('0 days 00:00:01')
- extension: str = 'mp3'
- interval: Union[str, Timedelta] = Timedelta('0 days 00:05:00')
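The documentation does not spell out how interval and duration interact. A plausible reading, based only on their names and defaults, is that a clip of length duration is taken every interval. The sketch below illustrates that reading with the standard library; sample_windows is a hypothetical helper, not part of tehom, and the real sampling scheme may differ.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def sample_windows(
    begin: datetime,
    end: datetime,
    interval: timedelta = timedelta(minutes=5),
    duration: timedelta = timedelta(seconds=1),
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield (clip_start, clip_end) pairs spaced `interval` apart.

    Hypothetical helper (an assumption about the semantics, not tehom code).
    """
    clip_start = begin
    # Stop once a full clip no longer fits before `end`.
    while clip_start + duration <= end:
        yield clip_start, clip_start + duration
        clip_start += interval


# Fifteen minutes sampled with the default settings shown above.
windows = list(
    sample_windows(datetime(2020, 1, 1, 0, 0), datetime(2020, 1, 1, 0, 15))
)
for start, stop in windows:
    print(start.time(), "to", stop.time())
```

Under this reading, holding the settings fixed while varying the hydrophones and time range yields samples that are comparable with one another.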
- tehom.downloads.certify_audio_availability()
Works with the ONC server to determine the intervals in which data is available.
Because this is a long-running process, it saves its progress along the way in a pickle file and, when restarted, resumes from the last saved pickle.
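The save-and-resume behaviour described here follows a common checkpointing pattern. The sketch below shows that general pattern with the standard library; it is not tehom's implementation, and run_with_checkpoint, fake_query, and the file name progress.pkl are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def run_with_checkpoint(
    items: Iterable[str],
    work: Callable[[str], object],
    checkpoint: Path,
) -> Dict[str, object]:
    """Apply `work` to each item, pickling the results after every step."""
    # Resume from the previous run if a checkpoint file exists.
    if checkpoint.exists():
        with checkpoint.open("rb") as fh:
            results = pickle.load(fh)
    else:
        results = {}
    for item in items:
        if item in results:
            continue  # finished in an earlier run, skip the expensive call
        results[item] = work(item)
        with checkpoint.open("wb") as fh:
            pickle.dump(results, fh)
    return results


calls: List[str] = []


def fake_query(device: str) -> int:
    """Stand-in for a slow server request; records that it was called."""
    calls.append(device)
    return len(device)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "progress.pkl"
    first = run_with_checkpoint(["A1", "B22"], fake_query, path)
    # A second run only does the work that the first run did not finish.
    second = run_with_checkpoint(["A1", "B22", "C333"], fake_query, path)

print(calls)
```

Only load pickle files that you created yourself, since unpickling untrusted data can execute arbitrary code.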
- tehom.downloads.download_acoustics(hydrophones: List[str], begin: Union[datetime, str, Timestamp], end: Union[datetime, str, Timestamp], extension: str) -> None
Download acoustic data from ONC. Data is stored in files in the package data folder, but metadata and paths are stored in a local sqlite database.
- Parameters:
hydrophones (List[str]) – A list of hydrophone names, which ONC refers to as ‘deviceCode’s.
begin (Union[datetime, str, Timestamp]) – start time to download. The file that starts before this time but finishes after it is also downloaded.
end (Union[datetime, str, Timestamp]) – end time to download. The file that begins before this time but ends after it is also downloaded.
extension (str) – The file type in which to download the acoustics. Can be mp3, wav, png, or mat.
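The begin and end descriptions imply that any recording overlapping the requested interval is fetched, including files that straddle either boundary. A minimal sketch of that selection rule follows; Recording, overlapping, and the five-minute file lengths are illustrative assumptions, not tehom's or ONC's actual data model.

```python
from datetime import datetime
from typing import List, NamedTuple


class Recording(NamedTuple):
    """Hypothetical record of one audio file's time span."""

    name: str
    start: datetime
    end: datetime


def overlapping(
    files: List[Recording], begin: datetime, end: datetime
) -> List[Recording]:
    """Return files whose span intersects the interval (begin, end)."""
    # A file overlaps if it starts before `end` and finishes after `begin`.
    return [f for f in files if f.start < end and f.end > begin]


def t(minute: int) -> datetime:
    """Shorthand for a time on an arbitrary example day."""
    return datetime(2020, 1, 1, 0, minute)


files = [
    Recording("a", t(0), t(5)),
    Recording("b", t(5), t(10)),
    Recording("c", t(10), t(15)),
    Recording("d", t(15), t(20)),
]

# Request 00:07 to 00:12: "b" straddles the start and "c" straddles the end.
selected = overlapping(files, t(7), t(12))
print([f.name for f in selected])
```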
- tehom.downloads.download_ships(year: int, month: int, zone: int) -> None
Download AIS records from Marine Cadastre. Records are stored in a local sqlite database. Marine Cadastre organizes records into CSV files by year, month, and UTM zone (see the Wikipedia article on the Universal Transverse Mercator coordinate system).
- Parameters:
year (int) – year to download
month (int) – month to download
zone (int) – UTM zone to download
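If you know a hydrophone's longitude but not its UTM zone, the standard zone number can be computed directly: zones are 6 degrees wide and numbered eastward from 180° W. The helper below is a sketch of that arithmetic; utm_zone is a hypothetical name, not part of tehom, and it ignores the special zone boundaries around Norway and Svalbard.

```python
import math


def utm_zone(longitude: float) -> int:
    """Return the standard UTM zone (1 to 60) for a longitude in degrees."""
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    # Zone 1 spans 180 W to 174 W; the 180 E edge folds into zone 60.
    return min(int(math.floor((longitude + 180.0) / 6.0)) + 1, 60)


zone = utm_zone(-123.4)
print(zone)
```

The resulting integer is the kind of value the zone argument expects, on the assumption that Marine Cadastre uses standard zone numbering.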
- tehom.downloads.filter_hphones_rect(hphones, sw_corner=(-90, -180), ne_corner=(90, 180))
Filter a hydrophone table by geographic area
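The defaults (-90, -180) and (90, 180) suggest that each corner is a (latitude, longitude) pair and that the default rectangle covers the whole globe. The sketch below shows that kind of bounding-box filter on plain dictionaries; filter_rect, the row keys, and the hydrophone names are invented for illustration, and the real function's table format is not documented here.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def filter_rect(
    rows: List[Row],
    sw_corner: Tuple[float, float] = (-90.0, -180.0),
    ne_corner: Tuple[float, float] = (90.0, 180.0),
) -> List[Row]:
    """Keep rows whose (lat, lon) lies inside the rectangle, edges included.

    Hypothetical stand-in; does not handle boxes crossing the antimeridian.
    """
    south, west = sw_corner
    north, east = ne_corner
    return [
        r
        for r in rows
        if south <= r["lat"] <= north and west <= r["lon"] <= east
    ]


# Invented example rows; the names and coordinates are not real devices.
hydrophones = [
    {"name": "HYDRO_A", "lat": 48.5, "lon": -123.5},
    {"name": "HYDRO_B", "lat": 49.0, "lon": -126.0},
    {"name": "HYDRO_C", "lat": 10.0, "lon": 20.0},
]

inside = filter_rect(hydrophones, sw_corner=(48.0, -125.0), ne_corner=(50.0, -120.0))
print([r["name"] for r in inside])
```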
- tehom.downloads.get_audio_availability(start: Union[Timestamp, str] = '2010', finish: Union[Timestamp, str] = Timestamp('2022-12-30 06:20:25.182942+0000', tz='UTC'), certified: bool = False) -> DataFrame
Show which hydrophones have data available within an interval
- Parameters:
start – beginning of interval
finish – end of interval
certified – Whether to show certified availability (or just deployments)
- tehom.downloads.sample(hydrophones: List[str], begin: Union[datetime, str, Timestamp], end: Union[datetime, str, Timestamp], sample_params: SampleParams, ais_db: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/tehom/envs/latest/lib/python3.8/site-packages/tehom/storage/ais.db'), onc_db: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/tehom/envs/latest/lib/python3.8/site-packages/tehom/storage/onc.db'), verbose: bool = False) -> DataFrame
Sample the downloaded acoustics and AIS data to create a labeled dataset for machine learning, ready for model.fit().
- Parameters:
hydrophones – List of hydrophone names (what ONC calls ‘deviceCode’s) to sample.
begin – start time for sample
end – end time for sample
sample_params (SampleParams) – parameters for repeatable or related data samples.
ais_db – path to the database of AIS records
onc_db – database to track ONC downloads
verbose – heavy print output for debugging.
- Returns:
A DataFrame indexed by (hydrophone, time), with one column holding the acoustic data as a numpy array and one column for each label
- tehom.downloads.show_available_data(begin: Union[datetime, str, Timestamp], end: Union[datetime, str, Timestamp], style: str, bottomleft: Tuple[float] = (-90.0, -180.0), topright: Tuple[float] = (90.0, 180.0), ais_db: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/tehom/envs/latest/lib/python3.8/site-packages/tehom/storage/ais.db'), onc_db: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/tehom/envs/latest/lib/python3.8/site-packages/tehom/storage/onc.db'), certified: bool = False) -> Union[Figure, Figure]
Creates a visualization of what data is available to download and what data is available locally for sampling.
- Parameters:
begin – start time to display
end – end time to display
bottomleft – Latitude, longitude tuple. Only include hydrophones north and east of this point.
topright – Latitude, longitude tuple. Only include hydrophones south and west of this point.
style – Either ‘map’ for a geographic map with hydrophones identified or ‘bar’ for a bar chart showing overlapping downloads.
ais_db – path to the database of AIS records
onc_db – database to track ONC downloads
certified – Whether to restrict ranges to times when data is actually available
- Returns:
A plotly figure if style='map' or a matplotlib figure if style='bar'