tehom package

Submodules

tehom.downloads module

Create data samples of labeled acoustics

Uses hydrophone data from Ocean Networks Canada (ONC) and Automatic Identification System (AIS) records from Marine Cadastre to label acoustic recordings for machine learning.

With this module, you can:

  • Download blocks of AIS records

  • Download hydrophone recordings

  • Show the data either available for download or already downloaded

  • Sample downloaded data into a labeled format, ready for model.fit()

class tehom.downloads.SampleParams(interval: Union[str, Timedelta] = Timedelta('0 days 00:05:00'), duration: Union[str, Timedelta] = Timedelta('0 days 00:00:01'), extension: str = 'mp3')

Bases: object

Settings for repeatable data samples. For you probability nerds: these settings define the random variable X in X(ω), while the other arguments to sample() select the outcome ω.

duration: Union[str, Timedelta] = Timedelta('0 days 00:00:01')
extension: str = 'mp3'
interval: Union[str, Timedelta] = Timedelta('0 days 00:05:00')
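As a rough picture of how interval and duration could combine, the sketch below assumes that a sample is one clip of length duration taken every interval across a time window. That reading of the parameters, the helper sample_windows, and the use of datetime.timedelta in place of pandas Timedelta are assumptions for illustration, not tehom's implementation.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def sample_windows(
    begin: datetime,
    end: datetime,
    interval: timedelta = timedelta(minutes=5),
    duration: timedelta = timedelta(seconds=1),
) -> List[Tuple[datetime, datetime]]:
    """Hypothetical helper: (start, stop) clips of length `duration`,
    one every `interval`, that fit entirely inside [begin, end]."""
    if interval <= timedelta(0) or duration <= timedelta(0):
        raise ValueError("interval and duration must be positive")
    windows = []
    start = begin
    # Step through the window, keeping only clips that finish by `end`.
    while start + duration <= end:
        windows.append((start, start + duration))
        start += interval
    return windows


# A 15-minute window with the default 5-minute interval and 1-second clips.
clips = sample_windows(datetime(2020, 1, 1, 0, 0), datetime(2020, 1, 1, 0, 15))
print(len(clips))  # 3
```

Fixing interval and duration while varying the hydrophones and time window is what makes samples from different calls comparable.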
tehom.downloads.certify_audio_availability()

Query the ONC server to determine the intervals during which audio data is actually available.

Because this is a long-running process, it saves its progress to a pickle file as it goes and, when restarted, resumes from the last saved pickle.
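The checkpoint-and-resume pattern described above can be sketched as follows. This is a generic illustration, not tehom's code: the helper run_with_checkpoints, the per-item results dictionary, and the checkpoint path are all hypothetical, and fake_query merely stands in for a slow request to the ONC server.

```python
import os
import pickle
import tempfile
from typing import Callable, Dict, Iterable


def run_with_checkpoints(
    items: Iterable[str],
    work: Callable[[str], object],
    checkpoint_path: str,
) -> Dict[str, object]:
    """Hypothetical helper: process `items` one at a time, pickling the
    accumulated results after each item so an interrupted run can resume."""
    # Resume from the last checkpoint if one exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            results = pickle.load(f)
    else:
        results = {}

    for item in items:
        if item in results:
            continue  # already finished in a previous run
        results[item] = work(item)
        # Write to a temporary file, then atomically replace the checkpoint,
        # so a crash mid-write never leaves a half-written pickle behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(results, f)
        os.replace(tmp_path, checkpoint_path)
    return results


calls = []


def fake_query(device: str) -> str:
    """Stands in for a slow server request; records each call."""
    calls.append(device)
    return f"intervals for {device}"


path = os.path.join(tempfile.mkdtemp(), "progress.pkl")
first = run_with_checkpoints(["A", "B"], fake_query, path)
# A second run only does the work that the first run did not finish.
second = run_with_checkpoints(["A", "B", "C"], fake_query, path)
```

Only unpickle checkpoint files you wrote yourself, since loading a pickle can execute arbitrary code.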

tehom.downloads.download_acoustics(hydrophones: List[str], begin: Union[datetime, str, Timestamp], end: Union[datetime, str, Timestamp], extension: str) → None

Download acoustic data from ONC. The files are stored in the package data folder, while their metadata and paths are recorded in a local SQLite database.

Parameters:
  • hydrophones (List[str]) – A list of hydrophone names, which ONC refers to as ‘deviceCode’s.

  • begin (Union[datetime, str, Timestamp]) – Start of the download window. The file that starts before this time but finishes after it is also downloaded.

  • end (Union[datetime, str, Timestamp]) – End of the download window. The file that begins before this time but ends after it is also downloaded.

  • extension (str) – File type to download: mp3, wav, png, or mat.
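The begin and end rules mean that every recording overlapping the requested window is fetched, including files that straddle either edge. A minimal sketch of that selection, assuming each file is described by a (start, stop) pair; the helper files_to_download and the file list are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


def files_to_download(files: List[Interval], begin: datetime, end: datetime) -> List[Interval]:
    """Hypothetical helper: keep every file whose span overlaps the
    requested window, including files that start before `begin` or
    finish after `end`."""
    return [(start, stop) for start, stop in files if start < end and stop > begin]


# Three consecutive 5-minute recordings (hypothetical timestamps).
files = [
    (datetime(2020, 1, 1, 0, 0), datetime(2020, 1, 1, 0, 5)),
    (datetime(2020, 1, 1, 0, 5), datetime(2020, 1, 1, 0, 10)),
    (datetime(2020, 1, 1, 0, 10), datetime(2020, 1, 1, 0, 15)),
]
# A window from 00:03 to 00:07 touches only the first two files.
selected = files_to_download(files, datetime(2020, 1, 1, 0, 3), datetime(2020, 1, 1, 0, 7))
```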

tehom.downloads.download_ships(year: int, month: int, zone: int) → None

Download AIS records from Marine Cadastre. Records are stored in a local SQLite database. Marine Cadastre organizes records into CSV files by year, month, and UTM zone (see the Wikipedia article on the Universal Transverse Mercator coordinate system).

Parameters:
  • year (int) – year to download

  • month (int) – month to download

  • zone (int) – UTM zone to download
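To choose the zone argument for a region of interest, the standard UTM zone number can be computed from longitude. This is a sketch of the general UTM convention, not a tehom function, and it assumes Marine Cadastre's zone numbers follow that standard numbering. It ignores the special-case zones around Norway and Svalbard.

```python
import math


def utm_zone(longitude: float) -> int:
    """Standard UTM zone number (1-60) for a longitude in degrees.

    Zones are 6 degrees wide, numbered eastward starting at 180 degrees W.
    The Norway and Svalbard exceptions are ignored in this sketch."""
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be in [-180, 180]")
    zone = int(math.floor((longitude + 180.0) / 6.0)) + 1
    return min(zone, 60)  # longitude == 180 would otherwise give 61


# Example: a point off Vancouver Island at about 125 degrees W.
print(utm_zone(-125.0))  # 10
```

The result could then be passed as zone to download_ships() together with the desired year and month.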

tehom.downloads.filter_hphones_rect(hphones, sw_corner=(-90, -180), ne_corner=(90, 180))

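Filter a hydrophone table to the hydrophones inside the latitude/longitude rectangle bounded by sw_corner (south-west) and ne_corner (north-east).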
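The rectangle test itself is simple. The sketch below uses plain Python rather than a DataFrame and assumes each row has "lat" and "lon" keys; those keys, the helper filter_rect, and the sample rows are hypothetical, since tehom's actual column names are not documented here. Corners are (latitude, longitude) tuples, matching the defaults in the signature above.

```python
from typing import List, Tuple

LatLon = Tuple[float, float]


def filter_rect(
    hphones: List[dict],
    sw_corner: LatLon = (-90, -180),
    ne_corner: LatLon = (90, 180),
) -> List[dict]:
    """Keep rows whose (lat, lon) lies inside the box spanned by the
    south-west and north-east corners, edges included. Boxes that cross
    the antimeridian are not handled."""
    (south, west), (north, east) = sw_corner, ne_corner
    return [
        h for h in hphones
        if south <= h["lat"] <= north and west <= h["lon"] <= east
    ]


# Hypothetical rows; real hydrophone tables come from ONC metadata.
hphones = [
    {"name": "hydrophone_a", "lat": 48.3, "lon": -123.5},
    {"name": "hydrophone_b", "lat": 49.1, "lon": -126.0},
    {"name": "hydrophone_c", "lat": 47.0, "lon": -122.0},
]
inside = filter_rect(hphones, sw_corner=(48, -125), ne_corner=(50, -123))
```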

tehom.downloads.get_audio_availability(start: Union[Timestamp, str] = '2010', finish: Union[Timestamp, str] = Timestamp('2022-12-30 06:20:25.182942+0000', tz='UTC'), certified: bool = False) → DataFrame

Show which hydrophones have data available within an interval.

Parameters:
  • start – beginning of interval

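  • finish – end of interval. The default shown in the signature is a timestamp captured when the module was loaded (here, when this documentation was built).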

  • certified – Whether to show certified availability (as determined by certify_audio_availability()) rather than just deployment periods
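One way to read "available within an interval" is to clip each hydrophone's deployment (or certified) periods to [start, finish] and total what remains. The sketch below assumes a hypothetical layout, a dict mapping hydrophone names to lists of (begin, end) periods; the helper coverage and the sample data are illustrative only, not tehom's internals.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Period = Tuple[datetime, datetime]


def coverage(
    periods: Dict[str, List[Period]], start: datetime, finish: datetime
) -> Dict[str, timedelta]:
    """Hypothetical helper: for each hydrophone, the total time its periods
    overlap [start, finish]. Hydrophones with no overlap are omitted.
    Assumes a hydrophone's periods do not overlap one another."""
    result = {}
    for name, spans in periods.items():
        total = timedelta(0)
        for begin, end in spans:
            # Clip the period to the query window.
            lo, hi = max(begin, start), min(end, finish)
            if lo < hi:
                total += hi - lo
        if total > timedelta(0):
            result[name] = total
    return result


periods = {
    "hydrophone_a": [(datetime(2019, 1, 1), datetime(2019, 6, 1))],
    "hydrophone_b": [(datetime(2021, 1, 1), datetime(2021, 3, 1))],
}
# Only hydrophone_a overlaps 2019-05-01 to 2020-01-01 (for all of May).
avail = coverage(periods, datetime(2019, 5, 1), datetime(2020, 1, 1))
```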

tehom.downloads.sample(hydrophones: List[str], begin: Union[datetime, str, Timestamp], end: Union[datetime, str, Timestamp], sample_params: SampleParams, ais_db: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/tehom/envs/latest/lib/python3.8/site-packages/tehom/storage/ais.db'), onc_db: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/tehom/envs/latest/lib/python3.8/site-packages/tehom/storage/onc.db'), verbose: bool = False) → DataFrame

Sample the downloaded acoustics and AIS data to create a labeled dataset for machine learning, ready for model.fit().

Parameters:
  • hydrophones – List of hydrophone names (what ONC calls ‘deviceCode’s) to sample.

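  • begin – start of the sampling window

  • end – end of the sampling window

  • sample_params (SampleParams) – parameters for repeatable or related data samples.

  • ais_db – path to the SQLite database of AIS records. Defaults to tehom/storage/ais.db inside the installed package.

  • onc_db – path to the SQLite database that tracks ONC downloads. Defaults to tehom/storage/onc.db inside the installed package.

  • verbose – if True, print detailed output for debugging.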

Returns:

A DataFrame indexed by (hydrophone, time), with one column holding the acoustic data as a NumPy array and one column per label.
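The labels themselves come from tehom's implementation, which this page does not spell out. As one hypothetical example of deriving a label from AIS records, the sketch below counts vessel positions within a radius of a hydrophone using the haversine great-circle distance. The 10 km radius, the (lat, lon) record layout, and the function names are assumptions, not tehom's labeling rule.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def ships_nearby(
    hydrophone: Tuple[float, float],
    ship_positions: Iterable[Tuple[float, float]],
    radius_km: float = 10.0,
) -> int:
    """Hypothetical label: number of AIS positions within radius_km."""
    lat0, lon0 = hydrophone
    return sum(
        1 for lat, lon in ship_positions
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    )


hydrophone = (48.0, -123.0)
# One ship about 5.6 km north, one about 56 km north.
ships = [(48.05, -123.0), (48.5, -123.0)]
print(ships_nearby(hydrophone, ships))  # 1
```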

tehom.downloads.show_available_data(begin: Union[datetime, str, Timestamp], end: Union[datetime, str, Timestamp], style: str, bottomleft: Tuple[float] = (-90.0, -180.0), topright: Tuple[float] = (90.0, 180.0), ais_db: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/tehom/envs/latest/lib/python3.8/site-packages/tehom/storage/ais.db'), onc_db: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/tehom/envs/latest/lib/python3.8/site-packages/tehom/storage/onc.db'), certified: bool = False) → Union[plotly.graph_objects.Figure, matplotlib.figure.Figure]

Creates a visualization of what data is available to download and what data is available locally for sampling.

Parameters:
  • begin – start time to display

  • end – end time to display

  • bottomleft – Latitude, longitude tuple. Only include hydrophones north and east of this point.

  • topright – Latitude, longitude tuple. Only include hydrophones south and west of this point.

  • style – Either ‘map’ for a geographic map with hydrophones identified or ‘bar’ for a bar chart showing overlapping downloads.

  • ais_db – path to the database of AIS records

  • onc_db – path to the SQLite database that tracks ONC downloads

  • certified – Whether to restrict ranges to periods when data is actually available

Returns:

A plotly figure if style='map' or a matplotlib figure if style='bar'

Module contents