Datafile

datafile.get_filenames(files_path: str, date_range: Tuple[str, str], file_serial_number: str) → Tuple[List[str], str, str]

Generates a list of the .dat filenames in a directory that match a date range. Also checks whether all selected files have the provided serial number and whether any Sync files are present.

Parameters:
  • files_path (str) – Directory path where the .dat files are located. For example: ‘/raw-data/iag/G2301_CFADS2502/DataLog_User’.

  • date_range (tuple of str or str) – Date range used to filter filenames. Either a tuple of two date strings, (‘YYYY-MM-DD’, ‘YYYY-MM-DD’), giving the start and end dates, or a single string in ‘YYYY-MM’ format, covering an entire month.

  • file_serial_number (str) – Serial number of the CRDS being read as written in the input file. For example: ‘CFADS2502’. Be aware that it may be different from the instrument serial number.

Returns:

A tuple containing:

  • A list of filenames that match the specified date range.

  • The start date as a string in the format ‘YYYY-MM-DD’.

  • The end date as a string in the format ‘YYYY-MM-DD’.

Return type:

tuple of (list of str, str, str)

Notes

  • The parameters files_path and file_serial_number may be defined in the campaign config file.
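The two accepted forms of date_range and the serial-number check can be illustrated with a minimal, standard-library sketch. This is not the module's implementation: the helpers normalize_date_range and select_filenames are hypothetical, and the filename layout `<serial>-<YYYYMMDD>-<HHMMSS>Z-<suffix>.dat` is an assumption made for the example. Sync files are not handled here.

```python
import calendar
import re
from datetime import date
from typing import Iterable, List, Tuple, Union

DateRange = Union[str, Tuple[str, str]]

# Assumed filename layout (not confirmed by the documentation):
#   <serial>-<YYYYMMDD>-<HHMMSS>Z-<suffix>.dat
_NAME = re.compile(r"^(?P<serial>[^-]+)-(?P<day>\d{8})-\d{6}Z-.*\.dat$")


def normalize_date_range(date_range: DateRange) -> Tuple[str, str]:
    """Hypothetical helper: return (start, end) as 'YYYY-MM-DD' strings."""
    if isinstance(date_range, str):
        # 'YYYY-MM' covers the whole month, from day 1 to the last day.
        year, month = (int(part) for part in date_range.split("-"))
        last_day = calendar.monthrange(year, month)[1]
        start, end = date(year, month, 1), date(year, month, last_day)
    else:
        # A tuple holds explicit start and end dates.
        start, end = (date.fromisoformat(day) for day in date_range)
    if start > end:
        raise ValueError(f"start date {start} is after end date {end}")
    return start.isoformat(), end.isoformat()


def select_filenames(
    names: Iterable[str], date_range: DateRange, file_serial_number: str
) -> Tuple[List[str], str, str]:
    """Hypothetical helper: keep names dated within the range, checking the serial."""
    start, end = normalize_date_range(date_range)
    first, last = start.replace("-", ""), end.replace("-", "")
    selected = []
    for name in sorted(names):
        match = _NAME.match(name)
        if match is None or not first <= match["day"] <= last:
            continue  # not a matching .dat file, or outside the date range
        if match["serial"] != file_serial_number:
            raise ValueError(f"{name} does not carry serial {file_serial_number}")
        selected.append(name)
    return selected, start, end
```

With these assumptions, normalize_date_range('2024-02') expands to ('2024-02-01', '2024-02-29'), and a tuple such as ('2023-01-15', '2023-01-20') is returned unchanged after validation.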

datafile.read_raw_data(files_path: str, date_range: Tuple[str, str], file_serial_number: str, usecols: List[str], dtype: Dict, species: bool | int = False) → DataFrame

Reads data from .dat files in a directory matching a date range and serial number.

Parameters:
  • files_path (str) – Directory path where the .dat files are located. For example: ‘/raw-data/iag/G2301_CFADS2502/DataLog_User’.

  • date_range (tuple of str or str) – Date range used to filter filenames. Either a tuple of two date strings, (‘YYYY-MM-DD’, ‘YYYY-MM-DD’), giving the start and end dates, or a single string in ‘YYYY-MM’ format, covering an entire month.

  • file_serial_number (str) – Serial number of the CRDS being read as written in the input file. For example: ‘CFADS2502’. Be aware that it may be different from the instrument serial number.

  • usecols (list of str) – A list of column names to read from the .dat files.

  • dtype (dict) – A dictionary specifying the data type for each column.

  • species (bool or int, optional) – If False, no filtering by species is done. If an int, filter data by the specified species value. Default is False.

Returns:

A DataFrame containing the data read from the .dat files.

Return type:

pandas.DataFrame

Notes

  • The parameters files_path, file_serial_number, usecols, dtype and species may be defined in the campaign config file.
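Because species accepts either False or an int, a species value of 0 has to be told apart from False (in Python, `0 == False` is true). The sketch below shows one way to do that on plain dictionaries rather than a DataFrame. It is an illustration only: the helper filter_by_species and the column name "species" are assumptions, and rejecting species=True is a choice made here because the documentation does not define that case.

```python
from typing import Dict, List, Union

Row = Dict[str, float]


def filter_by_species(rows: List[Row], species: Union[bool, int] = False) -> List[Row]:
    """Hypothetical helper: keep only rows whose 'species' value equals `species`."""
    # Identity check, not truthiness: `not species` would also skip species=0.
    if species is False:
        return list(rows)
    # species=True is not described in the documentation; this sketch rejects it
    # instead of silently treating it as species 1.
    if species is True:
        raise ValueError("species must be False or an int")
    return [row for row in rows if row["species"] == species]


rows = [
    {"species": 1, "CO2": 400.0},
    {"species": 2, "CO2": 401.0},
    {"species": 0, "CO2": 402.0},
]
unfiltered = filter_by_species(rows)        # all three rows
only_two = filter_by_species(rows, 2)       # the row with species == 2
only_zero = filter_by_species(rows, 0)      # the row with species == 0
```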

datafile.save_dataset_level_0(df: DataFrame, global_attrs: dict, variable_attrs: dict, file_serial_number: str, path_to_save: str) → None

Saves a DataFrame as a level 0 NetCDF dataset with the specified attributes.

Parameters:
  • df (pandas.DataFrame) – DataFrame to be converted to a NetCDF dataset.

  • global_attrs (dict) – Global attributes to add to the dataset.

  • variable_attrs (dict) – Variable-specific attributes to add to the dataset.

  • file_serial_number (str) – Serial number to include in the filename. Be aware that it may be different from the instrument serial number.

  • path_to_save (str) – Directory path where the file will be saved.

Return type:

None

Notes

  • The parameters global_attrs, variable_attrs and file_serial_number may be defined in the campaign config file.
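As a rough illustration of the two attribute dictionaries, the sketch below assumes that global_attrs is a flat mapping of dataset-level metadata and that variable_attrs is keyed by column name, with one attribute dictionary per variable. That structure, the example keys and values, and the helper unknown_variable_attrs are all assumptions for this example, not something the documentation confirms.

```python
from typing import Dict, Iterable, List


def unknown_variable_attrs(
    columns: Iterable[str], variable_attrs: Dict[str, dict]
) -> List[str]:
    """Hypothetical check: list variable_attrs keys that match no column."""
    known = set(columns)
    return sorted(name for name in variable_attrs if name not in known)


# Illustrative values only; real attributes would come from the campaign config.
global_attrs = {"title": "CRDS level 0 data", "institution": "example institute"}
variable_attrs = {
    "CO2_dry": {"units": "ppm", "long_name": "dry mole fraction of CO2"},
    "CH4_dry": {"units": "ppm", "long_name": "dry mole fraction of CH4"},
    "H2O": {"units": "percent"},
}

columns = ["CO2_dry", "CH4_dry"]  # stand-in for the DataFrame's column names
missing = unknown_variable_attrs(columns, variable_attrs)
```

A check of this kind, run before saving, would flag attribute entries such as "H2O" above that have no corresponding column in the data.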