Datafile
- datafile.get_filenames(files_path: str, date_range: Tuple[str, str], file_serial_number: str) → Tuple[List[str], str, str]
Generates a list of .dat filenames in a directory that match a date range. Also checks that all selected files carry the provided serial number and whether any Sync files are present.
- Parameters:
files_path (str) – Directory path where the .dat files are located. For example: ‘/raw-data/iag/G2301_CFADS2502/DataLog_User’.
date_range (tuple of str or str) – Date range used to filter filenames. Either a tuple (‘YYYY-MM-DD’, ‘YYYY-MM-DD’) giving the start and end dates, or a string in ‘YYYY-MM’ format representing an entire month.
file_serial_number (str) – Serial number of the CRDS being read, as written in the input file. For example: ‘CFADS2502’. Be aware that it may differ from the instrument serial number.
- Returns:
A tuple containing:
  - A list of filenames that match the specified date range.
  - The start date as a string in the format ‘YYYY-MM-DD’.
  - The end date as a string in the format ‘YYYY-MM-DD’.
- Return type:
tuple of (list of str, str, str)
Notes
The parameters files_path and file_serial_number may be defined in the campaign config file.
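The two accepted forms of date_range, and the (filenames, start, end) return shape, can be illustrated with a minimal stand-alone sketch. This is not the module's implementation: the helper names normalize_date_range and select_filenames are hypothetical, and the filename layout ‘<serial>-<YYYYMMDD>-…​.dat’ is an assumption made only for this example. The Sync-file check is not covered.

```python
import calendar
from datetime import date
from typing import Iterable, List, Tuple, Union


def normalize_date_range(date_range: Union[str, Tuple[str, str]]) -> Tuple[str, str]:
    """Hypothetical helper: turn either accepted form into (start, end) strings."""
    if isinstance(date_range, str):
        # 'YYYY-MM' means the whole month, first day to last day.
        year, month = (int(part) for part in date_range.split("-"))
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1).isoformat(), date(year, month, last_day).isoformat()
    start, end = date_range
    # Parsing validates the 'YYYY-MM-DD' format and allows an ordering check.
    start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    if start_d > end_d:
        raise ValueError("start date must not be after end date")
    return start_d.isoformat(), end_d.isoformat()


def select_filenames(
    filenames: Iterable[str],
    date_range: Union[str, Tuple[str, str]],
    file_serial_number: str,
) -> Tuple[List[str], str, str]:
    """Hypothetical filter, ASSUMING names look like '<serial>-<YYYYMMDD>-....dat'."""
    start, end = normalize_date_range(date_range)
    lo, hi = start.replace("-", ""), end.replace("-", "")
    selected = []
    for name in sorted(filenames):
        parts = name.split("-")
        if not name.endswith(".dat") or len(parts) < 2:
            continue
        stamp = parts[1]
        if len(stamp) != 8 or not stamp.isdigit() or not (lo <= stamp <= hi):
            continue
        if parts[0] != file_serial_number:
            raise ValueError(f"unexpected serial number in {name!r}")
        selected.append(name)
    return selected, start, end


# Example file names are invented for illustration.
names = [
    "CFADS2502-20230104-000000Z-DataLog_User.dat",
    "CFADS2502-20230105-010000Z-DataLog_User.dat",
    "CFADS2502-20230111-000000Z-DataLog_User.dat",
    "notes.txt",
]
files, start, end = select_filenames(names, ("2023-01-05", "2023-01-10"), "CFADS2502")
```

With the tuple form, only the file stamped 20230105 falls inside the range; passing ‘2023-01’ instead would select all three .dat files.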
- datafile.read_raw_data(files_path: str, date_range: Tuple[str, str], file_serial_number: str, usecols: List[str], dtype: Dict, species: bool | int = False) → DataFrame
Reads data from .dat files in a directory matching a date range and serial number.
- Parameters:
files_path (str) – Directory path where the .dat files are located. For example: ‘/raw-data/iag/G2301_CFADS2502/DataLog_User’.
date_range (tuple of str or str) – Date range used to filter filenames. Either a tuple (‘YYYY-MM-DD’, ‘YYYY-MM-DD’) giving the start and end dates, or a string in ‘YYYY-MM’ format representing an entire month.
file_serial_number (str) – Serial number of the CRDS being read, as written in the input file. For example: ‘CFADS2502’. Be aware that it may differ from the instrument serial number.
usecols (list of str) – A list of column names to read from the .dat files.
dtype (dict) – A dictionary specifying the data type for each column.
species (bool or int, optional) – If False, no filtering by species is done. If an int, filter data by the specified species value. Default is False.
- Returns:
A DataFrame containing the data read from the .dat files.
- Return type:
pandas.DataFrame
Notes
The parameters files_path, file_serial_number, usecols, dtype and species may be defined in the campaign config file.
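How usecols, dtype and species interact can be sketched with plain pandas on an in-memory sample. This is an illustration under stated assumptions, not the module's code: the function read_sketch is hypothetical, the sample columns (DATE, TIME, species, CO2_dry, CH4_dry) and values are invented, and the file is assumed to be whitespace-delimited with a column literally named ‘species’.

```python
import io

import pandas as pd

# Invented sample in an ASSUMED whitespace-delimited layout.
SAMPLE = """DATE TIME species CO2_dry CH4_dry
2023-01-01 00:00:01 1 410.5 1.75
2023-01-01 00:00:02 2 410.25 2.0
2023-01-01 00:00:03 1 411.0 1.5
"""


def read_sketch(buffer, usecols, dtype, species=False):
    """Hypothetical reader: select columns, apply dtypes, optionally filter."""
    df = pd.read_csv(buffer, sep=r"\s+", usecols=usecols, dtype=dtype)
    # species=False means no filtering; an int keeps only the matching rows.
    if species is not False:
        df = df[df["species"] == species]
    return df.reset_index(drop=True)


usecols = ["DATE", "TIME", "species", "CO2_dry"]
dtype = {"DATE": str, "TIME": str, "species": "int64", "CO2_dry": "float64"}

all_rows = read_sketch(io.StringIO(SAMPLE), usecols, dtype)
species_1 = read_sketch(io.StringIO(SAMPLE), usecols, dtype, species=1)
```

Here all_rows keeps every record, while species_1 keeps only the rows whose species value is 1. In both cases CH4_dry is dropped because it is not listed in usecols.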
- datafile.save_dataset_level_0(df: DataFrame, global_attrs: dict, variable_attrs: dict, file_serial_number: str, path_to_save: str) → None
Saves a DataFrame as a level 0 NetCDF dataset with the specified attributes.
- Parameters:
df (pd.DataFrame) – DataFrame to be converted to a NetCDF dataset.
global_attrs (dict) – Global attributes to add to the dataset.
variable_attrs (dict) – Variable-specific attributes to add to the dataset.
file_serial_number (str) – Serial number to include in the filename. Be aware that it may be different from the instrument serial number.
path_to_save (str) – Directory path where the file will be saved.
- Return type:
None
Notes
The parameters global_attrs, variable_attrs and file_serial_number may be defined in the campaign config file.
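As a rough illustration of the two attribute arguments, the sketch below builds example dictionaries and checks which columns would end up without variable attributes. It assumes that variable_attrs maps each variable name to its own attribute dictionary, which the documentation above does not confirm. All keys and values are invented, and the helper columns_without_attrs is hypothetical rather than part of datafile.

```python
from typing import Dict, Iterable, List

# Invented global attributes; the real keys depend on the campaign config.
global_attrs: Dict[str, str] = {
    "title": "Example CRDS level 0 dataset",
    "institution": "Example institution",
}

# ASSUMED shape: one attribute dictionary per variable (DataFrame column).
variable_attrs: Dict[str, Dict[str, str]] = {
    "CO2_dry": {"units": "ppm", "long_name": "Dry mole fraction of CO2"},
    "CH4_dry": {"units": "ppm", "long_name": "Dry mole fraction of CH4"},
}


def columns_without_attrs(
    columns: Iterable[str], attrs: Dict[str, Dict[str, str]]
) -> List[str]:
    """Hypothetical pre-save check: list columns that have no attribute entry."""
    return [column for column in columns if column not in attrs]


missing = columns_without_attrs(["CO2_dry", "CH4_dry", "species"], variable_attrs)
```

A check like this, run before saving, makes it easy to spot columns that would be written to the NetCDF file without units or descriptions.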