Reads in contaminant and effects data, the station dictionary and various reference tables. For data from the ICES webservice, it matches data to stations in the station dictionary. It also allows the user to set control parameters that dictate the assessment process.
Usage
read_data(
  compartment = c("biota", "sediment", "water"),
  purpose = c("OSPAR", "HELCOM", "AMAP", "custom"),
  contaminants,
  stations,
  data_dir = ".",
  data_format = c("ICES", "external"),
  info_files = list(),
  info_dir = ".",
  extraction = NULL,
  max_year = NULL,
  oddity_dir = "oddities",
  control = list()
)
Arguments
- compartment
A string: "biota", "sediment" or "water"
- purpose
A string specifying whether to use the default setup for "OSPAR", "HELCOM" or "AMAP", or to use a customised setup ("custom")
- contaminants
A file reference for the contaminant data
- stations
A file reference for the station data
- data_dir
The directory where the data files can be found (sometimes supplied using 'file.path'). Defaults to "."; i.e. the working directory.
- data_format
A string specifying whether the data were extracted from the ICES webservice ("ICES", the default) or are in the simplified format designed for other data sources ("external").
- info_files
A list of files specifying reference tables which override the defaults. See examples.
- info_dir
The directory where the reference tables can be found (sometimes supplied using 'file.path'). Defaults to "."; i.e. the working directory.
- extraction
The date on which the extraction was made. Optional. This should be provided according to ISO 8601; for example, 29 February 2024 should be supplied as "2024-02-29". If the contaminant data were extracted from the ICES webservice and the download file name has not been changed, the extraction date will be taken from the contaminant file name.
- max_year
An integer giving the last monitoring year that should be included in the assessment. Data from monitoring years after max_year will be deleted. If not specified, max_year is taken to be the last monitoring year in the contaminant data file.
- oddity_dir
The directory where the 'oddities' will be written (sometimes supplied using 'file.path'). This directory (and subdirectories) will be created if it does not already exist.
- control
A list of control parameters that override the default values used to run the assessment. These include the reporting window; the way in which data are matched to stations following an ICES extraction; information about reporting regions, and so on. See Details.
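The arguments above might be assembled as in the following sketch. The file names, directory names and the choice of `max_year` are hypothetical and chosen only for illustration; the call to `read_data()` is guarded so that it only runs when harsat is installed and the files actually exist.

```r
# Hypothetical directories, built with file.path() as suggested above
data_dir <- file.path("data", "example_OSPAR")
info_dir <- file.path("information", "OSPAR_2022")

# Extraction date in ISO 8601 form; as.Date() with an explicit format
# returns NA if the string cannot be parsed as a date
extraction <- "2024-02-29"
stopifnot(!is.na(as.Date(extraction, format = "%Y-%m-%d")))

# Hypothetical file names; argument names follow the Usage section
args <- list(
  compartment = "biota",
  purpose = "OSPAR",
  contaminants = "biota_data.txt",
  stations = "station_dictionary.txt",
  data_dir = data_dir,
  data_format = "ICES",
  info_dir = info_dir,
  extraction = extraction,
  max_year = 2022L,
  oddity_dir = file.path("oddities", "biota"),
  control = list(reporting_window = 6)
)

# Only call read_data() when harsat and the (hypothetical) files are available
files_present <- file.exists(file.path(data_dir, args$contaminants)) &&
  file.exists(file.path(data_dir, args$stations))

if (requireNamespace("harsat", quietly = TRUE) && files_present) {
  biota_data <- do.call(harsat::read_data, args)
}
```

Building the argument list first makes it easy to check the inputs (for example the date format) before any files are read.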
Value
A list with the following components:
- call
The function call.
- info
A list containing the reference tables and the control parameters.
- data
A data frame containing the contaminant (and effects) data. For external data, this is identical to the input data file apart from some extra empty columns which have been added. For ICES data, some existing columns have been renamed (otherwise they are untouched) and some additional columns have been constructed. The key ones of these are:
  - station_code: the code of the station in the station dictionary that best matches the data
  - station_name: the name of the station
  - species (biota): the species, based on worms_accepted_name where available and speci_name otherwise
  - filtration (water): whether the sample was filtered or unfiltered, based on method_pretreatment
  - retain: a logical indicating whether each record would have been retained under the previous ICES extraction protocol. For example, retain will be FALSE if the vflag entry is "S" or suspect. Records for which retain == FALSE are deleted later in tidy_data
- stations
Details
Control parameters
Many aspects of the assessment process can be controlled using parameters
which are stored in the info component of the harsat data object. The
default control values can be overwritten using the control argument.
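The idea of overriding defaults can be pictured with a small base R sketch. This is only an illustration of the concept, not harsat's internal code: the `default_control` list below contains just the two defaults stated on this page (for biota), and the merge via `utils::modifyList()` is an assumption about how such an override could work.

```r
# Illustrative defaults taken from this page; not the full set used by harsat
default_control <- list(
  reporting_window = 6,
  auxiliary = list(by_matrix = c("DRYWT%", "LIPIDWT%"))
)

# A user-supplied control list that changes only the reporting window
user_control <- list(reporting_window = 8)

# Entries in user_control replace the matching defaults; the rest are kept
control <- utils::modifyList(default_control, user_control)

control$reporting_window
control$auxiliary$by_matrix
```

Only the parameters named in the user's list change; everything else keeps its default value.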
- reporting_window
A scalar (default 6) which determines whether timeseries are excluded because they have no 'recent' data. Formally, timeseries are excluded if they have no data in the period from max_year - reporting_window + 1 to max_year, so the default approach is to exclude timeseries if they have no data in the most recent six monitoring years. The value of 6 is chosen to match Marine Strategy Framework Directive reporting periods.
- region
- add_stations
- bivalve_spawning_season
- use_stage
- relative_uncertainty
- auxiliary
A list which allows flexibility in the treatment of auxiliary variables. At present, there is just one component, by_matrix, a character vector that determines which auxiliary variables are matched to the contaminant data by sample and matrix as opposed to just sample. For sediment and water, the default is all; i.e. all variables are matched by sample and matrix. This ensures, for example, that sediment normalisers such as aluminium and organic carbon content are matched to chemical measurements in the same grain fraction. For biota, the default is c("DRYWT%", "LIPIDWT%"), so these variables are matched by sample and matrix and all other variables (e.g. LNMEA or %FEMALEPOP) are matched by sample. Thus, dry weight and lipid weight contents are matched to chemical measurements in the same tissue. However, mean length (which is usually the length of the whole organism) is matched to all tissue types.
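The reporting window rule and the by_matrix matching can be sketched in base R as follows. The data frames, their column names (`series`, `year`, `concentration`, `value`), the matrix codes and the helper `add_aux()` are all hypothetical and exist only for this illustration; harsat's own implementation may differ.

```r
## Reporting window: keep timeseries with data in the recent period
max_year <- 2022
reporting_window <- 6
first_recent_year <- max_year - reporting_window + 1

# Hypothetical observations for three timeseries
obs <- data.frame(
  series = c("A", "A", "B", "B", "C"),
  year = c(2010, 2021, 2012, 2016, 2017)
)

# A timeseries is kept if its latest year falls inside the window
last_year <- tapply(obs$year, obs$series, max)
kept_series <- names(last_year)[last_year >= first_recent_year]

## by_matrix: match some auxiliary variables by sample and matrix,
## and the others by sample only
by_matrix <- c("DRYWT%", "LIPIDWT%")

# Hypothetical chemical and auxiliary measurements for one sample
chem <- data.frame(
  sample = c(1, 1),
  matrix = c("LI", "MU"),
  determinand = "CD",
  concentration = c(0.5, 0.1)
)
aux <- data.frame(
  sample = c(1, 1, 1),
  matrix = c("LI", "MU", "WO"),
  determinand = c("DRYWT%", "DRYWT%", "LNMEA"),
  value = c(25, 20, 300)
)

# Hypothetical helper: join one auxiliary variable using the right keys
add_aux <- function(data, aux, aux_name, new_col, by_matrix) {
  keys <- if (aux_name %in% by_matrix) c("sample", "matrix") else "sample"
  aux_sub <- aux[aux$determinand == aux_name, c(keys, "value"), drop = FALSE]
  names(aux_sub)[names(aux_sub) == "value"] <- new_col
  merge(data, aux_sub, by = keys, all.x = TRUE)
}

matched <- add_aux(chem, aux, "DRYWT%", "drywt", by_matrix)
matched <- add_aux(matched, aux, "LNMEA", "lnmea", by_matrix)
```

In this sketch, series B is dropped because its last observation predates the window, each tissue receives its own dry weight, and the single length measurement is attached to both tissues.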
External data
If data_format = "external", a simplified data and station file can
be supplied. See vignette("external-file-format") for details.