Reads in contaminant and effects data, the station dictionary and various reference tables. For data from the ICES webservice, it matches data to stations in the station dictionary. It also allows the user to set control parameters that dictate the assessment process.
Usage
read_data(
  compartment = c("biota", "sediment", "water"),
  purpose = c("OSPAR", "HELCOM", "AMAP", "custom"),
  contaminants,
  stations,
  data_dir = ".",
  data_format = c("ICES", "external"),
  info_files = list(),
  info_dir = ".",
  extraction = NULL,
  max_year = NULL,
  oddity_dir = "oddities",
  control = list()
)
Arguments
- compartment
A string: "biota", "sediment" or "water"
- purpose
A string specifying whether to use the default set up for "OSPAR", "HELCOM", or "AMAP", or to use a customised setup ("custom")
- contaminants
A file reference for the contaminant data
- stations
A file reference for the station data
- data_dir
The directory where the data files can be found (sometimes supplied using 'file.path'). Defaults to "."; i.e. the working directory.
- data_format
A string specifying whether the data were extracted from the ICES webservice ("ICES", the default) or are in the simplified format designed for other data sources ("external").
- info_files
A list of files specifying reference tables which override the defaults. See examples.
- info_dir
The directory where the reference tables can be found (sometimes supplied using 'file.path'). Defaults to "."; i.e. the working directory.
- extraction
A date saying when the extraction was made. Optional. This should be provided according to ISO 8601; for example, 29 February 2024 should be supplied as "2024-02-29". If the contaminant data were extracted from the ICES webservice and the download file name has not been changed, the extraction date will be taken from the contaminant file name.
- max_year
An integer giving the last monitoring year that should be included in the assessment. Data from monitoring years after max_year will be deleted. If not specified, max_year is taken to be the last monitoring year in the contaminant data file.
- oddity_dir
The directory where the 'oddities' will be written (sometimes supplied using 'file.path'). This directory (and subdirectories) will be created if it does not already exist.
- control
A list of control parameters that override the default values used to run the assessment. These include the reporting window; the way in which data are matched to stations following an ICES extraction; information about reporting regions, and so on. See Details.
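As an illustration of how the arguments fit together, the sketch below assembles a call for an ICES-format water assessment. The file names, directory layout and year are hypothetical placeholders, not files shipped with the package. The call is only made if harsat is installed and the contaminant file exists.

```r
# Hypothetical inputs: replace the file names and directories with your own.
args <- list(
  compartment = "water",
  purpose = "OSPAR",
  contaminants = "water.txt",                      # hypothetical contaminant file
  stations = "stations.txt",                       # hypothetical station file
  data_dir = file.path("data", "example"),         # hypothetical directory
  data_format = "ICES",
  info_dir = file.path("information", "example"),  # hypothetical directory
  extraction = "2024-02-29",                       # ISO 8601 date (YYYY-MM-DD)
  max_year = 2022L,
  oddity_dir = file.path("oddities", "example")
)

# Check that the extraction date parses as a valid ISO 8601 calendar date.
stopifnot(!is.na(as.Date(args$extraction, format = "%Y-%m-%d")))

# Only run the import when the package and the input file are available.
contaminant_path <- file.path(args$data_dir, args$contaminants)
if (requireNamespace("harsat", quietly = TRUE) && file.exists(contaminant_path)) {
  water_data <- do.call(harsat::read_data, args)
  str(water_data, max.level = 1)
}
```

Building the arguments as a list first makes it easy to check them before any files are read; the same values could equally be passed directly to read_data.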
Value
A list with the following components:
call
The function call.
info
A list containing the reference tables and the control parameters.
data
A data frame containing the contaminant (and effects) data. For external data, this is identical to the input data file apart from some extra empty columns which have been added. For ICES data, some existing columns have been renamed (otherwise they are untouched) and some additional columns have been constructed. The key ones of these are:
station_code
the code of the station in the station dictionary that best matches the data
station_name
the name of the station
species
(biota) the species based on worms_accepted_name where available and speci_name otherwise
filtration
(water) whether the sample was filtered or unfiltered based on method_pretreatment
retain
a logical indicating whether each record would have been retained under the previous ICES extraction protocol. For example, retain will be FALSE if the vflag entry is "S" or suspect. Records for which retain == FALSE are deleted later in tidy_data
stations
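The role of the retain column can be illustrated with a toy data frame. The values below are invented, and the rule shown (flagging only records with vflag equal to "S") is a deliberate simplification made for this example; it is a sketch of the idea, not the package's implementation.

```r
# Toy records; all values are invented for illustration only.
toy <- data.frame(
  station_code = c("A1", "A1", "B2"),
  determinand = c("CD", "PB", "CD"),
  vflag = c("", "S", ""),
  stringsAsFactors = FALSE
)

# Simplified assumption: a record is dropped only when vflag is "S".
toy$retain <- toy$vflag != "S"

# Inspect the records that would be deleted later, then keep the rest.
dropped <- toy[!toy$retain, ]
kept <- toy[toy$retain, ]
print(dropped)
```

Inspecting the records with retain == FALSE before the tidying step is one way to see what will be removed.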
Details
Control parameters
Many aspects of the assessment process can be controlled using parameters
which are stored in the info
component of the harsat data object. The
default control values can be overwritten using the control
argument.
reporting_window
A scalar (default 6) which determines whether timeseries are excluded because they have no 'recent' data. Formally, timeseries are excluded if they have no data in the period from max_year - reporting_window + 1 to max_year, so the default approach is to exclude timeseries if they have no data in the most recent six monitoring years. The value of 6 is chosen to match Marine Strategy Framework Directive reporting periods.
region
add_stations
bivalve_spawning_season
use_stage
relative_uncertainty
auxiliary
A list which allows flexibility in the treatment of auxiliary variables. At present, there is just one component, by_matrix, a character vector that determines which auxiliary variables are matched to the contaminant data by sample and matrix as opposed to just sample. For sediment and water, the default is all; i.e. all variables are matched by sample and matrix. This ensures, for example, that sediment normalisers such as aluminium and organic carbon content are matched to chemical measurements in the same grain fraction. For biota, the default is c("DRYWT%", "LIPIDWT%"), so these variables are matched by sample and matrix and all other variables (e.g. LNMEA or %FEMALEPOP) are matched by sample. Thus, dry weight and lipid weight contents are matched to chemical measurements in the same tissue. However, mean length (which is usually the length of the whole organism) is matched to all tissue types.
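To make the reporting_window rule concrete, the sketch below applies it to a few invented timeseries using base R, and then shows how a matching control list might be written. The series names and years are made up, and the structure of the auxiliary entry (a list with a by_matrix component) is assumed from the description above. It is an illustration, not the package's internal code.

```r
max_year <- 2022L
reporting_window <- 6L

# First year of the 'recent' period: max_year - reporting_window + 1.
first_recent_year <- max_year - reporting_window + 1L

# Invented monitoring years for three hypothetical timeseries.
series_years <- list(
  series_a = 2005:2022,
  series_b = 2001:2016,
  series_c = c(2010, 2017)
)

# A series is kept only if it has at least one year in the recent period.
has_recent <- vapply(
  series_years,
  function(years) any(years >= first_recent_year & years <= max_year),
  logical(1)
)
print(has_recent)

# Control list overriding defaults (structure assumed from the text above).
control <- list(
  reporting_window = reporting_window,
  auxiliary = list(by_matrix = c("DRYWT%", "LIPIDWT%"))
)
```

A list like this would then be supplied through the control argument of read_data.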
External data
If data_format = "external", a simplified data and station file can be supplied. See vignette("external-file-format") for details.