Introduction
This vignette shows how to do an assessment (mostly) following the approach taken in HELCOM HOLAS3.
The data were extracted from the ICES database using the XHAT facilities on the ICES webservice. The data were extracted on 28 August 2023 and were filtered using is_helcom_area = TRUE and maxYear = 2020. The data were subsequently reduced in size to make them more manageable for this example. The data have not been scrutinised by data assessors so their use, and the results below, must be treated as illustrative only; in particular, they should not be used for any formal reporting.
There are two exceptions to the HOLAS3 approach. First, imposex data are not assessed here - these have an additional level of complexity that will be explained in a subsequent vignette (not written yet). Second, the method for dealing with ‘initial’ data, unique to HELCOM, has not been implemented (this is not yet available in harsat).
We’ll begin with contaminants in water which are the simplest data to assess. We’ll then move on to sediment and biota which have more features to consider.
But before we get to any of that, we need to set up the environment.
First, load the harsat library (let's assume that you have already installed it, as covered in the Getting Started guide).
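library(harsat)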
We set up our main directory to find the data files. If you use your own data files, you will need to point to a directory containing a copy. We usually do this by putting our data files in a directory data, and information files in a directory information, but you can use any directory for these.
working.directory <- 'C:/Users/robfr/Documents/HARSAT/HARSAT'
Water assessment
First, we use read_data() to read in the contaminant data, the station dictionary, and two important reference tables: the determinand and threshold reference tables. There are several things to note about the function call:
- purpose = "HELCOM" means that the assessment configuration is set to mirror that of the HOLAS3 assessment. You can change the assessment configuration in many ways using the control argument, but that is not explained here.
- data_dir identifies the directory storing the contaminant data (water.txt) and station dictionary (stations.txt). Using file.path prevents any difficulties with forward or backward slashes when writing file paths.
- info_dir identifies the directory storing the reference tables. These must be named determinand.csv and thresholds_water.csv. The determinand table identifies which determinands are going to be assessed. If there are no thresholds, then there doesn't need to be a file called thresholds_water.csv.
- you don't have to specify the extraction date, but it can help to keep track of things.
As well as reading in the contaminant data and station dictionary, the function matches each record in the contaminant data to a station in the station dictionary. The process for doing this is quite complicated (and it can take a few minutes to run) and we don’t go into details here.
Finally, a message is printed saying that the argument max_year is set to 2020. (This is as it should be since we set maxYear to be 2020 in the data extraction.) But an important consequence is that a contaminant time series will only be assessed if it has some data in the period 2015 to 2020 (i.e. in the last six monitoring years).
water_data <- read_data(
compartment = "water",
purpose = "HELCOM",
contaminants = "water.txt",
stations = "stations.txt",
data_dir = file.path(working.directory, "data", "example_HELCOM"),
info_dir = file.path(working.directory, "information", "HELCOM_2023"),
extraction = "2023/08/23"
)
#> Found in path determinand.csv C:\Users\robfr\Documents\HARSAT\HARSAT\information\HELCOM_2023\determinand.csv
#> Found in path species.csv C:\Users\robfr\Documents\HARSAT\HARSAT\information\HELCOM_2023\species.csv
#> Found in path thresholds_water.csv C:\Users\robfr\Documents\HARSAT\HARSAT\information\HELCOM_2023\thresholds_water.csv
#> Found in package method_extraction.csv C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/method_extraction.csv
#> Found in package pivot_values.csv C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/pivot_values.csv
#> Found in package matrix.csv C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/matrix.csv
#> Found in package imposex.csv C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/imposex.csv
#> MD5 digest for: 'C:\Users\robfr\Documents\HARSAT\HARSAT\information\HELCOM_2023\determinand.csv': '4b48cbec9c71380f4b464779e643cab2'
#> MD5 digest for: 'C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/matrix.csv': '4b4fb3814bb84cfbf9b37f7b59d45eb9'
#> MD5 digest for: 'C:\Users\robfr\Documents\HARSAT\HARSAT\information\HELCOM_2023\thresholds_water.csv': '7e9487630022c11b0c3dd6d553a9955b'
#> Reading station dictionary from:
#> 'C:/Users/robfr/Documents/HARSAT/HARSAT/data/example_HELCOM/stations.txt'
#> MD5 digest for: 'C:/Users/robfr/Documents/HARSAT/HARSAT/data/example_HELCOM/stations.txt': 'd229a1c984d507537840e73080f3773c'
#>
#> Reading contaminant and effects data from:
#> 'C:/Users/robfr/Documents/HARSAT/HARSAT/data/example_HELCOM/water.txt'
#> MD5 digest for: 'C:/Users/robfr/Documents/HARSAT/HARSAT/data/example_HELCOM/water.txt': 'b18b0556f6f78378c6f0a77682f51988'
#>
#> Matching data with station dictionary
#> - restricting to stations in these convention areas: HELCOM
#> - no restriction of stations by data type
#> - no restriction of stations by purpose of monitoring
#> - no restriction of data by program governance
#> - no restriction of stations by program governance
#> - matching 10846 records by coordinates
#> - matching 0 records by station name
#> - 10746 of 10846 records have been matched to a station
#>
#> Argument max_year taken to be the maximum year in the data: 2020
We next simplify the data so that they are in the correct format for running the assessment. This also involves deleting some data that do not meet the conditions for the assessment.
Notice that a message appears about 'oddities'. Both tidy_data and create_timeseries (the next function call) do a lot of checking of the data, and strange values are written to the oddities folder for you to have a look at. This is in the hope that, if there are errors, they will get corrected and resubmitted to the ICES database. It turns out there are no strange values at this stage, but there are in the step that follows.
water_data <- tidy_data(water_data)
#>
#> Oddities will be written to 'oddities/water' with previous oddities backed up to
#> 'oddities/water_backup'
#>
#> Dropping 4071 records from data flagged for deletion. Possible reasons are:
#> - vflag = suspect
#> - upper depth >5.5m
#> - filtration missing
#> - matrix = 'SPM'
#>
#> Dropping 88 records from data that have no valid station code
#>
#> Dropping 13667 stations that are not associated with any data
#>
#> Cleaning station dictionary
#>
#> Cleaning contaminant and biological effects data
We now do some more data cleaning and then group the data into time series. Each time series consists of measurements of a single determinand at a single monitoring station. Measurements in filtered and unfiltered samples are split into different time series.
The determinands.control argument is important to know about since it allows related determinands to be processed in various ways. Here, it is just used for PFOS and its linear and branched components N-PFOS and BR-PFOS. PFOS measurements can be submitted as PFOS (the sum of the two components) or as N-PFOS and BR-PFOS (the individual components). The determinands.control argument below tells the code to sum records of N-PFOS and BR-PFOS from the same sample and relabel them as PFOS. There are more complicated examples of the use of determinands.control in the sediment and biota examples below.
water_timeseries <- create_timeseries(
water_data,
determinands.control = list(
PFOS = list(det = c("N-PFOS", "BR-PFOS"), action = "sum")
)
)
#>
#> Oddities will be written to 'oddities/water' with previous oddities backed up to
#> 'oddities/water_backup'
#>
#> Cleaning data
#> Dropping stations with no data between 2015 and 2020
#> Unexpected or missing values for 'basis': see basis_queries.csv
#> Unexpected or missing values for 'unit': see unit_queries.csv
#> Unexpected or missing values for 'value': see value_queries.csv
#> Non-positive detection limits: see non_positive_det_limits.csv
#> Censoring codes D and Q inconsistent with respective limits: see censoring_codes_inconsistent.csv
#> Detection limit higher than data: see detection_limit_high.csv
#>
#> Creating time series data
#> Converting data to appropriate basis for statistical analysis
#> Dropping groups of compounds / stations with no data between 2015 and 2020
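If you want to look at any of the queries flagged above, the corresponding CSV files can be read straight from the oddities folder. A minimal sketch, assuming (as the messages suggest) that the folder has been created relative to your current R working directory, and using one of the files named above:
# inspect one of the oddities files listed in the messages above
# (assumes the oddities folder sits in your current R working directory)
oddities_file <- file.path("oddities", "water", "detection_limit_high.csv")
if (file.exists(oddities_file)) {
  head(read.csv(oddities_file))
}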
If you want to see a list of all the time series, then you can run code along the following lines:
get_timeseries(water_timeseries) |> head(10)
#> series station_code station_name country determinand
#> 1 11241 CD filtered 11241 64A2 Lithuania CD
#> 2 11241 PB filtered 11241 64A2 Lithuania PB
#> 3 11241 PFOS unfiltered 11241 64A2 Lithuania PFOS
#> 4 11241 TBSN+ unfiltered 11241 64A2 Lithuania TBSN+
#> 5 12896 CD filtered 12896 AUDF Estonia CD
#> 6 12896 PB filtered 12896 AUDF Estonia PB
#> 7 12896 PFOS unfiltered 12896 AUDF Estonia PFOS
#> 8 12896 TBSN+ unfiltered 12896 AUDF Estonia TBSN+
#> 9 2144 CD filtered 2144 OMMVUW4 Germany CD
#> 10 2144 PB filtered 2144 OMMVUW4 Germany PB
#> filtration subseries basis unit
#> 1 filtered <NA> W ug/l
#> 2 filtered <NA> W ug/l
#> 3 unfiltered <NA> W ng/l
#> 4 unfiltered <NA> W ng/l
#> 5 filtered <NA> W ug/l
#> 6 filtered <NA> W ug/l
#> 7 unfiltered <NA> W ng/l
#> 8 unfiltered <NA> W ng/l
#> 9 filtered <NA> W ug/l
#> 10 filtered <NA> W ug/l
At last it is time to run the assessment. You need to specify which thresholds to use, otherwise the code will not use any of them. For water there is only the EQS, but you still need to specify it. Look at the threshold reference table to see what is available if you are unsure. Note that AC stands for Assessment Criteria, which is what thresholds are often called. The parallel argument tells the code to use parallel processing, which usually speeds things up considerably. The assessment took about 1.5 minutes to run on my laptop.
water_assessment <- run_assessment(
water_timeseries,
AC = "EQS",
parallel = TRUE
)
#> Setting up clusters: be patient, it can take a while!
We now need to check whether there were any convergence issues. Lack of convergence only rarely occurs, but when it does it is typically because there are errors in the data (e.g. reported in incorrect units) or because there are outliers, so the best thing to do is first check your data. However, convergence can also be difficult if there are a lot of less-than measurements in the time series. Describing how to tweak the control arguments to get convergence is beyond the scope of this vignette (need to write another vignette to discuss this). Fortunately, there were no convergence issues here.
check_assessment(water_assessment)
#> All assessment models have converged
It is time to look at the results! We can plot the data for each time series along with the fitted model (see vignette for external data) or print a table giving summary information about the assessment of each time series. But first, we may need to create an output directory.
summary.dir <- file.path(working.directory, "output", "example_HELCOM")
if (!dir.exists(summary.dir)) {
dir.create(summary.dir, recursive = TRUE)
}
Now that we have an output directory, we can write the summary table to that directory. Of course, you can choose any directory to write to here, so long as you are sure it exists.
The code below prints out a CSV file giving summary information about the assessment of each time series. This includes:
- meta-data such as the monitoring location and number of years of data for each time series
- the fitted values in the last monitoring year with associated upper one-sided 95% confidence limits
- the trend assessments (p-values and trend estimates)
- the status assessments (if there are any thresholds)
- (optionally) a symbology summarising the trend (shape) and status (colour) of each time series
This function is being actively developed and the function arguments are likely to evolve, so we’ll leave their explanation for the next release.
write_summary_table(
water_assessment,
determinandGroups = list(
levels = c("Metals", "Organotins", "Organofluorines"),
labels = c("Metals", "Organotins", "Organofluorines")
),
symbology = list(
colour = list(
below = c("EQS" = "green"),
above = c("EQS" = "red"),
none = "black"
)
),
collapse_AC = list(EAC = "EQS"),
output_dir = summary.dir
)
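To see what has been written, you can simply list the contents of the output directory:
# list the summary file(s) written by write_summary_table
list.files(summary.dir)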
Sediment assessment
The sediment assessment is very similar, but has a few extra features related to normalisation (to account for differences in grain size), which are described below.
sediment_data <- read_data(
compartment = "sediment",
purpose = "HELCOM",
contaminants = "sediment.txt",
stations = "stations.txt",
data_dir = file.path(working.directory, "data", "example_HELCOM"),
info_dir = file.path(working.directory, "information", "HELCOM_2023"),
extraction = "2023/08/23"
)
#> Found in path determinand.csv C:\Users\robfr\Documents\HARSAT\HARSAT\information\HELCOM_2023\determinand.csv
#> Found in path species.csv C:\Users\robfr\Documents\HARSAT\HARSAT\information\HELCOM_2023\species.csv
#> Found in path thresholds_sediment.csv C:\Users\robfr\Documents\HARSAT\HARSAT\information\HELCOM_2023\thresholds_sediment.csv
#> Found in package method_extraction.csv C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/method_extraction.csv
#> Found in package pivot_values.csv C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/pivot_values.csv
#> Found in package matrix.csv C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/matrix.csv
#> Found in package imposex.csv C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/imposex.csv
#> MD5 digest for: 'C:\Users\robfr\Documents\HARSAT\HARSAT\information\HELCOM_2023\determinand.csv': '4b48cbec9c71380f4b464779e643cab2'
#> MD5 digest for: 'C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/matrix.csv': '4b4fb3814bb84cfbf9b37f7b59d45eb9'
#> MD5 digest for: 'C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/method_extraction.csv': '28e38bdd0b9e735643c60026dcda8a78'
#> MD5 digest for: 'C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/pivot_values.csv': '23ca1799017bfea360d586b1a70bffd4'
#> MD5 digest for: 'C:\Users\robfr\Documents\HARSAT\HARSAT\information\HELCOM_2023\thresholds_sediment.csv': '41c686160bc8e0877477239eec0f0b1b'
#> Reading station dictionary from:
#> 'C:/Users/robfr/Documents/HARSAT/HARSAT/data/example_HELCOM/stations.txt'
#> MD5 digest for: 'C:/Users/robfr/Documents/HARSAT/HARSAT/data/example_HELCOM/stations.txt': 'd229a1c984d507537840e73080f3773c'
#>
#> Reading contaminant and effects data from:
#> 'C:/Users/robfr/Documents/HARSAT/HARSAT/data/example_HELCOM/sediment.txt'
#> MD5 digest for: 'C:/Users/robfr/Documents/HARSAT/HARSAT/data/example_HELCOM/sediment.txt': 'a5635836e9a69f3dd8be42d5078cad6b'
#>
#> Matching data with station dictionary
#> - restricting to stations in these convention areas: HELCOM
#> - no restriction of stations by data type
#> - no restriction of stations by purpose of monitoring
#> - no restriction of data by program governance
#> - no restriction of stations by program governance
#> - matching 6995 records by coordinates
#> - matching 9365 records by station name
#> - 15721 of 16360 records have been matched to a station
#>
#> Argument max_year taken to be the maximum year in the data: 2020
sediment_data <- tidy_data(sediment_data)
#>
#> Oddities will be written to 'oddities/sediment' with previous oddities backed up to
#> 'oddities/sediment_backup'
#>
#> Dropping 2176 records from data flagged for deletion. Possible reasons are:
#> - vflag = suspect
#> - upper depth > 0m
#> - unusual matrix
#>
#> Dropping 523 records from data that have no valid station code
#>
#> Dropping 13326 stations that are not associated with any data
#>
#> Cleaning station dictionary
#>
#> Cleaning contaminant and biological effects data
The sediment data are grouped into time series which consist of the measurements of a single determinand in a single matrix (the fraction the sample has been sieved to; e.g. SED63 or SEDTOT) at a single monitoring station.
The create_timeseries() call for sediment differs from that for water in two ways. First, determinands.control identifies two groups of determinands that need to be summed. Second, the arguments normalise and normalise.control specify how the normalisation for grain size should be carried out. There are default functions for normalisation that will work in many cases. However, the process for HELCOM is more complicated (because, unlike other metals, copper is normalised to organic carbon rather than aluminium), so a customised function normalise_sediment_HELCOM() is provided. The argument normalise.control specifies that metals (apart from copper) will be normalised to 5% aluminium, and copper and organics will be normalised to 5% organic carbon. The normalise functions need a bit of work, so expect them to change.
All contaminant time series in sediment are assessed on a dry weight basis. The few measurements submitted on a wet weight basis are converted to a dry weight basis using DRYWT% supporting information (also submitted with the data).
sediment_timeseries <- create_timeseries(
sediment_data,
determinands.control = list(
SBDE6 = list(
det = c("BDE28", "BDE47", "BDE99", "BD100", "BD153", "BD154"),
action = "sum"
),
HBCD = list(det = c("HBCDA", "HBCDB", "HBCDG"), action = "sum")
),
normalise = normalise_sediment_HELCOM,
normalise.control = list(
metals = list(method = "pivot", normaliser = "AL", value = 5),
copper = list(method = "hybrid", normaliser = "CORG", value = 5),
organics = list(method = "simple", normaliser = "CORG", value = 5)
)
)
#>
#> Oddities will be written to 'oddities/sediment' with previous oddities backed up to
#> 'oddities/sediment_backup'
#>
#> Cleaning data
#> Dropping stations with no data between 2015 and 2020
#> Dropping samples with only auxiliary variables
#> Relabelling matrix SED62 as SED63 and SED500, SED1000, SED2000 as SEDTOT
#> Unexpected or missing values for 'basis': see basis_queries.csv
#> Unexpected or missing values for 'unit': see unit_queries.csv
#> Unexpected or missing values for 'value': see value_queries.csv
#> Replicate measurements, only first retained: see replicate_measurements.csv
#> Non-positive detection limits: see non_positive_det_limits.csv
#> Non-positive quantification limits: see non_positive_quant_limits.csv
#> Limit of quantification less than limit of detection: see limits_inconsistent.csv
#> Censoring codes D and Q inconsistent with respective limits: see censoring_codes_inconsistent.csv
#> Detection limit higher than data: see detection_limit_high.csv
#> Implausible uncertainties reported with data: see implausible_uncertainties_reported.csv
#> Data submitted as BDE28, BDE47, BDE99, BD100, BD153, BD154 summed to give
#> SBDE6
#> 61 of 124 samples lost due to incomplete submissions
#> Data submitted as HBCDA, HBCDB, HBCDG summed to give HBCD
#>
#> Creating time series data
#> Converting data to appropriate basis for statistical analysis
#> Losing 6 out of 3282 records in basis conversion due to missing, censored
#> or zero drywt or lipidwt values.
#> Normalising copper to CORG using pivot values
#> Removing sediment data where normaliser is a less than
#> Normalising metals to AL using pivot values
#> Normalising organics to 5% CORG
#> Removing sediment data where normaliser is a less than
#> Implausible uncertainties calculated in data processing: see implausible_uncertainties_calculated.csv
#> Dropping groups of compounds / stations with no data between 2015 and 2020
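As with water, you can list the sediment time series with get_timeseries(); expect columns similar to the water example, now with the matrix (sieved fraction) forming part of the grouping, though the exact columns returned may differ:
get_timeseries(sediment_timeseries) |> head(10)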
Now run the assessment. Again there is only one threshold, the EQS. This only takes about a minute to run on my laptop.
sediment_assessment <- run_assessment(
sediment_timeseries,
AC = "EQS",
parallel = TRUE
)
#> Setting up clusters: be patient, it can take a while!
Everything has converged.
check_assessment(sediment_assessment)
#> All assessment models have converged
Finally, we can plot individual time series assessments (see vignette for external data) or print out the summary table.
write_summary_table(
sediment_assessment,
determinandGroups = list(
levels = c("Metals", "Organotins", "PAH_parent", "PBDEs", "Organobromines"),
labels = c(
"Metals", "Organotins", "Polycyclic aromatic hydrocarbons",
"Organobromines", "Organobromines"
)
),
symbology = list(
colour = list(
below = c("EQS" = "green"),
above = c("EQS" = "red"),
none = "black"
)
),
collapse_AC = list(EAC = "EQS"),
output_dir = summary.dir
)
Biota assessment
The main difference in the biota assessment is the inclusion of effects data. This example has some PAH metabolite time series, but all imposex data have been excluded to keep things relatively simple. Imposex assessments have an additional modelling stage and this will be described in another vignette (not yet available).
The first two stages are just as before.
biota_data <- read_data(
compartment = "biota",
purpose = "HELCOM",
contaminants = "biota.txt",
stations = "stations.txt",
data_dir = file.path(working.directory, "data", "example_HELCOM"),
info_dir = file.path(working.directory, "information", "HELCOM_2023"),
extraction = "2023/08/23"
)
#> Found in path determinand.csv C:\Users\robfr\Documents\HARSAT\HARSAT\information\HELCOM_2023\determinand.csv
#> Found in path species.csv C:\Users\robfr\Documents\HARSAT\HARSAT\information\HELCOM_2023\species.csv
#> Found in path thresholds_biota.csv C:\Users\robfr\Documents\HARSAT\HARSAT\information\HELCOM_2023\thresholds_biota.csv
#> Found in package method_extraction.csv C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/method_extraction.csv
#> Found in package pivot_values.csv C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/pivot_values.csv
#> Found in package matrix.csv C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/matrix.csv
#> Found in package imposex.csv C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/imposex.csv
#> MD5 digest for: 'C:\Users\robfr\Documents\HARSAT\HARSAT\information\HELCOM_2023\determinand.csv': '4b48cbec9c71380f4b464779e643cab2'
#> MD5 digest for: 'C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/matrix.csv': '4b4fb3814bb84cfbf9b37f7b59d45eb9'
#> MD5 digest for: 'C:\Users\robfr\Documents\HARSAT\HARSAT\information\HELCOM_2023\species.csv': '769328e51065226809c91944b6d8fe79'
#> MD5 digest for: 'C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/imposex.csv': 'b602a882d4783085c896bcf130c8f848'
#> MD5 digest for: 'C:\Users\robfr\Documents\HARSAT\HARSAT\information\HELCOM_2023\thresholds_biota.csv': '9af82cd9730c0b135edd4a003724e8a6'
#> Reading station dictionary from:
#> 'C:/Users/robfr/Documents/HARSAT/HARSAT/data/example_HELCOM/stations.txt'
#> MD5 digest for: 'C:/Users/robfr/Documents/HARSAT/HARSAT/data/example_HELCOM/stations.txt': 'd229a1c984d507537840e73080f3773c'
#>
#> Reading contaminant and effects data from:
#> 'C:/Users/robfr/Documents/HARSAT/HARSAT/data/example_HELCOM/biota.txt'
#> MD5 digest for: 'C:/Users/robfr/Documents/HARSAT/HARSAT/data/example_HELCOM/biota.txt': '0a1a33c4e668e63c97a6d50cdc644d22'
#>
#> Matching data with station dictionary
#> - restricting to stations in these convention areas: HELCOM
#> - no restriction of stations by data type
#> - no restriction of stations by purpose of monitoring
#> - no restriction of data by program governance
#> - no restriction of stations by program governance
#> - matching 8644 records by coordinates
#> - matching 19893 records by station name
#> - 28437 of 28537 records have been matched to a station
#>
#> Argument max_year taken to be the maximum year in the data: 2020
biota_data <- tidy_data(biota_data)
#>
#> Oddities will be written to 'oddities/biota' with previous oddities backed up to
#> 'oddities/biota_backup'
#>
#> Dropping 32 records from data flagged for deletion. Possible reasons are:
#> - vflag = suspect
#> - species missing
#>
#> Dropping 99 records from data that have no valid station code
#>
#> Dropping 13664 stations that are not associated with any data
#>
#> Cleaning station dictionary
#>
#> Cleaning contaminant and biological effects data
The construction of the time series has a few more features. However, first we need to provide the individual TEFs to allow the construction of the WHO TEQ for dioxins, furans and planar PCBs (labelled TEQDFP). These are the values for the human health QS. This stage won't be necessary in later releases.
The biota data are grouped into time series which consist of the measurements of a single determinand in a single matrix (tissue type; e.g. EH, LI or SB) in a single species at a single monitoring station. PAH metabolite data are further grouped by method_analysis.
The determinands.control argument does rather more here. There are five summed variables: PFOS, SBDE6, HBCD, SCB6 and TEQDFP. There is also one variable, CB138+163, which needs to be relabelled as (replaced by) CB138. For the purposes of the assessment, the contribution of CB163 is regarded as small, so CB138+163 is taken to be a good proxy for CB138. Note that the replacement must be done before the six PCBs are summed to give SCB6 in order for these records to be included in the sum.
The calculation of TEQDFP is more complex than that of the other summed variables. TEQDFP is the label used for the World Health Organisation Toxic Equivalent sum for dioxins, furans and planar polychlorinated biphenyls. This is calculated using the Toxic Equivalency Factors (TEFs) stored in info_TEF$DFP_HOLAS3, with the TEFs identified as weights. Note that the weights used in the example below are those used in the HELCOM HOLAS 3 assessment, but have since been superseded; see help("info_TEF") for more details.
There is also one 'bespoke' action in determinands.control. This triggers a customised function that does more complicated things. Here it deals with the three different ways in which lipid weight measurements can be submitted.
Finally, normalise_biota_HELCOM() is a customised function that determines which measurements are normalised to 5% lipid in a HELCOM assessment. Again, the normalisation functions are under active development and might well change before the next release.
One thing that is not obvious from the function call is the choice of basis. By default, contaminant time series in biota are assessed on a wet weight basis, with measurements submitted on a dry or lipid weight basis transformed to a wet weight basis using supporting DRYWT% and LIPIDWT% information. Look at the OSPAR or external data examples to see how the choice of basis can be changed.
biota_timeseries <- create_timeseries(
biota_data,
determinands.control = list(
PFOS = list(det = c("N-PFOS", "BR-PFOS"), action = "sum"),
SBDE6 = list(
det = c("BDE28", "BDE47", "BDE99", "BD100", "BD153", "BD154"),
action = "sum"
),
HBCD = list(det = c("HBCDA", "HBCDB", "HBCDG"), action = "sum"),
CB138 = list(det = "CB138+163", action = "replace"),
SCB6 = list(
det = c("CB28", "CB52", "CB101", "CB138", "CB153", "CB180"),
action = "sum"
),
TEQDFP = list(
det = names(info_TEF$DFP_HOLAS3),
action = "sum",
weights = info_TEF$DFP_HOLAS3
),
"LIPIDWT%" = list(det = c("EXLIP%", "FATWT%"), action = "bespoke")
),
normalise = normalise_biota_HELCOM,
normalise.control = list(
lipid = list(method = "simple", value = 5),
other = list(method = "none")
)
)
#>
#> Oddities will be written to 'oddities/biota' with previous oddities backed up to
#> 'oddities/biota_backup'
#>
#> Cleaning data
#> Dropping stations with no data between 2015 and 2020
#> Unexpected or missing values for 'species_group': see species_group_queries.csv
#> Unexpected or missing values for 'matrix': see matrix_queries.csv
#> Unexpected or missing values for 'basis': see basis_queries.csv
#> Bile metabolite units changed from 'ng/g' to 'ng/ml' and from 'ug/kg' to 'ug/l'
#> Unexpected or missing values for 'unit': see unit_queries.csv
#> Unexpected or missing values for 'value': see value_queries.csv
#> Replicate measurements, only first retained: see replicate_measurements.csv
#> Non-positive detection limits: see non_positive_det_limits.csv
#> Non-positive quantification limits: see non_positive_quant_limits.csv
#> Censoring codes D and Q inconsistent with respective limits: see censoring_codes_inconsistent.csv
#> Detection limit higher than data: see detection_limit_high.csv
#> Implausible uncertainties reported with data: see implausible_uncertainties_reported.csv
#> Data submitted as BDE28, BDE47, BDE99, BD100, BD153, BD154 summed to give
#> SBDE6
#> 257 of 497 samples lost due to incomplete submissions
#> Data submitted as HBCDA, HBCDB, HBCDG summed to give HBCD
#> 5 of 42 samples lost due to incomplete submissions
#> Data submitted as CB138+163 relabelled as CB138
#> Data submitted as CB28, CB52, CB101, CB138, CB153, CB180 summed to give SCB6
#> 22 of 925 samples lost due to incomplete submissions
#> Data submitted as
#> CB77, CB81, CB105, CB118, CB126, CB156, CB157, CB167, CB169, CDD1N, CDD4X, CDD6P, CDD6X, CDD9X, CDDO, TCDD, CDF2N, CDF2T, CDF4X, CDF6P, CDF6X, CDF9P, CDF9X, CDFO, CDFP2, CDFX1
#> summed to give TEQDFP
#> 907 of 945 samples lost due to incomplete submissions
#> Data submitted as EXLIP% or FATWT% relabelled as LIPIDWT%
#>
#> Creating time series data
#> Converting data to appropriate basis for statistical analysis
#> Losing 63 out of 5737 records in basis conversion due to missing, censored
#> or zero drywt or lipidwt values.
#> Normalising lipid to 5%
#> No normalisation for other
#> Dropping groups of compounds / stations with no data between 2015 and 2020
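Again, get_timeseries() gives a quick view of how the biota time series have been grouped (for example by species and tissue type), though the exact columns may differ from the water example:
get_timeseries(biota_timeseries) |> head(10)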
The assessment took about 4.3 minutes on my laptop.
biota_assessment <- run_assessment(
biota_timeseries,
AC = c("BAC", "EAC", "EQS", "MPC"),
parallel = TRUE
)
#> Setting up clusters: be patient, it can take a while!
One time series has not converged. (The parameter estimates are fine, but the standard errors are implausibly tight - if you look at the data, you will see why the routines struggle with this time series.) The code below tweaks the arguments of the numerical differencing routine that calculates the standard errors. Dealing with non-converged time series is a topic for a future vignette.
check_assessment(biota_assessment)
#> The following assessment models have not converged:
#> 2299 PYR1OH Limanda limanda BI HPLC-FD
biota_assessment <- update_assessment(
biota_assessment,
subset = series == "2299 PYR1OH Limanda limanda BI HPLC-FD",
hess.d = 0.0001, hess.r = 8
)
#>
#> assessing series: station_code 2299; determinand PYR1OH; species Limanda limanda; matrix BI; method_analysis HPLC-FD; unit ng/ml
check_assessment(biota_assessment)
#> All assessment models have converged
And that’s it :)
write_summary_table(
biota_assessment,
determinandGroups = list(
levels = c(
"Metals", "PAH_parent", "Metabolites", "PBDEs", "Organobromines",
"Organofluorines", "Chlorobiphenyls", "Dioxins"
),
labels = c(
"Metals", "PAH compounds and metabolites", "PAH compounds and metabolites",
"Organobromines", "Organobromines", "Organofluorines",
"PCBs and dioxins", "PCBs and dioxins"
)
),
symbology = list(
colour = list(
below = c("EQS" = "green"),
above = c("EQS" = "red"),
none = "black"
)
),
collapse_AC = list(EAC = c("EAC", "EQS", "MPC")),
output_dir = summary.dir
)