Skip to contents

This is a brief example of using the harsat with external data.

First, load the harsat library

We set up our main directory to find the data files. If you use your own data files, you will need to point to a directory containing a copy. We usually do this by putting our data files in a directory data, and information files in a directory information, but you can use any directory for these.

working.directory <- 'C:/Users/robfr/Documents/HARSAT/HARSAT'

Read data

Contaminant data with supporting variables and station dictionary

biota_data <- read_data(
  compartment = "biota",
  purpose = "AMAP",
  contaminants = "EXTERNAL_FO_PW_DATA.csv",   # NB, replace biota data filename above, as appropriate 
  stations = "EXTERNAL_AMAP_STATIONS.csv",    # NB, replace station data filename above, as appropriate 
  data_dir = file.path(working.directory, "data", "example_external_data"),
  data_format = "external",
  info_dir = file.path(working.directory, "information", "AMAP"),
)
#> Found in path determinand.csv C:\Users\robfr\Documents\HARSAT\HARSAT\information\AMAP\determinand.csv 
#> Found in path species.csv C:\Users\robfr\Documents\HARSAT\HARSAT\information\AMAP\species.csv 
#> Found in path thresholds_biota.csv C:\Users\robfr\Documents\HARSAT\HARSAT\information\AMAP\thresholds_biota.csv 
#> Found in package method_extraction.csv C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/method_extraction.csv 
#> Found in package pivot_values.csv C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/pivot_values.csv 
#> Found in package matrix.csv C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/matrix.csv 
#> Found in package imposex.csv C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/imposex.csv 
#> MD5 digest for: 'C:\Users\robfr\Documents\HARSAT\HARSAT\information\AMAP\determinand.csv': '80bca84d428856c93e89c52aebf8b144'
#> MD5 digest for: 'C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/matrix.csv': '4b4fb3814bb84cfbf9b37f7b59d45eb9'
#> MD5 digest for: 'C:\Users\robfr\Documents\HARSAT\HARSAT\information\AMAP\species.csv': '1aba5ace8155923ed18bd9d0b414e48e'
#> MD5 digest for: 'C:/Users/robfr/Documents/HARSAT/HARSAT/inst/information/imposex.csv': 'b602a882d4783085c896bcf130c8f848'
#> MD5 digest for: 'C:\Users\robfr\Documents\HARSAT\HARSAT\information\AMAP\thresholds_biota.csv': 'a6d82623b8968910b59c1308a646e8a8'
#> Reading station dictionary from:
#>  'C:/Users/robfr/Documents/HARSAT/HARSAT/data/example_external_data/EXTERNAL_AMAP_STATIONS.csv'
#> MD5 digest for: 'C:/Users/robfr/Documents/HARSAT/HARSAT/data/example_external_data/EXTERNAL_AMAP_STATIONS.csv': '91e7eb7661ce43b02c68cc81153ac3d7'
#> 
#> Reading contaminant and effects data from:
#>  'C:/Users/robfr/Documents/HARSAT/HARSAT/data/example_external_data/EXTERNAL_FO_PW_DATA.csv'
#> MD5 digest for: 'C:/Users/robfr/Documents/HARSAT/HARSAT/data/example_external_data/EXTERNAL_FO_PW_DATA.csv': '9ade644c5bdb561f0c3bf4a3560a2c15'
#> 
#> Argument max_year taken to be the maximum year in the data: 2020

Note: in the above function, replace ‘control = list()’ with ‘control = list(use_stage = TRUE)’ to include subseries in ICES format data runs; for ‘external’ format runs, if the input data file includes a column named ‘subseries’, the data in this column will be used to define subseries, to exclude subseries runs this column can be omitted or renamed e.g. to ‘nosubseries’

Prepare data for next stage

Get correct variables and streamline the data files

biota_data <- tidy_data(biota_data)
#> 
#> Oddities will be written to 'oddities/biota' with previous oddities backed up to
#>  'oddities/biota_backup'
#> 
#> Dropping 54 stations that are not associated with any data
#> 
#> Cleaning station dictionary
#> 
#> Cleaning contaminant and biological effects data

Construct timeseries

For each timeseries, use the basis which is reported most often in the data

biota_timeseries <- create_timeseries(
  biota_data,
  determinands = ctsm_get_determinands(biota_data$info),
  determinands.control = NULL,
  oddity_path = "oddities",
  return_early = FALSE,
  print_code_warnings = FALSE,
  get_basis = get_basis_most_common,
  normalise = FALSE,
  normalise.control = list()
)
#> 
#> Oddities will be written to 'oddities/biota' with previous oddities backed up to
#>  'oddities/biota_backup'
#> 
#> Cleaning data
#>    Dropping stations with no data between 2015 and 2020
#>    Replicate measurements, only first retained: see replicate_measurements.csv
#> 
#> Creating time series data
#>    Converting data to appropriate basis for statistical analysis
#>    Missing uncertainties which cannot be imputed: deleted data in 'missing_uncertainties.csv
#>    Dropping groups of compounds / stations with no data between 2015 and 2020

Assessment

Main runs

Note: the control argument specifies that the post-hoc power metrics will be based on a power of 80% and an annual percentage increase in concentration of 10%.

biota_assessment <- run_assessment(
  biota_timeseries,
  subset = NULL,
  AC = NULL,
  get_AC_fn = NULL,
  recent_trend = 20L,
  parallel = FALSE, 
  extra_data = NULL,
  control = list(power = list(target_power = 80, target_trend = 10)) 
)
#> 
#> assessing series:  station_code A902; determinand BD100; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand BD153; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand BD154; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand BD183; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand BDE28; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand BDE47; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand BDE66; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand BDE85; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand BDE99; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand CB101; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand CB105; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand CB118; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand CB128; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand CB138; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand CB153; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand CB156; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand CB170; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand CB180; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand CB183; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand CB187; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand CB28; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand CB52; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand CCDAN; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand CD; species Globicephala melas; matrix KI; subseries AD; basis W; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand CD; species Globicephala melas; matrix LI; subseries AD; basis W; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand CD; species Globicephala melas; matrix LI; subseries JV; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand CNONC; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand DDEOP; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand DDEPP; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand DDTOP; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand DDTPP; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand HBCDA; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand HBCDB; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand HBCDG; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand HCHB; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand HG; species Globicephala melas; matrix LI; subseries AD; basis W; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand HG; species Globicephala melas; matrix LI; subseries JV; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand HG; species Globicephala melas; matrix MU; subseries AD; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand HG; species Globicephala melas; matrix MU; subseries JV; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand OCDAN; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand PFDA; species Globicephala melas; matrix BB; subseries JV; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand PFDA; species Globicephala melas; matrix LI; subseries JV; basis W; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand PFDOA; species Globicephala melas; matrix BB; subseries JV; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand PFDOA; species Globicephala melas; matrix LI; subseries JV; basis W; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand PFDS; species Globicephala melas; matrix BB; subseries JV; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand PFHXS; species Globicephala melas; matrix BB; subseries JV; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand PFHXS; species Globicephala melas; matrix LI; subseries JV; basis W; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand PFNA; species Globicephala melas; matrix BB; subseries JV; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand PFNA; species Globicephala melas; matrix LI; subseries JV; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand PFOA; species Globicephala melas; matrix BB; subseries JV; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand PFOA; species Globicephala melas; matrix LI; subseries JV; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand PFOS; species Globicephala melas; matrix BB; subseries JV; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand PFOS; species Globicephala melas; matrix LI; subseries JV; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand SE; species Globicephala melas; matrix LI; subseries AD; basis W; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand SE; species Globicephala melas; matrix LI; subseries JV; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand SE; species Globicephala melas; matrix MU; subseries AD; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand SE; species Globicephala melas; matrix MU; subseries JV; basis W; unit ug/kg 
#> 
#> assessing series:  station_code A902; determinand TCDAN; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand TDEOP; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand TDEPP; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> 
#> assessing series:  station_code A902; determinand TNONC; species Globicephala melas; matrix BB; subseries JV; basis L; unit ug/kg
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')

Use the code below if it takes a long time to run

biota_assessment <- run_assessment(
  biota_timeseries,
  subset = NULL,
  AC = NULL,
  get_AC_fn = NULL,
  recent_trend = 20L,
  parallel = TRUE,
  extra_data = NULL,
  control = list(power = list(target_power = 80, target_trend = 10)) 
)

Check convergence

check_assessment(biota_assessment, save_result = FALSE)
#> All assessment models have converged

Summary files

This writes the summary data to a file in output/example_external_data. The argument extra_output = "power" ensures that the power metrics for lognormally distributed data will be exported.

summary.dir <- file.path(working.directory, "output", "example_external_data")

if (!dir.exists(summary.dir)) {
  dir.create(summary.dir, recursive = TRUE)
} 

write_summary_table(
  biota_assessment,
  output_file = "biota-FO-PW-test-output.csv",   # NB, file will be overwritten so change name as appropriate to retain results
  output_dir = summary.dir,
  export = TRUE,
  determinandGroups = NULL,
  symbology = NULL,
  collapse_AC = NULL, 
  extra_output = "power"
)

Graphics output

Plots the fitted trend with pointwise 90% confidence bands and either the data (file_type = "data") or the annual index (file_type = "index") or plots the raw data with key auxiliary variables (file_type = "auxiliary"). The default is to plot all three.

The plot function writes output to output_dir directory (must exist); with these settings all plots are output as .png formatted graphics; function can be omitted to avoid graphical output; changing “png” to “pdf” in above statement will output graphics in PDF (vector) format.

Can subset assessment based on variables in either timeSeries or stations components of object: commonly by determinand, matrix, species, station_code or station_name; can also use the series identifier in row.names(timeSeries) if subset is NULL (default), all timeseries are plotted (can take some time)

Graphics plots are written to files in output/graphics.

graphics.dir <- file.path(working.directory, "output", "graphics")

if (!dir.exists(graphics.dir)) {
  dir.create(graphics.dir, recursive = TRUE)
} 

plot_assessment(
  biota_assessment,
  subset = NULL ,
  output_dir = graphics.dir,
  file_type = c("data", "index", "auxiliary"),
  file_format = "png",
  auxiliary = "default"
)

The above code will write a summary file for the run concerned. Checksums for the four runs described in this example are as follows: