iRfcb Tutorial
Getting Started
Installation
You can install the package from GitHub using the devtools package:
# install.packages("devtools")
devtools::install_github("EuropeanIFCBGroup/iRfcb",
dependencies = TRUE)
Some functions in iRfcb require Python to be installed (see the sections below). You can download Python from the official website: python.org/downloads.
Load the iRfcb library:
library(iRfcb)
Download Sample Data
To get started, download sample data from the SMHI IFCB Plankton Image Reference Library (Torstensson et al. 2024) with the following function:
# Define data directory
data_dir <- "data"
# Download and extract test data in the data folder
ifcb_download_test_data(dest_dir = data_dir,
max_retries = 10,
sleep_time = 30)
## Download and extraction complete.
Extract Timestamps and Sample Volumes
Extract Timestamps from IFCB Sample Filenames
Extract timestamps from sample names or filenames:
# Example sample names
filenames <- c("D20230314T001205_IFCB134",
"D20230615T123045_IFCB135.roi")
# Convert filenames to timestamps
timestamps <- ifcb_convert_filenames(filenames)
# Print result
print(timestamps)
## sample timestamp date year month day
## 1 D20230314T001205_IFCB134 2023-03-14 00:12:05 2023-03-14 2023 3 14
## 2 D20230615T123045_IFCB135 2023-06-15 12:30:45 2023-06-15 2023 6 15
## time ifcb_number
## 1 00:12:05 IFCB134
## 2 12:30:45 IFCB135
With ROI numbers:
# Example sample names
filenames <- c("D20230314T001205_IFCB134_00023.png",
"D20230615T123045_IFCB135")
# Convert filenames to timestamps
timestamps <- ifcb_convert_filenames(filenames)
# Print result
print(timestamps)
## sample timestamp date year month day
## 1 D20230314T001205_IFCB134 2023-03-14 00:12:05 2023-03-14 2023 3 14
## 2 D20230615T123045_IFCB135 2023-06-15 12:30:45 2023-06-15 2023 6 15
## time ifcb_number roi
## 1 00:12:05 IFCB134 23
## 2 12:30:45 IFCB135 NA
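The parsing above relies on the IFCB naming pattern `DYYYYMMDDTHHMMSS_IFCBnnn`, optionally followed by a ROI number and a file extension. As a rough base-R sketch of that idea (the helper `parse_ifcb_name` is hypothetical and not part of iRfcb, and the UTC time zone is an assumption), the pieces can be split out like this:

```r
# Hypothetical helper, not the iRfcb implementation:
# split "DYYYYMMDDTHHMMSS_IFCBnnn[_ROI][.ext]" into its components
parse_ifcb_name <- function(x) {
  x <- sub("\\.[A-Za-z]+$", "", basename(x))  # drop a file extension, if present
  parts <- strsplit(x, "_", fixed = TRUE)
  data.frame(
    sample = vapply(parts, function(p) paste(p[1:2], collapse = "_"), character(1)),
    # Time zone is assumed to be UTC in this sketch
    timestamp = as.POSIXct(vapply(parts, `[`, character(1), 1),
                           format = "D%Y%m%dT%H%M%S", tz = "UTC"),
    ifcb_number = vapply(parts, `[`, character(1), 2),
    # Third element is the ROI number; NA when the name has no ROI part
    roi = as.integer(vapply(parts, `[`, character(1), 3)),
    stringsAsFactors = FALSE
  )
}

res <- parse_ifcb_name(c("D20230314T001205_IFCB134_00023.png",
                         "D20230615T123045_IFCB135"))
print(res)
```

For real work, `ifcb_convert_filenames` remains the function to use; the sketch only illustrates where each output column comes from.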
Get Volume Analyzed in ml
Get the volume analyzed from header/adc files:
# Path to HDR file
hdr_file <- "data/data/2023/D20230314/D20230314T001205_IFCB134.hdr"
# Calculate volume analyzed (in ml)
volume_analyzed <- ifcb_volume_analyzed(hdr_file)
# Print result
print(volume_analyzed)
## [1] 4.568676
Get Sample Runtime
Get the runtime from a header file:
# Get runtime from HDR-file
run_time <- ifcb_get_runtime(hdr_file)
# Print result
print(run_time)
## $runtime
## [1] 1200.853
##
## $inhibittime
## [1] 104.3704
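The runtime and inhibit time relate to the analyzed volume: the instrument only samples while it is not inhibited. A minimal sketch of that arithmetic, assuming a constant syringe flow rate of 0.25 ml per minute (an assumption here, not a value stated in this tutorial), would be:

```r
# Assumed constant flow rate in ml per minute (not taken from this tutorial)
flow_rate_ml_min <- 0.25

# Values printed by ifcb_get_runtime() above, in seconds
runtime <- 1200.853
inhibittime <- 104.3704

# Effective sampling time (minutes) multiplied by the flow rate
volume_ml <- flow_rate_ml_min * (runtime - inhibittime) / 60
round(volume_ml, 2)
```

Under that assumption the result is about 4.57 ml, in line with the output of `ifcb_volume_analyzed` for the same sample. The package function reads these values from the files directly and may handle edge cases differently.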
Extract .PNG Images from ROI
Extract all images from a sample:
# All ROIs in sample
ifcb_extract_pngs("data/data/2023/D20230314/D20230314T001205_IFCB134.roi")
## Writing 1218 ROIs from D20230314T001205_IFCB134.roi to data/data/2023/D20230314/D20230314T001205_IFCB134
Extract specific ROIs:
# Only ROI number 2 and 5
ifcb_extract_pngs("data/data/2023/D20230314/D20230314T003836_IFCB134.roi",
ROInumbers = c(2, 5))
## Writing 2 ROIs from D20230314T003836_IFCB134.roi to data/data/2023/D20230314/D20230314T003836_IFCB134
To extract annotated images from MATLAB files, please see Use MATLAB Annotated Files.
To extract classified results from MATLAB files, please see Classified Results from MATLAB.
PSD QC/QA
Particle Size Distribution
IFCB data can be quality controlled by analyzing the particle size distribution (PSD) (Hayashi et al. in prep). iRfcb uses the code available at https://github.com/kudelalab/PSD.
Before running the PSD quality check, ensure the necessary Python environment is set up and activated:
# Define path to virtual environment
env_path <- "~/.virtualenvs/iRfcb" # Or your preferred venv path
# Install python virtual environment
ifcb_py_install(envname = env_path)
# Run PSD quality control
psd <- ifcb_psd(feature_folder = "data/features/2023",
hdr_folder = "data/data/2023",
save_data = FALSE,
output_file = NULL,
plot_folder = NULL,
use_marker = FALSE,
start_fit = 10,
r_sqr = 0.5,
beads = 10 ** 12,
bubbles = 150,
incomplete = c(1500, 3),
missing_cells = 0.7,
biomass = 1000,
bloom = 5,
humidity = 70)
# Print output from PSD
head(psd$fits)
## # A tibble: 5 × 8
## sample a k R.2 max_ESD_diff capture_percent bead_run humidity
## <chr> <dbl> <dbl> <dbl> <int> <dbl> <lgl> <dbl>
## 1 D20230314T… 5.90e 5 -1.88 0.713 3 0.955 FALSE 16.0
## 2 D20230314T… 2.51e 5 -1.60 0.702 3 0.944 FALSE 16.0
## 3 D20230810T… 3.36e 7 -2.73 0.955 4 0.919 FALSE 65.4
## 4 D20230915T… 1.32e10 -5.54 0.989 2 0.967 FALSE 71.5
## 5 D20230915T… 4.39e10 -6.03 0.981 3 0.961 FALSE 71.5
head(psd$flags)
## # A tibble: 2 × 2
## sample flag
## <chr> <chr>
## 1 D20230915T091133 High Humidity
## 2 D20230915T093804 High Humidity
# Plot PSD of the first sample
plot <- ifcb_psd_plot(sample_name = psd$data$sample[1],
data = psd$data,
fits = psd$fits,
start_fit = 10)
# Print the plot
print(plot)
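The `a`, `k` and `R.2` columns in `psd$fits` look like the parameters and goodness of fit of a power-law curve, N(d) = a * d^k. The sketch below fits such a curve to synthetic data with a log-log linear regression in base R. It is an illustration only: the PSD code may use a different fitting method, and starting the diameters at 10 simply mirrors `start_fit = 10` on the assumption that this argument sets the lower bound of the fit.

```r
set.seed(1)

# Synthetic particle size distribution (not IFCB data)
esd <- 10:60            # equivalent spherical diameter bins (um)
a_true <- 5e5
k_true <- -2
counts <- a_true * esd^k_true * exp(rnorm(length(esd), sd = 0.05))

# A power law is a straight line in log-log space: log(N) = log(a) + k * log(d)
fit <- lm(log(counts) ~ log(esd))
a_hat <- unname(exp(coef(fit)[1]))
k_hat <- unname(coef(fit)[2])
r2 <- summary(fit)$r.squared

c(a = a_hat, k = k_hat, R2 = r2)
```

A low R² in such a fit indicates that the sample deviates from the expected size distribution, which is the kind of signal the `r_sqr` threshold is meant to catch.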
Geographical QC/QA
Check if IFCB is Near Land
To determine whether the IFCB is near land (e.g. in a harbor), examine the position data in the .hdr files (or supply vectors of latitudes and longitudes):
# Read HDR data and extract GPS position (when available)
gps_data <- ifcb_read_hdr_data("data/data/",
gps_only = TRUE)
## Found 9 .hdr files.
## Processing completed.
# Create new column with the results
gps_data$near_land <- ifcb_is_near_land(gps_data$gpsLatitude,
gps_data$gpsLongitude,
distance = 100, # 100 meters from shore
shape = NULL) # Using the default NE 1:50m Land Polygon
# Print output
head(gps_data)
## sample gpsLatitude gpsLongitude timestamp
## 1 D20220522T000439_IFCB134 NA NA 2022-05-22 00:04:39
## 2 D20220522T003051_IFCB134 NA NA 2022-05-22 00:30:51
## 3 D20220712T210855_IFCB134 NA NA 2022-07-12 21:08:55
## 4 D20220712T222710_IFCB134 NA NA 2022-07-12 22:27:10
## 5 D20230314T001205_IFCB134 56.66883 12.11303 2023-03-14 00:12:05
## 6 D20230314T003836_IFCB134 56.66884 12.11302 2023-03-14 00:38:36
## date year month day time ifcb_number near_land
## 1 2022-05-22 2022 5 22 00:04:39 IFCB134 NA
## 2 2022-05-22 2022 5 22 00:30:51 IFCB134 NA
## 3 2022-07-12 2022 7 12 21:08:55 IFCB134 NA
## 4 2022-07-12 2022 7 12 22:27:10 IFCB134 NA
## 5 2023-03-14 2023 3 14 00:12:05 IFCB134 FALSE
## 6 2023-03-14 2023 3 14 00:38:36 IFCB134 FALSE
For more accurate determination, a detailed coastline .shp file may be required (e.g. the EEA Coastline Polygon). Refer to the help pages of ifcb_is_near_land for further information.
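A typical follow-up is to drop samples flagged as near land while keeping track of samples without a position. The sketch below uses a small mock data frame (the third sample name is made up for illustration) rather than real `ifcb_is_near_land` output:

```r
# Mock result; the last row is a hypothetical sample taken in a harbor
gps_example <- data.frame(
  sample = c("D20220522T000439_IFCB134",
             "D20230314T001205_IFCB134",
             "hypothetical_harbor_sample"),
  near_land = c(NA, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep samples with a known position that are not near land
offshore <- gps_example[!is.na(gps_example$near_land) & !gps_example$near_land, ]

# Samples without a position need separate handling (NA is neither TRUE nor FALSE)
unknown <- gps_example[is.na(gps_example$near_land), ]

offshore$sample
unknown$sample
```

Handling `NA` explicitly matters here, because subsetting with a logical vector that contains `NA` would otherwise produce rows of missing values.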
Check which sub-basin an IFCB sample is from
To identify the sub-basin of the Baltic Sea (or the region in a custom shapefile) from which an Imaging FlowCytobot (IFCB) sample was collected, analyze the position data:
# Define example latitude and longitude vectors
latitudes <- c(55.337, 54.729, 56.311, 57.975)
longitudes <- c(12.674, 14.643, 12.237, 10.637)
# Check which Baltic Sea basin each point is in
points_in_the_baltic <- ifcb_which_basin(latitudes,
longitudes,
shape_file = NULL)
# Print output
print(points_in_the_baltic)
## [1] "13 - Arkona Basin" "12 - Bornholm Basin" "16 - Kattegat"
## [4] "17 - Skagerrak"
# Plot the points and the basins
ifcb_which_basin(latitudes,
longitudes,
plot = TRUE,
shape_file = NULL)
This function reads a pre-packaged shapefile of the Baltic Sea, Kattegat, and Skagerrak basins from the ‘iRfcb’ package by default, or a user-supplied shapefile if provided. The shapefiles provided in ‘iRfcb’ originate from SHARK.
Check whether the positions are within the Baltic Sea or elsewhere
This check is useful if you want to apply a classifier only to phytoplankton from the Baltic Sea.
# Define example latitude and longitude vectors
latitudes <- c(55.337, 54.729, 56.311, 57.975)
longitudes <- c(12.674, 14.643, 12.237, 10.637)
# Check if the points are in the Baltic Sea Basin
points_in_the_baltic <- ifcb_is_in_basin(latitudes, longitudes)
# Print results
print(points_in_the_baltic)
## [1] TRUE TRUE FALSE FALSE
# Plot the points and the basin
ifcb_is_in_basin(latitudes, longitudes, plot = TRUE)
This function reads a land-buffered shapefile of the Baltic Sea Basin from the ‘iRfcb’ package by default, or a user-supplied shapefile if provided.
Find missing positions from RV Svea Ferrybox
This function is used by SMHI to collect and match stored ferrybox positions when they are not available in the .hdr files. An example ferrybox data file is provided in iRfcb with data matching D20220522T000439_IFCB134.
# Define path where ferrybox data are located
ferrybox_folder <- "data/ferrybox_data"
# Get GPS position from ferrybox data
positions <- ifcb_get_ferrybox_data(gps_data$timestamp,
ferrybox_folder)
# Print result
head(positions)
## # A tibble: 6 × 3
## timestamp gpsLatitude gpsLongitude
## <dttm> <dbl> <dbl>
## 1 2022-05-22 00:04:39 55.0 13.6
## 2 2022-05-22 00:30:51 NA NA
## 3 2022-07-12 21:08:55 NA NA
## 4 2022-07-12 22:27:10 NA NA
## 5 2023-03-14 00:12:05 NA NA
## 6 2023-03-14 00:38:36 NA NA
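The ferrybox positions are meant to fill gaps where the .hdr files have no GPS data. One way to combine the two sources, sketched here in base R with small mock data frames (the values are copied from the outputs above, and the UTC time zone is an assumption), is to join on timestamp and fall back to the ferrybox value only where the .hdr value is missing:

```r
# Mock positions from the .hdr files (first sample lacks GPS)
hdr_pos <- data.frame(
  timestamp = as.POSIXct(c("2022-05-22 00:04:39", "2023-03-14 00:12:05"), tz = "UTC"),
  gpsLatitude = c(NA, 56.66883),
  gpsLongitude = c(NA, 12.11303)
)

# Mock positions from the ferrybox data (only the first sample matches)
fb_pos <- data.frame(
  timestamp = hdr_pos$timestamp,
  gpsLatitude = c(55.0, NA),
  gpsLongitude = c(13.6, NA)
)

# Join on timestamp; ferrybox columns get the suffix "_fb"
merged <- merge(hdr_pos, fb_pos, by = "timestamp", suffixes = c("", "_fb"))

# Use the .hdr position when present, otherwise the ferrybox position
merged$gpsLatitude <- ifelse(is.na(merged$gpsLatitude),
                             merged$gpsLatitude_fb, merged$gpsLatitude)
merged$gpsLongitude <- ifelse(is.na(merged$gpsLongitude),
                              merged$gpsLongitude_fb, merged$gpsLongitude)
merged <- merged[, c("timestamp", "gpsLatitude", "gpsLongitude")]

print(merged)
```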
Find contextual ferrybox data from RV Svea
The ifcb_get_ferrybox_data function can also be used to extract additional ferrybox parameters, such as temperature (parameter number 8180) and salinity (parameter number 8181).
# Get salinity and temperature from ferrybox data
ferrybox_data <- ifcb_get_ferrybox_data(gps_data$timestamp,
ferrybox_folder,
parameters = c("8180", "8181"))
# Print result
head(ferrybox_data)
## # A tibble: 6 × 3
## timestamp `8180` `8181`
## <dttm> <dbl> <dbl>
## 1 2022-05-22 00:04:39 11.4 7.86
## 2 2022-05-22 00:30:51 NA NA
## 3 2022-07-12 21:08:55 NA NA
## 4 2022-07-12 22:27:10 NA NA
## 5 2023-03-14 00:12:05 NA NA
## 6 2023-03-14 00:38:36 NA NA
Use MATLAB Annotated Files
PNG Directory
Summarize counts of annotated images at the sample and class levels. The ‘hdr_folder’ can be included to add GPS positions to the sample data frame:
# Summarise counts on sample level
png_per_sample <- ifcb_summarize_png_counts(png_folder = "data/png",
hdr_folder = "data/data",
sum_level = "sample")
head(png_per_sample)
## # A tibble: 6 × 13
## # Groups: sample, ifcb_number [3]
## sample ifcb_number class_name n_images roi_numbers gpsLatitude gpsLongitude
## <chr> <chr> <chr> <int> <chr> <dbl> <dbl>
## 1 D2022052… IFCB134 Ciliophora 1 5 NA NA
## 2 D2022052… IFCB134 Mesodiniu… 4 2, 6, 7, 8 NA NA
## 3 D2022052… IFCB134 Strombidi… 1 3 NA NA
## 4 D2022052… IFCB134 Mesodiniu… 2 2, 3 NA NA
## 5 D2022071… IFCB134 Alexandri… 2 42, 164 NA NA
## 6 D2022071… IFCB134 Strombidi… 2 34, 79 NA NA
## # ℹ 6 more variables: timestamp <dttm>, date <date>, year <dbl>, month <dbl>,
## # day <int>, time <chr>
# Summarise counts on class level
png_per_class <- ifcb_summarize_png_counts(png_folder = "data/png",
sum_level = "class")
# Print output
head(png_per_class)
## # A tibble: 6 × 2
## class_name n_images
## <chr> <int>
## 1 Alexandrium_pseudogonyaulax 3
## 2 Amphidnium-like 1
## 3 Chaetoceros_spp_chain 6
## 4 Chaetoceros_spp_single_cell 3
## 5 Ciliophora 23
## 6 Cryptomonadales 245
MATLAB Files
Count the annotations in the MATLAB files, similar to ifcb_summarize_png_counts:
# Summarize counts from MATLAB files
mat_count <- ifcb_count_mat_annotations(manual_files = "data/manual",
class2use_file = "data/config/class2use.mat",
skip_class = "unclassified", # Or class ID
sum_level = "class") # Or per "sample"
# Print output
head(mat_count)
## # A tibble: 6 × 2
## class n
## <chr> <int>
## 1 Alexandrium_pseudogonyaulax 3
## 2 Amphidnium-like 1
## 3 Chaetoceros_spp_chain 6
## 4 Chaetoceros_spp_single_cell 3
## 5 Ciliophora 23
## 6 Cryptomonadales 245
Run Image Gallery
To visually inspect and correct annotations, run the image gallery.
# Run Shiny app
ifcb_run_image_gallery()
Individual images can be selected and a list of selected images can be downloaded as a ‘correction_file’. This file can be used to correct .mat annotations below using the ifcb_correct_annotation function.
Correct .mat Files After Checking Images in the App
After reviewing images in the gallery, correct the .mat files using the ‘correction file’ with selected images:
# Get class2use
class_name <- ifcb_get_mat_names("data/config/class2use.mat")
class2use <- ifcb_get_mat_variable("data/config/class2use.mat",
variable_name = class_name)
# Find the class id of unclassified
unclassified_id <- which(grepl("unclassified",
class2use))
# Initialize the python session if not already set up
# env_path <- "~/.virtualenvs/iRfcb"
# ifcb_py_install(envname = env_path)
# Correct the annotation with the output from the image gallery
ifcb_correct_annotation(manual_folder = "data/manual",
out_folder = "data/manual",
correction = "data/manual/correction/Alexandrium_pseudogonyaulax_selected_images.txt",
correct_classid = unclassified_id)
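One detail worth keeping in mind when looking up class ids: `grepl` matches substrings, so a pattern that occurs in several class names returns several ids. The sketch below uses a made-up three-element class list (not the real `class2use.mat` content) to contrast partial matching with an exact lookup via `match`:

```r
# Made-up class list for illustration only
class2use_example <- c("unclassified",
                       "Chaetoceros_spp_chain",
                       "Chaetoceros_spp_single_cell")

# Partial matching: the pattern occurs in two class names, so two ids are returned
partial_ids <- which(grepl("Chaetoceros_spp", class2use_example))

# Exact matching: at most one id is returned
exact_id <- match("Chaetoceros_spp_chain", class2use_example)

partial_ids
exact_id
```

When a function expects a single class id, checking that the lookup returned exactly one value (for example with `stopifnot(length(id) == 1)`) avoids reassigning images to an unintended class.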
Replace Specific Class Annotations
Replace all instances of a specific class with “unclassified” (class id 1):
# Get class2use
class_name <- ifcb_get_mat_names("data/config/class2use.mat")
class2use <- ifcb_get_mat_variable("data/config/class2use.mat",
variable_name = class_name)
# Find the class id of Alexandrium_pseudogonyaulax
ap_id <- which(grepl("Alexandrium_pseudogonyaulax",
class2use))
# Find the class id of unclassified
unclassified_id <- which(grepl("unclassified",
class2use))
# Initialize the python session if not already set up
# env_path <- "~/.virtualenvs/iRfcb"
# ifcb_py_install(envname = env_path)
# Move all Alexandrium_pseudogonyaulax images to unclassified
ifcb_replace_mat_values(manual_folder = "data/manual",
out_folder = "data/manual",
target_id = ap_id,
new_id = unclassified_id)
Extract Annotated Images
Extract annotated images, skipping the “unclassified” (class id 1) category:
# Extract .png images
ifcb_extract_annotated_images(manual_folder = "data/manual",
class2use_file = "data/config/class2use.mat",
roi_folder = "data/data",
out_folder = "data/extracted_images",
skip_class = 1, # or "unclassified"
verbose = FALSE)
Verify Correction
Verify that the corrections have been applied:
# Summarize new counts after correction
png_per_class <- ifcb_summarize_png_counts(png_folder = "data/extracted_images",
sum_level = "class")
# Print output
head(png_per_class)
## # A tibble: 6 × 2
## class_name n_images
## <chr> <int>
## 1 Amphidnium-like 1
## 2 Chaetoceros_spp_chain 6
## 3 Chaetoceros_spp_single_cell 3
## 4 Ciliophora 23
## 5 Cryptomonadales 245
## 6 Cylindrotheca_Nitzschia_longissima 47
Annotate Images
Images can be batch annotated using the ifcb_annotate_batch function. If a manual file already exists for the sample, the ROI class list will be updated accordingly. If no file is found, a new .mat file will be created, with all unannotated ROIs marked as unclassified.
# Read a file with selected images, generated by the image gallery app
correction <- read.table("data/manual/correction/Alexandrium_pseudogonyaulax_selected_images.txt",
header = TRUE)
# Print image names to be annotated
print(correction$image_filename)
## [1] "D20220712T210855_IFCB134_00164.png" "D20220712T222710_IFCB134_00044.png"
# Re-annotate the images that were moved to unclassified earlier in the tutorial
ifcb_annotate_batch(png_images = correction$image_filename,
class = "Alexandrium_pseudogonyaulax",
manual_folder = "data/manual",
adc_folder = "data/data",
class2use_file = "data/config/class2use.mat")
# Summarize new counts after re-annotation
mat_count <- ifcb_count_mat_annotations(manual_files = "data/manual",
class2use_file = "data/config/class2use.mat",
skip_class = "unclassified",
sum_level = "class")
# Print output and check if Alexandrium pseudogonyaulax is back
head(mat_count)
## # A tibble: 6 × 2
## class n
## <chr> <int>
## 1 Alexandrium_pseudogonyaulax 2
## 2 Amphidnium-like 1
## 3 Chaetoceros_spp_chain 6
## 4 Chaetoceros_spp_single_cell 3
## 5 Ciliophora 23
## 6 Cryptomonadales 245
Merge Manual Datasets
Datasets that have been manually annotated using the MATLAB code from the ifcb-analysis repository (Sosik and Olson 2007) can be merged using the ifcb_merge_manual function. This is a wrapper function of the ifcb_create_class2use, ifcb_replace_mat_values and ifcb_adjust_classes functions.
In this example, two datasets from the Swedish west coast are downloaded from the SMHI IFCB Plankton Image Reference Library (version 3) (Torstensson et al. 2024) and combined into a single dataset. Please note that these datasets are large, and the downloading and merging processes may take considerable time.
# Define data directories
skagerrak_kattegat_dir <- "data_skagerrak_kattegat"
tangesund_dir <- "data_tangesund"
merged_dir <- "data_skagerrak_kattegat_tangesund_merged"
# Download and extract Skagerrak-Kattegat data in the data folder
ifcb_download_test_data(dest_dir = skagerrak_kattegat_dir,
figshare_article = "48158725")
# Download and extract Tångesund data in the data folder
ifcb_download_test_data(dest_dir = tangesund_dir,
figshare_article = "48158731")
# Initialize the python session if not already set up
# env_path <- "~/.virtualenvs/iRfcb"
# ifcb_py_install(envname = env_path)
# Merge Skagerrak-Kattegat and Tångesund to a single dataset
ifcb_merge_manual(class2use_file_base = file.path(skagerrak_kattegat_dir, "config/class2use.mat"),
class2use_file_additions = file.path(tangesund_dir, "config/class2use.mat"),
class2use_file_output = file.path(merged_dir, "config/class2use.mat"),
manual_folder_base = file.path(skagerrak_kattegat_dir, "manual"),
manual_folder_additions = file.path(tangesund_dir, "manual"),
manual_folder_output = file.path(merged_dir, "manual")
)
Prepare Annotated Images for Publication
Summarize Image Metadata
This function gathers feature and hdr data for every image in the png_folder.
# Summarize image metadata from feature and hdr files
image_metadata <- ifcb_summarize_png_metadata(png_folder = "data/extracted_images",
feature_folder = "data/features",
hdr_folder = "data/data")
# Print the first ten columns of output
head(image_metadata)[1:10]
## image subfolder
## 1 D20230915T093804_IFCB134_02133.png Amphidnium-like_051
## 2 D20230810T113059_IFCB134_00952.png Chaetoceros_spp_chain_018
## 3 D20230810T113059_IFCB134_02303.png Chaetoceros_spp_chain_018
## 4 D20230915T091133_IFCB134_00057.png Chaetoceros_spp_chain_018
## 5 D20230915T093804_IFCB134_00507.png Chaetoceros_spp_chain_018
## 6 D20230915T093804_IFCB134_00689.png Chaetoceros_spp_chain_018
## sample timestamp date year month day
## 1 D20230915T093804_IFCB134 2023-09-15 09:38:04 2023-09-15 2023 9 15
## 2 D20230810T113059_IFCB134 2023-08-10 11:30:59 2023-08-10 2023 8 10
## 3 D20230810T113059_IFCB134 2023-08-10 11:30:59 2023-08-10 2023 8 10
## 4 D20230915T091133_IFCB134 2023-09-15 09:11:33 2023-09-15 2023 9 15
## 5 D20230915T093804_IFCB134 2023-09-15 09:38:04 2023-09-15 2023 9 15
## 6 D20230915T093804_IFCB134 2023-09-15 09:38:04 2023-09-15 2023 9 15
## time ifcb_number
## 1 09:38:04 IFCB134
## 2 11:30:59 IFCB134
## 3 11:30:59 IFCB134
## 4 09:11:33 IFCB134
## 5 09:38:04 IFCB134
## 6 09:38:04 IFCB134
The output can be mapped with the headers in ifcb_get_ecotaxa_example to produce metadata files suitable for submitting images to EcoTaxa.
PNG Directory
Prepare the PNG directory for publication as a zip-archive, similar to the files in the SMHI IFCB Plankton Image Reference Library (Torstensson et al. 2024):
# Create zip-archive
ifcb_zip_pngs(png_folder = "data/extracted_images",
zip_filename = "data/zip/ifcb_annotated_images_corrected.zip",
readme_file = system.file("exdata/README-template.md",
package = "iRfcb"), # Template included in `iRfcb`
email_address = "tutorial@test.com",
version = "1.1",
print_progress = FALSE)
## Creating README file...
## Creating MANIFEST.txt...
## Creating zip archive...
## Zip archive created successfully: /home/runner/work/iRfcb/iRfcb/vignettes/data/zip/ifcb_annotated_images_corrected.zip
MATLAB Directory
Prepare the MATLAB directory for publication as a zip-archive, similar to the files in the SMHI IFCB Plankton Image Reference Library:
# Create zip-archive
ifcb_zip_matlab(manual_folder = "data/manual",
features_folder = "data/features",
class2use_file = "data/config/class2use.mat",
zip_filename = "data/zip/ifcb_matlab_files_corrected.zip",
data_folder = "data/data",
readme_file = system.file("exdata/README-template.md",
package = "iRfcb"), # Template included in `iRfcb`
matlab_readme_file = system.file("exdata/MATLAB-template.md",
package = "iRfcb"), # Template included in `iRfcb`
email_address = "tutorial@test.com",
version = "1.1",
print_progress = FALSE)
## Listing all files...
## Copying manual files...
## Copying feature files...
## Copying data files...
## Copying class2use file...
## Creating README file...
## Creating MANIFEST.txt...
## Creating zip archive...
## Zip archive created successfully: /home/runner/work/iRfcb/iRfcb/vignettes/data/zip/ifcb_matlab_files_corrected.zip
Create MANIFEST.txt
Create a manifest file for the zip packages:
# Create MANIFEST.txt of the zip folder content
ifcb_create_manifest("data/zip")
## MANIFEST.txt has been created at data/zip/MANIFEST.txt
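Conceptually, a manifest is a listing of the files in a folder together with some basic metadata. The base-R sketch below builds such a listing for a temporary folder. It is only an approximation of the idea: the exact columns and layout that `ifcb_create_manifest` writes to MANIFEST.txt are not shown in this tutorial and may differ.

```r
# Create a temporary folder with one placeholder file (illustration only)
zip_dir <- file.path(tempdir(), "zip_example")
dir.create(zip_dir, showWarnings = FALSE)
writeLines("placeholder", file.path(zip_dir, "example.txt"))

# List all files below the folder and record their sizes in bytes
files <- list.files(zip_dir, recursive = TRUE)
manifest <- data.frame(
  file = files,
  size_bytes = file.size(file.path(zip_dir, files)),
  stringsAsFactors = FALSE
)

print(manifest)
```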
Classified Results from MATLAB
Extract Classified Results from a Sample
Extract classified results from a sample:
# Extract all classified images from a sample
ifcb_extract_classified_images(sample = "D20230810T113059_IFCB134",
classified_folder = "data/classified",
roi_folder = "data/data",
out_folder = "data/classified_images",
taxa = "All", # or specify a particular taxon
threshold = "opt") # or specify another threshold
## Writing 2747 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Heterocapsa_rotundata
## Writing 519 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Cryptomonadales
## Writing 464 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Dino_smaller_than_30unidentified
## Writing 511 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/unclassified
## Writing 6 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Ciliates
## Writing 245 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Leptocylindrus_danicus_minimus
## Writing 114 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Leptocylindrus_danicus
## Writing 66 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Cylindrotheca_Nitzschia_longissima
## Writing 23 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Chaetoceros_chain
## Writing 6 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Dino_larger_than_30unidentified
## Writing 23 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Prorocentrum_micans
## Writing 51 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Scrippsiella_group
## Writing 2 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Tripos_lineatus
## Writing 1 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Cerataulina_pelagica
## Writing 6 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Gymnodiniales_smaller_than_30
## Writing 3 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Chaetoceros_single_cell
## Writing 5 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Skeletonema_marinoi
## Writing 1 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Enisiculifera_carinata
## Writing 2 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Thalassiosira_gravida
## Writing 2 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Pseudo-nitzschia_spp
## Writing 1 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Octactis_speculum
## Writing 3 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Guinardia_delicatula
## Writing 1 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Thalassiosira_nordenskioeldii
Read feature data
Read all feature files (.csv) from a folder:
# Read feature files from a folder
features <- ifcb_read_features("data/features/2023/")
# Print output of first 10 columns from the first sample in the list
head(features[[1]])[,1:10]
## roi_number Area Biovolume BoundingBox_xwidth BoundingBox_ywidth ConvexArea
## 1 2 446 6082.909 31 21 542
## 2 3 4326 142783.030 111 63 5186
## 3 4 9739 336908.323 202 129 10581
## 4 5 580 9186.802 27 28 602
## 5 6 3927 120366.981 99 50 4191
## 6 7 290 3111.748 22 20 335
## ConvexPerimeter Eccentricity EquivDiameter Extent
## 1 87.24196 0.6006111 23.82991 0.6850998
## 2 291.42030 0.8980639 74.21613 0.6186186
## 3 505.83898 0.9753657 111.35565 0.3737432
## 4 88.58696 0.3299815 27.17497 0.7671958
## 5 265.49548 0.9016151 70.71076 0.7933333
## 6 67.86613 0.3332706 19.21560 0.6590909
# Read only multiblob feature files
multiblob_features <- ifcb_read_features("data/features/2023",
multiblob = TRUE)
# Print output of first 10 columns from the first sample in the list
head(multiblob_features[[1]])[,1:10]
## roi_number blob_number Area MajorAxisLength MinorAxisLength Eccentricity
## 1 154 1 3647 109.93092 45.00010 0.9123779
## 2 154 2 1626 77.53922 30.74631 0.9180235
## 3 214 1 7456 232.11148 122.61037 0.8490956
## 4 214 2 4840 101.68493 68.30606 0.7407850
## 5 214 3 910 54.18655 28.51088 0.8503847
## 6 214 4 153 18.95031 10.93057 0.8168844
## Orientation ConvexArea EquivDiameter Solidity
## 1 11.28171 4205 68.14327 0.8673008
## 2 26.71876 2495 45.50041 0.6517034
## 3 30.89332 23666 97.43343 0.3150511
## 4 -35.88789 6955 78.50146 0.6959022
## 5 27.00911 1551 34.03892 0.5867182
## 6 48.78767 188 13.95728 0.8138298
Read a Summary File
Read a summary file:
# Read a MATLAB summary file generated by `countcells_allTBnew_user_training`
summary_data <- ifcb_read_summary("data/classified/2023/summary/summary_allTB_2023.mat",
biovolume = FALSE,
threshold = "opt")
# Print output
head(summary_data)
## # A tibble: 6 × 12
## sample timestamp date year month day time ifcb_number
## <chr> <dttm> <date> <dbl> <dbl> <int> <time> <chr>
## 1 D202308… 2023-08-10 11:30:59 2023-08-10 2023 8 10 11:30:59 IFCB134
## 2 D202308… 2023-08-10 11:30:59 2023-08-10 2023 8 10 11:30:59 IFCB134
## 3 D202308… 2023-08-10 11:30:59 2023-08-10 2023 8 10 11:30:59 IFCB134
## 4 D202308… 2023-08-10 11:30:59 2023-08-10 2023 8 10 11:30:59 IFCB134
## 5 D202308… 2023-08-10 11:30:59 2023-08-10 2023 8 10 11:30:59 IFCB134
## 6 D202308… 2023-08-10 11:30:59 2023-08-10 2023 8 10 11:30:59 IFCB134
## # ℹ 4 more variables: ml_analyzed <dbl>, species <chr>, counts <dbl>,
## # counts_per_liter <dbl>
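The `counts_per_liter` column follows from `counts` and `ml_analyzed` by scaling the count to one liter. A short worked example with illustrative numbers (not read from the summary file above):

```r
# Illustrative values, not taken from summary_data
counts <- 519        # number of cells counted for a class
ml_analyzed <- 3.17  # volume analyzed in ml

# Scale from the analyzed volume to one liter (1000 ml)
counts_per_liter <- counts / ml_analyzed * 1000
round(counts_per_liter)
```

Because printed volumes are rounded, values recomputed by hand from a printed table will differ slightly from those stored in the data frame.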
Summarize counts, biovolumes and carbon content from classified IFCB data
This function calculates aggregated biovolumes and carbon content from Imaging FlowCytobot (IFCB) samples based on feature and MATLAB classification result files, without summarizing the data in MATLAB. Biovolumes are converted to carbon according to Menden-Deuer and Lessard (2000) for individual regions of interest (ROIs), where different conversion factors are applied to diatoms and non-diatom protists. If HDR files are provided, the function also uses the sample volume to compute biovolume and carbon content per liter of sample. See details in the help pages for ifcb_summarize_biovolumes and ifcb_extract_biovolumes.
# Summarize biovolume data using IFCB data from classified data folder
biovolume_data <- ifcb_summarize_biovolumes(feature_folder = "data/features/2023",
mat_folder = "data/classified",
hdr_folder = "data/data/2023",
micron_factor = 1/3.4,
diatom_class = "Bacillariophyceae",
threshold = "opt")
## INFO: The following classes are considered NOT diatoms for carbon calculations:
## Ciliates
## Cryptomonadales
## Dino_larger_than_30unidentified
## Dino_smaller_than_30unidentified
## Enisiculifera_carinata
## Gymnodiniales_smaller_than_30
## Heterocapsa_rotundata
## Octactis_speculum
## Prorocentrum_micans
## Scrippsiella_group
## Tripos_lineatus
## unclassified
# Print output
head(biovolume_data)
## # A tibble: 6 × 10
## sample classifier class counts biovolume_mm3 carbon_ug ml_analyzed
## <chr> <chr> <chr> <int> <dbl> <dbl> <dbl>
## 1 D20230810T113059_… "Z:\\data… Cera… 1 0.00000175 0.0000839 3.17
## 2 D20230810T113059_… "Z:\\data… Chae… 23 0.0000176 0.000901 3.17
## 3 D20230810T113059_… "Z:\\data… Chae… 3 0.00000118 0.0000674 3.17
## 4 D20230810T113059_… "Z:\\data… Cili… 6 0.0000117 0.00159 3.17
## 5 D20230810T113059_… "Z:\\data… Cryp… 519 0.0000971 0.0151 3.17
## 6 D20230810T113059_… "Z:\\data… Cyli… 66 0.0000168 0.00101 3.17
## # ℹ 3 more variables: counts_per_liter <dbl>, biovolume_mm3_per_liter <dbl>,
## # carbon_ug_per_liter <dbl>
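To make the carbon conversion more concrete, the sketch below applies log-log volume-to-carbon relationships of the form log10(C) = log10(a) + b * log10(V), with C in pg carbon per cell and V in cubic micrometers. The coefficients are quoted from memory of Menden-Deuer and Lessard (2000) and should be checked against the paper and the package source before use. The function names carry a `_sketch` suffix to make clear they are not the package internals. The pixel-to-micron step assumes the feature biovolume is given in cubic pixels, and the biovolume value is hypothetical.

```r
# Non-diatom protists (coefficients as recalled from Menden-Deuer and Lessard 2000)
vol2C_nondiatom_sketch <- function(volume_um3) {
  10^(-0.665 + 0.939 * log10(volume_um3))  # pg C per cell
}

# Large diatoms (coefficients as recalled from the same paper)
vol2C_lgdiatom_sketch <- function(volume_um3) {
  10^(-0.933 + 0.881 * log10(volume_um3))  # pg C per cell
}

# Convert a biovolume from cubic pixels to cubic micrometers
# (assumes the feature value is in pixels^3; 76640 is a hypothetical value)
micron_factor <- 1 / 3.4
biovolume_px3 <- 76640
volume_um3 <- biovolume_px3 * micron_factor^3

# Carbon per cell in pg, then in ug
carbon_pg <- vol2C_nondiatom_sketch(volume_um3)
carbon_ug <- carbon_pg / 1e6

c(volume_um3 = volume_um3, carbon_pg = carbon_pg, carbon_ug = carbon_ug)
```

For the same cell volume the diatom relationship yields less carbon than the non-diatom one, which is why the diatom assignment reported in the INFO message affects the carbon totals.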
Summarize counts, biovolumes and carbon content from manually annotated IFCB data
The ifcb_summarize_biovolumes function can also be used to calculate aggregated biovolumes and carbon content from manually annotated Imaging FlowCytobot (IFCB) image data. See details in the help pages for ifcb_summarize_biovolumes, ifcb_extract_biovolumes and ifcb_count_mat_annotations.
# Summarize biovolume data using IFCB data from manual data folder
manual_biovolume_data <- ifcb_summarize_biovolumes(feature_folder = "data/features",
mat_folder = "data/manual",
class2use_file = "data/config/class2use.mat",
hdr_folder = "data/data",
micron_factor = 1/3.4,
diatom_class = "Bacillariophyceae")
## INFO: The following classes are considered NOT diatoms for carbon calculations:
## Alexandrium_pseudogonyaulax
## Amphidnium-like
## Ciliophora
## Cryptomonadales
## Dinobryon_spp
## Dinoflagellate_larger_than_30unidentified
## Dinoflagellate_smaller_than_30unidentified
## Dinophysis_acuminata
## Enisiculifera_carinata
## Gonyaulax_spp
## Gyrodinium_spirale
## Heterocapsa_Azadinium
## Heterocapsa_rotundata
## Karenia_mikimotoi
## Katodinium-like
## Mesodinium_rubrum
## Octactis_speculum
## Prorocentrum_micans
## Prorocentrum_triestinum
## Protoperidinium_spp
## Scrippsiella_group
## Strombidium-like
## Torodinium_robustum
## Tripos_furca
## Tripos_lineatus
## unclassified
# Print output
head(manual_biovolume_data)
## # A tibble: 6 × 10
## sample classifier class counts biovolume_mm3 carbon_ug ml_analyzed
## <chr> <lgl> <chr> <int> <dbl> <dbl> <dbl>
## 1 D20220522T000439_… NA Cili… 1 0.00000327 0.000432 4.86
## 2 D20220522T000439_… NA Meso… 4 0.0000274 0.00344 4.86
## 3 D20220522T000439_… NA Stro… 1 0.00000386 0.000504 4.86
## 4 D20220522T000439_… NA uncl… 1 0.00000288 0.000384 4.86
## 5 D20220522T003051_… NA Meso… 2 0.0000122 0.00155 2.98
## 6 D20220712T210855_… NA Alex… 1 0.0000160 0.00191 4.91
## # ℹ 3 more variables: counts_per_liter <dbl>, biovolume_mm3_per_liter <dbl>,
## # carbon_ug_per_liter <dbl>
Taxonomical Data
Check whether a class name is a diatom
This function takes a list of taxon names, cleans them, retrieves their corresponding classification records from the World Register of Marine Species (WoRMS), and checks whether they belong to the specified diatom class. Only the first word (the genus name) of each taxon name is used for classification. This function can be useful for converting biovolumes to carbon according to Menden-Deuer and Lessard (2000). See iRfcb:::vol2C_nondiatom and iRfcb:::vol2C_lgdiatom for carbon calculations (not included in NAMESPACE).
# Read class2use file
class2use <- ifcb_get_mat_variable("data/config/class2use.mat")
# Create a dataframe with class name and result from `ifcb_is_diatom`
class_list <- data.frame(class2use,
is_diatom = ifcb_is_diatom(class2use))
# Print rows 10-15 of result
class_list[10:15,]
## class2use is_diatom
## 10 Nodularia_spumigena FALSE
## 11 Cryptomonadales FALSE
## 12 Acanthoica_quattrospina FALSE
## 13 Asterionellopsis_glacialis TRUE
## 14 Centrales TRUE
## 15 Centrales_chain TRUE
The default class for diatoms is defined as Bacillariophyceae, but may be adjusted using the diatom_class argument.
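Since only the genus name is used for the WoRMS lookup, it can help to see what that name looks like for typical class names. The sketch below approximates the step by keeping everything before the first underscore or space. The actual cleaning performed by `ifcb_is_diatom` may involve more rules, so treat this as an assumption:

```r
# Example class names as they appear in a class2use list
class_names <- c("Chaetoceros_spp_chain",
                 "Nodularia_spumigena",
                 "Centrales_chain")

# Keep the first word: drop everything from the first underscore or space onward
genus <- sub("[_ ].*$", "", class_names)
genus
```

Class names that start with a higher taxon (such as `Centrales_chain`) still resolve to a name that can be looked up, which matches the `TRUE` results shown for those rows above.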
Find trophic type of plankton taxa
This function takes a list of taxa names and matches them with the SMHI Trophic Type list used in SHARK.
# Example taxa names
taxa_list <- c("Acanthoceras zachariasii",
"Nodularia spumigena",
"Acanthoica quattrospina",
"Noctiluca",
"Gymnodiniales")
# Get trophic type for taxa
trophic_type <- ifcb_get_trophic_type(taxa_list)
# Print result
print(trophic_type)
## [1] "AU" "AU" "MX" "HT" "NS"
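The returned codes are short abbreviations. The lookup below shows one way to attach readable labels. The label text is an assumption based on the usual meaning of these abbreviations (AU autotroph, HT heterotroph, MX mixotroph, NS not specified) and should be checked against the SMHI Trophic Type list:

```r
# Assumed meaning of the codes; verify against the SMHI Trophic Type list
trophic_labels <- c(AU = "autotroph",
                    HT = "heterotroph",
                    MX = "mixotroph",
                    NS = "not specified")

# Codes as returned in the example above
codes <- c("AU", "AU", "MX", "HT", "NS")

# Translate codes to labels with a named-vector lookup
labels <- unname(trophic_labels[codes])
labels
```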
SHARK export
This function is used by SMHI to map IFCB data into the SHARK standard data delivery format. An example submission is also provided in iRfcb.
# Get column names from example
shark_colnames <- ifcb_get_shark_colnames()
# Print column names
print(shark_colnames)
## [1] MYEAR STATN SAMPLING_PLATFORM
## [4] PROJ ORDERER SHIPC
## [7] CRUISE_NO DATE_TIME SDATE
## [10] STIME TIMEZONE LATIT
## [13] LONGI POSYS WADEP
## [16] MPROG MNDEP MXDEP
## [19] SLABO ACKR_SMP SMTYP
## [22] PDMET SMVOL METFP
## [25] IFCBNO SMPNO LATNM
## [28] SFLAG LATNM_SFLAG TRPHY
## [31] APHIA_ID IMAGE_VERIFICATION VERIFIED_BY
## [34] COUNT ABUND BIOVOL
## [37] C_CONC QFLAG COEFF
## [40] CLASS_NAME CLASS_F1 UNCLASSIFIED_COUNTS
## [43] UNCLASSIFIED_ABUNDANCE UNCLASSIFIED_VOLUME METOA
## [46] ASSOCIATED_MEDIA CLASSPROG ALABO
## [49] ACKR_ANA ANADATE METDC
## [52] TRAINING_SET CLASSIFIER_USED MANUAL_QC_DATE
## [55] PRE_FILTER_SIZE PH_FB CHL_FB
## [58] CDOM_FB PHYC_FB PHER_FB
## [61] WATERFLOW_FB TURB_FB PCO2_FB
## [64] TEMP_FB PSAL_FB OSAT_FB
## [67] DOXY_FB
## <0 rows> (or 0-length row.names)
# Load example stored from `iRfcb`
shark_example <- ifcb_get_shark_example()
# Print first ten columns of the SHARK data submission example
head(shark_example)[1:10]
## MYEAR STATN SAMPLING_PLATFORM PROJ ORDERER
## 1 2022 RV_FB_D20220713T175838 IFCB IFCB, DTO, JERICO SMHI
## 2 2022 RV_FB_D20220713T175838 IFCB IFCB, DTO, JERICO SMHI
## 3 2022 RV_FB_D20220713T175838 IFCB IFCB, DTO, JERICO SMHI
## 4 2022 RV_FB_D20220713T175838 IFCB IFCB, DTO, JERICO SMHI
## 5 2022 RV_FB_D20220713T175838 SveaFB IFCB, DTO, JERICO SMHI
## SHIPC CRUISE_NO DATE_TIME SDATE STIME
## 1 77SE 12 2,02E+13 2022-07-13 17:58:38
## 2 77SE 12 2,02E+13 2022-07-13 17:58:38
## 3 77SE 12 2,02E+13 2022-07-13 17:58:38
## 4 77SE 12 2,02E+13 2022-07-13 17:58:38
## 5 77SE 12 2,02E+13 2022-07-13 17:58:38
This concludes the tutorial for the iRfcb package. For more detailed information, refer to the package documentation. See how data pipelines can be constructed using iRfcb in the following Example Project. Happy analyzing!
Citation
## To cite package 'iRfcb' in publications use:
##
## Anders Torstensson (2024). I 'R' FlowCytobot (iRfcb): Tools for
## Analyzing and Processing Data from the IFCB. R package version
## 0.3.15. https://doi.org/10.5281/zenodo.12533225
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {I 'R' FlowCytobot (iRfcb): Tools for Analyzing and Processing Data from the IFCB},
## author = {Anders Torstensson},
## year = {2024},
## note = {R package version 0.3.15},
## url = {https://doi.org/10.5281/zenodo.12533225},
## }
References
- Hayashi, K., Walton, J., Lie, A., Smith, J. and Kudela, M. Using particle size distribution (PSD) to automate imaging flow cytobot (IFCB) data quality in coastal California, USA. In prep.
- Menden-Deuer, S. and Lessard, E. J. (2000) Carbon to volume relationships for dinoflagellates, diatoms, and other protist plankton. Limnol. Oceanogr. 45(3), 569–579. doi: 10.4319/lo.2000.45.3.0569.
- Sosik, H. M. and Olson, R. J. (2007) Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
- Torstensson, A., Skjevik, A-T., Mohlin, M., Karlberg, M. and Karlson, B. (2024). SMHI IFCB Plankton Image Reference Library. SciLifeLab. Dataset. https://doi.org/10.17044/scilifelab.25883455.v3