Introduction
This vignette demonstrates how to process and refine annotated and
automatically classified Imaging FlowCytobot (IFCB) data in R using the
iRfcb package. The workflow assumes that MATLAB-based preprocessing has
already been conducted using the ifcb-analysis repository (Sosik and
Olson 2007). This preprocessing includes generating .mat files for
annotated and classified images.
With iRfcb, you can further analyze and manage IFCB data, for example by
summarizing annotations and classification results and by refining
annotations.
Getting Started
Installation
You can install the package from GitHub using the devtools package:
# install.packages("devtools")
devtools::install_github("EuropeanIFCBGroup/iRfcb",
                         dependencies = TRUE)
Some functions from the iRfcb package used in this tutorial require
Python to be installed. You can download Python from the official
website: python.org/downloads.
Load the iRfcb library:
library(iRfcb)
Download Sample Data
To get started, download sample data from the SMHI IFCB Plankton Image Reference Library (Torstensson et al. 2024) with the following function:
# Define data directory
data_dir <- "data"
# Download and extract test data in the data folder
ifcb_download_test_data(dest_dir = data_dir,
                        max_retries = 10,
                        sleep_time = 30,
                        verbose = FALSE)
Classified Results from MATLAB
The iRfcb package facilitates the processing and analysis of data
classified using a random forest algorithm from the ifcb-analysis
repository. This workflow supports various tasks such as extracting
classified results, reading summary files, and calculating biovolume and
carbon content.
This section provides an overview of key functions available in iRfcb
for handling classified IFCB data. Step-by-step examples are included to
guide users through extracting results, summarizing data, and leveraging
functionalities for both automated and manually annotated datasets.
Extract Classified Images from a Sample
To begin working with classified data, you can extract all classified images from a specific sample. This is especially useful for isolating ROIs based on specific taxa or classification thresholds.
# Extract all classified images from a sample
ifcb_extract_classified_images(sample = "D20230810T113059_IFCB134",
                               classified_folder = "data/classified",
                               roi_folder = "data/data",
                               out_folder = "data/classified_images",
                               taxa = "All", # or specify a particular taxon
                               threshold = "opt") # or specify another threshold
## Writing 2747 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Heterocapsa_rotundata
## Writing 519 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Cryptomonadales
## Writing 464 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Dino_smaller_than_30unidentified
## Writing 511 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/unclassified
## Writing 6 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Ciliates
## Writing 245 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Leptocylindrus_danicus_minimus
## Writing 114 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Leptocylindrus_danicus
## Writing 66 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Cylindrotheca_Nitzschia_longissima
## Writing 23 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Chaetoceros_chain
## Writing 6 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Dino_larger_than_30unidentified
## Writing 23 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Prorocentrum_micans
## Writing 51 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Scrippsiella_group
## Writing 2 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Tripos_lineatus
## Writing 1 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Cerataulina_pelagica
## Writing 6 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Gymnodiniales_smaller_than_30
## Writing 3 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Chaetoceros_single_cell
## Writing 5 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Skeletonema_marinoi
## Writing 1 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Enisiculifera_carinata
## Writing 2 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Thalassiosira_gravida
## Writing 2 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Pseudo-nitzschia_spp
## Writing 1 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Octactis_speculum
## Writing 3 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Guinardia_delicatula
## Writing 1 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Thalassiosira_nordenskioeldii
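To check the extraction against the log above, the written images can be tallied per class folder. The following is a minimal base R sketch and not part of iRfcb. It assumes that the function writes one subfolder per class (as the log paths suggest) and that images are saved with a .png extension; the helper name count_pngs_per_class is hypothetical.

```r
# Hypothetical helper (not an iRfcb function): count PNG files in each
# class subfolder of an output directory.
count_pngs_per_class <- function(out_folder) {
  # One subfolder per class is assumed
  class_dirs <- list.dirs(out_folder, full.names = TRUE, recursive = FALSE)
  n_images <- vapply(class_dirs, function(d) {
    length(list.files(d, pattern = "\\.png$", ignore.case = TRUE))
  }, integer(1))
  result <- data.frame(class = basename(class_dirs),
                       n_images = unname(n_images),
                       stringsAsFactors = FALSE)
  # Most abundant classes first
  result[order(-result$n_images), ]
}

# Usage: only runs if the folder from the example above exists
if (dir.exists("data/classified_images")) {
  print(head(count_pngs_per_class("data/classified_images")))
}
```

The per-class totals should agree with the "Writing N ROIs" messages if the assumptions about folder layout and file extension hold.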
Read a Summary File
Summary files generated by the MATLAB function
countcells_allTBnew_user_training provide aggregated classified data.
Use the following function to read and process these files.
# Read a MATLAB summary file generated by `countcells_allTBnew_user_training`
summary_data <- ifcb_read_summary("data/classified/2023/summary/summary_allTB_2023.mat",
                                  biovolume = FALSE,
                                  threshold = "opt")
# Print output
head(summary_data)
## # A tibble: 6 × 12
## sample timestamp date year month day time ifcb_number
## <chr> <dttm> <date> <dbl> <dbl> <int> <time> <chr>
## 1 D202308… 2023-08-10 11:30:59 2023-08-10 2023 8 10 11:30:59 IFCB134
## 2 D202308… 2023-08-10 11:30:59 2023-08-10 2023 8 10 11:30:59 IFCB134
## 3 D202308… 2023-08-10 11:30:59 2023-08-10 2023 8 10 11:30:59 IFCB134
## 4 D202308… 2023-08-10 11:30:59 2023-08-10 2023 8 10 11:30:59 IFCB134
## 5 D202308… 2023-08-10 11:30:59 2023-08-10 2023 8 10 11:30:59 IFCB134
## 6 D202308… 2023-08-10 11:30:59 2023-08-10 2023 8 10 11:30:59 IFCB134
## # ℹ 4 more variables: ml_analyzed <dbl>, species <chr>, counts <dbl>,
## # counts_per_liter <dbl>
Alternatively, iRfcb can directly aggregate data and compute carbon
content from classification files using the ifcb_summarize_biovolumes
function demonstrated below.
Summarize Counts, Biovolumes and Carbon Content from Classified IFCB Data
This function calculates aggregated biovolumes and carbon content
from IFCB samples based on feature and MATLAB classification result
files, without summarizing the data in MATLAB. Biovolumes are converted
to carbon for individual ROIs according to Menden-Deuer and Lessard
(2000), with different conversion factors applied to diatoms and
non-diatom protists. If provided, sample volume data from .hdr files are
also used to compute biovolume and carbon content per liter of sample.
See details in the help pages for ifcb_summarize_biovolumes and
ifcb_extract_biovolumes.
# Summarize biovolume data using IFCB data from classified data folder
biovolume_data <- ifcb_summarize_biovolumes(feature_folder = "data/features/2023",
                                            mat_folder = "data/classified",
                                            hdr_folder = "data/data/2023",
                                            micron_factor = 1/3.4,
                                            diatom_class = "Bacillariophyceae",
                                            threshold = "opt",
                                            verbose = FALSE) # Do not print progress bars
# Print output
head(biovolume_data)
## # A tibble: 6 × 10
## sample classifier class counts biovolume_mm3 carbon_ug ml_analyzed
## <chr> <chr> <chr> <int> <dbl> <dbl> <dbl>
## 1 D20230810T113059_… "Z:\\data… Cera… 1 0.00000175 0.0000839 3.17
## 2 D20230810T113059_… "Z:\\data… Chae… 23 0.0000176 0.000901 3.17
## 3 D20230810T113059_… "Z:\\data… Chae… 3 0.00000118 0.0000674 3.17
## 4 D20230810T113059_… "Z:\\data… Cili… 6 0.0000117 0.00159 3.17
## 5 D20230810T113059_… "Z:\\data… Cryp… 519 0.0000971 0.0151 3.17
## 6 D20230810T113059_… "Z:\\data… Cyli… 66 0.0000168 0.00101 3.17
## # ℹ 3 more variables: counts_per_liter <dbl>, biovolume_mm3_per_liter <dbl>,
## # carbon_ug_per_liter <dbl>
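The carbon conversion described above can be illustrated with a small base R sketch. It is not the package's implementation: the function names are hypothetical, and the coefficients are the log-log regression values commonly quoted from Menden-Deuer and Lessard (2000) for non-diatom protists and for diatoms larger than 3000 µm³. Which regression iRfcb applies to diatoms is an assumption here, so the help page for ifcb_extract_biovolumes remains the authoritative reference. The sketch also assumes that feature files store biovolume in cubic pixels, so that micron_factor cubed converts it to µm³.

```r
# Hypothetical helpers illustrating the conversion; not iRfcb functions.

# Assumption: biovolume is stored in cubic pixels and micron_factor is
# the size of one pixel in micrometers.
pixels3_to_um3 <- function(biovolume_px3, micron_factor = 1 / 3.4) {
  biovolume_px3 * micron_factor^3
}

# Carbon per cell (pg) from cell volume (cubic micrometers) using
# log10(C) = log_a + b * log10(V), with separate coefficients for
# diatoms and non-diatom protists (assumed values, see lead-in).
volume_to_carbon_pg <- function(volume_um3, is_diatom) {
  log_a <- ifelse(is_diatom, -0.933, -0.665)
  b <- ifelse(is_diatom, 0.881, 0.939)
  10^(log_a + b * log10(volume_um3))
}

# Example: one non-diatom cell and one diatom cell
volume_um3 <- c(1950, 1750)
is_diatom <- c(FALSE, TRUE)
carbon_pg <- volume_to_carbon_pg(volume_um3, is_diatom)
carbon_ug <- carbon_pg * 1e-6 # picograms to micrograms
print(data.frame(volume_um3, is_diatom, carbon_pg, carbon_ug))
```

With these coefficients, a mean ciliate volume of about 1950 µm³ (0.0000117 mm³ over 6 ROIs in the output above) gives roughly 266 pg C per cell, in line with the 0.00159 µg reported for that row.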
Summarize Counts, Biovolumes and Carbon Content from Manually Annotated IFCB Data
The ifcb_summarize_biovolumes function can also be used to calculate
aggregated biovolumes and carbon content from manually annotated IFCB
image data. See details in the help pages for ifcb_summarize_biovolumes,
ifcb_extract_biovolumes and ifcb_count_mat_annotations.
# Summarize biovolume data using IFCB data from manual data folder
manual_biovolume_data <- ifcb_summarize_biovolumes(feature_folder = "data/features",
                                                   mat_folder = "data/manual",
                                                   class2use_file = "data/config/class2use.mat",
                                                   hdr_folder = "data/data",
                                                   micron_factor = 1/3.4,
                                                   diatom_class = "Bacillariophyceae",
                                                   verbose = FALSE) # Do not print progress bars
# Print output
head(manual_biovolume_data)
## # A tibble: 6 × 10
## sample classifier class counts biovolume_mm3 carbon_ug ml_analyzed
## <chr> <lgl> <chr> <int> <dbl> <dbl> <dbl>
## 1 D20220522T000439_… NA Cili… 1 0.00000327 0.000432 4.86
## 2 D20220522T000439_… NA Meso… 4 0.0000274 0.00344 4.86
## 3 D20220522T000439_… NA Stro… 1 0.00000386 0.000504 4.86
## 4 D20220522T000439_… NA uncl… 1 0.00000288 0.000384 4.86
## 5 D20220522T003051_… NA Meso… 2 0.0000122 0.00155 2.98
## 6 D20220712T210855_… NA Alex… 2 0.0000476 0.00555 4.91
## # ℹ 3 more variables: counts_per_liter <dbl>, biovolume_mm3_per_liter <dbl>,
## # carbon_ug_per_liter <dbl>
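The per-liter columns in both summaries follow from scaling each sample total by the analyzed volume. Below is a minimal sketch of that arithmetic, assuming ml_analyzed is the analyzed volume in milliliters. The helper name per_liter is hypothetical, and the package may compute these values differently internally.

```r
# Hypothetical helper: scale a per-sample total to a per-liter value,
# given the analyzed sample volume in milliliters.
per_liter <- function(value, ml_analyzed) {
  value / ml_analyzed * 1000
}

# Two rows taken from the classified output shown earlier
example <- data.frame(class = c("Ciliates", "Cryptomonadales"),
                      counts = c(6, 519),
                      carbon_ug = c(0.00159, 0.0151),
                      ml_analyzed = 3.17)

example$counts_per_liter <- per_liter(example$counts, example$ml_analyzed)
example$carbon_ug_per_liter <- per_liter(example$carbon_ug, example$ml_analyzed)
print(example)
```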
Manually Annotated Data from MATLAB
Count and Summarize Annotated Image Data
PNG Directory
Summarize counts of annotated images at the sample and class levels.
The hdr_folder can be included to add GPS positions to the sample data
frame:
# Summarize counts on sample level
png_per_sample <- ifcb_summarize_png_counts(png_folder = "data/png",
                                            hdr_folder = "data/data",
                                            sum_level = "sample")
head(png_per_sample)
## # A tibble: 6 × 13
## # Groups: sample, ifcb_number [3]
## sample ifcb_number class_name n_images roi_numbers gpsLatitude gpsLongitude
## <chr> <chr> <chr> <int> <chr> <dbl> <dbl>
## 1 D2022052… IFCB134 Ciliophora 1 5 NA NA
## 2 D2022052… IFCB134 Mesodiniu… 4 2, 6, 7, 8 NA NA
## 3 D2022052… IFCB134 Strombidi… 1 3 NA NA
## 4 D2022052… IFCB134 Mesodiniu… 2 2, 3 NA NA
## 5 D2022071… IFCB134 Alexandri… 2 42, 164 NA NA
## 6 D2022071… IFCB134 Strombidi… 2 34, 79 NA NA
## # ℹ 6 more variables: timestamp <dttm>, date <date>, year <dbl>, month <dbl>,
## # day <int>, time <chr>
# Summarize counts on class level
png_per_class <- ifcb_summarize_png_counts(png_folder = "data/png",
                                           sum_level = "class")
# Print output
head(png_per_class)
## # A tibble: 6 × 2
## class_name n_images
## <chr> <int>
## 1 Alexandrium_pseudogonyaulax 3
## 2 Amphidnium-like 1
## 3 Chaetoceros_spp_chain 6
## 4 Chaetoceros_spp_single_cell 3
## 5 Ciliophora 23
## 6 Cryptomonadales 245
MATLAB Files
Count the annotations in the MATLAB files, similar to ifcb_summarize_png_counts:
# Summarize counts from MATLAB files
mat_count <- ifcb_count_mat_annotations(manual_files = "data/manual",
                                        class2use_file = "data/config/class2use.mat",
                                        skip_class = "unclassified", # Or class ID
                                        sum_level = "class") # Or per "sample"
# Print output
head(mat_count)
## # A tibble: 6 × 2
## class n
## <chr> <int>
## 1 Alexandrium_pseudogonyaulax 3
## 2 Amphidnium-like 1
## 3 Chaetoceros_spp_chain 6
## 4 Chaetoceros_spp_single_cell 3
## 5 Ciliophora 23
## 6 Cryptomonadales 245
Run Image Gallery
To visually inspect and correct annotations, run the image gallery.
# Run Shiny app
ifcb_run_image_gallery()
Individual images can be selected and a list of selected images can be
downloaded as a correction file. This file can be used to correct .mat
annotations below using the ifcb_correct_annotation function.
Correct .mat Files After Checking Images in the App
After reviewing images in the gallery, correct the .mat files using the
correction file with selected images:
# Get class2use
class_name <- ifcb_get_mat_names("data/config/class2use.mat")
class2use <- ifcb_get_mat_variable("data/config/class2use.mat",
                                   variable_name = class_name)
# Find the class id of unclassified
unclassified_id <- which(grepl("unclassified",
                               class2use))
# Initialize the Python session if not already set up
env_path <- "~/.virtualenvs/iRfcb"
ifcb_py_install(envname = env_path)
# Correct the annotation with the output from the image gallery
ifcb_correct_annotation(manual_folder = "data/manual",
                        out_folder = "data/manual",
                        correction = "data/manual/correction/Alexandrium_pseudogonyaulax_selected_images.txt",
                        correct_classid = unclassified_id)
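Note that grepl() matches substrings, so the class id lookup above returns more than one id if several class names contain the pattern. A minimal sketch of an exact-match alternative follows. It uses a hypothetical class list rather than the real class2use.mat contents, assumes the class list is a character vector, and the helper find_class_id is not an iRfcb function.

```r
# Hypothetical class list for illustration only
class2use_example <- c("unclassified",
                       "Alexandrium_pseudogonyaulax",
                       "Dinophyceae_unclassified")

# Substring matching returns every class containing the pattern
substring_ids <- which(grepl("unclassified", class2use_example))
print(substring_ids)

# Hypothetical helper: exact lookup that fails loudly unless exactly
# one class matches the requested name
find_class_id <- function(class_name, class2use) {
  id <- which(class2use == class_name)
  if (length(id) != 1) {
    stop("Expected exactly one match for '", class_name,
         "', found ", length(id))
  }
  id
}

unclassified_id_example <- find_class_id("unclassified", class2use_example)
print(unclassified_id_example)
```

An exact lookup is a cautious choice here because the resulting id is written back into the .mat files, where an ambiguous id would be hard to spot afterwards.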
Replace Specific Class Annotations
Replace all instances of a specific class with unclassified (class id 1):
# Get class2use
class_name <- ifcb_get_mat_names("data/config/class2use.mat")
class2use <- ifcb_get_mat_variable("data/config/class2use.mat",
                                   variable_name = class_name)
# Find the class id of Alexandrium_pseudogonyaulax
ap_id <- which(grepl("Alexandrium_pseudogonyaulax",
                     class2use))
# Find the class id of unclassified
unclassified_id <- which(grepl("unclassified",
                               class2use))
# Initialize the Python session if not already set up
# env_path <- "~/.virtualenvs/iRfcb"
# ifcb_py_install(envname = env_path)
# Move all Alexandrium_pseudogonyaulax images to unclassified
ifcb_replace_mat_values(manual_folder = "data/manual",
                        out_folder = "data/manual",
                        target_id = ap_id,
                        new_id = unclassified_id)
Verify Correction
Verify that the corrections have been applied:
# Summarize new counts after correction
mat_count <- ifcb_count_mat_annotations(manual_files = "data/manual",
                                        class2use_file = "data/config/class2use.mat",
                                        skip_class = "unclassified", # Or class ID
                                        sum_level = "class") # Or per "sample"
# Print output
head(mat_count)
## # A tibble: 6 × 2
## class n
## <chr> <int>
## 1 Amphidnium-like 1
## 2 Chaetoceros_spp_chain 6
## 3 Chaetoceros_spp_single_cell 3
## 4 Ciliophora 23
## 5 Cryptomonadales 245
## 6 Cylindrotheca_Nitzschia_longissima 47
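One way to make this verification explicit is to compare the class counts from before and after the correction. The base R sketch below assumes that both count tables were saved under separate names (the tutorial overwrites mat_count, so counts_before and counts_after are hypothetical) and that each has the columns class and n shown above. The helper compare_counts is not part of iRfcb.

```r
# Hypothetical helper: list the classes whose annotation counts changed
compare_counts <- function(before, after) {
  merged <- merge(before, after, by = "class", all = TRUE,
                  suffixes = c("_before", "_after"))
  # A class missing from one table has zero annotations there
  merged$n_before[is.na(merged$n_before)] <- 0L
  merged$n_after[is.na(merged$n_after)] <- 0L
  merged$change <- merged$n_after - merged$n_before
  merged[merged$change != 0, ]
}

# Example using the first rows of the count tables printed in this tutorial
counts_before <- data.frame(class = c("Alexandrium_pseudogonyaulax",
                                      "Amphidnium-like",
                                      "Chaetoceros_spp_chain"),
                            n = c(3L, 1L, 6L))
counts_after <- data.frame(class = c("Amphidnium-like",
                                     "Chaetoceros_spp_chain"),
                           n = c(1L, 6L))

changed <- compare_counts(counts_before, counts_after)
print(changed)
```

Only classes whose counts differ are returned, so an empty result would indicate that no correction was applied.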
Annotate Images in Batch
Images can be batch annotated using the ifcb_annotate_batch function.
If a manual file already exists for the sample, the ROI class list will
be updated accordingly. If no file is found, a new .mat file will be
created, with all unannotated ROIs marked as unclassified.
# Read a file with selected images, generated by the image gallery app
correction <- read.table("data/manual/correction/Alexandrium_pseudogonyaulax_selected_images.txt",
                         header = TRUE)
# Print image names to be annotated
print(correction$image_filename)
## [1] "D20220712T210855_IFCB134_00164.png" "D20220712T222710_IFCB134_00044.png"
# Re-annotate the images that were moved to unclassified earlier in the tutorial
ifcb_annotate_batch(png_images = correction$image_filename,
                    class = "Alexandrium_pseudogonyaulax",
                    manual_folder = "data/manual",
                    adc_folder = "data/data",
                    class2use_file = "data/config/class2use.mat")
# Summarize new counts after re-annotation
mat_count <- ifcb_count_mat_annotations(manual_files = "data/manual",
                                        class2use_file = "data/config/class2use.mat",
                                        skip_class = "unclassified",
                                        sum_level = "class")
# Print output and check if Alexandrium pseudogonyaulax is back
head(mat_count)
## # A tibble: 6 × 2
## class n
## <chr> <int>
## 1 Alexandrium_pseudogonyaulax 2
## 2 Amphidnium-like 1
## 3 Chaetoceros_spp_chain 6
## 4 Chaetoceros_spp_single_cell 3
## 5 Ciliophora 23
## 6 Cryptomonadales 245
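The image file names listed in the correction file encode both the sample and the ROI number, which is what allows an image to be traced back to a sample. Below is a base R sketch of that parsing, assuming the `<sample>_<ROI number>.png` pattern seen in the printed names. The helper parse_image_filename is hypothetical and not an iRfcb function.

```r
# Hypothetical helper: split IFCB image file names into sample name
# and ROI number, assuming names like <sample>_<ROI number>.png
parse_image_filename <- function(image_filename) {
  stem <- sub("\\.png$", "", basename(image_filename), ignore.case = TRUE)
  data.frame(image_filename = image_filename,
             sample = sub("_[0-9]+$", "", stem),
             roi = as.integer(sub(".*_([0-9]+)$", "\\1", stem)),
             stringsAsFactors = FALSE)
}

# The two image names printed from the correction file above
images <- c("D20220712T210855_IFCB134_00164.png",
            "D20220712T222710_IFCB134_00044.png")
parsed <- parse_image_filename(images)
print(parsed)
```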
This concludes the tutorial for the iRfcb package. For more detailed
information, refer to the package documentation or the other tutorials.
See how data pipelines can be constructed using iRfcb in the following
Example Project. Happy analyzing!
Citation
## To cite package 'iRfcb' in publications use:
##
## Anders Torstensson (2025). I 'R' FlowCytobot (iRfcb): Tools for
## Analyzing and Processing Data from the IFCB. R package version 0.4.0.
## https://doi.org/10.5281/zenodo.12533225
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {I 'R' FlowCytobot (iRfcb): Tools for Analyzing and Processing Data from the IFCB},
## author = {Anders Torstensson},
## year = {2025},
## note = {R package version 0.4.0},
## url = {https://doi.org/10.5281/zenodo.12533225},
## }
References
- Kraft, K., Velhonoja, O., Seppälä, J., Hällfors, H., Suikkanen, S., Ylöstalo, P., Anglès, S., Kielosto, S., Kuosa, H., Lehtinen, S., Oja, J., Tamminen, T. (2022). SYKE-plankton_IFCB_2022 [Data set]. https://b2share.eudat.eu. https://doi.org/10.23728/b2share.abf913e5a6ad47e6baa273ae0ed6617a
- Menden-Deuer, S. and Lessard, E. J. (2000). Carbon to volume relationships for dinoflagellates, diatoms, and other protist plankton. Limnology and Oceanography 45(3), 569–579. https://doi.org/10.4319/lo.2000.45.3.0569
- Sosik, H. M. and Olson, R. J. (2007). Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnology and Oceanography: Methods 5, 204–216.
- Torstensson, A., Skjevik, A-T., Mohlin, M., Karlberg, M. and Karlson, B. (2024). SMHI IFCB Plankton Image Reference Library. SciLifeLab. Dataset. https://doi.org/10.17044/scilifelab.25883455.v3