Introduction

This vignette demonstrates how to process and refine annotated and automatically classified Imaging FlowCytobot (IFCB) data in R using the iRfcb package. The workflow assumes that MATLAB-based preprocessing has already been conducted using the ifcb-analysis repository (Sosik and Olson 2007). This preprocessing includes generating .mat files for annotated and classified images.

With iRfcb, you can further analyze and manage IFCB data, including summarizing annotations and classification results, and refining annotations.

Getting Started

Installation

You can install the package from GitHub using the devtools package:

# install.packages("devtools")
devtools::install_github("EuropeanIFCBGroup/iRfcb",
                         dependencies = TRUE)

Some functions from the iRfcb package used in this tutorial require Python to be installed. You can download Python from the official website: python.org/downloads.

Load the iRfcb library:

library(iRfcb)

Download Sample Data

To get started, download sample data from the SMHI IFCB Plankton Image Reference Library (Torstensson et al. 2024) with the following function:

# Define data directory
data_dir <- "data"

# Download and extract test data in the data folder
ifcb_download_test_data(dest_dir = data_dir,
                        max_retries = 10,
                        sleep_time = 30,
                        verbose = FALSE)
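
The max_retries and sleep_time arguments suggest that the download is retried after a pause when it fails. The general pattern can be sketched in base R as follows. This is only an illustration, not the package's implementation: the retry() helper and the simulated flaky_download() function are hypothetical.

```r
# Hypothetical helper: call `fetch()` until it succeeds or the retries run out
retry <- function(fetch, max_retries = 3, sleep_time = 0) {
  for (attempt in seq_len(max_retries)) {
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    # Pause before the next attempt, but not after the last one
    if (attempt < max_retries) Sys.sleep(sleep_time)
  }
  stop("All ", max_retries, " attempts failed")
}

# Simulated download that fails twice before succeeding (no network access)
calls <- 0
flaky_download <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("connection reset")
  "download complete"
}

status <- retry(flaky_download, max_retries = 10, sleep_time = 0)
print(status)
```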

Classified Results from MATLAB

The iRfcb package facilitates the processing and analysis of data classified using a random forest algorithm from the ifcb-analysis repository. This workflow supports various tasks such as extracting classified results, reading summary files, and calculating biovolume and carbon content.

This section provides an overview of key functions available in iRfcb for handling classified IFCB data. Step-by-step examples are included to guide users through extracting results, summarizing data, and leveraging functionalities for both automated and manually annotated datasets.

Extract Classified Images from a Sample

To begin working with classified data, you can extract all classified images from a specific sample. This is especially useful for isolating ROIs based on specific taxa or classification thresholds.

# Extract all classified images from a sample
ifcb_extract_classified_images(sample = "D20230810T113059_IFCB134",
                               classified_folder = "data/classified",
                               roi_folder = "data/data",
                               out_folder = "data/classified_images",
                               taxa = "All", # or specify a particular taxon
                               threshold = "opt") # or specify another threshold
## Writing 2747 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Heterocapsa_rotundata 
## Writing 519 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Cryptomonadales 
## Writing 464 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Dino_smaller_than_30unidentified 
## Writing 511 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/unclassified 
## Writing 6 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Ciliates 
## Writing 245 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Leptocylindrus_danicus_minimus 
## Writing 114 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Leptocylindrus_danicus 
## Writing 66 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Cylindrotheca_Nitzschia_longissima 
## Writing 23 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Chaetoceros_chain 
## Writing 6 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Dino_larger_than_30unidentified 
## Writing 23 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Prorocentrum_micans 
## Writing 51 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Scrippsiella_group 
## Writing 2 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Tripos_lineatus 
## Writing 1 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Cerataulina_pelagica 
## Writing 6 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Gymnodiniales_smaller_than_30 
## Writing 3 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Chaetoceros_single_cell 
## Writing 5 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Skeletonema_marinoi 
## Writing 1 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Enisiculifera_carinata 
## Writing 2 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Thalassiosira_gravida 
## Writing 2 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Pseudo-nitzschia_spp 
## Writing 1 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Octactis_speculum 
## Writing 3 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Guinardia_delicatula 
## Writing 1 ROIs from D20230810T113059_IFCB134.roi to data/classified_images/Thalassiosira_nordenskioeldii
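
To give a rough idea of what a classification threshold does, the sketch below assigns each ROI to its highest-scoring class only when that score reaches a class-specific threshold, and otherwise labels it unclassified. It is a simplified illustration and not the code used by iRfcb or ifcb-analysis. The scores matrix and the thresholds vector are made-up values.

```r
# Hypothetical classifier scores: one row per ROI, one column per class
scores <- matrix(c(0.80, 0.15, 0.05,
                   0.40, 0.35, 0.25,
                   0.10, 0.20, 0.70),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("roi_", 1:3),
                                 c("Ciliates", "Cryptomonadales",
                                   "Skeletonema_marinoi")))

# Hypothetical per-class thresholds
thresholds <- c(Ciliates = 0.60,
                Cryptomonadales = 0.50,
                Skeletonema_marinoi = 0.65)

# Winning class and its score for each ROI
winner <- colnames(scores)[apply(scores, 1, which.max)]
winner_score <- apply(scores, 1, max)

# Keep the winning class only if its score reaches that class's threshold
label <- ifelse(winner_score >= unname(thresholds[winner]),
                winner, "unclassified")
names(label) <- rownames(scores)

print(label)
```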

Read a Summary File

Summary files generated by the MATLAB function countcells_allTBnew_user_training provide aggregated classified data. Use the following function to read and process these files.

# Read a MATLAB summary file generated by `countcells_allTBnew_user_training`
summary_data <- ifcb_read_summary("data/classified/2023/summary/summary_allTB_2023.mat",
                                  biovolume = FALSE,
                                  threshold = "opt")

# Print output
head(summary_data)
## # A tibble: 6 × 12
##   sample   timestamp           date        year month   day time     ifcb_number
##   <chr>    <dttm>              <date>     <dbl> <dbl> <int> <time>   <chr>      
## 1 D202308… 2023-08-10 11:30:59 2023-08-10  2023     8    10 11:30:59 IFCB134    
## 2 D202308… 2023-08-10 11:30:59 2023-08-10  2023     8    10 11:30:59 IFCB134    
## 3 D202308… 2023-08-10 11:30:59 2023-08-10  2023     8    10 11:30:59 IFCB134    
## 4 D202308… 2023-08-10 11:30:59 2023-08-10  2023     8    10 11:30:59 IFCB134    
## 5 D202308… 2023-08-10 11:30:59 2023-08-10  2023     8    10 11:30:59 IFCB134    
## 6 D202308… 2023-08-10 11:30:59 2023-08-10  2023     8    10 11:30:59 IFCB134    
## # ℹ 4 more variables: ml_analyzed <dbl>, species <chr>, counts <dbl>,
## #   counts_per_liter <dbl>
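
The timestamp, date and ifcb_number columns in this output appear to be derived from the sample name. A minimal base R sketch of such parsing is shown below. It assumes that sample names follow the pattern DYYYYMMDDTHHMMSS_IFCBNNN and that timestamps are in UTC. The parse_sample_name() helper is hypothetical and not part of iRfcb.

```r
# Hypothetical helper: split an IFCB sample name into date-time parts
parse_sample_name <- function(sample) {
  pattern <- "^D([0-9]{8}T[0-9]{6})_(IFCB[0-9]+)$"
  stopifnot(all(grepl(pattern, sample)))

  # Extract the date-time stamp and convert it (UTC is an assumption)
  stamp <- sub(pattern, "\\1", sample)
  timestamp <- as.POSIXct(stamp, format = "%Y%m%dT%H%M%S", tz = "UTC")

  data.frame(
    sample = sample,
    timestamp = timestamp,
    date = as.Date(timestamp),
    year = as.integer(format(timestamp, "%Y")),
    month = as.integer(format(timestamp, "%m")),
    day = as.integer(format(timestamp, "%d")),
    ifcb_number = sub(pattern, "\\2", sample),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_sample_name("D20230810T113059_IFCB134")
print(parsed)
```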

Alternatively, iRfcb can directly aggregate data and compute carbon content from classification files using the ifcb_summarize_biovolumes function demonstrated below.

Summarize Counts, Biovolumes and Carbon Content from Classified IFCB Data

This function calculates aggregated biovolumes and carbon content from IFCB samples based on feature files and MATLAB classification result files, without summarizing the data in MATLAB first. Biovolumes are converted to carbon for individual ROIs according to Menden-Deuer and Lessard (2000), with different conversion factors applied to diatoms and non-diatom protists. If .hdr files are provided, the function also uses their sample volume data to compute biovolume and carbon content per liter of sample. See details in the help pages for ifcb_summarize_biovolumes and ifcb_extract_biovolumes.

# Summarize biovolume data using IFCB data from classified data folder
biovolume_data <- ifcb_summarize_biovolumes(feature_folder = "data/features/2023",
                                            mat_folder = "data/classified",
                                            hdr_folder = "data/data/2023",
                                            micron_factor = 1/3.4,
                                            diatom_class = "Bacillariophyceae",
                                            threshold = "opt",
                                            verbose = FALSE) # Do not print progress bars

# Print output
head(biovolume_data)
## # A tibble: 6 × 10
##   sample             classifier class counts biovolume_mm3 carbon_ug ml_analyzed
##   <chr>              <chr>      <chr>  <int>         <dbl>     <dbl>       <dbl>
## 1 D20230810T113059_… "Z:\\data… Cera…      1    0.00000175 0.0000839        3.17
## 2 D20230810T113059_… "Z:\\data… Chae…     23    0.0000176  0.000901         3.17
## 3 D20230810T113059_… "Z:\\data… Chae…      3    0.00000118 0.0000674        3.17
## 4 D20230810T113059_… "Z:\\data… Cili…      6    0.0000117  0.00159          3.17
## 5 D20230810T113059_… "Z:\\data… Cryp…    519    0.0000971  0.0151           3.17
## 6 D20230810T113059_… "Z:\\data… Cyli…     66    0.0000168  0.00101          3.17
## # ℹ 3 more variables: counts_per_liter <dbl>, biovolume_mm3_per_liter <dbl>,
## #   carbon_ug_per_liter <dbl>
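
The calculation behind these columns can be sketched roughly as follows. The sketch makes several assumptions: that feature files store biovolume in cubic pixels, that micron_factor is the number of microns per pixel, and that per-liter values are obtained by dividing by ml_analyzed and multiplying by 1000. The coefficients are the general diatom and non-diatom protist regressions from Menden-Deuer and Lessard (2000). The package may use another regression from the same paper (for example the one for large diatoms), so consult the help pages for the exact method. The helper functions and ROI values are hypothetical.

```r
# Convert biovolume from cubic pixels to cubic microns
# (assumes micron_factor is microns per pixel)
pixels_to_um3 <- function(biovolume_px, micron_factor = 1 / 3.4) {
  biovolume_px * micron_factor^3
}

# Carbon per cell (pg) from cell volume (um^3):
# log10(pg C) = log_a + b * log10(volume)
volume_to_carbon_pg <- function(volume_um3, is_diatom) {
  log_a <- ifelse(is_diatom, -0.541, -0.665)
  b <- ifelse(is_diatom, 0.811, 0.939)
  10^(log_a + b * log10(volume_um3))
}

# Made-up ROIs for illustration
rois <- data.frame(
  class = c("Ciliates", "Ciliates", "Skeletonema_marinoi"),
  biovolume_px = c(80000, 60000, 12000),
  is_diatom = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
rois$volume_um3 <- pixels_to_um3(rois$biovolume_px)
rois$carbon_pg <- volume_to_carbon_pg(rois$volume_um3, rois$is_diatom)

# Aggregate per class and convert units (1 mm^3 = 1e9 um^3, 1 ug = 1e6 pg)
per_class <- aggregate(cbind(volume_um3, carbon_pg) ~ class,
                       data = rois, FUN = sum)
per_class$counts <- as.vector(table(rois$class)[as.character(per_class$class)])
per_class$biovolume_mm3 <- per_class$volume_um3 / 1e9
per_class$carbon_ug <- per_class$carbon_pg / 1e6

# Scale to one liter using a hypothetical analyzed volume in ml
ml_analyzed <- 3.17
per_class$counts_per_liter <- per_class$counts / ml_analyzed * 1000
per_class$carbon_ug_per_liter <- per_class$carbon_ug / ml_analyzed * 1000

print(per_class)
```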

Summarize Counts, Biovolumes and Carbon Content from Manually Annotated IFCB Data

The ifcb_summarize_biovolumes function can also be used to calculate aggregated biovolumes and carbon content from manually annotated IFCB image data. See details in the help pages for ifcb_summarize_biovolumes, ifcb_extract_biovolumes and ifcb_count_mat_annotations.

# Summarize biovolume data using IFCB data from manual data folder
manual_biovolume_data <- ifcb_summarize_biovolumes(feature_folder = "data/features",
                                                   mat_folder = "data/manual",
                                                   class2use_file = "data/config/class2use.mat",
                                                   hdr_folder = "data/data",
                                                   micron_factor = 1/3.4,
                                                   diatom_class = "Bacillariophyceae",
                                                   verbose = FALSE) # Do not print progress bars

# Print output
head(manual_biovolume_data)
## # A tibble: 6 × 10
##   sample             classifier class counts biovolume_mm3 carbon_ug ml_analyzed
##   <chr>              <lgl>      <chr>  <int>         <dbl>     <dbl>       <dbl>
## 1 D20220522T000439_… NA         Cili…      1    0.00000327  0.000432        4.86
## 2 D20220522T000439_… NA         Meso…      4    0.0000274   0.00344         4.86
## 3 D20220522T000439_… NA         Stro…      1    0.00000386  0.000504        4.86
## 4 D20220522T000439_… NA         uncl…      1    0.00000288  0.000384        4.86
## 5 D20220522T003051_… NA         Meso…      2    0.0000122   0.00155         2.98
## 6 D20220712T210855_… NA         Alex…      2    0.0000476   0.00555         4.91
## # ℹ 3 more variables: counts_per_liter <dbl>, biovolume_mm3_per_liter <dbl>,
## #   carbon_ug_per_liter <dbl>

Manually Annotated Data from MATLAB

Count and Summarize Annotated Image Data

PNG Directory

Summarize counts of annotated images at the sample and class levels. The hdr_folder can be included to add GPS positions to the sample data frame:

# Summarise counts on sample level
png_per_sample <- ifcb_summarize_png_counts(png_folder = "data/png",
                                            hdr_folder = "data/data",
                                            sum_level = "sample")

head(png_per_sample)
## # A tibble: 6 × 13
## # Groups:   sample, ifcb_number [3]
##   sample    ifcb_number class_name n_images roi_numbers gpsLatitude gpsLongitude
##   <chr>     <chr>       <chr>         <int> <chr>             <dbl>        <dbl>
## 1 D2022052… IFCB134     Ciliophora        1 5                    NA           NA
## 2 D2022052… IFCB134     Mesodiniu…        4 2, 6, 7, 8           NA           NA
## 3 D2022052… IFCB134     Strombidi…        1 3                    NA           NA
## 4 D2022052… IFCB134     Mesodiniu…        2 2, 3                 NA           NA
## 5 D2022071… IFCB134     Alexandri…        2 42, 164              NA           NA
## 6 D2022071… IFCB134     Strombidi…        2 34, 79               NA           NA
## # ℹ 6 more variables: timestamp <dttm>, date <date>, year <dbl>, month <dbl>,
## #   day <int>, time <chr>

# Summarise counts on class level
png_per_class <- ifcb_summarize_png_counts(png_folder = "data/png",
                                           sum_level = "class")

# Print output
head(png_per_class)
## # A tibble: 6 × 2
##   class_name                  n_images
##   <chr>                          <int>
## 1 Alexandrium_pseudogonyaulax        3
## 2 Amphidnium-like                    1
## 3 Chaetoceros_spp_chain              6
## 4 Chaetoceros_spp_single_cell        3
## 5 Ciliophora                        23
## 6 Cryptomonadales                  245
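
If the PNG folder contains one subfolder per class, which is an assumption based on the output above, class-level counts can be cross-checked with base R alone. The sketch below builds a small temporary folder with empty placeholder files, so all folder and file names in it are made up.

```r
# Build a small demo folder with one subfolder per class (placeholder files)
png_folder <- file.path(tempdir(), "png_demo")
unlink(png_folder, recursive = TRUE)
demo_files <- c("class_A/sample1_00001.png",
                "class_B/sample1_00002.png",
                "class_B/sample2_00001.png")
for (f in file.path(png_folder, demo_files)) {
  dir.create(dirname(f), recursive = TRUE, showWarnings = FALSE)
  file.create(f)
}

# List all PNG files; the parent folder name is taken as the class name
png_files <- list.files(png_folder, pattern = "\\.png$", recursive = TRUE)
class_counts <- as.data.frame(table(class_name = dirname(png_files)),
                              responseName = "n_images",
                              stringsAsFactors = FALSE)

print(class_counts)
```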

MATLAB Files

Count the annotations in the MATLAB files, similar to ifcb_summarize_png_counts:

# Summarize counts from MATLAB files
mat_count <- ifcb_count_mat_annotations(manual_files = "data/manual",
                                        class2use_file = "data/config/class2use.mat",
                                        skip_class = "unclassified", # Or class ID
                                        sum_level = "class") # Or per "sample"

# Print output
head(mat_count)
## # A tibble: 6 × 2
##   class                           n
##   <chr>                       <int>
## 1 Alexandrium_pseudogonyaulax     3
## 2 Amphidnium-like                 1
## 3 Chaetoceros_spp_chain           6
## 4 Chaetoceros_spp_single_cell     3
## 5 Ciliophora                     23
## 6 Cryptomonadales               245

To visually inspect and correct annotations, run the image gallery.

# Run Shiny app
ifcb_run_image_gallery()

Individual images can be selected and a list of selected images can be downloaded as a correction file. This file can be used to correct .mat annotations below using the ifcb_correct_annotation function.

Correct .mat Files After Checking Images in the App

After reviewing images in the gallery, correct the .mat files using the correction file with selected images:

# Get class2use
class_name <- ifcb_get_mat_names("data/config/class2use.mat")
class2use <- ifcb_get_mat_variable("data/config/class2use.mat",
                                   variable_name = class_name)

# Find the class id of unclassified
unclassified_id <- which(grepl("unclassified",
                         class2use))

# Initialize the python session if not already set up
env_path <- "~/.virtualenvs/iRfcb"
ifcb_py_install(envname = env_path)

# Correct the annotation with the output from the image gallery
ifcb_correct_annotation(manual_folder = "data/manual",
                        out_folder = "data/manual",
                        correction = "data/manual/correction/Alexandrium_pseudogonyaulax_selected_images.txt",
                        correct_classid = unclassified_id)
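
Note that grepl() matches substrings, so it can return more than one id when class names share a common part. The sketch below contrasts this with exact matching using match(). It assumes that the class list is a plain character vector. The class2use_demo vector and its ordering are made up for illustration.

```r
# Hypothetical class list (order is made up)
class2use_demo <- c("unclassified",
                    "Chaetoceros_spp_chain",
                    "Chaetoceros_spp_single_cell",
                    "Ciliophora")

# Partial matching returns every class that contains the pattern
partial_ids <- which(grepl("Chaetoceros_spp", class2use_demo))

# Exact matching returns a single id per name, or NA if the name is absent
exact_id <- match("Chaetoceros_spp_chain", class2use_demo)
unclassified_id_demo <- match("unclassified", class2use_demo)

# Fail early if the class is missing instead of passing NA downstream
stopifnot(!is.na(unclassified_id_demo))

print(partial_ids)
print(exact_id)
```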

Replace Specific Class Annotations

Replace all instances of a specific class with unclassified (class id 1):

# Get class2use
class_name <- ifcb_get_mat_names("data/config/class2use.mat")
class2use <- ifcb_get_mat_variable("data/config/class2use.mat",
                                   variable_name = class_name)

# Find the class id of Alexandrium_pseudogonyaulax
ap_id <- which(grepl("Alexandrium_pseudogonyaulax",
                     class2use))

# Find the class id of unclassified
unclassified_id <- which(grepl("unclassified",
                         class2use))

# Initialize the python session if not already set up
# env_path <- "~/.virtualenvs/iRfcb"
# ifcb_py_install(envname = env_path)

# Move all Alexandrium_pseudogonyaulax images to unclassified
ifcb_replace_mat_values(manual_folder = "data/manual",
                        out_folder = "data/manual",
                        target_id = ap_id,
                        new_id = unclassified_id)
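
Conceptually, this replacement swaps one class id for another in each sample's list of ROI class ids. The sketch below shows the idea on a made-up data frame. It does not read or write .mat files, and the classlist object is hypothetical.

```r
# Hypothetical ROI class list: one class id per ROI, NA for unannotated ROIs
classlist <- data.frame(roi = 1:6,
                        class_id = c(1, 5, 5, NA, 5, 2))

target_id <- 5 # id to replace
new_id <- 1    # id to assign instead

# Count the affected ROIs before replacing
n_before <- sum(classlist$class_id == target_id, na.rm = TRUE)

# which() drops NA comparisons, so unannotated ROIs are left untouched
hits <- which(classlist$class_id == target_id)
classlist$class_id[hits] <- new_id

print(classlist)
```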

Verify Correction

Verify that the corrections have been applied:

# Summarize new counts after correction
mat_count <- ifcb_count_mat_annotations(manual_files = "data/manual",
                                        class2use_file = "data/config/class2use.mat",
                                        skip_class = "unclassified", # Or class ID
                                        sum_level = "class") # Or per "sample"

# Print output
head(mat_count)
## # A tibble: 6 × 2
##   class                                  n
##   <chr>                              <int>
## 1 Amphidnium-like                        1
## 2 Chaetoceros_spp_chain                  6
## 3 Chaetoceros_spp_single_cell            3
## 4 Ciliophora                            23
## 5 Cryptomonadales                      245
## 6 Cylindrotheca_Nitzschia_longissima    47
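
Because head() only shows the first six rows, a change further down the table is easy to miss. One option is to keep the counts from before the correction and compare the two tables. The sketch below uses small made-up data frames in place of the real count tables. The counts_before and counts_after objects are hypothetical.

```r
# Made-up counts before and after a correction
counts_before <- data.frame(
  class = c("Alexandrium_pseudogonyaulax", "Ciliophora", "Cryptomonadales"),
  n = c(3L, 23L, 245L),
  stringsAsFactors = FALSE
)
counts_after <- data.frame(
  class = c("Ciliophora", "Cryptomonadales"),
  n = c(23L, 245L),
  stringsAsFactors = FALSE
)

# Full join on class, so classes missing from either table are kept
comparison <- merge(counts_before, counts_after, by = "class", all = TRUE,
                    suffixes = c("_before", "_after"))

# A class missing from one table has zero annotations there
comparison$n_before[is.na(comparison$n_before)] <- 0L
comparison$n_after[is.na(comparison$n_after)] <- 0L
comparison$change <- comparison$n_after - comparison$n_before

# Show only the classes whose counts changed
changed <- comparison[comparison$change != 0, ]
print(changed)
```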

Annotate Images in Batch

Images can be batch annotated using the ifcb_annotate_batch function. If a manual file already exists for the sample, the ROI class list will be updated accordingly. If no file is found, a new .mat file will be created, with all unannotated ROIs marked as unclassified.

# Read a file with selected images, generated by the image gallery app
correction <- read.table("data/manual/correction/Alexandrium_pseudogonyaulax_selected_images.txt", 
                         header = TRUE)

# Print image names to be annotated
print(correction$image_filename)
## [1] "D20220712T210855_IFCB134_00164.png" "D20220712T222710_IFCB134_00044.png"

# Re-annotate the images that were moved to unclassified earlier in the tutorial
ifcb_annotate_batch(png_images = correction$image_filename,
                    class = "Alexandrium_pseudogonyaulax",
                    manual_folder = "data/manual",
                    adc_folder = "data/data",
                    class2use_file = "data/config/class2use.mat")

# Summarize new counts after re-annotation
mat_count <- ifcb_count_mat_annotations(manual_files = "data/manual",
                                        class2use_file = "data/config/class2use.mat",
                                        skip_class = "unclassified",
                                        sum_level = "class")

# Print output and check if Alexandrium pseudogonyaulax is back
head(mat_count)
## # A tibble: 6 × 2
##   class                           n
##   <chr>                       <int>
## 1 Alexandrium_pseudogonyaulax     2
## 2 Amphidnium-like                 1
## 3 Chaetoceros_spp_chain           6
## 4 Chaetoceros_spp_single_cell     3
## 5 Ciliophora                     23
## 6 Cryptomonadales               245
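
Manual files are organized per sample, so it can help to see which samples and ROI numbers a list of image file names refers to. The sketch below splits the two file names printed above into sample name and ROI number. It assumes the file names follow the pattern <sample>_<ROI number>.png. This is an illustration and not how ifcb_annotate_batch is implemented.

```r
# Image file names as printed from the correction file above
image_filename <- c("D20220712T210855_IFCB134_00164.png",
                    "D20220712T222710_IFCB134_00044.png")

# Everything before the last underscore is the sample, the digits are the ROI
pattern <- "^(.*)_([0-9]+)\\.png$"
rois <- data.frame(
  sample = sub(pattern, "\\1", image_filename),
  roi = as.integer(sub(pattern, "\\2", image_filename)),
  stringsAsFactors = FALSE
)

# Group ROI numbers by sample
rois_by_sample <- split(rois$roi, rois$sample)
print(rois_by_sample)
```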

This concludes this tutorial for the iRfcb package. For more detailed information, refer to the package documentation or the other tutorials. See how data pipelines can be constructed using iRfcb in the following Example Project. Happy analyzing!

Citation

## To cite package 'iRfcb' in publications use:
## 
##   Anders Torstensson (2025). I 'R' FlowCytobot (iRfcb): Tools for
##   Analyzing and Processing Data from the IFCB. R package version 0.4.0.
##   https://doi.org/10.5281/zenodo.12533225
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {I 'R' FlowCytobot (iRfcb): Tools for Analyzing and Processing Data from the IFCB},
##     author = {Anders Torstensson},
##     year = {2025},
##     note = {R package version 0.4.0},
##     url = {https://doi.org/10.5281/zenodo.12533225},
##   }

References

  • Kraft, K., Velhonoja, O., Seppälä, J., Hällfors, H., Suikkanen, S., Ylöstalo, P., Anglès, S., Kielosto, S., Kuosa, H., Lehtinen, S., Oja, J., Tamminen, T. (2022). SYKE-plankton_IFCB_2022 [Data set]. https://b2share.eudat.eu. https://doi.org/10.23728/b2share.abf913e5a6ad47e6baa273ae0ed6617a
  • Menden-Deuer, S. and Lessard, E. J. (2000). Carbon to volume relationships for dinoflagellates, diatoms, and other protist plankton. Limnol. Oceanogr. 45(3), 569–579. https://doi.org/10.4319/lo.2000.45.3.0569
  • Sosik, H. M. and Olson, R. J. (2007). Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr.: Methods 5, 204–216.
  • Torstensson, A., Skjevik, A-T., Mohlin, M., Karlberg, M. and Karlson, B. (2024). SMHI IFCB Plankton Image Reference Library. SciLifeLab. Dataset. https://doi.org/10.17044/scilifelab.25883455.v3