
This function calculates aggregated biovolumes and carbon content from Imaging FlowCytobot (IFCB) samples based on biovolume information from feature files. Images are grouped into classes based on one of three sources: classification files (.mat, .h5, or .csv), manually annotated files, or a user-supplied list of images and their corresponding class labels (e.g. from a CNN model).

Usage

ifcb_summarize_biovolumes(
  feature_folder,
  class_files = NULL,
  class2use_file = NULL,
  hdr_folder = NULL,
  custom_images = NULL,
  custom_classes = NULL,
  micron_factor = 1/3.4,
  diatom_class = "Bacillariophyceae",
  diatom_include = NULL,
  marine_only = FALSE,
  diatom_equation = c("large", "all"),
  threshold = "opt",
  feature_recursive = TRUE,
  class_recursive = TRUE,
  hdr_recursive = TRUE,
  drop_zero_volume = FALSE,
  feature_version = NULL,
  use_cell_counts = FALSE,
  single_cell_values = c(-1, 0),
  use_python = FALSE,
  verbose = TRUE,
  mat_folder = deprecated(),
  mat_files = deprecated(),
  mat_recursive = deprecated()
)

Arguments

feature_folder

Path to the folder containing feature files (e.g., CSV format).

class_files

(Optional) A character vector of full paths to classification or manual annotation files (.mat, .h5, or .csv), or a single path to a folder containing such files.

class2use_file

(Optional) A character string specifying the path to the file containing the class2use variable (default NULL). Only needed when summarizing manual MATLAB results.

hdr_folder

(Optional) Path to the folder containing HDR files. Needed for calculating cell, biovolume and carbon concentration per liter.

custom_images

(Optional) A character vector of image filenames in the format DYYYYMMDDTHHMMSS_IFCBXXX_ZZZZZ(.png), where "XXX" represents the IFCB number and "ZZZZZ" represents the ROI number. These filenames should match the roi_number assignment in the feature_files and can be used as a substitute for classification files.
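As an illustration of the filename format described above, the following minimal sketch splits image names into a sample name and a ROI number using base R. The helper parse_image_name is hypothetical and is not part of the package; how the function itself matches images to feature rows may differ.

```r
# Hypothetical helper (not part of iRfcb): split image names of the form
# DYYYYMMDDTHHMMSS_IFCBXXX_ZZZZZ(.png) into sample name and ROI number.
parse_image_name <- function(images) {
  # Drop the optional ".png" extension
  base <- sub("\\.png$", "", images)
  data.frame(
    # Everything before the trailing "_ZZZZZ" is the sample name
    sample = sub("_[0-9]+$", "", base),
    # The trailing digits are the ROI number
    roi_number = as.integer(sub("^.*_([0-9]+)$", "\\1", base)),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_image_name(c("D20220522T003051_IFCB134_00002",
                             "D20220522T003051_IFCB134_00003.png"))
print(parsed)
```

Both inputs resolve to the same sample, with ROI numbers 2 and 3, which is the pairing that would need to line up with roi_number in the feature files.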

custom_classes

(Optional) A character vector of corresponding class labels for custom_images.

micron_factor

Conversion factor from pixels to microns, expressed in microns per pixel (default: 1/3.4).
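A short sketch of how a linear conversion factor applies to a volume, assuming the feature files report biovolume in cubic pixels (as in ifcb-analysis). The input values are made up for illustration, and the variable names are not taken from the package.

```r
micron_factor <- 1 / 3.4            # microns per pixel (the function default)
biovolume_px <- c(39304, 0, 1000)   # made-up biovolumes in cubic pixels

# A volume scales with the cube of the linear conversion factor
biovolume_um3 <- biovolume_px * micron_factor^3
print(biovolume_um3)
```

Because 39304 equals 3.4^3 * 1000, the first value corresponds to 1000 cubic microns, which makes the cubic scaling easy to check by hand.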

diatom_class

A character vector of diatom class names in the World Register of Marine Species (WoRMS). Default is "Bacillariophyceae".

diatom_include

Optional character vector of class names that should always be treated as diatoms, overriding the boolean result of ifcb_is_diatom. Default: NULL.

marine_only

Logical. If TRUE, restricts the WoRMS search to marine taxa only. Default is FALSE.

diatom_equation

A character string selecting which Menden-Deuer and Lessard (2000) carbon-to-volume relationship to apply to diatoms. "large" (default) uses the large-diatom (> 3000 micron^3) equation, matching the ifcb-analysis convention. "all" uses the all-sizes diatom equation, which assigns more carbon to small cells. Note that biovolume is measured per region of interest (ROI/image), not per cell, so chains of small cells register a large ROI biovolume. Passed to ifcb_extract_biovolumes.

threshold

A character string controlling which classification to use. "opt" (default) uses the threshold-applied classification, where predictions below the per-class optimal threshold are labeled "unclassified". Any other value (e.g. "all") uses the raw winning class without any threshold applied.

feature_recursive

Logical. If TRUE, the function will search for feature files recursively within the feature_folder. Default is TRUE.

class_recursive

Logical. If TRUE, the function will search for classification files recursively when class_files is a folder. Default is TRUE.

hdr_recursive

Logical. If TRUE, the function will search for HDR files recursively within the hdr_folder (if provided). Default is TRUE.

drop_zero_volume

Logical. If TRUE, rows where Biovolume equals zero (e.g., artifacts such as smudges on the flow cell) are removed. Default: FALSE.

feature_version

Optional numeric or character version to filter feature files by (e.g. 2 for "_v2"). Default is NULL (no filtering).
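The kind of filtering described above could look like the sketch below. It assumes feature files carry a "_fea_v<version>.csv" suffix, which is the ifcb-analysis naming convention; the file names are hypothetical and the package's own matching logic may be implemented differently.

```r
# Hypothetical feature file names with a version suffix
feature_files <- c("D20220522T003051_IFCB134_fea_v2.csv",
                   "D20220522T003051_IFCB134_fea_v4.csv",
                   "D20220522T000439_IFCB134_fea_v2.csv")
feature_version <- 2

# Match "_v2" only when it is followed by a non-digit or the end of the name,
# so that a version such as "_v20" would not be selected by mistake
pattern <- paste0("_v", feature_version, "([^0-9]|$)")
selected <- feature_files[grepl(pattern, feature_files)]
print(selected)
```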

use_cell_counts

Logical. If TRUE, reads the optional per-ROI cell_count data stored by the diatom chain counter in .h5/.csv classification files and adds cell_counts (and cell_counts_per_liter when hdr_folder is supplied) to the output, reporting cell abundance alongside ROI counts. Only supported with automated class_files. The function aborts if enabled but no classification file contains chain-count data. For chain-length statistics (mean, median, max chain length) use ifcb_summarize_cell_counts. Note that cell_counts here is summed only over ROIs that also have matching feature (biovolume) data (the same ROI population as counts); ifcb_summarize_cell_counts instead sums over all classified ROIs, so the two abundance totals can differ. Default is FALSE.

single_cell_values

Integer vector of cell_count values that should be treated as a single cell when computing cell_counts. Default is c(-1, 0), i.e. ROIs that were not counted (-1) and ROIs where no cells were detected (0) each count as one cell. Values not listed are used verbatim. Only used when use_cell_counts = TRUE.
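The rule above can be sketched in a few lines of base R. The cell_count values are made up, and this is only an illustration of the described behaviour, not the package's internal code.

```r
cell_count <- c(-1L, 0L, 1L, 4L, 7L)   # made-up per-ROI chain counts
single_cell_values <- c(-1, 0)         # the function default

# ROIs with a listed value count as one cell; all other values are used verbatim
cells <- ifelse(cell_count %in% single_cell_values, 1L, cell_count)
total_cells <- sum(cells)
print(total_cells)
```

With the default setting, the uncounted ROI (-1) and the ROI with no detected cells (0) each contribute one cell, so five ROIs here sum to 14 cells.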

use_python

Logical. If TRUE, attempts to read the .mat file using a Python-based method. Default is FALSE.

verbose

A logical indicating whether to print progress messages. Default is TRUE.

mat_folder

[Deprecated] Use class_files instead.

mat_files

[Deprecated] Use class_files instead.

mat_recursive

[Deprecated] Use class_recursive instead.

Value

A data frame summarizing aggregated biovolume and carbon content per class per sample. Columns include 'sample', 'classifier', 'class', 'biovolume_mm3', 'carbon_ug', 'ml_analyzed', 'biovolume_mm3_per_liter', and 'carbon_ug_per_liter'. When use_cell_counts = TRUE, the cell abundance columns 'cell_counts' (and 'cell_counts_per_liter' when hdr_folder is provided) are also included.

Details

This function performs the following steps:

  1. Extracts biovolumes and carbon content from feature and classification results using ifcb_extract_biovolumes.

  2. Optionally incorporates volume data from HDR files to calculate volume analyzed per sample.

  3. Computes biovolume and carbon content per liter of sample analyzed.
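The per-liter step can be sketched as follows, using the column names listed under Value. The data frame is filled with made-up numbers, and the arithmetic is an assumption about how the normalization works (dividing by the analyzed volume in liters) rather than a copy of the package's code.

```r
# Made-up per-class totals for one sample
summary_df <- data.frame(
  sample = "D20220522T003051_IFCB134",
  class = c("Mesodinium_rubrum", "Skeletonema_marinoi"),
  biovolume_mm3 = c(2e-5, 5e-5),
  carbon_ug = c(0.002, 0.004),
  ml_analyzed = 4
)

# Convert the analyzed volume from milliliters to liters, then normalize
liters_analyzed <- summary_df$ml_analyzed / 1000
summary_df$biovolume_mm3_per_liter <- summary_df$biovolume_mm3 / liters_analyzed
summary_df$carbon_ug_per_liter <- summary_df$carbon_ug / liters_analyzed
print(summary_df)
```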

The classification or manual annotation files are generated by the ifcb-analysis repository (Sosik and Olson 2007). Users can optionally provide a custom classification by supplying a vector of image filenames (custom_images) along with corresponding class labels (custom_classes). This allows summarization of biovolume and carbon content without requiring classification or manual annotation files (e.g. results from a CNN model).

Biovolumes are converted to carbon according to Menden-Deuer and Lessard (2000) for individual regions of interest (ROI), applying different conversion factors to diatoms and non-diatom protists. The diatom relationship is selected with diatom_equation ("large", the default, or "all"). If hdr_folder is provided, the function also incorporates sample volume data from HDR files to compute biovolume and carbon content per liter of sample.
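To make the conversion concrete, here is a minimal sketch of the log-log relationship log10(pg C) = log10(a) + b * log10(volume in cubic microns). The helper vol2carbon is hypothetical, and the coefficients are quoted from memory of Menden-Deuer and Lessard (2000), so they should be checked against the paper before being relied on.

```r
# Hypothetical helper (not the package's implementation).
# Coefficients as recalled from Menden-Deuer and Lessard (2000); verify before use.
vol2carbon <- function(volume_um3, is_diatom,
                       diatom_equation = c("large", "all")) {
  diatom_equation <- match.arg(diatom_equation)
  coefs <- list(
    nondiatom = c(log_a = -0.665, b = 0.939),  # protists excluding diatoms
    large     = c(log_a = -0.933, b = 0.881),  # diatoms > 3000 cubic microns
    all       = c(log_a = -0.541, b = 0.811)   # diatoms of all sizes
  )
  d <- coefs[[diatom_equation]]
  n <- coefs[["nondiatom"]]
  # Pick the coefficient set per ROI depending on diatom status
  log_a <- ifelse(is_diatom, d[["log_a"]], n[["log_a"]])
  b <- ifelse(is_diatom, d[["b"]], n[["b"]])
  10^(log_a + b * log10(volume_um3))  # picograms of carbon per ROI
}

carbon_large <- vol2carbon(1000, is_diatom = TRUE, diatom_equation = "large")
carbon_all <- vol2carbon(1000, is_diatom = TRUE, diatom_equation = "all")
carbon_other <- vol2carbon(1000, is_diatom = FALSE)
print(c(large = carbon_large, all = carbon_all, nondiatom = carbon_other))
```

For a small 1000 cubic micron ROI, the "all" relationship yields more carbon than the "large" one, which matches the note on diatom_equation that the all-sizes equation assigns more carbon to small cells.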

If use_python = TRUE, the function tries to read the .mat file using ifcb_read_mat(), which relies on SciPy. This approach may be faster than the default R reader, especially for large .mat files. To enable this functionality, ensure Python is properly configured with the required dependencies. You can initialize the Python environment and install necessary packages using ifcb_py_install().

References

Menden-Deuer, S. and Lessard, E. J. (2000), Carbon to volume relationships for dinoflagellates, diatoms, and other protist plankton. Limnology and Oceanography, 45(3), 569-579, doi: 10.4319/lo.2000.45.3.0569.

Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnology and Oceanography: Methods, 5, 204-216.

Groves, G. J. J., Arthur, G., Bresnan, E., Whyte, C., Arce, P. and Davidson, K. (2026), Automatic enumeration of chains of marine diatoms using "You Only Look Once" - a machine learning approach. Journal of Plankton Research, 48(2), fbaf064, doi: 10.1093/plankt/fbaf064.

Examples

if (FALSE) { # \dontrun{
# Example usage:
ifcb_summarize_biovolumes("path/to/features", "path/to/classified",
                          hdr_folder = "path/to/hdr")

# Using custom classification result:
images <- c("D20220522T003051_IFCB134_00002",
            "D20220522T003051_IFCB134_00003")
classes <- c("Mesodinium_rubrum",
             "Mesodinium_rubrum")

ifcb_summarize_biovolumes(feature_folder = "path/to/features",
                          hdr_folder = "path/to/hdr",
                          custom_images = images,
                          custom_classes = classes)
} # }