
Summarize Biovolumes and Carbon Content from IFCB Data
Source: R/ifcb_summarize_biovolumes.R
This function calculates aggregated biovolumes and carbon content from Imaging FlowCytobot (IFCB)
samples based on biovolume information from feature files. Images are assigned to classes using
classification files (.mat, .h5, or .csv), manual annotation files, or a user-supplied list of
images and their corresponding class labels (e.g. from a CNN model).
Usage
ifcb_summarize_biovolumes(
  feature_folder,
  class_files = NULL,
  class2use_file = NULL,
  hdr_folder = NULL,
  custom_images = NULL,
  custom_classes = NULL,
  micron_factor = 1/3.4,
  diatom_class = "Bacillariophyceae",
  diatom_include = NULL,
  marine_only = FALSE,
  diatom_equation = c("large", "all"),
  threshold = "opt",
  feature_recursive = TRUE,
  class_recursive = TRUE,
  hdr_recursive = TRUE,
  drop_zero_volume = FALSE,
  feature_version = NULL,
  use_cell_counts = FALSE,
  single_cell_values = c(-1, 0),
  use_python = FALSE,
  verbose = TRUE,
  mat_folder = deprecated(),
  mat_files = deprecated(),
  mat_recursive = deprecated()
)
Arguments
- feature_folder
Path to the folder containing feature files (e.g., CSV format).
- class_files
(Optional) A character vector of full paths to classification or manual annotation files (.mat, .h5, or .csv), or a single path to a folder containing such files.
- class2use_file
(Optional) A character string specifying the path to the file containing the class2use variable (default NULL). Only needed when summarizing manual MATLAB results.
- hdr_folder
(Optional) Path to the folder containing HDR files. Needed for calculating cell, biovolume and carbon concentration per liter.
- custom_images
(Optional) A character vector of image filenames in the format DYYYYMMDDTHHMMSS_IFCBXXX_ZZZZZ(.png), where "XXX" represents the IFCB number and "ZZZZZ" represents the ROI number. These filenames should match the roi_number assignment in the feature files and can be used as a substitute for classification files.
- custom_classes
(Optional) A character vector of class labels corresponding to custom_images.
- micron_factor
Conversion factor from pixels to microns, given in microns per pixel (default: 1/3.4).
- diatom_class
A character vector of diatom class names in the World Register of Marine Species (WoRMS). Default is "Bacillariophyceae".
- diatom_include
Optional character vector of class names that should always be treated as diatoms, overriding the boolean result of ifcb_is_diatom. Default: NULL.
- marine_only
Logical. If TRUE, restricts the WoRMS search to marine taxa only. Default is FALSE.
- diatom_equation
A character string selecting which Menden-Deuer and Lessard (2000) carbon-to-volume relationship to apply to diatoms. "large" (default) uses the large-diatom (> 3000 µm³) equation, matching the ifcb-analysis convention. "all" uses the all-sizes diatom equation, which assigns more carbon to small cells. Note that biovolume is measured per region of interest (ROI/image), not per cell, so chains of small cells register a large ROI biovolume. Passed to ifcb_extract_biovolumes.
- threshold
A character string controlling which classification to use. "opt" (default) uses the threshold-applied classification, where predictions below the per-class optimal threshold are labeled "unclassified". Any other value (e.g. "all") uses the raw winning class without any threshold applied.
- feature_recursive
Logical. If TRUE, the function will search for feature files recursively within the feature_folder. Default is TRUE.
- class_recursive
Logical. If TRUE, the function will search for classification files recursively when class_files is a folder. Default is TRUE.
- hdr_recursive
Logical. If TRUE, the function will search for HDR files recursively within the hdr_folder (if provided). Default is TRUE.
- drop_zero_volume
Logical. If TRUE, rows where Biovolume equals zero (e.g., artifacts such as smudges on the flow cell) are removed. Default: FALSE.
- feature_version
Optional numeric or character version to filter feature files by (e.g. 2 for "_v2"). Default is NULL (no filtering).
- use_cell_counts
Logical. If TRUE, reads the optional per-ROI cell_count data stored by the diatom chain counter in .h5/.csv classification files and adds cell_counts (and cell_counts_per_liter when hdr_folder is supplied) to the output, reporting cell abundance alongside ROI counts. Only supported with automated class_files. The function aborts if this option is enabled but no classification file contains chain-count data. For chain-length statistics (mean, median, and maximum chain length), use ifcb_summarize_cell_counts. Note that cell_counts here is summed only over ROIs that also have matching feature (biovolume) data (the same ROI population as counts), whereas ifcb_summarize_cell_counts sums over all classified ROIs, so the two abundance totals can differ. Default is FALSE.
- single_cell_values
Integer vector of cell_count values that should be treated as a single cell when computing cell_counts. Default is c(-1, 0), i.e. ROIs that were not counted (-1) and ROIs where no cells were detected (0) each count as one cell. Values not listed are used verbatim. Only used when use_cell_counts = TRUE.
- use_python
Logical. If TRUE, attempts to read .mat files using a Python-based method. Default is FALSE.
- verbose
A logical indicating whether to print progress messages. Default is TRUE.
- mat_folder
Deprecated.
- mat_files
Deprecated.
- mat_recursive
Deprecated.
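The difference between the two threshold modes can be sketched in base R. The score matrix, the per-class thresholds, and the helper assign_class() are hypothetical illustrations, not the package's internal code or data structures; the sketch assumes that "below the per-class optimal threshold" means the winning score is strictly less than that class's threshold.

```r
# Hypothetical classifier scores: one row per ROI, one column per class.
scores <- matrix(
  c(0.90, 0.05, 0.05,
    0.40, 0.35, 0.25,
    0.10, 0.20, 0.70),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("roi_1", "roi_2", "roi_3"),
                  c("Mesodinium_rubrum", "Skeletonema", "Dinophysis"))
)

# Hypothetical per-class optimal thresholds.
opt_thresholds <- c(Mesodinium_rubrum = 0.5, Skeletonema = 0.6, Dinophysis = 0.8)

# Return one label per ROI; "opt" relabels low-scoring winners as "unclassified".
assign_class <- function(scores, thresholds, threshold = "opt") {
  winner_idx <- max.col(scores, ties.method = "first")
  winner <- colnames(scores)[winner_idx]
  if (!identical(threshold, "opt")) {
    return(winner)  # raw winning class, no threshold applied
  }
  winner_score <- scores[cbind(seq_len(nrow(scores)), winner_idx)]
  unname(ifelse(winner_score < thresholds[winner], "unclassified", winner))
}

opt_labels <- assign_class(scores, opt_thresholds, threshold = "opt")
# opt_labels: "Mesodinium_rubrum" "unclassified" "unclassified"
raw_labels <- assign_class(scores, opt_thresholds, threshold = "all")
# raw_labels: "Mesodinium_rubrum" "Mesodinium_rubrum" "Dinophysis"
```

In this toy case the "opt" mode moves two ROIs to "unclassified", so ROI counts, biovolume, and carbon for those classes would shift to the "unclassified" row of the summary.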
Value
A data frame summarizing aggregated biovolume and carbon content per class per sample.
Columns include 'sample', 'classifier', 'class', 'biovolume_mm3', 'carbon_ug', 'ml_analyzed',
'biovolume_mm3_per_liter', and 'carbon_ug_per_liter'. When use_cell_counts = TRUE, the cell
abundance columns 'cell_counts' (and 'cell_counts_per_liter' when hdr_folder is provided) are
also included.
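A minimal sketch of how single_cell_values shapes the cell_counts column, assuming cell_counts is a plain sum of per-ROI cell_count values after substitution. The helper sum_cell_counts() and the example values are hypothetical and not part of the package.

```r
# Hypothetical per-ROI chain counts for one class in one sample:
# -1 = ROI not counted, 0 = no cells detected, > 0 = cells detected.
cell_count <- c(-1L, 0L, 4L, 1L, 7L)

# Sum cells per group: values in single_cell_values count as one cell,
# every other value is used as is.
sum_cell_counts <- function(cell_count, single_cell_values = c(-1, 0)) {
  per_roi <- ifelse(cell_count %in% single_cell_values, 1L, cell_count)
  sum(per_roi)
}

default_total <- sum_cell_counts(cell_count)     # 1 + 1 + 4 + 1 + 7 = 14
strict_total <- sum_cell_counts(cell_count, -1)  # 1 + 0 + 4 + 1 + 7 = 13
```

Dropping 0 from single_cell_values makes ROIs with no detected cells contribute nothing, which lowers the total relative to the default.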
Details
This function performs the following steps:
1. Extracts biovolumes and carbon content from feature and classification results using ifcb_extract_biovolumes.
2. Optionally incorporates volume data from HDR files to calculate the volume analyzed per sample.
3. Computes biovolume and carbon content per liter of sample analyzed.
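The three steps can be sketched in base R on a toy per-ROI table. This is an illustration under stated assumptions, not the package's implementation: per-ROI biovolume is assumed to be in µm³ and carbon in pg, the column names biovolume_um3 and carbon_pg are hypothetical, and ml_analyzed is supplied directly instead of being read from HDR files.

```r
# Hypothetical per-ROI results (biovolume in µm³, carbon in pg per ROI).
rois <- data.frame(
  sample = rep("D20220522T003051_IFCB134", 3),
  class = c("Mesodinium_rubrum", "Mesodinium_rubrum", "Skeletonema"),
  biovolume_um3 = c(2000, 3000, 1500),
  carbon_pg = c(300, 450, 150)
)

# Hypothetical volume analyzed per sample (ml), normally taken from HDR files.
volumes <- data.frame(sample = "D20220522T003051_IFCB134", ml_analyzed = 5)

# Step 1: sum biovolume and carbon per sample and class.
agg <- aggregate(cbind(biovolume_um3, carbon_pg) ~ sample + class,
                 data = rois, FUN = sum)

# Convert units: 1 mm³ = 1e9 µm³ and 1 µg = 1e6 pg.
agg$biovolume_mm3 <- agg$biovolume_um3 / 1e9
agg$carbon_ug <- agg$carbon_pg / 1e6

# Step 2: attach the volume analyzed to each sample.
agg <- merge(agg, volumes, by = "sample")

# Step 3: normalize to one liter (1 L = 1000 ml).
liters <- agg$ml_analyzed / 1000
agg$biovolume_mm3_per_liter <- agg$biovolume_mm3 / liters
agg$carbon_ug_per_liter <- agg$carbon_ug / liters

agg[, c("sample", "class", "biovolume_mm3", "carbon_ug", "ml_analyzed",
        "biovolume_mm3_per_liter", "carbon_ug_per_liter")]
```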
The classification or manual annotation files are generated by the ifcb-analysis repository
(Sosik and Olson 2007). Users can optionally provide a custom classification by supplying a vector of image filenames
(custom_images) along with corresponding class labels (custom_classes). This allows biovolume and
carbon content to be summarized without classification or manual annotation files, e.g. from the
results of a CNN model.
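One way to prepare custom_images and custom_classes from a CNN model's output is sketched below. The predictions data frame and the regular expression are assumptions for illustration; the pattern accepts any number of digits in the IFCB and ROI fields, which is looser than the DYYYYMMDDTHHMMSS_IFCBXXX_ZZZZZ format given for custom_images.

```r
# Hypothetical CNN predictions: image filenames and predicted labels.
predictions <- data.frame(
  image = c("D20220522T003051_IFCB134_00002.png",
            "D20220522T003051_IFCB134_00003.png",
            "not_an_ifcb_image.png"),
  label = c("Mesodinium_rubrum", "Mesodinium_rubrum", "Dinophysis")
)

# DYYYYMMDDTHHMMSS_IFCBXXX_ZZZZZ with an optional .png extension.
ifcb_pattern <- "^D[0-9]{8}T[0-9]{6}_IFCB[0-9]+_[0-9]+(\\.png)?$"

# Keep only filenames that follow the IFCB naming scheme.
valid <- grepl(ifcb_pattern, predictions$image)
if (any(!valid)) {
  message("Dropping filenames that do not match the IFCB pattern: ",
          paste(predictions$image[!valid], collapse = ", "))
}
predictions <- predictions[valid, ]

custom_images <- predictions$image
custom_classes <- predictions$label
stopifnot(length(custom_images) == length(custom_classes))

# Split out sample name and ROI number for a quick sanity check.
stem <- sub("\\.png$", "", custom_images)
sample_name <- sub("_[0-9]+$", "", stem)
roi_number <- as.integer(sub("^.*_", "", stem))
```

The resulting vectors can then be passed as custom_images and custom_classes to ifcb_summarize_biovolumes(), as in the Examples section.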
Biovolumes are converted to carbon for each region of interest (ROI) according to Menden-Deuer
and Lessard (2000), applying different conversion factors to diatoms and non-diatom protists. The
diatom relationship is selected with diatom_equation ("large", the default, or "all"). If
hdr_folder is provided, the function also reads the sample volume from the HDR files to compute
biovolume and carbon content per liter of sample.
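The per-ROI conversion can be sketched as follows. It assumes that feature-file biovolumes are in cubic pixels (so volume scales by micron_factor^3) and uses the carbon-to-volume coefficients published by Menden-Deuer and Lessard (2000) for non-diatom protists (0.216, 0.939), diatoms of all sizes (0.288, 0.811), and diatoms larger than 3000 µm³ (0.117, 0.881). The helper carbon_pg() is hypothetical and not part of the package.

```r
micron_factor <- 1 / 3.4  # microns per pixel (the documented default)

# Menden-Deuer and Lessard (2000): carbon (pg) = a * volume (µm³)^b
carbon_pg <- function(volume_um3, is_diatom, diatom_equation = c("large", "all")) {
  diatom_equation <- match.arg(diatom_equation)
  diatom_coef <- switch(diatom_equation,
    large = c(a = 0.117, b = 0.881),  # large-diatom fit, applied to every diatom ROI
    all   = c(a = 0.288, b = 0.811)   # all-sizes diatom fit
  )
  protist_coef <- c(a = 0.216, b = 0.939)  # non-diatom protists
  a <- ifelse(is_diatom, diatom_coef[["a"]], protist_coef[["a"]])
  b <- ifelse(is_diatom, diatom_coef[["b"]], protist_coef[["b"]])
  a * volume_um3^b
}

# Hypothetical ROIs: biovolume in cubic pixels, plus a diatom flag.
biovolume_px3 <- c(50000, 200000, 200000)
is_diatom <- c(FALSE, TRUE, TRUE)

# Lengths scale by micron_factor, so volumes scale by micron_factor^3.
biovolume_um3 <- biovolume_px3 * micron_factor^3

c_large <- carbon_pg(biovolume_um3, is_diatom, "large")
c_all <- carbon_pg(biovolume_um3, is_diatom, "all")
```

For these mid-sized diatom ROIs (about 5100 µm³), "all" yields more carbon than "large", while the non-diatom ROI is unaffected by diatom_equation.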
If use_python = TRUE, the function tries to read the .mat file using ifcb_read_mat(), which relies on SciPy.
This approach may be faster than the default R reader, especially for large .mat files.
To enable this functionality, ensure Python is properly configured with the required dependencies.
You can initialize the Python environment and install necessary packages using ifcb_py_install().
References
Menden-Deuer, S. and Lessard, E. J. (2000), Carbon to volume relationships for dinoflagellates, diatoms, and other protist plankton. Limnology and Oceanography 45(3), 569–579, doi: 10.4319/lo.2000.45.3.0569.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnology and Oceanography: Methods 5, 204–216.
Groves, G. J. J., Arthur, G., Bresnan, E., Whyte, C., Arce, P. and Davidson, K. (2026), Automatic enumeration of chains of marine diatoms using "You Only Look Once" - a machine learning approach. Journal of Plankton Research 48(2), fbaf064, doi: 10.1093/plankt/fbaf064.
Examples
if (FALSE) { # \dontrun{
# Example usage:
ifcb_summarize_biovolumes(feature_folder = "path/to/features",
                          class_files = "path/to/classified",
                          hdr_folder = "path/to/hdr")

# Using a custom classification result:
images <- c("D20220522T003051_IFCB134_00002",
            "D20220522T003051_IFCB134_00003")
classes <- c("Mesodinium_rubrum",
             "Mesodinium_rubrum")
ifcb_summarize_biovolumes(feature_folder = "path/to/features",
                          hdr_folder = "path/to/hdr",
                          custom_images = images,
                          custom_classes = classes)
} # }