This function generates and saves data about a dataset's Particle Size Distribution (PSD) from Imaging FlowCytobot (IFCB) feature and hdr files, which can be used for data quality assurance and quality control.
Usage
ifcb_psd(
  feature_folder,
  hdr_folder,
  bins = NULL,
  save_data = FALSE,
  output_file = NULL,
  plot_folder = NULL,
  use_marker = FALSE,
  start_fit = 10,
  r_sqr = 0.5,
  beads = NULL,
  bubbles = NULL,
  incomplete = NULL,
  missing_cells = NULL,
  biomass = NULL,
  bloom = NULL,
  humidity = NULL,
  micron_factor = 1/3.4,
  fea_v = 2,
  use_plot_subfolders = TRUE,
  ...
)
Arguments
- feature_folder
The absolute path to a directory containing all of the feature files for the dataset (the feature file version can be set with fea_v).
- hdr_folder
The absolute path to a directory containing all of the hdr files for the dataset.
- bins
An optional character vector of bin names (e.g., "D20251021T133007_IFCB134") to restrict processing to a specified subset of bins. If NULL (default), all bins present in feature_folder are processed.
- save_data
A boolean indicating whether to save data to CSV files. Default is FALSE.
- output_file
A string with the base file name for the .csv output (including path). Set to NULL to avoid saving data (default).
- plot_folder
The folder where graph images for each sample will be saved. If NULL (default), plots are not saved. If use_plot_subfolders = TRUE, plots are organized into subfolders based on their flag status.
- use_marker
A boolean indicating whether to show markers on the plot. Default is FALSE.
- start_fit
An integer indicating the start fit value for the plot. Default is 10.
- r_sqr
The lower limit of acceptable R^2 values (any curves below it will be flagged). Default is 0.5.
- beads
The maximum multiplier for the curve fit. Any files with higher curve fit multipliers will be flagged as bead runs. If this argument is included, files with "runBeads" marked as TRUE in the header file will also be flagged. Optional.
- bubbles
The minimum difference between the starting ESD and the ESD with the most targets. Files with a difference higher than this threshold will be flagged as mostly bubbles. Optional.
- incomplete
A numeric vector of length 2 giving the minimum cell concentration (in cells per liter, c/L) and the minimum volume analyzed (in mL) for a complete run. Files with values below these thresholds will be flagged as incomplete. Optional.
- missing_cells
The minimum image count ratio threshold. Files with ratios below this value will be flagged as missing cells. Optional.
- biomass
The minimum number of targets in the most populated ESD bin for any given run. Files with fewer targets will be flagged as low biomass. Optional.
- bloom
The minimum difference between the starting ESD and the ESD with the most targets. Files with a difference less than this threshold will be flagged as bloom events. This threshold is usually lower than the bubbles threshold. Optional.
- humidity
The maximum percent humidity. Files with higher values will be flagged as high humidity. Optional.
- micron_factor
The conversion factor to microns. Default is 1/3.4.
- fea_v
The version number of the IFCB feature file (e.g., 2, 4). Default is 2, as described in Hayashi et al. 2025.
- use_plot_subfolders
A boolean indicating whether to save plots in subfolders based on the sample's flag status. If TRUE (default), samples without flags are saved in a "PSD.OK" subfolder, and samples with flags are saved in subfolders named after their flag(s). If FALSE, all plots are saved directly in plot_folder.
- ...
Additional arguments passed to ggsave(). These override the default width, height, dpi, and background color when saving plots. For example, width = 7, dpi = 300 can be supplied.
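The bubbles and bloom arguments both compare the same quantity (the difference between the starting ESD and the ESD with the most targets) against a threshold, in opposite directions. The following is a minimal sketch of that relationship only. The helper classify_peak_offset() is hypothetical and not part of iRfcb, and the sign convention (peak ESD minus starting ESD) is an assumption; the actual flagging is done by the underlying PSD Python code.

```r
# Hypothetical helper, NOT part of iRfcb: illustrates how the bubbles and
# bloom thresholds described above relate to one another.
# Assumption: the "difference" is computed as peak ESD minus starting ESD.
classify_peak_offset <- function(start_esd, peak_esd,
                                 bubbles = NULL, bloom = NULL) {
  offset <- peak_esd - start_esd
  flags <- character(0)
  # A difference above the bubbles threshold suggests a run of mostly bubbles
  if (!is.null(bubbles) && offset > bubbles) flags <- c(flags, "bubbles")
  # A difference below the bloom threshold suggests a bloom event
  if (!is.null(bloom) && offset < bloom) flags <- c(flags, "bloom")
  flags
}

# Thresholds taken from the example call in this page (bubbles = 150, bloom = 5)
far_peak   <- classify_peak_offset(13, 200, bubbles = 150, bloom = 5)
near_peak  <- classify_peak_offset(13, 15,  bubbles = 150, bloom = 5)
mid_peak   <- classify_peak_offset(13, 50,  bubbles = 150, bloom = 5)
```

Because bloom is usually lower than bubbles, a single sample would not normally receive both flags under this reading; leaving either argument as NULL skips that check.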
Value
A list containing three tibbles:
- data
A tibble with flattened PSD data for each sample.
- fits
A tibble containing curve fit parameters for each sample.
- flags
A tibble of flags for each sample, or NULL if no flags are found.
The save_data parameter only controls whether CSV files are written to disk; the
function always returns this list.
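Because flags may be NULL, code that consumes the returned list should probably check for that case before counting or filtering flagged samples. The sketch below uses a mock list built from base data frames as a stand-in for the real return value; the column name sample is a placeholder, not the package's actual column name.

```r
# Mock stand-in for the list returned by ifcb_psd().
# Assumption: the column name "sample" is a placeholder for illustration.
psd_result <- list(
  data  = data.frame(sample = c("bin_a", "bin_b")),
  fits  = data.frame(sample = c("bin_a", "bin_b")),
  flags = NULL  # returned as NULL when no flags are found
)

# Guard against the NULL case before using the flags element
n_flagged <- if (is.null(psd_result$flags)) 0L else nrow(psd_result$flags)
message(n_flagged, " flagged sample(s)")
```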
Details
The PSD function originates from the PSD Python repository (Hayashi et al. 2025),
which can be found at https://github.com/kudelalab/PSD.
Python must be installed to use this function. The required Python packages can be
installed in a virtual environment using ifcb_py_install().
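Since the function depends on Python, a quick pre-check can give a clearer error than a failed call. This is a minimal sketch using base R only; it merely looks for a Python executable on the PATH, which is an assumption about how Python is installed and may not match how ifcb_py_install() locates or configures it.

```r
# Look for a Python executable on the PATH (base R only).
# Assumption: Python is discoverable as "python3" or "python".
python_path <- Sys.which(c("python3", "python"))
python_found <- any(nzchar(python_path))

if (!python_found) {
  message("No Python executable found on PATH; install Python before ",
          "setting up the environment.")
}
```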
References
Hayashi, K., Enslein, J., Lie, A., Smith, J., Kudela, R.M., 2025. Using particle size distribution (PSD) to automate imaging flow cytobot (IFCB) data quality in coastal California, USA. International Society for the Study of Harmful Algae. https://doi.org/10.15027/0002041270
Examples
if (FALSE) { # \dontrun{
  # Initialize the Python session if not already set up
  ifcb_py_install()
  ifcb_psd(
    feature_folder = 'path/to/features',
    hdr_folder = 'path/to/hdr_data',
    bins = c("D20211021T133007_IFCB134", "D20211021T140753_IFCB134"),
    save_data = TRUE,
    output_file = 'psd/svea_2021',
    plot_folder = 'psd/plots',
    use_marker = FALSE,
    start_fit = 13,
    r_sqr = 0.5,
    beads = 10^9,
    bubbles = 150,
    incomplete = c(1500, 3),
    missing_cells = 0.7,
    biomass = 1000,
    bloom = 5,
    humidity = NULL,
    micron_factor = 1/2.77,
    fea_v = 2
  )
} # }
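The micron_factor values on this page (1/3.4 by default, 1/2.77 in the example) can be read as microns per pixel, on the assumption that the feature files report sizes in pixels and that the denominator is the instrument's pixels-per-micron calibration. Under that assumption, the conversion is a single multiplication, sketched here with made-up pixel values:

```r
# Assumption: sizes in the feature files are in pixels, and the instrument
# resolves about 3.4 pixels per micron (the default used on this page).
pixels_per_micron <- 3.4
micron_factor <- 1 / pixels_per_micron

# Made-up equivalent spherical diameters in pixels, for illustration only
esd_pixels <- c(34, 68, 170)

# Convert to microns: roughly 10, 20 and 50 microns for these inputs
esd_microns <- esd_pixels * micron_factor
```

An instrument calibrated at a different resolution would use its own value, as the example does with 1/2.77.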
