Introduction
Annotated images can be shared as zipped .png packages
through various data repositories (e.g., Kraft et al., 2022; Torstensson
et al., 2024), enabling others to train or enhance their image
classifiers. This vignette provides a step-by-step guide to extracting
and preparing such images for publication using the iRfcb package.
The workflow assumes that Regions of Interest (ROIs) have been
annotated using the MATLAB code from the ifcb-analysis
repository (Sosik and Olson, 2007). However, the methods presented can
be adapted to process images generated by other software platforms. The
archive can be shared through various repositories, such as Figshare, Zenodo, and EUDAT. Links to some repositories
from Northern Europe are gathered at the Nordic
Microalgae webpage. Images may also be shared through EcoTaxa, which
is demonstrated in vignette("ecotaxa-tutorial").
Additionally, this vignette shows how users of the ifcb-analysis
package can share and merge multiple datasets of manually annotated
images, enabling MATLAB users to incorporate external datasets into
their random forest algorithms.
Getting Started
Installation
You can install the package from CRAN using:
install.packages("iRfcb")
Some functions from the iRfcb package used in this
tutorial require Python to be installed. You can download
Python from the official website: python.org/downloads.
The iRfcb package can be configured to automatically
activate an installed Python virtual environment (venv) upon loading by
setting an environment variable. For more details, please refer to the
package README.
Load the iRfcb library:
library(iRfcb)
Download Sample Data
To get started, download sample data from the SMHI IFCB Plankton Image Reference Library (Torstensson et al. 2024) with the following function:
# Define data directory
data_dir <- "data"
# Download and extract test data in the data folder
ifcb_download_test_data(
  dest_dir = data_dir,
  max_retries = 10,
  sleep_time = 30,
  verbose = FALSE
)
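Before continuing, it can help to confirm that the download produced the folders used in the rest of this tutorial. The following is a minimal sketch in base R; `check_data_layout()` is a hypothetical helper (not part of iRfcb), and the expected subfolder names are assumed from the paths used in the later steps (`data/config`, `data/data`, `data/features`, `data/manual`).

```r
# Hypothetical helper (not part of iRfcb): report which of the expected
# subfolders exist under a data directory.
check_data_layout <- function(data_dir,
                              expected = c("config", "data", "features", "manual")) {
  present <- dir.exists(file.path(data_dir, expected))
  names(present) <- expected
  present
}

# Check the tutorial's data directory and report anything missing
layout <- check_data_layout("data")
print(layout)
if (!all(layout)) {
  message("Missing subfolders: ", paste(names(layout)[!layout], collapse = ", "))
}
```

If any folder is reported as missing, rerunning the download step is probably the simplest fix.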
Extract Annotated Images
Extract annotated ROIs as .png images in subfolders for
each class, skipping the unclassified category (class ID 1):
# Extract .png images
ifcb_extract_annotated_images(
  manual_folder = "data/manual",
  class2use_file = "data/config/class2use.mat",
  roi_folders = "data/data",
  out_folder = "data/extracted_images",
  skip_class = 1, # or "unclassified"
  verbose = FALSE # Do not print messages
)
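Since the images are written to one subfolder per class, a quick count per subfolder gives an overview of what was extracted. The sketch below uses only base R; `count_pngs_per_class()` is a hypothetical helper (not part of iRfcb) and assumes the .png files sit directly inside each class subfolder.

```r
# Hypothetical helper (not part of iRfcb): count .png files in each
# class subfolder of an extraction directory.
count_pngs_per_class <- function(out_folder) {
  class_dirs <- list.dirs(out_folder, full.names = TRUE, recursive = FALSE)
  counts <- vapply(
    class_dirs,
    function(d) length(list.files(d, pattern = "\\.png$", ignore.case = TRUE)),
    integer(1)
  )
  data.frame(
    class = basename(class_dirs),
    n_images = unname(counts),
    stringsAsFactors = FALSE
  )
}

# Count the extracted images, largest classes first
png_counts <- count_pngs_per_class("data/extracted_images")
png_counts[order(-png_counts$n_images), ]
```

Classes with very few images may be worth reviewing before the dataset is published.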
Package PNG Directory
Prepare the PNG directory for publication as a zip-archive, similar
to the files in the SMHI IFCB Plankton
Image Reference Library (Torstensson et al. 2024). The ifcb_zip_pngs() function
reads, updates, and incorporates a README file into the
zip archive. A template README file is included with
the iRfcb package.
# Create zip-archive
ifcb_zip_pngs(
  png_folder = "data/extracted_images",
  zip_filename = "data/zip/ifcb_annotated_images_corrected.zip",
  # Template included in `iRfcb`
  readme_file = system.file("exdata/README-template.md", package = "iRfcb"),
  email_address = "tutorial@test.com",
  version = "1.1",
  print_progress = FALSE
)
## Creating README file...
## Creating MANIFEST.txt...
## Creating zip archive...
## Zip archive created successfully: /home/runner/work/iRfcb/iRfcb/vignettes/data/zip/ifcb_annotated_images_corrected.zip
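The archive can be inspected without extracting it, using `utils::unzip()` with `list = TRUE`. The following sketch summarises such a listing; `summarise_zip_listing()` is a hypothetical helper (not part of iRfcb), and the zip path is the one assumed from the call above.

```r
# Hypothetical helper (not part of iRfcb): summarise an archive listing
# as returned by utils::unzip(zipfile, list = TRUE), which has the
# columns Name, Length and Date.
summarise_zip_listing <- function(listing) {
  # Keep only the first path component of each entry
  top_level <- sub("/.*$", "", listing$Name)
  data.frame(
    n_files = nrow(listing),
    n_png = sum(grepl("\\.png$", listing$Name, ignore.case = TRUE)),
    n_top_level_entries = length(unique(top_level)),
    total_mb = sum(listing$Length) / 1024^2
  )
}

# Only inspect the archive if it has been created
zip_path <- "data/zip/ifcb_annotated_images_corrected.zip"
if (file.exists(zip_path)) {
  listing <- utils::unzip(zip_path, list = TRUE)
  print(summarise_zip_listing(listing))
}
```

Comparing the number of .png entries with the number of extracted images is a simple sanity check before uploading.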
Package MATLAB Directory
Prepare the MATLAB directory for publication as a zip-archive, similar to the files in the SMHI IFCB Plankton Image Reference Library:
# Create zip-archive
ifcb_zip_matlab(
  manual_folder = "data/manual",
  features_folder = "data/features",
  class2use_file = "data/config/class2use.mat",
  zip_filename = "data/zip/ifcb_matlab_files_corrected.zip",
  data_folder = "data/data",
  # Templates included in `iRfcb`
  readme_file = system.file("exdata/README-template.md", package = "iRfcb"),
  matlab_readme_file = system.file("exdata/MATLAB-template.md", package = "iRfcb"),
  email_address = "tutorial@test.com",
  version = "1.1",
  print_progress = FALSE
)
## Listing all files...
## Copying manual files...
## Copying feature files...
## Copying data files...
## Copying class2use file...
## Creating README file...
## Creating MANIFEST.txt...
## Creating zip archive...
## Zip archive created successfully: /home/runner/work/iRfcb/iRfcb/vignettes/data/zip/ifcb_matlab_files_corrected.zip
Create MANIFEST.txt
Create a manifest file for the zip-archive (required for some data repositories):
# Create MANIFEST.txt of the zip folder content
ifcb_create_manifest("data/zip")
## MANIFEST.txt has been created at data/zip/MANIFEST.txt
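Conceptually, a manifest is an inventory of the files in a folder. The sketch below builds such an inventory as a data frame in base R, which can be handy for cross-checking; it is only an illustration and is not claimed to match the format that `ifcb_create_manifest()` writes. `list_files_with_sizes()` is a hypothetical helper (not part of iRfcb).

```r
# Hypothetical helper (not part of iRfcb): list all files below a folder
# (relative paths) together with their sizes in bytes.
list_files_with_sizes <- function(folder) {
  paths <- list.files(folder, recursive = TRUE, full.names = FALSE)
  data.frame(
    file = paths,
    size_bytes = file.size(file.path(folder, paths)),
    stringsAsFactors = FALSE
  )
}

# Inventory of the folder that holds the zip archives
list_files_with_sizes("data/zip")
```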
Merge Manual Datasets
Datasets that have been manually annotated using the MATLAB code from
the ifcb-analysis repository (Sosik and Olson 2007) can be merged using the
ifcb_merge_manual() function. This is a wrapper around
the ifcb_create_class2use(), ifcb_replace_mat_values() and
ifcb_adjust_classes() functions.
In this example, two datasets from the Swedish west coast are downloaded from the SMHI IFCB Plankton Image Reference Library (version 3) (Torstensson et al. 2024) and combined into a single dataset. Please note that these datasets are large, and the downloading and merging processes may take considerable time.
# Define data directories
skagerrak_kattegat_dir <- "data_skagerrak_kattegat"
tangesund_dir <- "data_tangesund"
merged_dir <- "data_skagerrak_kattegat_tangesund_merged"
# Download and extract Skagerrak-Kattegat data in the data folder
ifcb_download_test_data(dest_dir = skagerrak_kattegat_dir,
                        figshare_article = "48158725")
# Download and extract Tångesund data in the data folder
ifcb_download_test_data(dest_dir = tangesund_dir,
                        figshare_article = "48158731")
# Initialize the Python session if not already set up
env_path <- file.path(tempdir(), "iRfcb") # Or your preferred venv path
ifcb_py_install(envname = env_path)
# Merge Skagerrak-Kattegat and Tångesund to a single dataset
ifcb_merge_manual(
  class2use_file_base = file.path(skagerrak_kattegat_dir, "config/class2use.mat"),
  class2use_file_additions = file.path(tangesund_dir, "config/class2use.mat"),
  class2use_file_output = file.path(merged_dir, "config/class2use.mat"),
  manual_folder_base = file.path(skagerrak_kattegat_dir, "manual"),
  manual_folder_additions = file.path(tangesund_dir, "manual"),
  manual_folder_output = file.path(merged_dir, "manual")
)
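To illustrate why merging requires both a combined class list and adjusted class indices, here is a toy sketch in base R. It is a simplified illustration under stated assumptions and may differ from how `ifcb_merge_manual()` works internally; the class names and annotation values are made up for the example.

```r
# Toy class lists; the names are illustrative, not taken from the datasets.
base_classes <- c("unclassified", "Dinophysis_acuminata", "Skeletonema_marinoi")
additional_classes <- c("unclassified", "Skeletonema_marinoi", "Tripos_muelleri")

# Keep the base order and append classes that only occur in the additions.
merged_classes <- union(base_classes, additional_classes)

# Map each class index of the additional dataset to its index in the merged list.
index_map <- match(additional_classes, merged_classes)

# Example annotations from the additional dataset, stored as class indices.
additional_annotations <- c(2, 3, 3, 1)
remapped_annotations <- index_map[additional_annotations]

merged_classes
remapped_annotations
```

In this toy case, "Skeletonema_marinoi" moves from index 2 in the additional list to index 3 in the merged list, so annotations that refer to it by number have to be rewritten accordingly.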
This concludes the tutorial for the iRfcb package. For
more detailed information, refer to the package documentation or the
other tutorials.
See how data pipelines can be constructed using iRfcb in
the following Example Project. Happy analyzing!
Citation
## To cite package 'iRfcb' in publications use:
##
## Anders Torstensson (2025). iRfcb: Tools for Managing Imaging
## FlowCytobot (IFCB) Data. R package version 0.4.3.
## https://CRAN.R-project.org/package=iRfcb
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {iRfcb: Tools for Managing Imaging FlowCytobot (IFCB) Data},
## author = {Anders Torstensson},
## year = {2025},
## note = {R package version 0.4.3},
## url = {https://CRAN.R-project.org/package=iRfcb},
## }
References
- Kraft, K., Velhonoja, O., Seppälä, J., Hällfors, H., Suikkanen, S., Ylöstalo, P., Anglès, S., Kielosto, S., Kuosa, H., Lehtinen, S., Oja, J., Tamminen, T. (2022). SYKE-plankton_IFCB_2022 [Data set]. https://b2share.eudat.eu. https://doi.org/10.23728/b2share.abf913e5a6ad47e6baa273ae0ed6617a
- Sosik, H. M. and Olson, R. J. (2007). Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr.: Methods 5, 204–216.
- Torstensson, A., Skjevik, A-T., Mohlin, M., Karlberg, M. and Karlson, B. (2024). SMHI IFCB Plankton Image Reference Library. SciLifeLab. Dataset. https://doi.org/10.17044/scilifelab.25883455.v3