Introduction
Annotated images can be shared as zipped .png packages through various data repositories (e.g., Kraft et al., 2022; Torstensson et al., 2024), enabling others to train or enhance their image classifiers. This vignette provides a step-by-step guide to extracting and preparing such images for publication using the iRfcb package.
The workflow assumes that Regions of Interest (ROIs) have been annotated using the MATLAB code from the ifcb-analysis repository (Sosik and Olson, 2007). However, the methods presented can be adapted to process images generated by other software platforms. The archive can be shared through various repositories, such as Figshare, Zenodo, or EUDAT. Links to some repositories from Northern Europe are gathered at the Nordic Microalgae webpage. Images may also be shared through EcoTaxa, which is demonstrated in the Prepare IFCB Images for EcoTaxa tutorial.
Additionally, this vignette shows how users of the ifcb-analysis package can share and merge multiple datasets of manually annotated images, enabling MATLAB users to incorporate external datasets into their random forest algorithms.
Getting Started
Installation
You can install the package from GitHub using the devtools package:
# install.packages("devtools")
devtools::install_github("EuropeanIFCBGroup/iRfcb",
dependencies = TRUE)
Load the iRfcb library:
library(iRfcb)
Download Sample Data
To get started, download sample data from the SMHI IFCB Plankton Image Reference Library (Torstensson et al. 2024) with the following function:
# Define data directory
data_dir <- "data"
# Download and extract test data in the data folder
ifcb_download_test_data(dest_dir = data_dir,
max_retries = 10,
sleep_time = 30,
verbose = FALSE)
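Before moving on, it can help to confirm that the download produced the folders and files referenced in the rest of this tutorial. The sketch below uses only base R. The helper `check_layout()` and the vector `expected_paths` are hypothetical names introduced for illustration, and the listed layout is an assumption based on the paths used later in this vignette.

```r
# Paths used later in this tutorial, relative to the download directory.
# Assumption: the sample archive unpacks into this layout.
expected_paths <- c("manual", "config/class2use.mat", "data", "features")

# Hypothetical helper: report which of the expected paths exist under `root`
check_layout <- function(root, paths = expected_paths) {
  found <- file.exists(file.path(root, paths))
  stats::setNames(found, paths)
}

# Returns a named logical vector; FALSE entries point to missing input
check_layout("data")
```

Any `FALSE` entry suggests that the download was incomplete or that the archive uses a different layout than assumed here.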
Extract Annotated Images
Extract annotated ROIs as .png images in subfolders for each class, skipping the unclassified (class id 1) category:
# Extract .png images
ifcb_extract_annotated_images(manual_folder = "data/manual",
class2use_file = "data/config/class2use.mat",
roi_folder = "data/data",
out_folder = "data/extracted_images",
skip_class = 1, # or "unclassified"
verbose = FALSE) # Do not print messages
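A quick way to review the result of the extraction is to count the images in each class subfolder. The following is a minimal base R sketch, assuming one subfolder per class containing .png files as described above. `count_images_per_class()` is a hypothetical helper and not part of iRfcb.

```r
# Hypothetical helper: count .png files in each class subfolder of `out_folder`
count_images_per_class <- function(out_folder) {
  # One subfolder per class is assumed
  class_dirs <- list.dirs(out_folder, full.names = TRUE, recursive = FALSE)
  counts <- vapply(
    class_dirs,
    function(d) length(list.files(d, pattern = "\\.png$", ignore.case = TRUE)),
    integer(1)
  )
  names(counts) <- basename(class_dirs)
  # Largest classes first
  sort(counts, decreasing = TRUE)
}

count_images_per_class("data/extracted_images")
```

Classes with very few images may be worth reviewing before publication, since they contribute little to classifier training.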
Package PNG Directory
Prepare the PNG directory for publication as a zip archive, similar to the files in the SMHI IFCB Plankton Image Reference Library (Torstensson et al. 2024). The ifcb_zip_pngs function reads, updates, and incorporates a README file into the zip archive. A template README file is included with the iRfcb package.
# Create zip-archive
ifcb_zip_pngs(png_folder = "data/extracted_images",
zip_filename = "data/zip/ifcb_annotated_images_corrected.zip",
readme_file = system.file("exdata/README-template.md",
package = "iRfcb"), # Template included in `iRfcb`
email_address = "tutorial@test.com",
version = "1.1",
print_progress = FALSE)
## Creating README file...
## Creating MANIFEST.txt...
## Creating zip archive...
## Zip archive created successfully: /home/runner/work/iRfcb/iRfcb/vignettes/data/zip/ifcb_annotated_images_corrected.zip
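The archive can be inspected without extracting it, for example to confirm that the README and MANIFEST.txt mentioned in the messages above were included. This is a minimal sketch using `utils::unzip()` with `list = TRUE`. `list_zip_contents()` is a hypothetical helper, and the exact file names inside the archive are an assumption.

```r
# Hypothetical helper: list the entries of a zip archive without extracting it.
# Returns NULL when the archive does not exist.
list_zip_contents <- function(zip_path) {
  if (!file.exists(zip_path)) {
    return(NULL)
  }
  # Data frame with the columns Name, Length and Date
  utils::unzip(zip_path, list = TRUE)
}

contents <- list_zip_contents("data/zip/ifcb_annotated_images_corrected.zip")
if (!is.null(contents)) {
  print(head(contents))
  # Assumption: the README and manifest carry these strings in their names
  print(grep("README|MANIFEST", contents$Name, value = TRUE))
}
```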
Package MATLAB Directory
Prepare the MATLAB directory for publication as a zip-archive, similar to the files in the SMHI IFCB Plankton Image Reference Library:
# Create zip-archive
ifcb_zip_matlab(manual_folder = "data/manual",
features_folder = "data/features",
class2use_file = "data/config/class2use.mat",
zip_filename = "data/zip/ifcb_matlab_files_corrected.zip",
data_folder = "data/data",
readme_file = system.file("exdata/README-template.md",
package = "iRfcb"), # Template included in `iRfcb`
matlab_readme_file = system.file("exdata/MATLAB-template.md",
package = "iRfcb"), # Template included in `iRfcb`
email_address = "tutorial@test.com",
version = "1.1",
print_progress = FALSE)
## Listing all files...
## Copying manual files...
## Copying feature files...
## Copying data files...
## Copying class2use file...
## Creating README file...
## Creating MANIFEST.txt...
## Creating zip archive...
## Zip archive created successfully: /home/runner/work/iRfcb/iRfcb/vignettes/data/zip/ifcb_matlab_files_corrected.zip
Create MANIFEST.txt
Create a manifest file for the zip-archive (required for some data repositories):
# Create MANIFEST.txt of the zip folder content
ifcb_create_manifest("data/zip")
## MANIFEST.txt has been created at data/zip/MANIFEST.txt
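To check what was written, the first lines of the manifest can be read back with base R. `preview_manifest()` is a hypothetical helper. The sketch makes no assumption about the manifest's internal format beyond it being a plain text file.

```r
# Hypothetical helper: return the first `n` lines of a manifest, or NULL if absent
preview_manifest <- function(path, n = 5) {
  if (!file.exists(path)) {
    return(NULL)
  }
  readLines(path, n = n, warn = FALSE)
}

preview_manifest("data/zip/MANIFEST.txt")
```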
Merge Manual Datasets
Datasets that have been manually annotated using the MATLAB code from the ifcb-analysis repository (Sosik and Olson 2007) can be merged using the ifcb_merge_manual function. This function wraps the ifcb_create_class2use, ifcb_replace_mat_values, and ifcb_adjust_classes functions.
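The general idea behind merging two annotated datasets can be illustrated with a toy example in base R: build a combined class list, then remap the class ids of the added dataset so that they point to the same class names in the combined list. This is a conceptual sketch only and not the implementation used by iRfcb. The class names and ids below are made-up illustrative values.

```r
# Toy class lists; the position in the vector is the class id
# (illustrative values only)
base_classes      <- c("unclassified", "Dinophysis", "Skeletonema")
additions_classes <- c("unclassified", "Skeletonema", "Ceratium")

# 1. Combined class list: keep the base order, append classes new to it
merged_classes <- union(base_classes, additions_classes)

# 2. Lookup from old ids in the additions dataset to ids in the merged list
id_map <- match(additions_classes, merged_classes)

# 3. Remap annotations (one class id per ROI) from the additions dataset
additions_ids <- c(2, 3, 3, 1)
remapped_ids  <- id_map[additions_ids]

# The class names are unchanged by the remapping, only the ids differ
merged_classes[remapped_ids]
```

Keeping the base order intact means that annotations from the base dataset need no remapping in this toy setup. Only ids from the added dataset change.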
In this example, two datasets from the Swedish west coast are downloaded from the SMHI IFCB Plankton Image Reference Library (version 3) (Torstensson et al. 2024) and combined into a single dataset. Please note that these datasets are large, and the downloading and merging processes may take considerable time.
# Define data directories
skagerrak_kattegat_dir <- "data_skagerrak_kattegat"
tangesund_dir <- "data_tangesund"
merged_dir <- "data_skagerrak_kattegat_tangesund_merged"
# Download and extract Skagerrak-Kattegat data into its own folder
ifcb_download_test_data(dest_dir = skagerrak_kattegat_dir,
figshare_article = "48158725")
# Download and extract Tångesund data into its own folder
ifcb_download_test_data(dest_dir = tangesund_dir,
figshare_article = "48158731")
# Initialize the Python session if not already set up
# env_path <- "~/.virtualenvs/iRfcb"
# ifcb_py_install(envname = env_path)
# Merge Skagerrak-Kattegat and Tångesund to a single dataset
ifcb_merge_manual(class2use_file_base = file.path(skagerrak_kattegat_dir, "config/class2use.mat"),
class2use_file_additions = file.path(tangesund_dir, "config/class2use.mat"),
class2use_file_output = file.path(merged_dir, "config/class2use.mat"),
manual_folder_base = file.path(skagerrak_kattegat_dir, "manual"),
manual_folder_additions = file.path(tangesund_dir, "manual"),
manual_folder_output = file.path(merged_dir, "manual")
)
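After the merge, a simple sanity check is to compare the number of manual annotation files in the two input folders and in the output folder. The sketch below uses base R and assumes that manual annotation files carry the .mat extension. `count_mat_files()` is a hypothetical helper, and the folder names repeat those defined above.

```r
# Hypothetical helper: count .mat files directly inside `folder`
count_mat_files <- function(folder) {
  length(list.files(folder, pattern = "\\.mat$", ignore.case = TRUE))
}

# Folder names as defined earlier in this section
manual_folders <- c(
  base      = file.path("data_skagerrak_kattegat", "manual"),
  additions = file.path("data_tangesund", "manual"),
  merged    = file.path("data_skagerrak_kattegat_tangesund_merged", "manual")
)

# Named integer vector with one count per folder
vapply(manual_folders, count_mat_files, integer(1))
```

A merged count that is much lower than expected may indicate that the merge did not complete or that files in the two datasets share names.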
This concludes this tutorial for the iRfcb package. For more detailed information, refer to the package documentation or the other tutorials. See how data pipelines can be constructed using iRfcb in the following Example Project. Happy analyzing!
Citation
## To cite package 'iRfcb' in publications use:
##
## Anders Torstensson (2025). I 'R' FlowCytobot (iRfcb): Tools for
## Analyzing and Processing Data from the IFCB. R package version 0.4.0.
## https://doi.org/10.5281/zenodo.12533225
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {I 'R' FlowCytobot (iRfcb): Tools for Analyzing and Processing Data from the IFCB},
## author = {Anders Torstensson},
## year = {2025},
## note = {R package version 0.4.0},
## url = {https://doi.org/10.5281/zenodo.12533225},
## }
References
- Kraft, K., Velhonoja, O., Seppälä, J., Hällfors, H., Suikkanen, S., Ylöstalo, P., Anglès, S., Kielosto, S., Kuosa, H., Lehtinen, S., Oja, J., Tamminen, T. (2022). SYKE-plankton_IFCB_2022 [Data set]. https://b2share.eudat.eu. https://doi.org/10.23728/b2share.abf913e5a6ad47e6baa273ae0ed6617a
- Sosik, H. M. and Olson, R. J. (2007). Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr.: Methods 5, 204–216.
- Torstensson, A., Skjevik, A-T., Mohlin, M., Karlberg, M. and Karlson, B. (2024). SMHI IFCB Plankton Image Reference Library. SciLifeLab. Dataset. https://doi.org/10.17044/scilifelab.25883455.v3