Introduction

Annotated images can be shared as zipped .png packages through various data repositories (e.g., Kraft et al., 2022; Torstensson et al., 2024), enabling others to train or enhance their image classifiers. This vignette provides a step-by-step guide to extracting and preparing such images for publication using the iRfcb package.

The workflow assumes that Regions of Interest (ROIs) have been annotated using the MATLAB code from the ifcb-analysis repository (Sosik and Olson, 2007). However, the methods presented can be adapted to process images generated by other software platforms. The archive can be shared through repositories such as Figshare, Zenodo, and EUDAT. Links to some repositories from Northern Europe are gathered on the Nordic Microalgae webpage. Images may also be shared through EcoTaxa, which is demonstrated in the Prepare IFCB Images for EcoTaxa tutorial.

Additionally, this vignette shows how users of the ifcb-analysis package can share and merge multiple datasets of manually annotated images, enabling MATLAB users to incorporate external datasets into their random forest algorithms.

Getting Started

Installation

You can install the package from GitHub using the devtools package:

# install.packages("devtools")
devtools::install_github("EuropeanIFCBGroup/iRfcb",
                         dependencies = TRUE)

Load the iRfcb library:

library(iRfcb)

Download Sample Data

To get started, download sample data from the SMHI IFCB Plankton Image Reference Library (Torstensson et al. 2024) with the following function:

# Define data directory
data_dir <- "data"

# Download and extract test data in the data folder
ifcb_download_test_data(dest_dir = data_dir,
                        max_retries = 10,
                        sleep_time = 30,
                        verbose = FALSE)
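Before moving on, it can help to confirm that the download produced the folders used in the rest of this vignette. The sketch below uses only base R. The helper name `check_ifcb_layout` is hypothetical and not part of iRfcb, and the list of expected paths is an assumption taken from the function calls in this tutorial rather than from the package documentation.

```r
# Hypothetical helper (not part of iRfcb): report which of the paths used
# later in this vignette exist under a data directory.
check_ifcb_layout <- function(data_dir,
                              expected = c("manual", "features", "data",
                                           file.path("config", "class2use.mat"))) {
  paths <- file.path(data_dir, expected)
  data.frame(path = expected,
             exists = file.exists(paths),
             stringsAsFactors = FALSE)
}

# Demonstration on a throwaway directory that mimics part of the assumed layout
demo_dir <- file.path(tempdir(), "ifcb_layout_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(file.path(demo_dir, "manual"), recursive = TRUE)
dir.create(file.path(demo_dir, "config"), recursive = TRUE)
invisible(file.create(file.path(demo_dir, "config", "class2use.mat")))

layout <- check_ifcb_layout(demo_dir)
print(layout)
```

For the real download, calling `check_ifcb_layout("data")` would show at a glance whether any of the folders is missing before the extraction step is run.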

Extract Annotated Images

Extract annotated ROIs as .png images into one subfolder per class, skipping the unclassified category (class id 1):

# Extract .png images
ifcb_extract_annotated_images(manual_folder = "data/manual",
                              class2use_file = "data/config/class2use.mat",
                              roi_folder = "data/data",
                              out_folder = "data/extracted_images",
                              skip_class = 1, # or "unclassified"
                              verbose = FALSE) # Do not print messages
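After extraction, a quick per-class count is a useful sanity check before packaging. The following is a minimal base-R sketch, assuming the output folder contains one subfolder per class with .png files inside. The helper `count_images_per_class` and the class names `Class_A` and `Class_B` are hypothetical placeholders, not part of iRfcb or the sample data.

```r
# Hypothetical helper (not part of iRfcb): count .png files in each
# class subfolder of an extraction directory.
count_images_per_class <- function(out_folder) {
  class_dirs <- list.dirs(out_folder, full.names = TRUE, recursive = FALSE)
  counts <- vapply(class_dirs, function(d) {
    length(list.files(d, pattern = "\\.png$", ignore.case = TRUE))
  }, integer(1))
  data.frame(class = basename(class_dirs),
             n_images = unname(counts),
             stringsAsFactors = FALSE)
}

# Demonstration on a throwaway directory with placeholder class folders
demo_out <- file.path(tempdir(), "extracted_images_demo")
unlink(demo_out, recursive = TRUE)
dir.create(file.path(demo_out, "Class_A"), recursive = TRUE)
dir.create(file.path(demo_out, "Class_B"), recursive = TRUE)
invisible(file.create(file.path(demo_out, "Class_A", c("roi_1.png", "roi_2.png"))))
invisible(file.create(file.path(demo_out, "Class_B", c("roi_3.png", "notes.txt"))))

counts <- count_images_per_class(demo_out)
print(counts)
```

Applied to the tutorial data, the call would be `count_images_per_class("data/extracted_images")`. Classes with very few images are worth a second look before the dataset is published.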

Package PNG Directory

Prepare the PNG directory for publication as a zip-archive, similar to the files in the SMHI IFCB Plankton Image Reference Library (Torstensson et al. 2024). This function reads, updates, and incorporates a README file into the zip archive. A template README file is included with the iRfcb package.

# Create zip-archive
ifcb_zip_pngs(png_folder = "data/extracted_images",
              zip_filename = "data/zip/ifcb_annotated_images_corrected.zip",
              readme_file = system.file("exdata/README-template.md", 
                                        package = "iRfcb"), # Template included in `iRfcb`
              email_address = "tutorial@test.com",
              version = "1.1",
              print_progress = FALSE)
## Creating README file...
## Creating MANIFEST.txt...
## Creating zip archive...
## Zip archive created successfully: /home/runner/work/iRfcb/iRfcb/vignettes/data/zip/ifcb_annotated_images_corrected.zip

Package MATLAB Directory

Prepare the MATLAB directory for publication as a zip-archive, similar to the files in the SMHI IFCB Plankton Image Reference Library:

# Create zip-archive
ifcb_zip_matlab(manual_folder = "data/manual",
                features_folder = "data/features",
                class2use_file = "data/config/class2use.mat",
                zip_filename = "data/zip/ifcb_matlab_files_corrected.zip",
                data_folder = "data/data",
                readme_file = system.file("exdata/README-template.md", 
                                          package = "iRfcb"), # Template included in `iRfcb`
                matlab_readme_file = system.file("exdata/MATLAB-template.md", 
                                                 package = "iRfcb"), # Template included in `iRfcb`
                email_address = "tutorial@test.com",
                version = "1.1",
                print_progress = FALSE)
## Listing all files...
## Copying manual files...
## Copying feature files...
## Copying data files...
## Copying class2use file...
## Creating README file...
## Creating MANIFEST.txt...
## Creating zip archive...
## Zip archive created successfully: /home/runner/work/iRfcb/iRfcb/vignettes/data/zip/ifcb_matlab_files_corrected.zip
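Once the archives exist, recording their size and a checksum makes it easier to confirm that an upload to a repository completed intact. This is an optional step that is not part of the iRfcb workflow. The sketch uses `tools::md5sum()` from base R, and the helper name `summarise_archives` is hypothetical.

```r
# Hypothetical helper (not part of iRfcb): summarise the .zip files in a
# folder with their size in bytes and MD5 checksum.
summarise_archives <- function(zip_dir) {
  zips <- list.files(zip_dir, pattern = "\\.zip$", full.names = TRUE)
  data.frame(file = basename(zips),
             size_bytes = file.size(zips),
             md5 = unname(tools::md5sum(zips)),
             stringsAsFactors = FALSE)
}

# Demonstration with an empty stand-in file (not a real zip archive)
demo_zip_dir <- file.path(tempdir(), "zip_demo")
unlink(demo_zip_dir, recursive = TRUE)
dir.create(demo_zip_dir, recursive = TRUE)
invisible(file.create(file.path(demo_zip_dir, "demo.zip")))

archive_summary <- summarise_archives(demo_zip_dir)
print(archive_summary)
```

For the archives created above, the call would be `summarise_archives("data/zip")`. The resulting table can be compared with the checksums that some repositories report after upload.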

Create MANIFEST.txt

Create a manifest file for the zip-archive (required for some data repositories):

# Create MANIFEST.txt of the zip folder content
ifcb_create_manifest("data/zip")
## MANIFEST.txt has been created at data/zip/MANIFEST.txt
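To cross-check a manifest independently, the folder content can be listed with base R. This sketch is not the format written by `ifcb_create_manifest`, which may differ. It only illustrates the kind of information a manifest typically carries (relative file paths and sizes), and the helper name `list_folder_contents` is hypothetical.

```r
# Hypothetical helper (not part of iRfcb): list every file below a folder
# together with its size in bytes.
list_folder_contents <- function(folder) {
  files <- list.files(folder, recursive = TRUE)
  data.frame(file = files,
             size_bytes = file.size(file.path(folder, files)),
             stringsAsFactors = FALSE)
}

# Demonstration on a throwaway directory with two small binary files
demo_folder <- file.path(tempdir(), "manifest_demo")
unlink(demo_folder, recursive = TRUE)
dir.create(file.path(demo_folder, "sub"), recursive = TRUE)
writeBin(as.raw(1:4), file.path(demo_folder, "a.bin"))
writeBin(as.raw(1:10), file.path(demo_folder, "sub", "b.bin"))

listing <- list_folder_contents(demo_folder)
print(listing)
```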

Merge Manual Datasets

Datasets that have been manually annotated using the MATLAB code from the ifcb-analysis repository (Sosik and Olson 2007) can be merged using the ifcb_merge_manual function. This function is a wrapper around the ifcb_create_class2use, ifcb_replace_mat_values and ifcb_adjust_classes functions.

In this example, two datasets from the Swedish west coast are downloaded from the SMHI IFCB Plankton Image Reference Library (version 3) (Torstensson et al. 2024) and combined into a single dataset. Please note that these datasets are large, and the downloading and merging processes may take considerable time.

# Define data directories
skagerrak_kattegat_dir <- "data_skagerrak_kattegat"
tangesund_dir <- "data_tangesund"
merged_dir <- "data_skagerrak_kattegat_tangesund_merged"

# Download and extract Skagerrak-Kattegat data in the data folder
ifcb_download_test_data(dest_dir = skagerrak_kattegat_dir,
                        figshare_article = "48158725")

# Download and extract Tångesund data in the data folder
ifcb_download_test_data(dest_dir = tangesund_dir,
                        figshare_article = "48158731")

# Initialize the python session if not already set up
# env_path <- "~/.virtualenvs/iRfcb"
# ifcb_py_install(envname = env_path)

# Merge Skagerrak-Kattegat and Tångesund to a single dataset
ifcb_merge_manual(class2use_file_base = file.path(skagerrak_kattegat_dir, "config/class2use.mat"),
                  class2use_file_additions = file.path(tangesund_dir, "config/class2use.mat"),
                  class2use_file_output = file.path(merged_dir, "config/class2use.mat"),
                  manual_folder_base = file.path(skagerrak_kattegat_dir, "manual"),
                  manual_folder_additions = file.path(tangesund_dir, "manual"),
                  manual_folder_output = file.path(merged_dir, "manual")
                  )
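The core idea behind merging two annotated datasets is that their class lists must be combined and the class indices of the added dataset remapped onto the combined list. The base-R sketch below illustrates that idea only. It is not the implementation used by `ifcb_merge_manual`, and it assumes that the base class list keeps its order with new classes appended at the end, which is an assumption here rather than documented behavior. The helper `merge_class_lists` and the class names are hypothetical.

```r
# Conceptual sketch only (not iRfcb's implementation): combine two class
# lists and compute how indices in the additions map onto the merged list.
merge_class_lists <- function(base_classes, additional_classes) {
  merged <- union(base_classes, additional_classes)
  list(merged = merged,
       # Position of each class from the additions in the merged list
       index_map = match(additional_classes, merged))
}

# Placeholder class lists for two datasets
base_classes <- c("unclassified", "Class_A", "Class_B")
additional_classes <- c("unclassified", "Class_B", "Class_C")

result <- merge_class_lists(base_classes, additional_classes)
print(result$merged)
print(result$index_map)

# Remap annotations from the added dataset (indices into additional_classes)
old_ids <- c(2, 3, 3, 1)
new_ids <- result$index_map[old_ids]
print(new_ids)
```

In this toy case `Class_B` moves from index 2 in the added dataset to index 3 in the merged list, which is why annotation values have to be rewritten rather than copied as they are.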

This concludes the tutorial for the iRfcb package. For more detailed information, refer to the package documentation or the other tutorials. To see how data pipelines can be constructed using iRfcb, refer to the Example Project. Happy analyzing!

Citation

## To cite package 'iRfcb' in publications use:
## 
##   Anders Torstensson (2025). I 'R' FlowCytobot (iRfcb): Tools for
##   Analyzing and Processing Data from the IFCB. R package version 0.4.0.
##   https://doi.org/10.5281/zenodo.12533225
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {I 'R' FlowCytobot (iRfcb): Tools for Analyzing and Processing Data from the IFCB},
##     author = {Anders Torstensson},
##     year = {2025},
##     note = {R package version 0.4.0},
##     url = {https://doi.org/10.5281/zenodo.12533225},
##   }

References