Skip to contents

Introduction

This vignette demonstrates how to prepare annotated Imaging FlowCytobot (IFCB) data for EcoTaxa in R using the iRfcb package. The workflow assumes that Regions of Interest (ROIs) have been annotated using the MATLAB code from the ifcb-analysis repository (Sosik and Olson, 2007). However, the code can be adapted to process images from other software platforms. The code can also be adapted to submit unclassified or automatically classified images.

EcoTaxa is a web application widely used for hosting, classifying, and exporting images of individual objects, particularly in plankton imaging. It leverages machine learning to assign names based on a universal taxonomy and produces ecological data in standardized formats for scientific applications. To submit images, accompanying metadata is required, which can be generated using the iRfcb package.

Getting Started

Installation

You can install the package from GitHub using the devtools package:

# install.packages("devtools")
devtools::install_github("EuropeanIFCBGroup/iRfcb",
                         dependencies = TRUE)

Load the required libraries:

library(iRfcb)
library(dplyr) # For data wrangling
library(readr) # For creating .tsv files
library(lubridate) # For handling dates

Download Sample Data

To get started, download sample data from the SMHI IFCB Plankton Image Reference Library (Torstensson et al. 2024) with the following function:

# Define data directory
data_dir <- "data"

# Download and extract test data in the data folder
ifcb_download_test_data(dest_dir = data_dir,
                        max_retries = 10,
                        sleep_time = 30,
                        verbose = FALSE)

Extract Annotated Images

Extract annotated ROIs as .jpg images in subfolders for each class, skipping the unclassified (class id 1) category:

# Extract .png images
ifcb_extract_annotated_images(manual_folder = "data/manual",
                              class2use_file = "data/config/class2use.mat",
                              roi_folder = "data/data",
                              out_folder = "data/extracted_images",
                              skip_class = 1, # or "unclassified"
                              verbose = FALSE) # Do not print messages

Summarize Image Metadata

This section demonstrates how to gather and summarize metadata for images in the png_folder by combining data from feature and .hdr files. Additionally, it retrieves the analysis date and time for each sample based on .mat file creation dates and appends this information to the summarized dataset.

# Summarize image metadata from feature and hdr files
metadata <- ifcb_summarize_png_metadata(png_folder = "data/extracted_images",
                                        feature_folder = "data/features",
                                        hdr_folder = "data/data")

# Print the first ten columns of output
manual_files <- list.files("data/manual", pattern = ".mat", full.names = TRUE)

# Get file info the the .mat files
file_info <- file.info(manual_files)

# Extract analysis date and time based file timestamps
analysis_date <- data.frame(sample = sub(".mat$", "", basename(manual_files)), 
                            analysis_date = as.Date(file_info$ctime), 
                            analysis_time = format(ymd_hms(file_info$ctime), "%H:%M:%S"))

# Merge with metadata
metadata <- metadata %>%
  left_join(analysis_date, by = "sample")

Taxonomic Data Cleaning and Retrieval

Class names often contain unnecessary or inconsistent information. These names need to be cleaned before mapping them to higher taxonomic levels using external sources like WoRMS. The following code demonstrates how to clean class names and retrieve taxonomic details from WoRMS, such as AphiaID.

# Get taxa names
taxa_names <- unique(metadata$subfolder)

# Clean taxa_names by substituting specific patterns with spaces or empty strings
taxa_names_clean <- iRfcb:::truncate_folder_name(taxa_names) # Remove numerics from folder name
taxa_names_clean <- gsub("_", " ", taxa_names_clean)
taxa_names_clean <- gsub(" single cell", "", taxa_names_clean)
taxa_names_clean <- gsub(" chain", "", taxa_names_clean)
taxa_names_clean <- gsub("-like", "", taxa_names_clean)
taxa_names_clean <- gsub(" larger than 30unidentified", "", taxa_names_clean)
taxa_names_clean <- gsub(" smaller than 30unidentified", "", taxa_names_clean)

# Remove species flags from class names
taxa_names_clean <- gsub("\\<spp\\>", "", taxa_names_clean)
taxa_names_clean <- gsub("  ", " ", taxa_names_clean)

# Turn f to f. for forma
taxa_names_clean <- gsub("\\bf\\b", "f.", taxa_names_clean)

# Add "/" for multiple names with capital letters
# e.g. Heterocapsa_Azadinium to Heterocapsa/Azadinium
taxa_names_clean <- gsub(" ([A-Z])", "/\\1", taxa_names_clean)
taxa_names_clean <- gsub(" ([A-Z])", "/\\1", taxa_names_clean)

# Remove any whitespace
taxa_names_clean <- trimws(taxa_names_clean)

# Retrieve worms records
worms_records <- ifcb_match_taxa_names(taxa_names_clean,
                                       marine_only = FALSE,
                                       fuzzy = FALSE,
                                       verbose = FALSE)

# Create data frame with taxa information and class names
class_names <- worms_records %>%
  mutate(subfolder = taxa_names, class_clean = taxa_names_clean)

# Merge with metadata
metadata <- metadata %>%
  left_join(class_names, by = "subfolder")

Map EcoTaxa Headers

The metadata can be mapped with the headers in ifcb_get_ecotaxa_example to produce metadata files suitable for submitting images to EcoTaxa. The example below is comprehensive and includes several feature fields. For a simpler dataset, minimal fields can be retrieved using ifcb_get_ecotaxa_example(example = "minimal").

# Get EcoTaxa metadata header names
ecotaxa_headers <- ifcb_get_ecotaxa_example()[0,]

# Create a data frame with empty rows matching the length of data
ecotaxa_headers[1:nrow(metadata),] <- NA

# Map metadata to populate the empty dataframe
ecotaxa_metadata <- ecotaxa_headers %>%
  mutate(
    
    # Image fields
    img_file_name = metadata$image,
    
    # Static information
    object_link = "https://doi.org/10.17044/scilifelab.25883455",
    object_annotation_status = "validated",
    acq_resolution_pixels_per_micron = 3.4,
    acq_instrument = "IFCB",
    sample_source = "flowthrough",
    
    # Software
    process_soft = "MATLAB, R",
    process_soft_version = paste0("R2022a, ", version$version.string),
    process_library = "ifcb-analysis",
    process_library_version = 2,
    process_script = "iRfcb",
    process_script_version = as.character(packageVersion("iRfcb")),
    process_date = format(Sys.Date(),"%Y%m%d"), 
    process_time = format(Sys.time(),"%H%M%S"),
    
    # Object-related fields
    object_id = tools::file_path_sans_ext(metadata$image),  
    object_roi_number = metadata$roi,
    object_lat = metadata$gpsLatitude,
    object_lon = metadata$gpsLongitude,
    object_date = format(metadata$date, "%Y%m%d"),
    object_time = gsub(":", "", metadata$time),
    object_annotation_hierarchy = metadata$subfolder,
    object_annotation_category = metadata$class_clean,
    object_aphiaid = metadata$AphiaID,
    object_annotation_date = format(metadata$analysis_date, "%Y%m%d"),
    object_annotation_time = gsub(":", "", metadata$analysis_time),
    object_annotation_person_name = "John Doe",
    object_annotation_person_email = "john.doe@email.com",
    
    # Depth fields
    object_depth_min = 4, # Sampled at 4 m depth
    object_depth_max = 4, # Sampled at 4 m depth
    
    # Sample fields
    sample_vessel = "RV Svea",
    sample_id = metadata$sample,
    sample_station = NA,
    sample_cruise = NA,
    
    ### Features fields
    
    # PMT
    object_pmt_scattering = NA,
    object_pmt_fluorescence = NA,
    
    # Morphological metrics
    object_area = metadata$Area,  
    object_biovolume = metadata$Biovolume,
    object_perimeter = metadata$Perimeter,
    object_bounding_box_xwidth = metadata$BoundingBox_xwidth,
    object_bounding_box_ywidth = metadata$BoundingBox_ywidth,
    object_convex_area = metadata$ConvexArea,
    object_convex_perimeter = metadata$ConvexPerimeter,
    object_feret_diameter = metadata$FeretDiameter,
    object_major_axis_length = metadata$MajorAxisLength,
    object_minor_axis_length = metadata$MinorAxisLength,
    object_orientation = metadata$Orientation,
    object_eccentricity = metadata$Eccentricity,
    object_equiv_diameter = metadata$EquivDiameter,
    object_extent = metadata$Extent,
    object_r_wcenter2total_powerratio = metadata$RWcenter2total_powerratio,
    object_r_whalfpowerintegral = metadata$RWhalfpowerintegral,
    
    # Miscellaneous fields
    object_solidity = metadata$Solidity, 
    object_num_blobs = metadata$numBlobs, 
    object_h180 = metadata$H180, 
    object_h90 = metadata$H90, 
    object_hflip = metadata$Hflip,
    object_summed_area = metadata$summedArea,
    object_summed_biovolume = metadata$summedBiovolume,
    object_summed_convex_area = metadata$summedConvexArea,
    object_summed_convex_perimeter = metadata$summedConvexPerimeter,
    object_summed_feret_diameter = metadata$summedFeretDiameter,
    object_summed_major_axis_length = metadata$summedMajorAxisLength,
    object_summed_minor_axis_length = metadata$summedMinorAxisLength,
    object_summed_perimeter = metadata$summedPerimeter,
    object_shapehist_kurtosis_norm_eq_d = metadata$shapehist_kurtosis_normEqD,
    object_shapehist_mean_norm_eq_d = metadata$shapehist_mean_normEqD,
    object_shapehist_median_norm_eq_d = metadata$shapehist_median_normEqD,
    object_shapehist_mode_norm_eq_d = metadata$shapehist_mode_normEqD,
    object_shapehist_skewness_norm_eq_d = metadata$shapehist_skewness_normEqD,
    object_area_over_perimeter_squared = metadata$Area_over_PerimeterSquared,
    object_area_over_perimeter = metadata$Area_over_Perimeter,
    object_h90_over_hflip = metadata$H90_over_Hflip,
    object_h90_over_h180 = metadata$H90_over_H180,
    object_hflip_over_h180 = metadata$Hflip_over_H180,
    object_summed_convex_perimeter_over_perimeter = metadata$summedConvexPerimeter_over_Perimeter,
    object_rotated_bounding_box_solidity = metadata$rotated_BoundingBox_solidity,
    object_rotated_area = metadata$RotatedArea,
    object_rotated_bounding_box_xwidth = metadata$RotatedBoundingBox_xwidth,
    object_rotated_bounding_box_ywidth = metadata$RotatedBoundingBox_ywidth,
    
    # Texture-related fields
    object_texture_average_contrast = metadata$texture_average_contrast,
    object_texture_average_gray_level = metadata$texture_average_gray_level,
    object_texture_entropy = metadata$texture_entropy,
    object_texture_smoothness = metadata$texture_smoothness,
    object_texture_third_moment = metadata$texture_third_moment,
    object_texture_uniformity = metadata$texture_uniformity,
    
    # Moment invariants
    object_moment_invariant1 = metadata$moment_invariant1,
    object_moment_invariant2 = metadata$moment_invariant2,
    object_moment_invariant3 = metadata$moment_invariant3,
    object_moment_invariant4 = metadata$moment_invariant4,
    object_moment_invariant5 = metadata$moment_invariant5,
    object_moment_invariant6 = metadata$moment_invariant6,
    object_moment_invariant7 = metadata$moment_invariant7,
    
    # Ring fields
    object_ring01 = metadata$Ring01,  
    object_ring02 = metadata$Ring02,  
    object_ring03 = metadata$Ring03,  
    object_ring04 = metadata$Ring04,  
    object_ring05 = metadata$Ring05,  
    object_ring06 = metadata$Ring06,  
    object_ring07 = metadata$Ring07,  
    object_ring08 = metadata$Ring08,  
    object_ring09 = metadata$Ring09,  
    object_ring10 = metadata$Ring10,  
    object_ring11 = metadata$Ring11,  
    object_ring12 = metadata$Ring12,  
    object_ring13 = metadata$Ring13,  
    object_ring14 = metadata$Ring14,  
    object_ring15 = metadata$Ring15,  
    object_ring16 = metadata$Ring16,  
    object_ring17 = metadata$Ring17,  
    object_ring18 = metadata$Ring18,  
    object_ring19 = metadata$Ring19,  
    object_ring20 = metadata$Ring20,  
    object_ring21 = metadata$Ring21,  
    object_ring22 = metadata$Ring22,  
    object_ring23 = metadata$Ring23,  
    object_ring24 = metadata$Ring24,  
    object_ring25 = metadata$Ring25,  
    object_ring26 = metadata$Ring26,  
    object_ring27 = metadata$Ring27,  
    object_ring28 = metadata$Ring28,  
    object_ring29 = metadata$Ring29,  
    object_ring30 = metadata$Ring30,  
    object_ring31 = metadata$Ring31,  
    object_ring32 = metadata$Ring32,  
    object_ring33 = metadata$Ring33,  
    object_ring34 = metadata$Ring34,  
    object_ring35 = metadata$Ring35,  
    object_ring36 = metadata$Ring36,  
    object_ring37 = metadata$Ring37,  
    object_ring38 = metadata$Ring38,  
    object_ring39 = metadata$Ring39,  
    object_ring40 = metadata$Ring40,  
    object_ring41 = metadata$Ring41,  
    object_ring42 = metadata$Ring42,  
    object_ring43 = metadata$Ring43,  
    object_ring44 = metadata$Ring44,  
    object_ring45 = metadata$Ring45,  
    object_ring46 = metadata$Ring46,  
    object_ring47 = metadata$Ring47,  
    object_ring48 = metadata$Ring48,  
    object_ring49 = metadata$Ring49,  
    object_ring50 = metadata$Ring50,  
    
    # HOG fields
    object_hog01 = metadata$HOG01,  
    object_hog02 = metadata$HOG02,  
    object_hog03 = metadata$HOG03,  
    object_hog04 = metadata$HOG04,  
    object_hog05 = metadata$HOG05,  
    object_hog06 = metadata$HOG06,  
    object_hog07 = metadata$HOG07,  
    object_hog08 = metadata$HOG08,  
    object_hog09 = metadata$HOG09,  
    object_hog10 = metadata$HOG10,  
    object_hog11 = metadata$HOG11,  
    object_hog12 = metadata$HOG12,  
    object_hog13 = metadata$HOG13,  
    object_hog14 = metadata$HOG14,  
    object_hog15 = metadata$HOG15,  
    object_hog16 = metadata$HOG16,  
    object_hog17 = metadata$HOG17,  
    object_hog18 = metadata$HOG18,  
    object_hog19 = metadata$HOG19,  
    object_hog20 = metadata$HOG20,  
    object_hog21 = metadata$HOG21,  
    object_hog22 = metadata$HOG22,  
    object_hog23 = metadata$HOG23,  
    object_hog24 = metadata$HOG24,  
    object_hog25 = metadata$HOG25,  
    object_hog26 = metadata$HOG26,  
    object_hog27 = metadata$HOG27,  
    object_hog28 = metadata$HOG28,  
    object_hog29 = metadata$HOG29,  
    object_hog30 = metadata$HOG30,  
    object_hog31 = metadata$HOG31,  
    object_hog32 = metadata$HOG32,  
    object_hog33 = metadata$HOG33,  
    object_hog34 = metadata$HOG34,  
    object_hog35 = metadata$HOG35,  
    object_hog36 = metadata$HOG36,  
    object_hog37 = metadata$HOG37,  
    object_hog38 = metadata$HOG38,  
    object_hog39 = metadata$HOG39,  
    object_hog40 = metadata$HOG40,  
    object_hog41 = metadata$HOG41,  
    object_hog42 = metadata$HOG42,  
    object_hog43 = metadata$HOG43,  
    object_hog44 = metadata$HOG44,  
    object_hog45 = metadata$HOG45,  
    object_hog46 = metadata$HOG46,  
    object_hog47 = metadata$HOG47,  
    object_hog48 = metadata$HOG48,  
    object_hog49 = metadata$HOG49,  
    object_hog50 = metadata$HOG50,
    object_hog51 = metadata$HOG51,
    object_hog52 = metadata$HOG52,
    object_hog53 = metadata$HOG53,
    object_hog54 = metadata$HOG54,
    object_hog55 = metadata$HOG55,
    object_hog56 = metadata$HOG56,
    object_hog57 = metadata$HOG57,
    object_hog58 = metadata$HOG58,
    object_hog59 = metadata$HOG59,
    object_hog60 = metadata$HOG60,
    object_hog61 = metadata$HOG61,
    object_hog62 = metadata$HOG62,
    object_hog63 = metadata$HOG63,
    object_hog64 = metadata$HOG64,
    object_hog65 = metadata$HOG65,
    object_hog66 = metadata$HOG66,
    object_hog67 = metadata$HOG67,
    object_hog68 = metadata$HOG68,
    object_hog69 = metadata$HOG69,
    object_hog70 = metadata$HOG70,
    object_hog71 = metadata$HOG71,
    object_hog72 = metadata$HOG72,
    object_hog73 = metadata$HOG73,
    object_hog74 = metadata$HOG74,
    object_hog75 = metadata$HOG75,
    object_hog76 = metadata$HOG76,
    object_hog77 = metadata$HOG77,
    object_hog78 = metadata$HOG78,
    object_hog79 = metadata$HOG79,
    object_hog80 = metadata$HOG80,
    object_hog81 = metadata$HOG81,
    
    # Wedge fields
    object_wedge01 = metadata$Wedge01,  
    object_wedge02 = metadata$Wedge02,  
    object_wedge03 = metadata$Wedge03,  
    object_wedge04 = metadata$Wedge04,  
    object_wedge05 = metadata$Wedge05,  
    object_wedge06 = metadata$Wedge06,  
    object_wedge07 = metadata$Wedge07,  
    object_wedge08 = metadata$Wedge08,  
    object_wedge09 = metadata$Wedge09,  
    object_wedge10 = metadata$Wedge10,  
    object_wedge11 = metadata$Wedge11,  
    object_wedge12 = metadata$Wedge12,  
    object_wedge13 = metadata$Wedge13,  
    object_wedge14 = metadata$Wedge14,  
    object_wedge15 = metadata$Wedge15,  
    object_wedge16 = metadata$Wedge16,  
    object_wedge17 = metadata$Wedge17,  
    object_wedge18 = metadata$Wedge18,  
    object_wedge19 = metadata$Wedge19,  
    object_wedge20 = metadata$Wedge20,  
    object_wedge21 = metadata$Wedge21,
    object_wedge22 = metadata$Wedge22,  
    object_wedge23 = metadata$Wedge23,  
    object_wedge24 = metadata$Wedge24,  
    object_wedge25 = metadata$Wedge25,  
    object_wedge26 = metadata$Wedge26,  
    object_wedge27 = metadata$Wedge27,  
    object_wedge28 = metadata$Wedge28,  
    object_wedge29 = metadata$Wedge29,  
    object_wedge30 = metadata$Wedge30,  
    object_wedge31 = metadata$Wedge31,  
    object_wedge32 = metadata$Wedge32,  
    object_wedge33 = metadata$Wedge33,  
    object_wedge34 = metadata$Wedge34,  
    object_wedge35 = metadata$Wedge35,  
    object_wedge36 = metadata$Wedge36,  
    object_wedge37 = metadata$Wedge37,  
    object_wedge38 = metadata$Wedge38,  
    object_wedge39 = metadata$Wedge39,  
    object_wedge40 = metadata$Wedge40,  
    object_wedge41 = metadata$Wedge41,  
    object_wedge42 = metadata$Wedge42,  
    object_wedge43 = metadata$Wedge43,  
    object_wedge44 = metadata$Wedge44,  
    object_wedge45 = metadata$Wedge45,  
    object_wedge46 = metadata$Wedge46,  
    object_wedge47 = metadata$Wedge47,  
    object_wedge48 = metadata$Wedge48
    )

Generate EcoTaxa TSV Files

This section demonstrates how to generate .tsv files containing metadata for each class subfolder. These files are essential for uploading data into EcoTaxa. Each .tsv file is written to its respective class subfolder and includes the relevant metadata for that class.

# Loop .tsv creation for each class
for (i in seq_along(unique(ecotaxa_metadata$object_annotation_hierarchy))) {
  
  # Define path to subfolder
  subfolder_path <- file.path("data/extracted_images",
                              unique(ecotaxa_metadata$object_annotation_hierarchy)[i])
  
  # Filter metadata for each class
  ecotaxa_metadata_ix <- ecotaxa_metadata %>%
    filter(object_annotation_hierarchy == unique(ecotaxa_metadata$object_annotation_hierarchy)[i]) %>%
    mutate(object_annotation_hierarchy = iRfcb:::truncate_folder_name(object_annotation_hierarchy))
  
  # Add data format codes (text[t], float[f] etc.)
  ecotaxa_metadata_ix <- bind_rows(
    ifcb_get_ecotaxa_example()[1, ] %>%
      mutate(across(everything(), as.character)),
    ecotaxa_metadata_ix %>%
      mutate(across(everything(), as.character))
  )
  
  # Write one metadata file per class subfolder
  write_tsv(ecotaxa_metadata_ix,
            file.path(
              subfolder_path, 
              paste0("ecotaxa_",
                     unique(iRfcb:::truncate_folder_name(ecotaxa_metadata$object_annotation_hierarchy))[i],
                     ".tsv")),
            na = "")
}

Creating a Zip Archive for EcoTaxa

Prepare the PNG directory for publication by creating a zip archive, ready for upload through the EcoTaxa web interface. Note that the web interface has a maximum file size limit of 500 MB. To accommodate this limitation, the zip archive can be split into multiple files by setting split_zip to TRUE and specifying the max_size parameter in megabytes.

# Create zip-archive
ifcb_zip_pngs(png_folder = "data/extracted_images",
              zip_filename = "data/zip/iRfcb_ecotaxa.zip",
              readme_file = system.file("exdata/README-template.md", 
                                        package = "iRfcb"), # Template icluded in `iRfcb`
              email_address = "tutorial@test.com",
              version = "1.1",
              include_txt = TRUE, # To include the metadata text-files in the archive
              split_zip = TRUE,
              max_size = 500,
              print_progress = FALSE)
## Creating README file...
## Creating MANIFEST.txt...
## Creating zip archive...
## Zip archive created successfully: /home/runner/work/iRfcb/iRfcb/vignettes/data/zip/iRfcb_ecotaxa.zip

This concludes this tutorial for the iRfcb package. For more detailed information, refer to the package documentation or the other tutorials. See how data pipelines can be constructed using iRfcb in the following Example Project. Happy analyzing!

Citation

## To cite package 'iRfcb' in publications use:
## 
##   Anders Torstensson (2025). I 'R' FlowCytobot (iRfcb): Tools for
##   Analyzing and Processing Data from the IFCB. R package version 0.4.0.
##   https://doi.org/10.5281/zenodo.12533225
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {I 'R' FlowCytobot (iRfcb): Tools for Analyzing and Processing Data from the IFCB},
##     author = {Anders Torstensson},
##     year = {2025},
##     note = {R package version 0.4.0},
##     url = {https://doi.org/10.5281/zenodo.12533225},
##   }

References

  • Sosik, H. M. and Olson, R. J. (2007) Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
  • Torstensson, A., Skjevik, A-T., Mohlin, M., Karlberg, M. and Karlson, B. (2024). SMHI IFCB Plankton Image Reference Library. SciLifeLab. Dataset. https://doi.org/10.17044/scilifelab.25883455.v3