Complete documentation for all ClassiPyR features.


Interface Overview

ClassiPyR interface showing the title bar, sidebar, and main image gallery area.

Title Bar

  • App name and version
  • Mode indicator: Shows current state and mode
    • No sample loaded: Initial state before selecting a sample
    • Validation mode: Shows accuracy percentage
    • Annotation mode: Shows progress (X/Y classified)
    • Class Review mode: Shows selected class and image counts
  • Annotator name: Your name for statistics tracking
  • Settings: Configure folders and options
  • Sample selection: Year, month, status filters
  • Navigation: Load, previous, next, random, sync
  • Cache age: Shows when folders were last scanned
  • Save button: Manual save trigger
  • Predict button: One-click CNN classification (when configured)
  • Clear Annotations: Delete a sample’s annotations from the database (annotation mode only)

Main Area (Tabs)

  1. Image Gallery: View and annotate images
  2. Summary Table: Class distribution statistics
  3. Validation Statistics: Accuracy metrics and change log

Validation vs Annotation Mode

Validation Mode

  • Activated when loading samples with existing auto-classifications
  • Original classifications shown with confidence scores
  • Statistics track how many you’ve changed
  • Accuracy percentage calculated

Annotation Mode

  • Activated for samples without classifications
  • All images start as “unclassified”
  • Progress shows classified vs remaining
  • Validation statistics tab shows annotation progress instead

Switching Between Modes

Any sample with auto-classification data (✓ or ✎✓) shows a mode toggle button in the header bar:

  • → Manual (in validation mode): Switch to annotation mode. If manual annotations exist they are loaded; otherwise blank “unclassified” annotations are created.
  • → Validation (in annotation mode): Switch back to the original auto-classifications.

Samples with only manual annotations (✎) or no data at all (*) do not show the toggle.


Class Review Mode

Class Review mode lets you view all annotated images of a specific class and reclassify them. There are two sources to choose from: the SQLite Database or an External PNG Folder.

Class Review mode showing all images of a selected class across the database.

Database review

Review all annotated images of a class across the entire database:

  1. Switch to Class Review using the mode toggle in the sidebar
  2. Select Database as the source (default)
  3. The class dropdown is searchable — type to filter. Each class shows its image count in parentheses
  4. Select a class and click Load — images are loaded from ROI files or discovered PNG sample folders across all samples
  5. Review images in the gallery. Each image label shows the full name (sample + ROI) so you can identify which sample it belongs to
  6. Use the same selection and relabeling tools as in sample mode
  7. Click Save Changes to write row-level updates to the database

External folder review

Review and split a folder of PNG images (e.g. from an external classifier or export) into class subfolders:

  1. Switch to Class Review and select External PNG Folder as the source
  2. Browse to the folder containing PNG images
  3. The initial class label defaults to the folder name — change it if needed
  4. Click Load Folder — all PNGs are loaded into the gallery with the initial class label
  5. Select images and relabel them to different classes
  6. Set an Export Folder and click Export Split Folders — images are copied into class-name subfolders

This is useful for sorting or correcting a flat folder of images without importing them into the database.

Key differences from sample mode

  • Images come from multiple samples (database) or a single folder (external), not a single sample
  • Image labels show the full image name (e.g. D20230101T120000_IFCB134_00001) instead of just the ROI number (database mode)
  • Saving performs surgical row-level UPDATEs rather than replacing an entire sample’s annotations (database mode)
  • The title bar turns purple to indicate class review mode
  • Navigation buttons (prev/next/random sample) are not available
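
The full image name shown in database review combines the sample name with the ROI number. As a minimal sketch, assuming names follow the `<sample>_<ROI number>` pattern of the example above, the two parts can be separated in base R. `split_image_name()` is a hypothetical helper for illustration, not a ClassiPyR function.

```r
# Hypothetical helper: split a full image name into sample name and ROI number.
# Assumes the ROI number is the final underscore-separated, all-digit field.
split_image_name <- function(image_name) {
  stem <- sub("\\.png$", "", image_name)  # tolerate an optional .png extension
  list(
    sample = sub("_[0-9]+$", "", stem),   # everything before the ROI field
    roi    = as.integer(sub("^.*_([0-9]+)$", "\\1", stem))
  )
}

parts <- split_image_name("D20230101T120000_IFCB134_00001")
parts$sample  # "D20230101T120000_IFCB134"
parts$roi     # 1
```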

Live Prediction

The Predict button in the sidebar lets you classify all images in the loaded sample using a remote CNN model, without needing pre-computed classifier files. This uses iRfcb::ifcb_classify_images() to send images to a Gradio-hosted classification model.

Setup

  1. Open Settings (gear icon)
  2. Under Live Prediction, enter a Gradio API URL (e.g. https://irfcb-classify.hf.space)
  3. The Prediction Model dropdown is populated automatically from the server
  4. Select a model and click Save Settings

The Predict button appears in the sidebar once a Gradio URL and model are configured.

Using Predict

  1. Load a sample in Sample Mode (annotated or unannotated)
  2. Click Predict in the sidebar
  3. A progress bar shows per-image classification progress
  4. When complete, the app switches to validation mode with the predicted classes

Behaviour details

  • Manual labels are preserved: If you have already reclassified some images, those are skipped during prediction. Only unchanged images are sent to the model.
  • Threshold setting applies: The “Apply classification threshold” setting controls whether thresholded or raw predictions are used, just like for CSV/H5/MAT classifications.
  • New classes are added automatically: If the model returns class names not in your current class list, they are added so the filter and relabel dropdowns work correctly.
  • Predictions become the baseline: After prediction, the predicted classes are treated as the “original” classifications for validation statistics.
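
The first and third behaviours above can be illustrated with a short base R sketch. It assumes that a "manual label" is any image whose current class differs from its original class; the function name and the vectors are hypothetical and do not reflect ClassiPyR's internal code.

```r
# Hypothetical sketch: apply predictions only to images the user has not relabeled.
apply_predictions <- function(current, original, predicted) {
  unchanged <- current == original       # TRUE where no manual relabel happened
  current[unchanged] <- predicted[unchanged]
  current
}

current   <- c("Diatom", "unclassified", "unclassified")  # image 1 was relabeled
original  <- c("unclassified", "unclassified", "unclassified")
predicted <- c("Ciliate", "Ciliate", "Dinoflagellate")

result <- apply_predictions(current, original, predicted)
result  # "Diatom" "Ciliate" "Dinoflagellate"

# Classes returned by the model are appended to the class list if missing.
class_list <- union(c("unclassified", "Diatom"), result)
```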

Working with Images

Image Cards

Each image card displays:

  • The plankton image
  • ROI number (in sample mode) or full image name (in class review mode)
  • Classification score (if available)
  • Original class (if relabeled)

Border colors:

  • Default (gray): Unchanged
  • Yellow: Relabeled in this session
  • Blue: Currently selected
Image card border colors: gray (unchanged), yellow (relabeled), blue (selected).

Selecting Images

Method | Action
Click | Toggle single image selection
Drag | Draw rectangle to select multiple
Select Page / Select All | First click selects the current page; second click selects all images across all pages
Deselect | Clear all selections

Drag-select: draw a rectangle to select multiple images at once.

Relabeling

  1. Select target images
  2. Choose new class in “Relabel to” dropdown
  3. Click Relabel

The dropdown supports type-ahead search: start typing the class name to filter the list.


Measuring Images

The measure tool allows you to measure distances in images.

Using the Measure Tool

  1. Click the Measure button (ruler icon) in the toolbar to activate measure mode
  2. Click and drag on any image to draw a measurement line
  3. The distance is displayed in both micrometers (µm) and pixels
  4. Click elsewhere on the image to clear the measurement
  5. Click the Measure button again to deactivate measure mode
Measure tool showing distance in micrometers and pixels.

Configuring Scale

The default scale is 3.4 pixels per micrometer (standard for IFCB). To adjust:

  1. Open Settings (gear icon)
  2. Find the Pixels per Micrometer field
  3. Enter your instrument’s calibration value
  4. Click Save Settings
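
The conversion behind the measurement is a division of the pixel distance by the scale. A minimal sketch, assuming a straight line between two pixel coordinates and the default scale of 3.4 pixels per micrometer; `measure_line()` is a hypothetical name used only for illustration.

```r
# Hypothetical helper: length of a line between two pixel coordinates,
# reported in pixels and micrometers.
measure_line <- function(x0, y0, x1, y1, pixels_per_micron = 3.4) {
  distance_px <- sqrt((x1 - x0)^2 + (y1 - y0)^2)
  c(pixels = distance_px, micrometers = distance_px / pixels_per_micron)
}

res <- measure_line(10, 20, 40, 60)
res[["pixels"]]       # 50
res[["micrometers"]]  # 50 / 3.4, about 14.7
```

Entering a different calibration value in Settings corresponds to changing `pixels_per_micron` in this sketch.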

Classification Sources

ClassiPyR supports multiple classification input formats. When multiple formats exist for the same sample, the priority is: CSV > H5 > MAT. Samples without any pre-computed classifications can be classified on the fly using Live Prediction.
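
The priority rule can be sketched as a lookup over file extensions. This is an illustration only: `pick_classification_file()` is a hypothetical helper, and the example file names are assumed to follow the naming patterns described in the sections below.

```r
# Hypothetical helper: choose one classification file using CSV > H5 > MAT.
pick_classification_file <- function(files) {
  ext <- tolower(tools::file_ext(files))
  for (wanted in c("csv", "h5", "mat")) {
    hit <- files[ext == wanted]
    if (length(hit) > 0) return(hit[1])
  }
  NA_character_  # no pre-computed classification available
}

candidates <- c(
  "D20230101T120000_IFCB134_class_v1.mat",
  "D20230101T120000_IFCB134_class.h5"
)
picked <- pick_classification_file(candidates)
picked  # "D20230101T120000_IFCB134_class.h5"
```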

All formats (including live predictions) support a classification threshold option (configurable in Settings under the Classification Folder). When enabled, predictions below the confidence threshold are shown as “unclassified”; when disabled, the raw (unthresholded) class prediction is used.

CSV Files

Standard classification CSV output from iRfcb. The CSV file must be named after the sample it describes (e.g., D20230101T120000_IFCB134.csv).

Required columns (exact names):

  • file_name: Image filename including .png extension (e.g., D20230101T120000_IFCB134_00001.png)
  • class_name: Predicted class name (threshold-applied)

Optional columns:

  • score: Classification confidence (0-1)
  • class_name_auto: Raw class prediction without threshold (used when threshold is disabled)

Minimal example:

file_name,class_name
D20230101T120000_IFCB134_00001.png,Diatom
D20230101T120000_IFCB134_00002.png,Ciliate

Example with confidence scores and raw predictions:

file_name,class_name,class_name_auto,score
D20230101T120000_IFCB134_00001.png,unclassified,Diatom,0.45
D20230101T120000_IFCB134_00002.png,Ciliate,Ciliate,0.87
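
To show how the two class columns relate to the threshold setting, the example above can be read in base R and the column chosen from a flag. This is a sketch under the assumption that the setting simply switches between `class_name` and `class_name_auto`; `select_class()` is a hypothetical helper.

```r
csv_text <- "file_name,class_name,class_name_auto,score
D20230101T120000_IFCB134_00001.png,unclassified,Diatom,0.45
D20230101T120000_IFCB134_00002.png,Ciliate,Ciliate,0.87"

classes <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Hypothetical helper: use the raw prediction only when the threshold is
# disabled and the optional class_name_auto column is present.
select_class <- function(df, apply_threshold = TRUE) {
  if (!apply_threshold && "class_name_auto" %in% names(df)) {
    df$class_name_auto
  } else {
    df$class_name
  }
}

select_class(classes, apply_threshold = TRUE)   # "unclassified" "Ciliate"
select_class(classes, apply_threshold = FALSE)  # "Diatom" "Ciliate"
```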

Different CNN pipelines: If your classifier produces different column names, rename them to file_name and class_name before placing the CSV in the Classification Folder.
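
A renaming step of this kind might look as follows in base R. The input column names `image` and `label` are invented for the example, and `rename_classifier_columns()` is a hypothetical helper; adapt both to your pipeline's actual output.

```r
# Hypothetical helper: rename pipeline-specific columns to the required names.
rename_classifier_columns <- function(df, file_col, class_col) {
  names(df)[names(df) == file_col]  <- "file_name"
  names(df)[names(df) == class_col] <- "class_name"
  missing <- setdiff(c("file_name", "class_name"), names(df))
  if (length(missing) > 0) {
    stop("Missing required column(s): ", paste(missing, collapse = ", "))
  }
  df
}

raw <- data.frame(
  image = "D20230101T120000_IFCB134_00001.png",
  label = "Diatom",
  stringsAsFactors = FALSE
)
fixed <- rename_classifier_columns(raw, file_col = "image", class_col = "label")

# Write the result under the sample name (here to a temporary folder).
out_file <- file.path(tempdir(), "D20230101T120000_IFCB134.csv")
write.csv(fixed, out_file, row.names = FALSE)
```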

Files are looked up from the file index cache (see File Index Cache below).

HDF5 Classifier Output

Files matching the *_class*.h5 pattern, produced by iRfcb (>= 0.8.0). Each file contains:

  • roi_numbers: ROI identifiers
  • class_name: Predicted class with threshold applied
  • class_name_auto: Predicted class without threshold
  • output_scores: Per-class confidence scores
  • class_labels: All possible class names
  • classifier_name: Name of the classifier model

Requires the optional hdf5r package. Install with install.packages("hdf5r").

MATLAB Classifier Output

Files matching the *_class*.mat pattern, produced by ifcb-analysis. Each file contains:

  • roinum: ROI numbers
  • TBclass_above_threshold: Predicted class with threshold
  • TBclass: Predicted class without threshold

Existing Annotations

Previously saved annotations (in the SQLite database or as .mat files in the output folder) are detected automatically and can be resumed. When both exist, the SQLite version is loaded because it is faster to read.


File Index Cache

To avoid slow startup from scanning large folder hierarchies, ClassiPyR maintains a file index cache on disk. The cache stores the locations of all ROI, classification, and annotation files found in your configured folders.

How it Works

  • On first launch (or after changing folder paths in Settings), the app scans all configured folders and saves the results to a JSON cache file
  • On subsequent launches, the app loads the cached index instantly instead of re-scanning
  • The cache is stored alongside your settings in the platform config directory (see Settings Persistence)

Sync Button

The Sync button (circular arrow icon) in the sidebar navigation row triggers a manual rescan of all folders. Use this when:

  • You’ve added new IFCB data files to your folders
  • The sample dropdown seems out of date
  • You want to force a fresh scan

The cache age indicator below the navigation buttons shows when the folders were last scanned (e.g. “synced just now”, “synced 2 hours ago”).

Auto-Sync

By default, the app checks whether the cache matches your current folder settings on startup and rescans automatically if needed. You can disable auto-sync in Settings to always load from the existing cache, which provides the fastest possible startup.
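
The startup check can be pictured as a comparison between the folder paths stored with the cache and the paths in the current settings. The sketch below is an assumption about the logic, not the actual implementation: the cache structure (`cache$folders`) is invented, and the key names are borrowed from the `rescan_file_index()` arguments.

```r
# Hypothetical sketch: does the cached index describe the current folders?
cache_matches_settings <- function(cache, settings) {
  keys <- c("roi_folder", "csv_folder", "output_folder")
  identical(
    unname(unlist(cache$folders[keys])),
    unname(unlist(settings[keys]))
  )
}

settings <- list(
  roi_folder    = "/data/ifcb/raw",
  csv_folder    = "/data/ifcb/classified",
  output_folder = "/data/ifcb/manual"
)
cache <- list(folders = settings, files = character(0))

unchanged <- cache_matches_settings(cache, settings)      # TRUE: load the cache

settings$roi_folder <- "/data/ifcb/raw_2024"
after_change <- cache_matches_settings(cache, settings)   # FALSE: rescan needed
```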

Headless Rescan

You can update the file index cache without launching the app using rescan_file_index(). This is useful for scheduled updates (e.g. cron jobs) on servers where new data arrives regularly:

# Rescan using saved settings
ClassiPyR::rescan_file_index()

# Or specify folder paths explicitly
ClassiPyR::rescan_file_index(
  roi_folder = "/data/ifcb/raw",
  csv_folder = "/data/ifcb/classified",
  output_folder = "/data/ifcb/manual"
)

Output Files

When you save, the app creates files based on your chosen storage format (configurable in Settings).

SQLite Database (default)

db_folder/annotations.sqlite

A single SQLite database file containing annotations for all samples. This is the default storage backend:

  • No Python dependency required
  • Fast read/write performance
  • Single file for all samples — easy to back up and manage
  • Contains annotations table (one row per ROI) and class_lists table (preserves class indices for .mat export)

The database is stored in a separate Database Folder (configurable in Settings), which defaults to a local user-level directory (tools::R_user_dir("ClassiPyR", "data")). This separation ensures the SQLite database stays on a local filesystem even when the Output Folder is on a network drive.

Note: The SQLite database must be on a local drive. SQLite file locking is unreliable on network filesystems (NFS/SMB), which can lead to database corruption. For multi-user workflows, each annotator should use their own local Database Folder.

Annotation MAT File (optional)

output/[sample_name].mat

MATLAB-compatible format with:

  • classlist: ROI numbers and class indices
  • Compatible with ifcb-analysis toolbox

Note: Saving MAT files requires Python with scipy. Enable in Settings > Annotation Storage by selecting “MAT file” or “Both”.

Statistics Files

output_folder/validation_statistics/[sample_name]_validation_stats.csv

  • Summary: total, correct, incorrect, accuracy

output_folder/validation_statistics/[sample_name]_validation_detailed.csv

  • Per-image: original class, validated class, correct flag
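
The relationship between the two files can be sketched in base R: the summary is an aggregate of the per-image correct flag. The column names and the accuracy scale (a fraction rather than a percentage) are assumptions for illustration; check the files the app writes for the exact layout.

```r
# Assumed per-image layout: original class, validated class, correct flag.
detailed <- data.frame(
  original_class  = c("Diatom", "Ciliate", "Diatom", "unclassified"),
  validated_class = c("Diatom", "Ciliate", "Dinoflagellate", "Diatom"),
  stringsAsFactors = FALSE
)
detailed$correct <- detailed$original_class == detailed$validated_class

# Summary derived from the per-image table.
summary_stats <- data.frame(
  total     = nrow(detailed),
  correct   = sum(detailed$correct),
  incorrect = sum(!detailed$correct),
  accuracy  = mean(detailed$correct)
)
summary_stats  # total 4, correct 2, incorrect 2, accuracy 0.5
```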

Organized PNGs

png_output_folder/[class_name]/[image_files]

Images organized into class folders for training CNN models or other classifiers.
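
The folder layout above can be reproduced with a few base R file operations. This is a sketch of the layout only, not the app's export code; `organize_pngs()` and its `skip_class` argument are hypothetical, and the demo uses empty placeholder files instead of real images.

```r
# Hypothetical helper: copy images into <png_output_folder>/<class_name>/.
organize_pngs <- function(png_files, class_names, png_output_folder,
                          skip_class = NULL) {
  stopifnot(length(png_files) == length(class_names))
  keep <- !(class_names %in% skip_class)  # optionally leave one class out
  for (i in which(keep)) {
    class_dir <- file.path(png_output_folder, class_names[i])
    dir.create(class_dir, recursive = TRUE, showWarnings = FALSE)
    file.copy(png_files[i], file.path(class_dir, basename(png_files[i])),
              overwrite = TRUE)
  }
  invisible(sum(keep))  # number of images copied
}

# Demo with placeholder files in a temporary directory.
src <- file.path(tempdir(), "src_pngs")
dir.create(src, showWarnings = FALSE)
files <- file.path(src, c("D20230101T120000_IFCB134_00001.png",
                          "D20230101T120000_IFCB134_00002.png"))
invisible(file.create(files))

out <- file.path(tempdir(), "organized_pngs")
n_copied <- organize_pngs(files, c("Diatom", "unclassified"), out,
                          skip_class = "unclassified")
list.files(out, recursive = TRUE)
```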


Settings Reference

Data Source

ClassiPyR supports two data source modes, selectable in Settings:

Mode | Description
Local Folders (default) | Read IFCB data from local ROI/ADC/HDR files
IFCB Dashboard | Connect to a remote IFCB Dashboard instance

IFCB Dashboard Mode

When “IFCB Dashboard” is selected, enter a Dashboard URL such as:

  • https://habon-ifcb.whoi.edu/timeline?dataset=tangosund — a specific dataset (recommended)
  • https://habon-ifcb.whoi.edu/ — all datasets on the dashboard

Tip: Always include ?dataset= in the URL when working with large Dashboard instances. Dashboards like habon-ifcb.whoi.edu host hundreds of thousands of samples across many datasets — loading all of them at once will be very slow and may cause the interface to become unresponsive. Specifying a dataset keeps the sample list manageable and ensures faster startup.

The app fetches the sample list from the Dashboard API. When you load a sample, PNG images are downloaded from the dashboard and cached locally at tools::R_user_dir("ClassiPyR", "cache")/dashboard/. ADC files are downloaded on demand for image dimensions and MAT export.

Dashboard Setting | Description
Dashboard URL | The full URL of the IFCB Dashboard (with optional ?dataset= parameter)
Use dashboard auto-classifications | When checked, downloads the dashboard’s _class_scores.csv for validation mode
Advanced Download Settings | Parallel downloads, sleep time, timeout, and max retries for dashboard downloads

The Classification Folder setting is available in both local and dashboard mode. In dashboard mode, the classification source depends on the “Use dashboard auto-classifications” checkbox:

When “Use dashboard auto-classifications” is disabled (default):

  1. Database annotations (manual, existing)
  2. Local classification files (CSV > H5 > MAT) — if Classification Folder is configured
  3. New annotation mode

When “Use dashboard auto-classifications” is enabled:

  1. Database annotations (manual, existing)
  2. Dashboard auto-classifications (downloaded _class_scores.csv)
  3. New annotation mode

Local classification files and dashboard auto-classifications are mutually exclusive — the checkbox determines which source is used.
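
The two ordered lists above amount to a small decision function. The sketch below restates them in base R; the function and argument names are hypothetical, and the real app may take additional conditions into account.

```r
# Hypothetical sketch of the source selection in dashboard mode.
resolve_source <- function(has_db_annotations,
                           use_dashboard_auto,
                           has_dashboard_scores = FALSE,
                           has_local_classification = FALSE) {
  if (has_db_annotations) return("database annotations")
  if (use_dashboard_auto) {
    # Checkbox enabled: local classification files are ignored.
    if (has_dashboard_scores) return("dashboard auto-classifications")
  } else if (has_local_classification) {
    # Checkbox disabled: dashboard auto-classifications are ignored.
    return("local classification files")
  }
  "new annotation mode"
}

resolve_source(FALSE, use_dashboard_auto = TRUE,
               has_dashboard_scores = TRUE, has_local_classification = TRUE)
# "dashboard auto-classifications"
```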

Note: In dashboard mode, the ROI/PNG Data Folder setting is not used. The Output Folder, Database Folder, and PNG Output Folder still apply for saving annotations and exports.

Folder Paths

Setting | Description
Classification Folder | Source of CSV/H5/MAT classifications (both local and dashboard mode)
ROI/PNG Data Folder | IFCB raw files (ROI/ADC/HDR) or extracted PNG sample folders (local mode only)
Output Folder | Where MAT files and statistics go (can be on a network drive)
Database Folder | Where the SQLite database is stored (must be a local drive)
PNG Output Folder | Where organized images go

Folder paths are configured using a web-based folder browser that works on all platforms (Linux, macOS, Windows). Changing folder paths in Settings automatically invalidates the file index cache, triggering a fresh scan.

Annotation Storage

Format | Description
SQLite (recommended) | Default. Stores annotations in annotations.sqlite in the Database Folder. No Python needed.
MAT file | MATLAB-compatible .mat files for ifcb-analysis. Requires Python with scipy.
Both | Writes to both SQLite and .mat for maximum compatibility.

Below the format selector, a set of buttons allows bulk import into and export from the SQLite database:

  • Import .mat → SQLite: Imports all .mat annotation files from the output folder into the SQLite database. Already-imported samples are skipped.
  • Import PNG → SQLite: Imports annotations from a folder of PNG images organized in class-name subfolders. Class names are extracted from folder names (trailing _NNN suffixes are stripped). Useful for re-importing corrected exports or importing external classification datasets.
  • Export SQLite → .mat: Exports all annotated samples from the database to .mat files in the specified Output Folder. Requires Python with scipy.
  • Export SQLite → PNG: Extracts annotated images from ROI files into class-name subfolders in the PNG Output Folder. Useful for building training datasets for CNN classifiers.
  • Export SQLite → ZIP: Builds an EcoTaxa-ready ZIP archive from SQLite annotations that is also suitable for sharing datasets in general repositories (for example Zenodo or Figshare). The export writes class-organized PNGs, per-class inventories (ecotaxa_<CLASSNAME>.tsv), and a README file.
  • Export SQLite → MATLAB ZIP: Builds a MATLAB-format ZIP archive via iRfcb::ifcb_zip_matlab(), bundling .mat annotation files, feature CSVs, a class2use.mat config file, optional raw data (.roi, .adc, .hdr), and README files. When using SQLite-only storage the annotations are automatically converted to temporary .mat files (requires Python with scipy). See the iRfcb image export tutorial for more details on the MATLAB ZIP format.
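
For Import PNG → SQLite, the suffix stripping can be pictured as a single regular expression. The sketch assumes that `_NNN` means an underscore followed by exactly three digits at the end of the folder name; `folder_to_class()` is a hypothetical helper and the folder names are invented examples.

```r
# Hypothetical helper: derive a class name from a class folder name by
# removing a trailing underscore plus three digits, if present.
folder_to_class <- function(folder_names) {
  sub("_[0-9]{3}$", "", folder_names)
}

class_names <- folder_to_class(
  c("Skeletonema_marinoi_001", "Ciliate", "Mesodinium_rubrum_012")
)
class_names  # "Skeletonema_marinoi" "Ciliate" "Mesodinium_rubrum"
```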

The Export SQLite → ZIP and Export SQLite → MATLAB ZIP dialogs include optional README metadata fields:

  • Author
  • Contact e-mail
  • DOI
  • Licence
  • Version
  • Citation
  • Institute

These values are saved in settings and reused for future ZIP exports. Empty fields are omitted from the README.

ZIP export also supports archive splitting via split_zip and max_size (MB), matching iRfcb::ifcb_zip_pngs().

The MATLAB ZIP dialog additionally requires a Features folder (top-level folder containing feature CSV files) and an optional Data folder (raw IFCB data). Both have a “search recursively” checkbox enabled by default so that files in year-based subdirectories are included automatically.

Both ZIP export dialogs include an optional Filter by IFCB dropdown (shown when the database contains samples from more than one instrument). This lets you restrict the export to a single instrument, e.g. to exclude data from a test or development instrument.

Note: External datasets can be viewed without ROI files when extracted PNGs are available under sample-named folders in the ROI/PNG Data Folder. MAT export still requires ADC data (from local files or IFCB Dashboard).

WoRMS (AphiaID) Matching

Class list entries can be linked to WoRMS AphiaIDs in the Class List Editor:

WoRMS match results modal with per-class manual rematch query inputs.

  1. Open Settings > Edit Class List
  2. Click Match WoRMS AphiaID
  3. Review accepted/synonym/unmatched/skipped results
  4. For unresolved classes, edit query fields and click Rematch Unmatched
  5. Click Apply AphiaID Matches

When applied, AphiaID values are saved in SQLite (class_taxonomy table) and shown inline in the class list display as [AphiaID: ...].

Note: AphiaID mappings are database metadata. They are not written into exported class2use.mat / .txt files.

Auto-Sync

Setting | Description
Auto-sync folders on startup | When enabled (default), the app checks and refreshes the file index on launch. Disable for instant startup using the existing cache.

Python Configuration

The Python virtual environment path is configured when launching the app:

run_app(venv_path = "/path/to/your/venv")

The path is remembered for future sessions. Priority order: run_app(venv_path=) argument > saved settings > default (./venv).
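
The priority order can be expressed as a short fallback chain. This is a sketch only: `resolve_venv_path()` is a hypothetical helper, and the `venv_path` key in the saved settings is an assumed name.

```r
# Hypothetical sketch: explicit argument > saved settings > default.
resolve_venv_path <- function(venv_path = NULL, saved_settings = list(),
                              default = "./venv") {
  if (!is.null(venv_path)) return(venv_path)
  if (!is.null(saved_settings$venv_path)) return(saved_settings$venv_path)
  default
}

saved <- list(venv_path = "/home/user/venvs/classipyr")

resolve_venv_path("/path/to/your/venv", saved)  # "/path/to/your/venv"
resolve_venv_path(NULL, saved)                  # "/home/user/venvs/classipyr"
resolve_venv_path()                             # "./venv"
```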

Live Prediction

Setting | Description
Gradio API URL | URL of a Gradio-hosted CNN classification server (e.g. https://irfcb-classify.hf.space)
Prediction Model | CNN model to use for classification. Choices are fetched from the Gradio server.

When both fields are configured, a Predict button appears in the sidebar for Sample Mode. See Live Prediction for usage details.

Classifier Options

Apply classification threshold: When loading classifier output (CSV, H5, or MAT) or live predictions, use the threshold-filtered class predictions (checked) or the raw unthresholded predictions (unchecked).

PNG Export Options

Skip class from PNG export: Optionally exclude a class (e.g. “unclassified”) from the organized PNG output. Set in Settings under the PNG Output Folder section.

ZIP Export Notes

  • ZIP export uses your current SQLite annotations as source.
  • README content is generated from the package template and includes archive provenance:
    • ClassiPyR version used for export
    • ClassiPyR citation text
  • Per-class inventory files are named ecotaxa_<CLASSNAME>.tsv.

MATLAB ZIP Export Notes

  • MATLAB ZIP export bundles .mat annotations, feature CSVs, class2use.mat, and optionally raw data into a single archive suitable for sharing (e.g. for the SMHI IFCB Plankton Image Reference Library).
  • If your storage format is SQLite-only, annotations are converted to .mat files on the fly (requires Python with scipy). When using MAT or Both storage, existing .mat files from the Output Folder are used directly.
  • The class2use.mat config file is generated automatically from the current class list.
  • For details on the archive structure and the underlying iRfcb::ifcb_zip_matlab() function, see the iRfcb image export tutorial.

Statistics and Reporting

Summary Table Tab

Shows class distribution:

  • Class name
  • Image count
  • Average/min/max confidence scores

Validation Statistics Tab

Classification Performance:

  • Total images
  • Correct/incorrect counts
  • Overall accuracy
  • Per-class breakdown

Changes Made:

  • Table of all relabeling actions
  • Original class → New class

Session Cache

The app maintains two types of caches:

In-memory session cache (per session):

  • Switching samples saves work automatically
  • Returning to a sample restores your changes
  • Cache persists until you close the app

Note: Always click Save before closing for permanent storage.

File index cache (persistent on disk):

  • Stores the locations of all IFCB files across your configured folders
  • Persists between sessions for fast startup
  • See File Index Cache for details

Settings Persistence

ClassiPyR stores your settings in a configuration file that follows R standards:

  • Linux: ~/.config/R/ClassiPyR/settings.json
  • macOS: ~/Library/Preferences/org.R-project.R/R/ClassiPyR/settings.json
  • Windows: %APPDATA%/R/config/R/ClassiPyR/settings.json

Settings are loaded automatically when you start the app, so your folder paths, class list location, and Python venv path are remembered between sessions. Settings can be reset by specifying run_app(reset_settings = TRUE).
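
The platform paths listed above match what `tools::R_user_dir()` returns for the `"config"` directory, so the settings file can likely be located from R as sketched below. That the app builds the path exactly this way is an assumption; the function requires R 4.0.0 or later and honours the `R_USER_CONFIG_DIR` and `XDG_CONFIG_HOME` environment variables.

```r
# Likely location of the settings file on the current platform (assumed).
settings_file <- file.path(
  tools::R_user_dir("ClassiPyR", which = "config"),
  "settings.json"
)

settings_file
file.exists(settings_file)  # FALSE until the app has saved settings
```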


Dependencies

ClassiPyR relies on:

  • iRfcb for IFCB data operations (extracting images, reading ADC metadata, reading/writing .mat files, class list handling)
  • RSQLite and DBI for the SQLite annotation database

Optional dependencies:

  • hdf5r for reading HDF5 (.h5) classifier output files. Install with install.packages("hdf5r").

All R dependencies are installed automatically when you install ClassiPyR. Python is only needed for .mat file export.