User Guide

Complete documentation for all ClassiPyR features.

Interface Overview

ClassiPyR interface showing the title bar, sidebar, and main image gallery area.

Title Bar

App name and version
Mode indicator: Shows current state and mode
- No sample loaded: Initial state before selecting a sample
- Validation mode: Shows accuracy percentage
- Annotation mode: Shows progress (X/Y classified)
- Class Review mode: Shows selected class and image counts

Sidebar

Annotator name: Your name for statistics tracking
Settings: Configure folders and options
Sample selection: Year, month, status filters
Navigation: Load, previous, next, random, sync
Cache age: Shows when folders were last scanned
Save button: Manual save trigger
Predict button: One-click CNN classification (when configured)
Clear Annotations: Delete a sample’s annotations from the database (annotation mode only)

Main Area (Tabs)

Image Gallery: View and annotate images
Summary Table: Class distribution statistics
Validation Statistics: Accuracy metrics and change log

Validation vs Annotation Mode

Validation Mode

Activated when loading samples with existing auto-classifications
Original classifications shown with confidence scores
Statistics track how many classifications you have changed
Accuracy percentage calculated

Annotation Mode

Activated for samples without classifications
All images start as “unclassified”
Progress shows classified vs remaining
Validation statistics tab shows annotation progress instead

Switching Between Modes

Any sample with auto-classification data (✓ or ✎✓) shows a mode toggle button in the header bar:

→ Manual (in validation mode): Switch to annotation mode. If manual annotations exist they are loaded; otherwise blank “unclassified” annotations are created.
→ Validation (in annotation mode): Switch back to the original auto-classifications.

Samples with only manual annotations (✎) or no data at all (*) do not show the toggle.

Class Review Mode

Class Review mode lets you view all annotated images of a specific class and reclassify them. There are two sources to choose from: the SQLite Database or an External PNG Folder.

Class Review mode showing all images of a selected class across the database.

Database review

Review all annotated images of a class across the entire database:

Switch to Class Review using the mode toggle in the sidebar
Select Database as the source (default)
The class dropdown is searchable — type to filter. Each class shows its image count in parentheses
Select a class and click Load — images are loaded from ROI files or discovered PNG sample folders across all samples
Review images in the gallery. Each image label shows the full name (sample + ROI) so you can identify which sample it belongs to
Use the same selection and relabeling tools as in sample mode
Click Save Changes to write row-level updates to the database

External folder review

Review and split a folder of PNG images (e.g. from an external classifier or export) into class subfolders:

Switch to Class Review and select External PNG Folder as the source
Browse to the folder containing PNG images
The initial class label defaults to the folder name — change it if needed
Click Load Folder — all PNGs are loaded into the gallery with the initial class label
Select images and relabel them to different classes
Set an Export Folder and click Export Split Folders — images are copied into class-name subfolders

This is useful for sorting or correcting a flat folder of images without importing them into the database.

Key differences from sample mode

Images come from multiple samples (database) or a single folder (external), not a single sample
Image labels show the full image name (e.g. D20230101T120000_IFCB134_00001) instead of just the ROI number (database mode)
Saving performs surgical row-level UPDATEs rather than replacing an entire sample’s annotations (database mode)
The title bar turns purple to indicate class review mode
Navigation buttons (prev/next/random sample) are not available

Live Prediction

The Predict button in the sidebar lets you classify all images in the loaded sample using a remote CNN model, without needing pre-computed classifier files. This uses iRfcb::ifcb_classify_images() to send images to a Gradio-hosted classification model.

Setup

Open Settings (gear icon)
Under Live Prediction, enter a Gradio API URL (e.g. https://irfcb-classify.hf.space)
The Prediction Model dropdown is populated automatically from the server
Select a model and click Save Settings

The Predict button appears in the sidebar once a Gradio URL and model are configured.

Using Predict

Load a sample in Sample Mode (annotated or unannotated)
Click Predict in the sidebar
A progress bar shows per-image classification progress
When complete, the app switches to validation mode with the predicted classes

Behaviour details

Manual labels are preserved: If you have already reclassified some images, those are skipped during prediction. Only unchanged images are sent to the model.
Threshold setting applies: The “Apply classification threshold” setting controls whether thresholded or raw predictions are used, just like for CSV/H5/MAT classifications.
New classes are added automatically: If the model returns class names not in your current class list, they are added so the filter and relabel dropdowns work correctly.
Predictions become the baseline: After prediction, the predicted classes are treated as the “original” classifications for validation statistics.

Working with Images

Image Cards

Each image card displays:

The plankton image
ROI number (in sample mode) or full image name (in class review mode)
Classification score (if available)
Original class (if relabeled)

Border colors:

Default (gray): Unchanged
Yellow: Relabeled in this session
Blue: Currently selected

Image card border colors: gray (unchanged), yellow (relabeled), blue (selected).

Selecting Images

Method	Action
Click	Toggle single image selection
Drag	Draw rectangle to select multiple
Select Page / Select All	First click selects the current page; second click selects all images across all pages
Deselect	Clear all selections

Drag-select: draw a rectangle to select multiple images at once.

Relabeling

Select target images
Choose new class in “Relabel to” dropdown
Click Relabel

The dropdown supports type-ahead search - just start typing the class name.

Measuring Images

The measure tool allows you to measure distances in images.

Using the Measure Tool

Click the Measure button (ruler icon) in the toolbar to activate measure mode
Click and drag on any image to draw a measurement line
The distance is displayed in both micrometers (µm) and pixels
Click elsewhere on the image to clear the measurement
Click the Measure button again to deactivate measure mode

Measure tool showing distance in micrometers and pixels.

Configuring Scale

The default scale is 3.4 pixels per micrometer (standard for IFCB). To adjust:

Open Settings (gear icon)
Find the Pixels per Micrometer field
Enter your instrument’s calibration value
Click Save Settings
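
The conversion behind the displayed distance can be sketched in a few lines of R. This is an illustration only, not the app's own code: pixels_to_um() is a hypothetical helper, and the 2.77 calibration value in the second call is made up for the example.

```r
# Convert a measured length from pixels to micrometers.
# pixels_per_um defaults to 3.4, the default scale stated in this guide.
# pixels_to_um() is a hypothetical helper, not part of ClassiPyR.
pixels_to_um <- function(pixels, pixels_per_um = 3.4) {
  stopifnot(is.numeric(pixels), is.numeric(pixels_per_um), pixels_per_um > 0)
  pixels / pixels_per_um
}

# A 68-pixel line at the default scale: 68 / 3.4 = 20 micrometers
print(pixels_to_um(68))

# The same line with a different (made-up) calibration value
print(pixels_to_um(68, pixels_per_um = 2.77))
```

A larger pixels-per-micrometer value means each pixel covers less physical distance, so the same line in pixels corresponds to a shorter length in micrometers.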

Classification Sources

ClassiPyR supports multiple classification input formats. When multiple formats exist for the same sample, the priority is: CSV > H5 > MAT. Samples without any pre-computed classifications can be classified on the fly using Live Prediction.

All formats (including live predictions) support a classification threshold option (configurable in Settings under the Classification Folder). When enabled, predictions below the confidence threshold are shown as “unclassified”; when disabled, the raw (unthresholded) class prediction is used.
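
As a rough illustration of what the threshold option means, the sketch below derives a thresholded label from a raw prediction and its score. It assumes a single global threshold of 0.5 and a hypothetical helper apply_threshold(); real classifiers may use per-class thresholds, and ClassiPyR reads the thresholded labels from the classification files rather than computing them this way.

```r
# Example predictions: raw class (no threshold) plus confidence score
preds <- data.frame(
  file_name = c("D20230101T120000_IFCB134_00001.png",
                "D20230101T120000_IFCB134_00002.png"),
  class_name_auto = c("Diatom", "Ciliate"),
  score = c(0.45, 0.87),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the raw class when the score reaches the
# threshold, otherwise fall back to "unclassified"
apply_threshold <- function(class_auto, score, threshold) {
  ifelse(score >= threshold, class_auto, "unclassified")
}

threshold <- 0.5  # assumed value, for illustration only
preds$class_name <- apply_threshold(preds$class_name_auto, preds$score, threshold)
print(preds[, c("class_name_auto", "score", "class_name")])
```

With the option enabled the app would show the class_name column; with it disabled, the class_name_auto column.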

CSV Files

Standard classification CSV output from iRfcb. The CSV file must be named after the sample it describes (e.g., D20230101T120000_IFCB134.csv).

Required columns (exact names):

file_name: Image filename including .png extension (e.g., D20230101T120000_IFCB134_00001.png)
class_name: Predicted class name (threshold-applied)

Optional columns:

score: Classification confidence (0-1)
class_name_auto: Raw class prediction without threshold (used when threshold is disabled)

Minimal example:

file_name,class_name
D20230101T120000_IFCB134_00001.png,Diatom
D20230101T120000_IFCB134_00002.png,Ciliate

Example with confidence scores and raw predictions:

file_name,class_name,class_name_auto,score
D20230101T120000_IFCB134_00001.png,unclassified,Diatom,0.45
D20230101T120000_IFCB134_00002.png,Ciliate,Ciliate,0.87

Different CNN pipelines: If your classifier produces different column names, rename them to file_name and class_name before placing the CSV in the Classification Folder.
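
A minimal sketch of such a rename in base R is shown below. The input column names (image, predicted_label) and the helper rename_for_classipyr() are hypothetical; substitute whatever your classifier actually writes. The example saves to a temporary directory, whereas in practice the file would go in your Classification Folder.

```r
# Hypothetical helper: rename two columns to the names ClassiPyR expects
rename_for_classipyr <- function(df, file_col, class_col) {
  stopifnot(file_col %in% names(df), class_col %in% names(df))
  names(df)[names(df) == file_col] <- "file_name"
  names(df)[names(df) == class_col] <- "class_name"
  df
}

# Stand-in for output from another pipeline (normally read with read.csv())
raw <- data.frame(
  image = c("D20230101T120000_IFCB134_00001.png",
            "D20230101T120000_IFCB134_00002.png"),
  predicted_label = c("Diatom", "Ciliate"),
  stringsAsFactors = FALSE
)

fixed <- rename_for_classipyr(raw, file_col = "image", class_col = "predicted_label")

# The CSV must be named after the sample it describes
out_file <- file.path(tempdir(), "D20230101T120000_IFCB134.csv")
write.csv(fixed, out_file, row.names = FALSE)
print(names(fixed))
```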

Files are looked up from the file index cache (see File Index Cache below).

HDF5 Classifier Output

Files matching the *_class*.h5 pattern, produced by iRfcb (>= 0.8.0). Each file contains:

roi_numbers: ROI identifiers
class_name: Predicted class with threshold applied
class_name_auto: Predicted class without threshold
output_scores: Per-class confidence scores
class_labels: All possible class names
classifier_name: Name of the classifier model

Requires the optional hdf5r package. Install with install.packages("hdf5r").

MATLAB Classifier Output

Files matching the *_class*.mat pattern, produced by ifcb-analysis. Each file contains:

roinum: ROI numbers
TBclass_above_threshold: Predicted class with threshold
TBclass: Predicted class without threshold

Existing Annotations

Previously saved annotations (in SQLite database or .mat files in the output folder) are automatically detected and can be resumed. When both exist, the SQLite version is loaded (faster).

File Index Cache

To avoid slow startup from scanning large folder hierarchies, ClassiPyR maintains a file index cache on disk. The cache stores the locations of all ROI, classification, and annotation files found in your configured folders.

How it Works

On first launch (or after changing folder paths in Settings), the app scans all configured folders and saves the results to a JSON cache file
On subsequent launches, the app loads the cached index instantly instead of re-scanning
The cache is stored alongside your settings in the platform config directory (see Settings Persistence)

Sync Button

The Sync button (circular arrow icon) in the sidebar navigation row triggers a manual rescan of all folders. Use this when:

You’ve added new IFCB data files to your folders
The sample dropdown seems out of date
You want to force a fresh scan

The cache age indicator below the navigation buttons shows when the folders were last scanned (e.g. “synced just now”, “synced 2 hours ago”).

Auto-Sync

By default, the app checks whether the cache matches your current folder settings on startup and rescans automatically if needed. You can disable auto-sync in Settings to always load from the existing cache, which provides the fastest possible startup.

Headless Rescan

You can update the file index cache without launching the app using rescan_file_index(). This is useful for scheduled updates (e.g. cron jobs) on servers where new data arrives regularly:

# Rescan using saved settings
ClassiPyR::rescan_file_index()

# Or specify folder paths explicitly
ClassiPyR::rescan_file_index(
  roi_folder = "/data/ifcb/raw",
  csv_folder = "/data/ifcb/classified",
  output_folder = "/data/ifcb/manual"
)

Output Files

When you save, the app creates files based on your chosen storage format (configurable in Settings).

SQLite Database (default)

db_folder/annotations.sqlite

A single SQLite database file containing annotations for all samples. This is the default storage backend:

No Python dependency required
Fast read/write performance
Single file for all samples — easy to back up and manage
Contains annotations table (one row per ROI) and class_lists table (preserves class indices for .mat export)

The database is stored in a separate Database Folder (configurable in Settings), which defaults to a local user-level directory (tools::R_user_dir("ClassiPyR", "data")). This separation ensures the SQLite database stays on a local filesystem even when the Output Folder is on a network drive.

Note: The SQLite database must be on a local drive. SQLite file locking is unreliable on network filesystems (NFS/SMB), which can lead to database corruption. For multi-user workflows, each annotator should use their own local Database Folder.
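
To inspect the database outside the app, a read-only connection through DBI and RSQLite is one option. The sketch below only lists tables and their columns, since the column layout is not described in this guide. describe_annotation_db() is a hypothetical helper, and the path assumes the default Database Folder has not been changed in Settings.

```r
library(DBI)

# Hypothetical helper: return a named list of column names, one entry per table
describe_annotation_db <- function(db_path) {
  # Refuse to continue if the file is missing, so no empty database is created
  stopifnot(file.exists(db_path))
  # Open read-only so the annotations cannot be modified by accident
  con <- dbConnect(RSQLite::SQLite(), db_path, flags = RSQLite::SQLITE_RO)
  on.exit(dbDisconnect(con))
  tables <- dbListTables(con)
  setNames(lapply(tables, function(tbl) dbListFields(con, tbl)), tables)
}

# Assumed default location; adjust if you changed the Database Folder
db_path <- file.path(tools::R_user_dir("ClassiPyR", "data"), "annotations.sqlite")
if (file.exists(db_path)) {
  str(describe_annotation_db(db_path))
}
```

Close the app before running ad hoc queries if you want to be sure you are seeing the most recently saved state.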

Annotation MAT File (optional)

output_folder/[sample_name].mat

MATLAB-compatible format with:

classlist: ROI numbers and class indices
Compatible with ifcb-analysis toolbox

Note: Saving MAT files requires Python with scipy. Enable in Settings > Annotation Storage by selecting “MAT file” or “Both”.

Statistics Files

output_folder/validation_statistics/[sample_name]_validation_stats.csv

Summary: total, correct, incorrect, accuracy

output_folder/validation_statistics/[sample_name]_validation_detailed.csv

Per-image: original class, validated class, correct flag
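
The relationship between the per-image file and the summary file can be sketched as follows. The column names original_class and validated_class and the helper summarise_validation() are assumptions made for this example, and the actual CSV headers may differ. Accuracy is expressed here as a fraction between 0 and 1.

```r
# Stand-in for the per-image table (hypothetical column names)
detailed <- data.frame(
  original_class  = c("Diatom", "Ciliate", "Diatom", "Dinoflagellate"),
  validated_class = c("Diatom", "Ciliate", "Ciliate", "Dinoflagellate"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: an image counts as correct when the validated class
# equals the original auto-classification
summarise_validation <- function(original, validated) {
  correct <- original == validated
  data.frame(
    total = length(correct),
    correct = sum(correct),
    incorrect = sum(!correct),
    accuracy = mean(correct)
  )
}

stats <- summarise_validation(detailed$original_class, detailed$validated_class)
print(stats)
```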

Organized PNGs

png_output_folder/[class_name]/[image_files]

Images organized into class folders for training CNN models or other classifiers.
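
Before training on an exported folder it can help to check how many images each class contains. The base R sketch below counts PNG files per class subfolder. count_pngs_per_class() is a hypothetical helper, and the demo folder is created in a temporary directory purely for illustration.

```r
# Hypothetical helper: count PNG files in each immediate subfolder of png_root
count_pngs_per_class <- function(png_root) {
  class_dirs <- list.dirs(png_root, full.names = TRUE, recursive = FALSE)
  counts <- vapply(class_dirs, function(d) {
    length(list.files(d, pattern = "\\.png$", ignore.case = TRUE))
  }, integer(1))
  names(counts) <- basename(class_dirs)
  counts
}

# Build a small demo layout: png_output_folder/[class_name]/[image_files]
demo_root <- file.path(tempdir(), "png_output_demo")
dir.create(file.path(demo_root, "Diatom"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_root, "Ciliate"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo_root, "Diatom", c("a.png", "b.png"))))
invisible(file.create(file.path(demo_root, "Ciliate", "c.png")))

counts <- count_pngs_per_class(demo_root)
print(counts)
```

Point png_root at your own PNG Output Folder to get the real class distribution.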

Settings Reference

Data Source

ClassiPyR supports two data source modes, selectable in Settings:

Mode	Description
Local Folders (default)	Read IFCB data from local ROI/ADC/HDR files
IFCB Dashboard	Connect to a remote IFCB Dashboard instance

IFCB Dashboard Mode

When “IFCB Dashboard” is selected, enter a Dashboard URL such as:

https://habon-ifcb.whoi.edu/timeline?dataset=tangosund — a specific dataset (recommended)
https://habon-ifcb.whoi.edu/ — all datasets on the dashboard

Tip: Always include ?dataset= in the URL when working with large Dashboard instances. Dashboards like habon-ifcb.whoi.edu host hundreds of thousands of samples across many datasets — loading all of them at once will be very slow and may cause the interface to become unresponsive. Specifying a dataset keeps the sample list manageable and ensures faster startup.

The app fetches the sample list from the Dashboard API. When you load a sample, PNG images are downloaded from the dashboard and cached locally at tools::R_user_dir("ClassiPyR", "cache")/dashboard/. ADC files are downloaded on demand for image dimensions and MAT export.

Dashboard Setting	Description
Dashboard URL	The full URL of the IFCB Dashboard (with optional `?dataset=` parameter)
Use dashboard auto-classifications	When checked, downloads the dashboard’s `_class_scores.csv` for validation mode
Advanced Download Settings	Parallel downloads, sleep time, timeout, and max retries for dashboard downloads

The Classification Folder setting is available in both local and dashboard mode. In dashboard mode, the classification source depends on the “Use dashboard auto-classifications” checkbox:

When “Use dashboard auto-classifications” is disabled (default):

Database annotations (manual, existing)
Local classification files (CSV > H5 > MAT) — if Classification Folder is configured
New annotation mode

When “Use dashboard auto-classifications” is enabled:

Database annotations (manual, existing)
Dashboard auto-classifications (downloaded _class_scores.csv)
New annotation mode

Local classification files and dashboard auto-classifications are mutually exclusive — the checkbox determines which source is used.

Note: In dashboard mode, the ROI/PNG Data Folder setting is not used. The Output Folder, Database Folder, and PNG Output Folder still apply for saving annotations and exports.

Folder Paths

Setting	Description
Classification Folder	Source of CSV/H5/MAT classifications (both local and dashboard mode)
ROI/PNG Data Folder	IFCB raw files (ROI/ADC/HDR) or extracted PNG sample folders (local mode only)
Output Folder	Where MAT files and statistics go (can be on a network drive)
Database Folder	Where the SQLite database is stored (must be a local drive)
PNG Output Folder	Where organized images go

Folder paths are configured using a web-based folder browser that works on all platforms (Linux, macOS, Windows). Changing folder paths in Settings automatically invalidates the file index cache, triggering a fresh scan.

Annotation Storage

Format	Description
SQLite (recommended)	Default. Stores annotations in `annotations.sqlite` in the Database Folder. No Python needed.
MAT file	MATLAB-compatible `.mat` files for ifcb-analysis. Requires Python with scipy.
Both	Writes to both SQLite and `.mat` for maximum compatibility.

Below the format selector, the following buttons allow bulk import and export between formats:

Import .mat → SQLite: Imports all .mat annotation files from the output folder into the SQLite database. Already-imported samples are skipped.
Import PNG → SQLite: Imports annotations from a folder of PNG images organized in class-name subfolders. Class names are extracted from folder names (trailing _NNN suffixes are stripped). Useful for re-importing corrected exports or importing external classification datasets.
Export SQLite → .mat: Exports all annotated samples from the database to .mat files in the specified Output Folder. Requires Python with scipy.
Export SQLite → PNG: Extracts annotated images from ROI files into class-name subfolders in the PNG Output Folder. Useful for building training datasets for CNN classifiers.
Export SQLite → ZIP: Builds an EcoTaxa-ready ZIP archive from SQLite annotations that is also suitable for sharing datasets in general repositories (for example Zenodo or Figshare). The export writes class-organized PNGs, per-class inventories (ecotaxa_<CLASSNAME>.tsv), and a README file.
Export SQLite → MATLAB ZIP: Builds a MATLAB-format ZIP archive via iRfcb::ifcb_zip_matlab(), bundling .mat annotation files, feature CSVs, a class2use.mat config file, optional raw data (.roi, .adc, .hdr), and README files. When using SQLite-only storage the annotations are automatically converted to temporary .mat files (requires Python with scipy). See the iRfcb image export tutorial for more details on the MATLAB ZIP format.
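
The folder-name rule used by Import PNG → SQLite can be illustrated with a short regular expression. This sketch assumes that _NNN means exactly three trailing digits. The helper class_from_folder() and the folder names are made up for the example, and the app's own matching may differ.

```r
# Hypothetical helper: derive a class name from a class subfolder path by
# removing a trailing underscore followed by exactly three digits
class_from_folder <- function(folder) {
  sub("_[0-9]{3}$", "", basename(folder))
}

folders <- c("export/Diatom_001", "export/Ciliate", "export/Mesodinium_rubrum_012")
class_names <- class_from_folder(folders)
print(class_names)
```

Underscores inside a class name are left untouched; only the final numeric suffix is removed.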

The Export SQLite → ZIP and Export SQLite → MATLAB ZIP dialogs include optional README metadata fields:

Author
Contact e-mail
DOI
Licence
Version
Citation
Institute

These values are saved in settings and reused for future ZIP exports. Empty fields are omitted from the README.

ZIP export also supports archive splitting via split_zip and max_size (MB), matching iRfcb::ifcb_zip_pngs().

The MATLAB ZIP dialog additionally requires a Features folder (top-level folder containing feature CSV files) and an optional Data folder (raw IFCB data). Both have a “search recursively” checkbox enabled by default so that files in year-based subdirectories are included automatically.

Both ZIP export dialogs include an optional Filter by IFCB dropdown (shown when the database contains samples from more than one instrument). This lets you restrict the export to a single instrument, e.g. to exclude data from a test or development instrument.

Note: External datasets can be viewed without ROI files when extracted PNGs are available under sample-named folders in the ROI/PNG Data Folder. MAT export still requires ADC data (from local files or IFCB Dashboard).

WoRMS (AphiaID) Matching

Class list entries can be linked to WoRMS AphiaIDs in the Class List Editor:

WoRMS match results modal with per-class manual rematch query inputs.

Open Settings → Edit Class List
Click Match WoRMS AphiaID
Review accepted/synonym/unmatched/skipped results
For unresolved classes, edit query fields and click Rematch Unmatched
Click Apply AphiaID Matches

When applied, AphiaID values are saved in SQLite (class_taxonomy table) and shown inline in the class list display as [AphiaID: ...].

Note: AphiaID mappings are database metadata. They are not written into exported class2use.mat / .txt files.

Auto-Sync

Setting	Description
Auto-sync folders on startup	When enabled (default), the app checks and refreshes the file index on launch. Disable for instant startup using the existing cache.

Python Configuration

The Python virtual environment path is configured when launching the app:

run_app(venv_path = "/path/to/your/venv")

The path is remembered for future sessions. Priority order: run_app(venv_path=) argument > saved settings > default (./venv).
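
The priority order can be pictured as a simple fallback chain. The helper below is a hypothetical illustration of that logic, not ClassiPyR's internal code.

```r
# Hypothetical helper: pick the first available path in priority order
# 1. explicit run_app(venv_path = ) argument, 2. saved settings, 3. default
resolve_venv_path <- function(arg = NULL, saved = NULL, default = "./venv") {
  if (!is.null(arg)) return(arg)
  if (!is.null(saved)) return(saved)
  default
}

print(resolve_venv_path(arg = "/opt/venvs/ifcb", saved = "/home/me/venv"))
print(resolve_venv_path(saved = "/home/me/venv"))
print(resolve_venv_path())
```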

Live Prediction

Setting	Description
Gradio API URL	URL of a Gradio-hosted CNN classification server (e.g. `https://irfcb-classify.hf.space`)
Prediction Model	CNN model to use for classification. Choices are fetched from the Gradio server.

When both fields are configured, a Predict button appears in the sidebar for Sample Mode. See Live Prediction for usage details.

Classifier Options

Apply classification threshold: When loading classifier output (CSV, H5, or MAT) or live predictions, use the threshold-filtered class predictions (checked) or the raw unthresholded predictions (unchecked).

PNG Export Options

Skip class from PNG export: Optionally exclude a class (e.g. “unclassified”) from the organized PNG output. Set in Settings under the PNG Output Folder section.

ZIP Export Notes

ZIP export uses your current SQLite annotations as source.
README content is generated from the package template and includes archive provenance:
- ClassiPyR version used for export
- ClassiPyR citation text
Per-class inventory files are named ecotaxa_<CLASSNAME>.tsv.

MATLAB ZIP Export Notes

MATLAB ZIP export bundles .mat annotations, feature CSVs, class2use.mat, and optionally raw data into a single archive suitable for sharing (e.g. for the SMHI IFCB Plankton Image Reference Library).
If your storage format is SQLite-only, annotations are converted to .mat files on the fly (requires Python with scipy). When using MAT or Both storage, existing .mat files from the Output Folder are used directly.
The class2use.mat config file is generated automatically from the current class list.
For details on the archive structure and the underlying iRfcb::ifcb_zip_matlab() function, see the iRfcb image export tutorial.

Statistics and Reporting

Summary Table Tab

Shows class distribution:

Class name
Image count
Average/min/max confidence scores

Validation Statistics Tab

Classification Performance:

Total images
Correct/incorrect counts
Overall accuracy
Per-class breakdown

Changes Made:

Table of all relabeling actions
Original class → New class

Session Cache

The app maintains two types of caches:

In-memory session cache (per session):

Switching samples saves work automatically
Returning to a sample restores your changes
Cache persists until you close the app

Note: Always click Save before closing for permanent storage.

File index cache (persistent on disk):

Stores the locations of all IFCB files across your configured folders
Persists between sessions for fast startup
See File Index Cache for details

Settings Persistence

ClassiPyR stores your settings in a JSON configuration file in the standard per-user R configuration directory for your platform:

Linux: ~/.config/R/ClassiPyR/settings.json
macOS: ~/Library/Preferences/org.R-project.R/R/ClassiPyR/settings.json
Windows: %APPDATA%/R/config/R/ClassiPyR/settings.json

Settings are loaded automatically when you start the app, so your folder paths, class list location, and Python venv path are remembered between sessions. Settings can be reset by specifying run_app(reset_settings = TRUE).
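
To find the settings file on your own machine, you can ask R for the per-user configuration directory. The paths listed above appear to match what tools::R_user_dir() returns for which = "config" (R 4.0 or later). That ClassiPyR resolves the location in exactly this way is an assumption, and environment variables such as R_USER_CONFIG_DIR can change the result.

```r
# Per-user configuration directory for the package (platform dependent)
config_dir <- tools::R_user_dir("ClassiPyR", which = "config")

# Expected location of the settings file inside that directory
settings_file <- file.path(config_dir, "settings.json")

print(settings_file)
print(file.exists(settings_file))  # FALSE until settings have been saved once
```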

Dependencies

ClassiPyR relies on:

iRfcb for IFCB data operations (extracting images, reading ADC metadata, reading/writing .mat files, class list handling)
RSQLite and DBI for the SQLite annotation database

Optional dependencies:

hdf5r for reading HDF5 (.h5) classifier output files. Install with install.packages("hdf5r").

All required R dependencies are installed automatically when you install ClassiPyR; the optional hdf5r package must be installed separately. Python is only needed for saving or exporting .mat files.
