Setting Up a New Loader

The TAVI framework provides a plugin-based architecture for handling different scan file formats through the Loader system. This guide explains how to create and register a new loader so that the LoaderRegistry picks it up automatically and the RawScanClassifier flow can use it.

Overview

The loader system consists of three main components:

  1. LoaderInterface: Abstract interface that all loaders must implement (Step 1 does this by subclassing AbstractLoader)

  2. LoaderRegistry: Singleton that manages and coordinates all loaders

  3. RawScanClassifier: Uses the registry to automatically classify scan file types

The classification flow works as follows:

File Path → RawScanClassifier → LoaderRegistry → All Loaders
                                     ↓
                    Each loader scores the file
                                     ↓
                    Highest scoring loader wins
                                     ↓
                               File Type Identified

Step 1: Create Your Loader Class

Create a new Python file in the src/tavi/library/storage/loader/ directory. Your loader must inherit from AbstractLoader.

Example: Creating a loader for a hypothetical “MyFormat” scan type

"""MyFormat scan file loader."""

from typing import Any

from tavi.library.data.enum.raw_scan_type import RawScanType
from tavi.library.data.scan import Scan, ScanData, ScanMetadata
from tavi.library.storage.interface.file_store_interface import FileStoreInterface
from tavi.library.storage.loader.interface.base import AbstractLoader


class MyFormatLoader(AbstractLoader):
    """Loader for MyFormat scan files."""

    def __init__(self, filestore: FileStoreInterface) -> None:
        """Initialize MyFormat loader."""
        super().__init__(filestore)

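    def load(self, path: str) -> Scan:
        """Load scan data from file."""
        # Implement your file loading logic; until then, fail loudly
        # instead of silently returning None.
        raise NotImplementedError("MyFormatLoader.load is not implemented yet")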

    def get_scan_type(self) -> RawScanType:
        """Get scan type identifier."""
        # Must match the enum value you add in Step 2
        return RawScanType.MyFormat

    def get_score(self, path: str) -> float:
        """
        Score how confident this loader is for a given file.

        Return a score between 0.0 and 1.0 where:
        - 0.0: This loader cannot handle this file
        - 0.01-0.5: Maybe this loader can handle it
        - 0.51-1.0: Strongly confident this loader can handle it

        The loader with the highest score will be selected.
        """
        # Implement your file format detection logic
        if path.endswith('.myformat'):
            return 1.0  # Strong match
        return 0.0

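    def parse_metadata(self, path: str) -> ScanMetadata:
        """Parse metadata from the file."""
        raise NotImplementedError  # Replace with your metadata parsing

    def parse_scan_values(self, path: str) -> ScanData:
        """Parse scan data values from the file."""
        raise NotImplementedError  # Replace with your value parsing

    def parse_external_metadata(self, path: str) -> dict[str, Any]:
        """Parse any external metadata associated with the file."""
        raise NotImplementedError  # Replace with your external metadata parsing

    def adapt_scan_data(self, meta: ScanMetadata, values: ScanData) -> Scan:
        """Combine metadata and values into a Scan object."""
        raise NotImplementedError  # Replace with your Scan construction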
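
The parsing methods above are left for you to fill in. As a minimal sketch of what that logic could look like, the block below parses an imaginary text format ("# key = value" header lines followed by rows of numbers) and raises an error that names the file and line when something is wrong. Everything in it is an assumption made for illustration: ToyScan, ScanParseError, parse_text, and the format itself are not part of TAVI, and a real loader would return the framework's Scan, ScanMetadata, and ScanData objects and read through its filestore.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScan:
    """Stand-in for the framework's Scan object (hypothetical)."""

    metadata: dict[str, str] = field(default_factory=dict)
    rows: list[list[float]] = field(default_factory=list)


class ScanParseError(ValueError):
    """Raised when a file cannot be parsed; the message names file and line."""


def parse_text(text: str, source: str = "<memory>") -> ToyScan:
    """Parse '# key = value' header lines and whitespace-separated numeric rows."""
    scan = ToyScan()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # Skip blank lines
        if line.startswith("#"):
            key, sep, value = line[1:].partition("=")
            if not sep:
                raise ScanParseError(
                    f"{source}:{lineno}: expected '# key = value', got {raw!r}"
                )
            scan.metadata[key.strip()] = value.strip()
            continue
        try:
            scan.rows.append([float(token) for token in line.split()])
        except ValueError as exc:
            raise ScanParseError(
                f"{source}:{lineno}: non-numeric value in {raw!r}"
            ) from exc
    return scan


scan = parse_text("# instrument = demo\n1 2.5\n3 4\n", source="demo.myformat")
```

Keeping the parsing in a function that takes text rather than a path makes it easy to test without touching the file system.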

Step 2: Add Enum Value for Your Format

Add your format to the RawScanType enum in src/tavi/library/data/enum/raw_scan_type.py:

"""Enumeration of raw scan types."""

from enum import StrEnum


class RawScanType(StrEnum):
    """Enumeration of supported raw scan file types."""

    ORNLSpice = "ORNLSpice"
    MyFormat = "MyFormat"  # Add your format here
    NONE = "None"

Step 3: Register Your Loader

Register your loader in the LoaderRegistry singleton located at src/tavi/library/storage/loader/loader_registry.py:

"""Loader registry singleton."""

from neutrons_standard.decorators.singleton import Singleton

from tavi.library.storage.interface.filestore_interface import Filestore
from tavi.library.storage.loader.default_loader import DefaultLoader
from tavi.library.storage.loader.interface.base import AbstractLoader
from tavi.library.storage.loader.my_format_loader import MyFormatLoader  # Add import
from tavi.library.storage.loader.ornl_spice_loader import ORNLSpiceLoader


@Singleton
class LoaderRegistry:
    """Registry for managing loaders."""

    def __init__(self, filestore: Filestore) -> None:
        """Initialize registry with filestore."""
        self.registry: dict[str, AbstractLoader] = {}
        self.set_filestore(filestore)

        # Register loaders in order of priority (highest first)
        self._register_loader(ORNLSpiceLoader(self.filestore))
        self._register_loader(MyFormatLoader(self.filestore))  # Add your loader
        self._register_loader(DefaultLoader(self.filestore))

    # ... rest of the implementation

Important: Register more specific loaders before more general ones. The DefaultLoader should always be last because it returns a score of 0.0 for every file and serves only as the fallback.
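
The body of _register_loader is not shown in this guide. As a rough sketch of one way a registry like this could behave, assuming loaders are keyed by their scan type string (the dict[str, AbstractLoader] annotation hints at this but does not confirm it), the toy classes below register loaders in order and reject duplicates. ToyLoader, ToyRegistry, and get_loaders are hypothetical stand-ins, not the TAVI classes.

```python
class ToyLoader:
    """Stand-in for an AbstractLoader subclass (hypothetical)."""

    def __init__(self, scan_type: str) -> None:
        self.scan_type = scan_type

    def get_scan_type(self) -> str:
        return self.scan_type


class ToyRegistry:
    """Keeps loaders keyed by scan type, in registration order."""

    def __init__(self, loaders: list[ToyLoader]) -> None:
        self.registry: dict[str, ToyLoader] = {}
        for loader in loaders:
            self._register_loader(loader)

    def _register_loader(self, loader: ToyLoader) -> None:
        key = loader.get_scan_type()
        if key in self.registry:
            # Assumption: a second loader for the same type is a mistake.
            raise ValueError(f"A loader for {key!r} is already registered")
        self.registry[key] = loader

    def get_loaders(self) -> list[ToyLoader]:
        # Dicts preserve insertion order, so this is registration order.
        return list(self.registry.values())


registry = ToyRegistry(
    [ToyLoader("ORNLSpice"), ToyLoader("MyFormat"), ToyLoader("None")]
)
registered_order = [loader.get_scan_type() for loader in registry.get_loaders()]
```

Because the dictionary keeps insertion order, the fallback loader registered last is also the last one any consumer of the registry sees.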

Step 4: Test Your Loader

Create tests to verify your loader:

  1. Scoring: Test that get_score() returns appropriate scores for your format

  2. Loading: Test that load() correctly reads and parses files

  3. Classification: Test that RawScanClassifier correctly identifies your file type

Example test:

from tavi.backend.classification.raw_scan_classifier import RawScanClassifier
from tavi.library.data.enum.raw_scan_type import RawScanType


def test_myformat_classification():
    classifier = RawScanClassifier()
    result = classifier.get_classification("path/to/file.myformat")
    assert result == RawScanType.MyFormat
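
The scoring tests from item 1 can follow a simple table-driven pattern. The sketch below uses a stand-in get_score function that mirrors the extension check from Step 1, so it runs without the framework; in a real test you would call MyFormatLoader.get_score on a loader built with whatever filestore fixture your test suite provides (that fixture is an assumption, since it is not described in this guide).

```python
def get_score(path: str) -> float:
    """Stand-in mirroring the extension check in MyFormatLoader.get_score."""
    return 1.0 if path.endswith(".myformat") else 0.0


# (path, expected score) pairs, including near misses that must not match.
SCORE_CASES = [
    ("scan.myformat", 1.0),
    ("scan.dat", 0.0),
    ("myformat", 0.0),
    ("scan.myformat.bak", 0.0),
]


def test_get_score_matches_expected() -> None:
    """Each path gets exactly the score listed in SCORE_CASES."""
    for path, expected in SCORE_CASES:
        assert get_score(path) == expected, f"unexpected score for {path!r}"


def test_get_score_stays_in_range() -> None:
    """Scores never leave the documented 0.0 to 1.0 range."""
    for path, _ in SCORE_CASES:
        assert 0.0 <= get_score(path) <= 1.0
```

Near-miss cases such as "scan.myformat.bak" are worth including because they catch overly loose matching.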

How Classification Works

When RawScanClassifier.get_classification(file_path) is called:

  1. It retrieves all registered loaders from LoaderRegistry

  2. It calls get_score(file_path) on each loader

  3. It tracks which loader returned the highest score

  4. It returns the get_scan_type() of the winning loader

  5. If no loader scores above 0, it returns RawScanType.NONE (from DefaultLoader)

# Example with multiple loaders
classifier = RawScanClassifier()

# LoaderRegistry contains: [ORNLSpiceLoader, MyFormatLoader, DefaultLoader]
# For "scan.myformat":
#   ORNLSpiceLoader.get_score("scan.myformat") → 0.0
#   MyFormatLoader.get_score("scan.myformat") → 1.0
#   DefaultLoader.get_score("scan.myformat") → 0.0
# Winner: MyFormatLoader
file_type = classifier.get_classification("scan.myformat")
# Result: RawScanType.MyFormat
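
The selection steps above can be sketched as a single loop. This is not the RawScanClassifier source; it is a minimal illustration under two assumptions that this guide does not confirm: the classifier only replaces the current winner on a strictly higher score (so ties go to the loader registered first), and the fallback type is used when nothing scores above 0. FixedScoreLoader and classify are hypothetical names.

```python
class FixedScoreLoader:
    """Toy loader that returns a fixed score for paths with a given suffix."""

    def __init__(self, scan_type: str, suffix: str, score: float) -> None:
        self.scan_type = scan_type
        self.suffix = suffix
        self.score = score

    def get_score(self, path: str) -> float:
        return self.score if path.endswith(self.suffix) else 0.0

    def get_scan_type(self) -> str:
        return self.scan_type


def classify(path: str, loaders: list[FixedScoreLoader], fallback: str = "None") -> str:
    """Return the scan type of the highest-scoring loader, or the fallback."""
    best_type = fallback
    best_score = 0.0
    for loader in loaders:
        score = loader.get_score(path)
        # Strict comparison: on a tie, the earlier loader keeps the win.
        if score > best_score:
            best_score = score
            best_type = loader.get_scan_type()
    return best_type


loaders = [
    FixedScoreLoader("ORNLSpice", ".dat", 0.9),
    FixedScoreLoader("MyFormat", ".myformat", 1.0),
]
file_type = classify("scan.myformat", loaders)
```

If ties really are resolved this way, it would explain why registration order matters even though the highest score wins.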

Best Practices

  1. Implement Robust Scoring: Your get_score() method should be fast and use multiple heuristics, returning a float between 0.0 and 1.0:

    • Check file extension

    • Validate magic bytes (file header)

    • Check for format-specific markers

    • Examine file structure

  2. Handle Edge Cases: Return 0.0 for files you can’t handle, not negative scores

  3. Document Your Format: Add docstrings explaining what file formats your loader supports

  4. Fail Gracefully: If file parsing fails in load(), raise clear exceptions with context

  5. Keep Score Ranges Consistent:

    • 0.0 = definitely not your format

    • 0.01-0.5 = uncertain

    • 0.51-1.0 = confident match
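
To make the scoring advice concrete, here is a minimal sketch that combines two heuristics, the file extension and a file signature, and maps them onto the score ranges above. The signature MYFMT1 and the specific score values are invented for this example, and the sketch reads the local file directly; a real loader would presumably go through its filestore instead.

```python
MAGIC = b"MYFMT1"  # Hypothetical file signature, invented for this sketch


def score_myformat(path: str) -> float:
    """Score a path using its extension and the first bytes of the file."""
    has_extension = path.lower().endswith(".myformat")
    has_magic = False
    try:
        # Read only the first few bytes so scoring stays fast on large files.
        with open(path, "rb") as handle:
            has_magic = handle.read(len(MAGIC)) == MAGIC
    except OSError:
        # Missing or unreadable file: fall back to the extension alone.
        pass

    if has_extension and has_magic:
        return 1.0  # Both heuristics agree: confident match
    if has_magic:
        return 0.7  # Right content, unexpected name: still confident
    if has_extension:
        return 0.3  # Right name, unverified content: uncertain
    return 0.0  # Definitely not this format
```

Returning fixed values per case, rather than adding up weights, keeps the result easy to reason about and guarantees it never leaves the 0.0 to 1.0 range.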

Example: Real-World Implementation

The ORNLSpiceLoader demonstrates a production-ready implementation:

class ORNLSpiceLoader(AbstractLoader):
    """Loader for ORNL Spice format scan files."""

    def __init__(self, filestore: FileStoreInterface) -> None:
        """Initialize ORNL Spice loader with classifier."""
        super().__init__(filestore)
        self.classifier = RuleBasedClassifier()
        self.classification_rules = ORNLSpiceRuleSet()

    def get_score(self, path: str) -> float:
        """Get score for scan using rule-based classification."""
        # Uses a dedicated RuleBasedClassifier for intelligent scoring
        return self.classifier.get_score(path, self.classification_rules)

This example shows how you can:

  • Use helper classifiers for complex format detection

  • Leverage rule sets for sophisticated scoring logic

  • Maintain clean separation of concerns
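
The internals of RuleBasedClassifier and ORNLSpiceRuleSet are not shown here, so the following is only a sketch of the general idea of rule-based scoring, not their actual API. It assumes each rule is a weighted yes/no check on the start of a file and that the score is the matched share of the total weight. The Rule class, rule_score function, and marker strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """One weighted check against the start of a file (hypothetical)."""

    name: str
    weight: float  # Assumed to be non-negative
    matches: Callable[[str], bool]


def rule_score(header: str, rules: list[Rule]) -> float:
    """Return the matched share of total rule weight, between 0.0 and 1.0."""
    total = sum(rule.weight for rule in rules)
    if total <= 0:
        return 0.0  # No usable rules: cannot claim the file
    matched = sum(rule.weight for rule in rules if rule.matches(header))
    return matched / total


# Invented markers for an imaginary text format; replace with real ones.
EXAMPLE_RULES = [
    Rule("starts with comment", 1.0, lambda text: text.startswith("#")),
    Rule("has scan marker", 2.0, lambda text: "scan =" in text),
    Rule("has column list", 1.0, lambda text: "columns:" in text),
]

score = rule_score("# scan = 12\n# date = today\n", EXAMPLE_RULES)
```

Normalizing by the total weight keeps the score in range no matter how many rules a rule set contains.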

Troubleshooting

My loader never gets selected
  • Check that get_score() returns a higher score than every other loader for your file type, while staying between 0.0 and 1.0

  • Verify your loader is registered in LoaderRegistry.__init__()

  • Ensure get_scan_type() returns the correct enum value
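
When it is unclear why another loader wins, printing every loader's score for the problem file usually settles it. The helper below is a small diagnostic sketch, not part of TAVI: score_table and out_of_range are hypothetical names, the two loader classes are stand-ins, and how you obtain the list of registered loaders from LoaderRegistry is left open because that accessor is not shown in this guide.

```python
class AlwaysZeroLoader:
    """Stand-in for a fallback loader that never claims a file."""

    def get_score(self, path: str) -> float:
        return 0.0


class ExtensionLoader:
    """Stand-in for a loader that matches on the .myformat extension."""

    def get_score(self, path: str) -> float:
        return 1.0 if path.endswith(".myformat") else 0.0


def score_table(path: str, loaders: list) -> list[tuple[str, float]]:
    """Return (loader class name, score) pairs, highest score first."""
    rows = [(type(loader).__name__, float(loader.get_score(path))) for loader in loaders]
    # sorted() is stable, so equal scores stay in registration order.
    return sorted(rows, key=lambda row: row[1], reverse=True)


def out_of_range(rows: list[tuple[str, float]]) -> list[str]:
    """Name the loaders whose score falls outside 0.0 to 1.0."""
    return [name for name, score in rows if not 0.0 <= score <= 1.0]


rows = score_table("scan.myformat", [AlwaysZeroLoader(), ExtensionLoader()])
for name, score in rows:
    print(f"{name:<20} {score:.2f}")
```

If your loader is missing from the table, it was never registered; if it appears with a score of 0.00, the problem is in get_score().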

RawScanType.NONE is returned
  • All loaders returned score 0.0, or no loaders were registered

  • Verify that your loader’s get_score() returns a value greater than 0.0 (and at most 1.0) for files in your format

  • Check that your RawScanType enum value exists

Precommit checks fail
  • Add module and class docstrings to your loader file

  • Add docstrings to all public methods

  • Run ruff format to auto-fix formatting issues

See Also

  • RuleBasedClassifier: For implementing complex classification logic

  • Loader Interface: src/tavi/library/storage/loader/interface/loader_interface.py

  • Existing Implementations: src/tavi/library/storage/loader/ornl_spice_loader.py