File Storage System
The file storage system provides an abstraction for managing file operations, with a local filesystem implementation. This design allows for flexible storage backends while maintaining a consistent interface for file handling, validation, and user data management.
Core Concepts
The storage system distinguishes between two primary concerns:
Interface (
FileStoreInterface) Defines the contract for any file storage implementation.Implementation (
LocalFileStore) Concrete implementation for local filesystem operations.
Architecture Overview
Components:
FileStoreInterface- Abstract base class defining storage contractLocalFileStore- Concrete implementation for local filesystemConfig- Configuration system for runtime settingsPath(pathlib) - Cross-platform path handling
Responsibilities by method:
File Reading/Writing: Read and write text files with automatic creation
File Validation: Check file existence, type, and size constraints
File Inspection: Extract metadata (extension, size)
Directory Scanning: Fetch and filter files from directories
User Data Storage: Write application data to user-specific locations
FileStoreInterface
Abstract base class defining the storage contract.
Methods:
def fetch_files_at(path: str) -> list[str]
"""Fetch all valid files from a directory."""
def validate_file(file_path: str) -> bool
"""Validate if a file meets storage requirements."""
def write_user_data_file(file_subpath: str, value: str) -> None
"""Write application data to user home directory."""
def write_text_file(file_path: str, value: str) -> None
"""Write text content to a file."""
def read_text_file(file_path: str) -> str
"""Read text content from a file."""
def get_file_ext(file_path: str) -> str
"""Get the file extension."""
def get_file_size_mb(file_path: str) -> float
"""Get file size in megabytes."""
LocalFileStore Implementation
Concrete implementation for local filesystem operations.
File Writing
write_text_file(file_path, value)
Creates file if it doesn’t exist
Overwrites existing files
Creates parent directories as needed (if using Path operations elsewhere)
Logs all write operations
Example:
filestore = LocalFileStore()
filestore.write_text_file("/path/to/file.txt", "content")
write_user_data_file(file_subpath, value)
Writes to user application home directory
Resolves tilde paths
Note: Currently checks if value (content) starts with “/”, not the path
Example:
filestore = LocalFileStore()
# Writes to $USER_HOME/mydata/config.txt
filestore.write_user_data_file("mydata/config.txt", "settings")
File Reading
read_text_file(file_path)
Reads entire file content as string
Raises
RuntimeErrorif file doesn’t exist or is a directoryValidates file existence before reading
Example:
content = filestore.read_text_file("/path/to/file.txt")
File Validation & Inspection
validate_file(file_path)
Checks if a file meets storage requirements:
File must exist
Path must point to a file (not directory)
File size must not exceed
library.filestore.raw.size-limit(MB)
Returns True if all checks pass, False otherwise.
Example:
if filestore.validate_file("/path/to/data.dat"):
process_file("/path/to/data.dat")
get_file_ext(file_path)
Returns file extension including the dot (e.g., “.txt”, “.json”)
Raises
RuntimeErrorif file doesn’t existWorks with multi-part extensions (e.g., “.tar.gz” returns “.gz”)
get_file_size_mb(file_path)
Returns file size in megabytes (float)
Raises
RuntimeErrorif file doesn’t existConversion:
size_bytes / (1024 * 1024)
Example:
size_mb = filestore.get_file_size_mb("/path/to/data.h5")
if size_mb > 1000:
print(f"Large file: {size_mb:.2f} MB")
Directory Operations
fetch_files_at(path)
Retrieves and filters files from a directory.
Process:
Validates directory path exists
Raises
RuntimeErrorif path is a fileIterates through directory contents
Applies
validate_file()to each itemReturns list of valid file paths (absolute)
Example:
valid_files = filestore.fetch_files_at("/data/scans")
for filepath in valid_files:
print(f"Processing: {filepath}")
Internal Methods
_is_real_file(file_path, throws=False)
Helper method to validate if path is a real file.
Parameters:
file_path- Path object to checkthrows- If True, raisesRuntimeErroron failure
Returns:
Trueif path exists and is a fileFalseif path doesn’t exist or is not a fileRaises
RuntimeErrorifthrows=Trueand validation fails
Configuration
Size Limit
The maximum file size for validation is controlled by:
Config["library.filestore.raw.size-limit"] # in MB
Default value should be set in application configuration.
User Home Directory
User data files are written relative to:
Config["user.application.home"] # typically ~/.<appname>
This supports tilde expansion for cross-platform compatibility.
Error Handling
Exception Types
RuntimeError is raised for:
File operations on nonexistent files
Directory operations on file paths
Directory operations on nonexistent paths
Invalid file types during validation
Common Scenarios:
try:
content = filestore.read_text_file("/missing/file.txt")
except RuntimeError as e:
print(f"File error: {e}") # "File at path ... does not exist"
try:
filestore.fetch_files_at("/data/file.txt") # passing a file
except RuntimeError as e:
print(f"Path error: {e}") # "Path ... is a file. A folder path is expected"
Validation Failures
validate_file() returns False instead of raising exceptions:
File doesn’t exist
Path is a directory
File exceeds size limit
This allows graceful filtering without exception handling:
# Graceful filtering
files = filestore.fetch_files_at("/data")
# Returns only files that pass all validation checks
Usage Examples
Reading Scan Data
from tavi.library.storage.local_file_store import LocalFileStore
filestore = LocalFileStore()
# Fetch all valid scan files
scan_files = filestore.fetch_files_at("/experiments/exp123/scans")
for filepath in scan_files:
# Check before processing
if filestore.validate_file(filepath):
ext = filestore.get_file_ext(filepath)
size = filestore.get_file_size_mb(filepath)
content = filestore.read_text_file(filepath)
process_scan(content, size)
Storing User Configuration
filestore = LocalFileStore()
config_data = """
[settings]
theme=dark
resolution=1920x1080
"""
# Automatically writes to user home
filestore.write_user_data_file(
"config/settings.ini",
config_data
)
Testing Considerations
Test Fixtures
Use TemporaryDirectory for isolated tests:
from tempfile import TemporaryDirectory
with TemporaryDirectory() as tmpdir:
filestore.write_text_file(f"{tmpdir}/test.txt", "data")
# Directory auto-cleaned after context
Configuration Overrides
Use Config_override context manager for tests:
from tests.util.Config_helpers import Config_override
with Config_override("library.filestore.raw.size-limit", 100):
# Tests with custom size limit
result = filestore.validate_file(filepath)
Design Characteristics
Interface-based: Abstraction allows multiple implementations
Error-aware: Clear separation between validation failure and exceptions
Configuration-driven: Runtime settings without code changes
Path-safe: Uses pathlib for cross-platform compatibility
Simple semantics: Direct mapping to filesystem operations
Logging: All writes are logged for audit trails
Recommended Practices
Validate files before processing: Use
validate_file()to filterHandle size constraints: Check
get_file_size_mb()for large filesUse user data API: Call
write_user_data_file()for application configLog operations: Leverage built-in logging for debugging
Test with temp directories: Ensure tests clean up after themselves
Override config in tests: Use
Config_overridefor testabilityCatch RuntimeError: Be specific about which operations may fail
Future Considerations
Remote storage backends (S3, cloud storage)
Async file operations
File compression support
Additional metadata extraction (creation time, permissions)
Streaming for large files
File caching layer