`semantica.utils` provides shared infrastructure used throughout Semantica:
- Structured logging: `setup_logging()`, `get_logger()`, and the `log_execution_time` decorator
- Validation helpers: `validate_entity()` and `validate_config()` return `(bool, Optional[str])` without raising
- Progress tracking: the `ProgressTracker` class and the `track_progress()` iterable wrapper with ETA
- Typed exceptions: `SemanticaError`, `ValidationError`, `ProcessingError`, `ConfigurationError`, `QualityError`
Exported API
| Name | Type | Role |
|---|---|---|
| `setup_logging` | function | Configure the `semantica` root logger; accepts `level`, `file`, `console`, and `rotation` kwargs |
| `get_logger` | function | Get a named logger instance (`semantica.<name>`) |
| `log_execution_time` | decorator | Wraps a function and logs its name, execution time, and success/failure |
| `log_performance` | function | Log pre-collected performance metrics: `log_performance(func_name, execution_time, **metrics)` |
| `validate_entity` | function | Validate an entity dict; returns `(bool, Optional[str])` and does not raise |
| `validate_config` | function | Validate a config dict; returns `(bool, Optional[str])` and does not raise |
| `ProgressTracker` | class | Class-based progress tracker with ETA and step callbacks |
| `track_progress` | function | Wrap any iterable with a live progress bar |
| `clean_text` | function | Normalize whitespace and strip zero-width control characters |
| `hash_data` | function | Deterministic SHA-256 hash of a `str`, `bytes`, or `dict` |
| `SemanticaError` | exception | Base exception for all Semantica errors |
| `ValidationError` | exception | Raised when input fails validation; has `.field`, `.value`, `.message` |
| `ProcessingError` | exception | Raised during extraction, graph build, or a pipeline step; has `.stage` |
| `ConfigurationError` | exception | Raised for configuration validation failures |
| `QualityError` | exception | Raised when data quality falls below a threshold |
What You Get
- Logging: structured logging with the `@log_execution_time` decorator and quality metrics via environment variables.
- Validation: `validate_entity` and `validate_config`, plus a typed `ValidationError` that carries field and value context.
- Progress Tracking: `track_progress` wraps any iterable and auto-detects console vs. Jupyter to pick the right renderer.
- Helper Functions: `clean_text`, `hash_data`, `safe_filename`, and nested dict utilities used throughout the framework.
- Exception Hierarchy: `SemanticaError` is the base of `ValidationError`, `ProcessingError`, and the other typed exceptions, so callers can recover from specific failures.
- File Utilities: `read_json_file` replaces the usual open/`json.load` boilerplate and raises `FileNotFoundError` or `json.JSONDecodeError` on failure.

Logging
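The following is a minimal sketch of what a decorator like `log_execution_time` does. It is a hypothetical stand-in built only on the standard `logging`, `time`, and `functools` modules, and it assumes only the documented behavior: log the function name, its execution time, and success or failure. It is not the library's implementation. The logger name `semantica.demo` is illustrative.

```python
import functools
import logging
import time

# Hypothetical stand-in for semantica.utils.log_execution_time; it mirrors the
# documented behavior only. The logger name follows the semantica.<name> pattern.
logger = logging.getLogger("semantica.demo")


def log_execution_time(func):
    """Log the wrapped function's name, wall-clock duration, and outcome."""

    @functools.wraps(func)  # keep __name__ and __doc__ of the wrapped function
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            elapsed = time.perf_counter() - start
            logger.error("%s failed after %.3fs", func.__name__, elapsed)
            raise  # log, then let the caller handle the original exception
        elapsed = time.perf_counter() - start
        logger.info("%s succeeded in %.3fs", func.__name__, elapsed)
        return result

    return wrapper


@log_execution_time
def build_graph(n):
    return list(range(n))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    build_graph(3)  # logs "build_graph succeeded in ...s"
```

Re-raising after logging keeps the decorator transparent. Callers see the same exception they would see without it.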
Validation
| Function | Description | Returns |
|---|---|---|
| `validate_entity(data)` | Check that an entity dict has the required fields (`id`, `text`, `type`) with correct types | `Tuple[bool, Optional[str]]` |
| `validate_config(cfg, required_keys=None)` | Check a configuration dict; optionally enforce required keys | `Tuple[bool, Optional[str]]` |
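The non-raising contract in the table above is easiest to see in code. The sketch below is a hypothetical stand-in, not the library's code. It uses the required field names from the table, and it *assumes* that all three entity fields must be strings and that the error messages look the way shown here.

```python
from typing import Any, Iterable, Optional, Tuple

# Hypothetical stand-ins following the documented contract: return
# (is_valid, error_message) and never raise. Type rules and messages are assumed.
REQUIRED_ENTITY_FIELDS = ("id", "text", "type")


def validate_entity(data: Any) -> Tuple[bool, Optional[str]]:
    if not isinstance(data, dict):
        return False, f"entity must be a dict, got {type(data).__name__}"
    for field in REQUIRED_ENTITY_FIELDS:
        if field not in data:
            return False, f"missing required field: {field}"
        if not isinstance(data[field], str):  # assumption: all fields are str
            return False, f"field '{field}' must be a str"
    return True, None


def validate_config(
    cfg: Any, required_keys: Optional[Iterable[str]] = None
) -> Tuple[bool, Optional[str]]:
    if not isinstance(cfg, dict):
        return False, f"config must be a dict, got {type(cfg).__name__}"
    missing = [key for key in (required_keys or ()) if key not in cfg]
    if missing:
        return False, f"missing required keys: {', '.join(missing)}"
    return True, None


ok, error = validate_entity({"id": "e1", "text": "Apple", "type": "ORG"})
```

Because nothing is raised, callers branch on the returned flag and surface the message themselves.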
Progress Tracking
- Console: tqdm progress bar with ETA
- Jupyter: notebook-compatible widget (auto-detected)
- File: write progress to a log file
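The ETA shown by these renderers can be estimated by assuming that each remaining item takes the average time of the items finished so far: `eta = elapsed / done * (total - done)`. The sketch below is a hypothetical, dependency-free illustration of that idea. It writes plain text to a stream and does not reproduce the library's console/Jupyter auto-detection. `track_progress_sketch` and `estimate_eta` are illustrative names, not Semantica APIs.

```python
import sys
import time
from typing import Iterable, Iterator, Optional, TextIO, TypeVar

T = TypeVar("T")


def estimate_eta(elapsed: float, done: int, total: int) -> Optional[float]:
    """Remaining seconds, assuming the average time per item stays constant."""
    if done <= 0:
        return None
    return elapsed / done * (total - done)


def track_progress_sketch(
    items: Iterable[T], total: Optional[int] = None, stream: TextIO = sys.stderr
) -> Iterator[T]:
    """Hypothetical wrapper: yield items unchanged, writing a one-line status."""
    if total is None:
        try:
            total = len(items)  # lists, tuples, ranges, ...
        except TypeError:
            total = None  # generators have no length, so no ETA is shown
    start = time.perf_counter()
    for done, item in enumerate(items, start=1):
        yield item  # the consumer processes the item before we resume here
        elapsed = time.perf_counter() - start
        eta = estimate_eta(elapsed, done, total) if total else None
        suffix = f" ETA {eta:.1f}s" if eta is not None else ""
        stream.write(f"\r{done}/{total or '?'}{suffix}")
        stream.flush()
    stream.write("\n")
```

Updating the status after `yield` means each tick includes the consumer's processing time, which is what the ETA should reflect.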
Helper Functions
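As a rough illustration of `clean_text` and `hash_data` as described in the API table, here is a hypothetical sketch. The exact set of zero-width characters stripped and the dict serialization (JSON with sorted keys, so key order does not change the hash) are assumptions, not the library's documented behavior.

```python
import hashlib
import json
import re
from typing import Any, Dict, Union

# Hypothetical stand-ins; the character set and dict serialization are assumed.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")
WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Drop zero-width characters, then collapse runs of whitespace."""
    text = ZERO_WIDTH.sub("", text)
    return WHITESPACE.sub(" ", text).strip()


def hash_data(data: Union[str, bytes, Dict[str, Any]]) -> str:
    """Hex SHA-256 digest; dicts are serialized with sorted keys so that
    insertion order does not affect the result."""
    if isinstance(data, bytes):
        payload = data
    elif isinstance(data, str):
        payload = data.encode("utf-8")
    elif isinstance(data, dict):
        payload = json.dumps(data, sort_keys=True, separators=(",", ":")).encode("utf-8")
    else:
        raise TypeError(f"unsupported type: {type(data).__name__}")
    return hashlib.sha256(payload).hexdigest()
```

Canonicalizing the dict before hashing is what makes the hash deterministic. Two dicts with equal contents produce the same digest regardless of how they were built.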
Nested Dict Utilities
Helper functions for deep configuration access, used extensively inside `Config` and `ConfigManager`.
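The helper names are not listed here, so the following sketch uses the hypothetical names `get_nested` and `set_nested` to show the general pattern of deep access with dot-separated paths. The separator, default handling, and overwrite behavior are assumptions.

```python
from typing import Any, Dict

# Hypothetical helpers illustrating dot-path access into nested dicts; names,
# the "." separator, and default handling are assumptions, not Semantica APIs.


def get_nested(data: Dict[str, Any], path: str, default: Any = None, sep: str = ".") -> Any:
    """Follow `path` key by key; return `default` if any step is missing."""
    current: Any = data
    for key in path.split(sep):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def set_nested(data: Dict[str, Any], path: str, value: Any, sep: str = ".") -> None:
    """Create intermediate dicts as needed (replacing non-dict values on the
    path), then assign the leaf value in place."""
    keys = path.split(sep)
    current = data
    for key in keys[:-1]:
        child = current.get(key)
        if not isinstance(child, dict):
            child = {}
            current[key] = child
        current = child
    current[keys[-1]] = value


cfg = {"graph": {"backend": "memory"}}
set_nested(cfg, "graph.options.batch_size", 64)
```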
Exception Hierarchy
Exception types and when they're raised
| Exception | When Raised | Key Attributes |
|---|---|---|
| `SemanticaError` | Base class: all framework errors inherit from this | `.message`, `.context`, `.error_code` |
| `ValidationError` | Input data failed schema or type validation | `.field`, `.value`, `.constraint` |
| `ProcessingError` | Failure during extraction, graph build, or a pipeline step | `.stage`, `.input_data`, `.output_data` |
| `ConfigurationError` | Configuration key missing or of the wrong type | `.config_key`, `.config_value`, `.expected_type` |
| `QualityError` | Data quality score fell below the threshold | `.quality_score`, `.threshold`, `.metrics` |
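A typed hierarchy lets callers handle specific failures before falling back to the base class. The sketch below mirrors part of the table above for illustration only. The attribute names come from the table, but the constructor signatures are assumptions, and `ConfigurationError` and `QualityError` are omitted for brevity. `ingest` and `safe_ingest` are hypothetical functions.

```python
from typing import Any, Dict, Optional

# Hypothetical mirror of part of the documented hierarchy; constructor
# signatures are assumed, so do not rely on them with the real classes.


class SemanticaError(Exception):
    def __init__(self, message: str, context: Optional[Dict[str, Any]] = None,
                 error_code: Optional[str] = None):
        super().__init__(message)
        self.message = message
        self.context = context or {}
        self.error_code = error_code


class ValidationError(SemanticaError):
    def __init__(self, message: str, field: Optional[str] = None, value: Any = None,
                 constraint: Optional[str] = None, **kwargs: Any):
        super().__init__(message, **kwargs)
        self.field = field
        self.value = value
        self.constraint = constraint


class ProcessingError(SemanticaError):
    def __init__(self, message: str, stage: Optional[str] = None, **kwargs: Any):
        super().__init__(message, **kwargs)
        self.stage = stage


def ingest(record: Dict[str, Any]) -> str:
    if "id" not in record:
        raise ValidationError("missing id", field="id", constraint="required")
    if not record.get("text"):
        raise ProcessingError("nothing to extract", stage="extraction")
    return record["id"]


def safe_ingest(record: Dict[str, Any]) -> Optional[str]:
    """Targeted recovery: catch the most specific subclasses first."""
    try:
        return ingest(record)
    except ValidationError as err:
        return f"invalid:{err.field}"
    except ProcessingError as err:
        return f"skipped:{err.stage}"
    except SemanticaError:
        return None  # any other framework error
```

Ordering the `except` clauses from specific to general matters. A leading `except SemanticaError` would swallow every subclass.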
File Utilities
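Based only on the behavior described above (raise `FileNotFoundError` or `json.JSONDecodeError` on failure), a helper like `read_json_file` can be sketched with the standard library. This is a hypothetical stand-in. The `encoding` parameter and the `load_or_default` caller are illustrative additions.

```python
import json
from pathlib import Path
from typing import Any, Union


def read_json_file(path: Union[str, Path], encoding: str = "utf-8") -> Any:
    """Hypothetical stand-in: parse a JSON file and let errors propagate.

    Raises FileNotFoundError if the path does not exist and
    json.JSONDecodeError if the content is not valid JSON.
    """
    with open(path, "r", encoding=encoding) as fh:
        return json.load(fh)


def load_or_default(path: Union[str, Path], default: Any) -> Any:
    """Illustrative caller: tolerate a missing file, but surface bad JSON."""
    try:
        return read_json_file(path)
    except FileNotFoundError:
        return default
```

Letting the two exceptions propagate unchanged means each caller decides which failures are recoverable.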
Related Modules
- Core: framework orchestration that uses Utils internally.
- Pipeline: uses `ProgressTracker` for per-step tracking.
