semantica.utils provides shared infrastructure used throughout Semantica:
  • Structured logging: setup_logging(), get_logger(), log_execution_time decorator
  • Validation helpers: validate_entity() and validate_config() return (bool, Optional[str]) without raising
  • Progress tracking: ProgressTracker class and track_progress() iterable wrapper with ETA
  • Typed exceptions: SemanticaError, ValidationError, ProcessingError, ConfigurationError, QualityError
Most users won’t call utils directly: it’s the shared foundation for all modules.

Exported Names

| Name | Type | Role |
| --- | --- | --- |
| setup_logging | function | Configure the semantica root logger: accepts level, file, console, rotation kwargs |
| get_logger | function | Get a named logger instance (semantica.<name>) |
| log_execution_time | decorator | Wraps a function: logs name, execution time, and success/failure |
| log_performance | function | Log pre-collected performance metrics: log_performance(func_name, execution_time, **metrics) |
| validate_entity | function | Validate entity dict: returns (bool, Optional[str]); does not raise |
| validate_config | function | Validate config dict: returns (bool, Optional[str]); does not raise |
| ProgressTracker | class | Class-based progress tracker with ETA and step callbacks |
| track_progress | function | Wrap any iterable with a live progress bar |
| clean_text | function | Normalize whitespace and strip zero-width control characters |
| hash_data | function | Deterministic SHA-256 hash of a str, bytes, or dict |
| SemanticaError | exception | Base exception for all Semantica errors |
| ValidationError | exception | Raised when input fails validation; has .field, .value, .message |
| ProcessingError | exception | Raised during extraction, graph build, or pipeline step; has .stage |
| ConfigurationError | exception | Raised for configuration validation failures |
| QualityError | exception | Raised when data quality falls below threshold |

What You Get

Logging

Structured logging with the @log_execution_time decorator, configurable through environment variables.

Validation

validate_entity and validate_config with a typed ValidationError carrying field and value context.

Progress Tracking

track_progress wraps any iterable: auto-detects console vs Jupyter for the right renderer.

Helper Functions

clean_text, hash_data, safe_filename, and nested dict utilities used throughout the framework.

Exception Hierarchy

SemanticaError and its subclasses ValidationError and ProcessingError: typed exceptions for targeted recovery.

File Utilities

read_json_file reads and parses a JSON file in one call. On failure it raises the standard FileNotFoundError or json.JSONDecodeError, so no custom error handling is needed around JSON I/O.

Logging

1. Initialize logging at application startup

from semantica.utils import setup_logging, get_logger

setup_logging(level="INFO")   # "DEBUG" | "INFO" | "WARNING" | "ERROR"
logger = get_logger(__name__)
Call setup_logging(level="INFO") once at application startup. Without it, Semantica falls back to Python’s root logger, which may be silent or misconfigured. Call it before importing other Semantica modules to capture initialization messages.
2. Instrument expensive functions with the performance decorator

from semantica.utils import log_execution_time

@log_execution_time
def expensive_step(data):
    ...
# Logs: "expensive_step completed in 2.34s"
@log_execution_time is the performance decorator. Apply it to any function to automatically log its name, execution time, and success/failure. log_performance is a lower-level function for logging metrics you’ve already collected: it is not a decorator.
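To make the decorator's behavior concrete, here is a minimal standard-library sketch of a timing decorator that logs a function's name, duration, and success or failure. It is not Semantica's implementation: the name timed and the logger name timing_sketch are hypothetical, chosen only for illustration.

```python
import functools
import logging
import time

# Hypothetical logger name, used only in this sketch.
logger = logging.getLogger("timing_sketch")


def timed(func):
    """Log the wrapped function's name, duration, and success or failure."""

    @functools.wraps(func)  # keep the original __name__ and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            elapsed = time.perf_counter() - start
            logger.exception("%s failed after %.2fs", func.__name__, elapsed)
            raise  # re-raise so callers still see the original error
        elapsed = time.perf_counter() - start
        logger.info("%s completed in %.2fs", func.__name__, elapsed)
        return result

    return wrapper


@timed
def add(a, b):
    return a + b


@timed
def boom():
    raise ValueError("bad input")
```

The wrapper re-raises after logging, so instrumenting a function this way does not change its error behavior.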
3. Configure via environment variables

export SEMANTICA_LOG_LEVEL=DEBUG
export SEMANTICA_LOG_FORMAT=json     # "json" | "text"
export SEMANTICA_DISABLE_PROGRESS=true
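For illustration, the sketch below shows one way settings like these could be parsed with the standard library. The function name logging_settings_from_env, the defaults (INFO, text, progress enabled), and the accepted truthy strings are all assumptions, not Semantica's documented behavior.

```python
import logging
import os


def logging_settings_from_env(environ=None):
    """Return (level, fmt, progress_disabled) parsed from environment variables."""
    env = os.environ if environ is None else environ

    # Assumed default: INFO. Unknown level names also fall back to INFO.
    level_name = env.get("SEMANTICA_LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO

    # Assumed default: "text". Anything other than "json" or "text" is ignored.
    fmt = env.get("SEMANTICA_LOG_FORMAT", "text").lower()
    if fmt not in ("json", "text"):
        fmt = "text"

    # Assumed truthy spellings; the real parser may accept a different set.
    flag = env.get("SEMANTICA_DISABLE_PROGRESS", "false").lower()
    progress_disabled = flag in ("1", "true", "yes")

    return level, fmt, progress_disabled
```

Passing a plain dict instead of os.environ keeps the parsing logic easy to test without touching the real process environment.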

Validation

from semantica.utils import validate_entity, validate_config, ValidationError

# validate_entity returns (is_valid, error_message)
is_valid, error = validate_entity({"id": "1", "type": "PERSON", "text": "Alice"})
if not is_valid:
    raise ValidationError(error)

# validate_config returns (is_valid, error_message)
config = {"provider": "groq", "model": "llama-3.3-70b-versatile"}  # example values
is_valid, error = validate_config(config, required_keys=["model", "provider"])
if not is_valid:
    print(f"Invalid config: {error}")
| Function | Description | Returns |
| --- | --- | --- |
| validate_entity(data) | Check entity dict has required fields (id, text, type) and correct types | Tuple[bool, Optional[str]] |
| validate_config(cfg, required_keys=None) | Check configuration dict; optionally enforce required keys | Tuple[bool, Optional[str]] |
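The (bool, Optional[str]) return convention can be sketched with a small stand-alone validator. The helper check_entity below is hypothetical, and its rule that id, text, and type must all be strings is an assumption; the actual type checks in validate_entity may differ.

```python
from typing import Any, Optional, Tuple

REQUIRED_FIELDS = ("id", "text", "type")


def check_entity(data: Any) -> Tuple[bool, Optional[str]]:
    """Return (True, None) when valid, else (False, reason). Never raises."""
    if not isinstance(data, dict):
        return False, f"expected dict, got {type(data).__name__}"
    for field in REQUIRED_FIELDS:
        if field not in data:
            return False, f"missing required field: {field}"
        # Assumption for this sketch: every required field is a string.
        if not isinstance(data[field], str):
            return False, f"field {field!r} must be a string"
    return True, None
```

Returning a tuple instead of raising leaves the caller free to log, collect, or escalate the failure, for example by wrapping the message in an exception.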

Progress Tracking

from semantica.utils import track_progress

# Wraps any iterable: auto-detects console vs Jupyter
for item in track_progress(items, desc="Processing documents"):
    process(item)
Supports:
  • Console: tqdm progress bar with ETA
  • Jupyter: notebook-compatible widget (auto-detected)
  • File: write progress to a log file
track_progress auto-detects Jupyter. In a terminal it renders a tqdm progress bar; in a Jupyter notebook it renders an interactive widget. You don’t need to check the environment: the same call works in both.
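As a rough picture of what an iterable wrapper with ETA does, here is a dependency-free sketch. It is not how track_progress is implemented (the text above indicates a tqdm-based renderer); simple_progress and its parameters are hypothetical names.

```python
import sys
import time
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def simple_progress(items: Iterable[T], desc: str = "",
                    total: Optional[int] = None, stream=None) -> Iterator[T]:
    """Yield items unchanged while writing a count and a rough ETA to a stream."""
    out = sys.stderr if stream is None else stream
    if total is None and hasattr(items, "__len__"):
        total = len(items)  # sized inputs allow an ETA estimate
    start = time.monotonic()
    for done, item in enumerate(items, start=1):
        yield item
        elapsed = time.monotonic() - start
        if total:
            # ETA = average time per finished item * items remaining
            eta = elapsed / done * (total - done)
            out.write(f"\r{desc} {done}/{total} ETA {eta:.1f}s")
        else:
            out.write(f"\r{desc} {done} items")
    out.write("\n")
```

Because the wrapper is a generator that yields each item untouched, it can be dropped around any existing loop without changing the loop body.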

Helper Functions

from semantica.utils import clean_text, hash_data, safe_filename

# Normalize whitespace and strip control characters
clean = clean_text("  Hello   World  ")     # -> "Hello World"

# Deterministic SHA-256 hash of a string, bytes, or dict
uid   = hash_data({"key": "value"})         # -> hex digest string

# Sanitize a string for use as a filename
fname = safe_filename("My File?.txt")       # -> "My_File.txt"
hash_data() is deterministic across runs. Given the same input (a str, bytes, or JSON-serializable dict), hash_data() always returns the same SHA-256 hex string, which makes it suitable as a cache key or idempotency token in pipeline steps.
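One common way to get a run-to-run stable hash of a dict is to serialize it canonically before hashing. The sketch below uses json.dumps with sort_keys=True; whether hash_data uses this exact serialization is an assumption, and stable_hash is a hypothetical name.

```python
import hashlib
import json


def stable_hash(data) -> str:
    """SHA-256 hex digest that is stable across runs and dict key order."""
    if isinstance(data, bytes):
        payload = data
    elif isinstance(data, str):
        payload = data.encode("utf-8")
    else:
        # sort_keys gives one canonical serialization per dict,
        # so key insertion order does not affect the digest.
        payload = json.dumps(
            data, sort_keys=True, separators=(",", ":")
        ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

With this approach, {"a": 1, "b": 2} and {"b": 2, "a": 1} produce the same digest, which is the property a cache key needs.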

Nested Dict Utilities

Helper functions for deep configuration access, used extensively inside Config and ConfigManager:
from semantica.utils import get_nested_value, set_nested_value, merge_dicts

config = {
    "processing": {"batch_size": 32, "max_workers": 4},
    "llm":        {"provider": "groq", "model": "llama-3.3-70b-versatile"},
}

# Dot-notation read: returns default if key path is absent
batch = get_nested_value(config, "processing.batch_size", default=16)
# -> 32

# Dot-notation write
set_nested_value(config, "processing.batch_size", 64)

# Deep merge: nested keys are merged recursively (deep=True by default)
base      = {"a": {"x": 1, "y": 2}, "b": 3}
overrides = {"a": {"y": 99, "z": 4}, "c": 5}
merged    = merge_dicts(base, overrides)
# -> {"a": {"x": 1, "y": 99, "z": 4}, "b": 3, "c": 5}
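The semantics shown above (dot-notation lookup with a default, in-place dot-notation write, recursive merge) can be sketched in plain Python. The functions get_path, set_path, and deep_merge are hypothetical stand-ins, not the library's code, and edge-case behavior such as non-dict intermediates is an assumption.

```python
from typing import Any, Dict


def get_path(data: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Follow a dot-separated path; return default when any key is absent."""
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def set_path(data: Dict[str, Any], path: str, value: Any) -> None:
    """Set a dot-separated path in place, creating intermediate dicts."""
    *parents, last = path.split(".")
    current = data
    for key in parents:
        current = current.setdefault(key, {})
    current[last] = value


def deep_merge(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict; nested dicts merge recursively, other values override."""
    merged = dict(base)  # shallow copy so base itself is not mutated
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged
```

In this sketch, values from overrides win on conflicts, and keys present on only one side are carried through unchanged.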

Exception Hierarchy

from semantica.utils import SemanticaError, ValidationError, ProcessingError

try:
    run_pipeline(data)
except ValidationError as e:
    # Input data did not pass schema validation
    logger.error("Validation failed: %s", e.message)
except ProcessingError as e:
    # Failure during extraction or graph construction
    logger.error("Processing failed at stage %s: %s", e.stage, e)
except SemanticaError as e:
    # Catch-all for all Semantica framework errors
    logger.error("Framework error: %s", e)
| Exception | When Raised | Key Attributes |
| --- | --- | --- |
| SemanticaError | Base class: all framework errors inherit from this | .message, .context, .error_code |
| ValidationError | Input data failed schema or type validation | .field, .value, .constraint |
| ProcessingError | Failure during extraction, graph build, or pipeline step | .stage, .input_data, .output_data |
| ConfigurationError | Configuration key missing or wrong type | .config_key, .config_value, .expected_type |
| QualityError | Data quality score fell below threshold | .quality_score, .threshold, .metrics |
Catch SemanticaError as the broadest exception net. All framework errors inherit from SemanticaError, so except SemanticaError catches validation failures, processing errors, and everything in between. Use specific subclasses for targeted recovery logic.
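The ordering rule behind this advice is plain Python: except clauses are tried top to bottom, so subclasses must come before the base class. The sketch below uses stand-in classes (FrameworkError, InputError, StageError are hypothetical, not Semantica's types) so that it runs without the library installed.

```python
class FrameworkError(Exception):
    """Stand-in for a framework base exception."""


class InputError(FrameworkError):
    """Stand-in for a validation failure."""


class StageError(FrameworkError):
    """Stand-in for a processing failure that records its stage."""

    def __init__(self, message: str, stage: str):
        super().__init__(message)
        self.stage = stage


def classify(exc: Exception) -> str:
    """Raise exc and report which except clause handled it."""
    try:
        raise exc
    except InputError:
        return "reject input"
    except StageError as e:
        return f"retry stage {e.stage}"
    except FrameworkError:
        # Base class last: listed first, it would shadow both subclasses.
        return "generic framework failure"
```

For example, classify(StageError("timeout", stage="extract")) takes the StageError branch, while an error raised directly from the base class falls through to the final clause.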

File Utilities

from semantica.utils import read_json_file

# Read and parse a JSON file: raises FileNotFoundError or json.JSONDecodeError on failure
config = read_json_file("config.json")
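When a missing or malformed file should not stop the program, the two exceptions named above can be handled explicitly. The sketch below uses only the standard json module rather than read_json_file so that it is self-contained; load_json_or_default is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any


def load_json_or_default(path: str, default: Any = None) -> Any:
    """Parse a JSON file; return default if it is missing or malformed."""
    try:
        with Path(path).open(encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return default  # the file does not exist
    except json.JSONDecodeError:
        return default  # the file exists but is not valid JSON
```

Keeping the two except clauses separate makes it easy to log or handle the missing-file and bad-syntax cases differently later.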

Core

Framework orchestration that uses Utils internally.

Pipeline

Uses ProgressTracker for per-step tracking.