Looking for a quick reference? Jump to the Module Index at the bottom.
Architecture Overview
Input Layer
Data ingestion and preparation. Modules: Ingest, Parse, Split, Normalize
Core Processing
Intelligence and understanding. Modules: Semantic Extract, KG, Ontology, Reasoning
Storage
Persistent data storage. Modules: Embeddings, Vector Store, Graph Store, Triplet Store
Quality Assurance
Data quality and consistency. Modules: Deduplication, Conflicts
Context & Memory
Agent memory and decision tracking. Modules: Context, Provenance, Change Management
Output & Orchestration
Export, visualization, and workflows. Modules: Export, Visualization, Pipeline, Explorer
Input Layer
Ingest
Loads data from files, web, databases, and streams into a unified SourceDocument format.
FileIngestor, WebIngestor, ParquetIngestor, XMLIngestor, RESTIngestor, PublicAPIIngestor, DBIngestor, DuckDBIngestor, ElasticIngestor, EmailIngestor, FeedIngestor, GDriveIngestor, HuggingFaceIngestor, MCPIngestor, MongoIngestor, OntologyIngestor, PandasIngestor, RepoIngestor, SnowflakeIngestor, StreamIngestor
Parse
Extracts structured text and layout metadata from raw documents.
DocumentParser, DoclingParser, CodeParser, CSVParser, DocxParser, EmailParser, ExcelParser, HTMLParser, ImageParser, JSONParser, MCPParser, MediaParser, PDFParser, PPTXParser, StructuredDataParser, WebParser, XMLParser
Split
Chunks text for embedding and RAG pipelines with awareness of semantic boundaries.
Strategies: recursive, semantic_transformer, entity_aware, relation_aware, sliding_window, structural
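To make the chunking idea concrete, here is a minimal sketch of what a sliding-window strategy generally does. It is plain Python, not Semantica's implementation; the function name and the `size` and `overlap` parameters are assumptions made for illustration.

```python
def sliding_window_chunks(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Hypothetical usage: 50 characters, windows of 20 with 5 characters shared.
chunks = sliding_window_chunks("abcdefghij" * 5, size=20, overlap=5)
```

The overlap keeps a little shared context between neighbouring chunks, which helps when a sentence or entity mention straddles a chunk boundary.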
Normalize
Cleans and standardizes text before semantic processing.
Core Processing
Semantic Extract
Named entity recognition, relation extraction, and triplet generation.
Extraction methods: "pattern" (no API key), "ml" (local model), "llm" (any of the 8 supported providers)
Additional extractors: CoreferenceResolver, EventDetector, SemanticAnalyzer, SemanticNetworkExtractor
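As an illustration of why a pattern-based method needs no API key, the sketch below extracts triplets with regular expressions alone. It does not use Semantica's extractors; the `Triplet` type, the two patterns, and the predicate labels are hypothetical.

```python
import re
from typing import NamedTuple


class Triplet(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical surface patterns; each maps a regex to a predicate label.
PATTERNS = [
    (re.compile(r"(?P<s>[A-Z]\w+) works at (?P<o>[A-Z]\w+)"), "works_at"),
    (re.compile(r"(?P<s>[A-Z]\w+) was founded by (?P<o>[A-Z]\w+)"), "founded_by"),
]


def extract_triplets(text: str) -> list[Triplet]:
    """Scan the text with every pattern and emit one triplet per match."""
    triplets = []
    for pattern, predicate in PATTERNS:
        for match in pattern.finditer(text):
            triplets.append(Triplet(match["s"], predicate, match["o"]))
    return triplets


found = extract_triplets("Alice works at Acme. Acme was founded by Bob.")
```

Rules like these are fast and deterministic but only catch phrasings they were written for, which is the usual trade-off against model-based extraction.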
Knowledge Graph
Graph construction, graph algorithms, temporal model, and distance intelligence.
Ontology
Schema management including SHACL, SKOS, alignments, diff/migration, auto-generation, and the visual Ontology Hub (v0.5.0).
OntologyGenerator, SHACLGenerator, OntologyValidator, OntologyEvaluator, LLMOntologyGenerator, OWLGenerator, PropertyGenerator, DomainOntologies, NamespaceManager
Reasoning
Derives new facts from existing knowledge using multiple inference strategies.
Storage
Embeddings
Generates and manages vector embeddings for semantic similarity.
EmbeddingGenerator, TextEmbedder, VectorEmbeddingManager, GraphEmbeddingManager, PoolingStrategies
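For readers unfamiliar with pooling, the following sketch shows two common ways to collapse per-token vectors into one embedding, plus the cosine similarity typically used to compare embeddings. It is a pure-Python illustration with assumed function names and is not how Semantica's PoolingStrategies is implemented.

```python
import math


def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Average the token vectors dimension by dimension."""
    n = len(token_vectors)
    return [sum(dim) / n for dim in zip(*token_vectors)]


def max_pool(token_vectors: list[list[float]]) -> list[float]:
    """Keep the largest value seen in each dimension."""
    return [max(dim) for dim in zip(*token_vectors)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Three toy two-dimensional "token" vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mean_vec = mean_pool(tokens)
max_vec = max_pool(tokens)
similarity = cosine_similarity(mean_vec, max_vec)
```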
Vector Store
Multi-backend vector database with hybrid search support.
Graph Store
Connects to graph databases for persistent, queryable storage.
Triplet Store
RDF triple-based storage with SPARQL query support.
Quality Assurance
Deduplication
Detects, scores, and merges duplicate entities across sources.
The v2 strategies (blocking_v2, hybrid_v2, semantic_v2) are up to 7x faster than v1.
Components: EntityResolver, DuplicateDetector, EntityMerger, SimilarityCalculator, ClusterBuilder
DuplicateDetector options: max_results, top_k_per_entity, min_similarity, sort_by
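The sketch below shows one plausible way options such as min_similarity, top_k_per_entity, and max_results could shape a list of duplicate candidates. It uses the standard library's `difflib.SequenceMatcher` for string similarity and is not Semantica's DuplicateDetector; the function name and the exact semantics given to each option are assumptions, and sort_by is left out because the sketch always sorts by score.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_duplicates(names, min_similarity=0.8, top_k_per_entity=None, max_results=None):
    """Return (name_a, name_b, score) candidate pairs, highest score first."""
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= min_similarity:
            pairs.append((a, b, score))
    pairs.sort(key=lambda p: p[2], reverse=True)

    if top_k_per_entity is not None:
        # Keep a pair only while both names are under their per-entity quota.
        seen, kept = {}, []
        for a, b, score in pairs:
            if seen.get(a, 0) < top_k_per_entity and seen.get(b, 0) < top_k_per_entity:
                kept.append((a, b, score))
                seen[a] = seen.get(a, 0) + 1
                seen[b] = seen.get(b, 0) + 1
        pairs = kept

    return pairs[:max_results] if max_results is not None else pairs


names = ["Acme Corp", "ACME Corp.", "Acme Corporation", "Globex"]
candidates = find_duplicates(names, min_similarity=0.7, max_results=2)
```

Comparing every pair is quadratic in the number of entities, which is why production systems usually add a blocking step that only compares entities sharing a cheap key.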
Conflicts
Detects and resolves fact conflicts across overlapping knowledge sources.
Context & Memory
Context
Agent context graphs, decision tracking, causal chains, and precedent search.
AgentContext, ContextGraph, AgentMemory, DecisionRecorder, CausalAnalyzer, EntityLinker, PolicyEngine
Provenance
W3C PROV-O compliant lineage tracking across all modules.
ProvenanceManager, IntegrityChecker, BridgeAxiom, ProvenanceStorage
Change Management
Version control with SHA-256 checksums, diffs, and rollback.
TemporalVersionManager, ChangeLog, OntologyVersionManager, VersionStorage
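The three ideas named here (checksums, diffs, rollback) can be sketched with the standard library alone. The `VersionHistory` class and its method names below are hypothetical and do not reflect the TemporalVersionManager API; snapshots are assumed to be JSON-serializable dictionaries.

```python
import hashlib
import json


def checksum(snapshot: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff(old: dict, new: dict) -> dict:
    """List the keys that were added, removed, or changed between snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


class VersionHistory:
    """Hypothetical in-memory history keyed by checksum."""

    def __init__(self):
        self._versions = []

    def commit(self, snapshot: dict) -> str:
        digest = checksum(snapshot)
        # Store a copy so later mutation of the caller's dict cannot alter history.
        self._versions.append((digest, json.loads(json.dumps(snapshot))))
        return digest

    def rollback(self, digest: str) -> dict:
        for stored_digest, snapshot in reversed(self._versions):
            if stored_digest == digest:
                if checksum(snapshot) != stored_digest:
                    raise ValueError("stored snapshot failed its integrity check")
                return json.loads(json.dumps(snapshot))
        raise KeyError(digest)


history = VersionHistory()
v1 = history.commit({"alice": "engineer"})
v2 = history.commit({"alice": "manager", "bob": "analyst"})
changes = diff(history.rollback(v1), history.rollback(v2))
```

Hashing a canonical encoding rather than the raw object is what makes the checksum stable across runs and usable as a version identifier.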
Output & Orchestration
Export
Serializes graphs to downstream formats for analytics, semantic web, or graph databases.
Visualization
Renders interactive and static knowledge graph visualizations.
GraphVisualizer, OntologyVisualizer, EmbeddingVisualizer, SemanticNetworkVisualizer, TemporalVisualizer, AnalyticsVisualizer
Layout algorithms: force-directed, hierarchical, circular
Pipeline
Pipeline DSL with parallel workers, retry policies, and failure handling.
Pipeline, PipelineBuilder, ExecutionEngine, FailureHandler, PipelineValidator, ParallelismManager, ResourceScheduler
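To show how parallel workers, a retry policy, and failure handling fit together, here is a small sketch built on `concurrent.futures`. It is not the Semantica Pipeline DSL; `with_retry`, `run_step`, and `flaky_parse` are hypothetical names, and the retry policy (a fixed attempt count with linear backoff) is just one simple choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retry(func, max_attempts=3, backoff_seconds=0.0):
    """Wrap func so that a failing call is retried up to max_attempts times."""
    def wrapper(item):
        for attempt in range(1, max_attempts + 1):
            try:
                return func(item)
            except Exception:
                if attempt == max_attempts:
                    raise  # retries exhausted, surface the error
                time.sleep(backoff_seconds * attempt)
    return wrapper


def run_step(func, items, workers=4, max_attempts=3):
    """Apply func to items in parallel; collect failures instead of aborting."""
    results, failures = [], []
    retried = with_retry(func, max_attempts=max_attempts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [(item, pool.submit(retried, item)) for item in items]
        for item, future in futures:
            try:
                results.append(future.result())
            except Exception as exc:
                failures.append((item, exc))
    return results, failures


attempts = {}


def flaky_parse(doc):
    # Fails on the first call for every document, and always for "bad".
    attempts[doc] = attempts.get(doc, 0) + 1
    if doc == "bad" or attempts[doc] == 1:
        raise RuntimeError(f"could not parse {doc}")
    return doc.upper()


results, failures = run_step(flaky_parse, ["a", "b", "bad"])
```

Collecting failures alongside results lets the rest of a batch complete and leaves it to the caller to decide whether a partial run is acceptable.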
Explorer
FastAPI Knowledge Explorer with Ontology Hub, WebSocket progress, bidirectional path finding, and indexed search (0.004 ms on 118k nodes).
Utilities
LLM Providers
Unified interface to all supported LLM providers.
MCP Server
Exposes Semantica as an MCP stdio server for IDE and agent integrations.
Seed
Bootstrap knowledge graphs from verified structured sources: fixed-point reference data, controlled vocabularies, and domain anchors.
Evals
Evaluation framework for measuring KG quality, extraction accuracy, and pipeline performance.
KGEvaluator, ExtractionEvaluator, PipelineEvaluator, RegressionTracker
Core
Base classes, shared data models, and the plugin registry used across all modules.
Semantica, PluginRegistry, ConfigManager, LifecycleManager, HealthMonitor, Config
Utils
Shared utilities for ID generation, date parsing, validation, and logging.
helpers, validators, constants, types, exceptions, logging, ProgressTracker
Common Module Chains
- Document → KG
- GraphRAG
- AI Agent
- Compliance Pipeline
- Web Scraping → Graph
- Temporal Analysis
Load documents from any source and turn them into a queryable knowledge graph.
Pipeline: Ingest → Parse → Normalize → Semantic Extract → GraphBuilder → KG
Best for: research pipelines, enterprise data extraction, document intelligence
Module Index
| Module | Purpose | Key Classes |
|---|---|---|
| ingest | Data ingestion | FileIngestor, WebIngestor, ParquetIngestor, XMLIngestor |
| parse | Document parsing | DocumentParser, DoclingParser |
| split | Text chunking | TextSplitter |
| normalize | Data cleaning | TextNormalizer, EntityNormalizer, LanguageDetector |
| semantic_extract | NER & relation extraction | NERExtractor, RelationExtractor, TripletExtractor, SemanticAnalyzer, SemanticNetworkExtractor, ExtractionValidator |
| kg | Graph construction | GraphBuilder, TemporalGraphQuery, SimilarityCalculator |
| ontology | Schema management | OntologyGenerator, SHACLGenerator |
| reasoning | Logical inference | Reasoner, DatalogReasoner |
| embeddings | Vector embeddings | EmbeddingGenerator |
| vector_store | Vector database | VectorStore |
| graph_store | Graph database | GraphStore |
| triplet_store | RDF triple store | TripletStore |
| deduplication | Entity resolution | EntityResolver, DuplicateDetector, ClusterBuilder, MergeStrategyManager |
| conflicts | Conflict resolution | ConflictDetector |
| context | Agent context & decisions | AgentContext, ContextGraph |
| provenance | W3C PROV-O lineage | ProvenanceManager |
| change_management | Version control | TemporalVersionManager |
| export | Data export | RDFExporter, ParquetExporter |
| visualization | Graph visualization | GraphVisualizer |
| pipeline | Workflow orchestration | Pipeline, PipelineBuilder |
| explorer | Knowledge Explorer UI | start_explorer |
| llms | LLM providers | Groq, OpenAI, create_provider |
| mcp_server | MCP stdio server | python -m semantica.mcp_server |
| seed | KG bootstrapping from structured sources | SeedManager |
| evals | Quality evaluation | KGEvaluator, ExtractionEvaluator, PipelineEvaluator, RegressionTracker |
| core | Base classes & registry | Semantica, ConfigManager, PluginRegistry, LifecycleManager |
| utils | Shared utilities | helpers, validators |
Getting Started
Your first knowledge graph in 5 minutes.
Cookbook
40+ domain notebooks with real-world examples.
API Reference
Full technical documentation.
Use Cases
Domain-specific examples.
