The fundamental ideas behind Semantica, explained: knowledge graphs, reasoning, provenance, and temporal intelligence.
New here? Start with Getting Started for hands-on examples, then return here for deeper understanding.
Semantica transforms unstructured data (documents, web pages, reports, databases) into knowledge graphs: structured representations that AI systems can query, reason about, and trace back to sources.
At its core, Semantica adds a context and accountability layer on top of your existing AI stack. It doesn’t replace LangChain, LlamaIndex, or your LLM provider; it makes their outputs grounded, traceable, and auditable.
Context Layer
Knowledge graphs, GraphRAG retrieval, semantic embeddings, and temporal intelligence ground every LLM response in structured, queryable facts.
Accountability Layer
Provenance tracking, decision intelligence, conflict detection, and W3C PROV-O compliance make every claim in your AI stack auditable and explainable.
Extension Layer
PluginRegistry and MethodRegistry let you replace or augment any component (ingestors, extractors, reasoning engines, backends) without changing framework code.
This structure makes knowledge searchable, connectable, queryable, and, critically, explainable: every answer can be traced back to the facts and relationships that produced it.
Relationships can be extracted via rule-based methods, ML models, or LLMs, each producing typed triplets with confidence scores and source attribution.
Stores dense embeddings of text chunks. Answers questions by finding semantically similar passages, which is useful when the structure of the answer isn’t known in advance.
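To make the shape of an extracted triplet concrete, here is a minimal rule-based sketch in plain Python. It does not use Semantica's extractors; the `Triplet` class, the regex pattern, the fixed confidence value, and the `doc-001` source ID are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """Hypothetical container: a typed relationship plus its evidence."""
    subject: str
    predicate: str
    obj: str
    confidence: float
    source: str


# Illustrative rule: "<Org> was founded by <Person>"
FOUNDED_BY = re.compile(
    r"(?P<org>[A-Z]\w+) was founded by (?P<person>[A-Z]\w+(?: [A-Z]\w+)*)"
)


def extract_triplets(text, source):
    """Apply the single rule above and tag each hit with its source."""
    return [
        Triplet(m.group("person"), "founded", m.group("org"),
                confidence=0.9, source=source)
        for m in FOUNDED_BY.finditer(text)
    ]


triplets = extract_triplets("Apple was founded by Steve Jobs in 1976.", source="doc-001")
print(triplets)
```

An ML- or LLM-based extractor would presumably emit the same kind of record, with the confidence coming from the model rather than a constant.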
| Strength | Why it matters |
| --- | --- |
| Fuzzy similarity | Finds relevant content even when exact words don’t match |
| Speed | Sub-millisecond approximate nearest-neighbor search at scale |
| Unstructured text | Works directly on paragraphs, sentences, and raw documents |
| Simplicity | No schema design required: embed and index |
Use when: you need fast semantic search over large text corpora.
from semantica.vector_store import VectorStore

store = VectorStore(backend="faiss", dimension=768)
store.add_documents(["Apple was founded in 1976.", "Google was founded in 1998."])
results = store.search("tech company founding dates", limit=5)
Semantica combines both: vector search seeds the graph traversal, and the graph provides structure and provenance the vector store cannot.
| Step | What happens |
| --- | --- |
| Query embedding | User query is embedded and used to find anchor nodes via vector similarity |
| Graph traversal | Multi-hop traversal from anchor nodes retrieves related entities and relationships |
| Context assembly | Facts + relationships are assembled with source attribution for each claim |
| LLM generation | LLM generates an answer grounded in the retrieved structured context |
Result: every claim in the response links back to a specific graph node, so answers are grounded in retrieved facts rather than training data and carry a full audit trail.
from semantica.context import AgentContext, ContextGraph
from semantica.vector_store import VectorStore

context = AgentContext(
    vector_store=VectorStore(backend="faiss", dimension=768),
    knowledge_graph=ContextGraph(advanced_analytics=True),
)
result = context.query("Who founded Apple?", mode="graphrag")
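The graph traversal step can be pictured with a small breadth-first expansion. This is a sketch over a hand-made adjacency dict, not Semantica's traversal code; the `GRAPH` contents, the `expand` function, and the document IDs are hypothetical, and the anchor nodes are assumed to have been picked already by vector similarity.

```python
from collections import deque

# Toy graph: node -> list of (relation, neighbor, source document)
GRAPH = {
    "Apple": [("founded_by", "Steve Jobs", "doc-001"),
              ("founded_by", "Steve Wozniak", "doc-001")],
    "Steve Jobs": [("founded", "NeXT", "doc-002")],
    "Steve Wozniak": [],
    "NeXT": [("acquired_by", "Apple", "doc-003")],
}


def expand(anchors, graph, max_hops=2):
    """Collect every edge reachable within max_hops of the anchor nodes."""
    facts = []
    visited = set(anchors)
    queue = deque((anchor, 0) for anchor in anchors)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbor, source in graph.get(node, []):
            facts.append((node, relation, neighbor, source))
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return facts


facts = expand(["Apple"], GRAPH, max_hops=2)
for fact in facts:
    print(fact)
```

Each returned fact keeps its source ID, which is what lets the later context-assembly step attribute every claim.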
Embeddings convert text into numerical vectors so AI systems can measure semantic similarity, finding related concepts even when the exact words differ.
Semantica uses embeddings for:
Semantic search: retrieve by meaning, not just keywords
Entity resolution: match the same entity across different sources
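Both uses rest on comparing vectors. The sketch below computes cosine similarity over hand-written 3-dimensional toy vectors; real embeddings come from a model and have hundreds of dimensions, and the vectors and names here are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hand-made toy vectors, not the output of any embedding model
EMBEDDINGS = {
    "Apple Inc.": [0.9, 0.1, 0.0],
    "Apple Computer": [0.8, 0.2, 0.1],
    "Banana": [0.1, 0.9, 0.2],
}


def most_similar(name, embeddings):
    """Return the (other_name, score) pair closest to `name`."""
    query = embeddings[name]
    candidates = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != name
    ]
    return max(candidates, key=lambda pair: pair[1])


best_match, score = most_similar("Apple Inc.", EMBEDDINGS)
print(best_match, round(score, 3))
```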
GraphRAG (Graph-Augmented Retrieval Augmented Generation) enhances LLM responses by grounding them in a structured knowledge graph rather than raw text chunks alone.
1. User submits a query. The query is embedded and used to seed both vector search and graph traversal simultaneously.
2. Hybrid context retrieval. Semantica retrieves relevant graph context (entities, typed relationships, and multi-hop reasoning paths) alongside vector-similar text chunks.
3. Context building. Retrieved facts and reasoning paths are assembled into a structured prompt context, each fact tagged with its source node and confidence.
4. LLM generates a grounded response. The LLM produces an answer where every claim links back to a source node in the graph, with no floating assertions.
GraphRAG targets the hallucination and traceability problems of standard RAG. Standard RAG retrieves text chunks; GraphRAG retrieves structured facts with typed relationships. Because every relationship in the context exists in the graph, any structure asserted in an answer can be checked against it.
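The context-building step can be sketched as a formatter that turns retrieved facts into numbered, source-tagged lines. The fact dictionaries, the `build_context` function, the prompt wording, and the confidence cutoff are all assumptions for illustration; Semantica's actual prompt format is not shown in this page.

```python
def build_context(facts, question, min_confidence=0.5):
    """Format retrieved facts as numbered, source-tagged lines ahead of the question."""
    kept = [fact for fact in facts if fact["confidence"] >= min_confidence]
    kept.sort(key=lambda fact: fact["confidence"], reverse=True)
    lines = [
        f"[{i}] {fact['subject']} {fact['predicate']} {fact['object']} "
        f"(source: {fact['source']}, confidence: {fact['confidence']:.2f})"
        for i, fact in enumerate(kept, start=1)
    ]
    header = "Answer using only the numbered facts below and cite them by number."
    return "\n".join([header, *lines, f"Question: {question}"])


# Toy retrieval output; the low-confidence entry is deliberately wrong
retrieved = [
    {"subject": "Apple", "predicate": "founded_by", "object": "Steve Jobs",
     "source": "node:apple", "confidence": 0.95},
    {"subject": "Apple", "predicate": "headquartered_in", "object": "Austin",
     "source": "node:rumor", "confidence": 0.20},
]

prompt = build_context(retrieved, "Who founded Apple?")
print(prompt)
```

Numbering the facts gives the model something concrete to cite, and the cutoff keeps weakly supported facts out of the prompt entirely.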
Semantica can auto-generate ontologies from your knowledge graph or import existing OWL/RDF/Turtle ontologies. The Ontology Hub (v0.5.0) adds a visual editor, SHACL Studio, alignment authoring, and a live health dashboard. See the Ontology reference for the full 6-stage generation pipeline.
Efficient pattern matching for large rule sets: the Rete algorithm avoids re-evaluating rules whose preconditions haven’t changed. Best for thousands of rules over millions of facts.
from semantica.reasoning import ReteEngine

engine = ReteEngine()
engine.load_rules("rules/domain_rules.json")
# kg is a knowledge graph built earlier in your pipeline
results = engine.run(kg)
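The idea of skipping rules whose preconditions have not changed can be shown with a much simpler technique than Rete. The sketch below is naive forward chaining with a coarse filter: a rule is re-evaluated only when a fact with one of its premise predicates was added in the previous round. It is not the Rete network and not how ReteEngine is implemented; the rule format, fact tuples, and entity names are hypothetical.

```python
def match(pattern, fact, bindings):
    """Unify one pattern with one fact; return extended bindings or None."""
    if len(pattern) != len(fact):
        return None
    extended = dict(bindings)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):  # variable: bind it or check consistency
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:  # constant: must match exactly
            return None
    return extended


def match_all(premises, facts, bindings=None):
    """Yield every variable binding that satisfies all premises."""
    bindings = bindings or {}
    if not premises:
        yield bindings
        return
    for fact in facts:
        extended = match(premises[0], fact, bindings)
        if extended is not None:
            yield from match_all(premises[1:], facts, extended)


def forward_chain(facts, rules):
    """Derive facts until nothing new appears; facts are (predicate, subject, object)."""
    known = set(facts)
    changed = {fact[0] for fact in known}  # predicates with new facts
    while changed:
        derived = set()
        for premises, conclusion in rules:
            if not any(premise[0] in changed for premise in premises):
                continue  # none of this rule's preconditions changed
            for bindings in match_all(premises, known):
                new_fact = tuple(bindings.get(term, term) for term in conclusion)
                if new_fact not in known:
                    derived.add(new_fact)
        known |= derived
        changed = {fact[0] for fact in derived}
    return known


RULES = [
    # Ownership is transitive
    ([("subsidiary_of", "?a", "?b"), ("subsidiary_of", "?b", "?c")],
     ("subsidiary_of", "?a", "?c")),
    # A subsidiary inherits its parent's regulator
    ([("subsidiary_of", "?a", "?b"), ("regulated_by", "?b", "?r")],
     ("regulated_by", "?a", "?r")),
]
FACTS = [
    ("subsidiary_of", "UnitA", "HoldCo"),
    ("subsidiary_of", "HoldCo", "ParentCo"),
    ("regulated_by", "ParentCo", "SEC"),
]

closure = forward_chain(FACTS, RULES)
print(sorted(closure))
```

Rete goes much further by caching partial matches between rounds, which is where the benefit at thousands of rules comes from.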
Deductive: classical syllogistic reasoning from premises to guaranteed conclusions.
Abductive: infers the most likely explanation for observed evidence. Best for diagnostic and investigative use cases.
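Abductive reasoning can be illustrated by ranking candidate explanations by how much of the evidence each would account for. This is a toy coverage score, not Semantica's abductive engine; the hypotheses, the symptoms, and the scoring rule are assumptions chosen for the example.

```python
# Hypothetical diagnostic knowledge: explanation -> observations it predicts
HYPOTHESES = {
    "disk_full": {"write_errors", "slow_queries"},
    "network_partition": {"timeouts", "replica_lag"},
    "memory_leak": {"slow_queries", "oom_kills"},
}


def rank_explanations(observations, hypotheses):
    """Rank hypotheses by the fraction of observations each one explains."""
    observed = set(observations)
    ranked = []
    for name, predicted in hypotheses.items():
        coverage = len(observed & predicted) / len(observed) if observed else 0.0
        ranked.append((name, coverage))
    # Highest coverage first; ties broken alphabetically for determinism
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


ranking = rank_explanations(["write_errors", "slow_queries"], HYPOTHESES)
print(ranking)
```

Unlike deduction, the top-ranked explanation is only the most plausible one given the evidence, not a guaranteed conclusion.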
Knowledge changes over time. Temporal graphs attach valid_from / valid_until windows to nodes and edges, enabling point-in-time queries and historical analysis.
from semantica.kg import TemporalGraphQuery
from datetime import datetime

query_engine = TemporalGraphQuery(enable_temporal_reasoning=True)

# Query the graph as it existed on a specific date
snapshot = query_engine.query_at_time(kg, query="", at_time=datetime(2021, 6, 15))
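Conceptually, a point-in-time query filters edges by their validity window. The sketch below shows that filter in plain Python. The `TemporalEdge` class, the half-open interval convention (`valid_from` inclusive, `valid_until` exclusive, `None` meaning still valid), and the ExampleCorp data are assumptions, not Semantica's data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class TemporalEdge:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None means "still valid"


def edges_at(edges, at_time):
    """Return the edges whose validity window contains at_time."""
    return [
        edge for edge in edges
        if edge.valid_from <= at_time
        and (edge.valid_until is None or at_time < edge.valid_until)
    ]


# Fictional leadership history
EDGES = [
    TemporalEdge("ExampleCorp", "has_ceo", "Alice",
                 datetime(2015, 1, 1), datetime(2020, 3, 1)),
    TemporalEdge("ExampleCorp", "has_ceo", "Bob", datetime(2020, 3, 1)),
]

snapshot_2021 = edges_at(EDGES, datetime(2021, 6, 15))
print([edge.obj for edge in snapshot_2021])
```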
Supported features: Allen interval algebra (all 13 temporal relations), OWL-Time export, recorded_at stamping, temporal provenance.
Common uses: tracking company leadership changes, policy evolution, research timelines, financial instrument histories, regulatory compliance windows.
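The 13 Allen relations classify how two intervals sit relative to each other. Here is a compact classifier as a sketch; it assumes proper intervals with start < end, uses integers in place of timestamps, and its function name and relation labels are illustrative rather than Semantica's API.

```python
def allen_relation(a, b):
    """Name the Allen relation of interval a to interval b; each is (start, end)."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Remaining case: the intervals partially overlap
    return "overlaps" if a_start < b_start else "overlapped-by"


print(allen_relation((2019, 2021), (2020, 2023)))
print(allen_relation((2020, 2021), (2021, 2024)))
```

Every pair of proper intervals falls into exactly one of these relations, which is what makes them a convenient vocabulary for temporal rules.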
Explore the semantic neighborhood of any entity in your graph, which is useful for understanding what’s conceptually close, detecting clusters, and visualizing knowledge topology.
from semantica.kg import SimilarityCalculator

calc = SimilarityCalculator()
# entity_a and entity_b are two entities taken from your graph
scores = calc.calculate_similarity(entity_a, entity_b)
Features: N×N semantic distance matrices, ego-mode visualization, distance band classification (near / mid / far), embedding cache optimization for large graphs.
The Visualization module renders distance matrices as interactive heatmaps and ego-mode neighborhood graphs. The Explorer embeds distance intelligence directly in the browser dashboard.
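An N×N distance matrix with band classification can be sketched in a few lines. The cosine-distance metric, the 0.2 and 0.6 band thresholds, and the 2-dimensional toy vectors are assumptions for illustration; the thresholds Semantica actually uses are not stated here.

```python
import math


def cosine_distance(a, b):
    """1 minus cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def distance_matrix(vectors):
    """Return (names, matrix) where matrix[i][j] is the distance between names i and j."""
    names = list(vectors)
    matrix = [[cosine_distance(vectors[r], vectors[c]) for c in names] for r in names]
    return names, matrix


def band(distance, near=0.2, far=0.6):
    """Classify a distance using hypothetical thresholds."""
    if distance < near:
        return "near"
    return "far" if distance >= far else "mid"


# Hand-made toy vectors
VECTORS = {
    "CEO": [1.0, 0.0],
    "Chief Executive": [0.9, 0.1],
    "Board Member": [0.6, 0.8],
    "Warehouse": [0.0, 1.0],
}

names, matrix = distance_matrix(VECTORS)
for other, distance in zip(names, matrix[0]):
    print(f"CEO -> {other}: {band(distance)}")
```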
Real-world data contains the same entity under many names: “Apple”, “Apple Inc.”, “Apple Computer Inc.” Semantica’s deduplication pipeline detects these, merges attributes, resolves conflicts, and preserves the original source provenance.
| Strategy | Algorithm | Best For |
| --- | --- | --- |
| v1 | Jaro-Winkler string similarity | Small datasets, fast baseline |
| blocking_v2 | Candidate blocking + similarity | Large corpora; reduces O(n²) comparisons |
| hybrid_v2 | Blocking + semantic embedding match | Mixed structured/unstructured entity names |
| semantic_v2 | Pure embedding-based resolution | Up to 7× faster than v1; handles abbreviations and aliases |
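The effect of candidate blocking can be seen in a small sketch: names are grouped by a cheap key, and only names within the same group are compared. For the similarity score this example substitutes `difflib.SequenceMatcher` from the standard library for Jaro-Winkler; the first-token blocking key and the 0.6 threshold are assumptions, not Semantica's configuration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(name):
    """Cheap grouping key: the lowercased first token."""
    return name.lower().split()[0]


def candidate_pairs(names):
    """Yield only the pairs that share a blocking key."""
    blocks = defaultdict(list)
    for name in names:
        blocks[blocking_key(name)].append(name)
    for members in blocks.values():
        yield from combinations(members, 2)


def find_duplicates(names, threshold=0.6):
    """Score each candidate pair and keep those at or above the threshold."""
    duplicates = []
    for a, b in candidate_pairs(names):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            duplicates.append((a, b))
    return duplicates


NAMES = ["Apple", "Apple Inc.", "Apple Computer Inc.", "Google LLC", "Google"]

pairs = list(candidate_pairs(NAMES))
duplicates = find_duplicates(NAMES)
print(len(pairs), "candidate pairs instead of", len(NAMES) * (len(NAMES) - 1) // 2)
print(duplicates)
```

Note that plain string similarity misses the "Apple" / "Apple Computer Inc." pair, which is the kind of alias an embedding-based strategy is meant to catch.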
The ontology rules applied during graph construction
The reasoning steps that produced any inferred fact
This lineage is W3C PROV-O compliant and suited to regulated industries that require audit trails (HIPAA, SOX, GDPR, FDA 21 CFR Part 11). Use RDFExporter(include_provenance=True) to embed provenance inline in any RDF export.
Every agent decision is a first-class object in Semantica: recorded, causally linked, and searchable by precedent. This is the accountability layer for AI pipelines: decisions are no longer ephemeral log messages; they are queryable knowledge graph nodes.
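For readers unfamiliar with PROV-O, the sketch below shows what lineage for one extracted fact might look like as RDF triples. It uses real PROV-O terms (`prov:Entity`, `prov:Activity`, `prov:wasDerivedFrom`, `prov:wasGeneratedBy`, `prov:used`) but builds them by hand in plain Python; the `example.org` namespace, the identifiers, and the helper functions are hypothetical and do not reflect what RDFExporter emits.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for this illustration


def provenance_triples(fact_id, source_id, activity_id):
    """Describe one fact, the source it came from, and the activity that produced it."""
    fact, source, activity = EX + fact_id, EX + source_id, EX + activity_id
    return [
        (fact, RDF_TYPE, PROV + "Entity"),
        (source, RDF_TYPE, PROV + "Entity"),
        (activity, RDF_TYPE, PROV + "Activity"),
        (fact, PROV + "wasDerivedFrom", source),
        (fact, PROV + "wasGeneratedBy", activity),
        (activity, PROV + "used", source),
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = provenance_triples("fact-42", "doc-001", "extraction-run-7")
print(to_ntriples(triples))
```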
# context is the AgentContext created earlier
decision_id = context.record_decision(
    category="model_selection",
    scenario="Choose LLM for production pipeline",
    reasoning="GPT-4 benchmark advantage justifies 3x cost increase",
    outcome="selected_gpt4",
    confidence=0.91,
)

# Find similar past decisions before making a new one
precedents = context.find_precedents("model selection reasoning", limit=5)

# Trace downstream impact of a past decision
influence = context.analyze_decision_influence(decision_id)
Use find_precedents() before every high-stakes decision. Hybrid similarity search over all recorded decisions surfaces past reasoning that may apply, reducing inconsistency across agent runs and enabling genuine organisational learning from AI decision history.
When multiple sources disagree on the same fact, Semantica flags and resolves the conflict rather than silently picking one value.
Resolution strategies:
Recency: prefer the most recent source
Source credibility: prefer the most reliable source (configurable credibility scores)
Majority vote: take the value reported by the most sources, provided at least 2 agree
Manual review: flag for human arbitration; continue pipeline without blocking
See the Conflicts reference for ConflictResolver, SourceTracker, and InvestigationGuideGenerator.
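The first three resolution strategies can be sketched as small functions over a list of competing claims. The `Claim` class, the credibility scores, and the headcount data are invented for illustration, and this is not ConflictResolver's implementation. Returning `None` from the majority vote stands in for escalating to manual review.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    value: str
    source: str
    observed_on: date
    credibility: float  # hypothetical score in [0, 1]


def resolve_by_recency(claims):
    """Prefer the most recently observed claim."""
    return max(claims, key=lambda claim: claim.observed_on).value


def resolve_by_credibility(claims):
    """Prefer the claim from the most credible source."""
    return max(claims, key=lambda claim: claim.credibility).value


def resolve_by_majority(claims, min_agreeing=2):
    """Prefer the most common value; None means too little agreement."""
    value, count = Counter(claim.value for claim in claims).most_common(1)[0]
    return value if count >= min_agreeing else None


# Three sources disagree on a fictional company's headcount
CLAIMS = [
    Claim("1200", "annual-filing", date(2023, 1, 10), credibility=0.9),
    Claim("1500", "news-article", date(2024, 3, 1), credibility=0.6),
    Claim("1200", "wiki-page", date(2022, 5, 5), credibility=0.5),
]

print(resolve_by_recency(CLAIMS))
print(resolve_by_credibility(CLAIMS))
print(resolve_by_majority(CLAIMS))
```

The strategies disagree on this data, which is why the choice of strategy is a configuration decision rather than a detail.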
Semantica is designed for extension. Any component (ingestor, extractor, graph builder, reasoning engine) can be replaced or augmented with a custom implementation registered at runtime.
PluginRegistry: replace any component by name
PluginRegistry provides dynamic plugin discovery, registration, and loading across all modules. Register your own class under a string key; Semantica will use it wherever that key is referenced in config or pipeline steps.
from semantica.core import PluginRegistry

registry = PluginRegistry()

# Register a custom ingestor (MySQLIngestor is your own class, defined elsewhere)
registry.register_plugin(
    "my_sql_ingestor",
    MySQLIngestor,
    version="1.0.0",
    description="PostgreSQL ingestor for internal warehouse",
    capabilities=["ingest"],
)

# Load and use
plugin = registry.load_plugin("my_sql_ingestor", connection_string="postgresql://...")
result = plugin.execute("SELECT * FROM documents")

# Reference by name in pipeline YAML; no code changes needed
MethodRegistry lets you register custom methods on knowledge graph objects by name, which is useful for adding domain-specific graph operations without subclassing.
from semantica.kg import MethodRegistry

registry = MethodRegistry()

def find_supply_chain_hops(graph, source_node, max_hops=3):
    """Custom BFS traversal for supply chain graphs."""
    ...

# Register under a string key
registry.register("supply_chain_hops", find_supply_chain_hops)

# Call by name on any graph object
result = registry.call("supply_chain_hops", kg, source_node="Supplier_A", max_hops=5)

# List all registered methods
print(registry.list_methods())  # ["supply_chain_hops", ...]