New here? Start with Getting Started for hands-on examples, then return here for deeper understanding.
Semantica transforms unstructured data: documents, web pages, reports, databases: into knowledge graphs: structured representations that AI systems can query, reason about, and trace back to sources. At its core, Semantica adds a context and accountability layer on top of your existing AI stack. It doesn’t replace LangChain, LlamaIndex, or your LLM provider: it makes their outputs grounded, traceable, and auditable.

Context Layer

Knowledge graphs, GraphRAG retrieval, semantic embeddings, and temporal intelligence ground every LLM response in structured, queryable facts.

Accountability Layer

Provenance tracking, decision intelligence, conflict detection, and W3C PROV-O compliance make every claim in your AI stack auditable and explainable.

Extension Layer

PluginRegistry and MethodRegistry let you replace or augment any component: ingestors, extractors, reasoning engines, backends: without changing framework code.

Knowledge Graphs

Knowledge graph node and edge structure showing entities (Person, Organization, Location, Date) and their typed relations The foundation of everything in Semantica. A knowledge graph stores information as three building blocks:
  • Nodes (entities): people, companies, locations, events, concepts
  • Edges (relationships): works_for, located_in, founded_by
  • Properties: name, date, confidence score, source URL
This structure makes knowledge searchable, connectable, queryable, and: critically: explainable: every answer can be traced back to the facts and relationships that produced it.

Entity Extraction (NER)

Scanning text to find and classify real-world entities:
# Input: "Apple Inc. was founded by Steve Jobs in 1976 in Cupertino."
{
    "entities": [
        {"text": "Apple Inc.",  "type": "ORGANIZATION", "confidence": 0.98},
        {"text": "Steve Jobs",  "type": "PERSON",       "confidence": 0.99},
        {"text": "1976",        "type": "DATE",         "confidence": 0.95},
        {"text": "Cupertino",   "type": "LOCATION",     "confidence": 0.97}
    ]
}
Each entity gets a type, confidence score, and a link to its source document. Three extraction methods are available:
MethodSpeedAccuracyRequirements
"pattern"⚡ Very fastModerateNo API key: regex-based
"ml"FastHighLocal ML model
"llm"MediumHighestLLM provider: all 9 supported

Relationship Extraction

Finding how entities connect to each other:
{
    "relationships": [
        {"subject": "Steve Jobs", "predicate": "founded",    "object": "Apple Inc.", "confidence": 0.92},
        {"subject": "Apple Inc.", "predicate": "located_in", "object": "Cupertino",  "confidence": 0.89}
    ]
}
Relationships can be extracted via rule-based methods, ML models, or LLMs: each producing typed triplets with confidence scores and source attribution.

Knowledge Graph vs. Vector Store

Both store information for AI retrieval: but they’re built for different jobs.
Stores structured facts as typed nodes and labeled edges. Answers questions that require understanding relationships between entities.
StrengthWhy it matters
TraversalMulti-hop queries: “Who founded companies that Apple alumni later joined?”
ExplainabilityEvery answer traces back to specific nodes and edges: no black-box retrieval
Temporal reasoningPoint-in-time queries, valid_from/valid_until windows, historical snapshots
Conflict detectionTwo sources disagreeing on the same fact is surfaced and resolvable
Schema enforcementSHACL validation catches constraint violations before they corrupt results
Use when: you need structured reasoning, provenance, compliance, or explainability.
from semantica.kg import GraphBuilder, PathFinder

graph   = GraphBuilder(merge_entities=True).build(entities=entities, relationships=rels)
finder  = PathFinder()
path    = finder.dijkstra_shortest_path(graph, "Steve Jobs", "Tim Cook")

Embeddings

Embeddings convert text into numerical vectors so AI systems can measure semantic similarity: finding related concepts even when the exact words differ. Semantica uses embeddings for:
  • Semantic search: retrieve by meaning, not just keywords
  • Entity resolution: match the same entity across different sources
  • Precedent search: find similar past decisions
  • GraphRAG retrieval: hybrid vector + graph traversal
  • Distance Intelligence: N×N semantic distance matrices between any node set
Supported models: Sentence-Transformers, FastEmbed, OpenAI, BGE, Ollama local embeddings.

GraphRAG

GraphRAG (Graph-Augmented Retrieval Augmented Generation) enhances LLM responses by grounding them in a structured knowledge graph rather than raw text chunks alone. GraphRAG flow: User Query → Vector Search + Graph Traversal → Context Builder → LLM → Grounded Answer
1

User submits a query

The query is embedded and used to seed both vector search and graph traversal simultaneously.
2

Hybrid context retrieval

Semantica retrieves relevant graph context: entities, typed relationships, and multi-hop reasoning paths: alongside vector-similar text chunks.
3

Context building

Retrieved facts and reasoning paths are assembled into a structured prompt context, each fact tagged with its source node and confidence.
4

LLM generates a grounded response

The LLM produces an answer where every claim links back to a source node in the graph: no floating assertions, no hallucinations from training data.
GraphRAG eliminates the hallucination and traceability problems of standard RAG. Standard RAG retrieves text chunks; GraphRAG retrieves structured facts with typed relationships. The LLM cannot confabulate structure that was never in the graph.

Ontology

An ontology defines the schema and rules for your knowledge: what entity types exist, which relationships are valid, and what constraints apply.
ontology = {
    "classes": ["Person", "Organization", "Location"],
    "relationships": ["works_for", "located_in", "founded_by"],
    "rules": {
        "Person":       ["must_have_name"],
        "Organization": ["must_have_name", "can_have_founding_date"]
    }
}
Semantica can auto-generate ontologies from your knowledge graph or import existing OWL/RDF/Turtle ontologies. The Ontology Hub (v0.5.0) adds a visual editor, SHACL Studio, alignment authoring, and a live health dashboard. See the Ontology reference for the full 6-stage generation pipeline.

Reasoning & Inference

Semantica includes multiple reasoning engines to derive new knowledge from existing facts.
Known:    Steve Jobs founded Apple Inc.
Known:    Apple Inc. is headquartered in Cupertino
Inferred: Steve Jobs has a connection to Cupertino
Applies IF/THEN rules repeatedly until no new facts can be derived. Best for alert systems, compliance checks, and trigger-based workflows.
from semantica.reasoning import Reasoner, Rule, Fact, RuleType

engine = Reasoner()
engine.add_fact(Fact(subject="Alice", predicate="is_a", obj="Manager"))
engine.add_rule(Rule(
    rule_type=RuleType.FORWARD_CHAIN,
    conditions=[{"subject": "?x", "predicate": "is_a", "object": "Manager"}],
    conclusion={"subject": "?x", "predicate": "has_authority", "object": "true"}
))
result = engine.infer()
All engines produce explainable inference paths: not black-box conclusions. Every derived fact includes the rules and premises that produced it.

Temporal Intelligence

Knowledge changes over time. Temporal graphs attach valid_from / valid_until windows to nodes and edges, enabling point-in-time queries and historical analysis.
from semantica.kg import TemporalGraphQuery
from datetime import datetime

query_engine = TemporalGraphQuery(enable_temporal_reasoning=True)

# Query the graph as it existed on a specific date
snapshot = query_engine.query_at_time(kg, query="", at_time=datetime(2021, 6, 15))
Supported features: Allen interval algebra (all 13 temporal relations), OWL-Time export, recorded_at stamping, temporal provenance. Common uses: tracking company leadership changes, policy evolution, research timelines, financial instrument histories, regulatory compliance windows.

Distance Intelligence

Explore the semantic neighborhood of any entity in your graph: useful for understanding what’s conceptually close, detecting clusters, and visualizing knowledge topology.
from semantica.kg import SimilarityCalculator

calc   = SimilarityCalculator()
scores = calc.calculate_similarity(entity_a, entity_b)
Features: N×N semantic distance matrices, ego-mode visualization, distance band classification (near / mid / far), embedding cache optimization for large graphs. The Visualization module renders distance matrices as interactive heatmaps and ego-mode neighborhood graphs. The Explorer embeds distance intelligence directly in the browser dashboard.

Deduplication & Entity Resolution

Real-world data contains the same entity under many names: “Apple”, “Apple Inc.”, “Apple Computer Inc.” Semantica’s deduplication pipeline detects these, merges attributes, resolves conflicts, and preserves the original source provenance.
StrategyAlgorithmBest For
v1Jaro-Winkler string similaritySmall datasets, fast baseline
blocking_v2Candidate blocking + similarityLarge corpora: reduces O(n²) comparisons
hybrid_v2Blocking + semantic embedding matchMixed structured/unstructured entity names
semantic_v2Pure embedding-based resolutionUp to 7× faster than v1; handles abbreviations and aliases

Provenance & Auditability

Every fact in Semantica links back to:
  • The source document it came from
  • The extraction method used (pattern / ML / LLM)
  • The ontology rules applied during graph construction
  • The reasoning steps that produced any inferred fact
This is W3C PROV-O compliant lineage: suitable for regulated industries that require audit trails (HIPAA, SOX, GDPR, FDA 21 CFR Part 11). Use RDFExporter(include_provenance=True) to embed provenance inline in any RDF export.
from semantica.provenance import ProvenanceManager

prov    = ProvenanceManager()
lineage = prov.get_entity_lineage("apple_inc")

print(f"Source:    {lineage.source_document}")
print(f"Method:    {lineage.extraction_method}")
print(f"Extracted: {lineage.timestamp}")
print(f"Checksum:  {lineage.checksum}")

Decision Intelligence

Every agent decision is a first-class object in Semantica: recorded, causally linked, and searchable by precedent. This is the accountability layer for AI pipelines: decisions are no longer ephemeral log messages, they are queryable knowledge graph nodes.
decision_id = context.record_decision(
    category="model_selection",
    scenario="Choose LLM for production pipeline",
    reasoning="GPT-4 benchmark advantage justifies 3x cost increase",
    outcome="selected_gpt4",
    confidence=0.91,
)

# Find similar past decisions before making a new one
precedents = context.find_precedents("model selection reasoning", limit=5)

# Trace downstream impact of a past decision
influence  = context.analyze_decision_influence(decision_id)
Use find_precedents() before every high-stakes decision. Hybrid similarity search over all recorded decisions surfaces past reasoning that may apply: reducing inconsistency across agent runs and enabling genuine organisational learning from AI decision history.

Conflict Detection

When multiple sources disagree on the same fact, Semantica flags and resolves the conflict rather than silently picking one value. Resolution strategies:
  • Recency: prefer the most recent source
  • Source credibility: prefer the most reliable source (configurable credibility scores)
  • Majority vote: aggregate across all sources with ≥ 2 agreeing
  • Manual review: flag for human arbitration; continue pipeline without blocking
See the Conflicts reference for ConflictResolver, SourceTracker, and InvestigationGuideGenerator.

Custom Plugin Development

Semantica is designed for extension. Any component: ingestor, extractor, graph builder, reasoning engine: can be replaced or augmented with a custom implementation registered at runtime.
PluginRegistry provides dynamic plugin discovery, registration, and loading across all modules. Register your own class under a string key; Semantica will use it wherever that key is referenced in config or pipeline steps.
from semantica.core import PluginRegistry

registry = PluginRegistry()

# Register a custom ingestor
registry.register_plugin(
    "my_sql_ingestor", MySQLIngestor,
    version="1.0.0",
    description="PostgreSQL ingestor for internal warehouse",
    capabilities=["ingest"],
)

# Load and use
plugin = registry.load_plugin("my_sql_ingestor", connection_string="postgresql://...")
result = plugin.execute("SELECT * FROM documents")

# Reference by name in pipeline YAML: no code changes needed
steps:
  - name: ingest
    plugin: my_sql_ingestor
    config:
      connection_string: "${DB_URL}"
Extension points available: ingestors, parsers, normalizers, extractors, reasoning engines, export formats, vector store backends, graph store backends, visualization renderers.
MethodRegistry lets you register custom methods on knowledge graph objects by name: useful for adding domain-specific graph operations without subclassing.
from semantica.kg import MethodRegistry

registry = MethodRegistry()

def find_supply_chain_hops(graph, source_node, max_hops=3):
    """Custom BFS traversal for supply chain graphs."""
    ...

# Register under a string key
registry.register("supply_chain_hops", find_supply_chain_hops)

# Call by name on any graph object
result = registry.call("supply_chain_hops", kg, source_node="Supplier_A", max_hops=5)

# List all registered methods
print(registry.list_methods())   # ["supply_chain_hops", ...]

Quickstart Tutorial

Build a full pipeline with code.

Modules Guide

Every module explained with examples.

Use Cases

Real-world domain examples.

API Reference

Complete technical reference.