This guide walks you through the end-to-end pipeline for building your first knowledge graph. Start here after installation. An LLM API key is optional: pattern-based extraction works out of the box.

Install

pip install semantica
Verify:
python -c "import semantica; print(semantica.__version__)"
# 0.5.0

Full Pipeline

Semantica end-to-end pipeline: Ingest → Parse → Normalize → Extract → Build KG → QA → Store → Deliver
1. Ingest

Load a document from a file, directory, URL, or database.
from semantica.ingest import FileIngestor

ingestor = FileIngestor()
sources  = ingestor.ingest("data/report.pdf")
# Also accepts: .docx, .html, .json, .csv, .xlsx, .pptx, .parquet, .xml
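To illustrate what "a file or a directory" can mean in practice, here is a minimal standard-library sketch that gathers ingestible paths by extension. It is not how FileIngestor works internally; `collect_sources` and `SUPPORTED` are hypothetical names, and the suffix set simply mirrors the formats listed above.

```python
import tempfile
from pathlib import Path

# Extensions taken from the list above; the set itself is an assumption of this sketch.
SUPPORTED = {".pdf", ".docx", ".html", ".json", ".csv", ".xlsx", ".pptx", ".parquet", ".xml"}

def collect_sources(path):
    """Return supported files for a single file or, recursively, a directory."""
    root = Path(path)
    if root.is_file():
        return [root] if root.suffix.lower() in SUPPORTED else []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )

# Demo on a throwaway directory with one supported and one unsupported file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "report.pdf").write_text("stub")
    (Path(tmp) / "notes.txt").write_text("stub")
    found = [p.name for p in collect_sources(tmp)]

print(found)
```

Each collected path could then be passed to `ingestor.ingest(...)` one at a time.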
2. Parse

Extract structured text and layout from raw documents.
from semantica.parse import DocumentParser

parser = DocumentParser()
parsed = parser.parse(sources[0])

print(parsed.text[:200])  # extracted text
print(parsed.metadata)    # title, author, date, source
For PDFs with tables, charts, or multi-column layouts, use DoclingParser: it applies advanced layout analysis and returns structured table data alongside text.
from semantica.parse import DoclingParser

parser = DoclingParser()
parsed = parser.parse(sources[0])
print(parsed.tables)  # structured table objects
3. Extract Entities & Relationships

Identify named entities and extract typed relationships between them.
from semantica.semantic_extract import NERExtractor, RelationExtractor

ner      = NERExtractor(method="pattern")
entities = ner.extract(parsed)
# Returns: [{"text": "Apple Inc.", "type": "ORGANIZATION", "confidence": 0.98}, ...]

rel           = RelationExtractor(method="rule")
relationships = rel.extract(parsed, entities=entities)
# Returns: [{"subject": "Steve Jobs", "predicate": "founded", "object": "Apple Inc."}, ...]
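To make "pattern-based" extraction concrete, the following standalone sketch shows the general idea with regular expressions. It is not Semantica's implementation: the patterns, the `extract_entities` and `extract_relations` helpers, and the fixed confidence of 1.0 are all illustrative assumptions. The output dicts only mimic the shapes shown in the comments above.

```python
import re

# Hypothetical patterns for illustration; real extractors use far richer rules.
ENTITY_PATTERNS = {
    "ORGANIZATION": re.compile(r"\b[A-Z][a-zA-Z]+ (?:Inc\.|Corp\.|Ltd\.)"),
    "DATE": re.compile(r"\b(?:19|20)\d{2}\b"),
}

# Rule: "<ORG> was founded by <Person>" yields (Person, founded, ORG).
FOUNDED_RULE = re.compile(
    r"(?P<org>[A-Z][a-zA-Z]+ (?:Inc\.|Corp\.|Ltd\.)) was founded by "
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+)"
)

def extract_entities(text):
    """Collect every pattern match as an entity dict."""
    entities = []
    for label, pattern in ENTITY_PATTERNS.items():
        for match in pattern.finditer(text):
            # A regex either matches or it does not, so confidence is fixed here.
            entities.append({"text": match.group(0), "type": label, "confidence": 1.0})
    return entities

def extract_relations(text):
    """Apply the single 'founded' rule and return subject/predicate/object dicts."""
    return [
        {"subject": m.group("person"), "predicate": "founded", "object": m.group("org")}
        for m in FOUNDED_RULE.finditer(text)
    ]

text = "Apple Inc. was founded by Steve Jobs in 1976."
entities = extract_entities(text)
relations = extract_relations(text)
print(entities)
print(relations)
```

Rules like these are cheap and deterministic, which is why no API key is needed, but they only find what the patterns anticipate.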
4. Build the Knowledge Graph

Assemble extracted entities and relationships into a queryable knowledge graph.
from semantica.kg import GraphBuilder

builder = GraphBuilder(merge_entities=True)
graph   = builder.build(entities=entities, relationships=relationships)

print(f"Graph: {len(graph.nodes)} nodes, {len(graph.edges)} edges")

# Query the graph
apple    = graph.get_node("Apple Inc.")
# The predicate must match a label present in the graph (step 3 shows "founded").
founders = graph.get_neighbors("Apple Inc.", predicate="founded_by")
merge_entities=True automatically resolves duplicate references to the same entity (“Apple”, “Apple Inc.”, “AAPL”) using semantic similarity, so no manual deduplication is needed.
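As a rough intuition for entity merging, the sketch below groups near-identical names. Note the substitution: it uses plain string similarity from `difflib` rather than semantic similarity, so a ticker such as "AAPL" only merges through an explicit alias table. `normalize`, `merge_duplicates`, the alias table, and the 0.85 threshold are hypothetical choices for this example, not part of Semantica.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase and drop common corporate suffixes before comparing."""
    name = name.lower().strip()
    for suffix in (" inc.", " inc", " corp.", " corp", " ltd.", " ltd"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name.strip()

def merge_duplicates(names, aliases=None, threshold=0.85):
    """Group names whose normalized forms are near-identical strings."""
    aliases = aliases or {}
    groups = []  # each group lists surface forms; the first item is canonical
    for name in names:
        candidate = aliases.get(name, name)
        for group in groups:
            ratio = SequenceMatcher(
                None, normalize(candidate), normalize(group[0])
            ).ratio()
            if ratio >= threshold:
                group.append(name)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([name])
    return groups

groups = merge_duplicates(
    ["Apple Inc.", "Apple", "AAPL", "Microsoft Corp."],
    aliases={"AAPL": "Apple Inc."},  # hypothetical ticker-to-name table
)
print(groups)
```

An embedding-based comparison would replace the `SequenceMatcher` call while keeping the same grouping loop.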
5. Visualize

Render an interactive, zoomable knowledge graph in the browser.
from semantica.visualization import GraphVisualizer

viz = GraphVisualizer(
    layout="force",        # "force" | "hierarchical" | "circular"
    node_color_by="type",  # color nodes by entity type
    show_confidence=True,
)
viz.visualize(graph, output="graph.html")
Open graph.html in any browser to pan, zoom, click nodes for details, and filter by entity type.
6. Export

Export to any downstream format.
from semantica.export import RDFExporter

exporter = RDFExporter()
exporter.export_to_rdf(graph, format="turtle",  output="graph.ttl")
exporter.export_to_rdf(graph, format="json-ld", output="graph.jsonld")
exporter.export_to_rdf(graph, format="nt",      output="graph.nt")
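For readers new to RDF, this sketch shows roughly what an N-Triples ("nt") export contains: one `subject predicate object .` statement per line, with each term written as an IRI. It does not reproduce RDFExporter's output; the `http://example.org/` namespace and the `to_iri` and `to_ntriples` helpers are assumptions made for illustration.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for this sketch

def to_iri(label, kind):
    """Percent-encode the label so spaces and punctuation are IRI-safe."""
    return f"<{BASE}{kind}/{quote(label, safe='')}>"

def to_ntriples(relationships):
    """Serialize subject/predicate/object dicts as N-Triples lines."""
    lines = []
    for rel in relationships:
        subject = to_iri(rel["subject"], "entity")
        predicate = to_iri(rel["predicate"], "relation")
        obj = to_iri(rel["object"], "entity")
        lines.append(f"{subject} {predicate} {obj} .")
    return "\n".join(lines)

relationships = [
    {"subject": "Steve Jobs", "predicate": "founded", "object": "Apple Inc."}
]
ntriples = to_ntriples(relationships)
print(ntriples)
```

Turtle and JSON-LD carry the same triples in more compact or more web-friendly syntax.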

Add Decision Intelligence

Track every agent decision with full causal chains and provenance by adding an AgentContext:
from semantica.context import AgentContext, ContextGraph
from semantica.vector_store import VectorStore

context = AgentContext(
    vector_store=VectorStore(backend="faiss", dimension=768),
    knowledge_graph=ContextGraph(advanced_analytics=True),
    decision_tracking=True,
)

# Store a fact with provenance
context.store("GPT-4 outperforms GPT-3.5 on reasoning benchmarks by 40%")

# Record a decision
decision_id = context.record_decision(
    category="model_selection",
    scenario="Choose LLM for production reasoning pipeline",
    reasoning="GPT-4 benchmark advantage justifies 3x cost increase",
    outcome="selected_gpt4",
    confidence=0.91,
)

# Retrieve similar past decisions to help keep new choices consistent
precedents = context.find_precedents("model selection reasoning", limit=5)
influence  = context.analyze_decision_influence(decision_id)
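The precedent lookup above can be pictured as a similarity search over stored decisions. The toy sketch below uses token overlap (Jaccard similarity) as a stand-in for vector search; `DecisionLog` and its methods are hypothetical and unrelated to AgentContext's actual internals.

```python
class DecisionLog:
    """In-memory decision store; a toy stand-in for a vector-backed context."""

    def __init__(self):
        self.decisions = {}

    def record(self, category, scenario, outcome, confidence):
        decision_id = f"d{len(self.decisions) + 1}"
        self.decisions[decision_id] = {
            "category": category,
            "scenario": scenario,
            "outcome": outcome,
            "confidence": confidence,
        }
        return decision_id

    def find_precedents(self, query, limit=5):
        query_tokens = set(query.lower().split())
        scored = []
        for decision_id, d in self.decisions.items():
            text = f"{d['category']} {d['scenario']}".lower().replace("_", " ")
            tokens = set(text.split())
            union = query_tokens | tokens
            # Jaccard similarity: shared tokens divided by all distinct tokens.
            score = len(query_tokens & tokens) / len(union) if union else 0.0
            if score > 0:
                scored.append((score, decision_id))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [decision_id for _, decision_id in scored[:limit]]

log = DecisionLog()
first = log.record(
    "model_selection", "Choose LLM for production reasoning pipeline",
    "selected_gpt4", 0.91,
)
second = log.record(
    "vendor_selection", "Pick a cloud provider for batch jobs",
    "selected_provider_a", 0.80,
)
precedents = log.find_precedents("model selection reasoning")
print(precedents)
```

Embeddings would rank by meaning rather than shared words, which matters when queries and scenarios are phrased differently.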

Common Patterns

Extract from a raw text string:
from semantica.semantic_extract import NERExtractor, RelationExtractor

text = "Apple Inc. was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in 1976 in Cupertino, California."

ner           = NERExtractor()
entities      = ner.extract(text)

rel           = RelationExtractor()
relationships = rel.extract(text, entities=entities)
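Process multiple documents into one graph:
from semantica.kg import GraphBuilder

# Assumes ner and rel are the extractors created above, and parsed_docs
# is a list of parsed documents (one parser.parse() result per source).
builder = GraphBuilder(merge_entities=True)
all_entities, all_rels = [], []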

for doc in parsed_docs:
    entities = ner.extract(doc)
    rels     = rel.extract(doc, entities=entities)
    all_entities.extend(entities)
    all_rels.extend(rels)

graph = builder.build(entities=all_entities, relationships=all_rels)
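Conceptually, building a graph from accumulated lists means turning entities into nodes and relationships into labeled edges. The sketch below shows one simple way to do that with plain dicts; `build_graph`, `neighbors`, and the "keep the highest-confidence record" policy are assumptions for illustration, not GraphBuilder's behavior.

```python
from collections import defaultdict

def build_graph(entities, relationships):
    """Return (nodes, edges): nodes keyed by name, edges as adjacency lists."""
    nodes = {}
    for entity in entities:
        # Keep the highest-confidence record when a name repeats across documents.
        current = nodes.get(entity["text"])
        if current is None or entity["confidence"] > current["confidence"]:
            nodes[entity["text"]] = entity
    edges = defaultdict(list)
    for rel in relationships:
        edges[rel["subject"]].append((rel["predicate"], rel["object"]))
    return nodes, edges

def neighbors(edges, node, predicate=None):
    """List targets reachable from node, optionally filtered by predicate."""
    return [obj for pred, obj in edges.get(node, []) if predicate in (None, pred)]

all_entities = [
    {"text": "Apple Inc.", "type": "ORGANIZATION", "confidence": 0.90},
    {"text": "Steve Jobs", "type": "PERSON", "confidence": 0.95},
    {"text": "Apple Inc.", "type": "ORGANIZATION", "confidence": 0.98},
]
all_rels = [
    {"subject": "Steve Jobs", "predicate": "founded", "object": "Apple Inc."},
    {"subject": "Steve Jobs", "predicate": "led", "object": "Apple Inc."},
]
nodes, edges = build_graph(all_entities, all_rels)
founded = neighbors(edges, "Steve Jobs", predicate="founded")
print(len(nodes), founded)
```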
Query a temporal knowledge graph:
from semantica.kg import TemporalKnowledgeGraph
from datetime import datetime

tkg = TemporalKnowledgeGraph()

tkg.add_node("Tim Cook",   role="CEO", valid_from=datetime(2011, 8, 24))
tkg.add_node("Steve Jobs", role="CEO", valid_from=datetime(1997, 9, 16),
             valid_until=datetime(2011, 8, 24))

# What did the graph look like on Jan 1, 2005?
snapshot = tkg.at(datetime(2005, 1, 1))
print(snapshot.get_node("Steve Jobs"))  # role: CEO
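The point-in-time query above boils down to interval filtering. Here is a minimal sketch of that logic, assuming half-open validity intervals (`valid_from` inclusive, `valid_until` exclusive); `TimedFact` and `snapshot_at` are hypothetical names, and TemporalKnowledgeGraph may treat boundaries differently.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TimedFact:
    name: str
    role: str
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None means "still valid"

def snapshot_at(facts, moment):
    """Return facts whose interval [valid_from, valid_until) contains moment."""
    return [
        f for f in facts
        if f.valid_from <= moment
        and (f.valid_until is None or moment < f.valid_until)
    ]

facts = [
    TimedFact("Steve Jobs", "CEO", datetime(1997, 9, 16), datetime(2011, 8, 24)),
    TimedFact("Tim Cook", "CEO", datetime(2011, 8, 24)),
]
in_2005 = [f.name for f in snapshot_at(facts, datetime(2005, 1, 1))]
in_2020 = [f.name for f in snapshot_at(facts, datetime(2020, 1, 1))]
on_handover = [f.name for f in snapshot_at(facts, datetime(2011, 8, 24))]
print(in_2005, in_2020, on_handover)
```

With half-open intervals, the handover date belongs to exactly one CEO, which avoids double counting at the boundary.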
Persist the graph to Neo4j:
from semantica.graph_store import Neo4jStore
from semantica.kg import GraphBuilder

store = Neo4jStore(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password",  # placeholder; load real credentials from the environment
)

builder = GraphBuilder(merge_entities=True, graph_store=store)
graph   = builder.build(entities=entities, relationships=relationships)
# Graph persisted to Neo4j: survives process restarts
Track provenance for every node and edge:
from semantica.provenance import ProvenanceTracker
from semantica.kg import GraphBuilder

tracker = ProvenanceTracker()
builder = GraphBuilder(merge_entities=True, provenance=tracker)
graph   = builder.build(entities=entities, relationships=relationships)

# Every node and edge has full lineage
node = graph.get_node("Apple Inc.")
print(node.provenance)
# {
#   "source_document": "data/report.pdf",
#   "extraction_method": "NERExtractor:llm",
#   "extracted_at": "2026-05-22T10:30:00Z",
#   "confidence": 0.98
# }
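A provenance record like the one printed above is essentially metadata captured at extraction time. The sketch below attaches such a record to an entity dict; `with_provenance` is a hypothetical helper, and the field names are simply copied from the example output rather than taken from ProvenanceTracker's API.

```python
from datetime import datetime, timezone

def with_provenance(entity, source_document, extraction_method, now=None):
    """Return a copy of entity with a provenance record attached."""
    now = now or datetime.now(timezone.utc)
    return {
        **entity,
        "provenance": {
            "source_document": source_document,
            "extraction_method": extraction_method,
            # ISO 8601 timestamp in UTC, matching the format shown above.
            "extracted_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "confidence": entity["confidence"],
        },
    }

entity = {"text": "Apple Inc.", "type": "ORGANIZATION", "confidence": 0.98}
tracked = with_provenance(
    entity,
    source_document="data/report.pdf",
    extraction_method="NERExtractor:pattern",
    now=datetime(2026, 5, 22, 10, 30, tzinfo=timezone.utc),  # fixed for the demo
)
print(tracked["provenance"])
```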

Troubleshooting

If parsing a PDF returns little or no text, the document likely contains scanned images rather than machine-readable text. Enable OCR:
from semantica.parse import DocumentParser

parser = DocumentParser(ocr=True)  # enables Tesseract OCR
parsed = parser.parse(sources[0])
If processing large batches is slow, enable parallel processing and GPU acceleration:
pip install "semantica[gpu]"
from semantica.pipeline import Pipeline

pipeline = Pipeline(workers=8, batch_size=32)
pipeline.run(sources)
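The `workers` and `batch_size` settings follow a common pattern: split the inputs into batches and hand each batch to a pool of workers. This standard-library sketch shows that pattern with threads; `chunked`, `process_batch`, and `run_parallel` are hypothetical, and the placeholder work is not what Pipeline actually does.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def process_batch(batch):
    # Placeholder work: a real pipeline would parse and extract here.
    return [source.upper() for source in batch]

def run_parallel(sources, workers=8, batch_size=32):
    """Process batches concurrently and return results in input order."""
    batches = list(chunked(sources, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map() yields results in the order the batches were submitted.
        results = pool.map(process_batch, batches)
        return [item for batch in results for item in batch]

sources = [f"doc{i}.pdf" for i in range(5)]
processed = run_parallel(sources, workers=2, batch_size=2)
print(processed)
```

Threads suit I/O-bound steps such as reading files; CPU-bound extraction in Python usually benefits more from processes or GPU batching.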
If the graph no longer fits in memory, switch from in-memory NetworkX to a persistent backend:
from semantica.graph_store import FalkorDBStore
from semantica.kg import GraphBuilder

store   = FalkorDBStore(host="localhost", port=6379)
builder = GraphBuilder(merge_entities=True, graph_store=store)
Fixed in v0.5.0. Upgrade:
pip install --upgrade semantica

Next Steps

Core Concepts

Knowledge graphs, ontologies, reasoning engines: the mental model behind Semantica.

Module Reference

Every module explained with key classes and common chains.

API Reference

Complete documentation for every module, class, and parameter.

Cookbook

40+ interactive Jupyter notebooks with real-world datasets.