Quickstart - Semantica

v0.5.0 adds the Ontology Hub, Distance Intelligence, Parquet and XML ingestion, and 12 security fixes.

This guide walks you through the end-to-end pipeline for building your first knowledge graph. Start here after installation. An LLM API key is optional: pattern-based extraction works out of the box.

Install

pip install semantica

Verify:

python -c "import semantica; print(semantica.__version__)"
# 0.5.0

Full Pipeline

Ingest

Load a document from a file, directory, URL, or database.

from semantica.ingest import FileIngestor

ingestor = FileIngestor()
sources  = ingestor.ingest("data/report.pdf")
# Also accepts: .docx, .html, .json, .csv, .xlsx, .pptx, .parquet, .xml
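The snippet above ingests a single file. If you have a folder of mixed documents, a small helper like the one below can collect the file types listed above so that you can pass each path to the ingestor. It uses only `pathlib`, is not part of Semantica, and `SUPPORTED` and `collect_sources` are hypothetical names.

```python
from pathlib import Path

# Extensions listed in the comment above (hypothetical helper, not a Semantica API)
SUPPORTED = {".pdf", ".docx", ".html", ".json", ".csv", ".xlsx", ".pptx", ".parquet", ".xml"}


def collect_sources(root: str) -> list[Path]:
    """Return supported files under `root`, searched recursively, in sorted order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )


# Usage sketch, assuming ingest() returns a list as in the snippet above:
# sources = [s for path in collect_sources("data") for s in ingestor.ingest(str(path))]
```

Matching on `suffix.lower()` means `REPORT.PDF` is picked up as well as `report.pdf`.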

Parse

Extract structured text and layout from raw documents.

from semantica.parse import DocumentParser

parser = DocumentParser()
parsed = parser.parse(sources[0])

print(parsed.text[:200])  # extracted text
print(parsed.metadata)    # title, author, date, source

For PDFs with tables, charts, or multi-column layouts, use DoclingParser: it applies advanced layout analysis and returns structured table data alongside text.

from semantica.parse import DoclingParser

parser = DoclingParser()
parsed = parser.parse(sources[0])
print(parsed.tables)  # structured table objects

Extract Entities & Relationships

Identify named entities and extract typed relationships between them.

from semantica.semantic_extract import NERExtractor, RelationExtractor

ner      = NERExtractor(method="pattern")
entities = ner.extract(parsed)
# Returns: [{"text": "Apple Inc.", "type": "ORGANIZATION", "confidence": 0.98}, ...]

rel           = RelationExtractor(method="rule")
relationships = rel.extract(parsed, entities=entities)
# Returns: [{"subject": "Steve Jobs", "predicate": "founded", "object": "Apple Inc."}, ...]
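Both extractors return plain dictionaries, so you can post-process their output before building the graph. The sketch below is an illustration, not a Semantica feature. It drops entities below a confidence threshold (the 0.8 default is an arbitrary assumption), keeps entities that have no `confidence` key, and then discards any relationship whose subject or object no longer refers to a kept entity. It assumes the dictionary keys shown in the sample output above, and the sample values are made up.

```python
def filter_extractions(entities, relationships, min_confidence=0.8):
    """Keep confident entities and only the relationships that connect them.

    Assumes the dict shapes shown above: entities carry "text" (and usually
    "confidence"); relationships carry "subject" and "object".
    """
    kept = [e for e in entities if e.get("confidence", 1.0) >= min_confidence]
    names = {e["text"] for e in kept}
    linked = [
        r for r in relationships
        if r["subject"] in names and r["object"] in names
    ]
    return kept, linked


# Made-up sample values in the shape of the extractor output
sample_entities = [
    {"text": "Apple Inc.", "type": "ORGANIZATION", "confidence": 0.98},
    {"text": "Steve Jobs", "type": "PERSON", "confidence": 0.95},
    {"text": "Cupertino", "type": "LOCATION", "confidence": 0.42},
]
sample_relationships = [
    {"subject": "Steve Jobs", "predicate": "founded", "object": "Apple Inc."},
    {"subject": "Apple Inc.", "predicate": "located_in", "object": "Cupertino"},
]

entities_ok, relationships_ok = filter_extractions(sample_entities, sample_relationships)
# entities_ok keeps Apple Inc. and Steve Jobs; relationships_ok keeps only "founded"
```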

Build the Knowledge Graph

Assemble extracted entities and relationships into a queryable knowledge graph.

from semantica.kg import GraphBuilder

builder = GraphBuilder(merge_entities=True)
graph   = builder.build(entities=entities, relationships=relationships)

print(f"Graph: {len(graph.nodes)} nodes, {len(graph.edges)} edges")

# Query the graph
apple    = graph.get_node("Apple Inc.")
founders = graph.get_neighbors("Apple Inc.", predicate="founded")  # matches the "founded" relation extracted above

merge_entities=True automatically resolves duplicate references to the same entity (for example “Apple”, “Apple Inc.”, and “AAPL”) using semantic similarity, so no manual deduplication is needed.
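To make entity merging concrete, here is a small stand-in. It is not Semantica's resolver: it replaces semantic embeddings with an explicit alias table plus character-level similarity from `difflib`, and the `ALIASES` table and the 0.85 threshold are assumptions. A ticker such as “AAPL” shares few characters with “Apple Inc.”, which is why pure string similarity needs help from aliases or embeddings.

```python
from difflib import SequenceMatcher

# Hypothetical alias table: lower-cased surface form -> canonical name
ALIASES = {"aapl": "Apple Inc.", "apple": "Apple Inc."}


def canonicalize(names, threshold=0.85):
    """Map each surface form to a canonical name.

    1. Exact alias lookup (case-insensitive).
    2. Otherwise, reuse an existing canonical name whose difflib
       similarity ratio is >= threshold.
    3. Otherwise, the name becomes a new canonical entry.
    """
    canonical = []  # canonical names seen so far, in first-seen order
    mapping = {}
    for name in names:
        key = name.strip().lower()
        if key in ALIASES:
            target = ALIASES[key]
        else:
            target = next(
                (c for c in canonical
                 if SequenceMatcher(None, key, c.lower()).ratio() >= threshold),
                name,
            )
        if target not in canonical:
            canonical.append(target)
        mapping[name] = target
    return mapping


print(canonicalize(["Apple", "Apple Inc.", "AAPL", "Apple Inc", "Steve Jobs"]))
# all four Apple variants map to "Apple Inc."; "Steve Jobs" stays separate
```

Embedding-based resolution generalizes the second step: it compares vectors instead of characters, so it can also match variants that share no spelling.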

Visualize

Render an interactive, zoomable knowledge graph in the browser.

from semantica.visualization import GraphVisualizer

viz = GraphVisualizer(
    layout="force",        # "force" | "hierarchical" | "circular"
    node_color_by="type",  # color nodes by entity type
    show_confidence=True,
)
viz.visualize(graph, output="graph.html")

Open graph.html in any browser to pan, zoom, click nodes for details, and filter by entity type.

Export

Export the graph to standard RDF serializations (Turtle, JSON-LD, N-Triples) for downstream tools.

from semantica.export import RDFExporter

exporter = RDFExporter()
exporter.export_to_rdf(graph, format="turtle",  output="graph.ttl")
exporter.export_to_rdf(graph, format="json-ld", output="graph.jsonld")
exporter.export_to_rdf(graph, format="nt",      output="graph.nt")
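N-Triples (the `nt` format) is the simplest of the three: one `subject predicate object .` statement per line, with IRIs in angle brackets. The sketch below writes relationship dictionaries as N-Triples using only the standard library, so you can see roughly what a file like `graph.nt` contains. The `http://example.org/` namespace and the IRI-minting rule are assumptions for illustration; Semantica's exporter may mint different IRIs.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for minted IRIs


def iri(name: str) -> str:
    """Mint an IRI from a label: spaces become underscores, other unsafe characters are percent-encoded."""
    return f"<{BASE}{quote(name.replace(' ', '_'), safe='')}>"


def to_ntriples(relationships) -> str:
    """Serialize subject/predicate/object dicts as N-Triples, one statement per line."""
    lines = [
        f"{iri(r['subject'])} {iri(r['predicate'])} {iri(r['object'])} ."
        for r in relationships
    ]
    return "\n".join(lines) + "\n"


print(to_ntriples([
    {"subject": "Steve Jobs", "predicate": "founded", "object": "Apple Inc."},
]), end="")
# <http://example.org/Steve_Jobs> <http://example.org/founded> <http://example.org/Apple_Inc.> .
```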

Add Decision Intelligence

Track every agent decision, with full causal chains and provenance, by adding an AgentContext to the pipeline:

from semantica.context import AgentContext, ContextGraph
from semantica.vector_store import VectorStore

context = AgentContext(
    vector_store=VectorStore(backend="faiss", dimension=768),
    knowledge_graph=ContextGraph(advanced_analytics=True),
    decision_tracking=True,
)

# Store a fact with provenance
context.store("GPT-4 outperforms GPT-3.5 on reasoning benchmarks by 40%")

# Record a decision
decision_id = context.record_decision(
    category="model_selection",
    scenario="Choose LLM for production reasoning pipeline",
    reasoning="GPT-4 benchmark advantage justifies 3x cost increase",
    outcome="selected_gpt4",
    confidence=0.91,
)

# Retrieve similar past decisions to keep future choices consistent
precedents = context.find_precedents("model selection reasoning", limit=5)
# Analyze the influence of the decision recorded above
influence  = context.analyze_decision_influence(decision_id)
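Conceptually, precedent lookup is a nearest-neighbour search over past decisions. The sketch below shows the idea with bag-of-words vectors and cosine similarity in plain Python. It is a stand-in, not Semantica's implementation (the configuration above backs the context with a FAISS vector store of 768-dimensional embeddings). `Decision` and this `find_precedents` function are hypothetical, and the second sample decision is made up.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Decision:
    category: str
    scenario: str
    reasoning: str
    outcome: str


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (a crude stand-in for a dense embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_precedents(query: str, decisions: list[Decision], limit: int = 5) -> list[tuple[float, Decision]]:
    """Rank past decisions by similarity of category + scenario + reasoning to the query."""
    q = vectorize(query)
    scored = [
        (cosine(q, vectorize(f"{d.category} {d.scenario} {d.reasoning}")), d)
        for d in decisions
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


history = [
    Decision("model_selection", "Choose LLM for production reasoning pipeline",
             "GPT-4 benchmark advantage justifies 3x cost increase", "selected_gpt4"),
    Decision("infra", "Pick a vector index for semantic search",  # made-up sample
             "FAISS fits the latency budget", "selected_faiss"),
]
for score, d in find_precedents("model selection reasoning", history, limit=1):
    print(f"{score:.2f} {d.outcome}")
```

Dense embeddings replace `vectorize` in practice, because they also match paraphrases that share no words with the query.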

Common Patterns

Process raw text directly (no file needed)

from semantica.semantic_extract import NERExtractor, RelationExtractor

text = "Apple Inc. was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in 1976 in Cupertino, California."

ner           = NERExtractor()
entities      = ner.extract(text)

rel           = RelationExtractor()
relationships = rel.extract(text, entities=entities)

Multi-source incremental graph build

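from semantica.parse import DocumentParser
from semantica.semantic_extract import NERExtractor, RelationExtractor
from semantica.kg import GraphBuilder

parser  = DocumentParser()
ner     = NERExtractor(method="pattern")
rel     = RelationExtractor(method="rule")
builder = GraphBuilder(merge_entities=True)

# Parse every ingested source (see Ingest above)
parsed_docs = [parser.parse(src) for src in sources]

all_entities, all_rels = [], []

for doc in parsed_docs:
    entities = ner.extract(doc)
    rels     = rel.extract(doc, entities=entities)
    all_entities.extend(entities)
    all_rels.extend(rels)

# Build once; merge_entities=True deduplicates entities across documents
graph = builder.build(entities=all_entities, relationships=all_rels)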
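Because the loop above collects results from every document before building, a fact extracted from several documents appears several times in `all_rels`. If you want to keep the accumulated list compact as you go, a sketch like the following records each distinct (subject, predicate, object) triple once and counts how many documents supported it. It is plain Python, `TripleAccumulator` is a hypothetical name, it only deduplicates exact matches, and it drops any extra keys the extractor may add; fuzzy entity matching is still left to `merge_entities=True`.

```python
from collections import Counter


class TripleAccumulator:
    """Accumulate relationships across documents, keeping each distinct triple once."""

    def __init__(self):
        self.support = Counter()  # (subject, predicate, object) -> number of documents
        self.order = []           # distinct triples in first-seen order

    def add_document(self, relationships):
        # A set, so a triple repeated inside one document counts once for that document
        seen_here = {(r["subject"], r["predicate"], r["object"]) for r in relationships}
        for triple in sorted(seen_here):
            if triple not in self.support:
                self.order.append(triple)
            self.support[triple] += 1

    def relationships(self):
        """Return deduplicated dicts with subject/predicate/object keys."""
        return [{"subject": s, "predicate": p, "object": o} for s, p, o in self.order]


acc = TripleAccumulator()
acc.add_document([{"subject": "Steve Jobs", "predicate": "founded", "object": "Apple Inc."}])
acc.add_document([
    {"subject": "Steve Jobs", "predicate": "founded", "object": "Apple Inc."},
    {"subject": "Apple Inc.", "predicate": "located_in", "object": "Cupertino"},
])
print(len(acc.relationships()))  # 2
```

The support count is a simple signal of how many documents back a fact, which can be useful when deciding which edges to trust.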

Temporal knowledge graph with point-in-time queries

from semantica.kg import TemporalKnowledgeGraph
from datetime import datetime

tkg = TemporalKnowledgeGraph()

tkg.add_node("Tim Cook",   role="CEO", valid_from=datetime(2011, 8, 24))
tkg.add_node("Steve Jobs", role="CEO", valid_from=datetime(1997, 9, 16),
             valid_until=datetime(2011, 8, 24))

# What did the graph look like on Jan 1, 2005?
snapshot = tkg.at(datetime(2005, 1, 1))
print(snapshot.get_node("Steve Jobs"))  # role: CEO
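A point-in-time query comes down to checking each fact's validity window. The sketch below treats `valid_from` as inclusive and `valid_until` as exclusive (a half-open interval) and treats a missing `valid_until` as still valid. These are assumptions, chosen so that exactly one CEO is current on 24 August 2011; they do not describe how `TemporalKnowledgeGraph` is implemented. `Fact` and `snapshot` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    node: str
    role: str
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None means still valid


def is_valid_at(fact: Fact, t: datetime) -> bool:
    """Half-open check: valid_from <= t < valid_until."""
    return fact.valid_from <= t and (fact.valid_until is None or t < fact.valid_until)


def snapshot(facts: list[Fact], t: datetime) -> list[str]:
    """Names of the nodes whose facts hold at time t."""
    return [f.node for f in facts if is_valid_at(f, t)]


facts = [
    Fact("Tim Cook", "CEO", datetime(2011, 8, 24)),
    Fact("Steve Jobs", "CEO", datetime(1997, 9, 16), datetime(2011, 8, 24)),
]
print(snapshot(facts, datetime(2005, 1, 1)))   # ['Steve Jobs']
print(snapshot(facts, datetime(2011, 8, 24)))  # ['Tim Cook']
```

With closed intervals on both ends, both people would be returned for 24 August 2011. The half-open convention avoids that overlap at the handover instant.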

Persistent graph store: Neo4j, FalkorDB, Apache AGE

import os

from semantica.graph_store import Neo4jStore
from semantica.kg import GraphBuilder

store = Neo4jStore(
    uri="bolt://localhost:7687",
    user="neo4j",
    password=os.environ["NEO4J_PASSWORD"],  # example variable name; avoids hard-coded credentials
)

builder = GraphBuilder(merge_entities=True, graph_store=store)
graph   = builder.build(entities=entities, relationships=relationships)
# The graph is persisted to Neo4j and survives process restarts

Full provenance pipeline: W3C PROV-O

from semantica.provenance import ProvenanceTracker
from semantica.kg import GraphBuilder

tracker = ProvenanceTracker()
builder = GraphBuilder(merge_entities=True, provenance=tracker)
graph   = builder.build(entities=entities, relationships=relationships)

# Every node and edge has full lineage
node = graph.get_node("Apple Inc.")
print(node.provenance)
# {
#   "source_document": "data/report.pdf",
#   "extraction_method": "NERExtractor:pattern",
#   "extracted_at": "2026-05-22T10:30:00Z",
#   "confidence": 0.98
# }
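PROV-O expresses the same lineage as RDF. The sketch below maps a provenance dictionary shaped like the one above onto core PROV-O terms (`prov:wasDerivedFrom`, `prov:wasGeneratedBy`, `prov:used`, `prov:generatedAtTime`) and emits Turtle. The `http://example.org/` namespace, the IRI scheme, and modelling confidence and method as `ex:confidence` and `ex:method` are assumptions (PROV-O has no confidence property). This is not how Semantica's tracker serializes provenance.

```python
import json
from urllib.parse import quote

EX = "http://example.org/"  # hypothetical namespace for minted IRIs


def mint(path: str) -> str:
    """Mint an absolute IRI under EX, percent-encoding unsafe characters."""
    return f"<{EX}{quote(path.replace(' ', '_'), safe='/')}>"


def prov_turtle(node: str, prov: dict) -> str:
    """Render one node's provenance dict (shape shown above) as PROV-O Turtle."""
    entity = mint(node)
    activity = mint(f"extraction/{node}")  # one extraction activity per node
    source = f"<{prov['source_document']}>"  # relative IRI, resolved against the Turtle base
    return "\n".join([
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"{entity} a prov:Entity ;",
        f"    prov:wasDerivedFrom {source} ;",
        f"    prov:wasGeneratedBy {activity} ;",
        f"    prov:generatedAtTime {json.dumps(prov['extracted_at'])}^^xsd:dateTime ;",
        f"    ex:confidence {prov['confidence']} .",
        "",
        f"{activity} a prov:Activity ;",
        f"    prov:used {source} ;",
        f"    ex:method {json.dumps(prov['extraction_method'])} .",
    ])


# Sample values; with Semantica you might pass node.provenance if it is a plain dict
print(prov_turtle("Apple Inc.", {
    "source_document": "data/report.pdf",
    "extraction_method": "NERExtractor:pattern",
    "extracted_at": "2026-05-22T10:30:00Z",
    "confidence": 0.98,
}))
```

Full IRIs are used for nodes because a Turtle prefixed name cannot end with a period, which would break labels such as “Apple Inc.”.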

Troubleshooting

No entities extracted

A common cause is a scanned document: its pages contain images rather than machine-readable text. Enable OCR:

from semantica.parse import DocumentParser

parser = DocumentParser(ocr=True)  # enables Tesseract OCR
parsed = parser.parse(sources[0])

Slow processing on large corpora

Enable parallel processing and GPU acceleration:

pip install "semantica[gpu]"

from semantica.pipeline import Pipeline

pipeline = Pipeline(workers=8, batch_size=32)
pipeline.run(sources)
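The `workers` and `batch_size` parameters point to the usual pattern of splitting the source list into fixed-size batches and processing batches concurrently. The sketch below shows that pattern with `concurrent.futures`. `process_batch` is a hypothetical placeholder for per-document work, and nothing here describes `Pipeline`'s internals.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def process_batch(batch):
    # Hypothetical stand-in for parse + extract on each document in the batch
    return [len(str(doc)) for doc in batch]


def run_parallel(sources, workers=8, batch_size=32):
    """Process batches concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_batch, batched(sources, batch_size))
        return [r for batch_result in results for r in batch_result]
```

Threads suit I/O-bound work such as calls to a remote LLM API. For CPU-bound pure-Python work, the GIL limits thread parallelism, and `ProcessPoolExecutor` is the usual drop-in alternative.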

Memory errors on large graphs

Switch from in-memory NetworkX to a persistent backend:

from semantica.graph_store import FalkorDBStore
from semantica.kg import GraphBuilder

store   = FalkorDBStore(host="localhost", port=6379)  # FalkorDB listens on the Redis default port
builder = GraphBuilder(merge_entities=True, graph_store=store)

NER falls back to pattern mode on an enterprise gateway

Fixed in v0.5.0. Upgrade:

pip install --upgrade semantica
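To confirm that an environment already has the fix, compare the installed version with 0.5.0. The sketch below uses `importlib.metadata` from the standard library and a simple numeric comparison that assumes plain `X.Y.Z` release strings. For pre-releases or other PEP 440 forms, `packaging.version.Version` is the more robust choice. `is_at_least` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple[int, ...]:
    """'0.5' -> (0, 5, 0); assumes a plain numeric X.Y[.Z] release string."""
    parts = [int(p) for p in v.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def is_at_least(installed: str, required: str = "0.5.0") -> bool:
    return parse_release(installed) >= parse_release(required)


try:
    installed = version("semantica")
    print("semantica", installed, "ok" if is_at_least(installed) else "upgrade needed")
except PackageNotFoundError:
    print("semantica is not installed")
```

Comparing integer tuples avoids the string-comparison trap in which "0.10.0" sorts before "0.5.0".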

Next Steps

Core Concepts

Knowledge graphs, ontologies, reasoning engines: the mental model behind Semantica.

Module Reference

Every module explained with key classes and common chains.

API Reference

Complete documentation for every module, class, and parameter.

Cookbook

40+ interactive Jupyter notebooks with real-world datasets.
