Use Ctrl+F / Cmd+F to search this page. Common jumps: Installation · Data & Features · Troubleshooting

Quick Answers

Question | Answer
License? | MIT: free forever, no paywalled features
Python version? | 3.8+ (3.11+ recommended)
API key required? | Optional: pattern extraction works with no keys
Works with LangChain / LlamaIndex? | Yes: Semantica is a layer on top, not a replacement
Production-ready? | Yes: 1,000+ tests, v0.5.0 ships with 12 security fixes
Latest version? | v0.5.0 (May 2026)
Local LLMs? | Yes: Ollama via LiteLLM, HuggingFaceLLM for air-gapped

General

Semantica is an open-source framework for building context graphs and decision intelligence layers for AI. It transforms unstructured data (documents, APIs, databases) into structured knowledge graphs with full provenance tracking, making AI systems explainable and auditable.
It’s not a replacement for LangChain or LlamaIndex. It’s the accountability layer that goes on top: recording decisions, tracing facts to sources, and making reasoning transparent.
Semantica can be used to build:
  • Knowledge graphs from documents and multi-source data
  • GraphRAG systems with graph-grounded retrieval and source attribution
  • AI agents with structured decision history and semantic memory
  • Compliance-ready pipelines with W3C PROV-O lineage (HIPAA, SOX, GDPR, FDA 21 CFR Part 11)
  • Temporal graphs that track how facts change over time
  • Ontology-driven knowledge bases with SHACL validation
Most frameworks stop at retrieval or generation. Semantica adds an accountability layer: every decision is recorded, every fact links to a source, and every reasoning step is explainable. It’s designed for environments where you need to audit why an AI reached a conclusion, not just what it said. Semantica works alongside frameworks such as LangChain and LlamaIndex, not against them.
Semantica is free: MIT licensed, no vendor lock-in, no paywalled features. Some capabilities require third-party API keys (e.g., OpenAI embeddings, Groq inference), but Semantica itself is always free and open source.
The latest version is v0.5.0, released May 2026. Highlights: Ontology Hub, Distance Intelligence, Parquet/XML ingestion, 12 security fixes, Graph Explorer redesign, NER gateway fix. To upgrade:
pip install --upgrade semantica

Installation

pip install semantica
See Installation for virtual environment setup, optional extras ([gpu], [all], provider-specific), and platform-specific troubleshooting.
Python 3.8 or higher. Python 3.11+ is recommended for best performance and compatibility.
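To see which tier the active interpreter falls into, a short script along these lines can help. It is a sketch that relies only on the version numbers stated above; `check_python` is a hypothetical helper, not part of Semantica.

```python
import sys

MINIMUM = (3, 8)       # minimum version stated in this FAQ
RECOMMENDED = (3, 11)  # recommended version stated in this FAQ


def check_python(version=None):
    """Classify a (major, minor, ...) version tuple against the FAQ's requirements."""
    major_minor = tuple(version or sys.version_info)[:2]
    if major_minor < MINIMUM:
        return "unsupported"
    if major_minor < RECOMMENDED:
        return "supported"
    return "recommended"


# Report the status of the interpreter running this script.
print(sys.version.split()[0], "->", check_python())
```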
This was a known bug, fixed in v0.5.0. Upgrade:
pip install --upgrade semantica
If you’re on an older version, install extras individually: pip install "semantica[core]", then add [llm-openai], [gpu], etc.
Requirement | Minimum | Recommended
Python | 3.8 | 3.11+
RAM | 4 GB | 16 GB+
Storage | 2 GB | 20 GB+
GPU | Optional | CUDA for embeddings and ML models

Data & Features

Category | Sources
Files | PDF, DOCX, HTML, JSON, CSV, Excel, PPTX, Parquet (v0.5.0), XML (v0.5.0), archives
Web | WebIngestor crawl, RSS feeds, sitemaps
Databases | PostgreSQL, MySQL, Snowflake via DBIngestor / SnowflakeIngestor
NoSQL | MongoDB via MongoIngestor, DuckDB via DuckDBIngestor
Streams | Kafka, real-time ingestion via StreamIngestor
Protocols | MCP (Model Context Protocol) via MCPIngestor
Cloud | Google Drive via GDriveIngestor, HuggingFace datasets
Semantica supports custom components:
  • Custom NER and extraction models: register via method_registry
  • Custom embedding models: any model with a .encode() interface
  • Custom LLM providers: via LiteLLM (100+ models) or direct provider integration
  • Custom pipeline processors: register via PluginRegistry
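As an illustration of the `.encode()` interface mentioned in the list above, here is a toy embedder. It is only a sketch: the class name `HashingEmbedder` is hypothetical, and the assumed signature (a list of strings in, a list of float vectors out) should be checked against what your Semantica version expects.

```python
import hashlib
import math
from typing import List


class HashingEmbedder:
    """Toy embedding model: hashes each token into one of `dim` buckets."""

    def __init__(self, dim: int = 16):
        self.dim = dim

    def encode(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # A stable hash keeps embeddings reproducible across runs.
                digest = hashlib.sha256(token.encode("utf-8")).digest()
                index = int.from_bytes(digest[:4], "big") % self.dim
                vec[index] += 1.0
            # L2-normalise so cosine similarity reduces to a dot product.
            norm = math.sqrt(sum(v * v for v in vec))
            vectors.append([v / norm for v in vec] if norm else vec)
        return vectors


embedder = HashingEmbedder(dim=8)
print(len(embedder.encode(["knowledge graph", "graph retrieval"])))
```

A real model would replace the hashing step with learned embeddings; the point is only the shape of the interface.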
GPU acceleration is supported. When available, GPUs are used automatically for embedding generation, ML model inference, and vector operations. Install GPU support:
pip install "semantica[gpu]"
This includes PyTorch with CUDA, FAISS GPU, and CuPy.
Semantica scales to large datasets through:
  • Batching: process documents in configurable chunks to control memory usage
  • Parallel processing: Pipeline(workers=N) runs extraction steps concurrently
  • Delta processing: update graphs incrementally without full recompute on new data
  • Persistent backends: swap in-memory NetworkX for Neo4j, FalkorDB, or Apache AGE for large-scale production graphs
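The batching idea from the list above can be sketched in plain Python. This is not Semantica's implementation; `batched` is a hypothetical helper that shows how fixed-size chunks keep only one batch in memory at a time.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


documents = [f"doc-{i}.pdf" for i in range(5)]
for chunk in batched(documents, batch_size=2):
    print(chunk)
```

A smaller `batch_size` lowers peak memory at the cost of more round trips through the pipeline.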
TemporalKnowledgeGraph attaches valid_from / valid_until windows to nodes and edges, enabling point-in-time queries and historical analysis. Supports all 13 Allen interval algebra relations and OWL-Time export.
from semantica.kg import TemporalKnowledgeGraph

tkg = TemporalKnowledgeGraph()
tkg.add_temporal_triple("A", "caused", "B", valid_from="2024-01", valid_until="2024-06")
snapshot = tkg.query_at_time("2024-03")
Available since v0.4.0.
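To make the idea of validity windows concrete, here is a library-independent sketch of a point-in-time query. It does not use Semantica's API: `TemporalTriple` and this `query_at_time` function are hypothetical stand-ins, and the half-open window (`valid_from` inclusive, `valid_until` exclusive) is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TemporalTriple:
    subject: str
    predicate: str
    obj: str
    valid_from: str                    # ISO-style string, e.g. "2024-01"
    valid_until: Optional[str] = None  # None means "still valid"


def query_at_time(triples: List[TemporalTriple], when: str) -> List[TemporalTriple]:
    """Return triples whose window contains `when`.

    ISO-style date strings of equal granularity sort lexicographically,
    so plain string comparison is enough for this sketch.
    """
    return [
        t for t in triples
        if t.valid_from <= when and (t.valid_until is None or when < t.valid_until)
    ]


facts = [
    TemporalTriple("A", "caused", "B", valid_from="2024-01", valid_until="2024-06"),
    TemporalTriple("B", "caused", "C", valid_from="2024-05"),
]
for triple in query_at_time(facts, "2024-03"):
    print(triple)
```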
A visual browser UI for the full ontology lifecycle, launched via semantica.explorer. It includes:
  • Visual editor: create and edit classes, properties, and relationships
  • SHACL Studio: author, validate, and export SHACL shapes
  • Alignment authoring: map concepts across ontologies
  • Health dashboard: coverage, consistency, and constraint violation metrics
  • Version control: diff and history for ontology changes
Available since v0.5.0.
Distance Intelligence provides semantic neighborhood exploration for any entity in the graph. It returns structured proximity data with distance band classification.
  • N×N distance matrices across a set of entities
  • Ego-mode visualization centered on a single node
  • Distance bands: near / mid / far based on embedding thresholds
  • Embedding cache optimization for repeated queries
Available since v0.5.0.
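The N×N matrix and distance bands can be illustrated with a few lines of plain Python. This is a sketch, not the library's code: cosine distance is assumed as the metric, and the `near` / `far` thresholds (0.3 and 0.7) are made-up values for illustration only.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise distances between every pair of embeddings."""
    return [[cosine_distance(a, b) for b in vectors] for a in vectors]


def band(distance: float, near: float = 0.3, far: float = 0.7) -> str:
    """Classify a distance into near / mid / far (thresholds are illustrative)."""
    if distance <= near:
        return "near"
    if distance <= far:
        return "mid"
    return "far"


embeddings = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
for row in distance_matrix(embeddings):
    print([band(d) for d in row])
```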
The NER gateway issue was fixed in v0.5.0. The response_format=json_object parameter is now conditionally omitted for incompatible gateways, with a plain generate() plus JSON parsing fallback applied automatically. Upgrade to fix:
pip install --upgrade semantica
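The fallback described above follows a common pattern, sketched below under stated assumptions. `generate_json` and the `generate` callable are hypothetical, the OpenAI-style `{"type": "json_object"}` value is assumed, and Semantica's actual implementation may differ.

```python
import json
from typing import Any, Callable


def generate_json(generate: Callable[..., str], prompt: str,
                  supports_json_mode: bool) -> Any:
    """Request JSON mode only where supported; otherwise parse plain text."""
    kwargs = {}
    if supports_json_mode:
        # Only send the parameter to gateways known to accept it.
        kwargs["response_format"] = {"type": "json_object"}
    raw = generate(prompt, **kwargs)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fallback: extract the outermost {...} span from free-form text.
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in model output")
        return json.loads(raw[start:end + 1])


def plain_gateway(prompt: str, **kwargs: Any) -> str:
    """Stand-in for a gateway that ignores JSON mode and wraps JSON in prose."""
    return 'Here you go: {"entities": ["Semantica"]}'


print(generate_json(plain_gateway, "Extract entities", supports_json_mode=False))
```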

Technical

Supported graph backends:
  • Neo4j: industry standard, Cypher query language
  • FalkorDB: Redis-protocol, ultra-low latency
  • Apache AGE: PostgreSQL extension, OpenCypher
  • Amazon Neptune: managed AWS, SPARQL and Gremlin
  • NetworkX: in-memory, for development and small graphs
Export formats include RDF (Turtle, JSON-LD, N-Triples, XML), Apache Parquet, ArangoDB AQL, Apache Arrow, LPG, CSV, YAML, OWL ontologies, and distance matrices.
Supported vector stores are FAISS, Pinecone, Weaviate, Qdrant, Milvus, PgVector, and in-memory. All backends share the same VectorStore API, so switching between them is a one-line change.
Supported LLM providers are Groq, OpenAI, Anthropic, Google Gemini, Ollama (fully local), DeepSeek, Novita AI, LiteLLM (100+ models via a single interface), and any OpenAI-compatible gateway.
Semantica is production-ready. v0.5.0 ships with:
  • 1,000+ passing tests across Python 3.8–3.12
  • PipelineValidator and FailureHandler with exponential backoff and configurable retry policies
  • W3C PROV-O provenance tracking across all modules
  • Change management with SHA-256 checksums and full audit trails
  • 12 security vulnerability fixes: eval injection, pickle deserialization, SQL injection, XXE, SSRF, ReDoS, path traversal, and more
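For readers unfamiliar with the retry pattern named in the list above, here is a generic sketch of exponential backoff. It is not FailureHandler's code: `retry_with_backoff` and its parameters are hypothetical, chosen only to show how the delay grows between attempts.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    factor: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func` until it succeeds or `max_attempts` is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The delay grows geometrically: base, base*factor, base*factor^2, ...
            sleep(base_delay * factor ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Passing a custom `sleep` makes the policy easy to test without waiting, and production code often adds random jitter to the delay so that many clients do not retry in lockstep.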

Troubleshooting

Ensure the correct Python environment is active:
pip list | grep semantica
pip install --upgrade semantica
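Where grep is unavailable (for example in the Windows command prompt), the same check can be run from Python. This sketch uses the standard-library `importlib.metadata` module (Python 3.8+); `installed_version` is a hypothetical helper name.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Show which interpreter is active, then whether it can see the package.
print("interpreter:", sys.executable)
print("semantica:", installed_version("semantica") or "not installed in this environment")
```

If the interpreter path is not the one from your virtual environment, activate the environment and run the check again.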
If installation fails, upgrade pip and wheel, then retry:
pip install --upgrade pip wheel
pip install semantica
If [all] fails on Windows, install extras individually instead.
Reduce batch sizes, enable streaming ingestion, or switch to a persistent graph backend:
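from semantica.graph_store import FalkorDBStore
from semantica.kg import GraphBuilder  # import path assumed; adjust if GraphBuilder lives elsewhere in your version

# Persist the graph in FalkorDB instead of holding it in memory
store = FalkorDBStore(host="localhost", port=6379)
builder = GraphBuilder(merge_entities=True, graph_store=store)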
Install GPU support and confirm CUDA is available:
pip install "semantica[gpu]"
nvidia-smi  # confirm GPU is visible
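`nvidia-smi` only shows that the driver can see the GPU. To confirm that PyTorch can also use it, a check along these lines may help. It assumes the [gpu] extra installs PyTorch, as stated elsewhere in this FAQ; `cuda_available` is a hypothetical helper.

```python
def cuda_available() -> bool:
    """Return True only if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch  # provided by the GPU extra, per this FAQ
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


print("CUDA available:", cuda_available())
```

If this prints False while `nvidia-smi` lists a GPU, the installed PyTorch build is likely CPU-only or built for a different CUDA version.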
Fixed in v0.5.0. Upgrade, or set the encoding environment variable for older versions:
pip install --upgrade semantica
# or for older versions (Windows cmd syntax; on macOS/Linux use: export PYTHONIOENCODING=utf-8):
set PYTHONIOENCODING=utf-8

Support

Discord

Community chat and live support.

GitHub Issues

Bug reports and feature requests.

Contributing

Help improve Semantica.