Whether you’re running your first pipeline or deploying Semantica in production, this page gives you a structured path forward: from beginner to enterprise-grade usage.

Learning Paths

Beginner (1–2 hrs)

New to Semantica and knowledge graphs. Start with Installation →

Intermediate (4–6 hrs)

Comfortable with basics, building real applications. Start with Modules →

Advanced (8+ hrs)

Enterprise deployments, customization, and extension. Start with Architecture →
New to Semantica and knowledge graphs. No prior graph database experience required.
1. Set up your environment

Installation Guide: virtual environments, optional extras, platform-specific fixes.

2. Understand the core ideas

Core Concepts: what knowledge graphs are, how embeddings work, what extraction does.

3. Run your first example

Getting Started: 5-minute code walkthrough with pattern-based extraction (no API key needed).

4. Build your first knowledge graph

Quickstart Tutorial: full 6-step pipeline from ingestion to visualization.

5. Explore interactively

Welcome to Semantica notebook: Jupyter walkthrough of every module.

Configuration Reference

All settings can be overridden with environment variables: no code changes needed.
| Setting | Environment Variable | Default |
| --- | --- | --- |
| OpenAI API Key | OPENAI_API_KEY | None |
| Groq API Key | GROQ_API_KEY | None |
| Anthropic API Key | ANTHROPIC_API_KEY | None |
| Embedding Provider | SEMANTICA_EMBEDDING_PROVIDER | "openai" |
| Graph Backend | SEMANTICA_GRAPH_BACKEND | "networkx" |
| Log Level | SEMANTICA_LOG_LEVEL | "INFO" |
| Log Format | SEMANTICA_LOG_FORMAT | "text" |
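To make the override behaviour concrete, here is a minimal sketch of reading the variables listed above with their documented defaults. The Settings class and load_settings function are hypothetical names used for illustration; they are not Semantica's own configuration loader, and the "falkordb" value is only an assumed example of a non-default backend name.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container; not Semantica's real config class."""
    openai_api_key: Optional[str]
    groq_api_key: Optional[str]
    anthropic_api_key: Optional[str]
    embedding_provider: str
    graph_backend: str
    log_level: str
    log_format: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read each variable, falling back to the default documented above."""
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY"),
        groq_api_key=env.get("GROQ_API_KEY"),
        anthropic_api_key=env.get("ANTHROPIC_API_KEY"),
        embedding_provider=env.get("SEMANTICA_EMBEDDING_PROVIDER", "openai"),
        graph_backend=env.get("SEMANTICA_GRAPH_BACKEND", "networkx"),
        log_level=env.get("SEMANTICA_LOG_LEVEL", "INFO"),
        log_format=env.get("SEMANTICA_LOG_FORMAT", "text"),
    )


# Passing a dict instead of os.environ keeps the example reproducible.
# "falkordb" is an assumed value, shown only to illustrate an override.
settings = load_settings({"SEMANTICA_GRAPH_BACKEND": "falkordb"})
print(settings.graph_backend)  # falkordb (overridden)
print(settings.log_level)      # INFO (documented default)
```

Accepting the environment as a parameter makes the lookup easy to exercise in tests without touching the real process environment.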

Troubleshooting

Verify that Semantica is installed and that the correct Python environment is active:
pip list | grep semantica
pip install --upgrade semantica
For optional features, install the relevant extra:
pip install "semantica[llm-openai]"   # OpenAI provider
pip install "semantica[gpu]"          # GPU acceleration
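The same check can be run from inside Python, which helps when pip and the interpreter point at different environments. This is a small sketch using only the standard library; installed_version is a hypothetical helper name, not part of Semantica.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# sys.executable shows which interpreter (and so which environment) is running.
print("Interpreter:", sys.executable)

found = installed_version("semantica")
if found is None:
    print("semantica is not installed in this environment")
else:
    print("semantica", found)
```

If the interpreter path is not the one inside your virtual environment, activate the environment and rerun the check before reinstalling anything.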
Set your API key as an environment variable — never hardcode keys in source files:
export OPENAI_API_KEY="sk-..."
export GROQ_API_KEY="gsk_..."
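Before starting a long run, it can help to confirm that the keys you exported are actually visible to the Python process. The sketch below is one way to do that; missing_keys is a hypothetical helper, and which variables to check depends on the providers you use.

```python
import os
from typing import Iterable, List, Mapping


def missing_keys(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or contain only whitespace."""
    return [name for name in names if not env.get(name, "").strip()]


# Check only the providers you actually use.
missing = missing_keys(["OPENAI_API_KEY"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required keys are set")
```

The helper reports names only and never prints key values, so its output is safe to include in logs.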
Switch from the default in-memory NetworkX backend to a persistent graph database:
from semantica.graph_store import FalkorDBStore
from semantica.kg import GraphBuilder

store   = FalkorDBStore(host="localhost", port=6379)
builder = GraphBuilder(merge_entities=True, graph_store=store)
Also reduce batch sizes and enable streaming ingestion for large corpora.
Enable parallel execution and GPU acceleration:
from semantica.pipeline import Pipeline

pipeline = Pipeline(workers=8, batch_size=32)
pipeline.run(sources)
pip install "semantica[gpu]"  # CUDA-backed embeddings
Fixed in v0.5.0. Upgrade:
pip install --upgrade semantica
Or install extras individually: pip install "semantica[core]", then add [llm-openai], [gpu], etc. as needed.
Fixed in v0.5.0. For earlier versions, set the encoding environment variable:
set PYTHONIOENCODING=utf-8

Performance Optimization

| Operation | NetworkX (default) | Neo4j / FalkorDB |
| --- | --- | --- |
| Graph construction | Fast | Moderate |
| Query performance | Moderate | Fast |
| Scalability | In-memory only | Persistent, production-scale |
| Recommended for | Development, small graphs | Production, large corpora |
Use NetworkX for local development and prototyping. Switch to a persistent backend before deploying to production.
Process documents in batches rather than one at a time. Configure chunk_size based on available RAM: a good starting point is 1,000 documents per batch on a 16 GB machine.
from semantica.pipeline import Pipeline

pipeline = Pipeline(workers=8, batch_size=32)
pipeline.run(sources)
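If you would rather control batching yourself, for example to feed a very large corpus in slices, a generic batching generator is enough. This is a sketch in plain Python; batched is a hypothetical helper, and passing each batch to pipeline.run is an assumption about how you would wire it in, not documented Semantica behaviour.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items without loading everything at once."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Ten documents in batches of four: the last batch holds the remainder.
print([len(batch) for batch in batched(range(10), 4)])  # [4, 4, 2]

# Assumed usage (not verified against the Pipeline API):
# for batch in batched(sources, 1000):
#     pipeline.run(batch)
```

Because the generator pulls items lazily, the source can itself be a generator that reads documents from disk one at a time.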
If deduplication is a bottleneck, switch from v1 strategies to the v2 engine:
resolver = EntityResolver()
merged   = resolver.resolve(entities, strategy="semantic_v2")  # up to 7x faster
The blocking_v2, hybrid_v2, and semantic_v2 strategies reduce O(n²) comparisons via candidate blocking before similarity scoring.
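The idea behind candidate blocking can be illustrated without Semantica at all. The sketch below groups names by a cheap key and only pairs up names that share a key; the key function (first three alphanumeric characters) is a toy assumption chosen for illustration and is not how the v2 strategies compute their blocks.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def blocking_key(name: str) -> str:
    """Toy key: the first three alphanumeric characters, lowercased."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum())
    return cleaned[:3]


def candidate_pairs(names: Iterable[str]) -> Set[Tuple[str, str]]:
    """Return only the pairs whose members fall into the same block."""
    blocks: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        blocks[blocking_key(name)].append(name)

    pairs: Set[Tuple[str, str]] = set()
    for members in blocks.values():
        # Comparisons are quadratic only within each (small) block.
        for a, b in combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    return pairs


names = ["Apple Inc.", "Apple Incorporated", "Microsoft", "Microsoft Corp", "Amazon"]
pairs = candidate_pairs(names)
all_pairs = len(names) * (len(names) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")  # 2 instead of 10
```

Only the surviving pairs would then go to the more expensive similarity scoring step. The trade-off is recall: a key that is too strict can separate true duplicates into different blocks.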

Security Best Practices

  • API keys: store in environment variables or a secrets manager; never commit them to version control; rotate on a schedule
  • Sensitive data: use local embedding models (Ollama, HuggingFace) for PII or classified content; avoid sending sensitive data to external APIs without data handling agreements
  • Graph exports: encrypt sensitive exports at rest; use the v0.5.0 SSRF-safe base_url validation when configuring custom LLM gateways
  • XML ingestion: always use XMLIngestor (v0.5.0), which uses the XXE-safe lxml backend; never parse untrusted XML with the standard library parser
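To show what a base URL check guards against, here is a rough standard-library sketch that rejects non-HTTPS URLs and hosts that are literal loopback, private, or link-local addresses. It is not Semantica's v0.5.0 validation; is_safe_base_url is a hypothetical function, and the policy it encodes is an assumption for illustration.

```python
import ipaddress
from urllib.parse import urlparse


def is_safe_base_url(url: str) -> bool:
    """Accept https URLs whose host is not localhost or a non-public literal IP."""
    try:
        parsed = urlparse(url)
        host = parsed.hostname
    except ValueError:
        return False  # malformed URL, e.g. an unterminated IPv6 bracket
    if parsed.scheme != "https" or not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Host is a DNS name. A stricter check would also resolve it
        # and apply the same tests to every resolved address.
        return True
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


for candidate in [
    "https://api.example.com/v1",
    "http://api.example.com/v1",
    "https://127.0.0.1:8080",
    "https://169.254.169.254/latest",
]:
    print(candidate, is_safe_base_url(candidate))
```

A deployment that intentionally talks to a local model server would need an explicit allow-list instead of this blanket rule, since the sketch rejects localhost outright.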

Cookbook

Interactive Jupyter notebooks from beginner to advanced.

FAQ

Common questions answered.

API Reference

Complete technical documentation.

Use Cases

Domain-specific examples with notebooks.