Learning Paths
Beginner (1–2 hrs)
New to Semantica and knowledge graphs.
Start with Installation →
Intermediate (4–6 hrs)
Comfortable with basics, building real applications.
Start with Modules →
Advanced (8+ hrs)
Enterprise deployments, customization, and extension.
Start with Architecture →
Beginner path (1–2 hrs)
New to Semantica and knowledge graphs. No prior graph database experience required.
Set up your environment
Installation Guide: virtual environments, optional extras, platform-specific fixes.
Understand the core ideas
Core Concepts: what knowledge graphs are, how embeddings work, what extraction does.
Run your first example
Getting Started: 5-minute code walkthrough with pattern-based extraction (no API key needed).
Build your first knowledge graph
Quickstart Tutorial: full 6-step pipeline from ingestion to visualization.
Explore interactively
Welcome to Semantica notebook: Jupyter walkthrough of every module.
Configuration Reference
All settings can be overridden with environment variables; no code changes are needed.

| Setting | Environment Variable | Default |
|---|---|---|
| OpenAI API Key | OPENAI_API_KEY | None |
| Groq API Key | GROQ_API_KEY | None |
| Anthropic API Key | ANTHROPIC_API_KEY | None |
| Embedding Provider | SEMANTICA_EMBEDDING_PROVIDER | "openai" |
| Graph Backend | SEMANTICA_GRAPH_BACKEND | "networkx" |
| Log Level | SEMANTICA_LOG_LEVEL | "INFO" |
| Log Format | SEMANTICA_LOG_FORMAT | "text" |
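
The override behavior described by the table can be sketched in plain Python. This is an illustration only: the variable names and defaults are copied from the table, but `resolve_settings` is a hypothetical helper and not part of Semantica, whose own loading logic may differ.

```python
import os

# Names and defaults copied from the table above.
# None means there is no default: the value must come from the environment.
DEFAULTS = {
    "OPENAI_API_KEY": None,
    "GROQ_API_KEY": None,
    "ANTHROPIC_API_KEY": None,
    "SEMANTICA_EMBEDDING_PROVIDER": "openai",
    "SEMANTICA_GRAPH_BACKEND": "networkx",
    "SEMANTICA_LOG_LEVEL": "INFO",
    "SEMANTICA_LOG_FORMAT": "text",
}


def resolve_settings(env=None):
    """Hypothetical helper: prefer the environment value, else the default."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


# Simulate an environment in which only the log level is overridden.
settings = resolve_settings({"SEMANTICA_LOG_LEVEL": "DEBUG"})
print(settings["SEMANTICA_LOG_LEVEL"])      # DEBUG (overridden)
print(settings["SEMANTICA_GRAPH_BACKEND"])  # networkx (default)
```

Passing a dictionary instead of `os.environ` keeps the sketch testable without touching the real environment.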
Troubleshooting
ModuleNotFoundError: No module named 'semantica'
Verify that Semantica is installed and that the correct Python environment is active. For optional features, install the relevant extra, such as [llm-openai] or [gpu].
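
One way to check both conditions from inside Python is sketched below, using only the standard library. `is_installed` is a hypothetical helper name; the check reports which interpreter is running and whether it can locate the `semantica` package.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if the running interpreter can locate module_name."""
    return importlib.util.find_spec(module_name) is not None


# Which interpreter is running, and is it inside a virtual environment?
print("Interpreter:", sys.executable)
print("In a virtual environment:", sys.prefix != sys.base_prefix)
print("semantica importable:", is_installed("semantica"))
```

If the interpreter path is not the one from your virtual environment, the package was probably installed into a different environment than the one running your code.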
AuthenticationError
Set your API key as an environment variable; never hardcode keys in source files.
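
A minimal sketch of reading a key from the environment and failing early when it is missing. `require_api_key` is a hypothetical helper, not a Semantica function; the variable name `OPENAI_API_KEY` comes from the configuration table.

```python
import os


def require_api_key(name, env=None):
    """Hypothetical helper: return the key stored under `name`, or fail clearly."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it in your shell or load it from a secrets manager"
        )
    return key


# Demonstration with a stand-in environment, so no real key appears in code.
demo_env = {"OPENAI_API_KEY": "sk-placeholder"}
print(len(require_api_key("OPENAI_API_KEY", demo_env)) > 0)
```

Failing at startup with a clear message is usually easier to debug than an AuthenticationError raised deep inside a pipeline run.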
MemoryError or OOM crashes
Switch from the default in-memory NetworkX backend to a persistent graph database such as Neo4j or FalkorDB (the Graph Backend setting, SEMANTICA_GRAPH_BACKEND). Also reduce batch sizes and enable streaming ingestion for large corpora.
Slow processing on large datasets
Enable parallel execution and GPU acceleration.
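
The general idea of parallel execution can be sketched with the standard library. This is not Semantica's own parallelism setting: `process_document` is a hypothetical placeholder for a per-document step, and a thread pool is assumed to be appropriate because steps such as LLM or embedding API calls are I/O-bound. CPU-bound work would need processes instead, and GPU acceleration is not covered here.

```python
from concurrent.futures import ThreadPoolExecutor


def process_document(text):
    """Placeholder per-document step (hypothetical, not a Semantica function)."""
    return len(text.split())


def process_all(documents, max_workers=4):
    """Run process_document over all documents; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_document, documents))


print(process_all(["a b c", "d e"]))  # [3, 2]
```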
Windows [all] installation fails
Fixed in v0.5.0. Upgrade to v0.5.0 or later, or install extras individually:
pip install "semantica[core]", then add [llm-openai], [gpu], etc. as needed.
cp1252 encoding crash on Windows
Fixed in v0.5.0. For earlier versions, set the encoding environment variable:
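
Independently of any environment variable, the failure behind this kind of crash can be reproduced and avoided in code. The sketch below assumes the crash comes from text containing characters that cp1252 cannot represent. It shows the error and then the usual code-level fix of passing an explicit encoding to `open`.

```python
import os
import tempfile

text = "Start with Installation \u2192"  # U+2192 (arrow) does not exist in cp1252

# Reproduce the failure: cp1252 has no byte for the arrow character.
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 cannot encode:", ascii(exc.object[exc.start:exc.end]))

# An explicit encoding makes file I/O independent of the platform default.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as fh:
    fh.write(text)
with open(path, encoding="utf-8") as fh:
    roundtrip = fh.read()

print(roundtrip == text)
```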
Performance Optimization
Backend selection: development vs. production
| Aspect | NetworkX (default) | Neo4j / FalkorDB |
|---|---|---|
| Graph construction | Fast | Moderate |
| Query performance | Moderate | Fast |
| Scalability | In-memory only | Persistent, production-scale |
| Recommended for | Development, small graphs | Production, large corpora |
Batch processing for large corpora
Process documents in batches rather than one at a time. Configure chunk_size based on available RAM: a good starting point is 1,000 documents per batch on a 16 GB machine.
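
Batching itself can be sketched without any Semantica API. The generator below groups a lazy stream of documents into lists of at most `chunk_size` items, so only one batch is held in memory at a time. `batched` is a hypothetical helper; how Semantica consumes `chunk_size` internally is not shown here.

```python
from itertools import islice


def batched(iterable, chunk_size):
    """Yield lists of up to chunk_size items without loading everything at once."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, chunk_size))
        if not batch:
            return
        yield batch


# A generator stands in for a large corpus that is read lazily.
documents = (f"doc-{i}" for i in range(2500))
sizes = [len(batch) for batch in batched(documents, 1000)]
print(sizes)  # [1000, 1000, 500]
```

If memory is still tight, halve `chunk_size` and re-measure rather than guessing a value.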
Deduplication v2: up to 7× faster
If deduplication is a bottleneck, switch from v1 strategies to the v2 engine. The blocking_v2, hybrid_v2, and semantic_v2 strategies avoid the O(n²) all-pairs comparison by grouping candidates into blocks first, so similarity scoring runs only on records that share a block.
Security Best Practices
- API keys: store in environment variables or a secrets manager; never commit them to version control; rotate on a schedule
- Sensitive data: use local embedding models (Ollama, HuggingFace) for PII or classified content; avoid sending sensitive data to external APIs without data handling agreements
- Graph exports: encrypt sensitive exports at rest; use the v0.5.0 SSRF-safe base_url validation when configuring custom LLM gateways
- XML ingestion: always use XMLIngestor (v0.5.0), which uses the XXE-safe lxml backend; never parse untrusted XML with the standard library parser
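
To make the SSRF point above concrete, here is a minimal sketch of the kind of check a base_url validator can perform. It is not Semantica's implementation: `is_safe_base_url` and its rules are assumptions for illustration, and a production check should also resolve hostnames and validate every resolved address.

```python
import ipaddress
from urllib.parse import urlsplit


def is_safe_base_url(url, allowed_schemes=("https",)):
    """Hypothetical check: reject non-HTTPS URLs and non-public IP targets."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL
    host = parts.hostname
    if parts.scheme not in allowed_schemes or not host:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP. DNS resolution is deliberately omitted here,
        # so only the obvious loopback name is rejected.
        return host.lower() != "localhost"
    # Literal IPs must be publicly routable (rejects loopback, private,
    # and link-local ranges such as cloud metadata endpoints).
    return address.is_global


print(is_safe_base_url("https://api.example.com/v1"))
print(is_safe_base_url("https://169.254.169.254/latest/meta-data"))
```

The second URL targets a link-local address of the kind used by cloud metadata services, which is a classic SSRF target.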
Cookbook
Interactive Jupyter notebooks from beginner to advanced.
FAQ
Common questions answered.
API Reference
Complete technical documentation.
Use Cases
Domain-specific examples with notebooks.
