semantica.llms provides a single consistent API across every major LLM provider:
  • Every provider is a drop-in replacement for the llm_provider= parameter in extractors, reasoners, and agents
  • LiteLLM routes to 100+ providers with a single class and model-string prefixes
  • HuggingFaceLLM runs fully on-premise: no API key, no network calls
  • Structured output via generate_with_schema() for JSON extraction from any provider
  • Streaming, tool use, and generate_batch() for bulk inference

Exported Classes

from semantica.llms import Groq, OpenAI, LiteLLM, HuggingFaceLLM
| Class | Provider | API Key Required |
|---|---|---|
| Groq | Groq Cloud | GROQ_API_KEY |
| OpenAI | OpenAI / any OpenAI-compatible gateway | OPENAI_API_KEY |
| LiteLLM | 100+ providers via LiteLLM routing | Depends on model |
| HuggingFaceLLM | Local HuggingFace Transformers | None (local) |
Anthropic, Gemini, Ollama, DeepSeek, Azure, Bedrock, Cohere, and 90+ others are all available via LiteLLM using their model-string prefix. See the LiteLLM section below.

What You Get

  • Unified LLMProvider interface: swap providers with a one-line change, no application code changes
  • LiteLLM: single class for 100+ providers using model-string routing
  • Local models: HuggingFaceLLM runs fully on-premise, no API key
  • Streaming: token-by-token output for low-latency UX
  • Custom gateways: point OpenAI at any OpenAI-compatible endpoint via base_url
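
The one-line swap works because application code only depends on the shared interface. A minimal sketch of that idea, assuming only the .generate(prompt) method used elsewhere on this page and assuming it returns text; SupportsGenerate, EchoLLM, and summarize are hypothetical names for illustration, not part of semantica.llms:

```python
from typing import Protocol


class SupportsGenerate(Protocol):
    """Hypothetical structural type: anything with a .generate(prompt) method."""

    def generate(self, prompt: str) -> str: ...


class EchoLLM:
    """Stand-in provider for local testing; not part of semantica.llms."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


def summarize(llm: SupportsGenerate, text: str) -> str:
    # Application code depends only on .generate(), so any provider fits here.
    return llm.generate(f"Summarize in one sentence: {text}")


print(summarize(EchoLLM(), "GraphRAG combines graphs and retrieval."))
```

Replacing EchoLLM() with a real provider instance would leave summarize unchanged, provided that provider's generate() accepts a prompt string in the same way.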

Choosing a Provider

Groq: free tier, very fast inference, and minimal setup. Best for development and high-throughput extraction pipelines.
Speed: Very fast (100+ tok/s)
Cost: Free tier available
Context: 128k
Best for: Development, high-throughput extraction
import os
from semantica.llms import Groq

llm = Groq(
    model="llama-3.1-8b-instant",
    api_key=os.getenv("GROQ_API_KEY"),
    temperature=0.0,
)
Get your free key at console.groq.com.

API Key Setup

# Add to your shell profile (.bashrc, .zshrc, etc.)
export GROQ_API_KEY="your_groq_api_key_here"
export OPENAI_API_KEY="your_openai_api_key_here" 
export ANTHROPIC_API_KEY="your_anthropic_api_key_here"

# Reload your shell
source ~/.bashrc

Configuration File Method

# config.yaml
llm_provider:
  name: groq
  model: llama-3.1-8b-instant
  temperature: 0.0
# Set GROQ_API_KEY environment variable and pass to constructor
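
The file above holds no secret, so the key has to be joined in when the provider is constructed. A minimal sketch of that step, assuming the YAML has already been parsed into a dict (for example with PyYAML's yaml.safe_load). build_provider_kwargs and the "<NAME>_API_KEY" naming rule are illustrative assumptions, not a documented semantica loader:

```python
import os
from typing import Any, Mapping


def build_provider_kwargs(config: Mapping[str, Any],
                          env: Mapping[str, str] = os.environ) -> dict:
    """Turn the llm_provider section into constructor keyword arguments."""
    section = dict(config["llm_provider"])
    name = section.pop("name")                 # e.g. "groq"
    key_var = f"{name.upper()}_API_KEY"        # assumed convention: GROQ_API_KEY
    if key_var not in env:
        raise KeyError(f"{key_var} is not set")
    return {**section, "api_key": env[key_var]}


# Same structure as config.yaml above, already parsed.
config = {"llm_provider": {"name": "groq",
                           "model": "llama-3.1-8b-instant",
                           "temperature": 0.0}}
kwargs = build_provider_kwargs(config, env={"GROQ_API_KEY": "test-key"})
print(sorted(kwargs))  # keyword names that would be passed to the constructor
```

The resulting dict could then be passed on as Groq(**kwargs), since model, temperature, and api_key are the constructor arguments used in the examples on this page.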

Programmatic Setup

import os
from semantica.llms import Groq, LiteLLM

# Method 1: Direct API key (quick local tests only; never commit this)
llm = Groq(api_key="your-api-key-here", model="llama-3.1-8b-instant")

# Method 2: Environment variable (preferred)
llm = Groq(api_key=os.getenv("GROQ_API_KEY"), model="llama-3.1-8b-instant")

# Method 3: Multiple providers via LiteLLM
providers = {
    "fast": LiteLLM(model="groq/llama-3.1-8b-instant", api_key=os.getenv("GROQ_API_KEY")),
    "smart": LiteLLM(model="anthropic/claude-sonnet-4-20250514", api_key=os.getenv("ANTHROPIC_API_KEY"))
}

Security Best Practices

Never commit API keys to version control. Use environment variables or secure secret management.
import os
from semantica.llms import Groq

# ❌ Bad - API key in code
llm = Groq(api_key="gsk_abc123...", model="llama-3.1-8b-instant")

# ✅ Good - Environment variable
llm = Groq(api_key=os.getenv("GROQ_API_KEY"), model="llama-3.1-8b-instant")
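
os.getenv returns None when a variable is unset, so a missing key may only surface later as an authentication error. A small sketch of a fail-fast helper that can be used in place of os.getenv; require_env is a hypothetical name, not part of semantica:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


os.environ["DEMO_API_KEY"] = "demo-value"   # placeholder for demonstration only
print(require_env("DEMO_API_KEY"))
```

With this helper, api_key=require_env("GROQ_API_KEY") stops at startup with a readable message instead of passing None to the provider.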

Providers

import os
from semantica.llms import Groq

llm = Groq(
    model="llama-3.3-70b-versatile",   # recommended; implementation default: llama-3.1-8b-instant
    api_key=os.getenv("GROQ_API_KEY"),
    max_tokens=64000,
    temperature=0.0,
)
# Best for: high-throughput extraction, fast inference at low cost

LiteLLM: 100+ Providers

LiteLLM is the recommended way to access any provider not directly exported by semantica.llms. Use the provider/model string format:
import os
from semantica.llms import LiteLLM

# Pattern: LiteLLM(model="<provider>/<model-name>")
providers = {
    "Anthropic":  LiteLLM(model="anthropic/claude-opus-4-5",       api_key=os.getenv("ANTHROPIC_API_KEY")),
    "Gemini":     LiteLLM(model="gemini/gemini-1.5-pro",            api_key=os.getenv("GOOGLE_API_KEY")),
    "Ollama":     LiteLLM(model="ollama/llama3.2:3b",               api_base="http://localhost:11434"),
    "DeepSeek":   LiteLLM(model="deepseek/deepseek-chat",           api_key=os.getenv("DEEPSEEK_API_KEY")),
    "Azure":      LiteLLM(model="azure/gpt-4o",                     api_key=os.getenv("AZURE_API_KEY")),
    "Bedrock":    LiteLLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"),
    "Cohere":     LiteLLM(model="cohere/command-r-plus",            api_key=os.getenv("COHERE_API_KEY")),
    "Novita AI":  LiteLLM(model="novita/deepseek/deepseek-v3.2",    api_key=os.getenv("NOVITA_API_KEY")),
}

# Every LiteLLM instance implements the same .generate() interface
response = providers["Anthropic"].generate("Explain GraphRAG in one paragraph.")
The full list of supported LiteLLM model strings is at docs.litellm.ai/docs/providers. Use the provider/model format shown above.
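
The provider/model format can be read as a provider prefix up to the first slash, with everything after it being the model name. That is why a string such as novita/deepseek/deepseek-v3.2 still names the novita provider. A sketch of that reading for illustration only; split_model_string is a hypothetical helper, not a LiteLLM or semantica function:

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split 'provider/model-name' at the first slash."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected '<provider>/<model-name>', got {model!r}")
    return provider, name


for s in ("anthropic/claude-opus-4-5", "ollama/llama3.2:3b",
          "novita/deepseek/deepseek-v3.2"):
    print(split_model_string(s))
```

A check like this can catch a missing prefix (for example a bare "gpt-4o") before the string reaches the provider.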

Custom / Enterprise Gateways

Point OpenAI at any OpenAI-compatible endpoint, such as an internal routing layer, a Qwen proxy, or a private LLaMA deployment:
import os
from semantica.llms import OpenAI

llm = OpenAI(
    model="qwen2.5-72b",
    api_key=os.getenv("GATEWAY_API_KEY"),
    base_url="https://my-internal-gateway.company.com/v1",
)
base_url is validated at construction time. Non-HTTP(S) schemes raise ValueError to prevent SSRF attacks (fixed in v0.5.0).
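
A minimal sketch of what such a scheme check can look like, using only the standard library. This illustrates the idea and is not semantica's actual validation code; validate_base_url is a hypothetical name:

```python
from urllib.parse import urlparse


def validate_base_url(base_url: str) -> str:
    """Accept only http(s) URLs that include a host; raise ValueError otherwise."""
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError("base_url must include a host")
    return base_url


print(validate_base_url("https://my-internal-gateway.company.com/v1"))

for bad in ("file:///etc/passwd", "ftp://example.com/v1"):
    try:
        validate_base_url(bad)
    except ValueError as exc:
        print("rejected:", exc)
```

Rejecting schemes such as file:// at construction time means a misconfigured or attacker-supplied URL fails before any request is made.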

Using in Extractors

All extractors accept any provider as llm_provider=:
import os
from semantica.semantic_extract import NERExtractor, RelationExtractor, TripletExtractor
from semantica.llms import Groq

llm = Groq(model="llama-3.3-70b-versatile", api_key=os.getenv("GROQ_API_KEY"))

ner  = NERExtractor(method="llm",      llm_provider=llm, max_retries=3)
rel  = RelationExtractor(method="llm", llm_provider=llm)
trip = TripletExtractor(method="llm",  llm_provider=llm)

Provider Comparison

| Provider | Import | Speed | Cost | Local | Context | Best For |
|---|---|---|---|---|---|---|
| Groq | Groq | Very fast | Low | No | 128k | High-throughput extraction |
| OpenAI | OpenAI | Fast | Medium | No | 128k | General purpose, function calling |
| Anthropic | LiteLLM(model="anthropic/...") | Fast | Medium | No | 200k | Complex reasoning, safety |
| Gemini | LiteLLM(model="gemini/...") | Fast | Low | No | 1M | Long context, multimodal |
| Ollama | LiteLLM(model="ollama/...") | Medium | Free | Yes | Varies | Privacy, air-gapped |
| DeepSeek | LiteLLM(model="deepseek/...") | Fast | Very low | No | 64k | Coding, analysis |
| Azure OpenAI | LiteLLM(model="azure/...") | Fast | Medium | No | 128k | Enterprise, compliance |
| AWS Bedrock | LiteLLM(model="bedrock/...") | Fast | Varies | No | Varies | AWS-native workloads |
| HuggingFace | HuggingFaceLLM | Slow | Free | Yes | Varies | Custom models, BYOM |
For production extraction pipelines, Groq delivers the best throughput-to-cost ratio. For complex multi-hop reasoning, Claude Opus or GPT-4o provide the highest accuracy.

Defaults and Reproducibility

Documentation examples may showcase stronger models for better developer experience, while implementation defaults prioritize reliability and cost efficiency. Understanding actual defaults helps with reproducible results and consistent benchmarking.

Verified Implementation Defaults:
| Provider | Default Model | Notes |
|---|---|---|
| Groq | llama-3.1-8b-instant | Implementation default; examples use llama-3.3-70b-versatile for showcase |
| OpenAI | gpt-3.5-turbo | Implementation default; examples use gpt-4o for showcase |
| HuggingFaceLLM | gpt2 | Lightweight, widely compatible |
These are the models used when you construct a provider without specifying model=. Examples throughout this documentation use stronger showcase models. Always pass model= explicitly in production for reproducible results.

Why This Matters:
  • Reproducible extraction results across environments
  • Consistent baseline performance for benchmarking
  • Predictable costs when scaling production workloads

Performance and Reliability Tips

Extraction with Retries

import os
from semantica.semantic_extract import NERExtractor
from semantica.llms import Groq

llm = Groq(model="llama-3.3-70b-versatile", api_key=os.getenv("GROQ_API_KEY"))
ner = NERExtractor(method="llm", llm_provider=llm, max_retries=3)

# Process multiple texts with automatic retries
texts = ["Document 1 text...", "Document 2 text...", "Document 3 text..."]
all_entities = []

for text in texts:
    entities = ner.extract(text)
    all_entities.extend(entities)
    
# Rate-limit and transient errors are retried automatically (max_retries=3 above)

Model Selection by Use Case

| Use Case | Recommended Provider/Model | Reasoning |
|---|---|---|
| Entity Extraction | Groq(model="llama-3.3-70b-versatile") | Fast, good accuracy for structured tasks |
| Relation Extraction | OpenAI(model="gpt-4o") | Best at complex relationship reasoning |
| Complex Analysis | LiteLLM(model="anthropic/claude-sonnet-4-20250514") | Highest reasoning capability |
| High Volume/Cost | LiteLLM(model="deepseek/deepseek-chat") | Lowest cost per token |
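
The table can be encoded as a small lookup so pipeline code asks for a use case rather than a model name. A sketch under assumptions: the dictionary keys and pick_model are hypothetical, and the values are just the class names and model strings from the table above:

```python
# (class name in semantica.llms, model string), copied from the table above
MODEL_BY_USE_CASE = {
    "entity_extraction":   ("Groq",    "llama-3.3-70b-versatile"),
    "relation_extraction": ("OpenAI",  "gpt-4o"),
    "complex_analysis":    ("LiteLLM", "anthropic/claude-sonnet-4-20250514"),
    "high_volume":         ("LiteLLM", "deepseek/deepseek-chat"),
}


def pick_model(use_case: str) -> tuple[str, str]:
    """Look up the recommended (class name, model) pair for a use case."""
    try:
        return MODEL_BY_USE_CASE[use_case]
    except KeyError:
        known = ", ".join(sorted(MODEL_BY_USE_CASE))
        raise ValueError(
            f"unknown use case {use_case!r}; expected one of: {known}"
        ) from None


print(pick_model("entity_extraction"))
```

Keeping the mapping in one place also makes the pinned model names easy to review when recommendations change.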

Error Handling

import os
from semantica.llms import Groq
from semantica.semantic_extract import NERExtractor

llm = Groq(
    model="llama-3.3-70b-versatile",
    api_key=os.getenv("GROQ_API_KEY")
)

# Automatic retries for rate limits and transient errors
extractor = NERExtractor(
    method="llm", 
    llm_provider=llm,
    max_retries=3      # Retry failed requests automatically
)
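
For calls made outside an extractor, a retry loop with exponential backoff is a common way to get similar behavior. The sketch below is a generic pattern, not how max_retries is implemented inside semantica; call_with_retries and the flaky stand-in function are hypothetical:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], max_retries: int = 3,
                      base_delay: float = 0.5) -> T:
    """Call fn, retrying up to max_retries times with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            # Real code would catch only rate-limit and transient error types.
            if attempt == max_retries:
                raise                      # out of retries: surface the error
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable for max_retries >= 0")


calls = {"n": 0}


def flaky() -> str:
    # Stand-in for an LLM call that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(call_with_retries(flaky, base_delay=0.01))
```

In practice fn would wrap a provider call, for example lambda: llm.generate(prompt), and the delay would be tuned to the provider's rate limits.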

Semantic Extract

Use LLMs for NER and relation extraction.

Agno Integration

LLM providers in Agno multi-agent teams.

Reasoning

LLM-backed deductive and abductive reasoning.

Context

GraphRAG uses LLMs for reasoning over knowledge graphs.