Semantica exposes a unified provider interface, centered on a shared .generate() method, across Groq, OpenAI, Anthropic Claude, HuggingFace, Novita AI, and 100+ providers via LiteLLM. Use it when you need to swap providers for latency, accuracy, cost, or data-residency reasons without touching application code.
The providers in semantica.llms (Groq, OpenAI, LiteLLM, HuggingFaceLLM) are for text generation and query_with_reasoning(). For structured entity and relation extraction, semantica.semantic_extract accepts provider names as strings. Both patterns are covered here.

Choosing a Provider

Four factors drive provider selection. Latency matters most in real-time SOC triage loops where an analyst is waiting on a verdict: Groq’s inference server typically returns 8B model responses in under 300ms. Accuracy matters most in high-stakes decisions; clinical contraindication checks, credit committee reasoning, and legal document analysis reward the frontier models available via LiteLLM. Data-residency constraints rule out cloud providers for classified or HIPAA-regulated workloads, and HuggingFaceLLM with a local model path covers those cases. Cost at scale favors high-throughput open-model providers like Novita AI for bulk extraction pipelines that process thousands of documents per hour.

Because Semantica’s interface is identical across providers, you can prototype with Groq for speed, validate accuracy with Claude, and deploy to Azure OpenAI for compliance. Only the provider constructor changes; the rest of your application code stays as it is.
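The four factors can be read as a rough decision order. The sketch below is one way to encode that order; the `Workload` dataclass and `suggest_provider` helper are hypothetical names for illustration and are not part of Semantica, and the priority (residency first, then accuracy, latency, cost) is an assumption rather than a rule from the library.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload; not a Semantica type."""
    air_gapped: bool = False        # no outbound network traffic allowed
    high_stakes: bool = False       # accuracy outweighs speed and cost
    latency_critical: bool = False  # the LLM sits in a real-time loop
    bulk: bool = False              # thousands of documents per hour


def suggest_provider(w: Workload) -> str:
    """Map a workload to the provider family this guide points to.

    Assumed priority: hard constraints (residency) first, then accuracy,
    then latency, then cost.
    """
    if w.air_gapped:
        return "HuggingFaceLLM (local model path)"
    if w.high_stakes:
        return "LiteLLM (frontier model)"
    if w.latency_critical:
        return "Groq"
    if w.bulk:
        return "Novita AI"
    return "Groq"  # assumption: fast prototyping default


print(suggest_provider(Workload(latency_critical=True)))
```

Treat the returned string as a starting point for discussion, not as a configuration value.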

The Shared Interface

Every provider exposes the same three methods:
provider.generate(prompt: str, **kwargs) -> str
provider.generate_structured(prompt: str, **kwargs) -> dict
provider.is_available() -> bool
generate() returns a plain string. generate_structured() instructs the model to respond in JSON and returns a parsed dict. is_available() lets you health-check the provider before committing to a call, which is useful in retry logic and warm-up checks. This means every place in Semantica that accepts an LLM object, such as query_with_reasoning() and custom reasoning loops, accepts any of these providers interchangeably. The semantic extraction layer is the exception: it takes provider names as strings and instantiates the provider itself.
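The three-method contract can be written down as a typing Protocol, which also makes it easy to build a fallback chain on top of is_available(). This is a minimal sketch: `LLMProvider`, `EchoProvider`, and `first_available` are hypothetical names introduced here, not Semantica classes, and the stand-in provider only echoes its input.

```python
from typing import Protocol, Sequence


class LLMProvider(Protocol):
    """Structural type mirroring the three methods listed above."""

    def generate(self, prompt: str, **kwargs) -> str: ...
    def generate_structured(self, prompt: str, **kwargs) -> dict: ...
    def is_available(self) -> bool: ...


class EchoProvider:
    """Stand-in provider for illustration only; it calls no model."""

    def __init__(self, name: str, up: bool = True) -> None:
        self.name = name
        self.up = up

    def generate(self, prompt: str, **kwargs) -> str:
        return f"[{self.name}] {prompt}"

    def generate_structured(self, prompt: str, **kwargs) -> dict:
        return {"provider": self.name, "prompt": prompt}

    def is_available(self) -> bool:
        return self.up


def first_available(providers: Sequence[LLMProvider]) -> LLMProvider:
    """Return the first provider whose health check passes."""
    for provider in providers:
        if provider.is_available():
            return provider
    raise RuntimeError("no provider available")


chosen = first_available([EchoProvider("primary", up=False), EchoProvider("backup")])
print(chosen.generate("ping"))
```

Because the Protocol is structural, any object with these three methods satisfies it without inheriting from a base class.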

Groq — Fast Inference for Real-Time Agents

Groq Cloud runs open models on purpose-built Language Processing Units that deliver sub-300ms latency for 8B parameter models. This makes Groq the right default for any agent loop where the LLM is in the hot path — SOC triage, real-time alert classification, conversational agents.
from semantica.llms import Groq

# api_key falls back to the GROQ_API_KEY environment variable
groq = Groq(model="llama-3.1-8b-instant", api_key="YOUR_GROQ_KEY")

# Always health-check before the first call in a long-running process
if not groq.is_available():
    raise RuntimeError("Groq provider unreachable — check GROQ_API_KEY")

# Plain generation
verdict = groq.generate(
    "Alert: 4200 LDAP objects enumerated in 8s from ws-finance-03. "
    "True positive or false positive? One sentence.",
    temperature=0.1,    # low temperature for deterministic triage verdicts
)
print(verdict)
# "True positive — volume and speed are consistent with T1087.002 domain enumeration."

# Structured extraction — returns a parsed dict
entities = groq.generate_structured(
    "Extract threat actors and CVEs from: "
    "APT29 exploited CVE-2024-3400 in PAN-OS GlobalProtect."
)
# {"threat_actors": ["APT29"], "cves": ["CVE-2024-3400"], "products": ["PAN-OS GlobalProtect"]}
Groq model selection comes down to the speed-vs-capability tradeoff: llama-3.1-8b-instant for anything in the hot path, llama-3.3-70b-versatile when you need stronger reasoning and can afford slightly higher latency, and mixtral-8x7b-32768 for long-context summarisation tasks. Groq retires model IDs from time to time, so confirm each one against its current model list before deploying.
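In a triage loop, the one-sentence verdict from generate() usually has to be turned into a label that downstream code can branch on. A minimal sketch, assuming the model starts its reply with "True positive" or "False positive" as in the example output above; `classify_verdict` is a hypothetical helper, not part of Semantica.

```python
import re

# Matches an optional run of punctuation/whitespace, then "true positive"
# or "false positive" at the start of the reply.
_VERDICT = re.compile(r"\W*(true|false)\s+positive\b", re.IGNORECASE)


def classify_verdict(text: str) -> str:
    """Return 'true_positive', 'false_positive', or 'unknown'."""
    match = _VERDICT.match(text)
    if match is None:
        return "unknown"
    return f"{match.group(1).lower()}_positive"


print(classify_verdict("True positive: consistent with domain enumeration."))
```

Routing "unknown" to a human or to a stronger model is safer than guessing, since a free-text reply is not guaranteed to follow the requested format.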

OpenAI — Function Calling and Vision

The OpenAI provider wraps the OpenAI API. Use it when you need GPT-4o’s function-calling precision, vision capabilities for document screenshots, or when your team already has an OpenAI contract and wants to stay there.
from semantica.llms import OpenAI

oai = OpenAI(model="gpt-4o", api_key="YOUR_OAI_KEY")
# api_key falls back to OPENAI_API_KEY environment variable

response = oai.generate(
    "Under CRR2 Article 92, what is the minimum total capital ratio "
    "for a G-SIB subject to a 2% GSIB buffer surcharge?"
)
print(response)

# Structured output — useful for deterministic data extraction
risk_data = oai.generate_structured(
    "Extract all counterparty names and exposure amounts from: "
    "Counterparty A: 45M EUR notional, Counterparty B: 12M EUR notional, "
    "Counterparty C: 89M EUR notional."
)
# {"counterparties": [{"name": "Counterparty A", "exposure_eur": 45000000}, ...]}
The default model gpt-3.5-turbo is fine for classification and light extraction. Switch to gpt-4o for complex multi-step regulatory reasoning or document understanding.
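generate_structured() returns a parsed dict, but the keys inside it are chosen by the model, so it is worth validating the shape before using the numbers. The sketch below assumes the key names from the example comment above ("counterparties", "exposure_eur"); `total_exposure_eur` is a hypothetical helper.

```python
def total_exposure_eur(data: dict) -> int:
    """Sum exposures, failing loudly if the dict is not shaped as expected."""
    rows = data.get("counterparties")
    if not isinstance(rows, list):
        raise ValueError("expected a 'counterparties' list")
    total = 0
    for row in rows:
        try:
            total += int(row["exposure_eur"])
        except (KeyError, TypeError, ValueError) as exc:
            raise ValueError(f"malformed counterparty row: {row!r}") from exc
    return total


# Sample shaped like the example output above (values from the prompt text)
sample = {
    "counterparties": [
        {"name": "Counterparty A", "exposure_eur": 45_000_000},
        {"name": "Counterparty B", "exposure_eur": 12_000_000},
        {"name": "Counterparty C", "exposure_eur": 89_000_000},
    ]
}
print(total_exposure_eur(sample))
```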

LiteLLM — One Interface, 100+ Providers

LiteLLM is the Swiss Army knife. It wraps the litellm library, which speaks to every major provider using a unified completion API. The model string encodes both provider and model name: "anthropic/claude-sonnet-4-20250514", "azure/gpt-4o", "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0", "ollama/llama3.2". Change the string, change the provider — no other code changes needed.
from semantica.llms import LiteLLM

# Anthropic Claude — highest accuracy for complex reasoning
llm = LiteLLM(model="anthropic/claude-sonnet-4-20250514")
# Reads ANTHROPIC_API_KEY from environment

# Azure OpenAI — compliance and data-residency requirements
llm = LiteLLM(model="azure/gpt-4o", api_key="YOUR_AZURE_KEY")

# AWS Bedrock — existing cloud agreement, no new vendor
llm = LiteLLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")

# Google Vertex AI
llm = LiteLLM(model="vertex_ai/gemini-1.5-pro")

# Ollama: talks to a local Ollama server, no outbound traffic
llm = LiteLLM(model="ollama/llama3.2")

# All of them: same call
response = llm.generate("Summarise the ICH E6(R2) GCP guideline key requirements.")
Each provider follows its own environment-variable convention (ANTHROPIC_API_KEY, AZURE_API_KEY, OPENAI_API_KEY, GOOGLE_APPLICATION_CREDENTIALS, and so on), and LiteLLM picks these up automatically. Some providers need more than a key; Azure deployments, for example, also expect an endpoint and an API version. For a provider switch driven by deployment environment, you can keep provider selection in a config dict and inject it at startup, with no if/else branches in application code:
import os

PROVIDER_MAP = {
    "prod":    "anthropic/claude-sonnet-4-20250514",
    "staging": "openai/gpt-4o-mini",
    "local":   "ollama/llama3.2",
    "azure":   "azure/gpt-4o",
}

env = os.getenv("DEPLOY_ENV", "local")
llm = LiteLLM(model=PROVIDER_MAP[env])
# The rest of the application never touches provider names
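One weakness of a bare dict lookup is that a mistyped DEPLOY_ENV surfaces as an unhelpful KeyError at startup. A small sketch of a safer lookup follows; `resolve_model` is a hypothetical helper, and the map is repeated so the block runs on its own.

```python
import os
from typing import Optional

PROVIDER_MAP = {
    "prod":    "anthropic/claude-sonnet-4-20250514",
    "staging": "openai/gpt-4o-mini",
    "local":   "ollama/llama3.2",
    "azure":   "azure/gpt-4o",
}


def resolve_model(env: Optional[str] = None) -> str:
    """Return the model string for a deployment environment.

    Falls back to DEPLOY_ENV, then to "local", and raises a readable
    error listing the valid names when the environment is unknown.
    """
    env = env or os.getenv("DEPLOY_ENV", "local")
    try:
        return PROVIDER_MAP[env]
    except KeyError:
        known = ", ".join(sorted(PROVIDER_MAP))
        raise ValueError(
            f"unknown deployment environment {env!r}; expected one of: {known}"
        ) from None


print(resolve_model("staging"))
```

The result can be passed straight to the constructor, as in LiteLLM(model=resolve_model()).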

HuggingFaceLLM — Air-Gapped and On-Premise

HuggingFaceLLM loads a model from the HuggingFace Hub or from a local directory path. No network calls are made during inference. That makes it the provider to use for classified environments, HIPAA-constrained clinical deployments, and any network segment without outbound internet access. In those settings, load from a local path, since pulling a model from the Hub itself requires network access.
from semantica.llms import HuggingFaceLLM

# HuggingFace Hub — authentication via HF_TOKEN environment variable
hf = HuggingFaceLLM(model="mistralai/Mistral-7B-Instruct-v0.3")

# Biomedical fine-tuned model for clinical entity extraction
bio_llm = HuggingFaceLLM(model="aaditya/Llama3-OpenBioLLM-70B")

# Air-gapped deployment — model on local NFS share or mounted volume
# Both `model` and `model_name` are accepted as constructor parameters
air_gapped_llm = HuggingFaceLLM(model="/opt/models/llama-3.1-70b-instruct")

response = air_gapped_llm.generate(
    "Summarise SIGINT collection window 2024-Q3 for APT29 C2 infrastructure.",
    max_length=512,
)
Set HF_TOKEN in your environment for Hub access to gated or private models. For local paths, no token is needed — the model directory must contain the standard HuggingFace checkpoint files.
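Before pointing HuggingFaceLLM at a local directory, a quick pre-flight check can catch an incomplete copy of the model. This is a rough heuristic, not an official validation routine: the exact file set varies by model, and `missing_checkpoint_files` is a hypothetical helper that only looks for a config.json, at least one weights file, and at least one tokenizer file.

```python
from pathlib import Path
from typing import List


def missing_checkpoint_files(model_dir: str) -> List[str]:
    """List the kinds of checkpoint files that appear to be missing."""
    root = Path(model_dir)
    missing = []
    if not (root / "config.json").is_file():
        missing.append("config.json")
    # Weights are commonly stored as *.safetensors or *.bin shards
    if not (any(root.glob("*.safetensors")) or any(root.glob("*.bin"))):
        missing.append("weights (*.safetensors or *.bin)")
    # tokenizer.json, tokenizer_config.json, tokenizer.model, ...
    if not any(root.glob("tokenizer*")):
        missing.append("tokenizer files (tokenizer*)")
    return missing


problems = missing_checkpoint_files("/opt/models/llama-3.1-70b-instruct")
if problems:
    print("Model directory looks incomplete:", ", ".join(problems))
```

An empty list means the directory passes this heuristic, not that the model is guaranteed to load.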

Swapping Providers Without Changing Application Code

The real payoff of the unified interface is in query_with_reasoning(). This is the call that drives graph-grounded reasoning in every AgentContext. Because it accepts any provider object, you can slot in a different LLM at any tier of your pipeline with zero changes to the surrounding code.
from semantica.context import AgentContext, ContextGraph
from semantica.vector_store import VectorStore
from semantica.llms import Groq, LiteLLM

vs    = VectorStore(backend="faiss", dimension=768)
graph = ContextGraph(advanced_analytics=True)
context = AgentContext(vector_store=vs, knowledge_graph=graph, graph_expansion=True)

# Load your knowledge base once
context.store(
    [
        {"content": "APT29 exploits CVE-2024-3400 in PAN-OS GlobalProtect — CVSS 10.0",
         "metadata": {"source": "nvd", "actor": "APT29"}},
        {"content": "NOBELIUM (APT29) leverages OAuth token theft against Azure AD tenants",
         "metadata": {"source": "msft_blog_2023", "actor": "APT29", "technique": "T1528"}},
    ],
    extract_entities=True,
    extract_relationships=True,
)

query = "What is APT29's current exploitation methodology and what cloud services are targeted?"

# Tier 1: fast answer with Groq (< 300ms)
fast_llm = Groq(model="llama-3.1-8b-instant", api_key="YOUR_GROQ_KEY")
fast_result = context.query_with_reasoning(query, llm_provider=fast_llm, max_results=5)
print("FAST: {}  (conf={:.0%})".format(fast_result["response"], fast_result["confidence"]))

# Tier 2: deep answer with Claude if confidence is below threshold
if fast_result["confidence"] < 0.85:
    deep_llm = LiteLLM(model="anthropic/claude-sonnet-4-20250514")
    deep_result = context.query_with_reasoning(
        query, llm_provider=deep_llm, max_results=15, max_hops=3
    )
    print("DEEP: {}  (conf={:.0%})".format(deep_result["response"], deep_result["confidence"]))
The context, the graph, the retrieval logic — none of it changes. Only the llm_provider argument differs between the two tiers.
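The two-tier pattern generalises to any number of tiers. The sketch below factors the escalation logic into a hypothetical `answer_with_escalation` helper that takes zero-argument callables, so it can be exercised without a live provider. It assumes each callable returns a dict with a "confidence" key, as query_with_reasoning() does in the example above.

```python
from typing import Callable, Sequence


def answer_with_escalation(
    tiers: Sequence[Callable[[], dict]],
    threshold: float = 0.85,
) -> dict:
    """Call each tier in order and stop at the first confident result.

    If no tier reaches the threshold, the last tier's result is returned.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    result: dict = {}
    for ask in tiers:
        result = ask()
        if result["confidence"] >= threshold:
            break
    return result


# Stand-in tiers for illustration. With Semantica, each would wrap a call
# such as:
#   lambda: context.query_with_reasoning(query, llm_provider=fast_llm, max_results=5)
fast = lambda: {"response": "partial answer", "confidence": 0.60}
deep = lambda: {"response": "full answer", "confidence": 0.92}

print(answer_with_escalation([fast, deep])["response"])
```

Ordering the tiers from cheapest to most capable keeps the expensive model out of the path for queries the fast model already answers confidently.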

Using Providers for Semantic Extraction

For NER, relation extraction, and triplet extraction, the semantica.semantic_extract module accepts provider names as strings rather than class instances. The module handles provider instantiation internally.
from semantica.semantic_extract import NamedEntityRecognizer, EventDetector
from semantica.semantic_extract.methods import (
    extract_entities_llm,
    extract_relations_llm,
    extract_triplets_llm,
)

# NER with Groq — provider name as a string
ner = NamedEntityRecognizer(
    methods=["llm"],
    provider="groq",
    llm_model="llama-3.1-8b-instant",
)
entities = ner.extract_entities(
    "APT29 exploited CVE-2024-3400 in PAN-OS GlobalProtect to compromise NATO member networks."
)
for e in entities:
    # Entity fields: .text, .label, .confidence
    print("{} ({}) — conf={:.2f}".format(e.text, e.label, e.confidence))
# APT29 (THREAT_ACTOR) — conf=0.96
# CVE-2024-3400 (CVE) — conf=0.99
# PAN-OS GlobalProtect (PRODUCT) — conf=0.94
# NATO (ORGANIZATION) — conf=0.91

# Event detection with the same provider pattern
detector = EventDetector(method="llm", provider="groq")
events = detector.detect_events(
    "ENISA published the Threat Landscape 2024 report on October 22nd, "
    "covering 11 primary threat categories including ransomware and supply-chain attacks."
)

# Triplet extraction — returns (subject, predicate, object) triples
text = "Warfarin inhibits VKORC1 enzyme activity, reducing vitamin K-dependent clotting factor synthesis."
triplets = extract_triplets_llm(text, provider="groq", model="llama-3.1-8b-instant")
for t in triplets:
    print("{} -> {} -> {}".format(t.subject, t.predicate, t.object))
# warfarin -> inhibits -> VKORC1 enzyme activity
# warfarin -> reduces -> vitamin K-dependent clotting factor synthesis
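Extracted triplets are often easier to work with once grouped by subject. A minimal sketch, assuming only the .subject, .predicate, and .object attributes used above; the `Triplet` namedtuple is a stand-in for whatever object extract_triplets_llm returns, and `index_by_subject` is a hypothetical helper.

```python
from collections import defaultdict, namedtuple

# Stand-in for the triplet objects returned by the extraction layer
Triplet = namedtuple("Triplet", "subject predicate object")


def index_by_subject(triplets):
    """Group (predicate, object) pairs under a case-folded subject."""
    index = defaultdict(list)
    for t in triplets:
        index[t.subject.lower()].append((t.predicate, t.object))
    return dict(index)


sample = [
    Triplet("warfarin", "inhibits", "VKORC1 enzyme activity"),
    Triplet("Warfarin", "reduces", "vitamin K-dependent clotting factor synthesis"),
]
for subject, facts in index_by_subject(sample).items():
    print(subject, facts)
```

Lower-casing the subject is a crude form of entity normalisation; a real pipeline would likely resolve aliases as well.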

Novita AI — Cost-Efficient Bulk Extraction

Novita AI exposes an OpenAI-compatible API and is available as a built-in provider for the extraction layer. Unlike the semantica.llms classes, it is created through create_provider from semantica.semantic_extract.providers. Its low per-call cost makes it a good fit for high-volume NER pipelines.
from semantica.semantic_extract.providers import create_provider
from semantica.semantic_extract import NamedEntityRecognizer

# create_provider pools instances — same key reuses the same object
provider = create_provider(
    "novita",
    api_key="YOUR_NOVITA_KEY",       # or set NOVITA_API_KEY env var
    model="deepseek/deepseek-v3.2",  # default model
)

if provider.is_available():
    # Plain generation
    response = provider.generate("Summarise the Basel III leverage ratio requirement.")

    # Structured extraction — returns parsed dict
    data = provider.generate_structured(
        "Extract drug names and dosages from: "
        "Patient received warfarin 5mg daily, aspirin 75mg daily, metformin 500mg twice daily."
    )

# Use Novita through the NER interface — provider name as string
ner = NamedEntityRecognizer(
    methods=["llm"],
    provider="novita",
    llm_model="deepseek/deepseek-v3.2",
)
entities = ner.extract_entities(
    "CVE-2024-3400 is exploited by UNC3886 targeting PAN-OS GlobalProtect."
)
for e in entities:
    print("{} ({}) — conf={:.2f}".format(e.text, e.label, e.confidence))
Novita requires the openai Python client under the hood — install with pip install "semantica[llm-openai]" or pip install openai.
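For bulk pipelines, most of the wall-clock time is spent waiting on API responses, so running extraction calls concurrently is a common approach. The sketch below uses a thread pool from the standard library. `extract_bulk` is a hypothetical helper, and whether a shared NamedEntityRecognizer instance is safe to call from several threads is an assumption you should verify before passing ner.extract_entities as the callable.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_bulk(extract, documents, max_workers=8):
    """Apply extract(doc) to every document concurrently.

    Results come back in input order. A failure on one document is
    recorded in its result instead of aborting the whole batch.
    """
    def safe(doc):
        try:
            return {"ok": True, "entities": extract(doc)}
        except Exception as exc:  # keep the batch going on per-document errors
            return {"ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe, documents))


# Stand-in extractor for illustration; it splits on whitespace.
def fake_extract(doc):
    if not doc:
        raise ValueError("empty document")
    return doc.split()


results = extract_bulk(fake_extract, ["APT29 CVE-2024-3400", "", "UNC3886"], max_workers=2)
print([r["ok"] for r in results])
```

Keep max_workers below the provider's rate limit, since a larger pool only converts throughput into throttling errors.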

Domain Examples

A classified threat-intelligence analysis unit needs to run entirely air-gapped — no outbound network traffic of any kind. The extraction model and the reasoning model both load from a local NFS share. The knowledge graph accumulates over the analysis session; all inference happens on-premise.
from semantica.llms import HuggingFaceLLM
from semantica.context import AgentContext, ContextGraph
from semantica.vector_store import VectorStore

# Both models load from the air-gapped NFS share — no Hub calls
extraction_llm = HuggingFaceLLM(model="/opt/models/mistral-7b-instruct")
reasoning_llm  = HuggingFaceLLM(model="/opt/models/llama-3.1-70b-instruct")

sigint_text = (
    "[S//NF] APT29 operator observed deploying WARPWIRE credential harvester "
    "via CVE-2024-3400 on perimeter VPN gateways of target BRAVO-7."
)

# For fully local models, call the provider's generate method directly
raw_entities = extraction_llm.generate(
    "Extract all threat actors, CVEs, malware names, and target identifiers "
    "from the following text as a JSON list:\n\n" + sigint_text
)

# Build the graph and run reasoning
vs    = VectorStore(backend="faiss", dimension=768)
graph = ContextGraph(advanced_analytics=True)
context = AgentContext(
    vector_store=vs,
    knowledge_graph=graph,
    graph_expansion=True,
    decision_tracking=True,
)

context.store(
    sigint_text,
    metadata={"source": "SIGINT_Q4_2024", "classification": "SECRET//NOFORN"},
    conversation_id="op-analysis-q4",
)

result = context.query_with_reasoning(
    "What credential-harvesting capabilities has APT29 deployed against "
    "perimeter VPN gateways and what CVEs enable initial access?",
    llm_provider=reasoning_llm,   # fully local — no network calls
    max_results=10,
    max_hops=3,
)
print(result["response"])
print("Confidence: {:.0%}".format(result["confidence"]))

context.save("./classified_output/q4_analysis/")
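Because generate() returns a plain string, the raw_entities value in the example above still has to be parsed, and models often wrap JSON in extra prose. A minimal, tolerant parser is sketched below; `parse_json_list` is a hypothetical helper, and it simply takes the span from the first "[" to the last "]", which will misfire if the reply contains other bracketed text.

```python
import json
import re


def parse_json_list(raw: str) -> list:
    """Pull a JSON array out of a model reply; return [] if none parses."""
    match = re.search(r"\[.*\]", raw, re.DOTALL)
    if match is None:
        return []
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    return parsed if isinstance(parsed, list) else []


reply = 'Here is the list: ["APT29", "CVE-2024-3400", "WARPWIRE", "BRAVO-7"] Done.'
print(parse_json_list(reply))
```

Returning an empty list on failure keeps the pipeline running; log the raw reply in that case so malformed outputs can be reviewed.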

Related Guides

  • Agent Memory: using query_with_reasoning() with any LLM provider for graph-grounded retrieval
  • Multi-Agent Systems: wiring different LLM providers to different agent tiers in a shared-graph pipeline
  • Semantic Extraction: LLM-powered NER, relation extraction, event detection, and triplet extraction
  • GraphRAG: multi-hop graph reasoning with query_with_reasoning()