When you ingest data from multiple sources, contradictions are inevitable. One annual report says Apple’s revenue was $391B; a financial newswire says $383B. Without conflict detection, both values land in your graph and queries silently return inconsistent answers. Semantica’s conflict detection makes disagreements explicit and actionable:
Value conflicts: SEC says revenue is $391B; Reuters says $383B
Type conflicts: “Python” is a ProgrammingLanguage in one source, a Snake species in another
Temporal conflicts: a CEO had two different employers during overlapping date ranges
Logical conflicts: an entity simultaneously holds two mutually exclusive properties
Relationship conflicts: the same relationship has inconsistent cardinality or properties across sources
```python
from semantica.conflicts import ConflictResolver, InvestigationGuideGenerator, ResolutionStrategy

# Assumes `tracker` (a SourceTracker), `conflicts` (detected Conflict objects) and
# `severity_details` (severity label -> list of conflict dicts) already exist.
resolver = ConflictResolver(source_tracker=tracker)

# Auto-resolve low-severity conflicts.
# severity_details contains dicts, so map their IDs back to the full Conflict objects.
low_ids = {d["conflict_id"] for d in severity_details.get("low", [])}
low_conflicts = [c for c in conflicts if c.conflict_id in low_ids]
auto_resolved = resolver.resolve_conflicts(
    low_conflicts,
    strategy=ResolutionStrategy.CREDIBILITY_WEIGHTED,
)

# Generate investigation guides for critical conflicts
critical_ids = {d["conflict_id"] for d in severity_details.get("critical", [])}
critical_conflicts = [c for c in conflicts if c.conflict_id in critical_ids]

generator = InvestigationGuideGenerator()
for conflict in critical_conflicts:
    guide = generator.generate_guide(conflict)
    print("\n%s" % guide.title)
    for step in guide.investigation_steps:
        print(" [%d] %s" % (step.step_number, step.description))
        print(" Action: %s" % step.action)
```
Detect before you merge, not after. Run conflict detection on raw entity data before deduplication and graph construction. Detecting conflicts in a live graph that already contains merged entities is harder: you lose the original source attribution.
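The reason is easiest to see on raw per-source records. The following is a minimal, library-independent sketch of value-conflict detection, not Semantica’s implementation: the record layout (`entity_id`, `property`, `value`, `source`) and the `find_value_conflicts` helper are hypothetical. Each conflicting value keeps the list of sources that asserted it, which is exactly the attribution a merged entity no longer carries.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical raw record: one assertion by one source about one entity property.
Record = Dict[str, str]


def find_value_conflicts(records: List[Record]) -> Dict[Tuple[str, str], Dict[str, List[str]]]:
    """Group raw records by (entity_id, property) and keep only groups with more
    than one distinct value. Each value maps to the sources that asserted it."""
    grouped: Dict[Tuple[str, str], Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for r in records:
        grouped[(r["entity_id"], r["property"])][r["value"]].append(r["source"])
    return {
        key: {value: sources for value, sources in values.items()}
        for key, values in grouped.items()
        if len(values) > 1
    }


records = [
    {"entity_id": "apple_inc", "property": "revenue", "value": "$391B", "source": "sec_10k"},
    {"entity_id": "apple_inc", "property": "revenue", "value": "$383B", "source": "newswire"},
    {"entity_id": "apple_inc", "property": "name", "value": "Apple Inc.", "source": "sec_10k"},
    {"entity_id": "apple_inc", "property": "name", "value": "Apple Inc.", "source": "newswire"},
]

raw_conflicts = find_value_conflicts(records)
print(raw_conflicts)
```

Once the two revenue records are merged into a single entity, the `sec_10k` / `newswire` attribution in this output is what gets lost.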
```python
from semantica.conflicts import ConflictDetector

detector = ConflictDetector()

# Detect value conflicts on a specific property
# (`entities` is your raw entity data, collected before merging)
conflicts = detector.detect_value_conflicts(entities, "revenue")
```
| Type | Description | Example |
|---|---|---|
| VALUE | Same entity, same property, different values across sources | Revenue $391B vs $383B |
| TYPE | Same entity classified as different types | "Python" as Language vs Snake |
| TEMPORAL | Conflicting timestamps or validity windows | CEO at two companies simultaneously |
| LOGICAL | Logically inconsistent property combinations | is_alive=True but death_date set |
| RELATIONSHIP | Inconsistent relationship properties across sources | Edge weight 0.9 vs 0.3 from two sources |
TEMPORAL and LOGICAL conflict detection is not implemented on ConflictDetector directly. The ConflictType enum includes these types for use in custom pipelines, but the detector class only implements detect_value_conflicts, detect_type_conflicts, detect_relationship_conflicts, and detect_entity_conflicts.
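If you need temporal or logical checks, you can write them as custom pipeline steps. This is a minimal plain-Python sketch of the two table examples: overlapping employment windows, and is_alive=True alongside a death_date. The function names, record shapes, and dict output are hypothetical; the `type` strings only mirror the ConflictType member names and are not Semantica Conflict objects.

```python
from datetime import date
from itertools import combinations
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical employment record: (employer, start, end); end=None means "still employed".
Employment = Tuple[str, date, Optional[date]]


def overlaps(a_start: date, a_end: Optional[date], b_start: date, b_end: Optional[date]) -> bool:
    """True if two closed date ranges share at least one day (None = open-ended)."""
    a_stop = a_end or date.max
    b_stop = b_end or date.max
    return a_start <= b_stop and b_start <= a_stop


def temporal_conflicts(person_id: str, jobs: List[Employment]) -> List[Dict[str, Any]]:
    """Flag every pair of different employers whose date ranges overlap."""
    found = []
    for (emp_a, sa, ea), (emp_b, sb, eb) in combinations(jobs, 2):
        if emp_a != emp_b and overlaps(sa, ea, sb, eb):
            found.append({"type": "TEMPORAL_CONFLICT", "entity_id": person_id,
                          "employers": (emp_a, emp_b)})
    return found


def logical_conflicts(entity_id: str, props: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flag the mutually exclusive combination is_alive=True plus a death_date."""
    if props.get("is_alive") is True and props.get("death_date") is not None:
        return [{"type": "LOGICAL_CONFLICT", "entity_id": entity_id,
                 "properties": ("is_alive", "death_date")}]
    return []


jobs = [
    ("Acme Corp", date(2015, 1, 1), date(2020, 6, 30)),
    ("Globex", date(2019, 3, 1), None),
]
print(temporal_conflicts("ceo_42", jobs))
print(logical_conflicts("person_7", {"is_alive": True, "death_date": "2021-04-02"}))
```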
Run targeted detection by type:
```python
# Detect value conflicts for a specific property
value_conflicts = detector.detect_value_conflicts(entities, "revenue")

# Detect type classification conflicts
type_conflicts = detector.detect_type_conflicts(entities)

# Detect relationship property conflicts (takes a list of relationship dicts)
relation_conflicts = detector.detect_relationship_conflicts(relationships)

# Detect conflicts across all properties of a set of entities
all_conflicts = detector.detect_entity_conflicts(entities)
```
```python
from semantica.conflicts import ConflictResolver, ResolutionStrategy

resolver = ConflictResolver()
results = resolver.resolve_conflicts(conflicts, strategy=ResolutionStrategy.VOTING)

for result in results:
    print("Resolved '%s' -> %s" % (result.conflict_id, result.resolved_value))
    print(" Strategy: %s Confidence: %.2f" % (result.resolution_strategy, result.confidence))
```
Don’t auto-resolve everything. Use MANUAL_REVIEW for conflicts with severity == "critical" or severity == "high": high severity means the disagreement is large and the stakes of getting it wrong are high.
VOTING is best for 3+ sources with roughly equal credibility. When all sources have identical credibility scores, CREDIBILITY_WEIGHTED behaves identically to VOTING.
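The equivalence is easy to check with a toy implementation. This is a plain-Python illustration of the two ideas, not Semantica’s resolver: `vote` picks the value asserted by the most sources, and `weighted_vote` sums each value’s source credibility, falling back to the documented 0.50 default for unscored sources. The source names and scores are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

DEFAULT_CREDIBILITY = 0.50  # mirrors the documented default for unscored sources

# Each observation is (source, value).
Observation = Tuple[str, str]


def vote(observations: List[Observation]) -> str:
    """Majority vote: the value asserted by the most sources wins."""
    counts = Counter(value for _, value in observations)
    return counts.most_common(1)[0][0]


def weighted_vote(observations: List[Observation], credibility: Dict[str, float]) -> str:
    """Credibility-weighted vote: the value with the highest summed credibility wins."""
    totals: Dict[str, float] = defaultdict(float)
    for source, value in observations:
        totals[value] += credibility.get(source, DEFAULT_CREDIBILITY)
    return max(totals, key=totals.get)


obs = [("sec_10k", "$391B"), ("blog_a", "$383B"), ("blog_b", "$383B")]

# No scores set: every source weighs 0.50, so the weighted result matches the vote.
print(vote(obs), weighted_vote(obs, {}))

# Differentiated scores: one authoritative source outweighs two weak ones.
print(weighted_vote(obs, {"sec_10k": 0.92, "blog_a": 0.30, "blog_b": 0.30}))
```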
```python
# Most recent source wins: for fast-changing facts
results = resolver.resolve_conflicts(conflicts, strategy=ResolutionStrategy.MOST_RECENT)

# First seen wins: for stable facts (founding date, original name)
results = resolver.resolve_conflicts(conflicts, strategy=ResolutionStrategy.FIRST_SEEN)
```
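Both temporal strategies come down to ordering observations by time. A minimal sketch, assuming each observation carries a timestamp; the tuple layout, helper names, and headcount values for a hypothetical company are illustrative, and Semantica’s resolver may read timestamps from a different field.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical observation: (value, observed_at, source).
Observation = Tuple[str, datetime, str]


def most_recent(observations: List[Observation]) -> str:
    """Return the value with the latest timestamp."""
    return max(observations, key=lambda o: o[1])[0]


def first_seen(observations: List[Observation]) -> str:
    """Return the value with the earliest timestamp."""
    return min(observations, key=lambda o: o[1])[0]


headcount = [
    ("1,200", datetime(2023, 9, 30), "annual_report"),
    ("1,350", datetime(2024, 9, 28), "annual_report"),
    ("1,100", datetime(2022, 1, 15), "news_archive"),
]
print(most_recent(headcount), first_seen(headcount))
```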
```python
from semantica.conflicts import InvestigationGuideGenerator, ResolutionStrategy

# Flag for human review: use with InvestigationGuideGenerator
results = resolver.resolve_conflicts(conflicts, strategy=ResolutionStrategy.MANUAL_REVIEW)

generator = InvestigationGuideGenerator()
for conflict in conflicts:
    guide = generator.generate_guide(conflict)
    print("%s" % guide.title)
    for step in guide.investigation_steps:
        print(" [%d] %s" % (step.step_number, step.description))
```
Best for: high-stakes decisions (severity == "critical"), regulated data (HIPAA/SOX), and domain-specific ambiguity.
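The advice to keep critical and high severity conflicts away from automatic strategies can be applied as a partition step before resolution. This sketch assumes each conflict is a dict with `conflict_id` and `severity` keys, like the `severity_details` entries in the first example; `route_by_severity` and `REVIEW_SEVERITIES` are hypothetical names, not part of Semantica.

```python
from typing import Dict, List, Tuple

# Severity labels that should never be auto-resolved.
REVIEW_SEVERITIES = {"critical", "high"}


def route_by_severity(conflicts: List[Dict[str, str]]) -> Tuple[List[str], List[str]]:
    """Split conflict IDs into (auto_resolve, manual_review) buckets."""
    auto, manual = [], []
    for c in conflicts:
        bucket = manual if c["severity"] in REVIEW_SEVERITIES else auto
        bucket.append(c["conflict_id"])
    return auto, manual


conflicts = [
    {"conflict_id": "c1", "severity": "critical"},
    {"conflict_id": "c2", "severity": "medium"},
    {"conflict_id": "c3", "severity": "high"},
    {"conflict_id": "c4", "severity": "low"},
]
auto_ids, review_ids = route_by_severity(conflicts)
print("auto:", auto_ids, "review:", review_ids)
```

You would then map each ID bucket back to its Conflict objects, resolve the first with an automatic strategy, and pass the second to ResolutionStrategy.MANUAL_REVIEW.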
| Strategy | Enum | When to Use |
|---|---|---|
| Majority vote | VOTING | 3+ sources with roughly equal credibility |
| Credibility-weighted | CREDIBILITY_WEIGHTED | Sources have different authority levels |
| Most recent | MOST_RECENT | Fast-changing facts: stock price, headcount, status |
| First seen | FIRST_SEEN | Stable facts: founding date, original name |
| Highest confidence | HIGHEST_CONFIDENCE | Extraction pipeline outputs confidence scores |
| Manual review | MANUAL_REVIEW | High-stakes decisions, regulated data |
| Expert review | EXPERT_REVIEW | Domain-specific ambiguity: escalate to a specialist |
Use the convenience aliases for shorter code:
```python
from semantica.conflicts import voting, credibility_weighted, most_recent, highest_confidence

results = resolver.resolve_conflicts(conflicts, strategy=voting)
```
```python
from semantica.conflicts import SourceTracker, SourceReference

tracker = SourceTracker()
tracker.set_source_credibility("sec_10k", 0.92)
tracker.set_source_credibility("wikipedia", 0.80)

# The document ID must match the ID passed to set_source_credibility;
# otherwise get_source_credibility() falls back to the 0.50 default.
source_ref = SourceReference(
    document="sec_10k",
    page=12,
    confidence=0.95,
)
tracker.track_property_source(
    entity_id="apple_inc",
    property_name="revenue",
    value="$391B",
    source=source_ref,
)

# Returns a PropertySource object with .value and .sources (List[SourceReference])
prop_source = tracker.get_property_sources("apple_inc", "revenue")
if prop_source:
    print("Value: %s" % prop_source.value)
    for s in prop_source.sources:
        credibility = tracker.get_source_credibility(s.document)
        print(" %s (confidence: %.2f, credibility: %.2f)" % (
            s.document, s.confidence, credibility))

chain = tracker.get_traceability_chain("apple_inc")
```
Key behaviours:
Credibility scores default to 0.50 for any source not explicitly set
SourceTracker stores property-level provenance, so you can trace exactly which source contributed each value
Always set credibility scores. Any source you do not score explicitly gets the default credibility of 0.50, so if you set no scores at all, CREDIBILITY_WEIGHTED behaves identically to VOTING. The power of this strategy is in the differentiation.
Combine with provenance. The SourceTracker feeds directly into the Provenance module’s audit trail. If you need to explain how a resolved value was chosen, provenance records give you the full chain.
```python
from semantica.conflicts import ConflictAnalyzer

analyzer = ConflictAnalyzer()
analysis = analyzer.analyze_conflicts(conflicts)

patterns = analysis["patterns"]
severity_counts = analysis["by_severity"]["counts"]
source_stats = analysis["by_source"]

trends = analyzer.analyze_trends(conflicts)
# analyze_trends returns a list of dicts, one per time period
for t in trends:
    print("Period: %s Count: %d Trend: %s" % (
        t["period"], t["conflict_count"], t["trend"]))
```
Key behaviours:
analyze_conflicts()["patterns"] returns a list of ConflictPattern objects: use pattern.pattern_type and pattern.frequency to find systemic data quality issues
analyze_conflicts()["by_source"] includes counts and top_sources: sources appearing in many conflicts may have upstream data quality problems
analyze_trends() returns a list of per-period dicts (period, conflict_count, trend, trend_direction): trend is "increasing", "decreasing", or "stable"
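To make the trend fields concrete, here is a rough sketch that buckets conflict timestamps by month and labels each period against the previous one. The monthly granularity, the input list of detection timestamps, and the comparison rule are assumptions for illustration; Semantica’s `analyze_trends()` may compute `trend` differently.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Optional


def monthly_trends(detected_at: List[datetime]) -> List[Dict[str, object]]:
    """Count conflicts per YYYY-MM period and label each period vs. the previous one."""
    counts = Counter(ts.strftime("%Y-%m") for ts in detected_at)
    trends = []
    previous: Optional[int] = None
    for period in sorted(counts):  # "YYYY-MM" strings sort chronologically
        count = counts[period]
        if previous is None or count == previous:
            trend = "stable"
        elif count > previous:
            trend = "increasing"
        else:
            trend = "decreasing"
        trends.append({"period": period, "conflict_count": count, "trend": trend})
        previous = count
    return trends


timestamps = [datetime(2024, 1, 5), datetime(2024, 2, 3), datetime(2024, 2, 20),
              datetime(2024, 3, 11)]
for t in monthly_trends(timestamps):
    print("Period: %s Count: %d Trend: %s" % (t["period"], t["conflict_count"], t["trend"]))
```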
Use analyze_conflicts()["by_source"]["top_sources"] to identify bad data feeds. A single source appearing in many conflicts is a data quality problem upstream, not a conflict to resolve record by record. Flag it and investigate the source pipeline.
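The idea behind `top_sources` is essentially counting. A minimal sketch with `collections.Counter`, assuming each conflict records the list of sources involved; the dict layout, the `suspicious_sources` helper, and the `min_conflicts` threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def suspicious_sources(conflicts: List[Dict[str, List[str]]], min_conflicts: int = 3) -> List[str]:
    """Return sources involved in at least `min_conflicts` conflicts, most frequent first."""
    # set() so a source listed twice in one conflict is only counted once for it
    counts = Counter(src for c in conflicts for src in set(c["sources"]))
    return [src for src, n in counts.most_common() if n >= min_conflicts]


conflicts = [
    {"sources": ["feed_x", "sec_10k"]},
    {"sources": ["feed_x", "wikipedia"]},
    {"sources": ["feed_x", "reuters"]},
    {"sources": ["wikipedia", "sec_10k"]},
]
print(suspicious_sources(conflicts))
```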
Severity is a string label, not a score. ConflictDetector assigns "critical", "high", or "medium" based on property importance and value differences. Critical fields (id, name, type, revenue) always yield "critical". Domain context determines what to prioritize.
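As an illustration of that kind of rule, here is a hedged sketch: the documented critical fields always map to "critical", and other numeric disagreements are labelled by relative difference. The 10% threshold, the non-numeric fallback, and the `label_severity` helper are assumptions for this example, not Semantica’s actual cut-offs.

```python
from typing import Union

Value = Union[int, float, str]

# Fields documented as always yielding "critical".
CRITICAL_FIELDS = {"id", "name", "type", "revenue"}
HIGH_THRESHOLD = 0.10  # assumed: more than 10% relative difference counts as "high"


def label_severity(property_name: str, a: Value, b: Value) -> str:
    """Return a string severity label for two conflicting values."""
    if property_name in CRITICAL_FIELDS:
        return "critical"
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        scale = max(abs(a), abs(b)) or 1.0
        return "high" if abs(a - b) / scale > HIGH_THRESHOLD else "medium"
    # Non-numeric disagreements default to "medium" in this sketch.
    return "medium"


print(label_severity("revenue", 391, 383))      # critical field
print(label_severity("headcount", 1000, 1350))  # relative difference 350 / 1350
print(label_severity("headcount", 1000, 1020))  # relative difference 20 / 1020
```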
```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class ResolutionResult:
    conflict_id: str
    resolved: bool
    resolved_value: Any                 # None if unresolved or flagged for review
    resolution_strategy: Optional[str]  # e.g. "voting", "credibility_weighted"
    confidence: float                   # 0.0–1.0
    sources_used: List[str]             # document IDs that contributed
    resolution_notes: Optional[str]
    metadata: Dict[str, Any]
```
ConflictType enum
```python
from semantica.conflicts import ConflictType

ConflictType.VALUE_CONFLICT         # revenue is $391B in source A, $383B in source B
ConflictType.TYPE_CONFLICT          # "Apple" is ORGANIZATION in one source, PRODUCT in another
ConflictType.TEMPORAL_CONFLICT      # overlapping validity windows with contradictory states
ConflictType.LOGICAL_CONFLICT       # fact violates an ontology axiom or SHACL constraint
ConflictType.RELATIONSHIP_CONFLICT  # inconsistent relationship properties across sources
```
InvestigationGuide and InvestigationStep schemas
```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class InvestigationStep:
    step_number: int
    description: str                 # what to do
    action: str                      # specific action to take
    expected_outcome: Optional[str]

@dataclass
class InvestigationGuide:
    conflict_id: str
    conflict_summary: str                      # generated summary of the disagreement
    severity: str                              # "low" | "medium" | "high" | "critical"
    conflicting_sources: List[Dict[str, Any]]
    investigation_steps: List[InvestigationStep]
    recommended_actions: List[str]
    context: Dict[str, Any]
    generated_at: str                          # ISO timestamp

    @property
    def title(self) -> str:
        return "Investigation: %s" % self.conflict_id
```
Deduplication
Resolve duplicate entities before conflict detection.
Ontology
Logical conflicts use SHACL shapes and ontology axioms.
Provenance
Track which source each conflicting fact came from.