Knowledge Graph API
The Knowledge Graph API is an ingestion pipeline that transforms messy PDFs (10-Ks, Earnings Calls, Press Releases) into a strict, typed knowledge graph.
Overview
We provide Deterministic Grounding for your AI Agents:
- FIBO-Standardized Entities: Every company is resolved to its unique Financial Industry Business Ontology ID
- Atomic Facts: Text is broken into single-sentence propositions with precise time-stamping
- Strict Edges: Relationships are classified into a fixed enum (e.g., ACQUIRED, SUED), making them queryable with SQL or Cypher
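To make this grounding contract concrete, the sketch below shows the information a single grounded fact carries once it lands in the graph. The field names and the FIBO identifiers are assumptions for the sake of the example; the actual definitions live in `zomma/schemas/`.

```python
# Illustrative only: a conceptual view of one grounded atomic fact.
from dataclasses import dataclass

@dataclass
class GroundedFact:
    text: str             # single-sentence proposition
    fact_type: str        # one of the strict edge enum values, e.g. "ACQUIRED"
    subject_fibo_id: str  # FIBO identifier of the resolved subject entity
    object_fibo_id: str   # FIBO identifier of the resolved object entity
    timestamp: str        # ISO-8601 date the fact refers to
    source_document: str  # provenance pointer back to the source PDF

fact = GroundedFact(
    text="Apple Inc acquired Beats Electronics.",
    fact_type="ACQUIRED",
    subject_fibo_id="fibo:AppleInc",          # placeholder, not a real FIBO URI
    object_fibo_id="fibo:BeatsElectronics",   # placeholder, not a real FIBO URI
    timestamp="2014-05-28",
    source_document="apple-10k-2014.pdf",
)
print(fact.fact_type, fact.timestamp)
```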
System Architecture
The pipeline uses a Parallel Agentic Workflow (powered by LangGraph) to ensure high fidelity.
1. The "Atomizer" (Ingestion)
- Splits documents by Section Headers
- Explodes paragraphs into Atomic Facts
- Reflexion Loop: Self-corrects to ensure no facts are missed
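The real Atomizer and its Reflexion loop are LLM calls, but the shape of the loop can be sketched without one. In this stand-in, `atomize` is a naive sentence splitter in place of the extraction prompt, and `reflexion_pass` re-checks the source text for propositions the first pass dropped; both function names are assumptions for illustration.

```python
# Rough, non-LLM stand-in for the Atomizer's contract.
import re

def atomize(paragraph: str) -> list[str]:
    # Stand-in for the LLM extraction call: naive split on terminal punctuation.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]

def reflexion_pass(paragraph: str, facts: list[str], max_rounds: int = 2) -> list[str]:
    # Stand-in for the self-correction loop: re-atomize any source sentence
    # not yet covered by an extracted fact, up to max_rounds times.
    for _ in range(max_rounds):
        missing = [s for s in atomize(paragraph) if s not in facts]
        if not missing:
            break
        facts.extend(missing)
    return facts

facts = reflexion_pass("Apple reported record revenue. Margins expanded.", [])
print(facts)  # ['Apple reported record revenue.', 'Margins expanded.']
```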
2. The "Dual-Brain" Resolver (Transformation)
Stream A (Nouns - Librarian):
- Hybrid Search: Combines Qdrant Vector Search with RapidFuzz
- Thresholding: < 90% confidence → Creates new UUID
Stream B (Verbs - Analyst):
- Classifies facts into types like `REPORTED_FINANCIALS` or `CAUSED`
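A sketch of Stream A's decision rule follows. The vector-similarity score would come from Qdrant in the real pipeline; here it is passed in as a plain float so the example stays self-contained, and the 50/50 blend of vector and RapidFuzz scores is illustrative rather than the pipeline's actual weighting.

```python
import uuid
from rapidfuzz import fuzz

def resolve_entity(mention: str, candidate: dict, vector_score: float) -> dict:
    # candidate: {"label": ..., "fibo_id": ...} retrieved by vector search
    fuzzy_score = fuzz.token_sort_ratio(mention, candidate["label"]) / 100.0
    confidence = 0.5 * vector_score + 0.5 * fuzzy_score  # illustrative blend
    if confidence >= 0.90:
        return {"id": candidate["fibo_id"], "confidence": confidence, "new": False}
    # Below the 90% threshold: mint a provisional UUID instead of forcing a match.
    return {"id": str(uuid.uuid4()), "confidence": confidence, "new": True}

# Close surface form, expected to clear the threshold.
print(resolve_entity("Apple Inc.", {"label": "Apple Inc", "fibo_id": "fibo:AppleInc"}, 0.93))
```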
3. The "Causal Linker" (Reasoning)
- Connects `FactNode`s based on logical flow
- Example: `(Fed Raised Rates) -[:CAUSES]-> (Stocks Dropped)`
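Once the Causal Linker has run, causal chains can be read back with the same `kg.query()` interface shown under Graph Queries below. The `FactNode` label and `text` property are assumptions about the node schema made for this sketch.

```python
from zomma import KnowledgeGraph

kg = KnowledgeGraph()

# Walk cause -> effect pairs created by the Causal Linker.
rows = kg.query("""
MATCH (cause:FactNode)-[:CAUSES]->(effect:FactNode)
WHERE cause.text CONTAINS 'raised rates'
RETURN cause.text AS cause, effect.text AS effect
""")
for row in rows:
    print(f"{row['cause']}  ==>  {row['effect']}")
```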
4. The "Assembler" (Loading)
- Writes to Neo4j using the Fact-as-Node pattern
- Ensures all Entities have `name` properties and correct FIBO URIs
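To show the Fact-as-Node pattern itself, here is a hedged sketch of the kind of Cypher the Assembler effectively issues, written against the official `neo4j` Python driver. The property names (`fibo_uri`, `text`, `fact_type`) and the `SUBJECT`/`OBJECT` relationship names are assumptions for illustration, not the pipeline's exact schema.

```python
from neo4j import GraphDatabase

# Placeholder connection details.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

CYPHER = """
MERGE (subj:Entity {fibo_uri: $subj_uri})
  ON CREATE SET subj.name = $subj_name
MERGE (obj:Entity {fibo_uri: $obj_uri})
  ON CREATE SET obj.name = $obj_name
CREATE (f:FactNode {text: $text, fact_type: $fact_type, timestamp: $ts})
MERGE (f)-[:SUBJECT]->(subj)
MERGE (f)-[:OBJECT]->(obj)
"""

with driver.session() as session:
    session.run(
        CYPHER,
        subj_uri="fibo:AppleInc", subj_name="Apple Inc",
        obj_uri="fibo:BeatsElectronics", obj_name="Beats Electronics",
        text="Apple Inc acquired Beats Electronics.",
        fact_type="ACQUIRED", ts="2014-05-28",
    )
driver.close()
```

The key design point is that the fact is reified as its own node rather than stored as an edge property, so provenance, timestamps, and causal links can all hang off the fact itself.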
The Pipeline Flow
- Hierarchical Ingestion: Documents are split by Headers/Sections to preserve context
- Atomic Fact Extraction: Chunks are atomized into single-sentence "Propositions" (Fact Nodes) with Reflexion for self-correction
- Parallel Resolution (The "Split Brain"):
- Agent A (The Librarian): Resolves Entities against FIBO using Hybrid Search (Vector + Fuzzy). Enforces a 90% confidence threshold; creates "New Entities" if no match is found
- Agent B (The Analyst): Classifies Relationships against a strict Semantic Enum (e.g., `RAISED_POLICY_RATE`, `CAUSED`) to determine the `fact_type`
- Graph Assembly: Creates typed `FactNode`s in Neo4j, links them to resolved `Entity` nodes, and connects them to `EpisodicNode`s for provenance
- Causal Linking: The `CausalLinker` agent scans the facts to identify and create explicit `[:CAUSES]` edges between them
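The fan-out/fan-in described above maps naturally onto LangGraph. The wiring below is a minimal, assumed sketch, not the contents of `zomma/workflows/main_pipeline.py`: the node functions are stubs and the state fields are illustrative.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class PipelineState(TypedDict, total=False):
    chunks: list         # section-level chunks from hierarchical ingestion
    facts: list          # atomic fact propositions
    entities: list       # FIBO-resolved entities (Stream A)
    relationships: list  # classified fact_types (Stream B)

# Stub node functions; the real agents wrap LLM calls and Neo4j writes.
def atomizer(state: PipelineState) -> dict:
    return {"facts": [{"text": c} for c in state.get("chunks", [])]}

def librarian(state: PipelineState) -> dict:
    return {"entities": []}

def analyst(state: PipelineState) -> dict:
    return {"relationships": []}

def assembler(state: PipelineState) -> dict:
    return {}

def causal_linker(state: PipelineState) -> dict:
    return {}

builder = StateGraph(PipelineState)
for name, fn in [("atomizer", atomizer), ("librarian", librarian),
                 ("analyst", analyst), ("assembler", assembler),
                 ("causal_linker", causal_linker)]:
    builder.add_node(name, fn)

builder.add_edge(START, "atomizer")
builder.add_edge("atomizer", "librarian")                 # fan out: Stream A
builder.add_edge("atomizer", "analyst")                   # fan out: Stream B
builder.add_edge(["librarian", "analyst"], "assembler")   # join both streams
builder.add_edge("assembler", "causal_linker")
builder.add_edge("causal_linker", END)

app = builder.compile()
result = app.invoke({"chunks": ["Apple reported record revenue."]})
```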
Strict Edge Taxonomy
To prevent "Schema Drift," the LLM is restricted to high-value verbs:
- Corporate: `ACQUIRED`, `MERGED_WITH`, `SPUN_OFF`, `INVESTED_IN`
- Legal: `SUED`, `FINED`, `INVESTIGATED_BY`
- Causal: `CAUSED`, `EFFECTED_BY`, `CONTRIBUTED_TO`, `PREVENTED`
- Financial: `RAISED_POLICY_RATE`, `REPORTED_FINANCIALS`, `ISSUED_GUIDANCE`
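One way to hold this line is to validate every LLM-proposed verb against an enum at parse time, so an out-of-taxonomy verb is rejected rather than silently widening the schema. The sketch below mirrors the verbs listed above; the shipped definitions live in `zomma/schemas/relationship.py`, so treat this as an assumed illustration.

```python
from enum import Enum

class EdgeType(str, Enum):
    # Corporate
    ACQUIRED = "ACQUIRED"
    MERGED_WITH = "MERGED_WITH"
    SPUN_OFF = "SPUN_OFF"
    INVESTED_IN = "INVESTED_IN"
    # Legal
    SUED = "SUED"
    FINED = "FINED"
    INVESTIGATED_BY = "INVESTIGATED_BY"
    # Causal
    CAUSED = "CAUSED"
    EFFECTED_BY = "EFFECTED_BY"
    CONTRIBUTED_TO = "CONTRIBUTED_TO"
    PREVENTED = "PREVENTED"
    # Financial
    RAISED_POLICY_RATE = "RAISED_POLICY_RATE"
    REPORTED_FINANCIALS = "REPORTED_FINANCIALS"
    ISSUED_GUIDANCE = "ISSUED_GUIDANCE"

print(EdgeType("ACQUIRED"))       # accepted
try:
    EdgeType("PARTNERED_WITH")    # not in the taxonomy
except ValueError:
    print("rejected: schema drift caught at parse time")
```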
API Endpoints
Ingestion
```python
from zomma import KnowledgeGraph

kg = KnowledgeGraph()

# Ingest a document
result = kg.ingest(
    file_path="path/to/document.pdf",
    document_type="10-K",
    company_name="Apple Inc"
)

print(f"Extracted {result.fact_count} facts")
print(f"Resolved {result.entity_count} entities")
```
Entity Resolution
```python
# Resolve an entity to FIBO
entity = kg.resolve_entity(
    name="Apple Inc",
    context="Technology company"
)

print(f"FIBO ID: {entity.fibo_id}")
print(f"Confidence: {entity.confidence}")
```
Graph Queries
Query the knowledge graph using Cypher:
```python
# Find all acquisitions by a company
query = """
MATCH (company:Entity {name: 'Apple Inc'})-[:ACQUIRED]->(target:Entity)
RETURN target.name, target.acquired_date
"""

results = kg.query(query)
for result in results:
    print(f"Acquired: {result['target.name']} on {result['target.acquired_date']}")
```
Temporal Queries
Query facts within a specific time range:
```python
# Get all facts about Apple from Q1 2024
facts = kg.get_facts(
    entity="Apple Inc",
    start_date="2024-01-01",
    end_date="2024-03-31",
    fact_types=["REPORTED_FINANCIALS", "ISSUED_GUIDANCE"]
)

for fact in facts:
    print(f"{fact.timestamp}: {fact.text}")
    print(f"Source: {fact.source_document}")
```
Setup & Usage
1. Initialize Indices
```bash
python -m zomma.scripts.setup_graph_index
```
2. Run Verification
```bash
python -m zomma.scripts.test_large_pipeline
```
3. Process Documents
```bash
python -m zomma.scripts.run_pipeline --input documents/ --output graph.db
```
Directory Structure
```
zomma/
├── agents/                     # The "Brains"
│   ├── atomizer.py             # Fact Extraction + Reflexion
│   ├── FIBO_librarian.py       # Hybrid Entity Resolution
│   ├── graph_assembler.py      # Fact-as-Node Creator
│   └── causal_linker.py        # Causal Inference Agent
│
├── schemas/                    # The "Contracts"
│   ├── atomic_fact.py
│   ├── relationship.py         # Expanded Relationship Enums
│   └── nodes.py                # Neo4j Node Definitions
│
├── workflows/                  # The "Orchestration"
│   └── main_pipeline.py        # LangGraph Pipeline
│
└── scripts/                    # Utilities
    ├── setup_graph_index.py    # Initializes Vector Indices
    ├── test_large_pipeline.py  # End-to-End Verification
    └── run_pipeline.py         # Batch Processing
```
Authentication
All API requests require an API key:
```python
import zomma

zomma.api_key = "your-api-key-here"
```
Or via environment variable:
```bash
export ZOMMA_API_KEY="your-api-key-here"
```
Rate Limits
- Free Tier: 1,000 requests/day, 10 documents/day
- Pro Tier: 100,000 requests/day, 1,000 documents/day
- Enterprise: Custom limits
Next Steps
- Agentic OS API - Build reasoning agents on top of the knowledge graph
- MCP Server - Integrate with Claude Desktop
- Examples - Sample code and notebooks