
Knowledge Graph API

The Knowledge Graph API is an ingestion pipeline that transforms unstructured financial PDFs (10-Ks, earnings call transcripts, press releases) into a strictly typed knowledge graph.

Overview

We provide Deterministic Grounding for your AI Agents:

  • FIBO-Standardized Entities: Every company is resolved to its unique Financial Industry Business Ontology ID
  • Atomic Facts: Text is broken into single-sentence propositions with precise time-stamping
  • Strict Edges: Relationships are classified into a fixed enum (e.g., ACQUIRED, SUED), making them SQL/Cypher-queryable
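
Taken together, a single ingested sentence ends up as a record along these lines. This is an illustrative sketch only; the field names are hypothetical and do not reflect the exact output schema.

# Illustrative only: field names are hypothetical, not the exact output schema.
grounded_fact = {
    "text": "Apple Inc acquired Beats Electronics in 2014.",           # atomic, single-sentence proposition
    "timestamp": "2014-05-28",                                         # precise time-stamp for the fact
    "fact_type": "ACQUIRED",                                           # drawn from the strict edge enum
    "subject": {"name": "Apple Inc", "fibo_id": "<FIBO URI>"},         # resolved against FIBO
    "object": {"name": "Beats Electronics", "fibo_id": "<new UUID>"},  # no confident match -> new UUID
}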

System Architecture

The pipeline uses a Parallel Agentic Workflow (powered by LangGraph) to ensure high fidelity.

1. The "Atomizer" (Ingestion)

  • Splits documents by Section Headers
  • Explodes paragraphs into Atomic Facts
  • Reflexion Loop: Self-corrects to ensure no facts are missed
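
A minimal sketch of this extract-then-reflect loop, assuming a hypothetical llm() helper that returns a list of proposition strings (it is not part of the zomma package):

# Minimal sketch of the Atomizer's Reflexion loop. `llm` is a hypothetical helper
# that returns a list of single-sentence propositions; it is not a zomma API.
def atomize(section_text: str, llm, max_rounds: int = 2) -> list[str]:
    facts = llm(f"Split into single-sentence propositions:\n{section_text}")
    for _ in range(max_rounds):
        # Reflexion pass: ask the model to audit its own output against the source
        missed = llm(
            f"Source:\n{section_text}\n\nExtracted facts:\n{facts}\n\n"
            "Return any propositions that were missed (empty list if none)."
        )
        if not missed:
            break          # self-check passed: nothing was dropped
        facts.extend(missed)
    return facts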

2. The "Dual-Brain" Resolver (Transformation)

Stream A (Nouns - Librarian):

  • Hybrid Search: Combines Qdrant Vector Search with RapidFuzz
  • Thresholding: < 90% confidence → Creates new UUID

Stream B (Verbs - Analyst):

  • Classifies facts into types like REPORTED_FINANCIALS or CAUSED
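
A rough sketch of the Librarian's match-or-mint logic under these rules. The vector_search callable stands in for the Qdrant lookup and is assumed to yield (fibo_id, label, vector_score) candidates; the score fusion shown is illustrative, not the exact weighting.

from uuid import uuid4
from rapidfuzz import fuzz

# Sketch of Stream A: hybrid entity resolution with a 90% confidence threshold.
# `vector_search` stands in for the Qdrant lookup and is assumed to yield
# (fibo_id, label, vector_score) tuples with vector_score in [0, 1].
def resolve_entity(name: str, vector_search) -> str:
    best_id, best_score = None, 0.0
    for fibo_id, label, vector_score in vector_search(name):
        fuzzy_score = fuzz.WRatio(name, label) / 100.0   # RapidFuzz string similarity
        score = max(vector_score, fuzzy_score)           # illustrative fusion, not the real weighting
        if score > best_score:
            best_id, best_score = fibo_id, score
    if best_score >= 0.90:                               # confident match -> keep the FIBO ID
        return best_id
    return str(uuid4())                                  # < 90% confidence -> mint a new UUID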

3. The "Causal Linker" (Reasoning)

  • Connects FactNodes based on logical flow
  • Example: (Fed Raised Rates) -[:CAUSES]-> (Stocks Dropped)
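
Once these edges exist, causal chains can be walked directly in Cypher (using the kg.query helper shown under API Endpoints below). This is a hypothetical example: the FactNode label matches the description above, but the text property and chain length are assumptions.

# Hypothetical query over the [:CAUSES] edges the linker creates.
# The `text` property on FactNode is an assumption about the schema.
causal_chains = kg.query("""
MATCH path = (cause:FactNode)-[:CAUSES*1..3]->(effect:FactNode)
WHERE cause.text CONTAINS 'raised rates'
RETURN [f IN nodes(path) | f.text] AS chain
""")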

4. The "Assembler" (Loading)

  • Writes to Neo4j using the Fact-as-Node pattern
  • Ensures all Entities have name properties and correct FIBO URIs
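
In the Fact-as-Node pattern, each proposition is stored as its own node, with the resolved entities and the source episode hung off it. The Cypher below is a rough sketch of that write; the labels, property keys, and relationship names are illustrative assumptions, not the exact zomma schema.

# Rough sketch of a Fact-as-Node write. Labels, property keys, and relationship
# names here are illustrative assumptions, not the exact zomma schema.
fact_as_node_write = """
MERGE (s:Entity {fibo_uri: $subject_fibo_uri})
  ON CREATE SET s.name = $subject_name
MERGE (o:Entity {fibo_uri: $object_fibo_uri})
  ON CREATE SET o.name = $object_name
CREATE (f:FactNode {text: $text, fact_type: $fact_type, timestamp: $timestamp})
CREATE (s)-[:SUBJECT_OF]->(f)
CREATE (f)-[:HAS_OBJECT]->(o)
CREATE (f)-[:MENTIONED_IN]->(:EpisodicNode {source_document: $source_document})
"""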

The Pipeline Flow

  1. Hierarchical Ingestion: Documents are split by Headers/Sections to preserve context
  2. Atomic Fact Extraction: Chunks are atomized into single-sentence "Propositions" (Fact Nodes) with Reflexion for self-correction
  3. Parallel Resolution (The "Split Brain"):
    • Agent A (The Librarian): Resolves Entities against FIBO using Hybrid Search (Vector + Fuzzy). Enforces a 90% confidence threshold; creates "New Entities" if no match is found
    • Agent B (The Analyst): Classifies Relationships against a strict Semantic Enum (e.g., RAISED_POLICY_RATE, CAUSED) to determine the fact_type
  4. Graph Assembly: Creates typed FactNodes in Neo4j, links them to resolved Entity nodes, and connects them to EpisodicNodes for provenance
  5. Causal Linking: The CausalLinker agent scans the facts to identify and create explicit [:CAUSES] edges between them
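
A minimal sketch of how these five stages could be wired as a LangGraph fan-out/fan-in graph. The state fields and node functions are placeholders, not the shipped main_pipeline.py.

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

# Placeholder state; the real pipeline state will carry richer objects.
class PipelineState(TypedDict, total=False):
    chunks: list
    facts: list
    entities: dict
    fact_types: dict

def atomizer(state): return {}       # steps 1-2: split by headers, extract atomic facts
def librarian(state): return {}      # step 3a: FIBO entity resolution (nouns)
def analyst(state): return {}        # step 3b: relationship classification (verbs)
def assembler(state): return {}      # step 4: Neo4j Fact-as-Node assembly
def causal_linker(state): return {}  # step 5: [:CAUSES] edge creation

graph = StateGraph(PipelineState)
for name, fn in [("atomizer", atomizer), ("librarian", librarian), ("analyst", analyst),
                 ("assembler", assembler), ("causal_linker", causal_linker)]:
    graph.add_node(name, fn)
graph.add_edge(START, "atomizer")
graph.add_edge("atomizer", "librarian")   # "split brain" fan-out
graph.add_edge("atomizer", "analyst")
graph.add_edge("librarian", "assembler")  # fan-in before assembly
graph.add_edge("analyst", "assembler")
graph.add_edge("assembler", "causal_linker")
graph.add_edge("causal_linker", END)
pipeline = graph.compile()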

Strict Edge Taxonomy

To prevent "Schema Drift," the LLM is restricted to a fixed set of high-value verbs (see the enum sketch after this list):

  • Corporate: ACQUIRED, MERGED_WITH, SPUN_OFF, INVESTED_IN
  • Legal: SUED, FINED, INVESTIGATED_BY
  • Causal: CAUSED, EFFECTED_BY, CONTRIBUTED_TO, PREVENTED
  • Financial: RAISED_POLICY_RATE, REPORTED_FINANCIALS, ISSUED_GUIDANCE
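
One hypothetical way to pin this taxonomy down in code is a string enum that the LLM's structured output is validated against; the shipped schemas/relationship.py may define it differently.

from enum import Enum

# Hypothetical rendering of the edge taxonomy; schemas/relationship.py may differ.
class FactType(str, Enum):
    # Corporate
    ACQUIRED = "ACQUIRED"
    MERGED_WITH = "MERGED_WITH"
    SPUN_OFF = "SPUN_OFF"
    INVESTED_IN = "INVESTED_IN"
    # Legal
    SUED = "SUED"
    FINED = "FINED"
    INVESTIGATED_BY = "INVESTIGATED_BY"
    # Causal
    CAUSED = "CAUSED"
    EFFECTED_BY = "EFFECTED_BY"
    CONTRIBUTED_TO = "CONTRIBUTED_TO"
    PREVENTED = "PREVENTED"
    # Financial
    RAISED_POLICY_RATE = "RAISED_POLICY_RATE"
    REPORTED_FINANCIALS = "REPORTED_FINANCIALS"
    ISSUED_GUIDANCE = "ISSUED_GUIDANCE"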

API Endpoints

Ingestion

from zomma import KnowledgeGraph
 
kg = KnowledgeGraph()
 
# Ingest a document
result = kg.ingest(
    file_path="path/to/document.pdf",
    document_type="10-K",
    company_name="Apple Inc"
)
 
print(f"Extracted {result.fact_count} facts")
print(f"Resolved {result.entity_count} entities")

Entity Resolution

# Resolve an entity to FIBO
entity = kg.resolve_entity(
    name="Apple Inc",
    context="Technology company"
)
 
print(f"FIBO ID: {entity.fibo_id}")
print(f"Confidence: {entity.confidence}")

Graph Queries

Query the knowledge graph using Cypher:

# Find all acquisitions by a company
query = """
MATCH (company:Entity {name: 'Apple Inc'})-[:ACQUIRED]->(target:Entity)
RETURN target.name, target.acquired_date
"""
 
results = kg.query(query)
for result in results:
    print(f"Acquired: {result['target.name']} on {result['target.acquired_date']}")

Temporal Queries

Query facts within a specific time range:

# Get all facts about Apple from Q1 2024
facts = kg.get_facts(
    entity="Apple Inc",
    start_date="2024-01-01",
    end_date="2024-03-31",
    fact_types=["REPORTED_FINANCIALS", "ISSUED_GUIDANCE"]
)
 
for fact in facts:
    print(f"{fact.timestamp}: {fact.text}")
    print(f"Source: {fact.source_document}")

Setup & Usage

1. Initialize Indices

python -m zomma.scripts.setup_graph_index

2. Run Verification

python -m zomma.scripts.test_large_pipeline

3. Process Documents

python -m zomma.scripts.run_pipeline --input documents/ --output graph.db

Directory Structure

zomma/
├── agents/                     # The "Brains"
│   ├── atomizer.py             # Fact Extraction + Reflexion
│   ├── FIBO_librarian.py       # Hybrid Entity Resolution
│   ├── graph_assembler.py      # Fact-as-Node Creator
│   └── causal_linker.py        # Causal Inference Agent
├── schemas/                    # The "Contracts"
│   ├── atomic_fact.py
│   ├── relationship.py         # Expanded Relationship Enums
│   └── nodes.py                # Neo4j Node Definitions
├── workflows/                  # The "Orchestration"
│   └── main_pipeline.py        # LangGraph Pipeline
└── scripts/                    # Utilities
    ├── setup_graph_index.py    # Initializes Vector Indices
    ├── test_large_pipeline.py  # End-to-End Verification
    └── run_pipeline.py         # Batch Processing

Authentication

All API requests require an API key:

import zomma
 
zomma.api_key = "your-api-key-here"

Or via environment variable:

export ZOMMA_API_KEY="your-api-key-here"

Rate Limits

  • Free Tier: 1,000 requests/day, 10 documents/day
  • Pro Tier: 100,000 requests/day, 1,000 documents/day
  • Enterprise: Custom limits

Next Steps