Spike · GET-109

Network Intelligence System

Unified architecture for Getro's network intelligence: connection paths (spec 007), relationship strength + reachability + key connection (spec 008). Built on shared Sidekiq + Postgres caching infrastructure with Findem enrichment.

TL;DR

Two intelligence layers, one infrastructure. Spec 007 surfaces who can intro you to companies on your lists (direct + work-overlap intro paths). Spec 008 turns email + calendar metadata into Warm/Known/Cold strength tiers, reachability rollups, and key-connection picks. Both share the same per-pair caches, edge tables, and rollup architecture documented in the technical spec. 007 v1 ships independently of email/calendar ingestion; 008 thin V1 ships at ~50% heuristic coverage and grows to ~88% as Findem enrichment lands.

Technical spec

Unified architecture & execution plan

For engineers. The single source of truth for the system — services, data model, caching, integrations, phased plan covering both 007 and 008.

  • System architecture + ER diagrams
  • Data model: per-pair caches + edges + rollups
  • §6 Caching architecture (English + technical)
  • Rule engine & scoring service
  • Operational envelope (numbers)
  • Phased plan covering 007 v1/v2/v3 + 008 slices
  • Decision records — DR-01 through ADR-007-B
Open technical spec →
Data flow narrative

From data to strength

For PM, design, reviewers. What data we ingest and how it becomes a signal.

  • Raw data → signal primitives
  • Every heuristic in plain English + code
  • Combining clauses into tiers
  • Reachability rollup
  • Worked scenarios + scale walkthrough
  • Privacy guardrails
Open data-flow narrative →
Integration deep-dive

Google & Microsoft — per provider

For backend engineers. What exists, what extends, what's greenfield, with code skeletons and official doc links.

  • Verified current-state audit
  • OAuth flow + scope management
  • Gmail metadata + Calendar clients
  • MS Graph Mail + Calendar (greenfield)
  • Azure AD multi-tenant setup
  • Shared-mailbox decision matrix
Open integration deep-dive →
Architecture decision record

Graph DB vs Postgres for connection paths

For tech leads + reviewers. The long-form analysis behind ADR-007-A — why we chose Postgres tables over Neo4j, Apache AGE, and live JOINs.

  • The two query patterns (list view + drill-in)
  • Why this is not deeply graph-shaped
  • Options matrix & performance comparison
  • SQL vs Cypher side-by-side
  • Multi-edge-type schema strategy (5 path types)
  • Operational cost breakdown + revisit triggers
Open ADR companion →
Findem code audit

Findem code findings (firstcut + app-next)

For engineers + reviewers. What Findem already does, with clickable links to source files in firstcut and app-next.

  • The overlap kernel (formulas worth copying)
  • connection_svc batch path + 15k cap
  • LoadConnections runtime merge (backend + app-next consumer)
  • sandbox/matches — what it does and doesn't include
  • Profile data model (sparse connections, missing score)
  • Findem capability gaps F1–F9
Open code findings →
Architecture variant

Apache AGE — graph in Postgres

Variant: keep one database, add openCypher graph queries via the AGE extension. Smaller step toward graph capability than Neo4j; bigger step than recursive CTEs.

  • Graph schema (vlabels + elabels)
  • Cypher inside cypher() SQL functions
  • Same Postgres backups + connection pool
  • Hosting compatibility caveat (RDS doesn't support)
  • Operational envelope + migration cost
  • ADR-AGE — adopt only if conditions met
Open AGE variant →
Architecture variant

Neo4j — dedicated graph database

Variant: Postgres remains source of truth; Neo4j handles traversal and shortest-path. Best graph performance; significant operational overhead.

  • Two databases, sync layer (dual-write or CDC)
  • Native Cypher + shortestPath() + Graph Data Science
  • 4-hop drill-in feasible (~600ms vs ~15s in PG)
  • ~$200–800/month hosting (Aura managed)
  • 6–10 week migration + 2-week canary
  • ADR-NEO4J — adopt only if conditions met
Open Neo4j variant →
Performance benchmark

DeepFinder load test — depths 1 → 4

Production-scale benchmark of the Postgres recursive-CTE walk against a 500k-contact synthetic graph calibrated to Inovia Capital's real shape. Plain-English breakdown of every metric with charts.

  • 500k contacts, 500k orgs, 2.3M work_overlap edges
  • 4 independent runs × 4 depths = 6,500+ queries
  • Parameter sweeps: MAX_EDGES_PER_HOP, limit
  • Verdict: depth 4 fast (p95 ~70ms), 250ms timeout safe
  • Bottleneck is the LIMIT, not the SQL
  • Recommendations for cap tuning
Open performance benchmark →
Backfill scale reality check

Production backfill plan — what we're up against

Measured prod numbers, sandbox5 empirical throughput, and the realistic plan for running the work_overlap backfill at production scale.

  • Prod: 3.5M contacts to backfill, ~570M edges expected
  • Sandbox5: 12k edges/min sustained on optimized SQL
  • SQL benchmark: 619s → 15s per contact (40× speedup)
  • Estimated prod runtime: 8–30 hours depending on Sidekiq fleet
  • Known blocker: sandbox5 Sidekiq stability (interrupted ≠ resuming)
  • Prod runbook + acceptance criteria
Open backfill scale doc →
Load test (v1 vs v2 · local + sandbox5)

connection_paths: legacy vs FinderV2

k6 sweep across depths 1–4 × max_paths 25/100 × impl v1/v2 on both local and sandbox5 (real data). Real data exposes legacy Finder can't scale.

  • Sandbox5 depth=2: v2 5s p95, v1 38s p95 (7.6× faster)
  • Sandbox5 throughput: v2 4 RPS, v1 0.3 RPS (15× more)
  • v2 depth=1 sustains 18 RPS at 700ms p95
  • v2 depth=3 usable (~20s); depth=4 research-grade (~23–50s)
  • Local Rails dev numbers are misleading — trust sandbox5
Open load-test report →
Spec 039 explainer

Reachability & Strongest Connection filter

For PM, design, eng. The team-level rollup that turns the contacts list into an action list — column, filter, sort, profile chip, auto-update rules.

  • What "rollup" means + why a Postgres cache table
  • Why an OpenSearch mirror is unavoidable for filter/sort
  • 3 workers: 2 backfills (one-time) + 1 incremental
  • Onboarding sequence — strict 5-step order
  • What happens when a new admin joins + syncs
  • Alternatives considered + why rejected
Open reachability explainer →
Interactive playground

Try the API live

Hit the deep_connection_paths endpoint with a UI: pick a collection, contact, depth, and max_paths. See ranked paths visualized as chains. Auto-detects local vs sandbox5.

  • Form-driven request builder
  • Visualizes each path as a node chain
  • Shows latency, truncation, raw JSON
  • Auth token saved in localStorage
  • Cmd+Enter to fire query
Open playground →
Drafted 2026-04-22 Status · review Source markdown