GET-109 · Architecture variant

Neo4j: dedicated graph database

Architectural variant of the Network Intelligence System using Neo4j — a separate graph database alongside Postgres. Postgres remains the source of truth; Neo4j carries traversal and shortest-path queries. Best-in-class graph performance; significant operational overhead.

Drafted 2026-04-29 · Status: review · Audience: Getro engineering team · Sibling: Apache AGE variant

1. Overview

When this variant wins

Choose Neo4j if 4+ hop traversal is a confirmed product requirement, weighted shortest-path is the dominant query pattern, or you need built-in graph algorithms (centrality, community detection). Neo4j ships these natively at sub-second latency. The cost is real: a second database to operate, a sync pipeline to maintain, and a Cypher learning curve for the team.

Like the AGE variant, everything in canonical 008 stays the same — InteractionEvent ingestion, OAuth, Findem enrichment, rule engine, UI. Postgres remains the source of truth and stores all per-pair caches and rollups. Neo4j is added for one purpose only: traversal queries that exceed what Postgres can do efficiently.

1.1 What Neo4j brings to the table

  • Native graph storage: nodes and edges as first-class objects with O(1) traversal between them. No row-based table walks.
  • Cypher: the original openCypher implementation, mature query language designed for traversal.
  • Built-in algorithms: shortestPath(), allShortestPaths(), weighted Dijkstra, A*, PageRank, community detection — all native, all production-tested.
  • Bidirectional search: at 4+ hop, Neo4j explores from both ends simultaneously, dramatically pruning the search space.

1.2 Why this variant exists

The canonical Postgres plan handles 1+2 hop queries beautifully and 3-hop adequately. The AGE variant pushes 3-hop and shortest-path into Postgres-native graph syntax. Neo4j is the option for when traversal queries become structurally important enough to justify their own infrastructure — typically when 4-hop, weighted shortest-path, or centrality become product requirements.

2. System architecture

Two databases. Postgres is the source of truth. Neo4j is a derived index optimized for graph traversal. A sync pipeline keeps them consistent.

flowchart LR subgraph App["Rails app"] Hooks[after_commit hooks] Workers[Sidekiq workers] Services[Service layer
SQL or Cypher] end subgraph PG["Postgres (source of truth)"] Source[Source tables
contact_work_experiences
interaction_events
UCC] PgCaches[Postgres caches
WorkOverlapCache
UserContactInteractionStats
SharedListNetworkSummary] end subgraph Neo["Neo4j (traversal index)"] Graph[(Graph store
Contact, Org, School
WORKED_AT, STUDIED_AT,
SHARED_IN, CURRENT_AT)] end Source -- "writes" --> Hooks Hooks --> Workers Workers -- "writes Postgres" --> PgCaches Workers -- "writes Neo4j (sync)" --> Graph Services -- "SQL: rollups, aggregation,
per-pair lookups" --> PgCaches Services -- "SQL: source reads" --> Source Services -- "Cypher: traversal,
shortest-path" --> Graph classDef pg fill:#fef3c7,stroke:#b45309 classDef neo fill:#dbeafe,stroke:#1e40af class Source,PgCaches pg class Graph neo

2.1 What's same as canonical 008

  • All source ingestion (Phase 1–3, 6, 7) is identical.
  • Postgres remains the source of truth. Every record exists in Postgres first.
  • Per-pair caches and rollups stay in Postgres tables.

2.2 What's different

  • Neo4j cluster stands alongside Postgres in production.
  • Sync pipeline: every after_commit from a relevant table writes to both Postgres and Neo4j (dual-write) or via change-data-capture.
  • Cypher service layer: a Rails service for graph queries using the official Bolt-protocol Ruby driver.
  • Drill-in queries with traversal > 2 hops hit Neo4j; everything else stays in Postgres.

3. Data model (graph schema)

One Neo4j database. Five vertex labels, four relationship types for v1. Schema constraints define uniqueness and indexed properties.

3.0 What lives where

Before reading the schema, anchor on the storage boundary. Postgres remains the source of truth. Neo4j is added as a derived graph store for traversal queries — it does not replace any relational tables.

Layer Where it lives Why
Source data
(facts about the world)
contact_work_experiences (PG)
contact_education (PG)
UCC.shared (PG)
+ duplicated as :WORKED_AT, :STUDIED_AT, :SHARED_IN relationships in Neo4j
PG is required for Rails ActiveRecord, Findem sync, and every non-graph feature. Neo4j gets its own copy because graph traversal needs graph-shaped storage. Two physical databases, sync layer required (dual-write or CDC). This is the largest operational cost of this variant.
Derived edges
(inferred from source)
Neo4j only — computed at query time via pattern match, OR materialized as :COWORKER_OF relationships if read latency demands. No ContactCoemploymentEdge Postgres table in this variant. Native graph storage makes pattern match fast even without precomputation. Single source of derived truth — no PG ↔ Neo4j drift on inferred edges.
Per-pair caches
(strength signals)
PG only — UserContactInteractionStats, WorkOverlapCache, etc. Per-pair lookups don't benefit from graph structure. PG keyed lookups are fastest, and round-tripping to Neo4j adds Bolt-protocol overhead without value.
Rollup tables
(list-view aggregations)
PG only — SharedListNetworkSummary, ContactReachability Hot-path read budget (50ms p99) requires a single indexed Postgres lookup. Aggregation is SQL's home turf, and a Neo4j network round-trip alone exceeds the budget.

The duplication is in source data, not derived edges

Adopting Neo4j means duplicating raw facts (work history, education, UCCs) across two databases. It does not mean duplicating derived edges (coemployment, coeducation) — those move entirely into Neo4j. The unavoidable cost is the sync layer for source data; the avoided cost is precomputed-edge maintenance in the canonical PG plan. Cross-DB sync is the largest engineering risk in this variant — see §7 for sync strategies.

3.1 Schema setup

// Constraints (uniqueness + creates indexes automatically)
CREATE CONSTRAINT contact_id IF NOT EXISTS FOR (c:Contact) REQUIRE c.id IS UNIQUE;
CREATE CONSTRAINT org_id IF NOT EXISTS FOR (o:Organization) REQUIRE o.id IS UNIQUE;
CREATE CONSTRAINT school_id IF NOT EXISTS FOR (s:School) REQUIRE s.id IS UNIQUE;
CREATE CONSTRAINT user_id IF NOT EXISTS FOR (u:TeamUser) REQUIRE u.id IS UNIQUE;

// Additional property indexes
CREATE INDEX contact_email IF NOT EXISTS FOR (c:Contact) ON (c.primary_email);
CREATE INDEX worked_at_dates IF NOT EXISTS FOR ()-[r:WORKED_AT]-() ON (r.date_from, r.date_to);
CREATE INDEX shared_in_collection IF NOT EXISTS FOR ()-[r:SHARED_IN]-() ON (r.collection_id);

3.2 Node labels

LabelPropertiesSourced from Postgres
Contactid, name, primary_emailcontacts
Organizationid, name, linkedin_urlorganizations
Schoolid, namecanonical schools table
TeamUserid, emailusers
Collectionid, network_idcollections

3.3 Relationship types

TypeFrom → ToPropertiesSource
WORKED_ATContact → Organizationdate_from (date), date_to (date), title, is_current (bool)contact_work_experiences
STUDIED_ATContact → Schooldate_from, date_to, degreecontact_education
SHARED_INTeamUser → Contactcollection_id, source, connected_atUCC.shared
CURRENT_ATContact → Organizationstarted_at, titlederived: WORKED_AT where date_to IS NULL

Coemployment is implicit

Like the AGE variant, coemployment edges are not materialized. Neo4j computes them at query time via pattern match: (c1:Contact)-[w1:WORKED_AT]->(o:Organization)<-[w2:WORKED_AT]-(c2:Contact) with date-overlap predicate. Native graph storage makes this fast even without materialization.

3.4 Native typing wins here

Unlike AGE's jsonb properties, Neo4j has native types: date, datetime, bool, int, string. Date-overlap predicates run on indexed typed properties; performance is materially better than jsonb-based pattern matching.

4. Query patterns

Side-by-side: canonical Postgres (recursive CTE) vs Neo4j (Cypher).

4.1 Direct connection (1-hop)

Stays in Postgres. Neo4j wouldn't add value for keyed lookups.

-- Canonical Postgres (recommended for 1-hop)
SELECT contact_id, current_title
FROM   collection_org_current_shared_contacts
WHERE  collection_id = $1 AND organization_id = $2;

4.2 Work-overlap intro path (2-hop)

Either substrate works; Postgres is faster for 2-hop in practice once ContactCoemploymentEdge is precomputed.

Canonical Postgres (precomputed)

SELECT via_contact_id, other_contact_id,
       organization_id, overlap_from, overlap_to
FROM   contact_coemployment_edges
WHERE  collection_id = $1
  AND  other_contact_id IN (
         SELECT contact_id
         FROM   collection_org_current_shared_contacts
         WHERE  collection_id = $1
           AND  organization_id = $2);

Neo4j Cypher

MATCH (u:TeamUser)
      -[s:SHARED_IN {collection_id: $cid}]->
      (via:Contact)
      -[w1:WORKED_AT]->(bridge:Organization)
      <-[w2:WORKED_AT]-(target:Contact)
      -[:CURRENT_AT]->
      (targetOrg:Organization {id: $oid})
WHERE w1.date_from <= w2.date_to
  AND w2.date_from <= w1.date_to
RETURN via, target, bridge, w1, w2
ORDER BY w1.date_to DESC;

4.3 Extended intro path (3-hop and 4-hop)

This is where Neo4j wins decisively. Variable-length path matching with a depth bound is one Cypher line; the Postgres recursive CTE equivalent is dozens of lines and starts to fall over at depth 4.

// 4-hop intro path with date-overlap predicate
MATCH path = (u:TeamUser {id: $uid})
             -[:SHARED_IN {collection_id: $cid}]->
             (start:Contact)
             -[:WORKED_AT*1..4]-(:Contact)
             -[:CURRENT_AT]->
             (target:Organization {id: $oid})
WHERE all(rel IN [r IN relationships(path) WHERE type(r) = 'WORKED_AT']
          WHERE rel.date_from IS NOT NULL)
RETURN path, length(path) AS hops
ORDER BY hops ASC
LIMIT 25;

4.4 Weighted shortest-path

Native, sub-second.

// Find the cheapest intro path to anyone at TargetCo
MATCH (u:TeamUser {id: $uid}),
      (target:Organization {id: $oid})
CALL apoc.algo.dijkstra(u, target, 'WORKED_AT|SHARED_IN|CURRENT_AT', 'weight')
YIELD path, weight
RETURN path, weight
ORDER BY weight ASC
LIMIT 5;

Edge weights can be set per relationship at write time (e.g., SHARED_IN = 1, WORKED_AT overlap = 3, longer time gaps = higher weight). Adjust weights freely; queries adapt without changing structure.

4.5 Network analytics

Neo4j Graph Data Science library ships PageRank, betweenness centrality, community detection, etc. Useful for "find super-connectors in my network" or "cluster contacts by community" features. Out of scope for v1 but available as a future product capability.

5. Heuristic mapping

Every signal the system computes — every strength clause, the reachability rollup, key connection picker, and all 007 connection-path types — mapped to where it actually executes in the Neo4j variant. The point of this section: most heuristics still execute in Postgres. Neo4j earns its keep on traversal-shaped queries and unlocks graph algorithms (centrality, community detection) that no other substrate can match.

Bottom line for this section

Of 16 strength clauses + reachability + key connection + 6 connection-path types, Neo4j substantially helps with ~9 — same headline as AGE — but with native shortest-path, weighted traversal, and Graph Data Science algorithms that AGE doesn't have. The gap shows up at depth 4+ and on analytics queries; under depth 3 the difference is mostly syntactic.

5.1 Strength heuristics (008 — Warm / Known / Cold)

Counter-aggregation clauses stay in Postgres. Neo4j is invoked only when a clause needs graph traversal. The Postgres ↔ Neo4j boundary is identical to the AGE variant; only the graph-side syntax differs.

ID Clause Source data Substrate Neo4j involvement
W1 2-way exchange in 12mo, 5+ each direction UserContactInteractionStats counters PG only None — counter check
W2 Sustained 2-year back-and-forth (low volume) UserContactInteractionStats.email_activity_bitmap_12q PG only None
W3 Recent email (3mo) + response + work OR LinkedIn connection Stats + work overlap + LinkedIn graph Hybrid Neo4j answers the LinkedIn-1st-degree branch via MATCH (u)-[:LINKEDIN_CONNECTED]-(c). Single round-trip Bolt query; ~10–20ms latency including network. PG handles email recency.
W4 1+ calendar meeting in last 180d UserContactInteractionStats.meeting_count_180d PG only None
W5 Same-team / small company overlap Coemployment + org employee count Hybrid Cypher: MATCH (a:Contact)-[:WORKED_AT]-(o)-[:WORKED_AT]-(b:Contact) WITH a, b, o, size((o)<-[:WORKED_AT]-()) AS hc WHERE hc < 50. Native graph storage makes the size check fast.
K1 Occasional 2-way (3–10 emails total) UserContactInteractionStats PG only None
K2 Inbound non-newsletter within 24mo UserContactInteractionStats PG only None
K3 Meeting 180d–24mo ago UserContactInteractionStats.meeting_count_24mo PG only None
K4 Same-team / small company overlap (looser window) Same as W5 with relaxed predicates Hybrid Same Cypher pattern as W5 with looser threshold.
K5 Same company + location + function Work history with location + role filters Hybrid Multi-property pattern match: MATCH (a)-[w1:WORKED_AT]-(o)-[w2:WORKED_AT]-(b) WHERE w1.location = w2.location AND w1.function = w2.function. Native typing of properties (vs jsonb in AGE) makes this faster.
K6 LinkedIn connection AND (old work overlap OR shared school) LinkedIn graph + coemployment + edu Hybrid One Cypher query: MATCH (u)-[:LINKEDIN_CONNECTED]-(c) WITH u, c MATCH (u)-[:WORKED_AT|STUDIED_AT]-()-[:WORKED_AT|STUDIED_AT]-(c) RETURN .... Native multi-edge-type traversal.
C1 One-way outbound only (no reply) UserContactInteractionStats PG only None
C2 Inbound newsletter-style only UserContactInteractionStats with newsletter_flag PG only None
C3 LinkedIn 1st-degree with no other signal LinkedIn edge + absence of other signals Hybrid Neo4j confirms LinkedIn edge exists; PG confirms absence of email/meeting/coemployment. Cypher's EXISTS + negation handles the graph-side absence check natively.
C4 Past shared employment >5yr ago, no recent contact Coemployment with date filter + email recency Hybrid Cypher with native date math: MATCH (u)-[w:WORKED_AT]-()-[:WORKED_AT]-(c) WHERE w.date_to < date() - duration('P5Y'). Date types are first-class in Neo4j.
C5 Default fallback Always PG only None

5.2 Reachability (008 — High / Medium / Low / None)

Reachability rolls up strength tiers across all team members for a given (contact, collection): count Warm / Known / Cold owners, apply tier rules, write to ContactReachability.

  • Substrate: PG only
  • Why: pure aggregation of UCC strength columns. SQL GROUP BY with FILTER clauses.
  • Neo4j involvement: none. Aggregation is SQL's home turf; round-tripping to Neo4j costs latency without value.

5.3 Key Connection (008 — best person on team to ask for an intro)

Per (contact, collection), pick the team member with the strongest tie. Two flavors:

5.3.1 V1 — direct UCC pick

  • Substrate: PG only
  • Sort UCCs by tier (Warm > Known > Cold) then by strength_score; take first row.
  • Neo4j adds nothing; PG ORDER BY + LIMIT 1 is fastest.

5.3.2 V2 — best weighted path including indirect intros

  • Substrate: Neo4j wins
  • Cypher with shortestPath() or apoc.algo.dijkstra finds the cheapest weighted route from any team user to the contact, allowing intermediate via-contacts.
  • Native algorithm; sub-200ms even at depth 4. Equivalent in vanilla PG would be a custom recursive CTE with weighted cost accumulation — possible but ugly.
  • Sample query: MATCH (u:TeamUser), (c:Contact {id: $cid}) CALL apoc.algo.dijkstra(u, c, 'WORKED_AT|STUDIED_AT|SHARED_IN', 'cost') YIELD path, weight RETURN path, weight ORDER BY weight ASC LIMIT 1.

5.4 Connection paths (007 — direct + intro paths)

This is where Neo4j shines structurally. Native graph storage, native shortest-path, native multi-edge-type traversal. Latency at depth 3–4 is materially better than AGE.

Path type Spec phase Substrate Neo4j note
Direct connection (1-hop) 007 v1 PG rollup List view reads SharedListNetworkSummary; Neo4j round-trip overhead is a loss at depth 1.
Work-overlap intro path (2-hop) 007 v1 Neo4j or PG Native pattern match. PG with precomputed edges is competitive at depth 2; Neo4j wins at depth 3+.
Education-overlap intro path 007 v2 Neo4j primary Adding STUDIED_AT as a bridge type is one Cypher line. No new edge table, no new worker class, no migration.
Co-investment intro path 007 v3 Neo4j primary Same shape with INVESTED_IN edge label.
Investor → Employee path 007 v3 Neo4j primary Mixed-edge pattern is one query: (investor)-[:INVESTED_IN]->(c)<-[:CURRENT_AT]-(employee).
Board peer path 007 v3 Neo4j primary BOARD_OF as the bridge edge.
3-hop traversal (any combo) future feature Neo4j wins decisively Variable-length match: (u)-[:SHARED_IN]->(via)-[:WORKED_AT|STUDIED_AT|INVESTED_IN*1..3]-(target). Bidirectional search prunes the explored space.
4-hop traversal future feature Neo4j only feasible option ~600ms vs ~15s in vanilla PG and ~3s in AGE. This is the structural win.
Weighted shortest path / top-K paths future feature Neo4j wins Native Dijkstra via apoc.algo.dijkstra; native top-K via apoc.algo.kShortestPath. Sub-second at typical scale.
Centrality / super-connectors future feature Neo4j only Graph Data Science library: gds.pageRank.stream(...), gds.betweenness.stream(...). Not available in AGE or PG.
Community detection future feature Neo4j only gds.louvain or gds.labelPropagation. Useful for "cluster contacts in my collection by community" features.

5.5 Honest summary

Of 23 distinct queries the system needs to answer in v1/v2 (16 strength clauses + reachability + key connection + 6 path types), Neo4j meaningfully helps on ~9 — same as AGE. The Neo4j-specific upside shows up beyond v2:

  • 4-hop traversal becomes feasible (~600ms vs infeasible in PG, slow in AGE).
  • Native shortest-path with weights for Key Connection v2 and best-intro-path features.
  • Graph Data Science library opens centrality, community detection, PageRank — capabilities that don't exist in AGE or PG.

The decision becomes clear: Neo4j is worthwhile only if 4-hop traversal, weighted shortest-path, or graph algorithms become product features within the next 2–3 quarters. For the canonical 008 + 007 v1/v2 scope, the gap between Neo4j and AGE is mostly syntactic; the operational overhead is what determines the answer.

6. Caching architecture

The layered cache strategy from canonical 008 §6 still applies. Same per-cache reasoning, different placement.

LayerCanonical 008Neo4j variant
Layer 2 — per-pair caches Postgres tables Same — Postgres tables. Per-pair keyed lookups don't benefit from graph structure.
Layer 3 — cross-pair edges Postgres tables (ContactCoemploymentEdge) Neo4j relationships (computed at query time from WORKED_AT patterns). Or precomputed as :COWORKER_OF if read latency demands.
Layer 4 — list rollups Postgres tables Same — Postgres tables. Aggregation stays in SQL.

Neo4j's role is narrow: it answers traversal-shaped questions. Aggregation, filtering by enum, sorting by computed columns — all stay in Postgres. The result is a hybrid where each query lands in the substrate that handles it best.

7. Sync layer (the tricky part)

The biggest engineering cost in this variant. Postgres is source of truth; Neo4j must stay in sync. Three approaches, increasing complexity.

6.1 Dual-write from Sidekiq workers

The simplest approach. Every after_commit hook that writes to Postgres also enqueues a Sidekiq job that writes to Neo4j.

class ContactWorkExperience < ApplicationRecord
  after_commit :enqueue_neo4j_sync, on: [:create, :update, :destroy]

  def enqueue_neo4j_sync
    Neo4j::WorkExperienceSyncWorker.perform_async(id, action_taken)
  end
end
  • Pros: easy to reason about; all sync logic in Sidekiq; same retry semantics as other workers.
  • Cons: dual-write race conditions (Postgres committed but Neo4j write failed). Need reconciliation jobs to detect drift. If Sidekiq is backed up, Neo4j lags.

6.2 Change-data-capture (CDC) via Debezium / logical replication

Postgres logical replication streams every committed change to a CDC consumer that writes to Neo4j.

  • Pros: at-most-once delivery semantics from the WAL; no race conditions; lag is observable.
  • Cons: significant ops complexity; new infrastructure (Debezium, Kafka or equivalent); CDC schema mapping per table.

6.3 Periodic full rebuild

Nightly cron drops the Neo4j graph and rebuilds from scratch by reading Postgres.

  • Pros: simplest; correctness guaranteed.
  • Cons: stale by up to 24h; rebuild may take hours at scale.

6.4 Recommended: dual-write + nightly reconciliation

Start with 6.1 (dual-write) for write path. Add a nightly reconciliation job that compares row counts and sample-checks key entities; flags drift. Move to CDC only if dual-write proves unstable in production.

ApproachSetup effortOngoing opsLagDrift risk
Dual-write3–5 daysLowSeconds to minutesMedium
CDC3–4 weeksHighSub-secondLow
Periodic rebuild2 daysLowUp to 24hResolved nightly

8. Operational envelope

MetricCanonical 008Neo4j variantDelta
Storage per collection (Postgres + Neo4j combined) ~5 MB (v1) ~5 MB (PG) + ~50 MB (Neo4j) = ~55 MB 11× larger
List-view p99 ~30 ms ~30 ms (still hits Postgres rollup) Same
Drill-in 2-hop p99 ~80 ms ~150 ms (Neo4j Bolt round-trip) ~2× slower at depth 2
Drill-in 3-hop p99 ~1.5 s ~250 ms ~6× faster
Drill-in 4-hop p99 ~15 s (infeasible) ~600 ms 25× faster
Shortest-path Custom CTE work, ~1–3 s Native Dijkstra, ~150 ms Native support is the win
Network analytics (centrality, etc.) Not feasible Native via Graph Data Science library Unlocks new capabilities
Write amplification 1–30 UPSERTs (Postgres) 1–30 UPSERTs (PG) + parallel Neo4j writes Doubled write paths
New infrastructure None Neo4j cluster + sync pipeline Significant
Monthly hosting cost (rough) ~$0 incremental ~$200–800/month for managed Neo4j (Aura) Recurring

9. Migration cost

PhaseWorkEffort
1. Provision Neo4j Choose hosting (Neo4j Aura managed cloud, self-hosted on EC2/k8s, or Memgraph if avoiding the Neo4j brand). Set up dev/staging/prod clusters. HA configuration. Backup schedule. 1–2 weeks
2. Driver + connection pooling Add neo4j-ruby-driver to Gemfile. Configure connection pool (Bolt protocol, persistent connections). Wire credentials through Rails secrets. 2–3 days
3. Schema definition + constraints Write Cypher migration files (uniqueness constraints, indexes). Build a migration runner equivalent to db:migrate for Neo4j. 3–5 days
4. One-time bulk import Read Postgres tables (contacts, organizations, work experiences, education, UCCs). Emit Cypher CREATE/MERGE statements. Load into Neo4j. Verify counts. 5–7 days (depends on data volume)
5. Sync layer (dual-write + reconciliation) Sidekiq workers per source table. Reconciliation job. Drift monitoring. 5–7 days
6. Service-layer query migration Build a Neo4j::Query service. Migrate 3-hop drill-in to Cypher. Add shortest-path queries. Keep 1-hop and 2-hop in Postgres. 5–7 days
7. Test fixtures Build test helpers that populate Neo4j from FactoryBot factories. RSpec teardown that cleans Neo4j between tests. 3–5 days
8. Observability Wire Neo4j metrics into existing dashboards. Alert on Neo4j down, sync lag, query latency degradation. 3 days
9. Production rollout + canary Dual-read for canary period (compare Cypher vs CTE results). Promote when result parity hits 99.9%+. Tear down Postgres-side traversal code. 5 days + 2-week canary

Total: 6–10 weeks engineering, plus a 2-week canary. Add 2–4 weeks for any team upskilling on Cypher.

10. What you give up

ConcernReality
Two databases to operate Backups, monitoring, security review, on-call rotation, version upgrades — all double. Even with managed Neo4j (Aura), there's still a separate vendor relationship, separate billing, separate auth.
Sync drift risk Dual-write means Postgres and Neo4j can diverge if a write fails on one side. Reconciliation jobs catch most drift but add latency. CDC eliminates the risk but adds significant complexity.
Hosting cost Managed Neo4j Aura: ~$200/month for entry, ~$800/month for production-grade. Self-hosted: free but requires ops effort. Either way, recurring cost the canonical plan doesn't have.
Cypher learning curve Two query languages in the codebase. Engineers context-switch. Code reviews need both expertises. New hires need both.
Backups Neo4j backup is its own pipeline (no pg_dump equivalent across both databases). Restoring to a consistent snapshot across both stores requires careful coordination.
Cross-DB transactions No way to atomically write to both Postgres and Neo4j. Have to handle "Postgres committed, Neo4j failed" as a recoverable inconsistency.
Schema migrations across two stores Adding a new entity type means migrations in both. Adding a property means coordinating both. Easy to forget one.
License (Enterprise features) Community edition covers core graph + Cypher. Enterprise features (clustering for HA, hot backups, fine-grained security, Graph Data Science scale) require a license. Aura includes Enterprise; self-hosted requires explicit licensing.

9.1 The honest summary

Neo4j is the right answer when graph traversal is core to the product, not when it's a side feature. If 4+ hop traversal, weighted shortest-path, and graph algorithms are 30%+ of the read pattern, Neo4j pays for itself. If they're occasional drill-in queries, the operational cost outweighs the latency win — that's when AGE or vanilla Postgres CTEs are the right answer.

11. Mini-ADR

ADR-NEO4J Adopt Neo4j as a derived graph store alongside Postgres Hypothetical
Context
008 has confirmed product requirements for 4+ hop intro paths, weighted shortest-path ranking, and network analytics (centrality / super-connector identification). Postgres recursive CTEs cannot meet these requirements at the latency budget. Apache AGE is closer but still slower than dedicated graph storage at depth 4+. Neo4j is the substrate of choice for traversal-heavy workloads.
Decision
Add Neo4j as a derived graph store. Postgres remains the source of truth — every write goes to Postgres first. Sidekiq workers dual-write to Neo4j on after_commit. The service layer routes queries: aggregations and per-pair lookups stay in Postgres; traversal > 2 hops, shortest-path, and graph algorithms hit Neo4j.
Consequences
  • + 4-hop drill-in becomes feasible (~600ms vs ~15s in Postgres).
  • + Native shortest-path and weighted Dijkstra unlock ranking features.
  • + Graph Data Science library opens centrality and community-detection features.
  • + Graph schema is more readable (Cypher).
  • − Two databases to operate; two backups; two on-call surfaces.
  • − Sync drift between Postgres and Neo4j is a real failure mode.
  • − Hosting cost: ~$200–800/month for managed Neo4j Aura.
  • − 6–10 weeks engineering + 2-week canary to ship.
  • − Team upskilling on Cypher is required.
Reversibility
Low. Once Neo4j is in production, removing it requires migrating all graph queries back to Postgres (recursive CTEs) and decommissioning the cluster. ~4 weeks of work plus production careful-rollback. Treat the decision as semi-permanent.
Trigger to revisit
If graph queries fall to <5% of read traffic; if Postgres alone proves sufficient (e.g., AGE matures and matches Neo4j performance for our workload); if hosting cost outpaces feature value.
Adopt only if
(a) 4+ hop traversal or weighted shortest-path is a confirmed product requirement, (b) team has appetite for operating two databases, (c) hosting budget includes ~$500/month line item indefinitely, (d) graph algorithms (centrality, community detection) are on the product roadmap.