Neo4j: dedicated graph database
Architectural variant of the Network Intelligence System using Neo4j — a separate graph database alongside Postgres. Postgres remains the source of truth; Neo4j carries traversal and shortest-path queries. Best-in-class graph performance; significant operational overhead.
Drafted 2026-04-29 · Status: review · Audience: Getro engineering team · Sibling: Apache AGE variant
1. Overview
When this variant wins
Choose Neo4j if 4+ hop traversal is a confirmed product requirement, weighted shortest-path is the dominant query pattern, or you need built-in graph algorithms (centrality, community detection). Neo4j ships these natively at sub-second latency. The cost is real: a second database to operate, a sync pipeline to maintain, and a Cypher learning curve for the team.
Like the AGE variant, everything in canonical 008 stays the same — InteractionEvent ingestion, OAuth, Findem enrichment, rule engine, UI. Postgres remains the source of truth and stores all per-pair caches and rollups. Neo4j is added for one purpose only: traversal queries that exceed what Postgres can do efficiently.
1.1 What Neo4j brings to the table
- Native graph storage: nodes and edges as first-class objects with O(1) traversal between them. No row-based table walks.
- Cypher: the original openCypher implementation, mature query language designed for traversal.
- Built-in algorithms: shortestPath(), allShortestPaths(), weighted Dijkstra, A*, PageRank, community detection — all native, all production-tested.
- Bidirectional search: at 4+ hops, Neo4j explores from both ends simultaneously, dramatically pruning the search space.
1.2 Why this variant exists
The canonical Postgres plan handles 1- and 2-hop queries beautifully and 3-hop adequately. The AGE variant pushes 3-hop and shortest-path into Postgres-native graph syntax. Neo4j is the option for when traversal queries become structurally important enough to justify their own infrastructure — typically when 4-hop traversal, weighted shortest-path, or centrality becomes a product requirement.
2. System architecture
Two databases. Postgres is the source of truth. Neo4j is a derived index optimized for graph traversal. A sync pipeline keeps them consistent.
```mermaid
flowchart TB
  subgraph App["Rails app"]
    Hooks["after_commit hooks"]
    Workers["Sidekiq workers"]
    Services["Service layer<br/>SQL or Cypher"]
  end
  subgraph PG["Postgres (source of truth)"]
    Source["Source tables<br/>contact_work_experiences<br/>interaction_events<br/>UCC"]
    PgCaches["Postgres caches<br/>WorkOverlapCache<br/>UserContactInteractionStats<br/>SharedListNetworkSummary"]
  end
  subgraph Neo["Neo4j (traversal index)"]
    Graph[("Graph store<br/>Contact, Org, School<br/>WORKED_AT, STUDIED_AT,<br/>SHARED_IN, CURRENT_AT")]
  end
  Source -- "writes" --> Hooks
  Hooks --> Workers
  Workers -- "writes Postgres" --> PgCaches
  Workers -- "writes Neo4j (sync)" --> Graph
  Services -- "SQL: rollups, aggregation,<br/>per-pair lookups" --> PgCaches
  Services -- "SQL: source reads" --> Source
  Services -- "Cypher: traversal,<br/>shortest-path" --> Graph
  classDef pg fill:#fef3c7,stroke:#b45309
  classDef neo fill:#dbeafe,stroke:#1e40af
  class Source,PgCaches pg
  class Graph neo
```
2.1 What's same as canonical 008
- All source ingestion (Phase 1–3, 6, 7) is identical.
- Postgres remains the source of truth. Every record exists in Postgres first.
- Per-pair caches and rollups stay in Postgres tables.
2.2 What's different
- Neo4j cluster stands alongside Postgres in production.
- Sync pipeline: every after_commit from a relevant table writes to both Postgres and Neo4j (dual-write), or changes flow via change-data-capture.
- Cypher service layer: a Rails service for graph queries using the official Bolt-protocol Ruby driver.
- Drill-in queries with traversal > 2 hops hit Neo4j; everything else stays in Postgres.
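The routing rule can be sketched as a small service-layer helper. This is an illustrative sketch, not the actual codebase — the class name `GraphQueryRouter` and the query-kind symbols are assumptions; only the routing policy (traversal deeper than 2 hops goes to Neo4j, everything else stays in Postgres) comes from this section.

```ruby
# Hypothetical sketch of the substrate-routing rule described above.
# Class and symbol names are illustrative, not actual codebase names.
class GraphQueryRouter
  # Returns :neo4j or :postgres for a query described by kind + hop depth.
  def self.substrate_for(kind:, hops: 1)
    return :neo4j if %i[shortest_path graph_algorithm].include?(kind)
    return :neo4j if kind == :traversal && hops > 2

    :postgres # aggregations, rollups, per-pair lookups, shallow traversal
  end
end

GraphQueryRouter.substrate_for(kind: :per_pair_lookup)     # => :postgres
GraphQueryRouter.substrate_for(kind: :traversal, hops: 2)  # => :postgres
GraphQueryRouter.substrate_for(kind: :traversal, hops: 4)  # => :neo4j
GraphQueryRouter.substrate_for(kind: :shortest_path)       # => :neo4j
```

Centralizing the decision in one place keeps the PG/Neo4j boundary auditable as query patterns evolve.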
3. Data model (graph schema)
One Neo4j database. Five node labels and four relationship types for v1. Schema constraints define uniqueness and indexed properties.
3.0 What lives where
Before reading the schema, anchor on the storage boundary. Postgres remains the source of truth. Neo4j is added as a derived graph store for traversal queries — it does not replace any relational tables.
| Layer | Where it lives | Why |
|---|---|---|
| Source data (facts about the world) | contact_work_experiences (PG), contact_education (PG), UCC.shared (PG) — duplicated as :WORKED_AT, :STUDIED_AT, :SHARED_IN relationships in Neo4j | PG is required for Rails ActiveRecord, Findem sync, and every non-graph feature. Neo4j gets its own copy because graph traversal needs graph-shaped storage. Two physical databases, sync layer required (dual-write or CDC). This is the largest operational cost of this variant. |
| Derived edges (inferred from source) | Neo4j only — computed at query time via pattern match, or materialized as :COWORKER_OF relationships if read latency demands | No ContactCoemploymentEdge Postgres table in this variant. Native graph storage makes pattern match fast even without precomputation. Single source of derived truth — no PG ↔ Neo4j drift on inferred edges. |
| Per-pair caches (strength signals) | PG only — UserContactInteractionStats, WorkOverlapCache, etc. | Per-pair lookups don't benefit from graph structure. PG keyed lookups are fastest, and round-tripping to Neo4j adds Bolt-protocol overhead without value. |
| Rollup tables (list-view aggregations) | PG only — SharedListNetworkSummary, ContactReachability | Hot-path read budget (50ms p99) requires a single indexed Postgres lookup. Aggregation is SQL's home turf, and a Neo4j network round-trip alone exceeds the budget. |
The duplication is in source data, not derived edges
Adopting Neo4j means duplicating raw facts (work history, education, UCCs) across two databases. It does not mean duplicating derived edges (coemployment, coeducation) — those move entirely into Neo4j. The unavoidable cost is the sync layer for source data; the avoided cost is precomputed-edge maintenance in the canonical PG plan. Cross-DB sync is the largest engineering risk in this variant — see §7 for sync strategies.
3.1 Schema setup
// Constraints (uniqueness + creates indexes automatically)
CREATE CONSTRAINT contact_id IF NOT EXISTS FOR (c:Contact) REQUIRE c.id IS UNIQUE;
CREATE CONSTRAINT org_id IF NOT EXISTS FOR (o:Organization) REQUIRE o.id IS UNIQUE;
CREATE CONSTRAINT school_id IF NOT EXISTS FOR (s:School) REQUIRE s.id IS UNIQUE;
CREATE CONSTRAINT user_id IF NOT EXISTS FOR (u:TeamUser) REQUIRE u.id IS UNIQUE;
CREATE CONSTRAINT collection_id IF NOT EXISTS FOR (col:Collection) REQUIRE col.id IS UNIQUE;
// Additional property indexes
CREATE INDEX contact_email IF NOT EXISTS FOR (c:Contact) ON (c.primary_email);
CREATE INDEX worked_at_dates IF NOT EXISTS FOR ()-[r:WORKED_AT]-() ON (r.date_from, r.date_to);
CREATE INDEX shared_in_collection IF NOT EXISTS FOR ()-[r:SHARED_IN]-() ON (r.collection_id);
3.2 Node labels
| Label | Properties | Sourced from Postgres |
|---|---|---|
| Contact | id, name, primary_email | contacts |
| Organization | id, name, linkedin_url | organizations |
| School | id, name | canonical schools table |
| TeamUser | id, email | users |
| Collection | id, network_id | collections |
3.3 Relationship types
| Type | From → To | Properties | Source |
|---|---|---|---|
| WORKED_AT | Contact → Organization | date_from (date), date_to (date), title, is_current (bool) | contact_work_experiences |
| STUDIED_AT | Contact → School | date_from, date_to, degree | contact_education |
| SHARED_IN | TeamUser → Contact | collection_id, source, connected_at | UCC.shared |
| CURRENT_AT | Contact → Organization | started_at, title | derived: WORKED_AT where date_to IS NULL |
Coemployment is implicit
Like the AGE variant, coemployment edges are not materialized. Neo4j computes them at query time via pattern match: (c1:Contact)-[w1:WORKED_AT]->(o:Organization)<-[w2:WORKED_AT]-(c2:Contact) with date-overlap predicate. Native graph storage makes this fast even without materialization.
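The date-overlap predicate that gates coemployment can be stated in plain Ruby for clarity — a hypothetical helper, since the real check runs inside the Cypher WHERE clause: two tenures overlap when each starts before the other ends, with an open-ended date_to treated as still current.

```ruby
require "date"

# Sketch of the coemployment date-overlap predicate, in plain Ruby.
# A nil date_to means "still employed" and is treated as today.
def tenures_overlap?(a_from:, a_to:, b_from:, b_to:)
  a_to ||= Date.today
  b_to ||= Date.today
  a_from <= b_to && b_from <= a_to
end

# Alice 2015–2018, Bob 2017–2020 at the same org → overlapping coworkers
tenures_overlap?(a_from: Date.new(2015, 1, 1), a_to: Date.new(2018, 6, 30),
                 b_from: Date.new(2017, 1, 1), b_to: Date.new(2020, 1, 1)) # => true

# Disjoint tenures at the same org → no coemployment
tenures_overlap?(a_from: Date.new(2010, 1, 1), a_to: Date.new(2012, 1, 1),
                 b_from: Date.new(2017, 1, 1), b_to: nil) # => false
```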
3.4 Native typing wins here
Unlike AGE's jsonb properties, Neo4j has native types: date, datetime, bool, int, string. Date-overlap predicates run on indexed typed properties; performance is materially better than jsonb-based pattern matching.
4. Query patterns
Side-by-side: canonical Postgres (recursive CTE) vs Neo4j (Cypher).
4.1 Direct connection (1-hop)
Stays in Postgres. Neo4j wouldn't add value for keyed lookups.
-- Canonical Postgres (recommended for 1-hop)
SELECT contact_id, current_title
FROM collection_org_current_shared_contacts
WHERE collection_id = $1 AND organization_id = $2;
4.2 Work-overlap intro path (2-hop)
Either substrate works; Postgres is faster for 2-hop in practice once ContactCoemploymentEdge is precomputed.
Canonical Postgres (precomputed)
SELECT via_contact_id, other_contact_id,
organization_id, overlap_from, overlap_to
FROM contact_coemployment_edges
WHERE collection_id = $1
AND other_contact_id IN (
SELECT contact_id
FROM collection_org_current_shared_contacts
WHERE collection_id = $1
AND organization_id = $2);
Neo4j Cypher
MATCH (u:TeamUser)
-[s:SHARED_IN {collection_id: $cid}]->
(via:Contact)
-[w1:WORKED_AT]->(bridge:Organization)
<-[w2:WORKED_AT]-(target:Contact)
-[:CURRENT_AT]->
(targetOrg:Organization {id: $oid})
WHERE w1.date_from <= w2.date_to
AND w2.date_from <= w1.date_to
RETURN via, target, bridge, w1, w2
ORDER BY w1.date_to DESC;
4.3 Extended intro path (3-hop and 4-hop)
This is where Neo4j wins decisively. Variable-length path matching with a depth bound is one Cypher line; the Postgres recursive CTE equivalent is dozens of lines and starts to fall over at depth 4.
// 4-hop intro path with date-overlap predicate
MATCH path = (u:TeamUser {id: $uid})
-[:SHARED_IN {collection_id: $cid}]->
(start:Contact)
-[:WORKED_AT*1..4]-(:Contact)
-[:CURRENT_AT]->
(target:Organization {id: $oid})
WHERE all(rel IN [r IN relationships(path) WHERE type(r) = 'WORKED_AT']
WHERE rel.date_from IS NOT NULL)
RETURN path, length(path) AS hops
ORDER BY hops ASC
LIMIT 25;
4.4 Weighted shortest-path
Native, sub-second.
// Find the cheapest intro path to anyone at TargetCo
MATCH (u:TeamUser {id: $uid}),
(target:Organization {id: $oid})
CALL apoc.algo.dijkstra(u, target, 'WORKED_AT|SHARED_IN|CURRENT_AT', 'weight')
YIELD path, weight
RETURN path, weight
ORDER BY weight ASC
LIMIT 5;
Edge weights can be set per relationship at write time (e.g., SHARED_IN = 1, WORKED_AT overlap = 3, longer time gaps = higher weight). Adjust weights freely; queries adapt without changing structure.
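A minimal sketch of how the write path might assign those weights. The base values come from the example in the paragraph above (SHARED_IN = 1, WORKED_AT = 3); the staleness penalty of one weight unit per year is an illustrative assumption, not a tuned scheme.

```ruby
require "date"

# Hypothetical per-relationship weight assignment for Dijkstra. Base
# values are the example numbers above; the +1-per-stale-year penalty
# is illustrative only.
BASE_WEIGHTS = { "SHARED_IN" => 1.0, "WORKED_AT" => 3.0, "CURRENT_AT" => 1.0 }.freeze

def edge_weight(rel_type, ended_on: nil, today: Date.today)
  base = BASE_WEIGHTS.fetch(rel_type)
  return base if ended_on.nil? # current relationships carry no staleness penalty

  years_stale = (today - ended_on).to_f / 365.25
  base + [years_stale, 0.0].max # longer time gaps = higher weight
end

edge_weight("SHARED_IN")                       # => 1.0
edge_weight("WORKED_AT", ended_on: Date.today) # => 3.0
```

Because the weight lives on the relationship, retuning the scheme is a property rewrite; the Dijkstra query itself never changes.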
4.5 Network analytics
Neo4j Graph Data Science library ships PageRank, betweenness centrality, community detection, etc. Useful for "find super-connectors in my network" or "cluster contacts by community" features. Out of scope for v1 but available as a future product capability.
5. Heuristic mapping
Every signal the system computes — every strength clause, the reachability rollup, the key connection picker, and all 007 connection-path types — is mapped below to where it actually executes in the Neo4j variant. The point of this section: most heuristics still execute in Postgres. Neo4j earns its keep on traversal-shaped queries and unlocks graph algorithms (centrality, community detection) that no other substrate can match.
Bottom line for this section
Of 16 strength clauses + reachability + key connection + 6 connection-path types, Neo4j substantially helps with ~9 — same headline as AGE — but with native shortest-path, weighted traversal, and Graph Data Science algorithms that AGE doesn't have. The gap shows up at depth 4+ and on analytics queries; under depth 3 the difference is mostly syntactic.
5.1 Strength heuristics (008 — Warm / Known / Cold)
Counter-aggregation clauses stay in Postgres. Neo4j is invoked only when a clause needs graph traversal. The Postgres ↔ Neo4j boundary is identical to the AGE variant; only the graph-side syntax differs.
| ID | Clause | Source data | Substrate | Neo4j involvement |
|---|---|---|---|---|
| W1 | 2-way exchange in 12mo, 5+ each direction | UserContactInteractionStats counters | PG only | None — counter check |
| W2 | Sustained 2-year back-and-forth (low volume) | UserContactInteractionStats.email_activity_bitmap_12q | PG only | None |
| W3 | Recent email (3mo) + response + work OR LinkedIn connection | Stats + work overlap + LinkedIn graph | Hybrid | Neo4j answers the LinkedIn-1st-degree branch via MATCH (u)-[:LINKEDIN_CONNECTED]-(c). Single round-trip Bolt query; ~10–20ms latency including network. PG handles email recency. |
| W4 | 1+ calendar meeting in last 180d | UserContactInteractionStats.meeting_count_180d | PG only | None |
| W5 | Same-team / small company overlap | Coemployment + org employee count | Hybrid | Cypher: MATCH (a:Contact)-[:WORKED_AT]-(o)-[:WORKED_AT]-(b:Contact) WITH a, b, o, size((o)<-[:WORKED_AT]-()) AS hc WHERE hc < 50. Native graph storage makes the size check fast. |
| K1 | Occasional 2-way (3–10 emails total) | UserContactInteractionStats | PG only | None |
| K2 | Inbound non-newsletter within 24mo | UserContactInteractionStats | PG only | None |
| K3 | Meeting 180d–24mo ago | UserContactInteractionStats.meeting_count_24mo | PG only | None |
| K4 | Same-team / small company overlap (looser window) | Same as W5 with relaxed predicates | Hybrid | Same Cypher pattern as W5 with a looser threshold. |
| K5 | Same company + location + function | Work history with location + role filters | Hybrid | Multi-property pattern match: MATCH (a)-[w1:WORKED_AT]-(o)-[w2:WORKED_AT]-(b) WHERE w1.location = w2.location AND w1.function = w2.function. Native typing of properties (vs jsonb in AGE) makes this faster. |
| K6 | LinkedIn connection AND (old work overlap OR shared school) | LinkedIn graph + coemployment + edu | Hybrid | One Cypher query: MATCH (u)-[:LINKEDIN_CONNECTED]-(c) WITH u, c MATCH (u)-[:WORKED_AT|STUDIED_AT]-()-[:WORKED_AT|STUDIED_AT]-(c) RETURN .... Native multi-edge-type traversal. |
| C1 | One-way outbound only (no reply) | UserContactInteractionStats | PG only | None |
| C2 | Inbound newsletter-style only | UserContactInteractionStats with newsletter_flag | PG only | None |
| C3 | LinkedIn 1st-degree with no other signal | LinkedIn edge + absence of other signals | Hybrid | Neo4j confirms the LinkedIn edge exists; PG confirms the absence of email/meeting/coemployment. Cypher's EXISTS + negation handles the graph-side absence check natively. |
| C4 | Past shared employment >5yr ago, no recent contact | Coemployment with date filter + email recency | Hybrid | Cypher with native date math: MATCH (u)-[w:WORKED_AT]-()-[:WORKED_AT]-(c) WHERE w.date_to < date() - duration('P5Y'). Date types are first-class in Neo4j. |
| C5 | Default fallback | Always | PG only | None |
5.2 Reachability (008 — High / Medium / Low / None)
Reachability rolls up strength tiers across all team members for a given (contact, collection): count Warm / Known / Cold owners, apply tier rules, write to ContactReachability.
- Substrate: PG only
- Why: pure aggregation of UCC strength columns. SQL GROUP BY with FILTER clauses.
- Neo4j involvement: none. Aggregation is SQL's home turf; round-tripping to Neo4j costs latency without value.
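A sketch of the rollup shape in Ruby terms. The tier thresholds below are illustrative assumptions — the actual tier rules live in canonical 008 and are not restated here; only the structure (count Warm/Known/Cold owners, map counts to a reachability tier) comes from this section.

```ruby
# Hypothetical sketch of the reachability rollup. The thresholds are
# illustrative only; the real tier rules are defined in canonical 008.
def reachability_tier(warm_count:, known_count:, cold_count:)
  return :high   if warm_count >= 1
  return :medium if known_count >= 1
  return :low    if cold_count >= 1

  :none
end

# Roll up per-owner strength tiers for one (contact, collection) pair.
def roll_up(owner_tiers)
  counts = owner_tiers.tally # e.g. { warm: 1, cold: 2 }
  reachability_tier(warm_count:  counts.fetch(:warm, 0),
                    known_count: counts.fetch(:known, 0),
                    cold_count:  counts.fetch(:cold, 0))
end

roll_up(%i[warm cold cold]) # => :high
roll_up(%i[cold cold])      # => :low
roll_up([])                 # => :none
```

In production this is a single GROUP BY ... FILTER statement writing to ContactReachability, not application-side iteration.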
5.3 Key Connection (008 — best person on team to ask for an intro)
Per (contact, collection), pick the team member with the strongest tie. Two flavors:
5.3.1 V1 — direct UCC pick
- Substrate: PG only
- Sort UCCs by tier (Warm > Known > Cold), then by strength_score; take the first row.
- Neo4j adds nothing; PG ORDER BY + LIMIT 1 is fastest.
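The v1 pick is a plain sort. A sketch with UCC rows shown as hashes for illustration — in the app this is one SQL ORDER BY ... LIMIT 1 against the UCC table, not Ruby-side sorting:

```ruby
# Sketch of the Key Connection v1 pick: order by tier (Warm > Known >
# Cold), then by strength_score descending, take the first row.
TIER_RANK = { warm: 0, known: 1, cold: 2 }.freeze

def key_connection(uccs)
  uccs.min_by { |u| [TIER_RANK.fetch(u[:tier]), -u[:strength_score]] }
end

uccs = [
  { user_id: 1, tier: :cold, strength_score: 90 },
  { user_id: 2, tier: :warm, strength_score: 40 },
  { user_id: 3, tier: :warm, strength_score: 75 },
]
key_connection(uccs)[:user_id] # => 3 (warm beats cold; higher score wins)
```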
5.3.2 V2 — best weighted path including indirect intros
- Substrate: Neo4j wins
- Cypher with shortestPath() or apoc.algo.dijkstra finds the cheapest weighted route from any team user to the contact, allowing intermediate via-contacts.
- Native algorithm; sub-200ms even at depth 4. The equivalent in vanilla PG would be a custom recursive CTE with weighted cost accumulation — possible but ugly.
- Sample query: MATCH (u:TeamUser), (c:Contact {id: $cid}) CALL apoc.algo.dijkstra(u, c, 'WORKED_AT|STUDIED_AT|SHARED_IN', 'cost') YIELD path, weight RETURN path, weight ORDER BY weight ASC LIMIT 1.
5.4 Connection paths (007 — direct + intro paths)
This is where Neo4j shines structurally. Native graph storage, native shortest-path, native multi-edge-type traversal. Latency at depth 3–4 is materially better than AGE.
| Path type | Spec phase | Substrate | Neo4j note |
|---|---|---|---|
| Direct connection (1-hop) | 007 v1 | PG rollup | List view reads SharedListNetworkSummary; Neo4j round-trip overhead is a loss at depth 1. |
| Work-overlap intro path (2-hop) | 007 v1 | Neo4j or PG | Native pattern match. PG with precomputed edges is competitive at depth 2; Neo4j wins at depth 3+. |
| Education-overlap intro path | 007 v2 | Neo4j primary | Adding STUDIED_AT as a bridge type is one Cypher line. No new edge table, no new worker class, no migration. |
| Co-investment intro path | 007 v3 | Neo4j primary | Same shape with INVESTED_IN edge label. |
| Investor → Employee path | 007 v3 | Neo4j primary | Mixed-edge pattern is one query: (investor)-[:INVESTED_IN]->(c)<-[:CURRENT_AT]-(employee). |
| Board peer path | 007 v3 | Neo4j primary | BOARD_OF as the bridge edge. |
| 3-hop traversal (any combo) | future feature | Neo4j wins decisively | Variable-length match: (u)-[:SHARED_IN]->(via)-[:WORKED_AT|STUDIED_AT|INVESTED_IN*1..3]-(target). Bidirectional search prunes the explored space. |
| 4-hop traversal | future feature | Neo4j only feasible option | ~600ms vs ~15s in vanilla PG and ~3s in AGE. This is the structural win. |
| Weighted shortest path / top-K paths | future feature | Neo4j wins | Native Dijkstra via apoc.algo.dijkstra; native top-K via apoc.algo.kShortestPath. Sub-second at typical scale. |
| Centrality / super-connectors | future feature | Neo4j only | Graph Data Science library: gds.pageRank.stream(...), gds.betweenness.stream(...). Not available in AGE or PG. |
| Community detection | future feature | Neo4j only | gds.louvain or gds.labelPropagation. Useful for "cluster contacts in my collection by community" features. |
5.5 Honest summary
Of the 24 distinct queries the system needs to answer in v1/v2 (16 strength clauses + reachability + key connection + 6 path types), Neo4j meaningfully helps on ~9 — same as AGE. The Neo4j-specific upside shows up beyond v2:
- 4-hop traversal becomes feasible (~600ms vs infeasible in PG, slow in AGE).
- Native shortest-path with weights for Key Connection v2 and best-intro-path features.
- Graph Data Science library opens centrality, community detection, PageRank — capabilities that don't exist in AGE or PG.
The decision becomes clear: Neo4j is worthwhile only if 4-hop traversal, weighted shortest-path, or graph algorithms become product features within the next 2–3 quarters. For the canonical 008 + 007 v1/v2 scope, the gap between Neo4j and AGE is mostly syntactic; the operational overhead is what determines the answer.
6. Caching architecture
The layered cache strategy from canonical 008 §6 still applies. Same per-cache reasoning, different placement.
| Layer | Canonical 008 | Neo4j variant |
|---|---|---|
| Layer 2 — per-pair caches | Postgres tables | Same — Postgres tables. Per-pair keyed lookups don't benefit from graph structure. |
| Layer 3 — cross-pair edges | Postgres tables (ContactCoemploymentEdge) |
Neo4j relationships (computed at query time from WORKED_AT patterns). Or precomputed as :COWORKER_OF if read latency demands. |
| Layer 4 — list rollups | Postgres tables | Same — Postgres tables. Aggregation stays in SQL. |
Neo4j's role is narrow: it answers traversal-shaped questions. Aggregation, filtering by enum, sorting by computed columns — all stay in Postgres. The result is a hybrid where each query lands in the substrate that handles it best.
7. Sync layer (the tricky part)
The biggest engineering cost in this variant. Postgres is source of truth; Neo4j must stay in sync. Three approaches, increasing complexity.
7.1 Dual-write from Sidekiq workers
The simplest approach. Every after_commit hook that writes to Postgres also enqueues a Sidekiq job that writes to Neo4j.
class ContactWorkExperience < ApplicationRecord
  after_commit :enqueue_neo4j_sync, on: [:create, :update, :destroy]

  def enqueue_neo4j_sync
    # after_commit doesn't pass the action, so derive it (Rails 6.1+).
    action = destroyed? ? "destroy" : (previously_new_record? ? "create" : "update")
    Neo4j::WorkExperienceSyncWorker.perform_async(id, action)
  end
end
- Pros: easy to reason about; all sync logic in Sidekiq; same retry semantics as other workers.
- Cons: dual-write race conditions (Postgres committed but Neo4j write failed). Need reconciliation jobs to detect drift. If Sidekiq is backed up, Neo4j lags.
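Because retries are part of the model, the sync worker should write idempotently. A sketch of the Cypher it might emit — MERGE keyed on the Postgres id rather than CREATE — where the module name, statement shape, and the pg_id identity property are illustrative assumptions, not the actual implementation:

```ruby
# Illustrative sketch: build an idempotent Cypher statement for one
# synced row. MERGE keyed on the Postgres row id (a hypothetical pg_id
# property) makes Sidekiq retries and duplicate deliveries safe.
module Neo4jSync
  def self.work_experience_cypher(action)
    case action
    when "destroy"
      "MATCH (:Contact {id: $contact_id})-[w:WORKED_AT {pg_id: $pg_id}]->" \
        "(:Organization) DELETE w"
    else # create and update converge on the same upsert
      "MATCH (c:Contact {id: $contact_id}), (o:Organization {id: $org_id}) " \
        "MERGE (c)-[w:WORKED_AT {pg_id: $pg_id}]->(o) " \
        "SET w.date_from = $date_from, w.date_to = $date_to, w.title = $title"
    end
  end
end
```

Parameters (`$pg_id`, `$contact_id`, …) are bound at execution time via the Bolt driver, never interpolated into the string.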
7.2 Change-data-capture (CDC) via Debezium / logical replication
Postgres logical replication streams every committed change to a CDC consumer that writes to Neo4j.
- Pros: at-least-once delivery from the WAL, in commit order; no dual-write race conditions; lag is observable.
- Cons: significant ops complexity; new infrastructure (Debezium, Kafka or equivalent); CDC schema mapping per table.
7.3 Periodic full rebuild
Nightly cron drops the Neo4j graph and rebuilds from scratch by reading Postgres.
- Pros: simplest; correctness guaranteed.
- Cons: stale by up to 24h; rebuild may take hours at scale.
7.4 Recommended: dual-write + nightly reconciliation
Start with dual-write for the write path. Add a nightly reconciliation job that compares row counts and sample-checks key entities, flagging drift. Move to CDC only if dual-write proves unstable in production.
| Approach | Setup effort | Ongoing ops | Lag | Drift risk |
|---|---|---|---|---|
| Dual-write | 3–5 days | Low | Seconds to minutes | Medium |
| CDC | 3–4 weeks | High | Sub-second | Low |
| Periodic rebuild | 2 days | Low | Up to 24h | Resolved nightly |
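The nightly reconciliation can start as simple count comparison plus sampled id checks. A hypothetical sketch — the lambdas stand in for real queries (SELECT COUNT(*) in PG, MATCH (n:Label) RETURN count(n) in Neo4j) and are stubbed here for illustration:

```ruby
# Hypothetical drift check for the nightly reconciliation job. The two
# lambdas stand in for real per-entity count queries against PG and
# Neo4j; any mismatch is reported for alerting.
def drift_report(entities, pg_counts:, neo_counts:)
  entities.filter_map do |entity|
    pg  = pg_counts.call(entity)
    neo = neo_counts.call(entity)
    { entity: entity, pg: pg, neo4j: neo } if pg != neo
  end
end

pg  = ->(e) { { contacts: 1_000, organizations: 200 }.fetch(e) }
neo = ->(e) { { contacts: 998,   organizations: 200 }.fetch(e) }

drift_report(%i[contacts organizations], pg_counts: pg, neo_counts: neo)
# => [{ entity: :contacts, pg: 1000, neo4j: 998 }]
```

Counts catch gross drift cheaply; sampled per-entity property checks catch updates that dual-write dropped without changing row counts.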
8. Operational envelope
| Metric | Canonical 008 | Neo4j variant | Delta |
|---|---|---|---|
| Storage per collection (Postgres + Neo4j combined) | ~5 MB (v1) | ~5 MB (PG) + ~50 MB (Neo4j) = ~55 MB | 11× larger |
| List-view p99 | ~30 ms | ~30 ms (still hits Postgres rollup) | Same |
| Drill-in 2-hop p99 | ~80 ms | ~150 ms (Neo4j Bolt round-trip) | ~2× slower at depth 2 |
| Drill-in 3-hop p99 | ~1.5 s | ~250 ms | ~6× faster |
| Drill-in 4-hop p99 | ~15 s (infeasible) | ~600 ms | 25× faster |
| Shortest-path | Custom CTE work, ~1–3 s | Native Dijkstra, ~150 ms | Native support is the win |
| Network analytics (centrality, etc.) | Not feasible | Native via Graph Data Science library | Unlocks new capabilities |
| Write amplification | 1–30 UPSERTs (Postgres) | 1–30 UPSERTs (PG) + parallel Neo4j writes | Doubled write paths |
| New infrastructure | None | Neo4j cluster + sync pipeline | Significant |
| Monthly hosting cost (rough) | ~$0 incremental | ~$200–800/month for managed Neo4j (Aura) | Recurring |
9. Migration cost
| Phase | Work | Effort |
|---|---|---|
| 1. Provision Neo4j | Choose hosting (Neo4j Aura managed cloud, self-hosted on EC2/k8s, or Memgraph if avoiding the Neo4j brand). Set up dev/staging/prod clusters. HA configuration. Backup schedule. | 1–2 weeks |
| 2. Driver + connection pooling | Add neo4j-ruby-driver to Gemfile. Configure connection pool (Bolt protocol, persistent connections). Wire credentials through Rails secrets. |
2–3 days |
| 3. Schema definition + constraints | Write Cypher migration files (uniqueness constraints, indexes). Build a migration runner equivalent to db:migrate for Neo4j. |
3–5 days |
| 4. One-time bulk import | Read Postgres tables (contacts, organizations, work experiences, education, UCCs). Emit Cypher CREATE/MERGE statements. Load into Neo4j. Verify counts. | 5–7 days (depends on data volume) |
| 5. Sync layer (dual-write + reconciliation) | Sidekiq workers per source table. Reconciliation job. Drift monitoring. | 5–7 days |
| 6. Service-layer query migration | Build a Neo4j::Query service. Migrate 3-hop drill-in to Cypher. Add shortest-path queries. Keep 1-hop and 2-hop in Postgres. |
5–7 days |
| 7. Test fixtures | Build test helpers that populate Neo4j from FactoryBot factories. RSpec teardown that cleans Neo4j between tests. | 3–5 days |
| 8. Observability | Wire Neo4j metrics into existing dashboards. Alert on Neo4j down, sync lag, query latency degradation. | 3 days |
| 9. Production rollout + canary | Dual-read for canary period (compare Cypher vs CTE results). Promote when result parity hits 99.9%+. Tear down Postgres-side traversal code. | 5 days + 2-week canary |
Total: 6–10 weeks engineering, plus a 2-week canary. Add 2–4 weeks for any team upskilling on Cypher.
10. What you give up
| Concern | Reality |
|---|---|
| Two databases to operate | Backups, monitoring, security review, on-call rotation, version upgrades — all double. Even with managed Neo4j (Aura), there's still a separate vendor relationship, separate billing, separate auth. |
| Sync drift risk | Dual-write means Postgres and Neo4j can diverge if a write fails on one side. Reconciliation jobs catch most drift but add latency. CDC eliminates the risk but adds significant complexity. |
| Hosting cost | Managed Neo4j Aura: ~$200/month for entry, ~$800/month for production-grade. Self-hosted: free but requires ops effort. Either way, recurring cost the canonical plan doesn't have. |
| Cypher learning curve | Two query languages in the codebase. Engineers context-switch. Code reviews need both expertises. New hires need both. |
| Backups | Neo4j backup is its own pipeline (no pg_dump equivalent across both databases). Restoring to a consistent snapshot across both stores requires careful coordination. |
| Cross-DB transactions | No way to atomically write to both Postgres and Neo4j. Have to handle "Postgres committed, Neo4j failed" as a recoverable inconsistency. |
| Schema migrations across two stores | Adding a new entity type means migrations in both. Adding a property means coordinating both. Easy to forget one. |
| License (Enterprise features) | Community edition covers core graph + Cypher. Enterprise features (clustering for HA, hot backups, fine-grained security, Graph Data Science scale) require a license. Aura includes Enterprise; self-hosted requires explicit licensing. |
10.1 The honest summary
Neo4j is the right answer when graph traversal is core to the product, not when it's a side feature. If 4+ hop traversal, weighted shortest-path, and graph algorithms are 30%+ of the read pattern, Neo4j pays for itself. If they're occasional drill-in queries, the operational cost outweighs the latency win — that's when AGE or vanilla Postgres CTEs are the right answer.
11. Mini-ADR
- Context
- 008 has confirmed product requirements for 4+ hop intro paths, weighted shortest-path ranking, and network analytics (centrality / super-connector identification). Postgres recursive CTEs cannot meet these requirements at the latency budget. Apache AGE is closer but still slower than dedicated graph storage at depth 4+. Neo4j is the substrate of choice for traversal-heavy workloads.
- Decision
- Add Neo4j as a derived graph store. Postgres remains the source of truth — every write goes to Postgres first. Sidekiq workers dual-write to Neo4j on after_commit. The service layer routes queries: aggregations and per-pair lookups stay in Postgres; traversal > 2 hops, shortest-path, and graph algorithms hit Neo4j.
- Consequences
- + 4-hop drill-in becomes feasible (~600ms vs ~15s in Postgres).
- + Native shortest-path and weighted Dijkstra unlock ranking features.
- + Graph Data Science library opens centrality and community-detection features.
- + Graph schema is more readable (Cypher).
- − Two databases to operate; two backups; two on-call surfaces.
- − Sync drift between Postgres and Neo4j is a real failure mode.
- − Hosting cost: ~$200–800/month for managed Neo4j Aura.
- − 6–10 weeks engineering + 2-week canary to ship.
- − Team upskilling on Cypher is required.
- Reversibility
- Low. Once Neo4j is in production, removing it requires migrating all graph queries back to Postgres (recursive CTEs) and decommissioning the cluster — roughly 4 weeks of work plus a careful production rollback. Treat the decision as semi-permanent.
- Trigger to revisit
- If graph queries fall to <5% of read traffic; if Postgres alone proves sufficient (e.g., AGE matures and matches Neo4j performance for our workload); if hosting cost outpaces feature value.
- Adopt only if
- (a) 4+ hop traversal or weighted shortest-path is a confirmed product requirement, (b) team has appetite for operating two databases, (c) hosting budget includes ~$500/month line item indefinitely, (d) graph algorithms (centrality, community detection) are on the product roadmap.