Neo4j: dedicated graph database
Architectural variant of the Network Intelligence System using Neo4j — a separate graph database alongside Postgres. Postgres remains the source of truth; Neo4j carries traversal and shortest-path queries. Best-in-class graph performance; significant operational overhead.
Drafted 2026-04-29 · Status: review · Audience: Getro engineering team · Sibling: Apache AGE variant
1. Overview
When this variant wins
Choose Neo4j if 4+ hop traversal is a confirmed product requirement, weighted shortest-path is the dominant query pattern, or you need built-in graph algorithms (centrality, community detection). Neo4j ships these natively at sub-second latency. The cost is real: a second database to operate, a sync pipeline to maintain, and a Cypher learning curve for the team.
Like the AGE variant, everything in canonical 008 stays the same — InteractionEvent ingestion, OAuth, Findem enrichment, rule engine, UI. Postgres remains the source of truth and stores all per-pair caches and rollups. Neo4j is added for one purpose only: traversal queries that exceed what Postgres can do efficiently.
1.1 What Neo4j brings to the table
- Native graph storage: nodes and edges as first-class objects with O(1) traversal between them. No row-based table walks.
- Cypher: the original openCypher implementation, mature query language designed for traversal.
- Built-in algorithms: shortestPath(), allShortestPaths(), weighted Dijkstra, A*, PageRank, community detection — all native, all production-tested.
- Bidirectional search: at 4+ hops, Neo4j explores from both ends simultaneously, dramatically pruning the search space.
1.2 Why this variant exists
The canonical Postgres plan handles 1- and 2-hop queries beautifully and 3-hop adequately. The AGE variant pushes 3-hop and shortest-path into Postgres-native graph syntax. Neo4j is the option for when traversal queries become structurally important enough to justify their own infrastructure — typically when 4-hop traversal, weighted shortest-path, or centrality becomes a product requirement.
2. System architecture
Two databases. Postgres is the source of truth. Neo4j is a derived index optimized for graph traversal. A sync pipeline keeps them consistent.
```mermaid
flowchart TB
  subgraph App["Rails app"]
    Hooks["after_commit hooks"]
    Workers["Sidekiq workers"]
    Services["Service layer<br/>SQL or Cypher"]
  end
  subgraph PG["Postgres (source of truth)"]
    Source["Source tables<br/>contact_work_experiences<br/>interaction_events<br/>UCC"]
    PgCaches["Postgres caches<br/>WorkOverlapCache<br/>UserContactInteractionStats<br/>SharedListNetworkSummary"]
  end
  subgraph Neo["Neo4j (traversal index)"]
    Graph[("Graph store<br/>Contact, Org, School<br/>WORKED_AT, STUDIED_AT,<br/>SHARED_IN, CURRENT_AT")]
  end
  Source -- "writes" --> Hooks
  Hooks --> Workers
  Workers -- "writes Postgres" --> PgCaches
  Workers -- "writes Neo4j (sync)" --> Graph
  Services -- "SQL: rollups, aggregation,<br/>per-pair lookups" --> PgCaches
  Services -- "SQL: source reads" --> Source
  Services -- "Cypher: traversal,<br/>shortest-path" --> Graph
  classDef pg fill:#fef3c7,stroke:#b45309
  classDef neo fill:#dbeafe,stroke:#1e40af
  class Source,PgCaches pg
  class Graph neo
```
2.1 What's same as canonical 008
- All source ingestion (Phase 1–3, 6, 7) is identical.
- Postgres remains the source of truth. Every record exists in Postgres first.
- Per-pair caches and rollups stay in Postgres tables.
2.2 What's different
- Neo4j cluster stands alongside Postgres in production.
- Sync pipeline: every after_commit from a relevant table writes to both Postgres and Neo4j (dual-write), or changes flow via change-data-capture.
- Cypher service layer: a Rails service for graph queries using the official Bolt-protocol Ruby driver.
- Drill-in queries with traversal > 2 hops hit Neo4j; everything else stays in Postgres.
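The routing rule can be sketched as a small service-layer helper. This is an illustrative sketch, not the actual codebase — the class name `GraphQueryRouter` and the query-kind symbols are assumptions; only the routing policy (traversal deeper than 2 hops goes to Neo4j, everything else stays in Postgres) comes from this section.

```ruby
# Hypothetical sketch of the substrate-routing rule described above.
# Class and symbol names are illustrative, not actual codebase names.
class GraphQueryRouter
  # Returns :neo4j or :postgres for a query described by kind + hop depth.
  def self.substrate_for(kind:, hops: 1)
    return :neo4j if %i[shortest_path graph_algorithm].include?(kind)
    return :neo4j if kind == :traversal && hops > 2

    :postgres # aggregations, rollups, per-pair lookups, shallow traversal
  end
end

GraphQueryRouter.substrate_for(kind: :per_pair_lookup)     # => :postgres
GraphQueryRouter.substrate_for(kind: :traversal, hops: 2)  # => :postgres
GraphQueryRouter.substrate_for(kind: :traversal, hops: 4)  # => :neo4j
GraphQueryRouter.substrate_for(kind: :shortest_path)       # => :neo4j
```

Centralizing the decision in one place keeps the PG/Neo4j boundary auditable as query patterns evolve.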
3. Data model (graph schema)
One Neo4j database. Five node labels and four relationship types for v1. Schema constraints define uniqueness and indexed properties.
3.0 What lives where
Before reading the schema, anchor on the storage boundary. Postgres remains the source of truth. Neo4j is added as a derived graph store for traversal queries — it does not replace any relational tables.
| Layer | Where it lives | Why |
|---|---|---|
| Source data (facts about the world) | contact_work_experiences (PG), contact_education (PG), UCC.shared (PG) — duplicated as :WORKED_AT, :STUDIED_AT, :SHARED_IN relationships in Neo4j | PG is required for Rails ActiveRecord, Findem sync, and every non-graph feature. Neo4j gets its own copy because graph traversal needs graph-shaped storage. Two physical databases, sync layer required (dual-write or CDC). This is the largest operational cost of this variant. |
| Derived edges (inferred from source) | Neo4j only — computed at query time via pattern match, or materialized as :COWORKER_OF relationships if read latency demands | No ContactCoemploymentEdge Postgres table in this variant. Native graph storage makes pattern match fast even without precomputation. Single source of derived truth — no PG ↔ Neo4j drift on inferred edges. |
| Per-pair caches (strength signals) | PG only — UserContactInteractionStats, WorkOverlapCache, etc. | Per-pair lookups don't benefit from graph structure. PG keyed lookups are fastest, and round-tripping to Neo4j adds Bolt-protocol overhead without value. |
| Rollup tables (list-view aggregations) | PG only — SharedListNetworkSummary, ContactReachability | Hot-path read budget (50ms p99) requires a single indexed Postgres lookup. Aggregation is SQL's home turf, and a Neo4j network round-trip alone exceeds the budget. |
The duplication is in source data, not derived edges
Adopting Neo4j means duplicating raw facts (work history, education, UCCs) across two databases. It does not mean duplicating derived edges (coemployment, coeducation) — those move entirely into Neo4j. The unavoidable cost is the sync layer for source data; the avoided cost is precomputed-edge maintenance in the canonical PG plan. Cross-DB sync is the largest engineering risk in this variant — see §7 for sync strategies.
3.1 Schema setup
// Constraints (uniqueness + creates indexes automatically)
CREATE CONSTRAINT contact_id IF NOT EXISTS FOR (c:Contact) REQUIRE c.id IS UNIQUE;
CREATE CONSTRAINT org_id IF NOT EXISTS FOR (o:Organization) REQUIRE o.id IS UNIQUE;
CREATE CONSTRAINT school_id IF NOT EXISTS FOR (s:School) REQUIRE s.id IS UNIQUE;
CREATE CONSTRAINT user_id IF NOT EXISTS FOR (u:TeamUser) REQUIRE u.id IS UNIQUE;
CREATE CONSTRAINT collection_id IF NOT EXISTS FOR (col:Collection) REQUIRE col.id IS UNIQUE;
// Additional property indexes
CREATE INDEX contact_email IF NOT EXISTS FOR (c:Contact) ON (c.primary_email);
CREATE INDEX worked_at_dates IF NOT EXISTS FOR ()-[r:WORKED_AT]-() ON (r.date_from, r.date_to);
CREATE INDEX shared_in_collection IF NOT EXISTS FOR ()-[r:SHARED_IN]-() ON (r.collection_id);
3.2 Node labels
| Label | Properties | Sourced from Postgres |
|---|---|---|
| Contact | id, name, primary_email | contacts |
| Organization | id, name, linkedin_url | organizations |
| School | id, name | canonical schools table |
| TeamUser | id, email | users |
| Collection | id, network_id | collections |
3.3 Relationship types
| Type | From → To | Properties | Source |
|---|---|---|---|
| WORKED_AT | Contact → Organization | date_from (date), date_to (date), title, is_current (bool) | contact_work_experiences |
| STUDIED_AT | Contact → School | date_from, date_to, degree | contact_education |
| SHARED_IN | TeamUser → Contact | collection_id, source, connected_at | UCC.shared |
| CURRENT_AT | Contact → Organization | started_at, title | derived: WORKED_AT where date_to IS NULL |
Coemployment is implicit
Like the AGE variant, coemployment edges are not materialized. Neo4j computes them at query time via pattern match: (c1:Contact)-[w1:WORKED_AT]->(o:Organization)<-[w2:WORKED_AT]-(c2:Contact) with date-overlap predicate. Native graph storage makes this fast even without materialization.
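The date-overlap predicate that gates coemployment can be stated in plain Ruby for clarity — a hypothetical helper, since the real check runs inside the Cypher WHERE clause: two tenures overlap when each starts before the other ends, with an open-ended date_to treated as still current.

```ruby
require "date"

# Sketch of the coemployment date-overlap predicate, in plain Ruby.
# A nil date_to means "still employed" and is treated as today.
def tenures_overlap?(a_from:, a_to:, b_from:, b_to:)
  a_to ||= Date.today
  b_to ||= Date.today
  a_from <= b_to && b_from <= a_to
end

# Alice 2015–2018, Bob 2017–2020 at the same org → overlapping coworkers
tenures_overlap?(a_from: Date.new(2015, 1, 1), a_to: Date.new(2018, 6, 30),
                 b_from: Date.new(2017, 1, 1), b_to: Date.new(2020, 1, 1)) # => true

# Disjoint tenures at the same org → no coemployment
tenures_overlap?(a_from: Date.new(2010, 1, 1), a_to: Date.new(2012, 1, 1),
                 b_from: Date.new(2017, 1, 1), b_to: nil) # => false
```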
3.4 Native typing wins here
Unlike AGE's jsonb properties, Neo4j has native types: date, datetime, bool, int, string. Date-overlap predicates run on indexed typed properties; performance is materially better than jsonb-based pattern matching.
4. Query patterns
Side-by-side: canonical Postgres (recursive CTE) vs Neo4j (Cypher).
4.1 Direct connection (1-hop)
Stays in Postgres. Neo4j wouldn't add value for keyed lookups.
-- Canonical Postgres (recommended for 1-hop)
SELECT contact_id, current_title
FROM collection_org_current_shared_contacts
WHERE collection_id = $1 AND organization_id = $2;
4.2 Work-overlap intro path (2-hop)
Either substrate works; Postgres is faster for 2-hop in practice once ContactCoemploymentEdge is precomputed.
Canonical Postgres (precomputed)
SELECT via_contact_id, other_contact_id,
organization_id, overlap_from, overlap_to
FROM contact_coemployment_edges
WHERE collection_id = $1
AND other_contact_id IN (
SELECT contact_id
FROM collection_org_current_shared_contacts
WHERE collection_id = $1
AND organization_id = $2);
Neo4j Cypher
MATCH (u:TeamUser)
-[s:SHARED_IN {collection_id: $cid}]->
(via:Contact)
-[w1:WORKED_AT]->(bridge:Organization)
<-[w2:WORKED_AT]-(target:Contact)
-[:CURRENT_AT]->
(targetOrg:Organization {id: $oid})
WHERE w1.date_from <= w2.date_to
AND w2.date_from <= w1.date_to
RETURN via, target, bridge, w1, w2
ORDER BY w1.date_to DESC;
4.3 Extended intro path (3-hop and 4-hop)
This is where Neo4j wins decisively. Variable-length path matching with a depth bound is one Cypher line; the Postgres recursive CTE equivalent is dozens of lines and starts to fall over at depth 4.
// 4-hop intro path with date-overlap predicate
MATCH path = (u:TeamUser {id: $uid})
-[:SHARED_IN {collection_id: $cid}]->
(start:Contact)
-[:WORKED_AT*1..4]-(:Contact)
-[:CURRENT_AT]->
(target:Organization {id: $oid})
WHERE all(rel IN [r IN relationships(path) WHERE type(r) = 'WORKED_AT']
WHERE rel.date_from IS NOT NULL)
RETURN path, length(path) AS hops
ORDER BY hops ASC
LIMIT 25;
4.4 Weighted shortest-path
Native, sub-second.
// Find the cheapest intro path to anyone at TargetCo
MATCH (u:TeamUser {id: $uid}),
(target:Organization {id: $oid})
CALL apoc.algo.dijkstra(u, target, 'WORKED_AT|SHARED_IN|CURRENT_AT', 'weight')
YIELD path, weight
RETURN path, weight
ORDER BY weight ASC
LIMIT 5;
Edge weights can be set per relationship at write time (e.g., SHARED_IN = 1, WORKED_AT overlap = 3, longer time gaps = higher weight). Adjust weights freely; queries adapt without changing structure.
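A minimal sketch of how the write path might assign those weights. The base values come from the example in the paragraph above (SHARED_IN = 1, WORKED_AT = 3); the staleness penalty of one weight unit per year is an illustrative assumption, not a tuned scheme.

```ruby
require "date"

# Hypothetical per-relationship weight assignment for Dijkstra. Base
# values are the example numbers above; the +1-per-stale-year penalty
# is illustrative only.
BASE_WEIGHTS = { "SHARED_IN" => 1.0, "WORKED_AT" => 3.0, "CURRENT_AT" => 1.0 }.freeze

def edge_weight(rel_type, ended_on: nil, today: Date.today)
  base = BASE_WEIGHTS.fetch(rel_type)
  return base if ended_on.nil? # current relationships carry no staleness penalty

  years_stale = (today - ended_on).to_f / 365.25
  base + [years_stale, 0.0].max # longer time gaps = higher weight
end

edge_weight("SHARED_IN")                       # => 1.0
edge_weight("WORKED_AT", ended_on: Date.today) # => 3.0
```

Because the weight lives on the relationship, retuning the scheme is a property rewrite; the Dijkstra query itself never changes.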
4.5 Network analytics
Neo4j Graph Data Science library ships PageRank, betweenness centrality, community detection, etc. Useful for "find super-connectors in my network" or "cluster contacts by community" features. Out of scope for v1 but available as a future product capability.
5. Heuristic mapping
Every signal the system computes — every strength clause, the reachability rollup, the key connection picker, and all 007 connection-path types — is mapped below to where it actually executes in the Neo4j variant. The point of this section: most heuristics still execute in Postgres. Neo4j earns its keep on traversal-shaped queries and unlocks graph algorithms (centrality, community detection) that no other substrate can match.
Bottom line for this section
Of 16 strength clauses + reachability + key connection + 6 connection-path types, Neo4j substantially helps with ~9 — same headline as AGE — but with native shortest-path, weighted traversal, and Graph Data Science algorithms that AGE doesn't have. The gap shows up at depth 4+ and on analytics queries; under depth 3 the difference is mostly syntactic.
5.1 Strength heuristics (008 — Warm / Known / Cold)
Counter-aggregation clauses stay in Postgres. Neo4j is invoked only when a clause needs graph traversal. The Postgres ↔ Neo4j boundary is identical to the AGE variant; only the graph-side syntax differs.
| ID | Clause | Source data | Substrate | Neo4j involvement |
|---|---|---|---|---|
| W1 | 2-way exchange in 12mo, 5+ each direction | UserContactInteractionStats counters | PG only | None — counter check |
| W2 | Sustained 2-year back-and-forth (low volume) | UserContactInteractionStats.email_activity_bitmap_12q | PG only | None |
| W3 | Recent email (3mo) + response + work OR LinkedIn connection | Stats + work overlap + LinkedIn graph | Hybrid | Neo4j answers the LinkedIn-1st-degree branch via MATCH (u)-[:LINKEDIN_CONNECTED]-(c). Single round-trip Bolt query; ~10–20ms latency including network. PG handles email recency. |
| W4 | 1+ calendar meeting in last 180d | UserContactInteractionStats.meeting_count_180d | PG only | None |
| W5 | Same-team / small company overlap | Coemployment + org employee count | Hybrid | Cypher: MATCH (a:Contact)-[:WORKED_AT]-(o)-[:WORKED_AT]-(b:Contact) WITH a, b, o, size((o)<-[:WORKED_AT]-()) AS hc WHERE hc < 50. Native graph storage makes the size check fast. |
| K1 | Occasional 2-way (3–10 emails total) | UserContactInteractionStats | PG only | None |
| K2 | Inbound non-newsletter within 24mo | UserContactInteractionStats | PG only | None |
| K3 | Meeting 180d–24mo ago | UserContactInteractionStats.meeting_count_24mo | PG only | None |
| K4 | Same-team / small company overlap (looser window) | Same as W5 with relaxed predicates | Hybrid | Same Cypher pattern as W5 with a looser threshold. |
| K5 | Same company + location + function | Work history with location + role filters | Hybrid | Multi-property pattern match: MATCH (a)-[w1:WORKED_AT]-(o)-[w2:WORKED_AT]-(b) WHERE w1.location = w2.location AND w1.function = w2.function. Native typing of properties (vs jsonb in AGE) makes this faster. |
| K6 | LinkedIn connection AND (old work overlap OR shared school) | LinkedIn graph + coemployment + edu | Hybrid | One Cypher query: MATCH (u)-[:LINKEDIN_CONNECTED]-(c) WITH u, c MATCH (u)-[:WORKED_AT|STUDIED_AT]-()-[:WORKED_AT|STUDIED_AT]-(c) RETURN .... Native multi-edge-type traversal. |
| C1 | One-way outbound only (no reply) | UserContactInteractionStats | PG only | None |
| C2 | Inbound newsletter-style only | UserContactInteractionStats with newsletter_flag | PG only | None |
| C3 | LinkedIn 1st-degree with no other signal | LinkedIn edge + absence of other signals | Hybrid | Neo4j confirms the LinkedIn edge exists; PG confirms the absence of email/meeting/coemployment. Cypher's EXISTS + negation handles the graph-side absence check natively. |
| C4 | Past shared employment >5yr ago, no recent contact | Coemployment with date filter + email recency | Hybrid | Cypher with native date math: MATCH (u)-[w:WORKED_AT]-()-[:WORKED_AT]-(c) WHERE w.date_to < date() - duration('P5Y'). Date types are first-class in Neo4j. |
| C5 | Default fallback | Always | PG only | None |
5.2 Reachability (008 — High / Medium / Low / None)
Reachability rolls up strength tiers across all team members for a given (contact, collection): count Warm / Known / Cold owners, apply tier rules, write to ContactReachability.
- Substrate: PG only
- Why: pure aggregation of UCC strength columns. SQL GROUP BY with FILTER clauses.
- Neo4j involvement: none. Aggregation is SQL's home turf; round-tripping to Neo4j costs latency without value.
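A sketch of the rollup shape in Ruby terms. The tier thresholds below are illustrative assumptions — the actual tier rules live in canonical 008 and are not restated here; only the structure (count Warm/Known/Cold owners, map counts to a reachability tier) comes from this section.

```ruby
# Hypothetical sketch of the reachability rollup. The thresholds are
# illustrative only; the real tier rules are defined in canonical 008.
def reachability_tier(warm_count:, known_count:, cold_count:)
  return :high   if warm_count >= 1
  return :medium if known_count >= 1
  return :low    if cold_count >= 1

  :none
end

# Roll up per-owner strength tiers for one (contact, collection) pair.
def roll_up(owner_tiers)
  counts = owner_tiers.tally # e.g. { warm: 1, cold: 2 }
  reachability_tier(warm_count:  counts.fetch(:warm, 0),
                    known_count: counts.fetch(:known, 0),
                    cold_count:  counts.fetch(:cold, 0))
end

roll_up(%i[warm cold cold]) # => :high
roll_up(%i[cold cold])      # => :low
roll_up([])                 # => :none
```

In production this is a single GROUP BY ... FILTER statement writing to ContactReachability, not application-side iteration.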
5.3 Key Connection (008 — best person on team to ask for an intro)
Per (contact, collection), pick the team member with the strongest tie. Two flavors:
5.3.1 V1 — direct UCC pick
- Substrate: PG only
- Sort UCCs by tier (Warm > Known > Cold), then by strength_score; take the first row.
- Neo4j adds nothing; PG ORDER BY + LIMIT 1 is fastest.
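The v1 pick is a plain sort. A sketch with UCC rows shown as hashes for illustration — in the app this is one SQL ORDER BY ... LIMIT 1 against the UCC table, not Ruby-side sorting:

```ruby
# Sketch of the Key Connection v1 pick: order by tier (Warm > Known >
# Cold), then by strength_score descending, take the first row.
TIER_RANK = { warm: 0, known: 1, cold: 2 }.freeze

def key_connection(uccs)
  uccs.min_by { |u| [TIER_RANK.fetch(u[:tier]), -u[:strength_score]] }
end

uccs = [
  { user_id: 1, tier: :cold, strength_score: 90 },
  { user_id: 2, tier: :warm, strength_score: 40 },
  { user_id: 3, tier: :warm, strength_score: 75 },
]
key_connection(uccs)[:user_id] # => 3 (warm beats cold; higher score wins)
```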
5.3.2 V2 — best weighted path including indirect intros
- Substrate: Neo4j wins
- Cypher with shortestPath() or apoc.algo.dijkstra finds the cheapest weighted route from any team user to the contact, allowing intermediate via-contacts.
- Native algorithm; sub-200ms even at depth 4. The equivalent in vanilla PG would be a custom recursive CTE with weighted cost accumulation — possible but ugly.
- Sample query: MATCH (u:TeamUser), (c:Contact {id: $cid}) CALL apoc.algo.dijkstra(u, c, 'WORKED_AT|STUDIED_AT|SHARED_IN', 'cost') YIELD path, weight RETURN path, weight ORDER BY weight ASC LIMIT 1.
5.4 Connection paths (007 — direct + intro paths)
This is where Neo4j shines structurally. Native graph storage, native shortest-path, native multi-edge-type traversal. Latency at depth 3–4 is materially better than AGE.
| Path type | Spec phase | Substrate | Neo4j note |
|---|---|---|---|
| Direct connection (1-hop) | 007 v1 | PG rollup | List view reads SharedListNetworkSummary; Neo4j round-trip overhead is a loss at depth 1. |
| Work-overlap intro path (2-hop) | 007 v1 | Neo4j or PG | Native pattern match. PG with precomputed edges is competitive at depth 2; Neo4j wins at depth 3+. |
| Education-overlap intro path | 007 v2 | Neo4j primary | Adding STUDIED_AT as a bridge type is one Cypher line. No new edge table, no new worker class, no migration. |
| Co-investment intro path | 007 v3 | Neo4j primary | Same shape with INVESTED_IN edge label. |
| Investor → Employee path | 007 v3 | Neo4j primary | Mixed-edge pattern is one query: (investor)-[:INVESTED_IN]->(c)<-[:CURRENT_AT]-(employee). |
| Board peer path | 007 v3 | Neo4j primary | BOARD_OF as the bridge edge. |
| 3-hop traversal (any combo) | future feature | Neo4j wins decisively | Variable-length match: (u)-[:SHARED_IN]->(via)-[:WORKED_AT|STUDIED_AT|INVESTED_IN*1..3]-(target). Bidirectional search prunes the explored space. |
| 4-hop traversal | future feature | Neo4j only feasible option | ~600ms vs ~15s in vanilla PG and ~3s in AGE. This is the structural win. |
| Weighted shortest path / top-K paths | future feature | Neo4j wins | Native Dijkstra via apoc.algo.dijkstra; native top-K via apoc.algo.kShortestPath. Sub-second at typical scale. |
| Centrality / super-connectors | future feature | Neo4j only | Graph Data Science library: gds.pageRank.stream(...), gds.betweenness.stream(...). Not available in AGE or PG. |
| Community detection | future feature | Neo4j only | gds.louvain or gds.labelPropagation. Useful for "cluster contacts in my collection by community" features. |
5.5 Honest summary
Of the 24 distinct queries the system needs to answer in v1/v2 (16 strength clauses + reachability + key connection + 6 path types), Neo4j meaningfully helps on ~9 — same as AGE. The Neo4j-specific upside shows up beyond v2:
- 4-hop traversal becomes feasible (~600ms vs infeasible in PG, slow in AGE).
- Native shortest-path with weights for Key Connection v2 and best-intro-path features.
- Graph Data Science library opens centrality, community detection, PageRank — capabilities that don't exist in AGE or PG.
The decision becomes clear: Neo4j is worthwhile only if 4-hop traversal, weighted shortest-path, or graph algorithms become product features within the next 2–3 quarters. For the canonical 008 + 007 v1/v2 scope, the gap between Neo4j and AGE is mostly syntactic; the operational overhead is what determines the answer.
6. Caching architecture
The layered cache strategy from canonical 008 §6 still applies. Same per-cache reasoning, different placement.
| Layer | Canonical 008 | Neo4j variant |
|---|---|---|
| Layer 2 — per-pair caches | Postgres tables | Same — Postgres tables. Per-pair keyed lookups don't benefit from graph structure. |
| Layer 3 — cross-pair edges | Postgres tables (ContactCoemploymentEdge) |
Neo4j relationships (computed at query time from WORKED_AT patterns). Or precomputed as :COWORKER_OF if read latency demands. |
| Layer 4 — list rollups | Postgres tables | Same — Postgres tables. Aggregation stays in SQL. |
Neo4j's role is narrow: it answers traversal-shaped questions. Aggregation, filtering by enum, sorting by computed columns — all stay in Postgres. The result is a hybrid where each query lands in the substrate that handles it best.
7. Sync layer (the tricky part)
The biggest engineering cost in this variant. Postgres is source of truth; Neo4j must stay in sync. Three approaches, increasing complexity.
7.1 Dual-write from Sidekiq workers
The simplest approach. Every after_commit hook that writes to Postgres also enqueues a Sidekiq job that writes to Neo4j.
class ContactWorkExperience < ApplicationRecord
  after_commit :enqueue_neo4j_sync, on: [:create, :update, :destroy]

  def enqueue_neo4j_sync
    # after_commit doesn't pass the action, so derive it (Rails 6.1+).
    action = destroyed? ? "destroy" : (previously_new_record? ? "create" : "update")
    Neo4j::WorkExperienceSyncWorker.perform_async(id, action)
  end
end
- Pros: easy to reason about; all sync logic in Sidekiq; same retry semantics as other workers.
- Cons: dual-write race conditions (Postgres committed but Neo4j write failed). Need reconciliation jobs to detect drift. If Sidekiq is backed up, Neo4j lags.
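Because retries are part of the model, the sync worker should write idempotently. A sketch of the Cypher it might emit — MERGE keyed on the Postgres id rather than CREATE — where the module name, statement shape, and the pg_id identity property are illustrative assumptions, not the actual implementation:

```ruby
# Illustrative sketch: build an idempotent Cypher statement for one
# synced row. MERGE keyed on the Postgres row id (a hypothetical pg_id
# property) makes Sidekiq retries and duplicate deliveries safe.
module Neo4jSync
  def self.work_experience_cypher(action)
    case action
    when "destroy"
      "MATCH (:Contact {id: $contact_id})-[w:WORKED_AT {pg_id: $pg_id}]->" \
        "(:Organization) DELETE w"
    else # create and update converge on the same upsert
      "MATCH (c:Contact {id: $contact_id}), (o:Organization {id: $org_id}) " \
        "MERGE (c)-[w:WORKED_AT {pg_id: $pg_id}]->(o) " \
        "SET w.date_from = $date_from, w.date_to = $date_to, w.title = $title"
    end
  end
end
```

Parameters (`$pg_id`, `$contact_id`, …) are bound at execution time via the Bolt driver, never interpolated into the string.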
7.2 Change-data-capture (CDC) via Debezium / logical replication
Postgres logical replication streams every committed change to a CDC consumer that writes to Neo4j.
- Pros: at-least-once delivery from the WAL, in commit order; no dual-write race conditions; lag is observable.
- Cons: significant ops complexity; new infrastructure (Debezium, Kafka or equivalent); CDC schema mapping per table.
7.3 Periodic full rebuild
Nightly cron drops the Neo4j graph and rebuilds from scratch by reading Postgres.
- Pros: simplest; correctness guaranteed.
- Cons: stale by up to 24h; rebuild may take hours at scale.
7.4 Recommended: dual-write + nightly reconciliation
Start with dual-write for the write path. Add a nightly reconciliation job that compares row counts and sample-checks key entities, flagging drift. Move to CDC only if dual-write proves unstable in production.
| Approach | Setup effort | Ongoing ops | Lag | Drift risk |
|---|---|---|---|---|
| Dual-write | 3–5 days | Low | Seconds to minutes | Medium |
| CDC | 3–4 weeks | High | Sub-second | Low |
| Periodic rebuild | 2 days | Low | Up to 24h | Resolved nightly |
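The nightly reconciliation can start as simple count comparison plus sampled id checks. A hypothetical sketch — the lambdas stand in for real queries (SELECT COUNT(*) in PG, MATCH (n:Label) RETURN count(n) in Neo4j) and are stubbed here for illustration:

```ruby
# Hypothetical drift check for the nightly reconciliation job. The two
# lambdas stand in for real per-entity count queries against PG and
# Neo4j; any mismatch is reported for alerting.
def drift_report(entities, pg_counts:, neo_counts:)
  entities.filter_map do |entity|
    pg  = pg_counts.call(entity)
    neo = neo_counts.call(entity)
    { entity: entity, pg: pg, neo4j: neo } if pg != neo
  end
end

pg  = ->(e) { { contacts: 1_000, organizations: 200 }.fetch(e) }
neo = ->(e) { { contacts: 998,   organizations: 200 }.fetch(e) }

drift_report(%i[contacts organizations], pg_counts: pg, neo_counts: neo)
# => [{ entity: :contacts, pg: 1000, neo4j: 998 }]
```

Counts catch gross drift cheaply; sampled per-entity property checks catch updates that dual-write dropped without changing row counts.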
8. Operational envelope
| Metric | Canonical 008 | Neo4j variant | Delta |
|---|---|---|---|
| Storage per collection (Postgres + Neo4j combined) | ~5 MB (v1) | ~5 MB (PG) + ~50 MB (Neo4j) = ~55 MB | 11× larger |
| List-view p99 | ~30 ms | ~30 ms (still hits Postgres rollup) | Same |
| Drill-in 2-hop p99 | ~80 ms | ~150 ms (Neo4j Bolt round-trip) | ~2× slower at depth 2 |
| Drill-in 3-hop p99 | ~1.5 s | ~250 ms | ~6× faster |
| Drill-in 4-hop p99 | ~15 s (infeasible) | ~600 ms | 25× faster |
| Shortest-path | Custom CTE work, ~1–3 s | Native Dijkstra, ~150 ms | Native support is the win |
| Network analytics (centrality, etc.) | Not feasible | Native via Graph Data Science library | Unlocks new capabilities |
| Write amplification | 1–30 UPSERTs (Postgres) | 1–30 UPSERTs (PG) + parallel Neo4j writes | Doubled write paths |
| New infrastructure | None | Neo4j cluster + sync pipeline | Significant |
| Monthly hosting cost (rough) | ~$0 incremental | ~$200–800/month for managed Neo4j (Aura) | Recurring |
9. Migration cost
| Phase | Work | Effort |
|---|---|---|
| 1. Provision Neo4j | Choose hosting (Neo4j Aura managed cloud, self-hosted on EC2/k8s, or Memgraph if avoiding the Neo4j brand). Set up dev/staging/prod clusters. HA configuration. Backup schedule. | 1–2 weeks |
| 2. Driver + connection pooling | Add neo4j-ruby-driver to Gemfile. Configure connection pool (Bolt protocol, persistent connections). Wire credentials through Rails secrets. |
2–3 days |
| 3. Schema definition + constraints | Write Cypher migration files (uniqueness constraints, indexes). Build a migration runner equivalent to db:migrate for Neo4j. |
3–5 days |
| 4. One-time bulk import | Read Postgres tables (contacts, organizations, work experiences, education, UCCs). Emit Cypher CREATE/MERGE statements. Load into Neo4j. Verify counts. | 5–7 days (depends on data volume) |
| 5. Sync layer (dual-write + reconciliation) | Sidekiq workers per source table. Reconciliation job. Drift monitoring. | 5–7 days |
| 6. Service-layer query migration | Build a Neo4j::Query service. Migrate 3-hop drill-in to Cypher. Add shortest-path queries. Keep 1-hop and 2-hop in Postgres. |
5–7 days |
| 7. Test fixtures | Build test helpers that populate Neo4j from FactoryBot factories. RSpec teardown that cleans Neo4j between tests. | 3–5 days |
| 8. Observability | Wire Neo4j metrics into existing dashboards. Alert on Neo4j down, sync lag, query latency degradation. | 3 days |
| 9. Production rollout + canary | Dual-read for canary period (compare Cypher vs CTE results). Promote when result parity hits 99.9%+. Tear down Postgres-side traversal code. | 5 days + 2-week canary |
Total: 6–10 weeks engineering, plus a 2-week canary. Add 2–4 weeks for any team upskilling on Cypher.
10. What you give up
| Concern | Reality |
|---|---|
| Two databases to operate | Backups, monitoring, security review, on-call rotation, version upgrades — all double. Even with managed Neo4j (Aura), there's still a separate vendor relationship, separate billing, separate auth. |
| Sync drift risk | Dual-write means Postgres and Neo4j can diverge if a write fails on one side. Reconciliation jobs catch most drift but add latency. CDC eliminates the risk but adds significant complexity. |
| Hosting cost | Managed Neo4j Aura: ~$200/month for entry, ~$800/month for production-grade. Self-hosted: free but requires ops effort. Either way, recurring cost the canonical plan doesn't have. |
| Cypher learning curve | Two query languages in the codebase. Engineers context-switch. Code reviews need both expertises. New hires need both. |
| Backups | Neo4j backup is its own pipeline (no pg_dump equivalent across both databases). Restoring to a consistent snapshot across both stores requires careful coordination. |
| Cross-DB transactions | No way to atomically write to both Postgres and Neo4j. Have to handle "Postgres committed, Neo4j failed" as a recoverable inconsistency. |
| Schema migrations across two stores | Adding a new entity type means migrations in both. Adding a property means coordinating both. Easy to forget one. |
| License (Enterprise features) | Community edition covers core graph + Cypher. Enterprise features (clustering for HA, hot backups, fine-grained security, Graph Data Science scale) require a license. Aura includes Enterprise; self-hosted requires explicit licensing. |
10.1 The honest summary
Neo4j is the right answer when graph traversal is core to the product, not when it's a side feature. If 4+ hop traversal, weighted shortest-path, and graph algorithms are 30%+ of the read pattern, Neo4j pays for itself. If they're occasional drill-in queries, the operational cost outweighs the latency win — that's when AGE or vanilla Postgres CTEs are the right answer.
11. Mini-ADR
- Context
- 008 has confirmed product requirements for 4+ hop intro paths, weighted shortest-path ranking, and network analytics (centrality / super-connector identification). Postgres recursive CTEs cannot meet these requirements at the latency budget. Apache AGE is closer but still slower than dedicated graph storage at depth 4+. Neo4j is the substrate of choice for traversal-heavy workloads.
- Decision
- Add Neo4j as a derived graph store. Postgres remains the source of truth — every write goes to Postgres first. Sidekiq workers dual-write to Neo4j on after_commit. The service layer routes queries: aggregations and per-pair lookups stay in Postgres; traversal > 2 hops, shortest-path, and graph algorithms hit Neo4j.
- Consequences
- + 4-hop drill-in becomes feasible (~600ms vs ~15s in Postgres).
- + Native shortest-path and weighted Dijkstra unlock ranking features.
- + Graph Data Science library opens centrality and community-detection features.
- + Graph schema is more readable (Cypher).
- − Two databases to operate; two backups; two on-call surfaces.
- − Sync drift between Postgres and Neo4j is a real failure mode.
- − Hosting cost: ~$200–800/month for managed Neo4j Aura.
- − 6–10 weeks engineering + 2-week canary to ship.
- − Team upskilling on Cypher is required.
- Reversibility
- Low. Once Neo4j is in production, removing it requires migrating all graph queries back to Postgres (recursive CTEs) and decommissioning the cluster — roughly 4 weeks of work plus a careful production rollback. Treat the decision as semi-permanent.
- Trigger to revisit
- If graph queries fall to <5% of read traffic; if Postgres alone proves sufficient (e.g., AGE matures and matches Neo4j performance for our workload); if hosting cost outpaces feature value.
- Adopt only if
- (a) 4+ hop traversal or weighted shortest-path is a confirmed product requirement, (b) team has appetite for operating two databases, (c) hosting budget includes ~$500/month line item indefinitely, (d) graph algorithms (centrality, community detection) are on the product roadmap.