GET-109 · Architecture variant

Apache AGE: graph-in-Postgres

Architectural variant of the Network Intelligence System using Apache AGE — Postgres extension that adds openCypher graph queries to your existing database. No new infrastructure; one query language gain, one ecosystem-maturity loss.

Drafted 2026-04-29 · Status: review · Audience: Getro engineering team · Sibling: Neo4j variant

1. Overview

When this variant wins

Choose Apache AGE if you need 3-hop traversal or weighted shortest-path queries soon, but want to avoid operating a second database. AGE lives inside your Postgres instance as an extension — same backups, same connection pool, same Rails app — but adds Cypher syntax for graph queries. The trade-off is ecosystem maturity: AGE is real but small, and not every managed Postgres provider supports it.

Everything in the canonical 008 spec stays the same — InteractionEvent ingestion, OAuth, Findem enrichment, rule engine, UI. What changes: the storage substrate for connection-path edges and graph traversal queries. Per-pair caches and rollups can live in either standard Postgres tables or AGE-managed graph storage; the choice is per-cache, not all-or-nothing.

1.1 What Apache AGE actually is

  • Postgres extension (installed via CREATE EXTENSION age;) that adds graph database semantics to a Postgres database. Open-source, Apache 2.0.
  • Same database, same connection: graph data is stored in tables AGE manages internally (ag_label_vertex, ag_label_edge) but your app reads/writes via Cypher inside SQL queries.
  • Cypher syntax inside cypher() function calls — you embed openCypher queries inside SQL. Not a separate query language at the protocol level.
  • One repo to deploy, one DB to back up, one credential to rotate.

1.2 Why this variant exists

The canonical 008 plan handles 1+2 hop queries beautifully but degrades for 3-hop traversal (recursive CTEs explode at branching factor) and lacks native shortest-path. AGE is the smallest possible step toward graph capability without adopting a separate graph database.

2. System architecture

The pipeline shape is the same as canonical 008. The change is purely at the storage layer: graph edges and certain caches live in AGE-managed graph storage instead of vanilla Postgres tables. Workers still write via after_commit; reads still happen via the same Rails service layer; the only difference is whether a query goes through SQL or through cypher().

flowchart LR subgraph App["Rails app"] Hooks[after_commit hooks] Workers[Sidekiq workers] Services[Service layer
SQL + Cypher mix] end subgraph PG["Postgres (one instance)"] Source[Source tables
contact_work_experiences
interaction_events] PgTables[Postgres rollups
SharedListNetworkSummary
UserContactInteractionStats] AGE[(AGE-managed graph
nodes + edges)] end Source -- writes --> Hooks Hooks --> Workers Workers -- "writes via cypher() or SQL" --> AGE Workers -- writes --> PgTables Services -- "Cypher (graph traversal)" --> AGE Services -- "SQL (rollups, aggregation)" --> PgTables Services -- "SQL (source reads)" --> Source classDef pg fill:#fef3c7,stroke:#b45309 classDef age fill:#f3e8ff,stroke:#7e22ce class Source,PgTables pg class AGE age

2.1 What's same as canonical 008

  • Source ingestion (Phase 1–3, 6, 7) is identical.
  • Per-pair caches (WorkOverlapCache, UserContactInteractionStats) stay as standard Postgres tables — no graph value-add for per-pair lookups.
  • List-view rollup (SharedListNetworkSummary) stays as a standard Postgres table — aggregation is SQL's home turf.
  • Reachability rollup (ContactReachability) stays as a standard Postgres table.

2.2 What's different

  • Cross-pair edges (ContactCoemploymentEdge, etc.) stored as AGE graph edges instead of Postgres tables.
  • Drill-in queries use Cypher via cypher() instead of recursive CTEs.
  • Shortest-path queries become available via Cypher's built-in shortestPath().
  • 3-hop traversal becomes idiomatic instead of a recursive-CTE workaround.

3. Data model (graph schema)

One AGE graph named getro_network. Five vertex labels, four edge labels for v1.

3.0 What lives where

Before reading the schema, anchor on the storage boundary. Postgres remains the source of truth. AGE adds graph storage as a derived index for traversal queries — it does not replace the relational tables.

Layer Where it lives Why
Source data
(facts about the world)
contact_work_experiences (PG)
contact_education (PG)
UCC.shared (PG)
+ duplicated as :WORKED_AT, :STUDIED_AT, :SHARED_IN edges in the AGE graph
PG is required for Rails ActiveRecord, Findem sync, and every non-graph feature. AGE gets its own copy because Cypher pattern matching needs graph-shaped storage. Same physical database, different storage layout. Sync via Sidekiq workers on after_commit.
Derived edges
(inferred from source)
AGE only — computed at query time via Cypher pattern matching, OR materialized as :COWORKER_OF edges if read latency demands. No ContactCoemploymentEdge Postgres table in this variant. Cypher's pattern match on :WORKED_AT edges replaces precomputation. Single source of derived truth — no PG ↔ AGE drift on inferred edges.
Per-pair caches
(strength signals)
PG only — UserContactInteractionStats, WorkOverlapCache, etc. Per-pair lookups don't benefit from graph structure. PG keyed lookups are fastest.
Rollup tables
(list-view aggregations)
PG only — SharedListNetworkSummary, ContactReachability Hot-path read budget (50ms p99) requires a single indexed Postgres lookup. Aggregation is SQL's home turf.

The duplication is in source data, not derived edges

This is the most-misread aspect of the variant. Adopting AGE means duplicating raw facts (work history, education, UCCs) across two storage layouts in the same Postgres instance. It does not mean duplicating derived edges (coemployment, coeducation) — those move entirely into AGE. The unavoidable cost is sync of source data; the avoided cost is precomputed-edge maintenance in the canonical PG plan.

3.1 Schema setup

-- One-time graph creation
LOAD 'age';
SET search_path = ag_catalog, "$user", public;
SELECT create_graph('getro_network');

-- Vertex labels
SELECT create_vlabel('getro_network', 'Contact');
SELECT create_vlabel('getro_network', 'Organization');
SELECT create_vlabel('getro_network', 'School');
SELECT create_vlabel('getro_network', 'TeamUser');
SELECT create_vlabel('getro_network', 'Collection');

-- Edge labels
SELECT create_elabel('getro_network', 'WORKED_AT');
SELECT create_elabel('getro_network', 'STUDIED_AT');
SELECT create_elabel('getro_network', 'SHARED_IN');     -- TeamUser→Contact via Collection
SELECT create_elabel('getro_network', 'CURRENT_AT');    -- Contact→Org for current employment

3.2 Vertex properties

LabelPropertiesSource
Contactid, name, primary_email_idOne per Getro Contact row
Organizationid, name, linkedin_urlOne per Organization row
Schoolid, nameOne per canonical school
TeamUserid, emailOne per User row
Collectionid, network_idOne per Collection row

3.3 Edge properties

LabelFrom → ToPropertiesSource
WORKED_ATContact → Organizationdate_from, date_to, title, is_currentOne per contact_work_experience row
STUDIED_ATContact → Schooldate_from, date_to, degreeOne per contact_education row
SHARED_INTeamUser → Contactcollection_id, source, connected_atOne per UCC.shared row
CURRENT_ATContact → Organizationstarted_at, titleDerived: WORKED_AT where date_to IS NULL

Coemployment is implicit, not stored

Unlike the canonical 008 plan that materializes ContactCoemploymentEdge rows, the AGE variant computes coemployment at query time via Cypher's pattern matching. Two contacts who worked at the same Organization with overlapping date ranges are connected by traversing (c1)-[:WORKED_AT]->(o)<-[:WORKED_AT]-(c2) with a date-overlap predicate. Saves write amplification; costs read latency.

3.4 Indexes

AGE supports B-tree indexes on vertex/edge properties via standard Postgres syntax:

CREATE INDEX ON getro_network."Contact" USING BTREE ((properties->>'id'));
CREATE INDEX ON getro_network."Organization" USING BTREE ((properties->>'id'));
CREATE INDEX ON getro_network."WORKED_AT" USING BTREE ((properties->>'organization_id'),
                                                       (properties->>'date_from'),
                                                       (properties->>'date_to'));

Property-based indexes are functional indexes on the underlying properties jsonb column. Performance is acceptable but slightly worse than native typed columns in vanilla Postgres tables.

4. Query patterns

Side-by-side comparison: canonical 008 (recursive CTEs in vanilla Postgres) vs the AGE variant (Cypher inside Postgres). Same physical database, different syntax.

4.1 Direct connection (1-hop)

Canonical Postgres

SELECT contact_id, current_title
FROM   collection_org_current_shared_contacts
WHERE  collection_id = $1
  AND  organization_id = $2;

Apache AGE

SELECT * FROM cypher('getro_network', $$
  MATCH (u:TeamUser)-[s:SHARED_IN]->(c:Contact)
        -[ca:CURRENT_AT]->(o:Organization)
  WHERE s.collection_id = $1
    AND o.id = $2
  RETURN c.id, ca.title
$$, $1::int, $2::int) AS (id agtype, title agtype);

For 1-hop, AGE is a step backward — Cypher syntax adds noise without traversal value. Recommendation: keep direct-connection queries in vanilla Postgres tables.

4.2 Work-overlap intro path (2-hop)

Canonical Postgres

-- Reads precomputed ContactCoemploymentEdge
SELECT via_contact_id, other_contact_id,
       organization_id, overlap_from, overlap_to
FROM   contact_coemployment_edges
WHERE  collection_id = $1
  AND  other_contact_id IN (
         SELECT contact_id
         FROM   collection_org_current_shared_contacts
         WHERE  collection_id = $1
           AND  organization_id = $2)
ORDER BY overlap_to DESC;

Apache AGE

-- Computes overlap at query time
SELECT * FROM cypher('getro_network', $$
  MATCH (u:TeamUser)-[s:SHARED_IN]->(via:Contact)
        -[w1:WORKED_AT]->(bridge:Organization)
        <-[w2:WORKED_AT]-(target:Contact)
        -[:CURRENT_AT]->(targetOrg:Organization)
  WHERE s.collection_id = $1
    AND targetOrg.id = $2
    AND w1.date_from <= w2.date_to
    AND w2.date_from <= w1.date_to
  RETURN via.id, target.id, bridge.id,
         w1.date_from, w1.date_to
  ORDER BY w1.date_to DESC
$$, ...) AS (...);

AGE is more readable for 2-hop. But it's also doing more work — every query recomputes overlaps that the canonical plan precomputes. Latency parity at depth 2 is roughly equal once indexes are tuned.

4.3 Extended intro path (3-hop)

Canonical Postgres (recursive CTE)

WITH RECURSIVE paths AS (
  SELECT contact_id AS via, 0 AS depth, ARRAY[]::int[] AS path
  FROM   user_contact_collections
  WHERE  collection_id = $1
    AND  source = 'shared'
  UNION ALL
  SELECT edge.other_contact_id, p.depth + 1,
         p.path || edge.organization_id
  FROM   paths p
  JOIN   contact_coemployment_edges edge
         ON edge.via_contact_id = p.via
  WHERE  p.depth < 3
)
SELECT * FROM paths
WHERE via IN (SELECT contact_id
              FROM collection_org_current_shared_contacts
              WHERE collection_id = $1
                AND organization_id = $2);

Apache AGE

SELECT * FROM cypher('getro_network', $$
  MATCH p = (u:TeamUser)-[:SHARED_IN]->
            (via:Contact)
            -[:WORKED_AT*1..3]->(bridge:Organization)
            <-[:WORKED_AT*1..3]-(target:Contact)
            -[:CURRENT_AT]->(o:Organization {id: $2})
  WHERE all(rel IN relationships(p)
            WHERE rel.date_from IS NOT NULL)
  RETURN p
  LIMIT 50
$$, ...) AS (path agtype);

This is where AGE earns its keep. The 3-hop CTE is verbose, brittle, and gets uglier with weights or filters. Cypher's path syntax is a clear improvement. Performance: AGE is ~2× faster than the recursive CTE at depth 3 with a typical branching factor.

4.4 Weighted shortest-path

SELECT * FROM cypher('getro_network', $$
  MATCH p = shortestPath(
    (u:TeamUser {id: $1})-[*..4]-(target:Contact)
    -[:CURRENT_AT]->(o:Organization {id: $2})
  )
  RETURN p, length(p) AS hops
$$, $1::int, $2::int) AS (path agtype, hops agtype);

AGE supports shortestPath(). Weighted variants require Dijkstra-style accumulator patterns; not as elegant as Neo4j but doable.

5. Heuristic mapping

Every signal the system computes — every strength clause, the reachability rollup, key connection picker, and all 007 connection-path types — mapped to where it actually executes in the AGE variant. The point of this section: most heuristics don't change at all. AGE earns its keep on a specific subset where graph traversal or pattern matching is the natural shape.

Bottom line for this section

Of 16 strength clauses + reachability + key connection + 6 connection-path types, AGE substantially helps with ~9: the W3/K6 LinkedIn-edge checks, the W5/K4/K5 small-org coemployment lookups, C3/C4 graph absence/presence checks, and all 6 connection-path types where graph traversal is the natural query. The remaining ~14 stay in vanilla Postgres because they're aggregation or counter-comparison queries.

5.1 Strength heuristics (008 — Warm / Known / Cold)

Each clause evaluates against an in-memory (team_user, contact) stats row. AGE only enters the picture when the clause needs a graph traversal (LinkedIn 1st-degree edge, coemployment shape, school-overlap fact).

ID Clause Source data Substrate AGE involvement
W1 2-way exchange in 12mo, 5+ each direction UserContactInteractionStats counters PG only None — pure counter check
W2 Sustained 2-year back-and-forth (low volume) UserContactInteractionStats.email_activity_bitmap_12q PG only None — bitmap analysis
W3 Recent email (3mo) + response + work OR LinkedIn connection Stats + WorkOverlapCache + LinkedIn graph Hybrid AGE answers the LinkedIn-1st-degree branch via MATCH (u)-[:LINKEDIN_CONNECTED]-(c); PG handles email recency. Cleaner than recursive CTE for the LinkedIn check.
W4 1+ calendar meeting in last 180d UserContactInteractionStats.meeting_count_180d PG only None — counter check
W5 Same-team / small company overlap WorkOverlapCache + org employee count Hybrid AGE finds coemployment edges where bridge org has <N total employees: MATCH (a:Contact)-[:WORKED_AT]-(o:Organization)-[:WORKED_AT]-(b:Contact) WHERE size((o)<-[:WORKED_AT]-()) < 50. PG can do this but joins are uglier.
K1 Occasional 2-way (3–10 emails total) UserContactInteractionStats PG only None
K2 Inbound non-newsletter within 24mo UserContactInteractionStats.email_inbound_nnl_count_24mo PG only None
K3 Meeting 180d–24mo ago UserContactInteractionStats.meeting_count_24mo PG only None
K4 Same-team / small company overlap (looser window) Same as W5 with looser threshold Hybrid Same Cypher pattern as W5 with relaxed predicate.
K5 Same company + location + function Work history with location + role filters Hybrid Cypher pattern: MATCH (a)-[w1:WORKED_AT]-(o)-[w2:WORKED_AT]-(b) WHERE w1.location = w2.location AND w1.function = w2.function. Multi-property pattern match is where Cypher reads cleaner than SQL JOINs.
K6 LinkedIn connection AND (old work overlap OR shared school) LinkedIn graph + WorkOverlapCache + EducationOverlapCache Hybrid One Cypher query expresses the OR-of-paths: MATCH (u)-[:LINKEDIN_CONNECTED]-(c) MATCH (u)-[:WORKED_AT|STUDIED_AT]-()-[:WORKED_AT|STUDIED_AT]-(c). Equivalent SQL is a UNION of 3 different join patterns.
C1 One-way outbound only (no reply) UserContactInteractionStats PG only None
C2 Inbound newsletter-style only UserContactInteractionStats.email_inbound_count_24mo with newsletter_flag filter PG only None — counter check with newsletter exclusion
C3 LinkedIn 1st-degree with no other signal LinkedIn edge + absence of other signals Hybrid AGE confirms the LinkedIn edge exists; PG confirms absence of email/meeting/coemployment signals. Cypher's EXISTS + negation pattern handles the "no other graph signal" check naturally.
C4 Past shared employment >5yr ago, no recent contact WorkOverlapCache with date filter + email recency Hybrid Cypher finds the old coemployment edge: MATCH (u)-[:WORKED_AT]-()-[:WORKED_AT]-(c) WHERE w.date_to < date() - 5*365 days. PG checks email recency.
C5 Default fallback Always PG only None

5.2 Reachability (008 — High / Medium / Low / None)

Reachability rolls up strength tiers across all team members for a given (contact, collection): count Warm / Known / Cold owners, apply tier rules, write to ContactReachability.

  • Substrate: PG only
  • Why: pure aggregation of UCC strength columns. SELECT collection_id, contact_id, COUNT(*) FILTER (WHERE strength_tier = 'warm') AS warm_count ...
  • AGE involvement: none. Aggregation is SQL's home turf; cypher() wrapping would slow this down without value.

5.3 Key Connection (008 — best person on team to ask for an intro)

Per (contact, collection), pick the team member with the strongest tie. The picker reads UCC strength tiers, applies a deterministic rule (Warm beats Known beats Cold; within tier, highest strength_score), writes ContactReachability.key_user_id.

  • Substrate: PG primary — the picker is a sort-and-pick on UCC rows.
  • AGE bonus opportunity: if you want "best path to the contact" (not just best direct UCC), AGE shines. Cypher's shortestPath() returns the cheapest weighted route through the graph — useful when the contact has no direct UCC owner but is reachable via 2-hop intro.
  • Honest assessment: V1 picks based on direct UCC; AGE doesn't help. V2 with weighted multi-hop key-connection lookup is where Cypher meaningfully wins.

5.4 Connection paths (007 — direct + intro paths)

This is where AGE genuinely earns its keep. Every 007 path type maps to a graph traversal, and Cypher pattern matching is more natural than recursive CTEs.

Path type Spec phase Substrate Cypher shape
Direct connection (1-hop) 007 v1 PG rollup List view reads SharedListNetworkSummary; AGE adds overhead without value at depth 1.
Work-overlap intro path (2-hop) 007 v1 AGE or PG MATCH (u)-[:SHARED_IN]->(via)-[w1:WORKED_AT]->(o)<-[w2:WORKED_AT]-(target)-[:CURRENT_AT]->(target_org) with date-overlap predicate. Cleaner than precomputed-edge JOIN.
Education-overlap intro path 007 v2 AGE primary Same pattern with STUDIED_AT as the bridge. Adding a new bridge type is one Cypher line; in PG it's a new edge table + worker.
Co-investment intro path 007 v3 AGE primary MATCH (u)-[:SHARED_IN]->(via)-[:INVESTED_IN]->(c)<-[:INVESTED_IN]-(target) — same shape, different edge label.
Investor → Employee path 007 v3 AGE primary Mixed-edge pattern: (u)-[:SHARED_IN]->(investor)-[:INVESTED_IN]->(c)<-[:CURRENT_AT]-(employee). Mixed-type traversal is where Cypher really beats recursive CTEs.
Board peer path 007 v3 AGE primary BOARD_OF as the bridge edge.
3-hop traversal (any combo) future feature AGE wins MATCH (u)-[:SHARED_IN]->(via)-[:WORKED_AT|STUDIED_AT|INVESTED_IN*1..3]-(target). One line. Recursive CTE equivalent is dozens of lines and degrades.
Weighted shortest path future feature AGE wins Native shortestPath(). Returns the cheapest weighted route through any combination of edges.

5.5 Honest summary

Of 23 distinct queries the system needs to answer (16 strength clauses + reachability + key connection + 6 path types), AGE meaningfully helps on ~9. The remaining 14 stay in Postgres because they're either counter aggregations (most strength clauses) or rollup queries (reachability, list view).

  • Pure win zone: 007 connection-path queries (5–6 queries). Cypher is genuinely cleaner here.
  • Modest win zone: 008 hybrid clauses (W3, W5, K4, K5, K6, C3, C4) — 7 queries. Cypher shaves complexity but isn't transformative.
  • No-help zone: counter-aggregation strength clauses (W1, W2, W4, K1–K3, C1, C2, C5) and reachability rollup. Stay vanilla Postgres.

The decision becomes clear: AGE is worthwhile if 007 connection-path traversal is the dominant new feature. If the system is mostly 008 strength scoring (per-pair counters), AGE adds infrastructure for limited value.

6. Caching architecture

The three-layer cache strategy from canonical 008 §6 still applies. The difference is which layer lives where:

LayerCanonical 008AGE variant
Layer 2 — per-pair caches (WorkOverlapCache, etc.) Postgres tables Same — Postgres tables (no graph value-add for keyed lookups)
Layer 3 — cross-pair edges (ContactCoemploymentEdge, etc.) Postgres tables AGE graph edges (computed at query time from WORKED_AT patterns; or materialized as :COWORKER_OF edges if read latency demands)
Layer 4 — list rollups (SharedListNetworkSummary) Postgres tables Same — Postgres tables (aggregation is SQL's home turf)

The critical decision: do we materialize coemployment edges in AGE, or compute them at query time?

  • Compute at query time (recommended for v1): smaller graph, no write amplification for derived edges, pattern matching does the work. Read latency scales with branching factor.
  • Materialize as :COWORKER_OF edges (option for v2): faster reads at depth 3+, larger graph, write amplification on every WORKED_AT change. Worth it only if 3-hop drill-in becomes a hot path.

7. Operational envelope

MetricCanonical 008AGE variantDelta
Storage per collection ~5 MB (v1) / ~25 MB (v2) ~15 MB (v1) / ~50 MB (v2) — graph metadata + jsonb properties 2–3× larger
List-view p99 ~30 ms ~30 ms (still hits Postgres rollup) Same
Drill-in 2-hop p99 ~80 ms ~120 ms (Cypher overhead) ~50% slower at depth 2
Drill-in 3-hop p99 ~1.5 s (recursive CTE) ~600 ms (Cypher pattern match) ~60% faster at depth 3
Drill-in 4-hop p99 ~15 s (likely infeasible) ~3 s (still slow but tractable) 5× faster at depth 4
Shortest-path Custom CTE work, ~1–3 s Native shortestPath(), ~400 ms Native support is the key win
Write amplification 1–30 UPSERTs per source change 1–10 UPSERTs (no derived-edge materialization) Lower if computing edges at read time
New infrastructure None AGE extension (must be supported by hosting) Small but real

8. Migration cost

Effort to move from canonical 008 to AGE variant. Assumes 008 v1 has shipped (Postgres tables with coemployment edges).

PhaseWorkEffort
1. Hosting feasibility Confirm AGE is supported by current Postgres hosting (RDS managed Postgres typically does NOT support AGE; self-hosted, Aurora with custom, or migration to a provider that supports it). If managed-only, this phase blocks everything. 1–3 days investigation + possible infrastructure migration
2. Extension install + version pin Install AGE, lock to a specific version. Update Rails db schema management to handle the extension lifecycle. 2–3 days
3. Graph schema definitions Define vlabels, elabels, indexes. Migrations. 2 days
4. Sync layer (Postgres tables → AGE) One-time migration: read existing contact_work_experiences, contact_education, UCC, etc. Emit AGE vertices/edges. Then ongoing after_commit hooks to keep them in sync. 5–7 days
5. Service-layer query migration Replace 3-hop drill-in CTEs with Cypher equivalents. Add shortest-path queries. Keep 1-hop and 2-hop in vanilla Postgres. 5–7 days
6. Test fixtures + factories FactoryBot already populates Postgres tables; need parallel graph fixtures or a derive-from-tables helper. 3 days
7. Production rollout + monitoring Dual-read for canary period (compare Cypher vs CTE results), then promote. 3 days + 1 week canary

Total: ~3–4 weeks engineering, plus a 1-week canary. Add 1–3 weeks if hosting migration is required.

9. What you give up

ConcernReality
Hosting compatibility AWS RDS Postgres does NOT support AGE. Aurora Postgres does, with custom configuration. Self-hosted Postgres, Crunchy Bridge, and several others do. Heroku Postgres doesn't. Locks the hosting decision.
Ecosystem maturity AGE is real, used in production by some companies, but the GitHub commit cadence is bursty. Rails ORM support is sparse — you'll write raw SQL with cypher() wrappers, not use Cypher through ActiveRecord.
Property typing AGE stores vertex/edge properties as jsonb. Indexes are functional indexes on jsonb fields, slightly less efficient than native columns and harder to reason about.
Tooling pgAdmin, DataGrip, etc. don't natively render Cypher results. Debugging graph queries means reading agtype output. Not a blocker; an ergonomic friction.
Dual query languages Some queries stay SQL (rollups, aggregations); others become Cypher (traversal, shortest-path). Engineers context-switch. Mitigated by keeping the boundary clear: traversal ≥ 3 hops uses Cypher, everything else stays SQL.
Backups + restore Standard pg_dump works. AGE-managed tables come along for the ride. No extra backup concerns.
Schema migrations Rails schema.rb won't fully capture AGE labels and indexes; need a parallel migration tracker or carefully-written migration files that run AGE setup commands.

8.1 Hosting compatibility deserves its own paragraph

This is the most likely deal-breaker. Confirm with the team's infra owner that current Postgres hosting supports AGE before any further investment in this variant. If hosting requires migration, the effort estimate doubles and the operational complexity argument starts to look more like the Neo4j variant — at which point Neo4j may be the cleaner choice anyway.

10. Mini-ADR

ADR-AGE Adopt Apache AGE for graph traversal queries Hypothetical
Context
008 needs to support 3-hop intro-path traversal and weighted shortest-path. Recursive CTEs in vanilla Postgres degrade past depth 3. Adopting a separate graph database (Neo4j) adds significant operational overhead. AGE offers a middle path: graph capability inside the existing Postgres instance.
Decision
Install Apache AGE as a Postgres extension. Use Cypher (via cypher()) for traversal queries at depth ≥ 3 and for shortest-path. Keep Postgres tables for per-pair caches (Layer 2) and rollups (Layer 4). Compute coemployment edges at query time via Cypher pattern matching; materialize as :COWORKER_OF edges only if read latency demands.
Consequences
  • + 3-hop drill-in becomes idiomatic; ~60% faster than recursive CTE.
  • + Native shortestPath() for path-finding features.
  • + No new database; same backups, same connection pool.
  • − Hosting must support AGE (RDS doesn't; Aurora with custom config does).
  • − Sparse Rails ecosystem; raw SQL with cypher() wrappers.
  • − Storage 2–3× larger for graph-stored edges (jsonb properties + graph metadata).
  • − 1-hop and 2-hop queries should stay in vanilla Postgres (Cypher adds noise without value at low depth).
Reversibility
Medium. Migration back to vanilla Postgres tables requires re-materializing coemployment edges and rewriting Cypher queries as recursive CTEs. ~2 weeks of work. The graph schema is well-defined and lossless to extract.
Trigger to revisit
If 4-hop or shortest-path-with-weights becomes the dominant query pattern; if AGE performance falls below Neo4j by >3× at depth 3+; if a hosting decision forces a Postgres migration that opens the door to a different stack.
Adopt only if
(a) Postgres hosting supports AGE, (b) 3-hop or shortest-path is a confirmed product requirement, (c) team has appetite to maintain a small ecosystem-maturity risk in exchange for staying single-DB.