GET-109 · Architecture variant

Apache AGE: graph-in-Postgres

Architectural variant of the Network Intelligence System using Apache AGE — Postgres extension that adds openCypher graph queries to your existing database. No new infrastructure; one query language gain, one ecosystem-maturity loss.

Drafted 2026-04-29 · Status: review · Audience: Getro engineering team · Sibling: Neo4j variant

1. Overview

When this variant wins

Choose Apache AGE if you need 3-hop traversal or weighted shortest-path queries soon, but want to avoid operating a second database. AGE lives inside your Postgres instance as an extension — same backups, same connection pool, same Rails app — but adds Cypher syntax for graph queries. The trade-off is ecosystem maturity: AGE is real but small, and not every managed Postgres provider supports it.

Everything in the canonical 008 spec stays the same — InteractionEvent ingestion, OAuth, Findem enrichment, rule engine, UI. What changes: the storage substrate for connection-path edges and graph traversal queries. Per-pair caches and rollups can live in either standard Postgres tables or AGE-managed graph storage; the choice is per-cache, not all-or-nothing.

1.1 What Apache AGE actually is

Postgres extension (installed via CREATE EXTENSION age;) that adds graph database semantics to a Postgres database. Open-source, Apache 2.0.
Same database, same connection: graph data is stored in tables AGE manages internally (ag_label_vertex, ag_label_edge) but your app reads/writes via Cypher inside SQL queries.
Cypher syntax inside cypher() function calls — you embed openCypher queries inside SQL. Not a separate query language at the protocol level.
One repo to deploy, one DB to back up, one credential to rotate.

1.2 Why this variant exists

The canonical 008 plan handles 1+2 hop queries beautifully but degrades for 3-hop traversal (recursive CTEs explode at branching factor) and lacks native shortest-path. AGE is the smallest possible step toward graph capability without adopting a separate graph database.

2. System architecture

The pipeline shape is the same as canonical 008. The change is purely at the storage layer: graph edges and certain caches live in AGE-managed graph storage instead of vanilla Postgres tables. Workers still write via after_commit; reads still happen via the same Rails service layer; the only difference is whether a query goes through SQL or through cypher().

flowchart LR subgraph App["Rails app"] Hooks[after_commit hooks] Workers[Sidekiq workers] Services[Service layer
SQL + Cypher mix] end subgraph PG["Postgres (one instance)"] Source[Source tables
contact_work_experiences
interaction_events] PgTables[Postgres rollups
SharedListNetworkSummary
UserContactInteractionStats] AGE[(AGE-managed graph
nodes + edges)] end Source -- writes --> Hooks Hooks --> Workers Workers -- "writes via cypher() or SQL" --> AGE Workers -- writes --> PgTables Services -- "Cypher (graph traversal)" --> AGE Services -- "SQL (rollups, aggregation)" --> PgTables Services -- "SQL (source reads)" --> Source classDef pg fill:#fef3c7,stroke:#b45309 classDef age fill:#f3e8ff,stroke:#7e22ce class Source,PgTables pg class AGE age

2.1 What's same as canonical 008

Source ingestion (Phase 1–3, 6, 7) is identical.
Per-pair caches (WorkOverlapCache, UserContactInteractionStats) stay as standard Postgres tables — no graph value-add for per-pair lookups.
List-view rollup (SharedListNetworkSummary) stays as a standard Postgres table — aggregation is SQL's home turf.
Reachability rollup (ContactReachability) stays as a standard Postgres table.

2.2 What's different

Cross-pair edges (ContactCoemploymentEdge, etc.) stored as AGE graph edges instead of Postgres tables.
Drill-in queries use Cypher via cypher() instead of recursive CTEs.
Shortest-path queries become available via Cypher's built-in shortestPath().
3-hop traversal becomes idiomatic instead of a recursive-CTE workaround.

3. Data model (graph schema)

One AGE graph named getro_network. Five vertex labels, four edge labels for v1.

3.0 What lives where

Before reading the schema, anchor on the storage boundary. Postgres remains the source of truth. AGE adds graph storage as a derived index for traversal queries — it does not replace the relational tables.

Layer	Where it lives	Why
Source data (facts about the world)	`contact_work_experiences` (PG) `contact_education` (PG) `UCC.shared` (PG) + duplicated as `:WORKED_AT`, `:STUDIED_AT`, `:SHARED_IN` edges in the AGE graph	PG is required for Rails ActiveRecord, Findem sync, and every non-graph feature. AGE gets its own copy because Cypher pattern matching needs graph-shaped storage. Same physical database, different storage layout. Sync via Sidekiq workers on `after_commit`.
Derived edges (inferred from source)	AGE only — computed at query time via Cypher pattern matching, OR materialized as `:COWORKER_OF` edges if read latency demands.	No `ContactCoemploymentEdge` Postgres table in this variant. Cypher's pattern match on `:WORKED_AT` edges replaces precomputation. Single source of derived truth — no PG ↔ AGE drift on inferred edges.
Per-pair connections (unified `ContactConnection`)	PG only — single `contact_connections` table with `kind` enum (work_overlap, email_exchange, meeting, linkedin_connection, etc.). Read by 008 strength rules; written by all derivation workers.	Per-pair lookups don't benefit from graph structure. PG keyed lookups via `(contact_a_id, contact_b_id, kind)` are fastest. AGE's `:COWORKER_OF` derived edges are a parallel representation for graph traversal only.
Rollup tables (list-view aggregations)	PG only — `SharedListNetworkSummary`, `ContactReachability`	Hot-path read budget (50ms p99) requires a single indexed Postgres lookup. Aggregation is SQL's home turf.

The duplication is in source data, not connection rows

This is the most-misread aspect of the variant. Adopting AGE means duplicating raw facts (work history, education, UCCs) across two storage layouts in the same Postgres instance. It does not mean duplicating ContactConnection rows — those stay in Postgres as the single source of truth for per-pair signals. AGE optionally materializes :COWORKER_OF graph edges for traversal performance; those are derived from the same kernel that writes ContactConnection rows.

3.1 Schema setup

-- One-time graph creation
LOAD 'age';
SET search_path = ag_catalog, "$user", public;
SELECT create_graph('getro_network');

-- Vertex labels
SELECT create_vlabel('getro_network', 'Contact');
SELECT create_vlabel('getro_network', 'Organization');
SELECT create_vlabel('getro_network', 'School');
SELECT create_vlabel('getro_network', 'TeamUser');
SELECT create_vlabel('getro_network', 'Collection');

-- Edge labels
SELECT create_elabel('getro_network', 'WORKED_AT');
SELECT create_elabel('getro_network', 'STUDIED_AT');
SELECT create_elabel('getro_network', 'SHARED_IN');     -- TeamUser→Contact via Collection
SELECT create_elabel('getro_network', 'CURRENT_AT');    -- Contact→Org for current employment

3.2 Vertex properties

Label	Properties	Source
`Contact`	id, name, primary_email_id	One per Getro `Contact` row
`Organization`	id, name, linkedin_url	One per `Organization` row
`School`	id, name	One per canonical school
`TeamUser`	id, email	One per `User` row
`Collection`	id, network_id	One per `Collection` row

3.3 Edge properties

Label	From → To	Properties	Source
`WORKED_AT`	Contact → Organization	date_from, date_to, title, is_current	One per `contact_work_experience` row
`STUDIED_AT`	Contact → School	date_from, date_to, degree	One per `contact_education` row
`SHARED_IN`	TeamUser → Contact	collection_id, source, connected_at	One per `UCC.shared` row
`CURRENT_AT`	Contact → Organization	started_at, title	Derived: WORKED_AT where date_to IS NULL

Coemployment is implicit, not stored

Unlike the canonical 008 plan that materializes ContactCoemploymentEdge rows, the AGE variant computes coemployment at query time via Cypher's pattern matching. Two contacts who worked at the same Organization with overlapping date ranges are connected by traversing (c1)-[:WORKED_AT]->(o)<-[:WORKED_AT]-(c2) with a date-overlap predicate. Saves write amplification; costs read latency.

3.4 Indexes

AGE supports B-tree indexes on vertex/edge properties via standard Postgres syntax:

CREATE INDEX ON getro_network."Contact" USING BTREE ((properties->>'id'));
CREATE INDEX ON getro_network."Organization" USING BTREE ((properties->>'id'));
CREATE INDEX ON getro_network."WORKED_AT" USING BTREE ((properties->>'organization_id'),
                                                       (properties->>'date_from'),
                                                       (properties->>'date_to'));

Property-based indexes are functional indexes on the underlying properties jsonb column. Performance is acceptable but slightly worse than native typed columns in vanilla Postgres tables.

4. Query patterns

Side-by-side comparison: canonical 008 (recursive CTEs in vanilla Postgres) vs the AGE variant (Cypher inside Postgres). Same physical database, different syntax.

4.1 Direct connection (1-hop)

Canonical Postgres

SELECT contact_id, current_title
FROM   collection_org_current_shared_contacts
WHERE  collection_id = $1
  AND  organization_id = $2;

Apache AGE

SELECT * FROM cypher('getro_network', $$
  MATCH (u:TeamUser)-[s:SHARED_IN]->(c:Contact)
        -[ca:CURRENT_AT]->(o:Organization)
  WHERE s.collection_id = $1
    AND o.id = $2
  RETURN c.id, ca.title
$$, $1::int, $2::int) AS (id agtype, title agtype);

For 1-hop, AGE is a step backward — Cypher syntax adds noise without traversal value. Recommendation: keep direct-connection queries in vanilla Postgres tables.

4.2 Work-overlap intro path (2-hop)

Canonical Postgres

-- Reads precomputed ContactCoemploymentEdge
SELECT via_contact_id, other_contact_id,
       organization_id, overlap_from, overlap_to
FROM   contact_coemployment_edges
WHERE  collection_id = $1
  AND  other_contact_id IN (
         SELECT contact_id
         FROM   collection_org_current_shared_contacts
         WHERE  collection_id = $1
           AND  organization_id = $2)
ORDER BY overlap_to DESC;

Apache AGE

-- Computes overlap at query time
SELECT * FROM cypher('getro_network', $$
  MATCH (u:TeamUser)-[s:SHARED_IN]->(via:Contact)
        -[w1:WORKED_AT]->(bridge:Organization)
        <-[w2:WORKED_AT]-(target:Contact)
        -[:CURRENT_AT]->(targetOrg:Organization)
  WHERE s.collection_id = $1
    AND targetOrg.id = $2
    AND w1.date_from <= w2.date_to
    AND w2.date_from <= w1.date_to
  RETURN via.id, target.id, bridge.id,
         w1.date_from, w1.date_to
  ORDER BY w1.date_to DESC
$$, ...) AS (...);

AGE is more readable for 2-hop. But it's also doing more work — every query recomputes overlaps that the canonical plan precomputes. Latency parity at depth 2 is roughly equal once indexes are tuned.

4.3 Extended intro path (3-hop)

Canonical Postgres (recursive CTE)

WITH RECURSIVE paths AS (
  SELECT contact_id AS via, 0 AS depth, ARRAY[]::int[] AS path
  FROM   user_contact_collections
  WHERE  collection_id = $1
    AND  source = 'shared'
  UNION ALL
  SELECT edge.other_contact_id, p.depth + 1,
         p.path || edge.organization_id
  FROM   paths p
  JOIN   contact_coemployment_edges edge
         ON edge.via_contact_id = p.via
  WHERE  p.depth < 3
)
SELECT * FROM paths
WHERE via IN (SELECT contact_id
              FROM collection_org_current_shared_contacts
              WHERE collection_id = $1
                AND organization_id = $2);

Apache AGE

SELECT * FROM cypher('getro_network', $$
  MATCH p = (u:TeamUser)-[:SHARED_IN]->
            (via:Contact)
            -[:WORKED_AT*1..3]->(bridge:Organization)
            <-[:WORKED_AT*1..3]-(target:Contact)
            -[:CURRENT_AT]->(o:Organization {id: $2})
  WHERE all(rel IN relationships(p)
            WHERE rel.date_from IS NOT NULL)
  RETURN p
  LIMIT 50
$$, ...) AS (path agtype);

This is where AGE earns its keep. The 3-hop CTE is verbose, brittle, and gets uglier with weights or filters. Cypher's path syntax is a clear improvement. Performance: AGE is ~2× faster than the recursive CTE at depth 3 with a typical branching factor.

4.4 Weighted shortest-path

SELECT * FROM cypher('getro_network', $$
  MATCH p = shortestPath(
    (u:TeamUser {id: $1})-[*..4]-(target:Contact)
    -[:CURRENT_AT]->(o:Organization {id: $2})
  )
  RETURN p, length(p) AS hops
$$, $1::int, $2::int) AS (path agtype, hops agtype);

AGE supports shortestPath(). Weighted variants require Dijkstra-style accumulator patterns; not as elegant as Neo4j but doable.

5. Heuristic mapping

Every signal the system computes — every strength clause, the reachability rollup, key connection picker, and all 007 connection-path types — mapped to where it actually executes in the AGE variant. The point of this section: most heuristics don't change at all. AGE earns its keep on a specific subset where graph traversal or pattern matching is the natural shape.

Bottom line for this section

Of 16 strength clauses + reachability + key connection + 6 connection-path types, AGE substantially helps with ~9: the W3/K6 LinkedIn-edge checks, the W5/K4/K5 small-org coemployment lookups, C3/C4 graph absence/presence checks, and all 6 connection-path types where graph traversal is the natural query. The remaining ~14 stay in vanilla Postgres because they're aggregation or counter-comparison queries.

5.1 Strength heuristics (008 — Warm / Known / Cold)

Each clause evaluates against an in-memory (team_user, contact) stats row. AGE only enters the picture when the clause needs a graph traversal (LinkedIn 1st-degree edge, coemployment shape, school-overlap fact).

ID	Clause	Source data	Substrate	AGE involvement
W1	2-way exchange in 12mo, 5+ each direction	`UserContactInteractionStats` counters	PG only	None — pure counter check
W2	Sustained 2-year back-and-forth (low volume)	`UserContactInteractionStats.email_activity_bitmap_12q`	PG only	None — bitmap analysis
W3	Recent email (3mo) + response + work OR LinkedIn connection	Stats + `WorkOverlapCache` + LinkedIn graph	Hybrid	AGE answers the LinkedIn-1st-degree branch via `MATCH (u)-[:LINKEDIN_CONNECTED]-(c)`; PG handles email recency. Cleaner than recursive CTE for the LinkedIn check.
W4	1+ calendar meeting in last 180d	`UserContactInteractionStats.meeting_count_180d`	PG only	None — counter check
W5	Same-team / small company overlap	`WorkOverlapCache` + org employee count	Hybrid	AGE finds coemployment edges where bridge org has <N total employees: `MATCH (a:Contact)-[:WORKED_AT]-(o:Organization)-[:WORKED_AT]-(b:Contact) WHERE size((o)<-[:WORKED_AT]-()) < 50`. PG can do this but joins are uglier.
K1	Occasional 2-way (3–10 emails total)	`UserContactInteractionStats`	PG only	None
K2	Inbound non-newsletter within 24mo	`UserContactInteractionStats.email_inbound_nnl_count_24mo`	PG only	None
K3	Meeting 180d–24mo ago	`UserContactInteractionStats.meeting_count_24mo`	PG only	None
K4	Same-team / small company overlap (looser window)	Same as W5 with looser threshold	Hybrid	Same Cypher pattern as W5 with relaxed predicate.
K5	Same company + location + function	Work history with location + role filters	Hybrid	Cypher pattern: `MATCH (a)-[w1:WORKED_AT]-(o)-[w2:WORKED_AT]-(b) WHERE w1.location = w2.location AND w1.function = w2.function`. Multi-property pattern match is where Cypher reads cleaner than SQL JOINs.
K6	LinkedIn connection AND (old work overlap OR shared school)	LinkedIn graph + `WorkOverlapCache` + `EducationOverlapCache`	Hybrid	One Cypher query expresses the OR-of-paths: `MATCH (u)-[:LINKEDIN_CONNECTED]-(c) MATCH (u)-[:WORKED_AT\|STUDIED_AT]-()-[:WORKED_AT\|STUDIED_AT]-(c)`. Equivalent SQL is a UNION of 3 different join patterns.
C1	One-way outbound only (no reply)	`UserContactInteractionStats`	PG only	None
C2	Inbound newsletter-style only	`UserContactInteractionStats.email_inbound_count_24mo` with `newsletter_flag` filter	PG only	None — counter check with newsletter exclusion
C3	LinkedIn 1st-degree with no other signal	LinkedIn edge + absence of other signals	Hybrid	AGE confirms the LinkedIn edge exists; PG confirms absence of email/meeting/coemployment signals. Cypher's `EXISTS` + negation pattern handles the "no other graph signal" check naturally.
C4	Past shared employment >5yr ago, no recent contact	`WorkOverlapCache` with date filter + email recency	Hybrid	Cypher finds the old coemployment edge: `MATCH (u)-[:WORKED_AT]-()-[:WORKED_AT]-(c) WHERE w.date_to < date() - 5*365 days`. PG checks email recency.
C5	Default fallback	Always	PG only	None

5.2 Reachability (008 — High / Medium / Low / None)

Reachability rolls up strength tiers across all team members for a given (contact, collection): count Warm / Known / Cold owners, apply tier rules, write to ContactReachability.

Substrate: PG only
Why: pure aggregation of UCC strength columns. SELECT collection_id, contact_id, COUNT(*) FILTER (WHERE strength_tier = 'warm') AS warm_count ...
AGE involvement: none. Aggregation is SQL's home turf; cypher() wrapping would slow this down without value.

5.3 Key Connection (008 — best person on team to ask for an intro)

Per (contact, collection), pick the team member with the strongest tie. The picker reads UCC strength tiers, applies a deterministic rule (Warm beats Known beats Cold; within tier, highest strength_score), writes ContactReachability.key_user_id.

Substrate: PG primary — the picker is a sort-and-pick on UCC rows.
AGE bonus opportunity: if you want "best path to the contact" (not just best direct UCC), AGE shines. Cypher's shortestPath() returns the cheapest weighted route through the graph — useful when the contact has no direct UCC owner but is reachable via 2-hop intro.
Honest assessment: V1 picks based on direct UCC; AGE doesn't help. V2 with weighted multi-hop key-connection lookup is where Cypher meaningfully wins.

5.4 Connection paths (007 — direct + intro paths)

This is where AGE genuinely earns its keep. Every 007 path type maps to a graph traversal, and Cypher pattern matching is more natural than recursive CTEs.

Path type	Spec phase	Substrate	Cypher shape
Direct connection (1-hop)	007 v1	PG rollup	List view reads `SharedListNetworkSummary`; AGE adds overhead without value at depth 1.
Work-overlap intro path (2-hop)	007 v1	AGE or PG	`MATCH (u)-[:SHARED_IN]->(via)-[w1:WORKED_AT]->(o)<-[w2:WORKED_AT]-(target)-[:CURRENT_AT]->(target_org)` with date-overlap predicate. Cleaner than precomputed-edge JOIN.
Education-overlap intro path	007 v2	AGE primary	Same pattern with `STUDIED_AT` as the bridge. Adding a new bridge type is one Cypher line; in PG it's a new edge table + worker.
Co-investment intro path	007 v3	AGE primary	`MATCH (u)-[:SHARED_IN]->(via)-[:INVESTED_IN]->(c)<-[:INVESTED_IN]-(target)` — same shape, different edge label.
Investor → Employee path	007 v3	AGE primary	Mixed-edge pattern: `(u)-[:SHARED_IN]->(investor)-[:INVESTED_IN]->(c)<-[:CURRENT_AT]-(employee)`. Mixed-type traversal is where Cypher really beats recursive CTEs.
Board peer path	007 v3	AGE primary	`BOARD_OF` as the bridge edge.
3-hop traversal (any combo)	future feature	AGE wins	`MATCH (u)-[:SHARED_IN]->(via)-[:WORKED_AT\|STUDIED_AT\|INVESTED_IN*1..3]-(target)`. One line. Recursive CTE equivalent is dozens of lines and degrades.
Weighted shortest path	future feature	AGE wins	Native `shortestPath()`. Returns the cheapest weighted route through any combination of edges.

5.5 Honest summary

Of 23 distinct queries the system needs to answer (16 strength clauses + reachability + key connection + 6 path types), AGE meaningfully helps on ~9. The remaining 14 stay in Postgres because they're either counter aggregations (most strength clauses) or rollup queries (reachability, list view).

Pure win zone: 007 connection-path queries (5–6 queries). Cypher is genuinely cleaner here.
Modest win zone: 008 hybrid clauses (W3, W5, K4, K5, K6, C3, C4) — 7 queries. Cypher shaves complexity but isn't transformative.
No-help zone: counter-aggregation strength clauses (W1, W2, W4, K1–K3, C1, C2, C5) and reachability rollup. Stay vanilla Postgres.

The decision becomes clear: AGE is worthwhile if 007 connection-path traversal is the dominant new feature. If the system is mostly 008 strength scoring (per-pair counters), AGE adds infrastructure for limited value.

6. Caching architecture

The three-layer cache strategy from canonical 008 §6 still applies. The difference is which layer lives where:

Layer	Canonical 008	AGE variant
Layer 2 — per-pair caches (`WorkOverlapCache`, etc.)	Postgres tables	Same — Postgres tables (no graph value-add for keyed lookups)
Layer 3 — cross-pair edges (`ContactCoemploymentEdge`, etc.)	Postgres tables	AGE graph edges (computed at query time from `WORKED_AT` patterns; or materialized as `:COWORKER_OF` edges if read latency demands)
Layer 4 — list rollups (`SharedListNetworkSummary`)	Postgres tables	Same — Postgres tables (aggregation is SQL's home turf)

The critical decision: do we materialize coemployment edges in AGE, or compute them at query time?

Compute at query time (recommended for v1): smaller graph, no write amplification for derived edges, pattern matching does the work. Read latency scales with branching factor.
Materialize as :COWORKER_OF edges (option for v2): faster reads at depth 3+, larger graph, write amplification on every WORKED_AT change. Worth it only if 3-hop drill-in becomes a hot path.

7. Operational envelope

Metric	Canonical 008	AGE variant	Delta
Storage per collection	~5 MB (v1) / ~25 MB (v2)	~15 MB (v1) / ~50 MB (v2) — graph metadata + jsonb properties	2–3× larger
List-view p99	~30 ms	~30 ms (still hits Postgres rollup)	Same
Drill-in 2-hop p99	~80 ms	~120 ms (Cypher overhead)	~50% slower at depth 2
Drill-in 3-hop p99	~1.5 s (recursive CTE)	~600 ms (Cypher pattern match)	~60% faster at depth 3
Drill-in 4-hop p99	~15 s (likely infeasible)	~3 s (still slow but tractable)	5× faster at depth 4
Shortest-path	Custom CTE work, ~1–3 s	Native `shortestPath()`, ~400 ms	Native support is the key win
Write amplification	1–30 UPSERTs per source change	1–10 UPSERTs (no derived-edge materialization)	Lower if computing edges at read time
New infrastructure	None	AGE extension (must be supported by hosting)	Small but real

8. Migration cost

Effort to move from canonical 008 to AGE variant. Assumes 008 v1 has shipped (Postgres tables with coemployment edges).

Phase	Work	Effort
1. Hosting feasibility	Confirm AGE is supported by current Postgres hosting (RDS managed Postgres typically does NOT support AGE; self-hosted, Aurora with custom, or migration to a provider that supports it). If managed-only, this phase blocks everything.	1–3 days investigation + possible infrastructure migration
2. Extension install + version pin	Install AGE, lock to a specific version. Update Rails db schema management to handle the extension lifecycle.	2–3 days
3. Graph schema definitions	Define vlabels, elabels, indexes. Migrations.	2 days
4. Sync layer (Postgres tables → AGE)	One-time migration: read existing `contact_work_experiences`, `contact_education`, `UCC`, etc. Emit AGE vertices/edges. Then ongoing `after_commit` hooks to keep them in sync.	5–7 days
5. Service-layer query migration	Replace 3-hop drill-in CTEs with Cypher equivalents. Add shortest-path queries. Keep 1-hop and 2-hop in vanilla Postgres.	5–7 days
6. Test fixtures + factories	FactoryBot already populates Postgres tables; need parallel graph fixtures or a derive-from-tables helper.	3 days
7. Production rollout + monitoring	Dual-read for canary period (compare Cypher vs CTE results), then promote.	3 days + 1 week canary

Total: ~3–4 weeks engineering, plus a 1-week canary. Add 1–3 weeks if hosting migration is required.

9. What you give up

Concern	Reality
Hosting compatibility	AWS RDS Postgres does NOT support AGE. Aurora Postgres does, with custom configuration. Self-hosted Postgres, Crunchy Bridge, and several others do. Heroku Postgres doesn't. Locks the hosting decision.
Ecosystem maturity	AGE is real, used in production by some companies, but the GitHub commit cadence is bursty. Rails ORM support is sparse — you'll write raw SQL with `cypher()` wrappers, not use Cypher through ActiveRecord.
Property typing	AGE stores vertex/edge properties as jsonb. Indexes are functional indexes on jsonb fields, slightly less efficient than native columns and harder to reason about.
Tooling	pgAdmin, DataGrip, etc. don't natively render Cypher results. Debugging graph queries means reading agtype output. Not a blocker; an ergonomic friction.
Dual query languages	Some queries stay SQL (rollups, aggregations); others become Cypher (traversal, shortest-path). Engineers context-switch. Mitigated by keeping the boundary clear: traversal ≥ 3 hops uses Cypher, everything else stays SQL.
Backups + restore	Standard `pg_dump` works. AGE-managed tables come along for the ride. No extra backup concerns.
Schema migrations	Rails schema.rb won't fully capture AGE labels and indexes; need a parallel migration tracker or carefully-written migration files that run AGE setup commands.

8.1 Hosting compatibility deserves its own paragraph

This is the most likely deal-breaker. Confirm with the team's infra owner that current Postgres hosting supports AGE before any further investment in this variant. If hosting requires migration, the effort estimate doubles and the operational complexity argument starts to look more like the Neo4j variant — at which point Neo4j may be the cleaner choice anyway.

10. Mini-ADR

ADR-AGE Adopt Apache AGE for graph traversal queries Hypothetical

Context

008 needs to support 3-hop intro-path traversal and weighted shortest-path. Recursive CTEs in vanilla Postgres degrade past depth 3. Adopting a separate graph database (Neo4j) adds significant operational overhead. AGE offers a middle path: graph capability inside the existing Postgres instance.

Decision

Install Apache AGE as a Postgres extension. Use Cypher (via cypher()) for traversal queries at depth ≥ 3 and for shortest-path. Keep Postgres tables for per-pair caches (Layer 2) and rollups (Layer 4). Compute coemployment edges at query time via Cypher pattern matching; materialize as :COWORKER_OF edges only if read latency demands.

Consequences

+ 3-hop drill-in becomes idiomatic; ~60% faster than recursive CTE.
+ Native shortestPath() for path-finding features.
+ No new database; same backups, same connection pool.
− Hosting must support AGE (RDS doesn't; Aurora with custom config does).
− Sparse Rails ecosystem; raw SQL with cypher() wrappers.
− Storage 2–3× larger for graph-stored edges (jsonb properties + graph metadata).
− 1-hop and 2-hop queries should stay in vanilla Postgres (Cypher adds noise without value at low depth).

Reversibility

Medium. Migration back to vanilla Postgres tables requires re-materializing coemployment edges and rewriting Cypher queries as recursive CTEs. ~2 weeks of work. The graph schema is well-defined and lossless to extract.

Trigger to revisit

If 4-hop or shortest-path-with-weights becomes the dominant query pattern; if AGE performance falls below Neo4j by >3× at depth 3+; if a hosting decision forces a Postgres migration that opens the door to a different stack.

Adopt only if

(a) Postgres hosting supports AGE, (b) 3-hop or shortest-path is a confirmed product requirement, (c) team has appetite to maintain a small ecosystem-maturity risk in exchange for staying single-DB.