Spec 039 · Contact Reachability

Reachability & Strongest Connection Filter

Turn the contacts list from a directory into an action list. Every row gets a "how easy is it for the team to get a warm intro?" badge that's filterable, sortable, and usable as an auto-update list rule.

Vocabulary first

Rollup: A single summary value computed from many smaller values. Here: combining each teammate's individual relationship score into one team-level number per contact.
Reachability: The team-level rollup. Answers "how easy is it for the team to get a warm intro to this contact?" Values: High | Medium | Low | None
Strongest Connection: The best individual teammate level for a contact. Values: Warm | Known | Cold | Inferred | Unknown (same vocabulary as the per-leg badge from 037).

TL;DR

Each contact gets a single Reachability value rolled up from per-teammate scores. It appears as a column on the contacts list, a chip on the contact profile, and as criteria for auto-update lists. A separate Strongest Connection filter lets you slice by the level of the best individual teammate connection. The data is precomputed into a new Postgres table (contact_relationship_rollups) and mirrored to OpenSearch so the list endpoint can filter and sort by it without recomputing on every request.

What ships in 039

Surface	Description	Figma
Reachability column on contacts list	`High / Medium / Low / None` badge per row	`61493:50523`
Reachability filter (multi-select, sidebar)	Filter contacts by team-level reachability	`61493:50722`
Reachability sort	Asc/desc on the column	—
Reachability chip on contact profile header	Same badge near the contact name	`61442:57969`
Strongest Connection filter (sidebar)	Filter by the best individual teammate level	`61493:50716`
Auto-update list rule criterion	Lists update themselves as reachability changes	—

Cut from 039

Strongest Connection column renderer + sort → 041-strongest-connection-rollup (data ships now via 039, UI later)
Key Connection chip on compact contact cards → 042-key-connection-chip
Reachability on shared lists — deliberately omitted from the shared-list serializer
Reachability customization (per-network threshold tuning) → 040-relationship-strength-customization

The data pipeline

Three layers, each owned by a different spec:

008                          037                                  039
contact_connections    →     RelationshipStrength::Scorer    →   contact_relationship_rollups
(raw graph data)             (per-leg Warm/Known/Cold)            (team-level High/Medium/Low/None)

008 populates contact_connections — work overlap, LinkedIn 1st-degree, and other primitive edges. Shipped.
037 turns each (team_member, contact) pair into a Warm/Known/Cold/Inferred/Unknown level. Shipped.
039 rolls up across all teammates in a collection to a single team-level value per contact.

The rollup, in plain English

Pick a contact, say Teagan Thomas. Her collection has four teammates, each with a per-leg level from 037:

Teammate	Relationship (from 037)
Sammie	Warm
Marcus	Cold
Priya	Known
Dave	Unknown (no signal)

039 collapses those four scores into one team-level answer: High (because at least one teammate has Warm).

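Baseline mapping (the count thresholds under Derivation can promote a contact further; for example, three Known legs also roll up to High):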

Team's best teammate score	Rollup value
Warm	High
Known	Medium
Cold	Low
Nobody has any signal	None

The rollup is scoped to (collection, contact). The same contact in a different collection can get a different value because the "team" is different.
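
To illustrate the scoping, here is a minimal Ruby sketch that groups per-leg levels by (collection, contact) and applies only the baseline mapping from the table above. The `Leg` struct, the helper name, and the teammate "Noor" are hypothetical, and the count-based promotion rules from the Derivation section are deliberately left out.

```ruby
# Rank per-leg levels so the "best" one can be picked with max_by.
LEVEL_RANK = { unknown: 0, inferred: 1, cold: 2, known: 3, warm: 4 }.freeze
# Baseline mapping only; anything weaker than Cold rolls up to :none.
BASELINE_ROLLUP = { warm: :high, known: :medium, cold: :low }.freeze

# Hypothetical per-leg row: one per (collection, teammate, contact).
Leg = Struct.new(:collection_id, :teammate, :contact, :level)

def baseline_rollups(legs)
  grouped = legs.group_by { |leg| [leg.collection_id, leg.contact] }
  grouped.transform_values do |group|
    best = group.map(&:level).max_by { |level| LEVEL_RANK.fetch(level) }
    BASELINE_ROLLUP.fetch(best, :none)
  end
end

legs = [
  Leg.new(1, "Sammie", "Teagan Thomas", :warm),
  Leg.new(1, "Marcus", "Teagan Thomas", :cold),
  Leg.new(1, "Priya",  "Teagan Thomas", :known),
  Leg.new(1, "Dave",   "Teagan Thomas", :unknown),
  Leg.new(2, "Noor",   "Teagan Thomas", :cold), # second collection, different team
]

rollups = baseline_rollups(legs)
puts rollups[[1, "Teagan Thomas"]] # high
puts rollups[[2, "Teagan Thomas"]] # low
```

The same contact ends up High in collection 1 and Low in collection 2 because each key carries its own team.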

Why a Postgres table (`contact_relationship_rollups`)

It's the cache layer. Three reasons it earns its keep:

1. Incremental updates need cached intermediate counts

The table stores more than the final answer: it also caches warm_count, known_count, cold_count, and inferred_count. When a single contact_connection changes:

Read cached counts
Adjust by ±1 for the changed leg
Recompute the level (still High because warm_count > 0)
Write back

That's O(1) per update. Without the table, each invalidation would have to walk every teammate, re-run the scorer, and recompute counts from scratch — ~10–100× more expensive.
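
The four steps above can be sketched in plain Ruby. This is an illustration under assumptions, not the worker's code: the `Rollup` struct and `apply_leg_change!` are hypothetical names, and the level rule reuses the thresholds listed under Derivation.

```ruby
# In-memory stand-in for one contact_relationship_rollups row.
Rollup = Struct.new(:warm_count, :known_count, :cold_count, :reachability,
                    keyword_init: true)

# Thresholds taken from the Derivation section of this spec.
def level_for(rollup)
  return :high   if rollup.warm_count >= 1 || rollup.known_count >= 3 || rollup.cold_count >= 10
  return :medium if rollup.known_count >= 1 || rollup.cold_count >= 5
  return :low    if rollup.cold_count >= 1
  :none
end

# `from` / `to` are the changed leg's old and new levels (nil = leg added or removed).
def apply_leg_change!(rollup, from:, to:)
  [[from, -1], [to, +1]].each do |level, delta|
    next unless %i[warm known cold].include?(level)
    field = "#{level}_count"
    rollup[field] = [rollup[field] + delta, 0].max # never go below zero
  end
  rollup.reachability = level_for(rollup)
  rollup
end

rollup = Rollup.new(warm_count: 2, known_count: 1, cold_count: 0, reachability: :high)
apply_leg_change!(rollup, from: :warm, to: nil)   # one Warm leg disappears
puts rollup.reachability # high (warm_count is still 1)
apply_leg_change!(rollup, from: :warm, to: :cold) # the last Warm leg cools off
puts rollup.reachability # medium (known_count is 1)
```

Each call touches at most two counters and one comparison chain, regardless of how many teammates the collection has.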

2. OpenSearch rebuilds happen

Mapping changes, version upgrades, accidental wipes — they all require reindexing. With the Postgres table, reindex = bulk export from a table. Without it, reindex = re-run the scorer for every contact in every collection (days of compute on a 1M-contact universe).

3. Different read paths want different stores

List endpoint (filter/sort over big result sets) → OpenSearch
Contact-detail endpoint (one row at a time) → Postgres
Debugging ("why is this contact rated High?") → SELECT * FROM contact_relationship_rollups ...
Auto-update list rules (Phase 4) → both, depending on predicate

Schema & how it links

Source of truth in Postgres, mirrored to OpenSearch for the list endpoint. Below are the full column shapes and the relationships to existing tables.

Columns — `contact_relationship_rollups`

Column	Type	Null?	Default	Notes
`id`	`bigint`	NOT NULL	`nextval`	Primary key
`collection_id`	`bigint`	NOT NULL	—	FK → `collections.id`, ON DELETE CASCADE
`contact_id`	`bigint`	NOT NULL	—	FK → `contacts.id`, ON DELETE CASCADE
`reachability`	`smallint`	NOT NULL	`0`	Enum: `0=none, 1=low, 2=medium, 3=high`
`strongest_connection`	`smallint`	NOT NULL	`0`	Enum: `0=unknown, 1=inferred, 2=cold, 3=known, 4=warm`
`warm_count`	`integer`	NOT NULL	`0`	Cached count of Warm legs — enables O(1) incremental recompute
`known_count`	`integer`	NOT NULL	`0`	… same for Known
`cold_count`	`integer`	NOT NULL	`0`	… same for Cold
`inferred_count`	`integer`	NOT NULL	`0`	… same for Inferred (2-hop secondary-leg structural matches)
`computed_at`	`timestamptz`	NOT NULL	`now()`	When the worker last refreshed this row
`created_at`	`timestamptz`	NOT NULL	`now()`	Standard Rails timestamp
`updated_at`	`timestamptz`	NOT NULL	`now()`	Standard Rails timestamp

How it links to existing tables

collections                 contact_relationship_rollups                contacts
+----+-------+               +----+----------------+-------------+       +----+------+
| id | name  | <-----FK------ | id | collection_id  | contact_id  | --FK--> | id | name |
+----+-------+               |    | reachability   |             |       +----+------+
                             |    | strongest_conn |             |
                             |    | warm_count     |             |
                             |    | known_count    |             |
                             |    | cold_count     |             |
                             |    | inferred_count |             |
                             |    | computed_at    |             |
                             +----+----------------+-------------+
                                       ▲
                                       │  after_commit (create/update/destroy)
                                       │
                             contact_connections (008)
                             +----+--------------+------------+----------+
                             | id | team_user_id | contact_id | kind     |
                             +----+--------------+------------+----------+
                                       │
                                       └── triggers RelationshipRollupRefreshWorker
                                           which re-runs 037's scorer per (collection, contact)
                                           and updates the rollup row above

collections ← collection_id defines the "team" (one rollup row per team's view of one contact).
contacts ← contact_id is the rollup's subject.
contact_connections (008) → not a direct FK, but every change here invalidates a rollup. The refresh worker re-runs 037's scorer for the affected (collection, contact) pair and writes the new counts/level.
user_contact_collections (UCC) → not linked at the row level (different cardinality — UCC is per-(user, contact, collection), rollup is per-(collection, contact)). Joins/leaves trigger the refresh worker but the row written here is the aggregate.

Indexes

Index	Columns	Purpose
`idx_…_collection_id_and_contact_id`	`(collection_id, contact_id)`	UNIQUE — one rollup per team/contact
`idx_…_collection_and_reachability`	`(collection_id, reachability)`	Filter "show me High" fast (Postgres-side)
`idx_…_collection_and_strongest`	`(collection_id, strongest_connection)`	Same, for Strongest connection filter

Derivation (state transitions)

The counts are computed from per-leg scorer output (037); the level enums are total functions of those counts.

warm_count     = direct_or_primary_leg_paths.count_where(strength == :warm)
known_count    = …(:known)
cold_count     = …(:cold)
inferred_count = 2_hop_secondary_leg_paths.count_where(strength == :inferred)

reachability =
  warm_count ≥ 1 OR known_count ≥ 3 OR cold_count ≥ 10 → :high
  known_count ≥ 1 OR cold_count ≥ 5                    → :medium
  cold_count ≥ 1                                        → :low
  else                                                  → :none

strongest_connection =
  warm_count    ≥ 1                              → :warm
  known_count   ≥ 1                              → :known
  cold_count    ≥ 1                              → :cold
  inferred_count ≥ 1 AND no direct evidence      → :inferred
  else                                            → :unknown
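
The derivation rules above translate directly into two small functions. The sketch below is plain Ruby with a hypothetical `Counts` struct; the thresholds are copied from the rules above, and evaluating them top-down makes the "no direct evidence" condition on Inferred implicit.

```ruby
Counts = Struct.new(:warm, :known, :cold, :inferred, keyword_init: true)

def reachability(c)
  return :high   if c.warm >= 1 || c.known >= 3 || c.cold >= 10
  return :medium if c.known >= 1 || c.cold >= 5
  return :low    if c.cold >= 1
  :none # inferred legs never raise reachability
end

def strongest_connection(c)
  return :warm     if c.warm >= 1
  return :known    if c.known >= 1
  return :cold     if c.cold >= 1
  return :inferred if c.inferred >= 1 # reached only when no direct evidence exists
  :unknown
end

examples = {
  "one warm leg"     => Counts.new(warm: 1, known: 0, cold: 0, inferred: 0),
  "three known legs" => Counts.new(warm: 0, known: 3, cold: 0, inferred: 0),
  "five cold legs"   => Counts.new(warm: 0, known: 0, cold: 5, inferred: 0),
  "inferred only"    => Counts.new(warm: 0, known: 0, cold: 0, inferred: 2),
}

examples.each do |label, c|
  puts "#{label}: #{reachability(c)} / #{strongest_connection(c)}"
end
# one warm leg: high / warm
# three known legs: high / known
# five cold legs: medium / cold
# inferred only: none / inferred
```

The middle two rows show where the two values diverge: enough Known or Cold legs promote Reachability, while Strongest Connection still reports the single best leg.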

Why Inferred only feeds Strongest Connection, not Reachability

Reachability answers "can the team reach this contact?" — an Inferred leg between two non-team contacts doesn't help the team. Strongest Connection answers "what's the best signal we have?" — Inferred is a meaningful research signal there, so it surfaces when nothing stronger exists.

OpenSearch mirror (additive fields on the contacts index)

The Postgres row is the source of truth; OpenSearch carries a denormalized copy for filter + sort on the contacts-list endpoint.

Field	Type	Purpose
`collections[].reachability`	`keyword`	Multi-select filter values: `none | low | medium | high`
`collections[].reachability_level`	`integer`	Enum value for sorting (0=None … 3=High). Mirrors the existing `seniority_level` pattern.
`collections[].strongest_connection`	`keyword`	Multi-select filter values: `unknown | inferred | cold | known | warm`. No `_level` companion in R1; strongest-connection sort is deferred to 041.

The fields are nested under the existing collections array (one entry per collection the contact belongs to), so a contact in two networks gets two reachability values — one per team.
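
As a rough picture of the mirrored document, the sketch below builds the `collections` entries for one contact that belongs to two collections. The three reachability-related field names come from the table above; the `id` key for the collection and the helper name are assumptions made for illustration.

```ruby
require "json"

# Integer companions for sorting, matching the smallint enum in Postgres.
REACHABILITY_LEVELS = { none: 0, low: 1, medium: 2, high: 3 }.freeze

# Builds one entry of the `collections` array on a contact document.
# NOTE: the `id` key is an assumed name, not taken from the spec.
def collection_entry(collection_id:, reachability:, strongest_connection:)
  {
    id: collection_id,
    reachability: reachability.to_s,
    reachability_level: REACHABILITY_LEVELS.fetch(reachability),
    strongest_connection: strongest_connection.to_s,
  }
end

doc_fragment = {
  collections: [
    collection_entry(collection_id: 1, reachability: :high, strongest_connection: :warm),
    collection_entry(collection_id: 2, reachability: :low,  strongest_connection: :cold),
  ],
}

puts JSON.pretty_generate(doc_fragment)
```

Keeping the keyword (`reachability`) and the integer (`reachability_level`) side by side lets the filter match on readable values while the sort compares numbers.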

Why we can't ship 039 without this table

It's tempting to "just compute on the fly" or "just store in OpenSearch." Both fail concretely.

What breaks if there's no table at all

Use case	Without the table
Render the reachability badge on a contacts page (50 rows)	✅ Works — compute per page from `contact_connections`
Filter "show me only High reachability"	❌ OpenSearch can't filter on a value it doesn't have. App-side filtering means loading the whole collection → pagination breaks.
Sort by reachability	❌ Same as above. Search engines sort on indexed fields.
Auto-update list rule "auto-add every High reachability contact at a Series B"	❌ Rule evaluator runs against OpenSearch — no field, no rule.
Strongest Connection filter	❌ Same OpenSearch constraint.
Contact profile chip stays fast	⚠ Works but every page load re-runs the scorer for that contact.
Sub-second incremental updates when a connection changes	❌ Re-walking every teammate per invalidation is 10–100× the work. Queue depth would blow out under any meaningful sync.
OpenSearch index rebuild	❌ Re-running the scorer for every contact in every collection = days of compute. With the table = bulk export = minutes.

Without filter and sort, the reachability column is decoration you can see but not act on, and acting on it is most of the product value. Shipping without the table either ships a half-feature or pays a large runtime cost on every list query.
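
The pagination problem can be seen with a toy dataset. The numbers below are made up for illustration (100 contacts, every tenth one rated High, 25 rows per page).

```ruby
# Hypothetical data: 100 contacts, every 10th one rated :high.
contacts = (1..100).map { |i| { id: i, reachability: (i % 10).zero? ? :high : :low } }
page_size = 25

# Paginate first, then filter in app code: the "page" shrinks unpredictably.
page_then_filter = contacts.first(page_size).select { |c| c[:reachability] == :high }

# Filter first, then paginate: correct, but only after loading every contact.
filter_then_page = contacts.select { |c| c[:reachability] == :high }.first(page_size)

puts page_then_filter.size # 2 rows on a 25-row page
puts filter_then_page.size # 10 rows, after scanning all 100
```

Either the page comes back nearly empty or the whole collection has to be loaded, which is why the filter needs an indexed field.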

Why removing the table after launch would be costly

If we shipped 039 with the Postgres rollup and later decided to remove it:

Read paths now expect it. Contact-detail endpoint, debug tools, and auto-update list evaluator all read from the table. Removing it is multi-PR work touching every consumer.
OpenSearch becomes load-bearing. Today OS is a mirror — if it's wrong, we re-export from Postgres. Without the table, OS becomes the source of truth. Every OS rebuild requires re-running the scorer for every contact (days). Every concurrent write needs CAS retry logic.
The intermediate warm_count / known_count cache is gone. Every invalidation walks every teammate again. With 800 jobs from a single LinkedIn sync, queue work multiplies by ~50×.
Migration off the table requires a full re-index of the OS contacts index — a planned maintenance event that requires the same backfill task we'd just deleted, plus a new "rollup-from-scratch" path inside the indexer.
Auto-update lists drift silently. Rules return different results day-over-day because the OS field is now eventually-consistent without a source-of-truth check.

Net: the table is a one-way door in both directions. Removing it later means the migration work above, and shipping without it means paying the same cost (build + backfill + migration plumbing) anyway, just later and under harder conditions (live customers, real data, tighter SLAs).

TL;DR

The Postgres rollup table isn't an architectural luxury. It's the only design that makes filter + sort + auto-update, fast incremental updates, and recoverable OpenSearch rebuilds all work at the same time. Skipping it ships either a half-feature or a future emergency.

Why an OpenSearch mirror too

The contacts list endpoint reads from OpenSearch, not from Postgres (the list view is the read-hot path with full-text search and complex filters across millions of rows). For Reachability and Strongest Connection to be filterable and sortable, the values have to live in the OpenSearch index — not just Postgres.

The mirror is denormalized as collections[].reachability and collections[].strongest_connection (nested array, one entry per collection the contact belongs to).

Why you can't skip the mirror

Filter "show me only High reachability" — OpenSearch can't filter on a field it doesn't have. Computing in app code would mean loading all 10k+ contacts and filtering in memory → defeats pagination.
Sort by reachability — same constraint. Sort happens in the search engine.
Auto-update list rules — the rule evaluator runs over OpenSearch.
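
For a sense of what the list endpoint would send, here is a sketch of a search body that filters and sorts on the mirrored fields. It assumes the `collections` array is mapped with the OpenSearch `nested` type and that each entry carries a `collections.id` field; neither detail is confirmed by this spec, and the helper name is hypothetical.

```ruby
require "json"

# Builds a search body: contacts in one collection, restricted to the given
# reachability values, ordered by the integer companion field.
def reachability_search_body(collection_id:, levels:, order: "desc")
  in_collection = { term: { "collections.id" => collection_id } } # assumed field name
  {
    query: {
      nested: {
        path: "collections",
        query: {
          bool: {
            filter: [
              in_collection,
              { terms: { "collections.reachability" => levels } },
            ],
          },
        },
      },
    },
    sort: [
      {
        "collections.reachability_level" => {
          order: order,
          # Sort only on the entry for this collection, not on other teams' values.
          nested: { path: "collections", filter: in_collection },
        },
      },
    ],
  }
end

body = reachability_search_body(collection_id: 1, levels: %w[high medium])
puts JSON.pretty_generate(body)
```

The nested filter inside the sort clause matters for contacts in several networks: without it, a High value from another team could influence the ordering.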

Workers

Worker	Trigger	What it does
`Contacts::RelationshipRollupRefreshWorker` (Sidekiq, debounced)	`contact_connections` `after_commit`; `UserContactCollection` join/leave	Recomputes one `(collection, contact)` rollup. Writes Postgres + triggers OS re-index.
`Maintenance::RelationshipStrength::BackfillRollupsTask`	One-time, on flag activation per collection	Populates `contact_relationship_rollups` for every contact in the collection. Idempotent + restartable.
`Maintenance::Opensearch::BackfillContactsReachability`	One-time, on flag activation per collection	Bulk-indexes the new field on the contacts OpenSearch index. Mirrors `UpdateContactsSeniority`.

Performance budget (SC-R-003)

Postgres rollup updates within 60s of a triggering change
OpenSearch document reflects within 90s of the same trigger
Backfill: 100k contacts in ≤ 30 min
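
The backfill budget implies a minimum sustained throughput, worked out below. The batch size of 1,000 is a hypothetical figure used only to show the per-batch time allowance.

```ruby
contacts       = 100_000
budget_seconds = 30 * 60

required_rate = contacts.fdiv(budget_seconds) # contacts per second
puts format("%.1f contacts/s", required_rate) # 55.6 contacts/s

# With a hypothetical batch size of 1,000, each batch may take at most:
batch_size        = 1_000
seconds_per_batch = budget_seconds / (contacts / batch_size)
puts seconds_per_batch # 18
```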

Onboarding sequence (strict order)

Per-collection rollout — when turning the feature on for a new network:

1. BuildConnectionGraphWorker (008)              ← populates contact_connections
   ↓
2. BackfillRollupsTask (039 P0)                  ← computes & stores rollups
   ↓
3. UpdateContactsMapping (OS mapping update)     ← adds the new field to the index
   ↓
4. BackfillContactsReachability (039 P1)         ← indexes the field for existing contacts
   ↓
5. Flip `relationship_strength` flag for the collection

If you flip the flag too early

Empty rollups everywhere → list shows "None" for every contact
Filter "High reachability" returns 0 results
Sort by reachability returns the same order as before
Users think the feature is broken

No automated orchestrator chains these today — it's a runbook. Worth documenting as a script in backend/docs/data-migrations.md.
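
A runbook script for the sequence could look roughly like the sketch below. The step names come from the sequence above; the lambdas are placeholders for the real job or task invocations, and `Step` and `run_rollout` are hypothetical names.

```ruby
Step = Struct.new(:name, :action)

# Runs steps in order and stops at the first failure, so the flag flip
# (always last) never happens on top of missing rollups or index fields.
def run_rollout(steps)
  steps.each_with_index do |step, index|
    puts "#{index + 1}. #{step.name}"
    raise "#{step.name} failed; later steps skipped" unless step.action.call
  end
  true
end

done = []
# Each lambda stands in for the real invocation and returns a truthy value on success.
steps = [
  Step.new("BuildConnectionGraphWorker (008)",      -> { done << :graph }),
  Step.new("BackfillRollupsTask (039 P0)",          -> { done << :rollups }),
  Step.new("UpdateContactsMapping",                 -> { done << :mapping }),
  Step.new("BackfillContactsReachability (039 P1)", -> { done << :os_backfill }),
  Step.new("Flip relationship_strength flag",       -> { done << :flag }),
]

run_rollout(steps)
```

Raising on the first failed step keeps the strict order enforceable: a failed backfill leaves the flag off instead of exposing empty rollups.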

What happens when a new admin joins a network

This is the heaviest event the system handles. Walk-through:

1. Admin added: a UserCollection row is created. The refresh worker fires for every contact in the collection. The new admin has no connections yet, so the jobs run but rollups don't change (harmless waste).
2. Admin connects integrations: LinkedIn sync, Google account. Background workers start pulling.
3. Connections flow in. For each connection that matches an existing contact in the collection:
   - A new contact_connections row is inserted
   - after_commit enqueues RelationshipRollupRefreshWorker(collection_id, contact_id)
   - The worker re-runs 037's scorer for that contact, writes the new rollup, and OS re-indexes
4. Users see the list update live: as the queue drains, badges upgrade from None / Low → Medium / High. Filter results shift. Auto-update lists gain or lose contacts.

Worst case

Admin with 5,000 LinkedIn connections, 800 matching existing contacts → 800 jobs enqueued.

Mitigations: debounce via sidekiq-unique-jobs (multiple inserts for the same contact within the window collapse to one job); per-contact granularity (each job is small); Sidekiq concurrency spreads load.
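
The debounce effect can be modeled without Sidekiq. The class below is a toy stand-in, not the sidekiq-unique-jobs API: it only captures "one pending job per (collection, contact)" and ignores the time window. The figure of three inserts per contact is a made-up example.

```ruby
require "set"

# Toy queue that keeps at most one pending job per (collection_id, contact_id).
class DebouncedQueue
  def initialize
    @pending = Set.new
    @jobs = []
  end

  # Returns true if a job was enqueued, false if an identical one is pending.
  def enqueue(collection_id, contact_id)
    key = [collection_id, contact_id]
    return false unless @pending.add?(key) # add? returns nil for duplicates
    @jobs << key
    true
  end

  def size
    @jobs.size
  end
end

queue = DebouncedQueue.new
# Hypothetical sync: 800 matching contacts, 3 connection inserts each.
800.times do |contact_id|
  3.times { queue.enqueue(1, contact_id) }
end

puts queue.size # 800 jobs for 2,400 inserts
```

The job count tracks the number of affected contacts rather than the number of inserted rows, which is what keeps the worst case at 800 jobs.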

You don't have to re-backfill the collection when an admin joins. The invalidation path handles incremental updates. The per-collection backfill task is only for "first time turning the flag on."

Phase breakdown

Phase	Owner	Ships
P0 — Postgres source of truth	BE	Rollup table + model + rollup service + refresh worker + backfill task
P1 — OpenSearch mirror	BE	Mapping addition + indexer hook + bulk-sync + backfill task
P2 — List endpoint extensions	BE	API filter + sort on reachability + strongest_connection
P3 — FE list surfaces + profile chip	FE	Column renderer, filter dropdown, sort affordance, ReachabilityBadge atom, profile chip
P4 — Auto-update list integration	BE	Rule evaluator accepts both criteria

Open questions worth a team conversation

Should the refresh worker get its own Sidekiq queue? Big-onboarding events could starve user-facing work. Probably yes — minor ops change for big resilience win.
Stale-read window communication — between a connection sync and the rollup landing, the list is "in between." We don't tell the user. Worth a subtle "Updating reachability…" indicator? Or accept the 60–90s lag as invisible?
Multi-admin onboarding — when 5 admins join the same week, queue work multiplies. Debounce helps but not enough at scale. Worth a write-up on whether we throttle integration syncs during heavy onboarding periods.

Alternatives considered (and rejected)

Alternative	Why rejected
Compute reachability on-the-fly per page (no precompute)	Works for column display only. Breaks filter, sort, auto-update rules.
Store cache in OpenSearch only (skip Postgres table)	OS isn't transactional — concurrent updates lose writes; rebuilds re-run the entire scorer.
Store on `UserContactCollection`	Wrong cardinality: UCC is per-`(user, contact, collection)`, reachability is per-`(collection, contact)`.
Materialized view + refresh	Stale until refresh; bad for "I just added a connection, where is it?"
Painless script field at query time	Needs massive denormalization first; slow at scale.