Spec 039 · Contact Reachability

Reachability & Strongest Connection Filter

Turn the contacts list from a directory into an action list. Every row gets a "how easy is it for the team to get a warm intro?" badge that's filterable, sortable, and usable as an auto-update list rule.

Vocabulary first

Rollup
A single summary value computed from many smaller values. Here: combining each teammate's individual relationship score into one team-level number per contact.
Reachability
The team-level rollup. Answers "how easy is it for the team to get a warm intro to this contact?" Values: High | Medium | Low | None
Strongest Connection
The best individual teammate level for a contact. Values: Warm | Known | Cold | Inferred | Unknown (same vocabulary as the per-leg badge from 037).

TL;DR

Each contact gets a single Reachability value rolled up from per-teammate scores. It appears as a column on the contacts list, a chip on the contact profile, and as criteria for auto-update lists. A separate Strongest Connection filter lets you slice by the level of the best individual teammate connection. The data is precomputed into a new Postgres table (contact_relationship_rollups) and mirrored to OpenSearch so the list endpoint can filter and sort by it without recomputing on every request.

What ships in 039

| Surface | Description | Figma |
|---|---|---|
| Reachability column on contacts list | High / Medium / Low / None badge per row | 61493:50523 |
| Reachability filter (multi-select, sidebar) | Filter contacts by team-level reachability | 61493:50722 |
| Reachability sort | Asc/desc on the column | |
| Reachability chip on contact profile header | Same badge near the contact name | 61442:57969 |
| Strongest Connection filter (sidebar) | Filter by the best individual teammate level | 61493:50716 |
| Auto-update list rule criterion | Lists update themselves as reachability changes | |

Cut from 039

The data pipeline

Three layers, each owned by a different spec:

008                          037                                  039
contact_connections    →     RelationshipStrength::Scorer    →   contact_relationship_rollups
(raw graph data)             (per-leg Warm/Known/Cold)            (team-level High/Medium/Low/None)

The rollup, in plain English

Pick a contact — Teagan Thomas. She has 4 teammates with relationships to her:

| Teammate | Relationship (from 037) |
|---|---|
| Sammie | Warm |
| Marcus | Cold |
| Priya | Known |
| Dave | None |

039 collapses those four scores into one team-level answer: High, because at least one teammate (Sammie) is Warm.

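Mapping table (simplified: under Derivation, volume can also promote a contact, e.g. three Known teammates or ten Cold teammates roll up to High):

| Team's best teammate score | Rollup value |
|---|---|
| Warm | High |
| Known | Medium |
| Cold | Low |
| Nobody has any signal | None |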

The rollup is scoped to (collection, contact). The same contact in a different collection gets a different value because the "team" is different.

Why a Postgres table (contact_relationship_rollups)

It's the cache layer. Three reasons it earns its keep:

1. Incremental updates need cached intermediate counts

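The table doesn't only store the final answer; it also stores warm_count, known_count, cold_count, and inferred_count. When a single contact_connection changes, only that leg's old and new strengths adjust the cached counts, and the level enums are re-derived from the counts.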

That's O(1) per update. Without the table, each invalidation would have to walk every teammate, re-run the scorer, and recompute counts from scratch — ~10–100× more expensive.
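A minimal sketch of that incremental path, assuming the refresh worker already knows the changed leg's previous and new 037 strength (how it obtains that delta is not specified here). The `apply_leg_change` helper and `COUNTED_STRENGTHS` constant are hypothetical; only the counters touched by the one leg move, and no other teammate is walked.

```ruby
# Hypothetical sketch: O(1) adjustment of the cached per-strength counters
# when a single leg changes. Assumes the caller passes the leg's old and new
# 037 strength; nil means "no leg" (the connection was created or destroyed).
COUNTED_STRENGTHS = %i[warm known cold inferred].freeze

def apply_leg_change(counts, old_strength:, new_strength:)
  updated = counts.dup
  # Remove the old contribution if it was one of the counted strengths.
  updated[old_strength] -= 1 if COUNTED_STRENGTHS.include?(old_strength)
  # Add the new contribution if it is one of the counted strengths.
  updated[new_strength] += 1 if COUNTED_STRENGTHS.include?(new_strength)
  updated
end

# Teagan Thomas before the change: Sammie Warm, Priya Known, Marcus Cold.
teagan = { warm: 1, known: 1, cold: 1, inferred: 0 }

# Marcus's leg upgrades from Cold to Warm: two counters move, no teammate walk.
teagan = apply_leg_change(teagan, old_strength: :cold, new_strength: :warm)
# teagan is now warm: 2, known: 1, cold: 0, inferred: 0
```

Re-deriving reachability and strongest_connection from the updated counts is a second constant-time step, since the level enums are total functions of the counts.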

2. OpenSearch rebuilds happen

Mapping changes, version upgrades, accidental wipes — they all require reindexing. With the Postgres table, reindex = bulk export from a table. Without it, reindex = re-run the scorer for every contact in every collection (days of compute on a 1M-contact universe).

3. Different read paths want different stores

Schema & how it links

Source of truth in Postgres, mirrored to OpenSearch for the list endpoint. Below are the full column shapes and the relationships to existing tables.

Columns — contact_relationship_rollups

| Column | Type | Null? | Default | Notes |
|---|---|---|---|---|
| id | bigint | NOT NULL | nextval | Primary key |
| collection_id | bigint | NOT NULL | | FK → collections.id, ON DELETE CASCADE |
| contact_id | bigint | NOT NULL | | FK → contacts.id, ON DELETE CASCADE |
| reachability | smallint | NOT NULL | 0 | Enum: 0=none, 1=low, 2=medium, 3=high |
| strongest_connection | smallint | NOT NULL | 0 | Enum: 0=unknown, 1=inferred, 2=cold, 3=known, 4=warm |
| warm_count | integer | NOT NULL | 0 | Cached count of Warm legs; enables O(1) incremental recompute |
| known_count | integer | NOT NULL | 0 | Same, for Known legs |
| cold_count | integer | NOT NULL | 0 | Same, for Cold legs |
| inferred_count | integer | NOT NULL | 0 | Same, for Inferred legs (2-hop secondary-leg structural matches) |
| computed_at | timestamptz | NOT NULL | now() | When the worker last refreshed this row |
| created_at | timestamptz | NOT NULL | now() | Standard Rails timestamp |
| updated_at | timestamptz | NOT NULL | now() | Standard Rails timestamp |

How it links to existing tables

collections                 contact_relationship_rollups                 contacts
+----+-------+              +----+----------------+------------+         +----+------+
| id | name  | <----FK----- | id | collection_id  | contact_id | --FK--> | id | name |
+----+-------+              |    | reachability   |            |         +----+------+
                            |    | strongest_conn |            |
                            |    | warm_count     |            |
                            |    | known_count    |            |
                            |    | cold_count     |            |
                            |    | inferred_count |            |
                            |    | computed_at    |            |
                            +----+----------------+------------+
                                        ▲
                                        │  after_commit (create/update/destroy)
                                        │
                            contact_connections (008)
                            +----+--------------+------------+----------+
                            | id | team_user_id | contact_id | kind     |
                            +----+--------------+------------+----------+
                                        │
                                        └── triggers RelationshipRollupRefreshWorker,
                                            which re-runs 037's scorer per (collection, contact)
                                            and updates the rollup row above

Indexes

| Index | Columns | Purpose |
|---|---|---|
| idx_…_collection_id_and_contact_id | (collection_id, contact_id) | UNIQUE; one rollup per (collection, contact) |
| idx_…_collection_and_reachability | (collection_id, reachability) | Fast Postgres-side filter, e.g. "show me High" |
| idx_…_collection_and_strongest | (collection_id, strongest_connection) | Same, for the Strongest Connection filter |

Derivation (state transitions)

The counts are computed from per-leg scorer output (037); the level enums are total functions of those counts.

warm_count     = direct_or_primary_leg_paths.count_where(strength == :warm)
known_count    = …(:known)
cold_count     = …(:cold)
inferred_count = 2_hop_secondary_leg_paths.count_where(strength == :inferred)

reachability =
  warm_count ≥ 1 OR known_count ≥ 3 OR cold_count ≥ 10  → :high
  known_count ≥ 1 OR cold_count ≥ 5                     → :medium
  cold_count ≥ 1                                        → :low
  else                                                  → :none

strongest_connection =
  warm_count     ≥ 1                                    → :warm
  known_count    ≥ 1                                    → :known
  cold_count     ≥ 1                                    → :cold
  inferred_count ≥ 1 AND no direct evidence             → :inferred
  else                                                  → :unknown
Why Inferred only feeds Strongest Connection, not Reachability

Reachability answers "can the team reach this contact?" — an Inferred leg between two non-team contacts doesn't help the team. Strongest Connection answers "what's the best signal we have?" — Inferred is a meaningful research signal there, so it surfaces when nothing stronger exists.
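The Derivation rules translate directly into two pure functions. This is a sketch in Ruby (the stack's language) with hypothetical method names; the thresholds and level names are copied from the pseudocode above, and nothing here is taken from the actual rollup service.

```ruby
# Sketch of the Derivation rules. Method names are hypothetical; thresholds
# and level symbols come from the spec's pseudocode.

def reachability(warm:, known:, cold:)
  # inferred_count is deliberately not an input: Inferred legs never raise Reachability.
  if warm >= 1 || known >= 3 || cold >= 10
    :high
  elsif known >= 1 || cold >= 5
    :medium
  elsif cold >= 1
    :low
  else
    :none
  end
end

def strongest_connection(warm:, known:, cold:, inferred:)
  if warm >= 1
    :warm
  elsif known >= 1
    :known
  elsif cold >= 1
    :cold
  elsif inferred >= 1
    # Earlier branches already ruled out any direct evidence.
    :inferred
  else
    :unknown
  end
end

# Teagan Thomas: Sammie (Warm), Priya (Known), Marcus (Cold); Dave adds nothing.
teagan = { warm: 1, known: 1, cold: 1 }
reachability(**teagan)                      # => :high
strongest_connection(**teagan, inferred: 0) # => :warm
```

A contact whose only evidence is an Inferred leg comes out as Reachability `:none` but Strongest Connection `:inferred`, which is the split the callout above describes.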

OpenSearch mirror (additive fields on the contacts index)

The Postgres row is the source of truth; OpenSearch carries a denormalized copy for filter + sort on the contacts-list endpoint.

| Field | Type | Purpose |
|---|---|---|
| collections[].reachability | keyword | Multi-select filter values: none / low / medium / high |
| collections[].reachability_level | integer | Enum value for sorting (0=None … 3=High). Mirrors the existing seniority_level pattern. |
| collections[].strongest_connection | keyword | Multi-select filter values: unknown / inferred / cold / known / warm. No _level companion in R1; strongest-connection sort is deferred to 041. |

The fields are nested under the existing collections array (one entry per collection the contact belongs to), so a contact in two networks gets two reachability values — one per team.
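The indexer hook's job is mostly a translation from the Postgres smallint enums to the mirror's keyword and integer fields. A minimal sketch, assuming the rollup row is available as a Hash and the existing collections[] entry as another Hash; `with_reachability_fields` and both constants are hypothetical names, and the integer encodings follow the columns table above.

```ruby
# Hypothetical indexer-hook sketch: merge the three additive fields into an
# existing collections[] entry of a contact's OpenSearch document.
REACHABILITY_NAMES = %w[none low medium high].freeze             # smallint 0..3
STRONGEST_NAMES    = %w[unknown inferred cold known warm].freeze # smallint 0..4

# `entry` is the existing per-collection hash; `rollup` stands in for a
# contact_relationship_rollups row with its smallint columns as Integers.
def with_reachability_fields(entry, rollup)
  additions = {
    # fetch raises IndexError on an out-of-range enum instead of indexing nil.
    "reachability"         => REACHABILITY_NAMES.fetch(rollup[:reachability]),
    "reachability_level"   => rollup[:reachability], # sort key: 0=None .. 3=High
    "strongest_connection" => STRONGEST_NAMES.fetch(rollup[:strongest_connection])
  }
  # No strongest_connection_level in R1: that sort is deferred to 041.
  entry.merge(additions)
end

existing_entry = { "id" => 42 } # "id" is a hypothetical pre-existing field
with_reachability_fields(existing_entry, { reachability: 3, strongest_connection: 4 })
```

For that example the merged entry carries reachability "high", reachability_level 3, and strongest_connection "warm", alongside whatever the entry already held. Reusing the Postgres integer as the sort level keeps one encoding across both stores.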

Why we can't ship 039 without this table

It's tempting to "just compute on the fly" or "just store in OpenSearch." Both fail concretely.

What breaks if there's no table at all

| Use case | Without the table |
|---|---|
| Render the reachability badge on a contacts page (50 rows) | ✅ Works: compute per page from contact_connections. |
| Filter "show me only High reachability" | ❌ OpenSearch can't filter on a value it doesn't have. App-side filtering means loading the whole collection, so pagination breaks. |
| Sort by reachability | ❌ Same as above. Search engines sort on indexed fields. |
| Auto-update list rule "auto-add every High reachability contact at a Series B" | ❌ The rule evaluator runs against OpenSearch: no field, no rule. |
| Strongest Connection filter | ❌ Same OpenSearch constraint. |
| Contact profile chip stays fast | ⚠ Works, but every page load re-runs the scorer for that contact. |
| Sub-second incremental updates when a connection changes | ❌ Re-walking every teammate per invalidation is 10–100× the work. Queue depth would blow out under any meaningful sync. |
| OpenSearch index rebuild | ❌ Re-running the scorer for every contact in every collection takes days of compute. With the table it's a bulk export that takes minutes. |

Without filter and sort, the reachability column is "decoration you can see but not act on", and acting on it is most of the product value. Shipping without the table either ships a half-feature or eats massive runtime cost on every list query.

Why removing the table after launch would be costly

If we shipped 039 with the Postgres rollup and later decided to remove it:

  1. Read paths now expect it. Contact-detail endpoint, debug tools, and auto-update list evaluator all read from the table. Removing it is multi-PR work touching every consumer.
  2. OpenSearch becomes load-bearing. Today OS is a mirror — if it's wrong, we re-export from Postgres. Without the table, OS becomes the source of truth. Every OS rebuild requires re-running the scorer for every contact (days). Every concurrent write needs CAS retry logic.
  3. The intermediate warm_count / known_count cache is gone. Every invalidation walks every teammate again. With 800 jobs from a single LinkedIn sync, queue work multiplies by ~50×.
  4. Migration off the table requires a full re-index of the OS contacts index — a planned maintenance event that requires the same backfill task we'd just deleted, plus a new "rollup-from-scratch" path inside the indexer.
  5. Auto-update lists drift silently. Rules return different results day-over-day because the OS field is now eventually-consistent without a source-of-truth check.

Net: shipping without the table is a one-way door. We'd be paying the cost (build + backfill + migration plumbing) anyway, just later and under harder conditions (live customers, real data, tighter SLAs).

TL;DR

The Postgres rollup table isn't an architectural luxury. It's the only design that makes filter, sort, and auto-update rules work alongside fast incremental updates and recoverable OpenSearch rebuilds. Skipping it ships either a half-feature or a future emergency.

Why an OpenSearch mirror too

The contacts list endpoint reads from OpenSearch, not from Postgres (the list view is the read-hot path with full-text search and complex filters across millions of rows). For Reachability and Strongest Connection to be filterable and sortable, the values have to live in the OpenSearch index — not just Postgres.

The mirror is denormalized as collections[].reachability and collections[].strongest_connection (nested array, one entry per collection the contact belongs to).

Why you can't skip the mirror
  • Filter "show me only High reachability" — OpenSearch can't filter on a field it doesn't have. Computing in app code would mean loading all 10k+ contacts and filtering in memory → defeats pagination.
  • Sort by reachability — same constraint. Sort happens in the search engine.
  • Auto-update list rules — the rule evaluator runs over OpenSearch.
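For a sense of what "filter and sort in the search engine" means here, a hypothetical query-body builder is sketched below. It assumes collections[] is mapped with OpenSearch's `nested` type (the spec calls it a nested array but doesn't show the mapping) and that each entry carries its collection's id as `collections.id`, which is an assumed field name; `reachability_query` is likewise hypothetical.

```ruby
# Hypothetical contacts-list query body: filter by reachability and Strongest
# Connection within one collection, sorted by reachability_level.
# Assumes collections[] is a `nested` field and "collections.id" exists.

def reachability_query(collection_id:, reachability: [], strongest: [], order: "desc")
  in_collection = { term: { "collections.id" => collection_id } }

  filters = [in_collection]
  filters << { terms: { "collections.reachability" => reachability } } unless reachability.empty?
  filters << { terms: { "collections.strongest_connection" => strongest } } unless strongest.empty?

  {
    query: {
      nested: {
        path: "collections",
        # One nested clause: every condition must match the same collections[] entry.
        query: { bool: { filter: filters } }
      }
    },
    sort: [
      {
        "collections.reachability_level" => {
          order: order,
          # Sort on this collection's entry only, not on any entry the contact has.
          nested: { path: "collections", filter: in_collection }
        }
      }
    ]
  }
end

body = reachability_query(collection_id: 42, reachability: %w[high medium])
```

Keeping the collection term and the value filters inside one nested query is the important design choice: split into separate nested clauses, they could match a contact that is High in some other collection. The nested sort repeats the collection term for the same reason.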

Workers

| Worker | Trigger | What it does |
|---|---|---|
| Contacts::RelationshipRollupRefreshWorker (Sidekiq, debounced) | contact_connections after_commit; UserContactCollection join/leave | Recomputes one (collection, contact) rollup. Writes Postgres + triggers OS re-index. |
| Maintenance::RelationshipStrength::BackfillRollupsTask | One-time, on flag activation per collection | Populates contact_relationship_rollups for every contact in the collection. Idempotent + restartable. |
| Maintenance::Opensearch::BackfillContactsReachability | One-time, on flag activation per collection | Bulk-indexes the new field on the contacts OpenSearch index. Mirrors UpdateContactsSeniority. |
Performance budget (SC-R-003)
  • Postgres rollup updates within 60s of a triggering change
  • OpenSearch document reflects within 90s of the same trigger
  • Backfill: 100k contacts in ≤ 30 min
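One way BackfillRollupsTask could be idempotent and restartable is an upsert keyed on the unique (collection_id, contact_id) index plus a persisted cursor. The sketch below is hypothetical and in-memory: a Hash stands in for the table, another for the checkpoint, `compute` for a call into the rollup service, and the batch size of 1,000 is illustrative. It also assumes contact ids are positive integers.

```ruby
# Hypothetical in-memory sketch of an idempotent, restartable backfill.
# `store` stands in for contact_relationship_rollups (keyed by the unique
# [collection_id, contact_id] pair); `checkpoint` for persisted progress.
def backfill_collection(collection_id, contact_ids, store:, checkpoint:, compute:, batch_size: 1_000)
  resume_after = checkpoint.fetch(collection_id, 0)
  pending = contact_ids.sort.select { |id| id > resume_after }

  pending.each_slice(batch_size) do |batch|
    batch.each do |contact_id|
      # Upsert on the unique key: replaying a batch overwrites, never duplicates.
      store[[collection_id, contact_id]] = compute.call(collection_id, contact_id)
    end
    # Advance the cursor only once the whole batch is written, so a crash
    # mid-batch replays that batch on restart.
    checkpoint[collection_id] = batch.last
  end
  store
end

rollups = {}
progress = {}
score = ->(_collection_id, _contact_id) { { reachability: 0, strongest_connection: 0 } }
backfill_collection(1, (1..2_500).to_a, store: rollups, checkpoint: progress, compute: score)
# A second run finds nothing past the checkpoint and writes nothing new.
backfill_collection(1, (1..2_500).to_a, store: rollups, checkpoint: progress, compute: score)
```

At the budgeted 100k contacts in 30 minutes (1,800 s), the task has to sustain roughly 56 contacts per second; with the illustrative batch size of 1,000 that is 100 batches averaging about 18 s each if they run sequentially.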

Onboarding sequence (strict order)

Per-collection rollout — when turning the feature on for a new network:

1. BuildConnectionGraphWorker (008)              ← populates contact_connections
   ↓
2. BackfillRollupsTask (039 P0)                  ← computes & stores rollups
   ↓
3. UpdateContactsMapping (OS mapping update)     ← adds the new field to the index
   ↓
4. BackfillContactsReachability (039 P1)         ← indexes the field for existing contacts
   ↓
5. Flip `relationship_strength` flag for the collection
If you flip the flag too early
  • Empty rollups everywhere → list shows "None" for every contact
  • Filter "High reachability" returns 0 results
  • Sort by reachability returns the same order as before
  • Users think the feature is broken

No automated orchestrator chains these today — it's a runbook. Worth documenting as a script in backend/docs/data-migrations.md.
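A runbook script could enforce the strict order by refusing to advance past an incomplete step. The sketch below is hypothetical: `Step`, `next_rollout_action`, and the stubbed checks stand in for real completion queries (for example, whether every contact in the collection has a rollup row), which the spec does not define.

```ruby
# Hypothetical runbook guard for the per-collection rollout. Each step pairs
# a name with a completion check; real checks would query Postgres/OpenSearch.
Step = Struct.new(:name, :done)

def next_rollout_action(steps)
  steps.each do |step|
    # Strict order: the first incomplete step is the only thing allowed next.
    return "run #{step.name}" unless step.done.call
  end
  "flip relationship_strength flag"
end

steps = [
  Step.new("BuildConnectionGraphWorker", -> { true }),
  Step.new("BackfillRollupsTask", -> { true }),
  Step.new("UpdateContactsMapping", -> { false }),
  Step.new("BackfillContactsReachability", -> { false })
]
next_rollout_action(steps) # => "run UpdateContactsMapping"
```

Because the flag flip is only returned once all four checks pass, the "flip too early" symptoms listed above can't be reached through this path.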

What happens when a new admin joins a network

This is the heaviest event the system handles. Walk-through:

  1. Admin added: UserCollection row created. Refresh worker fires for every contact in the collection. New admin has no connections yet, so jobs run but rollups don't change (harmless waste).
  2. Admin connects integrations — LinkedIn sync, Google account. Background workers start pulling.
  3. Connections flow in — for each connection that matches an existing contact in the collection:
    • New contact_connections row inserted
    • after_commit enqueues RelationshipRollupRefreshWorker(collection_id, contact_id)
    • Worker re-runs 037's scorer for that contact, writes new rollup, OS re-indexes
  4. Users see the list update live — as the queue drains, badges upgrade from None / Low → Medium / High. Filter results shift. Auto-update lists gain or lose contacts.
Worst case

Admin with 5,000 LinkedIn connections, 800 matching existing contacts → 800 jobs enqueued.

Mitigations: debounce via sidekiq-unique-jobs (multiple inserts for the same contact within the window collapse to one job); per-contact granularity (each job is small); Sidekiq concurrency spreads load.
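A toy model of that debounce, written in plain Ruby so it runs without Redis. It imitates a uniqueness lock held from enqueue until the job starts (the `until_executing` style of lock that sidekiq-unique-jobs offers), and it treats "the job is still queued" as the window, which is an assumption; `DebouncedQueue` is hypothetical and is not that gem's API.

```ruby
# Hypothetical in-memory model of the debounce: while a refresh job for a
# given (collection_id, contact_id) is still queued, further enqueues for the
# same pair are dropped.
class DebouncedQueue
  def initialize
    @pending = [] # FIFO of job arguments
    @locks = {}   # args => true while a job with those args is queued
  end

  def enqueue(collection_id, contact_id)
    args = [collection_id, contact_id]
    return false if @locks[args] # duplicate collapses into the queued job
    @locks[args] = true
    @pending << args
    true
  end

  # Pops the next job and releases its lock, so a change arriving while the
  # job runs can still enqueue a fresh refresh.
  def dequeue
    args = @pending.shift
    @locks.delete(args) if args
    args
  end

  def size
    @pending.size
  end
end

queue = DebouncedQueue.new
[7, 7, 8, 7].each { |contact_id| queue.enqueue(1, contact_id) }
queue.size # => 2
```

Three inserts for contact 7 and one for contact 8 collapse to two jobs. Debouncing only removes repeats: 800 distinct matching contacts still mean 800 small jobs, which is why per-contact granularity and Sidekiq concurrency matter as well.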

You don't have to re-backfill the collection when an admin joins. The invalidation path handles incremental updates. The per-collection backfill task is only for "first time turning the flag on."

Phase breakdown

| Phase | Owner | Ships |
|---|---|---|
| P0: Postgres source of truth | BE | Rollup table + model + rollup service + refresh worker + backfill task |
| P1: OpenSearch mirror | BE | Mapping addition + indexer hook + bulk-sync + backfill task |
| P2: List endpoint extensions | BE | API filter + sort on reachability + strongest_connection |
| P3: FE list surfaces + profile chip | FE | Column renderer, filter dropdown, sort affordance, ReachabilityBadge atom, profile chip |
| P4: Auto-update list integration | BE | Rule evaluator accepts both criteria |

Open questions worth a team conversation

Alternatives considered (and rejected)

| Alternative | Why rejected |
|---|---|
| Compute reachability on the fly per page (no precompute) | Works for column display only. Breaks filter, sort, and auto-update rules. |
| Store cache in OpenSearch only (skip Postgres table) | OS isn't transactional: concurrent updates lose writes, and rebuilds re-run the entire scorer. |
| Store on UserContactCollection | Wrong cardinality: UCC is per-(user, contact), reachability is per-(team, contact). |
| Materialized view + refresh | Stale until refresh; bad for "I just added a connection, where is it?" |
| Painless script field at query time | Needs massive denormalization first; slow at scale. |