Reachability & Strongest Connection Filter
Turn the contacts list from a directory into an action list. Every row gets a "how easy is it for the team to get a warm intro?" badge that's filterable, sortable, and usable as an auto-update list rule.
Vocabulary first
- Rollup: a single summary value computed from many smaller values. Here: combining each teammate's individual relationship score into one team-level value per contact.
- Reachability: the team-level rollup. Answers "how easy is it for the team to get a warm intro to this contact?" Values: High / Medium / Low / None.
- Strongest Connection: the best individual teammate level for a contact. Values: Warm / Known / Cold / Inferred / Unknown (same vocabulary as the per-leg badge from 037).
TL;DR
Each contact gets a single Reachability value rolled up from per-teammate scores. It appears as a column on the contacts list, a chip on the contact profile, and as criteria for auto-update lists. A separate Strongest Connection filter lets you slice by the level of the best individual teammate connection. The data is precomputed into a new Postgres table (contact_relationship_rollups) and mirrored to OpenSearch so the list endpoint can filter and sort by it without recomputing on every request.
What ships in 039
| Surface | Description | Figma |
|---|---|---|
| Reachability column on contacts list | High / Medium / Low / None badge per row | 61493:50523 |
| Reachability filter (multi-select, sidebar) | Filter contacts by team-level reachability | 61493:50722 |
| Reachability sort | Asc/desc on the column | — |
| Reachability chip on contact profile header | Same badge near the contact name | 61442:57969 |
| Strongest Connection filter (sidebar) | Filter by the best individual teammate level | 61493:50716 |
| Auto-update list rule criterion | Lists update themselves as reachability changes | — |
Cut from 039
- Strongest Connection column renderer + sort → 041-strongest-connection-rollup (data ships now via 039, UI later)
- Key Connection chip on compact contact cards → 042-key-connection-chip
- Reachability on shared lists (deliberately omitted from the shared-list serializer)
- Reachability customization (per-network threshold tuning) → 040-relationship-strength-customization
The data pipeline
Three layers, each owned by a different spec:
| Spec | Layer | Produces |
|---|---|---|
| 008 | contact_connections | Raw graph data |
| 037 | RelationshipStrength::Scorer | Per-leg Warm/Known/Cold |
| 039 | contact_relationship_rollups | Team-level High/Medium/Low/None |
- 008 populates contact_connections: work overlap, LinkedIn 1st-degree, and other primitive edges. Shipped.
- 037 turns each (team_member, contact) pair into a Warm/Known/Cold/Inferred/Unknown level. Shipped.
- 039 rolls up across all teammates in a collection to a single team-level value per contact.
The rollup, in plain English
Pick a contact: Teagan Thomas. Four teammates in the collection have been scored against her:
| Teammate | Relationship (from 037) |
|---|---|
| Sammie | Warm |
| Marcus | Cold |
| Priya | Known |
| Dave | Unknown (no signal) |
039 collapses those four scores into one team-level answer: High, because at least one teammate is Warm.
Mapping table:
| Team's best teammate score | Rollup value |
|---|---|
| Warm | High |
| Known | Medium |
| Cold | Low |
| Nobody has any signal | None |
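The mapping table is the simplified view: the full derivation (under Schema & how it links) also lets volume raise the level, so, for example, three Known legs count as High and five Cold legs count as Medium. The rollup is scoped to (collection, contact). The same contact in a different collection can get a different value because the "team" is different.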
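A minimal Ruby sketch of the simplified mapping table, scoped per (collection, contact). The leg hash shape, the contact id 42, and the names LEVEL_RANK, BEST_TO_ROLLUP, and simple_rollups are hypothetical; the count thresholds from the full derivation are intentionally left out, so this only models "best teammate level decides the rollup."

```ruby
# Per-leg levels from 037, ranked so the "best" one can be picked.
LEVEL_RANK = { unknown: 0, inferred: 1, cold: 2, known: 3, warm: 4 }.freeze
# Simplified mapping table; inferred/unknown fall through to :none (no team signal).
BEST_TO_ROLLUP = { warm: :high, known: :medium, cold: :low }.freeze

# legs: [{ collection_id:, contact_id:, teammate:, level: }, ...] (hypothetical shape)
# Returns { [collection_id, contact_id] => rollup }, one entry per team/contact pair.
def simple_rollups(legs)
  legs.group_by { |leg| [leg[:collection_id], leg[:contact_id]] }
      .transform_values { |group|
        best = group.map { |leg| leg[:level] }.max_by { |level| LEVEL_RANK.fetch(level) }
        BEST_TO_ROLLUP.fetch(best, :none)
      }
end

# Teagan (contact 42) in two collections with different teams.
legs = [
  { collection_id: 1, contact_id: 42, teammate: "Sammie", level: :warm },
  { collection_id: 1, contact_id: 42, teammate: "Marcus", level: :cold },
  { collection_id: 1, contact_id: 42, teammate: "Priya", level: :known },
  { collection_id: 1, contact_id: 42, teammate: "Dave", level: :unknown },
  { collection_id: 2, contact_id: 42, teammate: "Marcus", level: :cold }
]
rollups = simple_rollups(legs)
rollups[[1, 42]] # :high (Sammie is Warm)
rollups[[2, 42]] # :low (only Marcus, Cold, is on that team)
```

Grouping on the (collection_id, contact_id) pair is what makes the same contact come out High for one team and Low for another.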
Why a Postgres table (contact_relationship_rollups)
It's the cache layer. Three reasons it earns its keep:
1. Incremental updates need cached intermediate counts
The table doesn't only store the final answer: it also stores warm_count, known_count, cold_count, and inferred_count. When a single contact_connections row changes:
- Read cached counts
- Adjust by ±1 for the changed leg
- Recompute the level (in Teagan's case, still High because warm_count > 0)
- Write back
That's O(1) per update. Without the table, each invalidation would have to walk every teammate, re-run the scorer, and recompute counts from scratch — ~10–100× more expensive.
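The read/adjust/recompute/write loop can be sketched in plain Ruby. RollupRow, COUNTER_FOR, reachability_for, and apply_leg_change! are hypothetical; the Struct stands in for the Postgres row, and the reachability thresholds are the ones listed under Derivation. The point is that one leg change touches at most two counters, regardless of team size.

```ruby
# Hypothetical in-memory stand-in for one contact_relationship_rollups row:
# the cached counters plus the derived reachability level.
RollupRow = Struct.new(:warm_count, :known_count, :cold_count, :inferred_count,
                       :reachability, keyword_init: true)

# Per-leg level (037 vocabulary) -> cached counter column.
# :unknown has no counter, so changes to or from it touch nothing.
COUNTER_FOR = { warm: :warm_count, known: :known_count,
                cold: :cold_count, inferred: :inferred_count }.freeze

# Reachability thresholds as listed in the Derivation section.
def reachability_for(row)
  return :high if row.warm_count >= 1 || row.known_count >= 3 || row.cold_count >= 10
  return :medium if row.known_count >= 1 || row.cold_count >= 5
  return :low if row.cold_count >= 1
  :none
end

# O(1) update for a single changed leg: decrement the old level's counter,
# increment the new level's counter, then recompute the level.
# Pass from: nil for an inserted leg and to: nil for a deleted one.
def apply_leg_change!(row, from:, to:)
  old_col = COUNTER_FOR[from]
  new_col = COUNTER_FOR[to]
  raise ArgumentError, "#{old_col} would go negative" if old_col && row[old_col].zero?

  row[old_col] -= 1 if old_col
  row[new_col] += 1 if new_col
  row.reachability = reachability_for(row)
  row
end

# Teagan in one collection: Sammie Warm, Priya Known, Marcus Cold.
row = RollupRow.new(warm_count: 1, known_count: 1, cold_count: 1,
                    inferred_count: 0, reachability: :high)
apply_leg_change!(row, from: :cold, to: :known) # Marcus upgrades: still :high
apply_leg_change!(row, from: :warm, to: nil)    # Sammie's leg removed: now :medium
```

A real worker would also need the adjustment to be atomic (for example a single UPDATE or a row lock) so two concurrent jobs for the same row don't overwrite each other's counts.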
2. OpenSearch rebuilds happen
Mapping changes, version upgrades, accidental wipes — they all require reindexing. With the Postgres table, reindex = bulk export from a table. Without it, reindex = re-run the scorer for every contact in every collection (days of compute on a 1M-contact universe).
3. Different read paths want different stores
- List endpoint (filter/sort over big result sets) → OpenSearch
- Contact-detail endpoint (one row at a time) → Postgres
- Debugging ("why is this contact rated High?") → SELECT * FROM contact_relationship_rollups ...
- Auto-update list rules (Phase 4) → both, depending on predicate
Schema & how it links
Source of truth in Postgres, mirrored to OpenSearch for the list endpoint. Below are the full column shapes and the relationships to existing tables.
Columns — contact_relationship_rollups
| Column | Type | Null? | Default | Notes |
|---|---|---|---|---|
| id | bigint | NOT NULL | nextval | Primary key |
| collection_id | bigint | NOT NULL | — | FK → collections.id, ON DELETE CASCADE |
| contact_id | bigint | NOT NULL | — | FK → contacts.id, ON DELETE CASCADE |
| reachability | smallint | NOT NULL | 0 | Enum: 0=none, 1=low, 2=medium, 3=high |
| strongest_connection | smallint | NOT NULL | 0 | Enum: 0=unknown, 1=inferred, 2=cold, 3=known, 4=warm |
| warm_count | integer | NOT NULL | 0 | Cached count of Warm legs; enables O(1) incremental recompute |
| known_count | integer | NOT NULL | 0 | … same for Known |
| cold_count | integer | NOT NULL | 0 | … same for Cold |
| inferred_count | integer | NOT NULL | 0 | … same for Inferred (2-hop secondary-leg structural matches) |
| computed_at | timestamptz | NOT NULL | now() | When the worker last refreshed this row |
| created_at | timestamptz | NOT NULL | now() | Standard Rails timestamp |
| updated_at | timestamptz | NOT NULL | now() | Standard Rails timestamp |
How it links to existing tables
collections           contact_relationship_rollups                       contacts
+----+------+         +----+----------------------+------------+         +----+------+
| id | name | <--FK-- | id | collection_id        | contact_id | --FK--> | id | name |
+----+------+         |    | reachability         |            |         +----+------+
                      |    | strongest_connection |            |
                      |    | warm_count           |            |
                      |    | known_count          |            |
                      |    | cold_count           |            |
                      |    | inferred_count       |            |
                      |    | computed_at          |            |
                      +----+----------------------+------------+
                                        ▲
                                        │ after_commit (create/update/destroy)
                                        │
                      contact_connections (008)
                      +----+--------------+------------+------+
                      | id | team_user_id | contact_id | kind |
                      +----+--------------+------------+------+
                                        │
                                        └── triggers RelationshipRollupRefreshWorker
                                            which re-runs 037's scorer per (collection, contact)
                                            and updates the rollup row above
- collections ← collection_id defines the "team" (one rollup row per team's view of one contact).
- contacts ← contact_id is the rollup's subject.
- contact_connections (008) → not a direct FK, but every change here invalidates a rollup. The refresh worker re-runs 037's scorer for the affected (collection, contact) pair and writes the new counts/level.
- user_contact_collections (UCC) → not linked at the row level (different cardinality: UCC is per-(user, contact, collection), the rollup is per-(collection, contact)). Joins/leaves trigger the refresh worker, but the row written here is the aggregate.
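The links above say every contact_connections change invalidates a rollup without spelling out which ones. A minimal sketch of one plausible fan-out rule, assuming a change to the leg (team_user_id, contact_id) affects exactly the collections that contain both that teammate and that contact. The hash shapes and affected_rollup_keys are hypothetical, and 2-hop (Inferred) effects on other contacts are not modeled.

```ruby
# Hypothetical fan-out for one contact_connections change. Membership data is
# assumed to be available as plain hashes:
#   memberships:         { team_user_id => [collection_id, ...] }
#   contact_collections: { contact_id => [collection_id, ...] }
# A direct leg only matters to collections holding both the teammate and the contact.
def affected_rollup_keys(connection, memberships:, contact_collections:)
  user_cols = memberships.fetch(connection[:team_user_id], [])
  contact_cols = contact_collections.fetch(connection[:contact_id], [])
  (user_cols & contact_cols).map { |collection_id| [collection_id, connection[:contact_id]] }
end

# Each returned key would become one RelationshipRollupRefreshWorker job.
keys = affected_rollup_keys(
  { team_user_id: 7, contact_id: 42 },
  memberships: { 7 => [1, 2] },
  contact_collections: { 42 => [2, 3] }
)
# keys == [[2, 42]]: only collection 2 contains both the teammate and the contact
```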
Indexes
| Index | Columns | Purpose |
|---|---|---|
| idx_…_collection_id_and_contact_id | (collection_id, contact_id) | UNIQUE: one rollup per team/contact |
| idx_…_collection_and_reachability | (collection_id, reachability) | Filter "show me High" fast (Postgres-side) |
| idx_…_collection_and_strongest | (collection_id, strongest_connection) | Same, for the Strongest Connection filter |
Derivation (state transitions)
The counts are computed from per-leg scorer output (037); the level enums are total functions of those counts.
warm_count = direct_or_primary_leg_paths.count_where(strength == :warm)
known_count = …(:known)
cold_count = …(:cold)
inferred_count = 2_hop_secondary_leg_paths.count_where(strength == :inferred)
reachability =
warm_count ≥ 1 OR known_count ≥ 3 OR cold_count ≥ 10 → :high
known_count ≥ 1 OR cold_count ≥ 5 → :medium
cold_count ≥ 1 → :low
else → :none
strongest_connection =
warm_count ≥ 1 → :warm
known_count ≥ 1 → :known
cold_count ≥ 1 → :cold
inferred_count ≥ 1 AND no direct evidence → :inferred
else → :unknown
Reachability answers "can the team reach this contact?" — an Inferred leg between two non-team contacts doesn't help the team. Strongest Connection answers "what's the best signal we have?" — Inferred is a meaningful research signal there, so it surfaces when nothing stronger exists.
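The derivation maps directly onto plain Ruby. In this sketch the path hash shape ({ kind:, strength: }) and the names counts_for, reachability, strongest_connection, and rollup_columns are hypothetical stand-ins for 037's scorer output and the rollup service; the thresholds and smallint encodings are taken from this spec.

```ruby
# Hypothetical input: each path is { kind: :direct_or_primary | :two_hop_secondary,
# strength: Symbol }, standing in for 037's per-leg scorer output.

# smallint encodings from the contact_relationship_rollups column table.
REACHABILITY_ENUM = { none: 0, low: 1, medium: 2, high: 3 }.freeze
STRONGEST_ENUM = { unknown: 0, inferred: 1, cold: 2, known: 3, warm: 4 }.freeze

Counts = Struct.new(:warm, :known, :cold, :inferred, keyword_init: true)

def counts_for(paths)
  direct = paths.select { |p| p[:kind] == :direct_or_primary }
  two_hop = paths.select { |p| p[:kind] == :two_hop_secondary }
  Counts.new(
    warm: direct.count { |p| p[:strength] == :warm },
    known: direct.count { |p| p[:strength] == :known },
    cold: direct.count { |p| p[:strength] == :cold },
    inferred: two_hop.count { |p| p[:strength] == :inferred }
  )
end

# Team-level: inferred evidence is ignored, but volume can raise the level.
def reachability(c)
  return :high if c.warm >= 1 || c.known >= 3 || c.cold >= 10
  return :medium if c.known >= 1 || c.cold >= 5
  return :low if c.cold >= 1
  :none
end

# Best single signal: :inferred only surfaces when no direct evidence exists.
def strongest_connection(c)
  return :warm if c.warm >= 1
  return :known if c.known >= 1
  return :cold if c.cold >= 1
  return :inferred if c.inferred >= 1
  :unknown
end

# Values as they would be written to the rollup row's columns.
def rollup_columns(paths)
  c = counts_for(paths)
  {
    reachability: REACHABILITY_ENUM.fetch(reachability(c)),
    strongest_connection: STRONGEST_ENUM.fetch(strongest_connection(c)),
    warm_count: c.warm, known_count: c.known, cold_count: c.cold, inferred_count: c.inferred
  }
end

three_known = [{ kind: :direct_or_primary, strength: :known }] * 3
rollup_columns(three_known) # reachability 3 (High), strongest_connection 3 (Known)
```

Keeping the two levels as separate functions of the same counts is what lets an Inferred-only contact read as Reachability None but Strongest Connection Inferred.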
OpenSearch mirror (additive fields on the contacts index)
The Postgres row is the source of truth; OpenSearch carries a denormalized copy for filter + sort on the contacts-list endpoint.
| Field | Type | Purpose |
|---|---|---|
| collections[].reachability | keyword | Multi-select filter values: none, low, medium, high |
| collections[].reachability_level | integer | Enum value for sorting (0=None … 3=High). Mirrors the existing seniority_level pattern. |
| collections[].strongest_connection | keyword | Multi-select filter values: unknown, inferred, cold, known, warm. No _level companion in R1; strongest-connection sort is deferred to 041. |
The fields are nested under the existing collections array (one entry per collection the contact belongs to), so a contact in two networks gets two reachability values — one per team.
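A minimal Ruby sketch of both halves of the mirror: turning one contact's rollup rows into the additive collections[] fields, and a filter + sort request body scoped to one collection. It assumes collections is mapped as an OpenSearch nested field and that each entry carries a "collection_id" key; both are assumptions, as are the function names. With a plain object array, a term on the collection id and a terms filter on reachability could match two different entries, so "High in collection 2" would leak into collection 1's results; nested keeps each entry's fields together.

```ruby
require "json"

# Hypothetical: the row shape and the "collection_id" key inside each
# collections[] entry are assumptions, not the real indexer's field names.
REACHABILITY_LABELS = %w[none low medium high].freeze           # smallint 0..3
STRONGEST_LABELS = %w[unknown inferred cold known warm].freeze  # smallint 0..4

# One contact's rollup rows (one per collection) -> additive collections[] fields.
def collections_fragment(rows)
  rows.map do |row|
    {
      "collection_id" => row[:collection_id],
      "reachability" => REACHABILITY_LABELS.fetch(row[:reachability]),
      "reachability_level" => row[:reachability],
      "strongest_connection" => STRONGEST_LABELS.fetch(row[:strongest_connection])
    }
  end
end

# Filter on reachability labels and sort by reachability_level within one
# collection, assuming collections is mapped as a nested field.
def reachability_search_body(collection_id, labels, order: "desc")
  in_collection = { "term" => { "collections.collection_id" => collection_id } }
  {
    "query" => {
      "nested" => {
        "path" => "collections",
        "query" => {
          "bool" => {
            "filter" => [in_collection, { "terms" => { "collections.reachability" => labels } }]
          }
        }
      }
    },
    "sort" => [
      {
        "collections.reachability_level" => {
          "order" => order,
          "nested" => { "path" => "collections", "filter" => in_collection }
        }
      }
    ]
  }
end

puts JSON.pretty_generate(reachability_search_body(1, %w[high medium]))
```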
Why we can't ship 039 without this table
It's tempting to "just compute on the fly" or "just store in OpenSearch." Both fail concretely.
What breaks if there's no table at all
| Use case | Without the table |
|---|---|
| Render the reachability badge on a contacts page (50 rows) | ✅ Works — compute per page from contact_connections |
| Filter "show me only High reachability" | ❌ OpenSearch can't filter on a value it doesn't have. App-side filtering means loading the whole collection → pagination breaks. |
| Sort by reachability | ❌ Same as above. Search engines sort on indexed fields. |
| Auto-update list rule "auto-add every High reachability contact at a Series B" | ❌ Rule evaluator runs against OpenSearch — no field, no rule. |
| Strongest Connection filter | ❌ Same OpenSearch constraint. |
| Contact profile chip stays fast | ⚠ Works but every page load re-runs the scorer for that contact. |
| Sub-second incremental updates when a connection changes | ❌ Re-walking every teammate per invalidation is 10–100× the work. Queue depth would blow out under any meaningful sync. |
| OpenSearch index rebuild | ❌ Re-running the scorer for every contact in every collection = days of compute. With the table = bulk export = minutes. |
Without filter and sort, the reachability column is "decoration you can see but not act on," and acting on it is most of the product value. Shipping without the table either ships a half-feature or eats massive runtime cost on every list query.
Why removing the table after launch would be costly
Whether we shipped 039 without the Postgres rollup or shipped it and later tried to remove it, the costs land in the same places:
- Read paths now expect it. Contact-detail endpoint, debug tools, and auto-update list evaluator all read from the table. Removing it is multi-PR work touching every consumer.
- OpenSearch becomes load-bearing. Today OS is a mirror — if it's wrong, we re-export from Postgres. Without the table, OS becomes the source of truth. Every OS rebuild requires re-running the scorer for every contact (days). Every concurrent write needs CAS retry logic.
- The intermediate warm_count/known_count cache is gone. Every invalidation walks every teammate again. With 800 jobs from a single LinkedIn sync, queue work multiplies by ~50×.
- Migration off the table requires a full re-index of the OS contacts index: a planned maintenance event that requires the same backfill task we'd just deleted, plus a new "rollup-from-scratch" path inside the indexer.
- Auto-update lists drift silently. Rules return different results day-over-day because the OS field is now eventually-consistent without a source-of-truth check.
Net: shipping without the table is a one-way door. We'd be paying the cost (build + backfill + migration plumbing) anyway, just later and under harder conditions (live customers, real data, tighter SLAs).
The Postgres rollup table isn't an architectural luxury — it's the only design that makes filter + sort + auto-update and fast incremental updates and recoverable OpenSearch rebuilds all work at the same time. Skipping it ships either a half-feature or a future emergency.
Why an OpenSearch mirror too
The contacts list endpoint reads from OpenSearch, not from Postgres (the list view is the read-hot path with full-text search and complex filters across millions of rows). For Reachability and Strongest Connection to be filterable and sortable, the values have to live in the OpenSearch index — not just Postgres.
The mirror is denormalized as collections[].reachability and collections[].strongest_connection (nested array, one entry per collection the contact belongs to).
- Filter "show me only High reachability" — OpenSearch can't filter on a field it doesn't have. Computing in app code would mean loading all 10k+ contacts and filtering in memory → defeats pagination.
- Sort by reachability — same constraint. Sort happens in the search engine.
- Auto-update list rules — the rule evaluator runs over OpenSearch.
Workers
| Worker | Trigger | What it does |
|---|---|---|
| Contacts::RelationshipRollupRefreshWorker (Sidekiq, debounced) | contact_connections after_commit; UserContactCollection join/leave | Recomputes one (collection, contact) rollup. Writes Postgres + triggers OS re-index. |
| Maintenance::RelationshipStrength::BackfillRollupsTask | One-time per collection, run before the flag is flipped | Populates contact_relationship_rollups for every contact in the collection. Idempotent + restartable. |
| Maintenance::Opensearch::BackfillContactsReachability | One-time per collection, run before the flag is flipped | Bulk-indexes the new fields on the contacts OpenSearch index. Mirrors UpdateContactsSeniority. |
Targets:
- Postgres rollup updates within 60s of a triggering change
- OpenSearch document reflects the change within 90s of the same trigger
- Backfill: 100k contacts in ≤ 30 min
Onboarding sequence (strict order)
Per-collection rollout — when turning the feature on for a new network:
1. BuildConnectionGraphWorker (008) ← populates contact_connections
↓
2. BackfillRollupsTask (039 P0) ← computes & stores rollups
↓
3. UpdateContactsMapping (OS mapping update) ← adds the new field to the index
↓
4. BackfillContactsReachability (039 P1) ← indexes the field for existing contacts
↓
5. Flip `relationship_strength` flag for the collection
If the flag is flipped before steps 2–4 finish:
- Empty rollups everywhere → the list shows "None" for every contact
- Filter "High reachability" returns 0 results
- Sort by reachability returns the same order as before
- Users think the feature is broken
No automated orchestrator chains these today — it's a runbook. Worth documenting as a script in backend/docs/data-migrations.md.
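A minimal Ruby sketch of what the core of such a script could look like, assuming each runbook step is wrapped in a callable that blocks until its worker or task has finished (the real Sidekiq workers are asynchronous, so a real wrapper would have to wait or poll). Step and run_onboarding are hypothetical names, and the lambdas are stubs for the five runbook entries.

```ruby
# Hypothetical runbook runner for one collection. Each step returns truthy on
# success and is assumed to block until its worker or task has finished.
Step = Struct.new(:name, :action)

# Runs steps strictly in order and stops at the first failure, so the flag
# flip (the last step) can never run ahead of the backfills.
def run_onboarding(collection_id, steps)
  log = []
  steps.each do |step|
    ok = step.action.call(collection_id)
    log << [step.name, ok ? :ok : :failed]
    return [false, log] unless ok
  end
  [true, log]
end

# Stubs standing in for the real workers and tasks.
steps = [
  Step.new("BuildConnectionGraphWorker", ->(_id) { true }),
  Step.new("BackfillRollupsTask", ->(_id) { true }),
  Step.new("UpdateContactsMapping", ->(_id) { true }),
  Step.new("BackfillContactsReachability", ->(_id) { true }),
  Step.new("Flip relationship_strength flag", ->(_id) { true })
]
ok, log = run_onboarding(123, steps)
```

Stopping on the first failure is the whole point: every symptom in the list above comes from step 5 running without steps 2–4.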
What happens when a new admin joins a network
This is the heaviest event the system handles. Walk-through:
- Admin added: a UserCollection row is created. The refresh worker fires for every contact in the collection. The new admin has no connections yet, so jobs run but rollups don't change (harmless waste).
- Admin connects integrations: LinkedIn sync, Google account. Background workers start pulling.
- Connections flow in: for each connection that matches an existing contact in the collection:
  - A new contact_connections row is inserted
  - after_commit enqueues RelationshipRollupRefreshWorker(collection_id, contact_id)
  - The worker re-runs 037's scorer for that contact, writes the new rollup, and OS re-indexes
- Users see the list update live: as the queue drains, badges upgrade from None / Low → Medium / High. Filter results shift. Auto-update lists gain or lose contacts.
Admin with 5,000 LinkedIn connections, 800 matching existing contacts → 800 jobs enqueued.
Mitigations: debounce via sidekiq-unique-jobs (multiple inserts for the same contact within the window collapse to one job); per-contact granularity (each job is small); Sidekiq concurrency spreads load.
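The debounce mitigation collapses repeated refreshes for the same (collection_id, contact_id). The class below illustrates that collapse in plain Ruby; it is not how sidekiq-unique-jobs is implemented, and RefreshCoalescer is a hypothetical name. Here the "window" is the time a job spends queued: once a job starts, its key is released, so a change that lands mid-run schedules exactly one follow-up instead of being lost.

```ruby
# Illustration only: collapse duplicate refresh requests per (collection, contact)
# while a job for that key is still waiting in the queue.
class RefreshCoalescer
  attr_reader :queue

  def initialize
    @pending = {} # key => true while queued and not yet started
    @queue = []
  end

  # Returns true if a job was enqueued, false if it merged into a pending one.
  def enqueue(collection_id, contact_id)
    key = [collection_id, contact_id]
    return false if @pending[key]

    @pending[key] = true
    @queue << key
    true
  end

  # Pops the next job and releases its key, so a change that arrives while
  # the job runs schedules one follow-up refresh instead of being dropped.
  def start_next
    key = @queue.shift
    @pending.delete(key) if key
    key
  end
end

coalescer = RefreshCoalescer.new
[42, 42, 42, 43].each { |contact_id| coalescer.enqueue(1, contact_id) }
coalescer.queue # two jobs: [1, 42] and [1, 43]
```

Debouncing does not shrink the 800-job example: each insert there targets a different contact, so nothing collapses. The savings come from bursts of changes to the same contact.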
You don't have to re-backfill the collection when an admin joins. The invalidation path handles incremental updates. The per-collection backfill task is only for "first time turning the flag on."
Phase breakdown
| Phase | Owner | Ships |
|---|---|---|
| P0 — Postgres source of truth | BE | Rollup table + model + rollup service + refresh worker + backfill task |
| P1 — OpenSearch mirror | BE | Mapping addition + indexer hook + bulk-sync + backfill task |
| P2 — List endpoint extensions | BE | API filter + sort on reachability + strongest_connection |
| P3 — FE list surfaces + profile chip | FE | Column renderer, filter dropdown, sort affordance, ReachabilityBadge atom, profile chip |
| P4 — Auto-update list integration | BE | Rule evaluator accepts both criteria |
Open questions worth a team conversation
- Should the refresh worker get its own Sidekiq queue? Big-onboarding events could starve user-facing work. Probably yes — minor ops change for big resilience win.
- Stale-read window communication — between a connection sync and the rollup landing, the list is "in between." We don't tell the user. Worth a subtle "Updating reachability…" indicator? Or accept the 60–90s lag as invisible?
- Multi-admin onboarding — when 5 admins join the same week, queue work multiplies. Debounce helps but not enough at scale. Worth a write-up on whether we throttle integration syncs during heavy onboarding periods.
Alternatives considered (and rejected)
| Alternative | Why rejected |
|---|---|
| Compute reachability on-the-fly per page (no precompute) | Works for column display only. Breaks filter, sort, auto-update rules. |
| Store cache in OpenSearch only (skip Postgres table) | OS isn't transactional — concurrent updates lose writes; rebuilds re-run the entire scorer. |
| Store on UserContactCollection | Wrong cardinality: UCC is per-(user, contact, collection), reachability is per-(team, contact). |
| Materialized view + refresh | Stale until refresh; bad for "I just added a connection, where is it?" |
| Painless script field at query time | Needs massive denormalization first; slow at scale. |