# Spike: Email & Calendar Integration for Relationship Strength

**Ticket:** GET-109
**Author:** [Enaho]
**Date:** 2026-04-22
**Status:** Draft for team review

---

## 1. Executive summary

We're adding **email + calendar metadata ingestion** (Gmail, Google Calendar, Outlook Mail, Outlook Calendar) to power three relationship indicators on contacts:

1. **Connection strength** — per team-member × contact: **Warm / Known / Cold**
2. **Contact reachability** — per team × contact: **High / Medium / Low / None**
3. **Key connection** — the team member best positioned to intro

The work is more tractable than the PRD implies — several "we have to build everything" assumptions don't match reality. This doc captures what's already shipping, what's actually missing, and a phased plan with a viable thin-V1 that ships real value in ~6 weeks.

### Key decisions from spike

- **Keep Contact / ContactEmail structure unchanged.** Collection scoping stays.
- **Use Findem enrichment as the dedup oracle** (email → canonical identity → Getro Contact).
- **Two-layer scoring**: admin-configurable rules assign the tier; a weighted score ranks within tier.
- **Never create contacts from email metadata.** Unattributed emails stage in an "unresolved" state and resolve lazily when contacts are later imported.
- **Phased rollout**: thin V1 (email + calendar only) → add Findem resolver → add user enrichment → UI surfaces.

---

## 2. PRD corrections based on codebase research

The PRD implies a cold start. Research shows otherwise.

| PRD claim | Reality |
|---|---|
| "V1 G Suite integration was pulled back" | **Google Contacts sync is LIVE**, shipped Dec 2025, daily cron at 6am UTC. Feature flag `google_sync`. |
| "We need to add email/calendar OAuth scopes" | Scopes **already declared** in `GoogleAccount`: `gmail.metadata`, `calendar.readonly`. No Google Console work. |
| "Integration failed due to duplicates — must fix before shipping" | Duplicates are a **design consequence** of Collection-scoped Contacts, not a bug. Root-cause fix = external identity oracle (Findem). |
| "Must support both Google & Microsoft at launch" | Google is extension work; **Microsoft is entirely greenfield**. |
| "Integration points: dashboard, table, contact detail" | Admin-portal `/Settings/integrations` is **already per-user** (not network-scoped). Extension, not rebuild. |

### The real V1 problem

The V1 duplicate issue is real but misdiagnosed. Getro's `Contact` is collection-scoped: the same email in Collection A vs. Collection B creates two rows. When Google Contacts sync imports a user's address book into a collection, any person already present under a different email (or in another collection) can end up duplicated. The existing `MergeService` resolves duplicates after enrichment, but that approach is expensive and reactive.

**Solution**: resolve identity at ingestion time using Findem's canonical person lookup. Do not create contacts from email/calendar data. Let interactions sit unresolved until the contact exists in Getro.

---

## 3. Three indicators, one architecture

Per the PRD:

### Connection strength — user × contact

Fixed tiers in V1, rules-configurable in V2.

- **Warm**: "They know this person well and a warm intro is very likely to land"
- **Known**: "There's a real connection, but it's older or thinner; confirm before intro"
- **Cold**: "No meaningful relationship signal"

Each tier qualifies via a list of alternative conditions (OR). First matching tier wins (Warm → Known → Cold → default Cold).

Each team member can **override** their own strength for any contact.

### Contact reachability — team × contact

Pure aggregation over connection strengths in the team's collection.

- **High**: 1+ Warm, OR 3+ Known, OR 10+ Cold
- **Medium**: 1+ Known, OR 5+ Cold
- **Low**: 1+ Cold
- **None**: otherwise

Not user-editable in V1.

### Key connection — team × contact

Team member with the highest strength tier for this contact (tiebreak on weighted score).

---

## 4. Architecture

### Data flow

```text
┌─────────────────────────────────────────────────────────────────┐
│ INGESTION (per-user, metadata only)                             │
│                                                                  │
│   GoogleAccount    ──▶ GmailMetadataSyncer       ──┐            │
│   (existing, extend)   CalendarMetadataSyncer    ──┤            │
│   MicrosoftAccount ──▶ OutlookMailSyncer         ──┤            │
│   (new)                OutlookCalendarSyncer     ──┤            │
│                                                     ▼            │
│                                    ┌──────────────────────────┐ │
│                                    │ InteractionEvent (new)   │ │
│                                    │  user_id, provider,      │ │
│                                    │  kind (email | meeting), │ │
│                                    │  direction, occurred_at, │ │
│                                    │  thread/conv_id,         │ │
│                                    │  contact_email,          │ │
│                                    │  contact_id (nullable),  │ │
│                                    │  remote_message_id UQ    │ │
│                                    └──────────┬───────────────┘ │
└───────────────────────────────────────────────┼─────────────────┘
                                                │
┌───────────────────────────────────────────────┼─────────────────┐
│ RESOLUTION (email → Contact via Findem)       ▼                 │
│                                                                  │
│   FindemEnrichmentResolver                                      │
│     1. Check EnrichedEmailLookup cache (TTL-based)              │
│     2. If miss → Findem person-by-email lookup (F1)             │
│     3. Returns: linkedin_handle, known_emails[], findem_id      │
│     4. Lookup Getro Contact via:                                │
│         a. Contact.find_by(linkedin_handle) — GLOBAL unique     │
│         b. ExistingContactsByEmailsFinder over known_emails[]   │
│     5. If found → set contact_id on InteractionEvent            │
│     6. If not → leave NULL; revisit on Contact create           │
└───────────────────────────────────────────────┬─────────────────┘
                                                │
┌───────────────────────────────────────────────┼─────────────────┐
│ ROLLUP (materialized cache for scoring)       ▼                 │
│                                                                  │
│   UserContactInteractionStats (new)                             │
│     user_id, contact_id                                         │
│     email_outbound_count_12mo, email_inbound_count_12mo         │
│     email_outbound_count_24mo, email_inbound_count_24mo         │
│     email_twoway_count_total                                    │
│     email_first_at, email_last_at                               │
│     email_last_outbound_at, email_last_inbound_at               │
│     email_has_response_in_3mo (bool)                            │
│     email_active_quarters_count                                 │
│     email_active_quarters_consecutive                           │
│     meeting_count_180d, meeting_count_24mo                      │
│     meeting_last_at                                             │
│                                                                  │
│   Cadence: nightly full rebuild + incremental on insert         │
└───────────────────────────────────────────────┬─────────────────┘
                                                │
┌───────────────────────────────────────────────┼─────────────────┐
│ ENRICHMENT (via Findem, for user work/edu)    ▼                 │
│                                                                  │
│   UserEnrichedProfile, UserWorkExperience, UserEducation        │
│   Users::Enrichment::FindemSyncer                               │
│   WorkOverlapCalculator(user, contact) →                        │
│     { overlap_months, same_team, small_company, recency_years } │
│   EducationOverlapCalculator(user, contact) →                   │
│     { shared_schools[], overlap_years[] }                       │
└───────────────────────────────────────────────┬─────────────────┘
                                                │
┌───────────────────────────────────────────────┼─────────────────┐
│ SCORING (rule engine + weighted score)        ▼                 │
│                                                                  │
│   RelationshipStrengthService.call(user, contact)               │
│     → { tier: :warm | :known | :cold,                            │
│         score: Float,                                            │
│         reasons: [{ code, met, detail }, ...] }                 │
│                                                                  │
│   Persists to UCC: strength_tier, strength_score,               │
│     strength_override, strength_computed_at                     │
│                                                                  │
│   ReachabilityService.call(contact, collection)                 │
│     → { tier: :high | :medium | :low | :none,                    │
│         key_user_id,                                             │
│         warm_count, known_count, cold_count }                   │
│                                                                  │
│   Persists to ContactReachability (new)                         │
└─────────────────────────────────────────────────────────────────┘
```
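
The RESOLUTION stage in the diagram can be sketched in plain Ruby. This is a minimal sketch under stated assumptions, not the actual service: `ResolverSketch` and its injected callables are hypothetical stand-ins for `EnrichedEmailLookup`, the pending F1 lookup, `Contact.find_by(linkedin_handle)`, and `ExistingContactsByEmailsFinder`. Cache TTLs are left out, and the direct email lookup used when Findem has no identity is an assumption.

```ruby
# Hypothetical sketch of the RESOLUTION stage. Dependencies are injected as
# callables so the flow can be exercised without Rails or a live Findem API.
class ResolverSketch
  def initialize(cache:, findem_lookup:, find_by_handle:, find_by_emails:)
    @cache = cache                   # Hash standing in for EnrichedEmailLookup
    @findem_lookup = findem_lookup   # email -> { linkedin_handle:, known_emails: } or nil
    @find_by_handle = find_by_handle # handle -> contact_id or nil
    @find_by_emails = find_by_emails # [emails] -> contact_id or nil
  end

  # Returns a contact_id, or nil when the interaction must stay unresolved.
  def resolve(email)
    key = email.strip.downcase
    # Steps 1-2: consult the cache first; only call Findem on a miss.
    # Misses are cached too (as nil) so unknown emails are not re-queried.
    @cache[key] = @findem_lookup.call(key) unless @cache.key?(key)
    identity = @cache[key]
    # Assumption: with no Findem identity, try the observed email directly.
    return @find_by_emails.call([key]) if identity.nil?

    # Step 4a: the globally unique LinkedIn handle is the primary key.
    handle = identity[:linkedin_handle]
    by_handle = handle && @find_by_handle.call(handle)
    return by_handle if by_handle

    # Step 4b: fall back to any email Findem knows for this person.
    @find_by_emails.call(([key] + identity.fetch(:known_emails, [])).uniq)
  end
end
```

Returning `nil` corresponds to steps 5-6: the caller leaves `contact_id` NULL and revisits the event when a matching Contact is created.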

### Key design commitments

- **`InteractionEvent` is the single normalized store.** Gmail, Outlook, Google Calendar, MS Graph Calendar all land here. No `GmailInteraction` vs. `OutlookInteraction` split.
- **Strength is collection-agnostic at storage.** Scoping happens at read time. One user × contact pair has one strength.
- **Reachability is collection-scoped.** Pure SQL aggregation over UCCs in that collection.
- **Two-layer scoring**:
  - **Rules assign the tier**. Fixed in V1; admin-configurable in V2 per the PRD ("admins can set numbers on defined signals, can't add new signal types").
  - **Weighted score ranks within tier**. Numeric, hidden from users in V1.
  - Per Evan's thread: each signal has a strength normalized to 0-1 and a category weight; the score is the sum of weight × strength over satisfied signals. Exposing numbers to users = V2.
- **Explainability**: scorer returns `reasons[]` — all satisfied conditions, used for the "hover to see why" UX.
- **UCC extension, not replacement**: `strength_tier`, `strength_score`, `strength_override`, `strength_computed_at` added as columns.

---

## 5. Heuristic coverage matrix

All 16 heuristic clauses from the PRD, mapped to data sources across three possible build slices.

**Slice A** = Thin V1: email + calendar only, no Findem, no user enrichment.
**Slice B** = Slice A + Findem dedup resolver.
**Slice C** = Slice B + Findem user enrichment.

| Tier | Clause | Slice A | Slice B | Slice C |
|---|---|---|---|---|
| Warm | 2-way exchange 12mo, 5+ each direction | ✅ | ✅ | ✅ |
| Warm | Sustained 2-way over 2+ years, low volume | ✅ | ✅ | ✅ |
| Warm | Recent email + response + work OR LinkedIn | ❌ | ❌ | ⚠️ work-side only |
| Warm | 1+ calendar meeting last 180d | ✅ | ✅ | ✅ |
| Warm | Same-team / small company overlap | ❌ | ❌ | ✅ |
| Known | Occasional 2-way (3-10 total) | ✅ | ✅ | ✅ |
| Known | Inbound within 2yr (they reached out) | ✅ | ✅ | ✅ |
| Known | Meeting 180d-24mo ago | ✅ | ✅ | ✅ |
| Known | Same-team / small company overlap (looser window) | ❌ | ❌ | ✅ |
| Known | Same company + location + function | ❌ | ❌ | ✅ |
| Known | LinkedIn connection AND (old overlap OR shared school) | ❌ | ❌ | ⚠️ (needs LinkedIn degree) |
| Cold | One-way outbound only (no reply) | ✅ | ✅ | ✅ |
| Cold | Inbound newsletter-style only | ✅ | ✅ | ✅ |
| Cold | LinkedIn 1st-degree with no other signal | ❌ | ❌ | ❌ |
| Cold | Past shared employment >5yr, no recent contact | ❌ | ❌ | ✅ |
| Cold | Default fallback | ✅ | ✅ | ✅ |

**Activation rates**:
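- **Slice A (thin V1)**: 8 / 16 = 50%
- **Slice B (+ Findem dedup)**: 8 / 16 = 50% (attribution coverage improves but no new clauses activate)
- **Slice C (+ user enrichment)**: 14 / 16 = 88%

Note on counting: these figures leave out the always-active C5 default fallback, and the Slice C figure includes the two partially active clauses (W3, K6) alongside 12 fully active ones.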

**Remaining gap in Slice C**: LinkedIn 1st-degree connection data. There is no current Getro source. It may be available via the browser extension or Findem; this needs investigation.

### Attribution coverage by slice

- **Slice A**: email→Contact only matches when the email is already on a `ContactEmail` row. Realistic hit rate: **40-60%**. The rest sit as unresolved interactions.
- **Slice B + C**: Findem supplies canonical identity for emails not already in Getro. Realistic hit rate: **75-85%**.

---

## 6. Heuristic implementation — per clause

Every clause is a boolean predicate over the rollup tables. The rule engine evaluates clauses in order (Warm → Known → Cold → default); the first tier with ≥1 matching clause wins. All matched conditions across every tier are collected into `reasons[]` for the hover UI.
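
The evaluation order described above can be sketched as follows. This is a minimal illustration, not the planned `RelationshipStrengthService`: `CLAUSES`, `TIER_ORDER`, `evaluate_tier`, and the stats Hash keys are hypothetical names, and only three Slice A clauses are included.

```ruby
# Hypothetical clause table: [tier, code, predicate over a stats Hash].
# Thresholds mirror W1, K1, and C1 from this document.
CLAUSES = [
  [:warm,  :w1_two_way_12mo,  ->(s) { s[:in_12mo] >= 5 && s[:out_12mo] >= 5 }],
  [:known, :k1_occasional,    ->(s) { s[:twoway_total].between?(3, 10) }],
  [:cold,  :c1_outbound_only, ->(s) { s[:out_total] > 0 && s[:in_total].zero? }]
].freeze

TIER_ORDER = %i[warm known cold].freeze

def evaluate_tier(stats)
  # Evaluate every clause so reasons[] can explain all satisfied conditions,
  # including those from tiers that did not win.
  reasons = CLAUSES.map do |tier, code, predicate|
    { tier: tier, code: code, met: predicate.call(stats) }
  end
  matched = reasons.select { |r| r[:met] }
  # The first tier in Warm -> Known -> Cold order with a match wins;
  # :cold is the default when nothing matches (C5).
  tier = TIER_ORDER.find { |t| matched.any? { |r| r[:tier] == t } } || :cold
  { tier: tier, reasons: matched }
end
```

A pair that satisfies both W1 and K1 is assigned `:warm`, while `reasons` still lists both codes for the hover UI.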

The weighted score is a separate calculation used only for within-tier sorting. Each satisfied signal contributes `signal_weight × signal_strength` to the total. V1 weights fixed, V2 exposes them per Evan's thread.
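
As a rough illustration of the within-tier score, assuming each signal arrives with a strength already normalized to 0-1: the weights below are invented for the example (this document does not list the V1 weights), and `SIGNAL_WEIGHTS` and `weighted_score` are hypothetical names.

```ruby
# Hypothetical weights; the real V1 values are not specified in this doc.
SIGNAL_WEIGHTS = { email: 0.5, meeting: 0.3, work_overlap: 0.2 }.freeze

# signals: Hash of signal name => strength normalized to 0.0..1.0.
# Each satisfied signal contributes weight * strength to the total.
def weighted_score(signals, weights = SIGNAL_WEIGHTS)
  signals.sum do |name, strength|
    # Unknown signals contribute nothing; out-of-range strengths are clamped.
    weights.fetch(name, 0.0) * strength.clamp(0.0, 1.0)
  end
end
```

Because the score only ranks pairs inside a tier, changing weights can reorder the Key Connection tiebreak but never moves a pair between tiers.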

All counts below read from `UserContactInteractionStats` (the rollup). Work/education overlap reads from `WorkOverlapCache` / `EducationOverlapCache` — derived tables populated by a nightly overlap calculator (Slice C).

### Warm tier

#### W1. Two-way exchange in last 12 months, 5+ each direction

**Plain English**: You've had a genuine back-and-forth over the past year — at least 5 emails each direction.

**Technical**:
```ruby
stats.email_inbound_count_12mo >= 5 &&
  stats.email_outbound_count_12mo >= 5
```

**Rollup logic**: on each `InteractionEvent` insert where `kind = :email` and `occurred_at > 12.months.ago`, increment the direction counter. The nightly rebuild recomputes from `InteractionEvent` (source of truth), which also drops events that have aged out of the 12-month window.
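
A minimal sketch of the rebuild side of this rollup, in plain Ruby rather than ActiveRecord. The `Event` struct and `rebuild_email_counts_12mo` are hypothetical, and 12 months is approximated as 365 days to avoid depending on ActiveSupport.

```ruby
# Hypothetical event shape; the real InteractionEvent has more columns.
Event = Struct.new(:kind, :direction, :occurred_at, keyword_init: true)

# Recomputes the 12-month direction counters from raw events. Unlike
# increment-on-insert, this naturally excludes events that have aged out.
def rebuild_email_counts_12mo(events, now: Time.now)
  cutoff = now - 365 * 86_400 # approximates 12 months as 365 days
  recent = events.select { |e| e.kind == :email && e.occurred_at > cutoff }
  counts = recent.map(&:direction).tally
  {
    email_inbound_count_12mo:  counts.fetch(:inbound, 0),
    email_outbound_count_12mo: counts.fetch(:outbound, 0)
  }
end
```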

**Slice**: A (active in thin V1).

---

#### W2. Sustained 2-year back-and-forth (low volume)

**Plain English**: You've stayed in touch on and off over at least two years — not frequent, but a steady drumbeat.

**Technical**:
```ruby
stats.email_active_quarters_consecutive >= 8
```
A quarter is "active" iff it contains ≥1 inbound AND ≥1 outbound email. 8 consecutive = 2 years.

**Rollup logic**: maintain `email_activity_bitmap_12q` as a 12-bit integer (LSB = current quarter). On email insert, set the corresponding bit once that quarter has both an inbound and an outbound email. Derive `email_active_quarters_consecutive` as the longest run of set bits.
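
The bitmap derivation can be sketched as below. The helper names `mark_quarter_active` and `longest_active_run` are hypothetical; the sketch assumes bit 0 is the current quarter and bit 11 the oldest tracked quarter.

```ruby
QUARTERS_TRACKED = 12

# Sets the bit for a quarter; 0 = current quarter (LSB), 11 = oldest.
def mark_quarter_active(bitmap, quarters_ago)
  unless (0...QUARTERS_TRACKED).cover?(quarters_ago)
    raise ArgumentError, "quarters_ago out of range"
  end
  bitmap | (1 << quarters_ago)
end

# Longest run of consecutive set bits within the 12-bit window.
def longest_active_run(bitmap)
  longest = current = 0
  QUARTERS_TRACKED.times do |i|
    # Integer#[] reads bit i (0 or 1); a clear bit resets the current run.
    current = bitmap[i] == 1 ? current + 1 : 0
    longest = current if current > longest
  end
  longest
end
```

With this representation, W2 reduces to `longest_active_run(bitmap) >= 8`, and relaxing the rule to "6 of the last 8" would become a popcount over the low 8 bits instead of a run length.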

**Open**: relax to "6 of last 8" if 8-of-8 proves too strict. Javier flagged this ambiguity ("at least one email per quarter, not 4 emails 20 months ago").

**Slice**: A.

---

#### W3. Recent email (3mo) + response + work OR LinkedIn connection

**Plain English**: You emailed them recently, they replied, and you're linked in some other way — worked together or connected on LinkedIn.

**Technical**:
```ruby
stats.email_last_outbound_at > 90.days.ago &&
  stats.email_has_response_in_3mo &&
  (WorkOverlapCache.exists?(user:, contact:) ||
   UserLinkedinConnection.exists?(user:, contact:))
```

**`email_has_response_in_3mo` computation**: within last 90d, there exists an outbound email in thread T followed by an inbound email in the same thread T at a later timestamp.
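
A sketch of that computation over in-memory messages, assuming each message carries a thread id, a direction, and a timestamp. `Message` and `response_in_window?` are hypothetical names, and both the outbound email and the reply are required to fall inside the window.

```ruby
Message = Struct.new(:thread_id, :direction, :occurred_at, keyword_init: true)

# True when some thread has an outbound message inside the window that is
# followed by a later inbound message in the same thread.
def response_in_window?(messages, now: Time.now, window_days: 90)
  cutoff = now - window_days * 86_400
  threads = messages.select { |m| m.occurred_at > cutoff }.group_by(&:thread_id)
  threads.any? do |_thread_id, thread|
    # An inbound after the earliest outbound exists if and only if an
    # inbound after *some* outbound exists.
    first_outbound = thread.select { |m| m.direction == :outbound }
                           .map(&:occurred_at).min
    !first_outbound.nil? &&
      thread.any? { |m| m.direction == :inbound && m.occurred_at > first_outbound }
  end
end
```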

**Slice**: C partial — work-overlap half activates; LinkedIn half stays dormant until LinkedIn degree data lands.

---

#### W4. 1+ calendar meeting in last 180 days

**Plain English**: You've met with them via calendar in the last 6 months.

**Technical**:
```ruby
stats.meeting_count_180d >= 1
```

**Rollup logic**: count `InteractionEvent` rows where `kind = :meeting`, user and contact both in attendee list with `response_status != :declined`, within 180d.

**Slice**: A.

---

#### W5. Same-team or small-company overlap, recent

**Plain English**: You worked closely together — same team, or a small company where everyone knew each other — within the last few years.

**Technical**:
```ruby
overlap = WorkOverlapCache.where(user:, contact:).order(ended_at: :desc).first
overlap &&
  overlap.months >= 3 &&
  overlap.ended_at > 5.years.ago &&
  (overlap.same_team ||
   overlap.company_size < 50 ||
   overlap.both_c_level ||
   (overlap.same_location && overlap.same_job_function))
```

**Reads**: `WorkOverlapCache` (computed from `UserWorkExperience` × `ContactWorkExperience` on matching `organization_id` with date range intersection).

**Slice**: C.

---

### Known tier

#### K1. Occasional 2-way (3–10 emails total, any time)

**Plain English**: You've traded emails a few times over the years — real contact, but not frequent.

**Technical**:
```ruby
stats.email_twoway_count_total.between?(3, 10)
```

**Definition of "2-way email"**: `email_twoway_count_total` counts distinct threads in which this user × contact pair has both inbound and outbound messages, so the 3-10 range refers to threads rather than individual messages. Single-direction threads don't count.
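
For illustration, the per-thread count could be computed as below. `twoway_thread_count` is a hypothetical helper, and messages are modeled as plain Hashes with `:thread_id` and `:direction` keys.

```ruby
# Counts distinct threads containing both inbound and outbound messages.
def twoway_thread_count(messages)
  messages.group_by { |m| m[:thread_id] }.count do |_thread_id, thread|
    directions = thread.map { |m| m[:direction] }.uniq
    directions.include?(:inbound) && directions.include?(:outbound)
  end
end
```

K1 then becomes `twoway_thread_count(messages).between?(3, 10)` for a given user × contact pair.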

**Slice**: A.

---

#### K2. One-way inbound within 2 years (they reached out)

**Plain English**: They emailed you at some point in the last 2 years, and it wasn't a newsletter.

**Technical**:
```ruby
stats.email_inbound_non_newsletter_count_24mo >= 1
```

**Newsletter detection (runs at ingestion)**:
```ruby
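# Flag bulk mail from headers alone (metadata scope; no body access needed).
precedence = msg.header('Precedence').to_s.downcase   # header may be absent
newsletter = msg.header?('List-Unsubscribe') ||
             %w[bulk list junk].include?(precedence) ||
             msg.recipients.count > 20                # large recipient blasts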
```
The `newsletter` flag is persisted on `InteractionEvent` and excluded from the non-newsletter rollup counter.

**Slice**: A.

---

#### K3. Calendar meeting 180d–24mo ago

**Plain English**: You met with them 6 months to 2 years ago — not recent, not ancient.

**Technical**:
```ruby
(stats.meeting_count_24mo - stats.meeting_count_180d) >= 1
```

**Slice**: A.

---

#### K4. Same-team overlap (broader window, looser gates)

**Plain English**: You worked together in the past — maybe not recently, but the connection is real.

**Technical**: like W5 but `ended_at > 10.years.ago` and the c-level gate dropped.

**Slice**: C.

---

#### K5. Same company + same location + same job function

**Plain English**: You worked at the same company, same office, similar roles — bumped into each other, even if not the same team.

**Technical**:
```ruby
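# Any overlap row can qualify, not just the most recent one.
WorkOverlapCache.where(user:, contact:).any? do |overlap|
  overlap.same_company &&
    overlap.same_location &&
    overlap.same_job_function &&
    overlap.months >= 3
end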
```
No time-window restriction.

**Slice**: C.

---

#### K6. LinkedIn connection + (old overlap OR shared school)

**Plain English**: You're connected on LinkedIn, plus some historical overlap at work or school.

**Technical**:
```ruby
UserLinkedinConnection.exists?(user:, contact:) &&
  (WorkOverlapCache.exists?(user:, contact:) ||
   EducationOverlapCache.exists?(user:, contact:))
```

**Slice**: C partial (work/education sides OK; LinkedIn degree not available in V1).

---

### Cold tier

#### C1. One-way outbound only (no reply ever)

**Plain English**: You've sent emails, they never replied. Cold outreach.

**Technical**:
```ruby
stats.email_outbound_count_total > 0 &&
  stats.email_inbound_count_total == 0
```

**Slice**: A.

---

#### C2. Inbound newsletter-style only

**Plain English**: You're on a mailing list with them, but you've never had a real exchange.

**Technical**:
```ruby
stats.email_inbound_non_newsletter_count_total == 0 &&
  stats.email_inbound_newsletter_count_total > 0 &&
  stats.email_outbound_count_total == 0
```

**Slice**: A.

---

#### C3. LinkedIn 1st-degree with no other signal

**Plain English**: You're connected on LinkedIn only — nothing else.

**Technical**:
```ruby
UserLinkedinConnection.exists?(user:, contact:) &&
  stats.email_count_total == 0 &&
  stats.meeting_count_total == 0 &&
  !WorkOverlapCache.exists?(user:, contact:)
```

**Slice**: not activatable in V1 (no LinkedIn degree source).

---

#### C4. Past shared employment >5yr ago, no recent contact

**Plain English**: You worked together a long time ago and haven't kept in touch.

**Technical**:
```ruby
overlap = WorkOverlapCache.where(user:, contact:).order(ended_at: :desc).first
overlap &&
  overlap.ended_at < 5.years.ago &&
  (stats.email_last_at.nil? || stats.email_last_at < 2.years.ago) &&
  (stats.meeting_last_at.nil? || stats.meeting_last_at < 2.years.ago)
```

**Slice**: C.

---

#### C5. Default fallback

**Plain English**: No meaningful signal found — treat as cold.

**Technical**: no clause above matched → tier = `:cold`, reasons = empty.

**Slice**: A (always active as the fallback).

---

### Aggregates

#### Contact Breadth ("how many on your team know this person")

**Plain English**: "8 Inovians connected on LinkedIn, 3 have exchanged emails, 2 have met."

**Technical**:
```sql
SELECT
  COUNT(*) FILTER (WHERE ucc.strength_tier IS NOT NULL) AS total_known,
  COUNT(*) FILTER (WHERE stats.email_count_total > 0)   AS email_connected,
  COUNT(*) FILTER (WHERE stats.meeting_count_total > 0) AS met_count,
  COUNT(*) FILTER (WHERE ucc.linkedin_handle_matched)   AS linkedin_count
FROM user_contact_collections ucc
LEFT JOIN user_contact_interaction_stats stats
  ON (stats.user_id, stats.contact_id) = (ucc.user_id, ucc.contact_id)
WHERE ucc.contact_id    = :contact_id
  AND ucc.collection_id = :collection_id;
```

**Slice**: A (the UCC row + rollup are sufficient; Google Contacts sync already populates UCCs with `source: :google_contacts_import`).

---

#### Reachability (team × contact)

**Plain English**:
- **High**: very likely to get a warm intro — someone on the team knows them well.
- **Medium**: possible intro, team has real but thinner connections.
- **Low**: only weak signals; probably a cold reach.
- **None**: no one on the team is connected.

**Technical**:
```ruby
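# group(...).count returns string keys ("warm"), so symbolize them before the symbol lookups below.
counts = uccs_in_collection.group(:strength_tier).count.symbolize_keys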
reachability =
  if    counts[:warm].to_i  >= 1  || counts[:known].to_i >= 3 || counts[:cold].to_i >= 10 then :high
  elsif counts[:known].to_i >= 1  || counts[:cold].to_i  >= 5                             then :medium
  elsif counts[:cold].to_i  >= 1                                                          then :low
  else :none
  end
```

**Materialization**: `ContactReachability(contact_id, collection_id, tier, warm_count, known_count, cold_count, key_user_id, computed_at)`. Recomputed when any UCC strength tier changes in that collection, or on new Contact creation.

**Slice**: A.

---

#### Key Connection

**Plain English**: "The best person on your team to ask for an intro."

**Technical**:
```ruby
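# ELSE -1 keeps rows with a NULL (not yet computed) tier from sorting first:
# Postgres places NULLs first under DESC.
uccs_in_collection
  .order(Arel.sql("
    CASE strength_tier
      WHEN 'warm'  THEN 2
      WHEN 'known' THEN 1
      WHEN 'cold'  THEN 0
      ELSE -1
    END DESC,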
    strength_score DESC NULLS LAST
  "))
  .first
  &.user_id
```

**Slice**: A.

---

### Summary table: clause × dependencies × slice

| Clause | Reads | Slice |
|---|---|---|
| W1 | email counters 12mo | A |
| W2 | email activity bitmap | A |
| W3 | email recency + response + work/LinkedIn | C partial |
| W4 | meeting count 180d | A |
| W5 | work overlap cache | C |
| K1 | email 2-way total | A |
| K2 | inbound non-newsletter 24mo | A |
| K3 | meeting 180d vs 24mo delta | A |
| K4 | work overlap cache | C |
| K5 | work overlap cache (company+loc+function) | C |
| K6 | LinkedIn + work/education overlap | C partial |
| C1 | email counts all-time | A |
| C2 | email counts (newsletter split) | A |
| C3 | LinkedIn + absence-of-signal | not V1 |
| C4 | old work overlap + email/meeting absence | C |
| C5 | default fallback | A |
| Breadth | UCC + stats per collection | A |
| Reachability | UCC.strength_tier counts per collection | A |
| Key Connection | UCC ordered by tier + score | A |

---

## 7. Findem integration surface

Every Findem capability the design consumes. Some live, some need Findem to confirm.

| ID | Capability | Getro consumer | Status |
|---|---|---|---|
| F1 | Person lookup by email | Dedup resolver | **Pending Findem** |
| F2 | Person enrichment by LinkedIn handle | Contact enrichment | ✅ Live |
| F3 | Enrichment when only email is known (no handle) | Contact enrichment fallback | **Pending Findem** |
| F4 | User enrichment (same APIs as contacts) | `UserEnrichedProfile`, `UserWorkExperience`, `UserEducation` | **Pending Findem** |
| F5 | Company details (size, stage, industry) | `ContactWorkExperience` → extend to organizations | ⚠️ Partial |
| F6 | Company investor / cap-table data | Investor-overlap signal | **Pending Findem** |
| F7 | Webhooks on profile updates | Strength re-roll on enrichment changes | ✅ Framework exists |
| (bonus) | LinkedIn connection-degree data | Close the one V1 gap | **Pending Findem** (long shot) |

### Questions to ask Findem

1. **F1**: Does person-by-email lookup exist? Rate limits, latency, cost per call, response schema?
2. **F3**: Can we enrich with just `email` (no handle, no name)? What's the fallback response when email is unknown?
3. **F4**: Are there TOS / product concerns with enriching our own authenticated users vs. external contacts? Same endpoint?
4. **F6**: Does Findem expose investor / funding round data on companies? What shape?
5. **LinkedIn degree**: Does Findem track 1st-degree LinkedIn connections for a person anywhere?

---

## 8. External integrations: build status

### Google Workspace (extension of live integration)

| Component | Status |
|---|---|
| OAuth client, GoogleAccount model, token encryption | ✅ Live |
| Scopes for `gmail.metadata` and `calendar.readonly` | ✅ Declared, not ingested |
| People API contact sync | ✅ Live |
| Admin-portal Google card, OAuth flow, polling | ✅ Live |
| Per-scope UI toggles (email / calendar separate from contacts) | 🔧 Build |
| Gmail metadata client (`lib/google/gmail_metadata_client.rb`) | 🔧 Build |
| Calendar metadata client (`lib/google/calendar_metadata_client.rb`) | 🔧 Build |
| Syncers, schedulers, workers for both | 🔧 Build |
| Newsletter / bulk inbound filter | 🔧 Build |
| CircuitBox wrap retrofit | 🔧 Build |
| Feature flags: `gmail_metadata_sync`, `google_calendar_metadata_sync` | 🔧 Build |

### Microsoft 365 / Outlook (greenfield)

Every component is new.

- Azure AD app registration + multi-tenant strategy
- `MicrosoftAccount` model + migration
- `lib/microsoft/graph_client.rb` base client
- `lib/microsoft/mail_metadata_client.rb` (scope: `Mail.ReadBasic`)
- `lib/microsoft/calendar_metadata_client.rb` (scope: `Calendars.ReadBasic`)
- OAuth callback controller + Signet-based refresh flow
- Syncers, schedulers, workers for both mail and calendar
- CircuitBox wrap
- Admin-portal Microsoft card
- RTK Query service `userMicrosoftAccountsV2`
- Feature flag `microsoft_sync`
- Env vars: `MICROSOFT_CLIENT_ID`, `MICROSOFT_CLIENT_SECRET`, `MICROSOFT_TENANT_ID`
- `docs/integrations.md` network egress registry row

### Findem (build after Findem confirms capabilities)

- CircuitBox wrap retrofit (verify gap)
- `lib/findem/apis/profile.rb` — `lookup_by_email` method
- `EnrichedEmailLookup` cache table + model
- `Users::Enrichment::FindemSyncer`
- `UserEnrichedProfile` / `UserWorkExperience` / `UserEducation` models + migrations
- User-profile-updated webhook handler
- `Organizations::Enrichment::FindemInvestorSyncer` (conditional on F6)

### Resolver / glue layer

- `InteractionEvent` table + model
- `FindemEnrichmentResolver` service (Slice B+)
- Simple `ContactEmailResolver` fallback (Slice A)
- Backfill-on-Contact-create hook
- Backfill-on-Contact-merge hook
- Email normalization utility

---

## 9. Phased execution plan

### Recommended path: ship Slice A first, then upgrade

| Phase | Scope | Depends on | Est. effort | Slice |
|---|---|---|---|---|
| **1** | `InteractionEvent` table + `ContactEmailResolver` (simple, no Findem) + backfill hooks | — | S | A |
| **2** | Gmail metadata syncer + Google Calendar syncer + scope toggle UI + CircuitBox retrofit | 1 | M | A |
| **3** | `UserContactInteractionStats` + nightly rollup + incremental-on-insert | 2 | M | A |
| **4** | `RelationshipStrengthService` rule engine (active 8 clauses) + `ReachabilityService` + UCC strength columns + override | 3 | M | A |
| **5** | Contact detail signals panel + sortable list columns + reachability + "connection of" filters + hover reasons | 4 | M | A |
| **6** | **MS Graph integration** (OAuth, mail+calendar syncers, admin-portal card) | 1 | L | A |
| **↓ Thin V1 ships here: 50% heuristic activation, Google + Microsoft ↓** | | | | |
| **7** | `lib/findem/apis/profile.rb` `lookup_by_email` + `EnrichedEmailLookup` cache + swap resolver | Findem F1 confirmed | S | B |
| **8** | User enrichment pipeline (F4): `UserEnrichedProfile`, `UserWorkExperience`, `UserEducation`, syncer, webhook | Findem F4 confirmed | M | C |
| **9** | `WorkOverlapCalculator` + `EducationOverlapCalculator` + activate work/edu clauses in rule engine | 8 | M | C |
| **10** | (Optional) Investor overlap via F6 — activate investor signals | Findem F6 confirmed | S | C+ |
| **11** | (Nice-to-have) Intro request auto-draft | 5 | S | — |

### Thin V1 scope (Phases 1-6): ~6 weeks, one engineer

Ships with:
- Both providers connected
- 8 of 16 heuristic clauses active
- Reachability, Key Connection, Breadth all fully functional
- 40-60% email attribution coverage

### Full V1 (Phases 1-9) — ~10 weeks

Ships with:
- 14 of 16 heuristic clauses active
- 75-85% email attribution coverage
- Work + education overlap signals live

---

## 10. Gotchas & risks

### 1. Cross-collection duplicate emails
**Risk**: a user's mailbox may contain interactions with contacts across every collection they belong to. Same email in Collection A vs. B = different Contacts by design.
**Mitigation**: Findem resolver uses `linkedin_handle` (globally unique in Getro) as the primary dedup key. Fallback to ContactEmail lookup.

### 2. Pending / unresolved interactions
**Risk**: when Alice emails `bob@acme.com` and Bob isn't in Getro yet, we can't attribute the interaction. If we drop it, we lose signal; if we create a contact, we violate the "no contact creation" V1 rule.
**Mitigation**: `InteractionEvent.contact_id` nullable. Backfill via `after_commit` hook on Contact create/merge.
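
A sketch of what the backfill does when a Contact is created or merged, using an in-memory list in place of the `interaction_events` table. `backfill_unresolved!` is a hypothetical name, and in the real app this would more likely be a single SQL UPDATE triggered from the hook (an assumption, not a confirmed design).

```ruby
# In-memory stand-in for the backfill hook. Attributes unresolved events
# whose contact_email matches one of the new contact's emails and returns
# how many events were attributed.
def backfill_unresolved!(events, contact_id:, contact_emails:)
  normalized = contact_emails.map { |e| e.strip.downcase }
  events.count do |event|
    # Never overwrite an event that is already attributed.
    next false unless event[:contact_id].nil?
    next false unless normalized.include?(event[:contact_email].to_s.strip.downcase)

    event[:contact_id] = contact_id
    true
  end
end
```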

### 3. Newsletter / bulk inbound noise
**Risk**: without a filter, newsletter senders that belong in Cold (C2) are promoted to Known by K2 ("they reached out"), which undermines trust in the tiers.
**Mitigation**: treat the filter as a must-have. Detect bulk mail via the `List-Unsubscribe` header, `Precedence: bulk/list/junk`, and a recipient count above 20.

### 4. Rate limits and backfill horizon
- Gmail: 1B quota units/day/project, ~250 units/user/sec
- MS Graph: 10k requests/10min/user
- Initial backfill: **24 months** (matches sustained-2yr heuristic). Older history = one-time historical job.

### 5. Findem rate + cost
**Risk**: hitting Findem for every new email observed is unsustainable.
**Mitigation**: `EnrichedEmailLookup` cache with TTL (90-180 days), negative caching for unknown emails (shorter TTL), batch lookups where API supports.
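
The caching policy can be sketched as a small in-memory class. `LookupCacheSketch` is hypothetical and stands in for the `EnrichedEmailLookup` table; the 90-day hit TTL is the low end of the range above, while the 7-day miss TTL is an invented placeholder for "shorter TTL".

```ruby
class LookupCacheSketch
  Entry = Struct.new(:value, :expires_at)

  def initialize(hit_ttl: 90 * 86_400, miss_ttl: 7 * 86_400, clock: -> { Time.now })
    @hit_ttl = hit_ttl
    @miss_ttl = miss_ttl   # negative results expire sooner than hits
    @clock = clock         # injectable so expiry can be tested
    @entries = {}
  end

  # Returns the cached value, or calls the block (the Findem lookup) on a
  # miss or an expired entry. A nil result is cached as a negative entry.
  def fetch(email)
    now = @clock.call
    entry = @entries[email]
    return entry.value if entry && entry.expires_at > now

    value = yield(email)
    ttl = value.nil? ? @miss_ttl : @hit_ttl
    @entries[email] = Entry.new(value, now + ttl)
    value
  end
end
```

Negative caching matters because most observed addresses will be unknown to Findem at first sight; without it, every sync would re-query the same misses.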

### 6. Self-emails / internal team-member emails
**Risk**: emailing your Inovia colleague makes them look like a strong "contact" relationship.
**Mitigation**: resolver filters out emails whose canonical identity is another `User` in the same `Collection`.

### 7. Privacy boundary enforcement
**Risk**: metadata-only ingestion is a policy commitment, so the code must enforce it rather than rely on convention.
**Mitigation**:
- Use `gmail.metadata` scope (no body access possible at API level)
- Use `Mail.ReadBasic` + explicit `$select` excluding body fields
- Code review checklist: no body/subject/content fields written to DB

### 8. Ambiguous Findem responses
**Risk**: Findem may return multiple candidate identities for a single email, or low confidence.
**Mitigation**: conservative-skip on ambiguous responses (`contact_id = NULL`, audit log entry, dashboard for manual review).

---

## 11. Decisions the team should confirm

1. **Ship Slice A as V1?** Yes / No / Ship Slice A as internal-only, hold external launch until Slice C
2. **"Sustained at lower volume" threshold**: recommend **≥1 two-way email in 8 of the last 8 quarters** (continuous 2-year presence). Alternative: 6 of 8. Confirm default.
3. **Reachability editable?** Recommend **No for V1** — pure aggregation, editing creates drift vs. strength.
4. **Strength override audit**: when a user overrides, do we keep an audit record of what the rules would have produced? Recommend **yes**, for future rule-tuning feedback.
5. **Backfill horizon**: 24 months default — confirm.
6. **LinkedIn degree investigation**: worth a short codebase scan of the browser extension to see if 1st-degree data is already captured.
7. **"Recommended connection path" crown icon** (Evan's comment): treat as part of Key Connection UI surface in Phase 5.
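
To make decision 2 concrete, a sketch of the "k of the last 8 quarters" check. Two interpretations are assumed here and should be confirmed: a quarter counts as two-way when it has at least one inbound and one outbound email, and the window includes the current calendar quarter. All function names are hypothetical.

```ruby
require "date"

# Index of the calendar quarter containing `date` (monotonic across years).
def quarter_index(date)
  date.year * 4 + (date.month - 1) / 3
end

# inbound_dates / outbound_dates: Date objects for one user x contact pair.
# Returns `window` booleans, oldest quarter first, ending at today's quarter.
def two_way_quarters(inbound_dates, outbound_dates, today:, window: 8)
  inbound = inbound_dates.map { |d| quarter_index(d) }
  outbound = outbound_dates.map { |d| quarter_index(d) }
  current = quarter_index(today)
  ((current - window + 1)..current).map { |q| inbound.include?(q) && outbound.include?(q) }
end

# required: 8 for the recommended default, 6 for the alternative.
def sustained?(bitmap, required:)
  bitmap.count(true) >= required
end
```

With 8 of 8, a single quiet quarter (a sabbatical, a long fundraise) drops the contact out of the tier; 6 of 8 tolerates two such gaps. That trade-off is the substance of the decision.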

---

## 12. File references for implementation

Key existing Getro code to extend or mirror:

| Purpose | Path |
|---|---|
| Google OAuth client | `backend/lib/google/oauth_client.rb` |
| People API client | `backend/app/services/google/people_client.rb` |
| GoogleAccount model | `backend/app/models/google_account.rb` |
| Contacts syncer (reference pattern) | `backend/app/services/contacts/import/google/contacts_syncer.rb` |
| Daily scheduler (reference pattern) | `backend/app/workers/schedulers/contacts/import/google_contacts_daily_sync_scheduler.rb` |
| Contact model | `backend/app/models/contact.rb` |
| ContactEmail model | `backend/app/models/contact_email.rb` |
| Dedup intake | `backend/app/services/contacts/contact_creator.rb` |
| Dedup lookup | `backend/app/services/contacts/existing_contact_finder.rb` |
| Merge with audit | `backend/app/services/contacts/merge_service.rb` |
| ContactEnrichedProfile (mirror for User) | `backend/app/models/contact_enriched_profile.rb` |
| UserContactCollection (add strength columns) | `backend/app/models/user_contact_collection.rb` |
| Findem base API | `backend/lib/findem/client.rb`, `backend/lib/findem/apis/` |
| Admin-portal integrations page | `admin-portal/src/pages/Settings/integrations/` |
| RTK Query Google service (mirror for Microsoft) | `admin-portal/src/services/userGoogleAccountsV2.js` |
| Path card UI (reuse for strength signals) | `admin-portal/src/pages/listDetail/networkConnections/components/pathCard/` |
| Contact detail page (new signals panel) | `admin-portal/src/components/organisms/contactDetail/` |
| Contacts list views (new columns) | `admin-portal/src/pages/contactsExtended/` |

---

## 13. Appendix: signal primitives

Atomic facts the scoring layer reads. Each primitive is either read from the listed source or derived from other primitives (e.g., S12 = S10 × S11).

| # | Primitive | Source | V1 availability |
|---|---|---|---|
| S1 | Inbound email count (user ← contact), windowed | Gmail metadata / MS Graph Mail.ReadBasic | ✅ |
| S2 | Outbound email count (user → contact), windowed | Same | ✅ |
| S3 | Response-to-outbound within N days | thread_id / conversation_id | ✅ |
| S4 | Quarter-bitmap of 2-way email presence | S1+S2 bucketed | ✅ |
| S5 | First / last email timestamp | internalDate / sentDateTime | ✅ |
| S6 | Newsletter / bulk inbound flag | `List-Unsubscribe`, `Precedence`, N-recipient | ✅ |
| S7 | Meeting count, windowed | Google Calendar / Graph Calendar | ✅ |
| S8 | Last meeting timestamp | Calendar event start | ✅ |
| S9 | Meeting RSVP status (non-declined) | Calendar attendee responseStatus | ✅ |
| S10 | User employment history | Findem F4 → `UserWorkExperience` | Slice C |
| S11 | Contact employment history | Existing `ContactWorkExperience` | ✅ |
| S12 | Work overlap (user × contact) | S10 × S11 | Slice C |
| S13 | Company size / c-level flags | `ContactWorkExperience` + Findem enrichment | ⚠️ partial |
| S14 | User education | Findem F4 → `UserEducation` | Slice C |
| S15 | Contact education | Existing `ContactEducation` | ✅ |
| S16 | Education overlap | S14 × S15 | Slice C |
| S17 | LinkedIn handle on Contact | `Contact.linkedin_handle` | ✅ |
| S18 | LinkedIn 1st-degree connection flag | None today — potential browser extension or Findem | ❌ |
| S19 | Company investor data | Findem F6 | Pending |
| S20 | Findem canonical identity for email | Findem F1 | Pending |
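
As an illustration of how a primitive falls out of metadata alone, here is a sketch of S3 (response-to-outbound within N days). It assumes messages are already scoped to one user × contact pair and normalized into hashes with `:thread_id`, `:direction`, and `:sent_at`; that shape and the function name are hypothetical.

```ruby
SECONDS_PER_DAY = 24 * 3600

# messages: hashes with :thread_id, :direction (:inbound or :outbound), :sent_at (Time).
# Returns the outbound messages that received an inbound message in the same
# thread no more than `within_days` days after they were sent.
def responded_outbound(messages, within_days:)
  by_thread = messages.group_by { |m| m[:thread_id] }
  messages.select do |m|
    next false unless m[:direction] == :outbound

    by_thread[m[:thread_id]].any? do |reply|
      reply[:direction] == :inbound &&
        reply[:sent_at] > m[:sent_at] &&
        reply[:sent_at] - m[:sent_at] <= within_days * SECONDS_PER_DAY
    end
  end
end
```

Only thread identifiers and timestamps are read, so the primitive stays inside the metadata-only boundary for both providers.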
