GET-109 · Technical Spec

Email & Calendar Integration for Relationship Strength

Drafted 2026-04-22 · Status: review · Audience: backend + admin-portal engineers

1. Overview

Ingest email and calendar metadata from each admin user's Google Workspace or Microsoft 365 account and turn it into three transparent relationship indicators on contacts:

  • Connection strength: per team-member × contact (Warm / Known / Cold)
  • Contact reachability: per team × contact (High / Medium / Low / None)
  • Key connection: the best person on the team to ask for an intro

Anchor decisions

  • Keep Contact and ContactEmail structure unchanged. Collection scoping stays.
  • Use Findem enrichment as the dedup oracle (email → canonical identity → Getro Contact).
  • Two-layer scoring: admin rules assign the tier, weighted score sorts within tier.
  • Never create contacts from email metadata. Unattributed interactions stage and resolve lazily.
  • Ship a thin V1 (email + calendar, no Findem, no user enrichment) first; upgrade in place.

2. System architecture

Five pipeline stages (ingestion, resolution, rollup, enrichment, scoring), each isolated and independently restartable. No stage writes outside its own tables.

flowchart LR
  subgraph Providers["External providers"]
    GM["Gmail<br/>(gmail.metadata)"]
    GC["Google Calendar<br/>(calendar.readonly)"]
    OM["Outlook Mail<br/>(Mail.ReadBasic)"]
    OC["Outlook Calendar<br/>(Calendars.ReadBasic)"]
    FI["Findem"]
  end
  subgraph Ingest["Ingestion (per user)"]
    GMS[GmailSyncer]
    GCS[GoogleCalSyncer]
    OMS[OutlookMailSyncer]
    OCS[OutlookCalSyncer]
  end
  IE[("InteractionEvent<br/>user · kind · direction<br/>occurred_at · thread_id<br/>contact_email · contact_id?")]
  subgraph Resolve["Resolution"]
    FER[FindemEnrichmentResolver]
    CEL[(EnrichedEmailLookup<br/>cache)]
  end
  subgraph Rollup["Rollup"]
    RW[NightlyRollupWorker]
    UCIS[("UserContactInteractionStats")]
  end
  subgraph Enrich["User enrichment"]
    UES[Users::Enrichment::FindemSyncer]
    UEP[("UserEnrichedProfile<br/>UserWorkExperience<br/>UserEducation")]
    WOC[(WorkOverlapCache<br/>EducationOverlapCache)]
  end
  subgraph Score["Scoring"]
    RSS[RelationshipStrengthService]
    RS[ReachabilityService]
    UCC[("UCC: strength_tier<br/>strength_score<br/>strength_override")]
    CR[("ContactReachability<br/>tier · counts<br/>key_user_id")]
  end
  GM --> GMS --> IE
  GC --> GCS --> IE
  OM --> OMS --> IE
  OC --> OCS --> IE
  IE --> FER
  FER -.->|cache| CEL
  FER -.->|lookup| FI
  FER -->|contact_id| IE
  IE --> RW --> UCIS
  FI --> UES --> UEP --> WOC
  UCIS --> RSS
  WOC --> RSS
  RSS --> UCC
  UCC --> RS --> CR
  classDef provider fill:#e0f2fe,stroke:#0369a1
  classDef data fill:#fef3c7,stroke:#b45309
  classDef service fill:#f3e8ff,stroke:#7c3aed
  class GM,GC,OM,OC,FI provider
  class IE,CEL,UCIS,UEP,WOC,UCC,CR data
  class GMS,GCS,OMS,OCS,FER,RW,UES,RSS,RS service
Why one normalized table. InteractionEvent is provider-agnostic by construction: Gmail, Outlook, and calendars all land in the same shape. The scoring engine has no idea where an interaction came from. Adding Slack or Zoom later would be an enum value, not a new pipeline. See DR-01.

Stage responsibilities

Stage | Input | Output | Trigger
Ingestion | Provider APIs (metadata only) | InteractionEvent rows | Daily cron + on-demand backfill
Resolution | New/unresolved InteractionEvent + Contact creates | contact_id populated | after_commit on insert + Contact hooks
Rollup | InteractionEvent | UserContactInteractionStats | Nightly full + incremental on event insert
User enrichment | User email / linkedin handle | UserEnrichedProfile, work/edu records | On account connect + weekly refresh + Findem webhook
Scoring | Stats + overlaps | UCC.strength_tier, ContactReachability | On stats change + on override edit + nightly

3. Data model

Five new tables. No changes to Contact, ContactEmail, or Collection. Three existing tables get additive columns.

erDiagram
  USERS ||--o{ UCC : "has many"
  CONTACTS ||--o{ UCC : "has many"
  COLLECTIONS ||--o{ UCC : "has many"
  USERS ||--o{ INTERACTION_EVENT : "owns"
  CONTACTS ||--o{ INTERACTION_EVENT : "attributed"
  USERS ||--o{ USER_CONTACT_STATS : "rollup"
  CONTACTS ||--o{ USER_CONTACT_STATS : "rollup"
  USERS ||--o| USER_ENRICHED_PROFILE : "has one"
  USER_ENRICHED_PROFILE ||--o{ USER_WORK_EXPERIENCE : "has many"
  USER_ENRICHED_PROFILE ||--o{ USER_EDUCATION : "has many"
  CONTACTS ||--o{ CONTACT_REACHABILITY : "per collection"
  COLLECTIONS ||--o{ CONTACT_REACHABILITY : "per contact"
  UCC {
    int user_id FK
    int contact_id FK
    int collection_id FK
    string source
    string strength_tier
    float strength_score
    string strength_override
    datetime strength_computed_at
  }
  INTERACTION_EVENT {
    bigint id PK
    int user_id FK
    string provider
    string kind
    string direction
    datetime occurred_at
    string thread_id
    string remote_message_id UK
    string contact_email
    int contact_id FK
    boolean newsletter_flag
    string raw_headers
  }
  USER_CONTACT_STATS {
    int user_id FK
    int contact_id FK
    int email_inbound_count_12mo
    int email_outbound_count_12mo
    int email_twoway_count_total
    int email_inbound_nnl_count_24mo
    datetime email_last_at
    datetime email_last_outbound_at
    boolean email_has_response_in_3mo
    int email_activity_bitmap_12q
    int email_active_quarters_consecutive
    int meeting_count_180d
    int meeting_count_24mo
    datetime meeting_last_at
    datetime recomputed_at
  }
  USER_ENRICHED_PROFILE {
    int user_id FK
    string findem_id
    string raw
    datetime refreshed_at
  }
  USER_WORK_EXPERIENCE {
    int user_id FK
    int organization_id FK
    date date_from
    date date_to
    string title
    string location
  }
  USER_EDUCATION {
    int user_id FK
    int school_id FK
    date date_from
    date date_to
  }
  CONTACT_REACHABILITY {
    int contact_id FK
    int collection_id FK
    string tier
    int warm_count
    int known_count
    int cold_count
    int key_user_id FK
    datetime computed_at
  }

New columns on existing UCC table

Column | Type | Notes
strength_tier | enum (warm, known, cold, nullable) | Null until first computation
strength_score | float, nullable | Within-tier sort, hidden from users in V1
strength_override | enum, nullable | User-set override; rule result still stored for audit
strength_computed_at | timestamp | Last rule-engine run

Enum values (stored as strings / Rails enums)

  • INTERACTION_EVENT.provider: gmail, outlook, gcal, ocal
  • INTERACTION_EVENT.kind: email, meeting
  • INTERACTION_EVENT.direction: inbound, outbound, attended
  • UCC.strength_tier / strength_override: warm, known, cold
  • CONTACT_REACHABILITY.tier: high, medium, low, none

Fields excluded from the diagram for brevity

  • USER_CONTACT_STATS also has email_inbound_count_24mo, email_outbound_count_24mo, email_last_inbound_at.
  • USER_ENRICHED_PROFILE.raw is actually jsonb; INTERACTION_EVENT.raw_headers is jsonb.
  • INTERACTION_EVENT.contact_id is nullable (intentional: unresolved interactions).

Additive columns, not a new table. Strength belongs where the user⇄contact link already lives. strength_override is nullable so the absence of a user decision is explicit: we can always tell whether the tier was computed or set manually. See DR-06.

Why these shapes

  • InteractionEvent is the single normalized store. Gmail, Outlook, calendars all land here. The contact_id being nullable is deliberate: unresolved interactions wait for resolution rather than forcing a premature contact creation.
  • UserContactInteractionStats is a materialized cache, not the source of truth. Nightly rebuild + incremental updates on event insert. email_activity_bitmap_12q is a 12-bit int encoding 2-way activity per quarter; it is cheap to maintain and powerful for sustained-relationship detection.
  • UCC gets four columns, not a new table. Strength lives where the team-member⇄contact link already lives. Override is a nullable enum so the absence of an override is explicit.
  • ContactReachability is denormalized per (contact, collection). A cheap row-level recompute beats a per-request aggregation over hundreds of UCCs.
  • UserEnrichedProfile intentionally mirrors ContactEnrichedProfile: same enrichment engine (Findem), same shape, same webhook wiring.
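
The quarterly activity bitmap described above can be illustrated with a minimal sketch. It assumes bit 0 holds the current quarter and older quarters occupy higher bits; the spec does not state the bit order, and the ActivityBitmap module and its method names are hypothetical.

```ruby
# Hypothetical helper for email_activity_bitmap_12q.
# Assumption: bit 0 = current quarter, bit 11 = oldest tracked quarter.
module ActivityBitmap
  QUARTERS = 12
  MASK = (1 << QUARTERS) - 1 # keeps only the 12 most recent quarters

  # Mark the current quarter as having two-way activity.
  def self.mark_current(bitmap)
    (bitmap | 1) & MASK
  end

  # Called at each quarter boundary: every quarter becomes one quarter older.
  def self.roll_quarter(bitmap)
    (bitmap << 1) & MASK
  end

  # Consecutive active quarters, counting back from the current quarter.
  def self.consecutive_active(bitmap)
    count = 0
    count += 1 while count < QUARTERS && bitmap[count] == 1
    count
  end
end

# Three active quarters in a row.
bitmap = 0
3.times do
  bitmap = ActivityBitmap.roll_quarter(bitmap)
  bitmap = ActivityBitmap.mark_current(bitmap)
end
puts ActivityBitmap.consecutive_active(bitmap)
```

Under this layout, a value like email_active_quarters_consecutive falls out of the bitmap with a short loop, so the "sustained" rule never needs to rescan raw events.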

4. Integration status

Google Workspace · Mostly live

Component | Status | Notes
OAuth client, GoogleAccount, token encryption | Live | lib/google/oauth_client.rb
Scopes gmail.metadata + calendar.readonly | Declared | Not yet ingested; already in GoogleAccount::GOOGLE_SCOPES
People API contact sync, daily cron | Live | Shipped Dec 2025
Admin-portal Google card, OAuth flow, polling | Live | useIntegrationsPage hook
Per-scope UI toggles | Build | Extend GoogleIntegrationCard
Gmail + Calendar metadata clients | Build | New files under lib/google/
Syncers, schedulers, workers | Build | Mirror existing ContactsSyncer pattern
Newsletter / bulk inbound filter | Build | Header inspection + recipient count
CircuitBox wrap retrofit | Build | Gap in existing integration

Retrofit, not rewrite. Wrap existing client methods in Circuitbox.circuit(:name, ...).run { … }. This requires no changes to business logic; it is just a guard layer. One named circuit per provider (:google_people, :google_gmail, :google_calendar, :ms_graph_mail, :ms_graph_calendar, :findem_profile) so one provider's outage doesn't cascade to others.
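
To make the isolation argument concrete, here is a minimal plain-Ruby stand-in for a per-provider breaker. It is not the Circuitbox implementation; the Breaker class and BREAKERS registry are hypothetical. The default values come from the CircuitBox entry in the operational envelope.

```ruby
# Minimal stand-in for a named circuit breaker; NOT the Circuitbox implementation.
class Breaker
  def initialize(error_threshold: 50, volume_threshold: 5, time_window: 60, sleep_window: 120)
    @error_threshold = error_threshold   # percent of failed calls that opens the circuit
    @volume_threshold = volume_threshold # minimum calls in the window before evaluating
    @time_window = time_window           # seconds of call history considered
    @sleep_window = sleep_window         # seconds to stay open before allowing a probe
    @calls = []                          # [timestamp, success?] pairs
    @opened_at = nil
  end

  def open?(now: Time.now)
    !@opened_at.nil? && now - @opened_at < @sleep_window
  end

  # Runs the block unless the circuit is open; returns nil when skipped or failed.
  def run(now: Time.now)
    return nil if open?(now: now)

    @opened_at = nil
    begin
      result = yield
      record(now, true)
      result
    rescue StandardError
      record(now, false)
      nil
    end
  end

  private

  def record(now, success)
    @calls << [now, success]
    @calls.reject! { |at, _| now - at > @time_window }
    return if @calls.size < @volume_threshold

    failures = @calls.count { |_, ok| !ok }
    @opened_at = now if failures * 100.0 / @calls.size >= @error_threshold
  end
end

# One breaker per provider name, created on first use, so failures stay isolated.
BREAKERS = Hash.new { |registry, name| registry[name] = Breaker.new }
```

Because each provider name maps to its own breaker, five consecutive Gmail failures open only :google_gmail; a call through :ms_graph_mail still runs.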

Microsoft 365 · Greenfield

Multi-tenant Azure AD registration. One app registration in Getro's Azure tenant, users sign in from their own Microsoft 365 tenant. Same pattern as multi-org Google OAuth; the MS app is registered with signInAudience: "AzureADMultipleOrgs".

Component | Status
Azure AD app registration (multi-tenant) | Build + ops
MicrosoftAccount model + migration | Build
lib/microsoft/graph_client.rb base | Build
Mail + calendar metadata clients | Build
OAuth callback controller + refresh flow | Build
Syncers, schedulers, workers for both | Build
Admin-portal Microsoft card + RTK Query service | Build
Env vars, feature flag, egress registry entry | Build + ops

Findem · Pending capabilities

Capability | Consumer | Status
F1 · Person lookup by email | Dedup resolver | Ask Findem
F2 · Enrichment by LinkedIn handle | Contact enrichment | Live
F3 · Enrichment when only email known | Fallback contact enrichment | Ask Findem
F4 · User enrichment | New UserEnrichedProfile pipeline | Ask Findem
F5 · Company details (size, stage) | Work-overlap gate | Partial
F6 · Investor / cap-table data | Investor-overlap signal (future) | Ask Findem
F7 · Webhooks on profile updates | Re-trigger rollups | Framework

5. Service layer

Rule engine + scorer

The scorer is stateless: given a User and a Contact, it reads the rollup + overlap caches and returns a deterministic result.

Rules are declared as data, not code. Adding a new signal is a new Rule row, with no changes to the evaluator. This makes the V2 admin-tunable-rules work a straightforward extension instead of a refactor. See DR-04.

Result = Struct.new(:tier, :score, :reasons)
# tier:    :warm | :known | :cold
# score:   Float, for within-tier sort only
# reasons: [{ code: :twoway_12mo_5plus, met: true, detail: "7 in / 9 out" }, ...]

class RelationshipStrengthService
  def self.call(user:, contact:)
    stats = UserContactInteractionStats.find_by(user: user, contact: contact)
    overlap = WorkOverlapCache.by_pair(user, contact)
    edu = EducationOverlapCache.by_pair(user, contact)

    evaluated = RULES.map { |rule| rule.evaluate(stats, overlap, edu) }
    tier = %i[warm known cold].find { |t| evaluated.any? { |r| r.tier == t && r.met } } || :cold
    score = WEIGHTED_SCORE.call(evaluated)
    Result.new(tier, score, evaluated.select(&:met))
  end
end

Rules are declared as data, not code:

RULES = [
  Rule.new(code: :twoway_12mo_5plus, tier: :warm, signal_weight: 3.0,
           predicate: ->(s, _, _) { s && s.email_inbound_count_12mo >= 5 && s.email_outbound_count_12mo >= 5 }),
  Rule.new(code: :sustained_2yr, tier: :warm, signal_weight: 2.5,
           predicate: ->(s, _, _) { s && s.email_active_quarters_consecutive >= 8 }),
  # ... 14 more ...
]

Adding a new signal = adding a Rule row. No changes to the evaluator.
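
The snippets above use Rule, rule.evaluate, and WEIGHTED_SCORE without defining them. A minimal sketch of shapes that would satisfy those call sites follows. Evaluation, Stats, and classify are hypothetical names, and the scoring formula (sum of the weights of matched rules) is an assumption, not the spec's definition.

```ruby
# Hypothetical result of evaluating one rule.
Evaluation = Struct.new(:code, :tier, :met, :weight, keyword_init: true)

Rule = Struct.new(:code, :tier, :signal_weight, :predicate, keyword_init: true) do
  # Runs the predicate against the cached inputs; a nil stats row simply fails the rule.
  def evaluate(stats, overlap, edu)
    met = predicate.call(stats, overlap, edu) ? true : false
    Evaluation.new(code: code, tier: tier, met: met, weight: signal_weight)
  end
end

# Assumed scoring: sum of the weights of the rules that matched.
WEIGHTED_SCORE = ->(evaluated) { evaluated.select(&:met).sum(&:weight) }

# Stand-in for a UserContactInteractionStats row (only the fields these rules read).
Stats = Struct.new(:email_inbound_count_12mo, :email_outbound_count_12mo,
                   :email_active_quarters_consecutive, keyword_init: true)

RULES = [
  Rule.new(code: :twoway_12mo_5plus, tier: :warm, signal_weight: 3.0,
           predicate: ->(s, _, _) { s && s.email_inbound_count_12mo >= 5 && s.email_outbound_count_12mo >= 5 }),
  Rule.new(code: :sustained_2yr, tier: :warm, signal_weight: 2.5,
           predicate: ->(s, _, _) { s && s.email_active_quarters_consecutive >= 8 })
].freeze

# Mirrors the evaluator: highest matched tier wins, score is computed independently.
def classify(stats)
  evaluated = RULES.map { |rule| rule.evaluate(stats, nil, nil) }
  tier = %i[warm known cold].find { |t| evaluated.any? { |r| r.tier == t && r.met } } || :cold
  [tier, WEIGHTED_SCORE.call(evaluated), evaluated.select(&:met).map(&:code)]
end

stats = Stats.new(email_inbound_count_12mo: 7, email_outbound_count_12mo: 9,
                  email_active_quarters_consecutive: 3)
p classify(stats)
```

A new signal is one more Rule.new entry in RULES; classify and the structs stay untouched.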

Reachability service

Pure aggregation: runs once per (contact, collection) whenever any UCC tier in that collection changes.

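class ReachabilityService
  # Evaluated top-down; the first matching predicate wins.
  THRESHOLDS = {
    high:   ->(c) { c[:warm] >= 1 || c[:known] >= 3 || c[:cold] >= 10 },
    medium: ->(c) { c[:known] >= 1 || c[:cold] >= 5 },
    low:    ->(c) { c[:cold] >= 1 }
  }.freeze

  def self.call(contact:, collection:)
    uccs = UserContactCollection.where(contact: contact, collection: collection)
    # Skip uncomputed (NULL) tiers and default missing tiers to 0,
    # so the predicates never compare nil with an integer.
    grouped = uccs.where.not(strength_tier: nil).group(:strength_tier).count
    counts = Hash.new(0).merge(grouped.symbolize_keys)
    tier = THRESHOLDS.find { |_, pred| pred.call(counts) }&.first || :none
    # NOTE: strength_tier_ordinal is not in the data model above; it assumes a sortable tier rank.
    key = uccs.order(strength_tier_ordinal: :desc, strength_score: :desc).first&.user_id
    # Map tier counts onto the warm_count / known_count / cold_count columns.
    ContactReachability.upsert({ contact_id: contact.id, collection_id: collection.id,
                                 tier: tier, key_user_id: key,
                                 warm_count: counts[:warm], known_count: counts[:known],
                                 cold_count: counts[:cold], computed_at: Time.current })
  end
end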
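
The key-connection ordering relies on strength_tier_ordinal, which the data model does not define. As a minimal sketch of one way to derive that rank when tiers are stored as strings, the helper below picks the strongest link in memory. TIER_RANK, Link, and key_connection are hypothetical names.

```ruby
# Assumed ranking of the stored tier strings; higher is stronger.
TIER_RANK = { "warm" => 2, "known" => 1, "cold" => 0 }.freeze

# Stand-in for a UserContactCollection row.
Link = Struct.new(:user_id, :strength_tier, :strength_score, keyword_init: true)

# Returns the user_id of the strongest link, or nil when no link has a computed tier.
# Ties within a tier are broken by strength_score (nil scores count as 0.0).
def key_connection(links)
  ranked = links.select { |link| TIER_RANK.key?(link.strength_tier) }
  best = ranked.max_by { |link| [TIER_RANK[link.strength_tier], link.strength_score.to_f] }
  best&.user_id
end

links = [
  Link.new(user_id: 1, strength_tier: "known", strength_score: 9.0),
  Link.new(user_id: 2, strength_tier: "warm",  strength_score: 1.5),
  Link.new(user_id: 3, strength_tier: "warm",  strength_score: 4.0),
  Link.new(user_id: 4, strength_tier: nil,     strength_score: nil)
]
p key_connection(links)
```

Tier always dominates score here: a Known link with a high score never outranks a Warm link, which matches the "rules assign the tier, score sorts within tier" split.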

Rollup worker

Why nightly in addition to incremental? Windowed counters need to forget events as they age out (an email just crossed the 12-month boundary). Incremental can add but can't drop, so the nightly sweep ages out stale entries and reconciles any drift. See DR-09.

Two entry points:

  • Incremental: on every InteractionEvent insert via after_commit, bump counters and update the activity bitmap. O(1) per event.
  • Nightly full: recompute all rows touched in the last 24h. Catches drift, re-bucketizes aging windows (e.g. emails that just fell outside the 12-month window).
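
The difference between the two entry points can be shown with a minimal in-memory sketch. Event, bump!, and rebuild are hypothetical stand-ins for the real models and workers, and the 12-month window is approximated as 365 days.

```ruby
# Hypothetical in-memory stand-in for an InteractionEvent row.
Event = Struct.new(:direction, :occurred_at, keyword_init: true)

WINDOW_12MO = 365 * 24 * 60 * 60 # seconds; approximates the 12-month window

# Incremental path: O(1) bump when a new event is inserted. It can only add.
def bump!(stats, event)
  key = event.direction == :inbound ? :email_inbound_count_12mo : :email_outbound_count_12mo
  stats[key] += 1
  stats
end

# Nightly path: recount from the events themselves, so aged-out events drop off.
def rebuild(events, now:)
  recent = events.select { |event| event.occurred_at > now - WINDOW_12MO }
  {
    email_inbound_count_12mo:  recent.count { |event| event.direction == :inbound },
    email_outbound_count_12mo: recent.count { |event| event.direction == :outbound }
  }
end

now = Time.utc(2026, 4, 22)
day = 24 * 60 * 60
events = [
  Event.new(direction: :inbound,  occurred_at: now - 400 * day), # already outside the window
  Event.new(direction: :inbound,  occurred_at: now - 10 * day),
  Event.new(direction: :outbound, occurred_at: now - 1 * day)
]

incremental = { email_inbound_count_12mo: 0, email_outbound_count_12mo: 0 }
events.each { |event| bump!(incremental, event) }
nightly = rebuild(events, now: now)
```

Here the incremental counters still include the 400-day-old email, while the rebuilt row has forgotten it; that gap is the drift the nightly pass reconciles.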

Findem enrichment resolver

Lookup → cache → Contact match. Conservative: ambiguous Findem responses leave contact_id null.

Resolver runs per event; Findem called per unique email. The EnrichedEmailLookup cache absorbs ~99% of calls after day 1. Typical user: ~800 Findem calls on backfill, <10/day steady state. See DR-03 and the FAQ.

flowchart TD
  Start([New InteractionEvent<br/>contact_email = bob@acme.com]) --> Norm[Normalize email<br/>lowercase + trim]
  Norm --> Cache{EnrichedEmailLookup<br/>cache hit?}
  Cache -->|Hit| Apply
  Cache -->|Miss| Findem[Findem F1 lookup]
  Findem --> Amb{Confidence >= threshold?}
  Amb -->|No| LeaveNull[Leave contact_id NULL<br/>store neg cache 7d]
  Amb -->|Yes| Store[Store canonical:<br/>linkedin_handle,<br/>known_emails]
  Store --> Apply[Apply to Getro]
  Apply --> ByHandle{Contact by<br/>linkedin_handle?}
  ByHandle -->|Found| Set[Set contact_id<br/>on InteractionEvent]
  ByHandle -->|None| ByEmail{ExistingContactsByEmailsFinder<br/>on known_emails?}
  ByEmail -->|Found| Set
  ByEmail -->|None| LeaveNull
  LeaveNull --> Done([Done: may resolve later<br/>via Contact create hook])
  Set --> Done
  classDef warn fill:#fef3c7,stroke:#b45309
  classDef ok fill:#dcfce7,stroke:#15803d
  class LeaveNull warn
  class Set ok
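
The resolution flow can be sketched in plain Ruby as follows. EmailResolver is a hypothetical name; findem_lookup, contacts_by_handle, and contacts_by_email stand in for the Findem F1 call and the Getro contact finders. The 0.8 confidence threshold is an assumed placeholder, since the spec leaves the threshold open, and positive-TTL expiry is omitted for brevity.

```ruby
# Hypothetical sketch of the lookup -> cache -> Contact match flow.
class EmailResolver
  NEGATIVE_TTL = 7 * 24 * 60 * 60 # seconds, from "store neg cache 7d"

  def initialize(findem_lookup:, contacts_by_handle:, contacts_by_email:, threshold: 0.8)
    @findem_lookup = findem_lookup           # callable: email -> identity hash or nil
    @contacts_by_handle = contacts_by_handle # linkedin_handle -> contact id
    @contacts_by_email = contacts_by_email   # email -> contact id
    @threshold = threshold                   # assumed value, not from the spec
    @cache = {}
  end

  # Returns a contact id, or nil when the email cannot be attributed yet.
  def resolve(raw_email, now: Time.now)
    email = raw_email.strip.downcase
    entry = @cache[email]
    if entry.nil? || (entry[:expires_at] && entry[:expires_at] <= now)
      entry = @cache[email] = fetch(email, now)
    end
    identity = entry[:identity]
    return nil unless identity

    # Match by globally unique handle first, then by any known email.
    @contacts_by_handle[identity[:linkedin_handle]] ||
      identity[:known_emails].map { |known| @contacts_by_email[known] }.compact.first
  end

  private

  # One upstream call per cache miss; low-confidence answers are cached as negatives.
  def fetch(email, now)
    result = @findem_lookup.call(email)
    if result && result[:confidence] >= @threshold
      { identity: result, expires_at: nil }
    else
      { identity: nil, expires_at: now + NEGATIVE_TTL }
    end
  end
end
```

The cache is keyed on the normalized email, so repeated events for the same correspondent cost one upstream call, and an unresolved address is not retried until its negative entry expires.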

6. Operational envelope

Quantitative choices, rate limits, and volume estimates: the numbers reviewers ask about when deciding whether this will scale or what it will cost.

Findem cache

  • Positive TTL: 90–180 days
  • Negative TTL: 7–30 days
  • Day-1 calls/user: ~800 (one per unique correspondent)
  • Steady-state: 5–10 calls/day/user
  • Invalidation: Findem F7 webhook on profile update

Backfill horizon

  • On OAuth connect: 24 months (see DR-08)
  • Older history: On-demand "load historical" job (future)
  • First-sync duration: ~minutes, not hours

Gmail / Google Calendar

  • Rate limit: 600 req/min/user; 1B units/day/project
  • Sync cadence: Daily, 6am UTC
  • Incremental: historyId (mail), syncToken (calendar)

Microsoft Graph

  • Rate limit: 10k req / 10min / user
  • Sync cadence: Daily, staggered from Google
  • Incremental: Delta query tokens

Interaction volume (typical user)

  • Day-1 events: ~20,000 (24-month backfill)
  • Steady-state: ~50 events/day
  • Row size: ~500 bytes
  • 10 users × 2 years: ~1.5 GB raw events

CircuitBox (all external clients)

  • error_threshold: 50%
  • time_window: 60s
  • volume_threshold: 5 requests
  • sleep_window: 120s before half-open probe

Rollup worker

  • Incremental: after_commit per event, O(1)
  • Nightly rebuild: 00:00 UTC, 36-hour window
  • Batch size: 500 pairs/job
  • Uniqueness: sidekiq-unique-jobs :until_and_while_executed

OAuth tokens

  • Google access: ~1 hour
  • Google refresh: Long-lived, revoked on disconnect
  • MS access: ~1 hour
  • MS refresh: 24h rolling
  • Storage: Encrypted at rest via lockbox

Scoring / reachability

  • Scorer complexity: O(1) per pair (reads one stats row)
  • Reachability aggregate: SQL GROUP BY on UCC strength_tier
  • Typical cascade on strength change: ~1–10 reachability recomputes

Privacy guardrails

  • Scopes chosen: gmail.metadata, Mail.ReadBasic (no body access possible)
  • No calendar titles stored: attendee list + times only
  • Per-user opt-in: Each scope toggleable independently
  • Disconnect wipe: All InteractionEvents deleted

7. Phased plan

The V1 breakpoint. The "Thin V1 ships" node splits the plan. Phases 1–6 depend on nothing we can't control; Phases 7+ gate on Findem capability confirmations. A capability answer that slips doesn't block the thin-V1 ship. See DR-05.

flowchart LR
  P1[Phase 1<br/>InteractionEvent<br/>+ simple resolver] --> P2[Phase 2<br/>Gmail + GCal syncers<br/>+ scope UI toggles]
  P1 --> P6[Phase 6<br/>MS Graph integration]
  P2 --> P3[Phase 3<br/>Rollup worker<br/>+ stats table]
  P6 --> P3
  P3 --> P4[Phase 4<br/>Rule engine<br/>+ Reachability]
  P4 --> P5[Phase 5<br/>UI surfaces<br/>+ filters]
  P5 --> V1[[Thin V1 ships<br/>8/16 clauses active<br/>both providers]]
  V1 --> P7[Phase 7<br/>Findem lookup<br/>+ cache]
  P7 --> P8[Phase 8<br/>User enrichment]
  P8 --> P9[Phase 9<br/>Overlap calculators<br/>+ activate clauses]
  P9 --> Full[[Full V1<br/>14/16 clauses active]]
  classDef v1 fill:#dcfce7,stroke:#15803d,stroke-width:2px
  classDef full fill:#ddf4ff,stroke:#0969da,stroke-width:2px
  class V1 v1
  class Full full

Phase | Scope | Depends on | Size | Slice
1 | InteractionEvent + ContactEmailResolver + backfill hooks | none | S | A
2 | Gmail + Google Calendar syncers + scope UI + CircuitBox retrofit | 1 | M | A
3 | Rollup worker + UserContactInteractionStats | 2 | M | A
4 | Rule engine + reachability + UCC strength columns + override | 3 | M | A
5 | Contact-detail signals + sortable columns + reachability filters + hover reasons | 4 | M | A
6 | MS Graph integration (OAuth, mail+cal syncers, admin-portal card) | 1 | L | A
7 | Findem lookup_by_email + EnrichedEmailLookup cache; swap resolver | Findem F1 | S | B
8 | User enrichment via Findem (F4) | Findem F4 | M | C
9 | Work + education overlap calculators; activate clauses in rule engine | 8 | M | C
10 | (Optional) Investor overlap via F6 | Findem F6 | S | C+
11 | (Nice) Intro-request auto-draft | 5 | S | none

8. FAQ

The sharp questions reviewers already have.

Won't this hammer Findem with thousands of calls per mailbox?

No. The EnrichedEmailLookup cache keys on normalized email, so Findem gets called once per unique correspondent, not once per event. Day-1 backfill is ~800 calls; steady state is <10/day. See DR-03.

What happens when a provider is down?

CircuitBox opens around the failing client, the sync worker exits cleanly, a Slack alert fires. New InteractionEvent rows pause for that provider; others are unaffected. Stats stay consistent with what we've seen. No data loss.

Can email data leak across collections?

InteractionEvent is scoped to the mailbox owner. Stats are per (user, contact). Reachability aggregates only the UCCs in one collection: a user in Collection A and B shares interaction data, but each collection only sees its own breadth counts.

What happens when a user disconnects their mailbox?

OAuth token revoked, provider-account row deleted, cleanup worker removes every InteractionEvent for that user. Dependent stats recompute (counts drop to 0). Strength tiers re-evaluate to Cold or null.

How do we prevent newsletters from being mistaken for real relationships?

At ingestion each email is flagged newsletter if any of: List-Unsubscribe header present, Precedence: bulk/list/junk, or >20 recipients. Newsletter-only inbound matches a Cold clause (C2), never Known.
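
The three newsletter conditions above translate into a small predicate. This is a minimal sketch: the newsletter? helper and the plain-hash header representation are hypothetical, and the case-insensitive header matching is an assumption.

```ruby
BULK_PRECEDENCE = %w[bulk list junk].freeze
MAX_RECIPIENTS = 20 # more than this many recipients counts as bulk

# headers: hash of header name => value; names are matched case-insensitively.
def newsletter?(headers, recipient_count)
  normalized = headers.transform_keys { |name| name.to_s.downcase }
  return true if normalized.key?("list-unsubscribe")
  return true if BULK_PRECEDENCE.include?(normalized["precedence"].to_s.strip.downcase)

  recipient_count > MAX_RECIPIENTS
end

p newsletter?({ "List-Unsubscribe" => "<mailto:unsubscribe@example.com>" }, 1)
p newsletter?({ "Subject" => "Lunch on Friday?" }, 2)
```

Any single condition is enough to set the flag, which matches the "if any of" wording; exactly 20 recipients does not trip the recipient rule.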

Do we ever create contacts from email data?

No. That was V1's failure mode. If the sender/recipient doesn't resolve to an existing Contact, the InteractionEvent stays with contact_id = NULL and waits. An after_commit hook on Contact create backfills attributable events later.
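
A minimal sketch of that lazy backfill, using in-memory stand-ins: PendingEvent and backfill_events are hypothetical names, and the sketch assumes contact_email was stored normalized (lowercase, trimmed) at ingestion.

```ruby
# Hypothetical stand-in for an InteractionEvent row awaiting attribution.
PendingEvent = Struct.new(:contact_email, :contact_id, keyword_init: true)

# Called after a Contact is created: attaches unresolved events that match its emails.
# Returns the number of events attributed; already-attributed events are left alone.
def backfill_events(events, contact_id:, emails:)
  wanted = emails.map { |email| email.strip.downcase }
  events.select { |event| event.contact_id.nil? && wanted.include?(event.contact_email) }
        .each { |event| event.contact_id = contact_id }
        .size
end

events = [
  PendingEvent.new(contact_email: "bob@acme.com", contact_id: nil),
  PendingEvent.new(contact_email: "amy@acme.com", contact_id: nil),
  PendingEvent.new(contact_email: "bob@acme.com", contact_id: 7)
]
attributed = backfill_events(events, contact_id: 42, emails: [" Bob@Acme.com "])
```

Only the unresolved event for the new contact's address is touched; no contact is ever created from the event side.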

Does this work for Outlook / Microsoft 365 users?

Yes. Building Microsoft Graph in parallel with Google is the whole point. The scoring engine is provider-agnostic: an Outlook user and a Gmail user who know the same contact produce indistinguishable signals.

Can an individual hide their own signals?

V1: no hide-toggle, but they can override their own strength up or down. Disconnecting their mailbox is the escape hatch: contributions fall to Cold/None.

Why compute a numeric score if we don't show it?

Two places it matters: sorting a Warm list by "warmest first", and picking the Key Connection when multiple team members are all Warm. V1 hides it per product call; V2 exposes it.

What are the thin-V1 ship criteria?

Phases 1–6 complete: both providers connected, 8 of 16 heuristic clauses active, reachability + key connection functional, UI renders signals on contact detail + list. Findem dedup (7) and user enrichment (8–9) arrive incrementally.

9. Open questions

Findem capability confirmations

  1. F1 · person lookup by email: does the endpoint exist? Rate limits, cost per call, response schema, confidence scoring?
  2. F3 · email-only enrichment: fallback when no handle is known?
  3. F4 · user enrichment: any product/TOS concern enriching authenticated users vs. external contacts?
  4. F6 · investor data: does Findem expose company investor / funding rounds?
  5. LinkedIn degree: does Findem track 1st-degree LinkedIn connections? Long shot; would close the last V1 gap.

Team decisions before execution

  1. Ship Slice A (thin V1, 50% heuristic activation) as the first customer-facing release, or wait for Slice C?
  2. "Sustained at lower volume" threshold: recommend 8 of last 8 quarters continuous. Alternative: 6 of 8.
  3. Reachability editable? Recommend no in V1: it's a pure aggregation, and editing would create drift.
  4. Strength-override audit: when a user overrides, keep a record of what the rule engine would have said (for future rule tuning)?
  5. Backfill horizon: 24 months default. Confirm.
  6. Browser-extension LinkedIn capture: worth a short spike to see if 1st-degree data is already being scraped.

10. File references

Key existing Getro code to extend or mirror.

Backend

Purpose | Path
Google OAuth client | backend/lib/google/oauth_client.rb
People API client | backend/app/services/google/people_client.rb
GoogleAccount model | backend/app/models/google_account.rb
Reference syncer | backend/app/services/contacts/import/google/contacts_syncer.rb
Reference scheduler | backend/app/workers/schedulers/contacts/import/google_contacts_daily_sync_scheduler.rb
Contact model | backend/app/models/contact.rb
ContactEmail | backend/app/models/contact_email.rb
Dedup intake | backend/app/services/contacts/contact_creator.rb
Dedup lookup | backend/app/services/contacts/existing_contact_finder.rb
Merge with audit | backend/app/services/contacts/merge_service.rb
UCC (extend) | backend/app/models/user_contact_collection.rb
Findem base | backend/lib/findem/client.rb, backend/lib/findem/apis/
CircuitBox config | backend/config/initializers/circuitbox.rb

Admin portal

Purpose | Path
Integrations page (per-user) | admin-portal/src/pages/Settings/integrations/
Google OAuth flow hook | admin-portal/src/pages/Settings/integrations/hooks/useIntegrationsPage.jsx
Google RTK Query service | admin-portal/src/services/userGoogleAccountsV2.js
Path-card atoms (reuse for signals) | admin-portal/src/pages/listDetail/networkConnections/components/pathCard/
Contact detail page | admin-portal/src/components/organisms/contactDetail/
Contact list views | admin-portal/src/pages/contactsExtended/

11. Decision records

Short ADRs for the non-obvious choices made during this spike. They record the "why we picked this" that future maintainers will ask about.

DR-01 · InteractionEvent as a single normalized store
  • Context: Two providers, two data kinds. A naive model would split into GmailInteraction, OutlookInteraction, GoogleCalendar, OutlookCalendar.
  • Decision: One InteractionEvent table; provider + kind are enum columns.
  • Consequences: Scoring is provider-agnostic by construction. Adding a third provider (e.g. Slack DMs later) is an enum value, not a new pipeline.
  • Alternatives: Per-provider tables. Rejected: doubles the scoring code path and couples tier logic to source.
DR-02 · Findem as the dedup oracle; Contact model unchanged
  • Context: V1 G Suite integration created duplicates because Getro's Contact is Collection-scoped: the same email in two collections is two rows.
  • Decision: Resolver passes every unresolved email through Findem's canonical-identity lookup, then matches to Getro Contact via globally-unique linkedin_handle first, then ContactEmail.
  • Consequences: No schema change to Contact. The identity problem moves outside Getro to Findem's graph.
  • Alternatives: Global Contact primary key (breaks Collection isolation); per-collection fallback lookup (doesn't close the cross-collection dup gap).
DR-03 · Findem called per unique email, not per event
  • Context: Per-event calls would cost thousands per user per day and trip rate limits.
  • Decision: EnrichedEmailLookup cache fronts every Findem call. TTL 90–180d positive, 7–30d negative.
  • Consequences: ~800 calls per user on backfill, <10/day steady state. Cache invalidation via Findem F7 profile-update webhook.
DR-04 · Two-layer scoring: rules for tier, weighted score for sort
  • Context: Team thread split on rules-vs-weights. Admins want explainable tiers; product wants within-tier ranking.
  • Decision: Rules assign the tier (first-match-wins). Weighted score computed independently, used only for within-tier sort + Key Connection tiebreak.
  • Consequences: V1 tiers are explainable ("you matched W1 and W4"). V2 exposes weights for admin tuning.
  • Alternatives: Pure weighted score with hard thresholds. Rejected: not explainable.
DR-05 · Ship thin V1 first; upgrade in place
  • Context: Waiting for Findem confirmations blocks ship. Email + calendar alone cover 50% of clauses.
  • Decision: Phases 1–6 ship without Findem or user enrichment. Findem (Phase 7) and user enrichment (8–9) land incrementally, no data migration required.
  • Consequences: Dedup coverage is ~40–60% on Slice A vs. ~75–85% with Findem. The architecture is additive at every stage.
DR-06 · UCC extension, not a new strength table
  • Context: Strength is per (user, contact). UserContactCollection already models that link.
  • Decision: Add 4 nullable columns to UCC: strength_tier, strength_score, strength_override, strength_computed_at.
  • Consequences: No new join for every strength read. Migration is purely additive.
  • Alternatives: Separate UserContactStrength table. Rejected: one-to-one with UCC with no justification.
DR-07 · Reachability not user-editable in V1
  • Context: Reachability aggregates strength tiers. If users edit it, it drifts from its own inputs.
  • Decision: V1: strength is overridable (per-user), reachability is not. V2: revisit if users request it.
  • Consequences: Single source of truth is UCC strength. Reachability always derivable from current state.
DR-08 · 24-month backfill horizon on OAuth connect
  • Context: Unbounded backfill is slow and expensive. The "sustained 2-year" heuristic defines the practical signal floor.
  • Decision: Read last 24 months of email + calendar on first connect. Older history via explicit "load historical" button (future).
  • Consequences: First sync completes in minutes. Signals stabilize within 24h of connection.
DR-09 · Nightly rollup in addition to incremental updates
  • Context: Windowed counters (email_count_12mo, has_response_in_3mo) need to forget events as they age out. Incremental can add but can't drop.
  • Decision: Incremental path on event insert (O(1) bump). Nightly full rebuild sweeps the aging tail + reconciles drift.
  • Consequences: Two code paths for the same stats. Incremental is fast; nightly is safe. If incremental ever lags, nightly heals.