GET-109 · Data → Strength Narrative

From raw email & calendar data to a relationship signal

A walk-through of exactly what data we collect, how it becomes evidence, and how that evidence answers "who knows this person and how well?"

1. The question we're answering

When an Inovia team member finds a contact and wants an intro, the platform should answer two things without asking them to guess:

"How likely am I to get a warm intro?"

→ a reachability level on the contact: High, Medium, Low, or None.

"Who should I ask?"

→ a key connection: the team member with the strongest relationship.

Both answers roll up from the same atomic layer: per-team-member connection strength (Warm / Known / Cold), derived from transparent, explainable signals.

2. What data we ingest

For each Inovia admin user who connects their mailbox or calendar, we read only metadata, from two providers:

Source | Provider | Scope requested | Fields we see
--- | --- | --- | ---
Gmail | Google | gmail.metadata | From, To, Cc, Bcc, Date, Subject, Message-ID, In-Reply-To, References, threadId
Google Calendar | Google | calendar.readonly | Attendee email list, start time, RSVP status (accepted / tentative / declined)
Outlook Mail | Microsoft | Mail.ReadBasic | from, toRecipients, ccRecipients, sentDateTime, receivedDateTime, conversationId, subject, internetMessageId
Outlook Calendar | Microsoft | Calendars.ReadBasic | attendees (+ response status), start time
Metadata-only is enforced by the API, not just policy. gmail.metadata physically cannot return message bodies; Mail.ReadBasic explicitly excludes them. The API can't leak what it can't return: the guardrail lives at Google's and Microsoft's boundary, not in our code.

Additionally, once Findem capabilities are confirmed, we enrich each user and each resolved contact with:

  • Employment history (company, title, dates, location), via the same pipeline as today's contact enrichment.
  • Education history (school, degree, dates).
  • Canonical identity for an email address, so bob.smith@acme.com and b.smith@acme.com resolve to one person.

3. What we do not ingest

Privacy guardrails enforced at the API level

  • No email body. gmail.metadata physically cannot return message bodies. Mail.ReadBasic explicitly excludes them.
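  • No calendar event titles or notes. We read attendee lists and times only. Unlike the mail scopes, calendar.readonly and Calendars.ReadBasic can return event titles, so this guardrail rests on which fields we read rather than on the scope itself.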
  • No attachments.
  • Only one user's mailbox per OAuth grant. No delegated access, no admin override.
  • Each source is a per-user opt-in toggle. Calendar can be on while email is off, or vice versa.
  • Disconnecting wipes data. Revoking OAuth triggers deletion of all InteractionEvent rows for that user.
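Because each source is an independent opt-in, the scopes requested at consent time can be derived directly from the user's toggles. A minimal sketch, assuming a hypothetical SourceToggles record per user: the scope strings are the documented Google OAuth scope URLs and the Microsoft Graph permission names from the table in section 2, while SourceToggles and scopes_for are illustrative names, not the platform's code.

```python
from dataclasses import dataclass
from typing import List

# Scope identifiers per provider, keyed by source. Google values are the full
# OAuth scope URLs; Microsoft values are Graph permission names.
SCOPES = {
    "google": {
        "email": "https://www.googleapis.com/auth/gmail.metadata",
        "calendar": "https://www.googleapis.com/auth/calendar.readonly",
    },
    "microsoft": {
        "email": "Mail.ReadBasic",
        "calendar": "Calendars.ReadBasic",
    },
}


@dataclass
class SourceToggles:
    """Hypothetical per-user opt-in flags; each source is independent."""
    email: bool = False
    calendar: bool = False


def scopes_for(provider: str, toggles: SourceToggles) -> List[str]:
    """Return only the scopes for sources this user has switched on."""
    table = SCOPES[provider]
    return [table[source] for source in ("email", "calendar") if getattr(toggles, source)]
```

A calendar-only Google user would request just calendar.readonly, so no mail metadata is ever granted for that account.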

4. From raw events to signal primitives

Raw email and calendar events land in one normalized table (InteractionEvent). A rollup (incremental on insert, reconciled nightly) distills them into per-(user, contact) signal primitives: the atomic facts every heuristic reads.

Diagram (flowchart, left to right): raw events → normalize + flag → rollup.
  • Raw events: Email Alice → Bob (2026-02-10); Email Bob → Alice (2026-02-11); Email Alice → Bob (2026-03-15); Meeting Alice + Bob (2026-04-01); Newsletter NYT → Alice (2026-04-05).
  • Normalize + flag: mark newsletters, mark direction, resolve contact_id.
  • UserContactInteractionStats per (Alice, Bob): email_outbound_count_12mo = 2, email_inbound_count_12mo = 1, email_last_at = 2026-03-15, email_has_response_in_3mo = true, meeting_count_180d = 1, activity_bitmap_12q = …0111.
Primitives are rollup columns, not raw events. The rule engine never scans InteractionEvent; it reads one row from UserContactInteractionStats per (user, contact). That's O(1) scoring per pair, regardless of how much history lives underneath.
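The rollup for a single (user, contact) pair can be sketched in a few lines. This covers five of the primitives in the diagram and reproduces its numbers. Several details are assumptions for illustration: the InteractionEvent shape, "12 months" approximated as 365 days, newsletters excluded from the two-way counts, and a "response" defined as an inbound message whose In-Reply-To points at one of our outbound Message-IDs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class InteractionEvent:
    """One normalized row for a single (user, contact) pair (hypothetical shape)."""
    kind: str                        # "email" or "meeting"
    at: datetime
    direction: Optional[str] = None  # "outbound" / "inbound" for email; None for meetings
    is_newsletter: bool = False
    message_id: Optional[str] = None
    in_reply_to: Optional[str] = None


@dataclass
class InteractionStats:
    email_outbound_count_12mo: int = 0
    email_inbound_count_12mo: int = 0
    email_last_at: Optional[datetime] = None
    email_has_response_in_3mo: bool = False
    meeting_count_180d: int = 0


def rollup(events: Iterable[InteractionEvent], now: datetime) -> InteractionStats:
    events = list(events)
    stats = InteractionStats()
    # Assumption: newsletter-flagged mail never counts toward two-way primitives.
    emails = [e for e in events if e.kind == "email" and not e.is_newsletter]
    twelve_months_ago = now - timedelta(days=365)
    ninety_days_ago = now - timedelta(days=90)

    for e in emails:
        if e.at >= twelve_months_ago:
            if e.direction == "outbound":
                stats.email_outbound_count_12mo += 1
            elif e.direction == "inbound":
                stats.email_inbound_count_12mo += 1
        if stats.email_last_at is None or e.at > stats.email_last_at:
            stats.email_last_at = e.at

    # A response is a recent inbound message replying to one of our outbound Message-IDs.
    outbound_ids = {e.message_id for e in emails if e.direction == "outbound" and e.message_id}
    stats.email_has_response_in_3mo = any(
        e.direction == "inbound" and e.at >= ninety_days_ago and e.in_reply_to in outbound_ids
        for e in emails
    )
    stats.meeting_count_180d = sum(
        1 for e in events if e.kind == "meeting" and e.at >= now - timedelta(days=180)
    )
    return stats
```

Feeding it the four Alice/Bob events from the diagram with now = 2026-04-10 yields 2 outbound, 1 inbound, a last email on 2026-03-15, a response flag of true, and 1 meeting.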

The primitive catalog

Primitive | What it answers | V1 available?
--- | --- | ---
email_inbound_count_12mo | How many emails has this person sent me in the last 12 months? | Yes
email_outbound_count_12mo | How many have I sent them? | Yes
email_twoway_count_total | How many back-and-forth email threads have we ever had? | Yes
email_activity_bitmap_12q | In which of the last 12 quarters did we exchange email both ways? | Yes
email_has_response_in_3mo | Did they reply to anything I sent in the last 90 days? | Yes
email_last_at, email_last_outbound_at, email_last_inbound_at | When was the most recent email? Last outbound? Last inbound? | Yes
email_inbound_non_newsletter_count_24mo | Have they sent me a real (non-newsletter) email in 2 years? | Yes
meeting_count_180d, meeting_count_24mo | How many meetings have we shared? | Yes
meeting_last_at | When did we last meet? | Yes
work_overlap (same company, dates, team, location, function) | Did we work together? When? Same team? | Slice C
education_overlap | Did we go to the same school around the same time? | Slice C
linkedin_degree | Are we 1st-degree connected on LinkedIn? | Not V1

5. Heuristics: plain English, then the rule

Each tier has multiple alternative qualifying conditions (OR). A contact becomes Warm if any Warm clause matches. If no Warm clause matches, we check Known. If no Known, it's Cold.

Warm: "They know this person well"

W1 · Two-way exchange in the last 12 months, 5+ each direction

You've had a genuine back-and-forth in the past year โ€” at least 5 emails each direction.

email_inbound_count_12mo >= 5 AND email_outbound_count_12mo >= 5
Reads: email_inbound_count_12mo, email_outbound_count_12mo
Example met: last year you sent Bob 8 emails, he sent you 6. Warm ✅
W2 · Sustained back-and-forth over 2+ years (even at low volume)

You've kept in touch on and off over a long time. Not frequent, but a steady drumbeat.

email_active_quarters_consecutive >= 8
A quarter is "active" if it contains ≥1 inbound AND ≥1 outbound email, the same definition email_activity_bitmap_12q uses. 8 consecutive active quarters = 2 years of steady contact.
Example met: one or two emails each way every quarter for the past 2 years; not many in total, but unbroken. Warm ✅
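email_active_quarters_consecutive is not in the primitive catalog, but it can be derived from a 12-quarter bitmap built with the same definition of an active quarter. A sketch under two stated assumptions: bit 0 is the current quarter (reading the …0111 notation in section 4 with the most recent quarter on the right), and "consecutive" means the longest unbroken run inside the window. Counting only a run that ends at the current quarter would be a stricter alternative.

```python
from datetime import date
from typing import Iterable

WINDOW_QUARTERS = 12


def quarter_index(d: date) -> int:
    """Absolute quarter number, so adjacent quarters differ by exactly 1."""
    return d.year * 4 + (d.month - 1) // 3


def activity_bitmap_12q(inbound: Iterable[date], outbound: Iterable[date], today: date) -> int:
    """Bit k is set if k quarters ago had >=1 inbound AND >=1 outbound email."""
    current = quarter_index(today)

    def offsets(dates: Iterable[date]) -> set:
        # Quarters-ago offsets that fall inside the 12-quarter window.
        return {current - quarter_index(d) for d in dates
                if 0 <= current - quarter_index(d) < WINDOW_QUARTERS}

    bitmap = 0
    for k in offsets(inbound) & offsets(outbound):
        bitmap |= 1 << k
    return bitmap


def email_active_quarters_consecutive(bitmap: int) -> int:
    """Longest run of consecutive active quarters inside the window."""
    best = run = 0
    for k in range(WINDOW_QUARTERS):
        if (bitmap >> k) & 1:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best
```

The W2 example (one email each way per quarter for 8 quarters) produces a bitmap with the low 8 bits set and a run of 8, so W2 fires.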
W3 · Recent email with a response, plus a work or LinkedIn connection

You emailed them recently, they replied, and you're linked in some other way too: you worked together or are connected on LinkedIn.

email_last_outbound_at > now - 90d AND email_has_response_in_3mo AND (work_overlap_exists OR linkedin_degree = 1)
Work-overlap half activates in Slice C. LinkedIn-degree half stays dormant until that data source exists.
Example met: you emailed Bob 3 weeks ago, he replied the next day, and you worked at Acme together in 2019. Warm ✅
W4 · 1+ calendar meeting in the last 180 days

You've actually met with them in the past 6 months.

meeting_count_180d >= 1
Example met: you had a 30-minute call with Bob in March. Warm ✅
W5 · Same-team or small-company work overlap, recently

You worked closely together (same team, or a small company where everyone knew each other), and it wasn't too long ago.

overlap_months >= 3 AND ended_at > now - 5y AND (same_team OR company_size < 50 OR both_c_level OR (same_location AND same_job_function))
Slice C (requires user enrichment from Findem).
Example met: you and Bob were both on the engineering team at a 30-person startup 2 years ago. Warm ✅
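The V1 Warm clauses translate almost line-for-line into predicates over one stats row. A sketch, assuming a hypothetical PairStats record and that "now - 90d" means 90 calendar days. W5 is omitted because it needs Slice C work-history records, and the LinkedIn half of W3 is a field that simply stays empty in V1.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional


@dataclass
class PairStats:
    """The subset of UserContactInteractionStats the Warm clauses read (hypothetical shape)."""
    email_inbound_count_12mo: int = 0
    email_outbound_count_12mo: int = 0
    email_active_quarters_consecutive: int = 0
    email_last_outbound_at: Optional[datetime] = None
    email_has_response_in_3mo: bool = False
    meeting_count_180d: int = 0
    work_overlap_exists: bool = False      # Slice C: stays False until enrichment lands
    linkedin_degree: Optional[int] = None  # Not V1: stays None


def w1(s: PairStats, now: datetime) -> bool:
    return s.email_inbound_count_12mo >= 5 and s.email_outbound_count_12mo >= 5


def w2(s: PairStats, now: datetime) -> bool:
    return s.email_active_quarters_consecutive >= 8


def w3(s: PairStats, now: datetime) -> bool:
    emailed_recently = (s.email_last_outbound_at is not None
                        and s.email_last_outbound_at > now - timedelta(days=90))
    linked = s.work_overlap_exists or s.linkedin_degree == 1
    return emailed_recently and s.email_has_response_in_3mo and linked


def w4(s: PairStats, now: datetime) -> bool:
    return s.meeting_count_180d >= 1


# W5 needs enriched work-history records (Slice C), so it is left out here.
WARM_CLAUSES: Dict[str, Callable[[PairStats, datetime], bool]] = {
    "W1": w1, "W2": w2, "W3": w3, "W4": w4,
}


def matched_warm(s: PairStats, now: datetime) -> List[str]:
    """Every Warm clause that matches; the tier is Warm if this list is non-empty."""
    return [name for name, rule in WARM_CLAUSES.items() if rule(s, now)]
```

For Scenario 1 in section 8 (31 in, 27 out, 3 meetings) this returns ["W1", "W4"].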

Known: "There's a real connection, but confirm before the intro"

K1 · Occasional 2-way email (3–10 exchanges total)

You've traded emails a few times over the years. Real contact, not frequent.

email_twoway_count_total BETWEEN 3 AND 10
Example met: 6 back-and-forth email threads over 3 years. Known ✅
K2 · One-way inbound within 2 years (they reached out, not a newsletter)

They emailed you at some point in the last 2 years, and it wasn't a newsletter.

email_inbound_non_newsletter_count_24mo >= 1
Newsletters are flagged at ingestion via List-Unsubscribe, Precedence: bulk/list, or recipient count > 20.
Example met: Bob emailed you last year asking for advice; you never got around to replying. Known ✅
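The three newsletter signals are combined with OR at ingestion. A sketch, assuming headers arrive as a name → value mapping and that recipient_count means To + Cc + Bcc; both are assumptions about the ingestion shape. Header names are case-insensitive, so they are normalized before lookup.

```python
from typing import Mapping

BULK_PRECEDENCE = {"bulk", "list"}
MAX_PERSONAL_RECIPIENTS = 20  # more recipients than this counts as a broadcast


def is_newsletter(headers: Mapping[str, str], recipient_count: int) -> bool:
    """Flag a message as newsletter-style using the three ingestion-time signals."""
    # Header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    if "list-unsubscribe" in normalized:
        return True
    if normalized.get("precedence", "").strip().lower() in BULK_PRECEDENCE:
        return True
    return recipient_count > MAX_PERSONAL_RECIPIENTS
```

A flagged message still lands in InteractionEvent; it just stops counting toward email_inbound_non_newsletter_count_24mo, which is what K2 reads.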
K3 · Calendar meeting 180 days to 24 months ago

You met with them 6 months to 2 years ago. Not recent, not ancient.

meeting_count_24mo - meeting_count_180d >= 1
Example met: you met Bob at a dinner last year but haven't talked since. Known ✅
K4 · Same-team overlap (broader window, looser gates)

You worked together at some point; maybe not recently, but the connection is real.

Slice C. Similar to W5 but the time window extends further back and the c-level / small-company gate is dropped.
K5 · Same company, same location, same job function

You worked at the same company, in the same office, in similar roles. Even if not the same team, you bumped into each other.

same_company AND same_location AND same_job_function AND overlap_months >= 3
Slice C.
K6 · LinkedIn connection + (old work overlap OR shared school)

You're connected on LinkedIn, plus some historical overlap at work or school.

Slice C partial. Work / education sides activate; LinkedIn-degree side stays dormant.
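The three V1 Known clauses (K1 to K3) read only rollup columns. A sketch assuming a hypothetical KnownInputs record holding just the columns these clauses read; K4 to K6 are left out because their inputs arrive with Slice C.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class KnownInputs:
    """Primitives the V1 Known clauses read (hypothetical shape)."""
    email_twoway_count_total: int = 0
    email_inbound_non_newsletter_count_24mo: int = 0
    meeting_count_180d: int = 0
    meeting_count_24mo: int = 0


KNOWN_CLAUSES: Dict[str, Callable[[KnownInputs], bool]] = {
    # K1: occasional two-way email; SQL BETWEEN is inclusive on both ends.
    "K1": lambda s: 3 <= s.email_twoway_count_total <= 10,
    # K2: at least one non-newsletter inbound email in the last 2 years.
    "K2": lambda s: s.email_inbound_non_newsletter_count_24mo >= 1,
    # K3: a meeting in the 180d-24mo band (in the 24mo count but not the 180d one).
    "K3": lambda s: s.meeting_count_24mo - s.meeting_count_180d >= 1,
    # K4-K6 depend on Slice C enrichment and are not modeled here.
}


def matched_known(s: KnownInputs) -> List[str]:
    return [name for name, rule in KNOWN_CLAUSES.items() if rule(s)]
```

Note how K3 subtracts the two meeting windows: a contact whose only meeting was 2 months ago has equal counts and does not match K3 (W4 covers that case instead).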

Cold: "No meaningful relationship"

C1 · One-way outbound only (no reply)

You've emailed them, they never replied. Cold outreach territory.

email_outbound_count_total > 0 AND email_inbound_count_total = 0
C2 · Inbound newsletter-style only

You're on a mailing list with them, but you've never had a real exchange.

email_inbound_non_newsletter_count_total = 0 AND email_inbound_newsletter_count_total > 0 AND email_outbound_count_total = 0
C3 · LinkedIn 1st-degree only, no other signal

You're connected on LinkedIn and nothing else. Not V1.

C4 · Past shared employment >5 years ago, no recent contact

You worked together a long time ago and haven't kept in touch.

Slice C.
C5 · Default fallback

Nothing else matched; treat as Cold.
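The V1 Cold clauses mostly exist to produce a useful reason, since any contact that reaches this point is Cold either way. A sketch, assuming a hypothetical ColdInputs record of all-time email totals; C3 (LinkedIn) and C4 (Slice C work history) are not modeled, and C5 is returned when nothing else matched.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColdInputs:
    """All-time email totals the V1 Cold clauses read (hypothetical shape)."""
    email_outbound_count_total: int = 0
    email_inbound_count_total: int = 0
    email_inbound_newsletter_count_total: int = 0
    email_inbound_non_newsletter_count_total: int = 0


def matched_cold(s: ColdInputs) -> List[str]:
    matched = []
    # C1: we wrote, they never wrote back.
    if s.email_outbound_count_total > 0 and s.email_inbound_count_total == 0:
        matched.append("C1")
    # C2: only newsletters flowed in, and we never wrote to them.
    if (s.email_inbound_non_newsletter_count_total == 0
            and s.email_inbound_newsletter_count_total > 0
            and s.email_outbound_count_total == 0):
        matched.append("C2")
    # C3 (LinkedIn) is not V1; C4 needs Slice C work history.
    # C5: nothing matched, so record the default fallback as the reason.
    return matched or ["C5"]
```

Scenario 3 in section 8 (14 newsletter inbounds, nothing else) yields ["C2"].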

6. Combining clauses into a tier

The rule engine walks clauses top-down, and the first tier with any matching clause wins. All matched clauses (including those from lower tiers) are returned as reasons[], which powers the "hover to see why" UX.

Diagram (flowchart, top to bottom):
  • Start: User × Contact stats loaded.
  • Any Warm clause matches (W1 · W2 · W3 · W4 · W5)? Yes → Tier = WARM, score = weighted sum, reasons = matched clauses. No → next check.
  • Any Known clause matches (K1 · K2 · K3 · K4 · K5 · K6)? Yes → Tier = KNOWN. No → next check.
  • Any Cold-positive clause matches (C1 · C2 · C3 · C4)? Yes → Tier = COLD. No → Tier = COLD (default fallback).
  • User has set an override? Yes → use the override and keep the original as audit. Either way, persist.
  • Persist to UCC: strength_tier, strength_score, strength_computed_at.
First match wins, not a weighted aggregate. We evaluate Warm clauses first; if any matches, the tier is Warm even if a Known clause would also match. All matched conditions are retained as reasons[] for the hover UI. See technical spec DR-04.
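First-match-wins plus override can be sketched independently of the individual clauses: given the clause names that matched in each tier, pick the first non-empty tier, keep every match as a reason, and let a user override replace the persisted tier while the computed one is kept for audit. The StrengthResult shape and decide_tier name are hypothetical; DR-04 in the technical spec is the authority.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

TIER_ORDER = ("WARM", "KNOWN", "COLD")


@dataclass
class StrengthResult:
    tier: str           # what gets persisted as strength_tier
    computed_tier: str  # what the rules said; kept for audit when overridden
    reasons: List[str] = field(default_factory=list)


def decide_tier(matches: Dict[str, List[str]], override: Optional[str] = None) -> StrengthResult:
    """matches maps a tier name to the clause names that matched in that tier."""
    # Every matched clause, top tier first, feeds the hover-to-see-why UI.
    reasons = [clause for tier in TIER_ORDER for clause in matches.get(tier, [])]
    # First tier with any match wins; Cold-positive matches and the fallback both end in COLD.
    computed = next((tier for tier in ("WARM", "KNOWN") if matches.get(tier)), "COLD")
    if not reasons:
        reasons = ["C5"]  # default fallback
    return StrengthResult(tier=override or computed, computed_tier=computed, reasons=reasons)
```

Keeping computed_tier separate from tier is what lets an override be audited or reverted without rerunning the rules.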

Within-tier ranking

Two contacts both land at Warm, but one is "barely warm" and another is "obviously the strongest connection on the team." The weighted score resolves that.

  • Each matched clause contributes signal_weight × normalized_strength.
  • The score is not shown to users in V1 (per product call). It's used only to sort within a tier.
  • The same score drives the "key connection" tiebreaker: of all Warms on the team for this contact, the highest score wins.
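A sketch of the within-tier score. Only the shape, a sum of signal_weight × normalized_strength over matched clauses, comes from the text above; the weights in SIGNAL_WEIGHT and the saturating normalization (value / cap, clipped to [0, 1]) are placeholders chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical per-clause weights; the real values belong to the technical spec.
SIGNAL_WEIGHT: Dict[str, float] = {"W1": 3.0, "W2": 2.0, "W3": 2.5, "W4": 3.0}


def saturate(value: float, cap: float) -> float:
    """One possible normalized_strength: linear up to cap, then flat at 1.0."""
    return min(max(value / cap, 0.0), 1.0)


def strength_score(strengths: Dict[str, float]) -> float:
    """Sum of signal_weight x normalized_strength over the matched clauses."""
    return sum(SIGNAL_WEIGHT[clause] * strength for clause, strength in strengths.items())


def rank_within_tier(contacts: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Sort contacts that share a tier by score, strongest first."""
    scored = [(name, strength_score(s)) for name, s in contacts.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With these placeholder numbers, a contact matching W1 at full strength plus W4 at 0.75 scores 5.25 and sorts well above a contact that only just clears W4 (0.75), which is exactly the "barely warm" versus "obviously strongest" distinction.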

7. Reachability โ€” rolling up to the team level

Once every team member has a Warm / Known / Cold (or no relationship) with a given contact, we aggregate those tiers into the team's reachability.

Diagram (flowchart, top to bottom): count team members by tier for this contact + collection, then check in order:
  • Warm count ≥ 1 OR Known ≥ 3 OR Cold ≥ 10? Yes → HIGH (very likely warm intro).
  • Otherwise, Known ≥ 1 OR Cold ≥ 5? Yes → MEDIUM (possible intro path).
  • Otherwise, Cold ≥ 1? Yes → LOW (only weak signals).
  • Otherwise → NONE (no one is connected).

Reachability thresholds

Level | Qualifies if | Meaning
--- | --- | ---
High | 1+ Warm, OR 3+ Known, OR 10+ Cold | Very likely to get a warm intro: someone knows them well, or enough people know them to find the right path.
Medium | 1+ Known, OR 5+ Cold | Possible intro; the team has real but thinner connections.
Low | 1+ Cold | Only weak signals. Probably a cold reach.
None | No one on the team is connected | No path identified.
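The thresholds are checked top-down, so each level applies only when the one above it failed. A direct transcription, assuming each team member contributes at most one tier per contact and members with no relationship simply don't appear in the input:

```python
from collections import Counter
from typing import Iterable


def reachability(tiers: Iterable[str]) -> str:
    """Map team members' tiers for one contact to HIGH / MEDIUM / LOW / NONE."""
    counts = Counter(tiers)  # missing tiers count as 0
    warm, known, cold = counts["WARM"], counts["KNOWN"], counts["COLD"]
    if warm >= 1 or known >= 3 or cold >= 10:
        return "HIGH"
    if known >= 1 or cold >= 5:
        return "MEDIUM"
    if cold >= 1:
        return "LOW"
    return "NONE"
```

Because this is a pure function of three counts, it can equally run as a GROUP BY over the UCC rows for a contact.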

Key connection = the warmest path

Independent of tier count, we surface one team member: the one with the highest tier for this contact (ties broken by score). That's the default "ask this person for the intro" suggestion.
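Key-connection selection is a three-level sort: tier first, then score, then name (the alphabetical tiebreak comes from the FAQ in section 10). A sketch using a hypothetical Link record per team member:

```python
from typing import List, NamedTuple, Optional

TIER_RANK = {"WARM": 3, "KNOWN": 2, "COLD": 1}


class Link(NamedTuple):
    member: str   # team member's display name
    tier: str     # WARM / KNOWN / COLD for this contact
    score: float  # within-tier score


def key_connection(links: List[Link]) -> Optional[str]:
    """Highest tier wins; ties go to the higher score, then alphabetical by name."""
    if not links:
        return None
    best = min(links, key=lambda link: (-TIER_RANK[link.tier], -link.score, link.member))
    return best.member
```

Tier dominates: a Known member with a high score never outranks a Warm member, because the score only breaks ties inside a tier.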

8. Worked scenarios

Three concrete examples walked end-to-end, from raw data through to the displayed indicators.

These are synthetic. Real mailboxes have far more noise: meeting-reminder bots, CI notifications, shared-mailbox artifacts, deal-room auto-forwards. Newsletter detection, the self-filter, and Findem resolution clean most of it; the rest rolls up to small numbers that don't move tiers.

Scenario 1: Alice ↔ Bob, active current collaborator (Warm)

Raw data observed in Alice's mailbox:

  • 27 outbound emails to bob@acme.com in the last 12 months
  • 31 inbound emails from bob@acme.com in the last 12 months
  • 3 calendar meetings with Bob as attendee in the last 180 days

Rollup row (Alice, Bob):

email_outbound_count_12mo: 27
email_inbound_count_12mo:  31
email_has_response_in_3mo: true
meeting_count_180d:        3

Rule engine: W1 matches (27 ≥ 5 AND 31 ≥ 5). W4 also matches (3 ≥ 1). Tier = Warm.

Reasons shown to user: "5+ two-way emails last year (27 out / 31 in)" + "3 meetings in the last 6 months."

Scenario 2: Carol ↔ Dave, old acquaintance (Known)

Raw data observed in Carol's mailbox:

  • 4 back-and-forth email threads over the last 4 years
  • No emails in the last 8 months
  • No calendar meetings
  • No work overlap data (Slice A)

Rollup row (Carol, Dave):

email_twoway_count_total:                4
email_outbound_count_12mo:               0
email_inbound_count_12mo:                0
email_inbound_non_newsletter_count_24mo: 0
meeting_count_180d:                      0

Rule engine: No Warm clause matches. K1 matches (4 is between 3 and 10). Tier = Known.

Reasons shown to user: "You've had 4 two-way email exchanges over time." Carol's UI suggestion: "Ping Dave before asking for an intro. It's been a while."

Scenario 3: Eve ↔ Frank, newsletter subscriber only (Cold)

Raw data observed in Eve's mailbox:

  • 14 inbound emails from noreply@frankscoinpodcast.com with List-Unsubscribe header
  • 0 outbound emails
  • 0 calendar meetings

Rollup row (Eve, Frank):

email_inbound_newsletter_count_total:     14
email_inbound_non_newsletter_count_total: 0
email_outbound_count_total:               0

Rule engine: No Warm / Known clauses match. C2 matches (all-inbound-newsletter only). Tier = Cold.

Reasons shown to user: "You've only received newsletters from this contact โ€” no real exchange." The UI de-emphasizes Eve as a connector.

Rolling the three up into team reachability

Now imagine the team is evaluating one contact, "Frank," and that Alice, Carol, and Eve each have a relationship with him shaped like the scenarios above:

Team member | Tier with Frank | Why
--- | --- | ---
Alice | Warm | Recent 2-way email + 3 meetings
Carol | Known | 4 threads over the years
Eve | Cold | Newsletter only

Aggregated reachability:

  • Warm count = 1 → meets the "1+ Warm" threshold → Reachability = High
  • Key connection = Alice (highest tier; Warm beats Known beats Cold)
  • Breadth badge: "3 Inovians know this person (1 Warm, 1 Known, 1 Cold)"

In the UI: the contact shows a High reachability pill, Alice is badged with a "Key connection" crown, and all three team members appear in the "Connection of" filter with their individual tiers.
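The breadth badge is a per-tier count over the connected team members. A sketch that reproduces the string above from a hypothetical member → tier mapping; the singular wording for a single member is an illustrative choice, not documented copy.

```python
from collections import Counter
from typing import Dict


def breadth_badge(tiers_by_member: Dict[str, str]) -> str:
    """Render the breadth badge from each connected member's tier for one contact."""
    counts = Counter(tiers_by_member.values())
    n = len(tiers_by_member)
    # Singular wording for n == 1 is an illustrative choice, not documented copy.
    who = "Inovian knows" if n == 1 else "Inovians know"
    return (f"{n} {who} this person "
            f"({counts['WARM']} Warm, {counts['KNOWN']} Known, {counts['COLD']} Cold)")
```

For the table above (Alice Warm, Carol Known, Eve Cold) this renders "3 Inovians know this person (1 Warm, 1 Known, 1 Cold)".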

9. Scale walkthrough: day 1 to steady state

This section traces what actually happens to one realistic user, then extrapolates to a 10-person team, replacing hand-wavy capacity claims with concrete estimates.

User profile

Inovia partner with ~8 years of email history. Connects Gmail + Google Calendar. System ingests the last 24 months only (per the backfill horizon decision).

Day 1: OAuth connect
  • ~20,000 emails read from Gmail (24-month backfill)
  • ~600 calendar events read over 180 days
  • ~3,000 unique email correspondents; ~800 unique addresses after self-filter and newsletter flagging
  • Resolver runs per event (20,600 events)
  • Cache miss on ~800 addresses → ~800 Findem F1 calls
  • ~650 resolve to existing Getro Contact rows (~81% hit rate)
  • ~150 stored as negative-cache entries; their events sit with contact_id = NULL
  • Rollup materializes ~650 UserContactInteractionStats rows
  • Scorer computes ~650 UCC strength tiers (~5 sec total)
  • Reachability aggregates per contact for every touched pair
Day 2: first incremental sync
  • ~50 new emails since last sync
  • ~2 new calendar events
  • Resolver: 48 cache hits, 2 cache misses → 2 Findem calls
  • Incremental rollup: 50 event upserts, ~12 stats rows touched, ~12 UCCs rescored
  • Total external API cost for the day: 50 Gmail calls, 2 Findem calls
Week 2+: steady state
  • ~300 new events/week
  • ~20 new unique correspondents/month → ~5 Findem calls/day
  • Nightly rebuild processes the 36-hour rolling window and ages out windowed counters (emails crossing the 12-month boundary, meetings crossing 180d). Runs in minutes.
  • Any positive-cache Findem entry hitting its 90–180d TTL is refreshed lazily on next encounter
Team scale: 10 users in Inovia
  • Day-1 cumulative: up to ~8,000 Findem calls (10 × ~800; the cache is global per address, so overlap between team members brings the real number down), ~200k email events processed, ~4,500 distinct resolved contacts covered
  • Steady state: ~50–100 Findem calls/day for the team combined
  • Storage at 2 years: ~1.5 GB InteractionEvent, ~50 MB stats, ~5 MB reachability
  • Reachability per contact aggregates up to 10 UCC rows: a SQL GROUP BY, sub-millisecond
  • Per-contact recompute on strength change: 1 row insert + trigger re-aggregation for that contact's reachability row only
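The day-1 figures hang together arithmetically, and a small calculator makes the relationships explicit and easy to rerun for a different user profile. Every default below is one of the walkthrough's own estimates, not a measurement; DayOneEstimate is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class DayOneEstimate:
    """Back-of-envelope inputs taken from the day-1 walkthrough."""
    emails: int = 20_000
    calendar_events: int = 600
    unique_addresses: int = 800  # after self-filter and newsletter flagging
    resolved: int = 650          # matched existing Getro Contact rows
    team_size: int = 10

    def resolver_events(self) -> int:
        return self.emails + self.calendar_events

    def hit_rate(self) -> float:
        return self.resolved / self.unique_addresses

    def negative_cache_entries(self) -> int:
        return self.unique_addresses - self.resolved

    def team_findem_calls_upper_bound(self) -> int:
        # Assumes no address is shared between members; the global cache only lowers this.
        return self.unique_addresses * self.team_size

    def team_email_events(self) -> int:
        return self.emails * self.team_size
```

Defaults give 20,600 resolver events, a 0.8125 hit rate, 150 negative-cache entries, an 8,000-call team ceiling, and 200,000 team email events, matching the bullets above.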

What this tells us

  • Findem cost is front-loaded: backfill is the spend, steady state is cheap.
  • Storage is modest: even a 50-user deployment over 5 years stays under ~40 GB of interaction data.
  • Scoring latency is bounded: O(1) per pair. Dashboard loads read pre-computed reachability rows, so no scoring happens at request time.
  • Incremental path is the hot path: after day 1, almost no external work per user per day.

10. FAQ

Why metadata only? Why not read message bodies for richer signal?

Privacy guardrail. We chose mail scopes (gmail.metadata, Mail.ReadBasic) that physically cannot return message bodies, so there's no "oops, we saved something we shouldn't" risk at the code level.

Will this work for someone who connects only their calendar, not their email?

Yes. Each scope is a per-user opt-in. Calendar-only signals populate the meeting-based heuristics (W4 and K3). Email-based clauses stay inactive for that user, so contacts without a qualifying meeting land Cold unless a work overlap matches once Slice C ships.

What if two team members both have the same contact at Warm? Whose name shows up as Key Connection?

The team member whose weighted score is higher (see technical spec § 5). Ties are broken alphabetically. The UI can still list every connected team member in the "Connection of" filter so the user can pick.

How fresh is the data a user sees in the UI?

Within minutes of the last Google/Microsoft sync: the incremental rollup runs on event insert. The nightly rebuild at 00:00 UTC reconciles drift and ages out windowed counters (so "last 12 months" stays accurate as time moves forward).

If I override my Warm relationship with someone to Cold, does that affect my teammates?

No. Overrides are per-user. Your override changes only your own row in UCC. It does shift the tier mix in the team's breadth count for that contact (one fewer Warm, one more Cold), which can lower the contact's reachability, but other team members' tiers are untouched.

What if Findem says a person doesn't exist?

Their email goes into the negative cache for 7–30 days. Any interactions we see with that address sit unresolved (contact_id = NULL) and don't contribute to strength. If that person is later imported as a Contact through normal channels, a backfill worker attributes their historical interactions.
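The positive/negative cache split can be sketched as a TTL map in front of the resolver. All of the following are hypothetical: lookup stands in for the Findem call and returns a contact_id or None, addresses are lowercased before caching, and the TTLs use the low ends of the documented ranges (7 days negative, 90 days positive). An expired entry is refreshed on the next encounter, which is the lazy refresh described in section 9.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional

POSITIVE_TTL = timedelta(days=90)  # low end of the 90-180d range
NEGATIVE_TTL = timedelta(days=7)   # low end of the 7-30d range


@dataclass
class CacheEntry:
    contact_id: Optional[str]  # None marks a negative entry
    expires_at: datetime


class ResolverCache:
    def __init__(self, lookup: Callable[[str], Optional[str]]) -> None:
        self._lookup = lookup  # stand-in for the Findem call
        self._entries: Dict[str, CacheEntry] = {}
        self.external_calls = 0

    def resolve(self, address: str, now: datetime) -> Optional[str]:
        key = address.strip().lower()  # assumption: treat addresses case-insensitively
        entry = self._entries.get(key)
        if entry is not None and entry.expires_at > now:
            return entry.contact_id  # fresh hit, positive or negative
        # Miss or expired entry: refresh lazily on this encounter.
        self.external_calls += 1
        contact_id = self._lookup(key)
        ttl = POSITIVE_TTL if contact_id is not None else NEGATIVE_TTL
        self._entries[key] = CacheEntry(contact_id, now + ttl)
        return contact_id
```

The short negative TTL is the design lever here: a person Findem doesn't know today gets re-checked within a week instead of staying unresolved for months.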

Does this also work when an Inovia admin emails a colleague on the same team?

Those interactions are filtered out before they reach the resolver, because a user-to-user email shouldn't make two teammates look like each other's "contacts." They remain team members, not contacts of each other.
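One way to apply the teammate filter is per address rather than per message, so a meeting that includes a teammate and an outside guest still credits the guest. A sketch, assuming the team roster is available as a list of addresses; external_counterparties is a hypothetical helper and the inovia.example domain is a placeholder.

```python
from typing import Iterable, List


def external_counterparties(participants: Iterable[str], owner: str,
                            team_addresses: Iterable[str]) -> List[str]:
    """Drop the mailbox owner and every teammate; only outside addresses reach the resolver."""
    internal = {a.strip().lower() for a in team_addresses}
    internal.add(owner.strip().lower())
    return [p for p in participants if p.strip().lower() not in internal]
```

A meeting between Alice, her teammate Carol, and bob@acme.com leaves only bob@acme.com to resolve, while an email from Alice to Carol leaves nothing.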

Next

For implementation-level detail (tables, services, phased plan, file references, decision records), see the companion technical spec.