From raw email & calendar data to a relationship signal
A walk-through of exactly what data we collect, how it becomes evidence, and how that evidence answers "who knows this person and how well?"
1. The question we're answering
When an Inovia team member finds a contact and wants an intro, the platform should answer two things without asking them to guess:
- "How likely am I to get a warm intro?" The answer is a reachability level on the contact: High, Medium, Low, or None.
- "Who should I ask?" The answer is a key connection: the team member with the strongest relationship.
Both answers roll up from the same atomic layer: per-team-member connection strength (Warm / Known / Cold) derived from transparent, explainable signals.
2. What data we ingest
For each Inovia admin user who connects their mailbox, we read metadata only from two providers:
| Provider | Scope requested | Fields we see |
|---|---|---|
| Gmail | gmail.metadata | From, To, Cc, Bcc, Date, Subject, Message-ID, In-Reply-To, References, threadId |
| Google Calendar | calendar.readonly | Attendee email list, start time, RSVP status (accepted / tentative / declined) |
| Outlook Mail (Microsoft) | Mail.ReadBasic | from, toRecipients, ccRecipients, sentDateTime, receivedDateTime, conversationId, subject, internetMessageId |
| Outlook Calendar (Microsoft) | Calendars.ReadBasic | attendees (+ response status), start time |
gmail.metadata physically cannot return message bodies; Mail.ReadBasic explicitly excludes them. The API can't leak what it can't return: the guardrail lives at Google's/Microsoft's boundary, not in our code.
Additionally, once Findem capabilities are confirmed, we enrich each user and each resolved contact with:
- Employment history (company, title, dates, location), using the same pipeline as today's contact enrichment.
- Education history (school, degree, dates).
- Canonical identity for an email address, so bob.smith@acme.com and b.smith@acme.com resolve to one person.
3. What we do not ingest
Privacy guardrails enforced at the API level
- No email body. gmail.metadata physically cannot return message bodies. Mail.ReadBasic explicitly excludes them.
- No calendar event titles or notes. We read attendee lists and times only.
- No attachments.
- Only one user's mailbox per OAuth grant. No delegated access, no admin override.
- Each source is a per-user opt-in toggle. Calendar can be on while email is off, or vice versa.
- Disconnecting wipes data. Revoking OAuth triggers deletion of all InteractionEvent rows for that user.
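The "disconnecting wipes data" rule can be sketched as a single scoped delete. This is a minimal illustration, assuming a SQL table called interaction_event with a user_id column; the table layout and the function name purge_user_interactions are hypothetical, and only the InteractionEvent name comes from this document.

```python
import sqlite3

def purge_user_interactions(conn: sqlite3.Connection, user_id: str) -> int:
    """Delete every interaction row owned by user_id; return rows removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM interaction_event WHERE user_id = ?", (user_id,)
        )
    return cur.rowcount

# Tiny in-memory demo (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interaction_event (id INTEGER PRIMARY KEY, user_id TEXT, kind TEXT)"
)
conn.executemany(
    "INSERT INTO interaction_event (user_id, kind) VALUES (?, ?)",
    [("alice", "email"), ("alice", "meeting"), ("carol", "email")],
)
conn.commit()

removed = purge_user_interactions(conn, "alice")
remaining = conn.execute("SELECT COUNT(*) FROM interaction_event").fetchone()[0]
```

In this demo only Alice's rows are removed; Carol's row is untouched, which mirrors the one-mailbox-per-grant boundary described above.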
4. From raw events to signal primitives
Raw email and calendar events land in one normalized table (InteractionEvent). A nightly rollup distills them into per-(user, contact) signal primitives: the atomic facts every heuristic reads.
Diagram (raw events → normalize → rollup), for the pair (Alice, Bob):
- Raw events: E1 email (2026-02-10); E2 email Bob → Alice (2026-02-11); E3 email Alice → Bob (2026-03-15); M1 meeting Alice + Bob (2026-04-01); N1 newsletter NYT → Alice (2026-04-05).
- Normalize + flag: mark newsletters, mark direction, resolve contact_id.
- Rollup into UserContactInteractionStats per (Alice, Bob): email_outbound_count_12mo = 2, email_inbound_count_12mo = 1, email_last_at = 2026-03-15, email_has_response_in_3mo = true, meeting_count_180d = 1, activity_bitmap_12q = …0111.
The scorer never reads InteractionEvent directly: it reads one row from UserContactInteractionStats per (user, contact). That's O(1) scoring regardless of how much history lives underneath.
The primitive catalog
| Primitive | What it answers | V1 available? |
|---|---|---|
| email_inbound_count_12mo | How many emails has this person sent me in the last 12 months? | Yes |
| email_outbound_count_12mo | How many have I sent them? | Yes |
| email_twoway_count_total | How many back-and-forth email threads have we ever had? | Yes |
| email_activity_bitmap_12q | In which of the last 12 quarters did we exchange email both ways? | Yes |
| email_has_response_in_3mo | Did they reply to anything I sent in the last 90 days? | Yes |
| email_last_at, email_last_outbound_at, email_last_inbound_at | When was the most recent email? Last outbound? Last inbound? | Yes |
| email_inbound_non_newsletter_count_24mo | Have they sent me a real (non-newsletter) email in 2 years? | Yes |
| meeting_count_180d, meeting_count_24mo | How many meetings have we shared? | Yes |
| meeting_last_at | When did we last meet? | Yes |
| work_overlap (same company, dates, team, location, function) | Did we work together? When? Same team? | Slice C |
| education_overlap | Did we go to the same school around the same time? | Slice C |
| linkedin_degree | Are we 1st-degree connected on LinkedIn? | Not V1 |
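To make the rollup concrete, here is a minimal sketch that distills one pair's events into a few of the primitives in the catalog. The Event shape and the rollup function are hypothetical, the events are assumed to be already normalized and scoped to one (user, contact) pair, and email_has_response_in_3mo is approximated as "any inbound email later than an outbound one sent in the last 90 days" rather than real reply threading. The direction of the first sample email is assumed to be outbound, which is consistent with an outbound count of 2.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass(frozen=True)
class Event:
    kind: str                        # "email" or "meeting"
    at: datetime
    direction: Optional[str] = None  # "outbound" / "inbound"; None for meetings
    is_newsletter: bool = False

def rollup(events: list[Event], now: datetime) -> dict:
    """Distill one (user, contact) pair's events into signal primitives."""
    def since(days: int) -> datetime:
        return now - timedelta(days=days)

    emails = [e for e in events if e.kind == "email"]
    outbound = [e for e in emails if e.direction == "outbound"]
    inbound = [e for e in emails if e.direction == "inbound"]
    recent_out = [e.at for e in outbound if e.at >= since(90)]
    return {
        "email_outbound_count_12mo": sum(e.at >= since(365) for e in outbound),
        "email_inbound_count_12mo": sum(e.at >= since(365) for e in inbound),
        "email_inbound_non_newsletter_count_24mo": sum(
            e.at >= since(730) and not e.is_newsletter for e in inbound
        ),
        "email_last_at": max((e.at for e in emails), default=None),
        # Simplification: any inbound email later than a recent outbound one.
        "email_has_response_in_3mo": any(
            i.at > o for i in inbound for o in recent_out
        ),
        "meeting_count_180d": sum(
            e.kind == "meeting" and e.at >= since(180) for e in events
        ),
    }

now = datetime(2026, 4, 10)  # hypothetical "today" for the example
alice_bob = [
    Event("email", datetime(2026, 2, 10), "outbound"),  # direction assumed
    Event("email", datetime(2026, 2, 11), "inbound"),
    Event("email", datetime(2026, 3, 15), "outbound"),
    Event("meeting", datetime(2026, 4, 1)),
]
stats = rollup(alice_bob, now)
```

A production rollup would presumably update these counters incrementally rather than rescanning events; the full scan here is only for readability.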
5. Heuristics: plain English, then the rule
Each tier has multiple alternative qualifying conditions (OR). A contact becomes Warm if any Warm clause matches. If no Warm clause matches, we check Known. If no Known, it's Cold.
Warm: "They know this person well"
- W1: You've had a genuine back-and-forth in the past year: at least 5 emails each direction. Primitives: email_inbound_count_12mo, email_outbound_count_12mo.
- W2: You've kept in touch on and off over a long time. Not frequent, but a steady drumbeat.
- W3: You emailed them recently, they replied, and you're linked in some other way too (worked together or connected on LinkedIn).
- W4: You've actually met with them in the past 6 months.
- W5: You worked closely together (same team, or a small company where everyone knew each other) and it wasn't too long ago.
Known: "There's a real connection, but confirm before the intro"
- K1: You've traded emails a few times over the years. Real contact, not frequent.
- K2: They emailed you at some point in the last 2 years, and it wasn't a newsletter. An email is flagged as a newsletter on List-Unsubscribe, Precedence: bulk/list, or recipient count > 20.
- K3: You met with them 6 months to 2 years ago. Not recent, not ancient.
- K4: You worked together at some point. Maybe not recently, but the connection is real.
- K5: You worked at the same company, in the same office, in similar roles. Even if not the same team, you bumped into each other.
- K6: You're connected on LinkedIn, plus some historical overlap at work or school.
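The newsletter criteria mentioned for the Known tier (List-Unsubscribe, Precedence: bulk/list, or recipient count > 20) can be sketched as one predicate over message metadata. The function name and the plain-dict header representation are assumptions for illustration, not the platform's actual interface.

```python
def is_newsletter(headers: dict[str, str], recipient_count: int) -> bool:
    """Flag a message as bulk mail using metadata only (no body needed)."""
    # Header names are case-insensitive, so normalize before checking.
    normalized = {k.lower(): v.strip().lower() for k, v in headers.items()}
    if "list-unsubscribe" in normalized:
        return True
    if normalized.get("precedence") in {"bulk", "list"}:
        return True
    return recipient_count > 20
```

A message with 20 recipients and no bulk headers is not flagged under this reading; the threshold is strictly greater than 20, as stated above.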
Cold: "No meaningful relationship"
- C1: You've emailed them, they never replied. Cold outreach territory.
- C2: You're on a mailing list with them, but you've never had a real exchange.
- C3: You're connected on LinkedIn, nothing else. (Not V1)
- C4: You worked together a long time ago and haven't kept in touch.
- Default: Nothing else matched, so treat as cold.
6. Combining clauses into a tier
The rule engine walks clauses top-down. The first tier with any matching clause wins. All matched clauses (including those from tiers below) are returned as reasons[], which powers the "hover to see why" UX.
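A minimal sketch of that walk, assuming a clause is a predicate over the stats row. Only clauses whose thresholds are stated in this document are modeled (W1, W4, K1, K2, K3). The Clause type, the inclusive 3-to-10 bounds on K1, and the reading of K3 as "meetings in the last 24 months that are not in the last 180 days" are assumptions for illustration.

```python
from typing import Callable, NamedTuple

class Clause(NamedTuple):
    clause_id: str
    tier: str
    matches: Callable[[dict], bool]

CLAUSES = [
    Clause("W1", "WARM", lambda s: s.get("email_inbound_count_12mo", 0) >= 5
                                   and s.get("email_outbound_count_12mo", 0) >= 5),
    Clause("W4", "WARM", lambda s: s.get("meeting_count_180d", 0) >= 1),
    Clause("K1", "KNOWN", lambda s: 3 <= s.get("email_twoway_count_total", 0) <= 10),
    Clause("K2", "KNOWN",
           lambda s: s.get("email_inbound_non_newsletter_count_24mo", 0) >= 1),
    Clause("K3", "KNOWN", lambda s: s.get("meeting_count_24mo", 0)
                                    - s.get("meeting_count_180d", 0) >= 1),
]
TIER_ORDER = ["WARM", "KNOWN", "COLD"]

def evaluate(stats: dict) -> tuple[str, list[str]]:
    """Return (tier, reasons): the highest tier with a match, plus all matches."""
    matched = [c for c in CLAUSES if c.matches(stats)]
    reasons = [c.clause_id for c in matched]  # includes lower-tier matches
    for tier in TIER_ORDER:
        if any(c.tier == tier for c in matched):
            return tier, reasons
    return "COLD", reasons  # default fallback when nothing matched

warm = evaluate({"email_outbound_count_12mo": 27, "email_inbound_count_12mo": 31,
                 "meeting_count_180d": 3})
known = evaluate({"email_twoway_count_total": 4})
cold = evaluate({"email_outbound_count_total": 0})
```

Evaluating every clause instead of stopping at the first match is what lets reasons[] carry lower-tier matches alongside the winning tier.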
Diagram (tier evaluation flow), starting once stats are loaded:
- Any Warm clause matches (W1 · W2 · W3 · W4 · W5)? Yes: Tier = WARM, score = weighted sum, reasons = matched clauses.
- Otherwise, any Known clause matches (K1 · K2 · K3 · K4 · K5 · K6)? Yes: Tier = KNOWN.
- Otherwise, any Cold-positive clause matches (C1 · C2 · C3 · C4)? Yes: Tier = COLD. No: Tier = COLD (default fallback).
- Then: has the user set an override? Yes: use the override and keep the original as audit. Either way, persist to UCC: strength_tier, strength_score, strength_computed_at.
Every matched clause is kept in reasons[] for the hover UI. See technical spec DR-04.
Within-tier ranking
Two contacts both land at Warm, but one is "barely warm" and another is "obviously the strongest connection on the team." The weighted score resolves that.
- Each matched clause contributes signal_weight × normalized_strength.
- The score is not shown to users in V1 (per product call). It's used only to sort within a tier.
- The same score drives the "key connection" tiebreaker: of all Warms on the team for this contact, the highest score wins.
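The weighted sum can be sketched as follows. The weight values below are invented purely for illustration (the real weights live in the technical spec), and normalized_strength is assumed to be a number in [0, 1] per matched clause.

```python
# Hypothetical weights; the actual values are not given in this document.
SIGNAL_WEIGHTS = {"W1": 1.0, "W4": 0.8, "K1": 0.5}

def strength_score(matched: dict[str, float]) -> float:
    """Sum signal_weight * normalized_strength over matched clauses.

    matched maps clause id -> normalized strength in [0, 1].
    """
    total = 0.0
    for clause_id, strength in matched.items():
        if not 0.0 <= strength <= 1.0:
            raise ValueError(f"strength for {clause_id} must be in [0, 1]")
        total += SIGNAL_WEIGHTS[clause_id] * strength
    return total

barely_warm = strength_score({"W4": 0.25})
clearly_warm = strength_score({"W1": 0.5, "W4": 1.0})
ranked = sorted(
    [("barely", barely_warm), ("clearly", clearly_warm)],
    key=lambda pair: pair[1],
    reverse=True,
)
```

Both contacts are Warm, but sorting by score puts the stronger relationship first, which is the only job the score has in V1.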
7. Reachability: rolling up to the team level
Once every team member has a Warm / Known / Cold (or no relationship) with a given contact, we aggregate those tiers into the team's reachability.
Diagram (reachability decision flow for this contact + collection), evaluated in order:
- High if Warm count ≥ 1 OR Known ≥ 3 OR Cold ≥ 10 (very likely warm intro).
- Else Medium if Known ≥ 1 OR Cold ≥ 5 (possible intro path).
- Else Low if Cold ≥ 1 (only weak signals).
- Else None (no one is connected).
Reachability thresholds
| Level | Qualifies if | Meaning |
|---|---|---|
| High | 1+ Warm, OR 3+ Known, OR 10+ Cold | Very likely to get a warm intro: someone knows them well, or enough people know them to find the right path. |
| Medium | 1+ Known, OR 5+ Cold | Possible intro, team has real but thinner connections. |
| Low | 1+ Cold | Only weak signals. Probably a cold reach. |
| None | No one on the team is connected | No path identified. |
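The threshold table translates directly into a small function. This is a sketch: the function name and the uppercase tier strings are assumptions, while the thresholds are the ones in the table.

```python
from collections import Counter

def reachability(tiers: list[str]) -> str:
    """Aggregate per-member tiers for one contact into a team-level label."""
    counts = Counter(tiers)  # missing tiers count as 0
    warm, known, cold = counts["WARM"], counts["KNOWN"], counts["COLD"]
    if warm >= 1 or known >= 3 or cold >= 10:
        return "HIGH"
    if known >= 1 or cold >= 5:
        return "MEDIUM"
    if cold >= 1:
        return "LOW"
    return "NONE"

team_example = reachability(["WARM", "KNOWN", "COLD"])
```

Team members with no relationship to the contact are simply absent from the input list, so an empty list yields NONE.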
Key connection = the warmest path
Independent of tier count, we surface one team member: the one with the highest tier (ties broken by score) for this contact. That's the default "ask this person for the intro" suggestion.
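Picking the key connection is a three-level sort: tier first, then score, then (per the FAQ) name alphabetically. A minimal sketch, with a hypothetical Connection record and invented scores:

```python
from typing import NamedTuple, Optional

TIER_RANK = {"WARM": 2, "KNOWN": 1, "COLD": 0}

class Connection(NamedTuple):
    member: str
    tier: str
    score: float

def key_connection(connections: list[Connection]) -> Optional[Connection]:
    """Highest tier wins; ties go to higher score, then alphabetical name."""
    if not connections:
        return None
    return min(
        connections,
        key=lambda c: (-TIER_RANK[c.tier], -c.score, c.member),
    )

team = [
    Connection("Alice", "WARM", 1.3),   # scores are illustrative only
    Connection("Carol", "KNOWN", 0.5),
    Connection("Eve", "COLD", 0.0),
]
best = key_connection(team)
tied = key_connection([Connection("Zed", "WARM", 1.0), Connection("Amy", "WARM", 1.0)])
```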
8. Worked scenarios
Three concrete examples walked end-to-end, from raw data through to the displayed indicators.
Scenario 1: Alice → Bob, active current collaborator (Warm)
Raw data observed in Alice's mailbox:
- 27 outbound emails to bob@acme.com in the last 12 months
- 31 inbound emails from bob@acme.com in the last 12 months
- 3 calendar meetings with Bob as attendee in the last 180 days
Rollup row (Alice, Bob):
email_outbound_count_12mo: 27
email_inbound_count_12mo: 31
email_has_response_in_3mo: true
meeting_count_180d: 3
Rule engine: W1 matches (27 ≥ 5 AND 31 ≥ 5). W4 also matches (3 ≥ 1). Tier = Warm.
Reasons shown to user: "5+ two-way emails last year (27 out / 31 in)" + "3 meetings in the last 6 months."
Scenario 2: Carol → Dave, old acquaintance (Known)
Raw data observed in Carol's mailbox:
- 4 back-and-forth email threads over the last 4 years
- No emails in the last 8 months
- No calendar meetings
- No work overlap data (Slice A)
Rollup row (Carol, Dave):
email_twoway_count_total: 4
email_outbound_count_12mo: 0
email_inbound_count_12mo: 0
email_inbound_non_newsletter_count_24mo: 0
meeting_count_180d: 0
Rule engine: No Warm clause matches. K1 matches (4 is between 3 and 10). Tier = Known.
Reasons shown to user: "You've had 4 two-way email exchanges over time." Carol's UI suggestion: "Ping Dave before asking for an intro; it's been a while."
Scenario 3: Eve → Frank, newsletter subscriber only (Cold)
Raw data observed in Eve's mailbox:
- 14 inbound emails from noreply@frankscoinpodcast.com with List-Unsubscribe header
- 0 outbound emails
- 0 calendar meetings
Rollup row (Eve, Frank):
email_inbound_newsletter_count_total: 14
email_inbound_non_newsletter_count_total: 0
email_outbound_count_total: 0
Rule engine: No Warm / Known clauses match. C2 matches (all-inbound-newsletter only). Tier = Cold.
Reasons shown to user: "You've only received newsletters from this contact, with no real exchange." The UI de-emphasizes Eve as a connector.
Rolling the three up into team reachability
Now imagine the team is evaluating contact "Frank" and three team members have signals:
| Team member | Tier with Frank | Why |
|---|---|---|
| Alice | Warm | Recent 2-way email + 3 meetings |
| Carol | Known | 4 threads over the years |
| Eve | Cold | Newsletter only |
Aggregated reachability:
- Warm count = 1 → meets the "1+ Warm" threshold → Reachability = High
- Key connection = Alice (highest tier; Warm beats Known beats Cold)
- Breadth badge: "3 Inovians know this person (1 Warm, 1 Known, 1 Cold)"
In the UI: the contact shows a High reachability pill, Alice is badged with a "Key connection" crown, and all three team members appear in the "Connection of" filter with their individual tiers.
9. Scale walkthrough: day 1 to steady state
What actually happens to one realistic user, then extrapolated to a 10-person team. Replaces hand-wavy capacity claims with concrete numbers.
User profile
Inovia partner with ~8 years of email history. Connects Gmail + Google Calendar. System ingests the last 24 months only (per the backfill horizon decision).
Day 1: backfill
- ~20,000 emails read from Gmail (24-month backfill)
- ~600 calendar events read over 180 days
- ~3,000 unique email correspondents; ~800 unique addresses after self-filter and newsletter flagging
- Resolver runs per event (20,600 events)
- Cache miss on ~800 addresses → ~800 Findem F1 calls
- ~650 resolve to existing Getro Contact rows (~81% hit rate)
- ~150 stored as negative-cache entries; events sit with contact_id = NULL
- Rollup materializes ~650 UserContactInteractionStats rows
- Scorer computes ~650 UCC strength tiers (~5 sec total)
- Reachability aggregates per contact for every touched pair
A typical day after backfill
- ~50 new emails since last sync
- ~2 new calendar events
- Resolver: 48 cache hits, 2 cache misses → 2 Findem calls
- Incremental rollup: 50 event upserts, ~12 stats rows touched, ~12 UCCs rescored
- Total external API cost for the day: 50 Gmail calls, 2 Findem calls
Steady state
- ~300 new events/week
- ~20 new unique correspondents/month → ~5 Findem calls/day
- Nightly rebuild processes the 36-hour rolling window and ages out windowed counters (emails crossing the 12-month boundary, meetings crossing 180d). Runs in minutes.
- Any positive-cache Findem entry hitting its 90-180d TTL is refreshed lazily on next encounter
10-person team
- Day-1 cumulative: ~8,000 Findem calls (cache is per-address globally, so overlap between team members reduces calls), ~200k email events processed, ~4,500 distinct resolved contacts covered
- Steady state: ~50-100 Findem calls/day for the team combined
- Storage at 2 years: ~1.5 GB InteractionEvent, ~50 MB stats, ~5 MB reachability
- Reachability per contact aggregates up to 10 UCC rows → SQL GROUP BY, sub-millisecond
- Per-contact recompute on strength change: 1 row insert + trigger re-aggregation for that contact's reachability row only
What this tells us
- Findem cost is front-loaded: backfill is the spend, steady state is cheap.
- Storage is modest: even a 50-user deployment over 5 years stays under ~40 GB of interaction data.
- Scoring latency is bounded: O(1) per pair. Dashboard loads read pre-computed reachability rows โ no scoring happens at request time.
- Incremental path is the hot path: after day 1, almost no external work per user per day.
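The headline numbers in this section can be re-derived with a few lines of arithmetic. This is a back-of-envelope sketch: it assumes the 1.5 GB figure is for the 10-person team and that storage grows linearly with both users and years, which the document does not state explicitly.

```python
# Per-user day-1 figures from the walkthrough above.
addresses_after_filter = 800
resolved_contacts = 650
hit_rate = resolved_contacts / addresses_after_filter

# Upper bound on day-1 Findem calls for a 10-person team
# (the global per-address cache can only lower this).
team_size = 10
findem_calls_upper_bound = team_size * addresses_after_filter

# Linear extrapolation of InteractionEvent storage (assumption).
storage_gb_10_users_2y = 1.5
storage_gb_50_users_5y = storage_gb_10_users_2y * (50 / 10) * (5 / 2)
```

Under these assumptions the 50-user, 5-year estimate comes to roughly 19 GB, comfortably inside the "under ~40 GB" claim.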
10. FAQ
Why metadata only? Why not read subjects or bodies for richer signal?
Privacy guardrail. We chose scopes (gmail.metadata, Mail.ReadBasic) that physically cannot return content, so there's no "oops, we saved something we shouldn't" risk at the code level.
Will this work for someone who connects only their calendar, not their email?
Yes. Each scope is a per-user opt-in. Calendar-only signals populate the meeting-based heuristics (W4 and K3). Email-based clauses stay inactive for that user, so their contacts land Cold unless they also have a work overlap in Slice C.
What if two team members both have the same contact at Warm? Whose name shows up as Key Connection?
The team member whose weighted score is higher (see technical spec § 5). Ties are broken alphabetically. The UI can still show both in the "Connection of" filter so the user picks.
How fresh is the data a user sees in the UI?
Within ~minutes of the last Google/Microsoft sync, because incremental rollup runs on event insert. The nightly rebuild at 00:00 UTC reconciles drift and ages out windowed counters (so "last 12 months" stays accurate as time moves forward).
If I override my Warm relationship with someone to Cold, does that affect my teammates?
No. Overrides are per-user. Your override affects your own row in UCC and lowers the team's breadth count for that contact, but other team members' tiers are untouched.
What if Findem says a person doesn't exist?
Their email goes into the negative cache for 7-30 days. Any interactions we see with that address sit unresolved (contact_id = NULL) and don't contribute to strength. If that person is later imported as a Contact through normal channels, a backfill worker attributes their historical interactions.
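The negative-cache behavior can be sketched as a small TTL map. The class and method names are hypothetical, and the 7-day TTL in the example is just one value inside the 7-30 day range quoted above.

```python
from datetime import datetime, timedelta

class NegativeCache:
    """Remember addresses the resolver could not identify, for a limited time."""

    def __init__(self, ttl: timedelta) -> None:
        self._ttl = ttl
        self._expires_at: dict[str, datetime] = {}

    def mark_missing(self, email: str, now: datetime) -> None:
        self._expires_at[email.lower()] = now + self._ttl

    def is_known_missing(self, email: str, now: datetime) -> bool:
        expires = self._expires_at.get(email.lower())
        if expires is None:
            return False
        if now >= expires:
            # Entry expired: drop it so the next encounter retries the lookup.
            del self._expires_at[email.lower()]
            return False
        return True

cache = NegativeCache(ttl=timedelta(days=7))
t0 = datetime(2026, 4, 1)
cache.mark_missing("Ghost@Example.com", t0)
skip_lookup = cache.is_known_missing("ghost@example.com", t0 + timedelta(days=3))
retry_lookup = not cache.is_known_missing("ghost@example.com", t0 + timedelta(days=8))
```

While an address is in the cache the resolver can skip the external call; once the entry expires, the next interaction triggers a fresh lookup.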
Does this also work when an Inovia admin emails a colleague on the same team?
Those interactions are filtered out before reaching the resolver: a user-to-user email shouldn't make them look like each other's "contacts." They remain team members, not contacts of each other.
Next
For implementation-level detail (tables, services, phased plan, file references, decision records), see the companion technical spec.