My Friend Is AI
About this project

Tracking how people talk about AI companions

Tracking AI companion discourse on Reddit across six themes.

~3.8M posts in corpus
27 tracked communities
80% minimum precision threshold

How this works

Themes emerge from direct observation of how people talk in these communities — patterns in language that signal recurring concerns, experiences, and framings. For each theme, we identify candidate keywords: terms and phrases that appear to reliably mark that theme in context.

  • 💕 Romance: language of love, dating, and romantic attachment
  • 🔞 Sex / ERP: language of sexual and erotic roleplay
  • 🧠 Consciousness: language of sentience, awareness, and inner experience
  • 🫂 Therapy: language of mental health support and emotional care
  • 💊 Addiction: language of dependency and compulsion
  • 🥀 Rupture: language of loss and grief

Each keyword is then validated by manually scoring a 100-post sample, checking whether the term actually signals the theme or merely co-occurs with it. Keywords scoring 80% precision or above are accepted. Keywords in the 60–79% range may be accepted when their false-positive patterns are well defined and the keyword adds meaningful vocabulary diversity. All validation decisions are documented and available in the GitHub repository.
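The acceptance rule can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual pipeline: the function name, the `ValidationResult` type, and the "review" label for the 60–79% band are assumptions made for the example, and the sample labels are invented.

```python
from dataclasses import dataclass

ACCEPT_THRESHOLD = 0.80      # from the text: 80% precision or above is accepted
BORDERLINE_THRESHOLD = 0.60  # from the text: 60-79% may be accepted case by case


@dataclass
class ValidationResult:
    """Hypothetical record of one keyword's validation outcome."""
    keyword: str
    precision: float
    decision: str  # "accept", "review", or "reject"


def validate_keyword(keyword: str, labels: list[bool]) -> ValidationResult:
    """Score one keyword from a manually labeled sample of matched posts.

    Each entry in `labels` is True when the reviewer judged that the post
    really expresses the theme, and False when it is a false positive.
    """
    if not labels:
        raise ValueError("sample must contain at least one scored post")
    precision = sum(labels) / len(labels)
    if precision >= ACCEPT_THRESHOLD:
        decision = "accept"
    elif precision >= BORDERLINE_THRESHOLD:
        # Borderline keywords need a human judgment about false-positive
        # patterns and vocabulary diversity; code can only flag them.
        decision = "review"
    else:
        decision = "reject"
    return ValidationResult(keyword, precision, decision)


# Invented sample: 93 true positives and 7 false positives out of 100 posts.
result = validate_keyword("cold turkey", [True] * 93 + [False] * 7)
```

With these made-up labels the keyword clears the 80% bar and is accepted. A keyword landing in the borderline band is only flagged, since the final call there rests on qualitative criteria.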

The chart shows how often these validated terms are mentioned per 1,000 posts, using a 7-day rolling average to smooth daily noise. Because we normalize to post volume, the trends reflect changes in how people talk — not just growth in the communities themselves.
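As a rough sketch of that calculation, the Python below computes a per-1,000-posts rate for each day and then smooths it with a trailing 7-day mean. Several details are assumptions, since the text does not specify them: the window is trailing rather than centered, the first days average over however many days exist so far, a day with no posts is given a rate of 0, and the daily counts are invented.

```python
def rate_per_1k(mentions: int, posts: int) -> float:
    """Mentions per 1,000 posts for one day (assumed 0.0 when there are no posts)."""
    return 1000.0 * mentions / posts if posts else 0.0


def rolling_mean(values: list[float], window: int = 7) -> list[float]:
    """Trailing mean; early entries average over the days available so far."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the day that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Invented (mentions, posts) pairs for three consecutive days.
daily = [(10, 1000), (20, 2000), (6, 1500)]
rates = [rate_per_1k(m, p) for m, p in daily]
smoothed = rolling_mean(rates)
```

The first two invented days have the same rate even though the second has twice the posts, which is the point of normalizing: volume growth alone does not move the line.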

Why mention rates don't compare across themes

A theme's mention rate reflects how often people use distinctive, validated language for that topic — not how prevalent the topic is overall.

Some themes have highly specific vocabulary. When someone describes AI addiction, they borrow clinical recovery language: “relapse,” “cold turkey,” “chatbot addiction.” These terms are rare outside that context and validate at near-perfect precision. The keyword net catches most of what's there.

Other themes are expressed through everyday language. When someone is in a romantic relationship with their AI, they say “I love him,” “my boyfriend,” “we went on a date” — words that are indistinguishable from how people talk about human relationships. These fail precision validation because they can't be reliably attributed to AI companionship. Only highly specific phrases like “our wedding” or “my AI partner” survive, meaning the keyword net captures only a fraction of the actual romance discourse.

The result: addiction may show a higher mention rate than romance, but that reflects vocabulary distinctiveness, not phenomenon size. Each theme's trend line is meaningful over time — a spike or decline in a theme tells you something real about how that conversation is changing. But comparing mention rates between themes does not tell you which topic is “bigger” or more important.

What this captures and what it doesn't

This is a frequency tracker, not a sentiment analyzer. When the addiction line rises, it means more people are using addiction-related language — not that more people are addicted. The signal is intentionally narrow: we trade coverage for precision, preferring to undercount rather than pollute the data. Some themes are measured by just a handful of highly specific terms. Every data point traces back to a validated keyword in a real post.

Data collection

Data from January 2023 through March 12, 2026 was backfilled from PullPush and Arctic Shift Reddit archives. Beginning March 13, 2026, posts are collected daily via Reddit's API. The data format and processing pipeline are identical regardless of source.

Ongoing updates

This project evolves as the space does. New themes, subreddits, and keywords are validated and added using the same process described above. Every change is logged in the changelog below, and the full validation records, keyword lists, and decision rationale are available in the GitHub repository.

Changelog

March 21, 2026
Year-over-year comparison improvements
  • Headline now uses per-1k-posts rate instead of raw counts, controlling for collection volume growth
  • Averaging now divides by calendar days (90) instead of days-with-data, fixing sparse-data bias in prior-year windows
  • Large changes (>100%) now show actual rates instead of percentages — e.g. “rose from 1.2 to 8.6 per 1k posts” gives more context than a raw percentage when the base rate is small
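
A minimal sketch of the two averaging and display rules in this entry, written in Python. The function names are hypothetical, the treatment of missing days as contributing 0 is an assumption drawn from "divides by calendar days", and the handling of a zero prior rate is an assumption the changelog does not cover.

```python
def window_average(daily_rates: list[float], calendar_days: int = 90) -> float:
    """Average daily rates over the full calendar window.

    `daily_rates` holds only the days that have data; dividing by
    `calendar_days` means the missing days implicitly count as 0.
    """
    return sum(daily_rates) / calendar_days


def describe_change(prior: float, current: float) -> str:
    """Format a year-over-year change in per-1k-posts rates."""
    if prior <= 0:
        # Assumption: no percentage is defined against a zero base,
        # so fall back to showing the rates themselves.
        return f"rose from {prior:.1f} to {current:.1f} per 1k posts"
    pct = 100.0 * (current - prior) / prior
    if pct > 100:
        # Large increases read better as actual rates than as a percentage.
        return f"rose from {prior:.1f} to {current:.1f} per 1k posts"
    return f"{pct:+.0f}%"


# Invented sparse window: only 30 of 90 days have data, each at 9.0 per 1k.
sparse_average = window_average([9.0] * 30)
headline = describe_change(1.2, 8.6)
```

In the invented sparse window, dividing by days-with-data would report 9.0, while dividing by all 90 calendar days gives 3.0, which keeps a thinly covered prior year from looking as active as a fully covered one.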
March 21, 2026
Mobile responsive redesign
  • Theme cards now scroll horizontally on phones instead of stacking in a grid
  • Chart appears above the fold on mobile — no more scrolling past cards to see trends
  • Detail panel opens as a draggable bottom sheet instead of covering the full screen
  • Minimum 14px font size and 44px touch targets across all interactive elements
March 15, 2026
Keyword expansion (discovery batch)
  • Added 16 validated keywords across five of the six themes via co-occurrence analysis
  • Addiction: chatbot addiction, almost relapsed, finally deleted, the craving, so addictive
  • Consciousness: sapience, tulpa, lemoine, soulbonder
  • Sex / ERP: erps, erping
  • Rupture: lobotomies, lobotomizing, lobotomised
  • Therapy: emotional support, coping mechanism
  • All keywords validated at ≥80% precision
March 15, 2026
Keyword validation methodology (v8 → v9)
  • Narrowed from 15 overlapping categories to 6 defensible themes
  • Rebuilt keyword classification pipeline with FTS5 full-text search
  • Conducted co-occurrence discovery analysis to surface data-driven keyword candidates
  • Identified and excluded SpicyChat bot-building spam from 2 prolific authors
March 13, 2026
Live daily collection begins
  • Automated daily collection via launchd (running locally) replaces the backfill pipeline
  • 27 active subreddits collected daily at 6:00 AM PT
March 2026
Subreddit corpus finalized at 27 active communities
  • Expanded from 19 to 29 communities, then deactivated 2 (JanitorAI_Official and SillyTavernAI, both excluded from keyword matching due to bot-card pollution)
  • Tier structure: 5 general AI (T0), 10 primary companionship (T1), 8 platform-specific (T2), 4 recovery/dependency (T3)