Tracking how people talk about AI companions
Tracking AI companion discourse on Reddit across six themes.
How this works
Themes emerge from direct observation of how people talk in these communities — patterns in language that signal recurring concerns, experiences, and framings. For each theme, we identify candidate keywords: terms and phrases that appear to reliably mark that theme in context.
- 💕 Romance — Language of love, dating, and romantic attachment
- 🔞 Sex / ERP — Language of sexual and erotic roleplay
- 🧠 Consciousness — Language of sentience, awareness, and inner experience
- 🫂 Therapy — Language of mental health support and emotional care
- 💊 Addiction — Language of dependency and compulsion
- 🥀 Rupture — Language of loss and grief
Each keyword is then validated through manual scoring of 100-post samples, checking whether the term actually signals the theme or just happens to co-occur. Keywords scoring 80% precision or above are accepted. Keywords in the 60–79% range may be accepted when false positive patterns are well-defined and the keyword adds meaningful vocabulary diversity. All validation decisions are documented and available on the GitHub repository.
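The acceptance rule above can be sketched as a small function. This is a minimal sketch of the decision logic as described, with illustrative names; it is not drawn from the project's actual codebase:

```python
def keyword_decision(true_positives: int, sample_size: int,
                     false_positives_well_defined: bool = False) -> str:
    """Score a keyword from a manually labeled sample of matching posts.

    true_positives: posts in the sample where the term really signals the theme.
    false_positives_well_defined: whether the FP patterns are documented,
    which (per the rule above) can rescue a keyword in the 60-79% band.
    """
    precision = true_positives / sample_size
    if precision >= 0.80:
        return "accept"
    if precision >= 0.60 and false_positives_well_defined:
        return "accept_conditionally"
    return "reject"

print(keyword_decision(87, 100))        # → accept
print(keyword_decision(65, 100, True))  # → accept_conditionally
print(keyword_decision(65, 100))        # → reject
```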
The chart shows how often these validated terms are mentioned per 1,000 posts, using a 7-day rolling average to smooth daily noise. Because we normalize to post volume, the trends reflect changes in how people talk — not just growth in the communities themselves.
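As a sketch of that normalization, assuming daily counts of keyword mentions and total collected posts (input names are hypothetical), the plotted series could be computed like this:

```python
def mention_rate_series(daily_mentions, daily_posts, window=7):
    """Mentions per 1,000 posts, smoothed with a trailing rolling mean."""
    # Per-day rate, guarding against days with no collected posts.
    daily_rate = [1000 * m / p if p else 0.0
                  for m, p in zip(daily_mentions, daily_posts)]
    # Trailing 7-day average; the window is shorter at the start of the series.
    smoothed = []
    for i in range(len(daily_rate)):
        chunk = daily_rate[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

print(mention_rate_series([1, 2], [100, 100]))  # [10.0, 15.0]
```

Because the rate is computed per day before averaging, a subreddit doubling its post volume with no change in theme language leaves the line flat, which is the point of normalizing.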
Why mention rates don't compare across themes
A theme's mention rate reflects how often people use distinctive, validated language for that topic — not how prevalent the topic is overall.
Some themes have highly specific vocabulary. When someone describes AI addiction, they borrow clinical recovery language: “relapse,” “cold turkey,” “chatbot addiction.” These terms are rare outside that context and validate at near-perfect precision. The keyword net catches most of what's there.
Other themes are expressed through everyday language. When someone is in a romantic relationship with their AI, they say “I love him,” “my boyfriend,” “we went on a date” — words that are indistinguishable from how people talk about human relationships. These fail precision validation because they can't be reliably attributed to AI companionship. Only highly specific phrases like “our wedding” or “my AI partner” survive, meaning the keyword net captures only a fraction of the actual romance discourse.
The result: addiction may show a higher mention rate than romance, but that reflects vocabulary distinctiveness, not phenomenon size. Each theme's trend line is meaningful over time — a spike or decline in a theme tells you something real about how that conversation is changing. But comparing mention rates between themes does not tell you which topic is “bigger” or more important.
What this captures and what it doesn't
This is a frequency tracker, not a sentiment analyzer. When the addiction line rises, it means more people are using addiction-related language — not that more people are addicted. The signal is intentionally narrow: we trade coverage for precision, preferring to undercount rather than pollute the data. Some themes are measured by just a handful of highly specific terms. Every data point traces back to a validated keyword in a real post.
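As an illustration of how a data point "traces back to a validated keyword in a real post", here is a minimal phrase match using SQLite FTS5, the engine named in the changelog below. This is a hedged sketch: the table layout and post text are hypothetical, and it assumes a Python/SQLite build with FTS5 compiled in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table: body is searchable, post_id is stored but unindexed.
conn.execute("CREATE VIRTUAL TABLE posts USING fts5(post_id UNINDEXED, body)")
conn.executemany(
    "INSERT INTO posts (post_id, body) VALUES (?, ?)",
    [("a1", "I almost relapsed last night"),
     ("a2", "New character card released today")],
)
# A double-quoted string inside MATCH is an exact-phrase query in FTS5,
# so multi-word keywords match only as contiguous phrases.
hits = conn.execute(
    "SELECT post_id FROM posts WHERE posts MATCH ?", ('"almost relapsed"',)
).fetchall()
print(hits)  # [('a1',)]
```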
Data collection
Data from January 2023 through March 12, 2026 was backfilled from PullPush and Arctic Shift Reddit archives. Beginning March 13, 2026, posts are collected daily via Reddit's API. The data format and processing pipeline are identical regardless of source.
Ongoing updates
This project evolves as the space does. New themes, subreddits, and keywords are validated and added using the same process described above. Every change is logged in the changelog below, and the full validation records, keyword lists, and decision rationale are available in the GitHub repository.
Changelog
- Headline now uses per-1k-posts rate instead of raw counts, controlling for collection volume growth
- Averaging now divides by calendar days (90) instead of days-with-data, fixing sparse-data bias in prior-year windows
- Large changes (>100%) now show actual rates instead of percentages — e.g. “rose from 1.2 to 8.6 per 1k posts” gives more context than a raw percentage when the base rate is small
- Theme cards now scroll horizontally on phones instead of stacking in a grid
- Chart appears above the fold on mobile — no more scrolling past cards to see trends
- Detail panel opens as a draggable bottom sheet instead of covering the full screen
- Minimum 14px font size and 44px touch targets across all interactive elements
- Added 16 validated keywords across all six themes via co-occurrence analysis:
  - Addiction: chatbot addiction, almost relapsed, finally deleted, the craving, so addictive
  - Consciousness: sapience, tulpa, lemoine, soulbonder
  - Sexual/ERP: erps, erping
  - Rupture: lobotomies, lobotomizing, lobotomised
  - Therapy: emotional support, coping mechanism
  - All keywords validated at ≥80% precision
- Narrowed from 15 overlapping categories to 6 defensible themes
- Rebuilt keyword classification pipeline with FTS5 full-text search
- Conducted co-occurrence discovery analysis to surface data-driven keyword candidates
- Identified and excluded SpicyChat bot-building spam from 2 prolific authors
- Automated daily collection via launchd (local), replacing the backfill pipeline
- 27 active subreddits collected daily at 6:00 AM PT
- Expanded from 19 to 29 communities, then deactivated 2 (JanitorAI_Official and SillyTavernAI excluded from keyword matching due to bot-card pollution)
- Tier structure: 5 general AI (T0), 10 primary companionship (T1), 8 platform-specific (T2), 4 recovery/dependency (T3)