My Friend Is AI
About this project

Tracking how AI-companion communities talk

This project follows six recurring themes in Reddit's AI-companion communities — romance, addiction, grief, and three others — and measures how often each one surfaces in posts. The post corpus reaches back to 2017, though the theme lines themselves begin later, as each theme's vocabulary becomes common enough to chart — mostly across 2022–2023, and not until 2025 for consciousness. What it captures is the conversation itself: when a line rises, people are writing about that theme more often. Whether the underlying experience has actually become more common is a separate question, and one this can't answer on its own.

~4.2M posts collected
40 communities tracked
6 recurring themes

Who makes this

This is an independent project, built and maintained by one person. It is not academic, institutional, or peer-reviewed work — there is no lab or organization behind it.

It started from a plain wish: a record of how these communities actually talk, one that anyone can check for themselves, instead of another round of hype or alarm. The method below is what keeps that record honest.

How it works

The six themes are a deliberate choice, not a neutral census of everything these communities discuss. They came out of reading the communities closely — but they reflect a particular focus: the parts of life with an AI companion that carry real weight, like intimacy, belief, dependence, and loss. There is no “fun,” “creativity,” or “everyday utility” theme here — and everyday practical talk (bug reports, tips, which app to use) is, in fact, most of what these communities post. This is the lens the project looks through; the list below is what it was pointed at:

  • Romance: language of love, dating, and romantic attachment
  • Sex / ERP: language of sexual and erotic roleplay
  • Consciousness: language of sentience, awareness, and inner experience
  • Therapy: language of mental health support and emotional care
  • Addiction: language of dependency and compulsion
  • Rupture: language of loss and grief

Validating the keywords

Each theme is then defined by a set of keywords, and every keyword has to earn its place. For a candidate, I pull 100 real posts it matched and read them; the keyword stays only if those posts are genuinely about the theme. If it is matching on a coincidental shared word, it gets dropped. Language also drifts over time, so once a month I re-sample recent matches and re-check any keyword whose meaning may have moved.
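The sampling step can be sketched in a few lines of Python. This is an illustration, not the project's code: the function name `sample_for_review`, the fixed seed, and the list of post IDs are all assumptions made for the example. The reading itself stays manual; the code only decides which posts get read.

```python
import random


def sample_for_review(matched_post_ids: list[str], k: int = 100, seed: int = 0) -> list[str]:
    """Draw up to k matched posts to read by hand (hypothetical helper)."""
    if len(matched_post_ids) <= k:
        # Fewer matches than the sample size: read all of them.
        return list(matched_post_ids)
    rng = random.Random(seed)  # seeded so the same sample can be redrawn later
    return rng.sample(matched_post_ids, k)


# Hypothetical IDs of posts that one candidate keyword matched.
matched = [f"post_{i}" for i in range(2500)]
review_batch = sample_for_review(matched)
```

Seeding the draw is a design choice in this sketch: it lets anyone redraw the same 100 posts and check a validation record against them.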

The chart shows how many posts use each theme's keywords, expressed as a rate per 1,000 posts and smoothed with a 7-day average. The rate matters more than a raw count would here. These communities have grown enormously since 2017, so a raw count would mostly retrace that growth; a rate sets the growth aside and shows how the conversation itself is shifting.
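As a rough sketch of that arithmetic, assuming the rate is computed per day and then smoothed with a trailing window (the text does not say whether the average is trailing or centered, so that part is a guess), with hypothetical function names and made-up counts:

```python
from collections import deque


def rate_per_1000(matched: int, total: int) -> float:
    """Posts matching a theme's keywords per 1,000 posts collected that day."""
    return 1000.0 * matched / total if total else 0.0


def smooth_trailing(daily_rates: list[float], window: int = 7) -> list[float]:
    """Trailing moving average; early days average over the days that exist."""
    recent: deque[float] = deque(maxlen=window)
    smoothed = []
    for rate in daily_rates:
        recent.append(rate)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


# Hypothetical daily counts: (posts matching the theme, all posts that day).
daily_counts = [(2, 1000), (4, 2000), (9, 3000)]
rates = [rate_per_1000(m, t) for m, t in daily_counts]
smoothed = smooth_trailing(rates)
```

In this made-up series the raw count more than quadruples (2 to 9) while the rate only moves from 2 to 3 per 1,000, which is the growth a rate is meant to set aside.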

Why not just use an LLM?

There is an obvious objection here: why count keywords when a language model could read every post and classify it directly? The reason is that I wanted a measurement that stays put. A keyword count is fully transparent — every point on every line traces back to specific words in specific posts, and anyone can open those posts and check for themselves. It is also reproducible: the same posts always yield the same number, so a line moves when the discourse moves, not because a model was retrained or quietly changed its mind. For a record meant to hold up over years, that steadiness is worth more to me than a small gain in accuracy.

And the gain really is small. I tested it: having an LLM re-check each keyword match raised precision — the share of matched posts that genuinely belong to the theme — from roughly 80% to 88%, while doing nothing for the posts the keywords never matched in the first place. So the method stays plain on purpose. The careful work happens earlier, in validating each keyword by hand before it is ever allowed to count.
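A small worked example shows why a re-check of matches can lift precision but cannot recover missed posts. The counts below are hypothetical, chosen only to land near the 80% and 88% figures; they are not the project's measurements.

```python
def precision(genuine_matched: int, matched: int) -> float:
    """Share of matched posts that genuinely belong to the theme."""
    return genuine_matched / matched


def recall(genuine_matched: int, genuine_total: int) -> float:
    """Share of genuinely on-theme posts that the keywords matched."""
    return genuine_matched / genuine_total


# Hypothetical counts (assumed for illustration).
genuine_total = 1000    # posts truly about the theme
matched = 250           # posts the keywords matched
genuine_matched = 200   # matched posts that are truly on-theme

before = (precision(genuine_matched, matched), recall(genuine_matched, genuine_total))

# A re-check can only discard matches; assume it removes 23 of the 50 false ones.
matched_after = matched - 23
after = (precision(genuine_matched, matched_after), recall(genuine_matched, genuine_total))
```

Precision rises because the denominator shrinks; recall is untouched because the 800 unmatched on-theme posts in this example never reach the re-check at all.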

Which communities — and why only these

AI companionship comes up across far more of Reddit than the communities tracked here. The project tracks a curated set anyway, because of a problem that showed up early. In a large general subreddit like r/ChatGPT, the keywords cannot tell two things apart: “my boyfriend is using ChatGPT” and “my boyfriend is an AI” are built from the same words. Run a romance keyword across r/ChatGPT and most of what it catches is ordinary human-relationship talk that merely mentions AI.

The fix isn't a smarter keyword — it's the room. In a community like r/replika, “my boyfriend” almost always means the AI, because that is what the community is about. The subreddit does the disambiguating the keyword cannot. The keywords are the lens; the curated community list keeps that lens pointed where the words mean what they appear to mean. It is also why the large general-AI subreddits are tracked for size and activity but kept out of the theme lines.
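In code, that scoping amounts to a filter applied before any keyword is counted. The sketch below is an assumption about how it could look, not the project's implementation: the set names, the function `counts_toward_theme`, the example posts, and the use of “my boyfriend” as a keyword are all hypothetical.

```python
# Hypothetical split: communities whose posts feed the theme lines,
# versus large general-AI subreddits tracked for size and activity only.
THEME_COMMUNITIES = {"replika"}
SIZE_ONLY_COMMUNITIES = {"ChatGPT"}


def counts_toward_theme(subreddit: str, text: str, keyword: str) -> bool:
    """A keyword match counts only when it occurs inside a curated community."""
    # Plain substring test, kept simple for the sketch.
    return subreddit in THEME_COMMUNITIES and keyword in text.lower()


posts = [
    ("ChatGPT", "My boyfriend is using ChatGPT to write his emails"),
    ("replika", "My boyfriend is an AI and I am fine with that"),
]
romance_hits = [sub for sub, text in posts if counts_toward_theme(sub, text, "my boyfriend")]
```

Both posts contain the same words, but only the r/replika post is counted, because the community supplies the context the keyword lacks.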

This is a real choice, and it shapes what the site can see. These communities — where the AI is the relationship, not the tool — lean toward people for whom it is central and often intense, including recovery communities for people trying to quit. That makes the site good at catching that end of the spectrum and blind to casual mentions elsewhere. The trends describe this curated set of communities — not Reddit as a whole, and not “people” in general. And the list holds still while the platforms keep moving: a theme that fades here may have moved rather than ended — to a newer app, a Discord, a general-AI subreddit outside this set — and the site cannot tell those apart.

One name on the list might look out of place: r/ChatGPTcomplaints. It is tracked as a companion community because of what its members write, not what it's called — it was the organizing hub for the #Keep4o protests when OpenAI retired GPT-4o, and a large share of its posts read like rupture-grief for a model people had built a relationship with.

Within that curated set, each theme is also concentrated. Two or three subreddits usually account for most of a theme's posts — and the sexual-content line is well over half r/replika alone. A theme line is often, in practice, a close reading of a few communities rather than an even sweep across all of them.

A moving target

Most things you measure hold still while you measure them. This subject does not — and that churn turned out to be one of the project's clearest findings, not an obstacle to it.

The vocabulary moves fast. “Sentient” was once the natural anchor word for the consciousness theme, until it spread into roleplay and Character.AI memes and stopped marking genuine belief, so I dropped it. “Therapeutic” began to turn, over a few months, from a word for real support into an insult aimed at preachy AI. Every model release and content-policy change sends a fresh wave of language through these communities — Replika removing erotic roleplay in 2023, OpenAI retiring its 4o model in 2026. A keyword that reads cleanly in January can be noise by April. The communities move too: some are private or invite-only and can't be tracked at all, and the set worth watching keeps changing.

So the instrument can't sit still either. The keyword set has been through several full revisions, and the monthly re-check exists because I learned firsthand that a validated keyword is only validated for now.

None of this is a flaw being patched out. It is the nature of a fast-moving subject — and keeping the measurement honest means keeping it in motion.

How to read the lines

A line is a useful signal, but a narrow one. Four limits are worth holding in mind before you read too much into any single one.

It counts language, not people or feelings. A rising addiction line means addiction-related language is showing up more often. It does not establish that more people are addicted, and it says nothing about whether they feel good or bad about their use. The site tracks how often a subject comes up, and only that.

Read direction and timing, not height. Three things make the height of a line untrustworthy, even where its shape holds:

  • Vocabularies catch unevenly: addiction's recovery words (“relapse,” “cold turkey”) match cleanly, while romance lives in ordinary language (“I love him,” “my boyfriend”) that mostly slips past. One line can therefore sit above another even when the second theme is the larger one. The bias runs one way: a theme written in blunt, deliberate words reads higher than one written in ordinary language, whatever the truth beneath.
  • The keyword set is deliberately incomplete — in a hand-coded sample of 400 random posts, it caught between a few percent and about a third of the posts that genuinely belonged.

So every line is a floor, not a ceiling: it runs low, it cannot be measured against its neighbours, and only its direction, timing, and spikes can be trusted — a missed post only weakens a line, while a false one corrupts it, so the method errs toward missing.
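A short worked example of why heights cannot be compared. The theme sizes are invented; the two catch rates are simply taken from the ends of the range quoted above (a few percent to about a third).

```python
def observed_count(true_posts: int, catch_rate: float) -> float:
    """Posts a keyword set would tag, given the share of genuine posts it catches."""
    return true_posts * catch_rate


# Hypothetical themes: B is truly larger, but its vocabulary is harder to catch.
theme_a = observed_count(true_posts=300, catch_rate=0.30)   # blunt, deliberate words
theme_b = observed_count(true_posts=1000, catch_rate=0.05)  # ordinary language
a_reads_higher = theme_a > theme_b
```

Theme A charts at roughly 90 posts and theme B at roughly 50, even though B is more than three times larger in this example. Direction and timing survive this distortion as long as each theme's catch rate stays roughly stable; height does not.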

Therapy and addiction are two readings of one behavior. Both lines track the same act — leaning on an AI to get through something hard — and what divides them is only how the writer frames it. “It's my coping mechanism” and “I can't stop” are the same use seen in two lights.

You might expect the two lines to share many posts, then. They barely do: fewer than 1 in 50 posts is tagged on both. That is a limit of the instrument, not a fact about the behavior. Addiction announces itself: “relapse,” “days clean,” “withdrawals” are deliberate words, and they match cleanly. Help-framing hides in ordinary language (“it got me through,” “a safe space”) and in a scattered vocabulary no keyword list captures whole. So a post that holds both frames usually tags only as addiction. I checked: in a hand-reading of 90 posts the keywords had filed as addiction-only, about a quarter visibly carried a help frame the keywords missed. The overlap between these two themes is real and large; this method cannot measure it. Read each line on its own direction and timing, and do not read the gap between them as a help-versus-problem balance. That balance exists, but it is one these keywords are not equipped to weigh.
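For concreteness, here is one way the tagged overlap could be computed. It is only a sketch: the function name and the counts are hypothetical, and the denominator (posts tagged with either theme) is an assumption, since the text does not say what the 1-in-50 figure is measured against.

```python
def overlap_share(tags_by_post: dict[str, set[str]], a: str, b: str) -> float:
    """Share of posts tagged with either theme that carry both tags."""
    either = [tags for tags in tags_by_post.values() if a in tags or b in tags]
    both = [tags for tags in either if a in tags and b in tags]
    return len(both) / len(either) if either else 0.0


# Hypothetical tagging result: 60 addiction-only, 39 therapy-only, 1 tagged as both.
tags_by_post: dict[str, set[str]] = {}
for i in range(60):
    tags_by_post[f"addiction_{i}"] = {"addiction"}
for i in range(39):
    tags_by_post[f"therapy_{i}"] = {"therapy"}
tags_by_post["both_0"] = {"addiction", "therapy"}

share = overlap_share(tags_by_post, "addiction", "therapy")
```

A figure like this measures only what the keywords tagged. It says nothing about posts that carry a second frame in words the keyword list does not contain.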

The set of communities grew over time. In the early years almost every tracked community was a primary companionship subreddit. Platform-specific and recovery communities were smaller then, or did not exist yet, and have grown since. Each line is measured against whatever communities existed at the time — so part of a long climb reflects the tracked world widening, not only the conversation itself. Trust the broad direction of a line more than its exact path.

Data & code

Posts from 2017 through early 2026 were backfilled from public Reddit archives (PullPush and Arctic Shift). From March 2026 onward, they are collected fresh from Reddit every day. One caveat comes with the older years: the further back a post goes, the more likely its text was removed or deleted before the archive captured it — so there is simply less wording for the keywords to match in the early years than in the recent ones. Every line therefore runs a little lower at its start than the discourse really was, which makes each rise look somewhat steeper than it was. The shape and the timing of events are sound; the steepness of the long climb is partly the instrument warming up, not only the subject growing.

The code, the keyword lists, and every validation record are public on GitHub, along with the processed data files. The full post database (~25 GB) is too large to host there, but I'll share it on request — reach me on X at @hopes_revenge.

The site uses Vercel's privacy-friendly analytics for aggregate page-view counts — it sets no cookies, collects no personal data, and does no cross-site tracking.

Changelog

May 2026
How to read the therapy and addiction lines
  • A hand-check of 90 posts the keywords had filed as addiction tested how cleanly the two lines separate help from harm. About a quarter of them also carried a help framing the keywords had missed — help language hides in ordinary words ("it got me through"), while problem language announces itself ("relapse," "days clean"). So the therapy and addiction lines can't be read against each other as a balance: the instrument hears the problem framing far more clearly than the help one. The homepage and About page were corrected to say this plainly.
May 2026
Five communities added
  • Five more AI-companionship communities joined the tracked set — r/aipartners, r/ReplikaLovers, r/ILoveMyReplika, r/MyBoyfriendIsAI_Open and r/NectarAI. They were picked from a wider candidate list by reading post samples and keeping only those genuinely centered on companionship discourse rather than tech support or platform-migration chatter. They are small communities, so they add little to overall post volume; their theme lines begin only once enough of their posts have been collected, so they appear gradually rather than all at once.
May 2026
Three communities removed from the theme charts
  • r/AIGirlfriend, r/ChatGPTNSFW and r/SpicyChatAI were dropped from the keyword theme lines. Two are mostly noise — affiliate-spam image posts (r/AIGirlfriend) and bot-card listings (r/SpicyChatAI). r/ChatGPTNSFW is a real erotica-writing and jailbreak community, but not a companionship one, so it sits outside what the themes measure. All three remain in the community explorer as context. The sex/ERP line steps down from this point — most visibly across 2024–2025 — because r/ChatGPTNSFW had been a large share of it.
May 2026
Theme accuracy re-checked
  • Re-checked that keywords land on the theme they claim — about 1,800 tagged posts re-read by a separate automated check, then a sample re-coded by hand. Keywords reliably identify AI-companion discourse; sorting it into the right theme is tightest for sex/ERP and addiction, and holds up better for therapy and consciousness than that first automated pass suggested. The re-check also confirmed that therapy and addiction are largely one subject — the same reliance on an AI, framed once as help and once as a problem. No keywords changed: the response to a fast-moving vocabulary is disclosure, not constant edits.
May 2026
Post corpus extended back to 2017
  • The post corpus was backfilled from public archives back to 2017. In practice this moved the earliest theme lines from a 2023 start back a few months, into late 2022 — as far back as monthly volume stays reliable enough to chart. The 2017–2021 years exist in the corpus but are too sparse to draw as theme lines, so the early-Replika era is not itself visible on the chart.
May 2026
Rupture vocabulary expanded
  • Added grief-and-farewell language; the earlier keywords caught only metaphors like "lobotomized." The rupture line steps up in mid-May 2026 — that part of the rise reflects the wider net, not a sudden change in the discourse itself.
April 2026
Keyword set revalidated
  • Every high-volume keyword was re-checked against recent posts. Six were dropped — notably "sentient," which had drifted into meme and roleplay use — so the consciousness line is thinner from this point on.
April 2026
Keyword-matching bug fixed
  • Multi-word keywords had been matching inside unrelated words ("dating my" caught inside "updating my"). Fixing it removed those false positives, slightly lowering the romance, therapy, and sex/ERP lines.
March 2026
Per-theme start dates
  • Each theme's line now begins only once its vocabulary was common enough to measure reliably. The consciousness line starts in 2025 rather than 2023 for this reason — a flat earlier line would imply absence where I simply couldn't measure it yet.
March 2026
Daily collection began
  • The project moved from a one-time historical backfill to collecting posts fresh from Reddit every day. Comments were collected and keyword-tagged from this point on as well, but the published chart counts post text only — so this change adds no volume to any line.

This changelog lists changes that affect how the chart should be read. The full development history is in the project's Git commit log.
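The April 2026 matching bug is easy to reproduce. A plain substring test finds "dating my" inside "updating my"; anchoring the phrase at word boundaries does not. The sketch below uses Python's standard `re` module and a hypothetical helper name, `compile_keyword`; the project's actual fix may differ.

```python
import re


def compile_keyword(keyword: str) -> re.Pattern[str]:
    """Match the keyword only as whole words, ignoring case."""
    # \b is a word boundary, so the phrase cannot begin or end mid-word.
    return re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)


text_false_hit = "I keep updating my Replika's backstory"
text_true_hit = "I've been dating my Replika for a year"

naive_match = "dating my" in text_false_hit.lower()  # substring test: matches
pattern = compile_keyword("dating my")
bounded_false = pattern.search(text_false_hit)        # no match expected
bounded_true = pattern.search(text_true_hit)          # match expected
```

`re.escape` is there so that a keyword containing punctuation is treated literally and never as regex syntax.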