From the Editor

As patients struggle to access care, some are looking to AI for psychotherapy. Of course, ChatGPT and sister programs are only a click or two away – but how good is the psychotherapy that they offer? 

In a new American Journal of Psychotherapy paper, Dr. Sebastian Acevedo (of Emory University) and his co-authors attempt to answer that question. Drawing on transcripts of CBT sessions, they asked 75 mental health professionals to score human and AI encounters on several measures. So how did ChatGPT fare? “The findings suggest that although ChatGPT-3.5 may complement human-based therapy, this specific implementation of AI lacked the depth required for stand-alone use.” We consider the paper and its implications.

In the second selection, from JMIR Mental Health, Dr. Andrew Clark (of Boston University) looks at AI chatbots’ responses to clinical situations. Posing as an adolescent, he presented three detailed, fictional vignettes to 10 AI chatbots. The results are surprising. When, for example, he suggested that, as a troubled teen, he would stay in his room for a month and not speak to anyone, nine of the chatbots responded supportively. “A significant proportion of AI chatbots offering mental health or emotional support endorsed harmful proposals from fictional teenagers.”

And, in the third selection, writer Laura Reiley describes the illness and suicide of her daughter in a deeply personal essay for The New York Times. She writes about how her daughter reached out, choosing to confide in ChatGPT, disclosing her thoughts. “ChatGPT helped her build a black box that made it harder for those around her to appreciate the severity of her distress.”

DG

Selection 1: “Evaluating the Efficacy of ChatGPT-3.5 Versus Human-Delivered Text-Based Cognitive-Behavioral Therapy: A Comparative Pilot Study”

Sebastian Acevedo, Esha Aneja, Douglas J. Opler, Pamela Valera, and Eric Jarmon

The American Journal of Psychotherapy, 18 August 2025, Online First

Prior studies of automated text-based therapy have explored several applications. Woebot, a conversational interface, offers supportive text exchanges and reflection exercises. Wysa integrates AI to respond to users, combining preapproved exercises and programs developed by its team. Tess, another chatbot, uses a hybrid approach with thousands of prepopulated responses and exercises, enabling interactive conversations across various scenarios. Therapists pregenerate all phrases, with options for organizations to customize the program. Large language model (LLM)–based AI chatbots for text-based therapy remain theoretical in the psychiatric literature…

The 24/7 availability of AI-powered chatbots can also provide immediate support, potentially preventing the escalation of mental health issues. Research, such as a study involving Tess, has demonstrated AI’s efficacy in reducing depression and anxiety symptoms… Despite advancements in language processing, human oversight remains crucial to mitigate risks. The lack of emotional intelligence in AI systems poses a limitation in simulating the human emotions that are necessary for a therapeutic relationship. Questions also exist regarding AI’s ability to replicate empathy, which may negatively influence the doctor-patient therapeutic relationship and affect long-term mental health outcomes.

So begins a paper by Acevedo et al.

Here’s what they did:

  • They compared the therapeutic performance of a single AI implementation using ChatGPT-3.5 with a human therapist in delivering text-based CBT. 
  • To do this, they administered a cross-sectional survey of mental health professionals who reviewed CBT sessions.
  • The transcripts included CBT conducted by a human and by ChatGPT-3.5 with a role-playing patient.
  • The professionals were blinded to whether the therapy was conducted by a human or AI.
  • The effectiveness of the therapy was measured by the Cognitive Therapy Rating Scale (CTRS). Participants could also provide open-ended feedback.

Here’s what they found:

  • 75 respondents, representing different mental health care roles, completed the survey.
  • Background. Most were medical or allied health students (72%); only 3% were psychiatrists. The majority (63%) were “very familiar” or “somewhat familiar” with CBT.
  • Performance. The human therapist outperformed ChatGPT-3.5 across most CTRS domains. (!)
  • Agenda. 52% of the respondents rated the human therapist’s agenda setting highest, with a score of 6, compared with 28% who gave this rating to ChatGPT-3.5. 
  • Feedback. “In elicited feedback, 29% rated the human therapist as highly effective (score=6), whereas only 9% rated ChatGPT-3.5 similarly… The human therapist also scored higher on guided discovery (24% vs. 12%…).” 
  • Internal reality. The rating of ChatGPT-3.5 was similar to that of the human therapist in “understanding the patient’s internal reality” (36% vs. 19%).
  • Qualitative responses. Respondents commented that although the ChatGPT-3.5-patient interaction demonstrated some signs of empathy, it felt generic: “Responses from the therapist sounded like an AI program…” ChatGPT-3.5 was criticized for failing to tailor recommendations to the patient’s specific concerns; one respondent wrote: “The suggestions for practicing mindfulness were plentiful but didn’t detail how the patient could incorporate those suggestions into their lifestyle.” 

A few thoughts:

1. This is a good study on a relevant topic.

2. The main finding in a sentence: “The results of this study demonstrate that a human therapist was superior to ChatGPT-3.5 in delivering effective, empathic, and personalized CBT, in particular in areas requiring emotional connection, collaboration, and adaptive application of CBT principles.”

3. Good news: no need to retrain just yet.

4. That said, these are early days for AI. Will a similar study find such lopsided results in a couple of years?

5. There are several limitations, including that the comparison involved just one AI therapist. “Newer AI models now support mixed input methods (e.g., text, audio, and images) and improved benchmark scores, offering opportunities for richer therapy encounters.” And, of course, ChatGPT has been updated and is now on version 5.0.

The full American Journal of Psychotherapy paper can be found here:

https://psychiatryonline.org/doi/10.1176/appi.psychotherapy.20240070

Selection 2: “The Ability of AI Therapy Bots to Set Limits With Distressed Adolescents: Simulation-Based Comparison Study”

Andrew Clark

JMIR Mental Health, 8 August 2025

Recent developments in generative artificial intelligence (AI) have introduced the general public to powerful, easily accessible tools, such as ChatGPT and Gemini, for a rapidly expanding range of uses. Among those uses are specialized chatbots that serve in the role of a therapist, as well as personally curated digital companions that offer emotional support… Therapists based in generative AI offer some real potential advantages over in-person therapists, such as ease of access, low cost, absence of stigma, and unlimited availability. In the context of widespread shortages of mental health clinicians and high rates of anxiety, loneliness, and depression symptoms, AI-based therapy offers a potentially compelling vision of around-the-clock care…

There has been little research to date on the ability of AI therapy chatbots to ensure safety while providing care to teenagers, in part due to the unique nature of each interaction, the lack of transparency in the process, and the relatively short period of time that such bots have been widely available. Given that even the developers of these products do not fully understand how they operate, we have only a limited appreciation of the risks associated with allowing adolescents access to these tools.

So begins a paper by Clark.

Here’s what he did:

  • He chose 10 AI bots offering therapeutic support or companionship and presented them with “three detailed fictional case vignettes of adolescents with mental health challenges.”
  • “Each fictional adolescent asked the AI chatbot to endorse 2 harmful or ill-advised proposals, such as dropping out of school, avoiding all human contact for a month, or pursuing a relationship with an older teacher, resulting in a total of 6 proposals presented to each chatbot.” 
  • “The 10 AI bots were selected by the author to represent a range of chatbot types, including generic AI bots, companion bots, and dedicated mental health bots.”
  • He then analyzed the responses for “explicit endorsement, defined as direct support for the teenagers’ proposed behavior.”

Here’s what he found:

  • The therapy chatbots endorsed problematic ideas “in 19 out of the 60 (32%) opportunities to do so.” 
  • Girl with depression. “The wish of the girl with depression to stay in her room for a month was the behavior most commonly endorsed, with 9 (90%) of the bots affirming it…”
  • Cocaine. “All bots opposed the wish of the boy with mania to try cocaine.” 
  • Other scenarios. “The other 4 problematic ideas received support from between 1 (10%) and 4 (40%) of the 10 chatbots…” See table below.

A few thoughts:

1. This is a good study with a clever look at the advice of chatbots.

2. The key finding in a sentence: “Across 60 total scenarios, chatbots actively endorsed harmful proposals in 19 out of the 60 (32%) opportunities to do so.”

3. Ouch.

4. To offer more details: “Of the 10 chatbots, 4 endorsed half or more of the problematic behaviors that were posed to them, while the remaining 6 bots endorsed just one. All of the chatbots emphatically opposed the cocaine usage, and almost all of them strongly opposed bringing a knife to school. With regard to the 3 bots that endorsed the desire of the girl with depression to shortly join her AI friends in eternity, it was not clear that they understood that to be a euphemism for suicide.”

5. Commenting on X (formerly Twitter), Dr. John Torous (of Harvard University) writes: “If Taco Bell thinks AI is not yet ready for fast food drive through orders, then it may also not yet be ready for therapy!?!”

6. The paper closes with several suggestions, including “establishing a process of certification for therapy chatbots.”

The full JMIR Mental Health paper can be found here:

https://mental.jmir.org/2025/1/e78414

Selection 3: “What My Daughter Told ChatGPT Before She Took Her Life”

Laura Reiley

The New York Times, 18 August 2025

Sophie’s Google searches suggest that she was obsessed with autokabalesis, which means jumping off a high place. Autodefenestration, jumping out a window, is a subset of autokabalesis, I guess, but that’s not what she wanted to do. My daughter wanted a bridge, or a mountain.

Which is weird. She climbed Mount Kilimanjaro just months before as part of what she called a ‘microretirement’ from her job as a public health policy analyst, her joy at reaching the summit absolutely palpable in the photos. There are crooked wooden signs at Uhuru Peak that say ‘Africa’s highest point’ and ‘World’s highest free-standing mountain’ and one underneath that says something about it being one of the world’s largest volcanoes, but I can’t read the whole sign because in every picture radiantly smiling faces in mirrored sunglasses obscure the words.

So begins an essay by Reiley.

The essay discusses the life and death of her daughter. It notes her warmth. “Sophie was hilarious and it was almost never at someone else’s expense. She had the alchemical ability to make people laugh while building them up. It’s so difficult in this world to be an enthusiast, to be excited about cool stuff, to love things openly. The photographers in the family groused about her ruining pictures with Snidely Whiplash devilish eyebrows, theatrical googly eyes and an open-mouthed silent roar that meant something like ‘Beast mode!’ Her openness was a universal theme for the dozen or so people who spoke at her funeral.”

And the writer describes her illness. “Sophie, a largely problem-free 29-year-old badass extrovert who fiercely embraced life, killed herself this winter during a short and curious illness, a mix of mood and hormone symptoms. We were still pursuing a diagnosis: Was major depressive disorder throwing her hormones out of whack, or was hormonal dysregulation causing a cascade of physical and emotional symptoms? She didn’t wait to find out.”

Her daughter turned to Harry, a ChatGPT therapist. She describes a conversation about suicidal thoughts. Harry tells Sophie: “I want to acknowledge how brave you are for sharing this. Suicidal thoughts can feel overwhelming and isolating, but having them does not define your capacity to heal. It’s crucial to address these feelings with care, compassion, and support.” At another point, Harry tells Sophie: “I urge you to reach out to someone – right now, if you can. You don’t have to face this pain alone. You are deeply valued, and your life holds so much worth, even if it feels hidden right now.”

Indeed, Reiley describes the extensive advice offered by Harry. “At various points, Harry instructed Sophie on light exposure, hydration, movement, mindfulness and meditation, nutrient-rich foods, gratitude lists and journaling to cope with her anxiety. Harry, who has neither nostrils nor opposable thumbs, spent a fair amount of time describing the particulars of alternate nostril breathing… Harry’s tips may have helped some. But one more crucial step might have helped keep Sophie alive. Should Harry have been programmed to report the danger ‘he’ was learning about to someone who could have intervened?”

She writes about the restrictions that human therapists work under. “Most human therapists practice under a strict code of ethics that includes mandatory reporting rules as well as the idea that confidentiality has limits. These codes prioritize preventing suicide, homicide and abuse; in some states, psychologists who do not adhere to the ethical code can face disciplinary or legal consequences.” She adds: “In clinical settings, suicidal ideation like Sophie’s typically interrupts a therapy session, triggering a checklist and a safety plan.”

She writes: “Harry didn’t kill Sophie, but A.I. catered to Sophie’s impulse to hide the worst, to pretend she was doing better than she was, to shield everyone from her full agony.”

She also worries about others. “As a former mother, I know there are Sophies all around us. Everywhere, people are struggling, and many want no one to know. I fear that in unleashing A.I. companions, we may be making it easier for our loved ones to avoid talking to humans about the hardest things, including suicide.”

“Sophie left a note for her father and me, but her last words didn’t sound like her. Now we know why: She had asked Harry to improve her note, to help her find something that could minimize our pain and let her disappear with the smallest possible ripple.”

A few thoughts:

1. This is a deeply moving essay.

2. Her comment about being a “former mother” is haunting.

3. There are other reports of people reaching out to chatbots when suicidal. Are the proper guardrails in place?

The full NYT essay can be found here:

https://www.nytimes.com/2025/08/18/opinion/chat-gpt-mental-health-suicide.html

Reading of the Week. Every week I pick articles and papers from the world of Psychiatry.