From the Editor

In the early days of the pandemic, patients connected with us virtually from their kitchens and bedrooms – and, yes, their closets and washrooms. But as COVID-19 fades, we may wonder: what care should be delivered virtually and what should be done in person?

In the first selection, Sara Zandieh (of McMaster University) and her co-authors examine remote versus in-person CBT in a new CMAJ study. They conducted a systematic review and meta-analysis of 54 randomized controlled trials with almost 5 500 participants, addressing both physical and mental health problems. “Moderate-certainty evidence showed little to no difference in the effectiveness of in-person and therapist-guided remote CBT across a range of mental health and somatic disorders, suggesting potential for the use of therapist-guided remote CBT to facilitate greater access to evidence-based care.” We consider the paper and its implications.

In the second selection, Dr. Tae Woo Park (of the University of Pittsburgh) and his co-authors explore opioid use disorder (OUD) treatment. In their JAMA research letter, they compared medication and psychosocial treatments for OUD across the United States, surveying more than 17 000 facilities and analyzing the availability of evidence-based interventions like buprenorphine and contingency management. “Substance use treatment facilities reported significant gaps in provision of effective treatments for OUD.”

And in the third selection, from CNBC, Dr. Scott Gottlieb and Shani Benezra (both of the American Enterprise Institute) describe their experiment: they tasked five large language models with answering questions from the USMLE Step 3. The average resident score is 75%; four of the five AI programs surpassed that benchmark. “[These models] may offer a level of precision and consistency that human providers, constrained by fatigue and error, might sometimes struggle to match, and open the way to a future where treatment portals can be powered by machines, rather than doctors.”

There will be no Reading next week.

DG

Selection 1: “Therapist-guided remote versus in-person cognitive behavioural therapy: a systematic review and meta-analysis of randomized controlled trials”

Sara Zandieh, Seyedeh Maryam Abdollahzadeh, Behnam Sadeghirad, et al.

CMAJ, 28 March 2024

Cognitive behavioural therapy (CBT) is a form of psychotherapy that focuses on the identification and modification of unhelpful thoughts and behaviour patterns and has been shown to be effective for a wide range of mental health and somatic disorders. For example, a 2022 systematic review found moderate-certainty evidence that CBT delivered with physiotherapy probably resulted in large improvements in pain relief and physical functioning for patients with chronic low back pain, compared with physiotherapy alone. In 2022, more than 5 million Canadians (18.3%) met diagnostic criteria for a mood, anxiety, or substance use disorder, and 1 in 5 adults live with chronic pain. In 2019, the World Health Organization advised that access to CBT was essential for evidence-based health care; however, treatment access is an important barrier to care…

In Canada, CBT may be provided within existing government-funded health care services (e.g., hospital settings) and by private providers… In an effort to increase access, the government of Saskatchewan began providing funding for Internet-based CBT in 2015, as did the Ontario Ministry of Health… starting in 2020; however, the relative effectiveness of in-person and remote CBT is uncertain.

A previous systematic review addressed this question, searching the literature up to February 2017, and found that Internet-based CBT may be similarly effective to in-person CBT, but suggested that effectiveness could differ by the clinical condition being targeted.

So begins a paper by Zandieh et al.

Here’s what they did:

  • They conducted a systematic review and meta-analysis, adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting checklist and following Grading of Recommendations Assessment, Development and Evaluation (GRADE) guidance.
  • They searched various databases, including MEDLINE.
  • They included “RCTs that enrolled adult patients (aged ≥ 18 yr) who were seeking treatment for any clinical condition, randomized to receive either therapist-guided remote CBT (e.g., teleconference, videoconference) or in-person CBT.”
  • They excluded studies in which CBT was provided without therapist guidance.
  • Reviewers working independently extracted data and assessed risk of bias.
  • They performed several statistical analyses, including “random-effects model meta-analyses to pool patient-important primary outcomes across eligible RCTs as standardized mean differences (SMDs).”
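For readers curious about what pooling results “as standardized mean differences” with a random-effects model involves, here is a minimal Python sketch. It assumes the common DerSimonian-Laird estimator and Hedges’ g as the SMD; the paper’s exact statistical choices aren’t described here, and the three trials below are hypothetical, not data from the review. An SMD near zero (like the reported −0.02) means little to no difference between remote and in-person CBT.

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (Hedges' g) and its variance for one trial."""
    # Pooled standard deviation across the two arms
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    # Small-sample correction factor J
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d

def dersimonian_laird(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model."""
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, floored at zero
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2

# Hypothetical trials: (remote mean, SD, n, in-person mean, SD, n) on a symptom scale
trials = [
    (12.0, 5.0, 40, 12.3, 5.2, 42),
    (20.1, 8.0, 60, 19.5, 7.6, 58),
    (9.8, 4.1, 35, 10.2, 4.4, 33),
]
results = [hedges_g(*t) for t in trials]
pooled, ci, tau2 = dersimonian_laird([g for g, _ in results], [v for _, v in results])
print(f"Pooled SMD {pooled:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f}), tau^2 = {tau2:.3f}")
```

The random-effects model adds the between-study variance (tau²) to each trial’s own variance, which makes sense when trials differ in condition, format, and population, as they do here.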

Here’s what they found:

  • They analyzed 54 RCTs with a total of 5 463 patients.
  • Demographics. 61.4% of participants were female; the mean age was 43 years.
  • Therapy. “19 studies (35%) randomized patients to group therapy, whereas 32 (59%) provided individual therapy; 2 studies (4%) did not specify how CBT was provided and 1 RCT (2%) administered both group and individual therapy. Types of remote CBT included interactive voice response technology, computerized CBT, telehealth and telephone-based CBT, videoconference, and Internet-delivered CBT.”
  • Diagnoses. Seventeen studies focused on treatment of anxiety and related disorders, 14 on depressive symptoms, 7 on insomnia, 6 on chronic pain or fatigue syndromes, 5 on body image or eating disorders, 3 on tinnitus, 1 on alcohol use disorder, and 1 on mood and anxiety disorders.
  • Effectiveness. “Moderate-certainty evidence showed little to no difference in the effectiveness of therapist-guided remote and in-person CBT on primary outcomes (SMD −0.02…).”
  • Risk of bias. “Patients and health care providers were unblinded in all RCTs and no study was at high risk of bias for deviation from the intended intervention…” Some studies had problems; 9%, for instance, were at high risk of bias for their randomization process.

A few thoughts:

1. This is a good study, with practical implications, published in a respected journal.

2. The main finding in a sentence: “Moderate-certainty evidence showed little to no difference in the effectiveness of in-person and therapist-guided remote CBT across a range of mental health and somatic disorders…”

3. In the post-COVID era, this paper adds nicely to the literature. It also helps guide practice, suggesting that CBT can be delivered either remotely or in person, and that other factors (like patient preference) can guide the choice. The authors remind us: “Remote CBT imposes fewer demands on patients’ time as travel for face-to-face sessions is unnecessary.”

4. There are also implications for public policy. “Only 2 Canadian provinces (Saskatchewan and Ontario) currently provide funding for remote CBT. Access to psychotherapy is an important barrier for many people in Canada, particularly those living in remote or rural areas, including military veterans and Indigenous populations, both of which are at high risk for chronic pain and mental health disorders.”

5. Like all studies, this one has limitations. The authors note several, including that the study covers “a wide range of clinical conditions,” and that some conditions were represented by a small number of RCTs (in the case of alcohol use disorder, just one RCT).

6. Past Readings have considered care in the post-pandemic world. In a 2023 Reading, for example, I summarized and commented on a BJP systematic review and meta-analysis focused on telepsychiatry and 11 mental disorders. “Telepsychiatry achieved a symptom improvement effect for various psychiatric disorders similar to that of face-to-face treatment. However, some superiorities/inferiorities were seen across a few specific psychiatric disorders, suggesting that its efficacy may vary according to disease type.” That Reading can be found here:

https://davidgratzer.com/reading-of-the-week/reading-of-the-week-telepsych-vs-in-person-treatment-the-new-bjp-paper-also-rethinking-palliative-care-in-psychiatry-and-kemp-on-his-depression/

The full CMAJ paper can be found here:

https://www.cmaj.ca/content/196/10/E327

Selection 2: “Treatment Approaches for Opioid Use Disorder Offered in US Substance Use Treatment Facilities”

Tae Woo Park, Bryant Shuey, Jane Liebschutz, et al.

JAMA, 11 July 2024

Medication and psychosocial treatments are the primary treatment approaches for substance use disorders (SUDs). For opioid use disorder (OUD), medications for OUD (MOUD) can reduce opioid-related harms. But aside from contingency management, there is little evidence that psychosocial treatments alone or combined with MOUD affect OUD treatment outcomes. Federally certified opioid treatment programs (OTPs) are the only outpatient facilities licensed to treat OUD with methadone, yet access to OTPs is limited and OUD is more commonly treated outside of OTPs.

So begins a research letter by Park et al.

Here’s what they did:

  • They conducted a cross-sectional study of specialty outpatient SUD treatment facilities that treat OUD, using the 2022 National Substance Use and Mental Health Services Survey – “an annual, multimodal (internet, mail, and telephone) survey of representatives of all SUD and mental health treatment facilities in the US.”
  • They inquired about treatments provided: “MOUD, psychosocial treatments for OUD, treatment of other SUDs, and any mental health treatment.”

Here’s what they found:

  • A total of 17 353 representatives from SUD facilities were surveyed; the response rate was 91%. (!) OUD treatment was offered at 12 060 outpatient facilities.
  • MOUD treatment. 99.4% of OTPs offered MOUD; 55.1% of non-OTPs offered it.
  • Psychosocial treatment. 35.2% of facilities offered psychosocial treatments only. “Among facilities that offered MOUD, compared with non-OTPs, a higher proportion of OTPs offered any MOUD plus contingency management treatment for OUD (53.3% vs 41.6%…), and a lower proportion of OTPs offered any MOUD, any psychosocial treatment, other SUD treatment, and mental health services (16.7% vs 33.5%…).”
  • Types of MOUD. “Overall, 62.2% of facilities offered at least 1 MOUD type, 45.6% offered at least 2 MOUD types, and 2.0% offered all 4 MOUD types.”
  • Meds in OTPs vs non-OTPs. “Compared with OTPs, a lower proportion of non-OTPs offered sublingual and injectable buprenorphine (49.7% vs 83.4%… and 22.4% vs 26.4%…) and offered injectable naltrexone at similar rates (42.1% vs 41.0%…).” In terms of psychosocial interventions: “Contingency management was more frequently offered at OTPs than non-OTPs (54.1% vs 40.2%…).”

A few thoughts:

1. This research letter offers solid data on the state of treatment.

2. The key findings in three sentences: “More than one-third of facilities did not offer MOUD and less than half offered multiple MOUD types, limiting MOUD treatment options for patients and clinicians. Contingency management, the psychosocial treatment with the most evidence of effectiveness for OUD, was offered at less than half of facilities. Less than a fifth of facilities offered MOUD, contingency management, mental health services, and treatment of other SUDs combined.”

3. Ouch.

4. Despite all the attention paid to the opioid crisis, many facilities don’t provide basic care.

5. Needless to say, the authors make several recommendations, including better regulations.

6. Study limitations include the “reliance on facility self-report” – which raises an important question: is the patient experience even narrower than these results suggest?

The full JAMA research letter can be found here:

https://jamanetwork.com/journals/jama/article-abstract/2820976

Selection 3: “How well can AI chatbots mimic doctors in a treatment setting? We put 5 to the test”

Scott Gottlieb and Shani Benezra

CNBC, 18 July 2024

Many consumers and medical providers are turning to chatbots, powered by large language models, to answer medical questions and inform treatment choices. We decided to see whether there were major differences between the leading platforms when it came to their clinical aptitude.

To secure a medical license in the United States, aspiring doctors must successfully navigate three stages of the U.S. Medical Licensing Examination (USMLE), with the third and final installment widely regarded as the most challenging. It requires candidates to answer about 60% of the questions correctly, and historically, the average passing score hovered around 75%.

When we subjected the major large language models (LLMs) to the same Step 3 examination, their performance was markedly superior, achieving scores that significantly outpaced many doctors.

So begins an essay by Gottlieb and Benezra.

They remind us of the examination’s purpose. “It assesses a new doctor’s ability to manage patient care across a broad range of medical disciplines and includes both multiple-choice questions and computer-based case simulations.”

They describe the experiment. “We isolated 50 questions from the 2023 USMLE Step 3 sample test to evaluate the clinical proficiency of five different leading large language models, feeding the same set of questions to each of these platforms – ChatGPT, Claude, Google Gemini, Grok and Llama.” They note the significance: “it’s the first time these five leading platforms have been compared in a head-to-head evaluation.”

The results: ChatGPT-4o (OpenAI), 98%; Claude 3.5 (Anthropic), 90%; Gemini Advanced (Google), 86%; Grok (xAI), 84%; and HuggingChat (Llama), 66%.
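The percentages are easier to picture as raw counts. Assuming each of the 50 questions counted equally (the op-ed doesn’t describe its scoring), this short Python sketch converts each reported score into questions answered correctly and checks it against the 75% average the authors cite:

```python
# Scores as reported in the op-ed; equal weighting of the 50 questions is an assumption
QUESTIONS = 50
BENCHMARK = 0.75  # average score cited by the authors

scores = {
    "ChatGPT-4o": 0.98,
    "Claude 3.5": 0.90,
    "Gemini Advanced": 0.86,
    "Grok": 0.84,
    "HuggingChat (Llama)": 0.66,
}

for model, score in scores.items():
    correct = round(score * QUESTIONS)  # convert percentage to a question count
    verdict = "above" if score > BENCHMARK else "below"
    print(f"{model}: {correct}/{QUESTIONS} correct, {verdict} the 75% benchmark")

above = [m for m, s in scores.items() if s > BENCHMARK]
print(f"{len(above)} of {len(scores)} models exceeded the benchmark")
```

On 50 questions, the gap between first place (98%) and last place (66%) works out to 16 questions.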

They note differences:

  • “OpenAI’s ChatGPT-4o emerged as the top performer… employing language reminiscent of a medical professional. It not only delivered answers with extensive reasoning, but also contextualized its decision-making process, explaining why alternative answers were less suitable.”
  • “Claude, from Anthropic, came in second with a score of 90%. It provided more human-like responses with simpler language and a bullet-point structure that might be more approachable to patients.”
  • “Gemini, which scored 86%, gave answers that weren’t as thorough as ChatGPT or Claude, making its reasoning harder to decipher…” That said, “its answers were succinct and straightforward.”
  • “Grok, the chatbot from Elon Musk’s xAI, scored a respectable 84% but didn’t provide descriptive reasoning during our analysis, making it hard to understand how it arrived at its answers.”
  • “While HuggingChat – an open-source website built from Meta’s Llama – scored the lowest at 66%…” Still, they note “good reasoning” and “concise responses and links to sources.”

They make a broader observation: “These models weren’t designed for medical reasoning; they’re products of the consumer technology sector, crafted to perform tasks like language translation and content generation. Despite their non-medical origins, they’ve shown a surprising aptitude for clinical reasoning.”

They focus on a question about a 20-year-old patient with a sexually transmitted disease. “ChatGPT correctly determined that the patient should be scheduled for HIV serology testing in three months, but the model went further, recommending a follow-up examination in one week to ensure that the patient’s symptoms had resolved and that the antibiotics covered his strain of infection.” They add: “To us, the response highlighted the model’s capacity for broader reasoning, expanding beyond the binary choices presented by the exam.”

A few thoughts:

1. This is an interesting op-ed.

2. The key finding in a sentence: four of the five programs did better than the average physician score on the exam. Wow.

3. And, as they note, the AI programs weren’t tailored for medical problem solving. They add: “Google recently introduced Med-Gemini… that’s fine-tuned for medical applications and equipped with web-based searching capabilities to enhance clinical reasoning.” What will programs be able to do in the future?

4. While the authors are subject experts (and Dr. Gottlieb is a former commissioner of the FDA), these findings weren’t published in a peer-reviewed journal.

5. How will AI and ChatGPT change care? I discussed that with Dr. John Torous (of Harvard University) in a Quick Takes podcast, summarized in a past Reading, which you can find here:

https://davidgratzer.com/reading-of-the-week/reading-of-the-week-antidepressants-discontinuation-symptoms-the-new-lancet-psych-study-also-neuromodulation-and-digital-health-technology/

(Spoiler alert: Dr. Torous doesn’t think you should re-train just yet.)

The full CNBC op-ed can be found here:

https://www.cnbc.com/2024/07/18/op-ed-how-well-can-ai-chatbots-mimic-doctors.html

Reading of the Week. Every week I pick articles and papers from the world of Psychiatry.