News blog

DSM field trials inflame debate over psychiatric testing

As the latest revision of a key psychiatric manual nears completion, field trials of its diagnoses have prompted changes to several controversial proposals and raised questions about how such trials should be conducted.

If it follows in the footsteps of its predecessors, the next edition of the influential Diagnostic and Statistical Manual of Mental Disorders (DSM) will guide clinical diagnoses, insurance reimbursements and research agendas. But the process of generating the fifth edition (DSM-5) has stumbled over controversies regarding everything from the openness of the revision process (see ‘Psychiatry manual revisions spark row’) to potential conflicts of interest among the revisers (see ‘Industry ties remain rife on panels for psychiatry manual’).

Field trials of several proposed changes, the results of which were published last week by the American Journal of Psychiatry, were designed to test the likelihood that two clinicians would arrive at the same diagnoses using the proposed criteria. This parameter is called ‘reliability’ and is measured by a statistical value called ‘kappa’ (see ‘Diagnostics tome comes under fire’ for more detail).
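For readers who want to see how a kappa value is arrived at, here is a minimal sketch in Python. It assumes the simplest two-rater form, Cohen's kappa, which compares the observed agreement between two clinicians with the agreement expected by chance; the field trials may have used a different estimator. The function name `cohen_kappa` and the ten-patient data set are hypothetical and purely illustrative, not trial data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each labelled the same patients.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of patients")

    n = len(rater_a)

    # Fraction of patients on whom the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's own label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    return (observed - expected) / (1 - expected)


# Hypothetical example: 1 = diagnosis given, 0 = diagnosis not given.
first_interview = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
second_interview = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

kappa = cohen_kappa(first_interview, second_interview)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the two clinicians agree on 8 of 10 patients, but half of that agreement would be expected by chance alone, so kappa comes out well below the raw 80% agreement figure.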

Some of the new diagnoses performed well. David Kupfer, chair of the DSM-5 task force and a professor of psychiatry at the University of Pittsburgh School of Medicine in Pennsylvania, highlights post-traumatic stress disorder and autism spectrum disorder as two diagnoses that were significantly changed in the new proposals, yet performed well in field trials by the task force’s standards. Autism spectrum disorder, in particular, has been controversial because patient advocates worry the new diagnosis will exclude some patients previously classified as autistic. But the field trials found that reliability of the diagnosis was high, and its prevalence was virtually unchanged between DSM-5 and the previous edition of the manual.

Some controversial diagnoses did not perform as well. At the annual meeting of the American Psychiatric Association in Philadelphia, Pennsylvania, in May, researchers announced that results from a small trial had prompted the deletion of ‘psychosis risk syndrome’ (see ‘Psychosis risk syndrome excluded from DSM-5’). In addition, poor performance of ‘mixed anxiety-depressive disorder’ may result in its exclusion, says Kupfer, who stresses that final decisions have not yet been made. And the newly proposed ‘mild neurocognitive disorder’ (intended to catch patients likely to develop neurocognitive disorders such as Alzheimer’s disease at an earlier, and hopefully more treatable, stage) performed unevenly, doing well at one field-trial site but poorly at another. In that case, Kupfer says the committee may add the recommendation that additional assessments, such as tests of memory and decision-making, be performed when mild neurocognitive disorder is suspected.

Meanwhile, the fate of paediatric ‘disruptive mood dysregulation disorder’ is also uncertain. Although the diagnosis could serve as a welcome alternative to paediatric bipolar disorder, a diagnosis that many fear is being overused, poor performance in the field tests may relegate it to a special section of the manual reserved for diagnoses that are more experimental and “not quite ready for prime time”, Kupfer says.

These proposed changes are unlikely to satisfy critics, however — particularly those who are sceptical about how the tests themselves were performed. The DSM-5 committees employed a novel trial design and earlier this year began to prepare the psychiatric community for kappa values that are much lower than what psychiatrists are accustomed to seeing. Kupfer says that the tests were designed to more accurately reflect what might happen in the real world. For example, rather than having two clinicians interview the patients simultaneously — a scenario often employed in field tests of the previous editions of the manual, but one that allows one clinician to influence the other — patients underwent two separate interviews, sometimes weeks apart.

But because of the unusual structure of the field tests, psychiatrists lack a frame of reference for evaluating the results. Psychiatrists typically consider kappas below 0.4 to be poor, yet the DSM-5 task force has argued that under the new testing rubric, such values should be considered acceptable. In January, the task force backed up its argument with three examples of published trials from other areas of medicine. But in May, psychiatrist Robert Spitzer, architect of the revolutionary DSM-III, noted that one of the publications the task force listed itself described a kappa of 0.39–0.43 as “poor”.

Regardless, many in the field continue to compare results from the DSM-5 trials directly to those achieved in DSM-IV trials, and find the new test results lacking. One solution would have been to evaluate the DSM-IV criteria alongside the new proposals, says Stephen Strakowski, a psychiatrist at the University of Cincinnati in Ohio. But that would have required two additional interviews, counter Kupfer and Helena Kraemer, a statistician at Stanford University in California who helped design the field tests, and would have added significantly to the burden on patients and clinicians participating in the studies.

Others believe that the DSM-5 committee’s attempt to mimic the real world was unnecessary, and limited the utility of the tests. Thomas Widiger, a psychiatrist at the University of Kentucky in Lexington, argues that the field tests should have required the use of a ‘structured interview’, in which clinicians are asked to use a given set of questions. “We already know that without a structured interview, psychiatric diagnoses can be very poor,” he says. “There’s no useful information here.”

Although clinicians rarely use structured interviews in practice, Widiger says that the point of the field trials is to look at performance of the diagnoses in a more controlled setting. Then, he says, clinicians would at least know that the diagnoses are solid if they use structured interviews.

“I don’t think many people in the field are accepting this,” says Widiger. “In the end, we’re stuck with no information about these disorders.”


    Bernard Carroll said:

    The very point of the DSM-5 project is to have diagnostic criteria that can be used reliably across the country, if not around the world. So when we see wide variations reported between centers, even for classic conditions like bipolar I disorder, we know something isn’t right. For that condition, the kappa obtained at the Mayo Clinic site was ~0.75 whereas at the San Antonio site it was ~0.27. You can’t gloss over that problem. Something isn’t right. They need to go back to the drawing boards.

    Chuck Ruby said:

    Agreed that a .40 kappa is terrible reliability. But the more important issue is the validity of those diagnoses, which addresses whether or not those invented, voted on, removed, and experimental categories really refer to actual disease processes (they don’t).
