How to read a paper: Papers that report diagnostic or screening tests
This is the seventh in a series of 10 articles introducing non-experts to finding medical articles and assessing their value
Unit for Evidence-Based Practice and Policy, Department of Primary Care and Population Sciences, University College London Medical School/Royal Free Hospital School of Medicine, Whittington Hospital, London N19 5NF
Trisha Greenhalgh, senior lecturer
p.greenhalgh@ucl.ac.uk
Ten men in the dock
If you are new to the concept of validating diagnostic tests, the following example may help you. Ten men are awaiting trial for murder. Only three of them actually committed a murder; the seven others are innocent of any crime. A jury hears each case and finds six of the men guilty of murder. Two of the convicted are true murderers. Four men are wrongly imprisoned. One murderer walks free. The three remaining innocent men are rightly acquitted.

This information can be expressed in what is known as a two by two table (table 1). Note that the “truth” (whether or not the men really committed a murder) is expressed along the horizontal title row, whereas the jury’s verdict (which may or may not reflect the truth) is expressed down the vertical column.

These figures, if they are typical, reflect several features of this particular jury:
• the jury correctly identifies two in every three true murderers;
• it correctly acquits three out of every seven innocent people;
• if this jury has found a person guilty, there is still only a one in three chance that they are actually a murderer;
• if this jury found a person innocent, he or she has a three in four chance of actually being innocent; and
• in five cases out of every 10 the jury gets it right.

These five features constitute, respectively, the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of this jury’s performance. The rest of this article considers these five features applied to diagnostic (or screening) tests when compared with a “true” diagnosis or gold standard. A sixth feature, the likelihood ratio, is introduced at the end of the article.

Summary points
• New tests should be validated by comparison against an established gold standard in an appropriate spectrum of subjects
• Diagnostic tests are seldom 100% accurate (false positives and false negatives will occur)
• A test is valid if it detects most people with the target disorder (high sensitivity) and excludes most people without the disorder (high specificity), and if a positive test usually indicates that the disorder is present (high positive predictive value)
• The best measure of the usefulness of a test is probably the likelihood ratio: how much more likely a positive test is to be found in someone with, as opposed to without, the disorder
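The arithmetic behind the jury example can be checked with a short calculation. The sketch below is illustrative only: the function names (two_by_two_measures, positive_likelihood_ratio) and the variable names are hypothetical and do not come from the article. It treats a guilty verdict as a "positive test" and true murderer status as the gold standard, and it assumes the usual formula for the likelihood ratio of a positive result, sensitivity divided by (1 minus specificity), which the article itself only introduces later.

```python
from fractions import Fraction


def two_by_two_measures(tp, fp, fn, tn):
    """Return the five summary measures for a two by two table.

    tp: true positives  (murderers found guilty)
    fp: false positives (innocent men found guilty)
    fn: false negatives (murderers acquitted)
    tn: true negatives  (innocent men acquitted)
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": Fraction(tp, tp + fn),
        "specificity": Fraction(tn, tn + fp),
        "positive predictive value": Fraction(tp, tp + fp),
        "negative predictive value": Fraction(tn, tn + fn),
        "accuracy": Fraction(tp + tn, total),
    }


def positive_likelihood_ratio(sensitivity, specificity):
    """Assumed standard formula: sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)


# Counts taken from the jury example: 2 rightly convicted, 4 wrongly
# convicted, 1 murderer walking free, 3 innocent men rightly acquitted.
jury = two_by_two_measures(tp=2, fp=4, fn=1, tn=3)

for name, value in jury.items():
    print(f"{name}: {value} ({float(value):.2f})")

jury_lr = positive_likelihood_ratio(jury["sensitivity"], jury["specificity"])
print(f"likelihood ratio of a guilty verdict: {jury_lr} ({float(jury_lr):.2f})")
```

Exact fractions are used so that the output can be compared directly with the "two in every three" and "three in four" statements in the text. A likelihood ratio close to 1, as here, would suggest that a guilty verdict from this jury says little about whether the accused is actually a murderer.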
Validating tests against a gold standard
Our window cleaner told me that he had been feeling thirsty recently and had asked his general practitioner to test him for diabetes, which runs in his family. The nurse in his surgery had asked him to produce a urine specimen and dipped a stick in it. The stick stayed green, which meant, apparently, that there was no sugar in his urine. This, the nurse had said, meant that he did not have diabetes. I had trouble explaining that the result did not necessarily mean this, any more than a guilty verdict necessarily makes someone a murderer. The definition of diabetes, according to the World Health Organisation, is a blood glucose level above 8 mmol/l in the fasting state, or above 11 mmol/l two hours after a 100 g oral glucose load, on one occasion if the patient has symptoms and on two occasions if he or she does not.1 These stringent criteria can be termed
BMJ VOLUME 315 30 AUGUST 1997
Education and debate
Table 1 Two by two table showing outcome of trial for 10 men accused of murder
True criminal status Jury verdict Guilty Innocent Murderer Rightly convicted (2 men)...