Joann G Elmore, professor of medicine1, Raymond L Barnhill, professor of pathology2, David E Elder, professor of pathology and laboratory medicine3, Gary M Longton, senior statistical analyst4, Margaret S Pepe, professor of biostatistics4, Lisa M Reisch, research consultant1, Patricia A Carney, professor of family medicine5, Linda J Titus, professor of community and family medicine6, Heidi D Nelson, research professor78, Tracy Onega, associate professor of epidemiology910, Anna N A Tosteson, professor of community and family medicine11, Martin A Weinstock, professor of dermatology1213, Stevan R Knezevich, dermatopathologist14, Michael W Piepkorn, clinical professor of medicine (dermatology)
Objective To quantify the accuracy and reproducibility of pathologists’ diagnoses of melanocytic skin lesions.
Design Observer accuracy and reproducibility study.
Setting 10 US states.
Participants Skin biopsy cases (n=240), grouped into sets of 36 or 48. Pathologists from 10 US states were randomized to independently interpret the same set on two occasions (phases 1 and 2), at least eight months apart.
Main outcome measures Pathologists’ interpretations were condensed into five classes: I (eg, nevus or mild atypia); II (eg, moderate atypia); III (eg, severe atypia or melanoma in situ); IV (eg, pathologic stage T1a (pT1a) early invasive melanoma); and V (eg, ≥pT1b invasive melanoma). Reproducibility was assessed by intraobserver and interobserver concordance rates, and accuracy by concordance with three reference diagnoses.
Results In phase 1, 187 pathologists completed 8976 independent case interpretations resulting in an average of 10 (SD 4) different diagnostic terms applied to each case. Among pathologists interpreting the same cases in both phases, when pathologists diagnosed a case as class I or class V during phase 1, they gave the same diagnosis in phase 2 for the majority of cases (class I 76.7%; class V 82.6%). However, the intraobserver reproducibility was lower for cases interpreted as class II (35.2%), class III (59.5%), and class IV (63.2%). Average interobserver concordance rates were lower, but with similar trends. Accuracy using a consensus diagnosis of experienced pathologists as reference varied by class: I, 92% (95% confidence interval 90% to 94%); II, 25% (22% to 28%); III, 40% (37% to 44%); IV, 43% (39% to 46%); and V, 72% (69% to 75%). It is estimated that at a population level, 82.8% (81.0% to 84.5%) of melanocytic skin biopsy diagnoses would have their diagnosis verified if reviewed by a consensus reference panel of experienced pathologists, with 8.0% (6.2% to 9.9%) of cases overinterpreted by the initial pathologist and 9.2% (8.8% to 9.6%) underinterpreted.
Conclusion Diagnoses spanning moderately dysplastic nevi to early stage invasive melanoma were neither reproducible nor accurate in this large study of pathologists in the USA. Efforts to improve clinical practice should include using a standardized classification system, acknowledging uncertainty in pathology reports, and developing tools such as molecular markers to support pathologists’ visual assessments.