ChestView

Evaluation of radiologists’ performance compared to a deep learning algorithm for the detection of thoracic abnormalities on chest X-ray

Bennani et al.

ESTI 2022

Published In

ESTI 2022 (2022)

Authors

Souhail Bennani, Nor-Eddine Regnard, Louis Lassalle, Toan Nguyen, Cécile Malandrin, Hasmik Koulakian, Philippe Khafagy, Guillaume Chassagnon, Marie-Pierre Revel.

Abstract

Purpose:
To compare the performance of radiologists with that of an artificial intelligence (AI) algorithm for the detection of thoracic abnormalities on chest X-ray (CXR).

Material and methods:
Thoracic images of patients who underwent CXR and had a thoracic CT scan within 72 hours at Cochin University Hospital were collected over a 10-year period (2010-2020). A senior radiologist specialized in thoracic imaging annotated the CXR images for 5 main abnormalities (pneumothorax, pleural effusion, mediastino-hilar mass, nodule, and alveolar pattern), using the corresponding CT scan as the reference standard. The same chest radiologist classified each abnormality into one of two categories: detectable on CXR or detectable only on CT. Twelve readers (4 chest radiologists, 4 general radiologists, 4 radiology residents) read half of the dataset, blinded to the chest radiologist’s annotations, CT findings, and AI algorithm (ChestView, Gleamer) results. Their readings were compared with those of the AI algorithm, which had previously been trained on 89,229 X-ray images, validated on 3,687, and tested on 3,722.

Results:
The study included 500 exams, of which 267 presented at least one abnormality seen on CT, whereas 233 CXRs showed no abnormality. The AI had a sensitivity of 71.43% for detecting visible pneumothoraces, whereas chest radiologists, general radiologists, and radiology residents had sensitivities of 70.68%, 38.86%, and 37.73%, respectively. For visible pleural effusions, the sensitivity was 86.75%, 71.12%, 65.87%, and 64.02% for the AI, chest radiologists, general radiologists, and radiology residents, respectively. The sensitivity for visible mediastino-hilar masses was 50%, 48.02%, 40.44%, and 32.75% for the AI, chest radiologists, general radiologists, and radiology residents, respectively. Alveolar syndromes were detected with a sensitivity of 73.13% by the AI, 59.23% by the chest radiologists, 54.97% by the general radiologists, and 38.35% by the radiology residents. For the detection of lung nodules, the sensitivity was 41.91%, 39.33%, 29.73%, and 26.44% for the AI, chest radiologists, general radiologists, and radiology residents, respectively. The specificity of the AI algorithm was equivalent to that of the chest radiologists for all 5 abnormalities.
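To make the comparison easier to scan, the reported sensitivities can be tabulated and the gap between the AI and each reader group computed. The Python sketch below is illustrative only: the values are copied from the Results above, while the names `SENSITIVITY` and `gaps_vs_ai` are hypothetical and not part of the study. The abstract reports no confidence intervals or significance tests, so the gaps are descriptive differences in percentage points.

```python
# Sensitivities (%) for CXR-visible abnormalities, copied from the Results.
# The dictionary layout and all names here are illustrative assumptions.
SENSITIVITY = {
    "pneumothorax": {
        "AI": 71.43, "chest radiologists": 70.68,
        "general radiologists": 38.86, "radiology residents": 37.73,
    },
    "pleural effusion": {
        "AI": 86.75, "chest radiologists": 71.12,
        "general radiologists": 65.87, "radiology residents": 64.02,
    },
    "mediastino-hilar mass": {
        "AI": 50.00, "chest radiologists": 48.02,
        "general radiologists": 40.44, "radiology residents": 32.75,
    },
    "alveolar syndrome": {
        "AI": 73.13, "chest radiologists": 59.23,
        "general radiologists": 54.97, "radiology residents": 38.35,
    },
    "nodule": {
        "AI": 41.91, "chest radiologists": 39.33,
        "general radiologists": 29.73, "radiology residents": 26.44,
    },
}


def gaps_vs_ai(table):
    """Return AI-minus-reader sensitivity gaps, in percentage points."""
    gaps = {}
    for abnormality, row in table.items():
        ai = row["AI"]
        gaps[abnormality] = {
            reader: round(ai - value, 2)
            for reader, value in row.items()
            if reader != "AI"
        }
    return gaps


if __name__ == "__main__":
    for abnormality, row in gaps_vs_ai(SENSITIVITY).items():
        print(abnormality, row)
```

Every gap is positive, which matches the reported finding that the AI was more sensitive than each reader group for all five abnormalities; the margin over chest radiologists is smallest for pneumothorax and largest for pleural effusion.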

Conclusion:
These preliminary results show that the AI algorithm has higher sensitivity than all reader groups and specificity equivalent to that of expert chest radiologists for detecting CXR abnormalities, and thus has the potential to decrease diagnostic errors.