Published in: Skeletal Radiology (May 2022)
Authors: Daichi Hayashi, Andrew J. Kompel, Jeanne Ventre, Alexis Ducarouge, Toan Nguyen, Nor-Eddine Regnard & Ali Guermazi
Objective
We aimed to perform an external validation of an existing commercial AI software program (BoneView™) for the detection of acute appendicular fractures in pediatric patients.
Materials and methods
In this retrospective study, we analyzed anonymized radiographic exams of the extremities, with or without fractures, from pediatric patients (aged 2–21 years). Three hundred exams (150 with fractures and 150 without) were included, comprising 60 exams per body part (hand/wrist, elbow/upper arm, shoulder/clavicle, foot/ankle, leg/knee). The Ground Truth was defined by experienced radiologists. A deep learning algorithm interpreted the radiographs for fracture detection, its diagnostic performance was compared against the Ground Truth, and receiver operating characteristic analysis was performed. Statistical analyses included sensitivity per patient (the proportion of patients for whom all fractures were identified), sensitivity per fracture (the proportion of fractures identified by the AI among all fractures), specificity per patient, and false-positive rate per patient.
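The four metrics defined above can be illustrated with a minimal Python sketch. It is not the study's analysis code: the `Exam` record, its field names, and the toy data are hypothetical. It also assumes that specificity per patient means the share of fracture-free patients with no AI detection, and that the false-positive rate per patient is the mean number of false-positive detections per fracture-free patient; the abstract does not spell out either definition.

```python
from dataclasses import dataclass


@dataclass
class Exam:
    # Hypothetical per-patient record; field names are illustrative only.
    n_true_fractures: int      # fractures in the ground truth
    n_detected_fractures: int  # ground-truth fractures flagged by the AI
    n_false_positives: int     # AI detections matching no ground-truth fracture


def summarize(exams):
    """Return the four per-patient / per-fracture metrics as a dict."""
    with_fx = [e for e in exams if e.n_true_fractures > 0]
    without_fx = [e for e in exams if e.n_true_fractures == 0]

    # Sensitivity per patient: all fractures of the patient were identified.
    sens_patient = sum(
        e.n_detected_fractures == e.n_true_fractures for e in with_fx
    ) / len(with_fx)

    # Sensitivity per fracture: identified fractures among all fractures.
    sens_fracture = sum(e.n_detected_fractures for e in with_fx) / sum(
        e.n_true_fractures for e in with_fx
    )

    # Specificity per patient (assumed definition): no detection at all
    # in a patient without fracture.
    spec_patient = sum(e.n_false_positives == 0 for e in without_fx) / len(without_fx)

    # False-positive rate per patient (assumed definition): mean number of
    # false-positive detections among patients without fracture.
    fp_rate = sum(e.n_false_positives for e in without_fx) / len(without_fx)

    return {
        "sens_patient": sens_patient,
        "sens_fracture": sens_fracture,
        "spec_patient": spec_patient,
        "fp_rate": fp_rate,
    }


# Toy data, invented for illustration (not study data).
toy = [
    Exam(2, 2, 0), Exam(1, 0, 0), Exam(3, 2, 1), Exam(1, 1, 0),
    Exam(0, 0, 0), Exam(0, 0, 2), Exam(0, 0, 0), Exam(0, 0, 0),
]
metrics = summarize(toy)
print(metrics)
```

In the toy data, the patient with three fractures of which two were detected counts as a miss for sensitivity per patient but still contributes two hits to sensitivity per fracture, which is why the two figures can differ.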
Results
There were 167 boys and 133 girls with a mean age of 10.8 years. For all fractures, sensitivity per patient (average [95% confidence interval]) was 91.3% [85.6, 95.3], specificity per patient was 90.0% [84.0, 94.3], sensitivity per fracture was 92.5% [87.0, 96.2], and false-positive rate per patient in patients who had no fracture was 0.11. The patient-wise area under the curve was 0.93 for all fractures. AI diagnostic performance was consistently high across all anatomical locations and different types of fractures except for avulsion fractures (sensitivity per fracture 72.7% [39.0, 94.0]).
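As a rough sketch of how confidence intervals like those above can be obtained, the Python snippet below computes an exact binomial (Clopper-Pearson) interval using only the standard library. Two assumptions are made here that the abstract does not state: that the intervals are exact binomial intervals, and that the underlying counts are 137/150 for sensitivity per patient and 135/150 for specificity per patient, back-calculated from the reported percentages and the 150 exams in each group. The function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_decreasing(f, target, iters=60):
    """Bisection on [0, 1] for f(p) = target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes in n trials."""
    # Lower bound solves P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else _root_of_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2
    )
    # Upper bound solves P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else _root_of_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2
    )
    return lower, upper


# Counts are assumptions back-calculated from the reported percentages.
sens_lo, sens_hi = clopper_pearson(137, 150)  # sensitivity per patient
spec_lo, spec_hi = clopper_pearson(135, 150)  # specificity per patient
print(f"sensitivity 95% CI: [{100 * sens_lo:.1f}, {100 * sens_hi:.1f}]")
print(f"specificity 95% CI: [{100 * spec_lo:.1f}, {100 * spec_hi:.1f}]")
```

If both assumptions hold, the printed bounds should land close to the reported intervals; a different interval method (for example Wilson) would give slightly narrower bounds.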
Conclusion
The BoneView™ deep learning algorithm provides high overall diagnostic performance for appendicular fracture detection in pediatric patients.