Can Performance of an AI Algorithm Be Improved on a Specific Ethnicity After Focused Training?
Clinical Relevance Statement:
Training machine learning algorithms requires ethnically diverse datasets to limit ethnic bias. Fine-tuning an algorithm on a small, population-specific dataset via transfer learning may offer a viable way to reduce algorithmic bias.
Purpose:
Screening mammography remains the only screening method shown to significantly reduce breast cancer mortality. Accuracy in reading mammograms can vary between experts, resulting in inconsistent interpretations. Multiple artificial intelligence (AI) algorithms are being developed to assist in cancer detection, but they may carry training bias because of limited ethnic diversity in their training data. We hypothesize that AI algorithms can be optimized for a specific patient population by using fine-tuning strategies based on transfer learning.
Materials and Methods:
A test set of 100 randomly selected 2D full-field digital mammography (FFDM) mammograms of Arab women was obtained between July 2019 and December 2020. It was used to test a novel baseline AI algorithm trained on a large multi-ethnic dataset, yielding a Global Prediction Score of Malignancy for each mammogram. The layers of the deep learning algorithm were then fine-tuned to three different extents (all layers, some layers, and few layers) using an additional 500 randomly selected mammograms of Arab women obtained in a similar manner and period. Each of the three fine-tuned algorithms was then re-tested on the test set, and its Global Prediction Scores were compared with those of the baseline algorithm. Results were evaluated using the sign test, the Wilcoxon test, and percent change in error, and by visualizing absolute error across cases.
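The paired comparison described above can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not define how absolute error was computed, so the sketch assumes it is the absolute difference between a prediction score in [0, 1] and a binary ground-truth label. The function names and the toy scores are hypothetical. The Wilcoxon test is omitted for brevity; a library routine such as `scipy.stats.wilcoxon` could be applied to the same paired errors.

```python
from math import comb


def absolute_errors(scores, labels):
    """Per-case absolute error between a score in [0, 1] and a 0/1 label (assumed definition)."""
    return [abs(s - y) for s, y in zip(scores, labels)]


def sign_test_p(baseline_err, tuned_err):
    """Exact two-sided sign test on paired errors; tied cases are dropped."""
    wins = sum(t < b for b, t in zip(baseline_err, tuned_err))    # fine-tuned model closer to the label
    losses = sum(t > b for b, t in zip(baseline_err, tuned_err))  # baseline closer to the label
    n = wins + losses
    if n == 0:
        return wins, losses, 1.0
    k = min(wins, losses)
    # Probability of a split at least this lopsided under a fair coin, doubled for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins, losses, min(1.0, 2 * tail)


def percent_change_in_error(baseline_err, tuned_err):
    """Relative change in mean absolute error; a negative value means lower error after fine-tuning."""
    base = sum(baseline_err) / len(baseline_err)
    tuned = sum(tuned_err) / len(tuned_err)
    return 100.0 * (tuned - base) / base


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    labels = [1, 0, 1, 0, 1, 0]
    baseline_scores = [0.6, 0.4, 0.5, 0.3, 0.7, 0.2]
    tuned_scores = [0.8, 0.3, 0.4, 0.3, 0.9, 0.1]

    base_err = absolute_errors(baseline_scores, labels)
    tuned_err = absolute_errors(tuned_scores, labels)
    wins, losses, p = sign_test_p(base_err, tuned_err)
    print(f"fine-tuned better in {wins} cases, worse in {losses}, sign-test p = {p:.3f}")
    print(f"change in mean absolute error: {percent_change_in_error(base_err, tuned_err):.1f}%")
```

Working on paired per-case errors keeps the comparison within each mammogram, which is why a sign test or Wilcoxon test suits this design better than an unpaired comparison of mean scores.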
Results:
Absolute error was computed for every test case. Compared with the baseline, the three fine-tuned algorithms (all layers, some layers, and few layers) classified cases better 40%, 31%, and 32% of the time, respectively. The corresponding reductions in error rate were 5.2%, 14.6%, and 11.2%.
Conclusion:
Fine-tuning the AI algorithm using transfer learning and a small sample of Arab patients showed a trend toward improved case classification and reduced error when tested on other Arab patients. Fine-tuning the layers of a deep learning algorithm through transfer learning may therefore improve its performance for a specific patient population. This concept should be explored further to develop ethnically unbiased AI algorithms for medical imaging applications.