A classification model for predicting diabetic retinopathy based on patient characteristics and biochemical measures

Evangelia Kotsiliti, Bashir Al-Diri, Andrew Hunter

Abstract


 

Purpose: In the United Kingdom (UK), the NHS Diabetic Eye Screening Program offers an annual eye examination to all people with diabetes aged 12 or over, with the aim of detecting early those at high risk of visual loss due to diabetic retinopathy. The purpose of this study was to design a model that predicts which patients are at risk of developing retinopathy, using patient characteristics and clinical measures.

Methods: We investigated data from 2011 to 2016 from the population-based Diabetic Eye Screening Program in East Anglia. The data comprised retinal eye screening results, patient characteristics, and routine biochemical measures: HbA1c, blood pressure, albumin-to-creatinine ratio (ACR), estimated glomerular filtration rate (eGFR), serum creatinine, cholesterol and body mass index (BMI). Individuals were classified according to the presence or absence of retinopathy as indicated by their retinal eye examinations. Four models were built and cross-validated for their predictive ability: lasso regression, random forest, gradient boosting machine and regularized gradient boosting.
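The cross-validation step can be illustrated with a minimal sketch. It assumes stratified folds, which the abstract does not state, so that each held-out fold keeps roughly the same share of retinopathy cases as the whole dataset; this matters when the positive class is small. The function name `stratified_folds`, the seed and the toy labels are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, class by class,
    so every fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin over the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Toy labels only (not the study data): 1 = retinopathy, 0 = no retinopathy.
toy_labels = [1] * 40 + [0] * 560
folds = stratified_folds(toy_labels, k=10)

for fold_id, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(toy_labels)) if i not in held_out]
    # A model would be fitted on train_idx and scored on test_idx here.
    positives = sum(toy_labels[i] for i in test_idx)
    print(fold_id, len(train_idx), len(test_idx), positives)
```

Each of the four model types would be fitted on the training indices and scored on the held-out indices of every fold, giving ten scores per model.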

 

Results: A total of 6,375 subjects with recorded information for all available biochemical measures were identified from the cohorts. Of these, 5,969 individuals had no signs of diabetic retinopathy. Of the remaining 406 individuals with signs of diabetic retinopathy, 352 had background diabetic retinopathy and 54 had referable diabetic retinopathy. The highest 10-fold cross-validated area under the curve (AUC), 0.73 ± 0.03, was achieved by the gradient boosting machine, and the minimum set of variables required to yield this performance comprised four variables: duration of diabetes, HbA1c, ACR and age. A subsequent analysis of the predictive power of the biochemical measures showed that model performance improved when HbA1c and ACR measurements were available for longer time periods. When HbA1c and ACR measurements for the 5-year period prior to the event of study were available, the cross-validated AUC of the gradient boosting machine was 0.77 ± 0.04, compared with 0.68 ± 0.04 when only measurements for the 1-year period were available. Similarly, an increase from 0.70 ± 0.02 to 0.75 ± 0.04 was observed with random forest. The dataset with the 1-year measurements comprised 4,857 subjects, of whom 4,572 had no retinopathy and the remaining 285 had signs of retinopathy. The dataset with the 5-year measurements comprised 757 subjects, of whom 696 had no retinopathy and 51 had signs of retinopathy.
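For readers unfamiliar with the metric, the sketch below shows one way a cross-validated AUC of the form "mean ± spread" could be computed. It uses the rank interpretation of AUC: the probability that a randomly chosen patient with retinopathy receives a higher score than a randomly chosen patient without it, with ties counted as one half. Treating the "±" term as the sample standard deviation across folds is an assumption, since the abstract does not say which spread measure was used. The function names and all numbers in the example are hypothetical toy values, not study results.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(fold_aucs):
    """Mean and sample standard deviation of the per-fold AUC values
    (assumed here to be what a 'mean ± spread' figure reports)."""
    return mean(fold_aucs), stdev(fold_aucs)


# Toy example: three of the four positive/negative pairs are ordered correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)

# Hypothetical per-fold AUCs, for illustration only.
toy_mean, toy_sd = summarize([0.70, 0.75, 0.80])
print(f"AUC = {toy_auc:.2f}; cross-validated AUC = {toy_mean:.2f} ± {toy_sd:.2f}")
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; a rank-based implementation would be preferable on cohorts of several thousand subjects.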

Conclusions: Patient information and routine biochemical measures can be used to identify patients at risk of developing retinopathy. Effective differentiation between patients with and without retinopathy could significantly reduce the number of screening visits without compromising patients’ health.


Keywords


area under the ROC curve (AUC), classification, gradient boosting, lasso, prevalence and risk of diabetic retinopathy, random forest, retina screening



Copyright (c) 2017 Journal for Modeling in Ophthalmology