Development and Comparison of Three Data Models for Predicting Diabetes Mellitus Using Risk Factors in a Nigerian Population

dc.contributor.author: Odukoya, O.
dc.contributor.author: Nwaneri, S.
dc.contributor.author: Odeniyi, I.
dc.contributor.author: Akodu, B.
dc.contributor.author: Oluwole, E.
dc.contributor.author: Olorunfemi, G.
dc.contributor.author: Popoola, O.
dc.contributor.author: Osuntoki, A.
dc.date.accessioned: 2023-06-13T07:39:10Z
dc.date.available: 2023-06-13T07:39:10Z
dc.date.issued: 2022-01
dc.description: Scholarly article
dc.description.abstract: Objectives: This study developed and compared the performance of three widely used predictive models, namely logistic regression (LR), artificial neural network (ANN), and decision tree (DT), to predict diabetes mellitus using the socio-demographic, lifestyle, and physical attributes of a population of Nigerians. Methods: We developed three predictive models using 10 input variables. Data preprocessing steps included the removal of missing values and outliers, min-max normalization, and feature extraction using principal component analysis. Model training and validation were performed using 10-fold cross-validation. Accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the receiver operating characteristic curve (AUROC) were used as performance evaluation metrics. Analysis and model development were performed in R version 3.6.1. Results: The mean age of the participants was 50.52 ± 16.14 years. The classification accuracy, sensitivity, specificity, PPV, and NPV for LR were, respectively, 81.31%, 84.32%, 77.24%, 72.75%, and 82.49%. Those for ANN were 98.64%, 98.37%, 99.00%, 98.61%, and 98.83%, and those for DT were 99.05%, 99.76%, 98.08%, 98.77%, and 99.82%, respectively. The best-performing and poorest-performing classifiers were DT and LR, with 99.05% and 81.31% accuracy, respectively. Similarly, the DT algorithm achieved the best AUROC value (0.992) compared to ANN (0.976) and LR (0.892). Conclusions: Our study demonstrated that DT, LR, and ANN models can be used effectively for the prediction of diabetes mellitus in the Nigerian population based on certain risk factors. An overall comparative analysis of the models showed that the DT model performed better than LR and ANN.
dc.description.sponsorship: Fogarty International Center of the National Institutes of Health (Award No. D43TW010134)
dc.identifier.citation: Odukoya, O., Nwaneri, S., Odeniyi, I., Akodu, B., Oluwole, E., Olorunfemi, G., Popoola, O., & Osuntoki, A. (2022). Development and Comparison of Three Data Models for Predicting Diabetes Mellitus Using Risk Factors in a Nigerian Population. Healthcare Informatics Research, 28(1), 58-67.
dc.identifier.issn: 2093-369X
dc.identifier.uri: https://ir.unilag.edu.ng/handle/123456789/12491
dc.language.iso: en
dc.publisher: Korean Society of Medical Informatics
dc.relation.ispartofseries: Healthcare Informatics Research; 28(1)
dc.title: Development and Comparison of Three Data Models for Predicting Diabetes Mellitus Using Risk Factors in a Nigerian Population
dc.type: Article
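
The abstract names min-max normalization, 10-fold cross-validation, and five confusion-matrix metrics without showing how they are computed. The following is a minimal Python sketch of those three steps, offered only as an illustration: the study itself was carried out in R version 3.6.1, and the function names (`min_max_normalize`, `k_fold_indices`, `classification_metrics`) and the example counts are hypothetical, not taken from the article or its data.

```python
from typing import Dict, List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; return zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous folds.

    Returns one (train_indices, validation_indices) pair per fold. Each sample
    appears in exactly one validation fold. Shuffling the rows beforehand is
    left to the caller (an assumption of this sketch).
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, validation))
        start += size
    return folds


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the five metrics from binary confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not the study's results.
    for name, value in classification_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.4f}")
```

In a cross-validated evaluation, metrics like these would typically be computed on each validation fold and then averaged; whether the authors averaged per fold or pooled predictions is not stated in the abstract.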
Files
Original bundle
Name: hir-2022-28-1-58-Models for Predicting Diabetes.pdf
Size: 1.03 MB
Format: Adobe Portable Document Format