Figure 1: Proposed Methodology
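S.No | Author (year) | Method (accuracy, %)
1 | Sarwar et al. (2018) | KNN (77), SVM (77), LR (74), RF (71)
2 | Sonar and K. JayaMalini (2019) | DT (85), NB (77), SVM (77.3)
3 | Soni et al. (2020) | RF (77)
4 | Daanouni et al. (2020) | ANN (87.5), DT (82.5)
5 | L. J. Muhammad et al. (2020) | GB (88.76), RF (88.76)
6 | Xu and Wang (2019) | RF (93), XG (93)
Abbreviations: KNN = k-nearest neighbors; SVM = support vector machine; LR = logistic regression; RF = random forest; DT = decision tree; NB = naive Bayes; ANN = artificial neural network; GB = gradient boosting; XG = XGBoost.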
Table 1: Literature Review
Attribute | Description
Diabetes | Outcome label: Yes (60 patients), No (330 patients)
Waist/Hip ratio | Waist circumference divided by hip circumference; may be a more significant heart-disease risk factor than BMI
Waist, Hip | Circumferences, in inches
Diastolic BP | Diastolic blood pressure (the lower number of a blood-pressure reading)
Systolic BP | Systolic blood pressure (the upper number of a blood-pressure reading)
BMI | Body-mass index: 703 × weight (lb) / [height (in)]²
Weight | In pounds
Height | In inches
Gender | 162 males, 228 females
Age | All participants are adult African Americans
Chol/HDL | Ratio of total cholesterol to HDL ("good") cholesterol; a value below 5 is desirable
HDL | High-density lipoprotein ("good") cholesterol
Glucose | Fasting blood sugar
Cholesterol | Total cholesterol
Patient number | Patient identifier
Table 2: Dataset Description
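Three attributes in Table 2 are derived from other columns. The sketch below computes them as the table describes, using the standard imperial BMI formula (703 × weight in lb divided by height in inches squared). It is an illustration, not the authors' preprocessing code; the function names and the sample patient values are hypothetical.

```python
def bmi(weight_lb: float, height_in: float) -> float:
    """Body-mass index from imperial units: 703 * weight (lb) / height (in) squared."""
    return 703.0 * weight_lb / height_in ** 2


def waist_hip_ratio(waist_in: float, hip_in: float) -> float:
    """Waist circumference divided by hip circumference, both in inches."""
    return waist_in / hip_in


def chol_hdl_ratio(total_chol: float, hdl: float) -> float:
    """Total cholesterol divided by HDL ("good") cholesterol, same units for both."""
    return total_chol / hdl


def chol_hdl_desirable(total_chol: float, hdl: float) -> bool:
    """True when the ratio is below 5, the desirable range stated in Table 2."""
    return chol_hdl_ratio(total_chol, hdl) < 5


# Hypothetical patient: 180 lb, 70 in tall, waist 34 in, hip 40 in,
# total cholesterol 200 and HDL 50 (ratio is unitless).
print(round(bmi(180, 70), 1))                                # 25.8
print(waist_hip_ratio(34, 40))                               # 0.85
print(chol_hdl_ratio(200, 50), chol_hdl_desirable(200, 50))  # 4.0 True
```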
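Measure | Formula
ROC | Plot of true positive rate, TPR = TP/(TP+FN), against false positive rate, FPR = FP/(FP+TN), across decision thresholds; shows the trade-off between the two
F1-Score | F1 = 2*(P*R)/(P+R)
Recall (R) | R = TP/(TP+FN)
Precision (P) | P = TP/(TP+FP)
Accuracy (A) | A = (TP+TN)/(TP+TN+FP+FN)
TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives.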
Table 3: Accuracy Measures
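The formulas in Table 3 can be checked with a few lines of Python. This is a sketch, not the authors' code: the `Confusion` class and function names are hypothetical, the 0/0 cases are set to 0.0 (a common convention the paper does not specify), and the area under the ROC curve (AUC) is computed with the rank formulation, i.e. the probability that a random positive scores higher than a random negative. The worked example uses the class counts from Table 2 (60 "Yes", 330 "No"): a model that always predicts "No" reaches about 84.6% accuracy while its recall is zero, one reason to read accuracy alongside precision, recall and F1.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of true/false positives and negatives for one classifier."""
    tp: int
    tn: int
    fp: int
    fn: int


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)


def precision(c: Confusion) -> float:
    # Convention: 0.0 when the model makes no positive predictions.
    return c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0


def recall(c: Confusion) -> float:
    # Convention: 0.0 when there are no actual positives.
    return c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0


def f1(c: Confusion) -> float:
    p, r = precision(c), recall(c)
    return 2 * (p * r) / (p + r) if p + r else 0.0


def roc_auc(labels, scores) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Always predicting "No" on the Table 2 class counts (60 Yes, 330 No).
always_no = Confusion(tp=0, tn=330, fp=0, fn=60)
print(round(accuracy(always_no), 3))                            # 0.846
print(precision(always_no), recall(always_no), f1(always_no))  # 0.0 0.0 0.0
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))              # 0.75
```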
Algorithm | Without parameter tuning | Randomized Search CV | Grid Search CV
Decision Tree | 92.3% | 94% | 92.3%
Random Forest | 94% | 93.2% | 93.2%
Logistic Regression | 94.9% | 94.9% | 94.9%
SVM | 94% | 93.2% | 94.7%
AdaBoost | 90% | 94% | 92.3%
LightGBM | 94% | 91.4% | 92.3%
Gradient Boosting | 91.5% | 90.6% | 91.5%
CatBoost | 94.9% | 95.7% | 94%
Table 4: Accuracy Table
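Table 4 compares default settings with two hyperparameter searches. The column names suggest scikit-learn's `GridSearchCV`, which evaluates every combination in a parameter grid, and `RandomizedSearchCV`, which evaluates a fixed number (`n_iter`) of sampled combinations; the paper does not list its search spaces. The dependency-free sketch below illustrates only the difference in search strategy. `SPACE` is a hypothetical decision-tree grid and `cv_score` is a hypothetical stand-in for mean cross-validated accuracy, not a trained model.

```python
import itertools
import random

# Hypothetical decision-tree search space (not taken from the paper).
SPACE = {
    "max_depth": [2, 4, 6, 8, None],
    "min_samples_split": [2, 5, 10],
    "criterion": ["gini", "entropy"],
}


def cv_score(params: dict) -> float:
    """Hypothetical stand-in for mean k-fold cross-validated accuracy.

    A real search would fit the model on k-1 folds and score the held-out
    fold for each split; this toy function simply peaks at depth 6.
    """
    depth = params["max_depth"] if params["max_depth"] is not None else 10
    penalty = abs(depth - 6) * 0.01 + params["min_samples_split"] * 0.001
    bonus = 0.005 if params["criterion"] == "entropy" else 0.0
    return 0.95 - penalty + bonus


def all_combinations(space: dict) -> list:
    """Cartesian product of every parameter's candidate values."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in itertools.product(*space.values())]


def grid_search(space: dict):
    """Exhaustive search: score every combination and keep the best."""
    candidates = all_combinations(space)
    best = max(candidates, key=cv_score)
    return best, cv_score(best), len(candidates)


def randomized_search(space: dict, n_iter: int, seed: int = 0):
    """Sampled search: score only n_iter distinct combinations drawn at random."""
    rng = random.Random(seed)
    candidates = rng.sample(all_combinations(space), n_iter)
    best = max(candidates, key=cv_score)
    return best, cv_score(best), len(candidates)


grid_best, grid_acc, grid_evals = grid_search(SPACE)
rand_best, rand_acc, rand_evals = randomized_search(SPACE, n_iter=10)
# Grid: depth 6, min_samples_split 2, entropy, after 30 evaluations.
print(grid_best, round(grid_acc, 3), grid_evals)
print(rand_best, round(rand_acc, 3), rand_evals)
```

With 30 grid points, grid search costs 30 evaluations while randomized search with `n_iter=10` costs 10 and may miss the best point. Neither search is guaranteed to beat the defaults on held-out data; in Table 4, for example, Random Forest reaches 94% without tuning and 93.2% with either search.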
Algorithm | PIMA Indian dataset | Proposed model
Decision Tree | 71.2% | 94%
Random Forest | 77.48% | 94%
Logistic Regression | 74.89% | 94.9%
SVM | 74.09% | 94.9%
AdaBoost | 75.32% | 94%
LightGBM | 75% | 94%
Gradient Boosting | 75.75% | 91.5%
CatBoost | 75.32% | 95.7%
Table 5: Comparison Between PIMA Indian Dataset [11] and Proposed Model
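Table 5 can be summarized as a per-algorithm gain in percentage points. The sketch below copies the table's values into a dictionary (algorithm names normalized) and subtracts; `TABLE_5` and `gains` are hypothetical names. Since the two columns appear to come from different datasets (the PIMA Indian dataset [11] and the dataset described in Table 2), the gap reflects the data as well as the modelling pipeline.

```python
# Accuracy in percent, copied from Table 5: (PIMA Indian dataset, proposed model).
TABLE_5 = {
    "Decision Tree": (71.2, 94.0),
    "Random Forest": (77.48, 94.0),
    "Logistic Regression": (74.89, 94.9),
    "SVM": (74.09, 94.9),
    "AdaBoost": (75.32, 94.0),
    "LightGBM": (75.0, 94.0),
    "Gradient Boosting": (75.75, 91.5),
    "CatBoost": (75.32, 95.7),
}


def gains(table: dict) -> dict:
    """Proposed-model accuracy minus PIMA accuracy, in percentage points (2 dp)."""
    return {name: round(new - old, 2) for name, (old, new) in table.items()}


diff = gains(TABLE_5)
# Print from largest to smallest gain.
for name, points in sorted(diff.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s} +{points:.2f}")
```

The largest gap is for Decision Tree (22.80 points) and the smallest for Gradient Boosting (15.75 points).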
Figure 2: Correlation Between Features
Figure 3: Accuracy Table
Figure 4: ROC Curve
Figure 5: CatBoost with Randomized Search CV