Type 2 diabetes and susceptibility to COVID-19: a machine learning analysis

Shabestari, Motahare; Azizi, Reyhaneh; Ghadiri-Anari, Akram

doi:10.1186/s12902-024-01758-3

Research
Open access
Published: 21 October 2024

BMC Endocrine Disorders volume 24, Article number: 221 (2024)

Abstract

Background

Type 2 diabetes mellitus (T2DM) was one of the most prevalent comorbidities among patients with coronavirus disease 2019 (COVID-19). Interactions between different metabolic parameters contribute to susceptibility to the virus; therefore, this study aimed to rank the importance of clinical and laboratory variables as risk factors for, or protective factors against, COVID-19 by applying machine learning methods.

Method

This study is a retrospective cohort conducted at a single center, focusing on a population with T2DM. The patients attended the Yazd Diabetes Research Center in Yazd, Iran, from February 20, 2020, to October 21, 2020. Clinical and laboratory data were collected within three months before the onset of the COVID-19 pandemic in Iran. 59 patients were infected with COVID-19, while 59 were not. The dataset was split into 70% training and 30% test sets. Principal Component Analysis (PCA) was applied to the data. The most important components were selected using a ‘sequential feature selector’ and scored by a Linear Discriminant Analysis model. PCA loadings were then multiplied by the PCs’ scores to determine the importance of the original variables in contracting COVID-19.

Results

HDL-C, followed by eGFR, showed a strong negative correlation with the risk of contracting the virus; that is, higher levels of HDL-C and eGFR offered protection against COVID-19 in the T2DM population. The ratio of BUN to creatinine did not show any correlation. Conversely, the AIP, TyG index, and TG showed the strongest positive correlation with susceptibility to COVID-19, so that higher levels of these factors increased the risk of contracting the virus. The remaining variables were positively correlated with the risk of COVID-19 in decreasing order of importance: diastolic BP, TyG-BMI index, MAP, BMI, weight, TC, FPG, HbA1C, Cr, systolic BP, BUN, and LDL-C.

Conclusion

The atherogenic index of plasma, triglyceride glucose index, and triglyceride levels are the most significant risk factors for contracting COVID-19 in individuals with T2DM. Meanwhile, high-density lipoprotein cholesterol is the most protective factor.

Introduction

The coronavirus disease 2019 (COVID-19) caused a pandemic and significant health challenges all around the world. Among the infected population, hypertension and type 2 diabetes mellitus (T2DM) were the most prevalent comorbidities [1]. The hallmarks of T2DM are insulin resistance (IR) and a decreased tissue response to the stimulatory effect of insulin, leading to systemic inflammation, oxidative stress, vascular dysfunction, and impaired immune system reactions [2, 3]. These characteristics predispose individuals with T2DM to infection [4].

Diabetic dyslipidemia, also known as atherogenic dyslipidemia, is a macrovascular complication [5] and is closely associated with insulin resistance in the T2DM population. This dyslipidemia includes elevated levels of triglycerides and reduced levels of high-density lipoprotein cholesterol [6], which promote chronic inflammation and cause a sustained release of cytokines [7]. This metabolic disturbance has been reported as an independent risk factor for adverse outcomes in COVID-19 patients [8].

Diabetic kidney disease impacts approximately 40% of individuals with T2DM [9]. This microvascular complication induces uremia, which in turn disturbs innate and adaptive immune systems and increases susceptibility to infection [10]. The estimated glomerular filtration rate as a measure of kidney function has demonstrated a negative association with the severity of COVID-19 [11].

Machine learning-based models have been progressively applied in the medical field to diagnose, treat, and evaluate the prognosis of various diseases, as well as to predict and score the risk of developing diseases [12]. Unlike conventional statistical methods, these algorithms can explore the complex relationships between different clinical variables and their interactions to achieve a good and accurate predictive performance [13].

Abnormalities of clinical and laboratory variables in individuals with T2DM make this population susceptible to contracting COVID-19. Most previous studies examined the association between vulnerability to COVID-19 and each clinical or laboratory feature in isolation, without considering potential interactions among the features. To address this gap, our study applies various machine learning algorithms to determine the relative importance of each feature in susceptibility to the virus. This may help clinicians manage diabetic patients more effectively during future pandemics.

Materials and methods

Study design

This retrospective cohort study was conducted at the Yazd Diabetes Research Center in Yazd, Iran, utilizing data collected from patients who attended the center between February 20, 2020, and October 21, 2020. The Research Ethics Council of Shahid Sadoughi University of Medical Sciences, Yazd, Iran, approved the study (IR.SSU.REC.1401.097).

Patients and population

In this study, 118 participants with T2DM aged between 30 and 60 were recruited. Clinical data and laboratory measurements were extracted from their medical records. Individuals who attended follow-up visits irregularly or had a history of immunodeficiency, neoplasia, co-infection, or smoking were excluded. In order to minimize diabetic complications, the study specifically targeted individuals with a duration of T2DM between 3 and 7 years; therefore, none of the participants had macrovascular complications. Additionally, those without medical records within three months before the pandemic onset in Iran were not considered.

The “COVID-19 positive” group included 59 patients who tested positive for COVID-19 using the polymerase chain reaction (PCR) technique from February 20 to May 19, 2020, but were not hospitalized.

The “COVID-19 negative” group included 59 individuals with no documented history of COVID-19 infection before October 21, 2020. Gender and age matching were performed across the two groups.

Clinical variables and laboratory measures

Clinical data included age, gender, body mass index (BMI), and systolic and diastolic blood pressure. Blood samples were obtained after 12 h of fasting and analyzed for hemoglobin A1C (HbA1C), fasting plasma glucose (FPG), total cholesterol (TC), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), blood urea nitrogen (BUN), and creatinine (Cr). The estimated glomerular filtration rate (eGFR) was calculated using the CKD-EPI formula based on creatinine [14]. The triglyceride glucose (TyG) index, triglyceride glucose-body mass index (TyG-BMI), and atherogenic index of plasma (AIP) were calculated using the following equations [15, 16]:

$$\text{TyG index} = \ln\left(\frac{\text{fasting glucose (mg/dL)} \times \text{triglycerides (mg/dL)}}{2}\right)$$

$$\text{TyG-BMI} = \text{BMI} \times \text{TyG index}$$

$$\text{AIP} = \log_{10}\left(\frac{\text{TG}}{\text{HDL-C}}\right)$$
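As an illustration only, the three derived indices can be computed with a few lines of Python. The function and argument names are hypothetical, and the sample values are invented. The text does not state the units used for the AIP, so the sketch only assumes that TG and HDL-C share the same unit; the base-10 logarithm follows the definition given in the Discussion.

```python
import math


def tyg_index(fasting_glucose_mg_dl: float, triglycerides_mg_dl: float) -> float:
    """TyG index: natural log of (fasting glucose x triglycerides / 2), both in mg/dL."""
    return math.log(fasting_glucose_mg_dl * triglycerides_mg_dl / 2)


def tyg_bmi(bmi: float, fasting_glucose_mg_dl: float, triglycerides_mg_dl: float) -> float:
    """TyG-BMI: BMI multiplied by the TyG index."""
    return bmi * tyg_index(fasting_glucose_mg_dl, triglycerides_mg_dl)


def aip(triglycerides: float, hdl_c: float) -> float:
    """AIP: base-10 log of the TG/HDL-C ratio (inputs assumed to share one unit)."""
    return math.log10(triglycerides / hdl_c)


# Invented values for a single hypothetical patient.
patient = {"fpg": 130.0, "tg": 150.0, "hdl": 45.0, "bmi": 28.0}
print(round(tyg_index(patient["fpg"], patient["tg"]), 3))
print(round(tyg_bmi(patient["bmi"], patient["fpg"], patient["tg"]), 1))
print(round(aip(patient["tg"], patient["hdl"]), 3))
```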

Statistical analysis

Statistical analyses were performed using SPSS version 27. The normality of data was evaluated using the Shapiro–Wilk test. Variables following a normal distribution were presented as means ± standard deviation (SD), while non-normally distributed variables were reported as medians and interquartile ranges. Differences in variables between the two groups were analyzed using the Independent Samples T-test or the Mann–Whitney U test. A P-value below 0.05 was considered statistically significant.
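The authors ran these tests in SPSS. Purely as a sketch of the same decision rule, the snippet below uses SciPy as a stand-in. The helper name `compare_groups` is hypothetical, and applying the Shapiro–Wilk test to each group separately is an assumption, since the text does not say how normality was assessed across groups.

```python
import numpy as np
from scipy import stats


def compare_groups(a, b, alpha=0.05):
    """Return (test name, p-value) for a two-group comparison.

    Assumption: a variable is treated as normally distributed only if the
    Shapiro-Wilk test fails to reject normality in BOTH groups.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    looks_normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if looks_normal:
        result = stats.ttest_ind(a, b)  # independent-samples t-test
        return "t-test", float(result.pvalue)
    result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U", float(result.pvalue)


# Invented, strongly right-skewed data for two groups of 59.
skewed = np.exp(np.linspace(0.0, 5.0, 59))
name, p_value = compare_groups(skewed, skewed * 2.0)
print(name, p_value < 0.05)
```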

Machine learning to predict susceptibility to COVID-19

Machine learning (ML) models were used to identify the importance of each variable in predicting susceptibility to COVID-19. The dataset was split into 70% training and 30% test sets. All the computations were conducted using Python (version 3.11) and the scikit-learn library (version 1.2.2). The analysis proceeded through the following steps:

Standardization

Each variable was standardized using the ‘StandardScaler’ method to adjust them to have a mean of 0 and a standard deviation of 1 [17]. This procedure aimed to reduce biases resulting from differences in the measuring units.
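A minimal sketch of the split and standardization steps with scikit-learn follows. The data are synthetic, and several details are assumptions that the text does not specify: stratifying the split by outcome, the random seed, and fitting the scaler on the training set only before applying it to the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 118 patients and three variables on different scales.
rng = np.random.default_rng(0)
X = rng.normal(loc=[100.0, 5.0, 30.0], scale=[20.0, 1.0, 4.0], size=(118, 3))
y = np.repeat([0, 1], 59)  # 59 'negative' and 59 'positive' labels

# 70% training / 30% test; stratification and the seed are assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Learn the mean and standard deviation on the training set (assumption),
# then apply the same transformation to both sets.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

print(X_train_std.mean(axis=0).round(6), X_train_std.std(axis=0).round(6))
```

Fitting the scaler on the training portion alone keeps test-set information out of the preprocessing step; whether the original analysis did this is not stated.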

Principal component analysis (PCA)

Due to high correlations among the original variables (Fig. 1), PCA was applied to transform the standardized variables into linearly uncorrelated components [18]. The number of principal components (PCs) was determined by setting the n_components parameter to 0.95, ensuring that 95% of the original variance was retained.
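The effect of passing a fraction as `n_components` can be sketched on synthetic correlated data. The data-generating process below is invented; `svd_solver="full"` is set explicitly as a choice of this sketch so that the fractional setting is unambiguous.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 12 correlated variables driven by 4 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(118, 4))
mixing = rng.normal(size=(4, 12))
X = latent @ mixing + 0.3 * rng.normal(size=(118, 12))

X_std = StandardScaler().fit_transform(X)

# A float in (0, 1) keeps the smallest number of components whose
# cumulative explained variance reaches that fraction.
pca = PCA(n_components=0.95, svd_solver="full")
scores = pca.fit_transform(X_std)

print("components kept:", pca.n_components_)
print("cumulative variance:", pca.explained_variance_ratio_.cumsum().round(3))
```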

Feature selection

A ‘sequential feature selector’ was applied to find the most predictive PCs [19], using the area under the receiver operating characteristic curve (ROC-AUC) as the scoring metric [20]. ROC-AUC was selected as the primary metric because it provides a comprehensive evaluation of model performance across all classification thresholds, which is important for susceptibility predictions. ROC-AUC was calculated using repeated stratified k-fold cross-validation (five folds, ten repeats). To find the best estimator for the ‘sequential feature selector’, six different machine learning models were trained on the training set: Linear Discriminant Analysis (LDA) [21], Logistic Regression (LR) [22], Support Vector Machine (SVM) [23], K-Nearest Neighbor (KNN) [24], Random Forest (RF) [25], and eXtreme Gradient Boosting classifier (XGBoost classifier) [26]. Hyperparameter tuning of each model was performed with either ‘grid search cv’ or ‘randomized search cv’ to find the optimal parameters. The model with the highest ROC-AUC score on the test set was chosen as the selector. Feature selection was performed in both forward and backward directions, producing two sets of selected PCs.
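The selection step might look roughly like the sketch below, which runs scikit-learn's `SequentialFeatureSelector` in both directions with ROC-AUC scoring and repeated stratified five-fold cross-validation. The data are synthetic. The numbers of PCs to keep (4 forward, 5 backward) are hypothetical settings chosen to echo the reported results, because the text does not say how the stopping point was determined. Logistic regression is used here only as an example estimator.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold

# Synthetic training set: 82 samples, 9 principal components.
rng = np.random.default_rng(0)
pcs_train = rng.normal(size=(82, 9))
signal = pcs_train[:, 0] - pcs_train[:, 3]
y_train = (signal + rng.normal(size=82) > 0).astype(int)

# Five folds repeated ten times, as described in the text; the seed is an assumption.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)


def select_pcs(direction, n_keep):
    """Return the indices of the PCs retained by sequential selection."""
    selector = SequentialFeatureSelector(
        LogisticRegression(max_iter=1000),
        n_features_to_select=n_keep,  # hypothetical stopping point
        direction=direction,
        scoring="roc_auc",
        cv=cv,
    )
    selector.fit(pcs_train, y_train)
    return selector.get_support(indices=True)


forward_pcs = select_pcs("forward", n_keep=4)
backward_pcs = select_pcs("backward", n_keep=5)
print("forward:", forward_pcs, "backward:", backward_pcs)
```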

Scoring selected PCs

Regression-based and tree-based models, including XGBoost, RF, Least Absolute Shrinkage and Selection Operator (LASSO) [27], LR, and LDA, were trained on the forward-selected and backward-selected PCs. Model evaluation on the test set was conducted using repeated stratified k-fold cross-validation (five folds, ten repeats) to calculate ROC-AUC scores. The model with the highest scores across both sets of PCs was chosen, and the set yielding the higher score was used to assess the PCs’ importance via the model’s ‘coef_’ attribute.
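A reduced sketch of this comparison is shown below. It evaluates only LDA and LR on one synthetic set of selected PCs; the tree-based models and LASSO are omitted for brevity. The variable names, the data, and the random seeds are assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for five selected principal components.
rng = np.random.default_rng(1)
pcs_selected = rng.normal(size=(82, 5))
signal = pcs_selected[:, 0] - pcs_selected[:, 1]
y = (signal + rng.normal(size=82) > 0).astype(int)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
candidates = {
    "LDA": LinearDiscriminantAnalysis(),
    "LR": LogisticRegression(max_iter=1000),
}

# Mean ROC-AUC over the 50 cross-validation folds for each candidate model.
mean_auc = {
    name: float(cross_val_score(model, pcs_selected, y, scoring="roc_auc", cv=cv).mean())
    for name, model in candidates.items()
}
best_name = max(mean_auc, key=mean_auc.get)

# Refit the best model on all rows and read one signed coefficient per PC.
best_model = candidates[best_name].fit(pcs_selected, y)
pc_importance = best_model.coef_.ravel()
print(mean_auc, best_name, pc_importance.round(3))
```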

Translate PCA results back to the original features

PCA loadings were multiplied by the PCs’ importance as identified by the chosen model from the previous step. This helped to determine the most critical clinical and laboratory variables that contributed to the prediction model and were strongly associated with COVID-19 susceptibility.
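One way to read this step is as a matrix product between the loadings of the selected PCs and the model's per-PC coefficients. The sketch below takes that reading, treating the rows of `pca.components_` as the loadings; this is an assumption, since the text does not say whether the loadings were additionally scaled. Data, feature names, and the selected indices are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

# Synthetic data with hypothetical feature names.
rng = np.random.default_rng(0)
feature_names = ["f0", "f1", "f2", "f3", "f4", "f5"]
X = rng.normal(size=(118, 6))
X[:, 1] += 0.8 * X[:, 0]  # make two variables correlated
y = (X[:, 0] - X[:, 2] + rng.normal(size=118) > 0).astype(int)

X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95, svd_solver="full").fit(X_std)
pcs = pca.transform(X_std)

selected = [0, 1, 2]  # hypothetical indices of the selected PCs
lda = LinearDiscriminantAnalysis().fit(pcs[:, selected], y)

# Loadings of the selected PCs: one row per PC, one column per original variable.
loadings = pca.components_[selected]
pc_importance = lda.coef_.ravel()

# Signed score per original variable: positive values push toward class 1.
feature_scores = loadings.T @ pc_importance
for name, score in sorted(zip(feature_names, feature_scores), key=lambda t: -abs(t[1])):
    print(f"{name}: {score:+.3f}")
```

Under this reading, the resulting scores are the linear model's coefficients expressed in the space of the standardized original variables, which is why their signs can be interpreted as risk-increasing or protective.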

Results

Baseline characteristics

Finally, 59 patients with type 2 diabetes in each group were analyzed. In each group, 42 were female (71.2%) and 17 were male (28.8%). Weight (P = 0.024), BMI (P = 0.024), DBP (P = 0.039), TG levels (P < 0.001), TC levels (P = 0.047), AIP (P < 0.001), TyG index (P < 0.001), and TyG-BMI index (P = 0.001) were all significantly higher in the ‘COVID-19 positive’ group (Table 1).

Table 1 Baseline clinical and laboratory characteristics of 59 patients in each group

Feature selection

The mean AUCs of six different machine-learning models were calculated in order to evaluate their performance (Table 2). The LR model demonstrated the best performance, with a mean ROC-AUC of 0.715. Therefore, LR was used as the selector in the ‘sequential feature selector’ method. Four PCs, with indices 0, 3, 6, and 8, were chosen from the nine primary PCs using forward selection. In contrast, five PCs, with indices 0, 3, 5, 6, and 8, were selected from the nine primary PCs using backward selection.

Table 2 The ROC-AUC score was obtained after applying repeated stratified k-fold cross-validation (five-fold and ten-repeat) on models

Scoring selected PCs

The performance of five regression-based and tree-based models on both sets of forward-selected and backward-selected PCs was evaluated through the mean AUCs (Fig. 2). The best performance, with a mean ROC-AUC of 0.714, was achieved when LDA was applied to the backward-selected PCs.

Map PCA components back to original features

The ‘coef_’ attribute of the trained LDA model was used to obtain the importance of the backward-selected PCs. The role of each variable in predicting susceptibility to COVID-19 was scored by multiplying the PCA loadings by the PCs’ importance.

HDL-C, followed by eGFR, showed a strong negative correlation with the risk of contracting the virus; that is, higher levels of HDL-C and eGFR offered protection against COVID-19 in the T2DM population. The ratio of BUN to creatinine did not show any correlation. Conversely, the AIP, TyG index, and TG showed the strongest positive correlation with susceptibility to COVID-19, so that higher levels of these factors increased the risk of contracting the virus. The remaining variables were positively correlated with the risk of COVID-19 in decreasing order of importance: diastolic BP, TyG-BMI index, MAP, BMI, weight, TC, FPG, HbA1C, Cr, systolic BP, BUN, and LDL-C (Fig. 3).

Discussion

In this study, we employed machine learning (ML) to identify the susceptibility to COVID-19 among the T2DM population based on clinical and laboratory variables. We obtained the best estimation performance when LDA was applied to backward-selected PCs, which contained five PCs out of the primary nine PCs. The model yielded a mean ROC-AUC of 0.714. Due to a small sample size, a ten-repeated stratified five-fold cross-validation approach was used to calculate the mean ROC-AUC. This mitigates potential overestimation or underestimation of the model’s predictive capacity.

In the present study, HDL-C, followed by eGFR, was observed to be a protective factor against COVID-19 infection in individuals with T2DM, whereas the AIP, TyG index, and TG levels were the most significant risk factors for predicting susceptibility to COVID-19.

The Mann–Whitney U test showed no statistically significant difference in HDL-C or eGFR between the ‘COVID-19 positive’ and ‘COVID-19 negative’ groups. This disparity in outcomes may be due to the different analytic approaches of ML models and statistical tests. ML models are designed to enhance prediction accuracy through several mechanisms: they identify complex interactions between variables, analyze multiple variables concurrently, and are able to handle widely fluctuating data. Conversely, statistical tests have limitations in dealing with these attributes [28].

Low levels of HDL-C have been identified as a risk factor for different types of infection [29,30,31]. HDL-C, along with its most important component, apolipoprotein A-I (Apo A-I), has been reported to contribute to the susceptibility to COVID-19 [32,33,34]. An increase of 10 mg/dl in HDL-C or Apo A-I was associated with a 10% reduction in the risk of COVID-19 [34]. There are multiple mechanisms behind this link; for example, the receptor for HDL-C, scavenger receptor protein B-I (SR B-I), facilitates the entry of the virus into cells that express the angiotensin-converting enzyme 2 (ACE2) receptor [35]. Therefore, a higher concentration of HDL-C inhibits viral entry through the SR B-I pathway [36]. Additionally, Apo A-I independently interferes with viral entry into body cells [37]. Studies employing a genetic approach have suggested that low levels of HDL-C may have a causal effect on developing the infection [38, 39]. Our result on the protective effect of HDL-C is consistent with these findings.

In this study, it was determined that eGFR was the second protective factor against COVID-19. This was consistent with the findings of previous observational studies [11, 40, 41]. Lim et al. [40] reported an inverse association between eGFR and the risk of COVID-19, even among patients experiencing mild to moderate kidney dysfunction. Also, a study that used the Mendelian randomization analysis method found that kidney dysfunction causes increased susceptibility to contracting the virus [42]. However, we found the influence of eGFR on susceptibility to COVID-19 was almost comparable to that of BUN and approximately half of that of creatinine. Unlike eGFR, both BUN and creatinine were identified as risk factors for contracting COVID-19 (Fig. 3). In fact, creatinine demonstrated a more pronounced effect than eGFR in this study, according to the PCA analysis, but relative to other variables acting as risk factors, creatinine was ranked as the least influential variable.

Reduced renal function is associated with impaired protein catabolism, which in turn induces chronic oxidative stress and systemic inflammation. Moreover, the retention of toxic metabolites in kidney dysfunction inhibits immune cell activation and increases immune cell apoptosis, which leads to systemic immunosuppression [43].

Insulin resistance (IR) is a condition in which the response of tissues to the stimulatory effect of insulin decreases, leading to hyperinsulinemia and hyperglycemia. IR plays an important role in the pathogenesis of T2DM and makes this population more vulnerable to infection. IR and its related hyperglycemia increase the production of interleukin-6 (IL-6), IL-1β, and TNF-α and promote chronic inflammation, thereby impairing the function of the immune system [44]. Hyperinsulinemia increases membrane expression of ACE2, which serves as the receptor for COVID-19 in host cells [45], thereby facilitating viral entry and amplifying viral load. Conventional methods utilized for the assessment of IR are often costly and require technical expertise that may not be available across clinical settings [46]. Recently, cost-effective and valuable biomarkers have been introduced for the estimation of IR, including the TyG index [15], the TyG-BMI index [47], and the AIP [48].

We found the AIP and TyG index were two of the most critical risk factors for contracting COVID-19 among T2DM participants. A previous study on Iranian patients reported that the TyG index and TG/HDL-C ratio (AIP is equal to the base ten logarithm of this ratio) positively correlated with COVID-19 infection and prognosis [49]. Another study that considered diabetic patients showed the TyG index as a predictor for COVID-19 severity and mortality [50]. AIP was also shown to be associated with intubation and intensive care admission in hospitalized infected patients [51]. A few studies have investigated the relationship between the TyG-BMI index and COVID-19, such as a retrospective study on T2DM patients, which found that this index increased after contracting the viral infection [52]. We found the TyG-BMI index to be a risk factor for COVID-19 infection; however, its predictive value was lower than that of the AIP and TyG index.

IR promotes atherogenic dyslipidemia in the T2DM population by reducing HDL-C levels and increasing TG levels [53]. Furthermore, free fatty acids (FFA), which are generated during the breakdown of TG by lipoprotein lipase, exacerbate IR through numerous mechanisms [54]; FFA activates the proinflammatory pathway in the skeletal muscle known as nuclear factor (NF-κB), resulting in the secretion of proinflammatory cytokines and elevation in monocyte chemoattractant protein-1 (MCP-1) levels [55, 56]. MCP-1, in turn, leads to enhanced macrophage differentiation, thereby exacerbating the inflammatory state [57]. The present study found TG as a risk factor of equal importance to the AIP and TyG index for the development of COVID-19. Our findings are in line with previous studies that found TG levels as a predictor of the infection severity [58, 59].

Hypertension has been recognized as a potential risk factor for COVID-19, with a prevalence ranging from 27 to 34.6% among infected patients [60, 61]. Factors involved in blood pressure regulation, notably sodium level, aldosterone, and angiotensin II, contribute to the generation of reactive oxygen species. The background inflammation disturbs cell signaling and cell activation, especially within the immune system [62,63,64,65]. The dysregulated innate and adaptive immune cells produce several cytokines and worsen the inflammation [63, 65].

In addition to chronic inflammation, several factors related to hypertension increase susceptibility to infection. These include a reduction in lymphocyte count [66], activation of CD8+ cell types that are inefficient in antiviral defense [67], and vascular stiffness [68, 69]. Interestingly, we found that high diastolic blood pressure was approximately five times more important than high systolic blood pressure in predisposing individuals to COVID-19 (Fig. 3).

The present study found that BMI, weight, and TC levels carried equivalent risks of contracting the infection. After these factors, FPG, HbA1C, creatinine, systolic blood pressure, BUN, and LDL-C levels followed in decreasing order of importance as risk factors for COVID-19.

To the best of our knowledge, this is the first study to employ ML algorithms to rank the metabolic and clinical parameters that are commonly assessed during follow-up visits for individuals with T2DM according to their value in predicting COVID-19 susceptibility. Therefore, the outcomes of the present study can help healthcare practitioners evaluate and score the risk factors associated with infection in this population, leading to better clinical decision-making.

The primary limitation of our study is the small sample size of each group because of our inclusion criteria. We enrolled individuals who received regular follow-up in our outpatient clinic at least once every three months and whose T2DM was well-controlled. Also, we only included those with laboratory measurements obtained within three months before the pandemic onset in Iran, which further limited the sample size. The lack of data on urine analysis and albuminuria resulted in their omission from the final model, which should be considered as a limitation in the interpretation of the findings. The ML method applied in this study follows standard analysis protocols. While our findings may be helpful for the development of future diagnostic or therapeutic interventions, it is important to note that the current research does not involve any product, device, or therapeutic intervention subject to FDA regulations.

Conclusion

In this study, TG levels and IR markers (AIP and TyG index) were found to be the most significant risk factors for COVID-19 susceptibility in the population with T2DM. Diastolic blood pressure, TyG-BMI index, BMI, weight, TC levels, FPG, HbA1C, creatinine, systolic blood pressure, BUN, and LDL-C levels followed as risk factors of progressively decreasing importance. Conversely, HDL-C levels, followed by eGFR, were identified as protective factors against the infection.

Data availability

All data supporting the findings of this study are available from the corresponding author upon reasonable request.

Abbreviations

T2DM: Type 2 diabetes mellitus
COVID-19: Coronavirus disease 2019
PCA: Principal Component Analysis
HbA1C: Hemoglobin A1C
FPG: Fasting plasma glucose
HDL-C: High-density lipoprotein cholesterol
LDL-C: Low-density lipoprotein cholesterol
TC: Total cholesterol
TG: Triglycerides
AIP: Atherogenic index of plasma
TyG index: Triglyceride glucose index
TyG-BMI index: Triglyceride glucose-body mass index
BP: Blood pressure
MAP: Mean arterial pressure
eGFR: Estimated glomerular filtration rate
ML: Machine learning
ROC-AUC: Receiver operating characteristic curve and area under the curve
LDA: Linear Discriminant Analysis
LR: Logistic Regression
SVM: Support vector machine
KNN: K-Nearest Neighbor
RF: Random forest
XGBoost classifier: eXtreme Gradient Boosting classifier
LASSO: Least Absolute Shrinkage and Selection Operator
Apo A-I: Apolipoprotein A-I
SR B-I: Scavenger receptor protein B-I
ACE2: Angiotensin-converting enzyme 2
NF-κB: Nuclear factor κB
MCP-1: Monocyte chemoattractant protein-1

References

Stoian AP, Banerjee Y, Rizvi AA, Rizzo M. Diabetes and the COVID-19 pandemic: how insights from recent experience might Guide Future Management. Metab Syndr Relat Disord. 2020;18(4):173–5.
Leahy JL. Pathogenesis of type 2 diabetes mellitus. Arch Med Res. 2005;36(3):197–209.
Ormazabal V, Nair S, Elfeky O, Aguayo C, Salomon C, Zuñiga FA. Association between insulin resistance and the development of cardiovascular disease. Cardiovasc Diabetol. 2018;17(1):122.
Shah BR, Hux JE. Quantifying the risk of infectious diseases for people with diabetes. Diabetes Care. 2003;26(2):510–3.
Farmer JA. Diabetic dyslipidemia and atherosclerosis: evidence from clinical trials. Curr Diab Rep. 2008;8(1):71–7.
Taskinen MR, Borén J. New insights into the pathophysiology of dyslipidemia in type 2 diabetes. Atherosclerosis. 2015;239(2):483–95.
Saltiel AR, Olefsky JM. Inflammatory mechanisms linking obesity and metabolic disease. J Clin Invest. 2017;127(1):1–4.
Bellia A, Andreadi A, Giudice L, De Taddeo S, Maiorino A, D’Ippolito I, et al. Atherogenic dyslipidemia on admission is Associated with poorer outcome in people with and without diabetes hospitalized for COVID-19. Diabetes Care. 2021;44(9):2149–57.
Alicic RZ, Rooney MT, Tuttle KR. Diabetic kidney disease: challenges, Progress, and possibilities. Clin J Am Soc Nephrol. 2017;12(12):2032–45.
Syed-Ahmed M, Narayanan M. Immune Dysfunction and risk of infection in chronic kidney disease. Adv Chronic Kidney Dis. 2019;26(1):8–15.
Carlson N, Nelveg-Kristensen KE, Freese Ballegaard E, Feldt-Rasmussen B, Hornum M, Kamper AL, et al. Increased vulnerability to COVID-19 in chronic kidney disease. J Intern Med. 2021;290(1):166–78.
Li R, Chen Y, Ritchie MD, Moore JH. Electronic health records and polygenic risk scores for predicting disease risk. Nat Rev Genet. 2020;21(8):493–502.
Deo RC. Machine learning in Medicine. Circulation. 2015;132(20):1920–30.
Levey AS, Stevens LA, Schmid CH, Zhang YL, Castro AF 3rd, Feldman HI, et al. A new equation to estimate glomerular filtration rate. Ann Intern Med. 2009;150(9):604–12.
Guerrero-Romero F, Simental-Mendía LE, González-Ortiz M, Martínez-Abundis E, Ramos-Zavala MG, Hernández-González SO, et al. The product of triglycerides and glucose, a simple measure of insulin sensitivity. Comparison with the euglycemic-hyperinsulinemic clamp. J Clin Endocrinol Metab. 2010;95(7):3347–51.
Niroumand S, Khajedaluee M, Khadem-Rezaiyan M, Abrishami M, Juya M, Khodaee G, Dadgarmoghaddam M. Atherogenic Index of Plasma (AIP): a marker of cardiovascular disease. Med J Islam Repub Iran. 2015;29:240.
Cabello-Solorzano K, Ortigosa de Araujo I, Peña M, Correia L, Tallón-Ballesteros J A, editors. The impact of data normalization on the accuracy of machine learning algorithms: a comparative analysis. International Conference on Soft Computing Models in Industrial and Environmental Applications; 2023: Springer.
Shlens J. A tutorial on principal component analysis. arXiv preprint arXiv:1404.1100. 2014.
Kotsiantis S. Feature selection for machine learning classification problems: a recent overview. Artif Intell Rev. 2011;42(1):157–76.
Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett. 2006;27(8):861–74.
Zhao H, Lai Z, Leung H, Zhang X, Zhao H, Lai Z et al. Linear discriminant analysis. Feature Learn Understanding: Algorithms Appl. 2020:71–85.
Van Houwelingen J, Le Cessie S. Logistic regression, a review. Stat Neerl. 1988;42(4):215–32.
Burges CJ. A tutorial on support vector machines for pattern recognition. Data Min Knowl Disc. 1998;2(2):121–67.
Peterson LE. K-nearest neighbor. Scholarpedia. 2009;4(2):1883.
Breiman L. Random forests. Mach Learn. 2001;45:5–32.
Chen T, Guestrin C, editors. Xgboost: A scalable tree boosting system. Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining; 2016.
Tibshirani R. Regression shrinkage and selection via the lasso. J Royal Stat Soc Ser B: Stat Methodol. 1996;58(1):267–88.
Rajula HSR, Verlato G, Manchia M, Antonucci N, Fanos V. Comparison of Conventional Statistical methods with Machine Learning in Medicine: diagnosis, Drug Development, and treatment. Med (Kaunas). 2020;56(9).
Catapano AL, Pirillo A, Bonacina F, Norata GD. HDL in innate and adaptive immunity. Cardiovasc Res. 2014;103(3):372–83.
Claxton AJ, Jacobs DR Jr., Iribarren C, Welles SL, Sidney S, Feingold KR. Association between serum total cholesterol and HIV infection in a high-risk cohort of young men. J Acquir Immune Defic Syndr Hum Retrovirol. 1998;17(1):51–7.
Madsen CM, Varbo A, Tybjærg-Hansen A, Frikke-Schmidt R, Nordestgaard BG. U-shaped relationship of HDL and risk of infectious disease: two prospective population-based cohort studies. Eur Heart J. 2018;39(14):1181–90.
Aung N, Khanji MY, Munroe PB, Petersen SE. Causal Inference for Genetic Obesity, Cardiometabolic Profile and COVID-19 susceptibility: a mendelian randomization study. Front Genet. 2020;11:586308.
Chidambaram V, Kumar A, Majella MG, Seth B, Sivakumar RK, Voruganti D, et al. HDL cholesterol levels and susceptibility to COVID-19. EBioMedicine. 2022;82:104166.
Hilser JR, Han Y, Biswas S, Gukasyan J, Cai Z, Zhu R, et al. Association of serum HDL-cholesterol and apolipoprotein A1 levels with risk of severe SARS-CoV-2 infection. J Lipid Res. 2021;62:100061.
Wei C, Wan L, Yan Q, Wang X, Zhang J, Yang X, et al. HDL-scavenger receptor B type 1 facilitates SARS-CoV-2 entry. Nat Metab. 2020;2(12):1391–400.
Cho KH, Kim JR, Lee IC, Kwon HJ. Native high-density lipoproteins (HDL) with higher paraoxonase exerts a potent antiviral effect against SARS-CoV-2 (COVID-19), while glycated HDL lost the antiviral activity. Antioxid (Basel). 2021;10(2).
Oliveira C, Fournier C, Descamps V, Morel V, Scipione CA, Romagnuolo R, et al. Apolipoprotein(a) inhibits hepatitis C virus entry through interaction with infectious particles. Hepatology. 2017;65(6):1851–64.
Trinder M, Walley KR, Boyd JH, Brunham LR. Causal inference for genetically determined levels of high-density lipoprotein cholesterol and risk of Infectious Disease. Arterioscler Thromb Vasc Biol. 2020;40(1):267–78.
Trinder M, Wang Y, Madsen CM, Ponomarev T, Bohunek L, Daisely BA, et al. Inhibition of Cholesteryl Ester Transfer Protein Preserves High-Density Lipoprotein Cholesterol and improves survival in Sepsis. Circulation. 2021;143(9):921–34.
Lim Y, Lee MH, Lee SK, Jeong S, Han HW. Increased estimated GFR is negatively Associated with the risk of SARS-CoV-2 infection and severe COVID-19 within normal to mildly decreased levels: nested case-control study. J Korean Med Sci. 2023;38(49):e415.
Mirijello A, Piscitelli P, de Matthaeis A, Inglese M, D’Errico MM, Massa V et al. Low eGFR is a strong predictor of worse outcome in hospitalized COVID-19 patients. J Clin Med. 2021;10(22).
Li Q, Lin M, Deng Y, Huang H. The causal relationship between COVID-19 and estimated glomerular filtration rate: a bidirectional mendelian randomization study. BMC Nephrol. 2024;25(1):21.
Kurts C, Panzer U, Anders HJ, Rees AJ. The immune system and kidney disease: basic concepts and clinical implications. Nat Rev Immunol. 2013;13(10):738–53.
Muniyappa R, Montagnani M, Koh KK, Quon MJ. Cardiovascular actions of insulin. Endocr Rev. 2007;28(5):463–91.
Govender N, Khaliq OP, Moodley J, Naicker T. Insulin resistance in COVID-19 and diabetes. Prim Care Diabetes. 2021;15(4):629–34.
Singh B, Saxena A. Surrogate markers of insulin resistance: a review. World J Diabetes. 2010;1(2):36–47.
Er LK, Wu S, Chou HH, Hsu LA, Teng MS, Sun YC, Ko YL. Triglyceride glucose-body Mass Index is a simple and clinically useful surrogate marker for insulin resistance in nondiabetic individuals. PLoS ONE. 2016;11(3):e0149731.
Lioy B, Webb RJ, Amirabdollahian F. The Association between the Atherogenic Index of Plasma and Cardiometabolic Risk Factors: A Review. Healthcare (Basel). 2023;11(7).
Rohani-Rasaf M, Mirjalili K, Vatannejad A, Teimouri M. Are lipid ratios and triglyceride-glucose index associated with critical care outcomes in COVID-19 patients? PLoS ONE. 2022;17(8):e0272000.
Ren H, Yang Y, Wang F, Yan Y, Shi X, Dong K, et al. Association of the insulin resistance marker TyG index with the severity and mortality of COVID-19. Cardiovasc Diabetol. 2020;19(1):58.
Turgay Yıldırım Ö, Kaya Ş. The atherogenic index of plasma as a predictor of mortality in patients with COVID-19. Heart Lung. 2021;50(2):329–33.
Alshammari S, AlMasoudi AS, AlBuhayri AH, AlAtwi HM, AlHwiti SS, Alaidi HM, et al. Effect of COVID-19 on glycemic control, Insulin Resistance, and pH in Elderly patients with type 2 diabetes. Cureus. 2023;15(2):e35390.
Manoria PC, Chopra HK, Parashar SK, Dutta AL, Pinto B, Mullasari A, Prajapati S. The nuances of atherogenic dyslipidemia in diabetes: focus on triglycerides and current management strategies. Indian Heart J. 2013;65(6):683–90.
Boden G. Obesity, insulin resistance and free fatty acids. Curr Opin Endocrinol Diabetes Obes. 2011;18(2):139–43.
Boden G, She P, Mozzoli M, Cheung P, Gumireddy K, Reddy P, et al. Free fatty acids produce insulin resistance and activate the proinflammatory nuclear factor-kappaB pathway in rat liver. Diabetes. 2005;54(12):3458–65.
Itani SI, Ruderman NB, Schmieder F, Boden G. Lipid-induced insulin resistance in human muscle is associated with changes in diacylglycerol, protein kinase C, and IkappaB-alpha. Diabetes. 2002;51(7):2005–11.
Weisberg SP, McCann D, Desai M, Rosenbaum M, Leibel RL, Ferrante AW Jr. Obesity is associated with macrophage accumulation in adipose tissue. J Clin Invest. 2003;112(12):1796–808.
Masana L, Correig E, Ibarretxe D, Anoro E, Arroyo JA, Jericó C, et al. Low HDL and high triglycerides predict COVID-19 severity. Sci Rep. 2021;11(1):7217.
Zhong P, Wang Z, Du Z. Serum triglyceride levels and related factors as prognostic indicators in COVID-19 patients: a retrospective study. Immun Inflamm Dis. 2021;9(3):1055–60.
Reynolds HR, Adhikari S, Pulgarin C, Troxel AB, Iturrate E, Johnson SB, et al. Renin-angiotensin-aldosterone system inhibitors and risk of Covid-19. N Engl J Med. 2020;382(25):2441–8.
Vincent JL, Taccone FS. Understanding pathways to death in patients with COVID-19. Lancet Respir Med. 2020;8(5):430–2.
Guzik TJ, Hoch NE, Brown KA, McCann LA, Rahman A, Dikalov S, et al. Role of the T cell in the genesis of angiotensin II induced hypertension and vascular dysfunction. J Exp Med. 2007;204(10):2449–60.
Norlander AE, Madhur MS, Harrison DG. The immunology of hypertension. J Exp Med. 2018;215(1):21–33.
Patrick DM, Van Beusecum JP, Kirabo A. The role of inflammation in hypertension: novel concepts. Curr Opin Physiol. 2021;19:92–8.
Touyz RM, Rios FJ, Alves-Lopes R, Neves KB, Camargo LL, Montezano AC. Oxidative stress: a unifying paradigm in hypertension. Can J Cardiol. 2020;36(5):659–70.
Siedlinski M, Jozefczuk E, Xu X, Teumer A, Evangelou E, Schnabel RB, et al. White Blood cells and blood pressure: a mendelian randomization study. Circulation. 2020;141(16):1307–17.
Youn JC, Yu HT, Lim BJ, Koh MJ, Lee J, Chang DY, et al. Immunosenescent CD8+ T cells and C-X-C chemokine receptor type 3 chemokines are increased in human hypertension. Hypertension. 2013;62(1):126–33.
Rodilla E, López-Carmona MD, Cortes X, Cobos-Palacios L, Canales S, Sáez MC, et al. Impact of arterial stiffness on all-cause mortality in patients hospitalized with COVID-19 in Spain. Hypertension. 2021;77(3):856–67.
Safar ME, Asmar R, Benetos A, Blacher J, Boutouyrie P, Lacolley P, et al. Interaction between Hypertension and arterial stiffness. Hypertension. 2018;72(4):796–805.

Acknowledgements

The authors thank Dr. Foroozan Salari for her cooperation in collecting the data.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Medical School, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
Motahare Shabestari
Diabetes Research Center, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
Reyhaneh Azizi & Akram Ghadiri-Anari

Contributions

M.S. wrote the original draft, analyzed the data, and wrote the Python code. R.A. edited the original draft and prepared the tables and figures. A.G. supervised and managed the project, designed the methodology, and reviewed the investigation.

Corresponding author

Correspondence to Akram Ghadiri-Anari.

Ethics declarations

Ethics approval and consent to participate

The study was approved by the Research Ethics Committee of Shahid Sadoughi University of Medical Sciences, Yazd, Iran (IR.SSU.REC.1401.097). Written informed consent to participate was obtained from all enrolled subjects. The study was performed in accordance with the principles of the Helsinki Declaration.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Shabestari, M., Azizi, R. & Ghadiri-Anari, A. Type 2 diabetes and susceptibility to COVID-19: a machine learning analysis. BMC Endocr Disord 24, 221 (2024). https://doi.org/10.1186/s12902-024-01758-3

Received: 23 July 2024
Accepted: 16 October 2024
Published: 21 October 2024
DOI: https://doi.org/10.1186/s12902-024-01758-3
