Development and external validation of an interpretable machine learning model for obesity-depression comorbidity in Korean and US adults

Shangguan, Yuwen; Lin, Zhenhao; Sim, Young-Je; Wu, Kunpeng; Chu, Yu; Huang, Kunyi; Chen, Fangxi; Ji, Kangkang; Chen, Fang; Liu, Shangrui

doi:10.3389/ijph.2026.1609153

ORIGINAL ARTICLE

Int. J. Public Health, 28 May 2026

Volume 71 - 2026 | https://doi.org/10.3389/ijph.2026.1609153

Yuwen Shangguan ¹†
Zhenhao Lin ¹†
Young-Je Sim ¹
Kunpeng Wu ¹
Yu Chu ²
Kunyi Huang ³
Fangxi Chen ¹
Kangkang Ji ⁴*
Fang Chen ²*
Shangrui Liu ⁵*

1. Department of Exercise Physiology, Kunsan National University, Gunsan, Republic of Korea
2. Yancheng Key Laboratory of Molecular Epigenetics, Yancheng Medical Research Center of Nanjing University Medical School, The First People’s Hospital of Yancheng, Yancheng, China
3. Department of Health and Physical Education, The Education University of Hong Kong, Tai Po, Hong Kong SAR, China
4. Department of Clinical Medical Research, Binhai County People’s Hospital, Binhai Clinical College, Yangzhou University Medical College, Yancheng, Jiangsu, China
5. Department of Physical Education, Kyungpook National University, Daegu, Republic of Korea

Abstract

Objective:

To investigate the association between physical inactivity and obesity–depression comorbidity (ODC), defined as the co-occurrence of obesity and depression, and to develop an effective screening tool for identifying high-risk individuals to facilitate early intervention.

Methods:

Data were obtained from 3,357 physically inactive adults enrolled in the Korea National Health and Nutrition Examination Survey (KNHANES, 2007–2012). An XGBoost machine learning framework was applied to develop predictive models. Feature selection was conducted using random forest, and the prediction mechanism was interpreted with SHAP values. The model was validated internally using KNHANES 2011–2012 data and externally with the U.S. NHANES dataset.

Results:

The XGBoost model demonstrated good discriminative performance in internal validation (AUC = 0.783 and 0.744) and achieved an external validation AUC of 0.886. Feature importance analysis revealed that insulin concentration, white blood cell count, and height were the primary predictors of ODC, with insulin exerting the strongest influence.

Conclusion:

This study developed a high-performing and interpretable prediction model for ODC risk. SHAP-based interpretation identified insulin as the most influential predictor within the model, suggesting that metabolic factors may be important for ODC risk stratification.

Introduction

The co-occurrence of obesity and depression, termed obesity-depression comorbidity (ODC), has emerged as a significant global public health challenge that is receiving growing attention from healthcare systems and society. Obesity is a chronic, multifactorial disease associated with metabolic dysregulation, cardiovascular pathology, and mental health disturbances. Depression, among the most prevalent affective disorders, has a strong bidirectional relationship with chronic somatic conditions, particularly obesity. As socioeconomic structures and lifestyles change, the prevalence of both conditions continues to rise in adult populations worldwide. Their comorbid presentation is increasingly frequent, substantially impairing patients’ quality of life and imposing growing economic burdens on healthcare systems and society. Although epidemiological associations between depression and obesity are well documented, the underlying mechanistic pathways and causal sequences remain inadequately characterized. The parallel increase in both conditions calls for interventions that target their interplay. In the present study, ODC was treated as a concurrent comorbidity outcome rather than a directional transition from obesity to depression or from depression to obesity.

Current therapeutic modalities are primarily pharmacological and psychological, yet these strategies are frequently limited in sustained efficacy and treatment adherence. Because depression and obesity have a multifactorial etiology involving genetic, environmental, and psychosocial determinants, new therapeutic approaches are needed to improve outcomes and establish personalized intervention pathways. Identifying the pivotal factors in ODC development is therefore important for optimizing interventions. Previous research has established physical inactivity as a shared risk factor for both obesity and depression, suggesting that increased physical activity may complement conventional treatment strategies. Nevertheless, although associations between exercise and these conditions are well characterized, substantial knowledge gaps persist concerning their interactions, particularly within large-scale, multi-cycle, population-based survey data. Prior cross-sectional analyses have not adequately explored the joint effects of physical activity, socioeconomic status, and dietary patterns, and have not systematically examined comorbidity mechanisms. Machine learning methods combined with explainable artificial intelligence algorithms such as SHAP (SHapley Additive exPlanations) have shown considerable potential in health risk assessment and disease mechanism research because they can model intricate relationships while preserving interpretability.

Given these research gaps—especially regarding multifactorial interactions and cross-sectional characteristics collected across multiple survey cycles—this study focuses on physically inactive adults. We employ an integrated machine learning and SHAP analytical framework to develop an interpretable risk prediction model for ODC and to quantify the relative contributions of key predictors at the model level. The model was specifically developed for risk stratification among physically inactive adults, a subgroup considered to be at elevated risk for obesity-depression comorbidity. Leveraging data from KNHANES (2007–2012), we examined how physical inactivity is associated with the co-occurrence of obesity and depression. Through integration of multidimensional data (demographic, behavioral, socioeconomic, clinical, and nutritional domains), this study identifies major predictors of ODC within the model and quantifies their relative contributions, thereby providing a basis for future risk stratification and hypothesis generation. Our principal objective was to address knowledge gaps in nonlinear pattern analysis and quantitative feature attribution via integration of machine learning and SHAP, thereby improving understanding of ODC-related risk patterns.

Methods

Data source and study population

Analytical data originated from the 2007–2012 Korea National Health and Nutrition Examination Survey (KNHANES) database administered by the Korea Centers for Disease Control and Prevention (KCDC). KNHANES is a nationally representative, continuous cross-sectional survey that uses multistage stratified cluster sampling. The survey comprises three modules: health examinations, nutrition assessments, and health interviews. The institutional review board of the KCDC granted ethical approval for the study protocol, and all participants provided written informed consent.

The initial analytical cohort included 50,405 KNHANES participants (2007–2012). Screening procedures first excluded individuals with incomplete physical activity documentation, yielding 8,263 eligible subjects. Subsequent exclusions comprised participants meeting adequate physical activity thresholds (defined in Section Definition of physically inactive population) and those aged <19 years, leaving 3,750 individuals. Following exclusion of 393 subjects with missing depression-related metrics, the final analytical sample encompassed 3,357 adults. To assess temporal robustness within the same survey framework, the 2011 (n = 477) and 2012 (n = 502) KNHANES subsets were reserved as temporally separated internal validation cohorts. The remaining participants from 2007 to 2010 (n = 2,378) were randomly partitioned into training (n = 1,665) and testing (n = 713) subsets at a 7:3 ratio. Figure 1 presents the comprehensive participant selection workflow.

FIGURE 1
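
The cohort flow and the 7:3 partition described above can be checked with a few lines of Python. This is a minimal sketch, not the authors' procedure: the function name `split_train_test`, the seed, and the use of integer row indices as participant IDs are all assumptions made for illustration.

```python
import random

def split_train_test(ids, train_fraction=0.7, seed=2026):
    """Shuffle participant IDs and split them into training and testing subsets.

    The seed is arbitrary (hypothetical); the source does not report one.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Counts reported in the text.
n_inactive_adults = 3750          # after activity and age exclusions
n_missing_depression = 393        # missing depression-related metrics
n_final = n_inactive_adults - n_missing_depression

n_2011, n_2012 = 477, 502         # temporally separated validation cohorts
n_development = n_final - n_2011 - n_2012

train_ids, test_ids = split_train_test(range(n_development))
print(n_final, n_development, len(train_ids), len(test_ids))
# prints: 3357 2378 1665 713
```

Rounding 0.7 × 2,378 = 1,664.6 to the nearest integer reproduces the reported training size of 1,665, leaving 713 for testing.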

Definition of physically inactive population

Consistent with World Health Organization (WHO) physical activity guidelines, this study evaluated individual activity levels using metabolic equivalent minutes per week (MET-min/week). Participants were excluded if essential data elements were missing, including the weekly frequency and average daily duration of walking, moderate-intensity exercise, and vigorous-intensity exercise. Referencing WHO recommendations, minimum effective session durations were set at ≥15 min for vigorous exercise and ≥10 min for moderate exercise or walking. To limit the influence of extreme values on cumulative activity calculations, session durations were winsorized at the 99th percentile. Following KNHANES methodology, physical activity (PA) was computed for each activity type as PA (MET-min/week) = MET coefficient × session duration (min) × weekly frequency, and the values were summed to give total weekly PA. In line with WHO standards, participants with PA < 600 MET-min/week were classified as “physically inactive,” whereas those with PA ≥ 600 MET-min/week were designated “physically active.”
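
The MET-min/week calculation can be sketched as follows. The MET coefficients are not stated in the text; the values below (3.3 for walking, 4.0 for moderate, 8.0 for vigorous activity) are the commonly used IPAQ-style coefficients and are an assumption of this example, as are the function names. Winsorization of session durations is omitted for brevity.

```python
# ASSUMPTION: IPAQ-style MET coefficients; the source does not list them.
MET = {"walking": 3.3, "moderate": 4.0, "vigorous": 8.0}

# Minimum effective session durations (minutes), as stated in the text.
MIN_SESSION_MIN = {"walking": 10, "moderate": 10, "vigorous": 15}

INACTIVE_THRESHOLD = 600  # MET-min/week, as stated in the text


def weekly_pa(activity):
    """Total weekly physical activity in MET-min/week.

    `activity` maps an activity type to (minutes per session, days per week).
    Sessions shorter than the minimum effective duration contribute nothing.
    """
    total = 0.0
    for kind, (minutes, days) in activity.items():
        if minutes >= MIN_SESSION_MIN[kind]:
            total += MET[kind] * minutes * days
    return total


def is_physically_inactive(activity):
    """True when total weekly PA falls below 600 MET-min/week."""
    return weekly_pa(activity) < INACTIVE_THRESHOLD


# Hypothetical participant: 20 min of walking on 5 days, 30 min of moderate
# exercise on 2 days, and vigorous sessions too short to count.
example = {"walking": (20, 5), "moderate": (30, 2), "vigorous": (10, 1)}
print(weekly_pa(example), is_physically_inactive(example))
```

Under these assumed coefficients the hypothetical participant accumulates 330 + 240 = 570 MET-min/week and is classified as physically inactive.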

Definition of obesity-depression comorbidity

This investigation used Asian-specific diagnostic criteria for obesity classification and the Patient Health Questionnaire-9 (PHQ-9) for depression assessment to define obesity-depression comorbidity (ODC). Obesity categorization combined body mass index (BMI) and waist circumference (WC) measurements: generalized obesity was defined as BMI ≥25 kg/m², and abdominal obesity was defined as WC ≥90 cm for males or ≥85 cm for females. Participants were consequently stratified into four mutually exclusive categories: 1) Non-obese (below threshold values for both BMI and WC); 2) Isolated abdominal obesity (WC exceeding threshold with subthreshold BMI); 3) Isolated generalized obesity (BMI exceeding threshold with subthreshold WC); and 4) Compound obesity (exceeding thresholds for both indices). For analytical purposes, categories 2–4 were collectively classified as obese. Depression status was assessed using the PHQ-9 total score (range: 0–27). Consistent with prior validation studies, a PHQ-9 score of ≥10 was used to indicate clinically significant depressive symptoms. Thus, in this study, depression was operationally defined on the basis of symptom screening rather than physician-diagnosed depression, and ODC was defined as the co-occurrence of obesity and a PHQ-9 score of ≥10.
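
The outcome definition can be expressed as a small labelling routine. This is an illustrative sketch: the function names and the `"male"`/`"female"` coding are assumptions, and ODC is taken to mean obesity (any of categories 2–4) together with a PHQ-9 score of at least 10, following the co-occurrence definition given in the abstract.

```python
def obesity_category(bmi, wc_cm, sex):
    """Assign one of the four mutually exclusive obesity categories."""
    wc_cutoff = 90 if sex == "male" else 85   # cm, sex-specific WC threshold
    generalized = bmi >= 25                   # kg/m^2
    abdominal = wc_cm >= wc_cutoff
    if generalized and abdominal:
        return "compound obesity"
    if generalized:
        return "isolated generalized obesity"
    if abdominal:
        return "isolated abdominal obesity"
    return "non-obese"


def has_odc(bmi, wc_cm, sex, phq9_total):
    """True when obesity and PHQ-9-defined depressive symptoms co-occur."""
    if not 0 <= phq9_total <= 27:
        raise ValueError("PHQ-9 total score must lie between 0 and 27")
    obese = obesity_category(bmi, wc_cm, sex) != "non-obese"
    depressed = phq9_total >= 10
    return obese and depressed


# Hypothetical participants.
print(obesity_category(23.8, 87.0, "female"))   # isolated abdominal obesity
print(has_odc(23.8, 87.0, "female", 12))        # True
print(has_odc(26.1, 92.0, "male", 4))           # False
```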

Candidate predictor variables

Based on existing literature and clinical expertise, this study incorporated multiple classes of potential predictor variables relevant to depression-obesity comorbidity, comprising: Demographic and sociological characteristics (sex, age, household income, educational attainment, marital status); health status indicators and disease history (hypertension, dyslipidemia, stroke, myocardial infarction, arthritis, diabetes, smoking status, alcohol consumption); clinical signs and laboratory parameters (systolic blood pressure, diastolic blood pressure, height, fasting glucose, insulin, total cholesterol, high-density lipoprotein cholesterol (HDL-C), triglycerides, hematocrit, ferritin, serum creatinine, vitamin D, white blood cell count, red blood cell count, platelet count); dietary intake metrics derived from 24-h recall (total food mass, total energy intake, water consumption, protein, fat, carbohydrates, calcium, phosphorus, iron, sodium, potassium, vitamin A, β-carotene, retinol, thiamine, riboflavin, niacin, vitamin C).

Data preprocessing and machine learning modeling

The initial dataset contained 46 predictor variables (12 categorical, 34 continuous). To develop robust, generalizable prediction models, systematic data preprocessing and modeling procedures were implemented. Samples with missing values were excluded in a complete-case analysis, rather than being imputed, to avoid introducing additional model-based uncertainty across heterogeneous demographic, clinical, and nutritional variables during preprocessing. Subsequently, three feature selection methods (logistic regression, LASSO regression, and random forest) were applied to identify optimal predictive feature subsets. Random forest was selected for final feature subset construction based on five-fold cross-validated area under the receiver operating characteristic curve (AUC-ROC) performance in the training set. SHAP values were used only after final model development to interpret variable contributions within the XGBoost model; therefore, SHAP-based importance rankings were not intended to replicate the feature selection results obtained in the preprocessing stage. Pearson correlation coefficients were computed to address multicollinearity: when a pairwise correlation exceeded 0.8, the variable with the stronger outcome association was retained. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied exclusively to the training set to improve recognition of the minority ODC-positive class. No oversampling was performed in the internal test or external validation datasets, thereby reducing the risk of information leakage and overly optimistic performance estimates.
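
The multicollinearity rule (drop one member of any pair with correlation above 0.8, keeping the one more strongly associated with the outcome) can be sketched as below. The greedy ordering, the use of absolute Pearson correlation as the measure of outcome association, and the toy feature names are assumptions of this example; the source does not specify these details.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def prune_collinear(features, outcome, threshold=0.8):
    """Return the feature names kept after correlation-based pruning.

    Features are visited from strongest to weakest outcome association, so
    the stronger member of any highly correlated pair is the one retained.
    """
    strength = {name: abs(pearson(values, outcome))
                for name, values in features.items()}
    kept = []
    for name in sorted(features, key=strength.get, reverse=True):
        if all(abs(pearson(features[name], features[k])) <= threshold
               for k in kept):
            kept.append(name)
    return sorted(kept)


# Toy data (invented): feat_a and feat_b are nearly collinear.
toy = {
    "feat_a": [1800, 2000, 2200, 2400, 2600, 2800],
    "feat_b": [250, 290, 310, 350, 370, 410],
    "feat_c": [5, 12, 7, 15, 6, 14],
}
outcome = [0, 1, 0, 1, 0, 1]
print(prune_collinear(toy, outcome))
```

In the toy data `feat_a` and `feat_b` correlate at about 0.996, and `feat_b` has the stronger association with the outcome, so `feat_a` is dropped.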

Multiple candidate models were trained and compared using SMOTE-processed data, including logistic regression, random forest, XGBoost, decision tree, naïve Bayes, K-nearest neighbors, and radial basis function (RBF) kernel support vector machines (SVM). Hyperparameters were optimized via grid search with five-fold cross-validation on the training set. Following performance comparisons, the optimally parameterized XGBoost model was selected for final evaluation on the temporally separated internal validation sets (2011 and 2012 data) to assess generalizability and stability. SMOTE application was restricted to training data construction, while the test and validation sets retained their original class distributions to prevent information leakage and ensure objective evaluation.
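
The core idea of SMOTE, interpolating between a minority-class sample and one of its nearest minority-class neighbours, can be illustrated in plain Python. This is a simplified sketch for intuition only, not the DMwR implementation used in the study; the function name, `k`, and the seed are assumptions, and it handles continuous features only.

```python
import random
from math import dist


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by neighbour interpolation.

    `minority` is a list of numeric tuples (training-set minority class only).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class with two features (invented values).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority samples, oversampling must be fitted on the training partition alone; applying it before the split would let information from held-out samples leak into training.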

External validation strategy

To objectively evaluate model generalizability, the 2005–2020 U.S. National Health and Nutrition Examination Survey (NHANES) cohort served as an independent external validation dataset. Applying identical inclusion/exclusion criteria as KNHANES produced a validation cohort of 2,070 participants (demographic characteristics in Supplementary Table S1). NHANES was selected as an accessible and well-characterized independent population-based dataset for external testing; however, it was not intended to represent the most culturally or clinically comparable population to South Korea. We did not perform U.S.-specific recalibration. Instead, the NHANES analysis was intended to assess the external discrimination and transportability of the KNHANES-derived model in an independent population setting. In the external validation cohort, obesity was defined according to U.S.-appropriate criteria to preserve the clinical relevance of outcome ascertainment in that population. Within this cohort, XGBoost performance was benchmarked against conventional algorithms including logistic regression, SVM, and random forest, primarily using area under the receiver operating characteristic curve (AUC) for discriminative ability assessment. To interpret the optimal model’s (XGBoost) prediction patterns and identify influential predictors, mean absolute SHAP values were computed across the validation cohort, ensuring interpretative consistency and local explanation accuracy.

Statistical analysis

Binary prediction performance for ODC was comprehensively evaluated through systematic comparison of multiple machine learning algorithms: XGBoost, decision tree, logistic regression, naïve Bayes, K-nearest neighbors, random forest, and RBF-kernel SVM. Performance was assessed multidimensionally: Fundamental metrics included error rate and accuracy; class imbalance was addressed via Fβ-score (integrating precision and recall); discriminatory capacity was measured by AUC, sensitivity, and specificity; precision-recall balance was evaluated via precision-recall AUC (PR AUC). Calibration curves assessed agreement between predicted probabilities and observed event rates. Decision curve analysis (DCA) quantified clinical utility by comparing net benefits across decision thresholds. Following comparative evaluation using these metrics, the best-performing XGBoost model underwent final validation and interpretability analysis. SHAP (SHapley Additive exPlanations) values enabled quantification of individualized feature contributions to ODC predictions, enhancing model interpretability. Finally, an interactive online risk prediction tool was developed using the R Shiny framework based on the validated XGBoost architecture. All analyses were conducted in R (version 4.4.1) employing critical packages: DMwR, ggcor, mlr3, mlr3benchmark, mlr3extralearners, kernelshap, and shapviz. Two-sided statistical tests were applied with significance defined as p < 0.05.
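
Several of the metrics listed above have short closed forms. The sketch below implements the Fβ-score, the Brier score, and the net benefit used in decision curve analysis, where net benefit at threshold t is TP/n − (FP/n) × t/(1 − t). The function names and toy data are assumptions; the value of β used in the study is not reported, so β = 1 is used only as a default.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn), calling probabilities >= threshold positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    return (1 + b2) * precision * recall / denominator if denominator else 0.0


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone at or above the threshold."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy predictions (invented) evaluated at the 0.1 screening threshold.
y_true = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
y_prob = [0.6, 0.2, 0.05, 0.15, 0.3, 0.02, 0.08, 0.12, 0.04, 0.01]
tp, fp, tn, fn = confusion_counts(y_true, y_prob, 0.1)
print(tp, fp, tn, fn)  # 2 3 4 1
print(round(net_benefit(y_true, y_prob, 0.1), 4))
```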

Results

Baseline characteristics of the study population

The final analytical cohort comprised 3,357 physically inactive adult participants, stratified into training (n = 1,665), testing (n = 713), and two temporally independent internal validation cohorts (2011: n = 477; 2012: n = 502). Baseline characteristic analysis revealed no statistically significant differences in core demographic and clinical variables—including gender distribution, household income, educational attainment, marital status, and chronic disease history—across datasets (p > 0.05), with all standardized mean differences (SMD) below the 0.1 threshold. Though statistically significant variations existed for select metabolic and nutritional indicators (p < 0.05), their SMD values remained below the 0.3 benchmark for clinical relevance. Compared to the training set, validation cohorts exhibited elevated high-density lipoprotein cholesterol (HDL-C) and hematocrit concentrations, while demonstrating significantly reduced vitamin D levels and diminished dietary intakes of carbohydrates, phosphorus, sodium, and potassium (all p < 0.01; SMDs < 0.227). Statistically significant but clinically marginal differences were also observed for diastolic blood pressure, height, insulin concentrations, and alcohol consumption patterns (p < 0.05; SMDs<0.106). With all variables exhibiting SMDs below 0.3, the datasets demonstrated satisfactory clinical comparability for subsequent machine learning modeling and validation procedures (complete data in Supplementary Table S2).
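
For a continuous variable, a standardized mean difference can be computed as the absolute difference in means divided by a pooled standard deviation. The source does not state which SMD formula was used, so the version below (square root of the average of the two sample variances) is an assumption, and the data are invented.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Absolute SMD for a continuous variable measured in two groups."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return abs(mean(group_a) - mean(group_b)) / pooled_sd


# Toy values (invented) for one variable in two cohorts.
training = [50, 52, 54, 56, 58]
validation = [51, 53, 55, 57, 59]
smd = standardized_mean_difference(training, validation)
print(round(smd, 3))
```

Unlike a p value, the SMD does not shrink or grow with sample size, which is why it is used here alongside significance tests to judge whether between-cohort differences are large enough to matter.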

To elucidate geographical heterogeneity in obesity-depression comorbidity (ODC) burden, we generated a spatial heatmap depicting ODC prevalence probabilities across Korean administrative regions (Figure 2). Results revealed marked epidemiological disparities: South Gyeongsang Province exhibited the highest prevalence (10.7%), followed sequentially by South Chungcheong, North Jeolla, Jeju Island, and North Gyeongsang (all >6.0%). Remaining regions demonstrated relatively uniform distribution patterns. This geographical stratification establishes critical context for subsequent subgroup analyses and informs assessments of model generalizability across diverse populations.

FIGURE 2

Feature selection

To construct an optimal predictive feature subset, we systematically evaluated three feature selection methodologies—logistic regression, LASSO regression, and random forest—on the training cohort (Figure 3A). Comparative assessment using five-fold cross-validated area under the receiver operating characteristic curve (AUC-ROC) demonstrated the random forest approach achieving superior discriminatory capacity (AUC = 0.721), significantly outperforming LASSO regression (AUC = 0.698) and logistic regression (AUC = 0.674) (Figure 3B). Consequently, random forest was selected as the definitive feature selection technique. Variable importance ranking yielded the top 30 predictors (Figure 3D), encompassing metabolic biomarkers, hematological indices, nutritional parameters, and demographic characteristics. To mitigate multicollinearity effects, Pearson correlation matrices were computed (Supplementary Figure S1). For feature pairs exhibiting correlation coefficients >0.8, we retained variables demonstrating stronger associations with the outcome (ODC), excluding six redundant parameters: serum creatinine, vitamin A, water intake, carbohydrate intake, total energy intake, and hematocrit. This refinement process yielded a final feature set comprising 24 core variables for model construction (Figure 3C). These results reflect the performance of alternative feature selection strategies when all candidate variables were initially entered for screening, with the aim of identifying the most suitable method for constructing an optimal predictor subset. Thus, the logistic regression result shown in Figure 3 represents a preliminary feature selection performance rather than the final performance of a logistic regression classifier.

FIGURE 3

Model performance comparison and optimization

After the optimal feature selection method was determined and the retained variables were used to construct the final dataset, seven machine learning classifiers were compared to identify the best-performing predictive model. Algorithm comparison (Supplementary Figure S2; Table 1) showed that the XGBoost and random forest models substantially outperformed the alternatives. Both attained accuracy exceeding 98%, ROC AUC surpassing 0.98, sensitivity approaching 100%, specificity exceeding 92%, and precision-recall AUC (PR AUC) above 0.97 when evaluated on the SMOTE-processed training data (Figure 4). Their respective Brier scores, 0.0240 (ranked first) and 0.0244 (ranked second), indicated the best probability calibration among the candidates. Moderate performance was observed for K-nearest neighbors and support vector machines, while decision trees, naive Bayes, and logistic regression showed markedly inferior metrics (notably Brier scores >0.14, specificity <75%, and PR AUC <0.6), suggesting inadequate predictive stability (detailed metrics in Supplementary Table S3). Because these figures were obtained on oversampled training data, they are likely optimistic relative to performance on unseen data. On the independent test set, which preserved the original class distribution, the XGBoost model achieved an AUC of 0.750 (Supplementary Table S4), indicating acceptable discriminatory power. Following clinical sensitivity optimization using a 0.1 decision threshold, the model attained a recall rate of 65.38% (95% CI: 54.2%–75.4%), though precision remained constrained at 8.46%, reflecting the sensitivity-precision trade-off inherent in severely imbalanced data (F1 score = 0.150; accuracy = 72.93%). This sensitivity-oriented threshold was therefore associated with a low positive predictive value, indicating a substantial false-positive burden and limiting the model’s suitability as a standalone clinical screening tool. The test set confusion matrix further details the classification performance of the optimized XGBoost model (Supplementary Table S5).
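
The reported F1 score follows directly from the reported recall and precision, and the same two numbers quantify the false-positive burden. A short check, using only figures stated in the paragraph above:

```python
recall = 0.6538      # sensitivity at the 0.1 threshold (test set)
precision = 0.0846   # positive predictive value at the same threshold

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Number of false positives flagged for every true positive detected.
false_positives_per_true_positive = (1 - precision) / precision

print(round(f1, 3), round(false_positives_per_true_positive, 1))
# prints: 0.15 10.8
```

At this threshold roughly eleven people without ODC would be flagged for each true case identified, which is the practical meaning of the low positive predictive value.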

TABLE 1

Model	Error Rate	Accuracy	F-beta	ROC AUC	Sensitivity	Specificity	PR AUC
Random forest	0.0122	0.9878	0.9928	0.9895	1.0000	0.9212	0.9783
XGBoost	0.0186	0.9814	0.9890	0.9883	0.9887	0.9418	0.9715
K-nearest neighbors	0.0441	0.9559	0.9734	0.9507	0.9523	0.9760	0.7624
SVM (RBF)	0.0939	0.9061	0.9459	0.9184	0.9711	0.5514	0.7617
Decision tree	0.1019	0.8981	0.9400	0.8640	0.9447	0.6438	0.5958
Naive Bayes	0.1539	0.8461	0.9063	0.8864	0.8807	0.6575	0.5950
Logistic regression	0.1576	0.8424	0.9126	0.7661	0.9742	0.1233	0.3469

Performance evaluation results of seven machine learning models (South Korea, 2007–2012).

Abbreviations: XGBoost, eXtreme Gradient Boosting; SVM (RBF), Support Vector Machine (Radial Basis Function); F-beta, harmonic mean of precision and recall with adjustable weighting toward recall; ROC AUC, area under the receiver operating characteristic curve; PR AUC, area under the precision-recall curve.

FIGURE 4

Systematic evaluation of seven machine learning algorithms (Supplementary Figure S2) employed SMOTE-oversampled training data (ODC positive:negative ratio = 1:5). Through grid search coupled with five-fold cross-validation for hyperparameter optimization, the optimal configuration was determined (learning rate eta = 0.1, max_depth = 6, subsample = 0.8, lambda = 1.0), with early stopping regularization controlling overfitting. As documented in Supplementary Table S6, peak performance occurred at the 44th iteration (test set AUC = 0.750), establishing XGBoost as the definitive predictive framework.

Internal and external model validation

To assess generalizability and clinical utility, the XGBoost model was validated in two temporally distinct internal cohorts (2011: n = 477; 2012: n = 502). As presented in Supplementary Table S7, the model showed acceptable discrimination over time (Supplementary Figure S3): ROC curve analysis yielded AUC values of 0.783 (95% CI: 0.702–0.864) for the 2011 cohort and 0.744 (95% CI: 0.652–0.835) for the 2012 cohort. A low decision threshold (0.1) optimized for screening sensitivity achieved detection rates of 84.2% (95% CI: 73.1%–91.4%) and 70.0% (95% CI: 55.9%–81.2%) in the respective cohorts, capturing over two-thirds of true positive cases in each, which meets a fundamental requirement for early screening instruments.
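
The two quantities reported for each validation cohort, ROC AUC and sensitivity at a fixed threshold, can be computed from predicted probabilities as sketched below. The AUC uses its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. Function names and toy data are assumptions for illustration.

```python
def roc_auc(y_true, y_score):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def sensitivity_at(y_true, y_score, threshold):
    """Share of true positive cases with a score at or above the threshold."""
    positive_scores = [s for y, s in zip(y_true, y_score) if y == 1]
    detected = sum(1 for s in positive_scores if s >= threshold)
    return detected / len(positive_scores)


# Toy cohort (invented): three ODC-positive and four ODC-negative adults.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_score = [0.40, 0.12, 0.05, 0.30, 0.08, 0.03, 0.02]
print(roc_auc(y_true, y_score))               # 0.75
print(sensitivity_at(y_true, y_score, 0.1))   # 2 of 3 positives detected
```

The AUC does not depend on the decision threshold, whereas sensitivity does, which is why the text reports a threshold-free AUC alongside detection rates at the 0.1 cut-off.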

In external validation using the U.S. NHANES cohort, the XGBoost model achieved an AUC of 0.886, allowing direct comparison with the internal validation performance reported above. The random forest classifier ranked second (AUC = 0.858), followed by the radial basis function (RBF) kernel support vector machine (AUC = 0.831). The remaining models showed comparatively limited predictive capacity: K-nearest neighbors (AUC = 0.795), logistic regression (AUC = 0.778), naive Bayes (AUC = 0.759), and decision tree (AUC = 0.667); see Supplementary Table S8 for details. Comparative ROC curves are depicted in Supplementary Figure S4.

SHAP interpretability analysis

SHAP (SHapley Additive exPlanations) was used to interpret the XGBoost model’s prediction patterns. Feature importance analysis (Figure 5) identified insulin concentration as the predominant predictor of obesity-depression comorbidity (mean |SHAP| = 0.052), exerting substantially greater influence than secondary contributors: white blood cell count (0.036), height (0.027), ferritin (0.023), HDL-C (0.021), and age (0.012). SHAP summary plots indicated that elevated insulin, advanced age, increased systolic blood pressure, and higher white blood cell counts were associated with higher predicted ODC probability, whereas greater height and elevated HDL-C concentrations were associated with lower predicted ODC probability. Supplementary Figure S5 provides additional SHAP dependence visualizations.
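
Global importance of the kind reported above is obtained by averaging the absolute per-participant SHAP values of each feature. The sketch below shows only that aggregation step on an invented SHAP matrix; computing the SHAP values themselves (done in the study with the kernelshap and shapviz R packages) is outside its scope, and the function name is hypothetical.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value, largest first.

    `shap_rows` holds one row of per-feature SHAP values per participant.
    """
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Toy SHAP matrix (invented values, three participants, three features).
features = ["insulin", "wbc", "height"]
shap_rows = [
    [0.06, -0.03, -0.02],
    [-0.05, 0.04, 0.03],
    [0.04, -0.04, 0.03],
]
for name, value in mean_abs_shap(shap_rows, features):
    print(name, round(value, 4))
```

Taking absolute values before averaging matters: a feature that pushes predictions strongly upward for some participants and strongly downward for others would otherwise average out to near zero despite being influential. The direction of effect is read separately from summary and dependence plots.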

FIGURE 5

External validation SHAP analysis using the NHANES cohort further characterized the model’s feature contribution patterns (Supplementary Figure S6). Insulin was again the most influential predictive variable, with a mean absolute SHAP value substantially exceeding those of other features, underscoring its centrality in model discrimination. Age, retinol, height, and fasting glucose (fglu) followed in descending order of importance. Insulin, height, and age also ranked among the leading predictors in the KNHANES-derived SHAP results, indicating partial concordance between cohorts. This cross-cohort reproducibility supports model credibility and suggests that integrated metabolic, nutritional, and developmental features may provide a robust predictive foundation across populations.

Online prediction tool demonstration

Based on the rigorously validated XGBoost framework, we developed a clinically oriented online prediction instrument (accessible at https://zhlapp.shinyapps.io/Korea_ODC-shap-model/). Implemented via the R Shiny platform, this tool provides interactive risk assessment functionality, enabling healthcare practitioners to input 24 core indicators—including insulin concentration, white blood cell count, height, and age—for real-time ODC risk stratification, as visually demonstrated in Supplementary Figure S7.

Discussion

This investigation established an XGBoost-based machine learning framework for predicting obesity-depression comorbidity (ODC) risk among physically inactive adults using KNHANES data. SHAP methodology provided interpretable information on key predictors and their interactions within the model. The principal findings are as follows. (1) The XGBoost model showed acceptable discriminatory capacity in internal validation (2011 and 2012 KNHANES cohorts; AUCs = 0.783 and 0.744, respectively) and higher discrimination in an independent NHANES validation cohort (AUC = 0.886), where it outperformed the comparator models, supporting its potential utility for early ODC detection. (2) SHAP interpretability analysis identified insulin concentration as the predominant ODC predictor (highest mean absolute SHAP value) in both cohorts, followed by white blood cell count, height, ferritin, HDL-C, and age in KNHANES and by age, retinol, height, and fasting glucose in NHANES, highlighting the roles of metabolic regulation, nutritional status, and developmental indicators. Given the cross-sectional design of the present study, however, these findings should be interpreted as associations with model prediction rather than evidence of temporal or causal pathways underlying ODC. (3) Marked geographical heterogeneity in ODC prevalence across South Korean regions (e.g., peak prevalence of 10.7% in South Gyeongsang) provides an epidemiological basis for targeted public health initiatives. (4) Integration of machine learning with SHAP methodology quantified the individualized contributions of multidimensional features (demographic, clinical, nutritional) to ODC risk and described their nonlinear association patterns.

SHAP analysis consistently identified insulin concentration as the most influential ODC predictor across both the internal (KNHANES) and external (NHANES) validation cohorts. Insulin resistance, a core pathophysiological feature of obesity, has been mechanistically linked to depressive symptomatology in prior research. Hyperinsulinemia and impaired insulin signaling may promote emotional dysregulation through disruptions in central neurotransmitter metabolism (e.g., dopamine), neuroplasticity, and hypothalamic-pituitary-adrenal (HPA) axis function. By quantifying insulin’s centrality in ODC risk prediction through machine learning, this study highlights a potentially important association between metabolic dysregulation and metabolic-mental health comorbidity. The substantial contribution of white blood cell count (second-highest SHAP value in KNHANES) suggests that systemic low-grade inflammation may be associated with the co-occurrence of obesity and depression, though specific inflammatory biomarkers were not directly assayed. The positive association with advancing age may reflect cumulative effects of chronic disease burden, psychosocial stressors, or physiological decline. Conversely, the inverse associations observed for greater height and elevated HDL-C concentrations may reflect differences in growth-related exposures and cardiometabolic health status in relation to ODC. These predictors suggest that ODC is associated with complex multisystem patterns spanning metabolic, inflammatory, and developmental domains.

This study supports the value of machine learning algorithms, particularly XGBoost, for predicting complex outcomes such as ODC that involve nonlinear interactions among demographic, behavioral, metabolic, and nutritional determinants [–]. Compared with conventional regression approaches, XGBoost more effectively captured intricate patterns and interaction effects within high-dimensional data, achieving better discriminatory performance in both internal and external validation (AUC > 0.74). Crucially, by integrating SHAP (SHapley Additive exPlanations), an explainable artificial intelligence (XAI) technique, we were able to clarify the decision logic of this model [, ]. SHAP values quantified individualized predictor contributions to ODC risk (e.g., insulin’s dominant role), and dependence plots visualized nonlinear relationships between key variables (e.g., insulin, age, systolic blood pressure, white blood cell count) and predicted disease probability. This combination enhances model transparency and clinical interpretability by helping translate risk scores into interpretable prediction patterns. The resulting online prediction tool (Supplementary Figure S7) may support clinical practice by helping identify high-risk individuals and informing further clinical assessment. In practice, clinicians could enter routinely available demographic, clinical, and laboratory variables into the web-based interface to obtain an individualized predicted risk of obesity-depression comorbidity. This output may be used to support preliminary risk stratification and to identify patients who may benefit from further psychological or metabolic assessment, rather than to establish a diagnosis independently.
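
To illustrate what a SHAP attribution represents, the sketch below computes exact Shapley values for a small hypothetical risk function by enumerating all feature coalitions, with absent features set to the values of a reference individual. This brute-force approach, the `toy_risk` function, and all numbers are assumptions chosen for illustration. They are not the study's XGBoost model, and SHAP implementations for tree ensembles typically rely on faster tree-specific algorithms rather than enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List


def shapley_values(
    model: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley attributions of model(x) relative to model(baseline).

    A feature that is absent from a coalition takes its baseline value.
    Cost grows as 2**n_features, so this is only for tiny illustrations.
    """
    names: List[str] = list(x)
    n = len(names)

    def value(coalition: frozenset) -> float:
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi: Dict[str, float] = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_risk(f: Dict[str, float]) -> float:
    # Hypothetical score with an insulin-by-age interaction; not the study model.
    return 0.02 * f["insulin"] + 0.01 * f["wbc"] + 0.001 * f["insulin"] * f["age"]


person = {"insulin": 20.0, "wbc": 8.0, "age": 60.0}
reference = {"insulin": 8.0, "wbc": 6.0, "age": 40.0}
phi = shapley_values(toy_risk, person, reference)

# Rank features by absolute contribution, largest first.
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:8s} {contribution:+.3f}")
```

The attributions sum to the difference between the individual's score and the reference score, and the contribution of the insulin-by-age interaction is shared between those two features. Averaging the absolute attributions over many individuals gives a global ranking of the kind summarized by mean absolute SHAP values.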

This investigation presents the first geographical heatmap of ODC risk distribution across South Korea (Figure 2), revealing substantial regional heterogeneity (highest burden in South Gyeongsang). Such disparities may originate from inter-regional variations in socioeconomic status, healthcare access, cultural practices (e.g., dietary habits, physical activity norms), or environmental exposures. These findings provide critical epidemiological foundations for South Korea and comparable settings to develop regionalized precision prevention strategies. In high-prevalence regions (e.g., South Gyeongsang, South Chungcheong), community health initiatives should prioritize physical activity promotion, nutritional quality improvement, and enhanced access to integrated metabolic-mental health screening services.

Study limitations

Several methodological constraints warrant acknowledgment. First, the development dataset was based on KNHANES 2007–2012, and temporal changes in lifestyle patterns, obesity prevalence, mental health awareness, and public health policies over the past decade may limit the model’s direct applicability to contemporary populations. Future studies should therefore assess the temporal transportability of the model using more recent datasets and update or recalibrate it as needed. Second, depression was defined using PHQ-9 screening rather than a structured clinical diagnosis. Third, missing data were handled using complete-case analysis without imputation. Although this approach avoided additional assumptions introduced by imputation models, it reduced the effective sample size and may have introduced selection bias if the missingness mechanism was not completely random. Future studies should evaluate the robustness of the findings using multiple imputation or other sensitivity analyses. Fourth, although the model performed robustly in the NHANES validation cohort, population-specific genetic backgrounds, cultural contexts, and social structures in South Korea may limit global generalizability, necessitating further validation across diverse populations. Future research should prioritize validation and recalibration in more comparable East Asian populations (such as Chinese or Japanese populations). Fifth, feature engineering excluded highly correlated variables (r > 0.8); while statistically justified, this process may have omitted biologically relevant indicators (e.g., vitamin A, hematocrit). Sixth, although the model achieved acceptable sensitivity at the optimized decision threshold of 0.1, its precision remained relatively low, indicating a substantial false-positive burden. In practical clinical settings, this may reduce efficiency, contribute to alert fatigue among clinicians, and lead to unnecessary follow-up assessments or anxiety in individuals incorrectly classified as high risk. Therefore, the model should be regarded as a preliminary risk stratification tool rather than a standalone screening or diagnostic instrument. Finally, although obesity in the NHANES cohort was defined using U.S.-appropriate criteria, the absence of U.S.-specific recalibration means that the external validation results should still be interpreted primarily in terms of discrimination and transportability rather than calibration equivalence.
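
The trade-off between sensitivity and precision at a low decision threshold can be made concrete with a short sketch. The example below, written in Python, flags individuals whose predicted risk is at or above a threshold and reports the resulting metrics. The function name `metrics_at_threshold` and the 20 toy individuals (two with ODC) are hypothetical and do not reproduce the study's data or results; only the threshold of 0.1 is taken from the text.

```python
from typing import Dict, Sequence


def metrics_at_threshold(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity and precision when risk >= threshold is flagged."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        flagged = p >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "false_positives": float(fp),
    }


# Hypothetical cohort: 2 individuals with ODC followed by 18 without.
labels = [1, 1] + [0] * 18
probs = [0.35, 0.12,
         0.30, 0.22, 0.15, 0.13, 0.11, 0.09, 0.08, 0.07, 0.06,
         0.05, 0.05, 0.04, 0.03, 0.03, 0.02, 0.02, 0.01, 0.01]

low = metrics_at_threshold(labels, probs, threshold=0.1)
high = metrics_at_threshold(labels, probs, threshold=0.25)
print("threshold 0.10:", low)
print("threshold 0.25:", high)
```

In this toy cohort the lower threshold captures both true cases but also flags five individuals without ODC, whereas the higher threshold halves sensitivity while raising precision. With a low-prevalence outcome, even a model with good discrimination can therefore generate more false alerts than true ones, which is why follow-up assessment rather than diagnosis is the appropriate use of a flagged result.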

Conclusion

Utilizing a large-scale population-based dataset, this study developed and validated an interpretable XGBoost model for ODC risk prediction in physically inactive adults. SHAP analysis identified insulin as the most influential predictor within the model, indicating that metabolic variables may play an important role in model-based risk stratification. However, these feature-attribution results reflect predictive relevance within the algorithm rather than confirmed biological mechanisms. Accordingly, the findings should be interpreted as hypothesis-generating and supportive of further prospective, experimental, and interventional studies.

Statements

Data availability statement

The data used in this study are publicly available and can be freely downloaded from the KNHANES website (https://knhanes.kdca.go.kr/).

Ethics statement

The Korea National Health and Nutrition Examination Survey (KNHANES) was approved by the Institutional Review Board (IRB) of the Korea Centers for Disease Control and Prevention. Written informed consent was obtained from all participants. This study was conducted in accordance with the ethical principles of the Declaration of Helsinki for medical research involving human subjects.

Author contributions

Conceptualization: YS and ZL; Methodology: YS; Data curation: YS; Formal analysis: ZL; Writing (original draft): KJ; Visualization: FC and JZ; Writing (review and editing): YC and KW; Supervision: FC; Funding acquisition: YZ. All authors contributed to the article and approved the submitted version.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the National Natural Science Foundation of China (No. 82300780), the Natural Science Foundation of Jiangsu Province (No. BK20220306), and the Yancheng Key Research and Development Plan (Social Development) Project (No. YCBE202214).

Acknowledgments

The authors thank colleagues for their contributions.

Conflict of interest

The authors declare that they do not have any conflicts of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.ssph-journal.org/articles/10.3389/ijph.2026.1609153/full#supplementary-material

References

1. Perdomo CM, Cohen RV, Sumithran P, Clément K, Frühbeck G. Contemporary medical, device, and surgical therapies for obesity in adults. Lancet (2023) 401(10382):1116–30. 10.1016/S0140-6736(22)02403-5
2. Richardson E, Patterson R, Meltzer-Brody S, McClure R, Tow A. Transformative therapies for depression: postpartum depression, major depressive disorder, and treatment-resistant depression. Annu Rev Med (2025) 76(1):81–93. 10.1146/annurev-med-050423-095712
3. Gerardo G, Peterson N, Goodpaster K, Heinberg L. Depression and obesity. Curr Obes Rep (2025) 14(1):5. 10.1007/s13679-024-00603-x
4. Zou Y, Pitchumoni CS. Obesity, obesities and gastrointestinal cancers. Dis Mon (2023) 69(12):101592. 10.1016/j.disamonth.2023.101592
5. Fabricatore AN, Wadden TA. Obesity. Annu Rev Clin Psychol (2006) 2:357–77. 10.1146/annurev.clinpsy.2.022305.095249
6. Lin Z, Lawrence WR, Huang Y, Lin Q, Gao Y. Classifying depression using blood biomarkers: a large population study. J Psychiatr Res (2021) 140:364–72. 10.1016/j.jpsychires.2021.05.070
7. Milaneschi Y, Simmons WK, van Rossum EFC, Penninx BW. Depression and obesity: evidence of shared biological mechanisms. Mol Psychiatry (2019) 24(1):18–33. 10.1038/s41380-018-0017-5
8. Park JH, Moon JH, Kim HJ, Kong MH, Oh YH. Sedentary lifestyle: overview of updated evidence of potential health risks. Korean J Fam Med (2020) 41(6):365–73. 10.4082/kjfm.20.0165
9. Moulton CD, Tharmaraja T, Hopkins CWP. Collaborative care for adults with obesity and depression. JAMA (2019) 322(4):367–8. 10.1001/jama.2019.6774
10. Lasserre AM, Glaus J, Vandeleur CL, Marques-Vidal P, Vaucher J, Bastardot F, et al. Depression with atypical features and increase in obesity, body mass index, waist circumference, and fat mass: a prospective, population-based study. JAMA Psychiatry (2014) 71(8):880–8. 10.1001/jamapsychiatry.2014.411
11. Marchitelli S, Mazza C, Ricci E, Faia V, Biondi S, Colasanti M, et al. Identification of psychological treatment dropout predictors using machine learning models on Italian patients living with overweight and obesity ineligible for bariatric surgery. Nutrients (2024) 16(16):2605. 10.3390/nu16162605
12. Otte C, Chae WR, Dogan DY, Piber D, Roepke S, Cho AB, et al. Simvastatin as add-on treatment to escitalopram in patients with major depression and obesity: a randomized clinical trial. JAMA Psychiatry (2025) 82(8):759–67. 10.1001/jamapsychiatry.2025.0801
13. Jitte S, Keluth S, Bisht P, Wal P, Singh S, Murti K, et al. Obesity and depression: common link and possible targets. CNS Neurol Disord Drug Targets (2024) 23(12):1425–49. 10.2174/0118715273291985240430074053
14. Pérez-Gutiérrez AM, Carmona R, Loucera C, Cervilla JA, Gutiérrez B, Molina E, et al. Mutational landscape of risk variants in comorbid depression and obesity: a next-generation sequencing approach. Mol Psychiatry (2024) 29(11):3553–66. 10.1038/s41380-024-02609-2
15. Hruby A, Manson JE, Qi L, Malik VS, Rimm EB, Sun Q, et al. Determinants and consequences of obesity. Am J Public Health (2016) 106(9):1656–62. 10.2105/AJPH.2016.303326
16. Casanova F, O'Loughlin J, Karageorgiou V, Beaumont RN, Bowden J, Wood AR, et al. Effects of physical activity and sedentary time on depression, anxiety and well-being: a bidirectional Mendelian randomisation study. BMC Med (2023) 21(1):501. 10.1186/s12916-023-03211-z
17. Curtiss J, DiPietro C. Machine learning in the prediction of treatment response for emotional disorders: a systematic review and meta-analysis. Clin Psychol Rev (2025) 120:102593. 10.1016/j.cpr.2025.102593
18. Qi X, Wang S, Fang C, Jia J, Lin L, Yuan T. Machine learning and SHAP value interpretation for predicting comorbidity of cardiovascular disease and cancer with dietary antioxidants. Redox Biol (2025) 79:103470. 10.1016/j.redox.2024.103470
19. Lee HA, Kim HR, Park H, Jung SY, Jeon JP, Park B, et al. Data resource profile: the statistics of the Korea National Health and Nutrition Examination Survey (KNHANES) biobank project. J Korean Med Sci (2025) 40(23):e189. 10.3346/jkms.2025.40.e189
20. Kweon S, Kim Y, Jang MJ, Kim Y, Kim K, Choi S, et al. Data resource profile: the Korea National Health and Nutrition Examination Survey (KNHANES). Int J Epidemiol (2014) 43(1):69–77. 10.1093/ije/dyt228
21. Gu H, Chen R, Fang T, Xu J, Zhang Y, Bian C, et al. Associations of physical activity with the risks of osteoarthritis and subtypes: a population-based cohort study of UK Biobank data. Bone Joint Res (2025) 14(7):656–65. 10.1302/2046-3758.147.BJR-2024-0529.R1
22. Vilar-Gomez E, Nephew LD, Vuppalanchi R, Gawrieh S, Mladenovic A, Pike F, et al. High-quality diet, physical activity, and college education are associated with low risk of NAFLD among the US population. Hepatology (2022) 75(6):1491–506. 10.1002/hep.32207
23. Chudasama YV, Khunti KK, Zaccardi F, Rowlands AV, Yates T, Gillies CL, et al. Physical activity, multimorbidity, and life expectancy: a UK Biobank longitudinal study. BMC Med (2019) 17(1):108. 10.1186/s12916-019-1339-0
24. Steffens DC. Treatment-resistant depression in older adults. N Engl J Med (2024) 390(7):630–9. 10.1056/NEJMcp2305428
25. Martinez A, Teklu SM, Tahir P, Garcia ME. Validity of the Spanish-language patient health questionnaires 2 and 9: a systematic review and meta-analysis. JAMA Netw Open (2023) 6(10):e2336529. 10.1001/jamanetworkopen.2023.36529
26. Shao H, Liu X, Zong D, Song Q. Optimization of diabetes prediction methods based on combinatorial balancing algorithm. Nutr Diabetes (2024) 14(1):63. 10.1038/s41387-024-00324-z
27. Watson KT, Simard JF, Henderson VW, Nutkiewicz L, Lamers F, Rasgon N, et al. Association of insulin resistance with depression severity and remission status: defining a metabolic endophenotype of depression. JAMA Psychiatry (2021) 78(4):439–41. 10.1001/jamapsychiatry.2020.3669
28. Ehrmann D, Krause-Steinrauf H, Uschner D, Wen H, Hoogendoorn CJ, Crespo-Ramos G, et al. Differential associations of somatic and cognitive-affective symptoms of depression with inflammation and insulin resistance: cross-sectional and longitudinal results from the emotional distress sub-study of the GRADE study. Diabetologia (2025) 68(7):1403–15. 10.1007/s00125-025-06369-8
29. Timonen M, Laakso M, Jokelainen J, Rajala U, Meyer-Rochow VB, Keinänen-Kiukaanniemi S. Insulin resistance and depression: cross sectional study. BMJ (2005) 330(7481):17–8. 10.1136/bmj.38313.513310.F71
30. Gruber J, Hanssen R, Qubad M, Bouzouina A, Schack V, Sochor H, et al. Impact of insulin and insulin resistance on brain dopamine signalling and reward processing - an underexplored mechanism in the pathophysiology of depression? Neurosci Biobehav Rev (2023) 149:105179. 10.1016/j.neubiorev.2023.105179
31. de Bartolomeis A, De Simone G, De Prisco M, Barone A, Napoli R, Beguinot F, et al. Insulin effects on core neurotransmitter pathways involved in schizophrenia neurobiology: a meta-analysis of preclinical studies. Implications for the treatment. Mol Psychiatry (2023) 28(7):2811–25. 10.1038/s41380-023-02065-4
32. Choudhary S, Mourya A, Ahuja S, Sah SP, Kumar A. Plausible anti-inflammatory mechanism of resveratrol and caffeic acid against chronic stress-induced insulin resistance in mice. Inflammopharmacology (2016) 24(6):347–61. 10.1007/s10787-016-0287-y
33. Sarwar H, Rafiqi SI, Ahmad S, Jinna S, Khan SA, Karim T, et al. Hyperinsulinemia associated depression. Clin Med Insights Endocrinol Diabetes (2022) 15:11795514221090244. 10.1177/11795514221090244
34. Chen MH, Hsu JW, Huang KL, Tsai SJ, Su TP, Li CT, et al. Role of obesity in systemic low-grade inflammation and cognitive function in patients with bipolar I disorder or major depressive disorder. CNS Spectr (2021) 26(5):521–7. 10.1017/S1092852920001534
35. Palmer ER, Morales-Muñoz I, Perry BI, Marwaha S, Warwick E, Rogers JC, et al. Trajectories of inflammation in youth and risk of mental and cardiometabolic disorders in adulthood. JAMA Psychiatry (2024) 81(11):1130–7. 10.1001/jamapsychiatry.2024.2193
36. Guillemot-Legris O, Muccioli GG. Obesity-induced neuroinflammation: beyond the hypothalamus. Trends Neurosci (2017) 40(4):237–53. 10.1016/j.tins.2017.02.005
37. Forman DE, Kuchel GA, Newman JC, Kirkland JL, Volpi E, Taffet GE, et al. Impact of geroscience on therapeutic strategies for older adults with cardiovascular disease: JACC scientific statement. J Am Coll Cardiol (2023) 82(7):631–47. 10.1016/j.jacc.2023.05.038
38. Calderón-Larrañaga A, Vetrano DL, Welmer AK, Grande G, Fratiglioni L, Dekhtyar S. Psychological correlates of multimorbidity and disability accumulation in older adults. Age Ageing (2019) 48(6):789–96. 10.1093/ageing/afz117
39. Dragosloveanu S, Vulpe DE, Andrei CA, Nedelea DG, Garofil ND, Anghel C, et al. Predicting periprosthetic joint infection: evaluating supervised machine learning models for clinical application. J Orthop Translat (2025) 54:51–64. 10.1016/j.jot.2025.06.016
40. Ngiam KY, Khor IW. Big data and machine learning algorithms for health-care delivery. Lancet Oncol (2019) 20(5):e262–e273. 10.1016/S1470-2045(19)30149-4
41. Handelman GS, Kok HK, Chandra RV, Razavi AH, Lee MJ, Asadi H. eDoctor: machine learning and the future of medicine. J Intern Med (2018) 284(6):603–19. 10.1111/joim.12822
42. Watson DS, Krutzinna J, Bruce IN, Griffiths CE, McInnes IB, Barnes MR, et al. Clinical applications of machine learning algorithms: beyond the black box. BMJ (2019) 364:l886. 10.1136/bmj.l886
43. Ozkan J. Thinking outside the black box: Cardiopulse takes a look at some of the issues raised by machine learning and artificial intelligence. Eur Heart J (2023) 44(12):1007–9. 10.1093/eurheartj/ehac790

Keywords

KNHANES, machine learning prediction, obesity-depression comorbidity, physical inactivity, SHAP interpretability

Citation

Shangguan Y, Lin Z, Sim Y-J, Wu K, Chu Y, Huang K, Chen F, Ji K, Chen F and Liu S (2026) Development and external validation of an interpretable machine learning model for obesity-depression comorbidity in Korean and US adults. Int. J. Public Health 71:1609153. doi: 10.3389/ijph.2026.1609153

Received

02 October 2025

Revised

20 April 2026

Accepted

07 May 2026

Published

28 May 2026

Volume

71 - 2026

Edited by

Gabriel Gulis, University of Southern Denmark, Denmark

Reviewed by

Two reviewers who chose to remain anonymous

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Kangkang Ji, kyrie@mail.ustc.edu.cn; Fang Chen, jsdxchenfang@126.com; Shangrui Liu, lsr980324@knu.ac.kr

† These authors have contributed equally to this work

This Original Article is part of the IJPH Special Issue “Artificial Intelligence (AI) and Public Health”

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
