Type 2 Diabetes Risk: Metabolites, Genetics, and Lifestyle
Study participants and ethics approval
Table of Contents
- Study participants and ethics approval
- Metabolome-wide association analysis for incident T2D
- Genetic correlation (rg) between metabolites and T2D-related traits
- Genetic colocalization
- MWASs for modifiable risk factors
- Mediation analysis between risk factors, metabolites and T2D risk
- Women’s Health Initiative (WHI) and Type 2 Diabetes (T2D) Prediction
Our MWAS for incident T2D used data from ten prospective cohorts, including the Nurses’ Health Study (NHS; initiated in 1976 with 121,701 female nurses aged 30-55 years9,50), NHS2 (started in 1989 with 116,429 female nurses aged 25-42 years9,50), Health Professionals Follow-Up Study (HPFS; started in 1986 with 51,529 male health professionals aged 40-75 years9), Hispanic Community Health Study/Study of Latinos (SOL; enrolled 16,415 Hispanic/Latino […] In each cohort, complete data on demographics, medical and family history, diet, lifestyle and other health details were collected at baseline and were updated during longitudinal follow-ups. Blood samples were collected at baseline and/or during follow-ups. Our MWAS for incident T2D included participants who had qualified metabolomics data and were free of diabetes, cardiovascular disease and cancer at study baseline. The final analysis included 6,890 participants from NHS; 3,692 from NHS2; 2,529 from HPFS; 2,821 from SOL; 1,392 from WHI; 1,288 white and 1,433 Black participants from ARIC; 1,424 from FHS; 902 from MESA; 378 from BPRHS; and 885 from PREDIMED (Extended Data Table 1). Each study was approved by the Institutional Review Boards at the respective institutions or study centers, and all participants provided informed consent.
Our GWAS for metabolites included participants from eight cohorts: NHS, NHS2, HPFS, SOL, WHI, ARIC, FHS and, in addition, the Cardiovascular Health Study (CHS; enrolled 5,201 adults during 1989-1990 and 678 predominantly Black participants in 1992-1993) (Supplementary Table 7). Detailed descriptions of the design, data collection and ethical review of each cohort, and of our inclusion and exclusion criteria, are provided in Supplementary Methods.
In all cohorts, incident T2D was defined as a participant being free of diabetes at baseline but identified as having T2D during longitudinal follow-up. Detailed information on diagnosis criteria in each cohort is included in Supplementary Methods. T2D cases in MESA and BPRHS were determined according to the ADA criteria66, which included a fasting plasma glucose level ≥7.0 mmol l−1 or the use of antidiabetic medications or insulin56,67. In PREDIMED, T2D was adjudicated through blind assessment by a Clinical Endpoint and Adjudication of Events Committee, based on the ADA criteria68.
Assessment of diet, lifestyle factors and covariates
Detailed information on data collection in each cohort is in Supplementary Methods. Briefly, demographic factors (such as self-reported sex, and race and ethnicity), socioeconomic status, health information (for example, medical conditions and family history), lifestyle (for example, smoking history and physical activity (PA)), anthropometrics and blood pressure were collected at baseline and follow-up visits through self-administered questionnaires, or in-person or telephone-base […] In SOL, diet was assessed using two 24-h dietary recalls and a food propensity questionnaire74. The overall dietary quality was assessed by the Alternate Healthy Eating Index-2010 (AHEI-2010)75 in all cohorts except for the PREDIMED trial, in which it was assessed by a 14-item Mediterranean Diet Adherence Screener score57. In NHS/HPFS, SOL and WHI, we also calculated baseline consumption of 15 main food groups in servings per day.
Metabolomic profiling, quality control and data harmonization
Metabolomic profiling in NHS/HPFS, WHI, MESA, PREDIMED, FHS and CHS was conducted with the Metabolomics Platform at the Broad Institute of MIT and Harvard, using three to four complementary LC-MS methods9,65.
[…] cohort. Samples were removed if their metabolite detection rate was <80% or if they were identified as outliers by multidimensional scaling analysis within a specific racial/ethnic group. Metabolites were filtered out if their detection rate across samples was <80% and, where applicable, if their coefficient of variation in quality control (QC) samples was >20%. After quality filtering, missing values of each metabolite were imputed using the half minimum value, and the data were then standardized for analysis. Across all cohorts, we matched metabolites by their HMDB ID and/or PubChem ID, provided by the corresponding metabolomic laboratories. A total of 1,273 named metabolites initially qualified for analysis in at least one cohort. To reduce single-study bias, we limited our analyses to 469 metabolites that were available in at least four independent cohorts, or in at least three independent cohorts if the three cohorts covered both metabolomic platforms. In total, 407 metabolites from NHS, 363 from NHS2, 291 from HPFS, 364 from WHI, 327 from MESA, 274 from PREDIMED, 188 from FHS, 283 from SOL, 139 from ARIC and 231 from BPRHS were harmonized for our analysis (Extended Data Table 1). In CHS, 411 metabolites were included in genetic analyses (Supplementary Table 7). Details of the metabolomic profiling, QC and data processing are in the Supplementary Methods.
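The detection-rate filtering, half-minimum imputation and standardization described above can be sketched roughly as follows. This is an illustrative sketch, not the study's pipeline: the function name `qc_and_impute`, the dict-of-lists input layout and the order of the two filters are assumptions, and the multidimensional scaling outlier check and the QC-sample coefficient-of-variation filter are left out.

```python
from statistics import mean, pstdev


def qc_and_impute(data, min_detection=0.80):
    """Filter, impute and standardize metabolite data.

    data: dict mapping metabolite name -> list of values, one per sample,
          with None marking a value that was not detected (hypothetical layout).
    Returns (indices of retained samples, dict of standardized values).
    """
    metabolites = list(data)
    n_samples = len(next(iter(data.values())))

    # Sample filter: keep samples whose metabolite detection rate is >= 80%.
    keep_samples = [
        j for j in range(n_samples)
        if sum(data[m][j] is not None for m in metabolites) / len(metabolites)
        >= min_detection
    ]

    cleaned = {}
    for m in metabolites:
        values = [data[m][j] for j in keep_samples]
        detected = [v for v in values if v is not None]
        # Metabolite filter: drop metabolites detected in < 80% of samples.
        if not values or len(detected) / len(values) < min_detection:
            continue
        # Half-minimum imputation of the remaining missing values.
        half_min = min(detected) / 2
        filled = [half_min if v is None else v for v in values]
        # Standardize to mean 0 and (population) standard deviation 1.
        mu, sd = mean(filled), pstdev(filled)
        if sd == 0:
            continue  # a constant metabolite cannot be standardized
        cleaned[m] = [(v - mu) / sd for v in filled]
    return keep_samples, cleaned
```

Whether samples or metabolites are filtered first changes the detection rates slightly; the source does not state the order, so the choice here is arbitrary.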
Metabolome-wide association analysis for incident T2D
Details of analytical approaches and models are provided in Supplementary Methods and Supplementary Table 1. Briefly, all association analyses were conducted separately for each cohort, stratified by major racial/ethnic groups when sample sizes permitted. Metabolites were inverse normal transformed within each substudy and racial/ethnic group (if applicable) in each cohort. To analyze the association between each metabolite and T2D risk, we applied Cox regression for studies of longitudinal cohort design (NHS excluding the T2D nested case-control substudy, NHS2, HPFS, SOL, ARIC, WHI, FHS, MESA and BPRHS); logistic regression for the NHS T2D nested case-control substudy; and Cox regression with Barlow weights80 and robust estimators for the PREDIMED T2D nested case-cohort study. The basic m
[…] 85,86,87,88,89,90,91, n = 6,610, range 971-8,054) and WHI (n = 1,256) using the RVTESTS tool6,42,106. Canonical pathway enrichment analysis was conducted using the MetaCore software with the default background107, and we compared the top enriched pathways for genes annotated to mQTLs of T2D-related metabolites versus those of non-associated metabolites. We calculated the R² of each metabolite explained by independent lead genetic variants using the formula \(\sum_{i=1}^{k}\beta_i \times \beta_i \times 2 \times \mathrm{MAF}_i \times (1-\mathrm{MAF}_i)\), in which k is the number of independent lead variants, \(\beta_i\) is the association coefficient between variant i and the metabolite, and \(\mathrm{MAF}_i\) is the minor allele frequency of variant i. We compared the R² distribution for the T2D-associated versus non-associated metabolites using the Wilcoxon test.
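As a worked illustration of the R² formula above, the sum over independent lead variants can be computed as in this minimal sketch. The function name and the input layout (a list of `(beta, maf)` pairs) are assumptions for illustration; reading the result as a proportion of variance presumes the metabolite was standardized to unit variance, as an inverse normal transformation would give.

```python
def variance_explained(lead_variants):
    """Sum of beta * beta * 2 * MAF * (1 - MAF) over independent lead variants.

    lead_variants: iterable of (beta, maf) pairs, one per lead variant
                   (hypothetical input layout).
    """
    return sum(beta * beta * 2 * maf * (1 - maf) for beta, maf in lead_variants)


# Two hypothetical lead variants: 0.2**2 * 2 * 0.5 * 0.5 + 0.1**2 * 2 * 0.1 * 0.9
example_r2 = variance_explained([(0.2, 0.5), (0.1, 0.1)])
```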
We acquired publicly available GWAS summary statistics from large consortium studies for T2D (180,834 cases and 1,159,055 controls)27, fasting insulin (N = 98,210)114. For each trait, we compared the distribution of its rg with T2D-associated versus non-associated metabolites using the chi-squared test, and considered FDR < 0.05 (correcting for the number of comparisons tested) as statistically significant.
Genetic colocalization
We obtained tissue-specific cis-eQTL summary statistics from the GTEx project v.8 (refs. 115,116). Shared causal variants between each metabolite and the tissue-specific transcriptome from 47 tissue types were examined using colocalization analysis implemented in the coloc.abf() function in the R package ‘coloc’ v.5 (ref. 117). For each metabolite, we input the GWAS summary statistics for all variants within ±500 kb of its independent lead variants (Supplementary Methods). A poste
MWASs for modifiable risk factors
We fitted linear models to regress inverse normal transformed metabolite levels on age, sex (only in SOL), current smoking status, BMI, PA, intakes of 15 main food groups and fasting status simultaneously, together with cohort-specific covariates. Analyses were conducted in NHS/HPFS, SOL and WHI separately, further stratified by substudies or racial groups (Supplementary Methods). Association coefficients between metabolites and each risk factor were then combined across analytical sets using a fixed-effect IVW meta-analysis. The R² of each metabolite explained by a specific risk factor was first calculated in each analytical set using the formula \(\beta \times \beta \times \mathrm{variance}(\mathrm{risk\ factor}) / \mathrm{variance}(\mathrm{metabolite})\), with β being the association coefficient between the metabolite and the risk factor, and then averaged across all analytical sets. We compared the distributions of R² for T2D-associated versus non-associated metabolites using the Wilcoxon test.
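Three building blocks named in this paragraph can be sketched in a few lines: a rank-based inverse normal transformation, a fixed-effect inverse-variance-weighted (IVW) meta-analysis and the per-set R² formula. This is a sketch under stated assumptions, not the authors' code. The function names are hypothetical, and the Blom offset (0.375) with average ranks for ties is one common convention for the transformation, not one the source specifies.

```python
import math
from statistics import NormalDist, mean


def inverse_normal_transform(values, offset=0.375):
    """Rank-based inverse normal transformation (Blom offset assumed)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for tied values
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


def ivw_fixed_effect(betas, ses):
    """Pool per-set coefficients with weights 1 / SE^2; returns (beta, SE)."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


def r2_risk_factor(beta, var_risk_factor, var_metabolite):
    """beta * beta * variance(risk factor) / variance(metabolite)."""
    return beta * beta * var_risk_factor / var_metabolite


# Hypothetical coefficients from two analytical sets.
pooled_beta, pooled_se = ivw_fixed_effect([0.10, 0.20], [0.05, 0.10])
mean_r2 = mean([r2_risk_factor(0.3, 4.0, 1.0), r2_risk_factor(0.1, 4.0, 1.0)])
```

The fixed-effect model assumes every analytical set estimates the same underlying coefficient, so more precise sets simply receive proportionally more weight.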
Mediation analysis between risk factors, metabolites and T2D risk
Details of the mediation analysis are described in Supplementary Methods. Briefly, our analysis focused on BMI, PA, coffee/tea consumption and red/processed meat intake. For each risk factor, we considered metabolites (1) that were associated with both the risk factor and T2D risk and (2) whose association directions with the risk factor and T2D risk were consistent with the pre-assumed epidemiological relationship between the risk factor and T2D risk. We tested whether, and to what degree, each metabolite mediated the association between a risk factor and T2D risk using the CMAverse R package120, adjusting for age, sex, smoking, BMI and PA (if not the tested risk factor), calorie intake and other cohort-specific covariates, separately in NHS/HPFS, SOL and WHI. We combined total, indirect and direct effects, respectively, from each analytical set using a fixed-effect meta-analysis. The mediated proportion was calculated by dividing the indirect effect by the total effect. Metabolites with an indirect effect FDR < 0.05 and a consistent effect direction between the indirect and total effects were considered potential mediators between a risk factor and T2D risk.
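The final two steps, the mediated proportion and the mediator criterion, reduce to simple arithmetic once the pooled effects are available. The sketch below assumes the indirect and total effects are on the same additive scale (for example, log hazard ratios) and that the FDR value has already been computed elsewhere; the function names are hypothetical and this does not reproduce CMAverse.

```python
def mediated_proportion(indirect_effect, total_effect):
    """Indirect effect divided by total effect."""
    if total_effect == 0:
        raise ValueError("total effect is zero; proportion is undefined")
    return indirect_effect / total_effect


def is_potential_mediator(indirect_effect, total_effect, indirect_fdr, alpha=0.05):
    """Indirect-effect FDR below alpha and same sign as the total effect."""
    same_direction = indirect_effect * total_effect > 0
    return indirect_fdr < alpha and same_direction


# Hypothetical pooled effects for one metabolite and one risk factor.
proportion = mediated_proportion(0.03, 0.12)
flagged = is_potential_mediator(0.03, 0.12, indirect_fdr=0.01)
```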
Women’s Health Initiative (WHI) and Type 2 Diabetes (T2D) Prediction
The Women’s Health Initiative (WHI) was used in a study that applied elastic net regularization to predict type 2 diabetes (T2D). The resulting model, based on selected metabolites, was highly consistent with models developed using independent held-out cohorts. As of January 15, 2026, this finding remains consistent with the published research.
Elastic Net Regularization in T2D Prediction
Elastic net regularization is a statistical method that combines the L1 (lasso) and L2 (ridge) penalties. It is used to improve the accuracy and interpretability of regression models, particularly with high-dimensional data: it helps to prevent overfitting and performs variable selection. The National Center for Biotechnology Information describes elastic net as a regularization approach that can select groups of correlated predictors.
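The combined penalty can be written out explicitly. The parameterization below is one common form (used, for example, by the glmnet software) and is shown only as an illustration; the source does not state which loss function \(\mathcal{L}\) or which values of \(\lambda\) and \(\alpha\) were used.

```latex
\[
\hat{\beta} \;=\; \arg\min_{\beta}\; \mathcal{L}(\beta)
\;+\; \lambda \left[ \alpha \lVert \beta \rVert_1
\;+\; \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 \right],
\qquad \lambda \ge 0,\quad 0 \le \alpha \le 1
\]
```

Setting \(\alpha = 1\) gives the lasso, which can shrink coefficients exactly to zero, and \(\alpha = 0\) gives ridge regression, which shrinks correlated predictors together. Intermediate values combine both behaviors.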
In the WHI study, this technique was applied to metabolite data to predict the onset of T2D. The consistency of the selected metabolites across different cohorts suggests the robustness of the identified biomarkers.
Women’s Health Initiative (WHI)
The Women’s Health Initiative (WHI) is a long-term national health study funded by the National Heart, Lung, and Blood Institute (NHLBI). The WHI website details the study’s goals, which include identifying factors that affect women’s health and developing strategies for preventing heart disease, breast and other cancers, and osteoporosis.
The WHI enrolled 161,808 postmenopausal women aged 50-79 years between 1993 and 1998 and has provided valuable insights into women’s health across various domains, including cardiovascular disease, cancer and diabetes. NHLBI’s WHI overview provides further details on the study’s scope and impact.
Metabolite Biomarkers and T2D
Metabolites are small molecule intermediates and products of metabolism. Changes in metabolite levels can serve as biomarkers for various diseases, including T2D. Science.org published research detailing the role of metabolomics in understanding T2D.
The study’s finding that specific metabolites consistently predicted T2D across different WHI cohorts strengthens the evidence for their potential use in risk assessment and early detection. Supplementary Table 18a (referenced in the source text) lists these metabolites and their coefficients in the final model. (Note: the full supplementary table is available only with access to the original publication.)
Reporting Summary
A Nature Portfolio Reporting Summary is available for this research, providing further information on the study design and methodology. The Nature Portfolio Reporting Summary details adherence to reporting guidelines.
