Popular Aging Clocks Are Mathematically Flawed and Miss Key Biology
UC Berkeley researchers expose critical incoherence in DNA methylation age clocks, showing up to 54% of their features contradict actual biological trends.
Summary
Researchers from UC Berkeley have published a rigorous critique of the most widely used DNA methylation (DNAm) age clocks, revealing that these machine learning models are mathematically optimized in ways that obscure real biology. Between 23.9% and 53.8% of CpG sites selected by major clocks have regression coefficients pointing in the opposite direction from their actual methylation change with aging — a problem the authors call 'incoherence.' These misaligned features account for 11.2% to 40.8% of total model weight. Additionally, all major clocks failed to reliably detect inflammaging — the shift from lymphoid to myeloid immune cells that is one of the most well-established hallmarks of aging. The paper argues that elastic net regression, the dominant method for building these clocks, is poorly suited to identifying true biological aging biomarkers because it systematically excludes high-variance CpGs that signal disease and pathological aging.
Detailed Summary
Biological age clocks based on DNA methylation have become a cornerstone of longevity research. They are used to evaluate interventions and estimate disease risk, and are often claimed to reveal mechanisms of aging. But a new rapid communication from Irina Conboy's lab at UC Berkeley challenges the scientific foundations of these tools, presenting quantitative evidence that the most popular clocks are both mathematically incoherent and biologically misleading.
The core critique centers on elastic net (EN) regression, the penalized linear model used to build virtually all major DNAm clocks including Horvath, Hannum, PhenoAge, GrimAge, and PACE. EN selects a sparse subset of CpG sites — typically a few hundred out of 500,000–800,000 array probes — and assigns weights so their weighted sum predicts chronological age or a proxy. The authors demonstrate that this optimization process is inherently biased: it minimizes residuals by preferring CpGs with low variance across same-age samples, which systematically excludes the high-variance CpGs most likely to reflect pathological or accelerated biological aging. The very features most informative for distinguishing healthy from diseased aging are thus filtered out by design.
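To make the "weighted sum" concrete, here is a minimal sketch of how a linear clock turns methylation values into an age estimate. The CpG names, weights, and intercept are hypothetical and are not taken from any published clock; real clocks use a few hundred CpGs, and some also apply a transformation to the output.

```python
def predict_age(betas, weights, intercept):
    """Linear clock: intercept plus the weighted sum of methylation beta values."""
    return intercept + sum(weights[cpg] * betas[cpg] for cpg in weights)

# Hypothetical three-CpG clock (made-up IDs and numbers, for illustration only).
weights = {"cg_a": 40.0, "cg_b": -25.0, "cg_c": 10.0}
intercept = 30.0

# One sample's beta values (0 = unmethylated, 1 = fully methylated).
# CpGs the clock did not select are simply ignored.
sample = {"cg_a": 0.50, "cg_b": 0.20, "cg_c": 0.80, "cg_unused": 0.90}

predicted = predict_age(sample, weights, intercept)
print(predicted)
```

Each selected CpG contributes only through its weight, so the sign and size of the weights determine everything the model says about a sample.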
The paper introduces and quantifies the concept of 'incoherence': CpG sites whose model coefficient sign is opposite to the direction of their actual univariate methylation change with age. Across all major clocks analyzed, between 23.9% and 53.8% of selected CpG features were incoherent — in some models, more than half. These misaligned features contributed 11.2% to 40.8% of total model weight. The authors provide an interactive visualization of these misalignments for each model, making the error transparent and reproducible. This incoherence means that even when clock residuals (so-called 'biological age acceleration') are statistically significant, they cannot be meaningfully interpreted as reflecting underlying biology.
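The two incoherence figures can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the coefficients and univariate age slopes are invented, and "total model weight" is assumed here to mean the sum of absolute coefficients.

```python
def incoherence_summary(coefs, age_slopes):
    """Return (share of incoherent CpGs, share of absolute weight they carry).

    A CpG counts as incoherent when its model coefficient and its univariate
    methylation-versus-age slope have opposite signs.
    """
    incoherent = [cpg for cpg, w in coefs.items() if w * age_slopes[cpg] < 0]
    total_weight = sum(abs(w) for w in coefs.values())
    incoherent_weight = sum(abs(coefs[cpg]) for cpg in incoherent)
    return len(incoherent) / len(coefs), incoherent_weight / total_weight

# Hypothetical clock coefficients and univariate slopes (made-up values).
coefs = {"cg_a": 4.0, "cg_b": -2.0, "cg_c": 3.0, "cg_d": -1.0}
age_slopes = {"cg_a": 0.01, "cg_b": 0.02, "cg_c": -0.005, "cg_d": -0.03}

fraction, weight_share = incoherence_summary(coefs, age_slopes)
print(fraction, weight_share)  # 0.5 0.5
```

In this toy case cg_b and cg_c are incoherent: two of four features, carrying 5 of the 10 units of absolute weight.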
A particularly striking failure the paper documents is the inability of these clocks to detect inflammaging — the well-established age-related shift from lymphoid to myeloid immune cell populations accompanied by chronic low-grade inflammation. Since most DNAm clocks are trained on blood-derived datasets (PBMCs or leukocytes), inflammaging should be the easiest biological signal to detect. Yet systematic testing showed that clock predictions overlapped substantially for healthy individuals and patients with chronic inflammatory conditions including rheumatoid arthritis, multiple sclerosis, Parkinson's disease, and tauopathies. Even the PACE clock, trained on disease-risk scores, struggled to resolve this distinction, functioning primarily as a chronological age predictor.
The authors further clarify that EN clocks are also skewed toward leukocyte cell-fraction signals, with neutrophil composition dominating many models. When incoherence is rectified — aligning coefficient signs with actual methylation directions — the corrected model shows improved discrimination between healthy and patient populations, reduced neutrophil skew, and better detection of inflammaging, at the cost of reduced accuracy in predicting chronological age. This trade-off, the authors argue, is fundamental and underacknowledged: prioritizing mathematical fit to time-progression necessarily sacrifices biological interpretability. The paper closes by advocating for non-linear ML approaches that trace the natural trajectory of aging from primary data without forcing a linear framework onto an inherently non-linear process.
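One simple way to picture "rectifying" incoherence is to give every coefficient the sign of its CpG's univariate age trend while keeping its magnitude. This is an assumption made for illustration; the summary does not specify the authors' exact procedure, and the values below are hypothetical.

```python
import math

def rectify(coefs, age_slopes):
    """Align each coefficient's sign with the CpG's univariate age slope.

    Magnitudes are kept unchanged (an illustrative assumption). A slope of
    exactly zero is treated as positive by math.copysign.
    """
    return {cpg: math.copysign(abs(w), age_slopes[cpg]) for cpg, w in coefs.items()}

# Hypothetical coefficients and slopes (made-up values).
coefs = {"cg_a": 4.0, "cg_b": -2.0, "cg_c": 3.0, "cg_d": -1.0}
age_slopes = {"cg_a": 0.01, "cg_b": 0.02, "cg_c": -0.005, "cg_d": -0.03}

rectified = rectify(coefs, age_slopes)
print(rectified)
```

Because the flipped weights no longer minimize the training error, a model altered this way would be expected to predict chronological age less accurately, which is the trade-off described above.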
Key Findings
- Between 23.9% and 53.8% of CpG sites selected by major DNAm age clocks have regression coefficients pointing in the opposite direction from their actual methylation change with aging (termed 'incoherence')
- These incoherent features account for 11.2% to 40.8% of total model weight across clocks, meaning a substantial fraction of each model's predictive power is biologically misaligned
- All major conventional DNAm clocks failed or struggled to distinguish chronic inflammatory disease states (rheumatoid arthritis, multiple sclerosis, Parkinson's, tauopathies) from healthy aging in systematic testing
- Elastic net regression systematically excludes high-variance CpGs — precisely those most likely to reflect pathological biological aging — because their variability increases residuals during training
- Rectifying incoherence improved discrimination between healthy individuals and patient populations and reduced neutrophil/leukocyte fraction skew, but reduced chronological age prediction accuracy
- Clock residuals ('age acceleration') within ±10 years fall inside the range of normal biological variation or random noise, undermining their use as biomarkers of intervention efficacy
- The PACE clock, despite being trained on disease-risk scores rather than chronological age, still functioned primarily as a chronological age proxy and failed to reliably detect inflammaging
Methodology
This is a rapid communication combining theoretical analysis, reanalysis of existing clock models, and illustrative simulations using hypothetical CpG datasets. The authors quantified incoherence across multiple published DNAm clocks by comparing each selected CpG's model coefficient sign to its univariate correlation direction with age, computing both the percentage of incoherent features and their proportional contribution to total model weight. Simulated CpG datasets were used to demonstrate how elastic net regression behaves under collinearity and variance conditions typical of biological aging data. No new human cohort data were collected; the paper relies on analytical reasoning and reanalysis of published clock architectures.
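The kind of collinearity effect such simulations probe can be shown with a toy example. The sketch below is not the authors' simulation and uses ordinary least squares rather than elastic net. All numbers are invented: two hypothetical CpGs both gain methylation with age and share a common source of variation unrelated to the age trend.

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

ages = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
shared = [1.0, -1.0, 2.0, -2.0, 1.0, -1.0]  # variation common to both CpGs

# Both hypothetical CpGs rise with age; cpg1 rises faster than cpg2.
cpg1 = [0.2 + 0.008 * a + 0.02 * s for a, s in zip(ages, shared)]
cpg2 = [0.3 + 0.002 * a + 0.02 * s for a, s in zip(ages, shared)]

# Univariate trend: methylation change per year for each CpG on its own.
slope1 = cov(cpg1, ages) / cov(ages, ages)
slope2 = cov(cpg2, ages) / cov(ages, ages)

# Two-predictor least squares for age, solved from the 2x2 normal equations.
s11, s22, s12 = cov(cpg1, cpg1), cov(cpg2, cpg2), cov(cpg1, cpg2)
s1y, s2y = cov(cpg1, ages), cov(cpg2, ages)
det = s11 * s22 - s12 * s12
coef1 = (s22 * s1y - s12 * s2y) / det
coef2 = (s11 * s2y - s12 * s1y) / det

print(slope1 > 0, slope2 > 0)  # both CpGs gain methylation with age
print(coef1 > 0, coef2 < 0)    # yet cpg2 receives a negative weight
```

The fit uses cpg2 to subtract the shared variation from cpg1, so cpg2 ends up with a negative coefficient even though its methylation increases with age. That sign mismatch is what the paper calls incoherence.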
Study Limitations
This is a brief rapid communication without new experimental data or human cohort validation; the conclusions rest on theoretical and computational analysis of existing clock models. The authors do not provide a fully validated alternative clock, only a proof-of-concept that rectifying incoherence improves biological discrimination at the cost of chronological age accuracy. No conflicts of interest are declared; funding was from NIH (R01AG071787) and CDMRP (TX230133), both to senior author Irina Conboy.
