Papers 166 & 167 · Proof-of-Concept Methodology
Not screen time. Not “social media” in general. Specific, verifiable design features (algorithmic feeds, autoplay, opaque recommendations), tested against adolescent mental health data from CDC YRBS (N=7 time points) and PISA 2022 (80 countries). Preliminary results are directionally consistent across outcomes and both datasets. Every feature can be checked against public records. Larger datasets are needed for definitive evidence.
Status: proof-of-concept. The U.S. analysis uses N=7 ecological time points; R² values are high but confidence intervals are wide. The PISA dose-response is J-shaped (light users score highest), and the headline slope is significant only when non-users are excluded. The methodology (verifiable feature scoring) is the contribution; the effect size estimates are preliminary.
01 · The Finding
We scored 10 social media platforms on 13 binary/ordinal design features for each year from 2011 to 2023. We weighted each platform by teen adoption rates (Pew Research) and tested whether this feature-weighted exposure predicts CDC Youth Risk Behavior Survey outcomes better than simply counting how many teens use social media.
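Directionally, yes: feature-weighted exposure matches or exceeds raw adoption in all six outcomes (mean ΔR² = +0.048), with a positive ΔR² in five and no difference for electronic bullying. However, at N=7 the ΔR² has not been formally tested for significance. The consistency across outcomes is more meaningful than any individual R² value.
| Outcome | R² (Features) | R² (Raw) | ΔR² | p (Features) |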
|---|---|---|---|---|
| Persistent sadness | 0.799 | 0.703 | +0.095 | 0.007 |
| Suicidal ideation | 0.691 | 0.638 | +0.053 | 0.021 |
| Suicide planning | 0.665 | 0.634 | +0.031 | 0.025 |
| Attempted suicide | 0.222 | 0.209 | +0.013 | 0.286 |
| Electronic bullying | 0.055 | 0.055 | +0.000 | 0.612 |
| Sadness (girls only) | 0.835 | 0.739 | +0.095 | 0.004 |
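The p values in the table above are consistent with a one-predictor OLS fit on the seven YRBS waves: converting each R² to an F statistic with (1, 5) degrees of freedom reproduces them to rounding. The sketch below assumes that is how they were computed (the summary does not say) and uses SciPy's F distribution; the helper name `p_from_r2` is illustrative. It also makes the N=7 caveat concrete: only 5 residual degrees of freedom remain.

```python
from scipy import stats


def p_from_r2(r2: float, n: int, k: int = 1) -> float:
    """Overall F-test p-value for an OLS model with k predictors plus an intercept."""
    df_model, df_resid = k, n - k - 1  # N=7 with k=1 leaves only 5 residual df
    f_stat = (r2 / df_model) / ((1.0 - r2) / df_resid)
    return float(stats.f.sf(f_stat, df_model, df_resid))


# R² (Features) values copied from the table; N = 7 YRBS waves.
rows = {
    "Persistent sadness": 0.799,
    "Suicidal ideation": 0.691,
    "Suicide planning": 0.665,
    "Attempted suicide": 0.222,
    "Electronic bullying": 0.055,
    "Sadness (girls only)": 0.835,
}

if __name__ == "__main__":
    for outcome, r2 in rows.items():
        print(f"{outcome:22s} R²={r2:.3f}  p={p_from_r2(r2, n=7):.3f}")
```

The same helper applied to the opaque_recommendation result (R²=0.938, N=7) gives p ≈ 0.0003.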
The single strongest predictor is opaque_recommendation: whether the platform surfaces content from accounts the user doesn’t follow via opaque algorithms (FYP-style feeds). At N=7, this feature shows R²=0.938 for female sadness (p=0.0003). Standard caveat: at N=7, almost any monotonically increasing variable can produce a high R² against a monotonic trend. The feature ranking (opacity first) is more informative than any single R².
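The monotonic-trend caveat is easy to demonstrate. In the sketch below all series are hypothetical (not YRBS data): one increasing series is regressed on an unrelated increasing series at N=7, and the exercise is repeated for many random pairs of independently drawn, sorted sequences.

```python
import numpy as np


def r_squared(x: np.ndarray, y: np.ndarray) -> float:
    """R² of a one-predictor OLS fit, i.e. the squared Pearson correlation."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)


# Two hypothetical monotone series over 7 time points with no causal link:
# a linear ramp and a doubling sequence.
ramp = np.arange(1, 8, dtype=float)
doubling = 2.0 ** np.arange(7)

# Many random pairs of independently drawn, sorted (hence monotone) sequences.
rng = np.random.default_rng(0)
random_r2 = [
    r_squared(np.sort(rng.uniform(size=7)), np.sort(rng.uniform(size=7)))
    for _ in range(10_000)
]

if __name__ == "__main__":
    print(f"ramp vs doubling: R² = {r_squared(ramp, doubling):.3f}")
    print(f"median R², random sorted pairs: {np.median(random_r2):.3f}")
```

The ramp-versus-doubling pair alone gives R² ≈ 0.771 despite having nothing to do with each other.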
Electronic bullying is flat across the entire series (16% in 2011 and 2023) — providing no trend to predict. That it shows no signal is a good sign: the features are not spuriously correlating with everything.
02 · The Features
No subjective judgment. No framework knowledge required. Every feature can be checked against app changelogs, press releases, Pew surveys, and Wayback Machine captures.
| Feature | Category | Scale | Avg R² |
|---|---|---|---|
| Opaque recommendation | Opacity | 0–2 | 0.852 |
| Real-time metrics | Reactivity | 0–2 | 0.802 |
| Social comparison visibility | Coupling | 0–2 | 0.802 |
| Infinite scroll | Reactivity | 0–1 | 0.779 |
| Push notifications | Reactivity | 0–2 | 0.724 |
| Autoplay video | Opacity | 0–2 | 0.716 |
| Algorithmic feed | Opacity | 0–2 | 0.695 |
| Hidden ranking signals | Opacity | 0–2 | 0.677 |
| Beauty/AR filters | Coupling | 0–1 | 0.660 |
| Identity persistence | Coupling | 0–2 | 0.611 |
| Disappearing content | Coupling | 0–1 | 0.475 |
| Streaks / daily hooks | Reactivity | 0–1 | 0.401 |
| Default-public minor profiles | Coupling | 0–1 | 0.005 |
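Section 01 describes scoring each platform on these features and weighting by teen adoption. A minimal sketch of one way to build that index, assuming each platform's yearly score is the unweighted sum of its feature scores and that exposure is the adoption-weighted sum across platforms (the summary does not specify normalization). Shapes and numbers are hypothetical toy values.

```python
import numpy as np

# Hypothetical toy inputs: 2 years x 2 platforms x 3 features
# (algorithmic_feed, opaque_recommendation, infinite_scroll).
# The real matrix would be 13 years (2011-2023) x 10 platforms x 13 features.
scores = np.array(
    [
        [[0, 0, 1], [1, 0, 1]],  # year 0: platform A, platform B
        [[2, 1, 1], [1, 0, 1]],  # year 1: platform A gains feed + opaque recs
    ],
    dtype=float,
)
# Share of teens using each platform in each year (hypothetical, Pew-style).
adoption = np.array([[0.5, 0.4], [0.6, 0.4]])


def feature_weighted_exposure(scores: np.ndarray, adoption: np.ndarray) -> np.ndarray:
    """For each year y: sum over platforms p and features f of adoption[y, p] * scores[y, p, f]."""
    return np.einsum("yp,ypf->y", adoption, scores)


if __name__ == "__main__":
    print(feature_weighted_exposure(scores, adoption))
```

Year 0 gives 0.5·1 + 0.4·2 = 1.3 and year 1 gives 0.6·4 + 0.4·2 = 3.2, so platform A's feature changes, not any change in adoption share alone, drive most of the increase.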
Opacity features dominate. Average R² by category: Opacity 0.549 > Reactivity 0.493 > Coupling 0.375. The features that hide what the algorithm is doing track the mental health outcomes more closely than the features that make the platform sticky or socially comparative.
03 · The Inflection
The largest single-year increase in feature exposure (+38%) coincides with Instagram’s March 2016 algorithmic feed launch and August 2016 Stories launch. In the feature matrix, Instagram’s algorithmic_feed went from 0 to 2, opaque_recommendation from 0 to 1, and disappearing_content from 0 to 1 — all in one year.
Female persistent sadness was stable at 39% from 2013–2015, then rose to 41% in 2017 and accelerated to 47% by 2019 and 57% by 2021. Internal Meta documents (the Haugen disclosures) confirm that these feed changes were a deliberate engagement strategy.
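Locating the inflection is a first-difference calculation on the exposure index. The series below is hypothetical (it is not the paper's index); the sketch reports the year with the largest absolute increase, its percentage change, and the runner-up increase, mirroring the "+17.1 vs next +10.1" comparison in Section 06.

```python
import numpy as np

years = np.arange(2011, 2024)  # 13 scored years, 2011-2023
# Hypothetical feature-weighted exposure index, one value per year.
exposure = np.array([10, 12, 15, 19, 24, 41, 48, 55, 63, 70, 76, 80, 83], dtype=float)


def largest_jump(years: np.ndarray, series: np.ndarray) -> tuple[int, float, float, float]:
    """Return (year, absolute increase, percent increase, runner-up increase)."""
    diffs = np.diff(series)  # diffs[i] = series[i + 1] - series[i]
    i = int(np.argmax(diffs))
    runner_up = float(np.sort(diffs)[-2])
    pct = 100.0 * diffs[i] / series[i]
    return int(years[i + 1]), float(diffs[i]), float(pct), runner_up


if __name__ == "__main__":
    year, jump, pct, runner_up = largest_jump(years, exposure)
    print(f"{year}: +{jump:.1f} ({pct:.0f}%), next largest +{runner_up:.1f}")
```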
04 · Cross-National Replication (Paper 167)
Paper 167 tests whether the same features predict adolescent wellbeing in the OECD PISA 2022 dataset — a completely independent survey with a different outcome measure (life satisfaction on a 0–10 scale) across 80 countries.
Dose-response is J-shaped, not monotonic. Light users (<1hr) report the highest life satisfaction (7.04), higher than non-users (6.98). The dose-response slope is significant only among users (categories 2–6: slope = −0.104, p = 0.007). Including non-users: slope = −0.046, p = 0.051 (not significant).
Each step up in social media use (~2 hrs): −0.176 life satisfaction points.
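Why excluding non-users changes the slope can be seen with a least-squares line through category means. This is a simplified sketch: the reported slopes and p-values presumably come from respondent-level data, whereas here only the first two means (6.98 for non-users, 7.04 for light users) come from the text and the remaining four are hypothetical.

```python
import numpy as np

# Usage categories 1-6 (1 = non-user, 2 = light user <1 hr, ...).
categories = np.arange(1, 7, dtype=float)
# Mean life satisfaction (0-10). Categories 1-2 from the text; 3-6 are hypothetical.
means = np.array([6.98, 7.04, 6.90, 6.80, 6.70, 6.60])


def ols_slope(x: np.ndarray, y: np.ndarray) -> float:
    """Slope b of an ordinary least-squares line y = a + b*x."""
    slope, _intercept = np.polyfit(x, y, deg=1)
    return float(slope)


full_range = ols_slope(categories, means)
users_only = ols_slope(categories[1:], means[1:])  # drop non-users (category 1)

if __name__ == "__main__":
    print(f"full range: {full_range:+.3f}  users only: {users_only:+.3f}")
```

Because non-users sit below light users, including them lowers the left end of the line and flattens the slope (about −0.086 versus −0.108 with these toy numbers), which is the J-shape at work.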
Flat in most countries. Full-range slopes: girls −0.097 vs boys +0.005.
91% of countries show girls more affected than boys (43/47 countries, paired t = −8.42, p < 0.000001). This is not a U.S.-specific phenomenon.
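The cross-country gender comparison is a paired test on per-country slopes. A sketch with hypothetical slopes for five countries, using SciPy's `ttest_rel`; a negative t means girls' slopes are more negative than boys'. The function name `gender_gap` is illustrative.

```python
import numpy as np
from scipy import stats

# Hypothetical per-country dose-response slopes (life satisfaction per usage step).
girls = np.array([-0.12, -0.08, -0.15, -0.05, -0.10])
boys = np.array([0.01, -0.02, 0.00, -0.06, 0.02])


def gender_gap(girls: np.ndarray, boys: np.ndarray) -> tuple[float, float, float]:
    """Return (share of countries where girls' slope < boys', paired t, two-sided p)."""
    share = float(np.mean(girls < boys))
    result = stats.ttest_rel(girls, boys)  # paired: same country, two groups
    return share, float(result.statistic), float(result.pvalue)


if __name__ == "__main__":
    share, t, p = gender_gap(girls, boys)
    print(f"girls more affected in {share:.0%} of countries; paired t = {t:.2f}, p = {p:.4f}")
```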
Within economically comparable Western European countries (N=13), feature-weighted platform exposure predicts life satisfaction at r = −0.648 (p = 0.017), surviving GDP control (partial r = −0.580, p = 0.038). Bootstrap 95% CI: [−0.87, −0.12].
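Both the GDP-adjusted (partial) correlation and the bootstrap interval can be sketched in a few lines. The data are randomly generated stand-ins for 13 countries, not PISA values. The partial correlation residualizes both variables on GDP; the interval is a plain percentile bootstrap over countries, which is an assumption since the summary does not state the bootstrap variant used.

```python
import numpy as np


def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.corrcoef(x, y)[0, 1])


def partial_r(x: np.ndarray, y: np.ndarray, z: np.ndarray) -> float:
    """Correlation of x and y after removing a linear fit on z (plus intercept) from each."""
    design = np.column_stack([np.ones_like(z), z])
    resid_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    resid_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return pearson_r(resid_x, resid_y)


def bootstrap_ci(x, y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Pearson r, resampling countries with replacement."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        if np.ptp(x[idx]) == 0 or np.ptp(y[idx]) == 0:
            continue  # all-identical resample: correlation undefined
        draws.append(pearson_r(x[idx], y[idx]))
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Hypothetical stand-in data for 13 countries.
rng = np.random.default_rng(42)
gdp = rng.normal(50, 10, size=13)
exposure = rng.normal(0, 1, size=13) + 0.02 * gdp
satisfaction = 7.0 - 0.3 * exposure + 0.01 * gdp + rng.normal(0, 0.2, size=13)

if __name__ == "__main__":
    print(f"r = {pearson_r(exposure, satisfaction):+.3f}")
    print(f"partial r | GDP = {partial_r(exposure, satisfaction, gdp):+.3f}")
    print("95% CI:", bootstrap_ci(exposure, satisfaction))
```

With 13 countries the resampled correlations vary a lot, which is why the interval reported above is wide even though the point estimate is strong.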
Instagram-specific web share is the strongest global ecological predictor: r = −0.373, p = 0.008. Countries where Instagram captures more social media web traffic show lower adolescent life satisfaction.
05 · Limitations
Small N (Paper 166). Seven YRBS time points. This is signal detection, not definitive proof. Individual-level microdata with platform-specific usage would be needed for publication in an epidemiology journal.
Ecological correlation. Population-level association, not individual-level causation. Simpson’s paradox cannot be ruled out without individual data.
Cross-sectional (Paper 167). PISA 2022 is a snapshot. Heavy social media users may report lower life satisfaction for reasons unrelated to social media. Longitudinal data (e.g., ABCD Study) would be needed to establish temporal precedence.
StatCounter measures web traffic, not app usage. TikTok, the highest-feature platform, records 0% web share because it is used almost entirely through its app. The global ecological null (r = +0.108, p = 0.455) therefore most likely reflects this measurement artifact rather than evidence against the hypothesis.
Confounders not fully controlled. Smartphones, COVID-19 (2021 wave), economic conditions, and cultural factors are not independently controlled. The comparison to raw adoption partially addresses this — both metrics share the same confounders.
06 · Pre-Registered Predictions
Paper 166 (CDC YRBS)
| Prediction | Result |
|---|---|
| Feature exposure outperforms raw adoption (sadness) | ΔR² = +0.095 |
| Feature exposure outperforms raw adoption (suicidal ideation) | ΔR² = +0.053 |
| Feature exposure outperforms raw adoption (female sadness) | ΔR² = +0.095 |
| O-type (opacity) features have higher avg R² than R-type (reactivity) | 0.549 vs 0.493 |
| O-type (opacity) features have higher avg R² than α-type (coupling) | 0.549 vs 0.375 |
| opaque_recommendation is single strongest predictor | Avg R² = 0.852 |
| 2016 feature jump is largest year-over-year change | +17.1 vs next +10.1 |
Paper 167 (PISA 2022)
| Prediction | Result |
|---|---|
| Global dose-response is negative | Partial: −0.104 (users only, p=0.007); full range p=0.051 |
| Female slope steeper than male | 91% of countries, paired t = −8.42, p < 0.000001 |
| W. Europe features predict wellbeing | r = −0.648, p = 0.017 |
| O-type features dominate | Partial: both significant predictors are O-type |
| WhatsApp markets show weaker dose-response | Untestable (no WhatsApp data) |
| Survives GDP control | partial r = −0.580, p = 0.038 |
| Instagram-specific exposure predicts worse | r = −0.373, p = 0.008 |
Falsification criteria (each would have counted against the hypothesis had it been met). Paper 166 (5/5 survived): Feature ≤ raw in all outcomes (SURVIVED). O-type not dominant (SURVIVED). Feature worse than Pe in ≥4/6 (SURVIVED). No feature p < 0.01 (SURVIVED: opaque_rec p = 0.0003). Bullying shows spurious signal (SURVIVED: R² = 0.055).
Paper 167 (5/5 survived, 2/5 not yet triggered): Dose-response reversal (SURVIVED). Gender gap reversal (SURVIVED). W. Europe sign reversal after GDP (SURVIVED). All R² < 0.01 (SURVIVED). App-usage data null (NOT YET TESTED).
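Several Paper 166 criteria are mechanical checks against numbers already reported on this page, so they can be written as predicates. A sketch covering three of the five (the Pe comparison and the bullying threshold need values not given here); the dictionary keys and the function name `falsifiers` are hypothetical.

```python
# Values copied from the Paper 166 tables on this page.
r2_features = {"sadness": 0.799, "ideation": 0.691, "planning": 0.665,
               "attempt": 0.222, "bullying": 0.055, "sadness_girls": 0.835}
r2_raw = {"sadness": 0.703, "ideation": 0.638, "planning": 0.634,
          "attempt": 0.209, "bullying": 0.055, "sadness_girls": 0.739}
category_avg_r2 = {"O": 0.549, "R": 0.493, "alpha": 0.375}
opaque_recommendation_p = 0.0003


def falsifiers() -> dict[str, bool]:
    """True means the falsifying condition was met (the hypothesis would fail)."""
    return {
        "feature <= raw in all outcomes": all(
            r2_features[k] <= r2_raw[k] for k in r2_features
        ),
        "O-type not dominant": category_avg_r2["O"]
        <= max(category_avg_r2["R"], category_avg_r2["alpha"]),
        "no feature with p < 0.01": not (opaque_recommendation_p < 0.01),
    }


if __name__ == "__main__":
    for name, triggered in falsifiers().items():
        print(f"{name}: {'FALSIFIED' if triggered else 'SURVIVED'}")
```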
07 · Implications
If the feature-adoption separation holds at larger N, it would suggest that policies targeting specific design features (algorithmic feeds, autoplay) may be more effective than blanket screen time limits. At N=7 this is a hypothesis supported by directional evidence, not a proven policy prescription.
This provides a methodology for identifying specific platform design features associated with mental health outcomes. At N=7, the statistical power is insufficient for Daubert-level expert testimony as a standalone analysis. The value is the reproducible, verifiable approach — extensible to individual-level datasets (ABCD Study, Gallup) where the sample sizes would support formal evidentiary standards.
The 13 social media features are part of a broader taxonomy: 8 universal features + 109 domain-specific features across social media, AI systems, gambling, e-commerce, news, gaming, messaging, dating, finance, government, healthcare, education, streaming, insurance, real estate, and agriculture. Same verification standard. Same mechanical scoring. Ready for EU AI Act self-assessment and DSA transparency obligations.
08 · Framework Connection
The Fantasia Bound: I(D;Y) + I(M;Y) ≤ H(Y). Engagement information and mechanism transparency share the same entropy budget. Maximizing engagement structurally requires hiding the optimization; within the framework's definitions this is derived from the Shannon chain rule rather than posited as an empirical hypothesis.
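A minimal derivation of the bound, under an assumption this summary does not state explicitly: that D and M are statistically independent. Conditioning cannot increase entropy, so with D ⊥ M the mechanism term does not decrease when conditioned on D, and the chain rule then caps the total at H(Y). Without some such condition the left-hand side can exceed H(Y) (for example, D = M = Y gives 2H(Y)), so the bound depends on how D and M are defined in the framework.

```latex
\begin{aligned}
I(M;Y \mid D) &= H(M \mid D) - H(M \mid Y, D) \\
              &= H(M) - H(M \mid Y, D)                  && (D \perp M) \\
              &\ge H(M) - H(M \mid Y) = I(M;Y)          && \text{(conditioning reduces entropy)} \\[4pt]
I(D;Y) + I(M;Y) &\le I(D;Y) + I(M;Y \mid D) = I(D,M;Y)  && \text{(chain rule)} \\
                &\le H(Y).
\end{aligned}
```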
The FYP-style feed is the purest implementation: the algorithm selects content for engagement (maximizing I(D;Y)) while the user has no visibility into why (minimizing I(M;Y)). That opaque_recommendation ranks as the single strongest feature predictor (R²=0.938 for female sadness at N=7) is consistent with the Fantasia Bound, though the N=7 caveat means the precise R² value is unstable.
The feature proxy (R²=0.799) performs on par with the original Pe scoring (R²=0.780) for persistent sadness, but without the circularity. Replacing the subjective O/R/α rubric with verifiable features loses almost nothing in predictive power.
| Platform | O | R | α | Pe |
|---|---|---|---|---|
| TikTok (FYP) | 3 | 3 | 3 | 22.1 |
| Instagram Reels | 3 | 2 | 3 | 18.7 |
| YouTube (algorithm) | 2 | 2 | 3 | 9.4 |
| Twitter/X (For You) | 2 | 2 | 2 | 3.8 |
09 · Reproduce It
Paper 166 (CDC YRBS): DOI 10.5281/zenodo.19339981 — CC-BY 4.0.
Paper 167 (PISA cross-national): DOI 10.5281/zenodo.19340038 — CC-BY 4.0.
PISA microdata: free download from OECD. StatCounter: free web interface. CDC YRBS: public domain. Total reproduction cost: $0. Runtime: under 30 minutes.