Papers 166 & 167 · Proof-of-Concept Methodology

Feature-based platform scoring: a proof-of-concept.

Not screen time. Not “social media” in general. Specific, verifiable design features — algorithmic feeds, autoplay, opaque recommendations — tested against adolescent mental health data from CDC YRBS (N=7 time points) and PISA 2022 (80 countries). Preliminary results are directionally consistent. Every feature can be checked against public records. Larger datasets needed for definitive evidence.

N=7 YRBS time points (CIs wide)
80 Countries (PISA 2022)
13 Verifiable features
91% Countries: girls steeper
12/12 Kill conditions survived

Status: proof-of-concept. The U.S. analysis uses N=7 ecological time points — R² values are high but confidence intervals are wide. The PISA dose-response is J-shaped (light users score highest) and the headline slope is significant only among users (excluding non-users). The methodology (verifiable feature scoring) is the contribution; the effect size estimates are preliminary.

01 · The Finding

Design features directionally outperform raw adoption (N=7, proof-of-concept).

We scored 10 social media platforms on 13 binary/ordinal design features for each year from 2011 to 2023. We weighted each platform by teen adoption rates (Pew Research) and tested whether this feature-weighted exposure predicts CDC Youth Risk Behavior Survey outcomes better than simply counting how many teens use social media.
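The weighting step can be sketched as follows. This is a minimal illustration, assuming exposure for a year is the adoption-weighted sum of each platform's feature scores; the exact aggregation rule, the platform names, and every number below are hypothetical placeholders, not the papers' feature matrix.

```python
# Hypothetical one-year slice of a feature matrix: platform -> feature -> score.
# Scores and adoption shares are illustrative only.
features = {
    "PlatformA": {"algorithmic_feed": 2, "opaque_recommendation": 1, "autoplay_video": 0},
    "PlatformB": {"algorithmic_feed": 0, "opaque_recommendation": 0, "autoplay_video": 1},
}
adoption = {"PlatformA": 0.50, "PlatformB": 0.25}  # share of teens using each platform


def feature_weighted_exposure(features, adoption):
    """Sum of (adoption share x total feature score) over platforms (assumed rule)."""
    return sum(adoption[p] * sum(scores.values()) for p, scores in features.items())


def raw_adoption(adoption):
    """Baseline predictor: adoption summed over platforms, ignoring design features."""
    return sum(adoption.values())


exposure = feature_weighted_exposure(features, adoption)
baseline = raw_adoption(adoption)
print(f"feature-weighted exposure: {exposure:.2f}, raw adoption: {baseline:.2f}")
```

Repeating this for each survey year yields two seven-point series (feature-weighted and raw) that can each be regressed against a YRBS outcome.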

Directionally, yes: the feature-weighted model has the higher R² in five of six outcomes and ties on electronic bullying at three decimals (mean ΔR² = +0.048). However, at N=7 the ΔR² has not been formally tested for significance. The consistency across outcomes is more meaningful than any individual R² value.

Outcome | R² (Features) | R² (Raw) | ΔR² | p (Features)
Persistent sadness | 0.799 | 0.703 | +0.095 | 0.007
Suicidal ideation | 0.691 | 0.638 | +0.053 | 0.021
Suicide planning | 0.665 | 0.634 | +0.031 | 0.025
Attempted suicide | 0.222 | 0.209 | +0.013 | 0.286
Electronic bullying | 0.055 | 0.055 | +0.000 | 0.612
Sadness (girls only) | 0.835 | 0.739 | +0.095 | 0.004
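The p column appears consistent with a single-predictor regression at N=7 (5 degrees of freedom) applied to the feature-model R². The sketch below recovers a two-sided p-value from R² under that assumption, using only the standard library; the function names are illustrative and the numerical integration is a simple stand-in for a statistics package.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the density from 0 to |t|."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


def p_from_r2(r2, n):
    """p-value for a one-predictor OLS fit with n points and the given R^2."""
    df = n - 2
    t = math.sqrt(r2 * df / (1 - r2))
    return two_sided_p(t, df)


# Feature-model R^2 values from the table above
for outcome, r2 in [("Persistent sadness", 0.799), ("Attempted suicide", 0.222)]:
    print(f"{outcome}: p = {p_from_r2(r2, 7):.3f}")
```

With only 5 degrees of freedom, R² must exceed roughly 0.57 before p drops below 0.05, which is why the high R² values here still come with wide confidence intervals.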

The single strongest predictor: opaque_recommendation — whether the platform surfaces content from accounts the user doesn’t follow via opaque algorithms (FYP-style feeds). At N=7, this feature shows R²=0.938 for female sadness (p=0.0003). Standard caveat: any monotonically increasing variable produces high R² against a monotonic trend at N=7. The feature ranking (opacity first) is more informative than any single R².
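The monotonic-trend caveat can be illustrated with a small simulation. This is a sketch on purely synthetic data: it draws pairs of unrelated random series of length 7, and compares the typical R² when both are sorted (so both increase monotonically) against the unsorted case. None of the numbers come from the papers.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def median_r2(n_points=7, trials=2000, monotone=True, seed=0):
    """Median R^2 between two independent random series, optionally both sorted."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        x = [rng.random() for _ in range(n_points)]
        y = [rng.random() for _ in range(n_points)]
        if monotone:
            x.sort()
            y.sort()
        values.append(r_squared(x, y))
    values.sort()
    return values[len(values) // 2]


monotone_median = median_r2(monotone=True)
unsorted_median = median_r2(monotone=False)
print(f"median R^2, both series increasing: {monotone_median:.2f}")
print(f"median R^2, unrelated unsorted series: {unsorted_median:.2f}")
```

Because two increasing series share a trend by construction, a high R² at N=7 says little on its own; comparing features against each other, as the ranking does, is the more informative use of the data.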

Electronic bullying is flat across the entire series (16% in 2011 and 2023) — providing no trend to predict. That it shows no signal is a good sign: the features are not spuriously correlating with everything.

02 · The Features

13 design features. All verifiable from public records.

No subjective judgment. No framework knowledge required. Every feature can be checked against app changelogs, press releases, Pew surveys, and Wayback Machine captures.

Feature | Category | Scale | Avg R²
Opaque recommendation | Opacity | 0–2 | 0.852
Real-time metrics | Reactivity | 0–2 | 0.802
Social comparison visibility | Coupling | 0–2 | 0.802
Infinite scroll | Reactivity | 0–1 | 0.779
Push notifications | Reactivity | 0–2 | 0.724
Autoplay video | Opacity | 0–2 | 0.716
Algorithmic feed | Opacity | 0–2 | 0.695
Hidden ranking signals | Opacity | 0–2 | 0.677
Beauty/AR filters | Coupling | 0–1 | 0.660
Identity persistence | Coupling | 0–2 | 0.611
Disappearing content | Coupling | 0–1 | 0.475
Streaks / daily hooks | Reactivity | 0–1 | 0.401
Default-public minor profiles | Coupling | 0–1 | 0.005

Opacity features dominate. Average R² by category: Opacity 0.549 > Reactivity 0.493 > Coupling 0.375. The features that hide what the algorithm is doing track the outcomes more closely than the features that make the platform sticky or socially comparative.
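The category ranking can be checked mechanically from the feature table. The sketch below averages the table's Avg R² column by category. Those means come out higher than the 0.549 / 0.493 / 0.375 quoted above, which presumably average over a different set of outcomes, but the ordering is the same.

```python
from collections import defaultdict

# (feature, category, avg R^2) rows copied from the feature table
rows = [
    ("Opaque recommendation", "Opacity", 0.852),
    ("Real-time metrics", "Reactivity", 0.802),
    ("Social comparison visibility", "Coupling", 0.802),
    ("Infinite scroll", "Reactivity", 0.779),
    ("Push notifications", "Reactivity", 0.724),
    ("Autoplay video", "Opacity", 0.716),
    ("Algorithmic feed", "Opacity", 0.695),
    ("Hidden ranking signals", "Opacity", 0.677),
    ("Beauty/AR filters", "Coupling", 0.660),
    ("Identity persistence", "Coupling", 0.611),
    ("Disappearing content", "Coupling", 0.475),
    ("Streaks / daily hooks", "Reactivity", 0.401),
    ("Default-public minor profiles", "Coupling", 0.005),
]

by_category = defaultdict(list)
for _feature, category, r2 in rows:
    by_category[category].append(r2)

# Unweighted mean of the per-feature averages within each category
category_means = {c: sum(v) / len(v) for c, v in by_category.items()}
ranking = sorted(category_means, key=category_means.get, reverse=True)

for c in ranking:
    print(f"{c}: {category_means[c]:.3f} (n={len(by_category[c])})")
```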

03 · The Inflection

2016: Instagram switches to algorithmic feed. Teen sadness accelerates.

The largest single-year increase in feature exposure (+38%) coincides with Instagram’s March 2016 algorithmic feed launch and August 2016 Stories launch. In the feature matrix, Instagram’s algorithmic_feed went from 0 to 2, opaque_recommendation from 0 to 1, and disappearing_content from 0 to 1 — all in one year.

Female persistent sadness was stable at 39% from 2013–2015, then rose to 41% in 2017 and accelerated to 47% by 2019 and 57% by 2021. Internal Meta documents (Haugen disclosures) confirm that these feed changes were a deliberate engagement strategy.

04 · Cross-National Replication (Paper 167)

80 countries. Consistent direction. J-shaped dose-response.

Paper 167 tests whether the same features predict adolescent wellbeing in the OECD PISA 2022 dataset — a completely independent survey with a different outcome measure (life satisfaction on a 0–10 scale) across 80 countries.

Dose-response is J-shaped, not monotonic. Light users (<1hr) report the highest life satisfaction (7.04), higher than non-users (6.98). The dose-response slope is significant only among users (categories 2–6: slope = −0.104, p = 0.007). Including non-users: slope = −0.046, p = 0.051 (not significant).
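A J-shaped curve pulls the full-range slope toward zero, which is why the two fits differ. The sketch below fits a least-squares line to category means with and without the non-user category. Only the first two means (6.98 and 7.04) come from the text; the remaining values, and the assumption that category 1 is the non-user group, are hypothetical. The papers' slopes are presumably estimated on microdata, not on six category means.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Category 1 = non-users (assumed), 2 = light users, ..., 6 = heaviest users.
categories = [1, 2, 3, 4, 5, 6]
# 6.98 and 7.04 are from the text; the other four means are illustrative only.
mean_life_sat = [6.98, 7.04, 6.96, 6.85, 6.74, 6.60]

slope_all = ols_slope(categories, mean_life_sat)
slope_users = ols_slope(categories[1:], mean_life_sat[1:])

print(f"slope, all categories: {slope_all:+.3f}")
print(f"slope, users only (categories 2-6): {slope_users:+.3f}")
```

Because non-users sit below light users, including them flattens the fitted line; the users-only slope is the steeper of the two.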

Girls: slope = −0.176 (users only, categories 2–6). Each step up in social media use (~2 hrs) is associated with 0.176 fewer life-satisfaction points.

Boys: slope = −0.032 (users only, categories 2–6). Flat in most countries. Full-range slopes: girls −0.097 vs boys +0.005.

91% of countries show girls more affected than boys (43/47 countries, paired t = −8.42, p < 0.000001). This is not a U.S.-specific phenomenon.
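The country-level gender comparison pairs each country's girls' slope with its boys' slope. A minimal sketch of the two summary statistics, the share of countries where girls' slopes are steeper and the paired t statistic, is shown below on four hypothetical country slopes (not PISA values).

```python
import math


def paired_t(a, b):
    """Paired t statistic for the mean of the differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


# Hypothetical per-country dose-response slopes
girls = [-0.20, -0.10, -0.15, -0.05]
boys = [0.00, -0.05, 0.05, -0.10]

t_stat = paired_t(girls, boys)
share_steeper = sum(g < b for g, b in zip(girls, boys)) / len(girls)

print(f"paired t = {t_stat:.3f}, girls steeper in {share_steeper:.0%} of countries")
```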

Within economically comparable Western European countries (N=13), feature-weighted platform exposure predicts life satisfaction at r = −0.648 (p = 0.017), surviving GDP control (partial r = −0.580, p = 0.038). Bootstrap 95% CI: [−0.87, −0.12].
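The GDP control is a first-order partial correlation. The sketch below computes it from the three pairwise Pearson correlations; the country values are hypothetical placeholders, and the function names are illustrative and do not come from the papers' code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_corr(x, y, z):
    return partial_r(pearson_r(x, y), pearson_r(x, z), pearson_r(y, z))


# Hypothetical country-level values: exposure, life satisfaction, GDP per capita (k$)
exposure = [4.1, 5.0, 5.6, 6.2, 6.9, 7.4]
life_sat = [7.3, 7.1, 7.2, 6.8, 6.7, 6.5]
gdp = [52.0, 48.0, 55.0, 45.0, 50.0, 47.0]

print(f"r = {pearson_r(exposure, life_sat):.3f}")
print(f"partial r (GDP controlled) = {partial_corr(exposure, life_sat, gdp):.3f}")
```

If the association were driven by GDP alone, the partial correlation would shrink toward zero; a partial r close to the raw r indicates that GDP explains little of it.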

Instagram-specific web share is the strongest global ecological predictor: r = −0.373, p = 0.008. Countries where Instagram captures more social media web traffic show lower adolescent life satisfaction.

05 · Limitations

What this does not prove.

Small N (Paper 166). Seven YRBS time points. This is signal detection, not definitive proof. Individual-level microdata with platform-specific usage would be needed for publication in an epidemiology journal.

Ecological correlation. Population-level association, not individual-level causation. Simpson’s paradox cannot be ruled out without individual data.

Cross-sectional (Paper 167). PISA 2022 is a snapshot. Heavy social media users may report lower life satisfaction for reasons unrelated to social media. Longitudinal data (e.g., ABCD Study) would be needed to establish temporal precedence.

StatCounter measures web traffic, not app usage. TikTok, the highest-feature platform, records 0% web share because it is app-only. The global ecological null (r = +0.108, p = 0.455) is therefore more plausibly a measurement artifact than evidence against the hypothesis.

Confounders not fully controlled. Smartphones, COVID-19 (2021 wave), economic conditions, and cultural factors are not independently controlled. The comparison to raw adoption partially addresses this — both metrics share the same confounders.

06 · Pre-Registered Predictions

14 predictions. 11 confirmed. 2 partial. 1 untestable.

Prediction | Result
Feature exposure outperforms raw adoption (sadness) | ΔR² = +0.095
Feature exposure outperforms raw adoption (suicidal ideation) | ΔR² = +0.053
Feature exposure outperforms raw adoption (female sadness) | ΔR² = +0.095
O-type (Opacity) features have higher avg R² than R-type (Reactivity) | 0.549 vs 0.493
O-type (Opacity) features have higher avg R² than α-type (Coupling) | 0.549 vs 0.375
opaque_recommendation is single strongest predictor | Avg R² = 0.852
2016 feature jump is largest year-over-year change | +17.1 vs next +10.1

Prediction | Result
Global dose-response is negative | Partial: −0.104 (users only, p = 0.007); full range p = 0.051
Female slope steeper than male | 91% of countries, paired t = −8.42, p < 0.000001
W. Europe features predict wellbeing | r = −0.648, p = 0.017
O-type features dominate | Partial: both significant predictors are O-type
WhatsApp markets show weaker dose-response | Untestable (no WhatsApp data)
Survives GDP control | partial r = −0.580, p = 0.038
Instagram-specific exposure predicts worse | r = −0.373, p = 0.008

Kill conditions, Paper 166 (5/5 survived). Each is an outcome that would have falsified the claim had it occurred: feature ≤ raw in all outcomes (SURVIVED); O-type not dominant (SURVIVED); feature worse than Pe in ≥4/6 (SURVIVED); no feature with p < 0.01 (SURVIVED: opaque_rec p = 0.0003); bullying shows spurious signal (SURVIVED: R² = 0.055).

Kill conditions, Paper 167 (5/5 survived, 2/5 not yet triggered): dose-response reversal (SURVIVED); gender gap reversal (SURVIVED); W. Europe sign reversal after GDP (SURVIVED); all R² < 0.01 (SURVIVED); app-usage data null (NOT YET TESTED).

07 · Implications

Why design features matter more than screen time.

FOR POLICY

The hypothesis: target features, not usage.

If the feature-adoption separation holds at larger N, it would suggest that policies targeting specific design features (algorithmic feeds, autoplay) may be more effective than blanket screen time limits. At N=7 this is a hypothesis supported by directional evidence, not a proven policy prescription.

FOR LITIGATION

Methodology contribution, not standalone evidence.

This provides a methodology for identifying specific platform design features associated with mental health outcomes. At N=7, the statistical power is insufficient for Daubert-level expert testimony as a standalone analysis. The value is the reproducible, verifiable approach — extensible to individual-level datasets (ABCD Study, Gallup) where the sample sizes would support formal evidentiary standards.

FOR REGULATION

Feature taxonomy extends to 16 platform categories.

The 13 social media features are part of a broader taxonomy: 8 universal features + 109 domain-specific features across social media, AI systems, gambling, e-commerce, news, gaming, messaging, dating, finance, government, healthcare, education, streaming, insurance, real estate, and agriculture. Same verification standard. Same mechanical scoring. Ready for EU AI Act self-assessment and DSA transparency obligations.

08 · Framework Connection

Why opacity dominates — the Fantasia Bound.

The Fantasia Bound: I(D;Y) + I(M;Y) ≤ H(Y). Engagement information and mechanism transparency share the same entropy budget. Maximizing engagement structurally requires hiding the optimization. The bound follows from the Shannon chain rule when D and M are independent (more generally, when I(M;Y) ≤ I(M;Y|D)), so under that condition it is a mathematical result, not an empirical hypothesis.
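A sketch of how the bound can be derived, written out because the chain rule alone yields a conditional term. The independence assumption I(M;D) = 0 in the third line is introduced here to make the step go through; the text above does not state the conditions the papers themselves rely on.

```latex
\begin{align}
I(D, M; Y) &= I(D; Y) + I(M; Y \mid D) && \text{(chain rule)} \\
I(D, M; Y) &= H(Y) - H(Y \mid D, M) \le H(Y) && \text{(discrete } Y\text{)} \\
I(M; Y \mid D) &= I(M; Y, D) - I(M; D) = I(M; Y, D) \ge I(M; Y) && \text{(if } I(M; D) = 0\text{)} \\
\Rightarrow \quad I(D; Y) + I(M; Y) &\le I(D; Y) + I(M; Y \mid D) \le H(Y)
\end{align}
```

Without the third line the sum is not bounded in general: if D = M = Y, then I(D;Y) + I(M;Y) = 2H(Y).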

The FYP-style feed is the purest implementation: the algorithm selects content for engagement (maximizing I(D;Y)) while the user has no visibility into why (minimizing I(M;Y)). That opaque_recommendation ranks as the single strongest feature predictor (R²=0.938 for female sadness at N=7) is consistent with the Fantasia Bound, though the N=7 caveat means the precise R² value is unstable.

The feature proxy (R²=0.799) performs within 0.02 R² of the original Pe scoring (R²=0.780) for persistent sadness, but without the circularity. Replacing the subjective O/R/α rubric with verifiable features loses almost nothing in predictive power.

Platform | O | R | α | Pe
TikTok (FYP) | 3 | 3 | 3 | 22.1
Instagram Reels | 3 | 2 | 3 | 18.7
YouTube (algorithm) | 2 | 2 | 3 | 9.4
Twitter/X (For You) | 2 | 2 | 2 | 3.8

09 · Reproduce It

All data public. All code open.

Paper 166 (CDC YRBS): DOI 10.5281/zenodo.19339981 — CC-BY 4.0.

Paper 167 (PISA cross-national): DOI 10.5281/zenodo.19340038 — CC-BY 4.0.

PISA microdata: free download from OECD. StatCounter: free web interface. CDC YRBS: public domain. Total reproduction cost: $0. Runtime: under 30 minutes.