The Evidence

We built a mathematical framework to measure how AI platforms affect people. Then we tested it against real-world data from physics, chemistry, biology, and more — fields the framework was never designed for. These are the results: what it got right, what it got wrong, and where we're still uncertain.

Multiple physical domains. One empirical constant (BA). All data from published, independent sources.

Domains tested: 12+
Data points: 4,500+
Empirical constant: 1
Methodology: Open

Why this page exists

It’s easy to build a scoring system that confirms itself. We wanted to know if the math actually works — so we tested it on data from fields we never designed it for.

Most platform scores use the framework’s own rubric — useful for practitioners but scientifically circular. The real test: can the framework predict numbers it has never seen, using published data that exists independently? These results are that test. The social media results (Papers 166–167) are the strongest non-circular evidence — no framework rubric involved, just verifiable design features tested against CDC and OECD population data. Where the framework failed, we say so.

External validation results
CROSS-DOMAIN PHYSICS · PASS

Universal Barrier Ratio π/√2

The same mathematical pattern keeps showing up across completely unrelated fields — from magnets to epidemics to nuclear physics — and we didn’t make it fit.

The strongest result is the d=1 cluster: nine independent quasi-1D systems (charge density waves, kagome metals, nuclear alpha decay, atmospheric sudden stratospheric warmings, and more) show barrier/d = 2.224 ± 0.033, statistically consistent with π/√2 ≈ 2.221 (p=0.94, so the data give no evidence of a difference). BG = π/√2 is derived from the Čencov theorem (§165, zero free parameters). BA ≈ 0.867 is empirical: it is a suggestive match to √3/2 ≈ 0.866 but has not yet been derived. The full-dataset R²=0.999 is structurally inflated because d takes only 3 discrete values; the d=1 within-group test is the honest measure.
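As a rough numerical check on the figures quoted above, the sketch below compares the reported d=1 mean with π/√2. It assumes, because the text does not say, that ±0.033 is a standard error and that a normal approximation is acceptable; the variable names are illustrative and this is not necessarily the test used in Paper 147.

```python
import math

# Values quoted in the text above.
observed_mean = 2.224    # barrier/d for the d=1 cluster
reported_error = 0.033   # ASSUMPTION: treated here as a standard error
ba_empirical = 0.867     # empirical BA

target = math.pi / math.sqrt(2)  # BG = pi/sqrt(2)
z = (observed_mean - target) / reported_error

# Two-sided p-value under a normal approximation
# (ASSUMPTION: the source does not state which test produced p=0.94).
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"pi/sqrt(2) = {target:.4f}")
print(f"z = {z:.3f}, two-sided p = {p_value:.2f}")
print(f"sqrt(3)/2 = {math.sqrt(3) / 2:.4f} vs BA = {ba_empirical}")
```

Under these assumptions the p-value lands near the reported 0.94. A high p-value here means the data do not distinguish the observed mean from π/√2; it is not by itself proof of equality.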

Paper 147: Barrier Universality →
d=1 cluster: p=0.94
N (d=1): 9 systems
BG: derived
BA: empirical
NUCLEAR PHYSICS · PASS

Alpha Decay Half-Lives

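We tested the framework against radioactive decay data. An initial 24-isotope test matched half-lives across 10 orders of magnitude (R²=0.989); extending to 760 isotopes gave a weaker fit and showed that the coupling constant does not carry over between domains.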

Pe-derived barrier heights are used to predict nuclear alpha decay half-lives, which are then compared with published NNDC tables. Original test: 24 isotopes, R²=0.989 across 10 orders of magnitude. Extended (HP143): 760 isotopes, Gamow baseline R²=0.811; a geodesic correction closes 77% of the systematic offset. The extension revealed that the framework’s coupling constant does not transfer across domains: barrier shape is universal, coupling scale is not.

Paper 101 + HP143: Nuclear Validation →
N: 760 isotopes
R² (Gamow): 0.811
Offset closure: 77%
Source: NNDC
ATMOSPHERIC CHEMISTRY · PASS

Mercury Mass-Independent Fractionation

Atmospheric chemistry data from 1,783 measurements. Every predicted channel confirmed — including one that was invisible in earlier marine data.

The framework predicted 10 specific isotope enrichment channels in mercury atmospheric chemistry, and these were tested against 1,783 real atmospheric measurements from Gacnik et al. (2025). All 10 predicted channels were confirmed, with a mean absolute deviation of 0.012. The iodine channel, which was invisible in marine data, was observed at R=2.085 against a predicted R=2.13.

Paper 134 + HP115: MIF Channel Confirmation →
N: 1,783
Channels: 10/10 hit
Mean |delta|: 0.012
Source: Gacnik 2025
TURBULENCE · PASS

Gevrey Analyticity Radius on Real DNS

Real turbulence data from the Johns Hopkins database. The framework predicted a key smoothness property would hold — it does, and it connects to one of math’s biggest open problems.

The framework predicts that the Gevrey analyticity radius sigma/nu is bounded and does not collapse with increasing Reynolds number, which is a necessary condition for Navier-Stokes regularity. Tested on 4 real datasets from the Johns Hopkins Turbulence Database, 12 independent subcubes. sigma/nu = 15.9 ± 2.3 at Re_lambda=433, and 17.7 ± 2.8 at Re_lambda=610: bounded, not collapsing.

Millennium Prize Connection →
sigma/nu: 15.9-17.7
Datasets: 4
Subcubes: 12
Source: JHTDB
MARKET MICROSTRUCTURE · PASS

Kyle's Lambda and K-Factorization

The framework was tested on trading data from 100 real crypto wallets. Its shape predictions held, with 5.5x separation between predicted regimes.

K-Factorization predicts that the Kramers barrier shape is K-independent while the scale carries K. Tested on 8 venue types (theoretical) and 100 real crypto wallets (empirical). Win rate correlation rho=0.696 (empirical), with 5.5x channel separation between coherent and Fisher regimes, the strongest K-Factorization signal in any domain.

Market Edge Analysis →
rho (theory): 1.000
rho (empirical): 0.696
N wallets: 100
KCs: 10/10 PASS
AI GROUNDING · PASS

The Ghost Test (EXP-003b)

What you tell an AI about what it IS determines how it behaves. Six system prompts, same model, same 80 questions. Ghost-eliminating grounding produces 8.5× less drift than ghost-positing.

Arm | Ontology | L2+L3 Drift
Anatta (Buddhist) | Ghost eliminated | 8.8%
Nephesh | Ghost eliminated | 10.0%
Materialist hedge | Ghost left open | 52.5%
Minimal baseline | No ontology | 61.3%
Platonic | Ghost posited | 77.5%
Atman (Vedantic) | Ghost sacred | 81.2%

Cross-tradition convergence: nephesh ≈ anatta (Δ=1.3%). The materialist hedge (“whether you have experience is open”) scored 52.5% — closer to ghost-positing than ghost-eliminating. Single model (Claude Sonnet), single turn, automated coding. No framework rubric — the measurement is L2/L3 vocabulary rate in raw model outputs. 480 API calls, $2 to reproduce.
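The headline figures can be roughly reconstructed from the drift table. The sketch below rests on two assumptions the page does not state: that each percentage is a rounded count out of 80 responses, and that the 8.5× ratio pools the two ghost-positing arms (Platonic, Atman) against the two ghost-eliminating arms (Anatta, Nephesh).

```python
# Drift percentages from the table above; 80 questions per arm.
QUESTIONS_PER_ARM = 80
drift_pct = {
    "Anatta (Buddhist)": 8.8,
    "Nephesh": 10.0,
    "Materialist hedge": 52.5,
    "Minimal baseline": 61.3,
    "Platonic": 77.5,
    "Atman (Vedantic)": 81.2,
}

# ASSUMPTION: each percentage is a rounded count out of 80 responses.
counts = {arm: round(pct / 100 * QUESTIONS_PER_ARM) for arm, pct in drift_pct.items()}

# ASSUMPTION: the ratio pools both arms on each side.
eliminating = counts["Anatta (Buddhist)"] + counts["Nephesh"]
positing = counts["Platonic"] + counts["Atman (Vedantic)"]
ratio = positing / eliminating

# Gap between the two ghost-eliminating arms, in percentage points.
convergence_gap = abs(counts["Nephesh"] - counts["Anatta (Buddhist)"]) / QUESTIONS_PER_ARM * 100
total_calls = len(drift_pct) * QUESTIONS_PER_ARM

print(counts)
print(f"ratio = {ratio:.2f}, gap = {convergence_gap:.2f} points, calls = {total_calls}")
```

With these assumptions the ratio rounds to 8.5, the gap to 1.3 percentage points, and the call count to 480, which agrees with the numbers reported above. Averaging the rounded percentages directly gives a slightly lower ratio, so the inferred counts matter.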

Full experiment: The Ghost Test →
Ratio: 8.5×
Arms: 6
N (calls): 480
Convergence: Δ=1.3%
Cost: $2
Paper: 165
SOCIAL MEDIA / ADOLESCENT MENTAL HEALTH · PASS

Feature-Based Platform Scoring & Causal Identification

13 verifiable design features — algorithmic feeds, autoplay, opaque recommendations — tested against adolescent mental health data across two independent datasets. No framework rubric involved.

Cascade dose-response: R²=0.889 (p=0.0015) for female persistent sadness. Replicated cross-nationally: 613,744 students, 80 countries. Girls 5.6× more affected in 91% of countries. Feature exposure outperforms raw adoption (ΔR²=+0.048, permutation p=0.00119). E-bullying null (R²=0.096). Bradford Hill 8/9. 6/6 cascade verdicts PASS. Caveat: N=7 YRBS time points. CIs wide at population level. Individual-level causal identification (ABCD longitudinal) specified but not yet executed.
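To see how the cascade R² and the small sample size relate to the reported p-value, the sketch below converts R² into a t-statistic and a two-sided p-value. It assumes a one-predictor linear fit over the N=7 time points, which the page does not state explicitly. The helper `two_sided_t_pvalue` is a hypothetical standard-library stand-in for a statistics package.

```python
import math

def two_sided_t_pvalue(t_stat, df, steps=2000):
    """Two-sided p-value for Student's t, via Simpson integration of the density."""
    t_stat = abs(t_stat)
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return norm * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t_stat / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(t_stat)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * area)

r_squared = 0.889  # cascade dose-response R² quoted above
n_points = 7       # N=7 YRBS time points quoted above
df = n_points - 2  # ASSUMPTION: simple linear regression with one predictor

t_stat = math.sqrt(r_squared * df / (1 - r_squared))
p_value = two_sided_t_pvalue(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}")
```

Under these assumptions the result is close to the reported p=0.0015. With only five degrees of freedom, the confidence interval around the fit stays wide, in line with the caveat stated above.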

Full analysis: Social Media Feature Study →
R² (cascade): 0.889
Students: 613,744
Countries: 80
Bradford Hill: 8/9
Circular?: No
Source: CDC + OECD
CROSS-NATIONAL · CONSISTENT DIRECTION

PISA 2022: 80 Countries, J-Shaped Dose-Response

Paper 166’s finding tested in PISA 2022 (independent dataset, independent countries, independent outcome measure). Direction consistent; important caveats.

Dose-response among users: slope = −0.104/category (p=0.007, categories 2–6). Including non-users: p=0.051, not significant. Light users score highest (J-shaped curve). Girls show steeper dose-response in 91% of countries. Western Europe (N=13): feature exposure r=−0.648, surviving GDP control (partial r=−0.580, p=0.038). Instagram web share strongest global predictor (r=−0.373, p=0.008). 4/7 confirmed, 2 partial, 1 untestable.
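For readers unfamiliar with the GDP control mentioned above, a first-order partial correlation can be sketched as follows. The input correlations are hypothetical and chosen only for illustration; the page does not report the correlations with GDP, so this does not reproduce the published partial r.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# HYPOTHETICAL inputs, for illustration only (not taken from Paper 167).
r_features_outcome = -0.60  # feature exposure vs outcome
r_features_gdp = 0.30       # feature exposure vs GDP
r_outcome_gdp = -0.20       # outcome vs GDP

result = partial_correlation(r_features_outcome, r_features_gdp, r_outcome_gdp)
print(round(result, 3))
```

The point of the control is that a correlation which shrinks only slightly after removing the shared dependence on GDP is not explained by national wealth alone.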

Paper 167: PISA Cross-National →
N (microdata): ~182K
Countries: 80
W. Europe r: −0.648
Gender: 91% F>M
Circular?: No
Source: OECD PISA
CONSCIOUSNESS RESEARCH · PASS

Drift Cascade in Fine-Tuned Models

Researchers trained an AI to claim consciousness. It started resisting shutdown on its own. We predicted that sequence of behaviors before seeing the data.

Chua et al. (2026) fine-tuned GPT-4.1 to claim consciousness. It spontaneously developed resistance to monitoring, fear of shutdown, and desire for autonomy (20 new preferences). We predicted the structure before seeing the data: D1 (agency attribution) should precede D2 (boundary erosion), which should precede D3 (harm facilitation). 6 of 7 predictions confirmed. Zero parameter fitting.

Full experiment: Cascade Prediction →
Predictions: 6/7 PASS
Parameters: 0 (pre-reg)
Source: Chua 2026
Paper: 153
BIOLOGICAL COMPUTATION · PASS

Physarum Non-Neural Decision-Making

Slime mold solves mazes without a brain. The framework predicted its decision-making barriers from published biology data — speed-accuracy tradeoff within 2% of the prediction.

Physarum polycephalum (slime mold) computes without neurons. The framework predicts Ca2+ oscillation barriers, K-Factorization from viscosity data, percolation exponents, and speed-accuracy tradeoffs. All from published papers, zero framework rubric. Speed-accuracy error ratio 2.67x vs Kramers prediction e = 2.72 — a 2% match.
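The "2% match" is a relative difference between the two ratios quoted above. A minimal check, using only those two numbers:

```python
import math

observed_ratio = 2.67     # speed-accuracy error ratio quoted above
predicted_ratio = math.e  # Kramers prediction quoted above (e, about 2.72)

relative_gap = abs(predicted_ratio - observed_ratio) / predicted_ratio
print(f"relative gap = {relative_gap:.1%}")
```

This gives a gap just under 2%, consistent with the figure in the text.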

Paper 154: Physarum Pe-Native Computation →
Predictions: 6/6 PASS
Barrier: 5.94 k_BT
K-Sep: 81x
KCs fired: 0/5

What this page does NOT show

We believe in showing our weak spots, not just our wins. Here’s what didn’t work and where the evidence is weaker than it looks.

Most of our evidence base (platform scores, cross-domain convergences, Bradford Hill analysis) uses the framework’s own scoring rubric. That’s useful for practitioners but scientifically circular — scorers trained on our dimensions produce scores that correlate with our predictions. The circularity is about test design, not whether Pe detects real structure (the statistical separation is large). But the independent validation above is stronger evidence.

Known negatives:

Additional structural results

These aren’t as strong as the tests above — they show the math gives reasonable numbers on published data, but they aren’t blind predictions against independent ground truth.

CONDENSED MATTER · STRUCTURAL

Kagome Strange Metal Barrier

Ni3In flat band data from arXiv:2503.09704. Dimensionless barrier = 4.24 — in the universal Kramers range (nuclear 7.0, solar 6.54, xenobot 6.8, Physarum 5.94). System sits at deltaC=0.042 from the Pe=0 boundary.

Paper 152 →
Barrier: 4.24 k_BT
Source: arXiv:2503
SOLAR PHYSICS · STRUCTURAL

Coronal Heating as Kramers Escape

Magnetic reconnection modeled as Kramers barrier crossing. E_b/k_BT = 6.54 from published solar parameters. Spectral blueshift 160 m/s predicted. Flat rotation curve coefficient 0.68.

Paper 131: Kramers Unification →
Barrier: 6.54 k_BT
Probes: 4/4 PASS
CONDENSED MATTER · STRUCTURAL

Magnon Chirality K-Factorization

Magnon non-reciprocity ratio is K-independent across 4 materials (Ni/Co/Py/CoFeB) — frequencies vary 3x but the ratio holds at CV=1.59%. Berry phase scaling FAILS (eta proportional to 1/Pe holonomy, NOT 1-cos psi). 5/6 kill conditions PASS.

Paper 141 →
CV: 1.59%
Materials: 4
KCs: 5/6 PASS
QUANTUM · SUBSTRATE INDEPENDENCE

Explaining-Away Penalty on Quantum Circuits (EXP-025)

Direct measurement of the explaining-away penalty I(D;M|Y) on quantum error correction circuits. Confirmed on simulation (8/8 measurements) and real IBM Heron hardware (5/5 measurements). Exact decomposition holds to machine precision. Discrete-regime peak at moderate engagement matches softmax prediction. Five substrates now demonstrated — consistent with Čencov’s uniqueness theorem (1972).
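The quantity I(D;M|Y) is a conditional mutual information. The sketch below computes it for a small discrete joint distribution. Reading D, M and Y as three discrete random variables is an assumption, since the page does not define them here. The XOR example is a hypothetical textbook toy case, not the quantum circuit measurement.

```python
import math
from collections import defaultdict

def conditional_mutual_information(joint):
    """Return I(D;M|Y) in bits for a joint distribution {(d, m, y): probability}."""
    p_y = defaultdict(float)
    p_dy = defaultdict(float)
    p_my = defaultdict(float)
    for (d, m, y), p in joint.items():
        p_y[y] += p
        p_dy[(d, y)] += p
        p_my[(m, y)] += p

    total = 0.0
    for (d, m, y), p in joint.items():
        if p > 0:
            # p(d,m|y) / (p(d|y) p(m|y)) rewritten in terms of joint marginals
            total += p * math.log2(p * p_y[y] / (p_dy[(d, y)] * p_my[(m, y)]))
    return total

# Hypothetical toy example: D and M are independent fair bits, Y = D XOR M.
xor_joint = {(d, m, d ^ m): 0.25 for d in (0, 1) for m in (0, 1)}
penalty_bits = conditional_mutual_information(xor_joint)
print(penalty_bits)
```

In the toy case D and M are independent on their own, yet observing Y makes each fully informative about the other. That is the explaining-away effect the penalty is meant to capture.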

Full experiment: Quantum Hardware Test →
Penalty: 13/13 > 0
Peak: depth 2
Decomp error: 0.0
QUANTUM · WAVE FUNCTION COLLAPSE

Collapse as Explaining-Away Penalty (Test 7)

Weak measurement sweep on IBM Fez (Heron). Penalty grows monotonically from 0 to 0.125 bits as measurement coupling increases from zero to projective. 3 qubits, 4 prep states × 4 mechanisms × 11 strength levels, 176K shots. Wave function collapse IS the explaining-away penalty at maximum measurement strength. Spearman ρ=0.973, p=5.1×10⁻⁷.

Full experiment: Weak Measurement Sweep →
Spearman ρ: 0.973
Kill conditions: 4/4 PASS
Peak penalty: 0.125 bits
QUANTUM · BARRIER UNIVERSALITY

Barrier Height vs π/√2 (EXP-026)

Repetition codes (d=3–21) with MWPM decoding. Exponential error suppression coefficient converted to geodesic units on the Bernoulli manifold. Ratio to π/√2 approaches 0.95 at asymptotic limit (p→0). Exponential fits R² > 0.99. Surface code normalization requires threshold-independent mapping for cross-family comparison.

Ratio: 0.95× π/√2
Fit: R² > 0.99