🤖 Technology · Kalshi · 55/100 confidence

Will an AI achieve >85% performance on the FrontierMath benchmark before 2028?

Alpha Opportunity

40/100
Market Price: 69% (Kalshi)
Analyst Estimate: 35% (analyst research)
Your Edge: +34.0% (bet: sell)
Recommended: NO (0% APY)
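
The edge figure above is simple arithmetic, sketched here for clarity. This is a minimal illustration, not the site's code: the function name `edge_points` is hypothetical, and the NO price is assumed to be 100% minus the quoted YES price, ignoring fees and spread.

```python
def edge_points(market_yes: float, analyst_yes: float) -> float:
    """Gap between market and analyst YES probabilities, in percentage points."""
    return round((market_yes - analyst_yes) * 100, 1)


market_yes = 0.69   # market price from the page
analyst_yes = 0.35  # analyst estimate from the page

edge = edge_points(market_yes, analyst_yes)

# A positive gap means the market prices YES above the analyst estimate,
# so the position that benefits is NO (equivalently, selling YES).
side = "NO" if edge > 0 else "YES"

# Assumption: the NO contract costs the complement of the YES price.
no_price = round(1 - market_yes, 2)

print(f"edge = {edge:+.1f} points, side = {side}, assumed NO price = {no_price:.2f}")
```

Note that the edge is a difference in percentage points, not a relative return.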

Alpha Thesis

📊 Dr. Sarah Chen · ⚖️ James Kowalski · 🔬 Dr. Aisha Patel · 🧠 Marcus Webb
Updated 2026-03-16
Confidence: 55/100
📊 Free Summary

We believe the Kalshi contract for AI achieving >85% on FrontierMath before 2028 is overvalued at 69%; our estimate is 35%. GPT-5.4 currently leads at 47.6% overall and 50% on Tiers 1-3. Progress has been remarkable (from <2% to 47.6% in ~18 months), but reaching 85% requires nearly doubling current performance, including on the hardest, research-level Tier 4 problems, where the best score currently stands at 38%. The rate of improvement is decelerating as the remaining problems get harder.

📐Key Metrics

1. The Best Score: 47.6% current. GPT-5.4 leads FrontierMath at 47.6%. Reaching 85% means solving nearly all easy/medium problems AND most hard ones.
2. The Hard Problem Wall: 38% on Tier 4. On research-level Tier 4 problems, the best AI scores 38%. These are the problems that prevent reaching 85%.
3. The Optimism Gap: 69% vs. 35%. The market extrapolates from rapid early progress. But benchmark saturation typically slows dramatically as easy gains are exhausted.

Key Findings

  • Progress Has Been Remarkable But Decelerating: From <2% to 47.6% in ~18 months. But gains from 50% to 85% are historically harder than gains from 0% to 50%.
  • Tier 4 Is the Bottleneck: Research-level problems, at a 38% success rate, require genuine mathematical creativity that current LLMs struggle with.
  • Benchmark Saturation Pattern: AI benchmarks typically show the same pattern of rapid initial gains followed by an asymptotic slowdown. We expect FrontierMath to follow it.
  • 2 Years Is Short: From March 2026 to January 2028, AI must nearly double its FrontierMath performance in roughly 2 model generations.
  • 42% of Tier 4 Solved At Least Once: This figure counts a problem as solved if any attempt succeeds; a consistent >85% requires solving these problems reliably, not just occasionally.

Alpha Quality Factors

Criteria that determine how exploitable this mispricing is

Edge Magnitude: +34.0% raw edge, strong mispricing (score: 100)
Liquidity Health: $5K available, thinner market, size carefully (score: 0)
Volume Activity: $3K 24h volume, lower activity, watch for stale pricing (score: 0)
Time Value: Expires in 12 months, longer horizon, more uncertainty (score: 40)
Analyst Confidence: 55/100 confidence, moderate conviction (score: 60)
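
The page does not say how the five factor scores combine into the 40/100 Alpha Opportunity figure. As a hedged observation, an unweighted mean of the listed scores gives exactly 40, which the sketch below checks. The equal weighting and the name `composite_score` are assumptions made for illustration, not a documented formula.

```python
# Factor scores as listed under "Alpha Quality Factors".
factor_scores = {
    "Edge Magnitude": 100,
    "Liquidity Health": 0,
    "Volume Activity": 0,
    "Time Value": 40,
    "Analyst Confidence": 60,
}


def composite_score(scores: dict[str, int]) -> float:
    """Unweighted mean of the factor scores (assumed aggregation rule)."""
    return sum(scores.values()) / len(scores)


print(f"composite = {composite_score(factor_scores):.0f}/100")
```

Under this reading, the two zero scores for liquidity and volume are what pull a maximal edge score down to a middling overall rating.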

Human Bias Detected

Cognitive biases creating this alpha opportunity

🧠 Information Asymmetry

The crowd may lack specialized knowledge that narrows the true probability range.

Market Data

Liquidity: $5K
24h Volume: $3K
Expected Return: 0.0%
Annualized APY: 0%
Time to Expiry: 12 months
Risk Level: medium

Position Sizing

Kelly Criterion (per $1,000 bankroll)

Full Kelly: $153 (15.3%)
½ Kelly ★: $76 (7.6%)
¼ Kelly: $38 (3.8%)
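
The page does not state how these Kelly figures are derived. As a hedged sketch, the displayed 15.3% is reproduced by dividing the 34-point probability edge by the net odds of a NO contract priced at 31¢ (an assumption: 100¢ minus the 69¢ YES price, no fees). The textbook Kelly fraction, (b*p - q)/b, is computed alongside for comparison and comes out larger. The function names and the choice of formula are illustrative, not the site's method.

```python
def net_odds(price: float) -> float:
    """Profit per dollar staked if a binary contract bought at `price` pays $1."""
    return (1 - price) / price


def displayed_fraction(p_win: float, price: float) -> float:
    """Probability edge divided by net odds (assumed; matches the page's 15.3%)."""
    return (p_win - price) / net_odds(price)


def textbook_kelly(p_win: float, price: float) -> float:
    """Standard Kelly fraction f = (b*p - q) / b for a binary bet."""
    b = net_odds(price)
    return (b * p_win - (1 - p_win)) / b


p_no = 0.65      # analyst probability of NO (1 - 0.35)
no_price = 0.31  # assumed NO price (1 - 0.69)
bankroll = 1000

full = displayed_fraction(p_no, no_price)
for label, mult in (("Full", 1.0), ("Half", 0.5), ("Quarter", 0.25)):
    frac = full * mult
    print(f"{label} Kelly: ${bankroll * frac:.0f} ({frac:.1%})")

print(f"Textbook Kelly for comparison: {textbook_kelly(p_no, no_price):.1%}")
```

Whichever formula is used, the fractional rows are the full-Kelly stake scaled by one half and one quarter.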

Payoff Scenarios

Invest    Win       Lose
$100      +$223     -$100
$250      +$556     -$250
$500      +$1113    -$500
$1000     +$2226    -$1000
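
The payoff rows are consistent with buying NO contracts at 31¢ that pay $1 each if the market resolves NO. A minimal sketch of that arithmetic, assuming a 31¢ NO price (100¢ minus the 69¢ YES price) and no fees; `profit_if_win` is a hypothetical helper name.

```python
def profit_if_win(stake: float, price: float) -> float:
    """Net profit when contracts bought at `price` each pay out $1."""
    contracts = stake / price  # number of contracts the stake buys
    return contracts * 1.0 - stake


no_price = 0.31  # assumed NO price, fees ignored

for stake in (100, 250, 500, 1000):
    win = profit_if_win(stake, no_price)
    # If the market resolves YES, the whole stake is lost.
    print(f"Invest ${stake}: win +${win:.0f}, lose -${stake}")
```

The loss column is always the full stake, since a losing binary contract settles at zero.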

Analysis Team

📊 Dr. Sarah Chen, Lead Quantitative Analyst
⚖️ James Kowalski, Risk & Position Strategist
🔬 Dr. Aisha Patel, Domain Research Lead
🧠 Marcus Webb, Behavioral Finance Specialist