🤖 Technology · Kalshi · 55/100 confidence

Will an AI achieve >85% performance on the FrontierMath benchmark before 2028?

Alpha Opportunity

40/100

Market Price: 69% (Kalshi)

Analyst Estimate: 35% (Analyst research)

Your Edge: +34.0% (Bet sell)

Recommended: NO (0% APY)

Alpha Thesis

📊 Dr. Sarah Chen · ⚖️ James Kowalski · 🔬 Dr. Aisha Patel · 🧠 Marcus Webb
Updated 2026-03-16

55/100

📊 Free Summary

We believe the Kalshi contract for AI achieving >85% on FrontierMath before 2028 is overvalued at 69%; our estimate is 35%. GPT-5.4 currently leads at 47.6% overall and 50% on Tiers 1-3. Progress has been remarkable (from <2% to 47.6% in ~18 months), but reaching 85% requires nearly doubling current performance, including on the hardest, research-level Tier 4 problems, where the best score currently stands at 38%. The rate of improvement is decelerating as the remaining problems get harder.
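The edge quoted above follows from simple arithmetic on the two probabilities. A minimal sketch, assuming the contract pays $1, that a NO share costs one minus the YES price, and that fees and spread are ignored (the function name is illustrative, and the expected-value figure holds only if the analyst estimate is correct):

```python
def no_side_edge(market_yes: float, analyst_yes: float) -> dict:
    """Compare market and analyst probabilities from the NO buyer's view."""
    no_price = 1.0 - market_yes        # assumed cost of one NO share
    analyst_no = 1.0 - analyst_yes     # analyst probability that NO resolves
    edge = analyst_no - no_price       # same as market_yes - analyst_yes
    # Expected profit per $1 staked on NO, held to resolution (assumption: no fees).
    ev_per_dollar = analyst_no / no_price - 1.0
    return {
        "no_price": no_price,
        "analyst_no": analyst_no,
        "edge": edge,
        "ev_per_dollar": ev_per_dollar,
    }


result = no_side_edge(market_yes=0.69, analyst_yes=0.35)
print(f"NO price: {result['no_price']:.2f}")
print(f"Edge:     {result['edge']:+.1%}")
print(f"EV per $1 staked: {result['ev_per_dollar']:+.2f}")
```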

📐 Key Metrics

The Best Score (47.6% current): GPT-5.4 leads FrontierMath at 47.6%. Reaching 85% means solving nearly all easy/medium problems AND most hard ones.

The Hard Problem Wall (38% on Tier 4): On research-level Tier 4 problems, the best AI scores 38%. These are the problems that prevent reaching 85%.

The Optimism Gap (69% vs. 35%): The market extrapolates from rapid early progress. But benchmark saturation typically slows dramatically as easy gains are exhausted.

Key Findings

Progress Has Been Remarkable But Decelerating: From <2% to 47.6% in 18 months. But gains from 50% to 85% are historically harder than gains from 0% to 50%.
Tier 4 Is the Bottleneck: Research-level problems, at a 38% success rate, require genuine mathematical creativity that current LLMs struggle with.
Benchmark Saturation Pattern: AI benchmarks typically show rapid initial gains followed by an asymptotic slowdown. We expect FrontierMath to follow this pattern.
2 Years Is Short: Less than two years remain, from March 2026 to January 2028, in which AI must nearly double its FrontierMath score. That is roughly 2 model generations.
42% of Tier 4 Solved At Least Once: This figure counts any single successful attempt. A consistent >85% score requires solving these problems reliably, not just occasionally.
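To illustrate why Tier 4 is the bottleneck, one can treat the overall score as a weighted mix of the Tiers 1-3 score and the Tier 4 score. This is an assumption, not a documented scoring rule: the weight below is backed out from the three figures quoted in the summary (47.6% overall, 50% on Tiers 1-3, 38% on Tier 4), and the function names are illustrative.

```python
def implied_weight(overall: float, t13: float, t4: float) -> float:
    """Share of the overall score attributed to Tiers 1-3, assuming a linear mix."""
    return (overall - t4) / (t13 - t4)


def required_t13(target: float, t4: float, w: float) -> float:
    """Tiers 1-3 score needed to reach `target` overall at a given Tier 4 score."""
    return (target - (1.0 - w) * t4) / w


# Weight implied by the quoted figures (assumption: overall = w*t13 + (1-w)*t4).
w = implied_weight(overall=0.476, t13=0.50, t4=0.38)
print(f"Implied Tiers 1-3 weight: {w:.2f}")

# Hypothetical Tier 4 scenarios, chosen only for illustration.
for t4 in (0.38, 0.60, 0.85):
    need = required_t13(target=0.85, t4=t4, w=w)
    print(f"Tier 4 at {t4:.0%} -> Tiers 1-3 must reach {need:.1%}")
```

Under these assumptions, a model stuck at 38% on Tier 4 would need roughly 97% on Tiers 1-3 to clear 85% overall.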

Alpha Quality Factors

Criteria that determine how exploitable this mispricing is

Edge Magnitude: +34.0% raw edge (strong mispricing)

100

Liquidity Health: $5K available (thinner market, size carefully)

Volume Activity: $3K 24h volume (lower activity, watch for stale pricing)

Time Value: Expires in 12 months (longer horizon, more uncertainty)

Analyst Confidence: 55/100 confidence (moderate conviction)

Human Bias Detected

Cognitive biases creating this alpha opportunity

🧠 Information Asymmetry

The crowd may lack specialized knowledge that narrows the true probability range.

Market Data

Liquidity: $5K

24h Volume: $3K

Expected Return: 0.0%

Annualized APY: 0%

Time to Expiry: 12 months

Risk Level: medium

Position Sizing

Kelly Criterion (per $1,000 bankroll)

Full Kelly: $153 (15.3%)

½ Kelly ★: $76 (7.6%)

¼ Kelly: $38 (3.8%)
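For reference, a sketch of Kelly sizing on a binary contract, assuming a NO share costs 31¢ (one minus the 69% market price), a 65% analyst probability that NO resolves, and no fees. The textbook fraction is f = (b·p − q) / b, where b is the net odds. The page does not state its formula; the listed amounts are consistent with edge divided by net odds, which is an inference rather than a confirmed method, and that figure is smaller than the textbook fraction. Function names are illustrative.

```python
def net_odds(price: float) -> float:
    """Profit per $1 staked if the position wins (assumes a $1 payout, no fees)."""
    return (1.0 - price) / price


def kelly_fraction(p_win: float, price: float) -> float:
    """Textbook Kelly fraction f = (b*p - q) / b, floored at zero."""
    b = net_odds(price)
    q = 1.0 - p_win
    return max(0.0, (b * p_win - q) / b)


def edge_over_odds(p_win: float, price: float) -> float:
    """Edge divided by net odds: an inferred match for the listed figures."""
    return max(0.0, (p_win - price) / net_odds(price))


bankroll = 1_000
p_no, no_price = 0.65, 0.31

listed = edge_over_odds(p_no, no_price)
for label, scale in (("Full", 1.0), ("Half", 0.5), ("Quarter", 0.25)):
    print(f"{label:8s} ${bankroll * listed * scale:,.0f} ({listed * scale:.1%})")

print(f"Textbook Kelly fraction: {kelly_fraction(p_no, no_price):.1%}")
```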

Payoff Scenarios

Invest    Win        Lose
$100      +$223      -$100
$250      +$556      -$250
$500      +$1113     -$500
$1000     +$2226     -$1000
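The payoff rows can be reproduced from the NO share price. A minimal sketch, assuming a NO share costs 31¢ (one minus the 69% market price), each share pays $1 if NO resolves, and fees are ignored (the function name is illustrative):

```python
def payoff(stake: int, price: float) -> tuple[int, int]:
    """Return (profit if the position wins, loss if it loses), in whole dollars."""
    shares = stake / price            # number of shares the stake buys
    win = shares * 1.0 - stake        # each winning share pays $1
    return round(win), -stake         # a losing position forfeits the stake


for stake in (100, 250, 500, 1000):
    win, lose = payoff(stake, price=0.31)
    print(f"${stake:<6} +${win:<6} -${abs(lose)}")
```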

Analysis Team

📊 Dr. Sarah Chen, Lead Quantitative Analyst

⚖️ James Kowalski, Risk & Position Strategist

🔬 Dr. Aisha Patel, Domain Research Lead

🧠 Marcus Webb, Behavioral Finance Specialist
