The Ambiguity Taxonomy: what 21 settlement disputes teach about contract language
Roughly $1.1–1.2B of reported volume sat on 2025–26 markets that ended in settlement controversy. The failures cluster into five repeatable language patterns — and all five are detectable before listing.
Trevor Sardis · Jun 11, 2026 · 9 minute read · sourced & dated (verification-log discipline)
Prediction markets price the world remarkably well and settle it surprisingly badly. Polymarket alone has logged 1,150+ disputed markets in 2026 year-to-date per The Defiant's count (Jun 1, 2026) — already past its full-year 2025 total. (Brierly now computes its own daily dispute tally from public Polymarket data; the method is on the methodology page and the live figure is on the disputes page.) The Wall Street Journal's May 2026 investigation found that in most disputed markets, more than half of UMA oracle votes came from the ten largest wallets, and roughly one in five disputes had at least one voter with a financial stake in the market being judged.
We coded twenty-one 2025–26 disputes — every major public case from the $237M Zelenskyy-suit market to the $54M Khamenei class action — against the rules text that produced them. Five failure types explain the entire record:
- Source dependence (6 of 21): resolution hinges on named outlets or dual-government confirmation. The Sanders-rally markets resolved No despite video evidence because closed-press logistics meant no qualifying outlet covered it. The $203.6M Iran-ceasefire-extension market died on 'confirmation from both governments' when Iran never spoke in its own voice.
- Definitional vagueness (11 of 21): 'suit,' 'ban,' 'invasion,' 'go live,' 'drop out' — each term settled nine-figure volume without a definition. This is the largest single bucket by both count and dollars.
- Scalar carve-outs (1 of 21, recurring pattern): first-print rules on revision-prone data. The 2025 Oscars market settled No on an initial 18M viewership print; the final figure was 19.7M, on the other side of the strike.
- Disclosure timing (2 of 21): in the Strategy-Bitcoin market, $60–85M turned on whether an 8-K filed June 1 could confirm a sale that happened inside the May window. Event-time versus disclosure-time is the sharpest open drafting question in the asset class.
- Oracle manipulation (1 of 21 + ambient structure): the Ukraine-minerals 'governance attack' (~5M UMA tokens across three accounts forcing a false Yes) plus the WSJ concentration findings above.
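As a quick arithmetic check on the buckets above, the counts can be tallied and converted to count-weighted shares. This is only a sketch: the dictionary keys are hypothetical labels chosen for illustration, and the counts are the ones quoted in the list. Dollar-weighted shares are not computed here because the list does not give per-bucket notionals.

```python
# Counts per failure type as quoted in the list above.
# The key names are illustrative labels, not identifiers from the database.
counts = {
    "source_dependence": 6,
    "definitional_vagueness": 11,
    "scalar_carve_outs": 1,
    "disclosure_timing": 2,
    "oracle_manipulation": 1,
}

# The five buckets should account for every coded dispute.
total = sum(counts.values())

# Count-weighted share of each bucket, in percent, rounded to one decimal.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Largest bucket by count.
largest = max(counts, key=counts.get)

if __name__ == "__main__":
    print(f"total coded disputes: {total}")
    for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:<24} {counts[name]:>2}  {pct:>5}%")
    print(f"largest bucket: {largest}")
```

The five buckets sum to 21, so each coded dispute is assigned to exactly one primary failure type, and definitional vagueness alone accounts for just over half of the record by count.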
Two structural observations follow. First, identical clause architectures resolved in opposite directions within weeks — the Iran extension (No) and the Hezbollah extension (Yes) used mirrored dual-confirmation language read by the same voter set. When the same ambiguity can break either way, price was never measuring the world; it was measuring the adjudicator. Second, dispute frequency is category-skewed: mention/speech markets are the most disputed family in the record, while league championships and certified election results are near-canonical.
Both observations are mechanically detectable in rules text before listing. That is what RuleScore does: a deterministic rubric scores every market 0–100 across the five failure types, quotes the exact language that triggers each flag, and publishes the methodology. The grades are editorial opinions about contract language — risk of dispute, never who is right.
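To make "deterministic rubric" concrete, here is a minimal sketch of how a rules-text scanner of this general kind could work. It is not RuleScore's actual rubric, which is published on the Methodology page. The trigger patterns, the flat 20-point penalty, and the convention that a higher score means cleaner language are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger patterns, one list per failure type.
# These are illustrative guesses, not the published rubric.
PATTERNS = {
    "source_dependence": [
        r"confirm(?:ation|ed)\s+(?:from|by)\s+both",
        r"\bcredible\s+report(?:ing|s)?\b",
    ],
    "definitional_vagueness": [
        r"\b(?:suit|ban|invasion|go(?:es)?\s+live|drops?\s+out)\b",
    ],
    "scalar_carve_outs": [
        r"\b(?:initial|first|preliminary)\s+(?:print|figure|estimate|release)\b",
    ],
    "disclosure_timing": [
        r"\b8-K\b",
        r"\bdate\s+of\s+(?:announcement|disclosure|filing)\b",
    ],
    "oracle_manipulation": [
        r"\bUMA\b",
    ],
}

# Assumed flat penalty: five failure types x 20 points spans 0-100.
PENALTY = 20


@dataclass(frozen=True)
class Flag:
    failure_type: str
    quote: str  # exact text from the rules that triggered the flag


def score_rules(text: str) -> tuple[int, list[Flag]]:
    """Return (score, flags). Same input always yields the same output."""
    flags = []
    for failure_type, patterns in PATTERNS.items():
        for pattern in patterns:
            match = re.search(pattern, text, flags=re.IGNORECASE)
            if match:
                flags.append(Flag(failure_type, match.group(0)))
                break  # at most one flag per failure type
    # Assumption: 100 means no flags raised, 0 means all five raised.
    return 100 - PENALTY * len(flags), flags


if __name__ == "__main__":
    rules = (
        "This market resolves Yes upon confirmation from both "
        "governments that the ceasefire is extended."
    )
    score, flags = score_rules(rules)
    print(score)  # 80
    for flag in flags:
        print(flag.failure_type, "->", repr(flag.quote))
```

Because every flag carries the quoted trigger text, a reader can audit each deduction against the contract language. A keyword scan like this will over-flag terms that the rules go on to define, so a real rubric would need to check for accompanying definitions as well.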
The full coded database — venue, date, notional, failure type, outcome, sources — is public on the Dispute Database page, and the rubric is on the Methodology page. The white-paper version with dollar-weighted frequency tables is in preparation for SSRN.
Registered calls in this note
Calls are probability-stamped at publication and scored against venue resolution on the Track Record page. The record includes the misses.