Does a low grade actually predict a dispute?

This is the question that turns an opinion into a rating. This page reports the honest status of the work to answer it, including what we can't claim yet.

Status: study in progress. We do not publish an accuracy figure (e.g. “low grades dispute N× more often”) until it is computed from real data with a control sample. There is no such number on this site yet — by design.

Why this is the highest-value work

RuleScore is a deterministic opinion about contract language. It becomes a rating only when the grades carry information about real outcomes — disputes, resolution delays, rule changes, refunds. The whole firm rests on demonstrating that link honestly, with stated N, dates, method, and limitations.

The honest constraint

A clean out-of-sample backtest needs each market's rules text exactly as it was listed, point-in-time. Brierly largely lacks that historical record, because the point-in-time recorder was scoped down for data-rights reasons. So we do not pretend to a backtest we can't run. Instead, we use the three-part method below.

The three-part method

Component	What it is	Status
1 · Retrospective calibration	Score each coded dispute case on the information available at listing (rules text as quoted in sources), blind to outcome where possible, and read dispute/delay rates by grade band.	coding in progress: 21 disputed cases coded; re-scoring on as-listed text underway
2 · Control sample	Code a comparison set of non-disputed high-volume markets — the denominator. Without controls, a “disputes-only” table proves nothing, so no “N× more likely” claim is made until this exists.	required before any ratio claim
3 · Forward (out-of-sample) log	Log live grades with timestamps for active markets now, so in a few months there is a clean, pre-registered test that needs no historical reconstruction.	running: grades are hash-stamped on every scan in the rules-version log; see RuleScore → “Rules changed recently”
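
To make components 1 and 2 concrete, here is a minimal Python sketch of reading dispute rates by grade band. It is an illustration, not Brierly's pipeline: the `CodedMarket` record, the band labels, and the function name are all hypothetical, and the sample rows are synthetic placeholders rather than study data. The one design point it tries to capture is that the function refuses to return rates when no control markets are present, since a disputes-only table has no denominator.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedMarket:
    """One coded market. All field names are hypothetical."""
    market_id: str
    grade_band: str  # illustrative band label, not the real RuleScore scale
    disputed: bool   # True for a dispute case, False for a control


def dispute_rate_by_band(markets):
    """Return {band: (disputed_count, total_count, rate)}.

    Raises ValueError if the sample contains no non-disputed controls,
    because every band would then trivially show a 100% dispute rate.
    """
    markets = list(markets)
    if not any(not m.disputed for m in markets):
        raise ValueError("no control markets: rates by band are undefined")
    totals = Counter(m.grade_band for m in markets)
    disputed = Counter(m.grade_band for m in markets if m.disputed)
    return {
        band: (disputed[band], n, disputed[band] / n)
        for band, n in sorted(totals.items())
    }


# Synthetic placeholder rows, NOT real cases or results.
sample = [
    CodedMarket("m1", "A", False),
    CodedMarket("m2", "A", False),
    CodedMarket("m3", "A", True),
    CodedMarket("m4", "D", True),
    CodedMarket("m5", "D", False),
]
rates = dispute_rate_by_band(sample)
```

Note that if dispute cases and controls are sampled separately, the per-band rates depend on the sampling ratio, so any published comparison would need to state how the controls were drawn.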

The forward log is the most important compounding asset here: every day of live grades is out-of-sample evidence that can't be reconstructed later. It is already accruing.
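
As a rough sketch of what hash-stamping a grade on each scan could look like, the Python below records a SHA-256 digest of the rules text alongside the grade and a UTC timestamp. The record layout and the names `stamp_grade` and `rules_changed` are assumptions made for illustration; the actual rules-version log format is not described on this page.

```python
import hashlib
from datetime import datetime, timezone


def stamp_grade(market_id, rules_text, grade, scanned_at=None):
    """Build one log entry (hypothetical layout) for a single scan."""
    scanned_at = scanned_at or datetime.now(timezone.utc)
    return {
        "market_id": market_id,
        "grade": grade,
        # Digest of the exact rules text seen at scan time.
        "rules_sha256": hashlib.sha256(rules_text.encode("utf-8")).hexdigest(),
        "scanned_at": scanned_at.isoformat(),
    }


def rules_changed(previous, current):
    """True if the rules text differs between two log entries."""
    return previous["rules_sha256"] != current["rules_sha256"]


# Illustrative usage with made-up rules text.
first = stamp_grade(
    "m1", "Resolves YES if X.", "B",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
second = stamp_grade(
    "m1", "Resolves YES if X or Y.", "C",
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)
changed = rules_changed(first, second)
```

Storing the digest rather than relying on memory of the text is what lets a later reader confirm which version of the rules a logged grade referred to.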

What you can check today

The 21-case dispute database — every case sourced and dated, the evidence the rubric is built on
The methodology — the five failure modes, bands, and weighting, reproducible by anyone
The Brier-scored call record — a separate, already-running honesty mechanism on the research side
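
For readers unfamiliar with the term, a Brier score is the mean squared difference between a stated probability and the 0/1 outcome; lower is better. The Python below is a generic sketch of that standard formula with made-up inputs. It is not the code behind the call record, and the function name is illustrative.

```python
def brier_score(forecasts):
    """Mean squared error of (probability, outcome) pairs, outcome in {0, 1}."""
    pairs = list(forecasts)
    if not pairs:
        raise ValueError("at least one forecast is required")
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


# Made-up calls: always saying 50% scores 0.25; perfect calls score 0.0.
coin_flip = brier_score([(0.5, 1), (0.5, 0)])
perfect = brier_score([(1.0, 1), (0.0, 0)])
```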

When it reads out

Results will be published here only if the relationship holds, with exact N, dates, method, and caveats. If it doesn't hold, that's a model-improvement signal, not a press release. Either way, the number you eventually see will be computed, not asserted.


Brierly grades are independent editorial opinions about contract structure and resolution-ambiguity risk — not predictions of any event's outcome and not recommendations to buy, sell, or hold any contract.