Does a low grade actually predict a dispute?
Why this is the highest-value work
RuleScore is a deterministic opinion about contract language. It becomes a rating only when the grades carry information about real outcomes — disputes, resolution delays, rule changes, refunds. The whole firm rests on demonstrating that link honestly, with stated N, dates, method, and limitations.
The honest constraint
A clean out-of-sample backtest needs each market's rules text exactly as it was at listing, captured point-in-time. Brierly largely lacks that record for past markets, because the point-in-time recorder was scoped down for data-rights reasons. So we don't claim a backtest we can't run. Instead, we use a three-part method.
The three-part method
| Component | What it is | Status |
|---|---|---|
| 1 · Retrospective calibration | Score each coded dispute case on the information available at listing (rules text as quoted in sources), blind to outcome where possible, and read dispute/delay rates by grade band. | Coding in progress: 21 disputed cases coded; re-scoring on as-listed text underway |
| 2 · Control sample | Code a comparison set of non-disputed high-volume markets — the denominator. Without controls, a “disputes-only” table proves nothing, so no “N× more likely” claim is made until this exists. | Required before any ratio claim |
| 3 · Forward (out-of-sample) log | Log live grades with timestamps for active markets now, so in a few months there is a clean, pre-registered test that needs no historical reconstruction. | Running: grades are hash-stamped on every scan in the rules-version log; see RuleScore → “Rules changed recently” |
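To make components 1 and 2 concrete, here is a minimal Python sketch of reading dispute counts by grade band once a control sample exists. The `CodedMarket` record, its field names, and the band labels are hypothetical, and the odds ratio is our illustrative choice of statistic, not a stated Brierly method. Because disputed cases and controls are sampled separately, the share of disputes within a band depends on how many controls were drawn, so an odds ratio is the usual quantity this kind of case-control design can support. With no controls, every control cell is zero and the function returns `None`, which is the code version of "no ratio claim until the control sample exists."

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class CodedMarket:
    """One coded market (hypothetical record shape, for illustration only)."""
    market_id: str
    grade_band: str  # hypothetical band label, e.g. "A" (best) to "E" (worst)
    disputed: bool   # True: dispute-database case; False: control market


def band_table(markets: Iterable[CodedMarket]) -> Dict[str, Tuple[int, int]]:
    """Count (disputed, control) markets in each grade band."""
    disputed: Counter = Counter()
    controls: Counter = Counter()
    for m in markets:
        if m.disputed:
            disputed[m.grade_band] += 1
        else:
            controls[m.grade_band] += 1
    bands = sorted(set(disputed) | set(controls))
    return {b: (disputed[b], controls[b]) for b in bands}


def dispute_odds_ratio(
    markets: Iterable[CodedMarket], low_band: str, high_band: str
) -> Optional[float]:
    """Odds of dispute in low_band relative to high_band, or None if undefined.

    Any zero cell makes the ratio undefined. With a disputes-only sample the
    control counts are all zero, so no ratio is ever returned.
    """
    table = band_table(markets)
    d_low, c_low = table.get(low_band, (0, 0))
    d_high, c_high = table.get(high_band, (0, 0))
    if 0 in (d_low, c_low, d_high, c_high):
        return None
    return (d_low / c_low) / (d_high / c_high)
```

On toy records (two disputed and one control market in band "E", one disputed and two controls in band "A"), `dispute_odds_ratio(toy, "E", "A")` returns 4.0; dropping the controls makes it return `None`. These numbers are illustrative only, not Brierly data.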
The forward log is the most important compounding asset here: every day of live grades is out-of-sample evidence that can't be reconstructed later. It is already accruing.
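The page does not describe how the hash stamping works. One minimal way to make a forward log tamper-evident is a hash chain: each entry stores a SHA-256 fingerprint of the rules text as scanned plus the hash of the previous entry, so editing or back-dating any past grade breaks every later hash. The sketch below assumes that design; the field names, the `GENESIS` value, and all functions are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional

GENESIS = "0" * 64  # hypothetical placeholder "previous hash" for the first entry


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _entry_hash(body: Dict[str, str]) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return sha256_hex(json.dumps(body, sort_keys=True))


def append_grade(
    log: List[Dict[str, str]],
    market_id: str,
    grade: str,
    rules_text: str,
    scanned_at: Optional[datetime] = None,
) -> Dict[str, str]:
    """Append one scan's grade, chained to the hash of the previous entry."""
    scanned_at = scanned_at or datetime.now(timezone.utc)
    body = {
        "market_id": market_id,
        "grade": grade,
        "rules_sha256": sha256_hex(rules_text),  # fingerprint of rules text as scanned
        "scanned_at": scanned_at.isoformat(),
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    entry = {**body, "entry_hash": _entry_hash(body)}
    log.append(entry)
    return entry


def verify_log(log: List[Dict[str, str]]) -> bool:
    """Recompute the chain; an edited or reordered entry makes this False."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body.get("prev_hash") != prev or _entry_hash(body) != entry.get("entry_hash"):
            return False
        prev = entry["entry_hash"]
    return True
```

A change in `rules_sha256` between two scans of the same market is also a cheap signal that its rules text changed, though how RuleScore actually detects changes is not stated here.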
What you can check today
- The 21-case dispute database — every case sourced and dated, the evidence the rubric is built on
- The methodology — the five failure modes, bands, and weighting, reproducible by anyone
- The Brier-scored call record — a separate, already-running honesty mechanism on the research side
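For readers unfamiliar with the term, the Brier score of a set of binary probability calls is the mean squared difference between each stated probability and the 0/1 outcome: 0 is a perfect record, and always forecasting 50% scores 0.25. A minimal sketch, assuming binary calls (the page does not say whether the call record uses the binary or the multi-category form):

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    0.0 is a perfect record; always forecasting 0.5 scores 0.25.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    if any(not 0.0 <= p <= 1.0 for p in forecasts):
        raise ValueError("forecasts must be probabilities in [0, 1]")
    if any(o not in (0, 1) for o in outcomes):
        raise ValueError("outcomes must be 0 or 1")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)
```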
When it reads out
Results will be published here only if the relationship holds, with exact N, dates, method, and caveats. If it doesn't hold, that's a model-improvement signal, not a press release. Either way, the number you eventually see will be computed, not asserted.