Methodology

How Daybreak works

Every probability on every surface is sourced from real money. This page describes where that data comes from, how we process it, and what we will never do with it.

Data sources

Polymarket

The primary prediction-market source at foundation. We ingest all active markets via the Gamma API (metadata) and the CLOB prices-history API (daily-close price history), updated hourly. As of writing, 999 active markets are tracked across six categories: politics, macro, geopolitics, crypto, tech, and other.

Category assignment is deterministic from Polymarket's own tagging with a small normalization layer applied on ingest. Market identity uses a deterministic string key “polymarket:<slug>” so upserts are idempotent.
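A minimal sketch of why a deterministic key makes upserts idempotent. The in-memory dict and the names market_key and upsert_market are illustrative assumptions, not Daybreak's actual storage layer.

```python
def market_key(slug: str) -> str:
    """Build the deterministic identity key for a Polymarket market."""
    return f"polymarket:{slug}"


def upsert_market(store: dict, slug: str, fields: dict) -> None:
    """Insert or update a market record under its deterministic key."""
    key = market_key(slug)
    # Merging into the existing record means replaying the same ingest
    # leaves the store unchanged.
    store[key] = {**store.get(key, {}), **fields}


store = {}
upsert_market(store, "example-market", {"category": "macro"})
upsert_market(store, "example-market", {"category": "macro"})  # replayed ingest
```

Because the key depends only on the slug, running the hourly ingest twice produces one record, not two.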

Coming next

Kalshi, Manifold, Metaculus, FRED macro series, and financial derivatives (rates futures, equity options) are added when the widget that needs them reaches the build queue. Multi-source divergence analysis — when prediction markets and futures-implied probabilities disagree — ships as its own widget and will be documented here when it does.

Calibration database

The calibration database answers: historically, when a prediction market in category X priced an event at probability P with T days to resolution, how often did that event actually occur?

Calibration is computed against a three-axis grid:

Axis	Buckets	Detail
Probability	10	10 pp bands — 0–10%, 10–20%, …, 90–100%
Time-to-resolution	4	≤7d, ≤30d, ≤90d, ≤365d
Category	6	politics, macro, geopolitics, crypto, tech, other

Total grid: 240 cells (6 × 10 × 4). Each cell requires at least n = 10 resolved markets before it is surfaced — below that it carries a low-confidence flag and is shown with an explicit caveat.

Foundation state: 758 resolved markets ingested, 128 / 240 cells populated. The database fills in over time as markets resolve under the live ingest pipeline. There is no shortcut — depth is the point.

Cell identifiers follow the format “polymarket:politics:0.70-0.80:30d” — source, category, probability band, time bucket. Every calibration annotation in the product traces back to a cell identifier so the claimed count is always verifiable.
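The identifier format can be sketched as a pure function of the four axes. This is an illustration under stated assumptions: bands are treated as half-open with 100% folded into the top band, and markets more than 365 days from resolution are treated as outside the grid. Neither boundary rule is specified above.

```python
# (upper limit in days, label) for the four time-to-resolution buckets
TIME_BUCKETS = [(7, "7d"), (30, "30d"), (90, "90d"), (365, "365d")]


def probability_band(p: float) -> str:
    """Map a probability in [0, 1] to its 10 pp band, e.g. 0.73 -> '0.70-0.80'."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    # Assumption: bands are [lo, hi), with 1.0 folded into the top band.
    idx = min(int(p * 10), 9)
    return f"{idx / 10:.2f}-{(idx + 1) / 10:.2f}"


def time_bucket(days_to_resolution: float):
    """Return the first bucket whose limit covers the remaining days."""
    for limit, label in TIME_BUCKETS:
        if days_to_resolution <= limit:
            return label
    return None  # assumption: beyond 365 days falls outside the grid


def cell_id(source: str, category: str, p: float, days_to_resolution: float):
    """Compose source, category, probability band, and time bucket."""
    bucket = time_bucket(days_to_resolution)
    if bucket is None:
        return None
    return f"{source}:{category}:{probability_band(p)}:{bucket}"


example = cell_id("polymarket", "politics", 0.73, 21)
```

Here a politics market priced at 73% with 21 days to run maps to the same cell as the example identifier quoted above.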

Reality Check

Reality Check takes a free-text claim and returns the deepest prediction market that prices it, the market's current implied probability, the 30-day Smart Money vs Crowd trajectory, and — when the calibration cell clears the n = 10 floor — the historical resolution rate for similar markets.

Matching. The claim is embedded with OpenAI's text-embedding-3-small (512 dimensions) and compared via cosine similarity to every active market's embedding. We keep the top 3 matches that clear a 0.55 similarity threshold and additionally have a liquidity score ≥ 0.5. When nothing clears the threshold, the page says so plainly rather than serving a weak match.
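The matching rule can be sketched as follows, assuming the claim and market embeddings are already computed. The dict fields id, embedding, and liquidity_score are hypothetical names, and treating "clears the threshold" as greater-than-or-equal is an assumption.

```python
import math

SIMILARITY_THRESHOLD = 0.55
LIQUIDITY_FLOOR = 0.5
TOP_K = 3


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_claim(claim_vec, markets):
    """Return up to TOP_K (similarity, market id) pairs, best first."""
    scored = []
    for m in markets:
        sim = cosine_similarity(claim_vec, m["embedding"])
        if sim >= SIMILARITY_THRESHOLD and m["liquidity_score"] >= LIQUIDITY_FLOOR:
            scored.append((sim, m["id"]))
    scored.sort(reverse=True)
    return scored[:TOP_K]


# Toy two-dimensional vectors stand in for the 512-dimension embeddings.
markets = [
    {"id": "a", "embedding": [1.0, 0.0], "liquidity_score": 0.9},
    {"id": "b", "embedding": [0.0, 1.0], "liquidity_score": 0.9},  # unrelated
    {"id": "c", "embedding": [1.0, 0.1], "liquidity_score": 0.2},  # too thin
]
matches = match_claim([1.0, 0.0], markets)
```

An empty result is the signal to say plainly that nothing matched, rather than to relax either floor.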

Interpretation. The one-sentence interpretation under the hero number is generated by Claude Haiku from a structured snapshot of the matched market. The model is instructed to emit prose only — every number on the page comes from the structured snapshot, not the model. A verifier rejects any output that contains a digit or numeric word and forces a corrective retry; if the verifier fails twice, the page renders the market data without a synthesis line rather than fudging it.
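A sketch of the verify-and-retry loop described above. The word list is an illustrative assumption (the real list is not published here), and generate is a hypothetical callable standing in for the model call.

```python
import re

# Assumption: a small illustrative set of cardinal number words.
NUMBER_WORDS = {
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety", "hundred", "thousand", "million",
}


def passes_verifier(text: str) -> bool:
    """Accept only prose with no digits and no numeric words."""
    if re.search(r"\d", text):
        return False
    words = re.findall(r"[a-z]+", text.lower())
    return not any(w in NUMBER_WORDS for w in words)


def interpret(snapshot: dict, generate, max_attempts: int = 2):
    """Return verified prose, or None after two failed attempts."""
    for attempt in range(max_attempts):
        # The second attempt is flagged as a corrective retry.
        text = generate(snapshot, corrective=attempt > 0)
        if passes_verifier(text):
            return text
    return None  # caller renders the market data without a synthesis line
```

Returning None rather than a best-effort string keeps the fallback explicit: the page shows structured data only.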

Calibration delta. The historical resolution rate is read from one cell of the calibration database described above — keyed on the market's source, category, implied-probability bucket, and time-to-resolution bucket. The delta displayed (“priced N pts high”) is the difference between the market's current implied probability and the historical realized rate for markets in that same cell. When the cell holds fewer than n = 10 resolved markets, the delta is replaced with an explicit limited-history caveat.
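The delta arithmetic, sketched under the assumption that a cell is stored as two counts. The field names resolved and occurred are hypothetical.

```python
MIN_RESOLVED = 10  # the n = 10 floor


def calibration_delta(implied_prob: float, cell):
    """Percentage-point gap between the current price and the realized rate.

    Positive means the market is priced high relative to history.
    Returns None when the cell is missing or below the floor.
    """
    if cell is None or cell["resolved"] < MIN_RESOLVED:
        return None  # caller shows the limited-history caveat instead
    realized = cell["occurred"] / cell["resolved"]
    return round((implied_prob - realized) * 100, 1)


# 26 of 40 similar markets resolved YES (65%); the market is priced at 75%.
delta = calibration_delta(0.75, {"resolved": 40, "occurred": 26})
```

In this example the delta is 10 points, which would display as "priced 10 pts high".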

Share-image gate. The branded PNG export is disabled until the calibration database holds at least 200 resolved markets across at least 4 categories. Below that, the page renders normally but the share button is greyed out — we will not put thin numbers on a public share-image footer.
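The gate reduces to two checks. A sketch, assuming resolved-market counts are available per category and that a category counts once it holds at least one resolved market (the second point is an assumption):

```python
MIN_RESOLVED_TOTAL = 200
MIN_CATEGORIES = 4


def share_image_enabled(resolved_by_category: dict) -> bool:
    """True once the database is deep and broad enough for the PNG export."""
    total = sum(resolved_by_category.values())
    categories = sum(1 for n in resolved_by_category.values() if n > 0)
    return total >= MIN_RESOLVED_TOTAL and categories >= MIN_CATEGORIES
```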

Rate limit. Five queries per IP per UTC day. We hash the IP with a server-side salt before storing the counter and never log the raw address. Anonymous use is the point until accounts arrive.
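A sketch of a salted, per-UTC-day counter. Using HMAC-SHA256 as the salted hash and an in-memory dict as the counter store are assumptions; the text above specifies only that the IP is hashed with a server-side salt.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from typing import Optional

DAILY_LIMIT = 5


def ip_digest(ip: str, salt: bytes) -> str:
    """Keyed hash of the IP; the raw address is never stored."""
    return hmac.new(salt, ip.encode("utf-8"), hashlib.sha256).hexdigest()


def allow_query(counters: dict, ip: str, salt: bytes,
                now: Optional[datetime] = None) -> bool:
    """Count one query against today's UTC bucket; False once the limit is hit."""
    now = now or datetime.now(timezone.utc)
    day = now.astimezone(timezone.utc).date().isoformat()
    key = (ip_digest(ip, salt), day)
    used = counters.get(key, 0)
    if used >= DAILY_LIMIT:
        return False
    counters[key] = used + 1
    return True
```

Keying the counter on the digest plus the UTC date means the allowance resets at midnight UTC without any cleanup job touching raw addresses.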

Anomaly baselines

The Biggest Movers strip ranks markets not by raw point-change, but by how unusual today's move is for that specific market. A 3-point move in a thin geopolitics market may be the first signal of something; a 3-point move in a liquid, high-volume market is often noise. Unusualness is measured as a z-score against each market's own baseline.

The window. Each market's baseline is its 30-day rolling-vol: the standard deviation of daily log-returns over the prior 30 trading days. A market's z-score on a given day is its log-return divided by that baseline. A score of 2.6σ means today's move was 2.6 times as large as a typical daily move for that market. The window is 30 days rather than 90 because Polymarket's market-age distribution makes a 90-day floor too exclusionary: it would drop 53% of the universe.
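The z-score can be sketched directly from that definition. Assumptions: closes are strictly positive, the sample standard deviation is used, and the history floor is enforced separately, so this function only requires enough prior returns to compute a deviation at all.

```python
import math
from statistics import stdev

WINDOW = 30  # prior trading days in the baseline


def log_returns(closes):
    """Daily log-returns from a list of closes, oldest first."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def z_score(closes):
    """Today's log-return divided by the stdev of the prior WINDOW returns."""
    returns = log_returns(closes)
    if not returns:
        return None
    today = returns[-1]
    prior = returns[:-1][-WINDOW:]  # excludes today's move from the baseline
    if len(prior) < 2:
        return None
    baseline = stdev(prior)
    if baseline == 0:
        return None  # a flat history gives no meaningful baseline
    return today / baseline
```

Excluding today's return from the baseline keeps a large move from inflating the deviation it is measured against.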

Exclusion floors. Three filters apply before a market appears in any strip:

Floor	Value	Why
History	≥ 30 daily closes	Below this the baseline stdev is too noisy to trust
Volume	≥ $5,000 24h volume	Below this, moves are toy-book noise on a single trade
Liquidity	liquidityScore ≥ 0.4	Below this, the order book is too thin for daily closes to be meaningful

Excluded markets are counted in the tile header (“N tracked · M excluded”) so the exclusion is transparent rather than silent.
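The three floors can be sketched as a filter that records why each market was excluded. liquidityScore is the field named in the table; dailyCloses and volume24h are hypothetical field names.

```python
MIN_DAILY_CLOSES = 30
MIN_VOLUME_24H_USD = 5_000
MIN_LIQUIDITY_SCORE = 0.4


def exclusion_reasons(market: dict) -> list:
    """Return the floors a market fails; an empty list means it qualifies."""
    reasons = []
    if market["dailyCloses"] < MIN_DAILY_CLOSES:
        reasons.append("history")
    if market["volume24h"] < MIN_VOLUME_24H_USD:
        reasons.append("volume")
    if market["liquidityScore"] < MIN_LIQUIDITY_SCORE:
        reasons.append("liquidity")
    return reasons


def split_universe(markets):
    """Partition markets into (qualifying, excluded) without dropping any."""
    qualifying, excluded = [], []
    for m in markets:
        (excluded if exclusion_reasons(m) else qualifying).append(m)
    return qualifying, excluded
```

Keeping the excluded list, rather than discarding it, is what makes the count in the tile header possible.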

Quiet Movers. The Quiet Movers strip surfaces markets where the z-score is less than 0.5 (“less than half a typical daily move”) but a known scheduled event lands within the next 7 days. The hypothesis: a market that hasn't repriced ahead of a known event is worth watching.

Events are drawn from a curated event_calendar.json: FOMC dates, SCOTUS ruling windows, elections, central bank meetings, and major economic releases, each tagged with the market IDs most likely to be affected. We do not infer event associations algorithmically; only manually tagged markets qualify. The limitation is stated honestly: if a market isn't in the calendar, it won't appear as a quiet mover, even if it should be.

Cold-start honesty. On the day the widget shipped, event-calendar entries had empty affectedMarketIds. The quiet-mover strip fills in over time as the calendar is manually curated. We track a quiet-mover hit rate — whether quiet markets actually reprice within 7 days of the listed event — as the moat-validation metric. It will not have enough data to be meaningful until 14+ days post-ship.
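The Quiet Movers rule can be sketched as a join between z-scores and the calendar. affectedMarketIds is the field named above; the date field and the use of the absolute z-score are assumptions.

```python
from datetime import date, timedelta

QUIET_Z = 0.5
LOOKAHEAD_DAYS = 7


def quiet_movers(z_scores: dict, events: list, today: date) -> list:
    """Market IDs that are quiet today and tagged to an event in the next 7 days."""
    horizon = today + timedelta(days=LOOKAHEAD_DAYS)
    tagged = set()
    for event in events:
        if today <= event["date"] <= horizon:
            tagged.update(event["affectedMarketIds"])
    # Only manually tagged markets qualify; untagged markets never appear.
    return sorted(m for m in tagged
                  if m in z_scores and abs(z_scores[m]) < QUIET_Z)


events = [
    {"date": date(2025, 3, 14), "affectedMarketIds": ["polymarket:a", "polymarket:b"]},
    {"date": date(2025, 4, 30), "affectedMarketIds": ["polymarket:c"]},
    {"date": date(2025, 3, 12), "affectedMarketIds": []},  # cold-start entry
]
z = {"polymarket:a": 0.2, "polymarket:b": 1.8, "polymarket:c": 0.1}
quiet = quiet_movers(z, events, today=date(2025, 3, 10))
```

An event with an empty affectedMarketIds list contributes nothing, which is why the strip starts empty and fills in as the calendar is curated.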

Anomalies — the six event types

The Anomalies feed answers one question — is anything fishy happening? — by naming what happened, in plain language, rather than collapsing everything into a single mystery score. Each card is one typed event: a specific, named pattern detected on one market on one day. A market can show more than one type on the same day — we keep them as separate stories rather than averaging them into one number.

All six detectors run continuously over every market. There is no single combined “anomaly score”: each type is its own signal, with its own evidence and its own track record.

Smart Money Appeared

Wallets with a strong settled-bet track record concentrated real money into a market today. Surfaces the wallet-accuracy database — each flagged wallet's lifetime record is shown next to its bet. A wallet with fewer than five settled bets is shown with an explicit thin-record caveat rather than as a clean win rate.

Quiet Reprice

The price moved meaningfully on a day with almost no trading — the market repriced quietly, without the volume you'd expect to accompany a real move. Surfaces the intraday price-vs-activity baseline (how much this market normally trades on a day it moves).

Outsized Move

Today's move was large relative to this market's own history: not a fixed point threshold, but a move that is rare for this specific market. Surfaces the per-market 30-day volatility baseline. We describe the rarity in plain frequency terms (“a move this big happens about once a month in this market”), never as a statistical score.

Early Discovery Surge

A young market suddenly drew a burst of attention and money far above its short history — the moment a market is “discovered.” Surfaces the new-market age-and-activity baseline.

Resolution Sprint · observation only

A market entered its final stretch and the price is sprinting toward an outcome as it closes. This is shown in the separate Closing now strip and carries no confidence number — it is an observation of what just happened as the market resolved, not a prediction.

One-Shot Whale · observation only

A single large bet from a wallet with no settled track record — a newcomer placing a big, unproven position. Carries no confidence number: with no history to weigh it against, we describe what we see (size, share of the day's money, wallet age) and draw no conclusion.

Categories & fairness. Markets are categorized using the source platform's own tags (politics, macro, geopolitics, crypto, tech, sports, other), not keyword guesses. Each signal is judged within its category — a busy sports market is measured against other sports markets, not against a quiet macro one — so one high-volume category can never crowd out the board.

Choosing what to show

The detectors are deliberately sensitive — on a busy day they flag far more real events than anyone should read. A calm “is anything fishy?” surface cannot open at hundreds of rows. So the feed ranks every event and shows only the most noteworthy few of each type, per category, per day.

How we rank. Two things move an event up the list: how surprising it is relative to what is normal for that type of event in that category, and how much real money stands behind it. A genuine eight-point move in a liquid market outranks the same move in a near-dead one; a wallet putting thousands on a position outranks one putting tens. Nothing is deleted: thin, low-money events simply rank below better-backed ones. We rank by a market's own volatility and liquidity baselines (which take months of resolved history to mature), so the ranking itself is built on proprietary data, not a generic “most recent” sort.
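The shape of this ranking can be sketched as a per-group top-k. Everything specific here is an assumption for illustration: the score (surprise multiplied by the log of dollars behind the event), the cap of three per group, and the field names. The actual weighting is not published on this page.

```python
import math
from collections import defaultdict

PER_GROUP = 3  # assumption: "the most noteworthy few" taken as three


def score(event: dict) -> float:
    """Hypothetical score: more surprise and more money both rank higher."""
    return event["surprise"] * math.log1p(event["dollars"])


def rank_feed(events: list, per_group: int = PER_GROUP) -> dict:
    """Keep the top events per (day, category, type); the input is not modified."""
    groups = defaultdict(list)
    for e in events:
        groups[(e["day"], e["category"], e["type"])].append(e)
    return {
        key: sorted(items, key=score, reverse=True)[:per_group]
        for key, items in groups.items()
    }
```

Because the function returns a new structure and leaves the full event list intact, low-ranked events remain available for the track-record figures.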

What this means for you. The feed is a curated short list, not an exhaustive log. Every event we detect is still recorded and still counts toward the track-record figures below; the feed just surfaces the strongest handful so the page stays readable. Markets in their final stretch are routed to the separate Closing now strip and capped the same way.

Liquidity transparency. Because we de-rank rather than delete, a noteworthy move in a genuinely thin market can still appear. When it does, its low liquidity is shown on the card — we never present a number cleanly when the market behind it is thin.

Wallet accuracy

When a wallet address enters a market, Daybreak looks up its lifetime track record: out of all its previous trades, what percentage resolved YES (or NO)? A wallet that has resolved true 72% of the time on similar markets is higher-signal than one at 52%. This lifetime-calibration percentage is shown on the Smart Money Appeared card next to each flagged wallet.

How it's computed. For each wallet, we track every resolved market it has traded in, grouped by category and time-to-resolution bucket. The accuracy is the ratio of YES outcomes to total outcomes. A wallet with fewer than 5 resolved trades is shown with an explicit thin-record caveat: we will not present “3 of 3” as if it were a proven hand.
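A sketch of the record lookup with the thin-record flag. It reads each settled trade as a boolean that is True when the trade resolved in the wallet's favor; that reading, and the returned field names, are assumptions.

```python
THIN_RECORD = 5  # fewer settled trades than this carries the caveat


def wallet_record(settled: list) -> dict:
    """Summarize a wallet's settled trades (True = resolved in its favor)."""
    n = len(settled)
    if n == 0:
        return {"settled": 0, "accuracy": None, "thin": True}
    return {"settled": n, "accuracy": sum(settled) / n, "thin": n < THIN_RECORD}


proven = wallet_record([True] * 18 + [False] * 7)  # 18 of 25
newcomer = wallet_record([True] * 3)               # 3 of 3, still thin
```

The thin flag travels with the ratio so a display layer cannot show a perfect short record without its caveat.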

This metric is one of the three anchors of wallet-level reputation in Daybreak. It decays as new markets resolve — a wallet that was 75% accurate two years ago but has been wrong on the last three trades will trend toward 50%.

How often each signal has been right

For each event type, Daybreak tracks how often a flagged market actually moved in the days after the flag, compared with an ordinary market over the same window. That is the plain-frequency figure on the card — “flagged moves matched the final outcome 8 times in 10, vs 6 in 10 for ordinary moves.” The number is a direct ratio from resolved-outcome records; no language model ever touches it.

Cold-start honesty. An event type with too few resolved flags to quote a stable rate is marked still building its track record rather than shown with a thin, unstable number. The figure firms up as markets resolve. Two types — Resolution Sprint and One-Shot Whale — carry no confidence number at all by design: they are observations of what happened, not predictions.

These figures are computed from historical market resolutions stored in the Daybreak calibration database. They update as markets close, not in real time.

What we never do

—

No LLM-generated probability numbers. Every probability is sourced directly from market data. Language models are used for prose generation only, and every number they handle is verified against the structured input passed to them before it appears in any output. This is non-negotiable — it is the most load-bearing trust commitment the product makes.

—

Thin markets carry a caveat. Every output includes liquidity context — open interest, volume, and a flag for markets with concentrated or unusual activity. We never present a number cleanly when the data behind it is messy.

—

Anomalous flow is described, not accused. When wallet-level analysis flags unusual activity, we describe what we observe — volume, concentration, wallet age — and never draw conclusions about intent. Language like “manipulation” or “insider trading” never appears on any surface.

—

Calibration gaps are disclosed. Cells below the n = 10 floor are shown with a low-confidence flag. We do not populate empty cells with interpolated guesses.

Report an inaccuracy

If you spot a number that looks wrong, a market that's misclassified, or a calibration ribbon that seems off — report it. Every report is logged, investigated, and addressed.

Report inaccuracy →

Reports logged. Response within 48h.