Methodology Technical 4 min read • January 2026

Our Data Methodology

How we collect, validate, and analyze data to generate actionable betting insights with statistical rigor.

By the Metrics
  • 50+ minimum instances required before any pattern is published
  • p < 0.05 statistical significance threshold for all published patterns
  • 4 validation layers: significance, out-of-sample, causal logic, re-validation
Problem
Sports data is noisy. Many apparent patterns are random variation. Operators need to trust the methodology behind any insight before embedding it in player-facing content.
Approach
Multi-source data aggregation with strict validation: 50+ instances, p < 0.05 significance, out-of-sample testing, logical review, and quarterly re-validation of all published patterns.
Outcome
Every insight delivered to operators traces back to validated source data. AI explains patterns — it doesn't invent them. Full transparency on sample size, time period, and confidence level.

Every insight we deliver is backed by data. This document explains our methodology — how we collect data, detect patterns, and generate narratives that bookmakers can trust.

Data Sources

BidCanvas aggregates data from multiple sources to build a comprehensive view of each match:

Primary Sources

Source Type | Data Collected | Update Frequency
Prediction Markets | Polymarket, Kalshi probabilities, volume, wallet activity | Real-time (WebSocket)
Sportsbook Odds | Line movements, opening/closing odds, market consensus | Every 60 seconds
Match Statistics | Goals, cards, corners, shots, xG, player stats | Post-match + live
Team/Player Data | Form, injuries, transfers, lineup history | Daily
Referee Data | Card tendencies, foul counts, penalty rates | Post-match

Data Validation

Raw data goes through validation before entering our analysis pipeline:

  • Cross-source verification — Match results verified across 2+ sources
  • Outlier detection — Statistical anomalies flagged for manual review
  • Completeness checks — Matches with missing critical data excluded
  • Timestamp normalization — All data converted to UTC
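
Three of the checks above can be sketched in a few lines of Python. This is a minimal illustration, not the production pipeline: the function names, the choice of "critical" fields, and the rule that all reporting sources must agree are assumptions made for the example. Outlier detection is left out because the source does not say which statistic is used.

```python
from datetime import datetime, timezone, timedelta

# Assumed set of "critical" fields; the real list is not given in this document.
REQUIRED_FIELDS = ("home_goals", "away_goals", "kickoff")


def to_utc(ts: datetime) -> datetime:
    """Convert a timezone-aware timestamp to UTC; reject naive timestamps."""
    if ts.tzinfo is None:
        raise ValueError("naive timestamp: source timezone unknown")
    return ts.astimezone(timezone.utc)


def is_complete(record: dict) -> bool:
    """A record is complete when every critical field is present and not None."""
    return all(record.get(field) is not None for field in REQUIRED_FIELDS)


def verified_result(reports: list, min_sources: int = 2):
    """Return the (home, away) score if at least min_sources sources report it
    and none disagree; otherwise return None so the match can be reviewed."""
    if len(reports) < min_sources or len(set(reports)) != 1:
        return None
    return reports[0]
```

Returning None on disagreement, rather than taking a majority vote, keeps a disputed result out of the analysis until someone has looked at it.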

Pattern Detection

We use statistical methods to identify patterns that have predictive value. Not all correlations are patterns — we apply strict criteria.

Minimum Requirements

For a pattern to be published:
  • Sample size: Minimum 50 instances
  • Statistical significance: p < 0.05
  • Effect size: Meaningful deviation from baseline
  • Temporal stability: Pattern holds across multiple seasons
50 minimum instances before any pattern earns publication — combined with p < 0.05 significance and out-of-sample validation. Small samples get discarded, not published.
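
As a rough sketch of how the first three requirements could be combined into one publication gate, the example below uses an exact two-sided binomial test against a baseline rate. The choice of test, the function names, and the `min_effect` value of 5 percentage points are assumptions for illustration; the source only asks for a "meaningful" deviation. Temporal stability across seasons is not modelled here.

```python
from math import comb


def binomial_p_value(hits: int, n: int, baseline: float) -> float:
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed one under the baseline rate.
    Intended for modest n; very large n would overflow float conversion."""
    pmf = [comb(n, k) * baseline**k * (1 - baseline) ** (n - k) for k in range(n + 1)]
    observed = pmf[hits]
    # Small relative slack so ties are not lost to floating-point rounding.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


def meets_publication_bar(hits: int, n: int, baseline: float,
                          min_n: int = 50, alpha: float = 0.05,
                          min_effect: float = 0.05) -> bool:
    """Apply sample size, effect size, and significance checks in that order."""
    if n < min_n:
        return False  # small samples are discarded, not published
    if abs(hits / n - baseline) < min_effect:
        return False  # deviation from baseline too small (assumed cutoff)
    return binomial_p_value(hits, n, baseline) < alpha
```

For example, `meets_publication_bar(30, 40, 0.5)` is rejected on sample size alone, regardless of how strong the observed rate looks.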

Pattern Categories

Category | Example | Detection Method
Referee Tendencies | Anthony Taylor averages 4.2 cards/match | Historical averaging + confidence intervals
Team Form Patterns | Teams on a 4-win streak win their next match 53.6% of the time | Conditional probability analysis
League Baselines | Bundesliga BTTS rate: 59.3% | Aggregate statistics by league
Probability Regression | Markets at 70%+ regress toward 60% | Time-series analysis
Sharp Money Signals | 3+ sharp wallets aligned = 64% accuracy | Wallet performance tracking
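
To make two of the detection methods concrete, here is a hedged sketch of a conditional probability count for the team form category, together with a Wilson score interval for the resulting rate. The function names and the "W"/"D"/"L" result encoding are assumptions for the example, as is treating "on a 4-win streak" as "the previous four matches were all wins". The Wilson interval is one common choice of confidence interval; the source does not say which one is used.

```python
from math import sqrt


def streak_followups(results: list, streak: int = 4) -> tuple:
    """Count (wins, instances) for the match that follows `streak` straight wins.
    `results` is one team's chronological list of "W", "D", or "L"."""
    wins = instances = 0
    for i in range(streak, len(results)):
        if all(r == "W" for r in results[i - streak:i]):
            instances += 1
            wins += results[i] == "W"
    return wins, instances


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half
```

Dividing wins by instances gives the conditional win rate, and the interval shows how much of that rate could be sampling noise at the given sample size.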

Avoiding False Patterns

Sports data is noisy. Many apparent patterns are random variation. We guard against false positives through:

  • Multiple testing correction — Bonferroni adjustment when testing many hypotheses
  • Out-of-sample validation — Patterns discovered on training data must hold on test data
  • Logical review — Patterns must have a plausible causal mechanism
  • Regular re-validation — Published patterns re-tested quarterly
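
The first two safeguards lend themselves to a short sketch. The Bonferroni adjustment divides the significance threshold by the number of hypotheses tested, so testing 20 candidate patterns at 0.05 means each one must clear 0.0025. The out-of-sample check below is only one possible rule: the `tolerance` parameter (the share of the training effect that must survive on test data) is an assumption for illustration, not a documented threshold.

```python
def bonferroni_survivors(p_values: dict, alpha: float = 0.05) -> list:
    """Return names of hypotheses whose p-value clears alpha / number of tests."""
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return sorted(name for name, p in p_values.items() if p < threshold)


def holds_out_of_sample(train_rate: float, test_rate: float, baseline: float,
                        tolerance: float = 0.5) -> bool:
    """True if the test-set effect points the same way as the training effect
    and keeps at least `tolerance` of its size (assumed rule)."""
    train_effect = train_rate - baseline
    test_effect = test_rate - baseline
    if train_effect == 0:
        return False  # nothing was discovered on the training data
    return test_effect / train_effect >= tolerance
```

A pattern whose effect flips sign on the test data produces a negative ratio and is rejected, which is the behaviour the out-of-sample requirement is meant to enforce.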

AI Narrative Generation

Raw statistics are hard to consume. We use AI to transform data into readable narratives.

The Process

# Simplified narrative generation flow

1. Data Assembly
   - Match context (teams, league, competition stage)
   - Relevant patterns (referee, form, H2H)
   - Current odds and market movements
   - Sharp wallet positions

2. Pattern Ranking
   - Score each pattern by relevance to this match
   - Filter to top 3-5 most relevant insights
   - Ensure diversity (don't repeat similar points)

3. Narrative Generation
   - LLM (Claude) generates human-readable text
   - Structured prompts ensure consistency
   - Facts grounded in source data

4. Quality Control
   - Automated fact-checking against source data
   - Confidence scoring based on pattern strength
   - Human review for high-stakes outputs
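
Step 2 of the flow above (pattern ranking) can be sketched as follows. The `Pattern` class, its fields, and the "one insight per category" diversity rule are hypothetical choices for the example; the source only states that patterns are scored, filtered to the top 3-5, and kept diverse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    category: str     # e.g. "referee", "form", "h2h" (illustrative labels)
    summary: str      # short description of the validated pattern
    relevance: float  # match-specific relevance score, higher is better


def select_insights(patterns: list, limit: int = 5) -> list:
    """Pick the highest-scoring patterns, at most one per category."""
    chosen = []
    seen_categories = set()
    for pattern in sorted(patterns, key=lambda p: p.relevance, reverse=True):
        if pattern.category in seen_categories:
            continue  # skip near-duplicates of a point already made
        chosen.append(pattern)
        seen_categories.add(pattern.category)
        if len(chosen) == limit:
            break
    return chosen
```

Limiting each category to one entry is a blunt way to avoid repetition; a real system might instead compare the text or the underlying statistic of each candidate.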

What AI Does (and Doesn't) Do

AI Role | Human Role
Summarize complex statistics in natural language | Validate pattern detection methodology
Combine multiple data points into coherent narrative | Set minimum thresholds for pattern inclusion
Generate bet slip suggestions with reasoning | Review edge cases and unusual outputs
Adapt tone and detail level to context | Define business rules and constraints
Key Principle: AI explains patterns — it doesn't invent them. Every fact in a narrative traces back to validated source data.
4 stages before any narrative reaches your players: data assembly, pattern ranking, AI generation, and quality control with automated fact-checking against source data.
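
One simple way to picture automated fact-checking is a numeric check: every number that appears in a generated narrative must match a value in the validated source data. The sketch below is an assumption about how such a check could work, not a description of the actual checker, and the function name is hypothetical. It ignores units and context, so it would only be a first filter.

```python
import re

# Matches integers and decimals such as "120" or "4.2".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_numbers(narrative: str, source_values: list,
                        tol: float = 1e-9) -> list:
    """Return numbers in the narrative that match no value in the source data."""
    found = [float(token) for token in NUMBER.findall(narrative)]
    return [x for x in found
            if not any(abs(x - s) <= tol for s in source_values)]
```

An empty result means every figure in the text is backed by a source value; anything returned would be a candidate for blocking the narrative or routing it to human review.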

Continuous Improvement

Our methodology evolves based on results:

Feedback Loops

  • Pattern accuracy tracking — Monitor hit rate of published patterns over time
  • Bet slip performance — Track suggested bets against actual outcomes
  • Client feedback — Incorporate operator insights on what bettors find valuable
  • New data sources — Continuously evaluate emerging data providers

Deprecation Policy

Patterns that no longer meet our criteria are deprecated:

  • If the pattern fails to reach statistical significance (p < 0.05) in 2 consecutive quarterly re-validations
  • If sample size becomes too small (e.g., referee retires)
  • If underlying conditions change (rule changes, team restructuring)
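
The first two deprecation triggers can be expressed as a small check, sketched below under stated assumptions: the function name is hypothetical, "too small" is taken to mean below the 50-instance publication minimum, and the list of p-values is assumed to hold one entry per quarterly re-validation, oldest first. The third trigger (changed underlying conditions) needs human judgment and is not modelled.

```python
def should_deprecate(quarterly_p_values: list, sample_size: int,
                     alpha: float = 0.05, min_n: int = 50,
                     failed_quarters: int = 2) -> bool:
    """Flag a published pattern for deprecation.

    quarterly_p_values: p-values from re-validation runs, oldest first.
    sample_size: instances currently supporting the pattern.
    """
    if sample_size < min_n:
        return True  # assumed reading of "sample size becomes too small"
    recent = quarterly_p_values[-failed_quarters:]
    # Deprecate only when the most recent runs ALL missed significance.
    return len(recent) == failed_quarters and all(p >= alpha for p in recent)
```

Requiring consecutive failures means a single noisy quarter does not remove a pattern that is otherwise holding up.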

Transparency

Every insight we deliver includes:

  • Sample size — How many instances the pattern is based on
  • Time period — When the data was collected
  • Confidence level — Statistical significance and effect size
  • Source attribution — Where the underlying data came from
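
The four items above map naturally onto a small record attached to each insight. The class and field names below are hypothetical, shown only to illustrate how an operator-facing payload might carry this metadata and refuse to be built without it.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class InsightMetadata:
    sample_size: int      # how many instances the pattern is based on
    period_start: date    # first day of the data window
    period_end: date      # last day of the data window
    p_value: float        # statistical significance
    effect_size: float    # deviation from baseline
    sources: tuple        # names of the underlying data sources

    def __post_init__(self):
        if self.sample_size <= 0:
            raise ValueError("sample_size must be positive")
        if self.period_end < self.period_start:
            raise ValueError("period_end precedes period_start")
        if not 0.0 <= self.p_value <= 1.0:
            raise ValueError("p_value must be between 0 and 1")
        if not self.sources:
            raise ValueError("at least one source is required")
```

Calling `asdict()` on an instance gives a plain dictionary that can be serialized alongside the narrative text.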

We believe transparency builds trust. Bookmakers can evaluate our methodology and decide how to weight our insights alongside their own analysis.
