
Our Data Methodology

How we collect, validate, and analyze data to generate actionable betting insights with statistical rigor.

Every insight we deliver is backed by data. This document explains our methodology: how we collect data, detect patterns, and generate narratives that bookmakers can trust.

Data Sources

BidCanvas aggregates data from multiple sources to build a comprehensive view of each match:

Primary Sources

| Source Type | Data Collected | Update Frequency |
|---|---|---|
| Prediction Markets | Polymarket, Kalshi probabilities, volume, wallet activity | Real-time (WebSocket) |
| Sportsbook Odds | Line movements, opening/closing odds, market consensus | Every 60 seconds |
| Match Statistics | Goals, cards, corners, shots, xG, player stats | Post-match + live |
| Team/Player Data | Form, injuries, transfers, lineup history | Daily |
| Referee Data | Card tendencies, foul counts, penalty rates | Post-match |

Data Validation

Raw data goes through validation before entering our analysis pipeline:

Pattern Detection

We use statistical methods to identify patterns that have predictive value. Not every correlation qualifies as a pattern: a candidate must meet all of the criteria below before it is published.

Minimum Requirements

For a pattern to be published:
  • Sample size: Minimum 50 instances
  • Statistical significance: p < 0.05
  • Effect size: Meaningful deviation from baseline
  • Temporal stability: Pattern holds across multiple seasons

Pattern Categories

| Category | Example | Detection Method |
|---|---|---|
| Referee Tendencies | Anthony Taylor averages 4.2 cards/match | Historical averaging + confidence intervals |
| Team Form Patterns | Teams on 4-win streak win 53.6% of next match | Conditional probability analysis |
| League Baselines | Bundesliga BTTS rate: 59.3% | Aggregate statistics by league |
| Probability Regression | Markets at 70%+ regress toward 60% | Time-series analysis |
| Sharp Money Signals | 3+ sharp wallets aligned = 64% accuracy | Wallet performance tracking |
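
To make the "conditional probability analysis" and "confidence intervals" entries concrete, here is a small sketch that counts how often a team wins the match following a run of consecutive wins, then puts a Wilson score interval around that rate. The input layout (a chronological list of `"W"`, `"D"`, `"L"` results per team) and both function names are hypothetical, and the Wilson interval is one reasonable choice rather than a statement of the exact interval we use.

```python
import math


def win_rate_after_streak(results_by_team: dict[str, list[str]], streak: int = 4) -> tuple[int, int]:
    """Return (wins, opportunities) for matches preceded by `streak` straight wins."""
    wins = 0
    opportunities = 0
    for results in results_by_team.values():
        for i in range(streak, len(results)):
            # Condition: the previous `streak` matches were all wins.
            if all(r == "W" for r in results[i - streak:i]):
                opportunities += 1
                if results[i] == "W":
                    wins += 1
    return wins, opportunities


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Toy data only; real inputs would be full-season result sequences.
wins, n = win_rate_after_streak({"A": list("WWWWWL"), "B": list("WWDWWWWW")})
low, high = wilson_interval(wins, n)
```

With only a handful of opportunities the interval is very wide, which is one reason a minimum sample size is enforced before a pattern is published.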

Avoiding False Patterns

Sports data is noisy. Many apparent patterns are random variation. We guard against false positives through:

AI Narrative Generation

Raw statistics are hard to consume. We use AI to transform data into readable narratives.

The Process

# Simplified narrative generation flow

1. Data Assembly
   - Match context (teams, league, competition stage)
   - Relevant patterns (referee, form, H2H)
   - Current odds and market movements
   - Sharp wallet positions

2. Pattern Ranking
   - Score each pattern by relevance to this match
   - Filter to top 3-5 most relevant insights
   - Ensure diversity (don't repeat similar points)

3. Narrative Generation
   - LLM (Claude) generates human-readable text
   - Structured prompts ensure consistency
   - Facts grounded in source data

4. Quality Control
   - Automated fact-checking against source data
   - Confidence scoring based on pattern strength
   - Human review for high-stakes outputs
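
Step 2 of the flow above (rank, keep the top few, avoid repeating similar points) can be sketched as a greedy selection. This is an illustrative sketch under stated assumptions: the `Insight` fields, the `relevance` score, and the one-per-category diversity rule are all hypothetical, since the flow does not specify how relevance is scored or how similarity is judged.

```python
from dataclasses import dataclass


@dataclass
class Insight:
    text: str
    category: str     # hypothetical label, e.g. "referee", "form", "h2h"
    relevance: float  # hypothetical score; higher means more relevant


def select_insights(candidates: list[Insight], max_items: int = 5, per_category: int = 1) -> list[Insight]:
    """Pick the highest-scoring insights while capping how many share a category."""
    chosen: list[Insight] = []
    used: dict[str, int] = {}
    for insight in sorted(candidates, key=lambda c: c.relevance, reverse=True):
        if used.get(insight.category, 0) >= per_category:
            continue  # diversity rule: skip a point similar to one already chosen
        chosen.append(insight)
        used[insight.category] = used.get(insight.category, 0) + 1
        if len(chosen) == max_items:
            break
    return chosen


# Placeholder texts and scores, for illustration only.
picked = select_insights([
    Insight("referee card average", "referee", 0.9),
    Insight("referee penalty rate", "referee", 0.8),
    Insight("home side win streak", "form", 0.7),
])
```

Here the second referee insight is dropped even though it outscores the form insight, because the cap trades a little relevance for variety.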

What AI Does (and Doesn't) Do

| AI Role | Human Role |
|---|---|
| Summarize complex statistics in natural language | Validate pattern detection methodology |
| Combine multiple data points into coherent narrative | Set minimum thresholds for pattern inclusion |
| Generate bet slip suggestions with reasoning | Review edge cases and unusual outputs |
| Adapt tone and detail level to context | Define business rules and constraints |

Key Principle: AI explains patterns; it doesn't invent them. Every fact in a narrative traces back to validated source data.
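
One simple way to check that a narrative's figures trace back to source data is to extract every number from the generated text and flag any that do not match a validated value. The sketch below is a deliberately naive illustration, not a description of our fact-checker: the function name, the tolerance, and the regex-based extraction are assumptions, and the match count in the example sentence is made up for the demonstration.

```python
import re

# Matches integers and simple decimals such as "58" or "4.2".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_numbers(narrative: str, source_values: list[float], tol: float = 0.05) -> list[float]:
    """Return numbers in the narrative with no source value within `tol`."""
    found = [float(token) for token in NUMBER.findall(narrative)]
    return [x for x in found if not any(abs(x - v) <= tol for v in source_values)]


# 4.2 is the referee average quoted earlier; 58 is a hypothetical match count.
source = [4.2, 58.0]
ok = unsupported_numbers("Anthony Taylor averages 4.2 cards per match across 58 games.", source)
bad = unsupported_numbers("Anthony Taylor averages 4.8 cards per match across 58 games.", source)
```

An empty result means every figure was found in the source values; anything returned would be a candidate for rejection or human review. A real check would also need to tie each number to the right entity and metric, which this sketch does not attempt.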

Continuous Improvement

Our methodology evolves based on results:

Feedback Loops

Deprecation Policy

Patterns that no longer meet our criteria are deprecated:

Transparency

Every insight we deliver includes:

We believe transparency builds trust. Bookmakers can evaluate our methodology and decide how to weight our insights alongside their own analysis.
