Every insight we deliver is backed by data. This document explains our methodology — how we collect data, detect patterns, and generate narratives that bookmakers can trust.
Data Collection
Data Sources
BidCanvas aggregates data from multiple sources to build a comprehensive view of each match:
Primary Sources
| Source Type | Data Collected | Update Frequency |
|---|---|---|
| Prediction Markets | Polymarket, Kalshi probabilities, volume, wallet activity | Real-time (WebSocket) |
| Sportsbook Odds | Line movements, opening/closing odds, market consensus | Every 60 seconds |
| Match Statistics | Goals, cards, corners, shots, xG, player stats | Post-match + live |
| Team/Player Data | Form, injuries, transfers, lineup history | Daily |
| Referee Data | Card tendencies, foul counts, penalty rates | Post-match |
Data Validation
Raw data goes through validation before entering our analysis pipeline:
- Cross-source verification — Match results verified across 2+ sources
- Outlier detection — Statistical anomalies flagged for manual review
- Completeness checks — Matches with missing critical data excluded
- Timestamp normalization — All data converted to UTC
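Two of the checks above are simple enough to sketch in code. The following is a minimal illustration, not BidCanvas's actual pipeline: the function names, the ISO 8601 input format, and the `source -> result` dictionary shape are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Optional


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp that carries a UTC offset and convert it to UTC."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # Refuse naive timestamps rather than guess their time zone.
        raise ValueError(f"timestamp has no UTC offset: {timestamp!r}")
    return parsed.astimezone(timezone.utc)


def verified_result(reports: Dict[str, str], min_sources: int = 2) -> Optional[str]:
    """Return the result that at least `min_sources` sources agree on.

    `reports` maps a (hypothetical) source name to the result it reported.
    Returns None when no result, or more than one result, reaches the threshold.
    """
    counts = Counter(reports.values())
    agreed = [result for result, n in counts.items() if n >= min_sources]
    return agreed[0] if len(agreed) == 1 else None
```

For example, `verified_result({"feed_a": "2-1", "feed_b": "2-1", "feed_c": "2-2"})` returns `"2-1"`, while two sources that disagree yield `None` and the match would be held back.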
Pattern Detection
We use statistical methods to identify patterns that have predictive value. Not all correlations are patterns — we apply strict criteria.
Minimum Requirements
- Sample size: Minimum 50 instances
- Statistical significance: p < 0.05
- Effect size: Meaningful deviation from baseline
- Temporal stability: Pattern holds across multiple seasons
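As a rough sketch of how these four requirements could be combined for a yes/no pattern (for example, "did the team win?"), the code below uses an exact two-sided binomial test. The choice of test, the `min_effect` threshold of 5 percentage points, and the "same direction in every season" reading of temporal stability are assumptions for illustration; the source does not specify them.

```python
from math import exp, lgamma, log
from typing import Sequence


def binomial_p_value(successes: int, n: int, baseline: float) -> float:
    """Exact two-sided binomial p-value for `successes` out of `n` trials.

    Sums the probability of every outcome that is no more likely than the
    observed one under the baseline rate (requires 0 < baseline < 1).
    """
    def log_pmf(k: int) -> float:
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(baseline) + (n - k) * log(1 - baseline))

    log_probs = [log_pmf(k) for k in range(n + 1)]
    observed = log_probs[successes]
    return min(1.0, sum(exp(lp) for lp in log_probs if lp <= observed + 1e-9))


def meets_minimum_requirements(
    successes: int,
    n: int,
    baseline: float,
    season_rates: Sequence[float],
    min_sample: int = 50,      # from the list above
    alpha: float = 0.05,       # from the list above
    min_effect: float = 0.05,  # assumed threshold, not from the source
) -> bool:
    if n < min_sample:
        return False
    rate = successes / n
    if abs(rate - baseline) < min_effect:
        return False
    if binomial_p_value(successes, n, baseline) >= alpha:
        return False
    # Assumed stability rule: at least two seasons, all deviating the same way.
    above = rate > baseline
    return len(season_rates) >= 2 and all((r > baseline) == above for r in season_rates)
```

A pattern with 70 hits in 100 matches against a 50% baseline passes if each season's rate is also above 50%; the same hit rate over only 40 matches fails on sample size alone.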
Pattern Categories
| Category | Example | Detection Method |
|---|---|---|
| Referee Tendencies | Anthony Taylor averages 4.2 cards/match | Historical averaging + confidence intervals |
| Team Form Patterns | Teams on a 4-win streak win their next match 53.6% of the time | Conditional probability analysis |
| League Baselines | Bundesliga BTTS rate: 59.3% | Aggregate statistics by league |
| Probability Regression | Markets at 70%+ regress toward 60% | Time-series analysis |
| Sharp Money Signals | 3+ sharp wallets aligned = 64% accuracy | Wallet performance tracking |
Avoiding False Patterns
Sports data is noisy. Many apparent patterns are random variation. We guard against false positives through:
- Multiple testing correction — Bonferroni adjustment when testing many hypotheses
- Out-of-sample validation — Patterns discovered on training data must hold on test data
- Logical review — Patterns must have plausible causal mechanism
- Regular re-validation — Published patterns re-tested quarterly
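The Bonferroni adjustment mentioned above divides the significance level by the number of hypotheses tested, so each individual pattern faces a stricter threshold. A minimal sketch, with hypothetical pattern names and p-values:

```python
from typing import Dict


def bonferroni_significant(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Flag which hypotheses stay significant after a Bonferroni adjustment."""
    if not p_values:
        return {}
    threshold = alpha / len(p_values)  # stricter per-test threshold
    return {name: p < threshold for name, p in p_values.items()}
```

With five candidate patterns the per-test threshold becomes 0.05 / 5 = 0.01, so a pattern with p = 0.02 that looked significant in isolation is no longer accepted.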
AI Narrative Generation
Raw statistics are hard to consume. We use AI to transform data into readable narratives.
The Process
Simplified narrative generation flow:
1. Data Assembly
   - Match context (teams, league, competition stage)
   - Relevant patterns (referee, form, H2H)
   - Current odds and market movements
   - Sharp wallet positions
2. Pattern Ranking
   - Score each pattern by relevance to this match
   - Filter to top 3-5 most relevant insights
   - Ensure diversity (don't repeat similar points)
3. Narrative Generation
   - LLM (Claude) generates human-readable text
   - Structured prompts ensure consistency
   - Facts grounded in source data
4. Quality Control
   - Automated fact-checking against source data
   - Confidence scoring based on pattern strength
   - Human review for high-stakes outputs
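The ranking and fact-checking steps of this flow can be sketched without any LLM call. The code below is illustrative only: the `Pattern` fields, the "one insight per category" diversity rule, and the number-matching fact check are assumptions, not a description of the production system.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Pattern:
    category: str     # e.g. "referee", "form", "league"
    text: str
    relevance: float  # hypothetical relevance score for this match


def select_insights(patterns: List[Pattern], limit: int = 5) -> List[Pattern]:
    """Keep the highest-scoring patterns, at most one per category for diversity."""
    chosen: List[Pattern] = []
    seen: Set[str] = set()
    for pattern in sorted(patterns, key=lambda p: p.relevance, reverse=True):
        if pattern.category in seen:
            continue
        chosen.append(pattern)
        seen.add(pattern.category)
        if len(chosen) == limit:
            break
    return chosen


def unsupported_numbers(narrative: str, source_values: Set[str]) -> List[str]:
    """Return every number in the narrative that does not appear in the source data."""
    return [n for n in re.findall(r"\d+(?:\.\d+)?", narrative) if n not in source_values]
```

If the generated text says "Bundesliga BTTS is 61.0%" while the source data holds "59.3", `unsupported_numbers` returns `["61.0"]`, and the narrative could be regenerated or routed to human review.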
What AI Does (and Doesn't) Do
| AI Role | Human Role |
|---|---|
| Summarize complex statistics in natural language | Validate pattern detection methodology |
| Combine multiple data points into coherent narrative | Set minimum thresholds for pattern inclusion |
| Generate bet slip suggestions with reasoning | Review edge cases and unusual outputs |
| Adapt tone and detail level to context | Define business rules and constraints |
Continuous Improvement
Our methodology evolves based on results:
Feedback Loops
- Pattern accuracy tracking — Monitor hit rate of published patterns over time
- Bet slip performance — Track suggested bets against actual outcomes
- Client feedback — Incorporate operator insights on what bettors find valuable
- New data sources — Continuously evaluate emerging data providers
Deprecation Policy
Patterns that no longer meet our criteria are deprecated:
- If the pattern fails to reach statistical significance (p < 0.05) for 2 consecutive quarters
- If sample size becomes too small (e.g., referee retires)
- If underlying conditions change (rule changes, team restructuring)
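These three conditions translate into a short rule. The sketch below assumes that each quarterly re-test produces a p-value, that "too small" means falling below the same 50-instance minimum used for new patterns, and that changed conditions are flagged manually; the function and parameter names are hypothetical.

```python
from typing import Sequence


def should_deprecate(
    quarterly_p_values: Sequence[float],  # oldest first, one per quarterly re-test
    sample_size: int,
    conditions_changed: bool = False,     # set by a human reviewer
    alpha: float = 0.05,
    min_sample: int = 50,                 # assumed to match the minimum requirement
) -> bool:
    """Return True if a published pattern meets any deprecation condition."""
    last_two = quarterly_p_values[-2:]
    lost_significance = len(last_two) == 2 and all(p >= alpha for p in last_two)
    return lost_significance or sample_size < min_sample or conditions_changed
```

A single non-significant quarter does not trigger deprecation on its own: `should_deprecate([0.2, 0.01], 120)` is `False`, while `should_deprecate([0.01, 0.08, 0.2], 120)` is `True`.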
Transparency
Every insight we deliver includes:
- Sample size — How many instances the pattern is based on
- Time period — When the data was collected
- Confidence level — Statistical significance and effect size
- Source attribution — Where the underlying data came from
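One way to picture these four fields is as metadata attached to each insight. The structure below is a hypothetical shape for illustration, not the actual delivery format; all field names are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class InsightMetadata:
    sample_size: int     # how many instances the pattern is based on
    period_start: str    # ISO date, start of the data window
    period_end: str      # ISO date, end of the data window
    p_value: float       # statistical significance
    effect_size: float   # deviation from baseline, as a proportion
    sources: List[str]   # where the underlying data came from


def to_payload(text: str, metadata: InsightMetadata) -> str:
    """Serialize an insight together with its transparency metadata."""
    return json.dumps({"insight": text, "metadata": asdict(metadata)}, sort_keys=True)
```

Keeping the metadata in the same payload as the narrative text means an operator can always see the evidence behind a claim without a second lookup.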
We believe transparency builds trust. Bookmakers can evaluate our methodology and decide how to weight our insights alongside their own analysis.
Data Sources Referenced
- Football-data.org - Historical Match Data
- Polymarket - Prediction Market Data
- Kalshi - Event Contract Data