Every insight we deliver is backed by data. This document explains our methodology—how we collect data, detect patterns, and generate narratives that bookmakers can trust.
Data Sources
BidCanvas aggregates data from multiple sources to build a comprehensive view of each match:
Primary Sources
| Source Type | Data Collected | Update Frequency |
|---|---|---|
| Prediction Markets | Polymarket, Kalshi probabilities, volume, wallet activity | Real-time (WebSocket) |
| Sportsbook Odds | Line movements, opening/closing odds, market consensus | Every 60 seconds |
| Match Statistics | Goals, cards, corners, shots, xG, player stats | Post-match + live |
| Team/Player Data | Form, injuries, transfers, lineup history | Daily |
| Referee Data | Card tendencies, foul counts, penalty rates | Post-match |
Data Validation
Raw data goes through validation before entering our analysis pipeline:
- Cross-source verification — Match results verified across 2+ sources
- Outlier detection — Statistical anomalies flagged for manual review
- Completeness checks — Matches with missing critical data excluded
- Timestamp normalization — All data converted to UTC
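Two of these checks are easy to illustrate. The sketch below is a minimal example, not our production pipeline: the function names, the feed names, and the rule that sources must agree unanimously are assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp that carries a UTC offset and convert it to UTC."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # A naive timestamp is ambiguous, so reject it rather than guess a zone.
        raise ValueError(f"timestamp has no UTC offset: {timestamp!r}")
    return parsed.astimezone(timezone.utc)


def verified_result(reports: Dict[str, str], min_sources: int = 2) -> Optional[str]:
    """Return the match result only if enough sources report it and none disagree.

    `reports` maps a (hypothetical) source name to the result it published.
    """
    results = set(reports.values())
    if len(reports) >= min_sources and len(results) == 1:
        return results.pop()
    return None  # too few sources, or a conflict that needs manual review


kickoff = to_utc("2024-03-02T15:00:00+02:00")
result = verified_result({"feed_a": "2-1", "feed_b": "2-1"})
```

Returning `None` on disagreement keeps conflicting results out of the pipeline until someone has looked at them, which matches the manual-review step described above.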
Pattern Detection
We use statistical methods to identify patterns that have predictive value. Not all correlations are patterns—we apply strict criteria.
Minimum Requirements
- Sample size: Minimum 50 instances
- Statistical significance: p < 0.05
- Effect size: Meaningful deviation from baseline
- Temporal stability: Pattern holds across multiple seasons
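As a rough illustration, the first three requirements can be checked for a rate-type pattern (for example, a win rate compared with a league baseline) using an exact two-sided binomial test. This is a sketch under stated assumptions: the function names are hypothetical, the `min_effect` threshold of 5 percentage points is an invented placeholder for "meaningful deviation", and temporal stability is not covered here.

```python
from math import exp, lgamma, log


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials (requires 0 < p < 1)."""
    # Work in log space so large n does not overflow.
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def two_sided_p_value(successes: int, n: int, baseline: float) -> float:
    """Exact two-sided binomial p-value against the baseline rate.

    Sums the probability of every outcome that is no more likely than
    the observed one.
    """
    pmf = [binomial_pmf(k, n, baseline) for k in range(n + 1)]
    cutoff = pmf[successes] * (1 + 1e-7)  # small tolerance for float ties
    return min(1.0, sum(prob for prob in pmf if prob <= cutoff))


def meets_minimums(successes: int, n: int, baseline: float,
                   min_sample: int = 50, alpha: float = 0.05,
                   min_effect: float = 0.05) -> bool:
    """Check sample size, effect size, and significance, in that order."""
    if n < min_sample:
        return False
    effect = abs(successes / n - baseline)  # min_effect is an assumed threshold
    return effect >= min_effect and two_sided_p_value(successes, n, baseline) < alpha


# Illustrative numbers only: 65 hits in 100 instances against a 50% baseline.
candidate_ok = meets_minimums(65, 100, 0.5)
```

Checking sample size first avoids running a significance test on data that would be rejected anyway.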
Pattern Categories
| Category | Example | Detection Method |
|---|---|---|
| Referee Tendencies | Anthony Taylor averages 4.2 cards/match | Historical averaging + confidence intervals |
| Team Form Patterns | Teams on a 4-win streak win their next match 53.6% of the time | Conditional probability analysis |
| League Baselines | Bundesliga BTTS rate: 59.3% | Aggregate statistics by league |
| Probability Regression | Markets at 70%+ regress toward 60% | Time-series analysis |
| Sharp Money Signals | 3+ sharp wallets aligned = 64% accuracy | Wallet performance tracking |
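
To make "historical averaging + confidence intervals" concrete, the sketch below computes a mean with a normal-approximation 95% interval. The function name and the card counts are invented for illustration; for small samples a t-distribution interval would be more appropriate than the fixed `z = 1.96` assumed here.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def mean_with_ci(values: Sequence[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) using a normal-approximation interval.

    Needs at least two values, because the sample standard deviation
    is undefined for a single observation.
    """
    average = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return average, average - half_width, average + half_width


# Made-up cards-per-match counts for one referee, not real data.
cards_per_match = [3, 4, 5, 4, 4]
average, lower, upper = mean_with_ci(cards_per_match)
```

A wide interval signals that the average rests on too few matches to be published as a tendency.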
Avoiding False Patterns
Sports data is noisy. Many apparent patterns are random variation. We guard against false positives through:
- Multiple testing correction — Bonferroni adjustment when testing many hypotheses
- Out-of-sample validation — Patterns discovered on training data must hold on test data
- Logical review — Patterns must have a plausible causal mechanism
- Regular re-validation — Published patterns re-tested quarterly
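The Bonferroni adjustment divides the significance level by the number of hypotheses tested, so each candidate faces a stricter threshold. A minimal sketch, with a hypothetical function name and made-up p-values:

```python
from typing import Dict


def bonferroni_significant(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Flag which hypotheses stay significant after Bonferroni correction."""
    if not p_values:
        return {}
    # Each test must clear alpha divided by the number of tests.
    threshold = alpha / len(p_values)
    return {name: p < threshold for name, p in p_values.items()}


# Illustrative p-values for five candidate patterns (threshold is 0.05 / 5 = 0.01).
candidates = {"a": 0.001, "b": 0.02, "c": 0.04, "d": 0.2, "e": 0.009}
survivors = bonferroni_significant(candidates)
```

In this example, candidates "b" and "c" would pass an uncorrected p < 0.05 test but fail the corrected threshold of 0.01.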
AI Narrative Generation
Raw statistics are hard to consume. We use AI to transform data into readable narratives.
The Process
Simplified narrative generation flow:
1. Data Assembly
   - Match context (teams, league, competition stage)
   - Relevant patterns (referee, form, H2H)
   - Current odds and market movements
   - Sharp wallet positions
2. Pattern Ranking
   - Score each pattern by relevance to this match
   - Filter to top 3-5 most relevant insights
   - Ensure diversity (don't repeat similar points)
3. Narrative Generation
   - LLM (Claude) generates human-readable text
   - Structured prompts ensure consistency
   - Facts grounded in source data
4. Quality Control
   - Automated fact-checking against source data
   - Confidence scoring based on pattern strength
   - Human review for high-stakes outputs
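The pattern ranking step can be sketched as a sort followed by a diversity filter. Everything here is an assumption for illustration: the `Pattern` fields, the precomputed `relevance` score, and the rule of one insight per category are not taken from our production system.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pattern:
    category: str     # e.g. "referee", "form", "h2h" (hypothetical labels)
    text: str         # short description of the insight
    relevance: float  # assumed to be scored upstream for this match


def select_insights(patterns: List[Pattern], limit: int = 5,
                    per_category: int = 1) -> List[Pattern]:
    """Pick the most relevant patterns while capping repeats per category."""
    chosen: List[Pattern] = []
    used: Dict[str, int] = {}
    for pattern in sorted(patterns, key=lambda p: p.relevance, reverse=True):
        if len(chosen) >= limit:
            break
        if used.get(pattern.category, 0) >= per_category:
            continue  # skip similar points to keep the narrative diverse
        chosen.append(pattern)
        used[pattern.category] = used.get(pattern.category, 0) + 1
    return chosen


candidates = [
    Pattern("referee", "card tendency", 0.9),
    Pattern("referee", "penalty rate", 0.8),
    Pattern("form", "win streak", 0.7),
    Pattern("h2h", "recent meetings", 0.6),
]
top = select_insights(candidates, limit=3)
```

With these inputs the second referee pattern is skipped even though it scores higher than the form pattern, which is the trade-off the diversity rule makes.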
What AI Does (and Doesn't) Do
| AI Role | Human Role |
|---|---|
| Summarize complex statistics in natural language | Validate pattern detection methodology |
| Combine multiple data points into coherent narrative | Set minimum thresholds for pattern inclusion |
| Generate bet slip suggestions with reasoning | Review edge cases and unusual outputs |
| Adapt tone and detail level to context | Define business rules and constraints |
Continuous Improvement
Our methodology evolves based on results.
Feedback Loops
- Pattern accuracy tracking — Monitor hit rate of published patterns over time
- Bet slip performance — Track suggested bets against actual outcomes
- Client feedback — Incorporate operator insights on what bettors find valuable
- New data sources — Continuously evaluate emerging data providers
Deprecation Policy
Patterns that no longer meet our criteria are deprecated:
- If the pattern is no longer statistically significant (p ≥ 0.05) for 2 consecutive quarters
- If sample size becomes too small (e.g., referee retires)
- If underlying conditions change (rule changes, team restructuring)
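These rules can be expressed as a single check. The sketch below is hypothetical: it assumes each quarterly re-test yields one p-value, reuses the 50-instance minimum as the "too small" threshold, and treats a change in underlying conditions as a flag set by a human reviewer.

```python
from typing import Sequence


def should_deprecate(quarterly_p_values: Sequence[float], sample_size: int,
                     conditions_changed: bool = False, alpha: float = 0.05,
                     min_sample: int = 50, quarters: int = 2) -> bool:
    """Return True if any deprecation rule applies.

    `quarterly_p_values` is ordered oldest to newest, one entry per re-test.
    """
    recent = list(quarterly_p_values)[-quarters:]
    # Only count the rule as met when enough quarters have been observed.
    lost_significance = len(recent) == quarters and all(p >= alpha for p in recent)
    too_small = sample_size < min_sample
    return lost_significance or too_small or conditions_changed


# Illustrative history: significant once, then two weak quarters in a row.
deprecate = should_deprecate([0.01, 0.08, 0.20], sample_size=120)
```

A single weak quarter does not trigger deprecation, which avoids retiring a pattern because of one noisy re-test.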
Transparency
Every insight we deliver includes:
- Sample size — How many instances the pattern is based on
- Time period — When the data was collected
- Confidence level — Statistical significance and effect size
- Source attribution — Where the underlying data came from
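One way to picture this metadata is as a small record attached to each insight. The class and field names below are assumptions for illustration and do not describe our actual delivery format; the example values are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class InsightMetadata:
    sample_size: int          # how many instances the pattern is based on
    period_start: date        # first day of the data window
    period_end: date          # last day of the data window
    p_value: float            # statistical significance
    effect_size: float        # deviation from the baseline
    sources: Tuple[str, ...]  # where the underlying data came from

    def __post_init__(self) -> None:
        # Reject records that would make the insight impossible to audit.
        if self.sample_size <= 0:
            raise ValueError("sample_size must be positive")
        if self.period_end < self.period_start:
            raise ValueError("period_end precedes period_start")
        if not self.sources:
            raise ValueError("at least one source is required")


# Made-up values, shown only to illustrate the shape of the record.
example = InsightMetadata(
    sample_size=120,
    period_start=date(2021, 8, 1),
    period_end=date(2024, 5, 31),
    p_value=0.01,
    effect_size=0.08,
    sources=("match_statistics", "referee_data"),
)
```

Validating the record at construction time means an insight cannot be created without the fields listed above.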
We believe transparency builds trust. Bookmakers can evaluate our methodology and decide how to weight our insights alongside their own analysis.