Engineering in the Wild

The Lazy Edge: Why I Let Vegas Do My Math

Or: How I Turned Bookmakers Into My Personal Quant Team

I was looking at Polymarket's Singapore GP odds with Max Verstappen at 27% to win. Something felt off. The responsible thing would've been to build a proper prediction model—scrape historical data, train some classifier, maybe ensemble a few approaches.

But instead, I opened six bookmaker websites in separate tabs. :)

The Realization

Here's the hypothesis: every major sportsbook employs teams of people whose entire job is predicting these races. DraftKings, BetMGM, Bet365—they're essentially running prediction services with massive budgets. They probably have data pipelines I can only dream of.

And they publish their predictions every day. Sure, they call them "odds" but if you squint, they're just probability distributions.

What if, instead of trying to build a better model, I just... used theirs?

Copying Predictions

I started collecting odds from every bookmaker I could find. After converting everything to probabilities, I noticed something interesting...
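A quick aside on that conversion. Decimal odds imply probabilities via 1/odds, but a single bookmaker's implied probabilities deliberately sum to more than 1 (that's their margin, the "vig"), so you normalize it away. A minimal sketch with made-up odds for a three-way market, not my actual scrape:

```python
import numpy as np

# Illustrative decimal odds for a three-outcome market (invented numbers)
decimal_odds = np.array([1.9, 2.9, 6.0])

raw = 1.0 / decimal_odds        # implied probabilities, vig included
overround = raw.sum() - 1.0     # the bookmaker's margin
probs = raw / raw.sum()         # normalized so they sum to 1

print(f"Overround: {overround:.1%}")   # a few percent is typical
print(np.round(probs, 3))
```

For a full F1 field the same two lines apply, just with twenty entries per bookmaker.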

If I average all the bookmaker probabilities for each driver, I get what we could call a consensus vector. It's basically model ensembling: each bookmaker is like a different model trained on their own data, and I'm just averaging their predictions. No fancy stacking or weighted voting, just simple mean ensemble.

# After scraping and converting (mostly division)
# Each row is a driver, each column is a bookmaker
bookmaker_matrix = np.array([
    [0.333, 0.323, 0.286, 0.308],  # Verstappen
    [0.348, 0.333, 0.340, 0.333],  # Piastri  
    [0.348, 0.345, 0.340, 0.350],  # Norris
    [0.108, 0.098, 0.110, 0.095],  # Leclerc
    [0.048, 0.045, 0.050, 0.047],  # Russell
    [0.038, 0.040, 0.038, 0.039],  # Hamilton
])

# Consensus is just row-wise mean (ensemble of experts)
consensus = np.mean(bookmaker_matrix, axis=1)

# Polymarket prices (same order)
polymarket = np.array([0.27, 0.30, 0.34, 0.07, 0.02, 0.03])

# The edge vector - where positive means Polymarket underprices
edge = consensus - polymarket

# Show significant edges
drivers = ['Verstappen', 'Piastri', 'Norris', 'Leclerc', 'Russell', 'Hamilton']
for i, driver in enumerate(drivers):
    if edge[i] > 0.02:  # 2% threshold
        print(f"{driver}: Vegas says {consensus[i]:.1%}, "
              f"Polymarket says {polymarket[i]:.1%} "
              f"(gap: {edge[i]*100:+.1f}%)")

Output:

Verstappen: Vegas says 31.3%, Polymarket says 27.0% (gap: +4.3%)
Piastri: Vegas says 33.9%, Polymarket says 30.0% (gap: +3.9%)
Leclerc: Vegas says 10.3%, Polymarket says 7.0% (gap: +3.3%)
Russell: Vegas says 4.8%, Polymarket says 2.0% (gap: +2.8%)

This is basically a diff between two prediction systems. And some of the gaps are massive! How can it be exploited for fun and profit?

The Verstappen Singapore Paradox

The Verstappen edge really stood out to me. Here's a guy who's won 67 races, absolutely dominated multiple seasons, and yet... he's never won Singapore?

(I asked GPT to help me understand the historical context, and the pattern is almost comical: Verstappen has finished 2nd, 5th, 7th, DNF, 8th—everything except 1st.)

Polymarket participants seem to treat this like it's a law—"Verstappen Can't Win Singapore". But I'm guessing (hoping?) Vegas doesn't care about narratives. They have some fancy model and they see Red Bull fixed their floor issues two races ago, Verstappen just won Monza and Azerbaijan back-to-back, and Singapore's track changes in 2023 actually suit his driving style better now (fewer slow corners, more medium-speed sections).

Story vs statistics. Narrative vs numbers... and a 4.3% gap between them.

Figuring Out Bet Sizes

Now I had to figure out how much to bet on each. I recently fell down a rabbit hole reading about something called the Kelly Criterion (recommended reading: "Fortune's Formula").

The idea is simple: bet more when you have a bigger edge, but not so much that you go broke if you're wrong.
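For a single bet, the formula is short enough to do by hand. Here it is for the Verstappen line, using the consensus probability from earlier (the vectorized version below is algebraically the same thing, just written in terms of decimal odds):

```python
# Classic Kelly for one bet: f* = (p*b - q) / b, with b = net odds, q = 1 - p
p = 0.3125          # consensus win probability for Verstappen
price = 0.27        # Polymarket price
b = 1 / price - 1   # net profit per dollar staked

f_star = (p * b - (1 - p)) / b
print(f"Full Kelly fraction: {f_star:.3f}")   # about 5.8% of bankroll
```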

def calculate_kelly_fractions(consensus_probs, market_prices):
    """
    Vectorized Kelly Criterion calculation.
    
    Note: Using numpy arrays means we calculate all fractions at once.
    No loops needed - just element-wise operations.
    (I think quants call this 'position sizing'?)
    """
    odds = 1.0 / market_prices
    
    # Kelly formula: f = (p*odds - 1) / (odds - 1)
    # But we vectorize it - numpy handles element-wise automatically
    kelly = (consensus_probs * odds - 1) / (odds - 1)
    
    # Only positive expected value bets
    kelly[kelly < 0] = 0
    
    # Half-Kelly for safety (full Kelly is aggressive)
    return kelly * 0.5

# Calculate all bet fractions at once
kelly_fractions = calculate_kelly_fractions(consensus, polymarket)

# Apply to bankroll
bankroll = 50
suggested_bets = kelly_fractions * bankroll

# Show non-zero suggestions
print("Kelly bet sizing (% of bankroll → dollar amount):")
for i, driver in enumerate(drivers):
    if kelly_fractions[i] > 0.01:  # skip sub-1% dust (Norris, Hamilton)
        print(f"{driver}: {kelly_fractions[i]*100:.1f}% → ${suggested_bets[i]:.0f}")

Output:

Kelly bet sizing (% of bankroll → dollar amount):
Verstappen: 2.9% → $1
Piastri: 2.8% → $1
Leclerc: 1.8% → $1
Russell: 1.4% → $1

Uh. Underwhelming, huh? For the $50 I was willing to bet, Kelly advised me to put less than $5 on the table. Barely a dollar on Verstappen!

I rechecked the math three times and it's fundamentally correct... The formula is basically saying "you don't have enough edge to bet big, and you only get one shot at this"... But... BORING!

Basically, Kelly is one of those "In The Long Run..." models. That's fine for a blackjack table, but in F1 there are only so many races. I promptly decided to ignore Kelly and cherry-pick the one insight I liked: it confirms Verstappen should be my biggest bet.
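To be fair to Kelly, here's a quick simulation I sketched of what it's protecting against: the median fate of a bankroll that repeatedly bets different multiples of the Kelly fraction on the same Verstappen-style edge. The bet and path counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

p, price = 0.3125, 0.27        # consensus probability vs Polymarket price
odds = 1 / price               # gross payout per dollar
kelly = (p * odds - 1) / (odds - 1)

def median_outcome(fraction, n_bets=500, n_paths=2000):
    """Median bankroll multiple after betting `fraction` repeatedly."""
    wins = rng.random((n_paths, n_bets)) < p
    growth = np.where(wins, 1 + fraction * (odds - 1), 1 - fraction)
    return np.median(growth.prod(axis=1))

for label, f in [("half-Kelly", 0.5 * kelly),
                 ("full Kelly", kelly),
                 ("5x Kelly", 5 * kelly)]:
    print(f"{label}: median bankroll multiple ≈ {median_outcome(f):.3f}")
```

Over hundreds of bets, half and full Kelly compound while 5x Kelly grinds the median bankroll toward zero. That's the "in the long run" part. With one race, none of it applies.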

Expected Value: Another Way to Think About It

What if I just calculate the expected profit for each bet? It's like calculating the average case return.

def expected_value_vector(true_probs, market_prices):
    """
    Expected profit per dollar bet (vectorized).
    
    Beautiful thing about numpy: this works element-wise
    across all drivers simultaneously. The broadcasting
    rules handle the division implicitly.
    """
    return (true_probs / market_prices) - 1

# Calculate EV for all opportunities
ev = expected_value_vector(consensus, polymarket)

# Sort by EV (numpy argsort gives us indices)
sorted_indices = np.argsort(ev)[::-1]  # Descending

print("Expected profit per dollar bet:")
for idx in sorted_indices:
    if edge[idx] > 0.02:  # same 2% edge threshold as before
        print(f"{drivers[idx]}: ${ev[idx]:.3f} (or {ev[idx]*100:.1f}%)")

Output:

Russell: $1.375 (or 137.5%)
Leclerc: $0.468 (or 46.8%)
Verstappen: $0.157 (or 15.7%)
Piastri: $0.128 (or 12.8%)

Wait, Russell shows 137% expected profit? That seems broken. But then I realized it's because Polymarket has him at 2% while Vegas has him near 5%. Tiny absolute difference, but huge in relative terms. Same mechanism with Leclerc: 7% on Polymarket versus 10% from Vegas.

Getting Fancy with AI Help

At this point I got curious about portfolio optimization. I asked Claude to explain the problem to me and some frameworks for it. It's like parameter optimization but for money. The idea is you want to maximize returns while minimizing variance.

The difference here is that bets are mutually exclusive: only one driver can win. That's not like stocks where multiple can go up. Except here I'm betting on multiple classes even though only one will be right.

def allocate_portfolio(edge_vector, ev_vector, bankroll=50):
    """
    Treat this like exploration vs exploitation in RL.
    High edge = exploit known good actions
    High EV = explore high-reward possibilities
    
    Using numpy means we can do this scoring in one shot
    instead of iterating through drivers.
    """
    # Only consider positive edge bets
    valid_mask = edge_vector > 0.02
    
    # Score = edge * (1 + ev), but only for valid bets
    # Kind of like combining confidence and reward in RL
    scores = np.where(valid_mask, edge_vector * (1 + ev_vector), 0)
    
    # Normalize to sum to 1, then multiply by available bankroll
    # Keep 10% reserve (like keeping epsilon for exploration)
    if scores.sum() > 0:
        weights = scores / scores.sum()
        allocation = weights * bankroll * 0.9
    else:
        allocation = np.zeros_like(scores)
    
    return allocation

# Get suggested allocation
suggested = allocate_portfolio(edge, ev, bankroll)

print("\nAlgorithmic allocation:")
for i, driver in enumerate(drivers):
    if suggested[i] > 1:
        print(f"{driver}: ${suggested[i]:.0f}")

Output:

Algorithmic allocation:
Verstappen: $11
Piastri: $9
Leclerc: $11
Russell: $14

This is basically a simplified Markowitz-style optimization (I think?), balancing edge (confidence) against expected value (returns). The algorithm backs the same four drivers as Kelly, just with bolder dollar amounts, and it leans hardest into Russell's relative mispricing. It's still suggesting Piastri though, which makes sense mathematically but misses something important about capital efficiency.

But honestly? After all this analysis, I rounded to cleaner numbers and made a judgment call about capital efficiency. Piastri had decent edge (3.9%) and okay EV (13%), but look at the price—30%!

Think about it this way: to win $100 profit, I'd need to bet $43 on Piastri. For the same $100 profit, I only need to bet $2 on Russell or $7.50 on Leclerc. The leverage just wasn't there with Piastri.

Russell at 2% with a 2.8% edge? That's a 140% relative mispricing (2.8/2.0). Leclerc at 7% with 3.3% edge? That's 47% relative mispricing. But Piastri at 30% with 3.9% edge? Only 13% relative mispricing. I'm paying premium prices for a modest edge.

With limited capital, I wanted positions where I get the most bang for my buck—where small bets could yield big returns if Vegas is right. So I cut Piastri and concentrated on the more leveraged opportunities.
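The capital-efficiency argument is just this arithmetic (same numbers as above, spelled out):

```python
import numpy as np

names = ['Piastri', 'Leclerc', 'Russell']
price = np.array([0.30, 0.07, 0.02])     # Polymarket prices
edge  = np.array([0.039, 0.033, 0.028])  # consensus minus price

# Stake needed for $100 of profit: profit per dollar is (1/price - 1)
stake_needed = 100 / (1 / price - 1)

# Relative mispricing: the edge as a fraction of the price paid
relative = edge / price

for n, s, r in zip(names, stake_needed, relative):
    print(f"{n}: ${s:.2f} for $100 profit, {r:.0%} relative mispricing")
```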

The Actual Strategy

# My final allocation (after all that overthinking)
final_bets = {
    'Verstappen': 20,  # Biggest edge, most likely to actually hit
    'Leclerc': 15,     # Crazy EV, worth the gamble
    'Russell': 10,     # 2% -> 5% is huge relatively
    'Reserve': 5       # In case lines move before race
}

# What could happen 
bet_amounts = np.array([20, 15, 10])
bet_prices = np.array([0.27, 0.07, 0.02])
payouts = bet_amounts / bet_prices
profits = payouts - bet_amounts

for i, driver in enumerate(['Verstappen', 'Leclerc', 'Russell']):
    print(f"{driver}: Bet ${bet_amounts[i]} -> "
          f"Win ${payouts[i]:.0f} (profit ${profits[i]:.0f})")

Output:

Verstappen: Bet $20 -> Win $74 (profit $54)
Leclerc: Bet $15 -> Win $214 (profit $199)
Russell: Bet $10 -> Win $500 (profit $490)

But what's the expected value of this entire portfolio?

# Portfolio EV calculation
portfolio_ev = 0
for driver, bet in final_bets.items():
    if driver != 'Reserve':
        idx = ['Verstappen', 'Leclerc', 'Russell'].index(driver)
        win_prob = consensus[drivers.index(driver)]
        expected_return = (bet / polymarket[drivers.index(driver)]) * win_prob
        expected_profit = expected_return - bet
        portfolio_ev += expected_profit
        print(f"{driver}: {win_prob:.1%} chance of ${payouts[idx]:.0f} = ${expected_return:.1f} EV")

print(f"\nTotal invested: $45")
print(f"Portfolio Expected Value: ${portfolio_ev:.1f}")
print(f"Expected Return: {portfolio_ev/45*100:.1f}%")

Output:

Verstappen: 31.3% chance of $74 = $23.1 EV
Leclerc: 10.3% chance of $214 = $22.0 EV
Russell: 4.8% chance of $500 = $23.8 EV

Total invested: $45
Portfolio Expected Value: $23.9
Expected Return: 53.1%

Wait, a 53% expected return? That seems ridiculous. But remember, there's a ~54% chance I lose everything (none of my three drivers win).

I thought about running some Monte Carlo simulations to get confidence intervals on these returns—maybe even use PyMC to model the uncertainty in my probability estimates themselves. But at this point I'd spent more time analyzing than the $50 warranted. For now, I'm going with gut feel informed by math.
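For what it's worth, the crudest version of that Monte Carlo is only a few lines, using the consensus probabilities and payouts from above. The outcomes are mutually exclusive, so each simulated race picks exactly one winner:

```python
import numpy as np

rng = np.random.default_rng(42)

probs   = np.array([0.3125, 0.10275, 0.0475])  # V, L, R consensus
payouts = np.array([74.07, 214.29, 500.0])     # gross payout per bet
stake = 45                                     # total at risk

# One winner per simulated race; index 3 means
# "someone outside my three picks wins" (total loss).
p = np.append(probs, 1 - probs.sum())
winner = rng.choice(4, size=100_000, p=p)

pnl = np.full(winner.shape, -float(stake))
for i in range(3):
    pnl[winner == i] = payouts[i] - stake

print(f"P(lose everything): {(pnl == -stake).mean():.1%}")
print(f"Mean profit: ${pnl.mean():.2f}")
```

The mean profit lands near the ~$24 EV from the table, but the distribution is the point: a bit over half the simulated races end with the whole $45 gone.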

The Moment of Truth

Placing the actual bets on Polymarket felt different. I had MATH on my side!

The Verstappen bet felt reasonable. "Breaking a curse" is a narrative too, and 27% seemed genuinely low for the championship favorite.

The Leclerc bet felt optimistic. Ferrari at Singapore? They either nail the setup and win, or completely botch strategy.

But hovering over the Russell bet—$10 on something Polymarket prices at 2%—felt insane. My finger hesitated. Two percent. That's a 98% chance of losing. But Vegas says 5%. That's a 95% chance of losing. Those three percentage points represent a 2.5x disagreement about reality.

... Now we wait for Sunday.

Why I Think This Works

The core insight is simple: bookmakers and Polymarket are solving different problems.

Bookmakers are trying to minimize prediction error. They lose money if they're wrong too often. It's like a well-trained model in production—accuracy directly impacts the loss function.

Polymarket participants are often betting for entertainment, expressing opinions, or following narratives. It's more like social media sentiment than rigorous prediction. High variance, lots of noise, probably not well-calibrated.

So when these two systems disagree, I'm betting the professionals know something the crowd doesn't.

The Code (If You Want to Try This)

class LazyArbitrage:
    """
    Find differences between pro predictions (bookmakers) 
    and crowd predictions (Polymarket).
    All operations vectorized for efficiency.
    """
    
    def __init__(self, bankroll=50):
        self.bankroll = bankroll
        self.min_edge = 0.02  # 2% minimum difference
        
    def find_opportunities(self, bookmaker_matrix, polymarket_prices):
        """
        Compare two prediction systems using vector operations.
        No loops needed - numpy handles the broadcasting.
        """
        # Consensus = mean across bookmakers (axis=1 for row-wise)
        consensus = np.mean(bookmaker_matrix, axis=1)
        
        # Edge = difference vector
        edge = consensus - polymarket_prices
        
        # Kelly fractions (vectorized)
        odds = 1.0 / polymarket_prices
        kelly_full = (consensus * odds - 1) / (odds - 1)
        kelly_half = np.maximum(0, kelly_full) * 0.5
        
        # Expected value (vectorized)
        ev = (consensus / polymarket_prices) - 1
        
        # Mask for valid opportunities
        valid = edge > self.min_edge
        
        return {
            'consensus': consensus,
            'edge': edge,
            'kelly': kelly_half,
            'ev': ev,
            'valid': valid
        }
    
    def suggest_bets(self, opportunities):
        """
        Convert opportunities to bet sizes.
        Uses boolean masking instead of if-statements.
        """
        valid = opportunities['valid']
        kelly = opportunities['kelly']
        
        # Apply kelly to bankroll, but only where valid
        suggested = np.where(valid, kelly * self.bankroll, 0)
        
        # Scale down if total exceeds bankroll
        total = suggested.sum()
        if total > self.bankroll * 0.9:  # Keep 10% reserve
            suggested *= (self.bankroll * 0.9 / total)
            
        return suggested

The Philosophy

What I love about this approach is I'm not claiming to be smarter than markets. I'm just noticing that some markets are smarter than others.

Every bookmaker I checked has decades of data, proprietary models, and probably connections I can't imagine. It's basically ensemble methods applied to betting. Vegas runs multiple models (each bookmaker), I average them to get a strong predictor, then I compare that to Polymarket's crowd-sourced predictions.

(I hope the different bookmakers aren't correlated. Note to self: next time, use bookmakers from different countries.)

Caveats and Reality Checks

The edge could be illusory. Maybe Polymarket knows something Vegas doesn't (doubtful, but possible). And this is one race, not the thousands of bets Kelly assumes.

But for $50 and a few hours of analysis? The risk-reward feels right.

Plus, there's something beautiful about outsourcing the work to Vegas. Every bookmaker has decades of data, proprietary models, and probably some RL agents tweaking odds in real-time. I have numpy and Claude. But sometimes... that's enough.

Sometimes the best quant... is someone else's quant.


Appendix: The Piastri Edge - Playing With House Money

Written after qualifying, before race

The Russell Position Transforms

Update: Russell secured pole. What does this mean for the initial bets? Time to revisit the math. 🧮

My original 2¢ position was a chaos bet—P(chaos) × P(Russell wins | chaos).

Now with pole, we're no longer betting on chaos; we're betting on track position advantage. Historical data shows P(win | pole, Singapore) hovering around 35%... and we got it at a massive discount. Nice :)

Volatility Extraction During Qualifying

Along the way, I had also set laddered limit orders ahead of qualifying and let volatility do the work: capturing panic-buy bursts during peak uncertainty under time pressure (aka FOMO). Nothing fancy, just systematic profit-taking.

The Piastri Mispricing Appears

So with Russell on pole and Verstappen P2, I started poking around the rest of the grid. Piastri's sitting in P3 at 15¢, which seems about right if you're just looking at historical P3 win rates (13.3%).

But here's the thing - markets were pricing Verstappen and Russell like they're independent events. Like what happens to one doesn't affect the other. That's... not quite right.

The Hidden Correlation

I went back and prompted Claude to pull every instance of Verstappen and Russell actually battling wheel-to-wheel since 2023 (credit: Claude Opus). It found about 10 instances. Four of them ended with contact:

Date     | Event             | Outcome                             | Contact?
Apr 2023 | Azerbaijan Sprint | Sidepod damage, "d***head" comments | ✓
Dec 2024 | Qatar Qualifying  | 1-place penalty, "lost all respect" | ✓
Jun 2025 | Spain GP          | Deliberate collision, 10s penalty   | ✓
Jun 2025 | Canada GP         | SC incident, Red Bull protest       | ✓
-        | Clean battles     | ~6 instances                        | ✗

That's a 40% contact rate. For context, random driver pairings are maybe 10%. These two specifically have issues with each other.

The Historical Pattern

Here's another consideration - P3 has won exactly twice at Singapore, and both times the front-runners screwed up.

In 2012, Vettel won from third because Hamilton's gearbox exploded from pole. In 2019, Vettel won from third because Ferrari's strategy accidentally undercut their own pole-sitter.

P3 doesn't win by being faster. P3 wins when the leaders take themselves out. Which is exactly what Verstappen and Russell have a habit of doing to each other...

The Math

If you assume a normal 10% incident rate, Piastri's worth maybe 12%. Market's at 15%, so you'd think he's slightly overpriced.

But plug in the actual VER-RUS contact rate:

P(VER-RUS incident) = 40%
P(Piastri wins | incident) = 60% (inherits P1, executes)
P(Piastri wins | clean race) = 5%

Fair value = (0.40 × 0.60) + (0.60 × 0.05) 
           = 0.24 + 0.03 
           = 27%

Market's at 15%. Model says 27%. That's a 12-point edge just sitting there.
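The whole appendix thesis hinges on that 40% contact rate, so here's the same fair-value formula swept over a few assumed rates:

```python
# Fair value = P(incident) * P(win | incident) + P(clean) * P(win | clean)
p_win_incident, p_win_clean = 0.60, 0.05

for p_contact in (0.10, 0.25, 0.40):
    fair = p_contact * p_win_incident + (1 - p_contact) * p_win_clean
    print(f"contact rate {p_contact:.0%} -> fair value {fair:.1%}")
```

Solving the formula backwards, the market's 15¢ corresponds to an implied contact rate of about 18%. The bet is really a bet that these two drivers' history repeats.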

Redeploying House Money

So I took some Russell profits and bought Piastri at 15¢ with a simple ladder strategy:

40% of shares sell at 30¢. That's for Turn 1 chaos. If Verstappen and Russell make contact and Piastri inherits the lead, his price will spike immediately even though there's 50+ laps left. Pure panic buying. Just harvesting the emotional buyers out there. :)

Next 20% of shares sell at 45¢. This is the "Piastri's leading but unclear if he'll hold it" zone, where markets overpay for possibility while the outcome is still uncertain.

Remaining shares hold to 100¢. If he's leading late, Singapore data says he finishes there. The circuit's too tight to pass. At that point the thesis is playing out - no reason to sell.
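Assuming every rung fills, the ladder's blended exit is easy to sanity-check (my numbers, obviously not a recommendation):

```python
# Exit ladder: (fraction of shares, limit price in cents)
ladder = [(0.40, 30), (0.20, 45), (0.40, 100)]
entry = 15  # cents paid per share

blended_exit = sum(frac * px for frac, px in ladder)
print(f"Blended exit if all rungs fill: {blended_exit:.0f}¢, "
      f"{blended_exit / entry:.1f}x the entry")
```

About 61¢ a share against a 15¢ entry, with the caveat that the last rung only pays if he actually wins.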

The Thesis

The gap here is the market didn't price in driver relationships... which is a significant factor in racing (and Max...). When two drivers with 40% contact history start adjacent on a street circuit, the P3 beneficiary becomes undervalued.