F1 Red Bull Dominance Analysis — Shounak Mukherjee

Methodology

Analytics Pipeline

I followed a structured 5-phase workflow, from raw API data to final insights, to keep my analysis organized and my mind sane!

1. Data Collection: FastF1 API
2. Data Cleaning: Pandas
3. Feature Engineering: New KPIs
4. EDA: Matplotlib · Seaborn
5. Advanced Analysis: 6 Deep-Dive Charts

Visualizations

Charts I Created

I generated each of these charts using Python after full data cleaning and feature engineering.

Race Wins by Team — 2023 vs 2024

This is the most direct summary of what changed: one bar chart per season. In 2023 Red Bull won 21 of 22 races. In 2024 the wins were split across Red Bull, McLaren, Ferrari, and Mercedes, which shows that competitive balance was starting to return to F1.

Constructor Championship Points

What ultimately matters in F1 is points. This chart shows the championship scoring gap between Red Bull and the rest of the field across both seasons. Red Bull scored 790 points in 2023, more than double the total of second-placed Mercedes. In 2024 they finished 3rd with 537, behind McLaren (609) and Ferrari (595).

Red Bull Dominance Metrics

Two metrics draw a very clear picture of the whole situation: the percentage of races won by Red Bull, and Red Bull's share of all points scored by every team combined. Win rate dropped from 95.5% to 37.5%. Points share dropped from 35.2% to 22%. Together they quantify the sheer scale of the competitive swing.
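The win-rate figures can be checked with simple ratios. This is a minimal sketch, not the project's own code: `pct` is a hypothetical helper, and the 2024 inputs (9 wins from 24 races) are my assumption, chosen because they are consistent with the 37.5% quoted here.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(part / whole * 100, 1)

# 2023: 21 wins from 22 races (figures quoted in this write-up)
win_rate_2023 = pct(21, 22)

# 2024: assumed 9 wins from 24 races, consistent with the 37.5% quoted above
win_rate_2024 = pct(9, 24)

print(win_rate_2023, win_rate_2024)
```

The points share works the same way, with Red Bull's season points as the numerator and the sum of every team's points as the denominator.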

Cumulative Constructor Points — Race by Race

Instead of a single total, this plot tracks how points accumulated race by race — so you can see exactly when Red Bull pulled away in 2023 and exactly how differently the 2024 season unfolded. In 2023 Red Bull's line shoots away from the field almost immediately. But in 2024 the top 4 teams track closely together for the entire season.

Driver Championship Battle

This chart compares Max Verstappen with his closest rivals, round by round, across both seasons. Verstappen's 2023 lead grew relentlessly. In 2024, Norris and Leclerc stayed within striking distance for much longer before Verstappen finally pulled clear.

Points Gap to Championship Leader

This is the inverse view of the Driver Championship Battle chart. Instead of showing total points, it shows how far behind the leader each driver was after every round. The steeper the drop, the more dominant the leader. The gap curves steeply downward in 2023, but in 2024 the competitors stay comparatively close to 0 throughout.
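One way the gap could be computed is sketched below. It is an illustration under assumptions, not the script behind the chart: `gap_to_leader` is a hypothetical helper, the column names follow the cleaned dataset (`driver_code`, `round`, `points`), and every driver is assumed to have one row per round.

```python
import pandas as pd

def gap_to_leader(results: pd.DataFrame) -> pd.DataFrame:
    """Return cumulative points and the gap to the leader after each round."""
    df = results.sort_values(["driver_code", "round"]).copy()
    # Running points total for each driver
    df["cumulative_pts"] = df.groupby("driver_code")["points"].cumsum()
    # Highest running total after each round, i.e. the leader's score
    leader_pts = df.groupby("round")["cumulative_pts"].transform("max")
    # 0 for the leader, negative for everyone behind
    df["gap_to_leader"] = df["cumulative_pts"] - leader_pts
    return df[["round", "driver_code", "cumulative_pts", "gap_to_leader"]]

# Toy data, made up for illustration only
toy = pd.DataFrame({
    "round": [1, 1, 2, 2],
    "driver_code": ["VER", "NOR", "VER", "NOR"],
    "points": [25, 18, 25, 15],
})
gaps = gap_to_leader(toy)
print(gaps)
```

Because the gap is stored as a negative number, a dominant leader pushes the other lines downward, which matches how the chart is read.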

Verstappen Finishing Position Heatmap

Here every box is one of Verstappen's races. Green means he won or finished on the podium. Red means a bad result. It summarizes his whole 2023 and 2024 seasons at a glance. 2023 is almost entirely green, but 2024 shows more variation, including a DNF and several P4-P6 finishes.

Race Win Distribution

This is probably the simplest chart in the project but also the most powerful one. The 2023 chart is almost completely Red Bull with a tiny Ferrari slice. The 2024 chart is split 4 ways. This simple visual is enough to tell the whole story of F1's shift.
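The slices of such a chart are just each team's share of race wins. Below is a minimal sketch assuming the cleaned columns `year`, `team`, and `finish_position`; `win_share` is a hypothetical helper and the toy rows are made up.

```python
import pandas as pd

def win_share(results: pd.DataFrame, year: int) -> pd.Series:
    """Return each team's percentage of race wins in one season."""
    winners = results[(results["year"] == year) & (results["finish_position"] == 1)]
    # normalize=True turns win counts into fractions of all wins
    return (winners["team"].value_counts(normalize=True) * 100).round(1)

# Toy data: four race winners plus one non-winning row that must be ignored
toy = pd.DataFrame({
    "year": [2023, 2023, 2023, 2023, 2023],
    "team": ["Red Bull", "Red Bull", "Red Bull", "Ferrari", "Mercedes"],
    "finish_position": [1, 1, 1, 1, 2],
})
shares = win_share(toy, 2023)
print(shares)
```

The resulting series can be passed to Matplotlib's `ax.pie(shares.values, labels=shares.index)` to draw the chart.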

Grid Position vs Finish Position

This compares where Red Bull started each race with where they finished, plotted against the rest of the field. In 2023 the blue dots (Red Bull) are isolated in the bottom-left corner of the chart, away from everyone else, which means they almost always qualified and finished at the front. In 2024 they start mixing into the gray field.
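The visual separation can also be expressed as numbers by averaging grid and finish positions for Red Bull and for everyone else. This is a sketch under assumptions: `grid_finish_summary` is a hypothetical helper, the columns follow the cleaned dataset, and the toy values are made up.

```python
import pandas as pd

def grid_finish_summary(results: pd.DataFrame) -> pd.DataFrame:
    """Return mean grid and finish position for Red Bull vs the rest."""
    clean = results.dropna(subset=["grid_position", "finish_position"])
    return clean.groupby("is_redbull")[["grid_position", "finish_position"]].mean()

# Toy data for illustration only
toy = pd.DataFrame({
    "is_redbull": [True, True, False, False],
    "grid_position": [1, 2, 5, 10],
    "finish_position": [1, 3, 6, 8],
})
summary = grid_finish_summary(toy)
print(summary)
```

Lower averages on both columns correspond to dots closer to the bottom-left corner of the scatter plot.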

Driver Finishing Consistency

In F1, consistency is widely considered the key to winning a championship. This chart shows the spread of finishing positions for the top 8 drivers: a narrow box means reliable results, a wide box means inconsistent ones. Verstappen's box is the tightest and leftmost in 2023, and still leftmost in 2024. Pérez's box widens dramatically in 2024. This suggests that his inconsistent results were a big reason why Red Bull lost the constructors' title.
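The width of each box is the interquartile range (IQR) of a driver's finishing positions, so the same comparison can be made numerically. The sketch below assumes the cleaned columns `driver_name` and `finish_position`; `finish_spread` is a hypothetical helper and the two drivers are placeholders.

```python
import pandas as pd

def finish_spread(results: pd.DataFrame) -> pd.DataFrame:
    """Return median finish and IQR per driver, tightest spread first."""
    grouped = results.groupby("driver_name")["finish_position"]
    q1 = grouped.quantile(0.25)
    q3 = grouped.quantile(0.75)
    out = pd.DataFrame({"median": grouped.median(), "iqr": q3 - q1})
    return out.sort_values("iqr")

# Toy data: Driver A is consistent, Driver B is not
toy = pd.DataFrame({
    "driver_name": ["Driver A"] * 4 + ["Driver B"] * 4,
    "finish_position": [1, 1, 2, 2, 2, 6, 10, 14],
})
spread = finish_spread(toy)
print(spread)
```

A small IQR with a low median corresponds to a tight box on the left of the plot.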

Source Code

Full Python Scripts

These are all the scripts that I used in this project, from data collection to advanced analysis.

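01_data_collection.py: used the FastF1 API for safe batch downloads and CSV backups

import fastf1
import pandas as pd
import os
import time

# Create the folders up front so the cache and the CSV writes below cannot fail
for folder in ("cache", "data/backups", "data/raw"):
    os.makedirs(folder, exist_ok=True)

# Enable caching so that session data is not re-downloaded
fastf1.Cache.enable_cache("cache")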

SEASONS = [2023, 2024]

def load_session_safe(year, round_num, session_type):
    # Load each session safely; on failure, return None instead of crashing
    try:
        session = fastf1.get_session(year, round_num, session_type)
        session.load()
        print(f"   Loaded {year} Round {round_num}")
        return session
    except Exception as e:
        print(f"   Failed {year} Round {round_num}: {e}")
        return None

def extract_race_results(session, year, round_num):
    # Extracted the race results and added context columns
    try:
        results = session.results.copy()
        results['Year']      = year
        results['Round']     = round_num
        results['EventName'] = session.event['EventName']
        results['Country']   = session.event['Country']
        return results
    except Exception as e:
        print(f"   Could not extract results: {e}")
        return None

all_race_results = []

for year in SEASONS:
    schedule = fastf1.get_event_schedule(year, include_testing=False)
    total_rounds = len(schedule)
    print(f"\nSeason {year} — {total_rounds} rounds")

    for round_num in range(1, total_rounds + 1):
        backup_path = f"data/backups/{year}_round{round_num:02d}_results.csv"

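        # If this round is already backed up, reuse the CSV instead of downloading again,
        # so the master file stays complete on re-runs
        if os.path.exists(backup_path):
            print(f"  Round {round_num}: already saved, reusing backup")
            all_race_results.append(pd.read_csv(backup_path))
            continue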

        race_session = load_session_safe(year, round_num, 'R')

        if race_session is not None:
            results = extract_race_results(race_session, year, round_num)
            if results is not None:
                all_race_results.append(results)
                results.to_csv(backup_path, index=False)
                print(f"   Backup saved: {backup_path}")

        time.sleep(5)  # Enabled a safety delay to avoid rate limits

# Combined all rounds into a master CSV file
if all_race_results:
    master = pd.concat(all_race_results, ignore_index=True)
    master.to_csv("data/raw/master_race_results.csv", index=False)
    print(f"Master results saved: {len(master)} rows")
      

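02_data_cleaning.py: removed nulls, fixed types, standardized team names, and added some basic feature engineering

import os
import pandas as pd
import numpy as np

os.makedirs("data/cleaned", exist_ok=True)  # make sure the output folder exists before saving

df = pd.read_csv("data/raw/master_race_results.csv")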
print(f"Raw shape: {df.shape}")

# Dropped the columns that are completely null or irrelevant
cols_to_drop = [
    'BroadcastName', 'TeamColor', 'HeadshotUrl',
    'CountryCode', 'Q1', 'Q2', 'Q3',
    'TeamId', 'DriverId'
]
df = df.drop(columns=cols_to_drop, errors='ignore')  # skip any column missing from the CSV

# Renamed all columns to follow snake_case convention
df = df.rename(columns={
    'DriverNumber'      : 'driver_number',
    'Abbreviation'      : 'driver_code',
    'FirstName'         : 'first_name',
    'LastName'          : 'last_name',
    'FullName'          : 'driver_name',
    'TeamName'          : 'team',
    'Position'          : 'finish_position',
    'ClassifiedPosition': 'classified_position',
    'GridPosition'      : 'grid_position',
    'Time'              : 'race_time',
    'Status'            : 'status',
    'Points'            : 'points',
    'Laps'              : 'laps_completed',
    'Year'              : 'year',
    'Round'             : 'round',
    'EventName'         : 'race_name',
    'Country'           : 'country',
})

# Fixed data types
df['finish_position'] = pd.to_numeric(df['finish_position'], errors='coerce')
df['grid_position']   = pd.to_numeric(df['grid_position'],   errors='coerce')
df['points']          = pd.to_numeric(df['points'],          errors='coerce').fillna(0)
df['year']            = df['year'].astype(int)
df['round']           = df['round'].astype(int)

# Standardized team names for both seasons
team_name_map = {
    'Alfa Romeo' : 'Sauber',
    'AlphaTauri' : 'RB F1 Team',
}
df['team'] = df['team'].replace(team_name_map)

# Made DNF counts more accurate by counting only mechanical/accident retirements
DNF_STATUSES = ['Retired', 'Accident', 'Collision damage', 'Undertray', 'Withdrew']
DNS_STATUSES = ['Did not start']

df['dnf']             = df['status'].isin(DNF_STATUSES)
df['dns']             = df['status'].isin(DNS_STATUSES)
df['finished']        = ~df['dnf'] & ~df['dns']
df['is_redbull']      = df['team'] == 'Red Bull'
df['positions_gained'] = df['grid_position'] - df['finish_position']
df['points_finish']   = df['points'] > 0

df.to_csv("data/cleaned/master_results_cleaned.csv", index=False)
print(f"Cleaned data saved: {df.shape}")
      

03_eda.py: charts of wins, constructor points, Red Bull dominance metrics, and cumulative points

import matplotlib
matplotlib.use('Agg')  # Non-interactive backend: charts are saved to file without opening windows
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import os

df = pd.read_csv("data/cleaned/master_results_cleaned.csv")
os.makedirs("outputs/charts", exist_ok=True)

COLORS = {
    'Red Bull'      : '#3671C6',
    'Ferrari'       : '#E8002D',
    'Mercedes'      : '#27F4D2',
    'McLaren'       : '#FF8000',
    'Aston Martin'  : '#229971',
    'Alpine F1 Team': '#FF87BC',
    'Williams'      : '#64C4FF',
    'RB F1 Team'    : '#6692FF',
    'Haas F1 Team'  : '#B6BABD',
    'Sauber'        : '#52E252',
}

# wins per team
winners      = df[df['finish_position'] == 1]
wins_by_team = winners.groupby(['year', 'team']).size().reset_index(name='wins')

fig, axes = plt.subplots(1, 2, figsize=(14, 6))
fig.suptitle('Race Wins by Team — 2023 vs 2024', fontsize=16, fontweight='bold')

for i, year in enumerate([2023, 2024]):
    data   = wins_by_team[wins_by_team['year'] == year].sort_values('wins', ascending=True)
    colors = [COLORS.get(t, '#888') for t in data['team']]
    axes[i].barh(data['team'], data['wins'], color=colors)
    axes[i].set_title(f'{year} Season')

plt.tight_layout()
plt.savefig('outputs/charts/01_wins_by_team.png', dpi=150, bbox_inches='tight')

# constructor points
team_points = df.groupby(['year', 'team'])['points'].sum().reset_index()

# Red Bull dominance metrics
rb_wins    = df[df['is_redbull'] & (df['finish_position'] == 1)].groupby('year').size()
rb_points  = df[df['is_redbull']].groupby('year')['points'].sum()
ttl_points = df.groupby('year')['points'].sum()
ttl_races  = df.groupby('year')['round'].nunique()

dominance = pd.DataFrame({
    'rb_wins'        : rb_wins,
    'rb_win_pct'     : (rb_wins / ttl_races * 100).round(1),
    'rb_points_share': (rb_points / ttl_points * 100).round(1),
}).reset_index()

# cumulative constructor points
top_teams = team_points[team_points['year']==2023].nlargest(5, 'points')['team'].tolist()

fig, axes = plt.subplots(1, 2, figsize=(16, 6))
for i, year in enumerate([2023, 2024]):
    year_df = df[df['year'] == year]
    for team in top_teams:
        rp = year_df[year_df['team']==team].groupby('round')['points'].sum().reset_index()
        rp['cumulative'] = rp['points'].cumsum()
        axes[i].plot(rp['round'], rp['cumulative'],
                     color=COLORS.get(team, '#888'), label=team, linewidth=2)

plt.tight_layout()
plt.savefig('outputs/charts/04_cumulative_points.png', dpi=150, bbox_inches='tight')
      

04_advanced_analysis.py: charts of driver battles, the Verstappen finishes heatmap, the position change scatterplot, and the driver consistency boxplot

import matplotlib
matplotlib.use('Agg')
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("data/cleaned/master_results_cleaned.csv")

DRIVER_COLORS = {
    'VER': '#3671C6', 'NOR': '#FF8000',
    'LEC': '#E8002D', 'HAM': '#27F4D2', 'PER': '#9B59B6'
}

# cumulative driver points — Verstappen vs rivals
top_drivers = ['VER', 'NOR', 'LEC', 'HAM', 'PER']

fig, axes = plt.subplots(1, 2, figsize=(18, 7))
for i, year in enumerate([2023, 2024]):
    year_df = df[df['year'] == year]
    for drv in top_drivers:
        drv_df = year_df[year_df['driver_code'] == drv].sort_values('round')
        if len(drv_df) == 0: continue
        drv_df = drv_df.copy()
        drv_df['cumulative_pts'] = drv_df['points'].cumsum()
        axes[i].plot(drv_df['round'], drv_df['cumulative_pts'],
                     marker='o', markersize=3, linewidth=2,
                     color=DRIVER_COLORS[drv], label=drv)

plt.savefig('outputs/charts/05_driver_points_race_by_race.png', dpi=150)

# Verstappen heatmap
fig, axes = plt.subplots(2, 1, figsize=(18, 8))
for i, year in enumerate([2023, 2024]):
    ver_df    = df[(df['driver_code']=='VER') & (df['year']==year)].sort_values('round')
    positions = ver_df['finish_position'].values.reshape(1, -1)
    axes[i].imshow(positions, cmap='RdYlGn_r', aspect='auto', vmin=1, vmax=20)
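    for j, pos in enumerate(ver_df['finish_position'].values):
        # Guard against a missing position so int() cannot fail on NaN
        label = str(int(pos)) if pd.notna(pos) else '-'
        axes[i].text(j, 0, label, ha='center', va='center',
                    fontsize=9, fontweight='bold')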

plt.savefig('outputs/charts/07_verstappen_positions_heatmap.png', dpi=150)

# grid vs finish scatter
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
for i, year in enumerate([2023, 2024]):
    year_df = df[df['year']==year].dropna(subset=['grid_position', 'finish_position'])
    rb      = year_df[year_df['is_redbull']]
    rest    = year_df[~year_df['is_redbull']]
    axes[i].scatter(rest['grid_position'], rest['finish_position'],
                    alpha=0.4, color='#888', s=30, label='Rest of field')
    axes[i].scatter(rb['grid_position'], rb['finish_position'],
                    alpha=0.9, color='#3671C6', s=60, label='Red Bull', zorder=5)
    axes[i].plot([1, 20], [1, 20], 'k--', alpha=0.3)

plt.savefig('outputs/charts/09_grid_vs_finish.png', dpi=150)

# driver consistency boxplot
top_drivers_full = [
    'Max Verstappen', 'Lando Norris', 'Charles Leclerc',
    'Carlos Sainz', 'Lewis Hamilton', 'George Russell',
    'Sergio Pérez', 'Oscar Piastri'
]

fig, axes = plt.subplots(1, 2, figsize=(16, 7))
for i, year in enumerate([2023, 2024]):
    year_df = df[(df['year']==year) & (df['driver_name'].isin(top_drivers_full))]
    order   = year_df.groupby('driver_name')['finish_position'].median().sort_values().index
    sns.boxplot(data=year_df, x='finish_position', y='driver_name',
                order=order, ax=axes[i], palette='viridis')
    axes[i].axvline(x=10.5, color='red', linestyle='--', alpha=0.3)

plt.savefig('outputs/charts/10_driver_consistency.png', dpi=150)
      

View Full Project on GitHub

Key Findings

What The Data Says

I managed to extract 6 major insights from the 2023 and 2024 Formula 1 race datasets.

Historic 2023 Dominance

Red Bull's 2023 campaign is statistically one of the most dominant in F1 history: 21 wins from 22 races (a 95.5% win rate) and 35.2% of all points scored by every team combined. No other constructor came close.

21/22 wins · 790 pts · 35.2% share

McLaren's Championship Turnaround

McLaren went from equal 5th in 2023 (266 points, matching Aston Martin) to constructors' champions in 2024 with 609 points. This is the biggest single-season points jump of any team in the field.

266 pts in 2023 → 609 pts in 2024

Verstappen's Personal Resilience

Despite being in a significantly slower car in 2024, Verstappen still won the drivers' championship (his 4th consecutive title). His average finish dropped from 1.27 to 3.63, but he still defeated all his rivals through consistency and race craft.

Avg finish: 1.27 → 3.63 · Still champion

Pérez Collapse Cost Red Bull the Title

Sergio Pérez scored 260 points in 2023 as a strong number 2 driver. In 2024 he dropped to 138, a 47% reduction. His wide finishing-position boxplot shows how inconsistent he was, and that directly cost Red Bull the constructors' championship in 2024.

260 pts → 138 pts · –47%
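The 47% figure is a plain percentage change between the two season totals. A minimal sketch follows; `pct_change` is a hypothetical helper, and the inputs are the two totals quoted above.

```python
def pct_change(old, new):
    """Return the percentage change from old to new, rounded to one decimal."""
    return round((new - old) / old * 100, 1)

# Pérez: 260 points in 2023, 138 points in 2024
perez_change = pct_change(260, 138)
print(perez_change)
```

The negative sign marks a drop, and rounding to the nearest whole percent gives the 47% shown on the card.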

McLaren's Zero DNF Season

McLaren completed 2024 with absolutely no DNFs. Both McLarens finished every single race they started. Combined with their pace advantage in the second half of the season, this reliability was critical to their constructors title win.

0 DNFs across 48 race starts in 2024
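A DNF tally like this can be derived from the boolean `dnf` column created in the cleaning script. The sketch below is illustrative: `dnf_count_by_team` is a hypothetical helper and the toy rows are made up. Note that the count depends on the `DNF_STATUSES` list used during cleaning, so any status string outside that list would not be counted as a DNF.

```python
import pandas as pd

def dnf_count_by_team(results: pd.DataFrame, year: int) -> pd.Series:
    """Return the number of DNF-flagged results per team in one season."""
    season = results[results["year"] == year]
    # Summing a boolean column counts the True values
    return season.groupby("team")["dnf"].sum().astype(int).sort_values()

# Toy data for illustration only
toy = pd.DataFrame({
    "year": [2024, 2024, 2024, 2024],
    "team": ["McLaren", "McLaren", "Red Bull", "Red Bull"],
    "dnf": [False, False, False, True],
})
dnfs = dnf_count_by_team(toy, 2024)
print(dnfs)
```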

Win Share Tells the Full Story

Red Bull's points share fell from 35.2% to 22% of all points scored across the field. 4 different teams won races in 2024 compared to just 2 in 2023. Formula 1 went from a one-team show to its most competitive season in years.

4 race-winning teams in 2024 vs 2 in 2023

About

Who Built This

Meet the analyst behind the project: background, skills, and other work.

Shounak Mukherjee

Aspiring Data & Business Analyst

🎓 B.Tech CSE — IEM Kolkata (2023–present)

📍 Kolkata, India

✍️ Content Writing Intern — Mobile Gaming Head

🏎️ Interests: Music · Long Drives · Motorsports

💼 LinkedIn Profile 📧 mukherjeeshounak05@gmail.com 🐙 GitHub Profile

Background

I'm a Computer Science & Engineering student at the Institute of Engineering and Management, Kolkata, and I am actively building a career in data and business analytics. I have hands-on experience in data cleaning, visualization, and business insight generation using Excel, Power BI, MySQL, and Python. I have also been writing content professionally for over 2 years, which has strengthened my communication skills, analytical thinking, and ability to turn complex data into clear narratives.

Skills

Python · Pandas · MySQL · Power BI · Excel & Power Query · Data Cleaning · Feature Engineering · DAX · CTEs & Window Functions · RFM Segmentation · Matplotlib & Seaborn · FastF1 API · Sports Analytics

Forage Simulations

Accenture: Data Analytics & Visualization · Red Bull: Off-Premise Sales Analysis · TATA: Data Visualisation

Tech Stack

Python 3.10

FastF1

Pandas

Matplotlib

Seaborn

NumPy

CSV / Backups

Feature Engineering

Other Projects

E-Commerce Sales & Customer Analysis

Cigarettes & Alcohol Addiction Analysis

SQL Data Cleaning — World Layoffs Dataset
