Formula 1 · Data Analytics · Portfolio Project

Red Bull Dominance Analysis

A complete end-to-end Python analytics project examining one of the most dramatic competitive shifts in recent Formula 1 history — Red Bull Racing's near-perfect 2023 season and the collapse of that dominance in 2024. Built with FastF1, Pandas, Matplotlib, and Seaborn across 46 races, 919 data points, and two very different championship stories.

2023 Red Bull Win Rate
95.5%
21 wins from 22 races — historic dominance
2024 Red Bull Win Rate
37.5%
4 different race-winning teams
Verstappen 2023 Points Per Race
24.1
Down to 16.6 in 2024 — still champion

What's This Project Actually About?

In 2023, Red Bull Racing won 21 of 22 Formula 1 races, a 95.5% win rate and one of the most statistically dominant campaigns in F1 history. The next year they won only 9 races, finished 3rd in the constructors' championship, and could be outpaced by any of three different teams on a given weekend. In this project I collected, cleaned, and analyzed real race result data from both seasons to put precise numbers on that collapse: when it started, how severe it was, and which teams closed the gap.

46 · Total Races Analysed
919 · Data Points Collected
25 · Unique Drivers
10 · Visualizations Built

Analytics Pipeline

I followed a structured 5-phase workflow from raw API data to final insights to keep my analysis organized and my mind sane!

Phase 1: Data Collection (FastF1 API)

  • Used the FastF1 library to connect to the official F1 live timing API
  • Stored the fetched data in a local disk cache to avoid repeated API calls
  • Collected race results for all 22 rounds of 2023 and 24 rounds of 2024
  • Built a loop that skips already-downloaded rounds on restart
  • Saved individual CSV backups after every race as a safety net
  • Handled API rate limits using time.sleep() delays between requests
  • Final dataset: 919 rows across 46 races
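
The fixed time.sleep(5) delay works, but a common alternative for flaky or rate-limited APIs is exponential backoff: wait longer after each consecutive failure. Below is a minimal sketch of that idea, not the method used in this project; the helper name `with_backoff` and its defaults (4 attempts, 5-second base delay) are hypothetical choices.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def with_backoff(fetch: Callable[[], T], retries: int = 4, base_delay: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Hypothetical helper: call fetch(), waiting base_delay * 2**attempt seconds after each failure.

    Returns fetch()'s result, or None if every attempt fails.
    `sleep` is injectable so the delays can be recorded instead of actually waited.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as exc:  # broad on purpose, mirroring load_session_safe()
            if attempt == retries - 1:
                print(f"Giving up after {retries} attempts: {exc}")
                break
            wait = base_delay * 2 ** attempt  # 5s, 10s, 20s, ...
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {wait:.0f}s")
            sleep(wait)
    return None


# Demo with a stand-in loader that fails twice before succeeding
calls: List[int] = []


def flaky_loader() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("simulated rate limit")
    return "session loaded"


recorded_waits: List[float] = []
result = with_backoff(flaky_loader, sleep=recorded_waits.append)
print(result, recorded_waits)  # session loaded [5.0, 10.0]
```

In the collection loop, `fetch` would be a small function that calls `fastf1.get_session(year, round_num, 'R')` and then `.load()` on the returned session; that wiring is an assumption, not part of the original script.
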

Phase 2: Data Cleaning (Pandas)

  • Dropped 9 fully null or irrelevant columns (BroadcastName, TeamColor, HeadshotUrl, CountryCode, Q1/Q2/Q3, TeamId, DriverId)
  • Renamed all 17 remaining columns to clean snake_case format
  • Fixed data types (positions to numeric, year/round to integer, etc.)
  • Standardised team names across seasons (Alfa Romeo to Sauber, AlphaTauri to RB F1 Team, etc.)
  • Fixed DNF detection logic (separated Retired/Accident from Lapped finishers)
  • Added DNS (Did Not Start) as a separate flag
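
The finished/DNF/DNS flags can also be derived with a rule instead of an explicit list of DNF statuses. This sketch is an alternative to the DNF_STATUSES list used in 02_data_cleaning.py, not the project's method. It assumes finishers appear as 'Finished', 'Lapped', or '+N Lap(s)', treats 'Did not start' as a DNS, and counts every other status (Retired, Accident, Engine, Disqualified and so on) as a DNF; a real pipeline may want a separate flag for disqualifications. The function name `classify_status` is hypothetical, and it expects string statuses (no missing values).

```python
import pandas as pd


def classify_status(status: str) -> str:
    """Hypothetical rule-based mapping of a raw status string to finished / dns / dnf."""
    if status in ("Finished", "Lapped") or status.startswith("+"):
        return "finished"  # classified at the flag, including lapped cars such as '+1 Lap'
    if status == "Did not start":
        return "dns"
    return "dnf"  # any other status: retirement, accident, mechanical failure, etc.


statuses = pd.Series(["Finished", "+1 Lap", "Lapped", "Retired", "Did not start", "Engine"])
outcome = statuses.map(classify_status)

# Boolean flags in the same shape as the cleaned dataset
flags = pd.DataFrame({
    "finished": outcome == "finished",
    "dnf": outcome == "dnf",
    "dns": outcome == "dns",
})
print(outcome.tolist())
# ['finished', 'finished', 'finished', 'dnf', 'dns', 'dnf']
```

The upside of a rule is that new mechanical statuses are caught automatically; the downside is that every unexpected status silently becomes a DNF, so it is worth printing `df['status'].value_counts()` once to check.
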

Phase 3: Feature Engineering (New KPIs)

  • Created an is_redbull boolean flag for quick Red Bull filtering
  • Calculated positions_gained = grid_position - finish_position
  • Added a points_finish flag (True if points > 0)
  • Built finished, dnf, and dns boolean columns from status values
  • Computed win rate, podium rate, and points per race as derived KPIs
  • Calculated Red Bull points share as % of all points scored in season
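
The derived KPIs reduce to grouped counts divided by the number of races or entries. Below is a minimal pandas sketch, assuming the cleaned column names from 02_data_cleaning.py (year, round, team, finish_position, points) and using a tiny made-up dataset rather than the real results. Podium rate is measured here per car entry, which is one of several reasonable definitions; `team_kpis` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical toy results: one row per car per race, cleaned column names
results = pd.DataFrame({
    "year":            [2023, 2023, 2023, 2023, 2024, 2024, 2024, 2024],
    "round":           [1, 1, 2, 2, 1, 1, 2, 2],
    "team":            ["Red Bull", "Ferrari"] * 4,
    "finish_position": [1, 2, 1, 3, 2, 1, 4, 1],
    "points":          [25, 18, 25, 15, 18, 25, 12, 25],
})


def team_kpis(df: pd.DataFrame) -> pd.DataFrame:
    races = df.groupby("year")["round"].nunique()        # races held per season
    season_points = df.groupby("year")["points"].sum()   # points scored by all teams
    kpis = df.groupby(["year", "team"]).agg(
        wins=("finish_position", lambda s: int((s == 1).sum())),
        podiums=("finish_position", lambda s: int((s <= 3).sum())),
        entries=("finish_position", "count"),
        points=("points", "sum"),
    )
    years = kpis.index.get_level_values("year")
    races_per_row = races.reindex(years).to_numpy()       # season race count for each row
    kpis["win_rate_pct"] = kpis["wins"] / races_per_row * 100
    kpis["podium_rate_pct"] = kpis["podiums"] / kpis["entries"] * 100
    kpis["points_per_race"] = kpis["points"] / races_per_row
    kpis["points_share_pct"] = kpis["points"] / season_points.reindex(years).to_numpy() * 100
    return kpis.round(1)


print(team_kpis(results))
```

The same function applied to the real cleaned CSV would give every team's figures at once; filtering the output to Red Bull gives the dominance metrics.
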

Phase 4: Exploratory Data Analysis (Matplotlib · Seaborn)

  • Calculated wins per team by season (21 vs 9 for Red Bull)
  • Compared total constructor points (Red Bull 790 in 2023 but only 3rd in 2024)
  • Calculated Red Bull dominance metrics (win % and points share side by side)
  • Calculated cumulative constructor points race-by-race for top 5 teams
  • Summarized Verstappen stats (wins, podiums, poles, DNFs, avg finish)
  • Analyzed DNF rate by team and season
  • Ranked top 10 driver points for both seasons
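
A per-season driver summary like the Verstappen one can be produced with a single grouped aggregation. This sketch assumes the cleaned columns (driver_code, year, round, grid_position, finish_position, dnf) and uses made-up rows; the helper name is hypothetical. Counting a pole as grid_position == 1 is an approximation, since a grid penalty can move the pole sitter down the starting grid.

```python
import pandas as pd

# Hypothetical rows in the cleaned format (not real results)
df = pd.DataFrame({
    "driver_code":     ["VER"] * 6 + ["NOR"],
    "year":            [2023, 2023, 2023, 2024, 2024, 2024, 2024],
    "round":           [1, 2, 3, 1, 2, 3, 1],
    "grid_position":   [1, 2, 1, 1, 3, 5, 2],
    "finish_position": [1, 1, 2, 20, 1, 4, 1],
    "dnf":             [False, False, False, True, False, False, False],
})


def driver_season_summary(data: pd.DataFrame, driver_code: str) -> pd.DataFrame:
    drv = data[data["driver_code"] == driver_code]
    summary = drv.groupby("year").agg(
        races=("round", "nunique"),
        wins=("finish_position", lambda s: int((s == 1).sum())),
        podiums=("finish_position", lambda s: int((s <= 3).sum())),
        poles=("grid_position", lambda s: int((s == 1).sum())),  # approximation: starting P1
        dnfs=("dnf", "sum"),
        avg_finish=("finish_position", "mean"),
    )
    return summary.round({"avg_finish": 2})


print(driver_season_summary(df, "VER"))
```
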

Phase 5: Advanced Analysis (6 Deep-Dive Charts)

  • Compared cumulative driver points (Verstappen vs top 4 rivals across both seasons)
  • Calculated points gap to championship leader round-by-round
  • Created heatmap of Verstappen finishing position for every race in both seasons
  • Created a simple win-distribution pie chart
  • Plotted a grid-vs-finish scatter (Red Bull isolated from the field in 2023, mixing in with it in 2024)
  • Plotted a driver consistency boxplot (clear decline of Pérez and rise of Norris/Leclerc)

Charts I Created

I generated each of these charts using Python after full data cleaning and feature engineering.

Wins By Team

Race Wins by Team — 2023 vs 2024

This is the most direct summary of what changed: one bar chart captures the 2023 season and the other captures 2024. In 2023 Red Bull won 21 of 22 races. In 2024 the wins were split across Red Bull, McLaren, Ferrari, and Mercedes, a clear sign that competitive balance was returning to F1.

Constructor Points

Constructor Championship Points

In F1, points are what ultimately decide championships. This chart shows the actual scoring gap between Red Bull and the rest of the field across both seasons. Red Bull scored 790 points in 2023, more than double the total of second-placed Mercedes. In 2024 they finished 3rd with 537, behind McLaren (609) and Ferrari (595).

Dominance Metrics

Red Bull Dominance Metrics

Two metrics paint a clear picture of the whole situation: the percentage of races Red Bull won, and Red Bull's share of all points scored by every team combined. The win rate dropped from 95.5% to 37.5%, and the points share fell from 35.2% to 22%. Together they quantify the sheer scale of the competitive swing.

Cumulative Points

Cumulative Constructor Points — Race by Race

Instead of a single total, this plot tracks how points accumulated race by race — so you can see exactly when Red Bull pulled away in 2023 and exactly how differently the 2024 season unfolded. In 2023 Red Bull's line shoots away from the field almost immediately. But in 2024 the top 4 teams track closely together for the entire season.

Driver Points

Driver Championship Battle

This chart compares Max Verstappen with his closest rivals, round by round, across both seasons. In 2023 Verstappen's lead grew relentlessly. In 2024 Norris and Leclerc stayed within striking distance for much longer before Verstappen finally pulled clear.

Points Gap

Points Gap to Championship Leader

This chart is the mirror image of Chart 5. Instead of total points, it shows how far behind the leader each driver was after every round; the steeper a curve falls, the more dominant the leader. In 2023 the rivals' gaps plunge steeply, while in 2024 they stay comparatively close to zero throughout.
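
The gap curves come from a simple transformation: accumulate each driver's points by round, then subtract the leader's running total after every round. Below is a minimal sketch with made-up points for three drivers. The function name and the choice to count a missed round as zero points are assumptions, not necessarily how the project's script computes it.

```python
import pandas as pd

# Hypothetical per-race points (not real results)
races = pd.DataFrame({
    "round":       [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "driver_code": ["VER", "NOR", "LEC"] * 3,
    "points":      [25, 18, 15, 18, 25, 15, 25, 15, 18],
})


def gap_to_leader(df: pd.DataFrame) -> pd.DataFrame:
    # rounds x drivers table of points; a driver missing from a round scores 0
    table = df.pivot_table(index="round", columns="driver_code", values="points",
                           aggfunc="sum", fill_value=0).sort_index()
    cumulative = table.cumsum()
    # subtract the leader's running total row by row: 0 for the leader, negative for everyone else
    return cumulative.sub(cumulative.max(axis=1), axis=0)


gaps = gap_to_leader(races)
print(gaps)
```

Plotting each column of the result against the round number gives one line per driver, with whoever leads after a given round sitting at zero.
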

Verstappen Heatmap

Verstappen Finishing Position Heatmap

Each cell is one Verstappen race, so the heatmap summarizes both of his seasons at a glance. Green marks a win or podium; red marks a poor result. 2023 is almost entirely green, while 2024 shows more variation, including a DNF and several P4-P6 finishes.

Win Distribution

Race Win Distribution

This is probably the simplest chart in the project, but also the most powerful. The 2023 pie is almost entirely Red Bull, with a single-win Ferrari slice; the 2024 pie is split four ways. On its own, it tells the whole story of F1's shift.
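
The pie chart is a normalised count of the winning teams. Below is a minimal sketch, assuming the cleaned columns year, team and finish_position; the rows are made up and the helper name `win_shares` is hypothetical. The chart is written to an in-memory buffer so the example runs anywhere.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render without opening a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical season: 8 race winners plus two non-winning rows
results = pd.DataFrame({
    "year": [2024] * 10,
    "team": ["Red Bull", "Red Bull", "Red Bull", "McLaren", "McLaren",
             "Ferrari", "Ferrari", "Mercedes", "Red Bull", "McLaren"],
    "finish_position": [1, 1, 1, 1, 1, 1, 1, 1, 2, 3],
})


def win_shares(df: pd.DataFrame, year: int) -> pd.Series:
    winners = df[(df["year"] == year) & (df["finish_position"] == 1)]
    return winners["team"].value_counts(normalize=True).mul(100).round(1)


shares = win_shares(results, 2024)
fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(shares.to_numpy(), labels=shares.index.tolist(), autopct="%1.1f%%", startangle=90)
ax.set_title("Race win distribution (toy data)")
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150, bbox_inches="tight")  # pass a file path instead to write to disk
plt.close(fig)
print(shares.to_dict())
```
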

Grid vs Finish

Grid Position vs Finish Position

This compares where Red Bull's drivers started each race with where they finished, plotted against the rest of the field. In 2023 the blue Red Bull dots sit apart from everyone else, clustered toward the bottom-left corner, which means they mostly started and finished at the front. In 2024 they start mixing into the gray field.
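
One data-quality detail matters for this scatter and for positions_gained: in the race results FastF1 returns, a pit-lane start is typically recorded as grid position 0 (worth confirming against your own data). Left as-is, such a start plots at the far left and yields a misleading negative positions_gained. Below is a minimal sketch that treats 0 as missing before computing the metric; mapping it to the back of the grid would be an equally defensible choice. The helper name is hypothetical.

```python
import numpy as np
import pandas as pd


def clean_grid(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: treat grid position 0 (assumed pit-lane start) as missing."""
    out = df.copy()
    out["grid_position"] = out["grid_position"].replace(0, np.nan)
    out["positions_gained"] = out["grid_position"] - out["finish_position"]
    return out


# Made-up rows: the second car starts from the pit lane and finishes 12th
sample = pd.DataFrame({"grid_position": [1.0, 0.0, 5.0], "finish_position": [1.0, 12.0, 3.0]})
cleaned = clean_grid(sample)
print(cleaned["positions_gained"].tolist())  # [0.0, nan, 2.0]
```

With this cleaning, the existing `dropna(subset=['grid_position', 'finish_position'])` in the scatter code would drop pit-lane starts from the chart rather than plotting them at zero.
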

Driver Consistency

Driver Finishing Consistency

Consistency is widely considered the key to winning an F1 championship. This chart shows the spread of finishing positions for the top 8 drivers: a narrow box means reliable results, a wide box means results that varied a lot. Verstappen's box is the tightest and leftmost in 2023, and still the leftmost in 2024. Pérez's box widens dramatically in 2024, which suggests his inconsistent results were a big reason Red Bull lost the constructors' title.
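
A boxplot's box spans the interquartile range (IQR), so the consistency ranking can also be read off as numbers. Below is a minimal sketch assuming the cleaned driver_name and finish_position columns, with two made-up drivers; sorting by median mirrors the ordering used in the boxplot code, and the helper name is hypothetical.

```python
import pandas as pd

# Hypothetical finishing positions (not real results)
finishes = pd.DataFrame({
    "driver_name": ["Driver A"] * 5 + ["Driver B"] * 5,
    "finish_position": [1, 1, 2, 1, 3, 2, 8, 5, 12, 4],
})


def consistency_table(df: pd.DataFrame) -> pd.DataFrame:
    pos = df.groupby("driver_name")["finish_position"]
    table = pd.DataFrame({
        "median": pos.median(),
        "iqr": pos.quantile(0.75) - pos.quantile(0.25),  # width of the box in the boxplot
    })
    return table.sort_values("median")


print(consistency_table(finishes))
```

A small IQR with a low median is the numeric version of a narrow box on the left of the chart.
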

Full Python Scripts

These are all the scripts that I used in this project, from data collection to advanced analysis.

01_data_collection.py — used FastF1 API for safe batch download and creating CSV backups
import fastf1
import pandas as pd
import os
import time

# Create the working folders first: FastF1 refuses to use a cache folder that does not exist,
# and to_csv() fails if the backup/raw folders are missing
for folder in ("cache", "data/backups", "data/raw"):
    os.makedirs(folder, exist_ok=True)

# First I enable caching so that any session data is not redownloaded
fastf1.Cache.enable_cache("cache")

SEASONS = [2023, 2024]

def load_session_safe(year, round_num, session_type):
    # Load each session safely. On failure, return None instead of crashing the whole run
    try:
        session = fastf1.get_session(year, round_num, session_type)
        session.load()
        print(f"   Loaded {year} Round {round_num}")
        return session
    except Exception as e:
        print(f"   Failed {year} Round {round_num}: {e}")
        return None

def extract_race_results(session, year, round_num):
    # Extract the race results and add season/round/event context columns
    try:
        results = session.results.copy()
        results['Year']      = year
        results['Round']     = round_num
        results['EventName'] = session.event['EventName']
        results['Country']   = session.event['Country']
        return results
    except Exception as e:
        print(f"   Could not extract results: {e}")
        return None

all_race_results = []

for year in SEASONS:
    schedule = fastf1.get_event_schedule(year, include_testing=False)
    total_rounds = len(schedule)
    print(f"\nSeason {year} — {total_rounds} rounds")

    for round_num in range(1, total_rounds + 1):
        backup_path = f"data/backups/{year}_round{round_num:02d}_results.csv"

        # If this round was already downloaded, reuse the backup instead of calling the API.
        # Reading it back keeps previously saved rounds in the master file after a restart.
        if os.path.exists(backup_path):
            print(f"  Round {round_num}: already saved, loading backup")
            all_race_results.append(pd.read_csv(backup_path))
            continue

        race_session = load_session_safe(year, round_num, 'R')

        if race_session is not None:
            results = extract_race_results(race_session, year, round_num)
            if results is not None:
                all_race_results.append(results)
                results.to_csv(backup_path, index=False)
                print(f"   Backup saved: {backup_path}")

        time.sleep(5)  # Safety delay between downloads to avoid hitting rate limits

# Combine all rounds (fresh downloads and backups) into a master CSV file
if all_race_results:
    master = pd.concat(all_race_results, ignore_index=True)
    master.to_csv("data/raw/master_race_results.csv", index=False)
    print(f"Master results saved: {len(master)} rows")
02_data_cleaning.py — removed nulls, fixed types, standardized teams and some basic feature engineering
import os
import pandas as pd
import numpy as np

df = pd.read_csv("data/raw/master_race_results.csv")
print(f"Raw shape: {df.shape}")

# Dropped the columns that are completely null or irrelevant
# (errors='ignore' skips any column a particular FastF1 version does not return)
cols_to_drop = [
    'BroadcastName', 'TeamColor', 'HeadshotUrl',
    'CountryCode', 'Q1', 'Q2', 'Q3',
    'TeamId', 'DriverId'
]
df = df.drop(columns=cols_to_drop, errors='ignore')

# Renamed all columns to follow snake_case convention
df = df.rename(columns={
    'DriverNumber'      : 'driver_number',
    'Abbreviation'      : 'driver_code',
    'FirstName'         : 'first_name',
    'LastName'          : 'last_name',
    'FullName'          : 'driver_name',
    'TeamName'          : 'team',
    'Position'          : 'finish_position',
    'ClassifiedPosition': 'classified_position',
    'GridPosition'      : 'grid_position',
    'Time'              : 'race_time',
    'Status'            : 'status',
    'Points'            : 'points',
    'Laps'              : 'laps_completed',
    'Year'              : 'year',
    'Round'             : 'round',
    'EventName'         : 'race_name',
    'Country'           : 'country',
})

# Fixed data types
df['finish_position'] = pd.to_numeric(df['finish_position'], errors='coerce')
df['grid_position']   = pd.to_numeric(df['grid_position'],   errors='coerce')
df['points']          = pd.to_numeric(df['points'],          errors='coerce').fillna(0)
df['year']            = df['year'].astype(int)
df['round']           = df['round'].astype(int)

# Standardized team names for both seasons
team_name_map = {
    'Alfa Romeo' : 'Sauber',
    'AlphaTauri' : 'RB F1 Team',
}
df['team'] = df['team'].replace(team_name_map)

# Made DNF counts more accurate by counting only mechanical/accident retirements
DNF_STATUSES = ['Retired', 'Accident', 'Collision damage', 'Undertray', 'Withdrew']
DNS_STATUSES = ['Did not start']

df['dnf']              = df['status'].isin(DNF_STATUSES)
df['dns']              = df['status'].isin(DNS_STATUSES)
df['finished']         = ~df['dnf'] & ~df['dns']
df['is_redbull']       = df['team'] == 'Red Bull'
df['positions_gained'] = df['grid_position'] - df['finish_position']
df['points_finish']    = df['points'] > 0

os.makedirs("data/cleaned", exist_ok=True)  # make sure the output folder exists before saving
df.to_csv("data/cleaned/master_results_cleaned.csv", index=False)
print(f"Cleaned data saved: {df.shape}")
03_eda.py — charts displaying wins, constructor points, Red Bull dominance metrics and cumulative points
import matplotlib
matplotlib.use('Agg')  # Non-interactive backend: charts are saved straight to files without opening windows
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import os

df = pd.read_csv("data/cleaned/master_results_cleaned.csv")
os.makedirs("outputs/charts", exist_ok=True)

COLORS = {
    'Red Bull'      : '#3671C6',
    'Ferrari'       : '#E8002D',
    'Mercedes'      : '#27F4D2',
    'McLaren'       : '#FF8000',
    'Aston Martin'  : '#229971',
    'Alpine F1 Team': '#FF87BC',
    'Williams'      : '#64C4FF',
    'RB F1 Team'    : '#6692FF',
    'Haas F1 Team'  : '#B6BABD',
    'Sauber'        : '#52E252',
}

# wins per team
winners      = df[df['finish_position'] == 1]
wins_by_team = winners.groupby(['year', 'team']).size().reset_index(name='wins')

fig, axes = plt.subplots(1, 2, figsize=(14, 6))
fig.suptitle('Race Wins by Team — 2023 vs 2024', fontsize=16, fontweight='bold')

for i, year in enumerate([2023, 2024]):
    data   = wins_by_team[wins_by_team['year'] == year].sort_values('wins', ascending=True)
    colors = [COLORS.get(t, '#888') for t in data['team']]
    axes[i].barh(data['team'], data['wins'], color=colors)
    axes[i].set_title(f'{year} Season')

plt.tight_layout()
plt.savefig('outputs/charts/01_wins_by_team.png', dpi=150, bbox_inches='tight')

# constructor points
team_points = df.groupby(['year', 'team'])['points'].sum().reset_index()

# Red Bull dominance metrics
rb_wins    = df[(df['is_redbull']==True) & (df['finish_position']==1)].groupby('year').size()
rb_points  = df[df['is_redbull']==True].groupby('year')['points'].sum()
ttl_points = df.groupby('year')['points'].sum()
ttl_races  = df.groupby('year')['round'].nunique()

dominance = pd.DataFrame({
    'rb_wins'        : rb_wins,
    'rb_win_pct'     : (rb_wins / ttl_races * 100).round(1),
    'rb_points_share': (rb_points / ttl_points * 100).round(1),
}).reset_index()

# cumulative constructor points
top_teams = team_points[team_points['year']==2023].nlargest(5, 'points')['team'].tolist()

fig, axes = plt.subplots(1, 2, figsize=(16, 6))
for i, year in enumerate([2023, 2024]):
    year_df = df[df['year'] == year]
    for team in top_teams:
        rp = year_df[year_df['team']==team].groupby('round')['points'].sum().reset_index()
        rp['cumulative'] = rp['points'].cumsum()
        axes[i].plot(rp['round'], rp['cumulative'],
                     color=COLORS.get(team, '#888'), label=team, linewidth=2)

plt.tight_layout()
plt.savefig('outputs/charts/04_cumulative_points.png', dpi=150, bbox_inches='tight')
04_advanced_analysis.py — charts showing driver battles, Verstappen finishes heatmap, position change scatterplot and driver consistency boxplot
import matplotlib
matplotlib.use('Agg')
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("data/cleaned/master_results_cleaned.csv")

DRIVER_COLORS = {
    'VER': '#3671C6', 'NOR': '#FF8000',
    'LEC': '#E8002D', 'HAM': '#27F4D2', 'PER': '#9B59B6'
}

# cumulative driver points — Verstappen vs rivals
top_drivers = ['VER', 'NOR', 'LEC', 'HAM', 'PER']

fig, axes = plt.subplots(1, 2, figsize=(18, 7))
for i, year in enumerate([2023, 2024]):
    year_df = df[df['year'] == year]
    for drv in top_drivers:
        drv_df = year_df[year_df['driver_code'] == drv].sort_values('round')
        if len(drv_df) == 0: continue
        drv_df = drv_df.copy()
        drv_df['cumulative_pts'] = drv_df['points'].cumsum()
        axes[i].plot(drv_df['round'], drv_df['cumulative_pts'],
                     marker='o', markersize=3, linewidth=2,
                     color=DRIVER_COLORS[drv], label=drv)

plt.savefig('outputs/charts/05_driver_points_race_by_race.png', dpi=150)

# Verstappen heatmap
fig, axes = plt.subplots(2, 1, figsize=(18, 8))
for i, year in enumerate([2023, 2024]):
    ver_df    = df[(df['driver_code']=='VER') & (df['year']==year)].sort_values('round')
    positions = ver_df['finish_position'].values.reshape(1, -1)
    axes[i].imshow(positions, cmap='RdYlGn_r', aspect='auto', vmin=1, vmax=20)
    for j, pos in enumerate(ver_df['finish_position'].values):
        axes[i].text(j, 0, str(int(pos)), ha='center', va='center',
                    fontsize=9, fontweight='bold')

plt.savefig('outputs/charts/07_verstappen_positions_heatmap.png', dpi=150)

# grid vs finish scatter
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
for i, year in enumerate([2023, 2024]):
    year_df = df[df['year']==year].dropna(subset=['grid_position', 'finish_position'])
    rb      = year_df[year_df['is_redbull']==True]
    rest    = year_df[year_df['is_redbull']==False]
    axes[i].scatter(rest['grid_position'], rest['finish_position'],
                    alpha=0.4, color='#888', s=30, label='Rest of field')
    axes[i].scatter(rb['grid_position'], rb['finish_position'],
                    alpha=0.9, color='#3671C6', s=60, label='Red Bull', zorder=5)
    axes[i].plot([1, 20], [1, 20], 'k--', alpha=0.3)

plt.savefig('outputs/charts/09_grid_vs_finish.png', dpi=150)

# driver consistency boxplot
top_drivers_full = [
    'Max Verstappen', 'Lando Norris', 'Charles Leclerc',
    'Carlos Sainz', 'Lewis Hamilton', 'George Russell',
    'Sergio Pérez', 'Oscar Piastri'
]

fig, axes = plt.subplots(1, 2, figsize=(16, 7))
for i, year in enumerate([2023, 2024]):
    year_df = df[(df['year']==year) & (df['driver_name'].isin(top_drivers_full))]
    order   = year_df.groupby('driver_name')['finish_position'].median().sort_values().index
    sns.boxplot(data=year_df, x='finish_position', y='driver_name',
                order=order, ax=axes[i], palette='viridis')
    axes[i].axvline(x=10.5, color='red', linestyle='--', alpha=0.3)

plt.savefig('outputs/charts/10_driver_consistency.png', dpi=150)
View Full Project on GitHub

What The Data Says

I managed to extract 6 major insights from the 2023 and 2024 Formula 1 race datasets.

1

Historic 2023 Dominance

Red Bull's 2023 campaign is statistically one of the most dominant in F1 history: the team won 21 of 22 races (a 95.5% win rate) and claimed 35.2% of all points scored by every team combined. No other constructor came close.

21/22 wins · 790 pts · 35.2% share
2

McLaren's Championship Turnaround

McLaren went from 266 points in 2023 (level with Aston Martin) to the constructors' championship in 2024 with 609 points, the biggest single-season points jump of any team in the field.

266 pts in 2023 → 609 pts in 2024
3

Verstappen's Personal Resilience

Despite driving a significantly slower car in 2024, Verstappen still won the drivers' championship, his 4th consecutive title. His average finish worsened from 1.27 to 3.63, but consistency and racecraft still carried him past all his rivals.

Avg finish: 1.27 → 3.63 · Still champion
4

Pérez Collapse Cost Red Bull the Title

Sergio Pérez scored 260 points in 2023 as a strong number 2 driver. In 2024 he dropped to 138, a 47% reduction. His wide finishing-position box shows how inconsistent he was, and that inconsistency directly cost Red Bull the 2024 constructors' championship.

260 pts → 138 pts · –47%
5

McLaren's Zero DNF Season

McLaren completed 2024 without a single DNF: both drivers finished every race they started. Combined with their pace advantage in the second half of the season, this reliability was critical to their constructors' title win.

0 DNFs across 48 race starts in 2024
6

Win Share Tells the Full Story

Red Bull's points share fell from 35.2% to 22% of all points scored across the field. Four different teams won races in 2024, compared with just two in 2023. Formula 1 went from a one-team show to its most competitive season in years.

4 race-winning teams in 2024 vs 2 in 2023

Tech Stack

These are all the tools that I used across the full analytics pipeline.

🐍

Python 3.10

Core language for all data processing and analysis

🏎️

FastF1

Unofficial Python library for F1 timing and results data, used for race data collection

🐼

Pandas

Data cleaning, transformation, and aggregation

📊

Matplotlib

Primary charting library for all visualizations

🎨

Seaborn

Statistical charts — boxplots and distribution plots

🔢

NumPy

Numerical operations and array handling

💾

CSV / Backups

Per-race CSV backups so an interrupted download can resume without repeating API calls

📐

Feature Engineering

DNF flags, positions gained, points share KPIs

Who Built This

Here's a bit about me, the analyst behind the project: my background, skills, and other work.


Shounak Mukherjee

Aspiring Data & Business Analyst
🎓 B.Tech CSE — IEM Kolkata (2023–present)
📍 Kolkata, India
✍️ Content Writing Intern — Mobile Gaming Head
🏎️ Interests: Music · Long Drives · Motorsports

Background

I'm a Computer Science & Engineering student at the Institute of Engineering and Management, Kolkata, actively building a career in data and business analytics. I have hands-on experience in data cleaning, visualization, and business insight generation using Excel, Power BI, MySQL, and Python. I have also been writing content professionally for over two years, which has strengthened my communication skills, analytical thinking, and ability to turn complex data into clear narratives.

Skills

Python · Pandas · MySQL · Power BI · Excel & Power Query · Data Cleaning · Feature Engineering · DAX · CTEs & Window Functions · RFM Segmentation · Matplotlib & Seaborn · FastF1 API · Sports Analytics

Other Projects

Forage Simulations