Analysing Trail Races with Python

I’m a trail runner and a data scientist — so naturally I wrote code to analyse my own races. This post walks through the pipeline I built after the 2025 Raid West Trail to compare my race performance against a previous recon run on the same course.

The Data Source

Garmin Connect exports GPX files containing timestamped GPS coordinates, elevation, and heart rate. A single race produces roughly 5,000–15,000 trackpoints depending on duration.

import gpxpy

with open('race.gpx') as f:
    gpx = gpxpy.parse(f)

points = [
    {
        'lat': pt.latitude,
        'lon': pt.longitude,
        'ele': pt.elevation,
        'time': pt.time,
    }
    for track in gpx.tracks
    for segment in track.segments
    for pt in segment.points
]

Calculating Pace

Raw GPS data gives you position and time. Pace is derived from Haversine distance between consecutive points:

from math import radians, sin, cos, sqrt, atan2

def haversine(p1, p2):
    R = 6371000  # Earth radius in metres
    lat1, lat2 = map(radians, [p1['lat'], p2['lat']])
    lon1, lon2 = map(radians, [p1['lon'], p2['lon']])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat/2)**2 + cos(lat1)*cos(lat2)*sin(dlon/2)**2
    return R * 2 * atan2(sqrt(a), sqrt(1 - a))

df['dist_m'] = [
    haversine(df.iloc[i-1], df.iloc[i])
    for i in range(1, len(df))
] + [0]

df['pace_min_km'] = (df['elapsed_s'].diff() / 60) / (df['dist_m'] / 1000)

Elevation-Adjusted Effort

On trails, flat pace means nothing — a 7 min/km on a 400m/km climb is elite. I use the Minetti cost function to calculate equivalent flat pace:

def minetti_cost(grade):
    """Metabolic cost relative to flat (grade as fraction, e.g. 0.3 = 30%)"""
    g = grade
    return 280.5*g**5 - 58.7*g**4 - 76.8*g**3 + 51.9*g**2 + 19.6*g + 3.6

df['grade'] = df['ele'].diff() / df['dist_m']
df['effort_factor'] = df['grade'].apply(minetti_cost) / minetti_cost(0)
df['adj_pace'] = df['pace_min_km'] / df['effort_factor']

Race vs Recon Comparison

The interesting part — overlaying two GPX files on the same course:

import pandas as pd
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

axes[0].plot(race['cum_km'], race['pace_min_km'],    color='#E10600', label='Race Day',   alpha=0.8)
axes[0].plot(recon['cum_km'], recon['pace_min_km'],  color='#888888', label='Recon Run', alpha=0.6)
axes[0].set_ylabel('Pace (min/km)')
axes[0].legend()
axes[0].set_title('Race vs Recon — Pace Comparison')

axes[1].fill_between(race['cum_km'], race['elevation'], alpha=0.4, color='#E10600')
axes[1].set_ylabel('Elevation (m)')
axes[1].set_xlabel('Distance (km)')

Key Findings from Raid West 2025

Segment	Race Pace	Recon Pace	Delta
Col de la Prise	10:42/km	11:15/km	-33s
Summit plateau	8:04/km	9:12/km	-68s
Final descent	6:31/km	7:10/km	-39s

Race adrenaline is real — I was 8–12% faster across every segment compared to the recon run at the same perceived effort.

What’s Next

I’m building a small web dashboard to visualise these comparisons interactively. The scraper at gtrail_data already pulls race results from the PowerBI dashboards used by Mauritius trail events — next step is combining that with personal GPS data for full race analysis.

Follow the project on GitHub — PRs and issues welcome.