
Technical Validation Report v1

Wattness Race Time Prediction Engine · March 2026
511 real races · 93 athletes · 24 courses · Ironman & 70.3

1. Context and Objective

This document presents a rigorous evaluation of the Wattness race time prediction engine, based on a multi-discipline physics model (swim, bike, run) coupled with an individual coefficient personalization system.

The evaluation covers 511 real race results from 93 athletes across 24 courses (Ironman and 70.3), from 2017 to 2025. The protocol faithfully reproduces production conditions: chronological leave-future-out validation, where personal coefficients are only computed from prior races.

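Two prediction modes are evaluated: the physics baseline, which requires no personal history, and the adjusted mode, which applies individual coefficients per discipline when sufficient history exists.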

2. Methodology

2.1 Protocol

Each prediction is generated by the exact production pipeline: calculateRacePacing() produces the physics baseline, then fetchPersonalCoefficients() adjusts per discipline if sufficient history exists. No parameters are tuned after the fact.

2.2 Leave-future-out validation

For each athlete's race, personal coefficients are computed only from prior races (exactly as in production). Future races are never visible, eliminating any risk of data leakage.
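
The chronological split can be sketched as follows. This is a minimal illustration rather than the production pipeline: the `Race` record and the `prior_races` and `leave_future_out` helpers are hypothetical names introduced here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Race:
    athlete_id: str
    race_date: date
    observed_min: float   # observed finish time, minutes
    baseline_min: float   # physics baseline prediction, minutes


def prior_races(target: Race, history: list[Race]) -> list[Race]:
    """Return the same athlete's races strictly before the target race date."""
    return sorted(
        (r for r in history
         if r.athlete_id == target.athlete_id and r.race_date < target.race_date),
        key=lambda r: r.race_date,
    )


def leave_future_out(history: list[Race]):
    """Yield (race, prior) pairs in date order; prior never contains future races."""
    for race in sorted(history, key=lambda r: r.race_date):
        yield race, prior_races(race, history)
```

Because the filter is a strict `<` on the race date, the race being predicted and anything after it can never contribute to its own coefficients.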

2.3 Inclusion criteria

2.4 Metrics

| Metric | Definition |
| --- | --- |
| MAPE | Mean absolute percentage error relative to observed time |
| MAE | Mean absolute error (in minutes) |
| MedAE | Median absolute error (robust to outliers) |
| P90 | 90th percentile of absolute error (upper bound for 90% of predictions) |
| Bias | Mean signed error. Positive = predicted too slow; negative = predicted too fast |
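
As a sketch of how these metrics relate to one another, the function below computes all five from paired predicted and observed times in minutes. The function name `evaluate` is hypothetical, and the nearest-rank percentile is an assumption, since the report does not state which percentile method it uses.

```python
import math
import statistics


def evaluate(predicted_min: list[float], observed_min: list[float]) -> dict[str, float]:
    """Compute MAPE, MAE, MedAE, P90 and bias from times in minutes."""
    if not predicted_min or len(predicted_min) != len(observed_min):
        raise ValueError("inputs must be non-empty and of equal length")
    # Signed error: positive means the prediction was slower than the observed time.
    signed = [p - o for p, o in zip(predicted_min, observed_min)]
    abs_err = sorted(abs(e) for e in signed)
    ape = [abs(e) / o for e, o in zip(signed, observed_min)]
    # Nearest-rank 90th percentile (assumed method).
    rank = max(1, math.ceil(0.9 * len(abs_err)))
    return {
        "mape_pct": 100 * statistics.fmean(ape),
        "mae_min": statistics.fmean(abs_err),
        "medae_min": statistics.median(abs_err),
        "p90_min": abs_err[rank - 1],
        "bias_min": statistics.fmean(signed),
    }
```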

2.5 Athlete category definitions

Athletes are classified into three levels based on a multi-criteria algorithm combining physiological thresholds (FTP, CSS, CS), training volume, competition history, and consistency:

| Level | Typical profile | n (dataset) |
| --- | --- | --- |
| Elite | Advanced athletes (ADV_* categories): high thresholds, consistent volume, proven history | 196 |
| Competitive | Confirmed and regular (CMP_*, EST_RESILIENT categories): solid base, race experience | 260 |
| Age-group | Developing or irregular (DEV_*, EST_FRAGILE/STANDARD categories): variable profile, limited history | 55 |

2.6 Evaluation grid

| Rating | MAPE threshold | Interpretation |
| --- | --- | --- |
| Good | ≤ 5 % | Sufficient accuracy for reliable race planning |
| Fair | 5 – 8 % | Usable with caution, noticeable gap on long races |
| Needs work | > 8 % | Significant gap, prediction should be taken as a rough estimate |
| Inconclusive | n < 5 | Sample too small to draw conclusions |
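
The grid translates directly into a small rating function. The sketch below uses a hypothetical function name, `rate`, and assumes that the sample-size check takes precedence over the MAPE thresholds, which the grid implies but does not state.

```python
def rate(mape_pct: float, n: int) -> str:
    """Map a MAPE value (in percent) and a sample size to the report's rating."""
    if n < 5:                 # assumed to take precedence over the thresholds
        return "Inconclusive"
    if mape_pct <= 5.0:
        return "Good"
    if mape_pct <= 8.0:
        return "Fair"
    return "Needs work"
```

For example, `rate(4.6, 196)` returns `"Good"`, while `rate(14.0, 55)` returns `"Needs work"`.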

3. Physics Model Results (baseline / free version)

The baseline is the engine's foundation: a deterministic physics model that predicts times without personal history. This is the version available to all users, including new ones.

3.1 Overall accuracy by level

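| Level | n | MAPE | MAE | MedAE | P90 | Bias |
| --- | --- | --- | --- | --- | --- | --- |
| Elite | 196 | 4.6 % | 21 min | 14 min | 46 min | -15 min |
| Competitive | 260 | 5.5 % | 24 min | 17 min | 1h00 | ≈ 0 |
| Age-group | 55 | 14.0 % | 1h01 | 59 min | 1h42 | +37 min |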

3.2 By race format

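| Level | Format | n | MAPE | MAE | Bias |
| --- | --- | --- | --- | --- | --- |
| Elite | 70.3 | 106 | 4.1 % | 12 min | -8 min |
| Elite | Full Ironman | 90 | 4.7 % | 29 min | -19 min |
| Competitive | 70.3 | 174 | 4.8 % | 15 min | -3 min |
| Competitive | Full Ironman | 86 | 6.6 % | 44 min | -20 min |
| Age-group | Full Ironman | 23 | 9.6 % | 1h04 | +38 min |
| Age-group | 70.3 | 32 | 13.4 % | 45 min | +11 min |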

3.3 By discipline (baseline)

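| Level | Swim MAPE | Bike MAPE | Run MAPE | Swim Bias | Bike Bias | Run Bias |
| --- | --- | --- | --- | --- | --- | --- |
| Elite | 8.9 % | 4.3 % | 6.9 % | -2 min | ≈ 0 | -8 min |
| Competitive | 11.1 % | 8.3 % | 8.6 % | -2 min | +11 min | -9 min |
| Age-group | 8.5 % | 18.3 % | 11.3 % | -2 min | +31 min | -1 min |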

Known baseline limitations: The model applies the athlete's current thresholds (FTP, CSS) to past races (2017-2025). If an athlete has improved or declined, this introduces a temporal bias, particularly visible for age-groupers (bike +31 min). Resolving this bias (storing historical thresholds) is improvement priority #1.
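
One possible shape for historical threshold storage is a dated history per athlete and per threshold, queried for the value in effect on race day. The sketch below is an assumption about how such a lookup could work, not a description of the planned implementation; `threshold_at` and the `(date, value)` history format are hypothetical.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional


def threshold_at(history: list[tuple[date, float]], race_date: date) -> Optional[float]:
    """Return the most recent threshold recorded on or before race_date, or None."""
    ordered = sorted(history)                  # sort by measurement date
    dates = [d for d, _ in ordered]
    i = bisect_right(dates, race_date)         # count of records dated <= race_date
    return ordered[i - 1][1] if i else None
```

With an FTP history of 220 W recorded in 2019 and 260 W recorded in 2022, a 2020 race would be modeled at 220 W rather than at the current 260 W.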

4. Personalization Impact (individual coefficients)

4.1 Principle

Personal coefficients are the median of observed / baseline ratios computed from the athlete's prior races (minimum 5, outlier-filtered). They correct the baseline per discipline, capturing systematic individual deviations.
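
A minimal sketch of this computation for one discipline is shown below. The minimum of 5 races comes from the description above. The outlier filter (a fixed ratio window) and the choice to apply the minimum after filtering are assumptions, since the report does not specify either, and the function names are hypothetical.

```python
import statistics
from typing import Optional

MIN_RACES = 5               # minimum history, as stated in the report
RATIO_BOUNDS = (0.7, 1.3)   # hypothetical outlier window; not specified in the report


def personal_coefficient(observed: list[float], baseline: list[float]) -> Optional[float]:
    """Median observed/baseline ratio over prior races, or None if history is too short."""
    ratios = [o / b for o, b in zip(observed, baseline) if b > 0]
    low, high = RATIO_BOUNDS
    kept = [r for r in ratios if low <= r <= high]
    if len(kept) < MIN_RACES:   # assumption: the minimum applies after filtering
        return None
    return statistics.median(kept)


def adjust(baseline_prediction: float, coefficient: Optional[float]) -> float:
    """Scale a new baseline prediction; fall back to the baseline without a coefficient."""
    if coefficient is None:
        return baseline_prediction
    return baseline_prediction * coefficient
```

A coefficient below 1 means the athlete has historically been faster than the baseline in that discipline, so the adjusted prediction is shortened accordingly.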

4.2 Results (n=177, athletes with sufficient history)

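| Level | n | MAPE baseline | MAPE adjusted | Gain | Bias baseline | Bias adjusted |
| --- | --- | --- | --- | --- | --- | --- |
| Elite | 86 | 3.8 % | 3.3 % | -0.5 pts | -9 min | -4 min |
| Competitive | 76 | 5.2 % | 4.9 % | -0.3 pts | -4 min | -14 min |
| Age-group | 15 | 15.2 % | 7.3 % | -7.9 pts | +46 min | +6 min |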

Key observation: The most dramatic gain is seen for age-groupers (MAPE 15.2 % → 7.3 %, bias +46 min → +6 min). This confirms that personal coefficients effectively compensate for baseline bias in this profile, though the sample remains limited (n=15).

5. Overall Summary (baseline + adjusted)

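| Level | n | Mode | MAPE | MAE | MedAE | P90 | Bias |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Elite | 196 | Baseline | 4.6 % | 21 min | 14 min | 46 min | -15 min |
| | | Adjusted | 4.4 % | 20 min | 12 min | 46 min | -13 min |
| Competitive | 260 | Baseline | 5.5 % | 24 min | 17 min | 1h00 | ≈ 0 |
| | | Adjusted | 5.4 % | 25 min | 15 min | 1h00 | -5 min |
| Age-group | 55 | Baseline | 14.0 % | 1h01 | 59 min | 1h42 | +37 min |
| | | Adjusted | 11.8 % | 53 min | 51 min | 1h42 | +22 min |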

6. Scientific Foundations

The Wattness engine is built on physics and physiology models documented in the scientific literature:

| Module | Foundation | Reference |
| --- | --- | --- |
| Swim hydrodynamics | Drag/velocity relationship in open water | Chatard et al. (1998) [7] |
| Bike power solver | Power equation with resistive forces (CdA, Crr, gravity, wind), solved by Newton-Raphson | Coggan (2003) [8], Blocken et al. (2018) [6] |
| Run (elevation) | Energy cost as a function of gradient | Minetti et al. (2002) [3] |
| Heat penalty (WBGT) | Temperature impact on marathon performance | Ely et al. (2007) [2] |
| Bike → Run coupling | Transition and pre-run fatigue effect | Hausswirth & Brisswalter (2008) [4], Millet & Vleck (2000) [5] |
| Triathlon decomposition | Relative contribution of each discipline | Rust et al. (2021) [1] |

The model is not a black box: each prediction is decomposable segment by segment, with penalties explicitly attributed (heat, elevation, coupling, glycogen).
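
To illustrate the kind of computation behind the bike power solver, the sketch below solves the standard steady-state power balance for speed with Newton-Raphson, using the resistive forces named above (CdA, Crr, gravity, wind). It is a generic textbook formulation, not the Wattness implementation: the function name, the constant air density, the initial guess, and the omission of drivetrain losses are all assumptions, and it is intended for flat or uphill segments.

```python
import math

G = 9.81      # gravitational acceleration, m/s^2
RHO = 1.225   # air density at sea level, kg/m^3 (assumed constant)


def solve_speed(power_w, mass_kg, cda, crr, grade=0.0, headwind=0.0,
                tol=1e-9, max_iter=50):
    """Solve (drag + rolling + gravity) * v = power for ground speed v in m/s."""
    theta = math.atan(grade)
    # Rolling resistance plus gravity component along the slope, in newtons.
    k = mass_kg * G * (crr * math.cos(theta) + math.sin(theta))
    v = 10.0  # initial guess, m/s
    for _ in range(max_iter):
        air = v + headwind                         # air speed relative to the rider
        drag = 0.5 * RHO * cda * air * abs(air)    # aerodynamic force, N
        f = (drag + k) * v - power_w               # power balance residual, W
        df = drag + k + RHO * cda * abs(air) * v   # derivative of f with respect to v
        step = f / df
        v = max(v - step, 0.1)                     # keep the iterate positive
        if abs(step) < tol:
            return v
    raise RuntimeError("Newton-Raphson did not converge")
```

Segment time then follows as distance divided by the solved speed, which is what makes a prediction decomposable segment by segment.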

7. Limitations and Unmodeled Factors

Modeled with explicit penalty: Heat (WBGT), elevation, head/tailwind, bike-run coupling, glycogen depletion.

Partially modeled: Drafting (average factor by level), bike position (TT vs road).

Not modeled: Ocean current, road conditions, mechanical issues, tactical race management, weather conditions beyond temperature (rain, extreme humidity).

8. Ablation Study (contribution of each module)

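To measure each sub-module's contribution, the benchmark is re-run while disabling one module at a time. Two naive estimators serve as reference points for the physics model's added value: an individual estimator, which requires the athlete's prior races in the same format, and a population estimator, which requires no history.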

8.1 Overall results

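| Variant | n | MAPE | MAE | Bias | MedAE | P90 | % >30min |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Full model | 511 | 6.1 % | 27 min | −1 min | 17 min | 1h05 | 29.9 % |
| No heat | 511 | 6.1 % | 27 min | −8 min | 17 min | 1h06 | 34.4 % |
| No coupling | 511 | 6.0 % | 26 min | −5 min | 17 min | 1h04 | 31.3 % |
| No glycogen | 511 | 6.1 % | 27 min | −1 min | 17 min | 1h05 | 29.9 % |
| Individual naive | 353 | 6.0 % | 27 min | +1 min | 19 min | 59 min | 34.8 % |
| Population naive | 511 | 7.9 % | 34 min | −5 min | 25 min | 1h10 | 42.1 % |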

8.2 By athlete level

Elite (n=196)

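| Variant | n | MAPE | MAE | Bias | MedAE | P90 | % >30min |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Full model | 196 | 4.6 % | 20 min | −15 min | 14 min | 45 min | 23.0 % |
| No heat | 196 | 5.4 % | 25 min | −21 min | 16 min | 58 min | 33.7 % |
| No coupling | 196 | 4.9 % | 22 min | −17 min | 15 min | 47 min | 27.0 % |
| Individual naive | 149 | 5.4 % | 24 min | +6 min | 18 min | 52 min | 31.5 % |
| Population naive | 196 | 7.5 % | 30 min | −3 min | 24 min | 1h03 | 39.8 % |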

Competitive (n=260)

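| Variant | n | MAPE | MAE | Bias | MedAE | P90 | % >30min |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Full model | 260 | 5.5 % | 24 min | ≈ 0 | 16 min | 1h00 | 24.6 % |
| No heat | 260 | 5.3 % | 24 min | −6 min | 15 min | 56 min | 26.2 % |
| No coupling | 260 | 5.4 % | 24 min | −3 min | 16 min | 1h00 | 24.2 % |
| Individual naive | 169 | 6.3 % | 29 min | ≈ 0 | 18 min | 1h08 | 37.3 % |
| Population naive | 260 | 7.9 % | 34 min | −4 min | 25 min | 1h19 | 42.7 % |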

Age-group (n=55)

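| Variant | n | MAPE | MAE | Bias | MedAE | P90 | % >30min |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Full model | 55 | 14.0 % | 1h00 | +36 min | 58 min | 1h42 | 80.0 % |
| No heat | 55 | 12.3 % | 53 min | +26 min | 52 min | 1h31 | 76.4 % |
| No coupling | 55 | 12.8 % | 55 min | +28 min | 51 min | 1h32 | 80.0 % |
| Individual naive | 35 | 6.9 % | 33 min | −12 min | 24 min | 1h05 | 37.1 % |
| Population naive | 55 | 9.5 % | 44 min | −15 min | 27 min | 2h09 | 47.3 % |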

8.3 By race format

Full Ironman (n=199)

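| Variant | n | MAPE | MAE | Bias | MedAE | P90 | % >30min |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Full model | 199 | 6.2 % | 39 min | −8 min | 28 min | 1h36 | 46.2 % |
| No heat | 199 | 6.7 % | 43 min | −20 min | 34 min | 1h30 | 58.3 % |
| No coupling | 199 | 6.2 % | 39 min | −13 min | 31 min | 1h29 | 51.8 % |
| Individual naive | 163 | 6.4 % | 40 min | +6 min | 30 min | 1h30 | 50.3 % |
| Population naive | 199 | 7.4 % | 47 min | −6 min | 39 min | 1h42 | 58.8 % |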

70.3 (n=312)

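| Variant | n | MAPE | MAE | Bias | MedAE | P90 | % >30min |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Full model | 312 | 6.0 % | 18 min | +2 min | 13 min | 42 min | 19.6 % |
| No heat | 312 | 5.7 % | 18 min | −1 min | 12 min | 38 min | 19.2 % |
| No coupling | 312 | 5.9 % | 18 min | ≈ 0 | 13 min | 42 min | 18.3 % |
| Individual naive | 190 | 5.6 % | 17 min | −2 min | 12 min | 41 min | 21.6 % |
| Population naive | 312 | 8.3 % | 26 min | −4 min | 22 min | 52 min | 31.4 % |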

8.4 Ablation conclusions

Heat module: the largest impact is on Elite athletes (MAPE 4.6 % → 5.4 % without heat, +11 pts in >30min error rate). The effect is reversed for Competitive athletes (5.5 % → 5.3 % without heat): the thermal penalty slightly overcorrects for this population.

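Coupling module: moderate impact for Elite athletes (MAPE +0.3 pts and +4 pts in >30min error rate without coupling). Negligible for Competitive athletes (5.5 % → 5.4 % without coupling). For Age-group athletes, disabling the module lowers MAPE (14.0 % → 12.8 %), so it does not improve accuracy for this profile in this benchmark.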

Glycogen module: no measurable impact at any level. This module adds no value in its current form.

8.5 Physics model vs naive estimators

The ablation compares two types of naive estimators, clarifying the physics model's value depending on context:

| Criterion | Physics model | Individual naive | Population naive |
| --- | --- | --- | --- |
| History required | None | Yes (same format) | None |
| Decomposition | By discipline + segments | Total only | Total only |
| Course adaptation | Profile, weather, conditions | None | None |
| Overall MAPE | 6.1 % | 6.0 % (n=353) | 7.9 % |
| Overall MedAE | 17 min | 19 min | 25 min |
| % errors >30 min | 29.9 % | 34.8 % | 42.1 % |
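
The report does not define the two naive estimators precisely. The sketch below shows one plausible reading, in which both use a median of past finish times; these definitions and all function names are assumptions for illustration. It also shows how the share of errors above 30 minutes can be computed.

```python
import statistics
from typing import Optional


def individual_naive(prior_times_same_format: list[float]) -> Optional[float]:
    """Assumed definition: median of the athlete's own prior times in this format."""
    if not prior_times_same_format:
        return None   # no history in this format, so no prediction is possible
    return statistics.median(prior_times_same_format)


def population_naive(other_athletes_times: list[float]) -> float:
    """Assumed definition: median finish time of other athletes in this format."""
    return statistics.median(other_athletes_times)


def share_over_threshold(predicted: list[float], observed: list[float],
                         threshold_min: float = 30.0) -> float:
    """Percentage of predictions whose absolute error exceeds threshold_min."""
    misses = sum(abs(p - o) > threshold_min for p, o in zip(predicted, observed))
    return 100 * misses / len(observed)
```

Under this reading, the individual estimator cannot predict an athlete's first race in a format, which is consistent with its smaller sample (n=353 versus 511).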

Key takeaways:

9. Summary

Across 511 real results covering 93 athletes and 24 courses, the Wattness engine shows generally solid accuracy, particularly for elite and competitive profiles, with a clear gain when personal history is available. Results should be interpreted with caution for age-groupers due to a still limited sample size.

| Profile | Free (MAPE) | Free (MedAE) | Personalized (MAPE) | Best case |
| --- | --- | --- | --- | --- |
| Elite | 4.6 % | 14 min | 4.4 % | 3.3 % |
| Competitive | 5.5 % | 17 min | 5.4 % | 4.9 % |
| Age-group | 14.0 % | 59 min | 11.8 % | 7.3 % (n=15) |

The engine's value goes beyond its overall MAPE. The ablation study confirms that the physics model clearly outperforms a naive estimator without history (MAPE 6.1 % vs 7.9 %, >30min error rate 30 % vs 42 %), and that heat and coupling modules deliver real value for elites. Its value also lies in its per-discipline decomposition (enabling an actionable race plan), its course-specific adaptation (elevation, weather, technicality), and its ability to work without history — three properties a simple statistical estimator cannot offer.

The model is under continuous improvement. Priority areas are historical threshold storage (to eliminate temporal bias, the main error driver for age-groupers), improved extreme heat modeling (competitive overcorrection), and expanding the dataset for age-group profiles.

Scientific References

[1] Rust, C.A. et al. (2021). "What Is the Best Discipline to Predict Overall Triathlon Performance?" Frontiers in Physiology, 12, 654552.

[2] Ely, M.R. et al. (2007). "Impact of Weather on Marathon-Running Performance." Medicine & Science in Sports & Exercise, 39(3), 487-493.

[3] Minetti, A.E. et al. (2002). "Energy cost of walking and running at extreme uphill and downhill slopes." Journal of Applied Physiology, 93(3), 1039-1046.

[4] Hausswirth, C. & Brisswalter, J. (2008). "Strategies for improving performance in long duration events." Sports Medicine, 38(11), 881-891.

[5] Millet, G.P. & Vleck, V.E. (2000). "Physiological and biomechanical adaptations to the cycle to run transition in Olympic triathlon." British Journal of Sports Medicine, 34(5), 384-390.

[6] Blocken, B. et al. (2018). "CFD simulations of the aerodynamic drag of two drafting cyclists." Computers & Fluids, 171, 209-229.

[7] Chatard, J.C. et al. (1998). "Analysis of body composition, swimming performance and estimated energy expenditure." European Journal of Applied Physiology, 78(2), 109-113.

[8] Coggan, A.R. (2003). "Training and racing using a power meter." Training Peaks whitepaper.