
Bradley Terry Preference Model on NFL Games

A post on the Bradley Terry model with NFL game data. The Bradley Terry model:

\[\log{\frac{p_{ij}}{1 - p_{ij}}} = \beta_i - \beta_j\]

is a classical model for ranking or preference data and is sometimes referred to as a preference model.

import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from IPython.display import display, Markdown
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
import patsy
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_curve, roc_auc_score
def render_plotly_html(fig: go.Figure) -> None:
    fig.show()
    display(
        Markdown(
            fig.to_html(
                include_plotlyjs="cdn",
            )
        )
    )


def render_table(df: pd.DataFrame | pd.Series) -> Markdown:
    return Markdown(df.to_markdown())

NFL Game Data

This data was collected through ESPN’s public API. Our goal will be to model the probability of a home team beating an away team,

\[p_{ij} = \text{Probability that home team $i$ beats away team $j$}\]

We will fit the Bradley Terry model to this NFL game data.

df = pd.concat([pd.read_csv("games_2023.csv"), pd.read_csv("games_2024.csv")])
print(df.shape)
ALL_TEAMS = list(sorted(set(df.awayTeamAbbr)))
render_table(df.describe())
(332, 60)
(Output: `df.describe()` summary statistics for the numeric box-score columns over the 332 games, covering first downs, total yards, yards per play, passing and rushing yards, turnovers, and final scores for both the home and away teams.)

Treatment Contrasts

For a good review of different contrasts, see this documentation. We will use the Treatment (dummy) coding.

# Dummy encode the home team as an indicator variable.
# We set `drop_first=True` to avoid collinearity amongst the variables.
# In this case, `ARI` is dropped from the dataset and represents
# the reference level (equivalent to setting \beta_ARI = 0).
X_home = pd.get_dummies(df.homeTeamAbbr, drop_first=True).astype(int)
# Ensure we have linear independence.
assert not (X_home.sum(axis=1) == 1).all()
render_table(X_home.head())
(Output: the first five rows of `X_home`, a 0/1 indicator matrix with one column per team except the dropped reference level `ARI`; each row has a single 1 marking that game's home team, or all zeros when `ARI` is the home team.)
# Same for the away team.
X_away = pd.get_dummies(df.awayTeamAbbr, drop_first=True).astype(int)
# Ensure we have linear independence.
assert not (X_away.sum(axis=1) == 1).all()
render_table(X_away.head())
(Output: the first five rows of `X_away`, the analogous 0/1 indicator matrix for the away team.)

Bradley Terry Model Contrasts

To specify the Bradley Terry model, we model the logit of the probability of the home team winning as the difference between the two teams' coefficients:

\[\log{\frac{p_{ij}}{1 - p_{ij}}} = \beta_i - \beta_j\] \[\iff p_{ij} = \frac{\exp(\beta_i - \beta_j)}{1 + \exp(\beta_i - \beta_j)} = \frac{\exp(\beta_i)}{\exp(\beta_i) + \exp(\beta_j)}\]

We can represent this numerically by taking the difference of the contrasts we created before, X_diff = X_home - X_away. Each team pairing is then uniquely encoded, and each game's linear predictor depends only on the difference between the two teams' coefficients.
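To make the encoding concrete, here is a minimal toy sketch (three hypothetical teams A, B, and C, not the actual NFL data) showing the home and away dummies and their difference:

import pandas as pd

# Toy example with hypothetical teams A, B, C; A is the dropped reference level (beta_A = 0).
toy = pd.DataFrame({"homeTeamAbbr": ["B", "C", "A"], "awayTeamAbbr": ["A", "B", "C"]})
toy_home = pd.get_dummies(toy.homeTeamAbbr, drop_first=True).astype(int)  # columns: B, C
toy_away = pd.get_dummies(toy.awayTeamAbbr, drop_first=True).astype(int)  # columns: B, C
toy_diff = toy_home - toy_away
# Row 0 (B hosts A): B=+1, C= 0  ->  logit = beta_B            (since beta_A = 0)
# Row 1 (C hosts B): B=-1, C=+1  ->  logit = beta_C - beta_B
# Row 2 (A hosts C): B= 0, C=-1  ->  logit = -beta_C
print(toy_diff)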

# We should have the same teams represented in the home and away teams.
assert (X_home.columns == X_away.columns).all()
# X_diff will represent which two teams played that week.
# Specifically, +1 will represent home and -1 will be away (except reference level).
X_diff = X_home - X_away
# Ensure we have linear independence: if every row of X_diff summed to zero,
# the team columns would sum to the zero vector and be linearly dependent.
assert not (X_diff.sum(axis=1) == 0).all()
render_table(X_diff.head())
(Output: the first five rows of `X_diff`; each row has a +1 in the home team's column and a -1 in the away team's column, with `ARI` absorbed into the reference level.)
# We can visualize the "encoding" and see how often one team played another.
def encoding(x: pd.Series) -> str:
    # Just concatenate all of the contrasts into one long string encoding.
    return "".join(str(i) for i in x)


# We can see how the encoding maps back to the original teams
# and the frequency of this matchup.
df["encoding"] = X_diff.apply(encoding, axis=1)
render_table(
    df.groupby("encoding")[["homeTeamAbbr", "awayTeamAbbr"]]
    .agg(["unique", "count"])
    .sort_values(
        [("homeTeamAbbr", "count"), ("awayTeamAbbr", "count")],  # type: ignore
        ascending=False,
    )
    .head()
)
| encoding | ('homeTeamAbbr', 'unique') | ('homeTeamAbbr', 'count') | ('awayTeamAbbr', 'unique') | ('awayTeamAbbr', 'count') |
| --- | --- | --- | --- | --- |
| 00-10000000000000001000000000000 | ['MIA'] | 2 | ['BUF'] | 2 |
| 000-1000000000000000001000000000 | ['NO'] | 2 | ['CAR'] | 2 |
| 00000-10000000010000000000000000 | ['KC'] | 2 | ['CIN'] | 2 |
| 000000-1000010000000000000000000 | ['HOU'] | 2 | ['CLE'] | 2 |
| 00000000000-11000000000000000000 | ['IND'] | 2 | ['HOU'] | 2 |
# Tie goes to the home team, since we want to keep the outcomes binary.
y = pd.Series((df.homeScore >= df.awayScore), name="homeTeamWin").astype(int)
render_table(y.head())
|   | homeTeamWin |
| --- | --- |
| 0 | 1 |
| 1 | 0 |
| 2 | 1 |
| 3 | 1 |
| 4 | 0 |

We also include an intercept $\alpha$, which is a coefficient capturing the home-team advantage. Our final model is then:

\[\log{\frac{p_{ij}}{1 - p_{ij}}} = \alpha + \beta_i - \beta_j = \alpha + \Delta_{ij}\]

where $\Delta_{ij} = \beta_i - \beta_j$ is the difference of the contrasts we computed before.
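One way to read the intercept: for two evenly matched teams ($\beta_i = \beta_j$), the home team's win probability reduces to

\[p_{ij} = \frac{\exp(\alpha)}{1 + \exp(\alpha)},\]

which exceeds $1/2$ whenever $\alpha > 0$, so a positive intercept directly quantifies the home-field advantage.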

Design Matrix

formula = "homeTeamWin ~ 1 + " + " + ".join(X_diff.columns)
y, X = patsy.dmatrices(
    formula,
    pd.concat([y, X_diff], axis=1),
    return_type="dataframe",
)
Markdown(formula)

homeTeamWin ~ 1 + ATL + BAL + BUF + CAR + CHI + CIN + CLE + DAL + DEN + DET + GB + HOU + IND + JAX + KC + LAC + LAR + LV + MIA + MIN + NE + NO + NYG + NYJ + PHI + PIT + SEA + SF + TB + TEN + WSH

VIF

The variance inflation factor (VIF) measures the amount of collinearity among the features. A common rule of thumb is to keep it below 5 when performing statistical inference.
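For reference, the standard definition (not computed explicitly in the code below, which uses statsmodels' `variance_inflation_factor`) is

\[\mathrm{VIF}_k = \frac{1}{1 - R_k^2},\]

where $R_k^2$ is the $R^2$ from regressing the $k$-th column of the design matrix on all of the other columns; a VIF near 1 means that column is nearly orthogonal to the rest, while large values indicate collinearity.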

# Print the VIF to see how much collinearity there is in the features.
# We want to avoid VIF > 5.
render_plotly_html(
    px.histogram(
        pd.Series(
            [variance_inflation_factor(X, i) for i, _ in enumerate(X.columns)],
            name="VIF",
        ),
        nbins=15,
        title="Distribution of VIF",
    )
)

Bradley Terry GLM

Now we fit the BT model. Since we are modeling the logit of a probability, this reduces to logistic regression, which is widely available; we will use statsmodels' GLM.

# Fit a GLM with Binomial family and logit link function.
results = sm.GLM(
    endog=y,
    exog=X,
    family=sm.families.Binomial(
        link=sm.families.links.Logit(),
    ),
).fit()
results.summary()
Generalized Linear Model Regression Results
Dep. Variable: homeTeamWin    No. Observations: 332
Model: GLM    Df Residuals: 300
Model Family: Binomial    Df Model: 31
Link Function: Logit    Scale: 1.0000
Method: IRLS    Log-Likelihood: -198.10
Date: Sat, 28 Sep 2024    Deviance: 396.19
Time: 23:13:58    Pearson chi2: 332.
No. Iterations: 4    Pseudo R-squ. (CS): 0.1657
Covariance Type: nonrobust
coef    std err    z    P>|z|    [0.025    0.975]
Intercept 0.2235 0.122 1.829 0.067 -0.016 0.463
ATL 0.2448 0.724 0.338 0.735 -1.174 1.664
BAL 1.9120 0.717 2.666 0.008 0.506 3.318
BUF 1.5916 0.734 2.167 0.030 0.152 3.031
CAR -0.9843 0.855 -1.152 0.249 -2.659 0.691
CHI 0.3354 0.727 0.461 0.645 -1.089 1.760
CIN 0.8876 0.713 1.245 0.213 -0.510 2.285
CLE 1.2966 0.708 1.832 0.067 -0.091 2.684
DAL 1.3452 0.711 1.892 0.058 -0.048 2.739
DEN 0.7358 0.740 0.994 0.320 -0.714 2.186
DET 1.7979 0.719 2.499 0.012 0.388 3.208
GB 1.1053 0.718 1.540 0.124 -0.302 2.513
HOU 1.1260 0.714 1.577 0.115 -0.273 2.525
IND 0.7037 0.733 0.960 0.337 -0.733 2.141
JAX 0.7857 0.741 1.060 0.289 -0.667 2.239
KC 2.0205 0.749 2.698 0.007 0.553 3.488
LAC 0.2262 0.760 0.298 0.766 -1.262 1.715
LAR 1.2363 0.670 1.845 0.065 -0.077 2.549
LV 0.6112 0.753 0.812 0.417 -0.864 2.087
MIA 1.1409 0.746 1.530 0.126 -0.321 2.602
MIN 0.9293 0.730 1.272 0.203 -0.502 2.361
NE -0.2335 0.778 -0.300 0.764 -1.757 1.290
NO 0.7731 0.743 1.041 0.298 -0.682 2.229
NYG 0.2182 0.718 0.304 0.761 -1.190 1.626
NYJ 0.5562 0.744 0.748 0.455 -0.902 2.014
PHI 1.3884 0.714 1.944 0.052 -0.012 2.788
PIT 1.4859 0.716 2.076 0.038 0.083 2.889
SEA 1.3372 0.701 1.907 0.057 -0.037 2.712
SF 1.8241 0.699 2.611 0.009 0.455 3.193
TB 1.0179 0.729 1.396 0.163 -0.411 2.447
TEN -0.1036 0.763 -0.136 0.892 -1.599 1.392
WSH -0.0545 0.726 -0.075 0.940 -1.478 1.369
# Verify that the Binomial family with a Logit link is equivalent to the Logit binary model.
logit_params = sm.Logit(endog=y, exog=X).fit().params
pd.testing.assert_series_equal(logit_params, results.params)
Optimization terminated successfully.
         Current function value: 0.596675
         Iterations 6
# Compute the confusion matrix on the training predictions (rows = actual outcome, columns = predicted outcome).
render_table(
    pd.DataFrame(
        confusion_matrix(y_pred=results.predict(X) > 0.5, y_true=y),
        columns=["Home Loss", "Home Win"],
        index=["Home Loss", "Home Win"],
    )
)
|   | Home Loss | Home Win |
| --- | --- | --- |
| Home Loss | 89 | 59 |
| Home Win | 43 | 141 |

Analysis of GLM Results

results_df = pd.DataFrame(
    {
        "p-value": results.pvalues,
        "parameter": results.params,
        "team": results.params.index,
        "Parameter Rank": results.params.rank(ascending=False),
    }
)

P-values

# Visualize the parameter estimate vs. p-value.
# As expected, this should roughly trace out a normal distribution.
render_plotly_html(
    px.scatter(
        results_df,
        x="parameter",
        y="p-value",
        color="team",
        title="P-Value vs. Parameter Estimate",
    )
)

Parameter Ranks

# Tally each team's wins (if y == 1, the home team won; otherwise the away team won).
team_wins = pd.Series(
    np.where(y.squeeze(), df.homeTeamAbbr, df.awayTeamAbbr)
).value_counts()
# Tally games played in total by each team.
team_n = pd.concat([df.homeTeamAbbr, df.awayTeamAbbr]).value_counts()
win_df = pd.DataFrame({"wins": team_wins, "n": team_n})
win_df["win"] = win_df.wins / win_df.n
win_df.sort_values(by="win", ascending=False, inplace=True)
win_df["Win Rank"] = win_df.win.rank(ascending=False)
render_table(win_df.head())
|   | wins | n | win | Win Rank |
| --- | --- | --- | --- | --- |
| KC | 17 | 23 | 0.73913 | 1 |
| DET | 16 | 23 | 0.695652 | 2 |
| BAL | 15 | 22 | 0.681818 | 4 |
| BUF | 15 | 22 | 0.681818 | 4 |
| SF | 15 | 22 | 0.681818 | 4 |
# Join the parameter results with the raw win percentages.
results_win_df = results_df.merge(win_df, left_index=True, right_index=True)
# Plot the parameter rank vs. the win percentage rank
fig_rank = px.scatter(
    results_win_df,
    x="Parameter Rank",
    y="Win Rank",
    color="team",
    title="Win Percentage Rank vs. Parameter Estimate Rank",
)
# Add a reference line. If points fall on the line, then parameter_rank == win_rank.
# Otherwise, the Bradley Terry model and raw win percentage disagree.
fig_rank.add_scatter(
    x=results_win_df["Parameter Rank"],
    y=results_win_df["Parameter Rank"],
    mode="lines",
    name="Reference",
    line=dict(color="red", width=2, dash="dash"),
    hoverinfo="skip",
)
render_plotly_html(fig_rank)

It seems the BT model is especially useful when there are ties in the standings, and in some cases (e.g. LAR) the parameter rank < win rank, indicating the team may be undervalued by its raw win percentage.

# How often do the rankings agree?
(results_win_df["Parameter Rank"] == results_win_df["Win Rank"]).mean()
0.12903225806451613

Probability of team i beating team j

We can compute the probabilities using the equation:

\[p_{ij} = \frac{\exp(\alpha + \beta_i - \beta_j)}{1 + \exp(\alpha + \beta_i - \beta_j)}\]
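As a concrete example (using the rounded coefficients from the summary table above, so the numbers are approximate), the probability that KC beats DEN at home is

\[\log\frac{p}{1-p} \approx 0.2235 + 2.0205 - 0.7358 = 1.508, \qquad p \approx \frac{\exp(1.508)}{1 + \exp(1.508)} \approx 0.82.\]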
# Compute the matrix of win probabilities: entry (row = away team, column = home team)
# is the probability that the home team defeats the away team.
log_odds = (
    results.params.filter(items=["Intercept"]).sum()
    + results.params.filter(items=ALL_TEAMS).values
    - results.params.filter(items=ALL_TEAMS).values.reshape(-1, 1)
)
probability_matrix = pd.DataFrame(np.exp(log_odds) / (1 + np.exp(log_odds)))
probability_matrix.columns = results.params.filter(items=ALL_TEAMS).index
probability_matrix.index = results.params.filter(items=ALL_TEAMS).index
render_plotly_html(
    px.imshow(
        probability_matrix,
        labels={"x": "Home Team", "y": "Away Team"},
        title="Probability of a Home Team Winning",
        aspect="auto",
        color_continuous_scale="RdBu_r",
    )
)

To interpret, we can look at one column (e.g. KC) and read off that team's chances of beating each away team when playing at home (redder cells are higher probabilities). Conversely, following a row shows each home team's probability of beating that away team, so a row of mostly low (blue) values indicates a team that tends to win on the road.

Adding in Numeric Features

Since the BT model amounts to fitting a logistic regression model, we can extend our previous attempt with numeric features.
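Concretely, with a vector of per-game covariates and a coefficient vector (the symbols $x_{ij}$ and $\gamma$ are notation introduced here, not from the code), the extended model is

\[\log{\frac{p_{ij}}{1 - p_{ij}}} = \alpha + \beta_i - \beta_j + \gamma^\top x_{ij}\]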

# Focus on x_per_y style metrics and turnovers.
numeric_features = [
    i + j
    for i in [
        "thirdDownEff",
        "fourthDownEff",
        "completionAttempts",
        "yardsPerRushAttempt",
        "yardsPerPass",
        "redZoneAttempts",
        "interceptions",
        "fumblesLost",
    ]
    for j in ["_home", "_away"]
]
def compute_percentage(x: str, delim: str = "-") -> float:
    first, second = x.split(delim)
    return float(first) / float(second) if float(second) > 0 else 0
# Construct a numeric features dataframe.
X_numeric = df[numeric_features].copy()
X_numeric["redZoneAttempts_home"] = X_numeric.redZoneAttempts_home.map(
    compute_percentage
)
X_numeric["thirdDownEff_home"] = X_numeric.thirdDownEff_home.map(compute_percentage)
X_numeric["redZoneAttempts_away"] = X_numeric.redZoneAttempts_away.map(
    compute_percentage
)
X_numeric["fourthDownEff_home"] = X_numeric.fourthDownEff_home.map(compute_percentage)
X_numeric["completionAttempts_away"] = X_numeric.completionAttempts_away.map(
    lambda x: compute_percentage(x, "/")
)
X_numeric["fourthDownEff_away"] = X_numeric.fourthDownEff_away.map(compute_percentage)
X_numeric["thirdDownEff_away"] = X_numeric.thirdDownEff_away.map(compute_percentage)
X_numeric["completionAttempts_home"] = X_numeric.completionAttempts_home.map(
    lambda x: compute_percentage(x, "/")
)
# After these conversions, we should only have numeric features
assert (X_numeric.dtypes != np.object_).all()
render_table(X_numeric.head())
(Output: the first five rows of `X_numeric` after the ratio-style string columns, e.g. third-down efficiency and completions/attempts, have been converted to numeric rates.)

Design Matrix

We expand the design matrix from before with the numeric features.

formula_full = formula
formula_full += " + " + " + ".join(X_numeric.columns)
y, X_full = patsy.dmatrices(
    formula_full,
    pd.concat([y, X_diff, X_numeric], axis=1),
    return_type="dataframe",
)
Markdown(formula_full)

homeTeamWin ~ 1 + ATL + BAL + BUF + CAR + CHI + CIN + CLE + DAL + DEN + DET + GB + HOU + IND + JAX + KC + LAC + LAR + LV + MIA + MIN + NE + NO + NYG + NYJ + PHI + PIT + SEA + SF + TB + TEN + WSH + thirdDownEff_home + thirdDownEff_away + fourthDownEff_home + fourthDownEff_away + completionAttempts_home + completionAttempts_away + yardsPerRushAttempt_home + yardsPerRushAttempt_away + yardsPerPass_home + yardsPerPass_away + redZoneAttempts_home + redZoneAttempts_away + interceptions_home + interceptions_away + fumblesLost_home + fumblesLost_away

Full Bradley Terry GLM

# Fit a GLM with Binomial family and logit link function.
model = sm.GLM(
    endog=y,
    exog=X_full,
    family=sm.families.Binomial(
        link=sm.families.links.Logit(),
    ),
)
results_full = model.fit()
y_hat = results_full.predict(X_full)
fpr, tpr, _ = roc_curve(y, y_hat)
fig_roc = px.line(
    pd.DataFrame({"True Positive Rate": tpr, "False Positive Rate": fpr}),
    x="False Positive Rate",
    y="True Positive Rate",
    title=f"Probability of Home Team Winning ROC Curve, AUC: {round(roc_auc_score(y, y_hat), 4)}",
)
fig_roc.add_scatter(
    x=fpr,
    y=fpr,
    mode="lines",
    name="Reference",
    line=dict(color="red", width=2, dash="dash"),
    hoverinfo="skip",
)
render_plotly_html(fig_roc)
render_table(
    pd.DataFrame(
        confusion_matrix(y_pred=y_hat > 0.5, y_true=y),
        columns=["Home Loss", "Home Win"],
        index=["Home Loss", "Home Win"],
    )
)
|   | Home Loss | Home Win |
| --- | --- | --- |
| Home Loss | 137 | 11 |
| Home Win | 10 | 174 |

We see a large improvement in the confusion matrix and ROC AUC from adding the numerical features.

Summary

The Bradley Terry model is a classic way to model rankings and preferences, with applications in sports statistics. More recently, it has been used in LLM training (check out the Direct Preference Optimization paper for one example) to learn human preferences.
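For context, the same functional form shows up there: with a learned reward (or preference score) $r(\cdot)$ in place of the team strengths (the notation $r$, $y_1$, $y_2$ is mine, not from the post), the probability that response $y_1$ is preferred over response $y_2$ is modeled as

\[P(y_1 \succ y_2) = \frac{\exp(r(y_1))}{\exp(r(y_1)) + \exp(r(y_2))}\]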

This post is licensed under CC BY 4.0 by the author.