
Published on November 25, 2025 by cresencio

Week 12 is in the books, and our ensemble delivered a solid 10-4 performance (71.4%) — perfectly respectable, but the real story is what happened under the hood. XGBoost had its best week of the season, our consensus picks went 6-for-6, and the coin-flip games… well, they lived up to their name.

The Week 12 Scoreboard

10-4 (71.4% accuracy)

Complete Results

| Game | Prediction | Confidence | Result | Margin |
| --- | --- | --- | --- | --- |
| BUF @ HOU | BUF | 51.0% | ❌ HOU 23-19 | -4 |
| NYJ @ BAL | BAL | 83.6% | ✅ BAL 23-10 | +13 |
| NYG @ DET | DET | 83.6% | ✅ DET 34-27 | +7 |
| IND @ KC | KC | 66.0% | ✅ KC 23-20 | +3 |
| SEA @ TEN | SEA | 87.9% | ✅ SEA 30-24 | +6 |
| PIT @ CHI | CHI | 50.1% | ✅ CHI 31-28 | +3 |
| NE @ CIN | CIN | 60.8% | ❌ NE 26-20 | -6 |
| MIN @ GB | GB | 61.4% | ✅ GB 23-6 | +17 |
| ATL @ NO | NO | 58.7% | ❌ ATL 24-10 | -14 |
| CLE @ LV | CLE | 50.1% | ✅ CLE 24-10 | +14 |
| PHI @ DAL | PHI | 54.7% | ❌ DAL 24-21 | -3 |
| JAX @ ARI | JAX | 51.1% | ✅ JAX 27-24 | +3 |
| TB @ LA | LA | 65.6% | ✅ LA 34-7 | +27 |
| CAR @ SF | SF | 71.9% | ✅ SF 20-9 | +11 |

Model Showdown: XGBoost’s Redemption Arc

Week 12 was a tale of model redemption. XGBoost, which had been struggling all season with a cumulative 41.6% accuracy, suddenly clicked into gear with a dominant 85.7% week.

Week 12 Model Breakdown

| Model | W-L | Accuracy | Season Total |
| --- | --- | --- | --- |
| XGBoost | 12-2 | 85.7% | 41.6% |
| Random Forest | 12-2 | 85.7% | 44.9% |
| Logistic | 10-4 | 71.4% | 63.5% |
| ELO | 8-6 | 57.1% | 61.8% |
| Bayesian | 7-7 | 50.0% | 55.6% |
| Ensemble | 10-4 | 71.4% | 61.2% |

The machine learning models finally had their moment. XGBoost and Random Forest tied for the week’s best performance at 85.7%, while our steady ELO model had an uncharacteristically rough outing at 57.1%.

Key Insight: XGBoost went from season-worst to week-best, a 44-point swing that suggests the model may finally be adapting to mid-season patterns.


The Perfect Consensus: 6-for-6

When all models agree, good things happen. Our consensus picks — games where every model pointed the same direction — went a perfect 6-for-6.

Consensus Games

| Game | Consensus Pick | Confidence | Result | Margin |
| --- | --- | --- | --- | --- |
| NYJ @ BAL | Baltimore | 83.6% | ✅ BAL 23-10 | +13 |
| NYG @ DET | Detroit | 83.6% | ✅ DET 34-27 | +7 |
| IND @ KC | Kansas City | 66.0% | ✅ KC 23-20 | +3 |
| SEA @ TEN | Seattle | 87.9% | ✅ SEA 30-24 | +6 |
| TB @ LA | LA Rams | 65.6% | ✅ LA 34-7 | +27 |
| CAR @ SF | San Francisco | 71.9% | ✅ SF 20-9 | +11 |

When our models unanimously agree, they’re now 42-8 (84%) on the season. That’s not just good — that’s bankable.
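The check itself is simple: a game only counts as a consensus pick when every model points at the same winner. Here's a minimal sketch of that idea; the function and data layout are illustrative, not our actual pipeline code.

```python
# Minimal sketch of a consensus check: a game is a "consensus pick" only when
# every model picks the same winner. Field names are illustrative.

def consensus_pick(model_picks: dict[str, str]) -> str | None:
    """Return the unanimous pick across models, or None if any model disagrees."""
    picks = set(model_picks.values())
    return picks.pop() if len(picks) == 1 else None

# Example: Week 12's NYJ @ BAL game, where all five models pointed to Baltimore.
week12_nyj_bal = {
    "xgboost": "BAL",
    "random_forest": "BAL",
    "logistic": "BAL",
    "elo": "BAL",
    "bayesian": "BAL",
}

print(consensus_pick(week12_nyj_bal))  # -> "BAL"
```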


The Four Horsemen of Chaos

Every week has its upsets. Week 12 had four, and three of them came exactly where you'd expect: in the shaky sub-60% confidence range.

Week 12 Upsets

| Game | We Picked | Winner | Confidence | Loss Type |
| --- | --- | --- | --- | --- |
| BUF @ HOU | Buffalo | Houston | 51.0% | Coin flip |
| NE @ CIN | Cincinnati | New England | 60.8% | Value play backfired |
| ATL @ NO | New Orleans | Atlanta | 58.7% | Division chaos |
| PHI @ DAL | Philadelphia | Dallas | 54.7% | Coin flip |

The NE @ CIN miss is the interesting one — that was our biggest “value play” of the season with a 35.6% edge. The models saw a 1-point game and got burned when New England actually won outright. The other three were all in the “anything can happen” zone under 60%.
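For context on what "edge" means: one common way to frame a value play is the gap between a model's win probability and the probability implied by the betting line. The sketch below shows that general idea with placeholder numbers; it isn't necessarily how our 35.6% figure was computed.

```python
# Hypothetical framing of a "value play": the gap between the model's win
# probability and the win probability implied by the moneyline. This is an
# illustration of the general concept, not a description of our actual method.

def implied_probability(american_odds: int) -> float:
    """Convert American moneyline odds to an implied win probability."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def edge(model_prob: float, american_odds: int) -> float:
    """Model probability minus market-implied probability."""
    return model_prob - implied_probability(american_odds)

# Example with placeholder numbers: model says 60.8%, market line is +120.
print(f"{edge(0.608, +120):.1%}")  # ~15.3% edge in this hypothetical
```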


Confidence Calibration: The Truth in the Numbers

Season-long, our confidence buckets continue to hold up remarkably well:

| Confidence Range | Games | Correct | Accuracy |
| --- | --- | --- | --- |
| 50-55% | 40 | 20 | 50.0% |
| 55-60% | 35 | 19 | 54.3% |
| 60-65% | 28 | 16 | 57.1% |
| 65-70% | 25 | 15 | 60.0% |
| 70-80% | 32 | 25 | 78.1% |
| 80%+ | 17 | 14 | 82.4% |

The pattern is clear: when we’re confident, we’re usually right. Games over 70% confidence hit at nearly 80%, while coin-flips are exactly that — 50/50.
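For the curious, a table like this is just a bucketing pass over the season's picks: group every prediction by its stated confidence, then compare each bucket's hit rate to that confidence. A rough sketch of the idea, with placeholder data rather than our production code:

```python
# Rough sketch of confidence calibration: bucket picks by stated confidence,
# then compare each bucket's hit rate to that confidence.

BUCKETS = [
    ("50-55%", 0.50, 0.55),
    ("55-60%", 0.55, 0.60),
    ("60-65%", 0.60, 0.65),
    ("65-70%", 0.65, 0.70),
    ("70-80%", 0.70, 0.80),
    ("80%+",   0.80, 1.01),
]

def calibration_table(picks: list[tuple[float, bool]]) -> list[tuple[str, int, int, float]]:
    """picks is a list of (confidence, was_correct) pairs; returns one row per bucket."""
    rows = []
    for label, lo, hi in BUCKETS:
        outcomes = [correct for conf, correct in picks if lo <= conf < hi]
        if outcomes:
            rows.append((label, len(outcomes), sum(outcomes),
                         100 * sum(outcomes) / len(outcomes)))
    return rows

# Example with two placeholder picks: a 51% pick that missed, an 88% pick that hit.
for row in calibration_table([(0.51, False), (0.88, True)]):
    print(row)
```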


The weekly trends tell an interesting story. XGBoost has been inconsistent all season but exploded in Week 12. ELO has been our steadiest performer but stumbled this week. The Ensemble continues to smooth out individual model volatility.
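That smoothing comes from combining model probabilities. Our actual weighting isn't shown here, but an equal-weight average captures the basic effect:

```python
# Simplified, equal-weight illustration of how an ensemble smooths out
# individual model volatility: average each model's home-win probability.
# Real ensembles (including ours) may weight models differently.

def ensemble_probability(model_probs: dict[str, float]) -> float:
    """Average the home-win probabilities produced by each model."""
    return sum(model_probs.values()) / len(model_probs)

# Illustrative numbers only: one overconfident model, one skeptical model,
# three in between. The average lands in a calmer middle ground.
probs = {"xgboost": 0.95, "random_forest": 0.70, "logistic": 0.64,
         "elo": 0.58, "bayesian": 0.41}

print(f"Ensemble home-win probability: {ensemble_probability(probs):.1%}")  # ~65.6%
```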


Scoring Analysis: Close But Not Always Right

We correctly predicted the game winner in 10 of 14 games, but how accurate were our score predictions?

| Metric | Value |
| --- | --- |
| Avg Total Error | 8.6 points |
| Avg Margin Error | 7.0 points |
| Best Prediction | PHI @ DAL (0.97 pts off total) |
| Worst Prediction | MIN @ GB (17.53 pts off total) |

Notable scoring misses:

  • NYG @ DET: Predicted 46.4 total, actual was 61 (+14.6 off)
  • MIN @ GB: Predicted 46.5 total, actual was 29 (-17.5 off)
  • CAR @ SF: Predicted 44.6 total, actual was 29 (-15.6 off)

Score prediction remains our weakest area. Picking winners is much easier than predicting exact margins.
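For reference, both error metrics in the table above are simple absolute differences; here's a quick sketch with placeholder scores rather than our actual model output:

```python
# Sketch of the two scoring metrics: average absolute error on the game total
# (sum of both scores) and on the margin (home minus away). The predicted
# values below are placeholders, not our model's actual output.

def total_error(pred_home, pred_away, act_home, act_away):
    """Absolute error on the combined points total."""
    return abs((pred_home + pred_away) - (act_home + act_away))

def margin_error(pred_home, pred_away, act_home, act_away):
    """Absolute error on the victory margin (home minus away)."""
    return abs((pred_home - pred_away) - (act_home - act_away))

# Example: a hypothetical 24-22 prediction against a 34-27 final score.
print(total_error(24, 22, 34, 27))   # 15 points off the total
print(margin_error(24, 22, 34, 27))  # 5 points off the margin
```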


Season Standings Update

Through 178 games (12 weeks), here’s where each model stands:

| Model | Season W-L | Accuracy | Trend |
| --- | --- | --- | --- |
| Logistic | 113-65 | 63.5% | ↗️ |
| ELO | 110-68 | 61.8% | ↘️ |
| Ensemble | 109-69 | 61.2% | |
| Bayesian | 99-79 | 55.6% | ↘️ |
| Random Forest | 80-98 | 44.9% | ↗️ |
| XGBoost | 74-104 | 41.6% | ↗️ |

Logistic still leads the season despite not being our “feature” model. The Ensemble remains our safest bet — never the best week, rarely the worst.


Key Takeaways

  1. Consensus is king: 6-for-6 this week, 84% on the season. When models agree, follow them.

  2. XGBoost awakens: After struggling all season, XGBoost finally showed what it can do with an 85.7% week. One week doesn’t make a trend, but it’s encouraging.

  3. The “lock of the year” held: SEA @ TEN at 87.9% confidence delivered as promised. Seattle won 30-24.

  4. Value plays can backfire: Our biggest edge of the season (NE @ CIN, +35.6%) went the wrong way when New England won outright 26-20. The models were right that it would be close, but wrong about the winner.

  5. Coin flips gonna flip: Three of our four misses (BUF/HOU, ATL/NO, PHI/DAL) were sub-59% confidence. The math works.

  6. XGBoost vindicated on Houston: Remember that 100% XGBoost confidence on Houston for Thursday Night? It was right. The ensemble’s 51% Buffalo pick was wrong.


Looking Ahead

Week 13 brings a crucial slate of games with playoff implications heating up. Our models will recalibrate with the latest data, and we’ll see if XGBoost’s breakout was a fluke or a turning point.

Lessons for Week 13:

  • Trust the consensus picks — they’re now 42-8 (84%) on the season
  • Be cautious with “value plays” where models disagree with the market
  • XGBoost may deserve more weight after this performance
  • Division games (like ATL @ NO) remain unpredictable

Stay tuned for Week 13 predictions!


Written by cresencio
