
Back in September, I launched what I thought would be a fun experiment: pitting three AI platforms against each other (and me) in my annual NFL football pool. ChatGPT, Perplexity, and Claude each got the same weekly data—team records, point differentials, injury reports—and made their picks. I wrote about the methodology and early results in "A Real Test of AI: ChatGPT vs Perplexity vs Claude in an NFL Football Pool."
The original hypothesis? That AI, with its ability to process vast amounts of data and identify patterns, might have an edge over my gut-feel, injury-report-skimming approach. Maybe one platform would emerge as clearly superior. Or maybe they'd all converge on similar picks, revealing some optimal strategy I was missing.
Well, nine weeks in, I've got news: the humans are winning.
Through Week 9, covering 134 total games, here's where we stand:
- My picks: 90 correct (67.2%)
- Claude: 84 correct (62.7%)
- Perplexity: 83 correct (61.9%)
- ChatGPT: 75 correct (56.0%)
That 11-percentage-point gap between me and ChatGPT represents a meaningful edge, enough to move from middle-of-the-pack to competitive in most pools. Even my lead over Claude and Perplexity, while smaller, has held steady across nine weeks of picks.
What's particularly interesting is how the season has unfolded. Weeks 5 and 6 were absolute chaos—upset city, with favorites falling left and right. In Week 5, I managed just 5 correct picks out of 14 games. The AIs? ChatGPT got 3, Perplexity 6, Claude 5. Week 6 wasn't much better: I scraped together 7 correct, while ChatGPT completely cratered with just 4.
These upset-heavy weeks should theoretically favor AI. If there's some hidden pattern in the data that signals when an underdog is about to pull off a surprise, machine learning ought to find it. Instead, the chaos seemed to amplify the differences in our approaches rather than reveal any AI advantage.
Looking at the weekly numbers, Claude posted the strongest single week of anyone with 14 correct picks in Week 1, a mark I matched in Week 3. ChatGPT's best showing was 12 correct in Week 1, the same total I put up in Weeks 1 and 2. But consistency matters more than peak performance, and that's where the human approach seems to be winning out.
I'm genuinely surprised by these results. Not because I think I'm some football genius—I'm not—but because this seems like exactly the kind of problem AI should excel at. It's pattern recognition over a large dataset, it's probabilistic reasoning, it's synthesizing multiple variables into a binary decision. These are the things we're told AI does better than humans.
So what's going on? A few theories:
First, maybe football is just inherently noisy. The better team doesn't always win, and perhaps the factors that determine week-to-week outcomes are more random than we'd like to admit. In a noisy system, sophisticated analysis might not provide much advantage over informed intuition.
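To make that concrete, here's a toy simulation. The 64% and 60% hit rates are made-up numbers for illustration, not anyone's actual skill; the question is simply how often a genuinely worse picker still finishes even or ahead over a 134-game stretch.

```python
import random

# Toy simulation of the noise theory. Suppose one picker is genuinely
# better (wins 64% of games) and a rival is genuinely worse (60%);
# both rates are invented for illustration. Over a 134-game stretch,
# how often does the worse picker still finish even or ahead?
random.seed(42)
TRIALS, GAMES = 20_000, 134

def season(hit_rate: float) -> int:
    """Correct picks in one simulated 134-game stretch."""
    return sum(random.random() < hit_rate for _ in range(GAMES))

flips = sum(season(0.60) >= season(0.64) for _ in range(TRIALS))
print(f"Worse picker finishes even or ahead: {flips / TRIALS:.0%}")
```

On numbers like these, the worse picker ends up even or ahead roughly a quarter of the time. A real four-point skill gap only shows up in the standings about three seasons out of four, which is a lot of room for noise to swamp analysis.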
Second, the AIs might be overcomplicating things. When I skim the injury reports and matchup data, I'm doing a kind of lossy compression—keeping the signal, ditching the noise. The AIs might be finding patterns that aren't actually there, fitting their models to statistical artifacts rather than real predictive factors.
Third, there's the possibility that my approach incorporates information the AIs don't have access to—the kind of contextual awareness that comes from casually following the league. I'm not deep into film study, but I do absorb the narrative: which teams are trending up, which coaches are on the hot seat, which units are clicking. That qualitative sense might carry more weight than I realized.
Or maybe—and this is the most boring explanation—I'm just having a lucky season.
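That boring explanation is at least roughly testable. Here's a minimal back-of-envelope sketch: it assumes, purely for illustration, that all four of us share the same underlying skill (the field's pooled accuracy through Week 9), then asks how far each actual record strays from that common baseline.

```python
import math

# Back-of-envelope check of the "lucky season" theory. Assumes (for
# illustration only) that every picker has identical underlying skill,
# equal to the field's pooled accuracy, and measures how far each
# record drifts from that baseline in standard deviations.
GAMES = 134
correct = {"Me": 90, "Claude": 84, "Perplexity": 83, "ChatGPT": 75}

p = sum(correct.values()) / (GAMES * len(correct))  # pooled accuracy, ~0.62
sigma = math.sqrt(GAMES * p * (1 - p))              # std dev of correct picks

print(f"Pooled accuracy: {p:.1%}")
print(f"One-sigma spread from luck alone: +/- {sigma:.1f} games "
      f"({sigma / GAMES:.1%})")

for name, wins in correct.items():
    z = (wins - GAMES * p) / sigma
    print(f"{name:>10}: {wins} correct, {z:+.1f} sigma from the pooled mean")
```

Every record lands within about 1.5 standard deviations of the pooled mean, which means the current standings are consistent with four equally skilled pickers and some dice rolls. It doesn't prove I'm just lucky, but it's why I'm not ordering a trophy yet.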
We're only halfway through, so there's plenty of time for the machines to mount a comeback. But right now, this experiment is reinforcing something I've suspected about AI more broadly: it's not magic. It's a tool, and like any tool, it has strengths and weaknesses. The hype suggests AI should dominate any task involving data and pattern recognition, but reality is messier.
I'll keep tracking this through the rest of the season. Maybe the AIs will adjust and surge ahead. Maybe my lead will evaporate as regression to the mean kicks in. Or maybe, just maybe, there's something about human judgment—flawed, biased, inconsistent as it is—that still has value in an uncertain world.
Check back in January for the final results. Place your bets accordingly.