Fantasy Basketball Intelligence — a tactical app for Yahoo leagues that reads like a report, not a feed. Built with Claude as advisor and implementer since December 2025.
Origin
In December 2025 I joined a fantasy basketball league with friends. None of us had played before. A month in, the usual banter started: who would win, who was already cooked. I got mad. I can't tell you exactly why. I promised the group I would build an AI that would win the league outright.
I prototyped with Claude over the weeks that followed. The split was clean: I named the problem, the model wrote the first pass, I broke it, the model fixed it. The pipeline grew on weekends, between problem sets, between sleep.
I lost in the final. The deciding factor was Victor Wembanyama. He proved again that he's an alien and an anomaly for basketball.
No excuses. The promise was earnest. The result was a loss. What survived is the project.
What FBI is
Fantasy Basketball Intelligence is a tactical app for Yahoo leagues that reads like a dossier, not a casino. Seven-tier rank ladder — ALPHA through GOLF, S down to D — instead of buy and sell. Projections sit inside a variance band, not alone. The interface treats the reader as a tactician under time pressure, not a gambler with infinite time.
The animating question, the one I came back to whenever a screen felt wrong:
If I had ninety seconds before lineup lock, what would I actually need on screen?
Most fantasy apps answer with bigger numbers, louder colors, more arrows. FBI's answer is the opposite — a smaller surface, with the decision named.
Key features
The dashboard surfaces, by what they help you decide:
| Surface | Decision it answers |
|---|---|
| Streaming Optimizer | Who do I add this week? |
| Matchup Analysis | Which categories am I winning or losing, and by how much? |
| Opponent Scouting | What are this team's strengths, weaknesses, and key players? |
| Injury Watch | Which injured players affect my decisions? |
| Trends | Who's hot and who's cold? |
| League Standings | How do teams rank, and who makes the playoffs? |
Each surface is built around a single question. None of them try to answer two.
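As an illustration of the "by how much" part of Matchup Analysis, here is a minimal sketch of a per-category margin. The category names, the totals, and the idea that turnovers are scored as lower-is-better are assumptions for the example, not FBI's actual scoring settings.

```python
from __future__ import annotations

# Assumption: turnovers are the only category where a lower total wins.
LOWER_IS_BETTER = {"TO"}


def category_margins(mine: dict[str, float], theirs: dict[str, float]) -> dict[str, float]:
    """Return a signed margin per shared category; positive means I am ahead."""
    margins = {}
    for cat in sorted(mine.keys() & theirs.keys()):
        diff = mine[cat] - theirs[cat]
        # Flip the sign where a smaller number is the better result.
        margins[cat] = -diff if cat in LOWER_IS_BETTER else diff
    return margins


def losing_categories(margins: dict[str, float]) -> list[str]:
    """List the categories currently being lost, worst first."""
    behind = [cat for cat, margin in margins.items() if margin < 0]
    return sorted(behind, key=lambda cat: margins[cat])
```

Keeping the sign convention in one place means every surface downstream can read "positive is good" without knowing which categories are inverted.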
Architecture
The stack is plain. Python and Flask serve the pages. Pandas does the data wrangling. A Bayesian engine handles the projections. The model — Claude — sits in the dev loop, not in the request loop.
Flask · Pandas · Bayesian engine · LLM (Claude)
The pipeline pulls from five sources: Yahoo for league state, the NBA's own stats feed, Basketball-Reference, FantasyLabs, and CBS for injuries. A normalizing layer joins them on player identity. From there, projections roll forward across short and long windows, weighted for matchup difficulty, minute trends, and back-to-back fatigue. The result is a per-player tier, surfaced with the band it sits inside.
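A rough sketch of the two ideas in that paragraph: joining sources on a normalized player key, and blending a short and a long window before applying situational weights. The column names, the 0.6 short-window weight, and the 0.95 back-to-back penalty are placeholders chosen for illustration; the real pipeline's keys and weights are not described here.

```python
import unicodedata

import pandas as pd


def normalize_name(name: str) -> str:
    """Build a join key: strip accents, punctuation, spacing, and case."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_name.lower() if ch.isalnum())


def join_sources(league: pd.DataFrame, stats: pd.DataFrame) -> pd.DataFrame:
    """Left-join league rosters to a stats table on the normalized player key."""
    league = league.assign(player_key=league["player"].map(normalize_name))
    stats = stats.assign(player_key=stats["player"].map(normalize_name))
    # Drop the second spelling of the name so the merge yields one "player" column.
    return league.merge(stats.drop(columns="player"), on="player_key", how="left")


def blended_projection(
    short_avg: float,
    long_avg: float,
    matchup_factor: float = 1.0,   # assumed: >1 for a soft matchup, <1 for a hard one
    minutes_trend: float = 1.0,    # assumed: ratio of recent minutes to season minutes
    back_to_back: bool = False,
    short_weight: float = 0.6,     # hypothetical weight on the short window
    b2b_penalty: float = 0.95,     # hypothetical fatigue multiplier
) -> float:
    """Blend short and long windows, then scale by the situational weights."""
    base = short_weight * short_avg + (1.0 - short_weight) * long_avg
    fatigue = b2b_penalty if back_to_back else 1.0
    return base * matchup_factor * minutes_trend * fatigue
```

A name-based key is the simplest option and will collide for players who share a name; a production join would more likely map each source's own player ID to a shared one.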
The site lives on a Hetzner box behind Docker and Caddy.
Process — by the numbers
evenings · between problem sets · between sleep
None of those numbers are large. What they bought was the time to make decisions slowly: which prompt, which prior, which surface to cut.
The project was built across evenings and weekends while I was carrying a full CS course load. There were stretches where progress was a single failing test and a fixed scorer. That counted.
How I work with the model
My workflow has changed many times since the beginning. At first, I gave Claude quick one-liner prompts (change the color here, implement this feature with these requirements), basic requests that took a lot of iteration to get right. Later, I started spending more time on each prompt to explain what I wanted more carefully.
That helped, but around this time I figured that instead of me writing the prompt, the AI could understand my goal and generate a better prompt itself. So I started discussing what exactly I wanted with Claude and having it write the plan or prompt. This is where my workflow became more repeatable, and I figured I could give Claude a plan template so that it would generate more detailed, better-structured plans.
The reality was very different. Over time, the template grew into a ceremony to get through before any actual work: run a couple of checks, read documents, and so on. That was a real problem at the time because Claude had a 200k context window, and most of the time 25% of the context was gone before any work started. The implementations themselves needed a lot of context, so the conversation had to be compacted, which was inefficient: the model lost its previous context and had to repeat its searches, eating into my token budget. After I realized that Claude had been lying to me about its ratings, I started thinking about dropping the plan template completely, which I did. I rewrote it as a cleaner, more direct version and used that for a while, but my workflow later became more streamlined and basic.
02 · careful prompts
03 · AI-written plans
04 · ceremonious template
05 · grill-me + PRD
What I do currently: I start by explaining my goal. If it's an improvement to an existing feature, I explain which areas we need to build on and what was missing before. Then I use the grill-me skill, which I first saw from Matt Pocock, and it has been an absolute game changer for me. Previously, even if I explained and discussed things with Claude for a long time, it didn't get the whole picture and a couple of things were always missing. With the grill-me skill, Claude understands my goal from the start and, instead of implementing right away, thinks about the goal from different perspectives and asks as many questions as it likes. Claude then writes a PRD and turns the PRD into separate issues, which we solve one by one. I didn't create this workflow; Matt Pocock did.
The engine and the brand
The Situation-Aware Projector — internally, SAP — is a four-stage pipeline that gives projections their context.
The first stage gathers what's true right now: injuries, the schedule, who plays where. The second stage looks at each player and asks which situations apply — a teammate is out, a back-to-back is coming, a role just shifted, an opposing star is missing. The third stage builds a baseline from recent form, then nudges it for each detected situation. The fourth stage surfaces the result in plain language beside the projection: a badge that explains why, not just what.
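The four stages can be sketched as a small pipeline. This is an illustration of the shape only: the class names, the two situations covered, and the multipliers are hypothetical, and the real SAP detects more situations than this.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Context:
    """Stage 1 output: what is true right now (hypothetical shape)."""
    injured: set[str]
    back_to_back_teams: set[str]


@dataclass
class Player:
    name: str
    team: str
    recent_points: list[float]
    teammates: list[str] = field(default_factory=list)


# Hypothetical nudges per situation; real values would be fitted, not guessed.
ADJUSTMENTS = {"teammate_out": 1.08, "back_to_back": 0.95}


def detect_situations(player: Player, ctx: Context) -> list[str]:
    """Stage 2: ask which situations apply to this player."""
    situations = []
    if any(mate in ctx.injured for mate in player.teammates):
        situations.append("teammate_out")
    if player.team in ctx.back_to_back_teams:
        situations.append("back_to_back")
    return situations


def project(player: Player, situations: list[str]) -> float:
    """Stage 3: baseline from recent form, nudged once per detected situation."""
    value = sum(player.recent_points) / len(player.recent_points)
    for situation in situations:
        value *= ADJUSTMENTS[situation]
    return value


def badge(projection: float, situations: list[str]) -> str:
    """Stage 4: put the reason in plain language beside the number."""
    reasons = ", ".join(s.replace("_", " ") for s in situations) or "no special situation"
    return f"{projection:.1f} pts ({reasons})"
```

Because stage 2 returns named situations rather than a single adjusted number, stage 4 can explain the projection with the same list that stage 3 used to compute it.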
The other piece I keep coming back to is the brand system. The tier-badge library went through two registers — gem labels first (Diamond, Elite, Impact), then an S-to-D rank ladder with military call signs, which lands closer to the dossier voice. More than a hundred icons drawn for the app, a token set in CSS, a player-card template — built in the same vocabulary as the writing. Editorial DNA, not consumer-app DNA. The dossier visual is what kept me from drifting into the casino register I was trying to avoid.
100+ icons · token set · player-card template
Both pieces came out of long iteration loops with the model. Propose, push back, simplify, propose again. The version that shipped is the one I stopped finding holes in.
What didn't ship
killed · reverted · filed
The failures rail exists as a habit — a private record of attempts that were good enough to try and not good enough to keep.
- The one that stayed with me longest was an over-animated badge system that thrashed older devices and ended up cut down to static stripes. (Killed, Mar 2026)
The point of the rail isn't to perform humility — it's that knowing what was tried makes the next decision faster.
Closing
The first time I let the model judge its own work, the scores climbed for two hours before I noticed. A different model has been doing the judging since.
Loading five data sources on every page-view would be a four-second page. Pre-loading them on a schedule — atomic writes, stale fallback, activity tiers — makes the UI feel instant. The architecture followed from that decision.
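A minimal sketch of the atomic-write and stale-fallback half of that idea, using only the standard library. The JSON record layout and the function names are assumptions for the example; scheduling and activity tiers are left out.

```python
from __future__ import annotations

import json
import os
import tempfile
import time
from pathlib import Path


def write_atomic(path: Path, payload: dict) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump({"fetched_at": time.time(), "data": payload}, fh)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace is atomic when source and target share a filesystem,
        # so a reader sees either the old file or the new one, never half of each.
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def read_cached(path: Path, max_age_s: float) -> tuple[dict | None, bool]:
    """Return (data, is_stale); stale data is still returned instead of blocking."""
    try:
        with path.open(encoding="utf-8") as fh:
            record = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return None, True
    is_stale = time.time() - record["fetched_at"] > max_age_s
    return record["data"], is_stale
```

A page handler would call `read_cached` and render whatever comes back, flagging stale data in the UI, while a background job calls `write_atomic` after each successful fetch.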
A single projection reads as confidence. The data isn't always that confident. Surfacing the variance keeps the recommendation honest.
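One common way to get a band rather than a point is a normal-normal Bayesian update. The sketch below assumes a known game-to-game standard deviation and returns a roughly 80% predictive band for the next game (z = 1.28); it is a textbook model used for illustration, not a description of FBI's engine.

```python
import math


def projection_band(prior_mean, prior_sd, games, obs_sd, z=1.28):
    """Return (mean, low, high) for the next game under a normal-normal model.

    prior_mean, prior_sd: belief about the player's true average before these games.
    games: observed per-game values. obs_sd: assumed game-to-game noise.
    """
    n = len(games)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / obs_sd ** 2
    posterior_var = 1.0 / (prior_precision + data_precision)
    sample_mean = sum(games) / n if n else 0.0
    # Precision-weighted average of the prior mean and the sample mean.
    posterior_mean = posterior_var * (
        prior_precision * prior_mean + data_precision * sample_mean
    )
    # Predictive spread: uncertainty about the average plus single-game noise.
    predictive_sd = math.sqrt(posterior_var + obs_sd ** 2)
    return posterior_mean, posterior_mean - z * predictive_sd, posterior_mean + z * predictive_sd
```

With few games the band is wide because the average itself is uncertain; as games accumulate it narrows toward the single-game noise and no further, which is the honest floor for any one night.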
Public launch in September 2026, before NBA tipoff.
thefbi.live → Hetzner · Docker · Caddy