Filed 23:47
Flagship · 269 commits · live · Section A · Lead story
FBI
Filed May 2026

Fantasy Basketball Intelligence — a tactical app for Yahoo leagues that reads like a report, not a feed. Built with Claude as advisor and implementer since December 2025.

Public September 2026
01

Origin

december 2025 · the promise

In December 2025 I joined a fantasy basketball league with friends. None of us had played before. A month in, the usual banter started: who would win, who was already cooked. I got mad. I can't tell you exactly why. I promised the group I would build an AI that would win the league outright.

I prototyped with Claude over the weeks that followed. The split was clean: I named the problem, the model wrote the first pass, I broke it, the model fixed it. The pipeline grew on weekends, between problem sets, between sleep.

I lost in the final. The deciding factor was Victor Wembanyama. He proved again that he's an alien and an anomaly for basketball.

No excuses. The promise was earnest. The result was a loss. What survived is the project.

02

What FBI is

tactical · not casino
Built for: the tactician, not the chaser

Fantasy Basketball Intelligence is a tactical app for Yahoo leagues that reads like a dossier, not a casino. Seven-tier rank ladder — ALPHA through GOLF, S down to D — instead of buy and sell. Projections sit inside a variance band, not alone. The interface treats the reader as a tactician under time pressure, not a gambler with infinite time.

The animating question, the one I came back to whenever a screen felt wrong:

If I had ninety seconds before lineup lock, what would I actually need on screen?

Most fantasy apps answer with bigger numbers, louder colors, more arrows. FBI's answer is the opposite — a smaller surface, with the decision named.
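
The variance band and the rank ladder described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not FBI's engine: the band here is a plain mean plus or minus one sample standard deviation (the app itself uses a Bayesian engine), and the equal-width percentile cutoffs and the names `Projection`, `project_with_band`, and `tier_for` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, stdev

# Tier names come from the rank ladder; the cutoffs below are invented.
TIERS = ["ALPHA", "BRAVO", "CHARLIE", "DELTA", "ECHO", "FOXTROT", "GOLF"]


@dataclass
class Projection:
    point: float  # the single number most apps would show alone
    low: float    # lower edge of the variance band
    high: float   # upper edge of the variance band


def project_with_band(recent_scores, width=1.0):
    """Return the mean of recent scores with a +/- width * stdev band."""
    center = mean(recent_scores)
    # A single game has no spread to measure, so the band collapses.
    spread = stdev(recent_scores) if len(recent_scores) > 1 else 0.0
    return Projection(center, center - width * spread, center + width * spread)


def tier_for(percentile):
    """Map a league percentile in [0, 1] to one of seven equal-width tiers."""
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must be between 0 and 1")
    index = min(int((1.0 - percentile) * len(TIERS)), len(TIERS) - 1)
    return TIERS[index]
```

Showing `low` and `high` next to `point` is the whole idea: two players with the same point projection can sit in very different bands.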

03

Key features

surfaces · by decision

The dashboard surfaces, by what they help you decide:

Surface              Decision it answers
Streaming Optimizer  Who do I add this week?
Matchup Analysis     Which categories am I winning or losing, and by how much?
Opponent Scouting    What are this team's strengths, weaknesses, and key players?
Injury Watch         Which injured players affect my decisions?
Trends               Who's hot and who's cold?
League Standings     How do teams rank, and who makes the playoffs?

Each surface is built around a single question. None of them try to answer two.

04

Architecture

stack · plain

The stack is plain. Python and Flask serve the pages. Pandas does the data wrangling. A Bayesian engine handles the projections. The model — Claude — sits in the dev loop, not in the request loop.

Stack: Python · Flask · Pandas · Bayesian engine · LLM (Claude)

The pipeline pulls from five sources: Yahoo for league state, the NBA's own stats feed, Basketball-Reference, FantasyLabs, and CBS for injuries. A normalizing layer joins them on player identity. From there, projections roll forward across short and long windows, weighted for matchup difficulty, minute trends, and back-to-back fatigue. The result is a per-player tier, surfaced with the band it sits inside.
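
The normalizing join and the window blend described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: the key scheme in `player_key`, the 0.6 short-window weight, and the 0.95 fatigue factor are invented, minute trends are left out, and plain dicts stand in for the Pandas frames the real pipeline uses so the example has no dependencies.

```python
import unicodedata

SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def player_key(name):
    """Normalize a display name into a join key (assumed scheme, not FBI's)."""
    # Decompose accented letters, then drop the non-ASCII combining marks.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    # Lowercase and remove punctuation such as periods and apostrophes.
    cleaned = "".join(ch for ch in ascii_name.lower() if ch.isalnum() or ch.isspace())
    # Drop generational suffixes that sources spell inconsistently.
    return " ".join(part for part in cleaned.split() if part not in SUFFIXES)


def join_sources(*sources):
    """Merge per-source {name: fields} dicts into one record per player key."""
    merged = {}
    for source in sources:
        for name, fields in source.items():
            merged.setdefault(player_key(name), {}).update(fields)
    return merged


def blended_projection(short_avg, long_avg, matchup=1.0, back_to_back=False,
                       short_weight=0.6, fatigue=0.95):
    """Blend short and long windows, then scale for fatigue and matchup."""
    base = short_weight * short_avg + (1.0 - short_weight) * long_avg
    if back_to_back:
        base *= fatigue  # hypothetical discount for the second night
    return base * matchup  # matchup > 1.0 means an easier opponent
```

Later sources overwrite earlier ones on conflicting fields, so the call order in `join_sources` doubles as a priority order.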

The site lives on a Hetzner box behind Docker and Caddy.

05

Process — by the numbers

solo · evenings
269 commits · 137 test modules · 5 data sources · 15+ live pages
Cadence: weekends · evenings · between problem sets · between sleep

None of those numbers are large. What they bought was the time to make decisions slowly: which prompt, which prior, which surface to cut.

The project was built across evenings and weekends while I was carrying a full CS course load. There were stretches where progress was a single failing test and a fixed scorer. That counted.

06

How I work with the model

workflow · evolution

My workflow has changed many times since the start. At first I gave Claude quick one-line prompts (change the color here, implement this feature with these requirements), basic requests that took a lot of iteration to get right. Later I started spending more time on the prompts, explaining what I wanted more carefully.

That helped, but around then I figured that instead of me writing the prompt, the model could work out my goal and write a better prompt itself. So I started discussing exactly what I wanted with Claude and having it write the plan or prompt. This is where my workflow became more repeatable, and I figured I could give Claude a plan template to generate more detailed, better-structured plans.

The reality was very different. Over time the template grew into a ceremony that ran before any actual work: a couple of checks, documents to read, and so on. That was a huge problem at the time because Claude had a 200k context window, and 25% of it was usually gone before any work started. The implementations themselves needed a lot of context, so the conversation had to be compacted, which was inefficient: the model lost its earlier context and had to repeat searches, eating into my token budget. After realizing that Claude had been lying to me about its ratings, I started thinking about dropping the plan template completely, which I did. I rewrote a cleaner, more direct version and used it for a while, but after that my workflow became more streamlined and basic.

Stages
01 · one-liners
02 · careful prompts
03 · AI-written plans
04 · ceremonious template
05 · grill-me + PRD

What I do now: I start by explaining my goal. If it's an improvement to an existing feature, I explain which areas we need to build on and what was missing before. Then I use the grill-me skill, which I picked up from Matt Pocock, and it has been an absolute game changer for me. Previously, even after long discussions, Claude didn't get the whole picture and a couple of things were always missing. With grill-me, Claude understands my goal from the start and, instead of implementing right away, considers the goal from different perspectives and asks as many questions as it likes. Then Claude writes a PRD and turns it into separate issues that we solve one by one. I didn't create this workflow; Matt Pocock did.

07

The engine and the brand

sap · brand system

The Situation-Aware Projector — internally, SAP — is a four-stage pipeline that gives projections their context.

The first stage gathers what's true right now: injuries, the schedule, who plays where. The second stage looks at each player and asks which situations apply — a teammate is out, a back-to-back is coming, a role just shifted, an opposing star is missing. The third stage builds a baseline from recent form, then nudges it for each detected situation. The fourth stage surfaces the result in plain language beside the projection: a badge that explains why, not just what.
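
The four stages above can be sketched as four small functions. This is a toy model under loud assumptions, not SAP itself: the situation names, the multipliers in `ADJUSTMENTS`, and the `Context` and `Player` shapes are all hypothetical, and only two of the situations are detected here.

```python
from dataclasses import dataclass, field

# Hypothetical situation multipliers; none of these values come from FBI.
ADJUSTMENTS = {
    "teammate_out": 1.10,
    "back_to_back": 0.95,
}


@dataclass
class Context:
    """Stage 1: what is true right now."""
    injured: set = field(default_factory=set)
    back_to_back_teams: set = field(default_factory=set)


@dataclass
class Player:
    name: str
    team: str
    teammates: tuple
    recent_scores: tuple


def detect_situations(player, ctx):
    """Stage 2: which situations apply to this player."""
    found = []
    if any(mate in ctx.injured for mate in player.teammates):
        found.append("teammate_out")
    if player.team in ctx.back_to_back_teams:
        found.append("back_to_back")
    return found


def adjust(player, situations):
    """Stage 3: baseline from recent form, nudged once per situation."""
    baseline = sum(player.recent_scores) / len(player.recent_scores)
    for situation in situations:
        baseline *= ADJUSTMENTS[situation]
    return baseline


def explain(situations):
    """Stage 4: a plain-language badge that says why, not just what."""
    if not situations:
        return "baseline"
    return ", ".join(s.replace("_", " ") for s in situations)


def project(player, ctx):
    situations = detect_situations(player, ctx)
    return adjust(player, situations), explain(situations)
```

Keeping detection separate from adjustment means the badge text and the number are derived from the same list of situations, so they cannot disagree.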

The other piece I keep coming back to is the brand system. The tier-badge library went through two registers — gem labels first (Diamond, Elite, Impact), then an S-to-D rank ladder with military call signs, which lands closer to the dossier voice. More than a hundred icons drawn for the app, a token set in CSS, a player-card template — built in the same vocabulary as the writing. Editorial DNA, not consumer-app DNA. The dossier visual is what kept me from drifting into the casino register I was trying to avoid.

The system: 7 tier badges · 100+ icons · token set · player-card template
Live · v3 · gem labels · phosphor inline: Star, Diamond, Amethyst, Ruby, Emerald, Gold, Silver
Next · v4 · rank ladder · bespoke svg: ALPHA, BRAVO, CHARLIE, DELTA, ECHO, FOXTROT, GOLF
fbi · brand sheet · gem labels → rank ladder

Both pieces came out of long iteration loops with the model. Propose, push back, simplify, propose again. The version that shipped is the one I stopped finding holes in.

08

What didn't ship

fallback · habit
The rail: closed · killed · reverted · filed

The failures rail exists as a habit — a private record of attempts that were good enough to try and not good enough to keep.

The point of the rail isn't to perform humility — it's that knowing what was tried makes the next decision faster.

09

Closing

lessons · next · live
01
Never let one model write and judge.

The first time I tried it, the scores climbed for two hours before I noticed. A different model has been doing the judging since.

02
Cache before the request, not during.

Loading five data sources on every page-view would be a four-second page. Pre-loading them on a schedule — atomic writes, stale fallback, activity tiers — makes the UI feel instant. The architecture followed from that decision.

03
Make uncertainty visible.

A single projection reads as confidence. The data isn't always that confident. Surfacing the variance keeps the recommendation honest.
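
The second lesson (cache before the request, with atomic writes and a stale fallback) can be sketched with the standard library alone. This is a minimal illustration, assuming a JSON file per source; the function names, the file layout, and the `saved_at` field are hypothetical, and activity tiers are not modeled.

```python
import json
import os
import tempfile
import time


def write_atomic(path, payload):
    """Write JSON to a temp file, then swap it into place in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump({"saved_at": time.time(), "data": payload}, handle)
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_cached(path, max_age_seconds):
    """Return (data, is_stale) from disk; the request path never fetches."""
    try:
        with open(path) as handle:
            entry = json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        return None, True
    age = time.time() - entry["saved_at"]
    return entry["data"], age > max_age_seconds


def refresh(path, fetch):
    """Scheduled job: if the fetch fails, the previous file stays in place."""
    try:
        write_atomic(path, fetch())
        return True
    except Exception:
        return False
```

A page handler would call only `read_cached` and could show a "stale" marker when the flag is set, while a scheduler calls `refresh` for each source on its own interval.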

Public launch in September 2026, before NBA tipoff.

thefbi.live → Hetzner · Docker · Caddy