FBI — auapps

Flagship · 269 commits · live · Section A · Lead story

FBI

Filed May 2026

Fantasy Basketball Intelligence — a tactical app for Yahoo leagues that reads like a report, not a feed. Built with Claude as advisor and implementer since December 2025.

Public September 2026

Origin

december 2025 · the promise

In December 2025 I joined a fantasy basketball league with friends. None of us had played before. A month in, the usual banter started — who would win, who was already cooked. I got mad. I can't tell you exactly why. I promised the group I would build an AI that would win the league unanimously.

I prototyped with Claude over the weeks that followed. The split was clean: I named the problem, the model wrote the first pass, I broke it, the model fixed it. The pipeline grew on weekends, between problem sets, between sleep.

I lost in the final. The deciding factor was Victor Wembanyama. He proved again that he's an alien and an anomaly for basketball.

No excuses. The promise was earnest. The result was a loss. What survived is the project.

What FBI is

tactical · not casino

Built for: the tactician, not the chaser

Fantasy Basketball Intelligence is a tactical app for Yahoo leagues that reads like a dossier, not a casino. Seven-tier rank ladder — ALPHA through GOLF, S down to D — instead of buy and sell. Projections sit inside a variance band, not alone. The interface treats the reader as a tactician under time pressure, not a gambler with infinite time.

The animating question, the one I came back to whenever a screen felt wrong:

If I had ninety seconds before lineup lock, what would I actually need on screen?

Most fantasy apps answer with bigger numbers, louder colors, more arrows. FBI's answer is the opposite — a smaller surface, with the decision named.

Grade	Call sign	Rank
S	ALPHA	Top General
A	BRAVO	Brigadier General
B+	CHARLIE	Captain
B	DELTA	Lieutenant
C+	ECHO	Staff Sergeant
C	FOXTROT	Sergeant
D	GOLF	Corporal
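The ladder above is small enough to hold as a lookup table. Here is a minimal Python sketch, assuming each player already has a percentile score from 0 to 100. The cutoffs in `FLOORS` and the function name `tier_for` are hypothetical, chosen only for illustration; they are not FBI's actual thresholds.

```python
from bisect import bisect_right

# (grade, call sign, rank) copied from the ladder, best tier first.
TIERS = [
    ("S", "ALPHA", "Top General"),
    ("A", "BRAVO", "Brigadier General"),
    ("B+", "CHARLIE", "Captain"),
    ("B", "DELTA", "Lieutenant"),
    ("C+", "ECHO", "Staff Sergeant"),
    ("C", "FOXTROT", "Sergeant"),
    ("D", "GOLF", "Corporal"),
]

# Hypothetical percentile floors, ascending. Clearing none of them lands in GOLF,
# clearing all six lands in ALPHA.
FLOORS = [20, 40, 55, 70, 85, 95]


def tier_for(percentile: float) -> tuple[str, str, str]:
    """Map a 0-100 percentile score to its (grade, call sign, rank) tier."""
    floors_cleared = bisect_right(FLOORS, percentile)  # 0..6
    return TIERS[len(TIERS) - 1 - floors_cleared]


grade, call_sign, rank = tier_for(60)
print(f"{grade} {call_sign} ({rank})")
```

Keeping the floors in one sorted list means a tier boundary can move without touching the lookup logic.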

Key features

surfaces · by decision

The dashboard surfaces, by what they help you decide:

Surface	Decision it answers
Streaming Optimizer	Who do I add this week?
Matchup Analysis	Which categories am I winning or losing, and by how much?
Opponent Scouting	What are this team's strengths, weaknesses, and key players?
Injury Watch	Which injured players affect my decisions?
Trends	Who's hot and who's cold?
League Standings	How do teams rank, and who makes the playoffs?

Each surface is built around a single question. None of them try to answer two.

Architecture

stack · plain

The stack is plain. Python and Flask serve the pages. Pandas does the data wrangling. A Bayesian engine handles the projections. The model — Claude — sits in the dev loop, not in the request loop.

Stack: Python · Flask · Pandas · Bayesian engine · LLM (Claude)

The pipeline pulls from five sources: Yahoo for league state, the NBA's own stats feed, Basketball-Reference, FantasyLabs, and CBS for injuries. A normalizing layer joins them on player identity. From there, projections roll forward across short and long windows, weighted for matchup difficulty, minute trends, and back-to-back fatigue. The result is a per-player tier, surfaced with the band it sits inside.
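The join on player identity is the step most likely to drop rows silently, because each feed can spell the same name differently. The sketch below shows one way such a normalizing layer could look in Pandas. The function names `normalize_name` and `join_sources`, the column names, and the sample players are all hypothetical; the real layer may key on source-specific IDs instead of names.

```python
import unicodedata

import pandas as pd

# Suffixes dropped so "Jr." variants match across feeds (illustrative list).
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def normalize_name(name: str) -> str:
    """Reduce a display name to a join key: no accents, no punctuation, lowercase."""
    # Decompose accented letters and drop the combining marks ("é" -> "e").
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep letters and whitespace only, then lowercase.
    cleaned = "".join(ch for ch in no_accents if ch.isalpha() or ch.isspace()).lower()
    return " ".join(part for part in cleaned.split() if part not in SUFFIXES)


def join_sources(league: pd.DataFrame, stats: pd.DataFrame) -> pd.DataFrame:
    """Left-join stats onto the league roster using the normalized name as key."""
    league = league.assign(player_key=league["player"].map(normalize_name))
    stats = stats.assign(player_key=stats["player"].map(normalize_name))
    # indicator=True adds a "_merge" column, so unmatched players are visible
    # instead of quietly carrying NaN stats.
    return league.merge(
        stats.drop(columns=["player"]), on="player_key", how="left", indicator=True
    )


# Made-up players and numbers, for illustration only.
league = pd.DataFrame({"player": ["José Álvarez Jr.", "Sam O'Neil"], "slot": ["PG", "C"]})
stats = pd.DataFrame({"player": ["Jose Alvarez"], "minutes": [31.5]})
merged = join_sources(league, stats)
print(merged[["player", "minutes", "_merge"]])
```

Rows marked `left_only` in `_merge` are players the stats feed did not match, which makes a useful health check before projections run.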

The site lives on a Hetzner box behind Docker and Caddy.

[Figure: fbi · data flow · 5 sources → bayesian engine → tier. Sources (Yahoo, NBA, B-Ref, FantasyLabs, CBS) → Pipeline (Pandas: id-norm · merge) → Cache (multi-mode) → Engine (Bayesian) → Output (Tier)]

Process — by the numbers

solo · evenings

269 commits · 137 test modules · 5 data sources · 15+ live pages

Cadence: weekends · evenings · between problem sets · between sleep

None of those numbers are large. What they bought was the time to make decisions slowly: which prompt, which prior, which surface to cut.

The project was built across evenings and weekends while I was carrying a full CS course load. There were stretches where progress was a single failing test and a fixed scorer. That counted.

How I work with the model

workflow · evolution

My workflow has changed many times since the beginning. At first I gave Claude quick one-line prompts (change the color here, implement this feature with these requirements), basic requests that took a lot of iteration to get right. Later I started spending more time on each prompt to explain what I wanted more carefully.

That helped, but around this time I figured that instead of writing the prompt myself, I could let the model understand my goal and generate a better prompt. So I started discussing exactly what I wanted with Claude and having it write the plan or prompt. This is where my workflow became more repeatable, and I figured I could give Claude a plan template so it would generate more detailed, better-structured plans.

The reality was very different. Over time the template grew into a ceremony that ran before any actual work: a couple of checks, documents to read, and so on. That was a huge problem at the time because Claude had a 200k context window, and 25% of it was usually gone before any work started. The implementations themselves needed a lot of context, so the conversation had to be compacted, which was inefficient: the model lost its earlier context and had to repeat searches, eating into my token budget.

Then I realized Claude had been lying to me about its ratings. It was both implementing the work and reviewing its own process, so it kept handing itself elevated scores. That pushed me to drop the plan template completely. I rewrote a cleaner, more direct version and used it for a while, and after that my workflow became more streamlined and basic.

Stages 01 · one-liners
02 · careful prompts
03 · AI-written plans
04 · ceremonious template
05 · grill-me + PRD

Currently I start by explaining my goal. If it's an improvement to an existing feature, I explain which areas we need to build on and what was missing before. Then I use the grill-me skill, which I picked up from Matt Pocock, and it has been an absolute game changer for me. Previously, even after a long discussion, Claude didn't get the whole picture and a couple of things were always missing. With grill-me, Claude understands my goal from the start: instead of jumping into implementation, it considers the goal from different perspectives and asks as many questions as it needs. Then Claude writes a PRD and turns it into separate issues that we solve one by one. This workflow isn't my invention; it's Matt Pocock's.

The engine and the brand

sap · brand system

The Situation-Aware Projector — internally, SAP — is a four-stage pipeline that gives projections their context.

Context (gathers facts) → Detection (spots situations) → Projection (builds + nudges) → Display (names the why)

The first stage gathers what's true right now: injuries, the schedule, who plays where. The second stage looks at each player and asks which situations apply — a teammate is out, a back-to-back is coming, a role just shifted, an opposing star is missing. The third stage builds a baseline from recent form, then nudges it for each detected situation. The fourth stage surfaces the result in plain language beside the projection: a badge that explains why, not just what.
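The four stages described above can be sketched as four small pieces of Python. This is an illustration under assumptions, not the SAP source: the class and field names, the two situations covered, and the multipliers in `NUDGES` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Stage 1: facts that are true right now (field names are hypothetical)."""
    injured: set[str] = field(default_factory=set)
    back_to_back_teams: set[str] = field(default_factory=set)


@dataclass
class Player:
    name: str
    team: str
    recent_points: list[float]
    teammates: list[str]


# Hypothetical multipliers, chosen only to make the sketch concrete.
NUDGES = {"teammate_out": 1.08, "back_to_back": 0.95}


def detect(player: Player, ctx: Context) -> list[str]:
    """Stage 2: which situations apply to this player?"""
    situations = []
    if any(mate in ctx.injured for mate in player.teammates):
        situations.append("teammate_out")
    if player.team in ctx.back_to_back_teams:
        situations.append("back_to_back")
    return situations


def project(player: Player, situations: list[str]) -> float:
    """Stage 3: baseline from recent form, then one nudge per detected situation."""
    value = sum(player.recent_points) / len(player.recent_points)
    for situation in situations:
        value *= NUDGES[situation]
    return value


def display(player: Player, value: float, situations: list[str]) -> str:
    """Stage 4: the projection with a plain-language reason beside it."""
    why = ", ".join(s.replace("_", " ") for s in situations) or "no situation detected"
    return f"{player.name}: {value:.1f} pts ({why})"


# Made-up player and context, for illustration only.
ctx = Context(injured={"Player B"}, back_to_back_teams={"AAA"})
player = Player("Player A", "AAA", [20.0, 22.0, 18.0], ["Player B"])
found = detect(player, ctx)
print(display(player, project(player, found), found))
```

Because detection returns names rather than numbers, the display stage can reuse the same list to explain the projection, which is what keeps the "why" in sync with the "what".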

The other piece I keep coming back to is the brand system. The tier-badge library went through two registers — gem labels first (Diamond, Elite, Impact), then an S-to-D rank ladder with military call signs, which lands closer to the dossier voice. More than a hundred icons drawn for the app, a token set in CSS, a player-card template — built in the same vocabulary as the writing. Editorial DNA, not consumer-app DNA. The dossier visual is what kept me from drifting into the casino register I was trying to avoid.

The system: 7 tier badges · 100+ icons · token set · player-card template

[Figure: fbi · brand sheet · gem labels → rank ladder. Live, v3 (gem labels, Phosphor inline): Star, Diamond, Amethyst, Ruby, Emerald, Gold, Silver. Next, v4 (rank ladder, bespoke SVG): ALPHA, BRAVO, CHARLIE, DELTA, ECHO, FOXTROT, GOLF]

Both pieces came out of long iteration loops with the model. Propose, push back, simplify, propose again. The version that shipped is the one I stopped finding holes in.

What didn't ship

fallback · habit

The rail closed
killed
reverted
filed

The failures rail exists as a habit — a private record of attempts that were good enough to try and not good enough to keep.

The one that stayed with me longest was an over-animated badge system that thrashed older devices and ended up cut down to static stripes. (Mar 2026 · Killed)

The point of the rail isn't to perform humility — it's that knowing what was tried makes the next decision faster.

Closing

lessons · next · live

Never let one model write and judge.

The first time I tried it, the scores climbed for two hours before I noticed. A different model has been doing the judging since.

Cache before the request, not during.

Loading five data sources on every page-view would be a four-second page. Pre-loading them on a schedule — atomic writes, stale fallback, activity tiers — makes the UI feel instant. The architecture followed from that decision.
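Two of the ideas named here, atomic writes and stale fallback, fit in a short standard-library sketch. It assumes a JSON file cache written by a scheduled job and read by the request handler. The function names `write_atomic` and `read_cached` and the file layout are hypothetical, and activity tiers are not covered.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Optional


def write_atomic(path: Path, payload: dict) -> None:
    """Write JSON so a reader sees the old file or the new one, never a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump({"saved_at": time.time(), "data": payload}, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)  # the rename is atomic on POSIX
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def read_cached(path: Path, max_age_s: float) -> tuple[Optional[dict], bool]:
    """Return (data, is_stale). Stale data is still returned as a fallback."""
    try:
        with path.open(encoding="utf-8") as fh:
            entry = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return None, True
    is_stale = (time.time() - entry["saved_at"]) > max_age_s
    return entry["data"], is_stale
```

In this arrangement the request path only ever calls `read_cached`, so a slow or failing upstream source costs freshness rather than page latency.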

Make uncertainty visible.

A single projection reads as confidence. The data isn't always that confident. Surfacing the variance keeps the recommendation honest.
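One way to attach a band to a projection is a conjugate normal update: start from a prior, fold in recent games, and report the predictive spread alongside the center. The source says only that the engine is Bayesian, so the model below, the function name `posterior_band`, and every number in the example are assumptions made for illustration.

```python
import math


def posterior_band(
    prior_mean: float,
    prior_var: float,
    games: list[float],
    obs_var: float,
    z: float = 1.0,
) -> tuple[float, float, float]:
    """Normal prior plus normal likelihood with known per-game variance.

    Returns (low, center, high) for a single future game.
    """
    n = len(games)
    # Precisions (inverse variances) add under the conjugate normal model.
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    post_mean = post_var * (prior_mean / prior_var + sum(games) / obs_var)
    # Predictive spread: uncertainty about the mean plus game-to-game noise.
    pred_sd = math.sqrt(post_var + obs_var)
    return post_mean - z * pred_sd, post_mean, post_mean + z * pred_sd


# Made-up numbers: an 18-point prior and three recent games.
low, center, high = posterior_band(18.0, 9.0, [22.0, 26.0, 24.0], obs_var=36.0)
print(f"{center:.1f} pts (band {low:.1f} to {high:.1f})")
```

The center lands between the prior and the recent average, and the band stays wide when per-game variance is high, which is the information a lone number hides.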

Public launch in September 2026, before NBA tipoff.

thefbi.live → Hetzner · Docker · Caddy