I Asked an AI to Hack My Own App. Here's What It Found.

At some point during the build, I had a thought that most solo developers either ignore or suppress: what if someone actually tried to break this?

BangOn has real users. Real accounts. Real data. It's not a portfolio project that lives on a subdomain and gets visited by nobody. People log in, they play with their friends, they care about their scores. If something went wrong — if someone could manipulate the leaderboard, or access another user's account, or do something I hadn't thought of — that would be genuinely bad.

So I stopped building features for a day and asked Claude to audit the codebase instead. Not "have a quick look", but a proper adversarial review. Pretend you're a senior security engineer who's been hired to find problems. Find them.

It found things.

THE FINDINGS

I'm not going to list every item because some of them are dull. But three stood out.

The first was timing attacks on authentication tokens. When you're comparing a token someone has sent you (a password reset link, an email confirmation code) against the one you stored, the naive approach is an ordinary string equality check. The problem is that an ordinary comparison usually short-circuits: it stops as soon as it finds a mismatch, so a guess with more correct leading characters takes fractionally longer to reject. A clever attacker, measuring response times with enough precision over many requests, can use that to work out the token one character at a time. It's a real attack. It has a real fix: timing-safe comparison, which takes the same amount of time regardless of where the mismatch is. We weren't using it. We are now.
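
To make that concrete, here is a minimal sketch of a timing-safe check. It is written in Python purely for illustration; it is not BangOn's actual code, and the function names are hypothetical. It relies on hmac.compare_digest from the standard library, which is designed not to short-circuit (Node has an equivalent in crypto.timingSafeEqual). Hashing both sides first is an extra assumption on my part: it keeps the compared values the same length and means only a digest needs to be stored.

```python
import hashlib
import hmac
import secrets


def hash_token(token: str) -> bytes:
    """Hash a token so the comparison always runs over fixed-length digests."""
    return hashlib.sha256(token.encode("utf-8")).digest()


def tokens_match(submitted: str, stored_digest: bytes) -> bool:
    """Compare without short-circuiting on the first mismatched byte."""
    return hmac.compare_digest(hash_token(submitted), stored_digest)


# Issuing a token: send `token` to the user, keep only `stored` server-side.
token = secrets.token_urlsafe(32)
stored = hash_token(token)
```

When the user clicks the link, the handler calls tokens_match(submitted_value, stored) instead of using == on the raw strings.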

The second was Row Level Security gaps. Supabase makes RLS easy to set up, which is good. It also makes it easy to think you've set it up correctly when you haven't, which is less good. There were a couple of tables where the policies were subtly wrong — not "anyone can read everything" wrong, but "someone who knew what they were doing could query data they shouldn't have access to" wrong. Fixed.

The third was more interesting. There was a path — obscure, but real — where a user could potentially escalate their own privileges to admin level through a specific sequence of API calls. Not easily. Not obviously. But the logic was there if you looked for it. That one got fixed the same afternoon and I felt slightly sick about it until it was deployed.
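
For illustration only (this is not the actual BangOn bug, and every name here is hypothetical): a common shape for this class of problem is an update endpoint that writes whatever fields the client sends, including ones like a role. A minimal Python sketch of the usual defence, an explicit allowlist of fields a user may change about themselves:

```python
# Hypothetical field names; the point is that "role" is deliberately absent.
ALLOWED_PROFILE_FIELDS = {"display_name", "avatar_url"}


def sanitize_profile_update(payload: dict) -> dict:
    """Keep only the fields a user is allowed to edit on their own profile."""
    return {
        key: value
        for key, value in payload.items()
        if key in ALLOWED_PROFILE_FIELDS
    }


# A request that tries to smuggle in a privilege change loses that field.
cleaned = sanitize_profile_update({"display_name": "Sam", "role": "admin"})
```

The design choice is allowlist over blocklist: a field added to the table later stays unwritable by default until someone decides otherwise.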

THE PROCESS

What struck me about the audit wasn't the findings themselves. It was how the conversation had to work to surface them.

If you just ask "is this secure?", you get "yes, here are some best practices". That's not useful. You have to ask the right questions. What are the attack surfaces? What does an adversary with read access to the codebase do first? What assumptions is this code making about the caller that could be violated?

That reframing — from "check this is fine" to "try to break this" — is what produced real findings rather than a checklist of things I'd already done.

I've started doing this regularly now. Before any significant change ships, I ask: what could go wrong here, specifically? What would a hostile user try? The AI is very good at this once you point it in the right direction. It has presumably seen more security writeups than any one person could read, and it doesn't get bored or assume something is probably fine.

WHAT THIS HAS TO DO WITH VIBE CODING

There's a version of vibe coding that's just vibing. Ship fast, move on, trust that nothing bad will happen. That version ends badly.

The thing is, the AI will build whatever you ask it to build. It will build it quickly and it will build it confidently. It will not, unless you ask, stop halfway through and say "actually, have you considered the security implications of this design decision?" You have to ask. You have to create the space for that conversation.

The discipline isn't in the code. It's in knowing what questions to ask and when to ask them.

BangOn is a better app for having gone through this. Not because anything catastrophic was found, but because I now know what was checked and what the answers were. That's worth a day of not shipping features.

Next post: the scoring overhaul. Why "lower is better" is a terrible idea for a game that's supposed to be fun, and how we fixed it without breaking everyone's existing scores.