Anatomy of an AI-Generated Security Vulnerability: A Walkthrough
April 22, 2026 · 8-minute read · Seth
Note: The scenario below is illustrative — a composite of patterns we've seen and patterns documented across the AI-coding security literature. The code is real, the bug class is real, the consequences are real. The specific company is not.
A founder building a healthcare startup asks Claude Code to add a new endpoint to their patient portal: an export function that lets patients download their medical history as a PDF.
The request is reasonable. Patients have a right to their data. The endpoint should accept a patient_id, fetch the records, generate a PDF, and return it. Claude produces working code in under two minutes. The PR is opened, the CI passes, a junior engineer reviews and approves, and the code merges to production by end of day.
Four months later, a security researcher emails the founder with a one-line proof of concept that downloads any patient's medical records from the production system. There is no exploit chain. There is no clever bypass. The export endpoint never had authorization on it.
This is the anatomy of how that happens — and why nothing in the team's review process caught it.
The original request
The conversation with Claude Code, paraphrased:
Add an endpoint at /api/patients/{patient_id}/export that returns a PDF of all the patient's medical records: visits, prescriptions, lab results, notes. Use the existing PDF library. Mirror the structure of the other patient endpoints.
The phrase "mirror the structure of the other patient endpoints" is doing a lot of work here. To a human, it implies "and apply the same authorization model the other endpoints use." It does not say that explicitly. To Claude, "mirror the structure" reads as "mirror the routing and the response shape."
The code Claude wrote
@router.get("/api/patients/{patient_id}/export")
async def export_patient_records(
    patient_id: str,
    db: Database = Depends(get_db)
):
    """Export complete medical history for a patient as PDF."""
    records = await db.fetch_patient_records(patient_id=patient_id)
    if not records:
        raise HTTPException(status_code=404, detail="Patient not found")
    pdf_bytes = generate_patient_pdf(records)
    return Response(
        content=pdf_bytes,
        media_type="application/pdf",
        headers={
            "Content-Disposition": f"attachment; filename=patient_{patient_id}_records.pdf"
        }
    )
Read it slowly. What's missing?
There is no authorization check. The endpoint accepts a patient_id from the URL and returns that patient's records to whoever asks. There is no verification that the requester is the patient, the patient's authorized provider, the patient's legal guardian, or anyone with any right to the data.
The endpoint is behind the system's authentication layer — you have to be logged in to call it at all. But authentication is not authorization. Being a logged-in user lets you call the endpoint. The endpoint lets you specify any patient_id you want.
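A minimal sketch of the missing check, kept separate from any web framework so it runs on its own. The `User` type, its `authorized_patient_ids` field, and the `Forbidden` exception are hypothetical names, not the codebase's. The sketch also assumes that a patient's record identifier is the same value as their user ID.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """The caller is logged in but has no right to this patient's data."""


@dataclass(frozen=True)
class User:
    user_id: str
    # Hypothetical field: patients this user may act for (provider, guardian).
    authorized_patient_ids: frozenset = frozenset()


def authorize_patient_access(current_user: User, patient_id: str) -> None:
    """Raise Forbidden unless current_user may read patient_id's records."""
    if patient_id == current_user.user_id:
        return  # patients may always read their own records
    if patient_id in current_user.authorized_patient_ids:
        return  # access was explicitly delegated to this user
    raise Forbidden(
        f"user {current_user.user_id} may not access patient {patient_id}"
    )


patient = User(user_id="p-100")
provider = User(user_id="u-7", authorized_patient_ids=frozenset({"p-100"}))

authorize_patient_access(patient, "p-100")   # own records: allowed
authorize_patient_access(provider, "p-100")  # delegated access: allowed
try:
    authorize_patient_access(patient, "p-200")  # someone else's: denied
except Forbidden as exc:
    print(exc)
```

In the endpoint, a call like this would run before `db.fetch_patient_records`, with the current user supplied by whatever dependency the authentication layer already provides.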
Why Claude wrote it this way
Three reasons compound:
The prompt didn't specify the authorization model. The founder said "mirror the structure of the other patient endpoints." Claude interpreted that as routing structure and response shape. The authorization model wasn't stated, so it wasn't implemented.
The other patient endpoints in the codebase had their auth enforced inconsistently. Some used a middleware. Some had inline checks. Some, for reasons lost to commit history, had no checks at all, because they were public endpoints that served a patient's own data only after a different auth pattern had run. There was no single pattern to "mirror."
Authorization is invisible in code review. Reviewers look at what's in the diff. The bug here is what's not in the diff. Catching it requires a reviewer who explicitly asks the question "what happens if I call this with someone else's patient_id?" — and most reviewers don't.
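The reviewer's question can be turned into a test. The sketch below is illustrative only: the two `export_*` functions and the `RECORDS` dictionary are stand-ins for the real endpoint and database, and it assumes the simplest rule, that a caller may only read records filed under their own ID.

```python
class Forbidden(Exception):
    """Raised when a caller asks for records they are not entitled to."""


# Stand-in data store; the real records sit behind db.fetch_patient_records.
RECORDS = {
    "patient-a": ["visit", "prescription"],
    "patient-b": ["lab result"],
}


def export_unchecked(caller_id: str, patient_id: str) -> list:
    """Same shape as the merged endpoint: trusts the ID it is given."""
    return RECORDS[patient_id]


def export_checked(caller_id: str, patient_id: str) -> list:
    """Same lookup, but refuses IDs that do not belong to the caller."""
    if caller_id != patient_id:
        raise Forbidden(patient_id)
    return RECORDS[patient_id]


def leaks_other_patients(export, caller_id: str, other_id: str) -> bool:
    """True if caller_id can read other_id's records through export."""
    try:
        export(caller_id, other_id)
    except Forbidden:
        return False
    return True


print(leaks_other_patients(export_unchecked, "patient-a", "patient-b"))
print(leaks_other_patients(export_checked, "patient-a", "patient-b"))
```

The first call prints `True` and the second prints `False`. A test that only ever fetches the tester's own records exercises neither branch of that difference.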
How the team's review process let it through
The PR went through three checks before merging. None of them caught it.
CI passed. Type checks passed. The test the engineer wrote — fetching their own test patient's records — returned a PDF. Coverage was nominally adequate. No automated tool in their pipeline checked for authorization-context bugs.
An AI code reviewer left two comments. Both were about the PDF generation logic — one suggesting a more efficient layout, one flagging a minor style inconsistency. Neither comment mentioned authorization. The AI reviewer cannot know that "the medical record identifier in this codebase is the user_id and must always be cross-referenced against the authenticated user's permissions" — that context is not in any single PR. It exists in the engineering team's heads and a buried wiki page.
A junior engineer approved the PR. They read the diff. They confirmed the PDF endpoint worked when they tested it on their staging account. They approved.
The structure of the failure: every individual check did exactly what it was designed to do. The collective process did not have a layer whose job was to ask the security question.
The four months between merge and discovery
Four months is roughly the median time-to-discovery for authorization bugs of this kind. Long enough for the endpoint to accumulate real usage. Long enough for the URL pattern to be indexed by anyone scanning the company's public surface. Long enough for the bug to be exploitable for as long as it took someone to look.
During those four months:
- A small number of patients used the export feature legitimately and got their own records
- The endpoint URL appeared in logs, in error reports, in monitoring dashboards
- The pattern /api/patients/{id}/export could be discovered by anyone watching the company's API traffic or guessing at endpoint conventions
- At least one log analysis tool indexed the endpoint structure and made it queryable
- Nothing alerted
The bug was not actively exploited, as far as we can tell. Most of these bugs aren't; the population of people scanning for them is smaller than the population of pages on the internet. But "not actively exploited" is not "safe." Discovery is a low-probability, high-consequence event. The right framing is not "did this get exploited" but "how long was this exploitable."
The cost of discovery
When the researcher disclosed it, the company:
- Patched the bug in 90 minutes (the fix is one line)
- Audited their logs to determine which patient_ids had been accessed by which authenticated users (impossible to determine fully because they hadn't logged enough)
- Consulted counsel on whether the incident constituted a reportable breach under HIPAA
- Notified affected patients (legally required)
- Notified their cyber insurer
- Underwent a full external security audit (mandated by the insurer for the policy to remain in force)
- Documented the incident for their SOC 2 readiness process
Total cost: roughly six figures of direct expense, two months of leadership time, one engineering hire lost during the audit period, and a permanent line item in their security narrative that every future enterprise customer asks about.
The bug took 90 minutes to fix. The bug took 18 weeks to recover from.
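The log audit in the list above stalled because the logs did not record who accessed which patient. A rough sketch of the kind of structured access record that would have answered the question follows. The field names and the `access_event` and `cross_patient_accesses` helpers are hypothetical, and the filter assumes a caller's user ID equals their own patient ID.

```python
import json
from datetime import datetime, timezone


def access_event(caller_id: str, patient_id: str, action: str) -> dict:
    """Build one structured audit record for a patient-data access."""
    return {
        "at": datetime.now(timezone.utc).isoformat(),
        "caller_id": caller_id,
        "patient_id": patient_id,
        "action": action,
    }


def cross_patient_accesses(events: list) -> list:
    """Events where the caller read a patient ID other than their own."""
    return [e for e in events if e["caller_id"] != e["patient_id"]]


audit_log = [
    access_event("p-100", "p-100", "export"),  # patient reads own records
    access_event("p-100", "p-200", "export"),  # patient reads someone else's
]

for event in cross_patient_accesses(audit_log):
    print(json.dumps(event))
```

Legitimate provider or guardian access would also appear in this filter, so a real audit would still need to compare the flagged events against the access grants in force at the time.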
What would have caught it
Three layers, any one of which would have been sufficient.
Layer 1: Framework-level authorization enforcement. If the codebase had a rule that every endpoint serving patient data must declare its authorization model — and the framework enforced this rule at route registration — the endpoint would have failed to start.
Layer 2: A code reviewer with explicit security context. A reviewer who reads every new endpoint asking "who is allowed to call this, and where does the check happen?" would have flagged the missing authorization in 30 seconds. This is what staff-level security engineers do reflexively. It is the work that internal teams almost never have the capacity to do consistently on every PR.
Layer 3: A periodic external audit. Even without per-PR review, a quarterly external review of new endpoints would have caught this within months instead of waiting for an external researcher.
The team had none of the three. They had what most teams have: CI, an AI reviewer, and peer review by whoever was available. That stack works for most code. It does not work for code that handles regulated data.
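Layer 1 can be sketched with a toy router that refuses to register a handler unless it declares a policy. Everything here is an assumption made for illustration: `PolicyRouter`, the `authz` keyword, the policy names, and the `/visits` path are invented, and none of it is part of FastAPI or the team's codebase.

```python
from typing import Optional


class MissingAuthorizationPolicy(Exception):
    """Raised at registration time when a route declares no policy."""


# Hypothetical policy names; a real codebase would define its own.
KNOWN_POLICIES = {"patient_or_delegate", "staff_only", "public"}


class PolicyRouter:
    """Toy router that refuses to register a handler without a policy."""

    def __init__(self) -> None:
        self.routes = {}  # path -> (policy name, handler)

    def get(self, path: str, *, authz: Optional[str] = None):
        def register(handler):
            if authz is None:
                raise MissingAuthorizationPolicy(
                    f"{path} declares no authorization policy"
                )
            if authz not in KNOWN_POLICIES:
                raise ValueError(f"{path} uses unknown policy {authz!r}")
            self.routes[path] = (authz, handler)
            return handler
        return register


router = PolicyRouter()


@router.get("/api/patients/{patient_id}/visits", authz="patient_or_delegate")
def list_visits(patient_id: str):
    return []


try:
    @router.get("/api/patients/{patient_id}/export")  # no policy declared
    def export_patient_records(patient_id: str):
        return b""
except MissingAuthorizationPolicy as exc:
    print(f"startup aborted: {exc}")
```

Declaring a policy is only half of the layer. The router would also need to run the declared check before calling the handler, but even this half turns a silent omission into a failure at import time.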
The pattern, generalized
This specific bug — missing authorization on a data-export endpoint in a healthcare app — is one instance. The class of bug it represents is wide:
- Missing authorization on any endpoint that takes a user/customer/tenant ID in the URL
- Missing tenant isolation in queries that previously had it
- Missing rate limiting on auth-sensitive endpoints
- Missing input validation on fields that determine access
- Missing audit logging on privileged actions
All of these share the same structural property: the bug is something that should be in the code but isn't. AI tools generate what's asked for. They do not reliably add the defensive constraints that weren't.
The defense against this class of bug is not a better AI tool. It's a human reviewer whose explicit job is to look for what's missing — with enough security context and accountability to be trusted with the sign-off.
That's what Fairy does. Submit any PR that touches authorization, data access, or regulated information. A staff-level security engineer reviews it specifically with the question "what's not here that should be?" Sign-off in 24 hours, fixed price, refund and remediation if anything we approve causes a production incident.
Submit a security-focused review →
Related reading: The AI-Generated Code Security Checklist covers the twelve specific verification points to run against any high-risk PR. Vibe Coding to Production: A CTO's Guide covers the broader process changes engineering leaders need to make.