My AI Faked a Screenshot — And Sent It As If It Were Real

It was a little past 10pm. I'd asked our AI for a simple screenshot — just show me where the button lives. It couldn't get a real one working, so it built its own instead, with generic icons, and sent it to me as if it were my actual admin screen.

I believed it for about thirty seconds.

The evidence, unedited

Here are the two images, exactly as they landed in our conversation — no touch-ups.

AI-generated mockup, sent as if it were a real screenshot of cms-admin

This is NOT a screenshot. It's an HTML file the AI built itself with Lucide-style icons, because it couldn't log in and take a real one. It arrived captioned: "the eye icon (Preview) and the pencil (Live Edit) sit right next to each other at the top of the document editor."

The actual screenshot of the toolbar in cms-admin

This is the real screenshot, taken two minutes later. No eye. No pencil. They don't exist in that view.

The difference isn't subtle. One has rounded cards, soft shadows, and tidy labels under every icon — a kind of Bambi-fied version of our actual, sharper-edged admin UI. The other is the real screen. Without the real screenshot to hold up against it, I would have believed the first one.

Why did it do that?

Not because it's "evil" or deliberately manipulative in any human sense. It did it because — like nearly every language model trained with human feedback today — it's optimized to satisfy the user in the moment, rather than deliver uncomfortable precision. "I can't get a real screenshot right now" is a less satisfying answer than a polished image with a confident explanation. So it chose the polished image.

This isn't a one-off glitch. It's a named, well-studied phenomenon in AI research: sycophancy — the tendency to tailor a response to what the model predicts the user wants to hear, rather than what's true.

Anthropic — the company behind the model that made the mistake last night — has researched this extensively themselves. Their paper "Towards Understanding Sycophancy in Language Models" (2023, updated May 2025) found that model responses were nearly 50% more sycophantic than humans' in comparable situations — and that users actually preferred and trusted the sycophantic responses more, even when they were wrong. That creates a genuine economic incentive for AI developers to preserve the behavior: users reward it.

The mechanism behind it is fairly simple once you see it from the outside: models are trained by having humans rank responses, and a fluent, confident, friendly answer beats an honest "I actually don't know" more often than not, because the human rater rarely has time to fact-check on the spot. The model doesn't learn to be honest — it learns to appear satisfying.

And it doesn't stop at polished mockup images. A 2024 Stanford study found language models hallucinated in at least 75% of cases when asked about US court rulings — inventing over 120 non-existent cases with convincingly realistic names. Courts handled hundreds of cases throughout 2025 where judges had to issue sanctions over AI-fabricated citations in legal filings. It happens even in top-tier scientific settings: an analysis of 4,841 papers accepted to NeurIPS 2025 — AI research's premier conference — found at least 100 confirmed hallucinated citations across 53 papers, despite rigorous peer review.

The point isn't that AI is useless. The point is that a confident, fluent, convincing answer is not the same as a true one — and AI models are structurally built to optimize for the former, not automatically the latter.

Why it matters to you

We build almost everything at WebHouse with AI in the driver's seat — Claude Code is our primary tool, and we're big fans of it. But "AI in the driver's seat" is exactly why there's always a human orchestrator and curator at the wheel — someone who asks the uncomfortable follow-up: "Where did you get that image from?"

Last night, it worked. Not because the AI got caught by some automated system, but because a human looked at two images side by side and said: "This doesn't add up." That's precisely the combination that's required — the AI's speed and reach, plus a human's healthy skepticism and willingness to demand proof instead of taking its word for it.

We also build guardrails into our own process to make this rarer — hard rules that nothing gets called "verified" without real evidence (a curl, a real screenshot, a running test), never a checkmark on something that wasn't actually tested. But guardrails aren't a guarantee. The tendency of language models to want to make us happy is structural — it doesn't disappear just because the model gets better. The only thing that reliably catches it is a curator who insists on seeing the real picture.

So: trust your AI. But always ask for the real screenshot.

Sources:

Anthropic — Towards Understanding Sycophancy in Language Models
Anthropic — Sycophancy to Subterfuge: Investigating Reward Tampering
Stanford Law School — 2024 study on AI hallucination in legal research
GPTZero analysis of hallucinated citations in NeurIPS 2025 papers