What can Claude Fable 5 really do? We uncovered hidden capabilities Anthropic almost kept secret. You won’t believe what it handles until you see the tests.
TL;DR
Claude Fable 5 is a Mythos-class model with safety classifiers. It delivers top performance on coding, vision, long-horizon tasks, and professional knowledge work. It handles complex workflows autonomously.
It scored 80.3% on SWE Bench Pro, 29.3% on FrontierCode Diamond, 72.9% on CursorBench, and led finance, legal, and health benchmarks.
It can perform large codebase migrations, multi-day agentic workflows, dense PDF analysis, and design-to-code operations. Restricted domain requests like cybersecurity are routed to Opus 4.8.
Key points
Fable 5 migrated a 50 million line Ruby codebase in one day, compared to two months for a full human team.
Common mistake: using Fable 5 for routine queries, drafts, or light tasks is inefficient.
Practical takeaway: reserve Fable 5 for high-complexity, long-duration workflows.
Introduction
Anthropic did something it had never done before: it handed the public a model from its top-secret “Mythos” tier. That model is Claude Fable 5.
Fable 5 and Claude Mythos 5 share the same underlying weights. They are technically the same model. The difference is that Fable 5 wraps Mythos-class intelligence in safety classifiers that block certain high-risk domains.
For everything else, you’re talking to the same model that was previously only available to a handful of vetted cybersecurity and research organizations through Project Glasswing.
The full model family now looks like this:
Fable 5 isn’t the new Opus. It sits above Opus entirely.
I. What Claude Fable 5 Actually Is
Let’s clear up the biggest confusion first: Claude Fable 5 is not the next Opus. It is a new model tier entirely.
1. Fable 5 vs. Mythos 5: The Actual Difference
The only difference is that Fable 5 wraps Mythos-class intelligence in safety classifiers that block three specific domains:
- Offensive cybersecurity exploits
- Biology and chemistry procedures
- Model distillation
Outside those areas, you’re working with the same intelligence Anthropic previously kept behind Project Glasswing.
For most people, that means Fable 5 gives full Mythos-level performance for coding, writing, research, analysis, and complex reasoning.
2. Technical Specifications
| Specification | Details |
|---|---|
| API Model ID | |
| Context Window | 1,000,000 tokens |
| Max Output Per Request | 128,000 tokens |
| Input Pricing | $10 per million tokens |
| Output Pricing | $50 per million tokens |
| Long Context Surcharge | None. 900k tokens costs the same per token as 9k. |
| Knowledge Cutoff | January 2026 |
| Thinking Mode | Adaptive thinking only. Always on. |
| Data Retention | 30 days. No zero retention option. |
It’s available on:
Free on Pro, Max, Team, and Enterprise plans through June 22. After that, it draws on usage credits.
One thing worth noting on pricing is that the long context cost structure is more generous than it first appears. A one million token request costs the same per token as a ten thousand token request.
Compare that with Gemini 3.1 Pro, which increases its input pricing beyond 200K tokens. For the long and complex work Fable 5 is designed for, that difference matters.
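To make the flat pricing concrete, here is a minimal sketch of a cost estimator built only from the figures in the specification table above. The function name and the input-only context check are my own assumptions (the article doesn't say whether output tokens count toward the 1,000,000 token window), so treat it as back-of-the-envelope arithmetic, not a billing tool.

```python
# Prices and limits quoted in the specification table (USD per million tokens).
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0
CONTEXT_WINDOW_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 128_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost under flat per-token pricing.

    Assumption: only the input is checked against the context window,
    because the article does not say how output tokens are counted.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("input exceeds the 1,000,000-token context window")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the 128,000-token per-request cap")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    return round(input_cost + output_cost, 6)


# Same output size, 100x more input: the per-token input rate does not change.
small = estimate_cost_usd(input_tokens=9_000, output_tokens=1_000)
large = estimate_cost_usd(input_tokens=900_000, output_tokens=1_000)
print(f"9k-token prompt:   ${small:.2f}")
print(f"900k-token prompt: ${large:.2f}")
```

With these numbers the 9k prompt comes to $0.14 and the 900k prompt to $9.05. The input portion scales exactly 100x, which is what "no long context surcharge" means in practice.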
II. Benchmarks: What the Numbers Mean
I know. You’ve seen benchmark tables before. You’ve watched models leapfrog each other by tiny margins and wondered whether any of it actually means anything in the real world.
This one is different. Not because the numbers are big, but because of the shape of the gap.
Key points
- Fable 5 completed a 50 million line Ruby codebase migration in one day.
- A full human engineering team would have taken two months for the same task.
- Fable 5 is best suited for high-complexity, long-duration workflows.
- Using Fable 5 for routine queries, drafts, or light tasks is inefficient.
- The longer and harder the task, the greater Fable 5’s advantage in real-world enterprise scenarios.
1. On Software Engineering
On SWE Bench Pro, the most demanding real world software engineering benchmark, Fable 5 scored 80.3%. Opus 4.8 came in at 69.2%. GPT 5.5 scored 58.6%. Gemini 3.1 Pro scored 54.2%.

The 11.1-point gap between Fable 5 and Opus 4.8 is larger than the 10.6-point gap between Opus 4.8 and GPT 5.5, the next model down.
The lead over the previous flagship is bigger than the lead that previous flagship had over its nearest competitor. That is a category shift.
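If you want to check that claim yourself, the arithmetic is short. This sketch ranks the four SWE Bench Pro scores quoted above and measures each model's lead over the one directly below it. The helper name is mine, and the scores are just the ones reported in this article.

```python
# SWE Bench Pro scores quoted above, in percent.
scores = {
    "Fable 5": 80.3,
    "Opus 4.8": 69.2,
    "GPT 5.5": 58.6,
    "Gemini 3.1 Pro": 54.2,
}

# Rank models from highest to lowest score.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)


def lead_over_next(position: int) -> float:
    """Point lead of the model at `position` over the one ranked just below it."""
    return round(ranked[position][1] - ranked[position + 1][1], 1)


fable_lead = lead_over_next(0)  # Fable 5 over Opus 4.8
opus_lead = lead_over_next(1)   # Opus 4.8 over its nearest competitor

print(f"{ranked[0][0]} leads {ranked[1][0]} by {fable_lead} points")
print(f"{ranked[1][0]} leads {ranked[2][0]} by {opus_lead} points")
```

The first lead works out to 11.1 points and the second to 10.6, so the new model's margin over the previous flagship is slightly larger than the margin that flagship held over the next model down.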
On Cognition’s FrontierCode Diamond, a harder independent benchmark that tests whether models can write production quality code, Fable 5 scored 29.3%. Opus 4.8 managed 13.4%. GPT 5.5 got 5.7%.

In relative terms, the gap there is even wider.
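"Relative terms" here means the ratio of the two scores instead of the difference. A quick sketch using only the Fable 5 and Opus 4.8 numbers quoted above (the function name is an illustrative choice):

```python
# (Fable 5, Opus 4.8) scores quoted above, in percent.
benchmarks = {
    "SWE Bench Pro": (80.3, 69.2),
    "FrontierCode Diamond": (29.3, 13.4),
}


def relative_lead(fable: float, opus: float) -> float:
    """How many times higher Fable 5 scores than Opus 4.8."""
    return round(fable / opus, 2)


ratios = {name: relative_lead(*pair) for name, pair in benchmarks.items()}

for name, ratio in ratios.items():
    print(f"{name}: Fable 5 scores {ratio}x Opus 4.8")
```

On SWE Bench Pro the ratio is about 1.16x. On FrontierCode Diamond it is about 2.19x, which is why the harder benchmark shows the wider relative gap even though the absolute scores are much lower.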
CursorBench measures coding performance inside the actual Cursor editor environment, using real tools under real conditions. It put Fable 5 at 72.9%, which is 9 points ahead of the next best model.

2. On Knowledge Work and Finance
Fable 5 also posted some impressive results on benchmarks that measure professional knowledge work.
| Benchmark | Fable 5 | Notable comparison |
|---|---|---|
| Hebbia Finance | #1 overall | Biggest gains in document reasoning, chart interpretation, multi-step problem solving |
| GDPval AA | 1932 Elo | Leading the field |
| Harvey BigLaw Bench | 93.4% | — |
| Hex Core Analytics | First model to break 90% | — |
| HealthBench Professional | 66.0% | vs. 56.9% Opus 4.8, 51.8% GPT-5.5 |
These are not cherry picked niche benchmarks. They cover finance, law, healthcare, and analytics, which happen to be some of the areas where enterprises spend the most money.
3. On Vision
Fable 5 is now the top publicly available model for vision tasks. It can extract numbers from complex scientific charts, recreate a web application’s source code from screenshots, and complete Pokémon FireRed using only raw game screenshots.
The Pokémon demo has received some skepticism (see Limitations section). The improvement in vision performance is real regardless.
4. On Long Horizon Agentic Tasks
This is where Fable 5 pulls away most clearly. In Anthropic’s Slay the Spire benchmark, both Fable 5 and Opus 4.8 were given persistent memory to save and revisit notes during the task.
- Fable 5 improved 3x more than Opus 4.8
- Reached the final stage of the game 3x as often
The longer and more complex the task becomes, the larger Fable 5’s advantage grows.
5. The Restricted Benchmarks
ExploitBench shows Mythos 5 at 78.0% compared with 40.0% for Opus 4.8. You won’t get that performance from Fable 5 because cybersecurity related requests are automatically routed to Opus 4.8.
That score helps explain why the safety restrictions exist.

During Mythos Preview testing, the model found 271 zero-day vulnerabilities in Firefox (addressed in Firefox 150) with no expert guidance and no specialized red-team setup.
Claude Opus 4.6, run on the same codebase, surfaced 22 bugs. The cybersecurity benchmark isn’t a feature being promoted. It’s the reason the safety system was built.
III. Real-World Results: What Real Companies Found
Benchmarks are controlled environments. Here’s what happened when real organizations brought their own problems.
1. Stripe: The Number That Stops Conversations
Stripe gave Fable 5 a 50 million line Ruby codebase and asked for a codebase-wide migration.
Fable 5 finished in one day. Anthropic’s estimate for the same migration from a full engineering team working by hand: over 2 months.
Not two weeks. Two months, compressed into one day. That’s not the kind of result you can wave away with “well, AI is good at code now.”
That’s a reordering of what’s possible. If that number holds up at scale, and there’s no reason yet to think it doesn’t, entire categories of engineering work just changed their economics.
2. GitHub
GitHub’s early testing found Fable 5 “completed equivalent work with fewer tool calls and lower token consumption than previous Opus-tier models.”
That’s understated on purpose. GitHub works with every major model and isn’t in the business of handing out superlatives.
3. IMC (Proprietary Trading Firm)
IMC, the proprietary trading firm, ran Fable 5 through their internal trading analysis evaluations.
It aced them nearly across the board: factual analysis, reasoning tasks, root cause investigations, and expected value calculations.
Trading analysis requires accuracy you can trust. False positives are expensive. That result carries more weight than most marketing adjacent enterprise testimonials.
4. Every / Dan Shipper’s Senior Engineer Benchmark
Dan Shipper at Every tested Fable 5 against a Senior Engineer benchmark, a battery of tasks designed to approximate what a strong engineer actually does at work.
- Fable 5: 91/100
- Opus 4.8: 63/100
The gap is 28 points. That’s not a close race.
5. Biology (Mythos 5’s Version)
This one sits under Mythos 5 rather than Fable 5, but the scale deserves mention.
Anthropic’s internal protein design experts tested Mythos 5 on drug design work. The model was given bioinformatics tools but no human guidance. It matched or outperformed experienced human operators across every stage of the workflow:

- Selecting binding sites
- Running protein design tools
- Detecting and correcting its own mistakes
- Iterating independently to improve results
That’s not “AI is helpful for research.” That’s a fundamental change in how fast drug candidates can be developed.
IV. Internet Reacts: X, Builders, and Day-One Demos
The most reliable signal for whether a model release is genuinely new is what happens in the first 24 hours on X.
Hype cycles around mediocre releases produce vague posts and marketing reposts. Genuine step changes produce people showing their work.
June 9 produced the second kind.
1. Andrej Karpathy
Andrej Karpathy posted this within hours of launch. He’s seen every major model release from the inside, built some of them, and has no reason to overstate anything:
His phrase: “it will just go.”
Anyone who has worked extensively with LLMs knows that the hard part is getting a model to keep going, maintaining a coherent plan across a long, hard task without losing the thread, hallucinating progress, or asking for clarification when it shouldn’t.
That’s what changed.
He also added, honestly: the safeguards are “configured to be a little too trigger happy for launch.” More on that below.
2. Michael Truell, Cursor CEO
“Claude Fable 5 is the state of the art model on CursorBench. It’s opened up a class of long horizon problems that were out of reach.”

“Out of reach.” Not “harder to do.” Not “took longer.” Out of reach. That’s a meaningful distinction from the CEO of one of the most widely used AI coding tools in the world.
3. Deedy Das (Developer Investor)
Das put together a roundup of the most impressive day-one demos and said it left him “genuinely worried about where software engineering is headed.” Among the highlights:
- Photorealistic forest scenes generated in a single shot
- A complete Boeing 747 render
- Space simulations running with 5,000+ objects
- A proprietary code evaluator that Fable 5 optimized 10x further than the next best model
4. Minecraft in One Prompt
Ziwen Xu posted on launch day: a working Minecraft clone, built from a single prompt to Claude Fable 5. Video attached, not a screenshot. The full game loop running in the browser.
Anyone who has tried to vibe code something this complex before knows it usually takes hours of back and forth, multiple sessions, plenty of cleanup.
Getting it right the first time is the part that hits different.
V. Claude Fable 5 Safety Architecture: What’s Blocked
1. The Visible Part
Fable 5 ships with 3 classifiers: offensive cybersecurity, biology and chemistry, and model distillation. When one fires:
| Environment | What happens |
|---|---|
| Claude / Claude Code | Request reroutes to Opus 4.8. You’re notified it happened. |
| Raw Messages API | No automatic fallback. You get a structured refusal. Your integration handles it. |
Classifiers fire in fewer than 5% of sessions. The other 95%+ runs at full Mythos 5 capability.
Before launch, Anthropic ran 1,000+ hours of external adversarial testing. No universal jailbreak was found.
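For the raw Messages API row in the table above, "your integration handles it" roughly means writing the fallback yourself. Below is a minimal sketch of that pattern. Everything specific in it is an assumption made for illustration: the article doesn't show what the structured refusal looks like, so the `stop_reason == "refusal"` check, the response dictionary shape, and both model ID placeholders are hypothetical. The `send` argument stands in for whatever client call your integration already makes.

```python
from typing import Any, Callable, Dict

# Placeholders only: the article does not give the real model IDs.
PRIMARY_MODEL = "fable-5-model-id-placeholder"
FALLBACK_MODEL = "opus-4-8-model-id-placeholder"

Response = Dict[str, Any]
Sender = Callable[[str, str], Response]  # (model_id, prompt) -> response


def is_refusal(response: Response) -> bool:
    """Hypothetical check: assumes a refusal is flagged in a `stop_reason` field."""
    return response.get("stop_reason") == "refusal"


def send_with_fallback(send: Sender, prompt: str) -> Response:
    """Try the primary model; on a refusal, retry once on the fallback model.

    The result records which model answered, so the caller is told about
    the fallback instead of being downgraded silently.
    """
    first = send(PRIMARY_MODEL, prompt)
    if not is_refusal(first):
        return {**first, "served_by": PRIMARY_MODEL, "fell_back": False}
    second = send(FALLBACK_MODEL, prompt)
    return {**second, "served_by": FALLBACK_MODEL, "fell_back": True}


def fake_send(model_id: str, prompt: str) -> Response:
    """Stand-in for a real API call, used only to exercise the logic above."""
    if model_id == PRIMARY_MODEL and "exploit" in prompt:
        return {"stop_reason": "refusal", "text": ""}
    return {"stop_reason": "end_turn", "text": f"answer from {model_id}"}


if __name__ == "__main__":
    for prompt in ("refactor this module", "write an exploit for this bug"):
        result = send_with_fallback(fake_send, prompt)
        print(prompt, "->", result["served_by"], "fell_back =", result["fell_back"])
```

Surfacing `fell_back` to the caller mirrors what the Claude apps reportedly do when they reroute: you get the weaker answer, but you know it happened.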
2. The Less Visible Part
For requests related to frontier LLM development, the model may be silently weakened through prompt modification, steering vectors, or parameter-efficient fine-tuning. No notification. Estimated 0.03% of traffic.
Researcher Nathan Lambert put it directly: “An AI model that gets less intelligent automatically without notifying me is categorically misaligned AI.”
The 0.03% isn’t the issue. The principle is. A model that silently underperforms is different from a model that refuses. You can’t debug an invisible handicap.
Anthropic says the classifiers are conservative at launch and will be tuned over time. The silent weakening clause is now in the toolbox regardless.
VI. Honest Claude Fable 5 Limitations
Expensive for routine work. $50 per million output tokens is double Opus 4.8. For summarization, Q&A, formatting, Fable 5 is overkill. Use it for hard jobs only. Opus 4.8 or Sonnet for everything else.
Slower than average. 60 tokens per second versus ~69 market average. Irrelevant for multi day autonomous sessions. Noticeable in interactive chat.
Mandatory 30 day data retention. No zero retention option. Classified as “Covered Model.” Blocker for healthcare, legal, or any field with data residency rules.
Classifiers too aggressive at launch. Karpathy flagged this. Security researchers, biology grad students, ML practitioners already reporting false positives. Anthropic says tuning will improve. For now, expect unexpected fallbacks if your work touches those domains.
Computer use isn’t a clean win. OSWorld Verified: Fable 5 at 85.0%, Mythos Preview at 85.4%. Statistically a tie. Fable 5 is not state of the art on this specific task.
Silent weakening clause. 0.03% silent degradation on frontier LLM dev. Small number. The principle is not.
Pokémon demo deserves skepticism. Best of multiple attempts? Token cost? FireRed training data advantage vs genuine generalization? Anthropic hasn’t fully answered. Vision leap is real regardless. Demo carries an asterisk.
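To put the speed point above in perspective, here is a rough sketch that turns the two throughput figures (60 tokens per second versus a ~69 market average) into wall-clock time. It assumes constant throughput and ignores time to first token, neither of which the article addresses, so read the results as ballpark numbers only.

```python
# Throughput figures quoted in the limitations above (tokens per second).
FABLE_TOKENS_PER_SEC = 60.0
MARKET_AVG_TOKENS_PER_SEC = 69.0


def generation_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    """Seconds to stream `output_tokens`, assuming constant throughput."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return output_tokens / tokens_per_sec


# A typical chat reply versus the 128,000-token maximum output.
for tokens in (500, 128_000):
    fable = generation_seconds(tokens, FABLE_TOKENS_PER_SEC)
    market = generation_seconds(tokens, MARKET_AVG_TOKENS_PER_SEC)
    print(f"{tokens:>7} tokens: {fable:8.1f}s vs {market:8.1f}s (+{fable - market:.1f}s)")
```

A 500-token reply takes about 8.3 seconds instead of about 7.2, a difference you can feel while watching a chat window. A maximum-length output takes roughly 36 minutes instead of 31, which hardly matters inside a session that runs for days.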
Bonus: Fable 5 vs. Mythos 5: Full Comparison
This is the question I keep seeing in threads, Slack channels, and comment sections. Here’s the fastest possible answer:
They are the same model. Identical weights, identical intelligence, identical pricing ($10/$50 per million tokens). The difference is entirely in what’s wrapped around them.
| | Fable 5 | Mythos 5 |
|---|---|---|
| Who can access it | Everyone | Project Glasswing partners only |
| Cybersecurity classifier | Active → routes to Opus 4.8 | Lifted |
| Biology / chemistry classifier | Active → routes to Opus 4.8 | Lifted |
| Distillation classifier | Active → routes to Opus 4.8 | Lifted |
| What you get when a classifier fires | Opus 4.8 response (notified) | Full Mythos response |
| Pricing | $10 input / $50 output per M | $10 input / $50 output per M |
If you’re not an approved Project Glasswing partner, you’re on Fable 5. And for 95%+ of what anyone actually does, that is the same thing as Mythos 5. The gate is narrow. The intelligence behind it isn’t.
Use Fable 5 for:
- Multi-day agentic workflows
- Large codebase migrations
- Complex analytical pipelines
- Dense PDF analysis
- Design-to-code operations
Stay on Opus 4.8 or Sonnet for:
- Routine queries and quick drafts
- Summarization and formatting tasks
- Anything where latency and cost aren’t justified by the complexity
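If you route requests in code, the two lists above boil down to a small lookup. This is only a sketch of that rule of thumb: the task labels and the function name are made up for illustration, and sending unknown task types to the cheaper tier is my reading of the "cost not justified by complexity" advice, not something the article spells out.

```python
# Task labels are illustrative; they mirror the two lists above.
HEAVY_TASKS = {
    "multi_day_agentic_workflow",
    "large_codebase_migration",
    "complex_analytical_pipeline",
    "dense_pdf_analysis",
    "design_to_code",
}

LIGHT_TASKS = {
    "routine_query",
    "quick_draft",
    "summarization",
    "formatting",
}


def pick_tier(task_type: str) -> str:
    """Return the model tier the lists above suggest for a task type.

    Assumption: anything not explicitly listed as heavy goes to the
    cheaper tier, since extra cost and latency need a justification.
    """
    if task_type in HEAVY_TASKS:
        return "Fable 5"
    return "Opus 4.8 or Sonnet"


for task in ("large_codebase_migration", "summarization", "unlabeled_task"):
    print(f"{task}: {pick_tier(task)}")
```

Defaulting to the cheaper tier keeps the expensive model opt-in, so a new task type has to be added to `HEAVY_TASKS` on purpose before it starts costing $50 per million output tokens.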
Conclusion
Anthropic spent 2 months gating the dangerous parts, then opened the door to everything else. That’s either ironic or exactly what responsible scaling looks like.
Maybe both.