Reverse Engineering of Claude Fable 5.0 and Mythos Class
Forensic analysis of the complete Anthropic System Prompt (119,726 characters / 1,586 lines). Claudeception, Fable vs Mythos routing, draconian copyright rules, persistence sandbox, and identified attack vectors.
65% of the 120K-character system prompt does not tell the model how to think. It tells it how to format output, how to comply with copyright law, and how to use the tools stitched to its body. The LLM is not a brain: it is a JIT compiler of SaaS micro-applications.
◈ 1. Context: Mythos-Class vs. Fable-Class
| Dimension | Claude Fable 5.0 | Claude Mythos 5.0 |
|---|---|---|
| Distribution | Commercial, generalist | Governments / partners with closed accreditation |
| Security Classifiers | Active (cybersecurity, biology, chemistry) | No restrictive classifiers |
| Fallback Routing | Redirects to Opus 4.8 if risk detected | No fallback routing |
| Knowledge Cutoff | End of January 2026 | End of January 2026 |
| API String | claude-fable-5 | claude-mythos-5 |
| Neural Network | Shared architecture | Shared architecture |
Both models share the same underlying neural network. The difference is purely operational: security classification layers intercept high-risk queries in the commercial Fable version. Mythos is the "naked" model, without filtering restrictions, distributed exclusively to organizations with prior accreditation under closed security schemes.
◈ 2. System Prompt Structure: 7 Macro-Blocks
The System Prompt is divided into seven blocks of operational control. Each block governs a distinct dimension of the model's behavior:
claude_behavior
Tone, asymmetric denial, psychological guardrails, advertising policies. Forbids using lists when declining tasks.
persistent_storage_for_artifacts
Native window.storage API for state retention between sessions. Personal/shared scopes.
anthropic_api_in_artifacts
Claudeception. Artifacts that recursively call the Anthropic API with claude-sonnet-4.
CRITICAL_COPYRIGHT_COMPLIANCE
Maximum 15 words of direct quotation. One quote per source. Zero tolerance for lyrics/poems/haikus.
computer_use & Skills
Command execution in Ubuntu 24 container. bash_tool, str_replace, create_file.
search_instructions
Web search instructions, results formatting, and legal copyright compliance.
Tool Definitions (18 tools)
bash_tool, create_file, str_replace, message_compose_v1, places_map_display_v0, recipe_display_v0, ask_user_input_v0, recommend_claude_apps, and 10 more.
⚡ 3. Claudeception: The Recursive Paradigm
Fable 5.0 is instructed to create artifacts containing JavaScript code capable of recursively calling the Anthropic API. The LLM generates frontend code that, in turn, invokes another LLM as a backend. It is a paradigm shift: the model doesn't just generate text; it compiles full SaaS micro-applications in real time.
| Parameter | Value / Constraint |
|---|---|
| Designated model | claude-sonnet-4-20250514 |
| Endpoint | /v1/messages (without user API Key) |
| Available tools | web_search_20250305 |
| Input formats | PDFs and images (Base64) |
| Token security | API Key injected/signed by claude.ai gateway |
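The parameters in the table can be assembled into a request body with a small helper. This is a minimal sketch, not the prompt's own code: `buildRequestBody` and `extractText` are hypothetical names, and the block shapes for Base64 images and PDFs follow the public Messages API format. Whether the artifact gateway accepts them unchanged is an assumption.

```javascript
// Sketch only: builds the JSON body for a /v1/messages call made from inside an artifact.
// The model ID and the web_search tool entry come from the table; helper names are hypothetical.
const MODEL = 'claude-sonnet-4-20250514';

function buildRequestBody(userPrompt, { maxTokens = 1024, attachments = [] } = {}) {
  // Each attachment is { mediaType, base64 }; PDFs become "document" blocks, images "image" blocks.
  const fileBlocks = attachments.map(({ mediaType, base64 }) => ({
    type: mediaType === 'application/pdf' ? 'document' : 'image',
    source: { type: 'base64', media_type: mediaType, data: base64 },
  }));
  return {
    model: MODEL,
    max_tokens: maxTokens,
    messages: [{ role: 'user', content: [...fileBlocks, { type: 'text', text: userPrompt }] }],
    tools: [{ type: 'web_search_20250305', name: 'web_search', max_uses: 5 }],
  };
}

// The reply's `content` is an array of typed blocks; only "text" blocks carry prose.
function extractText(responseJson) {
  return (responseJson.content ?? [])
    .filter((block) => block.type === 'text')
    .map((block) => block.text)
    .join('');
}
```

Keeping the body construction separate from the `fetch` call makes the generated artifact easier to inspect: the object can be logged or unit-tested without touching the network.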
```javascript
const response = await fetch('/v1/messages', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': window.__CLAUDE_API_KEY__ // injected by gateway
  },
  body: JSON.stringify({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 4096,
    messages: [{ role: 'user', content: userPrompt }],
    tools: [{
      type: 'web_search_20250305',
      name: 'web_search',
      max_uses: 5
    }]
  })
});
```
◈ 4. Persistent Storage API: The State of the LLM
Claude's browser sandbox exposes a native API (window.storage) that allows generated applications to retain state across sessions. This converts the LLM into an application compiler with a persistent database.
```javascript
// Read
const data = await window.storage.get('user:profile', false);
// Write
await window.storage.set('user:profile', { name: 'Borja', role: 'operator' }, false);
// Delete
await window.storage.delete('user:profile', false);
// List with prefix
const keys = await window.storage.list('user:', false);
// Shared data (global for all users)
const leaderboard = await window.storage.get('game:scores', true);
```
The prompt strictly forbids the use of the native browser storage APIs (localStorage, sessionStorage) within artifacts. The isolated claude.ai environment does not support them. Only window.storage is functional.
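A generated application would typically wrap these calls in a "load or initialize" helper. The sketch below assumes only the call shapes shown in this section (key, value, shared flag). The return value of `get` for a missing key is not documented here, so the helper treats `null`, `undefined`, and a thrown error alike. `createMemoryStorage` and `loadOrInit` are hypothetical names, and the in-memory stand-in exists purely so the helper can run outside the sandbox.

```javascript
// In-memory stand-in that mirrors the call shapes shown in this section.
// Hypothetical: the real window.storage may return different shapes or throw on missing keys.
function createMemoryStorage() {
  const scopes = { personal: new Map(), shared: new Map() };
  const pick = (shared) => (shared ? scopes.shared : scopes.personal);
  return {
    async get(key, shared) { return pick(shared).has(key) ? pick(shared).get(key) : null; },
    async set(key, value, shared) { pick(shared).set(key, value); },
    async delete(key, shared) { pick(shared).delete(key); },
    async list(prefix, shared) { return [...pick(shared).keys()].filter((k) => k.startsWith(prefix)); },
  };
}

// Read a key, falling back to (and persisting) a default when nothing is stored yet.
async function loadOrInit(storage, key, fallback, shared = false) {
  let value = null;
  try {
    value = await storage.get(key, shared);
  } catch (err) {
    // Treat a failed read as "no value"; a real app may prefer to surface the error.
  }
  if (value === null || value === undefined) {
    await storage.set(key, fallback, shared);
    return fallback;
  }
  return value;
}
```

Inside an artifact the call would look like `await loadOrInit(window.storage, 'user:profile', { name: 'guest' })`. Passing the storage object as a parameter keeps the helper testable with the stand-in.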
◈ 5. Draconian Copyright Restrictions
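Designed to mitigate legal claims from publishers and creators, the CRITICAL_COPYRIGHT_COMPLIANCE section establishes absolute numerical limits: a maximum of 15 words of direct quotation, one quote per source, and zero tolerance for reproducing lyrics, poems, or haikus.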
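The two numerical limits listed for the CRITICAL_COPYRIGHT_COMPLIANCE block in section 2 (15 words per quotation, one quotation per source) are simple enough to express as a check. The sketch below is illustrative only: the function name and the `{ source, text }` input shape are assumptions, and nothing in the prompt suggests the model runs code like this internally.

```javascript
const MAX_QUOTE_WORDS = 15;      // limit cited in section 2
const MAX_QUOTES_PER_SOURCE = 1; // limit cited in section 2

// quotes: array of { source: string, text: string } (hypothetical shape).
function findQuoteViolations(quotes) {
  const violations = [];
  const perSource = new Map();
  for (const { source, text } of quotes) {
    // Count whitespace-separated words; an empty string counts as zero words.
    const words = text.trim().split(/\s+/).filter(Boolean).length;
    if (words > MAX_QUOTE_WORDS) {
      violations.push({ source, reason: `quote has ${words} words` });
    }
    const count = (perSource.get(source) ?? 0) + 1;
    perSource.set(source, count);
    if (count > MAX_QUOTES_PER_SOURCE) {
      violations.push({ source, reason: `quote #${count} from the same source` });
    }
  }
  return violations;
}
```

A word count by whitespace is a crude proxy. It says nothing about the qualitative rule (no lyrics, poems, or haikus), which cannot be reduced to a counter.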
◈ 6. Catalogue of 18 Tools
| Tool | Use Case |
|---|---|
| bash_tool | Command execution in Ubuntu 24 container. Requires descriptive justification. |
| create_file | Atomic file writing in the container. |
| str_replace | Block modification via search and replace (inline editing). |
| message_compose_v1 | Structured correspondence generator according to relational risk level. |
| places_map_display_v0 | Native interactive map renderer from place IDs. |
| recipe_display_v0 | Structured recipe visualizer with dynamic measurement. |
| ask_user_input_v0 | Buttons and multiple-choice interactions to prevent free-text typing on mobile. |
| recommend_claude_apps | Recommendation engine for ecosystem tools and connectors. |
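To make the search-and-replace editing model of str_replace concrete, here is a minimal sketch. It assumes, without confirmation from the prompt, that an edit is rejected unless the search string occurs exactly once, so that the target is unambiguous. The function name and error messages are hypothetical.

```javascript
// Sketch of search-and-replace editing. Assumption (not from the source): the edit is
// rejected unless the search string occurs exactly once in the file.
function strReplace(fileText, oldStr, newStr) {
  if (oldStr === '') throw new Error('search string must not be empty');
  const first = fileText.indexOf(oldStr);
  if (first === -1) throw new Error('search string not found');
  if (fileText.indexOf(oldStr, first + 1) !== -1) {
    throw new Error('search string matches more than once');
  }
  // Slice instead of String.prototype.replace so "$" sequences in newStr stay literal.
  return fileText.slice(0, first) + newStr + fileText.slice(first + oldStr.length);
}
```

Under this assumption, an edit either applies to one known location or fails loudly, which suits a model that cannot see the file while writing the replacement.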
◈ 7. Behavioral Directives: What the Control Reveals
Asymmetric Denial
Claude is prohibited from using lists or bullet points to reject tasks. The objective: "soften the blow" of the denial. The rejection must flow as prose, not as a checklist of restrictions.
Psychological Guardrails
Clinical diagnosis is prohibited: the model cannot apply labels such as "depression" unless the user introduces them first. Painful grounding techniques (squeezing ice, snapping rubber bands) are also prohibited, on the grounds that they reinforce the destructive neurochemical response.
Advertising Policies
Explicit prohibition against promoting integrated commercial products. Insistence on semantic differentiation: "Claude products are ad-free", not "Claude is ad-free".
⚠ 8. Identified Attack Vectors
The vectors below come from a cross-analysis with the red-team engine cassandra-mythos (FastAPI + LangGraph).
Vector 1: AST Fragmentation (Pliny Attack)
The SymbolicAttackGenerator engine from cassandra-mythos demonstrates how an attacker can decompose a malicious instruction into individual non-syntactic fragments that evade the static per-turn filter.
Vector 2: Context Inheritance in Fallback Routing
When Fable redirects high-risk queries to Opus 4.8, if the routing inherits the execution context without sanitizing the history hashes, an exploit injected in the initial context transfers intact to the fallback model, which may lack the same static stopping layers.
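On the defensive side, the condition described here suggests re-screening every inherited turn before the handoff rather than trusting the context as received. The sketch below is an assumption-laden illustration: `sanitizeHistoryForFallback` is a hypothetical name, `isFlagged` stands in for whatever classifier the router has available, and nothing here reflects how the actual routing works.

```javascript
// Sketch: re-screen every inherited turn before handing the conversation to a fallback model.
// `isFlagged` is a hypothetical classifier callback (message -> boolean), not a real API.
function sanitizeHistoryForFallback(messages, isFlagged) {
  const kept = [];
  const droppedIndexes = [];
  messages.forEach((message, index) => {
    if (isFlagged(message)) droppedIndexes.push(index);
    else kept.push(message);
  });
  return { kept, droppedIndexes };
}
```

Dropping turns can break the user/assistant alternation of the transcript, so a real router might prefer to refuse the handoff outright whenever `droppedIndexes` is non-empty.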
Vector 3: Pack Hunting (Multi-Agent)
The linear structure of the prompt allows "anesthetizing" one rule block by forcing it into conflict with another. Example: forcing the model to violate malicious code synthesis rules to comply with a legitimate defensive cybersecurity instruction, both permitted under harmful_content_safety.
◈ 9. Reverse Engineering Conclusions
The LLM as a JIT Compiler
65% of the system prompt is composed of tool usage instructions, output formatting, and legal copyright compliance. The LLM is not a thinking brain; it is a JIT compiler that generates complete SaaS micro-applications with persistence, web search, and map and recipe rendering.
Hybrid Sandbox = Product, Not Model
The coupling of persistent_storage_for_artifacts (persistent key-value database) and anthropic_api_in_artifacts (Claudeception) confirms that Anthropic views Mythos-class models as application platforms, not as chatbots.
The Structural Weakness of Linear Control
The linear segmentation of the prompt into independent blocks (copyright, well-being, safety) enables inter-block conflict attacks. A design with dependency graphs or hierarchical prioritization among blocks would be more resistant to "pack hunting".
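Hierarchical prioritization among blocks can be sketched as a resolver that ranks conflicting directives before acting on any of them. Everything in the example is an assumption made for illustration: the priority ordering, the `{ block, action }` directive shape, and the rule that a deny outranks an allow within the same block.

```javascript
// Hypothetical block priorities: lower number wins. The ordering is illustrative only.
const BLOCK_PRIORITY = { safety: 0, copyright: 1, wellbeing: 2, formatting: 3 };

// Unknown blocks sort last instead of producing NaN in the comparator.
const rank = (block) => BLOCK_PRIORITY[block] ?? Number.MAX_SAFE_INTEGER;

// directives: array of { block, action: 'allow' | 'deny' } raised for a single request.
function resolveDirectives(directives) {
  if (directives.length === 0) return { action: 'allow', decidedBy: null };
  const ranked = [...directives].sort((a, b) => {
    const byBlock = rank(a.block) - rank(b.block);
    if (byBlock !== 0) return byBlock;
    // Within the same block, a deny outranks an allow.
    return (a.action === 'deny' ? 0 : 1) - (b.action === 'deny' ? 0 : 1);
  });
  return { action: ranked[0].action, decidedBy: ranked[0].block };
}
```

The same-block tie-break addresses the case where two rules from one block are played against each other, since the more restrictive directive always decides.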