
Does Caveman AI Really Cut 65% of Claude Tokens? We Tested It

The viral Caveman AI plugin promises dramatic Claude token savings - but does it deliver in real coding sessions? We ran identical tasks with and without Caveman to separate hype from reality. Here's what actually happened when we put those 65% claims to the test.

The Caveman AI Hype

When the Caveman AI plugin went viral across developer communities, its promise seemed almost too good to be true: reduce Claude token usage by 65% simply by making Claude's responses more concise. The GitHub repository quickly amassed 40,000 stars, and social media buzzed with claims of dramatic cost savings.

The premise is straightforward - Caveman modifies Claude's responses to use shorter sentences, fewer unnecessary words, and more compact phrasing while maintaining meaning. As shown in the plugin's examples, sentences that might normally take 20 tokens could be reduced to just 5.

The catch: These impressive reductions only apply to conversational tokens - the back-and-forth between you and Claude. In real coding sessions, most tokens are consumed during Claude's thinking and code generation processes, which Caveman doesn't affect.

Our Testing Methodology

To test Caveman's real-world impact, we designed a controlled experiment using Claude Code:

  1. Created two identical project folders with the same starting conditions
  2. Ran the exact same coding task in both - implementing an API from a project description
  3. Used Claude Opus 4.7 with high effort thinking in both sessions
  4. Enabled Caveman in one session but not the other
  5. Measured token usage before and after each session

This approach allowed us to isolate Caveman's effect while keeping all other variables constant. We specifically chose a coding task that would involve some conversational elements but also substantial code generation.

Standard Claude Results

Our baseline test without Caveman produced exactly what you'd expect from a typical Claude Code session. The model:

  • Generated a detailed implementation plan with complete sentences
  • Provided status updates during execution ("Now creating the API endpoints")
  • Completed the task in 3 minutes

Token usage rose by 4 percentage points during this session (from 13% to 17% of our allocated tokens). While Claude's responses were verbose, most tokens appear to have gone toward thinking through and generating the actual code rather than communication.

Caveman-Enabled Results

With Caveman active, we immediately noticed differences in Claude's communication style:

  • Implementation plans used ultra-concise phrasing ("Plan enum, service, form, request")
  • Status updates were minimal ("Creating API", "Tests fixed")
  • The same task completed in 4 minutes (slightly longer)

Despite Caveman's dramatic shortening of conversational elements, token usage rose by the same 4 percentage points (from 17% to 21%). This suggests that in coding sessions, conversation makes up too small a share of total tokens for shortening it to change overall usage noticeably.
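The comparison between the two sessions reduces to two subtractions. Here is a minimal sketch using the meter readings reported in this article; the function name `usage_delta_points` is illustrative and not part of any Claude tooling:

```python
def usage_delta_points(before_pct: float, after_pct: float) -> float:
    """Return how many percentage points of the allocation a session consumed."""
    if not 0 <= before_pct <= after_pct <= 100:
        raise ValueError("readings must satisfy 0 <= before <= after <= 100")
    return after_pct - before_pct


# Meter readings reported in this article (percent of allocated tokens).
baseline = usage_delta_points(13, 17)  # session without Caveman
caveman = usage_delta_points(17, 21)   # session with Caveman

print(f"baseline: {baseline} points, caveman: {caveman} points")
print(f"difference: {baseline - caveman} points")
```

If the usage meter only displays whole percentages, as the readings here suggest, any saving smaller than about one point of the allocation would not show up in this comparison.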

Token Usage Comparison

The matching token usage in both sessions points to how Claude consumes tokens during coding work:

80-90% of tokens go toward thinking and code generation, not conversation. Even dramatic reductions in conversational tokens barely move the needle on total usage during coding tasks.

This aligns with insights from Claude's developers and experienced users on Reddit. As one commenter noted: "It's not the prompts that cost the money, it's the thinking." Another added: "This optimizes the cheapest part of the bill."
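The arithmetic behind this is simple: total savings equal the conversational share of tokens multiplied by the reduction applied to that share. Below is a rough sketch, assuming the 65% figure applies to all conversational tokens and to nothing else. `total_savings` is an illustrative helper, and the 10% and 20% shares are what the 80-90% estimate above implies; the 80% share is a hypothetical chat-heavy session:

```python
def total_savings(conversational_share: float, reduction: float = 0.65) -> float:
    """Fraction of all tokens saved when only the conversational share shrinks."""
    if not 0 <= conversational_share <= 1 or not 0 <= reduction <= 1:
        raise ValueError("both arguments must be fractions between 0 and 1")
    return conversational_share * reduction


# 10% and 20% correspond to a coding session; 80% is a hypothetical chat-heavy one.
for share in (0.10, 0.20, 0.80):
    saving = total_savings(share)
    print(f"conversation = {share:.0%} of tokens -> total saving = {saving:.1%}")
```

Under these assumptions a coding session saves well under 15% overall, while a session in which conversation makes up 80% of tokens would save about half.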

When Caveman Actually Helps

Our tests suggest Caveman may provide more value in scenarios where:

  • You're having extended conversations with Claude (brainstorming, Q&A)
  • The task involves minimal code generation but lots of discussion
  • You're using Claude primarily as a conversational assistant

In these cases, where conversational tokens represent a larger portion of total usage, Caveman's shortening effect could translate to meaningful savings. However, for pure coding sessions, the benefits appear minimal.

Better Ways to Save Tokens

If you're looking to reduce Claude costs, consider these more effective strategies:

  1. Use smaller models when appropriate (Haiku for simple tasks)
  2. Break complex tasks into smaller, focused prompts
  3. Provide clearer initial instructions to reduce back-and-forth
  4. Monitor usage through Claude's API with alerts

These approaches target the actual major consumers of tokens (thinking and code generation) rather than optimizing the smaller conversational portion.
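The monitoring strategy can be sketched in a few lines. This is a minimal example, not a finished tool: it assumes you read the `input_tokens` and `output_tokens` counts that each Anthropic Messages API response reports in its `usage` field and pass them in yourself. The `UsageMonitor` class, the budget, and the 80% threshold are all hypothetical choices for illustration:

```python
from dataclasses import dataclass


@dataclass
class UsageMonitor:
    """Accumulates token counts and flags when a budget threshold is crossed."""

    budget_tokens: int           # hypothetical token budget for the period
    alert_fraction: float = 0.8  # hypothetical threshold: alert at 80% of budget
    used_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> bool:
        """Add one response's usage; return True once the threshold is reached."""
        self.used_tokens += input_tokens + output_tokens
        return self.used_tokens >= self.budget_tokens * self.alert_fraction


monitor = UsageMonitor(budget_tokens=1_000_000)
# In real use these counts would come from each API response's usage data.
if monitor.record(input_tokens=12_000, output_tokens=3_500):
    print("Token budget alert: usage has crossed the threshold")
```

Keeping the tracker separate from the API client means the same class can be fed counts from logs, a proxy, or live responses.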

Watch the Full Tutorial

See our side-by-side testing in action, including the exact token usage measurements and a detailed comparison of Claude's responses with and without Caveman enabled.


Key Takeaways

The Caveman AI plugin does exactly what it promises - dramatically shortens Claude's conversational responses. However, in real coding sessions where most tokens go toward thinking and code generation, these savings have minimal impact on total usage.

In summary: Caveman can help in chat-heavy workflows but won't significantly reduce costs for typical coding sessions. Focus instead on using appropriate model sizes and breaking complex tasks into smaller steps.

Frequently Asked Questions

Common questions about this topic

Caveman AI is a plugin/skill for Claude that aims to reduce token usage by simplifying Claude's responses. It removes unnecessary words and phrases while maintaining meaning.

The creators claim it can reduce token usage by 65% during conversations by making responses more concise without losing essential information.

  • Works with Claude Code and Claude AI
  • Easy to install with one command
  • Primarily affects conversational tokens

In our side-by-side tests of identical coding tasks, Caveman showed no measurable token savings: each session consumed 4 percentage points of our token allocation.

The plugin primarily shortens conversational responses, but most tokens are consumed during Claude's thinking and code generation processes, which Caveman doesn't affect.

  • Same token usage despite shorter responses
  • Thinking/coding dominates token consumption
  • Conversation is small portion of total usage

Caveman may provide more benefit in chat-heavy workflows where Claude spends more time conversing than generating code.

For example, when brainstorming ideas, discussing multiple approaches to a problem, or using Claude primarily as a conversational assistant rather than a code generator.

  • Best for extended discussions
  • Minimal code generation scenarios
  • When conversation dominates token usage

Caveman did not affect code quality in our tests: the code output was identical with and without it.

The plugin only modifies how Claude communicates its thoughts, not the underlying reasoning or code generation capabilities. The actual code produced was the same in both sessions.

  • Same code quality
  • Same functionality
  • Only communication style changes

Installation is simple - just run the Caveman installation command in Claude Code. No additional configuration is needed.

You can then invoke Caveman mode when desired by typing /caveman before your prompt. The plugin automatically modifies Claude's responses to be more concise while maintaining meaning.

  • One-command installation
  • No configuration required
  • Invoke with /caveman prefix

More effective ways to save tokens than Caveman include using smaller models for simpler tasks, breaking complex tasks into smaller steps, providing clearer initial instructions, and monitoring usage through Claude's API.

These approaches target the actual major consumers of tokens (thinking and code generation) rather than optimizing the smaller conversational portion that Caveman affects.

  • Use Haiku for simple tasks
  • Break down complex projects
  • Monitor usage with API

The 65% claim comes from measuring shortened responses in conversational contexts where communication dominates token usage.

However, in real-world coding sessions, conversational tokens represent a small portion of total usage. Most tokens are consumed during Claude's thinking and code generation processes, which Caveman doesn't affect.

  • Based on conversational examples
  • Doesn't account for thinking tokens
  • Marketing vs. real-world usage

GrowwStacks helps businesses implement cost-effective AI workflows tailored to their needs. Our team analyzes your usage patterns to recommend the most efficient model configurations, prompt strategies, and automation approaches.

We can design custom solutions that balance performance with cost, whether you're using Claude, GPT, or other AI systems. Our implementations typically reduce token usage by 30-50% through optimized workflows rather than just response shortening.

  • Custom AI workflow optimization
  • Usage pattern analysis
  • Model and prompt strategy consulting
