Claude Code is a very powerful model out of the box. To leverage its full capabilities, however, you need to give it the ability to validate and verify its own work.
In a previous article, I mentioned that having Claude validate its own work is an important part of how I optimize my use of Claude Code. In this article, I'll dive deeper into exactly how I do that.
The benefits are incredible. When you make Claude validate its own work, you get:
- A model better at one-shotting implementations (it spends less time iterating)
- A model that can run for longer (it keeps going until it can verify its own work)
- A model that can complete more complex work
I'll dive deeper into some specific tasks where asking Claude to verify its own work saves me a lot of time, and I'll cover my thought process when setting Claude up this way.
Why should you have Claude verify its own work?
The number one reason to make Claude verify its own work is that it simply makes Claude perform better. The following scenario illustrates why:
Imagine you have to implement a piece of code that calculates the Fibonacci sequence. Plenty of people have done this exact task before, and for them it's relatively simple. Now imagine you have to complete the task perfectly without ever getting to run the code and see the output, i.e., you have to write the perfect code on your first attempt. That is naturally much harder than being able to run the code yourself, tweak it when it doesn't produce exactly the right numbers, and continue like that until it does.
The exact same concept applies to Claude Code. If you don't give it the chance to verify its own work, it's like asking it to write the Fibonacci code without ever seeing the output. You're putting Claude Code in a worse position, where it will produce inferior results compared to when it gets the opportunity to test its own code.
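To make the analogy concrete, here is a minimal sketch of that feedback loop: an implementation checked against known expected values, so a mistake is caught and fixed immediately instead of shipping untested. The function and test values are my own illustration, not code from this project:

```python
# A tiny verify-your-own-work loop: compare the implementation's
# output against known expected values and fail loudly on mismatch.

def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number (0-indexed)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Known expected outputs act as the ground truth to iterate against.
expected = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

for n, want in enumerate(expected):
    got = fibonacci(n)
    assert got == want, f"fibonacci({n}) = {got}, expected {want}"

print("All checks passed.")
```

An agent with access to this kind of check can keep rewriting the function until the assertions pass; an agent without it is guessing.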
How to make Claude verify its work in practice
The phrase "make Claude verify its own work" gets thrown around a lot, for example on LinkedIn and X. However, I've noticed relatively few people explain exactly how they do it themselves, which makes it hard for others to replicate.
Thus, I'll cover some real-world examples of how I've made Claude verify its own work, walking through the full process:
- Hearing about a problem
- Understanding what’s causing the problem
- Implementing a solution with Claude and ensuring it can verify its own work
Long LLM processing times
My first concrete example comes from analyzing user data from interactions with a conversational AI agent. After each conversation, I have to process the chat: fetch the transcript, then perform classification and data extraction on it.
I started investigating by reproducing the problem: running the LLM processing on the same conversation multiple times and measuring how long it took. It turned out that the median and average times were acceptable, around 30 seconds, but roughly every tenth run the processing took over two minutes, which is, of course, completely unacceptable. I explained the situation to Claude Code and asked it what could be causing the issue.
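If you want to reproduce this kind of measurement yourself, a quick timing harness is enough. `process_conversation` below is a hypothetical stand-in for whatever LLM pipeline you're timing:

```python
import statistics
import time

def process_conversation(conversation_id: str) -> None:
    """Hypothetical stand-in for the actual LLM processing pipeline."""
    ...

durations = []
for _ in range(20):  # repeat on the same conversation to expose tail latency
    start = time.perf_counter()
    process_conversation("conv-123")  # hypothetical ID
    durations.append(time.perf_counter() - start)

print(f"median: {statistics.median(durations):.1f}s")
print(f"mean:   {statistics.mean(durations):.1f}s")
print(f"max:    {max(durations):.1f}s")  # the occasional two-minute outlier shows up here
```

Median and mean alone would have hidden the problem; it's the max (the tail) that reveals it.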
The most likely cause, it turned out, was that I was simply feeding in a lot of input tokens and asking for a lot of output tokens, which in some runs takes a long time to produce. The solution was to take the one monolithic LLM call and split it into three smaller calls that run in parallel, so that each call has far fewer output tokens to produce.
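In code, the restructuring looks roughly like the sketch below. `call_llm` is a hypothetical async wrapper around whatever LLM client you use, and the three sub-tasks are illustrative, not the exact split I used:

```python
import asyncio

async def call_llm(prompt: str) -> str:
    """Hypothetical async wrapper around your LLM client of choice."""
    ...

async def process_transcript(transcript: str) -> dict:
    # Before: one monolithic call doing everything at once, producing
    # a large (and therefore slow) output in a single generation.
    # After: three focused calls running concurrently, each with a much
    # smaller output, so total latency is bounded by the slowest of the three.
    classification, extraction, summary = await asyncio.gather(
        call_llm(f"Classify this conversation:\n{transcript}"),
        call_llm(f"Extract the key fields from this conversation:\n{transcript}"),
        call_llm(f"Summarize this conversation:\n{transcript}"),
    )
    return {
        "classification": classification,
        "extraction": extraction,
        "summary": summary,
    }
```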
This is an example of a perfect task where Claude Code can verify its own work:
A perfect task for self-verification is one where you have a known expected output and can keep working and iterating on the problem until you reach that exact output.
This is great because I now have a fixed set of inputs and an expected output: the result of doing everything in one LLM call. I can simply ask Claude Code to split the LLM call into three pieces and, to make sure it has done so correctly, compare the result of the split calls against the single monolithic call and check that they are almost exactly the same (not exactly the same, because LLMs are stochastic).
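The comparison itself can be a fuzzy match rather than strict equality, since stochastic decoding means the outputs are rarely byte-identical. A minimal sketch, where the 0.95 threshold is my own illustrative choice:

```python
import difflib

def outputs_match(monolithic: str, combined_split: str, threshold: float = 0.95) -> bool:
    """Treat the split pipeline as correct when its combined output is
    almost identical to the single monolithic call's output."""
    ratio = difflib.SequenceMatcher(None, monolithic, combined_split).ratio()
    return ratio >= threshold
```

A check like this gives the agent a concrete pass/fail signal to iterate against.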
I prompted my Claude Code instance with all of this information. It kept iterating on its code until the outputs matched, one-shotted the problem, and came back to me with a working solution.
Designing a web page
The previous example was great because it's very simple for Claude Code to verify the results: it can perform an API call, compare outputs, and see if they match.
However, what happens when the output you want to produce is a visual?
My second example is a problem where I received a design for what a web page should look like, and I wanted Claude Code to reproduce that exact design, within the framework and existing codebase the application was written in, of course.
This might sound like a harder task because it involves visually inspecting results. Luckily, there's Claude in Chrome, which gives Claude access to your Google Chrome browser so it can visually inspect results itself.
I was provided with a screenshot of the target design, including how the page was organized into components and the color scheme it used.
The task itself is pretty straightforward: I gave Claude Code the screenshots and asked it to implement the design. If your design is simple, this might just work out of the box. More complex designs are harder to one-shot, however, especially in an existing large codebase with lots of dependencies and design conventions.
Thus, to give Claude Code the best chance of one-shotting the problem, I gave it access to Google Chrome. If you want to set this up yourself, you can simply ask your Claude Code instance: "How do I give you access to Google Chrome?"
I instructed my Claude agent to first attempt implementing the design, then spin up the dev servers, load the relevant page in Google Chrome, take a screenshot, and compare it against the target design. If it saw any discrepancies, it should keep iterating until the two designs looked almost identical.
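I used Claude in Chrome for this, but the same screenshot-and-compare loop can be sketched in plain Python, for example with Playwright and Pillow. Everything here, from the localhost URL to the file names, is an assumption for illustration:

```python
from playwright.sync_api import sync_playwright
from PIL import Image, ImageChops

# Take a screenshot of the freshly implemented page.
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page(viewport={"width": 1440, "height": 900})
    page.goto("http://localhost:3000/dashboard")  # hypothetical dev-server route
    page.screenshot(path="current.png", full_page=True)
    browser.close()

# Compare against the target design; a non-empty bounding box means
# the two images differ somewhere, so the agent should keep iterating.
design = Image.open("design.png").convert("RGB")
current = Image.open("current.png").convert("RGB").resize(design.size)
diff_bbox = ImageChops.difference(design, current).getbbox()

if diff_bbox is None:
    print("Pixel-perfect match.")
else:
    print(f"Designs differ inside region {diff_bbox}; keep iterating.")
```

A strict pixel diff is usually too unforgiving for real designs, which is why having the agent eyeball the screenshot for discrepancies works well in practice.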
Furthermore, I asked my agent to flag any discrepancies between the two designs whenever something was impossible or unclear to implement. This is a great tactic because it makes Claude come to you with questions instead of you having to specify absolutely everything about the design up front. Overall, it's a simple way to work better with your coding agents.
Conclusion
In this article, I covered how to make Claude Code validate its own work, which vastly improves the performance of your Claude Code instance, or any coding agent in general. I discussed why this matters: letting Claude verify its own work gives it a higher success rate on one-shot implementations and lets the agent work for longer stretches while still completing its tasks. I then covered two concrete situations where I gave Claude Code the means to verify its own work: splitting an LLM call into three separate calls to improve latency, and implementing a web-page design in my application. In both cases, letting Claude verify its own work increased its performance.
👉 My free eBook and Webinar:
🚀 10x Your Engineering with LLMs (Free 3-Day Email Course)
📚 Get my free Vision Language Models ebook
💻 My webinar on Vision Language Models
👉 Find me on socials:
💌 Substack

