
How to Use the Claude Message Batches API (And When Not To)

Michael · 6 min read

The Claude API offers a Message Batches API that costs 50% less than standard API calls, and your manager is excited. But before you assume the discount alone makes everything fine, let's examine exactly what it entails in this Claude Code tutorial on the Message Batches API.

The Claude API charges for tokens on every API call. At the time of this writing, these are the published token rates:

Current API Pricing

Model      | Cost Per Million Input Tokens | Cost Per Million Output Tokens
Haiku 4.5  | $1                            | $5
Sonnet 4.6 | $3                            | $15
Opus 4.6   | $5                            | $25

For every million tokens you send as input to the Haiku 4.5 model over the Claude API, you'll pay $1. For every million tokens you receive from the Haiku 4.5 model, you'll pay $5. A token is the unit Claude uses to measure text: one token is approximately 4 characters, and 1,000 tokens is roughly 750 English words. Both what you send to the model and what you receive from it count toward your bill.
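As a quick sanity check, the billing math above can be sketched in Python. The rates are hard-coded from the Haiku 4.5 row of the table and will drift as pricing changes:

```python
# Haiku 4.5 standard rates from the table above (USD per million tokens).
INPUT_RATE = 1.00
OUTPUT_RATE = 5.00

def api_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a call: both sides count toward the bill."""
    return (input_tokens / 1_000_000) * INPUT_RATE + \
           (output_tokens / 1_000_000) * OUTPUT_RATE

# Example: 2M tokens in, 500K tokens out.
print(api_cost(2_000_000, 500_000))  # 2 * $1 + 0.5 * $5 = 4.5
```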

What is the Message Batches API?

The Claude Message Batches API is a separately exposed API with a few characteristics the Claude Message API does not have:

  • Each call is asynchronous
  • There is no guaranteed latency SLA
  • Up to a 24-hour processing window
  • 50% cost savings on both the input and output
  • No multi-turn tool calling with a single request
  • A custom_id field for matching requests to responses

Let's stop and break these down one-by-one for better clarity.

Each Call is Asynchronous

The Claude Message API is a blocking API: each call, once sent, is "blocked" until a response is received. That is called synchronous processing. The Claude Message Batches API, however, is asynchronous: you submit a batch of requests, the call returns immediately with a batch ID, and you poll (or come back later) for the results. You're never waiting on one response before you can send another.
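In code, the difference shows up as submit-then-poll rather than call-and-wait. A minimal sketch: the request-building part is plain Python, while the commented-out submission uses the anthropic SDK's batch methods (the model name and prompts are placeholders of my choosing, not values from this post):

```python
def build_batch_request(custom_id: str, prompt: str,
                        model: str = "claude-haiku-4-5") -> dict:
    """Shape one entry in the batch's `requests` list."""
    return {
        "custom_id": custom_id,
        "params": {
            "model": model,
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

requests = [build_batch_request(f"report-{i}", p)
            for i, p in enumerate(["Summarize Q3.", "Summarize Q4."])]

# Submitting is fire-and-forget: the call returns a batch ID immediately,
# and you poll that ID later instead of blocking on a response.
# (Requires the `anthropic` package and an API key.)
# import anthropic
# client = anthropic.Anthropic()
# batch = client.messages.batches.create(requests=requests)
# print(batch.id)  # poll later with client.messages.batches.retrieve(batch.id)
```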

No Guaranteed Latency SLA & 24-Hour Processing Window

Let's tackle the next two points together. There is no guaranteed latency SLA (Service Level Agreement), meaning Anthropic cannot guarantee your call will return within a certain amount of time after you submit it. It could return in 20 minutes or 20 hours. The only guarantee Anthropic makes is that your batch's results will be ready within 24 hours.

This is important to consider in your business. If your company must produce reports within 12 hours, the Claude Message Batches API wouldn't work for you, because Anthropic only guarantees a result within 24 hours, not sooner. And if your business requires an API call to turn around a result in 30 minutes, this same API isn't for you, because there's no guaranteed latency SLA.

But There's a Price Savings. Cha-ching!

But with these restrictions comes a bonus. All calls to the Claude Message Batches API are priced at 50% less than the standard token rates, on both input and output tokens. That means the rates for your calls, at the time of this writing, are:

Batch API (50% off)

Model      | Cost Per Million Input Tokens | Cost Per Million Output Tokens
Haiku 4.5  | $0.50                         | $2.50
Sonnet 4.6 | $1.50                         | $7.50
Opus 4.6   | $2.50                         | $12.50
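The 50% discount compounds quickly at scale. A quick comparison, hard-coding the Haiku 4.5 rates from the two tables and an invented nightly workload:

```python
STANDARD = {"input": 1.00, "output": 5.00}       # Haiku 4.5, $/M tokens
BATCH = {k: v / 2 for k, v in STANDARD.items()}  # flat 50% off both sides

def cost(rates: dict, input_m: float, output_m: float) -> float:
    """Cost in dollars for token counts given in millions."""
    return input_m * rates["input"] + output_m * rates["output"]

# A hypothetical nightly job: 100M input tokens, 20M output tokens.
standard_bill = cost(STANDARD, 100, 20)  # $200
batch_bill = cost(BATCH, 100, 20)        # $100
print(standard_bill - batch_bill)        # $100 saved per run
```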

Tool Calling is Not an Option

There's no multi-turn tool calling. The Claude Message API can call tools to interface with databases, other APIs, or other systems. Each call made to the Claude Message Batches API, however, must be fully formed and self-contained. There is no opportunity to define tools that execute based on the response: the request is sent, the response comes back, and there is no option for using tools on the response.

You'll Recognize Your Batch

And Anthropic has made a custom_id field available to help identify each request you've sent in a batch so that you can match it to its response. In the request, you send a string value in custom_id as part of the payload. In the response, that same field comes back for matching. Anthropic provides sample code for calling the API, including the use of custom_id, on the How to Use the Message Batches API page of its documentation.
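Because batch results aren't guaranteed to come back in submission order, custom_id is how you join results back to your inputs. A sketch using hand-built dicts standing in for the API's response objects (the IDs and prompts are invented):

```python
# Requests you submitted, keyed by the custom_id you chose.
submitted = {
    "invoice-101": "Summarize invoice 101.",
    "invoice-102": "Summarize invoice 102.",
}

# Results as they might come back -- note the order differs from submission.
results = [
    {"custom_id": "invoice-102", "result": {"type": "succeeded"}},
    {"custom_id": "invoice-101", "result": {"type": "succeeded"}},
]

# Join each result back to its original request via custom_id.
matched = {r["custom_id"]: (submitted[r["custom_id"]], r["result"]["type"])
           for r in results}
print(matched["invoice-101"])  # ('Summarize invoice 101.', 'succeeded')
```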

The Decision Framework

Now that you know more about the Claude Message Batches API, when would you use it over the Claude Message API?

The first question to ask is this: does anyone have to wait for this result?

Hey, Wait a Minute!

Suppose you've just fixed a bug in production and need to get your code merged. But you're using the Claude API to perform some pre-merge checks, and eagerly anticipating the results so you can move forward. This would not be a candidate for the Claude Message Batches API, because you're waiting on the API call to return so you can do your job. That's a synchronous blocking workflow.

However, if you have a bunch of reports that just need to be available sometime in the next 24 hours, the Claude Message Batches API is the perfect solution because no one is waiting for the API call to finish so they can do their job. Plus you get that sweet, sweet 50% off discount for using the Batches API. That's a non-blocking, latency-tolerant workflow.

Handling Failures

Failures happen, and API calls are not exempt. Fortunately, Anthropic makes it easy to catch errors by using the custom_id field.

How it works

When you submit your batches to the Claude Message Batches API, each result can come back in one of several states. To handle failures, look for the value errored in the result type field of the response JSON. The same response JSON also carries the custom_id you submitted in the request. By examining these two fields, you can determine which requests failed so that you can evaluate and resubmit them. Here's an example output for an errored batch:

{
  "custom_id": "the-request-id-you-used",
  "result": {
    "type": "errored",
    "error": {
      "type": "request_too_large",
      "message": "the message returned with the error"
    }
  }
}

One note here: the official Anthropic documentation isn't perfectly clear on the exact shape of returned errors; its example omits the custom_id wrapper around the result node.
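The failure check itself is just a filter on result type. A sketch over plain dicts shaped like the errored example above (the custom_ids are invented):

```python
results = [
    {"custom_id": "req-1", "result": {"type": "succeeded"}},
    {"custom_id": "req-2",
     "result": {"type": "errored",
                "error": {"type": "request_too_large",
                          "message": "the message returned with the error"}}},
]

# Collect the custom_id (and error type) of everything that errored.
failed = {r["custom_id"]: r["result"]["error"]["type"]
          for r in results if r["result"]["type"] == "errored"}
print(failed)  # {'req-2': 'request_too_large'}
```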

How Should They Be Resubmitted?

Once you've identified which of the original requests failed and need modification, remember to resubmit only those, not the entire set of requests all over again.
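In practice that means filtering your original request list down to the failed custom_ids before resubmitting. A sketch, where failed_ids would come from scanning the batch results for errored entries (the IDs are invented and the params are elided placeholders):

```python
original_requests = [
    {"custom_id": "req-1", "params": {"model": "...", "messages": []}},
    {"custom_id": "req-2", "params": {"model": "...", "messages": []}},
    {"custom_id": "req-3", "params": {"model": "...", "messages": []}},
]

failed_ids = {"req-2"}  # gathered from the errored results

# Resubmit only the failures -- never the whole batch again.
retry_requests = [r for r in original_requests
                  if r["custom_id"] in failed_ids]
print([r["custom_id"] for r in retry_requests])  # ['req-2']
```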

Why Do Batches Fail?

Batches can fail for a number of reasons, but here are some more common ones, according to the official Anthropic docs:

  • Sometimes batches fail because they exceed size limits: the total batch request size maximum is 256 MB
  • The model you're using must be supported (i.e. not deprecated)
  • Each batch must have a unique custom_id value
  • Ensure it has been, at most, 29 days since the created_at value in the batch, else the results will no longer be viewable
  • Confirm that the batch has not been cancelled

For instance, if your batch fails because of size, you may need to split it across multiple requests (each still using unique custom_id values), chunking the input data rather than attempting to send it all in one request.
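A size-based failure is usually fixed by chunking. A minimal sketch that splits a request list into sub-batches under a byte budget (256 MB in production; a tiny budget here so the split is visible):

```python
import json

def chunk_requests(requests: list, max_bytes: int) -> list:
    """Split requests into sub-batches whose serialized size stays under max_bytes."""
    chunks, current, size = [], [], 0
    for req in requests:
        req_size = len(json.dumps(req).encode("utf-8"))
        if current and size + req_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(req)
        size += req_size
    if current:
        chunks.append(current)
    return chunks

reqs = [{"custom_id": f"r-{i}", "params": {"prompt": "x" * 100}}
        for i in range(10)]
batches = chunk_requests(reqs, max_bytes=300)
print(len(batches))  # several small sub-batches instead of one oversized one
```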

Does Your Batch Fit Your SLA?

I touched on the SLA and the 24-hour processing window when using the Claude Message Batches API earlier in this post, but we didn't talk much about calculating whether or not it fits into your workflow. It's very simple, and concrete, math:

IF your_sla_in_hours > 24 THEN
    submission_increment = your_sla_in_hours - 24
ELSE
    do not use the Batches API

  • if your SLA is 48 hours, submit your batches every 24 hours or less.
  • If your SLA is 25 hours, submit your batches every 1 hour or less (risky, though, given any unforeseen delays in batch processing).
  • If your SLA is 18 hours, fuggeddabouddit.
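The bullets above are just that check in code. A tiny helper, where the 24-hour window is the only hard number and everything else is your own SLA:

```python
BATCH_WINDOW_HOURS = 24  # Anthropic's only turnaround guarantee

def submission_increment(sla_hours: float):
    """Max hours between batch submissions that still meets the SLA.
    Returns None when the Batches API can't meet the SLA at all."""
    if sla_hours > BATCH_WINDOW_HOURS:
        return sla_hours - BATCH_WINDOW_HOURS
    return None  # 24 hours or less: don't use the Batches API

print(submission_increment(48))  # 24   -> submit every 24 hours or less
print(submission_increment(25))  # 1    -> possible, but risky
print(submission_increment(18))  # None -> fuggeddabouddit
```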

That's All Folks

The Claude Message Batches API can certainly be a set-it-and-forget-it, cost-saving solution for the right environment, but the SLA always has to be taken into account because of the 24-hour turnaround rule. And there is no guaranteed latency SLA either, so your call could return in 20 minutes or 20 hours. But for the budget-conscious, it's an easy decision if it fits your SLA. Run the math before you pitch it to your manager. The savings are real. So is the constraint.

Exam Prep

Preparing for the Claude Certified Architect Foundations exam? See what's covered and browse the full tutorial library mapped to all five exam domains.
