Anthropic Launch Batch API - up to 95% discount

October 14, 2024
Shaun Smith

Anthropic Introduce Batch Pricing

In our recent article examining OpenAI’s Prompt Caching, we noted that “Batch Mode pricing is a clear area where Anthropic have a gap” in their lineup. Four days later, Anthropic launched Batch Pricing as a Beta for their Claude models. This new feature offers a 50% discount on both input and output tokens, with the trade-off being a potential wait of up to 24 hours for results.

Anthropic Batch API Beta

Our testing has revealed a further discount - it is possible to combine the Cache discount with Batch Pricing. With the correct configuration, users can achieve up to a 95% discount on input tokens.

In this article, we’ll examine Anthropic’s new pricing structure and compare it with OpenAI’s current offerings.

Pricing Comparison and Implications

To understand the impact of this new pricing option, we’ll revisit our previous scenarios, including the new Batch Pricing. We’ll compare Sonnet 3.5 and GPT-4o, assuming 10,000 documents or questions as before:

| Scenario | Model | Normal | Cached | Batched |
| --- | --- | --- | --- | --- |
| Summarisation¹ | Sonnet 3.5 | $855.00 | $760.51 | $427.50 |
| | GPT-4o | $662.50 | $618.75 | $331.25 |
| Knowledge Base² | Sonnet 3.5 | $936.00 | $126.10 | $468.00 |
| | GPT-4o | $775.00 | $400.04 | $387.50 |

The standout figure here is the $126.10 for the cached Knowledge Base scenario. This shows how the 90% input discount delivers incredible price/performance when working with large amounts of unchanging content.
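
To make these figures concrete, here is a short Python check that reproduces the Sonnet 3.5 rows from the footnoted assumptions. It is a sketch of the billing arithmetic only: rates are in USD per million tokens, and the cached totals assume one cache write at the 25% premium followed by cache reads for all remaining requests.

```python
M, RUNS = 1_000_000, 10_000
INPUT, OUTPUT = 3.00, 15.00           # Sonnet 3.5 rates, USD per million tokens
CACHE_WRITE, CACHE_READ = 3.75, 0.30  # 25% premium to write, 90% discount to read

def scenario(cacheable, per_request, output):
    """Total cost for RUNS requests sharing a cacheable prefix."""
    normal = (RUNS * (cacheable + per_request) * INPUT
              + RUNS * output * OUTPUT) / M
    cached = (cacheable * CACHE_WRITE                # one-off cache write
              + (RUNS - 1) * cacheable * CACHE_READ  # subsequent cache reads
              + RUNS * per_request * INPUT
              + RUNS * output * OUTPUT) / M
    batched = normal / 2                             # flat 50% batch discount
    return normal, cached, batched

print(scenario(3_500, 15_000, 2_000))  # Summarisation:  ~(855.00, 760.51, 427.50)
print(scenario(30_000, 200, 200))      # Knowledge Base: ~(936.00, 126.10, 468.00)
```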

Batching and Caching Together

During our testing, we noted in the Anthropic API documentation that it was possible to combine the Batch and Caching Beta features.
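
The configuration is simple in principle: each request in the batch marks its shared prefix as cacheable. Below is a minimal sketch using the Anthropic Python SDK, assuming the beta header names and SDK namespace from the launch period; the file name, questions and model id are placeholders, so treat this as illustrative rather than canonical.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

knowledge_base = open("kb.txt").read()  # ~30k tokens of unchanging context
questions = ["What is the refund policy?", "Which regions are supported?"]

batch = client.beta.messages.batches.create(
    requests=[
        {
            "custom_id": f"q-{i}",
            "params": {
                "model": "claude-3-5-sonnet-20240620",
                "max_tokens": 256,
                # cache_control marks the prefix as cacheable: the first
                # request(s) pay the write premium, the rest read from cache
                "system": [{"type": "text", "text": knowledge_base,
                            "cache_control": {"type": "ephemeral"}}],
                "messages": [{"role": "user", "content": q}],
            },
        }
        for i, q in enumerate(questions)
    ],
    # beta flags as documented at launch; both features were in Beta
    extra_headers={"anthropic-beta":
                   "message-batches-2024-09-24,prompt-caching-2024-07-31"},
)
print(batch.id, batch.processing_status)
```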

Anthropic Usage Console: token usage after a test run (1.2m input tokens)

The above shows the token usage after a test run using prompts with ~15,000 cacheable tokens, ~25 input tokens and ~200 output tokens.

It’s worth noting that Anthropic’s pricing structure for Prompt Caching involves a 25% premium for the initial cache preparation, with subsequent uses benefiting from a substantial 90% discount.

During this test, the first small batch was charged multiple times for cache writes, with further batches benefitting from the 90% cache read discount. At the time of the test, the Batches were being turned around quickly (within the Cache Expiry window).

Anthropic Cost Console: cost of test run, $0.18 for 1.2m Sonnet 3.5 tokens

A pleasant surprise was that the 90% caching discount applies to the Batch Price rather than the standard input price. As the screenshots above demonstrate, the charge for 1.2m Sonnet 3.5 input tokens was $0.18 - consistent with the discount being applied to the Batch rate rather than the normal rate.
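
The arithmetic checks out if we assume virtually all of the 1.2m input tokens were billed as cache reads at the batch rate:

```python
tokens = 1_200_000
cache_batch_read = 0.15  # $/MTok: 90% cache-read discount off the $1.50 batch rate
cache_read = 0.30        # $/MTok: the same discount off the $3.00 normal rate

print(f"${tokens / 1e6 * cache_batch_read:.2f}")  # $0.18 -- matches the cost console
print(f"${tokens / 1e6 * cache_read:.2f}")        # $0.36 -- without the batch discount
```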

This now leaves us with six potential prices for Input Tokens³.

Anthropic explicitly make no guarantee about the availability or application of Cache pricing when processing batches. In the pathological case where every Message attracts Cache Write pricing, costs will increase; but it’s reasonable to assume that Caching will work as expected (very few writes, many reads).

Here is the earlier comparison, updated to include combined Batch and Cache pricing:

| Scenario | Model | Normal | Cached | Batched | Both |
| --- | --- | --- | --- | --- | --- |
| Summarisation | Sonnet 3.5 | $855.00 | $760.51 | $427.50 | $380.26 |
| | GPT-4o | - | $618.75 | $331.25 | - |
| Knowledge Base | Sonnet 3.5 | $936.00 | $126.10 | $468.00 | $63.05 |
| | GPT-4o | - | $400.04 | $387.50 | - |

Given the right conditions, combining Batching and Caching can bring the discount to between 55% and an extraordinary 93%.
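
Both bounds fall straight out of the Sonnet 3.5 rows above:

```python
# Combined Batch + Cache discount relative to Normal pricing (Sonnet 3.5)
print(f"{1 - 380.26 / 855.00:.1%}")  # 55.5% -- Summarisation: little cacheable content
print(f"{1 - 63.05 / 936.00:.1%}")   # 93.3% -- Knowledge Base: input mostly cache reads
```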

Update: Try this Claude Artifact Price Calculator to compare scenarios⁴

Implementation and Comparison with OpenAI

Both OpenAI and Anthropic now offer asynchronous Batch APIs, designed to handle large volumes of requests efficiently at a 50% discount.

For developers, switching between real-time and batch processing means moving from synchronous to asynchronous calls. However, as noted by Anthropic’s Head of Claude Relations, Alex Albert, on x.com:

> In my opinion, any task that doesn’t need to be processed in real-time should be sent through the Batches API. Not only does it save you money, but it eliminates all the complexity of running many parallel queries, dealing with rate limiting etc.

This is good advice, and it enables both User and Provider to optimise their respective operations. The table below summarises some of the key differences between the OpenAI and Anthropic Batching implementations.

| Feature | OpenAI | Anthropic |
| --- | --- | --- |
| Supported Models | GPT-4, GPT-3.5 series, fine-tuned models | Claude 3 Series (Haiku, Sonnet, Opus) |
| API Workflow | File-based (.jsonl upload) | Direct API calls⁵ |
| Result Retrieval | File download upon completion | Streamable, available for 29 days |
| Batch Size Limits | 50,000 requests or 100 MB | 10,000 requests or 32 MB |
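
To make the workflow difference concrete, here is an abbreviated sketch of each path in Python, using the SDK surfaces as they looked during the beta period. The model ids and the single toy request are placeholders, and method names may have shifted since.

```python
import json
from openai import OpenAI
import anthropic

# --- OpenAI: file-based -- write a .jsonl of requests, upload it, create a batch job.
oai_request = {
    "custom_id": "q-0",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
}
with open("requests.jsonl", "w") as f:
    f.write(json.dumps(oai_request) + "\n")

oai = OpenAI()
batch_file = oai.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
job = oai.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
# Poll oai.batches.retrieve(job.id) until status == "completed", then
# download the results with oai.files.content(job.output_file_id).

# --- Anthropic: direct -- requests go straight into the create call;
# results are streamed back (available for 29 days) rather than downloaded.
ant = anthropic.Anthropic()
batch = ant.beta.messages.batches.create(
    requests=[{
        "custom_id": "q-0",
        "params": {
            "model": "claude-3-5-sonnet-20240620",
            "max_tokens": 64,
            "messages": [{"role": "user", "content": "Hello"}],
        },
    }],
)
# Once processing_status reaches "ended", iterate the result stream:
if ant.beta.messages.batches.retrieve(batch.id).processing_status == "ended":
    for result in ant.beta.messages.batches.results(batch.id):
        print(result.custom_id, result.result.type)  # "succeeded", "errored", ...
```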

A key advantage of OpenAI’s implementation is support for fine-tuned models in batch processing, making it more suitable for specialized tasks. This capability is currently not available with Anthropic’s offering.

Conclusion

Anthropic’s Batch Pricing is more aggressive than it initially appears. It significantly alters the cost optimization landscape for LLM-powered applications, making powerful models like Sonnet 3.5 more accessible than ever.

While OpenAI offer the ability to fine-tune models (specialising them for a specific task), Anthropic’s pricing structure starts to challenge the extra cost and complexity of doing so. Developers now have a cost-effective way of deploying simple in-context (within prompt) learning against more capable models - speeding up deployment and potentially improving outputs for more complex tasks.

For those cases where fine-tuning remains cost-effective, OpenAI still retain clear leadership, as the only tunable Anthropic model is Haiku 3 via Amazon Bedrock. However, given OpenAI’s current “free” discount to fine-tune models until Oct 31, it could be interpreted that OpenAI have either overestimated demand or wish to lock developers further into their models. The final question is when Anthropic will release these features from “Beta” status, providing assurance about their implementation and support.

Footnotes


  1. Document Summarisation assumptions: Summarisation Prompt: 3,500 tokens, Average Document Length: 15,000 tokens, Expected Output Length: 2,000 tokens. ↩︎

  2. Knowledge Base Questions assumptions: Knowledge Base: 30,000 tokens, Average Question Length: 200 tokens, Expected Output Length: 200 tokens. ↩︎

  3. Anthropic Token Pricing (15 October 2024). The Cache Batch Read and Cache Batch Write rows are calculated: the 90% cache-read discount or 25% cache-write premium applied to the Normal price, then the 50% batch discount, rounded up to the nearest cent.

     | Input Type | Haiku 3 ($/MTok) | Sonnet 3.5 ($/MTok) | Opus 3 ($/MTok) |
     | --- | --- | --- | --- |
     | Cache Batch Read | $0.02 | $0.15 | $0.75 |
     | Cache Read | $0.03 | $0.30 | $1.50 |
     | Batch | $0.13 | $1.50 | $7.50 |
     | Cache Batch Write | $0.16 | $1.88 | $9.38 |
     | Normal | $0.25 | $3.00 | $15.00 |
     | Cache Write | $0.30 | $3.75 | $18.75 |

     ↩︎
  4. Added 15/10. Some prices may be a cent or two out due to JS arithmetic precision; combined pricing is best possible case. ↩︎

  5. An alternative S3 file-based batch interface for Claude models is available via Amazon Bedrock. ↩︎