Make your agents perform at their peak. Slash latency and get sub-second responses with a multi-layer caching engine designed for agentic workloads.
import OpenAI from "openai";
import { Alchymos } from "@alchymos/openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const alchymos = new Alchymos(process.env.ALCHYMOS_API_KEY);

alchymos.withLlm(openai);
const response = await openai.chat.completions.create(params);

Purpose-built infrastructure for the next generation of AI agents.
Integrate in minutes with our drop-in SDK. No complex configuration or infrastructure changes required.
Offload repetitive queries to our global edge network. Reduce latency and server load.
Cut your LLM API costs significantly by serving cached responses for identical queries.
Low-latency responses from anywhere in the world, powered by our distributed edge infrastructure.
See what your agents see. Detailed dashboards for cache hit rates, latency, and costs.
Type-safe, zero-config SDK with first-class TypeScript support and full IntelliSense for fast integration.
A multi-layer, fine-grained cache designed specifically for LLM workloads. Reduce latency and costs without sacrificing relevance.
Monitor cache effectiveness and performance in real time with detailed hit rate analytics and bandwidth savings.
Configure exactly what gets cached and for how long. Create rules based on provider, model, user, or custom metadata.
Define precise rules with TTLs, priority levels, and custom tags for powerful cache control.
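For illustration, a rule that pins a TTL, priority, and tag to one provider and model might look like the sketch below. The defineRule call and its field names are assumptions made for this example, not the actual Alchymos API.

// Hypothetical cache rule sketch; the rule API, field names, and units are
// assumptions for illustration only, not the Alchymos SDK surface.
import { Alchymos } from "@alchymos/openai";

const alchymos = new Alchymos(process.env.ALCHYMOS_API_KEY);

// Cache GPT-4o completions for one team for an hour, tagged so the
// entries can be inspected or purged together later.
alchymos.defineRule({
  provider: "openai",       // assumed field: which upstream provider the rule covers
  model: "gpt-4o",          // assumed field: restrict the rule to one model
  ttlSeconds: 3600,         // assumed field: how long a cached response stays valid
  priority: 10,             // assumed field: higher-priority rules win on conflict
  tags: ["team:research"],  // assumed field: custom metadata for auditing and purging
});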
Cache based on meaning, not just exact string matches.
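Conceptually, a semantic cache compares embeddings of prompts rather than raw strings and serves a stored response when a new prompt is close enough to a previous one. The sketch below illustrates the technique with a simple cosine-similarity lookup and an assumed 0.95 threshold; it is not the Alchymos implementation.

// Illustrative semantic cache lookup; not Alchymos internals.
type CacheEntry = { embedding: number[]; response: string };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return a cached response if any stored prompt is semantically close enough.
function lookup(entries: CacheEntry[], queryEmbedding: number[], threshold = 0.95): string | null {
  let best: CacheEntry | null = null;
  let bestScore = -1;
  for (const entry of entries) {
    const score = cosineSimilarity(entry.embedding, queryEmbedding);
    if (score > bestScore) { best = entry; bestScore = score; }
  }
  return best !== null && bestScore >= threshold ? best.response : null;
}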
Stop flying blind. Alchymos gives you deep visibility into your AI traffic. Track costs, debug latency spikes, and audit agent decisions in real time.
Track token usage, latency, and error rates by user, team, or agent. Drill down into specific timeframes to identify usage patterns and anomalies.
Monitor request latency and cache hit/miss durations in real time. Drill into p50 and p90 response times, identify cache misses, and correlate upstream latency to improve agent performance.
Average duration (ms) for cache hits and misses
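For reference, p50 and p90 are simply percentiles over recorded request durations. A minimal way to compute them from a batch of latency samples is shown below; this is illustrative only, not how the dashboard is implemented.

// Illustrative percentile computation over latency samples (ms); not Alchymos internals.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, index)];
}

const latenciesMs = [12, 15, 18, 22, 35, 40, 95, 110, 240, 310]; // sample request durations
console.log(`p50: ${percentile(latenciesMs, 50)} ms, p90: ${percentile(latenciesMs, 90)} ms`);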
Debug failures by replaying exact request sequences. Identify cache misses and optimize your prompt chains with full visibility into every request step.
Visualize your token consumption and cost savings in real time. Understand exactly how much Alchymos is optimizing your AI spend.
Drastically reduce your token usage by caching repetitive prompts and responses. Alchymos handles the complexity of semantic matching to maximize hit rates.
Turn your AI cost center into a predictable utility. Monitor accumulated savings and ROI in real time as you scale your agent fleet.
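To make the savings concrete: effective upstream spend scales with the cache miss rate. The request volume, per-request price, and hit rate below are placeholder assumptions, not measured results.

// Back-of-the-envelope savings estimate; the numbers are illustrative, not benchmarks.
const monthlyRequests = 1_000_000;  // assumed request volume
const avgCostPerRequest = 0.002;    // assumed upstream cost in USD per request
const cacheHitRate = 0.4;           // assumed fraction of requests served from cache

const baselineCost = monthlyRequests * avgCostPerRequest;                      // spend without caching
const cachedCost = monthlyRequests * (1 - cacheHitRate) * avgCostPerRequest;   // spend with caching
console.log(`Estimated monthly savings: $${(baselineCost - cachedCost).toFixed(2)}`);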
Drop-in compatibility with the most popular AI frameworks and providers. No complex migration required. Just change the base URL or use our lightweight SDK.
import OpenAI from "openai";
import { Alchymos } from "@alchymos/openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const alchymos = new Alchymos(process.env.ALCHYMOS_API_KEY);

alchymos.withLlm(openai);
const response = await openai.chat.completions.create(params);

Plug into OpenAI, MCP, or Vercel AI with zero friction. Python ecosystem support (LangChain/LangGraph) is coming soon, plus bespoke integrations on request.
OpenAI: TypeScript
MCP: TypeScript
Vercel AI: TypeScript
LangChain: Python (coming soon)
LangGraph: Python (coming soon)
Bespoke integrations: any language (upon request)
Try Alchymos — speed up your agents, cut costs, and improve reliability.