Agent-native Cache Layer

Caching for AI Agents

Make your agents perform at their peak. Slash latency and get sub-second responses with a multi-layer caching engine designed for agentic workloads.

agent.ts
import OpenAI from "openai";
import { Alchymos } from '@alchymos/openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const alchymos = new Alchymos(process.env.ALCHYMOS_API_KEY);

alchymos.withLlm(openai);

const response = await openai.chat.completions.create(params);

Everything you need to scale

Purpose-built infrastructure for the next generation of AI agents.

Irresistibly Simple

Integrate in minutes with our drop-in SDK. No complex configuration or infrastructure changes required.

Reduce Server Load

Offload repetitive queries to our global edge network. Reduce latency and server load.

Cost Efficiency

Cut your LLM API costs significantly by serving cached responses for identical queries.

Global Edge Network

Low-latency responses from anywhere in the world, powered by our distributed edge infrastructure.

Full Observability

See what your agents see. Detailed dashboards for cache hit rates, latency, and costs.

SDK-First

Type-safe, zero-config SDK with first-class TypeScript support and full IntelliSense for fast integration.

Layered Edge Cache

Smart Caching

A multi-layer, fine-grained cache designed specifically for LLM workloads. Reduce latency and costs without sacrificing relevance.

Cache Performance

Monitor cache effectiveness and performance in real-time with detailed hit rate analytics and bandwidth savings.

Auto-refresh
Cache Hit Rate: 74.2% ↗ Excellent
Cache Hits vs Misses: 1,240 requests served from cache

Fine-Grained Cache Rules

Configure exactly what gets cached and for how long. Create rules based on provider, model, user, or custom metadata.

Granular Configuration

Define precise rules with TTLs, priority levels, and custom tags for powerful cache control.

Semantic Matching

Cache based on meaning, not just exact string matches.

Active Rules
Rules are evaluated in priority order when cache operations occur
OpenAI LLM: Cache responses generated by OpenAI LLM provider
Enabled · OPENAI · LLM · TTL 600s · Priority 100
MCP Client: Cache requests coming from client to MCP provider
Enabled · MCP · CLIENT · TTL 600s · Priority 100
MCP Server: Cache server-side requests to MCP provider
Disabled · MCP · SERVER · TTL 600s · Priority 100
MCP Server | Cache Rule: Generated from event alc_hfgl83dxsklb0sdm4m54l
Enabled · MCP · SERVER · TOOL GET_BOOK_BY_TITLE · {"title": "Building Agentic AI Systems"} · TTL 60s · Priority 100
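
The rules above can also be expressed in code. The sketch below is illustrative only: the rules.create method, its option names, and the semantic-matching flag are assumptions about the @alchymos/openai surface, not documented API.

rules.ts
// Illustrative sketch only: rules.create, its option names, and the
// semantic-matching flag are assumptions, not the documented SDK API.
import { Alchymos } from '@alchymos/openai';

const alchymos = new Alchymos(process.env.ALCHYMOS_API_KEY);

// Mirrors the "OpenAI LLM" rule above: cache LLM responses for 10 minutes.
await alchymos.rules.create({
  name: 'OpenAI LLM',
  provider: 'OPENAI',
  kind: 'LLM',
  ttlSeconds: 600,
  priority: 100,
  enabled: true,
});

// Mirrors the generated "MCP Server | Cache Rule": a tool-scoped rule with
// semantic matching, so paraphrased arguments can still produce cache hits.
await alchymos.rules.create({
  name: 'MCP Server | Cache Rule',
  provider: 'MCP',
  kind: 'SERVER',
  tool: 'get_book_by_title',
  match: 'semantic',
  ttlSeconds: 60,
  priority: 100,
  enabled: true,
});
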
Live Observability

See what your agents see

Stop flying blind. Alchymos gives you deep visibility into your AI traffic. Track costs, debug latency spikes, and audit agent decisions in real-time.

Action-Level Insights

Track token usage, latency, and error rates by user, team, or agent. Drill down into specific timeframes to identify usage patterns and anomalies.

Weekly activity report available
Action metrics
Top actions by event count (hover for details)
523 total events

Operation            Requests   Share
search_books         200        38.2%
search_by_isbn       56         10.7%
get_trending_books   55         10.5%
get_book_by_id       50         9.6%
get_book_by_title    47         9.0%
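
To segment these metrics by user, team, or agent, requests need identifying tags. The snippet below is a hypothetical sketch of how that wiring could look; the metadata option and the tag names are assumptions, not the documented @alchymos/openai API.

tagging.ts
// Hypothetical sketch: the metadata option and tag names are assumptions about
// how per-user/per-agent segmentation could be attached, not documented API.
import OpenAI from "openai";
import { Alchymos } from '@alchymos/openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const alchymos = new Alchymos(process.env.ALCHYMOS_API_KEY);

// Tag the wrapped client so hits, misses, latency, and token counts can be
// filtered by user, team, or agent in the dashboard.
alchymos.withLlm(openai, {
  metadata: {
    userId: 'user_123',          // hypothetical tag names
    teamId: 'team_search',
    agentId: 'book-recommender',
  },
});

const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",          // model chosen for illustration
  messages: [{ role: "user", content: "Find trending books this week." }],
});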

Latency Pulse Monitoring

Monitor request latency and cache hit/miss durations in real time. Drill into p50 and p90 response times, identify cache misses, and correlate upstream latency to improve agent performance.

Response Time Analytics

Hit vs Miss Duration: average duration (ms) for cache hits and misses, broken out by p50 and p90

Detailed Execution Traces

Debug failures by replaying exact request sequences. Identify cache misses and optimize your prompt chains with full visibility into every request step.

Real-time log streaming
workflow chat.completions.create
Nov 25, 2025, 14:42:40
Request ID: req_mieml8ij_sa912r
P95 Latency: 4.00 s
Cache Outcome: HIT

Steps Timeline
1. OPENAI LLM · prompt · chat.completions.create (2.51 s, HIT)
   Prompt: Do this, 1. Find more information about the book with title 'Building Agentic AI Systems'. 2. Find more information about the author...
2. MCP CLIENT · tool · get_book_by_title (0.48 s, HIT)
3. OPENAI LLM · prompt · chat.completions.create (0.33 s, HIT)
   Tool call: get_author_info
4. MCP CLIENT · tool · get_author_info (1.68 s, MISS)
5. MCP SERVER · tool · get_author_info (0.35 s, HIT)
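
For context, a trace like the one above typically comes from a plain tool-calling loop. The sketch below uses the standard OpenAI tool-calling API with the wrapped client from earlier; the tool name mirrors the trace, while the model name and the lookUpBook helper are placeholders chosen for illustration.

trace-example.ts
// Generic agent loop that would produce a multi-step trace like the one above.
// The model name and lookUpBook helper are placeholders for illustration.
import OpenAI from "openai";
import { Alchymos } from '@alchymos/openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const alchymos = new Alchymos(process.env.ALCHYMOS_API_KEY);
alchymos.withLlm(openai);

// Placeholder for the real MCP/tool-side lookup.
async function lookUpBook(title: string) {
  return { title, author: "Unknown" };
}

const tools = [
  {
    type: "function" as const,
    function: {
      name: "get_book_by_title",
      description: "Look up a book by its title",
      parameters: {
        type: "object",
        properties: { title: { type: "string" } },
        required: ["title"],
      },
    },
  },
];

const userMessage = {
  role: "user" as const,
  content: "Find more information about the book 'Building Agentic AI Systems'.",
};

// Step 1: the opening prompt (an LLM call, cacheable).
const first = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [userMessage],
  tools,
});

// Step 2: run whatever tool the model requested (the tool call is cacheable too).
const toolCall = first.choices[0].message.tool_calls?.[0];
if (toolCall) {
  const args = JSON.parse(toolCall.function.arguments);
  const book = await lookUpBook(args.title);

  // Step 3: feed the tool result back for the follow-up completion.
  await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      userMessage,
      first.choices[0].message,
      { role: "tool" as const, tool_call_id: toolCall.id, content: JSON.stringify(book) },
    ],
    tools,
  });
}
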
Cost Efficiency

Usage & Savings

Visualize your token consumption and cost savings in real-time. Understand exactly how much Alchymos is optimizing your AI spend.

Token Savings

Drastically reduce your token usage by caching repetitive prompts and responses. Alchymos handles the complexity of semantic matching to maximize hit rates.

Cache hit-rate insights available
Tokens Original: 1,299,461 (total tokens the LLM would charge)
Tokens Saved: 974,596 (tokens saved by cache)
Tokens Billed: 324,865 (actual tokens billed)
Token Saving Rate: 72.4% (percent of tokens avoided by cache)

Token Usage Over Time: original vs billed tokens (daily)
Cache Efficiency Over Time: percent of tokens saved per day (7-day MA)
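
For reference, the three token counters relate as in the sketch below; the exact aggregation Alchymos uses for the daily rate (windows, moving averages) may differ.

savings.ts
// How the token counters above relate; the dashboard's daily rate may be
// aggregated differently (e.g. windowed or as a moving average).
interface TokenUsage {
  original: number; // tokens the LLM would charge without caching
  billed: number;   // tokens actually billed after cache hits
}

function tokenSavings({ original, billed }: TokenUsage) {
  const saved = original - billed;             // e.g. 1,299,461 - 324,865 = 974,596
  const savingRate = (saved / original) * 100; // percent of tokens avoided by cache
  return { saved, savingRate };
}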

Cost Savings

Turn your AI cost center into a predictable utility. Monitor accumulated savings and ROI in real-time as you scale your agent fleet.

Realtime cost delta preview
Cost Original: $38.98 (total cost the LLM would charge)
Cost Saved: $29.24 (estimated dollars saved to date)
Cost Billed: $9.74 (actual cost billed)
Cost Saving Rate: 64.8% (percent of cost avoided by cache)

Cost Over Time: original vs billed cost (daily)
Cumulative Savings Over Time: total dollars saved, accumulating over time
Universal Compatibility

Works with your stack

Drop-in compatibility with the most popular AI frameworks and providers. No complex migration required. Just change the base URL or use our lightweight SDK.

Built for Developers

  • Type-safe SDK for TypeScript/Node.js
    Full IntelliSense support
  • Zero-config middleware for Python
    Works with FastAPI & Django
  • Drop-in replacement for the most popular frameworks
    No code changes needed
  • Edge-first with minimal overhead
    Engineered for sub-second response times
Available for OpenAI, MCP & Vercel AI
agent.ts
import OpenAI from "openai";
import { Alchymos } from '@alchymos/openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const alchymos = new Alchymos(process.env.ALCHYMOS_API_KEY);

alchymos.withLlm(openai);

const response = await openai.chat.completions.create(params);
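
For stacks where wrapping the client is not convenient, the base-URL route mentioned above can look like the sketch below. The baseURL and defaultHeaders options are part of the standard OpenAI SDK; the Alchymos endpoint and header name shown here are placeholders, not documented values.

proxy.ts
// Base-URL alternative: route OpenAI traffic through the cache without the wrapper.
// `baseURL` and `defaultHeaders` are standard openai SDK options; the endpoint and
// header name below are placeholders, not documented Alchymos values.
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://gateway.alchymos.example/v1",                       // hypothetical endpoint
  defaultHeaders: { "x-alchymos-key": process.env.ALCHYMOS_API_KEY! },  // hypothetical header
});

const response = await openai.chat.completions.create({
  model: "gpt-4o-mini", // model chosen for illustration
  messages: [{ role: "user", content: "Hello through the cache layer" }],
});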

Supported Ecosystem

Plug into OpenAI, MCP, or Vercel AI with zero friction. Python ecosystem (LangChain/LangGraph) coming soon, plus bespoke integrations on request.

Explore integration options

OpenAI

TypeScript

MCP

TypeScript

Vercel AI

TypeScript

LangChain

Python (coming soon)

LangGraph

Python (coming soon)

Custom Framework

Any Language (upon request)

Ready to speed up your agents?

Try Alchymos — speed up your agents, cut costs, and improve reliability.

* Illustrative metrics highlighting core system capabilities and potential performance