Agent-native Edge Caching

Caching for AI agents: faster, cheaper, and fully observable.

Offload repetitive agent traffic at the edge to reduce latency and inference costs with three lines of code.

Without Alchymos: 2.4 s
With Alchymos: 45 ms

Multi-layer Caching

Simple process, powerful results

From edge CDNs to regional Redis and local memory, Alchymos intelligently routes your agent's traffic to serve responses from the fastest available layer.
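
Conceptually, this routing is a read-through chain: try the fastest layer first, fall back to slower ones, and only call the provider on a full miss. Here is a minimal sketch of the idea; the layer shape and names are illustrative, not the actual Alchymos internals:

TypeScript
// Conceptual read-through chain across cache layers.
type Layer = {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
};

async function readThrough(
  layers: Layer[], // ordered fastest → slowest, e.g. local memory, regional Redis, edge CDN
  key: string,
  fetchFromLlm: () => Promise<string>,
): Promise<string> {
  for (let i = 0; i < layers.length; i++) {
    const hit = await layers[i].get(key);
    if (hit !== null) {
      // Backfill the faster layers so the next request is served even closer to the client.
      await Promise.all(layers.slice(0, i).map((l) => l.set(key, hit)));
      return hit;
    }
  }
  const fresh = await fetchFromLlm(); // full miss: call the provider
  await Promise.all(layers.map((l) => l.set(key, fresh))); // populate every layer
  return fresh;
}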

Using CDN only

Best for read-heavy, mostly static outputs. Provides low-latency global access for common queries.

Full layered caching

Combines local in-memory speed with regional distributed storage. Includes TTL management and optional invalidation.

Multi-region replication

Regional failover and global distribution meet strict latency targets and keep mission-critical agents highly available.
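
To make the choice between these three modes concrete, here is a hypothetical configuration; the options object and its field names are illustrative assumptions, not the shipped Alchymos API:

TypeScript
// Hypothetical options object — field names are illustrative.
const alchymos = new Alchymos(process.env.ALCHYMOS_API_KEY, {
  mode: "layered",                     // "cdn" | "layered" | "multi-region"
  ttlSeconds: 3600,                    // TTL management for cached completions
  invalidation: "auto",                // optional: purge when upstream data changes
  regions: ["us-east-1", "eu-west-1"], // replication targets in multi-region mode
});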

Global Architecture
Client → Edge CDN → Regional Cache → LLM Provider
TypeScript
import OpenAI from "openai";
import { Alchymos } from "@alchymos/openai";

// The OpenAI SDK takes an options object, not a bare key string.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const alchymos = new Alchymos(process.env.ALCHYMOS_API_KEY);

// Wrap the client: completions are now served from cache when possible.
alchymos.withLlm(openai);

// Cache hits return in milliseconds; misses fall through to the provider.
const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Summarize this ticket." }],
});
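
Because Alchymos wraps the client in place, the rest of your code keeps calling openai.chat.completions.create unchanged: hits are answered at the edge without ever reaching the provider, and misses pass through transparently.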

Alchymos Mechanics

Everything you need to control your cache

Global edge distribution

Serve cached responses from the nearest region to your user for minimum latency.

Authenticated caching

Cache scoped securely to specific organizations, licenses, or user IDs.
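
In practice this could look like passing a scope when wrapping the client. A sketch with illustrative names; the scope parameter is an assumption, not documented API:

TypeScript
// Hypothetical scoping options — entries are keyed per organization and user,
// so one tenant's cached responses are never served to another.
alchymos.withLlm(openai, {
  scope: {
    organizationId: "org_1234", // illustrative IDs
    userId: "user_5678",
  },
});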

Fine-grained rules

Configure caching behavior by tool, parameters, or request type.
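
A sketch of what such rules might look like; the rules method and matcher shape are illustrative assumptions:

TypeScript
// Hypothetical rule set — method name and matcher fields are illustrative.
alchymos.rules([
  { match: { tool: "web_search" }, cache: false },        // never cache live lookups
  { match: { model: "gpt-4o-mini" }, ttlSeconds: 86400 }, // long TTL for stable, cheap calls
  { match: { temperature: { gt: 0.5 } }, cache: false },  // skip high-randomness requests
]);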

Automatic invalidation

Intelligently purge cache entries when upstream data changes.

Purge API

Manual and programmatic control to clear cache on demand.
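
For example, a purge might be triggered from a deploy hook or an admin action. The method names below are illustrative, not the documented surface:

TypeScript
// Hypothetical purge calls — method names are illustrative.
await alchymos.purge({ key: "chat:summarize-ticket" }); // clear a single entry
await alchymos.purge({ tag: "kb-articles" });           // clear everything tied to a data source
await alchymos.purgeAll();                              // flush the entire cache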

Metrics & analytics

Detailed dashboards for cache hit/miss rates, latency reduction, and tokens saved.

Cache debugging

Inspect headers to see exactly why requests are cached or bypassed.
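
A sketch of header-level debugging against the raw HTTP endpoint; the URL and header names here are illustrative assumptions:

TypeScript
// Hypothetical endpoint and header names — illustrative only.
const res = await fetch("https://edge.alchymos.example/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.ALCHYMOS_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "ping" }],
  }),
});
console.log(res.headers.get("x-cache"));        // e.g. "HIT", "MISS", or "BYPASS"
console.log(res.headers.get("x-cache-reason")); // e.g. which rule caused a bypass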

Performance SLAs

Guaranteed p50/p95 response times for enterprise workloads.