Offload repetitive agent traffic at the edge to reduce latency and inference costs with three lines of code.
Simple process, powerful results
From edge CDNs to regional Redis and local memory, Alchymos intelligently routes your agent's traffic to serve responses from the fastest available layer.
Best for read-heavy, mostly static outputs. Provides low-latency global access for common queries.
Combines local in-memory speed with regional distributed storage. Includes TTL management and optional invalidation.
Regional failover and global distribution help mission-critical agents meet strict latency targets with high availability.
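The layered lookup described above can be pictured as a small two-tier cache: a local in-memory map in front of a slower regional store, with every entry carrying a TTL. This is an illustrative sketch of the idea, not the Alchymos internals; the names below are assumptions.

```typescript
// Sketch of a two-tier cache: check the fastest layer (local memory)
// first, fall back to a regional store, and honor a TTL on every entry.
type Entry = { value: string; expiresAt: number };

class TieredCache {
  private local = new Map<string, Entry>();
  constructor(private regional: Map<string, Entry>, private ttlMs: number) {}

  get(key: string, now = Date.now()): string | undefined {
    const hit = this.local.get(key) ?? this.regional.get(key);
    if (!hit || hit.expiresAt <= now) return undefined; // miss or expired
    this.local.set(key, hit); // promote regional hits to the fastest layer
    return hit.value;
  }

  set(key: string, value: string, now = Date.now()): void {
    const entry = { value, expiresAt: now + this.ttlMs };
    this.local.set(key, entry); // write through both layers
    this.regional.set(key, entry);
  }
}
```

Promoting regional hits into the local map is what lets repeat queries in the same process skip the network entirely on subsequent lookups.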
import OpenAI from "openai";
import { Alchymos } from "@alchymos/openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const alchymos = new Alchymos(process.env.ALCHYMOS_API_KEY);
alchymos.withLlm(openai);

// Any Chat Completions request works unchanged once Alchymos is attached.
const params = { model: "gpt-4o-mini", messages: [{ role: "user" as const, content: "Hi" }] };
const response = await openai.chat.completions.create(params);

Everything you need to control your cache
Serve cached responses from the nearest region to your user for minimum latency.
Cache scoped securely to specific organizations, licenses, or user IDs.
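One way to picture that scoping (a sketch only, not the actual Alchymos key scheme): fold the tenant identifiers into the cache key itself, so identical prompts from different organizations or users can never share an entry. `scopedCacheKey` and its parameters are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Illustrative cache-key derivation: the scope (org, user) is hashed
// into the key, so two tenants asking the same question get distinct
// cache entries. Hypothetical sketch, not the Alchymos key scheme.
function scopedCacheKey(scope: { orgId: string; userId: string }, prompt: string): string {
  return createHash("sha256")
    .update(`${scope.orgId}\u0000${scope.userId}\u0000${prompt}`)
    .digest("hex");
}
```

Because the scope is part of the key, cross-tenant leakage is impossible by construction rather than by access checks at read time.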
Configure caching behavior by tool, parameters, or request type.
Intelligently purge cache entries when upstream data changes.
Manual and programmatic control to clear cache on demand.
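The two controls above (automatic purging when upstream data changes, plus on-demand clearing) can be sketched as tag-based invalidation: each entry records the upstream resources it depends on, and purging a tag clears exactly the entries that depended on it. `TaggedCache` is an illustrative name, not the Alchymos API.

```typescript
// Sketch of tag-based invalidation: entries are indexed by the upstream
// resources they depend on, so a change to one resource purges only the
// affected entries. Illustrative only, not the Alchymos API.
class TaggedCache {
  private entries = new Map<string, string>();
  private byTag = new Map<string, Set<string>>();

  set(key: string, value: string, tags: string[]): void {
    this.entries.set(key, value);
    for (const tag of tags) {
      if (!this.byTag.has(tag)) this.byTag.set(tag, new Set());
      this.byTag.get(tag)!.add(key);
    }
  }

  get(key: string): string | undefined {
    return this.entries.get(key);
  }

  // Purge every entry that depended on the changed resource;
  // returns the number of entries removed.
  invalidateTag(tag: string): number {
    const keys = this.byTag.get(tag) ?? new Set<string>();
    for (const key of keys) this.entries.delete(key);
    this.byTag.delete(tag);
    return keys.size;
  }
}
```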
Detailed dashboards for cache hit/miss rates, latency reduction, and tokens saved.
Inspect headers to see exactly why requests are cached or bypassed.
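As a sketch of what that debugging flow might look like: read the cache-status headers off a response and report the decision. The header names `x-cache` and `x-cache-reason` are assumptions for illustration; check what your deployment actually returns.

```typescript
// Interpret hypothetical cache-debug headers on a response.
// The header names used here are illustrative assumptions.
function explainCacheDecision(headers: Map<string, string>): string {
  const status = headers.get("x-cache") ?? "UNKNOWN";
  const reason = headers.get("x-cache-reason") ?? "no reason reported";
  return `${status}: ${reason}`;
}
```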
Guaranteed p50/p95 response times for enterprise workloads.