Backed by Y Combinator
Compress LLM input without sacrificing accuracy
Save on LLM costs, improve latency, and fit more context into your requests by compressing the input with a compression model.
Input: 29 tokens → 14 tokens after compression
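For illustration, here is a minimal sketch of how a compressor like this could slot into an existing pipeline, assuming an OpenAI-style client. The `compress` function and its behavior are hypothetical stand-ins, not this product's documented API:

```python
from openai import OpenAI

client = OpenAI()

def compress(prompt: str) -> str:
    # Hypothetical stand-in: the real compression model would return a
    # shorter prompt that preserves the original meaning, not merely
    # collapse whitespace as this placeholder does.
    return " ".join(prompt.split())

prompt = "Summarize the attached quarterly report, focusing on revenue trends."
response = client.chat.completions.create(
    model="gpt-4o-mini",
    # The LLM bills only the compressed input tokens.
    messages=[{"role": "user", "content": compress(prompt)}],
)
print(response.choices[0].message.content)
```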
Cost per 1M tokens (without compression → with compression)
- gpt-4o: $2.50 → $1.21
- gpt-5: $1.25 → $0.60
- gemini-2.5-flash: $0.30 → $0.14
- gpt-4o-mini: $0.15 → $0.07
- gemini-2.5-flash-lite: $0.10 → $0.05
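As a sanity check on these figures, every discounted price equals the list price scaled by the 29 → 14 token ratio from the demo above. The sketch below assumes that is the pricing formula; the page does not state it explicitly:

```python
# Assumption: discounted price = list price x (compressed / original tokens),
# using the 14/29 ratio from the demo input.
ratio = 14 / 29  # ~0.48

list_prices = {
    "gpt-4o": 2.50,
    "gpt-5": 1.25,
    "gemini-2.5-flash": 0.30,
    "gpt-4o-mini": 0.15,
    "gemini-2.5-flash-lite": 0.10,
}
for model, price in list_prices.items():
    print(f"{model}: ${price:.2f} -> ${price * ratio:.2f}")
# Reproduces the table: 2.50 -> 1.21, 1.25 -> 0.60, 0.30 -> 0.14,
# 0.15 -> 0.07, 0.10 -> 0.05.
```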
Benchmark results
Tested on LongBench v2, a public long-context benchmark.
- 66% fewer tokens
- 100% accuracy maintained
Token usage comparison
- Without compression: 100%
- With compression: 34%

230 questions, 50 runs averaged, GPT-4o-mini.
Get Full Access
Be the first to experience the future of LLM input optimization. We are currently onboarding new users, so request exclusive early access before the public release.