Java · advanced · ai

Streaming LLM Proxy with Cache

Proxy for LLM APIs (OpenAI, Anthropic) with semantic cache (embeddings + cosine similarity), per-user rate limiting, and cost tracking.

5 steps · Java · Spring Boot · Redis · PostgreSQL · Spring WebFlux

Project steps

01
Reverse proxy streaming
Spring WebFlux WebClient that forwards SSE streams from the LLM API.
02
Semantic cache
Embeds the query; if cosine similarity > 0.95 with a cached query, returns the saved response.
03
Cache storage
Redis for hot cache (TTL 24h) + PostgreSQL for persistent cold cache.
04
Rate limiting
Sliding window per API key: tokens/min and requests/day.
05
Cost tracking
Token counting, cost per model, per-user consumption dashboard.
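
The lookup in step 02 can be sketched in plain Java. This is a minimal in-memory sketch, not the project's implementation: the class name SemanticCache and its methods are hypothetical, the embeddings are assumed to arrive as double[] vectors from whichever embedding API you choose, and a linear scan stands in for a real Redis or PostgreSQL lookup. Only the 0.95 threshold comes from the step description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical in-memory stand-in for the semantic cache described in step 02. */
final class SemanticCache {

    /** One cached query: its embedding vector and the response saved for it. */
    private record Entry(double[] embedding, String response) {}

    private final List<Entry> entries = new ArrayList<>();
    private final double threshold;

    SemanticCache(double threshold) {
        this.threshold = threshold;
    }

    /** Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros. */
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("embedding dimensions differ");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Stores a response under the embedding of the query that produced it. */
    void put(double[] embedding, String response) {
        entries.add(new Entry(embedding.clone(), response));
    }

    /** Returns the response of the most similar cached query, if its similarity exceeds the threshold. */
    Optional<String> lookup(double[] queryEmbedding) {
        Entry best = null;
        double bestScore = threshold;
        for (Entry e : entries) {
            double score = cosineSimilarity(queryEmbedding, e.embedding());
            if (score > bestScore) {
                best = e;
                bestScore = score;
            }
        }
        return Optional.ofNullable(best).map(Entry::response);
    }
}
```

A caller would construct it with new SemanticCache(0.95), call lookup before forwarding a request upstream, and call put once the upstream response has completed. A linear scan is only reasonable for small caches; a larger one would likely need a vector index.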
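
For the rate limiting in step 04, one possible shape is a sliding-window log kept per API key. The sketch below is an assumption-laden, single-process illustration: the class name SlidingWindowLimiter, the weight parameter (tokens consumed, or 1 per request), and passing the clock in as nowMillis are all choices made here for clarity, not something the project prescribes. A multi-instance deployment would need shared state, for example in Redis.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sliding-window limiter: caps the total weight each API key may spend per window. */
final class SlidingWindowLimiter {

    /** A single accepted call: when it happened and how much budget it used. */
    private record Event(long timestampMillis, long weight) {}

    /** Per-key state: accepted events in arrival order plus their running total. */
    private static final class Window {
        final Deque<Event> events = new ArrayDeque<>();
        long total;
    }

    private final long windowMillis;
    private final long limit;
    private final Map<String, Window> windows = new HashMap<>();

    SlidingWindowLimiter(Duration window, long limit) {
        this.windowMillis = window.toMillis();
        this.limit = limit;
    }

    /**
     * Tries to spend {@code weight} units for {@code apiKey} at time {@code nowMillis}.
     * Returns false, without recording anything, if that would exceed the limit.
     */
    synchronized boolean tryAcquire(String apiKey, long weight, long nowMillis) {
        Window w = windows.computeIfAbsent(apiKey, k -> new Window());

        // Drop events that have slid out of the window ending at nowMillis.
        while (!w.events.isEmpty()
                && w.events.peekFirst().timestampMillis() <= nowMillis - windowMillis) {
            w.total -= w.events.pollFirst().weight();
        }

        if (w.total + weight > limit) {
            return false;
        }
        w.events.addLast(new Event(nowMillis, weight));
        w.total += weight;
        return true;
    }
}
```

The two limits in step 04 could then be two instances: one built with Duration.ofMinutes(1) and a token budget, called with the token count as weight, and one built with Duration.ofDays(1) and a request budget, called with weight 1. A request passes only if both accept it.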
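
The cost calculation behind step 05 is simple arithmetic: tokens times the per-model price, summed per user. The sketch below assumes prices are quoted per million tokens and configured by the caller; the class name CostTracker, the Price record, and the method names are hypothetical, and no real model prices are included. BigDecimal is used so that many small charges do not accumulate floating-point error.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-user cost accumulator for step 05. */
final class CostTracker {

    /** Price of one model, in currency units per million input and output tokens. */
    record Price(BigDecimal inputPerMillion, BigDecimal outputPerMillion) {}

    private static final BigDecimal MILLION = BigDecimal.valueOf(1_000_000);

    private final Map<String, Price> prices;
    private final Map<String, BigDecimal> spendByUser = new HashMap<>();

    /** The price table is supplied by the caller; nothing is hard-coded here. */
    CostTracker(Map<String, Price> prices) {
        this.prices = Map.copyOf(prices);
    }

    /** Computes the cost of one call, adds it to the user's total, and returns it. */
    synchronized BigDecimal recordUsage(String userId, String model,
                                        long inputTokens, long outputTokens) {
        Price p = prices.get(model);
        if (p == null) {
            throw new IllegalArgumentException("no price configured for model: " + model);
        }
        BigDecimal cost = p.inputPerMillion().multiply(BigDecimal.valueOf(inputTokens))
                .add(p.outputPerMillion().multiply(BigDecimal.valueOf(outputTokens)))
                .divide(MILLION, 6, RoundingMode.HALF_UP);
        spendByUser.merge(userId, cost, BigDecimal::add);
        return cost;
    }

    /** Total recorded spend for a user, or zero if the user has no recorded calls. */
    synchronized BigDecimal totalFor(String userId) {
        return spendByUser.getOrDefault(userId, BigDecimal.ZERO);
    }
}
```

In the proxy, recordUsage would be called once per completed response with the token counts reported by the upstream API, and the dashboard would read totalFor. Persisting the totals (for instance in PostgreSQL) is left out of this sketch.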

Ready to build this?

Fork the repo on GitHub and start building. A mentor will review your code when you open a PR.

Tech stack

Java · Spring Boot · Redis · PostgreSQL · Spring WebFlux