Java · Advanced · AI
Streaming LLM Proxy with Cache
A proxy for LLM APIs (OpenAI, Anthropic) with a semantic cache (embeddings + cosine similarity), per-user rate limiting, and cost tracking.
5 steps
Project steps
- 01
Reverse proxy streaming
Spring WebFlux WebClient that forwards SSE streams from the LLM API.
- 02
Semantic cache
Embeds the query; if cosine similarity > 0.95 with a cached query, returns the saved response.
- 03
Cache storage
Redis for hot cache (TTL 24h) + PostgreSQL for persistent cold cache.
- 04
Rate limiting
Sliding window per API key: tokens/min and requests/day.
- 05
Cost tracking
Token counting, cost per model, per-user consumption dashboard.
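The lookup in step 02 can be sketched in plain Java. This is a minimal in-memory illustration, not the project's actual implementation: the class name `SemanticCacheSketch` and its methods are hypothetical, embeddings are assumed to be computed elsewhere and passed in as `double[]`, and a linear scan stands in for whatever index a real deployment would use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical in-memory semantic cache; the caller supplies the embeddings.
class SemanticCacheSketch {
    private static final class Entry {
        final double[] embedding;
        final String response;

        Entry(double[] embedding, String response) {
            this.embedding = embedding;
            this.response = response;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final double threshold;

    SemanticCacheSketch(double threshold) {
        this.threshold = threshold;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for a zero vector.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    void put(double[] embedding, String response) {
        entries.add(new Entry(embedding.clone(), response));
    }

    // Returns the response of the most similar cached query, but only if its
    // similarity is strictly greater than the threshold (0.95 in step 02).
    Optional<String> lookup(double[] queryEmbedding) {
        Entry best = null;
        double bestScore = threshold;
        for (Entry e : entries) {
            double score = cosine(queryEmbedding, e.embedding);
            if (score > bestScore) {
                bestScore = score;
                best = e;
            }
        }
        return best == null ? Optional.empty() : Optional.of(best.response);
    }
}
```

On a miss the proxy would forward the request upstream and then call `put` with the query embedding and the completed response.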
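For step 04, one way to realize a sliding window per API key is a timestamped log of recent events with a running total. The sketch below is an assumption-laden illustration: `SlidingWindowLimiter` is a hypothetical name, state is kept in process memory (a multi-instance proxy would more plausibly keep it in Redis), and the caller passes the current time explicitly so the behavior is easy to test.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sliding-window limiter: at most `limit` units per API key
// within any window of `windowMillis` milliseconds.
class SlidingWindowLimiter {
    private static final class Event {
        final long timestampMillis;
        final long cost;

        Event(long timestampMillis, long cost) {
            this.timestampMillis = timestampMillis;
            this.cost = cost;
        }
    }

    private static final class Window {
        final Deque<Event> events = new ArrayDeque<>();
        long total;
    }

    private final long windowMillis;
    private final long limit;
    private final Map<String, Window> windows = new HashMap<>();

    SlidingWindowLimiter(long windowMillis, long limit) {
        this.windowMillis = windowMillis;
        this.limit = limit;
    }

    // Records `cost` units for the key and returns true if the window still
    // has room; returns false (and records nothing) otherwise.
    synchronized boolean tryAcquire(String apiKey, long cost, long nowMillis) {
        Window w = windows.computeIfAbsent(apiKey, k -> new Window());
        // Evict events that have slid out of the window (now - windowMillis, now].
        while (!w.events.isEmpty()
                && w.events.peekFirst().timestampMillis <= nowMillis - windowMillis) {
            w.total -= w.events.pollFirst().cost;
        }
        if (w.total + cost > limit) {
            return false;
        }
        w.events.addLast(new Event(nowMillis, cost));
        w.total += cost;
        return true;
    }
}
```

Two instances would cover the two limits named in the step: one with a 60,000 ms window where `cost` is the token count (tokens/min), and one with an 86,400,000 ms window where `cost` is always 1 (requests/day).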
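The cost arithmetic in step 05 could look roughly like the following. Everything here is illustrative: `CostTrackerSketch` and `Price` are hypothetical names, prices are assumed to be quoted per million tokens with separate input and output rates, and the actual rates must come from each provider's current price list rather than from this sketch.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-user cost tracker; prices are per one million tokens.
class CostTrackerSketch {
    static final class Price {
        final BigDecimal inputPerMillion;
        final BigDecimal outputPerMillion;

        Price(BigDecimal inputPerMillion, BigDecimal outputPerMillion) {
            this.inputPerMillion = inputPerMillion;
            this.outputPerMillion = outputPerMillion;
        }
    }

    private static final BigDecimal MILLION = BigDecimal.valueOf(1_000_000);

    private final Map<String, Price> prices;
    private final Map<String, BigDecimal> spendByUser = new ConcurrentHashMap<>();

    CostTrackerSketch(Map<String, Price> prices) {
        this.prices = Map.copyOf(prices);
    }

    // Computes the cost of one call, adds it to the user's running total,
    // and returns the cost of this call.
    BigDecimal record(String userId, String model, long inputTokens, long outputTokens) {
        Price p = prices.get(model);
        if (p == null) {
            throw new IllegalArgumentException("unknown model: " + model);
        }
        // Dividing by 10^6 always yields a terminating decimal, so the exact
        // BigDecimal.divide overload is safe here.
        BigDecimal cost = p.inputPerMillion.multiply(BigDecimal.valueOf(inputTokens))
                .add(p.outputPerMillion.multiply(BigDecimal.valueOf(outputTokens)))
                .divide(MILLION);
        spendByUser.merge(userId, cost, BigDecimal::add);
        return cost;
    }

    BigDecimal totalFor(String userId) {
        return spendByUser.getOrDefault(userId, BigDecimal.ZERO);
    }
}
```

`BigDecimal` is used instead of `double` so that many small charges sum without floating-point drift; the per-user totals are what a consumption dashboard would read.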
Ready to build this?
Fork the repo on GitHub and start building. A mentor will review your code when you open a PR.
Tech stack
Java, Spring Boot, Redis, PostgreSQL, Spring WebFlux