CODESEED
Java · advanced · AI

Streaming LLM Proxy with Cache

A streaming proxy for LLM APIs (OpenAI, Anthropic) with a semantic cache (embeddings + cosine similarity), per-user rate limiting, and cost tracking.

5 steps

Project steps

  1. Reverse proxy streaming

    Use a Spring WebFlux WebClient to relay the Server-Sent Events (SSE) stream from the LLM API to the client as it arrives.

  2. Semantic cache

    Embed each incoming query; if its cosine similarity to a cached query exceeds 0.95, return the cached response instead of calling the LLM API.

  3. Cache storage

    Use Redis as the hot cache (24-hour TTL) and PostgreSQL as the persistent cold cache.

  4. Rate limiting

    Apply sliding-window limits per API key: tokens per minute and requests per day.

  5. Cost tracking

    Count tokens, compute the cost per model, and provide a per-user consumption dashboard.
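For step 01, the WebFlux wiring needs Spring on the classpath, so what follows is only a JDK-only sketch of the stream-handling half of the proxy. It assumes the upstream body arrives as text lines, groups data: fields into events following the SSE format (a blank line ends an event, lines starting with a colon are comments), and stops at the [DONE] sentinel that OpenAI's streaming API sends as its final event. Anthropic's stream uses named event: types instead, which this sketch ignores. The class name SseRelay is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: parses upstream SSE lines into complete data payloads. */
class SseRelay {
    private final StringBuilder data = new StringBuilder();
    private final List<String> events = new ArrayList<>();
    private boolean done = false;

    /** Processes one upstream line, given without its trailing newline. */
    void onLine(String line) {
        if (line.isEmpty()) {                        // blank line dispatches the pending event
            if (data.length() > 0) {
                String payload = data.toString();
                if (payload.equals("[DONE]")) {
                    done = true;                     // OpenAI-style end-of-stream marker
                } else {
                    events.add(payload);
                }
                data.setLength(0);
            }
            return;
        }
        if (line.startsWith(":")) {
            return;                                  // SSE comment, often used as keep-alive
        }
        if (line.startsWith("data:")) {
            String value = line.substring(5);
            if (value.startsWith(" ")) {
                value = value.substring(1);          // a single leading space is not part of the value
            }
            if (data.length() > 0) {
                data.append('\n');                   // multi-line data fields are joined with newlines
            }
            data.append(value);
        }
        // Other fields (event:, id:, retry:) are ignored in this sketch.
    }

    List<String> events() { return events; }

    boolean isDone() { return done; }
}
```

In a WebFlux handler, each upstream line would be written to the client as soon as it arrives and also passed to onLine, so that the complete response is available for the semantic cache and for token accounting once isDone() returns true.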
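For step 02, a minimal sketch of the cache lookup, assuming embeddings have already been computed by some embedding API and are passed in as double[] vectors. It scans every cached entry linearly and reports a hit only when the best cosine similarity is strictly above the threshold (0.95 in the project description). SemanticCache and its methods are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical in-memory semantic cache keyed by query embeddings. */
class SemanticCache {
    record Entry(double[] embedding, String response) {}

    private final List<Entry> entries = new ArrayList<>();
    private final double threshold;

    SemanticCache(double threshold) { this.threshold = threshold; }

    /** Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for a zero vector. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the response of the most similar cached query if it beats the threshold. */
    Optional<String> lookup(double[] queryEmbedding) {
        Entry best = null;
        double bestSim = -1;
        for (Entry e : entries) {
            double sim = cosine(queryEmbedding, e.embedding());
            if (sim > bestSim) {
                bestSim = sim;
                best = e;
            }
        }
        return (best != null && bestSim > threshold) ? Optional.of(best.response()) : Optional.empty();
    }

    /** Stores a copy of the embedding together with the LLM response. */
    void put(double[] embedding, String response) {
        entries.add(new Entry(embedding.clone(), response));
    }
}
```

A linear scan is fine for a prototype; with many entries, a vector index (for example the pgvector extension for PostgreSQL, which supports cosine distance) would avoid comparing against every cached entry.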
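For step 03, the read-through logic can be sketched without external services by letting two in-memory maps stand in for Redis and PostgreSQL. That substitution is an assumption made so the example runs standalone: in the real service the hot tier would be a Redis key written with a 24-hour expiry (for example SET key value EX 86400) and the cold tier a database table. A hot-tier miss falls back to the cold tier and promotes the entry back into the hot tier with a fresh TTL. TieredCache and its injected clock are hypothetical.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.LongSupplier;

/** Hypothetical two-tier cache: an expiring hot tier in front of a durable cold tier. */
class TieredCache {
    private record HotEntry(String value, long expiresAtMillis) {}

    private final Map<String, HotEntry> hot = new HashMap<>();  // stands in for Redis
    private final Map<String, String> cold = new HashMap<>();   // stands in for a PostgreSQL table
    private final long ttlMillis;
    private final LongSupplier clock;                           // injectable for testing

    TieredCache(Duration ttl, LongSupplier clock) {
        this.ttlMillis = ttl.toMillis();
        this.clock = clock;
    }

    /** Writes to the durable tier first, then to the hot tier with a TTL. */
    void put(String key, String value) {
        cold.put(key, value);
        hot.put(key, new HotEntry(value, clock.getAsLong() + ttlMillis));
    }

    /** Read-through lookup: hot tier, then cold tier with promotion on a hit. */
    Optional<String> get(String key) {
        long now = clock.getAsLong();
        HotEntry h = hot.get(key);
        if (h != null) {
            if (now < h.expiresAtMillis()) return Optional.of(h.value());
            hot.remove(key);                                    // expired, as a Redis TTL would evict it
        }
        String v = cold.get(key);
        if (v != null) {
            hot.put(key, new HotEntry(v, now + ttlMillis));     // promote back into the hot tier
        }
        return Optional.ofNullable(v);
    }

    /** True if the key is currently served from the hot tier. */
    boolean inHot(String key) {
        HotEntry h = hot.get(key);
        return h != null && clock.getAsLong() < h.expiresAtMillis();
    }
}
```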
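For step 04, a sliding-window log per API key is one way to enforce both limits; the project description does not fix the algorithm, so this choice is an assumption. Each key keeps the timestamps and token counts of its recent requests, evicts those that have slid out of the one-minute or one-day window, and rejects a request that would push either window over its limit. The sketch takes the token count as an input, which assumes the proxy estimates tokens before forwarding, and it assumes timestamps per key never decrease. Class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-process sliding-window limiter: tokens per minute and requests per day. */
class SlidingWindowLimiter {
    private static final long MINUTE_MS = 60_000L;
    private static final long DAY_MS = 86_400_000L;

    private record Usage(long atMillis, int tokens) {}

    private static final class KeyState {
        final Deque<Usage> lastMinute = new ArrayDeque<>();  // token usage inside the last minute
        final Deque<Long> lastDay = new ArrayDeque<>();      // request timestamps inside the last day
        long tokensInMinute = 0;
    }

    private final Map<String, KeyState> state = new HashMap<>();
    private final int maxTokensPerMinute;
    private final int maxRequestsPerDay;

    SlidingWindowLimiter(int maxTokensPerMinute, int maxRequestsPerDay) {
        this.maxTokensPerMinute = maxTokensPerMinute;
        this.maxRequestsPerDay = maxRequestsPerDay;
    }

    /** Records the request and returns true if it fits both windows; otherwise records nothing. */
    synchronized boolean tryAcquire(String apiKey, int tokens, long nowMillis) {
        KeyState s = state.computeIfAbsent(apiKey, k -> new KeyState());
        // Evict entries that have slid out of each window.
        while (!s.lastMinute.isEmpty() && s.lastMinute.peekFirst().atMillis() <= nowMillis - MINUTE_MS) {
            s.tokensInMinute -= s.lastMinute.pollFirst().tokens();
        }
        while (!s.lastDay.isEmpty() && s.lastDay.peekFirst() <= nowMillis - DAY_MS) {
            s.lastDay.pollFirst();
        }
        if (s.tokensInMinute + tokens > maxTokensPerMinute) return false;
        if (s.lastDay.size() + 1 > maxRequestsPerDay) return false;
        s.lastMinute.addLast(new Usage(nowMillis, tokens));
        s.tokensInMinute += tokens;
        s.lastDay.addLast(nowMillis);
        return true;
    }
}
```

A single in-process map only works for one proxy instance; sharing limits across instances would require keeping the window state in Redis, which this sketch does not cover.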
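For step 05, a sketch of per-request cost calculation and per-user aggregation. It assumes token counts are taken from the usage figures the provider reports and that prices are configured per model in USD per million input and output tokens; no real prices are implied. BigDecimal keeps money sums free of floating-point rounding. CostTracker and its methods are hypothetical names.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cost tracker: cost = (in * inPrice + out * outPrice) / 1,000,000. */
class CostTracker {
    /** Configured prices in USD per one million tokens (placeholder values, not real rates). */
    record Price(BigDecimal inputPerMillion, BigDecimal outputPerMillion) {}

    private static final BigDecimal MILLION = BigDecimal.valueOf(1_000_000);

    private final Map<String, Price> prices;
    private final Map<String, BigDecimal> spendByUser = new HashMap<>();

    CostTracker(Map<String, Price> prices) { this.prices = Map.copyOf(prices); }

    /** Computes the cost of one request and adds it to the user's running total. */
    synchronized BigDecimal recordUsage(String userId, String model, long inputTokens, long outputTokens) {
        Price p = prices.get(model);
        if (p == null) throw new IllegalArgumentException("no price configured for model " + model);
        BigDecimal cost = p.inputPerMillion().multiply(BigDecimal.valueOf(inputTokens))
                .add(p.outputPerMillion().multiply(BigDecimal.valueOf(outputTokens)))
                .divide(MILLION, 8, RoundingMode.HALF_UP);
        spendByUser.merge(userId, cost, BigDecimal::add);
        return cost;
    }

    /** Total spend for one user, e.g. as a data source for the consumption dashboard. */
    synchronized BigDecimal spend(String userId) {
        return spendByUser.getOrDefault(userId, BigDecimal.ZERO);
    }
}
```

The per-user totals returned by spend are the kind of figures a consumption dashboard would read; a persistent version would store each recorded request in PostgreSQL rather than an in-memory map.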


Ready to build this?

Fork the repo on GitHub and start building. A mentor will review your code when you open a PR.


Tech stack

Java, Spring Boot, Redis, PostgreSQL, Spring WebFlux