Python, intermediate, AI

Document Q&A with Local RAG

Loads PDFs or text files, indexes them with local embeddings (sentence-transformers), and answers questions about the content.

5 steps

Project steps

  1. PDF Ingestion

    Extracts text from PDFs with pypdf and splits it into overlapping chunks of ~500 tokens.

  2. Local Embeddings

    Calculates embeddings with all-MiniLM-L6-v2 (runs on CPU, ~80MB).

  3. FAISS Index

    Stores the embeddings in a FAISS IndexFlatL2 index and saves it to disk.

  4. Retrieval

    On each query, computes the question's embedding and searches the index for the top-5 most relevant chunks.

  5. FastAPI API

    POST /ask {document_id, question} → {answer, sources[]}.
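
Steps 1 to 4 above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's reference solution: the function names, the 50-word overlap, the `chunks.faiss` file name, and the use of whitespace-separated words as a stand-in for tokens are all illustrative choices. The heavy libraries are imported inside the functions that need them so the chunking helper can be tried on its own.

```python
from typing import List


def pdf_text(path: str) -> str:
    """Concatenate the extracted text of every page in a PDF (step 1)."""
    from pypdf import PdfReader  # imported lazily; only needed for PDFs

    reader = PdfReader(path)
    # extract_text() can return an empty result for image-only pages
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def chunk_words(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into overlapping chunks of at most `size` words.

    Assumption: words approximate tokens; a real tokenizer would give
    slightly different chunk boundaries.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def build_index(chunks: List[str], index_path: str = "chunks.faiss",
                model_name: str = "all-MiniLM-L6-v2"):
    """Embed the chunks, add them to an IndexFlatL2, save it (steps 2-3)."""
    import faiss
    from sentence_transformers import SentenceTransformer

    if not chunks:
        raise ValueError("nothing to index")
    model = SentenceTransformer(model_name, device="cpu")
    # FAISS expects float32 vectors of shape (n_chunks, dim)
    vectors = model.encode(chunks, convert_to_numpy=True).astype("float32")
    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)
    faiss.write_index(index, index_path)
    return model, index


def retrieve(question: str, model, index, chunks: List[str],
             k: int = 5) -> List[str]:
    """Return up to k chunks nearest to the question embedding (step 4)."""
    query = model.encode([question], convert_to_numpy=True).astype("float32")
    _, ids = index.search(query, min(k, len(chunks)))
    # FAISS pads missing results with -1, so filter those out
    return [chunks[i] for i in ids[0] if i >= 0]
```

A typical call sequence would be `chunks = chunk_words(pdf_text("paper.pdf"))`, then `model, index = build_index(chunks)`, then `retrieve("What is the main result?", model, index, chunks)`, where `paper.pdf` is a placeholder path. The `/ask` endpoint in step 5 could wrap `retrieve` and return the selected chunks as `sources`.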

Ready to build this?

Fork the repo on GitHub and start building. A mentor will review your code when you open a PR.

Tech stack

Python, sentence-transformers, FAISS, pypdf, FastAPI