Python · intermediate · ai

Document Q&A with Local RAG

Loads PDFs or text files, indexes them with local embeddings (sentence-transformers), and answers questions about the content.

5 steps · Python · sentence-transformers · FAISS · pypdf · FastAPI

Project steps

01
PDF Ingestion
Extracts text from PDFs with pypdf and splits it into overlapping chunks of ~500 tokens.
02
Local Embeddings
Computes embeddings with all-MiniLM-L6-v2 (runs on CPU, ~80 MB).
03
FAISS Index
Stores the embeddings in a FAISS IndexFlatL2 (exact L2 search) and saves the index to disk.
04
Retrieval
At query time, embeds the question and searches the index for the 5 nearest chunks.
05
FastAPI API
POST /ask {document_id, question} → {answer, sources[]}.
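
Step 01 could be sketched as follows. This is a minimal illustration, not the project's code: `extract_pdf_text` and `chunk_tokens` are hypothetical names, the 50-token overlap is an assumed value (the steps only say "with overlap"), and tokens are approximated by whitespace-separated words rather than a real tokenizer.

```python
def extract_pdf_text(path: str) -> str:
    """Concatenate the text of every page of a PDF (hypothetical helper)."""
    # Imported here so the chunker below can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # Pages without extractable text contribute an empty string.
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def chunk_tokens(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of about chunk_size tokens.

    Assumption: a "token" is a whitespace-separated word.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

With these defaults, a 1,200-word text yields three chunks, and each chunk starts with the last 50 words of the previous one.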
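
Steps 02 to 04 might look like this with sentence-transformers and FAISS. It is a sketch under assumptions: the function names and the index path are hypothetical, and the chunk texts are assumed to live in a list whose positions match the row ids FAISS returns.

```python
def load_model(name: str = "all-MiniLM-L6-v2"):
    """Load the embedding model (downloaded on first use)."""
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(name)


def embed(model, texts):
    """Return a float32 array of shape (len(texts), embedding_dim)."""
    return model.encode(texts, convert_to_numpy=True).astype("float32")


def build_index(vectors, path: str):
    """Add all vectors to an exact L2 index and save it to `path`."""
    import faiss
    import numpy as np

    vectors = np.ascontiguousarray(vectors, dtype="float32")
    index = faiss.IndexFlatL2(vectors.shape[1])  # dimension of one embedding
    index.add(vectors)
    faiss.write_index(index, path)
    return index


def search(index, query_vector, k: int = 5):
    """Return (chunk_position, squared_L2_distance) pairs, nearest first."""
    import numpy as np

    query = np.ascontiguousarray(query_vector, dtype="float32").reshape(1, -1)
    distances, ids = index.search(query, k)
    # FAISS pads with id -1 when the index holds fewer than k vectors.
    return [(int(i), float(d)) for i, d in zip(ids[0], distances[0]) if i != -1]
```

IndexFlatL2 reports squared L2 distances, so smaller values mean closer chunks. A saved index can be reloaded with `faiss.read_index(path)`.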
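
The POST /ask contract from step 05 could be wired up roughly as below. Everything beyond the route and the field names is an assumption: `answer_question`, `create_app`, and the `retrieve` callable are hypothetical, and because the steps do not say how the answer text is generated, this sketch simply returns the best-matching chunk as a placeholder.

```python
def answer_question(document_id: str, question: str, retrieve) -> dict:
    """Build the {answer, sources[]} payload for one question.

    `retrieve` is a hypothetical callable (document_id, question) -> list[str]
    that returns the top-ranked chunks, best first.
    """
    chunks = retrieve(document_id, question)
    # Placeholder answer: the best-matching chunk, returned verbatim.
    answer = chunks[0] if chunks else ""
    return {"answer": answer, "sources": chunks}


def create_app(retrieve):
    """Expose answer_question as POST /ask on a FastAPI app."""
    # Imported here so answer_question stays usable without FastAPI installed.
    from fastapi import FastAPI
    from pydantic import BaseModel

    class AskRequest(BaseModel):
        document_id: str
        question: str

    class AskResponse(BaseModel):
        answer: str
        sources: list[str]

    app = FastAPI()

    @app.post("/ask", response_model=AskResponse)
    def ask(req: AskRequest):
        return answer_question(req.document_id, req.question, retrieve)

    return app
```

Keeping the payload logic in a plain function means it can be unit-tested with a fake `retrieve` before any server is started.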

Ready to build this?

Fork the repo on GitHub and start building. A mentor will review your code when you open a PR.

Tech stack

Python · sentence-transformers · FAISS · pypdf · FastAPI