Python, intermediate, AI
Document Q&A with Local RAG
Loads PDFs or text files, indexes them with local embeddings (sentence-transformers), and answers questions about the content.
5 steps
Project steps
- 01
PDF Ingestion
Extracts text from PDFs with pypdf and splits it into chunks of ~500 tokens, with some overlap between consecutive chunks.
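A minimal sketch of this step. It assumes whitespace-separated words as a stand-in for tokens and a 50-word overlap; the project specifies neither a tokenizer nor an overlap size. `extract_text` and `chunk_words` are illustrative names, not part of any library.

```python
from typing import List


def extract_text(pdf_path: str) -> str:
    """Concatenate the text of every page of a PDF."""
    # Imported here so chunk_words() can also be used on plain .txt files without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() may return an empty string for scanned (image-only) pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of chunk_size words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()  # assumption: one whitespace-separated word ~ one token
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this window already reaches the end
            break
    return chunks
```

Usage: `chunks = chunk_words(extract_text("manual.pdf"))`, where `manual.pdf` is a placeholder path. Words are only a rough proxy: a subword tokenizer usually produces more tokens than there are words.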
- 02
Local Embeddings
Computes an embedding for each chunk with all-MiniLM-L6-v2 (runs on CPU, ~80 MB).
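A sketch of the embedding step using the sentence-transformers `SentenceTransformer.encode` API. The helper names and the batch size of 32 are assumptions. The model produces 384-dimensional vectors, and its model card notes that input longer than 256 word pieces is truncated, so with ~500-token chunks the tail of a chunk may not be reflected in its embedding.

```python
from typing import Sequence

MODEL_NAME = "all-MiniLM-L6-v2"  # produces 384-dimensional embeddings


def load_model(device: str = "cpu"):
    """Load the embedding model; the first call downloads the weights into the local cache."""
    from sentence_transformers import SentenceTransformer  # lazy import: heavy dependency

    return SentenceTransformer(MODEL_NAME, device=device)


def embed_chunks(model, chunks: Sequence[str], batch_size: int = 32):
    """Encode chunks in batches.

    With the real model this returns a float32 NumPy array of shape (len(chunks), 384).
    """
    return model.encode(
        list(chunks),
        batch_size=batch_size,
        convert_to_numpy=True,
        show_progress_bar=False,
    )
```

Usage: `vectors = embed_chunks(load_model(), chunks)`. Passing the model in, rather than loading it inside `embed_chunks`, lets the service load it once and reuse it for both indexing and queries.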
- 03
FAISS Index
Adds the embeddings to a FAISS IndexFlatL2 (exact, brute-force L2-distance search) and saves the index to disk.
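A sketch of building and persisting the index with the FAISS Python bindings (`IndexFlatL2`, `write_index`, `read_index`). The directory layout and the `chunks.json` file next to `index.faiss` are assumptions: FAISS stores only vectors, so row i of the index has to be mapped back to chunk i some other way.

```python
import json
from pathlib import Path
from typing import List


def save_chunks(chunks: List[str], out_dir: str) -> Path:
    """Write chunk texts as JSON; list position i matches row i of the FAISS index."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "chunks.json"
    path.write_text(json.dumps(chunks, ensure_ascii=False), encoding="utf-8")
    return path


def load_chunks(out_dir: str) -> List[str]:
    return json.loads((Path(out_dir) / "chunks.json").read_text(encoding="utf-8"))


def build_and_save_index(embeddings, chunks: List[str], out_dir: str):
    import faiss  # lazy imports: only needed when actually building an index
    import numpy as np

    vectors = np.ascontiguousarray(embeddings, dtype="float32")  # FAISS expects C-contiguous float32
    index = faiss.IndexFlatL2(vectors.shape[1])  # exact L2 search; no training step needed
    index.add(vectors)  # rows are numbered 0..n-1 in insertion order
    save_chunks(chunks, out_dir)  # also creates out_dir
    faiss.write_index(index, str(Path(out_dir) / "index.faiss"))
    return index


def load_index(out_dir: str):
    """Return (faiss index, chunk texts) previously saved by build_and_save_index."""
    import faiss

    return faiss.read_index(str(Path(out_dir) / "index.faiss")), load_chunks(out_dir)
```

IndexFlatL2 compares the query with every stored vector, so results are exact; the cost grows linearly with the number of chunks, which is usually acceptable for a few documents.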
- 04
Retrieval
For each question, computes its embedding with the same model and retrieves the top-5 most similar chunks (smallest L2 distance).
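A sketch of the retrieval step, assuming the embedding model from step 02 and the index and chunk list from step 03. FAISS `index.search(x, k)` returns a distances array and an ids array, each of shape (number of queries, k); for IndexFlatL2 the distances are squared L2, so smaller means more similar. The function name `retrieve` is illustrative.

```python
from typing import List, Tuple


def retrieve(question: str, model, index, chunks: List[str], k: int = 5) -> List[Tuple[str, float]]:
    """Return up to k (chunk text, squared L2 distance) pairs, closest first."""
    # The question must be embedded with the same model used at indexing time.
    query = model.encode([question], convert_to_numpy=True)  # shape (1, dim), float32
    distances, ids = index.search(query, k)  # both have shape (1, k)
    results = []
    for idx, dist in zip(ids[0], distances[0]):
        if idx == -1:  # FAISS pads with -1 when the index holds fewer than k vectors
            continue
        results.append((chunks[int(idx)], float(dist)))
    return results
```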
- 05
FastAPI API
POST /ask {document_id, question} → {answer, sources[]}.
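A sketch of the endpoint with FastAPI and Pydantic request/response models. Two details are assumptions rather than part of the project: `sources` is taken to be the retrieved chunk texts, and because the project does not say how the answer is generated (for example, by a local LLM prompted with the retrieved chunks), `answer_question` simply returns the closest chunk as a placeholder. The `retrieve` callable and the `known_ids` set are hypothetical hooks for the index lookup.

```python
from typing import Callable, List, Set, Tuple

Hit = Tuple[str, float]  # (chunk text, distance)
Retriever = Callable[[str, str], List[Hit]]  # hypothetical: (document_id, question) -> top hits


def answer_question(document_id: str, question: str, retrieve: Retriever) -> dict:
    """Build the /ask response body from the retrieved chunks."""
    hits = retrieve(document_id, question)
    sources = [text for text, _distance in hits]
    # Placeholder: a real service would generate the answer from `sources`
    # (e.g. with a local LLM); here we return the closest chunk verbatim.
    answer = sources[0] if sources else "No relevant passage found."
    return {"answer": answer, "sources": sources}


def create_app(retrieve: Retriever, known_ids: Set[str]):
    # Imported lazily so answer_question() can be used without FastAPI installed.
    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    class AskRequest(BaseModel):
        document_id: str
        question: str

    class AskResponse(BaseModel):
        answer: str
        sources: List[str]

    app = FastAPI()

    @app.post("/ask", response_model=AskResponse)
    def ask(req: AskRequest):
        if req.document_id not in known_ids:
            raise HTTPException(status_code=404, detail="Unknown document_id")
        return answer_question(req.document_id, req.question, retrieve)

    return app
```

To serve it: `uvicorn.run(create_app(my_retrieve, {"doc1"}))`, where `my_retrieve` and `"doc1"` stand in for your own lookup function and document ids. Declaring the body as a Pydantic model lets FastAPI validate requests and reply with a 422 error when a field is missing.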
Ready to build this?
Fork the repo on GitHub and start building. A mentor will review your code when you open a PR.
Tech stack
Python, sentence-transformers, FAISS, pypdf, FastAPI