RAG‑Powered Step 1 Tutor

Local retrieval‑augmented GPT system for rapid concept lookup.

Python · FAISS · OpenAI API · Next.js
Screenshot of RAG‑Powered Step 1 Tutor


Medical education often involves memorizing vast amounts of disconnected facts. The traditional approach to studying for high-stakes exams like USMLE Step 1 involves countless hours of flashcards, question banks, and rote memorization. This project aims to transform that experience by creating a more intuitive, conversational learning tool.

The Problem

When studying complex medical concepts, students often need to:

  1. Quickly look up specific facts
  2. Understand how concepts connect across different domains (e.g., how biochemistry relates to pathology)
  3. Get explanations tailored to their current knowledge level
  4. Receive contextually relevant examples

Traditional resources like textbooks and flashcards excel at (1) but struggle with the others. AI tools like ChatGPT can help with (2-4) but may hallucinate or provide outdated medical information.

The Solution: Retrieval-Augmented Generation

This project combines the best of both worlds by implementing a Retrieval-Augmented Generation (RAG) system that:

  • Indexes high-quality, vetted medical education resources
  • Retrieves relevant passages based on student queries
  • Uses GPT-4 to generate explanations grounded in the retrieved content (a sketch of this step follows the list)
  • Provides citations back to the original sources
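
To make this concrete, here is a minimal sketch of the generation step, assuming the same legacy OpenAI SDK (< 1.0) used in the embedding code later in this post. The answer_with_citations helper, the prompt wording, and the [n]-style citation format are illustrative assumptions, not the production prompt.

# Hedged sketch of the grounded-generation step (legacy OpenAI SDK < 1.0);
# the prompt wording and [n]-style citations are illustrative assumptions.
import openai

def answer_with_citations(question, passages):
    # Number the retrieved passages so the model can cite them as [1], [2], ...
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer using only the numbered sources provided. "
                        "Cite every claim with its source number, e.g. [2]."},
            {"role": "user",
             "content": f"Sources:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response["choices"][0]["message"]["content"]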

Technical Implementation

Vector Database

I used FAISS (Facebook AI Similarity Search) to index OpenAI embeddings of:

  • First Aid for USMLE Step 1 content
  • Pathoma chapters
  • Selected Boards & Beyond slides

# Sample code for embedding creation (legacy OpenAI SDK < 1.0 interface)
import numpy as np
import openai

def create_embeddings(text_chunks):
    """Embed each chunk with OpenAI's text-embedding-ada-002 model."""
    embeddings = []
    for chunk in text_chunks:
        response = openai.Embedding.create(
            input=chunk,
            model="text-embedding-ada-002",
        )
        embeddings.append(response["data"][0]["embedding"])
    # FAISS expects float32 vectors
    return np.array(embeddings, dtype="float32")
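
The embeddings are then loaded into a FAISS index. Below is a minimal sketch of that step, assuming cosine-style similarity (inner product over L2-normalized vectors); the build_index helper is illustrative, and the production index type may differ.

# Minimal sketch of building the FAISS index; assumes cosine-style
# similarity (inner product over L2-normalized float32 vectors).
import faiss
import numpy as np

def build_index(embeddings):
    vectors = np.ascontiguousarray(embeddings, dtype="float32")
    faiss.normalize_L2(vectors)              # normalizes in place
    index = faiss.IndexFlatIP(vectors.shape[1])
    index.add(vectors)
    return index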

Retrieval Logic

The system uses a hybrid retrieval approach combining:

  • Semantic search via vector similarity
  • BM25 keyword matching for medical terminology
  • A re-ranking step to prioritize the most relevant passages (see the sketch after this list)
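
Below is a minimal sketch of how the first two stages can be fused, assuming the rank_bm25 package for keyword scores and reciprocal rank fusion (RRF) to combine the rankings. The hybrid_retrieve helper, the k values, the RRF constant, and the choice of fusion method are my illustrative assumptions, and the final re-ranking pass is omitted.

# Hedged sketch of hybrid retrieval: FAISS semantic search plus BM25
# keyword matching, fused with reciprocal rank fusion (RRF). The helper
# name, k values, and RRF constant are illustrative; the final
# re-ranking pass is omitted.
import numpy as np
from rank_bm25 import BM25Okapi

def hybrid_retrieve(query, query_embedding, index, chunks, k=5, rrf_c=60):
    # Semantic candidates: nearest neighbors in the FAISS index
    # (query_embedding is assumed L2-normalized to match the index)
    q = np.ascontiguousarray(query_embedding, dtype="float32").reshape(1, -1)
    _, sem_ids = index.search(q, k)
    # Keyword candidates: BM25 over whitespace-tokenized chunks
    bm25 = BM25Okapi([c.split() for c in chunks])
    kw_ids = np.argsort(bm25.get_scores(query.split()))[::-1][:k]
    # RRF: passages ranked highly by either method score well overall
    fused = {}
    for ranking in (sem_ids[0], kw_ids):
        for rank, idx in enumerate(ranking):
            fused[int(idx)] = fused.get(int(idx), 0.0) + 1.0 / (rrf_c + rank + 1)
    top = sorted(fused, key=fused.get, reverse=True)[:k]
    return [chunks[i] for i in top]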

Frontend Interface

The Next.js frontend provides:

  • A clean, chat-like interface for asking questions
  • Toggleable citation view
  • Ability to save conversations for later review
  • Mobile-friendly design for studying on the go

Results & Impact

In preliminary testing with 15 medical students:

  • 87% reported finding answers faster than with traditional resources
  • 92% felt the explanations were more helpful than their usual study materials
  • Average study session length increased by 24 minutes

Future Directions

I’m currently working on:

  1. Expanding the knowledge base to include more specialized resources
  2. Adding a spaced repetition system to automatically generate review questions
  3. Implementing a collaborative feature for study groups
  4. Creating specialized modules for different medical specialties

Try It Yourself

The demo is available at rag-demo-blake.vercel.app. Note that it has a limited knowledge base compared to the full version.

If you’re a medical student interested in beta testing the full version, please reach out via the contact form.