RAG Chatbot Backend

Backend service for the research paper chatbot on Nina Roussille’s website.

Setup

Install dependencies:
```
npm install
```

Set up environment variables:

```
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
```

Ingest PDFs (first time):
```
npm run ingest
```
This will:
- Create a vector store in OpenAI
- Upload all PDFs from ../files/
- Print the VECTOR_STORE_ID (add this to .env)
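
The three steps above could look roughly like the following. This is a minimal sketch, not the actual ingest script. It assumes client is an instance of the official openai Node SDK (recent releases expose vectorStores directly; older ones had it under client.beta.vectorStores). The names ingestPdfs, pdfStreams, existingStoreId, and the store name research-papers are hypothetical.

```javascript
// Sketch only. `client` is assumed to be an OpenAI SDK client instance.
// `pdfStreams` is assumed to be an array of readable streams, for example
// fs.createReadStream("../files/paper.pdf") for each PDF.
async function ingestPdfs(client, pdfStreams, existingStoreId) {
  // Reuse the store if an ID is given; otherwise create a new one.
  let storeId = existingStoreId;
  if (!storeId) {
    const store = await client.vectorStores.create({ name: "research-papers" });
    storeId = store.id;
  }

  // Upload all files as one batch and wait until processing has finished.
  await client.vectorStores.fileBatches.uploadAndPoll(storeId, {
    files: pdfStreams,
  });

  // Print the ID so it can be copied into .env.
  console.log(`VECTOR_STORE_ID=${storeId}`);
  return storeId;
}
```

Passing the client in as a parameter keeps the function easy to try out with a stub before pointing it at a real account.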

Run the server:

```
npm start
# or for development with auto-reload:
npm run dev
```

Deployment

Option 1: Render.com (Recommended)

1. Create an account at render.com
2. Click New → Web Service
3. Connect your GitHub repo
4. Set build command: cd rag-backend && npm install
5. Set start command: cd rag-backend && npm start
6. Add environment variables:
   - OPENAI_API_KEY
   - VECTOR_STORE_ID
   - PORT is set automatically by Render and does not need to be added
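
Because the server depends on these variables, it can help to check them once at startup so a missing value fails fast. A minimal sketch, assuming a hypothetical loadConfig helper and a fallback port of 3000 for local runs (neither is taken from the actual codebase):

```javascript
// Hypothetical startup check; the variable names match the list above.
function loadConfig(env = process.env) {
  const required = ["OPENAI_API_KEY", "VECTOR_STORE_ID"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    openaiApiKey: env.OPENAI_API_KEY,
    vectorStoreId: env.VECTOR_STORE_ID,
    // Use the platform-provided PORT if present; 3000 is an assumed default.
    port: Number(env.PORT) || 3000,
  };
}
```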

Option 2: Railway

1. Create an account at railway.app
2. Click New Project → Deploy from GitHub
3. Select your repo
4. Add the environment variables OPENAI_API_KEY and VECTOR_STORE_ID
5. Deploy

Option 3: Vercel/Netlify Functions

See their serverless function documentation for Express apps.

Updating Papers

When new PDFs are added to ../files/, run npm run ingest again. It adds the new files to the existing vector store.

Or use the GitHub Actions workflow (see .github/workflows/ingest-pdfs.yml)
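
One way to add only new files on a repeated ingest is to compare local file names against those already uploaded. The sketch below is an assumption about how such a comparison could work, not a description of the actual ingest script; findNewPdfs and both input lists are hypothetical.

```javascript
// Return the local PDFs whose names are not yet in the uploaded list.
// Non-PDF files are ignored; the extension check is case-insensitive.
function findNewPdfs(localFiles, uploadedNames) {
  const uploaded = new Set(uploadedNames);
  return localFiles.filter(
    (name) => name.toLowerCase().endsWith(".pdf") && !uploaded.has(name)
  );
}

console.log(findNewPdfs(["a.pdf", "b.pdf", "notes.txt"], ["a.pdf"])); // [ 'b.pdf' ]
```

Matching on file name alone would miss a PDF that was edited but kept its name; comparing a content hash would be a stricter alternative.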

Cost Estimates

- Storage: ~$0.10/GB/month for the vector store
- Ingestion: ~$0.10/GB, charged once when a PDF is ingested
- Queries: ~$0.15-0.60 per 1M input tokens, ~$0.60-1.80 per 1M output tokens
- Estimated monthly total: $5-20 for moderate traffic
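
To see how these rates combine, the sketch below computes a recurring monthly figure from the storage and query rates listed above. The usage numbers (1 GB stored, 10M input tokens, 2M output tokens) are hypothetical and chosen only for the arithmetic; the one-time ingestion charge is left out.

```javascript
// Rates are copied from the list above (USD). Token volumes are given in
// millions. estimateMonthlyCost is a hypothetical helper for illustration.
function estimateMonthlyCost({ storageGb, inputTokensM, outputTokensM, inputRate, outputRate }) {
  const storage = storageGb * 0.10;          // ~$0.10 per GB per month
  const input = inputTokensM * inputRate;    // rate per 1M input tokens
  const output = outputTokensM * outputRate; // rate per 1M output tokens
  return Math.round((storage + input + output) * 100) / 100; // round to cents
}

const usage = { storageGb: 1, inputTokensM: 10, outputTokensM: 2 };
const low = estimateMonthlyCost({ ...usage, inputRate: 0.15, outputRate: 0.60 });
const high = estimateMonthlyCost({ ...usage, inputRate: 0.60, outputRate: 1.80 });
console.log(low, high); // 2.8 9.7
```

With these assumed volumes the recurring cost falls between roughly $2.80 and $9.70 per month, depending on which end of the token price range applies.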

API Endpoints

POST /api/chat - Main chat endpoint
- Body: { message: string, history?: array }
- Returns: { answer: string, citations?: array }
GET /health - Health check
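
A client can call the chat endpoint with a plain JSON POST. The helper below is a minimal sketch that builds the options object for fetch; buildChatRequest is a hypothetical name, and the local URL in the usage comment assumes the server listens on port 3000. The shape of each history entry is not specified above, so the array is passed through unchanged.

```javascript
// Build fetch() options for POST /api/chat.
function buildChatRequest(message, history = []) {
  if (typeof message !== "string" || message.trim() === "") {
    throw new TypeError("message must be a non-empty string");
  }
  if (!Array.isArray(history)) {
    throw new TypeError("history must be an array");
  }
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message, history }),
  };
}

// Usage (assumed local URL):
// const res = await fetch("http://localhost:3000/api/chat",
//   buildChatRequest("What is the paper about?"));
// const { answer, citations } = await res.json();
```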