Upload Documents

Upload documents to index them using your preferred RAG mode.

Current Default RAG Mode: Normal RAG
You can override this setting for individual uploads below.
Supported formats: PDF, TXT, DOCX, HTML
⚠️ Graph RAG takes longer to index but provides richer context.
📊 Extracts a knowledge graph and stores it in Neo4j, without vector embeddings.
🌲 PageIndex RAG builds a hierarchical tree index (like a smart table of contents) from your PDF. Best for long professional documents such as annual reports, research papers, and legal filings.
⏱ Indexing time: ~2–5 min per 100 pages (one-time, LLM-based tree construction). No embeddings are generated.

Normal RAG Process

  1. Extract text from the document
  2. Split into overlapping chunks
  3. Generate embeddings for chunks
  4. Store in vector database
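
The chunking, embedding, and storage steps above can be sketched in a few lines. This is a minimal illustration, not the application's actual pipeline: the function names (`chunk_text`, `toy_embed`, `index_document`), the chunk sizes, and the hashed bag-of-words "embedding" are all assumptions made for the example, and a plain Python list stands in for the vector database.

```python
import hashlib
import math


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character chunks of `size` that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(chunk: str, dims: int = 64) -> list[float]:
    """Placeholder embedding: hashed bag of words, L2-normalised.

    A real pipeline would call an embedding model here instead.
    """
    vec = [0.0] * dims
    for word in chunk.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def index_document(text: str, size: int = 200, overlap: int = 50) -> list[dict]:
    """Chunk, embed, and store records in a list standing in for a vector DB."""
    return [
        {"id": i, "text": chunk, "embedding": toy_embed(chunk)}
        for i, chunk in enumerate(chunk_text(text, size, overlap))
    ]
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence that straddles a boundary is still retrievable from at least one chunk.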

Graph RAG Process

  1. Extract text from the document
  2. Extract entities and relationships
  3. Build knowledge graph
  4. Generate embeddings for graph elements
  5. Store both chunks and graph in database

Note: Graph RAG takes significantly longer to process than Normal RAG because of entity extraction and relationship mapping, but it provides richer context.
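
To make the entity and relationship steps concrete, here is a rough sketch of extracting triples and building a graph from them. It is only an illustration under stated assumptions: a hand-written regular expression stands in for the LLM-based extraction, the relation verbs are invented for the example, and an in-memory dictionary stands in for the graph database.

```python
import re
from collections import defaultdict

# Hypothetical pattern: "<Entity> <relation> <Entity>" with capitalised names.
# A real system would use an LLM or NER model, not a fixed regex.
TRIPLE = re.compile(r"\b([A-Z][a-z]+) (acquired|founded|employs) ([A-Z][a-z]+)\b")


def extract_triples(text: str) -> list[tuple[str, str, str]]:
    """Return (subject, relation, object) triples found in the text."""
    return TRIPLE.findall(text)


def build_graph(triples: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Build an adjacency map: entity -> list of (relation, target entity)."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
        graph.setdefault(obj, [])  # make sure every entity appears as a node
    return dict(graph)


triples = extract_triples("Acme acquired Globex. Globex employs Alice.")
graph = build_graph(triples)
```

Even in this toy form, the extra pass over the text to find entities and relations is where the additional processing time comes from.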

🌲 PageIndex RAG: How It Works (new, vectorless)
Index Phase (one-time, ~2–5 min per 100 pages)
  1. LLM reads the full document
  2. Builds a hierarchical tree (like a Table of Contents)
  3. Each node gets a title, page range, and summary
  4. Tree saved as JSON — no vector DB needed
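
As an illustration of what such a tree might look like on disk, the sketch below models a node with a title, page range, and summary, then serialises it to JSON. The field names (`start_page`, `end_page`, `children`) and the sample sections are assumptions for the example, not the actual PageIndex schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TreeNode:
    title: str
    start_page: int
    end_page: int
    summary: str
    children: list["TreeNode"] = field(default_factory=list)


def tree_to_json(root: TreeNode) -> str:
    """Serialise the whole tree; asdict() recurses into child nodes."""
    return json.dumps(asdict(root), indent=2)


# Hypothetical tree for a 120-page annual report.
tree = TreeNode("Annual Report", 1, 120, "Full report.", [
    TreeNode("Financial Statements", 40, 90, "Balance sheet, income, cash flow.", [
        TreeNode("Notes to the Accounts", 70, 90, "Accounting policies and notes."),
    ]),
    TreeNode("Risk Factors", 91, 120, "Market and operational risks."),
])

tree_json = tree_to_json(tree)
```

Because the index is a single JSON document, it can be stored alongside the PDF and reloaded with `json.loads` at query time.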
Query Phase (~10–20 s)
  1. LLM agent inspects the tree structure
  2. Reasons about which sections are relevant
  3. Fetches exact page text (tight ranges only)
  4. Synthesises answer with traceable page citations
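
The query steps can be approximated as follows. This is a simplified sketch under loud assumptions: keyword overlap stands in for the LLM agent's reasoning, the tree uses hypothetical field names (`start_page`, `end_page`, `children`), and the function returns cited evidence rather than a synthesised answer, which in a real system an LLM would write.

```python
import re

STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "what", "which"}


def keywords(text: str) -> set[str]:
    """Lower-case word set with a few stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def select_nodes(tree: dict, question: str, max_span: int = 10) -> list[dict]:
    """Pick the deepest matching nodes whose page range is at most max_span."""
    terms = keywords(question)
    hits: list[dict] = []

    def visit(node: dict) -> None:
        before = len(hits)
        for child in node.get("children", []):
            visit(child)
        span = node["end_page"] - node["start_page"] + 1
        matches = terms & keywords(f"{node['title']} {node['summary']}")
        # Keep this node only if no descendant was already selected.
        if len(hits) == before and matches and span <= max_span:
            hits.append(node)

    visit(tree)
    return hits


def gather_evidence(tree: dict, pages: dict[int, str], question: str) -> list[dict]:
    """Fetch page text for the selected sections, tagged with page citations."""
    evidence = []
    for node in select_nodes(tree, question):
        for page in range(node["start_page"], node["end_page"] + 1):
            if page in pages:
                evidence.append(
                    {"section": node["title"], "page": page, "text": pages[page]}
                )
    return evidence
```

Preferring the deepest matching node keeps the fetched page ranges tight, which is what makes the resulting citations traceable to specific pages.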
Best for: Annual reports, SEC filings, research papers, legal manuals, technical specifications — any long document where exact section navigation matters. PageIndex achieved 98.7% accuracy on the FinanceBench benchmark.