FlashRAG

GitHub Python MIT

Description

FlashRAG is a Python toolkit for RAG research that ships with 36 benchmark datasets and 23 SOTA RAG algorithms, with modular retrievers, rerankers, generators, and compressors for quick reproduction and custom pipelines.

Key Features

  • Modular components - retriever, reranker, generator, and compressor modules for flexible pipelines
  • 36 benchmark datasets - pre-processed RAG datasets ready for training and evaluation
  • 23 SOTA algorithms - implementations of published RAG methods, 7 of them reasoning-augmented
  • Efficient preprocessing - scripts for corpus processing, index building, and pre-retrieval
  • Accelerated inference - integrates vLLM and FastChat for fast LLM inference and Faiss for vector search
  • Visual UI - graphical interface for configuring and evaluating RAG baselines
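
To make the modular design concrete, here is a minimal, framework-free sketch of how retriever, reranker, compressor, and generator stages can be chained. Every name in it (ToyPipeline, keyword_retriever, echo_generator, CORPUS) is hypothetical and chosen for illustration; none of it is FlashRAG's actual class hierarchy, and the "generator" is a stub standing in for an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical stage signatures, for illustration only.
Retriever = Callable[[str, int], List[str]]       # (query, top_k) -> documents
Reranker = Callable[[str, List[str]], List[str]]  # (query, docs) -> reordered docs
Compressor = Callable[[str, List[str]], str]      # (query, docs) -> condensed context
Generator = Callable[[str], str]                  # prompt -> answer


@dataclass
class ToyPipeline:
    """Chains the stages in order; reranker and compressor are optional."""
    retriever: Retriever
    generator: Generator
    reranker: Optional[Reranker] = None
    compressor: Optional[Compressor] = None
    top_k: int = 2

    def run(self, query: str) -> str:
        docs = self.retriever(query, self.top_k)
        if self.reranker is not None:
            docs = self.reranker(query, docs)
        if self.compressor is not None:
            context = self.compressor(query, docs)
        else:
            context = "\n".join(docs)
        prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
        return self.generator(prompt)


CORPUS = [
    "Beijing is the capital of China.",
    "Paris is the capital of France.",
    "The Great Wall is located in northern China.",
]


def keyword_retriever(query: str, top_k: int) -> List[str]:
    """Rank documents by how many lowercase words they share with the query."""
    q_words = set(query.lower().replace("?", "").split())

    def overlap(doc: str) -> int:
        return len(q_words & set(doc.lower().replace(".", "").split()))

    return sorted(CORPUS, key=lambda d: -overlap(d))[:top_k]


def echo_generator(prompt: str) -> str:
    """Stand-in for an LLM: returns the first line of the supplied context."""
    return prompt.split("\n")[1]


pipeline = ToyPipeline(retriever=keyword_retriever, generator=echo_generator)
answer = pipeline.run("What is the capital of China?")
print(answer)
```

Because each stage is just a callable with a fixed signature, swapping in a different retriever or adding a reranker changes one constructor argument and leaves the rest of the pipeline untouched.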

Use Cases

💡 Reproduce published RAG algorithms as paper baselines
💡 Compare retriever + generator combinations in a unified framework
💡 Quickly assemble custom RAG pipelines for new domains
💡 Run batch evaluations on 36 public RAG benchmarks
💡 Study reasoning-augmented RAG on multi-hop question answering
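
Comparing systems and running batch evaluations comes down to scoring each system's predictions against gold answers. The sketch below uses exact match and token-level F1, two metrics commonly reported for question answering. The helper names (normalize, exact_match, token_f1, evaluate) and the toy systems are assumptions for illustration; the toolkit's own metric implementations may differ in detail.

```python
import re
import string
from collections import Counter
from typing import Callable, Dict, List, Tuple

PUNCTUATION = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred: str, golds: List[str]) -> float:
    """1.0 if the normalized prediction equals any normalized gold answer."""
    return float(any(normalize(pred) == normalize(g) for g in golds))


def token_f1(pred: str, golds: List[str]) -> float:
    """Best token-overlap F1 between the prediction and any gold answer."""
    pred_tokens = normalize(pred).split()
    best = 0.0
    for gold in golds:
        gold_tokens = normalize(gold).split()
        overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
        if overlap == 0:
            continue
        precision = overlap / len(pred_tokens)
        recall = overlap / len(gold_tokens)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def evaluate(
    systems: Dict[str, Callable[[str], str]],
    examples: List[Tuple[str, List[str]]],
) -> Dict[str, Dict[str, float]]:
    """Average EM and F1 per system over (question, gold_answers) pairs."""
    report = {}
    for name, answer_fn in systems.items():
        preds = [answer_fn(question) for question, _ in examples]
        golds = [gold for _, gold in examples]
        n = len(examples)
        report[name] = {
            "em": sum(exact_match(p, g) for p, g in zip(preds, golds)) / n,
            "f1": sum(token_f1(p, g) for p, g in zip(preds, golds)) / n,
        }
    return report


# Toy data and two hypothetical "systems" standing in for RAG configurations.
examples = [
    ("What is the capital of China?", ["Beijing"]),
    ("What is the capital of France?", ["Paris"]),
]
lookup = {"What is the capital of China?": "Beijing",
          "What is the capital of France?": "Paris"}
verbose = {"What is the capital of China?": "The capital is Beijing",
           "What is the capital of France?": "Lyon"}

report = evaluate({"lookup": lookup.get, "verbose": verbose.get}, examples)
for name, scores in report.items():
    print(name, scores)
```

Reporting both metrics is useful because exact match penalizes a correct but wordy answer, while token F1 gives it partial credit.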

Quick Start

# Install FlashRAG
pip install flashrag-dev

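# Run a pipeline on a benchmark split (Python).
# Assumes basic_config.yaml sets the data directory, dataset name,
# retriever index, and generator model, and that a "test" split is configured.
from flashrag.config import Config
from flashrag.pipeline import SequentialPipeline
from flashrag.utils import get_dataset

config = Config(config_file_path="basic_config.yaml")
test_data = get_dataset(config)["test"]

pipeline = SequentialPipeline(config)
# run() expects a dataset object rather than a raw question string
output_dataset = pipeline.run(test_data, do_eval=True)
print(output_dataset.pred)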

Related Projects