PyMuPDF

Active

GitHub Python AGPL-3.0

Description

High-performance Python library for data extraction, analysis, conversion and manipulation of PDF and other document formats.

Related Projects

SAG

1.1k · Python

Stale

SQL-driven RAG engine that automatically builds knowledge graphs at query time, combining SQL querying with Retrieval-Augmented Generation for efficient knowledge retrieval.

python rag tools +2

Airweave

6.3k · Python

Active

Open-source context retrieval layer for AI agents that automatically extracts, indexes, and retrieves structured context from diverse data sources.

python rag agent +2

Modular RAG MCP Server

889 · Python

Normal

A modular RAG system built on an MCP Server architecture. It uses Skill to make the AI follow each step of the spec, so that the code is written 100% by AI.

python rag mcp +2

Docstrange

1.5k · Python

Stale

Extract and convert data from any document (PDFs, images, Word, PPT, URLs) into multiple formats including Markdown, JSON, and CSV.