AgentList

MinerU

Active
GitHub · Python · License: NOASSERTION

Description

Transforms complex documents such as PDFs into LLM-ready Markdown or JSON for agentic workflows, with support for layout analysis, formula recognition, and table extraction.
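
To illustrate what "LLM-ready" Markdown might look like downstream, here is a minimal sketch that splits a Markdown string into heading-scoped chunks, a common first step before indexing text for RAG. It does not call MinerU and makes no claims about MinerU's API or output layout: the function name `split_markdown_by_heading` and the chunk fields are hypothetical, and the input is assumed to be plain Markdown text from any converter.

```python
import re

# Matches ATX-style Markdown headings such as "# Title" or "### Section".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown_by_heading(markdown: str) -> list[dict]:
    """Split Markdown into chunks, one per heading (hypothetical helper).

    Each chunk records the heading text, its level (1-6), and the body
    text beneath it. Text before the first heading becomes a chunk with
    an empty heading and level 0.
    """
    chunks = []
    current = {"heading": "", "level": 0, "lines": []}

    def has_content(chunk: dict) -> bool:
        # A chunk is worth keeping if it has a heading or non-blank text.
        return bool(chunk["heading"]) or any(l.strip() for l in chunk["lines"])

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            if has_content(current):
                chunks.append(current)
            current = {
                "heading": match.group(2).strip(),
                "level": len(match.group(1)),
                "lines": [],
            }
        else:
            current["lines"].append(line)

    if has_content(current):
        chunks.append(current)

    return [
        {
            "heading": c["heading"],
            "level": c["level"],
            "text": "\n".join(c["lines"]).strip(),
        }
        for c in chunks
    ]


if __name__ == "__main__":
    sample = "# Title\n\nIntro text.\n\n## Table\n\n| a | b |\n"
    for chunk in split_markdown_by_heading(sample):
        print(chunk["level"], chunk["heading"], "->", repr(chunk["text"]))
```

This sketch treats any line starting with `#` followed by whitespace as a heading, so it would also split on `#` comment lines inside fenced code blocks; a production pipeline would likely use a real Markdown parser instead.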

Tags

data-processing rag python llm tools

Categories

📚 RAG Tools

Project Metrics

Stars 66.2k
Forks 5.6k
Watchers 66.2k
Issues 18
Created February 29, 2024
Last commit June 2, 2026

Deployment

Local

Related Projects

Quivr

39.2k · Python
Stale

Opinionated RAG framework for integrating GenAI into your apps. Works with any LLM, any vectorstore, any files — so you can focus on your product instead of building RAG pipelines.

rag python vector-database +3

Unstract

6.6k · Python
Active

LLM-driven extraction of unstructured data, built for API deployments and ETL pipeline workflows. Automates document parsing, PDF extraction, and data processing using LLMs.

data-processing rag python +3

SAG

1.1k · Python
Stale

SQL-Driven RAG Engine that automatically builds knowledge graphs during querying, combining SQL query capabilities with Retrieval-Augmented Generation for efficient knowledge retrieval.

python rag tools +2

Airweave

6.4k · Python
Active

Open-source context retrieval layer for AI agents that automatically extracts, indexes, and retrieves structured context from diverse data sources.

python rag agent +2

© 2026 AgentList. All rights reserved.