AgentList

MinerU

Active
GitHub · Python · License: NOASSERTION

Description

Transforms complex documents such as PDFs into LLM-ready Markdown or JSON for agentic workflows, with support for layout analysis, formula recognition, and table extraction.

Tags

data-processing rag python llm tools

Categories

📚 RAG Tools

Project Metrics

Stars 60.3k
Forks 5.0k
Watchers 60.3k
Issues 79
Created February 29, 2024
Last commit April 17, 2026

Deployment

Local

Related Projects

PDFMathTranslate

33.2k · Python
Active

AI-powered PDF scientific paper translation with preserved formats, supporting Google/DeepL/Ollama/OpenAI services via CLI/GUI/MCP/Docker/Zotero.

rag python tools +2

Quivr

39.1k · Python
Stale

Opinionated RAG framework for integrating GenAI into your apps. Works with any LLM, any vectorstore, any files — so you can focus on your product instead of building RAG pipelines.

rag python vector-database +3

Unstract

6.5k · Python
Active

LLM-driven extraction of unstructured data, built for API deployments and ETL pipeline workflows. Automates document parsing, PDF extraction, and intelligent data processing with LLM-powered intelligence.

data-processing rag python +3

document.ai

3.7k · Python
Stale

A universal local knowledge base solution based on vector databases and GPT, providing one-stop document processing with vectorization, semantic search, and intelligent Q&A for building private knowledge bases.

rag vector-database python +2

Curated directory of open-source AI agent projects

© 2026 AgentList. All rights reserved.
