AgentList

Docstrange

Stale
GitHub · Python · MIT

Description

Extract and convert data from any document (PDFs, images, Word, PPT, URLs) into multiple formats including Markdown, JSON, and CSV.

Tags

python rag tools data-processing agent

Categories

📚 RAG Tools ⚡ Agent Tools

Project Metrics

Stars 1.5k
Forks 131
Watchers 1.5k
Issues 34
Created July 31, 2025
Last commit October 31, 2025

Deployment

Local

Related Projects

LangExtract

36.8k · Python
Active

A Python library by Google for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization, designed for data annotation and knowledge extraction workflows.

data-processing llm python +2

Airweave

6.4k · Python
Active

Open-source context retrieval layer for AI agents that automatically extracts, indexes, and retrieves structured context from diverse data sources.

python rag agent +2

PDFMathTranslate

34.4k · Python
Active

AI-powered PDF scientific paper translation with preserved formats, supporting Google/DeepL/Ollama/OpenAI services via CLI/GUI/MCP/Docker/Zotero.

rag python tools +2

Crawlee

23.6k · TypeScript
Active

A web scraping and browser automation library for Node.js to build reliable crawlers, supporting Puppeteer, Playwright, Cheerio, and raw HTTP. Extract data for AI, LLMs, RAG, or GPTs with proxy rotation and both headful and headless modes.

typescript javascript data-processing +3
© 2026 AgentList. All rights reserved.