AgentList

Docstrange

Active
GitHub Python MIT

Description

Extract and convert data from any document (PDFs, images, Word, PPT, URLs) into multiple formats including Markdown, JSON, and CSV.

Tags

python rag tools data-processing agent

Categories

📚 RAG Tools ⚡ Agent Tools

Project Metrics

Stars 1.4k
Forks 126
Watchers 12
Issues 32
Created July 31, 2025
Last commit April 17, 2026

Deployment

Local

Related Projects

LangExtract

35.7k · Python
Active

A Python library by Google for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization, designed for data annotation and knowledge extraction workflows.

data-processing llm python +2

PDFMathTranslate

33.2k · Python
Active

AI-powered PDF scientific paper translation with preserved formats, supporting Google/DeepL/Ollama/OpenAI services via CLI/GUI/MCP/Docker/Zotero.

rag python tools +2

Crawlee

22.8k · TypeScript
Active

A web scraping and browser automation library for Node.js to build reliable crawlers, supporting Puppeteer, Playwright, Cheerio, and raw HTTP. Extract data for AI, LLMs, RAG, or GPTs with proxy rotation and both headful and headless modes.

typescript javascript data-processing +3

txtai

12.4k · Python
Active

All-in-one AI framework for semantic search, LLM orchestration, and language model workflows, with agent support, RAG, and a vector database.

semantic-search rag embeddings +4
Curated directory of open-source AI agent projects

© 2026 AgentList. All rights reserved.
