Parsr

Normal

GitHub · JavaScript · Apache-2.0

Description

Transforms PDFs, documents, and images into enriched, structured data with table recognition, reading order restoration, and Markdown output.

Related Projects

Crawlee

23.6k · TypeScript

Active

A web scraping and browser automation library for Node.js to build reliable crawlers, supporting Puppeteer, Playwright, Cheerio, and raw HTTP. Extract data for AI, LLMs, RAG, or GPTs with proxy rotation and both headful and headless modes.

typescript · javascript · data-processing · +3

Unstract

6.6k · Python

Active

LLM-driven extraction of unstructured data, built for API deployments and ETL pipelines. Automates document parsing, PDF extraction, and data processing.

data-processing · rag · python · +3

SAG

1.1k · Python

Stale

SQL-Driven RAG Engine that automatically builds knowledge graphs during querying, combining SQL query capabilities with Retrieval-Augmented Generation for efficient knowledge retrieval.

python · rag · tools · +2

Airweave

6.4k · Python

Active

Open-source context retrieval layer for AI agents that automatically extracts, indexes, and retrieves structured context from diverse data sources.

python · rag · agent · +2
