AgentList

Crawlee

Active
GitHub · TypeScript · Apache-2.0

Description

A web scraping and browser automation library for Node.js for building reliable crawlers. It supports Puppeteer, Playwright, Cheerio, and raw HTTP, runs browsers in headful or headless mode, and includes proxy rotation. A typical use is extracting data for AI, LLM, RAG, or GPT applications.

Tags

typescript javascript data-processing tools rag automation

Categories

📚 RAG Tools ⚡ Agent Tools

Project Metrics

Stars 23.6k
Forks 1.4k
Watchers 23.6k
Issues 174
Created August 26, 2016
Last commit June 2, 2026

Deployment

Local

Related Projects

Parsr

6.2k · JavaScript
Normal

Transforms PDF, documents and images into enriched structured data with table recognition, reading order restoration, and Markdown output.

javascript rag tools +2

Docstrange

1.5k · Python
Stale

Extract and convert data from any document (PDFs, images, Word, PPT, URLs) into multiple formats including Markdown, JSON, and CSV.

python rag tools +2

Zerox

12.2k · TypeScript
Stale

OCR and document extraction tool using vision models, efficiently converting PDFs and images into structured text.

typescript rag tools +2

WrenAI

15.4k · Python
Active

Open-source text-to-SQL and text-to-chart GenBI agent with a semantic layer. Ask your database questions in natural language and get accurate SQL, charts, and BI insights. Supports 12+ data sources and any LLM.

llm typescript agent +2

© 2026 AgentList. All rights reserved.
