Crawlee Python

Active

GitHub Python Apache-2.0

Description

Crawlee for Python is Apify's web scraping and browser automation library, designed for reliable data collection with plain HTTP requests or with headful and headless browsers.

Key Features

Unified API for HTTP scraping, headless browsers, and Playwright-based crawlers
Automatic request queuing, retries, throttling, and proxy rotation
Pluggable HTTP clients: httpx, curl-impersonate, and raw socket
Browser fingerprint management and stealth mode to bypass anti-bot defenses
Dataset and Key-Value Store integrations for structured storage of crawl results
Native interoperability with the Apify platform for deploying crawlers to the cloud
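To make the queuing, retry, and proxy-rotation feature more concrete, here is a minimal standard-library sketch of the kind of bookkeeping such a crawler automates. It is not Crawlee's implementation: the `crawl` function, its parameters, and the `flaky_fetch` stand-in are all hypothetical names chosen for illustration, throttling is left out for brevity, and the proxy list is assumed to be non-empty.

```python
from collections import deque
from itertools import cycle
from typing import Callable


def crawl(
    start_urls: list[str],
    fetch: Callable[[str, str], str],
    proxies: list[str],
    max_retries: int = 3,
) -> tuple[dict[str, str], list[str]]:
    """Drain a deduplicated FIFO queue, retrying failures on the next proxy.

    Hypothetical illustration only; this is not how Crawlee is implemented.
    """
    queue = deque(dict.fromkeys(start_urls))  # dedupe while keeping order
    proxy_pool = cycle(proxies)               # round-robin proxy rotation
    results: dict[str, str] = {}
    failed: list[str] = []
    failures: dict[str, int] = {}

    while queue:
        url = queue.popleft()
        proxy = next(proxy_pool)
        try:
            results[url] = fetch(url, proxy)
        except Exception:
            failures[url] = failures.get(url, 0) + 1
            if failures[url] <= max_retries:
                queue.append(url)   # re-queue for another attempt
            else:
                failed.append(url)  # retries exhausted, give up
    return results, failed


if __name__ == "__main__":
    seen: set[str] = set()

    def flaky_fetch(url: str, proxy: str) -> str:
        # Stand-in for a real HTTP call: fail the first time each URL is seen.
        if url not in seen:
            seen.add(url)
            raise ConnectionError(url)
        return f"fetched {url} via {proxy}"

    print(crawl(["https://example.com/a", "https://example.com/a"],
                flaky_fetch, ["proxy-1", "proxy-2"]))
```

In this sketch the duplicate start URL is queued once, the first attempt fails on `proxy-1`, and the retry succeeds on `proxy-2`. With a library such as Crawlee this loop does not have to be written by hand.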

Use Cases

💡 Building production web crawlers for e-commerce price monitoring

💡 Scraping JavaScript-rendered pages that require a real browser

💡 Feeding structured web data into RAG pipelines and downstream LLM agents

💡 Authoring reliable long-running crawlers with built-in retries and proxy management

💡 Migrating Node.js Crawlee projects to Python while keeping the same conceptual model

Quick Start

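pip install 'crawlee[playwright]'
playwright install

import asyncio
# Releases before 0.5 exposed this class from crawlee.playwright_crawler.
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext

async def main() -> None:
    crawler = PlaywrightCrawler()

    @crawler.router.default_handler
    async def handle(context: PlaywrightCrawlingContext) -> None:
        # The crawler has already loaded context.request.url in context.page.
        title = await context.page.title()
        await context.push_data({"url": context.request.url, "title": title})

    await crawler.run(["https://example.com"])

asyncio.run(main())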

Related Projects

Firecrawl

Vision Agents

browser-harness

Maxun