Crawlee Python
Active
Description
Crawlee for Python is Apify's web scraping and browser automation library, designed for reliable data collection with headful or headless browsers.
Key Features
- Unified API for HTTP-based, headless-browser, and Playwright-based crawlers
- Automatic request queuing, retries, throttling, and proxy rotation
- Pluggable HTTP clients: httpx, curl-impersonate, and raw socket
- Browser fingerprint management and stealth mode to bypass anti-bot defenses
- Dataset and Key-Value Store integrations for structured storage of crawl results
- Native interoperability with the Apify platform for deploying crawlers to the cloud
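To make the "automatic request queuing and retries" feature more concrete, here is a minimal standard-library sketch of the general idea: deduplicate URLs, process them from a queue, and re-queue failures a limited number of times. It is an illustration only, not Crawlee's implementation; the names `crawl_with_retries`, `fetch`, and `max_retries` are hypothetical and do not come from the library.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable


async def crawl_with_retries(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    max_retries: int = 3,
) -> tuple[dict[str, str], list[str]]:
    """Process each unique URL, re-queuing failures up to max_retries times."""
    queue: deque[tuple[str, int]] = deque()
    seen: set[str] = set()
    for url in urls:
        if url not in seen:  # deduplicate before enqueuing
            seen.add(url)
            queue.append((url, 0))

    results: dict[str, str] = {}
    failed: list[str] = []
    while queue:
        url, retries = queue.popleft()
        try:
            results[url] = await fetch(url)
        except Exception:
            if retries < max_retries:
                # Retry later by moving the request to the back of the queue.
                queue.append((url, retries + 1))
            else:
                # Give up once the retry budget is exhausted.
                failed.append(url)
    return results, failed


async def demo() -> None:
    """Run the sketch against a fake fetcher that fails once per URL."""
    attempts: dict[str, int] = {}

    async def flaky_fetch(url: str) -> str:
        attempts[url] = attempts.get(url, 0) + 1
        if attempts[url] == 1:
            raise ConnectionError("temporary failure")
        return f"<html>{url}</html>"

    results, failed = await crawl_with_retries(
        ["https://example.com", "https://example.com"], flaky_fetch
    )
    print(results, failed)


if __name__ == "__main__":
    asyncio.run(demo())
```

A real crawler would also add throttling, back-off between retries, and concurrency, which this sketch leaves out to keep the queue-and-retry logic visible.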
Quick Start
pip install 'crawlee[playwright]'
playwright install
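
import asyncio
from crawlee.playwright_crawler import PlaywrightCrawler  # crawlee.crawlers in newer releases

async def main():
    crawler = PlaywrightCrawler()
    @crawler.router.default_handler
    async def handle(context):
        # The crawler has already navigated to context.request.url, so no goto() is needed.
        title = await context.page.title()
        await context.push_data({"url": context.request.url, "title": title})

    # run() is a coroutine, so it must be awaited inside an async function.
    await crawler.run(["https://example.com"])

asyncio.run(main())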
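After a run, the records passed to `push_data` can be inspected from disk. The sketch below assumes the default local storage layout, in which each item is written as a separate JSON file under `./storage/datasets/default/` alongside a `__metadata__.json` file. Both the path and the metadata file name are assumptions about the default configuration and may differ between versions; `load_dataset_items` is a hypothetical helper, not part of Crawlee.

```python
import json
from pathlib import Path
from typing import Any


def load_dataset_items(dataset_dir: Path) -> list[dict[str, Any]]:
    """Read every JSON record in dataset_dir, in file-name order."""
    items: list[dict[str, Any]] = []
    for path in sorted(Path(dataset_dir).glob("*.json")):
        # Assumption: files starting with "__" hold metadata, not records.
        if path.name.startswith("__"):
            continue
        items.append(json.loads(path.read_text(encoding="utf-8")))
    return items


if __name__ == "__main__":
    # Assumed default location; adjust if storage is configured differently.
    for item in load_dataset_items(Path("storage/datasets/default")):
        print(item.get("url"), item.get("title"))
```

Reading the files directly is only a convenience for quick checks; for anything beyond that, the library's own dataset interface is the more reliable route.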