Crawlee Python

Active

GitHub Python Apache-2.0

Introduction

Crawlee Python is a web scraping and browser automation library from Apify, focused on reliable data collection and crawling.

Key features

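Unified API covering plain HTTP crawling, headless browsers, and Playwright-based crawlers
Automatic request queueing, retries, rate limiting, and proxy rotation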
Pluggable HTTP clients, such as httpx and curl-impersonate
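Browser fingerprint management and stealth options that help get past anti-bot protections
Dataset and key-value store integration for saving scraped results in structured form
Native integration with the Apify platform for one-click deployment of crawlers to the cloud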
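
To make the queueing, retry, and proxy-rotation features above more concrete, here is a minimal standard-library sketch of the general idea. It is not Crawlee's implementation or API: the `crawl` function, the `fake_fetch` helper, the in-memory `SITE`, and the proxy names are all hypothetical and exist only so the example runs without network access.

```python
from collections import deque
from itertools import cycle


def crawl(start_urls, fetch, proxies, max_retries=2):
    """Drain a FIFO request queue, retrying failed URLs with the next proxy."""
    queue = deque((url, 0) for url in start_urls)  # (url, failed attempts so far)
    seen = set(start_urls)       # URLs already enqueued, used for de-duplication
    proxy_pool = cycle(proxies)  # simple round-robin proxy rotation
    results, failed = {}, []

    while queue:
        url, attempts = queue.popleft()
        proxy = next(proxy_pool)
        try:
            body, links = fetch(url, proxy)
        except Exception:
            if attempts < max_retries:
                queue.append((url, attempts + 1))  # put it back for a retry
            else:
                failed.append(url)                 # retry budget exhausted
            continue
        results[url] = body
        for link in links:
            if link not in seen:                   # enqueue each URL only once
                seen.add(link)
                queue.append((link, 0))
    return results, failed


# Hypothetical in-memory "site": each path maps to the links found on it.
SITE = {"/": ["/a", "/b"], "/a": ["/"], "/b": []}
flaky = {"/a": 1}  # "/a" fails once before succeeding
calls = []         # records (url, proxy) for every fetch attempt


def fake_fetch(url, proxy):
    calls.append((url, proxy))
    if flaky.get(url, 0) > 0:
        flaky[url] -= 1
        raise ConnectionError(f"{proxy} was blocked")
    return f"page {url}", SITE[url]


results, failed = crawl(["/"], fake_fetch, ["p1", "p2", "p3"])
print(sorted(results), failed)
```

In this sketch a failed request goes to the back of the queue and is retried with whichever proxy is next in the rotation, so a single blocked proxy does not stall the crawl. A real crawler would add persistence, rate limiting, and concurrency on top of this.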

Use cases

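💡 Build production-grade web crawlers for e-commerce price monitoring

💡 Scrape JavaScript-rendered pages that need a real browser

💡 Feed structured web data into RAG pipelines and downstream LLM agents

💡 Write crawlers that run reliably for long periods, with retries and proxy management built in

💡 Migrate an existing Node.js Crawlee project to Python while keeping the same mental model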

Categories

⚡ Agent Tools 🌐 Browser Agents

Quick start

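pip install 'crawlee[playwright]'
playwright install

import asyncio
# crawlee >= 0.5; older releases exposed this as crawlee.playwright_crawler
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext

async def main() -> None:
    crawler = PlaywrightCrawler()

    # The crawler has already navigated to the request URL when this runs.
    @crawler.router.default_handler
    async def handle(context: PlaywrightCrawlingContext) -> None:
        title = await context.page.title()
        await context.push_data({"url": context.request.url, "title": title})

    await crawler.run(["https://example.com"])

asyncio.run(main())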

Related projects

Firecrawl

Vision Agents

browser-harness

Maxun