Firecrawl
Firecrawl is the Web Data API for AI, turning web pages into clean, structured, LLM-friendly data with crawl, scrape, and search capabilities.
Tools for retrieval-augmented generation
LangChain is the open-source agent engineering platform that unifies model IO, tool calling, RAG, memory and observability under one composable framework.
llama.cpp is a lightweight C/C++ inference engine that runs a wide range of open-source large language models efficiently on consumer hardware.
100+ AI Agent and RAG apps you can actually run — clone, customize, and ship. A great reference for quickly building LLM-powered applications.
Supabase's built-in pgvector search, turning Postgres into a RAG database.
A high-throughput and memory-efficient inference and serving engine for LLMs, featuring PagedAttention, continuous batching, and optimized KV cache management for production deployments.
A leading open-source RAG engine that fuses cutting-edge retrieval-augmented generation with agent capabilities to create a superior context layer for LLMs.
Comprehensive guides, papers, lessons, notebooks and resources for prompt engineering, context engineering, RAG, and AI Agents.
Transforms complex documents like PDFs into LLM-ready Markdown/JSON for agentic workflows, supporting layout analysis, formula recognition, and table extraction.
A comprehensive tutorial on AI agent principles and practice, systematically covering core concepts, framework usage and hands-on projects.
Pathway is a Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG applications.
Docling is an open-source document processing tool by IBM that converts PDF, Word, PPT, HTML and more into structured data for AI, purpose-built for GenAI and RAG pipelines.
AI-driven public opinion and trend monitor with multi-platform aggregation, RSS subscriptions, smart keyword filtering, AI-powered news analysis and briefings, supporting MCP integration and push notifications via WeChat, Feishu, DingTalk, Telegram and more.
Embedchain is a universal memory layer for AI agents, enabling quick integration of diverse data sources into LLMs for context-aware AI applications.
Mem0 is a long-term memory layer for AI agents, supporting cross-session memory management and personalized context retrieval.
Ready-to-run cloud templates for RAG, AI pipelines and enterprise search with live data, always in sync with Sharepoint, Google Drive, S3, Kafka and more.
Context7 is Upstash's context-engineering toolkit for agents, helping applications manage long context windows, retrieval injection, and history compression.
CodeGraph is a context graph for coding agents, mapping how a codebase is wired together so LLM-driven tools can navigate dependencies and produce more accurate edits.
LLM-powered stock analysis system for A/H/US markets with multi-source quotes, real-time news, LLM decision dashboard and multi-channel push notifications.
Leading data framework for LLM applications, with unified RAG, Agent, and Workflow capabilities.
Data framework for LLM apps specializing in RAG and agent data integration.
LlamaIndex is a data framework for building LLM applications. It provides data connectors, indexing, query engines, and agent workflow orchestration — a core tool in the RAG ecosystem.
LlamaIndex is a data framework that provides the data connection layer for LLM applications, with strong RAG capabilities across diverse data sources and vector databases.
Open-source AI engine that runs any model (LLMs, vision, voice, image, video) on any hardware, with no GPU required. Provides an OpenAI-compatible API for fully local, privacy-first AI inference.
A systematic comparison of the three categories of agent memory (working, long-term, and shared), covering storage media, lifecycle, retrieval methods, typical frameworks, and design patterns, with attention to the engineering of agent personalization and multi-agent collaboration.
A deep dive into the four-layer agent memory architecture, with practical code for vector retrieval and memory compression to help you build scalable long-term memory systems.
Exploring how small language models are fine-tuned and deployed for agent workloads at the edge, balancing latency, cost, and accuracy for production AI agents.
A systematic guide to seven tool-call fault tolerance patterns that keep agents stable in unstable real-world environments: timeout hierarchy, exponential backoff with jitter, circuit breakers, fallback provider chains, recoverable error classification, structured validation, and idempotency keys.
Learn how to build stateful AI agents with long-term memory using Letta (formerly MemGPT), solving the LLM context window limitation.
Long-conversation agents fail at context management, not model capability. A systematic comparison of sliding window, retrieval injection, and layered compression strategies with practical decay diagnosis and recovery patterns.