Embedchain

Active
GitHub Python Apache-2.0

Description

Embedchain is a universal memory layer for AI agents. It loads diverse data sources, indexes them, and supplies the relevant content to LLMs as context, so you can build context-aware AI applications quickly.

Key Features

  • Universal memory layer: integrates diverse data sources (web pages, PDFs, YouTube, Notion) and supplies them to LLMs as context
  • Vectorized storage: automatically chunks and embeds data, stores it in a vector database, and retrieves it by semantic similarity
  • Multi-LLM backend support: works with OpenAI, Cohere, Ollama, and other LLM and embedding providers
  • Simple API: data loading, indexing, and querying take about three lines of code end to end
  • Multi-database adapters: supports Chroma, Pinecone, Qdrant, Weaviate, and other mainstream vector databases
  • Streaming responses: streams LLM output as it is generated, so users see answers sooner
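
To make the chunk, embed, store, and retrieve flow concrete, here is a minimal standard-library sketch. It is not Embedchain's implementation: word-count vectors stand in for a real embedding model, an in-memory list stands in for a vector database, and the names chunk_text, embed, cosine, and ToyVectorStore are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=8, overlap=2):
    """Split text into chunks of `chunk_size` words that overlap by `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text):
    """Toy embedding: a sparse word-count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self.entries = []  # list of (chunk, vector) pairs

    def add(self, text, chunk_size=8, overlap=2):
        for chunk in chunk_text(text, chunk_size, overlap):
            self.entries.append((chunk, embed(chunk)))

    def search(self, query, k=1):
        """Return the k stored chunks most similar to the query, best first."""
        query_vector = embed(query)
        scored = [(chunk, cosine(query_vector, vector)) for chunk, vector in self.entries]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = ToyVectorStore()
store.add(
    "Chroma stores embeddings on local disk. "
    "Pinecone is a managed cloud service for vector search.",
    chunk_size=6,
    overlap=0,
)
best_chunk, score = store.search("Which service is managed in the cloud?")[0]
print(best_chunk)
```

A real deployment replaces embed with a learned embedding model and the list scan with an approximate nearest-neighbor index, but the data flow is the same.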

Use Cases

💡 Rapid RAG application: connect a document collection to an LLM to build a private Q&A system in minutes
💡 Personal knowledge assistant: combine Notion pages, web bookmarks, and PDF notes into a conversational knowledge tool
💡 Customer service knowledge base: index product docs and FAQs so agents can retrieve precise answers
💡 Code documentation Q&A: index project docs and API references so they can be queried in natural language
💡 Multi-source information aggregation: extract content from web pages, video subtitles, and local files into a unified semantic index
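
For the multi-source cases above, a loader first has to decide what kind of source each input is. The sketch below shows one way to do that routing with simple URL and file-extension rules. The function names and the type labels ("web_page", "pdf_file", "youtube_video", "notion", "text") are illustrative assumptions, not a description of how Embedchain detects sources.

```python
from urllib.parse import urlparse


def _host_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def guess_source_type(source):
    """Guess a source type label from a URL or local path (illustrative rules only)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = parsed.netloc.lower()
        if _host_matches(host, "youtube.com") or _host_matches(host, "youtu.be"):
            return "youtube_video"
        if _host_matches(host, "notion.so"):
            return "notion"
        if parsed.path.lower().endswith(".pdf"):
            return "pdf_file"
        return "web_page"
    if source.lower().endswith(".pdf"):
        return "pdf_file"
    return "text"  # fall back to treating the input as raw text


def group_sources(sources):
    """Group sources by guessed type so each group can go to a matching loader."""
    groups = {}
    for source in sources:
        groups.setdefault(guess_source_type(source), []).append(source)
    return groups


plan = group_sources([
    "https://www.example.com/docs",
    "https://www.youtube.com/watch?v=abc123",  # hypothetical video ID
    "notes/meeting.pdf",                       # hypothetical local file
])
print(plan)
```

Grouping before loading keeps each loader simple: a PDF parser never sees a URL, and a subtitle extractor never sees a local file.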

Quick Start

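# Install Embedchain (run this line in a shell, not in Python)
pip install embedchain

# Quick start: build a RAG app in 3 steps (Python)
# The OpenAI provider needs an API key in the OPENAI_API_KEY environment variable.
from embedchain import App

# 1. Create the app with an explicit LLM and vector database
app = App.from_config(config={
    "llm": {"provider": "openai", "config": {"model": "gpt-4o-mini"}},
    "vectordb": {"provider": "chroma"}
})

# 2. Load a web page as a data source
app.add("https://www.example.com/docs")

# 3. Ask a question about the loaded content
answer = app.query("What is the main content of this document?")
print(answer)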
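
Because the Quick Start configuration is a plain dictionary, switching the LLM or vector database means changing only the provider entries. The helper below is a hypothetical convenience function, not part of Embedchain, that builds a dictionary in the same shape. The "stream" key and the "llama3" model name are assumptions; check the library's configuration reference before relying on them.

```python
def build_config(llm_provider="openai", model="gpt-4o-mini",
                 vectordb_provider="chroma", stream=False):
    """Return a config dict shaped like the one in the Quick Start example."""
    llm_config = {"model": model}
    if stream:
        # Assumed key name for enabling streamed LLM output.
        llm_config["stream"] = True
    return {
        "llm": {"provider": llm_provider, "config": llm_config},
        "vectordb": {"provider": vectordb_provider},
    }


# Same settings as the Quick Start example.
default_config = build_config()

# A local model with a different vector database and streaming turned on.
local_config = build_config(
    llm_provider="ollama",
    model="llama3",  # hypothetical model name
    vectordb_provider="qdrant",
    stream=True,
)
print(local_config)
```

Either dictionary can then be passed as App.from_config(config=...) in place of the inline dictionary shown in the Quick Start.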

Related Projects