Chroma

Active
GitHub Rust Apache-2.0

Description

Chroma is an open-source AI-native embedding database designed for building LLM applications. It provides simple APIs to store embeddings and perform similarity search, making it ideal for RAG applications.

Key Features

  • Minimal core API — only 4 functions: create_collection, add, query, get — up and running in 5 minutes
  • Auto embedding — automatically handles tokenization, embedding, and indexing on add, no manual processing
  • Metadata filtering — precise filtering by metadata fields and full-text document search
  • Hybrid search — supports combined vector similarity search and full-text retrieval modes
  • Multi-language clients — Python and JavaScript/TypeScript clients, pip/npm one-click install
  • Persistent storage — in-memory mode for prototyping, persistent mode for production, chroma run for server

Use Cases

💡 RAG application backend — provide vector retrieval for LLMs, enable Q&A over private data
💡 Semantic search — similarity retrieval for unstructured data like documents and image descriptions
💡 Recommendation systems — personalized recommendations based on user behavior embeddings
💡 Knowledge graph supplement — store entity embeddings for semantic-level knowledge association queries
💡 Rapid prototyping — quickly validate retrieval effectiveness of AI apps using in-memory mode

Categories

Quick Start

pip install chromadb

import chromadb

# Start in-memory for quick prototyping
client = chromadb.Client()
collection = client.create_collection('my-docs')

# Add documents (auto-embedding)
collection.add(
    documents=['This is document 1', 'This is document 2'],
    metadatas=[{'source': 'notion'}, {'source': 'google-docs'}],
    ids=['doc1', 'doc2']
)

# Query top 2 most similar results
results = collection.query(
    query_texts=['query document'],
    n_results=2
)
print(results)

Related Projects

Related Articles