vLLM
A high-throughput and memory-efficient inference and serving engine for LLMs, featuring PagedAttention, continuous batching, and optimized KV cache management for production deployments.
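The core idea behind PagedAttention can be sketched in a few lines: the KV cache is split into fixed-size blocks, and a per-sequence block table maps logical positions to physical blocks, so cache memory need not be contiguous and finished sequences return their blocks to a shared pool. This is a conceptual toy, not vLLM's actual implementation; the block size and class names are invented for illustration.

```python
# Conceptual sketch (not vLLM's real code) of PagedAttention-style KV-cache
# paging: fixed-size blocks, a per-sequence block table, and a free pool.
BLOCK_SIZE = 4  # tokens per KV block; illustrative, not vLLM's default

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))  # physical block pool
        self.block_tables = {}                      # seq_id -> [physical blocks]
        self.seq_lens = {}                          # seq_id -> tokens written

    def append_token(self, seq_id: int) -> int:
        """Reserve cache space for one new token; return its physical block."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.seq_lens.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:                     # current block full: allocate
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1
        return table[-1]

    def free_sequence(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(6):                                  # 6 tokens span 2 blocks
    cache.append_token(seq_id=0)
print(len(cache.block_tables[0]))                   # -> 2
```

Because allocation is per block rather than per maximum sequence length, memory is wasted only in the last partially filled block of each sequence, which is what makes continuous batching of many sequences memory-efficient.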
Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud. Supports fine-tuning, quantization, and distributed inference for production-grade LLM deployment.
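Serving engines like the two above expose the OpenAI chat-completions wire format, so any OpenAI-style client can talk to them. A minimal stdlib-only sketch of building such a request is below; the base URL, port, and model name are assumptions for illustration and should match however you launched the server.

```python
import json
import urllib.request

# Assumed local endpoint of an OpenAI-compatible server; adjust to your setup.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(prompt: str, model: str = "my-served-model"):
    """Build an OpenAI-style /v1/chat/completions request (not yet sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Summarize continuous batching in one sentence.")
# urllib.request.urlopen(req) would return an OpenAI-style JSON response
# when a server is actually listening at BASE_URL.
print(req.full_url)
```

The same request body works against any of these engines, which is the practical payoff of OpenAI compatibility: swapping backends does not require changing client code.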
A comprehensive single-package Retrieval-Augmented Generation platform built on Langflow, Docling, and OpenSearch, providing a complete pipeline from document parsing to vector retrieval and generation with multi-model and multi-vector-database support.
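The retrieval step in a pipeline like this can be illustrated without any of the named dependencies: a toy bag-of-words cosine similarity stands in for a real embedding model plus an OpenSearch vector index, ranking document chunks against a query. All data and function names here are invented for the sketch.

```python
import math
from collections import Counter

# Toy RAG retrieval: bag-of-words cosine similarity stands in for
# embeddings + a vector database. Real pipelines use learned embeddings.
def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    qv = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(qv, vectorize(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Invoices are stored as parsed PDF documents.",
    "The vector index holds one embedding per chunk.",
    "Generation is handled by the configured LLM.",
]
print(retrieve("where is the vector index embedding stored", chunks))
```

The retrieved chunks are then passed to the generation model as context, which is the "retrieval-augmented" part of the pipeline.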
Open-source text-to-SQL and text-to-chart GenBI agent with a semantic layer. Ask your database questions in natural language and get accurate SQL, charts, and BI insights. Supports 12+ data sources and any LLM.
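The semantic-layer idea can be shown with a toy example: business terms map to vetted measures and tables, so the SQL is assembled from trusted pieces instead of generated free-form. This is a deliberately simplified sketch using stdlib sqlite3, not the agent's actual implementation; the schema and term mappings are invented.

```python
import sqlite3

# Toy semantic layer: business vocabulary -> (vetted measure, table).
# Grounding generation in such a layer is what keeps the SQL accurate.
SEMANTIC_LAYER = {
    "revenue": ("SUM(amount)", "orders"),
    "order count": ("COUNT(*)", "orders"),
}

def question_to_sql(question: str) -> str:
    """Map a natural-language question onto a vetted SQL template."""
    q = question.lower()
    for term, (measure, table) in SEMANTIC_LAYER.items():
        if term in q:
            return f"SELECT {measure} FROM {table}"
    raise ValueError("no semantic-layer term matched")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (15.5,)])
sql = question_to_sql("What is our total revenue?")
print(sql, "->", conn.execute(sql).fetchone()[0])  # -> 25.5
```

A real agent would use an LLM to match intent and compose joins and filters, but the principle is the same: the semantic layer constrains generation to names and measures that actually exist in the warehouse.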
A Python library by Google for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization, designed for data annotation and knowledge extraction workflows.
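"Source grounding" means each extracted value carries the character span where it was found, so results can be traced and highlighted back in the original text. A toy regex-based illustration of that output shape is below; it is not the library's API, and the pattern and field names are invented.

```python
import re

# Toy extraction with source grounding: every extracted field keeps the
# character offsets of the text it came from, enabling traceable results.
def extract_emails(text: str):
    pattern = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
    return [
        {"value": m.group(), "start": m.start(), "end": m.end()}
        for m in pattern.finditer(text)
    ]

doc = "Contact alice@example.com or bob@example.org for access."
for hit in extract_emails(doc):
    # The span lets a UI highlight exactly where each value came from.
    assert doc[hit["start"]:hit["end"]] == hit["value"]
    print(hit)
```

LLM-based extractors return the same kind of span-annotated records for arbitrary schemas, which is what makes the output auditable in annotation workflows.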