vLLM
A high-throughput and memory-efficient inference and serving engine for LLMs, featuring PagedAttention, continuous batching, and optimized KV cache management for production deployments.
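Once a vLLM server is launched (e.g. with `vllm serve <model>`), it exposes an OpenAI-compatible HTTP API. A minimal sketch of building a chat-completion request for that API using only the Python standard library; the base URL and model name are illustrative assumptions, and no request is actually sent:

```python
import json
from urllib import request


def build_chat_request(base_url: str, model: str, user_message: str) -> request.Request:
    """Build an OpenAI-compatible /v1/chat/completions request for a vLLM server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 64,
    }
    return request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and model name; sending would be `urllib.request.urlopen(req)`.
req = build_chat_request(
    "http://localhost:8000", "meta-llama/Llama-3.1-8B-Instruct", "Hello"
)
print(req.full_url)
```

Because the payload follows the OpenAI chat-completions schema, the same request shape works against any OpenAI-compatible endpoint, not only vLLM.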
Run open-source LLMs such as DeepSeek and Llama as OpenAI-compatible API endpoints in the cloud. Supports fine-tuning, quantization, and distributed inference for production-grade LLM deployment.
A one-stop retrieval-augmented generation (RAG) platform integrating Langflow, Docling, and OpenSearch, providing a complete pipeline from document parsing through vector retrieval to generation, with support for multiple models and vector databases.
Open-source text-to-SQL and text-to-chart GenBI agent with a semantic layer. Ask your database questions in natural language and get accurate SQL, charts, and BI insights. Supports 12+ data sources and any LLM.
A Python library by Google for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization, designed for data annotation and knowledge extraction workflows.