相关项目
mistral.rs
Fast, flexible LLM inference engine built in Rust — supports multiple model architectures and quantization schemes for high-performance local LLM deployment.
vLLM
A high-throughput and memory-efficient inference and serving engine for LLMs, featuring PagedAttention, continuous batching, and optimized KV cache management for production deployments.
Vision Agents
Open Vision Agents by Stream. Build voice and vision agents quickly with any model or video provider, using Stream's edge network for ultra-low latency realtime interactions.
MiroThinker
A deep research agent framework optimized for complex research and prediction tasks, with MiroThinker-1.7 and MiroThinker-H1 models achieving 74.0 and 88.2 on BrowseComp benchmark, supporting multi-step reasoning and information retrieval.