KTransformers
A flexible framework for experiencing heterogeneous LLM inference and fine-tuning optimizations — run large language models efficiently on consumer hardware with kernel-level optimizations.
An end-to-end RL training framework by NVIDIA for orchestrating tools and agentic workflows. Optimizes multi-step agent decision-making and tool-use policies.
Fast, flexible LLM inference engine built in Rust — supports multiple model architectures and quantization schemes for high-performance local LLM deployment.
Open Vision Agents by Stream. Build voice and vision agents quickly with any model or video provider, using Stream's edge network for ultra-low-latency real-time interactions.
A deep research agent framework optimized for complex research and prediction tasks, supporting multi-step reasoning and information retrieval; its MiroThinker-1.7 and MiroThinker-H1 models score 74.0 and 88.2, respectively, on the BrowseComp benchmark.