Whisper


Description

OpenAI Whisper is an open-source multilingual speech recognition model that runs locally. It transcribes English and other languages, translates non-English speech into English, and identifies the spoken language.

Key Features

Multilingual: transcription across 99 languages
Multiple sizes: models from tiny to large trade accuracy against speed
Robust: handles accents, background noise and other real-world audio conditions
Timestamps: emits segment-level timestamps by default and word-level timestamps on request, for subtitles and search
Translation: translates non-English speech into English text
Easy integration: CLI and Python API; long audio is processed in consecutive 30-second windows
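
The timestamp and translation features listed above can be sketched roughly as follows. This assumes the `transcribe` method of the `openai-whisper` package with its `task` and `word_timestamps` options; the helper names `flatten_words` and `transcribe_words` and the file name are illustrative and not part of Whisper.

```python
def flatten_words(result):
    """Collect (word, start, end) tuples from a Whisper-style result dict."""
    words = []
    for segment in result.get("segments", []):
        # "words" is only present when word_timestamps=True was requested.
        for w in segment.get("words", []):
            words.append((w["word"].strip(), w["start"], w["end"]))
    return words


def transcribe_words(path, model_name="small", task="transcribe"):
    """Run Whisper on `path`; task may be "transcribe" or "translate"."""
    import whisper  # imported lazily so flatten_words works without it

    model = whisper.load_model(model_name)
    result = model.transcribe(path, task=task, word_timestamps=True)
    # result["language"] holds the detected (or requested) language code.
    return result["language"], flatten_words(result)


# Example usage (hypothetical file name):
# language, words = transcribe_words("audio.wav")
# for word, start, end in words:
#     print(f"{start:7.2f} {end:7.2f} {word}")
```

Passing `task="translate"` returns English text for non-English speech; word timings are likely to be less dependable in that mode than for plain transcription.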

Use Cases

💡 Auto-generating transcripts and subtitles for meetings, podcasts and interviews

💡 Adding voice input to AI agents for spoken conversation

💡 Transcribing and translating multilingual customer support or teaching videos

💡 Deploying end-to-end speech-to-text pipelines in offline environments

💡 Enabling content search and structured analysis over long-form audio

Quick Start

# Install dependencies (Whisper also needs the ffmpeg command-line tool on your system)
pip install -U openai-whisper

# CLI transcription
whisper audio.wav --language English --model small

# Python API
import whisper
model = whisper.load_model('base')
result = model.transcribe('audio.wav', language='en')
print(result['text'])
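
For the subtitle use case, a minimal sketch of turning the segments in `result` into SRT text. It assumes each entry of `result['segments']` carries `start`, `end` (in seconds) and `text`, as returned by `transcribe`; the helper names `srt_timestamp` and `segments_to_srt` are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render Whisper-style segments (start, end, text) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        span = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{span}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


# Example usage with the result from the snippet above:
# with open("audio.srt", "w", encoding="utf-8") as f:
#     f.write(segments_to_srt(result["segments"]))
```

The CLI can also write subtitles directly with `--output_format srt`; the sketch is mainly useful when segments are filtered or merged in Python first.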
