Whisper
Description
OpenAI Whisper is a multilingual speech recognition model that runs locally. It transcribes English and non-English speech, translates non-English speech into English, and identifies the spoken language.
Key Features
- Multilingual: transcription across 99 languages plus translation to English
- Multiple sizes: models range from tiny to large, trading speed for accuracy
- Robust: handles accents, background noise and other real-world audio conditions
- Timestamps: emits segment-level timestamps, with optional word-level timestamps, for subtitles and search
- Translation: translates non-English speech into English text when the translate task is selected
- Easy integration: CLI and Python API; long audio is processed in 30-second windows
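As a rough sketch of how the timestamp output can feed subtitles: the helper below renders SubRip (SRT) text from a list of segments. It assumes each segment is a dict with `start` and `end` in seconds and a `text` string, which is the shape of the entries in `result['segments']` returned by `model.transcribe`. The names `format_srt_time` and `segments_to_srt` are hypothetical and are not part of Whisper.

```python
def format_srt_time(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts holding 'start', 'end' and 'text'."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        # Each cue: running number, time range, then the caption text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Hypothetical usage with a Whisper result:
#   result = model.transcribe('audio.wav')
#   srt_text = segments_to_srt(result['segments'])
```

The CLI can also write subtitle files directly via `--output_format srt`, so a helper like this is mainly useful when the segments need post-processing first.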
Quick Start
# Install Whisper (ffmpeg must also be installed and on the PATH)
pip install -U openai-whisper
# CLI transcription
whisper audio.wav --language English --model small
# Python API
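import whisper
# Load the 'base' checkpoint (downloaded on first use)
model = whisper.load_model('base')
# Transcribe; setting the language skips automatic detection
result = model.transcribe('audio.wav', language='en')
# The full transcript is stored under the 'text' key
print(result['text'])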
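Translation and word-level timestamps are opt-in through keyword arguments to `transcribe`. The following is a minimal sketch, assuming `openai-whisper` and ffmpeg are installed and that `audio.wav` exists. `build_options` and `transcribe_file` are hypothetical helper names; `task` and `word_timestamps` are existing `transcribe` options.

```python
def build_options(translate: bool = False, word_level: bool = False) -> dict:
    """Collect keyword arguments for model.transcribe()."""
    # task='translate' produces English text; 'transcribe' keeps the source language.
    options = {"task": "translate" if translate else "transcribe"}
    if word_level:
        # Adds a 'words' list (word, start, end) to every segment.
        options["word_timestamps"] = True
    return options


def transcribe_file(path: str, model_name: str = "base", **flags) -> dict:
    """Load a model and transcribe one file with the chosen options."""
    import whisper  # imported lazily so build_options works without the package

    model = whisper.load_model(model_name)
    return model.transcribe(path, **build_options(**flags))


# Hypothetical usage:
#   result = transcribe_file('audio.wav', translate=True, word_level=True)
#   print(result['text'])
#   for segment in result['segments']:
#       for word in segment.get('words', []):
#           print(word['start'], word['end'], word['word'])
```

Keeping the option building separate from the model call makes the choices easy to inspect before any audio is processed.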