Whisper

GitHub Python MIT

Description

OpenAI Whisper is a general-purpose speech recognition model that runs locally. It handles multilingual transcription, translation of speech into English, and spoken-language identification.

Key Features

  • Multilingual: transcription across 99 languages
  • Multiple sizes: models range from tiny to large, so you can trade accuracy against speed
  • Robust: handles accents, background noise and other real-world audio conditions
  • Timestamps: emits segment-level timestamps by default, with optional word-level timestamps (word_timestamps=True), for subtitles and search
  • Translation: translates non-English speech into English text when run with the translate task
  • Easy integration: CLI and Python API; long audio is processed with a sliding 30-second window
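
As an illustration of the timestamp feature, here is a minimal sketch that turns segment timestamps into SubRip (SRT) subtitle text. It assumes each segment is a dict with `start`, `end` (seconds) and `text` keys, as found in the `segments` list of a `transcribe()` result. The helper names `srt_timestamp` and `segments_to_srt` are hypothetical and not part of Whisper.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from segment dicts with 'start', 'end' and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{time_range}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 5.0, "text": " Welcome to the show."},
    ]
    print(segments_to_srt(demo))
```

The CLI can also write subtitle files directly through its `--output_format srt` option, so a helper like this is mainly useful when the segments need post-processing first.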

Use Cases

💡 Auto-generating transcripts and subtitles for meetings, podcasts and interviews
💡 Adding voice input to AI agents for spoken conversation
💡 Transcribing and translating multilingual customer support or teaching videos
💡 Deploying end-to-end speech-to-text pipelines in offline environments
💡 Enabling content search and structured analysis over long-form audio
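
For the content-search use case, one possible approach is a plain keyword lookup over transcribed segments. The sketch below assumes segments are dicts with `start` and `text` keys (the shape found in a `transcribe()` result); `find_mentions` is a hypothetical helper, and a real system would likely use a proper search index instead.

```python
def find_mentions(segments, keyword):
    """Return (start_seconds, text) for each segment whose text contains keyword.

    Matching is case-insensitive substring matching.
    """
    needle = keyword.casefold()
    return [
        (seg["start"], seg["text"].strip())
        for seg in segments
        if needle in seg["text"].casefold()
    ]


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "text": " Welcome to the quarterly review."},
        {"start": 12.4, "text": " Revenue grew in the second quarter."},
        {"start": 30.1, "text": " Next, the hiring plan."},
    ]
    for start, text in find_mentions(demo, "quarter"):
        print(f"{start:7.1f}s  {text}")
```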

Quick Start

# Install (the ffmpeg command-line tool must also be available on your PATH)
pip install -U openai-whisper

# CLI transcription
whisper audio.wav --language English --model small

# Python API
import whisper
model = whisper.load_model('base')
result = model.transcribe('audio.wav', language='en')
print(result['text'])
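
Translation and word-level timestamps are opt-in through `transcribe()` keyword arguments. The sketch below is a hedged example rather than a reference implementation: `run_whisper` and `summarize` are hypothetical wrapper names, `interview.mp3` is a placeholder path, and the result keys read here (`language`, `text`, `segments`, and per-segment `words`) are assumed to match the installed version.

```python
def run_whisper(path, task="transcribe", model_name="base", word_timestamps=False):
    """Run Whisper on an audio file; task is 'transcribe' or 'translate'."""
    import whisper  # imported lazily so the helpers below work without the package

    model = whisper.load_model(model_name)
    # With no language given, Whisper detects it from the start of the audio.
    return model.transcribe(path, task=task, word_timestamps=word_timestamps)


def summarize(result):
    """Return (language, text, word_count) from a transcribe() result dict."""
    words = [w for seg in result.get("segments", []) for w in seg.get("words", [])]
    return result.get("language"), result["text"].strip(), len(words)


if __name__ == "__main__":
    # Placeholder file name; replace with a real recording.
    english = run_whisper("interview.mp3", task="translate")
    print(summarize(english))
```

Word-level timestamps on translated output may be unreliable, which is why the sketch leaves `word_timestamps` off by default and does not combine it with `task="translate"`.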

Related Projects