PocketWebTools · 100% local · no upload

Audio Transcription

Transcribe audio, video, or live recordings into timestamped text and subtitles. Runs entirely in your browser: no upload, no signup.

Model

One-time download. Cached in your browser so subsequent runs are instant.

Your audio never leaves your device. The Whisper model runs in your browser using WebGPU (or WebAssembly as a fallback). No upload, no signup, unlimited use, even on hour-long files.

Free audio transcription that runs entirely in your browser

PocketWebTools' audio transcriber turns spoken audio or video into clean, timestamped text using OpenAI's Whisper model. There is no upload, no signup, no per-minute fee, and no usage cap. Every second of audio is processed on your device, powered by WebGPU where available and WebAssembly everywhere else. If you have ever balked at SaaS transcription paywalls or refused to send a sensitive recording to a third-party API, this is the version that just works.

The same engine drives both privacy and price. Because the model lives in your browser, we have no inference bill to recover, no logs to keep, and nothing to upsell. Drop a podcast clip, a meeting recording, a lecture, an interview, a voice memo, an unreleased track, a press call. None of it leaves the device.

How to use it

  1. Pick a mode at the top: Upload audio for files and recordings, or Live dictation for real-time speech-to-text as you talk. File mode accepts MP3, WAV, M4A, FLAC, OPUS, OGG, MP4, MOV, and WebM via drop, file picker, or microphone capture. Live mode opens the mic and streams transcripts segment-by-segment.
  2. Optionally set the source language and choose a model. Whisper Base is the default (smaller and faster); switch to Whisper Turbo for noisier audio or non-English content.
  3. Choose Transcribe for same-language output or Translate to English if the source is in a different language.
  4. Hit Transcribe. On the first run, the model downloads (200 MB to 560 MB, depending on the model) and is cached in your browser. After that it starts instantly.
  5. Review the segmented transcript, click timestamps to jump to that part of the audio, and download as plain text, SRT, VTT, or JSON.

What it does

Whisper is a sequence-to-sequence speech model trained on 680,000 hours of multilingual and multitask audio. It does three jobs in one model: speech detection, transcription, and speech-to-English translation. We run it through transformers.js's automatic-speech-recognition pipeline, which handles long-form audio with a 30-second sliding window so you can drop a 90-minute interview and still get a coherent transcript.
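That sliding-window strategy can be sketched in a few lines of JavaScript. This is an illustrative simplification: `chunkAudio`, the 30-second window, and the 5-second overlap are assumptions for the sketch, and the real transformers.js pipeline also merges the overlapping token sequences it decodes from each window.

```javascript
// Split a mono PCM buffer into overlapping ~30 s windows, the same basic
// strategy the ASR pipeline uses internally for long-form audio.
// Window and stride values here are illustrative assumptions.
function chunkAudio(samples, sampleRate, windowSec = 30, strideSec = 5) {
  const windowLen = windowSec * sampleRate;
  const step = (windowSec - strideSec) * sampleRate; // advance per chunk
  const chunks = [];
  for (let start = 0; start < samples.length; start += step) {
    chunks.push({
      start: start / sampleRate,                          // chunk offset in seconds
      samples: samples.subarray(start, start + windowLen), // view, no copy
    });
    if (start + windowLen >= samples.length) break;       // last window reached the end
  }
  return chunks;
}

// A 10-minute recording at 16 kHz becomes 24 overlapping ~30 s windows.
const tenMinutes = new Float32Array(10 * 60 * 16000);
const chunks = chunkAudio(tenMinutes, 16000); // → 24 windows
```

Each window is transcribed independently and the overlap lets the decoder stitch the boundaries back together, which is why an hour-long file still yields one coherent transcript.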

  • Two model tiers: Whisper Base for fast everyday transcription, Whisper Turbo for higher accuracy on difficult audio.
  • 99 languages out of the box. Auto-detect by default; lock a language for slightly better accuracy if you know the source.
  • Translate-to-English mode for foreign-language podcasts, lectures, interviews, voicemails. Whisper produces an English transcript directly from the audio.
  • Per-segment timestamps with click-to-seek playback so you can verify any uncertain passage in seconds.
  • Subtitle-ready output: drop the .srt or .vtt file into YouTube Studio, Vimeo, Premiere, Final Cut, or any video player and your captions just work.

Common use cases

  • Podcasters and journalists who need transcripts for show notes, articles, or accessibility without sending interviews to a cloud API.
  • YouTubers and video editors generating .srt or .vtt subtitle files for uploads or burn-in captions.
  • Students and researchers transcribing lectures or interview tapes for note-taking or citation.
  • Anyone with sensitive recordings (legal, medical, HR, or unreleased creative work) who cannot use a cloud transcription service for compliance or confidentiality reasons.
  • Multilingual users translating foreign-language podcasts or video to English without subscribing to a separate translator tool.
  • Voice-memo users turning quick voice notes into searchable text, captured directly from the file your phone or laptop produced.

Why local AI matters

Cloud transcription services upload your audio to a server, run the model there, and send the text back. That is fine for casual use, but it means the recording touches their infrastructure (and usually their logs), they need a paid quota to recover GPU costs, and your transcript stalls the moment your network does. Running the model in your browser flips all three:

  • Privacy. The audio never leaves your device. No servers, no logs, no retention.
  • Cost. No inference bill for us means no subscription for you. Free, unlimited.
  • Offline. Once the model is cached, the tool works on a plane, in a remote office, or any time your network is flaky.

Frequently asked questions

Does my audio get uploaded anywhere?
No. The audio file stays in your browser the entire time. We run OpenAI's Whisper neural network locally with WebGPU (or WebAssembly on devices without WebGPU), so there is no server in the loop, nothing to log, and nothing for us to retain.
Why does it ask to download a model the first time?
Whisper has to run on your device for the privacy guarantee to mean anything. The first time you hit Transcribe, your browser downloads the model and caches it. Subsequent runs use the cached copy and start instantly. You can review or delete cached models at any time using the 'Local models' chip in the page header.
How big is the download?
Whisper Base is around 200 MB on WebGPU (about 80 MB on the WebAssembly fallback) and is the default. Whisper Turbo is around 560 MB and produces noticeably better results on accented English, non-English audio, and noisy recordings. Pick the one that fits your bandwidth and quality needs; both are cached after the first run.
Which languages are supported?
Both models are multilingual. Whisper recognizes 99 languages including English, Arabic, Spanish, French, German, Hindi, Chinese, Japanese, Korean, Portuguese, Russian, Indonesian, and many more. Leave the language picker on Auto-detect to let Whisper figure it out, or pick the source language to skip detection.
Can it translate to English?
Yes. Set the Task switch to 'Translate to English' and Whisper will produce an English transcript even when the audio is in another language. This works for all 99 supported languages and is genuinely useful for podcasts, lectures, and interviews you want to read rather than listen to.
What output formats do you support?
Plain text, SubRip subtitles (.srt), WebVTT subtitles (.vtt), and JSON with timestamps. SRT and VTT are the standard subtitle formats for YouTube, Vimeo, video editors, and TVs. JSON gives you the raw timestamp + text array if you want to post-process the transcript.
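The subtitle formats differ mostly in framing; SRT, for instance, is just numbered cues with `HH:MM:SS,mmm` time ranges. A minimal sketch of producing SRT from timestamped segments (the `{start, end, text}` segment shape and both function names are assumptions for illustration, not the tool's actual export code):

```javascript
// Format seconds as an SRT timestamp: HH:MM:SS,mmm (VTT uses "." instead of ",").
function srtTime(seconds) {
  const total = Math.round(seconds * 1000); // whole milliseconds
  const ms = total % 1000;
  const s = Math.floor(total / 1000) % 60;
  const m = Math.floor(total / 60000) % 60;
  const h = Math.floor(total / 3600000);
  const pad = (n, w = 2) => String(n).padStart(w, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Turn timestamped segments into an SRT document: numbered cues,
// a time range per cue, then the text, separated by blank lines.
function toSrt(segments) {
  return segments
    .map((seg, i) => `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text}`)
    .join("\n\n") + "\n";
}

const srt = toSrt([
  { start: 0, end: 2.5, text: "Hello and welcome." },
  { start: 2.5, end: 5.04, text: "Today we talk about local AI." },
]);
// srt begins:
// 1
// 00:00:00,000 --> 00:00:02,500
// Hello and welcome.
```

Swapping the comma for a dot and prepending a `WEBVTT` header is essentially all it takes to emit VTT instead.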
How long can the audio be?
The tool accepts files up to 500 MB. There is no hard duration cap. Whisper processes long audio in 30-second sliding windows internally, so a one-hour podcast typically transcribes in 2 to 6 minutes on WebGPU with Whisper Base.
Does this work on video files?
Yes. Drop an MP4, MOV, or WebM and we extract the audio track in the browser before transcribing. Useful for subtitling your own videos without ever uploading them to a third-party service.
Can I record straight from my microphone?
Yes. Click 'Record from microphone' under the drop zone and grant the permission prompt. We use the browser's MediaRecorder to capture audio locally, hand the resulting file to Whisper, and never touch the network. Click Stop when you're done and transcription runs on your recording.
Does it support real-time live transcription?
Yes. Switch to Live dictation at the top of the tool and click Start. Whisper Base runs in a background worker, a voice-activity detector segments your speech, and each segment's transcript appears the moment you pause. Works in 99 languages with the same translate-to-English option. WebGPU is required for live mode (Chrome, Edge, Brave, Arc on desktop; Safari 26+ on iOS).
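To illustrate what a voice-activity detector does, here is a minimal energy-based sketch. The tool's actual detector is more sophisticated; the frame size, RMS threshold, and silence hang-over below are illustrative assumptions, as is the function name.

```javascript
// Minimal energy-based voice-activity sketch: mark 20 ms frames whose RMS
// energy exceeds a threshold as speech, and close a segment once enough
// consecutive frames fall silent (~0.5 s at the defaults below).
function segmentSpeech(samples, sampleRate, { threshold = 0.02, hangFrames = 25 } = {}) {
  const frameLen = Math.round(sampleRate * 0.02); // 20 ms frames
  const segments = [];
  let segStart = null;
  let silent = 0;
  for (let i = 0; i + frameLen <= samples.length; i += frameLen) {
    let energy = 0;
    for (let j = i; j < i + frameLen; j++) energy += samples[j] * samples[j];
    const rms = Math.sqrt(energy / frameLen);
    if (rms >= threshold) {
      if (segStart === null) segStart = i / sampleRate; // speech begins
      silent = 0;
    } else if (segStart !== null && ++silent >= hangFrames) {
      // enough silence: close the segment and hand it to the transcriber
      segments.push({ start: segStart, end: (i + frameLen) / sampleRate });
      segStart = null;
      silent = 0;
    }
  }
  if (segStart !== null) segments.push({ start: segStart, end: samples.length / sampleRate });
  return segments;
}

// One second of tone, one of silence, one of tone (16 kHz):
const audio = new Float32Array(3 * 16000);
audio.fill(0.1, 0, 16000);
audio.fill(0.1, 2 * 16000);
segmentSpeech(audio, 16000);
// → [{ start: 0, end: 1.5 }, { start: 2, end: 3 }]
```

The hang-over is what makes the transcript appear "the moment you pause": the segment closes a fraction of a second after you stop talking, not at some fixed interval.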
How accurate is the transcript?
Whisper is one of the most accurate open-source speech models available. On clear English audio, error rates are usually in the 3-7% range. Accuracy drops on heavy accents, overlapping speakers, music-heavy audio, and noisy environments. Switch to Turbo if Base is missing words you can clearly hear.
Does this work offline?
Yes, after the first visit. Once the model is cached, your browser does not need a network connection to transcribe. The page itself also works offline if you have visited it before.
Why does it say WebGPU is faster?
WebGPU lets the model run on your graphics card instead of your CPU. For Whisper this is roughly 3 to 10 times faster. We auto-detect WebGPU support and fall back to WebAssembly when it is unavailable; the badge above the action button tells you which path you got.
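The detect-and-fall-back logic amounts to a capability check. A sketch under assumptions (the function name is made up, and the `nav` parameter is injected purely to keep the logic testable; a real page would pass the global `navigator`):

```javascript
// Pick the execution backend: WebGPU when the browser exposes navigator.gpu
// AND can hand back a working adapter, otherwise fall back to WebAssembly.
async function pickBackend(nav) {
  if (nav && nav.gpu) {
    const adapter = await nav.gpu.requestAdapter().catch(() => null);
    if (adapter) return "webgpu"; // GPU path: roughly 3-10x faster for Whisper
  }
  return "wasm";                  // CPU fallback: slower but universal
}

// With mocked navigator objects:
pickBackend({ gpu: { requestAdapter: async () => ({}) } }); // resolves to "webgpu"
pickBackend({});                                            // resolves to "wasm"
```

Checking for a real adapter matters because some browsers expose `navigator.gpu` but return `null` from `requestAdapter()` on unsupported hardware.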
Can I use the transcript commercially?
Yes. The transcript is yours. Whisper itself is released by OpenAI under the MIT license, and the ONNX model we load (onnx-community/whisper-base, onnx-community/whisper-large-v3-turbo) inherits that license. There are no per-minute fees and no rights claim by us.