Speech to Text on WSL in one line
Suppose you have copied a voice memo from your iPhone and want to convert it to text. Run this one command from the folder that contains the file:
Full command
MODEL=tiny.en FILE=YourVoiceFile.m4a docker run --rm --entrypoint /bin/sh -e MODEL -e FILE -v "$(pwd):/data" ghcr.io/ggml-org/whisper.cpp:main -lc 'BASE="${FILE%.*}"; test -f /data/models/ggml-$MODEL.bin || { mkdir -p /data/models && curl -L "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-$MODEL.bin" -o "/data/models/ggml-$MODEL.bin"; }; ffmpeg -y -i "/data/$FILE" -ar 16000 -ac 1 -c:a pcm_s16le "/data/$BASE.wav" && whisper-cli -m "/data/models/ggml-$MODEL.bin" -t 8 -f "/data/$BASE.wav" -otxt -of "/data/$BASE"'
Description:
MODEL=tiny.en
Selects the Whisper model. Change this to base.en, small.en, etc.
FILE=YourVoiceFile.m4a
Sets the input audio filename from your current folder.
docker run --rm
Starts a temporary container and removes it when done.
--entrypoint /bin/sh
Replaces the image's default entrypoint with a shell so the inline script can run.
-e MODEL -e FILE
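Passes both environment variables into the container. Given without a value, -e copies each variable from the environment that docker run was started with.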
-v "$(pwd):/data"
Mounts your current folder into the container as /data, so inputs and outputs are shared.
BASE="${FILE%.*}"
Removes the file extension, so YourVoiceFile.m4a becomes YourVoiceFile.
test -f /data/models/ggml-$MODEL.bin || ...
Checks whether the chosen model is already cached locally.
mkdir -p /data/models
Creates the local models folder if it does not exist.
curl -L ... -o /data/models/ggml-$MODEL.bin
Downloads the selected model if missing.
ffmpeg -y -i "/data/$FILE" -ar 16000 -ac 1 -c:a pcm_s16le "/data/$BASE.wav"
Converts your audio file to 16 kHz, mono, 16-bit PCM WAV, which is a safe input format for whisper.cpp. The -y flag overwrites an existing WAV file without asking.
whisper-cli -m "/data/models/ggml-$MODEL.bin" -t 8 -f "/data/$BASE.wav" -otxt -of "/data/$BASE"
Transcribes the WAV file using 8 CPU threads (adjust -t to suit your machine) and writes the transcript to /data/YourVoiceFile.txt, which is YourVoiceFile.txt in your current folder.
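The one-liner relies on two plain shell mechanisms: variable assignments placed in front of a command, and the ${FILE%.*} expansion. The sketch below tries both without Docker. The show_vars helper and the second filename are made up for illustration; a plain sh stands in for the container shell.

```shell
# Assignments written in front of a command are exported only to that command.
# The single quotes stop the outer shell from expanding $MODEL and $FILE,
# so the inner sh reads them from its own environment, as the container does.
show_vars() {
  MODEL="$1" FILE="$2" sh -c 'BASE="${FILE%.*}"; printf "%s %s %s\n" "$MODEL" "$FILE" "$BASE"'
}

show_vars tiny.en YourVoiceFile.m4a   # prints: tiny.en YourVoiceFile.m4a YourVoiceFile
show_vars base.en memo.2024.m4a       # prints: base.en memo.2024.m4a memo.2024
```

Only the last extension is removed, so a name with several dots keeps everything before the final one.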
Output files:
YourVoiceFile.wav
YourVoiceFile.txt
cached model in models/
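The model ends up in models/ because of the "download only if missing" step. Here is a minimal sketch of that pattern as a reusable function. The names ensure_file and fake_download are hypothetical and not part of whisper.cpp; the fake download stands in for curl so the example runs offline.

```shell
# ensure_file TARGET CMD [ARGS...]
# Hypothetical helper: runs CMD only when TARGET does not exist yet,
# creating the parent folder first. Same shape as
# test -f ... || { mkdir -p ... && curl ...; } in the one-liner.
ensure_file() {
  target="$1"
  shift
  test -f "$target" || { mkdir -p "$(dirname "$target")" && "$@"; }
}

# Stand-in for the real download: logs each call and creates an empty file.
demo_dir="$(mktemp -d)"
fake_download() {
  echo "fetch" >> "$demo_dir/calls.log"
  : > "$demo_dir/models/ggml-tiny.en.bin"
}

ensure_file "$demo_dir/models/ggml-tiny.en.bin" fake_download   # file missing: runs
ensure_file "$demo_dir/models/ggml-tiny.en.bin" fake_download   # file cached: skipped
wc -l < "$demo_dir/calls.log"                                   # one logged call
```

With a real download you would pass the curl command in place of fake_download. One thing to keep in mind: the check only looks at whether the file exists. Adding curl's -f (--fail) option makes curl report an error on HTTP failures instead of saving the server's error page under the model's name, which could otherwise be treated as a cached model on the next run.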