I’ve been struggling with transcription for quite some time, for a variety of reasons. One example: I need a text to paste into Suno, but it only exists as an m4a file (i.e. music, sometimes with hardcoded subtitles that have to be transcribed manually).
I first found a Samsung app that could handle transcription, but it quickly became clear that it was limited to its own ecosystem. In practice, you could only transcribe audio that had been recorded inside that specific app.
Since then, I’ve been looking around on and off, and more recently I picked it up again as the need increased – partly to get correct transcriptions, but also to be able to process any audio files I download or record. Samsung’s app is decent, but the quality varies. Right after recording, it performs a quick transcription, but the result is noticeably worse than if you re-run the transcription once the audio file is fully finalized.
At that point I came across “Whisper Transcribe” for Windows. It works, but it requires an account and, of course, paid credits to continue transcribing. You get a small number of free credits at first, but once those run out, you’re expected to pay quite a bit just to keep going.
I already knew that there must be software capable of doing this completely locally. I had previously discovered that Whisper exists in an open-source form as well (I’m not even sure whether the Windows application actually builds on that or not). So today I decided to finally figure out how to do it properly myself.
The end result was the following (thanks to ChatGPT):
A Whisper installer for WSL/Linux, with explicit support for NVIDIA GTX 1060 – something newer Python libraries clearly no longer handle well.
A Whisper runner for WSL/Linux: run whisper <input-file> and get a .txt transcript generated from the audio file.
A Windows Registry file that allows transcription to be executed directly from Windows Explorer via right-click.
A batch file that bridges Windows and WSL so everything runs cleanly, including proper handling of spaces and non-ASCII characters in file names.
The result is a fully local, offline transcription setup that works on any audio file, without accounts, credits, or vendor lock-in.
WSL uses python and pip…
whisper.bat
@echo off
setlocal EnableExtensions
REM Force UTF-8 codepage (fixes å ä ö)
chcp 65001 >nul
REM File passed from Explorer
set "WIN_FILE=%~1"
REM Abort early if no file was passed
if not defined WIN_FILE (
  echo Usage: whisper.bat ^<audio-file^>
  exit /b 1
)
REM Convert Windows path to WSL path (UTF-8 safe now)
for /f "delims=" %%i in ('wsl wslpath "%WIN_FILE%"') do set "WSL_FILE=%%i"
REM Run whisper on that file
wsl bash -lc "/usr/local/tornevall/whisper \"%WSL_FILE%\""
endlocal
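The path conversion in the batch file is done by wslpath on the WSL side. As a rough illustration of what that step produces for ordinary drive-letter paths, here is a minimal bash sketch. The function name win_to_wsl_path is hypothetical, and it assumes WSL's default /mnt automount root; in the real setup, wslpath should still do the work.

```shell
#!/usr/bin/env bash
# Illustrative only: approximates what `wslpath` returns for drive-letter
# paths, assuming the default automount root /mnt. Not a replacement for it.
win_to_wsl_path() {
  local win_path="$1"
  # Keep the regex in a variable so bash does not reinterpret the backslashes.
  local re='^[A-Za-z]:[\\/]'
  if [[ ! "$win_path" =~ $re ]]; then
    echo "Error: not a drive-letter path: $win_path" >&2
    return 1
  fi
  local drive="${win_path:0:1}"   # e.g. "C"
  local rest="${win_path:2}"      # everything after "C:"
  # Backslashes become forward slashes; the drive letter is lowercased.
  rest="${rest//\\//}"
  printf '/mnt/%s%s\n' "${drive,,}" "$rest"
}
```

Calling win_to_wsl_path 'C:\Users\me\My song.m4a' prints /mnt/c/Users/me/My song.m4a, which is the form the runner script expects as its argument. Spaces survive because the value stays quoted the whole way through.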
whisper.reg (Explorer right-click)
Windows Registry Editor Version 5.00
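[HKEY_CLASSES_ROOT\*\shell\WhisperWSL]
@="Transkribera med Whisper (WSL)"
"Icon"="wsl.exe"
; The menu entry needs a command subkey to do anything.
; Assumption: the path below is a placeholder; point it at wherever whisper.bat is stored.
[HKEY_CLASSES_ROOT\*\shell\WhisperWSL\command]
@="\"C:\\path\\to\\whisper.bat\" \"%1\""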
Installer for WSL/Linux (with 1060 compatibility and pre-uninstaller)
To make sure everything is removed properly before reinstalling, the script has a -u switch. If something goes wrong the first time, this switch lets you reinstall a second time without conflicts.
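The installer itself is not reproduced here, so the following is only a minimal sketch of how a -u switch like this could work. It assumes the whole install lives in a virtualenv at $WHISPER_VENV (the same default the runner script uses); the function names and the install.sh name in the usage message are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a pre-uninstall switch. Assumption: removing the virtualenv is
# enough to undo a previous install; the real installer may do more.
WHISPER_VENV="${WHISPER_VENV:-$HOME/.venvs/whisper}"

uninstall_whisper() {
  if [[ -d "$WHISPER_VENV" ]]; then
    echo "Removing existing venv: $WHISPER_VENV"
    rm -rf -- "$WHISPER_VENV"
  else
    echo "Nothing to remove at: $WHISPER_VENV"
  fi
}

# Parse the installer's options; -u requests the uninstall step.
parse_install_opts() {
  local opt OPTIND=1
  DO_UNINSTALL=0
  while getopts ":u" opt; do
    case "$opt" in
      u) DO_UNINSTALL=1 ;;
      *) echo "Usage: install.sh [-u]" >&2; return 2 ;;
    esac
  done
}
```

An installer built this way would call parse_install_opts "$@" first, run uninstall_whisper when DO_UNINSTALL is 1, and only then create a fresh virtualenv and install into it.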
The script itself
# --- Refuse overwrite ---
if [[ -f "$OUTPUT" ]]; then
    echo "Error: Output file already exists:" >&2
    echo "  $OUTPUT" >&2
    echo "Aborting to avoid overwrite." >&2
    exit 1
fi
# Prefer venv whisper if installed via install script
WHISPER_VENV="${WHISPER_VENV:-$HOME/.venvs/whisper}"
WHISPER_BIN="whisper"
if [[ -x "$WHISPER_VENV/bin/whisper" ]]; then
    WHISPER_BIN="$WHISPER_VENV/bin/whisper"
fi
# Otherwise fall back to whisper on PATH, and fail if it is not there either
if [[ "$WHISPER_BIN" == "whisper" ]] && ! command -v whisper >/dev/null 2>&1; then
    echo "Error: whisper not found in PATH or venv." >&2
    exit 1
fi
# Pass a language only when one was given
if [[ -n "$LANGUAGE" ]]; then
    ARGS+=( --language "$LANGUAGE" )
fi
# Run the transcription
"$WHISPER_BIN" "${ARGS[@]}"
# The transcript is expected to be named after the input file's stem
GENERATED_TXT="$TMPDIR/$STEM.txt"
if [[ ! -f "$GENERATED_TXT" ]]; then
    # Fall back to the first .txt file found in the temp directory
    FOUND_TXT="$(find "$TMPDIR" -maxdepth 1 -type f -name "*.txt" | head -n 1 || true)"
    if [[ -z "${FOUND_TXT:-}" ]]; then
        echo "Error: No .txt output produced." >&2
        exit 1
    fi
    GENERATED_TXT="$FOUND_TXT"
fi
# --- Final move (no overwrite possible due to earlier check) ---
mv "$GENERATED_TXT" "$OUTPUT"
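The listing relies on OUTPUT, STEM, TMPDIR and ARGS, which are set in a part of the script that is not shown. A minimal sketch of how they could be prepared follows. The function name prepare_whisper_run, the WHISPER_MODEL variable and its default of small are assumptions for illustration; --model, --output_dir and --output_format are options of the openai-whisper command line.

```shell
#!/usr/bin/env bash
# Sketch only: sets up the variables the listing expects. WHISPER_MODEL and
# its default are assumptions, not taken from the original script.
prepare_whisper_run() {
  local input="$1"
  if [[ ! -f "$input" ]]; then
    echo "Error: input file not found: $input" >&2
    return 1
  fi
  local base
  base="$(basename -- "$input")"
  STEM="${base%.*}"                           # "My song.m4a" -> "My song"
  OUTPUT="$(dirname -- "$input")/$STEM.txt"   # transcript lands next to the audio
  TMPDIR="$(mktemp -d)"                       # whisper writes here first
  ARGS=( "$input"
         --model "${WHISPER_MODEL:-small}"
         --output_dir "$TMPDIR"
         --output_format txt )
}
```

Writing into a temporary directory first and moving the file afterwards means a failed or interrupted run never leaves a half-written transcript beside the audio file, which fits the refuse-overwrite check at the top of the listing.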