The Struggle: Transcribe stuff for free with Whisper and WSL/Linux – With a GTX 1060

Permalink

Published: 2025-12-23 11:26:07

https://www.tornevalls.se/the-struggle-transcribe-stuff-for-free-with-whisper-and-wsl-linux-with-a-gtx-1060/

I’ve been struggling with transcription for quite some time, for a variety of reasons. One example: I need text to paste into Suno, but it exists only as an m4a file (i.e. music, which sometimes has hardcoded subtitles that have to be transcribed manually).

I first found a Samsung app that could handle transcription, but it quickly became clear that it was limited to its own ecosystem. In practice, you could only transcribe audio that had been recorded inside that specific app.

Since then, I’ve been looking around on and off, and more recently I picked it up again as the need increased – partly to get correct transcriptions, but also to be able to process any audio files I download or record. Samsung’s app is decent, but the quality varies. Right after recording, it performs a quick transcription, but the result is noticeably worse than if you re-run the transcription once the audio file is fully finalized.

At that point I came across “Whisper Transcribe” for Windows. It works, but it requires an account and, of course, paid credits to continue transcribing. You get a small number of free credits at first, but once those run out, you’re expected to pay quite a bit just to keep going.

I already knew that there must be software capable of doing this completely locally. I had previously discovered that Whisper exists in an open-source form as well (I’m not even sure whether the Windows application actually builds on that or not). So today I decided to finally figure out how to do it properly myself.

The end result was the following (thanks to ChatGPT):

A Whisper installer for WSL/Linux, with explicit support for NVIDIA GTX 1060 – something newer Python libraries clearly no longer handle well.

A Whisper runner for WSL/Linux: run whisper <input-file> and get a .txt transcript generated from the audio file.

A Windows Registry file that allows transcription to be executed directly from Windows Explorer via right-click.

A batch file that bridges Windows and WSL so everything runs cleanly, including proper handling of spaces and non-ASCII characters in file names.

The result is a fully local, offline transcription setup that works on any audio file, without accounts, credits, or vendor lock-in.

WSL uses python and pip…

Table of Contents

whisper.bat

whisper.reg (Explorer right-click)

Installer for WSL/Linux (with 1060 compatibility and pre-uninstaller)

The script itself

whisper.bat

@echo off
setlocal EnableExtensions

REM Force UTF-8 codepage (fixes å ä ö)
chcp 65001 >nul

REM File passed from Explorer
set "WIN_FILE=%~1"

REM Convert Windows path to WSL path (UTF-8 safe now)
for /f "delims=" %%i in ('wsl wslpath "%WIN_FILE%"') do set "WSL_FILE=%%i"

REM Run whisper on that file
wsl bash -lc "/usr/local/tornevall/whisper \"%WSL_FILE%\""

endlocal
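What `wslpath` does in the batch file can be illustrated without WSL at hand. The sketch below is not how `wslpath` is implemented; it only mimics the common case, assuming drives are mounted under `/mnt/<letter>` (the WSL default, which can be changed in `wsl.conf`) and that the input is an absolute drive-letter path. The function name `win_to_wsl_path` is made up for this example.

```shell
#!/bin/sh
# Sketch only: mimic the common case of `wslpath` for drive-letter paths.
# Assumes the default /mnt/<drive> automount root (an assumption, see above).

win_to_wsl_path() {
  # $1: absolute Windows path such as C:\Users\me\song.m4a
  # Drive letter, lowercased (first character of the path).
  drive=$(printf '%s' "$1" | cut -c1 | tr 'A-Z' 'a-z')
  # Everything after "C:", with backslashes turned into forward slashes.
  rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
  printf '/mnt/%s%s\n' "$drive" "$rest"
}

win_to_wsl_path 'F:\viktigt\Private\song.m4a'
# prints /mnt/f/viktigt/Private/song.m4a
```

UNC paths and custom mount points are not handled here, which is one reason the batch file calls the real `wslpath` instead.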

whisper.reg (Explorer right-click)

Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\*\shell\WhisperWSL]
@="Transcribe with Whisper (WSL)"
"Icon"="wsl.exe"

[HKEY_CLASSES_ROOT\*\shell\WhisperWSL\command]
@="\"F:\\viktigt\\Private\\Linux-Scripts\\Whisper.bat\" \"%1\""

Installer for WSL/Linux (with 1060 compatibility and pre-uninstaller)

To make sure everything is removed properly before reinstalling, the script has a -u switch. If something goes wrong the first time, run it with -u first so that you can reinstall without conflicts.

#!/usr/bin/env bash
set -euo pipefail

VENV_DIR="${VENV_DIR:-$HOME/.venvs/whisper}"
MODE="install"

# --- Parse args ---
while getopts ":u" opt; do
  case "$opt" in
    u) MODE="uninstall" ;;
    *)
      echo "Usage: $0 [-u]"
      exit 1
      ;;
  esac
done

echo "==> Whisper installer (GTX 1060 compatible)"
echo "==> Mode: $MODE"

# --- Sanity ---
if [[ ! -d "$VENV_DIR" ]]; then
  echo "Error: venv not found: $VENV_DIR"
  exit 1
fi

# shellcheck disable=SC1090
source "$VENV_DIR/bin/activate"

python -m pip install --upgrade pip setuptools wheel

# ==================================================
# UNINSTALL MODE (-u)
# ==================================================
if [[ "$MODE" == "uninstall" ]]; then
  echo "==> Uninstalling incompatible packages ONLY (-u)"
  pip uninstall -y torch torchvision torchaudio || true
  pip uninstall -y numpy || true
  echo ""
  echo "Done."
  echo "Uninstall completed. Nothing else touched."
  exit 0
fi

# ==================================================
# INSTALL MODE (DEFAULT)
# ==================================================
echo "==> Installing compatible stack (no forced uninstall)"
pip install \
  numpy==1.26.4 \
  torch==1.13.1+cu116 \
  torchvision==0.14.1+cu116 \
  torchaudio==0.13.1 \
  --extra-index-url https://download.pytorch.org/whl/cu116

# --- Verify ---
echo "==> Verifying environment"
python - << 'EOF'
import torch, numpy
print("Torch:", torch.__version__)
print("NumPy:", numpy.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("Capability:", torch.cuda.get_device_capability(0))
EOF

echo ""
echo "Done."
echo "Install completed without destructive actions."
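The installer exits if the venv does not exist, and it only pins NumPy and the PyTorch packages; it does not create the venv or install Whisper itself. The post does not show how the venv was set up, so the following is only a sketch of one way to do it, assuming `python3` with the `venv` module is available and that Whisper comes from the `openai-whisper` package on PyPI. The `run` helper and the `DRY_RUN` variable are made up for this example; by default the sketch only prints the commands.

```shell
#!/bin/sh
# Sketch only: create the venv the installer expects and put Whisper in it.
# VENV_DIR uses the same default as the installer; DRY_RUN is hypothetical.
VENV_DIR="${VENV_DIR:-$HOME/.venvs/whisper}"
DRY_RUN="${DRY_RUN:-1}"

run() {
  # Always print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

bootstrap_venv() {
  run python3 -m venv "$VENV_DIR"
  run "$VENV_DIR/bin/python" -m pip install --upgrade pip
  # Assumption: Whisper is installed from the openai-whisper package.
  run "$VENV_DIR/bin/python" -m pip install openai-whisper
}

bootstrap_venv
```

Running it with `DRY_RUN=0` executes the commands instead of just listing them. After that, the installer can be run to pin the GTX 1060-compatible versions.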

The script itself

The script runs without any switches: just pass the audio file you want transcribed. As you can see, it can do a bit more, since it also accepts an optional model and language.

#!/usr/bin/env bash
set -euo pipefail

# whisper-run.sh
# Usage:
#   whisper <input.extension> [model] [language]
#
# Output:
#   <input-filename>.txt (same directory)
#
# Behaviour:
#   - Refuses to overwrite existing .txt
#   - Stops execution if output exists

if [[ $# -lt 1 ]]; then
  echo "Usage: whisper <input.extension> [model] [language]"
  exit 1
fi

INPUT="$1"
MODEL="${2:-small}"
LANGUAGE="${3:-}"

if [[ ! -f "$INPUT" ]]; then
  echo "Error: Input file not found: $INPUT"
  exit 1
fi

BASENAME="$(basename "$INPUT")"
STEM="${BASENAME%.*}"
OUTDIR="$(dirname "$INPUT")"
OUTPUT="$OUTDIR/$STEM.txt"

# --- Refuse overwrite ---
if [[ -f "$OUTPUT" ]]; then
  echo "Error: Output file already exists:"
  echo " $OUTPUT"
  echo "Aborting to avoid overwrite."
  exit 1
fi

# Prefer venv whisper if installed via install script
WHISPER_VENV="${WHISPER_VENV:-$HOME/.venvs/whisper}"
WHISPER_BIN="whisper"

if [[ -x "$WHISPER_VENV/bin/whisper" ]]; then
  WHISPER_BIN="$WHISPER_VENV/bin/whisper"
fi

if [[ "$WHISPER_BIN" == "whisper" ]] && ! command -v whisper >/dev/null 2>&1; then
  echo "Error: whisper not found in PATH or venv."
  exit 1
fi

TMPDIR="$(mktemp -d)"
cleanup() { rm -rf "$TMPDIR"; }
trap cleanup EXIT

echo "==> Transcribing:"
echo " input: $INPUT"
echo " output: $OUTPUT"
echo " model: $MODEL"
echo " lang: ${LANGUAGE:-auto}"

ARGS=(
  "$INPUT"
  --model "$MODEL"
  --output_dir "$TMPDIR"
  --output_format txt
  --task transcribe
  --verbose False
  --fp16 False
)

if [[ -n "$LANGUAGE" ]]; then
  ARGS+=( --language "$LANGUAGE" )
fi

"$WHISPER_BIN" "${ARGS[@]}"

GENERATED_TXT="$TMPDIR/$STEM.txt"

if [[ ! -f "$GENERATED_TXT" ]]; then
  FOUND_TXT="$(find "$TMPDIR" -maxdepth 1 -type f -name "*.txt" | head -n 1 || true)"
  if [[ -z "${FOUND_TXT:-}" ]]; then
    echo "Error: No .txt output produced."
    exit 1
  fi
  GENERATED_TXT="$FOUND_TXT"
fi

# --- Final move (no overwrite possible due to earlier check) ---
mv "$GENERATED_TXT" "$OUTPUT"

echo "==> Done:"
echo " $OUTPUT"
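Because the runner refuses to overwrite an existing .txt, it can be pointed at a whole folder without redoing finished files. The sketch below is an illustration, not part of the original setup: `transcribe_missing` and the `WHISPER_CMD` variable are hypothetical names, and the default runner path is simply the one whisper.bat calls. It derives the transcript name the same way the runner does (same directory, same stem, `.txt`) and skips files that already have one.

```shell
#!/bin/sh
# Sketch only: transcribe every file in a folder that has no .txt yet.
# WHISPER_CMD is a hypothetical knob; the default is the path used in whisper.bat.
WHISPER_CMD="${WHISPER_CMD:-/usr/local/tornevall/whisper}"

transcribe_missing() {
  # $1: directory to scan, $2: extension without the dot (default: m4a)
  dir="$1"
  ext="${2:-m4a}"
  for f in "$dir"/*."$ext"; do
    [ -f "$f" ] || continue      # the glob matched nothing
    txt="${f%.*}.txt"            # same directory, same stem
    if [ -f "$txt" ]; then
      echo "skip: $f"
      continue
    fi
    "$WHISPER_CMD" "$f"
  done
}
```

A call such as `transcribe_missing "$HOME/recordings" m4a` would then run the runner once per untranscribed file; the directory name here is only an example.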
