A minimalist Windows desktop app for local, privacy-first push-to-talk voice transcription. Inspired by Wispr Flow.
Hold a keyboard shortcut, speak, release — the transcribed text is instantly typed into whatever window you were using. Nothing ever leaves your computer.
- Features
- Requirements
- Installation
- Downloading Models
- First Run
- How to Use
- Settings
- Whisper Models
- Transcription History
- Tray Icon & Navigation
- Logs
- Privacy
- Known Limitations
- Building from Source
- 🎙️ Push-to-talk recording with a configurable hotkey (default: Left Alt + Left Win)
- 🤖 Local AI transcription via Whisper.net (whisper.cpp backend): no cloud, no API key
- ⚡ CUDA GPU acceleration — automatic if an NVIDIA GPU is present
- 🌍 57 languages supported, including auto-detection
- 🪟 Floating pill widget — stays on top, draggable, remembers position
- ⏱️ Adaptive ETA countdown — estimated from real previous transcription timings stored locally per model and runtime environment
- 📋 Transcription history — browse, select and copy past transcriptions
- 🔕 System tray — runs silently in the background
- 🔒 100% offline — your voice data never leaves the machine
| Component | Minimum |
|---|---|
| OS | Windows 10 or Windows 11 (64-bit) |
| Runtime | .NET 8 Desktop Runtime |
| RAM | 4 GB (8 GB+ recommended for large models) |
| Disk | 78 MB – 3 GB depending on the chosen model |
| GPU (optional) | NVIDIA GPU with CUDA support (strongly recommended for large/medium models) |
| Microphone | Any Windows-recognized input device |
Without a CUDA-capable GPU the app still works, but transcription will be significantly slower on large models. Use small or tiny models for CPU-only usage.
- Download the latest release from the Releases page.
- Extract the ZIP archive to a folder of your choice (e.g. C:\Apps\WhisperWriter).
- Download a Whisper model using the included script (see Downloading Models) or manually (see Whisper Models).
- Run WhisperWriter.exe.
If Windows shows a SmartScreen warning, click More info → Run anyway. The executable is signed with a self-signed certificate.
The easiest way to get Whisper models is the included download script.
Option A – batch file:
Double-click download-models.bat in the WhisperWriter folder. No setup is required: it launches PowerShell automatically with the correct settings.
Option B – PowerShell:
# Run from the WhisperWriter folder
.\download-models.ps1
If you get an "execution policy" error in PowerShell or PowerShell ISE, use Option A (the .bat file) instead, which bypasses the policy automatically. Alternatively, run this once to allow local scripts permanently:
Set-ExecutionPolicy -Scope CurrentUser RemoteSigned
- Shows a numbered list of all available models with disk size, VRAM requirements and a short description.
- Already-downloaded models are marked [downloaded] in green.
- You type the numbers of the models you want (comma-separated, spaces, or ranges like 1-3,7).
- After confirming, missing models are downloaded one by one with a live progress indicator.
- Files are saved directly into the llms\ folder next to the script.
Example session:
# Model Disk VRAM Notes
--- ------------------- -------- -------- -----
1 large-v3-turbo 1.6 GB ~3 GB Best speed/accuracy tradeoff
2 large-v3 3.1 GB ~10 GB Most accurate, latest generation
...
5 medium 1.5 GB ~5 GB Good balance, multilingual [downloaded]
Your selection: 1,5
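The selection syntax described above (numbers separated by commas or spaces, plus ranges such as 1-3,7) can be illustrated with a short Python sketch. This is not the code of download-models.ps1; the function name `parse_selection` and the `model_count` parameter are hypothetical, and the handling of out-of-range numbers is an assumption.

```python
def parse_selection(text: str, model_count: int) -> list:
    """Parse input such as "1-3,7" or "1 5" into sorted, unique model numbers."""
    chosen = set()
    # Commas and whitespace both act as separators.
    for token in text.replace(",", " ").split():
        if "-" in token:
            start_text, end_text = token.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(token)
        if start > end:
            # Accept reversed ranges such as "3-1" (an assumption of this sketch).
            start, end = end, start
        for number in range(start, end + 1):
            if not 1 <= number <= model_count:
                raise ValueError(f"model number {number} is out of range 1..{model_count}")
            chosen.add(number)
    return sorted(chosen)


print(parse_selection("1-3,7", model_count=12))  # [1, 2, 3, 7]
print(parse_selection("1, 5", model_count=12))   # [1, 5]
```

Returning a sorted, de-duplicated list means overlapping input such as "1-3,2" would request each model only once.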
On the very first launch:
- The app checks for llms\ggml-large-v2.bin by default.
- If the model file is not found, it automatically downloads ggml-medium (~1.5 GB) from the official Hugging Face repository. This may take a few minutes depending on your connection.
- The app also creates a local SQLite database, llms\eta-time-stats.db, for ETA timing statistics.
- The floating pill widget appears at the bottom-center of your screen showing "Loading…" while the model is being loaded into memory (or downloaded).
- Once it shows "Ready", you can start transcribing.
- Click into any text field — a browser address bar, Notepad, Word, chat app, anywhere.
- Hold your push-to-talk hotkey (by default Left Alt + Left Win). The widget turns red and shows "Recording…". Speak clearly into your microphone.
- Release the keys. The widget shows "Transcribing…". Starting from the second matching transcription for the same model and runtime environment, it also shows an ETA countdown like ~Xs.
- The transcribed text is automatically typed into the window that had focus before you started recording.
⚠️ Make sure to click into the target text field before pressing the hotkey. The app saves the focus at the moment you press the keys.
- Speak at a natural pace — Whisper handles natural speech well.
- Use the Prompt setting to pre-seed Whisper with domain-specific vocabulary (names, technical terms, abbreviations) for better accuracy.
- For Czech or other non-English languages, explicitly set the language instead of using auto-detect — it improves both speed and accuracy.
- The widget is draggable — click and drag it anywhere on screen. Its position is saved automatically.
Open Settings via the tray icon → Settings.
| Setting | Description |
|---|---|
| Model | Which Whisper model to use. Affects accuracy, speed and VRAM usage. See Whisper Models. |
| Language | Transcription language. auto attempts to detect the language automatically. Explicitly selecting a language is faster and more reliable. |
| Prompt | Optional hint text for Whisper. Use it to teach the model uncommon words, names or formatting. Example: Whisper, GPT-4, OpenAI, CUDA. |
| Push-to-talk hotkey | Configurable directly in Settings. Click Change…, hold the desired key combination, then release it to save. Default is Left Alt + Left Win. |
| History size | Maximum number of transcriptions kept in memory during a session (1 to 2,147,483,647). |
Click Save to apply changes. The model is reloaded automatically if you change it.
Models must be placed in the llms\ folder next to WhisperWriter.exe. File names must match exactly.
| Model | File name | Disk | VRAM | Notes |
|---|---|---|---|---|
| large-v3-turbo | ggml-large-v3-turbo.bin | 1.6 GB | ~3 GB | Best speed/accuracy tradeoff |
| large-v3 | ggml-large-v3.bin | 3 GB | ~10 GB | Most accurate, latest generation |
| large-v2 | ggml-large-v2.bin | 3 GB | ~10 GB | Most accurate, recommended (default) |
| large-v1 | ggml-large-v1.bin | 3 GB | ~10 GB | Accurate, older generation |
| medium | ggml-medium.bin | 1.5 GB | ~5 GB | Good balance, multilingual |
| medium.en | ggml-medium.en.bin | 1.5 GB | ~5 GB | Good balance, English only |
| small | ggml-small.bin | 488 MB | ~2 GB | Fast, multilingual |
| small.en | ggml-small.en.bin | 488 MB | ~2 GB | Fast, English only |
| base | ggml-base.bin | 148 MB | ~1 GB | Very fast, multilingual |
| base.en | ggml-base.en.bin | 148 MB | ~1 GB | Very fast, English only |
| tiny | ggml-tiny.bin | 78 MB | ~390 MB | Fastest, multilingual, least accurate |
| tiny.en | ggml-tiny.en.bin | 78 MB | ~390 MB | Fastest, English only, least accurate |
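Every file name in the table follows the pattern ggml-<model>.bin inside the llms\ folder. The Python sketch below shows how an expected path could be derived and checked before launching the app; `model_file` and `is_installed` are hypothetical helpers for illustration, not part of WhisperWriter.

```python
from pathlib import Path


def model_file(model: str, app_dir: Path) -> Path:
    """Expected location of a model, following the ggml-<model>.bin pattern from the table."""
    return app_dir / "llms" / f"ggml-{model}.bin"


def is_installed(model: str, app_dir: Path) -> bool:
    """True if the model file exists under app_dir/llms with the exact expected name."""
    return model_file(model, app_dir).is_file()


# "WhisperWriter" stands in for the folder that contains WhisperWriter.exe.
print(model_file("large-v3-turbo", Path("WhisperWriter")).name)  # ggml-large-v3-turbo.bin
```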
Option A – download script (recommended):
Use the included download-models.ps1 script — see Downloading Models.
Option B – manual download:
Download GGML model files from the official Whisper.net / whisper.cpp model repository:
👉 https://huggingface.co/ggerganov/whisper.cpp
Download the .bin file for your chosen model and place it in the llms\ folder.
- NVIDIA GPU with 4+ GB VRAM → large-v3-turbo (best balance) or large-v2 (highest accuracy)
- NVIDIA GPU with 2–4 GB VRAM → medium or small
- CPU only or weak GPU → small, base or tiny
- English only → prefer .en variants (slightly faster and more accurate for English)
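The recommendations above can be read as a small decision rule. The following Python sketch encodes them under two assumptions that the list does not state: exactly 4 GB counts as the "4+ GB" tier, and anything below 2 GB is treated like "CPU only or weak GPU". The function name `recommend_models` is hypothetical.

```python
from typing import Optional

# Models that have an English-only ".en" variant in the model table.
HAS_EN_VARIANT = {"medium", "small", "base", "tiny"}


def recommend_models(vram_gb: Optional[float], english_only: bool = False) -> list:
    """Return suggested model names; vram_gb=None means no NVIDIA GPU is available."""
    if vram_gb is not None and vram_gb >= 4:
        models = ["large-v3-turbo", "large-v2"]
    elif vram_gb is not None and vram_gb >= 2:
        models = ["medium", "small"]
    else:
        models = ["small", "base", "tiny"]
    if english_only:
        # The large models have no ".en" variant, so they are left unchanged.
        models = [f"{m}.en" if m in HAS_EN_VARIANT else m for m in models]
    return models


print(recommend_models(6))                       # ['large-v3-turbo', 'large-v2']
print(recommend_models(3, english_only=True))    # ['medium.en', 'small.en']
print(recommend_models(None))                    # ['small', 'base', 'tiny']
```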
Access via tray icon → Transcriptions.
- Lists all transcriptions from the current session, newest first.
- Each entry shows the transcribed text, timestamp and transcription duration.
- The text in each entry is selectable: click and drag to select, then use Ctrl+C or the Copy to clipboard button at the bottom.
- History is kept in memory only and is cleared when the app exits.
- The maximum number of stored entries is controlled by the History size setting.
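A bounded, newest-first, in-memory history like the one described above can be modelled with a fixed-length double-ended queue. This Python sketch only illustrates the behavior; the class and field names are hypothetical and the app's actual (C#) implementation may differ.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HistoryEntry:
    text: str
    timestamp: datetime
    duration_seconds: float


class SessionHistory:
    """In-memory history that keeps at most max_size entries, newest first."""

    def __init__(self, max_size: int):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._entries = deque(maxlen=max_size)

    def add(self, entry: HistoryEntry) -> None:
        # appendleft puts the newest entry first; when the deque is full,
        # the oldest entry is dropped from the opposite end.
        self._entries.appendleft(entry)

    def entries(self) -> list:
        return list(self._entries)


history = SessionHistory(max_size=2)
for text in ("first", "second", "third"):
    history.add(HistoryEntry(text, datetime.now(), 1.0))
print([e.text for e in history.entries()])  # ['third', 'second']
```

Nothing is written to disk, so the list disappears when the process ends, matching the session-only behavior.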
WhisperWriter lives in the system tray (notification area, bottom-right of taskbar).
| Action | Result |
|---|---|
| Left double-click on tray icon | Show / bring the floating widget to front |
| Right-click on tray icon | Open context menu |
| Context menu → About WhisperWriter | Show the About window |
| Context menu → Transcriptions | Open transcription history |
| Context menu → Settings | Open settings |
| Context menu → Exit | Quit the application |
Closing the floating widget does not exit the app — it continues running in the tray.
Application logs are written to the logs\ folder next to WhisperWriter.exe.
- File pattern: logs\whisperwriter-YYYYMMDD.log
- Logs are retained for 14 days, then automatically deleted.
- All errors, warnings and key events (model loading, transcription start/end, exceptions) are logged.
If the app behaves unexpectedly, check the latest log file for details.
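Because the date in the file name is written as YYYYMMDD, sorting the names alphabetically also sorts them chronologically. The Python sketch below uses that to build the name for a given day and to pick the most recent log; both helper names are hypothetical.

```python
from datetime import date
from typing import Iterable, Optional

PREFIX = "whisperwriter-"
SUFFIX = ".log"


def log_file_name(day: date) -> str:
    """Build the log file name for a given day, following the documented pattern."""
    return f"{PREFIX}{day:%Y%m%d}{SUFFIX}"


def latest_log(file_names: Iterable[str]) -> Optional[str]:
    """Return the newest log file name, or None if no name matches the pattern."""
    logs = sorted(n for n in file_names if n.startswith(PREFIX) and n.endswith(SUFFIX))
    return logs[-1] if logs else None


print(log_file_name(date(2025, 1, 31)))  # whisperwriter-20250131.log
print(latest_log(["whisperwriter-20250130.log", "whisperwriter-20250131.log", "notes.txt"]))
# whisperwriter-20250131.log
```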
WhisperWriter stores ETA timing data in a local SQLite database:
- File: llms\eta-time-stats.db
- The database is used only to improve the ETA countdown.
- It stores timing samples per model and per detected runtime environment.
- The environment fingerprint includes CPU, all detected GPUs (stored as a sorted array), RAM, OS version, whether Whisper is running on CPU or GPU, CUDA version, Whisper thread count, AC/battery power, and power-saver state.
- For a new ETA estimate, WhisperWriter prefers past transcriptions with a similar audio length (roughly ±30%, then ±50%, then fallback to all samples for the same model/environment).
- ETA is shown only after at least one previous matching sample exists, so it starts appearing from the second matching transcription onward.
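The sample-selection order described above (±30%, then ±50%, then all samples for the same model and environment) can be sketched in Python as follows. How WhisperWriter turns the chosen samples into a number is not stated here, so the median of the time-per-audio-second ratio is an assumption of this sketch, as are all names.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Sample:
    audio_seconds: float          # length of the recorded audio
    transcription_seconds: float  # how long the transcription took


def pick_samples(samples: list, audio_seconds: float) -> list:
    """Prefer samples within ±30% of the audio length, then ±50%, then all of them."""
    usable = [s for s in samples if s.audio_seconds > 0]
    for tolerance in (0.30, 0.50):
        low = audio_seconds * (1 - tolerance)
        high = audio_seconds * (1 + tolerance)
        close = [s for s in usable if low <= s.audio_seconds <= high]
        if close:
            return close
    return usable


def estimate_eta(samples: list, audio_seconds: float) -> Optional[float]:
    """Return an ETA in seconds, or None when no matching sample exists yet."""
    chosen = pick_samples(samples, audio_seconds)
    if not chosen:
        return None
    # Assumption: scale the median processing-time ratio by the new audio length.
    ratio = median(s.transcription_seconds / s.audio_seconds for s in chosen)
    return ratio * audio_seconds


past = [Sample(10, 2), Sample(100, 30)]
print(estimate_eta([], 11))      # None: no ETA on the first transcription
print(estimate_eta(past, 11))    # uses only the 10-second sample (within ±30%)
print(estimate_eta(past, 60))    # no close sample, so falls back to all samples
```

The samples passed in are assumed to be pre-filtered to the current model and environment fingerprint.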
- No internet connection required after the model is downloaded (first run only).
- No telemetry, no analytics, no tracking of any kind.
- Audio is recorded only while the hotkey is held and is discarded immediately after transcription.
- Audio data never leaves your computer — transcription runs entirely locally using whisper.cpp.
- Hotkey is fixed to Left Ctrl + Left Win and cannot be changed through the UI yet.
- Only one microphone is supported (Windows default input device). Device selection is not available yet.
- History is session-only — transcriptions are not persisted to disk.
- The ETA countdown during transcription is an estimate based on a fixed factor calibrated for NVIDIA Quadro T2000 + large-v2 model. It may be inaccurate on different hardware.
- On very short recordings (under ~1 second) Whisper may produce empty or noisy output.
- .NET 8 SDK (or any newer .NET SDK — 9, 10, … all work)
- Windows 10/11
- Visual Studio 2022 (optional, for IDE support)
- CUDA Toolkit 11, 12 or 13 (optional, for GPU acceleration — not needed to compile)
After cloning, simply double-click setup-dev.bat in the repository root.
The script performs all first-time setup steps automatically:
- Verifies a compatible .NET SDK (>= 8) is installed.
- Detects the locally installed CUDA Toolkit (versions 11, 12, 13) and extracts the runtime version.
- Runs dotnet restore.
- Copies the CUDA runtime DLLs (cudart64_N.dll, cublas64_N.dll, cublasLt64_N.dll) to bin\Debug\runtimes\cuda\win-x64.
- If the detected CUDA path differs from the one stored in WhisperWriter.csproj, offers to update it.
- Checks for a Whisper model in llms\ and offers to launch download-models.ps1 if none is found.
If CUDA is not installed the script skips steps 4–5 and continues — the app will run on CPU.
# Clone the repository
git clone https://github.com/tomFlidr/whisper-writer.git
cd whisper-writer
# One-time setup (NuGet restore, CUDA DLL copy, model download prompt)
.\setup-dev.bat
# Build
dotnet build WhisperWriter.csproj -c Debug
# Run
.\bin\Debug\net8.0-windows\WhisperWriter.exe
Before building, make sure WhisperWriter.exe is not running, because the output file will be locked by the running process.
The MSBuild target CopyCudaRuntimeDlls in WhisperWriter.csproj automatically derives the CUDA major version from the CudaBinDir path (e.g. …\CUDA\v13.2\bin\x64 → version 13). The correct DLL names are assembled at build time — no manual version editing is needed. setup-dev.ps1 updates CudaBinDir in the project file to match whatever CUDA version is installed on the machine.
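To make the version derivation concrete, here is a Python sketch of the same idea: extract the major version from a vMAJOR.MINOR path segment and assemble the DLL names from the cudart64_N.dll, cublas64_N.dll, cublasLt64_N.dll pattern quoted earlier. It is not the MSBuild target itself; the function names and the example path C:\CUDA\v13.2\bin\x64 are hypothetical, and real toolkit file names may deviate from the simple N pattern for some versions.

```python
import re


def cuda_major_version(cuda_bin_dir: str) -> int:
    """Extract the major version from a path segment such as 'v13.2'."""
    match = re.search(r"[\\/]v(\d+)\.\d+(?:[\\/]|$)", cuda_bin_dir)
    if match is None:
        raise ValueError(f"no CUDA version segment found in {cuda_bin_dir!r}")
    return int(match.group(1))


def cuda_dll_names(major: int) -> list:
    """Assemble DLL names following the documented *_N.dll pattern."""
    return [f"cudart64_{major}.dll", f"cublas64_{major}.dll", f"cublasLt64_{major}.dll"]


# Hypothetical example path, used only for illustration.
major = cuda_major_version(r"C:\CUDA\v13.2\bin\x64")
print(major)                  # 13
print(cuda_dll_names(major))  # ['cudart64_13.dll', 'cublas64_13.dll', 'cublasLt64_13.dll']
```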
| Package | Version | Purpose |
|---|---|---|
| Whisper.net | 1.9.0 | Whisper model inference |
| Whisper.net.Runtime | 1.9.0 | Native whisper.cpp runtime |
| Whisper.net.Runtime.Cuda | 1.9.0 | CUDA GPU acceleration |
| NAudio | 2.2.1 | Microphone capture |
| Microsoft.Data.Sqlite | 8.0.0 | Local ETA timing statistics database |
| System.Management | 8.0.0 | Windows hardware discovery for ETA environment fingerprinting |
| Serilog | 4.3.0 | Logging |
| System.Text.Json | 8.0.5 | Settings serialization |
Releases are built and published automatically via GitHub Actions when a version tag is pushed. No other manual steps are needed.
# Tag the current commit and push the tag
git tag v1.2.3
git push origin v1.2.3
GitHub Actions will automatically:
- Build the project for Windows x64 (with CUDA support) and Windows x86 (CPU only).
- Pack each build into a ZIP archive.
- Create a GitHub Release with auto-generated release notes and attach both ZIPs.
| Archive | Runtime | GPU |
|---|---|---|
| WhisperWriter-v1.x.x-win-x64.zip | Windows 64-bit | CUDA (NVIDIA) + CPU fallback |
| WhisperWriter-v1.x.x-win-x86.zip | Windows 32-bit | CPU only |
Pre-releases: if the tag name contains a hyphen (e.g. v1.0.0-beta), the GitHub Release is automatically marked as a pre-release.
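The tag conventions can be summarized in a few lines of Python. This is only an illustration of the naming and pre-release rule described here, not the GitHub Actions workflow; both function names are hypothetical.

```python
def is_prerelease(tag: str) -> bool:
    """A tag containing a hyphen (e.g. v1.0.0-beta) marks a pre-release."""
    return "-" in tag


def archive_names(tag: str) -> list:
    """ZIP names for the x64 (CUDA + CPU fallback) and x86 (CPU only) builds."""
    return [f"WhisperWriter-{tag}-win-x64.zip", f"WhisperWriter-{tag}-win-x86.zip"]


print(is_prerelease("v1.2.3"))       # False
print(is_prerelease("v1.0.0-beta"))  # True
print(archive_names("v1.2.3"))
# ['WhisperWriter-v1.2.3-win-x64.zip', 'WhisperWriter-v1.2.3-win-x86.zip']
```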
Signing: Authenticode signing and strong-name signing are skipped in CI (.pfx/.snk are not committed to the repository). Release builds are unsigned; local Debug builds are signed if the key files are present.
After every successful build that changes user-facing behavior, update this README if the changes affect installation steps, settings, model list, hotkeys, UI navigation or any other section described here.
The internal developer/AI context is maintained separately in .github/copilot-instructions.md.