Hi everyone 👋
We've been exploring integrating Microsoft Foundry Local as the speech-to-text backend for Handy, and we'd love to share what we've learned and collaborate with the community on taking this further.
## The Problem Today
Handy currently maintains an inference pipeline with 8 different engine backends (whisper-cpp, Parakeet, Moonshine, SenseVoice, GigaAM, Canary, Cohere), each with its own loading code, inference parameters, and platform-specific GPU acceleration paths (Metal on macOS, Vulkan on Linux, DirectML on Windows). On top of that, there's a custom HTTP model download system with resumable transfers, SHA256 verification, and tar.gz extraction, plus a separate Silero VAD model for voice activity detection.
This complexity is the root cause of many of the crash bugs the community has reported: each engine has its own hardware detection, its own failure modes, and its own quirks. That's a lot of surface area for bugs.
## What Foundry Local Solves
Foundry Local is a cross-platform end-to-end local AI runtime — a ~20MB SDK that handles model acquisition, hardware acceleration, and inference via ONNX Runtime. It has a Rust SDK that maps cleanly to Handy's architecture.
Here's what it addresses:
| Problem | Before | With Foundry Local |
| --- | --- | --- |
| Engine complexity | 8 engine backends, ~2,500 lines of inference code | Single SDK, ~500 lines |
| GPU crashes | Manual Metal/Vulkan/DirectML/CUDA configuration | Automatic hardware detection (CPU/GPU/NPU) |
| Model downloads | Custom HTTP + SHA256 + tar.gz extraction | Built-in catalog with caching |
| Voice detection | Separate Silero VAD model (~15MB ONNX) | Handled internally by Foundry Local |
| CPU compatibility | AVX2/FMA3 crashes on older CPUs | Foundry Local handles feature detection |
| Configuration | Manual accelerator settings in UI | Zero configuration needed |
## Proof of Concept
We built a working integration on a feature/foundry-local branch that replaces the entire inference pipeline. The changes:
- 16 files changed, 2,659 lines removed, 466 lines added (an ~80% code reduction in the core pipeline)
- Removed all 8 engine-specific code paths in favor of one `FoundryManager`
- Removed GPU/accelerator settings from the UI entirely; this is handled natively by Foundry Local
- Model catalog comes from Foundry Local (currently Whisper variants: tiny, base, small, medium, large-v3-turbo)
- Audio capture (cpal + resampling) stays the same — captured audio is written to a temp WAV and passed to Foundry Local's `AudioClient`
- Transcription works end-to-end: record → WAV → Foundry Local → text → paste
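Since captured audio is handed off as a temp WAV file, the glue between cpal and the SDK is mostly a header-writing exercise. As a rough sketch (pure `std`, independent of the actual Handy or Foundry Local code), a minimal mono 16-bit PCM WAV writer could look like:

```rust
use std::fs::File;
use std::io::Write;

/// Write mono 16-bit PCM samples as a minimal RIFF/WAVE file.
/// Illustrative only; Handy's real pipeline may use a WAV crate instead.
fn write_wav(path: &str, sample_rate: u32, samples: &[i16]) -> std::io::Result<()> {
    let data_len = (samples.len() * 2) as u32;
    let mut f = File::create(path)?;
    // RIFF header
    f.write_all(b"RIFF")?;
    f.write_all(&(36 + data_len).to_le_bytes())?;
    f.write_all(b"WAVE")?;
    // fmt chunk: PCM, mono, 16-bit
    f.write_all(b"fmt ")?;
    f.write_all(&16u32.to_le_bytes())?; // fmt chunk size
    f.write_all(&1u16.to_le_bytes())?; // audio format: PCM
    f.write_all(&1u16.to_le_bytes())?; // channels: mono
    f.write_all(&sample_rate.to_le_bytes())?;
    f.write_all(&(sample_rate * 2).to_le_bytes())?; // byte rate
    f.write_all(&2u16.to_le_bytes())?; // block align
    f.write_all(&16u16.to_le_bytes())?; // bits per sample
    // data chunk
    f.write_all(b"data")?;
    f.write_all(&data_len.to_le_bytes())?;
    for s in samples {
        f.write_all(&s.to_le_bytes())?;
    }
    Ok(())
}
```

The resulting file is exactly 44 header bytes plus the raw sample data, which is all the batch transcription path needs.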
## Where We'd Like to Collaborate

### 1. Expanding the Model Catalog
Foundry Local currently offers Whisper variants for speech-to-text. We'd love to see:
- Parakeet models added to the Foundry Local catalog — Parakeet V3 is one of Handy's most popular models for European language support
- Streaming/real-time models — we have recently done a lot of work on real-time models based on an optimized version of nemotron-speech-streaming-en-0.6b, reducing model size by 4x and increasing speed by 3.6x without impacting WER
- Multilingual models with broader language coverage
- A wider breadth of hardware-specific models: the current models target CPU and CUDA, but we are working on expanding to WebGPU (which compiles to Metal on macOS) and NPU
### 2. Streaming Transcription
Foundry Local's `AudioClient` supports both batch (`transcribe()`) and streaming output (`transcribe_streaming()`) modes. We recently created a LiveTranscription API that does true streaming of inputs (~300ms latency) using the nemotron model. Our integration prototype currently uses batch mode (record → stop → transcribe). We'd like to explore:
- Live transcription — true streaming
- Partial result display — showing text as it's being transcribed for immediate feedback
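For partial result display, one common UI pattern (sketched here from first principles, not taken from the Foundry Local API) is to append only the new suffix of each growing hypothesis, and fall back to a full redraw when the model revises earlier words:

```rust
/// Decide what to paint when a new partial hypothesis arrives.
/// Returns Ok(suffix) if `next` extends `prev` (just append the suffix),
/// or Err(next) if the model revised earlier text (redraw the whole line).
/// Illustrative only; names and shapes won't match the real SDK types.
fn display_update<'a>(prev: &str, next: &'a str) -> Result<&'a str, &'a str> {
    match next.strip_prefix(prev) {
        Some(suffix) => Ok(suffix),
        None => Err(next),
    }
}
```

In the append case the UI stays stable as text streams in; a revision only costs one redraw, which keeps the perceived latency close to the model's ~300ms.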
### 3. Production Bundling
One open question: Foundry Local's native libraries need to be bundled with the app for production distribution. We need to figure out the best approach for:
- macOS `.app` bundles and `.dmg` distribution
- Windows MSI/NSIS installers
- Linux AppImage/deb/rpm packages

> **Note**
> Foundry Local is bundled with the app, which means users do not need to download or install another tool to use Handy.
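Since Handy is a Tauri app, one possible starting point is shipping the Foundry Local native libraries as bundled resources via `tauri.conf.json`, which the bundler copies into all three platform packages. The `bin/foundry/*` path below is a placeholder, not the actual layout:

```json
{
  "bundle": {
    "active": true,
    "resources": ["bin/foundry/*"]
  }
}
```

Whether resources, sidecar binaries, or a platform-specific approach fits best is exactly the open question here.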
### 4. Hardware Variant Selection
Foundry Local auto-selects the best hardware variant (CPU/GPU/NPU) for each model. In the future, as GPU variants become available for Whisper, it would be great to:
- Show users which variant is being used (e.g., "Whisper Small — GPU accelerated")
- Allow advanced users to override the auto-selection
- Support NPU acceleration on compatible hardware (Qualcomm, Intel)
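Surfacing the chosen variant could be as simple as mapping whatever device type the SDK reports to a user-facing label. The enum and function below are illustrative; the names won't match the actual Foundry Local types:

```rust
/// Hypothetical stand-in for the device type Foundry Local selects.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Accelerator {
    Cpu,
    Gpu,
    Npu,
}

/// Format a user-facing label like "Whisper Small — GPU accelerated".
fn variant_label(model: &str, acc: Accelerator) -> String {
    let suffix = match acc {
        Accelerator::Cpu => "CPU",
        Accelerator::Gpu => "GPU accelerated",
        Accelerator::Npu => "NPU accelerated",
    };
    format!("{model} — {suffix}")
}
```

The same enum could back an "override auto-selection" dropdown for advanced users.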
### 5. Add SLMs for post-processing

Foundry Local supports a number of SLMs (Qwen family, DeepSeek, Mistral, Phi, etc.) through the same SDK. For example:
```rust
let stream_messages: Vec<ChatCompletionRequestMessage> = vec![
    ChatCompletionRequestSystemMessage::from("You are a helpful assistant.").into(),
    ChatCompletionRequestUserMessage::from("Explain the borrow checker in two sentences.").into(),
];

println!("\n--- Streaming completion ---");
print!("Assistant: ");
let mut stream = client
    .complete_streaming_chat(&stream_messages, None)
    .await?;
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if let Some(choice) = chunk.choices.first() {
        if let Some(ref content) = choice.delta.content {
            print!("{content}");
            io::stdout().flush().ok();
        }
    }
}
println!("\n");
```
There are probably some interesting scenarios where an SLM could handle post-processing, such as punctuation and grammar restoration, denoising, or domain-specific fix-ups.
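As a concrete (hypothetical) example of the punctuation/grammar idea, the raw transcript could be wrapped in a cleanup prompt before going through the same chat API shown above. The prompt wording here is ours, not from the Foundry Local docs:

```rust
/// Build (system, user) prompt strings for SLM-based transcript cleanup.
/// Prompt wording is illustrative only.
fn cleanup_prompt(raw_transcript: &str) -> (String, String) {
    let system = "You restore punctuation and fix grammar in dictated text. \
                  Return only the corrected text, nothing else."
        .to_string();
    let user = format!("Transcript: {raw_transcript}");
    (system, user)
}
```

The two strings would then become the system and user messages of a `complete_streaming_chat` call, with the streamed output replacing the raw transcript before pasting.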
## Try It Out
If you want to experiment with the integration:
```sh
git checkout feature/foundry-local
bun install
CMAKE_POLICY_VERSION_MINIMUM=3.5 bun run tauri dev
```
The branch has a working implementation with model download, selection, transcription, and tray menu support. It currently lives in a fork (https://github.com/samuel100/Handy/tree/feature/foundry-local); happy to create a PR. We'd love feedback on:
- Transcription quality compared to the current whisper-cpp backend
- Any platform-specific issues
- Model catalog preferences — which models matter most to you?
## Let's Collaborate
We think Foundry Local could significantly simplify Handy's codebase, eliminate an entire class of GPU/platform crashes, and open the door to new capabilities like NPU acceleration and streaming transcription. We'd love to work with both the Handy community and the Foundry Local team to make this happen.
What do you think? Would this be a direction you'd want to see Handy go?