Hi everyone 👋
We've been exploring integrating Microsoft Foundry Local as the speech-to-text backend for Handy, and we'd love to share what we've learned and collaborate with the community on taking this further.
## The Problem Today
Handy currently maintains an inference pipeline with 8 different engine backends (whisper-cpp, Parakeet, Moonshine, SenseVoice, GigaAM, Canary, Cohere), each with its own loading code, inference parameters, and platform-specific GPU acceleration paths (Metal on macOS, Vulkan on Linux, DirectML on Windows). On top of that, there's a custom HTTP model download system with resumable transfers, SHA256 verification, and tar.gz extraction, plus a separate Silero VAD model for voice activity detection.
This complexity is the root cause of many of the crash bugs the community has reported: each engine has its own hardware detection, its own failure modes, and its own quirks. That's a lot of surface area for bugs.
## What Foundry Local Solves
Foundry Local is a cross-platform end-to-end local AI runtime — a ~20MB SDK that handles model acquisition, hardware acceleration, and inference via ONNX Runtime. It has a Rust SDK that maps cleanly to Handy's architecture.
Here's what it addresses:
| Problem | Before | With Foundry Local |
| --- | --- | --- |
| Engine complexity | 8 engine backends, ~2,500 lines of inference code | Single SDK, ~500 lines |
| GPU crashes | Manual Metal/Vulkan/DirectML/CUDA configuration | Automatic hardware detection (CPU/GPU/NPU) |
| Model downloads | Custom HTTP + SHA256 + tar.gz extraction | Built-in catalog with caching |
| Voice detection | Separate Silero VAD model (~15MB ONNX) | Handled internally by Foundry Local |
| CPU compatibility | AVX2/FMA3 crashes on older CPUs | Foundry Local handles feature detection |
| Configuration | Manual accelerator settings in UI | Zero configuration needed |
## Proof of Concept
We built a working integration on a feature/foundry-local branch that replaces the entire inference pipeline. The changes:
- 16 files changed, 2,659 lines removed, 466 lines added (an ~80% code reduction in the core pipeline)
- Removed all 8 engine-specific code paths in favor of one `FoundryManager`
- Removed GPU/accelerator settings from the UI entirely; this is handled natively by Foundry Local
- Model catalog comes from Foundry Local (currently Whisper variants: tiny, base, small, medium, large-v3-turbo)
- Audio capture (cpal + resampling) stays the same — captured audio is written to a temp WAV and passed to Foundry Local's `AudioClient`
- Transcription works end-to-end: record → WAV → Foundry Local → text → paste
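Since captured audio is handed off as a temp WAV file, the glue between cpal and the SDK is mostly a header-writing exercise. As a rough sketch (pure `std`, independent of the actual Handy or Foundry Local code), a minimal mono 16-bit PCM WAV writer could look like:

```rust
use std::fs::File;
use std::io::Write;

/// Write mono 16-bit PCM samples as a minimal RIFF/WAVE file.
/// Illustrative only; Handy's real pipeline may use a WAV crate instead.
fn write_wav(path: &str, sample_rate: u32, samples: &[i16]) -> std::io::Result<()> {
    let data_len = (samples.len() * 2) as u32;
    let mut f = File::create(path)?;
    // RIFF header
    f.write_all(b"RIFF")?;
    f.write_all(&(36 + data_len).to_le_bytes())?;
    f.write_all(b"WAVE")?;
    // fmt chunk: PCM, mono, 16-bit
    f.write_all(b"fmt ")?;
    f.write_all(&16u32.to_le_bytes())?; // fmt chunk size
    f.write_all(&1u16.to_le_bytes())?; // audio format: PCM
    f.write_all(&1u16.to_le_bytes())?; // channels: mono
    f.write_all(&sample_rate.to_le_bytes())?;
    f.write_all(&(sample_rate * 2).to_le_bytes())?; // byte rate
    f.write_all(&2u16.to_le_bytes())?; // block align
    f.write_all(&16u16.to_le_bytes())?; // bits per sample
    // data chunk
    f.write_all(b"data")?;
    f.write_all(&data_len.to_le_bytes())?;
    for s in samples {
        f.write_all(&s.to_le_bytes())?;
    }
    Ok(())
}
```

The resulting file is exactly 44 header bytes plus the raw sample data, which is all the batch transcription path needs.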
## Where We'd Like to Collaborate

### 1. Expanding the Model Catalog
Foundry Local currently offers Whisper variants for speech-to-text. We'd love to see:
- Parakeet models added to the Foundry Local catalog — Parakeet V3 is one of Handy's most popular models for European language support
- Streaming/real-time models — we have recently done a lot of work on real-time models based on an optimized version of nemotron-speech-streaming-en-0.6b, reducing model size by 4x and increasing speed by 3.6x without impacting WER
- Multilingual models with broader language coverage
- A wider breadth of hardware-specific models: the current models target CPU and CUDA, but we are working on expanding to WebGPU (which compiles to Metal on macOS) and NPU
### 2. Streaming Transcription
Foundry Local's `AudioClient` supports both batch (`transcribe()`) and streaming output (`transcribe_streaming()`) modes. We recently created a LiveTranscription API that does true streaming of inputs (~300ms latency) using the nemotron model. Our integration prototype currently uses batch mode (record → stop → transcribe). We'd like to explore:
- Live transcription — true streaming
- Partial result display — showing text as it's being transcribed for immediate feedback
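For partial result display, one common UI pattern (sketched here from first principles, not taken from the Foundry Local API) is to append only the new suffix of each growing hypothesis, and fall back to a full redraw when the model revises earlier words:

```rust
/// Decide what to paint when a new partial hypothesis arrives.
/// Returns Ok(suffix) if `next` extends `prev` (just append the suffix),
/// or Err(next) if the model revised earlier text (redraw the whole line).
/// Illustrative only; names and shapes won't match the real SDK types.
fn display_update<'a>(prev: &str, next: &'a str) -> Result<&'a str, &'a str> {
    match next.strip_prefix(prev) {
        Some(suffix) => Ok(suffix),
        None => Err(next),
    }
}
```

In the append case the UI stays stable as text streams in; a revision only costs one redraw, which keeps the perceived latency close to the model's ~300ms.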
### 3. Production Bundling
One open question: Foundry Local's native libraries need to be bundled with the app for production distribution. We need to figure out the best approach for:
- macOS `.app` bundles and `.dmg` distribution
- Windows MSI/NSIS installers
- Linux AppImage/deb/rpm packages

> **Note**
> Foundry Local is bundled with the app, which means users do not need to download or install another tool to use Handy.
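Since Handy is a Tauri app, one possible starting point is shipping the Foundry Local native libraries as bundled resources via `tauri.conf.json`, which the bundler copies into all three platform packages. The `bin/foundry/*` path below is a placeholder, not the actual layout:

```json
{
  "bundle": {
    "active": true,
    "resources": ["bin/foundry/*"]
  }
}
```

Whether resources, sidecar binaries, or a platform-specific approach fits best is exactly the open question here.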
### 4. Hardware Variant Selection
Foundry Local auto-selects the best hardware variant (CPU/GPU/NPU) for each model. In the future, as GPU variants become available for Whisper, it would be great to:
- Show users which variant is being used (e.g., "Whisper Small — GPU accelerated")
- Allow advanced users to override the auto-selection
- Support NPU acceleration on compatible hardware (Qualcomm, Intel)
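Surfacing the chosen variant could be as simple as mapping whatever device type the SDK reports to a user-facing label. The enum and function below are illustrative; the names won't match the actual Foundry Local types:

```rust
/// Hypothetical stand-in for the device type Foundry Local selects.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Accelerator {
    Cpu,
    Gpu,
    Npu,
}

/// Format a user-facing label like "Whisper Small — GPU accelerated".
fn variant_label(model: &str, acc: Accelerator) -> String {
    let suffix = match acc {
        Accelerator::Cpu => "CPU",
        Accelerator::Gpu => "GPU accelerated",
        Accelerator::Npu => "NPU accelerated",
    };
    format!("{model} — {suffix}")
}
```

The same enum could back an "override auto-selection" dropdown for advanced users.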
### 5. Add SLMs for post-processing

Foundry Local supports a number of SLMs (Qwen family, DeepSeek, Mistral, Phi, etc.) through the same SDK. For example:
```rust
let stream_messages: Vec<ChatCompletionRequestMessage> = vec![
    ChatCompletionRequestSystemMessage::from("You are a helpful assistant.").into(),
    ChatCompletionRequestUserMessage::from("Explain the borrow checker in two sentences.").into(),
];

println!("\n--- Streaming completion ---");
print!("Assistant: ");
let mut stream = client
    .complete_streaming_chat(&stream_messages, None)
    .await?;
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if let Some(choice) = chunk.choices.first() {
        if let Some(ref content) = choice.delta.content {
            print!("{content}");
            io::stdout().flush().ok();
        }
    }
}
println!("\n");
```
There are probably some interesting scenarios where an SLM could handle post-processing, such as punctuation and grammar restoration, denoising, or domain-specific fix-ups.
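As a concrete (hypothetical) example of the punctuation/grammar idea, the raw transcript could be wrapped in a cleanup prompt before going through the same chat API shown above. The prompt wording here is ours, not from the Foundry Local docs:

```rust
/// Build (system, user) prompt strings for SLM-based transcript cleanup.
/// Prompt wording is illustrative only.
fn cleanup_prompt(raw_transcript: &str) -> (String, String) {
    let system = "You restore punctuation and fix grammar in dictated text. \
                  Return only the corrected text, nothing else."
        .to_string();
    let user = format!("Transcript: {raw_transcript}");
    (system, user)
}
```

The two strings would then become the system and user messages of a `complete_streaming_chat` call, with the streamed output replacing the raw transcript before pasting.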
## Try It Out
If you want to experiment with the integration:
```sh
git checkout feature/foundry-local
bun install
CMAKE_POLICY_VERSION_MINIMUM=3.5 bun run tauri dev
```
The branch has a working implementation with model download, selection, transcription, and tray menu support. It currently lives in a fork (https://github.com/samuel100/Handy/tree/feature/foundry-local); happy to create a PR. We'd love feedback on:
- Transcription quality compared to the current whisper-cpp backend
- Any platform-specific issues
- Model catalog preferences — which models matter most to you?
## Let's Collaborate
We think Foundry Local could significantly simplify Handy's codebase, eliminate an entire class of GPU/platform crashes, and open the door to new capabilities like NPU acceleration and streaming transcription. We'd love to work with both the Handy community and the Foundry Local team to make this happen.
What do you think? Would this be a direction you'd want to see Handy go?