libretro-common/audio/audio_mixer: fix 32 KiB stack overflow in FLAC mixer path

LibretroAdmin · LibretroAdmin · commit 552cf85e51b4 · 2026-04-24T15:35:26.000+02:00
audio_mixer_mix_flac's drflac read was requesting the wrong
unit, causing drflac to write twice as many floats into
temp_buffer as the buffer holds.

=== The bug ===

    float temp_buffer[AUDIO_MIXER_TEMP_BUFFER];  /* 8192 floats */
    ...
    temp_samples = (unsigned)drflac_read_pcm_frames_f32(
          voice-&gt;types.flac.stream,
          AUDIO_MIXER_TEMP_BUFFER,    /* asks for 8192 FRAMES */
          temp_buffer);

drflac_read_pcm_frames_f32's second parameter is framesToRead,
and it writes framesToRead * channels floats into the output
buffer.  For a stereo FLAC (by far the most common case) that's
2 * 8192 = 16384 floats - a 32 KiB stack overflow clobbering
whatever sat above temp_buffer on the stack (usually struct
resampler_data info, and on top of that the saved frame
pointer / return address).

Reachable on every mix tick while a stereo FLAC voice is
active in the audio mixer:

  - rcheevos unlock sound (common; plays on every achievement)
  - menu BGM (plays continuously when set to a FLAC asset)
  - user-loaded audio via the audio mixer playlist
  - any core that triggers mixer-side audio

The crash symptom varies - could be corrupted resampler state,
a jumped-to-NULL return when the saved frame pointer is
clobbered, or 'just' a security-relevant stack-memory
corruption (the input bytes are FLAC-decoded data, so
attacker-controllable via a crafted FLAC file played through
the mixer).

=== Downstream confusion under the old convention ===

Even ignoring the stack overflow, the code below the drflac
call treats temp_samples as a float count:

    info.input_frames = temp_samples / 2;    /* frames */
    ...
    memcpy(voice-&gt;types.flac.buffer, temp_buffer,
           temp_samples * sizeof(float));     /* float count */

Under the old convention where temp_samples = frame count (the
drflac return value as-is), the '/ 2' is wrong (input_frames
becomes half the actual frames) and the memcpy only copies
half the decoded bytes.  In practice the stack overflow
crashed the process before any of that mattered.

=== The fix: mirror the sibling MP3 mix path ===

audio_mixer_mix_mp3 already does this correctly:

    temp_samples = (unsigned)drmp3_read_f32(
          &amp;voice-&gt;types.mp3.stream,
          AUDIO_MIXER_TEMP_BUFFER / 2,    /* frames to read */
          temp_buffer) * 2;               /* convert to floats */

audio_mixer_mix_ogg also does this correctly:

    temp_samples = stb_vorbis_get_samples_float_interleaved(
          voice-&gt;types.ogg.stream, 2, temp_buffer,
          AUDIO_MIXER_TEMP_BUFFER) * 2;

Apply the same pattern to FLAC:

    temp_samples = (unsigned)drflac_read_pcm_frames_f32(
          voice-&gt;types.flac.stream,
          AUDIO_MIXER_TEMP_BUFFER / 2, temp_buffer) * 2;

Now:
  - Stereo FLAC: drflac writes 2 * (TEMP_BUFFER/2) = TEMP_BUFFER
    floats into temp_buffer - fits exactly.  temp_samples
    correctly holds the interleaved-float count, and the
    downstream '/ 2' / memcpy byte length are correct.
  - Stack overflow eliminated.

=== Mono / multi-channel FLAC ===

For mono FLAC the '* 2' overstates the float count by 2x, so
the memcpy / resampler reads uninitialised stack bytes past
the actual decoded data.  That matches the pre-existing
implicit stereo-only assumption baked into the mixer (voice-&gt;
types.flac.buf_samples = TEMP_BUFFER * ratio with no
per-channel adjustment, downstream mixing loop assumes stereo
interleaving).  The MP3 and OGG sibling paths have the same
limitation.

Proper mono / multi-channel support would need per-voice
channel count tracking, a channel-aware buf_samples
calculation, and a stereo-broadcast or downmix step for the
non-stereo cases.  Out of scope for this fix which is
specifically addressing the stack overflow.  Nothing in the
wild that I'm aware of plays a mono FLAC through the mixer -
the failure mode changes from 'stack-corrupting stereo crash
that everyone hits' to 'undefined-data mono output that no
one hits', which is a strict improvement.

=== Scope ===

Single-file, 1-line behavioural change (plus comment).  No
API changes, no header changes.  Voice lifecycle, buffer
allocation sizes, resampler setup - all unchanged.

Thread-safety: audio_mixer_mix_flac runs on the audio mixer
thread.  No new concurrency introduced.

Reachability: immediate on any stereo FLAC playback.  The
32 KiB write was unconditional per mix tick; the reason this
hasn't been reported as an obvious crash is that
AUDIO_MIXER_TEMP_BUFFER sits near the top of a typical stack
frame so the overflow often lands in unused stack pages
rather than corrupting live data.  Builds with stack
guards or shallower stacks (embedded targets, threads with
smaller stacks) are more likely to crash visibly.
diff --git a/libretro-common/audio/audio_mixer.c b/libretro-common/audio/audio_mixer.c
@@ -1202,7 +1202,42 @@ static void audio_mixer_mix_flac(float* buffer, size_t num_frames,
    if (voice->types.flac.position == voice->types.flac.samples)
    {
 again:
-      temp_samples = (unsigned)drflac_read_pcm_frames_f32( voice->types.flac.stream, AUDIO_MIXER_TEMP_BUFFER, temp_buffer);
+      /* drflac_read_pcm_frames_f32 takes a frame count and
+       * writes frame_count * channel_count floats into the
+       * output buffer.  Request at most AUDIO_MIXER_TEMP_BUFFER
+       * / 2 frames so a stereo FLAC fills temp_buffer[AUDIO_
+       * MIXER_TEMP_BUFFER] exactly, then multiply the return
+       * value by 2 to convert frame count to interleaved float
+       * count.  This matches the convention the downstream code
+       * (info.input_frames = temp_samples / 2, memcpy length
+       * of temp_samples * sizeof(float)) and the sibling mp3
+       * and ogg mix paths already use.
+       *
+       * Pre-patch this passed AUDIO_MIXER_TEMP_BUFFER as the
+       * frame count without the '/ 2' and stored the return as
+       * 'temp_samples' without the '* 2'.  For a stereo FLAC
+       * (by far the most common case) drflac wrote 2 *
+       * AUDIO_MIXER_TEMP_BUFFER = 16384 floats into a 8192-
+       * float stack buffer - a 32 KiB stack overflow.  Any
+       * stereo FLAC asset played through the mixer (cheevo
+       * unlock sounds, menu BGM, user-loaded content via the
+       * audio mixer playlist, etc.) corrupted the stack on
+       * every mix tick.  The downstream 'temp_samples / 2'
+       * and memcpy length were also off by 2x under the old
+       * convention, but the stack overflow hit first.
+       *
+       * For mono FLAC: the '* 2' overstates the float count by
+       * 2x, causing the memcpy / resampler to read
+       * uninitialised stack past the actual data.  That matches
+       * the pre-existing implicit stereo-only assumption of
+       * the FLAC/MP3/OGG mixer paths (voice->types.flac.
+       * buf_samples is sized as TEMP_BUFFER * ratio, with no
+       * per-channel adjustment) rather than introducing new
+       * mono handling here.  Fixing mono playback is a separate
+       * change. */
+      temp_samples = (unsigned)drflac_read_pcm_frames_f32(
+            voice->types.flac.stream,
+            AUDIO_MIXER_TEMP_BUFFER / 2, temp_buffer) * 2;
       if (temp_samples == 0)
       {
          if (voice->repeat)