VULKAN: Decline SW framebuffer when linear image has padded rowPitch

LibretroAdmin · LibretroAdmin · commit 0e2ea0666782 · 2026-04-25T02:29:28.000+02:00
When the linear-tiled VkImage backing the per-frame swapchain texture
ends up with a row pitch wider than width*bpp, vulkan_get_current_sw_framebuffer
now returns false instead of handing the core a directly mapped pointer.
The core falls back to its own tightly packed buffer, which vulkan_frame()
already uploads row-by-row via the existing slow path.
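The slow path copies each row of the core's tightly packed buffer into the mapped image at the image's own row pitch. Below is a minimal sketch of that row-by-row copy. It assumes a plain `memcpy` per row, and `copy_rows_to_pitched` and its parameters are hypothetical names for illustration, not the actual code in `vulkan_frame()`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not RetroArch code): copy a tightly packed image
 * (source pitch == width * bpp) into a destination whose rows are
 * dst_pitch bytes apart. Padding bytes at the end of each destination
 * row are left untouched. */
static void copy_rows_to_pitched(uint8_t *dst, size_t dst_pitch,
      const uint8_t *src, size_t width, size_t height, size_t bpp)
{
   size_t row_bytes = width * bpp;
   size_t y;

   /* A destination row must be able to hold a full source row. */
   assert(dst_pitch >= row_bytes);

   for (y = 0; y < height; y++)
      memcpy(dst + y * dst_pitch, src + y * row_bytes, row_bytes);
}
```

Copying per row rather than with one large `memcpy` is what makes the padded destination safe: the per-row padding is skipped instead of being overwritten with the start of the next source row.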

Why
---
RetroArch's pattern (host-write into a HOST_VISIBLE linear-tiled image at
vkGetImageSubresourceLayout's rowPitch, transition PREINITIALIZED -> GENERAL,
then sample) is fully spec-correct and works on every conformant Vulkan driver.
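The Vulkan specification defines where each texel of a linear image lives, based on the `VkSubresourceLayout` returned by `vkGetImageSubresourceLayout`. For a single-layer 2D image, the address reduces to `offset + y * rowPitch + x * texelSize`. Below is a minimal sketch of that computation. The helper name is hypothetical, and plain integers stand in for the Vulkan structs so the snippet compiles without the Vulkan headers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of texel (x, y) in a 2D, single-layer,
 * linear-tiled image. offset and row_pitch correspond to the fields of
 * VkSubresourceLayout (VkDeviceSize is a 64-bit unsigned integer). */
static uint64_t linear_texel_offset(uint64_t offset, uint64_t row_pitch,
      uint32_t x, uint32_t y, uint32_t width, uint32_t texel_size)
{
   assert(x < width);
   /* Rows must not overlap, so the pitch covers at least one full row. */
   assert(row_pitch >= (uint64_t)width * texel_size);
   return offset + (uint64_t)y * row_pitch + (uint64_t)x * texel_size;
}
```

Host writes in the direct-write fast path land at these addresses. For that reason the core has to be given `rowPitch` as its pitch, not `width * bpp`.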

On MoltenVK it doesn't always work. Linear images are backed by a buffer-
backed MTLTexture, and Apple requires bytesPerRow alignment of 64 bytes on
Apple GPUs (256 on the simulator). For widths whose tight pitch isn't
already aligned (e.g. the 2048 core at 376x444 XRGB8888, where 376*4 = 1504
bytes is padded up to 1536), host writes and GPU sampling take different
paths through MoltenVK's MVKImage, and the output shows a diagonal shear.
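The padding in the 2048 example comes from rounding the tight pitch up to the required alignment. Below is a small sketch of that arithmetic. `align_up` and `row_padding` are hypothetical helper names, and the 64- and 256-byte alignments are the figures quoted in this message:

```c
#include <assert.h>
#include <stddef.h>

/* Round value up to the next multiple of alignment (a power of two). */
static size_t align_up(size_t value, size_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Trailing padding bytes per row for a given width, bytes per pixel and
 * row alignment. Zero means the tight pitch is already aligned. */
static size_t row_padding(size_t width, size_t bpp, size_t alignment)
{
   size_t tight = width * bpp;
   return align_up(tight, alignment) - tight;
}
```

For 376 pixels of XRGB8888, both the 64-byte and the 256-byte rule give 1536 bytes, which is 32 bytes (8 pixels) of padding per row. A width of 320 (1280 bytes) needs no padding under either rule. As a purely hypothetical illustration: if one side used the padded pitch and the other the tight pitch, row y would be displaced by 32*y bytes, which would produce a diagonal shear. The commit does not state that this is exactly what happens inside MVKImage.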

The check
---
A pure runtime test: 'is rowPitch wider than width*bpp?'. On Mesa, NVIDIA,
AMD, ARM Mali, Qualcomm Adreno, etc., linear images with sampled+transfer_src
usage at retro-friendly widths report rowPitch == width*bpp, so the test is
false and the existing direct-write fast path is taken unchanged. Only
MoltenVK at awkward widths takes the fallback, paying one extra row-by-row
memcpy per frame (~40 MB/s at 60 fps for 376x444x4, which is negligible).
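The check and its cost can both be written out with plain integers. In the sketch below, `can_use_direct_write` and `fallback_bytes_per_second` are hypothetical names. The predicate uses `==`/`!=` as the patch does, which is equivalent to "not wider than" because a valid row pitch is never narrower than one full row:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when the mapped image can be handed to the core directly because its
 * rows carry no trailing padding. Mirrors the condition added to
 * vulkan_get_current_sw_framebuffer(), outside of RetroArch's types. */
static bool can_use_direct_write(size_t row_pitch, unsigned width, unsigned bpp)
{
   assert(row_pitch >= (size_t)width * bpp);
   return row_pitch == (size_t)width * bpp;
}

/* Extra bytes copied per second when the row-by-row fallback is taken:
 * one tightly packed frame per presented frame. */
static uint64_t fallback_bytes_per_second(unsigned width, unsigned height,
      unsigned bpp, unsigned fps)
{
   return (uint64_t)width * height * bpp * fps;
}
```

For 376x444x4 at 60 fps, this comes to 40,066,560 bytes per second, which matches the ~40 MB/s figure above.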

Fixes 2048 core rendering on iOS Vulkan.
diff --git a/gfx/drivers/vulkan.c b/gfx/drivers/vulkan.c
@@ -7340,6 +7340,26 @@ static bool vulkan_get_current_sw_framebuffer(void *data,
       }
    }
 
+   /* If the driver picked a row pitch wider than width*bpp for this linear
+    * image (i.e. the image has trailing per-row padding), decline the
+    * direct-write SW framebuffer and let the core fall back to its own
+    * tightly packed buffer. vulkan_frame() will then upload it row-by-row
+    * via the slow path. We hit this on MoltenVK / Apple GPUs, where
+    * buffer-backed MTLTextures require bytesPerRow to be aligned to 64 (or
+    * 256 in the simulator), so vkGetImageSubresourceLayout reports a
+    * padded rowPitch for "awkward" widths. Spec-correct host writes at the
+    * reported rowPitch *should* be readable by the GPU sampler at the same
+    * stride on any conformant driver, but in practice this is fragile on
+    * Apple platforms and produces sheared output. The check is a pure
+    * runtime test, so non-Apple drivers that report rowPitch == width*bpp
+    * (the overwhelming majority for retro-friendly widths) keep the
+    * direct-write fast path unchanged. */
+   {
+      unsigned bpp = vulkan_format_to_bpp(chain->texture.format);
+      if (chain->texture.stride != (size_t)framebuffer->width * bpp)
+         return false;
+   }
+
    framebuffer->data         = chain->texture.mapped;
    framebuffer->pitch        = chain->texture.stride;
    framebuffer->format       = vk->video.rgb32