Fix Warp band-mismatch crash on 4-band rasters with implicit alpha#41
Draft
josemaria-vilaplana wants to merge 2 commits intomasterfrom
Draft
Fix Warp band-mismatch crash on 4-band rasters with implicit alpha#41josemaria-vilaplana wants to merge 2 commits intomasterfrom
josemaria-vilaplana wants to merge 2 commits intomasterfrom
Conversation
When the source raster has photometric metadata that doesn't add up cleanly
(e.g. SamplesPerPixel != color channels + ExtraSamples), GDAL Warp implicitly
adds an alpha mask band to the destination. create_tile_ds was allocating
exactly src.RasterCount bands, so Warp raised:
RuntimeError: Destination dataset has N bands, but at least N+1 are needed
That worker crash also tripped a secondary AttributeError in create_metadata
when the parent process tried to dereference the missing stats result.
This patch:
- Adds extra_bands to create_tile_ds so callers can reserve room for the
alpha mask GDAL Warp may add.
- Adds band_count to read_raster_data_stats so the alpha band is excluded
from the parquet output (only the original data bands are persisted).
- Wraps the source-zoom Warp in warp_source_into_tile_with_alpha_fallback,
which retries once with extra_bands=1 on RuntimeError. The result is a
sticky needs_extra_band flag — set on first failure, propagated to every
subsequent tile of the same raster — so the retry runs at most once per
raster, never per-tile.
- Applies the same logic to the parallel _read_raster_worker.
Healthy rasters take the unchanged fast path. Affected rasters pay one extra
Warp call total, regardless of pyramid size.
Reproducer: any 4-band Byte GeoTIFF where Photometric is set to a single
color channel plus ExtraSamples set to Undefined for the remaining bands
(e.g. Gray + 3x Undefined), or any RGBA where Warp's alpha auto-detection
fires.
Tracking: sc-547263 (BBVA import support ticket).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builds a synthetic 4-band Byte GeoTIFF with photometric / SamplesPerPixel mismatch (1 Gray color channel + 3 Undefined ExtraSamples, noDataValue=255) and asserts that convert_to_raquet_files writes at least one data tile to the output parquet. The synthetic GeoTIFF builder runs in a spawn'd subprocess so osgeo never loads in the test parent process — pyarrow and osgeo collide on the 'file' filesystem registration when loaded in the same interpreter (apache/arrow#44696). Without the create_tile_ds extra_bands fix, the worker process crashes with `RuntimeError: Destination dataset has 4 bands, but at least 5 are needed` and the parent silently emits a parquet with the metadata row but zero data rows — assertion `len(table) > 1` catches that regression mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author
|
The CI failure here is in the Check formatting (ruff) step, not unit tests — the unit tests pass ✅. The 9 ruff errors are all in files this PR doesn't touch ( Categories:
Happy to fix these in a separate cleanup PR if you'd like, or land this one with a CI override and clean up afterwards. Wanted to flag rather than mix the lint cleanup into a bug fix. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When a source raster has photometric metadata that doesn't add up cleanly (e.g.
SamplesPerPixel != color channels + ExtraSamples, or RGBA inputs with ambiguous alpha tagging), GDAL Warp implicitly adds an alpha mask band to the destination.create_tile_dswas allocating exactlysrc.RasterCountbands, so Warp raised:The worker process died on that exception. Depending on raquet's version, the parent process either propagated
AttributeError: 'NoneType' object has no attribute '__dict__'(whencreate_metadatatried to read missing stats) or silently produced a parquet with the metadata row but zero data tiles — both end with no usable output.This was hit in production by a customer (CARTO/cloud-native) importing a 4-band Byte GeoTIFF whose
PhotometricwasGray(1 color channel) butSamplesPerPixelwas 4, with the remaining 3 bands taggedExtraSamples=Unspecified.Approach
Strategy: retry once with an extra band, then make the choice sticky for the rest of the raster. Detection by failure rather than upfront introspection — GDAL's alpha heuristics drift across versions, so destination band-count is the only signal that's stable.
create_tile_dsaccepts a newextra_bands: int = 0so callers can reserve one extra band for the alpha mask Warp may add.read_raster_data_statsaccepts a newband_count: int | None = Noneso the alpha band is excluded from the parquet output (only the original data bands are persisted).warp_source_into_tile_with_alpha_fallback, which catchesRuntimeErrorand retries once withextra_bands=1. The result returns aneeds_extra_bandflag the caller keeps sticky across tiles in the same raster — so the retry happens at most once per raster, never per-tile._read_raster_worker.Cost model
create_tile_dsallocatesRasterCountbands, Warp succeeds, done.RasterCount + 1from the start.Validation
tests/test_geotiff2raquet.py::TestGeotiff2Raquet::test_4band_byte_alpha_mismatch— new regression test that builds a synthetic 4-band Byte GeoTIFF matching the customer shape (Photometric=Gray + 3x Undefined,noDataValue=255) and asserts the parquet output contains at least one data tile. Verified that this test fails onmasterwithout the fix (parquet has only the metadata row,len(table) == 1) and passes with the fix (len(table) == 19, full pyramid).spawn'd subprocess so osgeo never loads in the test parent — pyarrow and osgeo collide on the'file'filesystem registration when loaded in the same interpreter ([Python][C++]ArrowKeyError: Attempted to register factory for scheme 'file'when using pip-installed GDAL apache/arrow#44696).tests/test_geotiff2raquet.py: 17/17 pass with the fix.test_earthengine.pyandtest_imageserver.pywhich need external creds): 46/47 pass. The one failure (test_cli.py::TestVersion::test_version) is pre-existing and reproduces on cleanmaster— it's an environment quirk whereimportlib.metadatacan't find theraquet-iodistribution when running viaPYTHONPATHwithoutpip install.Notes
RuntimeErrorcatch" inwarp_source_into_tile_with_alpha_fallbackis intentional: any first-tile failure triggers a single retry with the wider destination, and if that also fails the second exception is propagated as-is. Worst case for an unrelated RuntimeError is one extra Warp call before the real error surfaces.read_raster_data_stats(it only iteratesband_countbands). If a future user wants alpha-aware handling, that would be an opt-in extension on top of this patch — not affected by this change.convert_to_raquet_filesswallows worker-death and produces an empty-but-valid parquet (the symptom that the new regression test catches vialen(table) > 1). Even with this PR's fix in place, the parent could benefit from propagating worker exceptions instead of producing zero-data outputs. Out of scope here, worth a follow-up.