GPUParquetWriter is a high-performance Rust tool that rewrites and optimizes Parquet files for GPU-accelerated analytics. By transforming "CPU-optimized" layouts (the default output of DuckDB) into "GPU-optimized" formats, it unlocks the full potential of GPUDirect Storage and massive parallelism.
It applies four optimizations, sketched in code below:

- More Pages $\rightarrow$ High SM Utilization.
- Larger Row Groups $\rightarrow$ I/O Bandwidth Saturated.
- Better Encoding $\rightarrow$ Free Bandwidth Boost.
- Skip Unnecessary Compression $\rightarrow$ GPU Available for Work.
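The same four levers can be expressed as Parquet writer properties. A minimal sketch, assuming the Rust `parquet` crate (arrow-rs) rather than this tool's internals; the values mirror the benchmark flags used below:

```rust
use parquet::basic::Compression;
use parquet::file::properties::{WriterProperties, WriterVersion};

// Illustrative GPU-friendly writer profile; values mirror the benchmark
// flags (--row-group-precise-row-count 10M, --page-count 200, parquet_v2,
// uncompressed), not the tool's internal implementation.
fn gpu_friendly_props() -> WriterProperties {
    WriterProperties::builder()
        // Larger row groups -> saturate I/O bandwidth
        .set_max_row_group_size(10_000_000)
        // More pages per row group -> more parallel work for the SMs
        // (10M rows / 200 pages = 50k rows per page)
        .set_data_page_row_count_limit(50_000)
        // Better encoding -> denser data, "free" bandwidth
        .set_writer_version(WriterVersion::PARQUET_2_0)
        .set_dictionary_enabled(false)
        // No unnecessary compression -> GPU stays available for query work
        .set_compression(Compression::UNCOMPRESSED)
        .build()
}
```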
To achieve the 129 GB/s throughput shown below, these files must be consumed by a parallelized, GPU-native reader. For example:
PystachIO: Efficient Distributed GPU Query Processing with PyTorch over Fast Networks & Fast Storage
This tool generated all optimized datasets for the paper.
Tested on an NVIDIA A100 (40 GB) with 4x NVMe SSDs (26 GB/s aggregate) using GPUDirect Storage.
The following table shows the cumulative performance gain of each optimization step, culminating in a 129x speedup over the baseline.
| Optimization Stage | Throughput | Command Flag | Strategy/Value | Mechanism of Gain |
|---|---|---|---|---|
| CPU-Optimized Parquet | ~1.0 GB/s | (generated by DuckDB) | None | Limited by small I/O & idle SMs. |
| + Rewrite # Pages | ~50 GB/s | `--page-count` | `200` | SM saturation (more active threads). |
| + Rewrite RG Size | ~70 GB/s | `--row-group-precise-row-count` | `10M` | I/O saturation (chunks >8 MB). |
| + Better Encoding | ~115 GB/s | `--encodings` | `parquet_v2` | Free bandwidth (data density). |
| + No Unnecessary Comp. | ~129 GB/s | `--compression` | `uncompressed` | Compute freedom (zero overhead). |
Larger chunks reduce metadata overhead. The following options are mutually exclusive:

- `--row-group-precise-row-count <N>` (Recommended): Forces exactly N rows per group (worked example below).
- `--row-group-size <SIZE>`: Sets an approximate target size (e.g., `1G`, `256M`).
- `--row-group-count <N>`: Splits the file into N equal parts.
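As a rough sanity check (the ~100 bytes per row is an assumed figure for illustration, not a measurement from the paper): with `--row-group-precise-row-count 10M`, a table of ~100-byte rows yields row groups of roughly $10^7 \times 100\,\mathrm{B} \approx 1\,\mathrm{GB}$, so each column chunk comfortably clears the >8 MB size the table above associates with I/O saturation.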
- `--encodings parquet_v2` (Default): Uses modern, efficient encodings (`DELTA_BINARY_PACKED`, `BYTE_STREAM_SPLIT`); see the type-to-encoding sketch after this list.
- `--encodings parquet_v1`: Restricts to legacy encodings (`PLAIN`, `RLE_DICTIONARY`).
- `--encodings plain`: Forces raw `PLAIN` encoding.
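Which V2 encoding pays off depends mainly on a column's physical type. The helper below is hypothetical (it is not the tool's selection logic) and only illustrates a plausible default mapping, using the `parquet` crate's enums:

```rust
use parquet::basic::{Encoding, Type as PhysicalType};

// Hypothetical type-to-encoding defaults: delta packing suits integers,
// byte-stream-split improves scan density for floating-point columns.
fn v2_default_encoding(t: PhysicalType) -> Encoding {
    match t {
        PhysicalType::INT32 | PhysicalType::INT64 => Encoding::DELTA_BINARY_PACKED,
        PhysicalType::FLOAT | PhysicalType::DOUBLE => Encoding::BYTE_STREAM_SPLIT,
        PhysicalType::BYTE_ARRAY => Encoding::DELTA_LENGTH_BYTE_ARRAY,
        _ => Encoding::PLAIN,
    }
}
```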
The rewriter benchmarks multiple codecs per column to find the local optimum (a sketch of such a benchmark follows the list):

- `--compression lightweight` (Default): Tests `uncompressed` and `snappy`.
- `--compression optimal`: Brute-force tests all supported algorithms (`zstd`, `lz4`, `gzip`, `brotli`).
- Custom list: e.g., `--compression "uncompressed,zstd(3)"` targets specific algorithms.
```bash
# Basic Example
cargo run --release -- --input <FILE> --output <FILE> [OPTIONS]
```

Essential Options:
- `--input`, `--output`: Paths to the source and destination files.
- `--page-count <N>`: Target pages per row group (Default: `0` [auto]).
- `--row-group-precise-row-count <N>`: Target exact rows per group.
- `--compression <STRATEGY>`: `lightweight` (default), `optimal`, or specific codecs.
- `--encodings <STRATEGY>`: `parquet_v2` (default), `parquet_v1`, or `plain`.
- `--optimizer <STRATEGY>`: `brute-force` (default) checks every combination; `threshold` uses heuristics for speed (see the sketch after this list).
- `--full-report-path <PATH>`: Exports a CSV detailing the decision process for every column.
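One way to picture the difference between the two optimizer strategies (a hypothetical sketch, not the tool's actual heuristic): `threshold` accepts the first candidate whose compression ratio clears a bar, while `brute-force` scores every combination and keeps the best.

```rust
// Hypothetical threshold heuristic (cf. --compression-threshold): take the
// first codec whose compression ratio meets `min_ratio`; fall back to
// uncompressed if none does. Brute force would instead rank all candidates.
fn threshold_pick(candidates: &[(&'static str, f64)], min_ratio: f64) -> &'static str {
    candidates
        .iter()
        .find(|&&(_, ratio)| ratio >= min_ratio)
        .map(|&(name, _)| name)
        .unwrap_or("uncompressed")
}
```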
For full options: `cargo run --release -- --help`
The commands below reproduce each stage of the table above:

```bash
# Step 1: CPU-Optimized Baseline from DuckDB
# Generate TPC-H with DuckDB (https://duckdb.org/docs/stable/core_extensions/tpch),
# then export the TPC-H tables to Parquet with the default config.

# Step 2: Rewrite # Pages (target 200 pages per row group)
./target/release/parquet-column-grained-rewriter \
    --input raw.parquet --output step2_pages.parquet \
    --page-count 200 \
    --compression snappy --encodings parquet_v1 --optimizer threshold

# Step 3: Rewrite RG Size (target 10M rows per row group)
./target/release/parquet-column-grained-rewriter \
    --input raw.parquet --output step3_rg_size.parquet \
    --page-count 200 --row-group-precise-row-count 10M \
    --compression snappy --encodings parquet_v1 --optimizer threshold

# Step 4: Better Encoding (Parquet V2; a 0.0 threshold forces the efficient encodings)
./target/release/parquet-column-grained-rewriter \
    --input raw.parquet --output step4_encoding.parquet \
    --page-count 200 --row-group-precise-row-count 10M \
    --compression snappy --encodings parquet_v2 \
    --optimizer threshold --encoding-threshold 0.0

# Step 5: No Unnecessary Compression (uncompressed + Parquet V2)
./target/release/parquet-column-grained-rewriter \
    --input raw.parquet --output step5_optimal.parquet \
    --page-count 200 --row-group-precise-row-count 10M \
    --compression uncompressed --encodings parquet_v2 \
    --optimizer threshold --compression-threshold 16.0
```