This is essentially the same issue as unslothai/unsloth#2749.
Pull request #254 claims to have fixed this, but the implementation does not seem to solve the core issue. While the pull request adds support for 16-bit base models and native mxfp4 models, it does not address the problem that the original poster in #2749 and others were facing: saving a bnb-4bit model such as `Qwen3-8B-bnb-4bit` with `merged_16bit`.
Currently, it is impossible to use `save_pretrained_merged` with `merged_16bit` for bnb-4bit models like `unsloth/gemma-3-27b-it-bnb-4bit` and `unsloth/Mistral-Small-3.2-24B-Instruct-2506-unsloth-bnb-4bit` because of the following code in `saving_utils.py`:
```python
if base_model_is_quantized and (quant_type == "nf4" or quant_type == "fp4") and save_method == "merged_16bit":
    warnings.warn("Base model should be a 16bits or mxfp4 base model for a 16bit model merge. Use `save_method=forced_merged_4bit` instead")
    return None
```
Since bnb-4bit models use `nf4` quantization, this condition always evaluates to true, causing the function to return `None` and the merge operation to fail.
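To illustrate, here is a minimal standalone sketch of that guard (the function name `check_merge_allowed` and its return values are hypothetical; only the condition itself mirrors the snippet above):

```python
import warnings

def check_merge_allowed(base_model_is_quantized: bool, quant_type: str, save_method: str):
    # Hypothetical wrapper around the guard from saving_utils.py:
    # any nf4/fp4-quantized base model requesting a 16-bit merge is rejected.
    if base_model_is_quantized and (quant_type == "nf4" or quant_type == "fp4") and save_method == "merged_16bit":
        warnings.warn(
            "Base model should be a 16bits or mxfp4 base model for a 16bit model merge. "
            "Use `save_method=forced_merged_4bit` instead"
        )
        return None
    return "merge allowed"

# bnb-4bit checkpoints report quant_type == "nf4", so the merge is always blocked:
assert check_merge_allowed(True, "nf4", "merged_16bit") is None
assert check_merge_allowed(True, "fp4", "merged_16bit") is None
# A 16-bit (unquantized) base model passes the guard:
assert check_merge_allowed(False, "", "merged_16bit") == "merge allowed"
```

There is no code path through which a bnb-4bit model reaches the 16-bit merge logic, which is what motivates the question below.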
Does this mean there is currently no way to save bnb-4bit models as 16-bit merged models, or is this feature simply not implemented yet?