Commit 1de83f9

Fix OOM regression in _apply() for quantized models during inference (#13372)
Skip unnecessary clone of inference-mode tensors when already inside torch.inference_mode(), matching the existing guard in set_attr_param. The unconditional clone introduced in 20561aa caused transient VRAM doubling during model movement for FP8/quantized models.
1 parent: 8f37471
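
For context, here is a minimal standalone sketch (plain PyTorch, not ComfyUI code) of the behavior the new guard relies on: a tensor created under torch.inference_mode() is permanently flagged as an inference tensor; cloning it outside inference mode materializes a normal tensor (the purpose of the clone from 20561aa), while cloning it inside inference mode just produces another inference tensor, so the copy buys nothing and transiently doubles the memory held for that tensor.

import torch

# A tensor created under inference mode stays an inference tensor forever.
with torch.inference_mode():
    weight = torch.zeros(1024, 1024)
assert weight.is_inference()

# Outside inference mode, clone() materializes a regular tensor; this is
# what the unconditional clone from 20561aa achieved.
assert not weight.clone().is_inference()

# Inside inference mode, clone() yields yet another inference tensor, so
# the copy is pointless and briefly doubles the memory for this tensor.
with torch.inference_mode():
    assert torch.is_inference_mode_enabled()
    assert weight.clone().is_inference()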

File tree

1 file changed: +1, -1


comfy/ops.py

Lines changed: 1 addition & 1 deletion

@@ -1151,7 +1151,7 @@ def _apply(self, fn, recurse=True): # This is to get torch.compile + moving wei
             if param is None:
                 continue
             p = fn(param)
-            if p.is_inference():
+            if (not torch.is_inference_mode_enabled()) and p.is_inference():
                 p = p.clone()
             self.register_parameter(key, torch.nn.Parameter(p, requires_grad=False))
         for key, buf in self._buffers.items():
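
For readers without the surrounding file open, below is a simplified, self-contained sketch of the patched loop. The class name is hypothetical, and the real _apply in comfy/ops.py handles more (recursion into submodules among other details); this only illustrates where the guard sits and why.

import torch

class OpsModuleSketch(torch.nn.Module):  # hypothetical stand-in for the real class
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.zeros(8, 8), requires_grad=False)

    def _apply(self, fn, recurse=True):
        for key, param in self._parameters.items():
            if param is None:
                continue
            p = fn(param)
            # Clone only when OUTSIDE inference mode, where the clone turns an
            # inference tensor into a normal one. Inside inference mode the
            # clone would itself be an inference tensor, so it would only
            # double VRAM transiently -- the regression this commit fixes.
            if (not torch.is_inference_mode_enabled()) and p.is_inference():
                p = p.clone()
            self.register_parameter(key, torch.nn.Parameter(p, requires_grad=False))
        for key, buf in self._buffers.items():
            if buf is not None:
                self._buffers[key] = fn(buf)
        return self

# With a recent PyTorch, moving weights while already under inference mode
# no longer forces a temporary copy of each parameter:
with torch.inference_mode():
    OpsModuleSketch().to(torch.float16)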
