Skip to content
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions Include/cpython/longintrepr.h
Original file line number Diff line number Diff line change
Expand Up @@ -133,6 +133,11 @@ _PyLong_CompactValue(const PyLongObject *op)
assert(PyType_HasFeature(op->ob_base.ob_type, Py_TPFLAGS_LONG_SUBCLASS));
assert(PyUnstable_Long_IsCompact(op));
sign = 1 - (op->long_value.lv_tag & _PyLong_SIGN_MASK);
if (sign == 0) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not it emit a warning or generate inefficient code in non-debug build?

You can write assert(size != 0 || ...).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that sane compiler should recognize here no-op. I've tested assembly code for a toy program with gcc -O2.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In release mode, assertions are removed and so the code becomes if (size == 0) {}. C compilers are good to remove such dead code. I looked at the assembly code of Python built in release mode, and I cannot see any instruction related to if (size == 0).

gcc -O3 doesn't emit a warning on such code.

// gh-147988: Make sure that the digit is zero.
// It helps detecting the usage of uninitialized digits.
assert(op->long_value.ob_digit[0] == 0);
}
return sign * (Py_ssize_t)op->long_value.ob_digit[0];
}

Expand Down
35 changes: 28 additions & 7 deletions Objects/longobject.c
Original file line number Diff line number Diff line change
Expand Up @@ -187,9 +187,11 @@ long_alloc(Py_ssize_t size)
_PyObject_Init((PyObject*)result, &PyLong_Type);
}
_PyLong_SetSignAndDigitCount(result, size != 0, size);
/* The digit has to be initialized explicitly to avoid
* use-of-uninitialized-value. */
result->long_value.ob_digit[0] = 0;
#ifdef Py_DEBUG
// gh-147988: Fill digits with an invalid pattern to catch usage
// of uninitialized digits.
memset(result->long_value.ob_digit, 0xFF, ndigits * sizeof(digit));
#endif
return result;
}

Expand Down Expand Up @@ -1094,6 +1096,7 @@ _PyLong_FromByteArray(const unsigned char* bytes, size_t n,
int sign = is_signed ? -1: 1;
if (idigit == 0) {
sign = 0;
v->long_value.ob_digit[0] = 0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it needed in non-debug build?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As described in the issue, MSAN generates a memory error without this line, since _PyLong_CompactValue() uses uninitialized memory (ob_digit[0]).

If you ignore MSAN (and other sanitizers), maybe_small_long() returns the zero singleton, and _PyLong_CompactValue() returns 0 because it computes sign * digit where sign is 0 and digit is a random number.

This PR moves v->long_value.ob_digit[0] = 0; from long_alloc() to _PyLong_FromByteArray(), and make it conditional: only write ob_digit[0] if (idigit == 0) is true. So in the common case, the assignment is no longer done.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, we could live without all this. But I think it's a good idea to set digit[0] for zero.

Historically, we set number of digits to zero for zero. I think it should be 1 to reflect actual memory allocation. Then zero representation will make sense in both "big int" format (sign + magnitude as 1-element array) and as a "small int" (sign*digit[0])

}
_PyLong_SetSignAndDigitCount(v, sign, idigit);
return (PyObject *)maybe_small_long(long_normalize(v));
Expand Down Expand Up @@ -2852,6 +2855,7 @@ long_from_non_binary_base(const char *start, const char *end, Py_ssize_t digits,
*res = NULL;
return 0;
}
z->long_value.ob_digit[0] = 0;
_PyLong_SetSignAndDigitCount(z, 0, 0);

/* `convwidth` consecutive input digits are treated as a single
Expand Down Expand Up @@ -3365,6 +3369,7 @@ x_divrem(PyLongObject *v1, PyLongObject *w1, PyLongObject **prem)
*prem = NULL;
return NULL;
}
a->long_value.ob_digit[0] = 0;
v0 = v->long_value.ob_digit;
w0 = w->long_value.ob_digit;
wm1 = w0[size_w-1];
Expand Down Expand Up @@ -4141,10 +4146,6 @@ k_mul(PyLongObject *a, PyLongObject *b)
/* 1. Allocate result space. */
ret = long_alloc(asize + bsize);
if (ret == NULL) goto fail;
#ifdef Py_DEBUG
/* Fill with trash, to catch reference to uninitialized digits. */
memset(ret->long_value.ob_digit, 0xDF, _PyLong_DigitCount(ret) * sizeof(digit));
#endif

/* 2. t1 <- ah*bh, and copy into high digits of result. */
if ((t1 = k_mul(ah, bh)) == NULL) goto fail;
Expand Down Expand Up @@ -5633,6 +5634,12 @@ long_bitwise(PyLongObject *a,
Py_UNREACHABLE();
}

if ((size_z + negz) == 0) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is equivalent to size_z == 0 & !negz, right? Why write the condition in such form? Why add this if? Is it needed in non-debug build?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is equivalent to size_z == 0 & !negz, right? Why write the condition in such form?

My intent is to add a special case for long_alloc(0) on long_alloc(size_z + negz) below.

Why add this if?

The alternative is to add z->long_value.ob_digit[0] = 0; after long_alloc() below.

Do you prefer adding z->long_value.ob_digit[0] = 0; ?

Is it needed in non-debug build?

See my reply on _PyLong_FromByteArray() change.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it does not matter.

Py_XDECREF(new_a);
Py_XDECREF(new_b);
return get_small_int(0);
}

/* We allow an extra digit if z is negative, to make sure that
the final two's complement of z doesn't overflow. */
z = long_alloc(size_z + negz);
Expand Down Expand Up @@ -6951,6 +6958,20 @@ PyLongWriter_Finish(PyLongWriter *writer)
PyLongObject *obj = (PyLongObject *)writer;
assert(Py_REFCNT(obj) == 1);

#ifdef Py_DEBUG
// gh-147988: Detect uninitialized digits:
// long_alloc() fills digits with 0xFF byte pattern.
Py_ssize_t ndigits = _PyLong_DigitCount(obj);
if (ndigits == 0) {
// Check ob_digit[0] digit for the number zero
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not we simply assert obj->long_value.ob_digit[0] == 0?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the code to raise SystemError, so there is now more complex code below.

ndigits = 1;
}
for (Py_ssize_t i=0; i < ndigits; i++) {
Comment thread
vstinner marked this conversation as resolved.
Outdated
digit d = obj->long_value.ob_digit[i];
assert(d < PyLong_BASE);
}
#endif

// Normalize and get singleton if possible
obj = maybe_small_long(long_normalize(obj));

Expand Down
Loading