Commit e4c8ce2

Optimize str.translate lookup for ASCII characters
This commit introduces two optimizations to charmaptranslate_lookup:

1. Use _PyLong_FromUnsignedChar for characters 0-255 instead of PyLong_FromLong. This uses the small int singleton cache, avoiding memory allocation and deallocation for the key object.
2. Use PyDict_GetItemRef directly when the mapping is a dict, instead of the more general PyMapping_GetOptionalItem. This avoids the overhead of the generic mapping protocol.

These optimizations reduce instruction count by approximately 7.4% for ASCII translation workloads (measured with callgrind).

Note: For the specific use case of PEP 503 normalization (lowercase + character replacement), str.lower().replace().replace() is still faster than str.translate() because it uses specialized C code paths that avoid dictionary lookups entirely. However, these optimizations help str.translate() performance for general use cases.
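For reference, the two approaches compared in the note can be sketched as below. This is an illustration only: the commit message does not include its benchmark code, so the table contents and the function names `normalize_translate` and `normalize_chained` are assumptions. The table is an exact `dict` keyed by code points below 256, which is the case both optimizations target.

```python
import string

# Hypothetical translation table: ASCII uppercase -> lowercase, '_' and '.' -> '-'.
# This per-character mapping does not collapse runs of separators.
TABLE = {ord(u): ord(l)
         for u, l in zip(string.ascii_uppercase, string.ascii_lowercase)}
TABLE[ord("_")] = ord("-")
TABLE[ord(".")] = ord("-")


def normalize_translate(name: str) -> str:
    """One pass over the string, one mapping lookup per character."""
    return name.translate(TABLE)


def normalize_chained(name: str) -> str:
    """Three passes, each handled by a specialized str method."""
    return name.lower().replace("_", "-").replace(".", "-")


sample = "Foo_Bar.Baz"
print(normalize_translate(sample))
print(normalize_chained(sample))
```

The two functions agree only for ASCII input, since `str.lower()` also lowercases non-ASCII letters that this table does not cover.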
1 parent 7ca9e7a commit e4c8ce2

1 file changed

Lines changed: 24 additions & 4 deletions

Objects/unicodeobject.c

@@ -9115,13 +9115,33 @@ unicode_translate_call_errorhandler(const char *errors,
 static int
 charmaptranslate_lookup(Py_UCS4 c, PyObject *mapping, PyObject **result, Py_UCS4 *replace)
 {
-    PyObject *w = PyLong_FromLong((long)c);
+    PyObject *w;
     PyObject *x;
+    int rc;
 
-    if (w == NULL)
-        return -1;
-    int rc = PyMapping_GetOptionalItem(mapping, w, &x);
+    /* Optimization: For characters 0-255, use the small int singleton cache.
+       This avoids memory allocation and deallocation for the key object.
+       _PyLong_FromUnsignedChar returns an immortal singleton that doesn't
+       need to be decref'd (but we do it anyway for code simplicity). */
+    if (c < 256) {
+        w = _PyLong_FromUnsignedChar((unsigned char)c);
+        /* w is an immortal singleton, but we handle it uniformly below */
+    }
+    else {
+        w = PyLong_FromLong((long)c);
+        if (w == NULL)
+            return -1;
+    }
+
+    /* Fast path for dict mappings: use PyDict_GetItemRef directly */
+    if (PyDict_CheckExact(mapping)) {
+        rc = PyDict_GetItemRef(mapping, w, &x);
+    }
+    else {
+        rc = PyMapping_GetOptionalItem(mapping, w, &x);
+    }
     Py_DECREF(w);
+
     if (rc == 0) {
         /* No mapping found means: use 1:1 mapping. */
         *result = NULL;
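
The `PyDict_CheckExact` guard appears to matter for behavior as well as speed: a direct dict lookup does not consult `__missing__`, so dict subclasses have to stay on the general mapping path. A small sketch of the caller-visible behavior that the guard preserves (the class name `Shout` is hypothetical and not part of the commit):

```python
class Shout(dict):
    """Hypothetical dict subclass: unmapped code points are uppercased."""

    def __missing__(self, key):
        # Reached through the subscript protocol, not through a raw dict lookup.
        return chr(key).upper()


# Exact dict: eligible for the direct-lookup fast path.
# A value of None deletes the character; missing keys map 1:1.
plain = {ord("a"): None}
print("banana".translate(plain))

# Dict subclass: must take the general path so that __missing__ is honored.
print("abc".translate(Shout()))
print("abc".translate(Shout({ord("a"): "x"})))
```

Had the fast path used `PyDict_Check` rather than `PyDict_CheckExact`, the `Shout` table would presumably have left unmapped characters unchanged instead of uppercasing them.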
