gh-48181: Document codecs.charmap_build #135997
|
The details are a bit more involved. The function returns a dictionary if there are non-BMP chars involved or there's no 1-1 mapping of NUL to \x00. In all other cases, a special trie object of type EncodingMap is returned, which optimizes the lookups. The details can be found in the PyUnicode_BuildEncodingMap() function in unicodeobject.c. Here's the documentation of that function (the charmap_build() function is a wrapper around this C API): https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_BuildEncodingMap You may want to copy that description and perhaps also include an example of how it is used (see e.g. the end of encodings/cp1251.py). The usual approach is to create a decoding mapping string (going from bytes ordinal, via the position in the string, to Unicode code point) and then pass this to charmap_build() to create a corresponding encoding map (going from Unicode code point ordinal to bytes ordinal). |
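A minimal sketch of the usual approach described above, for readers who want to see it end to end. The code page here is hypothetical (Latin-1 with byte 0x80 remapped to the euro sign) and is not taken from any module in `encodings/`; only `codecs.charmap_build()`, `codecs.charmap_encode()` and `codecs.charmap_decode()` are real functions.

```python
import codecs

# Hypothetical single-byte code page: identical to Latin-1, except that
# byte 0x80 decodes to the euro sign (U+20AC). The table has 256 entries,
# one per byte value; the position in the string is the byte ordinal.
decoding_table = "".join(chr(i) for i in range(256))
decoding_table = decoding_table[:0x80] + "\u20ac" + decoding_table[0x81:]

# Build the reverse (encoding) mapping from the decoding table.
encoding_table = codecs.charmap_build(decoding_table)

# Encode with the built mapping, then decode with the original table.
encoded, consumed = codecs.charmap_encode("5 \u20ac", "strict", encoding_table)
decoded, _ = codecs.charmap_decode(encoded, "strict", decoding_table)

print(encoded, consumed, decoded)
```

This mirrors the pattern at the end of the generated codec modules: the decoding table is written out once and the encoding table is derived from it rather than maintained by hand.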
I wrote that doc some time ago, but I simplified it here unnecessarily. EncodingMap should be documented too, no? I personally do not see the need for an example. |
|
Ah, I didn't know. I'm not sure about EncodingMap, since this is really only used internally. You can't create it directly from Python and it only has a single method .size() which returns the size of the trie (number of mappings). I don't think it's used anywhere. It may be worth noting that an internal object EncodingMap is returned, which can be used with the codecs.charmap_encode() function. But that function isn't documented either. |
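For illustration only, a short sketch of what the internal object looks like from Python. It assumes a plain Latin-1 identity table (an illustrative choice, not something from the PR) and makes no claim about what the number returned by `.size()` represents beyond it being an integer.

```python
import codecs

# Latin-1 identity table: byte i decodes to code point i.
latin1_table = "".join(chr(i) for i in range(256))

encoding_map = codecs.charmap_build(latin1_table)

# For a table like this one the result is the internal trie object,
# not a dict. It cannot be constructed directly from Python code.
print(type(encoding_map).__name__)

# The only method the object exposes.
print(encoding_map.size())

# The object is accepted as the mapping argument of charmap_encode().
result = codecs.charmap_encode("caf\xe9", "strict", encoding_map)
print(result)
```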
|
I see. I think we should leave it undocumented then; I modified the text. |
|
Just noticed: you have encoding and decoding reversed in the text. The function builds an encoding mapping (Unicode to bytes) and uses a decoding mapping string as input (bytes ordinals via the position in the string to Unicode). |
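The direction is easiest to see in the case where a plain dictionary comes back. The sketch below forces that case by placing a non-BMP character in the table; the table itself is a made-up example and not a real code page.

```python
import codecs

# Input: a decoding table. Position in the string = byte ordinal,
# character at that position = the code point it decodes to.
# Byte 0x80 is mapped to a non-BMP character so that a dict is returned.
table = "".join(chr(i) for i in range(256))
table = table[:0x80] + "\U0001F600" + table[0x81:]

mapping = codecs.charmap_build(table)

# Output: an encoding mapping. Keys are Unicode code points,
# values are byte ordinals.
print(isinstance(mapping, dict))
print(hex(mapping[0x1F600]))
print(hex(mapping[ord("A")]))

# The dict works with charmap_encode() just like the trie object does.
result = codecs.charmap_encode("\U0001F600", "strict", mapping)
print(result)
```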
malemburg
left a comment
lgtm now. Thanks, @StanFromIreland
|
Do you want me to merge it or will you do this? |
|
Thank you, I am only a triager;-) Hopefully one day... |
|
Thanks @StanFromIreland for the PR, and @malemburg for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14. |
(cherry picked from commit 2bdd503) Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
|
GH-136123 is a backport of this pull request to the 3.14 branch. |
|
GH-136124 is a backport of this pull request to the 3.13 branch. |
|
Thanks, @StanFromIreland, for your work on this. |
That is one old issue:-)
I used my docs from when I documented the underlying C API.
Per @malemburg's comment, this does not need more testing, since the C API is already well tested. This raises the question: should we note what this function is in the documentation (i.e. link to the C API it exports)? I am also not sure about the best place to put it; I am happy to move it.
📚 Documentation preview 📚: https://cpython-previews--135997.org.readthedocs.build/