-//! Offset calculator to convert between byte, char, and line offsets in a string.
+//! Converts string offsets between UTF-8 bytes, UTF-16 code units, Unicode code points, and lines.
 //!
 //! # Example
 //!
 //! // ...but only 3 UTF-16 code units...
 //! assert_eq!(offsets.utf8_to_utf16(12), 8);
 //! assert_eq!(offsets.utf8_to_utf16(19), 11);
-//! // ...and only 2 Unicode characters.
+//! // ...and only 2 Unicode code points.
 //! assert_eq!(offsets.utf8s_to_chars(12..19), 8..10);
 //! ```
 //!
@@ -30,14 +30,16 @@ mod bitrank;
 
 use bitrank::{BitRank, BitRankBuilder};
 
-/// Offset calculator to convert between byte, char, and line offsets in a string.
+/// Converts positions within a given string between UTF-8 byte offsets (the usual in Rust), UTF-16
+/// code units, Unicode code points, and line numbers.
 ///
 /// Rust strings are UTF-8, but JavaScript has UTF-16 strings, and in Python, strings are sequences
 /// of Unicode code points. It's therefore necessary to adjust string offsets when communicating
 /// across programming language boundaries. [`StringOffsets`] does these adjustments.
 ///
-/// Each `StringOffsets` value contains offset information for a single string. [Building the
-/// data structure](StringOffsets::new) takes O(n) time and memory, but then each conversion is fast.
+/// Each `StringOffsets` instance contains offset information for a single string. [Building the
+/// data structure](StringOffsets::new) takes O(n) time and memory, but then most conversions are
+/// O(1).
 ///
 /// ["UTF-8 Conversions with BitRank"](https://adaptivepatchwork.com/2023/07/10/utf-conversion/)
 /// is a blog post explaining the implementation.
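
For comparison with the O(1) claim in the new doc comment, here is a rough sketch of the same conversions done naively with only the standard library. It is not the crate's implementation: the function names (`naive_utf8_to_utf16`, `naive_utf8_to_char`, `naive_utf8_to_line`) are hypothetical, and each call rescans the prefix of the string, so every conversion costs O(n).

```rust
/// Hypothetical helper: counts the UTF-16 code units before `byte_offset`.
/// Panics if `byte_offset` is not on a UTF-8 character boundary.
fn naive_utf8_to_utf16(s: &str, byte_offset: usize) -> usize {
    s[..byte_offset].encode_utf16().count()
}

/// Hypothetical helper: counts the Unicode code points before `byte_offset`.
/// Panics if `byte_offset` is not on a UTF-8 character boundary.
fn naive_utf8_to_char(s: &str, byte_offset: usize) -> usize {
    s[..byte_offset].chars().count()
}

/// Hypothetical helper: zero-based index of the line containing `byte_offset`,
/// found by counting the `\n` bytes that precede it.
/// Panics if `byte_offset` is not on a UTF-8 character boundary.
fn naive_utf8_to_line(s: &str, byte_offset: usize) -> usize {
    s[..byte_offset].bytes().filter(|&b| b == b'\n').count()
}
```

On a string such as `"\u{2600}\u{FE0F}hello\n\u{1F5FA}\u{FE0F}world\n"`, the byte range 12..19 covers one 4-byte code point and one 3-byte code point, so these helpers map it to UTF-16 offsets 8 and 11 and to code point offsets 8 and 10, the same figures the doc example asserts. A precomputed structure like `StringOffsets` pays the scanning cost once, at construction, rather than on every call.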