Commit 3cecb0b

address review comments
1 parent bfaa2de commit 3cecb0b

3 files changed, +12 -10 lines changed

crates/string-offsets/Cargo.toml

Lines changed: 2 additions & 2 deletions
@@ -3,10 +3,10 @@ name = "string-offsets"
 authors = ["The blackbird team <support@github.com>"]
 version = "0.1.0"
 edition = "2021"
-description = "Offset calculator to convert between byte, char, and line offsets in a string."
+description = "Converts string offsets between UTF-8 bytes, UTF-16 code units, Unicode code points, and lines."
 repository = "https://github.com/github/rust-gems"
 license = "MIT"
-keywords = ["unicode", "string", "offsets", "positions", "interoperability"]
+keywords = ["unicode", "positions", "utf16", "characters", "lines"]
 categories = ["algorithms", "data-structures", "text-processing", "development-tools::ffi"]
 
 [dev-dependencies]

crates/string-offsets/README.md

Lines changed: 3 additions & 3 deletions
@@ -1,13 +1,13 @@
 # string-offsets
 
-Offset calculator to convert between byte, char, and line offsets in a string.
+Converts string offsets between UTF-8 bytes, UTF-16 code units, Unicode code points, and lines.
 
 Rust strings are UTF-8, but JavaScript has UTF-16 strings, and in Python, strings are sequences of
 Unicode code points. It's therefore necessary to adjust string offsets when communicating across
 programming language boundaries. [`StringOffsets`] does these adjustments.
 
-Each `StringOffsets` value contains offset information for a single string. [Building the data
-structure](StringOffsets::new) takes O(n) time and memory, but then each conversion is fast.
+Each `StringOffsets` instance contains offset information for a single string. [Building the data
+structure](StringOffsets::new) takes O(n) time and memory, but then most conversions are O(1).
 
 ["UTF-8 Conversions with BitRank"](https://adaptivepatchwork.com/2023/07/10/utf-conversion/) is a
 blog post explaining the implementation.
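
Note on the README text in this hunk: the reason offsets need adjusting is that the same position in a string has a different numeric value in each encoding. The following is a minimal sketch using only the Rust standard library, not the `string-offsets` crate; the helper names `utf8_to_utf16_naive` and `utf8_to_char_naive` are hypothetical and each call rescans the prefix in O(n), which is the cost a precomputed structure is meant to avoid.

```rust
// Illustrative only: standard library calls, not the crate's API.

/// Naive O(n) conversion: counts the UTF-16 code units in the prefix that
/// ends at `byte_offset`. Panics if `byte_offset` is not on a char boundary.
fn utf8_to_utf16_naive(s: &str, byte_offset: usize) -> usize {
    s[..byte_offset].encode_utf16().count()
}

/// Naive O(n) conversion from a UTF-8 byte offset to a code point offset.
fn utf8_to_char_naive(s: &str, byte_offset: usize) -> usize {
    s[..byte_offset].chars().count()
}

fn main() {
    // 'a', then U+1F5FA (4 UTF-8 bytes, 2 UTF-16 code units),
    // then U+FE0F (3 UTF-8 bytes, 1 UTF-16 code unit), then 'b'.
    let s = "a\u{1F5FA}\u{FE0F}b";

    // The same string has three different lengths.
    assert_eq!(s.len(), 9); // UTF-8 bytes
    assert_eq!(s.encode_utf16().count(), 5); // UTF-16 code units
    assert_eq!(s.chars().count(), 4); // Unicode code points

    // The position just before 'b' is byte 8, UTF-16 unit 4, code point 3.
    assert_eq!(utf8_to_utf16_naive(s, 8), 4);
    assert_eq!(utf8_to_char_naive(s, 8), 3);
}
```

A JavaScript caller would expect the value 4 for that position and a Python caller the value 3, while Rust slicing needs the byte offset 8.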

crates/string-offsets/src/lib.rs

Lines changed: 7 additions & 5 deletions
@@ -1,4 +1,4 @@
-//! Offset calculator to convert between byte, char, and line offsets in a string.
+//! Converts string offsets between UTF-8 bytes, UTF-16 code units, Unicode code points, and lines.
 //!
 //! # Example
 //!
@@ -17,7 +17,7 @@
 //! // ...but only 3 UTF-16 code units...
 //! assert_eq!(offsets.utf8_to_utf16(12), 8);
 //! assert_eq!(offsets.utf8_to_utf16(19), 11);
-//! // ...and only 2 Unicode characters.
+//! // ...and only 2 Unicode code points.
 //! assert_eq!(offsets.utf8s_to_chars(12..19), 8..10);
 //! ```
 //!
@@ -30,14 +30,16 @@ mod bitrank;
 
 use bitrank::{BitRank, BitRankBuilder};
 
-/// Offset calculator to convert between byte, char, and line offsets in a string.
+/// Converts positions within a given string between UTF-8 byte offsets (the usual in Rust), UTF-16
+/// code units, Unicode code points, and line numbers.
 ///
 /// Rust strings are UTF-8, but JavaScript has UTF-16 strings, and in Python, strings are sequences
 /// of Unicode code points. It's therefore necessary to adjust string offsets when communicating
 /// across programming language boundaries. [`StringOffsets`] does these adjustments.
 ///
-/// Each `StringOffsets` value contains offset information for a single string. [Building the
-/// data structure](StringOffsets::new) takes O(n) time and memory, but then each conversion is fast.
+/// Each `StringOffsets` instance contains offset information for a single string. [Building the
+/// data structure](StringOffsets::new) takes O(n) time and memory, but then most conversions are
+/// O(1).
 ///
 /// ["UTF-8 Conversions with BitRank"](https://adaptivepatchwork.com/2023/07/10/utf-conversion/)
 /// is a blog post explaining the implementation.
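
Note on the revised doc comment: the "O(n) to build, O(1) to convert" trade-off can be illustrated with a plain lookup table. This is a sketch under stated assumptions and deliberately not the crate's implementation, which the diff shows is built on `BitRank`; the type `Utf16Table` and its methods are hypothetical names, and the table spends four bytes per input byte where a more compact structure would spend far less.

```rust
/// Hypothetical helper (not the crate's `StringOffsets`): one table entry per
/// UTF-8 byte offset, so a conversion is a single array read.
struct Utf16Table {
    /// `utf16_at[i]` is the number of UTF-16 code units before byte `i`,
    /// rounded down to the start of the character containing byte `i`.
    utf16_at: Vec<u32>,
}

impl Utf16Table {
    /// O(n) time and memory: a single pass over the characters of `s`.
    fn new(s: &str) -> Self {
        let mut utf16_at = Vec::with_capacity(s.len() + 1);
        let mut units = 0u32;
        for ch in s.chars() {
            // Every byte of this character maps to the character's start.
            for _ in 0..ch.len_utf8() {
                utf16_at.push(units);
            }
            units += ch.len_utf16() as u32;
        }
        // Final entry for the offset one past the last byte.
        utf16_at.push(units);
        Self { utf16_at }
    }

    /// O(1) lookup. Panics if `byte_offset` is greater than the string length.
    fn utf8_to_utf16(&self, byte_offset: usize) -> usize {
        self.utf16_at[byte_offset] as usize
    }
}

fn main() {
    // 'a', U+1F5FA (4 bytes, 2 units), U+FE0F (3 bytes, 1 unit), 'b'.
    let table = Utf16Table::new("a\u{1F5FA}\u{FE0F}b");
    assert_eq!(table.utf8_to_utf16(0), 0);
    assert_eq!(table.utf8_to_utf16(1), 1);
    assert_eq!(table.utf8_to_utf16(5), 3);
    assert_eq!(table.utf8_to_utf16(8), 4);
    assert_eq!(table.utf8_to_utf16(9), 5);
}
```

The design choice here is simplicity over space: the table makes the complexity claim easy to see, while the memory cost is the kind of overhead a rank-based bit structure is intended to reduce.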
