Commit 83dd2f1

remove dependency and update readme
1 parent 3f0d4fe commit 83dd2f1

2 files changed

Lines changed: 1 addition & 4 deletions

crates/bpe/README.md

Lines changed: 1 addition & 1 deletion
@@ -219,7 +219,7 @@ Two additional encoders are included that are faster but deviate from the origin
 
 - The greedy encoder picks the left-longest token.
 - The minimal encoder computes an encoding with the minimal number of tokens.
-- The minimal_dropout encoder implements BPE-Dropout [algorithm](https://arxiv.org/abs/1910.13267), randomly ignoring some multi-byte tokens at runtime.
+- The minimal_dropout encoder implements BPE-Dropout [algorithm](https://arxiv.org/abs/1910.13267), randomly ignoring some multi-byte tokens at runtime. Note that this implementation differs from the paper, and **has not** been tested in an actual language model training pipeline.
 
 The benchmark measured the runtime of encoding of slices of lengths 10, 100, 1000, and 10000 from a random 20000 token original text using the o200k token set.
 (All encodings were computed from scratch for each slice.)
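
For readers unfamiliar with the encoders named in the README hunk above, the difference between "left-longest" and "minimal number of tokens" can be illustrated with a toy sketch. This is not the crate's implementation: the function names `greedy_encode`, `minimal_encode`, and `toy_vocab`, the string-based vocabulary, and the six-token example set are all hypothetical and chosen only for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical toy vocabulary, used only for this illustration.
fn toy_vocab() -> HashSet<&'static str> {
    ["a", "b", "c", "d", "ab", "bcd"].iter().copied().collect()
}

/// Greedy sketch: at each position, take the longest vocabulary token
/// that matches the remaining text. Returns None if no token matches.
fn greedy_encode<'a>(text: &'a str, vocab: &HashSet<&str>) -> Option<Vec<&'a str>> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < text.len() {
        // Try candidate end positions from longest to shortest.
        let end = (pos + 1..=text.len())
            .rev()
            .find(|&e| text.is_char_boundary(e) && vocab.contains(&text[pos..e]))?;
        out.push(&text[pos..end]);
        pos = end;
    }
    Some(out)
}

/// Minimal sketch: dynamic programming over prefix lengths, keeping for
/// each prefix the fewest tokens needed and where its last token starts.
fn minimal_encode<'a>(text: &'a str, vocab: &HashSet<&str>) -> Option<Vec<&'a str>> {
    let n = text.len();
    // best[e] = (token count, start of last token) for the prefix text[..e].
    let mut best: Vec<Option<(usize, usize)>> = vec![None; n + 1];
    best[0] = Some((0, 0));
    for e in 1..=n {
        if !text.is_char_boundary(e) {
            continue;
        }
        for s in 0..e {
            if let Some((count, _)) = best[s] {
                if vocab.contains(&text[s..e])
                    && best[e].map_or(true, |(c, _)| count + 1 < c)
                {
                    best[e] = Some((count + 1, s));
                }
            }
        }
    }
    // Walk the back-pointers from the end of the text to recover the tokens.
    let mut out = Vec::new();
    let mut e = n;
    while e > 0 {
        let (_, s) = best[e]?;
        out.push(&text[s..e]);
        e = s;
    }
    out.reverse();
    Some(out)
}

fn main() {
    let vocab = toy_vocab();
    // Greedy commits to "ab" first and then needs two more tokens.
    println!("{:?}", greedy_encode("abcd", &vocab).unwrap()); // ["ab", "c", "d"]
    // The minimal encoding starts with the shorter "a" to reach "bcd".
    println!("{:?}", minimal_encode("abcd", &vocab).unwrap()); // ["a", "bcd"]
}
```

On this toy input the greedy result uses three tokens and the minimal result uses two, which shows why the two encoders can disagree with each other (and with a standard BPE encoding) on the same text.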

crates/bpe/tests/Cargo.toml

Lines changed: 0 additions & 3 deletions
@@ -8,6 +8,3 @@ bpe-openai = { path = "../../bpe-openai" }
 itertools = "0.14"
 rand = "0.9"
 tiktoken-rs = "0.9"
-
-[dev-dependencies]
-rand_chacha = { version = "0.9" }
