Commit 23c8ec9

update docs
1 parent a336729 commit 23c8ec9

1 file changed

Lines changed: 3 additions & 2 deletions

crates/bpe/src/byte_pair_encoding.rs

@@ -552,8 +552,9 @@ impl BytePairEncoding {
         encoded
     }
 
-    /// This function computes the shortest possible encoding sequence which will usually differ from the
-    /// tokenization produced by the original BPE algorithm.
+    /// This function computes the encoding while randomly rejecting some merges.
+    /// Result of the encoding will be non-deterministic unless `seed` is provided.
+    /// Implementation loosely follows original BPE dropout paper: https://arxiv.org/abs/1910.13267
     #[cfg(feature = "rand")]
     pub fn encode_minimal_dropout(&self, text: &[u8], dropout: f32, seed: Option<u64>) -> Vec<u32> {
         use rand::rngs::StdRng;
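
For readers unfamiliar with the idea behind the new doc comment, here is a minimal sketch of BPE with dropout. It is not the code in `byte_pair_encoding.rs`. The function `encode_with_dropout`, the `merges` table layout, and the small `SplitMix64` generator (standing in for `rand::rngs::StdRng` so the sketch needs no external crates) are all hypothetical names chosen for illustration. The sketch also simplifies the procedure: it stops at the first step in which every candidate merge is rejected.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a seeded RNG (SplitMix64), used only in this sketch.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1), built from the top 24 bits.
    fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u32 << 24) as f32
    }
}

/// Toy BPE with dropout (assumed layout: `merges[i] = (left, right, merged)`,
/// where a lower index means a higher merge priority).
fn encode_with_dropout(
    text: &[u8],
    merges: &[(u32, u32, u32)],
    dropout: f32,
    seed: u64,
) -> Vec<u32> {
    let table: HashMap<(u32, u32), (usize, u32)> = merges
        .iter()
        .enumerate()
        .map(|(rank, &(l, r, m))| ((l, r), (rank, m)))
        .collect();
    let mut rng = SplitMix64(seed);
    // Start from one token per input byte.
    let mut tokens: Vec<u32> = text.iter().map(|&b| u32::from(b)).collect();
    loop {
        // (rank, position, merged id) of the best pair that survived dropout.
        let mut best: Option<(usize, usize, u32)> = None;
        for i in 0..tokens.len().saturating_sub(1) {
            if let Some(&(rank, merged)) = table.get(&(tokens[i], tokens[i + 1])) {
                // Reject each candidate merge independently with probability `dropout`.
                if rng.next_f32() < dropout {
                    continue;
                }
                if best.map_or(true, |(r, _, _)| rank < r) {
                    best = Some((rank, i, merged));
                }
            }
        }
        match best {
            Some((_, i, merged)) => {
                // Apply the winning merge: replace the pair with the merged token.
                tokens[i] = merged;
                tokens.remove(i + 1);
            }
            // Simplification: stop once no merge survives this step.
            None => return tokens,
        }
    }
}
```

With `dropout = 0.0` no merge is ever rejected, so the sketch reduces to plain greedy BPE. With `dropout = 1.0` every merge is rejected and the input bytes come back unchanged. Passing the same seed twice gives the same tokenization, which mirrors the role of `seed` described in the doc comment.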
