Commit 3c95078

Add README
Author: Hendrik van Antwerpen
1 parent a83b250 commit 3c95078

1 file changed

Lines changed: 40 additions & 0 deletions

crates/bpe-openai/README.md
@@ -0,0 +1,40 @@
# OpenAI Byte Pair Encoders

Fast tokenizers for OpenAI token sets based on the [bpe](https://crates.io/crates/bpe) crate.
Serialized BPE instances are generated during build and lazily loaded at runtime as static values.
The overhead of loading the tokenizers is small because it happens only once per process and only requires deserialization (as opposed to actually building the internal data structures).
For convenience, it re-exports the `bpe` crate so that depending on this crate is enough to use these tokenizers.
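
The load-once behaviour described above can be illustrated with a minimal sketch that uses only the standard library. Everything in it is an assumption made for illustration: the `Tokenizer` struct, the `deserialize` function, the `SERIALIZED` bytes, and the `LOADS` counter are hypothetical stand-ins and are not the types or functions this crate actually uses.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a deserialized tokenizer; not the real `bpe` type.
struct Tokenizer {
    vocab_size: usize,
}

/// Counts how often the pretend deserialization runs.
static LOADS: AtomicUsize = AtomicUsize::new(0);

/// Placeholder for bytes that a build step would have serialized.
static SERIALIZED: &[u8] = &[0x00, 0x01, 0x02, 0x03];

/// Pretend deserialization: here the "vocabulary size" is just the byte count.
fn deserialize(bytes: &[u8]) -> Tokenizer {
    LOADS.fetch_add(1, Ordering::SeqCst);
    Tokenizer { vocab_size: bytes.len() }
}

/// Returns a process-wide tokenizer, loading it on first use only.
fn tokenizer() -> &'static Tokenizer {
    static INSTANCE: OnceLock<Tokenizer> = OnceLock::new();
    INSTANCE.get_or_init(|| deserialize(SERIALIZED))
}

fn main() {
    let first = tokenizer();
    let second = tokenizer();
    // Both calls hand back a reference to the same static instance.
    assert!(std::ptr::eq(first, second));
    println!(
        "vocab size: {}, loads: {}",
        first.vocab_size,
        LOADS.load(Ordering::SeqCst)
    );
}
```

In this sketch, `OnceLock::get_or_init` runs the closure at most once per process, so every later call only pays for returning a reference.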

Supported token sets:

- cl100k
- o200k

## Usage

Add a dependency by running

```sh
cargo add bpe-openai
```

or by adding the following to `Cargo.toml`

```toml
[dependencies]
bpe-openai = "0.1"
```

Counting tokens is as simple as:

```rust
use bpe_openai::cl100k;

fn main() {
    let bpe = cl100k();
    let count = bpe.count("Hello, world!");
    println!("{count}");
}
```

For more detailed documentation, see the [bpe](https://crates.io/crates/bpe) crate.
