Commit ed45357

Author: Hendrik van Antwerpen
Add CONTRIBUTING.md for bpe explaining project structure and benchmark instructions
1 parent 02118ef commit ed45357

2 files changed: 39 additions and 23 deletions

crates/bpe/CONTRIBUTING.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+# Contributing
+
+Here are specific details that are useful when you want to contribute to the BPE crates.
+Make sure to read the repository's [contribution guidelines][contributing] as well.
+
+## Project structure
+
+This project has a slightly unusual structure to resolve some dependency issues.
+
+- This directory contains `bpe`, the BPE code itself.
+- A sibling directory contains `bpe-openai`, which exposes tokenizers for OpenAI token sets, and depends on `bpe`.
+- Tests are located in the `tests` subdirectory, and benchmarks in the `benchmarks` subdirectory. Both of these are separate crates so they can depend on `bpe-openai` without causing a cyclic dependency.
+
+Only the `bpe` and `bpe-openai` crates are meant to be published. The other ones are for development use only.
+
+## Running benchmarks
+
+Change the working directory to the `benchmarks` directory:
+
+```sh
+cd benchmarks
+```
+
+Run the benchmark as follows (requires [cargo-criterion](https://crates.io/crates/cargo-criterion) to be installed):
+
+```sh
+cargo criterion
+```
+
+(Using `cargo bench` ignores the settings in `criterion.toml`!)
+Open the full report which should be located in `target/criterion/reports/index.html`.
+
+Update the figures in this repo as follows (requires `rsvg-convert` from `librsvg` to be installed):
+
+```sh
+script/copy-results
+```
+
+[contributing]: ../../CONTRIBUTING.md

crates/bpe/README.md

Lines changed: 0 additions & 23 deletions
@@ -296,26 +296,3 @@ The performance of tiktoken shows a quadratic growth with the input size.
 The Huggingface encoder scales better, but becomes slower and slower compared to our implementation as input size increases.
 
 ![worst-case encoding runtime comparison](./images/performance-worstcase.svg)
-
-### Running the benchmarks
-
-Benchmarks are located in a separate crate in the `benchmarks` directory.
-
-```sh
-cd benchmarks
-```
-
-Run the benchmark as follows (required [cargo-criterion](https://crates.io/crates/cargo-criterion) installed):
-
-```sh
-cargo criterion
-```
-
-(Using `cargo bench` ignores the settings in `criterion.toml`!)
-Open the full report which should be located in `target/criterion/reports/index.html`.
-
-Update the figures in this repo as follows (requires `rsvg-convert` from `librsvg` installed):
-
-```sh
-script/copy-results
-```
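
Note (not part of the commit): the benchmark instructions moved into CONTRIBUTING.md depend on two external tools, `cargo-criterion` and `rsvg-convert`. A minimal sketch of a prerequisite check is given here, assuming both tools are installed under exactly those binary names and are reachable through `PATH`. The `check_tool` helper is a hypothetical name used only for illustration and does not exist in the repository.

```sh
#!/bin/sh
# Report whether a required tool is reachable through PATH.
# check_tool is a hypothetical helper, not part of the repository.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tools named in CONTRIBUTING.md; the binary names are assumptions.
missing=0
check_tool cargo-criterion || missing=1
check_tool rsvg-convert || missing=1
```

If `cargo-criterion` is reported missing, `cargo install cargo-criterion` installs it from crates.io. `rsvg-convert` ships with `librsvg`, which is usually installed through the system package manager.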
