Commit 51221c5

Merge branch 'main' into jorendorff/geo-serde-build
2 parents 133dca8 + c635a29 commit 51221c5

17 files changed: +848 -362 lines changed
Lines changed: 289 additions & 0 deletions
@@ -0,0 +1,289 @@
---
name: update-deps
description: Keep dependencies up-to-date. Discovers outdated deps via dependabot alerts/PRs, creates one PR per ecosystem, iterates until CI is green, then assigns for review.
user-invocable: true
---

# Update Dependencies

Automate the full dependency update lifecycle: discover what's outdated, apply updates grouped by ecosystem, fix breakage, get CI green, and hand off for human review.

## Repository context

This is a Rust workspace containing utility crates published to crates.io. All dependency update PRs target the **`main`** branch.

Dependabot is configured (`.github/dependabot.yaml`) to open PRs against `main` on the 2nd of each month. This skill gathers individual dependabot PRs, combines updates by ecosystem, fixes any breakage, gets CI green, and creates consolidated PRs for human review.

### Crates in this workspace

| Crate | Description |
|---|---|
| **bpe** | Fast byte-pair encoding |
| **bpe-openai** | OpenAI tokenizers built on bpe |
| **geo_filters** | Probabilistic cardinality estimation |
| **string-offsets** | UTF-8/UTF-16/Unicode position conversion (with WASM/JS bindings) |

Supporting packages (not published): `bpe-tests`, `bpe-benchmarks`.

### Ecosystems in this repo

| Ecosystem | Directories | Notes |
|---|---|---|
| **cargo** | `/` (workspace root) | Deps declared per-crate; `Cargo.lock` at workspace root pins versions |
| **github-actions** | `.github/workflows/` | CI and publish workflows |
| **npm** | `crates/string-offsets/js/` | JS bindings for string-offsets (WASM) |

### Build and validation commands

```bash
make build     # cargo build --all-targets --all-features
make build-js  # npm run compile in crates/string-offsets/js
make lint      # cargo fmt --check + cargo clippy (deny warnings, forbid unwrap_used)
make test      # cargo test + doc tests
```

CI runs on `ubuntu-latest` with the `mold` linker. The lint job depends on build.

## Workflow

### 1. Assess repo state

Determine the repo identity and confirm the target branch.

```bash
git remote get-url origin  # extract owner/repo
git fetch origin main
git rev-parse --verify origin/main
```
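
The `# extract owner/repo` step above leaves the parsing implicit. A minimal sketch of one way to do it, assuming the remote is a GitHub URL in SSH or HTTPS form; `parse_owner_repo` is a hypothetical helper name and `acme/widgets` is a placeholder, neither comes from this repo:

```bash
# Hypothetical helper (not part of this repo): print "owner/repo" for a GitHub remote URL.
# Assumes the SSH or HTTPS forms noted below; other URL shapes are only stripped of
# a trailing slash or ".git" and otherwise pass through unchanged.
parse_owner_repo() {
  url=$1
  url=${url%/}                     # tolerate a trailing slash
  url=${url%.git}                  # drop a trailing ".git"
  url=${url#git@github.com:}       # SSH form: git@github.com:owner/repo
  url=${url#https://github.com/}   # HTTPS form: https://github.com/owner/repo
  printf '%s\n' "$url"
}

# Placeholder URL for illustration; in the workflow the argument would be
# "$(git remote get-url origin)".
parse_owner_repo "git@github.com:acme/widgets.git"
```

If the GitHub CLI is already authenticated, `gh repo view --json nameWithOwner --jq .nameWithOwner` may be a simpler way to get the same value without parsing the URL.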
58+
59+
Detect which ecosystems have pending updates:
60+
61+
```bash
62+
[ -f Cargo.toml ] && echo "cargo"
63+
ls .github/workflows/*.yml .github/workflows/*.yaml 2>/dev/null && echo "github-actions"
64+
[ -f crates/string-offsets/js/package.json ] && echo "npm"
65+
```
66+
67+
Report discovered ecosystems to the user.
68+
69+
### 2. Gather dependency intelligence
70+
71+
Fetch open dependabot PRs:
72+
73+
```bash
74+
gh pr list --author 'app/dependabot' --base main --state open --json number,title,headRefName,labels --limit 100
75+
```
76+
77+
Fetch open dependabot alerts:
78+
79+
```bash
80+
gh api --paginate /repos/{owner}/{repo}/dependabot/alerts --jq '[.[] | select(.state=="open") | {number: .number, package: .security_vulnerability.package.name, ecosystem: .security_vulnerability.package.ecosystem, severity: .security_advisory.severity, summary: .security_advisory.summary}]'
81+
```
82+
83+
For ecosystems without dependabot coverage or when running ad-hoc, use native tooling:
84+
85+
- **cargo:** `cargo update --dry-run`
86+
- **npm:** find directories containing `package.json`, then run `npm outdated --json || true` in each (npm exits non-zero when updates exist)
87+
88+
Also fetch the advisory URLs for any security-related updates. Individual alert details are at `https://github.com/{owner}/{repo}/security/dependabot/{alert_number}`. Fetch alert numbers and GHSA IDs via:
89+
90+
```bash
91+
gh api --paginate /repos/{owner}/{repo}/dependabot/alerts --jq '[.[] | {number: .number, state, package: .security_vulnerability.package.name, ecosystem: .security_vulnerability.package.ecosystem, severity: .security_advisory.severity, ghsa_id: .security_advisory.ghsa_id, summary: .security_advisory.summary}]'
92+
```
93+
94+
Include both open and auto_dismissed/dismissed alerts — the update may resolve alerts in any state.
95+
96+
Cross-reference and group all updates by ecosystem. Present a summary to the user:
97+
98+
- How many updates per ecosystem
99+
- Which have security alerts (with severity, GHSA IDs, and advisory links)
100+
- Which dependabot PRs already exist
101+
102+
**Flag high-risk upgrades.** Before proceeding, explicitly call out upgrades that carry elevated risk:
103+
104+
- **Major version bumps** — likely contain breaking API changes
105+
- **Packages with wide blast radius** — for this repo, pay special attention to: `serde`, `itertools`, `regex-automata`, `wasm-bindgen`, `criterion`, and the Rust toolchain itself
106+
- **Multiple major bumps in the same PR** — each major bump multiplies the risk; consider splitting them
107+
108+
Present the risk assessment to the user and recommend which upgrades to include vs. defer. When in doubt, prefer a smaller, safe update over an ambitious one that might break.
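
For cargo, "major version bump" needs care with pre-1.0 crates: Cargo treats the leftmost non-zero version component as the compatibility boundary, so a change such as `rand` `0.9` → `0.10` is breaking even though the major number stays at 0. A rough sketch of that check follows; `is_breaking_bump` is a hypothetical helper, and it assumes plain dotted numeric versions:

```bash
# Hypothetical helper: succeed (exit 0) when OLD -> NEW crosses a Cargo semver
# compatibility boundary. Assumes plain dotted numeric versions (no pre-release tags).
# Simplification: 0.0.z versions, where Cargo treats every release as breaking,
# are not handled.
is_breaking_bump() {
  old_major=${1%%.*}
  new_major=${2%%.*}
  # A different major component is always a breaking bump.
  [ "$old_major" != "$new_major" ] && return 0
  if [ "$old_major" = "0" ]; then
    # For 0.y.z, the minor component plays the role of the major one.
    old_rest=${1#*.}
    new_rest=${2#*.}
    [ "${old_rest%%.*}" != "${new_rest%%.*}" ] && return 0
  fi
  return 1
}

is_breaking_bump 0.9 0.10 && echo "0.9 -> 0.10: flag for review"
is_breaking_bump 3.1.0 3.2.0 || echo "3.1.0 -> 3.2.0: semver-compatible"
```

This only classifies the version numbers. A semver-compatible release can still break a build, so the local validation in step 4 remains the real gate.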

### 3. Create branch and apply updates

For each selected ecosystem, starting from `main`:

```bash
git checkout main
git pull origin main
git checkout -b deps/{ecosystem}-updates-$(date +%Y-%m-%d)
```
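
`git checkout -b` fails if a branch with that name already exists, for example after an earlier run on the same day. One way to handle the "append a counter" option is sketched below. `next_branch_name` is a hypothetical helper that takes the desired name followed by the existing branch names, and the date in the example is a placeholder:

```bash
# Hypothetical helper: print BASE if it is unused, otherwise BASE-2, BASE-3, ...
# Usage: next_branch_name BASE [EXISTING_BRANCH...]
next_branch_name() {
  base=$1
  shift
  candidate=$base
  n=2
  # grep -x matches whole lines only, -F treats the name as a fixed string.
  while printf '%s\n' "$@" | grep -qxF -- "$candidate"; do
    candidate="$base-$n"
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}

# Placeholder branch list; prints deps/cargo-updates-2025-01-02-2
next_branch_name deps/cargo-updates-2025-01-02 main deps/cargo-updates-2025-01-02

# In the workflow the list could come from git (word splitting is intentional):
#   branch=$(next_branch_name "deps/cargo-updates-$(date +%Y-%m-%d)" $(git branch --format='%(refname:short)'))
```

The sketch only looks at the names it is given. Checking branches that exist only on the remote would need an extra listing step.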
119+
120+
Apply updates using ecosystem-appropriate tooling:
121+
122+
**cargo:**
123+
124+
```bash
125+
cargo update
126+
# For major bumps, edit Cargo.toml version constraints then:
127+
cargo check
128+
```
129+
130+
This is a Cargo workspace — always run from the repo root. All crate `Cargo.toml` files are in `crates/`. The `Cargo.lock` at the root is the single source of truth.
131+
132+
**npm:**
133+
134+
```bash
135+
cd crates/string-offsets/js
136+
npm update
137+
npm install
138+
```
139+
140+
**github-actions:**
141+
142+
- Parse workflow YAML files in `.github/workflows/` for `uses:` directives
143+
- For each action with an outdated version (from dependabot PRs/alerts), update the SHA or version tag
144+
- Be careful to preserve comments and formatting
145+
146+
### 4. Build, lint, and test locally
147+
148+
Always run:
149+
150+
```bash
151+
make lint # cargo fmt --check + clippy with deny warnings
152+
make test # cargo test with backtrace
153+
make build # full workspace build (all targets, all features)
154+
```
155+
156+
If npm dependencies changed:
157+
158+
```bash
159+
make build-js # npm compile for string-offsets JS binding
160+
```
161+
162+
**If the build/lint/test fails:**
163+
164+
1. Read the error output carefully
165+
2. Analyze what broke — likely API changes, type errors, or deprecation removals
166+
3. Make the necessary code changes to fix the breakage
167+
4. Run the pipeline again
168+
5. Repeat up to 3 times
169+
170+
If still failing after 3 iterations, report the situation to the user and ask for guidance. Do not push broken code.
171+
172+
### 5. Commit and push
173+
174+
Stage all changes and commit with a descriptive message:
175+
176+
```bash
177+
git add -A
178+
git commit -m "chore(deps): update {ecosystem} dependencies
179+
180+
Updated packages:
181+
- package-a: 1.0.0 → 2.0.0
182+
- package-b: 3.1.0 → 3.2.0
183+
184+
{If code changes were needed:}
185+
Fixed breaking changes:
186+
- Updated X API usage for package-a v2
187+
188+
Supersedes: #{dependabot_pr_1}, #{dependabot_pr_2}
189+
"
190+
```
191+
192+
Push the branch:
193+
194+
```bash
195+
git push -u origin HEAD
196+
```
197+
198+
### 6. Create the PR
199+
200+
**Title:** `chore(deps): update {ecosystem} dependencies`
201+
202+
**Body should include:**
203+
204+
- List of updated dependencies with version changes (old → new)
205+
- Any security alerts resolved — for each, link to the specific dependabot alert (`https://github.com/{owner}/{repo}/security/dependabot/{alert_number}`) and the GHSA advisory (`https://github.com/advisories/GHSA-xxxx-xxxx-xxxx`), along with severity and summary
206+
- **High-risk changes flagged for reviewer attention** (major version bumps, wide-blast-radius packages)
207+
- Code changes made to fix breakage (if any)
208+
- References to superseded dependabot PRs
209+
- Note that this was generated by the update-deps skill
210+
211+
Write the body to a temp file and create the PR **targeting `main`**:
212+
213+
```bash
214+
gh pr create --title "chore(deps): update {ecosystem} dependencies" --body-file /tmp/deps-pr-body.md --base main
215+
rm /tmp/deps-pr-body.md
216+
```
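
The commands above assume `/tmp/deps-pr-body.md` has already been written. A minimal sketch of that step, using `mktemp` instead of a fixed path so concurrent runs do not collide. `write_pr_body` is a hypothetical helper, and the headings and package lines are placeholders taken from the commit-message template in step 5:

```bash
# Hypothetical helper: write a PR body skeleton for one ecosystem to a fresh
# temp file and print the file's path. All entries are placeholders to fill in.
write_pr_body() {
  ecosystem=$1
  body_file=$(mktemp) || return 1
  cat > "$body_file" <<EOF
## Updated dependencies ($ecosystem)

- package-a: 1.0.0 → 2.0.0
- package-b: 3.1.0 → 3.2.0

## High-risk changes

- package-a: major version bump

## Superseded dependabot PRs

- (list the PR numbers here)

Generated by the update-deps skill.
EOF
  printf '%s\n' "$body_file"
}

# Usage sketch (left as comments because it needs gh and a pushed branch):
#   body_file=$(write_pr_body cargo)
#   gh pr create --title "chore(deps): update cargo dependencies" --body-file "$body_file" --base main
#   rm -f "$body_file"
```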
217+
218+
### 7. Monitor CI and iterate on failures
219+
220+
Watch the PR's checks:
221+
222+
```bash
223+
gh pr checks {pr_number} --watch --fail-fast
224+
```
225+
226+
**If checks fail:**
227+
228+
1. Get the failed run details:
229+
230+
```bash
231+
gh run list --branch {branch} --status failure --json databaseId,name --limit 1
232+
gh run view {run_id} --log-failed
233+
```
234+
235+
2. Analyze the failure — CI runs on `ubuntu-latest` with `mold` linker, which may differ from local builds.
236+
237+
3. Fix the issue locally, commit, and push:
238+
239+
```bash
240+
git add -A
241+
git commit -m "fix: resolve CI failure in {ecosystem} dep update
242+
243+
{Brief description of what failed and why}"
244+
git push
245+
```
246+
247+
4. Monitor again. Repeat up to 3 iterations total.
248+
249+
5. If still failing after 3 pushes, report to the user with the failure details and ask for help.
250+
251+
### 8. Close superseded dependabot PRs
252+
253+
For each dependabot PR that this update supersedes:
254+
255+
```bash
256+
gh pr close {dependabot_pr_number} --comment "Superseded by #{new_pr_number} which includes this update along with other {ecosystem} dependency updates."
257+
```
258+
259+
### 9. Assign for review
260+
261+
Request review from CODEOWNERS or a user-provided reviewer (not the PR author):
262+
263+
```bash
264+
gh pr edit {pr_number} --add-reviewer {reviewer_login}
265+
```
266+
267+
Report the final PR URL and a summary of what was done.
268+
269+
## Guidelines
270+
271+
- **All PRs target `main`.** There is no separate dev branch.
272+
- **Never push to `main` directly.** Always work on a feature branch.
273+
- **Never push code that doesn't pass `make lint` and `make test`.** If you can't fix it in 3 tries, stop and ask.
274+
- **Be conservative with major version bumps.** If a major version update breaks things and the fix isn't obvious, skip that package and note it in the PR description.
275+
- **Regenerate lockfiles.** Always regenerate `Cargo.lock` and `package-lock.json` after updating — don't just edit manifests.
276+
- **One ecosystem at a time.** Complete the full cycle (update → build → push → PR → CI green) for one ecosystem before moving to the next.
277+
- **If no updates are needed** for an ecosystem, skip it and tell the user.
278+
- **Security alerts take priority.** Address security alerts first within each ecosystem.
279+
- **Clippy is strict.** This repo forbids `unwrap_used` outside tests and denies all warnings. New dependency versions may trigger new clippy lints — fix them.
280+
281+
## Edge cases
282+
283+
- **Cargo workspace:** Dependencies are declared per-crate but share a single `Cargo.lock` at the workspace root. Always run `cargo update` and `cargo check` from the repo root.
284+
- **npm:** Look for `package.json` files to discover npm packages rather than hardcoding paths — the repo layout may change.
285+
- **WASM builds:** After updating `wasm-bindgen` or related deps, verify `make build-js` still works — WASM toolchain version mismatches are common.
286+
- **Rate limits:** If `gh api` hits rate limits, wait and retry. Report to user if persistent.
287+
- **Nothing to update:** Report cleanly and move to the next ecosystem (or exit).
288+
- **Merge conflicts on push:** Rebase on `main` and retry: `git fetch origin main && git rebase origin/main`.
289+
- **Branch already exists:** If `deps/{ecosystem}-updates-{date}` already exists, append a counter or ask user.
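
The "wait and retry" advice for rate limits can be wrapped in a small helper. The sketch below is one possible shape, not something this repo ships: `retry_with_backoff` is a hypothetical name, and the attempt count and delays are arbitrary choices.

```bash
# Hypothetical helper: run a command up to MAX_ATTEMPTS times, sleeping DELAY
# seconds after the first failure and doubling the delay after each later one.
# Usage: retry_with_backoff MAX_ATTEMPTS DELAY COMMAND [ARG...]
retry_with_backoff() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then return 0; fi
    # Out of attempts: give up so the caller can report to the user.
    if [ "$attempt" -ge "$max_attempts" ]; then return 1; fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Usage sketch (left as a comment because it needs gh and network access):
#   retry_with_backoff 3 30 gh api --paginate "/repos/{owner}/{repo}/dependabot/alerts" \
#     || echo "gh api still failing after 3 attempts" >&2
```

The helper retries on any non-zero exit, not only on rate limiting, so a persistent failure for another reason (bad credentials, wrong path) also ends in the "report to user" branch.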

crates/bpe/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ aneubeck-daachorse = "1.1.1"
 base64 = { version = "0.22", optional = true }
 fnv = "1.0"
 itertools = "0.14"
-rand = { version = "0.9", optional = true }
+rand = { version = "0.10", optional = true }
 serde = { version = "1", features = ["derive"] }

 [dev-dependencies]

crates/bpe/benchmarks/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -21,6 +21,6 @@ test = true
 bpe = { path = "../../bpe", features = ["rand", "tiktoken"] }
 bpe-openai = { path = "../../bpe-openai" }
 criterion = "0.8"
-rand = "0.9"
+rand = "0.10"
 tiktoken-rs = "0.9"
 tokenizers = { version = "0.22", features = ["http"] }

crates/bpe/benchmarks/performance.rs

Lines changed: 1 addition & 3 deletions
@@ -9,9 +9,7 @@ use bpe_benchmarks::*;
 use criterion::{
     criterion_group, criterion_main, AxisScale, BenchmarkId, Criterion, PlotConfiguration,
 };
-use rand::rngs::StdRng;
-use rand::SeedableRng;
-use rand::{rng, Rng};
+use rand::{rng, RngExt};

 fn counting_benchmark(c: &mut Criterion) {
     for (name, bpe, _, _) in TOKENIZERS.iter() {

crates/bpe/src/byte_pair_encoding.rs

Lines changed: 5 additions & 5 deletions
@@ -168,7 +168,7 @@ pub fn find_hash_factor_for_tiktoken(data: &str) -> Result<u64, base64::DecodeEr
 pub fn find_hash_factor_for_dictionary(tokens: impl IntoIterator<Item = Vec<u8>>) -> u64 {
     use std::collections::HashSet;

-    use rand::Rng;
+    use rand::RngExt;

     let all_tokens = tokens.into_iter().collect_vec();
     let mut rnd = rand::rng();
@@ -573,7 +573,7 @@ impl BytePairEncoding {
     // and hence may include previously discarded token later down the byte stream. At the sentence level though we don't expect it to make much difference.
     // Also, this implementation of BPE constructs merges on the fly from the set of tokens, hence might come up with a different set of merges with the same dictionary.
     #[cfg(feature = "rand")]
-    pub fn encode_minimal_dropout<R: rand::Rng>(
+    pub fn encode_minimal_dropout<R: rand::RngExt>(
         &self,
         text: &[u8],
         dropout: f32,
@@ -627,7 +627,7 @@ pub fn create_test_string_with_predicate(
     min_bytes: usize,
     predicate: impl Fn(&str) -> bool,
 ) -> String {
-    use rand::{rng, Rng};
+    use rand::{rng, RngExt};
     // the string we accumulated thus far
     let mut result = String::new();
     // the tokens we added so we can backtrack
@@ -662,7 +662,7 @@ pub fn create_test_string_with_predicate(

 #[cfg(feature = "rand")]
 pub fn select_test_string(text: &str, min_bytes: usize) -> &str {
-    use rand::{rng, Rng};
+    use rand::{rng, RngExt};
     let mut start = rng().random_range(0..text.len() - min_bytes);
     while !text.is_char_boundary(start) {
         start -= 1;
@@ -677,7 +677,7 @@ pub fn select_test_string(text: &str, min_bytes: usize) -> &str {
 /// Generate test bytes by concatenating random tokens.
 #[cfg(feature = "rand")]
 pub fn create_test_bytes(bpe: &BytePairEncoding, min_bytes: usize) -> Vec<u8> {
-    use rand::{rng, Rng};
+    use rand::{rng, RngExt};
     let mut result = Vec::new();
     while result.len() < min_bytes {
         let i = rng().random_range(0..bpe.num_tokens());

crates/bpe/tests/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -6,5 +6,5 @@ edition = "2021"
 bpe = { path = "../../bpe", features = ["rand"] }
 bpe-openai = { path = "../../bpe-openai" }
 itertools = "0.14"
-rand = "0.9"
+rand = "0.10"
 tiktoken-rs = "0.9"

crates/bpe/tests/src/lib.rs

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 #[cfg(test)]
 mod tests {
     use itertools::Itertools;
-    use rand::{rng, Rng};
+    use rand::{rng, RngExt};
     use tiktoken_rs::cl100k_base_singleton;

     use bpe::appendable_encoder::AppendableEncoder;
