Add timestamped REQUEST/RESPONSE logging to Gemini API calls with model, duration, and token counts. Add verbose prompt logging behind VERBOSE flag. Add per-language timing and formatted token usage summary table at end of pipeline run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Strip metadata from fence language tag before syntax lookup so "sh copy" maps to shell, not js (avoids treating // in URLs as comments). Use strippedCode instead of original block content when restoring translated comments to prevent duplication. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
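The fence-tag fix above can be sketched as follows. This is a minimal illustration, not the pipeline's code: `COMMENT_SYNTAX`, `fenceLanguage`, and `commentStyleFor` are hypothetical names, and the lookup table is a made-up subset.

```typescript
// Hypothetical lookup table; the pipeline's real syntax map is not shown in this thread.
const COMMENT_SYNTAX: Record<string, "slash" | "hash"> = {
  js: "slash",
  ts: "slash",
  sh: "hash",
  bash: "hash",
  python: "hash",
}

/** Return only the language part of a fence info string, e.g. "sh copy" -> "sh". */
function fenceLanguage(infoString: string): string {
  // The language is the first whitespace-delimited token; the rest is metadata.
  const [lang = ""] = infoString.trim().split(/\s+/)
  return lang.toLowerCase()
}

/** Look up the comment style using the stripped language, never the raw info string. */
function commentStyleFor(infoString: string): "slash" | "hash" | undefined {
  return COMMENT_SYNTAX[fenceLanguage(infoString)]
}
```

With the metadata stripped first, `"sh copy"` resolves to the hash-comment style, so `//` inside a URL is not treated as a comment marker.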
- Remove duplicate "Creating Pull Request" banner
- Wrap verbose prompt output in collapsible groups
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Support comma-separated exclude paths to skip specific files or directories from translation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Relative paths passed to runSanitizer caused English source lookups to fail silently in GitHub Actions, skipping all English-comparison fixes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
JSX attribute translation runs after the sanitizer and can reintroduce issues. Add a second sanitizer pass after Phase 4 to catch these. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Replace per-language sequential processing with a single shared Gemini concurrency pool. Languages dispatch files simultaneously; commits serialized via SharedCommitter then squashed one-per-language. Bump default concurrency from 6 to 16. Add parallel JSX attribute translation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Pass English content map (fetched from BASE_BRANCH via GitHub API) to the sanitizer instead of reading from disk. Ensures English comparison matches the same branch used for translation, not whatever the CI runner checked out. Disk fallback preserved for local/CLI usage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Restore closing code fence indentation lost during code block extraction/restoration. Explicitly instruct Gemini to translate frontmatter values and transliterate author names for non-Latin scripts. Add validation to reject untranslated frontmatter. Collapse blank lines left by multi-line comment restoration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
The prompt told Gemini to keep `lang` unchanged, preserving `lang: en` from the English source. Now explicitly instructs it to set the lang field to the target language code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Add fixFrontmatterLang() as a deterministic backup that forces the frontmatter `lang` field to match the locale derived from the file path (public/content/translations/LANG_CODE/**/*.md).
- 10 unit tests covering edge cases
- Exported via _testOnly for testing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
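A deterministic fix of this kind might look roughly like the sketch below. It is not the PR's fixFrontmatterLang(); `localeFromPath` and `forceFrontmatterLang` are hypothetical names, and the only detail taken from the commit message is the path layout.

```typescript
// Assumed path layout, taken from the commit message:
// public/content/translations/LANG_CODE/**/*.md
const TRANSLATION_PATH = /(?:^|\/)public\/content\/translations\/([^/]+)\//

/** Derive the locale from a translated file path, or null if the path does not match. */
function localeFromPath(filePath: string): string | null {
  const match = TRANSLATION_PATH.exec(filePath.replace(/\\/g, "/"))
  return match?.[1] ?? null
}

/** Force the frontmatter `lang` field to the locale implied by the file path. */
function forceFrontmatterLang(markdown: string, filePath: string): string {
  const locale = localeFromPath(filePath)
  if (locale === null) return markdown

  // Only touch a frontmatter block that opens the file.
  const frontmatter = /^(---\r?\n)([\s\S]*?)(\r?\n---)/.exec(markdown)
  if (frontmatter === null) return markdown

  const [whole, open = "", body = "", close = ""] = frontmatter
  const langLine = /^lang:.*$/m
  const updated = langLine.test(body)
    ? body.replace(langLine, () => `lang: ${locale}`)
    : `${body}\nlang: ${locale}`

  // Rebuild by slicing so identical text later in the document is never touched.
  return open + updated + close + markdown.slice(whole.length)
}
```

Because the locale comes from the path rather than from model output, the result is the same on every run regardless of what the LLM returned.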
Gemini was dropping CODE_BLOCK placeholders and hallucinating replacement code from training data, producing wrong language tags and modified code content.
- Prompt: tell Gemini placeholders are sacrosanct
- Prompt: fallback rules if a real fence slips through
- Validation: reject output with missing placeholders
- Validation: reject output with hallucinated code fences
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
warnCodeFenceContentDrift was flagging every code block with translated comments as "differs from English" -- noise that obscured real code corruption. Now strips comments (// /* */ # and docstrings) before comparing, so only functional code differences trigger the warning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
When files fail to translate, post the list as a comment on the newly created PR for easy follow-up tracking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Adds batching for large JSON translation files (~100 keys per Gemini request) and pre-translation HTML placeholder extraction/restoration for values with embedded HTML tags. This targets the 8+ language failures on glossary.json (406 keys, 595 HTML tags) and learn-quizzes.json (696 keys). Also updates the translation roadmap with the agreed-upon priority plan for v3 fixes and v4 infrastructure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
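The placeholder round trip described above could be sketched like this. The `{{HTML_n}}` token format and both function names are assumptions made for illustration; the commit does not show the format it actually uses.

```typescript
interface Extracted {
  text: string
  tags: string[]
}

/** Replace each HTML tag with a numbered placeholder so the model never sees raw markup. */
function extractHtmlTags(value: string): Extracted {
  const tags: string[] = []
  const text = value.replace(/<\/?[a-zA-Z][^<>]*>/g, (tag) => {
    tags.push(tag)
    return `{{HTML_${tags.length - 1}}}`
  })
  return { text, tags }
}

/** Put the original tags back; throws if the translation lost or invented a placeholder. */
function restoreHtmlTags(translated: string, tags: string[]): string {
  const seen = new Set<number>()
  const restored = translated.replace(/\{\{HTML_(\d+)\}\}/g, (_match, index: string) => {
    const i = Number(index)
    const tag = tags[i]
    if (tag === undefined) throw new Error(`Unknown placeholder HTML_${i}`)
    seen.add(i)
    return tag
  })
  if (seen.size !== tags.length) throw new Error("Translation dropped an HTML placeholder")
  return restored
}
```

Throwing on a missing or unknown placeholder turns silent markup corruption into a visible failure that can be retried.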
Adds BLOCK_NONE safety settings for all harm categories to prevent Gemini from silently returning empty responses for educational blockchain content (mining, attacks, etc.). Inspects response candidates, finishReason, and safetyRatings before accessing response.text, logging detailed diagnostics when non-STOP finish reasons are detected. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Use HarmCategory and HarmBlockThreshold enums from @google/genai instead of plain strings. Fixes TS2322 type error in CI where the SDK types are available. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
The sanitizer was pushing absolute filesystem paths into changedFiles, causing GitHub tree API to reject them with "tree.path cannot start with a slash". Uses path.relative() to convert to repo-relative paths, matching the pattern already used for logging in the same file. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
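A minimal sketch of the path conversion, assuming a known repository root. `toTreePath` is a hypothetical helper name; only the use of `path.relative()` comes from the commit message.

```typescript
import * as path from "node:path"

/** Convert an absolute path to a repo-relative, forward-slash path for a Git tree entry. */
function toTreePath(repoRoot: string, absolutePath: string): string {
  const relative = path.relative(repoRoot, absolutePath).split(path.sep).join("/")
  // Anything that resolves outside the repository cannot be a valid tree path.
  if (relative === "" || relative === ".." || relative.startsWith("../") || path.isAbsolute(relative)) {
    throw new Error(`Path is outside the repository: ${absolutePath}`)
  }
  return relative
}
```

Normalizing at the point where paths enter `changedFiles` keeps every later consumer free of leading slashes.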
Technical titles like "Ethash", "JSON-RPC API", "PeerDAS" are legitimately kept in English. The previous check failed if either title or description matched English. Now only fails when BOTH are identical, catching genuinely untranslated output while allowing technical/proper-noun titles. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
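The relaxed check can be expressed as a conjunction over the two fields. This is a sketch with hypothetical names (`Frontmatter`, `looksUntranslated`), not the validation code from the PR.

```typescript
interface Frontmatter {
  title?: string
  description?: string
}

/** True only when BOTH title and description are identical to the English source. */
function looksUntranslated(english: Frontmatter, translated: Frontmatter): boolean {
  const same = (a?: string, b?: string): boolean =>
    a !== undefined && b !== undefined && a.trim() === b.trim()
  return same(english.title, translated.title) && same(english.description, translated.description)
}
```

A page titled "Ethash" with a translated description passes, while output that copies both fields verbatim is still rejected.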
Same slash bug as the sanitizer -- absolute filesystem paths passed to GitHub tree API. Applies path.relative() at the commit point in jsx-translation.ts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Wires Atlas's glossary lookup module into the translation pipeline. For each file, filters the 519-term glossary to only terms present in the source text, then merges with the existing Supabase glossary (local terms take priority). Includes glossary data: 519 terms + 24 language translation files (Tier 1 complete, Tier 2/3 in progress). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Document the plan to remove the Supabase glossary fallback once the local enhanced glossary is complete. Pre-sweep cleanup item. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Educational diagrams and infographics with English text remain untranslated despite full text coverage. Documents the problem, proposed approach using Gemini image generation, and key challenges (text detection, RTL, visual QA). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Adds rules from Atlas's translation-rules reconciliation:
Common (all languages):
- EVM opcodes, hex values, crypto primitives stay Latin
- Network/testnet names stay Latin
- Client names: never translate meaning, transliterate OK
- License identifiers, math notation stay as-is
Group-specific:
- Latin: keep English technical loanwords (de/fr)
- Cyrillic: CLDR plural categories, grammatical case
- First mention rule for acronyms (markdown only)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
- Consolidate 7 "stay Latin" rules into single grouped directive
- Fix ambiguous "may"/"CAN" to binary directives
- Add RTL markdown syntax LTR preservation rule
- Make CJK-phonetic brand override explicit
- Fix glossary "exactly" to allow grammatical declension
- Remove first-mention acronym rule (deferred to glossary)
- Clarify loanword rule: glossary is authoritative
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
pettinarip
left a comment
@wackerow a few findings from the code review. Looks good overall
Code Review
P0 — Shell injection via execSync (main.ts:477)
englishA = execSync(`git show ${manifest.sourceCommitSha}:${file.path}`, { encoding: "utf-8" })
sourceCommitSha comes from a committed .manifest-source.json, file.path from TARGET_PATH input. Neither validated before shell interpolation. Crafted values achieve arbitrary command execution with GEMINI_API_KEY and I18N_GITHUB_TOKEN in scope. validateTargetPath() exists but is never called here.
Fix: ``execFileSync('git', ['show', `${sha}:${filePath}`])`` + validate SHA against `/^[0-9a-f]{40}$/i`.
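The suggested fix could be sketched as below. `isFullSha` and `gitShow` are hypothetical names; the project's validateTargetPath() is not shown in this thread, so the sketch covers only the SHA check and the switch to an argument array.

```typescript
import { execFileSync } from "node:child_process"

// Full 40-character hex SHA, as suggested in the review.
const FULL_SHA = /^[0-9a-f]{40}$/i

function isFullSha(value: string): boolean {
  return FULL_SHA.test(value)
}

/** Read a file at a given commit without passing anything through a shell. */
function gitShow(sha: string, filePath: string, cwd: string = process.cwd()): string {
  if (!isFullSha(sha)) throw new Error(`Invalid commit SHA: ${sha}`)
  // execFileSync passes arguments directly to git, so shell metacharacters
  // in filePath are never interpreted by a shell.
  return execFileSync("git", ["show", `${sha}:${filePath}`], { cwd, encoding: "utf-8" })
}
```

Validation runs before the process is spawned, so a malformed SHA never reaches git at all.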
P1 — High (11)
| # | File | Issue |
|---|---|---|
| 1 | task-pool.ts:76 | Task pool swallows all errors: failed translations silently dropped, pipeline merges incomplete branch |
| 2 | commits.ts:732, branches.ts:170 | process.exit(1) in library functions: bypasses cleanup, strands temp branch. Must throw instead. |
| 3 | commits.ts:691 | 422 retry infinite recursion: passes attempt not attempt + 1 |
| 4 | translate.ts:1069 | (response as any) with ESLint-disable: project bans any |
| 5 | translate.ts:1059 | Gemini SDK call has no timeout: hung call blocks rate-limiter slot forever |
| 6 | main.ts:681 | squashByLanguage failure swallowed: warns and continues to merge with mixed history |
| 7 | main.ts:127 | Prompt injection via ETHGlossary API: note field interpolated unescaped into Gemini prompt |
| 8 | incremental-translate.ts:127 | parseIncrementalResponse throws on malformed batch: loses all translations for the file |
| 9 | output-validation.ts:1 | Zero test coverage: last gate before content is committed to GitHub |
| 10 | incremental-translate.ts:354 | 4 of 8 exported functions untested: buildSectionList, extractJsonSections, replaceJsonValues, removeMarkdownSection |
| 11 | main.ts:708 | mergeBranchInto return value discarded: returns false on conflict, main.ts logs "Merged successfully" |
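Item 3 is the easiest of these to illustrate. The sketch below shows a bounded retry where the recursive call passes `attempt + 1`; `withRetry` and `isUnprocessable` are hypothetical names, and the error shape (a numeric `status` property) is an assumption rather than the GitHub client's actual type.

```typescript
/** Assumed error shape: any thrown object carrying a numeric `status` of 422. */
function isUnprocessable(error: unknown): boolean {
  return typeof error === "object" && error !== null && (error as { status?: unknown }).status === 422
}

/** Retry an operation on 422, stopping after maxAttempts. */
async function withRetry<T>(
  operation: (attempt: number) => Promise<T>,
  maxAttempts = 3,
  attempt = 1
): Promise<T> {
  try {
    return await operation(attempt)
  } catch (error) {
    if (!isUnprocessable(error) || attempt >= maxAttempts) throw error
    // Passing `attempt` unchanged here is the bug from the table: the counter
    // never advances, so the guard above can never stop the recursion.
    return withRetry(operation, maxAttempts, attempt + 1)
  }
}
```

Rethrowing once the limit is reached also fits item 2, since the caller decides how to clean up instead of the library exiting the process.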
P2: Dead code at scale (~960 lines of exported-but-never-called code):
- lib/github/files.ts — 328 lines, zero callers
- lib/workflows/pr-creation.ts — 182 lines, zero callers
- IncrementalCommitter class — 164 lines, replaced by SharedCommitter
- putCommitFile/getPathSha — 106 lines, legacy REST path
- Most of gemini.ts — only isGeminiAvailable() is actually imported
- lib/ai/ -> lib/llm/ (LLM-agnostic naming)
- gemini.ts + translate.ts nested in lib/llm/gemini/
- output-validation.ts (was gemini-output-validation)
- rate-limiter.ts moved to lib/utils/
- glossary/ dir with lookup + data + schema
- PIPELINE_CONFIG imports TRANSLATABLE_ATTRIBUTES (DRY)
- Delete propagate-inert.ts (dead code)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Remove src/scripts/i18n/ and crowdin-ai-import.yml. All functionality migrated to src/scripts/intl-pipeline/. FUTURE.md preserved in intl-pipeline for reference. Git history available for restoration if needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Move PIPELINE-SPEC.md, SPEC.md from test fixtures. Add CONCURRENCY-SPEC.md for chunking, concurrency, and commit strategy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
37 tests across 3 files matching CONCURRENCY-SPEC.md. Chunking: 27 tests (JSON byte-size, MD paragraph, incremental batching). Concurrency: 4 fixme tests. Commit strategy: 6 fixme tests. All stubs throw until implementation is wired in. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Organize tests/unit/intl-pipeline/ to match the pattern of sanitizer/ and data-layer/ directories. Update fixture and import paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
JSON: split by byte size (64KB) not key count. Markdown: heading + paragraph splitting with fence safety. Incremental: batch sections with CONTEXT replication. All 27 chunking tests pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
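Byte-size chunking for JSON might look roughly like this. The sketch assumes a flat object of string values and measures each entry's serialized UTF-8 size; `chunkJsonBySize` is a hypothetical name, and only the 64KB limit comes from the commit message.

```typescript
const encoder = new TextEncoder()

/** UTF-8 size of a string, which differs from .length for non-ASCII text. */
function byteLength(text: string): number {
  return encoder.encode(text).length
}

/** Split a flat key/value object into chunks whose serialized size stays near maxBytes. */
function chunkJsonBySize(
  entries: Record<string, string>,
  maxBytes: number = 64 * 1024
): Record<string, string>[] {
  const chunks: Record<string, string>[] = []
  let current: Record<string, string> = {}
  let currentBytes = 0

  for (const [key, value] of Object.entries(entries)) {
    // Measuring each entry with its own braces slightly overestimates, which is the safe direction.
    const entryBytes = byteLength(JSON.stringify({ [key]: value }))
    if (currentBytes > 0 && currentBytes + entryBytes > maxBytes) {
      chunks.push(current)
      current = {}
      currentBytes = 0
    }
    // A single oversized entry still gets a chunk of its own.
    current[key] = value
    currentBytes += entryBytes
  }

  if (currentBytes > 0) chunks.push(current)
  return chunks
}
```

Splitting by bytes rather than key count keeps request size predictable when a few keys carry very long values.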
Wire task pool into main.ts: concurrent Gemini calls across file/language pairs with configurable pool size. Temp branch (tmp-intl/run-MMDD-HHMM) for crash safety, squash per-language after all tasks drain, merge to target branch on success. 164 tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Property is 'content', not 'english'/'locale'. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Replace 5 fixme tests with documented contracts validated by GH Action runs (test-manual-11 series). Squash, merge, sanitizer ordering, zero-drift all confirmed working in production. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Update all test imports from deleted @/scripts/i18n/ to @/scripts/intl-pipeline/ paths. Remove propagate-inert.spec.ts (module deleted). Remove completed items from FUTURE.md. 772 tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
gemini-translations.yml -> intl-pipeline.yml "Gemini Translations" -> "Intl Pipeline" Reorder workflow inputs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Captures process lessons from building the incremental translation pipeline: spec-first vs heuristic-first, context window management, team composition, package separation. Public-facing, no internal references. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Replace local glossary files (591KB) with async fetch to ETHGlossary API (/api/v1/filter). Includes term notes in glossary map for richer LLM context. Graceful fallback on API failure. No auth required. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Replace execSync with execFileSync to avoid shell
interpolation. Validate sourceCommitSha against
/^[0-9a-f]{40}$/i. Call validateTargetPath() on
file path before use.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
- Task pool tracks errors, pipeline aborts on failure
- Replace process.exit(1) with thrown errors
- Fix 422 retry infinite recursion (attempt + 1)
- Type Gemini response properly (remove as any)
- Add 5-min timeout to Gemini SDK calls
- Squash failure is now fatal (no swallow)
- Sanitize glossary notes against prompt injection
- Catch malformed batch responses per-batch
- Check mergeBranchInto return value
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Create or update PR for target branch. First run
creates PR, subsequent runs append run summary to
existing body. Target branch auto-derived from base:
intl/pending-{base}. findOpenPR + updatePRBody added
to pull-requests.ts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Delete lib/github/files.ts (328 lines, zero callers). Remove IncrementalCommitter, putCommitFile, getPathSha from commits.ts (252 lines, replaced by SharedCommitter). Strip gemini.ts to isGeminiAvailable only (254 lines, JSX attr functions unused). Clean stale comment in types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Merge gemini/gemini.ts (isGeminiAvailable) and gemini/translate.ts into single lib/llm/gemini.ts. Remove gemini/ subdirectory. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
P1-9: 15 tests for output-validation.ts (JSON/MD validation, refusal detection, placeholder checks). P1-10: 24 tests for 4 previously untested functions (buildSectionList, removeMarkdownSection, extractJsonSections, replaceJsonValues). 807 total. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
2d83d73 to
861c2f2
Thanks @pettinarip! All P0/P1 items addressed, P2 cleaned up.
P0 -- Shell injection: Replaced
P1 fixes:
P2 -- Dead code removed (~850 lines):
Git history: Rebased to remove ~591K lines of glossary JSON data that were temporarily committed during migration. These files were replaced by the ETHGlossary API integration and never needed to be in the branch history.
807 unit tests passing. E2E validated via GH Action (test-manual-13 series).
pettinarip
left a comment
@wackerow a few additional P1 findings that might be worth validating before merge:
1. Path traversal via TARGET_PATH (main.ts:603)
fs.readFileSync is called on config.targetPaths before validateTargetPath is ever invoked. A crafted path like ../../.env.local reads secrets from the runner and sends them to Gemini. Fix: Call validateTargetPath() on all config.targetPaths at the top of main(), before any filesystem reads.
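A containment check of the kind suggested here could be sketched as follows. This is not the project's validateTargetPath(), whose implementation is not shown; `isInsideRoot` and `assertTargetPaths` are hypothetical names, and the sketch does not resolve symlinks.

```typescript
import * as path from "node:path"

/** True when candidate, resolved against root, stays inside root. */
function isInsideRoot(root: string, candidate: string): boolean {
  const resolvedRoot = path.resolve(root)
  const resolved = path.resolve(resolvedRoot, candidate)
  const relative = path.relative(resolvedRoot, resolved)
  return relative !== ".." && !relative.startsWith(`..${path.sep}`) && !path.isAbsolute(relative)
}

/** Intended to run at the top of main(), before any filesystem read. */
function assertTargetPaths(root: string, targetPaths: string[]): void {
  for (const target of targetPaths) {
    if (!isInsideRoot(root, target)) throw new Error(`Rejected target path: ${target}`)
  }
}
```

Running the check once over every target path, before the first read, means no later code path can forget it.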
2. AbortController signal not wired (gemini.ts:1067)
The 5-minute timeout creates an AbortController but never passes controller.signal to generateContent(). A hung Gemini call blocks a pool slot permanently and can deadlock pool.drain(). Fix: Pass signal: controller.signal into the generateContent config object.
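The wiring problem can be shown without any SDK. In the sketch below, `withTimeout` hands the controller's signal to the callee, and `hangingCall` is a hypothetical stand-in for a request that never resolves on its own. How the signal is passed to the real Gemini client depends on that SDK's request options, which this sketch does not assume.

```typescript
/** Run an abortable operation and abort it once timeoutMs has elapsed. */
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number
): Promise<T> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), timeoutMs)
  try {
    // The signal must actually reach the callee; creating the controller alone does nothing.
    return await run(controller.signal)
  } finally {
    clearTimeout(timer)
  }
}

/** Hypothetical stand-in for a hung request: settles only when the signal aborts. */
function hangingCall(signal: AbortSignal): Promise<string> {
  return new Promise((_resolve, reject) => {
    signal.addEventListener("abort", () => reject(new Error("aborted")), { once: true })
  })
}
```

If the callee ignores the signal, the timer still fires but nothing settles, which matches the deadlock described above.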
3. stamp-only tasks skip squash/merge/PR (main.ts:670)
stamp-only tasks call committer.commitFile() but never push to committedFiles. After drain(), committedFiles.length === 0 so squashByLanguage(), merge, and PR creation are all silently skipped. Fix: Gate downstream steps on committer.totalFiles > 0 instead of committedFiles.length > 0, or push a synthetic entry.
4. Glossary terms injected into LLM prompt without sanitization (main.ts:130)
term.note is sanitized (control chars stripped, 200 char cap), but term.english and term.translation are injected raw into prompts. A compromised glossary API can manipulate all translations. Fix: Apply the same sanitization (strip control chars, cap length, reject embedded newlines) to all three fields.
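Applying one sanitizer to all three fields might look like the sketch below. The 200-character cap and the control-character stripping come from the comment above; the `GlossaryTerm` shape and function names are assumptions, and dropping a bad note while rejecting a bad term is one possible policy, not necessarily the PR's.

```typescript
interface GlossaryTerm {
  english: string
  translation: string
  note?: string
}

const MAX_FIELD_LENGTH = 200

/** Return a cleaned field, or null if it must be rejected outright. */
function sanitizeField(value: string): string | null {
  // Embedded newlines are rejected rather than stripped: they are the
  // simplest way to smuggle extra instructions into a prompt.
  if (/[\r\n]/.test(value)) return null
  const cleaned = value.replace(/[\u0000-\u001f\u007f]/g, "").trim()
  if (cleaned === "") return null
  return cleaned.slice(0, MAX_FIELD_LENGTH)
}

/** Sanitize every field; drop the term if english or translation is unusable. */
function sanitizeTerm(term: GlossaryTerm): GlossaryTerm | null {
  const english = sanitizeField(term.english)
  const translation = sanitizeField(term.translation)
  if (english === null || translation === null) return null
  const note = term.note === undefined ? null : sanitizeField(term.note)
  return note === null ? { english, translation } : { english, translation, note }
}
```

Treating the glossary API as untrusted input keeps a compromised upstream from steering every translation.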
Summary
Complete overhaul of the translation pipeline. Replaces the legacy src/scripts/i18n/ system with a new src/scripts/intl-pipeline/ module built around incremental translation powered by Gemini 3.1 Pro.
What changed:
intl-content-tree v0.3.0), route deterministic vs LLM-required changes, propagate inert changes by script, send only changed prose to the LLM, assemble output, update manifests
src/scripts/i18n/, src/scripts/crowdin/, legacy workflow files, markdownChecker.ts, ~850 lines dead code
Test plan
Generated with Claude Code