docs: wire new contributor skills and plan-comparison diagnostic into AGENTS.md

timsaucer · claude · timsaucer · commit e4614993d431 · 2026-04-24T12:00:55.000-04:00
- List the three contributor skills (`check-upstream`,
  `write-dataframe-code`, `audit-skill-md`) under the Skills section so
  agents know what tools they have before starting work.
- Document the plan-comparison diagnostic workflow (comparing
  `ctx.sql(...).optimized_logical_plan()` against a DataFrame's
  `optimized_logical_plan()` via `LogicalPlan.__eq__`) for translating SQL
  queries to DataFrame form. Points at the full write-up in the
  `write-dataframe-code` skill rather than duplicating it.

`CLAUDE.md` is a symlink to `AGENTS.md`, so the change lands in both.

Implements PR 4f of the plan in #1394.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/AGENTS.md b/AGENTS.md
@@ -33,6 +33,35 @@ Skills follow the [Agent Skills](https://agentskills.io) open standard. Each ski
 - `SKILL.md` — The skill definition with YAML frontmatter (name, description, argument-hint) and detailed instructions.
 - Additional supporting files as needed.
 
+Currently available skills:
+
+- [`check-upstream`](.ai/skills/check-upstream/SKILL.md) — audit upstream
+  Apache DataFusion features (functions, DataFrame ops, SessionContext
+  methods, FFI types) not yet exposed in the Python bindings.
+- [`write-dataframe-code`](.ai/skills/write-dataframe-code/SKILL.md) —
+  contributor-facing guide for writing idiomatic DataFrame code inside this
+  repo (TPC-H pattern index, plan-comparison diagnostic, docstring
+  conventions). Layers on top of the user-facing [`SKILL.md`](SKILL.md).
+- [`audit-skill-md`](.ai/skills/audit-skill-md/SKILL.md) — cross-reference
+  the repo-root `SKILL.md` against the current public Python API and report
+  new APIs needing coverage and stale mentions. Run after upstream syncs.
+
+## Plan-comparison diagnostic
+
+When translating a SQL query to a DataFrame — TPC-H, a benchmark, or an
+answer to a user question — correctness is gated by the answer-file
+comparison in `examples/tpch/_tests.py`, but plan-level equivalence is a
+separate question. Two surface-different DataFrame forms that resolve to
+the same optimized logical plan are effectively the same query.
+
+As an ad-hoc check, compare `ctx.sql(reference_sql).optimized_logical_plan()`
+against the DataFrame's `optimized_logical_plan()`. Use `LogicalPlan.__eq__`
+for structural equality and `LogicalPlan.display_indent()` for readable
+diffs. This is a diagnostic, not a gate — a mismatch does not mean the
+DataFrame form is wrong, only that the two forms are not literally the same
+plan. The [`write-dataframe-code`](.ai/skills/write-dataframe-code/SKILL.md)
+skill has the full workflow.
+
 ## Pull Requests
 
 Every pull request must follow the template in