
Commit 0caa510

davidsmfreire and claude committed
docs: rewrite root README + add per-crate READMEs + refresh ROADMAP
Root README went from a 68-line stub to a full overview: motivating example, install paths, quick start, configuration (`.sqlshield.toml`, dialect, CLI flags, output formats, exit codes, stdin), editor and Python integration pointers, a full feature-support matrix, an honest limitations section, an architecture diagram of the validation pipeline, and a comparison vs. postguard / schemasafe / sqlc / squawk.

Per-crate READMEs target their own publication channel:

- `sqlshield-cli/README.md` (new): focused CLI usage, exit codes, JSON output, stdin mode. crates.io display.
- `sqlshield/README.md` (new): library API examples for the core engine, list of public surface, pointer to front-ends.
- `sqlshield-py/README.md`: expanded from 3 lines to a usable PyPI listing covering `validate_query` / `validate_files`, the `SqlValidationError` shape, and the format-placeholder behavior.

`ROADMAP.md` (was 3 lines) split into Done / Considering / Not planned to reflect the post-rewrite reality. Live DB introspection (the original single line) moved to Considering; MERGE, more language extractors, and quoted-identifier folding added.

`CONTRIBUTING.md` and `SECURITY.md` already in good shape, untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent b4c961b commit 0caa510

5 files changed

Lines changed: 491 additions & 46 deletions

README.md

Lines changed: 222 additions & 43 deletions
@@ -1,68 +1,247 @@
1-
# SQL Shield
1+
# sqlshield
22

3-
Validate raw SQL queries present in your Python or Rust codebase against a schema using ```sqlshield```:
3+
> Schema-aware SQL linter for embedded queries. Catches missing tables,
4+
> missing columns, and broken JOINs in raw SQL strings inside Python and
5+
> Rust source — at edit time, not at runtime.
46
5-
```shell
6-
$ sqlshield --help
7-
Usage: sqlshield [OPTIONS]
7+
```python
8+
def fetch_user(uid):
9+
return db.execute(f"SELECT id, nickname FROM users WHERE id = {uid}")
10+
```
811

9-
Options:
10-
-d, --directory <DIRECTORY>
11-
Directory. Defaults to "." (current)
12+
```text
13+
$ sqlshield --directory src --schema schema.sql
14+
src/queries.py:2: error: Column `nickname` not found in table `users`
15+
```
1216

13-
-s, --schema <SCHEMA>
14-
Schema file. Defaults to "schema.sql"
17+
## Why
1518

16-
-h, --help
17-
Print help (see a summary with '-h')
19+
Raw SQL strings inside application code are a common blind spot. Type
20+
checkers don't read them. Database connections are mocked in unit tests.
21+
The error surfaces only when the query runs — which is fine on the happy
22+
path and miserable everywhere else. sqlshield reads your source files,
23+
extracts every SQL string, and validates each one against your schema —
24+
without touching a database.
1825

19-
-V, --version
20-
Print version
21-
```
26+
It works on:
27+
28+
- Plain `"…"` and raw `r#"…"#` Rust string literals (sqlx-idiomatic).
29+
- Python f-strings (`f"…{x}…"`) and `.format()` strings (`{{` / `}}`
30+
escapes preserved).
31+
- Standalone `.sql` files (via the LSP).
32+
33+
It checks:
2234

23-
## Installation
35+
- Tables and columns referenced anywhere — projection, `WHERE`, `HAVING`,
36+
`GROUP BY`, `ORDER BY`, `JOIN ON` / `USING`, function arguments, `CASE`
37+
branches, `CAST`, arithmetic, set operations.
38+
- `INSERT` / `UPDATE` / `DELETE` target tables and column lists.
39+
- `WITH` / CTEs (including `WITH RECURSIVE` and explicit column lists),
40+
derived tables in `FROM`, parenthesized join groups, scalar / `IN` /
41+
`EXISTS` subqueries — each in their own scope.
42+
- Schema-qualified table names (`public.users`) — strict for qualified
43+
queries, permissive for bare ones.
2444

25-
- Pip:
45+
## Install
2646

27-
```shell
47+
```sh
48+
# Rust users
49+
cargo install sqlshield-cli
50+
51+
# Python users
2852
pip install sqlshield
2953
```
3054

31-
- Cargo
55+
Or build from source:
3256

33-
```shell
34-
cargo install sqlshield-cli
57+
```sh
58+
git clone https://github.com/davidsmfreire/sqlshield
59+
cd sqlshield
60+
cargo build --release
61+
./target/release/sqlshield --help
62+
```
63+
64+
## Quick start
65+
66+
1. Write a schema file (`schema.sql` by default):
67+
68+
```sql
69+
CREATE TABLE users (id INT, name VARCHAR(255), email VARCHAR(255));
70+
CREATE TABLE orders (id INT, user_id INT, total INT);
71+
```
72+
73+
2. Run sqlshield on your source tree:
74+
75+
```sh
76+
sqlshield --directory src --schema schema.sql
77+
```
78+
79+
3. Each finding is reported as `path:line: error: <description>`. The
80+
process exits `0` on clean, `1` if validation errors were found, and
81+
`2` for IO / config problems (missing schema, malformed config,
82+
stdin read failure).
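
The exit-code contract described in step 3 can be consumed from a wrapper script. The following is a minimal sketch, not part of sqlshield itself: `describe_exit` and `run_sqlshield` are hypothetical helper names, and only the `--directory` / `--schema` flags and the three exit codes come from the text above.

```python
import subprocess

# Exit-code meanings as stated in the quick start (0 / 1 / 2).
EXIT_MEANINGS = {
    0: "clean",
    1: "validation errors found",
    2: "IO / config problem",
}


def describe_exit(code: int) -> str:
    """Map a sqlshield exit code to a short human-readable label."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_sqlshield(directory: str, schema: str) -> tuple[int, str]:
    """Hypothetical wrapper: run the CLI and return (exit code, stdout).

    Assumes a `sqlshield` binary is on PATH.
    """
    proc = subprocess.run(
        ["sqlshield", "--directory", directory, "--schema", schema],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout
```

A CI job could treat `1` as a lint failure and `2` as an infrastructure failure, which is the distinction the split exit codes make possible.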
83+
84+
### Standalone query mode
85+
86+
```sh
87+
echo "SELECT id, missing FROM users" | sqlshield --stdin --schema schema.sql
88+
# error: Column `missing` not found in table `users`
89+
```
90+
91+
Useful for editor integrations that pipe a single buffer through the
92+
linter.
93+
94+
### JSON output
95+
96+
```sh
97+
sqlshield --directory src --schema schema.sql --format json
3598
```
3699

37-
## Features
100+
```json
101+
[
102+
{
103+
"location": "src/queries.py:2",
104+
"description": "Column `nickname` not found in table `users`"
105+
}
106+
]
107+
```
38108

39-
The tool validates the following main clauses:
109+
Stable shape; safe to pipe into `jq` or feed to a CI annotator.
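
As a rough illustration of consuming that output, the sketch below parses the JSON array into `(path, line, description)` tuples. `parse_findings` is a hypothetical helper, and it assumes every `location` ends in `:<line>` as in the example above; findings from `--stdin` mode may not carry that suffix.

```python
import json


def parse_findings(raw: str) -> list[tuple[str, int, str]]:
    """Split each finding's `location` into a path and a line number."""
    findings = []
    for item in json.loads(raw):
        # rpartition splits on the last colon, so paths containing ':' survive.
        path, _, line = item["location"].rpartition(":")
        findings.append((path, int(line), item["description"]))
    return findings


# Sample payload copied from the JSON example above.
sample = (
    '[{"location": "src/queries.py:2", '
    '"description": "Column `nickname` not found in table `users`"}]'
)
for path, line, description in parse_findings(sample):
    print(f"{path} (line {line}): {description}")
```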
40110

41-
- SELECT :heavy_check_mark:
42-
- WITH :heavy_check_mark:
43-
- JOIN :heavy_check_mark:
44-
- Derived tables (`FROM (SELECT …) alias`) :heavy_check_mark:
45-
- INSERT :heavy_check_mark:
46-
- UPDATE :heavy_check_mark:
47-
- DELETE :heavy_check_mark:
111+
## Configuration
48112

49-
Other clauses:
113+
Drop a `.sqlshield.toml` at the project root. CLI flags override the
114+
config; the config overrides defaults.
115+
116+
```toml
117+
# .sqlshield.toml
118+
schema = "db/schema.sql"
119+
directory = "src"
120+
dialect = "postgres"
121+
```
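
The layering rule (CLI flags over config, config over defaults) can be pictured as successive dictionary updates. This is a sketch of the precedence only, not the tool's actual implementation; `resolve_settings` is a hypothetical name, and the default values are the ones this README states (`schema.sql`, `.`, `generic`).

```python
# Defaults as documented: schema.sql, current directory, generic dialect.
DEFAULTS = {"schema": "schema.sql", "directory": ".", "dialect": "generic"}


def resolve_settings(config: dict, cli_flags: dict) -> dict:
    """Merge settings so that CLI flags beat config, and config beats defaults."""
    settings = dict(DEFAULTS)
    settings.update(config)
    # A flag left unset on the command line (None) must not clobber the config.
    settings.update({k: v for k, v in cli_flags.items() if v is not None})
    return settings


config = {"schema": "db/schema.sql", "directory": "src", "dialect": "postgres"}
cli_flags = {"schema": None, "directory": None, "dialect": "mysql"}
print(resolve_settings(config, cli_flags))
```

Here only `dialect` is given on the command line, so it is the only config value that gets overridden.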
50122

51-
- WHERE :heavy_check_mark:
52-
- ORDER BY :heavy_check_mark:
53-
- GROUP BY :heavy_check_mark:
54-
- HAVING :heavy_check_mark:
123+
Supported dialects: `generic` (default), `postgres` / `postgresql` / `pg`,
124+
`mysql`, `sqlite`, `mssql` / `sqlserver`, `snowflake`, `bigquery` / `bq`,
125+
`redshift`, `clickhouse`, `duckdb`, `hive`, `ansi`. The dialect controls
126+
how the SQL parser handles vendor-specific syntax (Postgres `::` casts,
127+
MySQL backticks, …).
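
One way to picture the alias list is a lookup table that maps each accepted spelling to a single dialect. The sketch below is illustrative only: the canonical names on the right-hand side and the case-insensitive lookup are assumptions, and `normalize_dialect` is a hypothetical helper rather than sqlshield's API.

```python
# Accepted spellings (from the list above) -> an assumed canonical name.
DIALECT_ALIASES = {
    "generic": "generic",
    "postgres": "postgres",
    "postgresql": "postgres",
    "pg": "postgres",
    "mysql": "mysql",
    "sqlite": "sqlite",
    "mssql": "mssql",
    "sqlserver": "mssql",
    "snowflake": "snowflake",
    "bigquery": "bigquery",
    "bq": "bigquery",
    "redshift": "redshift",
    "clickhouse": "clickhouse",
    "duckdb": "duckdb",
    "hive": "hive",
    "ansi": "ansi",
}


def normalize_dialect(name: str = "generic") -> str:
    """Resolve an alias to its canonical dialect; reject unknown names."""
    key = name.strip().lower()  # assumption: lookup ignores case and whitespace
    if key not in DIALECT_ALIASES:
        raise ValueError(f"unsupported dialect: {name!r}")
    return DIALECT_ALIASES[key]
```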
55128

56-
Schema-qualified table names (`public.users`) are resolved strictly
57-
when the query is qualified and permissively when unqualified.
129+
The walker prunes `target/`, `.git/`, `node_modules/`, `.venv/`, `venv/`,
130+
`__pycache__/`, `.pytest_cache/`, `.mypy_cache/`, `.ruff_cache/`, `.tox/`,
131+
`dist/`, `build/`, `.idea/`, and `.vscode/` automatically.
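
The pruning behaviour can be approximated in a few lines of Python. This is a sketch of the idea, not the actual walker (which the roadmap describes as a parallel rayon-based one); `iter_source_files` is a hypothetical name, and the directory list is copied from the paragraph above.

```python
import os

# Directory names skipped during the scan, as listed above.
PRUNED_DIRS = {
    "target", ".git", "node_modules", ".venv", "venv", "__pycache__",
    ".pytest_cache", ".mypy_cache", ".ruff_cache", ".tox",
    "dist", "build", ".idea", ".vscode",
}


def iter_source_files(root: str, extensions: tuple = (".py", ".rs")):
    """Yield source files under `root`, never descending into pruned dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to enter those dirs.
        dirnames[:] = sorted(d for d in dirnames if d not in PRUNED_DIRS)
        for name in sorted(filenames):
            if name.endswith(extensions):
                yield os.path.join(dirpath, name)
```

Pruning at the directory level, rather than filtering paths afterwards, avoids walking large trees such as `node_modules/` at all.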
58132

59133
## Editor integration
60134

61-
[`sqlshield-lsp`](sqlshield-lsp/README.md) provides a Language Server
62-
Protocol front-end so any LSP-aware editor (VS Code, Neovim, Helix, …) can
63-
show diagnostics inline on the offending SQL string.
135+
[`sqlshield-lsp`](sqlshield-lsp/README.md) is a Language Server that
136+
publishes diagnostics for embedded SQL on every `didOpen` / `didChange`.
137+
Any LSP-aware editor (VS Code, Neovim, Helix, Emacs, Zed) can show inline
138+
squiggles on the offending SQL string. The crate's README has the wiring
139+
recipes.
140+
141+
## Python integration
142+
143+
[`sqlshield-py`](sqlshield-py/) exposes `validate_query` and
144+
`validate_files` as Python functions:
145+
146+
```python
147+
import sqlshield
148+
149+
errors = sqlshield.validate_query(
150+
"SELECT email FROM users",
151+
"CREATE TABLE users (id INT, name VARCHAR(255))",
152+
)
153+
# ['Column `email` not found in table `users`']
154+
```
155+
156+
## Feature support
157+
158+
| Clause / construct | Status |
159+
| --------------------------------------------------- | :----: |
160+
| `SELECT` projection ||
161+
| `WHERE` / `HAVING` / `GROUP BY` / `ORDER BY` ||
162+
| Projection alias references in `HAVING` / `ORDER BY` ||
163+
| `JOIN` `ON` / `USING` ||
164+
| Parenthesized join groups ||
165+
| `WITH` / CTE, `WITH RECURSIVE`, explicit `(a, b)` lists ||
166+
| Derived tables (`FROM (SELECT …) alias`) ||
167+
| Subqueries (`IN`, `EXISTS`, scalar) — own scope ||
168+
| `UNION` / `INTERSECT` / `EXCEPT` ||
169+
| `INSERT` (incl. `INSERT … SELECT`) ||
170+
| `UPDATE` (assignments, `WHERE`, `FROM`) ||
171+
| `DELETE` (`USING`, `WHERE`) ||
172+
| `WITH … INSERT/UPDATE` ||
173+
| Schema-qualified names (`public.users`) ||
174+
| `ALTER TABLE ADD/DROP/RENAME COLUMN` ingestion ||
175+
| `CREATE VIEW` / `CREATE TABLE … AS SELECT` ||
176+
| Function args / `CASE` / `CAST` / arithmetic ||
177+
| Case-insensitive identifier matching ||
178+
| 12 SQL dialects via `--dialect` ||
179+
| Live database introspection ||
180+
| Quoted-vs-unquoted identifier folding (Postgres rules) ||
181+
| `MERGE` ||
182+
183+
## Limitations
184+
185+
- **Identifier matching is ASCII case-insensitive.** sqlshield treats
186+
`Id` and `id` as the same column. Postgres-style "quoted identifiers
187+
are case-sensitive" semantics aren't modeled.
188+
- **Dynamic table / column names** (`SELECT {col} FROM t`) substitute the
189+
placeholder with `1`. Column-position placeholders silently pass; table-
190+
position placeholders break the parse and the query is dropped.
191+
- **Two qualified tables sharing a bare name** (`schema_a.users` and
192+
`schema_b.users`) collide on the bare key — last declaration wins for
193+
unqualified queries. Qualified references resolve strictly.
194+
- **Schema is parsed once.** Triggers, stored procedures, and INSTEAD OF
195+
rules aren't tracked.
196+
- **Per-file errors are silently swallowed** during a directory scan
197+
(parse failures, missing-extension errors). Use `--stdin` to surface
198+
them for a single query.
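
The first and third limitations interact, so a toy model may help. The sketch below is one reading of the documented behaviour, not the real data structure: `fold` and `Catalog` are hypothetical names, and it assumes a qualified reference never falls back to a bare declaration.

```python
def fold(name: str) -> str:
    """ASCII-only case folding; non-ASCII characters are left untouched."""
    return "".join(c.lower() if c.isascii() else c for c in name)


class Catalog:
    """Toy table catalog: strict qualified lookup, permissive bare lookup."""

    def __init__(self) -> None:
        self.qualified = {}  # (schema, table) -> set of column names
        self.bare = {}       # table -> set of column names

    def declare(self, name: str, columns: list) -> None:
        cols = {fold(c) for c in columns}
        schema, _, table = name.rpartition(".")
        if schema:
            self.qualified[(fold(schema), fold(table))] = cols
        # The bare key is overwritten each time: last declaration wins.
        self.bare[fold(table)] = cols

    def lookup(self, name: str):
        schema, _, table = name.rpartition(".")
        if schema:
            return self.qualified.get((fold(schema), fold(table)))
        return self.bare.get(fold(table))


catalog = Catalog()
catalog.declare("schema_a.users", ["id", "Name"])
catalog.declare("schema_b.users", ["id", "email"])
```

With these two declarations, a bare `users` resolves to the `schema_b` columns, while `schema_a.users` still resolves to its own.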
199+
200+
## Architecture
201+
202+
```text
203+
Source file (*.py, *.rs)
204+
│ tree-sitter extracts string literals (decoded escapes / raw strings)
205+
206+
SQL string (with `{…}` placeholders replaced by `1`)
207+
│ sqlparser parses with the chosen dialect
208+
209+
AST (Vec<Statement>)
210+
│ recursive walker: scope-aware Expr resolution + clause validators
211+
212+
Vec<SqlValidationError>
213+
```
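
The placeholder step in the middle of that pipeline can be approximated with a regular expression. This is a sketch under stated assumptions, not the extractor's real code: `substitute_placeholders` is a hypothetical name, doubled braces are simply left untouched, and nested braces such as `{x:{width}}` are not handled.

```python
import re

# Match escaped braces first so they are never mistaken for placeholders.
_TOKEN = re.compile(r"\{\{|\}\}|\{[^{}]*\}")


def substitute_placeholders(sql: str) -> str:
    """Replace each `{...}` placeholder with the literal `1`."""
    def repl(match: re.Match) -> str:
        token = match.group(0)
        # `{{` and `}}` are escapes, not placeholders: keep them as they are.
        return token if token in ("{{", "}}") else "1"

    return _TOKEN.sub(repl, sql)


print(substitute_placeholders("SELECT id FROM users WHERE id = {uid}"))
print(substitute_placeholders("SELECT id FROM {table}"))
```

The second call yields `SELECT id FROM 1`, which shows why a table-position placeholder produces SQL that no longer parses, while a value-position placeholder stays valid.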
214+
215+
Workspace layout:
216+
217+
- [`sqlshield/`](sqlshield/) — core library. Public surface:
218+
`validate_query`, `validate_files`, `Dialect`, `SqlShieldError`.
219+
- [`sqlshield-cli/`](sqlshield-cli/) — clap-based CLI wrapper.
220+
- [`sqlshield-py/`](sqlshield-py/) — PyO3 bindings.
221+
- [`sqlshield-lsp/`](sqlshield-lsp/): `tower-lsp` Language Server.
222+
223+
## Similar tools
224+
225+
- [`postguard`](https://github.com/andywer/postguard) — Postgres-only,
226+
  TypeScript-only, runs against a live database.
227+
- [`schemasafe`](https://github.com/schemasafe/schemasafe):
228+
query-checker for TypeScript / JavaScript.
229+
- [`sqlc`](https://github.com/sqlc-dev/sqlc) — code generator that
230+
type-checks SQL against a schema; reads the schema, then generates
231+
bindings rather than linting existing code.
232+
- [`squawk`](https://github.com/sbdchd/squawk) — Postgres migration
233+
linter; complementary, not overlapping (squawk lints DDL, sqlshield
234+
lints embedded DML/SELECT).
235+
236+
sqlshield's niche: language-agnostic extraction (Python + Rust today,
237+
extensible) with a multi-dialect parser, no database connection
238+
required.
239+
240+
## Contributing
241+
242+
See [CONTRIBUTING.md](CONTRIBUTING.md) for the dev setup, the
243+
ClauseValidation extension recipe, and the release process.
64244

65-
## Similar work
245+
## License
66246

67-
- <https://github.com/andywer/postguard>
68-
- <https://github.com/schemasafe>
247+
[MIT](LICENSE).

ROADMAP.md

Lines changed: 34 additions & 1 deletion
@@ -1,3 +1,36 @@
11
# Roadmap
22

3-
- Allow direct db connection instead of providing schema
3+
## Done
4+
5+
- Schema-aware validation across SELECT / INSERT / UPDATE / DELETE,
6+
CTEs (incl. `WITH RECURSIVE`), set ops, derived tables, JOIN ON /
7+
USING, scope-aware subqueries.
8+
- Schema ingestion: `CREATE TABLE`, `ALTER TABLE` (ADD/DROP/RENAME
9+
COLUMN), `CREATE VIEW`, `CREATE TABLE … AS SELECT`.
10+
- 12 SQL dialects via `--dialect`.
11+
- ASCII case-insensitive identifier matching.
12+
- Output formats: text + JSON; split exit codes; `--stdin` mode.
13+
- `.sqlshield.toml` configuration with CLI override layering.
14+
- Parallel file walker (rayon) with default ignore list.
15+
- Language Server (`sqlshield-lsp`) for inline editor diagnostics
16+
in `.py` / `.rs` / `.sql`.
17+
- Python bindings (`sqlshield-py`).
18+
19+
## Considering
20+
21+
- **Live database introspection** — connect to Postgres / MySQL /
22+
  SQLite and read the schema directly, no SQL dump required.
23+
- **Postgres quoted-vs-unquoted identifier folding** — currently we
24+
treat all identifiers as case-insensitive; a quoted-aware mode
25+
would match Postgres semantics more precisely.
26+
- **`MERGE` support** — would round out the DML coverage.
27+
- **More language extractors** — Go, TypeScript, Java string literals.
28+
Each is a small `finder/<lang>.rs` module + tree-sitter grammar.
29+
- **First-party VS Code extension** — currently the LSP is wired via
30+
generic LSP-client extensions.
31+
32+
## Not planned
33+
34+
- Anything that requires running queries on a live database (parameter
35+
type-checking against actual table types, constraint validation).
36+
sqlshield is deliberately a static linter.
