|
1 | | -# SQL Shield |
| 1 | +# sqlshield |
2 | 2 |
|
3 | | -Validate raw SQL queries present in your Python or Rust codebase against a schema using ```sqlshield```: |
| 3 | +> Schema-aware SQL linter for embedded queries. Catches missing tables, |
| 4 | +> missing columns, and broken JOINs in raw SQL strings inside Python and |
| 5 | +> Rust source — at edit time, not at runtime. |
4 | 6 |
|
5 | | -```shell |
6 | | -$ sqlshield --help |
7 | | -Usage: sqlshield [OPTIONS] |
| 7 | +```python |
| 8 | +def fetch_user(uid): |
| 9 | + return db.execute(f"SELECT id, nickname FROM users WHERE id = {uid}") |
| 10 | +``` |
8 | 11 |
|
9 | | -Options: |
10 | | - -d, --directory <DIRECTORY> |
11 | | - Directory. Defaults to "." (current) |
| 12 | +```text |
| 13 | +$ sqlshield --directory src --schema schema.sql |
| 14 | +src/queries.py:2: error: Column `nickname` not found in table `users` |
| 15 | +``` |
12 | 16 |
|
13 | | - -s, --schema <SCHEMA> |
14 | | - Schema file. Defaults to "schema.sql" |
| 17 | +## Why |
15 | 18 |
|
16 | | - -h, --help |
17 | | - Print help (see a summary with '-h') |
| 19 | +Raw SQL strings inside application code are a common blind spot. Type |
| 20 | +checkers don't read them. Database connections are mocked in unit tests. |
| 21 | +The error surfaces only when the query runs — which is fine on the happy |
| 22 | +path and miserable everywhere else. sqlshield reads your source files, |
| 23 | +extracts every SQL string, and validates each one against your schema — |
| 24 | +without touching a database. |
18 | 25 |
|
19 | | - -V, --version |
20 | | - Print version |
21 | | -``` |
| 26 | +It works on: |
| 27 | + |
| 28 | +- Plain `"…"` and raw `r#"…"#` Rust string literals (sqlx-idiomatic). |
| 29 | +- Python f-strings (`f"…{x}…"`) and `.format()` strings (`{{` / `}}` |
| 30 | + escapes preserved). |
| 31 | +- Standalone `.sql` files (via the LSP). |
| 32 | + |
| 33 | +It checks: |
22 | 34 |
|
23 | | -## Installation |
| 35 | +- Tables and columns referenced anywhere — projection, `WHERE`, `HAVING`, |
| 36 | + `GROUP BY`, `ORDER BY`, `JOIN ON` / `USING`, function arguments, `CASE` |
| 37 | + branches, `CAST`, arithmetic, set operations. |
| 38 | +- `INSERT` / `UPDATE` / `DELETE` target tables and column lists. |
| 39 | +- `WITH` / CTEs (including `WITH RECURSIVE` and explicit column lists), |
| 40 | + derived tables in `FROM`, parenthesized join groups, scalar / `IN` / |
| 41 | + `EXISTS` subqueries, each in its own scope.
| 42 | +- Schema-qualified table names (`public.users`) — strict for qualified |
| 43 | + queries, permissive for bare ones. |
24 | 44 |
|
25 | | -- Pip: |
| 45 | +## Install |
26 | 46 |
|
27 | | -```shell |
| 47 | +```sh |
| 48 | +# Rust users |
| 49 | +cargo install sqlshield-cli |
| 50 | + |
| 51 | +# Python users |
28 | 52 | pip install sqlshield |
29 | 53 | ``` |
30 | 54 |
|
31 | | -- Cargo |
| 55 | +Or build from source: |
32 | 56 |
|
33 | | -```shell |
34 | | -cargo install sqlshield-cli |
| 57 | +```sh |
| 58 | +git clone https://github.com/davidsmfreire/sqlshield |
| 59 | +cd sqlshield |
| 60 | +cargo build --release |
| 61 | +./target/release/sqlshield --help |
| 62 | +``` |
| 63 | + |
| 64 | +## Quick start |
| 65 | + |
| 66 | +1. Write a schema file (`schema.sql` by default): |
| 67 | + |
| 68 | + ```sql |
| 69 | + CREATE TABLE users (id INT, name VARCHAR(255), email VARCHAR(255)); |
| 70 | + CREATE TABLE orders (id INT, user_id INT, total INT); |
| 71 | + ``` |
| 72 | + |
| 73 | +2. Run sqlshield on your source tree: |
| 74 | + |
| 75 | + ```sh |
| 76 | + sqlshield --directory src --schema schema.sql |
| 77 | + ``` |
| 78 | + |
| 79 | +3. Each finding is reported as `path:line: error: <description>`. The |
| 80 | + process exits `0` when no errors are found, `1` if validation errors were found, and
| 81 | + `2` for IO / config problems (missing schema, malformed config, |
| 82 | + stdin read failure). |
| 83 | + |
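A CI wrapper can branch on the three exit codes listed above. The following is a minimal Python sketch under stated assumptions: `describe_exit` and `run_sqlshield` are hypothetical helper names, the `sqlshield` binary is assumed to be on `PATH`, and findings are assumed to arrive on stdout or stderr (the README does not say which, so both are captured).

```python
import subprocess

# Exit codes as documented in the quick start: 0 clean, 1 findings, 2 IO/config.
EXIT_MEANINGS = {
    0: "clean",
    1: "validation errors found",
    2: "IO / config problem",
}


def describe_exit(code: int) -> str:
    """Map a sqlshield exit code to a short human-readable label."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_sqlshield(directory: str = "src", schema: str = "schema.sql"):
    """Run the CLI with the flags shown in the quick start (hypothetical helper).

    Returns (exit_code, combined_output). Requires sqlshield to be installed.
    """
    proc = subprocess.run(
        ["sqlshield", "--directory", directory, "--schema", schema],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout + proc.stderr
```

A build script might call `run_sqlshield()` and fail the job only when the code is `1`, while treating `2` as a setup problem worth a different message.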
| 84 | +### Standalone query mode |
| 85 | + |
| 86 | +```sh |
| 87 | +echo "SELECT id, missing FROM users" | sqlshield --stdin --schema schema.sql |
| 88 | +# error: Column `missing` not found in table `users` |
| 89 | +``` |
| 90 | + |
| 91 | +Useful for editor integrations that pipe a single buffer through the |
| 92 | +linter. |
| 93 | + |
| 94 | +### JSON output |
| 95 | + |
| 96 | +```sh |
| 97 | +sqlshield --directory src --schema schema.sql --format json |
35 | 98 | ``` |
36 | 99 |
|
37 | | -## Features |
| 100 | +```json |
| 101 | +[ |
| 102 | + { |
| 103 | + "location": "src/queries.py:2", |
| 104 | + "description": "Column `nickname` not found in table `users`" |
| 105 | + } |
| 106 | +] |
| 107 | +``` |
38 | 108 |
|
39 | | -The tool validates the following main clauses: |
| 109 | +Stable shape; safe to pipe into `jq` or feed to a CI annotator. |
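As one way to feed the JSON report to a CI annotator, the sketch below converts each finding into a GitHub Actions `::error` workflow command. It assumes every entry has the `location` (`path:line`) and `description` keys shown above; `to_annotations` is a hypothetical helper, not part of sqlshield.

```python
import json


def to_annotations(report_json: str) -> list:
    """Turn a sqlshield JSON report into GitHub Actions error annotations."""
    annotations = []
    for finding in json.loads(report_json):
        # `location` is "path:line"; split from the right so that a path
        # containing a colon keeps everything before the final one.
        path, _, line = finding["location"].rpartition(":")
        annotations.append(
            f"::error file={path},line={line}::{finding['description']}"
        )
    return annotations
```

Printing the returned lines from a workflow step should surface each finding inline on the pull request diff.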
40 | 110 |
|
41 | | -- SELECT :heavy_check_mark: |
42 | | - - WITH :heavy_check_mark: |
43 | | - - JOIN :heavy_check_mark: |
44 | | - - Derived tables (`FROM (SELECT …) alias`) :heavy_check_mark: |
45 | | -- INSERT :heavy_check_mark: |
46 | | -- UPDATE :heavy_check_mark: |
47 | | -- DELETE :heavy_check_mark: |
| 111 | +## Configuration |
48 | 112 |
|
49 | | -Other clauses: |
| 113 | +Drop a `.sqlshield.toml` at the project root. CLI flags override the |
| 114 | +config; the config overrides defaults. |
| 115 | + |
| 116 | +```toml |
| 117 | +# .sqlshield.toml |
| 118 | +schema = "db/schema.sql" |
| 119 | +directory = "src" |
| 120 | +dialect = "postgres" |
| 121 | +``` |
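The precedence rule (CLI flags over config, config over defaults) amounts to a layered dictionary merge. The Python sketch below illustrates the idea only; it is not sqlshield's implementation. `resolve_settings` is a hypothetical name, and the `"."` default for `directory` is an assumption taken from the earlier `--help` text.

```python
# Built-in defaults. "schema.sql" and "generic" are stated in this README;
# "." for the directory is assumed from the older --help output.
DEFAULTS = {"directory": ".", "schema": "schema.sql", "dialect": "generic"}


def resolve_settings(config: dict, cli_flags: dict) -> dict:
    """Merge the three layers; later layers win over earlier ones."""
    # Flags the user did not pass are represented as None and must not
    # mask values coming from the config file.
    passed = {key: value for key, value in cli_flags.items() if value is not None}
    return {**DEFAULTS, **config, **passed}
```

With the config file above and only `--dialect mysql` on the command line, the schema and directory would come from the file and the dialect from the flag.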
50 | 122 |
|
51 | | -- WHERE :heavy_check_mark: |
52 | | -- ORDER BY :heavy_check_mark: |
53 | | -- GROUP BY :heavy_check_mark: |
54 | | -- HAVING :heavy_check_mark: |
| 123 | +Supported dialects: `generic` (default), `postgres` / `postgresql` / `pg`, |
| 124 | +`mysql`, `sqlite`, `mssql` / `sqlserver`, `snowflake`, `bigquery` / `bq`, |
| 125 | +`redshift`, `clickhouse`, `duckdb`, `hive`, `ansi`. The dialect controls |
| 126 | +how the SQL parser handles vendor-specific syntax (Postgres `::` casts, |
| 127 | +MySQL backticks, …). |
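Tooling that generates `.sqlshield.toml` may want to normalize the aliases listed above before writing the file. A minimal Python sketch follows. Treating the first-listed spelling as canonical is an assumption, and `canonical_dialect` is a hypothetical helper.

```python
# Alias -> canonical name, built from the list of supported dialects above.
# Which spelling sqlshield treats as canonical is an assumption here.
DIALECT_ALIASES = {
    "postgresql": "postgres",
    "pg": "postgres",
    "sqlserver": "mssql",
    "bq": "bigquery",
}

CANONICAL_DIALECTS = {
    "generic", "postgres", "mysql", "sqlite", "mssql", "snowflake",
    "bigquery", "redshift", "clickhouse", "duckdb", "hive", "ansi",
}


def canonical_dialect(name: str) -> str:
    """Resolve an alias to its canonical dialect name, or raise ValueError."""
    key = name.strip().lower()
    key = DIALECT_ALIASES.get(key, key)
    if key not in CANONICAL_DIALECTS:
        raise ValueError(f"unsupported dialect: {name!r}")
    return key
```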
55 | 128 |
|
56 | | -Schema-qualified table names (`public.users`) are resolved strictly |
57 | | -when the query is qualified and permissively when unqualified. |
| 129 | +The walker prunes `target/`, `.git/`, `node_modules/`, `.venv/`, `venv/`, |
| 130 | +`__pycache__/`, `.pytest_cache/`, `.mypy_cache/`, `.ruff_cache/`, `.tox/`, |
| 131 | +`dist/`, `build/`, `.idea/`, and `.vscode/` automatically. |
58 | 132 |
|
59 | 133 | ## Editor integration |
60 | 134 |
|
61 | | -[`sqlshield-lsp`](sqlshield-lsp/README.md) provides a Language Server |
62 | | -Protocol front-end so any LSP-aware editor (VS Code, Neovim, Helix, …) can |
63 | | -show diagnostics inline on the offending SQL string. |
| 135 | +[`sqlshield-lsp`](sqlshield-lsp/README.md) is a Language Server that |
| 136 | +publishes diagnostics for embedded SQL on every `didOpen` / `didChange`. |
| 137 | +Any LSP-aware editor (VS Code, Neovim, Helix, Emacs, Zed) can show inline |
| 138 | +squiggles on the offending SQL string. The crate's README has the wiring |
| 139 | +recipes. |
| 140 | + |
| 141 | +## Python integration |
| 142 | + |
| 143 | +[`sqlshield-py`](sqlshield-py/) exposes `validate_query` and |
| 144 | +`validate_files` as Python functions: |
| 145 | + |
| 146 | +```python |
| 147 | +import sqlshield |
| 148 | + |
| 149 | +errors = sqlshield.validate_query( |
| 150 | + "SELECT email FROM users", |
| 151 | + "CREATE TABLE users (id INT, name VARCHAR(255))", |
| 152 | +) |
| 153 | +# ['Column `email` not found in table `users`'] |
| 154 | +``` |
| 155 | + |
| 156 | +## Feature support |
| 157 | + |
| 158 | +| Clause / construct | Status | |
| 159 | +| --------------------------------------------------- | :----: | |
| 160 | +| `SELECT` projection | ✅ | |
| 161 | +| `WHERE` / `HAVING` / `GROUP BY` / `ORDER BY` | ✅ | |
| 162 | +| Projection alias references in `HAVING` / `ORDER BY` | ✅ | |
| 163 | +| `JOIN` `ON` / `USING` | ✅ | |
| 164 | +| Parenthesized join groups | ✅ | |
| 165 | +| `WITH` / CTE, `WITH RECURSIVE`, explicit `(a, b)` lists | ✅ | |
| 166 | +| Derived tables (`FROM (SELECT …) alias`) | ✅ | |
| 167 | +| Subqueries (`IN`, `EXISTS`, scalar) — own scope | ✅ | |
| 168 | +| `UNION` / `INTERSECT` / `EXCEPT` | ✅ | |
| 169 | +| `INSERT` (incl. `INSERT … SELECT`) | ✅ | |
| 170 | +| `UPDATE` (assignments, `WHERE`, `FROM`) | ✅ | |
| 171 | +| `DELETE` (`USING`, `WHERE`) | ✅ | |
| 172 | +| `WITH … INSERT/UPDATE` | ✅ | |
| 173 | +| Schema-qualified names (`public.users`) | ✅ | |
| 174 | +| `ALTER TABLE ADD/DROP/RENAME COLUMN` ingestion | ✅ | |
| 175 | +| `CREATE VIEW` / `CREATE TABLE … AS SELECT` | ✅ | |
| 176 | +| Function args / `CASE` / `CAST` / arithmetic | ✅ | |
| 177 | +| Case-insensitive identifier matching | ✅ | |
| 178 | +| 12 SQL dialects via `--dialect` | ✅ | |
| 179 | +| Live database introspection | ✗ | |
| 180 | +| Quoted-vs-unquoted identifier folding (Postgres rules) | ✗ | |
| 181 | +| `MERGE` | ✗ | |
| 182 | + |
| 183 | +## Limitations |
| 184 | + |
| 185 | +- **Identifier matching is ASCII case-insensitive.** sqlshield treats |
| 186 | + `Id` and `id` as the same column. Postgres-style "quoted identifiers |
| 187 | + are case-sensitive" semantics aren't modeled. |
| 188 | +- **Dynamic table / column names** (`SELECT {col} FROM t`) are handled by
| 189 | + replacing the placeholder with `1`. Column-position placeholders pass
| 190 | + silently; table-position placeholders break the parse and the query is dropped.
| 191 | +- **Two qualified tables sharing a bare name** (`schema_a.users` and |
| 192 | + `schema_b.users`) collide on the bare key — last declaration wins for |
| 193 | + unqualified queries. Qualified references resolve strictly. |
| 194 | +- **Schema is parsed once.** Triggers, stored procedures, and INSTEAD OF |
| 195 | + rules aren't tracked. |
| 196 | +- **Per-file errors are silently swallowed** during a directory scan |
| 197 | + (parse failures, missing-extension errors). Use `--stdin` to surface |
| 198 | + them for a single query. |
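The dynamic-name limitation is easier to see with the substitution written out. The Python sketch below approximates the described behavior with a regular expression. It is not sqlshield's actual implementation, and `substitute_placeholders` is a hypothetical name.

```python
import re

# A single `{...}` placeholder that is not part of a `{{` / `}}` escape.
PLACEHOLDER = re.compile(r"(?<!\{)\{[^{}]*\}(?!\})")


def substitute_placeholders(sql: str) -> str:
    """Replace each interpolation placeholder with the literal `1`."""
    return PLACEHOLDER.sub("1", sql)


# Column position: the result is still valid SQL, so nothing is reported.
column_case = substitute_placeholders("SELECT {col} FROM t")

# Table position: `FROM 1` is not a valid table reference, so a parser
# would reject the statement and the query would be skipped.
table_case = substitute_placeholders("SELECT id FROM {table}")
```

Here `column_case` becomes `SELECT 1 FROM t` and `table_case` becomes `SELECT id FROM 1`, which matches the pass and drop outcomes described in the list above.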
| 199 | + |
| 200 | +## Architecture |
| 201 | + |
| 202 | +```text |
| 203 | +Source file (*.py, *.rs) |
| 204 | + │ tree-sitter extracts string literals (decoded escapes / raw strings) |
| 205 | + ▼ |
| 206 | +SQL string (with `{…}` placeholders replaced by `1`) |
| 207 | + │ sqlparser parses with the chosen dialect |
| 208 | + ▼ |
| 209 | +AST (Vec<Statement>) |
| 210 | + │ recursive walker: scope-aware Expr resolution + clause validators |
| 211 | + ▼ |
| 212 | +Vec<SqlValidationError> |
| 213 | +``` |
| 214 | + |
| 215 | +Workspace layout: |
| 216 | + |
| 217 | +- [`sqlshield/`](sqlshield/) — core library. Public surface: |
| 218 | + `validate_query`, `validate_files`, `Dialect`, `SqlShieldError`. |
| 219 | +- [`sqlshield-cli/`](sqlshield-cli/) — clap-based CLI wrapper. |
| 220 | +- [`sqlshield-py/`](sqlshield-py/) — PyO3 bindings. |
| 221 | +- [`sqlshield-lsp/`](sqlshield-lsp/) — `tower-lsp` Language Server. |
| 222 | + |
| 223 | +## Similar tools |
| 224 | + |
| 225 | +- [`postguard`](https://github.com/andywer/postguard) — Postgres-only, |
| 226 | + TypeScript-only, runs against a live database.
| 227 | +- [`schemasafe`](https://github.com/schemasafe/schemasafe) — |
| 228 | + query-checker for TypeScript / JavaScript. |
| 229 | +- [`sqlc`](https://github.com/sqlc-dev/sqlc) — code generator that |
| 230 | + type-checks SQL against a schema; reads the schema, then generates |
| 231 | + bindings rather than linting existing code. |
| 232 | +- [`squawk`](https://github.com/sbdchd/squawk) — Postgres migration |
| 233 | + linter; complementary, not overlapping (squawk lints DDL, sqlshield |
| 234 | + lints embedded DML/SELECT). |
| 235 | + |
| 236 | +sqlshield's niche: language-agnostic extraction (Python + Rust today, |
| 237 | +extensible) with a multi-dialect parser, no database connection |
| 238 | +required. |
| 239 | + |
| 240 | +## Contributing |
| 241 | + |
| 242 | +See [CONTRIBUTING.md](CONTRIBUTING.md) for the dev setup, the |
| 243 | +ClauseValidation extension recipe, and the release process. |
64 | 244 |
|
65 | | -## Similar work |
| 245 | +## License |
66 | 246 |
|
67 | | -- <https://github.com/andywer/postguard> |
68 | | -- <https://github.com/schemasafe> |
| 247 | +[MIT](LICENSE). |