# DataFusion in Python

> Apache DataFusion Python is a Python binding for Apache DataFusion, an in-process, Arrow-native query engine. It exposes a SQL interface and a lazy DataFrame API over PyArrow data and any source that implements the Arrow C Data Interface. This file points agents and LLM-based tools at the most useful entry points for writing DataFusion Python code.

## Agent Guide

- [SKILL.md (agent skill)](https://datafusion.apache.org/python/skill.html): idiomatic DataFrame API patterns, SQL-to-DataFrame mappings, common pitfalls, and the full `functions` catalog. Primary source of truth for writing datafusion-python code.

## User Guide

- [Introduction](https://datafusion.apache.org/python/user-guide/introduction.html): installation, the Pokemon quick start, and Jupyter tips.
- [Basics](https://datafusion.apache.org/python/user-guide/basics.html): `SessionContext`, `DataFrame`, and `Expr` at a glance.
- [Data sources](https://datafusion.apache.org/python/user-guide/data-sources.html): Parquet, CSV, JSON, Arrow, Pandas, Polars, and Python objects.
- [DataFrame operations](https://datafusion.apache.org/python/user-guide/dataframe/index.html): the lazy query-building interface.
- [Common operations](https://datafusion.apache.org/python/user-guide/common-operations/index.html): select, filter, join, aggregate, window, expressions, and functions.
- [SQL](https://datafusion.apache.org/python/user-guide/sql.html): running SQL against registered tables.
- [Configuration](https://datafusion.apache.org/python/user-guide/configuration.html): session and runtime options.

## DataFrame API reference

- [`datafusion.dataframe.DataFrame`](https://datafusion.apache.org/python/autoapi/datafusion/dataframe/index.html): the lazy DataFrame builder (`select`, `filter`, `aggregate`, `join`, `sort`, `limit`, set operations).
- [`datafusion.expr`](https://datafusion.apache.org/python/autoapi/datafusion/expr/index.html): expression tree nodes (`Expr`, `Window`, `WindowFrame`, `GroupingSet`).
- [`datafusion.functions`](https://datafusion.apache.org/python/autoapi/datafusion/functions/index.html): 290+ scalar, aggregate, and window functions.
- [`datafusion.context.SessionContext`](https://datafusion.apache.org/python/autoapi/datafusion/context/index.html): session entry point, data loading, SQL execution.

## Examples

- [TPC-H queries (GitHub)](https://github.com/apache/datafusion-python/tree/main/examples/tpch): canonical translations of TPC-H Q01–Q22 to idiomatic DataFrame code, each with reference SQL embedded in the module docstring.
- [Other examples (GitHub)](https://github.com/apache/datafusion-python/tree/main/examples): UDF/UDAF/UDWF, Substrait, Pandas/Polars interop, S3 reads.

## Optional

- [Contributor guide](https://datafusion.apache.org/python/contributor-guide/introduction.html): building from source, extending the Python bindings.
- [Upgrade guides](https://datafusion.apache.org/python/user-guide/upgrade-guides.html): migration notes between releases.
- [Upstream Rust DataFusion](https://datafusion.apache.org/): the underlying query engine.