.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.

.. _execution_metrics:

Execution Metrics
=================

Overview
--------

When DataFusion executes a query it compiles the logical plan into a tree of
*physical plan operators* (e.g. ``FilterExec``, ``ProjectionExec``,
``HashAggregateExec``). Each operator can record runtime statistics while it
runs. These statistics are called **execution metrics**.

Typical metrics include:

- **output_rows** – number of rows produced by the operator
- **elapsed_compute** – total CPU time (nanoseconds) spent inside the operator
- **spill_count** – number of times the operator spilled data to disk
- **spilled_bytes** – total bytes written to disk during spills
- **spilled_rows** – total rows written to disk during spills

Metrics are collected *per partition*: DataFusion may execute each operator
in parallel across several partitions. The convenience properties on
:py:class:`~datafusion.MetricsSet` (e.g. ``output_rows``, ``elapsed_compute``)
automatically sum the named metric across **all** partitions, giving a single
aggregate value for the operator as a whole. You can also access the raw
per-partition :py:class:`~datafusion.Metric` objects via
:py:meth:`~datafusion.MetricsSet.metrics`.

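The summing performed by these convenience properties can be sketched in
plain Python. The per-partition numbers below are made up for illustration;
only the aggregation rule mirrors what :py:class:`~datafusion.MetricsSet`
does.

.. code-block:: python

    # Hypothetical per-partition values for one operator's "output_rows"
    # metric. MetricsSet.output_rows returns their sum.
    per_partition_output_rows = {0: 120, 1: 95, 2: 135}  # partition -> rows

    aggregate_output_rows = sum(per_partition_output_rows.values())
    print(aggregate_output_rows)  # 350
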
When Are Metrics Available?
---------------------------

Metrics are populated only **after** the DataFrame has been executed.
Execution is triggered by any of the terminal operations:

- :py:meth:`~datafusion.DataFrame.collect`
- :py:meth:`~datafusion.DataFrame.collect_partitioned`
- :py:meth:`~datafusion.DataFrame.execute_stream`
- :py:meth:`~datafusion.DataFrame.execute_stream_partitioned`

Calling :py:meth:`~datafusion.ExecutionPlan.collect_metrics` before execution
will return entries with empty (or ``None``) metric sets because the operators
have not run yet.

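The timing requirement can be sketched abstractly: a metric has no value
until its operator has actually run. The toy class below is an illustrative
stand-in for this lifecycle, not part of the DataFusion API.

.. code-block:: python

    class ToyOperator:
        """Stand-in for a physical operator that records metrics as it runs."""

        def __init__(self):
            self.output_rows = None  # empty until the operator executes

        def run(self, batch):
            self.output_rows = len(batch)  # recorded during execution
            return batch

    op = ToyOperator()
    print(op.output_rows)  # None -- metrics queried before execution

    op.run([10, 20, 30])
    print(op.output_rows)  # 3 -- metrics populated after execution
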
Reading the Physical Plan Tree
------------------------------

:py:meth:`~datafusion.DataFrame.execution_plan` returns the root
:py:class:`~datafusion.ExecutionPlan` node of the physical plan tree. The tree
mirrors the operator pipeline: the root is typically a projection or
coalescing node; its children are filters, aggregates, scans, etc.

The ``operator_name`` string returned by
:py:meth:`~datafusion.ExecutionPlan.collect_metrics` is the *display* name of
the node, for example ``"FilterExec: column1@0 > 1"``. This is the same string
you would see when calling ``plan.display()``.

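Because operators are identified by display name, a small helper can pick a
specific operator out of the ``collect_metrics`` results by prefix. The
``entries`` list and ``find_operator`` helper below are hypothetical; the
pairs merely follow the ``(operator_name, metrics)`` shape described above.

.. code-block:: python

    # Sample (operator_name, metrics) pairs; names and values are made up.
    entries = [
        ("ProjectionExec: expr=[column1@0 as column1]", {"output_rows": 2}),
        ("FilterExec: column1@0 > 1", {"output_rows": 2}),
        ("DataSourceExec: partitions=1", {"output_rows": 3}),
    ]

    def find_operator(entries, prefix):
        """Return the first pair whose display name starts with prefix."""
        for name, metrics in entries:
            if name.startswith(prefix):
                return name, metrics
        return None

    name, metrics = find_operator(entries, "FilterExec")
    print(name)  # FilterExec: column1@0 > 1
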
Available Metrics
-----------------

The following metrics are directly accessible as properties on
:py:class:`~datafusion.MetricsSet`:

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Property
     - Description
   * - ``output_rows``
     - Number of rows emitted by the operator (summed across partitions).
   * - ``elapsed_compute``
     - CPU time in nanoseconds spent inside the operator's execute loop
       (summed across partitions).
   * - ``spill_count``
     - Number of spill-to-disk events due to memory pressure (summed across
       partitions).
   * - ``spilled_bytes``
     - Total bytes written to disk during spills (summed across partitions).
   * - ``spilled_rows``
     - Total rows written to disk during spills (summed across partitions).

Any metric not listed above can be accessed via
:py:meth:`~datafusion.MetricsSet.sum_by_name`, or by iterating over the raw
:py:class:`~datafusion.Metric` objects returned by
:py:meth:`~datafusion.MetricsSet.metrics`.

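The semantics of ``sum_by_name`` can be sketched in plain Python: add up
every per-partition metric whose name matches. Metrics are modeled here as
``(partition, name, value)`` tuples purely for illustration.

.. code-block:: python

    # Made-up per-partition metrics for one operator.
    raw_metrics = [
        (0, "output_rows", 120),
        (1, "output_rows", 95),
        (0, "elapsed_compute", 40_000),
        (1, "elapsed_compute", 55_000),
    ]

    def sum_by_name(metrics, name):
        """Sum every per-partition value recorded under the given name."""
        return sum(v for _, n, v in metrics if n == name)

    print(sum_by_name(raw_metrics, "output_rows"))      # 215
    print(sum_by_name(raw_metrics, "elapsed_compute"))  # 95000
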
Labels
------

A :py:class:`~datafusion.Metric` may carry *labels*: key/value pairs that
provide additional context. For example, some operators tag their output
metrics with an ``output_type`` label to distinguish between intermediate and
final output:

.. code-block:: python

    for metric in metrics_set.metrics():
        print(metric.name, metric.labels())
        # output_rows {'output_type': 'final'}

Labels are operator-specific; most metrics have no labels.

End-to-End Example
------------------

.. code-block:: python

    from datafusion import SessionContext

    ctx = SessionContext()
    ctx.sql("CREATE TABLE sales AS VALUES (1, 100), (2, 200), (3, 50)")

    df = ctx.sql("SELECT * FROM sales WHERE column1 > 1")

    # Execute the query -- this populates the metrics
    results = df.collect()

    # Retrieve the physical plan with metrics
    plan = df.execution_plan()

    # Walk every operator and print its metrics
    for operator_name, ms in plan.collect_metrics():
        if ms.output_rows is not None:
            print(f"{operator_name}")
            print(f"  output_rows     = {ms.output_rows}")
            print(f"  elapsed_compute = {ms.elapsed_compute} ns")

    # Access raw per-partition metrics
    for operator_name, ms in plan.collect_metrics():
        for metric in ms.metrics():
            print(
                f"  partition={metric.partition} "
                f"{metric.name}={metric.value} "
                f"labels={metric.labels()}"
            )

API Reference
-------------

- :py:class:`datafusion.ExecutionPlan` — physical plan node
- :py:meth:`datafusion.ExecutionPlan.collect_metrics` — walk the tree and
  return ``(operator_name, MetricsSet)`` pairs
- :py:meth:`datafusion.ExecutionPlan.metrics` — return the
  :py:class:`~datafusion.MetricsSet` for a single node
- :py:class:`datafusion.MetricsSet` — aggregated metrics for one operator
- :py:class:`datafusion.Metric` — a single per-partition metric value