Releases: kubernetes-sigs/gateway-api-inference-extension

v1.5.0-rc.2

16 Apr 17:27
v1.5.0-rc.2
9ab998c

Pre-release

What's Changed

Full Changelog: v1.5.0-rc.1...v1.5.0-rc.2

v1.5.0-rc.1

09 Apr 05:11
v1.5.0-rc.1
e40b74c

Pre-release

What's Changed

  • chore(deps): bump the kubernetes group in /conformance with 4 updates by @dependabot[bot] in #2465
  • chore(deps): bump github.com/prometheus/prometheus from 0.309.1 to 0.310.0 by @dependabot[bot] in #2464
  • latencypredictor: training server mean objective + sliding window by @kaushikmitr in #2432
  • [Feat]: Mirror request plugins execution to the response path by @abdallahsamabd in #2369
  • remove redundant field body and isStreaming in requestcontrol.Response by @zetxqx in #2474
  • Read metrics Datasource configuration from config file by @Mohamedma96 in #2441
  • Update release promotion checklist for standalone chart by @danehans in #2479
  • Adds missing license header by @davidbreitgand in #2484
  • feat: Introduce Pluggable Parser Framework for EPP payload processing by @zetxqx in #2359
  • Training server ensemble by @kaushikmitr in #2473
  • Fix vllm gpu deployment yaml by @rahulgurnani in #2487
  • feat(bbr): integrate request body into RequestContext for plugins by @noalimoy in #2442
  • [epp/datalayer]: Move and rename FakeDataSource into pkg/epp/framework/plugins/datalayer/source/mocks package by @Mohamedma96 in #2429
  • Updated go dependencies by @ahg-g in #2496
  • release: include missing staging digests in release process by @danehans in #2497
  • refactor(observability): extract common logging options to shared pac… by @yehuditkerido in #2395
  • [pluggable bbr] remove empty placeholder plugin by @nirrozenbaum in #2451
  • docs: switch quickstarts to agentgateway by @danehans in #2505
  • latencypredictor: improve TPOT training accuracy by @kaushikmitr in #2509
  • fix: log correct error from Prepare Data Plugins by @varad-ahirwadkar in #2512
  • Second part of refactoring of the Ext Proc code between the EPP and the BBR by @shmuelk in #2446
  • remove unused error code by @nirrozenbaum in #2516
  • Add optional inference objective by @Gregory-Pereira in #1995
  • chore(conformance): bump gateway-api to v1.5.0 by @danehans in #2519
  • small refactor of pluggable bbr interfaces by @nirrozenbaum in #2513
  • docs: add flow control user guide and concepts by @LukeAVanDrie in #2438
  • feat: add inference request and received time in FlowControlRequest by @loicmarchal in #2475
  • chore(deps): bump google.golang.org/grpc from 1.79.1 to 1.79.2 by @dependabot[bot] in #2533
  • chore(deps): bump sigs.k8s.io/controller-runtime from 0.23.1 to 0.23.3 by @dependabot[bot] in #2535
  • chore(deps): bump go.opentelemetry.io/otel/exporters/stdout/stdouttrace from 1.41.0 to 1.42.0 by @dependabot[bot] in #2534
  • chore(deps): bump golang.org/x/sync from 0.19.0 to 0.20.0 in /conformance by @dependabot[bot] in #2537
  • chore(deps): bump sigs.k8s.io/controller-runtime from 0.23.1 to 0.23.3 in /conformance by @dependabot[bot] in #2538
  • feat: pool-wide saturation computation + expose as gauge by @evacchi in #2343
  • Add support for gRPC response trailers in EPP by @zetxqx in #2510
  • latencypredictor: prediction server perf + mean + ensemble by @kaushikmitr in #2488
  • Support arm64 builds by @ed-pai in #2350
  • refactor: move chunking utilities to pkg/common/envoy/request by @noalimoy in #2549
  • refactor: consolidate envoy request utilities under pkg/common/envoy by @asaadbalum in #2550
  • feat(tracing): add tracing support and configuration options for BBR by @AtharvaPakade in #2518
  • flowcontrol: add benchmark suite by @LukeAVanDrie in #2539
  • Revert "Support arm64 builds (#2350)" by @kfswain in #2554
  • [bbr] Apply request body mutations to ext_proc response by @asaadbalum in #2551
  • [bbr] Apply response plugin body/header mutations to ext_proc response by @abdallahsamabd in #2477
  • docs: add flow control guide to nav and cross-link by @LukeAVanDrie in #2546
  • fix(bbr): enable response header/body processing in Istio EnvoyFilter by @abdallahsamabd in #2557
  • test(envoy): add missing unit tests for envoy util functions by @yehuditkerido in #2529
  • fix(bbr): remove body content from debug logs to prevent data leak by @abdallahsamabd in #2559
  • fix(flaky) increase the timeout for setting up the ext_proc conn in integration test by @zetxqx in #2547
  • [bbr] Skip body re-serialization when unchanged by @asaadbalum in #2556
  • fix: Remove duplicate loop by @gyliu513 in #2564
  • fix(bbr): add read lock to GetBaseModel to prevent concurrent map crash by @gyliu513 in #2570
  • fix CI timeout issue with multi arch builds by @farazmd in #2572
  • Token in flight features by @kaushikmitr in #2563
  • feature: add slo-based deadline ordering policy by @loicmarchal in #2531
  • test(bbr): add unit tests for body-field-to-header plugin by @asaadbalum in #2569
  • revert multi arch build by @farazmd in #2580
  • agentgateway: update docs and add conformance report by @danehans in #2565
  • fix(conformance): use staging EPP image on main by @danehans in #2577
  • feat: add additional labels to the existing scheduler_attempts_total metric by @lionelvillard in #2545
  • fix(epp): avoid nested RLock in PoolGet to prevent potential deadlock by @gyliu513 in #2571
  • chore(deps): bump golang.org/x/net from 0.48.0 to 0.51.0 in /conformance by @dependabot[bot] in #2469
  • fix(prefix-cache): race conditions in indexer lock management by @hexfusion in #2501
  • datastore minor cleanup by @nirrozenbaum in #2448
  • feat(Flow Control)/Expand Flow Control capacity limits schema(resource.Quantity) by @BizerNotNull in #2492
  • Add TokenizedPrompt to LLMRequest for external tokeniza...

v1.4.0

20 Mar 04:48
v1.4.0
6e787dd

Release Highlights

  • Standalone chart work landed and is included in release artifacts
  • Conformance was split into its own Go module
  • InferencePool / Helm / gRPC-related improvements landed, including appProtocol, FailOpen, and ALPN h2
  • Significant ongoing work landed in flow control, BBR, predicted latency, and datalayer internals

v1.4.0-rc.3

16 Mar 22:04
v1.4.0-rc.3
315d092

Pre-release

Gateway API Inference Extension v1.4.0-rc.3 is available as a prerelease for community testing.

Full Changelog: v1.4.0-rc.2...v1.4.0-rc.3

v1.4.0-rc.2

10 Mar 21:06
v1.4.0-rc.2
b32dfd0

Pre-release

RC Highlights

  • v1.4.0-rc.2 is available for community testing before the final v1.4.0 release
  • Fixes the release-branch quickstart vLLM image tags so they stay aligned with main, while keeping the release-branch IfNotPresent pull policy
  • Bumps the nested ./conformance Go module to Gateway API v1.5.0

What's Changed

  • [release-1.4] fix(release): sync quickstart vllm images by @danehans in #2522
  • [release-1.4] chore(conformance): bump gateway-api to v1.5.0 by @danehans in #2520

Full Changelog: v1.4.0-rc.1...v1.4.0-rc.2

v1.4.0-rc.1

05 Mar 22:39
v1.4.0-rc.1
8f057d7

Pre-release

RC Highlights

  • v1.4.0-rc.1 is available for community testing before the final v1.4.0 release
  • standalone chart work landed and is included in release artifacts
  • conformance was split into its own Go module
  • InferencePool / Helm / gRPC-related improvements landed, including appProtocol, FailOpen, and ALPN h2
  • significant ongoing work landed in flow control, BBR, predicted latency, and datalayer internals

v1.3.1

20 Feb 00:40
v1.3.1

Fixes

This patch release cherry-picks fixes for #2321, #2300, and #2316.

v1.3.0

Noteworthy

LoRA Syncer

This release and future releases will not ship a lora-syncer image, because that feature is deprecated. Similar functionality remains available through the file system resolver. For model servers that do not yet support this form of LoRA management but do support the discrete LoRA management endpoints that the lora-syncer uses, the old images will be kept indefinitely and can still be used.

In the next release, the lora syncer code will be removed from the codebase.

Flow Control

Flow Control continues to evolve with the addition of Scale from/to Zero support, which allows requests to be sent to an EPP with no model serving endpoints behind it and emits metrics that the autoscaler can use to scale up the pool.

In upcoming releases, we will continue working toward enabling this feature by default.

Standalone EPP

This functionality allows the EPP to be deployed as a proxy contained entirely within a single pod, by running the EPP as a sidecar container alongside the Envoy proxy. The feature was developed for batch inference scenarios and is currently considered experimental.

v1.3.1-rc.1

18 Feb 01:41
v1.3.1-rc.1

Pre-release

This patch release cherry-picks fixes for #2321, #2300, and #2316.

Full Changelog: v1.3.0...v1.3.1-rc.1

v1.3.0

21 Jan 14:17
v1.3.0
616745e

Noteworthy

LoRA Syncer

This release and future releases will not ship a lora-syncer image, because that feature is deprecated. Similar functionality remains available through the file system resolver. For model servers that do not yet support this form of LoRA management but do support the discrete LoRA management endpoints that the lora-syncer uses, the old images will be kept indefinitely and can still be used.

In the next release, the lora syncer code will be removed from the codebase.

Flow Control

Flow Control continues to evolve with the addition of Scale from/to Zero support, which allows requests to be sent to an EPP with no model serving endpoints behind it and emits metrics that the autoscaler can use to scale up the pool.

In upcoming releases, we will continue working toward enabling this feature by default.
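
As a rough illustration of the scale-from-zero idea described in this section, the Python sketch below holds requests while a pool has no endpoints, exposes the queue depth as the kind of signal an autoscaler could act on, and drains the queue once an endpoint appears. It is a toy model under stated assumptions, not the EPP's Flow Control implementation: `HoldingQueue`, its methods, and the endpoint address are hypothetical names chosen for this example.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class HoldingQueue:
    """Toy model: hold requests while the pool has no endpoints (hypothetical)."""

    endpoints: list[str] = field(default_factory=list)
    pending: deque[str] = field(default_factory=deque)

    def submit(self, request_id: str) -> str | None:
        """Return an endpoint if one exists; otherwise queue the request."""
        if self.endpoints:
            return self.endpoints[0]
        self.pending.append(request_id)
        return None

    def queue_depth(self) -> int:
        """Number of held requests: the signal an autoscaler could watch."""
        return len(self.pending)

    def add_endpoint(self, address: str) -> list[tuple[str, str]]:
        """Register a new endpoint and dispatch every held request to it."""
        self.endpoints.append(address)
        dispatched = [(request_id, address) for request_id in self.pending]
        self.pending.clear()
        return dispatched


if __name__ == "__main__":
    queue = HoldingQueue()
    queue.submit("req-1")                       # pool is empty, so the request is held
    print(queue.queue_depth())                  # a non-zero depth would trigger scale-up
    print(queue.add_endpoint("10.0.0.1:8000"))  # held requests drain to the new endpoint
```

The design point is only that admission and dispatch are decoupled: a request can be accepted before any backend exists, as long as something observable (here, the queue depth) tells the autoscaler that capacity is needed.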

Standalone EPP

This functionality allows the EPP to be deployed as a proxy contained entirely within a single pod, by running the EPP as a sidecar container alongside the Envoy proxy. The feature was developed for batch inference scenarios and is currently considered experimental.

Fix(es)

  • Improved the approximate prefix cache scorer when used with the llm-d P/D (prefill/decode) setup

What's Changed

  • Added crd validation ci workflow. by @bexxmodd in #1879
  • chore: bump sim version by @nirrozenbaum in #1890
  • feat(conformance): add conformance test for verifying x-gateway-destination-endpoint-served by @zetxqx in #1862
  • Add deprecation notice on metrics port in runner and datastore by @elevran in #1886
  • refactor: Flatten Flow Control inter-flow policy plugin directory structure by @LukeAVanDrie in #1841
  • Execute prepare data plugins in topological order of data dependencies by @rahulgurnani in #1878
  • chore(deps): bump go.uber.org/zap from 1.27.0 to 1.27.1 by @dependabot[bot] in #1896
  • chore(deps): bump google.golang.org/grpc from 1.76.0 to 1.77.0 by @dependabot[bot] in #1897
  • chore(deps): bump github.com/prometheus/common from 0.67.2 to 0.67.4 by @dependabot[bot] in #1895
  • enhance bbr helm chart to generalize cmd-line args by @nirrozenbaum in #1900
  • feat: Add totalRunningRequests metric for latency predictor by @BenjaminBraunDev in #1899
  • chore(deps): bump sigs.k8s.io/structured-merge-diff/v6 from 6.3.0 to 6.3.1 by @dependabot[bot] in #1898
  • SLO Aware Routing Sidecar + Plugin EPP Integration and Helm Deployment by @BenjaminBraunDev in #1839
  • Use the correct vllm metric gpu_cache_usage_perc --> kv_cache_usage_perc by @ezrasilvera in #1905
  • fix: fixed helm chart by @capri-xiyue in #1907
  • docs: add Kgateway BBR documentation by @howardjohn in #1908
  • Implement EPP Plugins by datalayer objects by @elevran in #1901
  • feat: Implement Model Rewrite and Traffic Splitting Logic by @zetxqx in #1820
  • docs: Updated quickstart to use stable Istio release 1.28.0 by @atharva-310 in #1902
  • fix(release): correctly update lora-syncer and epp image tags across RC and final releases by @googs1025 in #1916
  • fix: sort InferenceModelRewrite lists by (Namespace, Name) in tests by @googs1025 in #1917
  • Define and register plugin factories for datalayer by @elevran in #1911
  • fix: Properly install the InferenceModelRewrite CRD using kustomize by @shmuelk in #1934
  • Move AllPodsPredicate to datastore package by @elevran in #1939
  • Add automatic TLS certificate reloading for EPP by @pierDipi in #1765
  • feat(modelRewrite): Add metrics for InferenceModelRewrite decisions by @zetxqx in #1938
  • fix: CI golangci-lint errors by @shmuelk in #1948
  • Update inference perf chart to match upstream chart + Add Prefix Cache Github Actions by @rlakhtakia in #1949
  • Standardize plugins.TypedName field name from 'tn' to 'typedName' by @rohithnarasimha in #1918
  • Update inference perf chart to use new hf token structure. by @rlakhtakia in #1955
  • fix infinite loop in profile picker and switch predictor based routing to on by default with a header to disable by @BenjaminBraunDev in #1929
  • fix config load error when picker is set before the scoerer w/o weight. by @zetxqx in #1958
  • add kaushikmitr as appoved of slo aware routing plugin by @kaushikmitr in #1956
  • refactor: [Scale from Zero] Introduce PodLocator by @LukeAVanDrie in #1950
  • feat: add config validation in predicted-latency-scorer plugin by @googs1025 in #1904
  • Run tests with two data layer implementations by @irar2 in #1930
  • Rename PodInfo struct to EndpointMetadata to better reflect its purpose by @shmuelk in #1866
  • feat(metrics): add scheduler attempt counter by @googs1025 in #1931
  • chore: update released quickstart to v1.2.1 by @nirrozenbaum in #1941
  • generalize latest release quickstart by @nirrozenbaum in #1966
  • chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.2 to 2.27.3 by @dependabot[bot] in #1971
  • chore(deps): bump golang.org/x/sync from 0.18.0 to 0.19.0 by @dependabot[bot] in #1972
  • chore(deps): bump go.opentelemetry.io/otel/sdk from 1.38.0 to 1.39.0 by @dependabot[bot] in #1975
  • refactor: Standardize config loading and system default injection by @LukeAVanDrie in #1953
  • chore(deps): bump github.com/onsi/gomega from 1.38.2 to 1.38.3 by @dependabot[bot] in #1974
  • chore(deps): bump go.opentelemetry.io/otel/exporters/stdout/stdouttrace from 1.38.0 to 1.39.0 by @dependabot[bot] in #1973
  • feat: Enable Scale-from-Zero with Flow Control enabled by @LukeAVanDrie in #1952
  • feature: (helm) support custom volumes and volumeMounts for epp by @delavet in #1945
  • Use spf13/pflag instead of Go's standard flag package by @elevran in #1979
  • Extend textual configuration support with the Datalayer's configuration by @shmuelk in #1914
  • test/integration: introduce robust harness and migrate BBR suite by @LukeAVanDrie in #1959
  • test/bbr: fix startup race condition and IPv6 address formatting by @LukeAVanDrie in #1987
  • [chore]Bump vLLM Image Tags by @Frapschen in #1733
  • Add Prefill Heavy E2E Test to Github Actions by @rlakhtakia in #1894
    ...

v1.3.0-rc.3

15 Jan 14:22
v1.3.0-rc.3

Pre-release

RC diff

  • Helm fixes
  • Scale from zero fixes

What's Changed

  • Added crd validation ci workflow. by @bexxmodd in #1879
  • chore: bump sim version by @nirrozenbaum in #1890
  • feat(conformance): add conformance test for verifying x-gateway-destination-endpoint-served by @zetxqx in #1862
  • Add deprecation notice on metrics port in runner and datastore by @elevran in #1886
  • refactor: Flatten Flow Control inter-flow policy plugin directory structure by @LukeAVanDrie in #1841
  • Execute prepare data plugins in topological order of data dependencies by @rahulgurnani in #1878
  • chore(deps): bump go.uber.org/zap from 1.27.0 to 1.27.1 by @dependabot[bot] in #1896
  • chore(deps): bump google.golang.org/grpc from 1.76.0 to 1.77.0 by @dependabot[bot] in #1897
  • chore(deps): bump github.com/prometheus/common from 0.67.2 to 0.67.4 by @dependabot[bot] in #1895
  • enhance bbr helm chart to generalize cmd-line args by @nirrozenbaum in #1900
  • feat: Add totalRunningRequests metric for latency predictor by @BenjaminBraunDev in #1899
  • chore(deps): bump sigs.k8s.io/structured-merge-diff/v6 from 6.3.0 to 6.3.1 by @dependabot[bot] in #1898
  • SLO Aware Routing Sidecar + Plugin EPP Integration and Helm Deployment by @BenjaminBraunDev in #1839
  • Use the correct vllm metric gpu_cache_usage_perc --> kv_cache_usage_perc by @ezrasilvera in #1905
  • fix: fixed helm chart by @capri-xiyue in #1907
  • docs: add Kgateway BBR documentation by @howardjohn in #1908
  • Implement EPP Plugins by datalayer objects by @elevran in #1901
  • feat: Implement Model Rewrite and Traffic Splitting Logic by @zetxqx in #1820
  • docs: Updated quickstart to use stable Istio release 1.28.0 by @atharva-310 in #1902
  • fix(release): correctly update lora-syncer and epp image tags across RC and final releases by @googs1025 in #1916
  • fix: sort InferenceModelRewrite lists by (Namespace, Name) in tests by @googs1025 in #1917
  • Define and register plugin factories for datalayer by @elevran in #1911
  • fix: Properly install the InferenceModelRewrite CRD using kustomize by @shmuelk in #1934
  • Move AllPodsPredicate to datastore package by @elevran in #1939
  • Add automatic TLS certificate reloading for EPP by @pierDipi in #1765
  • feat(modelRewrite): Add metrics for InferenceModelRewrite decisions by @zetxqx in #1938
  • fix: CI golangci-lint errors by @shmuelk in #1948
  • Update inference perf chart to match upstream chart + Add Prefix Cache Github Actions by @rlakhtakia in #1949
  • Standardize plugins.TypedName field name from 'tn' to 'typedName' by @rohithnarasimha in #1918
  • Update inference perf chart to use new hf token structure. by @rlakhtakia in #1955
  • fix infinite loop in profile picker and switch predictor based routing to on by default with a header to disable by @BenjaminBraunDev in #1929
  • fix config load error when picker is set before the scoerer w/o weight. by @zetxqx in #1958
  • add kaushikmitr as appoved of slo aware routing plugin by @kaushikmitr in #1956
  • refactor: [Scale from Zero] Introduce PodLocator by @LukeAVanDrie in #1950
  • feat: add config validation in predicted-latency-scorer plugin by @googs1025 in #1904
  • Run tests with two data layer implementations by @irar2 in #1930
  • Rename PodInfo struct to EndpointMetadata to better reflect its purpose by @shmuelk in #1866
  • feat(metrics): add scheduler attempt counter by @googs1025 in #1931
  • chore: update released quickstart to v1.2.1 by @nirrozenbaum in #1941
  • generalize latest release quickstart by @nirrozenbaum in #1966
  • chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.2 to 2.27.3 by @dependabot[bot] in #1971
  • chore(deps): bump golang.org/x/sync from 0.18.0 to 0.19.0 by @dependabot[bot] in #1972
  • chore(deps): bump go.opentelemetry.io/otel/sdk from 1.38.0 to 1.39.0 by @dependabot[bot] in #1975
  • refactor: Standardize config loading and system default injection by @LukeAVanDrie in #1953
  • chore(deps): bump github.com/onsi/gomega from 1.38.2 to 1.38.3 by @dependabot[bot] in #1974
  • chore(deps): bump go.opentelemetry.io/otel/exporters/stdout/stdouttrace from 1.38.0 to 1.39.0 by @dependabot[bot] in #1973
  • feat: Enable Scale-from-Zero with Flow Control enabled by @LukeAVanDrie in #1952
  • feature: (helm) support custom volumes and volumeMounts for epp by @delavet in #1945
  • Use spf13/pflag instead of Go's standard flag package by @elevran in #1979
  • Extend textual configuration support with the Datalayer's configuration by @shmuelk in #1914
  • test/integration: introduce robust harness and migrate BBR suite by @LukeAVanDrie in #1959
  • test/bbr: fix startup race condition and IPv6 address formatting by @LukeAVanDrie in #1987
  • [chore]Bump vLLM Image Tags by @Frapschen in #1733
  • Add Prefill Heavy E2E Test to Github Actions by @rlakhtakia in #1894
  • Add decode heavy benchmark e2e test to github actions. by @rlakhtakia in #1893
  • BBR multi lora guide by @davidbreitgand in #1940
  • [feat] Add running requests scorer and tests by @BenjaminBraunDev in #1957
  • Implement PrepareDataPlugin for prefix cache match plugin by @rahulgurnani in #1942
  • Define and implement command line parsing with Options struct by @elevran in #1984
  • fix(inferenceModelRewrites): conditionally skip watching InferenceModelRewrite and InferenceObjective by @zetxqx in #1967
  • Add e2e test for multiport InferencePool enhancement by @RyanRosario in #1885
  • chore(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc from 1.38.0 to 1.39.0 by @dependabot[bot] in #1997
  • flowcontrol: refactor registry config to support dynamic priority...