52 changes: 52 additions & 0 deletions lisa/microsoft/testsuites/dpdk/dpdksuite.py
@@ -134,6 +134,58 @@ def verify_dpdk_symmetric_mp(
) -> None:
run_dpdk_symmetric_mp(node, log, variables)

@TestCaseMetadata(
description="""
netvsc pmd version.
This test case checks dpdk symmetric mp app, plus an sriov hotplug.
More details refer https://docs.microsoft.com/en-us/azure/virtual-network/setup-dpdk#prerequisites # noqa: E501
""",
priority=2,
requirement=simple_requirement(
min_core_count=8,
min_nic_count=3,
network_interface=Sriov(),
unsupported_features=[Gpu, Infiniband],
),
timeout=600,
)
def verify_dpdk_symmetric_mp_hotplug(
self,
node: Node,
log: Logger,
variables: Dict[str, Any],
result: TestResult,
) -> None:
run_dpdk_symmetric_mp(
node, log, variables, trigger_hotplug=True, hotplug_times=1
)
Comment on lines +159 to +161
Copilot AI Mar 24, 2026

The PR description notes this hotplug test will fail on DPDK versions below 26.07. As written, the test will still run against older DPDK sources and fail noisily. Consider adding an explicit version gate (skip with a clear message) or a runbook variable/requirement that ensures only compatible DPDK sources/branches enable this test.
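A minimal sketch of the version gate this comment suggests. It assumes the test can get the installed DPDK version as a string such as "26.07.0", and it takes the 26.07 threshold from the review's note. `MIN_HOTPLUG_DPDK`, `parse_dpdk_version`, `require_hotplug_support`, and `HotplugUnsupported` are hypothetical names; in the suite, the exception would be replaced by whatever skip mechanism LISA tests already use.

```python
import re
from typing import Tuple

# Assumed threshold, taken from the review's note that the hotplug test
# fails on DPDK versions below 26.07.
MIN_HOTPLUG_DPDK: Tuple[int, int] = (26, 7)


class HotplugUnsupported(Exception):
    """Hypothetical stand-in for the suite's 'skip this test' exception."""


def parse_dpdk_version(version: str) -> Tuple[int, int]:
    """Return (year, month) from strings like '26.07', 'v26.07.1' or '26.07.0-rc1'."""
    match = re.match(r"\s*v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized DPDK version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def require_hotplug_support(version: str) -> None:
    """Raise HotplugUnsupported with a clear message on DPDK older than 26.07."""
    found = parse_dpdk_version(version)
    # Tuples compare element by element, so (25, 11) < (26, 7) < (26, 11).
    if found < MIN_HOTPLUG_DPDK:
        year, month = MIN_HOTPLUG_DPDK
        raise HotplugUnsupported(
            f"DPDK {version.strip()} is older than {year}.{month:02d}; "
            "skipping the symmetric_mp hotplug test"
        )
```

The call would sit at the top of `verify_dpdk_symmetric_mp_hotplug` and `stress_dpdk_symmetric_mp_hotplug`, before `run_dpdk_symmetric_mp(...)`. Where the version string comes from is an assumption; the diff does not show how the suite records the DPDK build it installed.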

@TestCaseMetadata(
description="""
netvsc pmd version.
This test case checks dpdk symmetic mp app, plus an sriov hotplug.
Copilot AI Mar 24, 2026

Typo in the test description: "symmetic" → "symmetric".

Suggested change
This test case checks dpdk symmetic mp app, plus an sriov hotplug.
This test case checks dpdk symmetric mp app, plus an sriov hotplug.
More details refer https://docs.microsoft.com/en-us/azure/virtual-network/setup-dpdk#prerequisites # noqa: E501
Comment on lines +165 to +167
""",
priority=4,
requirement=simple_requirement(
min_core_count=8,
min_nic_count=3,
network_interface=Sriov(),
unsupported_features=[Gpu, Infiniband],
),
timeout=6000,
)
def stress_dpdk_symmetric_mp_hotplug(
self,
node: Node,
log: Logger,
variables: Dict[str, Any],
result: TestResult,
) -> None:
run_dpdk_symmetric_mp(
node, log, variables, trigger_hotplug=True, hotplug_times=40
Copilot AI Mar 25, 2026

With hotplug_times=40, run_dpdk_symmetric_mp will loop many times and (per the current implementation) start a new ping_async on each iteration without any visible join/wait. This can create many concurrent ping processes during the stress run, increasing flakiness and resource pressure. Consider changing the runner to wait for each async ping to finish before starting the next iteration (or make the stress case use synchronous ping), and/or cap concurrent ping jobs.

Suggested change
node, log, variables, trigger_hotplug=True, hotplug_times=40
node, log, variables, trigger_hotplug=True, hotplug_times=4
)
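A minimal sketch of the reviewer's first option: each ping still runs in the background, but it is collected before the next hotplug cycle, so at most one ping job is ever in flight. `run_hotplug_iterations`, `trigger_hotplug`, and `ping_once` are hypothetical stand-ins, not the `ping.ping_async` call used in the PR. Running the ping across the hotplug, rather than after it as the PR does, is also an assumed ordering.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_hotplug_iterations(
    hotplug_times: int,
    trigger_hotplug: Callable[[int], None],
    ping_once: Callable[[int], int],
    ping_timeout: float = 60.0,
) -> List[int]:
    """Run hotplug cycles with at most one ping job in flight.

    Each iteration starts the ping in the background, triggers the hotplug
    while traffic is flowing, then waits for that ping to finish before the
    next iteration begins.
    """
    received: List[int] = []
    # One worker thread caps concurrent ping jobs at one.
    with ThreadPoolExecutor(max_workers=1) as pool:
        for iteration in range(hotplug_times):
            pending = pool.submit(ping_once, iteration)
            trigger_hotplug(iteration)
            # Join before looping; raises concurrent.futures.TimeoutError
            # if the ping outlives ping_timeout seconds.
            received.append(pending.result(timeout=ping_timeout))
    return received
```

The single-worker pool is the concurrency cap. The explicit `result()` per iteration is the join. Either one alone prevents ping jobs from piling up over 40 iterations.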

@TestCaseMetadata(
description="""
netvsc pmd version with 1GiB hugepages
15 changes: 10 additions & 5 deletions lisa/microsoft/testsuites/dpdk/dpdkutil.py
@@ -74,6 +74,7 @@
)
from lisa.tools.hugepages import HugePageSize
from lisa.tools.lscpu import CpuArchitecture
+ from lisa.util import sleep
from lisa.util.constants import DEVICE_TYPE_SRIOV, SIGINT
from lisa.util.parallel import TaskManager, run_in_parallel, run_in_parallel_async

@@ -1718,6 +1719,8 @@ def run_dpdk_symmetric_mp(
- count packets received on tx/rx side of each process and port

"""
+
+ test_timeout = 120 + (60 * hotplug_times if trigger_hotplug else 35)
# setup and unwrap the resources for this test
# get a list of the upper non-primary nics and select two of them
test_nics = [
@@ -1790,9 +1793,9 @@ def run_dpdk_symmetric_mp(
f"{str(symmetric_mp_path)} -l 1 --proc-type auto "
f"{symmetric_mp_args} --proc-id 0"
),
- timeout=660,
+ timeout=test_timeout,
signal=SIGINT,
- kill_timeout=30,
+ kill_timeout=test_timeout + 5,
)
Comment on lines +1796 to 1799
Copilot AI Mar 24, 2026

Timeout.start_with_timeout() passes kill_timeout to GNU timeout --kill-after, which is a grace period after the initial signal (not an overall timeout). Setting it to test_timeout + 5 means a hung symmetric_mp run can wait an extra test_timeout + 5 seconds before being force-killed: about 42 minutes for the 40-hotplug stress case, where test_timeout is 2520 s. Consider using a small, fixed grace period (e.g., 30–60s) independent of test_timeout.
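The arithmetic behind this comment, as a small sketch. The first function copies the PR's `test_timeout` formula. The second assumes, as the review states, that `kill_timeout` becomes GNU `timeout --kill-after`, so the worst case before SIGKILL is the duration plus the grace period. Both function names are hypothetical.

```python
def symmetric_mp_timeout(trigger_hotplug: bool, hotplug_times: int) -> int:
    """The PR's per-process timeout: 120 s base plus 60 s per hotplug (or 35 s)."""
    return 120 + (60 * hotplug_times if trigger_hotplug else 35)


def worst_case_before_sigkill(duration: int, kill_after: int) -> int:
    """Seconds until GNU `timeout --kill-after=K DURATION CMD` sends SIGKILL.

    The first signal (SIGINT here) is sent after `duration` seconds; if the
    command is still alive, SIGKILL follows `kill_after` seconds later.
    """
    return duration + kill_after


for label, hotplug, times in [
    ("no hotplug", False, 0),
    ("verify (1 hotplug)", True, 1),
    ("stress (40 hotplugs)", True, 40),
]:
    t = symmetric_mp_timeout(hotplug, times)
    as_written = worst_case_before_sigkill(t, t + 5)
    fixed_grace = worst_case_before_sigkill(t, 30)
    print(
        f"{label}: timeout={t}s, kill_after=t+5 -> {as_written}s, "
        f"fixed 30s -> {fixed_grace}s"
    )
```

For the stress case, this gives a 2520 s timeout. The worst case before SIGKILL is 5045 s as written, versus 2550 s with a fixed 30 s grace period.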

# wait for it to start
@@ -1804,9 +1807,9 @@ def run_dpdk_symmetric_mp(
f"{str(symmetric_mp_path)} -l 2 --proc-type secondary "
f"{symmetric_mp_args} --proc-id 1"
),
- timeout=600,
+ timeout=test_timeout,
signal=SIGINT,
- kill_timeout=35,
+ kill_timeout=test_timeout + 5,
)
Comment on lines +1810 to 1813
Copilot AI Mar 24, 2026

Same kill_timeout issue as the primary process: --kill-after is a post-signal grace period, so tying it to test_timeout can significantly extend worst-case hangs. Use a small fixed grace period here as well.
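For comparison, here is a sketch of the command the secondary process would run with a fixed grace period, written as a plain GNU `timeout` argument list. It does not reproduce how LISA's `Timeout` tool assembles its command. `timeout_argv` and `KILL_GRACE_SECONDS` are hypothetical names, and the symmetric_mp arguments in the usage line are abbreviated.

```python
import shlex
from typing import List

# Assumed fixed grace period, in the 30-60 s range the review suggests.
KILL_GRACE_SECONDS = 30


def timeout_argv(
    command: str,
    duration: int,
    signal: str = "INT",
    kill_after: int = KILL_GRACE_SECONDS,
) -> List[str]:
    """Wrap `command` in GNU timeout: send `signal` after `duration` seconds,
    then SIGKILL `kill_after` seconds later if the process is still running."""
    return [
        "timeout",
        f"--signal={signal}",
        f"--kill-after={kill_after}",
        str(duration),
        *shlex.split(command),
    ]


# Abbreviated secondary-process command; the real one also passes symmetric_mp_args.
print(shlex.join(timeout_argv("./symmetric_mp -l 2 --proc-type secondary", 180)))
```

The duration still scales with `test_timeout`, but the grace period stays constant. A hung stress run is then force-killed shortly after its deadline instead of roughly doubling it.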
secondary.wait_output("APP: Finished Process Init", timeout=20)

@@ -1842,7 +1845,7 @@ def run_dpdk_symmetric_mp(
"Device notification type=1", # RTE_DEV_EVENT_REMOVE
delta_only=True,
) # relying on compiler defaults here, not great.
-
+ sleep(1)
# turn SRIOV on
node.features[NetworkInterface].switch_sriov(
enable=True, wait=False, reset_connections=False
mcgov marked this conversation as resolved.
@@ -1867,6 +1870,8 @@ def run_dpdk_symmetric_mp(
)
# expect additional pings for each post-hotplug instance
expected_pings += 100
+ # sleep for a moment to avoid api throttling
+ sleep(1)
mcgov marked this conversation as resolved.

ping.ping_async(
target=test_nics[0].ip_addr,
8 changes: 4 additions & 4 deletions lisa/sut_orchestrator/azure/features.py
@@ -95,6 +95,7 @@
generate_random_chars,
get_matched_str,
set_filtered_fields,
+ sleep,
)

if TYPE_CHECKING:
@@ -1008,12 +1009,11 @@ def switch_sriov(
f"now set its status into [{enable}]."
)
updated_nic.enable_accelerated_networking = enable
- network_client.network_interfaces.begin_create_or_update(
+
+ poller = network_client.network_interfaces.begin_create_or_update(
self._resource_group_name, updated_nic.name, updated_nic
)
- updated_nic = network_client.network_interfaces.get(
- self._resource_group_name, nic_name
- )
+ updated_nic = poller.result()
Comment on lines +1013 to +1016
Copilot AI Mar 25, 2026

This change makes switch_sriov effectively blocking on the Azure NIC update every time because it unconditionally calls poller.result(). In this PR, call sites explicitly pass wait=False (e.g., the hotplug test), so this likely breaks the intended non-wait semantics and can significantly slow down hotplug loops. Consider honoring the wait parameter: only call poller.result() (and any subsequent state assertions) when wait=True; otherwise return after starting the update (or return the poller for the caller to await).
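One way to honor `wait`, sketched under the assumption that `network_client` is the azure-mgmt-network `NetworkManagementClient`, whose `network_interfaces.begin_create_or_update` returns an `LROPoller`. `begin_nic_update` is a hypothetical helper, not the PR's `switch_sriov`. It moves both `poller.result()` and the state check behind `wait`.

```python
from typing import Any, Optional


def begin_nic_update(
    network_client: Any,
    resource_group: str,
    nic: Any,
    wait: bool,
) -> Optional[Any]:
    """Submit the NIC update and block on it only when wait=True.

    Assumes an azure-mgmt-network style client whose
    network_interfaces.begin_create_or_update(...) returns an LROPoller.
    Returns the updated NIC when waiting, otherwise None.
    """
    poller = network_client.network_interfaces.begin_create_or_update(
        resource_group, nic.name, nic
    )
    if not wait:
        # Fire-and-forget: the caller can re-read the NIC later.
        return None
    # result() blocks until the long-running operation completes.
    updated = poller.result()
    # The accelerated-networking check only makes sense once the update is done.
    assert updated.enable_accelerated_networking == nic.enable_accelerated_networking
    return updated
```

Returning the poller instead of `None` in the non-waiting branch is an equally plausible variant. It would let the hotplug loop await the update later, as the review also suggests.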
assert_that(updated_nic.enable_accelerated_networking).described_as(
f"fail to set network interface {nic_name}'s accelerated "
f"networking into status [{enable}]"