Problem Summary

An issue in the NFNIC driver/firmware might lead to an APD (All Paths Down) condition in ESXi during a short window of the InfiniBox hot upgrade process. As a result, data may be temporarily inaccessible during the InfiniBox system software upgrade.

Applicability

This issue is relevant for environments with UCS hardware running VMware ESXi 6.7 with a Cisco Native FNIC (NFNIC) version earlier than 4.0.0.63.

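To determine which version of NFNIC is running:

  1. Connect to the ESXi host over SSH:
    ssh root@<ESXi FQDN or IP address>
  2. Run:
    vmkload_mod -s nfnic

The driver version appears in the Version field of the output.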

Example

[root@test:~] vmkload_mod -s nfnic
vmkload_mod module information
input file: /usr/lib/vmware/vmkmod/nfnic
Version: 4.0.0.44-2vmw.701.0.0.16850804
Build Type: release
License: Proprietary
Required name-spaces:
 com.vmware.vmkapi#v2_7_0_0
Parameters:
 lun_queue_depth_per_path: ulong
  nfnic lun queue depth per path: Default = 32. Range [1 - 1024]
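
When several hosts need checking, the comparison against 4.0.0.63 can be scripted. The following is a minimal Python sketch, not a vendor tool: it assumes the output of vmkload_mod -s nfnic has been saved as text, and that the leading dotted number on the Version line (4.0.0.44 in the example above) is the driver version to compare. The names FIXED_VERSION, parse_nfnic_version and is_affected are illustrative only.

```python
import re

# First fixed NFNIC release according to this article (assumption: the
# leading dotted part of the "Version:" line is the driver version).
FIXED_VERSION = (4, 0, 0, 63)


def parse_nfnic_version(vmkload_output):
    """Return the leading dotted version as a tuple of ints, or None."""
    match = re.search(
        r"^\s*Version:\s*(\d+(?:\.\d+)*)", vmkload_output, re.MULTILINE
    )
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_affected(vmkload_output):
    """True if the reported NFNIC version is earlier than FIXED_VERSION."""
    version = parse_nfnic_version(vmkload_output)
    if version is None:
        raise ValueError("no 'Version:' line found in vmkload_mod output")
    return version < FIXED_VERSION


if __name__ == "__main__":
    # Sample text taken from the example output in this article.
    sample = (
        "vmkload_mod module information\n"
        "input file: /usr/lib/vmware/vmkmod/nfnic\n"
        "Version: 4.0.0.44-2vmw.701.0.0.16850804\n"
    )
    print(parse_nfnic_version(sample))
    print(is_affected(sample))
```

The version is compared as a tuple of integers rather than as a string, so that, for example, 4.0.0.100 is correctly treated as later than 4.0.0.63.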

Problem Description

During the hot upgrade process of InfiniBox systems there is a short window (approximately 8-12 seconds) in which the microcode queues commands from initiators before it resumes sending responses.

If the following events occur during this small window of time, an APD event is possible:

  • The NFNIC driver sees a series of SCSI aborts
  • A single REPORT_LUNS command is sent and does not receive a response in time

When this happens, the NFNIC driver never retries the REPORT_LUNS SCSI command, causing ESXi to report APD.

To restore ESXi access to the data, reboot the ESXi host.

Resolution

Upgrade Cisco NFNIC to version 4.0.0.63 or above, as listed in the Reference section.

Reference

https://quickview.cloudapps.cisco.com/quickview/bug/CSCvu81080
https://kb.vmware.com/s/article/80101

Last edited: 2022-01-24 14:36:50 UTC