Problem Summary
An issue in the Cisco NFNIC driver/firmware might lead to an All Paths Down (APD) condition in ESXi during a short window of the InfiniBox hot upgrade process. This may cause temporary data inaccessibility during the InfiniBox system software upgrade.
Applicability
This issue is relevant for environments with UCS hardware running VMware ESXi 6.7 with a Cisco Native FNIC (NFNIC) driver version earlier than 4.0.0.63.
To determine which version of NFNIC is running:
- Connect to the ESXi host over SSH:
ssh root@<esxi FQDN or IP addr>
- Run:
vmkload_mod -s nfnic
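The version check can also be scripted. The following is a minimal sketch in Python, assuming the command output contains a line of the form "Version: <dotted version>-<build suffix>". The sample string and the helper names (parse_nfnic_version, is_affected) are hypothetical and only illustrate comparing the reported version against 4.0.0.63.

```python
import re

# First NFNIC version that contains the fix, per the Applicability section.
FIXED_VERSION = (4, 0, 0, 63)


def parse_nfnic_version(output):
    """Return the dotted version from a 'Version:' line as a tuple of ints.

    Returns None if no such line is found. The 'Version:' line format is an
    assumption about the vmkload_mod output, not a documented guarantee.
    """
    match = re.search(r"Version:\s*(\d+(?:\.\d+)+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_affected(output):
    """Return True if the reported NFNIC version is earlier than 4.0.0.63."""
    version = parse_nfnic_version(output)
    if version is None:
        raise ValueError("no 'Version:' line found in the output")
    return version < FIXED_VERSION


if __name__ == "__main__":
    # Hypothetical sample output; paste the real command output here instead.
    sample = " Version: 4.0.0.44-1OEM.670.1.28.10302608"
    print(is_affected(sample))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "4.0.0.7" as later than "4.0.0.63".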
Problem Description
During the hot upgrade process of InfiniBox systems there is a short window (approximately 8-12 seconds) during which the microcode queues commands from initiators and does not send responses until the window ends.
If the following events occur during this small window of time, an APD event is possible:
- The NFNIC driver sees a series of SCSI aborts
- A single REPORT_LUNS command is sent and does not receive a response in time
When this happens, the NFNIC driver never retries the REPORT_LUNS SCSI command, causing ESXi to report APD.
To restore ESXi access to the data, reboot the ESXi server.
Resolution
Upgrade the Cisco NFNIC driver to version 4.0.0.63 or later, as described in the links in the Reference section.
Reference
https://quickview.cloudapps.cisco.com/quickview/bug/CSCvu81080
https://kb.vmware.com/s/article/80101
Last edited: 2022-08-06 08:30:49 UTC