Purple Screen on Proliant DL380G9 vSAN Free
Posted: Wed Nov 15, 2023 10:05 am
Hello Community,
i search for help or self made experience on a eventually Hardware or Kernel Problem.
I have two Proliant DL360 G9 installed with VSphere ESXi 7.0.3
Both ESXi running 1 Node of an VSAN Free HA Cluster.
NVMe are mapped via passthrough directly to the VSAN VMs.
Everything was running for about 4 Months without an issue.
Then 4 Weeks ago the first ESXi got an Purple Screen generated through an NMI.
Core Dump was not created, but the screenshot of the PSOD points to a problem with the PCI Express Controller or an attached NVMe.
PCI Express Controller comes from the official HP Enablement Kit for the G9 Servers.
High Performance Fan Kit is also installed.
NVMe is an official HPE Kioxia CM5 with 6,84TB.
Problem occures in different ESXi Releases (HPE OEM Builds).
It seems to be in conjunction with higher load, both times it occured when Veeam Backup was creating Backups from the Datastore in hotadd mode.
But one takeout was in normal operation with low load on Datastore.
The second ESXi Node is simply placed an Samsung PM9A3 in a Startech U.2 to PCIe Adapter with Standard Fan Kit.
This makes no problems so far.
I have attached 2 Screenshots.
My next step is that i have ordered an additional Startech Adapter and a PM9A3 for test this configuraion in the ESXi Node 1.
Does anyone got an similar problem?
Help and Suggests appreciated
Best regards and have a nice Day,
i search for help or self made experience on a eventually Hardware or Kernel Problem.
I have two Proliant DL360 G9 installed with VSphere ESXi 7.0.3
Both ESXi running 1 Node of an VSAN Free HA Cluster.
NVMe are mapped via passthrough directly to the VSAN VMs.
Everything was running for about 4 Months without an issue.
Then 4 Weeks ago the first ESXi got an Purple Screen generated through an NMI.
Core Dump was not created, but the screenshot of the PSOD points to a problem with the PCI Express Controller or an attached NVMe.
PCI Express Controller comes from the official HP Enablement Kit for the G9 Servers.
High Performance Fan Kit is also installed.
NVMe is an official HPE Kioxia CM5 with 6,84TB.
Problem occures in different ESXi Releases (HPE OEM Builds).
It seems to be in conjunction with higher load, both times it occured when Veeam Backup was creating Backups from the Datastore in hotadd mode.
But one takeout was in normal operation with low load on Datastore.
The second ESXi Node is simply placed an Samsung PM9A3 in a Startech U.2 to PCIe Adapter with Standard Fan Kit.
This makes no problems so far.
I have attached 2 Screenshots.
My next step is that i have ordered an additional Startech Adapter and a PM9A3 for test this configuraion in the ESXi Node 1.
Does anyone got an similar problem?
Help and Suggests appreciated
Best regards and have a nice Day,