Hello all,
I am writing to get my setup checked, as the virtual machines I run in the cluster keep getting corrupted for some reason. I have a 2-node hyperconverged Hyper-V cluster with 2 CSV disks, each with L1 and L2 cache. The cluster works well and all failover tests pass, but I run into weird issues from time to time, and they only happen to machines residing on the cluster storage. Sometimes a VM runs fine for a week, then freezes and is unable to boot. Sometimes I issue a shutdown command and the machine locks up and does not respond, leaving me with no option but to kill the VM process, which corrupts the VHDX. Sometimes VMs get corrupted during failover to the other host: everything looks fine, but the machine stops responding and never boots again. This only happens to clustered VMs with network connectivity (that is, VMs that actually have some IO during the day), and never to VMs stored locally on the system drive or to clustered VMs without network connectivity.
I went through the settings a zillion times and found nothing, except that I have a RAID 5 spindle array while RAID 0, 1 or 10 is recommended. It is non-standard, but I'd expect it to cause slow response, not data corruption. The second thing is that I set the MPIO policy to Round Robin instead of Least Queue Depth, as LQD had caused lots of problems, such as the CSV turning into a RAW device and so on, and the RR policy got rid of them.
Other than that, here are my config script and swdsk file to check. I must be missing something simple but important, and it would be a shame to have to destroy this cluster, as I really like the technology and want to use it as a POC for possible future customers. Any help is appreciated. Thanks.
SWDSK file:
<device active="true" plugin="imagefile" name="imagefile">
<storages>
<storage id="1" type="device" name="imagefile" lun="0x0">
<interval size="1370" units="GB"/>
<inquiry>
<serial_id>4FCC9E87A57E297D</serial_id>
<vendor id="STARWIND"/>
<product id="STARWIND " revision="0001"/>
<eui_64>4FCC9E87A57E297D</eui_64>
</inquiry>
<geometry>
<sector size="4096" psize="4096"/>
<track sectors="16"/>
<cylinder tracks="32" count="65535"/>
</geometry>
<caching>
<cache type="write-back" size="4" units="GB" level="1">
<storages>
<storage_ref id="1"/>
</storages>
</cache>
<cache type="write-through" size="110" units="GB" level="2">
<storages>
<storage_ref id="4"/>
</storages>
</cache>
</caching>
</storage>
</storages>
</device>
<system>
<resources>
<storages>
<storage id="1" name="RAM" type="RAM">
<interval size="4" units="GB"/>
</storage>
<storage id="2" name="My computer\E\CSV2\MasterCSV2.img" type="file">
<interval size="1370" units="GB"/>
</storage>
<storage id="4" name="My Computer\F\L2cache\L2cacheCSV2.swdsk" type="device">
<interval size="110" units="GB"/>
</storage>
</storages>
<network/>
</resources>
</system>
CreateCSV script:
$firstNode = new-Object Node
$firstNode.HostName = "xx.xx.xx.xx"
$firstNode.ImagePath = "My computer\E\CSV2"
$firstNode.ImageName = "MasterCSV2"
$firstNode.Size = 1402880
$firstNode.CreateImage = $true
$firstNode.TargetAlias = "MasterCSV2"
$firstNode.AutoSynch = $true
$firstNode.SyncInterface = "#p2=10.10.12.13:3260,10.10.12.14:3260"
$firstNode.HBInterface = "#p2=10.10.11.11:3260,10.10.13.11:3260,172.23.99.11:3260"
$firstNode.CacheSize = 4096
$firstNode.CacheMode = "wb"
$firstNode.PoolName = "CSVpool2"
$firstNode.SyncSessionCount = 1
$firstNode.ALUAOptimized = $true
#
# device sector size. Possible values: 512 or 4096(May be incompatible with some clients!) bytes.
#
$firstNode.SectorSize = 4096
$secondNode = new-Object Node
$secondNode.HostName = "yy.yy.yy.yy"
$secondNode.HostPort = "3261"
$secondNode.Login = "root"
$secondNode.Password = "starwind"
$secondNode.ImagePath = "My computer\E\CSV2"
$secondNode.ImageName = "PartnerCSV2"
$secondNode.Size = 1402880
$secondNode.CreateImage = $true
$secondNode.TargetAlias = "PartnerCSV2"
$secondNode.AutoSynch = $true
$secondNode.SyncInterface = "#p1=10.10.12.10:3260,10.10.12.11:3260"
$secondNode.HBInterface = "#p1=10.10.11.10:3260,10.10.13.10:3260,172.23.99.10:3260"
$secondNode.CacheSize = 4096
$secondNode.CacheMode = "wb"
$secondNode.SyncSessionCount = 1
$secondNode.ALUAOptimized = $true
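For what it's worth, here is a quick sanity check that the sizes in the script line up with the swdsk file. It is only a sketch and assumes that Size and CacheSize in the script are expressed in MB (my assumption; the script itself does not state the units). The variable names are just for illustration.

```powershell
# Assumption: Size and CacheSize in the CreateCSV script are in MB.
$scriptSizeMB  = 1402880   # $firstNode.Size / $secondNode.Size
$scriptCacheMB = 4096      # $firstNode.CacheSize / $secondNode.CacheSize

# Values from the swdsk file, converted to MB using PowerShell's byte suffixes.
$swdskSizeMB  = 1370GB / 1MB   # <interval size="1370" units="GB"/>
$swdskCacheMB = 4GB / 1MB      # level 1 cache: size="4" units="GB"

# Compare what the script requested with what the swdsk file reports.
"Image size matches: " + ($scriptSizeMB -eq $swdskSizeMB)
"L1 cache matches:   " + ($scriptCacheMB -eq $swdskCacheMB)
```

Under that assumption the numbers agree, since 1370 × 1024 = 1402880 and 4 × 1024 = 4096, so the image and L1 cache sizes themselves do not look like the mismatch.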