HAImage (Non-Active)

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

Elcore
Posts: 12
Joined: Tue Jan 15, 2019 2:51 pm

Tue Jan 15, 2019 3:13 pm

We had a power outage on Saturday. One of the 2 cluster computers shut down unexpectedly, and since then the storage for the server has been showing as (Non-Active) in the StarWind Management Console. I have reconnected the iSCSI targets, and the drives show up in Windows Disk Management, but the StarWind Management Console doesn't connect them. I need some help getting things back up and running properly again.

Thanks for any help that can be provided.
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Tue Jan 15, 2019 8:03 pm

Please post as much information as possible, including logs, screenshots, etc. Sensitive information can be sent via direct message.
Elcore
Posts: 12
Joined: Tue Jan 15, 2019 2:51 pm

Wed Jan 16, 2019 2:19 pm

Here is the log file and a screen capture. I don't know what else is needed but if you do, please let me know and I will do my best to get you what you need to help me.

Actually the system is not allowing me to upload the log file...

John
Attachments
StarWindCapture.jpg (190.35 KiB) Viewed 9966 times
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Wed Jan 16, 2019 2:44 pm

Upload the logs to a file sharing service (Dropbox, Google Drive, WeTransfer, etc.) and send me a link to the bundle via private message. Use StarWind Log Collector from https://knowledgebase.starwindsoftware. ... collector/

I need a bit more information on your configuration. In particular, I am interested in the storage type.
1. What do you use as storage on node 1? Is it a physical RAID configuration or Storage Spaces?
2. Is the partition where the StarWind files reside available on node 1? Does it have a drive letter assigned?
3. Have you tried restarting the StarWind service on node 1? If not, I would suggest doing so.

Feel free to share any information.
Elcore
Posts: 12
Joined: Tue Jan 15, 2019 2:51 pm

Wed Jan 16, 2019 3:00 pm

I will work on getting you the logs. In the meantime: the storage on both nodes is RAID using SSDs, and the partition is available on node 1 with a drive letter assigned. I have indeed tried restarting the StarWind service and the server several times, including after attempting changes that have not worked. The cluster and storage have been up and running for over 2 years, and suddenly this happened over the weekend due to a power outage. Currently our VMs are all running on node 2, as that server didn't shut down during the power outage.
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Thu Jan 17, 2019 3:56 pm

Unfortunately, I cannot download the log bundle using the link you sent me via PM. Please change the file access rights to "Anyone with the link" so that I can proceed.
Elcore
Posts: 12
Joined: Tue Jan 15, 2019 2:51 pm

Thu Jan 17, 2019 8:01 pm

I have updated the link and changed the access and sent you a new link via PM.
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Thu Jan 17, 2019 8:47 pm

Show me screenshots of the contents of the folders below on host 1:

Code: Select all

D:\Witness\
D:\Storage1\
D:\Storage2\
D:\FileServerStorage\
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Thu Jan 17, 2019 9:08 pm

Preferably with file extensions enabled.
Elcore
Posts: 12
Joined: Tue Jan 15, 2019 2:51 pm

Thu Jan 17, 2019 9:25 pm

Here are the screenshots you requested.

John
Attachments
Storage2.jpg (117.24 KiB) Viewed 9933 times
Storage1.jpg (121.76 KiB) Viewed 9933 times
FileServerStorage.jpg (119.84 KiB) Viewed 9933 times
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Fri Jan 18, 2019 12:28 am

The information you provided in your original post in this thread does not match the logs.
According to you, the issue manifested on Saturday (i.e. January 12), as the nodes went down after a power outage. Yet, according to the logs, the nodes were able to get all disks synchronized on Sunday, Jan 13:

Code: Select all

788	HyperCluster1.electro-core.com	1948230	Information	High Availability Device iqn.2008-08.com.starwindsoftware:hypercluster1.electro-core.com-downloads, current Node Synchronization complete, Synchronizer is Partner Node iqn.2008-08.com.starwindsoftware:hypercluster2-downloads	StarWindService	1/13/2019 11:37:30 AM
773	HyperCluster1.electro-core.com	1948229	Information	High Availability Device iqn.2008-08.com.starwindsoftware:hypercluster1.electro-core.com-downloads, current Node State has changed to "Synchronized"	StarWindService	1/13/2019 11:37:30 AM
787	HyperCluster1.electro-core.com	1948228	Information	High Availability Device iqn.2008-08.com.starwindsoftware:hypercluster1.electro-core.com-downloads, current Node Synchronization started, Synchronizer is Partner Node iqn.2008-08.com.starwindsoftware:hypercluster2-downloads	StarWindService	1/13/2019 11:37:28 AM
774	HyperCluster1.electro-core.com	1948227	Warning	High Availability Device iqn.2008-08.com.starwindsoftware:hypercluster1.electro-core.com-downloads, current Node State has changed to "Synchronizing"	StarWindService	1/13/2019 11:37:28 AM
788	HyperCluster1.electro-core.com	1948226	Information	High Availability Device iqn.2008-08.com.starwindsoftware:hypercluster1.electro-core.com-fileserverstorage, current Node Synchronization complete, Synchronizer is Partner Node iqn.2008-08.com.starwindsoftware:hypercluster2-fileserverstorage	StarWindService	1/13/2019 11:37:27 AM
773	HyperCluster1.electro-core.com	1948225	Information	High Availability Device iqn.2008-08.com.starwindsoftware:hypercluster1.electro-core.com-fileserverstorage, current Node State has changed to "Synchronized"	StarWindService	1/13/2019 11:37:27 AM
902	HyperCluster1.electro-core.com	1948224	0	"The Software Protection service has started.
6.1.7601.17514"	Software Protection Platform Service	1/13/2019 11:36:46 AM
Finally, the devices went out of sync on the same day, Jan 13 at 12:56:16 PM, when the following happened:

Code: Select all

"The process Explorer.EXE has initiated the restart of computer HYPERCLUSTER1 on behalf of user ELECTRO-CORE\Norm for the following reason: Application: Maintenance (Planned)
Reason Code: 0x84040001
Shutdown Type: restart
Comment: "
Also, there was an unexpected shutdown on node 1:

Code: Select all

The previous system shutdown at 1:35:12 PM on ‎1/‎13/‎2019 was unexpected.
This looks pretty much like the event you initially meant. Am I right?
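
For anyone who wants to rebuild this kind of timeline from an exported event list, here is a minimal Python sketch. It assumes each entry is a single tab-separated line with seven fields (event ID, host, record number, level, message, source, timestamp), as in the excerpt above. The names `Event`, `parse_event_line` and `timeline` are made up for illustration and are not part of any StarWind or Windows tool. Lines that do not parse, such as the halves of an entry whose message spans two lines, are skipped.

```python
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    when: datetime
    level: str
    source: str
    message: str


def parse_event_line(line: str) -> Event:
    """Parse one line: id, host, record, level, message, source, timestamp."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-separated fields, got {len(fields)}")
    _event_id, _host, _record, level, message, source, stamp = fields
    # Timestamps in the excerpt look like "1/13/2019 11:37:30 AM".
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return Event(when, level, source, message)


def timeline(lines):
    """Return the parseable events in chronological order (oldest first)."""
    events = []
    for line in lines:
        try:
            events.append(parse_event_line(line))
        except ValueError:
            # Malformed or multi-line entries are skipped in this sketch.
            continue
    return sorted(events, key=lambda e: e.when)
```

Sorting oldest-first makes it easier to see whether a restart or shutdown came before or after the last "Synchronized" state change.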

Anyway, the present status is as follows: your devices on node 1 are non-active because the HA header files are missing for three out of four disks on node 1. The one for Storage1 is still there, but its structure is totally corrupted, and thus the file is not usable at all. Unfortunately, the StarWind logs only cover the time starting from 1/13 15:59:09.821 (earlier entries were overwritten by log rotation), so we cannot determine what exactly happened there. Check your Windows Security log on node 1 for event 4660 related to Witness_HA.swdsk, Storage1_HA.swdsk, FileServerStorage_HA.swdsk and Storage2_HA.swdsk after Jan 13, 12:56:16 PM for more information on what or who deleted the files. Note that these events will only be there if file system auditing had been configured on that node.
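
A quick way to see which header files are still on disk is a small script like the sketch below. It assumes that each `*_HA.swdsk` file sits in the folder named after its device under the D drive (for example `D:\Witness\Witness_HA.swdsk`); that layout is a guess based on the folder and file names in this thread, so adjust it to match your system. `find_missing_headers` is a hypothetical helper, not a StarWind utility. A file reported as "present" may still be corrupted, as is the case with the Storage1 header here.

```python
from pathlib import Path

# Assumed layout: <root>/<device folder>/<device>_HA.swdsk
EXPECTED_HEADERS = {
    "Witness": "Witness_HA.swdsk",
    "Storage1": "Storage1_HA.swdsk",
    "Storage2": "Storage2_HA.swdsk",
    "FileServerStorage": "FileServerStorage_HA.swdsk",
}


def find_missing_headers(root):
    """Map each expected header file name to 'missing', 'empty' or 'present'."""
    root = Path(root)
    report = {}
    for folder, name in EXPECTED_HEADERS.items():
        path = root / folder / name
        if not path.is_file():
            report[name] = "missing"
        elif path.stat().st_size == 0:
            report[name] = "empty"
        else:
            # Existence says nothing about whether the contents are valid.
            report[name] = "present"
    return report


if __name__ == "__main__":
    for name, status in find_missing_headers("D:\\").items():
        print(f"{name}: {status}")
```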

In the current situation, I recommend the following:
1. Force remove all targets on node 1.
2. Remove all StarWind related folders with their files from the D drive on node 1.
3. For each of the devices on node 2, select Replication Manager and delete the non-existing replicas.
4. For each of the devices on node 2, create replica to node 1 using the appropriate sync link and the two heartbeat links.
5. In the iSCSI initiators on both nodes, connect the newly appeared targets.
After connectivity to storage is restored from both nodes and the paths to storage are redundant, I would recommend updating your StarWind installation to the latest build available on our website, as you are using a pretty outdated build.
Elcore
Posts: 12
Joined: Tue Jan 15, 2019 2:51 pm

Sat Jan 19, 2019 3:31 pm

I was not there and only assumed that it was due to a power outage based on the information that I was given. I appreciate you taking the time to figure this out and to provide a solution. I will be working on it this morning and will report back.

John
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Sun Jan 20, 2019 8:41 pm

John,

Let me know if you need any additional assistance with this.