FIXED - Citrix XenServer 7.x MPIO

buzzzo
Posts: 13
Joined: Wed Jan 20, 2016 6:39 pm

Fri Mar 10, 2017 8:44 am

Hi Michael

Maybe I didn't explain it correctly.

The problem occurs when, for example, one of the two nodes goes down.
XenServer multipathing handles the path failover correctly, i.e. the VMs running on the surviving XenServer node are not affected.
When I restart the failed node (a XenServer host that runs both production VMs and the StarWind VM), I'm not able to reconnect it to the SR (which in the meantime is "degraded" and running on only one StarWind server).

Even though the SR also appears connected on the just-rebooted XenServer node, I'm not able to run any VM located on the SR served by the StarWind cluster.
Everything works again once the StarWind VM is turned on and the vdisk hosting the SR has finally resynced.

Thanks for the help.
Michael (staff)
Staff
Posts: 319
Joined: Thu Jul 21, 2016 10:16 am

Fri Mar 10, 2017 5:32 pm

Hello!
I'm sorry, but it's still not clear to me.
1. Let's imagine that you have turned off node B. The SR should then be available from the StarWind VM on node A, but with only half of its paths. Production VMs should keep working without interruption.
2. When node B is back up, StarWind synchronizes its devices, and once synchronization completes, the SR's full number of paths should be restored. Production VMs should keep working without interruption.
3. Then you can turn off node A. Again, the paths from the StarWind VM on node A will drop, and the SR will be available from StarWind on node B. Production VMs should keep working without interruption.
4. Once node A is back up, StarWind will again synchronize the devices and all paths to the SR should be restored. Again, production VMs should keep working without interruption.

Please specify at which of the steps above you run into the issue. Thank you!
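As a rough illustration only, the path counts expected in these four steps can be written down as a toy Python model. This is hypothetical, not StarWind or XenServer code: the node names A and B come from the steps, while `PATHS_PER_NODE = 2` and the rule that a node contributes paths only while it is up and its device copy is synchronized are assumptions drawn from the description above.

```python
# Hypothetical toy model of the expected SR path count in a two-node
# StarWind HA setup; not StarWind or XenServer code.
NODES = ("A", "B")
PATHS_PER_NODE = 2  # assumption: each StarWind VM exposes 2 iSCSI paths
TOTAL = len(NODES) * PATHS_PER_NODE


def sr_paths(up_nodes, synced_nodes):
    """Return how many SR paths should be active.

    Assumption: a node contributes paths only while it is up AND its
    copy of the device is synchronized.
    """
    serving = [n for n in NODES if n in up_nodes and n in synced_nodes]
    return len(serving) * PATHS_PER_NODE


# (label, nodes that are up, nodes whose device copy is synchronized)
STEPS = [
    ("1. node B off", {"A"}, {"A"}),
    ("2. node B up, resync running", {"A", "B"}, {"A"}),
    ("2. resync complete", {"A", "B"}, {"A", "B"}),
    ("3. node A off", {"B"}, {"B"}),
    ("4. node A up, resync running", {"A", "B"}, {"B"}),
    ("4. resync complete", {"A", "B"}, {"A", "B"}),
]

for label, up, synced in STEPS:
    print("%-30s %d of %d paths" % (label, sr_paths(up, synced), TOTAL))
```

In every step at least one synchronized node keeps serving, so under these assumptions the SR never drops below half of its paths, which is why the production VMs are expected to run without interruption.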
buzzzo
Posts: 13
Joined: Wed Jan 20, 2016 6:39 pm

Mon Mar 13, 2017 9:07 am

Hi Michael

The scenario you described works fine.
But now let's imagine another one, the most catastrophic one:

1) BOTH nodes go down.
2) I restart the two nodes; the mirrored SR is now out of sync between them.
3) I force the vdisk on node 1 with "mark as synched".
4) The second node now starts a full resync.
5) With a 1 TB vdisk, for example, this takes a long time.
6) On the XenServer side, I expect to be able to reconnect the SR, which is now served by only one node.
7) What actually happens is that the SR appears to be connected (with half of the paths, as expected), but when I start VMs they don't come up.
8) The XenServer logs clearly show that it can't connect to the second node via iSCSI.
9) Once the resync is finished, I can reconnect the SR successfully.

On XenServer 6.5 everything works fine: I can simply reconnect the SR with half of the paths, restart the VMs, and then restart the resync on the StarWind side.
FYI: I can reproduce the same issue using DataCore on the same hardware.

Hopefully the issue is clearer now.
buzzzo
Posts: 13
Joined: Wed Jan 20, 2016 6:39 pm

Mon Mar 13, 2017 11:00 am

UPDATE: it's a XenServer bug: it passes incorrect parameters when trying to detect the scsi_id of the device.
I managed to solve the problem by modifying the relevant code.

Thanks for the help.
Michael (staff)
Staff
Posts: 319
Joined: Thu Jul 21, 2016 10:16 am

Wed Mar 15, 2017 5:55 pm

Great news, thank you!
Could you please share a link to the solution (if there is one)?
CameronWP
Posts: 2
Joined: Wed Apr 19, 2017 2:51 pm

Wed Apr 19, 2017 3:28 pm

I was having the same issue; I made the change described in the following thread, and it appears to have resolved it.

http://discussions.citrix.com/topic/384 ... nt-sc9000/

The recommendation is to change the following in /opt/xensource/sm/scsiutil.py in getSCSIid(path):

Old Line: stdout = util.pread2([SCSI_ID_BIN, '-g', '-s', '/block/%s' % dev])
New Line: stdout = util.pread2([SCSI_ID_BIN, '-g', '/dev/%s' % dev])
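Since the fix is a one-line substitution, it can be scripted. The following is a minimal sketch, not an official Citrix patch: it assumes the call in your copy of /opt/xensource/sm/scsiutil.py matches the quoted Old Line character for character (indentation aside), and the helper name `patch_scsiutil` and the `.orig` backup suffix are hypothetical. It avoids Python 3-only syntax because the dom0 interpreter version is not stated in the thread. It keeps a backup, refuses to edit the file unless the old call appears exactly once, and does nothing if the file is already patched.

```python
import shutil

# Path quoted in the thread; check it on your own host first.
SCSIUTIL_PATH = "/opt/xensource/sm/scsiutil.py"

# Old and new calls inside getSCSIid(path), copied from the thread.
OLD_CALL = "stdout = util.pread2([SCSI_ID_BIN, '-g', '-s', '/block/%s' % dev])"
NEW_CALL = "stdout = util.pread2([SCSI_ID_BIN, '-g', '/dev/%s' % dev])"


def patch_scsiutil(path=SCSIUTIL_PATH, backup_suffix=".orig"):
    """Swap the old scsi_id call for the new one (hypothetical helper).

    Returns True if the file was changed, False if it was already patched.
    Raises ValueError if the old call is not found exactly once.
    """
    with open(path) as f:
        text = f.read()

    if NEW_CALL in text and OLD_CALL not in text:
        return False  # already patched; leave the file and backup alone

    count = text.count(OLD_CALL)
    if count != 1:
        raise ValueError("expected one old call in %s, found %d" % (path, count))

    # Keep the original next to it so the change is easy to revert.
    shutil.copy2(path, path + backup_suffix)

    with open(path, "w") as f:
        f.write(text.replace(OLD_CALL, NEW_CALL))
    return True
```

Calling `patch_scsiutil()` as root in dom0 would apply the change; presumably it has to be done on every host that connects to the SR, since each host has its own copy of the file, and a later XenServer update to the storage manager may overwrite it.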

No reboot or anything else was required; it just started working.

Hope that helps someone else banging their head against the wall!
Ivan (staff)
Staff
Posts: 172
Joined: Thu Mar 09, 2017 6:30 pm

Wed Apr 19, 2017 5:04 pm

Hello CameronWP,
Thanks for this information!
I hope it will be helpful for our Citrix XenServer customers.