
Thursday 15 May 2014

LUN locking problems on VIO servers

I had this problem and I think it is due to the order in which the VIOS servers were powered up and claimed the LUNs. The biggest clue is that if you run "lspv -size" on your VIOS servers you get different answers for the sizes and PVIDs.

Here is an example of querying and releasing locks where required.

1. Run "lsattr -El hdisk0 -a reserve_policy" for each of your disks and ensure that only the VIOS internal disks are set to "single_path". All the others should be "no_reserve". If any need to be changed, use "chdev" with the "-P" option, then reboot and check again.

2. Once all the disks are shown as shared, check their reservation status as follows:

# devrsrv -c query -l hdisk11

3. If any are still reserved, attempt to break the locks as follows:

Use one of the following, depending on which host holds the reservation on this LUN.

# devrsrv -c release -l hdisk11
or
# devrsrv -f -l hdisk11

Please check this link for disk reservation release: http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.cmds%2Fdoc%2Faixcmds2%2Fdevrsrv.htm

4. Check the system error log with "errpt" and the path status with "lspath".

5. Check that all your FC adapters are set with "dynamic tracking" (dyntrk=yes) and "fast fail" error recovery (fc_err_recov=fast_fail), e.g.

# lsattr -El fscsi0
attach       switch    How this adapter is CONNECTED         False
dyntrk       yes       Dynamic Tracking of FC Devices        True
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True
scsi_id      0xc80100  Adapter SCSI ID                       False
sw_fc_class  3         FC Class for Fabric                   True

http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.prftungd%2Fdoc%2Fprftungd%2Fdynamic_tracking.htm

6. Log in to each of the nodes and attempt to rediscover the virtual disks and VSCSI adapters.

7. Ensure that the VSCSIs have "heartbeat checking" activated.

Note: You may have to delete and rediscover your devices a couple of times and do various reboots before you get this absolutely right.
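Checking the reserve policy disk by disk gets tedious on a VIOS with many LUNs, so here is a rough sketch of how the first check could be scripted. It is only an illustration: the function names flag_reserved and collect_policies are made up for this example, the list of internal disks ("hdisk0 hdisk1") is an assumption you would need to adjust, and the live part only runs on AIX.

```shell
#!/bin/sh
# Sketch only: flag disks whose reserve_policy is not "no_reserve".
# Helper names and the internal-disk list are hypothetical examples.

# Read "<disk> <policy>" lines on stdin and print every disk whose policy
# is not no_reserve, skipping the disks named in $1 (space-separated list
# of VIOS internal disks that are allowed to keep single_path).
flag_reserved() {
    awk -v skip="$1" '
        BEGIN { n = split(skip, s, " "); for (i = 1; i <= n; i++) internal[s[i]] = 1 }
        !($1 in internal) && $2 != "no_reserve" { print $1 " " $2 }
    '
}

# Build "<disk> <policy>" lines for every disk that lspv reports.
collect_policies() {
    for d in $(lspv | awk '{ print $1 }'); do
        echo "$d $(lsattr -El "$d" -a reserve_policy -F value)"
    done
}

# Only touch real devices when running on AIX.
if [ "$(uname -s)" = "AIX" ]; then
    collect_policies | flag_reserved "hdisk0 hdisk1"
fi
```

Anything this prints is a candidate for "chdev -l hdiskN -a reserve_policy=no_reserve -P" followed by a reboot, and then a "devrsrv -c query" to confirm the reservation has gone.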

Once you think everything is working OK, clear all your error logs and power down all the clients and VIOS servers. Then power everything up in the normal order and check that all the locks, paths, etc. have taken as expected and that there are no further errors.
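For that final check after the power cycle, a small script can save some eyeballing. The sketch below is an assumption-laden example rather than a finished tool: check_fscsi and bad_paths are hypothetical helper names, the first one expects the "lsattr -El fscsiN" layout shown earlier, and the second expects the default "lspath" layout of status, device, parent.

```shell
#!/bin/sh
# Sketch only: post-reboot sanity checks for FC adapters and paths.
# Helper names are hypothetical; input formats are assumed as described.

# Read "lsattr -El fscsiN" output on stdin. Succeeds (exit 0) only when
# dyntrk is "yes" and fc_err_recov is "fast_fail".
check_fscsi() {
    awk '
        $1 == "dyntrk"       { dyn = $2 }
        $1 == "fc_err_recov" { rec = $2 }
        END { exit !(dyn == "yes" && rec == "fast_fail") }
    '
}

# Read "lspath" output on stdin and print every path that is not Enabled.
bad_paths() {
    awk '$1 != "Enabled" { print }'
}

# Only touch real devices when running on AIX.
if [ "$(uname -s)" = "AIX" ]; then
    for a in $(lsdev -C -F name | grep '^fscsi'); do
        lsattr -El "$a" | check_fscsi || echo "$a: check dyntrk / fc_err_recov"
    done
    lspath | bad_paths
fi
```

No output means every fscsi adapter has both settings and every path is Enabled. You should still look through "errpt" by hand, since this does not read the error log.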
