IBM SDD driver troubleshooting on Linux
The Subsystem Device Driver (SDD) is a pseudo device driver designed to support multipath configuration environments in the IBM TotalStorage Enterprise Storage Server, the IBM TotalStorage DS family, and the IBM System Storage SAN Volume Controller. It resides in the host system alongside the native disk device driver and provides multipath functions such as automatic path failover and load balancing across paths.
On AIX it works pretty much out of the box, but on Linux the story is quite different. The driver is closed source, so all you can do is download the kernel module for supported kernels from IBM, along with some userspace tools.
For supported distributions, the driver can be downloaded from:
Subsystem Device Driver for Linux
When you install the package, it copies its files under /opt/IBMsdd (at least the Red Hat .rpms do). It also adds an init script called "sdd" under /etc/init.d, and adds a line like this to /etc/inittab:
srv:345:respawn:/opt/IBMsdd/bin/sddsrv > /dev/null 2>&1
This ensures that if the sddsrv daemon dies, init restarts it automatically.
Basic configuration
After the disks have been configured in the SAN Volume Controller, they should show up when the busses of the Fibre Channel adapters are re-scanned.
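On a 2.6 kernel the rescan can be triggered through sysfs. A minimal sketch, assuming the usual /sys/class/scsi_host layout (the base directory is a function parameter only so the logic can be exercised without root):

```shell
#!/bin/sh
# Trigger a SCSI rescan on every host adapter so newly zoned LUNs appear.
# Writing "- - -" (all channels, all targets, all LUNs) to the "scan"
# attribute asks the kernel to probe that bus again.
rescan_scsi_hosts() {
    base=${1:-/sys/class/scsi_host}   # overridable base dir, for testing
    for host in "$base"/host*; do
        # Skip entries we cannot write to (e.g. when not running as root).
        [ -w "$host/scan" ] && echo "- - -" > "$host/scan"
    done
    return 0
}

rescan_scsi_hosts "$@"
```

After the rescan, the new LUNs should appear in /proc/partitions and in the output of the SDD tools.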
/proc/partitions shows the new disk:
252 0 73400320 vpatha
/etc/vpath.conf maps device IDs to vpath names, and /etc/sddsrv.conf holds some miscellaneous settings.
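For reference, vpath.conf is plain text with one device per line, mapping each vpath name to the LUN's serial number. A sketch of the format (the serial numbers below are placeholders, not real IDs):

```
vpatha 60056768018A0A1B8000000000000000
vpathb 60056768018A0A1B8000000000000001
```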
Under /opt/IBMsdd/bin there are some useful utilities. The datapath command lets you query adapters and devices, and set them offline or online. lsvpcfg shows which SCSI disks map to which vpath disks. cfgvpath allows you to make changes to the configuration.
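As a sketch of typical usage (the commands below need the SDD tools installed; output is omitted since it depends entirely on your setup):

```shell
# Query all adapters and their state.
datapath query adapter

# Query all vpath devices and the paths behind them.
datapath query device

# Take adapter 0 offline, e.g. before maintenance, then bring it back.
datapath set adapter 0 offline
datapath set adapter 0 online

# Show which /dev/sdX disks make up each vpath device.
lsvpcfg
```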
Using with LVM on RHEL4
To get LVM working correctly with SDD on RHEL4, there are a couple of things that must be taken care of.
Boot-up load order
The first thing is to make sure the SDD driver loads before LVM at boot. The SDD User's Guide suggests starting sdd from /etc/rc.sysinit, after the root filesystem has been remounted read-write but before LVM initializes:
# Remount the root filesystem read-write.
update_boot_stage RCmountfs
state=`awk '/ \/ / && ($3 !~ /rootfs/) { print $4 }' /proc/mounts`
[ "$state" != "rw" -a "$READONLY" != "yes" ] && \
        action $"Remounting root filesystem in read-write mode: " mount -n -o remount,rw /
# Starting SDD
/etc/init.d/sdd start
# LVM initialization
...
The manual also says the /etc/init.d/sdd script must be set _not_ to start at boot:
[root@server ~]# chkconfig sdd off
The above configuration has a problem, though. The SDD software is installed under /opt, but if /opt is a separate partition of any type (an LVM logical volume in particular), it is not yet mounted at that point in rc.sysinit, so the SDD drivers cannot be loaded from there before LVM starts.
One solution to that problem is to move the SDD files to the root partition. For me, though, it seems to just work if I enable /etc/init.d/sdd at boot (with no changes to rc.sysinit):
[root@server ~]# chkconfig sdd on
This may be because my root partition is also on LVM, so LVM has to be initialized early in the initrd, and it presumably scans for devices again later in the boot.
Volume initialization and detection problems
A couple of changes must be made to the LVM configuration file /etc/lvm/lvm.conf. Without them, you will run into an error like this when trying to create a physical volume on a vpath device:
[root@server ~]# pvcreate /dev/vpatha1
  Device /dev/vpatha1 not found (or ignored by filtering).
Accepting vpath devices
In /etc/lvm/lvm.conf, you will see that this line is commented out:
# types = [ "fd", 16 ]
Remove the comment character, and change the line to look like this:
types = [ "vpath", 16 ]
That will add vpath to LVM's list of accepted device types.
Rejecting the underlying devices
The lvm.conf filter option selects which devices are allowed to be used as LVM physical volumes; the default is usually to allow every device. That is a problem with vpath devices, because each LUN shows up both as /dev/vpathX and as regular SCSI disks, one per path. So if you have configured one disk per volume controller through two FC switches, you will see a total of four /dev/sdX disks for each vpath disk, and all five devices present the same data (although only the vpath device is redundant, and should therefore be used). Since LVM scans all devices by default, it will print duplicate physical volume messages unless you change the filtering rule. The messages look like this:
Found duplicate PV 1XlJrZHnI49tTtHVvwe7cXZ0cATNFTxw: using /dev/sdak not /dev/vpatha
This is particularly bad, as it means LVM has chosen to use sdak instead of the redundant vpatha path.
To prevent this, change the filter line in lvm.conf to look like this:
filter = [ "a/vpath[a-z]*/", "r/.*/" ]
That line makes LVM accept only devices named vpath[a-z]*, so it will choose the correct device. However, if you also have physical volumes on internal SCSI disks, that line will reject them too. Add an accept rule for those:
filter = [ "a/vpath[a-z]*/", "a/sda2/", "r/.*/" ]
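To see what that filter does, here is a small sketch that mimics it in plain shell. LVM's patterns are unanchored regular expressions tried in order, first match wins; the case globs below approximate them:

```shell
#!/bin/sh
# Approximate the lvm.conf filter
#   filter = [ "a/vpath[a-z]*/", "a/sda2/", "r/.*/" ]
# Rules are tried in order; the first matching rule decides.
classify() {
    case "$1" in
        *vpath*) echo accept ;;   # "a/vpath[a-z]*/": any path containing vpath
        *sda2*)  echo accept ;;   # "a/sda2/": keep the internal disk's PV
        *)       echo reject ;;   # "r/.*/": reject everything else
    esac
}

classify /dev/vpatha   # the multipathed device: accept
classify /dev/sda2     # internal disk partition: accept
classify /dev/sdak     # single-path alias of vpatha: reject
```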
After these changes, you should be able to create physical volumes on vpath devices, no error messages should appear when handling volumes, and everything should look the same after the next reboot.