2006-07-29 -- more tests on new fileserver for ETS. Now, use of linux md raid in emergencies. Five disks (147GB 10krpm U320 SCSI), each with two 10000MB partitions at the start of the disk. Using a gentoo system installed on another (the first) disk, with linux md raid from a 2.6.16-gentoo-r13 kernel, the following raid grow/fail/spare etc. scenarios were worked through.

1) make a raid5 array of 4 partitions, format it with ext3, put files on it, checksum the files, remove one disk by stopping the array and removing the superblock from its partition, then rerun the array and check the files are still there!
# mdadm --create /dev/md0 --auto=yes -l 5 -n 4 /dev/sd[bcde]1
# mkfs.ext3 /dev/md0
# mount -t ext3 /dev/md0 /mnt/tmp/
# cp file? /mnt/tmp/
# md5sum /mnt/tmp/file?
93727fa64195aaf4c9c043f6a3b45905  /mnt/tmp/file1
e0ebe8f2dc7c51cb5565d3d42f8f8c81  /mnt/tmp/file2
7e14b1cdc730c13da524e280085e9817  /mnt/tmp/file3
8858b2ff1cf9a46eedb9b7f861fab02f  /mnt/tmp/file4
48f2f7d32c1d190406f9ef7b17bdc2ec  /mnt/tmp/fileb
# umount /dev/md0
# mdadm --stop /dev/md0
# mdadm --zero-superblock /dev/sdc1
# mdadm --assemble /dev/md0 --run /dev/sd[bcde]1
mdadm: /dev/sdc1 has no superblock - assembly aborted
# mdadm --assemble /dev/md0 --run /dev/sd[bde]1
mdadm: /dev/md0 has been started with 3 drives (out of 4)
# cat /proc/mdstat
md0 : active raid5 sdb1[0] sde1[3] sdd1[2]
      2963520 blocks level 5, 64k chunk, algorithm 2 [4/3] [U_UU]
# mount -t ext3 /dev/md0 /mnt/tmp/
# md5sum /mnt/tmp/file?
(same results as above)

2) add another disk into the array
# mdadm --manage /dev/md0 --add /dev/sdf1
(wait a few seconds for the small array to be recovered)
# cat /proc/mdstat
md0 : active raid5 sdf1[1] sdb1[0] sde1[3] sdd1[2]
      2963520 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

3) with mdadm --monitor running to send emails of "changes in status" (one way the monitor might be started is sketched after step 7), use the mdadm commands for failing a component disk
# mdadm --manage /dev/md0 --fail /dev/sdb1
(and immediately got a nice email:)
This is an automatically generated mail message from mdadm running on penguin
A Fail event had been detected on md device /dev/md0.
It could be related to component device /dev/sdb1.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid0] [raid5] [raid4] [raid1]
md0 : active raid5 sdf1[1] sdb1[4](F) sde1[3] sdd1[2]
      2963520 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
unused devices: <none>

4) restore the faulted drive to operation (--manage seems redundant)
# mdadm /dev/md0 --remove /dev/sdb1
# mdadm /dev/md0 --add /dev/sdb1
# cat /proc/mdstat
md0 : active raid5 sdb1[0] sdf1[1] sde1[3] sdd1[2]
      2963520 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

5) add a further device as a hot spare
# mdadm /dev/md0 --add /dev/sdc1
# cat /proc/mdstat
md0 : active raid5 sdc1[4](S) sdb1[0] sdf1[1] sde1[3] sdd1[2]
      2963520 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

6) have another failure: see how the spare copes
# mdadm /dev/md0 --fail /dev/sdd1
# cat /proc/mdstat
md0 : active raid5 sdc1[2] sdb1[0] sdf1[1] sde1[3] sdd1[4](F)
      2963520 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
all very nice! Check the data's actually there:
# md5sum /mnt/tmp/file?
(yes -- all tallies)

7) try to grow the array
# mdadm /dev/md0 --remove /dev/sdd1
# mdadm /dev/md0 --grow /dev/sdd1
mdadm: can only add devices to linear arrays
Ahhh!! The problem... increasing the array's size is a many-hour job, then: wipe and rebuild the array, then restore the data from backup. WORTH NOTING whether a hostraid card can do online growing instead.
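(aside on the monitor used in step 3: the log doesn't record how it was started. A minimal sketch of one plausible invocation -- the mail address and delay are assumptions, not taken from the log:)
# mdadm --monitor --daemonise --delay=60 --mail=root@penguin /dev/md0
(this daemonises, polls roughly every 60 seconds, and mails when it sees events such as Fail or DegradedArray on /dev/md0, which matches the message quoted in step 3)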
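(aside on step 7: since --grow can't reshape this raid5, the "many-hour job" would run roughly as below. This is only a sketch of the wipe/rebuild/restore route described above; the restore depends entirely on the backup tool in use, and /backup/md0.tar is a hypothetical archive name:)
# umount /mnt/tmp
# mdadm --stop /dev/md0
# mdadm --zero-superblock /dev/sd[bcdef]1
# mdadm --create /dev/md0 --auto=yes -l 5 -n 5 /dev/sd[bcdef]1
(the new, larger array resyncs in the background; it can be formatted and used meanwhile)
# mkfs.ext3 /dev/md0
# mount -t ext3 /dev/md0 /mnt/tmp/
# tar -C /mnt/tmp -xf /backup/md0.tar
(restore from whatever backup actually exists; the tar archive above is hypothetical)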
8) not really raid, but related: growing ext3fs.
# df .
/dev/md/1              2916920   2465504    303240  90% /mnt/tmp
# umount /mnt/tmp
# resize2fs -f /dev/md1
Resizing the filesystem on /dev/md1 to 987904 (4k) blocks.
The filesystem on /dev/md1 is now 987904 blocks long.
# mount -t ext3 /dev/md1 /mnt/tmp/
# df .
/dev/md/1              3888808   2465504   1225724  67% /mnt/tmp
# md5sum /mnt/tmp/file?
(again, these are ok)
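(footnote on step 8: the -f flag overrides resize2fs's safety checks; the more cautious offline sequence runs a forced fsck first. A minimal sketch with the same device names as above:)
# umount /mnt/tmp
# e2fsck -f /dev/md1
# resize2fs /dev/md1
# mount -t ext3 /dev/md1 /mnt/tmp/
(with no size given, resize2fs grows the filesystem to fill the underlying device)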