2009-10-25. Since the upgrade to a newer gentoo with a 2.6.30 kernel, in September, the fileserver has had occasional long (ten-second or so) delays in responding to particular client connections. It seems this is mainly (only?) on mkdir calls, and that it can happen whether the call comes from an (NFSv3) client or is made locally in e.g. an ssh session. The main space is a ~1.8TB RAID5 array of 3 new disks, with an ext3 filesystem. The kernel has been upgraded from 2.6.30 to 2.6.31 as a first precaution, but with no change noticed. There's no particularly heavy load associated with the times of trouble, though I haven't yet any evidence that the problem has ever happened at a time with no other load at all -- there may, for example, be some cron-job using rdiff-backup briefly.

Searching on Google for

    ages OR eternity OR long time mkdir linux raid OR md OR ext3 2.6.3

the first interestingly relevant find was http://insights.oetiker.ch/linux/fsopbench/ -- ``Note the HUGE max delay for mkdir seen in this example.'' (19s) -- where (albeit under considerable r/w load) the following /long/ mkdir delays had been seen:

    2.6.31.2-test relatime barrier=0 fs=ext3 disk=hdd journal=int data=writeback scheduler=cfq
        I  mkdir  cnt 1004  min 0.018 ms  max 19592.233 ms  avg 29.447 ms  med 0.026 ms  std 644.768

    2.6.24-24-server atime barrier=0 fs=ext3 disk=areca RAID6 journal=ext data=ordered scheduler=deadline
        I  mkdir  cnt 706   min 0.018 ms  max 5190.903 ms   avg 33.380 ms  med 0.025 ms  std 349.304

The ~20s delay seemed quite reminiscent of our server... That worst case was with the cfq (completely fair queueing, current default) scheduler and an ext3 filesystem. Ours uses cfq for the disks of the array, and no scheduler for the array device itself (naturally enough). Our situation with a /software/ array, rather than a simple disk or a hardware RAID (which looks like a simple disk), is admittedly a little different from theirs.
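If cfq does turn out to be implicated, the member disks can be switched to the deadline scheduler on the fly, without unmounting anything; something like the following (a sketch only, assuming sda, sdb and sdc are the three array members -- see the listing below, where cfq is currently selected):

    for d in sda sdb sdc; do
        echo deadline > /sys/block/$d/queue/scheduler
        cat /sys/block/$d/queue/scheduler   # the newly selected scheduler shows in [brackets]
    done

(The array device md0 itself has no I/O scheduler to change, as noted above.)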
/sys/block/hda/queue/scheduler   noop anticipatory deadline [cfq]
/sys/block/hdc/queue/scheduler   noop anticipatory deadline [cfq]
/sys/block/sda/queue/scheduler   noop anticipatory deadline [cfq]
/sys/block/sdb/queue/scheduler   noop anticipatory deadline [cfq]
/sys/block/sdc/queue/scheduler   noop anticipatory deadline [cfq]

# tune2fs -l /dev/md0
tune2fs 1.41.3 (12-Oct-2008)
Filesystem volume name:   serv_home_raid
Last mounted on:          <not available>
Filesystem UUID:          5832561e-34cd-432f-ae13-2fbc1cef9162
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              122101760
Block count:              488375936
Reserved block count:     4883759
Free blocks:              13697351
Free inodes:              121436197
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      907
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
RAID stride:              16
Filesystem created:       Fri Aug 21 20:28:49 2009
Last mount time:          Tue Oct 20 20:51:19 2009
Last write time:          Tue Oct 20 20:51:19 2009
Mount count:              11
Maximum mount count:      -1
Last checked:             Fri Aug 21 20:28:49 2009
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       91868594
Default directory hash:   half_md4
Directory Hash Seed:      929343b3-40c2-4ad3-8f00-7d78f28e1680
Journal backup:           inode blocks

Other, much earlier kernels have also been complained about as regressions, e.g. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/131094: a regression in speed (under competing processes) from ~2.6.15 to 2.6.22.

A little test, 2009-10-26, 00:21:

    tune2fs -e remount-ro /dev/md0             (shouldn't matter at all -- just a forgotten preference)
    mount -o remount,data=writeback /home/     (the default is claimed to be `ordered' mode)
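To catch the delay if it recurs under these settings, a crude probe could be left running, timing a mkdir on the array every so often and logging the elapsed time (a sketch only -- the /home/tmp/ test path and the log file name are just placeholders):

    while true; do
        t0=$(date +%s.%N)
        mkdir /home/tmp/probe.$$ && rmdir /home/tmp/probe.$$
        t1=$(date +%s.%N)
        echo "$(date '+%F %T') $(echo "$t1 - $t0" | bc)" >> /root/mkdir-times.log
        sleep 10
    done

Any multi-second entry in the log could then be matched against whatever cron jobs (rdiff-backup etc.) were running at the time.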