Mon Nov 23 07:39:44 EST 2015

Went to bed around eleven, and got up around seven. Woke once in the night, but slept well otherwise.

High of thirty-five today. Snow showers between 11 AM and 9 PM, amounting to half an inch or less.

Goals:

Work:
- Finish draft of Hazel Park network diagram
  No, but worked on it a little.
- Work on remote access
  No.

Took a twenty-minute walk at lunch. Snowing --- big flakes like goose feathers. It was cloudy, so it didn't feel as warm as yesterday.

Home:
- Review/finish FreeBSD init rc.d notes
  Done, mostly.

Questions:

- So, I've found my own solutions to the problem of non-redundant UEFI ESP boot partitions on software RAID, but is there a best practice?

No, I haven't been able to find anything better than what I figured out on my own (each disk in a mirror has a non-mirrored FAT16 ESP partition, with boot data manually written to each one). Software RAID with UEFI seems like it would be a common case. I'm surprised it's not better covered. It's mildly frustrating that Linux installers don't make this configuration easier out of the box, but at least it's relatively easy to set up and maintain once you understand the problem. With Windows Server, it's way worse:

http://blogs.technet.com/b/tip_of_the_day/archive/2014/10/10/tip-of-the-day-configuring-disk-mirroring-for-windows-server-2012.aspx
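
The "boot data manually written to each one" part amounts to a couple of commands. An untested sketch, assuming GRUB, with /dev/sda1 and /dev/sdb1 as placeholder names for the two ESPs, mounted one at a time at /boot/efi:

    # Untested sketch. /dev/sda1 and /dev/sdb1 stand in for the two
    # non-mirrored FAT16 ESPs; mount each in turn and install GRUB to it.
    mount /dev/sda1 /boot/efi
    grub-install --target=x86_64-efi --efi-directory=/boot/efi
    umount /boot/efi
    mount /dev/sdb1 /boot/efi
    grub-install --target=x86_64-efi --efi-directory=/boot/efi
    umount /boot/efi

One wrinkle: grub-install also registers a boot entry in the UEFI NVRAM, so giving the two runs distinct --bootloader-id labels may be needed to keep the second entry from replacing the first.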

- Is it necessary or a best practice to periodically scrub a Linux md RAID?

The ArchWiki seems to think regular scrubs are a good idea. Ah. See md(4):

    SCRUBBING AND MISMATCHES

    As storage devices can develop bad blocks at any time it is valuable to regularly read all blocks on all devices in an array so as to catch such bad blocks early. This process is called scrubbing.

    md arrays can be scrubbed by writing either check or repair to the file md/sync_action in the sysfs directory for the device.

    Requesting a scrub will cause md to read every block on every device in the array, and check that the data is consistent. For RAID1 and RAID10, this means checking that the copies are identical. For RAID4, RAID5, RAID6 this means checking that the parity block is (or blocks are) correct.

    If a read error is detected during this process, the normal read-error handling causes correct data to be found from other devices and to be written back to the faulty device. In many cases this will effectively fix the bad block.

    If all blocks read successfully but are found to not be consistent, then this is regarded as a mismatch.

    If check was used, then no action is taken to handle the mismatch, it is simply recorded. If repair was used, then a mismatch will be repaired in the same way that resync repairs arrays. For RAID5/RAID6 new parity blocks are written. For RAID1/RAID10, all but one block are overwritten with the content of that one block.

    A count of mismatches is recorded in the sysfs file md/mismatch_cnt. This is set to zero when a scrub starts and is incremented whenever a sector is found that is a mismatch. md normally works in units much larger than a single sector and when it finds a mismatch, it does not determine exactly how many actual sectors were affected but simply adds the number of sectors in the IO unit that was used. So a value of 128 could simply mean that a single 64KB check found an error (128 x 512 bytes = 64KB).

    If an array is created by mdadm with --assume-clean then a subsequent check could be expected to find some mismatches.

    On a truly clean RAID5 or RAID6 array, any mismatches should indicate a hardware problem at some level - software issues should never cause such a mismatch.

    However on RAID1 and RAID10 it is possible for software issues to cause a mismatch to be reported. This does not necessarily mean that the data on the array is corrupted. It could simply be that the system does not care what is stored on that part of the array - it is unused space.

    The most likely cause for an unexpected mismatch on RAID1 or RAID10 occurs if a swap partition or swap file is stored on the array. When the swap subsystem wants to write a page of memory out, it flags the page as 'clean' in the memory manager and requests the swap device to write it out. It is quite possible that the memory will be changed while the write-out is happening. In that case the 'clean' flag will be found to be clear when the write completes and so the swap subsystem will simply forget that the swapout had been attempted, and will possibly choose a different page to write out.

    If the swap device was on RAID1 (or RAID10), then the data is sent from memory to a device twice (or more depending on the number of devices in the array). Thus it is possible that the memory gets changed between the times it is sent, so different data can be written to the different devices in the array. This will be detected by check as a mismatch. However it does not reflect any corruption as the block where this mismatch occurs is being treated by the swap system as being empty, and the data will never be read from that block.

    It is conceivable for a similar situation to occur on non-swap files, though it is less likely.

    Thus the mismatch_cnt value can not be interpreted very reliably on RAID1 or RAID10, especially when the device is used for swap.

Kicking off a check:

root@mizzen:/home/paulgorman# echo check > /sys/block/md0/md/sync_action
root@mizzen:/home/paulgorman# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda2[0] sdb2[1]
      2929635136 blocks super 1.2 [2/2] [UU]
      [>....................]  check =  0.0% (31360/2929635136) finish=9337.0min speed=5226K/sec

unused devices: <none>
root@mizzen:/home/paulgorman# cat /sys/block/md0/md/mismatch_cnt
0

The scrub can be aborted like:

# echo idle > /sys/block/md0/md/sync_action

I'm still not absolutely convinced of the utility of scrubbing a RAID 1, but I guess we can do it periodically, and alert if the mismatch count grows beyond whatever might be normal for the box. Something like the sketch below could run from cron.
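
An untested sketch of that cron job. The device, mail recipient, and zero threshold are placeholders; per md(4) above, a box with swap on the array probably wants a looser threshold. (On Debian, the mdadm package may already ship a checkarray cron job that covers the scrubbing half of this.)

    #!/bin/sh
    # Untested sketch: scrub /dev/md0, wait for the check to finish,
    # then complain if any mismatches turned up.
    echo check > /sys/block/md0/md/sync_action
    while grep -q check /proc/mdstat; do
        sleep 300
    done
    mismatches=$(cat /sys/block/md0/md/mismatch_cnt)
    if [ "$mismatches" -gt 0 ]; then
        echo "md0 mismatch_cnt is $mismatches" | mail -s 'RAID scrub: mizzen' root
    fi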

- Why is it not possible (or at least it doesn't seem to be supported by any OS) to mirror FAT filesystems? Alternately, why is UEFI not extensible, with the ability to add support for additional filesystems?

Well, RAID is conceptually lower-level than a filesystem (i.e., we can put any filesystem on a RAID device). I guess the question really is: is one drive of a software RAID 1 byte-for-byte identical to a non-RAID drive? That is, does the RAID software write any metadata to the drive that would confuse something trying to read what it expects to be a non-RAID drive? Wait, maybe that can't work. UEFI gives each physical disk a UUID. The boot loader probably uses that? No, or at least grub.cfg seems to indicate the UUID of the LVM device. I don't know. See tomorrow for probable answer (yes, but maybe not worth it).
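
If the answer is "yes", it's probably via md metadata format 1.0, which stores the superblock at the end of the partition, so each half of the mirror reads as a plain FAT filesystem to firmware that knows nothing about md. Another untested sketch, with placeholder partition names:

    # Untested sketch. Mirror the ESP with the md superblock at the end
    # of each member, so the UEFI firmware sees an ordinary FAT
    # filesystem on either disk. /dev/sda1 and /dev/sdb1 are placeholders.
    mdadm --create /dev/md1 --level=1 --raid-devices=2 --metadata=1.0 /dev/sda1 /dev/sdb1
    mkfs.fat -F 16 /dev/md1
    # Caveat: anything that writes to a member directly (the firmware,
    # say) bypasses md and silently diverges the mirror; presumably the
    # "maybe not worth it" part.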