
As many of you know, I have a Plex-based Mac Mini media center setup which has enabled me to cut the cord with the cable monopoly. Part of this setup is the Openfiler NAS that I use to store all of the digital copies of my DVD collection.

Lately I have been wishing that I had configured the NAS with a RAID setup instead of just using an all-eggs-in-one-basket approach with three drives in an LVM configuration. In the process of cleaning out a bunch of videos that I was never going to watch, I managed to free enough space to be able to disband the existing volume group and set up my two 1TB drives in a RAID 1 set. The happy result was that I was also able to re-purpose the 500GB drive for use with my ever-growing iTunes library. (Thanks, Apple and Steve Jobs, for making all my music DRM-free!)

The problem I ran into was that Openfiler will not allow you to create a RAID set in a degraded state, which is what I needed in order to work with the drives I already owned rather than spend additional funds. After discovering this I began investigating the possibility of doing the RAID configuration by hand and completing the rest of the setup in the Openfiler web console. Here is the process I used.

A few steps were left out, mainly the pieces revolving around moving the data from one logical volume to the other and then pruning the old volume from the volume group. Except for the copy process, all of this can be easily accomplished in the Openfiler web console. The only piece that I couldn’t easily determine from the console interface was which physical devices were used in which logical volume. That information can be easily found using the following command:

[root@mangrove ~]# lvdisplay -m
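
For reference, if you would rather prune the old volume group from the command line instead of the web console, the equivalent LVM commands look roughly like the sketch below. The volume group, logical volume, and device names here are hypothetical; substitute the ones reported by lvdisplay on your system.

[root@mangrove ~]# lvremove /dev/vg_old/lv_media   # remove the old logical volume (hypothetical names)
[root@mangrove ~]# vgreduce vg_old /dev/sdc1       # drop the freed physical volume from the group
[root@mangrove ~]# pvremove /dev/sdc1              # clear the LVM label so the disk can be reused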

Create the RAID 1 set with a missing drive and prepare the physical volume

Step 1: SSH as root into your Openfiler setup

arfore$ ssh root@mangrove

Step 2: As root, partition the RAID member using fdisk. You will want to create a single, primary partition. Accept the defaults on the partition size so that it uses the whole drive.

[root@mangrove ~]# fdisk /dev/sda
 
The number of cylinders for this disk is set to 121601.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
 
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-121601, default 1):
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-121601, default 121601):
Using default value 121601

Next change the partition type to be Linux raid autodetect (Note: the hex code is fd)

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)
 
Now exit saving the changes. This will write the new partition table to the drive.
 
Command (m for help): w
The partition table has been altered!
 
Calling ioctl() to re-read partition table.
Syncing disks.

Step 3: Create the RAID 1 set using your newly partitioned drive. Normally when creating a RAID 1 set you would specify two drives, since that is the minimum number for a RAID 1 set in a clean, non-degraded state. However, in our case we need to start out with a set containing one drive, which will show up in Openfiler as a RAID 1 set in a clean, degraded state.

[root@mangrove ~]# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 missing

If it all worked, you should see the following result from the mdadm command:

mdadm: array /dev/md0 started.

To check the status of your newly created RAID 1 set, execute the following command:

[root@mangrove ~]# cat /proc/mdstat

The result should look like this:

Personalities : [raid6] [raid5] [raid4] [raid10] [raid1]
md0 : active raid1 sda1[0]
      976759936 blocks [2/1] [U_]

unused devices: <none>

What this means is that your system supports RAID levels 1, 4, 5, 6, and 10. (The Linux kernel also supports RAID 0, though that personality is not loaded here.) The [2/1] entry on the md0 line means that your RAID set is configured for two devices, but only one device is present. In a clean, non-degraded state, a RAID 1 set would show [2/2].

Step 4: Initialize the RAID device as an LVM physical volume so that it can be used in a volume group.

[root@mangrove ~]# pvcreate /dev/md0

If the command completed successfully, you should see the following result:

Physical volume "/dev/md0" successfully created
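
Before moving on to the web console, you can confirm that LVM sees the new physical volume:

[root@mangrove ~]# pvdisplay /dev/md0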

Create the Volume Group and Logical Volume

Note: The next parts are done in the Openfiler web console.

Step 1: First check the status of your RAID set by selecting the Volumes tab then clicking on Software RAID in the Volumes section.

Step 2: On the Software RAID Management screen you will see your new RAID 1 set listed with a state of Clean & degraded. Normally this would indicate a possible drive failure; in this instance, however, it is expected.

Step 3: The next step is to create a new volume group using the physical volume created during the previous command line steps. Click on the Volume Groups link in the Volumes section.

Create your new Volume Group and select the physical volume to be added to the group. Then click the Add volume group button.

Notice that your new volume group now shows up in the Volume Group Management section.

Step 4: Next you need to add a new volume to your newly created volume group. You can do this on the Add Volume screen. Click on the Add Volume link in the Volumes section.

On the Add Volume screen, set up your new volume and click the Create button. The suggestion from Openfiler is to keep the default filesystem (which is currently XFS). Also, make sure to increase the volume size (the Required Space field) to create the desired volume. In my case I am going to select the maximum available space.

The system now sends you to the Manage Volumes screen which will show you that you now have a volume group containing one volume.
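
For the curious, what the console does here corresponds roughly to the standard LVM tools. Below is a sketch only, with hypothetical volume group and volume names; the console handles these details for you.

[root@mangrove ~]# vgcreate vg_raid /dev/md0                  # create the volume group on the RAID device
[root@mangrove ~]# lvcreate -l 100%FREE -n lv_media vg_raid   # carve out a volume using all free extents
[root@mangrove ~]# mkfs.xfs /dev/vg_raid/lv_media             # format it with the default XFS filesystem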

At this point you can copy all your data over from the old LVM setup into the new LVM-on-RAID 1 setup. This may take some time. In my case, given that my hardware is far from the latest and greatest, it took just over an hour to copy 900GB of data.
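
The copy itself can be done with whatever tool you prefer; something like rsync works well because it can be safely re-run if the transfer is interrupted. The mount points below are assumptions, so check where Openfiler actually mounted your volumes with df or mount before running it.

[root@mangrove ~]# df -h                                               # find where the old and new volumes are mounted
[root@mangrove ~]# rsync -avP /mnt/vg_old/media/ /mnt/vg_raid/media/   # assumed mount points; adjust to match df output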

Adding the empty drive to the RAID 1 set

The next steps assume that you have migrated your data from the last remaining volume group of the old setup into your degraded RAID 1 set. At this point we are going to add the freed disk into the RAID set and start it syncing the disks.

Step 1: The first thing you need to do is make sure that the partitioning structure of the new disk matches that of the existing RAID 1 member, since in a RAID set the partition structure of the members needs to match.

This can be easily accomplished using the following command:

[root@mangrove ~]# sfdisk -d /dev/sda | sfdisk /dev/sdb

The output from the command should look something like this:

Checking that no-one is using this disk right now ...
OK
 
Disk /dev/sdb: 121601 cylinders, 255 heads, 63 sectors/track
Old situation:
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
  Device Boot Start End #cyls #blocks Id System
/dev/sdb1 0+ 121600 121601- 976760001 fd Linux raid autodetect
/dev/sdb2 0 - 0 0 0 Empty
/dev/sdb3 0 - 0 0 0 Empty
/dev/sdb4 0 - 0 0 0 Empty
New situation:
Units = sectors of 512 bytes, counting from 0
  Device Boot Start End #sectors Id System
/dev/sdb1 63 1953520064 1953520002 fd Linux raid autodetect
/dev/sdb2 0 - 0 0 Empty
/dev/sdb3 0 - 0 0 Empty
/dev/sdb4 0 - 0 0 Empty
Warning: no primary partition is marked bootable (active)
This does not matter for LILO, but the DOS MBR will not boot this disk.
Successfully wrote the new partition table
 
Re-reading the partition table ...
 
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)

As long as the process doesn’t throw any wonky error messages then you are good to move on to the next step.
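
If you want an extra sanity check before proceeding, dump both partition tables and compare them; the only differences should be the device names themselves.

[root@mangrove ~]# sfdisk -d /dev/sda > /tmp/sda.dump
[root@mangrove ~]# sfdisk -d /dev/sdb > /tmp/sdb.dump
[root@mangrove ~]# diff /tmp/sda.dump /tmp/sdb.dump    # only the sda/sdb device names should differ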

Step 2: Now we need to add the disk /dev/sdb into the RAID 1 set /dev/md0.

[root@mangrove ~]# mdadm --manage /dev/md0 --add /dev/sdb1

If the command completes successfully, you will get the following result:

mdadm: added /dev/sdb1

Step 3: At this point the system should automatically begin syncing the disks. To check on this, run the command:

[root@mangrove ~]# cat /proc/mdstat

The result should look like this:

Personalities : [raid6] [raid5] [raid4] [raid10] [raid1]
md0 : active raid1 sdb1[2] sda1[0]
      976759936 blocks [2/1] [U_]
      [>....................]  recovery =  0.0% (535488/976759936) finish=182.3min speed=89248K/sec

unused devices: <none>

Notice that it now shows that the RAID 1 set is in recovery mode, which indicates that the set is in the process of being synchronized. During this process the RAID Management screen in the Openfiler web console will show the state of the RAID as Clean & degraded & recovering, along with the progress of the synchronization.
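
If you would rather not keep re-running cat by hand, the watch utility will refresh the status every few seconds (press Ctrl-C to exit):

[root@mangrove ~]# watch -n 10 cat /proc/mdstat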

Step 4: Once the RAID set has finished syncing the mdstat results will look as follows:

[root@mangrove ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid10] [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976759936 blocks [2/2] [UU]

unused devices: <none>

As you can see, the array now shows two devices in the set, with /dev/sda1 as the original member and /dev/sdb1 as the newly added mirror. The [2/2] [UU] entry indicates two fully synced disks.

In the Openfiler web console the RAID Management screen now shows the state of the array as Clean with a sync status of Synchronized.

Final Steps

Step 1: Now that you have a completely clean RAID 1 set, you will need to ensure that the /etc/mdadm.conf file has the correct information concerning the array. Normally this is created automatically for you by the Openfiler administration tool; however, since we built the array by hand, we will need to add this information to the existing file ourselves. (Note: back up the existing mdadm.conf file first!)

This can be accomplished by the following commands:

[root@mangrove ~]# cp /etc/mdadm.conf /etc/mdadm.conf_orig
[root@mangrove ~]# mdadm --examine --scan >> /etc/mdadm.conf

The mdadm.conf file now contains the following (Note: the UUID entry will be unique to your system):

#
# PLEASE DO NOT MODIFY THIS CONFIGURATION FILE!
# This configuration file was auto-generated
# by Openfiler. Please do not modify it.
#
# Generated at: Sat Jul 24 19:43:42 EDT 2010
#
 
DEVICE partitions
PROGRAM /opt/openfiler/bin/mdalert
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=542fa4dc:c920dae9:c062205a:a8df35f1
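
As a quick sanity check, the ARRAY line reported by the running system should match the one that was just appended to /etc/mdadm.conf:

[root@mangrove ~]# mdadm --detail --scan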

Testing the RAID set

Testing the RAID set is the next step that I would recommend. If something Really Bad(TM) goes wrong here, you could end up losing data. If you are extremely worried, make sure that you have a backup of it all somewhere else. However, if you don’t actually test the mirror, you don’t know that it works, and you could be relying on a setup that will fail you when you need it the most.

Step 1: Get the details of your existing RAID set prior to testing the removal of a device; that way you will have something to compare against.

[root@mangrove ~]# mdadm --detail /dev/md0

This should yield some valuable data on the health of the array.

/dev/md0:
        Version : 00.90.03
  Creation Time : Sat Jul 24 19:55:44 2010
     Raid Level : raid1
     Array Size : 976759936 (931.51 GiB 1000.20 GB)
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent
 
    Update Time : Sun Jul 25 13:41:07 2010
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
 
           UUID : 542fa4dc:c920dae9:c062205a:a8df35f1
         Events : 0.3276
 
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1

Step 2: The next step is to actually remove a device from the array. Now, since /dev/sda was the initial disk I put into my set, I am going to remove it and see what happens.

To remove the device from the array and mark it as failed, use the following command:

[root@mangrove ~]# mdadm --manage --fail /dev/md0 /dev/sda1

You should receive the following result:

mdadm: set /dev/sda1 faulty in /dev/md0

Step 3: Check the status of the array by using the following command:

[root@mangrove ~]# cat /proc/mdstat

You will see that the device /dev/sda1 is now marked as failed and that the status of the array is degraded:

Personalities : [raid6] [raid5] [raid4] [raid10] [raid1]
md0 : active raid1 sdb1[1] sda1[2](F)
      976759936 blocks [2/1] [_U]

unused devices: <none>

Also, in the RAID Management screen in the Openfiler web console you will see the state of the array change to be Clean & degraded.

Step 4: The next step is to test the mirroring of the data. What you should do now is remove the drive that you marked as failed, then add it back into the mirror. In a real-world scenario you would also replace the physical drive itself; however, that is not necessary for this test. Also, if you were replacing the physical drive, you would need to repeat the duplication of the partitioning structure.

First, remove the failed drive from the set /dev/md0 using the following command:

[root@mangrove ~]# mdadm --manage --remove /dev/md0 /dev/sda1

The result that you get back will show a hot remove, since this command was executed while the system was live. In an enterprise environment where the storage array supported hot-plug devices, you could then replace the failed drive without shutting the system down.

Next, add the new drive into the array like so:

[root@mangrove ~]# mdadm --manage --add /dev/md0 /dev/sda1

The result from this command will be:

mdadm: re-added /dev/sda1

Step 5: If nothing has gone horribly wrong with the test, the array will now begin the mirroring process again. As you can see from the output of mdstat, the recovery process will be indicated in the same manner as the initial mirror was:

[root@mangrove ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid10] [raid1]
md0 : active raid1 sda1[2] sdb1[1]
      976759936 blocks [2/1] [_U]
      [>....................]  recovery =  0.0% (834432/976759936) finish=175.4min speed=92714K/sec

unused devices: <none>

Note: this is not a smart recovery process. When you break the mirror, the entire resynchronization has to complete, even in a test where the data never actually disappeared. As you can see, the recovery is going to take about the same length of time that the initial build of the mirror did.

Also, in the RAID Management screen, the state of the array will now be shown as Clean & degraded & recovering, just as before when we built the mirror in the first place.

Step 6: Once the re-mirroring triggered by the test has completed, the results of mdstat should look similar to the following:

Personalities : [raid6] [raid5] [raid4] [raid10] [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      976759936 blocks [2/2] [UU]

unused devices: <none>

If you had actually replaced a drive, you might need to update mdadm.conf to reflect the changes. At the very least it is wise to run the scan command again after the rebuild has completed to ensure that the UUID of the array still matches the file.
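
A quick way to make that comparison is to print a fresh scan alongside the entry already in the file:

[root@mangrove ~]# mdadm --examine --scan          # what the superblocks on disk report
[root@mangrove ~]# grep ^ARRAY /etc/mdadm.conf     # what the config file currently says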

Final Thoughts

One of the big lessons that I learned from this whole process, both with Openfiler and with handling storage arrays in general, is that more thought and planning on the front end can save you a lot of tedium later. Had I initially planned my storage setup, I would have configured the drives in a RAID 1 set from the beginning. Another thing I learned is that while managing your NAS with an appliance type of setup (whether you use ClarkConnect, FreeNAS or Openfiler) is a great convenience, it doesn’t give you the insight and understanding that can be gained by doing everything, at least once, the manual way. I now have a much better understanding of how the software RAID functions work in Linux and of the LVM process as a whole.

References

This whole process would have been much more difficult had it not been for the input of a friend as well as a series of postings on the Internet. Thanks, Joe, for your help on this, given that it was my first foray into software RAID on Linux.
