phaqphaq

“a geeks daily life”

Archive for May, 2006

FreeBSD gvinum RAID5 on Sparc64

Saturday, May 27th, 2006

Not long ago a friend of mine generously donated a Sun E450 server to me for use with my current networking projects.

The machine came with four IBM DDRS34560 hard drives. Since a capacity of 4 GB per drive is not that overwhelming by today’s standards, I was looking forward to combining them into a RAID5 array.

The operating system of choice to achieve this was FreeBSD/sparc64 6.1-RELEASE (custom-built kernel, Sun E450 SMP Kernel Config).

This howto will not go into too much detail. If you want to learn more about this stuff, check out the man pages of gvinum and geom and have a look at the FreeBSD handbook, Chapter #19.

#1 Get Some Good Hard Drives

First of all you need at least three hard drives to build a RAID5 plex (this is the term used for arrays in gvinum). It does not make much sense to build a RAID5 from only three disks, though; a RAID1 with a hot spare is the better approach in that case.
I would use at least four disks (better five) but not more than seven. RAID5 requires heavy parity calculations, so the more disks you have, the longer it takes to spread the parity (don’t build a RAID5 over 14 disks; it is much slower than a RAID0 spanned across two RAID5s of seven disks each).
Although it’s possible to use hard drives of any size and even make them members of multiple plexes of any kind, I would not recommend doing so. It makes things a bit more complicated.
This is why I always dedicate a single disk to a single plex, which makes the setup look (at least on the surface) similar to what ordinary hardware RAID controllers do.

To get balanced I/O performance I would also recommend that you always span a plex across identical disks. Mixing disks of different kinds can lead to very unbalanced behaviour.
Imagine what happens if you span a RAID5 across one 5400 rpm SCSI-1 and two 15k rpm U360 hard drives…

In my case I had four IBM DDRS34560 hard drives to work with.

#2 Hard Drive Partitions

When you have your hard drives ready you should set up the partitions (or disk labels, to be correct).

Due to platform-specific differences, FreeBSD/sparc64 has neither an fdisk nor a bsdlabel command to work with. The utility of choice for managing the disk label is called ‘sunlabel’.

First of all initialize all drives with a new disk label (replace DEVICE with your appropriate device name):

sunlabel -w DEVICE auto

Then edit the disk label (again, replace DEVICE with your appropriate device name):

sunlabel -e DEVICE

You will certainly notice a difference compared with a disklabel on the i386 platform. Create a new a: partition and start it at offset 1 (the offset is required so that the gvinum metadata can be stored on disk). Since sunlabel only accepts offsets in whole cylinders, this wastes more space than the metadata actually needs, but that should not be a concern. Don’t make the mistake of starting at offset 0, though; it won’t work properly.

To set the size of partition a:, take the size of partition c: and subtract the value shown in the ‘sectors/cylinder’ header.

# /dev/da2:
text: SUN4.2G cyl 3880 alt 2 hd 16 sec 135
bytes/sector: 512
sectors/cylinder: 2160
sectors/unit: 8380800

8 partitions:
#
# Size is in sectors.
# Offset is in cylinders.
#    size       offset     tag        flag
#    ---------- ---------- ---------- ----
  a:    8378640          1 unassigned   wm
  c:    8380800          0 backup       wm
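The arithmetic behind the sample label can be checked with a few lines of shell. This is only a sketch: the variable names are made up for illustration, and the numbers are copied from the label shown above.

```sh
# Values copied from the sample label (text: SUN4.2G cyl 3880 alt 2 hd 16 sec 135).
heads=16
sectors_per_track=135
sectors_per_unit=8380800      # 'sectors/unit', also the size of partition c:

# One cylinder is heads * sectors per track; this matches 'sectors/cylinder'.
sectors_per_cylinder=$((heads * sectors_per_track))

# Partition a: starts at cylinder 1, so it is exactly one cylinder shorter than c:.
a_size=$((sectors_per_unit - sectors_per_cylinder))

echo "sectors/cylinder: $sectors_per_cylinder"
echo "a: size: $a_size"
```

For a disk with a different geometry, only the three values at the top need to change.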

Repeat these steps for all future member disks of the RAID5 plex.

If all your disks are identical, you can safely dump the disk label of your first device to a prototype file like this:

sunlabel DEVICE > sunlabel.DEVICE

Then restore the label to your other devices like this:

sunlabel -R NEW_DEVICE sunlabel.DEVICE
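With several identical disks the restore step can be wrapped in a loop. The sketch below is a dry run that only prints the commands; the disk names da2 (prototype) and da3 to da5 (remaining members) are assumptions based on the devices used later in this howto, and the function name is made up.

```sh
# Prototype file dumped from the first, already labelled disk (assumed: da2).
proto=sunlabel.da2

# Print one restore command per remaining member disk (assumed: da3..da5).
# Remove the 'echo' to actually write the labels.
restore_cmds() {
    for dev in da3 da4 da5; do
        echo sunlabel -R "$dev" "$proto"
    done
}

restore_cmds
```

Printing the commands first makes it easy to double-check the device names before any label is overwritten.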

#3 Create gvinum RAID5 volume

Next you should create a sample configuration file (e.g. /tmp/raid5.conf) for initialization. Note that the chunk size (261k in the example) should not be a power of 2, otherwise your filesystem superblocks might end up on the same physical disk.

drive vol1_disk1 device /dev/da2a
drive vol1_disk2 device /dev/da3a
drive vol1_disk3 device /dev/da4a
drive vol1_disk4 device /dev/da5a

volume raid5_vol1
plex org raid5 261k
sd drive vol1_disk1
sd drive vol1_disk2
sd drive vol1_disk3
sd drive vol1_disk4
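Whether a candidate chunk size is a power of 2 can be tested quickly before it goes into the configuration file. This is a small sketch using the usual bit trick (a power of two has exactly one bit set); the function name is made up for illustration.

```sh
# Print a verdict for a chunk size given in kilobytes.
chunk_verdict() {
    n=$1
    # n & (n - 1) clears the lowest set bit; the result is 0 only for powers of two.
    if [ "$n" -gt 0 ] && [ $((n & (n - 1))) -eq 0 ]; then
        echo "power of two"
    else
        echo "not a power of two"
    fi
}

for kb in 256 261 512; do
    echo "${kb}k: $(chunk_verdict "$kb")"
done
```

The 261k used in the example configuration passes this check, while the more obvious choices 256k and 512k do not.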

You may also choose to build your array with a designated hotspare drive, which might then look like this:

drive vol1_disk1 device /dev/da2a
drive vol1_disk2 device /dev/da3a
drive vol1_disk3 device /dev/da4a
drive vol1_disk4 device /dev/da5a hotspare
volume raid5_vol1
plex org raid5 261k
sd drive vol1_disk1
sd drive vol1_disk2
sd drive vol1_disk3

Now invoke gvinum to create the RAID5 volume:

gvinum create /tmp/raid5.conf

This should print a status listing after initialization (the sample shows an array without hotspare drive):

4 drives:
D vol1_disk4 State: up /dev/da5a A: 0/4091 MB (0%)
D vol1_disk3 State: up /dev/da4a A: 0/4091 MB (0%)
D vol1_disk2 State: up /dev/da3a A: 0/4091 MB (0%)
D vol1_disk1 State: up /dev/da2a A: 0/4091 MB (0%)

1 volume:
V raid5_vol1 State: up Plexes: 1 Size: 11 GB

1 plex:
P raid5_vol1.p0 R5 State: up Subdisks: 4 Size: 11 GB

4 subdisks:
S raid5_vol1.p0.s3 State: up D: vol1_disk4 Size: 4091 MB
S raid5_vol1.p0.s2 State: up D: vol1_disk3 Size: 4091 MB
S raid5_vol1.p0.s1 State: up D: vol1_disk2 Size: 4091 MB
S raid5_vol1.p0.s0 State: up D: vol1_disk1 Size: 4091 MB
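The volume size in the listing follows from the way RAID5 works: the equivalent of one subdisk is used for parity, so the usable space is (number of subdisks - 1) times the subdisk size. A minimal sketch of that calculation, with a made-up function name and the sizes taken from the listing above:

```sh
# Usable RAID5 capacity in MB: one subdisk's worth of space holds parity.
raid5_usable_mb() {
    disks=$1
    subdisk_mb=$2
    echo $(( (disks - 1) * subdisk_mb ))
}

usable_mb=$(raid5_usable_mb 4 4091)
usable_gb=$((usable_mb / 1024))   # integer division, i.e. rounded down

echo "usable: ${usable_mb} MB, about ${usable_gb} GB"
```

Three times 4091 MB is just short of 12 GB, which is consistent with the 11 GB shown if the listing rounds down.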

Additional status information should be visible in your dmesg output.

The status can be reviewed by invoking ‘gvinum list’ at any time.

Make sure that the configuration gets saved by running:

gvinum saveconfig

#4 Format And Mount gvinum RAID5 volume

Now you are ready to format and mount the RAID5 volume.

newfs /dev/gvinum/raid5_vol1
mount /dev/gvinum/raid5_vol1 /mnt

Add the device to your fstab to automatically mount it during startup. For this to work you should also instruct the boot loader to enable gvinum. Add this line to /boot/loader.conf:

geom_vinum_load="YES"

This step can be omitted if you have compiled geom_vinum into your kernel. According to the FreeBSD manual this is not recommended, however.
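The fstab entry mentioned above could look like the line printed by the following sketch. The mount point /mnt is the one used in this howto; the options (rw) and the dump and pass numbers (2 2) are assumptions you should adapt, and the function name is made up.

```sh
# Print a candidate fstab line: device, mount point, fs type, options, dump, pass.
fstab_line() {
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' "$1" "$2" ufs rw 2 2
}

# Review the output, then append it to /etc/fstab by hand.
fstab_line /dev/gvinum/raid5_vol1 /mnt
```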

#5 What Else Must Be Done?

Your RAID5 volume should be up and running by now.

The man pages of gvinum and geom will cover advanced topics, amongst them mirroring, concatenation and combinations thereof.

Special attention must be given to optimization, e.g. how the chunk or stripe size and the filesystem block size affect read/write performance.

SS20 not recognizing IBM hard drives

Friday, May 26th, 2006

It was not very obvious and took me quite some time to find out why two recently acquired Sun SS20 machines did not recognize IBM SCA hard drives.

Both machines originally came with very old 1 GB SCA hard drives. This is why I wanted to replace them with bigger 4 GB IBM hard drives (DDRS-34560).

Unfortunately Sun’s OpenBoot firmware would not recognize my replacement drives.

Since the drives were spinning up, I could not believe that they had failed. Testing in another machine showed me I was right: the drives were running just fine.

So why would they fail in the SS20’s?

After some research I found that the SS20 SCSI bus supports wide and narrow SCSI-2, though only in single-ended mode.

Since my IBM drives did not have jumper #6 (S/E mode) closed, they did not initialize properly on the bus. After closing the jumper the drives were recognized on the SS20’s as well.

Another Way To Disable Debugging in VMware Server Beta

Sunday, May 14th, 2006

**** Article Obsoleted ****

This article has been obsoleted by the final release of VMware Server 1.0.
It is left here as a reference only.

**** Article Obsoleted ****

Following up on my previous post on disabling debugging in VMware Server Beta, I played around with altering the virtual machine vmx configuration files directly.

I had the idea while playing around with VMware Player, when I tried to change settings that were not originally foreseen in the interface…

Some editing of a vmx file of VMware Server proved that debugging could also be disabled on a per-VM basis, even if the hack of my previous howto was not applied.

Simply add these two lines to any vmx file (on Linux you’ll usually find them at /var/lib/vmware/Virtual Machines):

debug = FALSE
logging = FALSE

I did not notice any difference whether logging was TRUE or FALSE, but as the Beta tends to log just about anything, it most definitely has a (small) impact on performance.

The debug flag is worth setting in any case. I’m sure this is the official way to disable debugging, pretty much the same way the management console will do it in the final release.

Installing Trimmed-Down Userland To FreeBSD Jails

Friday, May 12th, 2006

For obvious reasons there are a lot of howtos on FreeBSD jails. One of the best, IMHO, besides the man page ;-) , is at section6wiki.

While the howto explains everything you need to get started, I was fiddling around with a way to install a trimmed-down userland to a jail without editing or moving around /etc/make.conf. The reason to do this is simple: the system in question was not solely dedicated to running jails, and I wanted to avoid the toolchain within the jails at any cost. So I basically looked for a simple and fast way to install the userland without tampering with my existing configuration.

‘man jail’ lists dozens of variables that can be put into /etc/make.conf to enable or disable certain features. A current list can also be found in /usr/share/examples/etc/make.conf. I’d recommend taking your options from the example make.conf, as the man page is not always up to date.
If you don’t want to alter your existing /etc/make.conf (not even by moving files around), the only simple and straightforward way to install is to pass the variables to ‘make’ on the command line, e.g.

make installworld DESTDIR=/my/path/to/jail \
    NO_TOOLCHAIN=yes NO_BLUETOOTH=yes NO_BOOT=yes NO_CXX=yes \
    NO_FORTRAN=yes NO_GDB=yes NO_GPIB=yes NO_I4B=yes \
    NOINET6=yes NOATM=yes NO_USB=yes NO_LPR=yes \
    NO_ACPI=yes NO_VINUM=yes NO_MAN=yes NO_SHAREDOCS=yes \
    NO_GAMES=yes NO_INFO=yes NO_SHARE=yes NO_SENDMAIL=yes \
    NO_BIND=yes NO_AUTHPF=yes NO_CVS=yes NO_PF=yes \
    NO_IPFILTER=yes NO_MAILWRAPPER=yes NO_NIS=yes NO_NETCAT=yes

Of course the same variables must be passed when running ‘make distribution’ from /usr/src/etc.
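Since both ‘make’ runs need exactly the same variables, it can help to keep the list in one place. The sketch below is a dry run that only prints the two commands; the names JAIL_KNOBS and make_args are made up for illustration, and running installworld from /usr/src is an assumption about where your source tree lives.

```sh
# The variables from the command above, kept in a single list.
JAIL_KNOBS="NO_TOOLCHAIN NO_BLUETOOTH NO_BOOT NO_CXX NO_FORTRAN NO_GDB NO_GPIB
NO_I4B NOINET6 NOATM NO_USB NO_LPR NO_ACPI NO_VINUM NO_MAN NO_SHAREDOCS
NO_GAMES NO_INFO NO_SHARE NO_SENDMAIL NO_BIND NO_AUTHPF NO_CVS NO_PF
NO_IPFILTER NO_MAILWRAPPER NO_NIS NO_NETCAT"

# Build the argument string for a given jail directory.
make_args() {
    args="DESTDIR=$1"
    for knob in $JAIL_KNOBS; do
        args="$args $knob=yes"
    done
    echo "$args"
}

# Dry run: print the commands instead of executing them.
jail=/my/path/to/jail
echo "cd /usr/src && make installworld $(make_args "$jail")"
echo "cd /usr/src/etc && make distribution $(make_args "$jail")"
```

Adding or removing a variable in the list then changes both commands at once, so the two steps cannot drift apart.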

This will install a trimmed-down userland of around 60 MB to a jail, leaving me most tools at hand while omitting the more specialized ones usually not needed within a jail.

Take care though when installing a jail like this from your host’s source tree. If you are building jails on a regular basis, it may be better to have a second source tree around for building jails.
If you are using your host’s regular source tree, I’d recommend first doing a regular (i.e. unaltered) ‘make buildworld’ and running ‘make installworld’ with the parameters given above later on. This will allow usage of the source tree for both your host and any subsequent jails.

Special attention must be given to the exclude parameters in this case though, as there are some dependencies which must be fulfilled. This is why you cannot exclude some subsets during ‘make installworld’ after running a full-fledged ‘make buildworld’.

If you are testing things out it may be best to temporarily disable kernel securelevel, otherwise you won’t be able to delete the files from the jail tree due to the ‘system immutable’ flags on some files within the tree. The same holds true when you try to update an existing jail.

You can circumvent this requirement however if you choose to install your jails within loop-back mounted disk images, which might be a good idea for limiting disk quota anyway.

Policy Filter for ClamSMTP (pf-clamsmtp)

Monday, May 1st, 2006

pf-clamsmtp is a policy filter written in Perl for use with Postfix and clamsmtp. It was primarily written for one purpose: pass messages to the clamsmtp virus filter only if they don’t exceed a given size.

The reason for doing so was simple: traditional tools like MailScanner or AMaViS put too much of a burden on systems when only simple virus filtering is desired. Also, it doesn’t make sense to scan today’s huge megabyte-sized messages: first, mail viruses rarely exceed a few kilobytes in size; second, scanning huge messages is expensive and may expose a system to resource exhaustion (e.g. running out of memory or disk space).

pf-clamsmtp is based on a sample policy daemon included with Postfix and was developed to work together with ClamSMTP. Sample configurations are included with the release tarball. pf-clamsmtp is released under the terms of the GNU GPL, version 2.
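The size-based decision can be illustrated with a few lines of shell. This is not the pf-clamsmtp code (which is written in Perl) but a rough sketch of the idea, assuming the Postfix policy delegation protocol: the request arrives as name=value lines terminated by an empty line, and the reply is a single action= line followed by an empty line. The size limit and the filter destination scan:[127.0.0.1]:10025 are hypothetical values, as is the function name.

```sh
# Read one policy request from stdin and print the reply.
# $1 = size limit in bytes (hypothetical), $2 = filter destination (hypothetical).
policy_reply() {
    limit=$1
    filter=$2
    size=0
    # Collect attributes until the empty line that ends the request.
    while IFS= read -r line && [ -n "$line" ]; do
        case "$line" in
            size=*) size=${line#size=} ;;
        esac
    done
    if [ "$size" -le "$limit" ]; then
        # Small enough: hand the message to the virus filter.
        printf 'action=FILTER %s\n\n' "$filter"
    else
        # Too big: express no opinion, so the message skips the filter.
        printf 'action=DUNNO\n\n'
    fi
}

# Example request for a 2 KB message with a 256 KB limit.
printf 'request=smtpd_access_policy\nsize=2048\n\n' |
    policy_reply 262144 'scan:[127.0.0.1]:10025'
```

A real policy daemon would loop over many requests on one connection; the sketch handles a single request to keep the decision itself visible.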

Any feedback is appreciated.