phaq

“a geeks daily life”

Archive for the 'FreeBSD' Category

Strange compilation error on MySQL

Thursday, July 16th, 2009

Yesterday I started digging around for a way to get per-user or per-database statistics out of MySQL, one of the more important pieces I had been missing from it for a long time.

Luckily enough, some guys over there had already done some work on this topic, so I wouldn’t have to start over from scratch :-)

It took me only a little time to port the patch from MySQL 5.0.51 to the more current 5.0.83 release in the FreeBSD ports tree. However, not long after starting a build I ran into this error message:

[root@bld-bsd-224-221 /usr/ports/databases/mysql50-server]# make build
[ - some output omitted - ]
/bin/sh ../ylwrap sql_yacc.yy y.tab.c sql_yacc.cc y.tab.h sql_yacc.h y.output sql_yacc.output --  -d -d --debug --verbose
-d: not found
*** Error code 1

At first I thought the port was corrupted so I refetched the package and reapplied the patch, to no avail.
So I tried again using the original port without having the patch applied, which worked flawlessly.

At second glance I checked the file list in the above command and noticed that it included sql_yacc.yy, one of the files that had been altered by the previously applied patch.
My conclusion was that the file had been patched wrongly and now contained a syntax error or something alike.
I then extracted the unpatched package once more to do a clean rebuild without patches.
I checked the compilation output for the above command line, only to note that it wasn’t actually there!

The question was: Why would the command line “/bin/sh ../ylwrap sql_yacc.yy ….” not get invoked when doing a build on a clean, unpatched package?
I double-checked my patches to see if the command had been introduced by them, which was not the case. That single command actually belongs to the stock MySQL Makefile.

At that stage I decided to just add a single whitespace to the file sql_yacc.yy and run the command manually:

[root@bld-bsd-224-221 /usr/ports/databases/mysql50-server/work/mysql-5.0.83/sql]# make
/bin/sh ../ylwrap sql_yacc.yy y.tab.c sql_yacc.cc y.tab.h sql_yacc.h y.output sql_yacc.output --  -d -d --debug --verbose
-d: not found
*** Error code 1

Interestingly enough, that command only seems to get invoked when the contents of the sql_yacc.yy file have been altered.
As such the error was indeed not caused by the patch itself.

So I dug deeper and analyzed the “ylwrap” script file, which is included with the MySQL package. Oh well, at that time I really felt like an idiot!
When I realized that this seemed to be a wrapper script for YACC I also noticed the double hyphen, which is really an indicator for subsequent command line arguments to be passed on to a sub-process.
Having said that, I supposed that something was actually missing in between here: “-- {HERE} -d -d”
Could it be that it’s missing the command name of the YACC sub-processor?
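To illustrate the suspicion, here is a minimal sketch of what happens when a wrapper runs "everything after the double hyphen" and the command name is missing. The function run_after_dashes is a hypothetical stand-in, not the real ylwrap (which does quite a bit more), and the empty YACC variable is only my assumption about what the build ended up with:

```shell
# Hypothetical stand-in for the argument handling of a wrapper like ylwrap:
# skip everything up to "--", then run whatever is left as a command.
run_after_dashes() {
    while [ $# -gt 0 ] && [ "$1" != "--" ]; do
        shift
    done
    if [ $# -gt 0 ]; then
        shift               # drop the "--" itself
    fi
    "$@"
}

# Assumption: no usable parser generator was found, so the variable is empty.
YACC=""

# $YACC is deliberately left unquoted so that an empty value simply vanishes
# from the argument list, leaving "-d" as the first word after "--".
run_after_dashes sql_yacc.yy y.tab.c sql_yacc.cc -- $YACC -d -d --debug --verbose \
    || echo "exit status: $?"
```

The shell then tries to execute "-d" as if it were a program, which fails with a "not found" style message and exit status 127, very much like the build error above.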

Well, do I have YACC installed?


bld-bsd-224-221.genotec.ch:/usr/ports/databases/mysql50-server# which yacc
/usr/bin/yacc

Well, I do … the only catch is: MySQL depends on bison, not on YACC.
To make things worse, neither the FreeBSD port Makefile nor the MySQL configure script checks for that dependency, most likely because it is *usually* not required.

Good catch: after installing bison from the ports tree, MySQL compiled like a charm, even with all my patches applied :-)
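Since neither the port nor configure complains, a small guard before a patched build would have saved me some time. This is only a sketch. The helper name require_tool is made up, and devel/bison is the port location as I know it:

```shell
# Return non-zero (and say why) if any of the named tools is not in PATH.
require_tool() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing build tool: $tool" >&2
            return 1
        fi
    done
}

# Possible usage before building the patched port:
#   require_tool bison && make build
# and if bison is missing:
#   cd /usr/ports/devel/bison && make install clean
```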

Is RAID1 possible on a USB stick?

Friday, October 26th, 2007

Last week we had a discussion at the office about whether it would be possible to span a RAID across USB sticks.
That question came up as a joke while I was working on some RAID system for evaluation purposes.
Well, my friend doubted it when I replied that it would definitely work out with a FreeBSD software RAID using gmirror (geom vinum as a matter of fact works, too).

Proof?

Here it is, a ‘dmesg’ from my Sony Vaio PCG-C1MGP booted off two gmirrored 256 MB USB sticks:

Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE #0: Fri Jan 12 10:40:27 UTC 2007
root@dessler.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Transmeta(tm) Crusoe(tm) Processor TM5800 (727.84-MHz 586-class CPU)
  Origin = "GenuineTmx86" Id = 0x543  Stepping = 3
  Features=0x80893f
real memory  = 251658240 (240 MB)
avail memory = 232452096 (221 MB)
kbd1 at kbdmux9
ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0:  on motherboard
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
acpi_ec0:  port 0x62,0x66 on acpi0
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
acpi_lid0:  on acpi0
acpi_button0:  on acpi0
[ output omitted ]
umass0: Sony USB Memory Stick Slot, rev 1.10/1.83 addr2
umass1: vendor 0x4146 USB Mass Storage Device, rev 2.00/1.00, addr 2
umass2: vendor 0x4146 USB Mass Storage Device, rev 2.00/1.00, addr 3
[ output omitted ]
da0 at umass-sim1 bus 1 target 0 lun 0
da0: <-pretec 256 MB 1.10> Removable Direct Access SCSO device
da0: 1.000MB/s transfers
da0: 242 MB (4964000 512 byte sectors: 64H 32S/T 242C)
da1 at umass-sim1 bus 2 target 0 lun 0
da1: <-pretec 256 MB 1.10> Removable Direct Access SCSO device
da1: 1.000MB/s transfers
da1: 242 MB (4964000 512 byte sectors: 64H 32S/T 242C)
GEOM_MIRROR: Device gm0 created (id=1986392903).
GEOM_MIRROR: Device gm0: provider da0 detected.
GEOM_MIRROR: Device gm0: provider da1 detected.
GEOM_MIRROR: Device gm0: provider da0 activated.
GEOM_MIRROR: Device gm0: provider da1 activated.
GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
Trying to mount root from ufs:/dev/mirror/gm0s1a

Of course it’s not incredibly fast, but it works after all, and that was the whole point :-)
Where could it be used? Possibly projects like FreeNAS, which support USB installs, could benefit from doing RAID1 on the sticks while also storing sensitive configuration data on them.
I could also imagine taking backups this way, e.g. keeping one working copy on the active stick in the computer while swapping in spare sticks, which then automatically rebuild the mirror.
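For the curious, this is roughly how such a mirror could be set up and how a stick could be swapped. Take it as a sketch only: gm0, da0 and da1 are the names from the dmesg above, da2 would be a hypothetical spare stick, and the DRY_RUN wrapper is my own addition so that nothing gets touched by accident. Making the mirror bootable takes additional steps that are not shown here.

```shell
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a mirror named $1 from the two providers $2 and $3.
create_mirror() {
    run gmirror load                       # load the geom_mirror kernel module
    run gmirror label -v "$1" "$2" "$3"    # write mirror metadata to both sticks
    run gmirror status "$1"
}

# Swap a stick: remove component $2 from mirror $1, then add $3,
# which gmirror synchronizes from the remaining component.
# (If the old stick was simply pulled, "gmirror forget" is the command to look at.)
swap_stick() {
    run gmirror remove "$1" "$2"
    run gmirror insert "$1" "$3"
}

create_mirror gm0 da0 da1
```

With DRY_RUN left at 1 this merely prints the three gmirror commands, so it is safe to try before running anything for real.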

I suppose this also works with Linux ‘md’ software RAID and NetBSD’s RAIDframe, though I’ve not tested it.

What about Windows? Definitely not with stock functionality. However as there’s also a way to patch software RAID1 functionality into Windows 2000 and XP, one never knows … ;-)

ufs_dirbad panic with mangled entries in ufs

Sunday, July 1st, 2007

FreeBSD’s ufs usually does an excellent job in preventing file system corruption. But even the best system happens to mess up once in a while.

One thing you may stumble across are so-called mangled entries, which are usually not fixable with fsck and result in kernel panics upon access.

Now these are usually a sign of severe file system corruption, often caused by hardware faults like bad memory modules, a faulty disk controller or even a defective hard drive.

Consider checking and replacing your hardware if you encounter mangled entries frequently or repeatedly.
You may actually succeed in fixing it by following the steps outlined below, but it is very likely to happen again if you have faulty hardware. In the end you’ll be curing the symptoms and not the actual cause, which may in turn lead to other, even more critical problems.

On the other hand, if you run into a corrupted file system like this only very, very rarely (as in “about once in a decade”; it happened to me only three times in the 10 years that I’ve worked on some 200-300 servers in total), you may risk fixing it by means of the file system debugger.

I define this as a “minor corruption” to which the following usually applies:

  • happens very, very rarely
  • happens as a result of a server deadlock/crash/power failure/etc
  • limited to one or maybe two directory or file entries
  • is possibly not even found by fsck
  • is not fixed by fsck
  • causes the server to panic when accessing given file or directory

When the error happens

A typical error message thrown at you in this case may look like this (some output omitted):

/mnt/da1s1a: bad dir ino 16392 AT OFFSET 512: MANGLED ENTRY
panic: ufs_dirbad: bad dir

The message gives some essential information: the file system concerned (the actual mount point, not the device name itself) and the inode of the directory or file.
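If you want to pull these two values out of a saved console line, something like the following sketch will do. Both helper names are made up for illustration, and device_for assumes mount(8)-style lines of the form "device on mountpoint (options)" without spaces in the mount point:

```shell
# Print "<mountpoint> <inode>" for a ufs_dirbad console line read from stdin.
parse_bad_dir() {
    sed -n 's/^\(.*\): bad dir ino \([0-9][0-9]*\) .*$/\1 \2/p'
}

# Print the device mounted on $1, reading mount(8)-style output from stdin.
device_for() {
    awk -v mp="$1" '$2 == "on" && $3 == mp { print $1 }'
}

echo '/mnt/da1s1a: bad dir ino 16392 AT OFFSET 512: MANGLED ENTRY' | parse_bad_dir
# prints: /mnt/da1s1a 16392

# Possible usage on the live system:
#   mount | device_for /mnt/da1s1a
```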

First steps in recovery

So the next best thing to do in this situation is to reboot into single user mode.
From there have fsck inspect the device first.

# fsck -y /dev/da1s1a
** /dev/da1s1a
** Last Mounted on /mnt/da1s1a
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counters
** Phase 5 - Check Cyl groups
60040 files, 464657 used, 423186 free (43252 frags, 47493 blocks, 4.9% fragmentation)

***** FILE SYSTEM MARKED CLEAN *****

Now since mangled entries are usually not fixed by fsck, the term “FILE SYSTEM MARKED CLEAN” should not be trusted.

You may risk bringing your system back up without any further work. However, if it panics again with the same message (mind the inode number), you are likely to have corruption that fsck cannot fix.

Optional but recommended: Try to crash the machine again

Personally I always try to crash the system before I touch the file system with the debugger, however not without taking a current backup first if at all possible.

The reasons for doing so are simple:

  • Trying to crash the machine will prove that the corruption still exists
  • Finding the bad entry through manual search may indicate why and where it happened
  • It may reveal additional corruption
  • It may prove that it cannot be fixed by fsck at all, no matter how often you run it

Finding the corrupted entry is easiest by walking the directory structure.

A simple command line like the following usually works well enough for this. To keep the impact as small as possible, run it from single user mode and only against the target device, mounted read-only.

# find / -type d -exec ls -ld {} \;

This will usually cause the system to panic again when accessing the corrupted directory.
If it does not, this method may:

# find / -type d -exec stat {} \;

If this still doesn’t work, you may mount the device read-write so the aforementioned commands can actually touch the file system to update file access times.

And if even that fails, creating a dummy file inside each directory will do for sure:

# find / -type d -exec touch {}/mydummyfilenamewhichshouldnotexist \;

Now it must be noted that doing this on an already corrupted read-write file system _is_ dangerous.

I cannot stress this enough:

Don’t take the risk if you don’t have a backup!
Don’t take the risk if you’re not aware of the consequences!
Don’t take the risk if you’re a newbie!

A panic in this situation could make it even worse!
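A small variation of the directory walk from above, again only as a sketch: it prints each directory name before touching it and stays on one file system, so the last path on the console is most likely the one that was being accessed when the machine went down. The function name walk_dirs is made up, and /mnt/da1s1a is simply the mount point from the panic message.

```shell
# Walk every directory below $1 without crossing into other file systems.
# -print runs before -exec, so the path shows up before ls touches it.
walk_dirs() {
    find "$1" -xdev -type d -print -exec ls -ld {} \;
}

# Possible usage, read-only mounted and from single user mode:
#   walk_dirs /mnt/da1s1a
```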

So, the system panics again…

Let’s assume the system panics again with the same error message.

If you were lucky enough you even saw which directory was last accessed before the panic.
This may be valuable to know if you run some certain type of application and could reveal yet unknown application errors or even vulnerabilities like temporary file creation race conditions.

/mnt/da1s1a: bad dir ino 16392 AT OFFSET 512: MANGLED ENTRY
panic: ufs_dirbad: bad dir

So you now have proof that there is (still) an unfixed corruption on the file system.
You also have proof that it happened at the same inode as before.
If it’s not the same inode, then you know for sure that there’s either another corruption or faulty hardware which causes excessive errors.

For the latter case remember what I wrote before about faulty hardware.

Right, now how to fix it?

To fix it go back to single user mode and re-run fsck just to make sure. Keep your device mounted read-only.

Then start the file system debugger, fsdb:

# fsdb /dev/da1s1a
** /dev/da1s1a
Editing file system '/dev/da1s1a'
Last mounted on /mnt/da1s1a
[output omitted]
fsdb (inum: 2)>

Now go to the inode which was mentioned during kernel panic to get some additional information.

fsdb (inum: 2)> inode 16392
current inode: directory
I=16392 MODE=40755 SIZE=512
           BTIME=Oct 23 11:47:24 2006 [0 nsec]
           MTIME=Oct 23 11:47:24 2006 [0 nsec]
           CTIME=Oct 23 11:47:24 2006 [0 nsec]
           ATIME=Oct 23 11:47:24 2006 [0 nsec]
OWNER=root GRP=WHEEL LINKCNT=2 FLAGS=0 BLKCNT=4 GEN=157338b7
fsdb (inum: 16392)>

Even if it results in data loss, clearing the inode is the way to go to get rid of this.

fsdb (inum: 16392)> clri 16392

Then exit the debugger:

fsdb (inum: 16392)> quit

**** FILE SYSTEM STILL DIRTY *****
*** FILE SYSTEM MARKED DIRTY
*** BE SURE TO RUN FSCK TO CLEAN UP ANY DAMAGE
*** IF IT WAS MOUNTED, RE-MOUNT WITH -u -o reload

Run fsck as told:

** /dev/da1s1a
** Last Mounted on /mnt/da1s1a
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
UNALLOCATED  I=16392 OWNER=root MODE=0
SIZE=512 MTIME Oct 23 11:47:24 2006
NAME=/dsj????

REMOVE=YES

** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counters
LINK COUNT DIR I=2  OWNER=root MODE=40755
SIZE=512 MTIME=Oct 23 11:47:24 2006  COUNT 21 SHOULD BE 20
ADJUST? yes

** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes

SUMMARY INFORMATION BAD
SALVAGE? yes

BLK(S) MISSING IN BIT MAPS
SALVAGE? yes

60039 files, 464655 used, 423188 free (43248 frags, 47492 blocks, 4.9% fragmentation)

***** FILE SYSTEM MARKED CLEAN *****

***** FILE SYSTEM WAS MODIFIED *****

This is it?

Basically yes.

However I recommend rebooting the system once more into single user mode to rerun ‘find’.
This will reveal if there is (no) further corruption. Also the reboot will ensure that the operating system can re-read the disklabel and file system properly. This is especially important after messing around with the file system debugger.
For this reason do run fsck once more just to make sure the file system is really clean.

Also keep these rules of thumb in mind:

  • one corruption may happen once in a while and really mean nothing
  • two is still possible but must be looked at critically
  • three is a bad sign, there’s usually more to come

Remember: The file system is at the heart of your server. Messing it up could compromise your data, your users and even your job. So care for it!

FreeBSD software RAID0: gvinum vs. gstripe

Thursday, June 7th, 2007

Some time ago I announced that I would review FreeBSD’s geom software RAID implementations.

Today’s article compares geom stripe (gstripe) with geom vinum (gvinum) for disk striping (RAID0).

All testing was done on the same hardware as before to get results comparable to previous tests.

Benchmarks were taken using stripe sizes of 64k, 128k and 256k and measured using dd, bonnie++ and rawio as before.

As for the technology, gstripe follows the same approach as gmirror, which I looked at previously.

# rawio benchmark results

rawio was chosen to measure I/O speed during concurrent access. rawio was set to run all tests (random read, seq read, random write, seq write) with eight processes on the /dev/stripe/* and /dev/gvinum/* devices.

Results for the single disk are provided as well to compare performance not only between the different frameworks but also against the native disk performance.

# dd benchmark results

dd was chosen to measure raw block access to the /dev/stripe/* and /dev/gvinum/* devices. dd was set to run sequential read and write tests using block sizes from 16k to 1024k.
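For reference, a sequential read sweep of this kind can be sketched as follows. This is not the exact script I used: the function name, the doubling of block sizes between 16k and 1024k and the device name in the usage comment are assumptions for illustration.

```shell
# Sequential read sweep over a range of block sizes.
# $1: device or file to read, $2: number of bytes to read per run.
dd_read_sweep() {
    target=$1
    total=$2
    for kb in 16 32 64 128 256 512 1024; do
        bs=$((kb * 1024))
        count=$((total / bs))
        printf '%sk: ' "$kb"
        # dd reports its statistics on stderr; keep only the summary line
        dd if="$target" of=/dev/null bs="$bs" count="$count" 2>&1 | tail -n 1
    done
}

# Possible usage, reading 1 GiB per block size from a stripe:
#   dd_read_sweep /dev/stripe/st0 1073741824
```

Reading the same amount of data for every block size keeps the runs comparable. A write sweep would look the same with if=/dev/zero and the device as of=, which of course destroys whatever is on it.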

# bonnie++ benchmark results

Finally, bonnie++ was used to measure pure file system performance.

# conclusion

Looking at raw disk access I must conclude that, overall, none of the frameworks beats single disk performance when it comes to blockwise input/output with dd.
gvinum generally performs better than gstripe except when using 256k stripe sizes.

Now since ‘dd’ is very synthetic by its nature, rawio is much better suited to show how the devices would perform in a more “real-life” situation.
Although the rawio benchmark results may look low, these numbers were achieved by running 8 processes at once. They reflect best what could be expected in a true multi-user environment with concurrent access.
Judging from the results there is no absolute winner: depending on the stripe size, either implementation outperforms the other.

Finally for bonnie++ we see some interesting results. Performance is almost identical for all implementations.
One notable exception was seen with gvinum (64k stripe size), which clearly outperformed its competitors.
One must keep in mind that the first six tests performed by bonnie++ (rand delete/read/create, seq delete/read/create) are limited by I/O performance of both the system bus and the device itself. The hardware I used for testing was capable of about 160 – 170 I/Os per second. I admit that results could be different if the tests are re-run on decent hardware with a higher I/O throughput. It’s possible that modern hardware reveals an I/O barrier for abstracted devices which cannot be seen from my tests.

Personally I prefer using gstripe over gvinum because of its simpler configuration approach. In terms of performance, gvinum seems to offer superior results when it comes to disk striping.

The next article will discuss gvinum and gstripe for RAID10.

FreeBSD’s loader fails with wrong harddisk geometry in BIOS

Friday, May 25th, 2007

It’s been a while since I last saw issues with FreeBSD’s loader(8).

The error I came along today read like this:

can't load kernel

The most obvious reasons would be that either the kernel is missing or its file name was specified incorrectly.
I thought I would verify this by issuing ‘ls’ in the first place, only to notice that it would show nothing.
Wait! This is supposed to show the root directory contents, isn’t it?

Maybe loader didn’t catch up with the devices? Let’s look at them:

lsdev
disk devices:
    disk0: BIOS drive C:
         disk0s1: FFS bad disklabel

Well, this is not supposed to happen at all.

Obviously loader can see the device but is unhappy with the disklabel.

Assuming the disklabel itself is not corrupt, which can easily be verified by booting the system from a recovery disk or a FreeBSD bootonly CD, there must be some other reason for this.

So I checked the disk in question for its logical configuration, which had been set up manually by means of fdisk and disklabel.
This and also the file system were intact and did not reveal any errors.
To make sure I even checked the primary boot blocks and re-installed them using the ‘boot0cfg -o packet ad0’ command to ensure use of LBA addressing.

Since it would still not boot from the device I checked with my BIOS where I noticed that it reported the disk’s translation mode as being “automatic”.

I suspect “automatic” in this context likely meant ‘CHS’ and not ‘LBA’, which is why loader(8) failed on the disk.

After changing BIOS disk translation mode to ‘LBA’ it finally worked out.