Singularity, Part I: Confessions of a Digital Pack Rat
Articles in this series
Singularity, Part I: Confessions of a Digital Pack Rat
Singularity, Part II: The Network Upgrade
Singularity, Part III: Construction
Singularity, Part IV: Performance Testing
Singularity, Part V: Conclusion
Prologue
My desk is many men's worst nightmare. Papers are strewn everywhere. Pencils pour from every crevice. Network charts that should be on the wall are buried under stacks of paper and CD-ROMs. That paperweight given to me as a gift is probably at the bottom of my trash bin--I trash anything that's not absolutely needed. If I didn't, I'd drown.
It's not dirty, you understand. It's cluttered. Disorganized.
It's a stark contrast to my digital world. Here I'm a meticulously organized pack rat. I keep everything I download--I probably still have the NVIDIA drivers for my original GeForce card somewhere under my vast hierarchical construct.
Such an addiction comes at a cost. My habit quickly outpaced available storage. Luckily for me we live in the future, where gigantic fileservers can be assembled from consumer parts. Where a digital pack rat can thrive.
This is the story of the singularity--the fileserver that will succeed sol and retire jupiter. The third incarnation of primal consumer storage.
Disclaimer
This journal is being written as we build this server. Hardware specs and plans may change as we run into problems we can't solve or realities we didn't consider. As with any project, contingency plans must be considered.
ZFS, Thumper and Moore's Law
While Moore's Law strictly describes transistor counts on chips, I'm convinced a similar law must exist for storage. Our first fileserver, jupiter, built five years ago, is an 8-drive EIDE system using a 3ware hardware RAID controller. With 200 GiB drives it provides just over 1.3 TiB of usable space.
Sol, our second fileserver, built two years ago, is a 12-drive (not counting mirrored OS drives) SATA system behind software RAID. With 320 GiB drives it provides around 3.3 TiB of usable space. Our new fileserver, using twelve 750 GiB SATA II drives, will provide almost 7.5 TiB of usable space with a higher level of redundancy than either of the servers before it.
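The capacity figures above follow from simple arithmetic: usable space is roughly the number of non-parity drives times the drive size. The sketch below illustrates that. The parity counts are assumptions, not something stated here: one parity drive for jupiter (RAID 5 style) and two for the new server (double parity, which would fit the "higher level of redundancy" remark). Filesystem overhead is ignored, and sol is left out because its layout isn't given.

```python
GIB_PER_TIB = 1024


def usable_tib(drives, drive_gib, parity_drives):
    """Approximate usable capacity in TiB for one parity-protected array.

    Ignores filesystem and metadata overhead, so real numbers come out lower.
    """
    if parity_drives >= drives:
        raise ValueError("parity drives must be fewer than total drives")
    return (drives - parity_drives) * drive_gib / GIB_PER_TIB


if __name__ == "__main__":
    # Assumed layouts (hypothetical): single parity for jupiter,
    # double parity for the new 12-drive server.
    print(f"jupiter:     {usable_tib(8, 200, 1):.2f} TiB")
    print(f"singularity: {usable_tib(12, 750, 2):.2f} TiB")
```

Under those assumptions the results come to about 1.37 TiB and 7.32 TiB, in the same ballpark as the figures quoted above.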
We don't currently need the space--sol has a good terabyte left in it should we clean up some backups, duplicate data and other assorted garbage. (When you have this much space, you start ripping VOBs of your DVDs instead of watching them directly, since it's so much more convenient.) From a purely geek point of view, however, we want the space.
You may have heard of Sun's new X4500 server named Thumper. This beast packs 48 TiB of storage in 4U of space. Sun's ZFS filesystem drives this warhorse.
Thumper is priced too steeply for a consumer, but the same concepts can be applied to lower-end equipment.
Solaris Reborn
Solaris today is where Linux was nearly a decade ago--aside from enthusiasts and major geeks it's not really accepted as an OS fit for widespread use. Although few would deny its power on expensive SPARC hardware, most professionals (myself included) viewed Sun's x86 version of the OS with contempt. We laughed at the concept of Solaris rivaling Linux in the low-end server market.
Yet this is exactly what's happening. While Linux delves deeper and deeper into the realm of user desktops, Sun is content to expand Solaris only as a server OS. Their development in recent years has provided us with tools and features that eclipse what's available on other OSes.
DTrace, for example. Or the replacement of SysV-style init scripts with SMF. And, of course, ZFS.
The Decision
This isn't to say Solaris x86 is without problems. Driver support is pathetic, especially on the SATA side. GNU tools are nowhere to be found unless you install them yourself.
Still, the delight of ZFS (among other things) made this an easy decision. I've never been a fan of Linux's implementation of software RAID (even if I do prefer it to most hardware solutions) and completely abhor LVM. This, combined with Linux's terrible NFS performance and general IO subsystem stupidities, has often made me consider alternatives in the past--but until now there simply haven't been any. Despite its problems Linux was still the best fit.
For anyone experienced in RAID setup and maintenance, reading Sun's docs on ZFS feels like finding the Promised Land.
ZFS Limitations
In early lab work one immediate concern with ZFS presented itself--the filesystem provides no practical extended attributes. Ugh.
A trick I've commonly used under Linux is to have a cronjob recurse through a filesystem nightly and set the immutable flag on every file. This provides an extra level of safety against user stupidity. Unless you deliberately go in and chattr -i a file, you cannot delete it. This option is better than simply marking a file read-only, as the owner (or root) can still delete or corrupt a read-only file without being prompted. If a file has the immutable flag set it cannot be touched. Ever. Not unless you remove the flag. (You can also limit the capabilities of the Linux kernel so this flag cannot be removed without a reboot.)
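A minimal sketch of what such a nightly job might look like, written in Python purely for illustration (the original cronjob isn't shown here, and a shell one-liner would do the same work). The function names and the injectable `run` parameter are hypothetical. The only external command used is `chattr +i`, which needs root and a filesystem that supports the flag.

```python
import os
import subprocess


def iter_regular_files(root):
    """Yield absolute paths of regular files under root, skipping symlinks."""
    for dirpath, _dirnames, filenames in os.walk(os.path.abspath(root)):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def make_immutable(root, run=subprocess.run):
    """Set the immutable flag on every regular file under root.

    `run` is injectable so the walk can be dry-run without root privileges.
    Returns the number of files processed.
    """
    count = 0
    for path in iter_regular_files(root):
        # Absolute paths never start with "-", so chattr won't read them as flags.
        run(["chattr", "+i", path], check=True)
        count += 1
    return count
```

Passing a recording function as `run` gives a dry run that lists what would be flagged before letting the real thing loose from cron.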
ZFS does not support this. What ZFS does support, however, are snapshots, which provide an even better level of user stupidity protection than does chattr.
The idea is to take five nightly snapshots. This will give us a five-day window to correct any stupidity at a minimal cost of disk space.
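One way the five-snapshot rotation could be planned is sketched below. This is an assumption-laden illustration, not the script actually used: the dataset name `tank/data` and the `nightly-YYYY-MM-DD` naming scheme are hypothetical, and the function only builds the `zfs snapshot` and `zfs destroy` command lines rather than running them.

```python
from datetime import date

PREFIX = "nightly-"  # hypothetical snapshot naming scheme
KEEP = 5             # five nightly snapshots, per the plan above


def plan_rotation(dataset, existing, today, keep=KEEP):
    """Return the zfs commands to run tonight for one dataset.

    `existing` is a list of snapshot names such as
    "tank/data@nightly-2007-01-01". ISO dates sort lexicographically,
    so a plain sort puts the oldest snapshot first. `keep` must be >= 1.
    """
    new_snap = f"{dataset}@{PREFIX}{today.isoformat()}"
    nightly = sorted(s for s in existing if s.startswith(f"{dataset}@{PREFIX}"))
    commands = []
    if new_snap not in nightly:
        commands.append(["zfs", "snapshot", new_snap])
        nightly.append(new_snap)
        nightly.sort()
    # Everything older than the newest `keep` snapshots gets destroyed.
    for old in nightly[:-keep]:
        commands.append(["zfs", "destroy", old])
    return commands
```

A cron wrapper would feed it the output of `zfs list -H -t snapshot -o name` and pass each resulting command to `subprocess.run`. Keeping the planning separate from the execution makes it easy to check what would be destroyed before anything actually is.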
Next -- Singularity, Part II: The Network Upgrade
With jupiter, we took few to no photos at all.
With sol we took a few (http://liquidx.deadgod.net/Nicks_Photos/Sol/all) but still not terribly many.
I will try to get a bunch of photos of this new server, which we have named Singularity.
And because LVM sucks total ass, that's why. You spend all your energy calculating correct stripe sizes and strides, then LVM comes along and says, "Dur! I'm going to ignore it!" Even on a single drive where this doesn't matter you'll suffer performance penalties of up to 40% using LVM.
Look, any logical volume manager is going to cost you performance--Sun's, Veritas, whatever. Obviously any time you add a new layer to the IO stack your performance will decrease.
Of all the volume managers I've used, however, LVM seems to incur the most overhead for the fewest features. It's a steaming pile of horse shit.
^^ This is the case I have my eye on.
The internals are the same as sol's, but with a different external look.
Also it adds a side fan.