Storage & Virtualization Using ZFS

Many options, here's what I do.

The previous article, Starter workstation for $200, contained some information about what I’m doing in terms of hardware. One of the key enablers for you to level up is the ZFS file system and volume manager. If you pair ZFS with the right mix of devices you can get a very capable storage subsystem. We’re not talking about just storing your files: the snapshot/rollback capability means you are free to abuse virtual machines without spending hours rebuilding one if you make a mistake.

This article is for those who fill the tech support role for themselves or for a group. Getting your head around this to the point of being able to create solutions is non-trivial, but implementing my recommendations is no more troubling than battling Windows to get it to take a new video driver.

Attention Conservation Notice: If you already have an advanced file system that handles snapshots, or if you’re stuck with macOS or Windows, this post is not for you.

Foundation:

We’ll assume you’ve got a system with 32GB of RAM and the capacity to take multiple storage devices. Dell Precision laptops can support both NVMe and SATA storage at the same time, and older HP workstations have half a dozen SATA ports and plenty of PCIe slots.

I learned ZFS by reading the FreeBSD Mastery: ZFS book. You should NOT do this unless you have a lot of experience with FreeBSD. Instead you will need to read Aaron Toponce’s ZFS for Linux guide, which is in the Tech folder on Dropbox. The original website is gone, which is strange, because I talked to him on Mastodon a few months ago and everything seemed fine.

There are some specific benefits we are aiming for by using ZFS:

  • Easy to mirror a pair of drives w/o RAID hardware.

  • Small amounts of SSD cache make large spindles feel much closer to flash for the data you touch often.

  • Datasets are logical slices of a volume akin to separate file systems.

  • Snapshots of ZFS datasets take less than a second.

  • Rollback of a snapshot is just a matter of seconds.

  • Creating virtual block storage devices for experiments is easy.
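
As a rough sketch of what those benefits look like at the command line, the Python helpers below only build the argument lists for the standard `zpool` and `zfs` commands; they do not run anything. The pool name `tank`, the dataset names, and the device paths are hypothetical placeholders, not anything from my systems, so substitute your own before handing a command to a shell.

```python
import shlex
from typing import List


def mirror_pool(pool: str, dev_a: str, dev_b: str) -> List[str]:
    """Create a pool from two drives mirrored in software."""
    return ["zpool", "create", pool, "mirror", dev_a, dev_b]


def add_cache(pool: str, device: str) -> List[str]:
    """Attach an SSD or NVMe partition as a read cache for the pool."""
    return ["zpool", "add", pool, "cache", device]


def create_dataset(dataset: str) -> List[str]:
    """Create a dataset, which behaves like its own file system."""
    return ["zfs", "create", dataset]


def snapshot(dataset: str, label: str) -> List[str]:
    """Snapshot a dataset under the given label."""
    return ["zfs", "snapshot", f"{dataset}@{label}"]


def rollback(dataset: str, label: str) -> List[str]:
    """Return a dataset to the state captured in a snapshot."""
    return ["zfs", "rollback", f"{dataset}@{label}"]


def create_zvol(volume: str, size: str) -> List[str]:
    """Create a virtual block device (zvol) of a fixed size."""
    return ["zfs", "create", "-V", size, volume]


if __name__ == "__main__":
    # Hypothetical names and devices, for illustration only.
    steps = [
        mirror_pool("tank", "/dev/sda", "/dev/sdb"),
        add_cache("tank", "/dev/nvme0n1p2"),
        create_dataset("tank/vms"),
        snapshot("tank/vms", "before-update"),
        rollback("tank/vms", "before-update"),
        create_zvol("tank/scratch", "10G"),
    ]
    for step in steps:
        print(shlex.join(step))
```

Building the commands as lists keeps the sketch safe to run anywhere, and the printed lines can be reviewed before you paste them into a root shell.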

If that doesn’t sound like English, the goal is simple. If you create a ZFS dataset to store a virtual machine, you can snapshot the machine before you make any major changes. This takes almost no time because ZFS just preserves a map of the blocks of the dataset. If you boot the VM and start changing things, ZFS gives you new blocks for the changes and keeps the old blocks. If you manage to wreck the VM, restoring it to the earlier snapshot takes just a few seconds. ZFS does not move data in bulk to do snapshots, so it’s hundreds to thousands of times faster than a traditional backup.
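
The block-map idea can be shown with a toy model. This is only a sketch of the copy-on-write principle, assuming a made-up `ToyDataset` class; it is not how ZFS is implemented internally (the real thing keeps a tree of block pointers rather than a Python dictionary), but it shows why a snapshot costs almost nothing and why a rollback moves no data.

```python
class ToyDataset:
    """Toy copy-on-write store: a block map pointing at blocks that are never overwritten."""

    def __init__(self):
        self.blocks = {}     # block id -> data, written once and then left alone
        self.live = {}       # logical offset -> block id (the current block map)
        self.snapshots = {}  # snapshot name -> preserved copy of the block map
        self._next_id = 0

    def write(self, offset, data):
        # Every write lands in a fresh block; the old block stays where it was.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.live[offset] = block_id

    def read(self, offset):
        return self.blocks[self.live[offset]]

    def snapshot(self, name):
        # Only the map is preserved; no block data is copied.
        self.snapshots[name] = dict(self.live)

    def rollback(self, name):
        # Point the live map back at the old blocks; again no data moves.
        self.live = dict(self.snapshots[name])


vm = ToyDataset()
vm.write(0, b"bootloader")
vm.write(1, b"working config")
vm.snapshot("before-update")

vm.write(1, b"broken config")   # the change goes to a new block
vm.rollback("before-update")    # the old block was never touched
print(vm.read(1))
```

After the rollback the read returns the working config, because the snapshot’s map still pointed at the original block the whole time.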

My #1 use case for this is the lone Windows 11 virtual machine I keep for Sentinel Visualizer. I have always found Windows to be fragile and erratic, ever since I started using Version 1.0 clear back in 1986. I literally changed careers to get away from Microsoft, and if I didn’t have the ability to snapshot/restore that VM, I would just avoid using Sentinel Visualizer altogether.

Storage Hardware:

When I started using ZFS in 2019, the Seagate Nytro 240GB is what I used for my systems. Keep in mind that I terrorize mass storage with things like ArangoDB and Elasticsearch, so I preferred paying $99 for this drive, which can handle three full drive writes per day, every day, for its five-year warranty period. A similar consumer drive at that time was about $40.

I had a cluster of HP Z620s in production at that time and I used Seagate IronWolf drives for volume storage. Since my desktop is in my bedroom, and I was running a low-voltage, liquid-cooled processor to limit fan noise, I chose 2.5” Western Digital Red 1TB spindles for storage. They are not produced any more, so the price has gone up. I think I paid $108 for mine.

The twin to my desktop is downstairs running Proxmox. This is a clusterable, headless cloud computing environment. I got into Proxmox because I have three Dell rackmount systems in storage right now, awaiting a new customer that will cover the $400/month for a cabinet at Hurricane Electric. The machine downstairs was meant to be R&D, but it’s become production for me.

The small SATA heavy duty Nytro drives are no longer made, so I put an XF1230 in that machine instead. These only do 0.7 total drive writes per day for their warranty period, but that is still a LOT more than a consumer drive.
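
To put drive-writes-per-day ratings in perspective, here is a small worked calculation. It is a sketch only: the 3 and 0.7 ratings and the five-year warranty come from the text above, while the 240GB capacity used for the 0.7 case is an assumption picked to make the two figures comparable, not the size of the drive I actually installed.

```python
def rated_writes_tb(capacity_gb: float, dwpd: float, warranty_years: float = 5) -> float:
    """Total data the rating covers over the warranty, in terabytes.

    capacity_gb    -- drive capacity in gigabytes
    dwpd           -- rated full drive writes per day
    warranty_years -- warranty period the rating applies to
    """
    days = warranty_years * 365
    return capacity_gb * dwpd * days / 1000


# 240GB at 3 drive writes per day, as described for the older Nytro.
heavy = rated_writes_tb(240, 3)

# Same capacity at 0.7 drive writes per day (capacity is an assumption here).
lighter = rated_writes_tb(240, 0.7)

print(f"3 DWPD:   about {heavy:.0f} TB over the warranty")
print(f"0.7 DWPD: about {lighter:.0f} TB over the warranty")
```

Under those assumptions the heavier rating works out to roughly 1,300TB of writes and the lighter one to roughly 300TB, which shows the size of the gap between the two classes of drive.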

The rack mounts are the same age as my workstations. I was contemplating improving their storage performance, so I bought a couple of small NVMe drives and PCIe carriers. The carriers claimed they would support any size of NVMe drive, but they’re not tapped for the movable mounting standoff, hence my ghetto fabulous pony tail band retainer. These 128GB drives were $9 and the carriers were $12.

Some of you will want to ask why I used these bottom of the barrel NVMe devices. There were two motives. The first is that the smallest Seagate IronWolf NVMe drive is a 500GB unit that costs $107. The second is that ZFS handles the loss of a cache drive transparently. Losing a drive downstairs is no big deal. Losing a drive at a hosting facility, where I have to pay the smart hands fee to get a replacement installed, is quite a different matter.

I partitioned the NVMe drives into quarters. This allows for swap space, ZFS read cache (L2ARC) space, a separate ZFS intent log (ZIL) device, and one spare partition for whatever. Usually when flash drives die they let go completely, but if one of these has just a single partition go bad I can easily rearrange things.
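
The quarters layout can be planned out like this. It is a sketch under stated assumptions: the device path `/dev/nvme0n1`, the pool name `tank`, and the helper names are hypothetical, and the sizes use the nominal 128GB figure, so real partitions will come out a little smaller. The `zpool add ... cache` and `zpool add ... log` forms are the standard way to attach those two roles; the swap and spare partitions are not handed to ZFS at all.

```python
from typing import Dict, List

ROLES = ["swap", "cache", "log", "spare"]  # one role per quarter of the drive


def quarter_plan(device: str, size_gb: float) -> List[Dict[str, object]]:
    """Split a drive into four equal partitions, one per role."""
    share = size_gb / len(ROLES)
    return [
        # Linux names NVMe partitions <device>p1, <device>p2, and so on.
        {"partition": f"{device}p{index}", "role": role, "size_gb": share}
        for index, role in enumerate(ROLES, start=1)
    ]


def zpool_commands(plan: List[Dict[str, object]], pool: str) -> List[List[str]]:
    """Build the zpool commands for the two partitions ZFS uses."""
    return [
        ["zpool", "add", pool, str(entry["role"]), str(entry["partition"])]
        for entry in plan
        if entry["role"] in ("cache", "log")
    ]


plan = quarter_plan("/dev/nvme0n1", 128)
for entry in plan:
    print(entry["partition"], entry["role"], entry["size_gb"], "GB")
for command in zpool_commands(plan, "tank"):
    print(" ".join(command))
```

If one partition goes bad, changing the order of `ROLES` and rebuilding the plan is enough to see which partition should take over which job.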

Clean Slate Start:

If I had a new workstation and could not reuse the drives I own, I would do the following.

  • Seagate Nytro XF1230 OR IronWolf 525 for boot volume and base system.

  • IronWolf 4TB spindles in a pair for storage.

The OR up there is due to the differences between the Z420 and Z440. I don’t think the Z420s will boot from NVMe storage on a PCIe card. I know the Z440 will do that with an HP Z-Turbo drive. If you can get a machine that will handle NVMe, that’s a big step towards future proofing your system.

I use mass storage drives in pairs because I hate data loss AND downtime. That’s just the ISP plant engineer in me talking. The ghetto fab NVMe install downstairs caches a lone refurbished HGST 10TB drive. The key data on there is mirrored to a 4TB IronWolf that is now in my desktop. If I find someone to cover the hosting cost, there is a mix of 4TB and 16TB IronWolf drives in the rack mounts, with Nytro 240s for boot. I will probably add IronWolf 525s on PCIe carriers to improve caching performance.

Conclusion:

Your time is valuable. If you spend an evening getting ZFS to run, you will have a system with snappy performance and you’ll get that time back, bit by bit, as you’ll be able to abuse virtual machines without facing a reinstall and data restoration.

None of the devices I mentioned in this article cost much more than $100. You can get to 4TB of mirrored high-performance storage for under $300 on drives with five-year warranties. If you’re willing to use refurb drives in pairs, you can accomplish the same with 10TB.

Netwar Irregulars Bulletin v2.0
Tool Time
Short articles and videos showing how to use the various tools that are mentioned in the Netwar Irregulars Bulletin.
Authors
Neal Rauhauser