I got 4 20TB drives from Amazon around Black Friday that I want to get setup for network storage. I’ve got 3 descent Ryzen 5000 series desktops that I was thinking about setting up so that I could build my own mini-Kubernetes cluster, but I don’t know if I have enough motivation. I’m pretty OCD so small projects often turn into big projects.
I don’t have an ECC motherboard though, so I want to get some input if BTRFS, ZFS, TrueNAS, or some other solution should be relatively safe without it? I guess it is a risk-factor but I haven’t had any issues yet (fingers crossed). I’ve been out of the CNCF space for a while but Rook used to be the way to go for Ceph on Kubernetes. Has there been any new projects worth checking out or should I just do RAID and get it over with? Does Ceph offer the same level of redundancy or performance? The boards have a single M.2 slot so I could add in some SSD caching.
If I go with RAID, should I do RAID 5 or 6? I’m also a bit worried because the drives are all the same so if there is an issue it could hit multiple drives at once, but I plan to try to have an online backup somewhere and if I order more drives I’ll balance it out with a different manufacturer.
Based on the hardware you have I would go with ZFS (using TrueNAS would probably be easiest). Generally with such large disks I would suggest using at least 2 parity disks, but seeing as you only have 4 that means you would lose half your storage to be able to survive 2 disk failures. Reason for the (at least) 2 parity disks is (especially with identical disks) the risk of failure during a rebuild after one failure is pretty high since there is so much data to rebuild and write to your new disk (like, it will probably take more than a day).
Can’t talk much about backup as I just have very little data that I care enough about to backup, and just throw that into cloud object storage as well as onto my local high-reliability storage.
I have tried many different solutions, so will give you a quick overview of my experiences, thoughts, and things I have heard/seen:
Single Machine
Do this unless you have to scale beyond one machine
ZFS (on TrueNAS)
FreeNASTrueNAS GUI.MDADM
BTRFS
UnRaid
Raid card
JBOD
Multi-machine
Ceph
SeaweedFS
Tahoe LAFS
MooseFS/LizardFS
Gluster
Kubernetes-native
Consider these if you’re using k8s. You can use any of the single-machine options (or most of the multi-machine options) and find a way to use them in k8s (natively for gluster and some others, or via NFS). I had a lot of luck just using NFS from my TrueNAS storage server in my k8s cluster.
Rook
Longhorn
OpenEBS
Wow this is an amazing explanation! I’ve been starting to lean towards UnRAID but I didn’t realize their fault tolerance wasn’t as vetted as my cohorts believe. I’ve been taking a painful option to keep an lvm on ubuntu for my drives, but I’m trying to get out of my mistake and build something for offsite so I can feel comfortable with redoing it all.
Thanks so much for the very detailed reply. I think at this point I’m conflicted between using TrueNAS or going all in and trying SDS. I’m leaning towards SDS primarily because I want to build experience, but heck maybe I’ll end up doing both and testing it out first and see what clicks.
I’ve setup Gluster before for OpenStack and had a pretty good experience, but at the time the performance was substantially worse than Ceph (however it may have gotten much better). Ceph was really challenging and required a lot of planning when I used it last in a previous role, but it seems like Rook might solve most of that. I don’t really care about rebuild times… I’m fine if it takes a day or two to recover the data as long as I don’t lose any.
As long as I make sure to have an offsite backup/replica somewhere then I guess I can’t go too wrong. Thanks for explaining the various configurations of Gluster. That will be extremely helpful if I decide to go that route, and if performance can be tuned to match Ceph then I probably will.