this post was submitted on 14 Dec 2023

Selfhosted


I recently got a few (5) hard drives to turn my home server into a NAS with TrueNAS Scale. My idea is to have 4 usable and 1 for redundancy. My question is: how does RAID work? What are RAID 0, RAID 5, software RAID, etc., and does any of that even matter for my use case?

[–] Nibodhika@lemmy.world 35 points 11 months ago (1 children)

You have a 5GB file:

RAID 0: Each of your 5 disks stores 1GB of that data in alternating chunks (e.g. the first disk has bytes 1, 6, 11, second disk has 2, 7, 12, etc), occupying a total of 5GB. When you want to access it all disks read in parallel so you get 5x the speed of a single disk. However if one of the disks goes away you lose the entire file.

RAID 1: The file is stored entirely on two disks, occupying 10GB, giving a read speed of 2x, and if any single disk fails you still have your entire data.

RAID 5: Split the file into only 3 chunks similar to above, call them A, B and C. Disk 1 has AB, disk 2 has BC, disk 3 has AC, and the other two disks have nothing. This occupies a total of 10GB and is read at most at 3x the speed of a single disk, but if any single one of the 5 disks fails you still have all of your file available. However, if 2 disks fail you might lose data.

That's a rough idea and not entirely accurate, but it's a good representation to understand how they work on a high level.
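The RAID 0 part above can be sketched in a few lines of Python. This is only a toy model of round-robin striping on in-memory byte strings; the function names `stripe` and `unstripe` and the one-byte chunk size are made up for illustration and do not correspond to any real RAID implementation.

```python
def stripe(data: bytes, n_disks: int, chunk: int = 1) -> list[bytes]:
    """Distribute data round-robin across n_disks in chunk-sized pieces."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % n_disks] += data[i:i + chunk]
    return [bytes(d) for d in disks]


def unstripe(disks: list[bytes], chunk: int = 1) -> bytes:
    """Reassemble the data by reading chunks back in the same round-robin order."""
    total = sum(len(d) for d in disks)
    offsets = [0] * len(disks)
    out = bytearray()
    d = 0
    while len(out) < total:
        out += disks[d][offsets[d]:offsets[d] + chunk]
        offsets[d] += chunk
        d = (d + 1) % len(disks)
    return bytes(out)


data = bytes(range(1, 16))      # "bytes 1 to 15"
disks = stripe(data, n_disks=5)
print(list(disks[0]))           # bytes held by the first disk
print(unstripe(disks) == data)  # all five disks present: data comes back
```

With all five "disks" present the data reassembles; drop any one entry from `disks` and every stripe has a hole in it, which is why losing one disk loses the whole file.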

[–] Aiyub@feddit.de -4 points 11 months ago

Better explanation of raid 5:

You have 5GB of data and 5 disks. You split your data into 4 parts and put one on each of the first 4 disks. Then disk 5 remembers, for each bit position, whether there is an odd or even number of 1s on the other disks. So whichever disk fails, you can work out what it held by checking whether the count was odd or even. So you lose 1 disk of capacity but keep the full capacity of the other disks. No doubling like suggested before.
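The "odd or even number of 1s" trick is the XOR operation, and a rough sketch of it fits in a few lines. The helper name `xor_blocks` and the sample data are invented for this example. Real RAID 5 also rotates the parity across all disks rather than keeping it on one dedicated disk, which this sketch ignores.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data "disks" and one parity "disk".
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# Pretend disk 2 died: rebuild it from the three survivors plus parity.
survivors = data[:2] + data[3:] + [parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[2])
```

Because XOR is its own inverse, the same function both creates the parity and rebuilds a missing block; it only works when exactly one block is missing.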

[–] tburkhol@lemmy.world 14 points 11 months ago (1 children)

Traditionally, RAID-0 "stripes" data across exactly 2 disks, writing half the data to each, trying to get twice the I/O speed out of disks that are much slower than the data bus. This also has the effect of looking like one disk twice the size of either physical disk, but if either disk fails, you lose the whole array. RAID-1 "mirrors" data across multiple identical disks, writing exactly the same data to all of them, again higher I/O performance, but providing redundancy instead of size. RAID-5 is like an extension of RAID-0 or a combination of -0 and -1, writing data across multiple disks, with an extra 'parity' disk for error correction. It requires (n) identical-sized disks but gives you storage capacity of (n-1), and allows you to rebuild the array in case any one disk fails. Any of these look to the filesystem like a single disk.

As @ahto@feddit.de says, none of those matter for TrueNAS. Technically, TrueNAS creates "JBOD" - just a bunch of disks - and uses the file system to combine all those separate disks into one logical structure. From the user perspective, these all look exactly the same, but ZFS allows for much more complicated distributions of data and more diverse sizes of physical disks.

[–] taladar@sh.itjust.works 8 points 11 months ago

RAID-6 is basically the same as RAID-5 but with two extra disks instead of one, allowing for any two disks to fail and giving you n-2 capacity.
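To put numbers on the n-1 and n-2 rule, here is a small back-of-the-envelope calculator. The function name `usable_tb` and the level labels are made up for this sketch, and it assumes identical disks and ignores filesystem and metadata overhead, so real usable space will be somewhat lower. The 5 x 2 TB figures are the drives the original poster describes elsewhere in the thread.

```python
# Disks' worth of capacity given up to parity for each striped level.
PARITY_DISKS = {"raid0": 0, "raid5": 1, "raid6": 2}


def usable_tb(level: str, n_disks: int, disk_tb: float) -> float:
    """Rough usable capacity for n identical disks (overhead ignored)."""
    if level == "raid1":
        return disk_tb  # every disk holds the same copy
    data_disks = n_disks - PARITY_DISKS[level]
    if data_disks < 1:
        raise ValueError(f"{level} needs more than {PARITY_DISKS[level]} disks")
    return data_disks * disk_tb


for level in ("raid0", "raid1", "raid5", "raid6"):
    print(level, usable_tb(level, n_disks=5, disk_tb=2.0), "TB")
```

The same arithmetic applies roughly to RAIDZ1 (like `raid5`) and RAIDZ2 (like `raid6`).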

[–] lemmyvore@feddit.nl 11 points 11 months ago (2 children)

If you're using TrueNAS it already has some types of RAID it wants to do. Assuming your 5 drives are the same size what you want is called RAIDz1 (1 standing for one drive worth of redundancy).

It is a type of RAID5, which means instead of having 5x usable storage you reserve 1x for redundancy information spread out across the 5, and get only 4x usable space.

Since you're a beginner you get the usual lecture: RAID is not backup. RAID allows a certain number of your drives to fail without losing any data; it spreads the risk of hardware failure.

RAID won't help if you delete a file or accidentally explicitly format the wrong drive or even the whole array, and won't help if the PC is stolen or struck by lightning or burns in a fire.

The solution used by TrueNAS (ZFS) has something called snapshots that can help with modified or deleted files.

For anything else you have to consider which of your files are "my world has ended"-level of important and backup to a HDD in a drawer, or to Blu Ray discs, or online to the cloud.

[–] Presi300@lemmy.world 2 points 11 months ago* (last edited 11 months ago) (2 children)

Thanks, and yes, the disks are all the same, speed, capacity, brand etc... I'm confused about the difference between RAID 5 and RAIDz1, they seem to do the same thing on the surface and looking at the other comments and online, one of them is probably what I'm gonna go for. The only thing I get about the 2 is that RAIDz uses ZFS and RAID 5 does not (?)...

My "NAS" is relatively powerful and read/write speeds aren't really a big deal for me, as it's gonna be bottlenecked by the 1 Gbps connection on my "NAS" (a PC I've scrambled together from handouts and cheap parts over the years)

[–] lemmyvore@feddit.nl 3 points 11 months ago (1 children)

I’m confused about the difference between RAID 5 and RAIDz1

RAID5 is the theoretical concept where you spread one drive worth of redundancy across multiple drives.

Traditionally this concept used to be implemented with special hardware cards that you plugged into your server and connected HDDs to it and it had its own BIOS where you managed the drives.

Later Linux implemented this concept (and other RAID concepts like RAID1, RAID6, RAID10 etc.) without the need of a special card. The Linux driver is called MD (Multiple Device). Actually Linux took it much further and can take any storage devices and do RAID with them: it can work with a whole disk but also with partitions, or with regular files formatted to look like a partition etc.

The cool thing about Linux MD is that it allows you to do any RAID combination that's logically possible, even if it's dumb and you'd never use it in real life, like RAID5 with 2 drives (normally you'd use RAID1 for that) or RAID1 with only one drive (no redundancy). Why? Because sometimes you need those dumb things. For example I have two RAID1 arrays and I noticed that both drives in one array show SMART signs that they might fail in the future (10% chance). Linux MD allowed me to remove one drive from each array and reconnect them to the other; now each array has one 100% healthy drive and one 90% healthy drive.

The uncool thing about Linux MD is that this is where it stops. It doesn't care what filesystem you use on the arrays and it has no other features. This makes its parity RAID implementations (RAID5, RAID6, RAID50, RAID60 etc.) vulnerable to sudden shutdowns (power failure or button off) because the drives may be left in disagreement about parity. To work around it you need a power UPS or a PCIe adapter card with a builtin battery, so that the parity is correctly written to the drives in case of power failure.
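The parity disagreement described above (often called the "write hole") can be shown with a toy example. This is a sketch with made-up two-byte "disks" and a hypothetical `simulate_write_hole` function, not a model of how MD actually schedules writes.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def simulate_write_hole() -> tuple[bytes, bytes]:
    d0, d1 = b"\x0f\x0f", b"\xf0\xf0"
    parity = xor(d0, d1)          # array starts out consistent

    d0 = b"\xaa\xaa"              # new data reaches disk 0 ...
    # ... but power is lost before parity is rewritten, so parity is stale.

    # Later disk 1 dies and is rebuilt from disk 0 and the stale parity.
    rebuilt_d1 = xor(d0, parity)
    return d1, rebuilt_d1


original, rebuilt = simulate_write_hole()
print(original == rebuilt)  # the rebuilt block silently differs from the original
```

Nothing in the array notices the mismatch at the time of the crash; the damage only shows up when the stale parity is used for a rebuild.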

You can get some extra cool features (like snapshots) by using a filesystem like BTRFS on top of a Linux MD array.

RAIDz1 is a completely different implementation of RAID5 which uses ZFS and originated in the Solaris/BSD world, not Linux. Linux gets to use it courtesy of the OpenZFS project as an external kernel module; it can never be included directly into the Linux kernel because of fundamental licensing differences. TrueNAS Scale is a Linux OS so it uses the module approach; there's also TrueNAS Core which is a BSD OS so it has native ZFS support. If you're only going to use your NAS as a NAS (for storage, not virtualization) I would recommend Core.

ZFS is both a RAID implementation and a filesystem; sort of like MD + BTRFS, but much more tightly integrated. ZFS has lots of extra features built-in: it has no write hole vulnerability; it has snapshots; it has compression; it can mark folders for special use cases and for example only activate compression on your documents folder but not on your movie folder.

The issue with ZFS is that it's much more complex and opinionated than Linux MD, so if you were to manage it directly yourself it would be a lot to learn. Even more experienced people have to think very carefully before using it. But since TrueNAS (both Scale and Core) have a user-friendly GUI that won't matter to you.

[–] Presi300@lemmy.world 1 points 11 months ago* (last edited 11 months ago)

I don't wanna use TrueNAS Core, as I'm not planning to use it as "just a NAS", I also plan to run a few other things on it, like pihole, searxng, wireguard, (maybe) nextcloud and a few other things. Other than that, I'm just not as familiar with BSD as I am with linux, nor do I particularly care to familiarize myself with it. As for ZFS, I'm still not sure about it, but looking at all the other options, it does look like the most straight forward and secure way to go, to me anyways...

[–] Voroxpete@sh.itjust.works 2 points 11 months ago (1 children)

RAIDz1 is just the ZFS version of RAID5. Since TrueNAS uses ZFS, RAIDz1 is what you'll be using (or RAIDz2 if you want an additional disk's worth of redundancy at the cost of a disk's worth of storage).

RAID5 isn't applicable to your situation. You'd have to be using a different OS, with a different default file system, for that question to matter.

What's the difference between RAIDz1 and RAID5? It's complicated. Without getting into the underlying specifics of how ZFS works, in short they're two different ways of achieving the same goal. To the end user, the differences are immaterial.

As an aside, if you're going to be using TrueNAS, which relies on ZFS for its storage technology, you're going to want to check if your disks use SMR or CMR. See this post for details; https://www.truenas.com/community/threads/WD-SMR-iX-Statement/

[–] ares35@kbin.social 1 points 11 months ago* (last edited 11 months ago) (1 children)

smr drives are horrible. we have some here. some by accident, others for cost savings (used only for long term, large file storage)--but all of smr's faults are really not worth it.. maybe at half the price per tb it might be--for some use cases, but not at current pricing.

the last batch we got in don't even support trim, so i guess the only way to 'clean up' zones is to literally dump everything off, secure_erase them, and 'start over'.

[–] lemmyvore@feddit.nl 1 points 11 months ago* (last edited 11 months ago)

TRIM is SSD technology and SMR is spinning-disk technology, you wouldn't find them on the same drive, would you? 🤔 Or do you mean a hybrid drive?

[–] Hopfgeist@feddit.de 1 points 11 months ago

To add, unlike "traditional" RAID, ZFS is also a volume manager and can have an arbitrary number of dynamic "partitions" sharing the same storage pool (literally called a "pool" in zfs). It also uses checksumming to determine if data has been corrupted. On redundant setups it will then quietly repair the corrupted parts with the redundant information while reading.
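The "checksum, then quietly repair from the redundant copy" idea can be sketched on a two-way mirror. Everything here is a simplified assumption: `read_with_repair` is a made-up name, the copies are in-memory byte arrays, and SHA-256 is just a stand-in, not a claim about ZFS's on-disk format or its default checksum.

```python
import hashlib


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def read_with_repair(copies: list[bytearray], expected: str) -> bytes:
    """Return a copy that matches the stored checksum and heal any that do not."""
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise IOError("every copy fails its checksum; nothing to repair from")
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good  # overwrite the corrupted copy in place
    return good


mirror = [bytearray(b"family photos"), bytearray(b"family photos")]
stored = checksum(b"family photos")  # kept separately from the data itself

mirror[0][0] ^= 0xFF                 # simulate silent corruption on one disk
print(read_with_repair(mirror, stored))
print(mirror[0] == mirror[1])        # the bad copy was fixed during the read
```

A plain mirror without checksums could tell that the two copies differ, but not which one is right; the stored checksum is what breaks the tie.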

[–] c0mbatbag3l@lemmy.world 9 points 11 months ago (2 children)

If you have four drives you can do RAID 6 assuming your controller supports it.

RAID 0 just puts your data on multiple drives, giving you higher read/write speeds but with no built in redundancy.

RAID 1 is just a copy, you have your data duplicated so that if anything fails there's an immediate copy. No increase in RW speeds.

RAID 5/6 use "parity data" which operates somewhat like RNA/DNA when going through mitosis. The four building blocks TCGA only pair up in fixed combinations, so even if you have half the data (RNA) you know what the other half is by logical extension. The difference is that RAID 5 needs at least 3 drives whereas RAID 6 needs at least 4; RAID 5 can only withstand the failure of one drive, but RAID 6 can handle the loss of two.

RAID 10 (one-zero, not "ten") does exactly what the name suggests, it combines the direct copy of RAID 1 with the striping of RAID 0 to give you double RW speeds with redundancy.

Each one will reduce your overall storage by a certain amount, either because of copying the data completely or taking up space for "parity data." The only one that doesn't do this is RAID 0, but you have absolutely no redundancy there, and if you're considering RAID for home use I'm going to assume that's important to you.

[–] Septimaeus@infosec.pub 4 points 11 months ago (1 children)

I liked the mitosis analogy. May I borrow it?

[–] c0mbatbag3l@lemmy.world 4 points 11 months ago

Might as well, I think it's how my instructor taught it when I was going through school.

[–] pory@lemmy.world 3 points 11 months ago (1 children)

I thought RAID1 enabled faster reads too, because both drives have the complete file. Writes don't get a speed bump ofc, since those are still bottlenecked by the slowest single drive in the array

[–] c0mbatbag3l@lemmy.world 2 points 11 months ago

That could be, I was trained in systems admin but work as a network engineer by profession. I've only set up one server in an enterprise environment and it was using RAID 6.

I'd assume you could read from both disks at the same time though.

[–] ahto@feddit.de 6 points 11 months ago

You won't be using a traditional RAID with TrueNAS Scale. You have a choice of Stripe, Mirror, RAIDZ1, RAIDZ2, RAIDZ3, dRAID1, dRAID2, dRAID3. The docs are very detailed, so you should read up on RAIDZ and the other types elsewhere, too.

[–] _danny@lemmy.world 5 points 11 months ago (1 children)

This is a good tool for visualizing your raid needs from your capacity and total number of drives.

https://www.seagate.com/products/nas-drives/raid-calculator/

I'll preface that I'm no raid expert, just a nerd that uses it occasionally.

The main benefit of most raid configurations is the redundancy they provide. If you lose one drive, you do not lose any data. It's kinda obvious how you can have 1:1 redundancy, you just have an exact copy of the drive. But there are ways to split data into three chunks so that you can rebuild the data from any two chunks, and 5 chunks so that you can lose any two chunks. Truly understanding how raid does this could easily be an entire college course.

Raid 0 is the exception. All it does is "join together" a bunch of drives into one disk. And if you lose an individual disk you likely will lose most of your data.

Another big difference is read/write speed. From my understanding, every raid configuration is slower to read and write than if you were using a single drive. Each raid configuration is varying levels of slower than the "base speed"

I typically use raid 5 or 6, since that gives some redundancy, but I can keep most of my total storage space.

The main thing in all of this is to keep an eye on drive health. If you lose more drives than your array can handle, all of your data is gone. From my understanding, there is no easy way to get the data off a broken raid array.

[–] Presi300@lemmy.world 3 points 11 months ago

I've mentioned it in another reply, but read/write speed isn't terribly important to me, as the whole thing is gonna be bottlenecked by a 1 Gbps connection anyways. From what I read from the other replies and online, RAIDz1 sounds like the thing I'm gonna go with, as it seems robust enough and my NAS is powerful enough for the performance hit to not really matter...
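For a sense of scale, the network ceiling is easy to work out. This is a rough sketch that treats 1 Gbps as 10^9 bits per second and ignores protocol overhead, so real transfers will be somewhat slower; the function names are made up for the example.

```python
def link_mb_per_s(gbps: float) -> float:
    """Theoretical link throughput in megabytes per second (8 bits per byte)."""
    return gbps * 1e9 / 8 / 1e6


def transfer_seconds(size_gb: float, gbps: float) -> float:
    """Best-case time to move size_gb gigabytes over the link."""
    return size_gb * 1e9 / (gbps * 1e9 / 8)


print(link_mb_per_s(1), "MB/s at most")
print(transfer_seconds(5, 1), "seconds for a 5 GB file, at best")
```

Any array layout that can sustain more than that figure will look the same from the other end of the cable.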

[–] Decronym@lemmy.decronym.xyz 5 points 11 months ago* (last edited 11 months ago)

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

| Fewer Letters | More Letters |
|---|---|
| NAS | Network-Attached Storage |
| PCIe | Peripheral Component Interconnect Express |
| RAID | Redundant Array of Independent Disks for mass storage |
| SSD | Solid State Drive mass storage |
| ZFS | Solaris/Linux filesystem focusing on data integrity |

5 acronyms in this thread; the most compressed thread commented on today has 6 acronyms.

[Thread #352 for this sub, first seen 14th Dec 2023, 12:15] [FAQ] [Full list] [Contact] [Source code]

[–] Huschke@lemmy.world 5 points 11 months ago* (last edited 11 months ago) (1 children)

[This is a good video that explains the basics and what raid setup you want for what kind of data.](https://youtube.com/watch?v=5K8szc9gDYw)


[–] poVoq@slrpnk.net 4 points 11 months ago* (last edited 11 months ago)

That is way too broad a question to be answered here, and it also depends on the file-system TrueNAS uses.

If I remember correctly it uses ZFS by default and you can easily find some articles explaining the different raid levels of OpenZFS online.

Edit: ZFS is not the same as other file-systems so not all of the general RAID info you can find online is 1:1 applicable for it (same with btrfs).

[–] xia@lemmy.sdf.org 2 points 11 months ago

0: "i don't care about my data."

1: "i REALLY care about my data"

5: "i'll trade you one drive now, for my data if one of the drives dies later"

[–] redline23@lemmy.world 1 points 11 months ago (1 children)

Other people gave a good explanation of raid and some alternatives like zfs in truenas.

You want to avoid RAID5 with drives above 4TB. Every hard drive can have an unrecoverable read error (URE) during a read. It's a very low percentage chance that your hard drive manufacturer publishes. During a raid 5 rebuild after replacing a drive, the other drives are stressed for a long time. With high capacity drives you have a pretty large chance of encountering a URE and losing the entire array. The high stress on the drives can also cause drive failure if another drive was on its way out.

I run truenas core at home in volumes that look like raid 10. Two mirror volumes striped together for performance.

I never played around with raidz1 (like raid 5) but you still have the chance of a URE during the resilver. I can't comment on whether it's possible or what happens during an error. I did see people recommending raidz2, which tolerates two disk failures, to avoid losing data during a resilver.

[–] Presi300@lemmy.world 1 points 11 months ago (1 children)

All my disks are 2TB so it shouldn't be a massive issue

[–] redline23@lemmy.world 1 points 11 months ago

I personally wouldn't use raidz1 because it seems too risky to me. I'd have higher redundancy.

Some links

https://www.truenas.com/community/threads/raidz1-vs-raid-5-ures.42598/

https://www.truenas.com/community/threads/5x-4tb-raidz1-array-rebuilding-with-nre-ure-issue.13719/

https://magj.github.io/raid-failure/

The last link is talking about actual raid and not zfs. But it has a 50/50 chance to lose the array with a URE rate of 1 in 10^14 bits read. Raidz1 maybe won't have that catastrophic of a failure, but you'd still be rolling the dice on some corruption.
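The rough odds can be reproduced with a simple model. This is a sketch under loud assumptions: every bit read fails independently at the spec-sheet rate of 1 in 10^14, and a rebuild has to read all four surviving 2 TB disks in full. Real drives often do better than the spec sheet, so treat the result as a pessimistic estimate, and `p_at_least_one_ure` is a made-up helper name.

```python
import math


def p_at_least_one_ure(bytes_read: float, bits_per_error: float = 1e14) -> float:
    """Chance of hitting at least one URE while reading bytes_read bytes."""
    bits = bytes_read * 8
    # 1 - (1 - 1/bits_per_error) ** bits, computed in a numerically stable way
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))


surviving_disks = 4
disk_bytes = 2e12  # 2 TB per disk
p = p_at_least_one_ure(surviving_disks * disk_bytes)
print(f"{p:.0%} chance of at least one URE during the rebuild")
```

Under these assumptions the number lands in the same ballpark as the 50/50 figure from the link; smaller disks or a better error rate bring it down quickly.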

[–] sj_zero@lotide.fbxl.net 0 points 11 months ago (1 children)

The level of raid is fundamental to the operation of your raid array.

As I recall, RAID 0 is striping. It will give you faster throughput because your array can pull values out of multiple drives at once. RAID 1 is mirroring. In that, half of the drives are used for data, and the other half are used to back up the first half. RAID 5 is parody, and that's what you're looking for. Essentially, your drives will mostly be used for storing data, with the last one used to track what information is on the other four, so you will have one drive for redundancy and the other four will be storing data.

Hardware raid versus software raid matters to the extent that parity calculations are relatively expensive and so if you're trying to do RAID 5 on software raid, that's going to eat up more of your CPU power and reduce your drive throughput.

I don't recall TrueNAS in particular, and what you're using the NAS for is really what is important, but I do recall that some NAS software doesn't even want you to be using hardware raid because it will be using its own software algorithms that are separate from what you would typically consider to be raid.

[–] valen@lemmy.world 1 points 11 months ago

Raid 5 is parity, not parody 😀. Each drive contains part of the information of the other drives, so that if any one of the drives dies, you can still get all the information (it will just be slower until you replace it and the system rebuilds the data on the new drive).

[–] yournamehere@lemm.ee 0 points 11 months ago (1 children)

this is what chatgpt is made for

[–] Presi300@lemmy.world 1 points 11 months ago (1 children)

already tried, did not help that much

[–] yournamehere@lemm.ee 1 points 11 months ago

and they say AI will take over the world...hm