Storage VM: Part 1 – Server Components

I have thought about how to proceed with this project, and I have decided to break each VM I am building into its own series.  This is not really separate from the overall goal; it is more of an organizational thing.  I will often work on these VMs in no particular order, based on which one I think I can make the most progress on, or which one I am currently frustrated with and want to avoid for a bit.  In that vein, I want to start with the largest part of the project, the Storage VM.

The Old Storage Server

The old storage server is a FreeNAS deployment of 3 vdevs.  A vdev is akin to a RAID array, which I will go into in more detail later.  It has one vdev of 8 x 2.0 TB Western Digital NAS hard disk drives, one vdev of 8 x 4.0 TB Western Digital NAS hard disk drives, and one vdev of 8 x 8.0 TB Western Digital NAS hard disk drives.  Additionally, it has a write log of 2 x 64 GB Kingston SSDs and an L2ARC cache of 1 x 256 GB Samsung SSD.
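For a sense of scale, here is a rough capacity calculation for that layout.  The RAID-Z2 parity level is my assumption for illustration only; treat the usable numbers as a ballpark.

```python
# Rough capacity math for the old pool.  The RAID-Z2 parity level is an
# assumption for illustration; "usable" ignores ZFS metadata overhead.
vdevs = [
    (8, 2.0),   # 8 x 2.0 TB WD NAS HDDs
    (8, 4.0),   # 8 x 4.0 TB WD NAS HDDs
    (8, 8.0),   # 8 x 8.0 TB WD NAS HDDs
]
PARITY_DRIVES = 2   # RAID-Z2 (assumption)

raw = sum(count * size_tb for count, size_tb in vdevs)
usable = sum((count - PARITY_DRIVES) * size_tb for count, size_tb in vdevs)

print(f"raw capacity:     {raw:.0f} TB")                          # 112 TB
print(f"usable (approx):  {usable:.0f} TB before ZFS overhead")   # 84 TB
print(f"usable (binary):  {usable * 1000**4 / 1024**4:.0f} TiB")  # ~76 TiB
```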

When I first created this deployment, I had one primary goal in mind for how I structured it: being able to add hard drive space without losing data.  At the time, there were really only 2 file systems that supported increasing in size (that I was aware of), ZFS and XFS.  I considered XFS because it supported increases in size without needing hard drives of the exact same size.  This appealed to me because I did not have a uniform set of HDD sizes.  However, after doing some research, I found there is a cost associated with not matching sizes, usually in latency.

If I just wanted as much capacity as possible, mixing and matching could be done, but if I cared about performance in any realistic sense, or even potential RAID rebuilding issues, it was, and is, vastly superior to use the exact same drives for everything.  I decided on ZFS because it did technically support increasing in size by replacing every drive in a vdev with a higher capacity drive; the last resilver allows use of the extra space, which I have done once.  ZFS also had a couple of nice options for web control interfaces.  After doing all of this by hand in configuration files with mdadm, that appealed to me.  ZFS also includes a great deal of extra high availability and reliability features.  I still don’t regret this decision, as XFS hasn’t died out or anything, but ZFS appears to have grown much faster.
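For reference, that grow-by-replacement process is just a series of replace-and-resilver steps using the standard zpool commands.  Below is a minimal sketch of the workflow; the pool name and device paths are hypothetical placeholders, not my actual configuration.

```python
# Minimal sketch of the grow-by-replacement workflow, driving the standard
# zpool CLI via subprocess.  Pool name "tank" and the device paths are
# hypothetical placeholders.
import subprocess

POOL = "tank"
OLD_NEW_PAIRS = [
    ("/dev/gptid/old-disk-1", "/dev/gptid/new-disk-1"),
    # ...one pair per drive in the vdev being upgraded
]

def run(*args):
    print("+", " ".join(args))
    subprocess.run(args, check=True)

# Let the pool claim the extra space automatically once the last resilver finishes.
run("zpool", "set", "autoexpand=on", POOL)

for old, new in OLD_NEW_PAIRS:
    run("zpool", "replace", POOL, old, new)
    # In practice, wait for "zpool status" to show the resilver finished
    # before swapping the next drive.
    run("zpool", "status", POOL)
```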

Chassis:

I had originally thought to just transfer all the drives to the new system.  That was the original thought behind the SC846E1-R1200 chassis: it was going to be the new storage chassis.  It was a slight mistake, because the old storage chassis basically has no spare room for anything else.  All drive bays are full, and I have 4 SSDs just lying loose on the inside, not actually in any caddies.  There aren’t even any 5.25in or 3.5in bays to put them in.

SC846E1-R1200

Once I had it in hand, put the motherboards inside it, and saw what I would be working with, the mistake I had made became clear.  I needed more drive bays, or at least more space to work with.  I bought the second chassis because of its PSUs (1400 watts is probably enough for even the larger estimates).

As I was working through this I encountered the Storinator [1] on Linus Tech Tips [2].  I had no idea such a thing existed.  Top loading chassis?  45 drives?  That was a lot. 

It turns out this is kinda where the storage space has gone in the last few years.  Supermicro has chassis that are top loaders like this [3] supporting 45, 60, even 90 HDDs.  These are very expensive, and I could not find any for sale on the secondary market. 

Storinator Server

I went and priced out a Storinator, which turned out to be quite expensive, well more than I am wanting to pay.  Then a thought occurred to me: I sincerely doubt that the Storinator’s maker manufactures this chassis itself.  It is fairly well known in the hardware industry that numerous manufacturers are other manufacturers underneath.  They will buy some white-label part, or contract a fabrication shop, then skin and customize from there.

I did some research on who manufactures the Storinator chassis; unfortunately, I could not confirm who does the manufacturing.  I do have a suspicion.  The Storinator chassis may be manufactured by Chenbro.  Now, I am not confirming this, and I would not at all be surprised if I turned out to be wrong on this point.  I just encountered some forum speculation, and I think the argument can be made.  The Storinator might be a reskin of the Chenbro RM43348 [4].  There is a lot of similarity in the styles.  If it is not a reskin, they are clearly taking inspiration from each other (or some other common source) on how to design storage chassis.

Chenbro RM43348

Now, one thing I am sure of: the Chenbro RM43348 is the IBM Cloud Object Storage Slicestor 3448 [5].  That is part of why I think the Storinator might be a reskin, because Chenbro is already doing this and letting IBM white-label it.  Once a single white label is confirmed, it’s not much of a stretch to think another manufacturer might be doing it.  That being said, there are distinct differences, such as fan location and the back-of-chassis design, which may indicate this is nothing more than speculation.  For my purposes though, this search led me to discovering the RM43348 and the Slicestor 3448.

There were people on eBay selling brand new, barebones Slicestor 3448s with a motherboard, two LSI 3008-8i cards (HBA cards for operating the 48 drives), and two 800-watt redundant PSUs (no processor or drives).  These were clearly being sold for parts, and whoever was using them was moving on.  For me though?  This was a great catch.

There were no rails (for putting it in the rack), however, so I ended up having to contact a Chenbro reseller and special order rails for the RM43348, and they worked perfectly.  The rails cost half as much as the whole getup did on eBay.  I was told Chenbro NR40700 rails, or other large 4U Chenbro rails, had a good chance of working, but there were none for sale anywhere.  This was in the middle of the Covid lockdown, so it’s probably a manufacturing or logistics issue rather than a lack of will to sell them.

Chenbro and I have an odd history.  They tend to be a cheap manufacturer.  Their chassis aren’t always the easiest to work with, and my previous 4U chassis from them had only 1 PSU.  It also had a lot of screws to move around to get to everything, especially the front bays.  That experience was 6 years ago, though, and if IBM is willing to work with them, that gives me a lot of confidence here.

One hang-up, though: the 800-watt power supplies.  My first thought was, well, maybe I could buy 1400-watt power supplies from Chenbro?  No luck; only the 800-watt units exist for this chassis, or at least I couldn’t find any reference to other power supplies that fit.  I will say I didn’t look too hard, because of another problem.

The bigger issue was that the back panel where the motherboard sits is only about 2.5 rack units tall.  This meant that full-height PCIe cards do not fit in the back.  Even if I replaced the motherboard with one that works for me and put it all together, most PCIe cards would not fit.

Free Hanging SSDs

Alright, even without the ability to make this the chassis for the whole project, this is still a really good find.  I did some thinking on this.  It is unrealistic to expect to rebuild the entire storage system in a 24-bay chassis without any extra space for other drives to be used by the other VMs.  My current storage system has a bunch of SSDs literally floating in space.  This is not ideal, but I didn’t have anywhere else to stick them.  At the time I was happy to accept this compromise; it worked, and since SSDs don’t have moving parts, it didn’t hurt their performance.  This time I am hoping to do better.  If my plan is to migrate from the old system to the new one, I will need more space.  The Slicestor chassis at least has plenty of expansion room.

At this point I had a thought: an expansion chassis.  I have never actually done this.  I found a forum post that suggests something like that could work [6], and it even referenced this specific chassis.  Power requirements for 24 hard drives are also a concern here.  Hard drive power requirements aren’t large, but adding another ~10 watts per drive is 240 watts.  That’s pretty tough on an already tight power budget.  So I decided I don’t think I can turn this into a chassis for the whole project, but I could turn it into a nice expansion chassis for just the hard drives of the storage system.  So I bought it.
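To put numbers on that concern, here is the back-of-the-envelope math.  The ~10 W per drive figure is the rough estimate used above; actual draw varies by model and spikes at spin-up.

```python
# Back-of-the-envelope power budget for 24 more hard drives.
# ~10 W/drive is a rough steady-state estimate; spin-up draw is higher.
HDD_COUNT = 24
WATTS_PER_HDD = 10
EXPANSION_PSU_WATTS = 800   # the Slicestor's stock redundant PSUs

hdd_load = HDD_COUNT * WATTS_PER_HDD
print(f"Extra load from {HDD_COUNT} HDDs: {hdd_load} W")
# As an expansion chassis, that load lands on the Slicestor's own PSUs
# instead of the main server's power budget.
print(f"Headroom left in one {EXPANSION_PSU_WATTS} W PSU: {EXPANSION_PSU_WATTS - hdd_load} W")
```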

Alright, I needed to research the requirements for an expansion chassis.  First off, I needed a control board.  This is like a motherboard, except it only provides basic power controls.  It would be nice to be able to slave this board off of the motherboard housing the actual processor and other components, but I don’t think that is too realistic.

In fact, one of the reasons I felt okay with purchasing the Slicestor chassis was that I had found a Supermicro JBOD chassis control board: the CSE-PTJBOD-CB2 [7].  It is very hard to get direct information from Supermicro on this board, as they seem to only sell it in their own chassis and expansion chassis products.  I don’t think it was intended to be an end-user product.  Most information is found in chassis manuals rather than in a separate manual for the board.

CSE-PTJBOD-CB2

This seemed like a perfect fit.  It has the standard ATX power supply input, a series of control pins for power on, and a few extra inputs for fans and more.  I did not purchase it at the same time; I wanted to confirm that the Slicestor couldn’t serve as the larger chassis before buying it, just in case what I had read was wrong.  It was not.

The second requirement for an expansion chassis was converting the internal connectors to external ones.  Since there would no longer be a motherboard inside, connections couldn’t terminate within the chassis.  I would need to take the internal connections and convert them to the external connectors expected on an expansion chassis.

Internal mini SAS-HD to External mini SAS-HD

The Slicestor uses mini-SAS connectors for the hard drives.  The wiring it comes with is for internal mini-SAS connectors, not external ones.  I bought a pair of converters, as there are four mini-SAS connectors within the chassis (two for each backplane).  Star Tech makes just such an item [8]; it mounts in a low-profile bracket slot in the old motherboard area of the expansion chassis.  Plug in the internal mini-SAS connectors, and now I had external connectors, turning this into an expansion chassis.

Internal SATA to External eSATA

I also added a pair of eSATA connectors [9] just for good measure, as there is a 2.5in SATA carriage in the Slicestor as well.  With that, all of the internal connectors have been converted to external ones.  Adding the control board, I think I have all I will need to convert this into an expansion chassis.

Now, let’s talk about a mistake I made at this point.  I was researching the expansion chassis, and forums were recommending I add caddies for the hard drives.  This makes a lot of sense if one is talking about hard disk drives.  These spin, and because of their physical spinning they create momentum and vibration.  Therefore it isn’t a great idea to just leave them loose, without fitting rather snugly or at least being screwed in.  I did a fair bit of testing on this, and I came up with 2 options.  I ordered a Dell OptiPlex caddy and a Supermicro MCP-220-94601-0N [10] caddy.

Dell OptiPlex
Supermicro MCP-220-94601-0N

I was attempting to figure out which caddy would fit the Slicestor chassis, as I did not get any caddies with the purchase.  I found that both work, with varying degrees of success.  The biggest limitation of the Dell OptiPlex caddy is that it needed to be cut, whereas the Supermicro worked right out of the box.  Also, the Dell only really works with 3.5in drives, whereas the Supermicro works with both 3.5in and 2.5in drives (the latter more common for SSDs).  After testing and thought, I vastly preferred the Supermicro caddy, as it was the better designed of the two and could be used for both sizes of drives.

As it stands today, I am using none of the 48 Supermicro caddies I purchased.  I ultimately decided to use 2.5in SSDs for this deployment, and they work better just standing straight up than in these bulky caddies, which barely align right on the edges for 2.5in drives.

Host Bus Adaptor (HBA):

The next piece I needed to put together was the HBA, or Host Bus Adaptor.  Remember back in the requirements post of this series when I was thinking that this technology may have advanced?  Well it has, I think, but not in an extremely noticeable physical way.  The current technology is very parallel, with 4 channels per connector of up to 12 Gb/s each (48 Gb/s total), and over 1000 devices addressable.  The same basic infrastructure still works; it has just gone very wide.
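For the bandwidth math, a wide port is just 4 lanes multiplied out; the usable figure below assumes SAS3’s 8b/10b encoding overhead.

```python
# Wide-port arithmetic for a mini-SAS HD connector: 4 lanes x 12 Gb/s.
LANES = 4
GBPS_PER_LANE = 12
ENCODING_EFFICIENCY = 0.8   # SAS3 uses 8b/10b encoding

line_rate = LANES * GBPS_PER_LANE
usable = line_rate * ENCODING_EFFICIENCY
print(f"line rate: {line_rate} Gb/s per connector")                        # 48 Gb/s
print(f"usable:    ~{usable:.0f} Gb/s (~{usable / 8:.1f} GB/s) per connector")
```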

LSI 3008-8i PCI card

Checking into everything, I didn’t want to decide yet exactly which end software I’d be using.  I am still partial to FreeNAS and ZFS.  Seeing that Gluster can be built on top of FreeNAS (or ZFS, really) [11], I wasn’t worried about being unable to test things out later.  A quick look at recommended HBAs for FreeNAS [12] shows the LSI 3008 series, of which I got two from the chassis purchase, is still considered pretty good.
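For the curious, the Gluster-on-ZFS approach in [11] boils down to using ZFS datasets as Gluster bricks.  Below is a minimal sketch of that idea; the pool, dataset, volume, and host names are hypothetical placeholders, not anything I have deployed.

```python
# Minimal sketch of the Gluster-on-ZFS idea from [11]: ZFS datasets serve as
# Gluster "bricks".  Pool/dataset/volume/host names here are hypothetical.
import subprocess

def run(*args):
    print("+", " ".join(args))
    subprocess.run(args, check=True)

# 1. Carve out a dataset on the existing pool to hold a brick directory.
run("zfs", "create", "tank/gluster")

# 2. Create and start a replicated Gluster volume across two hosts,
#    pointing each brick at a directory inside the dataset's mountpoint.
run("gluster", "volume", "create", "gv0", "replica", "2",
    "host1:/tank/gluster/brick", "host2:/tank/gluster/brick")
run("gluster", "volume", "start", "gv0")
```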

Now here is where I am a little worried.  I have not seen anything new on the HBA front in 5 years [13].  The 3008 is old, very old from a technology perspective.  I am concerned, even now, that I have missed something in this space.  I have even asked on forums directly if I am missing something.  So far nobody has answered.

The fact that the brand new (even if sold for parts) Object Storage Slicestor from IBM is still using this technology makes me feel a little better about this, and the fact that I can’t seem to find any new configurations on review sites, save for the drives themselves, makes me think the technology kinda stagnated.

In other words, I think the storage space broke apart into slow-ish large arrays for enterprise deployments on one side, and newer NVMe M.2 drives on the other.  It used to be that changes in consumer and enterprise storage benefited each other.  With M.2 taking over on the consumer side, and not being anywhere near as scalable for enterprise systems (see the PCIe lane limitations on both AMD and Intel), they split.  I think enterprise hasn’t really needed much beyond these specifications because networking bottlenecks have been a bigger problem.  Even the large SAS3 JBODs currently for sale on multiple manufacturer websites have not eased my concern that I am missing something.

With all of this in mind, I purchased two LSI SAS 9300-8e 8-port 12 Gb/s SATA+SAS HBAs [14].  These would go into the main virtualization server and connect to the external mini-SAS ports that I would be creating on the expansion chassis.  I have kept the two LSI SAS 9300-8i HBA cards just in case they become useful; worst case, I can always sell them on eBay.  I should note that I am aware of the LSI SAS 9305-16e [15], and am considering purchasing one if I want or need to reclaim a PCIe slot.

The original chassis had a specific design in mind, I think: the drives were mirrored between the two HBA cards.  Each card controlled one batch of hard drives, and the other card controlled the other batch, in a RAID 1+0 mirror.  This way, if an HBA card failed, the system would still be up and available.

This is not wholly necessary for my purposes, as a card failure would not necessarily result in the loss of any data already confirmed as written to the hard drives; the system would just be unavailable until the card was replaced.  That is not a big concern for my home lab, as I don’t need the 100% availability an enterprise deployment may require.  However, I decided to stick with what the system designers had in mind until I have a better understanding of the trade-offs involved here.  I suspect the LSI SAS 9305-16e may be the better option for my deployment.
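To make the mirrored-across-controllers idea concrete, here is a rough sketch of how such a RAID 1+0 pool could be laid out in ZFS, pairing one disk from each HBA in every mirror.  The pool name and device names are hypothetical, not from my actual system.

```python
# Hypothetical sketch of the "mirror across HBAs" layout: a striped set of
# mirrors (RAID 1+0) where each mirror pair has one disk behind each HBA,
# so a single HBA failure leaves every mirror with a surviving member.
hba_a = [f"da{i}" for i in range(0, 12)]     # disks attached to HBA card A
hba_b = [f"da{i}" for i in range(12, 24)]    # disks attached to HBA card B

cmd = ["zpool", "create", "tank"]
for disk_a, disk_b in zip(hba_a, hba_b):
    cmd += ["mirror", disk_a, disk_b]

print(" ".join(cmd))   # the zpool create command this layout would produce
```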

Storage Drives:

When I first put the system together, I got the whole thing working and built with literally no drives ready for use, or even ordered.  The next post will cover the build challenges.  I verified everything was working with old used hard drives.  I got the entire system to boot into a Windows 10 install on an old Samsung 860 Pro SSD that was stuck in a slot of the chassis.  That proved I had put all of it together correctly.  However, I still hadn’t decided on exactly what I wanted to build for the new storage VM.  So let’s talk about the options I saw.

Okay, I’ve already covered what my old storage system looks like.  I won’t repeat that here.  The first option is to just pull those old hard disk drives out, stick them into the expansion chassis, and go.  This was my original plan.  I didn’t really see the need to update much beyond this.  However, my career since then has taught me exactly how bad these kinds of setups are for applications. 

A common Western Digital 4.0 TB NAS drive has a measured 137 IOPS and a peak latency of over a second [16].  I think back to my old storage system use cases.  I can get the 600 MB/sec indicative of that full 10G bandwidth when transferring a large single file.  But when I am performing a backup, or copying install files en masse (an install directory, say), it is very slow.  I even once tried running an install remotely against this storage array and found it basically unusable.
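A little arithmetic shows why that number hurts so much for small-file work, even though large sequential transfers still fill the 10G link.

```python
# Small random I/O vs. large sequential transfers on the same HDD.
IOPS = 137   # measured random IOPS from the WD Red 4 TB review [16]

for block_kib in (4, 64, 1024):
    mib_per_s = IOPS * block_kib / 1024
    print(f"{block_kib:>5} KiB requests: ~{mib_per_s:7.1f} MiB/s")

# Compare with the ~600 MB/s seen on large single-file transfers over 10 GbE.
```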

It works well for basically one case, streaming the media I own, and even then there is sometimes some stutter.  Everything else has to be an overnight task, which is fine for most of my purposes, but it’s not really ideal.  It was as I mulled over these failures that I decided: it’s been 6 years, SSDs are mainstream, and SATA SSDs at least are even being replaced by NVMe drives.  It is probably time to upgrade the whole darn thing to SSD arrays.  That is when I began researching SATA and SAS SSDs for replacing my whole array.

The first thing I had to consider was size: my current array uses 37 TB of space and has about 36 TB of space still available.  The free space has not shrunk much over the years.  I stored about 30 TB when I first put it all together and have maybe added 1 TB a year.  Barring some gigantic new project that just needs that kind of space, I don’t think I need to increase the size of the array at all, and could even stand to lower it.

A quick search of larger SSDs shows that 8 TB isn’t really a reasonable option.  They do exist [17][18][19], but they have a few decidedly negative factors: they sit badly on the price curve (dollars per TB), and they are all enterprise drives.  This tells me that this is more of a niche technology right now.  They are pretty much data center exclusive, and probably only for specific applications as well.  The lack of consumer options pretty much guarantees a much more expensive drive.

Like I said before, I’m not really opposed to enterprise drives.  I like them.  However, I do have 48 drive slots.  I should be willing to use more of these slots to achieve a higher capacity than I normally would.  So I tentatively settled on 24 x 4 TB SSDs.
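As a sanity check that 24 x 4 TB comfortably covers my ~37 TB of current use, here is the usable capacity under a few hypothetical ZFS layouts; at this point I had not committed to any of them.

```python
# Usable capacity for 24 x 4 TB SSDs under a few hypothetical ZFS layouts.
SIZE_TB = 4.0

layouts = {
    "3 x 8-wide RAID-Z2": 3 * (8 - 2) * SIZE_TB,
    "4 x 6-wide RAID-Z2": 4 * (6 - 2) * SIZE_TB,
    "12 x 2-way mirrors": 12 * 1 * SIZE_TB,
}
for name, usable_tb in layouts.items():
    print(f"{name:20s} ~{usable_tb:.0f} TB usable (current use: ~37 TB)")
```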

There were a lot of options here.  My favorite review site for this search is storagereview.com, but I also hit up eBay and looked for what was for sale.  One thing to be aware of is that most enterprise equipment comes with either a 3 or a 5 year service warranty.  After that point, large companies look to liquidate and upgrade.  They do this because they don’t want to be on the hook for maintaining these systems without vendor support.  This is why I tend to target what the enterprise market looked like 3 years ago; that is where deals are to be made.  It took quite a bit of work researching, looking through options, and frankly a bit of luck, but I found the perfect overview of this in a Dell datasheet.

It lists both SATA and SAS drives, their exact model numbers, exactly what the designations mean (read intensive versus mixed use), and their theoretical specs.  I should note that this is no longer hosted on Dell’s website.  I was able to find and download a copy from archive.org, though [20].

With this, I was even able to discover a few misidentified drives on eBay that could have been quite a deal at their price.  However, I could never put together enough of them (the most I managed was 7) to really build my storage VM around them.

I also strongly considered the new Western Digital Red SA500 NAS SSD [21] and the Seagate IronWolf 110 NAS SSD [22].  From all this I was able to piece together the options I could want, from second-hand enterprise to modern NAS.

The hard part was actually comparing between the consumer and enterprise spaces.  Since Google, NetApp, and presumably many others know that there is minimal difference in failure rate here, it is somewhat surprising how little direct comparison between the two there is.  It’s hard to see how a WD NAS SSD would fare as database storage.  I am thinking there might be some marketing shenanigans going on here, like keeping metrics for that use case away from review sites.  Anyway, from pricing things out on eBay and the reviews above, it looks like the new NAS SSDs, the Samsung PM863a, and the Toshiba PX05SR are all in the same-ish price range.  At first, I set out to decide among these.

I realized before purchasing the Samsung PM863as that I had almost made a mistake.  Second-hand Dell equipment comes with added risk, a lesson I had learned before.  The reality is that all enterprise equipment manufacturers trade in some kind of vendor lock-in.  In some cases it’s pretty obvious, like RAID controllers that only work with drives the vendor controls (Dell).  In others, it is significantly less so; I consider Amazon’s AWS environment to be a big attempt at vendor lock-in.  Despite tools like Terraform, if one goes too deep into AWS options, they become incompatible with other cloud services, and AWS has achieved feature lock-in.

When it comes to SSDs, though, vendor lock-in works through firmware.  Almost every branded SSD, be it from Oracle, Dell, HPE, Cisco, etc., has custom firmware.  This means bug fixes can only be acquired with an ongoing support contract.  Good luck trying to get that 40,000-hour bug fix for your second-hand HPE SSD [23].  I do know someone who works for HPE, and he is a good guy; I bet I could get some firmware, but this is not a good way to go about building things.  I first and foremost want drives that I am capable of maintaining myself.  This axes every enterprise drive manufacturer save two: Micron and Intel.  So I have to give props to these 2 manufacturers for being willing to help and deal with the end consumer on drive issues.

With all of this in mind, my solution space was the Western Digital Red SA500 NAS SSD, the Seagate IronWolf 110 NAS SSD, the Micron 5100/5300 PRO series, and the Intel DC S4500 and D3-S4510 series.  I wanted enterprise drives, specifically for power-loss protection.  This is my main array; I want as much data protection as I can get.  That, plus performance traits optimized for applications, pushed me to Micron and Intel.  Intel won out because their SSD Toolbox [24] is an excellent application that quickly lets me update and evaluate the drives I get.  This was more expensive than I was hoping for, by almost 2x.  Second-hand Samsung PM863a’s are much cheaper, but I needed maintainability, lest I be stuck with unmaintainable system errors.  Now that is a nightmare.

I bought refurbished and new Intel DC S4500 and D3-S4510 drives.  My own performance metrics show them to be very close in overall performance (more on that in the next storage post).  I felt okay with refurbished drives because this is a RAID array; it is designed to tolerate drive failures.  The cost of a refurbished drive is ~70% that of a new drive, so I could afford to buy an extra if need be when one fails.  Also, remembering my old adage that SSD failures are not a function of use made this feel a bit safer; a used drive (most “refurbished” drives are simply pulled from working machines and firmware-updated, so more “used” than “refurbished”) is not a bad bet at all.

Unfortunately, even with all of eBay at my disposal, I could only find 12 refurbished/used drives (2 of them were mislabeled on eBay and were in fact Cisco drives; the Intel SSD Toolbox refused to work on them, confirming my fears above, and I was able to return them).  I also bought 12 new ones.  This was no disaster, though; a mix of used and new may be a good halfway risk point.  It turned out to be much more expensive than I was hoping for, but I do think maintainability was the right choice.  Only time will tell.

References

[1] https://www.45drives.com/products/storage/

[2] https://www.youtube.com/watch?v=uykMPICGeqw

[3] https://www.supermicro.com/en/products/chassis/4U/946/SC946LE1C-R1K66JBOD

[4] http://www.chenbro.com/en-global/products/rackmountchassis/4u_chassis/rm43348

[5] https://www.ibm.com/support/knowledgecenter/STXNRM_3.14.9/coss.doc/pdfs/coss_slicestor_3448_book.pdf

[6] https://forums.servethehome.com/index.php?threads/48-drive-4u-chassis-ibm-slicestor-3448-aka-chenbro-rm43348.26812/

[7] https://www.supermicro.com/manuals/chassis/4U/SC417.pdf

[8] https://www.startech.com/en-us/cables/sff86448plt2

[9] https://www.startech.com/en-us/cables/esataplate2

[10] https://store.supermicro.com/mcp-220-94601-0n.html

[11] https://docs.gluster.org/en/latest/Administrator%20Guide/Gluster%20On%20ZFS/

[12] https://www.servethehome.com/buyers-guides/top-hardware-components-freenas-nas-servers/top-picks-freenas-hbas/

[13] https://www.storagereview.com/enterprise/storage-adapters

[14] https://www.broadcom.com/products/storage/host-bus-adapters/sas-9300-8e

[15] https://www.broadcom.com/products/storage/host-bus-adapters/sas-9305-16e

[16] https://www.storagereview.com/review/wd-red-4tb-hdd-review-wd40efrx

[17] https://www.micron.com/products/ssd/bus-interfaces/sata-ssds/part-catalog/mtfddak7t6qde-2av16ab

[18] https://www.samsung.com/semiconductor/ssd/datacenter-ssd/MZ7LH7T6HMLA/

[19] https://ark.intel.com/content/www/us/en/ark/products/134914/intel-ssd-d3-s4510-series-7-68tb-2-5in-sata-6gb-s-3d2-tlc.html

[20] https://web.archive.org/web/2019*/https://i.dell.com/sites/doccontent/shared-content/data-sheets/en/Documents/dell-poweredge-sas-ssd-performance-specifications.pdf

[21] https://www.storagereview.com/review/wd-red-sa500-nas-sata-ssd-review

[22] https://www.storagereview.com/review/seagate-ironwolf-110-ssd-nas-review

[23] https://www.bleepingcomputer.com/news/hardware/hpe-warns-of-new-bug-that-kills-ssd-drives-after-40-000-hours/

[24] https://downloadcenter.intel.com/download/29723/Intel-Solid-State-Drive-Toolbox
