Storage VM: Part 5 – Caching

As I have been finalizing my plans for the storage system, I have been contemplating how it compares to my older storage system. The biggest change is replacing the hard disk drives with solid state drives, which is a major performance increase of at least one order of magnitude.

In the older storage system, I had a pair of SSDs that I used as a ZFS intent log.  The intent log is basically a quick way to confirm state changes while they are written to the slower hard disks.  For example, when a file needs to be written, ZFS writes the changes to the intent log and to the arrays; it can even initiate both operations at the same time.  The write can be confirmed as complete as soon as the intent log has finished saving the changes; after that, it is simply a matter of preserving the order of changes as they percolate to the HDD arrays.

In the event of a failure (usually power, but sometimes other causes), the intent log can be used to reconstruct and complete the writes that were in progress at the time of the failure.  There is a fair bit of complication around this, especially in preserving the order of the writes, but these problems are eminently solvable.  The intent log is essentially a write cache.
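
For the curious, it is easy to watch the intent log absorb synchronous writes from the command line.  A minimal sketch, assuming a pool named tank with a dataset tank/vms and a log vdev already attached (these names are placeholders, not my actual layout):

# Check whether a dataset forces synchronous writes (standard | always | disabled)
zfs get sync tank/vms

# Watch per-vdev activity once per second; sync writes land on the log device first
zpool iostat -v tank 1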

A good intent log will confirm that writes have reached persistent storage as quickly as possible.  Six years ago, SSDs were very much the best option available, and there have been only a couple of contenders since then.  Many SSDs have added RAM caches so that they can confirm writes or cache reads faster.  However, for large data sets, such as an enterprise network storage system, these are easily overwhelmed.  Here is a good overview of the choices for ZFS intent logs [1].  Given my use case, I will likely overwhelm these on-drive caches eventually, so I kinda want to figure out a way to add a dedicated cache.

There is a bit of a problem here: for a cache to be effective, it would need to have performance between that of an SSD and that of RAM.  SDRAM latency is on the order of ~10-100ns [2], while the access time of SSDs is ~10-100us [my own measurements].  There is one technology I have been able to find that fits in this gap.  It first came out in 2017 and is called 3D XPoint.

3D XPoint works by changing the resistance of each cell for writes, then measuring that resistance for reads.  Intel markets 3D XPoint as Optane; Micron calls it QuantX, though I have yet to see a QuantX drive actually for sale to end users.  Perhaps it is only available by agreement with Micron.

Optane drives have a latency of ~1-10us [3].  This is still quite a bit slower than modern fast RAM, but it is an order of magnitude faster than SSDs.  A quick aside: bandwidth could also come into play here; in principle, a device with latency similar to the tier below it but much higher bandwidth, at a much greater expense per GB, could still make a useful cache.  This doesn’t typically happen in the real world, at least I am not aware of a non-contrived example; in practice there are essentially bands of access time and bandwidth versus price [4].  3D XPoint can handle more IOPS, has lower latency, and has higher bandwidth than an SSD, while remaining slower than RAM.  This is perfect for this case.

I now have an issue.  All Optane drives need four PCIe lanes, whether delivered through a U.2 interface, an M.2 slot, or an actual PCIe slot.  Luckily, I do have two unused SlimSAS-8i ports.  SlimSAS is new to me; a close examination of recent motherboard designs showed me that these ports actually carry PCIe lanes.  I suspect they are a form-factor update: designers are trying to find ways to plug in the new class of U.2 disks, be they SSD, Optane, or anything else, without the need for bulky PCIe cards.  Intel Optane drives come in three forms: PCIe card, M.2 card, and U.2 2.5in drive.  Given that what I have available are SlimSAS-8i ports, the U.2 form is the vastly preferred choice.

Intel Optane Drive

I don’t need an excessive amount of space for this; its primary purpose is to serve as fast persistent storage.  With so many other SSDs, the combined bandwidth of the SSD mirrors will be larger than that of a single Optane drive.  I chose the 280GB Intel 900P option.  There are some older models that are 32GB or 16GB in the M.2 form factor (which could be converted to SlimSAS), but these are too small.  I want the capacity to be at least equal to the amount of RAM in the storage VM, and the 280GB drive is the next size up that satisfies this.

Therefore I ordered three of these.  Typically, to prevent data loss, two of these cache drives are used in a mirrored configuration for the intent log.  I added the third to serve as an L2ARC, which is a read cache for ZFS.  The same sizing math applies for the read cache as for the write cache, and since I have four drives' worth of PCIe lanes to work with, I added one for read caching.  I even got a couple of them used for a good price on eBay.
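
For reference, attaching the Optanes to the pool is a one-liner each.  A rough sketch, assuming a pool named tank and that the three Optanes show up as nvd1, nvd2, and nvd3 (the pool and device names are placeholders):

# Mirror two Optanes as the intent log (SLOG)
zpool add tank log mirror nvd1 nvd2

# Use the third Optane as an L2ARC read cache
zpool add tank cache nvd3

# Confirm the new log and cache vdevs
zpool status tank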

I want to have a quick discussion about used drives on eBay.  For this project, I bought several drives there: three different drive models, marked variously as used, refurbished, and new.

I ran the Intel SSD Toolbox [5] against every single Intel drive I bought.  There were three things I checked.  First, the SMART output: the toolbox flags any parameters that are of concern, and there were none on any of these drives.  Second, I ran a full diagnostic scan, which reads from and writes to every block on the drive.  My hope was to catch any issues with the drives, since refurbished or used stock carries a slightly higher risk.  They all passed.  This took approximately 13 hours per 3.84 TB drive, and thus took a while to complete; I ended up using the workstation and the streaming PC to run the tests, since the Intel SSD Toolbox only allows one test to run at a time per machine.  Lastly, I verified that the most recent firmware was loaded on the drives.
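
The Toolbox is a Windows tool; for anyone doing the same validation from a Linux shell instead, roughly equivalent checks can be done with smartmontools.  A sketch (the device path is a placeholder, and this is not what I actually used here):

# Report overall health and the full SMART attribute table
smartctl -a /dev/sda

# Start the drive's built-in extended self-test, then check the result later
smartctl -t long /dev/sda
smartctl -l selftest /dev/sda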

I mention this because Intel has made managing these drives pretty simple; I was able to test and validate them easily.  The only weird thing I encountered was that four of them had simply been deleted, with a partition left intact that I had to remove.  Three of them actually had a complete UEFI partition still installed, and I had to step through some old fdisk work to get rid of it [6].  People need to do a better job at this; the Intel SSD Toolbox even includes an option to wipe the drive clean, which simply performs a trim operation.  I am not the only one to encounter this [7].  I guess people just don’t care at some point. More likely, these are hardware recyclers purchasing by the ton, and it isn’t worth their time to protect other people’s data.  I just thought it was a little odd.
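
If you run into the same leftover partitions, clearing them from a Linux shell is an alternative to stepping through fdisk.  A sketch (the device path is a placeholder, and both commands destroy anything left on the drive):

# Erase all filesystem, RAID, and partition-table signatures
wipefs -a /dev/sdX

# Or wipe the GPT and MBR structures outright
sgdisk --zap-all /dev/sdX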

Before going through the next part, I want to give a quick overview of the various connector types that I have been working with.

Technical Name    Common Name
SFF-8643          Internal Mini SAS HD
SFF-8644          External Mini SAS HD
SFF-8654          Slimline SAS-4i
SFF-8654          Slimline SAS-8i
SFF-8639          U.2
–                 M.2

All of the various connectors I worked with on this
The Optane drives visible in the BIOS

With that out of the way, I want to cover all of my attempts to get the U.2 Optane 900P drives working.  I believe I have encountered a compatibility issue with the Tyan motherboard.  When I ordered the Optanes, they came with two types of connector: one was an M.2 to U.2 adapter, the other an SFF-8643 to U.2 adapter.  Success, for my purposes, meant seeing the drive identified in the BIOS.  As the picture above shows, I eventually managed to get all three drives identified; however, when I first attempted to get these drives working, none of them were correctly identified in the BIOS.

My first attempt was to use the SFF-8643 to U.2 adapter, which did not work.  I then purchased an SFF-8654 4i to U.2 converter [8].  This, rather surprisingly to me, did not work either; no drive was detected.

Left: SFF-8643 to U.2, Right: SFF-8654 4i to U.2
M.2 to U.2 Converter

I then considered that maybe the drives themselves were defective.  So I removed the Sabrent M.2 drives and plugged in the two M.2 to U.2 adapters that came with the Optanes.  These worked perfectly.  So the drives are fine; the M.2 slots simply reach them through a different set of connectors than the SlimSAS ports do.

I then considered whether the problem was the SFF-8654 connectors on the motherboard (of which there are two) or possibly the specific converter I received.  So I put together another option: an SFF-8654 4i cable [9] and a U.2 2.5 inch cradle [10].  With the cradle I could at least verify that the drive was powering on and presenting itself as an NVMe drive.

Left: SFF-8654 4i plugged into the U.2 2.5 in cradle, Right: Optane powered on in the cradle

The next thing I considered was that the motherboard ports are SFF-8654-8i; maybe there is an issue with using just an SFF-8654-4i connector.  So I ordered an SFF-8654-8i to dual SFF-8654-4i split cable [11], which I then used with the cradle.  It did not work either.

I was getting a bit discouraged here.  The drives work, and these are just standard cables; why was this not working?  I opened a ticket with Tyan at this point, and it is still open even now (opened 10/3/2020), with no response as to why any of these methods fail.  I then decided to take a rather extreme measure and use the M.2 connectors, which I had already seen work.

I bought an SFF-8654-4i to M.2 converter [12], then a dual SATA power to single SATA power cable [13].  I connected all of these together and plugged it in, and it did work!  So for whatever reason, direct conversion from SFF-8654 to U.2 was not working, but going through M.2 was.

There was one more thing I wanted to check.  I have encountered situations like this before, where there is clearly a working path, but the standard options don’t work.  This reeks of vendor lock-in.  So I went back to the motherboard manual [14] and looked for a specific converter cable listed for this motherboard.

There was only one place I found where I could order this cable, and I did.  It works for both of the drives, though even it is a little incomplete: FreeNAS recognizes the drive but does not identify it as an SSD.  Therefore, I must conclude this is a case of some weird kind of vendor lock-in; if I want to use U.2 with these SFF-8654 ports directly, I must buy these specific cables.
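
As a sanity check, the drive can at least be confirmed from the FreeNAS shell even when the UI labels it oddly.  A quick sketch, assuming the Optane enumerates as nvme0 (the controller name is a placeholder):

# List all NVMe controllers and namespaces the system can see
nvmecontrol devlist

# Dump the controller's identify data (model, firmware, capacity)
nvmecontrol identify nvme0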

Since I had already put together a working solution for one drive using the M.2 conversion method, I decided to keep using it and leave the remaining SFF-8654-4i connection open for future expansion.

I used a favorite of data centers, the zip tie!  It is something of a joke, and perhaps a truth, that zip ties run the data center.  They are a universal tool for bundling cables and holding things in place.  A friend once told me that while working at a Google data center, he used zip ties to hold a thermal sensor in the middle of the aisles.  I keep a fair number around for just such occasions.  I zip-tied the conversion chain I outlined above together for cable management.  It isn’t exactly pretty, but it is quite effective at containing all of the adapters and cables.

The SFF-8654-4i to M.2 to U.2 bundle

There was one last item I encountered while getting these working: VMware passthrough did not seem to be working correctly.  A quick Google search turned up the solution.  It looks like this is a known issue [15], where the passthrough flags for the Optane 900P aren’t present by default for some reason.

– ssh to ESXi
– edit /etc/vmware/passthru.map
– add following lines at the end of the file:

# Intel Optane 900P
8086 2700 d3d0 false

– restart hypervisor

I used the Home Assistant VM on the old storage server to SSH over to the new ESXi host, edit /etc/vmware/passthru.map, and add the passthrough flag information.  After this, everything worked.
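
For anyone repeating this: as I understand it, the four fields in that passthru.map line are the PCI vendor ID, the device ID, the reset method, and an fptShareable flag, and 8086 2700 is the Intel Optane 900P controller.  If you want to double-check the IDs on your own hardware before editing the file, something like this from the ESXi shell should do it (just a sketch):

# List every PCI device with its vendor and device IDs; look for vendor 0x8086, device 0x2700
esxcli hardware pci list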

With this last issue resolved, caching for the new storage VM would appear to be solved, and the only remaining item is a backup pool for important data.

References

[1] https://www.servethehome.com/what-is-the-zfs-zil-slog-and-what-makes-a-good-one/

[2] https://en.wikipedia.org/wiki/CAS_latency

[3] https://ark.intel.com/content/www/us/en/ark/products/123628/intel-optane-ssd-900p-series-280gb-1-2-height-pcie-x4-20nm-3d-xpoint.html

[4] https://www.electronicdesign.com/technologies/memory/article/21801131/3d-xpoint-a-new-revolution-in-memory

[5] https://downloadcenter.intel.com/download/29723/Intel-Solid-State-Drive-Toolbox

[6] https://www.howtogeek.com/215349/how-to-remove-an-efi-system-partition-or-gpt-protective-partition-from-a-drive-in-windows/

[7] https://www.servethehome.com/buyers-guide-datacenter-ssd-inexpensively/

[8] https://www.amazon.com/SFF-8654-SFF-8639-CableCreation-Straight-Compatible/dp/B07NVCTRSR

[9] https://www.amazon.com/gp/product/B082NT3TTW/ref=ppx_yo_dt_b_asin_title_o07_s00?ie=UTF8&psc=1

[10] https://www.amazon.com/gp/product/B07FMVBWJL/ref=ppx_yo_dt_b_asin_title_o06_s00?ie=UTF8&psc=1

[11] https://www.amazon.com/gp/product/B081ZC35H6/ref=ppx_yo_dt_b_asin_title_o04_s00?ie=UTF8&psc=1

[12] https://www.amazon.com/gp/product/B079S4PQCH/ref=ppx_yo_dt_b_asin_title_o01_s00?ie=UTF8&psc=1

[13] https://www.amazon.com/gp/product/B07ML447FG/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1

[14] ftp://ftp1.tyan.com/doc/S8030_UG_v1.0g.pdf

[15] https://www.ixsystems.com/community/threads/nvme0-missing-interrupt-after-esxi-and-bios-update.78745/
