DNS, SSL Certificates, and Certificate Authorities

Domain Name System

One thing I have always wanted to set up for my home systems is a Domain Name System.  For those that are unfamiliar, the Domain Name System (DNS) is an old set of protocols designed to translate from name labels to IP addresses. For example, take www.google.com, which has three name labels, “www”, “google”, and “com”.  Each of these labels provides a different opportunity to translate to a different IP address.  “mail”, “google”, and “com” will actually take one to a completely different IP address than the main “www” would.  DNS is how that is managed.  Wikipedia has a good overview of the basics here [1].  
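As a small illustration of the label idea, here is a minimal Python sketch that breaks a name into its labels.  The function name split_labels is made up for this example and is not part of any DNS library.

```python
def split_labels(name: str) -> list[str]:
    """Split a DNS name into its labels, ignoring a trailing root dot."""
    return [label for label in name.rstrip(".").split(".") if label]


print(split_labels("www.google.com"))    # ['www', 'google', 'com']
print(split_labels("mail.google.com."))  # ['mail', 'google', 'com']
```

The two names share the "google" and "com" labels and differ only in the first one, which is why they can point to different addresses.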

Briefly, when one registers and purchases a domain, they have control over that domain node.  While it is theoretically possible to purchase a domain directly from ICANN (The Internet Corporation for Assigned Names and Numbers, the entity that manages public domains and IP addresses), it is almost certainly not worth it for a small number of domains.  One would need to become a registrar and work directly with ICANN [2], but for most of us, simply letting a registrar manage these for us is best. 

Once a domain has been purchased, it is then possible to deploy a public DNS server, give that server a public IP address, and point the DNS system of the registrar to this DNS server to finish the translation from DNS name queries to IP addresses, or reverse queries for that matter, which go from IP address to name labels.  Usually, the interfaces of the registrars are sufficient for what most companies will need, and using them avoids some of the extra expense.

For my purposes though, I want to use a DNS name space for my internal systems to add a layer of flexibility.  If I call my storage VM storage.home, and point all of my other systems to that name instead of a direct IP address, I can then alter things behind the scenes. 

For instance, if I were to deploy a second storage system in order to replace the first one, I could simply run everything in parallel for a bit, get the second storage system set up and ready, then switch the DNS entry from storage.home -> 192.168.1.5 to storage.home -> 192.168.1.6 to complete the process.  I wouldn’t need to do anything other than restart the services that use the storage systems.

I could also just set up a quiet sync in the background between the two and wait for the time to live of the DNS lookup to expire and it will happen automatically.  Time to live is basically a number of seconds that the response for a particular query should be considered valid. This is an example of an orderly migration for enterprise systems.  It can be used on the local level as well, and I can even set up multiple names pointing to the same IP address if I predict I could want to switch things up later.
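The time to live behavior described above can be sketched as a toy cache in Python.  This is only an illustration under simple assumptions: the class TtlCache and its injected clock are hypothetical, and a real resolver cache does considerably more.

```python
import time


class TtlCache:
    """Toy resolver cache: remembers an answer until its TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the example can fake time
        self._entries = {}           # name -> (address, expiry timestamp)

    def put(self, name, address, ttl_seconds):
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            # The answer is stale, so forget it and force a fresh lookup.
            del self._entries[name]
            return None
        return address


fake_now = [0.0]
cache = TtlCache(clock=lambda: fake_now[0])
cache.put("storage.home", "192.168.1.5", ttl_seconds=300)
print(cache.get("storage.home"))  # 192.168.1.5
fake_now[0] = 301.0
print(cache.get("storage.home"))  # None
```

Once the cached answer expires, the client has to ask the DNS server again, and that is the moment it would pick up a changed record such as 192.168.1.6.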

I also find that it is easier to remember labels than IP addresses, especially if I have more than a dozen or so to remember.  Having a DNS system set up means I have to actually codify this explicitly in a database.  This will help prevent me from forgetting exactly where some minor service I use less often is located.  I can always just go check the source of truth on these translations to figure this out.  This has actually happened to me before when I forgot which IP address the PS3 was assigned to, and it took me an hour to find it.

The last reason I want to do this is actually even more simple.  I want to learn it.  Using these systems through a domain registrar’s web interface is simply not the same as setting up my own server with greater understanding of exactly what is being done behind the scenes.  It won’t make me an expert, but it will give me a lot more familiarity than simply filling in a few text boxes on a web form.

Choosing DNS Software

When I was looking into which DNS server to use, I did not have a notion of what I wanted.  Wikipedia has a basic comparison of them [3], but I wasn’t sure what features would be important.  I just wanted the basic setup so I could learn and do more as I got deeper into it.  Also, I was more interested in which software had the best support and guides for learning.  I knew that I wanted to use a Linux server that I was familiar with.  In this case Ubuntu is probably my favorite distro of Linux.  A bit of searching yielded two good guides for BIND [4] [5], one of which was even dedicated to Ubuntu.  This seemed like a big win to me.

Truthfully, when I am doing something like this I try to keep one bit of wisdom in mind:  Without familiarity with the systems involved, I am unlikely to be able to properly evaluate which is best for my case.  For example, CoreDNS [6] markets itself as being a DNS system for Kubernetes.  Without familiarity with Kubernetes and CoreDNS, how would I know that this is the best choice?  In these cases I find that it is often better to simply make a decision and start to learn than to expect to nail it the first time through.  I can get lost in evaluation of all the possibilities inherent in things I am simply not familiar enough with to make a good decision. 

From my history, a good example is the storage system.  I started with Napp-it, then switched to  FreeNAS, and continue with FreeNAS now.  Although there is some propensity to stick with what I am familiar with, there is also truth in the idea that without any familiarity, a proper evaluation isn’t likely to happen.  I went with Napp-it because it appeared to be recommended when I started doing basic searches.  This time through the storage system, if you’ve been following along, I have been able to discuss vastly more topics and variables in the decision.  This is because of the basic knowledge I started gaining from first deploying Napp-it, then secondly FreeNAS.  In this light, I often make these decisions with just basic information like BIND being the most deployed DNS software [7], and a good guide.

Configuring Bind9 DNS

I went ahead and deployed an Ubuntu VM with very modest resources: 2 CPUs (just in case one gets locked up, I don’t want the entire machine to be stuck), 1 GB of RAM, 16 GB of disk, and the VM network adapter.

Then I stepped through a basic Ubuntu install, which I covered in the previous post about the Home Assistant VM, so I won’t be repeating that here.

Following that up, I went through basic guides for setup, starting with installing bind9:

sudo apt-get install bind9 bind9utils bind9-doc

Not wanting to get too bogged down in ACLs (Access Control Lists), I ignored those sections.  I just wanted a basic install.  The configuration files are located in /etc/bind/.  I started by creating basic configurations, starting with /etc/bind/named.conf.options.

options {
        directory "/var/cache/bind";

        // If there is a firewall between you and nameservers you want
        // to talk to, you may need to fix the firewall to allow multiple
        // ports to talk.  See http://www.kb.cert.org/vuls/id/800113

        // If your ISP provided one or more IP addresses for stable
        // nameservers, you probably want to use them as forwarders.
        // Uncomment the following block, and insert the addresses replacing
        // the all-0's placeholder.

        recursion yes;
        allow-recursion { 192.168.2.0/24; };
        listen-on { 192.168.2.4; };
        allow-transfer { none; };

        forwarders {
                8.8.8.8;
                8.8.4.4;
        };
        //=======================================================================
        // If BIND logs error messages about the root key being expired,
        // you will need to update your keys.  See https://www.isc.org/bind-keys
        //=======================================================================
        dnssec-validation auto;

        listen-on-v6 { any; };
};

Then I added my two basic zones in /etc/bind/named.conf.local.

//
// Do any local configuration here
//

// Consider adding the 1918 zones here, if they are not used in your
// organization
//include "/var/lib/bind/zones.rfc1918";

zone "local" {
        type master;
        file "/etc/bind/zones/db.local";
        allow-transfer { 192.168.2.4; 127.0.0.1; };
};

zone "2.168.192.in-addr.arpa" {
        type master;
        file "/etc/bind/zones/db.2.168.192";  #192.168.2.0/24 subnet
        allow-transfer { 192.168.2.4; 127.0.0.1; };
};
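The reverse zone name in that configuration is just the network’s octets written backwards with in-addr.arpa appended.  A minimal Python sketch of that mapping follows; the helper names reverse_zone_for and ptr_name_for are hypothetical, and the first helper only handles IPv4 networks that end on an octet boundary.

```python
import ipaddress


def reverse_zone_for(network: str) -> str:
    """Return the in-addr.arpa zone name for a /8, /16, or /24 IPv4 network."""
    net = ipaddress.ip_network(network)
    if net.version != 4 or net.prefixlen not in (8, 16, 24):
        raise ValueError("only octet-aligned IPv4 networks are handled here")
    octets = str(net.network_address).split(".")[: net.prefixlen // 8]
    return ".".join(reversed(octets)) + ".in-addr.arpa"


def ptr_name_for(address: str) -> str:
    """Return the full name a reverse (PTR) lookup asks for."""
    return ipaddress.ip_address(address).reverse_pointer


print(reverse_zone_for("192.168.2.0/24"))  # 2.168.192.in-addr.arpa
print(ptr_name_for("192.168.2.3"))         # 3.2.168.192.in-addr.arpa
```

This is also why a PTR record inside the zone file can be written as just the last octet: the rest of the name comes from the zone itself.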

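I started by just trying to get my local network controller to resolve correctly, and pointing to the name servers.  The two zone files, /etc/bind/zones/db.2.168.192 (reverse) and /etc/bind/zones/db.local (forward), looked like this: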

;
; BIND reverse data file for local loopback interface
;
$TTL    604800
@       IN      SOA     local.       root.local. (
                             1         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                         604800 )       ; Negative Cache TTL
;

; name servers - NS records
        IN      NS      ns1.local.
        IN      NS      ns2.local.

; PTR Records
3       IN      PTR     ubiquiti.local.              ; 192.168.2.3

Then the forward zone file, /etc/bind/zones/db.local:

;
; BIND data file for local loopback interface
;
$TTL    604800
@       IN      SOA     local.  root.local. (
                             1         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                         604800 )       ; Negative Cache TTL
;
; name servers - NS records
        IN      NS      ns1.local.
        IN      NS      ns2.local.

; name servers - A records
ns1.local.      IN      A       192.168.2.4
ns2.local.      IN      A       192.168.2.5


; 192.168.2.0/24 - A Records
ubiquiti.local.         IN      A       192.168.2.3     ; network configuration

There is a configuration utility that can be used to validate the configuration files.  I just run that every time I make a change to validate that the syntax is correct. 

# check configuration
named-checkconf

There is also a utility for checking the zone configuration files.

# check zones
named-checkzone <zone name> db.<zonename>

Then I just restarted the bind service.

# restart bind9 service
systemctl restart bind9.service

I can go ahead and check if the resolver is working.

# dig ubiquiti.local

; <<>> DiG 9.16.1-Ubuntu <<>> ubiquiti.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 18346
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;ubiquiti.local.                        IN      A

;; Query time: 24 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Mon Nov 23 22:12:48 UTC 2020
;; MSG SIZE  rcvd: 43

And I failed.  I found that ubiquiti.local.local was resolving, for some reason.  I spent a lot of time trying to figure out if there was a restriction on the number of name labels allowed in name resolution.  That was not the issue.  It took me a fair bit of time to figure it out, despite the fact that it was staring me in the face on that first dig.  

;; WARNING: .local is reserved for Multicast DNS

The .local I was using was reserved.  I simply flipped all of the .local to .internal and it worked!

# dig ubiquiti.internal

; <<>> DiG 9.16.1-Ubuntu <<>> ubiquiti.internal
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55748
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;ubiquiti.internal.             IN      A

;; ANSWER SECTION:
ubiquiti.internal.      6978    IN      A       192.168.2.3

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Mon Nov 23 22:17:55 UTC 2020
;; MSG SIZE  rcvd: 62
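The .local clash is the kind of thing a tiny pre-flight check could catch before a zone is ever created.  The sketch below is only an illustration; the function name uses_mdns_suffix is made up, and it checks for nothing beyond the .local suffix that Multicast DNS reserves.

```python
def uses_mdns_suffix(name: str) -> bool:
    """Return True if the name's last label is 'local', which mDNS reserves."""
    labels = name.rstrip(".").lower().split(".")
    return labels[-1] == "local"


print(uses_mdns_suffix("ubiquiti.local"))     # True
print(uses_mdns_suffix("ubiquiti.internal"))  # False
```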

The configuration files changed as follows.  I updated /etc/bind/named.conf.local.

//
// Do any local configuration here
//

// Consider adding the 1918 zones here, if they are not used in your
// organization
//include "/var/lib/bind/zones.rfc1918";

zone "internal" {
        type master;
        file "/etc/bind/zones/db.internal";
        allow-transfer { 192.168.2.4; 127.0.0.1; };
};

zone "2.168.192.in-addr.arpa" {
        type master;
        file "/etc/bind/zones/db.2.168.192";  #192.168.2.0/24 subnet
        allow-transfer { 192.168.2.4; 127.0.0.1; };
};

/etc/bind/zones/db.internal replaced /etc/bind/zones/db.local.

;
; BIND data file for local loopback interface
;
$TTL    604800
@       IN      SOA     internal.       root.internal. (
                             20         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                         604800 )       ; Negative Cache TTL
;
; name servers - NS records
        IN      NS      ns1.internal.
        IN      NS      ns2.internal.

; name servers - A records
ns1.internal.   IN      A       192.168.2.4
ns2.internal.   IN      A       192.168.2.5


; 192.168.2.0/24 - A Records
ubiquiti.internal.              IN      A       192.168.2.3     ; network configuration

/etc/bind/zones/db.2.168.192 was updated as follows:

;
; BIND reverse data file for local loopback interface
;
$TTL    604800
@       IN      SOA     internal.       root.internal. (
                             20         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                         604800 )       ; Negative Cache TTL
;

; name servers - NS records
        IN      NS      ns1.internal.
        IN      NS      ns2.internal.

; PTR Records
3       IN      PTR     ubiquiti.internal.              ; 192.168.2.3

As a side note, every time I change a db.<something> file, I need to up the serial on it to allow bind to detect that there are changes to register.  So that number keeps increasing as time goes on.
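Bumping the serial by hand is easy to forget, so here is a minimal Python sketch of how it could be automated.  It assumes the zone file marks the serial with a "; Serial" comment, as the files above do; the function name bump_serial is hypothetical, and the sketch works on text in memory rather than touching any file.

```python
import re

# Matches a line such as "                 20         ; Serial".
SERIAL_LINE = re.compile(
    r"^([ \t]*)(\d+)([ \t]*;[ \t]*Serial\b.*)$",
    re.MULTILINE | re.IGNORECASE,
)


def bump_serial(zone_text: str) -> str:
    """Return the zone text with the first '; Serial' value increased by one."""

    def _increment(match):
        return f"{match.group(1)}{int(match.group(2)) + 1}{match.group(3)}"

    new_text, count = SERIAL_LINE.subn(_increment, zone_text, count=1)
    if count != 1:
        raise ValueError("no '; Serial' line found in zone text")
    return new_text


example = (
    "@       IN      SOA     internal.       root.internal. (\n"
    "                             20         ; Serial\n"
    "                         604800         ; Refresh\n"
)
print(bump_serial(example))
```

Running named-checkzone on the result before reloading would still be a sensible step.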

With that figured out, I went ahead and configured all of my IP addresses within the zone files and got everything working.  In my network, I use DHCP defined static IP addresses, which I may cover at some point later, to guarantee that my servers and VMs are always issued the same IP addresses. 

I wanted to extend the configuration for logging.  At first I just tried to take a basic guide [8] and extend it to create logs in /var/log/, where I am used to looking for logs on Linux.  The log file never appeared.

That guide did not explain how AppArmor works with Bind9, and I took some time to figure it out (and read the comments, which go over that).  AppArmor, a Linux mandatory access control system that fills a role similar to SELinux but confines programs by file path rather than by security labels, kept getting in the way.  After some time I discovered the issue, which was that Bind9 is only allowed to create logs in specific places.  I fixed it by giving logging its own configuration file, /etc/bind/named.conf.log.

logging {
  channel bind_log {
    file "/var/log/named/query.log" versions 3 size 5m;
    severity info;
    print-category yes;
    print-severity yes;
    print-time yes;
  };
  category default { bind_log; };
  category update { bind_log; };
  category update-security { bind_log; };
  category security { bind_log; };
  category queries { bind_log; };
  category lame-servers { null; };
};

Then I added this to the main /etc/bind/named.conf file:

// This is the primary configuration file for the BIND DNS server named.
//
// Please read /usr/share/doc/bind9/README.Debian.gz for information on the
// structure of BIND configuration files in Debian, *BEFORE* you customize
// this configuration file.
//
// If you are just adding zones, please do that in /etc/bind/named.conf.local

include "/etc/bind/named.conf.options";
include "/etc/bind/named.conf.local";
include "/etc/bind/named.conf.default-zones";
include "/etc/bind/named.conf.log";

Lastly I set up a logrotate to make sure I don’t run out of hard drive space unexpectedly preventing the system from working.  This sets up a 7 day rotation to delete files after that time. I created the logrotate configuration in /etc/logrotate.d/bind.

/var/log/named/query.log {
  daily
  missingok
  rotate 7
  compress
  delaycompress
  notifempty
  create 644 bind bind
  postrotate
    /usr/sbin/invoke-rc.d bind9 reload > /dev/null
  endscript
}

With logrotation complete, this is a deploy I can work with.  It is resolving everything locally when I look up local services. 

I did want to play a quick trick on it.  I set up my local gateway to be the primary DNS server for all devices.  Then I pointed the gateway first to this DNS server, and then to one of the Google DNS servers.  That way if this VM were to fail the internet would still work in general.  I also configured BIND to allow for a secondary DNS server, but since I am not pointing all of my other configurations to this server, and I only had 2 slots available in Ubiquiti, I decided to leave this blank.

After all, I am not sure what case I would be covering where this VM is down and somehow the other VM is not also down.  It would have to be a very esoteric case, as it is far more likely for the entire ESXi server to be down than for just one specific VM to have crashed.

SSL Certificates and Certificate Authorities

The next thing I have always been bothered by with my home lab is that every time I go to a management page I am hit with one of these:

This is just an example from my internal network after setting up DNS.  It is the same when I go for the IP address directly too:

This is just annoying.  Yes, I can just click advanced and proceed anyways, but it would be a good project to actually fix this problem.  

At its base, this is an issue with SSL certificates and the web browser.  A few years ago it made some news that Google Chrome was going to start labeling websites that did not use SSL certificates as “Not Secure”.  This created a bit of a scramble for website operators to get a valid public certificate to validate their websites and web apps.  Without a valid SSL certificate to encrypt the delivery of the web traffic, their customers would have to click through the advanced and proceed anyways.

The fear was that the warnings about being insecure would cost customers.  This is likely correct on their part, as I know I will exit any public website with this warning immediately.  In my case though, I know that this is not backed by a public certificate because that hasn’t been done yet, not because I am potentially being hacked.  Most of these are serving up generated certificates, or possibly self-signed certificates, that are by default not trusted by the web browsers.  It is possible to download these certificates and tell your operating system to trust them [9] [10] [11].

That being said, the primary way the internet works is by using trusted public root certificates that are pre-loaded into the operating system’s certificate root store (or just a browser’s root store depending on implementation).  The standard set of trusted certificate authorities usually includes Let’s Encrypt (operated by the Internet Security Research Group), GoDaddy, DigiCert, Symantec, IdenTrust, and Comodo.  I am sure there are others I am simply not aware of.

These root certificates can be used to sign intermediate certificates, and these certificates can then be used to issue more intermediate certificates, or potentially end certificates.  This also enables multiple servers to be using this methodology at the same time.  If an intermediate certificate is compromised, it can be revoked by whoever holds the root certificate’s key, and a new intermediate certificate issued.  This is a bit of work for the IT managers behind the scenes, but since these are not distributed to all the end browsers like the root certificates, it requires no work from the actual end customers to fix.  A compromised root certificate would be a disaster though, as it would require a browser update for everyone to fix in addition to all that IT work.  Here’s a quick overview of this process, though it is a bit comic-y [12].
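To make the chain idea concrete, here is a toy Python model.  It involves no cryptography at all: certificates are just names, and the function chain_is_trusted, the issuer table, and the revoked set are all hypothetical stand-ins for what real signature and revocation checks do.

```python
def chain_is_trusted(cert_name, issuers, trusted_roots, revoked):
    """Walk from a certificate up through its issuers to a trusted root."""
    seen = set()
    current = cert_name
    while True:
        if current in revoked or current in seen:
            return False  # revoked somewhere in the chain, or a signing loop
        if current in trusted_roots:
            return True   # reached a root that is in the trust store
        seen.add(current)
        issuer = issuers.get(current)
        if issuer is None:
            return False  # chain ends without reaching a trusted root
        current = issuer


issuers = {"ubiquiti.internal": "intermediate-1", "intermediate-1": "home-root"}
print(chain_is_trusted("ubiquiti.internal", issuers, {"home-root"}, set()))
# True
print(chain_is_trusted("ubiquiti.internal", issuers, {"home-root"}, {"intermediate-1"}))
# False
```

Revoking the intermediate breaks every certificate beneath it while the trust store, which only holds the root, stays untouched.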

For a public website to not see that “Not Secure” warning, it needs to go through what is called a challenge from a certificate authority.  Essentially this is just a way for a Certificate Authority to prove that the person/entity requesting the certificate (and thus certification) actually does own the web domain they are attempting to obtain a certificate for.  There is a standard defined for these challenges called ACME [13], of which there are two primary types. 

The HTTP challenge is essentially asking the ACME server to give you a token, which will then be hosted by the HTTP server being run within that domain’s infrastructure at a specific URL.  The ACME server will then attempt to read that URL and expects to see the token there.  If it finds it, the challenge is considered passed. 
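As a rough sketch of the HTTP challenge, the Python below builds the URL the ACME server fetches and the body it expects to find there, following the layout in the ACME standard.  The function name and the sample token and thumbprint values are made up for illustration; a real client derives the thumbprint from its account key.

```python
def http01_challenge(domain: str, token: str, account_thumbprint: str):
    """Return (url, expected_body) for an ACME HTTP challenge."""
    url = f"http://{domain}/.well-known/acme-challenge/{token}"
    # The served file holds the token joined to the account key thumbprint.
    body = f"{token}.{account_thumbprint}"
    return url, body


url, body = http01_challenge("www.example.com", "sample-token", "sample-thumbprint")
print(url)   # http://www.example.com/.well-known/acme-challenge/sample-token
print(body)  # sample-token.sample-thumbprint
```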

The second challenge is a DNS challenge, where the ACME server will issue a token.  Then that domain’s infrastructure needs to set up a specific DNS text record containing a value derived from that token in the DNS server.  From there, the ACME server can make a DNS query for a specific name label in a subdomain under the domain being challenged (e.g. _acme-challenge.www.google.com would be the name queried for a DNS challenge on www.google.com).  If it receives the expected value back from the DNS server, the challenge is considered passed.
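For the DNS challenge, the record lives at the _acme-challenge label under the name being validated, and the ACME standard defines its value as a base64url-encoded SHA-256 digest of the token joined to the account key thumbprint.  Here is a minimal Python sketch of that calculation; the function name and the sample inputs are hypothetical.

```python
import base64
import hashlib


def dns01_txt_record(domain: str, token: str, account_thumbprint: str):
    """Return (record_name, txt_value) for an ACME DNS challenge."""
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url without the trailing '=' padding
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    record_name = f"_acme-challenge.{domain.rstrip('.')}."
    return record_name, value


name, value = dns01_txt_record("ubiquiti.internal", "sample-token", "sample-thumbprint")
print(name)        # _acme-challenge.ubiquiti.internal.
print(len(value))  # 43
```

Because only a DNS record has to be published, the server the certificate is for never needs to answer the challenge itself.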

As these relate to my own home lab, there are a couple of considerations here.  First is that I don’t want to open up my home lab systems to the internet at large.  That is far more effort than I wish to go through, especially the maintenance, upkeep, and security risks (being hacked).  There is a reason computer infrastructure engineering is a profession. 

Second, I don’t want to pay any money.  Most certificate authorities charge a fee for their services.  Let’s Encrypt, run by the nonprofit Internet Security Research Group, does not.  That’s the primary reason that something like 30% of websites use it for certificate signing [14].  I briefly considered trying to use Let’s Encrypt for this.  I remember my time leading the engineering department of a web hosting firm and the commotion Let’s Encrypt created around this.  It was all very positive, and I still have a positive opinion of it.

The biggest issue is that their certificates have a relatively short lifetime, only 90 days.  That just means a challenge needs to be passed every few months and a new certificate uploaded to the server.  This is a process that can be, and has been, automated by many over the years.  It is even possible to pass the Let’s Encrypt challenge without opening an internal server to the public internet by using the DNS challenge, which only requires control of the DNS server rather than the actual web server.  However, Let’s Encrypt also imposes an additional requirement: a public domain name.  Think something ending in .com or .net.  None of my internal names match this, nor do I particularly want to purchase a domain just to pass this challenge.
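The renewal rhythm is simple date arithmetic, sketched below in Python.  The 90 day lifetime comes from the text above; the 30 day safety margin and the function name renewal_date are assumptions chosen for illustration, not a statement of how any particular client is configured.

```python
from datetime import date, timedelta


def renewal_date(issued: date, lifetime_days: int = 90, renew_before_days: int = 30) -> date:
    """Return the day a certificate should be renewed.

    renew_before_days is an assumed safety margin before expiry.
    """
    return issued + timedelta(days=lifetime_days - renew_before_days)


print(renewal_date(date(2020, 11, 23)))  # 2021-01-22
```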

As I was researching all of this, a few people mentioned, from time to time, just spinning up your own certificate authority and issuing your own certificates down the chain.  The root certificate installation is a one and done (they last ~10 years), and then you can automate the rest.

This, in combination with the DNS server I had just set up, provides a method for actually issuing certificates for all of my internal services without actually needing to change or control the services themselves.  It would be a bit difficult to get the ESXi server to host the HTTP challenge without control over the base software, but just issuing it a certificate where the challenge was passed by my home lab’s DNS server?  That seems doable.  I decided that the challenge of creating my own certificate authority is definitely in the scope of a home lab and attempting to learn more about how these certificates work is good practice.

Certificate Authority Selection

Here is where I potentially made a mistake.  I started to search for software to run a private certificate authority.  I didn’t do an extensive search, and I focused on something I could use Certbot with.  Certbot is the utility published by the EFF for acquiring certificates from Let’s Encrypt. 

I found a quick guide for setting up something called Step-ca [15].  I did look a little into it and found its GitHub page [16], which has 2,400 stars (a pseudo rating system on GitHub), which is not a small number.  There is also a Docker image for it [17], which is also a good sign.

I would be lying if I said that I did a lot of research here; I really didn’t.  I mostly just didn’t find good guides for any other private certificate authority service.  The other options I saw were EJBCA and openCA.  I chose to go with the one I had a guide for.  This may be a mistake in the end, as later in my project, a security friend of mine told me the documentation wasn’t good for it, which is a warning sign of poor support.

That being said, I am not certain what the best choice would be, and as with the DNS deployment, it is more important to learn here than to make the most optimal decision right out of the box.  I also couldn’t easily separate market share for the actual public certificate authorities and the software being used here.

Step-ca Install and Configuration

I decided to go ahead with the Step-ca plan.  I spawned a new VM, again very basic, 1 CPU, 1Gig ram, 16 Gig hard drive, VM network.  If this is working properly it will do practically nothing except when certificates need to be renewed.  This is a case for the smallest VM needed.  There is even a case for keeping it off except in renewal scenarios, but I think I’d likely just forget if it wasn’t on, so that would need to be automated, perhaps for later in the project.

From there I actually had to combine the two guides I saw.  I wanted to run this in Ubuntu, but the Step-ca project doesn’t natively support Ubuntu.  However, as I said earlier, it does have a docker container.  This makes running it in Ubuntu trivial, just need to install docker.  I tend to prefer docker-compose, simply because I like having an explicit file to work with.  I set up the Step-ca service to work within docker-compose:

version: '3'
services:
    step-ca:
        container_name: step-ca
        image: smallstep/step-ca:latest
        restart: always
        ports:
            - "9000:9000"
            - "443:443"
        volumes:
            - /home/nrweaver/step/:/home/step/

Yes, I do run this out of the home directory.  That’s not enterprise deployment approved, but as I expect only myself to work on this, I don’t want to go overboard on everything. Keeping it in the home directory will help me remember where everything actually is.  I also litter all of my VMs with README.txt files describing where everything is, how the services are run, and a set of basic commands in case the next time I actually look on the VM I don’t remember what I am doing.

The basic docker install is covered on the actual docker page [18], which I just stepped through starting at step 3 since I was using docker-compose.

3. Bring up PKI bootstrapping container

docker run -it -v step:/home/step smallstep/step-ca sh

4. Run step initialization command

$ step ca init

✔ What would you like to name your new PKI?: Smallstep

✔ What DNS names or IP addresses would you like to add to your new CA? localhost

✔ What address will your new CA listen at?: :9000

✔ What would you like to name the first provisioner for your new CA?: admin

✔ What do you want your password to be?: <your password here>

Generating root certificate...

all done!

Generating intermediate certificate...

all done!


✔ Root certificate: /home/step/certs/root_ca.crt

✔ Root private key: /home/step/secrets/root_ca_key

✔ Root fingerprint: f9e45ae9ec5d42d702ce39fd9f3125372ce54d0b29a5ff3016b31d9b887a61a4

✔ Intermediate certificate: /home/step/certs/intermediate_ca.crt

✔ Intermediate private key: /home/step/secrets/intermediate_ca_key

✔ Default configuration: /home/step/config/defaults.json

✔ Certificate Authority configuration: /home/step/config/ca.json

Your PKI is ready to go. To generate certificates for individual services see 'step help ca'.

5. Place the PKI root password in a known safe location.

The image is expecting the password to be placed in /home/step/secrets/password. Bring up the shell prompt in the container again and write that file:

docker run -it -v step:/home/step smallstep/step-ca sh

Write the file into the expected location:

echo <your password here> > /home/step/secrets/password

Everything is now ready to run step-ca inside your container!

https://hub.docker.com/r/smallstep/step-ca

Then I enabled the ACME server by running the shell in the container and adding the provisioner:

docker run -it -v step:/home/step smallstep/step-ca sh
$ step ca provisioner add acme --type ACME

At this point ACME is running.  I then completed the steps for testing the basic deploy on the docker container site:

Setting Up a Development Environment

To initialize the development environment, grab the root fingerprint from step 4. In the case of this example: f9e45ae9ec5d42d702ce39fd9f3125372ce54d0b29a5ff3016b31d9b887a61a4. With the root certificate’s fingerprint we can bootstrap our dev environment.

$ step ca bootstrap --ca-url https://localhost:9000 --install --fingerprint f9e45ae9ec5d42d702ce39fd9f3125372ce54d0b29a5ff3016b31d9b887a61a4

The root certificate has been saved in ~/.step/certs/root_ca.crt.

Your configuration has been saved in ~/.step/config/defaults.json.

Installing the root certificate in the system truststore... done.

Your local step CLI is now configured to use the container instance of step-ca and our new root certificate is trusted by our local environment (inserted into local trust store).

$ curl https://localhost:9000/health

{"status":"ok"}

https://hub.docker.com/r/smallstep/step-ca

That worked for me with no need for change.  I don’t really want to copy my own experience here, as a lot of it is copying around fingerprints and certificates.  I don’t really want to publish what I am using here.

Now the next part was setting up BIND for a DNS ACME challenge.  I found a slightly out of date guide for doing this here [19].  I say slightly out of date, because it calls for generating an HMAC-SHA512 key with dnssec-keygen.  It appears that this is deprecated in favor of tsig-keygen [20].  This generates a secret key that allows the Dynamic DNS entries for the acme challenge to proceed. 

With this, no human needs to do anything; the Certbot app itself can tell the DNS server to add the text entry for the token challenge.  This necessitated a change in BIND configuration.  I had to move the zones into a place where AppArmor would allow them to be edited: /var/lib/bind/.  I also needed to add the Dynamic DNS policy to allow the user of the TSIG key I generated to update the DNS domains.  These are specifically enumerated in the configuration below.  For my first testing of this, I wanted to generate a certificate for the Ubiquiti Cloud Key I use to manage my network.

First I changed /etc/bind/named.conf.local to allow an update policy.

//
// Do any internal configuration here
//

// Consider adding the 1918 zones here, if they are not used in your
// organization
//include "/var/lib/bind/zones.rfc1918";

zone "internal" {
        type master;
        file "/var/lib/bind/db.internal";
        allow-transfer { 192.168.2.4; 127.0.0.1; };
        check-names warn;
        update-policy {
                grant ca_wildcard. name _acme-challenge.ubiquiti.internal TXT;
                grant ca_wildcard. name _acme-challenge.internal TXT;
        };
};

zone "2.168.192.in-addr.arpa" {
        type master;
        file "/var/lib/bind/db.2.168.192";  #192.168.2.0/24 subnet
        allow-transfer { 192.168.2.4; 127.0.0.1; };
};

I copied the zones to /var/lib/bind.

/etc/bind$ cp /etc/bind/zones/db.* /var/lib/bind/

Then I generated the tsig key I would need.

/etc/bind$ tsig-keygen -a hmac-sha512 tsig-key > /etc/bind/tsig.key

Next, I copied that into ca_wildcard_key.conf, as I was planning to generate a wildcard certificate for all my internal devices.  

/etc/bind$ cp /etc/bind/tsig.key /etc/bind/ca_wildcard_key.conf

Then I added it to /etc/bind/named.conf.

// This is the primary configuration file for the BIND DNS server named.
//
// Please read /usr/share/doc/bind9/README.Debian.gz for information on the
// structure of BIND configuration files in Debian, *BEFORE* you customize
// this configuration file.
//
// If you are just adding zones, please do that in /etc/bind/named.conf.local

include "/etc/bind/named.conf.options";
include "/etc/bind/named.conf.local";
include "/etc/bind/named.conf.default-zones";
include "/etc/bind/named.conf.log";
include "/etc/bind/ca_wildcard_key.conf";

And finished it by restarting the bind service.

$ systemctl restart bind9.service

I then thought I would be able to complete a Certbot challenge.  So I logged in to my home assistant VM, which runs standard Ubuntu, and tried to execute a change manually with nsupdate.  Following the guide:

$ nsupdate -k /home/nrweaver/ca_wildcard_key.conf -v
> server 192.168.2.4
> debug yes
> zone _acme-challenge.ubiquiti.internal
> update add _acme-challenge.ubiquiti.internal. 86400 TXT "test"
> show
Outgoing update query:
;; ->>HEADER<<- opcode: UPDATE, status: NOERROR, id:      0
;; flags:; ZONE: 0, PREREQ: 0, UPDATE: 0, ADDITIONAL: 0
;; ZONE SECTION:
;_acme-challenge.ubiquiti.internal. IN  SOA

;; UPDATE SECTION:
_acme-challenge.ubiquiti.internal. 86400 IN TXT "test"

> send
Sending update to 192.168.2.4#53
Outgoing update query:
;; ->>HEADER<<- opcode: UPDATE, status: NOERROR, id:  37524
;; flags:; ZONE: 1, PREREQ: 0, UPDATE: 1, ADDITIONAL: 1
;; ZONE SECTION:
;_acme-challenge.ubiquiti.internal. IN  SOA

;; UPDATE SECTION:
_acme-challenge.ubiquiti.internal. 86400 IN TXT "test"

;; TSIG PSEUDOSECTION:
tsig-key.               0       ANY     TSIG    hmac-sha512. 1606187444 300 64 <THIS IS A SECRET> 37524 NOERROR 0

; TSIG error with server: tsig indicates error

Reply from update query:
;; ->>HEADER<<- opcode: UPDATE, status: NOTAUTH, id:  37524
;; flags: qr; ZONE: 1, PREREQ: 0, UPDATE: 0, ADDITIONAL: 1
;; ZONE SECTION:
;_acme-challenge.ubiquiti.internal. IN  SOA

;; TSIG PSEUDOSECTION:
tsig-key.               0       ANY     TSIG    hmac-sha512. 1606187444 300 0 37524 BADKEY 0

Now, this failed, at first at least, because I didn’t realize how all of these are connected.  When tsig-keygen generates the key, it writes out a key block and gives the key a name.

key "tsig-key." {
        algorithm hmac-sha512;
        secret <THIS IS A SECRET>;
};

In this case, that key is named tsig-key.  This needs to match the key after the grant in the /etc/bind/named.conf.local.  The grant is given to the key called ca_wildcard.  A quick rename and it was failing in a new way!  I don’t actually have examples of this anymore, but I was getting a BADTIME error. 
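The name mismatch described above can be checked for before restarting BIND.  The following is only a sketch: the helper names tsig_key_name and key_is_granted are made up for illustration, and it assumes the key file uses the key "name" { layout shown above and that each grant sits on its own line, as in the named.conf.local shown earlier.

```shell
# Print the key name declared in a tsig-keygen style key file,
# e.g. 'key "tsig-key." {' prints 'tsig-key.'.
tsig_key_name() {
    sed -n 's/^key "\([^"]*\)".*/\1/p' "$1"
}

# Succeed if a "grant <name>" line for that key exists in the zone config.
# A trailing dot on either side is ignored.
# Arguments: key file path, zone config path.
key_is_granted() {
    name=$(tsig_key_name "$1")
    name=${name%.}
    [ -n "$name" ] || return 1
    grep -Eq "grant[[:space:]]+${name}\.?[[:space:]]" "$2"
}

# Hypothetical usage:
#   key_is_granted /etc/bind/ca_wildcard_key.conf /etc/bind/named.conf.local \
#       || echo "key name does not match any grant"
```

tsig-keygen also accepts the key name as its last argument, so generating the key with tsig-keygen -a hmac-sha512 ca_wildcard would avoid the rename altogether.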

After looking into this some more, it turns out that there are some timing requirements about requests and responses between the servers.  They need to be aligned [21].  This took me a bit to figure out, but it wasn’t really related to this entire certificate authority or DNS setup. 

I had installed NTP (Network Time Protocol) on the home assistant VM.  This apparently was overriding the native timedatectl for Ubuntu.  I tried a couple of things, but eventually just uninstalled it.  That fixed the BADTIME issue, and I was able to get the entire VM set to the right time.  This is one of those things where my older understanding of what to do here got in the way.  I usually install NTP after setting up Ubuntu.  I just need to get out of that habit now.  With that fixed, I was able to succeed in the DNS update.
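For reference, the 300 in the TSIG lines of the transcripts is the fudge value: the number of seconds by which the signer's timestamp and the server's clock may differ before the server answers BADTIME.  A minimal sketch of that comparison, with a made-up function name and the 300 second default taken from the transcripts:

```shell
# Succeed if two Unix timestamps are within the TSIG fudge window.
# Arguments: local epoch seconds, server epoch seconds, fudge (default 300).
within_fudge() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    [ "$diff" -le "${3:-300}" ]
}

# Hypothetical usage, with server_epoch read from `date +%s` on the DNS server:
#   within_fudge "$(date +%s)" "$server_epoch" || echo "clock skew too large for TSIG"
```

On Ubuntu, timedatectl status reports whether the system clock is currently synchronized, which is a quick first thing to look at when BADTIME shows up.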

$ nsupdate -k /home/nrweaver/ca_wildcard_key.conf -v
> server 192.168.2.4
> debug yes
> zone _acme-challenge.ubiquiti.internal
> update add _acme-challenge.ubiquiti.internal. 86400 TXT "test"
> show
Outgoing update query:
;; ->>HEADER<<- opcode: UPDATE, status: NOERROR, id:      0
;; flags:; ZONE: 0, PREREQ: 0, UPDATE: 0, ADDITIONAL: 0
;; ZONE SECTION:
;_acme-challenge.ubiquiti.internal. IN  SOA

;; UPDATE SECTION:
_acme-challenge.ubiquiti.internal. 86400 IN TXT "test"

> send
Sending update to 192.168.2.4#53
Outgoing update query:
;; ->>HEADER<<- opcode: UPDATE, status: NOERROR, id:  29608
;; flags:; ZONE: 1, PREREQ: 0, UPDATE: 1, ADDITIONAL: 1
;; ZONE SECTION:
;_acme-challenge.ubiquiti.internal. IN  SOA

;; UPDATE SECTION:
_acme-challenge.ubiquiti.internal. 86400 IN TXT "test"

;; TSIG PSEUDOSECTION:
ca_wildcard.            0       ANY     TSIG    hmac-sha512. 1606187811 300 64 <THIS IS A SECRET> 29608 NOERROR 0


Reply from update query:
;; ->>HEADER<<- opcode: UPDATE, status: NOTAUTH, id:  29608
;; flags: qr; ZONE: 1, PREREQ: 0, UPDATE: 0, ADDITIONAL: 1
;; ZONE SECTION:
;_acme-challenge.ubiquiti.internal. IN  SOA

;; TSIG PSEUDOSECTION:
ca_wildcard.            0       ANY     TSIG    hmac-sha512. 1606187811 300 64 <THIS IS A SECRET> 29608 NOERROR 0

This appeared to work; however, when I attempted to read the node back, I failed at first.

$ dig @ns1.internal internal axfr

; <<>> DiG 9.16.1-Ubuntu <<>> @ns1.internal internal axfr
; (1 server found)
;; global options: +cmd
; Transfer failed.

It took me a bit to figure out that the transfer of all node options is actually restricted in BIND itself under the allow-transfer options in /etc/bind/named.conf.local.  I was able to complete the dig on the actual DNS server itself.

# dig @ns1.internal ubiquiti.internal axfr

; <<>> DiG 9.16.1-Ubuntu <<>> @ns1.internal ubiquiti.internal axfr
; (1 server found)
;; global options: +cmd
; Transfer failed.
root@nameserver:/etc/bind# dig @ns1.internal internal axfr

; <<>> DiG 9.16.1-Ubuntu <<>> @ns1.internal internal axfr
; (1 server found)
;; global options: +cmd
internal.               604800  IN      SOA     internal. root.internal. 80 604800 86400 2419200 604800
internal.               604800  IN      NS      ns1.internal.
internal.               604800  IN      NS      ns2.internal.

I don’t really want to list everything I configured here, but I will state that the ACME challenge was not present!  This was confusing.  It appeared that everything was working, but I could not see the test record.  However, this appears to be a known issue that does not prevent the challenge from completing; there is a note in the guide about it. 

IMPORTANT: In 18.04, somehow, I can’t see the _acme-challenge sub-domain, yet the certbot command works as expected. I’m not totally sure what happens. The nsupdate command returns NOERROR, so it would look like it worked as expected.

https://linux.m2osw.com/setting-bind-get-letsencrypt-wildcards-work-your-system-using-rfc-2136
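One possible reason the test record never showed up, offered as a guess from the transcript rather than something the troubleshooting above confirmed: the reply to the second attempt still has status NOTAUTH, even though its TSIG section now reports NOERROR.  In RFC 2136, NOTAUTH on an update means the server is not authoritative for the zone named in the zone section, and the nsupdate session names _acme-challenge.ubiquiti.internal as the zone while BIND is only configured with the zone internal.  A sketch of the same update with the zone set to internal follows; the function name and the 60 second TTL are arbitrary choices for illustration.

```shell
# Print an nsupdate batch that adds one TXT record.
# Arguments: server, zone, record name, TXT value.
# All values are supplied by the caller; nothing here is specific to BIND's setup.
acme_txt_update() {
    printf 'server %s\n' "$1"
    printf 'zone %s\n' "$2"
    printf 'update add %s 60 TXT "%s"\n' "$3" "$4"
    printf 'send\n'
}

# Hypothetical usage, naming the configured zone rather than the record:
#   acme_txt_update 192.168.2.4 internal _acme-challenge.ubiquiti.internal. test \
#       | nsupdate -k /home/nrweaver/ca_wildcard_key.conf
# and then reading the single record back instead of using a zone transfer:
#   dig @192.168.2.4 _acme-challenge.ubiquiti.internal TXT +short
```

Querying the one TXT record directly also sidesteps the allow-transfer restriction, since it is an ordinary lookup rather than an AXFR.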

I decided I had properly configured this for now and wanted to pass the Certbot challenge.  However, since I am not pointing at Let’s Encrypt proper but at my own Certificate Authority, I needed to work on the Certbot command itself.  There is a sample in the original step-ca guide I was looking at for Certbot:

$ sudo REQUESTS_CA_BUNDLE=$(step path)/certs/root_ca.crt \
  certbot certonly -n --standalone -d foo.internal \
  --server https://ca.internal/acme/acme/directory

https://smallstep.com/blog/private-acme-server/

I combined that with the other DNS guide setup:

sudo certbot certonly \
  --dns-rfc2136 \
  --dns-rfc2136-credentials /etc/bind/letsencrypt_keys/certbot.ini \
  -d '*.restarchitect.com' \
  -d restarchitect.com

https://linux.m2osw.com/setting-bind-get-letsencrypt-wildcards-work-your-system-using-rfc-2136

I needed to create the rfc2136-credentials file, which looks like the following:

# Target DNS server
dns_rfc2136_server = 192.168.2.4
# Target DNS port
dns_rfc2136_port = 53
# TSIG key name
dns_rfc2136_name = ca_wildcard.
# TSIG key secret
dns_rfc2136_secret = <THIS IS A SECRET, IT IS IN THE TSIG-KEY FILE>
# TSIG key algorithm
dns_rfc2136_algorithm = HMAC-SHA512

One thing to note, I am using the raw IP address here, not ns1.internal, which I would prefer.  This appears to be a bit of a chicken and egg problem.  It cannot trust the DNS translation of its own IP address, so I needed to plug in the raw IP address.  I eventually got the following Certbot command to work:

$ sudo REQUESTS_CA_BUNDLE=/home/nrweaver/step/certs/root_ca.crt certbot certonly --dns-rfc2136 --dns-rfc2136-credentials /home/nrweaver/ca_wildcard_key.conf -n -d internal --server https://ca1.internal:9000/acme/acme/directory --agree-tos --email <my email>
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator dns-rfc2136, Installer None
Obtaining a new certificate
Performing the following challenges:
dns-01 challenge for internal
Unsafe permissions on credentials configuration file: /home/nrweaver/ca_wildcard_key.conf
Waiting 60 seconds for DNS changes to propagate
Waiting for verification...
Cleaning up challenges

IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at:
   /etc/letsencrypt/live/internal/fullchain.pem
   Your key file has been saved at:
   /etc/letsencrypt/live/internal/privkey.pem
   Your cert will expire on 2020-11-19. To obtain a new or tweaked
   version of this certificate in the future, simply run certbot
   again. To non-interactively renew *all* of your certificates, run
   "certbot renew"
 - If you like Certbot, please consider supporting our work by:

   Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
   Donating to EFF:                    https://eff.org/donate-le
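The "Unsafe permissions on credentials configuration file" warning in the output above is Certbot noticing that the credentials file, which holds the TSIG secret, is readable by other users.  A minimal sketch of creating that file with owner-only permissions follows; the function name is made up, and the arguments stand in for the values from the credentials file shown earlier.

```shell
# Write a certbot dns-rfc2136 credentials file readable only by its owner.
# Arguments: output path, DNS server IP, TSIG key name, TSIG secret.
write_rfc2136_credentials() {
    (
        umask 077    # files created in this subshell get mode 600
        cat > "$1" <<EOF
dns_rfc2136_server = $2
dns_rfc2136_port = 53
dns_rfc2136_name = $3
dns_rfc2136_secret = $4
dns_rfc2136_algorithm = HMAC-SHA512
EOF
    )
    chmod 600 "$1"   # also tighten a file that already existed
}

# Hypothetical usage (the secret comes from the tsig key file):
#   write_rfc2136_credentials /home/nrweaver/ca_wildcard_key.conf \
#       192.168.2.4 ca_wildcard. "$secret"
```

For a file that already exists, chmod 600 on it should be enough to make the warning go away.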

This appears to have worked!  I had my own certificate authority up and running.  Now I needed to install the correct certificates on all of my services.  The root cert is located in the directory where step-ca was installed under step/certs/root_ca.crt.  I used WinSCP to copy that onto my workstation and installed it by following this guide [22].  Then I checked in my browser that the root certificate was installed correctly by visiting https://ca1.internal:9000/health

With that, I have verified that once these certificates are installed correctly, this whole scheme should work.

Before I go into each of the individual certificate installs I performed, I want to cover two global things up front.  First is that I was attempting to use a wildcard certificate over *.internal for each of these installs.  Using Let’s Encrypt publicly, it is possible to simply generate one certificate and get it working for all subdomains of the top level domain.  For example, a wildcard certificate for *.google.com would be able to cover www.google.com and mail.google.com because both are subdomains of google.com [23].  I thought I was able to generate a wildcard certificate following the Certbot manual, and still think I did it correctly, but the certificate did not work on any subdomain.  The basic explanation I found was that a wildcard certificate only covers a single level of subdomain, but my hostnames are only one level below internal, so my setup appears to be correct in that respect [24].
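The single-level rule mentioned above can be written down as a small check.  This is a sketch of the rule as described in [24], not of what any particular browser implements, and the function name is made up.  Under this rule ubiquiti.internal does match *.internal, so the rule on its own does not explain the failure.

```shell
# Succeed if the host name is covered by the certificate name.
# A leading "*." stands for exactly one non-empty label; anything
# else must match exactly.
# Arguments: certificate name (pattern), host name.
wildcard_matches() {
    pattern=$1
    host=$2
    case $pattern in
        '*.'*) ;;                                # wildcard, handled below
        *) [ "$pattern" = "$host" ]; return ;;   # plain name: exact match only
    esac
    suffix=${pattern#?}          # drop the leading "*", e.g. ".internal"
    case $host in
        *"$suffix") ;;           # host ends with the suffix
        *) return 1 ;;
    esac
    label=${host%"$suffix"}      # the part the "*" has to stand for
    case $label in
        ''|*.*) return 1 ;;      # empty, or spans more than one label
    esac
    return 0
}

# Hypothetical usage:
#   wildcard_matches '*.internal' ubiquiti.internal && echo covered
```

Note that the bare name internal is not covered by *.internal under this rule, which is why wildcard requests usually list both names.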

From the Certbot usage page I can see the following:

-d DOMAIN, --domains DOMAIN, --domain DOMAIN
Domain names to apply. For multiple domains you can
use multiple -d flags or enter a comma separated list
of domains as a parameter. The first domain provided
will be the subject CN of the certificate, and all
domains will be Subject Alternative Names on the
certificate. The first domain will also be used in
some software user interfaces and as the file paths
for the certificate and related material unless
otherwise specified or you already have a certificate
with the same name. In the case of a name collision it
will append a number like 0001 to the file path name.

https://certbot.eff.org/docs/using.html

From lots of other guides I can see that the syntax appears to simply be *.<domain> for a wildcard certificate [25] [26].  Even in the screenshot, I am highlighting the SAN (Subject Alternative Name, basically multi-domain certs) showing a wildcard, and the Subject showing the wildcard as well.  The text of the error even uses the wildcard. 

I think I must have made a minor mistake somewhere, although I am not sure what it is.  I contacted a security friend of mine to ask about this.  He suggested I look into some configuration options for the ACME server and the CA, as wildcards may need to be enabled.  I agree with him that it looks like the * is not being interpreted as a wildcard, so it might be a configuration option. 

I found some documentation here [27], which helped on a few things I was having trouble with. But, there was nothing about wildcards not being allowed.  It also seems odd that Certbot did not have any issues with it either.  I am simply uncertain what I have done wrong here.

I did still want to make progress.  So, on my friend’s suggestion, I just added ubiquiti.internal to the certificate along with the wildcard.  That did work, even though it did not set the common name to be ubiquiti.internal.  I even followed the user guide and put that as the first domain, and it still does not show up correctly in the certificate.  I have verified that Certbot picked up ubiquiti.internal first because that is where it placed the certificates on the certifying machine.

It worked, but it did not mark the Common Name correctly (That is the subject in this certificate).  I ended up modifying the ForceCN on the ACME server for Step-ca at his suggestion as well.  That involved changing a config file located in the step directory at step/config/ca.json.  I edited the ACME section as follows:

{
    "type": "ACME",
    "name": "acme",
    "forceCN": true
}

That did not fix the issue.  

I decided to simply proceed like this.  I was doing certification on an external system, and programmatically generating 5 certificates and deploying them is not all that much more effort than generating one and deploying it 5 times.  It is simply figuring out the separate Certbot commands, which at this point is trivially changing the domains, and enabling the DNS challenge queries on them.

The last thing I wanted to address was the critically short lifetime of the certificates.  The previous examples were captured after I had fixed this, but it appears that the default lifetime for a step-ca certificate is 24 hours.  That was something I didn’t want to work with.  I read through the configuration documentation and set the default to 90 days, to make it similar to Let’s Encrypt.  I will probably extend it to a year or longer at some point, but a part of me wants to keep these at a lower clip, just to make sure I don’t lose some of this base knowledge.  This is done by configuring that same step/config/ca.json file’s ACME section as follows:

{
    "type": "ACME",
    "name": "acme",
    "forceCN": true,
    "claims": {
        "maxTLSCertDuration": "2160h",
        "minTLSCertDuration": "720h",
        "defaultTLSCertDuration": "2160h"
    }
}

One thing I do wish the documentation would address is what valid parameters are.  It turns out that “90d” is not an acceptable parameter like I had thought.  I had to check the logs to figure that out though.  I also ended up installing a JSON linter to validate the JSON, since restarting the Docker container and checking the logs was getting cumbersome.
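The values that did work are all written in hours.  My assumption, which the step-ca documentation cited here does not spell out, is that the durations are parsed as Go-style duration strings, whose largest unit is the hour.  A tiny helper for doing the conversion (the function name is made up):

```shell
# Convert a whole number of days into an hour-based duration string,
# e.g. 90 days becomes "2160h".
days_to_duration() {
    printf '%dh\n' "$(( $1 * 24 ))"
}

# Hypothetical usage:
#   days_to_duration 90
#   days_to_duration 30
```

For catching JSON mistakes before restarting the container, python3 -m json.tool step/config/ca.json is one linter that is often already installed; it exits with an error if the file does not parse.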

That covers the global set up configuration I changed, and the general approach towards generating the certificates I used.  Now let’s quickly cover the actual deployment to the various systems I wanted to use them for.  There are five services I wanted to install certificates on.  I had to do a fair bit of research on them to get them to work.  I will quickly go over all of them and what I had to do.

Ubiquiti Cloud Key

I started off with this one because I expected it to be the easiest.  I was wrong.  The Ubiquiti Unifi Cloud Key [28] is a controller for my network.  It essentially is a small ARM-based computer whose sole purpose is to run the controller software and web interface.  The Cloud Key runs on PoE (Power over Ethernet), which makes keeping it running as simple as keeping it plugged into one of the switches it manages.  It is possible to simply run this system in a VM, but I actually prefer to keep basic internet operations running like this.  Plus, this way, I can depend on Ubiquiti to do most of the maintenance on it, though I do need to log in and force updates periodically.

The Unifi system starts with a key signed in San Jose, which is likely a basic self-signed certificate.  I ran into a lot of trouble trying to figure out how to actually replace the default certificate on this system.  I started with this guide [29], which mentions no fewer than seven methods for doing this! After trying, and failing, with three of them, I gave up on it. I honestly don’t have logs or screenshots from all of these methods.  Maybe I didn’t follow the guide correctly, but I don’t think so.  I started with the pre-commands:

Connect to the UniFi Cloud Key.

Stop UniFi Controller by running:
service unifi stop

Remove the symbolic link to the default certificate file and copy the real certificate file via:
rm /usr/lib/unifi/data/keystore && cp /etc/ssl/private/unifi.keystore.jks /usr/lib/unifi/data/keystore

Comment out or remove the following line in /etc/default/unifi
UNIFI_SSL_KEYSTORE=/etc/ssl/private/unifi.keystore.jks

Restart UniFI Controller using this command:
service unifi start

After this, proceed with the SSL setup.

https://www.namecheap.com/support/knowledgebase/article.aspx/10134/33/installing-an-ssl-certificate-on-ubiquiti-unifi#ace_jar

These appear to be based on a community article I discovered later [30] or possibly [31].  Side note, two articles stating the same instructions four years apart is probably not a good thing, as it shows the issues persist and aren’t addressed.  

I tried the java jar method, the keytool method for PEM files, and the keytool method for PKCS12.  They did not install the certificates correctly.  I saw a couple more guides that get even crazier, for example [32] [33].  Ultimately I found this guy’s shell script [34], which I used as a baseline for writing my own script to decrease iteration time for installing certificates.  I hacked it up to look like:

#!/bin/bash

# Backup current certificate. Just in case. Can never be too careful
tar -zcvf /root/CloudKeySSL_`date +%Y-%m-%d_%H.%M.%S`.tgz /etc/ssl/private/*

# Delete current files
rm -f   /etc/ssl/private/cert.tar                           \
        /etc/ssl/private/unifi.keystore.jks                 \
        /etc/ssl/private/unifi.keystore.jks.md5             \
        /etc/ssl/private/cloudkey.crt                       \
        /etc/ssl/private/cloudkey.key


# Bundle the certificate chain and private key into a PKCS12 file
# Note, aircontrolenterprise is not arbitrary. this is what UniFi is expecting
openssl pkcs12 -export -in /root/fullchain.pem    \
                    -inkey /root/privkey.pem    \
                      -out /etc/ssl/private/cloudkey.p12    \
                      -name unifi -password pass:aircontrolenterprise

# Import keys into Java Key Store
keytool -importkeystore -deststorepass aircontrolenterprise \
            -destkeypass aircontrolenterprise               \
            -destkeystore /usr/lib/unifi/data/keystore      \
            -srckeystore /etc/ssl/private/cloudkey.p12      \
            -srcstoretype PKCS12 -srcstorepass aircontrolenterprise -alias unifi


# Cleanup
rm -f /etc/ssl/private/cloudkey.p12

pushd /etc/ssl/private

cp /root/fullchain.pem cloudkey.crt
cp /root/privkey.pem cloudkey.key

# Create tar file cloudkey expects
tar -cvf cert.tar *

# set permissions
chown root:ssl-cert /etc/ssl/private/*
chmod 640           /etc/ssl/private/*

popd

# Test
/usr/sbin/nginx -t

echo "Press enter to restart nginx and unifi"
read

/etc/init.d/nginx restart
/etc/init.d/unifi restart

Essentially the plan here is to copy the fullchain.pem and the privkey.pem from the certification directory to the Ubiquiti Cloud Key and run this script.  I don’t remember all of the things I attempted before I hacked this together.  But I do remember discovering a few key pieces of information along the way.

  1. The /etc/ssl/private/ directory expects and needs a cloudkey.crt and cloudkey.key as well as a tar’d up version called cert.tar of everything in this entire directory.  This is validated on startup.
  2. The default keystore is in /usr/lib/unifi/data/keystore, and must be deleted.  There is a backup empty keystore in /etc/ssl/private/unifi.keystore.jks that can be used to replace it.
  3. The keystore expects a unifi key to exist, and that is where it looks for the correct certificate.
  4. There is a place to store the keystore password in /usr/lib/unifi/data/system.properties.  It looks for the parameter app.keystore.pass=<password> for the password to the keystore.  Default is aircontrolenterprise.
  5. The permissions must be changed on all of the files in /etc/ssl/private/ to root:ssl-cert with file permissions of 640 (that is just a bitmask for owner has read/write and group has read permissions) or the validation of files here fails.

I figured these out from all of the guides, though not all of the guides agree on what to do and what is important.  I strongly suspect that is due to different Ubiquiti Cloud Key software versions, which aren’t well documented in any of the community forums or guides.  With that shell script and the knowledge I had built up from a day of failing to get the Ubiquiti Cloud Key to use the certificates I was generating, I was finally able to get the Cloud Key to actually use the certificate!

I thought I had a victory here, but it turns out that this only works if I restart the services.  It did not survive an actual Cloud Key reboot, and I really don’t want to attempt a firmware upgrade on the thing and watch it blow away the certificates and regenerate its self-signed certificates again. 

I did read through a lot of community forum posts and guides, though, and one thing kept coming up: when using a Windows file transfer, all of the files need to be copied into the /etc/ssl/private/ directory in one command [35].  This seemed like a minor point to me, but I do know that the Cloud Key validates the cert.tar file against what is in the /etc/ssl/private/ directory.  I am speculating that it also checks timestamps to validate that the certificates haven’t been messed with.  In addition, several people on the forums reported needing to keep a copy of /usr/lib/unifi/data/keystore in the /etc/ssl/private/ directory.  I hacked up the shell script to cover these cases:

#!/bin/bash

# Backup current certificate. Just in case. Can never be too careful
tar -zcvf /root/CloudKeySSL_`date +%Y-%m-%d_%H.%M.%S`.tgz /etc/ssl/private/*

# Delete current files
rm -f   /etc/ssl/private/cert.tar                           \
        /etc/ssl/private/unifi.keystore.jks                 \
        /etc/ssl/private/unifi.keystore.jks.md5             \
        /etc/ssl/private/cloudkey.crt                       \
        /etc/ssl/private/cloudkey.key


# Bundle the certificate chain and private key into a PKCS12 file
# Note, aircontrolenterprise is not arbitrary. this is what UniFi is expecting
openssl pkcs12 -export -in /root/fullchain.pem    \
                    -inkey /root/privkey.pem    \
                      -out /etc/ssl/private/cloudkey.p12    \
                      -name unifi -password pass:aircontrolenterprise

# Import keys into Java Key Store
keytool -importkeystore -deststorepass aircontrolenterprise \
            -destkeypass aircontrolenterprise               \
            -destkeystore /usr/lib/unifi/data/keystore      \
            -srckeystore /etc/ssl/private/cloudkey.p12      \
            -srcstoretype PKCS12 -srcstorepass aircontrolenterprise -alias unifi


# Cleanup
rm -f /etc/ssl/private/cloudkey.p12

pushd /etc/ssl/private

cp /root/fullchain.pem cloudkey.crt
cp /root/privkey.pem cloudkey.key
cp /usr/lib/unifi/data/keystore unifi.keystore.jks

# Create tar file cloudkey expects
tar -cvf cert.tar *

# set permissions
chown root:ssl-cert /etc/ssl/private/*
chmod 640           /etc/ssl/private/*
touch               /etc/ssl/private/*

popd

# Test
/usr/sbin/nginx -t

echo "Press enter to restart nginx and unifi"
read

/etc/init.d/nginx restart
/etc/init.d/unifi restart

This works and survives a reboot.  The touch command for everything in /etc/ssl/private/ after completion just gives all the files the same timestamp, and the keystore is copied after adding the cert in.  I did ask about all of this on the forums, and was told that Ubiquiti has taken custom SSL certificates as a feature request, but I don’t see a specific date for when to expect this. 
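Since the Cloud Key seems to compare cert.tar against the directory contents, it can be worth confirming what actually went into the archive before rebooting.  This is a small sketch with a made-up function name; it assumes the archive was created from inside /etc/ssl/private/ with tar -cvf cert.tar *, so entry names carry no directory prefix.

```shell
# Succeed if the tar archive lists an entry with exactly this name.
# Arguments: archive path, entry name.
cert_tar_has() {
    tar -tf "$1" | grep -Fqx -- "$2"
}

# Hypothetical usage on the Cloud Key:
#   for f in cloudkey.crt cloudkey.key unifi.keystore.jks; do
#       cert_tar_has /etc/ssl/private/cert.tar "$f" || echo "missing: $f"
#   done
```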

All I can say is that this was way more difficult than I expected it to be.  There is a ton of contradictory information out there, and even now, I am not sure which of the steps are truly necessary.  I can only say that this script works for me.  Custom SSL certificate support needs quite a bit of work from Ubiquiti here.

I typed up a rather large README.txt file for myself to cover how to do this in the future, but most of my discoveries are in the shell script.

VMWare ESXi

This was a breath of fresh air compared to the Ubiquiti Cloud Key.  There is a nice support article from VMware on it, and that was all I needed [36].  I haven’t installed their vCenter Server yet.  I did that once, and it took up 500GB of disk space.  I decided to put that off until I have the storage VM up and running and can use those drives for this purpose.  The 2TB drives I have for fast VMs don’t need that much space taken up by management software.  Reading through the section on Installing and configuring the certificate on the ESXi Host, I skipped ahead to point 6 and stopped at point 11.

6. Log in to the host and then navigate to /etc/vmware/ssl

7. Copy the files to a backup location, such as a VMFS volume

8. Log in to the host with WinSCP and navigate to the /etc/vmware/ssl directory

9. Delete the existing rui.crt and rui.key from the directory

10. Copy the newly created rui.crt and rui.key to the directory using Text Mode or ASCII mode to avoid the issue of special characters ( ^M) appearing in the certificate file

11. Type vi rui.crt to validate that there are no extra characters

Note: There should not be any erroneous ^M characters at the end of each line.

https://kb.vmware.com/s/article/2113926

That worked: the fullchain.pem became rui.crt and the privkey.pem became rui.key.  I then, briefly, looked for the web interface method to execute Restart Management Agents, but couldn’t find it.  I just rebooted.  It came up working.
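The ^M check from points 10 and 11 of the article can also be done without opening vi.  A minimal sketch, with made-up function names, that detects carriage returns in a certificate file and writes a cleaned copy:

```shell
# Succeed if the file contains any carriage-return (^M) characters.
has_cr() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Write a copy of the file with all carriage returns removed.
# Arguments: input path, output path.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Hypothetical usage before copying the files to the host:
#   has_cr rui.crt && strip_cr rui.crt rui.crt.clean
```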

FreeNAS Storage VM

This was also pretty easy to accomplish.  There is a web GUI for it, which I will probably eventually want to automate, but it appears that FreeNAS is expanding and even has support for setting up a Certificate Authority.  I just wanted to get these certificates installed.  I can likely automate renewal at a later point if I wish.

First I navigated to System->Certificates

Then I clicked on add in the top right, and selected the drop down for Import Certificate.

After that I copied in the certificate text and gave it a name (This is just a test, I don’t want to put up a picture of my certificates).

The last thing that needs to be done is to tell the FreeNAS system to use the certificate I just imported.  That is done from the Setting->General section, click the GUI SSL Certificate dropdown and select the certificate just added.

A reboot later and the whole system is now using the correct certificate.

Supermicro IPMI

This was one I thought would be simple, but turns out to not be.  There is a quick GUI interface for it.  I logged in and clicked on Configuration -> SSL Certification.

Then I uploaded the fullchain.pem and the privkey.pem files.

I clicked on upload and a dropdown popped down warning me I was replacing SSL Certificates.

I clicked okay, and it informed me it needed to reboot.

Unfortunately this does not work.  It didn’t take long to figure out the cause.  It appears that the IPMI web server is only reading the top level certificate and not including the intermediate chain.

Without the intermediate certificate, the browser cannot connect this certificate to the trusted root certificate I previously installed.  I pinged a couple people on this, and they were not that surprised.  I think this is a bit of a known issue.  Nobody had any good suggestions on what to try here. 
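When chasing a missing intermediate, it helps to confirm what is in the bundle being uploaded.  Certbot's fullchain.pem holds the leaf certificate first, followed by the intermediates.  A minimal sketch, with made-up function names, for counting and extracting the certificates in a PEM bundle:

```shell
# Print how many certificates a PEM bundle contains.
count_certs() {
    grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Print only the Nth certificate (1 = leaf in a certbot fullchain.pem).
# Arguments: bundle path, index.
nth_cert() {
    awk -v want="$2" '
        /-----BEGIN CERTIFICATE-----/ { n++ }
        n == want { print }
        /-----END CERTIFICATE-----/ && n == want { exit }
    ' "$1"
}

# Hypothetical usage:
#   count_certs fullchain.pem
#   nth_cert fullchain.pem 2 > intermediate.pem
```

To see which certificates a device actually presents, openssl s_client -connect <host>:443 -showcerts prints every certificate the server sends during the handshake.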

I did find a Python script to automate uploading these certificates in the future [37].  It doesn’t work for me, but if I do manage to get these certificates working, I will likely spend some time on this script and fix it up.  Given the methodology I am using, I would prefer to script a command line version for a cron job (which automatically runs commands at specific times and/or intervals) to generate and upload the new certificates.

At this point, I have kinda given up on these.  I would prefer to figure this out, but after a fair bit of research, all I am certain of is that it doesn’t load the full chain of certificates and I don’t see anything else to try.  Perhaps I am getting a bit fatigued on these issues and will try again later.

Tyan IPMI

The Tyan IPMI turned out exactly like the Supermicro IPMI.  I think they are both manufactured by ASpeed underneath, so it’s entirely possible this is simply a manufacturer issue.  I will cover the GUI method for uploading a new SSL certificate and show the same issue about not including the intermediate certificate.

First I logged in and navigated to the settings screen.

Then I selected SSL Settings.

Then I selected Upload SSL Certificates.

I then clicked and selected the fullchain.pem for the new certificate, and the privkey.pem for the private key.

I then clicked save and it told me to reload the page.

Lastly it still identifies itself as insecure because it is missing the intermediate certificate.

I tried reversing the order of the certificates in the fullchain.pem file.  Next, I tried just the chain.pem file and the cert.pem file.  I even tried converting it to a DER file and a PFX file, both different certificate formats.  None of these fixed the issue. 

I am again at a loss for what to try next.  For now, I have concluded that the IPMIs have a bug that I am not sure how to work around yet.  I may try command line IPMI at some point, but I am not optimistic.  

Conclusion

At the end of the day though, I have a Certificate Authority that uses ACME and DNS challenges to generate SSL Certificates for all of my internal services without ever exposing anything to the internet at large.  I only need to install the CA’s root certificate on my devices and most of these pesky insecure warnings will disappear for me.  In addition all of my services have internal domain names for added flexibility and human remembrance.  I also learned quite a bit about how all of these actually connect together and work.  I am quite pleased with this end result.

References

[1] https://en.wikipedia.org/wiki/Domain_Name_System

[2] https://www.thesitewizard.com/domain/register-with-icann-sans-middlemen.shtml

[3] https://en.wikipedia.org/wiki/Comparison_of_DNS_server_software

[4] https://opensource.com/article/17/4/build-your-own-name-server

[5] https://www.digitalocean.com/community/tutorials/how-to-configure-bind-as-a-private-network-dns-server-on-ubuntu-18-04#testing-clients

[6] https://coredns.io/

[7] https://stats.ipnet.hu/2017/04/04/dns-server-software-usage-statistics/

[8] https://oitibs.com/bind9-logs-on-debian-ubuntu/

[9] https://windowsreport.com/install-windows-10-root-certificates/

[10] https://tosbourn.com/getting-os-x-to-trust-self-signed-ssl-certificates/

[11] https://thomas-leister.de/en/how-to-import-ca-root-certificate/

[12] https://howhttps.works/certificate-authorities/

[13] https://letsencrypt.org/docs/challenge-types/

[14] https://www.leebutterman.com/2019/08/05/analyzing-hundreds-of-millions-of-ssl-connections.html

[15] https://smallstep.com/blog/private-acme-server/

[16] https://github.com/smallstep/certificates

[17] https://hub.docker.com/r/smallstep/step-ca

[18] https://hub.docker.com/r/smallstep/step-ca

[19] https://linux.m2osw.com/setting-bind-get-letsencrypt-wildcards-work-your-system-using-rfc-2136

[20] https://unix.stackexchange.com/questions/523565/how-to-generate-tsig-key-for-certbot-plugin-certbot-dns-rfc2136

[21] https://tools.ietf.org/id/draft-dupont-dnsop-rfc2845bis-01.html#time_check

[22] https://windowsreport.com/install-windows-10-root-certificates/

[23] https://knowledge.digicert.com/generalinformation/INFO900.html

[24] https://www.ssl2buy.com/wiki/how-to-fix-err_cert_common_name_invalid-in-chrome

[25] https://medium.com/@saurabh6790/generate-wildcard-ssl-certificate-using-lets-encrypt-certbot-273e432794d7

[26] https://linux.m2osw.com/setting-bind-get-letsencrypt-wildcards-work-your-system-using-rfc-2136

[27] https://smallstep.com/docs/step-ca/configuration

[28] https://www.ui.com/unifi/unifi-cloud-key/

[29] https://www.namecheap.com/support/knowledgebase/article.aspx/10134/33/installing-an-ssl-certificate-on-ubiquiti-unifi#ace_jar

[30] https://community.ui.com/questions/Cloud-Key-is-not-using-public-certificate-after-reboot/28b0b25e-a756-43d5-aed7-f3e3626b54d3

[31] https://community.ui.com/questions/Cloud-Key-Not-Using-Public-Certificate-After-Reboot/8213d85c-de21-400e-9306-18101c68f968

[32] https://community.ui.com/questions/How-to-install-a-SSL-Certificate-on-Unifi-Cloud-Key/944dbbd6-cbf6-4112-bff5-6b992fcbf2c4

[33] https://community.spiceworks.com/how_to/128281-use-lets-encrypt-ssl-certs-with-unifi-cloud-key

[34] https://blog.arrogantrabbit.com/ssl/Ubiquiti-SSL/

[35] https://community.ui.com/questions/How-to-install-a-SSL-Certificate-on-Unifi-Cloud-Key/944dbbd6-cbf6-4112-bff5-6b992fcbf2c4

[36] https://kb.vmware.com/s/article/2113926

[37] https://gist.github.com/HQJaTu/963db9af49d789d074ab63f52061a951

This Post Has One Comment

  1. Duncan

    Supermicro, you can upload certificates without a problem for X11 boards. X10 need a modified version of ipmi-updater.py that ignores the error codes.
    The certificates need to be 2048 bits and have no intermediate issuer, just a CA and certificate.

    Haven’t got Tyan IPMI certificate to work yet, it seems rather buggy.
