2021 Supporting Vanity Urls (Custom Domains, DNS CNAMES) in B2

- by Brian Wilson, 8/30/2021

Comments and suggestions are welcome.
 

Goals of This Document:
Specify the UI, along with some thoughts on implementation, for supporting Vanity URLs (also called "custom domains" or "CNAME" support) in B2.

What is the Feature?
The ability for a B2 customer to have a URL of type #4 below serve the same content as all the other URLs:

  1. Native URL: https://f003.backblazeb2.com/b2api/v1/b2_download_file_by_id?fileId=4_zcf3abfd4ca1f5d2172230d16_f105ea9c5e8af8cf4_d20200526_m205255_c003_v0312003_t0024
     

  2. Friendly URL: https://f003.backblazeb2.com/file/catsanddogs/cutedogs/cute_puppy.jpg
     

  3. S3 URL: https://catsanddogs.s3.eu-central-003.backblazeb2.com/cutedogs/cute_puppy.jpg
     

  4. Vanity URL: https://sharefiles.kookbeach.com/cutedogs/cute_puppy.jpg

A "vanity URL" is one term for a custom domain.  B2 already has the functionality to serve any of the URLs #1 - #3 above; this feature implements URL #4.


Isn't this possible already?
Not by using Backblaze B2 alone.  It is possible to configure 3rd party systems on top of B2, such as putting a CDN like Cloudflare in front of a B2 bucket, to achieve roughly the same END USER result.  But it is difficult and error prone, not many people have ever pulled it off, and it isn't an option for regular people.  Configuring a CDN is something only professional IT people do, and then you have to deal with two separate companies, including bills from both.
 

Is this building a CDN?
No.  A CDN is something completely different.  In the past we have used a CDN as a total hack to achieve this for certain large customers, and this has brain damaged the way people think.  This is just another URL to access the same one file.  We already have 3 URLs to access a file, and we want a 4th.  It doesn't change the performance like a CDN does, it doesn't increase or decrease uptime like a CDN does, and it doesn't do "edge caching" (geography based caching) like a CDN does.  It literally has nothing at all to do with a CDN.  Nothing.

 
Is this building a Hosting Feature?
No.  Hosting is something completely different.  This is just another URL to access the same one file.  This has zero to do with hosting.  This is not hosting any more than the S3 URL is "hosting" or the friendly URL is "hosting" or the Native URL is "hosting".  Why would you think serving up another URL is "hosting"?  I feel you aren't thinking clearly.

 
Why Do This Feature?
There are several reasons to do this feature:

  1. Make more money. 
     

  2. Customers have requested it.  (One example: https://www.reddit.com/r/backblaze/comments/p5eeue/update_on_vanity_subdomain/ )
     

  3. Make it really easy to do this for customers.  Customers don't like hard to use products.  Customers like easy to use products.
     

  4. From customer comment here: https://www.reddit.com/r/backblaze/comments/p5eeue/update_on_vanity_subdomain/hb4g468/
    "I would also note that this feature would make CDNs much easier to use. For example with Cloudflare, instead of all the frankly weird page rule stuff discussed at https://help.backblaze.com/hc/en-us/articles/217666928-Using-Backblaze-B2-with-the-Cloudflare-CDN , you simply point photos.example.com to the B2 bucket hostname, and it just works (plus your bucket name is properly hidden behind Cloudflare, instead of leaked as part of the semi-vanity URL)."
     

What is This Feature NOT DOING?
This feature WILL NOT be implementing hosting, and WILL NOT be implementing a CDN.  That is not a goal.  This is just one more URL to a file, that's all.

 

How A Customer With Their Own DNS Configures This:
BrianW uses "hover" as the domain registrar for kookbeach.com (kookbeach.com is a vanity domain, for a vanity URL).  So here are the steps after logging into hover.com:

PLEASE NOTE: the name of the B2 bucket I'm configuring with a Vanity URL is "catsanddogs", and the vanity URL is https://sharefiles.kookbeach.com

  1. Click "My Account" in upper right, and select "Control Panel" -> a list of "domains" will be shown 
     

  2. Click on "kookbeach.com" on the left.
     

  3. You should be on the "Overview" tab.  Click "Edit" by Nameservers and make sure they are pointed at the hover DNS servers ns1.hover.com and ns2.hover.com
     

  4. Click the "DNS" tab.
     

  5. MAYBE NOT NECESSARY -> Add an "A Record" to point at a random IP address like 13.225.51.54 (that's where ski-epic.com resolves to)
      
     

  6. Add a "CNAME Record" to point sharefiles.kookbeach.com to catsanddogs.s3.eu-central-003.backblazeb2.com
    (NOTE: this may work fine, but to generate the certificate it may be required that this is an A record.  This needs to be tested.)

     

  7. Now the DNS tab should look like this:

Ok, so at this point you are all finished configuring your DNS.  Try "ping sharefiles.kookbeach.com" (it might take a few hours for this to work because DNS propagation is slow) and it should say:

% ping sharefiles.kookbeach.com  <-- type this in a command prompt on Windows

Pinging catsanddogs.s3.eu-central-003.backblazeb2.com [45.11.37.254] with 32 bytes of data:
Reply from 45.11.37.254: bytes=32 time=122ms TTL=45
Reply from 45.11.37.254: bytes=32 time=122ms TTL=45
Reply from 45.11.37.254: bytes=32 time=122ms TTL=45


And also, if you try "ping catsanddogs.s3.eu-central-003.backblazeb2.com" it says the same IP address:

% ping catsanddogs.s3.eu-central-003.backblazeb2.com  <-- type this in a command prompt on Windows

Pinging s3.eu-central-003.backblazeb2.com [45.11.37.254] with 32 bytes of data:
Reply from 45.11.37.254: bytes=32 time=384ms TTL=45
Reply from 45.11.37.254: bytes=32 time=348ms TTL=45
Reply from 45.11.37.254: bytes=32 time=421ms TTL=45

That's it!  Those are all the changes required OUTSIDE of Backblaze.
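The ping comparison above can also be scripted.  Here is a minimal sketch in Python using only the standard library.  The function names resolve_ips and points_at_same_servers are made up for this example, and it only checks that the two hostnames share at least one IP address, which is the same thing the two pings show by eye.

```python
import socket
from typing import Callable, Set


def resolve_ips(hostname: str) -> Set[str]:
    """Ask the system resolver for every IP address behind hostname."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def points_at_same_servers(
    vanity_host: str,
    bucket_host: str,
    resolver: Callable[[str], Set[str]] = resolve_ips,
) -> bool:
    """True when the vanity hostname and the bucket hostname share an IP address."""
    return bool(resolver(vanity_host) & resolver(bucket_host))


if __name__ == "__main__":
    # Hostnames taken from the example in this document.
    ok = points_at_same_servers(
        "sharefiles.kookbeach.com",
        "catsanddogs.s3.eu-central-003.backblazeb2.com",
    )
    print("CNAME looks good" if ok else "DNS not pointing at B2 (yet)")
```

The resolver is passed in as a parameter only so the comparison can be exercised without touching real DNS.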
 

GUI Changes to Backblaze Web Interface:
We need the ability to specify that this bucket accepts a particular CNAME.  Here is what that would look like:

Now, when the customer pops up the "Details" information for any file in that bucket, there is one additional URL added.  See below:

That's it for GUI changes.


What The Underlying Functionality Does:
There are three things that have to be implemented.

  1. The API/Download servers need to accept requests to https://sharefiles.kookbeach.com.  This hinges on the HTTP "Host" request header (defined in RFC 7230, section 5.4).  Right now the API/Download servers at https://f003.backblazeb2.com expect a Host header like "Host: catsanddogs.s3.eu-central-003.backblazeb2.com:443" and reject anything that doesn't look like that.  This code needs to be modified to also accept requests whose Host header matches the "acceptCNAME" set on the bucket being accessed.  It might also be interesting to look into "Server Name Indication" (SNI), but it may not be necessary.  Here are some places in our current Java code to look:
          - bzmono/www/java/src/com/backblaze/modules/s3Compat/b2_api/guts/S3HostnameParts.java - object "S3HostnameParts"
          - bzmono/www/java/src/com/backblaze/modules/s3Compat/b2_api/servlet/S3CompatDispatcher.java - rejects HTTPS request if "Host Header" is not correct.
     

  2. When a request comes in for https://sharefiles.kookbeach.com/cutedogs/cute_puppy.jpg then the API/Download server needs to serve up the same thing it served when it gets the request for https://catsanddogs.s3.eu-central-003.backblazeb2.com/cutedogs/cute_puppy.jpg
     

  3. Backblaze needs to obtain an HTTPS certificate for sharefiles.kookbeach.com through the "HTTP-01 ACME challenge".

     

Of these three steps, I believe #1 and #2 are fairly straightforward.  I also think we can build #1 and #2 to get that part working, then tackle #3 once everything else is working correctly.  So the section below here is dedicated to how to do #3...
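To make #1 and #2 concrete, here is a minimal sketch of the Host header decision, written in Python rather than the Java that would actually ship.  Everything named here is hypothetical: bucket_for_host is a made-up function, and vanity_to_bucket stands in for whatever lookup the real code would build from the "acceptCNAME" bucket property.  It is not based on what S3HostnameParts or S3CompatDispatcher really do.

```python
from typing import Dict, Optional

# Cluster hostname suffix taken from the S3 URL example in this document.
S3_SUFFIX = ".s3.eu-central-003.backblazeb2.com"


def bucket_for_host(
    host_header: str,
    vanity_to_bucket: Dict[str, str],
    s3_suffix: str = S3_SUFFIX,
) -> Optional[str]:
    """Map a Host header value to a bucket name, or None to reject the request."""
    # Hostnames are case-insensitive; drop an optional ":port" such as ":443".
    # (IPv6 literals are not handled; they are never valid bucket hostnames here.)
    host = host_header.strip().lower().partition(":")[0]

    # Existing behavior: <bucket>.s3.<cluster>.backblazeb2.com
    if host.endswith(s3_suffix):
        bucket = host[: -len(s3_suffix)]
        return bucket if bucket and "." not in bucket else None

    # New behavior: a vanity hostname some bucket has opted into via "acceptCNAME".
    return vanity_to_bucket.get(host)


if __name__ == "__main__":
    vanity = {"sharefiles.kookbeach.com": "catsanddogs"}
    for h in ("catsanddogs.s3.eu-central-003.backblazeb2.com:443",
              "sharefiles.kookbeach.com",
              "unknown.example.com"):
        print(h, "->", bucket_for_host(h, vanity))
```

Once both hostnames map to the same bucket name, the rest of the request (the path /cutedogs/cute_puppy.jpg) can be handled exactly as it is for the S3 URL, which is all that #2 asks for.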
 

How to implement #3 above:
There is a beauty to getting a LetsEncrypt certificate from the "HTTP-01 ACME challenge".  Here is a quote from https://letsencrypt.org/how-it-works/:

"The objective of Let's Encrypt and the ACME protocol is to make it possible to set up an HTTPS server and have it automatically obtain a browser-trusted certificate, without any human intervention. This is accomplished by running a certificate management agent on the web server."

Quick explanation of the HTTP-01 ACME challenge: to prove the Backblaze API/Download servers have the "rights" to serve up content over SSL/HTTPS for sharefiles.kookbeach.com, the API/Download servers contact the LetsEncrypt servers with a request for an HTTPS certificate for that name.  The LetsEncrypt servers respond with a token, which determines the name of a file to serve from a certain location with certain contents (the token combined with a fingerprint of the requesting ACME account's key).

 

After the Backblaze Download/API servers create that file in the user's bucket and are willing to serve it up on HTTP (notice it is not HTTPS), LetsEncrypt fetches the contents of http://sharefiles.kookbeach.com/.well-known/acme-challenge/8303 (notice that is not HTTPS yet) and verifies that the challenge was completed correctly.  That proves the Backblaze API/Download servers are authorized to do certificate management for sharefiles.kookbeach.com, and that's the concept of the HTTP-01 ACME challenge.
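For the curious, here is a minimal sketch of what goes into that challenge file, as I understand RFC 8555 (ACME) and RFC 7638 (JWK thumbprints): the body is the token, a dot, and the base64url SHA-256 thumbprint of the ACME account's public key.  In practice certbot computes all of this for you.  The function names and the demo key values below are made up for illustration and are not a real key.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Base64url encoding without trailing '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(rsa_jwk: dict) -> str:
    """RFC 7638 thumbprint of an RSA public key given in JWK form."""
    # Only the required members, sorted by name, serialized with no whitespace.
    required = {"e": rsa_jwk["e"], "kty": rsa_jwk["kty"], "n": rsa_jwk["n"]}
    canonical = json.dumps(required, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, account_jwk: dict) -> str:
    """Body of the file served for an HTTP-01 challenge."""
    return f"{token}.{jwk_thumbprint(account_jwk)}"


def challenge_path(token: str) -> str:
    """URL path the certificate authority fetches over plain HTTP."""
    return f"/.well-known/acme-challenge/{token}"


if __name__ == "__main__":
    # Placeholder values only; a real account key has a long base64url modulus.
    demo_jwk = {"kty": "RSA", "n": "demo-modulus-not-a-real-key", "e": "AQAB"}
    print(challenge_path("8303"))
    print(key_authorization("8303", demo_jwk))
```

Real tokens are longer random base64url strings; "8303" is just the token from the example URL above.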

So to implement this, here are some things it will involve.

First, we have to turn on HTTP access (no SSL) to the API/Download servers, because there is a chicken and egg problem: how do you get an SSL cert if you don't have one to communicate with?  When requests come into port 80 (HTTP), the Java code should be carefully written to only accept requests that appear to be LetsEncrypt requests for vanity domain certs.  So for example, if "https://sharefiles.kookbeach.com" is the only vanity domain enabled in cluster 003, then the Java code should only respond on port 80 (no SSL, not HTTPS) to requests for that vanity domain, and furthermore only this URL should be allowed: http://sharefiles.kookbeach.com/.well-known/acme-challenge/<TOKEN>.  Finally, this is ONLY on the API/Download servers in the cluster that this bucket is in.
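Here is a minimal sketch of that port 80 filter, in Python rather than Java.  The function name allow_plain_http and the vanity_domains set are hypothetical.  The token pattern assumes tokens use only the base64url alphabet (letters, digits, "-" and "_"), which is my reading of the ACME spec.

```python
import re
from typing import Set

# Only the ACME challenge path, with a token made of base64url characters.
ACME_PATH = re.compile(r"/\.well-known/acme-challenge/[A-Za-z0-9_-]+")


def allow_plain_http(host_header: str, path: str, vanity_domains: Set[str]) -> bool:
    """Decide whether a request arriving on port 80 (no SSL) may be answered."""
    # Hostnames are case-insensitive; ignore an optional ":80" port suffix.
    host = host_header.strip().lower().partition(":")[0]
    if host not in vanity_domains:
        return False  # not a vanity domain enabled in this cluster
    # fullmatch rejects anything before or after the challenge path.
    return ACME_PATH.fullmatch(path) is not None


if __name__ == "__main__":
    enabled = {"sharefiles.kookbeach.com"}
    print(allow_plain_http("sharefiles.kookbeach.com",
                           "/.well-known/acme-challenge/8303", enabled))
    print(allow_plain_http("sharefiles.kookbeach.com",
                           "/cutedogs/cute_puppy.jpg", enabled))
```

Everything else on port 80 gets refused, so turning on plain HTTP does not open up regular file downloads without SSL.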

The API/Download servers all need to have the program "certbot" installed. To install certbot on Debian, an admin types these commands:

# apt-get update
# apt-get install certbot

This should be translated into however we get software installed on API/Download systems at Backblaze.  After that is done and the API/Download servers have certbot available, then when a customer configures the bucket with the property { "acceptCNAME": "sharefiles.kookbeach.com" }, the Java code runs this command to get a valid cert for sharefiles.kookbeach.com:

# certbot certonly --standalone -d sharefiles.kookbeach.com

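That's it, the valid SSL/HTTPS cert will appear in the folder below.  (One thing to test: "--standalone" has certbot start its own temporary web server on port 80 to answer the challenge, so it can't run while Tomcat is bound to port 80 on the same machine.  certbot's "--webroot" mode is the alternative when another web server is already serving port 80.)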

# ls /etc/letsencrypt/live/sharefiles.kookbeach.com/
cert.pem chain.pem fullchain.pem privkey.pem README
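Since the hostname comes from a customer, whatever code launches certbot should validate it and never pass it through a shell.  Here is a minimal sketch in Python (the real thing would be Java).  The function names are made up, the hostname pattern is a deliberately strict assumption of mine, and "--non-interactive" is my addition so certbot never stops to ask a question.  On a machine that has never run certbot, account registration flags such as "--agree-tos" and "--email" would also be needed.

```python
import re
import subprocess
from typing import List

# Strict hostname check: dot-separated labels of letters, digits and hyphens.
HOSTNAME = re.compile(r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}")


def certbot_command(domain: str) -> List[str]:
    """Build the certbot argument list for one vanity domain."""
    domain = domain.strip().lower()
    if len(domain) > 253 or not HOSTNAME.fullmatch(domain):
        raise ValueError(f"refusing to request a certificate for {domain!r}")
    # A list of arguments (no shell) so the domain can never be read as a command.
    return ["certbot", "certonly", "--standalone", "--non-interactive", "-d", domain]


def request_certificate(domain: str) -> None:
    """Run certbot; raises CalledProcessError if certbot exits with an error."""
    subprocess.run(certbot_command(domain), check=True)


if __name__ == "__main__":
    # Print the command instead of running it; running needs root and port 80.
    print(" ".join(certbot_command("sharefiles.kookbeach.com")))
```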

Finally, the files cert.pem, chain.pem, and privkey.pem need to be copied to CATALINA_BASE/conf (/opt/tomcat/conf in this example) and the ownership set correctly:

# cd /etc/letsencrypt/live/sharefiles.kookbeach.com
# cp cert.pem /opt/tomcat/conf
# cp chain.pem /opt/tomcat/conf
# cp privkey.pem /opt/tomcat/conf
# cd /opt/tomcat/conf
# chown tomcat:tomcat cert.pem chain.pem privkey.pem
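If this copy step is automated, it might look something like the minimal Python sketch below.  The function name install_certificate is hypothetical, and tightening the private key to owner-only permissions (0600) is my own assumption about what "set correctly" should mean; the shell commands above only change the owner.

```python
import os
import shutil
from pathlib import Path
from typing import Optional

PEM_FILES = ("cert.pem", "chain.pem", "privkey.pem")


def install_certificate(live_dir: Path, conf_dir: Path,
                        owner: Optional[str] = None) -> None:
    """Copy the three PEM files into Tomcat's conf directory."""
    for name in PEM_FILES:
        target = conf_dir / name
        # The files under /etc/letsencrypt/live are symlinks; copyfile follows
        # them by default and copies the real contents.
        shutil.copyfile(live_dir / name, target)
        # Assumption: the private key is readable by its owner only.
        os.chmod(target, 0o600 if name == "privkey.pem" else 0o644)
        if owner is not None:
            # Equivalent of "chown tomcat:tomcat"; needs root to succeed.
            shutil.chown(target, user=owner, group=owner)


if __name__ == "__main__":
    install_certificate(
        Path("/etc/letsencrypt/live/sharefiles.kookbeach.com"),
        Path("/opt/tomcat/conf"),
        owner="tomcat",
    )
```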

Now, because the API/Download servers are a load balanced set of servers, this cert needs to be copied/sent to the other API/Download servers.  I'm hoping that is a straightforward task the Java programmers at Backblaze can figure out.

You should also make sure the Tomcat server.xml is set correctly.  This is the relevant part of server.xml:

<Connector port="443" protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="150" SSLEnabled="true">
  <SSLHostConfig>
    <Certificate certificateFile="conf/cert.pem"
                 certificateKeyFile="conf/privkey.pem"
                 certificateChainFile="conf/chain.pem" />
  </SSLHostConfig>
</Connector>

That's it!  The next time Tomcat is restarted, this file can be fetched with SSL/HTTPS:

https://sharefiles.kookbeach.com/cutedogs/cute_puppy.jpg

Now, to start with, we can wait until the Thursday push to restart Tomcat; customers can just wait for it.  Alternatively, we could restart Tomcat on the API/Download servers of that particular cluster once every two hours in a rotating fashion (restart the first API/Download server's Tomcat, make sure it works and came back, and then restart the next API/Download server, etc) IF AND ONLY IF at least one customer on that cluster has added a vanity domain.  This would pick up all the vanity domains that had been added in the last two hours.  I really don't expect customers to be adding many of these; we're talking about maybe 100 vanity URLs in the first year at most.
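The rotating restart is simple enough to sketch.  Below is a minimal Python version under stated assumptions: restart and is_healthy are placeholders for however Backblaze actually restarts Tomcat and checks that a server came back, and the server names are invented.

```python
from typing import Callable, Iterable, List


def rolling_restart(
    servers: Iterable[str],
    pending_vanity_domains: Iterable[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Restart Tomcat one server at a time, stopping at the first failure."""
    # IF AND ONLY IF a vanity domain was added since the last pass.
    if not list(pending_vanity_domains):
        return []
    restarted = []
    for server in servers:
        restart(server)
        # Make sure this one works and came back before touching the next.
        if not is_healthy(server):
            raise RuntimeError(f"{server} did not come back; stopping the rotation")
        restarted.append(server)
    return restarted


if __name__ == "__main__":
    done = rolling_restart(
        ["api-0001", "api-0002"],              # invented server names
        ["sharefiles.kookbeach.com"],
        restart=lambda s: print("restarting Tomcat on", s),
        is_healthy=lambda s: True,
    )
    print("restarted:", done)
```

Stopping at the first server that fails to come back means a bad cert or config takes out at most one API/Download server instead of the whole cluster.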
 

Refreshing the LetsEncrypt Certificates:
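The LetsEncrypt certificates expire after 90 days, so they need to be refreshed well before that, checking maybe once every 30 days.  To refresh, you just run the same set of commands as the first time (certbot also has a "renew" subcommand for this), then copy the new files into place and restart Tomcat as before.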
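The timing arithmetic is worth writing down.  Here is a minimal Python sketch, assuming we renew once a certificate is within 30 days of its 90 day expiry (as far as I know this matches certbot's default renewal window, but treat the 30 as a tunable assumption).  The function name needs_renewal is made up.

```python
from datetime import datetime, timedelta, timezone

CERT_LIFETIME = timedelta(days=90)   # LetsEncrypt certificate lifetime
RENEW_BEFORE = timedelta(days=30)    # assumption: renew in the last 30 days


def needs_renewal(
    issued_at: datetime,
    now: datetime,
    lifetime: timedelta = CERT_LIFETIME,
    renew_before: timedelta = RENEW_BEFORE,
) -> bool:
    """True once the certificate is inside the renewal window before expiry."""
    expires_at = issued_at + lifetime
    return now >= expires_at - renew_before


if __name__ == "__main__":
    issued = datetime(2021, 8, 30, tzinfo=timezone.utc)
    for days in (30, 59, 60, 89):
        print(days, "days after issue:", needs_renewal(issued, issued + timedelta(days=days)))
```

With these numbers a cert issued on day 0 becomes due on day 60, so a check that runs every 30 days has exactly one chance to catch it before expiry on day 90.  A daily check leaves far more margin if a renewal fails.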

 

That's it!!
