2020 bztransmit executable

(written 8/16/2020)

by Brian Wilson


 

bztransmit - sends copies of new and changed files through HTTPS to the Backblaze datacenter. See this parent 2020 Backblaze Personal Backup architecture page for terminology and some context for what this VERY SPECIFIC web page is about.

 

NOTE: this page currently repeats the content on the parent page above. THIS PAGE IS A PLACEHOLDER that BrianW needs to fill out even more.

 

bztransmit - sends copies of new and changed files through HTTPS to the Backblaze datacenter. bztransmit is always launched by bzserv. This is the main workhorse of the Backblaze client, and where most of the logic surrounding the backups occurs. bztransmit has no UI components, and is therefore largely cross-platform between Windows and Macintosh. Because it is always launched by the parent process "bzserv" (see the parent architecture page), bztransmit runs as the user "SYSTEM" on Windows and as the user "root" on Macintosh.
 
Location on Disk Windows: C:\Program Files (x86)\Backblaze\bztransmit.exe (and a 64-bit version at C:\Program Files (x86)\Backblaze\x64\bztransmit64.exe)
Location on Disk Macintosh: /Library/Backblaze.bzpkg/bztransmit (the Macintosh version is ONLY 64-bit; Apple has not shipped a 32-bit laptop in over a decade)
 
Purpose of bztransmit: The primary purpose of "bztransmit" is to make the logical decision of which files to back up (based on the lists prepared by bzfilelist), read those customer files into RAM, compress them (lossless compression), encrypt the compressed file (still held in RAM), and then send this encrypted file over HTTPS (double encryption) to the Backblaze datacenter as a backup. The bztransmit executable also updates the local laptop's record of which files, with which "last modification date", have already been sent to Backblaze, so that it doesn't back up the same version of a file twice. The records of which files have already been backed up are called the "bz_done" files.
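A minimal sketch of that per-file flow, assuming a simple in-memory dictionary mapping each path to its last-modification time in place of the real bz_done files (whose on-disk format isn't described here). zlib stands in for whatever lossless compressor bztransmit actually uses, and the encrypt and send steps are passed in as hypothetical callables:

```python
import os
import zlib
from typing import Callable, Dict

# Hypothetical stand-in for the "bz_done" records: path -> the
# last-modification time of the version already sent to Backblaze.
BzDone = Dict[str, float]


def needs_backup(path: str, mtime: float, bz_done: BzDone) -> bool:
    """A file needs sending if it was never sent, or has changed since."""
    return bz_done.get(path) != mtime


def back_up_file(path: str, bz_done: BzDone,
                 encrypt: Callable[[bytes], bytes],
                 send: Callable[[bytes], None]) -> bool:
    """Read -> compress -> encrypt -> send, all in RAM, then record it as done."""
    mtime = os.path.getmtime(path)
    if not needs_backup(path, mtime, bz_done):
        return False                      # this version was already sent
    with open(path, "rb") as f:
        data = f.read()                   # whole file held in RAM
    compressed = zlib.compress(data, 6)   # lossless compression
    send(encrypt(compressed))             # caller supplies encryption + HTTPS
    bz_done[path] = mtime                 # mark done only after send returns
    return True
```

Recording the file in bz_done only after the send returns means a crash mid-upload leaves the file marked as "not yet sent", so it is retried rather than silently skipped.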

The encryption bztransmit uses is symmetric AES-128, with a new AES key and Initialization Vector generated for every customer data file. The AES keys themselves are then encrypted with 2048-bit public/private key encryption using the customer's public key, which is stored on the client in C:\Program Files (x86)\Backblaze\userPub.pem on Windows and /Library/Backblaze.bzpkg/userPub.pem on Macintosh. The private key itself is stored on the Backblaze servers, encrypted with a totally standard OpenSSL "passphrase". By default this passphrase is a closely guarded secret known only by Backblaze, and it is rotated (changed) regularly for security reasons. However, the customer can optionally choose to set up a "Private Encryption Key" in the Backblaze client, which changes the passphrase to something only the customer knows. This passphrase is required to restore any files from the backup and is not recoverable in any way, so customers who set this up need to remember it or their backup is useless.
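The key-per-file scheme (a fresh symmetric key and IV for each file, with that key then encrypted by the customer's public key) can be sketched in pure Python to show its structure. Two loudly labeled stand-ins: Python's standard library has no AES, so a SHA-256 counter-mode keystream takes the place of AES-128, and a tiny unpadded textbook RSA key built from two Mersenne primes takes the place of the 2048-bit customer key. Neither stand-in is secure, and the passphrase protecting the private key on Backblaze's servers is not modeled:

```python
import hashlib
import secrets

# Toy RSA key pair from two Mersenne primes (2**127 - 1 and 2**89 - 1).
# ILLUSTRATION ONLY: unpadded textbook RSA with a ~216-bit modulus is NOT
# secure. The real client uses a 2048-bit public key (userPub.pem).
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Stand-in for AES-128 (this is NOT AES): XOR with SHA-256(key||iv||offset).

    Applying it twice with the same key and IV returns the original bytes."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + iv + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def encrypt_file(plaintext: bytes) -> dict:
    """Encrypt one file under a brand-new 128-bit key and IV, then wrap the key."""
    key = secrets.token_bytes(16)  # fresh per-file symmetric key
    iv = secrets.token_bytes(16)   # fresh per-file initialization vector
    wrapped_key = pow(int.from_bytes(key, "big"), E, N)  # public-key wrap
    return {"wrapped_key": wrapped_key, "iv": iv,
            "ciphertext": keystream_xor(key, iv, plaintext)}


def decrypt_file(package: dict) -> bytes:
    """Restore side: unwrap the per-file key with the private exponent, decrypt."""
    key = pow(package["wrapped_key"], D, N).to_bytes(16, "big")
    return keystream_xor(key, package["iv"], package["ciphertext"])
```

Because every file gets its own key, whoever holds only the public key can create backups but cannot read any of them; only the private key (guarded by the passphrase) can unwrap the per-file keys at restore time.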
 
In what order does bztransmit upload files? Which files are backed up first? In general, Backblaze backs up in file size order, small files first. Backblaze DOES NOT back up folders, or folders of files, or group the backup into folders. All files are individual files. So the client will back up 1 small file from one folder, then back up another small file from a different folder next, then return to the first folder to back up any larger files. This often confuses customers who are closely watching their backup; they think Backblaze has "skipped over" one of their files in a folder, when in reality Backblaze will loyally return to that folder when it is backing up larger files. There are two exceptions to the rule of "small files first": 1) Backblaze really wants to back up at least one file from every different volume FIRST, so the smallest file from each of the attached SSDs or Hard Drives is moved to the front of the queue. This gives the customer immediate feedback that Backblaze saw and acknowledged that each volume is part of the backup and has been refreshed recently. 2) If the backup was paused or the laptop was shut down in the middle of transmitting a large file, Backblaze attempts to complete the transmission of that large file before going back to start at the small files. This avoids wasting all the effort that went into backing up half of a very large file every time the laptop goes to sleep.
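The ordering rule and its two exceptions can be expressed as a single sort key. This is a minimal sketch, not the client's actual queue code: the PendingFile type is hypothetical, and putting the interrupted large file ahead of the per-volume "leaders" is an assumption (the text doesn't say which exception wins when both apply):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PendingFile:   # hypothetical record for one file awaiting upload
    path: str
    volume: str      # e.g. "C:" or "D:"
    size: int        # bytes


def upload_order(files: List[PendingFile],
                 resume_path: Optional[str] = None) -> List[PendingFile]:
    """Small files first, with the two exceptions described in the text."""
    # Exception 1: find the smallest file on each volume.
    smallest: dict = {}
    for f in files:
        best = smallest.get(f.volume)
        if best is None or (f.size, f.path) < (best.size, best.path):
            smallest[f.volume] = f
    volume_leaders = set(smallest.values())

    def sort_key(f: PendingFile):
        if f.path == resume_path:
            tier = 0   # Exception 2: finish the interrupted large file first
        elif f in volume_leaders:
            tier = 1   # one file from every volume goes early
        else:
            tier = 2
        return (tier, f.size, f.path)  # then plain "small files first"

    return sorted(files, key=sort_key)
```

For example, with files of 10, 20, and 50 bytes plus a 500 MByte movie on C:, and one 300-byte photo on D:, the order is the 10-byte file, the D: photo, then 20, 50, and the movie. Passing the movie as resume_path moves it to the very front.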
 
How does bztransmit handle really small files? For any file smaller than 15 MBytes, bztransmit "batches" up to 999 of these files into one packed data structure for more efficient transmitting. HTTPS setup and teardown murder performance for small files, so doing an individual HTTPS POST for each 1 byte or 2 byte file is painfully slow. So bztransmit fully prepares the files in all the standard ways (read the file, compress the file, encrypt the file), then appends the finished packages end-to-end and transmits them to the Backblaze datacenter as one HTTPS POST operation. It still adheres to "small files first", so initially this can be 999 files, each of which is 1, 2, or 3 bytes, so the finished HTTPS POST operation is still relatively small at 1 - 3 KBytes in total length. But this is approximately 1,000 times faster than doing each file individually, so it is worth it. Later on, as the "batch of files" is being assembled, bztransmit stops appending more compressed, encrypted small files to a "batch" once it would grow larger than 30 MBytes. So there might only be 2 files in a single "batch" HTTPS POST when bztransmit is transmitting 14 MByte files. Once each individual file is 15 MBytes or larger, HTTPS setup and teardown shrink to less than 1% of the transmission time, so bztransmit does one HTTPS POST for each file 15 MBytes or larger. The cutoff of 15 MBytes was originally chosen so that a single HTTPS POST would not time out even on the very slowest customer connections.
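A minimal sketch of the batching rule, assuming the files arrive already in upload order with their sizes known. For simplicity one size per file drives both the 15 MByte cutoff and the 30 MByte batch limit (in the real client the batch limit applies to the compressed, encrypted packages). Checking the limit before appending is an assumption that matches the "only 2 files for 14 MByte files" example, and MB = 2**20 is also an assumption:

```python
from typing import List, Tuple

MB = 2**20                   # assumption: "MByte" taken as 2**20 bytes
SMALL_FILE_LIMIT = 15 * MB   # files below this are batched
MAX_FILES_PER_BATCH = 999
BATCH_SIZE_LIMIT = 30 * MB


def plan_posts(files: List[Tuple[str, int]]) -> List[List[str]]:
    """Group (name, size) pairs, already in upload order, into HTTPS POSTs."""
    posts: List[List[str]] = []
    batch: List[str] = []
    batch_bytes = 0

    for name, size in files:
        if size >= SMALL_FILE_LIMIT:
            if batch:                     # close the open batch to keep order
                posts.append(batch)
                batch, batch_bytes = [], 0
            posts.append([name])          # one POST per file of 15 MBytes+
            continue
        # Start a new batch if this one holds 999 files or would pass 30 MBytes.
        if batch and (len(batch) == MAX_FILES_PER_BATCH
                      or batch_bytes + size > BATCH_SIZE_LIMIT):
            posts.append(batch)
            batch, batch_bytes = [], 0
        batch.append(name)
        batch_bytes += size

    if batch:
        posts.append(batch)
    return posts
```

With 2,000 one-byte files this plans three POSTs of 999, 999, and 2 files; with five 14 MByte files it plans POSTs of 2, 2, and 1 files.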
 
How bztransmit handles large files: For files between 15 MBytes and 100 MBytes, bztransmit reads the file from disk, compresses the file, encrypts the file, and transmits it to the Backblaze datacenter as one unit. For larger files this becomes impractical, so for files larger than 100 MBytes bztransmit FIRST makes an entire copy of the file broken down into 10 MByte "chunks", stored in C:\ProgramData\Backblaze\bzdata\bzbackup\bzdatacenter\bzcurrentlargefile\ on Windows and /Library/Backblaze.bzpkg/bzdata/bzbackup/bzdatacenter/bzcurrentlargefile/ on Macintosh. That folder can be moved to an external drive if the customer changes the "Temporary data drive" in the "Settings..." client panel. The original cutoff for self-contained files was 30 MBytes (not the current 100 MBytes) for two reasons: 99% of customer files were smaller than 30 MBytes each, and that was small enough that the HTTPS POST of a single file did not time out on even the slowest customer connections. I raised this to 100 MBytes in 2018 because both reasons had changed. The very slowest upload connection any customer had was now at least 3x faster, making it possible, and a lot of large images and some music files no longer fit inside 30 MBytes, but 99% of individual files still fit inside 100 MBytes. So that is the modern cutoff point.
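A minimal sketch of the size cutoffs and of the 10 MByte chunk copy for files over 100 MBytes. The chunk file naming (chunk00000.dat and so on) and treating an "MByte" as 2**20 bytes are assumptions for illustration; the real layout of the bzcurrentlargefile folder isn't described here:

```python
import os
from typing import List

MB = 2**20                    # assumption: "MByte" taken as 2**20 bytes
BATCH_CUTOFF = 15 * MB        # below this: batched with other small files
LARGE_FILE_CUTOFF = 100 * MB  # above this: copied out as chunks first
CHUNK_SIZE = 10 * MB


def upload_strategy(size: int) -> str:
    """Classify a file by the size cutoffs described in the text."""
    if size < BATCH_CUTOFF:
        return "batched"
    if size <= LARGE_FILE_CUTOFF:
        return "single"       # read, compress, encrypt, send as one unit
    return "chunked"


def stage_large_file(src_path: str, staging_dir: str,
                     chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Copy src_path into numbered chunk files in staging_dir; return paths.

    Hypothetical naming; staging_dir stands in for bzcurrentlargefile."""
    os.makedirs(staging_dir, exist_ok=True)
    chunk_paths: List[str] = []
    with open(src_path, "rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)   # at most one chunk in RAM at a time
            if not data:
                break
            path = os.path.join(staging_dir, f"chunk{index:05d}.dat")
            with open(path, "wb") as out:
                out.write(data)
            chunk_paths.append(path)
            index += 1
    return chunk_paths
```

Copying the file out first means the upload works from a stable snapshot even if the customer keeps editing the original, and an interrupted upload can resume from the next unsent chunk.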
 
Threading and Bandwidth Utilization in bztransmit: bztransmit has two modes: threaded and non-threaded. If a customer sets the number of threads to "1" in the Backblaze "Settings..." (see the "Performance" tab), then the customer can control the amount of bandwidth the client uses with the "Throttle" slider, from as low as 128 Kbits/sec upload rate up to about 10 Mbits/sec, which is the approximate maximum upload speed of 1 thread (unthrottled). However, the maximum upload speed for 1 thread varies depending on how far the customer's laptop is from the Backblaze datacenter due to latency (for example, the maximum upload speed from New Zealand might be only 1 Mbit/sec), and other things can affect the maximum upload speed, like the size of files (small files murder upload performance due to the setup and teardown overhead of HTTPS). Setting the client to use 1 upload thread is also the most SSD-efficient setting, because it makes the very minimum number of copies of any file, usually "zero copies": bztransmit reads the file from SSD into RAM, compresses it in RAM, encrypts it in RAM, and transmits it from RAM through HTTPS to the Backblaze datacenter. OK, so the OTHER mode of bztransmit occurs when the customer sets the number of threads to "2" or more, up to 30 threads. Each thread runs at maximum speed, so with 30 threads at 10 Mbits/sec it is possible for the customer to use up to 300 Mbits/sec of upload capacity (if they are close enough to the Backblaze datacenter, and if they have an SSD fast enough to keep up). When using 2 - 30 threads, Backblaze makes a copy of each file before handing the copy off to a unique thread, so the threaded mode of operation requires 1 more temporary copy of each file to be made on the SSD.
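The text doesn't say how the "Throttle" slider is implemented. One common technique is a pacing throttle that schedules each send so the long-run rate never exceeds the target; the sketch below assumes that technique (it is not taken from bztransmit), treats "Kbits" as 1,000 bits, and makes the clock and sleep functions injectable so it can be tested without real waiting:

```python
import time
from typing import Callable


class Throttle:
    """Pacing throttle: keeps the long-run send rate at or below a target.

    Illustrative sketch only, not the Backblaze client's implementation."""

    def __init__(self, bits_per_sec: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        if bits_per_sec <= 0:
            raise ValueError("bits_per_sec must be positive")
        self.bits_per_sec = bits_per_sec
        self.clock = clock
        self.sleep = sleep
        self.next_free = clock()   # earliest time the next send may start

    def before_send(self, nbytes: int) -> None:
        """Block until sending nbytes would not exceed the target rate."""
        now = self.clock()
        start = max(now, self.next_free)
        if start > now:
            self.sleep(start - now)
        # Reserve the time this send "costs" at the target rate.
        self.next_free = start + nbytes * 8 / self.bits_per_sec
```

At 128 Kbits/sec a 16,000-byte send costs exactly 1 second of budget, so three back-to-back sends wait a total of 2 seconds. Idle time is not banked, so the throttle never bursts above the target after a pause.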

A note about thread names: bztransmit uses "full memory protected processes" to implement threading. So that each thread has a unique name, the Backblaze installer makes IDENTICAL (down to the last byte) copies of the bztransmit executable with unique names like these on Windows:
     C:\Program Files (x86)\Backblaze\x64\bztrans_thread00.exe
     C:\Program Files (x86)\Backblaze\x64\bztrans_thread01.exe
     C:\Program Files (x86)\Backblaze\x64\bztrans_thread02.exe
     C:\Program Files (x86)\Backblaze\x64\bztrans_thread03.exe
     C:\Program Files (x86)\Backblaze\x64\bztrans_thread04.exe
       .... etc ....
     C:\Program Files (x86)\Backblaze\x64\bztrans_thread18.exe
     C:\Program Files (x86)\Backblaze\x64\bztrans_thread19.exe
 
It is the same on the Macintosh, but found in the folder /Library/Backblaze.bzpkg/ with the same names as above. By assigning uniquely named executables to each task, customers can watch the (now "named") threads come and go in "Activity Monitor" on the Macintosh and "Task Manager" on Windows. Now, you might notice there are only 20 of these executable names, numbered "00" - "19", yet you can use up to 30 threads. At some point this system gets silly and wastes customer disk space, so when bztransmit is using 21 - 30 threads it assigns a task that is supposed to be done by "thread25.exe" to a unique thread of course, but it uses the executable named "bztrans_thread05.exe" to do the task. In other words, if the thread is numbered #20-29 then subtract 10 from the actual thread number to know which executable name was used.

Resource Load bztransmit puts on customer laptop: bztransmit does all the encryption and all the network communication for Backblaze's client, and depending on the customer settings and the size of the customer data it can use as little as 100 MBytes of RAM and a very small CPU load (5% of one core), or it can cause quite a bit of load and RAM use and use all 16 cores of the CPU at the same time. The worst case is this: the customer is using 30 threads and they have a lot of 100 MByte files, which means each bztransmit process could be holding 100 MBytes, for a total of 3 GBytes of RAM just for the data in memory, and it might actually come close to 4 GBytes of RAM when you include the extra data structures used to figure out what to back up. Now that is up to the customer, and a customer who wants to back up all night long and has a modern laptop with 16 GBytes of RAM won't even notice it. But a customer with a slightly old 8 GByte RAM laptop who is trying to use it during the middle of the day might want to set Backblaze to use only 10 threads to keep their RAM use way down at around 1 GByte. And any laptop not in the massive initial upload state can EASILY keep up with only 4 or 5 threads backing up in the middle of the day, and the load will be quite minimal.
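The worst-case RAM arithmetic is easy to check: files up to 100 MBytes are held whole in RAM, and larger files are handled as 10 MByte chunks, so about 100 MBytes per bztransmit process is the ceiling for file data. A minimal sketch counting file data only (not the extra data structures, which can push 30 threads toward 4 GBytes); the function names are hypothetical:

```python
def worst_case_data_ram_mb(threads: int, largest_in_ram_mb: int = 100) -> int:
    """File data alone: every thread holds one whole in-RAM file at once."""
    return threads * largest_in_ram_mb


def max_threads_for_budget(budget_mb: int, largest_in_ram_mb: int = 100,
                           max_threads: int = 30) -> int:
    """Largest thread count (1 to 30) whose worst-case file data fits budget_mb."""
    return max(1, min(max_threads, budget_mb // largest_in_ram_mb))
```

worst_case_data_ram_mb(30) gives 3000 (about 3 GBytes), and max_threads_for_budget(1000) gives 10, matching the "10 threads for about 1 GByte" advice.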

 

All done.
