2020 Backblaze Client Architecture

(written 8/16/2020)

by Brian Wilson


 

Backblaze Personal Backup (internally to Backblaze it has the nickname "B1") is software that runs on a customer's laptop or desktop and pushes a copy of all their files to the Backblaze datacenter in Sacramento, California or the Backblaze datacenter in Amsterdam, Netherlands.  Backblaze charges a fixed price of $6/month per laptop for this service.  The flat fee includes any external drives as long as they are physically connected to the laptop or desktop.  This document describes in detail the architecture of the Backblaze client that runs on the customer's laptop or desktop computer.  This includes what executables exist on disk and what they do, the data structures Backblaze uses, and the flow of how the backups work.

Terminology:

  1. "Laptop" - this document uses the term "laptop" to mean "customer laptop or desktop", because it is clumsy to mention "customer laptop or desktop" in every sentence, and more than half the computers that Backblaze backs up are laptops nowadays.  A "laptop" is always a customer's computer.  Using the term "laptop" is also helpful to clarify that it is a customer's computer, because "datacenters" (see below) run by Backblaze never have any laptops in them.
     
  2. "Datacenter" or "Backblaze datacenter" - the physical location and the "servers" (see below) that Backblaze runs to store the backups for retrieval later.  These datacenters are in Sacramento, California and in Phoenix, Arizona, and in Amsterdam, Netherlands.
     
  3. "Region" or "Backblaze region" - Backblaze supports storing your backup in a selected region.  When this document was written, there were two regions: "US West" (Sacramento), and "EU Central" (Amsterdam).  Notice that there might be more than one "datacenter" in a single region - that is because when Backblaze (the company) runs out of space in one datacenter, Backblaze (the company) has to go find more space in an additional datacenter that is "very near the other datacenters in that region".
     
  4. "Server" or "Backblaze Servers" - the computers that run in the Backblaze datacenters.  The "client" (see below) communicates to the servers.
     
  5. "Client" or "Backblaze Client" - the collection of executable programs that make up what runs on a customer laptop.  This collection of executables includes a GUI (graphical interface) in one executable, and separate from that are a collection of executables that send the files found on the laptop to the Backblaze servers over the HTTPS protocol.
     
  6. "SSD" - Solid State Drive - for all intents and purposes you can substitute "Hard Drive" wherever you see "SSD" if appropriate in your computer.  Most modern laptops only use SSDs; hard drives are disappearing from the world other than in datacenters.  But if you own a laptop that is 10 years old it might not contain an SSD; it might have an old-fashioned, very slow hard drive instead.
     
  7. "Account" or "Backblaze account" or "Backblaze Web Account" - customers create an account on the Backblaze website, and their account is where their data can be retrieved later, and where the customers specify payment for the service with a Credit Card.  Backblaze accounts each have a 12 hexadecimal digit unique id such as: "fd123d6faf4a" assigned at account creation time that never changes, however all accounts have exactly 1 customer email address associated with them, and that email address is the "username" that the customers use to sign in, so we often say "Backblaze accounts are defined by an email address".  Email addresses for accounts are absolutely globally unique, we never allow a second customer to create an account with an email address that is currently "in use" on any other Backblaze account anywhere in the world.  Customers sign into their account at: https://secure.backblaze.com/user_signin.htm
     
  8. "Host" or "Computer" - unfortunately these terms are used interchangeably for the customer's laptop and/or the backup of that laptop.  The regrettable term "host" was a term we used early in Backblaze's history for the customer's laptop, so it is hard to get rid of from our terminology.
     
  9. "HGUID" or "hguid" - stands for "Host Globally Unique Identifier" - this is a 24 digit hexadecimal number that describes one "laptop backup".  An example is "bf654d292a61ca0e193e908f".  Hguids are globally unique, no two backups will ever have the same hguid.  One customer account (one email address) can have multiple backups inside it.  For example, if one customer with email address "joe@corp.com" owns a Macintosh laptop running Backblaze and a Windows laptop running Backblaze, that customer's one Backblaze account will have two separate hguids inside of it - they define two different backups.  When preparing a restore, the customer would sign into their one Backblaze account, and choose which of the two backups to restore from.
     
  10. "volume" or "attached volume" or "partition" - A Macintosh customer laptop's SSDs or hard drives are organized by what are called "volumes" on the Macintosh.  Usually there is 1 volume per SSD or hard drive, but technically advanced customers sometimes create two or more "volumes" on one physical device (one SSD or hard drive) to make it appear as if two separate SSDs exist to the computer instead of just 1 SSD.  On Windows and Linux it is often called a "partition" but it is the IDENTICAL concept.  In this document I  use the word "volume" or "partition" interchangeably, and you have to substitute the appropriate word for the platform you are working with at the time.  Also, it is so common that there is 1 volume per SSD, that when I write "SSD" it means a logical volume.  As far as the Operating System is concerned, two volumes on one SSD are simply two different SSDs.  On Windows each partition gets a separate drive letter, so if a customer has three partitions it might appear as if they have 3 physical SSDs attached to the computer named "C:\", "E:\" and "G:\" or something like that, when in reality it is one SSD.  Backblaze would treat that situation as if there were 3 entirely separate physical SSDs with one partition each.  Backblaze only cares about volumes (partitions), in reality Backblaze does not care about the number of different physical SSDs attached, Backblaze deals at the "logical level" of volumes and partitions.
     
  11. "Volume Guid" or "vguid" or "BzVolumeGuid" - a 28 character globally unique identifier of one customer's laptop's volumes, and it ALWAYS starts with the letter "v".  Here is an example: v000c0101f6fb58de90a713a0e19   The laptop's "boot volume" (also known as the "system volume") ALWAYS starts with the characters "v000", and then subsequent volumes start with "v001" and "v002", etc to make it easy to get a little information quickly (like which is the boot volume).
     
  12. "Java Time" or "Milliseconds Since 1970" - The Backblaze client has to communicate with the Backblaze datacenters, and most of the code in the Backblaze datacenters is written in the programming language "Java".  Java has a pretty standard measure of time which is the number of milliseconds since 1970.  The Backblaze client uses this measure of time for everything (on both Macintosh and Windows) such as the "time a file was last modified".  This usually looks like a string of 16 hexadecimal digits such as "00000173f855ec2b".  You can find web pages all over the internet which will convert that to a human readable date and time for you by copying and pasting the "Java Time" into the web page.
     
  13. "Utf8" or "Utf-8" or "Unicode" - Filenames on all modern computers (Windows, Macintosh, Linux, iOS, Android) are encoded in "Unicode", and all modern web browsers use Unicode.  All that means is you can type a single string with letters from different languages, like this: "Hello,여보세요, こんにちは, 你好".  The document you are reading is in Unicode.  The Backblaze client always uses "Utf-8" encoding for everything, which is one of the most standard forms of Unicode.  For most English speakers this just looks like (and acts like) regular old text and most people reading this document don't need to worry about what it means to be in "Unicode" or "Utf-8", so if this doesn't make sense to you, just ignore it.  This is a technical point for people who speak other languages, and their filenames are in other languages (which Backblaze profoundly supports).
     
  14. "bz_done" or "bz_done files" - These are a record stored on the customer's laptop of what has been backed up (what has been "done" already).  These files are the most important data structure on the customer's laptop and cannot ever be edited or deleted by the customer, or the backup is hopelessly corrupted.  On Windows you can find these files in C:\ProgramData\Backblaze\bzdata\bzbackup\bzdatacenter\ and on Macintosh in /Library/Backblaze.bzpkg/bzdata/bzbackup/bzdatacenter/ and you can think of it as one bz_done file is started every 3 days, with names like this: "bz_done_20200813_0.dat".  The date in the filename is when it was started, and it rolls to the next "date" after 3 or 4 days.  AGAIN, DO NOT CHANGE ONE SINGLE CHARACTER IN ANY bz_done FILE, IT WILL CORRUPT THE  BACKUP.  You can safely make a copy of one of these files into some other folder like C:\temp\ and then open the copy safely in WordPad on Windows, or TextEdit on the Mac.  The "bz_done" files are an APPEND ONLY format, by definition they can only grow, because you append new information to the end of them.  They are a complete record of everything that has been sent to the Backblaze datacenter.  The "bz_done" files are periodically encrypted and then sent to Backblaze's datacenter, and when a customer signs into the website and goes to "View/Restore" files, the "tree of files" that is seen on the "View/Restore" web page is literally created by reading through the bz_done file that was sent earlier.  You can watch a 57 minute tutorial on how to understand the internals of bz_done files here: https://www.youtube.com/watch?v=MOlz36nLbwA  You can view a slide used in that presentation here that documents what every column does by clicking this link.
     
  15. "bz_comb" or "bzcomb files" - When the Backblaze client sends the bz_done files to the Backblaze datacenter, it first reads them from the local laptop SSD into laptop RAM, then compresses the bz_done file, then encrypts it the same way the Backblaze client encrypts all user files before transmitting them, and sends the encrypted blob of data up through HTTPS (double encryption) to Backblaze.  These encrypted blobs that contain bz_done files are named "bz_comb" files inside the Backblaze datacenter.  Customers will never hear this phrase, this term is mostly used internally to Backblaze, and it's just important to know it is identical in every way to a bz_done file, but compressed and encrypted.  The name "bz_comb" is very unfortunate, it means "combined file", and is a historical term.  Inside of the bz_comb format, the file NAMES of customer files are encrypted for absolute privacy, but a small amount of the other information contained in bz_done files that is used for various cleanup processes by the Backblaze datacenter is not, and the two separate files are "combined" into one file by appending them end on end - simply to increase upload speed over transmitting them separately as two separate HTTPS requests.
     
  16. "Inherit Backup State" - The "backup state" is primarily the "bz_done" files on the customer's laptop, plus 3 or 4 other settings files.  It is a list of what has already been backed up.  "Inherit Backup State" is a feature in the Backblaze client where a customer can purchase a new laptop, and avoid re-uploading all of their data files from scratch to the Backblaze datacenter.  The customer avoids the re-upload by installing a Backblaze trial, then the "Inherit" feature downloads the copy of the bz_done files that are stored in the Backblaze datacenter to their current laptop.  After that, if the customer made any local changes to their directory structure or added new files, the normal backup processes detect those changes within a few hours and happily continue forward doing incremental backups to incorporate those new changes.
     
  17. "log files" or "customer logs" - Friendly, easy to read log files created on the customer laptop that are safe to browse (or even change; they are absolutely not used by the backup program in any way other than informational).  These files are found in this folder on the customer laptop:  C:\ProgramData\Backblaze\bzdata\bzlogs\ on Windows, and on Macintosh in /Library/Backblaze.bzpkg/bzdata/bzlogs/  Inside that folder there is a subfolder with the name of the executable that created the log file, like "bztransmit" (see below this terminology section for a list of executables), and inside that there is one log file for each day the client performs a backup.  For instance, on Windows there might be a log file named C:\ProgramData\Backblaze\bzdata\bzlogs\bztransmit\bztransmit16.log which holds all the log lines bztransmit created on the 16th day of the month.  The log files are compressed after 2 days to save customer disk space, and the log files are deleted after 26 days to not grow forever on the customer laptop.  You can open the log files with WordPad on Windows, and TextEdit on the Mac - make the window very very wide and turn off line wrapping to make the logs format better and be more readable.  If a customer has an issue, I usually start debugging that issue by opening the bztransmit log files and searching for the word "ERROR" all in capitals.  Just because a line says "ERROR" doesn't mean it is the customer's issue - for example if in the middle of an HTTPS transmit of one customer data file to the Backblaze datacenter the customer WiFi is turned off, this will say "ERROR" in the log file.  But if there are HUNDREDS of the same ERROR in the log files it usually points to the problem.
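The identifier formats defined in the terminology list above (account id, hguid, and volume guid) are regular enough to check mechanically.  What follows is only a minimal sketch in Python: the function names and the regular expressions are hypothetical, and it assumes the identifiers are lowercase hexadecimal as in the examples, which the text does not state explicitly for the volume guid.

```python
import re

# Patterns inferred from the examples in the terminology list.  Assumption:
# lowercase hex digits only; the real client may accept other characters.
ACCOUNT_ID_RE = re.compile(r"[0-9a-f]{12}")   # e.g. fd123d6faf4a
HGUID_RE = re.compile(r"[0-9a-f]{24}")        # e.g. bf654d292a61ca0e193e908f
VGUID_RE = re.compile(r"v[0-9a-f]{27}")       # 28 characters, always starts with "v"


def classify_id(text):
    """Return "account", "hguid", "vguid", or None for an unknown shape."""
    if ACCOUNT_ID_RE.fullmatch(text):
        return "account"
    if HGUID_RE.fullmatch(text):
        return "hguid"
    if VGUID_RE.fullmatch(text):
        return "vguid"
    return None


def is_boot_volume(vguid):
    """The boot (system) volume's guid always starts with "v000"."""
    return VGUID_RE.fullmatch(vguid) is not None and vguid.startswith("v000")


print(classify_id("fd123d6faf4a"))
print(classify_id("bf654d292a61ca0e193e908f"))
print(is_boot_volume("v000c0101f6fb58de90a713a0e19"))
```

The three lengths (12, 24, and 28 characters) never collide, so the length alone is enough to tell the identifier types apart in a log line or a filename.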
     

 

Overview of Executables that Make Up the Backblaze Client:
The "Backblaze client" is actually a collection of the 8 executables listed below.  Each executable runs at different times and for entirely different reasons.

  1. bzserv - the core Backblaze service that MUST run all the time for any backup to occur.  It launches other executables (see below) to perform the actual backup.  bzserv has no UI components, and is therefore largely cross platform between Windows and Macintosh.
     
    Location on Disk Windows: C:\Program Files (x86)\Backblaze\bzserv.exe
    Location on Disk Macintosh: /Library/Backblaze.bzpkg/bzserv
     
    Purpose of bzserv: The process "bzserv" runs all the time as a "service" on the laptop, in order to launch the OTHER executables at the correct times.  The other executables (see below) perform the actual backup, bzserv can be thought of as "a scheduler" whose primary job is to NOT take any CPU and NOT take any RAM and NOT take any SSD performance, but to keep running at all costs.  bzserv is required to run as a service all the time the laptop is running (not shut down), even when the customer is logged out of their laptop, even if the customer is using the backup schedule "Only When I Click <Backup Now>" or "Once Per Day".  Any customer who shuts down bzserv is provably insane, doesn't know what they are doing, and is not qualified to operate a computer because bzserv takes no computer resources, period.  The only valid way of stopping bzserv from running is to entirely uninstall the Backblaze client. If bzserv is not running, that is what is causing ALL of the customer's problems, and it is the number one problem to solve, the first problem to solve, and the ONLY problem to solve.  bzserv runs as the user "SYSTEM" on Windows, and as the user "root" on Macintosh.

    Resource Load bzserv puts on customer laptop: 0.00000001% of one core of CPU (bzserv is one single thread), and 0.000001% extra load on the SSD, and about 3 MBytes of RAM (0.0375% of an 8 GByte RAM computer - far far less than 1% of the customer RAM - 3.7 hundredths of 1% of the customer RAM).
     
    Click here for a deeper description and analysis of what bzserv does.
     
  2. bzfilelist - walks the entire file system on the SSD on the customer's laptop looking for new and changed files to backup.  bzfilelist is always launched by bzserv.  bzfilelist creates lists of files, but absolutely does not transmit them anywhere.  bzfilelist completely lacks the ability to do network HTTPS communication, it profoundly cannot do anything but create lists of files for other executables to consume.  bzfilelist has no UI components, and is therefore largely cross platform between Windows and Macintosh.  bzfilelist runs as the user "SYSTEM" on Windows, and as the user "root" on Macintosh because it is always launched by parent process "bzserv" (see above).
     
    Location on Disk Windows: C:\Program Files (x86)\Backblaze\bzfilelist.exe
    Location on Disk Macintosh: /Library/Backblaze.bzpkg/bzfilelist
     
    Purpose of bzfilelist: The primary purpose of "bzfilelist" is to create the complete list of filenames with associated modification dates for each attached SSD or hard drive.  Each SSD's (each "volume's") list of files is stored in a separate file.  These lists of filenames with their last modification date are found at C:\ProgramData\Backblaze\bzdata\bzfilelists\ on Windows, and /Library/Backblaze.bzpkg/bzdata/bzfilelists/ on the Macintosh.  The name of the list of files starts with the BzVolumeGuid.  For the primary boot (system) volume that BzVolumeGuid begins with letters "v000", then subsequent drives start with "v001" and then "v002" and so on.  Here is an example of the list's filename from Windows: C:\ProgramData\Backblaze\bzdata\bzfilelists\v000c0101f6fb58de90a713a0e19_c____filelist.dat and you can open that with WordPad on Windows, or TextEdit on the Macintosh.  The name "v000c0101f6fb58de90a713a0e19_c____filelist.dat" always starts with the "volume guid" then has an underbar, then a friendly description of the drive (in my example above this is "_c____" to indicate this is the "C:\" Windows drive; on Macintosh the system boot drive would have the string "_root_"), then always ends with "filelist.dat".  These "per drive lists of files" are produced approximately once per hour, but it might be once every two hours, or even longer for customers with extremely large volumes.  There is a guarantee that the list of files with the name above is ALWAYS VALID and ALWAYS PRESENT for other programs to read and use, but it might be 1 or 2 hours "out of date" waiting for the next list of files to be produced.  If a new list of files is being produced by bzfilelist the new INCOMPLETE list of files has the same name, but at the end of it is appended "_future".

    Inside of one of these lists named things like "v000c0101f6fb58de90a713a0e19_c____filelist.dat" the very first line inside that file is when that list of files was created, it looks like:
    # GmtMillisThisListWasStarted: 00000173f855ec2b, GmtDateTime: 20200816173407
    Those are actually the identical date and time, the first one is the number of milliseconds since 1970, and the second one is human readable and says it is year "2020", month "08", day "16", then hours, minutes, and seconds.
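To make the relationship between the two timestamps concrete, here is a small Python sketch that converts the hexadecimal "Java Time" into a calendar date and checks it against the GmtDateTime field of the same header line.  The function names and the regular expression are hypothetical; only the header text itself is taken from the example above.

```python
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Header layout copied from the example line; this regex is illustrative only.
HEADER_RE = re.compile(
    r"#\s*GmtMillisThisListWasStarted:\s*([0-9a-fA-F]{16}),\s*GmtDateTime:\s*(\d{14})"
)


def java_time_to_utc(hex_millis):
    """Convert 16 hex digits of milliseconds since 1970 into a UTC datetime."""
    return EPOCH + timedelta(milliseconds=int(hex_millis, 16))


def header_is_consistent(header_line):
    """True when both timestamps in the header name the same second."""
    match = HEADER_RE.search(header_line)
    if match is None:
        return False
    hex_millis, gmt_text = match.groups()
    return java_time_to_utc(hex_millis).strftime("%Y%m%d%H%M%S") == gmt_text


line = "# GmtMillisThisListWasStarted: 00000173f855ec2b, GmtDateTime: 20200816173407"
print(java_time_to_utc("00000173f855ec2b").isoformat())  # 2020-08-16T17:34:07.403000+00:00
print(header_is_consistent(line))                        # True
```

The hexadecimal value carries millisecond precision (the trailing .403 seconds here), while the human readable field is truncated to whole seconds.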

    After that first line, the rest of the contents are pretty self explanatory.  The first letter on each line is an "f" for a file, a <tab> character, then the last modified timestamp (in milliseconds since 1970), then another <tab> character, then the number of bytes contained in the file, then another <tab> character, then the filename in completely pure (non-encoded) Utf8.  When the character '\n' (end of line) is encountered, that marks the end of that one filename.  Because this Utf-8 is not encoded in any way, this is extremely fast, there is no encode or decode step.
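The line layout described above is simple enough to parse with a single split.  The Python sketch below is an illustration only, under two assumptions the text does not confirm: that the last modified timestamp is written as hexadecimal digits (as in the "Java Time" examples) and that the byte count is written in decimal.  The names FileEntry and parse_filelist_line and the sample line are hypothetical.

```python
from typing import NamedTuple, Optional


class FileEntry(NamedTuple):
    mtime_millis: int   # milliseconds since 1970
    size_bytes: int
    path: str           # plain Utf-8 text, no decoding step needed


def parse_filelist_line(line: str) -> Optional[FileEntry]:
    """Parse one 'f<tab>mtime<tab>size<tab>filename' line; None for anything else."""
    # Split on the first three tabs only, so a tab inside a filename survives.
    parts = line.rstrip("\n").split("\t", 3)
    if len(parts) != 4 or parts[0] != "f":
        return None  # header comment, blank line, or an unknown record type
    _, mtime_text, size_text, path = parts
    # Assumption: hex timestamp, decimal size.
    return FileEntry(int(mtime_text, 16), int(size_text), path)


sample = "f\t00000173f855ec2b\t1024\tC:\\Users\\joe\\여보세요.txt\n"
print(parse_filelist_line(sample))
```

Opening the file with encoding="utf-8" and feeding each line to this function is all a consumer would need, which matches the point above that there is no encode or decode step.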

    Resource Load bzfilelist puts on customer laptop: bzfilelist only runs for maybe 10 minutes once an hour on most customers' laptops.  It is designed to use less than 1% of one core of CPU (bzfilelist is one single thread), and less than 1% extra load on the SSD, and while it is running bzfilelist might use about 20 MBytes of RAM or less (0.25% of an 8 GByte RAM computer - one fourth of 1% of the customer RAM).
     
    Click here for a deeper description and analysis of what bzfilelist does.
     
  3. bztransmit - sends copies of new and changed files through HTTPS to the Backblaze datacenter.  bztransmit is always launched by bzserv.  This is the main work horse of the Backblaze client, and where most of the logic surrounding the backups occurs.  bztransmit has no UI components, and is therefore largely cross platform between Windows and Macintosh.  bztransmit runs as the user "SYSTEM" on Windows, and as the user "root" on Macintosh because it is always launched by parent process "bzserv" (see above).
     
    Location on Disk Windows: C:\Program Files (x86)\Backblaze\bztransmit.exe (and a 64 bit version at C:\Program Files (x86)\Backblaze\x64\bztransmit64.exe)
    Location on Disk Macintosh: /Library/Backblaze.bzpkg/bztransmit (the Macintosh version is ONLY 64 bit, Apple has not shipped a 32 bit laptop in over a decade)
     
    Purpose of bztransmit: The primary purpose of "bztransmit" is to make the logical decision of which files to backup (based on the lists prepared by bzfilelist), read those customer files into RAM, compress them (lossless compression), then encrypt the compressed file (still held in RAM), then send this encrypted file over HTTPS (double encryption) to the Backblaze datacenter as a backup.  The bztransmit executable also updates the local laptop's record of what files with what "last modification date" have already been sent to Backblaze so that it doesn't have to back up a file twice.  The records of what files have already been backed up is called the "bz_done" files.

    The encryption bztransmit uses is symmetric AES-128 to encrypt each file, with a new AES key and a new Initialization Vector generated for every customer data file.  The AES keys themselves are then encrypted with 2048 bit public/private key encryption using the customer's public key which is stored on the client in C:\Program Files (x86)\Backblaze\userPub.pem on Windows, and /Library/Backblaze.bzpkg/userPub.pem on Macintosh.  The "private key" is stored on the Backblaze servers, and is ITSELF (the private key) encrypted with a totally standard OpenSSL "passphrase".  By default this is a closely guarded secret passphrase only known by Backblaze and rotated (changed) regularly for security reasons.  However, the customer can optionally choose to setup a "Private Encryption Key" in the Backblaze client which changes the passphrase to something only the customer knows.  This passphrase is required to restore any files from the backup, and is not recoverable in any way, so the customers who set this up need to remember it or their backup is useless.
     
    What Order does bztransmit upload files?  Which files are backed up first?  In general, Backblaze backs up in file size order, small files first.  Backblaze DOES NOT backup folders, or folders of files, or group the backup into folders.  All files are individual files.  So the client will backup 1 small file from one folder, then backup another small file from a different folder next, then return to the first folder to backup any larger files.  This often confuses customers who are closely watching their backup, and they think Backblaze has "skipped over" one of their files in a folder, when in reality Backblaze will loyally return to that folder when it is backing up larger files.  There are exceptions to the rule of "small files first" as follows: 1) Backblaze really wants to backup at least one file from every different volume FIRST, so the smallest file from each of the attached SSDs or Hard Drives is put to the start of the queue - this is to give the customer immediate feedback that Backblaze saw and acknowledged that each volume is part of the backup and has been refreshed recently, and 2) If the backup was paused or the laptop shut down in the middle of transmitting a large file, Backblaze attempts to complete the transmission of that large file before going back to start at the small files.  This is to avoid wasting all the effort that went into backing up half of a very large file every time the laptop goes to sleep.
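The ordering rules above can be sketched as a small queue-building function.  This is a hedged illustration in Python, not the client's actual logic: the function name, the (volume, size, path) tuple shape, and the choice to put an interrupted large file ahead of the per-volume files are all assumptions.

```python
def upload_order(files, interrupted=None):
    """Order (volume, size_bytes, path) tuples for upload.

    Assumed priority: 1) a large file whose upload was interrupted,
    2) the smallest file from each volume, 3) everything else, small first.
    """
    by_size = sorted(files, key=lambda f: f[1])
    queue, chosen = [], set()

    def take(item):
        if item not in chosen:
            chosen.add(item)
            queue.append(item)

    if interrupted is not None and interrupted in files:
        take(interrupted)

    seen_volumes = set()
    for item in by_size:
        if item[0] not in seen_volumes:   # first hit per volume is its smallest file
            seen_volumes.add(item[0])
            take(item)

    for item in by_size:                  # the general "small files first" rule
        take(item)
    return queue


files = [
    ("v000", 500, "a/big.bin"),
    ("v000", 1, "a/tiny.txt"),
    ("v001", 50, "e/small.txt"),
    ("v000", 10, "b/note.txt"),
    ("v001", 900, "e/movie.mov"),
]
for entry in upload_order(files):
    print(entry)
```

In this sample "a/tiny.txt" is uploaded long before "a/big.bin" even though both live in the same folder, which is exactly the behavior that makes customers think a file was skipped.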
     
    How does bztransmit handle really small files? For any files less than 15 MBytes in one file, bztransmit "batches" up to 999 of these files into one packed datastructure for more efficient transmitting.  The HTTPS setup and teardown murder performance for small files, so doing an individual HTTPS POST for each 1 byte or 2 byte file is painfully slow.  So bztransmit fully prepares the files in all the standard ways (read the file, compress the file, encrypt the file) then appends the finished packages end-on-end, and transmits the result to the Backblaze datacenter as one HTTPS POST operation.  It still adheres to the "small files first", so initially this can be 999 files each of which is 1, 2, or 3 bytes, so the finished HTTPS POST operation is still relatively small at 1 - 3 KBytes in total length.  But this is approximately 1,000 times faster than doing each file individually, so it is worth it.  Later on, as the "batch of files" is being assembled, bztransmit stops appending more compressed, encrypted small files when the size of any one "batch" file gets larger than 30 MBytes in size.  So there might only be 2 files in a single "batch" HTTPS POST when bztransmit is transmitting 14 MByte files.  Once each individual file is larger than 15 MBytes the setup and teardown of HTTPS shrinks to be less than 1% of the transmission time, so bztransmit does one HTTPS POST for each file 15 MBytes or larger.  The cutoff of 15 MBytes was originally decided so that a single HTTPS POST would not time out even on the very slowest customer connections.
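The batching rules above can be sketched as a planning function that groups file sizes into HTTPS POSTs.  This is only an illustration in Python with hypothetical names.  It assumes 1 MByte is 1024 * 1024 bytes, it works on raw file sizes rather than the compressed and encrypted sizes the client would really see, and it interprets the 30 MByte limit as "close the batch when the next file would push it past 30 MBytes", which is the reading that yields 2 files per batch for 14 MByte files.

```python
MB = 1024 * 1024              # assumption: binary megabytes
SMALL_FILE_LIMIT = 15 * MB    # files at or above this size get their own POST
MAX_FILES_PER_BATCH = 999
BATCH_SIZE_CUTOFF = 30 * MB


def plan_posts(sizes):
    """Group file sizes (already ordered small-first) into lists, one per HTTPS POST."""
    posts, batch, batch_bytes = [], [], 0
    for size in sizes:
        if size >= SMALL_FILE_LIMIT:
            if batch:                       # flush pending small files first
                posts.append(batch)
                batch, batch_bytes = [], 0
            posts.append([size])            # one POST per larger file
            continue
        if batch and batch_bytes + size > BATCH_SIZE_CUTOFF:
            posts.append(batch)
            batch, batch_bytes = [], 0
        batch.append(size)
        batch_bytes += size
        if len(batch) == MAX_FILES_PER_BATCH:
            posts.append(batch)
            batch, batch_bytes = [], 0
    if batch:
        posts.append(batch)
    return posts


print([len(p) for p in plan_posts([1] * 2500)])       # [999, 999, 502]
print([len(p) for p in plan_posts([14 * MB] * 4)])    # [2, 2]
```

With 2,500 one-byte files the plan needs 3 POSTs instead of 2,500, which is where the roughly 1,000x speedup for tiny files comes from.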
     
    How bztransmit handles large files: For any files between 15 MBytes - 100 MBytes, bztransmit reads the file from disk, compresses the file, encrypts the file, and transmits it to the Backblaze datacenter as one unit.  For large files this becomes impractical, so for files larger than 100 MBytes bztransmit FIRST makes an entire copy of the file broken down into 10 MByte "chunks" that are found in C:\ProgramData\Backblaze\bzdata\bzbackup\bzdatacenter\bzcurrentlargefile\ on Windows, and /Library/Backblaze.bzpkg/bzdata/bzbackup/bzdatacenter/bzcurrentlargefile/ on Macintosh.  That folder can be changed to an external drive if the customer changes the "Temporary data drive" in the "Settings..." client panel.  The original "cutoff" for self contained files was 30 MBytes (not the current 100 MBytes) for two reasons: 99% of customer files were smaller than 30 MBytes each, and that was small enough where the HTTPS POST of a single file did not time out on even the slowest customer connections.  I raised this to 100 MBytes in 2018 because both reasons had changed.  The very slowest upload connection any customer had was now at least 3x faster making it possible, and a lot of large images and some music files were no longer fitting inside 30 MBytes in a single file, but 99% of individual files still fit inside of 100 MBytes.  So that is the modern cutoff point.
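As a rough illustration of the chunking rule above, the Python sketch below computes the (offset, length) ranges a file would be cut into.  The function name is hypothetical, 1 MByte is assumed to be 1024 * 1024 bytes, and the sketch only plans the ranges; it does not write chunk files into the bzcurrentlargefile folder the way the client does.

```python
MB = 1024 * 1024                 # assumption: binary megabytes
LARGE_FILE_THRESHOLD = 100 * MB  # files larger than this are chunked
CHUNK_SIZE = 10 * MB


def chunk_ranges(file_size):
    """Return a list of (offset, length) pairs covering the whole file."""
    if file_size <= LARGE_FILE_THRESHOLD:
        return [(0, file_size)]          # sent as one self contained unit
    return [
        (offset, min(CHUNK_SIZE, file_size - offset))
        for offset in range(0, file_size, CHUNK_SIZE)
    ]


ranges = chunk_ranges(105 * MB)
print(len(ranges))    # 11
print(ranges[-1])     # (104857600, 5242880)
```

Only the final chunk can be shorter than 10 MBytes; every other chunk is exactly the same size, so a resumed upload can work out which chunks are still missing from their offsets alone.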
     
    Threading and Bandwidth Utilization in bztransmit: bztransmit has two modes: threaded and non-threaded.  If a customer sets the number of threads to "1" in the Backblaze "Settings..." (see the "Performance" tab) then the customer can control the amount of bandwidth the client uses (change the "Throttle" slider) to be as low as 128 Kbits/sec upload rate, up to about 10 Mbits/sec which is the approximate maximum upload speed of 1 thread (unthrottled).  However, the maximum upload speed for 1 thread varies depending on how far the customer's laptop is from the Backblaze datacenter due to latency issues (for example, the maximum upload speed from New Zealand might be only 1 Mbit/sec), and other things can affect the maximum upload speed like the size of files (small files murder upload performance due to the setup and teardown overhead of HTTPS).  Setting the client to use 1 upload thread is also the most SSD efficient setting, the very minimum number of copies of any file are made this way, usually this is "zero copies" - bztransmit reads the file from SSD into RAM, compresses it in RAM, encrypts it in RAM, and transmits it from RAM through HTTPS to the Backblaze datacenter.  Ok, so the OTHER mode of bztransmit occurs when the customer sets the number of threads to "2" or more, and it can go up to 30 threads.  Each thread runs at maximum speed, so with 30 threads at 10 Mbits/sec it is possible for the customer to use up to 300 Mbits/sec of upload capacity (if they are close enough to the Backblaze datacenter, and if they have a fast enough SSD that can keep up).  When using 2 - 30 threads, Backblaze makes a copy of each file before handing the copy off to a unique thread, so the threaded mode of operation requires 1 more temporary copy of each file be made on the SSD.

    A note about thread names: bztransmit uses "full memory protected processes" to implement threading.  Just so the names of the threads are unique, the Backblaze installer makes IDENTICAL (down to the last byte) copies of the bztransmit executable named unique things like this on Windows:
         C:\Program Files (x86)\Backblaze\x64\bztrans_thread00.exe
         C:\Program Files (x86)\Backblaze\x64\bztrans_thread01.exe
         C:\Program Files (x86)\Backblaze\x64\bztrans_thread02.exe
         C:\Program Files (x86)\Backblaze\x64\bztrans_thread03.exe
         C:\Program Files (x86)\Backblaze\x64\bztrans_thread04.exe
           .... etc ....
         C:\Program Files (x86)\Backblaze\x64\bztrans_thread18.exe
         C:\Program Files (x86)\Backblaze\x64\bztrans_thread19.exe
     
    It is the same on the Macintosh, but found in the folder /Library/Backblaze.bzpkg/ with the same names as above.  By assigning uniquely named executables to do each task, customers can watch the (now "named") threads come and go in "Activity Monitor" on the Macintosh, and "Task Manager" on Windows.   Now, you might notice there are only 20 of these executable names numbered "00" - "19" and you can use up to 30 threads.  At some point this system is silly and wastes customer disk space, so when bztransmit is using 21 - 30 threads it assigns a task that is supposed to be done by "thread25.exe" to a unique thread of course, but it uses the executable named "bztrans_thread05.exe" to do the task.  In other words, if the thread is numbered #20-29 then subtract 20 from the actual thread number to know which executable name was used.
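The naming rule can be written down in a few lines.  This Python sketch is an assumption-laden illustration: the function name is hypothetical, threads are assumed to be numbered 0 through 29, and the wrap-around is modeled as "thread number modulo 20" because that reproduces the example above, where work for thread 25 runs under bztrans_thread05.exe.

```python
EXECUTABLE_COUNT = 20   # bztrans_thread00.exe through bztrans_thread19.exe
MAX_THREADS = 30


def executable_for_thread(thread_number):
    """Map a thread number (0-29) to the Windows executable name that runs it."""
    if not 0 <= thread_number < MAX_THREADS:
        raise ValueError("thread_number must be between 0 and 29")
    # Threads 20-29 reuse the names of threads 00-09.
    return f"bztrans_thread{thread_number % EXECUTABLE_COUNT:02d}.exe"


print(executable_for_thread(4))    # bztrans_thread04.exe
print(executable_for_thread(25))   # bztrans_thread05.exe
```

One consequence is that when more than 20 threads are in use, "Task Manager" or "Activity Monitor" can show two running processes with the same name.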

    Resource Load bztransmit puts on customer laptop: bztransmit does all the encryption, and all the network communication for Backblaze's client, and depending on the customer settings and the size of the customer data it can use as little as 100 MBytes of RAM and a very small CPU load (5% of one core), or it can cause quite a bit of load and RAM use, and use all 16 cores of CPU at the same time.  The worst case situation is this: the customer is using 30 threads and they have a lot of 100 MByte files, this means each bztransmit process could be holding 100 MBytes EACH, for a total of 3 GBytes RAM use just for the data in memory, and it might actually come close to using 4 GBytes of RAM when you include the extra data structures to figure out what to backup.  Now that is up to the customer, and a customer who wants to backup all night long and has a modern laptop with 16 GBytes of RAM won't even notice it.  But a customer with only a slightly old 8 GByte RAM laptop trying to use their laptop during the middle of the day might want to set Backblaze to only use 10 threads to keep their RAM use way down low at about 1 GByte.  And any laptop not in the massive initial upload state can EASILY keep up with only 4 or 5 threads backing up in the middle of the day and the load will be quite minimal.
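The worst case arithmetic above is easy to reproduce for other thread counts.  The Python sketch below is a back-of-the-envelope estimate only; the function name is hypothetical, and it uses the two round figures from this document (about 10 Mbits/sec per thread and up to 100 MBytes of file data held per thread) while ignoring the extra data structures mentioned above.

```python
def worst_case_estimate(threads, mbits_per_thread=10, mbytes_held_per_thread=100):
    """Rough upper bounds for upload bandwidth and file data held in RAM."""
    return {
        "upload_mbits_per_sec": threads * mbits_per_thread,
        "data_in_ram_mbytes": threads * mbytes_held_per_thread,
    }


for threads in (1, 10, 30):
    print(threads, worst_case_estimate(threads))
```

For 30 threads this gives 300 Mbits/sec and 3,000 MBytes (roughly 3 GBytes), and for 10 threads it gives about 1 GByte, matching the figures in the text.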
     
    Click here for a deeper description and analysis of what bztransmit does.
     
  4. bzbui (called "bzbmenu" on the Macintosh Activity Monitor) - this is the client's local laptop GUI (Graphical User Interface).  Because bzbui is all UI components, it is written in different languages on Windows (C++) and Macintosh (Objective-C), and shares very little code.  For a customer to bring up the bzbui GUI, in Windows they click on a "Backblaze red flame" icon in the system tray.  On the Macintosh, they pull down the "black flame" icon along the very top right of their monitor, or go to the Macintosh System Preferences and click on the "Backblaze" system pref.  bzbui (bzbmenu on the Macintosh) runs as the current user logged in, so that it has permissions to access the keyboard and mouse for input.  bzbui (bzbmenu on the Macintosh) does not run AT ALL unless the user is currently fully logged into their laptop with their local laptop's username and password (completely different than the Backblaze account username which is an email address and Backblaze account password).
     
    Location on Disk Windows: C:\Program Files (x86)\Backblaze\bzbui.exe
    Location on Disk Macintosh: /Library/Backblaze.bzpkg/bzbmenu.app (and a Macintosh System Pref Panel)
     
    Purpose of bzbui: The primary purpose of "bzbui" is to present the customer with a local interface to the Backblaze client and local controls for things running on their local laptop.  The most essential thing that bzbui does is edit the file "bzinfo.xml" which is found at C:\ProgramData\Backblaze\bzdata\bzinfo.xml on Windows, and /Library/Backblaze.bzpkg/bzdata/bzinfo.xml on the Macintosh.  The file "bzinfo.xml" is the configuration and instructions for how all the OTHER (background) client executables behave.  For example, if a customer adds a folder to exclude using bzbui, that excluded folder path is added to bzinfo.xml, and so on.  Most everything that occurs in the GUI presented by bzbui simply edits the file bzinfo.xml on the local laptop's SSD.

    The executable bzbui (bzbmenu on the Macintosh) runs as the current user logged in, so that it has access to the GUI.  It is COMPLETELY unnecessary for this to run for the backup to continue.  You can prove this by logging out of the local laptop's account: the backup will continue just fine (better even) compared to when the user is signed in and bzbui/bzbmenu is running.  It is silly to disable/kill this process as it is so ridiculously lightweight, but the process is completely optional and killing it will not affect the backup's progress at all.  Sometimes customers are confused by this: they feel like if they kill this process the backup should stop, but it has literally nothing to do with the backup progress other than writing out configuration files.

    One of the other things bzbui (bzbmenu on the Macintosh) does is that it can "Pause" a running backup (by clicking the GUI button <Pause Backup>) and it can unpause (start the backup again) later if you click the <Backup Now> button. 
     
    Another responsibility of bzbui (bzbmenu on the Macintosh) is to pop up warning and error dialogs if something is wrong, like if the backup is not progressing for some reason.  For example, if the customer's credit card is totally maxed out at the limit, and the payment to Backblaze fails, then Backblaze will both send emails (from the datacenter), and also pop up dialogs on the client to explain the customer needs to fix the billing problem.  In general the customer has 45 days to fix a billing problem, but if they refuse to pay Backblaze for more than 45 days their backup will be deleted from the Backblaze servers to free up space for other (paying) customers.  Another thing bzbui/bzbmenu will pop up a warning dialog about is if the customer has gone too long without plugging in one of their external drives that is "selected for backup" and runs some danger of losing the backup of that one drive.  Another important aspect of bzbui/bzbmenu is to monitor that bzserv is running.  The way it does this is bzserv writes out a "heartbeat" file once every 10 minutes as a kind of "dead man's switch" to prove it is running properly.  If the heartbeat file is missing (not updated) for more than 30 minutes, bzbui/bzbmenu pops up an error dialog explaining there is a VERY PROFOUND problem that must be fixed or the backup cannot continue - since bzserv is required to be resident and running so that it can launch the other backup processes.
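
    The "dead man's switch" check described above might look something like the following Python sketch.  It assumes the heartbeat is an ordinary file whose modification time is refreshed by bzserv, and that "missing" covers both a file that does not exist and a file that has not been updated for more than 30 minutes.  The function name and the idea of using the file's modification time are assumptions, since the text does not say how the real client reads the heartbeat.

```python
import os
import time

HEARTBEAT_MAX_AGE_SECONDS = 30 * 60  # 30 minutes, per the text


def heartbeat_is_stale(path, now=None, max_age=HEARTBEAT_MAX_AGE_SECONDS):
    """Return True if the heartbeat file is absent or too old.

    Assumption: the writer refreshes the file's modification time
    (roughly every 10 minutes), so an old mtime means it has stopped.
    """
    if now is None:
        now = time.time()
    try:
        last_beat = os.path.getmtime(path)
    except OSError:
        # No file at all counts as a missing heartbeat.
        return True
    return (now - last_beat) > max_age
```

    A GUI process could call a check like this on a timer and pop up the error dialog when it returns True.  The 30 minute threshold leaves room for a couple of missed 10 minute heartbeats before raising the alarm.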
     
    The bzbui/bzbmenu process has a few other miscellaneous tasks available in its small pull down menu such as "Inherit Backup State" and displaying an "About..." dialog with the version of the client that is currently installed.

    Resource Load bzbui / bzbmenu puts on customer laptop: bzbui / bzbmenu is extremely small and efficient, ESPECIALLY when the interface is not up on the screen (which is how most customers run Backblaze 99.9999% of the time when not changing any configurations).  It is designed to use less than 0.001% of one core of CPU (bzbui is one single thread), and less than 0.001% extra load on the SSD, and it might use at most about 30 MBytes - 40 MBytes of RAM (0.5% of an 8 GByte RAM computer - one half of 1% of the customer RAM).  It should be one of the smallest RAM uses of any process on a customer's laptop.
     
    Click here for a deeper description and analysis of what bzbui / bzbmenu does.
     
  5. Honorable Mention: bzfclean - This process is never run for any reason normally.  It is only run as the very very final step of "Uninstall" of the entire Backblaze client.
     
    Location on Disk Windows: C:\Program Files (x86)\Backblaze\bzfclean.exe
    Location on Disk Macintosh: /Library/Backblaze.bzpkg/bzfclean
     
    Purpose of bzfclean: This absolutely tiny (4 KBytes) program has no GUI, and it is run as the very final step when a customer uninstalls the entire Backblaze client from their local laptop.  Backblaze prides itself on a completely clean uninstall - no registry entries left behind, and zero files or folders left behind on the customer's laptop.  On Windows computers, it is difficult to uninstall the very last executable running the uninstaller, because running an executable on Windows means you cannot also delete it.  An executable cannot delete itself.  To work around this issue, Backblaze copies bzfclean to a temporary folder that Windows will clean up automatically at a later date and RUNS IT FROM THAT LOCATION.  When the bzfclean executable is run as the final step in the uninstaller, the uninstaller runs bzfclean and then IMMEDIATELY exits itself (unlocking its own executable).  So when bzfclean runs, it first wakes up as a running process, and then it very consciously "pauses" itself for 2 or 3 seconds to let the uninstaller exit and quit running, then this tiny little executable reaches back and deletes the uninstaller, leaving no trace behind.
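
    The "pause, then reach back and delete" trick can be sketched in a few lines of Python.  This is only an illustration of the pattern, not the real bzfclean (which is a tiny native executable).  The function name is hypothetical, and the retry loop is an extra safety margin added for this sketch that the text does not describe.

```python
import os
import time


def delete_after_parent_exits(target_path, wait_seconds=3.0,
                              retries=5, retry_delay=1.0):
    """Wait for the launching program to exit, then delete its file.

    Returns True once the file is gone, False if it stayed locked.
    Assumption: a still-running executable on Windows shows up as a
    PermissionError when we try to remove it.
    """
    # Give the uninstaller a moment to exit and release its own file.
    time.sleep(wait_seconds)
    for _ in range(retries):
        try:
            os.remove(target_path)
            return True
        except FileNotFoundError:
            # Already gone, which is the outcome we wanted anyway.
            return True
        except PermissionError:
            # Still locked (parent has not fully exited); try again.
            time.sleep(retry_delay)
    return False
```

    The helper has to live OUTSIDE the folder being removed (the temporary folder in the description above), otherwise it would hit the same "cannot delete a running executable" problem it is trying to solve.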
     
  6. Honorable Mention: bzdoinstall - The Backblaze client installer is a self contained executable that also includes all the files and executables to be installed inside of itself.  This is called a "Self Extracting Archive" in old computer science terms.  Backblaze does not use an "off the shelf installer" like "InstallShield" or "Wise Installer"; the installer is written and maintained in house by the client software engineers.  When the Backblaze client installer runs, the self extracting program (internally in the Backblaze build tree this is called "bzserlfextractor") unpacks all of its internally contained files to install into a TEMPORARY folder first, including "bzdoinstall".  Then the self extracting program is all done with its primary task, and the final step is to launch "bzdoinstall" which has a GUI to present to the customer so the customer can enter their customer email address and Backblaze password to complete the install.  The executable "bzdoinstall" authenticates with the Backblaze website, copies the executables to their correct final locations, and finally presents a progress dialog to the customer as the laptop's SSD is scanned for the very first time for the initial list of files to upload.
     
  7. Honorable Mention: bzdownloader - This is technically not part of the Backblaze client in that it has nothing at all to do with backing up the computer.  bzdownloader does not run EVER as part of the backup process.  What bzdownloader does is help customers download their free ZIP file restores they have prepared on the Backblaze website.  Right after we finished the Backblaze Personal Backup client and the web restore process 13 years ago we thought we finally had a complete product.  In product development terms we would have called this a "minimum viable product" - a product that doesn't satisfy everybody and has some rough edges, but you could sell it for money and some customers would find it useful.  However, IMMEDIATELY a profound problem appeared that AT THAT TIME no web browser could download any file larger than 2 GBytes.  Period.  This was before the HTTP Range Header had been implemented by any web browsers (the "Range" header specifies a sub range of a large file, and was finally adopted in 2014 - 7 years after the Backblaze client was launched).  So for any customer to get back more than 2 GBytes of restored files (which is comically small) the Backblaze client team of 1 Windows programmer and 1 Macintosh programmer had to furiously create "bzdownloader" while ignoring any backup client bug fixes.  The bzdownloader can use up to 30 threads to download different parts (what would later be called "different ranges") of a file at the same time.  All of the Backblaze restore servers have 10 Gbit/sec ethernet on them, and the vast majority of USA customers have 1 Gbit/sec download capacity now, so bzdownloader is designed around 40 MByte blocks where it can download 30 of them at the same time, to reach speeds of something like 1 Gbit/sec OR HIGHER to download restores.  The bzdownloader is offered up as an executable program that doesn't even have an installer of any kind when a customer goes to download their web based ZIP restore.  

    Finally, on Windows the bzdownloader has one additional responsibility - to unzip the ZIP restore once it has finished downloading it.  The reason for this is that up until Windows 10, the "unzip" functionality built into Windows was atrocious.  Up until Windows 10, the built in Windows Explorer had "Unzip", but if you ever clicked "Unzip" on any ZIP file larger than 2 GBytes, it would run for 30 minutes then crash.  ACTUALLY CRASH.  Not one of the 20,000 programmers at Microsoft could be bothered to check the ZIP file size with 2 lines of code and pop up an error dialog with ANOTHER 4 lines of code that said "Microsoft Is Unable to Unzip files larger than 2 GBytes - go Install WinZip or something."  So the bzdownloader on Windows bundles a free program called "7-zip" that does a pretty good job, and the bzdownloader uses the command line version of 7-zip to unzip the newly downloaded ZIP files.  This additional functionality is only needed on Windows, the Macintosh Finder does a pretty good job of handling unzipping without additional software.
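
    To make the "30 blocks of 40 MBytes at the same time" idea in the bzdownloader description concrete, here is a minimal Python sketch.  It uses the standard HTTP "Range: bytes=start-end" header format as a stand-in, because the text does not say what request format bzdownloader really uses.  All function names are hypothetical, 40 MBytes is assumed to mean 40,000,000 bytes, and the actual network call is left to a caller-supplied function so the sketch stays self contained.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 40 * 1000 * 1000  # assumption: 40 MBytes = 40,000,000 bytes
MAX_THREADS = 30               # per the text


def block_ranges(total_size, block_size=BLOCK_SIZE):
    """Split a file of total_size bytes into inclusive (start, end) ranges."""
    return [(start, min(start + block_size, total_size) - 1)
            for start in range(0, total_size, block_size)]


def range_header(start, end):
    """Build a standard HTTP Range header for one block."""
    return {"Range": f"bytes={start}-{end}"}


def download_in_blocks(total_size, fetch_block,
                       block_size=BLOCK_SIZE, max_threads=MAX_THREADS):
    """Fetch every block with up to max_threads workers and join them.

    fetch_block(start, end) is supplied by the caller and must return
    the bytes for that inclusive range.  A real downloader would write
    each block to disk at its offset instead of joining in memory.
    """
    ranges = block_ranges(total_size, block_size)
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # pool.map returns results in the same order as the input ranges.
        parts = pool.map(lambda r: fetch_block(*r), ranges)
        return b"".join(parts)
```

    With 30 blocks of 40 MBytes in flight, about 1.2 GBytes of the restore is being fetched at any moment, which is how a single download can fill a 1 Gbit/sec connection even when one individual request would not.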
     
  8. Honorable Mention: the bztrans_thread (0 - 19) executables - Every one of these executables is an identical copy to the others, and is a copy of the original "bztransmit" executable.  If you see these processes in "Activity Monitor" on the Macintosh, or "Task Manager" on Windows, they are explicitly for transmitting files to the Backblaze datacenter when a customer is running with more than one thread.  See the notes in the "bztransmit" section above.
     

Future Idea for notes around the Backblaze Personal Backup Client:
Look through brianwski's reddit posts, copy and paste contents into here, or link directly to them. 

Some Notes on International Strings in the Backblaze Client:
<fill out some info here> - Maybe possibly link to internationalization video (might need edits): <redacted>

Documentation around bz_done files on the Backblaze Client:
You can watch a 57 minute tutorial on how to understand the internals of bz_done files here: https://www.youtube.com/watch?v=MOlz36nLbwA  You can view a slide used in that presentation here that documents what every column does by clicking this link.

 

All done.
