From be9a84d7042541363a5c1b57c5f18b23ca02cde8 Mon Sep 17 00:00:00 2001
From: Torsten Grote <t@grobox.de>
Date: Thu, 4 Mar 2021 16:04:15 -0300
Subject: [PATCH] Add storage design document

---
 storage/doc/design.md | 402 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 402 insertions(+)
 create mode 100644 storage/doc/design.md

diff --git a/storage/doc/design.md b/storage/doc/design.md
new file mode 100644
index 00000000..59e149c9
--- /dev/null
+++ b/storage/doc/design.md
@@ -0,0 +1,402 @@
+# Overview
+
+This is a design document for Seedvault Storage backup.
+It is heavily inspired by borgbackup, but simplified and adapted to the Android context.
+
+The aim is to efficiently back up media files from Android's `MediaStore`
+and other files from external storage.
+Apps and their data are explicitly out of scope,
+as this is already handled by Seedvault via the Android backup system.
+Techniques introduced here might be applied to app backups in the future.
+
+## Terminology
+
+A **backup snapshot** (or backup for short) represents a collection of files at one point in time.
+Making a backup creates such a snapshot and writes it to **backup storage**,
+which is an abstract location to save files to (e.g. a flash drive or cloud storage).
+Technically, the backup snapshot is a file containing metadata
+about the backup, such as the list of included files.
+A **backup run** is the process of backing up files, i.e. making a backup.
+
+Large files are split into **chunks** (smaller pieces) by a **chunker**.
+Small files are combined into **zip chunks**.
+
+File information is cached locally in the **files cache** to speed up operations.
+There is also the **chunks cache** holding information about available chunks.
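The terminology above maps onto a small data model: a snapshot lists file entries, and each entry references the chunks that make up its content. A minimal sketch in Java (all type and field names are illustrative, not the actual Seedvault types):

```java
import java.util.List;

// Illustrative sketch only: a backup snapshot is a metadata file listing
// file entries, and each entry references the chunks re-assembling the file.
record FileEntry(
        String relativePath,
        long size,
        long lastModified,
        List<String> chunkIds,  // ordered list of chunk IDs
        Integer zipIndex) {     // set only if the file lives inside a zip chunk
}

record BackupSnapshot(
        int version,            // backup format version
        String name,
        long timeStart,         // also used as the snapshot's file name
        long timeEnd,
        List<FileEntry> mediaFiles,
        List<FileEntry> documentFiles) {
}
```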
+
+# Operations
+
+## Making a backup
+
+A backup run is ideally triggered automatically when
+
+* the device is charging and connected to an un-metered network, in case network storage is used
+* a storage medium is plugged in and the user confirmed the run, in case removable storage is used
+
+Files to be backed up are listed based on the user's preference
+using Android's `MediaProvider` and `ExternalStorageProvider`.
+Tests on real-world devices have shown list times of ~200ms for `MediaProvider`
+and ~10sec for *all* of `ExternalStorageProvider`
+(which is unlikely to happen in practice, because the entire storage volume cannot be selected on Android 11).
+
+All files will be processed with every backup run.
+If a file is found in the files cache, we check whether its content-modification-indicating
+attributes (e.g. size, last modified timestamp) are unchanged
+and whether all of its chunks are still present in the backup storage.
+We might be able to speed up the latter check by initially retrieving a list of all chunks.
+
+For unchanged files whose chunks are all present, an entry is added to the backup snapshot
+and the entry in the files cache is refreshed.
+If a file is not found in the cache, an entry is added for it.
+New and modified files are put through a chunker
+which splits up larger files into smaller chunks.
+Initially, the chunker might just return a single chunk,
+the file itself, to simplify the operation.
+
+A chunk is hashed (with a key / MACed),
+then (compressed and) encrypted (with authentication) and written to backup storage,
+if it is not already present.
+New chunks are added to the chunks cache.
+Only after the backup has completed and the backup snapshot has been written
+are the reference counters of the included chunks incremented.
+
+When all chunks of a file have either been written or were already present,
+the file is added to the backup snapshot with its ordered list of chunk IDs and other metadata.
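The chunking step described above could initially be as simple as a fixed-size split, with small files yielding a single chunk. A minimal sketch, assuming a hypothetical 1 MiB chunk size (not a spec value):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch of the simplest possible chunking strategy: split a file's
// bytes into fixed-size chunks; files smaller than the chunk size yield a
// single chunk. A later version might use a rolling hash instead.
class FixedSizeChunker {
    static final int DEFAULT_CHUNK_SIZE = 1 << 20; // 1 MiB, an assumed value

    static List<byte[]> chunk(byte[] fileContent, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < fileContent.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, fileContent.length);
            chunks.add(Arrays.copyOfRange(fileContent, offset, end));
        }
        return chunks;
    }
}
```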
+
+When all files have been processed, the backup snapshot is finalized
+and written (encrypted) to storage.
+
+If a backup fails, a new run is attempted at the next opportunity, creating a new backup snapshot.
+Chunks uploaded during the failed run should still be available in backup storage
+and in the cache with a reference count of `0`, providing an auto-resume.
+
+After a successful backup, chunks that still have a reference count of `0`
+can be deleted from storage and cache without the risk of deleting chunks that will be needed later.
+
+## Removing old backups
+
+Ideally, the user can decide how many backups should be kept, based on available storage capacity.
+This could be a number of backups to keep in each of the yearly/monthly/weekly/daily categories.
+However, initially, we might simply auto-prune backups older than a month,
+if there have been at least 3 backups within that month (or some similar scheme).
+
+After a successful backup run is a good time to prune old backups.
+To determine which backups to delete, the backup snapshots need to be downloaded and inspected.
+Maybe their file name can be the `timeStart` timestamp to help with that task.
+If a backup is selected for deletion, the reference counter of all included chunks is decremented.
+The backup snapshot file and all chunks with a reference count of `0` are deleted from storage.
+
+## Restoring from backup
+
+When the user wishes to restore a backup, they select the backup snapshot that should be used.
+The selection can be done based on time and name.
+We go through the list of files in the snapshot,
+download, authenticate, decrypt (and decompress) each chunk of a file
+and re-assemble the file this way.
+Once we have the original chunk,
+we might want to re-calculate the chunk ID to check that it is as expected,
+to prevent an attacker from swapping chunks.
+This could also be achieved by including the chunk ID
+in the associated data of the authenticated encryption (AEAD).
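The chunk ID re-calculation mentioned above can be sketched as follows: re-compute the keyed hash over the decrypted chunk and compare it in constant time with the ID the chunk was stored under. Class and method names are illustrative; in the actual implementation the key would be the derived chunk ID calculation key:

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Illustrative sketch: verify a restored chunk by re-computing its chunk ID
// (HMAC-SHA256 over the plaintext with the chunk ID calculation key) and
// comparing it to the ID the chunk was stored under, so that a swapped or
// renamed chunk is rejected.
class ChunkVerifier {
    static byte[] chunkId(byte[] chunkIdKey, byte[] plaintext) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(chunkIdKey, "HmacSHA256"));
            return mac.doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new RuntimeException(e);
        }
    }

    static boolean verify(byte[] chunkIdKey, byte[] plaintext, byte[] expectedChunkId) {
        // constant-time comparison, to avoid leaking how many bytes matched
        return MessageDigest.isEqual(chunkId(chunkIdKey, plaintext), expectedChunkId);
    }
}
```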
+The re-assembled file will be placed into the same directory under the same name,
+with its attributes (e.g. lastModified) restored as far as possible on Android.
+
+Restoring to storage that is already in use is not supported.
+However, if a file already exists with that name and path,
+we check whether the file is identical to the one we want to restore
+(by relying on file metadata or re-computing chunk IDs)
+and move on to the next file if it is indeed identical.
+If it is not identical, we could rely on Android's Storage Access Framework
+to automatically give it a `(1)` suffix when writing it to disk.
+Normally, restores are expected to happen to a clean file system anyway.
+
+However, if a restore fails, the above behavior should give us a seamless auto-resume experience.
+The user can re-try the restore and it will quickly skip already restored files
+and continue to download the ones that are still missing.
+
+After all files have been written to a directory,
+we might want to attempt to restore its metadata (and flags?) as well.
+
+
+# Cryptography
+
+The goal here is to be as simple as possible while still being secure,
+meaning that we primarily want to conceal the content of the backed up files.
+Certain trade-offs have to be made though,
+so for now we do not attempt to hide file sizes.
+For example, an attacker with access to the backup storage might be able to infer
+that the Snowden files are part of our backup.
+We do, however, encrypt file names and paths.
+
+## Master Key
+
+Seedvault already uses [BIP39](https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki)
+to give users a mnemonic recovery code and for generating deterministic keys.
+The derived key has 512 bits
+and Seedvault uses the first 256 bits as an AES key to encrypt app data (out of scope here).
+Android limits this key's usage to encryption and decryption.
+Therefore, the second 256 bits will be imported into Android's keystore for use with HMAC-SHA256,
+so that this key can act as a master key from which we can deterministically derive additional keys
+by using HKDF ([RFC 5869](https://tools.ietf.org/html/rfc5869)).
+These second 256 bits must not be used for any other purpose in the future.
+We use them for a master key to avoid users having to handle another secret.
+
+For deriving keys, we only use HKDF's second 'expand' step,
+because the Android keystore does not give us access
+to the key's byte representation (required for the first 'extract' step) after importing it.
+This should be fine, as the input key material is already a cryptographically strong key
+(see section 3.3 of RFC 5869 above).
+
+## Choice of primitives
+
+AES-GCM and SHA256 have been chosen,
+because [both are hardware accelerated](https://en.wikichip.org/wiki/arm/armv8#ARMv8_Extensions_and_Processor_Features)
+on the 64-bit ARMv8 CPUs used in modern phones.
+Our own tests against Java implementations of Blake2s, Blake3 and ChaCha20-Poly1305
+have confirmed that these are indeed several times slower.
+C implementations via JNI have not been evaluated though,
+due to the difficulty of building them as part of AOSP.
+
+## Chunk ID calculation
+
+We use a keyed hash instead of a plain hash for calculating the chunk ID,
+so as not to leak the file content via a publicly computable hash.
+Using HMAC-SHA256 directly with the master key in Android's keystore
+resulted in poor throughput of around 4 MB/sec.
+Java implementations of Blake2s and Blake3 performed better,
+but by far the best performance came from HMAC-SHA256
+with a key whose byte representation we can hold in memory.
+
+Therefore, we suggest deriving a dedicated key for chunk ID calculation from the master key
+and keeping it in memory for as long as we need it.
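The 'expand'-only derivation can be sketched directly from RFC 5869, section 2.3. In the real implementation the HMAC would be computed by the non-exportable master key inside the Android keystore; here a raw key byte array stands in for it so the sketch is self-contained:

```java
import java.io.ByteArrayOutputStream;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch of the HKDF 'expand' step (RFC 5869, section 2.3) with HMAC-SHA256.
// outLengthBytes must be at most 255 * 32 for HMAC-SHA256.
class Hkdf {
    static byte[] expand(byte[] pseudoRandomKey, byte[] info, int outLengthBytes) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(pseudoRandomKey, "HmacSHA256"));
            ByteArrayOutputStream okm = new ByteArrayOutputStream();
            byte[] t = new byte[0]; // T(0) is the empty string
            byte counter = 1;       // single-octet counter starting at 0x01
            while (okm.size() < outLengthBytes) {
                mac.update(t);         // T(n-1)
                mac.update(info);      // context, e.g. "Chunk ID calculation"
                mac.update(counter++);
                t = mac.doFinal();     // Mac resets, but stays initialized
                okm.write(t, 0, Math.min(t.length, outLengthBytes - okm.size()));
            }
            return okm.toByteArray();
        } catch (GeneralSecurityException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Different info strings (such as "Chunk ID calculation" and "stream key" below) yield independent keys from the same master key.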
+If an attacker is able to read our memory,
+they have access to the entire device anyway
+and there is no longer any point in protecting content indicators such as chunk hashes.
+
+To derive the chunk ID calculation key, we use HKDF's expand step
+with the UTF-8 byte representation of "Chunk ID calculation" as info input.
+
+## Stream Encryption
+
+When a stream is written to backup storage,
+it starts with a header consisting of a single byte indicating the backup format version,
+followed by the encrypted payload.
+
+Each chunk and backup snapshot written to backup storage will be encrypted with a fresh key
+to prevent issues with nonce/IV re-use of a single key.
+Similar to the chunk ID calculation key above, we derive a stream key from the master key
+by using HKDF's expand step with the UTF-8 byte representation of "stream key" as info input.
+This stream key is then used to derive a new key for each stream.
+
+Instead of encrypting, authenticating and segmenting a cleartext stream ourselves,
+we have chosen to employ the [tink library](https://github.com/google/tink) for that task.
+Since it does not allow us to work with imported or derived keys,
+we only use its [AesGcmHkdfStreaming](https://google.github.io/tink/javadoc/tink-android/1.5.0/index.html?com/google/crypto/tink/subtle/AesGcmHkdfStreaming.html)
+to delegate encryption and decryption of byte streams.
+This follows the OAE2 definition as proposed in the paper
+"Online Authenticated-Encryption and its Nonce-Reuse Misuse-Resistance"
+([PDF](https://eprint.iacr.org/2015/189.pdf)).
+
+It adds its own 40-byte header consisting of the header length (1 byte), salt and nonce prefix.
+Then it adds one or more segments, each up to 1 MB in size.
+All segments are encrypted with a fresh key that is derived by using HKDF
+on our stream key with another internal random salt (32 bytes) and associated data as info
+([documentation](https://github.com/google/tink/blob/master/docs/WIRE-FORMAT.md#streaming-encryption)).
+
+When writing files/chunks to backup storage,
+the authenticated associated data (AAD) will contain the backup version as the first byte
+(to prevent downgrade attacks),
+followed by a second type byte depending on the type of file written:
+
+* chunks: `0x00` as type byte, followed by the chunk ID
+* backup snapshots: `0x01` as type byte, followed by the backup snapshot timestamp
+
+The chunk ID and the backup snapshot timestamp are added
+to prevent an attacker from renaming and swapping files/chunks.
+
+# Data structures
+
+## Local caches
+
+### Files cache
+
+This cache is needed to quickly look up whether a file has changed and whether we have all of its chunks.
+It will probably be implemented as a SQLite-based Room database,
+which has shown promising performance in early tests.
+
+Contents:
+
+* URI (stripped of scheme and authority?) (`String` with index for fast lookups)
+* file size (`Long`)
+* last modified in milliseconds (`Long`)
+* generation modified (MediaStore only) (`Long`)
+* list of chunk IDs representing the file's contents
+* zip index, in case this file is inside a single zip chunk (`Integer`)
+* last seen in milliseconds (`Long`)
+
+If the file's size, last modified timestamp (and generation) are still the same,
+it is considered to be unchanged.
+In that case, we check that all of the file's content chunks are (still) present in storage.
+
+If the file has not changed and all chunks are present,
+the file is not read/chunked/hashed again.
+Only its metadata is added to the backup snapshot.
+
+As the cache grows over time, we need a way to eventually evict files.
+This could happen by checking a last seen timestamp, by using a TTL counter,
+or maybe even a boolean flag that gets checked after a successful run over all files.
+A flag might not be ideal if the user adds/removes folders as backup targets.
+The current preference is for using a last seen timestamp.
+
+The files cache is local only and will not be included in the backup.
+After restoring from backup, the cache needs to be repopulated on the next backup run,
+after re-generating the chunks cache.
+The URIs of the restored files will most likely differ from the backed up ones.
+When the `MediaStore` version changes,
+the chunk IDs of all files will need to be recalculated as well.
+
+### Chunks cache
+
+This cache is used to determine whether we already have a chunk,
+to count references to it and also for statistics.
+
+It could be implemented as a table in the same database as the files cache.
+
+* chunk ID (hex representation of the chunk's MAC)
+* reference count
+* size
+
+If the reference count of a chunk reaches `0`,
+we can delete it from storage (after a successful backup)
+as it is no longer used by any backup.
+
+References are only stored in this local chunks cache.
+If the cache is lost (or not available after restoring),
+it can be repopulated by inspecting all backup snapshots
+and setting the reference count of each chunk to the number of backup snapshots referencing it.
+
+When making a backup run and hitting the files cache,
+we need to check that all chunks are still available on storage.
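The reference counting described above can be sketched as a small in-memory model; the real chunks cache would be a table in the Room database, and persistence, sizes and statistics are omitted here:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory sketch of the chunks cache bookkeeping.
class ChunksCache {
    private final Map<String, Integer> refCounts = new HashMap<>();

    boolean contains(String chunkId) { return refCounts.containsKey(chunkId); }

    // called when a new chunk was uploaded; references start at 0 and are
    // only incremented once a snapshot referencing the chunk was written
    void add(String chunkId) { refCounts.putIfAbsent(chunkId, 0); }

    void incrementRefCount(String chunkId) { refCounts.merge(chunkId, 1, Integer::sum); }

    // called when a backup snapshot referencing the chunk is deleted
    void decrementRefCount(String chunkId) { refCounts.merge(chunkId, -1, Integer::sum); }

    // chunks that are safe to delete from storage after a successful run
    List<String> unreferencedChunks() {
        List<String> result = new ArrayList<>();
        refCounts.forEach((id, count) -> { if (count <= 0) result.add(id); });
        return result;
    }
}
```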
+ +## Remote Files + +All types of files written to backup storage have the following format: + + ┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ + ┃ ┃ tink payload (with 40 bytes header) ┃ + ┃ version ┃ ┏━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓ ┃ + ┃ byte ┃ ┃ header length ┃ salt ┃ nonce prefix ┃ encrypted segments ┃ ┃ + ┃ ┃ ┗━━━━━━━━━━━━━━━┻━━━━━━┻━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┛ ┃ + ┗━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ + +### Backup Snapshot + +The backup snapshot contains metadata about a single backup +and is written to the storage after a successful backup run. + +* version - the backup version +* name - a name of the backup +* media files - a list of all `MediaStore` files in this backup + * media type (enum: images, video, audio or downloads) + * name (string) + * relative path (string) + * last modified timestamp (long) + * owner package name (string) + * is favorite (boolean) + * file size (long) + * storage volume (string) + * ordered list of chunk IDs (to re-assemble the file) + * zip index (int) +* document files - a list of all document files from external storage in this backup + * name (string) + * relative path (string) + * last modified timestamp (long) + * file size (long) + * storage volume (string) + * ordered list of chunk IDs (to re-assemble the file) + * zip index (int) +* total size - sum of the size of all files, for stats +* timeStart - when the backup run was started +* timeEnd - when the backup run was finished + +All backup snapshots are stored in the root folder. +The filename is the timeStart timestamp. + +### Chunks + +The encrypted payload of chunks is just the chunk data itself. +All chunks are stored in one of 256 sub-folders +representing the first byte of the chunk ID encoded as a hex string. +The file name is the chunk ID encoded as a (lower-case) hex string. 
+This is similar to how git stores its repository objects
+and avoids having to store all chunks in a single directory.
+
+### Zip chunks
+
+Transferring many very small files to the storage medium causes substantial overhead.
+It would be nice to avoid that.
+Michael Rogers proposed the following idea to address this.
+
+A chunk can either be part of a large file, all of a medium-sized file,
+or a (deterministic) zip containing multiple small files.
+When creating a backup, we sort the files in the small category by last modification
+and pack as many files into each chunk as we can.
+Each small file will be stored in the zip chunk under some artificial name
+that is unique within the scope of the zip chunk, such as a counter.
+The path to unique name mapping will be stored in the backup snapshot (zip index).
+If a small file is inside a zip chunk,
+that chunk's ID will be listed as the only chunk of the file in the backup snapshot,
+and likewise for any other files inside that chunk.
+
+When creating the next backup, if none of the small files have changed,
+we just increase the reference count of the existing chunk.
+If some of them have changed, they will be added to a new zip chunk
+together with other new/changed small files.
+
+When fetching a chunk for restore, we know in advance whether it is a zip chunk,
+because the file entry we need it for contains a zip index,
+so we will not confuse it with a medium-sized zip file.
+Then we unzip the zip chunk and extract the file by its zip index (name).
+
+# Out-of-Scope
+
+The following features would be nice to have,
+but are considered out of scope of the current design for time and budget reasons.
+
+* compression (we initially assume that most files are already sufficiently compressed)
+* packing several smaller files into larger combined chunks to improve transfer efficiency
+* using a rolling hash to produce chunks, in order to increase the likelihood of obtaining the same chunks
+  even if file contents change slightly or shift
+* external secret-less corruption checks that would use checksums over encrypted data
+* supporting different backup clients backing up to the same storage
+* concealing file sizes (though zip chunks help a bit here)
+
+# Known issues
+
+## Changes to files cannot be detected reliably
+
+Changes can be detected using file size and lastModified timestamps.
+These timestamps only have a precision of seconds,
+so we can't detect a change happening within a second of a previous change.
+Also, other apps can reset the lastModified timestamp,
+preventing us from registering a change if the file size doesn't change.
+On Android 11, media files have a generation counter that gets incremented when a file changes,
+which helps with this issue.
+However, files on external storage still don't have anything similar
+and usually also don't trigger `ContentObserver` notifications.
+
+# Acknowledgements
+
+The following individuals have reviewed this document and provided helpful feedback.
+
+* Demi M. Obenour
+* Chirayu Desai
+* Kevin Niehage
+* Michael Rogers
+* Thomas Waldmann
+* Tom Hacohen
+
+As they have reviewed different parts and different versions at different times,
+this acknowledgement should not be mistaken for their endorsement of the current design
+or the final implementation.