Update storage design document and add some TODOs to the code
Parent: 858954e82d
Commit: e77d5cad2c
4 changed files with 89 additions and 53 deletions
@ -28,30 +28,31 @@ There is also the **chunks cache** to cache information about available chunks.
## Making a backup

A backup run is usually triggered automatically when

* the device is charging and connected to an un-metered network in case network storage is used
* a storage medium is plugged in (and the user confirmed the run) in case removable storage is used

Files to be backed up are scanned based on the user's preference
using Android's `MediaProvider` and `ExternalStorageProvider`.
Tests on real world devices have shown ~200ms scan times for `MediaProvider`
and `~10sec` for *all* of `ExternalStorageProvider`
(which is unlikely to happen, because the entire storage volume cannot be selected on Android 11).

All files included in backups will be scanned with every backup run.
If a file is found in the cache, it is checked
whether its content-modification-indicating attributes
(size, lastModified and generation for media files)
have not been modified
and all its chunks are still present in the backup storage.
For the latter check, we initially retrieve a list of all chunks available on backup storage.

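The change check described above can be sketched as follows; the class and field names are illustrative, not Seedvault's actual code:

```kotlin
// Illustrative sketch of the cached attributes used for the change check.
data class CachedAttributes(
    val size: Long,
    val lastModified: Long,
    val generation: Long?, // only tracked for media files
)

// A file is considered unchanged if size, lastModified and
// (for media files) generation all match the cached values.
fun hasNotChanged(cached: CachedAttributes, current: CachedAttributes): Boolean =
    cached.size == current.size &&
        cached.lastModified == current.lastModified &&
        cached.generation == current.generation
```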
For present unchanged files, an entry will be added to the backup snapshot
and the lastSeen timestamp in the files cache updated.
If a file is not found in the cache, an entry will be added for it.
New and modified files will be put through a chunker
which splits up larger files into smaller chunks.
Very small files are combined into larger zip chunks for transfer efficiency.

A chunk is hashed (with a key / MACed),
then (compressed and) encrypted (with authentication) and written to backup storage,
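The splitting step might look roughly like the following fixed-size chunker; this is a simplified sketch, and the real chunker and its chunk size may differ:

```kotlin
import java.io.ByteArrayInputStream
import java.io.InputStream

// Illustrative sketch: split an input stream into chunks of at most
// maxChunkSize bytes; the final chunk may be smaller.
fun chunk(input: InputStream, maxChunkSize: Int): List<ByteArray> {
    val chunks = mutableListOf<ByteArray>()
    val buffer = ByteArray(maxChunkSize)
    while (true) {
        var read = 0
        // fill the buffer as far as possible before emitting a chunk
        while (read < maxChunkSize) {
            val n = input.read(buffer, read, maxChunkSize - read)
            if (n == -1) break
            read += n
        }
        if (read == 0) break
        chunks.add(buffer.copyOf(read))
        if (read < maxChunkSize) break // reached end of stream
    }
    return chunks
}
```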
@ -68,9 +69,9 @@ and written (encrypted) to storage.
If the backup fails, a new run is attempted at the next opportunity creating a new backup snapshot.
Chunks uploaded during the failed run should still be available in backup storage
and in the cache with reference count `0`, providing a seamless auto-resume.

After a *successful* backup run, chunks that still have reference count `0`
can be deleted from storage and cache without risking the deletion of chunks that will be needed later.

## Removing old backups

@ -80,11 +81,14 @@ These could be a number in the yearly/monthly/weekly/daily categories.
|
||||||
However, initially, we might simply auto-prune backups older than a month,
|
However, initially, we might simply auto-prune backups older than a month,
|
||||||
if there have been at least 3 backups within that month (or some similar scheme).
|
if there have been at least 3 backups within that month (or some similar scheme).
|
||||||
|
|
||||||
After doing a successful backup run, is a good time to prune old backups.
|
After a successful backup run is a good time to prune old backups.
|
||||||
To determine which backups to delete, the backup snapshots need to be downloaded and inspected.
|
To determine which backups to delete, the backup snapshots need to be downloaded and inspected.
|
||||||
Maybe their file name can be the `timeStart` timestamp to help with that task.
|
Their file name can be derived from their `timeStart` timestamp to help with that task.
|
||||||
If a backup is selected for deletion, the reference counter of all included chunks is decremented.
|
If a backup is selected for deletion, the reference counter of all included chunks is decremented.
|
||||||
The backup snapshot file and chunks with reference count of `0` are deleted from storage.
|
Note that a backup snapshot can reference a single chunk several times.
|
||||||
|
The reference counter however refers to the number of snapshots references it,
|
||||||
|
not the number of files.
|
||||||
|
The backup snapshot file and chunks with reference count of `0` are then deleted from storage.
|
||||||
|
|
||||||
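The reference-count bookkeeping during pruning can be sketched like this; it is a minimal in-memory model for illustration, while Seedvault's actual chunks cache is a database:

```kotlin
// Illustrative sketch: one reference per snapshot that uses a chunk,
// regardless of how many files in that snapshot reference it.
class ChunkRefs {
    private val refCount = mutableMapOf<String, Int>()

    fun addSnapshot(chunkIds: Set<String>) {
        for (id in chunkIds) refCount.merge(id, 1, Int::plus)
    }

    // Decrement ref counts for a deleted snapshot and return the
    // chunk IDs that are no longer referenced and can be deleted.
    fun removeSnapshot(chunkIds: Set<String>): List<String> {
        for (id in chunkIds) refCount.merge(id, -1, Int::plus)
        return chunkIds.filter { (refCount[it] ?: 0) <= 0 }
    }
}
```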
## Restoring from backup

@ -94,28 +98,29 @@ We go through the list of files in the snapshot,
download, authenticate, decrypt (and decompress) each chunk of the file
and re-assemble the file this way.
Once we have the original chunk,
we could re-calculate the chunk ID to prevent an attacker from swapping chunks.
However, we instead include the chunk ID
in the associated data of the authenticated encryption (AEAD).

The re-assembled file will be placed into the same directory under the same name
with its attributes (e.g. lastModified) restored as much as possible on Android.

Restoring to storage that is already in use is not supported.
However, if a file already exists with that name and path,
we could check if the file is identical to the one we want to restore
(by relying on file metadata or re-computing chunk IDs)
and move to the next if it is indeed identical.
If it is not identical, we rely on Android's Storage Access Framework
to automatically give it a `(1)` suffix when writing it to disk or add one manually.
Normally, restores are expected to happen to a clean file system anyway.

However, if a restore fails, the above behavior (not implemented in first iteration)
should give us a seamless auto-resume experience.
The user can re-try the restore and it will quickly skip already restored files
and continue to download the ones that are still missing.

After all files have been written to a directory,
we might want to attempt to restore its metadata (and flags?) as well.
However, restoring directory metadata is not implemented in first iteration.

# Cryptography

@ -134,12 +139,12 @@ Seedvault already uses [BIP39](https://github.com/bitcoin/bips/blob/master/bip-0
to give users a mnemonic recovery code and for generating deterministic keys.
The derived key has 512 bits
and Seedvault uses the first 256 bits as an AES key to encrypt app data (out of scope here).
Unfortunately, this key's usage is currently limited by Android to encryption and decryption.
Therefore, the second 256 bits will be imported into Android's keystore for use with HMAC-SHA256,
so that this key can act as a master key we can deterministically derive additional keys from
by using HKDF ([RFC5869](https://tools.ietf.org/html/rfc5869)).
These second 256 bits must not be used for any other purpose in the future.
We use them for a master key to avoid users having to handle yet another secret.

For deriving keys, we are only using the HKDF's second 'expand' step,
because the Android Keystore does not give us access

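The 'expand'-only derivation can be sketched as follows, with HMAC-SHA256 as the PRF per RFC 5869; note that in Seedvault the HMAC is computed by the keystore-backed master key, while this sketch uses a plain in-memory key for illustration:

```kotlin
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Sketch of the HKDF 'expand' step (RFC 5869, section 2.3):
// T(i) = HMAC(PRK, T(i-1) || info || i), output truncated to outLength.
fun hkdfExpand(prk: ByteArray, info: ByteArray, outLength: Int): ByteArray {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(SecretKeySpec(prk, "HmacSHA256"))
    val okm = ByteArray(outLength)
    var t = ByteArray(0)
    var generated = 0
    var counter = 1
    while (generated < outLength) {
        mac.update(t)
        mac.update(info)
        mac.update(counter.toByte())
        t = mac.doFinal()
        val n = minOf(t.size, outLength - generated)
        t.copyInto(okm, generated, 0, n)
        generated += n
        counter++
    }
    return okm
}
```

Different `info` strings yield independent sub-keys from the same master key, which is how a dedicated key per purpose can be derived.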
@ -162,12 +167,13 @@ due to difficulties of building those as part of AOSP.
We use a keyed hash instead of a normal hash for calculating the chunk ID
to not leak the file content via the public hash.
Using HMAC-SHA256 directly with the master key in Android's key store
resulted in terrible throughput of around 4 MB/sec,
presumably because file data needs to enter the secure element to get hashed there.
Java implementations of Blake2s and Blake3 performed better,
but by far the best performance came from HMAC-SHA256
with a key whose byte representation we can hold in memory.

Therefore, we derive a dedicated key for chunk ID calculation from the master key
and keep it in memory for as long as we need it.
If an attacker is able to read our memory,
they have access to the entire device anyway

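The chunk ID calculation with an in-memory key can be sketched like this; key handling is simplified for illustration:

```kotlin
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Sketch: chunk ID = HMAC-SHA256 over the plaintext chunk with a
// dedicated in-memory key, hex-encoded (lower-case).
fun chunkId(chunkIdKey: ByteArray, plaintextChunk: ByteArray): String {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(SecretKeySpec(chunkIdKey, "HmacSHA256"))
    return mac.doFinal(plaintextChunk).joinToString("") { "%02x".format(it) }
}
```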
@ -201,15 +207,15 @@ It adds its own 40 byte header consisting of header length (1 byte), salt and no
Then it adds one or more segments, each up to 1 MB in size.
All segments are encrypted with a fresh key that is derived by using HKDF
on our stream key with another internal random salt (32 bytes) and associated data as info
([documentation](https://github.com/google/tink/blob/v1.5.0/docs/WIRE-FORMAT.md#streaming-encryption)).

When writing files/chunks to backup storage,
the authenticated associated data (AAD) will contain the backup version as the first byte
(to prevent downgrade attacks)
followed by a second type byte depending on the type of file written:

* chunks: `0x00` as type byte and then the byte representation of the chunk ID
* backup snapshots: `0x01` as type byte and then the backup snapshot timestamp as int64 bytes

The chunk ID and the backup snapshot timestamp get added
to prevent an attacker from renaming and swapping files/chunks.
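The AAD layout above can be sketched like this; the constant names are illustrative:

```kotlin
import java.nio.ByteBuffer

// Illustrative sketch of the AAD layout: version byte, type byte,
// then the chunk ID bytes or the snapshot timestamp as int64 bytes.
val VERSION: Byte = 0x00
val TYPE_CHUNK: Byte = 0x00
val TYPE_SNAPSHOT: Byte = 0x01

fun aadForChunk(chunkIdBytes: ByteArray): ByteArray =
    byteArrayOf(VERSION, TYPE_CHUNK) + chunkIdBytes

fun aadForSnapshot(timestamp: Long): ByteArray {
    val buffer = ByteBuffer.allocate(2 + Long.SIZE_BYTES)
    buffer.put(VERSION).put(TYPE_SNAPSHOT).putLong(timestamp)
    return buffer.array()
}
```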
@ -221,8 +227,8 @@ to prevent an attacker from renaming and swapping files/chunks.

### Files cache

This cache is needed to quickly look up if a file has changed and if we have all of its chunks.
It is implemented as a sqlite-based Room database
which had shown promising performance in early tests.

Contents:

@ -232,7 +238,7 @@ Contents:
* generation modified (MediaStore only) (`Long`)
* list of chunk IDs representing the file's contents
* zip index in case this file is inside a single zip chunk (`Integer`)
* last seen in epoch milliseconds (`Long`)

If the file's size, last modified timestamp (and generation) are still the same,
it is considered to not have changed.

@ -242,41 +248,53 @@ If the file has not changed and all chunks are present,
the file is not read/chunked/hashed again.
Only file metadata is added to the backup snapshot.

If a file's URI should ever change, it will be considered as a new file,
so it is read/chunked/hashed again, but if it hasn't otherwise changed,
its chunks will not be written to storage again
(except for small files that get added to a new zip chunk).

As the cache grows over time, we need a way to evict files eventually
(not implemented in first iteration).
This can happen by checking the last seen timestamp
and deleting all files we haven't seen for some time (maybe a month).

The files cache is local only and will not be included in the backup.
After restoring from backup the cache needs to get repopulated on the next backup run.
This will happen automatically, because before each backup run we check cache consistency
and repopulate the cache if we find it inconsistent with what we have in backup storage.
The URIs of the restored files will most likely differ from the backed up ones.
When the `MediaStore` version changes,
the chunk IDs of all files will need to get recalculated as well
(not implemented in first iteration),
because we can't be sure about their new state.

### Chunks cache

This is used to determine whether we already have a chunk,
to count references to it and also for statistics.

It is implemented as a table in the same database as the files cache.

* chunk ID (hex representation of the chunk's MAC)
* reference count
* size
* backup version byte (currently 0)

If the reference count of a chunk reaches `0`,
we can delete it from storage (after a successful backup run)
as it isn't used by a backup snapshot anymore.

References are only stored in this local chunks cache.
If the cache is lost (or not available after restoring),
it can be repopulated by inspecting all backup snapshots
and setting the reference count to the number of backup snapshots a chunk is referenced from.

When making a backup run and hitting the files cache,
we check that all chunks are still available on storage.

The backup version number of a chunk is stored, so we can know without downloading the chunk
with what backup version it was written.
This might be useful when increasing the backup version and changing the chunk format in the future.

## Remote Files

@ -325,11 +343,11 @@ The filename is the timeStart timestamp.
### Chunks

The encrypted payload of chunks is just the chunk data itself.
We suggest that file-system based storage plugins store chunks in one of 256 sub-folders
representing the first byte of the chunk ID encoded as a hex string.
The file name is the chunk ID encoded as a (lower-case) hex string.
This is similar to how git stores its repository objects
and to avoid having to store all chunks in a single directory which might not scale.

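The resulting layout can be sketched as follows; `chunkPath` is a hypothetical helper illustrating the suggestion for file-system based plugins:

```kotlin
// Illustrative sketch: the first hex-encoded byte of the chunk ID
// selects one of 256 sub-folders, the full hex ID is the file name.
fun chunkPath(chunkId: String): String {
    require(chunkId.length == 64) { "chunk ID must be a 32-byte hex string" }
    return "${chunkId.substring(0, 2)}/$chunkId"
}
```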
### Zip chunks

@ -353,11 +371,13 @@ When creating the next backup, if none of the small files have changed,
we just increase the ref count on the existing chunk.
If some of them have changed, they will be added to a new zip chunk
together with other new/changed small files.
Hanging on to the old file inside the still referenced zip chunk longer than necessary
should be ok as these files are small.

When fetching a chunk for restore, we know in advance whether it is a zip chunk,
because the file we need it for contains the zip index,
so we will not confuse it with a medium-sized zip file.
Then we unzip the zip chunk and extract the file by its zip index.

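Zip chunk creation and index-based extraction might look roughly like this; naming entries after their one-based zip index is an assumption made for this sketch:

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.util.zip.ZipEntry
import java.util.zip.ZipInputStream
import java.util.zip.ZipOutputStream

// Sketch: pack small files into one zip chunk, naming each entry
// after its (one-based) zip index.
fun zipChunk(files: List<ByteArray>): ByteArray {
    val out = ByteArrayOutputStream()
    ZipOutputStream(out).use { zip ->
        files.forEachIndexed { index, bytes ->
            zip.putNextEntry(ZipEntry((index + 1).toString()))
            zip.write(bytes)
            zip.closeEntry()
        }
    }
    return out.toByteArray()
}

// Sketch: on restore, extract a single file by its stored zip index.
fun extractByIndex(zipChunkBytes: ByteArray, zipIndex: Int): ByteArray? {
    ZipInputStream(ByteArrayInputStream(zipChunkBytes)).use { zip ->
        var entry = zip.nextEntry
        while (entry != null) {
            if (entry.name == zipIndex.toString()) return zip.readBytes()
            entry = zip.nextEntry
        }
    }
    return null
}
```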
# Out-of-Scope

@ -365,12 +385,12 @@ The following features would be nice to have,
but are considered out-of-scope of the current design for time and budget reasons.

* compression (we initially assume that most files are already sufficiently compressed)
* using a rolling hash to produce chunks in order to increase likelihood of obtaining same chunks
  even if file contents change slightly or shift
* external secret-less corruption checks that would use checksums over encrypted data
* supporting different backup clients backing up to the same storage
* concealing file sizes (though zip chunks helps a bit here)
* implementing different storage plugins

# Known issues

@ -386,6 +406,18 @@ to help with this issue.
However, files on external storage still don't have anything similar
and usually also don't trigger `ContentObserver` notifications.

## Android's Storage Access Framework can be unreliable

Since Seedvault already uses Android's Storage Access Framework (SAF) to store app backups,
we re-use this storage that the user has already chosen.
This way we avoid making the user choose two storage locations
and having to implement another storage backend in the first iteration.
However, the SAF can be backed by different storage providers which are not equally reliable.
Also, the API is very limited, doesn't allow for atomic operations
and doesn't give feedback if file writes completed successfully as they happen asynchronously.
The best solution will be to not (only) rely on this storage abstraction API,
but at least offer different storage plugins that can operate more reliably.

# Acknowledgements

The following individuals have reviewed this document and provided helpful feedback.

@ -80,7 +80,7 @@ internal class Backup(
        zipChunker = ZipChunker(mac, chunkWriter),
    )

    @Throws(IOException::class, GeneralSecurityException::class)
    @OptIn(ExperimentalTime::class)
    suspend fun runBackup(backupObserver: BackupObserver?) {
        backupObserver?.onStartScanning()

@ -49,7 +49,7 @@ internal class Pruner(
        backupObserver?.onPruneComplete(duration.toLongMilliseconds())
    }

    @Throws(IOException::class, GeneralSecurityException::class)
    private suspend fun pruneSnapshot(timestamp: Long, backupObserver: BackupObserver?) {
        val snapshot = snapshotRetriever.getSnapshot(streamKey, timestamp)
        val chunks = HashSet<String>()
@ -60,6 +60,8 @@ internal class Pruner(
            chunksCache.decrementRefCount(it)
        }
        var size = 0L
        // TODO add integration test for a failed backup that later resumes with unreferenced chunks
        //  and here only deletes those that are still unreferenced afterwards
        val cachedChunksToDelete = chunksCache.getUnreferencedChunks()
        val chunkIdsToDelete = cachedChunksToDelete.map {
            if (it.refCount < 0) Log.w(TAG, "${it.id} has ref count ${it.refCount}")

@ -38,6 +38,8 @@ internal abstract class AbstractChunkRestore(
        tag: String,
        streamWriter: suspend (outputStream: OutputStream) -> Long,
    ) {
        // TODO check if the file exists already (same name, size, chunk IDs)
        //  and skip it in this case
        fileRestore.restoreFile(file, observer, tag, streamWriter)
    }