Supported libgetar Backends
Zip
The zip backend stores data in zip64 archives (see Zip vs Zip64), optionally compressed using the deflate algorithm. The zip format consists of a series of “local headers”, each followed by its file content, with a central directory at the very end of the file that lists the locations of all files in the archive to allow efficient random access. Because the central directory comes last, a forcefully killed process can leave behind a zip file without a central index; see Zip Central Directories.
Performance-wise, the zip format is reasonably fast at reading, writing, and opening files. Its main drawback is its reliance on the central directory being present.
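The dependence on the central directory can be illustrated with a minimal sketch that uses only Python's standard zipfile module, not libgetar itself. The record names and the helpers make_zip and can_open are hypothetical. The sketch truncates an archive just before its central directory to mimic a process killed mid-write; it assumes a small archive with no comment, so that the end-of-central-directory record is exactly the last 22 bytes.

```python
import io
import struct
import zipfile


def make_zip(records):
    """Return the bytes of a deflate-compressed zip archive holding `records`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, payload in records.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def can_open(data):
    """Report whether zipfile can locate a central directory in `data`."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            zf.namelist()
        return True
    except zipfile.BadZipFile:
        return False


# Hypothetical record names and contents, for illustration only.
records = {"record_%d.txt" % i: (b"payload %d " % i) * 20 for i in range(3)}
complete = make_zip(records)

# For a small archive without a comment, the end-of-central-directory record
# is the last 22 bytes; bytes 16..20 of it hold the central directory offset.
(cd_offset,) = struct.unpack("<L", complete[-6:-2])

# Simulate a process that was killed before the central directory was written.
truncated = complete[:cd_offset]

print("complete archive opens: ", can_open(complete))
print("truncated archive opens:", can_open(truncated))
# The local headers (signature PK\x03\x04) are still present in the file.
print("local headers remaining:", truncated.count(b"PK\x03\x04"))
```

The truncated file still contains every local header and its data, which is why such archives can in principle be repaired by rescanning them, but a standard reader refuses to open them as they are.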
Tar
The tar backend stores data in the standard tar format, currently with no compression option. The tar format stores a file header just before the data of each file, and the standard format has no global index. Libgetar therefore builds a global index when it opens a tar file by scanning through the entire archive, file by file. Tar files should be robust to process death; in the worst case, only part of the data of a file is written.
The tar format involves the least overhead of any libgetar backend, so it is fast to read and write. However, building the index quickly becomes time-consuming for large archives with many files stored inside, causing file opens to be slow.
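The cost of opening a tar archive comes from this header-by-header scan. The following is a rough sketch of the idea using Python's standard tarfile module; it is not libgetar's implementation, and make_tar, build_index, and the record names are hypothetical. It builds an index that maps each name to the offset and size of its data, then uses that index for a direct read.

```python
import io
import tarfile


def make_tar(records):
    """Return the bytes of an uncompressed tar archive holding `records`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name, payload in records.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def build_index(data):
    """Visit every header once and record where each member's data lives."""
    index = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tf:
        for member in tf:  # one step per stored file, so cost grows with file count
            index[member.name] = (member.offset_data, member.size)
    return index


# Hypothetical record names and contents, for illustration only.
records = {"record_%d.txt" % i: (b"payload %d " % i) * 20 for i in range(3)}
archive = make_tar(records)
index = build_index(archive)

# With the index in hand, a read is a single slice at a known offset.
offset, size = index["record_1.txt"]
print(archive[offset:offset + size] == records["record_1.txt"])
```

Every header has to be visited before the index is complete, so the work done at open time grows with the number of files in the archive, while reads afterwards are cheap.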
Sqlite
The sqlite backend stores data in an sqlite database. Currently, each write is implemented as its own transaction, which makes writing slow for large numbers of records (see the sqlite FAQ). Data are stored either uncompressed or compressed with LZ4 or LZ4HC. Unfortunately, storing data in sqlite breaks the ability to use common archive tools to inspect and manipulate stored data, so sqlite archives are less portable outside of libgetar. Because transactions are atomic, sqlite databases are robust to process death.
The sqlite backend should be expected to have moderately fast open speeds, slow write speeds (for large numbers of independent writes; use a C++ BulkWriter object to write multiple records within a single transaction), and fast read speeds.
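The difference between one transaction per write and one transaction for many writes can be sketched with Python's standard sqlite3 module. This is only an illustration of the transaction pattern: the table layout and the helpers open_db, write_individually, write_bulk, and count_rows are hypothetical and do not reflect libgetar's actual schema or its BulkWriter API.

```python
import sqlite3


def open_db():
    """Open an in-memory database in autocommit mode with a hypothetical table."""
    # isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE records (path TEXT PRIMARY KEY, contents BLOB)")
    return conn


def write_individually(conn, records):
    """One transaction per record: each COMMIT must reach storage on its own."""
    for path, contents in records:
        conn.execute("BEGIN")
        conn.execute("INSERT INTO records VALUES (?, ?)", (path, contents))
        conn.execute("COMMIT")


def write_bulk(conn, records):
    """All records in one transaction: a single COMMIT, applied atomically."""
    conn.execute("BEGIN")
    try:
        conn.executemany("INSERT INTO records VALUES (?, ?)", records)
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # nothing from the failed batch is kept
        raise
    conn.execute("COMMIT")


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]


# Hypothetical record names and contents, for illustration only.
records = [("record_%d" % i, b"payload") for i in range(100)]

conn_a = open_db()
write_individually(conn_a, records)

conn_b = open_db()
write_bulk(conn_b, records)

print(count_rows(conn_a), count_rows(conn_b))
```

Both approaches store the same rows. With an on-disk database, the per-record version pays the commit cost once per record, whereas the bulk version pays it once in total, and a failure partway through a bulk write leaves the database unchanged.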
Directory
The experimental directory backend stores data directly on the filesystem. Currently, data are only stored uncompressed. Because each file access occurs in the filesystem, this backend is extremely robust to process death.
Backend Summary
In summary:
Zip
Pros
Reasonably fast at everything
“Good” compression ratio
Cons
Weak to process death
Tar
Pros
Fast reads and writes
Resilient
Cons
Slow to open with many files in an archive
No compression
Sqlite
Pros
Fast for reading and opening
Resilient
Fast but less-powerful compression (LZ4)
Cons
No standard archive-type tools
Slow for many individual writes (use BulkWriter for bulk writes)
Directory
Pros
Native writing speed
Extremely resilient
Cons
No compression
Could stress filesystem with many entries