Libgetar Python Module: gtar

Usage

There are currently two main objects to work with in libgetar: gtar.GTAR archive wrappers and gtar.Record objects.

GTAR Objects

These wrap reading from and writing to archive files and provide some minor serialization and deserialization abilities.

class gtar.GTAR

Python wrapper for the GTAR C++ class. Provides basic access to its methods, plus simple methods to read and write files within archives.

The backend is selected automatically from the suffix of the given path:

  • ‘.tar’: a tar-format archive will be created

  • ‘.sqlite’: a sqlite-format archive will be created

  • ‘/’: a directory structure (filesystem) “archive” will be created

  • anything else: a zip-format archive will be created
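The suffix rule can be pictured as a small pure-Python function. This is only an illustration of the rule as described here, not the library's own code; the function name guess_backend and the returned labels are made up for the example.

```python
def guess_backend(path):
    """Return a label for the backend implied by the path suffix.

    Hypothetical helper: mirrors the documented suffix rule only.
    """
    if path.endswith('.tar'):
        return 'tar'
    if path.endswith('.sqlite'):
        return 'sqlite'
    if path.endswith('/'):
        return 'directory'
    # Every other name falls back to the zip backend
    return 'zip'
```

For instance, guess_backend('dump.sqlite') gives 'sqlite', while a name with no recognized suffix such as 'dump.getar' falls through to 'zip'.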

The open mode controls how the file will be opened.

  • read: The file will be opened in read-only mode

  • write: A new file will be opened for writing, potentially overwriting an existing file of the same name

  • append: A file will be opened for writing, adding to the end of a file if it already exists with the same name

Parameters
  • path – Path to the file to open

  • mode – Open mode: one of ‘r’, ‘w’, ‘a’

close(self)

Close the file this object is writing to. It is safe to close a file multiple times, but impossible to read from or write to it after closing.

framesWithRecordsNamed(self, names, group=None, group_prefix=None)

Returns ([record(val) for val in names], [frames]) given a set of record names in names. If given only a single name, returns (record, [frames]).

Parameters
  • names – Iterable object yielding a set of property names

  • group – Exact group name to select (default: do not filter by group); overrules group_prefix

  • group_prefix – Prefix of group name to select (default: do not filter by group)

getBulkWriter(self)

Get a gtar.BulkWriter context object. These allow for more efficient writes when writing many records at once.
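A minimal sketch of how a bulk writer might be used for many small text records follows. The helper name write_strings is hypothetical; it only assumes that traj is a writable gtar.GTAR archive, that getBulkWriter() returns a context manager, and that the writer has the writeStr(path, contents) method documented for gtar.BulkWriter.

```python
def write_strings(traj, strings):
    """Write every path -> text pair in strings through one bulk writer.

    traj is assumed to be a gtar.GTAR archive opened for writing.
    """
    with traj.getBulkWriter() as writer:
        for path, text in strings.items():
            writer.writeStr(path, text)
```

Grouping the writes inside a single with block keeps all of them within one bulk writer context instead of opening a new one per record.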

getRecord(self, Record query, index='')

Returns the contents of the given base record and index.

Parameters
  • query (gtar.Record) – Prototypical gtar.Record object describing the record to fetch

  • index (string) – Index used to fetch the record (defaults to index embedded in query)

Note

If an index is passed into this function, it takes precedence over the index embedded in the given record.

getRecordTypes(self, group=None, group_prefix=None)

Returns a python list of all the record types (without index information) available in this archive. Optionally filters results down to records found with a particular group name, if requested.

Parameters
  • group – Exact group name to select (default: do not filter by group); overrules group_prefix

  • group_prefix – Prefix of group name to select (default: do not filter by group)

queryFrames(self, Record target)

Returns a python list of all indices associated with a given record available in this archive

Parameters
  • target – Prototypical gtar.Record object (the index of which is unused)

readBytes(self, path)

Read the contents of the given location within the archive, or return None if not found.

Parameters
  • path – Path within the archive to read

readPath(self, path)

Reads the contents of a record at the given path. Returns None if not found. If an array is found and the property is present in gtar.widths, reshape into an Nxwidths[prop] array.

Parameters
  • path – Path within the archive to read
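The reshaping rule used by readPath() can be pictured with plain Python lists. This is a sketch only: EXAMPLE_WIDTHS is a hypothetical stand-in for gtar.widths, reshape_flat is a made-up name, and the real method works with numpy arrays rather than lists.

```python
# Hypothetical stand-in for gtar.widths: property name -> values per row
EXAMPLE_WIDTHS = {'position': 3, 'orientation': 4}


def reshape_flat(name, flat, widths=EXAMPLE_WIDTHS):
    """Group a flat sequence into rows of widths[name] values.

    Properties without a known width are returned as a flat list.
    """
    width = widths.get(name)
    if width is None:
        return list(flat)
    if len(flat) % width != 0:
        raise ValueError('length %d is not a multiple of %d' % (len(flat), width))
    return [list(flat[i:i + width]) for i in range(0, len(flat), width)]
```

With these assumed widths, six position values become two rows of three, while a property that is not in the table (for example a diameter) stays one-dimensional.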

readStr(self, path)

Read the contents of the given path as a string or return None if not found.

Parameters
  • path – Path within the archive to read

recordsNamed(self, names, group=None, group_prefix=None)

Returns (frame, [val[frame] for val in names]) for each frame which contains records matching each of the given names. If only given a single name, returns (frame, val[frame]) for each found frame. If a property is present in gtar.widths, returns it as an Nxwidths[prop] array.

Parameters
  • names – Iterable object yielding a set of property names

  • group – Exact group name to select (default: do not filter by group); overrules group_prefix

  • group_prefix – Prefix of group name to select (default: do not filter by group)

Example:

g = gtar.GTAR('dump.zip', 'r')

# grab single property
for (_, vel) in g.recordsNamed('velocity'):
    pass

# grab multiple properties
for (idx, (pos, quat)) in g.recordsNamed(['position', 'orientation']):
    pass

staticRecordNamed(self, name, group=None, group_prefix=None)

Returns a static record with the given name. If the property is found in gtar.widths, returns it as an Nxwidths[prop] array. Optionally restricts the search to records with the given group name or group name prefix.

Parameters
  • name – Name of the property to find

  • group – Exact group name to select (default: do not filter by group); overrules group_prefix

  • group_prefix – Prefix of group name to select (default: do not filter by group)

writeArray(self, path, arr, mode=cpp.FastCompress, dtype=None)

Write the given numpy array to the location within the archive, using the given compression mode. This serializes the data into the given binary data type or the same binary format that the numpy array is using.

Parameters
  • path – Path within the archive to write

  • arr – Array-like object

  • mode – Optional compression mode (defaults to fast compression)

  • dtype – Optional numpy dtype to force conversion to

Example:

# traj is a gtar.GTAR archive opened for writing
traj.writeArray('diameter.f32.ind', numpy.ones((N,)))

writeBytes(self, path, contents, mode=cpp.FastCompress)

Write the given contents to the location within the archive, using the given compression mode.

Parameters
  • path – Path within the archive to write

  • contents – Bytestring to write

  • mode – Optional compression mode (defaults to fast compression)

writePath(self, path, contents, mode=cpp.FastCompress)

Writes the given contents to the given path, converting as necessary.

Parameters
  • path – Path within the archive to write

  • contents – Object which can be converted into array or string form, based on the given path

  • mode – Optional compression mode (defaults to fast compression)

writeRecord(self, Record rec, contents, mode=cpp.FastCompress)

Writes the given contents to the path specified by the given record.

Parameters
  • rec – gtar.Record object specifying the record

  • contents – [byte]string or array-like object to write

  • mode – Optional compression mode (defaults to fast compression)

writeStr(self, path, contents, mode=cpp.FastCompress)

Write the given string to the given path, optionally compressing with the given mode.

Parameters
  • path – Path within the archive to write

  • contents – String to write

  • mode – Optional compression mode (defaults to fast compression)

Example:

# traj is a gtar.GTAR archive opened for writing
traj.writeStr('params.json', json.dumps(params))

When writing many small records at once, a gtar.BulkWriter object can be used.

class gtar.BulkWriter

Class for efficiently writing multiple records at a time. Works as a context manager.

Parameters
  • arch – gtar.GTAR archive object to write within

Example:

with gtar.GTAR('traj.sqlite', 'w') as traj, traj.getBulkWriter() as writer:
    writer.writeStr('notes.txt', 'example text')

writeArray(self, path, arr, mode=cpp.FastCompress, dtype=None)

Write the given numpy array to the location within the archive, using the given compression mode. This serializes the data into the given binary data type or the same binary format that the numpy array is using.

Parameters
  • path – Path within the archive to write

  • arr – Array-like object

  • mode – Optional compression mode (defaults to fast compression)

  • dtype – Optional numpy dtype to force conversion to

Example:

writer.writeArray('diameter.f32.ind', numpy.ones((N,)))

writeBytes(self, path, contents, mode=cpp.FastCompress)

Write the given contents to the location within the archive, using the given compression mode.

Parameters
  • path – Path within the archive to write

  • contents – Bytestring to write

  • mode – Optional compression mode (defaults to fast compression)

writePath(self, path, contents, mode=cpp.FastCompress)

Writes the given contents to the given path, converting as necessary.

Parameters
  • path – Path within the archive to write

  • contents – Object which can be converted into array or string form, based on the given path

  • mode – Optional compression mode (defaults to fast compression)

writeRecord(self, Record rec, contents, mode=cpp.FastCompress)

Writes the given contents to the path specified by the given record.

Parameters
  • rec – gtar.Record object specifying the record

  • contents – [byte]string or array-like object to write

  • mode – Optional compression mode (defaults to fast compression)

writeStr(self, path, contents, mode=cpp.FastCompress)

Write the given string to the given path, optionally compressing with the given mode.

Parameters
  • path – Path within the archive to write

  • contents – String to write

  • mode – Optional compression mode (defaults to fast compression)

Example:

writer.writeStr('params.json', json.dumps(params))

Creation

# Open a trajectory archive for reading
traj = gtar.GTAR('dump.zip', 'r')
# Open a trajectory archive for writing, overwriting any dump.zip
# in the current directory
traj = gtar.GTAR('dump.zip', 'w')
# Open a trajectory archive for appending, if you want to add
# to the file without overwriting
traj = gtar.GTAR('dump.zip', 'a')

Note that, due to a limitation in the miniz library that libgetar uses, you currently can’t append to a zip file that does not use the zip64 format. This includes most files generated by python’s zipfile module, which only writes zip64 when file size or count constraints require it and does not appear to offer an obvious way to force it. See Zip vs Zip64 below for solutions.
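To check which kind of zip file you have before trying to append, one rough option is to look for the zip64 end-of-central-directory locator, which the zip format places immediately before the regular end-of-central-directory record. The sketch below uses only the standard library. The helper name has_zip64_locator is made up for this example, and whether this is exactly the condition miniz tests for is an assumption.

```python
import io
import zipfile

EOCD_SIG = b'PK\x05\x06'           # end-of-central-directory record
ZIP64_LOCATOR_SIG = b'PK\x06\x07'  # zip64 end-of-central-directory locator
LOCATOR_SIZE = 20                  # the locator is a fixed 20 bytes


def has_zip64_locator(data):
    """Return True if the zip bytes carry a zip64 locator before the EOCD."""
    # The EOCD record is 22 bytes plus a comment of at most 65535 bytes
    tail = data[-(22 + 65535 + LOCATOR_SIZE):]
    pos = tail.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError('no end-of-central-directory record found')
    start = pos - LOCATOR_SIZE
    return start >= 0 and tail[start:start + 4] == ZIP64_LOCATOR_SIG


# A small in-memory archive written by zipfile, checked with the helper
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as archive:
    archive.writestr('notes.txt', 'example text')
small_is_zip64 = has_zip64_locator(buf.getvalue())
```

For a file on disk, the same function can be applied to the bytes read with open(path, 'rb'). Searching for the signature from the end is a simplification that could be fooled by an archive comment containing the same bytes.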

Simple API

If you know the path you want to read from or store to, you can use GTAR.readPath() and GTAR.writePath():

with gtar.GTAR('read.zip', 'r') as input_traj:
    props = input_traj.readPath('props.json')
    diameters = input_traj.readPath('diameter.f32.ind')

with gtar.GTAR('write.zip', 'w') as output_traj:
    output_traj.writePath('oldProps.json', props)
    output_traj.writePath('mass.f32.ind', numpy.ones_like(diameters))

If you just want to read or write a string or bytestring, there are methods GTAR.readStr(), GTAR.writeStr(), GTAR.readBytes(), and GTAR.writeBytes().

If you want to grab static properties by their name, there is GTAR.staticRecordNamed():

diameters = traj.staticRecordNamed('diameter')

There are two methods that can be used to quickly get per-frame data for time-varying quantities:

  1. GTAR.framesWithRecordsNamed() is useful for “lazy” reading, because it returns the records and frame numbers which can be processed separately before actually reading data. This is especially helpful for retrieving every 100th frame of a file, for example. This is usually the most efficient way to retrieve data.

(velocityRecord, frames) = traj.framesWithRecordsNamed('velocity')
for frame in frames:
    velocity = traj.getRecord(velocityRecord, frame)
    kinetic_energy += 0.5*mass*numpy.sum(velocity**2)

((boxRecord, positionRecord), frames) = traj.framesWithRecordsNamed(['box', 'position'])
good_frames = filter(lambda x: int(x) % 100 == 0, frames)
for frame in good_frames:
    box = traj.getRecord(boxRecord, frame)
    position = traj.getRecord(positionRecord, frame)
    fbox = freud.box.Box(*box)
    rdf.compute(fbox, position, position)
    matplotlib.pyplot.plot(rdf.getR(), rdf.getRDF())

  2. GTAR.recordsNamed() is useful for iterating over all frames in the archive. It reads and returns the content of the records it finds.

for (frame, vel) in traj.recordsNamed('velocity'):
    kinetic_energy += 0.5*mass*numpy.sum(vel**2)

for (frame, (box, position)) in traj.recordsNamed(['box', 'position']):
    fbox = freud.box.Box(*box)
    rdf.compute(fbox, position, position)
    matplotlib.pyplot.plot(rdf.getR(), rdf.getRDF())
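The lazy pattern from the first item can be wrapped in a small generator so that only the selected frames are ever read. This is a sketch: every_nth_frame is a hypothetical helper name, traj is assumed to be an open gtar.GTAR archive, and frame indices are assumed to be strings that parse as integers, as in the filtering example above.

```python
def every_nth_frame(traj, name, stride=100):
    """Yield (frame, data) for frames whose integer index is a multiple of stride.

    Data is fetched with getRecord() only for the frames that pass the filter.
    """
    record, frames = traj.framesWithRecordsNamed(name)
    for frame in frames:
        if int(frame) % stride == 0:
            yield frame, traj.getRecord(record, frame)
```

Usage might look like for (frame, velocity) in every_nth_frame(traj, 'velocity', 100): followed by whatever per-frame analysis is needed.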

Advanced API

The more complicated API can be used if you have multiple properties with the same name (for example, a set of low-precision trajectories for visualization and a less frequent set of dumps in double precision for restart files).

Finding Available Records

A list of record types (records with blank indices) can be obtained by the following:

traj.getRecordTypes()

This can be filtered further in something like:

positionRecord = [rec for rec in traj.getRecordTypes() if rec.getName() == 'position'][0]
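Indexing the filtered list with [0] raises an IndexError when no record matches. A slightly more defensive variant is sketched below; find_record_type is a hypothetical helper name, and it relies only on the getRecordTypes() and getName() methods documented here.

```python
def find_record_type(traj, name, group=None):
    """Return the first record type called name, or None if there is none.

    traj is assumed to be an open gtar.GTAR archive; group is passed
    through to getRecordTypes() to restrict the search.
    """
    for rec in traj.getRecordTypes(group=group):
        if rec.getName() == name:
            return rec
    return None
```

The caller can then test the result against None before passing it on to queryFrames() or getRecord().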

The list of frames associated with a given record can be accessed as:

frames = traj.queryFrames(rec)

Reading Binary Data

To read binary data (in the form of numpy arrays), use the following method:

traj.getRecord(query, index="")

This takes a gtar.Record object specifying the path and an optional index. Note that the index field of the record is nullified in favor of the index passed into the method itself; usage might look something like the following:

positionRecord = [rec for rec in traj.getRecordTypes() if rec.getName() == 'position'][0]
positionFrames = traj.queryFrames(positionRecord)
positions = [traj.getRecord(positionRecord, frame) for frame in positionFrames]

Record Objects

These objects are how you discover what is inside an archive and fetch or store data. Records consist of several fields defining where in the archive the data are stored, what type the data are, and so forth. Probably the most straightforward way to construct one of these yourself is to let the Record constructor itself parse a path within an archive:

rec = gtar.Record('frames/0/position.f32.ind')

class gtar.Record

Python wrapper for the C++ Record class. Provides basic access to Record methods. Initializes in different ways depending on the number of given parameters.

  • No arguments: default constructor

  • 1 argument: Parse the given path

  • 6 arguments: Fill each field of the Record object (group, name, index, behavior, format, resolution)
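To give a feel for what parsing a path involves, here is a toy splitter for the one path shape used in the example above, 'frames/0/position.f32.ind'. It is not the Record parser: split_frame_path is a made-up name, it handles only this three-part layout, and reading the suffixes as format and resolution is an assumption based on that example path.

```python
def split_frame_path(path):
    """Split 'frames/<index>/<name>.<format>.<resolution>' into its parts.

    Toy illustration only; any other path shape raises ValueError.
    """
    parts = path.split('/')
    if len(parts) != 3 or parts[0] != 'frames':
        raise ValueError('unsupported path shape: %r' % (path,))
    _, index, filename = parts
    pieces = filename.rsplit('.', 2)
    if len(pieces) != 3:
        raise ValueError('expected name.format.resolution in %r' % (filename,))
    name, fmt, resolution = pieces
    return {'index': index, 'name': name, 'format': fmt, 'resolution': resolution}
```

In real code, letting gtar.Record parse the path and then calling getIndex(), getName(), getFormat(), and getResolution() is the documented route.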

getBehavior(self)

Returns the behavior field of this object

getFormat(self)

Returns the format field of this object

getGroup(self)

Returns the group field of this object

getIndex(self)

Returns the index field for this object

getName(self)

Returns the name field of this object

getPath(self)

Generates the path of the file inside the archive for this object

getResolution(self)

Returns the resolution field for this object

nullifyIndex(self)

Nullify the index field of this object

setIndex(self, index)

Sets the index field of this object

Tools

gtar.fix

Fix a getar-formatted zip file.

usage: python -m gtar.fix [-h] [-o OUTPUT] input

Command-line zip archive fixer

positional arguments:
  input                 Input zip file to read

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location for fixed zip archive

gtar.cat

Take records from multiple getar-formatted files and place them into an output file. In case of name conflicts, records from the last input file take precedence.

usage: cat.py [-h] [-o OUTPUT] ...

Command-line archive concatenation

positional arguments:
  inputs                Input files to read

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        File to write to
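The name-conflict rule for gtar.cat (records from the last input file win) behaves like successive dictionary updates. The sketch below models each archive as a plain mapping from path to contents; merge_records is a hypothetical name and this is not how the tool itself is implemented.

```python
def merge_records(*archives):
    """Merge path -> contents mappings; later mappings override earlier ones."""
    merged = {}
    for records in archives:
        merged.update(records)
    return merged
```

Merging {'a.txt': 1, 'b.txt': 2} with {'b.txt': 3} therefore keeps a.txt from the first mapping and takes b.txt from the second.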

gtar.copy

Copy each record from one getar-formatted file to another.

usage: python -m gtar.copy [-h] input output

Command-line archive copier or translator

positional arguments:
  input           Input file to read
  output          File to write to

optional arguments:
  -h, --help      show this help message and exit

gtar.read

Create an interactive python shell with the given files opened for reading.

usage: read.py [-h] ...

Interactive getar-format archive shell

positional arguments:
  inputs      Input files to open

optional arguments:
  -h, --help  show this help message and exit

Enums: OpenMode, CompressMode, Behavior, Format, Resolution

class gtar.OpenMode

Enum for ways in which an archive file can be opened

Read
Write
Append

class gtar.CompressMode

Enum for ways in which files within an archive can be compressed

NoCompress
FastCompress
MediumCompress
SlowCompress

class gtar.Behavior

Enum for how properties can behave over time

Constant
Discrete
Continuous

class gtar.Format

Formats in which binary properties can be stored

Float32
Float64
Int32
Int64
UInt8
UInt32
UInt64

class gtar.Resolution

Resolution at which properties can be recorded

Text
Uniform
Individual