Backup

As explained in the Projects configuration and Settings section, jobflow-remote uses a MongoDB database to store information about the state of Jobs and Flows. This database is defined in the queue section of the project configuration. In several circumstances it may be necessary to back up this database. For this reason, jobflow-remote offers an option to create a dump of the relevant collections of a project and to restore it if needed.

Warning

This functionality does not create a backup of the JobStore containing the outputs of the workflows. Since the output store can be any kind of Store, and the results may be split across the additional_stores, any backup of the outputs has to be performed through the JobStore or directly with the underlying storage system.

There are two options to create and restore a backup. The default relies on the official MongoDB tools, mongodump and mongorestore, so the MongoDB Database Tools need to be installed. The connection details provided in the project configuration will be used to execute the commands. This is the preferred option, since it is faster and also dumps and restores all the metadata of the collections. However, not all the connection options defined in the queue Store may be supported, or it may not be possible to install the tools. For this reason, a second option based on a pure Python implementation is also available. It can be activated by selecting the --python option from the CLI.
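As a rough way to tell in advance which of the two options applies, one can check whether the MongoDB executables are on the PATH. The following is a minimal sketch using only the Python standard library; the helper names (`mongo_tools_available`, `backup_mode`) are hypothetical and are not part of jobflow-remote, which may detect the tools differently.

```python
import shutil

# Executables provided by the MongoDB Database Tools package.
MONGO_TOOLS = ("mongodump", "mongorestore")


def mongo_tools_available(tools=MONGO_TOOLS):
    """Return True only if every executable in `tools` is found on PATH."""
    return all(shutil.which(tool) is not None for tool in tools)


def backup_mode(tools=MONGO_TOOLS):
    """Suggest the default tool-based mode, or the pure Python fallback."""
    return "mongodb-tools" if mongo_tools_available(tools) else "python"


if __name__ == "__main__":
    if backup_mode() == "python":
        print("MongoDB tools not found: add the --python option to jf backup")
    else:
        print("MongoDB tools found: jf backup can use mongodump/mongorestore")
```

Finding the executables does not guarantee that every connection option of the queue Store is supported, so the --python option may still be needed in some setups.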

Warning

The Python version of the backup and restore does not preserve the metadata of the collections. After restoring a backup with this option, it is advisable to regenerate the standard indexes with the jf admin index rebuild command.

Note

It is of course possible to create a backup manually using the MongoDB tools. This jobflow-remote feature is meant to ease the procedure by automatically selecting the appropriate collections to back up.
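For reference, a manual backup along these lines could be prepared with the following sketch, which only builds one mongodump command per collection (mongodump dumps a single collection per --collection option). The helper `mongodump_commands`, the host, port and database values, and the assumption that the collections carry the standard names jobs, flows and jf_auxiliary are all illustrative; authentication options such as --username, --password and --authenticationDatabase are omitted for brevity. If the project configuration uses different collection names, those names must be passed instead, and mongodump will then name the files after them.

```python
import shlex

# Assumed collection names: replace them with the ones set in the project
# configuration if they differ from the standard ones.
DEFAULT_COLLECTIONS = ("jobs", "flows", "jf_auxiliary")


def mongodump_commands(host, port, database, out_dir,
                       collections=DEFAULT_COLLECTIONS, gzip=False):
    """Build one mongodump argument list per collection (hypothetical helper)."""
    commands = []
    for name in collections:
        cmd = [
            "mongodump",
            f"--host={host}",
            f"--port={port}",
            f"--db={database}",
            f"--collection={name}",
            f"--out={out_dir}",  # files go to <out_dir>/<database>/
        ]
        if gzip:
            cmd.append("--gzip")  # compresses each output file (.gz suffix)
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them; once the values are
    # adjusted they could be executed with subprocess.run(cmd, check=True).
    for cmd in mongodump_commands("localhost", 27017, "jobflow_remote", "backup"):
        print(shlex.join(cmd))
```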

Create a backup

A backup can be created with the command:

jf backup create

As already mentioned, this will use the mongodump executable, unless the --python option is specified. It is possible to specify the destination path of the backup; the output folder will contain the jobs.bson, flows.bson and jf_auxiliary.bson files. If mongodump is used, the folder will also contain the metadata files for each collection. The backup files can also be gzipped by adding the --compress option.
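To sanity-check a backup, for example to see how many jobs and flows it contains, one can exploit the fact that a .bson dump file is a plain concatenation of BSON documents, each prefixed by its total length as a little-endian 32-bit integer. The sketch below counts the documents in a plain or gzipped file using only the standard library. It is not a jobflow-remote function, and it assumes that compressed files use the .gz extension, as mongodump does with --gzip. Decoding the documents themselves would require a BSON library, such as the bson package distributed with pymongo.

```python
import gzip
import struct
from pathlib import Path


def iter_bson_documents(path):
    """Yield raw BSON documents from a .bson or .bson.gz dump file."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as fh:
        while True:
            header = fh.read(4)
            if not header:
                return  # clean end of file
            if len(header) < 4:
                raise ValueError(f"truncated document header in {path}")
            # The length prefix counts the whole document, header included.
            (size,) = struct.unpack("<i", header)
            if size < 5:  # the smallest valid BSON document is 5 bytes
                raise ValueError(f"invalid document size {size} in {path}")
            body = fh.read(size - 4)
            if len(body) != size - 4:
                raise ValueError(f"truncated document in {path}")
            yield header + body


def count_bson_documents(path):
    """Number of documents stored in a dump file."""
    return sum(1 for _ in iter_bson_documents(path))
```

For example, `count_bson_documents("backup/my_db/jobs.bson.gz")` (a hypothetical path) would return the number of job documents in a compressed backup.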

Note

The folder creation follows the convention of the mongodump executable: inside the folder specified in the create command there will be a subfolder named after the database.
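The resulting layout can be sketched as follows, assuming a destination folder `backups`, a database named `jobflow_remote`, and the .bson.gz extension for compressed files (as produced by mongodump with --gzip). The helper `expected_backup_files` is hypothetical and lists only the data files, not the metadata files written by mongodump.

```python
from pathlib import Path

# Standard file names written by the backup, whatever the collection names.
STANDARD_FILES = ("jobs", "flows", "jf_auxiliary")


def expected_backup_files(destination, database, compress=False):
    """Paths of the dump files under <destination>/<database>/.

    Assumes compressed files use the .bson.gz extension.
    """
    suffix = ".bson.gz" if compress else ".bson"
    database_dir = Path(destination) / database
    return [database_dir / f"{name}{suffix}" for name in STANDARD_FILES]


if __name__ == "__main__":
    for path in expected_backup_files("backups", "jobflow_remote", compress=True):
        print(path)
```

Under these assumptions, the folder that directly contains the bson files is the database subfolder, e.g. backups/jobflow_remote, rather than the destination folder itself.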

Note

The file names will always be the standard ones, even if the collection names defined in the project configuration file are different.

Restore a backup

To restore a backup the following command can be used:

jf backup restore /path/to/backup/folder

The path should point to the folder containing the bson files generated when the backup was created. The code will automatically determine from the file extensions whether the files are compressed.
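The extension-based detection can be pictured with the following sketch. It is a hypothetical helper, not the jobflow-remote implementation: it assumes the standard file names and that compressed files end in .bson.gz, and it rejects folders that mix compressed and uncompressed files, a choice made here only for illustration.

```python
from pathlib import Path

STANDARD_FILES = ("jobs", "flows", "jf_auxiliary")


def detect_compression(folder):
    """Return True if the dump files are gzipped, False if they are plain .bson.

    Raises FileNotFoundError if a file is missing and ValueError if the
    folder mixes compressed and uncompressed files.
    """
    folder = Path(folder)
    flags = set()
    for name in STANDARD_FILES:
        # If both variants exist for a file, the compressed one wins.
        if (folder / f"{name}.bson.gz").is_file():
            flags.add(True)
        elif (folder / f"{name}.bson").is_file():
            flags.add(False)
        else:
            raise FileNotFoundError(f"no {name}.bson or {name}.bson.gz in {folder}")
    if len(flags) > 1:
        raise ValueError(f"mixed compressed and uncompressed files in {folder}")
    return flags.pop()
```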

Note

The names of the target collections are determined by the values defined in the project settings, not by the names of the files, nor by the names of the collections from which the backup was created.
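The rule can be illustrated with a small mapping sketch. The `CONFIGURED_COLLECTIONS` dictionary stands in for the collection names set in the project settings; its keys, its values and the `restore_targets` helper are all hypothetical and do not reflect the actual structure of the jobflow-remote configuration.

```python
# Hypothetical project settings: collection names configured in the queue
# section, possibly different from the standard backup file names.
CONFIGURED_COLLECTIONS = {
    "jobs": "my_jobs",
    "flows": "my_flows",
    "jf_auxiliary": "my_auxiliary",
}


def restore_targets(file_names, configured=CONFIGURED_COLLECTIONS):
    """Map each backup file to its target collection.

    The target comes only from the settings: the file name is used just to
    identify which standard collection the file holds.
    """
    targets = {}
    for file_name in file_names:
        kind = file_name.split(".", 1)[0]  # "jobs.bson.gz" -> "jobs"
        if kind not in configured:
            raise KeyError(f"unexpected backup file: {file_name}")
        targets[file_name] = configured[kind]
    return targets
```

With these settings, jobs.bson is restored into my_jobs regardless of the name the jobs collection had in the database that was backed up.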

Note

The backup can be restored only into an empty database. The code will raise an error if the target database already contains jobs and flows.
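jobflow-remote performs this check itself. To verify the target beforehand, a sketch along these lines could be used. `ensure_empty_target` is a hypothetical helper, and `db` is assumed to behave like a pymongo `Database`, where `db[name].count_documents({}, limit=1)` returns 0 for an empty or non-existent collection.

```python
def ensure_empty_target(db, collection_names):
    """Raise RuntimeError if any target collection already holds documents.

    `db[name]` must return an object exposing count_documents(filter, limit=...),
    as pymongo collections do.
    """
    non_empty = [
        name
        for name in collection_names
        # limit=1 stops counting at the first document found.
        if db[name].count_documents({}, limit=1) > 0
    ]
    if non_empty:
        raise RuntimeError(
            "target database is not empty; collections with data: "
            + ", ".join(non_empty)
        )
```

With pymongo, `db` could be obtained as `MongoClient(uri)[database_name]`, passing the collection names configured in the project.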