Projects configuration and Settings#

Jobflow-remote can handle multiple configurations, called projects. Since for most users a single project is enough, let us first consider the configuration of a single project. The handling of multiple projects is described in the Multiple Projects section below.

Aside from the project options, a set of General Settings can also be configured through environment variables or an additional configuration file.

Project options#

The project configuration controls the behaviour of the Job execution, as well as of the other objects in jobflow-remote. Here a full description of the project's configuration file is given. If you are looking for a minimal example with its description, you can find it in the Configuration section.

The specifications of the project's attributes are given by the Project pydantic model, which serves the purpose of parsing and validating the configuration files, as well as giving access to the associated objects (e.g. the JobStore). A graphical representation of the Project model, and thus of the options available in the configuration file, is given below (generated with erdantic).

All-in-one configuration

A description of all the types and keys of the project file is given in the Project specs section below, while an example of a full configuration file can be generated by running:

jf project generate --full YOUR_PROJECT_NAME

Note that, while the default file format is YAML, JSON and TOML are also acceptable formats. You can generate the example in the other formats using the --format option.

Name and folders#

The project name is given by the name attribute. The name will be used to create a subfolder containing:

  • files with the parsed outputs copied from the remote workers

  • logs

  • files used by the daemon

For all these folders the paths have default values, but they can be customised by setting tmp_dir, log_dir and daemon_dir.
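As an illustration, a minimal sketch of these options in a project file (all paths here are placeholders to adapt to your system):

    name: my_project
    tmp_dir: /scratch/user/jfremote_tmp
    log_dir: /home/user/.jfremote/my_project/log
    daemon_dir: /home/user/.jfremote/my_project/daemon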

Warning

The project name does not take into consideration the configuration file name. For consistency, it is better to use the project name as the file name.

Workers#

Multiple workers can be defined in a project. In the configuration file each worker is given with its name as the key and its properties in the associated dictionary.

Several defining properties should be set in the configuration of each worker. First, the type should be specified. At the moment the possible worker types are:

  • local: a worker running on the same system as the Runner. No connection is needed for the Runner to reach the queueing system.

  • remote: a worker on a different machine than the Runner, requiring an SSH connection to reach it.

Since the Runner needs to constantly interact with the workers, for the latter type all the credentials needed to connect automatically should be provided. The best option is to set up a passwordless connection and define it in the ~/.ssh/config file.

The other key property of the workers is the scheduler_type. It can be any of the values supported by qtoolkit. Typical values are:

  • shell: the Job is executed directly in the shell. No queue will be used. If not limited, all the Jobs can be executed simultaneously.

  • slurm, pbs, …: the name of a queueing system. The Job will be submitted to the queue with the selected resources.

Another mandatory argument is work_dir, indicating the full path of a folder on the worker machine where the Jobs will actually be executed.

It is possible to optionally set default values for keywords like pre_run and resources, which can be overridden for individual Jobs. Note that these configurations will be applied to all the Jobs executed by the worker. They are thus more suitable for generic settings (e.g. the activation of a python environment, or the loading of some modules) than for code-specific configurations. The latter can better be set with the Execution configurations.
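As an illustrative sketch, a remote worker definition could look like the following (the worker name, host and paths are placeholders, and the resources keys depend on the scheduler template chosen):

    workers:
      my_cluster:
        type: remote
        host: cluster.example.com
        scheduler_type: slurm
        work_dir: /scratch/user/jfremote
        pre_run: |
          module load python
          conda activate my_env
        resources:
          nodes: 1
          ntasks: 16
          time: "01:00:00"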

Note

If a single worker is defined, it will be used as the default in the submission of new Flows.

JobStore#

The jobstore value contains a dictionary representation of the standard JobStore object defined in jobflow. It can either be the serialized version, as obtained by the as_dict method, or the representation defined in jobflow's documentation.

This JobStore will be used to store the outputs of all the Jobs executed in this project.
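For reference, a minimal sketch of a jobstore section following the representation used in jobflow's documentation (database name and connection details are placeholders, assuming a MongoDB instance):

    jobstore:
      docs_store:
        type: MongoStore
        database: DB_NAME
        host: localhost
        port: 27017
        collection_name: outputs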

Note

The JobStore should be defined in jobflow-remote’s configuration file. The content of the standard jobflow configuration file will be ignored.

Queue Store#

The queue element contains the definition of the database containing the state of the Jobs and Flows. The subelement store should contain the representation of a maggma Store. As for the JobStore, it can be either its serialization or the same kind of representation used for the docs_store in jobflow's configuration file.

The collection defined by the Store will contain the information about the state of the Jobs, while two more collections will be created. The names of these two collections can also be customized.

Warning

The queue Store should be a subclass of MongoStore and currently it should be based on a real MongoDB (e.g. not a JSONStore). Some key operations required by jobflow-remote on the collections are not supported by any file-based MongoDB implementation at the moment.

Execution configurations#

It is possible to define a set of ExecutionConfig objects to quickly set up configurations for different kinds of Jobs and Flows. The exec_config key contains a dictionary where the keys are the names associated with the configurations and the values are sets of instructions to be executed before and after the execution of the Job.
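As a sketch, an exec_config section defining a single named configuration could look like the following (the name and all values are placeholders):

    exec_config:
      example_config:
        modules:
          - gcc
          - openmpi
        export:
          OMP_NUM_THREADS: 4
        pre_run: source /path/to/code/setup.sh
        post_run: echo "calculation done"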

Runner options#

The behaviour of the Runner can also be customized to some extent. In particular, the Runner implements an exponential backoff mechanism for retrying when an operation updating the state of a Job fails. The number of attempts and the delays between them can be set with the max_step_attempts and delta_retry values. In addition, reasonable default values are set for the delays between each check of the database for the different kinds of actions performed by the Runner. These intervals can be changed to better fit your needs. Keep in mind that reducing these intervals too much may put unnecessary strain on the database.
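A sketch of these options in the configuration file, assuming they are grouped under a runner key as in the Project model (the values shown coincide with the documented defaults):

    runner:
      delay_checkout: 30
      delay_check_run_status: 30
      delay_advance_status: 30
      max_step_attempts: 3
      delta_retry: [30, 300, 1200]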

Metadata#

While metadata currently does not play any role in the execution of jobflow-remote, it can be used to include additional information to be used by external tools or to quickly distinguish a configuration file among others.
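For example, a metadata section could be as simple as (the content is entirely arbitrary):

    metadata:
      description: test project for new workflows
      maintainer: John Doe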

Multiple Projects#

While a single project can be enough for most users and for beginners, it may be convenient to define different databases, configurations and python environments to work on different topics. For this reason jobflow-remote will consider as potential project configurations all the YAML, JSON and TOML files in the ~/.jfremote folder. There is no additional procedure required to add or remove a project, aside from creating/deleting a project configuration file.

Warning

Different projects are meant to use different Queue Stores. Sharing the same collections for two projects is not a supported option.

To define the Queue Store for multiple projects two options are available:

  • each project has its own database, with standard collection names

  • a single database is used and each project is assigned a set of collections. For example, a configuration for one of the projects could be:

    queue:
      store:
        type: MongoStore
        database: DB_NAME
        collection_name: jobs_project1
        ...
      flows_collection: flows_project1
      auxiliary_collection: jf_auxiliary_project1
    

    And the same for a second project with different collection names.

There is no constraint on the database and collection used for the output JobStore. Even though it may make sense to separate the sets of outputs, it is possible to share the same collection among multiple projects. In that case the output documents will have duplicated db_id values, as each project has its own counter. If this is an issue, it is possible to set different db_id_prefix values in the queue configuration of the different projects, as sketched below.
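As a sketch, one of the projects could set a prefix in its queue section as follows, with the other projects using different values (the prefix is a placeholder):

    queue:
      # store: definition as shown above
      db_id_prefix: proj1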

If more than one project is present and a specific one is not selected, the code will always stop and ask for a project to be specified. Python functions like submit_flow and get_jobstore accept a project argument to specify which project should be considered. For the command line interface, a general -p option allows selecting a project for the command being executed:

jf -p another_project job list

To define a default project for all the functions and commands executed on the system or in a specific shell session, see the General Settings section.

Project specs#

Project

Type: object

The configurations of a Project.

No Additional Properties

Name

Type: string

The name of the project

Base Dir

Default: null

The base directory containing the project related files. Default is a folder with the project name inside the projects folder

Type: string
Type: null

Tmp Dir

Default: null

Folder where remote files are copied. Defaults to a 'tmp' folder in base_dir

Type: string
Type: null

Log Dir

Default: null

Folder containing all the logs. Defaults to a 'log' folder in base_dir

Type: string
Type: null

Daemon Dir

Default: null

Folder containing daemon related files. Defaults to a 'daemon' folder in base_dir

Type: string
Type: null

Log Level

Type: enum (of string) Default: "info"

The level set for logging

Must be one of:

  • "error"
  • "warn"
  • "info"
  • "debug"

Runner

Type: object

The options for the Runner

No Additional Properties

Delay Checkout

Type: integer Default: 30

Delay between subsequent executions of the checkout from the database (seconds)

Delay Check Run Status

Type: integer Default: 30

Delay between subsequent checks of the status of the jobs submitted to the scheduler (seconds)

Delay Advance Status

Type: integer Default: 30

Delay between subsequent advancement of the job's remote state (seconds)

Delay Refresh Limited

Type: integer Default: 600

Delay between subsequent refresh from the DB of the number of submitted and running jobs (seconds). Only used if a worker with max_jobs is present

Delay Update Batch

Type: integer Default: 60

Delay between subsequent refresh from the DB of the number of submitted and running jobs (seconds). Only used if a batch worker is present

Lock Timeout

Default: 86400

Time after which the lock on a document is considered expired and can be overridden (seconds)

Type: integer
Type: null

Delete Tmp Folder

Type: boolean Default: true

Whether to delete the local temporary folder after a job has completed

Max Step Attempts

Type: integer Default: 3

Maximum number of attempts performed before failing the advancement of a remote state

Delta Retry

Type: array of integer Default: [30, 300, 1200]

List of increasing delays between subsequent attempts when the advancement of a remote step fails

No Additional Items

Each item of this array must be:

Type: integer

Workers

Type: object

A dictionary with the worker name as keys and the worker configuration as values

Each additional property must conform to the following schema


Type: object

Worker representing the local host.

Executes commands directly.

No Additional Properties

Type

Type: const Default: "local"

The discriminator field to determine the worker type

Specific value: "local"

Scheduler Type

Type: string

Type of the scheduler. Should be one of the values supported by QToolKit

Work Dir

Type: string Format: path

Absolute path of the directory of the worker where subfolders for executing the calculation will be created

Resources

Default: null

A dictionary defining the default resources requested to the scheduler. Used to fill in the QToolKit template

Pre Run

Default: null

String with commands that will be executed before the execution of the Job

Post Run

Default: null

String with commands that will be executed after the execution of the Job

Timeout Execute

Type: integer Default: 60

Timeout for the execution of the commands in the worker (e.g. submitting a job)

Max Jobs

Default: null

The maximum number of jobs that can be submitted to the queue.

Batch

Default: null

Options for batch execution. If defined, the worker will be considered a batch worker

Type: object

Configuration for execution of batch jobs.

Allows executing multiple Jobs in a single process run on the worker (e.g. a SLURM job).

Same definition as BatchConfig
Type: object

Worker representing a remote host reached through an SSH connection.

Uses a Fabric Connection. Check Fabric documentation for more details on the
options defining a Connection.

No Additional Properties

Type

Type: const Default: "remote"

The discriminator field to determine the worker type

Specific value: "remote"

Scheduler Type

Type: string

Type of the scheduler. Should be one of the values supported by QToolKit

Work Dir

Type: string Format: path

Absolute path of the directory of the worker where subfolders for executing the calculation will be created

Resources

Default: null

A dictionary defining the default resources requested to the scheduler. Used to fill in the QToolKit template

Pre Run

Default: null

String with commands that will be executed before the execution of the Job

Post Run

Default: null

String with commands that will be executed after the execution of the Job

Timeout Execute

Type: integer Default: 60

Timeout for the execution of the commands in the worker (e.g. submitting a job)

Max Jobs

Default: null

The maximum number of jobs that can be submitted to the queue.

Batch

Default: null

Options for batch execution. If defined, the worker will be considered a batch worker

Type: object

Configuration for execution of batch jobs.

Allows executing multiple Jobs in a single process run on the worker (e.g. a SLURM job).

Same definition as BatchConfig

Host

Type: string

The host to which to connect

Key Filename

Default: null

The filename, or list of filenames, of optional private key(s) and/or certs to try for authentication

Passphrase

Default: null

Passphrase used for decrypting private keys

Gateway

Default: null

A shell command string to use as a proxy or gateway

Connect Kwargs

Default: null

Other keyword arguments passed to paramiko.client.SSHClient.connect

Inline Ssh Env

Default: null

Whether to send environment variables 'inline' as prefixes in front of command strings

Keepalive

Default: 60

Keepalive value in seconds passed to paramiko's transport

Shell Cmd

Default: "bash"

The shell command used to execute the command remotely. If None the command is executed directly

Login Shell

Type: boolean Default: true

Whether to use a login shell when executing the command

Interactive Login

Type: boolean Default: false

Whether the authentication to the host should be interactive

Queue

Type: object

The configuration of the Store used to store the states of the Jobs and the Flows

No Additional Properties

Store

Type: object

Dictionary describing a maggma Store used for the queue data. Can contain the monty serialized dictionary or a dictionary with a 'type' key specifying the Store subclass. Should be a subclass of MongoStore, as MongoDB-specific actions need to be performed. The collection is used to store the jobs

Flows Collection

Type: string Default: "flows"

The name of the collection containing information about the flows. Taken from the same database as the one defined in the store

Auxiliary Collection

Type: string Default: "jf_auxiliary"

The name of the collection containing auxiliary information. Taken from the same database as the one defined in the store

Db Id Prefix

Default: null

A string defining the prefix added to the integer ID associated with each Job in the database

Type: string
Type: null

Exec Config

Type: object

A dictionary with the ExecutionConfig name as keys and the ExecutionConfig configuration as values

Each additional property must conform to the following schema

Type: object

Configuration to be set before and after the execution of a Job.

No Additional Properties

Modules

Default: null

List of modules to be loaded

Type: array of string
No Additional Items

Each item of this array must be:

Type: string

Export

Default: null

Dictionary with the variables to be exported

Pre Run

Default: null

Other commands to be executed before the execution of a job

Post Run

Default: null

Commands to be executed after the execution of a job

Jobstore

Type: object

The JobStore used for the input. Can contain the monty serialized dictionary or the Store in the Jobflow format

Metadata

Default: null

A dictionary with metadata associated to the project

Type: object
Type: null

General Settings#

Aside from the project-specific configuration, a few general options can also be defined. There are two ways to set these options:

  • set the value in the ~/.jfremote.yaml configuration file.

  • export the variable name prepended with the jfremote_ prefix:

    export jfremote_project=project_name
    

Note

The name of the exported variables is case-insensitive (i.e. JFREMOTE_PROJECT is equally valid).

The most useful variable to set is project, which allows selecting the default project to be used in a multi-project environment.

Other generic options are the location of the projects folder, replacing ~/.jfremote (projects_folder), and the path to the ~/.jfremote.yaml file itself (config_file).
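A sketch of a ~/.jfremote.yaml file combining these options (the values are placeholders):

    project: my_project
    projects_folder: /path/to/my_projects_folder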

Some customization options are also available for the behaviour of the CLI. For more details see the API documentation for jobflow_remote.config.settings.JobflowRemoteSettings.