Projects configuration and Settings#

Jobflow-remote can handle multiple configurations, called projects. Since a single project is enough for most users, let us first consider the configuration of a single project. The handling of Multiple Projects is described below.

Aside from the project options, a set of general settings can also be configured through environment variables or an additional configuration file, as described in the General Settings - Environment variables section.

Project options#

The project configuration controls the behaviour of the Job execution, as well as of the other objects in jobflow-remote. A full description of the project’s configuration file is given here. If you are looking for a minimal example with its description, you can find it in the Configuration section.

The specifications of the project’s attributes are given by the Project pydantic model, which serves the purpose of parsing and validating the configuration files, as well as giving access to the associated objects (e.g. the JobStore). A graphical representation of the Project model, and thus of the options available in the configuration file, is given below (generated with erdantic).

All-in-one configuration

A description of all the types and keys of the project file is given in the Project specs section below, while an example of a full configuration file can be generated by running:

jf project generate --full YOUR_PROJECT_NAME

Note that, while the default file format is YAML, JSON and TOML are also acceptable formats. You can generate the example in the other formats using the --format option.
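
For instance, a minimal sketch of generating the example in TOML format (assuming the option accepts the lowercase format name as value):

jf project generate --full --format toml YOUR_PROJECT_NAME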

Name and folders#

The project name is given by the name attribute. The name will be used to create a subfolder containing

  • files with the parsed outputs copied from the remote workers

  • logs

  • files used by the daemon

For all these folders the paths have default values, but they can be customised by setting tmp_dir, log_dir and daemon_dir.

Warning

The project name does not depend on the configuration file name. For coherence, it is better to use the project name as the file name.

Example

Standard usage does not require filling in all the directory paths; usually just the name is provided in the configuration:

name: my_project

Workers#

Multiple workers can be defined in a project. In the configuration file each worker is given with its name as the key and its properties in the associated dictionary.

Several defining properties should be set in the configuration of each worker. First, the type should be specified. At the moment the possible worker types are

  • local: a worker running on the same system as the Runner. No connection is needed for the Runner to reach the queueing system.

  • remote: a worker on a different machine than the Runner, requiring an SSH connection to reach it.

Since the Runner needs to interact with the workers constantly, for the latter type all the credentials needed to connect automatically should be provided. The best option is to set up a passwordless connection and define it in the ~/.ssh/config file.
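
For example, a minimal sketch of an ~/.ssh/config entry for such a passwordless connection (host name, user and key path are placeholders):

Host my_cluster
    HostName cluster.example.com
    User username
    IdentityFile ~/.ssh/id_ed25519

The Host alias (my_cluster here) can then be used directly as the host value in the worker configuration.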

The other key property of the workers is the scheduler_type. It can be any of the values supported by QToolKit. Typical values are:

  • shell: the Job is executed directly in the shell. No queue will be used. If not limited, all the Jobs can be executed simultaneously

  • slurm, pbs, …: the name of a queueing system. The job will be submitted to the queue with the selected resources.

Another mandatory argument is work_dir, indicating the full path of a folder on the worker machine where the Jobs will actually be executed.

It is possible to optionally set default values for keywords like pre_run and resources, which can be overridden for individual Jobs. Note that these configurations will be applied to all the Jobs executed by the worker. They are thus more suitable for generic settings (e.g. the activation of a python environment, or loading of some modules) than for code-specific configurations; the latter are better set with the Execution configurations.

Note

If a single worker is defined it will be used as default in the submission of new Flows.

Warning

By default, jobflow-remote fetches the status of the jobs from the scheduler by passing the list of ids. If the selected scheduler does not support this option (e.g. SGE), it is also necessary to specify the username on the worker machine through the scheduler_username option. Jobflow-remote will use that as a filter, instead of the list of ids.
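
A minimal sketch of such a worker (assuming sge is the QToolKit name for the scheduler; all paths and names are placeholders):

workers:
  sge_cluster:
    type: remote
    scheduler_type: sge
    work_dir: /path/to/run/dir
    host: sge.cluster.host.net
    # required since SGE cannot filter jobs by a list of ids
    scheduler_username: username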

Example

Several workers of different kinds can be defined:

workers:
  my_cluster_front_end: # A worker on the front end of the cluster
    scheduler_type: shell
    work_dir: /path/to/run/dir
    # Activate the conda environment on the worker
    pre_run: "\neval \"$(conda shell.bash hook)\"\nconda activate jfr\n"
    type: remote
    # No connection details if they are defined in the ~/.ssh/config file
    host: my_cluster
  my_cluster: # same cluster, a worker using the SLURM queue
    scheduler_type: slurm
    work_dir: /path/to/run/dir
    resources:
    pre_run: "\neval \"$(conda shell.bash hook)\"\nconda activate jfr\n"
    type: remote
    host: my_cluster
  another_cluster: # A remote worker on another cluster
    scheduler_type: slurm
    work_dir: /path/to/another/run/dir
    pre_run: source /data/venv/jobflow/bin/activate
    type: remote
    # also possible to define connection details here
    host: another.cluster.host.net
    user: username
    key_filename: /path/to/ssh/private/key
  another_cluster_batch: # A batch worker on the second cluster
    scheduler_type: slurm
    work_dir: /path/to/another/run/dir
    # Each batch job will run on two nodes with 24 cores each
    resources:
      nodes: 2
      ntasks_per_node: 24
      mem: 95000
      partition: xxx
    pre_run: source /data/venv/jobflow/bin/activate
    type: remote
    host: another.cluster.host.net
    user: username
    key_filename: /path/to/ssh/private/key
    # maximum 5 batch Slurm jobs in the queue at the same time
    max_jobs: 5
    # This will determine that the worker is a "batch" worker
    batch:
      jobs_handle_dir: /some/remote/path/run/batch_handle_slurm
      work_dir: /some/remote/path/run/batch_work_slurm
      # two jobflow Job executed in parallel in a single SLURM submission
      parallel_jobs: 2
  local_shell: # A local worker running in the shell
    scheduler_type: shell
    work_dir: /local/path/to/run/jobs
    pre_run: "\neval \"$(conda shell.bash hook)\"\nconda activate atomate2_pyd2\n"
    type: local
    # Limit the number of local jobs. Since there is no queue they can all
    # start simultaneously otherwise
    max_jobs: 3

JobStore#

The jobstore value contains a dictionary representation of the standard JobStore object defined in jobflow. It can either be the serialized version, as obtained from the as_dict method, or the representation defined in jobflow’s documentation.

This JobStore will be used to store the outputs of all the Jobs executed in this project.

Note

The JobStore should be defined in jobflow-remote’s configuration file. The content of the standard jobflow configuration file will be ignored.

Warning

If you have been using jobflow without jobflow-remote and you have a JobStore defined in a jobflow.yaml file, it will be ignored. Only the definition in the jobflow-remote configuration file will be considered.

Example

Define a JobStore similarly to standard jobflow

jobstore:
  docs_store:
    type: MongoStore
    host: <host name>
    port: 27017
    username: <username>
    password: <password>
    database: <database name>
    collection_name: outputs

Queue Store#

The queue element contains the definition of the database holding the state of the Jobs and Flows. The store subelement should contain the representation of a maggma Store. As for the JobStore, it can be either its serialization or the same kind of representation used for the docs_store in jobflow’s configuration file.

The collection defined by the Store will contain the information about the state of the Jobs, while two more collections will be created. The names of these two collections can also be customized.

Warning

The queue Store should be a subclass of MongoStore and currently needs to be based on a real MongoDB instance (e.g. not a JSONStore). Some key operations that jobflow-remote requires on the collections are not supported by any file-based MongoDB implementation at the moment.

Warning

If the JobStore is also based on MongoDB, it is often convenient to have its main docs_store in the same database as the queue store. In that case it is important that the two do not point to the same collection; unexpected errors may happen otherwise.

Example

Define a queue store as a maggma Store. It is possible to use the same syntax as for the JobStore. Customizing the names of the additional collections is also possible, but not necessary:

queue:
  store:
    type: MongoStore
    host: <host name>
    port: 27017
    username: <username>
    password: <password>
    database: <database name>
    collection_name: jobs
  flows_collection: flows
  auxiliary_collection: jf_auxiliary

Execution configurations#

It is possible to define a set of ExecutionConfig objects to quickly set up configurations for different kinds of Jobs and Flows. The exec_config key contains a dictionary where the keys are the names associated with the configurations and the values define the instructions to be executed before and after the Job. See the Execution configuration section for more details and usage examples.

Example

Multiple configurations can be defined, for example one for each version of an external code or for different software requirements. If multiple workers are present, different exec_config entries will need to be defined for each of them:

exec_config:
  xxx_v1_1_my_cluster:
    modules:
    - releases/2021b
    - intel/2021b
    export:
      PATH: /path/to/executable/v1.1:$PATH
    pre_run:
    post_run:
  xxx_v3_2_my_cluster:
    modules:
    - releases/2023b
    - intel/2023b
    export:
      PATH: /path/to/executable/v3.2:$PATH
  yyy_local:
    export:
      PATH: /path/to/local/executable:$PATH
    pre_run: "echo 'test'\necho 'test2'"

Runner options#

The behaviour of the Runner can also be customized to some extent. In particular, the Runner implements an exponential backoff mechanism for retrying when an operation updating a Job state fails. The number of attempts and the delays between them can be set with the max_step_attempts and delta_retry values. In addition, reasonable defaults are set for the delay between each check of the database for the different kinds of actions performed by the Runner. These intervals can be changed to better fit your needs. Keep in mind that reducing these intervals too much may put unnecessary strain on the database.

Example

Most of the time the default values should be fine. Here is how to customize the Runner execution:

runner:
  delay_checkout: 10
  delay_check_run_status: 10
  delay_advance_status: 10
  delay_update_batch: 10
  lock_timeout: 86400
  delete_tmp_folder: true
  max_step_attempts: 3
  delta_retry:
  - 30
  - 300
  - 1200

Metadata#

While it does not currently play any role in the execution of jobflow-remote, the metadata section can be used to include additional information to be used by external tools, or to quickly distinguish a configuration file among others.
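
For example, a minimal sketch (the keys inside metadata are arbitrary and only meant for illustration):

metadata:
  description: project for testing jobflow-remote
  maintainer: my_username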

Multiple Projects#

While a single project can be enough for most users and for beginners, it may be convenient to define different databases, configurations and python environments to work on different topics. For this reason jobflow-remote considers as potential project configurations all the YAML, JSON and TOML files in the ~/.jfremote folder. There is no additional procedure required to add or remove a project, aside from creating/deleting a project configuration file.

Warning

Different projects are meant to use different Queue Stores. Sharing the same collections for two projects is not a supported option.

To define the Queue Store for multiple projects two options are available:

  • each project has its own database, with standard collection names

  • a single database is used and each project is assigned a set of collections. For example, a configuration for one of the projects could be:

    queue:
      store:
        type: MongoStore
        database: DB_NAME
        collection_name: jobs_project1
        ...
      flows_collection: flows_project1
      auxiliary_collection: jf_auxiliary_project1
    

    And the same for a second project with different collection names.

There is no constraint on the database and collection used for the output JobStore. Even though it may make sense to separate the sets of outputs, it is possible to share the same collection among multiple projects. In that case the output documents will have duplicated db_id values, as each project has its own counter. If this is an issue, it is possible to set different db_id_prefix values in the queue configuration of the different projects.
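
A minimal sketch of the queue section with a custom prefix (the proj1 value is arbitrary):

queue:
  store:
    ...
  db_id_prefix: proj1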

If more than one project is present and a specific one is not selected, the code will always stop and ask for a project to be specified. Python functions like submit_flow and get_jobstore accept a project argument to specify which project should be considered. For the command line interface, a general -p option allows selecting the project for the command that is being executed:

jf -p another_project job list

To define a default project for all the functions and commands executed on the system or in a specific shell, see the General Settings - Environment variables section.

Project specs#

Project

Type: object

The configurations of a Project.

No Additional Properties

Name

Type: string

The name of the project

Base Dir

Default: null

The base directory containing the project related files. Default is a folder with the project name inside the projects folder

Type: string
Type: null

Tmp Dir

Default: null

Folder where remote files are copied. Default a 'tmp' folder in base_dir

Type: string
Type: null

Log Dir

Default: null

Folder containing all the logs. Default a 'log' folder in base_dir

Type: string
Type: null

Daemon Dir

Default: null

Folder containing daemon related files. Default to a 'daemon' folder in base_dir

Type: string
Type: null

Log Level

Type: enum (of string) Default: "info"

The level set for logging

Must be one of:

  • "error"
  • "warn"
  • "info"
  • "debug"

Runner

Type: object

The options for the Runner

No Additional Properties

Delay Checkout

Type: integer Default: 30

Delay between subsequent execution of the checkout from database (seconds)

Delay Check Run Status

Type: integer Default: 30

Delay between subsequent executions of the check of the status of the jobs submitted to the scheduler (seconds)

Delay Advance Status

Type: integer Default: 30

Delay between subsequent advancement of the job's remote state (seconds)

Delay Refresh Limited

Type: integer Default: 600

Delay between subsequent refresh from the DB of the number of submitted and running jobs (seconds). Only used if a worker with max_jobs is present

Delay Update Batch

Type: integer Default: 60

Delay between subsequent refresh from the DB of the number of submitted and running jobs (seconds). Only used if a batch worker is present

Lock Timeout

Default: 86400

Time after which the lock on a document is considered expired and can be overridden (seconds)

Type: integer
Type: null

Delete Tmp Folder

Type: boolean Default: true

Whether to delete the local temporary folder after a job has completed

Max Step Attempts

Type: integer Default: 3

Maximum number of attempts performed before failing the advancement of a remote state

Delta Retry

Type: array of integer Default: [30, 300, 1200]

List of increasing delay between subsequent attempts when the advancement of a remote step fails

No Additional Items

Each item of this array must be of type: integer

Workers

Type: object

A dictionary with the worker name as keys and the worker configuration as values

Each additional property must conform to the following schema


Type: object

Worker representing the local host.

Executes command directly.

No Additional Properties

Type

Type: const Default: "local"

The discriminator field to determine the worker type

Specific value: "local"

Scheduler Type

Type: string

Type of the scheduler. Depending on the values supported by QToolKit

Work Dir

Type: string Format: path

Absolute path of the directory of the worker where subfolders for executing the calculation will be created

Resources

Default: null

A dictionary defining the default resources requested to the scheduler. Used to fill in the QToolKit template

Pre Run

Default: null

String with commands that will be executed before the execution of the Job

Post Run

Default: null

String with commands that will be executed after the execution of the Job

Timeout Execute

Type: integer Default: 60

Timeout for the execution of the commands in the worker (e.g. submitting a job)

Max Jobs

Default: null

The maximum number of jobs that can be submitted to the queue.

Batch

Default: null

Options for batch execution. If defined, the worker will be considered a batch worker

Type: object

Configuration for execution of batch jobs.

Allows to execute multiple Jobs in a single process executed on the worker (e.g. SLURM job).

Same definition as BatchConfig
Type: object

Worker representing a remote host reached through an SSH connection.

Uses a Fabric Connection. Check Fabric documentation for more details on the
options defining a Connection.

No Additional Properties

Type

Type: const Default: "remote"

The discriminator field to determine the worker type

Specific value: "remote"

Scheduler Type

Type: string

Type of the scheduler. Depending on the values supported by QToolKit

Work Dir

Type: string Format: path

Absolute path of the directory of the worker where subfolders for executing the calculation will be created

Resources

Default: null

A dictionary defining the default resources requested to the scheduler. Used to fill in the QToolKit template

Pre Run

Default: null

String with commands that will be executed before the execution of the Job

Post Run

Default: null

String with commands that will be executed after the execution of the Job

Timeout Execute

Type: integer Default: 60

Timeout for the execution of the commands in the worker (e.g. submitting a job)

Max Jobs

Default: null

The maximum number of jobs that can be submitted to the queue.

Batch

Default: null

Options for batch execution. If defined, the worker will be considered a batch worker

Type: object

Configuration for execution of batch jobs.

Allows to execute multiple Jobs in a single process executed on the worker (e.g. SLURM job).

Same definition as BatchConfig

Host

Type: string

The host to which to connect

Key Filename

Default: null

The filename, or list of filenames, of optional private key(s) and/or certs to try for authentication

Passphrase

Default: null

Passphrase used for decrypting private keys

Gateway

Default: null

A shell command string to use as a proxy or gateway

Connect Kwargs

Default: null

Other keyword arguments passed to paramiko.client.SSHClient.connect

Inline Ssh Env

Default: null

Whether to send environment variables 'inline' as prefixes in front of command strings

Keepalive

Default: 60

Keepalive value in seconds passed to paramiko's transport

Shell Cmd

Default: "bash"

The shell command used to execute the command remotely. If None the command is executed directly

Login Shell

Type: boolean Default: true

Whether to use a login shell when executing the command

Interactive Login

Type: boolean Default: false

Whether the authentication to the host should be interactive

Queue

Type: object

The configuration of the Store used to store the states of the Jobs and the Flows

No Additional Properties

Store

Type: object

Dictionary describing a maggma Store used for the queue data. Can contain the monty serialized dictionary or a dictionary with a 'type' key specifying the Store subclass. Should be a subclass of MongoStore, as it requires performing MongoDB actions. The collection is used to store the jobs

Flows Collection

Type: string Default: "flows"

The name of the collection containing information about the flows. Taken from the same database as the one defined in the store

Auxiliary Collection

Type: string Default: "jf_auxiliary"

The name of the collection containing auxiliary information. Taken from the same database as the one defined in the store

Db Id Prefix

Default: null

a string defining the prefix added to the integer ID associated to each Job in the database

Type: string
Type: null

Exec Config

Type: object

A dictionary with the ExecutionConfig name as keys and the ExecutionConfig configuration as values

Each additional property must conform to the following schema

Type: object

Configuration to be set before and after the execution of a Job.

No Additional Properties

Modules

Default: null

list of modules to be loaded

Type: array of string
No Additional Items

Each item of this array must be of type: string

Export

Default: null

dictionary with variable to be exported

Pre Run

Default: null

Other commands to be executed before the execution of a job

Post Run

Default: null

Commands to be executed after the execution of a job

Jobstore

Type: object

The JobStore used for the input. Can contain the monty serialized dictionary or the Store in the jobflow format

Metadata

Default: null

A dictionary with metadata associated to the project

Type: object
Type: null

General Settings - Environment variables#

Aside from the project-specific configuration, a few options can also be defined in general. There are two ways to set these options:

  • set the value in the ~/.jfremote.yaml configuration file.

  • set an environment variable whose name is the option name prepended with the JFREMOTE_ prefix:

    export JFREMOTE_PROJECT=project_name
    

Note

The name of the exported variables is case-insensitive (i.e. jfremote_project is equally valid).

The most useful variable to set is project, which allows selecting the default project to be used in a multi-project environment.
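
The same default can instead be stored in the ~/.jfremote.yaml configuration file. A minimal sketch, assuming the key matches the setting name:

project: project_name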

Other generic options are the location of the projects folder, replacing ~/.jfremote (JFREMOTE_PROJECT_FOLDER), and the path to the ~/.jfremote.yaml file itself (JFREMOTE_CONFIG_FILE).
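
These can also be exported as environment variables, for instance (the paths are placeholders):

export JFREMOTE_PROJECT_FOLDER=/path/to/projects/folder
export JFREMOTE_CONFIG_FILE=/path/to/jfremote.yaml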

Some customization options are also available for the behaviour of the CLI. For more details see the API documentation of jobflow_remote.config.settings.JobflowRemoteSettings.