States#
Job States#
During their execution by the Runner
a Job can reach different states.
Each of the states describes the current status of the after the Runner
has finished switching from one state to another.
Since the Runner
can be stopped and will update the different states at
predefined intervals, it may be that the state does not reflect the
actual situation of a Job (for example it could be that the process of a
Job in the RUNNING
state has finished, but the Runner
did not
update its state yet.
Description#
The states can then be grouped in
Waiting states: describing Jobs that has not started yet.
Running states: states for which the
Runner
has started working on the JobCompleted state: a state where the Job has been completed successfully
Error states: states associated with some error in the Job, either programmatic or during the execution.
The list of Job states is defined in the jobflow_remote.jobs.state.JobState
object. Here we present a list of each state with a short description.
WAITING#
Waiting state. A Job that has been inserted into the database but has to wait for other Jobs to be completed before starting.
READY#
Waiting state. A Job that is ready to be executed by the Runner
.
CHECKED_OUT#
Running state. A Job that has been selected by the Runner
to
start its execution.
UPLOADED#
Running state. All the inputs required by the Job has been copied to the worker.
SUBMITTED#
Running state. The Job has been submitted to the queueing system of the worker.
RUNNING#
Running state. The Runner
verified that the Job has started is being
executed on the worker.
TERMINATED#
Running state. The process executing the Job on the worked has finished running. No knowledge of whether this happened for an error or because the Job was completed correctly is available at this point.
DOWNLOADED#
Running state. The Runner
has copied to the local machine all the
files containing the Job response and outputs to be stored.
COMPLETED#
Completed state. A Job that has completed correctly.
FAILED#
Error state. The procedure to execute the Job completed correctly, but an error happened during the execution of the Job’s function, so the Job did not complete successfully.
REMOTE_ERROR#
Error state. An error occurred during the procedure to execute the Job. For example the files could not be copied due to some network error and the maximum number of attempts has been reached. The Job may or may not be executed, depending on the action that generated the issue, but in any case no information is available about it. This failure is independent from the correct execution of the Job’s function.
PAUSED#
Waiting state. The Job has been paused by the user and will not be
executed by the Runner
. A Job in this state can be started again.
STOPPED#
Error state. The Job was stopped by another Job as a consequence of a
stop_jobflow
or stop_children
actions in the Job’s response.
This state cannot be modified.
USER_STOPPED#
Error state. A Job stopped by the user. This state cannot be modified.
BATCH_SUBMITTED#
Running state. A Job submitted for execution to a batch worker. Differs
from the SUBMITTED
state since the Runner
does not have to
check its state in the queueing system.
BATCH_RUNNING#
Running state. A Job that is being executed by a batch worker. Differs
from the RUNNING
state since the Runner
does not have to
check its state in the queueing system.
Evolution#
If the state of a Job is not directly modified by user, the Runner
will consistently update the state of each Job in a running state.
The following diagram illustrates which states transitions can
be performed by the Runner
on a Job. This includes the transitions
to intermediate or final error states.
Flow states#
Each Flow in the database also has a global state. This is a function of
the states of each of the Jobs included in the Flow. As for the Jobs,
the Flow states can change due to the action of the Runner
or of the user.
Description#
The list of Flow states is simplified compared to the Job’s states, since several Job state will be grouped under a single Flow state.
The list of Flow states is defined in the jobflow_remote.jobs.state.FlowState
object. Here we present a list of each state with a short description.
READY#
There is at least one Job in the READY state. No Jobs have started or have failed.
RUNNING#
At least one of the Jobs is being or has been executed. The state will not be changed if a single Job completes, but there are still other Jobs to be executed.
COMPLETED#
All the left Jobs of the Flow are in the COMPLETED
state. This means that some
intermediate Job may be in the FAILED
state, but its children are set to
not give an error in the on_missing_references
of the JobConfig
.
FAILED#
At least one of the Jobs failed and the Flow is not COMPLETED
.
STOPPED#
At least one of the Job is in the STOPPED
or the USER_STOPPED
state
and the flow is not in one of the previous states.
PAUSED#
At least one of the Job is in the PAUSED
state and the flow is not in one
of the previous states.