.. _sec_schemes:

Automation: *Schemes*
=======================

*Schemes* were introduced in |RELION|-3.1, where they were called *Schedules*. To prevent confusion with scheduled jobs in the main GUI, they were renamed to *Schemes* in |RELION|-4.0, and their functionality was further improved. *Schemes* aim to provide a generalised methodology for the automatic submission of |RELION| jobs. This is useful for creating standardised workflows, for example for on-the-fly processing. The ``relion_it.py`` script that was introduced in |RELION|-3.1 has been rewritten to work with *Schemes*.

The *Schemes* framework is built around the following key concepts: a directed graph that represents the logic of a series of subsequent |RELION| job-types is encoded in *Nodes* and *Edges*. *Nodes* can be either a |RELION| job or a so-called *Operator*; *Edges* form the connections between *Nodes*.
In addition, *Schemes* have their own *Variables*.

All information for each *Scheme* is stored in its own subdirectory of the ``Schemes/`` directory in a |RELION| project, e.g. ``Schemes/prep``.
Within each *Scheme*'s directory, the ``scheme.star`` file contains information about all the *Variables*, *Edges*, *Operators* and *Jobs*.


Variables
---------

Three different types of *Variables* exist: *floatVariables* are numbers; *booleanVariables* are either True or False; and *stringVariables* are text.
Each Variable has a *VariableName*; a so-called *VariableResetValue*, at which the value is initialised; and a *VariableValue*, which may change during execution of the *Scheme* through the actions of Operators, as outlined below.

One special *stringVariable* is called ``email``.
When this is set, upon completion or upon encountering an error, the *Scheme* will send an email (through the Linux ``mail`` command) to the value of the `email` *stringVariable*.


Jobs
----

Jobs are the first type of *Node*.
They can be of any of the jobtypes defined in the |RELION| pipeliner, i.e. :jobtype:`Import`, :jobtype:`Motion correction`, etc, including the new :jobtype:`External`.
Any *Variable* defined in the *Scheme* can be used as a parameter in a Job, by prefixing its name with two dollar signs (``$$``) in the GUI or in the `job.star` file.
For example, one could define a *floatVariable* ``voltage`` and use ``$$voltage`` on the corresponding input line of an :jobtype:`Import` job.
Upon execution of the job inside the *Scheme*, the ``$$voltage`` will be replaced with the current value of the ``voltage`` *floatVariable*.
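
The substitution can be pictured with a minimal Python sketch. This is not |RELION|'s own implementation: the function name ``substitute_variables`` and the plain dictionary of variables are assumptions made purely for illustration.

.. code-block:: python

    import re

    def substitute_variables(text, variables):
        """Replace every ``$$name`` in ``text`` with the current value of ``name``.

        Hypothetical helper: ``variables`` maps VariableName -> current VariableValue.
        """
        def lookup(match):
            name = match.group(1)
            if name not in variables:
                raise KeyError(f"undefined Scheme variable: {name}")
            return str(variables[name])

        return re.sub(r"\$\$(\w+)", lookup, text)

    # Example: the voltage input line of an Import job
    variables = {"voltage": 300}
    print(substitute_variables("$$voltage", variables))  # prints 300

Text without a ``$$`` prefix passes through unchanged, so ordinary parameter values and variable references can be mixed freely in this sketch.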

Jobs within a *Scheme* each have a `JobName` and a `JobNameOriginal`.
The latter is defined upon creation of the job (see next section); the former depends on the execution status of the *Scheme*, and will be set to the executed |RELION| job's name, e.g. ``CtfFind/job004``.
In addition, each job has a `JobMode` and a `jobHasStarted` status.
There are two types of `JobMode`:


``new``
    regardless of `jobHasStarted`, a new job will be created, with its own new `JobName`, every time the *Scheme* passes through this Node.

``continue``
    if `jobHasStarted` is False, a new job, with its own new `JobName`, will be created.
    If `jobHasStarted` is True, the job will be executed as a continue job inside the existing `JobName` directory.

When a *Scheme* executes a Job, it always sets `jobHasStarted` to True.
When a *Scheme* is reset, the `jobHasStarted` status for all jobs is set to False.
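
The interplay between `JobMode` and `jobHasStarted` can be summarised in a small Python sketch. The dictionary layout and the function names ``run_job`` and ``reset_scheme`` are hypothetical; only the decision logic follows the description above.

.. code-block:: python

    def run_job(job, next_free_name):
        """Return the JobName a Job node would run in, and mark the job as started.

        ``job`` is a dict with keys "mode", "has_started" and "name" (hypothetical layout).
        ``next_free_name`` is the name a newly created job would receive.
        """
        if job["mode"] == "new" or not job["has_started"]:
            # A new job directory is created.
            job["name"] = next_free_name
        # Otherwise: mode is "continue" and the job already ran,
        # so it continues inside the existing JobName directory.
        job["has_started"] = True
        return job["name"]

    def reset_scheme(jobs):
        """Resetting a Scheme sets jobHasStarted to False for all jobs."""
        for job in jobs:
            job["has_started"] = False

    ctffind = {"mode": "continue", "has_started": False, "name": None}
    print(run_job(ctffind, "CtfFind/job004/"))  # first pass: CtfFind/job004/
    print(run_job(ctffind, "CtfFind/job010/"))  # second pass: still CtfFind/job004/

With ``"mode": "new"`` the second call would instead return ``CtfFind/job010/``.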


Operators
---------

*Operators* are the second type of *Node*.
Each operator within a *Scheme* has a unique name and a type.
Operators can also have an output Variable: `output`, on which they act, and up to two input Variables: `input1` and `input2`.
Most, but not all operators change the value of their `output` Variable.

The following types of operators act on an `output` that is a *floatVariable*:

``float=set``
    `output` = *floatVariable* `input1`
``float=plus``
    `output` = *floatVariable* `input1` + *floatVariable* `input2`
``float=minus``
    `output` = *floatVariable* `input1` - *floatVariable* `input2`
``float=mult``
    `output` = *floatVariable* `input1` × *floatVariable* `input2`
``float=divide``
    `output` = *floatVariable* `input1` / *floatVariable* `input2`
``float=round``
    `output` = ROUND(*floatVariable* `input1`)
``float=count_images``
    sets `output` to the number of images in the STAR file with the filename in *stringVariable* `input1`. *stringVariable* `input2` can be `particles`, `micrographs` or `movies`, depending on what type of images need to be counted.
``float=count_words``
    sets `output` to the number of words in *stringVariable* `input1`, where individual words need to be separated with a `,` (comma) sign.
``float=read_star``
    sets `output` to the value of a double or integer that is read from a STAR file. *stringVariable* `input1` defines which variable to read as: *starfilename,tablename,metadatalabel*.
    If *tablename* is a table instead of a list, then *floatVariable* `input2` defines the line number, with the default of zero being the first line.
``float=star_table_max``
    sets `output` to the maximum value of a column in a starfile table, where *stringVariable* `input1` specifies the column as *starfilename,tablename,metadatalabel*.
``float=star_table_min``
    sets `output` to the minimum value of a column in a starfile table, where *stringVariable* `input1` specifies the column as *starfilename,tablename,metadatalabel*.
``float=star_table_avg``
    sets `output` to the average value of a column in a starfile table, where *stringVariable* `input1` specifies the column as *starfilename,tablename,metadatalabel*.
``float=star_table_sort_idx``
    a sorting will be performed on the values of a column in a starfile table, where *stringVariable* `input1` specifies the column as *starfilename,tablename,metadatalabel*. *stringVariable* `input2` specifies the index in the ordered array: the lowest number is 1, the second lowest is 2, the highest is -1 and the one-but-highest is -2.
    Then, `output` is set to the corresponding index in the original table.
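
The index convention of ``float=star_table_sort_idx`` is easy to misread, so here is a minimal Python sketch of it. It works on a plain list of column values rather than a STAR file, the function name ``sort_index`` is hypothetical, and it assumes that the returned index into the original table is zero-based (in line with the line numbering of ``float=read_star``); the text above does not state this explicitly.

.. code-block:: python

    def sort_index(values, rank):
        """Return the original-table index of the value with the given rank.

        rank = 1 is the lowest value, 2 the second lowest,
        -1 the highest and -2 the one-but-highest.
        Assumption: the returned index is zero-based.
        """
        if rank == 0 or abs(rank) > len(values):
            raise ValueError(f"rank {rank} is out of range for {len(values)} rows")
        # Indices of the original rows, ordered by ascending value.
        order = sorted(range(len(values)), key=lambda i: values[i])
        position = rank - 1 if rank > 0 else rank
        return order[position]

    resolutions = [4.2, 3.1, 5.6, 3.8]
    print(sort_index(resolutions, 1))   # 1 (row holding 3.1, the lowest value)
    print(sort_index(resolutions, -1))  # 2 (row holding 5.6, the highest value)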


The following types of operators act on an `output` that is a *booleanVariable*:

``bool=set``
    `output` = *booleanVariable* `input1`
``bool=and``
    `output` = *booleanVariable* `input1` AND *booleanVariable* `input2`
``bool=or``
    `output` = *booleanVariable* `input1` OR *booleanVariable* `input2`
``bool=not``
    `output` = NOT *booleanVariable* `input1`
``bool=gt``
    `output` = *floatVariable* `input1` > *floatVariable* `input2`
``bool=lt``
    `output` = *floatVariable* `input1` < *floatVariable* `input2`
``bool=ge``
    `output` = *floatVariable* `input1` >= *floatVariable* `input2`
``bool=le``
    `output` = *floatVariable* `input1` <= *floatVariable* `input2`
``bool=eq``
    `output` = *floatVariable* `input1` == *floatVariable* `input2`
``bool=file_exists``
    `output` = True if a file with the filename stored in *stringVariable* `input1` exists on the file system; False otherwise
``bool=read_star``
    reads `output` from a boolean that is stored inside a STAR file. *stringVariable* `input1` defines which variable to read as: *starfilename,tablename,metadatalabel*.
    If *tablename* is a table instead of a list, then *floatVariable* `input2` defines the line number, with the default of zero being the first line.


The following types of operators act on an `output` that is a *stringVariable*:

``string=set``
    `output` = *stringVariable* `input1`
``string=join``
    `output` = concatenate *stringVariable* `input1` and *stringVariable* `input2`
``string=before_first``
    sets `output` to the substring of *stringVariable* `input1` that occurs before the first instance of substring *stringVariable* `input2`.
``string=after_first``
    sets `output` to the substring of *stringVariable* `input1` that occurs after the first instance of substring *stringVariable* `input2`.
``string=before_last``
    sets `output` to the substring of *stringVariable* `input1` that occurs before the last instance of substring *stringVariable* `input2`.
``string=after_last``
    sets `output` to the substring of *stringVariable* `input1` that occurs after the last instance of substring *stringVariable* `input2`.
``string=read_star``
    reads `output` from a string that is stored inside a STAR file. *stringVariable* `input1` defines which variable to read as: *starfilename,tablename,metadatalabel*.
    If *tablename* is a table instead of a list, then *floatVariable* `input2` defines the line number, with the default of zero being the first line.
``string=glob``
    `output` = GLOB(*stringVariable* `input1`), where input1 contains a Linux wildcard and GLOB is the Linux function that returns all the files that exist for that wildcard.
    Each existing file will be separated by a comma in the `output` string.
``string=nth_word``
    `output` = the Nth substring in *stringVariable* `input1`, where N=*floatVariable* `input2`, and substrings are separated by commas.
    Counting starts at one, and negative values for *input2* mean counting from the end, e.g. *input2=-2* means the second-last word.
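
As an illustration of the substring and word operators, the following Python sketch mimics their described behaviour. The function names are hypothetical stand-ins for the operator types. What happens when the separator does not occur in `input1` is not specified above, so that case follows Python's ``str.partition`` and ``str.rpartition`` here and may differ from |RELION|.

.. code-block:: python

    def before_first(text, sep):
        return text.partition(sep)[0]

    def after_first(text, sep):
        return text.partition(sep)[2]

    def before_last(text, sep):
        return text.rpartition(sep)[0]

    def after_last(text, sep):
        return text.rpartition(sep)[2]

    def nth_word(text, n):
        """Return the n-th comma-separated word; counting starts at one,
        and negative n counts from the end."""
        words = text.split(",")
        if n == 0 or abs(n) > len(words):
            raise ValueError(f"word index {n} is out of range")
        return words[n - 1] if n > 0 else words[n]

    path = "Movies/20200101/mic_001.tiff"
    print(before_first(path, "/"))  # Movies
    print(after_first(path, "/"))   # 20200101/mic_001.tiff
    print(before_last(path, "/"))   # Movies/20200101
    print(after_last(path, "/"))    # mic_001.tiff
    print(nth_word("a.mrc,b.mrc,c.mrc", -2))  # b.mrc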


The following types of operators do not act on any variable:

``touch_file``
    performs ``touch input1`` on the file system
``copy_file``
    performs ``cp input1 input2`` on the file system. *stringVariable* `input1` may contain a Linux wildcard.
    If *stringVariable* `input2` contains a directory structure that does not exist yet, it will be created.
``move_file``
    performs ``mv input1 input2`` on the file system. *stringVariable* `input1` may contain a Linux wildcard.
    If *stringVariable* `input2` contains a directory structure that does not exist yet, it will be created.
``delete_file``
    performs ``rm -f input1`` on the file system. *stringVariable* `input1` may contain a Linux wildcard.
``email``
    sends an email, provided a *stringVariable* with the name `email` exists and the Linux command `mail` is functional.
    The content of the email has the current value of *stringVariable* `input1`, and optionally also *stringVariable* `input2`.
``wait``
    waits *floatVariable* `input1` seconds since the last time this operator was executed.
    The first time it is executed, this operator only starts the counter and does not wait.
    Optionally, if `output` is defined as a *floatVariable*, then the elapsed number of seconds since last time is stored in `output`.
``exit_maxtime``
    terminates the execution of the *Scheme* once the number of hours stored in *floatVariable* `input1` has passed since its start.
``exit``
    terminates the execution of the *Scheme*.
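
The timing behaviour of the ``wait`` operator can be sketched in Python as follows. The class ``WaitOperator`` is hypothetical, and the sketch assumes that the value stored in `output` is the time elapsed before any waiting takes place; the description above leaves that detail open. The clock and sleep functions are injectable so that the example runs instantly.

.. code-block:: python

    import time

    class WaitOperator:
        """Wait until ``seconds`` have passed since the previous execution."""

        def __init__(self, clock=time.monotonic, sleep=time.sleep):
            self._clock = clock
            self._sleep = sleep
            self._last = None  # time of the previous execution, if any

        def execute(self, seconds):
            now = self._clock()
            if self._last is None:
                # First execution: only start the counter, do not wait.
                self._last = now
                return 0.0
            elapsed = now - self._last
            if elapsed < seconds:
                self._sleep(seconds - elapsed)
            self._last = self._clock()
            return elapsed  # assumption: this is what `output` would receive

    # Demonstration with a fake clock instead of real sleeping.
    fake_time = [0.0]

    def fake_sleep(duration):
        fake_time[0] += duration

    wait = WaitOperator(clock=lambda: fake_time[0], sleep=fake_sleep)
    wait.execute(10)            # first call: starts the counter only
    fake_time[0] = 4.0          # 4 seconds of other work pass
    print(wait.execute(10))     # 4.0, after sleeping the remaining 6 seconds
    print(fake_time[0])         # 10.0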


Edges
-----

Two types of *Edges* exist.
The first type is a normal *Edge*, which connects an `inputNode` to an `outputNode`, thereby defining their consecutive execution.

The second type is called a *Fork*.
A Fork has one `inputNode`, an `outputNode`, an `outputNodeIfTrue`, and an associated *booleanVariable*.
Which of the two output Nodes is executed depends on the current value of the *booleanVariable* that is associated with the Fork.
If the *booleanVariable* is *False*, the Fork leads from the `inputNode` to the `outputNode`.
If the *booleanVariable* is *True*, the Fork leads from the `inputNode` to the `outputNodeIfTrue`.
Thereby, Forks are the main instrument of making decisions in *Schemes*.
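
To make the role of Forks concrete, here is a minimal Python sketch of a graph walk over normal Edges and Forks. It is an illustration rather than the logic of |RELION| itself: the ``walk`` function, the dictionary keys and the two toy operators are assumptions, and Nodes are reduced to plain Python callables.

.. code-block:: python

    def walk(edges, start, actions, variables, max_steps=100):
        """Follow Edges from ``start`` and return the list of visited Nodes."""
        edge_from = {edge["inputNode"]: edge for edge in edges}
        visited = []
        node = start
        for _ in range(max_steps):
            visited.append(node)
            action = actions.get(node)
            if action is not None:
                action(variables)  # execute the Job or Operator of this Node
            edge = edge_from.get(node)
            if edge is None:
                break  # no outgoing Edge: the walk ends here
            is_fork = "booleanVariable" in edge
            if is_fork and variables[edge["booleanVariable"]]:
                node = edge["outputNodeIfTrue"]
            else:
                node = edge["outputNode"]
        return visited

    def increment(variables):
        variables["count"] += 1

    def check(variables):
        variables["done"] = variables["count"] >= 3

    edges = [
        # Normal Edge
        {"inputNode": "increment", "outputNode": "check"},
        # Fork: loop back while "done" is False, leave the loop once it is True
        {"inputNode": "check", "outputNode": "increment",
         "outputNodeIfTrue": "exit", "booleanVariable": "done"},
    ]
    variables = {"count": 0, "done": False}
    visited = walk(edges, "increment",
                   {"increment": increment, "check": check}, variables)
    print(visited)

The walk passes through ``increment`` and ``check`` three times before the Fork routes it to ``exit``, which is the same loop-until-condition pattern that on-the-fly processing workflows rely on.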


Create a Scheme
-----------------

The combination of the *Variables*, *Nodes* and *Edges* allows one to create complicated sequences of jobs.
It is probably a good idea to draw out a logical flow-chart of your sequence before creating a *Scheme*. Then, use your favourite text editor to manually edit the files ``Schemes/SCHEMENAME/scheme.star`` and all the files ``Schemes/SCHEMENAME/JOBNAMES/job.star`` for all the jobs in that *Scheme*. Following the ``prep`` and ``proc`` examples in the ``scripts`` directory of your |relion| installation is probably the easiest way to get started.

In the ``Schemes/SCHEMENAME/scheme.star`` file, first add all the different variables and operators that you will need.  

Note that if the value of a variable contains the `JobNameOriginal` of any of the *Jobs* inside any *Scheme* that is present in the *ProjectDirectory*, that part will be replaced by the current `JobName` upon execution of an operator.
For example, a *stringVariable* with the value ``Schemes/prep/ctffind/micrographs_ctf.star`` will be replaced by something like ``CtfFind/job003/micrographs_ctf.star`` upon execution of the job that uses it, assuming that the current `JobName` of that job is ``CtfFind/job003/`` in the ``Schemes/prep/scheme.star`` file.
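
A minimal Python sketch of this replacement is given below. The helper ``resolve_job_names`` and the mapping it takes are hypothetical; in a real project the current names would come from the ``scheme.star`` files.

.. code-block:: python

    def resolve_job_names(text, current_names):
        """Replace each JobNameOriginal found in ``text`` by its current JobName.

        ``current_names`` maps JobNameOriginal -> current JobName (hypothetical layout).
        """
        for original, current in current_names.items():
            text = text.replace(original, current)
        return text

    current_names = {"Schemes/prep/ctffind/": "CtfFind/job003/"}
    value = "Schemes/prep/ctffind/micrographs_ctf.star"
    print(resolve_job_names(value, current_names))
    # CtfFind/job003/micrographs_ctf.star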

Then, add your jobs. You can use the normal |relion| GUI to fill in all parameters of each job that you need and then use the ``Save job.star`` option from the ``Jobs`` menu to save the ``job.star`` file in the corresponding ``Schemes/SCHEMENAME/JOBNAMES/`` directory.
Jobs use the same mechanism as described for the *Variables* above. 
So, if an :jobtype:`Auto-picking` job takes its micrographs STAR file input from a :jobtype:`CTF estimation` job called `ctffind`, and this :jobtype:`CTF estimation` job is part of a *Scheme* called ``prep``, then the micrographs STAR file input for the :jobtype:`Auto-picking` job should be set to ``Schemes/prep/ctffind/micrographs_ctf.star``, and this will be converted to ``CtfFind/job003/micrographs_ctf.star`` upon execution of the job.
In addition, a corresponding edge will be added to the ``default_pipeliner.star`` upon execution of the *Scheme*. 
Also note that parameters in ``job.star`` files may be updated with the current values of *Variables* from the *Scheme* by using the ``$$`` prefix, followed by the name of the corresponding *Variable*, as also mentioned above.

In addition, the `JobMode` of each job must be set to either ``new`` or ``continue``.
Typically, in on-the-fly-processing procedures that iterate over ever more movies, jobs like :jobtype:`Import`, :jobtype:`Motion correction`, :jobtype:`CTF estimation`, :jobtype:`Auto-picking` and :jobtype:`Particle extraction` are set as ``continue``, whereas most other jobs are set as ``new``.

Finally, once all the *Variables*, *Operators* and *Jobs* are in place, one should define all the *Edges* between them.

The *Scheme* will be initialised (and reset) to the `inputNode`, i.e. the left-hand *Node*, of the first defined *Edge*.
If the *Scheme* is not an infinite loop, it is recommended to add the ``exit`` *Operator* as the last *Node*.

Once a *Scheme* has been created, it may be useful for more than one |RELION| project.
Therefore, you may want to store it in a tar-ball:

::

    tar -zcvf preprocess_scheme.tar.gz Schemes/preprocess


That tar-ball can then be extracted in any new |RELION| project directory:

::

    tar -zxvf preprocess_scheme.tar.gz


.. _sec_execute_schemes:

Executing a *Scheme*
^^^^^^^^^^^^^^^^^^^^^^

Once you have created the ``Schemes/name/`` (sub)directories (with ``name`` being the name of your *Scheme*), you can launch a separate GUI using:


::

    relion_schemegui.py name


You can start, stop, change parameters, and restart the *Scheme* from there. You can also look into this Python script to see the actual calls it makes to ``relion_schemer``, which is the command-line program that executes the *Scheme*. While it runs, you can follow the generation of new jobs in the normal |RELION| GUI.