Tasks

The main ratatosk task class is a subclass of luigi.Task and is a core component of ratatosk functionality. If you haven’t already done so, be sure to read the luigi documentation, in particular the conceptual overview section.

Tasks in ratatosk.job

All ratatosk tasks derive from a base job task called ratatosk.job.BaseJobTask. The main task classes that the wrapper library tasks use (and should use) are defined in ratatosk.job, and include:

  • JobTask: main job task which adds a default job runner to the base job task
  • InputJobTask: a task that depends on an external file
  • JobWrapperTask: task that wraps several tasks into one unit
  • NullJobTask: a task that always completes
  • PipedTask: a task that chains tasks in a pipe
  • PipelineTask: a wrapper task for predefined pipelines

Therefore, to understand how ratatosk tasks work, you only need a basic understanding of ratatosk.job.BaseJobTask.

ratatosk.job.BaseJobTask

Configurable attributes

To begin with, several attributes are essential to the behaviour of all subclasses. Some of them are configurable at run-time (they are luigi.Parameter objects). The most important ones are:

config_file
Main configuration file
custom_config
Custom configuration file that is used for tuning options in predefined pipelines
options
Program options for wrapped executable, represented by a list
parent_task
Defines the task on which this task depends, encoded as a string that names a class in a Python module (e.g. ‘ratatosk.lib.tools.gatk.UnifiedGenotyper’). Several parent tasks can be defined.
target
The output target name of this task.
suffix
The output suffix of this task. Can be a list in case several outputs are produced.
label
The label that is attached to the resulting output (e.g. file.txt -> file.label.txt)
exe_path
Path to executable
executable
Name of executable
sub_executable
Name of sub-executable, if applicable
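
As an illustration of how label relates to file names, here is a minimal sketch in plain Python. The helper add_label is hypothetical and not part of ratatosk; it only reproduces the file.txt -> file.label.txt pattern described above, and assumes the label is inserted before the final extension with a dot as separator.

```python
import os


def add_label(filename, label):
    """Insert `label` before the file extension (hypothetical helper)."""
    root, ext = os.path.splitext(filename)
    # Tolerate labels written with or without surrounding dots.
    label = label.strip(".")
    return f"{root}.{label}{ext}"


print(add_label("file.txt", "label"))
print(add_label("sample.bam", "sort"))
```

How ratatosk itself combines label and suffix may differ in detail; the sketch only shows the naming convention.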

Non-configurable attributes

In addition, there are a couple of important non-configurable attributes:

_handlers
Handlers registered to this task
_parent_cls
Placeholder for registered parent classes

Functions

The most important functions are:

_register_parent_task()
Registers classes to _parent_cls. In practice, it parses the string representation of a Python module, tries to load the module, and falls back on a default class on failure
_update_config()
Reads configuration files and sets the attributes of the task
output()
Defines the output target as a luigi.LocalTarget object
requires()
Defines the dependencies.
args()
Defines what the final program command string looks like. Subclasses often need to override this function
target_iterator()
Helper function that iterates over targets defined by a user-supplied function target_generator_handler(). This is the function that enables tasks to compute target file names. It should generate a 3-tuple consisting of (name, merge-prefix, read-prefix)
_make_source_file_name()
Calculates source file names from target by adding/subtracting indices and labels
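
The load-with-fallback behaviour described for _register_parent_task() can be sketched with the standard library as follows. This is not the ratatosk implementation: load_class and FallbackTask are hypothetical names, and the sketch assumes the string has the form 'package.module.ClassName'.

```python
import importlib


def load_class(path, default):
    """Resolve a dotted 'package.module.ClassName' string to a class.

    Returns `default` if the module or the attribute cannot be found.
    """
    module_name, _, class_name = path.rpartition(".")
    if not module_name:
        # No dot in the string, so there is no module to import.
        return default
    try:
        module = importlib.import_module(module_name)
        return getattr(module, class_name)
    except (ImportError, AttributeError):
        return default


class FallbackTask:
    """Stand-in for a default parent class (hypothetical)."""


print(load_class("collections.OrderedDict", FallbackTask))
print(load_class("no.such.module.Task", FallbackTask))
```

Falling back to a default instead of raising lets a task be instantiated even when a configured parent cannot be imported. Whether that is desirable depends on how strictly the configuration should be validated.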

Initialization

When a task is instantiated, it does the following:

  1. read configuration files and update configuration
  2. register parent tasks

Thereafter, the luigi framework resolves dependencies based on the requires() function, eventually running the tasks.

Job runners

Job runners govern how a task is run. In practice, they do the following:

  1. create an argument list from the args() function
  2. fix path names for outputs, generating temporary file names so that all operations are atomic
  3. submit the command string via subprocess
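
The three steps above can be sketched in plain Python as follows. This is a simplified illustration and not the ratatosk job runner: run_job and demo_args are hypothetical names, and the use of tempfile.mkstemp and os.replace is an assumption about one way to make the output atomic.

```python
import os
import subprocess
import sys
import tempfile


def run_job(args_for, output_path):
    """Run a command that writes to a temporary file, then rename it.

    `args_for` is a hypothetical callable that takes the temporary output
    path and returns the argument list for the command.
    """
    out_dir = os.path.dirname(os.path.abspath(output_path))
    # Create the temporary file next to the final output so that the
    # rename below stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    os.close(fd)
    try:
        subprocess.check_call(args_for(tmp_path))
        # The output appears under its final name only after success.
        os.replace(tmp_path, output_path)
    finally:
        # Clean up the temporary file if the command failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


def demo_args(tmp_output):
    # Stand-in for a task's args(): a Python one-liner that writes a file.
    code = "import sys; open(sys.argv[1], 'w').write('done\\n')"
    return [sys.executable, "-c", code, tmp_output]


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "result.txt")
    run_job(demo_args, target)
    print(open(target).read().strip())
```

Because the final file name only exists once the command has finished successfully, a task whose completeness is judged by the existence of its output is never mistaken for complete after a crash.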