
The main ratatosk task class is a subclass of luigi.Task and is a core component of ratatosk functionality. If you haven’t already done so, read the luigi documentation, in particular the conceptual overview section.

Tasks in ratatosk.job

All ratatosk tasks derive from a common base job task, ratatosk.job.BaseJobTask. The main task classes that the wrapper library tasks use (and should use) are defined in ratatosk.job, including:

  • JobTask: main job task which adds a default job runner to the base job task
  • InputJobTask: a task that depends on an external file
  • JobWrapperTask: task that wraps several tasks into one unit
  • NullJobTask: a task that always completes
  • PipedTask: a task that chains tasks in a pipe
  • PipelineTask: a wrapper task for predefined pipelines

To understand how ratatosk tasks work, it is therefore enough to have a basic understanding of ratatosk.job.BaseJobTask.


Configurable attributes

To begin with, several attributes are essential to the behaviour of all subclasses. Some of them are configurable at run-time (they are luigi.Parameter objects). The most important ones are:

Main configuration file
Custom configuration file that is used for tuning options in predefined pipelines
Program options for wrapped executable, represented by a list
Defines the task on which this task depends, encoded as a string that represents a python module. Several parent tasks can be defined.
The output target name of this task.
The output suffix of this task. Can be a list in case several outputs are produced.
The label that is attached to the resulting output (e.g. file.txt -> file.label.txt)
Path to executable
Name of executable
Name of executable, if applicable
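The label and suffix descriptions above can be illustrated with a small, self-contained sketch. The helper names add_label and replace_suffix are hypothetical and are not part of ratatosk; the sketch only shows the file.txt -> file.label.txt naming convention, under the assumption that the label goes directly before the final extension.

```python
import os.path


def add_label(filename, label):
    """Insert a label before the final extension: file.txt -> file.label.txt.

    Hypothetical helper for illustration; not a ratatosk function.
    """
    root, ext = os.path.splitext(filename)
    # Accept both "label" and ".label" by dropping any leading dot.
    return "{}.{}{}".format(root, label.lstrip("."), ext)


def replace_suffix(filename, suffix):
    """Swap the final extension for a new suffix, e.g. .txt -> .csv."""
    root, _ = os.path.splitext(filename)
    return root + suffix


print(add_label("file.txt", "label"))      # file.label.txt
print(replace_suffix("file.txt", ".csv"))  # file.csv
```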

Non-configurable attributes

In addition, there are a couple of important non-configurable attributes:

Handlers registered to this task
Placeholder for registered parent classes


The most important functions include

Registers classes to _parent_cls. In practice, it parses the string representation of a python module and tries to load the module, falling back on a default class on failure
Reads configuration files and sets the attributes of the task
Defines the output target as a luigi.LocalTarget class
Defines the dependencies.
Defines what the final program command string looks like. This function should often be overridden in subclasses
Helper function that iterates over targets defined by a user supplied function target_generator_handler(). This is the function that enables tasks to compute target file names, and should generate a 3-tuple consisting of (name, merge-prefix, read-prefix)
Calculates source file names from target by adding/subtracting indices and labels
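The parent registration described above (load a class from its string representation, fall back on a default on failure) can be sketched with the standard library alone. The names load_class, register_parents and DefaultParent are assumptions made for this illustration; the actual ratatosk implementation may differ.

```python
import importlib


class DefaultParent(object):
    """Hypothetical fallback class, used when a parent cannot be loaded."""


def load_class(path, default=DefaultParent):
    """Resolve a dotted string such as 'package.module.ClassName' to a class.

    Returns `default` if the string is malformed, the module cannot be
    imported, or the module has no attribute with that name.
    """
    try:
        module_name, cls_name = path.rsplit(".", 1)
        module = importlib.import_module(module_name)
        return getattr(module, cls_name)
    except (ValueError, TypeError, ImportError, AttributeError):
        return default


def register_parents(paths, default=DefaultParent):
    """Load every parent class named in `paths`, preserving order."""
    return [load_class(path, default) for path in paths]


parents = register_parents(["collections.OrderedDict", "no.such.module.Task"])
print([cls.__name__ for cls in parents])
```

Catching the failure and returning a default keeps task construction from aborting on a misspelled parent name, at the cost of hiding the error; a real implementation would probably log a warning at that point.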


When a task is instantiated, it needs to do the following:

  1. read configuration files and update configuration
  2. register parent tasks

Thereafter, the luigi framework resolves dependencies based on the requires() function, eventually running the tasks.
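The instantiation order above can be sketched with a plain Python class. SketchTask, Upstream and Downstream are hypothetical stand-ins rather than ratatosk or luigi classes, and the configuration is assumed to be a nested dictionary keyed by class name. Parent tasks are given as classes instead of strings to keep the sketch short.

```python
class SketchTask(object):
    """Hypothetical stand-in for a job task; not the ratatosk API."""

    options = ()
    parent_classes = ()

    def __init__(self, target, config=None):
        self.target = target
        self.config = config or {}
        self._parent_cls = []
        # Step 1: read configuration and update attributes.
        self.update_config()
        # Step 2: register parent tasks.
        self.register_parents()

    def update_config(self):
        # Only override attributes that the task already declares.
        section = self.config.get(type(self).__name__, {})
        for key, value in section.items():
            if hasattr(self, key):
                setattr(self, key, value)

    def register_parents(self):
        self._parent_cls = list(self.parent_classes)

    def requires(self):
        # A scheduler would call this to discover the dependencies.
        return [cls(self.target, self.config) for cls in self._parent_cls]


class Upstream(SketchTask):
    pass


class Downstream(SketchTask):
    parent_classes = (Upstream,)


task = Downstream("out.txt", {"Downstream": {"options": ("-v",)}})
print(task.options, [type(dep).__name__ for dep in task.requires()])
```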

Job runners

Job runners govern how a task is run. In practice, they do the following:

  1. create argument list from the args function
  2. fix path names for outputs, generating temporary file names so that all operations are atomic
  3. submit the command string via subprocess
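The three steps above can be sketched as follows. This is a minimal illustration using only the standard library, not the ratatosk job runner itself; run_job and example_args are hypothetical names, and the command is assumed to take its output path as an argument. The output is written to a temporary file in the destination directory and renamed into place only if the command succeeds, so a failed run never leaves a partial output behind.

```python
import os
import subprocess
import sys
import tempfile


def run_job(args, output):
    """Run a command against a temporary path, then rename it to `output`.

    `args` is a callable that receives the temporary path and returns the
    argument list to execute.
    """
    out_dir = os.path.dirname(os.path.abspath(output))
    # Create the temporary file next to the final output so that the
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    os.close(fd)
    try:
        subprocess.check_call(args(tmp_path))
        os.replace(tmp_path, output)
    finally:
        # On failure the temporary file is still present; clean it up.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


def example_args(path):
    """Build an argument list for a toy command that writes 'done'."""
    script = "import sys, pathlib; pathlib.Path(sys.argv[1]).write_text('done')"
    return [sys.executable, "-c", script, path]
```

A call such as run_job(example_args, "out.txt") would then leave out.txt in place only after the command has exited with status zero.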