Of handlers and targets

Warning

The target collecting implementation is still under heavy development. The API is likely to change every now and then.

All tasks subclass BaseJobTask and therefore inherit a target option. When a target is passed, the task can be run much like a make target:

ratatosk_run.py Task --target target.txt

Should Task have any further dependencies, these will be calculated on-the-fly and run if incomplete.

Target generator functions

For more complex workflows, such as pipelines, the situation is more complicated. A pipeline typically runs hundreds of targets, and passing each of them as an option argument is cumbersome, to say the least. Therefore, ratatosk implements the concept of target generator functions, whose purpose is to generate a list of experimental units that carry naming information at different levels (such as project, sample, and sample run).

The experimental units are python objects that must inherit from the abstract base class ratatosk.experiment.ISample. This ensures that the target generator function returns objects with defined properties and entries, such as sample and project identifier. The goal is to provide an interface that makes it easy to write any function of choice that is tailored for a given file organization and naming convention. See ratatosk.ext.scilife.sample.generic_target_generator and ratatosk.ext.scilife.sample.target_generator for examples.
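As a rough illustration of the idea, the following self-contained sketch defines an abstract interface and a target generator function that returns objects implementing it. It does not import ratatosk: SampleInterface is a hypothetical stand-in for ratatosk.experiment.ISample, and the method names sample_id and project_id, the Sample class, and the directory layout are assumptions made for this example only. Only the prefix("sample_run") call mirrors usage shown elsewhere on this page.

```python
from abc import ABC, abstractmethod
import os


class SampleInterface(ABC):
    """Hypothetical stand-in for ratatosk.experiment.ISample."""

    @abstractmethod
    def sample_id(self):
        """Return the sample identifier."""

    @abstractmethod
    def project_id(self):
        """Return the project identifier."""

    @abstractmethod
    def prefix(self, level):
        """Return the file name prefix at the given level."""


class Sample(SampleInterface):
    """Assumed layout: <project>/<sample>/<run>/<sample>."""

    def __init__(self, project, sample, run):
        self._project = project
        self._sample = sample
        self._run = run

    def sample_id(self):
        return self._sample

    def project_id(self):
        return self._project

    def prefix(self, level):
        # "sample" gives a per-sample prefix, "sample_run" a per-run prefix
        if level == "sample":
            return os.path.join(self._project, self._sample, self._sample)
        if level == "sample_run":
            return os.path.join(self._project, self._sample, self._run, self._sample)
        raise ValueError("unknown level: %s" % level)


def target_generator(runs):
    """Hypothetical target generator: one experimental unit per (project, sample, run)."""
    return [Sample(project, sample, run) for (project, sample, run) in runs]


targets = target_generator([("P001", "S1", "run1"), ("P001", "S1", "run2")])
```

Because every returned object implements the same interface, downstream code can ask for prefixes at a given level without knowing how files are organized on disk.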

Registering target generator functions

Target generator functions can be registered in two ways:

  1. to backend.__handlers__, which serves as a global container, via the handler.register() function. See handler.setup_global_handlers() for an example of how this is done.

  2. to the task attribute BaseJobTask._handlers via the handler.register_task_handler() function, as implemented in MergeSamFiles.requires and CombineVariants.requires.

    Note

    These target generator functions are used for collecting targets to merge. They currently do not return ISample objects but lists of file names. This inconsistency will be resolved, either by changing the handler name, or by generating the target names from ISample objects within the requires() function.

The target generator functions are provided to ratatosk as option parameters, and can therefore be defined in configuration files. In the above cases, one could use:

settings:
  target_generator_handler: my.module.tgf

ratatosk.lib.tools.gatk.MergeSamFiles:
  target_generator_handler: my.module.collect_bam_files

Adding custom handlers

In general, handler functions and classes are registered by one of the register() functions in ratatosk.handler. Each of these functions takes as input a handler object that implements ratatosk.handler.IHandler. Custom classes and functions can therefore be added by instantiating an IHandler subclass (e.g. RatatoskHandler), passing the string representation of the class/function along with a label descriptor as init arguments:

h = RatatoskHandler(label="HandlerLabel", mod="my.handler.function")
# Will register handler to backend.__handlers__["HandlerLabel"]
register(h)
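Since the handler is identified by a dotted string rather than by the function object itself, the string has to be resolved to a callable at some point. The sketch below shows one common way to do this with the standard library. It is not ratatosk's implementation: load_handler, register_by_label, and the handlers dictionary (which plays the role of backend.__handlers__) are hypothetical names, and os.path.basename merely stands in for a user-supplied function such as my.handler.function.

```python
import importlib


def load_handler(dotted_path):
    """Resolve 'package.module.name' to the object it names."""
    module_name, _, attr = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Hypothetical registry keyed by label, playing the role of backend.__handlers__
handlers = {}


def register_by_label(label, dotted_path):
    """Store the resolved callable under the given label."""
    handlers[label] = load_handler(dotted_path)


# os.path.basename stands in for a user-defined handler function
register_by_label("HandlerLabel", "os.path.basename")
print(handlers["HandlerLabel"]("/data/sample.bam"))
```

Referring to handlers by string is what allows them to be named in configuration files, since the module only needs to be importable when the handler is resolved.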

Keeping track of information in pipelines

Finally, a few words on how targets are collected and handled in the pipeline modules. First, targets are loaded via the global target generator function (registered in backend.__handlers__["target_generator_handler"]). Then, to make the targets accessible to all task-specific target generator handlers, they are stored in backend.__global_vars__["targets"]. For instance, a list of bam files to merge could then be generated as follows:

sample_runs = backend.__global_vars__.get("targets")
# Build one file name per sample run; set() removes duplicates
bam_list = list(set([x.prefix("sample_run") + task.suffix for x in sample_runs]))
return bam_list

This is (almost) how ratatosk.ext.scilife.sample.collect_sample_runs() works.
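The pattern can be tried out in isolation with the following sketch. It does not use ratatosk: global_vars stands in for backend.__global_vars__, and the Run class, the Task tuple, and the function name collect_bam_files are hypothetical stand-ins for the real ISample objects, task instances, and handler. Two small deviations from the snippet above are deliberate: a missing "targets" entry is treated as an empty list, and the result is sorted so that the order is deterministic.

```python
from collections import namedtuple

# Hypothetical stand-ins for backend.__global_vars__ and a task object
global_vars = {}
Task = namedtuple("Task", ["suffix"])


class Run:
    """Minimal experimental unit exposing only prefix()."""

    def __init__(self, sample_run_prefix):
        self._prefix = sample_run_prefix

    def prefix(self, level):
        # Only the "sample_run" level is modelled in this sketch
        return self._prefix


def collect_bam_files(task):
    """Return unique, sorted file names for all stored sample runs."""
    sample_runs = global_vars.get("targets") or []
    return sorted(set(x.prefix("sample_run") + task.suffix for x in sample_runs))


# The global target generator would normally populate this entry
global_vars["targets"] = [
    Run("P001/S1/run1/S1"),
    Run("P001/S1/run2/S1"),
    Run("P001/S1/run1/S1"),  # duplicate, removed by set()
]
bam_list = collect_bam_files(Task(suffix=".sort.bam"))
```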