This section contains examples that are slightly outdated. They were originally used as examples on the GitHub site and are kept here for reference.

Examples in tests

These examples are currently based on the tests in ratatosk.tests.test_commands and ratatosk.tests.test_wrappers.

Alignment with bwa sampe

Here’s a more useful example: paired-end alignment using bwa.

    nosetests -v -s

[image: bwa sampe]

Figure 2. Read alignment with bwa.

Wrapping up metrics tasks

The class subclasses ratatosk.job.JobWrapperTask that can be used to require that several tasks have completed. Here I’ve used it to group picard metrics tasks:

    nosetests -v -s

[image: picard metrics]

Figure 3. Summarizing metrics with a wrapper task

Here, I’ve set the option --use-long-names=False, which changes the output to show only the class names for each task. This example utilizes a configuration file that links tasks together. More about that in the next example.
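The grouping idea behind the wrapper task can be sketched in plain Python. This is illustrative only, not ratatosk's actual implementation (which subclasses luigi's task machinery); the class names mirror the picard metrics tasks, and the `Task` base class is a minimal stand-in:

```python
class Task:
    """Minimal stand-in for a luigi-style task (illustrative only)."""
    def requires(self):
        return []
    def run(self):
        pass

class HsMetrics(Task): pass
class InsertMetrics(Task): pass
class AlignmentMetrics(Task): pass

class PicardMetrics(Task):
    """Wrapper task: groups several metrics tasks under one umbrella.

    It has no run() body of its own; it is considered complete once
    every task returned by requires() is complete.
    """
    def requires(self):
        return [HsMetrics(), InsertMetrics(), AlignmentMetrics()]

print([d.__class__.__name__ for d in PicardMetrics().requires()])
```

The design point is that the wrapper contributes no work itself, so scheduling it is equivalent to scheduling all of its requirements at once.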

Examples with

NB: these examples don’t actually do anything except plot the dependencies. To actually run the pipelines, see the examples in the extension module ratatosk.ext.scilife.

Dry run

The --dry-run option will resolve dependencies but not actually run anything; in addition, it prints the tasks that would be called. By passing a target

    RawIndelRealigner --target sample.merge.realign.bam
        --custom-config /path/to/ratatosk/examples/J.Doe_00_01.yaml --dry-run

we get the dependencies as specified in the config file:

[image: dry run]

Figure 1. Dry run output.

The task RawIndelRealigner is defined in ratatosk.pipeline.haloplex and is a modified version of IndelRealigner. It is used for analysis of HaloPlex data.
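Conceptually, a dry run is just a depth-first walk over each task's requires() that collects task names instead of calling run(). A rough sketch of that idea in pure Python (the task classes here are toy stand-ins, not ratatosk's real ones):

```python
def dry_run(task, seen=None):
    """Resolve dependencies depth-first and return the names of the
    tasks that *would* run, without calling any run() method."""
    if seen is None:
        seen = []
    for dep in task.requires():
        dry_run(dep, seen)
    name = task.__class__.__name__
    if name not in seen:       # report each task only once
        seen.append(name)
    return seen

class Task:
    def requires(self):
        return []

class IndelRealigner(Task): pass

class RawIndelRealigner(Task):
    """Toy stand-in: depends on IndelRealigner."""
    def requires(self):
        return [IndelRealigner()]

print(dry_run(RawIndelRealigner()))
```

Dependencies are listed before the tasks that require them, which matches the order in which a scheduler would actually execute them.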

Merging samples over several runs

Samples that have data from two separate runs should be merged. The MergeSamFiles task merges sample_run files and places the result in the sample directory. To do so, it needs information on how to find the files to merge. This is currently done by registering a handler via the configuration option target_generator_handler. In the custom configuration file J.Doe_00_01.yaml, we have
    target_generator_handler: test.site_functions.collect_sample_runs

where the function test.site_functions.collect_sample_runs() is defined as

def collect_sample_runs(task):
    # NB: the second entry was truncated in the original snippet; the
    # fc2 file name below is the assumed completion.
    return ["sample/fc1/sample.sort.bam",
            "sample/fc2/sample.sort.bam"]

This can be any Python function; the only requirement is that it return a list of source file names. The task could be run as follows:

    MergeSamFiles --target sample.sort.merge.bam
        --config-file /path/to/ratatosk/examples/J.Doe_00_01.yaml

resulting in the following (dry run version shown here):

[image: dry run]

Figure 2. Dry run output of merging.
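The target_generator_handler value is a dotted path that gets resolved to a callable at run time. A minimal sketch of that resolution using importlib (the exact mechanism ratatosk uses may differ; the demo uses a stdlib function since `test.site_functions` is not importable here):

```python
import importlib

def resolve_handler(dotted_path):
    """Resolve 'package.module.function' to the function object."""
    module_name, _, func_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)

# Demonstrate with a stdlib function; in ratatosk the configured path
# would be e.g. "test.site_functions.collect_sample_runs".
join = resolve_handler("posixpath.join")
print(join("sample", "fc1", "sample.sort.bam"))
```

Registering handlers this way keeps site-specific file-collection logic out of the library: the configuration file names the function, and the pipeline only assumes it returns a list of source file names.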

Adding adapter trimming

Changing the following configuration settings (see J.Doe_00_01_trim.yaml):

    parent_task: ratatosk.lib.utils.cutadapt.CutadaptJobTask

    parent_task: ratatosk.lib.utils.misc.ResyncMatesJobTask

and running

    MergeSamFiles
        --target P001_101_index3/P001_101_index3.trimmed.sync.sort.merge.bam
        --config-file ~/opt/ratatosk/examples/J.Doe_00_01_trim.yaml

runs the same pipeline as before, but on adapter-trimmed data.

[image: dry run]

Figure 3. Adding adapter trimming
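The reason a one-line configuration change can splice trimming into the pipeline is that each task looks up its upstream dependency through a configurable parent_task attribute instead of hard-coding it. A toy illustration of that wiring (all class names here are hypothetical stand-ins, not ratatosk's real tasks):

```python
class Task:
    parent_task = None  # upstream dependency; set via config in ratatosk

    def requires(self):
        return [self.parent_task()] if self.parent_task else []

class Cutadapt(Task): pass

class ResyncMates(Task):
    parent_task = Cutadapt   # e.g. set by the _trim.yaml config

class Sort(Task):
    parent_task = ResyncMates

def resolve_chain(task_cls):
    """Follow parent_task links upstream and return the class names."""
    chain = []
    t = task_cls()
    while True:
        chain.append(t.__class__.__name__)
        deps = t.requires()
        if not deps:
            return chain
        t = deps[0]

print(" <- ".join(resolve_chain(Sort)))
```

Pointing parent_task at a different class rewires the dependency chain without touching any task code, which is exactly what the J.Doe_00_01_trim.yaml overrides above do.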

Extending workflows with subclassed tasks

It’s dead simple to add tasks of a given type. Say you want to calculate hybrid selection metrics on BAM files both with and without duplicates marked. By subclassing an existing task and giving the new class its own configuration file location, you can configure the new task to depend on whatever you want. I have added the following class:

class HsMetricsNonDup(HsMetrics):
    """Run on non-deduplicated data"""
    parent_task = luigi.Parameter(default="")

and a picard metrics wrapper task

class PicardMetricsNonDup(JobWrapperTask):
    """Runs hs metrics on both duplicated and de-duplicated data"""
    def requires(self):
        # NB: the target argument was garbled in the original snippet;
        # target=self.target is the assumed reconstruction throughout.
        return [InsertMetrics(target=self.target + str(InsertMetrics.target_suffix.default[0])),
                HsMetrics(target=self.target + str(HsMetrics.target_suffix.default)),
                HsMetricsNonDup(target=rreplace(self.target, str(DuplicationMetrics.label.default), "", 1) + str(HsMetrics.target_suffix.default)),
                AlignmentMetrics(target=self.target + str(AlignmentMetrics.target_suffix.default))]
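The rreplace helper used above replaces occurrences of a substring starting from the right, so that only the trailing deduplication label is stripped from the target name. A possible implementation (ratatosk's actual version may differ) is a one-liner over str.rsplit:

```python
def rreplace(s, old, new, count):
    """Replace the last `count` occurrences of `old` in `s` with `new`."""
    return new.join(s.rsplit(old, count))

# Strip the trailing ".dup" label to point at the non-deduplicated file:
print(rreplace("P001_101_index3.sort.merge.dup", ".dup", "", 1))
```

Replacing from the right matters here: a plain str.replace would also remove any earlier occurrence of the label in the file name, not just the final one.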

The task can be configured by adding a configuration subsection PicardMetricsNonDup to the picard configuration section. In the configuration file J.Doe_00_01_nondup.yaml we have:

Running

    PicardMetricsNonDup --target P001_101_index3/P001_101_index3.sort.merge.dup
        --config-file ~/opt/ratatosk/examples/J.Doe_00_01_nondup.yaml

will add hybrid selection metrics calculated on the non-deduplicated BAM file for sample P001_101_index3:

[image: dry run]

Figure 4. Adding custom tasks.