# Examples

Note

This section contains examples that are slightly outdated. They were initially used as examples on the GitHub site and are kept here for reference.

## Examples in tests

These examples are currently based on the tests in ratatosk.tests.test_commands and ratatosk.tests.test_wrappers.

### Alignment with bwa sampe

Here’s a more useful example: paired-end alignment using bwa.

    nosetests -v -s test_commands.py:TestCommand.test_bwasampe


Figure 2. Read alignment with bwa.

The class ratatosk.lib.tools.picard.PicardMetrics subclasses ratatosk.job.JobWrapperTask, a task type that can be used to require that several other tasks have completed. Here I’ve used it to group picard metrics tasks:

    nosetests -v -s test_commands.py:TestCommand.test_picard_metrics


Figure 3. Summarizing metrics with a wrapper task

Here, I’ve set the option --use-long-names=False, which changes the output to show only the class names for each task. This example utilizes a configuration file that links tasks together. More about that in the next example.
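
The wrapper-task idea can be illustrated without ratatosk at all. The following is a minimal sketch, not the ratatosk implementation: the `Task` and `WrapperTask` base classes, the `MetricsGroup` class, and the file suffixes are all hypothetical stand-ins. It only shows the pattern of a task that does no work itself and counts as complete once everything it requires is complete.

```python
class Task:
    """Stand-in for a task that produces one target file (not the ratatosk class)."""

    def __init__(self, target):
        self.target = target
        self.done = False  # a real task would check whether the target exists

    def requires(self):
        return []

    def complete(self):
        return self.done


class WrapperTask(Task):
    """Does no work of its own; complete once every required task is complete."""

    def complete(self):
        return all(dep.complete() for dep in self.requires())


class InsertMetrics(Task):
    """Stand-in metrics task."""


class AlignmentMetrics(Task):
    """Stand-in metrics task."""


class MetricsGroup(WrapperTask):
    """Hypothetical wrapper that groups two metrics tasks for one target prefix."""

    def __init__(self, target):
        super().__init__(target)
        # The suffixes are assumptions made for this sketch.
        self._deps = [
            InsertMetrics(target + ".insert_metrics"),
            AlignmentMetrics(target + ".align_metrics"),
        ]

    def requires(self):
        return self._deps


if __name__ == "__main__":
    group = MetricsGroup("sample.sort")
    print(group.complete())  # False: no dependency has finished yet
    for dep in group.requires():
        dep.done = True
    print(group.complete())  # True: all dependencies are done
```

Requesting the wrapper is then enough to pull in every grouped task, which is why a single target can drive all the metrics calculations.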

## Examples with ratatosk_run.py

NB: these examples don’t actually do anything except plot the dependencies. To actually run the pipelines, see the examples in the extension module ratatosk.ext.scilife.

### Dry run

The --dry-run option will resolve dependencies but not actually run anything. In addition, it will print the tasks that will be called. By passing a target

    ratatosk_run.py RawIndelRealigner --target sample.merge.realign.bam \
      --custom-config /path/to/ratatosk/examples/J.Doe_00_01.yaml --dry-run


we get the dependencies as specified in the config file:

Figure 1. Dry run output.

The task RawIndelRealigner is defined in ratatosk.pipeline.haloplex and is a modified version of IndelRealigner. It is used for analysis of HaloPlex data.
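
What a dry run does conceptually can be sketched in a few lines. This is not how ratatosk_run.py is implemented; the `Task`, `Align`, `Sort` and `Merge` classes and the `dry_run` function are hypothetical, and the suffix handling is an assumption chosen for the illustration. The sketch walks the `requires()` graph depth-first and lists the tasks in the order they would be called, without running any of them.

```python
class Task:
    """Stand-in task with a target and a list of required tasks."""

    def __init__(self, target):
        self.target = target

    def requires(self):
        return []


class Align(Task):
    """Hypothetical first step; has no dependencies."""


class Sort(Task):
    def requires(self):
        # sample.sort.bam depends on sample.bam
        return [Align(self.target.replace(".sort.bam", ".bam"))]


class Merge(Task):
    def requires(self):
        # sample.sort.merge.bam depends on sample.sort.bam
        return [Sort(self.target.replace(".merge.bam", ".bam"))]


def dry_run(task, seen=None):
    """Return task descriptions, dependencies first, without running anything."""
    if seen is None:
        seen = []
    for dep in task.requires():
        dry_run(dep, seen)
    name = "{}(target={})".format(type(task).__name__, task.target)
    if name not in seen:  # list a shared dependency only once
        seen.append(name)
    return seen


if __name__ == "__main__":
    for line in dry_run(Merge("sample.sort.merge.bam")):
        print(line)
```

Because each task derives the target of its dependency from its own target, passing a single target name is enough to resolve the whole chain.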

### Merging samples over several runs

Samples that have data from two separate runs should be merged. The class ratatosk.lib.tools.picard.MergeSamFiles merges sample_run files and places the result in the sample directory. The MergeSamFiles task needs information on how to find files to merge. This is currently done by registering a handler via the configuration option target_generator_handler. In the custom configuration file J.Doe_00_01.yaml, we have

    ratatosk.lib.tools.picard:
      MergeSamFiles:
        target_generator_handler: test.site_functions.collect_sample_runs


where the function test.site_functions.collect_sample_runs() is defined as

    def collect_sample_runs(task):
        """Return the per-run source files that should be merged."""
        return ["sample/fc1/sample.sort.bam",
                "sample/fc2/sample.sort.bam"]


This can be any Python function; the only requirement is that it returns a list of source file names. The task could be run as follows

    ratatosk_run.py MergeSamFiles --target sample.sort.merge.bam \
      --config-file /path/to/ratatosk/examples/J.Doe_00_01.yaml


resulting in (dry run version shown here)

Figure 2. Dry run output of merging.
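
Since the handler can be any Python function that returns a list of source file names, it does not have to hard-code the paths. The sketch below makes two assumptions that are not taken from ratatosk: that the runs are laid out as `<sample>/<run>/<sample>.sort.bam`, and that a dotted name such as the one given to target_generator_handler is resolved with `importlib`. The names `collect_sample_runs_glob`, `load_handler` and the `root` parameter are hypothetical.

```python
import glob
import importlib
import os


def collect_sample_runs_glob(task, root="."):
    """Find <sample>/<run>/<sample>.sort.bam files for the task's target.

    Assumes the sample name is the part of the target file name before
    the first dot, e.g. "sample" for "sample.sort.merge.bam".
    """
    sample = os.path.basename(task.target).split(".")[0]
    pattern = os.path.join(root, sample, "*", sample + ".sort.bam")
    return sorted(glob.glob(pattern))


def load_handler(dotted_name):
    """Resolve "package.module.function" to the function object.

    This is one plausible way to turn a configuration string into a
    callable; it is not claimed to be what ratatosk does internally.
    """
    module_name, func_name = dotted_name.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, func_name)
```

A handler written this way picks up new runs automatically as long as they follow the assumed directory layout.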

Changing the following configuration section (see J.Doe_00_01_trim.yaml):

    ratatosk.lib.utils.misc:
      ResyncMates:

    ratatosk.lib.align.bwa:
      Aln:


and running

    ratatosk_run.py MergeSamFiles \
      --target P001_101_index3/P001_101_index3.trimmed.sync.sort.merge.bam \
      --config-file ~/opt/ratatosk/examples/J.Doe_00_01_trim.yaml


runs the same pipeline as before, but on adapter-trimmed data.

### Extending workflows with subclassed tasks

It’s dead simple to add tasks of a given type. Say you want to calculate hybrid selection metrics on bam files both with and without duplicate marking. By subclassing an existing task and giving the new class its own configuration file location, you can configure the new task to depend on whatever you want. In ratatosk.lib.tools.picard I have added the following class:

    class HsMetricsNonDup(HsMetrics):
        """Run on non-deduplicated data"""


and a picard metrics wrapper task

    class PicardMetricsNonDup(JobWrapperTask):
        """Runs hs metrics on both duplicated and de-duplicated data"""
        def requires(self):
            return [
                InsertMetrics(target=self.target + str(InsertMetrics.target_suffix.default[0])),
                HsMetrics(target=self.target + str(HsMetrics.target_suffix.default)),
                # Strip the last duplication label from the target before adding the suffix
                HsMetricsNonDup(target=rreplace(self.target, str(DuplicationMetrics.label.default), "", 1) + str(HsMetrics.target_suffix.default)),
                AlignmentMetrics(target=self.target + str(AlignmentMetrics.target_suffix.default)),
            ]
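
The target arithmetic in `requires()` is easier to follow with concrete strings. The sketch below is an illustration under assumptions: `rreplace` is written here as a helper that replaces the last occurrences of a substring, which is what the call above suggests but is not copied from ratatosk, and the values `.dup` and `.hs_metrics` are assumed stand-ins for `DuplicationMetrics.label.default` and `HsMetrics.target_suffix.default`.

```python
def rreplace(s, old, new, count):
    """Replace the last `count` occurrences of `old` in `s` with `new`."""
    return new.join(s.rsplit(old, count))


target = "P001_101_index3/P001_101_index3.sort.merge.dup"
dup_label = ".dup"         # assumed value of DuplicationMetrics.label.default
hs_suffix = ".hs_metrics"  # assumed value of HsMetrics.target_suffix.default

# Metrics on the duplicate-marked file: just append the suffix.
hs_target = target + hs_suffix

# Metrics on the file before duplicate marking: drop the last label first.
nondup_target = rreplace(target, dup_label, "", 1) + hs_suffix

print(hs_target)
print(nondup_target)
```

Under these assumptions the two hybrid selection tasks get different targets, so both can be required by the same wrapper without clashing.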


The task can be configured by adding a configuration subsection PicardMetricsNonDup to the picard configuration section. In the configuration file J.Doe_00_01_nondup.yaml we have:

    ratatosk.lib.tools.picard:
      PicardMetricsNonDup:

The task can then be run with

    ratatosk_run.py PicardMetricsNonDup --target P001_101_index3/P001_101_index3.sort.merge.dup