
# Writing a pipeline

One of the main uses of ratatosk is to write an analysis pipeline. The steps involved are best explained by following an example.

## Example: a variant calling pipeline

Assume we want to define a pipeline that performs the following tasks:

1. align reads to a reference with bwa
2. merge reads from several runs
3. generate summary statistics

Furthermore, assume we have the following input files (you’ll actually find these files in the test directory):

```
sample1_run1_1.fastq.gz
sample1_run1_2.fastq.gz
sample1_run2_1.fastq.gz
sample1_run2_2.fastq.gz
```
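The file names encode a sample, a run and a read number. That structure matters later: read pairs are aligned together, and runs are merged per sample. The sketch below groups the inputs accordingly. It assumes the naming scheme `<sample>_<run>_<read>.fastq.gz`. The function `group_read_pairs` is hypothetical and not part of ratatosk.

```python
import re
from collections import defaultdict

# Assumed naming scheme: <sample>_<run>_<read>.fastq.gz, where <read> is 1 or 2.
FASTQ_NAME = re.compile(r"^(?P<sample>[^_]+)_(?P<run>[^_]+)_(?P<read>[12])\.fastq\.gz$")


def group_read_pairs(filenames):
    """Return {sample: {run: (read1_file, read2_file)}} for paired FASTQ files."""
    reads = defaultdict(dict)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        reads[(match["sample"], match["run"])][match["read"]] = name

    pairs = defaultdict(dict)
    for (sample, run), by_read in reads.items():
        # Raises KeyError if one mate of a pair is missing.
        pairs[sample][run] = (by_read["1"], by_read["2"])
    return dict(pairs)


files = [
    "sample1_run1_1.fastq.gz",
    "sample1_run1_2.fastq.gz",
    "sample1_run2_1.fastq.gz",
    "sample1_run2_2.fastq.gz",
]
result = group_read_pairs(files)
print(result)
```

Here sample1 has two runs, each with a read pair. That is why the pipeline needs a merge step after alignment.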


Setting up dependencies between tasks requires two things:

1. specifying the parent task (i.e. the requirement)
2. keeping track of the output file name (i.e. the target)
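The sketch below shows how a parent link and a target together determine what needs to run. It uses a hypothetical `Task` dataclass and `tasks_to_run` function, not ratatosk's or luigi's classes. Starting from the requested task, it follows parent links and stops at the first task whose target already exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Hypothetical stand-in for a pipeline task (not ratatosk's own class)."""

    name: str
    target: str  # the output file this task produces
    parent: Optional["Task"] = None  # the task whose target this one reads


def tasks_to_run(task, existing_targets):
    """Follow parent links back from the requested task.

    Stops at the first task whose target already exists, since everything
    upstream of it is already done. Returns the tasks in execution order.
    """
    pending = []
    current = task
    while current is not None and current.target not in existing_targets:
        pending.append(current)
        current = current.parent
    return list(reversed(pending))


fastq = Task("InputFastqFile", "test/data/sample1_run1_1.fastq.gz")
aln = Task("Aln", "test/data/sample1_run1_1.sai", parent=fastq)

# The FASTQ file comes from an external process, so it already exists.
plan = tasks_to_run(aln, existing_targets={fastq.target})
print([t.name for t in plan])  # ['Aln']
```

This is a simplification: a sampe task has two parents (one .sai file per read). A fuller sketch would store a list of parents and traverse a graph rather than a chain.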

I’ll also assume that we use a configuration file called config.yaml, located in the test directory, and that all commands are run from the ratatosk installation directory.

### 1. Setting up alignment dependency

Note: currently you need to index the reference manually with bwa index.

Here we use the bwa.Aln class. Its default parent_task is ratatosk.lib.align.bwa.InputFastqFile. This is a task without requirements: it simply represents a FASTQ file generated by some external process. Strictly speaking, we therefore don’t need to configure the parent task at all. We do, however, need to set bwaref so that it points to the reference, which is also located in the test directory. For clarity, we also set the parent task explicitly in our config file:

```yaml
bwa:
  bwaref: test/data/chr11.fa
  Aln:
    - ratatosk.lib.align.bwa.InputFastqFile
```
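The parent task is given in the config as a fully qualified class name. One common way to turn such a string into a class is with `importlib`. This is a sketch of that general technique and not necessarily how ratatosk resolves it. The helper `resolve_class` is hypothetical, and a standard-library class stands in so the example runs without ratatosk installed.

```python
import importlib


def resolve_class(dotted_path):
    """Turn 'package.module.ClassName' into the class object it names."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected a fully qualified name, got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# With ratatosk installed, "ratatosk.lib.align.bwa.InputFastqFile" would be
# resolved the same way; here a standard-library class stands in.
cls = resolve_class("collections.OrderedDict")
print(cls)  # <class 'collections.OrderedDict'>
```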


Running the command ratatosk_run.py Aln --config-file test/config.yaml --target test/data/sample1_run1_1.sai will execute bwa aln -t 1 test/data/chr11.fa test/data/sample1_run1_1.fastq.gz > test/data/sample1_run1_1.sai-luigi-tmp-7165456595, writing the output to a temporary file. On success, the temporary file is renamed to the target file name, test/data/sample1_run1_1.sai. See Figure 1 for the corresponding graph.
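Writing to a temporary file and renaming it only on success ensures that a failed command never leaves behind a file that looks like a finished target. The sketch below shows this general write-then-rename pattern with the standard library. It is not luigi's implementation, and the temporary-name format differs from the one shown above. The function `write_atomically` is hypothetical.

```python
import os
import tempfile


def write_atomically(target, write_output):
    """Write output to a temporary file next to target, then rename it.

    write_output is any callable that takes an open binary file handle.
    If it raises, the temporary file is deleted and target is never created.
    """
    directory = os.path.dirname(os.path.abspath(target))
    prefix = os.path.basename(target) + "-tmp-"
    fd, tmp_path = tempfile.mkstemp(prefix=prefix, dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            write_output(handle)
        # Same directory, so same file system: the rename is atomic on POSIX.
        os.replace(tmp_path, target)
    except BaseException:
        os.remove(tmp_path)
        raise


def failing_step(handle):
    """Simulate a command that writes partial output and then fails."""
    handle.write(b"partial output")
    raise RuntimeError("simulated failure")


with tempfile.TemporaryDirectory() as workdir:
    good = os.path.join(workdir, "sample1_run1_1.sai")
    write_atomically(good, lambda handle: handle.write(b"alignment data"))
    after_success = sorted(os.listdir(workdir))

    try:
        write_atomically(os.path.join(workdir, "broken.sai"), failing_step)
    except RuntimeError:
        pass
    after_failure = sorted(os.listdir(workdir))

print(after_success)  # ['sample1_run1_1.sai']
print(after_failure)  # ['sample1_run1_1.sai']
```

Because the target only appears after a successful run, "the target exists" can safely be used as "the task is complete" when deciding what to rerun.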

Figure 1. Alignment.

### 2. Setting up sampe dependency

Next, we run bwa sampe on the pair of .sai files to generate the output target sample1_run1.sam. To get from the .sai file names to this target, the read suffixes (_1 and _2) must be stripped. The suffixes are defined in the ratatosk.lib.align.bwa.Aln task by the options read1_suffix and read2_suffix.
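The mapping between the .sai files and the sampe target goes in both directions. Stripping a read suffix from a .sai name gives the .sam target, and adding both suffixes back to the target gives the two parent files. A minimal sketch, assuming the default suffixes _1 and _2 and the .sai/.sam extensions used in this example. Both functions are hypothetical and not part of ratatosk.

```python
import os


def sampe_parent_targets(sam_target, read1_suffix="_1", read2_suffix="_2"):
    """Return the two .sai files that a sampe target such as sample1_run1.sam needs."""
    prefix, ext = os.path.splitext(sam_target)
    if ext != ".sam":
        raise ValueError(f"expected a .sam target, got {sam_target!r}")
    return prefix + read1_suffix + ".sai", prefix + read2_suffix + ".sai"


def sampe_target(sai_file, read1_suffix="_1", read2_suffix="_2"):
    """Strip the read suffix from a .sai file name to get the paired .sam name."""
    prefix, _ext = os.path.splitext(sai_file)
    for suffix in (read1_suffix, read2_suffix):
        if prefix.endswith(suffix):
            return prefix[: -len(suffix)] + ".sam"
    raise ValueError(f"{sai_file!r} does not end in a read suffix")


print(sampe_parent_targets("test/data/sample1_run1.sam"))
# ('test/data/sample1_run1_1.sai', 'test/data/sample1_run1_2.sai')
print(sampe_target("test/data/sample1_run1_2.sai"))
# test/data/sample1_run1.sam
```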

bwa:
bwaref: test/data/chr11.fa
Aln:
- ratatosk.lib.align.bwa.InputFastqFile
Sampe: