Operators
Nextflow uses operators to filter, transform, split, combine and carry out mathematical operations on channels.
We will cover some of the most commonly used operators below, using dummy files.
Dummy files are empty files that contain file extensions we can test within the script.
Warning
One of the most common mistakes is to test the workflow on a full size dataset. This can be extremely time consuming and burns through uneccessary computational resources.
Map
The map{} operator performs a mapping function on an input channel. Conceptually, map allows you to re-organise the structure of a channel.
Hint
nextflow uses 0 based indexing
#!/usr/bin/env nextflow
Channel.from( ['A', 1, 2], ['B', 3, 4] )
.map{ it -> it[0] }
.view()
Channel.from( ['A', 1, 2], ['B', 3, 4] )
.map{ it -> [ it[1], it[2] ] }
.view()
$nextflow run map.nf
N E X T F L O W ~ version 21.04.1
Launching `map.nf` [jovial_stallman] - revision: 476751b062
A
B
[1, 2]
[3, 4]
Join
The join() operator combines two channels according to a common tuple key. The order in which you supply channels to join() matters:
#!/usr/bin/env nextflow
ch_genes = Channel.from( ['SRR0001', 'SRR0001_mRNA.txt'], ['SRR0002', 'SRR0002_mRNA.txt'] )
.view()
ch_mirna = Channel.from( ['SRR0001', 'SRR0001_miRNA.txt'], ['SRR0002', 'SRR0002_miRNA.txt'] )
.view()
all_files = ch_genes.join(ch_mirna).view()
$ nextflow run map.nf
N E X T F L O W ~ version 21.04.1
Launching `join.nf` [gloomy_elion] - revision: 85b961030d
[SRR0001, SRR0001_mRNA.txt]
[SRR0002, SRR0002_mRNA.txt]
[SRR0001, SRR0001_miRNA.txt]
[SRR0002, SRR0002_miRNA.txt]
[SRR0001, SRR0001_mRNA.txt, SRR0001_miRNA.txt]
[SRR0002, SRR0002_mRNA.txt, SRR0002_miRNA.txt]
BaseName
Those familiar with bash will recognise commands such as basename /path/to/file.txt, ${VAR%pattern} to strip the path and file extension, respectively.
In nextflow, the same can be achieved using Name, baseName, simpleName and Extension.
Let’s use it in conjunction with map{}:
Note
This operation must be performed on a file, not a string. We must read in a dummy file using fromPath(). Don’t get too caught up on this, I am just demonstrating the functions.
#!/usr/bin/env nextflow
Channel.fromPath( "dummy_files/SRR0001_R{1,2}.fastq.gz" )
.view()
.map{ it -> [ it.Name, it.baseName, it.simpleName, it.Extension ] }
.view()
nextflow run map.nf
N E X T F L O W ~ version 21.04.1
Launching `map.nf` [curious_newton] - revision: cd2c4772e7
/data/test/dummy_files/SRR0001_R1.fastq.gz
/data/test/dummy_files/SRR0001_R2.fastq.gz
[SRR0001_R1.fastq.gz, SRR0001_R1.fastq, SRR0001_R1, gz]
[SRR0001_R2.fastq.gz, SRR0001_R2.fastq, SRR0001_R2, gz]
Flatten
The flatten() operator will transform channels in a manner such that each item in the channel is output one by one.
Say for example we wanted to feed in our fastq files one by one to a process (each process is run in parallel - this could speed up our workflow) we would use flatten().
Let’s use the dummy files as an example:
#!/usr/bin/env nextflow
Channel.fromFilePairs( "dummy_files/SRR000*_R{1,2}.fastq.gz" )
.map{ it -> [ it[1][0], it[1][1] ] }
.flatten()
.view()
$nextflow run map.nf
N E X T F L O W ~ version 21.04.1
Launching `map.nf` [nice_sinoussi] - revision: 403faf87e0
/data/test/dummy_files/SRR0002_R1.fastq.gz
/data/test/dummy_files/SRR0002_R2.fastq.gz
/data/test/dummy_files/SRR0007_R1.fastq.gz
/data/test/dummy_files/SRR0007_R2.fastq.gz
/data/test/dummy_files/SRR0003_R1.fastq.gz
/data/test/dummy_files/SRR0003_R2.fastq.gz
/data/test/dummy_files/SRR0004_R1.fastq.gz
/data/test/dummy_files/SRR0004_R2.fastq.gz
/data/test/dummy_files/SRR0009_R1.fastq.gz
/data/test/dummy_files/SRR0009_R2.fastq.gz
/data/test/dummy_files/SRR0008_R1.fastq.gz
/data/test/dummy_files/SRR0008_R2.fastq.gz
/data/test/dummy_files/SRR0006_R1.fastq.gz
/data/test/dummy_files/SRR0006_R2.fastq.gz
/data/test/dummy_files/SRR0001_R1.fastq.gz
/data/test/dummy_files/SRR0001_R2.fastq.gz
/data/test/dummy_files/SRR0005_R1.fastq.gz
/data/test/dummy_files/SRR0005_R2.fastq.gz