Channels are a key data structure of Nextflow that allows the implementation of reactive-functional oriented computational workflows based on the Dataflow programming paradigm.
They are used to logically connect tasks to each other or to implement functional style data transformations.
5.1 Channel types
Nextflow distinguishes two different kinds of channels: queue channels and value channels.
5.1.1 Queue channel
A queue channel is an asynchronous unidirectional FIFO queue that connects two processes or operators.
- asynchronous means that operations are non-blocking.
- unidirectional means that data flows from a producer to a consumer.
- FIFO means that the data is guaranteed to be delivered in the same order as it is produced. First In, First Out.
Try the following snippets:
- Use the built-in print line function `println` to print the `ch` channel.
- Apply the `view` channel operator to the `ch` channel to print each item emitted by the channel.
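A minimal sketch of the kind of snippet these notes describe, assuming a queue channel named `ch` that emits three integers (the values are illustrative):

```groovy
// Create a queue channel emitting three values (illustrative values)
def ch = Channel.of(1, 2, 3)

// Prints the channel object itself, not the items it emits
println(ch)

// Prints each item emitted by the channel, one per line
ch.view()
```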
Try to execute this snippet. You can do that by creating a new `.nf` file or by editing an existing one.
5.1.2 Value channels
A value channel (a.k.a. singleton channel) is by definition bound to a single value, and it can be read an unlimited number of times without consuming its contents. A value channel is created using the `value` channel factory or by operators returning a single value, such as `first`, `last`, `collect`, `count`, `min`, `max`, `reduce`, and `sum`.
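To better understand the difference between value and queue channels, consider a script in which a process takes two queue channels as input: one that emits several values and another, `ch2`, that emits a single value.
When you run such a script, the process produces only one result, and it prints only 2.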
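A sketch of such a script, under stated assumptions: the process name `SUM`, the channel name `ch1`, and the emitted values are illustrative choices that reproduce the behaviour described here.

```groovy
// Hypothetical process that adds its two inputs in the shell
process SUM {
    input:
    val x
    val y

    output:
    stdout

    script:
    """
    echo \$(($x+$y))
    """
}

workflow {
    ch1 = Channel.of(1, 2, 3)   // queue channel with three elements (assumed values)
    ch2 = Channel.of(1)         // queue channel with a single element

    // Only one task runs (1 + 1), because ch2 is exhausted after its first element
    SUM(ch1, ch2).view()
}
```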
A process will only instantiate a task when there are elements to be consumed from all the channels provided as input to it. Because both inputs are queue channels, and the single element of `ch2` has been consumed, no new process instances will be launched, even if there are other elements to be consumed in the other channel.
To use the single element in `ch2` multiple times, we can either use `Channel.value` as mentioned above, or use a channel operator that returns a single element, such as `first`.
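Both options can be sketched as follows. The process name `SUM`, the channel name `ch1`, and the values are illustrative assumptions. Tasks run in parallel, so the three results may appear in any order.

```groovy
// Hypothetical process that adds its two inputs in the shell
process SUM {
    input:
    val x
    val y

    output:
    stdout

    script:
    """
    echo \$(($x+$y))
    """
}

workflow {
    ch1 = Channel.of(1, 2, 3)

    // Option 1: create the second input as a value channel from the start
    ch2 = Channel.value(1)

    // Option 2 (alternative): convert a queue channel with an operator
    // that returns a single element
    // ch2 = Channel.of(1).first()

    // A value channel can be read repeatedly, so one task runs per element of ch1
    SUM(ch1, ch2).view()
}
```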
In addition, in many situations Nextflow will implicitly convert variables to value channels when they are used in a process invocation. For example, when you invoke a process with a workflow parameter (`params.example`) that has a string value, it is automatically cast into a value channel.
5.2 Channel factories
These are Nextflow commands for creating channels that have implicit expected inputs and functions.
The `value` channel factory is used to create a value channel. An optional non-`null` argument can be specified to bind the channel to a specific value. For example:
- Creates an empty value channel
- Creates a value channel and binds a string to it
- Creates a value channel and binds a list object to it that will be emitted as a sole emission
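The three cases listed above can be sketched as follows. The variable names and bound values are illustrative assumptions.

```groovy
// Creates an empty value channel
def ch1 = Channel.value()

// Creates a value channel and binds a string to it
def ch2 = Channel.value('Hello there')

// Creates a value channel and binds a list to it; the whole list is one emission
def ch3 = Channel.value([1, 2, 3, 4, 5])

ch2.view()
ch3.view()
```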
The `Channel.of` factory allows the creation of a queue channel with the values specified as arguments.
The first line in this example creates a variable `ch` which holds a channel object. This channel emits the values specified as parameters in the `of` channel factory. Thus the second line will print the following:
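A two-line sketch matching this description, assuming four illustrative integer values and a `value:` prefix chosen for the example:

```groovy
def ch = Channel.of(1, 3, 5, 7)       // first line: a queue channel holding four values
ch.view { v -> "value: $v" }          // second line: prints each value with a prefix

// Prints:
// value: 1
// value: 3
// value: 5
// value: 7
```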
The `Channel.of` channel factory works in a similar manner to `Channel.from` (which is now deprecated), fixing some inconsistent behaviors of the latter and providing better handling when specifying a range of values. For example, the following works with a range from 1 to 23:
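A short sketch of range handling. The two extra string values are an assumption added to show that a range can be mixed with other arguments.

```groovy
// The range 1..23 is expanded into 23 separate emissions,
// followed by the two string values
Channel.of(1..23, 'X', 'Y')
    .view()
```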
The `Channel.fromList` channel factory creates a channel emitting the elements provided by a list object specified as an argument:
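As a sketch, assuming an illustrative list of two strings:

```groovy
def greetings = ['hello', 'world']

// Each list element becomes a separate emission
Channel.fromList(greetings)
    .view()

// Prints:
// hello
// world
```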
The `fromPath` channel factory creates a queue channel emitting one or more files matching the specified glob pattern.
This example creates a channel and emits as many items as there are files with a `csv` extension in the `./data/meta` folder. Each element is a file object implementing the Path interface.
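A minimal sketch of that example, built from the folder and extension named above:

```groovy
// Emits one Path object per CSV file found in ./data/meta
Channel.fromPath('./data/meta/*.csv')
    .view()
```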
Two asterisks, i.e. `**`, work like `*` but cross directory boundaries. This syntax is generally used for matching complete paths. Curly brackets specify a collection of sub-patterns.
Available options include:

| Name | Description |
|------|-------------|
| type | Type of path returned |
| maxDepth | Maximum number of directory levels to visit |
Learn more about the glob patterns syntax at this link.
Use the `Channel.fromPath` channel factory to create a channel emitting all files with the suffix `.fq` in the `data/ggal/` directory and any subdirectory, in addition to hidden files. Then print the file names.
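One possible solution, as a sketch. It assumes the `hidden` option of `fromPath` to include hidden files and uses the `name` attribute of each file object to print only the file name.

```groovy
// '**' crosses directory boundaries, so subdirectories are searched too
Channel.fromPath('./data/ggal/**.fq', hidden: true)
    .view { fq -> fq.name }
```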
The `fromFilePairs` channel factory creates a channel emitting the file pairs matching a glob pattern provided by the user. The matching files are emitted as tuples, in which the first element is the grouping key of the matching pair and the second element is the list of files (sorted in lexicographical order).
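A sketch of such a call. The glob pattern is an assumption based on paired files named like `liver_1.fq` and `liver_2.fq` in `data/ggal/`.

```groovy
// '*' captures the grouping key (e.g. liver); {1,2} matches the two mates
Channel.fromFilePairs('./data/ggal/*_{1,2}.fq')
    .view()
```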
It will produce an output similar to the following:
[liver, [/workspace/gitpod/nf-training/data/ggal/liver_1.fq, /workspace/gitpod/nf-training/data/ggal/liver_2.fq]]
[gut, [/workspace/gitpod/nf-training/data/ggal/gut_1.fq, /workspace/gitpod/nf-training/data/ggal/gut_2.fq]]
[lung, [/workspace/gitpod/nf-training/data/ggal/lung_1.fq, /workspace/gitpod/nf-training/data/ggal/lung_2.fq]]
The glob pattern must contain at least a star wildcard character (`*`).

Available options include:

| Name | Description |
|------|-------------|
| type | Type of paths returned |
| maxDepth | Maximum number of directory levels to visit |
| size | Defines the number of files each emitted item is expected to hold |
Use the `fromFilePairs` channel factory to create a channel emitting all pairs of fastq reads in the `data/ggal/` directory and print them. Then use the `flat: true` option and compare the output with the previous execution.
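One possible solution, as a sketch. The glob pattern is an assumption based on the paired file names in `data/ggal/`.

```groovy
// Default: each item is [key, [file_1, file_2]]
Channel.fromFilePairs('./data/ggal/*_{1,2}.fq')
    .view { pair -> "nested: ${pair}" }

// With flat: true, each item is [key, file_1, file_2] with no nested list
Channel.fromFilePairs('./data/ggal/*_{1,2}.fq', flat: true)
    .view { pair -> "flat:   ${pair}" }
```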
The `Channel.fromSRA` channel factory makes it possible to query the NCBI SRA archive and returns a channel emitting the FASTQ files matching the specified selection criteria.
The query can be project ID(s) or accession number(s) supported by the NCBI ESearch API.
This function now requires an API key, which you can only get by logging into your NCBI account.
Instructions for NCBI login and key acquisition
- Go to: https://www.ncbi.nlm.nih.gov/
- Click the top right "Log in" button to sign into NCBI. Follow their instructions.
- Once into your account, click the button at the top right, usually your ID.
- Go to Account settings
- Scroll down to the API Key Management section.
- Click on "Create an API Key".
- The page will refresh and the key will be displayed where the button was. Copy your key.
For example, the following snippet will print the contents of an NCBI project ID:
Replace `<Your API key here>` with your API key.
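A sketch of such a query. The project ID `SRP073307` and the parameter name `ncbi_api_key` are assumptions for illustration. The call needs network access and a valid key.

```groovy
// Placeholder: substitute your own NCBI API key
params.ncbi_api_key = '<Your API key here>'

// Query the SRA archive for all runs in the (assumed) project ID
Channel.fromSRA(['SRP073307'], apiKey: params.ncbi_api_key)
    .view()
```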
This should print:
[SRR3383346, [/vol1/fastq/SRR338/006/SRR3383346/SRR3383346_1.fastq.gz, /vol1/fastq/SRR338/006/SRR3383346/SRR3383346_2.fastq.gz]]
[SRR3383347, [/vol1/fastq/SRR338/007/SRR3383347/SRR3383347_1.fastq.gz, /vol1/fastq/SRR338/007/SRR3383347/SRR3383347_2.fastq.gz]]
[SRR3383344, [/vol1/fastq/SRR338/004/SRR3383344/SRR3383344_1.fastq.gz, /vol1/fastq/SRR338/004/SRR3383344/SRR3383344_2.fastq.gz]]
[SRR3383345, [/vol1/fastq/SRR338/005/SRR3383345/SRR3383345_1.fastq.gz, /vol1/fastq/SRR338/005/SRR3383345/SRR3383345_2.fastq.gz]]
// (remaining omitted)
Multiple accession IDs can be specified using a list object:
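A sketch using a list of accession IDs. The parameter name `ncbi_api_key` and the variable name `ids` are illustrative assumptions.

```groovy
// Placeholder: substitute your own NCBI API key
params.ncbi_api_key = '<Your API key here>'

// Several accession IDs can be passed as a list
def ids = ['ERR908507', 'ERR908506', 'ERR908505']

Channel.fromSRA(ids, apiKey: params.ncbi_api_key)
    .view()
```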
[ERR908507, [/vol1/fastq/ERR908/ERR908507/ERR908507_1.fastq.gz, /vol1/fastq/ERR908/ERR908507/ERR908507_2.fastq.gz]]
[ERR908506, [/vol1/fastq/ERR908/ERR908506/ERR908506_1.fastq.gz, /vol1/fastq/ERR908/ERR908506/ERR908506_2.fastq.gz]]
[ERR908505, [/vol1/fastq/ERR908/ERR908505/ERR908505_1.fastq.gz, /vol1/fastq/ERR908/ERR908505/ERR908505_2.fastq.gz]]
Read pairs are implicitly managed and are returned as a list of files.
It’s straightforward to use this channel as an input using the usual Nextflow syntax. The code below creates a channel containing two samples from a public SRA study and runs FASTQC on the resulting files. See:
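A sketch of such a workflow. The parameter names, the process layout, and the two accession IDs are assumptions for illustration, and `fastqc` must be available in the execution environment.

```groovy
// Placeholder: substitute your own NCBI API key
params.ncbi_api_key = '<Your API key here>'
// Two samples (assumed accession IDs)
params.accession = ['ERR908507', 'ERR908506']

process FASTQC {
    input:
    tuple val(sample_id), path(reads_file)

    output:
    path("fastqc_${sample_id}_logs")

    script:
    """
    mkdir fastqc_${sample_id}_logs
    fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads_file}
    """
}

workflow {
    // Each emitted item is [accession, [read_1, read_2]]
    reads = Channel.fromSRA(params.accession, apiKey: params.ncbi_api_key)
    FASTQC(reads)
}
```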
If you want to run the workflow above and do not have fastqc installed on your machine, don’t forget what you learned in the previous section. Run this workflow with `-with-docker biocontainers/fastqc:v0.11.5`, for example.
5.2.7 Text files
The `splitText` operator allows you to split multi-line strings or text file items emitted by a source channel into chunks containing n lines, which will be emitted by the resulting channel. See:
- Instructs Nextflow to make a channel from the given file path.
- The `splitText` operator splits each item into chunks of one line by default.
- View contents of the channel.
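The three steps above can be sketched as follows. The file path `data/meta/random.txt` is a hypothetical example input.

```groovy
Channel.fromPath('data/meta/random.txt')  // make a channel from the (assumed) path
    .splitText()                          // split each file into one-line chunks
    .view()                               // view the contents of the channel
```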
You can define the number of lines in each chunk by using the parameter `by`, as shown in the following example:
The `subscribe` operator permits execution of user-defined functions each time a new value is emitted by the source channel.
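A sketch combining the `by` parameter with `subscribe`. The input path and the chunk size of 2 are illustrative assumptions.

```groovy
Channel.fromPath('data/meta/random.txt')
    .splitText(by: 2)                       // emit chunks of two lines each
    .subscribe { chunk ->
        // This closure runs once for every chunk emitted
        print chunk
        print "--- end of the chunk ---\n"
    }
```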
An optional closure can be specified in order to transform the text chunks produced by the operator. The following example shows how to split text files into chunks of 10 lines and transform them into capital letters:
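As a sketch, assuming the same hypothetical input file:

```groovy
Channel.fromPath('data/meta/random.txt')
    // The closure is applied to every 10-line chunk before it is emitted
    .splitText(by: 10) { chunk -> chunk.toUpperCase() }
    .view()
```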
You can also make counts for each line:
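One way to number the lines, as a sketch. The counter variable and the input path are illustrative assumptions.

```groovy
def count = 0

Channel.fromPath('data/meta/random.txt')
    .splitText()
    .view { line ->
        count += 1                                  // running line counter
        "${count}: ${line.toUpperCase().trim()}"    // numbered, upper-cased line
    }
```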
Finally, you can also use the operator on plain files (outside of the channel context):
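A sketch of the same idea without a channel, assuming the hypothetical file `data/meta/random.txt`. Here `splitText` is called directly on the file object and returns a list of lines.

```groovy
def f = file('data/meta/random.txt')   // plain file object, no channel involved
def lines = f.splitText()              // list of lines

lines.eachWithIndex { row, i ->
    println "${i} ${row.toUpperCase().trim()}"
}
```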
5.2.8 Comma-separated values (.csv)
The `splitCsv` operator allows you to parse CSV-formatted text items emitted by a channel.
It then splits them into records or groups them as a list of records with a specified length.
In the simplest case, just apply the `splitCsv` operator to a channel emitting CSV-formatted text files or text entries. For example, to view only the first and fourth columns:
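A sketch of the simplest case. The file `data/meta/patients_1.csv` is a hypothetical input with at least four columns.

```groovy
Channel.fromPath('data/meta/patients_1.csv')
    .splitCsv()
    // Without options, each row is a list; indexes start at 0
    .view { row -> "${row[0]}, ${row[3]}" }
```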
When the CSV begins with a header line defining the column names, you can specify the parameter `header: true`, which allows you to reference each value by its column name, as shown in the following example:
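As a sketch, assuming a hypothetical CSV file whose header line contains the columns `patient_id` and `num_samples`:

```groovy
Channel.fromPath('data/meta/patients_1.csv')
    .splitCsv(header: true)
    // With header: true, each row is a map keyed by column name
    .view { row -> "${row.patient_id}, ${row.num_samples}" }
```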
Alternatively, you can provide custom header names by specifying a list of strings in the header parameter as shown below:
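A sketch with custom column names. The names `col1` to `col5` and the input file are illustrative assumptions. With a custom list, the first line of the file is treated as data unless it is skipped.

```groovy
Channel.fromPath('data/meta/patients_1.csv')
    .splitCsv(header: ['col1', 'col2', 'col3', 'col4', 'col5'])
    // Each row is a map keyed by the custom names supplied above
    .view { row -> "${row.col1}, ${row.col4}" }
```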
You can also process multiple CSV files at the same time:
Notice that you can change the output format simply by adding a different delimiter.
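A sketch that reads several files through a glob and prints the fields separated by a tab rather than a comma. The file pattern and column names are hypothetical.

```groovy
Channel.fromPath('data/meta/patients_*.csv')   // the glob matches multiple CSV files
    .splitCsv(header: true)
    // The delimiter in the output string is independent of the input format
    .view { row -> "${row.patient_id}\t${row.num_samples}" }
```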
Finally, you can also operate on CSV files outside the channel context:
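As a sketch, assuming a hypothetical CSV file with at least three columns. Here `splitCsv` is called directly on the file object and returns a list of rows.

```groovy
def f = file('data/meta/patients_1.csv')   // plain file object, no channel involved
def rows = f.splitCsv()                    // list of rows, each row a list of fields

rows.each { row ->
    println "${row[0]} -- ${row[2]}"
}
```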
Try inputting fastq reads into the RNA-Seq workflow from earlier using `splitCsv`.
Add a CSV text file containing the following, as an example input with the name "fastq.csv":
Then replace the input channel for the reads in `script7.nf`, changing the following lines:
To an input channel built with the `splitCsv` operator:
Finally, change the cardinality of the processes that use the input data. For example, for the quantification process, change it from:
Repeat the above for the fastqc step.
Now the workflow should run from a CSV file.
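The changes described in this exercise can be sketched in isolation as follows. The example assumes that each row of `fastq.csv` has the form `sample_id,path_to_read_1,path_to_read_2`, and the process `SHOW_READS` is a hypothetical stand-in for the real quantification and fastqc processes.

```groovy
// Hypothetical stand-in process: the input now takes three separate
// elements instead of a [sample_id, [reads]] pair
process SHOW_READS {
    input:
    tuple val(sample_id), path(fastq_1), path(fastq_2)

    output:
    stdout

    script:
    """
    echo "${sample_id}: ${fastq_1} ${fastq_2}"
    """
}

workflow {
    // Assumed CSV layout: sample_id,read_1_path,read_2_path (no header line)
    read_pairs_ch = Channel.fromPath('fastq.csv')
        .splitCsv()
        .map { row -> tuple(row[0], file(row[1]), file(row[2])) }

    SHOW_READS(read_pairs_ch).view()
}
```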
5.2.9 Tab-separated values (.tsv)
Parsing TSV files works in a similar way: simply add the `sep: '\t'` option to the `splitCsv` call.
Try using the tab separation technique on the file `data/meta/regions.tsv`, but print just the first column, and remove the header.
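One possible solution, as a sketch. It uses the `skip` option of `splitCsv` to drop the header line, which avoids assuming any column names.

```groovy
Channel.fromPath('data/meta/regions.tsv', checkIfExists: true)
    // sep: '\t' parses tab-separated fields; skip: 1 ignores the header line
    .splitCsv(sep: '\t', skip: 1)
    // Each row is a list; print only the first column
    .view { row -> "${row[0]}" }
```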
5.3 More complex file formats
We can also easily parse the JSON file format using the `splitJson` channel operator.
The `splitJson` operator supports JSON arrays:
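A sketch with an illustrative JSON array of strings. It assumes a Nextflow version that provides `splitJson`.

```groovy
Channel.of('["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]')
    .splitJson()                      // each array element becomes one emission
    .view { day -> "Item: ${day}" }
```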
And even a JSON array of JSON objects!
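As a sketch, with illustrative records. Each JSON object is emitted as a map, so its fields can be accessed by key.

```groovy
Channel.of('[{"name": "Bob", "height": 180}, {"name": "Alice", "height": 170}]')
    .splitJson()
    // Each emitted item is a map built from one JSON object
    .view { person -> "${person.name} is ${person.height} cm tall" }
```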
Files containing JSON content can also be parsed:
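A sketch assuming a hypothetical file named `file.json` that contains a JSON array:

```groovy
Channel.fromPath('file.json')     // hypothetical input file
    .splitJson()                  // parses the file content and emits each element
    .view { item -> "Item: ${item}" }
```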
This can also be used as a way to parse YAML files:
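A minimal sketch of YAML parsing, assuming the SnakeYAML library that ships with Nextflow and a hypothetical file `data/meta/regions.yml` holding a list of records with `patient_id` and `feature` keys:

```groovy
import org.yaml.snakeyaml.Yaml

def f = file('data/meta/regions.yml')     // hypothetical YAML file
def records = new Yaml().load(f.text)     // parse the file content into a list of maps

records.each { entry ->
    println "${entry.patient_id} -- ${entry.feature}"
}
```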
- patient_id: ATX-TBL-001-GB-01-105
  region_id: R1
  feature: pass_vafqc_flag
  pass_flag: "TRUE"
- patient_id: ATX-TBL-001-GB-01-105
  region_id: R1
  feature: pass_stripy_flag
  pass_flag: "TRUE"
- patient_id: ATX-TBL-001-GB-01-105
  region_id: R1
  feature: pass_manual_flag
  pass_flag: "TRUE"
- patient_id: ATX-TBL-001-GB-01-105
  region_id: R1
  feature: other_region_selection_flag
  pass_flag: "TRUE"
- patient_id: ATX-TBL-001-GB-01-105
  region_id: R1
  feature: ace_information_gained
  pass_flag: "TRUE"
- patient_id: ATX-TBL-001-GB-01-105
  region_id: R1
  feature: concordance_flag
  pass_flag: "TRUE"
- patient_id: ATX-TBL-001-GB-01-105
  region_id: R2
  feature: pass_vafqc_flag
  pass_flag: "TRUE"
- patient_id: ATX-TBL-001-GB-01-105
  region_id: R2
  feature: pass_stripy_flag
  pass_flag: "TRUE"
- patient_id: ATX-TBL-001-GB-01-105
  region_id: R2
  feature: pass_manual_flag
  pass_flag: "TRUE"
- patient_id: ATX-TBL-001-GB-01-105
  region_id: R2
  feature: other_region_selection_flag
  pass_flag: "TRUE"
- patient_id: ATX-TBL-001-GB-01-105
  region_id: R2
  feature: ace_information_gained
  pass_flag: "TRUE"
- patient_id: ATX-TBL-001-GB-01-105
  region_id: R2
  feature: concordance_flag
  pass_flag: "TRUE"
- patient_id: ATX-TBL-001-GB-01-105
  region_id: R3
  feature: pass_vafqc_flag
  pass_flag: "TRUE"
- patient_id: ATX-TBL-001-GB-01-105
  region_id: R3
  feature: pass_stripy_flag
  pass_flag: "FALSE"
ATX-TBL-001-GB-01-105 -- pass_vafqc_flag
ATX-TBL-001-GB-01-105 -- pass_stripy_flag
ATX-TBL-001-GB-01-105 -- pass_manual_flag
ATX-TBL-001-GB-01-105 -- other_region_selection_flag
ATX-TBL-001-GB-01-105 -- ace_information_gained
ATX-TBL-001-GB-01-105 -- concordance_flag
ATX-TBL-001-GB-01-105 -- pass_vafqc_flag
ATX-TBL-001-GB-01-105 -- pass_stripy_flag
ATX-TBL-001-GB-01-105 -- pass_manual_flag
ATX-TBL-001-GB-01-105 -- other_region_selection_flag
ATX-TBL-001-GB-01-105 -- ace_information_gained
ATX-TBL-001-GB-01-105 -- concordance_flag
ATX-TBL-001-GB-01-105 -- pass_vafqc_flag
ATX-TBL-001-GB-01-105 -- pass_stripy_flag
5.3.3 Storage of parsers into modules
The best way to store parser scripts is to keep them in a Nextflow module file.
Let's say we don't have a JSON channel operator, but we create a function instead. The `parsers.nf` file should contain the `parseJsonFile` function. See the content below:
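A sketch of what such a module and its use might look like. The module location `./modules/parsers.nf`, the input glob `data/meta/regions*.json`, and the process `FOO` are assumptions for illustration. First, the module file:

```groovy
// modules/parsers.nf (assumed location)
import groovy.json.JsonSlurper

// Parses a JSON file and returns its content (here, a list of records)
def parseJsonFile(json_file) {
    def f = file(json_file)
    return new JsonSlurper().parseText(f.text)
}
```

Then a workflow script that includes the function and applies it inside channel operators:

```groovy
include { parseJsonFile } from './modules/parsers.nf'

// Hypothetical process that reports one feature per patient
process FOO {
    input:
    tuple val(patient_id), val(feature)

    output:
    stdout

    script:
    """
    echo ${patient_id} has ${feature} as feature
    """
}

workflow {
    features_ch = Channel.fromPath('data/meta/regions*.json')
        .flatMap { json_file -> parseJsonFile(json_file) }      // one emission per record
        .map { record -> [record.patient_id, record.feature] }
        .unique()                                               // drop repeated pairs

    FOO(features_ch).view()
}
```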
ATX-TBL-001-GB-01-105 has pass_stripy_flag as feature
ATX-TBL-001-GB-01-105 has ace_information_gained as feature
ATX-TBL-001-GB-01-105 has concordance_flag as feature
ATX-TBL-001-GB-01-105 has pass_vafqc_flag as feature
ATX-TBL-001-GB-01-105 has pass_manual_flag as feature
ATX-TBL-001-GB-01-105 has other_region_selection_flag as feature
Nextflow will use this as a custom function within the workflow scope.
You will learn more about module files later in the Modularization section of this tutorial.