7. Operators

Nextflow operators are methods that allow you to manipulate channels. Every operator, with the exception of set and subscribe, produces one or more new channels, allowing you to chain operators to fit your needs.

The seven main groups of operators are described in greater detail in the Nextflow Reference Documentation:

  1. Filtering operators
  2. Transforming operators
  3. Splitting operators
  4. Combining operators
  5. Forking operators
  6. Maths operators
  7. Other operators

7.1 Basic example

The map operator applies a function of your choosing to every item emitted by a channel, and returns the items so obtained as a new channel. The function applied is called the mapping function and is expressed with a closure as shown in the example below:

snippet.nf
nums = Channel.of(1, 2, 3, 4)        // creates a queue channel emitting four values
square = nums.map { it -> it * it }  // creates a new channel, transforming each number into its square
square.view()                        // prints the channel content

Operators can also be chained to implement custom behaviors, so the previous snippet can also be written as:

snippet.nf
Channel
    .of(1, 2, 3, 4)
    .map { it -> it * it }
    .view()
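
Because set assigns the channel to a name rather than producing a new channel, it typically comes last in a chain. A minimal sketch of the same pipeline ending in set (the name squares is chosen here for illustration and is not part of the original example):

```groovy
Channel
    .of(1, 2, 3, 4)
    .map { it -> it * it }
    .set { squares } // assigns the resulting channel to the illustrative name `squares`

squares.view()       // the named channel can then be used like any other channel
```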

Summary

In this step you have learned:

  1. The basic features of an operator

7.2 Commonly used operators

Here you will explore some of the most commonly used operators.

7.2.1 view()

The view operator prints the items emitted by a channel to the console standard output, appending a new line character to each item. For example:

snippet.nf
Channel
    .of('foo', 'bar', 'baz')
    .view()
Output
foo
bar
baz

An optional closure parameter can be specified to customize how items are printed. For example:

snippet.nf
Channel
    .of('foo', 'bar', 'baz')
    .view { "- $it" }
Output
- foo
- bar
- baz

7.2.2 map()

The map operator applies a function of your choosing to every item emitted by a channel and returns the items obtained as a new channel. The function applied is called the mapping function and is expressed with a closure. In the example below, the Groovy reverse method is used to reverse the order of the characters in each string emitted by the channel.

snippet.nf
Channel
    .of('hello', 'world')
    .map { it -> it.reverse() }
    .view()

The map operator can also associate a generic tuple with each element, and the tuple can contain any data. In the example below, the Groovy size method is used to return the length of each string emitted by the channel.

snippet.nf
Channel
    .of('hello', 'world')
    .map { word -> [word, word.size()] }
    .view()
Output
[hello, 5]
[world, 5]

Exercise

Use fromPath to create a channel emitting the fastq files matching the pattern data/ggal/*.fq, then use map to return a pair containing the file name and the file path. Finally, use view to print the resulting channel.

Hint

You can use the name method to get the file name.

Solution

Here is one possible solution:

snippet.nf
Channel
    .fromPath('data/ggal/*.fq')
    .map { file -> [file.name, file] }
    .view()

Your output should look like this:

Output
[gut_1.fq, /workspace/gitpod/nf-training/data/ggal/gut_1.fq]
[gut_2.fq, /workspace/gitpod/nf-training/data/ggal/gut_2.fq]
[liver_1.fq, /workspace/gitpod/nf-training/data/ggal/liver_1.fq]
[liver_2.fq, /workspace/gitpod/nf-training/data/ggal/liver_2.fq]
[lung_1.fq, /workspace/gitpod/nf-training/data/ggal/lung_1.fq]
[lung_2.fq, /workspace/gitpod/nf-training/data/ggal/lung_2.fq]

7.2.3 mix()

The mix operator combines the items emitted by two (or more) channels.

snippet.nf
my_channel_1 = Channel.of(1, 2, 3)
my_channel_2 = Channel.of('a', 'b')
my_channel_3 = Channel.of('z')

my_channel_1
    .mix(my_channel_2, my_channel_3)
    .view()

It prints a single channel containing all the items emitted by the three channels:

Output
1
2
a
3
b
z

Warning

The items in the resulting channel have the same order as in their respective original channels. However, there is no guarantee that the elements of the second channel are appended after the elements of the first. Indeed, in the example above, the element a has been printed before 3.

7.2.4 flatten()

The flatten operator transforms a channel in such a way that every tuple is flattened so that each entry is emitted as a sole element by the resulting channel.

snippet.nf
foo = [1, 2, 3]
bar = [4, 5, 6]

Channel
    .of(foo, bar)
    .flatten()
    .view()
Output
1
2
3
4
5
6

7.2.5 collect()

The collect operator collects all of the items emitted by a channel in a list and returns the object as a sole emission.

snippet.nf
Channel
    .of(1, 2, 3, 4)
    .collect()
    .view()
Output
[1, 2, 3, 4]

Info

The result of the collect operator is a value channel.
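
One practical consequence is that a value channel can be read any number of times, so a process that takes it alongside a queue channel runs once per queue item. The following is a minimal sketch of that behavior; the process name COUNT and its inputs are hypothetical and not part of the training material:

```groovy
// Hypothetical process, for illustration only
process COUNT {
    input:
    val x    // one item from the queue channel
    val all  // the whole collected list, from the value channel

    output:
    stdout

    script:
    """
    echo "$x of ${all.size()}"
    """
}

workflow {
    nums = Channel.of(1, 2, 3)
    all  = nums.collect()    // value channel holding the list [1, 2, 3]
    COUNT(nums, all).view()  // `all` is reused for every item emitted by `nums`
}
```

Here COUNT should run three times, once per item in nums, each time receiving the full collected list. The order of the printed lines is not guaranteed.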

7.2.6 groupTuple()

The groupTuple operator collects tuples (or lists) of values emitted by the source channel, grouping the elements that share the same key. Finally, it emits a new tuple object for each distinct key collected.

snippet.nf
Channel
    .of([1, 'A'], [1, 'B'], [2, 'C'], [3, 'B'], [1, 'C'], [2, 'A'], [3, 'D'])
    .groupTuple()
    .view()
Output
[1, [A, B, C]]
[2, [C, A]]
[3, [B, D]]

This operator is especially useful to process a group together with all the elements that share a common property or grouping key.

Exercise

Use fromPath to create a channel emitting all of the files in the folder data/meta/, then use map to pair each file with its baseName. Finally, group all files that share the same baseName.

Solution
snippet.nf
Channel
    .fromPath('data/meta/*')
    .map { file -> tuple(file.baseName, file) }
    .groupTuple()
    .view()
Output
[patients_1, [/workspace/gitpod/nf-training/data/meta/patients_1.csv]]
[patients_2, [/workspace/gitpod/nf-training/data/meta/patients_2.csv]]
[random, [/workspace/gitpod/nf-training/data/meta/random.txt]]
[regions, [/workspace/gitpod/nf-training/data/meta/regions.json, /workspace/gitpod/nf-training/data/meta/regions.tsv, /workspace/gitpod/nf-training/data/meta/regions.yml]]
[regions2, [/workspace/gitpod/nf-training/data/meta/regions2.json]]

7.2.7 join()

The join operator creates a channel that joins together the items emitted by two channels with a matching key. The key is defined, by default, as the first element in each item emitted.

snippet.nf
left = Channel.of(['X', 1], ['Y', 2], ['Z', 3], ['P', 7])
right = Channel.of(['Z', 6], ['Y', 5], ['X', 4])
left.join(right).view()
Output
[Z, 3, 6]
[Y, 2, 5]
[X, 1, 4]

Note

Notice that P is missing from the final result because it has no matching key in the right channel.
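
If unmatched items should be kept rather than dropped, the join operator accepts a remainder option. The sketch below reuses the channels from the example above and assumes the remainder: true behavior described in the Nextflow operator reference:

```groovy
left = Channel.of(['X', 1], ['Y', 2], ['Z', 3], ['P', 7])
right = Channel.of(['Z', 6], ['Y', 5], ['X', 4])

// remainder: true also emits items whose key has no partner in the other channel
left.join(right, remainder: true).view()
```

With this option, the unmatched item is emitted as well, with null in place of the missing value: [P, 7, null].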

7.2.8 branch()

The branch operator allows you to forward the items emitted by a source channel to one or more output channels.

The selection criterion is defined by a closure that provides one or more boolean expressions, each identified by a unique label. Each item is sent to the output channel named after the label of the first expression that evaluates to true.

snippet.nf
Channel
    .of(1, 2, 3, 40, 50)
    .branch {
        small: it < 10
        large: it > 10
    }
    .set { result }

result.small.view { "$it is small" }
result.large.view { "$it is large" }
Output
1 is small
40 is large
2 is small
3 is small
50 is large

Info

The branch operator returns a multi-channel object (i.e., a variable that holds more than one channel object).

Note

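In the above example, what would happen to a value of 10? It matches neither condition, so it would not be sent to either output channel. To include it, you can use >= instead (for example, large: it >= 10).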
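
Another way to make sure no item falls through is a fallback branch. The sketch below assumes the true fallback condition described in the Nextflow branch documentation; the label other is chosen here for illustration:

```groovy
Channel
    .of(1, 2, 3, 10, 40, 50)
    .branch {
        small: it < 10
        large: it > 10
        other: true // fallback: catches anything not matched above, such as 10
    }
    .set { result }

result.small.view { "$it is small" }
result.large.view { "$it is large" }
result.other.view { "$it is neither small nor large" }
```

Because conditions are evaluated in order and the first match wins, the fallback must be the last entry in the closure.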

Summary

In this step you have learned:

  1. How to use the view operator to print the content of a channel
  2. How to use the map operator to transform the content of a channel
  3. How to use the mix operator to combine the content of two or more channels
  4. How to use the flatten operator to flatten the content of a channel
  5. How to use the collect operator to collect the content of a channel
  6. How to use the groupTuple operator to group the content of a channel
  7. How to use the join operator to join the content of two channels
  8. How to use the branch operator to split the content of a channel

7.3 Text files

7.3.1 splitText()

The splitText operator allows you to split multi-line strings or text file items emitted by a source channel into chunks containing n lines, which are then emitted by the resulting channel.

snippet.nf
Channel
    .fromPath('data/meta/random.txt') // make a channel from the path data/meta/random.txt
    .splitText()                      // split each item into chunks of one line (the default)
    .view()                           // view the contents of the channel
Output
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,
when an unknown printer took a galley of type and scrambled it to make a type specimen book.
It has survived not only five centuries, but also the leap into electronic typesetting,
...

You can define the number of lines in each chunk by using the parameter by, as shown in the following example:

snippet.nf
Channel
    .fromPath('data/meta/random.txt')
    .splitText(by: 2)
    .view()
Output
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,

when an unknown printer took a galley of type and scrambled it to make a type specimen book.
It has survived not only five centuries, but also the leap into electronic typesetting,
...

An optional closure can also be specified to transform the text chunks produced by the operator. The following example shows how to split text files into chunks of 2 lines and convert them to uppercase:

snippet.nf
Channel
    .fromPath('data/meta/random.txt')
    .splitText(by: 2) { it.toUpperCase() }
    .view()
Output
LOREM IPSUM IS SIMPLY DUMMY TEXT OF THE PRINTING AND TYPESETTING INDUSTRY.
LOREM IPSUM HAS BEEN THE INDUSTRY'S STANDARD DUMMY TEXT EVER SINCE THE 1500S,

WHEN AN UNKNOWN PRINTER TOOK A GALLEY OF TYPE AND SCRAMBLED IT TO MAKE A TYPE SPECIMEN BOOK.
IT HAS SURVIVED NOT ONLY FIVE CENTURIES, BUT ALSO THE LEAP INTO ELECTRONIC TYPESETTING,
...

7.3.2 splitCsv()

The splitCsv operator allows you to parse CSV-formatted text items emitted by a channel.

It then splits them into records or groups them as a list of records with a specified length.

In the simplest case, just apply the splitCsv operator to a channel emitting CSV-formatted text files or text entries. For example, to view only the first and fourth columns:

snippet.nf
Channel
    .fromPath("data/meta/patients_1.csv")
    .splitCsv()
    .view { row -> "${row[0]}, ${row[3]}" }
Output
patient_id, num_samples
ATX-TBL-001-GB-02-117, 3
ATX-TBL-001-GB-01-110, 3
ATX-TBL-001-GB-03-101, 3
ATX-TBL-001-GB-04-201, 3
ATX-TBL-001-GB-02-120, 3
ATX-TBL-001-GB-04-102, 3
ATX-TBL-001-GB-03-104, 3
ATX-TBL-001-GB-03-103, 3

When the CSV begins with a header line defining the column names, you can specify the parameter header: true which allows you to reference each value by its column name, as shown in the following example:

snippet.nf
Channel
    .fromPath("data/meta/patients_1.csv")
    .splitCsv(header: true)
    // with header: true, each row is a map keyed by column name
    .view { row -> "${row.patient_id}, ${row.num_samples}" }

Alternatively, you can provide custom header names by specifying a list of strings in the header parameter as shown below:

snippet.nf
Channel
    .fromPath("data/meta/patients_1.csv")
    .splitCsv(header: ['col1', 'col2', 'col3', 'col4', 'col5'])
    .view { row -> "${row.col1}, ${row.col4}" }
Output
patient_id, num_samples
ATX-TBL-001-GB-02-117, 3
ATX-TBL-001-GB-01-110, 3
ATX-TBL-001-GB-03-101, 3
ATX-TBL-001-GB-04-201, 3
ATX-TBL-001-GB-02-120, 3
ATX-TBL-001-GB-04-102, 3
ATX-TBL-001-GB-03-104, 3
ATX-TBL-001-GB-03-103, 3

You can also process multiple CSV files at the same time:

snippet.nf
Channel
    .fromPath("data/meta/patients_*.csv") // <-- just use a pattern
    .splitCsv(header: true)
    .view { row -> "${row.patient_id}\t${row.num_samples}" }
Output
ATX-TBL-001-GB-02-117   3
ATX-TBL-001-GB-01-110   3
ATX-TBL-001-GB-03-101   3
ATX-TBL-001-GB-04-201   3
ATX-TBL-001-GB-02-120   3
ATX-TBL-001-GB-04-102   3
ATX-TBL-001-GB-03-104   3
ATX-TBL-001-GB-03-103   3
ATX-TBL-001-GB-01-111   2
ATX-TBL-001-GB-01-112   3
ATX-TBL-001-GB-04-202   3
ATX-TBL-001-GB-02-124   3
ATX-TBL-001-GB-02-107   3
ATX-TBL-001-GB-01-105   3
ATX-TBL-001-GB-02-108   3
ATX-TBL-001-GB-01-113   3

Tip

Notice that you can change the output format simply by using a different delimiter (here, a tab) in the view closure.

Finally, you can also operate on CSV files outside the channel context:

def f = file('data/meta/patients_1.csv')
def lines = f.splitCsv()
for (List row : lines) {
    log.info "${row[0]} -- ${row[2]}"
}

Exercise

Create a CSV file and use it as input for script7.nf, part of the Simple RNA-Seq workflow tutorial.

Solution

Create a CSV text file named "fastq.csv" containing the following example input:

fastq.csv
gut,/workspace/gitpod/nf-training/data/ggal/gut_1.fq,/workspace/gitpod/nf-training/data/ggal/gut_2.fq

Then replace the input channel for the reads in script7.nf by changing the following lines:

Channel
    .fromFilePairs(params.reads, checkIfExists: true)
    .set { read_pairs_ch }

to the following, which builds the channel with fromPath and the splitCsv operator:

script7.nf
Channel
    .fromPath("fastq.csv")
    .splitCsv()
    .view { row -> "${row[0]}, ${row[1]}, ${row[2]}" }
    .set { read_pairs_ch }

Finally, change the cardinality of the processes that use the input data:

script7.nf
process QUANTIFICATION {
    tag "$sample_id"

    input:
    path salmon_index
    tuple val(sample_id), path(reads1), path(reads2)

    output:
    path sample_id, emit: quant_ch

    script:
    """
    salmon quant --threads $task.cpus --libType=U -i $salmon_index -1 ${reads1} -2 ${reads2} -o $sample_id
    """
}

Repeat the above for the fastqc step.

script7.nf
process FASTQC {
    tag "FASTQC on $sample_id"

    input:
    tuple val(sample_id), path(reads1), path(reads2)

    output:
    path "fastqc_${sample_id}_logs"

    script:
    """
    mkdir fastqc_${sample_id}_logs
    fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads1} ${reads2}
    """
}

Now the workflow should run from a CSV file.

7.3.3 Tab separated values (.tsv)

Parsing TSV files works in a similar way. Simply add the sep: '\t' option to the splitCsv call:

snippet.nf
Channel
    .fromPath("data/meta/regions.tsv", checkIfExists: true)
    // use `sep` option to parse TAB separated files
    .splitCsv(sep: '\t')
    .view()

Exercise

Use the tab separation technique on the file data/meta/regions.tsv, but print just the first column, and remove the header.

Solution
snippet.nf
Channel
    .fromPath("data/meta/regions.tsv", checkIfExists: true)
    // use `sep` option to parse TAB separated files
    .splitCsv(sep: '\t', header: true)
    // with header: true, each row is a map keyed by column name
    .view { row -> "${row.patient_id}" }

7.3.4 splitJson()

You can parse the JSON file format using the splitJson channel operator.

The splitJson operator supports JSON arrays:

snippet.nf
Channel
    .of('["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]')
    .splitJson()
    .view()
Output
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday

As well as JSON objects, whose entries are emitted as key-value pairs:

snippet.nf
Channel
    .of('{"player": {"name": "Bob", "height": 180, "champion": false}}')
    .splitJson()
    .view()
Output
[value:[name:Bob, height:180, champion:false], key:player]

And even a JSON array of JSON objects:

snippet.nf
Channel
    .of('[{"name": "Bob", "height": 180, "champion": false}, \
          {"name": "Alice", "height": 170, "champion": false}]')
    .splitJson()
    .view()
Output
[name:Bob, height:180, champion:false]
[name:Alice, height:170, champion:false]

You can also parse JSON files directly:

file.json
[
    { "name": "Bob", "height": 180, "champion": false },
    { "name": "Alice", "height": 170, "champion": false }
]
snippet.nf
Channel
    .fromPath('file.json')
    .splitJson()
    .view()
Output
[name:Bob, height:180, champion:false]
[name:Alice, height:170, champion:false]
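
When only a nested part of a document is of interest, such as the player object in the earlier JSON object example, splitJson can be pointed at that section. The following is a sketch that assumes the path option behaves as described in the Nextflow reference documentation (a JSONPath-like query, with the document root as the default):

```groovy
Channel
    .of('{"player": {"name": "Bob", "height": 180, "champion": false}}')
    .splitJson(path: 'player') // split the nested object rather than the document root
    .view()
```

Under that assumption, each entry of the player object (name, height, champion) is emitted as its own key-value item, instead of a single item for the player key.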

Summary

In this step you have learned:

  1. How to use the splitText operator to split text files into chunks of lines
  2. How to use the splitCsv operator to parse CSV and TSV files
  3. How to use the splitJson operator to split JSON strings and files

7.4 More resources

Check the operators documentation on the Nextflow website.