5. Groovy Imports
Groovy ships with a wealth of helper classes that can be imported into Nextflow scripts. In this chapter, we create a very small workflow using the FastP tool to investigate importing the Groovy JsonSlurper class.
First, let's move into the chapter 5 directory:
Let's assume that we would like to pull in a samplesheet, parse the entries and run them through the FastP tool. So far, we have been concerned with local files, but Nextflow will handle remote files transparently:
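A minimal sketch of reading a remote samplesheet, assuming a CSV file with a header row. The URL is a placeholder rather than the samplesheet used in this chapter, and `params.input` is an assumed parameter name.

```groovy
// Hypothetical URL: substitute the location of your own samplesheet.
params.input = "https://example.com/path/to/samplesheet.csv"

workflow {
    // Nextflow stages the remote file for us; with header: true,
    // splitCsv emits one map per row, keyed by column name.
    Channel.fromPath(params.input)
        .splitCsv(header: true)
        .view()
}
```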
Let's write a small closure to parse each row into the now-familiar map + files shape. We might start by constructing the meta-map:
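A first attempt might hard-code the columns we know about. This is a plain Groovy sketch; the column names (`sample`, `strandedness`, `fastq_1`, `fastq_2`) are assumptions about the samplesheet, and inside a Nextflow script the read paths would typically be wrapped with `file(...)`.

```groovy
// A row as splitCsv(header: true) might emit it; column names are assumed.
def row = [sample: 'S1', fastq_1: 'S1_R1.fastq.gz', fastq_2: 'S1_R2.fastq.gz', strandedness: 'reverse']

// Build the meta map from named columns and collect the two read paths.
def parseRow = { Map r ->
    def metaMap = [id: r.sample, strandedness: r.strandedness]
    def reads = [r.fastq_1, r.fastq_2]
    [metaMap, reads]
}

def (meta, reads) = parseRow(row)
println "meta=${meta} reads=${reads}"
```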
... but this precludes the possibility of adding additional columns to the samplesheet. We might want to ensure that the parsing captures any extra metadata columns should they be added. Instead, let's partition the column names into those that begin with "fastq" and those that don't:
New methods
We've introduced a new keySet method here. This is a method on Java's LinkedHashMap class (docs here)
We're also using the .split() method, which divides a collection in two based on the return value of the closure. The mrhaki blog provides a succinct summary.
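As a small illustration of these two methods in plain Groovy, the sketch below partitions the column names of a made-up row. The column names are assumptions, and `startsWith` is used here in place of a regular expression for clarity.

```groovy
def row = [sample: 'S1', fastq_1: 'S1_R1.fastq.gz', fastq_2: 'S1_R2.fastq.gz', strandedness: 'reverse']

// split returns two collections: the keys for which the closure is true,
// followed by the keys for which it is false.
def (readKeys, metaKeys) = row.keySet().split { it.startsWith('fastq') }

println readKeys   // the fastq_* column names
println metaKeys   // every other column name
```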
From here, let's use the two sets of keys to collect the read files and the remaining metadata for each row:
... but we run into an error:
If we take a closer look at the samplesheet, we notice that not all rows have a second read file. Let's add a condition that drops the empty entries:
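One way to express such a condition, sketched in plain Groovy with a made-up single-end row. It relies on the empty string being falsy in Groovy; in a Nextflow script each remaining path would then be passed to `file(...)`.

```groovy
// A row whose second read column is empty (column names are assumed).
def row = [sample: 'S2', fastq_1: 'S2_R1.fastq.gz', fastq_2: '', strandedness: 'reverse']
def (readKeys, metaKeys) = row.keySet().split { it.startsWith('fastq') }

// Keep only the non-empty read paths.
def reads = row.subMap(readKeys).values().findAll { it }
// Everything that is not a read column becomes metadata.
def meta = row.subMap(metaKeys)

println "meta=${meta} reads=${reads}"
```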
Now we need to construct the meta map. Let's have a quick look at the FASTP module that I've already defined:
I can see that we require two extra keys, id and single_end:
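A sketch of adding those two keys in plain Groovy. It assumes the sample name lives in a `sample` column and treats a row with exactly one read file as single-end.

```groovy
def row = [sample: 'S2', fastq_1: 'S2_R1.fastq.gz', fastq_2: '', strandedness: 'reverse']
def (readKeys, metaKeys) = row.keySet().split { it.startsWith('fastq') }
def reads = row.subMap(readKeys).values().findAll { it }

// Start from the non-read columns, then add the two keys the module expects.
def meta = row.subMap(metaKeys)
meta.id = row.sample                  // assumes a "sample" column
meta.single_end = reads.size() == 1   // one read file means single-end

println meta
```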
This can now be passed through to our FASTP process:
Let's assume that we want to pull some information out of the JSON files that FASTP produces. To make our lives a little easier, let's "publish" these JSON files so that they are more convenient to reach. We're going to discuss configuration more completely in a later chapter, but that's no reason not to dabble a bit here.
We'd like to add a publishDir directive to our FASTP process.
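One way to do this without editing the module is through the configuration file. The fragment below is a sketch: it assumes the process is named FASTP and that only the `.json` outputs should be published, and the output directory is a made-up name.

```groovy
// nextflow.config (sketch): publish only the JSON reports from FASTP.
process {
    withName: 'FASTP' {
        publishDir = [
            path: 'results/fastp/json',   // hypothetical output directory
            // saveAs returns the published file name, or null to skip the file.
            saveAs: { filename -> filename.endsWith('.json') ? filename : null }
        ]
    }
}
```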
Groovy Tip: Elvis Operator
This pattern of returning something if it is true and somethingElse if not:
has a shortcut in Groovy - the "Elvis" operator:
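The two forms side by side, as a small plain Groovy sketch with made-up variable names:

```groovy
def configured = null        // stands in for a value that may be missing
def fallback = 'results'

// Long form: the ternary repeats the tested value.
def viaTernary = configured ? configured : fallback

// Elvis form: same result, with the value written only once.
def viaElvis = configured ?: fallback

assert viaTernary == viaElvis
println viaElvis
```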
Publishing the files lets us iterate quickly on our JSON parsing without waiting for the FASTP cache to be recalculated on these slow virtual machines.
Let's consider the possibility that we'd like to capture some of these metrics so that they can be used downstream. First, we'll have a quick peek at the Groovy docs, where we see that we need to import JsonSlurper:
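A minimal sketch of the import and a first call to `parseText`. The JSON string is a tiny stand-in with an assumed layout; the real report is much larger.

```groovy
import groovy.json.JsonSlurper

// Made-up content standing in for a report.
def text = '{"summary": {"after_filtering": {"q30_rate": 0.94}}}'

// parseText turns a JSON string into nested Maps and Lists.
def parsed = new JsonSlurper().parseText(text)

println parsed.summary.after_filtering.q30_rate
```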
Now let's create a second entrypoint to quickly pass these JSON files through some tests:
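A sketch of what such an entrypoint could look like. The workflow name and the glob are placeholders; point the glob at wherever the JSON files were published.

```groovy
// A second, named workflow that sits alongside the main one.
// Select it at launch with, for example: nextflow run . -entry Jsontest
workflow Jsontest {
    Channel.fromPath('results/fastp/json/*.json')   // hypothetical location
        .view()
}
```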
Entrypoint developing
Using a second Entrypoint allows us to do quick debugging or development using a small section of the workflow without disturbing the main flow.
We run this entrypoint by passing its name to Nextflow's -entry option.
Let's create a small function at the top of the workflow to take the JSON path and pull out some basic metrics:
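A sketch of such a function in plain Groovy. The temporary file and its content are made up so that the example runs on its own; in the pipeline, the argument would be the path to a published JSON report.

```groovy
import groovy.json.JsonSlurper

// Takes the path to a JSON report and returns the parsed content.
def getFilteringResult(json_file) {
    def fastpResult = new JsonSlurper().parseText(json_file.text)
    return fastpResult
}

// Stand-alone check using a temporary file with made-up content.
def tmp = File.createTempFile('fastp', '.json')
tmp.text = '{"summary": {"after_filtering": {"total_reads": 100, "q30_rate": 0.94}}}'
def result = getFilteringResult(tmp)
println result.summary.keySet()
tmp.delete()
```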
Exercise
The fastpResult returned from the parseText method is a large Map - a class which we're already familiar with. Modify the getFilteringResult function to return just the after_filtering section of the report.
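One possible approach, sketched in plain Groovy. It assumes that the after_filtering section sits under a top-level `summary` key; check this against a real report before relying on it. The temporary file content is made up.

```groovy
import groovy.json.JsonSlurper

// Variant that returns only one section of the report.
def getFilteringResult(json_file) {
    def fastpResult = new JsonSlurper().parseText(json_file.text)
    // ?. guards against a report that lacks the expected blocks.
    return fastpResult?.summary?.after_filtering
}

def tmp = File.createTempFile('fastp', '.json')
tmp.text = '{"summary": {"before_filtering": {"q30_rate": 0.90}, "after_filtering": {"q30_rate": 0.94}}}'
def afterFiltering = getFilteringResult(tmp)
println afterFiltering
tmp.delete()
```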
We can then join this new map back to the original reads using the join operator:
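A sketch of the join, using made-up items in place of the real channels. Both channels are assumed to carry the meta map as their first element, which is what `join` matches on by default.

```groovy
workflow {
    // Hypothetical stand-ins for the reads channel and the parsed metrics.
    reads   = Channel.of([[id: 'A'], 'A.fastq.gz'], [[id: 'B'], 'B.fastq.gz'])
    metrics = Channel.of([[id: 'A'], [q30_rate: 0.95]], [[id: 'B'], [q30_rate: 0.90]])

    // Each emitted item combines the shared key with the remaining
    // elements from both sides: meta, reads, then the metrics map.
    reads.join(metrics).view()
}
```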
Exercise
Can you amend this pipeline to create two channels that filter the reads to exclude any samples where the Q30 rate is less than 93.5%?