
Metadata in workflows

In any scientific analysis, we rarely work with just the raw data files. Each file comes with its own additional information: what it is, where it came from, and what makes it special. This extra information is what we call metadata.

Metadata is data describing other data. Metadata tracks important details about files and experimental conditions, and helps tailor analyses to each dataset's unique characteristics.

Think of it like a library catalog: while books contain the actual content (raw data), the catalog cards provide essential information about each book—when it was published, who wrote it, where to find it (metadata). In Nextflow pipelines, metadata can be used to:

  • Track file-specific information throughout the workflow
  • Configure processes based on file characteristics
  • Group related files for joint analysis

We'll explore how to handle metadata in workflows. Starting with a simple datasheet containing basic file information, you'll learn how to:

  • Read and parse file metadata from CSV files
  • Create and manipulate metadata maps
  • Add new metadata fields during workflow execution
  • Use metadata to customize process behavior

These skills will help you build more robust and flexible pipelines that can handle complex file relationships and processing requirements.

In this tutorial, we'll tackle a common data processing scenario: handling multiple files where each file needs different processing based on its characteristics.

We have greeting files in different languages (French, German, Spanish, Italian, English), but we don't know which language each file contains. Our challenge is to:

  1. Identify the language in each file automatically
  2. Group files by language family (Germanic vs Romance languages)
  3. Customize the processing for each file based on its language and metadata
  4. Organize outputs by language group

This represents a typical workflow pattern where file-specific metadata drives processing decisions - exactly the kind of problem that metadata maps solve elegantly.

Let's dive in!

0. Warmup

0.1. Prerequisites

Before taking on this side quest you should:

  • Complete the Hello Nextflow tutorial
  • Understand basic Nextflow concepts (processes, channels, operators)

0.2. Starting Point

Let's move into the project directory.

Navigate to the project directory
cd side-quests/metadata

You can set VSCode to focus on this directory:

Open VSCode in current directory
code .

You'll find a data directory containing a samplesheet and a main workflow file.

Directory contents
> tree
.
├── data
│   ├── bonjour.txt
│   ├── ciao.txt
│   ├── guten_tag.txt
│   ├── hallo.txt
│   ├── hello.txt
│   ├── hola.txt
│   ├── salut.txt
│   └── samplesheet.csv
├── main.nf
└── nextflow.config

The datasheet contains information about different files and some associated data that we will use in this exercise to tailor our analysis to each file. Each data file contains greetings in different languages, but we don't know what language they are in. The datasheet has 3 columns:

  • id: self-explanatory, an ID given to the file
  • character: a character name that we will use later to draw different creatures
  • recording: paths to .txt files that contain phrases in different languages
samplesheet.csv
id,character,recording
sampleA,squirrel,/workspaces/training/side-quests/metadata/data/bonjour.txt
sampleB,tux,/workspaces/training/side-quests/metadata/data/guten_tag.txt
sampleC,sheep,/workspaces/training/side-quests/metadata/data/hallo.txt
sampleD,turkey,/workspaces/training/side-quests/metadata/data/hello.txt
sampleE,stegosaurus,/workspaces/training/side-quests/metadata/data/hola.txt
sampleF,moose,/workspaces/training/side-quests/metadata/data/salut.txt
sampleG,turtle,/workspaces/training/side-quests/metadata/data/ciao.txt

1. Read in datasheet

1.1. Read in datasheet with splitCsv

Let's start by reading in the datasheet with splitCsv. In the main workflow file, you'll see that we've already started the workflow:

main.nf
workflow {

    ch_samplesheet = Channel.fromPath("./data/samplesheet.csv")

}

Note

Throughout this tutorial, we'll use the ch_ prefix for all channel variables to clearly indicate they are Nextflow channels.

main.nf (after)
    ch_samplesheet = Channel.fromPath("./data/samplesheet.csv")
        .splitCsv(header: true)
        .view()
main.nf (before)
    ch_samplesheet = Channel.fromPath("./data/samplesheet.csv")

We can use the splitCsv operator to split the datasheet into a channel of maps, where each map represents a row from the CSV file.

A map is a key-value data structure similar to dictionaries in Python, objects in JavaScript, or hashes in Ruby. For example:

// Groovy map
def my_map = [id:'sampleA', character:'squirrel']
println my_map.id  // Prints: sampleA

Try it yourself

You can run this example to see what maps look like:

Run map demo example
nextflow run examples/map_demo.nf

The header: true option tells Nextflow to use the first row of the CSV file as the header row; its values become the keys of each row map.
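
For comparison, here is a minimal sketch of what you would get without the header option: splitCsv emits each row as a plain list of values, and the header row itself comes through as data:

// Without header: true, rows become lists, not maps
Channel.fromPath("./data/samplesheet.csv")
    .splitCsv()
    .view()
// [id, character, recording]
// [sampleA, squirrel, /workspaces/training/side-quests/metadata/data/bonjour.txt]
// ...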

Let's see what the channel contains after reading with splitCsv. Run the pipeline with the view() operator we added above:

Read the datasheet
nextflow run main.nf
Read datasheet with splitCsv
 N E X T F L O W   ~  version 24.10.4

Launching `main.nf` [exotic_albattani] DSL2 - revision: c0d03cec83

[id:sampleA, character:squirrel, recording:/workspaces/training/side-quests/metadata/data/bonjour.txt]
[id:sampleB, character:tux, recording:/workspaces/training/side-quests/metadata/data/guten_tag.txt]
[id:sampleC, character:sheep, recording:/workspaces/training/side-quests/metadata/data/hallo.txt]
[id:sampleD, character:turkey, recording:/workspaces/training/side-quests/metadata/data/hello.txt]
[id:sampleE, character:stegosaurus, recording:/workspaces/training/side-quests/metadata/data/hola.txt]
[id:sampleF, character:moose, recording:/workspaces/training/side-quests/metadata/data/salut.txt]
[id:sampleG, character:turtle, recording:/workspaces/training/side-quests/metadata/data/ciao.txt]

We can see that each row from the CSV file has been converted into a map with keys matching the header row. Each map entry corresponds to a column in our datasheet:

  • id
  • character
  • recording

This format makes it easy to access specific fields from each file. For example, we can access the file ID with id or the txt file path with recording.

Let's access a specific column from the datasheet, the character column, and print it. We can use the Nextflow map operator to iterate over each item in our channel and return a specific entry of our map object:

main.nf (after)
        .splitCsv(header: true)
        .map{ row ->
            row.character
        }
        .view()
main.nf (before)
        .splitCsv(header: true)
        .view()

and rerun:

Print the creatures
nextflow run main.nf
Print the creatures
squirrel
tux
sheep
turkey
stegosaurus
moose
turtle

Success! We can use the map structures derived from our datasheet to access the values of individual columns for each row.

Now that we've successfully read in the datasheet and have access to the data in each row, we can begin implementing our pipeline logic.

1.2. Separate metadata and data

In the datasheet, we have both the input files and data about those files (id, character): the metadata. As we progress through the workflow, we will generate more metadata about each file. So far, every column from the datasheet has become a separate element in the channel items we derived using splitCsv(). Every process that consumes this channel would need to be configured with this structure in mind:

    input:
    tuple val(id), val(character), file(recording)

This means that as soon as somebody added an extra column to the datasheet, the workflow would start producing errors, because the shape of the channel would no longer match what the process expected. It also makes the process hard to share with others who might have slightly different input data, and we end up hard-coding variables into the process that aren't needed by the script block.

To avoid this, we need to find a way of keeping the channel structure consistent no matter how many columns the datasheet contains. We can do that by keeping all the metadata in a single element of the channel tuple, which we simply call the meta map.

Let's use this and separate our metadata from the file path. We'll use the map operator to restructure our channel elements into a tuple consisting of the meta map and file:

main.nf (after)
    .map { row ->
            [ [id: row.id, character: row.character], row.recording ]
    }
main.nf (before)
    .map { row ->
            row.character
    }

Let's run it:

View meta map
nextflow run main.nf
View meta map
 N E X T F L O W   ~  version 24.10.4

Launching `main.nf` [lethal_booth] DSL2 - revision: 0d8f844c07

[[id:sampleA, character:squirrel], /workspaces/training/side-quests/metadata/data/bonjour.txt]
[[id:sampleB, character:tux], /workspaces/training/side-quests/metadata/data/guten_tag.txt]
[[id:sampleC, character:sheep], /workspaces/training/side-quests/metadata/data/hallo.txt]
[[id:sampleD, character:turkey], /workspaces/training/side-quests/metadata/data/hello.txt]
[[id:sampleE, character:stegosaurus], /workspaces/training/side-quests/metadata/data/hola.txt]
[[id:sampleF, character:moose], /workspaces/training/side-quests/metadata/data/salut.txt]
[[id:sampleG, character:turtle], /workspaces/training/side-quests/metadata/data/ciao.txt]

Now, each element in the channel has only two items: the metadata map (e.g. [id:sampleA, character:squirrel]) and the data file described by that metadata (e.g. /workspaces/training/side-quests/metadata/data/bonjour.txt).

Now we can write processes to consume the channel without hard-coding the metadata items into the input specification:

    input:
    tuple val(meta), file(recording)

Additional columns in the datasheet will make more metadata available in the meta map, but won't change the channel shape.
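
For example, suppose the samplesheet gained an extra quality column (hypothetical, for illustration). Only the map step that builds the meta map needs to change; every downstream process still receives [meta, file]:

// Hypothetical extra 'quality' column folded into the meta map
.map { row ->
    [ [id: row.id, character: row.character, quality: row.quality], row.recording ]
}
// -> [[id:sampleA, character:squirrel, quality:high], .../bonjour.txt]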

Takeaway

In this section, you've learned:

  • Why metadata is important: Keeping metadata with your data preserves important file information throughout the workflow.
  • How to read in a datasheet: Using splitCsv to read CSV files with header information and transform rows into structured data
  • How to create a meta map: Separating metadata from file data using tuple structure [ [id:value, ...], file ]

2. Manipulating metadata

2.1. Copying input metadata to output channels

Now we want to process our files with unidentified languages. Let's add a process definition before the workflow that can identify the language in each file:

main.nf (after)
/*
 * Use langid to predict the language of each input file
 */
process IDENTIFY_LANGUAGE {

    container 'community.wave.seqera.io/library/pip_langid:b2269f456a5629ff'

    input:
    tuple val(meta), path(file)

    output:
    tuple val(meta), path(file), stdout

    script:
    """
    langid < ${file} -l en,de,fr,es,it | sed -E "s/.*\\('([a-z]+)'.*/\\1/" | tr -d '\\n'
    """
}

workflow {
main.nf (before)
workflow {

The tool langid is used for language identification. It comes pre-trained on a set of languages. For a given phrase, it outputs a language prediction and a probability score for each guess to the console. In the script section, we use sed to remove the probability score, clean up the string by removing newline characters, and return only the language prediction. Since the output is printed directly to the console, we use Nextflow’s stdout output qualifier to capture and pass the string as output.
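
To make the cleanup concrete, here is roughly what each stage of that command produces when run directly at the shell for one file (the probability score shown is illustrative):

Inspect langid output
> langid < bonjour.txt -l en,de,fr,es,it
('fr', -23.54)
> langid < bonjour.txt -l en,de,fr,es,it | sed -E "s/.*\('([a-z]+)'.*/\1/" | tr -d '\n'
fr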

Let's call the process in the workflow, then run it and view the output:

main.nf (after)
            [ [id: row.id, character: row.character], row.recording ]
        }

    ch_prediction = IDENTIFY_LANGUAGE(ch_samplesheet)
    ch_prediction.view()
main.nf (before)
            [ [id:row.id, character:row.character], row.recording ]
        }
        .view()
Identify languages
nextflow run main.nf
Identify languages
 N E X T F L O W   ~  version 24.10.4

Launching `main.nf` [voluminous_mcnulty] DSL2 - revision: f9bcfebabb

executor >  local (7)
[4e/f722fe] IDENTIFY_LANGUAGE (7) [100%] 7 of 7 ✔
[[id:sampleA, character:squirrel], /workspaces/training/side-quests/metadata/work/eb/f7148ebdd898fbe1136bec6a714acb/bonjour.txt, fr]
[[id:sampleB, character:tux], /workspaces/training/side-quests/metadata/work/16/71d72410952c22cd0086d9bca03680/guten_tag.txt, de]
[[id:sampleD, character:turkey], /workspaces/training/side-quests/metadata/work/c4/b7562adddc1cc0b7d414ec45d436eb/hello.txt, en]
[[id:sampleC, character:sheep], /workspaces/training/side-quests/metadata/work/ea/04f5d979429e4455e14b9242fb3b45/hallo.txt, de]
[[id:sampleF, character:moose], /workspaces/training/side-quests/metadata/work/5a/6c2b84bf8fadb98e28e216426be079/salut.txt, fr]
[[id:sampleE, character:stegosaurus], /workspaces/training/side-quests/metadata/work/af/ee7c69bcab891c40d0529305f6b9e7/hola.txt, es]
[[id:sampleG, character:turtle], /workspaces/training/side-quests/metadata/work/4e/f722fe47271ba7ebcd69afa42964ca/ciao.txt, it]

Neat! Each of our files now has a predicted language.

Note

"de" stands for "deutsch", the German word for "german"

You may have noticed something else: we kept the metadata of our files and associated it with our new piece of information. We achieved this by adding the meta map to the output tuple in the process:

main.nf
output:
tuple val(meta), path(file), stdout

This is a useful way to ensure the metadata stays connected with any new information that is generated.

Note

Another compelling reason to use meta maps in this way is that they make it easier to associate related results that share the same identifiers. As you learned in "Hello Nextflow", you can't rely on the order of items in channels to match results across them. Instead, you must use keys to associate data correctly - and meta maps provide an ideal structure for this purpose. We explore this use case in detail in Splitting & Grouping.

2.2. Using process outputs to augment metadata

Given that this is more data about the files, let's add it to our meta map. We can use the map operator again to create a new key lang and set the value to the predicted language:

main.nf (after)
    ch_prediction = IDENTIFY_LANGUAGE(ch_samplesheet)
    ch_languages = ch_prediction
        .map { meta, file, lang ->
            [meta + [lang: lang], file]
        }
        .view()
}
main.nf (before)
    ch_prediction = IDENTIFY_LANGUAGE(ch_samplesheet)
    ch_prediction.view()

The map operator takes each channel element and processes it to create a modified version. Inside the closure { meta, file, lang -> ... }, we then take the existing meta map, create a new map [lang:lang], and merge both together using +.

The + operator in Groovy merges two maps together. So if our original meta was [id:sampleA, character:squirrel], then meta + [lang:'fr'] creates a new map: [id:sampleA, character:squirrel, lang:fr].
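
Two properties of this merge are worth knowing; this small Groovy sketch can be run on its own:

// Merging with + creates a brand-new map; the original is untouched
def meta = [id: 'sampleA', character: 'squirrel']
def merged = meta + [lang: 'fr']
println merged  // [id:sampleA, character:squirrel, lang:fr]
println meta    // [id:sampleA, character:squirrel] (unchanged)

// If the same key exists on both sides, the right-hand map wins
println([lang: 'fr'] + [lang: 'de'])  // [lang:de]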

Note

The + notation with maps creates an entirely new map object, which is what we want. If we'd done something like meta.lang = lang we'd have been modifying the original object, which can lead to unpredictable effects.

View new meta map key
nextflow run main.nf -resume
View new meta map key
 N E X T F L O W   ~  version 24.10.4

Launching `main.nf` [cheeky_fermat] DSL2 - revision: d096281ee4

[4e/f722fe] IDENTIFY_LANGUAGE (7) [100%] 7 of 7, cached: 7 ✔
[[id:sampleA, character:squirrel, lang:fr], /workspaces/training/side-quests/metadata/work/eb/f7148ebdd898fbe1136bec6a714acb/bonjour.txt]
[[id:sampleB, character:tux, lang:de], /workspaces/training/side-quests/metadata/work/16/71d72410952c22cd0086d9bca03680/guten_tag.txt]
[[id:sampleC, character:sheep, lang:de], /workspaces/training/side-quests/metadata/work/ea/04f5d979429e4455e14b9242fb3b45/hallo.txt]
[[id:sampleD, character:turkey, lang:en], /workspaces/training/side-quests/metadata/work/c4/b7562adddc1cc0b7d414ec45d436eb/hello.txt]
[[id:sampleF, character:moose, lang:fr], /workspaces/training/side-quests/metadata/work/5a/6c2b84bf8fadb98e28e216426be079/salut.txt]
[[id:sampleE, character:stegosaurus, lang:es], /workspaces/training/side-quests/metadata/work/af/ee7c69bcab891c40d0529305f6b9e7/hola.txt]
[[id:sampleG, character:turtle, lang:it], /workspaces/training/side-quests/metadata/work/4e/f722fe47271ba7ebcd69afa42964ca/ciao.txt]

Nice, we expanded our meta map with new information we gathered in the pipeline. Before the map step, each element emitted by IDENTIFY_LANGUAGE looked like this:

[meta, file, lang]  // e.g. [[id:sampleA, character:squirrel], bonjour.txt, fr]

After the map step, the language moves into the meta map and each element returns to the [meta, file] shape.

2.3. Assign a language group using conditionals

Now that we have our language predictions, let's use this information to assign the files to groups. In our example data, we have provided datasets that belong either to Germanic (English, German) or Romance (French, Spanish, Italian) languages.

We can add another map operator to assign each file to a group (add this below your last map operation).

main.nf (after)
        .map { meta, file, lang ->
            [meta + [lang: lang], file]
        }
        .map { meta, file ->

            def lang_group = "unknown"
            if (meta.lang.equals("de") || meta.lang.equals('en')) {
                lang_group = "germanic"
            }
            else if (meta.lang in ["fr", "es", "it"]) {
                lang_group = "romance"
            }

            [meta + [lang_group: lang_group], file]
        }
        .view()
main.nf (before)
        .map { meta, file, lang ->
            [meta + [lang: lang], file]
        }
        .view()

Let's rerun it:

View language groups
nextflow run main.nf -resume
View language groups
 N E X T F L O W   ~  version 24.10.4

Launching `main.nf` [wise_almeida] DSL2 - revision: 46778c3cd0

[da/652cc6] IDENTIFY_LANGUAGE (7) [100%] 7 of 7, cached: 7 ✔
[[id:sampleA, character:squirrel, lang:fr, lang_group:romance], /workspaces/training/side-quests/metadata/data/bonjour.txt]
[[id:sampleB, character:tux, lang:de, lang_group:germanic], /workspaces/training/side-quests/metadata/data/guten_tag.txt]
[[id:sampleC, character:sheep, lang:de, lang_group:germanic], /workspaces/training/side-quests/metadata/data/hallo.txt]
[[id:sampleD, character:turkey, lang:en, lang_group:germanic], /workspaces/training/side-quests/metadata/data/hello.txt]
[[id:sampleE, character:stegosaurus, lang:es, lang_group:romance], /workspaces/training/side-quests/metadata/data/hola.txt]
[[id:sampleF, character:moose, lang:fr, lang_group:romance], /workspaces/training/side-quests/metadata/data/salut.txt]
[[id:sampleG, character:turtle, lang:it, lang_group:romance], /workspaces/training/side-quests/metadata/data/ciao.txt]

Let's understand how this transformation works. The map operator again takes a closure that processes each element in the channel. Inside the closure, we use an if/else statement to create a new language group classification.

Here's what's happening step by step:

  • Create a new field lang_group: We set a default of unknown
  • Extract existing metadata: We access meta.lang (the language we predicted earlier) from the existing meta map
  • Apply conditional logic: If meta.lang is de or en, we set lang_group to germanic; if it is fr, es, or it, we set it to romance
  • Merge with existing metadata: We use meta + [lang_group:lang_group], in the same way as before, to combine the existing meta map with our new field

The resulting channel elements maintain their [meta, file] structure, but the meta map now includes this new classification.
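
If you prefer, the same classification can be written more compactly with a lookup map and Groovy's elvis operator (?:). This is an equivalent sketch, not a required change:

.map { meta, file ->
    // Look up the group; fall back to 'unknown' for unlisted languages
    def groups = [de: 'germanic', en: 'germanic', fr: 'romance', es: 'romance', it: 'romance']
    [ meta + [lang_group: groups[meta.lang] ?: 'unknown'], file ]
}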

Takeaway

In this section, you've learned how to:

  • Apply input metadata to output channels: Copying metadata in this way allows us to associate results later on based on metadata content.
  • Create custom keys: You created two new keys in your meta map, merging them into the existing meta map with meta + [new_key:value]: one based on a computed value from a process, and one based on a condition you set in the map operator.

These allow you to associate new and existing metadata with files as you progress through your pipeline.


3. Use meta map information in a process

Let's make some fun characters say the phrases from the files in our channel. In the Hello Nextflow training, you already encountered the cowpy package, a Python implementation of a tool called cowsay that generates ASCII art to display arbitrary text inputs in a fun way. We will re-use a process from there.

Copy in the process before your workflow block:

main.nf (after)
/*
 * Generate ASCII art with cowpy
 */
process COWPY {

    publishDir "results/", mode: 'copy'

    container 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273'

    input:
    tuple val(meta), path(input_file)

    output:
    tuple val(meta), path("cowpy-${input_file}")

    script:
    """
    cat $input_file | cowpy -c "stegosaurus" > cowpy-${input_file}
    """

}

workflow {
main.nf (before)
workflow {

3.1. Use meta map information in the process definition

Let's run our files through COWPY and remove our view statement:

main.nf (after)
            [ meta + [lang_group: lang_group], file ]
        }
    COWPY(ch_languages)
main.nf (before)
            [meta + [lang_group: lang_group], file]
        }
        .view()

The process definition as provided would direct results to the results folder, but let's make a tweak to be a little smarter. Given we have been trying to figure out what languages our samples were in, let's group the samples by language group in the output directory. Earlier, we added the language group to the meta map. We can access this key in the process and use it in the publishDir directive:

main.nf (after)
process COWPY {

    publishDir "results/${meta.lang_group}", mode: 'copy'

    container 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273'
main.nf (before)
process COWPY {

    publishDir "results/", mode: 'copy'

    container 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273'

Let's run this:

Use cowpy
nextflow run main.nf -resume

You should now see a new folder called results:

Results folder
results/
├── germanic
│   ├── cowpy-guten_tag.txt
│   ├── cowpy-hallo.txt
│   └── cowpy-hello.txt
└── romance
    ├── cowpy-bonjour.txt
    ├── cowpy-ciao.txt
    ├── cowpy-hola.txt
    └── cowpy-salut.txt

Success! All our phrases are correctly sorted, and we can now see which of them corresponds to which language group.

Let's take a look at cowpy-salut.txt:

cowpy-salut.txt
 ____________________
/ Salut, ça va?      \
\ à plus dans le bus /
 --------------------
\                             .       .
 \                           / `.   .' "
  \                  .---.  <    > <    >  .---.
   \                 |    \  \ - ~ ~ - /  /    |
         _____          ..-~             ~-..-~
        |     |   \~~~\.'                    `./~~~/
       ---------   \__/                        \__/
      .'  O    \     /               /       \  "
     (_____,    `._.'               |         }  \/~~~/
      `----.          /       }     |        /    \__/
            `-.      |       /      |       /      `. ,~~|
                ~-.__|      /_ - ~ ^|      /- _      `..-'
                     |     /        |     /     ~-.     `-. _  _  _
                     |_____|        |_____|         ~ - . _ _ _ _ _>

Look through the other files. All phrases should be spoken by the fashionable stegosaurus.

How did this work? The publishDir directive is evaluated at runtime, when the process executes. Each task gets its own meta map from the input tuple. When the directive is evaluated, ${meta.lang_group} is replaced with the actual language group value for that dataset, creating dynamic paths like results/romance.
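
The same per-task evaluation works in other directives too. For example, a tag directive (a sketch; not part of the exercise) would label each task with its metadata:

// Hypothetical addition to the COWPY process definition
tag "${meta.id} (${meta.lang_group})"

With this in place, the run log would show entries like COWPY (sampleA (romance)) instead of bare task numbers.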

3.2. Customize the character

In our datasheet, we have another column: character. To tailor the tool parameters per file, we can also access information from the meta map in the script section. This is really useful in cases where a tool should have different parameters for each file.

Let's customize the characters by changing the cowpy command:

main.nf (after)
    script:
    """
    cat $input_file | cowpy -c ${meta.character} > cowpy-${input_file}
    """
main.nf (before)
    script:
    """
    cat $input_file | cowpy -c "stegosaurus" > cowpy-${input_file}
    """

Let's run this:

Use cowpy
nextflow run main.nf -resume

and take another look at our French phrase:

romance/cowpy-salut.txt
 ____________________
/ Salut, ça va?      \
\ à plus dans le bus /
 --------------------
  \
   \   \_\_    _/_/
    \      \__/
           (oo)\_______
           (__)\       )\/\
               ||----w |
               ||     ||

This approach differs from using pipeline parameters (params), which generally apply the same configuration to all files in your workflow. By leveraging metadata applied to each item in a channel, you can fine-tune process behavior on a per-file basis.
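'
The contrast is easy to see side by side. This sketch assumes a hypothetical params.character pipeline parameter:

# With a pipeline parameter: every task renders the same character
cat $input_file | cowpy -c ${params.character} > cowpy-${input_file}

# With metadata: each task gets its own character from its meta map
cat $input_file | cowpy -c ${meta.character} > cowpy-${input_file}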

3.2.1. Exploiting metadata at the workflow level

In the example above, by using a property of the meta map in the script block, we introduce a hard requirement on the properties that must be present. Anyone running with a samplesheet that did not contain the character property would encounter an error. Moreover, the process input block only declares that a meta map is required, so someone trying to use this process in another workflow might not notice immediately that the character property is needed.

A better approach is to make the required metadata property an explicit input rather than accessing it from within the meta map. This makes the requirement clear and provides better error messages. Here's how to refactor the COWPY process:

main.nf (after)
    input:
    tuple val(meta), val(character), path(input_file)
main.nf (before)
    input:
    tuple val(meta), path(input_file)

Then, at the workflow level, we extract the character property from the metadata and pass it as a separate input:

main.nf (after)
    COWPY(ch_languages.map { meta, file -> [meta, meta.character, file] })
main.nf (before)
    COWPY(ch_languages)

Why this approach is better:

  1. Clear Requirements: The process input explicitly shows that character is required
  2. Better Error Messages: If character is missing, Nextflow will fail at the input stage with a clear error
  3. Self-Documenting: Anyone using this process can immediately see what inputs are needed
  4. Reusable: The process is easier to reuse in other contexts

This highlights an important design principle: Use the meta map for optional, descriptive information, but extract required values as explicit inputs. The meta map is excellent for keeping channel structures clean and preventing arbitrary channel structures, but for mandatory elements that are directly referenced in a process, extracting them as explicit inputs creates more robust and maintainable code.
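
Putting the pieces together, the fully refactored COWPY might look like the sketch below. Note that the script block now references the explicit character input instead of reaching into the meta map:

process COWPY {

    publishDir "results/${meta.lang_group}", mode: 'copy'

    container 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273'

    input:
    tuple val(meta), val(character), path(input_file)

    output:
    tuple val(meta), path("cowpy-${input_file}")

    script:
    """
    cat $input_file | cowpy -c ${character} > cowpy-${input_file}
    """
}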

Takeaway

In this section, you've learned how to:

  • Tweak directives using meta values: Using meta map values in publishDir directives to create dynamic output paths based on the file's metadata

  • Tweak the script section based on meta values: Customizing tool parameters per file using meta information in the script section


Summary

In this side quest, you've explored how to effectively work with metadata in Nextflow workflows. This pattern of keeping metadata explicit and attached to the data is a core best practice in Nextflow that enables building robust, maintainable bioinformatics workflows.

Here's what you've learned:

  1. Reading and Structuring Metadata: Reading CSV files and creating organized metadata maps that stay associated with your data files

  2. Expanding Metadata During Workflow: Adding new information to your metadata as your pipeline progresses by adding process outputs and deriving values through conditional logic

  3. Customizing Process Behavior: Using metadata to adapt how processes handle different files

This approach offers several advantages over hardcoding file information:

  • File metadata stays associated with files throughout the workflow
  • Process behavior can be customized per file
  • Output organization can reflect file metadata
  • File information can be expanded during pipeline execution

Key Concepts

  • Reading Datasheets & creating meta maps
Channel.fromPath('samplesheet.csv')
  .splitCsv(header: true)
  .map { row ->
      [ [id:row.id, character:row.character], row.recording ]
  }
  • Adding new keys to the meta map

  • based on process output:

.map { meta, file, lang ->
  [ meta + [lang:lang], file ]
}
  • based on a conditional clause:

.map { meta, file ->
    def lang_group = "unknown"
    if (meta.lang.equals("de") || meta.lang.equals("en")) {
        lang_group = "germanic"
    } else if (meta.lang in ["fr", "es", "it"]) {
        lang_group = "romance"
    }
    [ meta + [lang_group: lang_group], file ]
}
  • Using meta values in Process Directives
publishDir "results/${meta.lang_group}", mode: 'copy'
  • Adapting tool parameters for individual files
cat $input_file | cowpy -c ${meta.character} > cowpy-${input_file}

Resources