Nextflow for Genomics

This training course is intended for researchers in genomics and related fields who are interested in developing or customizing data analysis pipelines. It builds on the Hello Nextflow beginner training and demonstrates how to use Nextflow in a genomics context.

Specifically, this course demonstrates how to implement a simple variant calling pipeline with GATK (Genome Analysis Toolkit), a widely used software package for analyzing high-throughput sequencing data.

Let's get started! Click on the "Open in GitHub Codespaces" button below to launch the training environment (preferably in a separate tab), then read on while it loads.

Open in GitHub Codespaces

Learning objectives

By working through this course, you will learn how to apply foundational Nextflow concepts and tooling to a typical genomics use case.

By the end of this course, you will be able to:

  • Write a linear workflow to apply variant calling to a single sample
  • Handle accessory files such as index files and reference genome resources appropriately
  • Leverage Nextflow's dataflow paradigm to parallelize per-sample variant calling
  • Implement multi-sample variant calling using relevant channel operators
  • Configure pipeline execution, and manage and optimize resource allocations
  • Implement per-step and end-to-end pipeline tests that handle genomics-specific idiosyncrasies appropriately

Prerequisites

The course assumes at least basic familiarity with the following:

  • Tools and file formats commonly used in genomics
  • The command line
  • Foundational Nextflow concepts and tooling covered in the Hello Nextflow beginner training

For technical requirements and environment setup, see the Environment Setup mini-course.