Continuous Integration

Github actions can perform a test run of your workflow using the minimal test-dataset. Just like the Dockerhub continuous integration, the actions are performed upon each push to the dev branch.

In order to set this up, we will need to specify both a test configuration profile and a ci.yml workflow file.

Test profile

The test configuration profile contains a series of input parameters that will be used as inputs to the workflow for the test run. These parameters point to the URL of the test-dataset hosted on GitHub.

Unfortunately, wildcard glob patterns are not supported via html links, so the following is not valid:

params{
  input = "https://raw.githubusercontent.com/nf-core/test-datasets/circrna/fastq/*_{1,2}.fastq.gz"
}

Here is a valid test.config file for our simulated RNA-Seq dataset we have been working with:

params {
    config_profile_name = 'Test profile'
    config_profile_description = 'Test dataset to check pipeline function'

    // Limit resources so that this can run on GitHub Actions
    max_cpus = 2
    max_memory = 6.GB
    max_time = 48.h

    // Input data for test data
    input = 'https://raw.githubusercontent.com/BarryDigby/CRT_workshop/master/docs/source/test_datasets/samples.csv'
    fasta = 'https://raw.githubusercontent.com/BarryDigby/CRT_workshop/master/docs/source/test_datasets/chrI.fa'
    gtf = 'https://raw.githubusercontent.com/BarryDigby/CRT_workshop/master/docs/source/test_datasets/chrI.gtf'
    outdir = 'test_outdir/'
}

Save the file to conf/test.config in your repository.

Sample File

To overcome the html glob limitation, we need to construct an input samples file.

See below for a valid example of a samples.csv file, specifying the links to each fastq file:

Sample_ID,Read1,Read2
cel_N2_1,https://raw.githubusercontent.com/nf-core/test-datasets/circrna/fastq/N2_rep1_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/circrna/fastq/N2_rep1_2.fastq.gz
cel_N2_2,https://raw.githubusercontent.com/nf-core/test-datasets/circrna/fastq/N2_rep2_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/circrna/fastq/N2_rep2_2.fastq.gz
cel_N2_3,https://raw.githubusercontent.com/nf-core/test-datasets/circrna/fastq/N2_rep3_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/circrna/fastq/N2_rep3_2.fastq.gz
fust1_1,https://raw.githubusercontent.com/nf-core/test-datasets/circrna/fastq/fust1_rep1_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/circrna/fastq/fust1_rep1_2.fastq.gz
fust1_2,https://raw.githubusercontent.com/nf-core/test-datasets/circrna/fastq/fust1_rep2_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/circrna/fastq/fust1_rep2_2.fastq.gz
fust1_3,https://raw.githubusercontent.com/nf-core/test-datasets/circrna/fastq/fust1_rep3_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/circrna/fastq/fust1_rep3_2.fastq.gz

Instead of supplying the path to sequencing reads as params.input, we can provide the samples.csv file. Save this file in your directory to test it out.

We will need to use custom functions to read in the file and stage them as inputs for our workflow.

See the nextflow script below. Save it and run nextflow run <script_name>.nf --input 'samples.csv'

Note

We are testing this locally, so we are not deploying from Github. If you are not in the directory containing the nextflow.config file, specify it’s path with the -c argument.

#!/usr/bin/env nextflow

// parse input data
if(has_extension(params.input, ".csv")){

   csv_file = file(params.input, checkIfExists: true)
   ch_input = extract_data(csv_file)

}else{

   exit 1, "error: The sample input file must have the extension '.csv'."

}

// stage input data
( ch_qc_reads, ch_raw_reads) = ch_input.into(2)

ch_raw_reads.view()

process FASTQC{
    tag "${base}"
    publishDir params.outdir, mode: 'copy',
        saveAs: { params.save_qc_intermediates ? "fastqc/${it}" : null }

    when:
    params.run_qc

    input:
    tuple val(base), file(reads) from ch_qc_reads

    output:
    tuple val(base), file("*.{html,zip}") into ch_multiqc

    script:
    """
    fastqc -q $reads
    """
}

/*
================================================================================
                            AUXILLARY FUNCTIONS
================================================================================
*/

// Check if a row has the expected number of item
def checkNumberOfItem(row, number) {
    if (row.size() != number) exit 1, "error:  Invalid CSV input - malformed row (e.g. missing column) in ${row}, consult documentation."
    return true
}

// Return file if it exists
def return_file(it) {
    if (!file(it).exists()) exit 1, "error: Cannot find supplied FASTQ input file. Check file: ${it}"
    return file(it)
}

// Check file extension
def has_extension(it, extension) {
    it.toString().toLowerCase().endsWith(extension.toLowerCase())
}

// Parse samples.csv file
def extract_data(csvFile){
    Channel
        .fromPath(csvFile)
        .splitCsv(header: true, sep: ',')
        .map{ row ->

        def expected_keys = ["Sample_ID", "Read1", "Read2"]
        if(!row.keySet().containsAll(expected_keys)) exit 1, "error: Invalid CSV input - malformed column names. Please use the column names 'Sample_ID', 'Read1', 'Read2'."

        checkNumberOfItem(row, 3)

        def samples = row.Sample_ID
        def read1 = row.Read1.matches('NA') ? 'NA' : return_file(row.Read1)
        def read2 = row.Read2.matches('NA') ? 'NA' : return_file(row.Read2)

        if( samples == '' || read1 == '' || read2 == '' ) exit 1, "error: a field does not contain any information. Please check your CSV file"
        if( !has_extension(read1, "fastq.gz") && !has_extension(read1, "fq.gz") && !has_extension(read1, "fastq") && !has_extension(read1, "fq")) exit 1, "error: A R1 file has a non-recognizable FASTQ extension. Check: ${r1}"
        if( !has_extension(read2, "fastq.gz") && !has_extension(read2, "fq.gz") && !has_extension(read2, "fastq") && !has_extension(read2, "fq")) exit 1, "error: A R2 file has a non-recognizable FASTQ extension. Check: ${r2}"

        // output tuple mimicking fromFilePairs
        [ samples, [read1, read2] ]

        }
}

Note

nextflow will only download the files once they are passed to a process.

Note

note to barry: integrate these functions to students main.nf before proceeding.

CI.yml

‘All’ that is left is to set up the Github actions file and integrate two profiles, test and docker.

Create the following file in your directory: .github/workflows/ci.yml:

Warning

I cannot stress how important indentation is with .yml files.

name: CI
# This workflow runs the pipeline with the minimal test dataset to check that it completes without any syntax errors
on:
  push:
    branches:
      - dev
  pull_request:
  release:
    types: [published]

jobs:
  test:
    name: Run workflow tests
    # Only run on push if this is the nf-core dev branch (merged PRs)
    if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'BarryDigby/rtp_workshop') }}
    runs-on: ubuntu-latest
    env:
      NXF_VER: ${{ matrix.nxf_ver }}
      NXF_ANSI_LOG: false
    strategy:
      matrix:
        # Nextflow versions: specify nextflow version to use
        nxf_ver: ['21.04.0', '']
    steps:
      - name: Check out pipeline code
        uses: actions/checkout@v2.4.0

      - name: Check if Dockerfile or Conda environment changed
        uses: technote-space/get-diff-action@v4
        with:
          FILES: |
            Dockerfile
            environment.yml

      - name: Build new docker image
        if: env.MATCHED_FILES
        run: docker build --no-cache . -t barryd237/test:dev

      - name: Pull docker image
        if: ${{ !env.MATCHED_FILES }}
        run: |
          docker pull barryd237/test:dev
          docker tag barryd237/test:dev barryd237/test:dev

      - name: Install Nextflow
        env:
          CAPSULE_LOG: none
        run: |
          wget https://github.com/nextflow-io/nextflow/releases/download/v21.04.1/nextflow
          sudo chmod 777 ./nextflow
          sudo mv nextflow /usr/local/bin/

      - name: Run pipeline with test data
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker

In your nexflow.config file, add the following:

profiles {
    docker {
        docker.enabled = true
        singularity.enabled = false
        podman.enabled = false
        shifter.enabled = false
        charliecloud.enabled = false
        docker.runOptions = '-u \$(id -u):\$(id -g)'
    }
    test { includeConfig 'conf/test.config' }
}

In your conf/test.config file, add the following:

// overwrite the -B bind path we used for singularity
// Docker will fail trying to use it
process{
  containerOptions = null
}

Add, commit and push the changes and cross your fingers!