St. Jude HPCF Configuration
All nf-core pipelines have been successfully configured for use on the High Performance Research Cluster (HPCF) at St. Jude Children’s Research Hospital. The HPCF cluster is using IBM Spectrum LSF for workload management and job scheduling.
To use, run the pipeline with -profile stjude
. This will download and launch the stjude.config
which has been pre-configured with a setup suitable for the St. Jude HPCF cluster. Using this profile, a Singularity image containing all of the required software will be used for execution of the pipeline.
Running the Pipeline
Before running the pipeline, you will need to prepare a Nextflow
runtime environment to launch the workflow.
You can do this by issuing the command below (not recommended):
## Load nextflow module
module load nextflow/23.10.0
You can also use a Nextflow
binary installed in your own environment for better control of software version. Nextflow
can be
easily installed via conda
or mamba
using the following command:
<conda/mamba> create -n nextflow -c bioconda nextflow={VERSION}
conda activate nextflow
## To exit: conda deactivate
The conda
or mamba
will automatically sets up the necessary environment for nextflow
execution.
Resource Allocation
The St. Jude HPCF configuration includes the following resource limits:
- Max CPUs: 32
- Max Memory: 1024 GB
- Max Accelerator (NVIDIA TESLA GPU): 8
- Max Time: 240 hours (10 days)
Queue Selection
The configuration automatically selects the appropriate queue based on your task requirements:
- GPU tasks: ‘gpu’
- Short tasks (<30 min): ‘short’
- High memory tasks (>512 GB): ‘large_mem’
- Default: ‘standard’
GPU Usage
In St. Jude’s HPCF cluster, the GPU jobs will be dispatched the gpu
queue.
To use GPUs in your pipeline:
- Set the
accelerator
option in your process directives. This can configured by:
- Adding directives in each process specifically, or
- Using
process {}
-based label selectors (e.g. label ‘process_gpu’)
- (Optional) If you want to use docker (unsupported) or singularity (supported) on HPCF, you need to add the following container options:
containerOptions = {
workflow.containerEngine == "singularity" ? '--nv':
( workflow.containerEngine == "docker" ? '--gpus all': null )
}
Singularity Configuration
Singularity is enabled by default with the following settings:
- Auto-mounts enabled
- Environment variables whitelist: SINGULARITY_TMPDIR, TMPDIR, CUDA_VISIBLE_DEVICES
- Pull timeout: 3 hours
Singularity Cache Directory
There is no institution-level directory setup for caching singularity images. To set the cache directory, you can try the following two ways:
- Add
singularity.cacheDir = /path/to/your/singularity/cache
in your nextflow config file. - Set nextflow’s environment variable:
export NXF_SINGULARITY_CACHE=/path/to/your/singularity/cache
in your shell’s profile file (e.g.~/<.bashrc/.zshrc>
).
[!NOTE] You will need an account to use the HPC cluster at St. Jude Children’s Research Hospital in order to run the pipeline. If in doubt, contact IT.
[!NOTE] Nextflow will need to submit the jobs via the LSF job scheduler to the HPC cluster, and as such, the commands above will have to be executed on one of the login nodes. If in doubt, contact IT.
Support
For any issues or questions, please contact:
Profile Contact: Haidong Yi (hyi@stjude.org)
Config file
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for St. Jude Children's Research Hospital's High Performance Research Cluster (HPCF)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Author: Haidong Yi
Mail: hyi@stjude.org
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/
params {
config_profile_contact = "Haidong Yi (hyi@stjude.org)"
config_profile_description = "St. Jude Children's Research Hospital HPC cluster (HPCF) profile"
config_profile_url = "https://www.stjude.org/"
max_cpus = 64
max_memory = 1024.GB
max_time = 240.h
}
process {
// use `resourceLimits` directive to control resource
// Note: `resourceLimits` is a new feature starting from version 24.04.0.
// To use it, `Nextflow` should be upgraded to 24.04.0.
// see: https://github.com/nf-core/tools/issues/2923
resourceLimits = [
memory: 1024.GB,
cpus: 32,
time: 240.h
]
executor = 'lsf'
scratch = false
cache = 'lenient'
maxRetries = 3
errorStrategy = { task.exitStatus in [143, 137, 134, 139, 140] ? 'retry' : 'finish' }
afterScript = 'sleep 10'
// to avoid fail when using storeDir for missing output
beforeScript = """
module load singularity/4.1.1
export SINGULARITY_TMPDIR="${System.getenv('TMPDIR')}" # The 'TMPDIR' environment variable will be set up by LSF after a job is submitted.
"""
// queue selection based on task configs
// if urgent, change default queue from 'standard' to 'priority'
queue = {
if (task.accelerator) {
'gpu'
}
else if (task.time < 30.min) {
"short"
}
else if (task.memory > 512.GB) {
"large_mem"
}
else {
"standard"
}
}
// clusterOptions for gpu task:
// NOTE: We use GPU exclusively in each job
clusterOptions = { task.accelerator ? "-gpu \"num=${task.accelerator}/host:mode=shared:j_exclusive=yes\"" : null }
}
singularity {
envWhitelist = "SINGULARITY_TMPDIR,TMPDIR,CUDA_VISIBLE_DEVICES"
// allow the tmp dir and GPU visible devices visible in the containers
enabled = true
autoMounts = true
runOptions = '-p -B "$TMPDIR"'
pullTimeout = "3 hours"
}
// clean the generated files in the working directory
cleanup = true
executor {
name = 'lsf'
queueSize = 100
perTaskReserve = true
perJobMemLimit = false
submitRateLimit = "10/1sec"
exitReadTimeout = "5 min"
jobName = {
task.name
.replace("[", "(")
.replace("]", ")")
.replace(" ", "_")
}
}