rnaseq v1.3
The rnaseq workflow follows the FAIR principles. It is written in the Nextflow DSL2 language, runs the software it uses in a Singularity container, is optimized for computing resources (CPU, memory), and is intended for use on a compute cluster with a Slurm scheduler.
Install the rnaseq workflow and build the Singularity image
Clone the rnaseq Git repository and build a local Singularity image from the provided Singularity definition file. Building the image requires system administrator rights.
git clone https://forgemia.inra.fr/lpgp/rnaseq.git
sudo singularity build ./rnaseq/singularity/rnaseq.sif ./rnaseq/singularity/rnaseq.def
Usage examples
The design.csv file must be comma-separated and must have a header line naming the ID, R1 and R2 columns. The table below shows the expected content.
ID | R1 | R2 |
---|---|---|
A | /path/to/targetA_R1.fa.gz | /path/to/targetA_R2.fa.gz |
B | /path/to/targetB_R1.fa.gz | /path/to/targetB_R2.fa.gz |
C | /path/to/targetC_R1.fa.gz | /path/to/targetC_R2.fa.gz |
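A design file with the wrong separator or a missing column is easy to catch before submitting a job. The following is a minimal sketch, not part of the workflow: `check_design` is a hypothetical helper, and it assumes the header line is spelled exactly `ID,R1,R2` with no spaces. It does not check that the read files exist.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with the workflow): sanity-check a
# design file before launching Nextflow.
check_design() {
    local design="$1"
    local header
    # Assumption: the header is exactly "ID,R1,R2" (comma-separated, no spaces).
    header="$(head -n 1 "$design" | tr -d '\r')"
    if [ "$header" != "ID,R1,R2" ]; then
        echo "unexpected header: $header" >&2
        return 1
    fi
    # Every non-empty data row must have exactly three non-empty fields.
    awk -F',' 'NR > 1 && NF > 0 {
        if (NF != 3 || $1 == "" || $2 == "" || $3 == "") {
            printf("line %d: expected ID,R1,R2\n", NR) > "/dev/stderr"
            bad = 1
        }
    }
    END { exit bad }' "$design"
}
```

Calling `check_design "${PWD}/design.csv" || exit 1` at the top of a submission script stops the job before any compute time is spent on a malformed file.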
Bulk rnaseq
Genome alignment with STAR and HTSEQ-count
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
module load bioinfo/Nextflow/21.10.6
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "${PWD}/design.csv" \
--genome "${PWD}/genome.fa.gz" \
--gff_gtf "${PWD}/annot.gtf.gz" \
--method "star_htseq-count" \
--sjdbOverhang 80 \
--clip_r1 12 \
--three_prime_clip_r1 2 \
--clip_r2 12 \
--three_prime_clip_r2 2 \
--out_dir "${PWD}/results"
Transcriptome alignment with Salmon (SAF mode)
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
module load bioinfo/Nextflow/21.10.6
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "${PWD}/design.csv" \
--method "salmon_saf" \
--genome "${PWD}/genome.fa.gz" \
--transcriptome "${PWD}/transcriptome.fa.gz" \
--clip_r1 12 \
--three_prime_clip_r1 2 \
--clip_r2 12 \
--three_prime_clip_r2 2 \
--out_dir "${PWD}/results"
Genome alignment with STARsolo for BRB chemistry
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
module load bioinfo/Nextflow/21.10.6
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "${PWD}/design.csv" \
--genome "${PWD}/genome.fa.gz" \
--gff_gtf "${PWD}/annot.gtf.gz" \
--method "star_solo" \
--chemistry "brb" \
--whitelist "${PWD}/whitelist.txt" \
--out_dir "${PWD}/results"
Transcriptome alignment with alevin-fry for BRB chemistry
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
module load bioinfo/Nextflow/21.10.6
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "${PWD}/design.csv" \
--genome "${PWD}/genome.fa.gz" \
--gff_gtf "${PWD}/annot.gtf.gz" \
--method "alevin_fry" \
--chemistry "brb" \
--whitelist "${PWD}/whitelist.txt" \
--out_dir "${PWD}/results"
Single cell
Genome alignment with STARsolo for Chromium (v2/v3) chemistry
# Get the 10x Chromium v2 whitelist
wget https://github.com/10XGenomics/cellranger/raw/master/lib/python/cellranger/barcodes/737K-august-2016.txt
# Get the 10x Chromium v3 whitelist
wget https://github.com/10XGenomics/cellranger/raw/master/lib/python/cellranger/barcodes/3M-february-2018.txt.gz
gunzip 3M-february-2018.txt.gz
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
module load bioinfo/Nextflow/21.10.6
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "${PWD}/design.csv" \
--genome "${PWD}/genome.fa.gz" \
--gff_gtf "${PWD}/annot.gtf.gz" \
--method "star_solo" \
--chemistry "10xv3" \
--whitelist "${PWD}/3M-february-2018.txt" \
--out_dir "${PWD}/results"
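The whitelist passed to `--whitelist` has to match the chemistry. As a minimal sketch, the pairing between the two downloaded files and the chemistry value could be kept in one place. `whitelist_for_chemistry` is a hypothetical helper, and `10xv2` is an assumed spelling for the Chromium v2 chemistry; only `10xv3` appears in the example above.

```shell
#!/bin/bash
# Hypothetical helper: map a --chemistry value to the matching whitelist
# file downloaded with wget. "10xv2" is an assumed value; check the
# workflow for the spelling it accepts.
whitelist_for_chemistry() {
    case "$1" in
        10xv2) echo "737K-august-2016.txt" ;;
        10xv3) echo "3M-february-2018.txt" ;;
        *)     echo "no known whitelist for chemistry: $1" >&2; return 1 ;;
    esac
}

chemistry="10xv3"
whitelist="${PWD}/$(whitelist_for_chemistry "$chemistry")"
echo "$whitelist"
```

The resulting `$chemistry` and `$whitelist` variables can then be passed to `--chemistry` and `--whitelist` in the submission script.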
Transcriptome alignment with ALEVIN-fry (through simpleaf) for Chromium (v2/v3) chemistry
#!/bin/bash
#SBATCH -J rnaseq
#SBATCH --mem 10GB
module load containers/singularity/3.9.9
module load bioinfo/Nextflow/21.10.6
nextflow run /work/project/lpgp/Nextflow/rnaseq/ \
-profile slurm \
--input "${PWD}/design.csv" \
--genome "${PWD}/genome.fa.gz" \
--gff_gtf "${PWD}/annot.gtf.gz" \
--method "alevin_fry" \
--chemistry "10xv3" \
--out_dir "${PWD}/results"
Default parameters
Please refer to the Trim Galore, STAR, htseq-count, Salmon, and simpleaf documentation for a complete explanation of the arguments.
# sequences
input = false
genome = false
transcriptome = false
# fastqc
skip_fastqc = false
# trimming
skip_trimming = false
clip_r1 = 0
clip_r2 = 0
three_prime_clip_r1 = 0
three_prime_clip_r2 = 0
# alignment mode should be star_htseq-count and/or salmon_saf for bulk-RNAseq
# alignment mode should be star_solo and/or alevin_saf and/or alevin_fry for BRBseq or scRNAseq
method = false
# STAR options
star_index = false
gff_gtf = false
sjdbOverhang = 99
keep_star_index = false
htseq_count_multimapped = false
feature_type = "exon"
# SALMON options
salmon_index = false
keep_salmon_index = false
writeMappings = false
# ALEVIN options
alevin_fry_index = false
keep_alevin_fry_index = false
chemistry = false
spliceu = false
# STAR SOLO options
star_index = false
gff_gtf = false
keep_star_index = false
whitelist = false
# save directory
out_dir = "${PWD}/results"
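Since `method` has no usable default, a misspelled value such as `alevin_fy` is worth catching before the job is queued. The sketch below is a hypothetical pre-flight check, not part of the workflow. It accepts only the five names listed in the comments above and tests a single method name, so how several methods are combined ("and/or") is left out.

```shell
#!/bin/bash
# Hypothetical helper: accept only the method names listed in the
# default parameters. Checks one name at a time.
is_known_method() {
    case "$1" in
        star_htseq-count|salmon_saf|star_solo|alevin_saf|alevin_fry) return 0 ;;
        *) return 1 ;;
    esac
}

method="alevin_fry"
if is_known_method "$method"; then
    echo "method ok: $method"
else
    echo "unknown method: $method" >&2
fi
```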
References
- Krueger F. Trim Galore: a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files [Internet]. Available from: http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/
- Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21.
- Anders S, Pyl PT, Huber W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–9.
- Srivastava A, Malik L, Sarkar H, Zakeri M, Almodaresi F, Soneson C, et al. Alignment and mapping methodology influence transcript abundance estimation. Genome Biol. 2020;21:239.
- Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017;14:417–9.