metagWGS: Documentation
Introduction
metagWGS is a Nextflow bioinformatics analysis pipeline for metagenomic whole-genome shotgun sequencing data (Illumina HiSeq3000 or NovaSeq, paired-end, 2×150 bp; PacBio HiFi reads, single-end).
Pipeline graphical representation
The workflow takes raw reads (.fastq/.fastq.gz) and/or assemblies (contigs, .fa/.fasta) as input
and uses the modules represented in this figure:
metagWGS steps
metagWGS is split into different steps that correspond to different parts of the bioinformatics analysis. Many of these steps are optional and their necessity depends on the desired analysis.
-
S01_CLEAN_QC
- trims adapter sequences and removes low-quality reads (Cutadapt, Sickle)
- removes host contaminants (BWA-MEM2 or Minimap2 + Samtools)
- checks the quality of raw and cleaned data (FastQC)
- taxonomically classifies cleaned reads (Kaiju MEM + kronaTools + plot_kaiju_stat.py + merge_kaiju_results.py)
-
S02_ASSEMBLY
- assembles reads (metaSPAdes or Megahit or Hifiasm_meta or metaFlye)
- assesses the quality of the assembly (metaQUAST)
- deduplicates reads (BWA-MEM2 or Minimap2 + Samtools)
-
S03_FILTERING
- filters out contigs with a low CPM (counts per million) value (Filter_contig_per_cpm.py + metaQUAST)
-
S04_STRUCTURAL_ANNOT
- structurally annotates genes (Prodigal + Barrnap + tRNAscan-SE + merge_annotations.py)
-
S05_ALIGNMENT
-
S06_FUNC_ANNOT
- clusters genes per sample and globally (cd-hit-est + cd_hit_produce_table_clstr.py)
- quantifies reads that align with the genes (featureCounts + Quantification_clusters.py)
- functionally annotates genes and quantifies reads by function (eggNOG-mapper + merge_abundance_and_functional_annotations.py + quantification_by_functional_annotation.py)
-
S07_TAXO_AFFI
- taxonomically affiliates the genes (Samtools + aln2taxaffi.py)
- taxonomically affiliates the contigs (Samtools + aln2taxaffi.py)
- counts the number of reads and contigs, for each taxonomic affiliation, per taxonomic level (Samtools + merge_contig_quantif_perlineage.py + quantification_by_contig_lineage.py)
-
S08_BINNING
- aligns sample reads against assemblies, according to the strategy used (BWA-MEM2 or Minimap2)
- performs metagenome binning (METABAT2 + MAXBIN2 + CONCOCT)
- refines bin sets (bin_refinement.sh, adapted from METAWRAP bin_refinement)
- dereplicates bins across samples (DREP)
- taxonomically affiliates the bins (GTDBTK)
- calculates bin abundances across samples (BWA-MEM2 or Minimap2 + SAMTOOLS)
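The CPM filter mentioned under S03_FILTERING can be illustrated with a minimal sketch. It assumes CPM means the number of reads assigned to a contig divided by the total number of assigned reads, scaled to one million; the function names, the toy counts, and the threshold are hypothetical and are not taken from Filter_contig_per_cpm.py, which may compute or combine values differently.

```python
def counts_per_million(read_counts):
    """Convert raw per-contig read counts to counts per million (CPM)."""
    total = sum(read_counts.values())
    if total == 0:
        # No reads at all: every contig gets a CPM of zero.
        return {contig: 0.0 for contig in read_counts}
    return {
        contig: count / total * 1_000_000
        for contig, count in read_counts.items()
    }


def filter_contigs_by_cpm(read_counts, min_cpm):
    """Return the contigs whose CPM is at least min_cpm (assumed rule)."""
    cpm = counts_per_million(read_counts)
    return [contig for contig, value in cpm.items() if value >= min_cpm]


# Toy example with made-up counts and a made-up threshold.
read_counts = {"contig_1": 900, "contig_2": 99, "contig_3": 1}
cpm_values = counts_per_million(read_counts)
kept = filter_contigs_by_cpm(read_counts, min_cpm=5000)
print(kept)
```

Normalising to CPM rather than filtering on raw counts makes a threshold comparable between samples of different sequencing depth, which is the usual reason for this kind of filter.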
All steps are launched one after another by default. Use the --stop_at_[STEP]
and --skip_[STEP]
parameters to control which steps are run.
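As a rough sketch of how the --stop_at_[STEP] and --skip_[STEP] patterns combine on a command line, the helper below assembles an argument list for `nextflow run`. The helper name, the `main.nf` script path, and the `STEP_A`/`STEP_B` tokens are placeholders for illustration only; the actual step names accepted by the pipeline are listed in the Usage page.

```python
def build_command(pipeline, stop_at=None, skip=()):
    """Assemble a `nextflow run` argument list (hypothetical helper).

    `stop_at` and the entries of `skip` are substituted for [STEP] in the
    --stop_at_[STEP] and --skip_[STEP] parameter patterns.
    """
    args = ["nextflow", "run", pipeline]
    if stop_at is not None:
        args.append(f"--stop_at_{stop_at}")
    for step in skip:
        args.append(f"--skip_{step}")
    return args


# Placeholder step tokens: replace them with real step names from the Usage page.
cmd = build_command("main.nf", stop_at="STEP_A", skip=["STEP_B"])
print(" ".join(cmd))
```

Building the command as a list rather than a single string keeps each parameter a separate argument, so it can be passed directly to `subprocess.run` without shell quoting issues.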
An HTML report is generated with MultiQC at the end of the workflow.
The pipeline is built using Nextflow, a bioinformatics workflow tool to run tasks across multiple compute infrastructures in a very portable manner.
Three Singularity containers are available, making installation trivial and results highly reproducible.
Documentation
The metagWGS documentation can be found in the following pages:
-
Installation
- The pipeline installation procedure.
-
Usage
- An overview of how the pipeline works, how to run it and a description of all of the different command-line flags.
-
Output
- An overview of the different output files and directories produced by the pipeline.
-
Use case (WARNING: not up-to-date, needs to be updated)
- A tutorial on how to launch the pipeline on a test dataset on the genologin cluster.
-
Functional tests
- (for developers) A tool to launch a new version of the pipeline on curated input data and compare its results with known output.
Contact us
If you have any questions or suggestions for improvement, please contact us at claire.hoede[@]inrae.fr.
Cite us
For the moment, if you use metagWGS for your research, please cite: Joanna Fourquet, Jean Mainguy, Maïna Vienne, Céline Noirot, Pierre Martin, et al. metagWGS: a workflow to analyse short and long HiFi metagenomic reads Taxonomic profile HiFi vs Short reads assembly. JOBIM 2022, Jul 2022, Rennes, France. ⟨10.15454/1.5572369328961167E12⟩. ⟨hal-03771202⟩