Alexis Mergez authored

Pan1c

Snakemake workflow for creating a pangenome at chromosome scale. Tools used within the workflow:

The file architecture for the workflow is as follows:

Pan1c/
├── config.yaml
├── README.md
├── runSnakemake.sh
├── scripts
│   ├── bin_split.py
│   ├── getPanachePAV.sh
│   ├── inputClustering.py
│   ├── ragtagChromInfer.sh
│   └── statsAggregation.py
└── Snakefile

Prepare your data

This workflow accepts chromosome-level assemblies as well as contig-level assemblies.
FASTA files need to be compressed using bgzip2 (included in PanGeTools). Record IDs must follow this pattern: <haplotype name>#<ctg|chr name> (for example, CHM13#chr01 where the FASTA file is named CHM13.fa.gz).
Because of the clustering step, this pattern is only needed for the reference assembly. Input files should be read-only to prevent Snakemake from modifying them (which seems to happen in some rare cases).
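
The naming rule above can be checked before launching the workflow. The following is a minimal sketch, not part of Pan1c: the function name bad_record_ids is hypothetical, and it assumes that the haplotype name is the file name without the .fa.gz suffix (as in the CHM13 example) and that the record ID is the first whitespace-delimited token of each header line.

```python
import gzip
from pathlib import Path


def bad_record_ids(fasta_gz):
    """Return the record IDs of a gzip/bgzip FASTA that break the naming rule.

    Hypothetical helper. A record ID is accepted when it reads
    <haplotype name>#<ctg|chr name>, where the haplotype name is assumed
    to be the file name without its ".fa.gz" suffix.
    """
    path = Path(fasta_gz)
    haplotype = path.name
    if haplotype.endswith(".fa.gz"):
        haplotype = haplotype[: -len(".fa.gz")]

    bad = []
    # bgzip output is gzip-compatible, so the gzip module can read it.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.startswith(">"):
                continue  # sequence line, nothing to check
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            # Split on the first '#': left part is the haplotype, right part the sequence name.
            hap, sep, seq_name = record_id.partition("#")
            if not sep or hap != haplotype or not seq_name:
                bad.append(record_id)
    return bad
```

Calling bad_record_ids("data/haplotypes/CHM13.fa.gz") on the reference assembly should return an empty list when every header is well formed.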

Download apptainer images

Before running the workflow, some apptainer images need to be downloaded:

apptainer build <your_apptainer_image_directory>/PanGeTools.sif oras://registry.forgemia.inra.fr/alexis.mergez/pangetools/pangetools:latest  
apptainer build <your_apptainer_image_directory>/pytools.sif oras://registry.forgemia.inra.fr/alexis.mergez/pangetools/pytools:latest  
apptainer build <your_apptainer_image_directory>/pggb.sif oras://registry.forgemia.inra.fr/alexis.mergez/pangratools/pggb:latest  
apptainer build <your_apptainer_image_directory>/snakebox.sif oras://registry.forgemia.inra.fr/alexis.mergez/pangratools/snakebox:latest  

Usage

Clone this repository and create the data/haplotypes directory. Place all your haplotypes in it. Set the reference name and the apptainer image directory in config.yaml.
Finally, adjust the variables in runSnakemake.sh to match your needs (threads, memory, job name, mail, etc.). From the root directory of the repository, run sbatch runSnakemake.sh.
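
Before submitting the job, it can help to confirm which haplotypes the workflow will see and to make the input files read-only, as recommended in the data preparation notes. The sketch below is an illustration only: prepare_haplotypes is a hypothetical helper, and it assumes the inputs are named <haplotype name>.fa.gz inside data/haplotypes.

```python
import stat
from pathlib import Path


def prepare_haplotypes(repo_root):
    """List haplotype names under data/haplotypes and make the files read-only.

    Hypothetical helper; assumes inputs are named <haplotype name>.fa.gz.
    """
    hap_dir = Path(repo_root) / "data" / "haplotypes"
    names = []
    for fasta in sorted(hap_dir.glob("*.fa.gz")):
        # Clear the write bits for user, group and others.
        mode = fasta.stat().st_mode
        fasta.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
        names.append(fasta.name[: -len(".fa.gz")])
    return names
```

The returned list can be compared with the reference name set in config.yaml to confirm that the reference assembly is present.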