Pan1c workflow
Pan1c (Pangenome 1 Chromosome): a Snakemake workflow for creating a pangenome at the chromosome scale. Tools used within the workflow:
- PanGeTools: https://forgemia.inra.fr/alexis.mergez/pangetools
- PanGraTools: https://forgemia.inra.fr/alexis.mergez/pangratools
- Pan1c-Apps: https://forgemia.inra.fr/alexis.mergez/pan1capps
The file architecture for the workflow is as follows:
Pan1c/
├── config.yaml
├── example
│ ├── CICC1445.fa.gz
│ ├── config_CICD.yaml
│ ├── R64.fa.gz
│ ├── SX2.fa.gz
│ └── workflow.svg
├── getApps.sh
├── README.md
├── runSnakemake.sh
├── scripts
│ ├── getPanachePAV.sh
│ ├── inputClustering.py
│ ├── ragtagChromInfer.sh
│ ├── statsAggregation.py
│ └── workflowStats.py
└── Snakefile
Example DAG
This DAG shows the workflow for a pangenome of Arabidopsis thaliana using the TAIR10.1 reference.
Prepare your data
This workflow accepts chromosome-level assemblies as well as contig-level assemblies.
FASTA files need to be compressed using bgzip (included in PanGeTools).
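To check that an input file is block-compressed (BGZF, the format bgzip produces) rather than plain gzip, the first bytes of the file can be inspected. The following is a minimal sketch, not part of the workflow: the function names are hypothetical, and it assumes the 'BC' subfield comes first in the gzip extra field, which is how bgzip writes it.

```python
# First four bytes of a gzip member whose FEXTRA flag is set (required by BGZF).
GZIP_EXTRA_MAGIC = b"\x1f\x8b\x08\x04"


def looks_like_bgzf(header: bytes) -> bool:
    """Return True if `header` starts like a BGZF block.

    Assumption: the 'BC' subfield is the first entry of the gzip extra
    field (bytes 12-13), as written by bgzip.
    """
    return header[:4] == GZIP_EXTRA_MAGIC and header[12:14] == b"BC"


def is_bgzf_file(path) -> bool:
    """Read the first 18 bytes of `path` and test them with looks_like_bgzf."""
    with open(path, "rb") as handle:
        return looks_like_bgzf(handle.read(18))
```

For example, is_bgzf_file("data/haplotypes/CHM13.fa.gz") would return False for a file compressed with plain gzip, which then needs to be recompressed.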
Record IDs must follow this pattern: <haplotype name>#<ctg|chr name> (for example, CHM13#chr01 where the FASTA file is named CHM13.fa.gz).
Because of the clustering step, this pattern is only needed for the reference assembly.
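If the reference record IDs do not yet follow the <haplotype name>#<ctg|chr name> pattern, they can be rewritten before compression. The sketch below is one possible way to do it on an uncompressed FASTA; the helper names are hypothetical and it assumes the haplotype name is the file name without its .fa.gz suffix, as in the CHM13 example.

```python
def haplotype_from_filename(filename: str, suffix: str = ".fa.gz") -> str:
    """Derive the haplotype name by dropping the suffix (CHM13.fa.gz -> CHM13)."""
    if not filename.endswith(suffix):
        raise ValueError(f"{filename!r} does not end with {suffix!r}")
    return filename[: -len(suffix)]


def prefix_record_ids(lines, haplotype):
    """Yield FASTA lines with record IDs rewritten as <haplotype>#<original ID>.

    Headers that already start with '<haplotype>#' are left untouched,
    and any description after the ID is preserved.
    """
    prefix = f"{haplotype}#"
    for line in lines:
        if line.startswith(">") and not line[1:].startswith(prefix):
            yield ">" + prefix + line[1:]
        else:
            yield line
```

Applied to the lines of an uncompressed FASTA file, prefix_record_ids turns a header such as >chr01 into >CHM13#chr01; the result still has to be compressed afterwards.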
Note: Input files should be read-only to prevent Snakemake from modifying them (which seems to happen in rare cases).
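One way to make the inputs read-only is to strip the write permission bits from every haplotype file. This is a small sketch with a hypothetical function name; it assumes the inputs are named *.fa.gz and sit in a single directory.

```python
import os
import stat
from pathlib import Path


def make_read_only(directory, pattern="*.fa.gz"):
    """Remove write permission (user, group, other) from matching files.

    Returns the names of the files that were processed.
    """
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    processed = []
    for path in sorted(Path(directory).glob(pattern)):
        mode = stat.S_IMODE(path.stat().st_mode)
        os.chmod(path, mode & ~write_bits)
        processed.append(path.name)
    return processed
```

For example, make_read_only("data/haplotypes") would cover every .fa.gz file in that directory.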
Download apptainer images
Before running the workflow, some apptainer images need to be downloaded. Use the script getApps.sh to do so:
./getApps.sh -a <apps directory>
Running the workflow
Clone this repository and create a data/haplotypes directory where you will place all your haplotypes.
Update the reference name and the apptainer image directory in config.yaml.
Then, modify the variables in runSnakemake.sh to match your requirements (number of threads, memory, job name, email, etc.).
Navigate to the root directory of the repository and execute sbatch runSnakemake.sh.