@@ -580,6 +580,8 @@ The `raw` directory contains others files which are not main files. For more inf
With the next script, we want to run metagWGS on the test dataset to obtain the results of the **`06_func_annot` step**. This new script is the same as `Script_filtering_binning.sh`, except that `--step "03_filtering,08_binning"` has been changed to `--step "03_filtering,08_binning,06_func_annot"` and the `--eggnogmapper_db` parameter has been added to build the eggNOG-mapper database for functional annotation. All previous choices have been kept: we don't know the real host genome for this dataset, but we want to test host filtering, so we decided to use **sus scrofa** as the host genome. We also want to **filter contigs** after assembly with the default cpm value (10). This filtered assembly is the one used in the following steps that require assembly files. The assembly tool used in this script is `metaspades`.
**NOTE:** keeping `08_binning` in the `--step` parameter keeps the binning metrics in the MultiQC HTML report file.
### B. Write the script `Script_filtering_functional.sh`
1. Go to the `launch_test` directory.
...
...
@@ -664,7 +666,7 @@ Cached : 46
### D. Output files
With `Script_filtering_functional.sh` you have run all the steps required for the `06_func_annot` step, including the `03_filtering` step: `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot`, `05_alignment` and `06_func_annot`. However, the steps `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot` and `05_alignment` had already been run by the previous script (`Script_filtering_binning.sh`). This is why the jobs associated with these steps are indicated as "`cached`" in the previous slurm file. The pipeline output files related to these steps are unchanged because they have not been regenerated. They are presented in chapter [IV.D](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md#d-output-files).
With `Script_filtering_functional.sh` you have run all the steps required for the `06_func_annot` step, including the `03_filtering` step: `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot`, `05_alignment` and `06_func_annot`. However, the steps `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot` and `05_alignment` had already been run by the previous script (`Script_filtering_binning.sh`). This is why the jobs associated with these steps are indicated as "`cached`" in the previous slurm file. Moreover, we keep `08_binning` in the `--step` parameter, so the jobs associated with this step are also indicated as "`cached`" in the previous slurm file. Keeping `08_binning` produces a new MultiQC report file updated with the metrics of all steps launched by the two scripts. The pipeline output files related to these cached steps are unchanged because they have not been regenerated. They are presented in chapter [IV.D](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md#d-output-files).
In the following sections, we will only present the main numerical output files in the subdirectory added to `results/` by this second script: `06_func_annot`.
With this last script, we want to run metagWGS on the test dataset to obtain the results of the **`07_taxo_affi` step**. This new script is the same as `Script_filtering_functional.sh` (and so close to `Script_filtering_binning.sh`), except that `07_taxo_affi` has been added to the `--step` parameter: `--step "03_filtering,08_binning,06_func_annot,07_taxo_affi"`. All previous choices have been kept: we don't know the real host genome for this dataset, but we want to test host filtering, so we decided to use **sus scrofa** as the host genome. We also want to **filter contigs** after assembly with the default cpm value (10). This filtered assembly is the one used in the following steps that require assembly files. The assembly tool used in this script is `metaspades`.
**NOTE:** keeping `08_binning` and `06_func_annot` in the `--step` parameter keeps the binning and functional annotation metrics in the MultiQC HTML report file.
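As a small illustration of how the `--step` value grows from one script to the next, the sketch below rebuilds the three values quoted in this document. The variable names are hypothetical and are not part of metagWGS. Only the `--step` strings come from the text above. The rest of each launch command is deliberately not reproduced here.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: variable names are hypothetical, not metagWGS options.
# Each script keeps the steps of the previous one and appends a single new step,
# so the already-computed jobs are reported as cached on the next run.
STEPS_BINNING="03_filtering,08_binning"
STEPS_FUNCTIONAL="${STEPS_BINNING},06_func_annot"
STEPS_TAXO="${STEPS_FUNCTIONAL},07_taxo_affi"

# Print the --step argument used by each of the three scripts.
printf '%s\n' \
  "Script_filtering_binning.sh:    --step \"${STEPS_BINNING}\"" \
  "Script_filtering_functional.sh: --step \"${STEPS_FUNCTIONAL}\"" \
  "Script_filtering_taxo.sh:       --step \"${STEPS_TAXO}\""
```

Building each value from the previous one makes it harder to drop `08_binning` or `06_func_annot` by accident, which would remove their metrics from the MultiQC report.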
### B. Write the script `Script_filtering_taxo.sh`
1. Go to the `launch_test` directory.
...
...
@@ -1015,7 +1019,7 @@ Cached : 45
### D. Output files
With `Script_filtering_taxo.sh` you have run all the steps required for the `07_taxo_affi` step, including the `03_filtering` step: `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot`, `05_alignment` and `07_taxo_affi`. The steps `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot` and `05_alignment` had already been run by the first script (`Script_filtering_binning.sh`). This is why the jobs associated with these steps are indicated as "`cached`" in the previous slurm file. The pipeline output files related to these steps are unchanged because they have not been regenerated. They are presented in chapter [IV.D](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md#d-output-files).
With `Script_filtering_taxo.sh` you have run all the steps required for the `07_taxo_affi` step, including the `03_filtering` step: `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot`, `05_alignment` and `07_taxo_affi`. The steps `01_clean_qc`, `02_assembly`, `03_filtering`, `04_structural_annot` and `05_alignment` had already been run by the first script (`Script_filtering_binning.sh`). This is why the jobs associated with these steps are indicated as "`cached`" in the previous slurm file. Moreover, we keep `08_binning` and `06_func_annot` in the `--step` parameter, so the jobs associated with these steps are also indicated as "`cached`" in the previous slurm file. Keeping `08_binning` and `06_func_annot` produces a new MultiQC report file updated with the metrics of all steps launched by the three scripts. The pipeline output files related to these cached steps are unchanged because they have not been regenerated. They are presented in chapters [IV.D](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md#d-output-files) and [V.D](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/docs/use_case.md#d-output-files-1).
In the following sections, we will only present the main numerical output files in the subdirectory added to `results/` by this third script: `07_taxo_affi`.