Analysis Pipelines and Software

We support a wide range of genomics, multiomics and other projects, using standard pipelines or custom analyses as appropriate. If you have a project that you would like us to support, please contact us to discuss it. Our expertise is demonstrated by our impact in research and teaching activities, including publications enabled by work we have been involved in.

Descriptions of some common genomics applications and analysis workflows that our expert bioinformaticians can support are outlined below. Suggested pipelines and software are also detailed for anyone interested in analysing their own data. Tailored workflows, non-standard analyses, and follow-up analyses of results from your own work are also available as CGEBM bioinformatics services.

Relevant CGEBM training courses are noted. Further information on fees and how to register is on our training pages. Please register your interest in any analysis courses not currently offered, and these will be considered in future CPD planning.

If you are embarking on your own analysis, nf-core may provide a suitable pipeline. nf-core is a community-driven framework offering curated bioinformatics pipelines for reproducible analysis. These pipelines, built in Nextflow, run on most compute environments, support containerisation via Docker or Singularity, and are developed and maintained by the nf-core community. Please note that, although we cannot provide support for running nf-core pipelines at this time, individual pipeline pages provide detailed information on software and best practices for state-of-the-art analysis.

The Expressed Genome: Gene Expression and the Regulome

RNA-seq for Differential Expression

Brief Description: RNA-Seq measures the transcriptome, enabling quantification of gene expression levels and identification of differentially expressed genes between conditions (e.g. disease vs. control, treated vs. untreated, mutant vs. wild type). It captures low-abundance transcripts with high sensitivity.

Example Bioinformatics Pipeline:

  • Initial Steps: Quality control (FastQC), adapter trimming (TrimGalore!), alignment to reference genome (HISAT2), and quantification (featureCounts). Note that the Rsubread package in R also supports alignment and quantification.
  • Later Phases: Normalization and differential expression testing (edgeR package in R), visualization (scatter- and volcano plots via ggplot2 in R) and enrichment analysis (clusterProfiler in R).
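As a flavour of what the later phases involve, here is a minimal pure-Python sketch of CPM normalisation and threshold filtering of differential expression results. This is an illustration only, not edgeR itself (which adds proper dispersion modelling and statistical testing); the gene names and cutoffs are invented.

```python
def cpm(counts):
    """Scale raw per-gene read counts to counts per million."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}

def filter_de(results, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Keep genes passing |log2 fold change| and FDR thresholds.

    results: {gene: (log2_fold_change, fdr)}
    """
    return [gene for gene, (lfc, fdr) in results.items()
            if abs(lfc) >= lfc_cutoff and fdr <= fdr_cutoff]

raw = {"geneA": 500, "geneB": 1500, "geneC": 8000}
norm = cpm(raw)                                   # geneA -> 50000.0 CPM
de = {"geneA": (2.3, 0.001), "geneB": (0.4, 0.001), "geneC": (-1.8, 0.2)}
hits = filter_de(de)                              # only geneA passes both cutoffs
```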

CGEBM Full-Featured Workflow: We provide a comprehensive, end-to-end RNA-Seq pipeline including raw data processing (FastQC, TrimGalore!, MultiQC), quantification (HISAT2, featureCounts, edgeR in R), and downstream functional annotation and enrichment analysis to uncover biological insights by detecting over-represented pathways or functions within a gene set (clusterProfiler in R). Our bioinformatics services include data management and reporting, including provision of publication-quality figures.

CGEBM Additional Analysis: CGEBM provides a separate enrichment analysis service, detailed below. Briefly, state-of-the-art enrichment analysis for functional and pathways annotation can be performed on your bespoke gene list and/or gene ranking generated by, for example, your own RNA-seq analysis, Venn diagram analysis of multiple RNA-seq analyses, or overlap analyses of multiomics data sources (e.g. ChIP-seq vs RNA-seq, etc.). See Enrichment Analysis of User-Supplied Results below.

Relevant Training: CGEBM offers a training course in RNA-sequencing, Differential Expression and Functional Enrichment Analysis.

ChIP-Seq and ATAC-seq for Study of the Regulome

Brief Description: CGEBM has expertise in identifying genome-wide binding sites of transcription factors, histones, or other DNA-binding proteins using Chromatin Immunoprecipitation Sequencing (ChIP-Seq). ChIP-Seq involves sequencing DNA fragments pulled down with specific antibodies, revealing regulatory elements. ATAC-seq involves sequencing DNA fragments captured through transposase-mediated insertion of sequencing adapters into open chromatin (tagmentation), thus mapping nucleosome positions and genome-wide chromatin-accessibility signatures. ChIP-seq and ATAC-seq are therefore important techniques for exploring the cis- and trans-acting factors regulating gene expression (the regulome).

Example Bioinformatics Pipeline for ChIP-seq:

  • Initial Steps: Quality control (FastQC), trimming (TrimGalore!), alignment (HISAT2 or BWA), duplicate removal (samtools or Picard), assessment of alignment quality and fragment length distribution (Qualimap), removal of mitochondrial reads and ENCODE blacklisted regions, assessment of TSS enrichment (deepTools), and peak calling (MACS2).
  • Later Phases: Motif discovery (HOMER), differential binding analysis (DiffBind), visualization (IGV or ggplot2 heatmaps) and enrichment analysis of genes associated with peaks (clusterProfiler in R).
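A central step between peak calling and enrichment analysis is associating each peak with a gene. The sketch below shows the simplest version of that idea, assigning a peak summit to the nearest transcription start site (TSS); real annotation tools such as HOMER handle strand, feature types and priorities. Coordinates and gene names here are invented for illustration.

```python
def nearest_tss(peak_summit, tss_by_gene, max_dist=50_000):
    """Return the gene whose TSS is closest to the peak summit,
    or None if no TSS lies within max_dist bases."""
    best_gene, best_dist = None, None
    for gene, tss in tss_by_gene.items():
        d = abs(peak_summit - tss)
        if best_dist is None or d < best_dist:
            best_gene, best_dist = gene, d
    return best_gene if best_dist is not None and best_dist <= max_dist else None

tss = {"geneA": 10_000, "geneB": 80_000}
assigned = nearest_tss(12_500, tss)    # "geneA" (2.5 kb away)
orphan = nearest_tss(200_000, tss)     # None: nothing within 50 kb
```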

More detail: For a complete existing pipeline for ChIP-seq or ATAC-seq, including the software used, refer to the nf-core web site.

CGEBM Workflow: We provide an end-to-end ChIP-Seq pipeline with standard alignments (HISAT2 or BWA), peak calling (MACS2), and read counting in annotated regulatory regions (featureCounts). Three follow-up analyses are available: 1) motif enrichment analysis in peaks and annotation of peaks relative to gene features (HOMER), 2) coverage relative to gene regions (deepTools), and 3) enrichment analysis of genes overlapped by peaks to uncover biological insight by detecting over-represented pathways or functions within a gene set (clusterProfiler in R, see below). Bespoke follow-up analyses are also available: for example, comparison of gene differential expression results (from your own or publicly available datasets) with lists of genes defined by overlaps with ChIP-seq peaks, that is, linking gene expression with the regulome. Our workflow includes data management and reporting, including provision of publication-quality figures. Contact CGEBM to discuss.

BIS-Seq (Bisulfite Sequencing) of the Methylome

Brief Description: Whole-genome bisulfite sequencing (WGBS) detects DNA methylation by converting unmethylated cytosines to uracil, followed by sequencing of the genomic DNA.

Example Bioinformatics Pipeline:

  • Initial Steps: Quality control of reads (FastQC, TrimGalore!), alignment (Bismark/Bowtie2).
  • Later Phases: Deduplication and methylation analysis (Bismark tools), differential analysis (msPIPE or methylKit in R).
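The core quantity behind methylation analysis is simple: the fraction of reads reporting a methylated cytosine at each position (the value in Bismark coverage files). A minimal sketch with invented counts:

```python
def methylation_percent(meth, unmeth):
    """Percent methylation at one cytosine from methylated/unmethylated
    read counts; None if the site has no coverage."""
    total = meth + unmeth
    return 100.0 * meth / total if total else None

def region_mean(sites):
    """Coverage-weighted mean methylation over (meth, unmeth) CpG sites."""
    meth = sum(m for m, u in sites)
    total = sum(m + u for m, u in sites)
    return 100.0 * meth / total if total else None

site = methylation_percent(9, 1)            # 90.0 % methylated
region = region_mean([(9, 1), (0, 10)])     # 45.0 % across the region
```

Differential methylation tools such as methylKit then test whether these fractions differ between sample groups.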

CGEBM Workflow: We use Bismark for alignment and methylation calling followed by differential methylation analysis with methylKit. Our bioinformaticians can provide expert assistance in all phases of this analysis. For instance, when Bismark’s output format is insufficient for reporting purposes, bespoke scripts can be used to fulfil the requirements of the study. Our bioinformatics service includes data management, reporting and provision of publication-quality figures.

Relevant Training: An up-to-date pipeline using Bismark, including software suggestions and steps can be found on the nf-core MethylSeq page.


Single-Cell RNA-Seq (scRNA-Seq), ATAC-Seq (scATAC-Seq) and multiomics

Brief Description: Single-Cell RNA-Seq measures whole-transcriptome gene expression in individual cells, providing single-cell resolution of gene expression in heterogeneous samples (e.g. blood, tumours, brain, organoids), where the impact of conditions on specific cell types may be masked in a bulk RNA-seq analysis.

Example Bioinformatics Pipeline:

  • Initial Steps: Demultiplexing (e.g. Cell Ranger and bcl2fastq for 10x Genomics data), alignment and quantification of reads at gene regions, and filtering of ambient RNA outside of single cells.
  • Later Phases: The level of downstream analysis is often determined by the objectives of the project. Loupe Cell Browser, for instance, is a software package provided by 10x Genomics that allows visualisation of the cloupe file generated by Cell Ranger count. Other typical tools are developed mainly for the R and Python environments: the R package Seurat, and the ScanPy tool within the scverse ecosystem, are the main tools for each. Cloud-based solutions, such as Parse Biosciences’ Trailmaker™, are also becoming more popular. Typical downstream steps include quality assessment and removal of outlying cells, normalisation of gene expression measurements for each cell, dimensionality reduction, visualisation of cell clusters and of cells expressing genes of interest, identification of gene expression changes between cells, and determination of biomarkers within each cell cluster.
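One of the first downstream steps in Seurat/ScanPy workflows is per-cell QC filtering: dropping barcodes with too few detected genes (likely empty droplets) or a high mitochondrial read fraction (likely stressed or dying cells). The sketch below is conceptual; the barcodes and thresholds are arbitrary examples, not recommendations.

```python
def qc_filter(cells, min_genes=200, max_mito=0.2):
    """cells: {barcode: (n_genes_detected, fraction_mito_reads)}.
    Return the barcodes passing both QC thresholds."""
    return [bc for bc, (n_genes, mito) in cells.items()
            if n_genes >= min_genes and mito <= max_mito]

cells = {"AAAC": (1500, 0.05),   # kept
         "TTTG": (80,   0.04),   # too few genes detected: dropped
         "GGGC": (2200, 0.45)}   # high mitochondrial fraction: dropped
kept = qc_filter(cells)          # ["AAAC"]
```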

CGEBM Workflow: Our bioinformaticians have experience in scRNA-Seq, scATAC-Seq and multiomics approaches in the same cells with data from 10x Genomics or Parse Evercode. CGEBM supports projects from raw data to publication-quality figures, including data management and reporting. Example outputs: see the example analysis by the Satija lab.

Relevant Training: We recommend two sources of free training online, but please note, this training only addresses scRNA-Seq, not scATAC-Seq or multiomics. Introductory-level training including theory and Galaxy exercises is available at the Galaxy Training pages. More comprehensive training is available at the Parse Biosciences Trailmaker training page.

NanoString (GeoMx, CosMx and nCounter) Spatial Biology

Brief Description: The NanoString GeoMx Digital Spatial Profiler (DSP) instrument enables spatially resolved transcriptomics and/or proteomics by profiling RNA and/or proteins in regions of interest (ROIs) on tissue sections. ROIs are defined by marker staining and/or user delineation.

NanoString CosMx Spatial Molecular Imager (SMI) instrument visualises cells and measures abundance of panels of target transcripts or proteins in spatial context at subcellular resolution using high-plex in situ hybridisation probes or antibodies linked to gene/protein specific DNA strands. This enables detailed spatial quantification of RNAs and/or proteins in individual cells and subcellular regions located, for example, at cell-cell interfaces.

NanoString nCounter instrument is used for targeted, multiplexed, reproducible, amplification-free absolute quantification of up to 800 genes or smaller protein panels, even from low-quality samples (e.g. FFPE with degraded RNA).

Example Bioinformatics Pipelines:

  • GeoMx: Region of interest (ROI) selection per sample, segmentation using marker presence/absence or location (contouring or geometric) to define AOIs, and collection of barcode tags from each AOI precede sequencing and are used to generate spatially resolved count data (GeoMx DSP software). After count files (DCC) have been generated for each sample, further analysis steps, including sequencing QC and user-defined differential expression analyses, may be done using the GeomxTools package in R or using NanoString proprietary software on the instrument.
  • CosMx: Analysis is performed with the NanoString AtoMx platform. Cells are segmented and quality control steps are performed. A typical analysis pipeline from this point will include count normalisation, principal component analysis, UMAP, Leiden clustering or cell typing, neighbourhood analysis, and differential gene expression analysis.
  • nCounter: analysis is performed with a combination of the NanoStringNCTools, NanoStringDiff and NanoTube packages in R. NanoTube offers differential abundance testing.
  • Follow-up functional enrichment analysis of differentially expressed genes may be performed with clusterProfiler and a suitable species annotation package in R.
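A common normalisation idea for count-based panels such as nCounter is to scale each sample by the geometric mean of its housekeeping genes. The sketch below is a simplified, pure-Python illustration of that principle with invented gene names and counts; the R packages named above implement the full procedure, including positive-control normalisation and background assessment.

```python
import math

def housekeeping_factors(samples, hk_genes):
    """samples: {sample: {gene: count}}. Return a per-sample scale factor
    so that housekeeping genes share the same geometric mean across samples."""
    geo = {s: math.exp(sum(math.log(counts[g]) for g in hk_genes) / len(hk_genes))
           for s, counts in samples.items()}
    mean_geo = math.exp(sum(math.log(v) for v in geo.values()) / len(geo))
    return {s: mean_geo / v for s, v in geo.items()}

samples = {"s1": {"hk1": 100, "hk2": 400, "target": 50},
           "s2": {"hk1": 200, "hk2": 800, "target": 50}}
factors = housekeeping_factors(samples, ["hk1", "hk2"])
# s2 has twice the housekeeping signal of s1, so it is scaled down by half
# relative to s1 before comparing target counts
```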

CGEBM NanoString instruments: CGEBM has GeoMx and nCounter instruments and offers lab services for these applications. Users doing their own analysis will be provided with count (DCC or RCC) files as part of our service. NanoString data generated externally, including CosMx, can also be analysed by CGEBM; CosMx analysis will include optimisation of cell segmentation.

CGEBM Full-Featured Workflows: CGEBM supports analysis of transcript and/or protein count data from any of the three NanoString technologies, from generation of count data through data management and reporting to production of publication-quality figures. Our pipelines begin with robust quality control on the input data and optimisation of analysis parameters. Subsequently, differential gene expression analysis is performed based on user-supplied sample metadata. Finally, functional enrichment analysis is performed on differentially expressed genes using Gene Ontology, KEGG pathways, and other relevant databases, following the processes described in “Enrichment Analysis of User-Supplied Gene Lists” below. Example outputs: CGEBM’s pipeline outputs include all QC plots, a full report of the parameters applied in the QC process, differential expression results, and downstream enrichment analysis.

Relevant Training: A detailed, comprehensive example of analysis of DCC count files from GeoMx is provided with the GeomxTools R package. NanoString provides in-depth tutorials for CosMx analysis.

Enrichment Analysis of User-Supplied Gene Lists

Brief Description: This service provides downstream functional annotation and enrichment analysis of user-supplied gene lists or rankings to uncover biological insight by detecting over‑represented pathways and functions. We compare your genes of interest to curated gene sets defined by biological pathways, functions, chromosomal locations, regulatory targets, or disease/cell type signatures, using databases such as KEGG, MSigDB, and GO. We can filter your gene lists by thresholds such as log fold change and p‑value, followed by list enrichment analysis, or analyse a gene ranking with Gene Set Enrichment Analysis (GSEA) without filtering. Gene list/ranking inputs can come from RNA‑seq, ChIP‑seq, ATAC‑seq, proteomics, exome sequencing, or any omics approach or combination of approaches.

Example Bioinformatics Pipeline:

  • Initial Steps: Ranked or filtered list preparation, gene set loading (MSigDB). Construction of an R annotation database may be needed for non-model organisms.
  • Later Phases: Enrichment scoring (clusterProfiler or fgsea in R), visualization (enrichplot).
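The statistic underlying over-representation analysis in tools such as clusterProfiler is the hypergeometric test: given how many of your hit genes fall in a gene set, how surprising is that overlap relative to the gene universe? A self-contained sketch with invented genes:

```python
from math import comb

def hypergeom_pvalue(hits, gene_set, universe):
    """P(overlap >= observed) when drawing len(hits) genes at random
    from the universe (one-sided hypergeometric test)."""
    N = len(universe)
    K = len(gene_set & universe)          # gene-set genes in universe
    n = len(hits & universe)              # hit genes in universe
    k = len(hits & gene_set & universe)   # observed overlap
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total

universe = {f"g{i}" for i in range(20)}
pathway = {"g0", "g1", "g2", "g3", "g4"}     # 5 of 20 genes in the pathway
hits = {"g0", "g1", "g2", "g10"}             # 3 of 4 hits land in the pathway
p = hypergeom_pvalue(hits, pathway, universe)   # 155/4845, about 0.032
```

Note how the result depends on the universe: ranking-based GSEA avoids choosing a hit-list threshold altogether.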

CGEBM Full-Featured Workflow: State-of-the-art enrichment analysis can be performed on results from your own analysis in your species of interest, including production of final publication-quality figures. This includes enrichment analysis of a ‘top hits’ filtered list as well as the far more sensitive gene set enrichment analysis (GSEA), which uses a modified Kolmogorov-Smirnov approach to detect biases in results even when individual genes are non-significant. Typical databases used for these analyses are GO, KEGG, and the Molecular Signatures Database (MSigDB). Non-model organism annotation databases may need to be built to enable your enrichment analysis, and this is a labour-intensive process. Please contact CGEBM to discuss options.

Example outputs: Tables of enrichment results with computed significance figures. Enrichment dot plots.

External alternatives:

  • Gene Ontology Web Interface: This tool can be found at the geneontology.org website, with an accompanying overview/tutorial. It supports a wide array of species. Supported ID types include NCBI gene IDs and gene symbols (see https://pantherdb.org/tips/tips_toolsUploadFile.jsp for others); for accurate results, upload a text file with the entire list of genes considered. Caveat: for unknown reasons, some species suffer from poor gene identifier recognition. Unfortunately, these problems can consume a lot of time, even for experienced bioinformaticians, so this is not something we can support.
  • Galaxy.eu Web Interface: The GOEnrichment tool (not limited to Gene Ontology) is found within the Galaxy web interface at Galaxy.eu. A tutorial with worked example is found on the Galaxy Training pages. The tool requires you to upload all relevant files, including your omics results file, ontology .obo file, and gene-term association file.
  • GOrilla enrichment analysis: The GOrilla tool performs enrichment analysis for a small range of organisms, where the user can choose analysis of lists or gene rankings. A link to an example usage is provided on the tool web page.
  • DAVID (Database for Annotation, Visualisation and Integrated Discovery): DAVID can be used for functional enrichment analysis of gene lists. DAVID includes an ortholog tool to convert gene lists between species (DAVID Functional Annotation Bioinformatics Microarray Analysis). To begin analysing a list, click the “Start Analysis” link at the top left of the page.

Multi-omics Data Fusion

Brief Description: Data fusion, comparing omics-scale data of any two types, can yield unprecedented insights into the RNA biology of your cells of interest. For example, lists of genes from ChIP-Seq, ATAC-Seq, Me-Seq, ribosomal profiling, mRNA-turnover analysis or any other method may be compared to RNA-Seq results to identify mechanisms of change in mRNA abundance. Please contact CGEBM to arrange a meeting to discuss how we can help you answer the questions you are interested in!

Example Bioinformatics Pipeline:

  • Initial Steps: You may have data that you have analysed or lists of genes from publications that you want to test against your dataset. You may want to draw from the wealth of publicly available datasets – we can help you identify relevant public datasets as well as interrogate them in new ways. It all begins with a meeting.
  • Later Phases: We can help you by developing custom scripts to perform your analysis and represent data fusion analyses of multiomics data in the most useful manner. We will provide data outputs and high-quality figures to support your manuscript submissions.
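The simplest form of data fusion is an overlap of gene lists from two experiments, the counts behind a two-set Venn diagram. A minimal sketch, with hypothetical gene names standing in for, say, ChIP-seq targets and differentially expressed genes:

```python
def venn2(list_a, list_b):
    """Partition two gene lists into exclusive and shared sets."""
    a, b = set(list_a), set(list_b)
    return {"A_only": a - b, "B_only": b - a, "shared": a & b}

chip_targets = ["geneA", "geneB", "geneC"]
de_genes = ["geneB", "geneC", "geneD"]
overlap = venn2(chip_targets, de_genes)
# the shared genes are candidate direct targets of the factor
```

In practice the interesting question is whether such an overlap is larger than chance, which can be tested with the hypergeometric approach used for enrichment analysis.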

Example Outputs: For sample plots demonstrating the power of cross comparison between omics-scale datasets, see Figure 4A-J in Paris et al (2019) and Figure 4D-F in Codino et al (2021).

Genomes and Transcriptomes

Genome Assembly and Annotation

Brief Description: CGEBM has expertise in a wide range of sequencing data types for genome assembly and annotation. This includes Illumina short-read and long-read sequencing; assembly and annotation of large eukaryotic genomes; and assembly, annotation and downstream analysis of single bacterial genomes (e.g. virulence, antimicrobial resistance, CAZymes).

Example Bioinformatics Pipelines:

  • Large Eukaryotic Genomes: These are usually assembled from long reads (e.g. Oxford Nanopore Technologies or PacBio) or by hybrid assembly from short- plus long-read data. Steps include base-calling (Guppy or Dorado for ONT), assembly (Flye, wtdbg2 or Shasta for ONT; hifiasm for PacBio; Unicycler for hybrid short-read assembly with long-read gap filling), polishing (Racon or Medaka for ONT; Pilon for Illumina), removal of alternate haplotype contigs (purge_dups for ONT and PacBio), transposable element identification (RepeatModeler, RepeatMasker), and annotation (Braker, Maker, Augustus, SNAP). Downstream assembly quality assessment is performed by comparison to existing genomes and with BUSCO analysis.
  • Bacterial Single Genomes: Quality control (FastQC, Kraken2, BlobTools), assembly (SPAdes, Velvet, or the A5/A5-miseq assemblers), assessment of assembly metrics (QUAST, BUSCO), and annotation (Prokka).
  • Contamination: It is important to include a quality control step on your raw data to remove contaminating sequences and prevent these from ending up in your assembly. This is particularly important for sample types expected to contain cells or DNA from the host or other organisms. Kraken can be used to screen and filter contaminants in raw reads, and BlobTools can be used post-assembly to detect the presence of contaminants.
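A key assembly metric reported by QUAST is N50: the contig length at which contigs of that length or longer contain at least half of the total assembly. It is simple enough to sketch directly:

```python
def n50(contig_lengths):
    """N50: the length L such that contigs of length >= L
    cover at least half the total assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length

example = n50([100, 200, 300, 400])   # total 1000; 400+300 reaches 500 -> N50 = 300
```

Higher N50 indicates a more contiguous assembly, though it says nothing about correctness, which is why it is paired with BUSCO completeness.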

CGEBM Workflows: We provide an end-to-end workflow from raw data to publication ready outputs. Outputs include FASTA-formatted assemblies and GFF annotation files, as well as tables and plots of bespoke analysis results with publication quality figures. Data management and reporting are included.

Relevant Training: CGEBM offers a training course in Genome Assembly and Annotation, which uses assembly of a bacterial genome as an example but teaches principles that are also applicable to larger genomes.

Transcriptome Assembly

Brief Description: CGEBM has expertise in de novo transcriptome assembly and annotation. This is useful for creating a reference set of sequences for RNA-seq analysis in non-model organisms for which no reference genome is available.

Example Bioinformatics Pipeline:

  • Initial Steps: Quality control (FastQC), trimming (TrimGalore!), assembly (Trinity).
  • Later Phases: Quantification (RSEM), overall assessment of completeness (BUSCO), and identification of protein-coding genes and functional annotation (Trinotate).
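Identifying protein-coding genes in assembled transcripts starts with finding open reading frames (ORFs). The sketch below finds the longest forward-strand ORF in a transcript; it is a toy version of what TransDecoder/Trinotate do properly (both strands, all frames, homology evidence). The sequence is invented.

```python
def longest_orf(seq):
    """Longest ATG..stop open reading frame on the forward strand."""
    stops = {"TAA", "TAG", "TGA"}
    best = ""
    for frame in range(3):                      # three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and start is None:
                start = i                       # first start codon of this ORF
            elif codon in stops and start is not None:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None                    # look for the next ORF
    return best

orf = longest_orf("CCATGAAATTTTAGCC")   # "ATGAAATTTTAG" in frame 2
```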

CGEBM Workflow: We provide an end-to-end workflow from raw data to publication ready outputs, including reporting and data management. Our analysis uses Trinity for de novo assembly and Trinotate for annotation.

Relevant Training: A detailed tutorial based on Trinity can be found on the Galaxy Training pages. A larger, more-general guide to software options has been published by Raghavan et al (2022).

Variant Detection

Brief Description: Variant calling identifies SNPs, indels, and structural variants from aligned sequencing reads, crucial for population genetics and disease association studies.

Example Bioinformatics Pipeline:

  • Initial Steps: Quality control (FastQC), alignment (BWA), realignment (GATK), and base quality recalibration.
  • Later Phases: Variant calling (GATK HaplotypeCaller, FreeBayes), annotation (ANNOVAR), and filtering (VCFtools).
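Variant callers emit VCF text, and filtering amounts to reading those records and applying thresholds. A minimal sketch of parsing the fixed VCF columns and keeping variants above a quality cutoff (VCFtools does this, and much more, at scale); the records are invented:

```python
def parse_vcf(text):
    """Yield (chrom, pos, ref, alt, qual) for each non-header VCF line."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue                              # skip meta/header lines
        chrom, pos, _id, ref, alt, qual = line.split("\t")[:6]
        yield chrom, int(pos), ref, alt, float(qual)

vcf = ("##fileformat=VCFv4.2\n"
       "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
       "chr1\t12345\t.\tA\tG\t60.0\tPASS\t.\n"
       "chr1\t22222\t.\tC\tT\t5.0\tPASS\t.")
variants = [v for v in parse_vcf(vcf) if v[4] >= 30]   # keep QUAL >= 30
```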

CGEBM Workflow: We use BWA or HISAT2 alignments followed by variant discovery using FreeBayes or GATK for germline/somatic variants. Data management, reporting and publication-quality figures are also included.

Example outputs: VCF files with annotated variants. If desired, further analysis using one of Ensembl’s many genome version-specific Variant Effect Predictor resources can be performed.

Transposable element (TE) library manual curation and TE identification

Brief Description: CGEBM has expertise in de novo transposable element (TE) library creation and manual curation. TE identification is an essential step during the generation and annotation of a eukaryotic genome assembly. Manually curating a TE library is the gold standard and results in a more comprehensive and less fragmented TE library.

Example Bioinformatics Pipeline:

  • Initial Steps: De novo TE library generation from a genome assembly (RepeatModeler), automated curation of TE library (TEtrimmer), and manual curation of TE library which checks all automated curation results following the protocols of Goubert et al (2022).
  • Later Steps: TE identification and quantification in the genome assembly, including TE age profiles (RepeatMasker, ParseRM).


CGEBM Pipeline: We provide an end-to-end workflow from TE library generation to annotated and analysed TE content, with publication-ready outputs. TE identification utilises the manually curated TE library and a homology-based approach.

Relevant Training: A detailed step-by-step tutorial for TE library manual curation, including program installation guides and scripts, has been published in Goubert et al (2022). A detailed tutorial on how to identify TEs in a genome assembly is available here.

Microbial Communities and Complex DNA Samples

Marker Gene Profiling/Amplicon Sequencing (Amp-seq)

Brief Description: CGEBM has extensive expertise in amplicon sequencing for microbial community profiling. This targets marker genes (e.g. 16S rRNA, ITS) for profiling of microbial community composition, including bacteria, archaea and fungi. CGEBM has also used Amp-seq for profiling eukaryotic communities (e.g. dietary plant and animal composition of stool).

Example Bioinformatics Pipeline:

  • Initial Steps: Quality control (FastQC), adapter and primer trimming (CutAdapt), quality filtering and amplicon construction (DADA2 package in R), and taxonomy assignment (DADA2) using a suitable database for bacteria and archaea (e.g. SILVA, RDP), fungi (e.g. UNITE) or other appropriate databases for the marker gene and sample type. If the publicly available database for your marker gene is poor or does not exist, you may need a custom database.
  • Later Phases: Diversity (phyloseq and vegan packages in R) and differential abundance analyses (DESeq2 and ANCOM in R).
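Alpha diversity summarises richness and evenness within one sample. The Shannon index, one of the standard measures computed by vegan/phyloseq, can be sketched directly from per-sample ASV counts (the counts below are invented):

```python
import math

def shannon(counts):
    """Shannon diversity index H = -sum(p * ln p) over taxon proportions."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)

even = shannon([25, 25, 25, 25])      # four equally abundant taxa: H = ln(4)
skewed = shannon([97, 1, 1, 1])       # community dominated by one taxon: much lower H
```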

CGEBM Full-Featured Workflow: CGEBM provides a comprehensive, end-to-end Amp-Seq pipeline including quality control (FastQC), raw data processing (CutAdapt), amplicon construction (DADA2), diversity analysis (phyloseq, vegan and adonis in R) and differential abundance analysis (DESeq2 and ANCOM). Where required, we can also perform follow-up bespoke analysis for more complicated experimental designs, for example repeated measures. Data management, reporting and production of publication-quality figures are included in our bioinformatics service. Example outputs: ASV tables, alpha/beta diversity plots, differential abundance plots.

Relevant Training: CGEBM offers a training course in Microbial Community Analysis, usually in April or May each year.

Metagenomics

Brief Description: CGEBM has expertise in performing metagenomic analysis, which uses shotgun sequencing of all DNA in a sample to profile microbial communities. This may include species from multiple domains of life and will include unculturable species. Analysis may include the detection of functional genes and their prevalence, extending beyond marker genes for taxonomic profiling.

Example Bioinformatics Pipeline:

  • Initial Steps: Quality control, removal of host DNA where necessary, and assembly are handled by sub-modules of the metaWRAP pipeline.
  • Later Phases: Binning is handled by a metaWRAP sub-module, and annotation is handled either with metaWRAP’s Annotate_bins module or by METABOLIC. The abundances of the bins across the samples can be assessed with metaWrap’s Quant_bins module and taxonomy with the Classify_bins module.

CGEBM Workflow: We use metaWRAP for metagenomic analysis. Extended analysis of microbial abundance and frequency of different gene classes using in-house methods and custom scripts can be included. Data management, reporting and publication-quality figures are also provided in our bioinformatics service.

Relevant Training: A tutorial and sample analysis can be found on the metaWRAP page on GitHub.

Functional Genomics: Mutant Library Screens

Transposon-Insertion Sequencing (TIS): Tn-seq, TraDIS, INSeq, HITS for mutant library screens

Brief Description: Transposon mutagenesis sequencing uses genomic mapping of Illumina short-read sequencing data to identify transposon insertion sites, and thereby to identify essential genes and fitness effects in bacterial genomes under selective conditions. A recent overview of the technique and its applications was published by Warner et al (2023) (https://pmc.ncbi.nlm.nih.gov/articles/PMC10710833/).

Example Bioinformatics Pipeline:

  • Initial Steps: Quality control (FastQC), read trimming (TrimGalore!), mapping (BWA).
  • Later Phases: Insertion counting and essentiality modelling (Bio-Tradis).
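Essentiality modelling in TraDIS-style analyses is built on the "insertion index": unique insertion sites within a gene, normalised by gene length. Genes with an index near zero under a given condition are candidate essentials. A minimal sketch with invented coordinates:

```python
def insertion_index(insertion_sites, gene_start, gene_end):
    """Unique insertion sites inside a gene, per base of gene length."""
    inside = {s for s in insertion_sites if gene_start <= s <= gene_end}
    return len(inside) / (gene_end - gene_start + 1)

sites = [100, 150, 150, 900, 2500]           # duplicate at 150 counted once
idx_a = insertion_index(sites, 1, 1000)      # 3 unique sites over 1000 bp
idx_b = insertion_index(sites, 2000, 3000)   # 1 unique site over 1001 bp
```

Bio-Tradis fits a mixture model over the distribution of such indices genome-wide to call essential and non-essential genes.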

CGEBM Workflow: Our bioinformaticians have extensive experience with read alignment to genomes. Downstream analysis of essentiality would be done using Bio-Tradis or similar packages, including bespoke scripts for novel analyses. Note that our bioinformatics services include data management, reporting and provision of publication-quality figures.

Example outputs: Fitness scores per gene, graphs of differential coverage depth between test and control samples and statistical significance calculations per gene.

Bar-Seq: "Deep Barcode Sequencing" for mutant library screens.

Brief Description: Bar-Seq is a high-throughput functional genomics approach ideally suited to testing for genetic vulnerabilities of cells in culture, for example yeast cells. Bar-Seq can also be used to track relative abundance of CRISPR or other mutations, including quantification and significance testing.

Example Bioinformatics Pipeline:

  • Initial Steps: Quality control (FastQC), read trimming (TrimGalore!), barcode extraction (custom scripts with BBDuk).
  • Later Phases: Abundance normalization (DESeq2 in R), hit selection (MAGeCK).
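Conceptually, barcode extraction and counting reduce to pulling a fixed-position barcode out of each read and tallying abundances per sample. The sketch below assumes an invented read structure (a 6-bp barcode after a constant 4-bp prefix); real library designs differ, and tools such as BBDuk additionally handle sequencing errors.

```python
from collections import Counter

def count_barcodes(reads, prefix="ACGT", bc_len=6):
    """Tally barcodes found immediately after a constant prefix;
    reads without the expected prefix are discarded."""
    counts = Counter()
    for read in reads:
        if read.startswith(prefix):
            counts[read[len(prefix):len(prefix) + bc_len]] += 1
    return counts

reads = ["ACGTAAAAAAGGG", "ACGTAAAAAATTT", "ACGTCCCCCCGGG", "TTTTAAAAAAGGG"]
tallies = count_barcodes(reads)   # {"AAAAAA": 2, "CCCCCC": 1}; last read discarded
```

The resulting count matrix (barcodes × samples) is then suitable for abundance testing with DESeq2.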

CGEBM Workflow: Our workflow follows the general process described here to generate a count matrix. Downstream abundance analysis is performed with DESeq2. Our bioinformatics services include data management, reporting and provision of publication-quality figures.

Example Outputs: CGEBM workflow was published by Pradhan et al. (2017), demonstrating typical outputs from this analysis.

Throughout this document we highlight other external software tools and resources that may be useful for different applications. We hope these references will be useful for your research, but they should not be interpreted as endorsements. As with nf-core pipelines, CGEBM does not have sufficient capacity to offer individual support for your own analyses with these tools, nor can we guarantee the correctness of your usage of them.

We strongly encourage you to consult the official documentation and associated publications for any software or pipeline you use, as these materials explain any assumptions or requirements of the analysis and how different parameters may significantly impact your analysis.