Report deseq-from-counts-matrix Pipeline: 20210303_220413

Reads statistics
Exploratory analysis
Differential expression analysis
Bioinformatics pipeline methods
Links to statistical results
- Quantification and statistical analysis results
Acknowledgments

This analysis ran DESeq2 with batches. This analysis ran DESeq2 with the : contrasts.

Reads statistics

Figure 1: Histogram showing the number of reads for each sample in the raw data. The median, mean and standard deviation of the sequencing depth across all samples were 4170450, 4504656 and 1695691 reads, respectively.
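As a rough illustration of how these summary statistics relate to the counts matrix, the sketch below sums raw counts per sample and summarizes the totals. The function name and toy data are hypothetical, and the use of the sample standard deviation (n - 1) is an assumption; the report does not state which estimator the pipeline uses.

```python
import statistics


def depth_summary(counts_by_sample):
    """Return (median, mean, standard deviation) of per-sample read totals.

    counts_by_sample maps a sample name to its list of per-gene raw counts.
    """
    depths = [sum(gene_counts) for gene_counts in counts_by_sample.values()]
    return (
        statistics.median(depths),
        statistics.mean(depths),
        statistics.stdev(depths),  # sample SD (n - 1); an assumption, see lead-in
    )


# Toy data for illustration only; not taken from the report.
toy_counts = {
    "sample_A": [10, 20, 30],
    "sample_B": [5, 15, 20],
    "sample_C": [40, 10, 50],
}
median_depth, mean_depth, sd_depth = depth_summary(toy_counts)
```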

Download figure as table

Exploratory analysis

The top highly-expressed genes

Figure 2: Heatmap of the most highly expressed genes.

The highest fraction of counts from a single gene is 3.5%. The figure below presents the fraction of reads from the genes with the most counts.
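The fraction reported here can be thought of as each gene's total count divided by the total count over all genes. The following is a minimal sketch under that assumption; the helper name and the toy totals are hypothetical and are not the pipeline's code or data.

```python
def top_gene_fractions(gene_totals, n_top=3):
    """Return the n_top genes with the most counts and their share of all counts."""
    total = sum(gene_totals.values())
    ranked = sorted(gene_totals.items(), key=lambda item: item[1], reverse=True)
    return [(gene, count / total) for gene, count in ranked[:n_top]]


# Toy per-gene totals summed over all samples (illustrative only).
toy_totals = {"geneA": 350, "geneB": 150, "geneC": 500}
fractions = top_gene_fractions(toy_totals, n_top=2)
```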

Download figure as table

Heatmap of samples correlation

Figure 3: Heatmap of Pearson distances between samples, using normalized log2 gene expression values.

Distances between samples are calculated as 1 - r, where r is the Pearson correlation coefficient.
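The distance definition above can be sketched in a few lines. This is a plain standard-library illustration of 1 - r on two expression vectors, not the pipeline's R implementation; the function names and toy vectors are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pearson_distance(x, y):
    """Distance between two samples, defined as 1 - r."""
    return 1.0 - pearson_r(x, y)


# Toy log2 expression vectors (illustrative only).
sample_1 = [1.0, 2.0, 3.0, 4.0]
sample_2 = [2.0, 4.0, 6.0, 8.0]   # perfectly correlated with sample_1
sample_3 = [8.0, 6.0, 4.0, 2.0]   # perfectly anti-correlated with sample_1

d_same = pearson_distance(sample_1, sample_2)
d_opposite = pearson_distance(sample_1, sample_3)
```

With this definition the distance ranges from 0 (perfectly correlated samples) to 2 (perfectly anti-correlated samples).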

Download samples correlation with batch correction table

Download samples correlation without batch correction table

Samples dendrogram

Figure 4: Distances between samples are calculated as Pearson distances, and the samples are then clustered using Ward’s minimum variance agglomerative method.

Download samples dendrogram without batch correction table

Download samples dendrogram with batch correction table

PCA analysis

Figure 5: PCA analysis

a. Histogram of the percentage of variance explained by each principal component (PC).

Download figure5a with batch correction as table

Download figure5a without batch correction as table

b. PCA plot of PC1 vs PC2
c. PCA plot of PC1 vs PC3

Download figure 5b_pca with batch correction as table

Download figure 5b_pca without batch correction as table

Differential expression analysis

Table 1: Differentially expressed (DE) genes for each comparison

Differential expression analysis was performed using DESeq2.

Thresholds for significant DE genes (per comparison):

padj <= 0.05
|log2FoldChange| >= 1
baseMean >= 5
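The three thresholds above combine as a logical AND. The sketch below applies them to rows shaped like a DESeq2 results table; the function name, the dictionary representation and the toy rows are assumptions made for illustration, and treating a missing padj as not significant is also an assumption.

```python
def is_significant(row, padj_max=0.05, lfc_min=1.0, base_mean_min=5.0):
    """Return True if a results row passes all three report thresholds."""
    padj = row["padj"]
    if padj is None:  # missing adjusted p-value: treated as not significant here
        return False
    return (
        padj <= padj_max
        and abs(row["log2FoldChange"]) >= lfc_min
        and row["baseMean"] >= base_mean_min
    )


# Toy rows (illustrative only; not results from this run).
toy_rows = [
    {"gene": "g1", "padj": 0.01, "log2FoldChange": 2.3, "baseMean": 120.0},
    {"gene": "g2", "padj": 0.01, "log2FoldChange": -0.4, "baseMean": 300.0},
    {"gene": "g3", "padj": 0.20, "log2FoldChange": 3.0, "baseMean": 50.0},
    {"gene": "g4", "padj": 0.001, "log2FoldChange": -1.5, "baseMean": 2.0},
    {"gene": "g5", "padj": None, "log2FoldChange": 4.0, "baseMean": 80.0},
]
de_genes = [row["gene"] for row in toy_rows if is_significant(row)]
```

Only g1 passes in this toy set: g2 fails the fold-change cutoff, g3 the padj cutoff, g4 the baseMean cutoff, and g5 has no adjusted p-value.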

Comparison	Padj corrected by fdrtool	Plots	MA plot	DE Genes
VP24_NT_vs_VP24_24h	FALSE	link	link	link

Table 2: DE genes for functional analysis

To perform functional enrichment analysis, you can try one or more of the following websites: Intermine, Reactome, GeneAnalytics from the GeneCards® Suite, or STRING. You can also use the relevant "Send to Intermine" buttons below to send the differentially expressed genes directly to Intermine.

Bioinformatics pipeline methods

Normalization of the counts and differential expression analysis were performed using DESeq2 (DOI: 10.1186/s13059-014-0550-8) with the parameters: betaPrior=TRUE, cooksCutoff=FALSE, independentFiltering=FALSE. Raw p-values were adjusted for multiple testing using the procedure of Benjamini and Hochberg (DOI: 10.1111/j.2517-6161.1995.tb02031.x).
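For readers unfamiliar with the Benjamini and Hochberg procedure, the following is a minimal standard-library sketch of the adjustment. It is not DESeq2's code (DESeq2 performs the adjustment internally in R); the function name and the toy p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy raw p-values (illustrative only).
raw_p = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(raw_p)
```

Each raw p-value is scaled by m / rank, where m is the number of tests and rank is its position in ascending order, and the running minimum keeps the adjusted values monotone.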

We recommend looking at the p-value distribution plots for each pairwise comparison in the Differential expression analysis section of the report. The plots can be reached by clicking on the word "link" in the "Plots" column of Table 1. The p-value distribution plots should be used to evaluate whether the adjusted p-values need to be corrected with fdrtool.

Interactive MA plots for each pairwise comparison were generated using Glimma (see the link in the "MA plot" column of Table 1 under the section Differential expression analysis). A dot plot representation of the normalized gene counts per condition can be found in the link in the "DE Genes" column of Table 1. The pipeline was constructed using Snakemake (DOI: 10.1093/bioinformatics/bts480).

Links to statistical results

Quantification and statistical analysis results

Results (raw counts, normalized counts, ComBat values and pairwise DESeq2 statistics) can be downloaded as txt here or xlsx here. ComBat values are batch-corrected, normalized log2 count values calculated using the “sva” package of R.

Normalized counts can be downloaded from: here.

R package versions can be found at: sessionInfo.txt

Sample comparisons and their batch details can be found at: ~/example_and_data_for_testing_DESeq_from_counts_matrix/20210303_220413_demo_DESeq2_from_counts_matrix/pheno_data-20210303_220413.tsv

General information about the run can be found at: ~/UTAP-data/reports/20210303_220413_demo_analysis_parameters.yaml

Input counts matrix: ~/example_and_data_for_testing_DESeq_from_counts_matrix/01.raw_counts_VP24.txt

Output folder: ~/example_and_data_for_testing_DESeq_from_counts_matrix/20210303_220413_demo_DESeq2_from_counts_matrix

Report output folder: ~/example_and_data_for_testing_DESeq_from_counts_matrix/20210303_220413_demo_DESeq2_from_counts_matrix/demo__20210303_220413

Acknowledgments

Citing UTAP:

Kohen R, Barlev J, Hornung G, Stelzer G, Feldmesser E, Kogan K, Safran M, Leshkowitz D: UTAP: User-friendly Transcriptome Analysis Pipeline. BMC Bioinformatics 2019, 20(1):154 (PMID: 30909881).

Report deseq-from-counts-matrix Pipeline: 20210303_220413_demo

03-03-2021