Developed by UTAP2 team
Bioinformatics unit at Life Sciences Core Facilities (LSCF)
Weizmann Institute of Science    

This analysis ran DESeq2 with batches and with the : contrasts.

Sequencing and mapping quality control (QC)

Figure 1: Plots the average quality of each base across all reads. Quality of 30 and up is good (predicted error rate 1:1000).
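
The quality values in Figure 1 are on the Phred scale, where a score Q corresponds to a predicted error probability of 10^(-Q/10). The short sketch below illustrates that relationship; the function name is hypothetical and is not part of the pipeline.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to the predicted base-call error probability."""
    return 10 ** (-q / 10)


# Illustrative scores only; Q30 is the "good" threshold quoted in the caption.
for q in (20, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: about 1 error in {round(1 / p):,} bases")
```

A score of 30 therefore corresponds to the 1:1000 error rate mentioned in the caption.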

Figure 2: Histogram showing the number of reads for each sample in the raw data. The sequencing depth statistics for all samples were as follows: the median depth was 26,948,024 reads, the mean depth was 30,721,138 reads, and the standard deviation was 15,659,407 reads.
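
As a rough sketch of how such depth statistics can be derived from per-sample read counts, the example below uses Python's standard library. The sample names and counts are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption; the report does not state which variant was used.

```python
import statistics

# Hypothetical raw read counts per sample; these are NOT values from this report.
raw_read_counts = {
    "sample_A": 18_000_000,
    "sample_B": 27_000_000,
    "sample_C": 45_000_000,
}

depths = list(raw_read_counts.values())
median_depth = statistics.median(depths)
mean_depth = statistics.mean(depths)
# Sample standard deviation (n - 1 denominator); an assumption, see the note above.
sd_depth = statistics.stdev(depths)

print(f"median depth: {median_depth:,.0f} reads")
print(f"mean depth: {mean_depth:,.0f} reads")
print(f"standard deviation: {sd_depth:,.0f} reads")
```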

Figure 3: Percentage of reads discarded after trimming.

No figure presented since the percentage of reads discarded after trimming for all samples is lower than 1%.

Figure 4: Histogram with the number of reads for each sample in each step of the pipeline.

Figure 5: Gene body coverage plot

Plot of mean read coverage (in counts per million mapped reads) over gene regions. The plot displays the mean coverage across all genes, from 2000 bases upstream of the transcription start site (TSS) to 2000 bases downstream of the transcription end site (TES).
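
The counts-per-million scaling used for the coverage values can be illustrated with a minimal sketch. The function name, the coverage values, and the total number of mapped reads are all hypothetical; this is not the pipeline's own code.

```python
def to_cpm(coverage, total_mapped_reads):
    """Scale raw read coverage values to counts per million mapped reads."""
    return [c * 1_000_000 / total_mapped_reads for c in coverage]


# Hypothetical raw coverage at three positions along a gene body.
raw_coverage = [50, 100, 150]
cpm_coverage = to_cpm(raw_coverage, total_mapped_reads=25_000_000)
print(cpm_coverage)
```

Scaling by the number of mapped reads makes coverage profiles comparable between samples sequenced to different depths.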

Exploratory analysis

The top highly-expressed genes

Figure 6:

Heatmap plotting the highly-expressed genes.

The highest fraction of counts from a single gene is 5.4%. The figure below presents the fraction of reads from the genes with the most counts.
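
The fraction of counts contributed by the top genes can be sketched as follows. The gene names and counts are hypothetical, as is the function name; the report does not describe the exact computation.

```python
def top_gene_fractions(gene_counts, n=3):
    """Return the n genes with the most counts and their share of all counts."""
    total = sum(gene_counts.values())
    ranked = sorted(gene_counts.items(), key=lambda item: item[1], reverse=True)
    return [(gene, count / total) for gene, count in ranked[:n]]


# Hypothetical counts for one sample; NOT values from this report.
gene_counts = {"geneA": 54, "geneB": 30, "geneC": 16, "geneD": 900}
top = top_gene_fractions(gene_counts, n=2)
for gene, fraction in top:
    print(f"{gene}: {fraction:.1%} of all counts")
```

A very large fraction from a single gene can indicate that a few transcripts dominate the library.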

Heatmap of samples correlation

Figure 7:

Heatmap of Pearson distances between samples using normalized log2 gene expression values.

Distances between samples are calculated as 1 - r (r = Pearson correlation coefficient).

Download samples correlation with batch correction table
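
A minimal sketch of the 1 - r distance between samples is shown below. It uses plain Python for self-containment; the sample names, expression values, and function names are hypothetical, and the pipeline itself may compute this differently.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pearson_distance_matrix(samples):
    """Pairwise distances 1 - r between all samples, as a nested dict."""
    names = list(samples)
    return {a: {b: 1 - pearson_r(samples[a], samples[b]) for b in names} for a in names}


# Hypothetical normalized log2 expression values for three samples.
log2_expr = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with s1
    "s3": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with s1
}
dist = pearson_distance_matrix(log2_expr)
print(dist["s1"]["s2"], dist["s1"]["s3"])
```

With this definition, distances range from 0 (perfect positive correlation) to 2 (perfect negative correlation).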

Download samples correlation without batch correction table

Samples dendrogram

Figure 8:

Samples are clustered using Ward’s minimum variance agglomerative method applied to the Pearson distances between samples.

Download samples dendrogram without batch correction table

Download samples dendrogram with batch correction table

PCA analysis

Figure 9:

PCA analysis: a. Histogram of the percentage of variance explained by each principal component (PC).

Download figure9a with batch correction as table

Download figure9a without batch correction as table

b. PCA plot of PC1 vs PC2

c. PCA plot of PC1 vs PC3

Download figure 9b_pca with batch correction as table

Download figure 9b_pca without batch correction as table

Differential expression analysis

Table 1: Differentially expressed (DE) genes for each comparison

Differential expression analysis was performed using DESeq2.

Thresholds for significant DE genes (per comparison):

Comparison      Padj corrected by fdrtool  Plots  MA plot  DE Genes
siTAZ_vs_siYAP  FALSE                      link   link     link
siTAZ_vs_siC    FALSE                      link   link     link
siYAP_vs_siC    FALSE                      link   link     link

Table 2: DE genes for functional analysis

To perform functional enrichment analysis, you can try one or more of the following websites: InterMine, Reactome, GeneAnalytics (from the GeneCards® Suite), or STRING.

Bioinformatics pipeline methods

Reads were trimmed using cutadapt (DOI: 10.14806/ej.17.1.200) v4.1 (parameters: -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -a "A{10}" -a "T{10}" --times 2 -q 20 -m 25).
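
For readers who want to reproduce the trimming step, the sketch below assembles an equivalent cutadapt command line without running it. The input and output file names are hypothetical, and paired-end input is an assumption inferred from the -A adapter option; the -o and -p output options are standard cutadapt flags but are not listed in the report.

```python
import shlex

ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
ADAPTER_R2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"


def build_cutadapt_command(in_r1, in_r2, out_r1, out_r2):
    """Assemble the cutadapt argument list using the parameters quoted in the report."""
    return [
        "cutadapt",
        "-a", ADAPTER_R1,   # 3' adapter for read 1
        "-A", ADAPTER_R2,   # 3' adapter for read 2
        "-a", "A{10}",      # poly-A stretch
        "-a", "T{10}",      # poly-T stretch
        "--times", "2",     # remove up to two adapter occurrences per read
        "-q", "20",         # quality-trimming cutoff
        "-m", "25",         # discard reads shorter than 25 bases after trimming
        "-o", out_r1,       # output for read 1 (file name is hypothetical)
        "-p", out_r2,       # output for read 2 (file name is hypothetical)
        in_r1, in_r2,
    ]


cmd = build_cutadapt_command(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz",
)
# Print a shell-safe version of the command instead of executing it.
print(shlex.join(cmd))
```

Building the command as a list avoids shell-quoting problems with patterns such as A{10}.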

Reads were mapped to genome /shareDB/genomes/Homo_sapiens/UCSC/hg38/Sequence/hg38_star_v2.7.10a_index using STAR (DOI: 10.1093/bioinformatics/bts635) v2.7.10a (parameters: --alignEndsType EndToEnd, --outFilterMismatchNoverLmax 0.05, --twopassMode Basic).

The pipeline quantifies the Gencode annotated genes: /home/labs/bioservices/services/ngs/support/gtf/hg38-gencode.genes.gtf.

The annotation description is: evidence-based annotation of the human genome (GRCh38), version 34 (Ensembl 100).

Counting was done using STAR v2.7.10a.

Further analysis was performed on genes with at least 5 reads in at least one sample.
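
The low-count filter can be sketched as follows. The gene names, counts, and function name are hypothetical; only the threshold of 5 reads comes from the report.

```python
MIN_READS = 5  # threshold stated in the report


def passes_filter(counts_per_sample, min_reads=MIN_READS):
    """Keep a gene if at least one sample reaches the minimum read count."""
    return max(counts_per_sample) >= min_reads


# Hypothetical raw counts per gene across three samples.
counts = {
    "geneA": [0, 2, 4],    # never reaches 5 reads: filtered out
    "geneB": [0, 5, 1],    # one sample reaches 5 reads: kept
    "geneC": [12, 30, 7],  # kept
}
kept = {gene: c for gene, c in counts.items() if passes_filter(c)}
print(sorted(kept))
```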

Normalization of the counts and differential expression analysis were performed using DESeq2 (DOI: 10.1186/s13059-014-0550-8) v1.36.0 with the parameters: betaPrior=TRUE, cooksCutoff=FALSE, independentFiltering=FALSE. Raw p-values were adjusted for multiple testing using the procedure of Benjamini and Hochberg (DOI: 10.1111/j.2517-6161.1995.tb02031.x).
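
To make the multiple-testing adjustment concrete, here is a minimal plain-Python sketch of the Benjamini-Hochberg procedure. It is an illustration only: the pipeline performs this step inside DESeq2 in R, the function name and p-values are hypothetical, and missing (NA) p-values are not handled.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
print(adj_p)
```

Each raw p-value is multiplied by m / rank (m = number of tests, rank = its position in ascending order), and the running minimum keeps a smaller raw p-value from receiving a larger adjusted value than a larger one.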

We recommend looking at the p-value distribution plots for each pairwise comparison in the Differential Expression Analysis section of the report. The plots can be reached by clicking on the word “link” in the General Plots column of Table 1. The p-value distribution plots should be used to evaluate whether the adjusted p-values need to be corrected with fdrtool.

Interactive MA plots for each pairwise comparison were generated using Glimma v2.6.0 (see the link in the “Plots” column of Table 1 under the section Differential Expression Analysis). A dot plot representation of the normalized gene counts per condition can be found via the link in the “DE Genes” column of Table 1.

The pipeline was constructed using Snakemake (DOI: 10.1093/bioinformatics/bts480) v7.14.0.

Acknowledgments

Citing UTAP:

Kohen R, Barlev J, Hornung G, Stelzer G, Feldmesser E, Kogan K, Safran M, Leshkowitz D: UTAP: User-friendly Transcriptome Analysis Pipeline. BMC Bioinformatics 2019, 20(1):154.