This analysis ran DESeq2 with the following contrast: basal_base_vs_basal_tip.
Figure 1: Average quality at each base position across all reads. A quality score of 30 or higher is considered good (predicted error rate of 1 in 1,000).
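The link between a quality score of 30 and an error rate of 1 in 1,000 follows from the standard Phred definition, Q = -10 * log10(P). The sketch below is only an illustration of that formula; the function names are hypothetical and are not part of the pipeline.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability P to a Phred quality score Q = -10 * log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Print the expected spacing between base-calling errors for common scores.
    for q in (10, 20, 30, 40):
        p = phred_to_error_probability(q)
        print(f"Q{q}: about 1 error in {round(1 / p):,} bases")
```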
Figure 2: Histogram showing the number of reads for each sample in raw data. The sequencing depth statistics for all samples were as follows:
the median depth was 3,677,372 reads,
the mean depth was 5,415,016 reads,
the standard deviation was 3,923,443 reads.
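As a rough illustration of how such depth statistics can be derived, the sketch below summarizes per-sample read counts with the Python standard library. The sample names and counts are hypothetical, not values from this run, and the use of the sample (n - 1) standard deviation is an assumption, since the report does not state which estimator it uses.

```python
import statistics

# Hypothetical per-sample raw read counts (not taken from this run).
reads_per_sample = {
    "sample_A": 1_200_000,
    "sample_B": 3_500_000,
    "sample_C": 4_000_000,
    "sample_D": 9_300_000,
}


def depth_summary(depths):
    """Return the median, mean and sample standard deviation of read depths."""
    values = list(depths)
    return {
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        # Assumption: sample standard deviation (n - 1 denominator).
        "stdev": statistics.stdev(values),
    }


if __name__ == "__main__":
    for name, value in depth_summary(reads_per_sample.values()).items():
        print(f"{name}: {value:,.0f} reads")
```

A mean well above the median, as in the statistics reported above, usually indicates that a few samples were sequenced much more deeply than the rest.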
Figure 3: Percentage of reads discarded after trimming.
No figure presented since the percentage of reads discarded after trimming for all samples is lower than 1%.
Figure 4: Histogram with the number of reads for each sample in each step of the pipeline.
Figure 5: Gene body coverage plot
Plot of mean read coverage (counts per million mapped reads) across gene regions. The plot displays the mean coverage over all genes, from 2,000 bases upstream of the transcription start site (TSS) to 2,000 bases downstream of the transcription end site (TES).
Figure 6: Heatmap of the most highly expressed genes.
The highest fraction of counts from a single gene is 5.5%. The figure below presents the fraction of reads from the genes with the most counts.
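The fraction of counts taken up by a single gene can be computed as that gene's count divided by the total count. The sketch below is a minimal illustration with hypothetical gene names and counts; it is not the pipeline's own code.

```python
def top_gene_fractions(gene_counts, n=5):
    """Return the n genes with the most counts and their share of all counts."""
    total = sum(gene_counts.values())
    if total == 0:
        raise ValueError("total count is zero; fractions are undefined")
    # Rank genes from the highest count to the lowest.
    ranked = sorted(gene_counts.items(), key=lambda item: item[1], reverse=True)
    return [(gene, count / total) for gene, count in ranked[:n]]


if __name__ == "__main__":
    # Hypothetical counts summed over all samples.
    counts = {"gene_a": 10, "gene_b": 50, "gene_c": 25, "gene_d": 15}
    for gene, fraction in top_gene_fractions(counts, n=2):
        print(f"{gene}: {fraction:.1%} of all counts")
```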
Figure 7: Heatmap of Pearson distances between samples using normalized log2 gene expression values.
Distances between samples are calculated as 1 - r (r = Pearson correlation coefficient).
Download samples correlation table
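The 1 - r distance can be sketched in plain Python as follows. The sample names and expression vectors are hypothetical, and this is only an illustration of the formula, not the code that produced the heatmap.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sd_x * sd_y)


def pearson_distance_matrix(samples):
    """Return nested dicts holding the distance 1 - r for every pair of samples."""
    names = list(samples)
    return {
        a: {b: 1 - pearson_r(samples[a], samples[b]) for b in names}
        for a in names
    }


if __name__ == "__main__":
    # Hypothetical normalized log2 expression values (one list per sample).
    expression = {
        "sample_1": [1.0, 2.0, 3.0, 4.0],
        "sample_2": [2.0, 4.0, 6.0, 8.0],
        "sample_3": [4.0, 3.0, 2.0, 1.0],
    }
    for name, row in pearson_distance_matrix(expression).items():
        print(name, {other: round(d, 3) for other, d in row.items()})
```

With this definition, perfectly correlated samples have distance 0 and perfectly anti-correlated samples have distance 2.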
Figure 8:
Samples are clustered by Ward’s minimum variance agglomerative method, using Pearson distances between samples.
Download samples dendrogram table
Figure 9: PCA:
a. Histogram of % explained variability for each principal component
b. PCA plot of PC1 vs PC2
c. PCA plot of PC1 vs PC3
Table 1: Differentially expressed (DE) genes for each comparison
Differential expression analysis was performed using DESeq2.
Thresholds for significant DE genes (per comparison):
Comparison | Padj corrected by fdrtool | Plots | MA plot | DE Genes |
---|---|---|---|---|
basal_base_vs_basal_tip | FALSE | link | link | link |
Table 2: DE genes for functional analysis
To perform functional enrichment analysis, you can try one or more of the following websites: Intermine, Reactome, GeneAnalytics from the GeneCards(R) Suite(R), or STRING. You can also use the relevant Send to Intermine buttons below to send the differentially expressed genes directly to Intermine.
Reads were trimmed using cutadapt (DOI: 10.14806/ej.17.1.200) v4.1 (parameters: -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -a "A{10}" --times 2 -u 3 -u -3 -q 20 -m 25).
Reads were mapped to the genome index ~/genomes/Mus_musculus/UCSC/mm10/Sequence/STAR_v2.7.10a_index/ using STAR (DOI: 10.1093/bioinformatics/bts635) v2.7.10a (parameters: --alignEndsType EndToEnd, --outFilterMismatchNoverLmax 0.05, --twopassMode Basic, --alignSoftClipAtReferenceEnds No).
The pipeline quantifies the 3’ end of RefSeq-annotated genes (the 3’ region spans 1,000 bases upstream of the 3’ end and 100 bases downstream).
We used the 3’ end (1,000 bp) of the transcripts to count the number of reads per gene. Counting (UMI counts) was done after marking duplicates (in-house script) using HTSeq-count (DOI: 10.1093/bioinformatics/btu638) v2.0.2 in union mode.
The pipeline quantifies the RefSeq annotated genes: ~/gtf/mm10.genes.3utr.gtf.
No information is currently available about the version and date of the GTF annotation file.
Further analysis is done only for genes with at least 5 reads in at least one sample.
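The filtering rule can be sketched as below. The gene names, counts and function name are hypothetical; the only part taken from the report is the threshold of 5 reads in at least one sample.

```python
def filter_low_count_genes(counts, min_reads=5):
    """Keep genes whose count reaches min_reads in at least one sample.

    counts maps a gene name to a list of per-sample read counts.
    """
    return {
        gene: per_sample
        for gene, per_sample in counts.items()
        if max(per_sample) >= min_reads
    }


if __name__ == "__main__":
    # Hypothetical raw counts for three samples.
    raw_counts = {
        "gene_a": [0, 2, 4],    # never reaches 5 reads: dropped
        "gene_b": [0, 0, 5],    # reaches 5 reads in one sample: kept
        "gene_c": [120, 98, 143],
    }
    print(sorted(filter_low_count_genes(raw_counts)))
```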
Normalization of the counts and differential expression analysis were performed using DESeq2 (DOI: 10.1186/s13059-014-0550-8) v1.36.0 with the parameters: betaPrior=TRUE, cooksCutoff=FALSE, independentFiltering=FALSE. Raw p-values were adjusted for multiple testing using the procedure of Benjamini and Hochberg (DOI: 10.1111/j.2517-6161.1995.tb02031.x).
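For readers unfamiliar with the Benjamini and Hochberg adjustment, the sketch below shows the step-up procedure in plain Python. It illustrates the general method only and is not DESeq2's implementation; the p-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity:
    # each adjusted value is min(p * m / rank, next larger adjusted value).
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four genes.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, padj in zip(raw, benjamini_hochberg(raw)):
        print(f"p = {p:.3f}, padj = {padj:.3f}")
```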
We recommend looking at the p-value distribution plots for each pairwise comparison in the Differential Expression Analysis section of the report. The plots can be reached by clicking on the word link in the General Plots column of Table 1. The p-value distribution plots should be used to evaluate whether the adjusted p-values need to be corrected with fdrtool.
Interactive MA plots for each pairwise comparison were generated using Glimma v2.6.0 (see the link in the "Plots" column of Table 1 under the section Differential Expression Analysis). A dot plot representation of the normalized gene counts per condition can be found in the link in the "DE Genes" column of Table 1.
The pipeline was constructed using Snakemake (DOI: 10.1093/bioinformatics/bts480) v7.14.0.
Results: raw counts, normalized counts, rld (log-normalized counts), and pairwise DESeq2 statistics can be downloaded in txt format here or in xlsx format here.
Sequences from folder: /mnt/host_mount/utap-output/testuser/exmaple_and_data_for_testing_mm10_SCRB-seq/fastq
Output folder: /mnt/host_mount/utap-output/testuser/exmaple_and_data_for_testing_mm10_SCRB-seq/20250202_101250_LCM_mm10_Transcriptome_SCRB-Seq
FastQC folder: /mnt/host_mount/utap-output/testuser/exmaple_and_data_for_testing_mm10_SCRB-seq/20250202_101250_LCM_mm10_Transcriptome_SCRB-Seq/3_fastqc
MultiQC folder: /mnt/host_mount/utap-output/testuser/exmaple_and_data_for_testing_mm10_SCRB-seq/20250202_101250_LCM_mm10_Transcriptome_SCRB-Seq/3_fastqc/multiQC
Report output folder: /mnt/host_mount/utap-output/testuser/exmaple_and_data_for_testing_mm10_SCRB-seq/20250202_101250_LCM_mm10_Transcriptome_SCRB-Seq/10_reports/short_20250202_101250
Statistics regarding the number of reads for each sample for various steps of the pipeline can be downloaded from: here.
Raw counts can be downloaded from: here.
Raw counts after UMI correction can be downloaded from: here.
Normalized counts can be downloaded from: here.
Commands log can be downloaded from: here.
R packages versions can be found at: sessionInfo.txt
UTAP version 2.0
Samples comparison and their batch details can be found at: /mnt/host_mount/utap-output/testuser/exmaple_and_data_for_testing_mm10_SCRB-seq/20250202_101250_LCM_mm10_Transcriptome_SCRB-Seq/pheno_data-20250202_101250.tsv
General information about the run can be found at: https://utap-demo.weizmann.ac.il/reports/20250202_101250_LCM_mm10_analysis_parameters.yaml
Citing UTAP:
Kohen R, Barlev J, Hornung G, Stelzer G, Feldmesser E, Kogan K, Safran M, Leshkowitz D: UTAP: User-friendly Transcriptome Analysis Pipeline. BMC Bioinformatics 2019, 20(1):154.