Figure 1: Average quality of each base position across all reads. A Phred quality score of 30 or higher is considered good (predicted error rate of 1 in 1,000).
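A Phred score Q relates to the predicted base-call error probability P by Q = -10·log10(P), which is why Q30 corresponds to 1 error in 1,000. The sketch below shows the conversion in both directions. It also decodes a FASTQ quality string, assuming Phred+33 encoding (Sanger / Illumina 1.8+). That encoding is an assumption, since the report does not state it, and the function names are illustrative.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to the predicted base-call error probability."""
    # Phred definition: Q = -10 * log10(P), therefore P = 10 ** (-Q / 10)
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse conversion: error probability P to Phred score Q."""
    return -10 * math.log10(p)


def decode_fastq_quality(qual_line: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, ASSUMING Phred+33 encoding."""
    return [ord(ch) - offset for ch in qual_line]


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: error rate 1 in {round(1 / phred_to_error_prob(q)):,}")
```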
Figure 2: Histogram of the number of reads per sample in the raw data. The sequencing depth statistics across all samples were as follows:
the median depth was 53,583,236 reads,
the mean depth was 55,238,538 reads,
the standard deviation was 4,703,440 reads.
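The depth summary above can be reproduced from per-sample raw read counts. Below is a minimal sketch using Python's `statistics` module. The report does not say whether the sample (n - 1) or population (n) standard deviation was used, so the sample formula here is an assumption. The function name and the counts in the usage line are hypothetical placeholders, not the report's data.

```python
import statistics


def depth_summary(read_counts: dict[str, int]) -> dict[str, float]:
    """Median, mean and standard deviation of per-sample read counts."""
    counts = list(read_counts.values())
    return {
        "median": statistics.median(counts),
        "mean": statistics.mean(counts),
        # ASSUMPTION: sample standard deviation (n - 1 denominator);
        # use statistics.pstdev for the population version instead.
        "sd": statistics.stdev(counts),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    print(depth_summary({"sample_A": 10, "sample_B": 20, "sample_C": 30}))
```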
Figure 3: Percentage of reads discarded after trimming.
No figure is presented because, for every sample, fewer than 1% of reads were discarded after trimming.
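The rule above (omit the plot when every sample loses less than 1% of reads) amounts to a simple percentage check. The sketch below illustrates it. The function names, the (before, after) input layout and the ">= threshold" cutoff are assumptions made for illustration, not UTAP's actual code.

```python
def percent_discarded(reads_before: int, reads_after: int) -> float:
    """Percentage of reads removed by a filtering step."""
    if reads_before <= 0:
        raise ValueError("reads_before must be positive")
    return 100.0 * (reads_before - reads_after) / reads_before


def needs_discard_plot(counts: dict[str, tuple[int, int]], threshold: float = 1.0) -> bool:
    """Hypothetical helper: True if any sample lost at least `threshold` percent of reads.

    `counts` maps sample name -> (reads before trimming, reads after trimming).
    """
    return any(percent_discarded(before, after) >= threshold
               for before, after in counts.values())


if __name__ == "__main__":
    print(needs_discard_plot({"sample_A": (1000, 995), "sample_B": (2000, 1990)}))
```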
Figure 4: Histogram with the number of reads for each sample in each step of the pipeline.
MACS results for each sample
Sample Type | Sample | Total Tags (Reads) | Filtered Tags (Reads) | Maximum Duplicate Tags (Reads) at the Same Position | Redundant Rate |
---|---|---|---|---|---|
Treatment | RNC1 | 3513170 | 0 | 0 | 0 |
Treatment | RNC2 | 3300693 | 0 | 0 | 0 |
Treatment | RKD1 | 3924299 | 0 | 0 | 0 |
Treatment | RKD2 | 3433104 | 0 | 0 | 0 |
MACS results for each comparison
Comparison | Tag (Read) Size (bp) | MACS Model d Length | Total Number of Peaks |
---|---|---|---|
RNC1 | 28 | 1 | 178147 |
RNC2 | 28 | 1 | 172091 |
RKD1 | 28 | 1 | 39898 |
RKD2 | 27 | 1 | 29716 |
The total number of peaks across all comparisons is 419,852.
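The total equals the plain sum of the per-comparison peak counts in the table. The check below reproduces it. Because the arithmetic matches a simple sum, peaks that overlap between samples appear to be counted once per sample rather than merged. That is an inference from the numbers, not a statement from the pipeline.

```python
# Peak counts copied from the "MACS results for each comparison" table
peaks_per_comparison = {"RNC1": 178147, "RNC2": 172091, "RKD1": 39898, "RKD2": 29716}

total_peaks = sum(peaks_per_comparison.values())
print(f"{total_peaks:,}")  # 419,852
```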
Figure 5: Number of peaks for all samples
Figure 6: Peaks distribution in genomic regions
Figure 7: Peaks distribution around TSS
Figure 8: Overlap of peaks among the first 4 samples
Venn plot legend
Sample | Mark |
---|---|
RNC1 | 1 |
RNC2 | 2 |
RKD1 | 3 |
RKD2 | 4 |
Reads were trimmed using cutadapt (DOI: 10.14806/ej.17.1.200) with the parameters: -a CTGTAGGCACCATCAATAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC --times TIMES -q 20 -m 25.
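The trimming step above can be expressed as a cutadapt argument list, for example for use with `subprocess.run`. This is a sketch only: the report leaves the `--times` value as `TIMES`, so it is a required parameter here. The file names and the `times=2` in the usage line are hypothetical placeholders. The `-o` output option is standard cutadapt, but the exact input/output handling used by UTAP is not shown in the report. The command is built, not executed.

```python
import shlex

# 3' adapter sequence as given in the report
ADAPTER = "CTGTAGGCACCATCAATAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"


def cutadapt_trim_cmd(fastq_in: str, fastq_out: str, times: int) -> list[str]:
    """Build (but do not run) a cutadapt command mirroring the reported parameters."""
    return [
        "cutadapt",
        "-a", ADAPTER,          # 3' adapter to remove
        "--times", str(times),  # rounds of adapter removal; value not given in the report
        "-q", "20",             # quality-trim 3' ends below Phred 20
        "-m", "25",             # discard reads shorter than 25 nt after trimming
        "-o", fastq_out,        # hypothetical output path
        fastq_in,               # hypothetical input path
    ]


if __name__ == "__main__":
    # times=2 is an arbitrary placeholder, not the value used in this run
    print(shlex.join(cutadapt_trim_cmd("sample.fastq.gz", "sample.trimmed.fastq.gz", times=2)))
```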
Reads were aligned to rRNA using Bowtie1 (DOI: 10.1186/gb-2009-10-3-r25), and only reads that did not align to rRNA were retained.
Reads between 25 and 32 nucleotides long were then selected using Cutadapt.
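The length selection above keeps reads of 25 to 32 nt, a range typical of ribosome footprints. In Cutadapt this corresponds to the `-m`/`-M` (minimum/maximum length) options, although the exact command used is not shown in the report. The pure-Python sketch below applies the same inclusive bounds. It assumes uncompressed FASTQ with exactly four lines per record, and the function name is illustrative.

```python
import io
from typing import Iterator, TextIO


def filter_fastq_by_length(handle: TextIO, min_len: int = 25,
                           max_len: int = 32) -> Iterator[tuple[str, str, str, str]]:
    """Yield FASTQ records whose sequence length lies in [min_len, max_len].

    ASSUMPTION: plain 4-line FASTQ records (header, sequence, '+', quality).
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if min_len <= len(seq) <= max_len:
            yield header.rstrip("\n"), seq, plus, qual


if __name__ == "__main__":
    # Hypothetical reads of length 24, 28 and 33; only the 28-nt read is kept
    demo = "".join(f"@r{n}\n{'A' * n}\n+\n{'I' * n}\n" for n in (24, 28, 33))
    for record in filter_fastq_by_length(io.StringIO(demo)):
        print(record[0], len(record[1]))
```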
Reads were mapped to the mm10 genome using TopHat (DOI: 10.1093/bioinformatics/btp120) with the parameters: -N 1, --no-novel-juncs, --library-type fr-firststrand, and -p 20.
Uniquely mapped reads were extracted using samtools with the -q 10 parameter.
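`samtools view -q 10` skips alignments whose mapping quality (MAPQ, the fifth SAM column) is below 10. The filter relies on multi-mapped reads receiving low MAPQ values from the aligner. The sketch below applies the same rule to SAM text lines. It is illustrative only (the function name is hypothetical) and is not a replacement for samtools, which also handles BAM input.

```python
from typing import Iterable, Iterator


def filter_sam_by_mapq(sam_lines: Iterable[str], min_mapq: int = 10) -> Iterator[str]:
    """Keep SAM header lines and alignments with MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines are passed through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 5 (index 4) of a SAM alignment record is MAPQ
        if int(fields[4]) >= min_mapq:
            yield line


if __name__ == "__main__":
    demo = [
        "@HD\tVN:1.0\n",
        "read1\t0\tchr1\t100\t50\t28M\t*\t0\t0\tACGT\tIIII\n",  # kept
        "read2\t0\tchr1\t200\t3\t28M\t*\t0\t0\tACGT\tIIII\n",   # dropped (MAPQ 3)
    ]
    print(len(list(filter_sam_by_mapq(demo))))
```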
Fragments were counted over 5’ UTR and CDS features only, using HTSeq-count v2.0.2 (DOI: 10.1093/bioinformatics/btu638) in intersection-nonempty mode.
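In HTSeq-count's intersection-nonempty mode, each read position i has a set S(i) of overlapping features. The read is assigned using the intersection of all non-empty S(i): exactly one feature means the read is counted for it, an empty result gives `__no_feature`, and more than one feature gives `__ambiguous` (with the default `--nonunique none`). The sketch below implements only this resolution rule. It assumes the per-position feature sets have already been computed, and the function name is hypothetical.

```python
from typing import Sequence


def intersection_nonempty(position_sets: Sequence[set[str]]) -> str:
    """Resolve a read's feature assignment from per-position feature sets."""
    non_empty = [s for s in position_sets if s]
    if not non_empty:
        return "__no_feature"  # the read overlaps no feature at all
    common = set.intersection(*non_empty)
    if len(common) == 1:
        return next(iter(common))  # counted for this single feature
    if not common:
        return "__no_feature"  # non-empty sets share no feature
    return "__ambiguous"  # several features remain (default --nonunique none)


if __name__ == "__main__":
    # Read partly in an unannotated gap, partly in CDS_A -> counted for CDS_A
    print(intersection_nonempty([set(), {"CDS_A"}, {"CDS_A"}]))
```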
Significant regions (peaks) were identified using MACS2 callpeak (DOI: 10.1186/gb-2008-9-9-r137) with the parameters: --keep-dup all, --nomodel, and --extsize=1.
The peak summits reported by MACS2 were shifted by 13 bases and extended by 3 bases according to gene orientation.
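The report does not spell out the coordinate convention for this shift. The sketch below shows one plausible reading. It assumes 0-based summit positions, a 13-base shift in the direction of transcription (increasing coordinates on "+", decreasing on "-"), and a 3-base window such as one codon, returned as a half-open interval. The function name and all of these conventions are hypothetical.

```python
def shift_summit(summit: int, strand: str, shift: int = 13, width: int = 3) -> tuple[int, int]:
    """Return a 0-based, half-open (start, end) window `width` bases long,
    placed `shift` bases downstream of the summit along the gene's strand.

    ASSUMPTION: "downstream" follows gene orientation; this is an
    illustrative interpretation, not the pipeline's verified code.
    """
    if strand == "+":
        start = summit + shift
        return start, start + width  # covers summit+13 .. summit+15
    if strand == "-":
        end = summit - shift + 1
        return end - width, end  # covers summit-15 .. summit-13
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


if __name__ == "__main__":
    print(shift_summit(100, "+"), shift_summit(100, "-"))
```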
The distribution of peaks in genomic regions and their proximity to transcription start sites (TSS) were examined using ChIPseeker (DOI: 10.18129/B9.bioc.ChIPseeker). The overlap of peaks among the first 4 samples, shown in the Venn diagram, was also computed using ChIPseeker.
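Peak overlap between samples rests on a simple interval test: two peaks overlap if they share a chromosome and their ranges intersect. The brute-force sketch below counts, for each pair of samples, how many peaks in the first sample overlap at least one peak in the second. It is a simplified stand-in for illustration, not ChIPseeker's implementation. It assumes 0-based half-open (chrom, start, end) peaks, and the function names and sample data are hypothetical.

```python
from itertools import combinations

Peak = tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome intersect."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_shared(peaks_a: list[Peak], peaks_b: list[Peak]) -> int:
    """Number of peaks in peaks_a overlapping at least one peak in peaks_b (O(n*m))."""
    return sum(any(overlaps(p, q) for q in peaks_b) for p in peaks_a)


def pairwise_overlap(samples: dict[str, list[Peak]]) -> dict[tuple[str, str], int]:
    """Shared-peak counts for every unordered pair of samples."""
    return {(x, y): count_shared(samples[x], samples[y])
            for x, y in combinations(samples, 2)}


if __name__ == "__main__":
    demo = {
        "sample_A": [("chr1", 0, 10), ("chr1", 50, 60)],
        "sample_B": [("chr1", 5, 15)],
        "sample_C": [("chr2", 0, 10)],
    }
    print(pairwise_overlap(demo))
```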
The pipeline was constructed using Snakemake (DOI: 10.1093/bioinformatics/bts480).
Input sequences folder: ~/example_and_data_for_testing_ribo-seq_mm10/fastq
Output folder: ~/example_and_data_for_testing_ribo-seq_mm10/20241118_225323_demo_Ribo-Seq
FastQC folder: ~/example_and_data_for_testing_ribo-seq_mm10/20241118_225323_demo_Ribo-Seq/2_fastqc
MultiQC folder: ~/example_and_data_for_testing_ribo-seq_mm10/20241118_225323_demo_Ribo-Seq/2_fastqc/multiQC
Report output folder: data_for_demo_production_20241118_225323
Statistics on the number of reads for each sample at each step of the pipeline can be downloaded here.
MACS peak-calling statistics for each sample and comparison can be downloaded here.
The commands log can be downloaded here.
R package versions can be found at: sessionInfo.txt
UTAP version 2.0
General information about the run can be found at: ~/reports/20241118_225323_demo_analysis_parameters.yaml
The list of treatment and control samples can be found here.
Citing UTAP:
Kohen R, Barlev J, Hornung G, Stelzer G, Feldmesser E, Kogan K, Safran M, Leshkowitz D: UTAP: User-friendly Transcriptome Analysis Pipeline. BMC Bioinformatics 2019, 20(1):154.