Figure 1: Average quality of each base position across all reads. A quality score of 30 or higher is considered good (predicted error rate of 1 in 1,000).
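The link between a quality score of 30 and an error rate of 1 in 1,000 follows from the standard Phred definition, Q = -10 * log10(P). A minimal sketch of that conversion (the function name is illustrative and not part of the pipeline):

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q into the predicted base-call error probability."""
    return 10 ** (-q / 10)


for q in (10, 20, 30, 40):
    p = phred_to_error_probability(q)
    # 1 / p is the expected number of bases per erroneous call.
    print(f"Q{q}: about 1 error in {round(1 / p):,} bases")
```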
Figure 2: Histogram showing the number of reads for each sample in the raw data. The sequencing depth statistics across all samples were as follows:
the median depth was 35,695,056 reads,
the mean depth was 35,894,160 reads,
the standard deviation was 3,717,888 reads.
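Summary statistics of this kind can be reproduced from the per-sample read counts. The sketch below uses hypothetical counts, since the per-sample raw depths are not listed here, and it assumes the sample (n - 1) standard deviation; the report does not state which variant was used.

```python
import statistics

# Hypothetical per-sample read counts, for illustration only
# (these are NOT the values behind Figure 2).
reads_per_sample = [30_000_000, 35_000_000, 40_000_000]

median_depth = statistics.median(reads_per_sample)
mean_depth = statistics.mean(reads_per_sample)
# Sample standard deviation (n - 1 denominator); statistics.pstdev
# would give the population version instead.
sd_depth = statistics.stdev(reads_per_sample)

print(f"median depth: {median_depth:,.0f} reads")
print(f"mean depth: {mean_depth:,.0f} reads")
print(f"standard deviation: {sd_depth:,.0f} reads")
```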
Figure 3: Percentage of reads discarded after trimming.
No figure presented since the percentage of reads discarded after trimming for all samples is lower than 1%.
Figure 4: Histogram with the number of reads for each sample in each step of the pipeline.
Figure 5: Coverage plot on Genebody
Plot of mean read coverage (counts per million mapped reads) over gene regions. The plot displays the mean coverage across all genes, from 2,000 bases upstream of the transcription start site (TSS) to 2,000 bases downstream of the transcription end site (TES).
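Counts per million mapped reads rescale raw coverage by library size so that samples sequenced to different depths can be compared. A minimal sketch of that normalization, with hypothetical numbers and an illustrative function name:

```python
def counts_per_million(read_count: float, total_mapped_reads: int) -> float:
    """Scale a raw read count to counts per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return read_count * 1_000_000 / total_mapped_reads


# Hypothetical example: 50 reads covering a position in a library
# of 25,000,000 mapped reads.
cpm = counts_per_million(50, 25_000_000)
print(f"{cpm:.2f} counts per million")
```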
MACS results for each sample
| Sample Type | Sample | Total Tags (Reads) | Filtered Tags (Reads) | Maximum Duplicate Tags (Reads) at the Same Position | Redundant Rate |
|---|---|---|---|---|---|
| Control | Runx3_CD4DC_input1 | 22457161 | 22418914 | 2 | 0.00 |
| Control | Runx3_CD4DC_input2 | 26806518 | 26760044 | 2 | 0.00 |
| Treatment | Runx3_CD4DC_IP1 | 25625619 | 25438921 | 2 | 0.01 |
| Treatment | Runx3_CD4DC_IP2 | 21749235 | 21591289 | 2 | 0.01 |
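The Redundant Rate column is consistent with the fraction of tags removed by duplicate filtering, that is, one minus the ratio of filtered to total tags. The sketch below checks this reading against the table; the formula is an assumption that happens to reproduce the rounded values shown, not something stated in the report.

```python
# (total tags, filtered tags) copied from the per-sample MACS table.
samples = {
    "Runx3_CD4DC_input1": (22457161, 22418914),
    "Runx3_CD4DC_input2": (26806518, 26760044),
    "Runx3_CD4DC_IP1": (25625619, 25438921),
    "Runx3_CD4DC_IP2": (21749235, 21591289),
}


def redundant_rate(total_tags: int, filtered_tags: int) -> float:
    """Fraction of tags discarded by duplicate filtering (assumed definition)."""
    return 1 - filtered_tags / total_tags


rates = {name: round(redundant_rate(t, f), 2) for name, (t, f) in samples.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}")
```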
MACS results for each comparison
| Comparison | Tag (read) size (bp) | MACS model d length | Total number of peaks | Total number of peaks after blacklist filtering | MACS model plots |
|---|---|---|---|---|---|
| Runx3_CD4DC_IP1_vs_Runx3_CD4DC_input1 | 50 | 307 | 7530 | 7381 | link |
| Runx3_CD4DC_IP2_vs_Runx3_CD4DC_input2 | 49 | 317 | 11474 | 11255 | link |
The final number of peaks for all comparisons is: 18,636
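The reported total matches the sum of the per-comparison peak counts after blacklist filtering. The short check below uses the values from the comparison table and also shows how many peaks the blacklist removed in each comparison. Whether peaks shared between comparisons are counted once or twice is not stated in the report, so this is only a consistency check.

```python
# (peaks called, peaks after blacklist filtering), copied from the comparison table.
comparisons = {
    "Runx3_CD4DC_IP1_vs_Runx3_CD4DC_input1": (7530, 7381),
    "Runx3_CD4DC_IP2_vs_Runx3_CD4DC_input2": (11474, 11255),
}

for name, (called, kept) in comparisons.items():
    removed = called - kept
    print(f"{name}: {removed} peaks removed ({100 * removed / called:.2f}%)")

total_kept = sum(kept for _, kept in comparisons.values())
print(f"Total peaks after filtering: {total_kept:,}")
```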
Integrated annotated peaks file: Download table
Figure 6: Number of peaks for all comparisons
Figure 7: Peaks distribution in genomic regions
Figure 8: Peaks distribution around TSS
Figure 9: Overlap of peaks among the first 4 comparisons
Venn plot legend
| comparisons | mark |
|---|---|
| Runx3_CD4DC_IP1_vs_Runx3_CD4DC_input1 | 1 |
| Runx3_CD4DC_IP2_vs_Runx3_CD4DC_input2 | 2 |
Reads were trimmed using cutadapt (DOI: 10.14806/ej.17.1.200) with the parameters: -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC --times TIMES -q 20 -m 25.
Reads were mapped to the mm10 genome using Bowtie2 (DOI: 10.1038/nmeth.1923).
Uniquely mapped reads were extracted using samtools with the parameters: -F 4, -f 0x2, -q 39.
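The three samtools options act as per-alignment filters: -F 4 drops unmapped reads, -f 0x2 keeps only reads mapped in a proper pair, and -q 39 keeps alignments with a mapping quality of at least 39. The sketch below mimics that logic on a few hypothetical records to show how the SAM flag bits are tested. It illustrates the filter only and is not the pipeline's code; the report does not say whether -f 0x2 is applied to single-end data, so that choice is exposed as a parameter.

```python
from typing import NamedTuple

UNMAPPED = 0x4      # SAM flag bit: read is unmapped
PROPER_PAIR = 0x2   # SAM flag bit: read is mapped in a proper pair


class Alignment(NamedTuple):
    name: str
    flag: int
    mapq: int


def passes_filter(aln: Alignment, require_proper_pair: bool = True, min_mapq: int = 39) -> bool:
    """Mimic samtools view -F 4 -f 0x2 -q 39 for a single alignment."""
    if aln.flag & UNMAPPED:                                  # -F 4
        return False
    if require_proper_pair and not aln.flag & PROPER_PAIR:   # -f 0x2
        return False
    return aln.mapq >= min_mapq                              # -q 39


# Hypothetical alignments, for illustration only.
records = [
    Alignment("r1", 99, 42),  # paired, proper pair, first in pair, high MAPQ
    Alignment("r2", 99, 12),  # proper pair but low MAPQ
    Alignment("r3", 77, 0),   # unmapped read with unmapped mate
    Alignment("r4", 65, 42),  # paired but not a proper pair
]

kept = [aln.name for aln in records if passes_filter(aln)]
print(kept)
```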
The reads were graphically visualized using ngsplot with the parameters: -G -R genebody -C -O samples -D refseq -L 50000.
Significant ChIP regions (peaks) were identified, and compared to control samples when present, using MACS2 callpeak (https://doi.org/10.1186/gb-2008-9-9-r137) with the parameters: --bw 300 -B --SPMR --keep-dup auto -q 0.05, together with -f BAMPE --nomodel for paired-end analyses and -f BAM for single-end analyses.
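To make the paired-end versus single-end distinction concrete, the sketch below assembles a MACS2 callpeak argument list for either case. It assumes that -f BAMPE and --nomodel belong to the paired-end case and -f BAM to the single-end case. The file names, the output name, and the -g mm genome-size shortcut are hypothetical additions not taken from the report, and the function only builds the command without running it.

```python
from typing import List, Optional


def macs2_callpeak_args(
    treatment_bam: str,
    control_bam: Optional[str],
    name: str,
    paired_end: bool,
    genome_size: str = "mm",  # assumption: MACS2 shortcut for mouse; not stated in the report
) -> List[str]:
    """Build (but do not run) a MACS2 callpeak command line."""
    args = ["macs2", "callpeak", "-t", treatment_bam]
    if control_bam is not None:  # the control is optional
        args += ["-c", control_bam]
    args += ["-n", name, "-g", genome_size]
    # Parameters shared by both library types, as listed in the report.
    args += ["--bw", "300", "-B", "--SPMR", "--keep-dup", "auto", "-q", "0.05"]
    if paired_end:
        args += ["-f", "BAMPE", "--nomodel"]
    else:
        args += ["-f", "BAM"]
    return args


# Hypothetical file names, for illustration only.
cmd = macs2_callpeak_args("IP1.bam", "input1.bam", "IP1_vs_input1", paired_end=True)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run once the file paths point at real BAM files.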
Files containing the predicted peak coordinates in BedGraph format were converted to BigWig format using bedtools slop with the parameters -g -b 0, followed by bedClip stdin and bedGraphToBigWig with default parameters.
The predicted peaks were annotated according to the mm10 genome using Homer with default parameters after merging all peaks from all samples together with bedtools multiinter.
The distribution of peaks in genomic regions and their proximity to transcription start sites (TSS) were examined using ChIPseeker (DOI: 10.18129/B9.bioc.ChIPseeker). The overlap of peaks for the first 4 comparisons, shown in the Venn diagram, was also analyzed using ChIPseeker.
The resulting peaks were filtered to exclude peaks that fall in blacklisted regions.
The blacklist for the mm10 genome was taken from https://github.com/Boyle-Lab/Blacklist/tree/master/lists.
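Blacklist filtering amounts to discarding every peak that intersects a blacklisted interval on the same chromosome. The sketch below assumes that any overlap, however small, removes the peak; the report does not state the overlap criterion the pipeline uses, and the coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open as in BED


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def filter_blacklisted(peaks: List[Interval], blacklist: List[Interval]) -> List[Interval]:
    """Keep only the peaks that do not overlap any blacklisted region."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in blacklist:
        by_chrom.setdefault(chrom, []).append((start, end))
    return [
        (chrom, start, end)
        for chrom, start, end in peaks
        if not any(overlaps(start, end, bs, be) for bs, be in by_chrom.get(chrom, []))
    ]


# Hypothetical coordinates, for illustration only.
peaks = [("chr1", 100, 500), ("chr1", 1000, 1400), ("chr2", 100, 500)]
blacklist = [("chr1", 450, 600)]

kept_peaks = filter_blacklisted(peaks, blacklist)
print(kept_peaks)
```

A linear scan per peak is fine for a sketch; for genome-scale peak sets, a sorted or interval-tree lookup would scale better.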
The pipeline was constructed using Snakemake (DOI: 10.1093/bioinformatics/bts480).
Sequences from folder: /home/labs/bioservices/Collaboration/example_ChIP_Dicken_DC_Runx/Dicken_fastq
Output folder: /home/labs/bioservices/Collaboration/example_ChIP_Dicken_DC_Runx/20241119_044920_demo_ChIP-Seq
FastQC folder: /home/labs/bioservices/Collaboration/example_ChIP_Dicken_DC_Runx/20241119_044920_demo_ChIP-Seq/2_fastqc
MultiQC folder: /home/labs/bioservices/Collaboration/example_ChIP_Dicken_DC_Runx/20241119_044920_demo_ChIP-Seq/2_fastqc/multiQC
Report output folder: test_20241119_044920
Statistics regarding the number of reads for each sample for various steps of the pipeline can be downloaded from: here.
MACS peak calling Statistics for each sample and comparison can be downloaded from: here.
Commands log can be downloaded from: here.
R packages versions can be found at: sessionInfo.txt
UTAP version 2.0
General information about the run can be found at: /home/labs/bioservices/services/UTAP-data/utap-meta-data/installation/UTAP-data-singularity/utap_update/reports/20241119_044920_demo_analysis_parameters.yaml
Treatment and control samples are found at: here.
Citing UTAP:
Kohen R, Barlev J, Hornung G, Stelzer G, Feldmesser E, Kogan K, Safran M, Leshkowitz D: UTAP: User-friendly Transcriptome Analysis Pipeline. BMC Bioinformatics 2019, 20(1):154.