Developed by UTAP2 team
Bioinformatics unit at Life Sciences Core Facilities (LSCF)
Weizmann Institute of Science    

Sequencing and mapping quality control (QC)

Figure 1: Average base quality (Phred score) at each read position, across all reads. A quality of 30 or higher is considered good (predicted error rate of 1 in 1,000).
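The link between a quality score and the predicted error rate quoted above can be sketched as follows. This is a minimal illustration that assumes the standard Phred definition, Q = -10 * log10(p); the function name `phred_to_error_probability` is hypothetical and is not part of the pipeline.

```python
def phred_to_error_probability(quality: float) -> float:
    """Convert a Phred quality score to a predicted per-base error probability.

    Assumes the standard Phred definition Q = -10 * log10(p).
    """
    return 10 ** (-quality / 10)


# Print the predicted error probability for a few common thresholds.
for q in (20, 30, 40):
    print(q, phred_to_error_probability(q))
```

Under this definition, a quality of 30 corresponds to about one predicted error per 1,000 bases, and every additional 10 points lowers the error probability tenfold.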

Figure 2: Histogram showing the number of reads for each sample in raw data. The sequencing depth statistics for all samples were as follows:

the median depth was 35,695,056 reads,

the mean depth was 35,894,160 reads,

the standard deviation was 3,717,888 reads.
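As a rough aid for reading these numbers, the sketch below derives two relative measures from the reported statistics: the coefficient of variation (standard deviation divided by the mean) and the relative gap between mean and median. Neither quantity is reported by the pipeline; they are computed here only from the three values listed above.

```python
# Summary statistics as listed in this report (reads per sample).
median_depth = 35_695_056
mean_depth = 35_894_160
sd_depth = 3_717_888

# Coefficient of variation: spread of the depths relative to their mean.
cv = sd_depth / mean_depth

# Relative gap between mean and median, a rough indicator of skew.
mean_median_gap = (mean_depth - median_depth) / median_depth

print(f"coefficient of variation: {cv:.1%}")
print(f"mean vs. median gap: {mean_median_gap:.2%}")
```

A small coefficient of variation and a mean close to the median both suggest that sequencing depth is fairly even across samples.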

Figure 3: Percentage of reads discarded after trimming.

No figure presented since the percentage of reads discarded after trimming for all samples is lower than 1%.

Figure 4: Histogram with the number of reads for each sample in each step of the pipeline.

Figure 5: Coverage plot on Genebody

Mean read coverage (in counts per million mapped reads) across gene regions. The plot shows the mean coverage over all genes, from 2,000 bases upstream of the transcription start site (TSS) to 2,000 bases downstream of the transcription end site (TES).

MACS peak calling

MACS results for each sample

| Sample Type | Sample | Total Tags (Reads) | Filtered Tags (Reads) | Maximum Duplicate Tags (Reads) at the Same Position | Redundant Rate |
|---|---|---|---|---|---|
| Control | Runx3_CD4DC_input1 | 22457161 | 22418914 | 2 | 0.00 |
| Control | Runx3_CD4DC_input2 | 26806518 | 26760044 | 2 | 0.00 |
| Treatment | Runx3_CD4DC_IP1 | 25625619 | 25438921 | 2 | 0.01 |
| Treatment | Runx3_CD4DC_IP2 | 21749235 | 21591289 | 2 | 0.01 |
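The report does not define the redundant rate. A minimal sketch, assuming it is the fraction of tags removed by filtering (1 - filtered / total), shows that this reading agrees with the rounded values in the table above. The helper name `removed_fraction` is hypothetical.

```python
# (total tags, filtered tags, reported redundant rate), copied from the table above.
macs_samples = {
    "Runx3_CD4DC_input1": (22_457_161, 22_418_914, 0.00),
    "Runx3_CD4DC_input2": (26_806_518, 26_760_044, 0.00),
    "Runx3_CD4DC_IP1": (25_625_619, 25_438_921, 0.01),
    "Runx3_CD4DC_IP2": (21_749_235, 21_591_289, 0.01),
}


def removed_fraction(total: int, filtered: int) -> float:
    """Fraction of tags dropped by filtering (assumed meaning of 'redundant rate')."""
    return 1 - filtered / total


for name, (total, filtered, reported) in macs_samples.items():
    frac = removed_fraction(total, filtered)
    print(f"{name}: {frac:.4f} removed (reported redundant rate {reported:.2f})")
```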

MACS results for each comparison

| Comparison | Tag (read) size (bp) | MACS model d length | Total number of peaks | Total number of peaks after blacklist filtering | MACS model plots |
|---|---|---|---|---|---|
| Runx3_CD4DC_IP1_vs_Runx3_CD4DC_input1 | 50 | 307 | 7530 | 7381 | link |
| Runx3_CD4DC_IP2_vs_Runx3_CD4DC_input2 | 49 | 317 | 11474 | 11255 | link |

The final number of peaks across all comparisons, after blacklist filtering, is 18,636 (7,381 + 11,255).
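The peak counts in the comparison table can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported there; it shows how many peaks blacklist filtering removed per comparison and confirms that the filtered counts sum to the stated final number.

```python
# (peaks before blacklist filtering, peaks after), copied from the comparison table.
peak_counts = {
    "Runx3_CD4DC_IP1_vs_Runx3_CD4DC_input1": (7_530, 7_381),
    "Runx3_CD4DC_IP2_vs_Runx3_CD4DC_input2": (11_474, 11_255),
}

# Peaks removed by blacklist filtering, per comparison.
removed = {name: before - after for name, (before, after) in peak_counts.items()}

# Sum of the filtered peak counts over all comparisons.
total_after = sum(after for _, after in peak_counts.values())

for name, n_removed in removed.items():
    print(f"{name}: {n_removed} peaks removed")
print(f"total peaks after filtering: {total_after:,}")
```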

Integrated annotated peaks file: Download table

Figure 6: Number of peaks for all comparisons

Figure 7: Peaks distribution in genomic regions

Figure 8: Peaks distribution around TSS

Figure 9: Overlap of peaks among the first 4 comparisons

Venn plot legend

| Comparison | Mark |
|---|---|
| Runx3_CD4DC_IP1_vs_Runx3_CD4DC_input1 | 1 |
| Runx3_CD4DC_IP2_vs_Runx3_CD4DC_input2 | 2 |

Bioinformatics pipeline methods

Reads were trimmed using cutadapt (DOI: 10.14806/ej.17.1.200) with the parameters: -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC --times TIMES -q 20 -m 25.

Reads were mapped to mm10 genome using Bowtie2 (DOI: 10.1038/nmeth.1923).

Uniquely mapped reads were extracted using samtools with the parameters: -F 4, -f 0x2, -q 39.
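What the three filter parameters select can be sketched as a predicate on a read's SAM FLAG and mapping quality. This assumes the options were passed to `samtools view` (the report names only samtools) and uses their documented meanings: -F excludes reads with any of the given flag bits set, -f requires all given bits to be set, and -q drops reads with a mapping quality below the threshold. The function name `passes_filter` is hypothetical.

```python
UNMAPPED = 0x4     # SAM flag bit: the read is unmapped
PROPER_PAIR = 0x2  # SAM flag bit: the read is mapped in a proper pair
MIN_MAPQ = 39      # minimum mapping quality to keep a read


def passes_filter(flag: int, mapq: int) -> bool:
    """Return True if a read would be kept by -F 4 -f 0x2 -q 39."""
    if flag & UNMAPPED:
        return False  # -F 4: exclude unmapped reads
    if (flag & PROPER_PAIR) != PROPER_PAIR:
        return False  # -f 0x2: require the proper-pair bit
    return mapq >= MIN_MAPQ  # -q 39: exclude low mapping quality


# FLAG 99 = paired, proper pair, mate on reverse strand, first in pair.
print(passes_filter(99, 42))
print(passes_filter(99, 30))
```

Because the proper-pair bit is only set for paired-end data, a filter of this form would discard every read from a single-end run.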

The reads were graphically visualized using ngsplot with the parameters: -G -R genebody -C -O samples -D refseq -L 50000.

Significant ChIP regions (peaks) were called, and compared to control samples where present, using MACS2 callpeak (https://doi.org/10.1186/gb-2008-9-9-r137) with the parameters: --bw 300 -B --SPMR --keep-dup auto -q 0.05, plus -f BAMPE --nomodel for paired-end analyses or -f BAM for single-end analyses.
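How the paired-end and single-end settings differ can be sketched by assembling the argument list for each mode. This is an illustration, not the pipeline's own code: it assumes the parameter list is read as a set of shared options plus `-f BAMPE --nomodel` for paired-end data and `-f BAM` for single-end data. The function name and all file names are hypothetical, and nothing is executed.

```python
import shlex
from typing import List, Optional


def macs2_callpeak_args(
    treatment: str, control: Optional[str], name: str, paired_end: bool
) -> List[str]:
    """Build a MACS2 callpeak argument list (sketch; file names are placeholders)."""
    args = ["macs2", "callpeak", "-t", treatment, "-n", name]
    if control is not None:
        args += ["-c", control]  # control sample, only if present
    # Options shared by both modes, as listed in the methods.
    args += ["--bw", "300", "-B", "--SPMR", "--keep-dup", "auto", "-q", "0.05"]
    if paired_end:
        args += ["-f", "BAMPE", "--nomodel"]
    else:
        args += ["-f", "BAM"]
    return args


# Hypothetical file names, shown only to illustrate the two modes.
single = macs2_callpeak_args("IP1.bam", "input1.bam", "IP1_vs_input1", paired_end=False)
paired = macs2_callpeak_args("IP1.bam", None, "IP1", paired_end=True)
print(shlex.join(single))
print(shlex.join(paired))
```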

Files containing the predicted peak coordinates in bedGraph format were converted to bigWig format using bedtools slop with the parameters -g -b 0, followed by bedClip stdin and bedGraphToBigWig with default parameters.

The predicted peaks were annotated according to the mm10 genome using Homer with default parameters after merging all peaks from all samples together with bedtools multiinter.

The distribution of peaks in genomic regions and their proximity to transcription start sites (TSS) were examined using ChIPseeker (DOI: 10.18129/B9.bioc.ChIPseeker). The overlap of peaks for the first 4 comparisons, shown in the Venn diagram, was also analyzed using ChIPseeker.

The resulting peaks were filtered to exclude peaks that fall in blacklisted regions.

The blacklist for the mm10 genome was taken from https://github.com/Boyle-Lab/Blacklist/tree/master/lists.
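The report does not state how a peak is matched against the blacklist. A minimal sketch, assuming a peak is dropped when it overlaps a blacklisted interval by at least one base, could look like this. The names `Interval`, `overlaps` and `filter_blacklisted` are hypothetical, and the coordinates are toy values rather than real peaks or blacklist entries.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_blacklisted(peaks: List[Interval], blacklist: List[Interval]) -> List[Interval]:
    """Keep only peaks that overlap no blacklisted interval (assumed rule)."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]


# Toy coordinates for illustration only.
peaks = [
    Interval("chr1", 100, 200),
    Interval("chr1", 500, 600),
    Interval("chr2", 100, 200),
]
blacklist = [Interval("chr1", 150, 300)]

kept = filter_blacklisted(peaks, blacklist)
print(kept)
```

The nested scan is fine for a sketch; for genome-scale peak sets an interval-aware tool would be the more practical choice.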

The pipeline was constructed using Snakemake (DOI: 10.1093/bioinformatics/bts480).

Acknowledgments

Citing UTAP:

Kohen R, Barlev J, Hornung G, Stelzer G, Feldmesser E, Kogan K, Safran M, Leshkowitz D: UTAP: User-friendly Transcriptome Analysis Pipeline. BMC Bioinformatics 2019, 20(1):154.