Report ChIP-Seq Pipeline: 20241119_044920

Sequencing and mapping quality control (QC)
MACS peak calling
Bioinformatics pipeline methods
Links to results
Acknowledgments

Sequencing and mapping quality control (QC)

Figure 1: Average quality score at each base position across all reads. A Phred quality score of 30 or higher is considered good (predicted error rate of 1 in 1,000).
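The link between a quality score and its predicted error rate can be sketched as below. This is a minimal illustration of the standard Phred definition, Q = -10 * log10(p); the function name is hypothetical and is not part of the pipeline.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to the predicted base-call error probability."""
    return 10 ** (-q / 10)


# Show how the predicted error rate falls as the quality score rises.
for q in (20, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: about 1 error in {round(1 / p):,} bases")
```

A score of 30 therefore corresponds to the 1 in 1,000 error rate quoted in the caption, and each additional 10 points lowers the predicted error rate tenfold.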

Figure 2: Histogram showing the number of reads for each sample in the raw data. The sequencing depth statistics across all samples were as follows:

the median depth was 35,695,056 reads,

the mean depth was 35,894,160 reads,

the standard deviation was 3,717,888 reads.
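To put the spread in context, the reported mean and standard deviation can be combined into a coefficient of variation. The sketch below uses only the two values stated above; the variable names are illustrative.

```python
# Values reported for the raw sequencing depth across all samples.
mean_depth = 35_894_160
sd_depth = 3_717_888

# Coefficient of variation: standard deviation relative to the mean.
cv = sd_depth / mean_depth
print(f"Coefficient of variation: {cv:.1%}")
```

The result is roughly 10%, which suggests that sequencing depth was fairly even across samples.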

Figure 3: Percentage of reads discarded after trimming.

No figure presented since the percentage of reads discarded after trimming for all samples is lower than 1%.

Figure 4: Histogram with the number of reads for each sample in each step of the pipeline.

Figure 5: Coverage plot on Genebody

Mean read coverage (in counts per million mapped reads) over gene regions. The plot shows the mean coverage across all genes, from 2,000 bases upstream of the transcription start site (TSS) to 2,000 bases downstream of the transcription end site (TES).
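The counts-per-million normalization used on the coverage axis can be sketched as follows. This is a minimal illustration of the general formula, not the plotting tool's internal code; the function name and the example numbers are hypothetical.

```python
def counts_per_million(count: float, total_mapped_reads: int) -> float:
    """Scale a raw read count to counts per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return count * 1_000_000 / total_mapped_reads


# Hypothetical example: 50 reads cover a position in a library of 25,000,000 mapped reads.
print(counts_per_million(50, 25_000_000))  # 2.0
```

Scaling by library size in this way makes coverage profiles comparable between samples sequenced to different depths.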

MACS peak calling

MACS results for each sample

Sample Type	Sample	Total Tags (Reads)	Filtered Tags (Reads)	Maximum Duplicate Tags (Reads) at the Same Position	Redundant Rate
Control	Runx3_CD4DC_input1	22457161	22418914	2	0.00
Control	Runx3_CD4DC_input2	26806518	26760044	2	0.00
Treatment	Runx3_CD4DC_IP1	25625619	25438921	2	0.01
Treatment	Runx3_CD4DC_IP2	21749235	21591289	2	0.01
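The redundant rate column can be reproduced from the two tag counts. The sketch below assumes the rate is the fraction of tags removed by duplicate filtering, 1 - filtered / total; that formula is an assumption, although it agrees with the rounded values in the table.

```python
# (total tags, filtered tags) for each sample, copied from the table above.
samples = {
    "Runx3_CD4DC_input1": (22457161, 22418914),
    "Runx3_CD4DC_input2": (26806518, 26760044),
    "Runx3_CD4DC_IP1": (25625619, 25438921),
    "Runx3_CD4DC_IP2": (21749235, 21591289),
}


def redundant_rate(total_tags: int, filtered_tags: int) -> float:
    """Fraction of tags discarded as duplicates (assumed definition)."""
    return 1 - filtered_tags / total_tags


for name, (total, filtered) in samples.items():
    print(f"{name}\t{redundant_rate(total, filtered):.2f}")
```

Under this definition fewer than 1% of tags were discarded as duplicates in any sample.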

MACS results for each comparison

Comparison	Tag (read) size (bp)	MACS model d length	Total number of peaks	Total number of peaks after blacklist filtering	MACS model plots
Runx3_CD4DC_IP1_vs_Runx3_CD4DC_input1	50	307	7530	7381	link
Runx3_CD4DC_IP2_vs_Runx3_CD4DC_input2	49	317	11474	11255	link

The final number of peaks, summed over all comparisons after blacklist filtering, is 18,636.
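The final count can be checked against the per-comparison table. The sketch below uses only the peak counts reported above; the variable names are illustrative.

```python
# (peaks called, peaks remaining after blacklist filtering), from the table above.
comparisons = {
    "Runx3_CD4DC_IP1_vs_Runx3_CD4DC_input1": (7530, 7381),
    "Runx3_CD4DC_IP2_vs_Runx3_CD4DC_input2": (11474, 11255),
}

# Peaks removed by blacklist filtering in each comparison.
removed = {name: called - kept for name, (called, kept) in comparisons.items()}

# Final number of peaks: sum of the filtered counts over all comparisons.
final_total = sum(kept for _, kept in comparisons.values())

for name, n_removed in removed.items():
    print(f"{name}: {n_removed} peaks removed")
print(f"Final number of peaks: {final_total:,}")
```

The filtered counts add up to 18,636, so the final figure is a sum over comparisons and not a count of merged, non-redundant regions.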

Link to the integrated annotated peaks file: Download table

Figure 6: Number of peaks for all comparisons

Figure 7: Peaks distribution in genomic regions

Figure 8: Peaks distribution around TSS

Figure 9: Overlap of peaks among the first 4 comparisons

Venn plot legend

comparisons	mark
Runx3_CD4DC_IP1_vs_Runx3_CD4DC_input1	1
Runx3_CD4DC_IP2_vs_Runx3_CD4DC_input2	2

Bioinformatics pipeline methods

Reads were trimmed using cutadapt (DOI: 10.14806/ej.17.1.200) with the parameters: -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC --times TIMES -q 20 -m 25.
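The effect of the adapter (-a) and minimum length (-m 25) parameters can be illustrated with a simplified sketch. It is not cutadapt's algorithm: it finds only exact, full-length adapter matches, whereas cutadapt also tolerates mismatches and partial adapters at the read end, and the quality trimming step (-q 20) is not shown. The function names and example reads are hypothetical.

```python
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # 3' adapter given in the report
MIN_LENGTH = 25  # corresponds to -m 25


def trim_adapter(read: str, adapter: str = ADAPTER) -> str:
    """Remove the first exact occurrence of the adapter and everything after it."""
    index = read.find(adapter)
    return read if index == -1 else read[:index]


def passes_length_filter(read: str, min_length: int = MIN_LENGTH) -> bool:
    """Keep only reads that are at least min_length bases long after trimming."""
    return len(read) >= min_length


# Hypothetical reads: a 32-base insert and a 20-base insert, each followed by adapter.
long_read = "ACGT" * 8 + ADAPTER + "AAAA"
short_read = "ACGT" * 5 + ADAPTER

for read in (long_read, short_read):
    trimmed = trim_adapter(read)
    print(len(trimmed), passes_length_filter(trimmed))
```

The first read keeps its 32-base insert and passes the filter; the second is trimmed to 20 bases and would be discarded.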

Reads were mapped to the mm10 genome using Bowtie2 (DOI: 10.1038/nmeth.1923).

Uniquely mapped reads were extracted using samtools with the parameters: -F 4, -f 0x2, -q 39.
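The three filters can be read as conditions on each alignment's SAM flag and mapping quality: -F 4 excludes unmapped reads, -f 0x2 requires the properly paired bit, and -q 39 requires a mapping quality of at least 39. The sketch below illustrates that logic on plain integers; the function name and example values are hypothetical, and in practice samtools applies these filters to the alignment file directly.

```python
UNMAPPED = 0x4      # SAM flag bit: read is unmapped
PROPER_PAIR = 0x2   # SAM flag bit: read is mapped in a proper pair
MIN_MAPQ = 39       # threshold given by -q 39


def keep_alignment(flag: int, mapq: int) -> bool:
    """Return True if an alignment passes -F 4, -f 0x2 and -q 39."""
    if flag & UNMAPPED:            # -F 4: drop alignments with the unmapped bit set
        return False
    if not flag & PROPER_PAIR:     # -f 0x2: require the proper-pair bit
        return False
    return mapq >= MIN_MAPQ        # -q 39: drop alignments with MAPQ below 39


# Hypothetical alignments as (flag, MAPQ).
print(keep_alignment(99, 42))  # proper pair, high MAPQ
print(keep_alignment(99, 10))  # proper pair, low MAPQ
print(keep_alignment(77, 0))   # unmapped read
print(keep_alignment(65, 42))  # paired but not a proper pair
```

Only the first alignment passes all three conditions. The high mapping quality threshold is what the report relies on to select uniquely mapped reads.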

The reads were graphically visualized using ngsplot with the parameters: -G -R genebody -C -O samples -D refseq -L 50000.

Significant ChIP regions (peaks) were called, and compared to control samples where present, using MACS2 callpeak (https://doi.org/10.1186/gb-2008-9-9-r137) with the parameters: --bw 300 -B --SPMR --keep-dup auto -q 0.05, plus -f BAMPE --nomodel for paired-end analyses or -f BAM for single-end analyses.
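The parameter list above appears to describe two variants of the same command, differing in the input format (-f) and in whether --nomodel is used. The sketch below assembles an argument list under that reading. It only builds and prints the command and does not run MACS2; the file names and the sample name are hypothetical, and -t, -c and -n are the standard callpeak options for treatment file, control file and output name.

```python
import shlex
from typing import List, Optional


def build_macs2_command(
    treatment_bam: str,
    control_bam: Optional[str],
    name: str,
    paired_end: bool,
) -> List[str]:
    """Assemble a MACS2 callpeak argument list from the report's parameters."""
    cmd = [
        "macs2", "callpeak",
        "-t", treatment_bam,
        "-n", name,
        "--bw", "300", "-B", "--SPMR",
        "--keep-dup", "auto",
        "-q", "0.05",
    ]
    if control_bam is not None:
        cmd += ["-c", control_bam]  # control is optional ("if present")
    if paired_end:
        cmd += ["-f", "BAMPE", "--nomodel"]  # paired-end variant (assumed reading)
    else:
        cmd += ["-f", "BAM"]  # single-end variant (assumed reading)
    return cmd


# Hypothetical file names, single-end case.
print(shlex.join(build_macs2_command("IP1.bam", "input1.bam", "IP1_vs_input1", False)))
```

The comparison table reports a model d length for each comparison, which is consistent with the single-end variant, where MACS2 builds its shifting model.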

Files containing the predicted peak coordinates in BedGraph format were converted to BigWig format using bedtools slop with the parameters: -g -b 0, followed by bedClip stdin and bedGraphToBigWig with default parameters.

The predicted peaks were annotated according to the mm10 genome using Homer with default parameters after merging all peaks from all samples together with bedtools multiinter.

The distribution of peaks in genomic regions and their proximity to transcription start sites (TSS) were examined using ChIPseeker (DOI: 10.18129/B9.bioc.ChIPseeker). The overlap of peaks for the first 4 comparisons, shown in the Venn diagram, was also analyzed using ChIPseeker.

The resulting peaks underwent filtering to exclude peaks from the blacklist.

The blacklist for the mm10 genome was taken from (https://github.com/Boyle-Lab/Blacklist/tree/master/lists).
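Blacklist filtering can be sketched as an interval overlap test. The report does not state the overlap criterion, so the sketch below assumes that a peak is excluded if it overlaps a blacklisted region by at least one base. The type and function names and the coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_blacklisted(peaks: List[Interval], blacklist: List[Interval]) -> List[Interval]:
    """Keep only peaks that do not overlap any blacklisted region."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]


# Hypothetical coordinates for illustration only.
peaks = [
    Interval("chr1", 1000, 1300),
    Interval("chr1", 5000, 5400),
    Interval("chr2", 1000, 1300),
]
blacklist = [Interval("chr1", 5200, 6000)]

print(filter_blacklisted(peaks, blacklist))
```

Only the second peak is dropped. The linear scan over the blacklist is adequate for a sketch; a real implementation would typically use sorted intervals or an existing tool.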

The pipeline was constructed using Snakemake (DOI: 10.1093/bioinformatics/bts480).

Links to results

Sequences from folder: /home/labs/bioservices/Collaboration/example_ChIP_Dicken_DC_Runx/Dicken_fastq

Output folder: /home/labs/bioservices/Collaboration/example_ChIP_Dicken_DC_Runx/20241119_044920_demo_ChIP-Seq

FastQC folder: /home/labs/bioservices/Collaboration/example_ChIP_Dicken_DC_Runx/20241119_044920_demo_ChIP-Seq/2_fastqc

MultiQC folder: /home/labs/bioservices/Collaboration/example_ChIP_Dicken_DC_Runx/20241119_044920_demo_ChIP-Seq/2_fastqc/multiQC

Report output folder: test_20241119_044920

Statistics regarding the number of reads for each sample for various steps of the pipeline can be downloaded from: here.

MACS peak calling statistics for each sample and comparison can be downloaded from: here.

Commands log can be downloaded from: here.

R packages versions can be found at: sessionInfo.txt

UTAP version 2.0

General information about the run can be found at: /home/labs/bioservices/services/UTAP-data/utap-meta-data/installation/UTAP-data-singularity/utap_update/reports/20241119_044920_demo_analysis_parameters.yaml

Treatment and control samples are found at: here.

Acknowledgments

Citing UTAP:

Kohen R, Barlev J, Hornung G, Stelzer G, Feldmesser E, Kogan K, Safran M, Leshkowitz D: UTAP: User-friendly Transcriptome Analysis Pipeline. BMC Bioinformatics 2019, 20(1):154.

Report ChIP-Seq Pipeline: 20241119_044920_demo

19-11-2024

Citation: Dicken J, Mildner A, Leshkowitz D, Touw IP, Hantisteanu S, Jung S, et al. (2013) Transcriptional Reprogramming of CD11b+Esamhi Dendritic Cell Identity and Function by Loss of Runx3. PLoS ONE 8(10): e77490. https://doi.org/10.1371/journal.pone.0077490
