Report ATAC-Seq Pipeline: 20241119_044729

Sequencing and mapping quality control (QC)
MACS peak calling
Bioinformatics pipeline methods
Links to results
Acknowledgments

Sequencing and mapping quality control (QC)

Figure 1: Average base quality at each read position, across all reads. A quality score of 30 or higher is considered good (predicted error rate of 1 in 1,000).
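The link between a quality score and its predicted error rate follows the standard Phred definition, P = 10^(-Q/10). The short sketch below illustrates that conversion; the function name is chosen for this example and is not part of the pipeline.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to the predicted base-call error probability."""
    return 10 ** (-q / 10)


# Show the predicted error rate for a few common quality thresholds.
for q in (20, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:.4g} (about 1 in {round(1 / p):,})")
```

A score of 30 therefore corresponds to an error probability of 0.001, matching the 1 in 1,000 figure quoted in the caption.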

Quality of read 1:

Download figure as table

Quality of read 2:

Download figure as table

Figure 2: Histogram showing the number of reads for each sample in the raw data. The sequencing depth statistics across all samples were as follows:

the median depth was 42,837,616 reads,

the mean depth was 44,940,902 reads,

the standard deviation was 7,617,043 reads.

Download figure as table

Figure 3: Percentage of reads discarded after trimming.

No figure is presented because the percentage of reads discarded after trimming is lower than 1% for all samples.

Download table

Figure 4: Histogram with the number of reads for each sample in each step of the pipeline.

Download figure as table

Figure 5: Coverage plot on Genebody

Mean read coverage (in counts per million mapped reads) across gene regions. The plot displays the mean coverage over all genes, from 2000 bases upstream of the transcription start site (TSS) to 2000 bases downstream of the transcription end site (TES).
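As an illustration of the counts-per-million unit used on the coverage axis, the sketch below scales a raw count by library size. The numbers are hypothetical and the function name is an assumption for this example; the report does not state exactly how the plotting tool computes its normalization.

```python
def counts_per_million(raw_count: float, total_mapped_reads: int) -> float:
    """Scale a raw read count to counts per million mapped reads (CPM)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return raw_count * 1_000_000 / total_mapped_reads


# Hypothetical example: 50 reads cover a position in a library of 10 million mapped reads.
print(counts_per_million(50, 10_000_000))
```

Scaling by library size makes coverage comparable between samples that were sequenced to different depths.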

Figure 6: Coverage plot on TSS

This plot displays the mean coverage over all genes, from 2000 bases upstream of the transcription start site (TSS) to 2000 bases downstream of it.

Figure 7: Insert-size histogram for each sample.

MACS peak calling

MACS results for each sample

Sample Type	Sample	Total Fragments
Treatment	Aire_C313Y_WT1	3920338
Treatment	Aire_C313Y_WT2	7939748
Treatment	Aire_C313Y_Het3	7005306
Treatment	Aire_C313Y_Het2	9707626

MACS results for each comparison

Comparison	Fragment size (bp)	MACS model d length	Total number of peaks	Total number of peaks after filtering blacklisted peaks
Aire_C313Y_WT1	33	33	138793	136902
Aire_C313Y_WT2	33	33	83949	82665
Aire_C313Y_Het3	33	33	245524	242566
Aire_C313Y_Het2	33	33	114515	112921

The final number of peaks, summed over all comparisons after blacklist filtering, is 575,054.
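The reported total can be cross-checked against the table above. The sketch below copies the per-sample counts from the table and sums the filtered column; that the total was derived this way is an inference from the numbers, not something the report states.

```python
# (peaks before blacklist filtering, peaks after filtering), copied from the table above.
peak_counts = {
    "Aire_C313Y_WT1": (138793, 136902),
    "Aire_C313Y_WT2": (83949, 82665),
    "Aire_C313Y_Het3": (245524, 242566),
    "Aire_C313Y_Het2": (114515, 112921),
}

total_before = sum(before for before, _ in peak_counts.values())
total_after = sum(after for _, after in peak_counts.values())

# Per-sample number and share of peaks removed by the blacklist filter.
for sample, (before, after) in peak_counts.items():
    removed = before - after
    print(f"{sample}: removed {removed} peaks ({100 * removed / before:.2f}%)")

print(f"Total after filtering: {total_after:,}")  # 575,054
```

The sum of the filtered column equals the reported 575,054, so the total counts each sample's peaks separately rather than counting merged, non-redundant regions.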

Link to the integrated annotated peaks file: Download table

Figure 8: Number of peaks for all samples

Figure 9: Peaks distribution in genomic regions

Figure 10: Peaks distribution around TSS

Figure 11: Overlap of peaks among the first 4 samples

Venn plot legend

samples	mark
Aire_C313Y_WT1	1
Aire_C313Y_WT2	2
Aire_C313Y_Het3	3
Aire_C313Y_Het2	4

Bioinformatics pipeline methods

Reads were trimmed using cutadapt (DOI: 10.14806/ej.17.1.200) with the parameters: -a CTGTCTCTTATACACATCTCCGAGCCCACGAGAC -A CTGTCTCTTATACACATCTGACGCTGCCGACGAGTGTAGATCTCGGTGGTCGCCGTATCATT --times TIMES -q 25 -m 30.

Reads were mapped to the mm10 genome using Bowtie2 (DOI: 10.1038/nmeth.1923).

Uniquely mapped reads were extracted using samtools with the parameters: -F 4, -f 0x2, -q 39.
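To make the three filter parameters concrete, the sketch below mimics their logic on a SAM flag and mapping quality, assuming they are applied with samtools view semantics: -f keeps only reads with all the given flag bits set (0x2, properly paired), -F drops reads with any of the given bits set (4, unmapped), and -q drops reads below the given mapping quality. The function and the example reads are hypothetical.

```python
FLAG_PROPER_PAIR = 0x2  # read mapped in a proper pair
FLAG_UNMAPPED = 0x4     # read itself is unmapped


def passes_filter(flag: int, mapq: int,
                  require: int = FLAG_PROPER_PAIR,
                  exclude: int = FLAG_UNMAPPED,
                  min_mapq: int = 39) -> bool:
    """Return True if an alignment would survive -f 0x2 -F 4 -q 39."""
    has_required = (flag & require) == require  # all required bits are set
    has_excluded = (flag & exclude) != 0        # any excluded bit is set
    return has_required and not has_excluded and mapq >= min_mapq


# Hypothetical alignments as (SAM flag, MAPQ).
examples = [(99, 42), (99, 30), (65, 42), (77, 0)]
for flag, mapq in examples:
    print(flag, mapq, passes_filter(flag, mapq))
```

In this sketch only the first example is kept: flag 99 is a properly paired, mapped read and its mapping quality of 42 meets the threshold of 39.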

The reads were graphically visualized using ngsplot with the parameters: -G -R genebody -C -O samples -D refseq -L 50000.

Following alignment, mitochondrial genes were removed from the analysis, and duplicated reads were removed using picard-tools.

Nucleosome-free fragments shorter than 120 bp were selected from the remaining unique reads, and broad peaks were called using MACS2 callpeak (https://doi.org/10.1186/gb-2008-9-9-r137) with the parameters: --bw 120 -B -f BAMPE --SPMR --shift -50 --extsize 100 --keep-dup all -q 0.05.
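The fragment-length selection can be pictured as a filter on the template length of each read pair. The sketch below is a minimal illustration under that assumption; the report does not say which tool performs the selection, and the function name and example lengths are hypothetical.

```python
def is_nucleosome_free(template_length: int, max_length: int = 120) -> bool:
    """Return True for fragments shorter than max_length bp.

    SAM template lengths are signed (negative for the reverse mate),
    so the absolute value is compared. A length of 0 means unknown.
    """
    return 0 < abs(template_length) < max_length


# Hypothetical template lengths (TLEN) for six read pairs.
template_lengths = [45, -98, 119, 120, 180, -250]
kept = [t for t in template_lengths if is_nucleosome_free(t)]
print(kept)  # [45, -98, 119]
```

Fragments of 120 bp or longer are excluded here because the cutoff is strict.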

If selected, a TSS file is used for the mouse genome. It contains either a broad or a narrow definition of each gene's TSS (transcription start site) region, based on: The landscape of accessible chromatin in mammalian preimplantation embryos. Nature. 2016 Jun 30;534(7609):652-7.

The predicted peaks were annotated according to the mm10 genome using Homer with default parameters after merging all peaks from all samples together with bedtools multiinter.

The distribution of peaks in genomic regions and their proximity to TSS (transcription start sites) were examined using ChIPseeker (DOI: 10.18129/B9.bioc.ChIPseeker). The overlap of peaks for the first 4 samples, shown in the Venn diagram, was also analyzed using ChIPseeker.

The resulting peaks were filtered to exclude peaks that fall in blacklisted regions.

The blacklist for the mm10 genome was taken from https://github.com/Boyle-Lab/Blacklist/tree/master/lists.
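Blacklist filtering amounts to discarding every peak that overlaps a blacklisted interval. The sketch below shows one way to express this, assuming BED-style half-open coordinates and treating any overlap of at least 1 bp as a hit. The report does not state which tool or overlap criterion the pipeline uses, and all coordinates below are made up.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open as in BED


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_blacklisted(peaks: List[Interval], blacklist: List[Interval]) -> List[Interval]:
    """Keep only the peaks that overlap no blacklisted region."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]


# Hypothetical peaks and blacklist regions.
peaks = [("chr1", 100, 300), ("chr1", 5000, 5200), ("chr2", 100, 300)]
blacklist = [("chr1", 250, 1000)]
print(filter_blacklisted(peaks, blacklist))
```

This quadratic scan is adequate for an illustration; for hundreds of thousands of peaks, a sorted sweep or an interval tree would be the usual choice.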

The pipeline was constructed using Snakemake (DOI: 10.1093/bioinformatics/bts480).

Links to results

Sequences from folder: /home/labs/bioservices/Collaboration/example_and_data_for_testing_ATAC-seq_mm10/demo_AIRE/fastq

Output folder: /home/labs/bioservices/Collaboration/example_and_data_for_testing_ATAC-seq_mm10/demo_AIRE/20241119_044729_demo_ATAC-Seq

FastQC folder: /home/labs/bioservices/Collaboration/example_and_data_for_testing_ATAC-seq_mm10/demo_AIRE/20241119_044729_demo_ATAC-Seq/2_fastqc

MultiQC folder: /home/labs/bioservices/Collaboration/example_and_data_for_testing_ATAC-seq_mm10/demo_AIRE/20241119_044729_demo_ATAC-Seq/2_fastqc/multiQC

Report output folder: data_for_demo_production_20241119_044729

Statistics regarding the number of reads for each sample for various steps of the pipeline can be downloaded from: here.

MACS peak calling statistics for each sample and comparison can be downloaded from: here.

Commands log can be downloaded from: here.

R packages versions can be found at: sessionInfo.txt

UTAP version 2.0

General information about the run can be found at: /home/labs/bioservices/services/UTAP-data/utap-meta-data/installation/UTAP-data-singularity/utap_update/reports/20241119_044729_demo_analysis_parameters.yaml

Treatment and control samples are found at: here.

Acknowledgments

Citing UTAP:

Kohen R, Barlev J, Hornung G, Stelzer G, Feldmesser E, Kogan K, Safran M, Leshkowitz D: UTAP: User-friendly Transcriptome Analysis Pipeline. BMC Bioinformatics 2019, 20(1):154.

Report ATAC-Seq Pipeline: 20241119_044729_demo

19-11-2024
