Figure 1: Plots the average quality of each base across all reads. Quality of 30 and up is good (predicted error rate 1:1000).
Quality of read 1:
Quality of read 2:
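The quality threshold quoted for Figure 1 follows from the standard Phred definition, Q = -10 · log10(p), where p is the predicted per-base error probability. The short sketch below only illustrates that conversion; the function names are chosen for this example and are not part of the pipeline.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to the predicted per-base error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a per-base error probability back to a Phred quality score."""
    return -10 * math.log10(p)


# Print the predicted error rate for a few commonly quoted quality scores.
for q in (20, 25, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:g} (about 1 in {round(1 / p)})")
```

A score of 30 corresponds to an error probability of 0.001, which is the 1:1000 rate stated in the caption.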
Figure 2: Histogram showing the number of reads for each sample in raw data. The sequencing depth statistics for all samples were as follows:
the median depth was 42,837,616 reads,
the mean depth was 44,940,902 reads,
the standard deviation was 7,617,043 reads.
Figure 3: Percentage of reads discarded after trimming.
No figure presented since the percentage of reads discarded after trimming for all samples is lower than 1%.
Figure 4: Histogram with the number of reads for each sample in each step of the pipeline.
Figure 5: Coverage plot on Genebody
Plot of the mean read coverage (counts per million mapped reads) over gene regions. The plot displays the mean coverage across all genes, from 2,000 bases upstream of the transcription start site (TSS) to 2,000 bases downstream of the transcription end site (TES).
Figure 6: Coverage plot on TSS
This plot displays the mean coverage across all genes, from 2,000 bases upstream of the transcription start site (TSS) to 2,000 bases downstream of it.
Figure 7: This plot displays the insert-size histogram for each sample.
MACS results for each sample
Sample Type | Sample | Total Fragments |
---|---|---|
Treatment | Aire_C313Y_WT1 | 3920338 |
Treatment | Aire_C313Y_WT2 | 7939748 |
Treatment | Aire_C313Y_Het3 | 7005306 |
Treatment | Aire_C313Y_Het2 | 9707626 |
MACS results for each comparison
Comparison | Fragment size (bp) | MACS model d length | Total number of peaks | Total number of peaks after removing blacklisted peaks |
---|---|---|---|---|
Aire_C313Y_WT1 | 33 | 33 | 138793 | 136902 |
Aire_C313Y_WT2 | 33 | 33 | 83949 | 82665 |
Aire_C313Y_Het3 | 33 | 33 | 245524 | 242566 |
Aire_C313Y_Het2 | 33 | 33 | 114515 | 112921 |
The final number of peaks for all comparisons is: 575,054
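The total above equals the sum of the per-sample peak counts that remain after blacklist filtering. The sketch below reproduces that arithmetic from the values in the "MACS results for each comparison" table and also reports how many peaks the blacklist removed per sample; the variable names are illustrative only.

```python
# Peak counts copied from the "MACS results for each comparison" table:
# sample -> (total peaks, peaks remaining after blacklist filtering)
peak_counts = {
    "Aire_C313Y_WT1": (138793, 136902),
    "Aire_C313Y_WT2": (83949, 82665),
    "Aire_C313Y_Het3": (245524, 242566),
    "Aire_C313Y_Het2": (114515, 112921),
}

# Sum the post-filtering counts across all samples.
total_filtered = sum(after for _, after in peak_counts.values())
print(f"Final number of peaks: {total_filtered:,}")  # Final number of peaks: 575,054

# Report how many peaks were removed by the blacklist for each sample.
for sample, (before, after) in peak_counts.items():
    removed = before - after
    print(f"{sample}: {removed} peaks removed ({removed / before:.2%} of {before})")
```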
Integrated annotated peaks file: Download table
Figure 8: Number of peaks for all samples
Figure 9: Peaks distribution in genomic regions
Figure 10: Peaks distribution around TSS
Figure 11: Overlap of peaks among the first 4 samples
Venn plot legend
samples | mark |
---|---|
Aire_C313Y_WT1 | 1 |
Aire_C313Y_WT2 | 2 |
Aire_C313Y_Het3 | 3 |
Aire_C313Y_Het2 | 4 |
Reads were trimmed using cutadapt (DOI: 10.14806/ej.17.1.200) with the parameters: -a CTGTCTCTTATACACATCTCCGAGCCCACGAGAC -A CTGTCTCTTATACACATCTGACGCTGCCGACGAGTGTAGATCTCGGTGGTCGCCGTATCATT --times TIMES -q 25 -m 30.
Reads were mapped to mm10 genome using Bowtie2 (DOI: 10.1038/nmeth.1923).
Uniquely mapped reads were extracted using samtools with the parameters: -F 4, -f 0x2, -q 39.
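To make the meaning of these samtools options concrete, the sketch below applies the same three tests to a single alignment record: `-F 4` drops reads whose unmapped bit is set, `-f 0x2` keeps only reads flagged as mapped in a proper pair, and `-q 39` drops reads with a mapping quality below 39. It is a plain-Python illustration of the filter logic, not the pipeline's code; the function name and the example flag values are assumptions made for this example.

```python
UNMAPPED = 0x4      # SAM flag bit: the read itself is unmapped
PROPER_PAIR = 0x2   # SAM flag bit: the read is mapped in a proper pair


def passes_filter(flag: int, mapq: int,
                  exclude_bits: int = UNMAPPED,
                  require_bits: int = PROPER_PAIR,
                  min_mapq: int = 39) -> bool:
    """Mimic `samtools view -F exclude_bits -f require_bits -q min_mapq` for one record."""
    if flag & exclude_bits:                      # -F: reject if ANY excluded bit is set
        return False
    if (flag & require_bits) != require_bits:    # -f: reject unless ALL required bits are set
        return False
    return mapq >= min_mapq                      # -q: reject if MAPQ is below the threshold


# Hypothetical records: (SAM flag, MAPQ)
examples = {
    "proper pair, high MAPQ": (99, 42),   # 99 = paired + proper pair + mate reverse + first in pair
    "proper pair, low MAPQ": (99, 30),
    "unmapped read": (77, 0),             # 77 = paired + unmapped + mate unmapped + first in pair
    "paired but not proper": (65, 42),    # 65 = paired + first in pair
}
for label, (flag, mapq) in examples.items():
    print(f"{label}: {'kept' if passes_filter(flag, mapq) else 'discarded'}")
```

Only the first hypothetical record is kept; the other three each fail one of the three tests.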
The reads were graphically visualized using ngsplot with the parameters: -G -R genebody -C -O samples -D refseq -L 50000.
Following alignment, mitochondrial genes were removed from the analysis, and duplicate reads were removed using picard-tools.
Nucleosome-free fragments shorter than 120 bp were selected from the remaining unique reads, and broad peaks were called using MACS2 callpeak (https://doi.org/10.1186/gb-2008-9-9-r137) with the parameters: --bw 120 -B -f BAMPE --SPMR --shift -50 --extsize 100 --keep-dup all -q 0.05.
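The size selection described above can be pictured as a test on each read pair's fragment length. The sketch below assumes the fragment length is taken from the absolute value of the SAM template length field (TLEN); how the pipeline actually performs this selection is not stated in the report, so the function name, the use of TLEN, and the example values are all assumptions.

```python
def is_nucleosome_free(template_length: int, max_length: int = 120) -> bool:
    """Return True if a fragment is shorter than max_length bp.

    template_length is the signed SAM TLEN value; 0 means the length is unavailable.
    """
    fragment_length = abs(template_length)
    return 0 < fragment_length < max_length


# Hypothetical TLEN values for a handful of read pairs.
template_lengths = [85, -85, 119, 120, 196, -340, 0]
kept = [t for t in template_lengths if is_nucleosome_free(t)]
print(kept)  # [85, -85, 119]
```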
If selected, a TSS file is used for the mouse genome, containing either a broad or a narrow definition of each gene's transcription start site (TSS) region (based on "The landscape of accessible chromatin in mammalian preimplantation embryos", Nature. 2016 Jun 30;534(7609):652-7).
After the peaks from all samples were merged with bedtools multiinter, the predicted peaks were annotated according to the mm10 genome using Homer with default parameters.
The distribution of peaks in genomic regions and their proximity to transcription start sites (TSS) were examined using ChIPseeker (DOI: 10.18129/B9.bioc.ChIPseeker). The overlap of peaks for the first 4 samples, shown in the Venn diagram, was also analyzed using ChIPseeker.
The resulting peaks were filtered to exclude peaks that fall in blacklisted regions.
The blacklist for the mm10 genome was taken from https://github.com/Boyle-Lab/Blacklist/tree/master/lists.
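Blacklist filtering of this kind amounts to discarding every peak that overlaps a blacklisted interval. The sketch below shows one simple way to do that with BED-style half-open intervals (chromosome, start, end). It assumes that any overlap, however small, removes the peak; the report does not state the exact criterion or tool the pipeline uses, and the function name and coordinates are hypothetical.

```python
from collections import defaultdict


def filter_blacklisted(peaks, blacklist):
    """Return the peaks that do not overlap any blacklisted interval.

    Both inputs are lists of (chrom, start, end) tuples using BED-style
    half-open coordinates, so intervals that merely touch do not overlap.
    """
    # Group blacklist intervals by chromosome for quicker lookup.
    blacklist_by_chrom = defaultdict(list)
    for chrom, start, end in blacklist:
        blacklist_by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in peaks:
        overlaps = any(
            start < b_end and b_start < end
            for b_start, b_end in blacklist_by_chrom.get(chrom, ())
        )
        if not overlaps:
            kept.append((chrom, start, end))
    return kept


# Hypothetical peaks and blacklist regions, for illustration only.
example_peaks = [("chr1", 100, 200), ("chr1", 300, 400), ("chr1", 500, 600), ("chr2", 100, 200)]
example_blacklist = [("chr1", 150, 300)]
print(filter_blacklisted(example_peaks, example_blacklist))
```

The linear scan per peak keeps the example short; for genome-scale peak sets, a sorted-interval or interval-tree approach (or a dedicated tool such as bedtools) would be the more practical choice.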
The pipeline was constructed using Snakemake (DOI: 10.1093/bioinformatics/bts480).
Sequences from folder: /home/labs/bioservices/Collaboration/example_and_data_for_testing_ATAC-seq_mm10/demo_AIRE/fastq
Output folder: /home/labs/bioservices/Collaboration/example_and_data_for_testing_ATAC-seq_mm10/demo_AIRE/20241119_044729_demo_ATAC-Seq
FastQC folder: /home/labs/bioservices/Collaboration/example_and_data_for_testing_ATAC-seq_mm10/demo_AIRE/20241119_044729_demo_ATAC-Seq/2_fastqc
MultiQC folder: /home/labs/bioservices/Collaboration/example_and_data_for_testing_ATAC-seq_mm10/demo_AIRE/20241119_044729_demo_ATAC-Seq/2_fastqc/multiQC
Report output folder: data_for_demo_production_20241119_044729
Statistics regarding the number of reads for each sample for various steps of the pipeline can be downloaded from: here.
MACS peak-calling statistics for each sample and comparison can be downloaded from: here.
Commands log can be downloaded from: here.
R packages versions can be found at: sessionInfo.txt
UTAP version 2.0
General information about the run can be found at: /home/labs/bioservices/services/UTAP-data/utap-meta-data/installation/UTAP-data-singularity/utap_update/reports/20241119_044729_demo_analysis_parameters.yaml
Treatment and control samples are found at: here.
Citing UTAP:
Kohen R, Barlev J, Hornung G, Stelzer G, Feldmesser E, Kogan K, Safran M, Leshkowitz D: UTAP: User-friendly Transcriptome Analysis Pipeline. BMC Bioinformatics 2019, 20(1):154.