Published on Thu Sep 30 2021

PEPATAC: An optimized pipeline for ATAC-seq data analysis with serial alignments

Smith, J. P., Corces, M. R., Xu, J., Reuter, V. P., Chang, H. Y., Sheffield, N. C.

PEPATAC is an ATAC-seq pipeline that is easily applied to projects of any size. It is restartable, fault-tolerant, and can be run on local hardware. Downstream analysis is simplified by a standard definition format.

Abstract

Motivation: As chromatin accessibility data from ATAC-seq experiments continues to expand, there is a continuing need for standardized analysis pipelines. Here, we present PEPATAC, an ATAC-seq pipeline that is easily applied to ATAC-seq projects of any size, from one-off experiments to large-scale sequencing projects.

Results: PEPATAC leverages unique features of ATAC-seq data to optimize for speed and accuracy, and it provides several unique analytical approaches. Output includes convenient quality control plots, summary statistics, and a variety of generally useful data formats to set the groundwork for subsequent project-specific data analysis. Downstream analysis is simplified by a standard definition format, modularity of components, and metadata APIs in R and Python. PEPATAC is restartable and fault-tolerant, and it can be run on local hardware, using any cluster resource manager, or in provided Linux containers. We also demonstrate the advantage of aligning to the mitochondrial genome serially, which improves the accuracy of alignment statistics and quality control metrics. PEPATAC is a robust and portable first step for any ATAC-seq project.

Availability: BSD2-licensed code and documentation at https://pepatac.databio.org.