Single-cell RNA sequencing (scRNA-seq) is a powerful gene expression profiling technique. Existing scRNA-seq methods suffer from sub-optimal target recovery. The resulting zero-inflated data may confound data interpretation.
Motivation: Single-cell RNA sequencing (scRNA-seq) is a powerful gene expression profiling technique that is presently revolutionizing the study of complex cellular systems in the biological sciences. Existing scRNA-seq methods suffer from sub-optimal target recovery, leading to inaccurate measurements that include many false negatives. The resulting zero-inflated data may confound data interpretation and visualization.

Results: Since cells have coherent phenotypes defined by conserved molecular circuitries (i.e. multiple gene products working together) and since similar cells utilize similar circuits, information about each expression value, or node, in a multi-cell, multi-gene scRNA-seq data set is expected to be predictable from other nodes in the data set. Based on this logic, several approaches have been proposed to impute missing values by extracting information from non-zero measurements in a data set. In this study, we applied non-negative matrix factorization approaches to a selection of published scRNA-seq data sets to recommend new values where original measurements are likely to be inaccurate and where zero measurements are predicted to be false negatives. The resulting imputed data model predicts novel cell type markers and expression patterns that more closely match gene expression values from orthogonal measurements and/or predicted literature than the values obtained from other previously published imputation approaches.

Availability and implementation: FIESTA is written in R and is available at https://github.com/elnazmirzaei/FIESTA and https://github.com/TheSpikeLab/FIESTA.
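
The imputation idea in the Results paragraph, predicting an unreliable entry from the low-rank structure shared across genes and cells, can be illustrated with a minimal sketch. This is not the FIESTA implementation (which is written in R and available at the repositories above); it is a generic masked non-negative matrix factorization with multiplicative updates, written in Python with NumPy. The function name `nmf_impute`, its parameters, and the choice to treat every zero as a candidate false negative are assumptions made only for illustration.

```python
import numpy as np


def nmf_impute(X, rank=2, n_iter=500, eps=1e-9, seed=0):
    """Illustrative masked NMF (hypothetical helper, not FIESTA's API).

    Fits W @ H to the non-zero entries of a genes x cells matrix X and
    uses the reconstruction to fill in the zero entries.
    Assumption: every zero is treated as a possibly missing value.
    """
    X = np.asarray(X, dtype=float)
    mask = (X > 0).astype(float)  # 1 where a value was observed, 0 otherwise
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape

    # Strictly positive random initialisation keeps the updates well defined.
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_cells)) + eps

    for _ in range(n_iter):
        # Multiplicative updates for the masked squared-error objective;
        # zero entries of X do not contribute to either update.
        WH = W @ H
        H *= (W.T @ (mask * X)) / (W.T @ (mask * WH) + eps)
        WH = W @ H
        W *= ((mask * X) @ H.T) / ((mask * WH) @ H.T + eps)

    X_hat = W @ H
    # Keep observed values as measured; replace zeros with the model's estimate.
    imputed = np.where(mask > 0, X, X_hat)
    return imputed, W, H


if __name__ == "__main__":
    # Toy rank-1 "expression" matrix with two artificial dropouts.
    truth = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
    observed = truth.copy()
    observed[0, 1] = 0.0
    observed[2, 2] = 0.0
    imputed, _, _ = nmf_impute(observed, rank=1)
    print(np.round(imputed, 2))
```

In practice, deciding which zeros are false negatives rather than true absence of expression, and choosing the factorization rank, are the substantive modelling decisions; the sketch sidesteps both by masking all zeros and taking the rank as an argument.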