PUBLICATION

Normalization of RNA-seq data using factor analysis of control genes or samples

Authors
Risso, D., Ngai, J., Speed, T.P., Dudoit, S.
ID
ZDB-PUB-190806-3
Date
2014
Source
Nature Biotechnology 32: 896-902 (Journal)
Registered Authors
Ngai, John
Keywords
none
Datasets
GEO:GSE53334
MeSH Terms
  • Action Potentials
  • Factor Analysis, Statistical*
  • Sequence Analysis, RNA*
PubMed
25150836
Abstract
Normalization of RNA-sequencing (RNA-seq) data has proven essential to ensure accurate inference of expression levels. Here, we show that usual normalization approaches mostly account for sequencing depth and fail to correct for library preparation and other more complex unwanted technical effects. We evaluate the performance of the External RNA Control Consortium (ERCC) spike-in controls and investigate the possibility of using them directly for normalization. We show that the spike-ins are not reliable enough to be used in standard global-scaling or regression-based normalization procedures. We propose a normalization strategy, called remove unwanted variation (RUV), that adjusts for nuisance technical effects by performing factor analysis on suitable sets of control genes (e.g., ERCC spike-ins) or samples (e.g., replicate libraries). Our approach leads to more accurate estimates of expression fold-changes and tests of differential expression compared to state-of-the-art normalization methods. In particular, RUV promises to be valuable for large collaborative projects involving multiple laboratories, technicians, and/or sequencing platforms.
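The control-gene variant of the factor-analysis idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a samples-by-genes count matrix, a log transform with a pseudocount, and a singular value decomposition of the centered control genes as the factor-analysis step. The names `ruv_control_genes`, `control_idx`, `k`, and `eps` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ruv_control_genes(counts, control_idx, k=1, eps=1.0):
    """Sketch of control-gene-based removal of unwanted variation.

    counts      : (n_samples, n_genes) array of non-negative read counts
    control_idx : indices of genes assumed unaffected by the biology of
                  interest (for example, spike-ins); an assumption of this sketch
    k           : number of unwanted factors to estimate
    eps         : pseudocount added before the log transform

    Returns the adjusted log-counts and the estimated factors W.
    """
    # Work on the log scale; the pseudocount avoids log(0).
    Y = np.log(np.asarray(counts, dtype=float) + eps)

    # Center each gene across samples before the decomposition.
    Y_centered = Y - Y.mean(axis=0, keepdims=True)

    # Variation among control genes is assumed to be purely technical,
    # so its leading left singular vectors estimate the unwanted factors.
    U, _, _ = np.linalg.svd(Y_centered[:, control_idx], full_matrices=False)
    W = U[:, :k]

    # Regress every gene on W by least squares and subtract the fitted part.
    alpha, *_ = np.linalg.lstsq(W, Y, rcond=None)
    corrected = Y - W @ alpha
    return corrected, W
```

In practice the estimated factors `W` would typically be included as covariates in a differential expression model rather than only subtracted from the data, and the choice of `k` and of the control set both require care; the abstract itself reports that spike-ins are not reliable enough for standard global-scaling or regression-based normalization.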