PUBLICATION

Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets

Authors
Shainer, I., Stemmer, M.
ID
ZDB-PUB-210916-2
Date
2021
Source
BMC Genomics 22: 661 (Journal)
Registered Authors
Shainer, Inbal, Stemmer, Manuel
Keywords
10X genomics, Alignment, Cell Ranger, Kallisto, Opsin, Pineal gland, Single-cell RNA sequencing, Zebrafish
MeSH Terms
  • Animals
  • Cluster Analysis
  • Gene Expression Profiling
  • Mice
  • RNA, Small Cytoplasmic*
  • Sequence Analysis, RNA
  • Single-Cell Analysis*
  • Software
  • Zebrafish/genetics
PubMed
34521337
Abstract
Single-cell RNA sequencing (scRNA-seq) has quickly become one of the dominant techniques in modern transcriptome assessment. In particular, 10X Genomics' Chromium system, with its high-throughput, turnkey approach and thorough user guide, has made this cutting-edge technique accessible to many laboratories using diverse animal models. However, standard pre-processing, including the alignment and cell filtering pipelines, might not be ideal for every organism or tissue. Here we applied an alternative strategy, based on the pseudoaligner kallisto, to twenty-two publicly available single-cell sequencing datasets from a wide range of tissues of eight organisms and compared the results with those of the standard 10X Genomics Cell Ranger pipeline.
In most of the tested samples, kallisto produced higher sequencing read alignment rates and total gene detection rates than Cell Ranger. Although datasets processed with Cell Ranger had higher cell counts, outside of the human and mouse datasets these additional cells were routinely of low quality, with low gene detection rates. Thorough downstream analysis of one kallisto-processed dataset, obtained from the zebrafish pineal gland, revealed clearer clustering, allowing the identification of an additional photoreceptor cell type that had previously gone undetected. The finding of the new cluster suggests that the photoreceptive pineal gland is essentially a bichromatic tissue containing both green and red cone-like photoreceptors, and implies that the alignment and pre-processing pipeline can affect the discovery of biologically relevant cell types.
While Cell Ranger favors higher cell numbers, kallisto yields datasets with a higher median number of genes detected per cell. We demonstrated that cell type identification was not hampered by the lower cell count but was in fact improved, owing to the higher gene detection rate and the more stringent filtering. Depending on the acquired dataset, it can be beneficial to favor high-quality cells and accept a lower cell count, leading to improved classification of cell types.
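The trade-off described in the abstract (fewer cells, but more genes detected per cell, under stricter filtering) can be illustrated with a minimal sketch. It assumes a plain cells-by-genes UMI count matrix and a hypothetical minimum-genes-per-cell threshold, `min_genes`. This is not the filtering actually performed by Cell Ranger or by the kallisto-based workflow in the study, and the toy counts are invented for illustration, not data from the paper.

```python
from statistics import median


def genes_detected(counts):
    """Return, for each cell (row), the number of genes with at least one count."""
    return [sum(1 for c in row if c > 0) for row in counts]


def summarize(counts, min_genes):
    """Keep cells detecting >= min_genes genes (hypothetical threshold).

    Returns (number of retained cells, median genes detected per retained cell).
    """
    detected = genes_detected(counts)
    kept = [d for d in detected if d >= min_genes]
    return len(kept), (median(kept) if kept else 0)


# Toy matrix: 6 cells x 5 genes (illustrative values only).
toy = [
    [3, 1, 0, 2, 5],  # 4 genes detected
    [0, 0, 1, 0, 0],  # 1 gene detected
    [2, 2, 1, 1, 1],  # 5 genes detected
    [0, 1, 0, 0, 0],  # 1 gene detected
    [1, 0, 4, 2, 0],  # 3 genes detected
    [0, 0, 0, 1, 1],  # 2 genes detected
]

for threshold in (1, 3):
    n_cells, med = summarize(toy, threshold)
    print(f"min_genes={threshold}: {n_cells} cells, median {med} genes/cell")
# min_genes=1: 6 cells, median 2.5 genes/cell
# min_genes=3: 3 cells, median 4 genes/cell
```

Raising the threshold halves the cell count in this toy example but raises the median genes per cell, mirroring in miniature the cell-count versus gene-detection trade-off the abstract describes.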