PUBLICATION

Incorporating RNA-seq data into the Zebrafish Ensembl Gene Build

Authors

Collins, J.E., White, S., Searle, S.M., and Stemple, D.L.

ID

ZDB-PUB-120718-36

Date

2012

Source

Genome Research 22(10): 2067-2078 (Journal)

Registered Authors

Stemple, Derek L.

Keywords

none

MeSH Terms

3' Untranslated Regions
Databases, Nucleic Acid*
Introns
Exons
Molecular Sequence Annotation*
Genomics/methods
Zebrafish/genetics*
Male
Computational Biology/methods*
Animals
RNA/chemistry*
RNA/genetics
Transcription, Genetic
DNA, Complementary
Models, Genetic

PubMed

22798491

Abstract

Ensembl gene annotation provides a comprehensive catalogue of transcripts aligned to the reference sequence. It relies on publicly available species-specific and orthologous transcripts plus their inferred protein sequence. The accuracy of gene models is improved by increasing the species-specific component, which can be cost-effectively achieved using RNA-Seq. Two zebrafish gene annotations are presented in Ensembl version 62 built on the Zv9 reference sequence. Firstly, RNA-Seq data from five tissues and seven developmental stages were assembled into 25,748 gene models. A 3' end capture and sequencing protocol was developed to predict the 3' ends of transcripts, and 46.1% of the original models were subsequently refined. Secondly, a standard Ensembl gene build, incorporating carefully filtered elements from the RNA-Seq-only build, followed by a merge with the manually curated VEGA database, produced a comprehensive annotation of 26,152 genes represented by 51,569 transcripts. The RNA-Seq-only and the Ensembl/VEGA gene builds contribute contrasting elements to the final gene build. The RNA-Seq gene build was used to adjust intron/exon boundaries of orthologous defined models, confirm their expression and improve 3' untranslated regions. Importantly, the inferred protein alignments within the Ensembl gene build conferred proof of model contiguity for the RNA-Seq models. The zebrafish gene annotation has been enhanced by the incorporation of RNA-Seq data, and the pipeline will be used for other organisms. Organisms with little species-specific cDNA data will generally benefit the most.

