Abstract
RNA-binding proteins are key regulators of gene expression, yet only a small fraction have been functionally characterized. Here we report a systematic analysis of the RNA motifs recognized by RNA-binding proteins, encompassing 205 distinct genes from 24 diverse eukaryotes. The sequence specificities of RNA-binding proteins display deep evolutionary conservation, and the recognition preferences for a large fraction of metazoan RNA-binding proteins can thus be inferred from their RNA-binding domain sequence. The motifs that we identify in vitro correlate well with in vivo RNA-binding data. Moreover, we can associate them with distinct functional roles in diverse types of post-transcriptional regulation, enabling new insights into the functions of RNA-binding proteins both in normal physiology and in human disease. These data provide an unprecedented overview of RNA-binding proteins and their targets, and constitute an invaluable resource for determining post-transcriptional regulatory mechanisms in eukaryotes.
Accession codes
Data deposits
Raw and processed microarray data are available at GEO (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE41235. The derived motifs and results of analyses are available at http://hugheslab.ccbr.utoronto.ca/supplementary-data/RNAcompete_eukarya/.
Acknowledgements
We thank H. van Bakel for computational support, A. Ramani and J. Calarco for discussions, Y. Wu, G. Rasanathan, M. Krishnamoorthy, O. Boright, A. Janska, J. Li, S. Talukder, A. Cote and S. Votruba for technical assistance, L. Sutherland for purchasing RBM5 protein and for feedback on the manuscript, S. Jain for software modified to create Fig. 2, and N. Barbosa-Morais for generating cRPKM values from autism RNA-seq data. We thank M. Kiledjian (PCBP1 and PCBP2), J. Stevenin (SRSF2 and SFRS7), S. Richard (QKI), M. Gorospe (TIA1), B. Chabot (SRSF9), A. Berglund (MBNL1), F. Pagani (DAZAP1), A. Bindereif (HNRNPL), M. Freeman (HNRNPK), E. Miska (LIN28A), K. Kohno (YBX1), M. Garcia-Blanco (PTBP1), R. Wharton (PUM-HD), C. Smibert (Vts1p) and M. Blanchette (Hrb27C, Hrb87F and Hrb98DE) for sending published constructs. This work was supported by funding from NIH (1R01HG00570 to T.R.H. and Q.D.M., R01GM084034 to K.W.L.), CIHR (MOP-49451 to T.R.H., MOP-93671 to Q.D.M., MOP-125894 to Q.D.M. and T.R.H., MOP-67011 to B.J.B., and MOP-14409 to H.D.L.), and the Intramural Program of the NIDDK (DK015602-05 to E.P.L.). K.B.C. and S.G. hold NSERC Alexander Graham Bell Canada Graduate Scholarships. M.T.W. was funded by fellowships from CIHR and CIFAR. H.S.N. holds a Charles H. Best Fellowship and was funded partially by awards from CIFAR to T.R.H. and B.J.F. M.I. is the recipient of an HFSP LT Fellowship.
Author information
Contributions
D.R., H.K., K.B.C., M.T.W. and H.S.N. made unique, essential and extensive contributions to the manuscript, and are ordered by amount of time and effort contributed. D.R. and H.K. developed most of the laboratory and computational components of RNAcompete, respectively. D.R., H.Z., A.Y., H.N., L.H.M., S.A.S., C.A.Y., S.M.K., B.N., D.M., W.L., R.S.L. and M.Q. cloned, expressed and purified the proteins. D.R. ran the RNAcompete assays, including data extraction. H.K. and K.B.C. processed the data, H.K. and K.B.C. generated motifs, and H.K., K.B.C., M.T.W. and H.S.N. performed the motif analyses. H.K. assembled the in vivo protein-RNA data sets. L.H.M. and R.K.D. performed and analysed RIP-seq data. K.B.C. developed the supplementary website and Figs 1 and 2 with assistance from H.K. and M.T.W. M.T.W. and M.A. created the cisBP-RNA database. M.T.W., H.S.N. and T.R.H. created Fig. 3. H.S.N. performed the analyses of human splicing, RNA stability data and human sequence conservation, and created Figs 4 and 5. M.I. and S.G. generated and analysed RNA-seq data and S.G. performed reporter-based RNA stability assays. X.L. performed Drosophila data analysis. H.D.L., F.P., A.H.C., R.P.C., B.J.F., R.A.A., K.W.L., L.O.F.P., E.P.L., B.J.B. and A.G.F. helped organize and support the project, and provided feedback on the manuscript. B.J.F., B.J.B. and A.G.F. provided critical advice and commentary on data analysis. Q.D.M. and T.R.H. conceived of the study, supervised the project and wrote the manuscript with contributions from D.R., H.K., K.B.C., B.J.B., A.F. and H.S.N.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Information
This file contains Supplementary Methods, Supplementary Figures 1-6, Supplementary Tables 1-4 and additional references. (PDF 2569 kb)
Supplementary Data 1
This file shows RNA-binding proteins with known consensus motifs. It contains panels for human and Drosophila listing RBPs with known consensus motifs as well as the Pubmed ID of the publication that defined the motif. (XLSX 27 kb)
Supplementary Data 2
The RNAcompete master file. This file contains data on all RNAcompete experiments indexed by motif ID including: name, systematic ID and species of protein queried, the resulting motif, amino acid sequence of plasmid insert, and information on binding conditions used. (XLSX 2614 kb)
Supplementary Data 3
Secondary structure analysis. This file contains data panels in which each row corresponds to a significantly enriched secondary structure context for a given RNAcompete experiment along with P-values and effect sizes. Classification panel summarizes analysis results by motif. (XLSX 30 kb)
Supplementary Data 4
Clustered E-scores. This file contains the data matrix used in Figures 1b and S7. (TXT 16827 kb)
Supplementary Data 5
Comparison of RNAcompete and literature motifs. This file shows the results of comparison with previously defined motifs for RNAcompete RBPs. (XLSX 515 kb)
Supplementary Data 6
AUROC scores for in vivo and in vitro defined motifs on in vivo binding data. This file contains AUROCs for RNAcompete motifs on in vivo binding data described in Table S2, along with motifs learned by Malarkey on these data and AUROC scores for previously defined motifs for these RBPs. (XLSX 19 kb)
Supplementary Data 7
Post-transcriptional regulation (PTR) analysis in human. This file contains additional details and results of PTR analysis in human including predicted RBP-transcript regulatory networks for splicing and stability analysis. (XLSX 1445 kb)
Supplementary Data 8
Post-transcriptional regulation (PTR) analysis in Drosophila. This file contains details and results of PTR analysis for Drosophila including lists of PTR categories enriched for RNAcompete-derived IUPAC motifs, weights of trained logistic regression classifiers, Drosophila RBP(s) associated with each IUPAC motif, and IUPAC motifs queried. (XLSX 44 kb)
Supplementary Data 9
Sources of gene and Pfam models. This file details sources for gene and protein models for all organisms used in cisBP-RNA and in this paper. It also indicates the Pfam models used to scan for RBDs. (XLSX 34 kb)
About this article
Cite this article
Ray, D., Kazan, H., Cook, K. et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172–177 (2013). https://doi.org/10.1038/nature12311
DOI: https://doi.org/10.1038/nature12311