RNA, MicroRNAs, and 3D Structures
Ribonucleic acid (RNA) has four types of nucleotide bases: adenine (A), cytosine (C), guanine (G), and uracil (U). These bases form single-stranded sequences that vary greatly in size, from short segments of about 20 nucleotides in microRNAs, which have gene regulatory roles, to long molecules of over 30,000 nucleotides that make up entire viral genomes [1]. Among the four bases, complementary base pairs form by hydrogen bonding between C and G and between A and U, and, less frequently, between G and U. Although a single-stranded RNA molecule is a linear polymer, it tends to fold back on itself into a 3-dimensional (3D) structure, mostly through interactions between distant complementary bases. We can gain useful information about the 3D structure and function of an RNA molecule by analyzing its secondary structure, which refers to the arrangement of hydrogen-bonded base pairs in the molecule.
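The pairing rules above can be written down as a small lookup. The following is only an illustrative sketch in Python; the names WATSON_CRICK, WOBBLE, and can_pair are hypothetical and do not come from any tool mentioned on this page.

```python
# Illustrative sketch of the RNA base-pairing rules described above.
# All names here are hypothetical, chosen for this example only.

# Canonical pairs: C with G, and A with U (order-independent).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

# The less frequent G-U pair.
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(a, b, allow_wobble=True):
    """Return True if bases a and b can form a hydrogen-bonded pair."""
    pair = (a.upper(), b.upper())
    if pair in WATSON_CRICK:
        return True
    return allow_wobble and pair in WOBBLE


if __name__ == "__main__":
    for a, b in [("C", "G"), ("A", "U"), ("G", "U"), ("A", "C")]:
        print(a, b, can_pair(a, b))
```

Setting allow_wobble to False restricts the check to the C-G and A-U pairs only.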
Close Inversions, Stem-loops, and Pseudoknots
RNA secondary structures consist of pattern features that can be classified into two basic categories: stem-loops and pseudoknots (see Fig. 1a and 1b), which have been implicated in important biological processes such as gene expression and regulation [2,3,4]. Both kinds of structure require a stretch of nucleotide sequence (e.g., ACCGUC) followed downstream by its inverted complementary sequence (GACGGU); these patterns are called "close inversions." A bioinformatics tool, InversFinder 1.0, released in February 2012 and later upgraded to Version 2.0 in September (visit Downloads), is a small, fast Java-based application for finding such close inversions. The development of mathematical models and computational prediction algorithms for stem-loop structures based on thermodynamic models started in the 1980s [5,6]. Because of their extra base pairings, pseudoknots must be represented by more complex models and data structures.
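As a rough illustration of what a close inversion is, the sketch below scans a sequence for a short stretch followed downstream by its inverted complementary sequence. It is a naive search written for this page in Python, not the algorithm used by InversFinder; the function names and the stem_len and max_gap parameters are assumptions made for the example, and only C-G and A-U pairs are considered (G-U pairs are ignored).

```python
# Naive close-inversion search; a sketch only, not InversFinder's method.
# Function names and parameters are hypothetical.

COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the inverted complementary sequence of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_close_inversions(seq, stem_len=6, max_gap=20):
    """Find stretches of length stem_len whose inverted complement
    occurs downstream, separated by at most max_gap bases.

    Returns a list of (start, partner_start, stem, partner) tuples
    using 0-based positions.
    """
    hits = []
    n = len(seq)
    for i in range(n - 2 * stem_len + 1):
        stem = seq[i:i + stem_len]
        target = reverse_complement(stem)
        first = i + stem_len                        # gap of zero bases
        last = min(first + max_gap, n - stem_len)   # gap of max_gap bases
        for j in range(first, last + 1):
            if seq[j:j + stem_len] == target:
                hits.append((i, j, stem, target))
    return hits


if __name__ == "__main__":
    # The example stretch from the text, a 4-base spacer, then its partner.
    example = "ACCGUC" + "AAAA" + "GACGGU"
    for hit in find_close_inversions(example):
        print(hit)
```

For the example sequence this reports the single pair ACCGUC at position 0 and GACGGU at position 10. The running time grows with sequence length times max_gap times stem_len, which is acceptable for a demonstration but would need indexing or hashing for genome-scale input.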
Sequence Analysis, Structure Prediction, and Databases
Despite the computing power of supercomputers and emerging advanced technologies, e.g., multi-core architectures, predicting the secondary structures of long RNA sequences (on the order of 1000 nucleotides) with thermodynamic methods, e.g., [7], is still not feasible, especially if the structures include complex features such as pseudoknots. We aim to develop computational tools for comparing RNA sequences, efficient prediction algorithms for RNA secondary structures, and fast databases for these special RNA features, whether determined computationally or experimentally.
References
[1] Thiel, V., et al. Mechanisms and enzymes involved in SARS coronavirus genome expression. J. Gen. Virol. 2003;84:2305–2315. [PubMed: 12917450]
[2] Petrillo, M., Silvestro, G., Di Nocera, P.P., Boccia, A., Paolella, G. Stem-loop structures in prokaryotic genomes. BMC Genomics 2006;7:170.
[3] Su, M.-C., et al. An atypical RNA pseudoknot stimulator and an upstream attenuation signal for -1 ribosomal frameshifting of SARS coronavirus. Nucleic Acids Res 2005;33(13):4265–4275.
[4] Wilkinson, S.R. and Been, M.D. A pseudoknot in the 3′ non-core region of the glmS ribozyme enhances self-cleavage activity. RNA 2005;11:1788–1794. [PubMed: 16314452]
[5] Sankoff, D. Simultaneous solution of the RNA folding, alignment, and protosequence problems. SIAM J. Appl. Math 1985;45:810–825.
[6] Zuker, M. Computer prediction of RNA structure. Methods Enzymol 1989;180:262–288. [PubMed: 2482418]
[7] Zuker, M., Mathews, D., Turner, D. Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide. In: RNA Biochemistry and Biotechnology. Kluwer Academic Publishers; 1999.