Ribonucleic Acid Virtual Laboratory for Sequence Analysis, Structure Prediction, and Databases



Adaptive Grid Computing (AGC) System

The system is adaptive in that, at runtime, it identifies and exploits computer resources across the UTEP campus to predict the secondary structures of large numbers of RNA segments using a variety of prediction programs. The following picture gives a global overview of the project:

Project Overview Image

Our software tool is built on APST (AppLeS Parameter Sweep Template) [1], an existing open-source Grid middleware. APST supports Grid environments such as CONDOR [2] and GLOBUS [3] to launch application tasks, move data, and discover resources across network domains. It has been used successfully for other bioinformatics applications such as MCell [4], which studies cellular microphysiology. This project extends APST into APSTe (APST extended) to support BOINC [5], a Grid environment that allows researchers to use desktop and laptop PCs owned by students or administrative personnel while those computers are idle.

Sampling Approaches to Harness the Parallel Computing Power

A long RNA molecule must be cut into smaller segments so that the sequence data of individual segments can be distributed to available AGC computers and their structures predicted simultaneously. To determine the cutting points, we adopt two sampling strategies: windowing and progressive segmentation. The predicted structures of all the segments are later pooled together for assembly, followed by overall structural analysis of the reconstructed molecule.

In the windowing sampling approach, each set of segments has a fixed size (window size) and a fixed sliding step (window step). The segments in a set are generated by sliding the fixed-size window forward for a fixed number of steps. At each step, the nucleotides within the window form a segment. Both the window size and step are adjustable by the user.
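As an illustration of the windowing approach, here is a minimal Python sketch. It is not the RNAVLab implementation: the function name `window_segments` and its parameters are hypothetical, positions are assumed to be 0-based, and the window is assumed to slide until it reaches the end of the sequence, with any trailing partial window dropped.

```python
def window_segments(sequence, window_size, window_step):
    """Slide a fixed-size window along `sequence` and collect each segment.

    Returns a list of (start, segment) pairs, where `start` is the 0-based
    offset of the segment in the original sequence (useful for reassembly).
    Assumption: a trailing window shorter than `window_size` is dropped.
    """
    if window_size <= 0 or window_step <= 0:
        raise ValueError("window_size and window_step must be positive")
    last_start = len(sequence) - window_size
    return [
        (start, sequence[start:start + window_size])
        for start in range(0, last_start + 1, window_step)
    ]


# Example: a 10-nucleotide toy sequence, window size 4, window step 3.
for start, segment in window_segments("AUGGCUACGU", 4, 3):
    print(start, segment)
# prints:
# 0 AUGG
# 3 GCUA
# 6 ACGU
```

Keeping the start offset with each segment is one simple way to record where a predicted structure belongs when the segments are pooled for assembly.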

In the progressive sampling approach, the user defines a starting point, an ending point, and a “step size”. The starting and ending points (the 5' and 3' terminals, respectively) define the original segment. The sampler then generates a series of segments by progressively removing “step size” bases from this segment, starting from the beginning point and progressing in the 5' to 3' direction. The selection of cutting points is based on our previously reported palindrome distributions for DNA sequences [6].
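The progressive approach can be sketched in Python as follows. This is one reading of the description above, not the RNAVLab implementation: the function name `progressive_segments` is hypothetical, `start` is assumed to be 0-based, `end` is assumed to be exclusive, and every segment is assumed to keep the same 3' end while bases are removed from the 5' side. The palindrome-based selection of cutting points [6] is not modeled here.

```python
def progressive_segments(sequence, start, end, step_size):
    """Generate progressively shorter segments of sequence[start:end].

    Each successive segment drops `step_size` more bases from the 5' side
    and keeps the same 3' end. Returns a list of (cut, segment) pairs,
    where `cut` is the 0-based position at which the segment begins.
    Assumptions: `start` is 0-based and `end` is exclusive.
    """
    if not 0 <= start < end <= len(sequence):
        raise ValueError("need 0 <= start < end <= len(sequence)")
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    return [(cut, sequence[cut:end]) for cut in range(start, end, step_size)]


# Example: the whole 10-nucleotide toy sequence with a step size of 4.
for cut, segment in progressive_segments("AUGGCUACGU", 0, 10, 4):
    print(cut, segment)
# prints:
# 0 AUGGCUACGU
# 4 CUACGU
# 8 GU
```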

RNA Secondary Structure Analysis (RNASSA) Tools

The first version of the RNASSA tools was released in October 2011 for the RG [7] and UNAFold [8] prediction algorithms, with the added flexibility of two user-defined prediction results for comparison. The application allows graphical visualization of the predicted structures using PseudoViewer [9] and comparative alignments of sequences selected by the user. No software installation is required after unzipping the following file:
Download RNASSA Version 1.0
RNASSA 1.0 (zip file with the user manual included in the package)

Since it is a Java-based application, a running Java environment on your PC is necessary. As data will be stored on this RNAVLab server, access by off-campus users requires a VPN account from UTEP. Please contact us for account setup at bioinformatics AT utep DOT edu.

In November 2012, as a major RNASSA upgrade, Version 2.0 incorporated both InversFinder 2.0 and Segmenta 2.0. In other words, RNASSA 2.0 is now capable of finding inversions, cutting RNA into smaller segments, editing chunks or longer regions, submitting them for prediction at RNAVLab, assembling chunks or regions, and comparing them, all without leaving the software. In addition to logging in to RNAVLab, RNASSA 2.0 requires the latest versions of Java and R to be properly installed. With the improved file manager in RNASSA 2.0, all files are placed under the RNASSA directory, which is created automatically after unzipping the following file:

RNASSA 2.0 (zip file; Java and R required); Release Notes; PDF Manual

References

[1] H. Casanova and F. Berman (2002) Parameter Sweeps on the Grid with APST. Chapter 33 in Grid Computing: Making the Global Infrastructure a Reality, Wiley Publisher, Inc.
[2] D. Thain, T. Tannenbaum, and M. Livny (2005) Distributed Computing in Practice: The Condor Experience. Concurrency and Computation: Practice and Experience, John Wiley & Sons, Ltd.
[3] I. Foster and C. Kesselman (1997) Globus: A Metacomputing Infrastructure Toolkit. Intl J. Supercomputer Applications, 11(2):115-128.
[4] MCell: A Monte Carlo Simulator of Cellular Microphysiology. http://www.mcell.cnl.salk.edu. Accessed September 22, 2011.
[5] D. P. Anderson (2004) BOINC: A System for Public-Resource Computing and Storage. GRID'2004: 4-10.
[6] M. Leung et al. (2005) Nonrandom clusters of palindromes in herpesvirus genomes. J. Comput. Biol. 12(3):331–354. [PubMed: 15857246].
[7] J. Reeder and R. Giegerich (2004) Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics, BMC Bioinformatics, 5:104. http://bibiserv.techfak.uni-bielefeld.de/pknotsrg/. Accessed October 20, 2011.
[8] N. R. Markham and M. Zuker (2008) UNAFold: software for nucleic acid folding and hybridization. In Keith, J. M., editor, Bioinformatics, Volume II. Structure, Function and Applications, number 453 in Methods in Molecular Biology, chapter 1, pages 3–31. Humana Press, Totowa, NJ. http://mfold.rna.albany.edu/?q=DINAMelt/software. Accessed October 20, 2011.
[9] Y. Byun and K. Han (2006) PseudoViewer: web application and web service for visualizing RNA pseudoknots and secondary structures. Nucleic Acids Research, 34 (supplement 2):W416. http://nar.oxfordjournals.org/content/34/suppl_2/W416. Accessed October 20, 2011.


This work is supported by the BBRC Grant 5G12MD007592
The University of Texas at El Paso (UTEP), 500 W University Ave. El Paso, TX 79968
Copyright 2020 RNAVLab, All Rights Reserved