RNAVLab: A Computational Environment for Studying RNA Secondary Structures

Software version: RNAVLab 1.0
Jan 2008
Document author: Abel Licon
University of Texas at El Paso and University of Delaware

Table of Contents:
*****************************************************************************************
1. Code Overview and References
2. Software Requirements
3. Download and Installation
4. Using RNAVLab
5. Contact
6. References
*****************************************************************************************

*****************************************************************************************
1. Code Overview and References
*****************************************************************************************

RNAVLab is a virtual laboratory for studying RNA secondary structures. It
includes software programs for aligning, comparing, and predicting secondary
structures, distributed as Java byte-code.

The secondary structure alignment is based on the Needleman-Wunsch alignment
algorithm, but rather than aligning sequences of nucleotides, it aligns a
parenthetical representation of secondary structures.

RNAVLab includes three methods for comparing secondary structures. They are
based on a strict stack algorithm, a lenient bracket matching algorithm, and a
Hamiltonian-inspired pair matching algorithm.

Most prediction algorithms for RNA secondary structures that are based on
thermodynamic methods are limited by their memory requirements and time
complexity. To build the secondary structures of longer sequences, RNAVLab
includes a rebuilding algorithm that uses overlapping sub-segments, or chunks,
to rebuild the longer secondary structure. Using a scoring scheme, the
secondary structures of the chunks are combined into a final secondary
structure that includes all the nucleotides of the long sequence.

Two papers report the design and features of RNAVLab.
If you want to know more about our work, or you use our code for your research,
please refer to [1] and [2].

*****************************************************************************************
2. Software Requirements
*****************************************************************************************

The alignment and comparison classes work right out of the box, so for those
you only need JRE 1.5 or higher. You can download JRE 1.5 or higher from Sun's
website.

RNAVLab uses a MySQL database to store and retrieve RNA secondary structures
during the rebuilding process. For the database interface to operate
correctly, you first have to set up a few things on your machine. You need to
install Python 2.4 or higher and the MySQL bindings for it. You also need
access to a MySQL database and the ability to create new tables in it. We
have provided a mysql dump script* that can be used to create a table of
sample predictions. A test set is also provided.

In summary, for aligning, comparing, and rebuilding you need:

1. JRE 1.5 or higher. (For alignment and comparison)
2. Ability to create a new table in a MySQL database. (For rebuilding)
3. Python 2.4 or higher and the MySQL bindings for Python. (For rebuilding)

*The table in dump.sql was created using the pknotsRG algorithm by Jens Reeder
and Robert Giegerich [3].

A Python script for the database interface is provided in the bin directory of
the attached tar file. You have to modify this file to point to your database
for the interface to function properly. (See the next section for details on
how to do this.)

The RNAVLab package also contains Java wrappers for three well-known
prediction programs, i.e., Pknots-RE [4], Pknots-RG [3], and NuPack [5].
You can download the codes at:

NuPack:   http://nupack.org/
PknotsRG: http://bibiserv.techfak.uni-bielefeld.de/pknotsrg/
PknotsRE: http://selab.janelia.org/software.html

These codes are used to predict the secondary structures of short nucleotide
chunks. The chunk predictions are used for rebuilding the longer secondary
structure. See the next section for details on how to modify their source so
that these codes work with RNAVLab.

*****************************************************************************************
3. Download and Installation
*****************************************************************************************

RNAVLab is currently available as Java byte-code and should run on any
architecture that has JRE 1.5 or above. To download RNAVLab, go to:

    http://rnavlab.utep.edu/rnavlab/download.php

The file is a .tar.gz archive; untar and unzip it in a directory on your
machine by issuing the following command:

    $ tar -xzf rnavlab.tar.gz

This command decompresses the archive into a folder called RNAVLab1.0 in your
current working directory. Here is the layout of the directory structure:

RNAVLab1.0
    build
        classes - Contains the java packages
            dataS_G.rna - Energy look-up table used by NuPack, must be copied
                          here from the NuPack source in order to use the
                          algorithm.
            align - Package for Needleman-Wunsch alignment
                Align.class
            predict - Package for prediction wrapper
                Predict.class
                Submit.class
                XML.class
            compare - Package for comparison algorithms
                Compare.class
            rebuild - Package for rebuilding algorithm
                AlignMotif.class
                Convert.class
                DB.class
                Element.class
                ElementComparator.class
                IntStack.class
                Motif.class
                Pair.class
                PairsAndCounts.class
                Rebuild.class
                Sequence.class
                Source.class
    bin
        database.py - Script used by DB.class to access a MySQL database
        Energy.out - Energy Calculator (must be compiled and placed here by
                     the user)
    predBin - Set of prediction programs.
              (Must be compiled, renamed, and copied here by the user)
        wrapperRE
        wrapperRG
        wrapperNuPack
    README.txt - This file
    dump.sql - A mysql script file that can be used to create a table with
               all the sub-predictions of the provided input set in
               exampleInput.
    exampleInput - Files containing the RNA sequences and their
                   experimentally confirmed secondary structures.
        RF00001_A.bpseq
        ...
        ...

At this point, there are still a couple of important things you have to take
care of before you can start using all the parts of RNAVLab.

First of all, you have to set up the database. A mysqldump of the table
containing the sub-predictions of the test set in exampleInput is provided as
dump.sql. You can use it to test the RNAVLab functionality. To load the table
into the database, log in to mysql and from the prompt issue:

    mysql> source dump.sql;

This command creates a table called sub_seq_rg that RNAVLab can use to make
rebuilt structures.

Once you have successfully loaded the table, you need to modify the
database.py script in the RNAVLab1.0/bin directory. Replace the dummy
information on Line 12 with your information:

    Line 12: conn = MySQLdb.connect(host='localhost', user='USER_NAME', passwd='PASSWRD', db='DB_NAME')

RNAVLab includes the Java wrappers of three well-known prediction codes, i.e.,
Pknots-RG [3], Pknots-RE [4], and NuPack [5]. Although the prediction
algorithms are not required for testing the example sequences we provided in
exampleInput, they are required to rebuild any new sequences. For the Java
prediction wrapper to function properly, you have to modify the three
prediction codes to receive input and print output in the following manner.

Input:

    $ ./wrapperRE CAUGCUAGCUAGCUGAUCGUAGCUAGCUAGC

Output:

    CAUGCUAGCUAGCUGAUCGUAGCUAGCUAGC
    :::((((((((((((::::))))))))))))
    -18.500000

Each code should take a nucleotide sequence as its single argument and echo
the sequence, the secondary structure, and the energy to the console, each on
its own line.
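The three-line output contract described above can be checked with a short
script before the binaries are handed to the Java wrapper. The following is
only a sketch: the function names parse_wrapper_output and run_wrapper are
hypothetical and not part of RNAVLab, the binary path passed to run_wrapper is
whatever you chose for your modified program, and the length check assumes one
structure symbol per nucleotide, as in the example output. It is written for
Python 3.7 or higher rather than the Python 2.4 needed by database.py.

```python
import subprocess


def parse_wrapper_output(text):
    """Split wrapper output into (sequence, structure, energy).

    Assumes exactly three non-empty lines: sequence, structure, energy.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 3:
        raise ValueError("expected 3 non-empty lines, got %d" % len(lines))
    sequence, structure, energy_text = lines
    # Assumption: one structure symbol per nucleotide.
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    return sequence, structure, float(energy_text)


def run_wrapper(binary, sequence):
    """Run a wrapper binary (hypothetical path) with the sequence as its only argument."""
    result = subprocess.run(
        [binary, sequence], capture_output=True, text=True, check=True
    )
    return parse_wrapper_output(result.stdout)


if __name__ == "__main__":
    # Toy output in the same three-line shape; not a real prediction.
    print(parse_wrapper_output("GGGAAACCC\n(((:::)))\n-1.500000\n"))
```

A call such as run_wrapper("./predBin/wrapperRE", sequence) raises an error if
the modified program still prints extra lines, which is the most likely
mistake after editing the prediction sources.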
Once you have made these modifications, you must rename the binaries to:

    pknotsRE = wrapperRE
    pknotsRG = wrapperRG
    nupack   = wrapperNuPack

A folder called predBin should be created in the RNAVLab1.0 folder and the
binaries copied to it.

The NuPack package also contains a standalone energy calculator that we use to
determine the energy of the rebuilt structures. It is called Energy.out and
needs to be copied to the RNAVLab1.0/bin folder. It has to be modified to
output only the energy. As it is now, it takes the name of a file containing a
sequence and its secondary structure and prints the energy to the console. You
need to comment out some of the printf statements so that only the energy is
printed.

Example:

    $ ./Energy.out test.txt
    -32.000

NuPack needs an energy lookup table called dataS_G.rna. It can be found in the
NuPack download and has to be copied to the ./RNAVLab1.0/build/classes
directory.

RNAVLab should now be set up. Let's see how to use it.

*****************************************************************************************
4. Using RNAVLab
*****************************************************************************************

To rebuild the secondary structures from our test set (provided in the
exampleInput directory), go from the RNAVLab1.0 directory to the
./build/classes directory:

    $ cd ./build/classes

Issue the command for rebuilding a sequence, e.g., RF00001_A.bpseq:

    $ java rebuild/Rebuild ../../exampleInput/RF00001_A.bpseq

The RF00001_A.bpseq file contains an RNA sequence with its secondary
structure. The rebuild algorithm rebuilds the structure using many different
parameter settings, varying the window size, step size, and threshold. Since
the experimentally confirmed structure is in RF00001_A.bpseq, we can quantify
how well the rebuild algorithm performs on that particular sequence. If
successful, you should get a stream of output printing to the screen.
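To inspect a .bpseq input such as RF00001_A.bpseq next to the parenthetical
notation used elsewhere in this file, a small converter can help. This is a
sketch, not RNAVLab's Convert class: it assumes the common bpseq layout of one
"index base partner" row per nucleotide with a partner index of 0 for unpaired
bases, it skips any other line as a header, and the function name
bpseq_to_structure is hypothetical. A single bracket type cannot express
pseudoknots, so crossing pairs are not distinguished here.

```python
def bpseq_to_structure(text):
    """Return (sequence, structure) from bpseq text.

    The structure uses '(' and ')' for paired bases and ':' for unpaired ones.
    """
    bases = {}
    partners = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three fields: index, base, partner index.
        if len(fields) != 3 or not fields[0].isdigit() or not fields[2].isdigit():
            continue  # assumed to be a header or blank line
        index = int(fields[0])
        bases[index] = fields[1]
        partners[index] = int(fields[2])

    sequence = []
    structure = []
    for index in sorted(bases):
        sequence.append(bases[index])
        partner = partners[index]
        if partner == 0:
            structure.append(":")  # unpaired
        elif partner > index:
            structure.append("(")  # pair opens here
        else:
            structure.append(")")  # pair closes here
    return "".join(sequence), "".join(structure)


if __name__ == "__main__":
    # Toy hairpin, not taken from the test set.
    toy = "1 G 9\n2 G 8\n3 G 7\n4 A 0\n5 A 0\n6 A 0\n7 C 3\n8 C 2\n9 C 1\n"
    print(bpseq_to_structure(toy))
```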
A lot of data is printed, so you probably want to redirect the output of the
rebuild command as follows:

    $ java rebuild/Rebuild ../../exampleInput/RF00001_A.bpseq > results.csv

Because the Rebuild class uses nearly all the classes except the alignment
one, if you have been successful so far, you can be fairly confident that the
other classes are working.

We have included main methods in most of the classes so that they can be used
separately. If you want more information on how to use the individual classes,
use the "-h" option after each command. For example, from the
RNAVLab1.0/build/classes directory, if you issue:

    $ java ./align/Align -h

you get detailed information on how to align secondary structures. Note that
the JVM must be called from the classes directory and not from the package
directories. In other words, you have to issue commands as follows:

    $ java ./predict/Predict RG AUCACGUACGAUGCUAGCUAGCUGA

These are incorrectly formed commands:

    $ java Predict RG ACUGACUGAUCGUAGCUAGCUGAUCGUACGU
    $ java Predict.class RG AACGAUCUAGUGAUGCAUGC

These are the classes that have a main method and can be executed:

    Align.class
    Rebuild.class
    Predict.class
    DB.class

Each has a corresponding -h usage option that tells you how to use it.

*****************************************************************************************
5. Contact
*****************************************************************************************

Questions, comments, and bug reports can be sent to alicon AT udel.edu

*****************************************************************************************
6. References
*****************************************************************************************

References:

[1] M. Taufer, M-Y. Leung, A. Licon, D. Mireles, T. Solorio, D. Gomez-Leon,
    R. Araiza, K.K. Johnson: RNAVLab: A unified environment for computational
    RNA structure analysis based on grid computing technology. Submitted to
    the Journal of Parallel Computing, 2007.

[2] M. Taufer, T. Solorio, A.
    Licon, D. Mireles, and M.-Y. Leung: On the Effectiveness of Rebuilding RNA
    Secondary Structures from Sequence Chunks. To appear in Proceedings of the
    Seventh IEEE International Workshop on High Performance Computational
    Biology (HiCOMB'08), April 2008, Miami, Florida, USA.

[3] Pknots-RG: J. Reeder and R. Giegerich. Design, implementation and
    evaluation of a practical pseudoknot folding algorithm based on
    thermodynamics. BMC Bioinformatics, 5:104, 2004.

[4] Pknots-RE: E. Rivas and S. Eddy. A dynamic programming algorithm for RNA
    structure prediction including pseudoknots. J. Mol. Biol., 285:2053-2068,
    1999.

[5] NuPack: N. Pierce. NuPack: A software suite for the analysis and design of
    nucleic acids. http://nupack.org