Segmenta: A Bioinformatics Tool for Finding Inversions and RNA Segmentation
Software Version: Segmenta 2.0 March 2015
Document Author: RNAVLab, Bioinformatics Program, UTEP
Email: rnavlab@utep.edu

Table of Contents for Release Notes with Updates for R on Windows 10: Updated 7/4/2020
*****************************************************************************************
1. Root Directory (\Segmenta20) and Output Subdirectory (\Segmenta20\inverslist)
2. Use of FASTA Files and Knowledge of Sequence Inversions
3. R Environment (GUI) Required for Running Segmenta 2.0 in Windows 10
4. Find Inversions, Prepare Inversion List, and Run Segmentation
5. Files Generated in the Segmenta Root Directory and \inverslist Subdirectory
6. Version 2.0.150303 for Including Smaller Overlapping Inversions as Default
*****************************************************************************************

*****************************************************************************************
1. Root Directory (\Segmenta20) and Output Subdirectory (\Segmenta20\inverslist)
*****************************************************************************************
After unzipping the Segmenta20.zip file, the root directory (\Segmenta20) for the Segmenta20.jar executable file and the output subdirectory (\Segmenta20\inverslist) for chunk files and PDF plots will be created automatically.

*****************************************************************************************
2. Use of FASTA Files and Knowledge of Sequence Inversions
*****************************************************************************************
The sequence file must be prepared in FASTA format and placed in the root directory for loading. Type the filename in the first input field or click "Browse" to select the .fasta file, and click the radio buttons for optional choices in finding the inversions.
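If you are unsure whether a sequence file is in FASTA format, a quick check outside Segmenta can help. The Python sketch below is only an illustration and is not part of Segmenta: the function name parse_fasta and the demo sequence are hypothetical, and it assumes the usual FASTA convention of a header line starting with ">" followed by one or more lines of sequence letters.

```python
def parse_fasta(lines):
    """Return a list of (header, sequence) records from FASTA-formatted lines."""
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            # Sequence letters before any header: not valid FASTA.
            raise ValueError("FASTA input must start with a '>' header line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-line sequence used only for demonstration.
example = """>demo_sequence hypothetical example
GGGAAACUUCGGUUUCCC
AUGC
"""
records = parse_fasta(example.splitlines())
print(records)
```

To check a real file, pass the open file handle instead of example.splitlines(); a ValueError means the file does not begin with a ">" header line.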
In addition, you may select your own starting and ending values of minimum stem lengths, maximum gaps, and number of mismatches allowed for Segmenta to run iteratively within the ranges. Default entries will show up after clicking the "Find Inversions" button and you may edit those values.

*****************************************************************************************
3. R Environment (GUI) Required for Running Segmenta 2.0 in Windows 10
*****************************************************************************************
In addition to the latest version of the Java environment for Windows 7 or later, Segmenta 2.0 requires R to be installed on your computer running either 32-bit or 64-bit Windows 7 or later. R version 2.14.2 or later is recommended. For other R versions already installed on your computer, such as R-4.0.2 for Windows 10, you may click the "Browse" button for "Path for R" in the lower left-hand corner of Segmenta 2.0 to locate R. For example, navigate all the way down to "C:\Program Files\R\R-4.0.2\bin\x64\Rgui.exe", a typical path for R 4.0.2 running in Windows 10.

When running R for the first time after installation, there may be errors about missing packages (e.g., "fields" and "doBy"). To install them, pull down "Packages" on the top menu, select "Install Package(s)..." and follow the instructions. For a newly installed R, you have to set a mirror site for download by clicking "Set CRAN Mirror..." and selecting the closest one in the list (e.g., "USA(TX1)[https]" for Texans).

*****************************************************************************************
4. Find Inversions, Prepare Inversion List, and Run Segmentation
*****************************************************************************************
After finding the inversions, you may type in your own file name for the generated output file or just use the default one.
Inversion list files are generated automatically and placed in the inverslist subdirectory to be used by Segmenta 2.0R while running RNA segmentation. Then, you may enter the value of Maximum Chunk Size (C) or use the default of 100, and click "Run Segmentation" for the GUI version of R to load, where you select "Source R Code..." under the File menu. Highlight "Segmenta20.R" (with all methods) or "Segmenta21.R" (without the Optimized Method, for shorter running time), and click the file to run the program. Output text, FASTA, and PDF Rplot files will be placed in the inverslist subdirectory. The FASTA files (in subdirectory \inverslist) for chunks after segmentation can be loaded directly into RNASSA 2.0 for prediction.

*****************************************************************************************
5. Files Generated in the Segmenta Root Directory and \inverslist Subdirectory
*****************************************************************************************
sequenceName_filelist.txt: list of FASTA files and PDF plots generated or excluded
sequenceName_inverslist.txt: list of inversions indicated by the start and end positions
sequenceName_noinversfiles.txt: list of files with no inversion
sequenceName_output.txt: output file of the sequence and its inversions
InversList_param.txt: a parameter file for R, overwritten by running another sequence

In \inverslist Subdirectory:
sequenceName.txt: a text file of the sequence in letters without sequence name (not FASTA)
sequenceName_C_Regular.fasta: sequence plus chunks cut by Regular Method (only one file)
sequenceName...Centered.fasta: chunks cut by Centered Method
sequenceName...Optimized.fasta: chunks cut by Optimized Method
sequenceName..._Plot.pdf: excursion plots in PDF files for each method

*****************************************************************************************
6. Version 2.0.150303 for Including Smaller Overlapping Inversions as Default
*****************************************************************************************
In previous versions, overlaps were excluded by default and could be included by clicking the radio button "Report overlaps." In this release, the results include overlaps by default, and the user has to click the radio button "Exclude inversions inside longer ones" to exclude this type of inversion.

*****************************************************************************************
Please email us at rnavlab@utep.edu immediately if you encounter problems or have comments.
The latest version is Segmenta 2.0.150303.
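
To illustrate what the "Exclude inversions inside longer ones" option in section 6 means, the following Python sketch drops every inversion that lies entirely inside a longer one. It is not Segmenta's own code: the function name exclude_nested and the sample positions are hypothetical, and representing each inversion as a (start, end) pair is an assumption based on the description of sequenceName_inverslist.txt.

```python
def exclude_nested(inversions):
    """Keep only inversions that are not contained in a strictly longer one.

    Each inversion is an assumed (start, end) pair of sequence positions.
    """
    kept = []
    for i, (start, end) in enumerate(inversions):
        nested = any(
            # The other inversion spans this one and is strictly longer.
            other_start <= start
            and end <= other_end
            and (other_end - other_start) > (end - start)
            for j, (other_start, other_end) in enumerate(inversions)
            if j != i
        )
        if not nested:
            kept.append((start, end))
    return kept


# Hypothetical positions: (20, 40) lies inside (10, 60), (70, 80) inside (55, 90).
sample = [(10, 60), (20, 40), (55, 90), (70, 80)]
print(exclude_nested(sample))  # [(10, 60), (55, 90)]
```

With the default setting of this release, all four inversions in the sample would be reported; only the exclusion option would reduce the list to the two longer ones.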