# Copyright @ 2002 - 2010 The Institute for Genomic Research (TIGR).
# All rights reserved.
#
# This software is provided "AS IS".  TIGR makes no warranties, express or
# implied, including no representation or warranty with respect to the
# performance of the software and derivatives or their safety,
# effectiveness, or commercial viability.  TIGR does not warrant the
# merchantability or fitness of the software and derivatives for any
# particular purpose, or that they may be exploited without infringing the
# copyrights, patent rights or property rights of others.
#
# This software program may not be sold, leased, transferred, exported or
# otherwise disclaimed to anyone, in whole or in part, without the prior
# written consent of TIGR.

BLAST SCORE RATIO ANALYSIS

DESCRIPTION:

BSR.pl compares a reference proteome and two query proteomes. It BLASTs each
reference protein against itself and records the BLAST score. It then BLASTs
each query protein against the reference and records the BLAST score. The
BLAST SCORE RATIO is calculated by dividing each query BLAST score by the
reference BLAST score. For each reference protein, the two BLAST SCORE RATIOS
(one per query genome) are plotted as (x,y) coordinates.

Synteny plots are generated by plotting the position of the hit in the second
genome and coloring each point according to its BLAST SCORE RATIO.

The script makes use of gnuplot and ggobi to visualize the output.

INSTALLATION

LIBRARY REQUIREMENTS

Libraries are provided in the directory lib. This directory can be copied to
/Library/perl/ (Mac OSX) or /usr/lib/perl/ (Linux).

SOFTWARE:

NCBI Blast (blastall) should be installed and available in the path.
gnuplot can be obtained from www.gnuplot.org
ggobi can be obtained from www.ggobi.org

RUNNING THE SCRIPT:

INPUTS:

The following files are input to the script:

For each of the three genomes: genome_name.pep and genome_name.coords

Each coordinate file should have the same name as its .pep file, with the
extension .coords, and should be located in the same directory as the .pep
file.
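The naming rule above (each genome_name.pep has a genome_name.coords next to it) can be checked before launching a run. The following is a minimal Python sketch, not part of BSR.pl; the function names coords_path_for and missing_inputs are hypothetical and introduced only for illustration.

```python
from pathlib import Path


def coords_path_for(pep_path):
    """Return the .coords path expected next to a .pep file (same directory, same base name)."""
    return Path(pep_path).with_suffix(".coords")


def missing_inputs(pep_paths):
    """List every expected .pep or .coords file that does not exist on disk."""
    missing = []
    for pep in map(Path, pep_paths):
        for candidate in (pep, coords_path_for(pep)):
            if not candidate.is_file():
                missing.append(candidate)
    return missing


if __name__ == "__main__":
    # File names taken from the test genomes mentioned in the EXAMPLES section.
    for path in missing_inputs(["caviae.pep", "muridarum.pep", "pneumoniae.pep"]):
        print(f"missing input: {path}")
```

An empty result means all six input files are present. The sketch only checks that the files exist, not that their contents are valid.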
The .pep file is a multi-FASTA formatted file containing each peptide in the
genome. Each .pep file has to be searchable with blastall. To make it so, run
the following command on each .pep file:

    formatdb -i reference.pep

The coordinate file looks like:

    feat_name::end5::end3::com_name
    ORF00003::1422316::1421633::hypothetical protein
    ORF00004::1420851::1421462::peptidase, M23/M37 family
    ORF00005::1420185::1420763::ATP:cob(I)alamin adenosyltransferase
    ORF00006::1418272::1420044::sensor histidine kinase ResE
    ORF00007::1417556::1418269::DNA-binding response regulator ResD
    ORF00008::1416096::1417250::resC protein
    ORF00009::1414456::1416078::resB protein

Running the script:

    BSR.pl -R reference.pep -Q1 query1.pep -Q2 query2.pep

    -h --help      Prints this information.
    -v --version   Prints version information.
    -R             Reference genome peptide file in fasta format
                   (required with reference.coords)
    -Q1            Query genome 1 peptide file in fasta format
                   (required with query1.coords)
    -Q2            Query genome 2 peptide file in fasta format
                   (required with query2.coords)

OUTPUT:

Multiple outputs are generated. All output files are written into a directory
named: reference_query1_query2.dir

A. Text files

    1. reference_query1_query2.txt

    ORF00003  469.5   ORFB01531  449.1   0.956549520766773  14579ORF2383_1439332_1438649_RZC00098  468.0   0.996805111821086  hypothetical protein
    ORF00004  414.5   ORFB01530  375.2   0.90518697225573   14579ORF2382_1437867_1438478_RZC07567  408.7   0.986007237635706  peptidase, M23/M37 family
    ORF00005  376.3   ORFB01528  344.4   0.915227212330587  14579ORF2381_1437201_1437779_RZC05647  372.5   0.98990167419612   ATP:cob(I)alamin adenosyltransferase, putative
    ORF00006  1137.5  ORFB01527  1096.6  0.964043956043956  14579ORF2380_1435289_1437061_RZC00162  1117.4  0.98232967032967   sensor histidine kinase ResE

B. Visualization with gnuplot

    1. BSR Analysis

        gnuplot query1_query2.gp

       Postscript and xfig files named query1_query2.ps and
       query1_query2.xfig are also created when gnuplot is run.

    2. Synteny plot

        gnuplot reference_query1.gp
       or
        gnuplot reference_query2.gp

       As for the BSR Analysis plot, postscript and xfig files are also
       generated, with the extensions .ps and .xfig.

C. Visualization with ggobi

    1. BSR Analysis

        ggobi query1_query2.xml

    2. Synteny Analysis

        ggobi reference_query1.xml
        ggobi reference_query2.xml

    To visualize the BSR plot, select the PROJECTION MODE "XYplot" in the
    ViewMode menu. To view the annotation upon mouse-over of the points, use
    the INTERACTION MODE "Identify". Please see the ggobi manual for more
    details. ggobi is available under the GNU license at www.ggobi.org.

STATISTICAL ANALYSIS:

The script BSR_stats.pl can be run on the reference_query1_query2.txt file.
It calculates the following statistical parameters from the series of BSRs
for each of the genomes used as query.

Example:

    STATISTICAL ANALYSIS OF THE OVERALL BSR DATA

    -                      caviae      muridarum
    --------------------------------------------------------
    PROTEIN ANALYZED       1118        1118
    --------------------------------------------------------
    MEDIAN                 0.6544      0.5409
    --------------------------------------------------------
    MEAN                   0.5665      0.4801
    --------------------------------------------------------
    STD DEVIATION          0.2753      0.2774
    --------------------------------------------------------
    VARIANCE               0.0758      0.0769
    --------------------------------------------------------
    MED ABS DEVIATION      0.2429      0.3135
    --------------------------------------------------------

USAGE:

    BSR_stats.pl reference_query1_query2.txt

Output is written to STDOUT.

EXAMPLES:

Files for 3 chlamydial genomes are provided for the purpose of testing the
script:

    caviae.pep      : Chlamydia caviae
    muridarum.pep   : Chlamydia muridarum
    pneumoniae.pep  : Chlamydia pneumoniae

To test the script, run for example (any genome can be used as reference):

    BSR.pl -R caviae.pep -Q1 muridarum.pep -Q2 pneumoniae.pep

All output will be in: caviae_muridarum_pneumoniae.dir

List of output:

    bin/
    pneumoniae_caviae_muridarum.group
    pneumoniae_muridarum.xml
    caviae_muridarum.gp
    pneumoniae_caviae_muridarum.plot
    pneumoniae_muridarum_only.group
    caviae_muridarum.xml
    pneumoniae_caviae_muridarum.txt
    pneumoniae_only.group
    pneumoniae_caviae.gp
    pneumoniae_caviae_only.group
    pneumoniae_caviae.xml
    pneumoniae_muridarum.gp

REFERENCE:

See http://www.microbialgenomics.org/BSR/ for more information and updates.
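
SUPPLEMENTARY SKETCH:

The ratio defined in the DESCRIPTION section and the parameters listed under STATISTICAL ANALYSIS can be illustrated with a short Python sketch. It is not the code used by BSR.pl or BSR_stats.pl. The function names are hypothetical, and the choice of sample (n - 1) estimators and of an unscaled median absolute deviation are assumptions, since the README does not say which variants BSR_stats.pl uses.

```python
import statistics


def blast_score_ratio(query_score, reference_score):
    """Divide a query BLAST score by the reference protein's self-hit score."""
    if reference_score <= 0:
        raise ValueError("reference score must be positive")
    return query_score / reference_score


def summarize(ratios):
    """Summary statistics for one series of BLAST score ratios (needs >= 2 values)."""
    ratios = list(ratios)
    median = statistics.median(ratios)
    return {
        "n": len(ratios),
        "median": median,
        "mean": statistics.mean(ratios),
        # Assumption: sample (n - 1) estimators; BSR_stats.pl may use others.
        "std_deviation": statistics.stdev(ratios),
        "variance": statistics.variance(ratios),
        # Assumption: unscaled median of absolute deviations from the median.
        "med_abs_deviation": statistics.median([abs(r - median) for r in ratios]),
    }


if __name__ == "__main__":
    # Reference and query 1 scores copied from the example text file rows above.
    reference_scores = [469.5, 414.5, 376.3, 1137.5]
    query1_scores = [449.1, 375.2, 344.4, 1096.6]
    ratios = [blast_score_ratio(q, r) for q, r in zip(query1_scores, reference_scores)]
    for name, value in summarize(ratios).items():
        print(f"{name}: {value}")
```

For the first example row, 449.1 / 469.5 reproduces the ratio 0.9565 listed in the text file.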