University of Nebraska at Omaha bioinformatics group

Bayesian sensor

New! Our new Bayesian splice site (SS) sensor has been shown to outperform the contemporary Maximum Entropy sensor on 5' SS for several representative test sets, such as a set of 250 human genes and short 5' UTR human gene fragments excluded from the learning set, as well as a collection of 183 rat genes. On the same test sets, our new Bayesian 3' SS sensor performs as well as or better than the Maximum Entropy sensor. Please refer to our CSB2005 poster for details. Our implementation of the Bayesian SS sensor is available as a Perl wrapper. We believe that the sensor's performance should generalize to a broad spectrum of tetrapod organisms, since the genes responsible for recognizing splicing motifs (such as those encoding the U1 and U2 snRNPs) are among the most conserved genes known.

SpliceScan

New! Our new ab initio gene annotation tool, SpliceScan, uses interactions between different signals. Our prediction relies on Donor/Donor, Donor/Acceptor, Acceptor/Acceptor and Acceptor/Donor interactions, plus a set of ESE/ISE/ESS signals that substantially enhance prediction quality. The tool was conceived as a splicing simulator with objectives similar to those of the ExonScan engine. It explores a different approach to splice site detection, based on splice site definition. When available, it includes information related to exon and intron definition models, combined with the predictions of the Bayesian splice site sensor. This approach is especially efficient for short pre-mRNA fragments. SpliceScan has been shown to perform better than the SpliceView, GeneSplicer, NNSplice, NetUTR and Genio tools on the ROC curve for the test set of 250 human genes excluded from the learning set and for the collection of 183 rat genes. For the test set of short 5' UTR human gene fragments, with cross-correlation removed between the SpliceScan learning set and the test set, our tool outperforms all the contemporary gene structure prediction methods, as can be seen here.
  • Run the SpliceScan tool online for 5' SS (donor) and 3' SS (acceptor)
  • Download the SpliceScan tool
  • Poster presented at CSB2005 and a short report
  • A recent article in the Biology Direct journal and my dissertation, which explain SpliceScan in more detail
  • Updated ROC diagrams for different applications. In this experiment, cross-correlation has been removed between the learning and test sets.
    • The ROC curves were obtained using our online web crawling application, which queries test sequences against various web tools and parses the results. Please note the difference between the standard ROC curves (False positive fraction vs. True positive fraction) and the ROC curves we use (Sensitivity vs. 1 - Specificity).
  • We used the MHMMotif tool to learn some of the ESE/ISE motifs used by SpliceScan
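
The distinction between the two kinds of ROC curve mentioned above can be illustrated with a small sketch. It assumes the convention common in gene prediction work, where specificity is TP / (TP + FP); the page does not state its definition, so this is an assumption, and the function name `roc_points` and the toy scores are hypothetical.

```python
def roc_points(scores, labels, thresholds):
    """For each threshold return (threshold, Sn, 1 - Sp, FPR).

    ASSUMPTION: Sp = TP / (TP + FP), the gene-prediction convention.
    FPR = FP / (FP + TN) is the x-axis of a standard ROC curve.
    """
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and not y)
        sn = tp / (tp + fn) if tp + fn else 0.0
        sp = tp / (tp + fp) if tp + fp else 1.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        points.append((t, sn, 1.0 - sp, fpr))
    return points


# Toy candidate splice sites: score from a sensor, 1 = real site, 0 = decoy.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0]
for t, sn, one_minus_sp, fpr in roc_points(scores, labels, [0.25, 0.5, 0.85]):
    print(f"threshold={t:.2f}  Sn={sn:.2f}  1-Sp={one_minus_sp:.2f}  FPR={fpr:.2f}")
```

Under this assumption the two x-axes diverge whenever true negatives greatly outnumber true sites, which is the usual situation when scanning genomic sequence for splice sites.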

GIGOgene engine

GIGOgene test results

We used the following gene structure prediction quality test framework to obtain our data. Exon-level precision is based on the Genie learning set of 462 human genes. Our GIGOgene application outperforms the contemporary homology-based annotation tools we examined in terms of exon-level sensitivity and specificity.



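| Tool | TE | AE | PE | ESn | ESp |
|---|---|---|---|---|---|
| Galahad | 4744 | 4909 | 4790 | 96.64% | 99.04% |
| Spidey | 4827 | 4909 | 4847 | 98.33% | 99.59% |
| EST2Genome | 4742 | 4909 | 4752 | 96.60% | 99.79% |
| Sim4 | 4837 | 4909 | 4845 | 98.53% | 99.83% |
| BLAT | 4832 | 4909 | 4902 | 98.43% | 98.57% |
| GIGOgene | 4864 | 4909 | 4865 | 99.08% | 99.98% |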
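
The percentages above are numerically consistent with ESn = TE / AE and ESp = TE / PE. The page does not define the column abbreviations, so reading TE as correctly predicted exons, AE as annotated exons and PE as predicted exons is an assumption. A minimal sketch under that assumption (the function name `exon_level_accuracy` is hypothetical):

```python
def exon_level_accuracy(te, ae, pe):
    """Return (ESn, ESp) as fractions.

    ASSUMPTION: te = correctly predicted exons, ae = annotated exons,
    pe = predicted exons, so ESn = te / ae and ESp = te / pe.
    """
    return te / ae, te / pe


# Counts copied from the table for the 462-gene Genie set.
rows = {
    "Galahad": (4744, 4909, 4790),
    "Spidey": (4827, 4909, 4847),
    "EST2Genome": (4742, 4909, 4752),
    "Sim4": (4837, 4909, 4845),
    "BLAT": (4832, 4909, 4902),
    "GIGOgene": (4864, 4909, 4865),
}

for tool, (te, ae, pe) in rows.items():
    esn, esp = exon_level_accuracy(te, ae, pe)
    print(f"{tool:<12} ESn={esn:.2%}  ESp={esp:.2%}")
```

Note that ESp here behaves like a precision (the share of predicted exons that are correct), not like the true negative rate used in other fields.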

We have compared the performance of different programs on human genes containing microexons. First, we parsed gene structures for the whole human genome to find genes containing microexons (2-11 nt). We then carefully verified that the splice sites in the predicted genomic structures were canonical. Finally, we compared the predicted structures with the annotations produced by other programs and obtained the following results in terms of exon-level sensitivity and specificity:


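| Tool | TE | AE | PE | ESn | ESp |
|---|---|---|---|---|---|
| Galahad | 1220 | 1422 | 1278 | 85.79% | 95.46% |
| Spidey | 1251 | 1422 | 1334 | 87.97% | 93.78% |
| EST2Genome | 1270 | 1422 | 1318 | 89.31% | 96.36% |
| Sim4 | 1278 | 1422 | 1326 | 89.87% | 96.38% |
| BLAT | 1375 | 1422 | 1424 | 96.69% | 96.56% |
| GIGOgene | 1420 | 1422 | 1422 | 99.86% | 99.86% |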
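
The microexon selection described before the table can be sketched as follows. This is an illustration, not the pipeline used for the experiment: it assumes exons are given as sorted, 0-based, half-open (start, end) coordinates on the forward strand, and it takes "canonical" to mean GT...AG introns. The function names and the toy sequence are hypothetical.

```python
def find_microexons(exons, min_len=2, max_len=11):
    """Return exons whose length falls in the 2-11 nt microexon range."""
    return [(s, e) for s, e in exons if min_len <= e - s <= max_len]


def introns_are_canonical(sequence, exons):
    """True if every intron between consecutive exons is GT...AG.

    ASSUMPTION: 'canonical' means the GT-AG rule on the forward strand.
    """
    seq = sequence.upper()
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        intron = seq[left_end:right_start]
        if len(intron) < 4:
            return False
        if not (intron.startswith("GT") and intron.endswith("AG")):
            return False
    return True


# Toy gene assembled from labelled parts so the coordinates stay in sync.
parts = [
    ("exon", "ATGGCCAAGCTGACC"),
    ("intron", "GTAAGTTTTCAG"),
    ("exon", "ACGTA"),  # 5 nt microexon
    ("intron", "GTCCCCCAG"),
    ("exon", "TTTGGGCCCAAATAA"),
]
sequence, exons, pos = "", [], 0
for kind, chunk in parts:
    if kind == "exon":
        exons.append((pos, pos + len(chunk)))
    sequence += chunk
    pos += len(chunk)

print(find_microexons(exons))
print(introns_are_canonical(sequence, exons))
```

A gene would be kept for the comparison only if it contains at least one microexon and all of its introns pass the splice site check.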


Contact e-mail: Alexander Churbanov