Performance of miR-BAG:

miR-BAG is a novel, reliable and accurate tool for miRNA gene prediction that supports three modes of prediction:

i. Prediction from individual sequences
ii. Prediction from genomic sequences
iii. Prediction from Next Generation Sequencing data

Currently the tool works for nematode, insect and animal species. For each species, robust training and testing datasets were created by randomly selecting sequences from the positive and negative datasets using in-house scripts. The training sets drew roughly equal numbers of sequences from the positive and negative sets. The negative set incorporated sequences from rRNAs, snoRNAs, snRNAs, tRNAs, Alu elements, SINE elements and pseudo-hairpins, as shown in Table 1.

Table 1 Number of sequences used in testing and training sets

                         Training set                                     Testing set
                         +ve    -ve                                       +ve    -ve
Species                  miR    r      sno    sn     t      sines  ps     miR    r      sno    sn     t      sines  ps
Homo sapiens             584    98     98     98     86     5      250    500    97     97     97     86     5      225
Canis familiaris         158    34     34     34     0      0      55     159    33     35     35     0      0      53
Mus musculus             348    75     75     75     0      0      123    348    75     75     75     0      0      123
Rattus norvegicus        195    43     45     44     0      0      60     196    30     30     30     0      0      84
Drosophila melanogaster  112    12     24     11     25     0      40     113    13     24     11     25     0      40
Caenorhabditis elegans   108    5      25     25     25     0      25     109    5      25     25     25     0      25

where +ve= positive set, -ve= negative set, miR= miRNA-containing sequences, r= rRNA sequences, sno= snoRNA sequences, sn= snRNA sequences, t= tRNA sequences, sines= SINE element sequences, ps= pseudo-hairpin sequences
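The in-house selection scripts are not shown on this page. As a rough illustration only, a random, balanced selection of the kind described above Table 1 could be sketched as follows. The function name `split_balanced`, the 50/50 train/test fraction and the fixed seed are assumptions made for this example, not details of the miR-BAG scripts.

```python
import random


def split_balanced(positives, negatives, train_fraction=0.5, seed=0):
    """Randomly split positives into train/test and pair each part
    with an equally sized random draw of negatives (no sequence reused)."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)

    n_train = int(len(pos) * train_fraction)
    n_test = len(pos) - n_train
    if len(neg) < n_train + n_test:
        raise ValueError("not enough negative sequences for a balanced split")

    train = {"pos": pos[:n_train], "neg": neg[:n_train]}
    test = {"pos": pos[n_train:], "neg": neg[n_train:n_train + n_test]}
    return train, test


# Toy identifiers standing in for real sequences (hypothetical data).
train, test = split_balanced([f"mir{i}" for i in range(10)],
                             [f"neg{i}" for i in range(30)])
print(len(train["pos"]), len(train["neg"]), len(test["pos"]), len(test["neg"]))
```

Drawing the negatives without replacement keeps the training and testing sets disjoint, so the test set remains unseen data.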

For most of the classifiers, accuracy exceeded 90%, with an average accuracy of 91%. Average sensitivity was 89% and average specificity was 93%. The highest accuracy (93.24%) and the highest MCC (0.86) were achieved by the Rattus norvegicus classifier; Canis familiaris matched this MCC at two-decimal precision (Table 2).

Table 2 Performance of miR-BAG on different species.

Species                  TP   FN   TN   FP   Sensitivity  Specificity  Accuracy  MCC
Homo sapiens             449  51   556  51   0.89         0.91         0.90      0.81
Canis familiaris         143  16   150  6    0.89         0.96         0.93      0.86
Mus musculus             302  46   319  29   0.86         0.91         0.89      0.78
Rattus norvegicus        184  12   161  13   0.93         0.92         0.93      0.86
Drosophila melanogaster  102  11   108  5    0.90         0.95         0.92      0.85
Caenorhabditis elegans   97   12   98   7    0.88         0.93         0.91      0.82

where TP= True Positive, FN= False Negative, TN= True Negative, FP= False Positive, MCC= Matthews Correlation Coefficient

miR-BAG combines three classifiers (SVM, Naive Bayes and Best First Decision Tree) into a single classifier using the bootstrap aggregating (Bagging) methodology. The contribution of each classifier was analyzed by removing them one at a time and observing the effect on performance. This was tested on the animal classifier (Homo sapiens), as shown in Table 3.

Table 3 Effect of different classifiers

Homo sapiens                              Sensitivity(%)  Specificity(%)  Accuracy(%)
All three classifiers                     89.80           91.59           90.78
SVM and Naive Bayes                       95.00           76.46           83.73
Best First Decision Tree and Naive Bayes  93.20           76.27           83.92
Best First Decision Tree and SVM          92.40           80.56           85.90
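This page does not describe how the outputs of the three classifiers are combined. Purely as an illustration of the leave-one-out comparison in Table 3, the sketch below aggregates per-classifier labels by majority vote and then repeats the aggregation with each classifier removed. The majority-vote rule, the tie-breaking toward the positive class, the classifier keys and the toy predictions are all assumptions for this example. It covers only the aggregation step, not the bootstrap resampling used to train a bagged ensemble.

```python
def majority_vote(votes, tie_label=1):
    """Combine 0/1 votes for one sequence; ties go to tie_label (an assumption)."""
    positives = sum(votes)
    negatives = len(votes) - positives
    if positives == negatives:
        return tie_label
    return 1 if positives > negatives else 0


def ensemble_predict(predictions, tie_label=1):
    """predictions maps a classifier name to its list of 0/1 labels."""
    per_sequence = zip(*predictions.values())
    return [majority_vote(list(votes), tie_label) for votes in per_sequence]


def leave_one_out(predictions, tie_label=1):
    """Re-run the vote with each classifier dropped in turn."""
    results = {}
    for dropped in predictions:
        kept = {name: labels for name, labels in predictions.items()
                if name != dropped}
        results[dropped] = ensemble_predict(kept, tie_label)
    return results


# Hypothetical predictions for four sequences (1 = miRNA, 0 = not miRNA).
preds = {
    "svm": [1, 0, 1, 0],
    "naive_bayes": [1, 1, 0, 0],
    "bf_tree": [0, 1, 1, 0],
}
print("all three:", ensemble_predict(preds))
for dropped, labels in leave_one_out(preds).items():
    print(f"without {dropped}:", labels)
```

With only two classifiers every disagreement is a tie, so the tie-breaking rule largely decides the trade-off between sensitivity and specificity in such a reduced ensemble.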

miR-BAG introduces two novel features along with a novel implementation of a previously existing feature. To analyze the effectiveness of these features, feature selection was performed; it showed that the two features contributed significantly to the overall classification. Table 4 lists the rank and f-score of the matrix-based scoring feature for each species.

Table 4 Feature score of matrix for different species.

Species                  Matrix based scoring
                         Rank  f-score
Homo sapiens             17    0.53
Canis familiaris         2     1.16
Mus musculus             1     0.81
Rattus norvegicus        1     1.02
Drosophila melanogaster  49    0.33
Caenorhabditis elegans   1     0.72
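The page does not state which f-score definition underlies Table 4. One definition commonly used for feature ranking compares the separation of the class means with the within-class variances. The sketch below assumes that definition; the feature names and values are made up for illustration.

```python
from statistics import mean, variance


def f_score(pos_values, neg_values):
    """Between-class separation of one feature divided by within-class spread.

    Uses sample variances (n - 1 denominator), so each class needs at least
    two values and at least one class must have non-zero variance.
    """
    overall = mean(list(pos_values) + list(neg_values))
    between = (mean(pos_values) - overall) ** 2 + (mean(neg_values) - overall) ** 2
    within = variance(pos_values) + variance(neg_values)
    return between / within


def rank_features(features):
    """features maps a name to (pos_values, neg_values); best feature first."""
    scores = {name: f_score(pos, neg) for name, (pos, neg) in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical feature values for positive and negative sequences.
toy = {
    "matrix_score": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
    "uninformative": ([1.0, 5.0, 3.0], [2.0, 4.0, 3.0]),
}
for rank, (name, score) in enumerate(rank_features(toy), start=1):
    print(rank, name, round(score, 2))
```

A larger score means the feature separates the two classes better on its own; the rank is simply the feature's position when all features are sorted by this score.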

To further analyze the effect of the structural-profile-based matrix, the matrix scoring feature was removed and classification was repeated using the remaining features. Removing the feature lowered accuracy in five of the six species, with Canis familiaris the exception, which shows the contribution of the structural-profile-based matrix (Table 5).

Table 5 Effect of matrix on classification

                         Classification With Matrix                      Classification Without Matrix
Species                  Sensitivity (%)  Specificity (%)  Accuracy (%)  Sensitivity (%)  Specificity (%)  Accuracy (%)
Homo sapiens             89.80            91.59            90.78         91.00            90.44            90.69
Canis familiaris         89.93            96.15            93.01         91.19            95.51            93.33
Mus musculus             86.78            91.66            89.22         89.94            87.93            88.93
Rattus norvegicus        93.87            92.52            93.24         92.34            91.37            91.89
Drosophila melanogaster  90.26            95.57            92.92         85.84            94.69            90.26
Caenorhabditis elegans   88.99            93.33            91.12         88.07            93.33            90.65
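The effect of the matrix feature is easiest to read as a per-species difference in accuracy. The short sketch below computes that difference from the accuracy columns of Table 5; the variable names are chosen for this example.

```python
# Accuracy (%) with and without the matrix feature, copied from Table 5.
accuracy = {
    "Homo sapiens": (90.78, 90.69),
    "Canis familiaris": (93.01, 93.33),
    "Mus musculus": (89.22, 88.93),
    "Rattus norvegicus": (93.24, 91.89),
    "Drosophila melanogaster": (92.92, 90.26),
    "Caenorhabditis elegans": (91.12, 90.65),
}

# Positive delta: the classifier is more accurate when the matrix is included.
deltas = {species: with_matrix - without_matrix
          for species, (with_matrix, without_matrix) in accuracy.items()}

for species, delta in deltas.items():
    print(f"{species}: {delta:+.2f} percentage points")

improved = [species for species, delta in deltas.items() if delta > 0]
print(f"{len(improved)} of {len(deltas)} species gain accuracy with the matrix")
```

The largest gains are for Drosophila melanogaster (2.66 percentage points) and Rattus norvegicus (1.35 percentage points), while Canis familiaris is the only species with a slightly lower accuracy when the matrix is included.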

Benchmarking:

To assess its performance, miR-BAG was compared with six other tools on the Homo sapiens test data set.

Table 6 Performance of miR-BAG against six other tools on unseen data.

Animal (Human)  Sensitivity(%)  Specificity(%)  Accuracy(%)
miRPara         83.80           78.91           81.12
miRNASVM        43.20           94.72           71.45
mirEval         81.40           79.24           80.21
Triplet SVM     70.20           93.57           83.02
microPred       16.60           67.54           44.53
CSHMM           98.40           24.05           57.63
miR-BAG         89.80           91.59           90.78

ROC Curve:

[Figure: ROC curve for the animal classifiers]


Copyright © 2012, CSIR-Institute of Himalayan Bioresource Technology.
Developed & Maintained by Heikham Russiachand Singh, SCBB, Biotech Division