p-TAREF performance (Project funded by DBT, Govt. of India.)

p-TAREF is a tool/server that identifies miRNA targets in plant transcriptomes accurately, precisely and at high speed. The training file for the model was built, using an SVR classifier, from 104 experimentally validated sequences (source: Beauclair et al.*) as the positive dataset and 119 sequences as the negative dataset, which includes 32 experimentally validated negative sequences used by Heikham and Shankar (2010). We first tested the model using the training file itself as the test file. Out of 119 negative instances, the model predicted 100 as negative (TN), and out of 104 positive targets, 98 were identified as positive (TP), giving a sensitivity of 94.230% and a specificity of 84.033%. The accuracy of p-TAREF for this test was ~89%.

To test the model again, we used experimentally validated sequences from the ASRP** database. We removed redundancy by discarding target sequences shared with the Beauclair et al. (2010) dataset. After removing redundancy, we had 125 unique sequences on which we built our testing set. These 125 sequences are targets for 287 unique miRNAs. The model predicted 285 targets (TP) out of 287 experimentally validated targets and 100 non-targets (TN) out of 119 negative instances. The sensitivity and specificity for this test were 99.303% and 84.033% respectively, and the accuracy was ~98%. We compared our tool's performance against two recent tools for plant miRNA target identification and found that it performed better than them in many aspects.


Abbreviations:
        TP = True Positive
        TN = True Negative
        FP = False Positive
        FN = False Negative
        Sn = Sensitivity
        Sp = Specificity
        MCC = Matthews Correlation Coefficient
        ACU = Accuracy

 
         psRNATarget                  Target-align                 p-TAREF
         Beauclair et al.*  ASRP**    Beauclair et al.*  ASRP**    Beauclair et al.*  ASRP**
TP       245                119       203                103       288                285
FN       73                 168       115                184       30                 2
TN       119                119       119                119       100                100
FP       0                  0         0                  0         19                 19
Sn (%)   77.044             41.16     63.836             35.888    90.566             99.303
Sp (%)   100                100       100                100       84.033             84.033
MCC      0.6910             0.4146    0.5697             0.4586    0.726              0.874
ACU (%)  83.895             58.620    73.684             50.800    88.787             97.222

Table 1 shows data for p-TAREF when executed from the beginning. The target sequences reported by Beauclair et al. were targeted by 318 unique miRNAs. The kernel used by p-TAREF for this comparison was the linear one, which shows that even the lower-performing kernel implementation of p-TAREF achieved higher accuracy than the compared tools.
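The page does not spell out how Sn, Sp, MCC and ACU are derived from the TP/FN/TN/FP counts. The sketch below assumes the standard textbook definitions; the function name `confusion_metrics` is ours and is not part of p-TAREF. Applied to the p-TAREF counts on the Beauclair et al. dataset, it gives values that agree with the reported figures up to rounding.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return Sn, Sp and ACU as percentages, and MCC, from confusion counts."""
    sn = 100.0 * tp / (tp + fn)                    # sensitivity (TP rate)
    sp = 100.0 * tn / (tn + fp)                    # specificity
    acu = 100.0 * (tp + tn) / (tp + fn + tn + fp)  # accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal total is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "MCC": mcc, "ACU": acu}


# Counts for p-TAREF (linear kernel) on the Beauclair et al. dataset.
metrics = confusion_metrics(tp=288, fn=30, tn=100, fp=19)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Running the same function on any other column of counts is a quick way to cross-check individual table entries.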

Impact of kernel selection on accuracy

 
         Polynomial kernel            RBF/Gaussian kernel          Linear kernel
         Beauclair et al.   ASRP      Beauclair et al.   ASRP      Beauclair et al.   ASRP
TP       104                262       98                 285       98                 285
FN       0                  25        6                  2         6                  2
TN       119                119       113                113       100                100
FP       0                  0         6                  6         19                 19
Sn (%)   100                91.289    94.23              99.303    94.23              99.303
Sp (%)   100                100       94.957             94.95     84.033             84.033
MCC      1                  0.8685    0.8918             0.9522    0.781              0.874
ACU (%)  100                93.842    94.618             98.029    88.787             97.222

Table 2 shows the performance of p-TAREF with different kernels. The polynomial kernel introduces high stringency, the Gaussian kernel moderate stringency, and the linear kernel the least stringency (as suggested by the TP and FP rates of the kernels). As the sequences reported by Beauclair et al. were targeted by 318 unique miRNAs, we made our training set non-redundant by removing target sites that are targeted by miRNAs of the same family. For example, members of the mir-156 family target the same sequence at the same position (mir-156a, b, c ... have the same mature miRNA sequence), so such sequences were removed from our training data. The experimentally validated sequences from the ASRP database were targeted by 287 unique miRNAs (after removing redundancy).
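The family-level redundancy removal described above can be illustrated with a small sketch. This is not the authors' script: the record fields (`mirna`, `mature_seq`, `target_id`, `site_start`) and all example values are hypothetical, and we assume that two pairs are redundant when they share the same mature sequence, target and site position.

```python
def remove_family_redundancy(pairs):
    """Keep one miRNA-target pair per (mature sequence, target, site start)."""
    seen = set()
    unique = []
    for pair in pairs:
        # Family members with identical mature sequences that hit the same
        # site collapse onto the same key, so only the first one is kept.
        key = (pair["mature_seq"].upper(), pair["target_id"], pair["site_start"])
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    return unique


# Illustrative records only; they are not taken from the datasets.
pairs = [
    {"mirna": "mir-156a", "mature_seq": "UGACAGAAGAGAGUGAGCAC",
     "target_id": "target_1", "site_start": 120},
    {"mirna": "mir-156b", "mature_seq": "UGACAGAAGAGAGUGAGCAC",
     "target_id": "target_1", "site_start": 120},
    {"mirna": "mir-164a", "mature_seq": "UGGAGAAGCAGGGCACGUGCA",
     "target_id": "target_2", "site_start": 45},
]

unique_pairs = remove_family_redundancy(pairs)
print([p["mirna"] for p in unique_pairs])  # ['mir-156a', 'mir-164a']
```

Keying on the mature sequence rather than the miRNA name means that differently named family members collapse automatically, which matches the mir-156 example in the text.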

p-TAREF performance on Target-align/TAPIR Reference set

We downloaded the reference dataset containing 102 experimentally validated plant miRNA targets (3 targets are targeted by 2 different miRNAs, mir165/mir166, which makes a total of 105 targets), previously used by TAPIR and Target-align for their performance measurement. The target sequences were downloaded from TAIR, and the sequences and their target sites were subjected to p-TAREF. On this benchmarking reference dataset, p-TAREF performed better than TAPIR (91.83%) and Target-align (93.14%), as shown below:

             Linear kernel         RBF/Gaussian kernel   Polynomial kernel
TP           102                   102                   105
FN           3                     3                     0
TP Rate (%)  97.142                97.142                100

The TP Rate and FP Rate comparison between Tapir, Target-align and p-TAREF.

             Tapir                             Target-align                      p-TAREF
             Fasta            RNAhybrid        Less stringent   More stringent   Polynomial kernel
TP Rate (%)  91.83            93.14            97.05            93.14            100
FP Rate (%)  81.47            88.97            84.0             57.8             56.2

We used the same dataset that Target-align and Tapir used to measure their respective performances. Our tool identified a total of 689 targets (both positive and negative). After comparing the results with experimentally validated data, as practiced by Bonnet et al. in their supplementary table, p-TAREF scored a 100% true positive rate (225 targets matched the dataset; after removing duplicates, all 102 targets were successfully identified). To calculate the false positive rate, we counted as false positives the targets predicted in target mRNAs that fall outside the experimentally validated target sites (a protocol similar to that used by TAPIR and Target-align for their performance measurement).

FP Rate = (339/604) X 100 = 56.2 %
(as applied by Bonnet et al).

ROC/AUC Curve for p-TAREF

[Figure: ROC/AUC curve for p-TAREF; image not available]

References:
1. TAPIR, a web server for the prediction of plant microRNA, Bonnet et al., Bioinformatics, 2010, 12, 1566-1568.
2. Target-align: a tool for plant microRNA target identification, Xie et al., Bioinformatics, 2010, 23, 3002-3003.

Dataset Reference:
1. Endogenous siRNA and miRNA targets identified by sequencing of the Arabidopsis degradome, Addo-Quaye et al., Curr. Biol., 2008, 18, 758-762.
2. microRNA-directed phasing during trans-acting siRNA biogenesis in plants, Allen et al., Cell, 2005, 121, 207-221.
3. Comprehensive prediction of novel microRNA targets in Arabidopsis thaliana, Alves et al., Nucleic Acids Res., 2009, 37, 4010-4021.
4. Global identification of microRNA-target RNA pairs by parallel analysis of RNA ends, German et al., Nat. Biotechnol., 2008, 26, 941-946.


Performance of p-TAREF on psRNATarget dataset

The dataset used by Dai et al. was used to measure the accuracy of our tool. Out of the 46 validated targets given, 45 were identified by p-TAREF. The accession IDs of the targets and the corresponding miRNAs are given below:

Experimentally validated target IDs used by Dai et al.

Target ID   miRNA
AT1G27360   ath-miR156g
AT1G27370   ath-miR156g
AT1G53160   ath-miR156g
AT1G69170   ath-miR156g
AT2G42200   ath-miR156g
AT3G57920   ath-miR156g
AT5G43270   ath-miR156g
AT5G50570   ath-miR156g
AT5G50670   ath-miR156g
AT1G27370   ath-miR156a
AT1G53160   ath-miR156a
AT1G69170   ath-miR156a
AT2G33810   ath-miR156a
AT5G43270   ath-miR156a
AT1G66690   ath-miR163
AT1G66700   ath-miR163
AT1G66720   ath-miR163
AT3G44860   ath-miR163
AT1G56010   ath-miR164a
AT3G15170   ath-miR164a
AT5G07680   ath-miR164a
AT5G53950   ath-miR164a
AT5G61430   ath-miR164a
AT1G30330   ath-miR167c
AT5G37020   ath-miR167c
AT1G48410   ath-miR168a
AT5G12840   ath-miR169b
AT2G28550   ath-miR172a
AT3G54990   ath-miR172a
AT4G36920   ath-miR172a
AT5G60120   ath-miR172a
AT5G67180   ath-miR172a
AT1G50055   ath-miR173
AT2G27400   ath-miR173
AT2G39675   ath-miR173
AT2G39681   ath-miR173
AT1G27360   ath-miR157d
AT1G27370   ath-miR157d
AT1G53160   ath-miR157d
AT1G69170   ath-miR157d
AT2G42200   ath-miR157d
AT3G15270   ath-miR157d
AT3G57920   ath-miR157d
AT5G43270   ath-miR157d
AT5G50570   ath-miR157d
AT5G50670   ath-miR157d

Reference:
1. psRNATarget: a plant small RNA target analysis server, Dai et al., Nucleic Acids Res., 2011, 1-5.

Performance of p-TAREF on Rice miRNA target prediction

We downloaded experimentally validated rice miRNA target sequences reported by Li et al. (2010) from RiceGE: Rice Functional Genomic Express Database, and ran p-TAREF on these sequences to find miRNA targets in rice. The identification accuracy of p-TAREF on the rice sequences was high, as reported below. The rice target gene IDs are listed below:

Rice gene ID  miRNA
Os01g69830    osa-miR156
Os06g45310    osa-miR156
Os03g02970    osa-miR162
Os12g41680    osa-miR164
Os05g39650    osa-miR164
Os12g41860    osa-miR166
Os11g30370    osa-miR156
Os06g47150    osa-miR160
Os06g46270    osa-miR164
Os08g10080.1  osa-miR164
Os03g50040    osa-miR164
Os03g01890    osa-miR166
Os10g33960    osa-miR166
Os12g41950    osa-miR167
Os02g53620    osa-miR169
Os03g29760    osa-miR169
Os03g48970    osa-miR169
Os07g41720    osa-miR169
Os07g33790    osa-miR167
Os02g06910    osa-miR167
Os06g46410    osa-miR167
Os03g07880    osa-miR169
Os03g44540    osa-miR169
Os07g06470    osa-miR169
Os02g44360    osa-miR171
Os01g25484    osa-miR2105
Os04g55560    osa-miR172
Os01g04550    osa-miR172
Os05g05800    osa-miR393
Os02g47280    osa-miR396
Os12g42400    osa-miR169
Os04g46860    osa-miR171
Os05g03040    osa-miR172
Os04g32460    osa-miR393
Os01g69940    osa-miR394
Os03g47140    osa-miR396
Os06g02560    osa-miR396
Os07g46990.1  osa-miR398
Os08g37670    osa-miR408
Os08g07540    osa-miR414
Os01g55880    osa-miR414
Os12g29980    osa-miR396
Os06g29430    osa-miR396
Os06g11490    osa-miR408
Os06g11310    osa-miR528


The accuracy of p-TAREF for miRNA target prediction was 88% (Gaussian kernel) when only Arabidopsis miRNAs were considered. When we incorporated rice-specific miRNAs into our miRNA library along with the Arabidopsis miRNAs, the accuracy reached 94.66%. When we predicted targets directly through the SVR, without the RNAhybrid and filtering steps, the accuracy was 97.33%.

Reference :
1. Transcriptome-wide identification of microRNA targets in rice, Li et al., Plant J., 2010, 62, 742-759.
2. RiceGE: "http://signal.salk.edu/cgi-bin/RiceGE".

Performance of p-TAREF on Medicago truncatula target prediction

We downloaded experimentally validated Medicago truncatula target sequences reported by Jagadeeswaran et al. from the Medicago genome project and ran p-TAREF on these sequences to find miRNA targets. In their paper they found 19 targets for miRNAs. The Medicago truncatula target gene IDs are listed below:


Transcript ID  miRNA
ES612384       mtr-mir160
AC150443_32.2  mtr-mir162
AC14478_44.4   mtr-mir167
AW773594       mtr-mir168
A238429        mtr-mir172
AC146721_16.4  mtr-mir395 (removed)
AC135467_30.2  mtr-mir397 (removed)
AC161863_13.2  mtr-mir408 (removed)
BQ148941       mtr-mir160
AC203553_1.1   mtr-mir164
CU32639_14.1   mtr-mir167
AC121238_43.2  mtr-mir170
AC133780_22.2  mtr-mir393
AC203224_21.2  mtr-mir397
BG583436       mtr-mir408
AC144658       mtr-mir399
AC202360_18.1  mtr-mir2118
AC143338_38.2  mtr-mir2118
AC203224_171   mtr-mir2118

Out of the above-mentioned miRNAs, three were removed from miRBase (Release 17). We found sequences for 9 transcripts (out of 16, as 3 microRNAs were removed from miRBase) and executed p-TAREF on these sequences; the accuracy attained for this data was 100%.

Reference :
1. Cloning and characterization of small RNAs from Medicago truncatula reveals four novel legume-specific microRNA families, Jagadeeswaran et al., New Phytol., 2009, 184, 85–98.
2. Medicago Genome Sequence Consortium: "http://www.medicago.org/genome/downloads/Mt2/".

Performance of p-TAREF on Solanum lycopersicum target prediction

Moxon et al. reported 12 experimentally validated Solanum lycopersicum miRNA-target sequences for 11 miRNAs, but some of these miRNAs have been discarded in miRBase (version 17); we were therefore left with 8 miRNA-target complexes. The mRNA sequences were given in their supplementary file. These targets were subjected to SVM classification, and all three kernels predicted all 8 targets successfully (100% accuracy). The target gene IDs and the corresponding miRNAs are listed below:

Target ID    miRNA
SGN-U324312  sly-miR156
SGN-U317177  sly-miR156
SGN-U319736  sly-miR156
SGN-U324618  sly-miR160a
SGN-U321033  sly-miR166
SGN-U327976  sly-miR167
SGN-U314858  sly-miR171
SGN-U333058  sly-miR172

Reference :
1. Deep sequencing of tomato short RNAs identifies microRNAs targeting genes involved in fruit ripening, Moxon et al., Genome Res., 2008, 18, 1602-1609.

Performance of p-TAREF on Populus euphratica target prediction

We downloaded Populus miRNA target sequences validated by Li et al. for Populus euphratica and Populus trichocarpa from Populus trichocarpa v1.1. These targets were subjected to SVM classification. For Populus trichocarpa, we found sequences for 17 targets out of 21, and 16 of them were successfully identified by both the Gaussian and the polynomial kernel, an accuracy of 94.11%. For Populus euphratica, 18 and 21 targets out of 24 were successfully identified by the polynomial (ACU = 75.0%) and Gaussian (ACU = 87.5%) kernels respectively.

The verified miRNA targets of Populus trichocarpa are listed below:

Target ID                                               Populus trichocarpa miRNA
jgi|Poptr1_1|733659|estExt_Genewise1_v1.C_LG_XV2187     ptc-miR156k
jgi|Poptr1_1|769914|fgenesh4_pg.C_LG_X001404            ptc-miR156k
jgi|Poptr1_1|733659|estExt_Genewise1_v1.C_LG_XV2187     ptc-miR156a
jgi|Poptr1_1|769914|fgenesh4_pg.C_LG_X001404            ptc-miR156a
jgi|Poptr1_1|178285|gw1.I.6885.1                        ptc-miR159f
jgi|Poptr1_1|558202|eugene3.00091462                    ptc-miR159f
jgi|Poptr1_1|208135|gw1.V.3536.1                        ptc-miR164f
jgi|Poptr1_1|218417|gw1.VII.2722.1                      ptc-miR164f
jgi|Poptr1_1|570963|eugene3.00130327                    ptc-miR475a
jgi|Poptr1_1|570289|eugene3.00120942                    ptc-miR156k
jgi|Poptr1_1|570289|eugene3.00120942                    ptc-miR156a
jgi|Poptr1_1|208135|gw1.V.3536.1                        ptc-miR164d
jgi|Poptr1_1|218417|gw1.VII.2722.1                      ptc-miR164d
jgi|Poptr1_1|588910|eugene3.02710006                    ptc-miR482-1
jgi|Poptr1_1|675520|grail3.0250000201                   ptc-miR482-1
jgi|Poptr1_1|276236|gw1.182.27.1                        ptc-miR1444a
jgi|Poptr1_1|769914|fgenesh4_pg.C_LG_X001404            ptc-miR156i
jgi|Poptr1_1|292259|gw1.6326.1.1                        ptc-miR166b
jgi|Poptr1_1|778882|fgenesh4_pg.C_LG_XVIII000250        ptc-miR166b
jgi|Poptr1_1|832118|estExt_fgenesh4_pm.C_LG_VI0713      ptc-miR166b
jgi|Poptr1_1|797557|fgenesh4_pm.C_LG_I000560            ptc-miR166n

The newly discovered miRNAs of Populus euphratica and their targets are listed below:

Target ID                                               Populus euphratica miRNA
jgi|Poptr1_1|548199|eugene3.00010640                    peu-miR30a
jgi|Poptr1_1|579296|eugene3.105640001                   peu-miR30a
jgi|Poptr1_1|788190|fgenesh4_pg.C_scaffold_263000013    peu-miR30a
jgi|Poptr1_1|548199|eugene3.00010640                    peu-miR30b
jgi|Poptr1_1|430410|gw1.VIII.1137.1                     peu-miR67x
jgi|Poptr1_1|554868|eugene3.00031501                    peu-miR67*
jgi|Poptr1_1|548199|eugene3.00010640                    peu-miR71*
jgi|Poptr1_1|579296|eugene3.105640001                   peu-miR71*
jgi|Poptr1_1|640215|grail3.0008024501                   peu-miR71*
jgi|Poptr1_1|788190|fgenesh4_pg.C_scaffold_263000013    peu-miR71*
jgi|Poptr1_1|55274                                      peu-miR77
jgi|Poptr1_1|732312|estExt_Genewise1_v1.C_LG_XIV3469    peu-miR77
jgi|Poptr1_1|806761|fgenesh4_pm.C_LG_XIII000061         peu-miR84*
jgi|Poptr1_1|656445|grail3.0010018301                   peu-miR93aa
jgi|Poptr1_1|714215|estExt_Genewise1_v1.C_LG_IV3721     peu-miR93aa
jgi|Poptr1_1|656445|grail3.0010018301                   peu-miR93b
jgi|Poptr1_1|570289|eugene3.00120942                    peu-miR131
jgi|Poptr1_1|733659|estExt_Genewise1_v1.C_LG_XV2187     peu-miR131
jgi|Poptr1_1|755123|fgenesh4_pg.C_LG_II001303           peu-miR131
jgi|Poptr1_1|769914|fgenesh4_pg.C_LG_X001404            peu-miR131
jgi|Poptr1_1|733659|estExt_Genewise1_v1.C_LG_XV2187     peu-miR58
jgi|Poptr1_1|769914|fgenesh4_pg.C_LG_X001404            peu-miR58
jgi|Poptr1_1|793900|fgenesh4_pg.C_scaffold_9189000001   peu-miR131
jgi|Poptr1_1|829056|estExt_fgenesh4_pg.C_170200031_1    peu-miR106*
jgi|Poptr1_1|837031|estExt_fgenesh4_pm.C_1230037        peu-miR106*
jgi|Poptr1_1|434998|gw1.57.264.1jgi                     peu-miR115a
jgi|Poptr1_1|817423|estExt_fgenesh4_pg.C_LG_III1182     peu-miR123a
jgi|Poptr1_1|180750|gw1.I.9350.1                        peu-miR101a


Reference :
1. Genome-wide characterization of new and drought stress responsive microRNAs in Populus euphratica, Li et al., J Exp Bot., 2011, doi:10.1093/jxb/err051.
2. Populus trichocarpa v1.1 "http://genome.jgi-psf.org/Poptr1_1/Poptr1_1.home.html".

p-TAREF performance measure on introduction of concurrency

Execution time of p-TAREF when run on different numbers of processors. The speed measurements were made on a system with 2 X Intel Xeon (quad-core) processors and 24 GB RAM, using an input file of 381.7 kb containing 205 genes.

Tool (mismatches)   8 processors      4 processors      2 processors      1 processor
p-TAREF (4)         1 Hour 43 min     3 Hours 21 min    5 Hours 01 min    8 Hours 37 min
p-TAREF (3)         1 Hour 17 min     3 Hours 00 min    4 Hours 34 min    6 Hours 07 min
p-TAREF (2)         46 min            2 Hours 21 min    3 Hours 53 min    5 Hours 42 min
p-TAREF (1)         42 min            1 Hour 52 min     3 Hours 14 min    4 Hours 21 min
p-TAREF (0)         37 min            1 Hour 14 min     2 Hours 05 min    3 Hours 01 min
Target-align        N/A               N/A               N/A               92 Hours 26 min
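
To make the timings above easier to compare, the following sketch converts them to minutes and derives the speed-up relative to a single processor. It is only an illustration of the arithmetic: the helper names `to_minutes` and `speedups` are ours, and the figures are copied from the p-TAREF (4 mismatches) and Target-align rows.

```python
import re


def to_minutes(text):
    """Convert a string such as '1 Hour 43 min' or '46 min' to whole minutes."""
    hours = re.search(r"(\d+)\s*Hours?", text)
    minutes = re.search(r"(\d+)\s*min", text)
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


def speedups(timings):
    """Map processor count to speed-up over the single-processor run."""
    base = to_minutes(timings[1])
    return {n: base / to_minutes(t) for n, t in timings.items()}


# p-TAREF with 4 mismatches allowed, keyed by number of processors.
ptaref_4 = {8: "1 Hour 43 min", 4: "3 Hours 21 min",
            2: "5 Hours 01 min", 1: "8 Hours 37 min"}

for n, s in sorted(speedups(ptaref_4).items()):
    print(f"{n} processor(s): {s:.2f}x")

# Single-processor comparison with Target-align.
ratio = to_minutes("92 Hours 26 min") / to_minutes(ptaref_4[1])
print(f"Target-align / p-TAREF (1 processor): {ratio:.1f}x")
```

With these numbers, eight processors give roughly a fivefold speed-up over one, and the single-processor p-TAREF run is more than ten times faster than the Target-align run.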

* microRNA-directed cleavage and translational repression of the copper chaperone for superoxide dismutase mRNA in Arabidopsis, Beauclair et al., Plant J., 2010, 62, 454-462.

**Arabidopsis small RNA project "http://asrp.cgrb.oregonstate.edu/".


Work Flow Illustration

[Figure: p-TAREF work flow illustration; image not available]


Administrator: Ashwani Jha and Heikham Russiachand Singh
Copyright © 2011, Institute Of Himalayan Bioresource Technology.