miRNAs are small non-coding RNAs of ~21 nt that are expressed in both humans and plants and play multiple important roles in regulating gene expression and biological pathways. For years, small RNAs were considered miRNAs only if they followed the canonical miRNA biogenesis pathway and showed canonical features such as a terminal loop, a hairpin structure, and a duplex of mature miRNAs. It was also believed that endogenous miRNAs required host gene promoters for their transcription. In recent years, especially after the introduction of next-generation sequencing (NGS) techniques, many sRNAs have been discovered in various species. Some of these newly identified sRNAs follow the canonical miRNA biogenesis pathway, whereas others follow alternative biogenesis pathways and lack typical miRNA properties, yet have still been shown to regulate target gene expression.

We have performed a comprehensive study and identified 11,234 novel small regulatory RNAs (rsRNAs) that regulate about 17,000 unique genes. The role of these 11,234 novel rsRNAs was studied with respect to 25 different cancerous conditions. Data for these conditions were downloaded from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO). Data from 4,997 individuals were considered for the analysis, which resulted in 260 Gb of processed data. The total raw data for the complete analysis was about 20 Tb.

The rsRNA:target gene interactions were identified using TAREF and TargetScan and were validated using Argonaute CLIP-sequencing data (14 experimental conditions), CLASH-sequencing data (1 experiment), and anti-coexpression analysis using protein expression data (4,776 individuals) and RNA expression data (3,013 conditions). From these analyses we have identified several important genes involved in the regulation of cancer, cancer-related pathways, growth and development, cell cycle, apoptosis, and many more. The analyses performed in this study show that these rsRNAs originate from multiple loci, including repetitive elements and intronic regions. These novel rsRNAs also share highly conserved seed regions with the known human mature miRNAs reported in miRBase version 21. Furthermore, these rsRNAs were searched against sRNA reads from DGCR8 over-expression data to identify rsRNAs processed by the DGCR8 microprocessor complex. A total of 2,999 rsRNAs identified in this study were shown to be processed by the DGCR8 complex.
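The seed comparison described above can be illustrated with a minimal sketch. It assumes the commonly used convention that the seed spans nucleotides 2 to 8 from the 5' end; the study's exact seed definition is not stated here, and the function names and toy sequences below are hypothetical, not taken from the study.

```python
from collections import defaultdict


def seed(seq):
    """Return nucleotides 2-8 (assumed seed definition) of an RNA sequence.

    The sequence is upper-cased and any DNA-style T is converted to U.
    """
    rna = seq.upper().replace("T", "U")
    return rna[1:8]  # 0-based slice covering positions 2..8


def shared_seeds(candidates, known):
    """Map each candidate name to the known miRNAs with an identical seed."""
    # Index the known miRNAs by their seed for constant-time lookup.
    by_seed = defaultdict(list)
    for name, sequence in known.items():
        by_seed[seed(sequence)].append(name)
    # Candidates with no seed match are mapped to an empty list.
    return {
        name: sorted(by_seed.get(seed(sequence), []))
        for name, sequence in candidates.items()
    }


# Toy input: hypothetical names and illustrative sequences only.
known = {
    "known_A": "UGAGGUAGUAGGUUGUAUAGUU",
    "known_B": "UAGCAGCACGUAAAUAUUGGCG",
}
candidates = {
    "cand_1": "AGAGGUAGUCCCAAAGGGUUU",
    "cand_2": "TTTTTTTTTTTTTTTTTTTTT",
}

for name, hits in shared_seeds(candidates, known).items():
    print(name, hits)
```

In practice the `known` dictionary would be filled from a mature miRNA FASTA file such as the one distributed by miRBase; the sketch only tests for exact seed identity and does not model mismatches or seed-type variants.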

It was believed that miRNA/sRNA biogenesis depends on the presence of a terminal loop, with mature miRNAs arising from the stem region when Dicer cuts the double-stranded stem of the stem-loop structure, leaving two overhangs. After the introduction of small RNA NGS technologies, many new miRNAs were reported in miRBase that do not meet these criteria: the presence of a terminal loop region (hsa-miR-181d, hsa-miR-141); the presence of a paired mature miRNA, which is absent for about 45% of all reported mature miRNAs (hsa-miR-944, hsa-miR-378d-2); mature miRNAs arising from the stem rather than the loop region (hsa-miR-451a, hsa-miR-7111); sRNA reads mapping to the reported mature miRNA rather than to other regions (hsa-miR-5680, hsa-miR-5697); and mature miRNAs with overhangs (hsa-miR-7108). These miRNAs suggest that following only the canonical, traditional pathway of miRNA biogenesis has become a bottleneck in miRNA identification: it biases the analysis so that only miRNAs following the patterns of those already reported in miRBase are themselves reported. Small RNAs have been reported as an important regulatory component of the genome, and considering as miRNAs only those sRNAs that follow the traditional biogenesis patterns narrows our understanding of the molecular mechanisms of cell growth and development.


The Cancer Genome Atlas (TCGA) has recently emerged as the prime repository for cancer-related genomics data. TCGA hosts an enormous amount of data for approximately 20 cancer types, freely available to researchers worldwide. The data include RNA-seq based digital gene expression, sRNA reads, protein-based expression, microarray-based gene expression, DNA methylation data, etc., for all the studied cancer conditions.

Studio of Computational Biology & Bioinformatics,
Biotech Division,
CSIR-Institute of Himalayan Bioresource Technology,
Palampur 176061 (Himachal Pradesh), India