After target prediction, the small RNAs were found to target 17,612 unique genes. For further study, only those sRNAs from the sRNA:target interaction data that were expressed two fold or above in any state were considered. After applying these filters, a total of 11,234 potential novel regulatory sRNAs (rsRNAs) were identified. Of these 11,234 rsRNAs, 9,860 displayed higher abundance in cancerous conditions, whereas 564 showed higher abundance in normal conditions. The remaining 810 rsRNAs did not show any preference for cancerous or normal states, as they were overexpressed in some cancerous as well as some normal states.
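The grouping described above can be illustrated with a minimal sketch. It assumes that each rsRNA has one RPM value per condition for the cancerous and the matched normal state; the function name, the pseudocount, and the output labels are hypothetical and are not taken from the study.

```python
def classify_rsrna(cancer_rpm, normal_rpm, fold=2.0, pseudocount=1.0):
    """Label one rsRNA from its per-condition RPM values.

    cancer_rpm and normal_rpm map a condition name (e.g. "BRCA") to RPM.
    The pseudocount is an assumption added here to avoid division by zero.
    """
    up_in_cancer = False
    up_in_normal = False
    for condition, cancer_value in cancer_rpm.items():
        normal_value = normal_rpm[condition]
        ratio = (cancer_value + pseudocount) / (normal_value + pseudocount)
        if ratio >= fold:
            up_in_cancer = True
        elif ratio <= 1.0 / fold:
            up_in_normal = True
    if up_in_cancer and up_in_normal:
        return "mixed"         # overexpressed in some cancerous and some normal states
    if up_in_cancer:
        return "cancer"
    if up_in_normal:
        return "normal"
    return "filtered_out"      # never reaches the two fold threshold
```

For example, `classify_rsrna({"BRCA": 40.0}, {"BRCA": 5.0})` falls in the cancer group, while an rsRNA that is two fold higher in cancer for one condition and two fold higher in normal tissue for another is labelled as mixed.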
To identify the possible DGCR8-DROSHA mediated rsRNAs, the rsRNAs were scanned across the DGCR8 expressed data and compared against the DGCR8 knockout sRNA sequencing data. The analysis revealed that 2,999 (26.69%) of the novel rsRNAs were processed by DGCR8.
To determine whether these DGCR8 specific rsRNAs exhibited the typical pre-miRNA like hairpin loop structure, their genomic sequences with 200 bp flanking regions were scanned using MirEval and RNAfold. It was found that 2,204 rsRNA precursors had a hairpin loop precursor like structure. Of these, 1,125 sRNAs were located almost perfectly within the stem region.
Validation of rsRNA:target interactions was done by searching for the interactions in the Argonaute sequencing data downloaded from starBase version 2. A total of 149,344, 150,049, 145,561 and 148,251 novel putative small regulatory RNA:target interactions were identified for 8,036, 6,247, 8,342 and 7,196 unique rsRNAs and 14,265, 12,448, 14,438 and 13,674 unique targeted genes in the AGO1, AGO2, AGO3 and AGO4 sequencing data, respectively.
CLASH-seq data were also used for the identification of rsRNA:target interactions. Compared to CLIP-seq, CLASH-seq data give more specific information about the interactions, because each interaction is captured directly through ligation of the target site to the targeting small RNA. Using CLASH-seq data, 16,371 unique genes were found to be targeted by 10,048 putative rsRNAs, comprising 474,770 unique target interactions.
The list of common and unique genes identified in AGO based HITS-CLIP data and CLASH data.
To further validate the interactions, an expression anti-correlation based validation study was performed using sRNA expression (RPM) and the expression of target genes (RPKM) obtained from TCGA and GEO. A total of 3,013 cancerous and normal tissue based experimental conditions were considered. For those cancerous conditions in which RPKM data were not available, microarray based expression data from TCGA (for 137 patients and normal conditions) were used. A high level of agreement was observed between the different methods of validation for the identified rsRNA:target interactions.
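The anti-correlation check can be sketched as follows. This is only an illustration: it assumes a Pearson correlation between the rsRNA RPM profile and the target RPKM profile over the same ordered set of conditions, and the cutoff of -0.5 as well as all function and variable names are hypothetical, since the exact criterion is not stated here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def anticorrelated_pairs(srna_rpm, gene_rpkm, interactions, threshold=-0.5):
    """Keep predicted (rsRNA, gene) pairs whose expression is anti-correlated.

    srna_rpm / gene_rpkm: dicts mapping a name to expression values listed
    in the same condition order. threshold is an assumed cutoff.
    """
    supported = []
    for srna, gene in interactions:
        r = pearson_r(srna_rpm[srna], gene_rpkm[gene])
        if r <= threshold:
            supported.append((srna, gene, r))
    return supported
```

A predicted pair whose target expression falls as the rsRNA expression rises is retained, whereas a pair whose profiles rise together is discarded.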
To find the targets common to the validation approaches described above (AGO1-4, protein abundance support, RPKM based, and microarray based), a Venn representation was made. As is apparent from it, two or more methods agreed for most of the interactions, strongly supporting a regulatory role for these sRNAs.
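The agreement between validation methods can be tallied with a short sketch such as the one below. It assumes that each method yields a set of (rsRNA, gene) pairs; the method labels and function names are illustrative only.

```python
from collections import Counter


def support_counts(evidence):
    """Count how many validation methods support each interaction.

    evidence: dict mapping a method label to a set of (rsRNA, gene) pairs.
    """
    counts = Counter()
    for pairs in evidence.values():
        counts.update(set(pairs))  # each pair counts once per method
    return counts


def supported_by_at_least(evidence, k=2):
    """Return the interactions reported by at least k methods."""
    return {pair for pair, n in support_counts(evidence).items() if n >= k}
```

Calling `supported_by_at_least(evidence, k=2)` returns the interactions that lie in the intersection regions of the Venn representation.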
These novel sRNAs were found to regulate many genes, including some important genes involved in cancer such as BRCA2, p53, Rb, Myc, 14-3-3 epsilon and CycD, and more generally genes in pathways related to cancer.
*Red = overexpressed in cancer ; Green = overexpressed in normal.
From the mapped data, it was found that these small regulatory RNAs have multiple biogenesis loci, mainly belonging to repetitive elements (48.22%), intronic regions (36.02%) and ncRNA regions (9.31%). Many of the known miRNAs have been reported from intronic regions; in fact, 46.03% (866 out of 1,881) of the miRNAs in the current version of miRBase are from intronic regions.
Several rsRNAs were found to originate from Alu elements. The distribution profile of these sRNAs across the length of the Alu consensus showed that these rsRNAs follow a conserved pattern of biogenesis across a number of different experimental conditions. A multiple chart visualization of the variation in Alu derived sRNA expression profiles for the different cancer states and normal tissues has been made available to showcase this behavior. The sRNAs originating from Alu displayed profiles that were highly conserved across individuals but differed between the normal and cancer samples. A series of t-tests between cancer and normal conditions gave consistently significant p-values (p<0.05) for this observation, suggesting that the profile of sRNAs originating from Alu differs significantly between cancer and normal states.
The percentage distribution of reads on rsRNAs originating from repeats was also calculated. This analysis was performed to normalize the rsRNA reads, as a given rsRNA from repeats could map to multiple loci. It was found that rsRNAs from the ERVL and Alu families were distributed non-randomly across the genome, followed by other repeat families.
The pathway enrichment analysis of rsRNAs significantly up-regulated in cancer and originating from Alu regions showed that the target genes of these rsRNAs were involved in regulation of apoptosis, colorectal cancer, renal cell carcinoma, melanoma, small cell lung cancer, pancreatic cancer and chronic myeloid leukemia pathways. In contrast, rsRNAs significantly up-regulated in normal conditions targeted genes involved in apoptosis, neuroactive ligand receptor interaction, colorectal cancer, renal cell carcinoma, fatty acid biosynthesis, tight junction, glycosaminoglycan degradation and cell cycle.
One such novel small regulatory RNA derived from Alu, identified as "rsRNA-6458-n", targets some important genes such as ADAR, ATM, BCL2L1, BIRC2, ERBB3, ITGA2, NRAS, RPS6, SRC, STAT3, SYK and VHL. One of its important targets, SRC, is a key component in signal transduction and is usually found overexpressed in cancer conditions. rsRNA-6458-n exhibited higher expression (two fold and above) in the normal tissue for the READ, KIRC, LUAD, BRCA, HNSC, STAD, PRAD, LUSC, UCEC and OV types of cancer. Consistent with this, its target gene, SRC, was overexpressed in these cancerous states in terms of RNA abundance (RPKM) as well as protein abundance.
The identified rsRNAs were clustered using four different methods, namely 1) seed region similarity (compared with known miRNA seeds), 2) sRNA length and Argonaute association, 3) coverage of all smaller sRNAs by a single long sRNA (longest coordinates bound based), and 4) expression based clustering. These clusterings were done to identify the rsRNAs sharing similar properties. rsRNAs in a cluster were found to target the same or similar genes and to regulate similar pathways and functions. Expression based clustering resulted in 362 distinct clusters comprising 7,292 rsRNAs. These rsRNAs shared high co-expression with each other ("r" > 0.5), calculated over 4,997 experimental conditions.
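One simple way to picture expression based clustering with the "r" > 0.5 criterion is sketched below. The grouping rule used here (rsRNAs joined whenever their pairwise Pearson correlation exceeds the cutoff, i.e. connected components of the co-expression graph) is an assumption made for illustration; the clustering algorithm actually used is not specified in this section, and all names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def coexpression_clusters(profiles, min_r=0.5):
    """Group rsRNAs linked by pairwise correlation above min_r.

    profiles: dict mapping an rsRNA name to its expression values across
    the same ordered conditions. Singletons are not reported.
    """
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if pearson_r(profiles[a], profiles[b]) > min_r:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values() if len(members) > 1)
```

With this rule, two rsRNAs can end up in the same cluster through a chain of co-expressed intermediates, which is a looser criterion than requiring every pair in a cluster to exceed the cutoff.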
The clusters identified through these approaches were analyzed for overlap. High commonality between clusters generated from different properties suggests a degree of similarity and relationship between those properties. A 4x4 matrix was generated for similarity scoring between the clusters of the four types mentioned above. It was found that expression based clusters shared ~82% similarity with the clusters generated using coordinates bound based clustering, 22% with seed based clusters and 0.23% with length based clusters. Coordinates bound based clusters displayed 77% overlap with seed based clusters and 0.09% overlap with length based clusters. This analysis suggested that the three clustering methods based on seed region, coordinates bound and expression similarity agreed with each other: the co-expressed sRNAs were also similar to each other in terms of genomic coordinates and shared a good number of common targets.
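A similarity matrix of this kind could be computed along the lines of the sketch below. The overlap measure used here (the percentage of rsRNA pairs co-clustered by one method that are also co-clustered by the other) is an assumption chosen for illustration, as the exact scoring scheme is not described in this section; the function names are hypothetical.

```python
from itertools import combinations


def coclustered_pairs(clusters):
    """Return the set of unordered rsRNA pairs that share a cluster."""
    pairs = set()
    for members in clusters:
        unique_members = sorted(set(members))
        pairs.update(frozenset(p) for p in combinations(unique_members, 2))
    return pairs


def overlap_matrix(clusterings):
    """Percentage of pairs co-clustered by method A that B also co-clusters.

    clusterings: dict mapping a method label to a list of clusters, where
    each cluster is a list of rsRNA names. Returns {(A, B): percent}.
    """
    pair_sets = {name: coclustered_pairs(c) for name, c in clusterings.items()}
    matrix = {}
    for a, pairs_a in pair_sets.items():
        for b, pairs_b in pair_sets.items():
            shared = len(pairs_a & pairs_b)
            matrix[(a, b)] = 100.0 * shared / len(pairs_a) if pairs_a else 0.0
    return matrix
```

Passing the four clusterings (seed, length, coordinates bound and expression based) yields the 16 entries of a 4x4 matrix; under this particular measure the matrix is not necessarily symmetric.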
Several novel regulatory small RNAs were found to be significantly differentially expressed between cancerous and normal conditions. For every cancer condition, differential expression was evaluated across a large number of individual samples, followed by a t-test for significance between the normal sample set and the cancer sample set.
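The per-condition comparison can be sketched as below. The sketch assumes an unequal variance (Welch) form of the t-test, which is not confirmed by the text, and it computes only the t statistic and degrees of freedom; converting these to a p-value requires the t distribution from a statistics library. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(cancer, normal):
    """Welch t statistic and degrees of freedom for two sample sets.

    cancer / normal: expression values (e.g. RPM) of one rsRNA across the
    individual samples of one cancer condition; each needs >= 2 values.
    """
    n1, n2 = len(cancer), len(normal)
    v1 = variance(cancer) / n1   # squared standard error, cancer set
    v2 = variance(normal) / n2   # squared standard error, normal set
    se2 = v1 + v2
    if se2 == 0:
        raise ValueError("both sample sets are constant")
    t = (mean(cancer) - mean(normal)) / sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

A positive t indicates higher mean expression in the cancer samples, a negative t higher expression in the normal samples.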
Pathway enrichment analysis of the rsRNAs up-regulated in cancer showed that their target genes were enriched for apoptosis, renal cell carcinoma, role of BRCA1, BRCA2 and ATR in cancer susceptibility, caspase cascade in apoptosis and EGFR1 Signaling Pathway (Mus musculus). In contrast, rsRNAs up-regulated in normal tissues were found to target genes involved in induction of apoptosis through DR3 and DR4/5 death receptors, VEGF, hypoxia and angiogenesis, hedgehog signaling pathway, JAK-STAT signaling pathway and Signaling Pathways in Glioblastoma (Homo sapiens).
An interesting case is rsRNA 9881. This small regulatory RNA was found overexpressed in almost all cancer states studied here, suggesting that it affects some central regulatory points. There were 20 different target genes whose expression was strongly negatively correlated with its expression. A closer analysis revealed that these target genes were enriched for pathways critical for cell development and cancer, at the interfaces of diverse pathways (apoptosis, cell death, p53 signaling, HIV-1 Nef, caspase cascade, TLR, TNFR-1 signaling and FAS pathway), which may explain why rsRNA 9881 was found abundant in most of the studied cancer conditions. The figure below shows how its target genes are interlinked and positioned, making it critical for cancer conditions. Adipogenesis was the common factor found most affected.