The dysregulation of splicing can drive disease and is a hallmark of cancer, arising from both specific genetic mutations and altered splicing machinery. The discovery of novel splice junctions can lead to new knowledge and treatments, yet absence from a reference annotation alone does not establish that a splice junction is truly novel. Efficiently querying publicly available sequencing data for evidence of rare splice junctions is challenging because of the cost of processing raw data and the substantial technical and biological variability across samples.
LemonSplice overcomes this challenge by leveraging precomputed RNA-seq alignments and gene quantification across tens of thousands of experiments, all processed using a common pipeline [1,2]. The pre-processed data from databases such as the Sequence Read Archive (SRA), Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA) includes exon and junction information (unannotated junctions among them), allowing exploration, validation and quantification of transcripts across hundreds of thousands of samples.
LemonSplice is an interactive application that can rapidly filter evidence for disease-specific and cancer-specific splice junctions of interest and produce publication-ready visualisations of the frequency of junctions across databases. Metadata associated with samples can link junction observations to the literature in which they were reported.
We applied LemonSplice to several cancer splicing problems. These include an analysis of glioma-associated oncogene homolog 1 (GLI1) transcripts that indicated a lack of support for a commonly accepted cancer-specific isoform and implicated unannotated transcripts as well-supported alternatives. We also used LemonSplice to investigate deletions in IKZF1, and discovered and validated cryptic acceptor sites downstream of partially deleted genes in leukaemia samples. By directly querying reads that support alternative splicing across thousands of experiments, LemonSplice can be used to gain insight into a range of rare splicing events and to determine just how common novel splice junctions might be.