Somatic mutations drive diverse diseases beyond cancer, including neurological, age-related, and autoimmune disorders. However, detecting these mutations remains challenging as they often occur in small cell subsets within heterogeneous samples, rendering bulk sequencing and conventional variant-calling methods ineffective. While single-cell DNA sequencing (scDNA-seq) technologies address these limitations, analytical tools for interpreting scDNA-seq data remain critically underdeveloped.
We developed SCARCE (Single-Cell Analysis of Rare Clonal Events), an R package that statistically prioritises rare somatic variants in defined cell populations from scDNA-seq experiments. SCARCE integrates cell surface marker protein data to identify biologically meaningful subpopulations or, when such data are unavailable, performs unsupervised variant-based clustering to identify genetically distinct cell populations. Using a one-versus-all statistical approach, SCARCE calculates odds ratios and Z-scores to prioritise variants enriched within each population. The package generates intuitive visualisations, including heatmaps, UMAP plots, and Sankey plots, to facilitate biological interpretation.
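To make the one-versus-all idea concrete, the following is a minimal sketch of how an odds ratio and Z-score could be computed for a single variant in a single population. It is an illustration under stated assumptions, not SCARCE's actual implementation or API: the function name `one_vs_all_enrichment`, the 2x2 contingency-table formulation, the 0.5 pseudocount (Haldane-Anscombe correction), and the Wald-type Z-score on the log odds ratio are all assumptions, and Python is used here only for illustration (SCARCE itself is an R package).

```python
import math


def one_vs_all_enrichment(pop_mut, pop_total, all_mut, all_total, pseudo=0.5):
    """Hypothetical one-versus-all enrichment test for a single variant.

    pop_mut   -- cells in the population of interest carrying the variant
    pop_total -- all cells in the population of interest
    all_mut   -- cells in the whole dataset carrying the variant
    all_total -- all cells in the whole dataset
    pseudo    -- pseudocount added to every cell of the 2x2 table (assumption)
    """
    # Build the 2x2 table: population vs. all remaining cells.
    a = pop_mut                              # in population, variant present
    b = pop_total - pop_mut                  # in population, variant absent
    c = all_mut - pop_mut                    # other cells, variant present
    d = (all_total - pop_total) - c          # other cells, variant absent
    if min(a, b, c, d) < 0:
        raise ValueError("inconsistent cell counts")

    # The pseudocount keeps the odds ratio finite when a table cell is zero.
    a, b, c, d = (x + pseudo for x in (a, b, c, d))

    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio, then a Wald-type Z-score.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_score = math.log(odds_ratio) / se
    return odds_ratio, z_score


# Illustrative counts only (not results from the study): a variant seen in
# 9 of 10 cells of a rare population and in 3 of the remaining 16,306 cells.
odds_ratio, z_score = one_vs_all_enrichment(9, 10, 12, 16316)
print(f"OR = {odds_ratio:.1f}, Z = {z_score:.2f}")
```

Under this formulation, ranking variants by Z-score within each population favours variants that are both strongly enriched and supported by enough cells, which is one plausible reading of how very small populations could still be prioritised.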
Benchmarking against existing tools demonstrated SCARCE's superior sensitivity: it was the only tool to detect a known pathogenic variant in a cell line comprising just 0.06% of total cells (10/16,316). In two previous studies, SCARCE correctly prioritised cell-type-specific pre-lymphoma somatic drivers in celiac disease [1] and HepC-induced cryovasculitis [2]. The framework supports multiple scDNA-seq platforms, including Mission Bio Tapestri targeted sequencing and whole-genome amplified single-cell data.
SCARCE provides a unified computational framework for identifying, prioritising, and visualising somatic variants in heterogeneous single-cell datasets. By enabling detection of ultra-rare mutational events, SCARCE advances our understanding of clonal mosaicism underlying complex diseases and opens new avenues for precision medicine in non-malignant diseases driven by somatic mutations.