Somatic variants are increasingly recognised as contributors to neurodegenerative disease, yet accurate detection in post-mortem brain tissue remains challenging. In amyotrophic lateral sclerosis (ALS), where ~90% of cases are sporadic, recent evidence suggests that low-frequency, brain-specific somatic mutations may initiate focal neurodegeneration. Detecting such variants in high-depth (250X) whole-genome sequencing (WGS) data requires specialised workflows that distinguish true somatic variants from false positives. However, many somatic variant detection tools were developed for cancer genomics and are not optimised for these requirements.
We therefore developed a reproducible Somatic Variant Detection (SVD) pipeline, implemented in Nextflow, for accurate identification of low-frequency somatic variants from high-depth (250X) WGS of brain tissue. The pipeline uses modular workflows for quality control, variant calling, filtering, and annotation, and runs in a fully containerised environment so that it scales across high-performance computing (HPC) systems. It implements two analytical branches: a matched-tissue workflow that uses affected (brain) and unaffected (blood) samples from the same individual, and a single-tissue workflow optimised for studies without a matched sample. A technically matched Panel of Normals (PoN) variant catalogue, constructed from 143 ALS whole-genome datasets (50X coverage), was incorporated to remove recurrent technical artefacts.
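The PoN filtering step can be illustrated with a minimal Python sketch. It is not the pipeline's code: the `Variant` type, the function names, and the `min_samples` recurrence threshold are hypothetical, and matching candidates to the PoN by exact (chromosome, position, ref, alt) is an assumption made for illustration.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple

# Hypothetical site key: (chromosome, position, reference allele, alternate allele)
SiteKey = Tuple[str, int, str, str]


class Variant(NamedTuple):
    """Hypothetical candidate somatic call with its variant allele frequency."""
    chrom: str
    pos: int
    ref: str
    alt: str
    vaf: float


def build_pon(normal_callsets: Iterable[Iterable[SiteKey]],
              min_samples: int = 2) -> Set[SiteKey]:
    """Collect sites seen in at least `min_samples` normal samples.

    `min_samples` is an illustrative threshold, not a value from the source.
    """
    counts = {}
    for callset in normal_callsets:
        # set() ensures each sample contributes at most once per site
        for key in set(callset):
            counts[key] = counts.get(key, 0) + 1
    return {key for key, n in counts.items() if n >= min_samples}


def filter_with_pon(candidates: Iterable[Variant],
                    pon: Set[SiteKey]) -> List[Variant]:
    """Drop candidates whose exact site and alleles recur in the PoN."""
    return [v for v in candidates
            if (v.chrom, v.pos, v.ref, v.alt) not in pon]
```

In this sketch, a site that recurs across unrelated normal samples is treated as a likely technical artefact and removed, whereas a candidate absent from the PoN is retained for downstream filtering.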
Pipeline performance was evaluated using simulated data in which ~5,000 variants were spiked into 250X brain WGS data at variant allele frequencies (VAFs) of 10%, 5%, 3%, 2%, and 1%. In both analysis branches, the SVD pipeline achieved high precision and recall, confidently detecting somatic variants at VAFs as low as 3%.
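A spike-in evaluation of this kind can be scored along the following lines. This Python sketch is an assumption about how such a benchmark might be computed, not the authors' evaluation code: the function name `evaluate` and the representation of variants as hashable site keys are hypothetical. Recall is reported per VAF tier because each spiked variant has a known VAF; precision is reported overall because a false-positive call has no true VAF tier to be assigned to.

```python
from typing import Dict, Hashable, Set, Tuple


def evaluate(truth_by_vaf: Dict[float, Set[Hashable]],
             calls: Set[Hashable]) -> Tuple[float, Dict[float, float]]:
    """Return (overall precision, recall per spiked VAF tier).

    truth_by_vaf maps each spiked VAF (e.g. 0.03) to the set of spiked sites.
    calls is the set of sites reported by the caller after filtering.
    """
    # Union of all spiked sites across VAF tiers
    all_truth = set().union(*truth_by_vaf.values())

    # Precision: fraction of reported calls that are spiked variants
    true_positives = len(calls & all_truth)
    precision = true_positives / len(calls) if calls else 0.0

    # Recall: fraction of spiked variants recovered, per VAF tier
    recall = {
        vaf: (len(sites & calls) / len(sites) if sites else 0.0)
        for vaf, sites in truth_by_vaf.items()
    }
    return precision, recall
```

Under this scheme, a detection limit such as the 3% VAF reported above would correspond to the lowest tier at which recall remains acceptably high.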
This work establishes a reproducible and scalable framework for somatic variant discovery in high-depth WGS data. The SVD pipeline enables identification of brain-specific genomic variation in ALS and related neurodegenerative disorders.