Intronic GGGGCC (G4C2) hexanucleotide repeat expansions in C9orf72 are the most common genetic cause of amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD). Despite its intronic location, the repeat supports synthesis of pathogenic dipeptide repeat (DPR) proteins through repeat-associated non-AUG (RAN) translation, and these DPRs contribute to neurodegeneration. However, the endogenous mRNA template for RAN translation in patients, and how the repeat engages the ribosome, remain unclear. Here we used a long-read-based 5' RNA ligase-mediated rapid amplification of cDNA ends strategy (5' Repeat-RLM-RACE) to identify novel C9orf72 transcripts that initiate within intron 1 in a C9BAC mouse model, in patient-derived iNeurons, and in iNeuron-derived polysomes. These cryptic m7G-capped mRNAs are polyadenylated, are more abundant than transcripts derived from intron retention or circular intron lariats, and are more efficient templates for RAN translation.
To understand the mechanism underlying RAN translation initiation, we used cryo-electron microscopy to determine the structure of a human late-stage C9orf72 translation initiation complex at 3.2 Å resolution. RAN translation from linear C9orf72 mRNA templates initiates at a near-cognate CUG codon located 24 nucleotides upstream of the G4C2 repeat; this CUG is flanked by a strong endogenous Kozak sequence that contributes to its selection as the start site. The structure further reveals a direct kissing-loop interaction between the G4C2 repeat and 18S ribosomal RNA expansion segment 9 (ES9S) on the 40S small ribosomal subunit. Blocking this RNA-RNA interaction with complementary antisense oligonucleotides markedly reduces RAN translation, indicating that the interaction is functionally important.
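Because 24 is a multiple of 3, a CUG located 24 nucleotides upstream of the repeat places the repeat in the same reading frame as the start codon (the conclusion is unchanged if the distance is instead counted from the end of the codon, since 27 is also a multiple of 3). The sketch below illustrates that frame arithmetic together with a simplified Kozak check. It is an illustration under stated assumptions, not the authors' analysis: the function `find_upstream_cug`, the 60-nt search window, and the toy sequence are all hypothetical, the toy sequence is synthetic (not the real C9orf72 intron 1 sequence), and "strong Kozak" is reduced to the textbook rule of a purine at position -3 and G at position +4.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CugCandidate:
    position: int            # 0-based index of the C in CUG
    distance_to_repeat: int  # nt from the C of CUG to the first nt of the repeat
    in_frame: bool           # True if that distance is a multiple of 3
    strong_kozak: bool       # simplified rule: purine at -3 and G at +4


def find_upstream_cug(seq: str, repeat_unit: str = "GGGGCC",
                      window: int = 60) -> List[CugCandidate]:
    """Hypothetical helper: scan up to `window` nt upstream of the first
    repeat unit for CUG codons and report frame and Kozak context."""
    s = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    unit = repeat_unit.upper().replace("U", "T")
    repeat_start = s.find(unit)
    if repeat_start == -1:
        raise ValueError("repeat unit not found in sequence")
    candidates = []
    # Stop so that the whole codon lies upstream of the repeat.
    for i in range(max(0, repeat_start - window), repeat_start - 2):
        if s[i:i + 3] != "CTG":
            continue
        distance = repeat_start - i
        minus3: Optional[str] = s[i - 3] if i >= 3 else None
        plus4 = s[i + 3]  # exists because i + 3 <= repeat_start
        candidates.append(CugCandidate(
            position=i,
            distance_to_repeat=distance,
            in_frame=distance % 3 == 0,
            strong_kozak=minus3 in ("A", "G") and plus4 == "G",
        ))
    return candidates


# Synthetic toy sequence (assumption, NOT the C9orf72 sequence):
leader = "T" * 10 + "ACC"   # A at position -3 of the CUG
spacer = "G" + "A" * 20     # G at +4; CUG-to-repeat distance = 3 + 21 = 24 nt
toy = leader + "CTG" + spacer + "GGGGCC" * 3

for cand in find_upstream_cug(toy):
    print(cand)
# CugCandidate(position=13, distance_to_repeat=24, in_frame=True, strong_kozak=True)
```

In a real analysis the window, the Kozak scoring scheme, and how "upstream distance" is measured would all need to follow the actual transcript annotation; the toy case only shows why a 24-nt offset keeps the repeat in frame.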
Together, these findings provide mechanistic insight into RAN translation initiation and highlight potential therapeutic approaches to mitigate toxic DPR production in C9orf72-associated neurodegenerative disease.