Oral Presentation 47th Lorne Genome Conference 2026

Pre-ranked GSEA is hugely anti-conservative but can be fixed. (133160)

Waruni Abeysekera 1,2, Gordon K Smyth 1,3
  1. Walter and Eliza Hall Institute of Medical Research, Melbourne, VICTORIA, Australia
  2. Department of Medical Biology, The University of Melbourne, Parkville, VICTORIA, Australia
  3. School of Mathematics and Statistics, The University of Melbourne, Parkville, VICTORIA, Australia

Gene set enrichment analysis (GSEA) is one of the most commonly used methods in omics research. It is used routinely to identify molecular pathways, expression signatures or biological processes associated with diseases, cell types or experimental conditions. The original GSEA paper (Subramanian et al. 2005) has been cited more than 50,000 times. The original GSEA method, developed as a Java application by the Broad Institute, used sample permutation to compute p-values. The approach later evolved into the more efficient and much more widely adopted pre-ranked GSEA, implemented in R as fgsea, which relies on gene permutation. We show, however, that pre-ranked GSEA is highly sensitive to inter-gene correlations, because gene permutation disrupts the natural correlation structure among genes. Such correlations are typically strong within biologically defined gene sets representing molecular pathways, causing pre-ranked GSEA to be spectacularly anti-conservative, with very high false discovery rates. The CAMERA method, implemented in the limma package, addresses this issue by empirically estimating inter-gene correlations and adjusting the gene set test statistic accordingly. While CAMERA typically operates on raw expression data to estimate inter-gene correlations directly, we also introduce a pre-ranked version that serves as a direct replacement for fgsea. Unlike fgsea, the pre-ranked CAMERA retains the ability to account for inter-gene correlation, either through user-specified values or a default setting. Using simulated data, we demonstrate that CAMERA accurately controls the type I error rate even under high inter-gene correlation, while maintaining excellent power, relative to fgsea, to detect genuine differential expression. Like fgsea, CAMERA can be applied to predefined gene sets or used as a pathway analysis tool with molecular signature databases. Through an example analysis of breast cancer subtypes, we show that CAMERA yields more biologically meaningful and interpretable results than pre-ranked GSEA.
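
The effect of inter-gene correlation on a competitive gene set test can be illustrated with a small simulation. The sketch below is not the fgsea or CAMERA implementation; it is a toy z-test on simulated null gene-level statistics, under the assumption that genes in the set share a common pairwise correlation rho and all other genes are independent. The function name, parameter values and the equicorrelation model are illustrative assumptions. The naive test treats genes as independent, as gene permutation implicitly does, while the adjusted test inflates the variance of the set mean by the factor 1 + (m - 1) * rho, where m is the set size.

```python
import numpy as np

def null_rejection_rates(n_genes=1000, set_size=50, rho=0.1, n_sim=4000, seed=0):
    """Type I error of a set-vs-rest mean test, with and without a
    variance inflation factor (toy model, not the limma or fgsea code)."""
    rng = np.random.default_rng(seed)
    z_crit = 1.959964  # two-sided 5% critical value of the standard normal

    # Null gene-level statistics, each with unit variance.
    # Genes in the set share a common factor, giving pairwise correlation rho.
    shared = rng.standard_normal((n_sim, 1))
    in_set = np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.standard_normal((n_sim, set_size))
    n_rest = n_genes - set_size
    rest = rng.standard_normal((n_sim, n_rest))  # independent genes outside the set

    # Competitive statistic: mean of the set minus mean of the remaining genes.
    diff = in_set.mean(axis=1) - rest.mean(axis=1)

    # Naive standard error assumes all genes are independent.
    se_naive = np.sqrt(1 / set_size + 1 / n_rest)

    # Adjusted standard error inflates the set-mean variance for correlation.
    vif = 1 + (set_size - 1) * rho
    se_adjusted = np.sqrt(vif / set_size + 1 / n_rest)

    naive_rate = float(np.mean(np.abs(diff / se_naive) > z_crit))
    adjusted_rate = float(np.mean(np.abs(diff / se_adjusted) > z_crit))
    return naive_rate, adjusted_rate

if __name__ == "__main__":
    naive, adjusted = null_rejection_rates()
    print(f"naive test rejection rate at 5%:    {naive:.3f}")
    print(f"adjusted test rejection rate at 5%: {adjusted:.3f}")
```

In this toy setting, even a modest correlation of 0.1 within a 50-gene set pushes the naive rejection rate far above the nominal 5%, while the adjusted test stays close to it. Real data would require the correlation to be estimated or preset rather than known.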