Scientific data explorer

Microglial Enhancers & Alzheimer's Risk

Interactive analysis of GWAS loci, single-nucleus RNA-seq reanalysis, and EWCE enrichment — with experimental aims for IRF1 CETCh-seq.

Murphy et al. 2023 · Jansen et al. 2019 · Skene & Grant 2016 · PhD Project: IRF1 CETCh-seq
GWAS risk loci (Jansen 2019)
29
N = 455,258 individuals; 71,880 AD cases
Microglial loci
~17
Nearest gene primarily expressed in microglia
True pseudobulk DEGs (Murphy 2023)
26
Down from 23,923 with pseudoreplication correction
DEGs in microglia
25/26
96% of all robust DEGs from snRNA-seq reanalysis
EWCE ζ — AD microglia
4.1 SD
p < 0.00001; strongest signal across all disorders
GWAS genetic correlation (proxy)
rg = 0.81
AD vs AD-by-proxy (Jansen et al. 2019)

Scientific rationale

Converging evidence from GWAS, single-cell transcriptomics and EWCE analysis consistently implicates microglia as the primary cell type mediating Alzheimer's genetic risk. The 29 Jansen et al. loci include canonical microglial genes (TREM2, INPP5D, CD33, MEF2C, MS4A family) and the Murphy et al. pseudobulk reanalysis demonstrates that 96% of robust gene expression changes in AD brains occur in microglia.

This PhD project addresses the next outstanding question: which specific microglial enhancers are activated in the disease-associated state, and where does IRF1 act to drive this transition? Standard ChIP-seq is limited by the lack of reliable ChIP-grade anti-IRF1 antibodies. This motivates a Prime Editing + CETCh-seq strategy, in which endogenous IRF1 is epitope-tagged so that a well-characterised anti-FLAG antibody can be used instead.

Key papers

  • Murphy et al. (2023) eLife 12:e90214
    "Avoiding false discoveries in single-cell RNA-seq by revisiting the first Alzheimer's disease dataset"
    elifesciences.org ↗
  • Jansen et al. (2019) Nature Genetics 51, 404–413
    "Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer's disease risk"
    nature.com ↗ · Summary stats ↗
  • Skene & Grant (2016) Front. Neurosci. 10:16
    "Identification of Vulnerable Cell Types in Major Brain Disorders Using Single Cell Transcriptomes and EWCE"
    frontiersin.org ↗
  • Mathys et al. (2019) Nature 570, 332–337
    Original snRNA-seq dataset (source data for reanalysis)
    Synapse: syn18485175 ↗

GWAS — 29 Alzheimer's Risk Loci

Jansen et al. 2019 meta-analysis: N=455,258 (71,880 cases / 383,378 controls). Orange = microglial gene. Hover for SNP ID, OR, and p-value. APOE capped at -log₁₀p = 60 for visualisation.
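The capping rule mentioned in the caption can be sketched as follows. This is a minimal illustration only: the function name and the example p-values are hypothetical, and the only value taken from this page is the display ceiling of 60.

```python
import math

CAP = 60.0  # display ceiling stated in the plot caption


def capped_neg_log10(p_value: float, cap: float = CAP) -> float:
    """Return -log10(p), truncated at `cap` so one locus cannot flatten the axis."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p_value == 0.0:  # p-values that underflow to zero are drawn at the cap
        return cap
    return min(-math.log10(p_value), cap)


# Hypothetical inputs: an extremely small p-value and a genome-wide threshold value.
very_strong = capped_neg_log10(1e-300)
threshold = capped_neg_log10(5e-8)
```

Only the plotted height changes; the underlying p-value shown on hover would stay uncapped.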

Manhattan Plot — Jansen et al. 2019 (29 sentinel loci)

Full loci table

Gene | Chr | Lead SNP | OR | -log₁₀p | Microglial

snRNA-seq Reanalysis — Murphy et al. 2023

Pseudobulk reanalysis of Mathys 2019 (Synapse syn18485175) using edgeR. Pseudoreplication in the original analysis inflated DEG counts ~920-fold. 26 robust DEGs remain; 25 are microglial.
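The aggregation step behind a pseudobulk analysis can be sketched as below: counts from all nuclei of one donor and cell type are summed, so that donors rather than nuclei become the replicates. This is an illustrative sketch, not the Murphy et al. pipeline. The function name, the tuple layout and the toy counts are assumptions, and the edgeR model fit itself is not reproduced.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum per-nucleus counts into one profile per (donor, cell_type).

    `cells` is an iterable of (donor_id, cell_type, {gene: count}) tuples.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for donor_id, cell_type, counts in cells:
        profile = totals[(donor_id, cell_type)]
        for gene, count in counts.items():
            profile[gene] += count
    return {key: dict(profile) for key, profile in totals.items()}


# Toy input with made-up counts (not values from the Mathys data).
toy_cells = [
    ("donor1", "Mic", {"GENE_A": 3, "GENE_B": 0}),
    ("donor1", "Mic", {"GENE_A": 1, "GENE_B": 2}),
    ("donor2", "Mic", {"GENE_A": 0, "GENE_B": 5}),
    ("donor1", "Ex", {"GENE_A": 7}),
]
profiles = pseudobulk(toy_cells)

# The fold inflation quoted on this page follows from the two DEG counts.
fold_inflation = 23_923 / 26  # roughly 920
```

After aggregation each donor contributes one library per cell type, which is the unit a tool such as edgeR then models.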

Volcano Plot — Pseudobulk DEGs (Table 2)
DEG Count Comparison by Method
Cell Counts Across QC Filters
Pseudoreplication: Cell Count vs DEG Count (r ≈ 0.99)

DEG table (pseudobulk, Table 2)

Gene | Cell Type | logFC | logCPM | FDR | Direction

EWCE — Cell-type Enrichment of Disease Risk Genes

Expression-Weighted Cell Type Enrichment (Skene & Grant 2016). Scores (ζ in SD above random bootstrap expectation) computed from Zeisel 2015 mouse brain single-cell transcriptomes. AD risk gene set: 19 genes from Skene & Grant Table S1.

EWCE Heatmap — 7 brain disorders × 8 cell types

AD risk gene set (n = 19)

APOE, CLU, CR1, PICALM, BIN1, ABCA7, MS4A6A, CD33, EPHA1, CD2AP, INPP5D, MEF2C, ZCWPW1, CELF1, NME8, SLC24A4, CASS4, FERMT2, HLA

Skene & Grant Table S1 · Pre-Jansen 2019 gene set

EWCE method

Bootstrap-based test. Disease genes are mapped to mouse orthologues using BioMart. Specificity scores per gene–cell-type pair computed from Zeisel 2015 scRNA-seq. 10,000 bootstrap resamples of random gene sets of matched size. ζ = (observed − mean(null)) / SD(null).

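library(EWCE)

# AD_genes: human gene symbols; ctd: cell-type dataset built from Zeisel 2015 (mouse).
# Bioconductor releases of EWCE (v1.x) use snake_case function names.
results <- bootstrap_enrichment_test(
  sct_data = ctd,
  hits = AD_genes,
  bg = background_genes,
  genelistSpecies = "human",  # species of the hit list
  sctSpecies = "mouse",       # species of the cell-type dataset
  reps = 10000,               # bootstrap resamples
  annotLevel = 1              # coarsest cell-type annotation level
)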
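The ζ statistic defined above can be illustrated with a small stand-alone sketch. It is not the EWCE implementation: the function name, the single-cell-type specificity dictionary and the toy gene names are all assumptions, and orthologue mapping is skipped.

```python
import random


def ewce_zeta(specificity, hits, background, reps=10_000, seed=0):
    """Bootstrap enrichment of `hits` for one cell type.

    specificity: {gene: specificity score in the cell type of interest}
    Returns (zeta, empirical one-sided p-value).
    """
    rng = random.Random(seed)
    hits = [g for g in hits if g in specificity]
    pool = [g for g in background if g in specificity]
    if not hits or len(pool) < len(hits):
        raise ValueError("need a non-empty hit list and a larger background")

    def mean_spec(genes):
        return sum(specificity[g] for g in genes) / len(genes)

    observed = mean_spec(hits)
    # Null distribution: random gene sets of matched size drawn from the background.
    null = [mean_spec(rng.sample(pool, len(hits))) for _ in range(reps)]
    null_mean = sum(null) / reps
    null_sd = (sum((x - null_mean) ** 2 for x in null) / (reps - 1)) ** 0.5
    zeta = (observed - null_mean) / null_sd
    p_value = sum(x >= observed for x in null) / reps
    return zeta, p_value


# Toy data: 200 hypothetical genes with evenly spaced specificity scores.
genes = [f"gene{i}" for i in range(200)]
toy_specificity = {g: i / 199 for i, g in enumerate(genes)}
toy_hits = genes[-10:]  # the ten most specific genes
zeta, p_value = ewce_zeta(toy_specificity, toy_hits, genes, reps=2_000)
```

In this sketch the empirical p-value has a resolution of 1/reps, so a hit list that beats every resample is reported as 0 rather than as an exact probability.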

Experimental Aims & Data Hub

IRF1/IRF8 CETCh-seq across microglial states. Prime editing tagging strategy validated in HCT116, transferred to BV2 and KOLF2.1 iMG. Integration with human AD GWAS via SLDP regression and fine-mapping.

Aim | Year | System | Key Output | Status | Data
Aim 1: Establish Prime Editing + CETCh-seq for IRF1/IRF8 | Year 1 | HCT116 | Validated FLAG-tagged IRF1/IRF8 cell lines; ChIP-seq pilot | In progress | Pending
Aim 2: Transfer to microglial model systems | Year 2 | BV2, KOLF2.1 iMG | Edited microglia in homeostatic, IFN-γ activated, and DAM-like states | Upcoming | Pending
Aim 3: Genome-wide IRF1 & IRF8 binding maps | Years 2–3 | BV2, iMG | Anti-FLAG ChIP-seq peaks; motif enrichment; state-differential binding sites | Planned |
Aim 4: SLDP regression & fine-mapping integration | Years 3–3.5 | In silico | Heritability enrichment in IRF1-bound enhancers; candidate drug targets | Planned |

Aim 1 — Technical Detail

Prime editing guide RNA design

The pegRNA is designed to insert a 3×FLAG epitope (DYKDHDG-DYKDHDI-DYKDDDDK) at the C-terminus of the IRF1 and IRF8 coding sequences. A PE3 strategy is used, with a nicking sgRNA placed 40–90 bp downstream to improve efficiency. Spacer sequences target exon 9 of IRF1 (NM_002198) and exon 9 of IRF8 (NM_002163).

# Spacer sequence (IRF1 C-term)
5'-GCAGTGGCTCAGCGGCAGCC-3'

# pegRNA extension (FLAG insert)
5'-...GATTACAAAGACCATGACGGTGATTATAAAGATCATGACATCGATTATAAAG...-3'
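As a quick sanity check, the visible part of the FLAG insert can be translated to see which residues it encodes. The sketch below assumes the fragment is shown in the coding sense and that the reading frame starts at its first base; the elided flanks are left out, and the helper name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate complete codons from position 0; trailing bases are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Visible portion of the pegRNA extension shown above (flanking "..." omitted).
flag_fragment = "GATTACAAAGACCATGACGGTGATTATAAAGATCATGACATCGATTATAAAG"
peptide = translate(flag_fragment)
print(peptide)  # DYKDHDGDYKDHDIDYK
```

The 17 residues recovered match the start of the widely used 3×FLAG sequence DYKDHDGDYKDHDIDYKDDDDK; the remaining residues fall in the elided part of the extension.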

Validation pipeline

Editing efficiency assessed by: (1) genomic PCR + Sanger sequencing, (2) Western blot with anti-FLAG M2 antibody (Sigma F1804), (3) IF co-localisation of FLAG signal with endogenous IRF1, (4) CUT&RUN pilot with anti-FLAG to confirm ChIP-able signal.

Aim 4 — SLDP Framework

Stratified LD Score & SLDP

IRF1 ChIP-seq peaks will be converted to binary annotations (± 500 bp from peak summit). Stratified LD score regression (S-LDSC) will estimate heritability enrichment of the Jansen 2019 AD GWAS in IRF1-bound enhancer regions. SLDP regression will then test whether SNPs in IRF1 binding sites have directionally consistent effects on AD risk.

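# S-LDSC command outline
# File prefixes are placeholders. ldsc needs allele frequencies
# (--frqfile-chr) whenever --overlap-annot is set.
python ldsc.py \
  --h2 AD_sumstats_Jansen2019.txt.gz \
  --ref-ld-chr IRF1_peaks.,baselineLD. \
  --w-ld-chr weights. \
  --frqfile-chr 1000G.mac5eur. \
  --overlap-annot \
  --print-coefficients \
  --out IRF1_AD_enrichment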
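The ± 500 bp windowing that produces the binary annotation can be sketched as below. This is an assumption-laden illustration: coordinates are treated as 0-based and half-open, overlapping windows are merged, and all function names and positions are hypothetical. Writing ldsc `.annot` files and computing LD scores from them are not shown.

```python
from bisect import bisect_right


def summit_windows(summits, flank=500):
    """Turn (chrom, summit) pairs into merged, sorted windows per chromosome."""
    by_chrom = {}
    for chrom, pos in summits:
        by_chrom.setdefault(chrom, []).append((max(0, pos - flank), pos + flank))
    merged = {}
    for chrom, windows in by_chrom.items():
        windows.sort()
        out = [list(windows[0])]
        for start, end in windows[1:]:
            if start <= out[-1][1]:  # overlaps or touches the previous window
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(w) for w in out]
    return merged


def annotate(snps, windows):
    """Return 1 for each (chrom, pos) SNP inside a window, else 0."""
    flags = []
    for chrom, pos in snps:
        intervals = windows.get(chrom, [])
        # Index of the last window whose start is <= pos.
        i = bisect_right(intervals, (pos, float("inf"))) - 1
        flags.append(int(i >= 0 and intervals[i][0] <= pos < intervals[i][1]))
    return flags


# Hypothetical summits and SNP positions.
toy_windows = summit_windows(
    [("chr1", 1000), ("chr1", 1600), ("chr1", 5000), ("chr2", 200)]
)
toy_flags = annotate(
    [("chr1", 499), ("chr1", 500), ("chr1", 2099), ("chr1", 2100),
     ("chr2", 0), ("chr3", 10)],
    toy_windows,
)
```

Merging first keeps the lookup to a single binary search per SNP, which matters once the annotation covers millions of reference-panel variants.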

Fine-mapping integration

SuSiE fine-mapping credible sets from the Jansen 2019 GWAS will be intersected with state-differential IRF1 peaks to identify candidate causal variants within IRF1-bound enhancers. Variant-to-gene links will be assigned via Hi-C and eQTL colocalization (coloc) in microglia.
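The intersection step can be sketched as a simple filter over credible-set variants. The data layout, the function name, and every variant, locus and coordinate below are hypothetical; real credible sets would come from the fine-mapping output and real peaks from the ChIP-seq calls.

```python
def variants_in_peaks(credible_sets, peaks):
    """Keep credible-set variants that fall inside any peak.

    credible_sets: {locus: [(variant_id, chrom, pos, pip), ...]}
    peaks: [(chrom, start, end), ...] with half-open coordinates
    Returns {locus: [(variant_id, pip), ...]} sorted by descending PIP.
    """
    hits = {}
    for locus, variants in credible_sets.items():
        kept = [
            (variant_id, pip)
            for variant_id, chrom, pos, pip in variants
            if any(c == chrom and start <= pos < end for c, start, end in peaks)
        ]
        if kept:
            hits[locus] = sorted(kept, key=lambda item: -item[1])
    return hits


# Made-up credible sets and peaks for illustration only.
toy_sets = {
    "locus_A": [("var1", "chr1", 1200, 0.60), ("var2", "chr1", 9000, 0.35)],
    "locus_B": [("var3", "chr2", 50, 0.90)],
}
toy_peaks = [("chr1", 1000, 1500), ("chr2", 400, 900)]
prioritised = variants_in_peaks(toy_sets, toy_peaks)
```

The linear scan over peaks is adequate for a few dozen loci; an interval index would be the natural swap-in for genome-wide use.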


Upcoming data placeholders

The following chart containers will be populated with real data as the PhD progresses. They are designed to accept ChIP-seq peak BED files and enrichment outputs in JSON format.

📊
IRF1 ChIP-seq peak tracks
Homeostatic · Activated · DAM-like
Awaiting Aim 2 data
📉
SLDP regression output
Heritability enrichment in IRF1 peaks
Awaiting Aim 3 data

Methods Summary

Technical details for the computational and experimental approaches used in this project.

snRNA-seq reanalysis (Murphy 2023)

Raw count matrices from Mathys 2019 (Synapse syn18485175) and reprocessed matrices (Synapse syn51758062) were processed with the scFlow QC pipeline. Cell types were annotated against an ewceData reference. Differential expression was tested by pseudobulk with edgeR (quasi-likelihood F-test). The pseudoreplication artefact is demonstrated by the correlation between per-cell-type cell count and naive DEG count (r = 0.99).

scFlow · edgeR · R 4.2

GWAS meta-analysis (Jansen 2019)

Fixed-effects meta-analysis across 4 cohorts (PGC-ALZ, IGAP, ADSP, and UK Biobank AD-by-proxy). LD score regression for heritability and genetic correlation. Conditional analysis with GCTA-COJO for independent loci. Gene-set analysis via MAGMA. Mendelian randomization with two-sample MR.

METAL · LDSC · MAGMA · GCTA

Prime Editing strategy

PE3 system (PE2 editor, i.e. Cas9 H840A nickase fused to a reverse transcriptase, + pegRNA + nicking sgRNA). pegRNAs produced by in vitro transcription (IVT) or obtained chemically modified (Altogen). Delivery by RNP electroporation (Lonza 4D-Nucleofector). Selection by puromycin resistance cassette flanked by loxP for subsequent Cre excision. Efficiency target: ≥15% allelic editing without enrichment.

PE3 · 3×FLAG · HCT116 → BV2 → iMG

Anti-FLAG ChIP-seq (CETCh-seq)

Anti-FLAG M2 magnetic beads (Sigma M8823). 5–10M cells per ChIP. Library prep: NEBNext Ultra II. Sequencing: 50 bp SE, ~40M reads. Peak calling: MACS3 (q < 0.05). IDR for replicate concordance. Motif enrichment: HOMER findMotifsGenome.pl vs. repeat-masked mm39/hg38.

MACS3 · HOMER · deepTools · IDR

Software environment

# R packages
library(edgeR) # v3.40 — pseudobulk DE
library(EWCE) # v1.8 — cell type enrichment
library(Seurat) # v4.3 — scRNA-seq processing
library(scFlow) # v0.7 — QC pipeline
library(coloc) # v5.2 — eQTL colocalization
library(susieR) # v0.12 — fine-mapping

# Python / CLI
macs3 # v3.0 — peak calling
ldsc # v1.0 — LD score regression
sldp # v1.0 — signed LD profile regression
bowtie2 # v2.5 — read alignment
samtools # v1.17 — BAM processing
deeptools # v3.5 — ChIP QC & bigwig