Scientist View — Microglial Enhancers & Alzheimer's Risk

GWAS risk loci (Jansen 2019)

29

N = 455,258 individuals; 71,880 AD cases

Microglial loci

~17

Nearest gene primarily expressed in microglia

True pseudobulk DEGs (Murphy 2023)

26

Down from 23,923 after correcting for pseudoreplication

DEGs in microglia

25/26

96% of all robust DEGs from snRNA-seq reanalysis

EWCE ζ — AD microglia

4.1 SD

p < 0.00001; strongest signal across all disorders

GWAS genetic correlation (proxy)

r_g = 0.81

AD vs AD-by-proxy (Jansen et al. 2019)

Scientific rationale

Converging evidence from GWAS, single-cell transcriptomics and EWCE analysis implicates microglia as the primary cell type mediating Alzheimer's genetic risk. The 29 Jansen et al. loci include canonical microglial genes (TREM2, INPP5D, CD33, MEF2C, MS4A family), and the Murphy et al. pseudobulk reanalysis of the Mathys 2019 snRNA-seq dataset shows that 25 of the 26 robust differentially expressed genes (96%) are microglial.

This PhD project addresses the next outstanding question: which specific microglial enhancers are activated in the disease-associated state, and where does IRF1 bind to drive this transition? Standard ChIP-seq is limited by the lack of reliable ChIP-grade anti-IRF1 antibodies. This motivates a Prime Editing + CETCh-seq (CRISPR epitope tagging ChIP-seq) strategy, in which endogenous IRF1 is tagged with an epitope and immunoprecipitated with a well-characterised anti-tag antibody.


Key papers

Murphy et al. (2023) eLife 12:e90214
"Avoiding false discoveries in single-cell RNA-seq by revisiting the first Alzheimer's disease dataset"
Jansen et al. (2019) Nature Genetics 51, 404–413
"Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer's disease risk"
Skene & Grant (2016) Front. Neurosci. 10:16
"Identification of Vulnerable Cell Types in Major Brain Disorders Using Single Cell Transcriptomes and Expression Weighted Cell Type Enrichment"
Mathys et al. (2019) Nature 570, 332–337
"Single-cell transcriptomic analysis of Alzheimer's disease"
Original snRNA-seq dataset (source data for reanalysis); Synapse: syn18485175

GWAS — 29 Alzheimer's Risk Loci

Jansen et al. 2019 meta-analysis: N = 455,258 (71,880 cases / 383,378 controls). Microglial genes are shown in orange. APOE is capped at -log₁₀p = 60 for visualisation.

Manhattan Plot — Jansen et al. 2019 (29 sentinel loci)

Full loci table

Gene	Chr	Lead SNP	OR	-log₁₀p	Microglial


snRNA-seq Reanalysis — Murphy et al. 2023

Pseudobulk reanalysis of Mathys 2019 (Synapse syn18485175) using edgeR. Pseudoreplication (treating cells from the same donor as independent replicates) in the original analysis inflated the DEG count ~920-fold (23,923 vs 26). 26 robust DEGs remain, of which 25 are microglial.
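Pseudobulk aggregation is the step that removes pseudoreplication: raw counts from all cells of one cell type in one donor are summed, so the donor rather than the cell becomes the unit of replication. The sketch below is a minimal standard-library illustration, assuming a hypothetical `(donor, cell type, {gene: count})` input layout and hypothetical toy counts; the summed profiles would then be passed to edgeR, which is not shown.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts per (donor, cell type).

    cells: iterable of (donor_id, cell_type, {gene: count}) tuples.
    Returns {(donor_id, cell_type): {gene: summed_count}}.
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for donor, cell_type, counts in cells:
        for gene, n in counts.items():
            profiles[(donor, cell_type)][gene] += n
    # Convert nested defaultdicts to plain dicts for a clean return value
    return {key: dict(genes) for key, genes in profiles.items()}


# Hypothetical toy input: three microglial cells from two donors, one neuron
cells = [
    ("D1", "Mic", {"APOE": 3, "CD74": 5}),
    ("D1", "Mic", {"APOE": 1}),
    ("D2", "Mic", {"CD74": 2}),
    ("D1", "Ex", {"APOE": 4}),
]
pb = pseudobulk(cells)
print(pb[("D1", "Mic")])  # {'APOE': 4, 'CD74': 5}
```

Each `(donor, cell type)` key yields one column for differential expression, so a donor with thousands of cells still contributes a single replicate.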

Volcano Plot — Pseudobulk DEGs (Table 2)

DEG Count Comparison by Method

Cell Counts Across QC Filters

Pseudoreplication: Cell Count vs DEG Count (r ≈ 0.99)

DEG table (pseudobulk, Table 2)

Gene	Cell Type	logFC	logCPM	FDR	Direction

Analysis scripts: GitHub · SCE object: Figshare

EWCE — Cell-type Enrichment of Disease Risk Genes

Expression-Weighted Cell Type Enrichment (Skene & Grant 2016). Scores (ζ in SD above random bootstrap expectation) computed from Zeisel 2015 mouse brain single-cell transcriptomes. AD risk gene set: 19 genes from Skene & Grant Table S1.

EWCE Heatmap — 7 brain disorders × 8 cell types

AD risk gene set (n = 19)

APOE, CLU, CR1, PICALM, BIN1, ABCA7, MS4A6A, CD33, EPHA1, CD2AP, INPP5D, MEF2C, ZCWPW1, CELF1, NME8, SLC24A4, CASS4, FERMT2, HLA

Skene & Grant Table S1 Pre-Jansen 2019 gene set

EWCE method

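EWCE is a bootstrap-based test. Human disease genes are mapped to mouse orthologues using BioMart, and a specificity score is computed for every gene–cell-type pair from the Zeisel 2015 scRNA-seq data. For each cell type, the summed specificity of the disease genes is compared with 10,000 bootstrap resamples of random gene sets of matched size drawn from the background: ζ = (observed − mean(null)) / SD(null).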

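library(EWCE)

# ctd: EWCE cell-type data object built from Zeisel 2015 (mouse)
# AD_genes: the 19 human AD risk genes; background_genes: human background set
results <- bootstrap_enrichment_test(
  sct_data        = ctd,
  hits            = AD_genes,
  bg              = background_genes,
  sctSpecies      = "mouse",  # species of the single-cell reference
  genelistSpecies = "human",  # species of the hit and background lists
  reps            = 10000,    # bootstrap resamples
  annotLevel      = 1         # level-1 (broad) Zeisel cell types
)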
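The ζ statistic defined above can be made concrete with a small pure-Python re-implementation of the bootstrap. This is a sketch, not the EWCE package code: the nested-dict specificity layout, the function name `ewce_bootstrap`, and the toy gene names and scores are hypothetical, and orthologue mapping and EWCE's other options are omitted.

```python
import random
import statistics


def ewce_bootstrap(specificity, hits, background, cell_type, reps=10000, seed=0):
    """EWCE-style bootstrap enrichment of `hits` in one cell type.

    specificity: {gene: {cell_type: specificity score in [0, 1]}}
    Returns (zeta, p): zeta = (observed - mean(null)) / SD(null), and p is the
    fraction of null draws whose summed specificity is >= the observed sum.
    """
    rng = random.Random(seed)
    hits = [g for g in hits if g in specificity]  # keep genes with a score
    bg = [g for g in background if g in specificity]
    observed = sum(specificity[g][cell_type] for g in hits)
    null = []
    for _ in range(reps):
        sample = rng.sample(bg, len(hits))  # random gene set of matched size
        null.append(sum(specificity[g][cell_type] for g in sample))
    zeta = (observed - statistics.mean(null)) / statistics.stdev(null)
    p = sum(x >= observed for x in null) / reps
    return zeta, p


# Hypothetical toy data: g0..g9 are highly microglia-specific, the rest are not
spec = {f"g{i}": {"microglia": 0.9 if i < 10 else (i % 10) / 40} for i in range(200)}
background = list(spec)
z_hit, p_hit = ewce_bootstrap(spec, [f"g{i}" for i in range(10)], background, "microglia", reps=2000)
z_ctl, p_ctl = ewce_bootstrap(spec, [f"g{i}" for i in range(10, 20)], background, "microglia", reps=2000)
print(round(z_hit, 1), p_hit)
print(round(z_ctl, 1), p_ctl)
```

Because p is estimated as a fraction of null draws, it cannot resolve values below 1/reps (0.0001 for 10,000 resamples); ζ remains informative beyond that point.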


Experimental Aims & Data Hub

IRF1/IRF8 CETCh-seq across microglial states. The prime-editing tagging strategy is being established in HCT116 (Aim 1) and will then be transferred to BV2 and KOLF2.1 iMG. Results will be integrated with human AD GWAS via SLDP regression and fine-mapping.

Aim	Year	System	Key Output	Status	Data
Aim 1: Establish Prime Editing + CETCh-seq for IRF1/IRF8	Year 1	HCT116	Validated FLAG-tagged IRF1/IRF8 cell lines; ChIP-seq pilot	In progress	Pending
Aim 2: Transfer to microglial model systems	Year 2	BV2, KOLF2.1 iMG	Edited microglia in homeostatic, IFN-γ activated, and DAM-like states	Upcoming	Pending
Aim 3: Genome-wide IRF1 & IRF8 binding maps	Years 2–3	BV2, iMG	Anti-FLAG ChIP-seq peaks; motif enrichment; state-differential binding sites	Planned	ChIP-seq tracks (placeholder)
Aim 4: SLDP regression & fine-mapping integration	Years 3–3.5	In silico	Heritability enrichment in IRF1-bound enhancers; candidate drug targets	Planned	SLDP outputs (placeholder)

Aim 1 — Technical Detail

Prime editing guide RNA design

The pegRNA is designed to insert a 3×FLAG epitope (DYKDHDGDYKDHDIDYKDDDDK) at the C-terminus of the IRF1 and IRF8 coding sequences. The PE3 strategy adds a nicking sgRNA that nicks the non-edited strand 40–90 bp from the pegRNA nick to improve efficiency. Spacer sequences target exon 9 of IRF1 (NM_002198) and exon 9 of IRF8 (NM_002163).

# Spacer sequence (IRF1 C-term)
5'-GCAGTGGCTCAGCGGCAGCC-3'
# pegRNA extension (FLAG insert)
5'-...GATTACAAAGACCATGACGGTGATTATAAAGATCATGACATCGATTATAAAG...-3'
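A quick in-silico check is to translate the visible part of the pegRNA extension and confirm it encodes the start of the standard 3×FLAG peptide (DYKDHDGDYKDHDIDYKDDDDK). The sketch below assumes the visible fragment begins at the first codon of the tag; the elided flanks are left untouched, and the helper names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate complete codons in frame 0; a trailing partial codon is ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[n:n + 3]] for n in range(0, len(seq) - 2, 3))


# Visible fragment of the pegRNA extension, copied from the listing above
fragment = "GATTACAAAGACCATGACGGTGATTATAAAGATCATGACATCGATTATAAAG"
THREE_X_FLAG = "DYKDHDGDYKDHDIDYKDDDDK"
peptide = translate(fragment)
print(peptide)                           # DYKDHDGDYKDHDIDYK
print(THREE_X_FLAG.startswith(peptide))  # True
```

The same check applied to the full designed insert (with flanks) would also confirm that the tag stays in frame with the IRF1/IRF8 coding sequence.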

Validation pipeline

Editing efficiency assessed by: (1) genomic PCR + Sanger sequencing, (2) Western blot with anti-FLAG M2 antibody (Sigma F1804), (3) IF co-localisation of FLAG signal with endogenous IRF1, (4) CUT&RUN pilot with anti-FLAG to confirm ChIP-able signal.

Aim 4 — SLDP Framework

Stratified LD Score & SLDP

IRF1 ChIP-seq peaks will be converted to binary annotations (±500 bp around each peak summit). Stratified LD score regression (S-LDSC) will then estimate the heritability enrichment of the Jansen 2019 AD GWAS in IRF1-bound enhancer regions. SLDP (signed LD profile) regression will test whether SNPs in IRF1 binding sites have directionally consistent effects on AD risk.

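# S-LDSC command outline (partitioned heritability)
# --h2 expects summary statistics already converted with munge_sumstats.py
python ldsc.py \
  --h2 AD_sumstats_Jansen2019.sumstats.gz \
  --ref-ld-chr IRF1_peaks.,baselineLD. \
  --w-ld-chr weights. \
  --frqfile-chr 1000G.EUR.QC. \
  --overlap-annot \
  --print-coefficients \
  --out IRF1_AD_enrichment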
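The binary annotation step (±500 bp around each summit) can be sketched as below. This is an illustration under stated assumptions, not the project pipeline: ldsc ships a make_annot.py helper for turning BED files into annotations, whereas here the function names, the inclusive single-coordinate convention and the toy positions are all hypothetical.

```python
import bisect


def summit_windows(summits, flank=500):
    """Merge +/- flank windows around peak summits into sorted, non-overlapping
    (start, end) intervals per chromosome (inclusive coordinates)."""
    by_chrom = {}
    for chrom, summit in summits:
        by_chrom.setdefault(chrom, []).append((max(0, summit - flank), summit + flank))
    merged = {}
    for chrom, windows in by_chrom.items():
        windows.sort()
        out = [list(windows[0])]
        for start, end in windows[1:]:
            if start <= out[-1][1]:  # overlaps the previous window: extend it
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(w) for w in out]
    return merged


def annotate_snps(snps, windows):
    """Binary annotation: 1 if a SNP lies inside any summit window, else 0."""
    starts = {chrom: [w[0] for w in ws] for chrom, ws in windows.items()}
    labels = []
    for chrom, pos in snps:
        # Index of the last window starting at or before this position
        i = bisect.bisect_right(starts.get(chrom, []), pos) - 1
        labels.append(int(i >= 0 and pos <= windows[chrom][i][1]))
    return labels


# Hypothetical toy summits and SNP positions
windows = summit_windows([("chr1", 1000), ("chr1", 1800), ("chr2", 300)])
labels = annotate_snps([("chr1", 400), ("chr1", 2000), ("chr1", 2301),
                        ("chr2", 0), ("chr3", 5)], windows)
print(windows)  # {'chr1': [(500, 2300)], 'chr2': [(0, 800)]}
print(labels)   # [0, 1, 0, 1, 0]
```

Merging overlapping windows first keeps the lookup a single binary search per SNP.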

Fine-mapping integration

SuSiE fine-mapping credible sets from the Jansen 2019 GWAS will be intersected with state-differential IRF1 peaks to identify candidate causal variants within IRF1-bound enhancers. Variants will be linked to target genes via Hi-C and eQTL colocalization (coloc) in microglia.
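One way to summarise the intersection is to ask, for each credible set, how much posterior inclusion probability (PIP) falls inside IRF1 peaks. The sketch below assumes credible sets have already been exported from susieR as `(chrom, pos, pip)` tuples; that layout, the function name `credible_set_overlap` and the toy values are hypothetical.

```python
import math


def credible_set_overlap(credible_sets, peaks):
    """For each credible set, sum the PIP of variants that fall inside peaks.

    credible_sets: {cs_id: [(chrom, pos, pip), ...]}
    peaks: {chrom: [(start, end), ...]} with inclusive coordinates
    Returns {cs_id: (pip_in_peaks, [positions of in-peak variants])}.
    """
    def in_peak(chrom, pos):
        return any(start <= pos <= end for start, end in peaks.get(chrom, []))

    summary = {}
    for cs_id, variants in credible_sets.items():
        inside = [(pos, pip) for chrom, pos, pip in variants if in_peak(chrom, pos)]
        summary[cs_id] = (math.fsum(pip for _, pip in inside), [pos for pos, _ in inside])
    return summary


# Hypothetical toy credible sets and a single IRF1 peak
peaks = {"chr11": [(100, 200)]}
credible_sets = {
    "CS1": [("chr11", 150, 0.60), ("chr11", 250, 0.30), ("chr11", 190, 0.05)],
    "CS2": [("chr2", 150, 0.90)],
}
summary = credible_set_overlap(credible_sets, peaks)
print(summary["CS1"][1])  # [150, 190]
```

A credible set with most of its PIP inside state-differential peaks is a natural candidate for follow-up, while a set with none points to a mechanism outside IRF1-bound enhancers.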

Upcoming data placeholders

The following chart containers will be populated with real data as the PhD progresses. They are designed to accept ChIP-seq peak BED files and enrichment outputs in JSON format.


IRF1 ChIP-seq peak tracks

Homeostatic · Activated · DAM-like

Awaiting Aim 2 data


SLDP regression output

Heritability enrichment in IRF1 peaks

Awaiting Aim 3 data

Methods Summary

Technical details for the computational and experimental approaches used in this project.

snRNA-seq reanalysis (Murphy 2023)

Raw count matrices from Mathys 2019 (Synapse syn18485175) and reprocessed matrices (Synapse syn51758062) were processed with the scFlow QC pipeline. Cell types were annotated using the ewceData reference. Differential expression was tested on pseudobulk profiles with edgeR (quasi-likelihood F-test). The pseudoreplication artefact was demonstrated by the correlation between per-donor cell count and naive DEG count (r = 0.99).
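Why pseudoreplication inflates DEG counts can be seen in a small null simulation. This is a hypothetical illustration, not the Murphy et al. analysis: the Gaussian donor and cell noise, the group sizes and the Welch t-test are assumptions. Every cell from a donor shares that donor's offset, so a cell-level test mistakes donor-to-donor variation for a disease effect, whereas a donor-level (pseudobulk-style) test does not.

```python
import math
import random


def mean(xs):
    return math.fsum(xs) / len(xs)


def welch_t(a, b):
    """Welch's two-sample t statistic."""
    def var(xs):
        m = mean(xs)
        return math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return (mean(a) - mean(b)) / math.sqrt(var(a) / len(a) + var(b) / len(b))


def simulate(n_genes=200, donors_per_group=6, cells_per_donor=200, seed=1):
    """Null simulation: no case/control effect, only donor-to-donor variation.

    Returns the fraction of genes called significant when cells are used as
    replicates (|t| > 1.96) and when donor means are used (|t| > 2.23, ~10 df).
    """
    rng = random.Random(seed)
    cell_calls = donor_calls = 0
    for _ in range(n_genes):
        groups = []
        for _group in range(2):  # "AD" and "control" drawn from the same distribution
            donors = []
            for _donor in range(donors_per_group):
                donor_effect = rng.gauss(0, 1)  # shared by every cell of this donor
                donors.append([donor_effect + rng.gauss(0, 1) for _ in range(cells_per_donor)])
            groups.append(donors)
        cells = [[x for donor in g for x in donor] for g in groups]
        donor_means = [[mean(donor) for donor in g] for g in groups]
        cell_calls += abs(welch_t(*cells)) > 1.96
        donor_calls += abs(welch_t(*donor_means)) > 2.23
    return cell_calls / n_genes, donor_calls / n_genes


cell_rate, donor_rate = simulate()
print(f"cell-level false-positive rate:  {cell_rate:.2f}")
print(f"donor-level false-positive rate: {donor_rate:.2f}")
```

Raising `cells_per_donor` shrinks the cell-level standard error and pushes the cell-level false-positive rate higher, which is consistent with the cell count versus naive DEG count correlation described above.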

scFlow edgeR R 4.2

GWAS meta-analysis (Jansen 2019)

Fixed-effects meta-analysis across 4 datasets: three clinically diagnosed case-control consortia (IGAP, PGC-ALZ, ADSP) and UK Biobank AD-by-proxy. LD score regression for heritability and genetic correlation. Conditional analysis with GCTA-COJO for independent loci. Gene-set analysis via MAGMA. Two-sample Mendelian randomization.

METAL LDSC MAGMA GCTA
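The core of a fixed-effects meta-analysis for a single SNP can be illustrated with the standard inverse-variance weighted (IVW) estimator. Whether Jansen et al. weighted by inverse variance or by sample size in METAL is not stated here, so treat this as a generic sketch; the function name and the effect sizes are hypothetical.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis for one SNP.

    betas: per-cohort effect sizes (e.g. log odds ratios); ses: their standard errors.
    Returns (pooled beta, pooled SE, z, two-sided p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    beta = math.fsum(w * b for w, b in zip(weights, betas)) / math.fsum(weights)
    se = math.sqrt(1.0 / math.fsum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p


# Hypothetical log-OR estimates for one SNP in two cohorts
beta, se, z, p = ivw_fixed_effects([0.10, 0.20], [0.05, 0.05])
print(round(beta, 3), round(se, 4), round(z, 2))  # 0.15 0.0354 4.24
```

Cohorts with smaller standard errors dominate the pooled estimate, and the pooled SE is always smaller than any single cohort's SE, which is why meta-analysis uncovers loci that individual cohorts miss.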

Prime Editing strategy

PE3 system: PE2 editor (Cas9 H840A nickase fused to an engineered M-MLV reverse transcriptase) + pegRNA + nicking sgRNA. pegRNAs synthesised by IVT or chemically modified (Altogen). Delivery by RNP electroporation (Lonza 4D-Nucleofector). Selection by a puromycin resistance cassette flanked by loxP sites for subsequent Cre excision. Efficiency target: ≥15% allelic editing without enrichment.

PE3 3×FLAG HCT116 → BV2 → iMG

Anti-FLAG ChIP-seq (CETCh-seq)

Anti-FLAG M2 magnetic beads (Sigma M8823). 5–10 million cells per ChIP. Library prep: NEBNext Ultra II. Sequencing: 50 bp SE, ~40M reads. Peak calling: MACS3 (q < 0.05). IDR for replicate concordance. Motif enrichment: HOMER findMotifsGenome.pl against repeat-masked mm39/hg38.

MACS3 HOMER deepTools IDR

Software environment

# R packages
library(edgeR)      # v3.40: pseudobulk DE
library(EWCE)       # v1.8: cell type enrichment
library(Seurat)     # v4.3: scRNA-seq processing
library(scFlow)     # v0.7: QC pipeline
library(coloc)      # v5.2: eQTL colocalization
library(susieR)     # v0.12: fine-mapping

# Python / CLI
macs3               # v3.0: peak calling
ldsc                # v1.0: LD score regression
sldp                # v1.0: signed LD profile regression
bowtie2             # v2.5: read alignment
samtools            # v1.17: BAM processing
deeptools           # v3.5: ChIP QC & bigwig
