Scientific rationale
Converging evidence from GWAS, single-cell transcriptomics and EWCE analysis implicates microglia as the primary cell type mediating Alzheimer's genetic risk. The 29 Jansen et al. loci include canonical microglial genes (TREM2, INPP5D, CD33, MEF2C, MS4A family), and the Murphy et al. pseudobulk reanalysis finds that 25 of the 26 robust gene expression changes in AD brains (96%) occur in microglia.
This PhD project addresses the next outstanding question: which specific microglial enhancers are activated in the disease-associated state, and where does IRF1 act to drive this transition? Standard ChIP-seq is limited by the lack of reliable ChIP-grade anti-IRF1 antibodies, which motivates a Prime Editing + CETCh-seq strategy that epitope-tags the endogenous protein.
Key papers
- Murphy et al. (2023) eLife 12:e90214
  "Avoiding false discoveries in single-cell RNA-seq by revisiting the first Alzheimer's disease dataset"
  elifesciences.org
- Jansen et al. (2019) Nature Genetics 51, 404–413
  "Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer's disease risk"
  nature.com · Summary stats
- Skene & Grant (2016) Front. Neurosci. 10:16
  "Identification of Vulnerable Cell Types in Major Brain Disorders Using Single Cell Transcriptomes and EWCE"
  frontiersin.org
- Mathys et al. (2019) Nature 570, 332–337
  Original snRNA-seq dataset (source data for reanalysis)
  Synapse: syn18485175
GWAS — 29 Alzheimer's Risk Loci
Jansen et al. 2019 meta-analysis: N=455,258 (71,880 cases / 383,378 controls). Orange = microglial gene. Hover for SNP ID, OR, and p-value. APOE capped at -log₁₀p = 60 for visualisation.
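The capping rule in the caption can be sketched as below. This is only an illustration of the arithmetic: the function name and the example p-values are hypothetical and are not taken from the Jansen et al. summary statistics.

```python
import math

CAP = 60.0  # visualisation cap stated in the caption


def capped_neg_log10_p(p_value: float, cap: float = CAP) -> float:
    """Return min(-log10(p), cap).

    A p-value of 0 (floating-point underflow for very strong signals)
    is mapped to the cap instead of raising a math domain error.
    """
    if p_value <= 0.0:
        return cap
    return min(-math.log10(p_value), cap)


# Illustrative p-values only (hypothetical, not study results)
for p in (5e-8, 1e-12, 1e-300, 0.0):
    print(p, capped_neg_log10_p(p))
```

Capping keeps a single extreme locus such as APOE from compressing every other locus against the x-axis.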
Full loci table
| Gene | Chr | Lead SNP | OR | -log₁₀p | Microglial |
|---|---|---|---|---|---|
snRNA-seq Reanalysis — Murphy et al. 2023
Pseudobulk reanalysis of Mathys 2019 (Synapse syn18485175) using edgeR. Pseudoreplication in the original analysis inflated DEG counts ~920-fold. 26 robust DEGs remain; 25 are microglial.
DEG table (pseudobulk, Table 2)
| Gene | Cell Type | logFC | logCPM | FDR | Direction |
|---|---|---|---|---|---|
EWCE — Cell-type Enrichment of Disease Risk Genes
Expression-Weighted Cell Type Enrichment (Skene & Grant 2016). Scores (ζ in SD above random bootstrap expectation) computed from Zeisel 2015 mouse brain single-cell transcriptomes. AD risk gene set: 19 genes from Skene & Grant Table S1.
AD risk gene set (n = 19)
APOE, CLU, CR1, PICALM, BIN1, ABCA7, MS4A6A, CD33, EPHA1, CD2AP, INPP5D, MEF2C, ZCWPW1, CELF1, NME8, SLC24A4, CASS4, FERMT2, HLA
EWCE method
Bootstrap-based test. Disease genes are mapped to mouse orthologues using BioMart. Specificity scores per gene–cell-type pair computed from Zeisel 2015 scRNA-seq. 10,000 bootstrap resamples of random gene sets of matched size. ζ = (observed − mean(null)) / SD(null).
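library(EWCE)

# ctd: cell-type dataset built from Zeisel 2015; AD_genes: the 19-gene risk set
# EWCE >= 1.0 names this function bootstrap_enrichment_test()
# (bootstrap.enrichment.test() in earlier GitHub releases)
results <- bootstrap_enrichment_test(
  sct_data   = ctd,
  hits       = AD_genes,
  bg         = background_genes,
  reps       = 10000,
  annotLevel = 1
)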
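To make the ζ statistic concrete, here is a minimal, simplified sketch of a bootstrap enrichment test in plain Python. It is not the EWCE implementation: the function name, the dictionary layout of the specificity scores and the toy data are all assumptions made for illustration, and the statistic used is the mean specificity of the gene set.

```python
import random
import statistics


def bootstrap_enrichment(specificity, hits, background, cell_type, reps=10000, seed=0):
    """Return (zeta, empirical p) for one cell type.

    specificity: {gene: {cell_type: score}} (hypothetical layout)
    zeta = (observed - mean(null)) / SD(null), as defined in the text.
    """
    rng = random.Random(seed)
    hits = [g for g in hits if g in specificity]
    pool = [g for g in background if g in specificity]
    observed = statistics.fmean(specificity[g][cell_type] for g in hits)
    null = []
    for _ in range(reps):
        # Random gene set of matched size drawn from the background
        sample = rng.sample(pool, len(hits))
        null.append(statistics.fmean(specificity[g][cell_type] for g in sample))
    zeta = (observed - statistics.fmean(null)) / statistics.stdev(null)
    p_value = sum(v >= observed for v in null) / reps
    return zeta, p_value


# Toy data (hypothetical): 200 genes, the first 10 highly specific to microglia
genes = [f"gene{i}" for i in range(200)]
toy_specificity = {g: {"microglia": 0.9 if i < 10 else 0.1} for i, g in enumerate(genes)}
toy_hits = genes[:5]

zeta, p = bootstrap_enrichment(toy_specificity, toy_hits, genes, "microglia", reps=1000)
print(f"zeta = {zeta:.1f}, empirical p = {p:.3f}")
```

With the default of 10,000 resamples the smallest non-zero empirical p-value is 1/10,000, which is one reason ζ is reported alongside it.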
Experimental Aims & Data Hub
IRF1/IRF8 CETCh-seq across microglial states. Prime editing tagging strategy validated in HCT116, transferred to BV2 and KOLF2.1 iMG. Integration with human AD GWAS via SLDP regression and fine-mapping.
| Aim | Year | System | Key Output | Status | Data |
|---|---|---|---|---|---|
| Aim 1: Establish Prime Editing + CETCh-seq for IRF1/IRF8 | Year 1 | HCT116 | Validated FLAG-tagged IRF1/IRF8 cell lines; ChIP-seq pilot | In progress | Pending |
| Aim 2: Transfer to microglial model systems | Year 2 | BV2, KOLF2.1 iMG | Edited microglia in homeostatic, IFN-γ activated, and DAM-like states | Upcoming | Pending |
| Aim 3: Genome-wide IRF1 & IRF8 binding maps | Years 2–3 | BV2, iMG | Anti-FLAG ChIP-seq peaks; motif enrichment; state-differential binding sites | Planned | ChIP-seq tracks (placeholder) |
| Aim 4: SLDP regression & fine-mapping integration | Years 3–3.5 | In silico | Heritability enrichment in IRF1-bound enhancers; candidate drug targets | Planned | SLDP outputs (placeholder) |
Aim 1 — Technical Detail
Prime editing guide RNA design
pegRNAs are designed to insert a 3×FLAG epitope (DYKDHDG-DYKDHDI-DYKDDDDK) at the C-terminus of the IRF1 and IRF8 coding sequences. A PE3 strategy is used, with a nicking sgRNA 40–90 bp downstream to improve efficiency. Spacer sequences target exon 9 of IRF1 (NM_002198) and exon 9 of IRF8 (NM_002163).
# pegRNA spacer
5'-GCAGTGGCTCAGCGGCAGCC-3'
# pegRNA extension (FLAG insert)
5'-...GATTACAAAGACCATGACGGTGATTATAAAGATCATGACATCGATTATAAAG...-3'
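As a quick sanity check, the visible part of the insert can be translated in silico. The sketch below assumes the fragment is written in the sense orientation and that the reading frame starts at its first base; neither is stated explicitly in the text. The helper name `translate` is hypothetical, and the elided flanks (`...`) are left out.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate DNA in frame 0, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Visible portion of the pegRNA extension shown above (flanks are elided in the source)
VISIBLE_INSERT = "GATTACAAAGACCATGACGGTGATTATAAAGATCATGACATCGATTATAAAG"
THREE_X_FLAG = "DYKDHDGDYKDHDIDYKDDDDK"  # commonly used 3xFLAG peptide

peptide = translate(VISIBLE_INSERT)
print(peptide)
print("consistent with 3xFLAG prefix:", THREE_X_FLAG.startswith(peptide))
```

The same check can be rerun on the full extension once the complete sequence is available, to confirm that the tag stays in frame with the last codon of IRF1 or IRF8.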
Validation pipeline
Editing efficiency assessed by: (1) genomic PCR + Sanger sequencing, (2) Western blot with anti-FLAG M2 antibody (Sigma F1804), (3) IF co-localisation of FLAG signal with endogenous IRF1, (4) CUT&RUN pilot with anti-FLAG to confirm ChIP-able signal.
Aim 4 — SLDP Framework
Stratified LD Score & SLDP
IRF1 ChIP-seq peaks will be converted to binary annotations (± 500 bp from peak summit). Stratified LD score regression (S-LDSC, run with the LDSC software) will estimate heritability enrichment of the Jansen 2019 AD GWAS in IRF1-bound enhancer regions. SLDP regression will then test whether SNPs in IRF1 binding sites have directionally consistent effects on AD risk.
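# --overlap-annot requires allele frequencies via --frqfile-chr; "frq." is a placeholder prefix
python ldsc.py \
  --h2 AD_sumstats_Jansen2019.txt.gz \
  --ref-ld-chr IRF1_peaks.,baselineLD. \
  --w-ld-chr weights. \
  --frqfile-chr frq. \
  --overlap-annot \
  --print-coefficients \
  --out IRF1_AD_enrichment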
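The ± 500 bp summit windows that feed the binary annotation could be built along the following lines. This is a sketch under stated assumptions: summits are given as 0-based positions, overlapping windows are merged so that each base is annotated at most once, and the function name and toy coordinates are hypothetical.

```python
from collections import defaultdict


def summit_windows(summits, flank=500):
    """Turn (chrom, summit) pairs into merged half-open intervals.

    Each summit becomes [summit - flank, summit + flank), clipped at 0;
    overlapping or touching windows on the same chromosome are merged.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in summits:
        by_chrom[chrom].append((max(0, pos - flank), pos + flank))

    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_end is None or start > cur_end:
                if cur_end is not None:
                    merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)
        merged.append((chrom, cur_start, cur_end))
    return merged


# Toy summits (hypothetical coordinates)
toy_summits = [("chr1", 1000), ("chr1", 1600), ("chr1", 5000), ("chr2", 200)]
for chrom, start, end in summit_windows(toy_summits):
    print(chrom, start, end, sep="\t")  # BED-style output
```

The resulting BED intervals would still need to be converted to per-SNP annotation files and LD scores before the `ldsc.py` call.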
Fine-mapping integration
SuSiE fine-mapping credible sets from the Jansen 2019 GWAS will be intersected with state-differential IRF1 peaks to prioritise candidate causal variants within IRF1-bound enhancers. Variant-to-gene links will be drawn from Hi-C and eQTL colocalization (coloc) in microglia.
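The intersection step can be illustrated with a small interval lookup. The sketch assumes non-overlapping, 0-based half-open peak intervals and credible-set variants given as (chromosome, 0-based position, PIP) tuples. All names, coordinates and PIP values are hypothetical toy data, not project results.

```python
from bisect import bisect_right


def variants_in_peaks(variants, peaks):
    """Return the variants whose position falls inside any peak.

    peaks must be non-overlapping (for example, merged beforehand).
    """
    index = {}
    for chrom, start, end in sorted(peaks):
        starts, ends = index.setdefault(chrom, ([], []))
        starts.append(start)
        ends.append(end)

    hits = []
    for chrom, pos, pip in variants:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        i = bisect_right(starts, pos) - 1  # last peak starting at or before pos
        if i >= 0 and pos < ends[i]:
            hits.append((chrom, pos, pip))
    return hits


# Toy inputs (hypothetical)
toy_peaks = [("chr1", 500, 2100), ("chr1", 4500, 5500)]
toy_credible_set = [("chr1", 600, 0.4), ("chr1", 2100, 0.3), ("chr1", 5000, 0.2), ("chr2", 100, 0.9)]

for chrom, pos, pip in variants_in_peaks(toy_credible_set, toy_peaks):
    print(chrom, pos, pip)
```

For genome-scale inputs a dedicated interval tool would do the same job; the binary search here only keeps the example dependency-free.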
Upcoming data placeholders
The following chart containers will be populated with real data as the PhD progresses. They are designed to accept ChIP-seq peak BED files and enrichment outputs in JSON format.
Methods Summary
Technical details for the computational and experimental approaches used in this project.
snRNA-seq reanalysis (Murphy 2023)
Raw count matrices from Mathys 2019 (Synapse syn18485175) and the reprocessed matrices (Synapse syn51758062) were run through the scFlow QC pipeline. Cell types were annotated against the ewceData reference. Differential expression was tested on pseudobulk counts with edgeR (quasi-likelihood F-test). The pseudoreplication artefact is demonstrated by the correlation between per-donor cell count and naive DEG count (r = 0.99).
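The pseudobulk step can be sketched as follows. The idea is that counts are aggregated per donor and cell type, so that the donor rather than the individual cell becomes the unit of replication. Summing raw counts is an assumption of this sketch, and the function name and toy cells are hypothetical; the aggregated table would then go to edgeR.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum per-cell counts into one profile per (donor, cell_type).

    cells: iterable of (donor, cell_type, {gene: count}) tuples.
    Returns {(donor, cell_type): {gene: summed_count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for donor, cell_type, counts in cells:
        for gene, n in counts.items():
            totals[(donor, cell_type)][gene] += n
    return {key: dict(genes) for key, genes in totals.items()}


# Toy cells (hypothetical donors and counts)
toy_cells = [
    ("donor1", "Mic", {"TREM2": 2, "APOE": 1}),
    ("donor1", "Mic", {"TREM2": 3}),
    ("donor2", "Mic", {"APOE": 4}),
    ("donor1", "Ast", {"APOE": 5}),
]

for key, profile in sorted(pseudobulk(toy_cells).items()):
    print(key, profile)
```

However many cells a donor contributes, the donor adds exactly one observation per cell type to the test, which removes the dependence of DEG counts on cell number.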
GWAS meta-analysis (Jansen 2019)
Fixed-effects meta-analysis across four cohorts (PGC-ALZ, IGAP, ADSP and UK Biobank AD-by-proxy). LD score regression for heritability and genetic correlation. Conditional analysis with GCTA-COJO to identify independent loci. Gene-set analysis via MAGMA. Two-sample Mendelian randomization.
Prime Editing strategy
PE3 system (PE2 editor, i.e. Cas9 nickase fused to reverse transcriptase, + pegRNA + nicking sgRNA). pegRNAs are produced by in vitro transcription (IVT) or ordered as chemically modified synthetic RNA (Altogen). Delivery by RNP electroporation (Lonza 4D-Nucleofector). Selection via a puromycin resistance cassette flanked by loxP sites for subsequent Cre excision. Efficiency target: ≥15% allelic editing without enrichment.
Anti-FLAG ChIP-seq (CETCh-seq)
Anti-FLAG M2 magnetic beads (Sigma M8823). 5–10M cells per ChIP. Library prep: NEBNext Ultra II. Sequencing: 50 bp SE, ~40M reads. Peak calling: MACS3 (q < 0.05). IDR for replicate concordance. Motif enrichment: HOMER findMotifsGenome.pl vs. repeat-masked mm39/hg38.
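The q-value cut-off can also be applied, or tightened, after peak calling. The sketch below relies on the narrowPeak layout, in which column 9 holds -log10(q) and column 10 the summit offset from the peak start. The function name and the two example records are hypothetical.

```python
import math


def filter_narrowpeak(lines, q_max=0.05):
    """Keep narrowPeak records with q-value below q_max.

    Returns (chrom, start, end, summit, neg_log10_q) tuples, where
    summit = start + offset (narrowPeak column 10).
    """
    threshold = -math.log10(q_max)
    kept = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track")):
            continue
        fields = line.rstrip("\n").split("\t")
        neg_log10_q = float(fields[8])
        if neg_log10_q > threshold:
            start = int(fields[1])
            kept.append((fields[0], start, int(fields[2]), start + int(fields[9]), neg_log10_q))
    return kept


# Toy records (hypothetical): the second has q = 0.1 and is dropped
toy_lines = [
    "chr1\t100\t400\tpeak1\t50\t.\t5.0\t4.0\t2.5\t120",
    "chr1\t900\t1200\tpeak2\t10\t.\t2.0\t1.5\t1.0\t80",
]

for record in filter_narrowpeak(toy_lines):
    print(*record, sep="\t")
```

The summit coordinate returned here is the natural input for building fixed-width windows around each binding site.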
Software environment
# R
library(edgeR) # v3.40 — pseudobulk DE
library(EWCE) # v1.8 — cell type enrichment
library(Seurat) # v4.3 — scRNA-seq processing
library(scFlow) # v0.7 — QC pipeline
library(coloc) # v5.2 — eQTL colocalization
library(susieR) # v0.12 — fine-mapping
# Python / CLI
macs3 # v3.0 — peak calling
ldsc # v1.0 — LD score regression
sldp # v1.0 — signed LD profile regression
bowtie2 # v2.5 — read alignment
samtools # v1.17 — BAM processing
deeptools # v3.5 — ChIP QC & bigwig