The largest GWAS of all time (of all time!) dropped a few weeks ago to little fanfare, at least in these spaces. In a nutshell: height measured in 5.4 million participants, with 1.4 million SNPs per participant, so roughly 7.5 trillion data points if I’m not mistaken. If you submitted 23andMe samples, congratulations! You contributed to the (current) record holder for largest GWAS in history. In total, the study accounts for 40-45% of the phenotypic variance of height, and furthermore, the authors claim this is saturating: adding more samples won’t increase the fraction of heritability that they can account for.
What you can do with this data:
Generate some robust polygenic scores (PGS)
‘Risk prediction’ if you have a burning desire to know how tall someone will be (with large error bars)
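For anyone who hasn't seen one, a PGS is basically a weighted sum: each SNP's allele count (0, 1 or 2) times its estimated effect size, added up. Here is a minimal toy sketch of that idea on simulated data. Everything in it is made up for illustration (the function names, the number of SNPs, the effect sizes, the noise model); none of it comes from the paper. The noise is tuned so the score explains about 45% of the phenotypic variance, which is why the 'error bars' on any individual prediction stay large.

```python
import random


def polygenic_score(dosages, effect_sizes):
    """Weighted sum of allele counts (0/1/2) and per-SNP effect sizes."""
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_people=1000, n_snps=200, target_r2=0.45, seed=0):
    """Toy simulation: hypothetical SNPs, effects and noise, not real data."""
    rng = random.Random(seed)
    # Hypothetical minor allele frequencies and effect sizes.
    mafs = [rng.uniform(0.01, 0.5) for _ in range(n_snps)]
    effects = [rng.gauss(0, 1) for _ in range(n_snps)]

    scores = []
    for _ in range(n_people):
        # Two independent allele draws per SNP give a dosage of 0, 1 or 2.
        dosages = [(rng.random() < p) + (rng.random() < p) for p in mafs]
        scores.append(polygenic_score(dosages, effects))

    # Add non-genetic noise so the score explains ~target_r2 of the variance.
    mean = sum(scores) / n_people
    score_var = sum((s - mean) ** 2 for s in scores) / n_people
    noise_sd = (score_var * (1 - target_r2) / target_r2) ** 0.5
    phenotypes = [s + rng.gauss(0, noise_sd) for s in scores]

    r2 = pearson_r(scores, phenotypes) ** 2
    return scores, phenotypes, r2


if __name__ == "__main__":
    _, _, r2 = simulate()
    print(f"variance explained by the toy score: {r2:.2f}")
```

Real pipelines also have to deal with linkage disequilibrium, ancestry and effect-size shrinkage, all of which this sketch ignores.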
What you can’t do with this data:
Understand the phenomenon of ‘height’ in any meaningful way
Genetic engineering a la Oryx and Crake, which is how most people see using CRISPR to make designer babies.
Develop any kind of treatment or therapeutic that would improve the human condition.
So, to put it in some context: the criticism of GWAS has always been that these studies are large, expensive, rarely teach us anything about the underlying biology and explain little of the actual heritability (the ‘missing heritability’ problem). The ‘mechanistic’ biologists interested in curing disease or engineering biology generally dislike GWAS. It’s interesting in the way that astrobiology is interesting; good to know that planet XYZ792, 150 light years away, may have liquid water on its surface, but not really of practical use. What they (and I, being very much of this pedigree) missed is that PGS are of use if you’re in the business of embryo selection. I was corrected on that point a few years ago (conversation here if you want to see me being wrong). So if your goal is having really tall (or short!) children, this paper is good news for you, but you’ll probably still be dissatisfied with the current low throughput of embryo selection.
That being said, these criticisms are still salient and, to some extent, I think have been validated: saturating the SNP space with an absurd number of samples (for context: there are only 1.5 million Americans with type 1 diabetes! Good luck saturating that GWAS in our lifetime) only explains 45% of the variance, and this number will undoubtedly vary from trait to trait. Presumably the rest is coming from rare variants (this study excludes variants with a minor allele frequency (MAF) below 1%, which is quite a high cutoff), structural variants, or some genetic dark matter, implying that our heritability estimates are too high or not being driven by DNA (?).
I think this also has something to say about the omnigenic model. Even with a very high-powered study most of the SNPs are still clustering around genes with known functions related to growth, bone structure, etc. About a third aren’t near anything at all and we have no idea what they might be doing. But again, the low heritability explained would argue that rare variants may play a much larger role than previously appreciated, which may hew closer to Jim Lupski’s Clan Genomics model. And, this is much more speculative, but perhaps this is hinting at the biological underpinnings of ‘interindividual variation is larger than population level differences,’ i.e., rare variants (and the rarer end of SNPs) unique to your ‘clan’ have a similar or larger effect size than the very common SNPs shared by populations. Eager to see what people think or if they have any corrections.
By the way, how does one use superscripts around these parts? Would have been useful to clean up some of these asides with footnotes. Also, how to use tilde without getting strikethrough?