
GWAS in 2022

The largest GWAS of all time (of all time!) dropped a few weeks ago to little fanfare, at least in these spaces. In a nutshell: 5.4 million participants with measured height and 1.4 million SNPs per participant, so roughly 7.6 trillion data points if I’m not mistaken. If you submitted 23andMe samples, congratulations! You contributed to the (current) record holder for largest GWAS in history. In total, the study accounts for 40-45% of the phenotypic variance in height, and the authors further claim this is saturating: adding more samples won’t increase the fraction of heritability they can account for.
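
For anyone unfamiliar with the metric: ‘fraction of phenotypic variance explained’ is usually reported as the squared correlation (R²) between a linear genetic predictor and the measured trait in a held-out sample. A minimal sketch of that calculation, assuming plain lists of predicted scores and observed heights; the numbers in the usage line are toy values, not anything from the study.

```python
from typing import Sequence


def variance_explained(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Squared Pearson correlation between a predictor and the observed trait."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Covariance and variances; the 1/n factors cancel in the ratio, so they are omitted.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov * cov / (var_p * var_o)


# Toy usage: a predictor that captures a quarter of the variance.
print(variance_explained([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))  # 0.25
```

On that scale, 40-45% of variance corresponds to a correlation of roughly 0.63-0.67 between score and height.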

What you can do with this data:

  1. Generate some robust polygenic scores (PGS)

  2. ‘Risk prediction’ if you have a burning desire to know how tall someone will be (with large error bars)

  3. ???
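
On point 1: a polygenic score, in its simplest form, is a weighted sum over SNPs of how many copies of the effect allele a person carries, with the GWAS effect sizes as weights. A toy sketch, assuming genotypes coded as 0/1/2 effect-allele dosages; the SNP IDs and weights below are made up for illustration, and real scores add steps like clumping or shrinkage of the weights that this skips.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Sum of effect size * effect-allele dosage (0, 1 or 2) over the scored SNPs.

    SNPs absent from `dosages` are skipped; real pipelines usually impute them
    or substitute the population mean dosage instead.
    """
    score = 0.0
    for snp_id, beta in weights.items():
        if snp_id in dosages:
            score += beta * dosages[snp_id]
    return score


# Hypothetical per-allele effects (cm) and one hypothetical person's genotype.
weights = {"snp_1": 0.5, "snp_2": -0.2, "snp_3": 0.1}
person = {"snp_1": 2, "snp_2": 1, "snp_3": 0}
print(polygenic_score(person, weights))  # 0.8
```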

What you can’t do with this data:

  1. Understand the phenomenon of ‘height’ in any meaningful way

  2. Genetic engineering à la Oryx and Crake, which is how most people picture using CRISPR to make designer babies.

  3. Develop any kind of treatment or therapeutic that would improve the human condition.

So, to put it in some context: the criticism of GWAS has always been that these studies are large and expensive, rarely teach us anything about the underlying biology, and explain little of the actual heritability (the ‘missing heritability’ problem). The ‘mechanistic’ biologists interested in curing disease or engineering biology generally dislike GWAS. To them, GWAS is interesting in the way that astrobiology is interesting: good to know that planet XYZ792, 150 light years away, may have liquid water on its surface, but not really of practical use. What they (and I, being very much of this pedigree) missed is that PGS are useful if you’re in the business of embryo selection, and I was corrected on that point a few years ago (conversation here if you want to see me being wrong). So if your goal is having really tall (or short!) children, this paper is good news for you, but you’ll probably still be dissatisfied with the current low throughput of embryo selection.

That being said, these criticisms are still salient and, to some extent, I think they have been validated: saturating the SNP space with an absurd number of samples (for context: there are only 1.5 million Americans with type 1 diabetes! Good luck saturating that GWAS in our lifetime) only explains 45% of the variance, and this number will undoubtedly vary from trait to trait. Presumably the rest comes from rare variants (this study excluded variants with a minor allele frequency (MAF) below 1%, which is quite a high cutoff), from structural variants, or from some genetic dark matter, which would imply that our heritability estimates are too high or that heritability isn’t being driven by DNA (?).
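
To make the cutoff concrete: a variant’s MAF is the frequency of its less common allele in the sample, so a study that only keeps variants with MAF of at least 1% never tests anything rarer, however large its effect. A sketch, assuming genotypes coded as 0/1/2 copies of the alternate allele at one biallelic SNP; the function names are my own, not from any GWAS toolkit.

```python
from typing import Sequence


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF at one biallelic SNP from per-person alt-allele counts (0, 1 or 2)."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))  # two allele copies per person
    return min(alt_freq, 1.0 - alt_freq)


def passes_maf_filter(genotypes: Sequence[int], threshold: float = 0.01) -> bool:
    """Keep the SNP only if its minor allele frequency is at least `threshold`."""
    return minor_allele_frequency(genotypes) >= threshold


# 100 people, one heterozygous carrier: MAF = 1/200 = 0.005, so the SNP is dropped.
rare = [1] + [0] * 99
# Three heterozygous carriers: MAF = 3/200 = 0.015, so the SNP is kept.
less_rare = [1, 1, 1] + [0] * 97
print(passes_maf_filter(rare), passes_maf_filter(less_rare))  # False True
```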

I think this also has something to say about the omnigenic model. Even with a very high-powered study most of the SNPs are still clustering around genes with known functions related to growth, bone structure, etc. About a third aren’t near anything at all and we have no idea what they might be doing. But again, the low heritability explained would argue that rare variants may play a much larger role than previously appreciated, which may hew closer to Jim Lupski’s Clan Genomics model. And, this is much more speculative, but perhaps this is hinting at the biological underpinnings of ‘interindividual variation is larger than population level differences,’ i.e., rare variants (and the rarer end of SNPs) unique to your ‘clan’ have a similar or larger effect size than the very common SNPs shared by populations. Eager to see what people think or if they have any corrections.

By the way, how does one use superscripts around these parts? They would have been useful for turning some of these asides into footnotes. Also, how does one use a tilde without getting strikethrough?


Presumably the rest is coming from rare variants (the cutoff in this study is a minor allele frequency (MAF) of < 1% which is quite high), structural variants, or some genetic dark matter implying that our heritability estimates are too high or not being driven by DNA (?).

Or environmental factors (e.g. prevalent nutritional deficiencies)?

Incidentally, how do heritability estimates discriminate between genes that "causally" influence height (e.g. a gene that, when expressed, somehow biostructurally increases bone growth), and genes that dictate "unrelated" behavioral patterns which, in turn, affect the desired trait (e.g. craving/distaste for junk food)? Am I right in thinking that this is another major weakness of GWAS - even if you identify candidate genes, those genes might completely fail to transfer to, say, another population in which junk food doesn't exist?

So if you run a GWAS identifying N promising genes for affecting height on US citizens, you couldn't use that to reliably increase the height of European babies?

Incidentally, how do heritability estimates discriminate between genes that "causally" influence height

They don't. Heritability refers to a given population in a given environment. A gene's effect depends on the environment (you don't need a gene to make vitamin C if your environment has an excess of it) and even on how frequent it is in the population. In a food-scarce environment, a gene that increases bone growth might well have a smaller effect than genes that help you get more food. So of course a simple linear regression, which is what a GWAS is, won't tell you about many things.
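
The ‘simple linear regression’ point can be made concrete: a basic GWAS fits, for each SNP separately, the trait against allele dosage and reports the slope as that SNP's effect. A minimal sketch (real pipelines also include covariates such as ancestry principal components, omitted here). The two ‘environments’ are made-up toy numbers chosen to show that the same allele can get a very different slope depending on the population it's measured in.

```python
from typing import Sequence, Tuple


def marginal_snp_effect(dosages: Sequence[float], trait: Sequence[float]) -> Tuple[float, float]:
    """OLS fit of trait = intercept + beta * dosage for a single SNP."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


# Same SNP, hypothetical mean heights (cm) by genotype in two different environments.
genotypes = [0, 1, 2]
print(marginal_snp_effect(genotypes, [170.0, 172.0, 174.0]))  # (170.0, 2.0)
print(marginal_snp_effect(genotypes, [165.0, 165.5, 166.0]))  # (165.0, 0.5)
```

Both slopes are ‘correct’ for their own sample; neither says whether the allele acts on bone growth directly or through something like access to food.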

So if you run a GWAS identifying N promising genes for affecting height on US citizens, you couldn't use that to reliably increase the height of European babies?

Because junk food mainly affects width and not height, this is unlikely to be a problem. And it's not as if Europe is free of junk food either.

Or environmental factors (e.g. prevalent nutritional deficiencies)?

It's possible. People like to use height because, in the West at least, a very large fraction of the variation will be genetic. But who knows?

Incidentally, how do heritability estimates discriminate between genes that "causally" influence height (e.g. a gene that, when expressed, somehow biostructurally increases bone growth), and genes that dictate "unrelated" behavioral patterns which, in turn, affect the desired trait (e.g. craving/distaste for junk food)? Am I right in thinking that this is another major weakness of GWAS - even if you identify candidate genes, those genes might completely fail to transfer to, say, another population in which junk food doesn't exist?

You wouldn't, and that goes beyond GWAS. It's a fundamental problem with all the correlational genetic studies. Inferring mechanism is extremely difficult, and it's easy to be fooled by how you think about the trait rather than how biology thinks about the trait.

So if you run a GWAS identifying N promising genes for affecting height on US citizens, you couldn't use that to reliably increase the height of European babies?

I think it would probably work due to shared ancestry, particularly with their racial breakdown scheme. May not work as well in other races, although they do mention that the majority of their loci are shared.