How does access to the database work? Do scientists request specific types of data which are sent to them if their proposal is approved? Do they get temporary access keys to look stuff up?

I'm wondering how plausible it would be for someone to pirate/copy the database and replicate it elsewhere anonymously kind of like sci-hub. Which would be more or less plausible depending on who has how much access to the database.

Access is by study or study subset, so it would require either a massive distributed project of falsified requests or a big internal WikiLeaks-style breach. That would really suck for all the patients who were generous enough to share their data with the assumption it would be protected. I think it is a Bad Thing to be avoided.

Probably. In addition to the damage to those specific individuals, it would make it significantly harder to convince future patients to commit to similar projects. But on the other hand, it would reduce barriers to scientific progress and weaken the authoritarian control that lets elite institutions arbitrate which topics are and are not within the Overton Window of Science.

I assume a theoretical leaker would leak anonymously, but I guess if the data set is unique to one study then the source could be deduced, unless a bunch of data sets were combined and mixed together, maybe with some stochastic omissions to further obfuscate what the original data looked like. A deadman's switch might work, where the data gets uploaded to the internet and made public like 10 years later.

But you're right that there would also be the issue that nobody could publish results using the leaked data.

nobody could publish results using the leaked data.

You mean publications in scientific journals? ... One can publish anonymously.

This is very disappointing. Polygenic screening is going to need this kind of data linking genes and IQ if it's ever going to work well; it would be ironic and shameful if the NIH, by attempting to hide gene-to-IQ associations, ended up sabotaging the very groups such a censorship regime was meant to protect.

Society is fixed, but biology is mutable, and this is only going to become more true as AI foundation models bring more of biology under our explicit and direct control. If any one group did end up being lower-IQ than others, that is the group that has by far the most to gain from this kind of technology and (by extension) from this kind of research.

What would be the time and cost of creating a similar dataset?

I don't have access to a financial breakdown of the kinds of studies that feed into this database so I admit this is somewhat me talking out of my ass, but the sequencing costs involved are pretty low these days. I'd guess the cost of computation might even be comparable to running the SNP panels (maybe $25-50 per sample in bulk, or even $70-100 per sample in bulk if you just want full genome data).

The real cost is the army of nurses, doctors and scientists doing more or less unpaid labor for career advancement and altruistic reasons; doing this as a private company would be staggeringly expensive unless your scale is much smaller, and either way, any kind of payout from this data would be dubious. Getting the demographic and phenotypic data to associate with the genetic data is an enormous pain in the ass between IRBs, patients who are unreliable and uninterested in giving you data, making sure you're following all the regulations around PII, etc. Not to mention the fact that half the population hates the medical-industrial complex right now and is unlikely to cooperate on any kind of large-scale project.

Ideally, we'd all be genome sequenced at birth and our medical records would be entered into a centralized system where researchers could access de-identified data. The ML folks and data scientists would be able to tease out a remarkable number of associations that we just don't have the power for right now. Although maybe we've just circled back to square one, where that centralized system would decide what you can do with its data...

I have helped prep data for NHLBI. Basically it is just taking existing study data then standardizing and anonymizing it for inclusion into NHLBI. The idea is to retain the data that's already been gathered for future use.

I'm very much opposed to the restrictions FWIW.

I found this disappointing, but not infuriating, until I realized that NIH is a taxpayer-funded organization. Why can't this data be in the public domain?

I wonder if a suit could be brought over this.

I don't think that NIH wants to be in the Eugenics business, so they're taking steps to avoid it.

If you're not in the Eugenics business, you're in the Dysgenics business. You don't get to not play the game.

Yeah you do, it's called Natural Selection.

If you are not the one deciding what genes are "good" and what are "bad", you are in the game, but as a ball, not as a player or referee.

If an organization is "in the X business" just because data it produces could, by some third party, be used to justify X, isn't funding something that could cause Y an even stronger connection? Yet NIH funded viral research in Wuhan, increasing the risk of a global pandemic.

You could broadly say that NIH wants to be in the germs business but not the make-people-smarter business.

It's looking more and more like genetic engineering is the only viable way to close racial SES gaps. Ironically, the NIH is fighting to preserve racial inequalities while proclaiming its intent to narrow them.

Yes, we all know how you think science should work at this point. If all the expensive data collection is just used to launder arbitrary moral decrees, wouldn't reading seabird entrails work just as well for a fraction of the cost?

you

Me in particular, or just people who aren't HBD enthusiasts?