
Small-Scale Question Sunday for June 11, 2023

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.

Apologies for the naive question, but I'm largely ignorant of the nuts and bolts of AI/ML.

Many data formats in biology are just giant arrays, with each row representing a biological cell and each column representing a gene (RNA-Seq) or a measured parameter (flow cytometry). Sometimes rows are genetic variants and columns are various characteristics of each variant (minor allele frequency, predicted impact on protein function, etc.).

Is there a way to feed this kind of data to LLMs? It seems trivial for ChatGPT to parse a request like 'This is an experiment looking at activation of CD8+ T cells, generate me a series of scatterplots and gates showcasing the data', but much less trivial for it to parse the giant 500,000 x 15 (flow) or 10,000 x 20,000 (scRNA-Seq) arrays themselves. Or is there a way for LLMs to interact with existing software?
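
For a sense of scale, here is a minimal sketch of the rows-are-cells, columns-are-parameters layout described above, using synthetic data and hypothetical channel names (FSC-A, SSC-A, CD3, CD8, CD69 are illustrative, not from any real panel). It also computes how much memory the two array sizes mentioned would take if stored densely as 32-bit floats. Real scRNA-Seq matrices are mostly zeros and are usually stored in sparse formats, so the dense figure is an upper bound.

```python
import numpy as np

# Hypothetical channel names; a real panel would list its own markers.
channels = ["FSC-A", "SSC-A", "CD3", "CD8", "CD69"]

# Synthetic stand-in for a flow cytometry export:
# one row per cell (event), one column per measured parameter.
rng = np.random.default_rng(0)
events = rng.lognormal(mean=5.0, sigma=1.0, size=(1_000, len(channels))).astype(np.float32)


def dense_size_bytes(n_rows: int, n_cols: int, itemsize: int = 4) -> int:
    """Bytes needed to hold an n_rows x n_cols dense array (float32 by default)."""
    return n_rows * n_cols * itemsize


flow_bytes = dense_size_bytes(500_000, 15)      # 7.5 million values
scrna_bytes = dense_size_bytes(10_000, 20_000)  # 200 million values
print(events.shape, flow_bytes, scrna_bytes)
```

Millions of numbers written out as text would be far more than fits in a typical chat prompt, which is one reason the raw arrays are a poor fit for a chat interface even when the request itself is easy to parse.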

Why language models specifically? From a cursory Google search I found a couple of papers that may make more sense to you than to me:

https://www.sciencedirect.com/science/article/pii/S1672022922001668

https://www.frontiersin.org/articles/10.3389/fimmu.2021.787574/full

To overcome the challenges faced by manual gating, many computational tools have been developed to automate every step of the cytometry data analysis, including quality control (5), batch normalization (6, 7), data visualization (8–10), cell population identification (11–16), and sample classification (17–20). The tools utilize a wide range of computational methods, ranging from rule-based algorithms to machine learning models.
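
To make "cell population identification" a bit more concrete, here is a minimal sketch of the idea using plain k-means clustering (Lloyd's algorithm) in NumPy. This is a deliberately simple swapped-in technique, not one of the tools the quoted review cites; the two-marker data and the number of clusters are assumptions made up for illustration.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Assign each row of x (one cell) to one of k clusters with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    # Start from k distinct cells chosen at random as the initial centers.
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every cell to every center: shape (n_cells, k).
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its cells; keep it if it lost all cells.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Two synthetic, well-separated populations in a two-marker space
# (think "marker-low" vs "marker-high" cells).
rng = np.random.default_rng(1)
low = rng.normal(loc=[1.0, 1.0], scale=0.2, size=(200, 2))
high = rng.normal(loc=[4.0, 4.0], scale=0.2, size=(200, 2))
cells = np.vstack([low, high])
labels = kmeans(cells, k=2)
```

On clean, well-separated data like this the two populations come out as two clusters; real cytometry data is messier, which is why the dedicated tools exist.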

Do you want LLMs so you can "talk to" your lab results? Otherwise it's easier to analyse masses of data without the LLM middleman.

Yeah, exactly. There's a lot of grunt work involved in flow cytometry analysis, which is what I was thinking of more than scRNA-Seq. Machine learning is slightly overkill for most basic flow cytometry, because what you're doing with each gate is conceptually pretty simple. I tried to elaborate/clarify in this comment.
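
As an illustration of why each gate is conceptually simple: a rectangular gate on a two-parameter scatterplot is just a boolean filter on two columns, and sequential gating is the logical AND of successive filters. This is a minimal sketch; the `rect_gate` helper, channel names, event values and thresholds are all hypothetical.

```python
import numpy as np


def rect_gate(data, channels, x, y, x_range, y_range):
    """Boolean mask of events whose (x, y) values fall inside a rectangle."""
    xi, yi = channels.index(x), channels.index(y)
    return (
        (data[:, xi] >= x_range[0]) & (data[:, xi] <= x_range[1])
        & (data[:, yi] >= y_range[0]) & (data[:, yi] <= y_range[1])
    )


# Hypothetical channels, events and thresholds, chosen only for illustration.
channels = ["FSC-A", "SSC-A", "CD3", "CD8"]
data = np.array([
    [50_000, 20_000, 900, 800],  # lymphocyte, CD3+ CD8+
    [52_000, 21_000, 950, 50],   # lymphocyte, CD3+ CD8-
    [90_000, 80_000, 100, 30],   # falls outside the lymphocyte gate
    [48_000, 19_000, 40, 20],    # lymphocyte, CD3-
], dtype=float)

# Gates are applied in sequence: each one only keeps events that passed the last.
lymph = rect_gate(data, channels, "FSC-A", "SSC-A", (30_000, 70_000), (5_000, 40_000))
# A one-dimensional CD3 cut, written as a rectangle left open in the CD8 direction.
cd3 = lymph & rect_gate(data, channels, "CD3", "CD8", (500, np.inf), (-np.inf, np.inf))
cd8 = cd3 & rect_gate(data, channels, "CD3", "CD8", (500, np.inf), (500, np.inf))
print(lymph.sum(), cd3.sum(), cd8.sum())  # 3 2 1
```

The hard part in practice is choosing the gate boundaries consistently across samples, not applying them.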

You should send the grunt work to CCP, where EVE denizens can do it for fractions of a cent.

You've been repeatedly warned to stop doing low effort drive-bys like this that contribute nothing.

Banned for five days this time.