If you seriously feel that the ML community is gatekeeping, then I invite you to come join the community and propose ways to remove these gates. There are regular workshops hosted to address these issues and improve them. In just 4 days, there will be a workshop on "The Future of Machine Learning Publishing" https://inverseprobability.com/sorrento2026/future-ml-publishing.html.
There are also more-or-less annual workshops at NeurIPS/ICML on improving the publishing process in ML. Here is an (incomplete) ChatGPT-generated list:
(2010) : https://mloss-static.ml.tu-berlin.de/workshop/icml10/
(2018) : https://ml-critique-correct.github.io/
(2019) : https://ml-retrospectives.github.io/
(2020) : https://ml-retrospectives.github.io/neurips2020/
(2021) : https://neurips.cc/virtual/2021/workshop/21885
(2022) : https://ml-eval.github.io/
(2023) : https://sites.google.com/view/reconsidering-peer-review
I don't know of any academic communities that are remotely as open and accessible as the NeurIPS/ICML community. The NLP and CV communities have made some progress in these directions (due to the overlap of their members and the ML community), but even other branches of CS are way behind.
To me that is indicative of a level of quality, skill on the author, or even writing to a wider audience (pretty much a skill) instead of writing to the clique (poor intent).
I don't see anything wrong with writing papers for a "clique" when you are actively trying to help people come into the clique who want to. The ML community has pioneered open access to papers via JMLR/ICML/NeurIPS breaking away from the older venues in the 1980s that refused open access, and basically every graduate level textbook is available for free online.
I basically agree with everything in your last paragraph. Except that AlexNet was published at NeurIPS, not Nature: https://papers.nips.cc/paper_files/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html
I'm not sure the ML community agrees with you because there are prevalent conferences like CVPR or the NLP one I am blanking on. These are considered ML conferences, focused on a particular practical field.
No. People who publish in these conferences do not consider these ML conferences. Historically computer vision and NLP started out as fully distinct communities with almost no overlap with the ML community. Since about 2014 and the deep learning revolution, the lines have been blurred a bit, but they are still very distinct communities.
NeurIPS/ICML are basically considered the same conference, and any paper that could be accepted at one could also be accepted at the other without modification (beyond styling); the only meaningful difference is the submission deadline. Similarly, CVPR/ICCV/ECCV are all basically the same conference with different submission deadlines, as are ACL/EMNLP/NAACL. You cannot, for example, take a paper designed for NeurIPS/ICML and get it published at CVPR/ICCV/ECCV without major structural changes, and that's how we know they are part of different communities.
The division here is not academic/industry like you suggest. Bishop---who again is the prototypical author for probabilistic ML---works at Microsoft and you can find the textbook info at: https://www.microsoft.com/en-us/research/people/cmbishop/prml-book/. The division is based on the conference communities and who publishes/reviews where.
Unfortunately this is a constraint in industry, I have a job, there is work to get done. spending 8+ hours to digest a theory paper is a large impact on my time. Even if it leads to something useful.
Honestly, those papers shouldn't take 8 hours for a researcher to read. I had a pretty solid idea of what they were doing in <5 minutes, and I'd guess in <1hr I could fully understand everything about each paper.
The difference is that I am the target audience. Having done an ML phd, I've read >20 graduate level textbooks cover-to-cover and >1000 papers in great depth. If you haven't done this background work (which is fine---it's not for everybody, and I actively recommend my students not pursue this path) then these papers are not designed for you. You should accept this rather than complain that they are too hard or gatekeeping.
Nature paper ...
Without looking at this paper I agree it is shit. This paper is not a machine learning paper (and basically nothing in Nature is). The failure to replicate is a problem of the culture of medical science and not ML.
Just because a researcher uses a compiler in their research does not make them a "compiler researcher", and similarly, just because someone uses machine learning in their research does not make them a "machine learning researcher". Papers at PLDI are not targeted at people who are "trying to apply compilers" and papers at NeurIPS/ICML are not targeting people who are "trying to apply ML". (If you actually want to see a "mathy" paper, BTW, you should take a look at the papers at COLT... these are definitely not for you and these are definitely hard-core proper machine learning papers.)
Grassmann flows paper
This paper is definitely an ML paper, and honestly is pretty reasonable. It's not earth shattering, but it's exactly the kind of work that I would expect from a decent phd student (which the author is). It's pretty bread-and-butter ML to take a model and explore ways to reduce the representational complexity of the model. Grassmann manifolds are outside of standard ML math, but the explanation in 2.2 was easy to follow. The math here is no harder than the math in standard graduate textbooks.
Causal Foundation Models paper...
Again, this doesn't seem very mathy to me. The notation all looks like standard stuff from the Pearl textbook (admittedly not standard ML, but definitely standard for anything causal), and anyone who has worked through Bishop (which should be literally everyone with an ML phd of a certain age) should have no problem.
Having to look up 3 references to read and understand a paper seems absolutely reasonable to me.
Research papers are written for phds, and if you don't have a phd then you are not the target audience. Unreproducibility and over-mathiness of ML research is a common meme among the online ML-adjacent communities, but it's just not true. The ML community has done far more than any other community to encourage reproducibility and they've had a lot of success in doing so.
Source: I am an ML researcher with only a mediocre publication record. I've got my own gripes with the system that have led to my pub-record being mediocre, but reproducibility is not one of them.
Search amazon for "duplo marble run" and you get all sorts of cool knock-off duplo sets that are so much better than anything lego makes: https://www.amazon.com/s?k=duplo+marble+run
I'm wondering how the Chinese "lego-compatible" ecosystem fits in here? It seems like an easy win for both FIRST and some up-and-coming brand.
My oldest is 8 and has been playing around with knock-off technic legos for about 2 years. They're about 30% the price of name-brand, and the pieces themselves seem basically just as good. The instructions and designs are definitely not as good as lego, but this seems like a place where FIRST could put in Western-quality work for cheap.
Some of the knock-off stuff is legitimately better than lego too. We have ~5 sets of fake-duplo marble runs that are legitimately much more fun to play with than anything lego makes in the duplo age range (both for kids and adults).
Thanks for sharing this. The project is fascinating to me from a technical perspective.
I'm currently working on a make-like build system for automating LLM workflows like yours. I've only been using it for internal projects so far, but I might try putting together an example that outputs material compatible with your system. So I looked into some of the technical details, and I have a few questions for you.
Q1
It looks like each novel is stored in its own git repo. I dug through your https://github.com/JohnQPulp/CupOfGold repo and I think I understand how all the info is stored. My first question is: is the annotation format you use in pulp.txt standard for visual novels or something you invented? Specifically, in the lines
All afternoon the wind sifted out of the black Welsh glens, crying notice that Winter was come sliding down over the world from the Pole; and riverward there was the faint moaning of new ice. It was a sad day, a day of gray unrest, of discontent.<e>"Winter... of discontent" opens Steinbeck's first novel. That's some neat, Shakespearean <book>The Winter of Our Discontent|career bookending</book>.</e>
b=wales
The gently moving air seemed to be celebrating the loss of some gay thing with a soft, tender elegy.
n:r=Robert Morgan; n:m=Mother Morgan; n:g=Gwenliana; n:h=Henry Morgan
I'm wondering if the html-like tags and the b=wales metadata stuff are formally documented anywhere?
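To make the question concrete, here is a minimal Python sketch of how I'm currently guessing those lines parse. Everything about the semantics is an assumption on my part and not documented behavior: I'm assuming a line made up only of `key=value` pairs separated by `;` is metadata, that keys like `n:r` are namespaced short tokens, and that `<e>...</e>` wraps an editorial note attached to the prose. The names `parse_line`, `PAIR`, and `NOTE` are hypothetical and not taken from your repos.

```python
import re

# Assumption: a metadata pair is `key=value`, where the key is a short
# lowercase token, optionally namespaced with a colon (e.g. `n:r`).
PAIR = re.compile(r"^([a-z]+(?::[a-z]+)?)=(.+)$")

# Assumption: <e>...</e> wraps an editorial note attached to the prose.
NOTE = re.compile(r"<e>(.*?)</e>", re.DOTALL)


def parse_line(line):
    """Classify one line of a pulp.txt-style file as metadata or prose."""
    parts = [p.strip() for p in line.split(";")]
    pairs = [PAIR.match(p) for p in parts]
    if all(pairs):
        # Every ';'-separated chunk looked like key=value: treat as metadata.
        return {"kind": "meta", "pairs": {m.group(1): m.group(2) for m in pairs}}
    # Otherwise treat it as prose and pull out any <e> notes.
    # Inner tags such as <book>...</book> are left untouched inside the note.
    notes = NOTE.findall(line)
    text = NOTE.sub("", line).strip()
    return {"kind": "prose", "text": text, "notes": notes}


if __name__ == "__main__":
    print(parse_line("b=wales"))
    print(parse_line("n:r=Robert Morgan; n:m=Mother Morgan"))
```

This guess would misclassify a prose line that happens to look like `x=y`, which is part of why I'm asking whether there is a real spec.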
Q2
These two repos look like how you're generating the actual HTML from a book repo:
But what are you using to automate the actual git repos of the books? Could you walk me through that workflow a bit? (This is the part that I might try automating with my own tool.)
For example, I don't see anything in the book repos that looks like it is designed to enforce consistency (like a character sheet). All the material in the repo looks more like a final product than intermediate developer/artist "documentation". Do you generate any intermediate files like this?
Q3
What's the approximate cost for the full conversion? How much time does it take? (both manual and API/compute)
I'll second that I also appreciate these posts :)
I always thought mission was equally required for both men and women, and only adult converts "get out" of mission.
In my reply to @clo above I mention having just read an easy-greek reader of the Iliad... but since you mention Sherlock Holmes... I feel obligated now to mention that I just received an attic greek translation of Sherlock Holmes from amazon this week. I've been really enjoying reading these "modern ancient greek" stories recently.
I just read Ho epi Troian Polemos. It's an easy-greek reader that tells the story of the Iliad using only ~400 greek words. It's designed for someone who has had about 1 semester of greek studies.
If you're actually interested enough in the books to re-read a translation, then I recommend just going straight to the original language!
I teach computer science and so I look at a lot of people's hands as I watch them type on the keyboard. I'd guess that about 1/3 of female students have nails long enough that they cannot type comfortably on a keyboard, and this meaningfully impacts their performance in my classes. (Foreign-born women do not have this problem; only American-born women.) I don't see any painted nails though, just grotesquely large nails.
The median parental income at this school is $500k/year, so these are pretty upper class women.
It's time for the daily Two Minutes Hate against translators/localizers/paraphrasers who take unjustified liberties with the source material. "Said" rather than "had said"? "Old gentleman" rather than "gentleman"? Commas rather than em dashes? No repetition of "my son"?
This week I was reading my bible in greek and noticed that Matt 15:17 uses the word ἀφεδρῶνα (toilet). Sadly, none of the popular translations like NIV/ESV actually include this word in their translation :(
Gell-Mann Amnesia is related. But the effect I'm thinking about / worried about is different.
They don't; it's all informal. AAAI is the closest thing and has a lot of overlap. Basically no one is a member of IEEE or ACM.