ControlsFreak

5 followers   follows 0 users   joined 2022 October 02 23:23:48 UTC

User ID: 1422


I think it is abundantly clear that I am unusually willing to engage in debate and spend an immense amount of effort in elaborating on my arguments, usually in good faith. That includes people I earnestly disagree with or those who dislike me.

I think that is evidence that someone who still manages to annoy me into disengaging is more likely to be in the wrong than I am.

I would also like to note that if we are inferring characteristics of our interlocutors from our own demonstrated efforts, I find the timing of that inference convenient, because it is hard to say that I am unwilling to put in significant effort in good faith to understand vague terminology. I actually often thrive in environments where we don't have strict technical definitions, and we're trying to work through how to construct terminology that most closely matches our vague intuitions. There is one particular term at work for which, for the past 4-5 years, I've been saying I only have a "working definition", because it still has plenty of vagueness around the edges and we're still learning stuff about it.

I am perfectly happy working with you on some amount of vagueness in your terms. But, as stipulated above, the natural inference is that you've given me nothing to work with. Not even an attempt.

You can't even get the rock to admit that your position is internally consistent and coherent

It's kind of hard to admit that something is consistent and coherent when you can't even say what the terms mean. How would one check? "Blurfs are bleep." Is that consistent and coherent? How can one know, unless they know what those things are? At least when rocks use words, we know what they mean. (Heh, trivially true, since rocks don't use words.)

you have a distressing tendency to vanish whenever I make an effort post calling out a bad argument you make

Is that better or worse than staying around long enough to declare the conversation over due to difficulties in your position and then insulting people to dismiss them when other difficulties are found in related positions?

I agree with your description of what Schooner actually did. What I am saying is that the framework that Schooner put in place is likely the correct framework within which to analyze the topic. Schooner does not directly answer our question, either, but my sense from all the follow-on cases is that if we're going to make conceptual sense of the matter, if we're going to construct a theory of the law, as Justice Barrett is always looking for, I'm not sure how we do that in any framework other than Schooner.

I don't think any courts have really analyzed the case of illegal aliens in detail, but I think if we look at a few other cases, we can at least get some ideas about what is plausible and how we might go about it.

The first one that I would mention, as @RoyGBivensAction says, is Indians. This is because we do have reasonably conclusive case law on it. Yes, Indians are Weird, yo, and I wouldn't appeal to any specifics there to actually tell us what the answer is for illegal aliens, but it reminds us of how we have to think of the framework. The jurisdiction of the sovereign is absolute, unless he consents to abridging it in some way. Whatever way that has weirdly been done for Indians, it is the case that they can, for example, be prosecuted for offenses, but also do not possess the "direct and immediate allegiance", not even one that is "local and temporary", that is necessary to be considered "subject to the jurisdiction thereof". Regardless of whether one mentally shoves "political jurisdiction" into that phrase, I really don't think one can just look at the case law and conclude anything other than that these two things are at least possible simultaneously.

Caveat paragraph: it is entirely possible that the Indians are Weird, yo caseline is just wrong. It is not impossible to think that the Court today could, for example, just state that Elk v. Wilkins was incorrectly decided, and that Indians either have to have birthright citizenship from 14A or be entirely immune from prosecution. This is possible! But absent that, I have to admit that these two things do not always follow in lockstep. So, we have to figure out how they fit into the framework. My sense is that one way to think about this is that the sovereign has consented to some amount of reduced jurisdiction, and this involves both some fuzzy amount of qualified allegiance (just to use the term that came up, but other descriptors may be fine) and that the sovereign has also consented to a limited amount of immunity in specific, qualified ways. I don't know that these necessarily work in lockstep, either; they may just be the outlines of what the sovereign has, in fact, consented to, factually.

Of course, as I said, because Indians are Weird, yo, that doesn't necessarily mean much about the specifics of the illegal alien case. Therefore, I would move to looking at the two closest other categories that we have. One of these categories has generated plenty of case law (some of which many folks, including Respondents, think is controlling), while for the other case that I think is close, we have just a short discussion from Schooner.

What we're looking for, in the language of Schooner, is people who are almost like lawful temporary visitors, but who don't have any license under which they enter, implied or otherwise. In fact, they have an express prohibition on entering. Obviously, one near miss is lawful temporary visitors. The other place to look for a near miss is any discussion of any hypotheticals where anyone else enters without a license. The only one I've seen anywhere in all of that case law is basically a relatively short passage in Schooner that talks about what happens if an Army enters without the consent of the sovereign.1 It contrasts this with the case of an Army entering with the consent of the sovereign. I'll quote that section in full again.

Without doubt, a military force can never gain immunities of any other description than those which war gives by entering a foreign territory against the will of its sovereign. But if his consent, instead of being expressed by a particular license, be expressed by a general declaration that foreign troops may pass through a specified tract of country, a distinction between such general permit and a particular license is not perceived. It would seem reasonable that every immunity which would be conferred by a special license would be in like manner conferred by such general permit.

We have seen that a license to pass through a territory implies immunities not expressed, and it is material to inquire why the license itself may not be presumed.

It is obvious that the passage of an army through a foreign territory will probably be at all times inconvenient and injurious, and would often be imminently dangerous to the sovereign through whose dominion it passed. Such a practice would break down some of the most decisive distinctions between peace and war, and would reduce a nation to the necessity of resisting by war an act not absolutely hostile in its character, or of exposing itself to the stratagems and frauds of a power whose integrity might be doubted, and who might enter the country under deceitful pretexts. It is for reasons like these that the general license to foreigners to enter the dominions of a friendly power is never understood to extend to a military force, and an army marching into the dominions of another sovereign may justly be considered as committing an act of hostility, and if not opposed by force, acquires no privilege by its irregular and improper conduct. It may, however, well be questioned whether any other than the sovereign power of the state be capable of deciding that such military commander is without a license.

The passage is trying to consider the possibility that such entry may not in fact be hostile; it may not in fact be part of a declaration of war. And it's tricky. Remember my discussion of what I felt was missing in Ex Parte Quirin. How do you deal with careful distinctions between official Armed Forces, which may or may not have hostile intent; nonofficial folks who are indeed belligerents; and what counts as a "hostile occupation" or not? It's unsatisfying as of yet. But at the very least, we can see that Chief Justice Marshall wanted his framework to reason about unlicensed entry of Armed Forces, even perhaps without de facto hostile intent.

One might think that he concedes that hostile intent is inferred anyway, and that may be true. But then, I ask, in Chief Justice Marshall's framework, would one ascribe to them "temporary and local allegiance"? I think not. Nevertheless, they do not seem to gain any immunities. The sovereign has not consented to their entry, has not granted them any explicit/implicit license, has not consented to granting them "temporary and local allegiance", and has not consented to any limitation via immunities. As such, if one were to ask the question as to whether a birth occurred during such a hypothetical event to one of the members of this Army, would the child be a US citizen? I feel like I sort of have to believe that the answer is no, if I believe that this framework is the correct way to think about it.

Of course, yet again, this does not answer the actual question of illegal entry of non-Army regular folks. It's a near miss, but it's an exercise in working the framework.

The other near miss is temporary visitors, with the passage I have certainly quoted, and which WKA used for its entire analysis of this near miss case. My honest opinion, not trying to reach any result, just my impression of the words on the page, is that the implied license for their entry is critical for the "temporary and local allegiance" that is imputed to them. If, as it seems to me, this implied license is a critical factor, then the absence of it leads to the inference that folks who lack any license whatsoever, express or implied, lack the "temporary and local allegiance" that lawful temporary visitors have. The sovereign has not consented to their entry, has not granted them any explicit/implicit license, has in fact expressly rejected giving a license, expressly prohibited their entry, has not consented to granting them any "temporary and local allegiance", and has also not consented to any limitation concerning prosecution via immunities.

Perhaps something is wrong there, but that is genuinely my best understanding of the framework. We can see that such a combination of conclusions is at least possible given some of the other cases, and it appears to be the most natural interpretation of the theory for the specific case.

1 - EDIT: I suppose it also gave the case of when another sovereign enters without consent. I think that's even further from a near miss, and I'm really not sure that it's all that helpful in either direction. That one is very weird.

Thank you! It's a chonker.

Cool. Thanks! I'll try to remember that, so I won't accidentally ask again the next time I try to make a main post, which could be months from now. Mods! Hello! Not @'ing you or calling you out or anything, but if you happen to see this. :) Thank you!

Tech support question. I just posted a new main post. Logged in, it shows fine. If I open TheMotte in incognito, it doesn't show. If I open the direct link in incognito, it says [Deleted by user], which, uh, I don't think I did? Do all main posts need to be approved? Did I break something?

We have designs

As I wrote (and just linked to):

Stepping back and taking a very broad view, there are several steps to the research, development, and engineering of a system. Generally, one begins with physical principles. With those physical principles, one can compute theoretical limits. One can also sketch a concept of operation based on those physical principles. Oftentimes, at that point, one can still handwave away many practical concerns and compute how close a concept could, in theory, get to the raw theoretical limits. As one progresses, one may include an increasing number of more real-world difficulties.

For nuclear rocketry, we are not building on a blank slate, as though no one has ever started down this path at all, as though we simply have no idea what the theoretical limits are or what the concept-based performance could look like (still handwaving away many practical considerations). People have been doing this work and publishing it for half a century.

Sigh.

What we DO know is that most fusion systems provide much better specific impulse and exhaust velocity than chemical rockets can.

As I just wrote, and you ignored:

I'll note here that you've already betrayed that you know nothing about what you speak of. We already have some other technologies that are "massively more efficient", but you're not talking about them. They have trade-offs, because yeah, trade-offs exist. When discussing them, we talk about standard performance metrics and how that corresponds to capabilities.

But I'm glad to hear that you've finally admitted that there are at least two useful performance metrics that we can actually talk about (specific impulse and exhaust velocity). So, uh, about how much better, in theory? (A range is perfectly fine here.) How does that compare to other existing systems? Are there tradeoffs with other performance metrics?
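Since those two metrics come up together, it's worth noting that they are really one metric in different units: effective exhaust velocity is just specific impulse times standard gravity. A minimal sketch of the conversion; the fusion Isp figure is an illustrative assumption for comparison, not a number from this thread:

```python
G0 = 9.80665  # standard gravity, m/s^2

def exhaust_velocity(isp_s):
    """Convert specific impulse (seconds) to effective exhaust velocity (m/s)."""
    return isp_s * G0

# Chemical engines sit near Isp ~450 s; fusion concepts in the literature
# are often quoted orders of magnitude higher (20,000 s here is hypothetical).
chem_ve = exhaust_velocity(450)       # roughly 4.4 km/s
fusion_ve = exhaust_velocity(20_000)  # roughly 196 km/s
```

The point of having a number, rather than "massively better", is that it can then be plugged into the standard performance analyses.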

The physics has already been worked out!

This was literally my claim, which you rejected.

We do not need to go over it.

This is, in a word, stupid. Perhaps in two words, monumentally stupid. It is nothing other than a self-declaration that you intend to remain ignorant of what you speak of.

a general point that using a massively more efficient power source is superior

Sure. Are there trade-offs? How much more efficient? Is there a performance metric for that? What does it look like? What does that physically mean in terms of capabilities?

I'll note here that you've already betrayed that you know nothing about what you speak of. We already have some other technologies that are "massively more efficient", but you're not talking about them. They have trade-offs, because yeah, trade-offs exist. When discussing them, we talk about standard performance metrics and how that corresponds to capabilities.

Frankly, I was saying that your prior comments are basically incoherent.

Instead of taking the experts at their word, we should debate the physics of fusion rocketry as amateurs

...lol, you don't know what my profession is.

Look, I didn't want to wave credentials around, but the reason why I got into this discussion is because you were showing that you are ignorant of the physics involved. What I'm arguing is that you need to learn a little bit about it before you make claims about it. Especially before you make dismissive claims where you say that we don't even need to consider the physics involved. That we don't even need to think about concepts like the rocket equation, specific impulse, thrust, delta-v, etc.

My claim was that we already had pretty decent published literature on various not-yet-existing propulsive methods, that this literature uses the standard physics and the standard methods of analysis and standardized performance metrics. You were saying that we should just ignore all that. That it was wholly irrelevant.

Nah, dawg. You need to have a basic understanding of the domain you want to speak on. If you're going to now agree that we can go look at the published literature (at least I think this is what you want to go for; you just called out an "internet search", so maybe you're going to crackpot sites) and that doing so is not wholly irrelevant, then one requires a sufficient understanding of the basic physics and terminology to have any clue what it is, and is not, saying.

Ah, so you think that there is something to be looked at in terms of designs, and at least something about distances/payloads, yes? And you think that this information can be found and understood with just a quick internet search? You just think that these internet sources don't use first principles reasoning, conceptual designs, concepts like specific impulse, thrust, and delta-v, etc., I guess. Those things are wholly irrelevant to your point. Am I understanding you correctly?

As long as it's significantly better than chemical rocketry, which it is, then that makes it a better option for long-range spaceflight, since it can do the work and chemical rockets can't.

Let's use another one of your examples. Cars are much better than horses. Does that imply that cars are a better option for long-range spaceflight? If you think this statement doesn't quite make sense, try to explain why without reference to any first principles, conceptual designs, concepts like specific impulse, thrust, and delta-v, etc.

Alternatively, to home in really narrowly:

since it [fusion rocketry] can do the work

How do you know that it can do the particular type of work you're asking it to do? Wouldn't it be nice if you had some reasoning, from first principles and/or conceptually, which could inform you as to whether it is plausibly up to the type of task you're asking of it? Some sort of check to see if you're accidentally expecting a car to go to the moon, just because it's better than a horse?

Stepping back and taking a very broad view, there are several steps to the research, development, and engineering of a system. Generally, one begins with physical principles. With those physical principles, one can compute theoretical limits. One can also sketch a concept of operation based on those physical principles. Oftentimes, at that point, one can still handwave away many practical concerns and compute how close a concept could, in theory, get to the raw theoretical limits. As one progresses, one may include an increasing number of more real-world difficulties.

For nuclear rocketry, we are not building on a blank slate, as though no one has ever started down this path at all, as though we simply have no idea what the theoretical limits are or what the concept-based performance could look like (still handwaving away many practical considerations). People have been doing this work and publishing it for half a century.

Do you agree or disagree with this general picture?

Why would we need to escape the rocket equation? It's like going from horses to cars.

Neither horses nor cars are propelled by the physics of the rocket equation. The rocket equation is an exponential (or a logarithm, depending on which way you arrange it). It provides a hard limit on performance that cannot be hand-waved away. You say, rightly, that future technologies can perform better. This is true. How much better? What are the numbers that we can plug into the rocket equation in order to compare to the other numbers that we can plug into the rocket equation? It is only then that we can really get a sense for the scale of how much better future technologies can be.
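To make the exponential concrete, here is a minimal sketch of plugging numbers into the rocket equation. The Isp values and the 50 km/s mission target are illustrative assumptions chosen to show the scale, not figures from this thread:

```python
import math

G0 = 9.80665  # standard gravity, m/s^2

def mass_ratio(isp_s, dv_m_s):
    """Tsiolkovsky rocket equation, inverted: required initial/final mass
    ratio (m0/mf) to achieve a target delta-v at a given specific impulse."""
    return math.exp(dv_m_s / (isp_s * G0))

target = 50_000.0  # m/s: an illustrative fast-mission delta-v

# Chemical (Isp ~450 s): the exponent is large, so the ratio is astronomical.
chem = mass_ratio(450, target)

# Hypothetical fusion-grade Isp (~20,000 s): the exponent is small,
# so the required mass ratio is modest.
fusion = mass_ratio(20_000, target)
```

This is exactly the "check to see if you're expecting a car to go to the moon": the same equation, fed with each technology's numbers, tells you whether the mission is even in the realm of the feasible before any engineering detail enters the picture.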

PRISM is a code name for one of the tools that Section 702 authorized.

I think an equivalent way of saying this is, "Section 702 is the statutory authorization for tools like PRISM."

(I believe your comments are blurring the distinction between being something and authorizing something.) The fact that PRISM is a code name and was classified justifies calling it a "black program". Also, I interpreted the phrase line item from OP to be budgetary, since I have only ever heard that term used in a budgetary context before.

Perhaps blurring occurred. I would contend that the blurring occurred here:

As for the statutory authorizations, they were black programs and their replacements are almost certainly black. There's no statutory line item for PRISM or XKEYSCORE any more than there was for the SR-71, and there won't be for the replacements either.

Is this talking about "statutory authorizations"? Or is it talking about line items, which you interpret to be budgetary?

I contend that if it's talking about "statutory authorizations", we obviously have it. I will concede that if all that Nybbler was saying was that we don't have public budget lines, then sure, we don't have it. But I would also contend that that's pretty much entirely beside the point when the conversation is about what they have legal authority to do and what we know about what their programs actually did.

I also think that whether proper names are in the statutory text isn't particularly salient for whether we have a statutory authorization or are able to understand how something works. A couple of prompts to some AI on the topic, and it appears that the term "Head Start" was nowhere to be found in the Economic Opportunity Act of 1964, but nevertheless, a program by the name "Head Start" was understood to be authorized by this statute. That does not imply that this program was so 'black' that we can't understand anything about how it works.

Look, I'm normally not this belligerent, but Nybbler in particular has a history of being willfully ignorant on this topic. Over and over again.

PRISM is the name of one of the major components of FISA Section 702. This is public law. I don't believe the claim was that the budget is public. After all, the sentence immediately prior in the original comment was:

As for the statutory authorizations, they were black programs and their replacements are almost certainly black.

That speaks specifically about authorizations (which FISA Section 702 is), not appropriations.

They were intercepting the lines between the Google front end servers and the GMail backends

Facts not in evidence. We've been over this. There was one slide, where this was presented as an idea. There was none of the information you would have expected on a slide like that about implementation details, authorities, measurement of flows, nothing. We have literally zero actual evidence that they actually did this. It is entirely possible that they did do this, but we just frankly don't know. If they did do this, it would not likely be related to the two major programs that were controversial from Snowden leaks, if, ya know, you had any understanding whatsoever of how those programs worked. Showing again that you don't know anything about these programs and are just free-associating.

That they then pretended they didn't see the stuff that didn't relate to a targeted individual doesn't mean they didn't have it.

I'm not sure which actual claim this is referring to, because it's too vague. You might be trying for something that was real, but I can't tell, because you're again just free-associating rather than speaking about any genuine knowledge of the leaks or the law or literally any real, actual information that we have.

They use a very non-standard definition of the term "collected" to claim they didnt "collect" the data that didn't relate to targeted individuals, but they went through all of it.

This, I believe, is pretty much just false. They have a pretty clear definition of when they "collect" information, and they're pretty clear that they do collect information from people who aren't the targeted individual. They talk about this very explicitly.

You just have literally zero clue how any of this works, because you've persistently refused to educate yourself at all. It's really really really obvious and really bad. The last time we did this, I painstakingly forced you to the point of demonstrating that you were capable of downloading a document (yay! you can use a computer!), but you immediately went on to demonstrate that you were incapable of reading it.

There's no statutory line item for PRISM

I'll just jump in here to say that this is the first outright false thing in this comment. The rest of your comment is just admitting to the truth of my comment. You don't actually know the differences; you don't actually know how they worked; you don't know the follow-on history, how the statutes changed, etc.

they were tapping the major email providers and Hoovering up all the metadata AND content

This was a close second to being outright false. Actually, I'll probably say that it's outright false. You could make modifications to it to be true, but as stated, it's outright false.

The propaganda here is by those pretending this isn't a big deal.

Look, I'm not pretending it isn't a big deal. Of course it's a big deal. That's why you should put in the effort to understand it instead of continuing to be false false false.

mass domestic surveillance

What do people even mean by this anymore?

Precisely. I am once again reminding people that the vast majority of folks can't even identify the names of the two main programs that were contentious, much less say anything about how they worked... and even further less about how they were different from each other.

Do people think they stopped after the Snowden leaks?

...and yet somehow even further less about the subsequent history of those two programs. What follow-on statutory authorizations looked like or whether each of the programs continued.

No, people mostly don't have a clue. They absorbed a bunch of propaganda over a decade ago, never made coherent sense of it at the time, and are now just half-remembering faint glimmers of propaganda from days past.

I suppose the snark is aimed at me. So I guess I'll start with some snark right back. My comment included multiple very hopeful things, and a big part of that was due to the linked post's discussion of Alethia's performance on First Proof, which you would have known if you had read either of them. This is now just the arXiv version of it.

I am still very hopeful, and it's nice to see the actual proofs that were generated. It is quite unfortunate that I can't beat on the system with my own problems to get a personal feel for it. I also endorse pretty much everything @PokerPirate has said below up to this point.

Obvious remaining concerns are obvious. It still generated wrong proofs, when evaluated by experts. Many many many hours of evaluation work. That can plausibly be managed; wrong proofs are definitely out there. I will repeat my related concern that the ballgame is quite different when you're working on a problem where you don't already have a solution (that is, where you don't know that a solution exists). And as you mention, cost, question mark? They're unclear about it, and what they do have looks scary, especially if you look at the numbers floating around in their prior papers (also untethered from an absolute scale, but wild in a relative scale).

Thanks again for the kind and thorough response.

My impression is that the models can just about do what you want them to do, but with significant frustration and wasted time on your part.

I would quibble with this. What I want them to do is to be able to help me with analysis that I don't already know how to do. I wrote it this way a couple days ago:

When I've known the solution, I can probably get it there. When I've not known the solution, I have to say that at best, it's been good at helping me find other results in the literature that might be helpful. It is, indeed, labor-intensive and quite frustrating to have to carefully pore over every detail, trying to see if it went astray when generating a mountain of text. Then, when you find something wrong, maybe not even having verified the rest of it, it'll happily produce another mountain of text, and it feels like you're starting from square one. When you're already confident that you know a method will work, then it's mostly just a test of will to see if you can get it to figure it out. When you don't know, the question of whether you potentially waste mountains of time on what may be a dead end or just proceed on your own becomes far more difficult, and you have to make that decision repeatedly along the way.

The reason why I was thinking about the particular flight mechanics problem for this thread here was that I wanted to further drive in that wedge that I think is between the folks who think that most knowledge work is already automatable and those who think that it can be useful if you already know what you're doing. Thus, even a problem where I'm quite confident that I could do the analysis, I predicted that the LLM would fail on its own without significant knowledge-work-educated input. To me, this means that there are two significant steps that the models must overcome before we're thinking about a possible world where basically all knowledge work is automatable.

Maybe as an aside, I'm able to leverage collaborators at multiple levels, from profs to post-docs to PhD students to MS students to undergrads. My experience has been that coming up with the right problem to solve is actually a huge part of the battle. During that process, I'm always considering if I can spin out sub-problems or related problems that may be useful to consider on the way to what we really want (or sufficient contributions in their own right). When considering them, I mentally bin them into a hierarchy. If it's a problem that I'm near 100% sure I could just sit down and do, perhaps I've already done all of the pieces, but never done quite that variant before, and now it seems like that variant might be of interest, it's a plausible candidate to go to an undergrad. On the other end, the vaguest, most conceptually-dense questions, I may reserve for conversations just with profs. There is sometimes something to be said for not "distracting the students" by letting them spin their wheels on something that they're not likely to really contribute on anyway. I have somewhat of a sliding scale for the in-between students/post-docs; I've put words to the basic contours of that scale before, but I don't think I'll bother here, because it's not the most important. There is a possible slight correction factor available if I've been working with a student for long enough to know that they're substantially better/worse than the average student in their category.

In any event, perhaps if I had listed out all of the steps of this scale, I'd have even more than two significant steps that models must overcome, but for my purposes in this thread, I was trying to pick a problem that was pretty directly in the realm of, "I could just give this to an undergrad."

Could I bang on an LLM long enough (the amount of will required being dependent on the particular problem) that it eventually finds its way to the answer that I already knew was the answer all along? Yeah, probably. Is this a huge upgrade from GPT-4? Honestly, I don't know; I gave up back in those days rather than ever really try to beat it into submission.

...but this still is just not really useful, at least not if the goal is to actually automate the knowledge work piece. Sure, it's potentially useful once I've already done all the knowledge work, and I'm sitting down to actually code the thing that I definitely know how to code. But more likely, at this point, it's going to be useful to the student who I've asked to code the thing, because I'm probably not coding it myself, anyway.

I don't really have a good timeline or prediction for if/when some sort of AI system will cross these various thresholds. I'm still hopeful on the straight math side, as I said in my comment a couple days ago. But if the purpose of this exercise here is to find problems that cause someone to update, I was hoping that, "Here's a problem that I'm comfortable that I could give to an undergrad and pretty confident the LLM will fail," could pull you at least epsilon away from thinking that quite so much of knowledge work is currently automatable or perhaps epsilon more cautious about believing that it's quite so imminent.

Fair enough. Thanks for clarifying.

Do you have any thoughts you'd be willing to share on what I wrote concerning how much knowledge work currently has to be input for things like the task I had in mind? I suppose I wasn't entirely clear, but I think it would likely fail to do the analysis task on its own. For clarity, this is a task where I thought, "It might be weird enough that no one's done it yet, but it's close enough to the standard stuff that I could almost certainly give it to a student who did well enough in their flight mechanics course, and they could almost certainly just do it." That seems to have been partly justified in that I found a publication in which a student did just that (skimming the paper, the analysis seems about on par with what I had expected; I guess my flaw was thinking the idea was sufficiently 'weird', and I suppose it says something about the state of aerospace that someone out there has done almost every basic variant, regardless of whether it makes sense to do). I'm probably <50% on whether it would make the "right" engineering implementation choices on its own; I don't have a precise number. I think it might get lucky, because there's a pretty large set of choices available, and I hadn't yet tailored the problem to require really thinking conceptually about what's going on and picking from only a small subset; there's a good enough chance that it could guess somewhat randomly or pick a popular choice that happens to work (though I'm not sure it would put the right context around it even if it did).

Perhaps, given your comment below, this is just something that you mostly don't care about. Does this sort of thing just bucket into, "No, it can't do this sort of knowledge work now, but with sufficient recursive self-improvement, it will be able to do it later"? (I guess, in line with your stated AGI timelines?)

I don't think you're quite clear in the post as to what camp you're actually in. Are you a straight bull? As in, do you think it can currently replace a sizeable portion of human knowledge work?

Moreover, it is not clear how knowledge work that is not coding qua coding fits into your schema. For example, I have in mind a flight dynamics simulation/control task. I'm not settled on it yet. My plan was to include a little twist that I had thought would likely not be in the published literature, but which I'm sure I could manage without too much difficulty, just pulling one book off of my shelf, confirming where exactly I need to make the modification and how (it's been a long time, but it's something I'm confident I could do without extreme effort), and then coding it. Unfortunately, I looked, and some darned student already published it (only minimal code published AFAICT, but they wrote out all the analysis in detail, so I can't really purely test its ability to do this aspect of the knowledge work on its own), so I'm trying to think of another good variant.

There are other little twists I had in mind, hoping to prevent it from being able to purely just pull code directly from others. These twists are things I've personally coded in the past, so I know they're doable. But the point is that they require sufficient knowledge to make choices along the way (for one example, choose this algorithm for this part, because I know it has certain characteristics) and I think they prevent it from being able to just use someone else's work for the core simulation components.

I guess, where does this fit within your schema, and where are you with respect to your own opinions? There is a lot of room between, "I personally know how to architect this code, what algorithms/assumptions to use, how to modify the analysis for the instant case, and then I use Claude to help with building the components", "I do the analysis, give it to it, tell it to code up the whole thing, then I go in and tell it to change things to make better choices that fit my knowledge-work-educated beliefs on how it should be done," and, "I tell it to code up the whole thing, maybe tell it that something's broken, but part of the test is whether it made the right analysis and knowledge-work-educated choices on its own along the way."

In other words, what I'm interested in is not so much about what it can do in terms of coding qua coding. It could be utterly magical at that, and that would be great. But how much of my own knowledge work do I need to input to get it to code the "right" thing, versus how much it's able to make the correct choices on its own about what the "right" thing is.

You know what? I don't think he is engaging with the article. The article specifically mentions GPT 5.2 Pro seven times, two of which seem, to my read, to imply that that's what he's using. There is one moment where he just says "GPT 5 Pro". Perhaps he just happened to leave off the ".X" in this one spot. Perhaps I'm reading the other seven mentions of GPT 5.2 Pro wrong, and the dirty secret is that he's using 5.0. I suppose he doesn't say in big bold highlighted words, "I'm definitely using 5.2 and not 5.0," so sure, maybe one could say that it would be nice to have a clear statement.

...but to come in, with one sketchy textual inference, and just boldly declare that the only way anyone could possibly be reporting the experience they're reporting is obviously just because they're using a six-month-old model, and that obviously it's now totally fixed... it's the same SMH frustration at someone being annoying and arrogant.

In fairness, perhaps he only read my comment and not the article (thus, not engaging with the article), and in fairness, I did blockquote the one spot where he seemed to have left off the ".X". But yeah, "I didn't RTFA, but I'm going to boldly declare that I've diagnosed exactly what's going on, using the same tired objection," is pretty cold comfort.

The article discusses Erdos problems and Aletheia's performance on "First Proof".

Why is there always someone who blows up with such attitude while appearing not to really engage with anything?

But you will also notice the absence of the issues you are facing.

Let's turn it around. What version mathematician are we dealing with here? What's your h-index? Have you used any particular LLMs, regardless of model/scaffold, to solve components of your own publishable mathematics research? Can you personally attest to not encountering any issues like this? I just don't understand this insistence on not looking at the frontier while still declaring where it is.