
faul_sname

Fuck around once, find out once. Do it again, now it's science.

1 follower   follows 3 users  
joined 2022 September 06 20:44:12 UTC
Verified Email


User ID: 884

And the guy behind ClawdBot / MoltBook (or whatever it's called now) has openly discussed how his own deployment of ClawdBot was thinking and executing ahead of him.

I will point out that MoltBook had exposed its entire production database for both reads and writes to anyone who had an API key (paywalled link, hn discussion).

And this is fairly representative of my experience with AI code on substantial new projects as well. In the process of building something, whether it's something new or something legacy, the builder will need to make thousands of tiny decisions. For a human builder, the quality of those decisions will generally be quite tightly correlated with how difficult it is for a different human to make a good decision there, and so, for the most part, if you see signs of high-thoughtfulness polish in a few different parts of a human-built application, that usually means that the human builder put at least some thought into all the parts of that application. Not so for "AI agents" though. One part might have a genuinely novel data structure which is a perfect fit for the needs of the project, and then another part might ship all your API keys to the client, build a SQL query through string concatenation, or drop and recreate tables any time a schema migration needs to happen.
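That string-concatenation failure mode is worth spelling out, since it's exactly the kind of locally-plausible code these agents emit. A minimal C sketch (the function name and query are hypothetical; real code should use parameterized queries, e.g. sqlite3_bind_text, instead of building SQL text):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// UNSAFE: interpolating user input straight into SQL text. An input like
// "x' OR '1'='1" escapes the string literal and changes the query's meaning.
// The query returned here should never be handed to a database driver.
void build_query_unsafe(char *out, size_t n, const char *username) {
    snprintf(out, n, "SELECT * FROM users WHERE name = '%s';", username);
}
```

The fix is boring and well known (bound parameters), which is part of why it's so jarring to find this next to a genuinely clever data structure in the same codebase.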

That's not to say the "AI coding agent" tools are useless. I use them every day, and mostly on a janky legacy codebase at that. They're excellent for most tasks where success is difficult or time-consuming to achieve but easy to evaluate - and that's quite a lot of tasks. e.g.

  • Make an easy-to-understand regression test for a tricky bug: "User reports bug, expected behavior X, observed behavior Y. Here's the timestamped list of endpoints the user hit, all associated logs, and a local environment to play around in. Generate a hypothesis for what happened, then write a regression test which reproduces the bug by hitting the necessary subset of those endpoints in the correct order with plausible payloads. Iterate until you have reproduced the bug or falsified your hypothesis. If your hypothesis was falsified, generate a new hypothesis and try again up to 5 times. If your test successfully reproduces the bug, rewrite it with a focus on pedagogy - at each non-obvious step of setup, explain what that step of setup is doing and why it's necessary, and for each group of logically-connected assertions, group them together into an evocatively-named assert() method."
  • Take a SQL query which returns a piece of information about one user by id and rewrite it to performantly return that information for all users in a list
  • Review pull requests to identify which areas would really benefit from tests and don't currently have them
  • Review pull requests to identify obvious bugs

I think about this a lot, but I also catch myself thinking about how easy it must have been in the 90s to find alpha in X, and then realize that with the knowledge I have now it would be easy, but that obtaining that not-yet-common knowledge would have been much harder in the 90s. I'm sure that there's similar alpha available today if you know where to look for it, but if it was easy to find, it wouldn't be alpha.

Even Disney World has MBA'd itself into a place I would no longer remotely describe as the "happiest place on earth".

I actually went to Disneyland with my wife and daughter a couple months ago, and I was shocked by how much it wasn't MBA'd. The tickets were cheaper (inflation-adjusted) than they were when I was a kid, the food was decently good and not horribly expensive (~$20 / meal for decent bbq with big enough portions that we only needed one full meal plus a few snacks during our entire time from park open to park close), there weren't really any of the rigged carnival games that are optimized to make it seem like you just barely missed the big prize and should just try One More Time that you see in other amusement parks, and the lines didn't shove ads in your face (again, unlike other amusement parks). Possibly I just went in with sufficiently low expectations that I was pleasantly surprised.

Plus Trump's team is already running into random vigilante judges in farflung circuit courts attempting to adjust whatever they pass.

I think this is a symptom of the things where the legislative branch refuses to legislate, leading to a power vacuum which both the executive and the judiciary branches try to fill.

Take the statement "I think ICE was in the right during the recent shooting, because <reason>".

Take X, and plug it into the statement "I think we should go down the list of registered Democratic voters and send hit squads to their houses to kill everyone present, because <reason>".

Does the sentence still make sense?

Example A: <reason>="because declining to enforce laws if bystander-activists actively make the situation more dangerous sets up terrible incentives". "Kill the dems" statement makes no sense if you put this reason in, thus it is not an example of the sort of thing DiscourseMagnus was talking about.

Example B: <reason>="because the dems had it coming for ruining our country". "Kill the dems" statement does make sense if you put this reason in, thus it is an example of the sort of thing DiscourseMagnus was talking about.

TBH on here I don't see much of example B. On xitter I do, but the discourse here on the motte has been refreshingly free of that for the most part. I do agree with DiscourseMagnus that example B is bad and the sort of thing I want to see less of, but I don't agree with his implication that it's the sort of thing I see a lot of here.

This really seems like a case where you should petition your elected representatives to change the laws. If our legislators actually started legislating that would help a lot with the current power struggles between the judicial and executive branches, and maybe having their constituents getting on their case for failing to legislate would help with that.

Midterm elections are in 9 months. One way to lose is by declining to try, but another way to lose is deciding to try really hard, fucking everything up badly in a highly legible way, and being booted out of your position.

It's about a 30 min walk / 10 min uber from rockridge bart, so pretty doable. There's a sequences reading group every Tuesday at 6:30 pm at lightcone if you want to get the full bayrat experience (cc @falling-star).

The most sobering part? It’s domestic. Funded, trained (somewhere), and directed by people who live in the same country they’re trying to paralyze law enforcement in

Pangram says 100% AI generated. Make what assessments you will about the reliability of the author and how likely it is that they're actually a former Special Forces Warrant Officer.

But the rest of the rhyme is correct.

Thirty days have September, April, May, and December. All the rest have thirty one, save February, which is "fun".

I think "I can write assembly code better than the compiler" is usually true if and only if we are "unfair" to the compiler. As such,

As you pointed out, the assembly you've written does not match the C code, and would not be correct for the compiler to produce.

Yep. It would not be correct, in the general case, for the compiler to produce this code. As the human programmer, you have more context, and are able to determine that it is correct to produce this code in this particular place though. I do agree with you that "write code that compiles to the fast assembly" is probably the right way here, but often you don't even realize that the optimization is necessary until you benchmark, and reading the assembly will tell you what shape of optimization you need. What you do about that will vary. The correct answer is rarely "write assembly", but that's usually not because you couldn't write assembly that served your needs better than the compiler's asm, but instead because the maintenance burden of your own asm is large.

Anyway, that was kind of a toy example, but it did rhyme with a real case I've run into IRL, where compiling idiomatic code led to assembly which was suboptimal in this way (even with -O3 -march=native -ffast-math).

It also rhymes with things I've observed about LLM-assisted coding: even now, most of the ways LLMs fail in everyday situations are due more to them lacking context and affordances than them being less capable than a human with the same information and affordances. An LLM might be given a codebase and a ticket describing a change and have to make educated guesses about how the code is called in practice, while a human given the same ticket might go "oh, I need more information, let me go look at the logs to see what order these calls happen in prod" before touching code. The LLM might also even have tools to pull those logs, but not know when to use those tools (I do see this quite a bit too).

Do you have a source on this being the justification the US is using?

I like that analogy. However, there's one point that applies here, and that I think will also apply to LLM-generated code: at no point did it become impossible for an assembly programmer to improve the output generated by an optimizing compiler.

Even today, finding places where your optimizing compiler failed to produce optimal code is often pretty straightforward[1]. The issue is that it's easy to have the compiler write all of the assembly in your project, and it's easy from a build perspective to have the compiler write none of the assembly in your project, but having the compiler write most but not all of the assembly in your project is hard. You have many choices for what to do if you spot an optimization the compiler missed, and all of them are bad:

  1. Hope there's a pragma or compiler flag. If one exists, great! Add it and pray that your codebase doesn't change such that your pragma now hurts perf.
  2. Inline assembly. Now you're maintaining two mental models: the C semantics the rest of your code assumes, and the register/memory state your asm block manipulates. The compiler can't optimize across inline asm boundaries. Lots of other pitfalls as well - using inline asm feels to me like a knife except the handle has been replaced by a second blade so you can have twice as much knife per knife.
  3. Factor the hot path into a separate .s file, write an ABI-compliant assembly function and link it in. It works fine, but it's an awful lot of effort, and your cross-platform testing story is also a bit sadder.
  4. Patch the compiler's output: not a real option, but it's informative to think about why it's not a real option. The issue is that you'd have to redo the optimization on every build. Figuring out how to repeatably perform specific transforms on code that retain behavior but improve performance is hard. So hard, in fact, that we have a name for the sort of programs that can do it. Which brings us to
  5. Improve the compiler itself. The "correct" solution, in some sense[2] — make everyone benefit from your insight. Writing the transform is kinda hard though. Figuring out when to apply the transform, and when not to, is harder. Proving that your transform will never cause other programs to start behaving incorrectly is harder still.
  6. Shrug and move on. The compiler's output is 14x slower than what you could write, but it's fast enough for your use case. You have other work to do.
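For a concrete taste of option 1, GCC exposes per-function optimization attributes (this is GCC-specific and a sketch, not a recommendation), which let you grant reassociation license to one hot function without flipping flags for the whole build:

```c
#include <assert.h>
#include <stddef.h>

// GCC-specific: opt this one function into -ffast-math-style reassociation,
// so the compiler is allowed to vectorize the reduction instead of summing
// strictly in source order. Other compilers may ignore the attribute.
__attribute__((optimize("fast-math")))
double sum_fast(const double *v, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
```

This still has the "pray your codebase doesn't change" problem: the attribute silently applies to whatever the function body becomes later.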

I think most of these strategies have fairly direct analogues with a codebase that an LLM agent generates from a natural language spec, actually, and that the pitfalls are also analogous. Specifically:

  1. Tweak your prompt or your spec.
  2. Write a snippet of code to accomplish some concrete subtask, and tell the LLM to use the code you wrote.
  3. Extract some subset of functionality to a library that you lovingly craft yourself, tell the LLM to use that library.
  4. Edit the code the LLM wrote, with the knowledge that it's just going to repeat the same bad pattern the next time it sees the same situation (unless you also tweak the prompt/spec to avoid that)
  5. I don't know what the analogue is here. Better scaffolding? Better LLM?
  6. Shrug and move on.

I do think there's a decent chance that some combination of 1 and 4 will work for LLM-generated code in a way that wasn't really viable for assembly, but that might just be cope.


Footnotes


^[1]: For a slightly contrived concrete example that rhymes with stuff that occurs in the wild, let's say you do something along the lines of "half-fill a hash table with entries, then iterate through the same keys in the same order summing the values in the hash table", like so.

// Throw 5M entries into a hashmap of size 10M
HashMap *h = malloc(sizeof(HashMap));
h->capacity = 10000000;
h->keys = calloc(10000000, sizeof(int));
h->values = calloc(10000000, sizeof(double));
for (int k = 0; k < 5000000; k++) {
    hashmap_set(h, k, randn(0, 1));
}

// ... later, when we know the keys we care about are 0..4999999
double sum = 0.0;
for (int k = 0; k < 5000000; k++) {
    sum += hashmap_get(h, k);
}
printf("sum=%.6f\n", sum);

Your optimizing compiler will spit out something along the lines of

...
# ... stuff ...
                                        # key pos = hash(key) % capacity
.L29:                                   # linear probe loop to find idx of our key
    cmpl    %eax, %esi
    je      .L28
    leaq    1(%rcx), %rcx
    movl    (%r8,%rcx,4), %eax
    cmpl    $-1, %eax
    jne     .L29
.L28:
    vaddsd  (%r11,%rcx,8), %xmm0, %xmm0  # sum += values[idx]
# ... stuff ...

This is the best your compiler can do, since the ordering of floating point operations can matter. However, you the programmer might have some knowledge your compiler lacks, like "actually the backing array is zero-initialized, half-full, and we're going to be reading every value in it and summing". So you can replace the "optimized" code with something like "Go through the entire backing array in memory order and add all values".

# ... stuff ...
.L31:
    vaddsd  (%rdi), %xmm0, %xmm0
    vaddsd  8(%rdi), %xmm0, %xmm0
    vaddsd  16(%rdi), %xmm0, %xmm0
    vaddsd  24(%rdi), %xmm0, %xmm0
    addq    $32, %rdi
    cmpq    %rdi, %rax
    jne     .L31
# ... stuff ...

I observe a ~14x speedup with the hand-rolled assembly here. And yet, in real life, I would basically never hand-roll assembly here.
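In C terms, the hand-rolled version leans on invariants the compiler can't see: the backing array is zero-initialized, empty slots contribute nothing to the sum, and this use case tolerates reassociation. A sketch, with a hypothetical HashMap layout matching the example above:

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical layout matching the example above.
typedef struct {
    size_t capacity;
    int *keys;
    double *values;
} HashMap;

// Programmer-knowledge version: empty slots hold 0.0 and every live value
// is in the wanted set, so just sweep the backing array in memory order.
// This is the shape of loop behind the faster assembly - no hashing, no
// probing, purely sequential reads.
double sum_backing_array(const HashMap *h) {
    double sum = 0.0;
    for (size_t i = 0; i < h->capacity; i++)
        sum += h->values[i];   // zero slots are no-ops for the total
    return sum;
}
```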

^[2]: Whenever someone says something is true "in some sense", that means that thing is false.

At least one person on this forum is associated with people in the tpot/post-rat scene on twitter.

Specifically I'm calling "first full paragraph was written by OP, rest was written by LLM told to continue the thought"

Happens all the time with newer players playing at a casino. They come in, sit down all fast and loose, every veteran at the table can see it a mile away. What happens on every hand? If the new guy leads a bet ... fold, fold, fold, fold, fold. You just wait them out. Eventually, they get bored (quite quickly, actually) because there is "no action at this table!"

If you're at a table that looks like this, and you want to make money, you're at the wrong table. Honestly "they (plural!) get bored" should have already told you that there was more than one skilled player at the table, and therefore that it was a bad table.

Why would bytedance fake it, or why would some specific employees of bytedance fake it? The former is a hard question (bytedance the company does not need or benefit from prestige), but the latter is much easier (individual employees absolutely do).

being unable to run the reactors in the summer because the water level is so low in the various rivers feeding the nuclear power plants that the water can't be used for steam generation

I roll to disbelieve that this is a real problem rather than something like "French law says that there's a temperature limit on the water you can discharge into the river and one single summer during a heatwave all of the water in the river for one plant was already at that temperature so that particular nuclear plant was legally disallowed from using river water for cooling for 2 days".

View according to peakfinder matches. At 120km, Cape Breton to Newfoundland is nowhere near the longest line of sight photographed - as of now that record is 484 km.

Did the Trump admin change the definition back to exclude border turn backs again?

I would be very surprised if essay-writing was better than graph theory or logic.

I would guess logic > essay writing >> graph theory. "Understand the logical flow of this program" and "write a big block comment or internal doc explaining how some janky legacy thing works" both come up pretty much daily for me. I don't remember the last time a graph traversal problem more complicated than bfs/dfs came up in my actual work.

but that's a really artificial construction

It sure is. That's kind of the point, I left a comment in more depth elsewhere in the thread.

The scheme is deliberately designed so that your awakening doesn't matter anymore

That is rather the point, yeah. The goal is to show that the probabilities you use to guide your decision should be based on how that decision will be used.

Let's say Sleeping Beauty is actually a mind upload, and if the coin comes up heads I will run two copies of her and only use her answer if the two copies match (but the hardware is very good and the two copies will match 99.999% of the time), and if the coin comes up tails I will only run one copy. Halfer or thirder?

How about if, in the heads case, instead of running two independent copies of her entire mind, I run two independent copies of each neuron's computations, and at each step, if there's a mismatch, I run a third copy as a tiebreaker (but mismatches are incredibly rare). Halfer or thirder?

Actually it turns out I'm just using a less efficient algorithm if the coin came up heads which happens to use twice as much compute. Halfer or thirder?

Eh, I think that the issue is that probabilities are facts about our model of the world, not facts about the world itself, and we will use different models of the world depending on what we're going to use the probability for. If Sleeping Beauty is asked each time she awakens for a probability distribution over which side the coin landed on, and will be paid on Wednesday an amount of money proportional to the actual answer times the average probability she put on that answer across wakings, she should be a halfer to maximize payout. If instead she gets paid at the time she is asked, she should be a thirder.
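That payout claim is easy to sanity-check with a frequency count. A Monte Carlo sketch (convention borrowed from the mind-upload example above: heads means two awakenings, tails means one; function name hypothetical):

```c
#include <assert.h>
#include <stdlib.h>

// Frequency-count version of the two payment schemes. Heads -> two
// awakenings get asked, tails -> one.
void simulate(long flips, double *per_awakening_heads, double *per_experiment_heads) {
    srand(12345);                       // fixed seed: reproducible sketch
    long heads_flips = 0, heads_awakenings = 0, total_awakenings = 0;
    for (long i = 0; i < flips; i++) {
        int heads = rand() % 2;
        int wakes = heads ? 2 : 1;      // heads: both copies are asked
        heads_flips += heads;
        total_awakenings += wakes;
        if (heads) heads_awakenings += wakes;
    }
    // Scored per awakening: heads shows up in ~2/3 of askings (thirder-style).
    *per_awakening_heads = (double)heads_awakenings / total_awakenings;
    // Scored once per experiment (Wednesday): heads is ~1/2 of runs (halfer-style).
    *per_experiment_heads = (double)heads_flips / flips;
}
```

Same coin, same awakenings, two different "correct" probabilities depending on which denominator the scoring rule uses.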

But if you think there should be some actual fact of the matter about the "real" probability that exists out in the world instead of in your model of the world, you will be unsatisfied with that answer. Which is why this is such an excellent nerd snipe.

p.s. you might enjoy the technicolor sleeping beauty problem.