lagrangian

0 followers   follows 0 users  
joined 2023 March 17 01:43:40 UTC
Verified Email

				

User ID: 2268


it takes about one clock cycle for light to traverse a processor. this doesn't prove you wrong, quite, since there's still the possibility of a processor doing something much more clever with the distance it has than it does today.
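quick back-of-the-envelope if you want to check me (die size and clock speed are round-number assumptions, not measurements):

```python
C = 299_792_458          # speed of light in vacuum, m/s
CLOCK_HZ = 4e9           # ~4 GHz, a typical modern core
DIE_SIZE_M = 0.02        # ~2 cm across a big die (assumption)

cycle_s = 1 / CLOCK_HZ                    # 2.5e-10 s = 250 ps
light_per_cycle_m = C * cycle_s           # ~7.5 cm per cycle, in vacuum
cycles_to_cross = DIE_SIZE_M / light_per_cycle_m

print(f"clock period: {cycle_s * 1e12:.0f} ps")                       # 250 ps
print(f"light per cycle: {light_per_cycle_m * 100:.1f} cm (vacuum)")  # ~7.5 cm
print(f"cycles for light to cross the die: {cycles_to_cross:.2f}")    # ~0.27
# on-chip signals propagate well below c (RC-limited wires), so a
# cross-die round trip really is on the order of a cycle in practice
```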

i got nerd sniped here real hard, so here's a fundamental physics analysis (from Claude and me). Basically, three constraints (below) -> min latency of an operation is ~1e-13s, a ~1e4 speedup ceiling over today. (Worked numbers in the sketch after the list.)

That ~1e4 ceiling is far less than the "LLM cost / kernel syscall cost" ratio today, so LLMs in this role can never be fast enough, even at the physical limit. As for future algorithms that are magically better enough to close the gap, my best argument is "ehh I doubt it, definitely not soon."

  1. Margolus–Levitin: with a given energy, you can only switch between two states at a max frequency (min latency)
  2. Landauer: switching between states must dissipate a minimum amount of energy
  3. Thermodynamics 101: energy can only be dissipated so quickly
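here's roughly the arithmetic behind that ~1e-13s figure, as a sketch (room temperature assumed; constraint 3 is what stops you from just dumping in more energy and dissipating it arbitrarily fast):

```python
import math

# Landauer: erasing one bit dissipates at least kT*ln(2) of energy.
# Margolus-Levitin: a system with energy E needs at least pi*hbar/(2E) to
# reach a distinguishable state. Feed the Landauer energy into that bound
# and you get a floor on how fast one irreversible bit operation can be.

K_B = 1.380649e-23       # Boltzmann constant, J/K
HBAR = 1.054571817e-34   # reduced Planck constant, J*s
T = 300.0                # room temperature, K (assumption)

e_landauer = K_B * T * math.log(2)          # ~2.9e-21 J per bit erased
t_min = math.pi * HBAR / (2 * e_landauer)   # ~6e-14 s, i.e. order 1e-13 s

clock_period_today = 2.5e-10                # 250 ps at ~4 GHz
print(f"min op time: {t_min:.1e} s")
print(f"headroom vs today: {clock_period_today / t_min:.0f}x")  # ~4e3, call it ~1e4
```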

you are missing the point. it would add massive amounts of latency at the lowest level of the stack, and this ends up costing maybe a factor of 1000 even in the optimistic case. this is not "only gamers notice." this is "absolutely everything is uselessly slow"

latency is ~never single-digit picoseconds to start with - a clock cycle at 4GHz is 1/(4GHz) = 1/4 nanosecond = 250 picoseconds, and nothing is faster than that.

Realtime LLM code generation will absolutely never replace the core ("kernel") of an OS. The latency is unacceptable, even putting aside correctness and security.

we'll have microchips cheap enough for regular consumers to buy by the dozen from China that each make the entirety of Anthropic's current data centers look like a basic calculator in comparison.

Maybe. I doubt it, but it's not wildly unreasonable to think so. We could absolutely improve LLM throughput/efficiency with better hardware or algorithms.

When it's basically trivial for an entry-level PC to run the equivalent of 100 Mythoses at 100x the speed that we can today, I feel like it won't add enough overhead to the user experience to be noticeable.

No. You are conflating LLM throughput (/efficiency) with latency.

We can improve latency, to a degree. But we will never get "LLM + live-written OS code + compilation (whether via the LLM or gcc etc.)" down to a latency close enough to pre-written, pre-compiled OS code to be unnoticeable, or even acceptable. This is a context where shaving off a single clock cycle matters.

A single LLM weight matrix multiplication takes ~100 million cycles, most spent on memory transfer of the weights. Even a radically more efficient algorithm has to carry some amount of parametrization from an information-theoretic standpoint - it's going to mean wayyy more cycles than highly tuned code written by hand in advance.
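rough sketch of where a number like that comes from - matrix size, memory bandwidth, and syscall cost below are all assumed round figures, not measurements of any particular model or CPU:

```python
# Cycles spent just streaming one large weight matrix from memory, vs a syscall.
CLOCK_HZ = 4e9                    # 4 GHz CPU
MEM_BW_BYTES_PER_S = 50e9         # ~50 GB/s host memory bandwidth (assumption)

d = 16_384                        # one d x d weight matrix of a big model (assumption)
bytes_per_weight = 2              # fp16
matrix_bytes = d * d * bytes_per_weight          # ~0.5 GB

transfer_s = matrix_bytes / MEM_BW_BYTES_PER_S   # ~11 ms
transfer_cycles = transfer_s * CLOCK_HZ          # ~4e7 cycles for ONE matrix

syscall_cycles = 1_000            # a cheap syscall is on the order of 1e3 cycles

print(f"matrix transfer: ~{transfer_cycles:.1e} cycles")
print(f"ratio vs a ~{syscall_cycles}-cycle syscall: ~{transfer_cycles / syscall_cycles:.0e}x")
# tens of millions of cycles before doing any arithmetic at all, and a full
# forward pass touches hundreds of matrices like this
```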

E.g. one pretty obvious thought I had was about LLM-based operating systems to replace Windows and Linux and iOS in the future, which won't need any software specifically written for it - just write any software in any language, including made-up language or pseudo-code, and the LLM would just "compile" that to the 1s and 0s required for whatever CPU to interpret to accomplish the logic of that code (this might last for a hot minute until it needs just some general list of specs

yeah that's not happening. an OS has to be extremely fast and secure. clock cycles matter. an LLM is a deeply terrible way to handle the lowest layer of hardware interaction.

the salvageable version of this idea is closer to an LLM writing whatever shitty electron app you need on the fly, running on a traditional OS and traditional app development frameworks (electron).

I had a similar project once, this makes sense to me. You can use heat shrink over the joint if you're having trouble getting it to seal.