Friday Fun Thread for September 09, 2022

We don't have the bot, so let me step in: this thread is not for serious in-depth discussion of weighty topics, and it is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.

Yay! Funnily enough I made a thread too, but I guess you beat me through the moderation queue.

Without further ado: I've been continuing my adventures in x86 assembly.

When last we met, I had just gotten Visual Studio Code to compile, link and run my asm files in DOSBOX. Since then I wrote a JavaScript file for Node.js which ingests the dbg file from NASM, a map file from ALINK, and the exported breakpoints from Visual Studio Code, and creates a list of breakpoints for DOSBOX to set in memory. A patched version of DOSBOX which will autoexec debugger commands ingests this file, and voilà! My breakpoints set in Visual Studio Code get set in DOSBOX. I'm still refining the script, as well as the tasks in VSC. I went from writing tasks for each asm file to writing a single generic task which will compile, link and execute any single asm file. I don't have it adding the breakpoints yet though.
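A minimal sketch of the translation step, in Python rather than the JavaScript of the actual script. Everything here is an assumption for illustration: the function name, the already-parsed inputs (a line-to-offset table standing in for whatever the NASM and ALINK files provide), and the load segment are all hypothetical. The only thing it leans on is the DOSBOX debugger's `BP segment:offset` command.

```python
# Hypothetical inputs: the real NASM and ALINK file formats are not parsed
# here; assume that work is already done and produced these structures.

def dosbox_breakpoints(editor_breakpoints, line_offsets, load_segment):
    """Turn (file, line) breakpoints into DOSBOX debugger BP commands."""
    commands = []
    for file_name, line in editor_breakpoints:
        offset = line_offsets.get((file_name, line))
        if offset is None:
            continue  # no code was emitted for that source line, so skip it
        commands.append(f"BP {load_segment:04X}:{offset:04X}")
    return commands


if __name__ == "__main__":
    offsets = {("test.asm", 12): 0x0103, ("test.asm", 14): 0x0106}
    wanted = [("test.asm", 12), ("test.asm", 30)]
    for command in dosbox_breakpoints(wanted, offsets, 0x01A2):
        print(command)
```

Breakpoints on lines that produced no code (comments, blank lines) are silently dropped in this sketch; a real script might prefer to snap them to the next line that did.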

This has made it super easy to spit out a random asm file to test or experiment with one specific feature. I went to the effort of creating some code snippets to fill out new asm files with everything I need to start a new program. I also began taking notes in a composition notebook, because there is just something about synthesizing lots of information from numerous websites, books and forum posts into exactly the concise information you need and writing it down that seems to etch it into stone in your memory.

For example, the 8088/8086 has 4 "generic" registers: AX, BX, CX and DX. Except they are generic in name only. Sure, you can use them in a brute force fashion. But near as I can tell, AX, the accumulator, is faster, if not required, in most math functions. BX, the base register, is the only one of the four that can be used for indirect addressing (SI, DI and BP can also hold addresses, but they aren't among the "generic" four). CX, the count register, is used by instructions that repeat, like LOOP and the REP prefix, and its low byte CL holds the count for variable shifts and rotations. DX, the data register, often functions as an auxiliary for AX in the case of division or multiplication. Knowing these things really helps you plan your register usage ahead, but it wasn't spelled out super clearly in the sources I saw. So I spent an hour making a single page of notes, scanning all the 8088/8086 instructions for their special usage of AX, BX, CX and DX. I think it was productive.
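To make the AX/DX pairing concrete, here is a small Python model of the 16-bit unsigned `mul` and `div` instructions. The function names are made up for illustration, and flags are ignored; it only shows where the operands and results are forced to live.

```python
def mul16(ax, operand):
    """Model of 8086 `mul r/m16`: DX:AX = AX * operand (unsigned)."""
    product = (ax & 0xFFFF) * (operand & 0xFFFF)
    # Low word lands in AX, high word in DX. Neither register is optional.
    return product & 0xFFFF, (product >> 16) & 0xFFFF


def div16(ax, dx, operand):
    """Model of 8086 `div r/m16`: AX = quotient, DX = remainder of DX:AX / operand."""
    operand &= 0xFFFF
    dividend = ((dx & 0xFFFF) << 16) | (ax & 0xFFFF)
    # The CPU raises a divide error (INT 0) on a zero divisor or when the
    # quotient does not fit in 16 bits; modeled here as a Python exception.
    if operand == 0 or dividend // operand > 0xFFFF:
        raise ZeroDivisionError("divide error (INT 0)")
    return dividend // operand, dividend % operand


if __name__ == "__main__":
    ax, dx = mul16(0x1234, 0x0100)
    print(f"after mul: DX:AX = {dx:04X}:{ax:04X}")
    ax, dx = div16(ax, dx, 0x0100)
    print(f"after div: AX = {ax:04X}, DX = {dx:04X}")
```

The practical upshot for planning registers: anything living in DX gets clobbered by a 16-bit multiply, and a 16-bit divide reads DX whether you meant it to or not, so it needs to be zeroed (or sign-extended, for `idiv`) first.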

The NASM documentation lies. A lot of contemporary 8088 documentation I'm reading uses what I can only assume is the Intel assembler's syntax. So indirect addressing is shown as "[di]", "label[bx]" or "label[bx][si]" for increasingly convoluted forms of offsetting and indexing. NASM claims it doesn't support this, and that the entire address needs to be either inside or outside of square brackets. So "label[bx][si]" should be "[label+bx+si]". This is a blatant lie, as I've compiled "label[bx]", "label[bx,di]" and "label[bx+di]". In fact, "label[bx,di]", "label[bx+di]", "[label+bx,di]" and "[label+bx+di]" all compile to the exact same machine code. More or less the only illegal syntax in NASM is the original Intel? syntax of "label[bx][di]". It won't support multiple square brackets.
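It makes sense that all the accepted spellings produce identical bytes: however it is written, the operand boils down to a set of registers plus a displacement, and the 8086 has exactly one ModR/M slot for each legal register combination. A rough Python sketch of that encoding for `mov reg16, [mem]` (opcode 8B) follows. The function name is made up, and for simplicity it always emits a 16-bit displacement, where a real assembler may pick a shorter form. It says nothing about how NASM's parser actually handles the different spellings.

```python
# 8086 16-bit addressing: only these base/index combinations have an encoding.
RM_CODES = {
    frozenset(["bx", "si"]): 0b000,
    frozenset(["bx", "di"]): 0b001,
    frozenset(["bp", "si"]): 0b010,
    frozenset(["bp", "di"]): 0b011,
    frozenset(["si"]): 0b100,
    frozenset(["di"]): 0b101,
    frozenset(["bp"]): 0b110,
    frozenset(["bx"]): 0b111,
}
REG_CODES = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}


def encode_mov_load(dest, regs, disp=0):
    """Encode `mov dest, [regs + disp]` using mod=10 (16-bit displacement)."""
    key = frozenset(regs)
    if len(key) != len(regs) or key not in RM_CODES:
        raise ValueError(f"no 8086 addressing mode for {list(regs)}")
    modrm = (0b10 << 6) | (REG_CODES[dest] << 3) | RM_CODES[key]
    # Displacement is stored little-endian after the ModR/M byte.
    return bytes([0x8B, modrm, disp & 0xFF, (disp >> 8) & 0xFF])


if __name__ == "__main__":
    # Register order in the source text cannot matter: same set, same bytes.
    print(encode_mov_load("ax", ["bx", "di"], 0x0123).hex())
    print(encode_mov_load("ax", ["di", "bx"], 0x0123).hex())
```

The table also shows why AX, CX and DX never appear inside the brackets on an 8086: there is simply no encoding for them.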

But it's nice having a toolchain where I can just belt out a quick asm test file to inquire about these things. Sure, it's DOSBOX, not a real machine. But I'm mostly confident DOSBOX has its shit together for basic stuff like this. As I've delved deeper into BIOS, DOS and VGA interrupt programming, some of the documentation I've read goes deep into all the assorted bugs various hardware possessed. For example, a lot of hardware erroneously wipes out the BP register on INT 10h calls! That's kind of a big deal. I'd be shocked if DOSBOX reproduced this. But maybe!

Anyways, it's been fun. Hope to have my first very simple assembly game belted out in the next few days. Also, I shouldn't be surprised, but it's remarkable how fucking small assembly executables are. These little test programs often weigh in at less than 200 bytes. Several KB for even a simple "Hello World" isn't uncommon in any modern language.

The a-d registers, to the best of my knowledge, are named generic because it's in contrast to the other registers with very specific functions. For example, sp and ip aren't something you'd ever store data in just for funsies, so ax is very generic by comparison (even if it does have some special uses for certain instructions).

I imagine that the special uses come from the need to reduce binary size back in the day. If you needed to specify "mul rax, rcx" instead of just "mul rcx" that's extra encoding which would add up over time. Nowadays not such a big deal, but at the time the instruction set was designed it would've been quite a big deal.

For what it's worth, in long mode you get 8 extra generic registers (r8-r15), and those really are generic if the OG generic registers aren't generic enough for your taste. 😉

Although the 80386 Programmer's Manual lists eax, ebx, ecx, and edx as "general-purpose registers", you sometimes see them referred to as "accumulator", "base", "counter", and "data". Example: the rep instruction works on cx or ecx as a loop counter and you don't have any choice in the matter. Not sure whether that's documented intent or folklore.
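For what the hard-wired counter looks like in practice, here is a rough Python model of `rep movsb` and `loop` in 16-bit mode. The function names are made up, memory is a flat bytearray with segments ignored, and the direction flag is assumed clear. Note the two instructions test CX at different points.

```python
def rep_movsb(memory, si, di, cx):
    """Model of `rep movsb` with DF=0: copy CX bytes from [SI] to [DI]."""
    while cx != 0:                  # REP checks CX before every iteration
        memory[di] = memory[si]
        si = (si + 1) & 0xFFFF
        di = (di + 1) & 0xFFFF
        cx = (cx - 1) & 0xFFFF
    return si, di, cx


def loop_iterations(cx):
    """Count how many times the body of a `loop` runs for a starting CX."""
    count = 0
    while True:
        count += 1                  # the loop body executes
        cx = (cx - 1) & 0xFFFF      # LOOP decrements CX first...
        if cx == 0:                 # ...and falls through once it hits zero
            return count


if __name__ == "__main__":
    mem = bytearray(b"hello.....")
    print(rep_movsb(mem, 0, 5, 5), bytes(mem))
    print(loop_iterations(3), loop_iterations(0))
```

Because `loop` decrements before testing, entering it with CX = 0 wraps around and runs the body 65536 times, whereas `rep` with CX = 0 does nothing at all.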