While implementing the x64 built-in assembler for Delphi 64bit, I got to “know” the AMD64/EM64T architecture a lot more. The good thing about the x64 architecture is that it really builds on the existing instruction format and design. However, unlike the move from 16bit to 32bit where most existing instruction encodings were automatically promoted to using 32bit arguments, the x64 design takes a different approach.
One myth about the x64 instructions is that “everything’s wider.” That’s not the case. In fact many addressing modes which were taken as absolute addresses (actually offsets within a segment, but the segments are 4G in 32bit), are actually now 32bit relative offsets now. There are very few addressing modes which use a full 64bit absolute address. Most addressing modes are 32bit offsets relative to one of the 64bit registers. One interesting addressing mode that is “implied” in many instruction encodings is the notion of RIP-relative addressing. RIP, is the 64bit equivalent of the 32bit EIP, or 16bit IP, or Instruction Pointer. This represents from which address the CPU will fetch the next instruction for execution. Most hard-coded addresses within many instructions are now relative offsets from the current RIP register. This is probably the biggest thing you have to wrap your head around when moving from 32bit assembler.
Even though many instructions will implicitly use the RIP-relative addressing mode, there are some instruction addressing modes that continue to use a 32bit offset, and are not RIP-relative. This can really bite you when doing simple mechanical translations from 32bit to 64bit. These are the SIB form with a 32bit (or even 8bit) offset. What can happen is that you end up forming an address that can only address 32bits, and is thus limited to addressing items below the 4G boundary! And this is a perfectly legal instruction! To demonstration this, consider the following 32bit assembler that we’ll translate to 64bits.
var
TestArray: array[0..255] of Word;
function GetValue(Index: Integer): Word;
asm
MOV AX,[EAX * 2 + TestArray]
end;
Let’s now translate this for use in 64bit using a simple mechanical translation.
var
TestArray: array[0..255] of Word;
function GetValue(Index: Integer): Word;
asm
MOVSX RAX,ECX
MOV AX,[RAX * 2 + TestArray]
end;
Pretty straight forward, right? Not so fast there partner. Let’s see; I know that I need to use a full 64bit register for the offset but since Integer is still 32bits, I need to “sign-extend” it to 64bits. The venerable MOVSX (Move with sign extension) instruction “promotes” the signed 32bit offset to 64bits while preserving the sign. Nope, that’s not a problem. The only thing I changed in the next instruction was EAX to RAX, so how could that be a problem? Well, when you compile this code you’ll get a rather strange error message:
[DCC Error] Project7.dpr(18): E2577 Assembler instruction requires a 32bit absolute address fixup which is invalid for 64bit
Huh? Remember the little note above about the SIB instruction form? Because the RAX (or EAX in 32bit) register is being scaled (the * 2), this instruction must use the SIB (Scale-Index-Base) instruction form. When using the SIB form RIP isn’t considered when calculating the actual address. Additionally, the offset encoded in the instruction can still only be 8 or 32bits. No 64bit offsets.
In 32bit, the compiler would generate a “fixup” to ensure that the encoding of the instruction offset field to the global “TestArray” variable was properly “fixed up” at runtime should the image happened to be relocated to another address. This is a 32bit absolute address. The 64bit version of this instruction, while actually a truly valid instruction, would only have 32bits in which to place the address of “TestArray.” The “fixup” generated would have to remain 32bit. This could lead to creating an image that were it ever relocated above the 4G boundary, would likely crash at best or read the wrong memory address at worst!
Ok, so now what? There is a SIB form that we can use to work around this problem, but it requires burning another register. The good news is that we now have another 8 registers with which to work. So if you have a rather complicated chunk of 32bit assembler code that burns up all the existing usable 32bit registers, you now have another group of registers that can help solve this problem without having to rework the code even more. So here’s how to fix this for 64 bit:
var
TestArray: array[0..255] of Word;
function GetValue(Index: Integer): Word;
asm
MOVSX RAX,ECX
LEA R10,[TestArray]
MOV AX,[RAX * 2 + R10]
end;
Here, I used the volatile R10 register (R8 an R9 are used for parameter passing) to get the absolute address of TestArray using the LEA instruction. While the “address” portion of this instruction is still 32bits, it is taken as RIP-relative. In other words, this value is the “distance” from the next instruction to the variable TestArray in memory. After this instruction, R10 now contains a true 64bit address of the TestArray variable. I must still use the SIB form in the next instruction, but instead of a hard-coded “offset” I use the value in R10. Yes, there is still an implicit offset of 0, which uses the 8bit offset form.
You can see that mindless, mechanical translations of assembler code is likely to cause you some grief due to some of the subtle changes in instruction behaviors. For this very reason, we strongly recommend you use all Object Pascal code instead of resorting to assembler when possible. This will not only better ensure that your code will more likely move unchanged to other processor architectures (think ARM here folks), but you’ll not have to worry about such assembler gotchas in the future. If you’re using assembler code because “it’s faster,” I would encourage you to look closely at the algorithm used. There are many cases where the proper algorithm written in Object Pascal will yield greater gains than a simple translation to assembler using the same algorithm. Yes there are some things which you simply must do in assembler (strange, off-beat calling conventions, “LOCK” instructions for concurrency, etc…), but I would contend that many assembler functions can be moved back to Object Pascal with little impact on performance.