Do you know what your Code is doing?
XOR swapping - Assembler
Swapping Integers
In CS textbooks, often we see code snippets like this:
*y = *x ^ *y;
*x = *x ^ *y;
*y = *x ^ *y;
Some professor will tell you it's brilliant - no temp variable! Look how elegant! Then they move on before you realize what actually happens.
Unlike the usual technique for swapping two values, we do not need a third location to temporarily store one value while we are moving the other.
What's the Deal here? This exploits XOR's self-inverse property and C's right-to-left evaluation of compound assignments:
XOR with itself = 0
B ^ B = 0
XOR with 0 = identity
A ^ 0 = A
XOR is associative
(A ^ B) ^ C = A ^ (B ^ C)
Applying these Statements:
A ^ B ^ B = A ^ (B ^ B) [associative]
= A ^ 0 [self-inverse]
= A [identity]
However, let's see what this actually does.
We could write some simple C Code like:
int main() {
int x = 42, y = 137;
int *a = &x, *b = &y;
*a ^= *b ^= *a ^= *b;
return 0;
}
What is this doing on the bit-level?
Initial:
x = 42 = 00101010
y = 137 = 10001001
(1)
00101010 (x = 42)
^ 10001001 (y = 137)
-----------
10100011 (x = 163)
(2)
10001001 (y = 137)
^ 10100011 (x = 163)
-----------
00101010 (y = 42) This is original x
(3)
10100011 (x = 163)
^ 00101010 (y = 42)
-----------
10001001 (x = 137) This is original y
... which is basically encoding/decoding shennanigans.
When compilnig this with the tiny c compiler:
tcc -s -fomit-frame-pointer -O2 xorswap.c
Note the -O2 flag here; GCC would simply erase this Code (for optimization, since it's doing not really anything) I just wanted to point this out, tcc does not really optimize it the way GCC does, so here we go:
main:
.LFB0:
.cfi_startproc
movl $42, -20(%rsp)
movl $137, -24(%rsp)
leaq -20(%rsp), %rax
movq %rax, -8(%rsp)
leaq -24(%rsp), %rax
movq %rax, -16(%rsp)
movq -8(%rsp), %rax
movl (%rax), %edx
movq -16(%rsp), %rax
movl (%rax), %eax
xorl %eax, %edx
movq -8(%rsp), %rax
movl %edx, (%rax)
movq -8(%rsp), %rax
movl (%rax), %edx
movq -16(%rsp), %rax
movl (%rax), %eax
xorl %eax, %edx
movq -16(%rsp), %rax
movl %edx, (%rax)
movq -16(%rsp), %rax
movl (%rax), %edx
movq -8(%rsp), %rax
movl (%rax), %eax
xorl %eax, %edx
movq -8(%rsp), %rax
movl %edx, (%rax)
movl $0, %eax
ret
.cfi_endproc
.. Our XOR Swapping in Assembly. Shocking.
Look at what the "clever" one-liner *a ^= *b ^= *a ^= *b actually generates - it's horrifically inefficient. Beautiful on a blackboard. Garbage in silicon.
For each XOR operation, the compiler has to: 1. Load the pointer from stack (movq -8(%rsp), %rax) 2. Dereference it to get the value (movl (%rax), %edx) 3. Load the other pointer 4. Dereference that one too 5. Do the XOR (xorl %eax, %edx) 6. Load the pointer AGAIN (can't assume it hasn't changed!) 7. Store the result back
That's ~30 instructions for what should be 3 XORs.
Undefined Behavior?
That one-liner *a ^= *b ^= *a ^= *b is actually undefined behavior in C. You're modifying *a twice without a sequence point. The compiler could legally make demons fly out of your nose.
TCC's literal translation reveals the truth:
- Uses MORE memory (storing all those pointers)
- Takes MORE instructions
- Has ZERO performance benefit
- Is actually undefined behavior
Compare to the "dumb" temp variable approach:
movl -20(%rsp), %eax # tmp = x
movl -24(%rsp), %edx # get y
movl %edx, -20(%rsp) # x = y
movl %eax, -24(%rsp) # y = tmp
4 instructions.
The compiler is smarter than you:
There is no performance advantage to this way of swapping, it's just an intellectual amusement.
Clever tricks from the 1970s often make terrible code today. The XOR swap is a beautiful mathematical curiosity that should stay in textbooks.