Do you know what your Code is doing?

XOR swapping - Assembler

Swapping Integers

In CS textbooks, often we see code snippets like this:

*y = *x ^ *y;
*x = *x ^ *y;
*y = *x ^ *y;

Some professor will tell you it's brilliant - no temp variable! Look how elegant! Then they move on before you realize what actually happens.

Unlike the usual technique for swapping two values, we do not need a third location to temporarily store one value while we are moving the other.

What's the Deal here? This exploits XOR's self-inverse property and C's right-to-left evaluation of compound assignments:

XOR with itself = 0
B ^ B = 0

XOR with 0 = identity
A ^ 0 = A

XOR is associative
(A ^ B) ^ C = A ^ (B ^ C)

Applying these Statements:

A ^ B ^ B = A ^ (B ^ B)    [associative]
          = A ^ 0          [self-inverse]
          = A              [identity]

However, let's see what this actually does.

We could write some simple C Code like:

int main() {
    int x = 42, y = 137;
    int *a = &x, *b = &y;
    *a ^= *b ^= *a ^= *b;
    return 0;
}

What is this doing on the bit-level?

Initial:
x = 42  = 00101010
y = 137 = 10001001

(1)
00101010  (x = 42)
^ 10001001  (y = 137)
-----------
  10100011  (x = 163)

(2)
10001001  (y = 137)
^ 10100011  (x = 163)
-----------
  00101010  (y = 42) This is original x


(3)
10100011  (x = 163)
^ 00101010  (y = 42)
-----------
  10001001  (x = 137) This is original y

... which is basically encoding/decoding shennanigans.

When compilnig this with the tiny c compiler:

tcc -s -fomit-frame-pointer -O2 xorswap.c

Note the -O2 flag here; GCC would simply erase this Code (for optimization, since it's doing not really anything) I just wanted to point this out, tcc does not really optimize it the way GCC does, so here we go:

main:
.LFB0:
	.cfi_startproc
	movl	$42, -20(%rsp)
	movl	$137, -24(%rsp)
	leaq	-20(%rsp), %rax
	movq	%rax, -8(%rsp)
	leaq	-24(%rsp), %rax
	movq	%rax, -16(%rsp)
	movq	-8(%rsp), %rax
	movl	(%rax), %edx
	movq	-16(%rsp), %rax
	movl	(%rax), %eax
	xorl	%eax, %edx
	movq	-8(%rsp), %rax
	movl	%edx, (%rax)
	movq	-8(%rsp), %rax
	movl	(%rax), %edx
	movq	-16(%rsp), %rax
	movl	(%rax), %eax
	xorl	%eax, %edx
	movq	-16(%rsp), %rax
	movl	%edx, (%rax)
	movq	-16(%rsp), %rax
	movl	(%rax), %edx
	movq	-8(%rsp), %rax
	movl	(%rax), %eax
	xorl	%eax, %edx
	movq	-8(%rsp), %rax
	movl	%edx, (%rax)
	movl	$0, %eax
	ret
	.cfi_endproc

.. Our XOR Swapping in Assembly. Shocking.

Look at what the "clever" one-liner *a ^= *b ^= *a ^= *b actually generates - it's horrifically inefficient. Beautiful on a blackboard. Garbage in silicon.

For each XOR operation, the compiler has to: 1. Load the pointer from stack (movq -8(%rsp), %rax) 2. Dereference it to get the value (movl (%rax), %edx) 3. Load the other pointer 4. Dereference that one too 5. Do the XOR (xorl %eax, %edx) 6. Load the pointer AGAIN (can't assume it hasn't changed!) 7. Store the result back

That's ~30 instructions for what should be 3 XORs.

Undefined Behavior?

That one-liner *a ^= *b ^= *a ^= *b is actually undefined behavior in C. You're modifying *a twice without a sequence point. The compiler could legally make demons fly out of your nose.

TCC's literal translation reveals the truth:

Compare to the "dumb" temp variable approach:

movl -20(%rsp), %eax  # tmp = x
movl -24(%rsp), %edx  # get y
movl %edx, -20(%rsp)  # x = y
movl %eax, -24(%rsp)  # y = tmp

4 instructions.

The compiler is smarter than you:

There is no performance advantage to this way of swapping, it's just an intellectual amusement.

Clever tricks from the 1970s often make terrible code today. The XOR swap is a beautiful mathematical curiosity that should stay in textbooks.