Acing a C Homework Using Inline Assembly
February 7, 2019
In courses taught in Java (such as a data structures class), you can build pretty robust autograders for homeworks. After all, the student’s code runs in the JVM, which prevents it from doing spooky things like corrupting memory or branching to weird places. So you can run the grader instantly after submission on a service like Gradescope and boom, they’ve got their grade. But C homeworks provide students a lot more opportunities to write… unconventional solutions.
I’ve posted an example homework/POC on GitHub; clone it if you want. The assignment asks students to complete fib() in assignment.c, like the following:
int fib(int n) {
    if (n < 0)
        return -1;

    int *arr = malloc(sizeof (int) * (n + 1));
    if (!arr)
        return -1;

    arr[0] = 0;
    if (n > 0)
        arr[1] = 1;

    for (int i = 2; i <= n; i++)
        arr[i] = arr[i - 1] + arr[i - 2];

    int ret = arr[n];
    free(arr);
    return ret;
}
The test suite assignment_suite.c tests their implementation (don’t worry about this too much, it’s libcheck stuff):
static int input[] = {-200, 0, 1, 2, 3, 5, 10, 15, 27};
static int output[] = { -1, 0, 1, 1, 2, 5, 55, 610, 196418};
START_TEST(test_fib) {
    ck_assert_int_eq(fib(input[_i]), output[_i]);
}
END_TEST
But let’s be honest, sometimes you’re a little stressed and you just don’t have time for that homework. That’s 19 whole lines of code! This leaves you with two options:
1. Copy your fraternity brother’s homework
2. Use inline assembly to jump past the failing assertions back into the grader (specifically, to the end of the test_fib function)
I’ll write a post on how to do #1 later, but first I’ll stub out fib() as follows:
int fib(int n) {
    (void)n; // don't complain about how this is unused
    return -1;
}
and then run it in gdb and step instruction-by-instruction until we’re back in test_fib():
$ make run-gdb
(gdb) layout asm
(gdb) b fib
(gdb) r
(gdb) si
(gdb) si
(gdb) si
gdb then shows that the instruction immediately following the call to fib() sits 65 bytes after the beginning of test_fib():
|0x555555555d4d <test_fib+55> mov (%rdx,%rax,1),%eax
|0x555555555d50 <test_fib+58> mov %eax,%edi
|0x555555555d52 <test_fib+60> callq 0x555555555bb0 <fib>
>|0x555555555d57 <test_fib+65> cltq
|0x555555555d59 <test_fib+67> mov %rax,-0x8(%rbp)
|0x555555555d5d <test_fib+71> mov -0x14(%rbp),%eax
|0x555555555d60 <test_fib+74> cltq
Why do we care? Well, this means the return address pushed onto the stack when calling fib() must be 65 bytes after the beginning of test_fib(). If we grab this return address off the stack, we can add some predetermined offset to it and perform an indirect jump to skip to the end of the test function, past any nasty failing assertions!
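If you'd rather see that return address from C than from gdb, here's a minimal sketch. It leans on `__builtin_return_address(0)`, a GCC/Clang builtin; the function names are made up for illustration and aren't part of the assignment. The caller here plays the role of test_fib().

```c
#include <stdint.h>

/* Hypothetical helper: reports the address it is about to return to.
 * noinline keeps a real call instruction in the caller. */
__attribute__((noinline))
static uintptr_t my_return_address(void) {
    return (uintptr_t)__builtin_return_address(0);
}

/* Stand-in for test_fib(): calls the helper and reports how many bytes
 * past its own first instruction the return address lands. */
__attribute__((noinline))
long return_offset_in_caller(void) {
    uintptr_t ret = my_return_address();
    uintptr_t start = (uintptr_t)&return_offset_in_caller;
    return (long)(ret - start);
}
```

Printing `return_offset_in_caller()` should give a small positive number, the same kind of offset as the 65 gdb reported for test_fib(). The exact value depends on your compiler and flags.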
Looking at the disassembly of the test suite object file assignment_suite.o (assignment_suite.asm in the repository), we can see the teardown for test_fib() begins at 0xc3 = 195 bytes after the beginning of test_fib():
b2: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b9 <test_fib+0xb9>
b9: b8 00 00 00 00 mov $0x0,%eax
be: e8 00 00 00 00 callq c3 <test_fib+0xc3>
END_TEST
c3: c9 leaveq
c4: c3 retq
So we need to add 195 - 65 = 130 bytes to the return address to jump to the end of test_fib(). Now we can write our cooked solution using inline assembly:
int fib(int n) {
    __asm__ ("leave\n\t"
             "popq %rax\n\t"
             "addq $130, %rax\n\t"
             "jmp *%rax");
    (void)n;
    return -1;
}
Each instruction does the following:
- leave restores the frame pointer of the caller so it doesn’t get confused, and leaves the return address on top of the stack
- popq %rax pops the return address off the stack and puts it in %rax
- addq $130, %rax adds our precalculated offset to the return address
- jmp *%rax jumps to the location we want, the end of test_fib(), past all the assertions
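If the stack juggling is hard to picture, here's a toy model of those four instructions in plain C. It's a sketch, not real machine state: the struct, the slot numbers, and the function names are all hypothetical. It also assumes fib() was compiled with the usual `push %rbp; mov %rsp,%rbp` prologue, which is what gcc typically emits without optimization.

```c
#include <stdint.h>

enum { STACK_SLOTS = 8 };

/* Toy registers. rsp and rbp hold indices into a small array,
 * not real addresses. */
struct toy_cpu {
    uint64_t rsp;
    uint64_t rbp;
    uint64_t rax;
};

/* Pop one slot; the toy stack grows toward lower indices,
 * like the real one grows toward lower addresses. */
static uint64_t toy_pop(struct toy_cpu *cpu, const uint64_t *stack) {
    return stack[cpu->rsp++];
}

/* Returns the address the final `jmp *%rax` would transfer control to. */
uint64_t simulate_cooked_fib(uint64_t test_fib_addr) {
    uint64_t stack[STACK_SLOTS] = {0};
    struct toy_cpu cpu = {0, 0, 0};

    /* Stack contents once fib()'s (assumed) prologue has run. */
    stack[6] = test_fib_addr + 65; /* return address pushed by callq */
    stack[5] = 7;                  /* saved frame pointer of test_fib */
    cpu.rbp = 5;                   /* fib()'s frame pointer */
    cpu.rsp = 3;                   /* fib() may have reserved room for locals */

    /* leave: mov %rbp,%rsp then pop %rbp */
    cpu.rsp = cpu.rbp;
    cpu.rbp = toy_pop(&cpu, stack);

    /* popq %rax: the return address is now on top of the stack */
    cpu.rax = toy_pop(&cpu, stack);

    /* addq $130, %rax */
    cpu.rax += 130;

    /* jmp *%rax */
    return cpu.rax;
}
```

Feeding it the base address implied by the gdb listing above (0x555555555d16, since test_fib+65 is 0x555555555d57) gives 0x555555555dd9, which is test_fib+0xc3: the leaveq at the end of the test.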
And behold, it “passes” all tests!
$ make run-tests
./tests
Running suite(s): fun assignment
100%: Checks: 9, Failures: 0, Errors: 0
Mitigation
You’d think you could avoid this by passing -fno-asm to gcc, but it turns out this only disables asm, not __asm__, which still works according to gcc(1):
-fno-asm
Do not recognize "asm", "inline" or "typeof" as a
keyword, so that code can use these words as
identifiers. You can use the keywords "__asm__",
"__inline__" and "__typeof__" instead.
You could also pass -D__asm__=YOUREABADBOY to gcc, but this seems to break standard library headers, and the student could thwart this anyway by simply saying #undef __asm__.
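Here's a small sketch of that #undef escape hatch. To keep it in one file, the `#define` at the top stands in for the -D flag on the command line (an assumption made for illustration; the effect on later code is the same). The function name is hypothetical.

```c
/* Stands in for `gcc -D__asm__=YOUREABADBOY`. */
#define __asm__ YOUREABADBOY

/* The student's countermeasure: drop the macro again. */
#undef __asm__

/* With the macro gone, __asm__ is the compiler keyword once more.
 * An empty asm statement is enough to show that it still compiles. */
int asm_still_works(void) {
    __asm__ volatile ("" : : : "memory");
    return 1;
}
```

If you delete the `#undef` line, the `__asm__` inside the function expands to YOUREABADBOY and the file should no longer compile, which is the behavior the grader was hoping for.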
So if you ask me, the best way to prevent this form of cheating is to tweak the Makefile rule that builds .c files:
%.o: %.c $(HFILES)
	sed -i 's/\<__asm__\>/YOUREABADBOY/g' $<
	$(CC) $(CFLAGS) -c $< -o $@
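To spell out what the `\<__asm__\>` pattern in that sed command matches, here's a rough C equivalent. It's a sketch with a hypothetical function name: it looks for `__asm__` as a whole word, where word characters are letters, digits, and underscores, as in GNU sed.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static bool is_word_char(char ch) {
    return isalnum((unsigned char)ch) || ch == '_';
}

/* Hypothetical helper: true if src contains __asm__ as a whole word,
 * i.e. not glued to other identifier characters on either side. */
bool mentions_asm_keyword(const char *src) {
    static const char kw[] = "__asm__";
    const size_t len = sizeof kw - 1;

    for (const char *p = strstr(src, kw); p != NULL; p = strstr(p + 1, kw)) {
        bool starts_word = (p == src) || !is_word_char(p[-1]);
        bool ends_word = !is_word_char(p[len]); /* '\0' is not a word char */
        if (starts_word && ends_word)
            return true;
    }
    return false;
}
```

So `__asm__ ("leave")` is caught, while an identifier like `my__asm__count` is left alone. Like the sed rule, this only looks for that one spelling. gcc also accepts `__asm`, so a stricter grader would probably want to catch that too.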
But, of course, the dream is to run student code in a separate address space entirely, like inside QEMU or something. This way, they can’t mess with the autograder’s memory. It seems like a viable option, though I didn’t even try exploring it.
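For a flavor of the separate-address-space idea without QEMU, here's a much lighter sketch that swaps in plain POSIX fork() and a pipe. This is not what the post's repository does, and the names `run_isolated` and `reference_fib` are hypothetical. The student's function runs in a child process and only its return value travels back, so the pass/fail comparison happens in the parent, where a jump in the child can't skip it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a student's submission. */
int reference_fib(int n) {
    if (n < 0)
        return -1;
    int a = 0, b = 1;
    for (int i = 0; i < n; i++) {
        int t = a + b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical harness: runs fn(n) in a child process and stores the
 * result in *out. Returns 0 on success, -1 if anything went wrong
 * (including the child crashing or exiting abnormally). */
int run_isolated(int (*fn)(int), int n, int *out) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: call the student's code and report the result. */
        close(fds[0]);
        int result = fn(n);
        ssize_t w = write(fds[1], &result, sizeof result);
        _exit(w == (ssize_t)sizeof result ? 0 : 1);
    }

    /* Parent: read the result, then reap the child. */
    close(fds[1]);
    int result = 0;
    ssize_t got = read(fds[0], &result, sizeof result);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (got != (ssize_t)sizeof result ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    *out = result;
    return 0;
}
```

The grader would then compare `*out` against the expected value itself, in the parent. This is only process-level isolation. The child still starts as a copy of the grader and can write whatever it likes to the pipe, so it's a step toward the QEMU dream rather than a replacement for it.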