Introduction

Gradescope is great. I've been a big believer in automatically graded assignments with rapid feedback ever since my undergraduate systems professor, Dr. Carroll at MTSU, assigned us the famous CMU malloc lab. In the malloc lab, you implement a memory allocator. If your allocator corrupts memory, you get a zero (deserved). If your allocator works, it is benchmarked rigorously, and a grade based on its performance is printed to the console before the teacher ever sees the assignment. Beautiful.

So when I was tasked with TAing for CS367, a refresh of the previous CS390R, I wanted to help design the course around automatic grading. Partly because the experience for students is great, and partly because grading binary exploitation assignments by hand is extremely difficult.

So I set out to build an autograder on Gradescope.

The Horrors

Gradescope's autograders are very easy. Write a Dockerfile that pulls in programs that access static paths to grade each assignment. No problem. I hacked up some Python in like 15 minutes to run students' programs, pipe their output into the pwnable binaries, and check whether stderr contained the contents of flag.txt and whatever else we asked for. I even made sure it tested numerous flag files and various environment variables to make sure nobody got cheeky and hardcoded a flag or a stack address.
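A check like that can be sketched in a few lines of Python. This is a minimal illustration, not the actual grader: the function names, the GRADER_PAD variable, and the flag format are all made up here, and it assumes the exploit writes its payload to stdout and the flag ends up on the target's stderr.

```python
import os
import secrets
import subprocess


def exploit_captures_flag(exploit_cmd, target_cmd, flag, extra_env=None, timeout=10):
    """Feed the exploit's stdout to the target's stdin and look for the flag on stderr."""
    env = dict(os.environ, **(extra_env or {}))
    try:
        # Run the student's program first and collect the payload it prints.
        exploit = subprocess.run(exploit_cmd, capture_output=True, timeout=timeout)
        # Then hand that payload to the vulnerable binary on stdin.
        target = subprocess.run(
            target_cmd,
            input=exploit.stdout,
            capture_output=True,
            env=env,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return flag.encode() in target.stderr


def grade(exploit_cmd, target_cmd, flag_path, rounds=3):
    """Pass only if the exploit leaks a fresh random flag in every round."""
    for i in range(rounds):
        # A new flag each round defeats submissions that hardcode the flag.
        flag = "flag{" + secrets.token_hex(8) + "}"
        with open(flag_path, "w") as f:
            f.write(flag)
        # A different amount of environment data each round shifts the stack,
        # which defeats submissions that hardcode a stack address.
        padding = {"GRADER_PAD": "A" * (64 * (i + 1))}
        if not exploit_captures_flag(exploit_cmd, target_cmd, flag, padding):
            return False
    return True
```

Something like grade(["python3", "exploit.py"], ["./vuln"], "flag.txt") would then return True or False for one submission (those file names are placeholders too).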

I set up the grader and let people submit some assignments. A few were successful, but the complaints started rolling in.

My script isn't working!

It works on my machine!

Oh... I forgot ASLR. ASLR is an exploit mitigation that uses randomized base addresses to frustrate hackers. The students who succeeded were already extremely experienced and had figured out how to get "peeks" - views into the address space that let them bypass ASLR. I conferred with the instructor, Lurene Grenier, and she decided that over the course of the class we would keep ASLR off so we could focus on stack exploits, ROP, OS internals, and heap exploits.

No problem, we'll just

echo 0 > /proc/sys/kernel/randomize_va_space

Wait... why isn't it working? Okay, I'm a systems programmer. Let me write a quick C wrapper that calls personality(ADDR_NO_RANDOMIZE) and then executes the vulnerable binary. Surely that will work.
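For reference, here is roughly what such a wrapper does, sketched in Python via ctypes rather than C so it matches the language of the grader. It assumes Linux with glibc; the function names are invented for this example, and the constant values are the ones from the kernel's personality header.

```python
import ctypes
import os

ADDR_NO_RANDOMIZE = 0x0040000  # persona flag that disables ASLR (Linux)
QUERY_PERSONA = 0xFFFFFFFF     # special value: report the persona, change nothing


def disable_aslr_for_this_process():
    """Try to set ADDR_NO_RANDOMIZE on the current process; return True on success."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.personality.argtypes = [ctypes.c_ulong]
    libc.personality.restype = ctypes.c_int
    current = libc.personality(QUERY_PERSONA)
    if current == -1:
        return False
    # Keep whatever flags are already set and add the no-randomize bit.
    return libc.personality(current | ADDR_NO_RANDOMIZE) != -1


def exec_without_aslr(argv):
    """Replace this process with argv, asking for ASLR to be off first."""
    if not disable_aslr_for_this_process():
        raise OSError(ctypes.get_errno(), "personality() was refused")
    # The persona survives execv, so the new program inherits the flag.
    os.execv(argv[0], argv)
```

Calling exec_without_aslr(["./vuln"]) (a placeholder path) is the whole trick: the flag is set once and the vulnerable binary inherits it.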

Nope... what?

It turns out you can't disable ASLR from inside Docker. There's a GitHub issue about it, with another instructor asking about it for the same reason I was. He mentioned setting up a QEMU image, but hadn't fully finished it. Sounds insane. Challenge accepted.

The details here aren't super interesting. I grabbed a Debian cloud-init image, spun it up with QEMU, copied the student assignments and the vulnerable binary into it, and had my grader run all of them over SSH. The only real challenge was making sure to handle timeouts properly.
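The timeout handling is the part worth sketching. Below is one way it might look, assuming the grader shells out to the stock ssh client; the helper names, the port, and the host string are hypothetical, not taken from the real grader.

```python
import subprocess


def ssh_argv(host, port, remote_cmd, connect_timeout=10):
    """Build an ssh command line that fails fast instead of hanging on a prompt."""
    return [
        "ssh",
        "-p", str(port),
        "-o", "BatchMode=yes",  # never wait for a password prompt
        "-o", f"ConnectTimeout={connect_timeout}",
        host,
        remote_cmd,
    ]


def run_with_timeout(argv, timeout, stdin_bytes=None):
    """Run argv, killing it after `timeout` seconds; never raises on a timeout."""
    try:
        done = subprocess.run(
            argv, input=stdin_bytes, capture_output=True, timeout=timeout
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run has already killed the child by the time this is raised.
        return {
            "timed_out": True,
            "returncode": None,
            "stdout": exc.stdout or b"",
            "stderr": exc.stderr or b"",
        }
    return {
        "timed_out": False,
        "returncode": done.returncode,
        "stdout": done.stdout,
        "stderr": done.stderr,
    }
```

A call such as run_with_timeout(ssh_argv("student@127.0.0.1", 2222, "timeout 30 ./exploit"), timeout=60) treats a hung exploit as an ordinary failed result rather than a hung grader. Killing the local ssh client may not stop the process inside the VM, which is why this example also wraps the remote command in coreutils timeout.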

You can't do accelerated QEMU inside Docker, so it's unaccelerated and kinda slow. Sure beats waiting 2 weeks to find out what your grade is, or having to manually grade purposely shady exploit code on your own computer, though!

As a bonus, sandboxing students' code inside a QEMU image means that students who get a little too cheeky and try to hack the grader have a much harder time doing so. If they can write a QEMU escape, they probably don't need to be taking the class.