2015-10-15

PLDI evaluation

Running experiments for PLDI has begun in earnest. My evaluation plan calls for 800 CPU-days of testing:


80 P2 thread libraries * 6 test cases
79(?) Pintos kernels * 2 test cases

638 codebase+testcase pairs total

For each pair, I run a 10-hour control experiment, a 10-cpu * 1-hour "live" experiment, and a 10-cpu * 1-hour "data race false negative" experiment (don't worry, the paper will explain it... should it get published!).

(80*6+79*2)*3*10 = 19,140 cpu-hours = 797.5 cpu-days.

And 200 CPUs to do it with.
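To double-check the budget and see what 200 CPUs buys, here is a small sketch of the arithmetic above. The only number not taken from the post is the wall-clock figure, which assumes perfect packing of jobs onto all 200 CPUs (no idle gaps, no setup overhead), so it is a lower bound rather than a prediction.

```python
# Experiment budget from the post. Each codebase+testcase pair gets three
# experiments costing 10 CPU-hours each: a 10-hour single-CPU control run
# plus two runs of 10 CPUs for 1 hour.
P2_LIBRARIES, P2_TESTS = 80, 6
PINTOS_KERNELS, PINTOS_TESTS = 79, 2   # the post flags 79 as uncertain
EXPERIMENTS_PER_PAIR = 3
CPU_HOURS_PER_EXPERIMENT = 10
AVAILABLE_CPUS = 200

pairs = P2_LIBRARIES * P2_TESTS + PINTOS_KERNELS * PINTOS_TESTS
cpu_hours = pairs * EXPERIMENTS_PER_PAIR * CPU_HOURS_PER_EXPERIMENT
cpu_days = cpu_hours / 24
# Idealized lower bound on wall-clock time: assumes every CPU is kept busy
# and ignores scheduling gaps and setup overhead.
wall_clock_hours = cpu_hours / AVAILABLE_CPUS

print(pairs, cpu_hours, cpu_days, wall_clock_hours)
```

With every CPU busy, that is about 96 hours, or roughly four days of wall-clock time at best.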


2015-09-09

big green button

Hello internet, it's been a while.

Tonight I'm having a "1% moment" of research. That is, 99% of the time, I either have my nose to the grindstone, or am endlessly worrying and guilting myself about not getting enough done, being an impostor, etc.; but that other 1% is why I'm still a grad student. Because sometimes at 5 in the morning I finish writing mindless automation glue code, finish patching horribly broken anonymous student code, and finish debugging the bugs in my bug-finding software (ha), and finally reach a state where I can hit a big green button marked "GO RUN THE EXPERIMENT" and watch the computer do something absolutely frickin' amazing.

My current project is an extension of Landslide that automatically searches for new preemption points (PPs) during the course of a systematic test, adds new state spaces to explore using those PPs, and uses state space estimation to figure out which state spaces are most likely to finish testing within a given CPU budget. I'm calling it "iterative deepening" by analogy with the chess AI technique, and my latest talk slides have more details.
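To make the loop concrete, here is a toy sketch of that kind of scheduler. It is not Landslide's code: `StateSpace`, `iterative_deepening`, `discover`, and `estimate` are hypothetical names, the estimator is a made-up stand-in (each extra PP doubles the cost) rather than real state space estimation, "most likely to finish" is simplified to "cheapest estimate that still fits in the remaining budget", and estimates are assumed to equal actual run time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StateSpace:
    # One systematic test job: explore all interleavings that preempt
    # only at these preemption points (PPs).
    preemption_points: frozenset

def iterative_deepening(initial_pps, discover, estimate, budget_hours):
    """Toy scheduler: repeatedly run the cheapest pending state space that
    fits in the remaining budget, and enqueue a larger state space for every
    new PP that a run discovers."""
    start = StateSpace(frozenset(initial_pps))
    pending, seen, completed = [start], {start}, []
    remaining = budget_hours
    while pending:
        costs = {s: estimate(s) for s in pending}
        feasible = [s for s in pending if costs[s] <= remaining]
        if not feasible:
            break  # nothing left is expected to finish in time
        space = min(feasible, key=costs.get)
        pending.remove(space)
        remaining -= costs[space]  # assumes the estimate was exact
        completed.append(space)
        for pp in discover(space):
            bigger = StateSpace(space.preemption_points | {pp})
            if bigger not in seen:
                seen.add(bigger)
                pending.append(bigger)
    return completed, remaining

# Hypothetical stand-ins: every run "discovers" two data races, and each
# extra PP doubles the estimated cost of a state space.
def discover(space):
    return [pp for pp in ("data_race_A", "data_race_B")
            if pp not in space.preemption_points]

def estimate(space):
    return 0.5 * 2 ** len(space.preemption_points)

completed, remaining = iterative_deepening(
    {"mutex_lock", "mutex_unlock"}, discover, estimate, budget_hours=10)
for space in completed:
    print(sorted(space.preemption_points))
print("hours left:", remaining)
```

With a 10-hour budget, this toy run completes the base state space and both single-race state spaces, then stops because the combined two-race state space is estimated to need more time than remains.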

But mostly the purpose of this post is for me to share some eye-candy. Here's what Landslide looks like when it's feeling victorious.



The key thing to note here is that bugs are only found in state spaces with data-race preemption points, which only Landslide's iterative deepening framework is capable of identifying and using. IOW, these bugs would be missed by any other systematic testing tool that interposed only on thread library calls.
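A toy model shows why preempting only at thread library calls can miss a data race. This is an illustrative sketch, not how Landslide works internally: two threads each run a hypothetical `lib_call` step (a stand-in for any thread library call), then an unsynchronized increment of a shared counter split into its racing `read` and `write`. The explorer enumerates every interleaving but allows a thread switch only just before a step whose kind is a PP (or once a thread finishes).

```python
from itertools import combinations

# Each thread runs the same toy program: a thread-library call, then an
# unsynchronized increment split into its racing read and write, then
# another thread-library call.
THREAD = ["lib_call", "read", "write", "lib_call"]

def schedules(n_steps):
    """All interleavings of two threads with n_steps steps each,
    as lists of thread ids (0 or 1), one entry per executed step."""
    total = 2 * n_steps
    for slots in combinations(range(total), n_steps):
        chosen = set(slots)
        yield [0 if i in chosen else 1 for i in range(total)]

def allowed(schedule, pp_kinds):
    """A switch away from a thread is legal only if that thread has
    finished or is about to execute a step that is a preemption point."""
    done = [0, 0]
    for i, tid in enumerate(schedule):
        prev = schedule[i - 1] if i > 0 else tid
        if (tid != prev and done[prev] < len(THREAD)
                and THREAD[done[prev]] not in pp_kinds):
            return False
        done[tid] += 1
    return True

def run(schedule):
    """Execute one schedule and return the final counter value."""
    counter, tmp, done = 0, [0, 0], [0, 0]
    for tid in schedule:
        kind = THREAD[done[tid]]
        if kind == "read":
            tmp[tid] = counter
        elif kind == "write":
            counter = tmp[tid] + 1
        done[tid] += 1
    return counter

def explore(pp_kinds):
    """Sorted list of distinct final counter values reachable with these PPs."""
    return sorted({run(s) for s in schedules(len(THREAD))
                   if allowed(s, pp_kinds)})

print(explore({"lib_call"}))                   # PPs at library calls only
print(explore({"lib_call", "read", "write"}))  # plus data-race PPs
```

With PPs only at the library calls, every explored schedule ends with the counter at 2, so the test passes. Once the racing accesses are also PPs, the lost-update schedule (both threads read before either writes, leaving the counter at 1) becomes reachable.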

I've finally got a conference deadline in my sights where getting accepted seems realistic. It's been a looong build-up to this point. Keep your eyes peeled.