because landslide's code gets invoked at every instruction of the guest's execution, we must have a notion of a scheduling operation being "still in progress" - that is, after triggering a timer interrupt, there will be several instructions before a target thread (one we decided to switch to) actually starts running. if we take note of when the target thread starts up, we can provide a few guarantees about the instructions that will be run until then - namely, that the kernel must be executing in an interrupt handler and/or the context switcher (and NOT running the "meaty bits" of any other thread's code-path). this may seem obvious, but it is still an invariant that i would like to rely on, so i expressed it as an assert in the code:
assert(ACTION(scheduler, context_switch) || HANDLING_INTERRUPT(scheduler));
imagine my surprise, testing the implementation, when this assert tripped!
after a bit of debugging, i discovered that the invariant violation was happening at the instruction immediately following the instruction at which i tried to trigger a timer interrupt. it turns out that, in some cases, simics may decide to delay interrupt processing by a few instructions (seemingly by a non-deterministic amount, too) after i set the CPU's pending interrupt flags.
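to make the failure mode concrete, here is a toy model of the check. it is only a sketch, not landslide's actual code: `struct insn_state`, `invariant_holds`, `first_violation` and `on_instruction` are hypothetical names, and the two flags merely stand in for what `ACTION(scheduler, context_switch)` and `HANDLING_INTERRUPT(scheduler)` report at each instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* hypothetical per-instruction snapshot, taken while a scheduling
 * operation is still in progress (interrupt triggered, target thread
 * not yet running). not landslide's real types. */
struct insn_state {
    bool in_context_switch;  /* stand-in for ACTION(scheduler, context_switch) */
    bool handling_interrupt; /* stand-in for HANDLING_INTERRUPT(scheduler) */
};

/* the invariant: the kernel is in the context switcher and/or an
 * interrupt handler. */
bool invariant_holds(const struct insn_state *s)
{
    return s->in_context_switch || s->handling_interrupt;
}

/* index of the first instruction in the trace that breaks the
 * invariant, or -1 if every instruction satisfies it. */
int first_violation(const struct insn_state *trace, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (!invariant_holds(&trace[i]))
            return (int)i;
    }
    return -1;
}

/* what a per-instruction hook would do with the invariant. */
void on_instruction(const struct insn_state *s)
{
    assert(invariant_holds(s) && "ran ordinary thread code mid-switch");
}
```

in this model, a trace where the interrupt is taken right away has no violation, while a trace where the simulator lets even one ordinary instruction run first reports a violation at index 0, which matches where the real assert tripped.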
the fix (ensuring that when i decide to trigger a timer interrupt it is actually received immediately) is nothing special; what is important is to realise that my programming environment had some peculiarity that broke an assumption that i didn't even realise i was making when i established the invariants of my own code. so upon finding this new assumption (and writing code to make sure it worked), i added another assert:
if (scheduler->just_triggered_timer_interrupt) {
    assert(get_current_eip() == get_timer_wrap_begin() &&
           "simics attempted to delay our interrupt! :<");
    scheduler->just_triggered_timer_interrupt = false;
}
now if something else of this nature goes wrong, i will know immediately, with a useful error message to boot. but imagine if i'd never written that first assert to begin with. simics could have merrily delayed all my interrupts for however long it wanted, and i would never have known; the places where i chose to trigger interrupts (i.e., notionally good places for exposing race conditions) would have had no bearing on when they actually happened! i could have spent months on this project without it ever working right, and i might never have known why.
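a quick note on the `&& "message"` idiom in that second assert, since the useful error message comes from it. a string literal evaluates to a non-null pointer, so it is always true and never changes whether the assert passes; but because `assert` prints the text of the failed expression, the message shows up in the output. below is a minimal sketch with hypothetical names (`interrupt_taken_immediately`, `check_interrupt`), where the two parameters stand in for the values of `get_current_eip()` and `get_timer_wrap_begin()`:

```c
#include <assert.h>
#include <stdbool.h>

/* hypothetical helper: did the cpu land on the first instruction of
 * the timer handler right after we triggered the interrupt? */
bool interrupt_taken_immediately(unsigned int eip, unsigned int handler_entry)
{
    return eip == handler_entry;
}

void check_interrupt(unsigned int eip, unsigned int handler_entry)
{
    /* the literal is always truthy, so only the comparison decides
     * pass or fail; on failure, assert reports the whole expression,
     * message included, along with the file and line. */
    assert(interrupt_taken_immediately(eip, handler_entry) &&
           "simics attempted to delay our interrupt! :<");
}
```

calling `check_interrupt(0x1000, 0x1000)` returns quietly, while a mismatched pair aborts with the message text in the diagnostic (unless the code is built with `NDEBUG`, which compiles asserts out).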
use asserts, fellow hackers - not just comments or thoughts in your heads. you'll be happy for it later.