This repository was archived by the owner on Mar 24, 2022. It is now read-only.

Commit 0b51fe7
🏎️ Signal handler/timeout race fix! (#507)
* test that a SIGALRM sent to a guest thread handling a signal is well-behaved
* additional test to confirm worst-case spurious SIGALRM is well-behaved

1 parent e890e0a

7 files changed: +334 −38 lines

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@

Creating or resetting an instance no longer implicitly runs the start function. Embedders must ensure that `run_start()` is called before calling any other exported functions. `Instance::run()` will now return `Err(Error::InstanceNeedsStart)` if the start function is present but hasn't been run since the instance was created or reset.

+- Corrected a race condition where a `KillSwitch` firing while lucet-runtime handles a guest fault could result in a SIGALRM or panic in the Lucet embedder.
+
[start-function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start

### 0.6.1 (2020-02-18)

docs/src/lucet-runtime/killswitch.md

Lines changed: 60 additions & 5 deletions
@@ -367,11 +367,66 @@ handlers used with Lucet is that they may not lock on `KillState`'s

As a consequence, a `KillSwitch` may fire during the handling of a guest fault.
`sigaction` must mask `SIGALRM` so that a signal fired before the handler exits
-is discarded. If the signal behavior is to continue without effect, leave
-termination in place and continue to the guest. Otherwise the signal handler
-has determined it must return to the host, and it disables termination on the
-instance to avoid sending an erroneous SIGALRM immediately after swapping to
-the host context.
+does not preempt the handler. If the signal behavior is to continue without
+effect, leave termination in place and continue to the guest. A pending SIGALRM
+will be raised at this point and the instance will exit. Otherwise, the signal
+handler has determined it must return to the host, and must be sensitive to a
+possible in-flight `KillSwitch`...
+
+#### Instance-stopping guest fault with concurrent `KillSwitch`
+
+In this case we must consider three constraints:
+* A `KillSwitch` may fire and deliver a `SIGALRM` at any point
+* A `SIGALRM` may already have been fired, pending on the handler returning
+* The signal handler must return to host code
+
+First, we handle the risk of a `KillSwitch` firing: disable termination. If we
+acquire `terminable`, we know this is to the exclusion of any `KillSwitch`, and
+are safe to return. Otherwise, some `KillSwitch` has terminated, or is in the
+process of terminating, this guest's execution. This means a `SIGALRM` may be
+pending or imminent!
+
+A slightly simpler model is to consider that a `SIGALRM` may arrive in the
+future. This way, for correctness we only have to make sure we handle the
+signal we can be sent! We know that we must return to the host, and the guest
+fault that occurred is certainly more interesting than the guest termination,
+so we would like to preserve that information. There is no information or
+effect we want from the signal, so silence the alarm on `KillState`. This way,
+if we receive the possible `SIGALRM`, we know to ignore it.
+
+An important question to ask is "How do we know the possible `SIGALRM` must be
+ignored? Could it not be for another instance later on that thread?" The answer
+is, in short, "We ensure it cannot!"
+
+The `SIGALRM` is thread-directed, so to be an alarm for some other reason,
+another instance would have to run and be terminated. To prevent this, we must
+prevent another instance from running. Additionally, if a `SIGALRM` is _in
+flight_, we need to stop and wait anyway. Since `KillSwitch` maintains a lock
+on `execution_domain` as long as it's attempting to terminate a guest, we can
+achieve both of these goals by taking, and immediately dropping, a lock on
+`execution_domain` before descheduling an instance.
+
+Even taking this lock is interesting:
+1. This lock could be taken while racing a `KillSwitch`, after it has observed it may fire but before advancing to take this lock.
+2. This lock could be taken while racing a `KillSwitch`, after it has taken this lock.
+3. This lock could be taken without racing a `KillSwitch`.
+
+In the first case, we've won a race on `execution_domain`, but there might be a
+`KillSwitch` we're blocking with this. Disarm the `KillSwitch` by updating the
+domain to `Terminated`, which reflects the fact that this instance has exited.
+
+In the second case, we deschedule only once the `KillSwitch` has completed
+termination. The domain will already be `Terminated`, and updating it to
+`Terminated` has no effect. We simply use this lock to prevent continuing into
+the host until an in-flight `KillSwitch` completes.
+
+In the third case, we're not racing a `KillSwitch`, and any method of exiting
+the guest will have disabled termination. No `KillSwitch` will observe a
+changed `execution_domain`, so it's not incorrect to update it to `Terminated`.
+
+Taken together, this ensures that descheduling an instance serializes in-flight
+`KillSwitch` or prevents one from taking effect in a possibly-incorrect way, so
+we know this race is handled correctly.
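
To make the sequence documented above concrete, here is a minimal, hypothetical sketch of the two cooperating sides: the signal handler disabling termination and silencing the alarm, and `deschedule` taking and immediately dropping `execution_domain`. The types and method names are simplified stand-ins, not lucet-runtime's actual implementation.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
enum Domain {
    Guest,
    Terminated,
}

struct KillState {
    terminable: AtomicBool,
    ignore_alarm: AtomicBool,
    execution_domain: Mutex<Domain>,
}

impl KillState {
    /// Signal-handler side: we must return to the host, so be sensitive to a
    /// possibly in-flight `KillSwitch`.
    fn prepare_return_to_host(&self) {
        // Disable termination. If a KillSwitch already swapped `terminable`
        // to false, a SIGALRM may be pending or imminent...
        let was_terminable = self.terminable.swap(false, Ordering::SeqCst);
        if !was_terminable {
            // ...and it carries no information we want: silence the alarm so
            // the SIGALRM handler knows to ignore it.
            self.ignore_alarm.store(true, Ordering::SeqCst);
        }
    }

    /// Deschedule side: take, and immediately drop, `execution_domain`.
    fn deschedule(&self) {
        // Blocks until an in-flight KillSwitch (which holds this lock while
        // terminating) completes; updating Guest -> Terminated also disarms
        // a KillSwitch that acquired `terminable` but has not locked yet.
        let mut domain = self.execution_domain.lock().unwrap();
        if *domain == Domain::Guest {
            *domain = Domain::Terminated;
        }
    }
}

fn main() {
    let state = KillState {
        terminable: AtomicBool::new(true),
        ignore_alarm: AtomicBool::new(false),
        execution_domain: Mutex::new(Domain::Guest),
    };
    state.prepare_return_to_host();
    state.deschedule();
    assert_eq!(*state.execution_domain.lock().unwrap(), Domain::Terminated);
}
```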

### Terminated in hostcall fault

lucet-concurrency-tests/src/killswitch.rs

Lines changed: 151 additions & 5 deletions
@@ -478,6 +478,152 @@ macro_rules! killswitch_tests {
    })
}

+// If we terminate in the signal handler, but before termination has been disabled, a
+// signal will be sent to the guest. Lucet must correctly handle this case, lest the SIGALRM
+// be delivered to the host, to disastrous effect.
+//
+// This corresponds to a race during the documentation's State B -> State E "guest faults
+// or is terminated" transition.
+#[test]
+fn terminate_during_guest_fault() {
+    test_c_with_instrumented_guest_entry("timeout", "fault.c", |mut inst| {
+        let kill_switch = inst.kill_switch();
+
+        // Terminating *before* termination is disabled is critical, since afterward the
+        // `KillSwitch` we test with will just take no action.
+        let unfortunate_time_to_terminate = inst
+            .lock_testpoints
+            .signal_handler_before_disabling_termination
+            .wait_at();
+        // Wait for the guest to reach a point we reaaallly don't want to signal at - somewhere
+        // in the signal handler.
+        let exiting_signal_handler = inst
+            .lock_testpoints
+            .signal_handler_before_returning
+            .wait_at();
+        // Finally, we need to know when the `KillSwitch` has sent its signal, to ensure it
+        // races with the signal handler.
+        let killswitch_send_signal =
+            inst.lock_testpoints.kill_switch_after_guest_alarm.wait_at();
+
+        let guest = thread::Builder::new()
+            .name("guest".to_owned())
+            .spawn(move || {
+                match inst.run("main", &[0u32.into(), 0u32.into()]) {
+                    Err(Error::RuntimeFault(details)) => {
+                        assert_eq!(details.trapcode, Some(TrapCode::HeapOutOfBounds));
+                    }
+                    res => panic!("unexpected result: {:?}", res),
+                }
+
+                // Check that we can reset the instance and run a normal function.
+                inst.reset().expect("instance resets");
+                run_onetwothree(&mut inst);
+            })
+            .expect("can spawn guest thread");
+
+        let termination_thread = unfortunate_time_to_terminate.wait_and_then(|| {
+            let thread = thread::Builder::new()
+                .name("killswitch".to_owned())
+                .spawn(move || {
+                    assert_eq!(kill_switch.terminate(), Ok(KillSuccess::Signalled));
+                })
+                .expect("can spawn killswitch thread");
+            killswitch_send_signal.wait();
+            thread
+        });
+
+        // The signal has been sent; be sure that we haven't exited the signal handler until
+        // afterward.
+        exiting_signal_handler.wait();
+
+        guest.join().expect("guest exits without panic");
+        termination_thread
+            .join()
+            .expect("termination completes without panic");
+    })
+}
+
+// Variant of the above where for scheduler reasons `terminable` and
+// `execution_domain.lock()` happen on different sides of an instance descheduling.
+//
+// This corresponds to a race during the documentation's State B -> State E "guest faults
+// or is terminated" transition.
+//
+// Specifically, we want:
+// * signal handler fires, handling a guest fault
+// * timeout fires, acquiring terminable
+// * signal handler completes, locking in deschedule to serialize a pending KillSwitch
+// * KillSwitch is rescheduled, then fires
+//
+// And for all of this to complete without error!
+#[test]
+fn terminate_during_guest_fault_racing_deschedule() {
+    test_c_with_instrumented_guest_entry("timeout", "fault.c", |mut inst| {
+        let kill_switch = inst.kill_switch();
+
+        // Terminating *before* termination is disabled is critical, since afterward the
+        // `KillSwitch` we test with will just take no action.
+        let unfortunate_time_to_terminate = inst
+            .lock_testpoints
+            .signal_handler_before_disabling_termination
+            .wait_at();
+        // We need to let the instance deschedule before our KillSwitch takes
+        // `execution_domain`.
+        let killswitch_acquire_termination = inst
+            .lock_testpoints
+            .kill_switch_after_acquiring_termination
+            .wait_at();
+        // And the entire test revolves around the KillSwitch taking effect after
+        // `CURRENT_INSTANCE` is cleared!
+        let current_instance_cleared = inst
+            .lock_testpoints
+            .instance_after_clearing_current_instance
+            .wait_at();
+
+        let guest = thread::Builder::new()
+            .name("guest".to_owned())
+            .spawn(move || {
+                match inst.run("main", &[0u32.into(), 0u32.into()]) {
+                    Err(Error::RuntimeFault(details)) => {
+                        assert_eq!(details.trapcode, Some(TrapCode::HeapOutOfBounds));
+                    }
+                    res => panic!("unexpected result: {:?}", res),
+                }
+
+                // Check that we can reset the instance and run a normal function.
+                inst.reset().expect("instance resets");
+                run_onetwothree(&mut inst);
+            })
+            .expect("can spawn guest thread");
+
+        let (termination_thread, killswitch_before_domain) = unfortunate_time_to_terminate
+            .wait_and_then(|| {
+                let ks_thread = thread::Builder::new()
+                    .name("killswitch".to_owned())
+                    .spawn(move || {
+                        assert_eq!(kill_switch.terminate(), Err(KillError::NotTerminable));
+                    })
+                    .expect("can spawn killswitch thread");
+
+                // Pause the KillSwitch thread right before it acquires `execution_domain`.
+                let killswitch_before_domain = killswitch_acquire_termination.pause();
+
+                (ks_thread, killswitch_before_domain)
+            });
+
+        // `execution_domain` is not held, so instance descheduling will complete promptly.
+        current_instance_cleared.wait();
+
+        // Resume `KillSwitch`, which will acquire `execution_domain` and terminate.
+        killswitch_before_domain.resume();
+
+        guest.join().expect("guest exits without panic");
+        termination_thread
+            .join()
+            .expect("termination completes without panic");
+    })
+}
+
// This doesn't correspond to any state change in the documentation because it should have
// no effect. The guest is in State E before, and should remain in State E after.
#[test]
@@ -744,12 +890,12 @@ macro_rules! killswitch_tests {

        ks1.join().expect("killswitch_1 did not panic");

-        // At this point the first `KillSwitch` has completed terminating the instance. Now try
-        // again and make sure there's no boom.
-        assert_eq!(second_kill_switch.terminate(), Err(KillError::Invalid));
-
        // Allow the instance to reset and run a new function after termination.
-        guest_exit_testpoint.wait();
+        guest_exit_testpoint.wait_and_then(|| {
+            // At this point the first `KillSwitch` has completed terminating the instance. Now
+            // try again and make sure there's no boom.
+            assert_eq!(second_kill_switch.terminate(), Err(KillError::Invalid));
+        });

        // And after the instance successfully runs a test function, it exits without error.
        guest.join().expect("guest stops running");

lucet-runtime/lucet-runtime-internals/src/instance.rs

Lines changed: 9 additions & 4 deletions
@@ -1041,8 +1041,6 @@ impl Instance {
        );
        self.state = State::Running;

-        self.kill_state.schedule(unsafe { pthread_self() });
-
        let res = self.with_current_instance(|i| {
            i.with_signals_on(|i| {
                HOST_CTX.with(|host_ctx| {
@@ -1055,6 +1053,11 @@ impl Instance {
            })
        });

+        #[cfg(feature = "concurrent_testpoints")]
+        self.lock_testpoints
+            .instance_after_clearing_current_instance
+            .check();
+
        if let Err(e) = res {
            // Something went wrong setting up or tearing down the signal handlers and signal
            // stack. This is an error, but we don't want it to mask an error that may have arisen
@@ -1084,8 +1087,6 @@ impl Instance {
        //
        // The state should never be `Ready`, `Terminated`, `Yielded`, or `Transitioning` at this point

-        self.kill_state.deschedule();
-
        // Set transitioning state temporarily so that we can move values out of the current state
        let st = mem::replace(&mut self.state, State::Transitioning);

@@ -1181,8 +1182,12 @@ impl Instance {
            Ok(())
        })?;

+        self.kill_state.schedule(unsafe { pthread_self() });
+
        let res = f(self);

+        self.kill_state.deschedule();
+
        CURRENT_INSTANCE.with(|current_instance| {
            *current_instance.borrow_mut() = None;
        });
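
The practical effect of moving these calls, sketched below with hypothetical stand-in types rather than the actual `Instance` internals: `schedule` and `deschedule` now bracket the guest call inside `with_current_instance`, so the descheduling lock dance happens before `CURRENT_INSTANCE` is cleared and before any other instance can run on this thread.

```rust
use std::sync::Mutex;

// Hypothetical stand-in for lucet-runtime's KillState; only the pieces
// needed to show the ordering.
struct KillState {
    thread_id: Mutex<Option<u64>>, // Some(tid) while a guest is scheduled
}

impl KillState {
    fn schedule(&self, tid: u64) {
        *self.thread_id.lock().unwrap() = Some(tid);
    }
    fn deschedule(&self) {
        *self.thread_id.lock().unwrap() = None;
        // ...plus the take-and-drop of `execution_domain` shown earlier.
    }
}

// The ordering this commit establishes: schedule and deschedule bracket
// the guest call itself, inside the scope that manages CURRENT_INSTANCE.
fn with_current_instance<R>(state: &KillState, f: impl FnOnce() -> R) -> R {
    state.schedule(0); // stand-in thread id; real code uses pthread_self()
    let res = f();
    // Deschedule before CURRENT_INSTANCE is cleared, serializing any
    // in-flight KillSwitch while this instance is still current.
    state.deschedule();
    res
}

fn main() {
    let state = KillState {
        thread_id: Mutex::new(None),
    };
    let out = with_current_instance(&state, || 1 + 2);
    assert_eq!(out, 3);
    assert!(state.thread_id.lock().unwrap().is_none());
}
```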

lucet-runtime/lucet-runtime-internals/src/instance/execution.rs

Lines changed: 44 additions & 2 deletions
@@ -149,6 +149,13 @@ pub struct KillState {
    /// `tid_change_notifier` allows functions that may cause a change in `thread_id` to wait,
    /// without spinning, for the signal to be processed.
    tid_change_notifier: Condvar,
+    /// `ignore_alarm` indicates if a SIGALRM directed at this KillState's instance must be
+    /// ignored. This is necessary for a specific race where a termination occurs right around
+    /// when a Lucet guest, or a hostcall the guest made, handles some other signal: if the
+    /// termination occurs during handling of a signal that arose from guest code, a SIGALRM
+    /// will either be pending, masked by Lucet's sigaction's signal mask, OR a SIGALRM will be
+    /// imminent after handling the signal.
+    ignore_alarm: AtomicBool,
    #[cfg(feature = "concurrent_testpoints")]
    /// When testing race permutations, `KillState` keeps a reference to the `LockTestpoints` its
    /// associated instance holds.
@@ -312,6 +319,7 @@ impl Default for KillState {
            tid_change_notifier: Condvar::new(),
            execution_domain: Mutex::new(Domain::Pending),
            thread_id: Mutex::new(None),
+            ignore_alarm: AtomicBool::new(false),
        }
    }
}
@@ -330,6 +338,7 @@ impl KillState {
            tid_change_notifier: Condvar::new(),
            execution_domain: Mutex::new(Domain::Pending),
            thread_id: Mutex::new(None),
+            ignore_alarm: AtomicBool::new(false),
            lock_testpoints,
        }
    }
@@ -342,14 +351,22 @@ impl KillState {
        self.terminable.store(true, Ordering::SeqCst);
    }

-    pub fn disable_termination(&self) {
-        self.terminable.store(false, Ordering::SeqCst);
+    pub fn disable_termination(&self) -> bool {
+        self.terminable.swap(false, Ordering::SeqCst)
    }

    pub fn terminable_ptr(&self) -> *const AtomicBool {
        &self.terminable as *const AtomicBool
    }

+    pub fn silence_alarm(&self) -> bool {
+        self.ignore_alarm.swap(true, Ordering::SeqCst)
+    }
+
+    pub fn alarm_active(&self) -> bool {
+        !self.ignore_alarm.load(Ordering::SeqCst)
+    }
+
    /// Set the execution domain to signify that we are currently executing a hostcall.
    ///
    /// This method will panic if the execution domain is currently marked as anything but
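
A sketch of how a SIGALRM handler might consult these new flags. The `on_sigalrm` function and its wiring are illustrative assumptions, not lucet-runtime's actual signal code:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Stand-in carrying only the new flag.
struct KillState {
    ignore_alarm: AtomicBool,
}

impl KillState {
    fn silence_alarm(&self) -> bool {
        self.ignore_alarm.swap(true, Ordering::SeqCst)
    }
    fn alarm_active(&self) -> bool {
        !self.ignore_alarm.load(Ordering::SeqCst)
    }
}

// Illustrative handler logic: treat SIGALRM as a real timeout only if the
// alarm was not silenced by a fault handler that lost the `terminable` race.
fn on_sigalrm(state: &KillState) {
    if state.alarm_active() {
        // genuine timeout: terminate/deschedule the guest
    } else {
        // stale alarm left over from an already-handled guest fault: ignore
    }
}

fn main() {
    let state = KillState {
        ignore_alarm: AtomicBool::new(false),
    };
    assert!(state.alarm_active());
    // A fault handler returning to the host silences the alarm...
    let was_ignored = state.silence_alarm();
    assert!(!was_ignored); // swap returns the previous value
    // ...so a subsequent SIGALRM is ignored.
    assert!(!state.alarm_active());
    on_sigalrm(&state);
}
```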
@@ -450,6 +467,27 @@
    pub fn deschedule(&self) {
        *self.thread_id.lock().unwrap() = None;
        self.tid_change_notifier.notify_all();
+
+        // If a guest is being descheduled, this lock is load-bearing in two ways:
+        // * If a KillSwitch is in flight and already holds `execution_domain`, we must wait for
+        //   it to complete. This prevents a SIGALRM from being sent at some point later in host
+        //   execution.
+        // * If a KillSwitch has acquired `terminable`, but not `execution_domain`, we may win
+        //   the race for this lock. We don't know when the KillSwitch will try to check
+        //   `execution_domain`. Holding the lock, we can update it to `Terminated` - this
+        //   reflects that the instance has exited, but also signals that the KillSwitch should
+        //   take no effect.
+        //
+        // This must occur *after* notifying `tid_change_notifier` so that we indicate to a
+        // `KillSwitch` that the instance was actually descheduled, if it was terminating a
+        // guest.
+        //
+        // If any other state is being descheduled, either the instance faulted in another
+        // domain, or a hostcall called `yield`, and we must preserve the `Hostcall` domain, so
+        // don't change it.
+        let mut execution_domain = self.execution_domain.lock().unwrap();
+        if let Domain::Guest = *execution_domain {
+            *execution_domain = Domain::Terminated;
+        }
    }
}

@@ -593,6 +631,10 @@ impl KillSwitch {
            unsafe {
                pthread_kill(thread_id, SIGALRM);
            }
+
+            #[cfg(feature = "concurrent_testpoints")]
+            state.lock_testpoints.kill_switch_after_guest_alarm.check();
+
            // wait for the SIGALRM handler to deschedule the instance
            //
            // this should never actually loop, which would indicate the instance
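
For orientation, a rough sketch of the terminate-side flow this testpoint instruments — stand-in types and a simplified error path, not the real `KillSwitch::terminate`:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Condvar, Mutex};

// Hypothetical stand-ins for the runtime's types.
struct KillState {
    terminable: AtomicBool,
    thread_id: Mutex<Option<u64>>,
    tid_change_notifier: Condvar,
}

impl KillState {
    // Sketch of the terminate-side flow: acquire `terminable`, signal the
    // guest thread, then wait (without spinning) for the signal handler to
    // deschedule the instance.
    fn terminate(&self) -> Result<(), ()> {
        if !self.terminable.swap(false, Ordering::SeqCst) {
            return Err(()); // NotTerminable: guest already exiting/exited
        }
        // Real code: lock `execution_domain`, then pthread_kill(tid, SIGALRM).
        let mut tid = self.thread_id.lock().unwrap();
        while tid.is_some() {
            // Woken by `deschedule` notifying `tid_change_notifier`.
            tid = self.tid_change_notifier.wait(tid).unwrap();
        }
        Ok(())
    }
}

fn main() {
    let state = KillState {
        terminable: AtomicBool::new(true),
        thread_id: Mutex::new(None), // instance already descheduled
        tid_change_notifier: Condvar::new(),
    };
    assert!(state.terminate().is_ok());
}
```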
