timing wheel: avoid queueing expired timers and caching wrong earliest expiry value

This commit addresses two issues: 1) Avoid refilling the timing wheel with stale timers in rare circumstances. The timing_wheel_advance() may call advance_cpu_time_base() to update the cpu_time_base, which is used as a starting point for 32-bit offsets of events on the timer wheel. If the timing_wheel_advance() is not called for a longer period of time, then advance_cpu_time_base() is called multiple times in a loop. advance_cpu_time_base() has two parts - the first part adjusting the base for the existing event, and the second part trying to fill with the new events from the overflow queue, which now fit into the 32-bit-sized time window off the new cpu_time_base. In doing so this second part incorrectly considers the timers which have just expired (have the time index == w->current_time_index) to still be unexpired and places them onto the wheel instead of returning them as expired. For quick successive executions of timing_wheel_advance() these events result in a relatively benign late expiry - the newly placed events expire during the next call to timing_wheel_advance(). If the successive executions of timing_wheel_advance() result in multiple invocations of advance_cpu_time_base(), the Nth iteration of it may place a stale event on the timer wheel if the event time index equals to the current time index (which has been previously purged), while the N+1th iteration of it will trigger an assert violation on this stale event, resulting in a reboot. As part of the testing, two test runs were done before and after the change. Each of the test runs consisted of the following command: for i in `seq 1 300`; do ./test_timing_wheel validate events 10000 synthetic-time verbose seed $i iter 10000 wait-time 2 max-time 300; done The test runs completed identically, however they uncovered the following assert failure: vpp/src/vppinfra/test_timing_wheel.c:225 (test_timing_wheel_main) assertion `min_next_time[0] <= tm->events[i]' fails This assert is the second issue covered by this commit: 2) Inserting a new element may result in incorrect cached expiry value The w->cached_min_cpu_time_on_wheel is being updated within timing_wheel_advance() every time the elements are expired. However, it is not touched if the new elements are inserted. Assuming current time is "T" and the cached min cpu time is "T+X", if a new element is being inserted whose expiry time is "T+Y", and Y is such that Y < X, then the value w->cached_min_cpu_time_on_wheel becomes incorrect until the next expiry event, during which it is updated. The test catches this transient condition which results in the asserts seen in the runs above. The solution is to update the w->cached_min_cpu_time_on_wheel within timing_wheel_insert_helper() as necessary. Change-Id: I56a65a9a11cc2a1e0b36937a9c6d5ad10233a731 Signed-off-by: Andrew Yourtchenko <ayourtch@gmail.com>
author: Andrew Yourtchenko <ayourtch@gmail.com> 2017-02-01 14:08:21 +0000
committer: Dave Barach <openvpp@barachs.net> 2017-03-04 00:06:48 +0000
commit: fa5231d75e1530a03a0c4f14706ec58067fa32d0 (patch)
tree: a7f6d055cd9cafd2e80ff1f61a26df9372ccd22a /src/vppinfra/timing_wheel.c
parent: c83c3b7f117b981b677f646a0e30f44ec70de239 (diff)
1 files changed, 11 insertions, 2 deletions
diff --git a/src/vppinfra/timing_wheel.c b/src/vppinfra/timing_wheel.c
index 4c8a2c583a9..064171ab4e4 100644
--- a/src/vppinfra/timing_wheel.c
+++ b/src/vppinfra/timing_wheel.c
@@ -239,6 +239,8 @@ timing_wheel_insert_helper (timing_wheel_t * w, u64 insert_cpu_time,
       e = insert_helper (w, level_index, rtime);
       e->user_data = user_data;
       e->cpu_time_relative_to_base = dt;
+      if (insert_cpu_time < w->cached_min_cpu_time_on_wheel)
+	w->cached_min_cpu_time_on_wheel = insert_cpu_time;
     }
   else
     {
@@ -488,7 +490,8 @@ advance_cpu_time_base (timing_wheel_t * w, u32 * expired_user_data)
 	vec_foreach (e, l->elts[wi])
 	  {
 	    /* This should always be true since otherwise we would have already expired
-	       this element. */
+	       this element. Note that in the second half of this function we need
+               to take care not to place the expired elements ourselves. */
 	    ASSERT (e->cpu_time_relative_to_base >= delta);
 	    e->cpu_time_relative_to_base -= delta;
 	  }
@@ -505,10 +508,11 @@ advance_cpu_time_base (timing_wheel_t * w, u32 * expired_user_data)
       if (0 == ((oe->cpu_time - w->cpu_time_base) >> BITS (e->cpu_time_relative_to_base)))
 	{
 	  u64 ti = oe->cpu_time >> w->log2_clocks_per_bin;
-	  if (ti < w->current_time_index)
+	  if (ti <= w->current_time_index)
 	    {
 	      /* This can happen when timing wheel is not advanced for a long time
 		 (for example when at a gdb breakpoint for a while). */
+              /* Note: the ti == w->current_time_index means it is also an expired timer */
 	      if (! elt_is_deleted (w, oe->user_data))
 		vec_add1 (expired_user_data, oe->user_data);
 	    }
@@ -664,6 +668,8 @@ timing_wheel_advance (timing_wheel_t * w, u64 advance_cpu_time,
 		expire_bin (w, advance_level_index, wi, advance_cpu_time,
 			    expired_user_data);
 
+	    /* When we jump out, we have already just expired the bin,
+	       corresponding to advance_wheel_index */
 	    if (wi == advance_wheel_index)
 	      break;
 
@@ -682,6 +688,9 @@ timing_wheel_advance (timing_wheel_t * w, u64 advance_cpu_time,
     }
 
   /* Don't advance until necessary. */
+  /* However, if the timing_wheel_advance() hasn't been called for some time,
+     the while() loop will ensure multiple calls to advance_cpu_time_base()
+     in a row until the w->cpu_time_base is fresh enough. */
   while (PREDICT_FALSE
 	 (advance_time_index >= w->time_index_next_cpu_time_base_update))
     expired_user_data = advance_cpu_time_base (w, expired_user_data);
author	Andrew Yourtchenko <ayourtch@gmail.com>	2017-02-01 14:08:21 +0000
committer	Dave Barach <openvpp@barachs.net>	2017-03-04 00:06:48 +0000
commit	fa5231d75e1530a03a0c4f14706ec58067fa32d0 (patch)
tree	a7f6d055cd9cafd2e80ff1f61a26df9372ccd22a /src/vppinfra/timing_wheel.c
parent	c83c3b7f117b981b677f646a0e30f44ec70de239 (diff)