Changeset 1b20da0 in mainline for kernel/generic/src/synch/rcu.c
- Timestamp: 2018-02-28T17:52:03Z (7 years ago)
- Branches: lfn, master, serial, ticket/834-toolchain-update, topic/msim-upgrade, topic/simplify-dev-export
- Children: 3061bc1
- Parents: df6ded8
- git-author: Jiří Zárevúcky <zarevucky.jiri@…> (2018-02-28 17:26:03)
- git-committer: Jiří Zárevúcky <zarevucky.jiri@…> (2018-02-28 17:52:03)
- File: kernel/generic/src/synch/rcu.c (1 edited)
Legend:
- Changed lines are shown below as they appear after the change; in every hunk the old and new text differ only in trailing whitespace.
- "…" marks elided, unchanged lines.
kernel/generic/src/synch/rcu.c
…
 * @file
 * @brief Preemptible read-copy update. Usable from interrupt handlers.
 *
 * @par Podzimek-preempt-RCU (RCU_PREEMPT_PODZIMEK)
 *
 * Podzimek-preempt-RCU is a preemptible variant of Podzimek's non-preemptible
 * RCU algorithm [1, 2]. Grace period (GP) detection is centralized into a
…
 * that it passed a quiescent state (QS), ie a state when the cpu is
 * outside of an rcu reader section (CS). Cpus check for QSs during context
 * switches and when entering and exiting rcu reader sections. Once all
 * cpus announce a QS and if there were no threads preempted in a CS, the
 * GP ends.
 *
 * The detector increments the global GP counter, _rcu_cur_gp, in order
 * to start a new GP. Readers notice the new GP by comparing the changed
 * _rcu_cur_gp to a locally stored value last_seen_gp which denotes the
 * the last GP number for which the cpu noted an explicit QS (and issued
 * a memory barrier). Readers check for the change in the outer-most
 * (ie not nested) rcu_read_lock()/unlock() as these functions represent
 * a QS. The reader first executes a memory barrier (MB) in order to contain
 * memory references within a CS (and to make changes made by writers
 * visible in the CS following rcu_read_lock()). Next, the reader notes
 * that it reached a QS by updating the cpu local last_seen_gp to the
 * global GP counter, _rcu_cur_gp. Cache coherency eventually makes
 * the updated last_seen_gp visible to the detector cpu, much like it
 * delivered the changed _rcu_cur_gp to all cpus.
 *
 * The detector waits a while after starting a GP and then reads each
 * cpu's last_seen_gp to see if it reached a QS. If a cpu did not record
 * a QS (might be a long running thread without an RCU reader CS; or cache
 * coherency has yet to make the most current last_seen_gp visible to
…
 * via an IPI. If the IPI handler finds the cpu still in a CS, it instructs
 * the cpu to notify the detector that it had exited the CS via a semaphore
 * (CPU->rcu.is_delaying_gp).
 * The detector then waits on the semaphore for any cpus to exit their
 * CSs. Lastly, it waits for the last reader preempted in a CS to
 * exit its CS if there were any and signals the end of the GP to
 * separate reclaimer threads wired to each cpu. Reclaimers then
 * execute the callbacks queued on each of the cpus.
 *
 *
 * @par A-RCU algorithm (RCU_PREEMPT_A)
 *
 * A-RCU is based on the user space rcu algorithm in [3] utilizing signals
 * (urcu) and Podzimek's rcu [1]. Like in Podzimek's rcu, callbacks are
 * executed by cpu-bound reclaimer threads. There is however no dedicated
 * detector thread and the reclaimers take on the responsibilities of the
 * detector when they need to start a new GP. A new GP is again announced
 * and acknowledged with _rcu_cur_gp and the cpu local last_seen_gp. Unlike
 * Podzimek's rcu, cpus check explicitly for QS only during context switches.
 * Like in urcu, rcu_read_lock()/unlock() only maintain the nesting count
 * and never issue any memory barriers. This makes rcu_read_lock()/unlock()
 * simple and fast.
 *
 * If a new callback is queued for a reclaimer and no GP is in progress,
 * the reclaimer takes on the role of a detector. The detector increments
 * _rcu_cur_gp in order to start a new GP. It waits a while to give cpus
 * a chance to switch a context (a natural QS). Then, it examines each
 * non-idle cpu that has yet to pass a QS via an IPI. The IPI handler
…
 * finds the cpu in a CS it does nothing and let the detector poll/interrupt
 * the cpu again after a short sleep.
 *
 * @par Caveats
 *
 * last_seen_gp and _rcu_cur_gp are always 64bit variables and they
 * are read non-atomically on 32bit machines. Reading a clobbered
 * value of last_seen_gp or _rcu_cur_gp or writing a clobbered value
 * of _rcu_cur_gp to last_seen_gp will at worst force the detector
 * to unnecessarily interrupt a cpu. Interrupting a cpu makes the
 * correct value of _rcu_cur_gp visible to the cpu and correctly
 * resets last_seen_gp in both algorithms.
 *
 *
 * [1] Read-copy-update for opensolaris,
 *     2010, Podzimek
 *     https://andrej.podzimek.org/thesis.pdf
 *
 * [2] (podzimek-rcu) implementation file "rcu.patch"
 *     http://d3s.mff.cuni.cz/projects/operating_systems/rcu/rcu.patch
 *
 * [3] User-level implementations of read-copy update,
 *     2012, appendix
 *     http://www.rdrop.com/users/paulmck/RCU/urcu-supp-accepted.2011.08.30a.pdf
 *
 */
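The reader-side primitives are where the two algorithms differ most, and a compact illustration may help. The sketch below is illustrative and stand-alone rather than code from this file: every toy_* name is invented, C11 atomics and a thread-local variable stand in for the kernel's own memory barriers and per-cpu data, and preemption handling, interrupts and the preempted-reader lists are omitted. It only mirrors the handshake described above: Podzimek-preempt-RCU announces a quiescent state in the outermost rcu_read_lock()/unlock(), while A-RCU merely maintains the nesting count.

/*
 * Illustrative sketch only -- not from rcu.c. All toy_* names are invented.
 */
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t toy_cur_gp;        /* plays the role of _rcu_cur_gp */
static _Atomic uint64_t toy_last_seen_gp;  /* per-cpu in the real code */
static _Thread_local unsigned toy_nesting; /* reader nesting count */

/* Podzimek-preempt-RCU flavor: the outermost lock/unlock announces a QS. */
static void toy_podzimek_read_lock(void)
{
    if (toy_nesting++ == 0) {
        /* Contain the critical section's memory accesses ... */
        atomic_thread_fence(memory_order_seq_cst);
        /* ... and acknowledge the GP this cpu currently sees. */
        atomic_store(&toy_last_seen_gp, atomic_load(&toy_cur_gp));
    }
}

static void toy_podzimek_read_unlock(void)
{
    if (--toy_nesting == 0) {
        atomic_thread_fence(memory_order_seq_cst);
        atomic_store(&toy_last_seen_gp, atomic_load(&toy_cur_gp));
    }
}

/*
 * A-RCU flavor: lock/unlock only maintain the nesting count and never issue
 * barriers; quiescent states are recorded during context switches instead.
 */
static void toy_arcu_read_lock(void)
{
    toy_nesting++;
}

static void toy_arcu_read_unlock(void)
{
    toy_nesting--;
}

In the real code the Podzimek-style update is only performed when a new GP has actually been announced (last_seen_gp differs from _rcu_cur_gp), and readers preempted inside a critical section are tracked separately.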
…
#include <macros.h>

/*
 * Number of milliseconds to give to preexisting readers to finish
 * when non-expedited grace period detection is in progress.
 */
#define DETECT_SLEEP_MS 10
/*
 * Max number of pending callbacks in the local cpu's queue before
 * aggressively expediting the current grace period
 */
…
#define UINT32_MAX_HALF 2147483648U

/**
 * The current grace period number. Increases monotonically.
 * Lock rcu.gp_lock or rcu.preempt_lock to get a current value.
 */
…
/** Reclaimers use to notify the detector to accelerate GP detection. */
condvar_t expedite_now;
/**
 * Protects: req_gp_end_cnt, req_expedited_cnt, completed_gp, _rcu_cur_gp;
 * or: completed_gp, _rcu_cur_gp
…
SPINLOCK_DECLARE(gp_lock);
/**
 * The number of the most recently completed grace period. At most
 * one behind _rcu_cur_gp. If equal to _rcu_cur_gp, a grace period
 * detection is not in progress and the detector is idle.
 */
…
/** Reader that have been preempted and might delay the next grace period.*/
list_t next_preempted;
/**
 * The detector is waiting for the last preempted reader
 * in cur_preempted to announce that it exited its reader
 * section by up()ing remaining_readers.
 */
…
#ifdef RCU_PREEMPT_A

/**
 * The detector waits on this semaphore for any preempted readers
 * delaying the grace period once all cpus pass a quiescent state.
 */
…
/** Number of consecutive grace periods to detect quickly and aggressively.*/
size_t req_expedited_cnt;
/**
 * Number of cpus with readers that are delaying the current GP.
 * They will up() remaining_readers.
 */
atomic_t delaying_cpu_cnt;
/**
 * The detector waits on this semaphore for any readers delaying the GP.
 *
 * Each of the cpus with readers that are delaying the current GP
 * must up() this sema once they reach a quiescent state. If there
 * are any readers in cur_preempted (ie preempted preexisting) and
 * they are already delaying GP detection, the last to unlock its
 * reader section must up() this sema once.
…
static void start_reclaimers(void);
static void synch_complete(rcu_item_t *rcu_item);
static inline void rcu_call_impl(bool expedite, rcu_item_t *rcu_item,
    rcu_func_t func);
static void add_barrier_cb(void *arg);
…

/** Cleans up global RCU resources and stops dispatching callbacks.
 *
 * Call when shutting down the kernel. Outstanding callbacks will
 * not be processed. Instead they will linger forever.
…
    snprintf(name, THREAD_NAME_BUFLEN - 1, "rcu-rec/%u", cpu_id);

    cpus[cpu_id].rcu.reclaimer_thr =
        thread_create(reclaimer, NULL, TASK, THREAD_FLAG_NONE, name);

    if (!cpus[cpu_id].rcu.reclaimer_thr)
        panic("Failed to create RCU reclaimer thread on cpu%u.", cpu_id);

…
static void start_detector(void)
{
    rcu.detector_thr =
        thread_create(detector, NULL, TASK, THREAD_FLAG_NONE, "rcu-det");

    if (!rcu.detector_thr)
        panic("Failed to create RCU detector thread.");

…
}

/** Unlocks the local reader section using the given nesting count.
 *
 * Preemption or interrupts must be disabled.
 *
 * @param pnesting_cnt Either &CPU->rcu.tmp_nesting_cnt or
 *           THREAD->rcu.nesting_cnt.
 */
…
    _rcu_record_qs();

    /*
     * The thread was preempted while in a critical section or
     * the detector is eagerly waiting for this cpu's reader
     * to finish.
     *
     * Note that THREAD may be NULL in scheduler() and not just during boot.
     */
…
     */

    /*
     * If the detector is eagerly waiting for this cpu's reader to unlock,
     * notify it that the reader did so.
…
    assert(!rcu_read_locked());

    synch_item_t completion;

    waitq_initialize(&completion.wq);
…
void rcu_barrier(void)
{
    /*
     * Serialize rcu_barrier() calls so we don't overwrite cpu.barrier_item
     * currently in use by rcu_barrier().
…
    mutex_lock(&rcu.barrier_mtx);

    /*
     * Ensure we queue a barrier callback on all cpus before the already
     * enqueued barrier callbacks start signaling completion.
…
}

/** Issues a rcu_barrier() callback on the local cpu.
 *
 * Executed with interrupts disabled.
 */
static void add_barrier_cb(void *arg)
…
}

/** Adds a callback to invoke after all preexisting readers finish.
 *
 * May be called from within interrupt handlers or RCU reader sections.
 *
 * @param rcu_item Used by RCU to track the call. Must remain
 *         until the user callback function is entered.
…
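For orientation, this is roughly how an updater would drive the callback interface documented above. The sketch is illustrative rather than code from the changeset: the example_* names are invented, and it assumes the public rcu_read_lock()/rcu_read_unlock(), rcu_call(), rcu_synchronize() and rcu_barrier() interface that this file implements, with rcu_func_t receiving the rcu_item_t pointer that was passed to rcu_call().

/* Illustrative only; the example_* type and functions are made up. */
typedef struct example {
    rcu_item_t rcu_item;  /* must stay allocated until the callback runs */
    int value;
} example_t;

static void example_reclaim(rcu_item_t *item)
{
    /* Runs once all readers that might still see the object have finished. */
    example_t *ex = (example_t *) item;  /* rcu_item is the first member */
    free(ex);
}

static void example_retire(example_t *ex)
{
    /*
     * The object was already unlinked from the shared structure by the
     * updater; defer freeing it until a grace period elapses.
     */
    rcu_call(&ex->rcu_item, example_reclaim);

    /* A blocking alternative: rcu_synchronize(); free(ex); */
}

static int example_read(example_t *shared)
{
    /* Readers may be preempted or run in interrupt handlers. */
    rcu_read_lock();
    int val = shared->value;  /* cannot be freed while we hold the lock */
    rcu_read_unlock();
    return val;
}

Before tearing down a subsystem that queues such callbacks, rcu_barrier() waits for every callback queued so far to finish executing.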
/** rcu_call() inline-able implementation. See rcu_call() for comments. */
static inline void rcu_call_impl(bool expedite, rcu_item_t *rcu_item,
    rcu_func_t func)
{
…
    rcu_cpu_data_t *r = &CPU->rcu;

    rcu_item_t **prev_tail
        = local_atomic_exchange(&r->parriving_cbs_tail, &rcu_item->next);
    *prev_tail = rcu_item;
…
{
    assert(THREAD && THREAD->wired);
    /*
     * Accessing with interrupts enabled may at worst lead to
     * a false negative if we race with a local interrupt handler.
     */
…
static bool wait_for_pending_cbs(void)
{
    if (!all_cbs_empty())
        return true;

…
    if (exec_cnt < CRITICAL_THRESHOLD) {
        exec_cbs(&CPU->rcu.cur_cbs);
        exec_cbs(&CPU->rcu.next_cbs);
    } else {
        /*
         * Getting overwhelmed with too many callbacks to run.
         * Disable preemption in order to prolong our time slice
         * and catch up with updaters posting new callbacks.
         */
        preemption_disable();
        exec_cbs(&CPU->rcu.cur_cbs);
        exec_cbs(&CPU->rcu.next_cbs);
        preemption_enable();
    }
…
        exec_cbs(&CPU->rcu.cur_cbs);
    } else {
        /*
         * Getting overwhelmed with too many callbacks to run.
         * Disable preemption in order to prolong our time slice
         * and catch up with updaters posting new callbacks.
         */
…
    CPU->rcu.stat_max_cbs = max(arriving_cnt, CPU->rcu.stat_max_cbs);
    if (0 < arriving_cnt) {
        CPU->rcu.stat_avg_cbs =
            (99 * CPU->rcu.stat_avg_cbs + 1 * arriving_cnt) / 100;
    }
…
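The arriving-callbacks queue appended to by rcu_call_impl() above is a singly linked list addressed through a pointer to its tail link, so an append is O(1) and safe against local interrupt handlers: the tail link is claimed with a single atomic exchange and only then patched to point at the new item. The stand-alone toy below (all toyq_* names are invented; C11 atomic_exchange stands in for local_atomic_exchange()) shows the same technique.

#include <stdatomic.h>
#include <stddef.h>

typedef struct toyq_item {
    struct toyq_item *next;
} toyq_item_t;

typedef struct toyq {
    toyq_item_t *head;                  /* first queued item, or NULL */
    _Atomic(toyq_item_t **) tail_link;  /* where the next item gets linked */
} toyq_t;

static void toyq_init(toyq_t *q)
{
    q->head = NULL;
    atomic_store(&q->tail_link, &q->head);
}

/*
 * Claim the current tail link and redirect it to &item->next, then fill in
 * the claimed link. An append that interrupts us between the two steps
 * simply chains onto item->next, so the queue is never corrupted -- at
 * worst it is briefly split until the final plain store completes.
 */
static void toyq_append(toyq_t *q, toyq_item_t *item)
{
    item->next = NULL;
    toyq_item_t **prev_link = atomic_exchange(&q->tail_link, &item->next);
    *prev_link = item;
}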

/** Prepares another batch of callbacks to dispatch at the nest grace period.
 *
 * @return True if the next batch of callbacks must be expedited quickly.
 */
…
    CPU->rcu.arriving_cbs_cnt = 0;

    /*
     * Too many callbacks queued. Better speed up the detection
     * or risk exhausting all system memory.
     */
    bool expedite = (EXPEDITE_THRESHOLD < CPU->rcu.next_cbs_cnt)
        || CPU->rcu.expedite_arriving;
    CPU->rcu.expedite_arriving = false;

…
    CPU->rcu.next_cbs = CPU->rcu.arriving_cbs;

    /*
     * At least one callback arrived. The tail therefore does not point
     * to the head of arriving_cbs and we can safely reset it to NULL.
…
        ACCESS_ONCE(CPU->rcu.parriving_cbs_tail) = &CPU->rcu.arriving_cbs;
    } else {
        /*
         * arriving_cbs was null and parriving_cbs_tail pointed to it
         * so leave it that way. Note that interrupt handlers may have
         * added a callback in the meantime so it is not safe to reset
…
    upd_stat_cb_cnts(CPU->rcu.next_cbs_cnt);

    /*
     * Make changes prior to queuing next_cbs visible to readers.
     * See comment in wait_for_readers().
     */
…
    CPU->rcu.next_cbs_gp = _rcu_cur_gp + 1;

    /*
     * There are no callbacks to invoke before next_cbs. Instruct
     * wait_for_cur_cbs_gp() to notify us of the nearest GP end.
     * That could be sooner than next_cbs_gp (if the current GP
     * had not yet completed), so we'll create a shorter batch
     * of callbacks next time around.
…
    if (cur_cbs_empty()) {
        CPU->rcu.cur_cbs_gp = rcu.completed_gp + 1;
    }

    spinlock_unlock(&rcu.gp_lock);
…
    assert(CPU->rcu.cur_cbs_gp <= CPU->rcu.next_cbs_gp);

    return expedite;
}

…
#ifdef RCU_PREEMPT_A

/** Waits for the grace period associated with callbacks cub_cbs to elapse.
 *
 * @param expedite Instructs the detector to aggressively speed up grace
 *            period detection without any delay.
 * @param completed_gp Returns the most recent completed grace period
 *            number.
 * @return false if the thread was interrupted and should stop.
…
        condvar_broadcast(&rcu.gp_ended);
    } else {
        /* GP detection is in progress.*/

        if (expedite)
            condvar_signal(&rcu.expedite_now);

        /* Wait for the GP to complete. */
        errno_t ret = _condvar_wait_timeout_spinlock(&rcu.gp_ended, &rcu.gp_lock,
            SYNCH_NO_TIMEOUT, SYNCH_FLAGS_INTERRUPTIBLE);

        if (ret == EINTR) {
            spinlock_unlock(&rcu.gp_lock);
            return false;
        }
    }
…
    while (!cpu_mask_is_none(reader_cpus)) {
        /* Give cpus a chance to context switch (a QS) and batch callbacks. */
        if(!gp_sleep(&expedite))
            return false;

…
    }

    /*
     * All cpus have passed through a QS and see the most recent _rcu_cur_gp.
     * As a result newly preempted readers will associate with next_preempted
…
    if (locked && !passed_qs) {
        /*
         * This cpu has not yet passed a quiescent state during this grace
         * period and it is currently in a reader section. We'll have to
…
    assert(interrupts_disabled());

    /*
     * In order not to worry about NMI seeing rcu_nesting change work
     * with a local copy.
     */
    size_t nesting_cnt = local_atomic_exchange(&THE->rcu_nesting, 0);

    /*
     * Ensures NMIs see .rcu_nesting without the WAS_PREEMPTED mark and
     * do not accidentally call rm_preempted_reader() from unlock().
…
    if (CPU->rcu.last_seen_gp != _rcu_cur_gp) {
        /*
         * Contain any memory accesses of old readers before announcing a QS.
         * Also make changes from the previous GP visible to this cpu.
         * Moreover it separates writing to last_seen_gp from
         * note_preempted_reader().
         */
        memory_barrier();
        /*
         * The preempted reader has been noted globally. There are therefore
         * no readers running on this cpu so this is a quiescent state.
         *
         * Reading the multiword _rcu_cur_gp non-atomically is benign.
         * At worst, the read value will be different from the actual value.
         * As a result, both the detector and this cpu will believe
         * this cpu has not yet passed a QS although it really did.
         *
         * Reloading _rcu_cur_gp is benign, because it cannot change
         * until this cpu acknowledges it passed a QS by writing to
…
    }

    /*
     * Forcefully associate the reclaimer with the highest priority
     * even if preempted due to its time slice running out.
…
    if (THREAD == CPU->rcu.reclaimer_thr) {
        THREAD->priority = -1;
    }

    upd_max_cbs_in_slice(CPU->rcu.arriving_cbs_cnt);
…
}

/** Called from scheduler() when exiting the current thread.
 *
 * Preemption or interrupts are disabled and the scheduler() already
 * switched away from the current thread, calling rcu_after_thread_ran().
…
    assert(THE->rcu_nesting == 0);

    /*
     * The thread forgot to exit its reader critical section.
     * It is a bug, but rather than letting the entire system lock up
     * forcefully leave the reader section. The thread is not holding
     * any references anyway since it is exiting so it is safe.
     */
…
    size_t prev = local_atomic_exchange(&THE->rcu_nesting, 0);
    if (prev == RCU_WAS_PREEMPTED) {
        /*
         * NMI handlers are never preempted but may call rm_preempted_reader()
         * if a NMI occurred in _rcu_preempted_unlock() of a preempted thread.
…
         * in _rcu_preempted_unlock() is: an IPI/sample_local_cpu() and
         * the initial part of rcu_after_thread_ran().
         *
         * rm_preempted_reader() will not deadlock because none of the locks
         * it uses are locked in this case. Neither _rcu_preempted_unlock()
…
#elif defined(RCU_PREEMPT_PODZIMEK)

/** Waits for the grace period associated with callbacks cub_cbs to elapse.
 *
 * @param expedite Instructs the detector to aggressively speed up grace
 *            period detection without any delay.
 * @param completed_gp Returns the most recent completed grace period
 *            number.
 * @return false if the thread was interrupted and should stop.
…
static bool wait_for_cur_cbs_gp_end(bool expedite, rcu_gp_t *completed_gp)
{
    /*
     * Use a possibly outdated version of completed_gp to bypass checking
     * with the lock.
     *
     * Note that loading and storing rcu.completed_gp is not atomic
     * (it is 64bit wide). Reading a clobbered value that is less than
     * rcu.completed_gp is harmless - we'll recheck with a lock. The
     * only way to read a clobbered value that is greater than the actual
     * value is if the detector increases the higher-order word first and
     * then decreases the lower-order word (or we see stores in that order),
     * eg when incrementing from 2^32 - 1 to 2^32. The loaded value
     * suddenly jumps by 2^32. It would take hours for such an increase
     * to occur so it is safe to discard the value. We allow increases
     * of up to half the maximum to generously accommodate for loading an
     * outdated lower word.
     */
    rcu_gp_t compl_gp = ACCESS_ONCE(rcu.completed_gp);
    if (CPU->rcu.cur_cbs_gp <= compl_gp
        && compl_gp <= CPU->rcu.cur_cbs_gp + UINT32_MAX_HALF) {
        *completed_gp = compl_gp;
…
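The comment above is easiest to see with numbers. The following stand-alone program (nothing in it comes from rcu.c except the UINT32_MAX_HALF constant and the shape of the comparison) reproduces the torn read on a hypothetical 32-bit cpu and shows that the plausibility window rejects the clobbered value:

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define UINT32_MAX_HALF 2147483648U

int main(void)
{
    /*
     * completed_gp is 2^32 - 1 and the detector increments it to 2^32.
     * On a 32-bit cpu the 64-bit counter is two separate words.
     */
    uint32_t lo_word = UINT32_MAX;
    uint32_t hi_word = 0;
    uint64_t cur_cbs_gp = UINT32_MAX;  /* the GP this reclaimer waits for */

    hi_word = 1;  /* writer: the higher-order word is updated first */
    uint64_t clobbered = ((uint64_t) hi_word << 32) | lo_word;
                  /* reader: loads both words right now, in between */
    lo_word = 0;  /* writer: the lower-order word is updated last */

    /* The acceptance window used by wait_for_cur_cbs_gp_end(): */
    bool plausible = (cur_cbs_gp <= clobbered) &&
        (clobbered <= cur_cbs_gp + UINT32_MAX_HALF);

    /* Prints: clobbered=1ffffffff plausible=0 -- the value jumped by
     * roughly 2^32, falls outside the window and is discarded. */
    printf("clobbered=%" PRIx64 " plausible=%d\n", clobbered, (int) plausible);
    return 0;
}

Smaller clobbered values are harmless, as the comment notes: the reclaimer simply rechecks under rcu.gp_lock.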
    assert(_rcu_cur_gp <= CPU->rcu.cur_cbs_gp);

    /*
     * Notify the detector of how many GP ends we intend to wait for, so
     * it can avoid going to sleep unnecessarily. Optimistically assume
     * new callbacks will arrive while we're waiting; hence +1.
…
    req_detection(remaining_gp_ends + (arriving_cbs_empty() ? 0 : 1));

    /*
     * Ask the detector to speed up GP detection if there are too many
     * pending callbacks and other reclaimers have not already done so.
     */
    if (expedite) {
        if(0 == rcu.req_expedited_cnt)
            condvar_signal(&rcu.expedite_now);

        /*
         * Expedite only cub_cbs. If there really is a surge of callbacks
         * the arriving batch will expedite the GP for the huge number
         * of callbacks currently in next_cbs
…

    *completed_gp = rcu.completed_gp;
    spinlock_unlock(&rcu.gp_lock);

    if (!interrupted)
…
    /* Wait until wait_on_gp ends. */
    while (rcu.completed_gp < wait_on_gp && !interrupted) {
        int ret = _condvar_wait_timeout_spinlock(&rcu.gp_ended, &rcu.gp_lock,
            SYNCH_NO_TIMEOUT, SYNCH_FLAGS_INTERRUPTIBLE);
        interrupted = (ret == EINTR);
…

    while (wait_for_detect_req()) {
        /*
         * Announce new GP started. Readers start lazily acknowledging that
         * they passed a QS.
…
        spinlock_unlock(&rcu.gp_lock);

        if (!wait_for_readers())
            goto unlocked_out;

…
    while (0 == rcu.req_gp_end_cnt && !interrupted) {
        int ret = _condvar_wait_timeout_spinlock(&rcu.req_gp_changed,
            &rcu.gp_lock, SYNCH_NO_TIMEOUT, SYNCH_FLAGS_INTERRUPTIBLE);

…
    cpu_mask_active(reading_cpus);

    /*
     * Give readers time to pass through a QS. Also, batch arriving
     * callbacks in order to amortize detection overhead.
     */
…
}

/** Invoked on a cpu delaying grace period detection.
 *
 * Induces a quiescent state for the cpu or it instructs remaining
 * readers to notify the detector once they finish.
 */
…
    if (0 < CPU->rcu.nesting_cnt) {
        assert(!CPU->idle);
        /*
         * Note to notify the detector from rcu_read_unlock().
         *
         * ACCESS_ONCE ensures the compiler writes to is_delaying_gp
         * only after it determines that we are in a reader CS.
…
        atomic_inc(&rcu.delaying_cpu_cnt);
    } else {
        /*
         * The cpu did not enter any rcu reader sections since
         * the start of the current GP. Record a quiescent state.
         *
         * Or, we interrupted rcu_read_unlock_impl() right before
         * it recorded a QS. Record a QS for it. The memory barrier
         * contains the reader section's mem accesses before
         * updating last_seen_gp.
         *
         * Or, we interrupted rcu_read_lock() right after it recorded
         * a QS for the previous GP but before it got a chance to
…
        }
    } else {
        /*
         * This cpu already acknowledged that it had passed through
         * a quiescent state since the start of cur_gp.
         */
    }

    /*
     * smp_call() makes sure any changes propagate back to the caller.
     * In particular, it makes the most current last_seen_gp visible
…
    assert(interrupts_disabled());

    /*
     * Prevent NMI handlers from interfering. The detector will be notified
     * in this function if CPU->rcu.is_delaying_gp. The current thread is
     * no longer running so there is nothing else to signal to the detector.
     */
    CPU->rcu.signal_unlock = false;
    /*
     * Separates clearing of .signal_unlock from accesses to
     * THREAD->rcu.was_preempted and CPU->rcu.nesting_cnt.
     */
…
    }

    /*
     * The preempted reader has been noted globally. There are therefore
     * no readers running on this cpu so this is a quiescent state.
…
    _rcu_record_qs();

    /*
     * Interrupt handlers might use RCU while idle in scheduler().
     * The preempted reader has been noted globally, so the handlers
     * may now start announcing quiescent states.
     */
    CPU->rcu.nesting_cnt = 0;

    /*
     * This cpu is holding up the current GP. Let the detector know
     * it has just passed a quiescent state.
     *
     * The detector waits separately for preempted readers, so we have
     * to notify the detector even if we have just preempted a reader.
     */
…
    }

    /*
     * Forcefully associate the detector with the highest priority
     * even if preempted due to its time slice running out.
     *
     * todo: Replace with strict scheduler priority classes.
     */
    if (THREAD == rcu.detector_thr) {
        THREAD->priority = -1;
    }
    else if (THREAD == CPU->rcu.reclaimer_thr) {
        THREAD->priority = -1;
    }

    upd_max_cbs_in_slice(CPU->rcu.arriving_cbs_cnt);
…
    CPU->rcu.nesting_cnt = THREAD->rcu.nesting_cnt;

    /*
     * Ensures NMI see the proper nesting count before .signal_unlock.
     * Otherwise the NMI may incorrectly signal that a preempted reader
…
    compiler_barrier();

    /*
     * In the unlikely event that a NMI occurs between the loading of the
     * variables and setting signal_unlock, the NMI handler may invoke
     * rcu_read_unlock() and clear signal_unlock. In that case we will
     * incorrectly overwrite signal_unlock from false to true. This event
     * is benign and the next rcu_read_unlock() will at worst
     * needlessly invoke _rcu_signal_unlock().
     */
…
}

/** Called from scheduler() when exiting the current thread.
 *
 * Preemption or interrupts are disabled and the scheduler() already
 * switched away from the current thread, calling rcu_after_thread_ran().
…
    assert(PREEMPTION_DISABLED || interrupts_disabled());

    /*
     * The thread forgot to exit its reader critical section.
     * It is a bug, but rather than letting the entire system lock up
     * forcefully leave the reader section. The thread is not holding
     * any references anyway since it is exiting so it is safe.
     */
…
    ++_rcu_cur_gp;

    /*
     * Readers preempted before the start of this GP (next_preempted)
     * are preexisting readers now that a GP started and will hold up
     * the current GP until they exit their reader sections.
     *
     * Preempted readers from the previous GP have finished so
     * cur_preempted is empty, but see comment in _rcu_record_qs().
     */
    list_concat(&rcu.cur_preempted, &rcu.next_preempted);
…
{
    /*
     * Ensure the announcement of the start of a new GP (ie up-to-date
     * cur_gp) propagates to cpus that are just coming out of idle
     * mode before we sample their idle state flag.
     *
     * Cpus guarantee that after they set CPU->idle = true they will not
     * execute any RCU reader sections without first setting idle to
…
     * on the previously idle cpu -- again thanks to issuing a memory
     * barrier after returning from idle mode.
     *
     * idle -> non-idle cpu      | detector      | reclaimer
     * ------------------------------------------------------
     * rcu reader 1              |               | rcu_call()
     * MB X                      |               |
     * idle = true               |               | rcu_call()
     * (no rcu readers allowed ) |               | MB A in advance_cbs()
     * MB Y                      | (...)         | (...)
     * (no rcu readers allowed)  |               | MB B in advance_cbs()
     * idle = false              | ++cur_gp      |
     * (no rcu readers allowed)  | MB C          |
     * MB Z                      | signal gp_end |
     * rcu reader 2              |               | exec_cur_cbs()
     *
     *
     * MB Y orders visibility of changes to idle for detector's sake.
     *
     * MB Z pairs up with MB C. The cpu making a transition from idle
     * will see the most current value of cur_gp and will not attempt
     * to notify the detector even if preempted during this GP.
     *
     * MB Z pairs up with MB A from the previous batch. Updaters' changes
     * are visible to reader 2 even when the detector thinks the cpu is idle
     * but it is not anymore.
     *
     * MB X pairs up with MB B. Late mem accesses of reader 1 are contained
     * and visible before idling and before any callbacks are executed
     * by reclaimers.
     *
     * In summary, the detector does not know of or wait for reader 2, but
     * it does not have to since it is a new reader that will not access
…
    cpu_mask_for_each(*cpu_mask, cpu_id) {
        /*
         * The cpu already checked for and passed through a quiescent
         * state since the beginning of this GP.
         *
         * _rcu_cur_gp is modified by local detector thread only.
         * Therefore, it is up-to-date even without a lock.
         *
         * cpu.last_seen_gp may not be up-to-date. At worst, we will
         * unnecessarily sample its last_seen_gp with a smp_call.
         */
        bool cpu_acked_gp = (cpus[cpu_id].rcu.last_seen_gp == _rcu_cur_gp);
…
        list_append(&THREAD->rcu.preempt_link, &rcu.cur_preempted);
    } else {
        /*
         * The reader started after the GP started and this cpu
         * already noted a quiescent state. We might block the next GP.
…
    bool last_removed = now_empty && !prev_empty;

    /*
     * Preempted readers are blocking the detector and
     * this was the last reader blocking the current GP.
     */
    if (last_removed && rcu.preempt_blocking_det) {
…

        return semaphore_down_interruptable(&rcu.remaining_readers);
    }

    return true;
…
void rcu_print_stat(void)
{
    /*
     * Don't take locks. Worst case is we get out-dated values.
     * CPU local values are updated without any locks, so there
     * are no locks to lock in order to get up-to-date values.
     */
…
    printf("Config: expedite_threshold=%d, critical_threshold=%d,"
        " detect_sleep=%dms, %s\n",
        EXPEDITE_THRESHOLD, CRITICAL_THRESHOLD, DETECT_SLEEP_MS, algo);
    printf("Completed GPs: %" PRIu64 "\n", rcu.completed_gp);
    printf("Expedited GPs: %zu\n", rcu.stat_expedited_cnt);
    printf("Delayed GPs: %zu (cpus w/ still running readers after gp sleep)\n",
        rcu.stat_delayed_cnt);
    printf("Preempt blocked GPs: %zu (waited for preempted readers; "
…