% vim:filetype=linux textwidth=73 ts=2 sw=2:
Timers and time management
  Background
    Kinds of time
      Relative: for scheduling events in the future
      Absolute
        Wall time: actual time of day. var xtime [later]
        Uptime: time since boot. var jiffies [later]
    Kinds of scheduling
      Periodic: driven by the system timer interrupt [later]
      Dynamic timers: run once [later]
  Timer interrupt
    The interval between interrupts is a tick; tick = 1/HZ seconds
    Fixed tick rate (architecture dependent): macro HZ
      Higher HZ: finer resolution, so timed events are more accurate
        Average error is half a tick
        Resolution matters for syscall poll() and syscall select()
      Higher HZ: lower scheduling latency (a task overruns its
        timeslice by at most 1 tick)
      Lower HZ: fewer interrupts, so less timer-ISR overhead
    The timer ISR does the following
      update uptime, wall time, statistics
      rebalance SMP runqueues
      reschedule if the current timeslice is exhausted
      run dynamic timer handlers
    The periodic timer interrupt is a design choice; an OS could be
      designed without it, in which case all timers become dynamic.
  Uptime
    Uptime is stored in var jiffies
      seconds = jiffies / HZ
      jiffies = seconds * HZ
    Notice the unusual declaration: jiffies is overlaid on jiffies_64
      kernel code can read the full 64-bit count with get_jiffies_64()
        [trace]
      most code uses only the low 32 bits, via var jiffies
        at HZ = 1000 it wraps after 2^32 ticks: about 1193 hours, or
          49.7 days
        trick: jiffies starts at INITIAL_JIFFIES, so the first
          wraparound happens 5 minutes after boot and wraparound bugs
          show up early
    coding:
      unsigned long timeout = jiffies + secondsToDelay * HZ;
      /* ... do work ... */
      if (jiffies > timeout) {
        /* WRONG: misjudges the timeout once either value wraps */
      }
      if (time_after(jiffies, timeout)) {
        /* timed out (correct even across wraparound) */
      }
    other wraparound-safe macros: time_before(j, t),
      time_after_eq(j, t), time_before_eq(j, t)
    when reporting tick counts to user space, convert to USER_HZ units
      jiffies_to_clock_t(j)
      jiffies_64_to_clock_t(j)
  Supporting hardware
    Real-time clock (RTC)
      Runs on battery even when the computer is turned off.
      Linux reads it at startup to initialize var xtime.
      can generate periodic interrupts (2 Hz to 8192 Hz), which Linux
        timekeeping doesn't use
        see /proc/interrupts: IRQ 8 belongs to /dev/rtc
      I/O ports 0x70, 0x71; see /proc/ioports
      Linux uses the RTC only to derive the time and date at boot
      syscall settimeofday() can read/write the RTC [I think]
    Programmable interval timer (PIT) (on i386)
      Linux programs it to drive the system timer interrupt
      interrupts on IRQ0 (see /proc/interrupts); any CPU can handle it
      I/O ports 0x40 - 0x43; see /proc/ioports
      macro CLOCK_TICK_RATE: the PIT's input oscillator frequency
        (1193182 Hz on PCs)
      macro LATCH: divisor loaded into the PIT, about
        CLOCK_TICK_RATE / HZ
      initialized in setup_pit_timer()
      IRQ0 handler installed by time_init() -> time_init_hook()
      ISR is timer_interrupt()
    Time stamp counter (TSC)
      read with the rdtsc assembler instruction
      typical rate: 400 MHz
      calibrated by calibrate_tsc() at boot
      used to report times with higher precision than the PIT
    APIC timers: there are 32 (recent Pentiums)
      local interrupt only; used to trigger CPU-specific activities
      initialization: setup_boot_APIC_clock() -> setup_APIC_timer(void
        *data) on each CPU
        staggers the different CPUs' interrupts across the tick
          interval (avoids some spinlock contention)
      ISR is smp_apic_timer_interrupt()
  Timer ISR
    architecture-dependent: timer_interrupt()
      acquire xtime_lock, acknowledge or reset the timer, maybe update
        the RTC, call do_timer()
      [trace: timer_interrupt() -> do_timer_interrupt()
        -> do_timer_interrupt_hook() -> do_timer()]
    architecture-independent: do_timer() [this has changed in 2.6.10]
      increments jiffies_64
      -> update_times(): advances wall_jiffies and xtime
    It's not clear who calls it, but update_process_times():
      -> update_one_process(): charges the tick to the current
        process's statistics
      -> run_local_timers(): raises a softirq to handle any expired
        timers
      -> scheduler_tick(): decrements the remaining timeslice and
        marks the task for rescheduling when it runs out
  Current time of day (wall clock): var xtime
    Mostly accessed by user code and filesystem code.
    Measures time elapsed since "the epoch": Jan 1, 1970 (UTC).
    With a signed 32-bit count of seconds, it overflows in January 2038
    Protected by xtime_lock (a seqlock)
    To read:
      syscall gettimeofday() -> sys_gettimeofday() -> do_gettimeofday()
        [there are several versions]
      obsolete syscall time() -> sys_time()
    To write:
      syscall settimeofday() -> sys_settimeofday()
        [notice the indirection through security_ops: CAP_SYS_TIME is
        not checked directly, but the default security hook (e.g.
        cap_settime()) checks it]
  Dynamic timers (also just called "timers" or "kernel timers")
    Purpose: delay work by at least the specified amount of time,
      preferably not much longer
    created dynamically; destroyed ("decay") after they expire
    separate list of timers for each CPU
    data: struct timer_list
      field expires: jiffies value at which to expire (absolute)
      field function: what to call on expiry
      field data: sole parameter to the function
      field base: not for client use
      "toying with this data structure is discouraged"
    dynamic creation:
      struct timer_list myTimer;
      init_timer(&myTimer);     /* before any other call on myTimer */
      myTimer.expires = jiffies + delay;
      myTimer.function = myFunction; /* void myFunction(unsigned long) */
      myTimer.data = whatever;  /* optional argument for myFunction */
      add_timer(&myTimer);      /* schedules (activates) the timer */
      mod_timer(&myTimer, jiffies + newDelay);
        /* changes the expiry time, activating the timer if needed;
           also moves it to the current CPU's list */
      del_timer(&myTimer);      /* removes a scheduled timer; returns
                                   nonzero if it was active */
      del_timer_sync(&myTimer); /* as del_timer(), but also waits for
                                   a running myFunction to finish */
    guarantee: the timer will not expire before its scheduled time
      It usually expires at its scheduled time, occasionally one tick
      later; don't depend on it for hard-deadline scheduling.
    race conditions:
      Don't replace mod_timer() with del_timer(); expires = val;
        add_timer(): the sequence is not atomic, and on SMP myFunction
        could be in progress meanwhile.
      Use del_timer_sync() instead of del_timer().
        Afterwards you can assume that myFunction is not running.
      Protect any data shared by myFunction and other code with locks.
        But myFunction must not sleep.
        It runs in bottom-half (softirq) context.
      Delete the timer before freeing resources myFunction might touch.
    Trigger: softirq TIMER_SOFTIRQ -> run_timer_softirq()
      -> __run_timers()
      runs only this CPU's timers (there is one set of lists per CPU)
    Implementation:
      The kernel partitions timers into five groups by expiration time.
      Timers move down through the groups as expiry nears.
      Most ticks are handled with very little work; timers are
        re-sorted only occasionally, during cascades.
      Data:
        tvec_base_t holds 5 vectors of lists: tv1 has 256 lists,
          tv2..tv5 have 64 each
        timer_jiffies: the tick up to which timers have been run
        each tv1 list holds the timers expiring at one particular tick;
          a list in tv2..tv5 covers a range of ticks
        cascading: when a lower vector wraps around, the next list of
          the vector above is emptied and its timers are redistributed
          downward
          e.g. when timer_jiffies mod 256 == 0, one tv2 list is
            cascaded into tv1
        ranges at HZ = 1000:
          tv1: timers that will decay in the next 2^8 = 256 ticks
            (0.256 seconds)
          tv2: timers that will decay in the next 2^14 ticks (16 seconds)
          tv3: timers that will decay in the next 2^20 ticks (17 minutes)
          tv4: timers that will decay in the next 2^26 ticks (18 hours)
          tv5: all later timers
  How to delay execution
    bottom half: if work needs to be done soon, but not now
    timer: if work needs to be done starting at a known point in the
      future, and we can sleep (in KSPC, not holding locks)
    busy looping
      coarse-grain: by ticks
        while (time_before(jiffies, rightTime))
          ;
        works only because jiffies is declared volatile
          (linux/include/linux/jiffies.h:79), so it is reread each pass
      coarse-grain, allowing other tasks to run
        while (time_before(jiffies, rightTime))
          cond_resched();
        only yields when need_resched is set
      fine-grain: by microsecond or millisecond
        udelay(usecs)   /* not suitable for delays over a millisecond */
        mdelay(msecs)
        both spin using the BogoMIPS calibration (see /proc/cpuinfo)
    invoking scheduler: only from KSPC; best when not holding locks,
      and never while holding a spinlock
      set_current_state(TASK_INTERRUPTIBLE);
      schedule_timeout(s * HZ);   [trace]
      sleep on a wait queue and also set a timeout
        once on the wait queue, call schedule_timeout() instead of
          schedule()
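        on wakeup, the cause may be that the event occurred, the timer
          expired, or a signal interrupted the sleep; schedule_timeout()
          returns the jiffies remaining (0 if the timeout expired)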
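A small user-space sketch of the Uptime arithmetic above, assuming
HZ = 1000 and USER_HZ = 100 (the EX_ names are hypothetical stand-ins,
not kernel macros). It shows seconds/jiffies conversion, a rescaling in
the spirit of jiffies_to_clock_t() that is only valid when HZ is a
multiple of USER_HZ, and where the "1193 hours" wraparound figure for a
32-bit counter comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only: the kernel tick rate HZ and
 * the user-visible rate USER_HZ (100 on x86). EX_ names are hypothetical. */
#define EX_HZ      1000u
#define EX_USER_HZ 100u

/* jiffies = seconds * HZ */
static inline uint64_t ex_secs_to_jiffies(uint64_t secs)
{
    return secs * EX_HZ;
}

/* seconds = jiffies / HZ (integer division drops partial seconds) */
static inline uint64_t ex_jiffies_to_secs(uint64_t j)
{
    return j / EX_HZ;
}

/* Rescale kernel ticks to USER_HZ units, like jiffies_to_clock_t();
 * this simple form assumes HZ is an exact multiple of USER_HZ. */
static inline uint64_t ex_jiffies_to_clock_t(uint64_t j)
{
    assert(EX_HZ % EX_USER_HZ == 0);
    return j / (EX_HZ / EX_USER_HZ);
}

/* Whole hours until a 32-bit tick counter wraps at EX_HZ. */
static inline uint64_t ex_wrap_hours_32(void)
{
    return (UINT64_C(1) << 32) / EX_HZ / 3600u;
}
```

For example, 2500 kernel ticks (2.5 s) are reported to user space as
250 USER_HZ ticks, and 2^32 ticks at 1000 Hz last 1193 whole hours.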
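To see why the Uptime notes insist on time_after() rather than a plain
">", here is a user-space sketch with a 32-bit counter that starts 5
minutes before wraparound, in the spirit of INITIAL_JIFFIES. The sim_
functions are hypothetical re-creations of the signed-difference trick
the kernel macros use, not the kernel's own definitions; like the
kernel, they assume the compiler converts an out-of-range unsigned
value to signed by wrapping (GCC and Clang do).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit stand-in for jiffies, so wraparound is easy to
 * demonstrate on any host. */
typedef uint32_t sim_jiffies;

#define SIM_HZ 1000u

/* Like INITIAL_JIFFIES: start 300 s (5 minutes) before the wrap. */
#define SIM_INITIAL_JIFFIES ((sim_jiffies)(0u - 300u * SIM_HZ))

/* a is after b when b - a, viewed as a signed number, is negative.
 * Correct as long as a and b are less than 2^31 ticks apart. */
static inline bool sim_time_after(sim_jiffies a, sim_jiffies b)
{
    return (int32_t)(b - a) < 0;
}

static inline bool sim_time_before(sim_jiffies a, sim_jiffies b)
{
    return sim_time_after(b, a);
}

static inline bool sim_time_after_eq(sim_jiffies a, sim_jiffies b)
{
    return (int32_t)(a - b) >= 0;
}

static inline bool sim_time_before_eq(sim_jiffies a, sim_jiffies b)
{
    return sim_time_after_eq(b, a);
}
```

With now one tick after boot and timeout 400 s later, the deadline has
already wrapped to a small number (100000), so the naive
now > timeout test fires at once, while sim_time_after(now, timeout)
correctly reports that the deadline is still in the future.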
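The notes say xtime is protected by xtime_lock, a seqlock. The
reader/writer protocol can be sketched in user space with C11 atomics.
This toy is not the kernel's seqlock_t API (which uses write_seqlock(),
read_seqbegin() and read_seqretry()); all names are hypothetical, and it
assumes a single writer, whereas the kernel's writer also takes a
spinlock to exclude other writers.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy seqlock guarding a (seconds, nanoseconds) pair. All fields are
 * atomics with the default sequentially consistent ordering, which
 * keeps the sketch simple rather than fast. */
struct toy_clock {
    atomic_uint seq;   /* even: stable; odd: write in progress */
    atomic_long sec;
    atomic_long nsec;
};

static void toy_clock_init(struct toy_clock *c)
{
    atomic_init(&c->seq, 0u);
    atomic_init(&c->sec, 0L);
    atomic_init(&c->nsec, 0L);
}

/* Writer (the timer ISR's role). Assumes writers are already
 * serialized; the kernel takes a spinlock here, omitted. */
static void toy_clock_set(struct toy_clock *c, long sec, long nsec)
{
    atomic_fetch_add(&c->seq, 1u);  /* seq becomes odd */
    atomic_store(&c->sec, sec);
    atomic_store(&c->nsec, nsec);
    atomic_fetch_add(&c->seq, 1u);  /* seq becomes even again */
}

/* Reader (the gettimeofday() path's role): retry if a write was in
 * progress or happened while the fields were being read. */
static void toy_clock_get(struct toy_clock *c, long *sec, long *nsec)
{
    unsigned start;
    do {
        start = atomic_load(&c->seq);
        if (start & 1u)
            continue;               /* writer active: re-check below */
        *sec = atomic_load(&c->sec);
        *nsec = atomic_load(&c->nsec);
    } while ((start & 1u) || atomic_load(&c->seq) != start);
}
```

Readers never block the writer; they only retry. That suits xtime,
which is written on every tick but read far more often.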
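The five-vector timer wheel described under Implementation can be
sketched as a placement function: given a timer's expiry and
timer_jiffies, pick the vector (tv1..tv5) and the list within it. The
constants follow the sizes in the notes (8 bits for tv1, 6 bits for
each higher vector); the struct and function names are hypothetical,
and real kernel code also links the timer into the list and handles
locking.

```c
#include <assert.h>
#include <stdint.h>

/* tv1 has 2^8 = 256 lists; tv2..tv5 have 2^6 = 64 lists each. */
#define TVR_BITS 8
#define TVN_BITS 6
#define TVR_SIZE (1u << TVR_BITS)
#define TVN_SIZE (1u << TVN_BITS)
#define TVR_MASK (TVR_SIZE - 1u)
#define TVN_MASK (TVN_SIZE - 1u)

struct wheel_slot {
    int vec;        /* 1..5: which of tv1..tv5 */
    unsigned slot;  /* index of the list within that vector */
};

/* Choose the list for a timer expiring at 'expires' when the wheel
 * has run timers up to 'timer_jiffies' (32-bit ticks assumed). */
static struct wheel_slot wheel_place(uint32_t expires,
                                     uint32_t timer_jiffies)
{
    uint32_t idx = expires - timer_jiffies;  /* ticks until expiry */
    struct wheel_slot s;

    if ((int32_t)idx < 0) {          /* already overdue: next tick */
        s.vec = 1; s.slot = timer_jiffies & TVR_MASK;
    } else if (idx < TVR_SIZE) {
        s.vec = 1; s.slot = expires & TVR_MASK;
    } else if (idx < (1u << (TVR_BITS + TVN_BITS))) {
        s.vec = 2; s.slot = (expires >> TVR_BITS) & TVN_MASK;
    } else if (idx < (1u << (TVR_BITS + 2 * TVN_BITS))) {
        s.vec = 3; s.slot = (expires >> (TVR_BITS + TVN_BITS)) & TVN_MASK;
    } else if (idx < (1u << (TVR_BITS + 3 * TVN_BITS))) {
        s.vec = 4;
        s.slot = (expires >> (TVR_BITS + 2 * TVN_BITS)) & TVN_MASK;
    } else {
        s.vec = 5;
        s.slot = (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK;
    }
    return s;
}

/* tv1 wraps when the low 8 bits of timer_jiffies are zero; that is
 * when one tv2 list is cascaded (re-placed) into tv1. */
static int wheel_needs_cascade(uint32_t timer_jiffies)
{
    return (timer_jiffies & TVR_MASK) == 0;
}

/* Which tv2 list is cascaded at that moment. */
static unsigned wheel_tv2_cascade_slot(uint32_t timer_jiffies)
{
    return (timer_jiffies >> TVR_BITS) & TVN_MASK;
}
```

A timer added at timer_jiffies = 0 with expires = 300 lands in tv2,
list 1. When timer_jiffies reaches 256, tv2 list 1 is cascaded and the
timer is re-placed into tv1, list 44 (300 mod 256), where it fires.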