Cause and Effect

While working on idle power management, I used Vaidy's klog-based patches to profile an idle system and obtain stats such as:

* The time when a CPU enters tickless idle mode.
* The various interrupts that bring the CPU out of the idle state.
* The timers that expire in this interval and cause a wakeup.
* The tasks that wake up, or are woken up, on the idle CPU.
* The time when the CPU comes out of tickless idle mode and starts executing tasks.

While going through the task wakeup instrumentation data, I noticed that the wakeup statistics for kondemand looked pretty strange. For the uninitiated, kondemand is a kernel thread belonging to the ondemand governor of the cpufreq subsystem, which changes the processor's P-states based on utilization statistics. In other words, it is something that is supposed to help with power management.

[root@llm43 tests]# ps aux | grep kondemand | head -4
root 1143 0.0 0.0 0 0 ? S< 09:21 0:00 [kondemand/0]
root 1145 0.0 0.0 0 0 ? S< 09:21 0:00 [kondemand/1]
root 1146 0.0 0.0 0 0 ? S< 09:21 0:00 [kondemand/2]
root 1147 0.0 0.0 0 0 ? S< 09:21 0:00 [kondemand/3]

From wakeups.txt, the output of my profiling experiment:

pid   cpu  nr_wakeups
----  ---  ----------
1143  0    468
1145  1    279
1146  2    78
1147  3    68

A couple of things bothered me here.

  1. The unusually high number of wakeups on CPU0 and CPU1. Over an idle observation period of 120 seconds, kondemand was waking up roughly 4 times and 2 times per second on these CPUs, respectively.
  2. The difference in the number of kondemand wakeups across the CPUs.

Bewildered, I fired off a mail to Venki asking for possible explanations, and started looking at the code.
How often the kondemand thread checks whether the frequency needs to change is determined by a sysfs tunable called sampling_rate. It was set to 256000us (256 ms) on my system, which accounted for the unusually high number of wakeups on the CPUs.

But I was still confused. sampling_rate is a global tunable that maps to the variable dbs_tuners_ins.sampling_rate, which is shared by all the kondemand threads. Why, then, the different wakeup rates on different CPUs?

Venki replied to my original query, reminding me that kondemand uses deferrable timers! That explained everything.

Deferrable timers behave normally on a busy system. On an idle system, however, the kernel skips deferrable timers when it decides how long the CPU can sleep before servicing the next timer in the list. Thus, a deferrable timer on an idle CPU expires only when the next nearest *hard* timer expires and wakes the CPU.

So, the high number of wakeups on CPU0 and CPU1 on an idle system can be attributed to the fact that the expiry of some other timer, such as the ehci_watchdog, would trigger the expiry of the kondemand timer, and with it the wakeup of the kondemand thread! The number of kondemand wakeups on each CPU therefore depends on the number of hard timers queued on that CPU!

So, what I had thought was a major cause of wakeups in the kernel turned out to be an effect of timers queued by a totally unrelated subsystem, confirming the old wisdom of mathematical logic: "If two events occur one after the other, it doesn't necessarily imply that one is the cause of the other."


About gautshen

A jack of many trades, of which Linux kernel programming puts food on the table. Also pursuing his PhD in Theoretical Computer Science at the Chennai Mathematical Institute. Is an avid reader interested in Hindu traditions and philosophy. Loves bicycling and good music. Name is Ranjal Gautham Shenoy.
This entry was posted in fundoo, geek, interesting, linux, programming.

4 Responses to Cause and Effect

  1. theG says:

    Nice one! It's ironic, though: kondemand causes a lot of wakeups :). Or maybe it does not, considering the wakeups happen when a hard timer expires.

  2. Balbir Singh says:

    Good one, but why should kondemand poll every 256000us? We need kondemand to turn off when the CPU goes idle (this should be part of the idle code) and start again when we come out of idle. Also, one needs to look at the effect of sampling_rate and ignore_nice_load on power saving. I want to try and see how the other governors do w.r.t. power savings as well.

  3. ego says:

    @theG
    I thought so too. But kondemand won't wake up an idle CPU by itself. Once the processor is already back from idle, though, it won't mind performing the check to change the processor frequency.

    @Balbir Singh
    From what I understood, the sampling_rate will have a bearing only on busy cpus or cpus on which there are a lot of timers. So one of the things that can be done is to make sampling_rate a read-only variable that gives the instantaneous sampling rate. But internally, the sampling_rate changes based on the idle characteristics of the processor. Not sure how effective this will be though!

  4. theG says:

    @ego
    Hmm. I am wondering if that is a good idea. What should happen is that only one of the kondemand interrupts is honoured: if a kondemand timer is already queued up (to run, that is), drop the other ones.
