LinuxTag: Europe's largest GNU/Linux trade fair and conference
Conference DVD 2005

Current trends in Linux Kernel Power Management

Dominik Brodowski

University of Tübingen

This contribution is licensed under a Creative Commons licence.

Abstract

The biggest advantage of any notebook -- it being able to run on battery power -- is severely limited by the high default energy consumption of modern hardware. Also, power consumption means heat generation. With this becoming one of the most striking problems to increasing CPU frequencies, features previously only known to notebooks have been introduced to server and workstation CPUs. Because of these aspects, it is necessary to make use of advanced power saving techniques. The Linux kernel lacked support and awareness of this issue for a long time, but during the past two years much effort has been spent on reducing energy usage. This talk will discuss several projects and patches related to runtime power management, show their effect on power consumption and report on their current status and whether an inclusion into the Linux kernel sources maintained by Linus Torvalds seems likely.

Reducing power usage while the system keeps running as normally as possible is one emerging area of power management in the Linux kernel. Several different projects aim at the reduction of CPU power consumption. The frequency at which the CPU executes code can be varied, even dynamically (cpufreq); the CPU can be put into a low power mode if there is nothing to do (ACPI C-States), and the CPU doesn't need to be woken up each millisecond from this low power mode if there is no work to do (tickless system). While cpufreq and ACPI C-States are ready for prime-time usage, they continue to be an area of development. In contrast, tickless systems are mostly a proof of concept so far, but improved patches seem likely in the near future.

In addition, other parts of the system also can be powered down if there is nothing to do for them. For example, the so-called "laptop mode" allows for harddisks to spin down for longer periods of time. USB, PCI and PCMCIA devices and LCD backlights can be put into a low power state manually if they are not needed.

If the user is willing to spend some time to properly set up the system, it is already possible to use notebooks for a long time on battery power or to reduce the energy consumption of servers and workstations. This talk will give some hints on this setup process and show trends which will likely lead to even better power consumption rates in future.


Introduction

Demand for Runtime Power Management

With the number of notebooks being sold increasing disproportionately[1] and with hand-held devices and wearable computing[2] allowing “ubiquitous computing”[3], the Linux operating system needs to become ready for what Nils Faerber called its “third age”[4] at last year's LinuxTag: excellent support for stand-alone, “wireless” computing devices. Besides the continuing troubles with proper device driver support – meaning that devices work at all – this calls for managing their power consumption so that they run for a long time on the existing, limited energy sources.

Another aspect of power management is that it reduces the amount of heat being generated, prolonging the life cycle of devices, allowing for passive (fan-less) cooling and thus causing less noise. Therefore, power management greatly improves the technical usability of devices.

Runtime Power Management in contrast to Suspending

Two different concepts are distinguished in power management: suspending and runtime power management. The former means techniques like standby (ACPI[5] S1), suspend to memory (ACPI S3) or disk (ACPI S4) where the computer is put into a non-working “sleeping” state. In contrast, runtime power management attempts to provide uninterrupted usability of the system, and so tries to stay hidden from the user point of view.

As this paper focuses on runtime power management, only a short description of these techniques follows here. The standby mode stops the CPU, but as all relevant data continues to be stored in memory, normal operation can be resumed within a short time. When suspending to memory, the CPU and several other devices are powered down and need to be re-initialized with information stored in system memory. This means the waking process takes a bit longer. Suspend to disk saves the userspace state into an image on a hard disk and then powers off the computer completely.[6] When the computer is powered on the next time, the image is restored and terminals, editors and Internet browsers will show the same content and will behave exactly the same way as before the suspend command was issued.

Even though the ACPI standard[7] attempts to provide an operating-system independent way to put a computer into such a sleeping state and to awake it again, the actual implementations need to be aware of many hardware-specific issues. With several manufacturers withholding the information necessary to write appropriate drivers, support for suspending devices is still highly experimental in Linux.[8]

Linux and Runtime Power Management

Linux traditionally had a “weak spot” with regard to runtime power management. Partly no appropriate documentation for the highly integrated, specific hardware of notebooks was made available to developers,[9] partly there are constraints in the operating system,[10] and partly developers lacked interest – and time – to write appropriate code. Lately, however, much work has gone into reducing the power consumption of mobile devices running Linux. Therefore, this paper will explain some of the features available in current distributions and kernels, and some which we'll likely see being merged in the near future.

Power Consumption

As they promise the largest reduction of energy consumption, it is important to determine those devices which typically consume most power. With processors specifically designed for mobile devices being available, the picture isn't as clear as it was before, though: while on desktops and servers CPUs require most power (up to 60%),[11] the display back light has become the leading consumer on modern notebooks.[12] Other major consumers are hard disks, system memory and wireless LAN devices.

As CPUs offer quite many different and effective methods to reduce their energy consumption, this paper will cover them first.

CPU Power Management

As the Central Processing Unit (CPU) of a computer consumes large amounts of energy, several different techniques attempt to reduce this. If there is no work to do, the processor is put into a low-power idle state (1), the frequency the CPU operates at can be modulated (2), and the CPU can be forced into a non-working state for short periods of time (throttling, 3).

Idling

Possibly the most important runtime power management technique is the “idling” of CPUs. Under normal operation the CPU only needs to execute code once in a while; most of the time it either waits for data to arrive from hard disks or the user, or there simply isn't anything it could do. For example, on the author's system the CPU is only needed approx. 4% of the time while writing this text.

Theory of Operation

When there is no work to do, certain parts of the CPU – the actual code execution units, for example – can be shut down and re-activated once they are required for operation again.

Modern processors offer several such “idle states”[13]. In some of them the CPU can be reactivated very quickly, but the power savings are limited (on the x86 architecture, this is the “hlt” instruction); in others more power is saved, but the wakeup latency – the time from an interrupt event to the CPU being ready to execute code again – increases.

Table 1 shows the energy consumption of some common CPUs of the x86 architecture depending on the idle state. With reductions of up to 60 to 98% compared to “continuous operation” and common CPU usage rates of 5 to 50%, this means a reduction in CPU energy consumption somewhere between 30% and 90%.
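The estimate above can be reproduced with a simple time-weighted average. The following is a minimal sketch, assuming that the CPU spends its busy time at the C0 power level and all remaining time in a single idle state, and that the cost of entering and leaving the idle state is negligible. The function names are illustrative; the wattages are the Intel Pentium M 1400 MHz values from Table 1.

```python
def average_cpu_power(busy_fraction, active_watts, idle_watts):
    """Average power of a CPU that is busy for busy_fraction of the time
    and sits in one idle state for the rest of the time."""
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be between 0 and 1")
    return busy_fraction * active_watts + (1.0 - busy_fraction) * idle_watts


def saving_vs_always_active(busy_fraction, active_watts, idle_watts):
    """Fraction of energy saved compared to drawing active_watts all the time."""
    average = average_cpu_power(busy_fraction, active_watts, idle_watts)
    return 1.0 - average / active_watts


if __name__ == "__main__":
    # Intel Pentium M 1400 MHz (Table 1): 22 W in C0, 7.3 W in C2, 0.55 W in C4.
    for busy in (0.05, 0.50):
        for state, idle_watts in (("C2", 7.3), ("C4", 0.55)):
            saved = saving_vs_always_active(busy, 22.0, idle_watts)
            print(f"busy {busy:.0%}, idling in {state}: {saved:.0%} saved")
```

Under these assumptions a mostly idle CPU with a deep idle state lands near the upper end of the range quoted above, while a half-busy CPU with a shallow idle state lands near the lower end.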

Improvement #1: ACPI _CST

As more idle states are implemented in hardware, allowing a more fine-grained compromise between energy consumption and responsiveness, the operating system needs to be “smart” enough to activate them. On x86 hardware, only the ACPI subsystem can utilize multiple idle states, and even it was designed for only three different idle states at first. Revision 2.0 of the ACPI specification added a new method called “_CST” for communication between the BIOS and the operating system to allow for even more idle states. Support for this method was added to the Linux kernel in version 2.6.11. You can check which ACPI-based idle states are available on your system by running cat on /proc/acpi/processor/*/power.
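For scripted checks, the same files can be read programmatically. This is a small sketch that only relies on the path pattern named above; the function name is illustrative, and the content of the files is passed through unparsed because its layout depends on the kernel version.

```python
import glob


def read_acpi_idle_states(pattern="/proc/acpi/processor/*/power"):
    """Return {path: file contents} for every matching ACPI processor power file."""
    reports = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, "r", encoding="ascii", errors="replace") as handle:
                reports[path] = handle.read()
        except OSError:
            # The file vanished or is not readable; skip it.
            continue
    return reports


if __name__ == "__main__":
    reports = read_acpi_idle_states()
    if not reports:
        print("No ACPI processor power files found on this system.")
    for path, text in reports.items():
        print(f"== {path} ==")
        print(text)
```

On systems without the ACPI processor driver the pattern simply matches nothing and the function returns an empty dictionary.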

Improvement #2: Multiprocessor Support

As one physical CPU can only be put into an idle state as a whole, processors implementing simultaneous multi-threading (SMT, for example HyperThreading) or multiple sub-CPUs on one CPU die (multi-core CPUs) need special care when they are to be put into an idle state. Also, on platforms using multiple physical CPU packages, special care needs to be taken that the CPU caches maintain valid content.[14] Due to these obstacles, idle states deeper than C1 are not present in many BIOSes and platforms, and the Linux kernel even lacked support for them as of release 2.6.12. A small and simple patch already included in the ACPI development tree adds support for real, effective idle states on multiprocessor systems. As it is a minimally invasive patch, it is very likely to be merged into kernel 2.6.13 and might already be available in kernels used at the time of LinuxTag 2005.

Improvement #3: Tickless Systems

While the much discussed change from 100 HZ to 1000 HZ in kernel 2.6. – meaning the kernel is activated 1000 times a second (“ticks”) to determine which process is to run next and to fulfil some “housekeeping” tasks – provides better interactivity, it also means the time between entering and leaving idle states decreased dramatically. Taking into consideration that entering and leaving an idle state also consume energy,[15] it becomes clear that the power usage increased because of this change. Therefore, certain distribution kernels are specially modified for notebooks and continue to run with 100 HZ on mobile devices.

The technically superior approach is to not wake up every millisecond, but only when there actually is a new task to run (“dynamic ticks” or “tickless systems”) or when interrupt activity (e.g. the user hitting a key on the keyboard) demands a reaction by the kernel and userspace programs. As the Linux kernel relies heavily on the concept of “jiffies” (a counter incremented on each “tick”) for timers and fair scheduling, and as the interrupt-generating hardware needs special care to be deactivated or to run at varying rates, making the kernel “tickless” is a demanding and tough task. With the timing core of the kernel likely to undergo a major overhaul[16] making it aware of tickless systems, one obstacle seemed likely to disappear soon. Surprisingly, a group of developers did not even wait for this feature and proved with a medium-sized patch[17] that tickless systems are not as far away as feared. Nonetheless, this approach still needs to be considered highly experimental, and the modified code should not be used on so-called production systems yet.
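The difference between a periodic tick and a tickless system can be illustrated with a toy model. The sketch below only counts how often an otherwise idle CPU is woken up; it is not how the kernel or the patch mentioned above is implemented, the function names are illustrative, and wake-ups caused by device interrupts are ignored.

```python
def periodic_wakeups(duration_s, hz):
    """Wake-ups caused by a fixed timer tick of hz ticks per second."""
    return int(round(duration_s * hz))


def tickless_wakeups(duration_s, timer_expiries_s):
    """Wake-ups if the CPU is only woken when a pending timer actually expires."""
    return len({t for t in timer_expiries_s if 0.0 <= t < duration_s})


if __name__ == "__main__":
    # One idle second during which only two timers expire (hypothetical values).
    timers = [0.25, 0.75]
    print("1000 HZ tick:", periodic_wakeups(1.0, 1000), "wake-ups")
    print(" 100 HZ tick:", periodic_wakeups(1.0, 100), "wake-ups")
    print("tickless:    ", tickless_wakeups(1.0, timers), "wake-ups")
```

In this model the longest uninterrupted idle period is 1 ms at 1000 HZ and 10 ms at 100 HZ, whereas a tickless system can stay idle until the next timer is due.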

Frequency scaling

Perhaps the best-known runtime power management technique is CPU frequency and voltage scaling – known not by its technical name, but by marketing names like Intel(R) SpeedStep Technology, AMD PowerNow! and Cool'n'Quiet, or Transmeta LongRun, to name a few.

Basic Principles

What makes this such an interesting feature? If the CPU clock frequency is lowered, the power consumption is reduced linearly. However, the voltage driving the CPU can be lowered as well, resulting in a highly increased “instructions per energy consumption” ratio[18]. In plain terms: if you accept waiting longer for the result, you can process much more data with the same amount of energy. Taking into consideration that usually there is not much work to do for the CPU – even watching a DVD is not very CPU intensive, and most often there are other “bottlenecks” like internet connections – and that frequency and voltage scaling also has positive side-effects on CPU Power States[19], this is most definitely something a user should make use of when running a notebook on battery power.
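The effect can be checked with the Pentium M figures from Table 1 (22 W at 1400 MHz, 6 W when scaled to 600 MHz). This is a rough sketch, assuming that a task needs a fixed number of CPU cycles regardless of the clock frequency and that the CPU draws the tabulated C0 power for the whole run; the task size and the function name are made up for illustration.

```python
def task_energy_joules(cycles, frequency_hz, power_watts):
    """Energy needed to execute a fixed number of CPU cycles at one operating point."""
    runtime_s = cycles / frequency_hz
    return power_watts * runtime_s


if __name__ == "__main__":
    cycles = 1.4e9  # hypothetical task: one second of work at 1400 MHz
    full_speed = task_energy_joules(cycles, 1400e6, 22.0)
    scaled = task_energy_joules(cycles, 600e6, 6.0)
    print(f"1400 MHz: {cycles / 1400e6:.2f} s, {full_speed:.1f} J")
    print(f" 600 MHz: {cycles / 600e6:.2f} s, {scaled:.1f} J")
```

The task takes longer at the lower frequency but needs less energy in total, because the voltage drops together with the frequency. The sketch leaves out the energy the rest of the system consumes while waiting for the slower result.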

Support for CPU frequency (and voltage) scaling is highly hardware-dependent. Therefore, only a minority of all platforms have CPU frequency scaling implemented; however, with the emergence of “AMD Cool'n'Quiet” and “Intel Enhanced SpeedStep” this technology has propagated to the server market.

In the Linux kernel, support for CPU frequency and voltage scaling is provided by the cpufreq subsystem in all 2.6. kernels[20], and more and more hardware drivers are being added. However, especially the userspace side of cpufreq is still in heavy development, with different daemons and tools springing up at a high rate. A common midlayer, cpufrequtils, is emerging, and using its cpufreq-info tool you can determine whether cpufreq is activated on a system.
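Where cpufrequtils is not installed, a similar check can be done by looking at the files the cpufreq subsystem exports through sysfs. The sketch below assumes the usual location /sys/devices/system/cpu/cpu0/cpufreq and the attribute names shown in the code; neither is taken from this paper, so treat them as assumptions to verify on the target kernel. Frequencies in these files are given in kHz.

```python
import os

CPUFREQ_DIR = "/sys/devices/system/cpu/cpu0/cpufreq"  # assumed sysfs location
ATTRIBUTES = (
    "scaling_governor",
    "scaling_available_governors",
    "scaling_min_freq",
    "scaling_max_freq",
)


def read_cpufreq_status(cpufreq_dir=CPUFREQ_DIR):
    """Return a dict of cpufreq attributes, or None if cpufreq is not active."""
    if not os.path.isdir(cpufreq_dir):
        return None
    status = {}
    for name in ATTRIBUTES:
        try:
            with open(os.path.join(cpufreq_dir, name)) as handle:
                status[name] = handle.read().strip()
        except OSError:
            status[name] = None  # attribute not provided by this driver
    return status


if __name__ == "__main__":
    status = read_cpufreq_status()
    if status is None:
        print("cpufreq does not seem to be active for cpu0")
    else:
        for name, value in status.items():
            print(f"{name}: {value}")
```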

Improvement #1: Dynamic Frequency Scaling

Switching the frequency only when the power source changes – an AC adapter being attached or removed – already provides important reductions in energy consumption and thus was used in first-generation CPU frequency scaling implementations. On-demand switching between multiple frequency levels, however, leads to continuous heat reduction and makes it possible to utilize the CPU fully even on battery power, with energy consumption suffering only a minor increase.

In-kernel or userspace scheduling?

While the academic community, which led the development of Linux cpufreq at first, suggested letting userspace tools determine the appropriate CPU frequency,[21] Linus Torvalds objected to this approach and explained that only the kernel has sufficient knowledge to select the most appropriate CPU frequency.[22]

Therefore, the current Linux cpufreq infrastructure is built around the concept of in-kernel governors. A governor is an algorithm which calculates the CPU frequency it considers appropriate for any given moment. One such governor – cynics may call it a pseudo-governor – allows for userspace control over the CPU frequency. Therefore, both approaches – userspace and kernelspace deciding over the processor speed – are possible and widely used.

So, which method should a user select? Userspace governors provide for more fine-grained tuning, many more and different algorithms are available in a multitude of userspace cpufreq daemons,[23] and academic studies have proved the technical superiority of this approach.

However, we do live in an imperfect world: these academic studies depended on highly specialised software which added callbacks to the daemon governing the processor speed; they relied on trusting the values given by these callbacks, and the devices had a highly predictable CPU load. While on specialised embedded systems this userspace approach indeed seems to be best, general-purpose computers need to run many different, non-specialised and sometimes old programs which cannot easily have such callbacks added. There, only the kernel “knows” how much processing time it wants to give to programs and which programs need to run (process scheduler)[24], how many IO requests are pending (io and net schedulers) and whether the system is getting too hot (ACPI, lm_sensors).[25] Combining all this data to determine the most appropriate CPU frequency is the difficult task in-kernel dynamic cpufreq governors need to fulfil.

Different in-kernel governors

As of 2.6.12, there are four cpufreq governors present in the Linux kernel. You can select which one to use with the command “cpufreq-set -g GOVERNOR” provided by the cpufrequtils package. The userspace governor was already described above; it allows for userspace control over the CPU frequency: “cpufreq-set -f FREQUENCY”. The performance and powersave governors statically select the highest and lowest CPU frequency currently available, respectively. Only the ondemand governor, available since 2.6.8, provides for in-kernel dynamic frequency scaling.[26] The ondemand governor decreases the frequency step by step if the processor is idle, but increases it to full processing power if there seems to be demand for it. Note that some processors, for example those from AMD, only allow step-by-step switching between frequencies; using this governor may thus cause unexpected latencies.
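The behaviour described for the ondemand governor can be sketched as a small decision function. This is a toy illustration of the policy as described in the text, not the kernel's code; the thresholds, the frequency table and the function name are assumptions chosen for the example.

```python
def ondemand_step(frequencies, current_index, load,
                  up_threshold=0.80, down_threshold=0.20):
    """Return the index of the next frequency in an ascending frequency table.

    High load jumps straight to the highest frequency; low load steps down
    by one level; anything in between keeps the current frequency.
    """
    if load > up_threshold:
        return len(frequencies) - 1
    if load < down_threshold and current_index > 0:
        return current_index - 1
    return current_index


if __name__ == "__main__":
    table_mhz = [600, 800, 1000, 1200, 1400]  # hypothetical frequency table
    index = len(table_mhz) - 1
    for load in (0.05, 0.05, 0.05, 0.95, 0.50):
        index = ondemand_step(table_mhz, index, load)
        print(f"load {load:.0%} -> {table_mhz[index]} MHz")
```

The asymmetry is deliberate: stepping down slowly costs little, while jumping to full speed immediately keeps the system responsive when work arrives.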

Two additional in-kernel cpufreq governors named conservative and past are being discussed at the moment.[27] The conservative governor[28] tries to overcome the limitation of the ondemand governor on AMD CPUs noted above and only increases the CPU frequency step by step – and only if the demand for processing power persists for a longer period of time. In doing so, it tries to spend more time in lower frequency states and to provide more energy savings at the cost of slightly reduced performance and responsiveness.

The past governor[29] implements a completely different calculation algorithm for the targeted CPU frequency.[30] It always targets a CPU load of 70%. If the CPU load was lower in the “window” looked at in the past, the CPU speed is decreased; if it was higher, it is increased – always aiming at 70% CPU load under the assumption of constant CPU load. As it tries to modify the CPU frequency only slightly – not the “all or nothing” decision the ondemand governor makes when the CPU load is too high – it attempts to stay longer at lower frequency levels and to provide increased energy savings. However, the large “idleness window” of 30% may prove to be counter-productive.
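The idea of aiming at a fixed load can be written down in a few lines. The following is only a sketch of the principle as described above and is not taken from the past governor patch; the rounding rule (pick the lowest available frequency that is at least as high as the wanted one) and all names are assumptions.

```python
def past_target_frequency(current_khz, observed_load, available_khz,
                          target_load=0.70):
    """Choose the frequency at which the observed work would produce target_load.

    Assumes the amount of work stays constant, so the load scales inversely
    with the frequency.
    """
    wanted_khz = current_khz * observed_load / target_load
    for candidate in sorted(available_khz):
        if candidate >= wanted_khz:
            return candidate
    return max(available_khz)


if __name__ == "__main__":
    table_khz = [600000, 800000, 1000000, 1200000, 1400000]  # hypothetical
    for load in (0.35, 0.70, 1.00):
        target = past_target_frequency(1400000, load, table_khz)
        print(f"load {load:.0%} at 1400000 kHz -> {target} kHz")
```

A load of 35% at the highest frequency, for instance, corresponds to 70% at half that frequency, so the sketch selects the next available level above it.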

Improvement #2: Multiprocessor Support

One physical CPU package must run at one frequency, meaning there needs to be some sort of coordination between multiple logical or physical CPUs. This was achieved by changing cpufreq to be “processor package”-centric instead of “logical CPU”-centric. Another problematic issue is that the primarily used timing source on the x86 architecture, the Time Stamp Counter (TSC), is located on the CPU die and therefore is affected by a changing CPU frequency. Especially on SMP systems you should use different timing sources (ACPI Power Management Timer, HPET) to achieve sufficiently precise results, even if they are slower to access.

Throttling

The last and least CPU-related power management technique to be explained here is throttling. It stops the execution of instructions in the CPU for certain short periods of time. In effect the CPU frequency is lowered, but not in a homogeneous manner; therefore the CPU voltage cannot be lowered in the meantime.

Basic Principles

In contrast to CPU frequency scaling, where the operating frequency is modulated uniformly, throttling means the CPU is forced to a halt for short periods of time. While throttled, the CPU is put into a physical and electrical state comparable to the idling states mentioned above, so throttling can be described as an “enforced idling” of the CPU.

As throttling and certain CPU power states typically utilize similar hardware implementations,[31] and as throttling does not have a positive effect on the energy consumption during CPU power states, throttling the CPU by a given rate is only useful if the CPU is less idle than the throttling rate.

Therefore, throttling only makes sense if the CPU temperature has become too high because the CPU was active excessively. As throttling “forces” some “idling”, which then lowers the energy consumption and heat generation, it is a good tool for “passive cooling”. Passive cooling – in contrast to active cooling, which utilizes fans – is done by lowering the CPU load and works best with CPU frequency scaling, but is also possible using throttling.

Throttling is implemented in a few cpufreq processor drivers, most notably p4-clockmod,[32] and on ACPI-based platforms it is user-controllable using the file “/proc/acpi/processor/*/throttling”.

Efficiency

While the CPU is in the throttling state, on typical Intel or AMD x86 and x86_64 CPUs it consumes as much power as in the Stop Grant (Cache Snoopable) state. Depending on how high the throttling rate is, and how idle the system is, the CPU energy consumption will vary.

Assuming an idleness rate of zero, the power usage rates of some CPUs of the x86 architecture can be seen in table 2. Assuming one specific computing task needs 1s to finish at 100% CPU power available, the CPU energy usage for this task actually increases if throttling is used (table 3).

This shows that throttling itself does not save battery power at all – it even increases the energy needed for any specific computing task. However, if a fan doesn't need to be started because of passive cooling, or if certain tasks do not need to be run (e.g. the screen is normally refreshed every 10 ms, but the CPU is throttled so much that it misses every second deadline, so the screen is only refreshed every 20 ms), energy consumption may be reduced slightly overall.

Conclusion

The large amounts of energy consumed by CPUs can be reduced considerably by using idling and frequency scaling techniques. As a last resort, throttling can also be used for thermal management. While there is still need for improvement, especially in the support for multi-processor systems, the existing infrastructure already provides first-class power management results.

Other Devices

Introduction

Even though the CPU may well be the largest single consumer of energy, most power is still consumed by other devices. With CPU power management reaching a point of saturation, the savings potential of these devices gains more and more focus both by hardware manufacturers and kernel developers.

Device Tree

It is important not to shut down a device, or to put it to sleep, if it is needed by other devices still in use. The integrated device and driver model developed for kernel 2.6. attempts to show these dependencies.[33] However, appropriately describing all logical, technical and electrical dependencies continues to be a major hurdle.

Backlights

With LCD backlights being one of the major power consumers, four different methods currently exist to reduce their power consumption. The first is implemented independently of the operating system by the firmware, which is triggered by special keys or key combinations, or by the lid being closed.

On some platforms, the ACPI video module allows the brightness to be modified using “/proc/acpi/video/*/brightness”. However, this proved to be non-functional on the author's notebook; additionally, it is only available on some newer notebooks conforming to the ACPI 2.0 or 3.0 standard[34].

Additionally, the display power management configuration of the X Window System may influence the backlight: even though these settings[35] technically are meant to affect only the display and not the backlight, on some platforms they do affect both.

Finally, a new backlight infrastructure was merged into kernel 2.6.11. It attempts to provide a unified interface in “/sys/class/backlight/” to handle backlights on all architectures and platforms. However, only one device driver, for Sharp Corgi PDAs, is present in the kernel as of 2.6.12-rc3.
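On platforms that do have a driver for this interface, the current setting can be inspected from a script. The sketch below assumes that each device directory below /sys/class/backlight/ provides the attributes brightness and max_brightness; these names are not given in this paper and should be checked against the running kernel. The function name is illustrative.

```python
import glob
import os


def read_backlights(base="/sys/class/backlight"):
    """Return {device name: (brightness, max_brightness)} for readable devices."""
    levels = {}
    for device_dir in sorted(glob.glob(os.path.join(base, "*"))):
        try:
            with open(os.path.join(device_dir, "brightness")) as handle:
                current = int(handle.read().strip())
            with open(os.path.join(device_dir, "max_brightness")) as handle:
                maximum = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # attribute missing, unreadable or not a number
        levels[os.path.basename(device_dir)] = (current, maximum)
    return levels


if __name__ == "__main__":
    for name, (current, maximum) in read_backlights().items():
        print(f"{name}: {current} of {maximum}")
```

Lowering the value is then a matter of writing a smaller number to the brightness file, which usually requires root privileges.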

Hard disks

Hard disks can also be put into a power-saving mode, where, for example, the spindle motor is turned off. How long the drive waits after the last read or write command before it enters this mode can be set using the “hdparm -S” command.[36] However, repeatedly starting and stopping hard disks might wear them down at an accelerated rate,[37] so do not set the delay too short.
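The argument of “hdparm -S” is not a number of seconds but an encoded value. The helper below decodes it according to the encoding documented in the hdparm manual page (0 disables the timeout, 1 to 240 are multiples of 5 seconds, 241 to 251 are multiples of 30 minutes); the remaining values have special or vendor-defined meanings and are rejected here. The function name is illustrative.

```python
def hdparm_standby_seconds(value):
    """Translate an 'hdparm -S' argument into a spin-down delay in seconds.

    Returns None if the value disables the standby timeout.
    """
    if value == 0:
        return None
    if 1 <= value <= 240:
        return value * 5
    if 241 <= value <= 251:
        return (value - 240) * 30 * 60
    raise ValueError(f"special or reserved hdparm -S value: {value}")


if __name__ == "__main__":
    for value in (12, 120, 240, 241):
        print(f"-S {value}: {hdparm_standby_seconds(value)} seconds")
```

A value of 12, for example, corresponds to one minute, which is likely too short for a disk that is accessed every few minutes.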

“Laptop Mode”

However, during normal operation the hard disk is accessed quite often: whenever a file[38] or information related to a file[39] is changed, the kernel waits at most five seconds until the information is written out to disk. Using the “laptop mode” tuning available in 2.6. kernels, this delay is increased. Also, the kernel tries to batch disk access, so that whenever the hard disk is woken up, all possibly pending requests are handled so that the hard disk can sleep for a longer time again, hopefully.

The scripts to run “laptop mode” are both included in the kernel sources[40] and in distribution packages. They are highly configurable, and the possible settings are described in an excellent manner both in the kernel[41] and in the configuration file itself[42].
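To see what the scripts have actually configured, the relevant kernel tunables can be read back. This sketch assumes they are exposed as laptop_mode, dirty_writeback_centisecs and dirty_expire_centisecs below /proc/sys/vm; these file names are not given in this paper, so verify them against the kernel documentation referenced above. The function name is illustrative.

```python
import os

VM_TUNABLES = ("laptop_mode", "dirty_writeback_centisecs", "dirty_expire_centisecs")


def read_vm_writeback_settings(base="/proc/sys/vm"):
    """Return {tunable: integer value or None} for the write-back related tunables."""
    settings = {}
    for name in VM_TUNABLES:
        try:
            with open(os.path.join(base, name)) as handle:
                settings[name] = int(handle.read().strip())
        except (OSError, ValueError):
            settings[name] = None  # not available on this kernel
    return settings


if __name__ == "__main__":
    for name, value in read_vm_writeback_settings().items():
        if value is not None and name.endswith("_centisecs"):
            print(f"{name}: {value} ({value / 100:.0f} s)")
        else:
            print(f"{name}: {value}")
```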

Tweaking Userspace

Also, manipulating userspace may reduce the disk accesses – for example, the CUPS daemon and any logging utilities commonly write data out at a continuous rate. Therefore, you can consider whether the increase in battery life – and, as an added bonus, the noise reduction – outweighs the loss of this functionality or, at least, its reduction by storing the log file on a tmpfs partition. The CPU usage of these tools can usually be ignored, though.

Bus Devices (PCI, USB, PCMCIA et al.)

Power States

The Linux device model[43] offers a unified interface to put devices into a freeze or other runtime[44] power management states, as this feature is needed for proper suspend to disk support. While an interface to userspace exists for each device in the file power/state inside the sysfs representation of a physical device, this interface is almost completely unusable at the moment.[45] As it is intended to be fixed soon, here follows only a short description as of kernel 2.6.12-rc4: echoing “3” into power/state asks to put the device into a freeze[46], writing “0” to this file puts the device back into full power.

However, PCI devices which do not have a driver attached are not put into a low power or even off state at the moment. Also, devices are not disabled or put into an off state which might offer even greater savings.

Device driver tuning

Therefore, it might help to take a closer look at the drivers governing devices. For example, loading and then unloading (rmmod) the ipw2100 driver for the Intel(R) PRO/Wireless 2100 adapter puts the device into an off state. Using the vesafb driver for X instead of ati leads to a much higher power consumption rate.[47] Also, WLAN antenna power consumption might be tunable using “iwconfig device power”.[48]

Unified Power Management

As can be seen above, achieving a high reduction of power consumption requires the combination of several, differing techniques affecting different parts and pieces of hard- and software. However, merely adding these techniques, meaning using them side by side, is not the way to go – there needs to be coordination between the differing techniques to achieve an optimum between the conflicting goals of usability and energy consumption.

Currently, this coordination has to be done, at least partly, by the user or system administrator. Several power management tools only handle one or a few techniques, and, for example, none allows for putting specific devices into low power states. In addition, there is no integration with the actual applications, which sometimes know quite well how much processing power they need from the CPU or the graphics adapter, and when they will next need data from the hard disk.

While a userspace-based library governing all these aspects already exists for specialised embedded systems, the amount of work necessary to make existing applications aware of such power management callbacks has hindered the development of such a daemon on multi-purpose computers.

In addition, such a unified userspace tool would ease the path to making the advanced power management techniques described in this paper available not only to those willing to fine-tune their system, re-compile kernels and even risk some loss of data, but also to the ever-growing public using Linux who do not want to dig into kernel internals or externals.

Conclusion

While runtime power management was not adequately taken care of in the Linux kernel for a long time, several exciting features have already been included in kernels of the 2.6. series. Several emerging projects promise even greater savings in energy consumption, most notably tickless systems and bus device power management. Therefore, staying at the bleeding edge of the stable kernel series is likely to keep bringing the user improvements in runtime power management.

Annex

Table 1: Idle States and Energy Consumption

Processor[a] [b] | C0 (Normal operation) | C1[c] [d] | C2 | C3 | C4
--- | --- | --- | --- | --- | ---
AMD Geode NX 1750 | 14 – 25 W | unk. | unk. | 3.0 W | n/a
Mobile AMD Athlon 64 2800+ | 35 W | 2.2 W | 2.2 W | unk. | n/a
Mobile AMD Athlon 64 2800+, frequency scaled to 800 MHz | 12 W | 2.2 W | 2.2 W | 1.2 W | n/a
Intel Pentium M 1400 MHz | 22 W | 7.3 W | 7.3 W | 5.1 W | 0.55 W
Intel Pentium M 1400 MHz, scaled to 600 MHz | 6 W | 1.8 W | 1.8 W | 1.1 W | 0.55 W
Intel Mobile Pentium III 600 MHz | 8.7 – 14.4 W | 1.1 W | 1.1 W | 0.3 W | n/a

[a] The exact names of the processors are: AMD Geode(TM) NX 1750 @ 14W; Mobile AMD Athlon(TM) 64 Processor 2800+, Rev. CG & 1.20 V, 512 KB L2 Cache; Intel Pentium(R) M Processor, 1400 MHz & 1.484V; Intel Mobile Pentium(R) III Processor With Intel SpeedStep (TM) Technology, 600/500 MHz.

[b] Data sources: AMD Geode(TM) NX Processors Databook, Publication ID: 31177A, May 2004, pp. 19.; AMD Geode(TM) NX Processors Databook, Publication ID: 31177A, May 2004, p. 33; AMD Athlon(TM) 64 Processor Power and Thermal Data Sheet, Publication ID: 30430, August 2004, rev. 3.37, pp. 19; Intel(R) 440BX AGPset: 82443BX Host Bridge/Controller Datasheet, Order Nr. 290633, rev. 01, April 1998, p. 4-31; Intel(R) 82801DBM I/O Controller Hub 4 Mobile (ICH4-M) Datasheet, Order Nr. 252337, rev. 01, January 2003, p. 362; Intel(R) Pentium(R) M Processor Datasheet, Order Nr. 252612, rev. 02, June 2003, pp. 11, p. 72; Intel(R) 855GM/855GME Chipset Graphics and Memory Controller Hub (GMCH) Datasheet, Order Nr. 252615, rev. 02, September 2003, p. 170; Mobile Intel(R) Pentium(R) III Processor in BGA2 and Micro-PGA2 Packages Datasheet, Order Nr.283653-002, July 2003, p. 64.

[c] On AMD processors, the C1 idle state is named “Halt”, C2 is “Stop Grant Cache Snoopable”, and C3 is “Stop Grant Cache Not Snoopable Sleep”. On Intel processors, C1 is “AutoHALT Power-Down state”, C2 is “Stop-Grant state”, C3 is typically “Deep Sleep State” (alternatively “Sleep State”) and C4 is “Deeper Sleep State”.

[d] According to AMD, “power dissipated on-die from VDD is ignored” in the values specified for C1 and C2. It is not known to the author how high this value is, so a direct comparison with other CPUs (especially from other manufacturers) is impossible.

Table 2: Throttling Rates and Power Consumption

Processor[a] [b] | 0 % throttling | 25 % throttling | 50 % throttling | 75 % throttling
--- | --- | --- | --- | ---
Mobile AMD Athlon 64 2800+ | 35 W | 26.8 W | 18.6 W | 10.4 W
Intel Pentium M 1400 MHz | 22 W | 18.3 W | 14.7 W | 11.0 W

[a] The exact names of the processors are: Mobile AMD Athlon(TM) 64 Processor 2800+, Rev. CG & 1.20 V, 512 KB L2 Cache; Intel Pentium(R) M Processor, 1400 MHz & 1.484V.

[b] Data sources: AMD Athlon(TM) 64 Processor Power and Thermal Data Sheet, Publication ID: 30430, August 2004, rev. 3.37, pp. 19; Intel(R) Pentium(R) M Processor Datasheet, Order Nr. 252612, rev. 02, June 2003, pp. 11, p. 72.

Values are calculated by $P(r) = P_x (1 - r) + P_s \, r$

where $P_x$ is the power consumption at normal operation, $P_s$ is the power consumption in Stop Grant state, and $r$ is the throttling rate.
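The formula above can be checked with a short script. This is only a sketch under stated assumptions: the function and variable names are illustrative, the Pentium M Stop Grant value of 7.3 W is the C2 figure quoted in footnote 19, and the 2.2 W used for the Athlon 64 is not taken from a data sheet here but back-calculated from the values in Table 2.

```python
# Sketch of the Table 2 calculation: P(r) = P_x * (1 - r) + P_s * r.
# All names below are illustrative and not taken from the paper.

RATES = (0.0, 0.25, 0.50, 0.75)

# (P_x, P_s) in watts. P_x is the 0 % throttling column of Table 2.
# P_s = 7.3 W (Pentium M) is the C2 / Stop Grant figure from footnote 19.
# P_s = 2.2 W (Athlon 64) is an ASSUMPTION, back-calculated from Table 2.
CPUS = {
    "Mobile AMD Athlon 64 2800+": (35.0, 2.2),
    "Intel Pentium M 1400 MHz": (22.0, 7.3),
}


def throttled_power(p_x, p_s, r):
    """Average power in watts when the CPU spends fraction r in Stop Grant."""
    if not 0.0 <= r < 1.0:
        raise ValueError("throttling rate must be in [0, 1)")
    return p_x * (1.0 - r) + p_s * r


def table2_row(name):
    """Average power for one processor at each throttling rate in RATES."""
    p_x, p_s = CPUS[name]
    return [throttled_power(p_x, p_s, r) for r in RATES]


for name in CPUS:
    print(name, ["%.1f W" % p for p in table2_row(name)])
```

Because the Stop Grant power is not zero, the average power falls more slowly than the throttling rate rises; this effect is stronger for the Pentium M, whose Stop Grant state draws comparatively more power.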

Table 3: Energy Consumption for a Specific Computing Task related to Throttling Rates

Processor[a]                  0 % throttling    25 % throttling    50 % throttling    75 % throttling
Mobile AMD Athlon 64 2800+    35 Ws             35.7 Ws            37.2 Ws            41.6 Ws
Intel Pentium M 1400 MHz      22 Ws             24.4 Ws            29.4 Ws            44.0 Ws

[a] The exact names of the processors are: Mobile AMD Athlon(TM) 64 Processor 2800+, Rev. CG & 1.20 V, 512 KB L2 Cache; Intel Pentium(R) M Processor, 1400 MHz & 1.484V.

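Values are calculated by $W = P(r) \cdot t_0 / (1 - r)$

where $P(r)$ is the power consumption at the specified throttling rate, as determined in Table 2, and $t_0 = 1$ s is the time the task needs without throttling (as the 0 % column implies). At throttling rate $r$ the CPU executes code only for the fraction $1 - r$ of the time, so the same task takes $t_0 / (1 - r)$.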
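As a rough cross-check, the energy figures can be recomputed from the power values in Table 2. The sketch below assumes a task that needs 1 s at 0 % throttling (which is what the watt-second values in the 0 % column imply); the names are illustrative only.

```python
# Sketch of the Table 3 calculation: W = P(r) * t_0 / (1 - r).
# All names below are illustrative and not taken from the paper.

RATES = (0.0, 0.25, 0.50, 0.75)

# Average power in watts at each throttling rate, copied from Table 2.
TABLE2_WATTS = {
    "Mobile AMD Athlon 64 2800+": (35.0, 26.8, 18.6, 10.4),
    "Intel Pentium M 1400 MHz": (22.0, 18.3, 14.7, 11.0),
}


def energy_for_task(power_w, r, t0_s=1.0):
    """Energy in watt-seconds for a task that needs t0_s seconds unthrottled.

    At throttling rate r only the fraction (1 - r) of the time is spent
    executing code, so the task takes t0_s / (1 - r) seconds.
    """
    if not 0.0 <= r < 1.0:
        raise ValueError("throttling rate must be in [0, 1)")
    return power_w * t0_s / (1.0 - r)


def table3_row(name):
    """Energy per task for one processor at each throttling rate in RATES."""
    return [energy_for_task(p, r) for p, r in zip(TABLE2_WATTS[name], RATES)]


for name in TABLE2_WATTS:
    print(name, ["%.1f Ws" % w for w in table3_row(name)])
```

For both processors the energy per task rises with the throttling rate: the CPU still draws power while in the Stop Grant state, and the task takes correspondingly longer to finish.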



[1] Pruitt, Gartner: Global PC shipment growth to slow in '05, computerworld (http://www.computerworld.com/hardwaretopics/hardware/desktops/story/0,10801,99775,00.html?source=x10 , last accessed on 2005/04/29 ).

[2] Pouwelse/Langendoen/Sips, Application-directed voltage scaling, TVLSI 2002, 812 (812).

[3] Weiser, Ubiquitous Computing ( http://www.ubiq.com/hypertext/weiser/UbiHome.html , last accessed on 2005/04/29 ).

[4] Faerber, Das dritte Linux Zeitalter, LinuxTag 2004.

[5] ACPI means Advanced Configuration and Power Interface Specification; current revision is 3.0 as of September 2, 2004. ( http://www.acpi.info, last accessed on 2005/04/29 ). For more information on ACPI and Linux see http://acpi.sourceforge.net.

[6] While the ACPI specification (cf. footnote 5) provides a different state “in between” for suspend to disk, the implementations in the Linux kernel (swsusp and swsusp2) do not follow this model and completely shut down the computer.

[7] Cf. footnote 5 above.

[8] If you want to try it out, a good starting point is the corresponding section in Gentoo's Power Management Guide: Nienhüser, Gentoo Linux Documentation – Power Management Guide ( http://www.gentoo.org/doc/en/power-management-guide.xml#doc_chap7 , last accessed on 2005/04/29 ) or the documentation section of Linux4ACPI at http://acpi.sourceforge.net/documentation/sleep.html . Searching the Internet, and especially the Linux laptop databases (for example http://tuxmobil.org/ and http://www.linux-on-laptops.com/), is also often highly effective.

[9] For several chipsets from the first era of Intel® SpeedStep®, it is still a mystery how frequency scaling commands are issued.

[10] For example, the heavy reliance on “jiffies” as a linear time source, as seen in the use of “loops per jiffy” for handling delays, makes it more difficult to introduce tickless systems (see below) or dynamic frequency scaling.

[11] Devriendt, Multi-processor and Frequency Scaling. Making Your Server Behave Like a Laptop, Proceedings of the Linux Symposium, Volume One, 2004, 167 (172).

[12] Nienhüser, Gentoo Linux Documentation – Power Management Guide (Cf. Footnote 8 above).

[13] The ACPI specification (cf. footnote 5) calls them CPU Power States and names them C0, C1, C2, ... Cn.

[14] Specifically, the CPU must either monitor whether the content of any memory address it has cached changes, or flush the cache completely. In the latter case, the system is less efficient after the CPU is woken up again, as much of the previously cached data needs to be re-loaded from system memory.

[15] As Russell King points out on the Linux Kernel Mailing List ( http://lkml.org/lkml/2004/12/13/70 , last accessed on 2005/05/08), on hardware with (almost) no wakeup latency and time to put the CPU into sleep (as common on the ARM architecture), tickless systems are not needed.

[16] Linux Weekly News, 2005/01/27, A new core time subsystem ( http://lwn.net/Articles/120850/ , last accessed on 2005/05/08 ).

[17] The project has a homepage at http://www.muru.com/linux/dyntick/ , last accessed on 2005/05/08. As of this writing, the latest revision of the patch is http://www.muru.com/linux/dyntick/patches/patch-dynamic-tick-2.6.12-rc2-050408-1.gz .

[18] The groundwork for this “millions-of-instructions-per-joule” (MIPJ) metric was laid by Weiser/Welch/Demers/Shenker, Scheduling for Reduced CPU Energy, Operating Systems Design and Implementation, 1994, 13 (13).

[19] For example, the power usage in state C2 declines from 7.3 W to 1.8 W on the Intel Pentium M processor 1400MHz. Cf. table 1.

[20] Previous improvements to the cpufreq subsystem are discussed by Devriendt (footnote 11).

[21] Pouwelse, Power Management for Portable Devices, 2003; Mouw/Langendoen/Pouwelse, LART Lessons Learned: cpufreq, Ottawa Linux Symposium 2002, 376.

[23] An excellent overview of existing cpufreq daemons can be found in Nienhüser, Gentoo Linux Documentation – Power Management Guide (Cf. Footnote 8 above).

[24] Pallipadi, Enhanced Intel SpeedStep ® Technology and Demand-Based Switching on Linux, http://www.intel.com/cd/ids/developer/asmo-na/eng/195910.htm?page=4 , last accessed on 2005/05/08.

[25] While the reduction of the CPU frequency on a critical thermal event noticed by ACPI is done independently of the governor, the governor is informed of this change and can easily “bounce back” once the thermal situation has relaxed. Userspace tools have proven to handle this case less well.

[26] An excellent description of the ondemand governor is given by its author, Venkatesh Pallipadi, at http://www.intel.com/cd/ids/developer/asmo-na/eng/195910.htm?page=4 (cf. footnote 24).

[27] The tempscale governor, which only changes the CPU frequency of non-interactive processes based on the CPU frequency, does not seem to be a real candidate for inclusion into the Linux kernel sources and therefore is excluded here.

[28] Written by Alexander Clouter; latest revision http://lists.linux.org.uk/mailman/private/cpufreq/2005-February/004897.html , last accessed on 2005/05/08.

[29] Written by Bruno Ducrot; latest revision http://lists.linux.org.uk/mailman/private/cpufreq/2004-December/004745.html, last accessed on 2005/05/08.

[30] It is based on the work by Weiser/Welch/Demers/Shenker (cf. footnote 18).

[31] On Intel CPUs, this is the Stop Grant state.

[32] Due to their limited use, these drivers are only available on hardware which does not offer true CPU frequency and voltage scaling.

[33] Both by the “physical” device tree in /sys/devices/ and by a power management dependency which is not exported to userspace via sysfs. The latter is still unused, however, as of 2.6.12.

[34] Cf. footnote 5 above.

[35] Cf. the description in Nienhüser, Gentoo Linux Documentation – Power Management Guide, footnote 8.

[36] The non-intuitive parameters of this command are documented in “man hdparm”; several distributions offer a graphical interface to this setting.

[37] Nienhüser, Gentoo Linux Documentation – Power Management Guide; linux/Documentation/laptop-mode.txt (see footnote 40 below).

[38] Please note that a text editor, for example, does not instantly change a file when you modify it – it only does so when you issue the “save” command.

[39] One example is the “last access time” (atime); updates to it can be avoided by using the “noatime” mount option.

[40] linux/Documentation/laptop-mode.txt

[41] linux/Documentation/laptop-mode.txt

[42] Which may be, depending on your distribution, “/etc/laptop-mode/laptop-mode.conf”, “/etc/default/laptop-mode” or “/etc/sysconfig/laptop-mode”.

[43] See the section “Device Tree” above.

[44] In this context, runtime means that the operation of the computer as a whole is not affected. Nonetheless, devices which are frozen are unusable while in this state.

[45] A. Leonard Brown, The State of ACPI in the Linux Kernel, Proceedings of the Linux Symposium, Volume One, 2004, 121 (128f.).

[46] For PCI devices, this is the “D3hot” state; for PCMCIA and PCCard devices you need to issue the command for the Cardbus bridge device. Doing so is equivalent to “cardctl suspend”.

[47] Brodowski, http://marc.theaimsgroup.com/?l=acpi4linux&m=111547954832307&w=2, last accessed on 2005-08-08.

[48] See “man iwconfig” for details. However, be aware of the relatively low energy consumption of actual transmissions determined by Chen/Jamieson/Balakrishnan/Morris, Span: An Energy-Efficient Coordination Algorithm for Topology Maintenance in Ad Hoc Wireless Networks, Mobile Computing and Networking, 2001, 85 (chapter 4.5).

© 2005 LinuxTag e.V.