Turn the voltage down, you're stressing me out


Intel came close to giving the idea of having a fixed clock-speed rating on its upcoming Nehalem the heave-ho, according to Intel fellow Rajesh Kumar, speaking to journalists ahead of the VLSI Circuits Symposium in Hawaii this week. The people who were going to be putting the processor into PCs didn't care for the idea, it seems.

The company has radically altered the way that Nehalem is clocked compared with its predecessors in order to improve both memory bandwidth and power consumption. It means that the core, memory buses and I/O run almost independently.

The bigger change is internal, where it seems that the concept of a fixed clock running at several gigahertz has been discarded in favour of letting the logic run at its own speed. This is something that people such as former ARM architect Professor Steve Furber have been advocating for years. The concept of a system clock is entirely artificial and exists largely to make life easy for chip designers and simplify the job of testing chips as they come off the production line. Chips such as the Amulet don't run off any kind of clock: the logic inside finds its own speed.

"The idea is not new," said Kumar. "But the implementation is new."

With a conventional clocked design, the architects work out how much logic each stage of the pipeline can get through within a cycle. They then add on some slack to cope with the vagaries of manufacturing. For decades, this has worked pretty well, although knocking some bits of the pipeline into line can have engineers tearing their hair out. "Negative slack" is not a term they like to hear.
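To put rough numbers on it, the sum is simple enough: clock period minus a stage's worst-case logic delay minus the manufacturing margin, and woe betide the stage where the answer goes negative. The sketch below uses invented figures, not anything Intel has published:

```python
# Illustrative slack calculation for a clocked pipeline.
# All figures are made up for the example; they are not Nehalem numbers.

CLOCK_PERIOD_PS = 333          # ~3 GHz clock, period in picoseconds
GUARD_BAND_PS = 30             # margin added for manufacturing spread

# Worst-case logic delay per pipeline stage, in picoseconds (hypothetical)
stage_delays_ps = {
    "fetch": 250,
    "decode": 280,
    "execute": 320,
    "writeback": 210,
}

for stage, delay in stage_delays_ps.items():
    slack = CLOCK_PERIOD_PS - delay - GUARD_BAND_PS
    status = "OK" if slack >= 0 else "NEGATIVE SLACK - cannot meet timing"
    print(f"{stage:10s} slack = {slack:4d} ps  {status}")
```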

Now, everybody cares about power consumption and the synchronous techniques of the past don't look so attractive. One problem is that having to add the guard bands increases power consumption because some parts of the system are struggling to keep up. Others produce a result and sit around twiddling their thumbs waiting for the end of the clock cycle. The first circuit needs as much voltage as you can feed it: the higher the voltage, the faster transistors will switch. The second one you could afford to feed with a lower voltage, so that it slows down to the point where it comes up with a result just ahead of the next clock cycle.
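A toy model makes the trade-off concrete. Assume, crudely, that delay scales roughly with the inverse of supply voltage and dynamic power with its square; neither assumption is exact, but they are enough to show why feeding the fast stage less voltage pays off:

```python
# Toy model: lower the supply to a fast stage until it only just meets timing.
# Assumes delay ~ 1/V and dynamic power ~ V^2 (crude approximations,
# purely for illustration; real transistor behaviour is more complicated).

CLOCK_PERIOD_PS = 333
V_NOMINAL = 1.1                     # volts

def delay_at(v, delay_nominal_ps):
    """Delay grows as the supply voltage drops (crude 1/V model)."""
    return delay_nominal_ps * (V_NOMINAL / v)

def relative_power(v):
    """Dynamic power relative to nominal, ~ V^2 at a fixed frequency."""
    return (v / V_NOMINAL) ** 2

# A stage that finishes well before the clock edge at nominal voltage...
fast_stage_delay_ps = 210

# ...can be fed a lower voltage until it just squeaks in before the edge.
v = V_NOMINAL
while delay_at(v - 0.01, fast_stage_delay_ps) < CLOCK_PERIOD_PS:
    v -= 0.01

print(f"Supply can drop to ~{v:.2f} V, "
      f"using ~{relative_power(v) * 100:.0f}% of nominal dynamic power")
```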

Where it gets worse is that, in modern processes, every transistor works slightly differently to its neighbours. Some are faster than expected; others are slower. To make sure everything can keep up, you have to increase the guard bands. And that pushes the overall power consumption up.
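A quick simulation shows why. If the same stage's delay spreads around its nominal value from die to die, the guard band has to cover the slowest plausible copy, not the typical one (again, the figures are invented):

```python
# Why process variation inflates guard bands: the margin must cover the
# slowest plausible copy of a circuit, not the average one.
# Figures are invented for illustration.

import random
import statistics

random.seed(0)

NOMINAL_DELAY_PS = 300
SIGMA_PS = 15                 # spread of delays across manufactured chips

# Simulate the same stage as manufactured on many dies
samples = [random.gauss(NOMINAL_DELAY_PS, SIGMA_PS) for _ in range(100_000)]

mean = statistics.mean(samples)
worst = max(samples)

print(f"typical delay: {mean:.0f} ps")
print(f"worst sampled delay: {worst:.0f} ps")
print(f"guard band needed to cover the worst case: {worst - mean:.0f} ps")
```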

Kumar claimed it's now different with the Nehalem: "We have introduced a chip that adapts every cycle to the dynamic power. We can get higher frequency and lower voltage."

According to one of Intel's slides, the "duty cycle adapts to transistor variation and lifetime stress". Yes, not only do transistors come out of the fab with something of a spread in terms of performance, they change as they get used. Some will fare much worse than others in usage, slowing down over time. Higher voltages do not do sub-micron transistors a lot of good. So, being able to adapt to that change in speed is important.
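A back-of-the-envelope sketch of the adaptive idea, with an invented ageing model: a fixed clock has to carry an end-of-life guard band from day one, whereas an adaptive cycle only pays for the slowdown that has actually happened:

```python
# Sketch of adapting the cycle to slowing transistors as the chip ages.
# The ageing model and all numbers are invented for illustration.

def aged_delay_ps(nominal_ps, hours_of_stress):
    """Hypothetical drift: delay creeps up slowly under voltage stress."""
    return nominal_ps * (1.0 + 1e-6 * hours_of_stress)

MARGIN_PS = 5                  # small margin kept above the measured delay

for hours in (0, 10_000, 50_000):
    delay = aged_delay_ps(300, hours)
    # A fixed-clock design would need a guard band sized for end-of-life;
    # an adaptive one just tracks the delay it measures today.
    adaptive_period = delay + MARGIN_PS
    print(f"after {hours:6d} h: delay {delay:.1f} ps, "
          f"adaptive period {adaptive_period:.1f} ps")
```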

However, the idea of a sort-of-3GHz processor did not appeal to Intel's customers. "We debated this quite a bit while doing the implementation," said Kumar. A less clock-centric approach was "the obvious way" to go, he added. But, "We realised quickly that people did not want that. They hated the idea of asynchronicity and indeterminism. So a tremendous amount of innovation has gone into avoiding that. We spent a lot of time working on that. Internally, the chip is adapting but, from the outside, it is deterministic."

In effect, there is an averaging process that goes on to ensure that people do not wind up with processors that run at subtly different speeds long term. "Every few seconds, it is averaging so that, from the outside, it is running at a fixed frequency. All the time. There is no indeterminism."
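How that averaging works is not something Intel has detailed, but the general shape is easy to sketch: let individual cycles stretch and shrink, then pad out each window so a fixed number of cycles always fits the advertised interval. The following is an assumption-laden illustration, not Intel's mechanism:

```python
# Sketch of presenting a fixed external frequency while individual cycles vary.
# This is a guess at the general idea, not Intel's actual implementation.

import random

random.seed(1)

TARGET_PERIOD_PS = 333            # the externally advertised clock period
WINDOW_CYCLES = 1000              # average over this many cycles

# Internally, each cycle stretches or shrinks with conditions on the die
actual_cycles_ps = [TARGET_PERIOD_PS + random.uniform(-20, 20)
                    for _ in range(WINDOW_CYCLES)]

elapsed = sum(actual_cycles_ps)
budget = TARGET_PERIOD_PS * WINDOW_CYCLES

# If the window finished early, idle the difference so the outside world
# always sees exactly WINDOW_CYCLES cycles per budgeted interval.
padding = max(0.0, budget - elapsed)

print(f"internal time for {WINDOW_CYCLES} cycles: {elapsed:.0f} ps")
print(f"external budget:                         {budget:.0f} ps")
print(f"idle padding inserted:                   {padding:.0f} ps")
```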

It's important for Intel to hide a variable clock cycle. But the idea of a stretchy clock cycle is something we may see from other quarters. ARM's director of R&D Krisztián Flautner studied at the University of Michigan where the Razor concept was developed.

Razor works on the assumption that circuits run at maximum speed for a given supply voltage and that, if they cannot meet timing, the calculation will appear to fail. But the correct result will become available at some point once the circuit has stabilised. The trick is knowing when this happens, so you add some logic to watch for it. The additional Razor logic cancels the false result and fetches the correct result from a shadow register. In effect, the circuit runs speculatively but can be corrected after the fact.
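The mechanism is easy to caricature in a few lines. The delays, error rate and one-cycle replay penalty below are all invented for illustration; real Razor implementations differ in detail:

```python
# Sketch of the Razor idea: run a stage speculatively at an aggressive clock,
# detect when its result latched too early, and replay from a shadow copy.
# Delays and penalties here are invented for illustration.

import random

random.seed(2)

CLOCK_PERIOD_PS = 280          # aggressive period, shorter than worst-case delay
REPLAY_PENALTY_CYCLES = 1      # extra cycle charged when an error is caught

def stage_delay_ps():
    """Hypothetical per-operation delay; occasionally exceeds the period."""
    return random.gauss(250, 20)

total_cycles = 0
errors = 0

for _ in range(10_000):
    total_cycles += 1
    if stage_delay_ps() > CLOCK_PERIOD_PS:
        # The main flip-flop latched a wrong value; the shadow latch, clocked
        # later, still holds the correct one. Cancel the result and replay.
        errors += 1
        total_cycles += REPLAY_PENALTY_CYCLES

print(f"timing errors caught and corrected: {errors}")
print(f"effective cycles per operation: {total_cycles / 10_000:.3f}")
```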

"The concept here is that we treat part of the cycle time as the error region; the place where we might screw up," said Flautner last year.

The team has developed a self-correcting flip-flop and a memory cell, where a second sense amplifier detects the error. One of the issues is that the circuit is no longer deterministic — the problem that Intel wanted to avoid — as errors need to be fixed after the fact.

However, with this kind of approach, chips could be 40 per cent more energy efficient than they are today, according to Flautner. But runtime approaches such as Razor incur their own energy and area overhead. He said the area overhead was relatively unimportant: the question is whether the power drawn by the additional Razor logic outweighs the potential saving from being able to run circuits much closer to their timing margin by reducing their supply voltages.
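The arithmetic of that question is straightforward, using the rough rule that dynamic energy scales with the square of the supply voltage; the overhead figure below is invented for the sake of the example:

```python
# Back-of-the-envelope: does the Razor overhead eat the voltage-scaling saving?
# Uses the rough rule that dynamic energy ~ V^2; all figures are invented.

V_NOMINAL = 1.1         # volts, with full guard bands
V_SCALED = 0.9          # volts, running much closer to the timing margin
RAZOR_OVERHEAD = 0.05   # fraction of energy spent on detection/replay logic

saving = 1.0 - (V_SCALED / V_NOMINAL) ** 2
net = saving - RAZOR_OVERHEAD

print(f"gross energy saving from lower voltage: {saving * 100:.0f}%")
print(f"net saving after Razor overhead:        {net * 100:.0f}%")
```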

Does the circuit need to be deterministic? Flautner said this issue was confronted by software engineers when caches were introduced on embedded processors.