Tabula trades space for time

Steve Teig pictured in front of several Tabula die-plots

I think the only sane advice I could give anyone on whether to start up a programmable-logic company today is: “I wouldn’t start from here if I were you.”

Perhaps the best response is to come up with what is, at first glance, the most insane architecture I’ve ever seen. Even its creator concedes that your best bet is not to think too hard about what’s going on inside, because it gets too hard to conceptualise. Even the tools that build the design on Tabula’s architecture don’t deal directly with what happens inside the device.

Tabula is president and CTO Steve Teig’s fifth startup. Two were in electronic design automation (EDA) — the most recent being Simplex Solutions, which sold to Cadence Design Systems — and two in biotechnology. Even he spent a while, after leaving Cadence in 2003, wondering whether starting an FPGA company was a wise move, considering the trail of dead companies that has formed in the wake of Xilinx’s and Altera’s dominance of the market since the early 1990s.

The company’s CEO, Dennis Segers, was responsible for the Virtex architecture at Xilinx — the company’s top-selling family of FPGAs — and was the man venture capitalists hauled in whenever they were weighing up an investment in an FPGA startup. And those VCs have not been short of choice in the past decade. Neither have they done very well out of it. Segers didn’t rate any of them.

However, after grilling Teig for several days in front of a whiteboard, Segers had a Victor Kiam moment and decided to join Tabula as CEO. The startup has a number of people who joined from Xilinx, including the vice president of sales, Steve Haynes, who had retired after leaving the number-one FPGA maker.

Teig contends that most of these companies started from the hardware and bolted on the design support later. Everyone still alive in the FPGA business knows it’s all about the tools. That’s how you can get away with selling chips for a thousand dollars a pop to those people who have to use the largest ones available. Teig says he started from a concept for an architecture, worked out whether it was possible to get a tool to support it and then found someone to work out how that hardware would work in practice. The result is, according to him, pieces of silicon that can sell for $200 versus the $1000+ an equivalent device from Xilinx or Altera might cost.

Now, I concede that most of this reads like one of those lifestyle-meets-investment pieces that results from a slick presentation. It’s actually taken two full meetings to get to this point. I have reservations about the potential success of Tabula just given the recent history of the FPGA business and also because the approach taken by the company is novel — leading to design issues that can’t be readily assessed from a discussion about the architecture.

On the other hand, it’s such an audacious attempt to rethink the way hardware is implemented that you can’t fail to like it. At first glance, Tabula is simply a reconfigurable computing architecture. Students of the history of this area will realise that startups trying to do this have flamed out even faster than FPGA startups.

The cost of programmability
The problem with reconfigurable machines up to now is that the cost of changing logic on the fly has been pretty high. It’s too hard to get the gigabytes of data into a device to sustain any kind of throughput. For example, the architecture put together by Stretch was superficially interesting but hobbled by the speed at which you could shovel data into the device.

What Teig calls the ‘spacetime’ architecture is different because the memory used to store the chip’s state is held right next to the actual logic. It also benefits from the way scaling trends have provided some things, such as raw clock speed, in excess of designers’ ability to use them fully. And active power continues to creep down, albeit more slowly than before, while static power consumption inexorably moves up. As static power is largely proportional to the number of idle transistors on a device, having a high ratio of active to idle transistors is A Good Thing. And being able to get the same amount of logic out of fewer transistors is also A Good Thing.

“I believe spacetime is fundamental to the speed of computation,” Teig claims. “For any technology that you might use to do computation it becomes cheaper to do reconfiguration locally than to send the signal somewhere else for computation. We’ve been trying to look 25 years ahead with this approach.

“At a certain point most devices will be spacetime because it will take less energy to reconfigure than to send the signal far away. This is the next wave of computing strategy,” he adds.

Within the architecture, strips of memory flank the logic elements, which are based around lookup tables, and the multiplexers that route signals between them. This is a subtle difference from a classic FPGA architecture, where a lot of the memory needed to hold state and temporary data is spread across the chip. With the Tabula architecture, much more of the state and data is stored in large blocks of static memory (SRAM).

Use denser memory
Any block of SRAM has a certain overhead associated with it. So, the cost of adding multiple rows is not that high, until you’ve added a lot. SRAM is the one thing that scales well with Moore’s Law: chip designers work hard to make sure it’s as dense as possible because so much of it is on-chip now. And its regular structure means that optimising the core cell pays back millions of times over.
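To see why the overhead amortises, here is a toy calculation. The figures are invented purely to show the shape of the trade-off; they are not taken from Tabula or any real process.

```python
# Toy illustration of SRAM overhead amortisation: the fixed overhead
# (decoders, sense amplifiers and so on) shrinks as a fraction of the
# block as rows are added, so the area cost per stored bit falls.

def area_per_bit(rows, bits_per_row=256, overhead=5000.0, bit_area=1.0):
    total_area = overhead + rows * bits_per_row * bit_area
    return total_area / (rows * bits_per_row)

for rows in (1, 8, 64, 512):
    print(f"{rows:4d} rows: {area_per_bit(rows):.2f} area units per bit")
```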

With those additional banks of state memory, Tabula divides a raw clock signal of more than 1GHz into multiple sub-cycles — Tabula refers to these sub-cycles as folds. On each sub-cycle, the logic elements and muxes read out their state from the local memory, perform the computation and then move on to the next sub-cycle. Signals still in flight at the end of each sub-cycle are caught by transparent latches within the programmable-interconnect section and held until the next sub-cycle that needs those signals as inputs. Any logic behind the latch, once it’s closed, can be reused by independent logic on subsequent sub-cycles.

The transparent latch — a ‘time via’ in Tabula-speak — looks to the hardware description language (HDL) code like a buffer. As just about every chip design tool on the market can insert buffers without affecting the HDL code, this avoids the need to pipeline the logic: long sequences of combinatorial logic can be ‘folded’ onto a constantly reconfiguring collection of logic elements, which is why Tabula calls the sub-cycles folds.
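As a rough mental model of the folding idea (this is just a sketch with invented names, not Tabula's tools or silicon), one physical set of logic elements is re-used on every sub-cycle, with the latched results standing in for the time vias:

```python
# A loose model of 'folding': one physical set of logic elements is
# re-used on every sub-cycle, and the value latched between sub-cycles
# plays the role of a time via. Everything here is invented for
# illustration; it is not how Tabula's silicon or tools actually work.

def run_folded(stages, inputs):
    """Evaluate one stage of logic per fold on the same physical hardware.

    `stages` is a list of functions, one per fold; each takes the values
    latched at the end of the previous fold and returns its own outputs.
    """
    latched = inputs              # the 'time via': values held between folds
    for stage in stages:
        latched = stage(latched)  # same logic elements, new configuration
    return latched

# Eight trivial single-bit stages, one per fold, standing in for a long
# combinatorial chain that has been folded onto one set of logic elements.
stages = [lambda bits, k=k: [(b ^ k) & 1 for b in bits] for k in range(8)]
print(run_folded(stages, [0, 1, 1, 0]))
```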

The reason for using words such as ‘fold’ to describe what’s going on is to make life easier for the design engineer using the devices and, indeed, for the people at Tabula who have written the tools.

Teig himself concedes that trying to think of the architecture as constantly reconfiguring is very difficult to deal with. It’s much easier to think of the folds as the way to a virtual 3D chip. Each fold is a layer of logic connected by the time vias. So the initial parts, made on a 40nm process at TSMC, provide eight layers of logic using just one physical surface.

Why does this make the life of the tools writers easier? Because a 3D place-and-route algorithm is not conceptually different from a conventional 2D system. You can use cost functions similar to those used in place-and-route tools that deal with an entirely physical set of logic elements and wires. A time via that connects fold one with fold eight has a longer ‘length’ than one that joins two adjacent folds, and it blocks more of the virtual interconnect. So, optimisation tools can attempt to minimise that cost to get better overall chip utilisation.
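A toy version of that cost idea, assuming a simple Manhattan metric and an arbitrary weight for fold crossings (neither is Tabula's actual cost function), might look like this:

```python
# A toy 'virtual 3D' placement cost, treating the fold index as a z
# coordinate. The metric and the weight on fold crossings are assumptions
# made up for this sketch, not anything taken from Tabula's tools.

def wire_cost(src, dst, fold_weight=2.0):
    """src and dst are (x, y, fold) placements of two logic elements."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    dz = abs(src[2] - dst[2])            # folds the time via has to span
    return dx + dy + fold_weight * dz    # longer time vias cost more

# A net that jumps from fold 1 to fold 8 is 'longer', and so more
# expensive, than one between adjacent folds at the same x/y spot.
print(wire_cost((3, 4, 1), (3, 5, 8)))   # 15.0
print(wire_cost((3, 4, 1), (3, 5, 2)))   # 3.0
```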

As Teig explains, it gets much easier to conceptualise once you’ve disposed of the idea that the architecture is time-slicing: “For 99 per cent of my work I pretend the chip is 3D: it’s much easier to visualise the x, y and z axes rather than trying to visualise what comes into existence when, which just gives you a headache.”

Teig reckons Tabula’s advantage over reconfigurable-computing predecessors, such as Chameleon and Quicksilver, which disappeared more or less without trace, is two-fold: “We can reconfigure thousands of times faster than anyone who preceded us. The second part is that we have chosen to hide the revolution.”

However, the picture is slightly complicated by the fact that you don’t have to use all eight folds. If you want portions of the chip to run faster than an effective 200MHz, you just reduce the number of folds and have them run at 400MHz, 800MHz or, in the extreme case on the initial crop of 40nm devices, 1.6GHz.

“Why eight folds?” Teig asks rhetorically. “It’s a marketing reason rather than technical. We figured out that most logic is running at 200MHz or less. Even on ASICs that run at 500MHz, often only a small part needs to run at 500MHz. We wanted to make sure that for the 200MHz case we could do the full folding. And we worked out it wasn’t worth the hassle of trying to go faster than 1.6GHz on a 40nm process.”
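The fold arithmetic itself is simple. Taking the figures quoted here at face value, the effective user clock is just the raw clock divided by the number of folds in use:

```python
# Back-of-the-envelope check of the fold/frequency trade-off, using the
# figures quoted for the initial 40nm parts.

RAW_CLOCK_MHZ = 1600

for folds in (8, 4, 2, 1):
    print(f"{folds} fold(s): {RAW_CLOCK_MHZ // folds} MHz effective")
# 8 folds -> 200MHz, 4 -> 400MHz, 2 -> 800MHz, 1 -> 1.6GHz
```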

More folds
According to Tabula, the density advantage over a conventional FPGA, assuming eight folds, is around three times. A conventional FPGA is roughly 10 to 20 times less space efficient than a custom chip, such as an application-specific integrated circuit (ASIC), depending on how much memory your design needs. This makes the current generation of Tabula’s devices around four to eight times less dense than an ASIC.

By taking advantage of faster transistors in future processes, Teig reckons Tabula can attain parity in terms of density with ASICs and application-specific standard products (ASSPs) within three process generations — at the current rate of development, that is just after the middle of the decade.

“We are trying to take ASIC as well as ASSP market share. We are positioning ourselves not to be the number-six FPGA company but to be a major player in semiconductors,” Teig says.

This is far from the first time that someone in the FPGA business has said their product will take share away from the ASIC and ASSP markets. Rising design costs point to fewer design starts for custom chips. And yet in the past decade, sales of FPGAs have barely moved at all. The revenue for the entire FPGA business has hovered around $3.5bn since the middle of the decade despite total semiconductor sales creeping closer to $300bn a year.

The trends surely point towards programmability. But that does not necessarily equate to FPGA. What’s happened is that standard products have gradually eaten into the custom-chip business simply because it often works out cheaper to buy a device that ships in high volume and not use half of it than to design a device that just has what you need on it.

FPGAs tend to do well in communications infrastructure equipment because there are not many off-the-shelf parts that can do the job. In consumer and other high-volume markets the chances are a platform processor does more or less what you need.

Cost trends
Teig argues that the spacetime approach will scale down into more consumer-oriented markets. But the company is concentrating first on the communications business as it is the key market for FPGAs. It also means that, assuming it can get two to three times more logic on each square millimetre of silicon, the company gains maximum benefit from the yield curve. As chip size approaches the reticle limit, yield plummets. Tabula, on the other hand, is not trying to ship chips close to that limit. So, the price advantage that Tabula can claim to have reaches its maximum against Altera’s and Xilinx’s most expensive devices.

The question is: will it work?

I haven’t spent this long trying to get to grips with an architecture for many years. Part of that is due to the affable Teig, who is in the unusual position of being both president and CTO. That has neatly avoided the usual problem of trying to get past the marketing veep’s understanding of the system to work out what is going on beneath the buzzwords.

The approach that Tabula has taken seems to fit well with the major trends in silicon and, potentially, could benefit from what happens once 2D scaling in silicon runs out of steam towards the end of the decade. Relying on the scaling ability of large blocks of memory rather than random logic or interconnect looks sensible. Memory has the best chance to profit from a move into the third dimension and there is a strong commercial impetus to keep scaling memory.

There are some conceptual similarities between what Tabula is doing and the research by David Patterson at the University of California at Berkeley on IRAM — devices that are primarily memories with processors bolted on the side.

Hardware architecture only gets you so far. FPGA companies live or die on the strength of tool support, and Teig’s background in EDA is a clear benefit here. Until we’ve seen how real customers have dealt with the tools, it’s tough to say how well Tabula will do. But the claim by the company is that porting to Tabula is no more tricky than porting to Altera or Xilinx. There is no demand on the customers to wrap their heads around reconfigurable computing. As Altera and Xilinx are pushing reconfiguration more heavily for their forthcoming 28nm architectures — and this will involve a mental shift for designers — this stands Tabula in good stead.

Then there is the reaction one has when faced with a new FPGA architecture. I know my immediate response was: “That’s never going to work.” Not only is this an FPGA architecture attempting to take on two powerful incumbents, it is an architecture trying to do it with dynamic reconfiguration, which compounds the “that’s never going to work” effect.

Of all the new FPGA architectures that have appeared in the past 15 years, Tabula looks to be the strongest candidate. I have reservations as to whether it can genuinely break the FPGA market’s glass ceiling, but it stands a better chance of doing it than anything currently out there on the market.