EDA's acceleration option


John Busco at John's Semi-Blog has pointed to the launch by Nascentric of an analogue-circuit simulator accelerated by nVidia's graphics processors, and wondered: "Will general-purpose GPU computing become the acceleration platform for EDA?"

I was sitting at the Many-core and Reconfigurable Supercomputing (MRSC) conference in Belfast the other week wondering the same thing. In recent years, hardware-specific EDA has been a dirty word. Mentor Graphics, which made its name selling proprietary workstations before it became a software-only company, made a foray back into hardware in a deal with Mercury Computer Systems in late 2006. Mercury used the IBM Cell processor – the same one used in the Sony PlayStation 3 – to speed up the job of checking chip designs before they go to fab. Mercury sells the hardware and Mentor provides a special version of Calibre.

It's not clear how well hardware acceleration has gone for Mentor and Mercury. However, in its 2007 annual report, Mercury declared that it saw a "slight rebound" in its semiconductor business, partly due to the sale of one accelerator for chip-mask inspection – which is not related to Calibre – and its deal with Mentor. The number-three EDA company has been busy showing off the hardware at events like the SPIE lithography conference, so the company must have some faith in the idea of speciality accelerators.

The algorithm in Calibre is probably a good candidate for acceleration by GPUs as well as the Cell. One thing that was noticeable from MRSC was that users in the academic environment there were not making that much use of Cell but they were very keen to look at GPUs as well as field-programmable gate arrays (FPGAs) – the latter just happening to be the EDA acceleration technology that nobody really notices.

People have been using either emulators made from hundreds of FPGAs – OK, not that many people – or FPGA breadboards to simulate digital chips for years. Synplicity made such a good business out of doing tools for FPGA-based prototyping that Synopsys dismissed the fact it killed off a tool to do the same thing a couple of years ago and bought the company. (Actually Synopsys has gone through three different FPGA synthesis tools in recent years – FPGA Compiler, FPGA Compiler II and DC FPGA – we will wait and see how it does with the stuff it buys in.)

When Mentor unveiled its deal with Mercury, Joe Sawicki, the head of Mentor's Calibre operation, said they had changed the way the tool worked in such a way that it would better suit an accelerator like Mercury's. In the past, tools like Calibre used a sparse representation of the chip's surface to perform their analysis. Mentor's argument was that, at 45nm, the features on the surface of a chip are so densely packed that you might as well just chop it up into a regular grid and have at it with fast Fourier transform operations.

If there are two things that run well on accelerators, it's regular grids and FFTs. And I can't see a reason why a GPU would not be a potential candidate for the nmDRC software. I'd be surprised if Mentor wasn't looking at a GPU option.
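To make the grid-plus-FFT idea concrete, here is a toy sketch in plain Python. It is not Mentor's algorithm: nobody has said exactly what nmDRC computes with its FFTs. As an assumption for illustration, it takes one row of a rasterised layout (1 = metal, 0 = empty), smears it with a small symmetric blur kernel, and does the convolution by multiplying spectra. The function names, the row and the kernel values are all made up.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT. len(values) must be a power of two.

    With inverse=True the result is NOT divided by n; the caller does that.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circular_convolve_fft(signal, kernel):
    """Circular convolution via the FFT: transform, multiply, transform back."""
    n = len(signal)
    spectrum = [a * b for a, b in zip(fft(signal), fft(kernel))]
    return [v.real / n for v in fft(spectrum, inverse=True)]


def circular_convolve_direct(signal, kernel):
    """The same convolution done the slow O(n^2) way, as a cross-check."""
    n = len(signal)
    return [
        sum(signal[j] * kernel[(i - j) % n] for j in range(n))
        for i in range(n)
    ]


# Hypothetical data: one 8-cell row of a gridded layout and a 3-tap blur
# centred on index 0 (the last entry wraps round to act as index -1).
layout_row = [0, 0, 1, 1, 1, 0, 0, 0]
blur_kernel = [0.5, 0.25, 0, 0, 0, 0, 0, 0.25]

blurred = circular_convolve_fft(layout_row, blur_kernel)
print([round(v, 6) for v in blurred])
```

Every cell gets the same arithmetic regardless of what is drawn there, which is the sort of regular, data-independent work that accelerators tend to like. A sparse, shape-by-shape representation does not have that property.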

But, not everyone is convinced that accelerators are the future. Srini Raghvendra of Synopsys made the point at the time that, with general-purpose multicore processors on the way from AMD and Intel, optimising for a dedicated accelerator from a single hardware vendor was unlikely to be a long-term option: "We believe we can be comfortable riding the general-purpose processor horse."

One thing that hits a lot of EDA software is bandwidth: between processor and memory and from memory to disk. It can take hours just to read a design in and hours to write it back out again. Your best bet might not be a funky accelerator but a half-decent storage area network with a bunch of fat pipes into the back of your server farm.

Then there is the issue of how much EDA software has actually been multi-threaded to run across multiple processors. With the kind of job that a tool like Calibre does, multi-threading is commonplace. If design teams don't want the Mercury accelerator they can just buy a bunch of Calibre licences and run them across a server farm. With layout checks, the shape of one logic gate does not affect another one just micrometres away. You can, with some limitations, chop up the grid into little chunks and distribute them to many processors without worrying too much.
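A minimal sketch of that chop-and-distribute pattern, in plain Python. The "check" here is invented for illustration (count filled cells with no filled neighbour above, below, left or right) and is not a real design rule. The one limitation it shows is that each band of rows has to peek one row beyond its own edges, a halo, so that cells on a band boundary are judged correctly. Threads stand in for the server farm; a real flow would presumably ship each band plus its halo to a separate process or machine.

```python
from concurrent.futures import ThreadPoolExecutor


def count_isolated(grid, row_start, row_end):
    """Count filled cells in rows [row_start, row_end) that have no filled
    4-neighbour. Neighbours in the rows just outside the band (the halo)
    are still consulted, so band boundaries do not change the answer."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r in range(row_start, row_end):
        for c in range(cols):
            if not grid[r][c]:
                continue
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            has_neighbour = any(
                0 <= nr < rows and 0 <= nc < cols and grid[nr][nc]
                for nr, nc in neighbours
            )
            if not has_neighbour:
                count += 1
    return count


def count_isolated_tiled(grid, band_height, workers=4):
    """Split the grid into horizontal bands and check them independently."""
    bands = [
        (start, min(start + band_height, len(grid)))
        for start in range(0, len(grid), band_height)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda band: count_isolated(grid, *band), bands))


# Hypothetical 4x4 layout: 1 = filled cell, 0 = empty.
layout = [
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]

print(count_isolated(layout, 0, len(layout)))       # whole grid in one go
print(count_isolated_tiled(layout, band_height=1))  # one band per row
```

Because each band's result depends only on its own rows plus a thin halo, the partial counts simply add up. That locality is what makes layout checking so much easier to spread across processors than a circuit simulation.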

A lot of EDA software is not so lucky. It is only recently that Spice simulators, such as those from Nascentric, have gone multi-threaded. As recently as last year, people were arguing how useful multi-threading would be in that environment. Regular Spice is all about solving regular matrices. Fast-Spice simulators typically use sneaky mathematical tricks to avoid having to crunch through massive regular matrices. The name of the game, as with a lot of EDA, is to convert a problem that scales with the square or cube of the number of elements to something much more linear, or even logarithmic.

Unfortunately, these more optimised algorithms don't necessarily divide well. If you're not careful, you can spend so much time sifting and sorting data that the speedup you get from multiprocessing gets almost wiped out. So, something like Spice, which looks like a great candidate for multiprocessing, doesn't fare so well.

The results from academia on sparse-matrix acceleration using GPUs are good but not spectacular so far. People tend to report the same problems: memory bandwidth issues, limited cache memory on the GPUs themselves and the need to run thousands of threads in parallel to get any meaningful acceleration. People have reported speedups of maybe 2x, sometimes 10x, but not the 100x you might expect from taking something that runs on one processor to a graphics chip with 128 processors inside.
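One back-of-envelope way to see why 128 processors rarely means 100x is Amdahl's law. That is my framing, not something the papers in question necessarily use, and it is an idealised model: time spent waiting on memory or shuffling data is lumped in with the part of the job that does not speed up. The function names below are made up for the sketch.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Ideal speedup when only `parallel_fraction` of the runtime
    benefits from `processors` processors (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def parallel_fraction_needed(target_speedup, processors):
    """Invert Amdahl's law: the parallel fraction p that gives speedup S
    on n processors is p = (1 - 1/S) / (1 - 1/n)."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / processors)


PROCESSORS = 128  # the processor count mentioned in the text

for fraction in (0.5, 0.9, 0.99):
    speedup = amdahl_speedup(fraction, PROCESSORS)
    print(f"{fraction:.0%} parallel -> {speedup:.1f}x")

for target in (2, 10, 100):
    needed = parallel_fraction_needed(target, PROCESSORS)
    print(f"{target}x needs {needed:.2%} of the runtime to parallelise")
```

On this model, code that is 90 per cent parallel tops out a little above 9x on 128 processors, and getting to 100x needs more than 99.7 per cent of the runtime to scale perfectly. Seen that way, reports of 2x to 10x are about what you would expect from code that still spends a noticeable slice of its time on serial work or waiting for memory.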

The release from Nascentric is a masterpiece of legerdemain in that respect - leading you to assume that you will see a 100:1 or even 500:1 level of acceleration, based on the number of SIMD processors you get in the hardware. And then there's the claim from John Croix, Nascentric founder and CTO: "Using nVidia's Tesla platform we can perform circuit simulations in minutes to hours that would previously have taken hours, days and weeks."

However, there are no numbers in the release to back this up. You may have the situation where regular Spice code gets a big speed boost, but does that happen for some of the Fast-Spice algorithms? Bear in mind there is a lot of tweaking inside Fast-Spice. Designers are invariably turning things off in the hope that their circuit doesn't depend on those elements. The detail on the numbers for each of those cases would be pretty revealing.

A second issue is one of numerical precision. I'm not sure how much codes like Spice depend on double-precision floating point maths but, on GPUs, you really only get single-precision. You are more vulnerable to underflow and overflow issues in code that iterates a lot. Scientists tend to worry about this a lot, although numerical analysis work by computer scientists is doing a lot to assuage those concerns. However, accuracy could be an issue with GPU acceleration in EDA.
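A quick way to get a feel for the precision worry without a GPU to hand is to emulate single precision in plain Python by rounding every intermediate result to 32 bits. This has nothing to do with how Spice or OmegaSim actually work; it is just a repeated accumulation, the simplest kind of "code that iterates a lot", and the helper names are made up for the sketch.

```python
import struct


def to_single(value):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def accumulate(step, count, single):
    """Add `step` to a running total `count` times.

    With single=True, the step and every intermediate total are rounded to
    single precision, mimicking arithmetic done entirely in 32-bit floats."""
    if single:
        step = to_single(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single:
            total = to_single(total)
    return total


STEPS = 100_000
EXPECTED = 10_000.0  # 100,000 steps of 0.1 in exact arithmetic

error_single = abs(accumulate(0.1, STEPS, single=True) - EXPECTED)
error_double = abs(accumulate(0.1, STEPS, single=False) - EXPECTED)

print(f"single-precision drift: {error_single:.6g}")
print(f"double-precision drift: {error_double:.6g}")
```

The single-precision total wanders visibly away from the exact answer while the double-precision one stays very close. Whether that matters for a given circuit simulation depends on how the solver is structured, which is exactly the kind of detail the vendors have not published.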

On the other hand, any company that looks at GPU acceleration is set up for the future. With OmegaSim GX, today you have to buy a separate accelerator. A few years from now that same code may simply harness the integrated GPU sitting on the AMD or Intel processor. It gives floating-point-intensive software a source of Gflops in addition to extensions such as SSE2. Why not make use of it?

Personally, I reckon that hardware acceleration in EDA will rise to the surface for a while then disappear again as the general-purpose processor and PC-blade makers absorb those technologies - who knows, they might be a lot better at running technical codes than games. EDA companies would be wise to explore the FPGA and GPU options if only because elements of those products are likely to wind up inside the workstation and the blade. But take the speedup claims with a large dose of salt.