Embarrassingly unoptimised


At a workshop on advanced architectures earlier this week, ARM's director of research, Krisztián Flautner, made an unprovable but seductive assertion. He said upfront that he had practically no evidence for it. But it's one of those statements that has the ring of truth about it. And at its heart is something that may prove to be the big problem facing microprocessor makers as they try to work out how many cores they should put on one piece of silicon.

To make use of tens or hundreds of cores, you need applications that are "embarrassingly parallel". But Flautner is not so sure such things exist. His rule? That there is no such thing as an algorithm that is embarrassingly parallel, just an algorithm that hasn't been optimised enough yet. He pointed to situations in graphics where operations that used to be parcelled up and split across many processors in a dumb way have been replaced by approaches that apply more local intelligence but which are much tougher to parallelise.

I can think of one or two algorithms that remain embarrassingly parallel - they just aren't that useful. Anyone roving an electronics show in the late 1980s will remember people trying to push the Inmos transputer. If they weren't displaying scenes of shiny balls rendered by ray-tracing, they were processing the good old Mandelbrot set.
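The Mandelbrot set is the textbook case of embarrassing parallelism because every pixel is computed entirely on its own: iterate z → z² + c for that pixel's c and count the steps before escape, with no data shared between pixels. Here is a minimal sketch in Python (the function names, image dimensions, and use of `ProcessPoolExecutor` are my own illustration, not anything from the workshop):

```python
from concurrent.futures import ProcessPoolExecutor

def mandelbrot_iterations(c, max_iter=100):
    """Iterate z -> z*z + c and return the escape count for one point."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter  # point (probably) inside the set

def row(y, width=80, height=40):
    """Compute one row of the image. Depends only on its arguments,
    which is exactly what makes the whole image trivially parallel."""
    return [mandelbrot_iterations(complex(-2.0 + 3.0 * x / width,
                                          -1.5 + 3.0 * y / height))
            for x in range(width)]

if __name__ == "__main__":
    # Rows can be farmed out to worker processes with no communication
    # between them - the "dumb" split the transputer demos relied on.
    with ProcessPoolExecutor() as pool:
        image = list(pool.map(row, range(40)))
```

Swap `ProcessPoolExecutor` for a plain `map` and you get bit-identical output, which is the point: the parallel version needs no cleverness at all.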

People have found lots of other ways to render scenes since then that don't parallelise so well. It's hard to think of a smarter way to generate Mandelbrot set images than the way they were processed then. But why would you want to?