D-Wave: scaling comparison with classical algorithms
In January 2014, a group (Rønnow et al.) from ETH Zurich, USC and Google published the paper “Defining and detecting quantum speedup” on the arXiv, which looked at how the D-Wave Two device’s performance scaled as the number of qubits increased, and compared this with the scaling behaviour of optimised classical algorithms on the same problems. In June 2014, the formal version of the paper appeared in Science.
There are many caveats and possible uncertainties in that paper, such as whether the annealing time is optimal, but in my view it is reasonable to conclude that for this test the scaling behaviour of D-Wave is not as good as that of the classical algorithms, especially on the harder problem instances. The graphs in the top-right panel of figure 3 of the arXiv version of the Rønnow paper, representing the classical algorithm’s scaling as a function of problem size, are roughly concave (diminishing slope), whereas the corresponding D-Wave graphs in the bottom-right panel are roughly convex (increasing slope). Of course, the shape of a graph depends on the scaling of the axes, but this particular choice of log(time) against the square root of problem size is fairly natural, in that there are theoretical reasons to believe that the behaviour should be asymptotically linear (plus a constant) in these co-ordinates: $\log T(N)\approx a+b\sqrt{N}$ for large $N$. If this were true, then the most important feature of an algorithm would be a single number, the limiting gradient $b$, and the second-most important feature the limiting offset $a$, both of which we may guess at by looking at the graphs.
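To make that concrete, here is a minimal sketch (in Python, with made-up numbers purely to illustrate the procedure, not data from any of the papers) of fitting the model $\log T(N)\approx a+b\sqrt{N}$ and reading off a solver’s limiting gradient and offset.

```python
# Minimal sketch: least-squares fit of log(time) = a + b*sqrt(N).
# The numbers below are invented for illustration only; they are not
# measurements from the Rønnow paper or from Prog-QUBO.
import numpy as np

sizes = np.array([128, 200, 288, 392, 512])           # problem sizes N
times = np.array([0.004, 0.011, 0.025, 0.060, 0.14])  # hypothetical seconds

def limiting_fit(sizes, times):
    """Fit log(time) = a + b*sqrt(N); return (offset a, gradient b)."""
    b, a = np.polyfit(np.sqrt(sizes), np.log(times), 1)
    return a, b

a, b = limiting_fit(sizes, times)
print(f"offset a = {a:.3f}, limiting gradient b = {b:.3f}")
```

On this view, comparing two solvers asymptotically comes down to comparing their fitted values of $b$, with $a$ acting as a constant-factor offset.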
There is a further worry from D-Wave’s point of view. I believe the classical algorithms (simulated annealing) used in the Rønnow paper were not the best possible for the task, and there are at least two completely different classical algorithms with better scaling behaviour than those used in the paper (and which, for range 7, are faster in absolute terms in their current implementations). One is the method I described in June 2013, for which the program (“Prog-QUBO”) was made available on GitHub. (The other is exchange Monte Carlo, or parallel tempering, which is normally used to find thermal states but can be used quite effectively just to find the ground state. This method won’t be considered here.) Here are the results of tests carried out using Prog-QUBO, comparable to those that produced figure 3 of the Rønnow paper.
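To fix ideas, here is a toy single-spin-flip simulated annealer for an Ising instance, sketched in Python. It shows only the general family of algorithm: it is not the Rønnow paper’s optimised code, nor Prog-QUBO’s method, and all names and parameters here are invented for illustration.

```python
# Toy simulated annealing for min E(s) = sum_{(i,j)} J_ij * s_i * s_j,
# with s_i in {-1, +1}.  A sketch of the technique only, using a simple
# linear schedule in inverse temperature beta.
import math, random

def anneal(J, n, sweeps=1000, beta0=0.1, beta1=3.0):
    """J: dict mapping (i, j) with i < j to a coupling; returns (energy, spins)."""
    nbrs = [[] for _ in range(n)]               # adjacency lists
    for (i, j), c in J.items():
        nbrs[i].append((j, c))
        nbrs[j].append((i, c))
    s = [random.choice((-1, 1)) for _ in range(n)]
    for t in range(sweeps):
        beta = beta0 + (beta1 - beta0) * t / (sweeps - 1)
        for i in range(n):
            # energy change if spin i is flipped
            dE = -2 * s[i] * sum(c * s[j] for j, c in nbrs[i])
            if dE <= 0 or random.random() < math.exp(-beta * dE):
                s[i] = -s[i]                    # Metropolis acceptance
    energy = sum(c * s[i] * s[j] for (i, j), c in J.items())
    return energy, s
```

In practice one would take the best energy over many restarts; a serious implementation (such as the one in the Rønnow paper) also flips many spins in parallel, but that affects the constant factor rather than the scaling.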
On the left are the “range 1” results and on the right, “range 7”. The top row shows Prog-QUBO’s results and the bottom row shows these superimposed on the figure 3 results from the Rønnow paper. (Prog-QUBO was run on six cores of a 3.2GHz Intel i7-3930K; “S13” means that it was using what it calls strategy 13.) In terms of scaling, it looks like Prog-QUBO behaves better on these problems than the simulated annealing methods used in the Rønnow paper, which in turn behave better than the D-Wave device, as demonstrated in that paper.
The absolute speed of these programs is of course implementation-dependent and of secondary interest, but it has nevertheless been a motivating topic for many people (including myself), given the claims and counter-claims about whether D-Wave can outperform classical computers. In the case of range 1, Prog-QUBO is slower than the program used in the Rønnow paper (“Rønnow-prog-1”) until $N=512$, where they are similar. This is perhaps to be expected, as Rønnow-prog-1 is highly optimised to handle many spins in parallel and can achieve up to 50 spin-flips per nanosecond. In the case of range 7, the program used in the Rønnow paper (“Rønnow-prog-7”) is optimised to run on bipartite graphs, but not for the Chimera graph specifically, and the crossover point where Prog-QUBO becomes faster is about $N=72$. (By comparison, Prog-QUBO is written specifically for the Chimera graph, though it is written in C in a fairly straightforward way and isn’t particularly optimised in terms of hard-coding the set of couplings or exploiting the processor architecture.) Reading from the range 7 graph, Prog-QUBO at $N=1152$ is similar in speed to Rønnow-prog-7 at $N=512$, which in turn is faster than D-Wave at $N=512$ on most of the harder problem instances. If only for this reason, it seems unlikely that upgrading D-Wave to 1000 qubits will give it an advantage over classical computers.
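For reference, the Chimera connectivity that Prog-QUBO specialises to is simple to generate. The sketch below builds the edge list of an $m\times m$ Chimera graph ($N=8m^2$ spins, so $m=8$ gives the $N=512$ of the D-Wave Two). The orientation convention for the inter-cell couplers varies between sources; this is one common choice, and it is not Prog-QUBO’s actual code.

```python
# Sketch of Chimera connectivity: an m-by-m grid of K_{4,4} unit cells,
# each cell holding two "shores" of four qubits.
def chimera_edges(m):
    """Return the edge list of the Chimera graph C_m as (i, j) pairs, i < j."""
    def q(row, col, shore, k):
        return 8 * (row * m + col) + 4 * shore + k
    edges = []
    for row in range(m):
        for col in range(m):
            # intra-cell: complete bipartite K_{4,4} between the two shores
            for a in range(4):
                for b in range(4):
                    edges.append((q(row, col, 0, a), q(row, col, 1, b)))
            # inter-cell: shore 0 couples to the cell below, shore 1 to the right
            for k in range(4):
                if row + 1 < m:
                    edges.append((q(row, col, 0, k), q(row + 1, col, 0, k)))
                if col + 1 < m:
                    edges.append((q(row, col, 1, k), q(row, col + 1, 1, k)))
    return edges

# m = 8: 16 edges per cell plus 2*7*8*4 inter-cell couplers = 1472 edges
assert len(chimera_edges(8)) == 8 * 8 * 16 + 2 * 7 * 8 * 4
```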
- The timing statistics considered above and in the Rønnow paper arise from a particular distribution over problem instances, in this case choosing the couplings independently and uniformly (a sketch of this instance generation is given after this list). It may be that there are other distributions over problem instances that favour the D-Wave device more. What makes one distribution better than another is presumably that it arises from an actual use case, where it ultimately helps optimise something you actually want optimised. Without such a use case in mind it’s hard to know whether the uniform couplings are representative, but they do at least generate difficult problem instances.
- In the Prog-QUBO graphs above for $N\ge 648$, the optimal values (ground state energies) were not verified using a proving solver. It is therefore possible that some of these presumed optimal values are not actually optimal, which would change the graphs, though for what it’s worth I believe that all the presumed optimal values are in fact optimal for reasons given in previous posts.
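As promised above, here is a sketch of the instance distribution in question: each Chimera coupling drawn independently and uniformly, with the range-$r$ values taken here to be $\{\pm 1,\ldots,\pm r\}$ (zero excluded, so that “range 1” means $J_{ij}=\pm 1$; this is my reading of the procedure rather than a quotation of it). It reuses the chimera_edges and anneal sketches from earlier.

```python
# Sketch: a random range-r instance on the Chimera graph C_m, with each
# coupling drawn independently and uniformly from {-r,...,-1,1,...,r}.
import random

def random_instance(m, r, seed=None):
    """Return {(i, j): J_ij} for a range-r instance on C_m."""
    rng = random.Random(seed)
    values = [v for v in range(-r, r + 1) if v != 0]
    return {edge: rng.choice(values) for edge in chimera_edges(m)}

instance = random_instance(m=8, r=7, seed=1)   # a 512-spin range-7 instance
energy, spins = anneal(instance, n=8 * 8 * 8)  # toy annealer from above
```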