## D-Wave: scaling comparison with classical algorithms

In January 2014, a group (Rønnow et al.) from ETH Zurich, USC and Google posted the paper *Defining and detecting quantum speedup* on arXiv, which examined how the D-Wave Two device’s performance scaled as the number of qubits increased and compared this with the scaling behaviour of optimised classical algorithms on the same problems. In June 2014, the formal version of the paper appeared in Science.

There are many caveats and possible uncertainties in that paper, such as whether the annealing time is optimal, but in my view it is reasonable to conclude that for this test the scaling behaviour of D-Wave is not as good as that of the classical algorithms, especially on the harder problem instances. The graphs in the top-right panel of Figure 3 of the arXiv version of the Rønnow paper, representing the classical algorithm’s scaling as a function of problem size, are roughly concave (diminishing slope), whereas the corresponding D-Wave graphs in the bottom-right panel are roughly convex (increasing slope). Of course, the shape of a graph depends on the scaling of the axes, but this particular choice of log(time) against the square root of problem size is fairly natural, in that there are theoretical reasons to believe the behaviour should be asymptotically linear (plus a constant) in these co-ordinates. If this is true, then the most important feature of an algorithm is a single number, the limiting gradient, and the second-most important is the limiting offset, both of which we may estimate by looking at the graphs.
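In these co-ordinates, extracting the two numbers that matter is just a straight-line fit of log(time) against sqrt(N). Here is a minimal sketch of that fit; the timing data below is made up for illustration and is not taken from the Rønnow paper or from Prog-QUBO.

```python
import numpy as np

# Hypothetical solve times (seconds) at various Chimera sizes N.
# These numbers are illustrative only, not measured results.
N = np.array([128, 200, 288, 392, 512])
t = np.array([0.01, 0.05, 0.3, 2.0, 15.0])

# Fit log(time) = a*sqrt(N) + b.  Under the conjectured asymptotic form,
# the slope a (limiting gradient) is the key figure of merit and the
# intercept b the limiting offset.
a, b = np.polyfit(np.sqrt(N), np.log(t), 1)
print(f"gradient a = {a:.3f}, offset b = {b:.3f}")
```

Comparing two solvers then comes down to comparing their fitted gradients first and their offsets second.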

There is a further worry from D-Wave’s point of view. I believe the classical algorithms (simulated annealing) used in the Rønnow paper were not the best available for the task: there are at least two completely different classical algorithms with better scaling behaviour than those used in the paper (and which, for range 7, are faster in absolute terms in their current implementations). One is the method I described in June 2013, whose program (“Prog-QUBO”) was made available on GitHub. (The other is exchange Monte Carlo, or parallel tempering, which is normally used to find thermal states but can be used quite effectively just to find the ground state. That method won’t be considered here.) Here are the results of tests carried out using Prog-QUBO, comparable to those that produced Figure 3 of the Rønnow paper.

On the left are the “range 1” results and on the right, the “range 7” results. The top row shows Prog-QUBO’s results and the bottom row shows these superimposed on the Figure 3 results from the Rønnow paper. (Prog-QUBO was run on six cores of a 3.2GHz Intel i7-3930K. “S13” means that it was using what it calls strategy 13.) In terms of scaling, Prog-QUBO appears to behave better on these problems than the simulated annealing methods used in Rønnow, which in turn behaved better than the D-Wave device, as demonstrated in that paper.

The absolute speed of these is of course implementation-dependent and of secondary interest, but it has nevertheless been a motivating topic for many people (including myself), given the claims and counter-claims about whether D-Wave can outperform classical computers. In the case of range 1, Prog-QUBO is slower than the program used in the Rønnow paper (“Rønnow-prog-1”) until $N=512$, where they are similar. This is to be expected, as Rønnow-prog-1 is highly optimised to handle many spins in parallel and can achieve up to 50 spin-flips per nanosecond. In the case of range 7, the program used in the Rønnow paper (“Rønnow-prog-7”) is optimised to run on bipartite graphs, but not on the Chimera graph specifically, and the crossover point where Prog-QUBO becomes faster is about $N=72$. (By comparison, Prog-QUBO is written specifically for the Chimera graph, though it is written in C in a fairly straightforward way and isn’t particularly optimised in terms of hard-coding the set of couplings or exploiting the processor architecture.) Reading from the range 7 graph, Prog-QUBO at $N=1152$ is similar in speed to Rønnow-prog-7 at $N=512$, which in turn is faster than D-Wave at $N=512$ on most of the harder problem instances. If only for this reason, it seems unlikely that upgrading D-Wave to 1000 qubits will give it an advantage over classical computers.

### Caveats

- The timing statistics considered above and in the Rønnow paper arise from a particular distribution over problem instances, in this case one with couplings chosen independently and uniformly. It may be that there are other distributions over problem instances that favour the D-Wave device more. What makes one distribution better than another is presumably that it arises from an actual use case, where it ultimately helps optimise something you actually want optimised. Without such a use case in mind it’s hard to know whether the uniform couplings are representative, but they do at least generate difficult problem instances.
- In the Prog-QUBO graphs above for $N\ge 648$, the optimal values (ground state energies) were not verified using a proving solver. It is therefore possible that some of these presumed optimal values are not actually optimal, which would change the graphs, though for what it’s worth I believe that all the presumed optimal values are in fact optimal for reasons given in previous posts.
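The instance distribution itself is simple to reproduce. The sketch below draws couplings independently and uniformly for each edge; the toy edge list stands in for the real Chimera edge set, and I have used integer couplings in {-r, ..., -1, 1, ..., r} as one plausible reading of “range r” (normalisations differ between papers, so treat the exact value set as an assumption).

```python
import random

def random_instance(edges, r=7, seed=0):
    """Draw one coupling per edge, independently and uniformly from the
    2r nonzero values {-r, ..., -1, 1, ..., r}.  Range 1 gives the
    familiar +/-1 spin-glass couplings."""
    rng = random.Random(seed)
    nonzero = [j for j in range(-r, r + 1) if j != 0]
    return {e: rng.choice(nonzero) for e in edges}

# Toy edge list standing in for the Chimera graph's edges.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
J = random_instance(edges, r=1)
print(J)
```

Sampling many instances from this distribution and taking a high percentile of the solve times is what produces the curves discussed above.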

Hello Alex: I wonder if you have seen this latest study of benchmarking the DW2 against some optimised software, including yours, by Dr. Itay Hen of USC’s ISI, at a recent conference on Adiabatic Quantum Computation. Thanks:

http://www.isi.edu/sites/default/files/top_level/events/aqc2014/day4-talk4-Hen.pdf

Yes, I saw that and it looks very interesting. (By the way, the author informs me that we should disregard the slides after p.46 as these were not meant to be included in the upload and may possibly be erroneous.)

Hello Alex,

Do you find strategy 13 scales better than plain strategy 3? Also, how are you getting these 99% confidence timings, as I don’t see any options in the code on github (I’ve modified my local copy to time stabletreeexhaust directly). We have found that using precise timing of the stabletreeexhaust function to estimate the actual 99th percentile of the times to solution doesn’t actually match the expectation given from the assumption that the times for a single instance are Poisson distributed. It’s close but not quite right, and varies from instance to instance.

Hi Joshua,

Yes, strategy 13 appears to scale (a bit) better than strategy 3. They are similar for N<=392 or so, but S13 is better for larger N.

Did you mean to talk about the times to solution (TTS) being exponentially distributed (which would make the number of solutions found in a time interval Poisson)? You are right to question this, and I think the results above should be adjusted slightly as mentioned below, though I don’t think it will make that much difference.

I found that these times fitted an exponential distribution well, apart from at very small problem sizes (N<=72), where it wasn’t such a good fit. (Not so surprising: for larger N, the chance of any one attempt finding the solution is small, so a solution is the result of many quasi-independent attempts at finding a rare thing.) On the exponential/Poisson assumption, the 99th percentile time should be log(100)*TTS ~= 4.6*TTS, which is what I used for the above graphs (which were made by the file makegraphs2.py in the repository, if you want to look).

This 4.6*TTS approximation can be separately tested (as it may hold even when the full Poisson approximation fails) by comparing with the actual 99th percentile time, and on the examples I tried it agrees very well, apart from N<=32 where 4.6*TTS overestimated the 99th percentile time. The reason I didn’t just use the actual 99th percentile time for the graphs was that I had already done the runs before with TTS output only and didn’t want to rerun them, and also because it is more accurate to estimate TTS when there aren’t so many attempts. (I prefer TTS as a measure, but I obviously wanted to make something comparable with the Rønnow paper.)

In summary: I used log(100)*TTS for the above graphs of S13 99th percentile times, which I believe are accurate for N>=72, but should be reduced a little for N<=32.
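The log(100)*TTS rule is easy to sanity-check in isolation: for an exponential distribution with mean TTS, the p-th quantile is -log(1-p)*TTS, so the 99th percentile is log(100)*TTS. A quick simulation (with a made-up TTS value) confirms the two agree:

```python
import math
import random

rng = random.Random(1)
tts = 2.0  # hypothetical mean time-to-solution, seconds (illustrative)

# Draw many exponentially distributed solve times with mean tts.
samples = sorted(rng.expovariate(1.0 / tts) for _ in range(200000))

p99 = samples[int(0.99 * len(samples))]  # empirical 99th percentile
approx = math.log(100) * tts             # log(100)*TTS ~= 4.6*TTS

print(f"empirical p99 = {p99:.3f}, log(100)*TTS = {approx:.3f}")
```

When the per-instance times are not exactly exponential (as for small N), the empirical percentile and the log(100)*TTS estimate drift apart, which matches the instance-to-instance variation Joshua observed.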

Hello Alex: Here is a recent talk by Dr. Colin Williams of D-Wave, at U.of Cambridge, about some results from their latest 1152-qubit chip, which you might find interesting. Thanks:

http://sms.cam.ac.uk/media/1804114

Hi Sol. Thank you for pointing out that talk.

The relevant bit to this discussion is 27:50-28:20 where Dr Williams says that the new D-Wave chip scales better than one of the algorithms I use (the one I call S3).

I don’t agree with what he says, or at least I think it is misleading, for the reasons given below. (In summary: D-Wave only manages to show a speed improvement as a function of problem size on easy families of problems, ones that take negligible time to solve compared with known hard families.)

To explain a bit first, Williams is quoting, or reproducing, the results of Itay Hen who has devised a family of problem instances for which there is a “planted” optimal solution. This means that even for very large problem instances one can know whether a heuristic solver has found the optimal solution, without the need for an exhaustive (proving) search. It is one of these families that Williams is basing his comparison on. However:

1. The family of problems Williams has chosen is not the most difficult family of problems available using Hen’s technique, because he (Williams) has chosen the clause density to be higher than the hardest value.

2. Even the most difficult problem that can be created using Hen’s technique (using loops, at least – other variations may be different) is relatively easy compared to a simple random +/-1 instance.

3. When the comparison is made between D-Wave and S3 (or S13) using a family of hard Hen instances, D-Wave appears to scale worse: the time required grows faster as a function of instance size for D-Wave than for S3.

These statements can be checked by looking at this paper of Hen et al, where the algorithm I have been calling S3 is called HFS. The Fig. 6 graph shows that for the red lines (high clause density, easy family), D-Wave times are indeed growing more slowly than for HFS, but for the light blue lines (medium clause density, hard family) the D-Wave times are growing faster.

From the abstract, the authors say

“While we do not find evidence for a speedup for the hardest and most frustrated problems in our problem set, we cannot rule out that a speedup might exist for some of the easier, less frustrated problems.” [“Speedup” meaning D-Wave time grows more slowly than classical time.]

It should be said that Williams is using a newer chip (Washington, 1152 qubit) than Hen et al were able to work with, and it is possible that it scales better than the older one because it is less noisy, or due to some other quality improvement, so that is an unknown from my point of view. On the other hand, it is certainly true that S3 (which Williams and Hen have used for testing) is not the best version of my algorithm. S13 (pictured above) scales considerably better, and S14 (treewidth 2) better still. (S13 is better than S3 in absolute terms for all non-trivial sizes, but S14 only becomes better in absolute terms than S13 for about N>=1152.) There are also other (unpublished) classical algorithms that are competitive with, or better than, S13/S14.
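For readers unfamiliar with planted instances, here is a sketch of the frustrated-loop idea as I understand it; the function name, loop choices and coupling magnitudes are my own illustration, not Hen’s actual construction, whose details may differ. Each loop gets ferromagnetic couplings except for one antiferromagnetic edge, so every spin configuration must frustrate at least one edge per loop, and the planted state frustrates exactly one; a random gauge transformation then disguises the planted state without changing the energy spectrum.

```python
import random

rng = random.Random(0)

def plant_loops(loops, n):
    """Build couplings J with a known (planted) ground state on n spins.
    In the gauge where the planted state is all-up, each loop gets J=-1 on
    all its edges except one edge with J=+1.  On a cycle an odd number of
    edges must have s_i*s_j=-1 is impossible, so at least one edge per loop
    is always frustrated; all-up frustrates exactly one (the +1 edge)."""
    J = {}
    for loop in loops:
        edges = list(zip(loop, loop[1:] + loop[:1]))
        flip = rng.randrange(len(edges))  # the one antiferromagnetic edge
        for k, (i, j) in enumerate(edges):
            key = (min(i, j), max(i, j))
            J[key] = J.get(key, 0) + (1 if k == flip else -1)
    # Random gauge: s_i -> g_i*s_i, J_ij -> g_i*g_j*J_ij hides the solution.
    g = [rng.choice([-1, 1]) for _ in range(n)]
    J = {(i, j): g[i] * g[j] * v for (i, j), v in J.items()}
    return J, g  # g is the image of the all-up state, i.e. the planted state

def energy(J, s):
    return sum(v * s[i] * s[j] for (i, j), v in J.items())

# Two edge-disjoint loops on 6 spins; planted ground energy is -1 + -2 = -3.
J, planted = plant_loops([[0, 1, 2], [2, 3, 4, 5]], 6)
print(energy(J, planted))
```

Because the ground energy is known by construction (one frustrated edge per loop), a heuristic solver’s success can be checked at sizes far beyond the reach of proving solvers, which is the whole point of the technique.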

Hello Alex:

I wasn’t sure if you had seen this latest study by James King et al., about the results of D-Wave’s latest 2,000-qubit chip, and their claim that it beat all classical solvers, including yours, by some 2,600x. I know you are very busy and have far more important things to worry about, but I thought I should bring it to your attention anyway. Thanks.

https://arxiv.org/abs/1701.04579

Best wishes,

Sol Warda