D-Wave: comment on comparison with classical computers

Update (20 June 2014): This is a comment on the paper by Rønnow et al which was published in Science yesterday.
Update (7 September 2013): There is now a successor page to this one, discussing harder problem instances.
Update (3 August 2013): The QUBO heuristic now has a more detailed description. The program has been improved and the corresponding results have been updated.
Update (29 June 2013): This has been rewritten to incorporate updated programs and results and more detailed explanations. Apologies that this renders comments below (including my own) that are prior to 29 June, out of date. In particular, all heuristic results have been verified by two independent exact search programs.

Introduction

In May 2013, there were two D-Wave-related papers published that attracted some interest.

“D-Wave Paper [1]” (Boixo et al) concerned D-Wave One (108 qubits) and gave some indirect arguments that the device is behaving like a genuine quantum annealer.

“D-Wave Paper [2]” (McGeoch-Wang) concerned the more recent D-Wave Two machine (439 qubits, potentially 512). It provides comparisons of D-Wave Two (together with a software layer called Blackbox) against some combinatorial optimisation solvers running on classical hardware for three specific tasks.

Some background reading on D-Wave is available at Scott Aaronson’s Blog (somewhat sceptical) and Nature (somewhat favourable).

The conclusion of [2] is that D-Wave outperforms the software against which it was tested. In the case of the first problem, D-Wave is reported as outperforming its nearest rival (CPLEX), amongst those tried, by a factor of 3600.

Co-author of [2], Catherine C. McGeoch, is careful to say (and I’m grateful for her personal communication) that her tests were a comparison between D-Wave and some specific software packages, so it would be improper to conclude anything about D-Wave vs classical computers as a whole. However, that has not stopped many commentators, including from the scientific community, doing just that, the take-home message being that quantum computing has come of age and “D-Wave is 3600x faster than classical computers”. Google, NASA-USRA and Lockheed-Martin have made purchases, and apparently Google intends to use the machine to find parameters for their AI models and Lockheed-Martin for the “test and measurement of things like jet aircraft designs, or the reliability of satellite systems”.
See, e.g.,
“3,600 times faster than a leading conventional computer when working on…”
“…3,600 times as fast as the conventional system”… “…50,000 times faster”
“…for some types of problems the quantum computer was 3,600 times faster than traditional supercomputers”
Quantum computer passes speed test (Nature)

I was curious to see whether these claims could be true, and whether the current D-Wave machine could possibly be useful as a computational device right now. (The short answer, I believe, is “no”, if you don’t want to read any further.) In this discussion, “current D-Wave device” refers to D-Wave Two with the V5 or V6 chip, which are the latest versions about which any information has been published. It is the V5 (and to some extent, the V6) device that is the subject of detailed tests in [2], which makes it possible to compare it with alternatives here.

The method was to write simple programs to solve two of the tasks tested in [2], in particular for the first task there (i.e., QUBO, which is the second one presented here) which is the task native to D-Wave, and see how they compare with the published D-Wave results. I am sure that you can improve a lot on these programs, but the point was to see if a simple idea would be enough.

(I should say up front that I do not think this is the most interesting question about the D-Wave device, which to me is whether, and to what extent, it is doing anything “clever” – non-trivially non-classical – and how might this develop, what limitations it may or may not have, etc.. However, the question of performance of a particular device does at least have the merit of being relatively easy to answer with some degree of certainty, and it is still at least somewhat interesting in that it addresses the question of whether quantum computers have at last come of age.)

Summary conclusions

1. Prog-QAP on a single core of a normal desktop processor is at least 120 times faster, and the C version (not stored here) at least 12,000 times faster, than the Blackbox/D-Wave hybrid solver, as it is used in [2], at optimising the Quadratic Assignment Problem (QAP). These figures give Blackbox/D-Wave the benefit of the doubt in respect of various unknown quantities and optimisation possibilities that weren’t implemented in the tests in [2] themselves (see below). However, the real conclusion of this test is that the method used to implement QAP is inefficient, rather than anything fundamental about D-Wave. The QUBO comparison (below) is a better measure of D-Wave’s capabilities.
• Caveat: The above speed ratio calculation assumes that Blackbox/D-Wave uses the same basic method as the one used for the QAP tests carried out in [2]. That is to say, it is possible that there are ways to tune Blackbox and different algorithms altogether that Blackbox/D-Wave could be using that would be better.
• Caveat to the caveat: in a sense there is obviously a better algorithm that produces a speed ratio of 1, in that D-Wave could be left unused, and Blackbox could copy whatever classical algorithm it is being compared with. This shows that when considering the best Blackbox algorithm, either these need to be restricted to those that “use” D-Wave, or (more well-defined) it is best only to look at the case where the speed ratio is less than 1, i.e., when there is actually an advantage to using D-Wave.
• Caveat: The collection of 33 QAP tests in [2] is treated as one single aggregate test for simplicity. This makes it possible to come up with a single figure for the relative speeds of solvers, but it is likely that the speed ratio for individual problems would be different from this overall figure.
2. Prog-QUBO is the main focus of this discussion. It is compared with D-Wave chips solving their native Quadratic Unconstrained Binary Optimization (QUBO) problems as described in [2] (though see below for a correction). The result is that on a single core, Prog-QUBO is approximately 160 times faster at finding the optimum than D-Wave V5 for a median difficulty size 439 problem, 31 times faster than D-Wave V6 for a median difficulty sized 439 problem, and 19 times faster than D-Wave V6 for a median difficulty size 502 problem. More favourable or less favourable comparisons can be made depending on various choices (some of which are detailed below), though the conclusion (I believe) is always similar, that a desktop computer can easily beat these D-Wave chips at their own game, and so I believe that these D-Wave chips do not at the moment provide useful computation since you could always replace one in any situation with a software QUBO solver functioning as a universal D-Wave simulator.
• Caveat: The test data from [2], including the hardware graph, instances used and full results, have not been made public yet. The above comparison instead runs tests on randomly generated instances that are meant to be similar (weightmode5 with a random subgraphs of sizes 439 or 502). It would be possible to make more definitive statements if and when the actual cases become available for perusal. However, there are reasons to believe that these instances are similar to the actual ones (see below).
3. QUBO instances in [2] (with correction) are easier than the hardest in the class of instances that could be set up on a D-Wave machine. It would be interesting to see how D-Wave performs on these more testing problems.
4. In a rather speculative bid to guess how “clever” D-Wave is, two different scaling comparisons are presented between D-Wave and various forms (hobbled and otherwise) of Prog-QUBO. However, these two comparisons give apparently opposite answers.

In more detail:

The Quadratic Assignment Problem (QAP) comparison

This is something of a warm-up task. Prog-QAP and full results for this task is available here. This problem makes for an easy comparison, because all the test cases are noted at the end of [2] and are publicly available from QAPLib.

Prog-QAP is a very simple and short python program which starts with a random permutation, and tries transpositions until no more benefit is possible, then it repeats until it runs out of time. The idea is to do an equal-quality-compare-time comparison between this and the various solvers reported in [2], which is carried out (approximately) by varying the running time of Prog-QAP until the overall quality of all solutions is similar to that of the best solvers in [2].

The result is that in 1 minute per problem (on a single core of a normal desktop computer) it performs much better than the best of the Blackbox/D-Wave + software solvers used in [2], and a running time of about 1.6 seconds is required to equalise the overall quality in the sense of there being as many problems for which Prog-QAP produces better answers as there are for which it produces worse answers. Detailed results are in the github repository. The C code vesion (not stored here) runs about 100x faster (thanks to Hannes Uppman for verifying this), so finds the same overall quality of answers in 0.016 seconds.

In the tests of [2], Blackbox/D-Wave ran for about 4.4 hours per problem (6 days elapsed time divided by 33 problems), though McGeoch (pers comm) points out that her tests were designed to extract useful information from Blackbox/D-Wave rather than to run it optimally fast, so the 4.4 hour figure is somewhat notional.

One reason that the Blackbox implementation could run faster is that it is asking D-Wave for 10000 samples for each query it makes to this hardware, as this is the default Blackbox setting. (This is the value called $k$ in [2], where owing to a typo it is reported that the default value is 1000.) However, it is probably true that many fewer samples are actually required in practice.

We (the public) don’t know (at the time of writing this) how many queries were made to D-Wave, how many samples would have been needed for each query, or how much conventional CPU time was used by Blackbox, but we may still proceed by conservative estimates.

For each query, there are at most 439 objective function evaluations (ofes) to set up the query and at most $k$ (10000) to analyse the results. For each problem, Blackbox ran to $10^7$ ofes, so there were at least $10^7/(10000+439)=958$ queries for each problem. The time for a query on the V5 chip is $(201+0.29k)$ms, where $k$ is the number of samples requested, reflecting a 201ms setup time and a 0.29ms anneal time. Assuming only a single sample were required for each query then that would require $958*0.20129 = 193$ seconds of D-Wave time.

In this scenario (minimum number of queries), Blackbox would have also used at least (6/33)days – 958*(201+.29*10000)ms = 3.5hours (classical) CPU time per problem, but we’re not sure how many queries there really were (which might lead to inferring different amount of CPU time spent) or how much this classical side to Blackbox can be improved, so we’ll write off all this CPU time and assume it can be removed entirely, leaving 193 seconds as the final minimum running time for Blackbox/D-Wave. This compares with 0.016s taken by the C version of Prog-QAP, which is therefore at least 12,000 times faster by this measure. Even using V6 chip, with a query time of $(36+0.126k)$ms, the program would be at least 2000 times faster than Blackbox.

We can go further and look at the effect of completely eliminating the D-Wave setup time (the constant component of the query time), though this is not currently possible as far as I am aware. This would give a speed ratio of Prog-QAP over a V6 chip with no setup time of at least $958*.000126/.016=7.5$.

In these comparisons, Prog-QAP was running on a single core of a normal desktop computer. Since the method is “memoryless” (it repeatedly starts from a random permutation), it readily parallelises, so it is reasonable to imagine dividing the running time of 0.016s by the number of cores available.

There is a certain amount of hand-coding that has gone into implementing the D-Wave version of this problem. The paper refers to a “more compact transformation” that results in 5.5 times smaller instances presented to Blackbox/D-Wave. This transformation is unspecified as yet (though it should be noted that this paper is a shortened version of the full journal version which is to appear).

Of course, Prog-QAP “knows” about permutations and is not a general combinatorial optimisation solver. Despite being a very simple program and far from the best possible, it is in some sense a dedicated or specialised solver. The conclusion is that the method used to convert QAP into QUBO for Blackbox, and the method used for the reviewed software solvers, are extremely inefficient. Thus this does not shed any light on to what the D-Wave chip is doing, but it does show that it is not sensible to use it in this way to solve QAP problems. This is pretty obvious really: to expand a permutation, which is describable with $n$ numbers, into an array of $n^2$ (or worse) variables and constraints is a huge upfront hit, especially when you consider how awkward it would be to move from one permutation to another in this expanded form.

The QUBO comparison

Since QAP was really too easy a win for software over Blackbox/D-Wave, the interesting comparison is going to be on D-Wave’s territory with its native problem of Quadratic Unconstrained Binary Optimisation (QUBO) or finding the minimum energy state in a spin glass with a particular graph. That is, the task is find $x_i\in\{0,1\}$ to minimise
$$\sum_{i,j}Q_{ij}x_ix_j,$$ where $i$ and $j$ range over vertices and $Q_{ij}$ is only allowed to be nonzero for edges $ij$.
(Strictly speaking, the form of this problem native to D-Wave is that of the Ising Model with external fields, where the state variables can take the values $\pm1$, but this is equivalent to QUBO by a simple transformation.) If you can better D-Wave at this QUBO, then obviously you can better it on any other problem that uses D-Wave as a backend, just by replacing D-Wave by your QUBO program. In this case I believe any algorithm, specialised or not, is fair game for the purposes of deciding whether D-Wave is currently a useful computational device. (Not for the purposes of deciding whether D-Wave really does do quantum annealing, which is a separate question.)

D-Wave Two (V5) uses a 439 vertex subgraph of the full 512 vertex graph, the Chimera graph $C_8$, and the V6 chip uses a 502 vertex subgraph, but as far as I am aware, these precise subgraphs have not been published. Since these subgraphs arise, perforce, from just using the vertices (flux loops) that can be made to work at the time, they are presumably not chosen to have a particular pattern and it seems reasonable to choose a random subset here for testing purposes. (The corresponding subgraph for an earlier D-Wave chip is published in [1] and doesn’t seem to have an obvious pattern, though perhaps the defects tend to be nearer the edges.) In practice, the subset chosen didn’t seem to make much difference.

An important point is that the problem description in [2] is not correct in saying that the entries $Q_{ij}$ were chosen randomly to have the values in $\{-1,1\}$. ($i$, $j$ label vertices of the 439 vertex subgraph of $C_8$, and $Q_{ij}$ is allowed to be nonzero when $i=j$ or when $ij$ is an edge of the 439 vertex graph.)

The procedure that was actually used in this paper was to choose Ising model parameters $J_{ij} (i<j)$ and $h_i$ to be IID in $\{-1,1\}$ and then the problem of minimising $-\sum_{i<j}J_{ij}s_i s_j-\sum_i h_i s_i$ over spins $s_i\in\{-1,1\}$ (native form for D-Wave) was converted to QUBO form (for the software solvers) by substituting $s_i=2x_i-1$. This amounts to setting $Q_{ij}=-4J_{ij} (i<j)$, $Q_{ij}=0 (i>j)$ and $Q_{ii}=2(-h_i+\sum_{j<i}J_{ji}+\sum_{j>i}J_{ij})$. (Obviously you can also shuffle weight between $Q_{ij}$ and $Q_{ji}$ without changing anything.)

This different problem description makes a big difference because the instances are much harder than if they arose from choosing $Q_{ij}$ IID $\in \{-1,1\}$, or any one of several variants like this starting with $Q_{ij}$ (such as enforcing symmetry, or upper-triangularity). The difficulty of these QUBO instances appears to be intermediate between those arising from the description in [2] (which are extremely easy) and those arising from the Ising model description with $h_i=0$ (which are quite hard).

Prog-QUBO is available here (along with full test cases and results). This program works in several modes of operation. In its heuristic search mode with “strategy 1” or “strategy 3” (dubbed Prog-QUBO-S1 and Prob-QUBO S3 or S1 and S3 for short, the modes mainly considered here) it starts with a random state, then optimises over rows and columns, or trees, of the 8×8 graph of $K_{4,4}$s until no more benefit is possible. Then it repeats until it runs out of time. Optimisation over a row, column or tree is efficient because they are “thin” in the sense of treewidth: they can be built up semi-$K_{4,4}$ by semi-$K_{4,4}$, at each stage only having a small internal boundary. Further details are given in AppendixB.

It is a slightly delicate matter to judge the performance of heuristic (non-proving) solvers. One method of evaluation would be to see what the best value they can produce in a given amount of time. This has the drawback that you can’t get speed comparison ratios (so you can’t say, for example, how many D-Waves are equal in value to how many CPUs), and it will also strongly depend on the time you choose to allow.

The other kind of method is to keep the quality of answer fixed and measure the time taken to arrive at this answer. This obviously depends on the quality level chosen, but here the problems are sufficiently easy (taking milliseconds) that it makes sense to ask for the optimum. It may be that for a particular real-life use, finding a quick good value may be more important than finding the optimum, but at least the optimum is an objective standard by which we can compare solvers. In any case, it is the only real basis of comparison we have available from the published information in [2], since we don’t have access to a complete breakdown of the answers produced by the D-Wave device. Section 4 of [2] attempts to reverse-engineer the time taken by the D-Wave chip to find an optimal answer, and we may attempt to mimic this to make a comparison.

To this end, we define the Time-to-Solve (TTS) for a heuristic solver operating on a given instance to be the average time taken to find the optimum value. It is assumed that the solver may be using a random algorithm, and the average is meant to be over this space of randomness. There are a couple of points arising.

The first point is that this single measure (TTS) does not take into account how confident the solver is that its solution is optimal. Of course, it cannot be sure the solution is optimal (otherwise it would be a proving solver, not a heuristic one), but it may be confident to a greater or lesser extent, and this degree of confidence may or may not be important for the end use. A more discerning performance measure might include two times: the expected time to find the first solution (TTS) and the expected time to be (say) 99% confident that the best solution obtained so far is the optimum, but here we disregard the time-to-be-confident and just consider the TTS as the performance measure. This is like having an oracle which knows the true optimum causing the heuristic program to halt at the point it finds the optimum. Fortunately, whatever the shortcomings of this measure, they apply more-or-less equally to D-Wave and Prog-QUBO, because both work (broadly speaking) by quickly making lots of trial solves from random start positions, and the only means they have to be confident that the solution obtained is optimal (or has a given quality) is to look at the distribution of values (energies) obtained. If the lowest value obtained has occurred many times then that suggests that it may be optimal.

The second point arising is that if we measure the average by asking the solver to find (say) 500 solutions and then dividing the time by 500, we need to be sure that it is finding 500 independent solutions, i.e., that no information from one solution gets used to help find another. It may be trivial to find many variants of a given solution, but we are interested in the time taken to solve from scratch, because in a real application we probably only want one solution: we are only finding lots of solutions here to get a more accurate time.

To summarise, the measurement procedure would ideally be this:

1. Repeat 500 times:
2. Randomise the configuration and clear all state
3. Repeat:
4. Produce candidate configurations until you hit one whose value is $v_\min$

where $v_\min$ is the optimal value. “Configuration” refers to those of the binary variables associated with each vertex, and “state” is auxiliary statistical information accumulated on the way. However, this procedure depends on knowing $v_\min$, and it is more convenient to have an autonomous evaluation procedure that doesn’t rely on knowing this optimal value beforehand. So the actual procedure used was this:

1. Set $v^*=\infty$
2. Set $n=0$ and timer=0
3. Repeat until $n=500$:
4. Randomise the configuration and clear all state
5. Keep producing next candidate configuration until you find one of value $v\le v^*$
6. If $v<v^*$ then set $v^*=v$ and goto 2.
7. ($v=v^*$) $n=n+1$

If $v^*$ manages to find its way down to $v_\min$, which was verified later to have happened in every case tried, then the elapsed time since the last time the timer was set to 0 will be the same (in expectaction) as in the ideal first version of the procedure above. This requires finding 501 optimal values to measure the time to find 500, because the first optimal value will have been found under conditions of doing extra resets, so the time taken to find the first optimal value will have the wrong expectation.

There is a convenient side-effect of asking Prog-QUBO-S* to find 500 independent optima, namely that in practice it can do so “unsupervised”, i.e., without being told the true optimum value. If it keeps doing random runs until the lowest value it has obtained has occurred 500 times, then (for the problems here) it is highly likely that this value is the true optimum. In fact, Prog-QUBO uses the value 500 as its default setting and in all 6000 instances tried, the value so obtained actually was the true optimum. This is due to branching ratios from one value to another: for the problems considered here, the number of occurrences of value $x+\Delta$ divided by the number of occurrences of value $x$ ($\Delta$ being the smallest change possible for the problem in question) is in practice almost always lower than 50, and usually much lower. If this branching ratio were 50, then the chance you have missed the optimum after collecting 500 lowest-values-obtained is about $\exp(-500/50)=4.5\times10^{-5}$.

(There is another convenient side-effect. The TTS can be rather small, e.g., 0.3ms, so to get an accurate measurement of CPU time using a system clock with an accuracy of 10ms it is necessary to repeat the run several times.)

To summarise: for each problem instance, Prog-QUBO-S1,-S3 are run for as long as necessary to find 500 independent optimal solutions. (These are later checked for optimality.) The TTS is taken to be the running time divided by 500, and is used as the performance measure. This is intended to be directly comparable to the analysis in section 4 of [2], where the expected Time-to-Solve (TTS) for D-Wave is calculated based on the number of solutions that appear in a sample of 1000 anneals. This D-Wave analysis is also effectively assuming that an oracle will step in and halt the search as soon as the device finds an optimum. In neither case is the Time-to-Solve the same as the Time-to-Solve-and-be-confident-you-have-the-optimum.

Prog-QUBO was run in heuristic mode, single-threaded on a single core of a standard desktop processor on 6000 random test instances (available in the github repository), split as 1000 instances of three different types of problem run with two different strategies. The setup details are available in Appendix A.

The three sets of 1000 are:

• Set 1: size 439, weightmode 5, QUBO examples, intended to be as similar as possible to those on which the V5 chip, and V6 chip in “V5 downgrade” mode, were tested.
• Set 2: size 502, weightmode 5, QUBO examples, intended to be as similar as possible to those on which the V6 chip was tested.
• Set 3: size 439, weightmode 7, (no-external-field Ising Model examples in QUBO form), intended to be harder instances for testing.

Set 1. V5, V6 chip comparison with Prog-QUBO on size 439 graph

Size 439 (weightmode 5) QUBO results:

 Sample size TTS (mean) TTS (median) TTS (80%-ile) TTS (90%-ile) Percentage optimal D-Wave-V5 100 $\ge$224ms 202ms 214ms 265ms 85-100% D-Wave-V6 1000 $\ge$56ms 37ms 40ms $\ge$50ms 85-100% D-Wave-V5 excluding setup 100 $\ge$23ms 1.4ms 13ms ~64ms 85-100% D-Wave-V6 excluding setup 1000 $\ge$20ms 0.63ms 3.8ms $\ge$14ms 85-100% Prog-QUBO-S1 1000 2.8ms 1.7ms 3.5ms 5.2ms 100% Prog-QUBO-S3 1000 1.6ms 1.2ms 2.1ms 2.9ms 100%

(Prog-QUBO lines in above table constructed from data here, here. Prog-QUBO running single-core, single-threaded; full test conditions explained here. D-Wave lines constructed from data from Fig. 6 following the method of section 4 of [2].)

(Graphs made using this program. Last few D-Wave sample values, especially for the V5 cases, are unreliable and may be underestimates for the reasons given below.)

The values in the D-Wave-V5 line of the above table were calculated from the information in Fig. 6 of [2], and the methods of section 4 of that paper. The last column of Fig. 6 consists of 100 circles, each representing, for its corresponding instance, the number of times the minimum found value showed up in a sample of 1000 anneals. These values (extracted directly from the pdf) are reproduced here, and in sorted form here. As an example, to calculate the TTS(80%-ile) D-Wave-V5 figure, we first look at the 20th and 21st multiplicities in the sorted list, which are 21 and 25. We expect the 80th percentile number multiplicity to be between these two values, so call it 23. This then implies that 1000/23 anneals are necessary on average to reach an optimum and this takes 201+1000/23*.29=214ms. Obviously the D-Wave times for easy/medium instances are dominated by the setup component. In fact the setup component may be a slight overestimate because the figures reported in [2] were upper bounds estimated from the maximum of several example values. The graph on the right shows what the V5, V6 timings would look like without the setup component entirely.

The “percentage optimal” column is not meant to be a measured outcome from the comparison – it wouldn’t be fair, because the programs and D-Wave devices ran for different amounts of time – but it points to some uncertainty (at the time of writing) as to which of the 100 D-Wave test instances from [2] produced optimal answers. It doesn’t make a great deal of difference to the overall comparison as we could assume that the D-Wave device found optimal answers in all cases and the conclusion (that D-Wave is currently not as fast as software) would be the same, but all the same it is better to make a like-for-like comparison if possible. To a large extent this uncertainty can be mitigated by not delving too far into the tail of very difficult test instances. That is, it makes sense to compare an instance of median-difficulty for Prog-QUBO with one of median-difficulty for D-Wave, and it probably makes sense to compare 90th percentile-difficulty instances, since we don’t care if the harder 10% were solved slowly or not solved at all and we expect the easier 90% to have been solved optimally, but it is more risky to compare (say) 98th percentile-difficulty instances since all of the 98% of easier instances were solved optimally by Prog-QUBO, but might not have been solved optimally in the case of D-Wave.

There is another reason not to compare 98th percentile-difficulty instances, or anything beyond about the 90th percentile, and that is that the statistical uncertainty from the set of D-Wave tests would be too great, since we are only working from a set of 100 instances. In fact 1000 n=439 instances were tested on D-Wave, but only 100 of these were solved optimally by CPLEX, due to speed limitations at the time, and only the information about these 100 has been published in any detail. Looking at the above method of calculation, it is apparent that when the number of minimum values found gets too low, there will be a large uncertainty in the D-Wave time. It is already somewhat uncertain for the 90th percentile difficulty problem (since we are dividing 4 or 5 into 1000), and this uncertainty would be much larger for a more difficult instance.

It looks like Prog-QUBO-S3 is about 31x faster than D-Wave-V6 for a median problem, 19x faster for a problem in the 80th percentile of difficulty, and $\ge$17x faster for a 90th percentile problem. It looks like the D-Wave-V6 device is “catching up” as the problems get harder, but in reality it is the other way around due to the constant setup time (36ms) included in the D-Wave figures. From the point of view of what the V6 device can actually do, it is reasonable to include the constant setup time, but from the point of view of following trends, the setup time acts as a distortion. See below for a setup-less comparison and a discussion of trends.

Set 2. V6 chip comparison (size 502 graph)

Besides being physically faster than the V5 device at setup and readout, then V6 can handle problems up to size 502, which may be a more important difference as these instances can be considerably harder than those of size 439. Test results from the V6 device are less detailed than those from the V5 device and are described in [2] as “very preliminary”. Nonetheless, there appears to be enough information in that paper to make a comparison. This time the important information is contained in Fig.8 from which (again inspecting the pdf) it is possible to discover the number of minimum found values for 1000 instances, each having 10000 sample values (anneals).

Unfortunately, we have no way of knowing which of these values correspond to optimal solutions, which would be useful to know to make a like-for-like comparison with Prog-QUBO-S3 (which is heuristically finding answers that are verified to be optimal). But again we can more-or-less sidestep this problem by not going beyond about the 90th percentile of difficulty. This time the assumption is that if the minimum value occurs more than about 50 times in D-Wave’s 10000 samples (which happens for problems of up to about the 90th percentile of difficulty) then it is reasonably likely that it is the optimum. This assumes that D-Wave’s branching ratios are somewhat well-behaved (which is actually not completely clear from Fig. 5), but again any errors in these assumptions would be in D-Wave’s favour, so we don’t particularly care since these wouldn’t affect our overall conclusion (that D-Wave appears not to be currently as fast as software).

Size 502 (weightmode 5) QUBO results:

 Sample size TTS (mean) TTS (median) TTS (80%-ile) TTS (90%-ile) Percentage optimal D-Wave-V6 1000 $\ge$60ms 36.9ms 42.1ms 59.3ms 85-100%? D-Wave-V6 excluding setup 1000 $\ge$24ms 0.89ms 6.1ms $\ge$23ms 85-100%? Prog-QUBO-S1 1000 5ms 3.1ms 6.8ms 10ms 100% Prog-QUBO-S3 1000 2.6ms 1.9ms 3.5ms 5.0ms 100%

(Prog-QUBO lines in above table constructed from data here, here. Prog-QUBO running single-core/single-threaded; full test conditions explained here. D-Wave lines constructed from data from Fig. 8 following the method of section 4 of [2].)

The V6 chip takes (36+0.126k)ms to do one setup and anneal k times, though again these times may be slight overestimates.

Prob-QUBO-S3 is about 19x faster for a median problem, 12x faster for a 80th percentile problem and also about 12x faster for a 90th percentile problem. Similar comments to the V5/439 comparison apply.

Set 3. No external field test (size 439 graph)

We might also wonder how D-Wave performs on the hardest form of its native problem, i.e., the set of couplings/weights for the Chimera graph that produces the hardest problem. While probably not the most difficult form, a harder class of problems (for the heuristic solver) of the type that are directly encodable into D-Wave occurs when $J_{ij}$ are IID $\in \{-1,1\} (i<j)$ and $h_i=0$. As noted in [1], $h_i$ being non-zero biases the $x_i$ and makes the problem much easier, and experiments using Prog-QUBO confirm this. That paper has test results using $h_i=0$ instances, but only for the older 108 qubit device, and instances of this size are sufficiently small to be solvable very easily, even using only local searches, so we lack any D-Wave comparison for these size 439 tests but the software results are printed here for interest.

Size 439 (no external field, weightmode 7) QUBO results:

 Sample size TTS (mean) TTS (median) TTS (80%-ile) TTS (90%-ile) Percentage optimal Prog-QUBO-S1 1000 34ms 22ms 46ms 69ms 100% Prog-QUBO-S3 1000 16ms 12ms 22ms 31ms 100%

(Prog-QUBO lines in above table constructed from data here, here. Prog-QUBO running single-core/single-threaded; full test conditions explained here.)

The original Prog-QUBO no-external-field instances derive from “Ising Model mode” with $h_i=0$, which can be implemented using the program options “-w3 -x-1” which allows state variables to take values in {-1,1}. This is (distributionally) equivalent to using weightmode 7 (“-w7”), which constructs instances of these problems in QUBO form, i.e., with state variables taking values in {0,1}. The second form is used here since the other problems are also in QUBO form.

Optimal values and proving solvers

The 3000 test instances described above were all solved by Prog-QUBO-m5 (Prog-QUBO in proving mode, option -m5), i.e., a mode which guarantees to find the optimal answer. In each case the value found by Prog-QUBO in heuristic mode was in fact the optimal answer. These answers were all subsequently independently checked by Jean-François Puget using CPLEX.

Interestingly, the Prog-QUBO-m5 finds Set 2 (1000 size 502 weightmode5 instances) a lot harder than Set 3, even though it is the other way around for the heuristic solvers.

Full output for the proving runs can be found here: Set 1, Set 2, Set 3.

The timing comparison with the heuristic solver can be found in the summary files in these directories: Set 1, Set 2, Set 3.

Table of timings for proving runs.

 Problem set Sample size TTS (mean) TTS (median) TTS (80%-ile) TTS (90%-ile) Prog-QUBO-m5 Set 1 1000 1.8s 1.6s 2.5s 3.4s Prog-QUBO-m5 Set 2 1000 110s 110s 150s 170s Prog-QUBO-m5 Set 3 1000 13s 11s 18s 24s

(This table built from summary files in these directories: Set 1, Set 2, Set 3. Prob-QUBO running single-core/single-threaded; full test conditions explained here.)

CPLEX as used in [2] took a little under 30s for a median instance and between 1min and 10min for 80th and 90th percentile difficulty instances. Since then, Puget has found ways to improve CPLEX’s performance, and reports that CPLEX can solve a Set 1, 2, 3 problem in an average time of 4.8s, 41s, 79s respectively, using 32 threads. He uses a variant kind of average which is between the geometric and arithmetic mean, but on these problem sets this average is quite similar to the arithmetic mean (and median).

Prob-QUBO-m5 works by keeping track of which of the 256 states of each $K_{4,4}$ are necessary to consider as part of an optimal solution. Initially all 256 states are allowed for each of the $8^2$ $K_{4,4}$s. Then for each possible set of allowed boundary values, $b\in B$, of each $K_{4,4}$ (up to $2^{16}$), that $K_{4,4}$ is exhausted and the set of its optimal states, $S(b)$, is calculated. Then the program finds as small a set of $K_{4,4}$ states as it can that meets each $S(b)$ for $b\in B$. This becomes the new set of allowable states for that $K_{4,4}$. The process is repeated for all $K_{4,4}$s in the graph until no further restriction is possible. Then a treewidth-based exhaust is used with the restricted set of values. Before this is done, each of the eight possible global graph automorphisms is applied to the problem to see which one minimises the predicted running time.

This method is quite simple, but fast enough to do the job of verifying the 3000 test examples in a reasonable amount of time. The implementation readily parallelises, running nearly $n$ times faster on an $n$ core machine for the harder instances (tested for $n=4$ and $n=6$).

Scaling and Speculation

Beyond seeing how the D-Wave devices match up with modern conventional processors in their absolute timings, one might also wonder how “cleverly” these devices are behaving in more fundamental terms. We would like to know whether D-Wave is acting non-trivially non-classically, and though this discussion does not claim to answer that question, it is still perhaps interesting to look a little further into the timings to see if we can extract any clues as to its behaviour, though naturally the interpretation of these results should be taken with a pinch of salt. (These interpretations are just a bit of fun, really.)

Rather than look at absolute timings (which depend arbitrarily on the technology level of the classical computer in the comparison), we can look at scaling behaviour. To do this, we need to jettison the setup time component of the D-Wave timings (which dominates the timings for easy/medium instances), because we are interested in the pure annealing. (The setup consists of reprogramming the inductive couplers and then waiting for the temperature to stabilise back down to the operating temperature of 0.02K. This takes 201ms in the V5 chip and 36ms in the V6 chip, but it is conceivable that this time could be significantly reduced further.)

To create another point of comparison, the heuristic S1 strategy is deliberately worsened into a strategy named S0 (Prog-QUBO-S0) where only local exhausts are used, that is, for each random start configuration, each $K_{4,4}$ is optimised in turn until no further improvement is possible. This has the potential to be an interesting test because doing local exhausts like this only is not likely to work very well if there are long-range dependences, and you may think that if the hardware works well even though there are long-range dependences, then that is a sign that something clever is going on.

To start with, here are the comparisons of D-Wave with Prog-QUBO-S0,S1,S3 , where the setup time has been excluded from the D-Wave figures.

(Last few D-Wave sample values in the above graphs, especially for the V5 case, are unreliable and may be underestimates for the reasons given in the Set 1 section above.)

We can fix n and look at the scaling as the difficulty of the problem instance increases. For n=439 there are five datasets, the original 100 n=439 V5 examples, the 1000 n=439 V6 examples, and 1000 Set 1 examples on which Prog-QUBO-S0, S1, S3 are run. Since the first of these only has 100 samples, we downsample the results from the 1000 instance datasets to be comparable. For n=502 there are only four datasets, because the V5 chip cannot go up to n=502.

The D-Wave samples assume that their results are optimal. As noted above, this assumption is reasonably safe for p up to 90 or so, but may be questionable in the 90…100 region, in which case the D-Wave sample points shown in the above graphs can be regarded as lower bounds.

The far left tail of the D-Wave plots, on the other hand, may be underestimating its speed slightly, partly because the really easy problems morally require only a fraction of an anneal, but this is not represented here.

For n=439, the V5 and V6 seem to have similar scaling behaviours. In both graphs for p>50 the V5/V6 chips appear to be scaling (on these logarithmic plots) as multiples of the software strategies S0, S1, S3. For n=502 the multiples are approximately 1.8, 3, 3.5, which would correspond to the relationships $\text{TTS}_{\text{V6}}(p) = (\text{const}_0)(\text{TTS}_{\text{S0}}(p))^{1.8} = (\text{const}_1)(\text{TTS}_{\text{S1}}(p))^3 = (\text{const}_3)(\text{TTS}_{\text{S3}}(p))^{3.5}$ for p>50.

So by that measure, the D-Wave chips aren’t doing anything very interesting, being worse than the local optimiser Prog-QUBO-S0. On the other hand, we can fix the difficulty and look at the scaling as n increases, and arrive at just the opposite conclusion.

D-Wave timings excluding setup:

Median 80%-ile 90%-ile Mean
V5-439 1.45ms 12.6ms ~64ms ~23.1ms
V6-439 0.63ms 3.8ms ≥13.8ms ≥20.1ms
S0-439 57ms 160ms 290ms 130ms
S1-439 1.7ms 3.5ms 5.2ms 2.8ms
S3-439 1.2ms 2.1ms 2.9ms 1.6ms
V6-502 0.89ms 6.1ms ≥23.3ms ≥24.1ms
S0-502 230ms 750ms 1500ms 830ms
S1-502 3.1ms 6.8ms 10ms 5.0ms
S3-502 1.9ms 3.5ms 5.0ms 2.6ms
V6-502/V6-439 1.4 1.6 ~1.7? ~1.2?
S0-502/S0-439 4.0 4.7 5.2 6.4
S1-502/S1-439 1.8 1.9 1.9 1.8
S3-502/S3-439 1.6 1.7 1.7 1.6

(Prog-QUBO lines in above table constructed from data here: S0-439, S1-439, S3-439, S0-502, S1-502, S3-502, D-Wave data from here: V5-439, V6-439, V6-502. Prog-QUBO running single-core/single-threaded; full test conditions explained here.)

So by this measure, the D-Wave chip is scaling up to the larger problem size better than S0, S1, S3 are.

Conclusion

Overall, it looks like there is reasonable evidence that the D-Wave devices can currently be bettered in software. Trying to imagine how this conclusion could be untrue: (i) it could be that there is some class of problem, hitherto untested, on which the D-Wave does much better than software (and that this class of problem is useful), (ii) it could be that the test set used here is not in fact similar to the test set used in [2] (which is easily checkable), (iii) it could be that there is a new D-Wave device that is much better than the ones for which the test results were published.

On the other hand, we have actually been quite conservative in our estimates of the software speed. (i) We have used a test set which is kind to D-Wave. If we imagine D-Wave to be helping solve some real-world problem, then such a problem would have to be translated into QUBO (or Ising Model) form, quite possibly incurring a large loss of efficiency. To prove a point, the above software could simulate this whole process down to emulating the resulting D-Wave QUBO problem, but in practice it is possible that a more direct method would be a lot better (as we saw with QAP). (ii) Even the particular form of the native problem chosen may be kind to D-Wave, since in Ising model form, the non-zero couplings ($J_{ij}$, $h_i$) are all the same magnitude ($\pm1$), which presumably makes it possible to set them in an analogue device, whereas $J_{ij}$ and $h_i$ arising from a real problem might exhibit a troublesomely wider dynamic range. On the other hand, this would also make it more difficult for the software heuristic solver. (iii) The program Prob-QUBO-S3 is fast enough to prove a point, but I’m sure that it’s possible to write something much faster. (iv) The software timings quoted are all single-threaded, but any normal desktop processor now would have multiple cores/threads available and Prog-QUBO-S* readily parallelise for all but the easiest instances, and a six core machine could run nearly six times as fast for the hard instances. (We’ve also been giving the benefit of the doubt to the D-Wave machine about price, in that right now we could obviously buy thousands of ordinary computers for the cost of one D-Wave machine, and we have only been comparing one D-Wave device with one desktop machine. The large cost of a present D-Wave machine is no bad reflection on it, in that if it’s a pioneering device then it will naturally be more expensive than a mass-produced device with decades of development behind it. In effect the above comparison assumes that D-Wave is ultimately mass-produceable down to the cost of a decent desktop machine, $1000 or so. This is perhaps slightly out of keeping with the ideal of comparing against a current D-Wave device, but it makes for a more interesting comparison in my view.) In general, I think that the 512 qubit Chimera graph may not be all that powerful a tool, as there are large induced subgraphs of small treewidth (see Appendix B), which means that it can be approximated to some extent by simple graphs. (There is nothing special about the Chimera family in this respect. Any planar graph with small degree local connectivity would be the same.) I imagine it will need to reach at least the promised 2048 qubits (and would really need to be doing some sort of quantum annealing) before it becomes something computationally useful, though even at that point the advantage over software is not totally clear. Of course, all this would not detract from the fascinating and wonderful achievement of D-Wave if the device really is doing something non-trivially non-classical. That is surely the interesting question and it would indicate whether something along these lines could eventually be made computationally useful (and much more). But I believe that’s a research question (a very interesting one), rather than a practical one right now. Appendix A. Test conditions All software (Prog-QAP/Prog-QUBO) tests were carried out on a hex core 3.2GHz Intel i7-3930K with 32GiB 1333MHz DDR3 RAM. The operating system was Ubuntu 12.04, 64 bit, using Linux kernel 3.2.0. Prog-QAP and Prog-QUBO are single-threaded, all timings are given for runs on a single core, and the quoted timings are CPU time as measured by the C function clock() and the Python funcion time.clock(). Though each given instance was run on a single core, to save time different instances were run simultaneously on multiple cores. To be clear, it was still the self-measured CPU time for each run that was reported, and in practice this is actually slightly larger (10% or so) than if the run had been run on its own on a single core, therefore the timing figures are conservative from this point of view and the multiple cores were not used to produce faster timings. The computers were free of any other intensive processes and at no point were more qubo processes running on them than there were cores available (thus avoiding any possible hyperthreading). The time taken to construct each qubo instance (generating random numbers, or reading from a file) was not included in the timings, on the grounds that the task is to solve the problem, not construct and solve the problem. The time taken to compile the instance into a form more amenable for the solver possibly should have been included in the timings (it’s a question of what form the problem is likely to appear in, in a real world example), though this was separately measured to be less than 0.1ms, so is virtually negligible for the purposes of these tests. Each heuristic qubo run consists of finding 500 optimal solutions which makes it possible to measure the CPU time to reasonable accuracy with the operating system CPU clock whose resolution is 1 centisecond. (The fastest TTS was 0.3ms, so 500 of these takes 15 centiseconds.) Each solution was found from scratch with no information allowed to leak from the previous solution. No attempt was made to flush the processor cache between optimal solutions, though the effect of this should be small since even the original form of the weight graph fits in 14kiB and the time taken to read this from ‘cold’ using the above-mentioned RAM is no more than a few microseconds. Appendix B. Description of QUBO heuristics Collapsed view of the Chimera graph of order 8 where each blob represents four vertices, each diagonal line represents a$K_{4,4}$with all four vertices connected to each other by 16 edges, and each horizontal and vertical line segment represents four parallel edges connecting each vertex to its corresponding vertex. It is convenient to picture the Chimera graph in collapsed form where each half of a$K_{4,4}$is represented by a single vertex, as above. The program works in terms of such “big” vertices, each of which has$2^4=16$possible states according to the spins of the four original vertices within. The program has several “strategies”. Those described in this note are all of the form: 1. Repeat: 2. Choose a random configuration 3. Repeat: 4. Choose a subset,$E$, of big vertices (semi-$K_{4,4}$s) from a collection,$\mathcal{S}$, of possible subsets 5. Exhaust over$E$, i.e., find an optimal configuration consisting of varying states in$S$and leaving others unchanged. Notes: • The Repeat in step 3 is for a number of steps depending on the quality of the current configuration. Rare, good configurations are worked on more than common, poor ones. It tries to equalise the amount of time spent at each quality level above (numerically below, since lower=better) the median. • It is important that it chooses randomly between equally good answers when exhausting, otherwise it is much more likely to get stuck. • It runs in two different modes. -m0 just runs forever, printing something every time it finds a new best answer. -m1 does repeated runs from a clean start in order to find a more accurate value for the expected time to reach an optimum. (This is slightly complicated by the fact that it doesn’t know the optimum before it starts, and it can’t use the value of the optimum to guide its search.) • The way the above description relates to the description above is that it replaces step 5. there. The collection$\mathcal{S}$is fixed, depending on the strategy, S0, S1, S3, or S4. (S2 is a hybrid not currently in use.) To illustrate each$\mathcal{S}$, a typical element$E\in\mathcal{S}$is illustrated below, together with the induced edges.  A$K_{4,4}$used for strategy 0. There are 64 possible sets like this. Coverage = 2/128 = 1.6% of vertices. A line of$K_{4,4}$s used for strategy 1. This could also be a horizontal line so there are 16 possible sets like this. Coverage = 16/128 = 12.5% of vertices. A maximal induced tree used for strategy 3. Taking into account symmetries, there are 32 possible sets of this form. Coverage = 100/128 = 78.1% of vertices. A maximal induced treewidth-2 subgraph used for strategy 4. Coverage = 114/128 = 89.1% of vertices. A maximal induced treewidth-2 subgraph used for strategy 4. Coverage = 107/128 = 83.6% of vertices. The subgraphs corresponding to the sets in S0, S1 and S3 have treewidth 1 (they have no loops), which means they can be exhausted in approximately$16^2$elementary steps per vertex. This exhaust takes into account interactions between vertices in$E$(black) and those not in$E$(grey). Loops involving non-$E$vertices don’t matter because the vertices are not part of the exhaust – they just act like external fields. Since the non-$E$vertices are included in the interaction, it means that the global value is guaranteed not to get worse and optimal configurations remain optimal. (Though they can change state after the exhaust due to the random choice between equal value configurations.) The subgraphs in S3 are maximal induced trees – adding any other vertex would create a loop. (An induced subgraph is one obtained by adding in edges from the original graph between each pair of vertices in the subset.) We could go further and exhaust over a spanning tree (one including all vertices), but then we’d have a problem with the interactions along non-tree edges. It would be possible to do something, like treat the other end of the edge as a constant, using the previous value, but whatever you do you will lose the monotonicity property, and optima may not remain optima, so I think this will not perform as well (though I’ve not tried it). The S3 collection was chosen for convenience and because it looks like it gives a pretty reasonable coverage, but there are many other maximal induced trees besides those in S3, some with higher coverage such as this one: A maximal induced tree not used for strategy 3. Coverage = 103/128 = 80.5% of vertices. The subgraphs in S4 are maximal induced subgraphs of treewidth 2 (characterised by having no$K_4$as a minor subgraph). This means that they can be exhausted in approximately$16^3\$ elementary steps per vertex. As it happens, S4 didn’t perform as well as S3 for the problem instances chosen. This may be because the instances weren’t hard enough to justify the extra work, or it may be because the 2-connected components of the subgraphs in S4 aren’t very big, in which case it might work out better to use graphs like this one

A maximal induced treewidth 2 subgraph, not used for strategy 4, with a large 2-connected component. Coverage = 112/128 = 87.5% of vertices.

with a large 2-connected component, but I haven’t tried this. Or possibly it’s better to use a hybrid method where S3 is used on low-quality configurations and S4 on high-quality configurations. (Update: treewidth 2 methods do turn out to be advantageous on harder problem instances.)

1. My first takeaway is that CPLEX must be a really sucky software package if it does so much poorer than your simple solvers. (There this time even got the spelling right).

2. […] More Updates (June 2): Alex Selby has a detailed new post summarizing his comparisons between the D-Wave device (as reported by McGeoch and Wang) and his own […]

3. Ben Recht says:

The discrepancy between your average minimum and the McGeoch and Wang’s average may be due to the way you are generating your J matrix. It looks like you assign J_{ij} and J_{ji} as independent Bernoulli random variables. Making the cost symmetric (J_{ij} = J_{ji}), might eliminate the discrepancy.

4. Charles says:

As I understand it CPLEX solves a completely different problem: it finds the optimum solution and then proves optimality. This is a much harder task than just finding a solution that seems to be good.

5. Alex Selby says:

Henning, I think CPLEX may be quite good at some things, but I suppose it isn’t designed for this specific task.
In fact, I would be very grateful if someone could use CPLEX to verify the optimality of my answers. (See adjusted text in the above article.)

Charles, I was trying to understand this point by looking at the CPLEX manual. Of course proving optimality is much harder than finding a solution, but CPLEX may have a way of just going for a solution.
ftp://public.dhe.ibm.com/software/websphere/ilog/docs/optimization/cplex/ps_usrmancplex.pdf
It looks like (p.474), CPLEX would be running in MIQP mode and (p.479) it has a facility (MIPEmphasis=1) to favour finding a good solution before finding a proof of optimality. Of course it is possible that even in this mode it is a proving solver at heart, and it is spending time looking for bounds. Are you familiar with CPLEX?

6. Douglas Knight says:

You say that Google and Lockheed have “invested” in dwave. Do you have a citation for this claim? All I have seen is that they have bought sculptures, not that they got any equity bundled with their purchase. (I think NASA’s only involvement is to host Google’s sculpture, but I’m not sure.) Or do you mean “invested” only in an idea and not in a financial sense?

7. blaze says:

Don’t you think for your results to be relevant / worth reporting you should be checking for optimality as a part of your analysis?

8. Alex Selby says:

Ben, as it happens the author contacted me to say the same thing: the Q_ij matrix should be constrained symmetric. I will repeat the runs with this mechanism of problem generation.

I did wonder about this before which is why there are four different problem generation mechanisms built into the program (-w0,-w1,-w2,-w3). The trouble is, I can’t find a mechanism that is consistent with (i) the mechanism as stated in the paper, (ii) the parity constraint: energies are all even, but not necessarily a multiple of 4, (iii) the expected value of the minimum should be near -815.2.

9. Alex Selby says:

Douglas, thanks for pointing that out. It was a loose phrase and I’ve changed it to “made purchases”.

10. Alex Selby says:

blaze, I think you are right that I was probably too hasty. I believe that the solutions are optimal (or at least >=97.5% of them are, which is what is required), but I have not established that. However, I do have some reasons for believing it, which I’ll add to the article, and you can see if you find them at all convincing.

The proper way forward is either to run a proving solver over my test cases, or (more ideally) use the same test cases that were used for the D-Wave (for which the optimal value is known) and compare the answers.

11. Alex, my understanding is that CPLEX was used in a mode that did not check for the optimal solution.

12. Peter Shor says:

Having seen Cathy McGeoch’s talk, CPLEX was indeed used in a mode that would have proved the solution optimal, but the times they recorded were for the first point that CPLEX found the optimal solution, and not the time at which it proved it optimal. With longer running times, it eventually was able to prove most of the solutions optimal (the cutoff time of 30 minutes was enough to prove all but a handful of instances optimal).

13. JF Puget says:

I’d be happy to run CPLEX to check if your solutions for QUBOs are optimal.

14. Alex Selby says:

Thank you JF. My program now incorporates a proving-solver which has verified my solutions as optimal. However, I would still be most grateful if you were to run CPLEX anyway since this would have the effect of verifying my program. I will follow up by email.

15. Cathy McGeoch says:

Alex,

As you know from our emails I do not think that 4 hours is anywhere near the right estimate for Blackbox time to solve a single QAP problem.

And it is of course possible for your QUBO solver to be faster than hardware according to your definition of success, and to be slower than hardware according to the two definitions used in our paper. I think conclusion 4 is premature, at best.

16. Alex Selby says:

Yes, I know (and pointed out above) that you don’t agree this is a correct time, but I still don’t fully understand why it is not correct, which is perhaps my failing. The best understanding I can come up at the moment is that Blackbox/D-Wave was not tuned for timing on these tests, so it makes more sense to try to work out what would be the shortest amount of time that it might have used. Does that correctly summarise your position?
If so, then perhaps we should adjust the D-Wave time down by some factor, but I’d be surprised if it made more than a factor of 100 difference, in which case the conclusion would surely be similar.

Re QUBO time, I tried my best to use the definitions in your paper. I used the smaller figure for D-Wave running time from section 4, rather than the flat 490ms from section 3.2. I don’t see the two definitions in your paper that make D-Wave come out faster.

17. JF Puget says:

I was able to prove the optimality of the 1000 easy instances (weightmode5) using CPLEX. It computes the same optimal solutions as you which is a quite convincing cross validation. The average running time is about 10 seconds per problem, and at most 40 second. I’ll document the model I use in my blog soon.

18. Alex Selby says:

Thank you very much, JF – that’s good to know. For clarification, weightmode5 is the internal name for the method of generating QUBO instances that is hopefully equivalent to that used by McGeoch-Wang in their tests. (I didn’t call these instances easy, only easier than the ‘hard’ instances – the ones with no external field.) This set of 1000 instances forms the basis of most of the QUBO discussion in the above writeup.

19. JF Puget says:

Hi Alex, I’ve also updated my blog post
https://www.ibm.com/developerworks/community/blogs/jfp/entry/d_wave_vs_cplex_comparison_part_2_qubo?lang=en
with results on your three instance sets (439, 502, and noextfield). CPLEX finds the same objective values as your solver on all instances, which is a pretty strong cross validation. Timings are also interesting, no solver dominates the other.

20. Alex Selby says:

Hi JFP. Thanks for doing that – it’s good to have a check.

21. HJ Garcia says:

Hello Alex,

I’m interested in understanding some of the details in your heuristic. Can you forward me some references or a short description of what the heuristic does? Much appreciated. Thanks!

22. Alex Selby says:

Hi HJ. I’ve added some more details about the heuristic in Appendix B.

23. […] Experts argued over whether the device, created by D-Wave Systems, in Burnaby, British Columbia, really offers the claimed speedups, whether it works the way the company thinks it does, and even whether it is […]

24. […] Presumably D-Wave Systems got together a lot of smart graph theorists and IC designers and thought really hard about these problems. What they came up with was the Chimera graph: […]

25. […] D-Wave 2 did a certain calculation 3,600 times faster than a conventional machine. But as it turned out, the likelier reason the conventional computer was slow was that its software was slow. […]