## Harder QUBO instances on a Chimera graph

### Purpose of page

The purpose of this page is to explore harder instances of the kind of problems considered on this D-Wave comparison page. There is no direct comparison here to any D-Wave results, because the examples considered do not correspond to existing D-Wave experiments, or in the case of N=16 (2048 spins) to an existing D-Wave device. This is mainly meant to be a standalone exploration of the hardest instances of D-Wave-type problems on Chimera graphs of order 16. It also serves the purpose of giving some of the higher order Prog-QUBO strategies a proper workout since their advantage did not show with the previous easier set of examples.

D-Wave is a hardware device that finds low energy states of a spin glass Ising model on (subgraphs of) a Chimera graph, $C_N$. Here spin glass means roughly that the edge weights are random and can equally well be of either sign. This leads to frustration, which means that finding the ground state is a kind of logic puzzle. This is in contrast, for example, to the ferromagnetic situation where all the weights are negative, in which case finding the ground state is trivial since you just have to align all the spins.

If you assume that the D-Wave device is an oracle that, given a set of edge weights, quickly gives you the ground state of the system, then the question arises, what is the most useful computation you can get out of it? An upper bound for this is given by the difficulty of the hardest Ising-Chimera instances, which are discussed below. First here is a brief mention of the lower bound on what D-Wave can do.

### Discussion of lower bound

A lower bound is going to be more tricky to state, since it will depend on P$\neq$NP, and also on certain properties of D-Wave hardware that have not to my knowledge been published. One way to approach a lower bound is to note that the complete graph, $K_{4N+1}$, can be minor-embedded into $C_N$. $K_{4N}$ can be minor-embedded by letting the $r^{\rm th}$ vertex of $K_{4N}$ correspond to the union of the $r^{\rm th}$ row and $r^{\rm th}$ column of $C_N$. This can be augmented to $K_{4N+1}$ by separating one of these row/column pairs into two separate minor-embedded vertices, and $4N+1$ is the best you can do since the treewidth of $C_N$ is $4N$.

This means that a $C_N$-based Ising problem is at least as hard as a $K_{4N+1}$-based Ising problem. You could regard Ising model problems on complete graphs as a standardised scale of difficulty, so bounding $C_N$ below by $K_{4N+1}$ is some kind of an answer to the question of the difficulty of $C_N$. To make this precise, we’d need to add something bounding the size and precision of the weights on the complete graph, for example by restricting them to $\pm1$, and say something about the external field, e.g., disallow it. (Of course, in the absence of a proof of P$\neq$NP we don’t know for sure how hard even these complete graph problems are, but we might guess they are exponential in the number of vertices.)

It should be added that the problem on $C_N$ that corresponds to a $\pm1$-weight problem on $K_{4N+1}$ will have weights of the order of $N$ (*), and this might present a practical obstacle to a hardware-based $C_N$-solving device, though it’s hard to imagine that the burden of coping with a linear (in $N$) increase in the size of weights would undo the presumed exponential (in $N$) advantage of being able to solve a problem on $K_{4N+1}$. (*) This follows from Theorem 4.2 of Vicky Choi’s paper on minor-embedding. (As it happens, the bounds given there can be strengthened in the case of a small external field. In the case of no external field, we can impose $F_i^e<-\frac12 C_i$ which is an improvement in our case where $l_i=4$ for all but two $i$.)

### Discussion of hardest cases

Turning now to the hardest Chimera instances, we seek to find as difficult a class of instances as possible for N=8 (512 spins) and N=16 (2048 spins). It should be pointed out that there is no claim to proof that the classes found are the hardest there are. They were found by experimentation and then by optimising an ansatz. Hardness is defined here as the TTS, the average time to find an optimal solution / ground state, required by a particular strategy of the program Prog-QUBO. (Of course this program is not optimal so it’s possible that instances that it finds hard would not be hard to another program, but that doesn’t matter since we’re only after an upper bound on hardness. As it happens, in the previous set of examples there was considerable agreement between the instances that Prog-QUBO found harder and those found harder by CPLEX, which uses an entirely different method.)

The original N=8 QUBO class, for which D-Wave results were published, was defined by letting $J_{ij} (i<j)$ and $h_i$ be IID in $\{-1,1\}$. This class is now made harder in the following four ways, each elaborated on below.

• Increasing $N$ (and not disabling any vertices).
• Setting the external field to zero.
• Continuising (not a word, but hey) the weights $J_{ij}$.
• Rebalancing the intra-$K_{44}$ and inter-$K_{44}$ weights.

Naturally, increasing the number of vertices will make the instances harder and here we consider the cases of $C_8$ with 512 vertices and $C_{16}$ with 2048 vertices.

Setting the external field, $h_i$, to zero was discussed in the D-Wave comparison as something that makes things harder with a Chimera graph. This is in contrast to the case of a planar graph, where a non-zero external field would make things (a lot) harder. The reason for this is that an external field effectively turns a planar graph into a new graph with one node connected to all the others, and this is highly non-planar in general. But in our case $C_N$ is already highly-non-planar, since every one of the $N^2$ $K_{44}$s introduces an element of non-planarity, so methods that are successful for planar (or low genus) graphs are of limited benefit here. Instead, the effect of an external field in the case of $C_N$ is to introduce a bias which gives a “clue” as to which way the spins should point. This (presumably) aids heuristic-type searches by decreasing the correlation length.

After some experimentation it became apparent that another factor making the previous examples easier was the discreteness of the weights, in particular restricting to using only $\pm1$ which makes the ground state is highly degenerate. That is, there are many local minima that “happen to be” global minima. This is cured by choosing $J_{ij}$ from a continuous distribution, or in practice a sufficiently finely divided discrete distribution. The intra-$K_{44}$ weights were chosen uniformly from the set $\{-100,-99,\ldots,99,100\}$. Using more steps than this did not appear to make it much harder. No attempt was made to discover if a nonuniform distribution would make it harder still. I don’t know if D-Wave hardware can cope with this kind of precision, but this doesn’t matter for the purposes of trying to get an upper bound to its capabilities.

Turning to the last item, “rebalancing”: neglecting edge effects, there are essentially two types of edge in a Chimera graph, the intra-$K_{44}$ edges and the inter-$K_{44}$ edges, so it’s reasonable to suppose that we correspondingly use two different distributions for the weights, $J_{\rm intra}$ and $J_{\rm inter}$ say. It’s clearly wrong to use either extreme $J_{\rm intra}\gg J_{\rm inter}$ (the $K_{44}$s split off into $N^2$ independent subgraphs) or $J_{\rm intra}\ll J_{\rm inter}$ (the graph splits into two trivial planes of horizontally connected and vertically connected vertices). But it is also presumably wrong to make these distributions the same, for then the two semi-$K_{44}$s within each $K_{44}$ would be more “tightly bound” than two semi-$K_{44}$s are between adjacent $K_{44}$s, since there are 16 edges from one semi-$K_{44}$ to another within a $K_{44}$, but only 4 to another semi-$K_{44}$ from an adjacent $K_{44}$. As the weights have mean 0 for maximum “glassiness”, a reasonable hunch is that balance is achieved by making $J_{\rm inter}$ about twice as big as $J_{\rm intra}$, reasoning from $(16/4)^{1/2}=2$. Something like this ratio does indeed turn out to generate the hardest examples of the form tried, as is seen below.

So then, after various experiments, an ansatz of the form $J_{\rm intra}=U\{-100,-99,\ldots,99,100\}$ and $J_{\rm inter}=U\{-L,-L+1,\ldots,L-1,L\}$ ($U$ meaning uniform distribution) was settled on, with the parameter $L$ to be determined by experiment. The best search strategy for N=8 was found to be S13, and the best for N=16 was S14. S13 and S14 are variants of the previously-described S3 and S4 in which instead of the state being completely randomised every so often, only a certain proportion of the nodes are randomised. It is interesting that the treewidth 2 strategy, S14, is superior to the treewidth 1 strategy S13 for N=16. Previously with N=8 there was no example where using treewidth 2 was an improvement on treewidth 1, but now it appears that was because the problem instances were too easy for the treewidth 2 strategies to overcome their initial overheads.

The heuristically generated answers here have not all been verified for optimality. In both N=8 and N=16 cases, the first runs that determine $L$ have not been checked and the final N=16 run at L=220 has not been checked either. Or in other words, the only answers that have been checked are those from the final run with N=8 at L=200. Doing a proving search at N=16 is beyond the capability of the current exhaustive searcher. I hope to upgrade the proving search to cope with these more difficult cases, but in the meantime the N=16 results rely for optimality on the assumptions alluded to previously. I think there is good reason to believe that the 500 N=16 answers below at L=220 that were searched to 50 independent solutions, are in fact optimal.

A plausibility argument is as follows. The heuristic search is generating a stream of local minimum states. Every now and again it resets itself so that it does not use any previously discovered information (that required non-negligible processing). Saying that two states are independently generated just means that there was such a reset separating the two finds. (It is possible they are the same state.) If you heuristically search until you have generated 50 independent presumed solutions, i.e., 50 states of energy $E_0$ where $E_0$ is the minimum energy seen on the whole run, then along the way you will find other local minimum states with energies $E>E_0$. For a particular run, let $n_E$ denote the number of times that a state of energy $E$ independently crops up before a lower energy is discovered. (There is a technicality explained before, whereby the first occurrence of energy $E$ doesn’t count.) Now taking the collection of all $\{n_E|E>E_0\}$ over all the 500 N=16 final runs (below), we get an empirical process distribution, though cut off at 50.

$n_E=i$ Number of occurrences =$m_i$
1 309
2 71
3 26
4 21
5 9
6 7
7 3
8 5
9 1
11 1
12 1
13 1
26 1
$\ge$50 ?

The number of occurrences sums to 456, which is less than 500 because on average there is less than 1 positive $n_E$ per run. If $i$ doesn’t appear in the left column, it means that $m_i=0$. The probability of an error, that is a heuristically declared optimum that isn’t actually the optimum, is the probability that the a random instance of the uncutoff version of the above process extends to 50 or beyond. Assuming some kind of regularity, we may crudely estimate this by modelling each $n_E$ event as a sample from a geometric distribution with an unknown parameter, $p$. That is, to generate $\{n_E|E>E_0\}$ for a particular run, you determine how many events will occur using a Poisson process, then for each event look up $p$ in a suitable prior, then take a sample from a geometric distribution whose parameter is that value of $p$. The concentration in the above chart is evidence for a single prior being a useful explanation, so further crudely estimating the prior to be concentrated on the MLEs of the geometric distributions corresponding to the above samples, that is $P(p=i/(i+1))=m_i/456$, we arrive at the following formula estimating the error probability (actually expected number of errors, but it’s practically the same thing for small values):
$$\sum_i \frac{m_i}{500}\left(\frac{i}{i+1}\right)^{50}=4.7\times10^{-4}.$$
This estimate relied on various assumptions, and the outlier value 26 may indicate that all is not well. On the other hand, the N=8 cases that have been verified (including 3000 examples from the previous analysis where the heuristic value turned out to be correct) have all shown a reasonably well behaved energy landscape with no decoy local minima that look like global minima.

### Results for N=8

Run used to determine the worst-case value of L.

L Sample size Number of optima
found per instance
TTS mean (seconds) Standard error (seconds) Percentage optimal
100 1000 10 0.0467 0.0013 100%?
125 1000 10 0.0553 0.0018 100%?
150 1000 10 0.0581 0.0020 100%?
175 1000 10 0.0591 0.0017 100%?
200 1000 10 0.0583 0.0016 100%?
225 1000 10 0.0569 0.0017 100%?
250 1000 10 0.0525 0.0014 100%?
275 1000 10 0.0523 0.0015 100%?
300 1000 10 0.0508 0.0015 100%?
325 1000 10 0.0471 0.0016 100%?

Final run with L=200 and increased number of optima per instance.

L Sample size Number of optima
found per instance
TTS mean (seconds) Standard error (seconds) Percentage optimal
200 1000 50 0.0592 0.0020 100%

The above table was constructed from this summary file. Problem instances are stored here. Timings are on a single core, single-threaded; full test conditions explained here. Results can be reproduced with this command (replacing 123 with the desired instance number, and replacing -m1 with -m0 to find a single minimum as fast as possible):

./qubo -N8 -S13 -p50 -m1 testproblems/weightmode9/myformat/123

### Results for N=16

Run used to determine the worst-case value of L.

L Sample size Number of optima
found per instance
TTS mean (seconds) Standard error (seconds) Percentage optimal
100 200 10 88 7.8 98-100%?
160 200 10 133 11.8 98-100%?
220 200 10 164 15.8 98-100%?
280 200 10 128 13.7 98-100%?
340 200 10 119 11.3 98-100%?
400 200 10 110 10.9 98-100%?
460 200 10 94 7.8 98-100%?
520 200 10 85 7.1 98-100%?
580 200 10 57 4.2 98-100%?
640 200 10 62 3.9 98-100%?

Final run with L=220 and increased number of optima per instance.

L Sample size Number of optima
found per instance
TTS mean (seconds) Standard error (seconds) Percentage optimal
220 500 50 162 10.7 99.8-100%?

The above table was constructed from this summary file. Problem instances are stored here. Timings are on a single core, single-threaded; full test conditions explained here. Results can be reproduced with this command (replacing 123 with the desired instance number, and replacing -m1 with -m0 to find a single minimum as fast as possible):

./qubo -N16 -S14 -p50 -m1 testproblems/weightmode10/myformat/123

(In this N=16 case, the problem instance numbers are not contiguous. This is due to having separate batches running on three different computers and the fact that the problem number directly maps to the seed used to generate the instance.)

### Conclusion

A tentative conclusion is that software may still be competitive with (future) hardware even up to 2048 qubits. The bottom line software figure from N=16 is 162 seconds per minimum. Of course there is no corresponding D-Wave device to compare with at present, but the current devices operating on 512 qubits take of the order of 10s of milliseconds to produce an answer. It may seem like 162s is slow by comparison, but there are several factors working in favour of software: the hardware may not be able to cope with the size/precision of the weights used here (hundreds to 1); the hardware may not work perfectly and may require error correction, cutting into its advantage; the hardware may not actually be returning absolute minima (ground states) in hard cases and if this is the case then the equivalent software could potentially be enormously faster; the software can easily be spread over multiple cores; lastly I think there is room to significantly improve the software presented here.

On the other hand there are some uncertainties in the above analysis, principally (i) are these instances really the hardest, and (ii) are the N=16 heuristic minima really true absolute minima?

Finally, I’d like to mention again that this isn’t intended as an exercise in negativity about D-Wave. If D-Wave really is the first quantum computing device operating with a large number of qubits then that is of course a stunning achievement and will surely lead to many wonderful things. Still, I think it is interesting to push the question of exactly how powerful it is. (And you never know, that line of inquiry may even lead to new software methods for spin glasses.)