Efficient subgraph-based sampling of models with frustration
13 September 2014
This blog post has been rewritten and converted into an arXiv article. The blog post is retained here, but the arXiv article supersedes it.
31 January 2014
$\newcommand{\SC}{\mathbf{S}}\newcommand{\PP}{{\prime\prime}}$
Abstract:
A subgraph-based (typically tree-based) method for efficiently sampling certain graphical models is proposed. For models with frustration, such as spin glasses, evidence is presented that this method can be more efficient than traditional single-site update methods.
Introduction:
Given a graph $G$ with a weight $J_{ij}$ for each edge $(i,j)\in E$, and spin variables $\mathbf{S}=\{S_i=\pm1|i\in V\}$ for each vertex, we can consider the energy function defined by
$$H=H(\mathbf{S})=-\sum_{(i,j)\in E}J_{ij}S_i S_j.$$
Here the edges are considered to be undirected, so that if $(i,j)$ is in the sum, then $(j,i)$ is not. We imagine that there is frustration in the model, i.e., lots of little loops where the product of the weights $J_{ij}$ is negative. This typically arises when $J_{ij}$ is itself chosen randomly, such as in the Edwards-Anderson spin glass. What we’re about to describe works for numbers of spin states other than $2$, but for simplicity we’ll describe it in the case of $2$ states and it will be clear how to extend it to other numbers.
Given an inverse temperature, $\beta$, the Gibbs/Boltzmann distribution over sets of spin configurations is given as usual by
\begin{align*}
P(\mathbf{S})&=Z(\beta)^{-1}e^{-\beta H(\mathbf{S})},\qquad\text{with}\\
Z(\beta)&=\sum_\mathbf{S} e^{-\beta H(\mathbf{S})}.
\end{align*}
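As a concrete reference point, here is a tiny brute-force sketch (my own illustrative Python, not the code referred to later) of the energy and the Gibbs distribution on a small graph. Enumerating all $2^N$ configurations is of course only feasible for toy sizes, but it is handy for checking any sampler.

```python
import itertools
import math

def energy(spins, edges, J):
    # H = -sum over undirected edges (i, j) of J_ij * S_i * S_j
    return -sum(J[(i, j)] * spins[i] * spins[j] for (i, j) in edges)

def gibbs_distribution(n, edges, J, beta):
    # Enumerate all 2^n spin configurations (only feasible for tiny graphs).
    weights = {s: math.exp(-beta * energy(s, edges, J))
               for s in itertools.product((+1, -1), repeat=n)}
    Z = sum(weights.values())
    return {s: w / Z for s, w in weights.items()}

# Example: a frustrated triangle with couplings J_01 = J_12 = J_02 = -1.
edges = [(0, 1), (1, 2), (0, 2)]
J = {e: -1 for e in edges}
print(gibbs_distribution(3, edges, J, beta=1.0))
```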
Of course sampling from this distribution includes the problem of sampling the ground states (sampling at $\beta=\infty$), and if there is frustration then finding any ground state is NP-hard. Nevertheless, we saw here and here that for a sparse graph like the Chimera graph, ground states can be found quite efficiently even up to large sizes like $N=2048$ by searching over low treewidth subgraphs. Furthermore, we saw here that this subgraph-based search is considerably more efficient than the traditional methods, which usually amount to an optimised way of altering each spin in turn, only considering the immediate neighbours. (In models without frustration, more efficient block-update methods such as Swendsen-Wang are available.)
We might imagine that there is an analogue of the low-treewidth-subgraph-based ground state search for sampling at non-zero temperature, and that proves to be the case.
Previous and related work
The sampling method described here is similar to an earlier technique described by Hamze and de Freitas (see also here), which focuses on a Markov Random Field with observations attached. The method considered there also uses subgraphs, but differs from the one described here in that it uses only trees as subgraphs, and in fact an exact partition of the graph into trees. These restrictions are not necessary for the purposes of obtaining detailed balance, though under these conditions the authors cleverly manage to prove rigorous bounds showing the efficacy of their method.
Description of sampling method
Take a collection of induced subgraphs $T_1,\ldots, T_k\subset G$. (Recall that an induced subgraph is the restriction of $G$ to a particular subset of vertices, and contains all the edges of $G$ between those vertices.) We require that $\cup T_i=G$, i.e., every vertex and edge is represented in some $T_i$.
As we’ll see, the idea is that $T_i$ should be chosen to be easy to exhaust over. A typical choice would be to restrict to subgraphs of a given treewidth (though it is not necessary to take all subgraphs of a given treewidth). For simplicity, the rest of this discussion is illustrated by the case of treewidth 1, i.e., $T_i$ will all be trees.
Given an induced subgraph, $T$, and a spin configuration, $\SC_{G\setminus T}=\{S_i|i\in G\setminus T\}$ defined on the remainder of $G$, we can condition $P()$ on $\SC_{G\setminus T}$ to get a distribution, $P_{T,\SC_{G\setminus T}}$ over the spin configurations, $\SC_T=\{S_i|i\in T\}$, over $T$.
The Monte-Carlo step is to take $T$ to be a random $T_i$ from our fixed collection and then replace $\SC_T$ with a configuration chosen at random according to the conditional distribution $P_{T,\SC_{G\setminus T}}$. This defines a sampling procedure, subject to the usual Monte-Carlo caveats about knowing how much burn-in to take, knowing how long to wait for independence, etc.
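In code, the Monte-Carlo step is almost trivial once conditional resampling of a subgraph is available. A minimal sketch (illustrative names; `resample_subgraph` stands for the tree sampler sketched later in the “Efficient to calculate” section):

```python
import random

def monte_carlo_step(spins, subgraphs, resample_subgraph, beta):
    # Pick a random T_i from the fixed collection (probability 1/k each) and
    # replace S_T by a draw from the conditional distribution P_{T, S_{G\T}}.
    T = random.choice(subgraphs)
    return resample_subgraph(spins, T, beta)
```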
We need to show
- that this operation has the correct invariant distribution,
- that this operation is efficient to perform, and
- that the performance can be better than traditional methods.
We’ll take these in turn.
Correct invariant distribution
To show detailed balance, consider the flux from spin state $\SC$ to $\SC'$. As a matter of notation, let us take $\SC\Delta\SC'$ to mean $\{i\in V|S_i\neq S'_i\}$. Then the flux $\SC\to\SC'$ is equal to
\[P(\SC)\frac{1}{k}\sum_{T\supset\SC\Delta\SC'}P_{T,\SC_{G\setminus T}}(\SC'_T),\]
where the sum is over subgraphs in our fixed collection that include $\SC\Delta\SC'$. This is equal to
\[P(\SC)\frac{1}{k}\sum_{T\supset\SC\Delta\SC'}\frac{P(\SC')}{\sum_{\SC^\PP|\SC^\PP_{G\setminus T}=\SC_{G\setminus T}}P(\SC^\PP)}.\]
But $\SC\Delta\SC'\subset T$ is the same as saying $\SC_{G\setminus T}=\SC'_{G\setminus T}$, so the condition on $\SC^\PP$ in the sum in the denominator is equivalent to $\SC^\PP_{G\setminus T}=\SC'_{G\setminus T}$, and the whole expression is symmetric under interchanging $\SC$ and $\SC'$, proving detailed balance.
Efficient to calculate
For simplicity, let’s stick to the case where the $T_i$ all have treewidth 1, that is, they are trees. In general, if $T_i$ had treewidth $w$ then the time taken to draw a random instance from $P_{T,\SC_{G\setminus T}}$, or to calculate $\sum_{\SC^\PP|\SC^\PP_{G\setminus T}=\SC_{G\setminus T}}P(\SC^\PP)$, would be roughly $2^{w+1}|E(T,G)|$ elementary operations, where $E(T,G)$ is the set of edges with at least one end in $T$. (Aside: the $2$ here is the number of spin states, and the $+1$ in $w+1$ matters since in general there will be more than $2$ states. See later for an example with $16$. Really it would have been more convenient if treewidth had been defined to be one more than it is.)
To calculate efficiently, you need to work, in usual dynamic programming / treewidth style, from the leaves of $T$ inwards. So after $r$ steps, you will have a collection, $U_1,\ldots,U_m$ of connected subtrees of $T$ of total size $\sum_i|U_i|=r$, with $T\setminus\cup_i U_i$ being connected. Something like this:
The above vertex numbering shows one of many possible orders in which the vertices may be processed, and three successive steps are illustrated. The rule is that before a vertex may be processed, at most one of its neighbours may be unprocessed. Note that the rest of the graph $G\setminus T$ is suppressed in the above picture, but must be taken into account as a “background field” in the inductive calculation as it contributes to the conditional probability of the tree by interactions along edges between $T$ and $G\setminus T$.
There are two quantities that need to be maintained inductively, for each value, $S_b$, of the (single) boundary spin of $U_i$:
- The total $Z$ value for $U_i$ given $S_b$ and $\SC_{G\setminus T}$.
- A choice of spins of the vertices in $U_i$, representing a sample of the distribution $P()$ conditioned on $S_b$ and $\SC_{G\setminus T}$.
These two quantities are easy to maintain inductively when a vertex is added to a $U_i$. To illustrate the process, the example of the above pictures is traced in some detail. For $U_3$ in the first picture there will be, for each of the two possible values, $\pm1$, of $S_7$:
- the value $Z_{U_3}$ given by
\[Z_{U_3}(S_7)=\sum_{S_4,S_5,S_6}e^{-\beta H_{U_3}(S_4,S_5,S_6, S_7, \SC_{G\setminus T})}\]
where $H_{U_3}$ contains the contribution to the energy from edges meeting the vertices $4,5,6$ of $U_3$. These edges will join vertices $4,5,6$ to each other, to vertex $7$, and to $G\setminus T$, but they can’t meet the other vertices of $T$ by construction. (This is the definition of $Z_{U_3}$, not the method by which it is calculated.)
- a choice of spins $C(S_7)=(S_4,S_5,S_6)$.
Moving from the first of the above pictures to the second, the vertex $7$ is incorporated into $U_3$ and $S_8$ is the new boundary vertex. It is seen that $\Delta_H=H_{U'_3}-H_{U_3}$ only depends on $S_7$ and $S_8$ (and spins over $G\setminus T$, which we suppress), not any other vertices of $T$. So
- The new value $Z_{U’_3}$ is easily calculated as
\[Z_{U'_3}(S_8)=\sum_{S_7=\pm1}e^{-\beta\Delta_H(S_7,S_8)}Z_{U_3}(S_7),\]
- and the new choice of spins is given by
\[
C'(S_8)=\begin{cases} S_7=-1, (S_4,S_5,S_6)=C(-1)&\text{with prob.}\;Z_{U'_3}(S_8)^{-1}Z_-\\
S_7=+1, (S_4,S_5,S_6)=C(+1)&\text{with prob.}\;Z_{U'_3}(S_8)^{-1}Z_+\\
\end{cases}
\]
where $Z_{\pm}$ are the summands in the above expression for $Z_{U’_3}(S_8)$.
There is a slight extension to the process when the new vertex involves fusing more than one $U_i$, in which case the expression for the new $Z$ will involve products of the old $Z$s. In the example above, in the transition from the second to third picture, $Z_{U'_2}$ is built up of the sum of terms of the form $e^{-\beta\Delta_H}Z_{U_2}Z_{U'_3}$.
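Putting the inductive $Z$-values and spin choices together, here is a minimal, unoptimised sketch of the whole tree resampling step for two spin states, assuming a simple adjacency-dict representation (illustrative Python only, not the optimised code on github):

```python
import math
import random

def sample_tree(tree, root, adj, J, spins, beta):
    """Resample the spins on the induced subtree 'tree' (a set of vertices,
    with a chosen root) from the Gibbs distribution conditioned on the current
    spins outside the tree. adj[i] is the set of neighbours of i in G and
    J[(i, j)] = J[(j, i)] is the coupling on edge (i, j)."""
    tree = set(tree)
    # Root the tree: parent pointers and a parents-before-children order.
    parent, order, stack = {root: None}, [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for u in adj[v]:
            if u in tree and u != parent[v]:
                parent[u] = v
                stack.append(u)
    # Upward pass (leaves inwards): Z[v][s] is the conditional partition
    # function of the piece of tree hanging below v given spin s at v,
    # including the "background field" from edges leaving the tree.
    Z = {}
    for v in reversed(order):
        h = sum(J[(v, u)] * spins[u] for u in adj[v] if u not in tree)
        Z[v] = {}
        for s in (+1, -1):
            z = math.exp(beta * h * s)
            for c in adj[v]:
                if c in tree and parent.get(c) == v:
                    z *= sum(math.exp(beta * J[(v, c)] * s * sc) * Z[c][sc]
                             for sc in (+1, -1))
            Z[v][s] = z
    # Downward pass: sample the root from Z, then each child given its parent.
    for v in order:
        if v == root:
            w = {s: Z[v][s] for s in (+1, -1)}
        else:
            p = parent[v]
            w = {s: math.exp(beta * J[(p, v)] * spins[p] * s) * Z[v][s]
                 for s in (+1, -1)}
        spins[v] = +1 if random.random() * (w[+1] + w[-1]) < w[+1] else -1
    return spins
```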
In practice this algorithm can be easily and compactly implemented, with the spin choice being represented by a pointer. Some care is needed for a fast implementation since straightforward methods of dealing with the inevitable large numerical range of values involve taking logs, exps and random numbers in the inner loop, or the loop one level outside it, which would be rather slow. These can be avoided, for example by using look-up tables for the exps, rescaling the $Z$-values every now and again, and storing and judiciously reusing random numbers. See Appendix B for details. The code used to produce the results below is available from github, with notes on how to invoke it in Appendix C.
Comparison with existing methods (1)
In a recent nice paper of Katzgraber, Hamze and Andrist, the authors study, inter alia, the critical behaviour of the bimodal ($J_{ij}=\pm1$) spin-glass on the Chimera graph. They make a prediction for its universality class, and test this by showing that the expectation of the Binder ratio as a function of suitably-scaled temperature is independent of the size ($N$) of the underlying graph (figure 2, upper, of Katzgraber).
For the present purpose, this represents an opportunity to compare the subgraph-based sampling methods described here with the standard, though highly tuned, Monte Carlo methods used in the paper.
There are now two levels of probability space: the space of random $J_{ij}$ (known as the “disorder”), and for a given choice of $J_{ij}$, the space of spins, $\SC$ (averaging over which can be referred to as a “thermal average”).
In Katzgraber, they use replica-exchange Monte Carlo (see, e.g., Hukushima et al), which greatly improves convergence in this sort of problem. It is a kind of Monte Carlo meta-method and can be used “on top” of another Monte Carlo method. Katzgraber uses a standard Monte Carlo sweep as the subroutine. For comparison we also use exchange Monte Carlo, but instead base it on top of subgraph-based sampling.
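For concreteness, here is a generic sketch of one exchange Monte Carlo step, with the single-temperature update left as a pluggable subroutine (a sweep for the standard method, a tree-step for the subgraph method). The names and structure are illustrative rather than taken from either code base; the swap acceptance probability is the standard $\min(1,e^{\Delta\beta\,\Delta E})$.

```python
import math
import random

def exchange_mc_step(states, betas, update, energy):
    # Single-temperature updates (the pluggable subroutine).
    for k, beta in enumerate(betas):
        states[k] = update(states[k], beta)
    # Attempt to swap each pair of neighbouring temperatures (betas sorted
    # increasing) with acceptance probability min(1, exp(d_beta * d_E)).
    for k in range(len(betas) - 1):
        d_beta = betas[k + 1] - betas[k]
        d_E = energy(states[k + 1]) - energy(states[k])
        if random.random() < math.exp(min(0.0, d_beta * d_E)):
            states[k], states[k + 1] = states[k + 1], states[k]
    return states
```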
The comparison is with the $p=0.5$, $N=512$ case of Katzgraber et al. That is, the graph is the Chimera graph of order 8 and the $J_{ij}$ are IID $\pm1$. (For the avoidance of doubt, I believe there is a slight typo in their paper where the Hamiltonian is given as $-\sum_{i,j=1}^N J_{ij}S_iS_j$. I’m sure that $-\sum_{i<j}J_{ij}S_iS_j$ was intended, otherwise the edge weights would be trimodal $-1, 0, 1$, not bimodal as they are meant to be.)
The choice of temperatures used here covers a similar range (0.2 – 2) to that specified in table 1 of Katzgraber et al (0.212 – 1.632). The temperature choice here was decided upon by trying to make the exchange probabilities all equal to 0.6, which ended up requiring 25 temperatures for the range 0.2 – 2. (As it turned out, these probabilities got a little higher than that for the bottom two temperatures.)
For a given disorder, Katzgraber takes two independent random spin configurations, $\SC$ and $\SC'$, and defines the spin overlap $q=(1/N)\sum_iS_iS'_i$. Then the quantity of interest is the Binder ratio, a function of the kurtosis of $q$:
\[g_q=\frac{1}{2}\left(3-\frac{\langle q^4\rangle}{\langle q^2\rangle^2}\right)\]
Here $\langle\cdot\rangle$ denotes the thermal average and disorder average. (The interest is in the thermal average, as it is trivial to sample $J_{ij}$ for the disorder average.)
At very low temperatures, assuming the ground state is unique-ish, you might expect $q$ to take values close to $+1$ or $-1$ according to whether $\SC'$ happened to hit the same ground state as $\SC$ or its negation. This would make $g_q$ close to $1$. At high temperatures, the $S_i$ will be independently $\pm1$, which makes $q\approx 2B(N,1/2)/N-1$, approximately distributed as $N(0,1/N)$, and so $g_q\approx0$. This is what we see, with $g_q$ decreasing smoothly at intermediate temperatures.
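For reference, the Binder ratio is straightforward to estimate from paired replica samples. A small illustrative sketch (assumed array layout, per disorder; not the actual analysis code):

```python
import numpy as np

def binder_ratio(spins1, spins2):
    """spins1, spins2: arrays of shape (num_samples, N) of +/-1 spins drawn
    from two independent replicas at the same temperature and disorder."""
    q = (spins1 * spins2).mean(axis=1)   # overlap q for each paired sample
    q2, q4 = np.mean(q**2), np.mean(q**4)
    return 0.5 * (3.0 - q4 / q2**2)      # g_q for this disorder
```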
The subgraph-based method used here was the Gibbs analogue of the method known as strategy 3 here. It uses the collapsed Chimera graph of 128 aggregate vertices with 16 spin-states each, and there is a prescribed set of 32 trees.
The relevant question in this simulation is how many steps the Monte Carlo takes to equilibrate, i.e., for the associated Markov chain to reach something close to an equilibrium distribution. For each disorder, a pair of spin states is initialised uniformly at random at each temperature. Then $R$ exchange Monte Carlo steps are performed, which involves $R$ tree-steps for each temperature. At that point the states are deemed to be in thermal equilibrium and they are run for $R$ further Monte Carlo steps during which the thermal averages $\langle q^2\rangle$ and $\langle q^4\rangle$ are evaluated.
This whole process (including disorder average) was repeated for increasing values of $R$ ($250, 500, 1000, 2000, \ldots$) until the final values of $g_q$ appeared to stabilise within acceptable error margins (up to $0.01$ absolute). It turned out this required $R=1000$, i.e., 1000 tree-steps. As far as I can gather, in terms of elementary operations, such a tree-step should be approximately comparable to about 100 standard Monte Carlo sweeps, when both are optimised, making about 100,000 sweep-equivalents for equilibration. This compares with $2^{21}$, or about 2,000,000 sweeps required in Katzgraber, so it appears there might be a significant advantage with this measure. (However, see the discussion below.)
The above graph (left) shows the results of the method described here applied to one of Katzgraber’s example problems: N=512, bimodal. The right-hand graph shows the errors artificially amplified by a factor of 10, since they are too small to be seen clearly at their normal size. As can be seen, there is good agreement with Katzgraber’s results (taken from his Fig.2, p.3). In both cases 5964 disorders were used. These graphs serve as a check that the method described here is functioning correctly. They do not compare performance, because that is hidden in the number of equilibration steps required for each disorder.
Returning to the performance comparison, as alluded to above, there are some problems with this comparison: (i) we’re guessing an equivalence between sweeps and tree-steps; (ii) we don’t know for sure that the accuracy (used as a termination criterion) is comparable, or whether Katzgraber’s equilibration criterion was more or less severe; (iii) we don’t know if Katzgraber’s exchange Monte Carlo is tuned in the same way (it won’t be exactly, but how much does that matter?).
What we’d really like to do is construct a comparison where as little as possible is changed between the method based on single-site updates and that based on subgraph updates. We could then compare the equilibration times required for a given final accuracy, and we could be sure about the basis of comparison between the two kinds of times. We could also see how both methods scale. These considerations inform the next comparison below.
Comparison with existing methods (2)
Due to the possible shortcomings of the above comparison, it seems sensible to run a much more careful and controlled comparison of subgraph-update-based sampling with site-update-based sampling. We choose a particular problem, and try to simulate it using versions of these two methods that are as near-identical as possible.
Notation:
- SGU shall mean subgraph-update-based sampling, as described above. In the example below, the subgraph will be a tree.
- SSU shall mean single-site-update-based sampling. In other words, the traditional method whereby each spin is updated depending only on its immediate neighbours. (So, to be clear, this category includes multispin methods, such as described in Isakov et al, even though they operate on more than one spin at a time.)
The particular problem set chosen is that of the Chimera graphs of sizes 6×6, 8×8, 10×10 and 12×12 (number of spins 288, 512, 800, 1152 respectively). For these graphs, the couplings on the edges are chosen uniformly from the 14 possibilities $\pm1, \pm2, \ldots \pm7$, there are no external fields and each spin can be $+1$ or $-1$. This mimics the “range 7” (harder) example set from Rønnow et al.
In fact the actual $J_{ij}$ used were half the values stated above, i.e., chosen from $\pm\frac12, \pm\frac22, \ldots, \pm\frac72$.
This factor of $\frac12$ means that the energy quantum, the smallest possible change in energy due to a change in $S_i$ for a given disorder $J_{ij}$, is $1$ rather than $2$. Of course this scaling factor doesn’t fundamentally change anything because you can always rescale $\beta$, but stating it allows the reader to interpret the numbers in what follows, where we mention specific values of $\beta$ and maximum allowable absolute errors in energy.
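For example, a disorder of this kind can be drawn as follows (an illustrative sketch; the edge list and representation are assumptions, not the actual problem-generation code):

```python
import random

def random_range7_disorder(edges):
    """edges: iterable of (i, j) pairs of a Chimera (or other) graph.
    Returns a symmetric coupling dict with each J_ij uniform on
    +/-1/2, +/-2/2, ..., +/-7/2, the halved 'range 7' couplings above."""
    values = [k / 2.0 for k in range(-7, 8) if k != 0]
    J = {}
    for (i, j) in edges:
        J[(i, j)] = J[(j, i)] = random.choice(values)
    return J
```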
To make a good comparison we would ideally compare the best SGU-based method against the best SSU-based method, in so far as that makes sense. Of course, we don’t necessarily know the best methods, but we can use an approximation to the best-known methods. Here we use exchange Monte Carlo in both cases, tuned with slightly different parameters. The temperature sets used in both cases are fixed. Probably this could be improved by using an adaptive method, whereby the temperature set evolves according to the needs of the problem instance. This would be something to try in a further investigation.
Before describing the exchange Monte Carlo setup, it should be mentioned that if one attempts to do without it and equilibrate by simply doing updates at a fixed (low) temperature, then convergence is extremely slow for both SGU and SSU. SSU behaves particularly poorly, scaling much worse than SGU and having a higher effective critical temperature, below which problems are effectively intractable. Of course, this isn’t a big surprise, and it is more interesting to compare better methods, such as the exchange Monte Carlo considered here.
Exchange Monte Carlo setup
For each problem size, (6×6, 8×8, 10×10, 12×12), we choose 100 random disorders, and for each of these we determine the time required for the SGU- and SSU-based methods to have equilibrated to a reasonable accuracy. The principal statistic comparing SGU with SSU is simply the respective totals of these times, though other comparisons are considered.
It may be argued that fixing the required accuracy for each disorder does not match the likely underlying objective, which is to get a good expectation over disorders. It may be, for example, that it is more efficient to spend less time trying to equilibrate the most difficult disorders, allowing them to be less accurate, on the grounds that they will be compensated for by the majority of more accurate disorders. I do not believe this kind of optimisation would make a large difference because (a) by the nature of the exponential decay of the subleading eigenvalue of the Monte Carlo iteration, there should be a fairly sharp change from an inaccurate result to an accurate one as the number of equilibration steps crosses the characteristic threshold for that particular disorder. That means that you can’t afford to use very much less than the proper number of equilibration steps, otherwise the results would be very inaccurate and pollute all the others, and (b) scrutinising the actual results, though there are certain disorders that are considerably harder than the others, these still don’t represent an overwhelming proportion of the total equilibration steps expended over all 100 disorders.
The set of temperatures is determined by fixing an effective absolute zero at $\beta=\beta^*=20$ and aiming for a constant transition acceptance rate, $0.25$, between neighbouring $\beta$s. The value $\beta=20$ is sufficiently large (for the class of disorders considered here) that there is only a negligible chance of a state at this temperature not being in the ground state. This article on Exchange Monte Carlo by Katzgraber describes such a constant acceptance rate as in general “not too bad, but not optimal”. The temperature set is determined in advance of the actual simulation using a method based on $E$ and $C_V$. The acceptance rates during actual simulations match the target value reasonably well, almost always lying between $0.2$ and $0.3$.
Having fixed the top $\beta$, and having fixed the spacing between the $\beta$s by deciding on the transition acceptance rate, the remaining parameter to be decided on is the minimum $\beta$, or equivalently the number of temperatures. This is determined by trying a range of different possible values on a trial set of disorders and seeing which number of temperatures requires the fewest steps on average to equilibrate. It is found that SGU requires slightly fewer temperatures than SSU for a given problem size, thus SSU will end up with some “bonus information” about high temperature states. However, it is not given credit for this and all that is compared is the time required to estimate the ground state energy (and so presumably low temperature states) to a given accuracy. The justification for this is that it is assumed that the main interest lies in colder temperatures (strong coupling, high $\beta$), since higher temperatures are relatively easy to simulate using any sensible method. Full details of the sets of temperatures used are in Appendix A.
Determination of Equilibration
Equilibration is determined using the following method. Starting from a set of uniformly random states at each temperature, $n$ Monte Carlo steps are performed. Each such step consists of doing a single-temperature step at each temperature and then attempting to perform exchanges. After that, $n$ more steps are performed during which any observables can be measured and the energy at $\beta^*$ is averaged to make a sample value $E_1$. This whole process is repeated $25$ times starting each time from random initial states, and the average value $E=(E_1+\ldots+E_{25})/25$ is formed. If this is within a chosen threshold, taken to be $0.2$ here, of the smallest energy observed at any point so far, $E_\text{min}$, then it is deemed that $n$ steps are sufficient for equilibration. (It is possible to simultaneously test for $m$ being sufficiently large for all $m<n$ in not much more time than it takes just to test $n$ itself.)
This procedure relies on a number of assumptions: (1) that $E_\text{min}$ is the actual ground state energy. Empirical evidence strongly suggests that it is, but for the purposes of comparison it may not matter too much if it isn’t, provided that the same value of $E_\text{min}$ is used for both SSU and SGU, and this can be checked. (2) that $\beta^*$ is the hardest value of $\beta$ to equilibrate and the states at other temperatures will have equilibrated if the state at $\beta^*$ has. If this turns out not to be an accurate assumption, then at least SSU and SGU are being compared on an equal basis. In any case, it is assumed that the lower temperatures are the objects of interest and the higher temperatures are a means to this end. (3) That the number of restarts ($25$) is sufficiently large to get a good estimate of the required number of equilibration steps. The number $25$ was chosen to limit the overall computer time to a week or so, but in fact it is on the small side and there is a noticeable variance in the estimate for the required number of equilibration steps. However, when the estimate is averaged over 100 disorders, this variance becomes tolerably small.
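Schematically, the equilibration test described above looks like the following sketch (the callables are placeholders; the threshold $0.2$ and the $25$ restarts follow the text, while the candidate values of $n$ are assumed to be tried in increasing order):

```python
def steps_to_equilibrate(fresh_state, run_emc_steps, energy_at_beta_star,
                         E_min, candidates, threshold=0.2, restarts=25):
    """For each candidate n: do n burn-in EMC steps from a random start, then
    n measurement steps averaging the energy at beta*; repeat over 'restarts'
    independent starts and accept n if the grand average is within 'threshold'
    of the best (lowest) energy E_min observed so far."""
    for n in candidates:
        estimates = []
        for _ in range(restarts):
            state = fresh_state()                 # random initial states
            state = run_emc_steps(state, n)       # burn-in phase
            total = 0.0
            for _ in range(n):                    # measurement phase
                state = run_emc_steps(state, 1)
                total += energy_at_beta_star(state)
            estimates.append(total / n)
        if sum(estimates) / restarts <= E_min + threshold:
            return n
    return None
```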
Timing
Rather than just looking at the wall time, we break it down into the product of the number of exchange Monte Carlo steps and the time per EMC-step. An EMC-step includes a sweep at each temperature and an attempted exchange for each pair of neighbouring temperatures. The time for such a step is liable to change if the low-level spin-flip algorithms are optimised, if the computing device changes, or if there are other processes running on the computer. Separating the wall time into a platform-independent step count and a simple low-level timing frees us up to do the step-counting runs in any environment with code at different stages of optimisation, means that we only have to measure the timings once under controlled conditions, and enables us to consider the effects of further optimisations.

It is hoped that the ratio $t_\text{SGU}/t_\text{SSU}$ of times per step for the respective methods is fairly robust and won’t change too much over different platforms, though it is a significant assumption that this ratio is stable under further optimisation of the respective low-level spin-flip algorithms. The times $t_\text{SGU}$ and $t_\text{SSU}$ were measured on a test computer (an Intel Core i7-3930K CPU @ 3.20GHz) where both spin-flip algorithms (SGU, SSU) underwent a similar degree of optimisation and work in an analogous way, as far as it makes sense for them to do so.

The SSU spin-flip is intrinsically optimisable to a greater degree than the SGU spin-flip because all arithmetic can be eliminated: for a given $\beta$, it only requires a simple lookup on the neighbours of a spin to get the probability that the new spin should be up or down, whereas for SGU it seems to be actually necessary to accumulate $Z$-values. The SGU arithmetic can, however, be reduced to a few simple multiplications, additions and divisions, with no exponentials or logarithms necessary in the inner loops, due to a fortunate way in which the required precision is locally bounded: see Appendix B for further details. The level of optimisation used here is not as great as with the fastest multispin methods described in Isakov et al, though the code is general enough to work just as well with arbitrary weights $J_{ij}$. In the language of Isakov et al, the SSU code used here achieves about $0.16$ spin-flips per nanosecond on a single thread.
For the implementation and computer used, the timings using a single core are as follows.
$nt$ denotes the number of temperatures used.
Chimera size | Number of spins | $t_\text{SSU}/\mu\text{s}$ | $nt_\text{SSU}$ | $t_\text{SGU}/\mu\text{s}$ | $nt_\text{SGU}$ | $t_\text{SGU}/t_\text{SSU}$ |
---|---|---|---|---|---|---|
$6\times6$ | $288$ | $14.6$ | $10$ | $184$ | $7$ | $12.6$ |
$8\times8$ | $512$ | $35.9$ | $12$ | $449$ | $10$ | $12.5$ |
$10\times10$ | $800$ | $71.7$ | $15$ | $1020$ | $14$ | $14.2$ |
$12\times12$ | $1152$ | $143$ | $20$ | $1860$ | $17$ | $13.0$ |
$14\times14$ | $1568$ | $229$ | $23$ | $2800$ | $18$ | $12.2$ |
Results
Chimera size | Number of spins | $n_\text{SSU}$ | $t_\text{SSU}/\mu\text{s}$ | $t^\text{eq}_\text{SSU}/\text{s}$ | $n_\text{SGU}$ | $t_\text{SGU}/\mu\text{s}$ | $t^\text{eq}_\text{SGU}/\text{s}$ | $t^\text{eq}_\text{SSU}/t^\text{eq}_\text{SGU}$ |
---|---|---|---|---|---|---|---|---|
$6\times6$ | $288$ | $3.76\times10^3$ | $14.6$ | $0.0549$ | $68.5$ | $184$ | $0.0126$ | $4.36$ |
$8\times8$ | $512$ | $2.24\times10^4$ | $35.9$ | $0.804$ | $303$ | $449$ | $0.136$ | $5.91$ |
$10\times10$ | $800$ | $1.49\times10^5$ | $71.7$ | $10.7$ | $1.24\times10^3$ | $1020$ | $1.26$ | $8.44$ |
$12\times12$ | $1152$ | $6.00\times10^5$ | $143$ | $85.8$ | $3.25\times10^3$ | $1860$ | $6.05$ | $14.2$ |
$14\times14$ | $1568$ | TBC | $229$ | TBC | TBC | $2800$ | TBC | TBC |
In the above table, $n_\text{SSU}$ and $n_\text{SGU}$ denote the number of equilibration steps required.
Discussion
The above results show some evidence for an advantage of using subgraph-based methods over traditional spin-flip methods, and also some evidence that this advantage increases with problem size. How likely is this to be a true result?
It is possible that a constant factor of this advantage could be erased by improvements to the low-level SSU spin-flip that are not applicable to the SGU case. On the other hand, if the problem were generalised slightly, for example by using more than two spin states, then the extra complexity may hit SSU harder than SGU.
It is possible that a dynamically adaptive method of choosing temperatures, as mentioned here, would help SSU more than SGU, because it might be especially helpful with the difficult disorders for which SGU has a greater advantage over SSU. It is hard to make a confident guess at the differential advantage, and so this needs to be tried out.
On the other hand, the Chimera graph is in fact a relatively easy graph to simulate because it is sparse and locally-connected. Since the advantage of SGU over SSU appears to be larger with the more difficult disorders and larger problem sizes, it is possible that a more difficult graph altogether would show up the advantage of SGU over SSU considerably more strongly, though this can’t be taken for granted as the subgraphs used would also become more restricted. Again, more work is required to determine this.
Appendix A – temperature sets used
Chimera size | $\beta$s used by SSU only | $\beta$s used by SSU and SGU |
---|---|---|
$6\times6$ | 0.256 0.296 0.346 | 0.414 0.507 0.645 0.882 1.375 2.598 20.000 |
$8\times8$ | 0.285 0.322 | 0.363 0.414 0.478 0.559 0.669 0.840 1.121 1.646 2.894 20.000 |
$10\times10$ | 0.310 | 0.341 0.376 0.419 0.472 0.539 0.622 0.736 0.892 1.134 1.531 2.222 3.507 6.166 20.000 |
$12\times12$ | 0.250 0.269 0.289 | 0.310 0.337 0.367 0.399 0.439 0.489 0.552 0.630 0.727 0.850 1.018 1.264 1.626 2.169 3.110 4.850 20.000 |
$14\times14$ | 0.244 0.259 0.275 0.292 0.310 | 0.333 0.358 0.385 0.419 0.455 0.495 0.545 0.608 0.677 0.763 0.871 1.006 1.190 1.442 1.812 2.417 3.724 20.000 |
Appendix B – implementation details
(To be written)
Appendix C – use of program to produce the above results
(To be written)
References
Firas Hamze and Nando de Freitas. From Fields to Trees. In Uncertainty in Artificial Intelligence (UAI), pages 243-250, Arlington, Virginia, 2004. AUAI Press.
Firas Hamze, Jean-Noel Rivasseau and Nando de Freitas. Information Theory Tools to Rank MCMC Algorithms on Probabilistic Graphical Models. In Information Theory and Applications Workshop (ITA), 2006.