This is an expanded version of a set of notes I used to give a talk on 17th January 2001. It has been edited to include some web references and to fill out what were originally just reminders to myself.
Thanks to Mark Wainwright who rewrote an earlier article of mine. I think his presentation was an improvement, and I've borrowed from it for this talk.
Talk about when the puzzle was introduced, and by whom. (see early Telegraph article) I started working on it in November 1999. Oliver Riordan joined me in January 2000. We solved it in May 2000 with the help of two ordinary computers and some luck.
See my eternity page. Lots of links are available at Brendan Owen's page.
The problem is to put these into this. Pieces can be reflected (turned over). Describe what the real puzzle looks like physically.
Discrete positioning (although finer than you might think - so called "against-the-grain" placings are possible, but not necessarily useful to look for). Can rotate pieces through 60 degrees, so they have 12 orientations.
Remember to say that the problem is to find a way to fit the pieces in the grid.
Here's a question to think about. If you had a solution, how might you convince someone else you had a solution without telling him what it was (or giving him enough information to reconstruct it)? Günter Stertenbrink - the only other known solver of this puzzle - came up with a nice answer to this question. We'll return to this at the end of the talk.
Refer to Brendan Owen's page for a description of the superset. All pieces have the same area (of 6 full equilateral triangles, or 12 half-triangles).
Mention why it's an easy problem to create the puzzle with a solution - but defer more exact explanation 'til later. (In fact his solution is a red herring.)
If you have 770 pieces to fit in a board whose area is 209 pieces then it will be easy to do so (leaving 561 left over of course). The point is that you can more-or-less just put pieces down one-by-one, and because you have so many to choose from, you won't get very stuck (if you do reasonable things). We will come across this fact again several times. If you have 209 pieces to fit in a board whose area is 209 pieces, then it will be much harder because towards the end you will have very little choice as to which piece to use.
At the same time as the puzzle was first launched, an internet discussion group was set up to talk about the problem and exchange ideas. You can use it as a mailing list (ie.[...] I'll tend to call it a "mailing list") or as a web-based group (ie.[...]). If the latter, then you can be a pure "lurker" - read and never write - if you want to. I must admit I was a lurker until after we'd solved it! To be fair on us, we didn't really use the ideas from the list, with the exception that one or two things that Patrick Hamlyn said may well have helped us. For one thing, he gave an account of his promising-looking progress at around the time I was starting to think about the puzzle, which led me to suspect that it was doable and not yet done (something I hadn't previously thought likely).
Define "grid"(="board", "region") to be a two dimensional region which you are interested in tiling. There is no point considering regions not made out of half-equilateral triangles.
Define "size" of a region to be its area divided by the area of a piece. (i.e., the number of pieces that would tile it if it were tilable). So the above example is a region of size 4.
Define "state" = a region together with a set of pieces (which may or may not tile the region). Also allow states with more pieces than size of the region, in which case we're obviously looking to fill the region leaving some pieces left over, and there may be more than one solution.
(Slide used in talk: Postscript, pdf)
The first thing one might think of doing is an exhaustive search, that is systematically go through every possible way of solving a state (i.e., tiling the region). Although this turns out to be a bad idea to apply to the whole 209-piece puzzle, it will turn out to be useful to apply it to smaller versions.
How do you do an exhaustive search? Here is an algorithm (or rather, class of algorithms).
For any state, if the region can be tiled then every empty space on the board must be covered at some point. Let us assume that given a state we have a mechanism for choosing a special site. A site is defined as an unfilled space on the board which cannot be partially covered by a piece. That is, remembering that the choice for placing pieces is discrete, a site is an empty area which is either wholly covered or wholly disjoint from any legal piece placement. Let us demand that this site be covered by the next piece placed. If you consider the solutions which arise from each way of covering that site, then no solution will be missed (because that empty space must eventually be covered in any solution) and no solution will be counted twice (since each piece placement at the current site must give rise to a disjoint set of solutions). So by looping over possible placements and repeating this process, we exhaustively find all solutions once only. We get a tree whose nodes can be labelled with the information about what you have placed so far. [Draw examples of sites on the previous state picture.] Note, by the way, that this algorithm still works fine if you have more pieces than the size of the region.
So a choice of site for every state gives rise to a method of exhaustive search. Although one could consider other methods of exhaustive search (such as picking the next piece to be placed and branching over where to put it), I believe for our present problem (Eternity-type pieces) the best method is of the described form, namely to choose a site and branch over ways of placing pieces to cover it.
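As a concrete illustration, here is a minimal sketch of this site-branching exhaustive search in Python. It is not our actual program, and for simplicity it uses square cells and unlimited copies of each shape (the real puzzle has half-triangle cells, 12 orientations and 209 distinct pieces); the "site" here is simply taken to be the smallest empty cell.

```python
def tilings(region, pieces):
    """Count the tilings of `region` (a frozenset of (x, y) cells) by
    unlimited copies of `pieces` (each a set of cell offsets).  The site
    is the lexicographically smallest empty cell: every tiling must cover
    it, so branching over placements covering it finds each solution
    exactly once and misses none."""
    if not region:
        return 1                      # region fully covered: one solution
    site = min(region)                # the chosen site
    total = 0
    for piece in pieces:
        for anchor in piece:          # slide the piece so `anchor` sits on the site
            cells = {(site[0] + x - anchor[0], site[1] + y - anchor[1])
                     for (x, y) in piece}
            if cells <= region:       # legal placement: lies wholly in empty space
                total += tilings(region - cells, pieces)
    return total

# Example: domino tilings of a 2x4 rectangle (the known answer is 5).
board = frozenset((x, y) for x in range(4) for y in range(2))
dominoes = [{(0, 0), (1, 0)}, {(0, 0), (0, 1)}]   # horizontal, vertical
print(tilings(board, dominoes))   # → 5
```

Taking `min(region)` is a legitimate but naive way to choose the site; making that choice well is exactly what the next question is about.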
What makes one exhaustive search better than another? ...time (number of nodes)... For the above form of exhaustive search (...) the only thing we are able to vary is the choice of site and the program implementation (which I don't propose to talk about here, although it makes quite a big difference). Now it is intuitively obvious that a good choice of site is one where there are the fewest possible ways to place a piece to cover that site. If there is an impossibility somewhere in the region, it would be best to find out about it as soon as possible rather than first waste a lot of time working out how to tile some easier part of the region. Many people have suggested this sort of method for this and similar search problems (you could write a program to find anagrams on this basis). It might be amenable to improvement though - I'll mention this in a couple of minutes.
A simple (and rough) analysis of the above method might go like this. To help us analyse the algorithm, let's pretend there is a typical branching factor (i.e., typical number of ways of covering a site) which depends only on the "depth" = number of placed pieces. If the branching factor when there are n pieces in place is b_n then the number of nodes searched is N = 1 + b_0(1 + b_1(1 + b_2(1 + ...))). Note that the product of the b_i is fixed, being the number of solutions. In the middle of that expression is b_n(K + b_{n+r}·L) for some K, L > 0 which depend on the other b_i's. If you could vary b_n and b_{n+r} keeping b_n·b_{n+r} and the other b_i fixed (corresponding to branching over two sites in the opposite order), then since b_n(K + b_{n+r}·L) = K·b_n + constant, to minimise N you would reduce b_n (increase b_{n+r}) as much as you could. So it seems that you would always like to choose a site which defers the larger branching factors as long as possible, agreeing with our common sense.
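The nested formula can be checked numerically. Taking the same set of branching factors (so the product, and hence the number of solutions, is fixed) in two different orders:

```python
from functools import reduce

def nodes(branching):
    """Total nodes searched, N = 1 + b0*(1 + b1*(1 + b2*(1 + ...))),
    for the given per-depth branching factors."""
    return reduce(lambda acc, b: 1 + b * acc, reversed(branching), 1)

# Both orderings have product 120 (the same number of solutions), but
# putting the small branching factors first searches fewer nodes:
print(nodes([2, 3, 4, 5]))   # → 153
print(nodes([5, 4, 3, 2]))   # → 206
```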
By the way, this is a typical sort of argument made here. Can't really prove anything rigorously by this kind of thought experiment, but can make an idealised model which is probably not too far off and analyse that.
This b_i model's inaccuracy is that the search tree isn't really that uniform. Obviously the search tree after one piece has been placed over a particular site will be different from the search tree after a different piece has been placed over that site, and the sites don't really act independently. Does this matter? Well maybe. Here is one way in which the above algorithm (which would be optimal if the b_i model made complete sense) might be improved. Say you are doing an exhaustive search and are trying to decide whether to branch over site A or site B. Site A has 3 ways to cover it and site B has 4. So it looks as though site A is a better bet. But what if after each of the 3 ways to cover site A, there are many ways to continue, whereas after 2 of the ways to cover site B you are immediately stuck (i.e., there is a site which can't be covered)? Then it looks as though B might be a better bet by 2 to 3. Note that it wasn't free to find out this more refined information because we had to spend time investigating what would happen after tiling A and B, but nevertheless it might sometimes be profitable to spend some time reconnoitring the search tree to find out more accurately what the "true" branching factor is at a given site. Now I believe there is a good definition of what this branching factor is (as a function of the depth, d, which you are prepared to reconnoitre) and it _isn't_ just the result of doing a d-ply search and counting up the leaf nodes. It turns out to be a kind of multiplicative "thermal limit" - you try to work out by what factor the depth d tree rooted on the site in question will increase the total number of nodes searched "on average". Unfortunately this method does not turn out to make a large improvement with Eternity, because it is often too expensive to do a depth-2 reconnoitre, so I won't talk about it any more here. I think it might conceivably be of interest for other search problems though. (A more detailed account can be found here.)
Consider a further simplification of the b_i model. Imagine that each piece has a typical number, t, of ways of tiling each site (if this made sense, t would be a small positive number of around 0.03). Then b_i, the number of ways to cover a site after having placed i pieces, would be (209-i)t, or for an N-piece puzzle (trying to put N pieces into a size-N region) with a similarly behaved set of pieces, (N-i)t. So the total number of solutions would be the product of this over i=0,1,...,(N-1), i.e., N!·t^N. Obviously there will be an N for which this becomes larger than 1. So it looks like there should be a critical size of puzzle above which a solution spontaneously comes into existence. It also looks like there should be many solutions to large puzzles.
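Taking this toy model completely literally (which, as the next paragraph says, one shouldn't), you can ask for the first N at which the predicted solution count N!·t^N exceeds 1. A few lines of Python, worked in log space to avoid overflow, using the illustrative t = 0.03:

```python
import math

def critical_size(t):
    """First N for which N! * t**N > 1, i.e. where the crude model
    first predicts that a solution spontaneously exists."""
    for n in range(1, 1000):
        if math.lgamma(n + 1) + n * math.log(t) > 0:   # log(n! * t**n) > 0
            return n

print(critical_size(0.03))   # → 88
```

(The value 88 is an artefact of the made-up t; the point is only that such a crossover must exist, since log(N!) eventually grows faster than N·log(1/t).)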
That was meant to be a plausibility argument as to why you should expect a critical size. This model is really too crude to use the numbers which come out of it. One quantitative aspect of it that I don't exactly believe is that it would predict that the time taken to exhaust the critical size problem is about 10^12 operations, but I believe it to be much worse than this (and probably well outside the range of present computers). But on the other hand something would have to be majorly funny if there weren't a critical size, because it is hard to imagine b_i differs so much from (N-i)t that whatever N!·t^N should be doesn't tend to infinity. So let's forget about that model now, and call the critical size c.
We did actually estimate c, and at the same time the total number of solutions to the full 209 piece problem (in a way not dependent on the preceding model, but still not foolproof). I think we put c at about 70 or 80, and the total number of solutions to the full Eternity at about 10^95. (This may seem like a lot, but they are very sparse. Many more "non-solutions", whatever that means.)
Now there are two sorts of puzzle. Those for which N<c and those (like Eternity) for which N>c.
First consider the former case, N<c. This is overconstrained. We don't expect any "spontaneous" solutions - but of course the problem setter can "plant" an artificial one. I guess the only method to solve this is exhaustive search.
How did Christopher Monckton arrive at a 209 piece puzzle? During our work on the puzzle, it was very unclear what he intended about various aspects of the puzzle (although it doesn't really affect your approach to solving). Maybe now it is a little clearer. He probably had a computer program which exhaustively searched small puzzles. If he used Eternity-type pieces then he would have used puzzles considerably below the critical size. He would have exhausted them with his program and noticed that the running time increased very rapidly as he increased N. Perhaps he found N=50 was already impossible for him to exhaust, and so he reasoned that N=209 would be way outside anyone's reach. I think that he intended the puzzle to be impossible in its first year, but he was going to make it easier year by year until someone solved it. (Digression on the unusual rules and the optional hint pieces. A hint piece is the position of a piece in CM's solution. He was going to release more hints each year it remained unsolved. By releasing enough hints he can eventually make the puzzle easy, although the six hints released in the first year actually make the puzzle more difficult because six isn't enough.)
Anyway, this is where it gets interesting, because you can't extrapolate a difficulty estimate beyond N=c. For N>c it actually starts to get easier. If you had to completely exhaust (not stopping when you've found a solution), then it would carry on getting harder and harder, and for N=209 you would have to do at least 10^95 work. But of course when N>c you expect many solutions so you don't need to exhaust. It is easy to see why. You could quickly put the first N-c pieces down more-or-less any old how and then you'd be left with a size c problem for which you'd expect one solution, so you could go and solve it. This shows that the time taken to find one solution to an N>c puzzle (e.g. Eternity) by simple exhaustive search (stopping when you've found a solution) should be the same as the time taken to exhaust an N=c puzzle. (As mentioned above, I believe this to be a lot of time. We didn't estimate it, but as a guesstimate I would think at least 10^20 operations, and maybe many more.)
In this way we can see that for N>c the problem certainly doesn't get any harder. But of course you can do better than that. You can use your freedom in the first N-c piece placements to set up a favourable state with c pieces left. (Actually at this point we should stop talking about "c", because its value will shift if you start paying attention to what pieces you are putting down to begin with, since you will be left with an atypical set of pieces.) What is a favourable state? Well, one for which your program will expect to find solutions at a decent rate (such as 10^-7 Hz...). Actually I don't think it's useful to estimate the rate of a state directly. More about this later.
At this point an analogy might be useful. Compare trying to fit pieces into a grid with fitting packages into the boot of a car. Some of the packages are awkwardly shaped, and you usually put those ones in first, keeping the nicer shapes 'til later when you'll have less room for manoeuvre. You will also try to ensure that, at each stage, the space that's left is as "sensible" a shape as you can arrange (for example fragmented=bad).
The situation is also quite similar to the game Tetris, if you've ever played that. Everyone knows that the wiggly pieces ("S" and "Z") are much more awkward than the long thin one or the "T". Also some board configurations are bad (e.g., having lots of spikes) and others good (e.g., convex boundary).
So now the approximate form of the algorithm is starting to emerge. It looks as though there should be two phases. The first phase will work hard to choose favourable states of a certain size (in fact about 30) as fast and as well as possible and the second phase will exhaust them. It is clear that for states of less than a certain size, it will pay to completely exhaust them (as opposed to selectively search them, pruning off branches) because the time taken to exhaust them will be much less than the time taken to create them. (Actually there is some loss involved in switching abruptly from pruned search to exhaust, but not a ridiculous amount. We found our actual solution by that method, so we didn't pay that much serious attention to improving it!). A favourable state will be one with good(=tilable) pieces left and a nice boundary of the region. So we need to know how to put numbers to these ideas. What follows next is an account of the basic method we used. We refined it in various ways (for example to take into account possible second order effects of good pieces tending to work better with other good pieces), but it is not necessary to talk about these modifications to get the idea of what we did.
First I need to mention a fundamental problem that you always come up against when trying to make estimates of how solvable things are. Ideally we would like to try an experiment something like this. Sample many states of size 100. Exhaust them all. See what kind of pieces turn up in solutions - presumably these are the better (more tilable) pieces. Unfortunately, as discussed above, exhausting a size 100 region is harder than solving the original problem of Eternity. If we instead try states of size 40 say, then we may be able to exhaust them, but we won't find any solutions at all, because the chance of a state of size 40 having a solution is very small indeed. There is no happy medium size.
So I think any method of predicting how good a state is (how likely it is to be solvable, say) will rely on some form of extrapolation (unless you are capable of solving the original Eternity problem many times over). This introduces the problem that you don't know whether your extrapolation process is accurate, or whether it is producing biased results for some reason. We took quite a lot of care to try to reduce (and monitor) this problem, but we were never really sure that the numbers coming out of the model were really accurate. Of course there is the hope to fall back on that even if your model isn't accurately telling you how well you're doing, it still may be directing your search to do the right thing (for example this would be true if all its probability estimates were out by the same constant factor).
We chose to extrapolate by adding extra pieces. That is, we exhausted states which had more pieces than the size of the region. For example, say you use 90 pieces in a region of size 24. Suppose that you expected 10^-17 solutions for a random state with 24 pieces and region size 24. Obviously then, sampling states with 24 pieces is out of the question, because you'd have to do this about 10^17 times before you found a single solvable state. But (90 choose 24) is about 4x10^21, so you'd expect about 40000 solutions to the 90 piece state. Well actually you'd expect about 40000/4=10000 solutions due to parity considerations, but we don't need to worry about that. (In fact I don't think you ever really need to worry about parity when solving Eternity, clever though it is. A description of 8-fold parity can be found on Ed Pegg's page. There has also been a lot written on this and other clever non rotation/reflection-invariant "parities" on the Eternity mailing list by David Eddy and others.)
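The binomial arithmetic is easy to reproduce (ignoring the factor-of-4 parity correction):

```python
import math

# Exhausting 90 pieces in a size-24 region effectively tests every
# 24-piece subset at once.  With the assumed 1e-17 chance that a random
# 24-piece state is solvable:
subsets = math.comb(90, 24)            # about 4 x 10^21 subsets
expected = subsets * 1e-17             # about 40000 expected solutions
print(f"{subsets:.2e} subsets, about {expected:.0f} expected solutions")
```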
Of course there has to be a cost for this benefit. Exhausting 90 pieces in a region of size 24 will take longer than exhausting 24 pieces in a region of size 24. But it won't take anything like 4x10^21 times longer. If you think about it, it is clear that doing a 90 piece exhaust is far more efficient than doing the (90 choose 24) 24 piece exhausts using every possible 24 piece subset of 90. [Illustrate this.] It turns out we can now find a "happy medium" where it doesn't take forever to exhaust and we do expect solutions. The sort of numbers we used were letting an exhaust take of the order of 10 minutes to find (up to) thousands of solutions. This provides one data point. This was then repeated a few thousand times on two computers over the course of one or two weeks to create enough data to extract the model parameters.
The other nice thing about extrapolating by adding excess pieces, is that the model we chose is still exactly solvable (that is one can calculate the likelihood exactly). This makes it possible to maximise the likelihood over the parameters.
This is a description of the mathematical process we used to turn the data from the above exhausts into a probability that a given region (of a certain size) can be tiled with a given set of pieces. The basic idea was to have a number, p_i, for each piece (so i=1,...,209. Actually, we usually work with l_i, their logarithms.). For a given state of the appropriate size, you multiply together the p_i to arrive at the chance that those pieces tile the region. You can also add other parameters, m_j, which take into account the effect of the region shape.
[During the talk I just talked about a simplified model with no boundary terms and w_B=1. P(R,X) = exp(sum of l_i), and the likelihood was L = (sum over i of a_i·l_i) - (sum over B of A_k(BP)).]
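As a sketch of this simplified boundary-free model in code (the l_i numbers below are invented for illustration, apart from -2.8 and 0.66, which are the extreme values quoted in the next section for the "beast piece" and the "superpiece"):

```python
import math

# Hypothetical per-piece log-tilabilities l_i; an average piece gets 0.
l = {"superpiece": 0.66, "average": 0.0, "beast": -2.8}

def tiling_probability(piece_names, baseline=1e-17):
    """P(R, X) = baseline * exp(sum of l_i): the chance that this piece
    set tiles a region of the matching size, relative to an assumed
    baseline chance for a set of average pieces."""
    return baseline * math.exp(sum(l[name] for name in piece_names))

# Swapping one average piece for the beast piece costs a factor
# of e^2.8, roughly 16:
print(tiling_probability(["average", "average"]))
print(tiling_probability(["beast", "average"]))
```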
The model would probably be useless if the likelihood, which is defined as a sum of zillions of terms, really needed to be evaluated by adding up these terms. Fortunately the potentially problematic term, A_k(BP), can be evaluated in a different way. This explains the earlier remark about the extrapolation process leading to a calculable likelihood. [Mention that the analogue of A_k(BP) for interaction terms is not calculable, so such models are much harder to deal with.]
After a week or two, we had enough data to make the parameter estimates reliable. We tested this by evaluating the parameters using only one half of the data, then using only the other half of the data. We were satisfied we had enough data when the two results were similar.
This is a graph of l_i (tilability) against rank obtained from some variant of the above process. The worst piece is a little hard to see because it lies on the y-axis at (0, -2.8). This is piece 166 (coined "the beast piece" by Patrick Hamlyn, who also noticed it was bad). It is a lot worse than the second-worst piece. On the other side, the best piece at (208, 0.66) is piece 35, known less imaginatively to Oliver and me as the "superpiece", and it is much better than the second-best piece. It seems there are four more-or-less equal second-best pieces: these are pieces 51-54 and are very similar looking.
Here are the 12 best pieces and the 5 worst according to (some variant of) this method.
You may notice that the best pieces tend to be more convex and smooth than the worst pieces which are noticeably wiggly. This accords with our intuition, but intuition only takes us so far, and we really do need to ask the computer how good the pieces were. For example, pieces 120,124 and 125 are also convex, but all are ranked about 30th. It would be a significant loss to classify these pieces as second best.
We now have a probability of being able to tile a size 24 region, but we are going to need the probabilities of being able to tile other (greater) size regions. (Of course we would actually prefer to know the rate of finding solutions, i.e., probability of solution divided by time to search. But rate estimates are likely to be unstable and it doesn't cost too much to ignore the difference, because we hope the expected rate is approximately an increasing function of the probability, so maximising probability is approximately the same as maximising rate.) So we would like a set of model parameters for sizes greater than 24. Unfortunately we can't use the same method as in the above section "The basic model". We couldn't even afford to spend (209-24)x2 weeks collecting the data, but of course it would take a lot longer than that using the above method because it takes longer to find solutions for larger sizes.
Instead, we used a kind of iterative method to try to find parameters for sizes > 24. This involved mimicking the procedure which would be used for an actual run which tried to find solutions (see next paragraph). These runs are tree-like: states of size n have child states (some of which will be kept) which are size n-1. We can then adjust the size n parameters to try to get them to predict (in the sense of least squares, say) the probability given by the size n-1 model on its children (or you can do it for grandchildren etc.). The procedure we were using was not all that good and the resulting parameters are almost certainly not as good as they could be. This was one of the things we were trying to improve before we were so rudely interrupted by an actual solution, which occurred earlier than we expected. Our method involved lots of ad-hoc things to make it possible to get the parameters, like insisting that they are a quadratic function of the size-24 parameters. Oliver and I both thought that the difference between the worst 100 pieces would be less at the start (i.e., early on - with few pieces placed - it's very bad to squander a good piece but it maybe doesn't matter too much that you use a middle rank piece instead of one of the very worst) and this was (arguably) borne out by the parameters the computer came up with.
Having decided on model parameters for all sizes greater than 24, you have a score for a state. The breadth-first-search (BFS) method we used then works like this. You start off with an empty board and all 209 pieces unused. This is the top state (level 0). You then decide on the most constrained site and generate all the piece placements which tile that site. You then take the top 10 scoring nodes of these, and throw away the others and also throw away the top node (though you will have to remember your family history if you want to be able to reconstruct a solution!). This gives you 10 level 1 nodes. At the next stage you generate all of the children of all of the level 1 nodes, take the 10 most highly scoring according to the level 2 score and throw away the rest, and so on. You can improve the performance by doing extra lookahead and increasing 10 (the `width'). We were using lookahead about 2 (but with large lookup tables applied to the boundary to make sure it was plausibly tilable) and width about 10000 on the run that found the solution, although we have since tried widths up to 1000000 and these are better. This method was described at various times by James Kittock, Grant Peacock, and Patrick Hamlyn on the mailing list. In fact it's a standard breadth-first search algorithm known apparently as "beamsearch", but I find it interesting that it works very well for this puzzle.
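The skeleton of the beamsearch itself is very simple. The sketch below is generic rather than our program: `children` stands in for the legal piece placements at the chosen site, and `score` for the model's state score.

```python
import heapq

def beam_search(root, children, score, depth_goal, width):
    """Keep only the `width` best-scoring states at each depth,
    expand them all, and repeat down to `depth_goal`."""
    level = [root]
    for _ in range(depth_goal):
        candidates = [c for state in level for c in children(state)]
        if not candidates:
            return []                 # every branch died out
        level = heapq.nlargest(width, candidates, key=score)
    return level

# Toy example: states are digit strings, children append one digit,
# and the score is the digit sum.  The all-2 string always survives
# each cut, so it heads the final level:
best = beam_search("", lambda s: [s + d for d in "012"],
                   lambda s: sum(map(int, s)), depth_goal=5, width=3)
print(best[0])   # → 22222
```

Note that width 1 is a pure greedy search and unbounded width is a full breadth-first search; as described above, the run that found the solution used width about 10000.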
If you make some (untrue, but maybe not very untrue) assumptions, then it looks like beamsearch is optimal. I sort of imagined the situation as something like the following idealisation. There is a big tree with some typical branching factor at each depth, decreasing as depth increases. (The nodes correspond to states. The root node corresponds to the empty board with all pieces available. The edges from a state correspond to legal piece placements at the site of that state.) Every node has a score associated with it. The root gets score 0, and every time you go down an edge, you add an independent random number from some depth-dependent distribution to get the score of the child node. The objective is to get to depth 178 (say) with high-scoring nodes as fast as possible.
You may ask, how can you think of these scores as random, when they are fixed once and for all? The thing is, there is plenty of search space to keep you happy, and changing some move early on to something slightly suboptimal gives you a new subtree to search, and probably doesn't harm your prospects much. The more serious assumption in the above model is that of independence. It may be that the prospects of two children of a given node have a correlation over and above that which comes from their score sharing the common summand of their parents' score. The scores of the supposedly independently behaving child-subtrees may be correlated because, for example, somewhere else on the board there is some impossibility. However, as we do this breadth-first search, we take care that all of the boundary is locally tilable (to some depth), and I think different branches are then (dependent, but) sufficiently independent to make this model a worthwhile thought experiment.
If you make a further (also untrue) assumption that the search algorithm's state always consists of a set of nodes at a given depth, then it is clear that the best algorithm is some form of beamsearch. All you can do is expand (calculate the children of) some subset of those nodes, and because of independence, the next program-state may as well consist of the best n child nodes for some n. If you fix the numbers of nodes you keep at each level, (a_0=1, a_1, a_2, ...) then it is also clear that these numbers should be as large as possible, subject to fitting in memory, in the sense that replacing all a_i with 2a_i obviously improves it. (BTW all a_i=1 is depth-first search). Two (a_i) runs come up with the same number (2a_178) of depth 178 nodes as a single (2a_i) run, so we need to compare their scores. Imagine running two (a_i) runs simultaneously. Can improve at any depth, i, by "crossover": merge the children of the 2a_i depth i nodes and take the 2a_{i+1} highest scoring, then split into two groups of a_{i+1} nodes, assigning one set to one run, and the other set to the other. Because of independence, this is obviously an improvement and can be repeated at any depth, after which the algorithm becomes a single (2a_i) run.
Another way of stating the beamsearch method is that at each depth we choose a minimum (cutoff) score, x_i say, such that the number of children of your set of depth i-1 nodes which score more than x_i is equal to some appropriate number (a_i in the above fixed number example.) Clearly for any given run, we could have got the same set of final nodes using O(1) memory by running a depth-first search with the score cutoffs x_i. However, knowledge of these fixed cutoffs was obtained from the breadth-first search, and if they are used for subsequent runs, I believe the results would be much worse than doing another beamsearch. So any fixed score cutoff algorithm is going to be bad. The reason is easy to see. On the second run (reusing x_i) the scores of nodes at (e.g.) depth 50 are going to be a bit better or a bit worse than on the first run. If a bit worse, then it will probably not provide any depth 178 nodes - all nodes may be cut off by depth 55. If a bit better, then the depth-first search won't ever get to know about the best scoring nodes at depth 50 because it will be satisfied searching the mediocre depth 50 nodes: according to the old cutoffs they are perfectly adequate. Because the search space is so vast it may never terminate this search, and so never get on to the more lucrative areas of the search space. So because of this "instability" it seems to be useful to compare scores with your peers.
As mentioned above, the uncertainty involved in extrapolation meant that we didn't know how accurate our probabilities were, and so didn't know how often to expect a solution. Nevertheless, we are fairly sure that we were lucky to find one so early. We had not been running the latest version of our program for very long, but it suddenly hit upon a solution. (It took me the best part of a day to convince myself for certain that it was correct!). So instead of waiting for a new one, here's one I made earlier [feeble attempt at getting a laugh from old Blue Peter catchphrase. mention some features in solution, e.g. wiggly pieces tend to be around the outside, 'cos that got tiled first. If time, compare with Günter's fixed-endgame method.]
[Ask audience for their ideas.]
Here's one method which we thought of. Draw a line roughly down the middle of your solution, following the boundary of the pieces in your solution. Say for example there are 104 pieces to the left of the line and 105 to the right. Publish this boundary line and the set of 104 pieces used to tile the left region (hence by inference the set used to tile the right). Someone who wants to check you've solved it is allowed to say "left" or "right" and then you have to demonstrate how you've tiled the half they've asked for, using the pieces you claimed to have used. As discussed before, it's harder to tile a given shape of size about 100 with a piece set of the same size than it is to solve the original problem, because 100 is above the critical size. So we're happy that revealing half a solution doesn't help anyone find a solution. And obviously if you had answers prepared to both "left" and "right" challenges, then you have solved the original puzzle. However you could cheat because it is easy to tile a region of size 104 with 209 pieces. This tiling gives you an answer to the "left" challenge, but not the "right". So you can cheat but you have a 1/2 chance of being found out by this method, and to convince someone you'd have to bet something of large value that you will be able to answer the challenge!
Günter's method is better in most ways, and is a nice simple idea applicable to many other situations. What he said was this. First agree on some format for describing a solution. Then someone writes an obscure program that reads in a file in that format and outputs a short message X if it is a correct solution, and a short message Y if it isn't. (X and Y are only known to the program writer.) The program is sent to the solution-claimer who immediately runs it and sends its output to the program author. If X is about 20 bits in length, then it is sufficiently long to make it very unlikely to be guessed, but it is almost certainly not long enough to encode enough information about the solution to help someone to reconstruct it (covering the case that the programmer is malicious and has made the program output many different things to gain information). For the solution-claimer to cheat, he would have to debug this program, which could be very obscure, and work out what it should output for a correct solution. This would all have to be carried out within the one second (say) time-limit allowed after which the reply must be sent. One weakness, as with the first method, is that it only proves it to one person (from a skeptical third party's point of view they could be collaborating). The only other major weakness I can think of is that it relies on the non-existence of an artificially intelligent program which can figure out what another program is meant to be doing on the same time scale as it takes to run it!