Let $\mathcal{B}$ be a collection of regions with added pieces. That is, each $B \in \mathcal{B}$ is a region $B_R$ of size $k$ to be tiled, together with a set $B_P$ of $n$ pieces from the 209 (of course $n \ge k$).

For $B \in \mathcal{B}$, let $A(B)$ be the collection of all subsets of $B_P$ of size $k$, and let the solution set $S(B) \subseteq A(B)$ be the collection of choices of $k$ pieces which will tile the region. Let's pretend that each $S \in S(B)$ can only tile the region in one way (although it may be that, due to a small symmetric subregion in the solution, there are `duplicates').

We will assume that each piece has a `goodness' $p_i$ such that the probability of being able to tile a region of size $k$ is proportional to the product of these numbers. Throughout, $l_i = \log(p_i)$.

Let $F$ be a set of boundary features and, for $j \in F$, let $c_j(R)$ be the value of the $j$th boundary feature of the region $R$. For example, we used $c_0(R)$ = the number of corners (changes in direction) in the boundary of $R$, $c_1(R)$ = the total length of bad edge in $R$ in units of altitudes, and $c_2(R)$ = the total length of good edge in $R$ in units of half-bases of equilateral triangles. (`Bad edge' is our name for edge which consists of altitudes of the equilateral triangles, and `good edge' our name for the other sort.) We shall assume there are numbers $m_j$ such that the probability of being able to tile a region $R$ is proportional to
\[
\exp\Bigl( \sum_{j \in F} m_j\, c_j(R) \Bigr).
\]

Actually the above assumptions are not really assumptions, since you can always declare that the probability of being able to tile something is such-and-such. The above `assumptions' really just define the information on which our `distinguisher' is allowed to depend.

Using the above definitions, the probability of being able to tile a region $R$ with a set of pieces $X$ is, according to our model,
\[
P(R,X) = \exp\Bigl( \sum_{i \in X} l_i + \sum_{j \in F} m_j\, c_j(R) \Bigr). \tag{1}
\]
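To make (1) concrete, here is a minimal sketch of how the model probability might be evaluated. The names (`model_prob`, `piece_logs`, `feature_weights`) and the numbers are our own illustrative choices, not anything taken from the actual solver.

```python
import math

def model_prob(piece_set, region_features, piece_logs, feature_weights):
    """Evaluate P(R, X) of equation (1).

    piece_set       -- the pieces X, as indices into piece_logs
    region_features -- {feature j: c_j(R)} for the region R
    piece_logs      -- {piece i: l_i}, where l_i = log(p_i)
    feature_weights -- {feature j: m_j}
    """
    log_p = sum(piece_logs[i] for i in piece_set)
    log_p += sum(feature_weights[j] * c for j, c in region_features.items())
    return math.exp(log_p)

# Tiny made-up example: 3 pieces, two boundary features (corners, bad edge).
p = model_prob(
    piece_set={0, 1, 2},
    region_features={"corners": 6, "bad_edge": 2.0},
    piece_logs={0: -1.1, 1: -0.9, 2: -1.3},
    feature_weights={"corners": -0.2, "bad_edge": -0.5},
)
print(p)  # exp(-5.5), a small probability for a hard-to-tile region
```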

We need to choose $l_i$ and $m_j$ to make $P(R,X)$ high when $X$ can tile $R$ and low when it can't.

The likelihood of the model given the data is
\[
\prod_{B \in \mathcal{B}} \Biggl( \prod_{S \in S(B)} P(B_R,S) \prod_{X \in A(B)\setminus S(B)} \bigl(1 - P(B_R,X)\bigr) \Biggr).
\]

We're going to make a few modifications (described below) to this formula, and after that we'll choose $l_i$ and $m_j$ to maximise the (new) likelihood. [By the way, $S$ = the negative logarithm of the likelihood has another interpretation as `conditional entropy', or residual uncertainty. Imagine trying to explain (or compress) the raw binary sequence of 0s and 1s formed by taking all pairs $(R,X)$ (in some random order) and writing 0 if the pieces $X$ cannot tile the region $R$ and writing 1 if they can. If there are $N$ digits altogether in this sequence and a proportion $p$ of them are 1, and you compressed it without knowing anything about tiling, you would probably reduce it to length $S_0 = -N(p\log(p) + (1-p)\log(1-p))$. This is what you would get for $S$ if you didn't allow $P(R,X)$ to depend on anything, in which case you would choose $P(R,X)$ to be equal to the constant $p$. At the other extreme, we could allow our probabilities to depend on everything, and we would get $P(R,X) = 1$ when the set of pieces $X$ can tile $R$ and $P(R,X) = 0$ when it can't. Then the residual uncertainty $S$ would be 0. Of course the problem with such a $P(R,X)$ is that it presumably can only be evaluated by an actual search. We're trying to find the minimum $S$ given that we're allowed to do some computation of the form (1). We'll end up with $S$ between 0 and $S_0$.]
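As a quick illustration of the $S_0$ formula (with entirely made-up values of $N$ and $p$, not figures from the actual search):

```python
import math

def baseline_uncertainty(n_digits, p_ones):
    """S_0 = -N (p log p + (1-p) log(1-p)): the residual uncertainty, in
    natural-log units, if P(R, X) is just taken to be the constant p."""
    p = p_ones
    return -n_digits * (p * math.log(p) + (1 - p) * math.log(1 - p))

# Hypothetical numbers: 10^6 (region, piece-set) pairs, one in 10^4 tileable.
print(baseline_uncertainty(10**6, 10**-4))  # roughly 1.0e3 nats
```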

Now we'll describe the modifications to the above formula, but first let's talk instead in terms of the more convenient log likelihood, $L$:
\[
L = \sum_{B \in \mathcal{B}} \Biggl( \sum_{S \in S(B)} \log P(B_R,S) + \sum_{X \in A(B)\setminus S(B)} \log\bigl(1 - P(B_R,X)\bigr) \Biggr). \tag{2}
\]
What sort of size are the numbers? We mainly used $k = 24$, and $n$ is likely to be at least 60. Say $n = 60$. The typical number of solutions will vary according to how good the pieces and regions are, but let's aim (very roughly) at a few hundred solutions per $B \in \mathcal{B}$. There are $\binom{60}{24} \approx 4\times 10^{16}$ elements of $A(B)$ (choices of 24 pieces), so $P(R,X)$ will be very small, of the order $10^{-14}$, and so we can replace $\log(1 - P(B_R,X))$ with $-P(B_R,X)$ and $A(B)\setminus S(B)$ with $A(B)$ in (2). (Actually this is not really an approximation - we're just changing to a model where the number of solutions to $(R,X)$ has a Poisson distribution with parameter $P(R,X)$, which happens to be a very similar model because $P(R,X)$ is so small.)
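To spell out why the replacement is harmless: under the Poisson model the probability of seeing no solutions for $(R,X)$ is $e^{-P(R,X)}$, whose logarithm is exactly $-P(R,X)$, and for $P$ of order $10^{-14}$ this differs from $\log(1-P)$ only at order $P^2$:
\[
\log(1 - P) \;=\; -P - \tfrac{P^2}{2} - \tfrac{P^3}{3} - \cdots \;\approx\; -P \;=\; \log e^{-P},
\qquad \text{error} \sim \tfrac{P^2}{2} \approx 5\times 10^{-29}.
\]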

Now the formula for $L$ looks like
\[
L = \sum_{B \in \mathcal{B}} \Biggl( \sum_{S \in S(B)} \log P(B_R,S) - \sum_{X \in A(B)} P(B_R,X) \Biggr).
\]
Let $n_B = |S(B)|$ = the number of solutions in $B$. The trouble with using the above $L$ is that $n_B$ will vary a lot. Some $B$ might have $n_B = 0$, others $n_B = 5$, and a few $B$ might have $n_B = 100000$, because it is hard to accurately control how many solutions you get when you add a lot of pieces. This $L$ will be dominated by the terms arising from $B$ for which $n_B$ is particularly large. It will turn out that we may as well not have bothered exhausting 99.9% of the regions, because the $p_i$ and $m_j$ will end up being geared almost entirely towards minimising the terms in $L$ arising from $B$ for which $n_B$ is very large. So most of our data will be ignored, the model will spend all of its effort trying to `explain' the $B$ for which $n_B$ is very large, and the values it gets for $p_i$ and $m_j$ will be pretty useless. When we chose lots of different $B$, we were trying to make a uniform sample from the (very large) space of all possible $B$. In effect, with the above $L$ we are weighting each sample point according to $n_B$, but this is bad because the results associated with a given $B$ are not independent. We decided to smooth this problem out by using a fiddle factor, $w_B$, to weight each $B$. $w_B$ is defined as
\[
w_B = \frac{1}{n_B}\left(1 - \frac{3}{\sqrt{n_B + 9}}\right).
\]
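For example (our own arithmetic, just to show the shape of the weighting), the effective weight $w_B n_B$ given to the whole of $B$'s solution set is
\[
w_B n_B \;=\; 1 - \frac{3}{\sqrt{n_B+9}} \;=\; 0,\ 0.25,\ 0.5,\ 0.7,\ \approx 0.99
\quad\text{for}\quad n_B = 0,\ 7,\ 27,\ 91,\ 100000,
\]
so a $B$ with very many solutions counts only a few times more than one with a handful.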
This definition is obviously a bit arbitrary, but it probably doesn't matter too much. $w_B n_B$ is chosen to be between 0 and 1 and increasing in $n_B$. Our final version of $L$ is then
\[
L = \sum_{B \in \mathcal{B},\, n_B > 0} w_B \Biggl( \sum_{S \in S(B)} \log P(B_R,S) - \sum_{X \in A(B)} P(B_R,X) \Biggr).
\]
This can be rewritten equivalently as
\[
L = \sum_i a_i l_i + \sum_j b_j m_j - \sum_{B \in \mathcal{B},\, n_B > 0} w_B \exp\Bigl( \sum_j m_j\, c_j(B_R) \Bigr) A_k(B_P), \tag{3}
\]
where
\[
A_k(X) = \sum_{Y \subseteq X,\ |Y| = k} \ \prod_{i \in Y} p_i,
\]
and the constants $a_i$, $b_j$ are defined by
\[
a_i = \sum_{B \in \mathcal{B},\, n_B > 0} w_B \sum_{S \in S(B),\ i \in S} 1
\qquad \text{and} \qquad
b_j = \sum_{B \in \mathcal{B},\, n_B > 0} w_B \sum_{S \in S(B)} c_j(B_R).
\]
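To see where (3) comes from, expand $\log P(B_R,S)$ using (1), and factor the region-dependent part out of the sum over $A(B)$:
\[
\sum_{S \in S(B)} \log P(B_R,S) = \sum_{S \in S(B)} \Bigl( \sum_{i \in S} l_i + \sum_j m_j\, c_j(B_R) \Bigr),
\qquad
\sum_{X \in A(B)} P(B_R,X) = \exp\Bigl( \sum_j m_j\, c_j(B_R) \Bigr) \sum_{X \subseteq B_P,\ |X| = k} \ \prod_{i \in X} p_i .
\]
Multiplying by $w_B$ and summing over $B$ with $n_B > 0$, the coefficients of $l_i$ and $m_j$ collect into $a_i$ and $b_j$, and the last sum is exactly $A_k(B_P)$.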

The nice thing about $L$ is that it can be evaluated quickly. The only problematic term might be $A_k(B_P)$, but in fact we can do the necessary sum over all possible piece subsets even though there might be, e.g., $\binom{60}{24} \approx 4\times 10^{16}$ of them. If we really had to do $10^{16}$ work to evaluate $L$, then everything above would be useless, because we would never know how to choose $p_i$ and $m_j$ to maximise $L$. So we weren't really free to choose any model we liked - we had to keep in mind the need to actually be able to calculate the likelihood. To evaluate $A_k(X)$ we list the elements of $X$ in order and make use of the fact that, for $i \in X$,
\[
A_k(X) = p_i\, A_{k-1}(X\setminus\{i\}) + A_k(X\setminus\{i\}).
\]
So if $X = \{x_1, \ldots, x_n\}$ we can build up to $A_k(X)$ in $n$ stages, where at the $r$th stage we keep track of $A_i(\{x_1, \ldots, x_r\})$ for all relevant $i$.
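A minimal sketch of this staged computation (the function name and interface are our own; it takes the goodnesses $p_i$ directly and returns all of $A_0, \ldots, A_k$):

```python
def elementary_symmetric(goodnesses, k):
    """Compute A_0..A_k for a list of piece goodnesses p_i, where A_m is the
    sum over all m-element subsets Y of the product of p_i for i in Y.
    Uses A_m(X) = p_i * A_{m-1}(X - {i}) + A_m(X - {i}), adding one piece
    per stage, so the work is O(n*k) rather than C(n, k)."""
    A = [1.0] + [0.0] * k          # A_0 of the empty set is 1, A_m = 0 for m > 0
    for p in goodnesses:           # stage r: extend from {x_1..x_{r-1}} to {x_1..x_r}
        for m in range(k, 0, -1):  # descend so each A_m uses the previous stage's A_{m-1}
            A[m] += p * A[m - 1]
    return A                       # A[k] is A_k(X)

# Tiny check with 3 pieces and k = 2: p1*p2 + p1*p3 + p2*p3 = 11.
print(elementary_symmetric([1.0, 2.0, 3.0], 2)[2])  # 11.0
```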

Now optimising (3) over $l_i$ and $m_j$ is pretty clean, because $L$ is smooth and concave (so a local maximum is a global maximum and there are no places to get stuck), and it is easy to evaluate first and second derivatives. For example, if $i \in X$,
\[
\frac{\partial}{\partial l_i} A_k(X) = p_i\, A_{k-1}(X\setminus\{i\}).
\]
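Following this, a rough sketch of how the gradient of the $A_k$ term could be assembled (the names are invented; a real implementation would feed this into any standard concave maximiser, since second derivatives are just as easy):

```python
def sym_polys(ps, k):
    # A_0..A_k of the goodness list ps (same recursion as the earlier sketch).
    A = [1.0] + [0.0] * k
    for p in ps:
        for m in range(k, 0, -1):
            A[m] += p * A[m - 1]
    return A

def grad_Ak(ps, k):
    """dA_k/dl_i = p_i * A_{k-1}(X without piece i), for each piece i.
    Simple O(n^2 k) version: drop one piece at a time and redo the sum."""
    return [p * sym_polys(ps[:i] + ps[i + 1:], k)[k - 1]
            for i, p in enumerate(ps)]

# With 3 pieces and k = 2: dA_2/dl_1 = p_1*(p_2 + p_3) = 1*(2+3) = 5, etc.
print(grad_Ak([1.0, 2.0, 3.0], 2))  # [5.0, 8.0, 9.0]
```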

By the way, the main problem with broadening this model to include interaction terms between pieces is that it will no longer be possible to evaluate $L$, because instead of $A_k(X)$ we'll be faced with sums like
\[
\sum_{Y \subseteq X,\ |Y| = k} \ \prod_{i,j \in Y} b_{i,j},
\]
which I think are impossible in general. We did experiment with a restricted set of interaction terms, e.g. the 100 most important pairs of pieces, but this did not obviously make a great improvement, and it slowed everything down, so we put it aside, intending perhaps to return to it when we had done other things that seemed more important! There are other (non-maximum-likelihood) ways of evaluating the effectiveness of model parameters, but we weren't sure how good they would be, and we didn't pursue them. So our final model never included piece-piece interaction terms, although we do have some fairly good data about the 100 or so most important interactions. Piece-boundary interactions, on the other hand, would be easy to introduce, but we did not get around to trying this.



