Contents
========
How to run the program
Varying the parameters
How to interpret the output
Problems
The Algorithm and Mathematics (FAQ)
How to run the program
======================
Under Unix/Linux:
-----------------
Download the program file, sim.c, and the data file, which must be called
fa2n.gz, into the same directory. Do not decompress fa2n.gz. If your browser
has decompressed it, recompress it by typing gzip fa2n.
Compile using the command:
gcc -o sim sim.c -lm -O3
Then run using the command:
sim [var1=value1 [var2=value2 [...]]]
Under Windows:
--------------
You will need gcc.
I'm not sure whether this works (I don't have a Windows machine), but try
the following.
Download the program file and the data file into the same directory. Make sure
the data file is decompressed (your browser may do this; if not try using
winzip). Let us say the program file is called sim.c and the datafile
fa2n. You will need to alter two or three lines in sim.c:
fp=popen("gunzip -c fa2n.gz","r");
becomes
fp=fopen("fa2n","r");
and
pclose(fp);
becomes
fclose(fp);
and if full pathnames are required to refer to files from programs then
the first change should in fact have been to:
fp=fopen("fred/jim/jane/fa2n","r");
(i.e. the full pathname of fa2n, whatever that is) and you will also need to
make the change:
fp=fopen("log","a");
to
fp=fopen("fred/jim/jane/log","a");
Then you compile it in the usual way (use level 3 optimisation).
Then you run it in the usual way, except that you need to be able to enter
command line parameters. On Unix you'd type something like
sim nr=3 bl=2
I don't know what the Windows way of doing this is, but there must be one.
I'd appreciate feedback on whether the above works, or on how to simplify the
above description.
Varying the parameters
======================
The default setup is: player 0 (P0) and player 1 (P1) both ante 0.5, then P0
raises 0.5 (live), then P1 can fold, call or raise 1 unit, then P0 can fold,
call or raise 1 unit (even if P1 called, since P0's first blind raise was
live), etc., up to a fixed number of non-blind actions each (default=1 each).
In more usual terminology this translates to: P0 is the big blind, P1 is the
small blind (sorry it's this way round), and the allowed raise is 1 (=1 big
blind).
You can change the betting structure etc. by using different command line variables.
Variable Description Default
-------- ----------- -------
nr max. number of actions each 1
r size of raise 1
an ante 0.5
bl blind r/2
lim limit flag 1
liv live blind flag 1
pr affects print width 6
Notes:
(1) "blind" in the above table means the _extra_ money (in addition to the
ante) the big blind (P0) has to put in the pot. Thus P1 blinds an, and P0
blinds an+bl.
(2) If lim=1, r is the size of the allowed (limit) raise. If lim=0, r is the
size of the allowed raise measured as the proportion of the pot after
calling. So lim=0, r=1 is ordinary potlimit, and lim=0, r=0.5 is halfpotlimit.
There is no spread limit: only the maximum raise is allowed.
Examples:
sim nr=2
"Standard" setup. P1 blinds 0.5, P0 blinds 1, Raise=1.
sim nr=2 lim=0
"Standard" potlimit. P1 blinds 0.5, P0 blinds 1. e.g. P1 could then
raise 2 (making it 3), P0 could then raise 6 (making it 9).
sim nr=2 liv=0 bl=0.5 r=.1
P1 blinds 0.5, P0 blinds 1 (non-live), then the standard raise is 0.1
sim nr=2 liv=0 an=5 bl=5
P1 blinds 5, P0 blinds 10 (non-live), then the standard raise is 1. Of
course this is equivalent to the previous example.
There is another, different mode of behaviour, which is to solve the game of
"numbers vs numbers". For example, each player is dealt one of the four
numbers 0,1,2 or 3. 3 beats 2,1,0 and ties with 3, 2 beats 1,0, etc.
To get this, comment out the line #define HOLDEM, recompile and run with the
flag d=4 (say). This tends to be less well-behaved than Holdem, because the
solution is very degenerate (ambiguous). See FAQ below.
How to interpret the output
===========================
Check that the numbers "Actual primal value" and "Actual dual value" are the
same. If they are the same then (i) they are equal to the EV (from the point of
view of P0) and (ii) the program output is valid (see "Problems" below).
The strategy for each player is described in two parts: the "Pure bit" and the
"Probabilistic bit". Each hand should appear in one part or the other, but not
both. For P0 (the big blind), "hand" has a special meaning: r84s means that P0
has 84 suited and P1 has raised, c84s means that P0 has 84 suited and P1 has
called. Thus P1 has 169 hands (as usual), but P0 has 169*2=338 "hands".
R0F means fold
R0C means call
R1F means raise and fold-if-reraised
R1C means raise and call-if-reraised
R2F means raise, rereraise-if-reraised and fold-if-rerereraised
etc.
R2 means raise, rereraise-if-reraised, and the opponent is not allowed to
rerereraise because of the limit on the number of rounds, so it is not
necessary to say what to do after that.
CR0C (only applicable to P1, the small blind) means call, then call if raised.
CR1C (only applicable to P1, the small blind) means call, reraise if raised,
then call if rereraised.
etc.
Here is some example output (not very realistic):
Strategy for player 0
Pure bit:
--- R0F ---
--- R0C ---
r22o c32o r32o c42o r42o c52o r52o c62o
r62o c72o r72o c82o r82o c92o r92o cT2o
[...]
--- R1F ---
rT2o cJ2o rJ2o rQ2o rK2o rA2o c32s r32s
r33o c43o r43o c53o r53o c63o r63o c73o
[...]
Probabilistic bit:
R0F R0C R1F R1C R2
c87o 0 0.00140499 0 0.998595 0
rTTo 0 0 0 0.595865 0.404135
Strategy for player 1
Pure bit:
[...]
--- CR1C ---
KQo AQo KTs KJs KQs AQs
[...]
This means that P0 (the big blind) should
(i) always call (check) when holding T2o, if P1 calls.
(ii) always raise and fold-if-reraised when holding T2o, if P1 raises.
(iii) raise and call-if-reraised with probability 0.595865 when holding TT, if
P1 raises.
(iv) raise and raise-if-reraised with probability 0.404135 when holding TT, if
P1 raises.
and P1 (the small blind) should call, reraise if raised, and call-if-rereraised
when holding KQo.
Problems
========
(i) When presented with difficult input parameters (e.g. potlimit with many
betting rounds), the algorithm can fail to converge, or converge to an
inaccurate value. As mentioned above, you will know when the output is reliable
because "actual primal value" will be the same as "actual dual value".
(ii) The optimal strategy can be "degenerate". That means that there is more
than one possible optimal strategy. In this case, it is not predictable which
optimal strategy it will find, and it is possible that it will arrive at
different optimal strategies on different computers due to differences in
floating point implementation. It is also possible that in this case it will
find a non-intuitive choice of optimal strategy.
The Algorithm and Mathematics (FAQ)
===================================
I may fill this section in later.
In the meantime, here are some replies to emails which requested information.
Also, the program contains a few comments which might be useful.
-----------------------------------------------------------------------------
...
> I would love to see the program that computed that. I've worked out optimal
> strategies for simple games before, but I've always just done it
> algebraicially/by inspection. Applying simplex method to these situations
> has always, for some reason I'm slightly ashamed of, seemed rather mystical
> to me. If I saw code that did it, I think I'd finally get it.
...
Probably though this program wouldn't be the best way to learn about how
to get optimal strategies, because poker introduces lots of complicating
factors (like the fact that the players get dealt cards, which makes
things harder).
I can recommend these books if you want to learn about solving two
player games.
"The Theory of Games and Linear Programming", S. Vajda,
"An Introduction to Linear Programming and the Theory of Games", S.Vajda,
"Game Theory; Mathematical Models of Conflict", A.J.Jones.
I think the first two books may be different editions of the same thing.
This guy S.Vajda has written lots of similar sounding books, but I found
the above books especially readable.
-----------------------------------------------------------------------------
...
>
> I am very interested in your program and your algorithms.
>
> I am somewhat familiar with game theory, but not with
> computational techniques to solve what are apparently
> very large problems such as the game you describe.
>
> Let me try to explain where I am coming from.
>
> Consider a somewhat simpler pre-flop Hold 'em game than
> yours, for the purpose of illustration. We have a small
> blind (1 chip) who acts first and a big blind (2 chips)
> who acts second.
>
> The sb may either raise (to 4 chips) or fold. The BB may either
> call (an additional 2 chips) or fold if the sb has raised.
> Obviously he wins 3 chips if the sb folds.
>
> If neither party folds then they play showdown.
> This is much like the game you described but with fewer
> options.
>
> What are the pure strategies for each player?
> There are 169 logically distinct 2-card holdings for each
> player. For each one of these 169 holdings the sb may
> raise or fold. We can think of a pure strategy as a
> vector, 169 elements long, each element containing either
> RAISE or FOLD. Hence there are 2**169 distinct pure
> strategies that the sb can choose. Likewise, there are
> 2**169 distinct pure strategies the bb can choose.
>
> Hence the game matrix is 2**169 x 2**169 !!!!!!
You are right. You _could_ set it up so that there are 2**169 pure
strategies. And of course you are right that this would be computationally
useless. But there is a much better way to do it.
The thing is, it is a good enough description of the strategy (of SB say)
to write down a probability of RAISE, FOLD (adding up to 1) for each of
the 169 holdings. (This is how you are likely to think about your strategy
as a human.) So his strategy is described by 2*169 numbers, rather than
2**169 numbers. (You could reduce it to 169 numbers, but this is a small
optimisation when there are more than two possible actions.)
This is a bit confusing, because we are actually abandoning the "standard"
way of writing a strategy as a combination of pure strategies (of which
there would have to be 2**169 as you say). I.e. this is _not_ now a matrix
game in standard form. But the nice fact is, the simplex algorithm can
_still_ easily be used to solve for things in this form.
Let us say:
Variable i ranges over the "situations" for SB (i=1,2,...,169 in this
example)
Variable j ranges over the "actions" for SB (j=1,2 in this example)
Variable k ranges over the "situations" for BB (k=1,2,...,169 in this
example)
Variable l ranges over the "actions" for BB (l=1,2 in this example)
Let ev(i,j,k,l) be the expected value given this combination of situations
and options. (Positive = good for the BB say.)
(You need to be a bit careful in working out ev(i,j,k,l). For example
there are several different terms contributing to the EV of AKo vs AQo
according to which suits match up. I.e. when working out AKo vs AQo you
need to add up AcKd vs AhQs, AcKd vs AhQc, etc.)
Let p(i,j) be the probability that SB does action j when in situation i.
Let P(i,k) be the probability of situations i,k.
ev() and P() can be combined. Only their product matters, so let
EV(i,j,k,l)=ev(i,j,k,l)P(i,k).
Then the BB will choose his best option in each situation, so
the EV for the BB is
sum_k( max_l( sum_{i,j}p(i,j)EV(i,j,k,l) ) )
Therefore, SB's task is to choose p(i,j) to minimise the above expression.
The only difference from the standard case is that one player is trying to
minimise the _sum of_ the maximum of things, as opposed to just the
maximum of things.
But the above optimisation problem for SB can just as easily be written in
simplex form. You just introduce the extra variables m(k) and rewrite it
as
Minimise: sum_k(m(k))
subject to the constraints
For all l, m(k) >= sum_{i,j}p(i,j)EV(i,j,k,l)
(and of course the constraints
For all i, sum_j p(i,j)=1
For all i,j, p(i,j)>=0)
So the upshot is that we have to deal with 2*169 x 2*169 matrices. Large,
but not impossibly large. As mentioned in the news article, I found that
you can speed up the computations immeasurably by starting off the simplex
alg. in a good "pure" state. Here "pure" is being used to mean for each i,
there is a unique j such that p(i,j)=1.
For this method, strategies have to be described "in one go". That is,
they are allowed to be contingent on random acts of nature (your cards),
but can't be contingent on your past actions or your opponent's past
actions.
So you can't use a variable which is the probability of raising given that
your opponent has three-bet. You have to say your possible strategies are
(something like) "Call", "Fold", "Raise-1-then-call", "Raise-1-then-fold",
"Raise-2-then-call", etc., with a variable for each.
This means that if there are several _rounds_ of betting, the number of
strategies starts to grow quite rapidly. I don't really have a good method
for dealing with this at the moment!
>
> If you know the all-in equity of each two-card hand vs
> every other 2-card hand (and I have such a table), then it
> is 'trivial' to compute the EV for each entry in the game
> matrix, and thus fill in the matrix with the correct values.
>
> But what I don't understand is how you go about solving
> such a huge game theory matrix to produce the probabilities
> of choosing each pure strategy (i.e., the mixed strategy).
>
> Obviously in theory we can eliminate dominated strategies
> and produce a presumably much smaller matrix, but it is
> still going to be enormous.
>
> And the more options each player has the (exponentially)
> larger the game matrix grows.
>
> What does the 'simplex' method do that can handle such
> a seemingly intractably large problem, or how does it
> reduce the problem down to something that is tractable?
>
> Thanks for your time, and any insight you can share.
>
>
>
>
Hope this is of help,
-----
> Here are some more questions.
>
> Does this method guarantee convergence to the optimal solution
> or can it get stuck in a local maximum?
Good question. Since we're just using the simplex algorithm, your question
is asking "does the simplex algorithm converge or can it get stuck?". In
principle it cannot get stuck: the simplex algorithm is a genuine algorithm
which is guaranteed to converge to the best solution, or at least to _a_
best solution. However, this assumes that all the numbers it works with are
kept to infinite precision. In practice the algorithm can actually get
stuck in difficult cases. This mainly happens when there is a large amount
of so-called degeneracy at the current vertex of the simplex. Here I am
talking about my implementation of the simplex algorithm. There could be
better implementations that don't get stuck so much (e.g. NAG?). Despite
all this, it is very easy to check to see whether your answer is optimal
(so there needn't be any uncertainty as to whether the algorithm has
converged to the optimum or not), and it usually converges.
>
> I'm pretty sure I understand your equation for the BB's EV, but
> I have a question. You take the MAX over L (where L ranges over
> the BB's options given his hand). But we know that the optimal
> strategy can be mixed. That is, it may be correct for the BB
> to follow L[1] 60% of the time and L[2] 40% of the time.
> How is this consistent with the fact that the maximization problem
> only treats the 'best' L?
The optimal counter-strategy to any single fixed strategy is pure. (Or at
least, _an_ optimal counter-strategy is pure.) It's only when you try to
find a counter-strategy to all other strategies that you need to use a
mixed strategy. The reason (for the first statement of this paragraph) is,
in the jargon, because the space of strategies is convex and the objective
function is linear, so is maximised at an extreme point. Think of a
dodecahedron or cube (or any convex polyhedron) in some random
orientation. The highest point will always be at a vertex.
>
> I'm not really sure I understand how you get from the BB's EV
> equation to the form in which the SIMPLEX method can be applied.
> (I don't know much about linear programming. I did go and
> read the FAQ for the linear programming newsgroup). Is the
> form you cast the equations into the 'standard form' talked about
> in basic linear programming or is there another layer of
> complication here?
I haven't read this FAQ, but there are no other layers of complication. I
was just trying to show how to put SB's minimisation problem into standard
form. The exact definition of standard form can vary, but one definition
is
minimise c.x
subject to Ax>=b
(c and x are vectors of length n, b is a vector of length m, A is an mxn
matrix, A,b,c are given and x is the thing you are allowed to vary.)
> Finally, something of a more practical nature.
>
> If I understand what is going on, if I were to hypothetically
> supply appropriate values for EV(i,j,k,l), then, there could
> exist a program which can take these numbers and feed them
> into your existing program (or your program may already do
> this) and take the output and spit out the optimal percentages
> for doing each action for each possible hand type
> (the optimal probability of doing the j's for each i, and the
> l's for each k). Correct?
Yes. That is what the program does. Well actually the division into
two halves isn't totally clean because of the way it handles the fact that
the big blind is live. (This is an annoying kludge.)
>
> In heads-up Hold 'em there are 169x169 possible i's and k's.
> In the simple problem I proposed there are 2 options for the
> SB and 2 options for the BB, making a total of 114244 numbers
> which would have to be calculated before your program could
> start crunching. Correct?
Yes. Except that to handle the live blind, there are actually 169*2 i's
and 169 k's. The big blind thinks of c65s (he has 65s and the SB has
initially called) and r65s (he has 65s and the SB has initially raised) as
two different hands.
>
> If each player has N options the amount of data therefore
> goes up quadratically. What about solution time? How many
> options per player can the thing realistically handle?
Each step of the simplex alg. takes time proportional to the number of
entries in the matrix (114244 in your example). It's basically doing a few
multiplies/adds per entry. So it comes down to how many steps required by
the simplex alg. and how much time to initialise the simplex alg..
You can look up the maximum number of steps taken by the simplex alg.,
but you will get an answer which is much bigger than the number of steps
needed in practice, so that theoretical bound isn't of much use.
In practice what seems to happen is that if you start with any old
initialisation, the number of steps will be roughly equal to the number of
rows (or is it columns? I can't remember). However the program starts at a
locally optimal pure strategy, and then the simplex alg. only needs a few
steps. In fact most of the time seems to be spent finding the initial
optimal pure strategy, which probably takes time proportional to the
number of entries in the matrix, but with a larger constant of
proportionality.
In summary, it depends heavily on the characteristics of the particular
problem. In poker each hand is somehow unique and special, which I think
leads to almost-pure optimal strategies and speeds up the search a lot. You
could hope for time proportional to number-of-entries-in-matrix (114244 in
your example) for this class of problems. In fact I think the simplex alg.
is almost irrelevant when the strategies are so nearly pure.
In practice I found that for limit holdem you could handle as many
options as you had memory to keep the matrix, but for potlimit it got
stuck if you gave it too many (probably because with several potlimit
raises the numbers get big, which makes things unstable). But be warned
that the number of options could grow very rapidly if you wanted to do
anything remotely realistic. Just one round of betting keeps things fairly
simple.
-----------------------------------------------------------------------------
Exchange with Dr C.
> Alex,
> Thanks for the pointer to your preflop holdem page.
> Fascinating stuff, as you say, but I've got a few questions.
> 1. Just to check - I assume that the only legal raise is a pot raise?
> 2. If R4 were allowed, would it affect the results?
> 3. I assume R1 means Raise-Call, rather than Raise-Fold,
> and similarly with R2, CR1 and so forth?
> 4. Assuming that R1 means raise-call, I'll temporarily rename it to RC,
> if you don't mind, rename R2 to RRC, rename CR1 to CRC and so on.
> Okay, what about the strategies RF, RRF, CRF and so on
> (Raise, but fold if oppo raises back etc.)?
> Would it affect the results if these strategies were allowed?
> 5. I ask 4 above particularly because I am surprised that I can't see
> any genuine bluffs in your strategy, and bluffs are predicted
> (albeit weakly) by our analysis of the [0,1] game
> (by which I mean the game where each player receives an 0<=x<=1).
> Not compelling evidence, but enough to worry me.
> I'm wondering if maybe the bluffs should be played as RF,
> and that's why you're not seeing them.
> I'd expect (23o) to be played as 95%F + 5%RF or something like that.
> a.
>
In general the program used (sim.c) to find these results is configurable,
but the original Usenet article was written with the example of limit
Holdem. By altering command-line parameters you can change the raising
structure between fixed-limit and some-proportion-of-pot, and by altering
a #define you can change the game between Holdem and numbers-vs-numbers
(i.e., a discrete version of the [0,1] game). As far as I remember the
numbers-vs-numbers works a bit less well (I mean the simplex algorithm
converges less well for large pot-limit cases) than Holdem because the
simplex is so degenerate (i.e., there is a large dimensional family of
optimal strategies). I've only written a naive version of the simplex
algorithm, and it tends to get stuck if there is massive degeneracy. In
Holdem however, each hand is "special" which leads to not too much
degeneracy and also an almost pure optimal strategy. This last fact makes
the calculations much easier (because I take care to start the simplex
algorithm in a locally optimal pure state).
Answering your points:
1. (See above.) The program is configurable, but the posted Usenet results
were for limit.
2. No (except in a trivial way). You can set the maximum number of raises
on the command-line, and it eventually doesn't want to use any more.
(Except for AA for which it is happy to raise any number of times, but
this doesn't change the expectation because there is no other hand which
is prepared to match it in raises.)
3. Yes
4. CR1F, R1F, etc. were allowed but they turned out not to be part of the
optimal strategy for Holdem.
5. It likes to Call-Raise with some hands (e.g., KK) which is a kind of
bluff. If you run the program in numbers-vs-numbers mode you should
recover pure bluffs.
I think I had the same thoughts as you, but interestingly Paul was not at
all surprised that there are no pure-bluff Holdem hands. The other
interesting thing that Paul knew intuitively beforehand was that the
results would be quite meaningful in terms of real Holdem even though the
calculation is only preflop. When he convinced me of this, it made it seem
like much more of a worthwhile task to write the program.
Alex
PS Do you mind if I add the above email exchange (after anonymising) to my
FAQ?
> Thanks for the info.
> I'm interested by the lack of bluffing, and now I have absolutely no intuition
> about whether one should bluff in some other poker variant, i.e. what the
> essential difference between preflop holdem and [0,1] is that affects
> whether genuine bluffing is correct.
> Lots of experimentation could be useful.
> In a different life, this would make a good PhD subject!
>
> > PS Do you mind if I add the above email exchange (after anonymising) to my
> > FAQ?
>
> No problem at all.
> You don't have to anonymise on my account - I don't mind either way.
> a.
>
Actually I was wrong that there is never bluffing. To compare like with
like we should use an=0 bl=1 and r=1 or r=2, i.e., the small blind is 0
(rather than half of the big blind as in the published example). In this
case the big blind should occasionally run pure bluffs.
I'll think about it some more.
BTW exercise: if you run an=0 bl=1 it tells you that the big blind should
fold some hands (e.g. 42s) when the small blind has only called, so the
big blind could have checked it out for a free showdown, but decides to
muck his hand. Why is this not necessarily a bug? (This had me worried
before I realised what was happening.)
Alex
[Answer: because in this situation small blind should fold or raise,
never call. The only way to exploit BB's "mistake" of mucking 42s after
a call, is for SB to call with something some of the time. But this
presumably works out as a bigger "mistake". This is an example of a
degenerate optimal strategy, and the program picking a non-intuitive
representative. It could equally well have said "check" with 42s, but
it is being confusing because it hasn't yet been instructed not to be.
Another (simpler) example of something similar is numbers vs numbers
using the parameters d=2 nr=1 an=0 bl=1, where BB may muck 0 after a SB
call.]
-----------------------------------------------------------------------------