Contents ======== How to run the program Varying the parameters How to interpret the output Problems The Algorithm and Mathematics (FAQ) How to run the program ====================== Under Unix/Linux: ----------------- Download the program file, sim.c, and the data file, which must be called fa2n.gz, into the same directory. Do not decompress fa2n.gz. If your browser has decompressed it, recompress it by typing gzip fa2n. Compile using the command: gcc -o sim sim.c -lm -O3 Then run using the command: sim [var1=value1 [var2=value2 [...]]] Under Windows: -------------- You will need gcc. I'm not sure whether this works (I don't have a windows machine), but try the following. Download the program file and the data file into the same directory. Make sure the data file is decompressed (your browser may do this; if not try using winzip). Let us say the program file is called sim.c and the datafile fa2n. You will need to alter two or three lines in sim.c: fp=popen("gunzip -c fa2n.gz","r"); becomes fp=fopen("fa2n","r"); and pclose(fp); becomes fclose(fp); and if full pathnames are required to refer to files from programs then the first change should in fact have been to: fp=fopen("fred/jim/jane/fa2n","r"); (i.e. the full pathname of fa2n, whatever that is) and you will also need to make the change: fp=fopen("log","a"); to fp=fopen("fred/jim/jane/log","a"); Then you compile it in the usual way (use level 3 optimisation). Then you run it in the usual way, except that you need to be able to enter command line parameters. On Unix you'd type something like sim nr=3 bl=2 I don't know what the windows way of doing this is, but there must be one. I'd appreciate feedback on whether the above works, or on how to simplify the above description. Varying the parameters ====================== The default setup is the player 0 (P0), player 1 (P1) both ante 0.5, then P0 raises 0.5 (live), then P1 can fold, call or raise 1 unit, then P0 can fold, call or raise 1 unit (even if P1 called since P0 first blind raise was live), etc., up to a fixed number of non-blind actions each (default=1 each). In more usual terminology this translates to: P0 is the big blind, P1 is the small blind (sorry it's this way round), and the allowed raise is 1 (=1 big blind). You can change the betting structure etc. by using different command line variables. Variable Description Default -------- ----------- ------- nr max. number of actions each 1 r size of raise 1 an ante 0.5 bl blind r/2 lim limit flag 1 liv live blind flag 1 pr affects print width 6 Notes: (1) "blind" in the above table means the _extra_ money (in addition to the ante) the big blind (P0) has to put in the pot. Thus P1 blinds an, and P0 blinds an+bl. (2) If lim=1, r is the size of the allowed (limit) raise. If lim=0, r is the size of the allowed raise measured as the proportion of the pot after calling. So lim=0, r=1 is ordinary potlimit, and lim=0, r=0.5 is halfpotlimit. There is no spread limit: only the maximum raise is allowed. Examples: sim nr=2 "Standard" setup. P1 blinds 0.5, P0 blinds 1, Raise=1. sim nr=2 lim=0 "Standard" potlimit. P1 blinds 0.5, P0 blinds 1. e.g. P1 could then raise 2 (making it 3), P0 could then raise 6 (making it 9). sim nr=2 liv=0 bl=0.5 r=.1 P1 blinds 0.5, P0 blinds 1 (non-live), then the standard raise is 0.1 sim nr=2 liv=0 an=5 bl=5 P1 blinds 5, P0 blinds 10 (non-live), then the standard raise is 1. Of course this is equivalent to the previous example. There is another, different mode of behaviour, which is to solve the game of "numbers vs numbers". For example, each player is dealt one of the four numbers 0,1,2 or 3. 3 beats 2,1,0 and ties with 3, 2 beats 1,0, etc. To get this, comment out the line #define HOLDEM, recompile and run with the flag d=4 (say). This tends to be less well-behaved than Holdem, because the solution is very degenerate (ambiguous). See FAQ below. How to interpret the output =========================== Check that the numbers "Actual primal value" and "Actual dual value" are the same. If they are the same then (i) they are equal to the EV (from the point of view of P0) and (ii) the program output is valid (see "Problems" below). The strategy for each player is described in two parts: the "Pure bit" and the "Probabilistic bit". Each hand should appear in once part or the other, but not both. For P0 (the big blind), "hand" has a special meaning: r84s means that P0 has 84 suited and P1 has raised, c84s means that P0 has 84 suited and P1 has called. Thus P1 has 169 hands (as usual), but P0 has 169*2=338 "hands". R0F means fold R0C means call R1F means raise and fold-if-reraised R1C means raise and call-if-reraised R2F means raise, rereraise-if-reraised and fold-if-rerereraised etc. R2 means raise, rereraise-if-reraised, and the opponent is not allowed to rerereraise because of the limit on the number of rounds, so it is not necessary to say what to do after that. CR0C (only applicable to P1, the small blind) means call, then call if raised. CR1C (only applicable to P1, the small blind) means call, reraise if raised, then call if rereraised. etc. Here is some example output (not very realistic): Strategy for player 0 Pure bit: --- R0F --- --- R0C --- r22o c32o r32o c42o r42o c52o r52o c62o r62o c72o r72o c82o r82o c92o r92o cT2o [...] --- R1F --- rT2o cJ2o rJ2o rQ2o rK2o rA2o c32s r32s r33o c43o r43o c53o r53o c63o r63o c73o [...] Probabilistic bit: R0F R0C R1F R1C R2 c87o 0 0.00140499 0 0.998595 0 rTTo 0 0 0 0.595865 0.404135 Strategy for player 1 Pure bit: [...] --- CR1C --- KQo AQo KTs KJs KQs AQs [...] This means that P0 (the big blind) should (i) always call (check) when holding T2o, if P1 calls. (ii) always raise and fold-if-reraised when holding T2o, if P1 raises. (iii) raise and call-if-reraised with probability 0.595865 when holding TT, if P1 raises. (iv) raise and raise-if-reraised with probability 0.404135 when holding TT, if P1 raises. and P1 (the small blind) should call, reraise if raised, and call-if-rereraised when holding KQo. Problems ======== (i) When presented with difficult input parameters (e.g. potlimit with many betting rounds), the algorithm can fail to converge, or converge to an inaccurate value. As mentioned above, you will know when the output is reliable because "actual primal value" will the same as "actual dual value". (ii) The optimal strategy can be "degenerate". That means that there is more than one possible optimal strategy. In this case, it is not predictable which optimal strategy it will find, and it is possible that it will arrive at different optimal strategies on different computers due to differences in floating point implementation. It is also possible that in this case it will find a non-intuitive choice of optimal strategy. The Algorithm and Mathematics (FAQ) =================================== I may fill this section in later. In the meantime, here are some replies to emails which requested information Also the program contains a few comments which might be useful. ----------------------------------------------------------------------------- ... > I would love to see the program that computed that. I've worked out optimal > strategies for simple games before, but I've always just done it > algebraicially/by inspection. Applying simplex method to these situations > has always, for some reason I'm slightly ashamed of, seemed rather mystical > to me. If I saw code that did it, I think I'd finally get it. ... Probably though this program wouldn't be the best way to learn about how to get optimal strategies, because poker introduces lots of complicating factors (like the fact that the players get dealt cards. This makes it harder.) I can recommend these books if you want to learn about solving two player games. "The Theory of Games and Linear Programming", S. Vajda, "An Introduction to Linear Programming and the Theory of Games", S.Vajda, "Game Theory; Mathematical Models of Conflict", A.J.Jones. I think the first two books may be different editions of the same thing. This guy S.Vajda has written lots of similar sounding books, but I found the above books especially readable. ----------------------------------------------------------------------------- ... > > I am very interested in your program and your algorithms. > > I am somewhat familiar with game theory, but not with > computational techniques to solve what are apparently > very large problems such as the game you describe. > > Let me try to explain where I am coming from. > > Consider a somewhat simpler pre-flop Hold 'em game than > yours, for the purpose of illustration. We have a small > blind (1 chip) who acts first and a big blind (2 chips) > who acts second. > > The sb may either raise (to 4 chips) or fold. The BB may either > call (an additional 2 chips) or fold if the sb has raised. > Obviously he wins 3 chips if the sb folds. > > If neither party folds then they play showdown. > This is much like the game you described but with fewer > options. > > What are the pure strategies for each player? > There are 169 logically distinct 2-card holdings for each > player. For each one of these 169 holdings the sb may > raise or fold. We can think of a pure strategy as a > vector, 169 elements long, each element containing either > RAISE or FOLD. Hence there are 2**169 distinct pure > strategies that the sb can choose. Likewise, there are > 2**169 distinct pure strategies the bb can choose. > > Hence the game matrix is 2**169 x 2**169 !!!!!! You are right. You _could_ set it up so that there are 2**169 pure strategies. And of course you are right that this would be computationally useless. But there is a much better way to do it. The thing is, it is a good enough description of the strategy (of SB say) to write down a probability of RAISE, FOLD (adding up to 1) for each of the 169 holdings. (This is how you are likely to think about your strategy as a human.) So his strategy is described by 2*169 numbers, rather than 2**169 numbers. (You could reduce it to 169 numbers, but this is a small optimisation when there are more than two possible actions.) This is a bit confusing, because we are actually abandoning the "standard" way of writing a strategy as a combination of pure strategies (of which there would have to be 2**169 as you say). I.e. this is _not_ now a matrix game in standard form. But the nice fact is, the simplex algorithm can _still_ easily be used to solve for things in this form. Let us say: Variable i ranges over the "situations" for SB (i=1,2,...,169 in this example) Variable j ranges over the "actions" for SB (j=1,2 in this example) Variable k ranges over the "situations" for BB (k=1,2,...,169 in this example) Variable l ranges over the "actions" for BB (l=1,2 in this example) Let ev(i,j,k,l) be the expected value given this combination of situations and options. (Positive = good for the BB say.) (You need to be a bit careful in working out ev(i,j,k,l). For example there are several different terms contributing to the EV of AKo vs AQo according to which suits match up. I.e. when working out AKo vs AQo you need to add up AcKd vs AhQs, AcKd vs AhQc, etc.) Let p(i,j) be the probability that SB does action j when in situation i. Let P(i,k) be the probability of situtations i,k. ev() and P() can be combined. Only their product matters, so let EV(i,j,k,l)=ev(i,j,k,l)P(i,k). Then the BB will choose his best option in each situtation, so the EV for the BB is sum_k( max_l( sum_{i,j}p(i,j)EV(i,j,k,l) ) ) Therefore, SB's task is to choose p(i,j) to minimise the above expression. The only difference from the standard case is that one player is trying to minimise the _sum of_ the maximum of things, as opposed to just the maximum of things. But the above optimisation problem for SB can just as easily be written in simplex form. You just introduce the extra variables m(k) and rewrite it as Minimise: sum_k(m(k)) subject to the constraints For all l, m(k) >= sum_{i,j}p(i,j)EV(i,j,k,l) (and of course the constraints For all i, sum_j p(i,j)=1 For all i,j, p(i,j)>=0) So the upshot is that we have to deal with 2*169 x 2*169 matrices. Large, but not impossibly large. As mentioned in the news article, I found that you can speed up the computations immeasurably by starting off the simplex alg. in a good "pure" state. Here "pure" is being used to mean for each i, there is a unique j such that p(i,j)=1. For this method, strategies have to be described "in one go". That is, they are allowed to be contingent on random acts of nature (your cards), but can't be contingent on your past actions or your opponent's past actions. So you can't use a variable which is the probability of raising given that your opponent has three-bet. You have to say your possible strategies are (something like) "Call", "Fold", "Raise-1-then-call", "Raise-1-then-fold", "Raise-2-then-call", etc., with a variable for each. This means that if there are several _rounds_ of betting, the number of strategies starts to grow quite rapidly. I don't really have a good method for dealing with this at the moment! > > If you know the all-in equity of each two-card hand vs > every other 2-card hand (and I have such a table), then it > is 'trivial' to compute the EV for each entry in the game > matrix, and thus fill in the matrix with the correct values. > > But what I don't understand is how you go about solving > such a huge game theory matrix to produce the probabilities > of choosing each pure strategy (i.e., the mixed strategy). > > Obviously in theory we can eliminate dominated strategies > and produce a presumably much smaller matrix, but it is > still going to be enormous. > > And the more options each player has the (exponentially) > larger the game matrix grows. > > What does the 'simplex' method do that can handle such > a seemingly intractably large problem, or how does it > reduce the problem down to something that is tractable? > > Thanks for your time, and any insight you can share. > > > > Hope this is of help, ----- > Here are some more questions. > > Does this method guarentee convergence to the optimal solution > or can it get stuck in a local maximum? Good question. Since we're just using the simplex algorithm, your question is asking "does the simplex algorithm converge or can it get stuck?". The answer in principle is no. The simplex algorithm is a genuine algorithm which is guaranteed to converge to the best solution. Or at least to _a_ best solution. However this assumes that the all numbers it works with are kept to infinite precision. In practice the algorithm can actually get stuck in difficult cases. This mainly happens when there is a large amount of so-called degeneracy at the current vertex of the simplex. Here I am talking about my implementation of the simplex algorithm. There could be better implementations that don't get stuck so much (e.g. NAG?). Despite all this, it is very easy to check to see whether your answer is optimal (so there needn't be any uncertainty as to whether the algorithm has converged to the optimum or not), and it usually converges. > > I'm pretty sure I understand your equation for the BB's EV, but > I have a question. You take the MAX over L (where L ranges over > the BB's options given his hand). But we know that the optimal > strategy can be mixed. That is, it may be correct for the BB > to follow L[1] 60% of the time and L[2] 40% of the time. > How is this consistent with the fact that the maximization problem > only treats the 'best' L? The optimal counter-strategy to any single fixed strategy is pure. (Or at least, _an_ optimal counter-strategy is pure.) It's only when you try to find a counter-strategy to all other strategies that you need to use a mixed strategy. The reason (for the first statement of this paragraph) is, in the jargon, because the space of strategies is convex and the objective function is linear, so is maximised at an extreme point. Think of a dodecahedron or cube (or any convex polyhedron) in some random orientation. The highest point will always be at a vertex. > > I'm not really sure I understand how you get from the BB's EV > equation to the form in which the SIMPLEX method can be applied. > (I don't know much about linear programming. I did go and > read the FAQ for the linear programming newsgroup). Is the > form you cast the equations into the 'standard form' talked about > in basic linear programming or is there another layer of > complication here? I haven't read this FAQ, but there are no other layers of complication. I was just trying to show how to put SB's mimisation problem into standard form. The exact definition of standard form can vary, but one definition is minimise c.x subject to Ax>=b (c and x are vectors of length n, b is a vector of length m, A is a mxn matrix, A,b,c are given and x is the thing you are allowed to vary.) > Finally, something of a more practical nature. > > If I understand what is going on, if I were to hypothetically > supply appropriate values for EV(i,j,k,l), then, there could > exist a program which can take these numbers and feed them > into your existing program (or your program may already do > this) and take the output and spit out the optimal percentages > for doing each action for each possible hand type > (the optimal probability of doing the j's for each i, and the > l's for each k). Correct? Yes. That is what the program does. Well actually the division into two halves isn't totally clean because of the way it handles the fact that the big blind is live. (This is an annoying kludge.) > > In heads-up Hold 'em there are 169x169 possible i's and k's. > In the simple problem I proposed there are 2 options for the > SB and 2 options for the BB, making a total of 114244 numbers > which would have to be calculated before your program could > start crunching. Correct? Yes. Except that to handle the live blind, there are actually 169*2 i's and 169 k's. The big blind thinks of c65s (he has 65s and the SB has initially called) and r65s (he has 65s and the SB has initially raised) as two different hands. > > If each player has N options the amount of data therefore > goes up quadratically. What about solution time? How many > options per player can the thing realistically handle? Each step of the simplex alg. takes time proportional to the number of entries in the matrix (114244 in your example). It's basically doing a few multiplies/adds per entry. So it comes down to how many steps required by the simplex alg. and how much time to initialise the simplex alg.. You can look up the maximum number of steps taken by the simplex alg., but you will get an answer which is much bigger than the number of steps needed in practice, so that theoretical bound isn't of much use. In practice what seems to happen is that if you start with any old initialisation, the number of steps will be roughly equal to the number of rows (or is it columns? I can't remember). However the program starts at a locally optimal pure strategy, and then the simplex alg. only needs a few steps. In fact most of the time seems to be spent finding the initial optimal pure strategy, which probably takes time proportional to the number of entries in the matrix, but with a larger constant of proportionality. In summary, it depends heavily on the characterisitics of the particular problem. In poker each hand is somehow unique and special, which I think leads to almost-pure optimal stratgies and speeds up the search a lot. You could hope for time proportional to number-of-entries-in-matrix (114244 in your example) for this class of problems. In fact I think the simplex alg. is almost irrelevant when the strategies are so nearly pure. In practice I found that for limit holdem you could handle as many options as you had memory to keep the matrix, but for potlimit it got stuck if you gave it too many (probably because with several potlimit raises the numbers get big, which makes things unstable). But be warned that the number of options could grow very rapidly if you wanted to do anything remotely realistic. Just one round of betting keeps things fairly simple. ----------------------------------------------------------------------------- Exchange with Dr C. > Alex, > Thanks for the pointer to your preflop holdem page. > Fascinating stuff, as you say, but I've got a few questions. > 1. Just to check - I assume that the only legal raise is a pot raise? > 2. If R4 were allowed, would it affect the results? > 3. I assume R1 means Raise-Call, rather than Raise-Fold, > and similarly with R2, CR1 and so forth? > 4. Assuming that R1 means raise-call, I'll temporarily rename it to RC, > if you don't mind, rename R2 to RRC, rename CR1 to CRC and so on. > Okay, what about the strategies RF, RRF, CRF and so on > (Raise, but fold if oppo raises back etc.)? > Would it affect the results if these strategies were allowed? > 5. I ask 4 above particularly because I am surprised that I can't see > any genuine bluffs in your strategy, and bluffs are predicted > (albeit weakly) by our analysis of the [0,1] game > (by which I mean the game where each player receives an 0<=x<=1). > Not compelling evidence, but enough to worry me. > I'm wondering if maybe the bluffs should be played as RF, > and that's why you're not seeing them. > I'd expect (23o) to be played as 95%F + 5%RF or something like that. > a. > In general the program used (sim.c) to find these results is configurable, but the original Usenet article was written with the example of limit Holdem. By altering command-line parameters you can change the raising structure between fixed-limit and some-proportion-of-pot, and by altering a #define you can change the game between Holdem and numbers-vs-numbers (i.e., a discrete version of the [0,1] game). As far as I remember the numbers-vs-numbers works a bit less well (I mean the simplex algorithm converges less well for large pot-limit cases) than Holdem because the simplex is so degenerate (i.e., there is a large dimensional family of optimal strategies). I've only written a naive version of the simplex algorithm, and it tends to get stuck if there is massive degeneracy. In Holdem however, each hand is "special" which leads to not too much degeneracy and also an almost pure optimal strategy. This last fact makes the calculations much easier (because I take care to start the simplex algorithm it in a local optimal pure state). Answering your points: 1. (See above.) The program is configurable, but the posted Usenet results were for limit. 2. No (except in a trivial way). You can set the maximum number of raises on the command-line, and it eventually doesn't want to use any more. (Except for AA for which it is happy to raise any number of times, but this doesn't change the expectation because there is no other hand which is prepared to match it in raises.) 3. Yes 4. CR1F, R1F, etc. were allowed but they turned out not to be part of the optimal strategy for Holdem. 5. It likes to Call-Raise with some hands (e.g., KK) which is a kind of bluff. If you run the program in numbers-vs-numbers mode you should recover pure bluffs. I think I had the same thoughts as you, but interestingly Paul was not at all surprised that there are no pure-bluff Holdem hands. The other interesting thing that Paul knew intuitively beforehand was that the results would be quite meaningful in terms of real Holdem even though the calculation is only preflop. When he convinced me of this, it made it seem like much more of a worthwhile task to write the program. Alex PS Do you mind if I add the above email exchange (after anonymising) to my FAQ? > Thanks for the info. > I'm interested by the lack of bluffing, and now I have absolutely no intuition > about whether one should bluff in some other poker variant, i.e. what the > essential difference between preflop holdem and [0,1] is that affects > whether genuine bluffing is correct. > Lots of experimentation could be useful. > In a different life, this would make a good PhD subject! > > > PS Do you mind if I add the above email exchange (after anonymising) to my > > FAQ? > > No problem at all. > You don't have to anonymise on my account - I don't mind either way. > a. > Actually I was wrong that there is never bluffing. To compare like with like we should use an=0 bl=1 and r=1 or r=2, i.e., small blind is 0, (rather than half of the big blind as in the published example). In this case the big blind should occasionally run pure bluffs. I'll think about it some more. BTW exercise: if you run an=0 bl=1 it tells you that the big blind should fold some hands (e.g. 42s) when the small blind has only called, so the big blind could have checked it out for a free showdown, but decides to muck his hand. Why is this not necessarily a bug? (This had me worried before I realised what was happening.) Alex [Answer: because in this situtation small blind should fold or raise, never call. The only way to exploit BB's "mistake" of mucking 42s after a call, is for SB to call with something some of the time. But this presumably works out as a bigger "mistake". This is an example of a degenerate optimal strategy, and the program picking a non-intuitive representative. It could equally well have said "check" with 42s, but it is being confusing because it hasn't yet been instructed not to be. Another (simpler) example of something similar is numbers vs numbers using the parameters d=2 nr=1 an=0 bl=1, where BB may muck 0 after a SB call.] -----------------------------------------------------------------------------