An Investigation of an Adaptive Poker Player - Introduction
September 4th, 2008Game playing has a long research history. Chess has received particular interest
culminating in Deep Blue beating Kasparov in 1997, albeit with specialized hardware
(Hamilton, 1997) and brute force search. However, although arguably, being a ‘solved
game’ chess still receives interest as researchers turn to adaptive learning techniques
which allow computers to ‘learn’ to play chess, rather than being ‘told’ how it should
play (Kendall, 2001). Adaptive learning was being used for checkers as far back as
the 1950’s with Samuel’s seminal work (1959, re-produced in Samuel, 2000).
Checkers research would lead to Jonathan Schaeffer developing Chinook, which
claimed the world title in 1994 (Schaeffer, 1996). Like Deep Blue, it is arguable if
Chinook used AI techniques. Chinook had an opening and ending database. In certain
games it was able to play the entire game from these two databases. If this could not
be achieved, a form of mini-max search, with alpha-beta pruning was used. Despite
Chinook becoming the world champion, the search has continued for an adaptive
checkers player. Chellapilla and Fogel’s (Chellapilla, 2000) Anaconda was named
due to the strangle hold it placed on its opponent. It is also named Blondie24, this
being the name it used when competing in internet games (Fogel, 2001). Anaconda
uses an artificial neural network (ANN), with 5000 weights, which are evolved by an
evolutionary strategy. The inputs to the ANN are the current board position and it
outputs a value which is used in a mini-max search. During the training period, using
co-evolution, the program is given no information other than whether it won or lost.
Once Anaconda is able to play at a suitable level, it often searches to a depth of 10,
but depths of 6 and 8 are also common in play. Anaconda has been available to the
delegates at the Congress on Evolutionary Computing (CEC) conference for the past
two years (CEC’00, San Diego and CEC’01, Seoul) with Fogel offering a prize of
$100 (CEC’00) and $200 (CEC’01) to anybody who could defeat it. The prize
remains unclaimed and at the next conference (CEC’02, Hawaii), the prize rises to
$300.
Poker also has an equally long research history with von Neumann and Morgensten
(von Neumann, 1944) experimenting with a simplified, two-player, version of poker.
Findler (Findler, 1977) studied poker, over a 20 year period. He also worked on a
simplified game, based on 5-card draw poker with no ante and no consideration of
betting position due to the computer always playing last. He concluded that dynamic
and adaptive algorithms are required for successful play and static mathematical
models were unsuccessful and easily beaten.
In more recent times three research groups have been researching poker. Jonathan
Schaeffer (of Chinook fame) and a number of his students have developed ideas
which have led to Loki, which is, arguably, the strongest poker playing program to
date. It is still a long way from being able to compete in the World Series of Poker
(WSOP), an annual event held in Las Vegas, but initial results are promising.
Schaeffer’s work concentrates on two main areas (Billings, 1998a and Schaeffer,
1999). The first research theme makes betting decisions using probabilistic
knowledge (Billings, 1999) to determine which action to take (fold, call or raise)
given the current game state. Billings et. al. also uses real time simulation of the
remainder of the game that allows the program to determine a statistically significant
result in the program’s decision making process. Schaeffer’s group also uses
opponent modeling (Billings, 1998b). This allows Loki to maintain a model of an
opponent and use this information to decide what betting decisions to make.
Koller and Pfeffer (Koller, 1997), using their Gala system, allow games of
imperfect information to be specified and solved, using a tree based approach.
However, due to the size of the trees they state “…we are nowhere close to being able
to solve huge games such as full-scale poker, and it is unlikely that we will ever be
able to do so.”
Luigi Barone and Lyndon While recognise four main types of poker player; Loose,
Tight, Passive, and Aggressive. These characteristics are combined to create the four
common types of poker players: Loose Passive, Loose Aggressive, Tight Passive and
Tight Aggressive players (Barone & While, 1999; 2000). A Loose Aggressive player
will overestimate their hand, raising frequently, and their aggressive nature will drive
the pot higher, increasing their potential winnings. A Loose Passive player will
overestimate their hand, but due to their passive nature will rarely raise, preferring to
call and allow other players to increase the pot. A Tight Aggressive player will play
to close constraints, participating in only a few hands which they have a high
probability of winning. The hands they do play, they will raise frequently to increase
the size of the pot. A Tight Passive player will participate in few hands, only
considering playing those that they have a high probability of winning. The passive
nature implies that they allow other players to drive the pot, raising infrequently
themselves.
In their first paper Barone and While (Barone, 1998) suggest evolutionary
strategies as a way of modelling an adaptive poker player. They use a simple poker
variant where each player has two private cards, there are five community cards and
one round of betting. This initial work incorporates three main areas of analysis; hand
strength, position and risk management. Two types of tables are used, a loose table
and a tight table. The work demonstrates how a player that has evolved using
evolutionary strategies can adapt its style to the two types of table.
In (Barone, 1999) they develop their work by introducing a hypercube which is an
n dimensional vector, used to store candidate solutions. The hypercube has one
dimension for the betting position (early, middle and late) and another dimension for
the risk management (selected from the interval 0..3). At each stage of the game the
relevant candidate solutions are selected from the hypercube (e.g. middle betting
position and risk management 2) and the decision is made whether to fold, call or
raise. To make the decision the hypercube entry holds seven real valued numbers
which are used as constants to three functions (fold, call and raise). In effect, the
functions lead to a probability of carrying out the relevant action. It is the seven real
values that are evolved depending on whether the player won the hand or not. Barone
reports that this poker player improves on the 1998 version. Their 2000 paper
(Barone, 2000) extends the dimensions of the hypercube to include four betting
rounds (pre-flop, post-flop, post-turn and post-river) and an opponent dimension so
that the evolved player can choose which type of player it is up against. The authors
report this player out performs a competent static player.
Poker, being a game of imperfect information, is interesting as a game for the basis
of research. Unlike chess and checkers, poker has some information that is unseen.
Poker also contains other unknowns such as the playing styles of the other players
who may use bluffing (and double bluffing) during the course of the game. These
elements add to the research interest. Unlike complete information games where the
techniques to solve the games (computational power allowing) have been known and
understood for a long time (such as mini-max search and alpha-beta pruning), games
of imperfect information have not received the same sort of analysis and, doing so,
could prove relevant to many other areas such as economics, on-line auctions and
negotiating.