Strategic Decision-Making in Complex Games

Symposium Thesis Defense Marc Ponsen
15 June 2011, 09:30–13:00
Department of Knowledge Engineering, Bouillonstraat 8-10, Maastricht
Room 0.015

Program

09:30 – 09:40 Welcome (Karl Tuyls)
09:40 Jan Ramon (Katholieke Universiteit Leuven)
Making trade-offs between information, safety and immediate reward in complex domains
Ever since the early days of artificial intelligence, games have formed an important testbed for research, as they provide controllable environments for which human experts are available. In this presentation we look at games in the broadest possible sense. We illustrate with several examples that methods developed for ‘pure’ games can be successfully applied to decision making in applications of significant importance in daily life. We point out a number of links concerning the difficult trade-off between information, safety and immediate reward, especially in the context of active learning and planning.
10:25 Marc Ponsen (Maastricht University)
Computing Approximate Nash Equilibria and Robust Best-Responses Using Sampling
This talk discusses two of our contributions in the field of decision-making in complex partially observable stochastic games, focusing on the game of Poker. First, we apply two state-of-the-art search techniques that use Monte-Carlo sampling to the task of computing a Nash-Equilibrium strategy (NES) in such games, namely Monte-Carlo Tree Search (MCTS) and Monte-Carlo Counterfactual Regret Minimization (MCCFR). Our second contribution relates to the observation that Nash-Equilibrium strategies are not necessarily best for dealing with clearly irrational opposition (i.e., players not playing a NES): a tailored best-response strategy can yield more profit. Pure best-response strategies, however, may themselves be exploited by any strategy other than the one they were learned against. We present Monte-Carlo Restricted Nash Response (MCRNR), a sample-based algorithm for the computation of restricted Nash strategies, which are robust best-response strategies that (1) exploit irrational opponents more than Nash-Equilibrium strategies do and (2) are not (overly) exploitable by other strategies.
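The regret-minimization principle underlying MCCFR can be illustrated with a minimal, hypothetical sketch (not the implementation discussed in the talk): two regret-matching players repeatedly play rock–paper–scissors against each other, sampling actions Monte-Carlo style, and their average strategies converge toward the uniform Nash equilibrium. All names and the dict/list representation below are illustrative assumptions.

```python
import random

random.seed(0)
A = 3  # actions: rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # utility of my action vs. opponent's

def regret_matching(regret):
    """Map cumulative positive regrets to a mixed strategy (uniform if all <= 0)."""
    pos = [max(r, 0.0) for r in regret]
    s = sum(pos)
    return [p / s for p in pos] if s > 0 else [1.0 / A] * A

regret = [[0.0] * A, [0.0] * A]     # cumulative regret per player
strat_sum = [[0.0] * A, [0.0] * A]  # accumulated strategies (for averaging)

for _ in range(50000):
    strat = [regret_matching(regret[i]) for i in range(2)]
    # Monte-Carlo step: sample one action per player from the current strategies
    acts = [random.choices(range(A), weights=strat[i])[0] for i in range(2)]
    for i in range(2):
        me, opp = acts[i], acts[1 - i]
        # regret of not having played each alternative action instead
        for alt in range(A):
            regret[i][alt] += PAYOFF[alt][opp] - PAYOFF[me][opp]
        for a in range(A):
            strat_sum[i][a] += strat[i][a]

# The average strategy (not the final one) approximates the Nash equilibrium
avg = [s / sum(strat_sum[0]) for s in strat_sum[0]]
print([round(p, 3) for p in avg])  # close to [1/3, 1/3, 1/3]
```

Note that it is the time-averaged strategy, not the last iterate, that converges to equilibrium; MCCFR applies this idea to the (counterfactual) regrets of every information set in the Poker game tree.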
11:10 Coffee break
11:30 Guy Van den Broeck (Katholieke Universiteit Leuven)
Monte-Carlo tree search for multi-player, no-limit Texas Hold’em Poker
This talk presents a number of adaptations that allow the application of Monte-Carlo Tree Search (MCTS) to the field of computer Poker, more specifically No-Limit Texas Hold’em. The hidden information in Poker results in so-called miximax game trees, where opponent decision nodes have to be modeled as chance nodes. The probability distribution in these nodes is given by an opponent model that predicts the actions of the opponents. We propose a modification of the standard MCTS selection and backpropagation strategies that explicitly models and exploits the uncertainty of sampled expected values. The new strategies are evaluated as part of a complete Poker bot that is, to the best of our knowledge, the first opponent-exploiting no-limit Texas Hold’em bot that can play at a reasonable level in games with more than two players.
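The distinction drawn above can be sketched in a few lines (an illustrative assumption, not the talk's actual system): at the bot's own decision nodes MCTS selects with a bandit rule such as UCB1, while at opponent decision nodes it samples from the opponent model's predicted action distribution, treating them as chance nodes. The dict-based node representation and function names are hypothetical.

```python
import math
import random

def ucb1_select(children, c=1.4):
    """At our own decision node: pick the child maximizing the UCB1 score
    (average value plus an exploration bonus); unvisited children first."""
    total = sum(ch["visits"] for ch in children)

    def score(ch):
        if ch["visits"] == 0:
            return float("inf")  # always try unvisited children once
        exploit = ch["value"] / ch["visits"]
        explore = c * math.sqrt(math.log(total) / ch["visits"])
        return exploit + explore

    return max(children, key=score)

def sample_opponent_action(opponent_model):
    """At an opponent decision node (treated as a chance node): sample an
    action from the opponent model's predicted distribution."""
    actions = list(opponent_model)
    weights = list(opponent_model.values())
    return random.choices(actions, weights=weights)[0]

# Tiny usage example with made-up statistics
children = [{"visits": 10, "value": 7.0}, {"visits": 3, "value": 2.5}]
print(ucb1_select(children))
print(sample_opponent_action({"fold": 0.2, "call": 0.5, "raise": 0.3}))
```

The talk's contribution goes further than this sketch by propagating the uncertainty of the sampled expected values through selection and backpropagation, rather than using only visit counts and value sums.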
12:15 David Aha (Naval Research Laboratory, USA)
Goal Reasoning
Autonomous agents are typically told what goal(s) they should pursue, and perhaps how to decompose them, but lack the ability to independently decide what goals they should pursue. The ability to determine one’s goals is crucial when interacting in complex environments where future state expectations can be violated, either because unexpected opportunities arise or because plans fail. Goal reasoning agents can dynamically self-select their goals by representing them explicitly, reasoning about the causes of unexpected states, selecting or formulating new goals in response, and managing their set of pending goals. I will summarize our progress and related work on investigating algorithms for goal reasoning, including studies pertaining to the control of agents in games and related training simulators.