Texas Hold'em: AI Now Outperforms Humans in Poker

baoshi.rao

Starting from the distinction between perfect- and imperfect-information games, the author explains how AI can defeat humans at Texas Hold'em poker.

Recently, influenced by friends around me, I developed a liking for Texas Hold'em, enjoying the psychological battles with other players and the thrill of "gambling." I’ve always believed that the greatest charm of Texas Hold'em lies in its blend of rationality, emotion, and courage, and that its sheer uncertainty sets it apart from other card games. It is not something you can master just by learning a few tricks or strategies that let you completely outplay your opponents.

After returning to school, I resumed my daily "battles" with AI, and to my surprise, I learned that AI has now surpassed humans in Texas Hold'em poker! In a 20-day competition, four professional players, Jason Les, Dong Kim, Daniel McAulay, and Jimmy Chou, took on the AI program Libratus. Over 120,000 hands were played, competing for a $200,000 prize. The final result was that "the human players were never ahead overall during the competition."

As the days passed, the gap between AI and human players became increasingly apparent.

In recent years, with the continuous advancement of technology, instances of computers defeating humans have become commonplace. As a "programmer girl," I’m somewhat of an insider, having participated in computer game and robotics competitions during my undergraduate studies and gained some basic understanding of AI during my master's. Today, I’ll explain from a rational perspective how AI manages to defeat humans.

Texas Hold'em is currently the most popular poker game in the world. There are two ways to win a hand: hold a stronger hand than everyone else at the showdown, or use your bets to bluff opponents into folding, even when their hands are better than yours. This mix of chance and bluffing is what makes Texas Hold'em so fascinating.

Why is it difficult for AI to defeat humans in Texas Hold'em? What’s the difference between Texas Hold'em and Go for AI? First, we need to understand the distinction between perfect-information games (like chess) and imperfect-information games (like poker).

Perfect-information games are those in which every player, when making a move, knows everything that has happened so far, including all earlier moves. Chess and Go are examples: in Go, both players can see every stone on the board and assess each other’s strengths and weaknesses.

In contrast, poker, negotiations, and business decisions are imperfect-information games: some information is hidden from the decision-maker, such as an opponent’s private cards or choices. In other words, the decision-maker does not fully know the situation they are operating in.

In Texas Hold'em, even if an opponent goes all-in, we don’t know what cards they hold. This information asymmetry forces players to adopt a "gamble" mentality. This is also why finance professionals and investors love playing Texas Hold'em.

Go is a zero-sum, perfect-information game: at any point, both players know the complete state of the game (perfect information), and one player’s win is exactly the other player’s loss (zero-sum). The game also ends after a finite number of moves. Because the game has finitely many states, a computer can in principle use brute-force enumeration to calculate all possible future moves, forming a massive search tree. The tree lists every possible continuation from the current position, and each subtree can be solved independently, which lets the computer work out a winning strategy.
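To make "brute-force enumeration" concrete, here is a minimal minimax sketch over a tiny, made-up game tree; the tree, its scores, and the names `minimax` and `toy_game` are hypothetical illustrations, not chess or Go. Leaves hold the final score from the first player’s point of view, the two players alternately pick the child that maximizes or minimizes that score, and each subtree is solved by its own recursive call, independently of its siblings.

```python
from typing import Union

# A game tree is either a leaf score (int, from the first player's view)
# or a list of child subtrees, one per legal move.
Tree = Union[int, list]


def minimax(node: Tree, maximizing: bool) -> int:
    """Return the game value of `node` by enumerating every line of play."""
    if isinstance(node, int):      # terminal position: the score is known
        return node
    # Solve each subtree on its own, then pick the best one for the mover.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Hypothetical two-move game: the first player chooses a branch,
# then the opponent chooses a leaf within that branch.
toy_game = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(toy_game, maximizing=True))
```

Here the opponent would answer each branch with its lowest leaf (3, 2, and 2), so the first player picks the first branch and the game value is 3. Real games differ only in size, which is exactly the problem.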

For example, imagine Xiao Ming, a child from an ordinary Chinese family, faces many choices in life. How can he reach the pinnacle of success? If we could list all his future possibilities and break down each choice into "sub-futures," we could calculate the path with the highest chance of success. (The example might not be perfect, but you get the idea.)

So, if we had infinite computing resources, we could break down a game into sub-games (listing all possibilities) and calculate the strategy with the highest chance of winning.

However, even for chess, which is far simpler than Go, the branching factor is around 40, so looking 20 moves ahead means examining about 40^20 positions (a 1 GHz processor would need 3,486,528,500,050,735 years to compute them all). And that’s just for chess.
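The year count above can be reproduced with a short calculation. It assumes one position is evaluated per clock cycle and a year has 365 days; these assumptions are not stated in the text, but they reproduce its figure exactly.

```python
# Back-of-the-envelope check of the chess estimate.
# Assumptions: one position evaluated per clock cycle, 365-day years.
BRANCHING_FACTOR = 40
DEPTH = 20                           # moves (plies) looked ahead
POSITIONS_PER_SECOND = 10**9         # 1 GHz, one position per cycle
SECONDS_PER_YEAR = 365 * 24 * 3600   # ignoring leap years

positions = BRANCHING_FACTOR ** DEPTH
years = positions // (POSITIONS_PER_SECOND * SECONDS_PER_YEAR)
print(f"{positions:.3e} positions, about {years:,} years")
```

Integer arithmetic is used for the year count so the 16-digit result is exact rather than rounded by floating point.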

To make the search tractable, researchers use techniques such as pruning (skipping branches that cannot affect the result) and other search optimizations to shrink the search and find a strong strategy within a limited time.
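The text does not name a specific pruning method; alpha-beta pruning is the textbook example for two-player zero-sum games, so the sketch below uses it. On a small, hypothetical game tree (nested lists whose leaves are final scores for the first player), `alphabeta` returns the same value full enumeration would, but stops exploring a branch as soon as it is clear the opponent would never allow it. The helper name and the `leaves_seen` counter are illustrative only.

```python
import math
from typing import List, Union

Tree = Union[int, list]


def alphabeta(node: Tree, maximizing: bool, alpha: float, beta: float,
              leaves_seen: List[int]) -> float:
    """Minimax value of `node`, skipping branches that cannot change the result.

    alpha: best score the maximizer is already guaranteed elsewhere
    beta:  best score the minimizer is already guaranteed elsewhere
    """
    if isinstance(node, int):
        leaves_seen.append(node)        # record every leaf actually evaluated
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, leaves_seen))
            alpha = max(alpha, value)
            if alpha >= beta:           # the minimizer will never allow this line
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, leaves_seen))
        beta = min(beta, value)
        if alpha >= beta:               # the maximizer already has something better
            break
    return value


toy_game = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen: List[int] = []
print(alphabeta(toy_game, True, -math.inf, math.inf, seen), len(seen))
```

On this tree it finds the value 3 after evaluating 7 of the 9 leaves: once the second branch reveals a 2, its remaining leaves are skipped. With good move ordering, alpha-beta can in the best case examine on the order of b^(d/2) positions instead of b^d.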

So when I saw AI defeat humans at Texas Hold'em, a game where so much information is hidden, I was a little excited: maybe one day robots will have a woman’s sixth sense, too.

How exactly does AI defeat humans?

The paper behind this result describes many complex algorithms (which I don’t fully understand myself, haha). To keep things simple, let’s use a toy game to illustrate how the AI reasons.

Players A and B play a game. A flips a coin, and only A can see the result. A then has two choices: ① sell the coin; ② play a game with B.

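① If A chooses to sell: A receives 0.5 points if the coin landed heads and loses 0.5 points if it landed tails.

② If A chooses to play: the game continues, and B must guess whether the coin landed heads or tails. If B guesses wrong, A scores 1 point; if B guesses right, A loses 1 point.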

For B, this is an imperfect-information game: B cannot see the coin, so when A chooses to play, B has to guess without knowing how it landed.

In one extreme case, if B always guesses heads, a clever A would adjust: sell when the coin lands heads and play when it lands tails, so that B never wins. A’s expected score would be:

0.5 (probability of heads) * 0.5 (score for selling) + 0.5 (probability of tails) * 1 (score for playing) = 0.75

If B always guesses tails, A would play if the coin lands heads (earning 1 point) and sell if it lands tails (losing 0.5 points). A’s expected score would be:

0.5 (probability of heads) * 1 (score for playing) + 0.5 (probability of tails) * (-0.5) (score for selling) = 0.25
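The two extreme cases above are instances of one calculation: A’s best achievable score when A knows how often B guesses heads. The sketch below assumes A scores +1 when B guesses wrong and -1 when B guesses right, and that selling is worth 0.5 on heads and -0.5 on tails (the values used above). The function name `best_response_value` is hypothetical.

```python
def best_response_value(p_guess_heads: float,
                        sell_heads: float = 0.5,
                        sell_tails: float = -0.5) -> float:
    """A's expected score if A knows B guesses heads with probability p.

    Assumed payoffs: selling is worth sell_heads / sell_tails to A;
    playing gives A +1 when B guesses wrong and -1 when B guesses right.
    """
    p = p_guess_heads
    play_heads = p * -1 + (1 - p) * 1    # coin is heads: B is right with prob. p
    play_tails = p * 1 + (1 - p) * -1    # coin is tails: B is wrong with prob. p
    # A sees the coin, so A picks the better option separately for each side.
    return 0.5 * max(sell_heads, play_heads) + 0.5 * max(sell_tails, play_tails)


print(best_response_value(1.0))   # B always guesses heads
print(best_response_value(0.0))   # B always guesses tails
```

These reproduce the 0.75 and 0.25 above. The `sell_heads` and `sell_tails` parameters are there so you can see how B’s exposure changes when selling is worth more or less.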

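This leads to the concept of Nash equilibrium: a combination of strategies in which no player can do better by changing only their own strategy. Here, B’s equilibrium strategy is to guess heads 25% of the time and tails 75% of the time; against it, A’s expected score falls to 0, the lowest B can force.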
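Where the 25% comes from: let p be the probability that B guesses heads, and assume (as in the calculations above) that A scores +1 if B guesses wrong, -1 if B guesses right, and 0.5 or -0.5 for selling on heads or tails. A sees the coin and takes the better option on each side, so A’s best achievable score V_A(p) is:

```latex
\begin{aligned}
v_{\text{play}}(H) &= p\cdot(-1) + (1-p)\cdot 1 = 1 - 2p,\\
v_{\text{play}}(T) &= p\cdot 1 + (1-p)\cdot(-1) = 2p - 1,\\
V_A(p) &= \tfrac12\max\!\left(0.5,\; 1-2p\right) + \tfrac12\max\!\left(-0.5,\; 2p-1\right) = \left|\,p - \tfrac14\,\right|,\\
\min_{p\in[0,1]} V_A(p) &= V_A\!\left(\tfrac14\right) = 0 .
\end{aligned}
```

At p = 1/4, playing is worth exactly as much as selling on either side of the coin (0.5 on heads, -0.5 on tails), so A has nothing to exploit. Any other p gives A a positive expected score, which recovers 0.75 at p = 1 and 0.25 at p = 0.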

Because the game is dynamic, if B sticks to a fixed strategy, A will adjust accordingly. B’s best guessing mix also depends on how much A can get by selling, even though selling never reaches the guessing stage, so B has to keep track of the expected payoff of A’s selling option to find the optimal strategy.

Our clever computer works on the same principle: it keeps calculating the expected returns of its opponents’ options and "continuously updates its strategy," ultimately achieving the result that "the human players were never ahead overall."
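The article does not spell out Libratus’s algorithm. As a hedged illustration of "continuously updating strategies," the sketch below runs plain regret matching in self-play on the coin game. This is a much-simplified relative of the counterfactual regret minimization (CFR) family that poker programs such as Libratus build on, not Libratus’s actual method. Assumptions: the game is zero-sum (B’s score is minus A’s), A scores +1 or -1 when B guesses wrong or right, and selling is worth 0.5 on heads and -0.5 on tails. All names are hypothetical. Each player repeatedly shifts probability toward the choices it regrets not having made.

```python
import itertools
from typing import Dict, List

# Assumed payoffs (zero-sum, scores from A's point of view).
SELL_VALUE = {"H": 0.5, "T": -0.5}   # A's value for selling, by coin side
GUESSES = ["H", "T"]                 # B's options: guess heads or tails
# A's pure plans: what to do on heads and on tails, e.g. {"H": "sell", "T": "play"}
PLANS = [dict(zip("HT", acts)) for acts in itertools.product(("sell", "play"), repeat=2)]


def payoff_to_a(plan: Dict[str, str], guess: str) -> float:
    """A's expected score for one plan of A against one guess of B."""
    total = 0.0
    for side in ("H", "T"):                  # each side comes up with prob. 0.5
        if plan[side] == "sell":
            total += 0.5 * SELL_VALUE[side]
        else:                                # play: +1 if B is wrong, -1 if right
            total += 0.5 * (-1.0 if guess == side else 1.0)
    return total


def regret_matching(regrets: List[float]) -> List[float]:
    """Mix actions in proportion to positive cumulative regret (uniform if none)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [r / total for r in positive]
    return [1.0 / len(regrets)] * len(regrets)


def self_play(iterations: int = 50_000) -> List[float]:
    """Return B's average guessing frequencies [P(heads), P(tails)]."""
    u = [[payoff_to_a(plan, g) for g in GUESSES] for plan in PLANS]
    regret_a = [0.0] * len(PLANS)
    regret_b = [0.0] * len(GUESSES)
    strategy_sum_b = [0.0] * len(GUESSES)
    for _ in range(iterations):
        sa = regret_matching(regret_a)
        sb = regret_matching(regret_b)
        # Value of each pure choice against the opponent's current mix.
        ua = [sum(sb[j] * u[i][j] for j in range(len(GUESSES))) for i in range(len(PLANS))]
        ub = [-sum(sa[i] * u[i][j] for i in range(len(PLANS))) for j in range(len(GUESSES))]
        va = sum(p * x for p, x in zip(sa, ua))   # A's value of its current mix
        vb = sum(p * x for p, x in zip(sb, ub))   # B's value of its current mix
        # Regret: how much better each pure choice would have done.
        regret_a = [r + x - va for r, x in zip(regret_a, ua)]
        regret_b = [r + x - vb for r, x in zip(regret_b, ub)]
        strategy_sum_b = [s + p for s, p in zip(strategy_sum_b, sb)]
    total = sum(strategy_sum_b)
    return [s / total for s in strategy_sum_b]


p_heads, p_tails = self_play()
print(f"B guesses heads {p_heads:.1%} and tails {p_tails:.1%} of the time")
```

In two-player zero-sum games, regret matching only guarantees that the average strategy approaches equilibrium, which is why `self_play` returns B’s averaged frequencies rather than its last strategy. Here that average drifts toward guessing heads 25% of the time.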

It seems that to outsmart AI, humans might need to think even faster—perhaps having no fixed strategy is the best strategy.

They say finance moguls love Texas Hold'em. After reading this, do they have any new insights?