Ever wonder why it is virtually impossible for an average person to beat a computer at strategic board games like chess, Go, or tic-tac-toe? That’s because computers can literally see into the future (or futures, to be precise) of the game and choose the upcoming move accordingly. The class of algorithms exploited in this context is known as Adversarial Search. The core motive of such algorithms is to play defensively and prevent the opponent from winning at any cost. So unless you sit down with a piece of paper and simulate the outcomes up to a certain depth before finalizing any move, the odds are pretty high that you lose.
The Minimax Algorithm
As stated earlier, AIs that employ adversarial search play in a defensive manner, with the prime target not of winning, but of minimizing the loss as much as the situation allows. This is known as loss aversion. Note that in decisive games like those mentioned above, the entire game can be represented by a decision tree, where each sequence of decisions leads to a specific, unique leaf.
In the Minimax algorithm, the terminal nodes (called leaves) are each assigned a value (more on this value assignment later). Two players, called Min (traditionally the AI player) and Max (the human opponent), take turns; the initial move is always made by Max. Max tries to maximize the utility on their turn, while Min tries to minimize it on theirs. Here’s a typical game tree,
Notice that the first move is made by Max and the game continues with alternating players. The algorithm as written in plain English is,
1. Find all the possible moves from the current stage of the game.
2. For each possible move, recur down the hierarchy until a leaf is found.
3. If the current node is a leaf, do the following:
3a. If the player is Max, return the maximum of the scores of the current leaf node and its siblings.
3b. Else, return the minimum.
4. After the recursion completes and the algorithm returns to the site of the initial call, decide, based on the computed scores and the current player (whether Min or Max), which path to take.
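The steps above can be sketched in Python. To keep the sketch self-contained, the game is represented as an explicit tree (a node is either a terminal score or a list of child nodes) rather than a real board; this is an illustrative assumption, not the article's implementation.

```python
# Minimal minimax over an explicit game tree: a node is either a
# terminal score (an int) or a list of child nodes.

def minimax(node, is_max):
    # Leaf: return its utility directly.
    if isinstance(node, int):
        return node
    # Recur into every child, then pick the max or the min
    # depending on whose turn it is.
    scores = [minimax(child, not is_max) for child in node]
    return max(scores) if is_max else min(scores)

# Max moves first on this tiny two-level tree; Min replies on the
# second level, so the best Max can guarantee is 3.
tree = [[3, 5], [2, 9]]
print(minimax(tree, True))  # → 3
```

Note the back-tracking nature: the scores only exist at the leaves, and the recursion carries them back up, alternating max and min at each level.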
The algorithm is a back-tracking one.
How long does it take?
Say the time it takes to expand a node is a constant c, and we’ve got a tree with a branching factor of 3, i.e. each time a node is expanded, it expands into 3 more nodes (usually the branching factor decreases with depth, as in tic-tac-toe, as you’ll see later in this episode). Observe: initially there is one node, the root, so the number of nodes at level 0 is 3⁰ (= 1). It expands into 3 nodes since the branching factor is 3, so at level 1 there are 3¹ nodes. In the next level, each of the 3 nodes expands into 3 more nodes, so at level 2 there are 3×3 = 9 = 3² nodes. Noticed the pattern? At level d there are 3^d nodes. In general, for branching factor b, there are b^d nodes at level d.
Now the total effort required to expand all the nodes is the sum of the effort at each level: at level 0 it’s c·3⁰ = c, at level 1 it’s c·3¹ = 3c, and so on. The total effort for branching factor b and depth d is,
total effort = c·b⁰ + c·b¹ + c·b² + … + c·b^d
Considering the largest term and omitting the constant part, the time complexity becomes O(b^d).
For games like chess, the average number of possible moves at any given position is about 35 and the average game depth is about 100, which means that simulating a chess game from the start would require about 35¹⁰⁰ ≈ 2.55×10¹⁵⁴ nodes to be expanded, and that’s only the average, not the worst case possible!
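The arithmetic is easy to check directly (Python integers have no overflow, so 35¹⁰⁰ can be computed exactly):

```python
import math

# Number of leaf nodes for branching factor b and depth d is b**d.
b, d = 35, 100
nodes = b ** d

# Order of magnitude matches the ~2.55e154 figure above.
print(f"{nodes:.2e}")                 # → 2.55e+154
print(math.floor(math.log10(nodes)))  # → 154
```

For comparison, the number of atoms in the observable universe is usually estimated at around 10⁸⁰, which is why expanding this tree exhaustively is hopeless.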
Can we do better?
Minimax is a great leap forward for AIs competing with human counterparts, but it is way too naive for most strategic games. The fact that it expands each and every node down to the leaves makes it stumble. The AI can instead be taught to cut off branches that do not seem promising enough. This is called α-β pruning. It is not an algorithm by itself, but rather something like Minimax 2.0.
The fundamental idea is to maintain two variables, alpha and beta, that keep track of how useful it is to expand the next node: alpha holds the best score Max is already guaranteed, and beta the best score Min is already guaranteed. Whenever alpha ≥ beta, the remaining siblings cannot affect the result and can be skipped. The algorithmic pseudocode for a tic-tac-toe with α-β pruning is given in the next section.
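The pruning idea in isolation can be sketched over the same explicit-tree representation used earlier (a sketch under that assumption, not the article's tic-tac-toe version):

```python
import math

# Alpha-beta pruning: alpha is the best score Max can already
# guarantee on the path so far, beta the best score Min can already
# guarantee. When alpha >= beta, the remaining siblings are skipped.

def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf):
    if isinstance(node, int):
        return node
    if is_max:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # prune: Min will never let the game reach here
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # prune: Max already has a better option elsewhere
    return best

# Same value as plain minimax on this tree, but the 9 in the second
# branch is never visited: once Min finds the 2, alpha (3) >= beta (2).
print(alphabeta([[3, 5], [2, 9]], True))  # → 3
```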
The pseudocode can almost readily be translated into a running Python implementation with a few syntactic changes.
A couple of functions need to be implemented to get the flavor of how the algorithm runs.
- +/- inf denote positive and negative infinity. winning() is a utility function that checks the board configuration to tell whether a player is winning. findEmptySlots() runs through every cell to find the empty ones. The logic here is to return -10 upon losing, +10 upon winning, and 0 on a tie — from the perspective of Min (the AI player) — with depth subtracted as compensation, so that quicker wins (and slower losses) are preferred.
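One way these helpers could be realized for a 3×3 board stored as a flat list of 'X', 'O', or '' (empty). The names follow the ones used above, but the exact signatures and the sign convention of the score are assumptions of this sketch:

```python
# All eight winning lines of a 3x3 board, as index triples.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winning(board, player):
    # True if `player` occupies any full row, column, or diagonal.
    return any(all(board[i] == player for i in line) for line in WIN_LINES)

def findEmptySlots(board):
    # Indices of every empty cell.
    return [i for i, cell in enumerate(board) if cell == '']

def score(board, depth, ai='O', human='X'):
    # +10 if the AI wins, -10 if it loses, 0 otherwise; depth is
    # subtracted so quicker outcomes dominate. Which player this is
    # maximized or minimized for must match the search above.
    if winning(board, ai):
        return 10 - depth
    if winning(board, human):
        return -(10 - depth)
    return 0

board = ['O', 'O', 'O',
         'X', 'X', '',
         '',  '',  '']
print(score(board, depth=2))  # → 8
```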
Let’s actually employ the algorithm in a near-end tic-tac-toe game and see it in action. Consider the following stage of the game, O for the AI, X for the human.
It’s got 3 empty slots. From here on, 11 new nodes can be expanded as follows; obviously there’s no single move at this stage that can lead to the victory of the AI,
Here’s the convention,
- the tuple inside each node is unique by at least one value
- the first value denotes the number of the node in order of traversal, e.g. node (3, 1, 2) is the 3rd node to be visited
- the second value denotes the depth, and the third denotes the number of empty slots at this stage
- the numbers on the edges also denote the depth; the red numbers below each leaf denote the final score, +/- (10 - depth)
- finally, the red-circled nodes are the ones pruned
Give it a go yourself; the algorithm should prune off 4 nodes.
How much better alpha-beta pruning is compared to Minimax depends mostly on the ordering of the nodes. If the left-most branch contains the best path, the algorithm prunes most of the other branches; if not, it takes as much time as Minimax, or slightly more, owing to the bookkeeping of the alpha and beta variables. With good move ordering, the complexity comes down to about O(b^(d/2)) = O(sqrt(b^d)).
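The saving can be measured directly by counting visited nodes on the same tree with both searches. This is a self-contained experiment on a random full tree (branching factor 3, depth 6), not a tic-tac-toe measurement:

```python
import math
import random

def make_tree(b, d, rng):
    # A full tree of branching factor b and depth d with random leaves.
    if d == 0:
        return rng.randint(0, 99)
    return [make_tree(b, d - 1, rng) for _ in range(b)]

def minimax(node, is_max, counter):
    counter[0] += 1  # count every visited node
    if isinstance(node, int):
        return node
    scores = [minimax(c, not is_max, counter) for c in node]
    return max(scores) if is_max else min(scores)

def alphabeta(node, is_max, counter, alpha=-math.inf, beta=math.inf):
    counter[0] += 1
    if isinstance(node, int):
        return node
    best = -math.inf if is_max else math.inf
    for child in node:
        val = alphabeta(child, not is_max, counter, alpha, beta)
        if is_max:
            best = max(best, val)
            alpha = max(alpha, best)
        else:
            best = min(best, val)
            beta = min(beta, best)
        if alpha >= beta:
            break  # prune the remaining siblings
    return best

tree = make_tree(3, 6, random.Random(0))
c_plain, c_pruned = [0], [0]
v_plain = minimax(tree, True, c_plain)
v_pruned = alphabeta(tree, True, c_pruned)
assert v_plain == v_pruned              # same answer...
print(c_plain[0], c_pruned[0])          # ...fewer nodes visited
```

Plain minimax always visits all (3⁷ − 1)/2 = 1093 nodes of this tree, while the pruned search visits noticeably fewer even with no deliberate move ordering.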