[Rod Stephens Books]
Index Books Python Examples About Rod Contact
[Mastodon] [Bluesky]
[Build Your Own Ray Tracer With Python]

[Beginning Database Design Solutions, Second Edition]

[Beginning Software Engineering, Second Edition]

[Essential Algorithms, Second Edition]

[The Modern C# Challenge]

[WPF 3d, Three-Dimensional Graphics with WPF and C#]

[The C# Helper Top 100]

[Interview Puzzles Dissected]

Title: Add Skill Level 3 to the tic-tac-toe game in Python

[This Python tic-tac-toe program uses minimax to play a perfect game.]

(Sorry it took so long to get back to the tic-tac-toe series. I got distracted.😉)

Recall that the post Make a tic-tac-toe game in Python made a basic tic-tac-toe (noughts and crosses) game. Because it moved randomly, it was fairly easy to beat.

The post Add Skill Level 2 to the tic-tac-toe game in Python added Skill Level 2 where the program looked for a win or a block and otherwise moved randomly.

This example improves those by adding Skill Level 3. At this level, the program uses the minimax algorithm to search the game tree for the best solution possible.

Minimax

To understand how minimax works, suppose you're in the middle of the game and it's time for the computer to move. The minimax algorithm examines all possible moves and asks, "If the computer took this move, what is the best possible score the human could achieve from that board position?"

In tic-tac-toe, a score is one of WIN, LOSE, or TIE. More generally, it's useful to have a few other scores like UNKOWN (we don't have enough information to know what will happen) and NONE (no move has been made yet).

This example's TicTacToeApp defines constants to represent possible scores like this:

class TicTacToeApp: # Define board values. SCORE_NONE = 0 # No move possible. SCORE_LOSE = 1 # Not good. SCORE_TIE = 2 # Better than losing. SCORE_UNKNOWN = 3 # We still *might* win. SCORE_WIN = 4 # Best outcome.

These are arranged so larger values are better outcomes. Here I decided to be optimistic and assume that UNKNOWN is better than TIE because we still might win.

Philosophical Digression

These score values are enough for tic-tac-toe, but they don't provide enough detail for more complex games like chess, reversi, or Go because, in those games, you generally cannot tell what the eventual outcome will be.

There's actually an interesting philosophical idea here. Suppose you assume that a tie goes player X. Then there are two possibilities. First, player X can make a series of perfect moves to always force a win. Second, there is no way for player 1 to force a win, so there must be a way for player 2 to force a win.

For tic-tac-toe, that's pretty obvious. If you've played much (or even not all that much), then you probably know that player 1 cannot lose without making some mistakes.

The interesting thing is that this also applies to other deterministic games like chess. (It does not apply to non-deterministic games like poker or backgammon where randomness is a factor.) That means that for chess there is either (1) a way for player 1 to always win or (2) a way for player 2 to always win.

The catch is that there are so many possible moves and board configurations that no human (or computer at this point) can find a perfect strategy. This is related to computability theory and complexity theory in computer science. We know that it is conceivable that you could find a winning strategy because, in theory, you could enumerate every possible move and board position. In practice, the number of possible positions is ridiculously huge so, while this isn't an instance of the halting problem, it is a problem that's much too large to actually solve with today's techniques.

All of this means the simple LOW/TIE/UNKNOWN/WIN scores aren't sufficient. For those games, you need to calculate some sort of numeric estimate of a board's strength so you can compare results.

Minimax Code

If you have the program set to Skill Level 3, the following code executes when it's the computer's turn to move.

def computer_move_level_3(self): '''Use minimax.''' # Get the best move. best_value, best_move = self.get_board_value( self.computer_char, self.player_char, 1, 9) if best_move is None: print(f'Best move: {best_move}, value: {best_value}') # Make a Lveel 2 move. self.computer_move_level_2() else: print(f'Best move: [{best_move[0]}, {best_move[1]}], value: {best_value}') # Take the move. self.squares[best_move[0]][best_move[1]]['text'] = self.computer_char

The most important part of this code is the first statement where it calls get_board_value. That method, which is shown in the following code, evaluates the value of the board for the computer.

def get_board_value(self, player1, player2, depth, max_depth): '''Return the best value the board can have for player char.''' # If we are too deep, then we don't know. if depth > max_depth: return TicTacToeApp.SCORE_UNKNOWN, None # Get possible moves. moves = self.get_possible_moves() # If there are no moves, it's a tie. if len(moves) == 0: return TicTacToeApp.SCORE_TIE, None # Find the move that gives the other player the worst score. random.shuffle(moves) best_move = None player2_value = TicTacToeApp.SCORE_WIN for r, c in moves: # Have player1 take this square. self.squares[r][c]['text'] = player1 # See if this gives player1 a win. if self.is_winner(r, c): # This gives player1 a win and therefore player2 a loss. # Take this move. best_move = (r, c) player2_value = TicTacToeApp.SCORE_LOSE else: # Recursively try moves for player2. test_value, test_move = \ self.get_board_value(player2, player1, depth + 1, max_depth) # See if this is an improvement for player 2. if test_value <= player2_value: best_move = (r, c) player2_value = test_value # Unmake this move. self.squares[r][c]['text'] = '' # If player2 will lose, stop searching and take this move. if player2_value == TicTacToeApp.SCORE_LOSE: break # We now know the worst we can force player2 to do. # Convert that into a board value for player1. if player2_value == TicTacToeApp.SCORE_LOSE: best_value = TicTacToeApp.SCORE_WIN elif player2_value == TicTacToeApp.SCORE_WIN: best_value = TicTacToeApp.SCORE_LOSE else: best_value = player2_value return best_value, best_move

The get_board_value method returns the board's value for player1.

It first checks the current depth of recursion to see if we have exceeded the maximum allowed depth. This doesn't matter for program because it allows up to 9 levels of recursion, which is enough to examine every possible game. You could set the maximum depth to something smaller like 3 to create Skill Level 2.5. You might also want to adjust this for more complicated games like chess.

If the depth of recursion is greater than the maximum allowed, the method returns UNKNOWN. (If you look back at the computer_move_level_3 method, you'll see that it calls computer_move_level_2 in this case.)

If we have not exceeded the allowed depth, the program calls get_possible_moves to get a list of all possible moves.

If there are no more possible moves, then the board is filled and the game is a tie. (If it wasn't a tie, then the previous move would have won the game.)

If there are possible moves, the program randomizes them so it doesn't always make the same move in response to a given board position.

The method initializes player2_value to the best possible value that player 2 could get from the board: WIN. It then loops through the possible moves.

For each move, the code takes that move for player 1. If that move lets player 1 win, the program saves that move and sets player2_value to LOSE.

If the move doesn't give player 1 a win, the get_board_value method calls itself recursively to see what the best possible result could be for player 2. If that's a better result than the best result found so far, the code updates best_move and player2_value.

After checking the current move, the program unmakes that move.

If the best value that player 2 can get from this move is a loss, that's the best possible outcome for player 1, so the code breaks out of the loop to take that move.

After it has check all of the moves, the method knows what the worst possible move is for player 2. It converts that into the best possible move for player 1 (for example, if player 2 wins, then player 2 loses) and returns the result.

Improvements

That's minimax in a nutshell: Examine every possible move, see which one gives your opponent the worst outcome, and make that move.

This isn't too hard for a game like tic-tac-toe, but the details can sometimes be tricky for more complex games where you need to use heuristics to estimate the value of a board position.

Minimax is pretty useful but it can cause a few weird effects. For example, in one that I call "giving up," the program notices that its opponent has a winning strategy. In that case, it doesn't matter what move it makes so it essentially moves randomly. For example, it might not move to block an immediate win because it assumes its opponent will move perfectly and it doesn't distinguish between losing in one move or losing in five moves. You could make it give a higher value to a later loss.

You may also notice that the program takes a few seconds to make its first move or two and then moves much more quickly after that. This happens because the game tree is bigger during the first few moves. he following table shows the number of times get_board_value is called when you make various first moves.

First Move# Function Calls
Computer moves first68,576
Upper middle11,736
Center5,598
Upper left4,811

This only takes a few seconds, but could be far worse with a more complex game. You can avoid some of the startup time if you precompute the computer's responses to the possible first moves. Then the program can just look up its first move (or first several moves for a more complicated game) and greatly reduce the time needed in the beginning of a game.

Conclusion

That's enough discussion of this little program. If you're interested in this kind of game programming, you may want to look into other heuristics. For example, if a chess program sees a chain of exchanges with the other player, it may follow that chain more deeply than usual to see if it would come out ahead. A long series of exchanges might give it an advantage. It would also reduce the number of pieces on the board making it easier to search the game tree.

Download the example to see all of the details and to experiment with it. Don't forget to switch to Skill level 3 or you'll wonder why the program plays so badly.

© 2024 Rocky Mountain Computer Consulting, Inc. All rights reserved.