Minimax explained for an idiot - python

I've wasted my entire day trying to use the minimax algorithm to make an unbeatable tic-tac-toe AI. I missed something along the way (brain fried).
I'm not looking for code here, just a better explanation of where I went wrong.
Here is my current code (the minimax method always returns 0 for some reason):
from copy import deepcopy

class Square(object):
    def __init__(self, player=None):
        self.player = player

    @property
    def empty(self):
        return self.player is None

class Board(object):
    winning_combos = (
        [0, 1, 2], [3, 4, 5], [6, 7, 8], [0, 3, 6], [1, 4, 7], [2, 5, 8],
        [0, 4, 8], [2, 4, 6],
    )

    def __init__(self, squares={}):
        self.squares = squares
        for i in range(9):
            if self.squares.get(i) is None:
                self.squares[i] = Square()

    @property
    def available_moves(self):
        return [k for k, v in self.squares.iteritems() if v.empty]

    @property
    def complete(self):
        for combo in self.winning_combos:
            combo_available = True
            for pos in combo:
                if not pos in self.available_moves:
                    combo_available = False
            if combo_available:
                return self.winner is not None
        return True

    @property
    def player_won(self):
        return self.winner == 'X'

    @property
    def computer_won(self):
        return self.winner == 'O'

    @property
    def tied(self):
        return self.complete == True and self.winner is None

    @property
    def winner(self):
        for player in ('X', 'O'):
            positions = self.get_squares(player)
            for combo in self.winning_combos:
                win = True
                for pos in combo:
                    if pos not in positions:
                        win = False
                if win:
                    return player
        return None

    @property
    def heuristic(self):
        if self.player_won:
            return -1
        elif self.tied:
            return 0
        elif self.computer_won:
            return 1

    def get_squares(self, player):
        return [k for k,v in self.squares.iteritems() if v.player == player]

    def make_move(self, position, player):
        self.squares[position] = Square(player)

    def minimax(self, node, player):
        if node.complete:
            return node.heuristic
        a = -1e10000
        for move in node.available_moves:
            child = deepcopy(node)
            child.make_move(move, player)
            a = max([a, -self.minimax(child, get_enemy(player))])
        return a

def get_enemy(player):
    if player == 'X':
        return 'O'
    return 'X'

Step 1: Build your game tree
Starting from the current board generate all possible moves your opponent can make.
Then for each of those generate all the possible moves you can make.
For Tic-Tac-Toe simply continue until no one can play. In other games you'll generally stop after a given time or depth.
This looks like a tree. Draw it yourself on a piece of paper: the current board at the top, all opponent moves one layer below, all your possible responses one layer below that, and so on.
Step 2: Score all boards at the bottom of the tree
For a simple game like Tic-Tac-Toe, make the score 0 for a loss, 50 for a tie and 100 for a win.
Step 3: Propagate the score up the tree
This is where the min-max comes into play. The score of a previously unscored board depends on its children and on who gets to play. Assume both you and your opponent always choose the best possible move in the given state. The best move for the opponent is the move that gives you the worst score; likewise, your best move is the move that gives you the highest score. On the opponent's turn, you choose the child with the minimum score (the one that maximizes their benefit). On your turn, you assume you'll make the best possible move, so you choose the maximum.
Step 4: Pick your best move
Now play the move that results in the best propagated score among all your possible plays from the current position.
Try it on a piece of paper. If starting from a blank board is too much for you, start from some advanced Tic-Tac-Toe position.
Using recursion:
Very often this can be simplified by using recursion. The "scoring" function is called recursively at each depth, and depending on whose turn it is at that depth (which alternates with the parity of the depth), it selects the max or the min over all possible moves. When no moves are possible, it evaluates the static score of the board. Recursive solutions (e.g. the example code) can be a bit trickier to grasp.
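Steps 2 to 4 can be sketched on a small, hand-made tree. This is only an illustration under some assumptions. The tree is written directly as nested Python lists instead of being generated from a board: a list is a position and a number is an already-scored final board on the 0/50/100 scale above. It is your turn at the root. The names `propagate` and `best_child` are illustrative and do not come from the question's code.

```python
def propagate(node, my_turn):
    """Step 3: return the minimax score of `node`.
    A number is a final board already scored in Step 2 (0 loss, 50 tie, 100 win);
    a list holds the positions reachable in one move."""
    if isinstance(node, (int, float)):
        return node
    child_scores = [propagate(child, not my_turn) for child in node]
    # Your turn: take the best child for you ("max").
    # Opponent's turn: assume they pick the worst child for you ("min").
    return max(child_scores) if my_turn else min(child_scores)

def best_child(root):
    """Step 4: index of your move (child of the root) with the best propagated score."""
    scores = [propagate(child, my_turn=False) for child in root]  # opponent replies next
    return scores.index(max(scores))

tree = [
    [100, 0],               # move 0: the opponent chooses between your win and your loss
    [50, 50],               # move 1: every reply leads to a tie
    [[0, 100], [100, 50]],  # move 2: after either reply you still have a winning move
]
print(best_child(tree), propagate(tree, my_turn=True))
```

Move 0 is worth 0 here, not 100: the opponent is assumed to pick the reply that is worst for you.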

As you already know, the idea of Minimax is to search deep for the best value, assuming the opponent will always play the move with the worst value (worst for us, so best for them).
The idea is that you try to give a value to each position. A position where you lose is negative (we don't want that) and a position where you win is positive. You assume you will always go for the highest-value position, but you also assume the opponent will always aim at the lowest-value position, which has the worst outcome for us and the best for them (they win, we lose). So you put yourself in their shoes, try to play as well as you can as them, and assume they will do that.
So if you find that you have two possible moves, one that gives them a choice between winning and losing, and one that results in a draw anyway, you assume they will pick the winning option if you let them. So it's better to go for the draw.
Now for a more "algorithmic" view.
Imagine your grid is nearly full except for two possible positions.
Consider what happens when you play the first one:
The opponent will play the other one. It's their only possible move, so we don't have to consider any other moves from them. Look at the result and associate a value with it (+∞ if won, 0 if draw, -∞ if lost; for tic-tac-toe you can represent those as +1, 0 and -1).
Now consider what happens when you play the second one :
(same thing here, opponent has only one move, look at the resulting position, value the position).
You need to choose between the two moves. It's our move, so we want the best result (this is the "max" in minimax). Choose the one with the higher result as our "best" move. That's it for the "2 moves from end" example.
Now imagine you have not 2 but 3 moves left.
The principle is the same, you want to assign a value to each of your 3 possible moves, so that you can choose the best.
So you start by considering one of the three moves.
You are now in the situation above, with only 2 possible moves, but it's the opponent's turn. So you consider each of the opponent's possible moves, as we did above, and find an outcome value for both of them. It's the opponent's move, so we assume they will play the "best" move for them, the one with the worst outcome for us, i.e. the one with the lower value (this is the "min" in minimax). Ignore the other one; assume they will play what you found was best for them anyway. This is what your move will yield, so it's the value you assign to the first of your three moves.
Now consider each of your other two possible moves and give them a value in the same manner. Then, from your three moves, choose the one with the max value.
Now consider what happens with 4 moves. For each of your 4 moves, you look at what happens for each of the opponent's 3 replies. You assume they choose the reply that gives you the worst outcome, where the outcome of each reply is the best you can get from your 2 remaining moves.
You can see where this is headed. To evaluate a position n moves from the end, you look at what may happen for each of the n possible moves, trying to give each of them a value so that you can pick the best. In the process, you have to find the best move for the player who plays at n-1 (the opponent) and choose the move with the lower value. In the process of evaluating the n-1 moves, you have to choose between the possible n-2 moves, which will be ours, and assume we will play as well as we can at that step. And so on.
This is why this algorithm is inherently recursive. Whatever n, at step n you evaluate all possible steps at n-1. Rinse and repeat.
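Here is a minimal sketch of that recursion for tic-tac-toe. To keep it short, it assumes a plain list of 9 cells holding 'X', 'O' or None, not the question's Square/Board classes. It scores every position for one fixed player `me` as +1 (win), 0 (draw) or -1 (loss). All names are illustrative. Each move is tried, evaluated recursively and then undone. The function takes the max on our turn and the min on the opponent's turn.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, to_move, me):
    """Value of `board` for player `me` when `to_move` is about to play."""
    w = winner(board)
    if w is not None:
        return 1 if w == me else -1
    moves = [i for i, cell in enumerate(board) if cell is None]
    if not moves:
        return 0                      # full board and nobody won: draw
    other = 'O' if to_move == 'X' else 'X'
    values = []
    for i in moves:
        board[i] = to_move            # try the move...
        values.append(minimax(board, other, me))
        board[i] = None               # ...and undo it
    # "max" on our turn, "min" on the opponent's turn
    return max(values) if to_move == me else min(values)

def best_move(board, me):
    """Return (square, value) of the best move for `me`, who is to play."""
    other = 'O' if me == 'X' else 'X'
    best, best_value = None, -2       # -2 is below any real value
    for i, cell in enumerate(board):
        if cell is None:
            board[i] = me
            value = minimax(board, other, me)
            board[i] = None
            if value > best_value:
                best, best_value = i, value
    return best, best_value

# Two moves from the end, O to play:
#   O X X
#   X O .
#   X O .
board = ['O', 'X', 'X', 'X', 'O', None, 'X', 'O', None]
print(best_move(board, 'O'))  # (8, 1): square 8 completes the 0-4-8 diagonal
```

Playing square 5 instead leaves X the last square and ends in a draw (value 0), so the "max" picks square 8.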
For tic-tac-toe, today's machines are more than powerful enough to compute all possible outcomes right from the start of the game, because there are only 255,168 possible games. When you implement it for a more complex game, you will have to stop computing at some point because it would take too long. So for a complex game, you will also have to write code that decides whether to keep looking at all possible next moves or to give the position a value now and return early. This means you will also have to compute a value for positions that are not final. For chess, for example, you would take into account how much material each side has on the board, the immediate possibilities of check without mate, how many squares you control, and so on, which makes it far from trivial.

Your complete function is not working as expected, causing games to be declared tied before anything can happen. For instance, consider this setup:
>>> oWinning = {
...     1: Square('X'),
...     3: Square('O'), 4: Square('X'),
...     6: Square('O'), 8: Square('X'),
... }
>>> nb = Board(oWinning)
>>> nb.complete
True
>>> nb.tied
True
This should be a win for the computer on the next move. Instead, it says the game is tied.
The problem is that your logic in complete currently checks whether all of the squares in a combo are free. If any of them is not, it assumes that combo can no longer be won. What it needs to do instead is look at the occupied positions in that combo: as long as every square in the combo is either empty (None) or held by the same player, that combo should still be considered available.
e.g.
def available_combos(self, player):
    return self.available_moves + self.get_squares(player)

@property
def complete(self):
    for player in ('X', 'O'):
        for combo in self.winning_combos:
            combo_available = True
            for pos in combo:
                if not pos in self.available_combos(player):
                    combo_available = False
            if combo_available:
                return self.winner is not None
    return True
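The same "can any combo still be won?" test can be sketched as standalone functions. This sketch assumes a plain list of 9 cells ('X', 'O' or None) instead of Square objects, and the names are illustrative. A combo stays open as long as at most one player occupies any of its squares.

```python
WINNING_COMBOS = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
                  (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(squares):
    for a, b, c in WINNING_COMBOS:
        if squares[a] is not None and squares[a] == squares[b] == squares[c]:
            return squares[a]
    return None

def combo_still_open(squares, combo):
    # Players occupying this combo; empty squares (None) are skipped
    owners = {squares[pos] for pos in combo if squares[pos] is not None}
    # No owner (all empty) or a single owner: someone could still complete it
    return len(owners) <= 1

def game_over(squares):
    # Over if somebody already won, or if no combo can be completed by anyone
    if winner(squares) is not None:
        return True
    return not any(combo_still_open(squares, combo) for combo in WINNING_COMBOS)

# The oWinning position from above as a plain list: O can still finish 0-3-6
o_winning = [None, 'X', None, 'O', 'X', None, 'O', None, 'X']
print(game_over(o_winning))
```

The winner check comes first because a combo that has already been won also counts as "open" for its owner.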
Now that I properly tested this with the updated code I'm getting the expected result on this test case:
>>> nb.minimax(nb, 'O')
-1
>>> nb.minimax(nb, 'X')
1

Related

I'm trying to simulate a coin flip game between two people using OOP, but for some reason my for loop through 1000 is only going through once

The point of this game is to have two people flip a coin: if the first person gets heads and the second person also gets heads, the first person wins, but if the second person gets the opposite side, they win. My code's output just displays "True" a thousand times, even though I have a for loop in my method. Why isn't it working?
import numpy as np

class Students():
    def __init__(self,flip,history):
        self.flip=flip
        self.history=history
    def flipcoin(self):
        for self.flip in range (0,1000):
            self.flip= np.random.random()
            if (self.flip<0.5):
                self.flip=0
            else:
                self.flip=1
            print (str(self.flip))
            self.history= self.flip
            print(self.history)
            return (str(self.flip))

student1=Students(flip=0,history=[])
student1.flipcoin()
student2=Students(flip=0,history=[])
student2.flipcoin()
for Students in range (0,1000):
    if (student1==student2):
        print('False')
    else:
        print('True')
print(student1.flip,student1.history)
So to answer your immediate question, your first problem is here:
for Students in range(0, 1000):
    if student1 == student2:
        print('False')
    else:
        print('True')
What you are comparing here are two instances of Students. Your class does not define __eq__, so == compares object identity, and two different instances are never equal. (And I'm not sure whether you have your print statements reversed: when that comparison returns False, the code prints 'True'.)
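Below is a minimal corrected sketch. It uses Python's random module instead of NumPy to stay dependency-free. It also assumes the rule is "same side: the first player wins, opposite sides: the second player wins", which is my reading of the description above. The main changes are that each flip is appended to history, flipcoin returns only after the loop has finished, and the comparison is made flip by flip instead of between the two objects.

```python
import random

class Student:
    def __init__(self):
        self.history = []  # every flip: 0 = tails, 1 = heads (assumed encoding)

    def flipcoin(self, n=1000):
        for _ in range(n):
            flip = 0 if random.random() < 0.5 else 1
            self.history.append(flip)  # keep every flip instead of overwriting
        return self.history            # return after the loop, not inside it

random.seed(0)  # fixed seed so runs are repeatable
student1 = Student()
student2 = Student()
student1.flipcoin()
student2.flipcoin()

# Compare the flips round by round, not the two Student objects
first_wins = sum(a == b for a, b in zip(student1.history, student2.history))
second_wins = len(student1.history) - first_wins
print(first_wins, second_wins)
```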

Range value changing when indexing a list backwards

I'm creating a chess game inside of Python (pygame) and in my validating moves function, I access a list of all possible moves.
Because I remove items from that list while looping over it, I index it backwards. However, when I implement that change, the number of iterations of the for loop around the indexing drops from 20 to 1.
Here's the full function code:
def valid_move_2():
    global white_to_move
    possible_moves = generate_possible_moves()
    print(range(len(possible_moves)))
    for i in range(len(possible_moves)-1, -1, -1):
        print("possible moves range")
        print(str(possible_moves[i][0]))
        move(possible_moves[i][0], possible_moves[i][1], possible_moves[i][2], possible_moves[i][3])
        white_to_move = not white_to_move
        check_check()
        if check_check():
            possible_moves.remove(possible_moves[i])
        white_to_move = not white_to_move
        undo_move()
    if len(possible_moves) == 0:
        if check_check():
            checkmate = True
        else:
            stalement = True
    else:
        checkmate = False
        stalemate = False
    return possible_moves
To be specific:
print(range(len(possible_moves)))
This line prints 20.
However this line:
print("possible moves range")
only prints once, meaning the for loop runs only once.
Where have I gone wrong?
THE ISSUE
The issue lies with this function, where my program gets stuck:
def square_under_attack(posx, posy):
    print("SQUARE UNDER ATTACK FUNCTION CALLED")
    global white_to_move
    white_to_move = not white_to_move
    enemy_moves = generate_possible_moves()
    white_to_move = not white_to_move
    x=0
    for opponent_move in enemy_moves:
        print(x)
        if opponent_move[3] == posx and opponent_move[4] == posy: # if the opponent can move to the square being tested
            print("returned true")
            return True
        x+=1
    print("returned false")
    return False
In this function, it gets stuck in the for loop. I added x for troubleshooting, to find out how many times the loop iterates before the function stops without returning anything.
x is printed with a value of 2, and neither of the other two print calls is ever reached.
What's wrong?
print(range(len(possible_moves)))
This shouldn't print 20.
In Python 3 it should print something like
range(0, 20)
If it does, then the problem is in your for loop, not in range.
possible_moves.remove(possible_moves[i])
This line in particular is problematic: list.remove() deletes the first element that compares equal to its argument, which is not necessarily the element at index i.
If you are trying to remove the i-th element of possible_moves, use del:
del possible_moves[i]
Also, there are some risky parts in your code. One is the global variable; try to find a way to do this without it.
Another is that your loop is driven by the initial length of possible_moves, while inside the loop you may be removing items from possible_moves. Changing a list while looping over its indices is fragile and can easily lead to an IndexError or skipped elements. An easy fix is to build a new list that keeps track of which moves are kept and which are removed.
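The difference between remove and del, and the "build a new list" fix, can be shown in a few lines. The move strings and the `leaves_king_in_check` predicate are hypothetical stand-ins for your own move tuples and check test.

```python
moves = ['e2e4', 'd2d4', 'e2e4']  # hypothetical move list containing a duplicate

by_value = list(moves)
by_value.remove(by_value[2])      # removes the FIRST 'e2e4' (index 0), not index 2
# by_value is now ['d2d4', 'e2e4']

by_index = list(moves)
del by_index[2]                   # removes exactly the element at index 2
# by_index is now ['e2e4', 'd2d4']

def legal_moves(moves, leaves_king_in_check):
    """Build a new list instead of deleting from the one being looped over.
    `leaves_king_in_check` is a hypothetical predicate supplied by the caller."""
    return [m for m in moves if not leaves_king_in_check(m)]

# Example: pretend that every move starting from d2 exposes the king
print(legal_moves(moves, lambda m: m.startswith('d2')))
```

The original list is never modified while it is being traversed, so iteration order and indices cannot get out of sync.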
If this doesn't solve the error, then the problem must be in one of the methods/functions you call.
Try refactoring it with recursion, since your iteration looks disorganized and there may be repeated code all over your code base.
If possible, use a list comprehension instead of an explicit loop to make things more concise.
If you want to do something like "do this until list L is empty", use the following pattern (do_something() must remove items from L, or the loop never ends):
while L:
    do_something()
For your problem, chess, I would design it like this:
First, define a function for each chess piece that returns the set of squares it can move to on the next turn.
For check and checkmate, I would take the union of all those sets, which represents all of the opponent's possible next moves.
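Here is a minimal sketch of that set-based design. It assumes squares are (x, y) tuples on an 8x8 board and that one side's pieces are given as (kind, x, y) tuples. Only knights and kings are covered, because sliding pieces (rook, bishop, queen) also have to stop at blocking pieces, and this sketch leaves that out. All names are hypothetical.

```python
def on_board(x, y):
    return 0 <= x < 8 and 0 <= y < 8

def knight_targets(x, y):
    jumps = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
    return {(x + dx, y + dy) for dx, dy in jumps if on_board(x + dx, y + dy)}

def king_targets(x, y):
    return {(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0) and on_board(x + dx, y + dy)}

# One "targets" function per piece kind
TARGET_FUNCS = {'N': knight_targets, 'K': king_targets}

def attacked_squares(pieces):
    """Union of every square the given pieces can reach next turn."""
    return set().union(*(TARGET_FUNCS[kind](x, y) for kind, x, y in pieces))

def square_under_attack(square, enemy_pieces):
    # A set lookup instead of playing out and undoing every enemy move
    return square in attacked_squares(enemy_pieces)

print(square_under_attack((1, 2), [('N', 0, 0), ('K', 7, 7)]))
```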
It looks like you are trying out every single enemy move and then undoing it, which is a very inefficient approach.
You should make all the methods concise and atomic. Don't make them convoluted or have them do many implicit things under the hood. Keep them as simple and logically concise as possible.

MemoryError in the Nim Sum game on Leetcode

I'm trying to solve the following problem from https://leetcode.com/problems/nim-game/description/:
You are playing the following Nim Game with your friend: There is a heap of stones on the table, each time one of you take turns to remove 1 to 3 stones. The one who removes the last stone will be the winner. You will take the first turn to remove the stones.
Both of you are very clever and have optimal strategies for the game. Write a function to determine whether you can win the game given the number of stones in the heap.
For example, if there are 4 stones in the heap, then you will never win the game: no matter 1, 2, or 3 stones you remove, the last stone will always be removed by your friend.
I came up with the following solution, using a bottom-up dynamic programming approach:
class Solution(object):
    def canWinNim(self, n):
        """
        :type n: int
        :rtype: bool
        """
        if n <= 3:
            return True
        win = [None for _ in range(n+1)]
        win[1] = win[2] = win[3] = True
        for n in range(4, n+1):
            win[n] = not all([win[n-i] for i in [1, 2 ,3]])
        return win[n]
However, when I try to submit this, I get a MemoryError.
Since the exact test case for which this MemoryError arises is not given, I'm struggling to see what is causing the problem. Any ideas?
Indeed, as pointed out by Kenny Ostrom, the test cases probably use such large n that there is not enough memory to store the n + 1 answers generated by the bottom-up dynamic programming approach. By running it on smaller examples, I noticed that the answer is simply
bool(n % 4)
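The pattern holds because of a simple strategy. Whenever you face a multiple of 4 and take 1 to 3 stones, your opponent can take the rest of that group of 4. This puts you on a multiple of 4 again, until you face 0 stones. Here is a sketch of the constant-time answer. It comes with a small table-based checker, which is my rewrite of the question's DP with win[0] = False meaning "no stones left for the player to move". The checker is only there to confirm the pattern for small n, since it still needs O(n) memory.

```python
class Solution(object):
    def canWinNim(self, n):
        # Losing positions for the player to move are exactly the multiples of 4
        return n % 4 != 0

def can_win_nim_dp(n):
    """O(n)-memory reference, only practical for small n."""
    win = [False] * (n + 1)  # win[0] = False: the other player took the last stone
    for k in range(1, n + 1):
        # k is winning if some move (1 to 3 stones) leaves the opponent in a losing position
        win[k] = any(not win[k - take] for take in (1, 2, 3) if take <= k)
    return win[n]

for n in range(1, 13):
    print(n, Solution().canWinNim(n), can_win_nim_dp(n))
```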

Checkers algorithm: how to reduce nested for loops

I’m trying to build a program that plays draughts/checkers. At the moment I’m trying to write the function that allows the computer to make and evaluate moves. My idea is to have the computer look at all its own possible moves and, for each of these moves, look at the opponent’s possible replies, and then, for each of those, again look at its own possible moves.
With each ply it will evaluate if the move is good or bad for the player and assign points, at the end it picks the moves with the highest points.
So far I have managed to get a version of this working, but it involves a lot of nested for loops. The code is a mess and not very readable at the moment, but this is a simple model of the same concept. Instead of evaluating positions and producing more lists, it just doubles the elements to make the new list.
counter = 0
for x in list:
    counter += 1
    list_2 = [x * 2 for x in list]
    print('list_2', list_2, counter)
    for x in list_2:
        counter += 1
        list_3 = [x * 2 for x in list_2]
        print('list_3', list_3, counter)
        for x in list_3:
            counter += 1
            list_4 = [x * 2 for x in list_3]
            print('list_4', list_4, counter)
If I run this code, I get what I want, except that I can't easily control the depth of the search without copying in more for loops. I thought recursion might be a way of doing this, but I can’t figure out how to stop the recursion after x levels of search depth.
Is there a better way of getting the same output from the code above while getting rid of all the for loops? If I can get that to work, I think I can do the rest myself.
Here's an equivalent function that uses recursion. It controls the recursion with two parameters, which track the current depth and the maximum depth. If the current depth exceeds the maximum depth, it returns immediately, which stops the recursion:
def evaluate(l, max_depth, cur_depth=0, counter=0):
    if cur_depth > max_depth:
        return counter
    for x in l:
        counter += 1
        l2 = [x * 2 for x in l]
        print(cur_depth, l2, counter)
        counter = evaluate(l2, max_depth, cur_depth + 1, counter)
    return counter
If called with max_depth=2, it will produce the same output, except that the current depth is printed instead of the variable name.
I thought recursion might be a way of doing this, but I can’t figure out how to stop the recursion after x levels of search depth.
Your intuition is correct, and a simple way of doing this is to pass an incrementing number to each level. When that number reaches the maximum value, the recursion is complete. A trivial example is below to demonstrate.
MAX_COUNT = 5  # example limit

def countup(i=0):
    print(i)
    if i == MAX_COUNT:
        return
    countup(i + 1)
For your algorithm, you need a value to represent the board evaluation, for instance in the range [-1, 1]. Player A could be said to be winning if the evaluation is -1, and Player B if it is 1. A recursive algorithm could be as follows.
def evaluate(board, player, depth=0):
    # At the depth limit, fall back to the static evaluation; there is no move to report
    if depth == MAX_DEPTH:
        return heuristicEvaluation(board), None
    bestMove = None
    if player == PLAYER_A:
        # Player A wants the lowest evaluation (-1 means A is winning)
        val = float('inf')
        for move in get_moves(board, player):
            newboard = board.makeMove(move)
            score, _ = evaluate(newboard, PLAYER_B, depth + 1)
            if score < val:
                bestMove = move
                val = score
    elif player == PLAYER_B:
        # Player B wants the highest evaluation (+1 means B is winning)
        val = float('-inf')
        for move in get_moves(board, player):
            newboard = board.makeMove(move)
            score, _ = evaluate(newboard, PLAYER_A, depth + 1)
            if score > val:
                bestMove = move
                val = score
    return val, bestMove
This is abstract, but the idea is there: MAX_DEPTH, PLAYER_A, PLAYER_B, get_moves, makeMove and heuristicEvaluation are placeholders to adjust depending on how you are representing the board and the players. The function heuristicEvaluation could be something as simple as counting each player's pieces on the board and how close they are to the other side. Remember that this function needs to return a number in [-1, 1].
Edge cases to consider, which I didn't take into account:
If all moves are winning and/or losing
If there are NO moves in the position, for example if your pieces are all blocked by your opponent's pieces
Many improvements exist to a simple search like this. Read if you're interested :)
For checkers, perhaps memoization would speed things up a lot. I'm not sure, but I'd think it would help especially in the beginning. See Python's way of doing this.
Pruning
Alpha-beta pruning
Branch and bound
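As a pointer for the alpha-beta item above, here is a minimal sketch of minimax with alpha-beta pruning on a hand-made tree. The tree format is an illustrative assumption: nested lists, with leaves holding scores for the maximizing player. In a checkers engine, the children would come from move generation and the leaves from the heuristic evaluation at the depth limit. The `visited` list only records which leaves get evaluated, to show that pruning skips some of them.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    if not isinstance(node, list):  # leaf: already a score
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:  # the minimizer above will never allow this line
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:  # the maximizer above already has something better
            break
    return value

tree = [[3, 5], [2, 9], [0, 1]]
seen = []
print(alphabeta(tree, True, visited=seen), seen)
```

It returns the same value as plain minimax (3), but the leaves 9 and 1 are never evaluated.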

Python 3 Passing a Function into Another Function

I am somewhat new to Python, and this is a homework question, so I would appreciate advice that helps me understand rather than full answers. I am writing a game program that plays two different strategies against each other, a greedy strategy and a zoom-in strategy, which I have written as functions. I have a game function that needs to take my greedy and zoom-in functions as well as a game board. I need to be able to have either strategy function go first. So far I can only get it to work with my greedy strategy going first.
def game(P1,P2,board):
    P1 = 0
    P2 = 0
    for x in range(len(board)):
        if x%2 == 0:
            move = greedy(board)
            P1 += board[move]
            board.remove(board[move])
        else:
            move = zoomin(board)
            P2 += board[move]
            board.remove(board[move])
    if P1 > P2:
        return 1
    elif P1 == P2:
        return 0.5
    else:
        return 0
This strategy always assumes that P1 is the greedy function, but I need to be able to play either first. I thought I could pass in the functions, so my call would be
game(greedy,zoomin,board)
but I am not sure how to actually implement it so that it can recognize who is playing first.
Thank you in advance for your help!
EDIT:
Here are my greedy and zoomin functions:
def greedy(board):
    if board[0] > board[len(board)-1]:
        #returns position of first item
        return 0
    elif board[len(board)-1] > board[0]:
        #returns position of last item
        return len(board)-1
    else:
        #if board[-1] == board[0]
        return 0

def zoomin(board):
    if len(board)%2 == 0:
        evens = 0
        odds = 0
        for x in range(len(board)):
            if x%2 ==0:
                evens += board[x]
            else:
                odds += board[x]
        if evens > odds:
            return 0
        else:
            return len(board)-1
    else:
        #choose the larger value (greedy)
        if board[0] < board[len(board)-1]:
            return len(board)-1
        else:
            return 0
This is not a direct answer to your question (since senshin already answered it), but I wanted to point out that you can decrease your code duplication by using arrays instead. For instance, like this:
def game(players, board):
    scores = [0] * len(players)
    for i in range(len(board)):
        p = i % len(players)
        move = players[p](board)
        scores[p] += board[move]
        del board[move] # <-- This is also a faster and more fail-safe version of your "board.remove(board[move])"
    return scores
You can then call this function as game([greedy, zoomin], board). Also note how it extends to an arbitrary number of players, although that may not actually be useful for you. :)
You will want to rewrite your game function slightly. Notice that your game function accepts P1 and P2, but you don't do anything with them - you immediately assign 0 to both of them.
The correct way to approach this is to have your game function accept two strategies, which can be greedy or zoomin, or whatever else you might come up with later.
def game(strategy1, strategy2, board):
You will also need to replace the explicit calls to greedy and zoomin in the function body (e.g. move = greedy(board)) with calls to the strategies passed into your function instead - something like move = strategy1(board).
Then, in order to have greedy play first and zoomin play second, you could call:
game(greedy, zoomin, board)
Or if you wanted zoomin first and greedy second, you could call:
game(zoomin, greedy, board)
As you can see, the order of play is determined by the order in which you pass the two strategies into your function. Let me know if this needs clarification.
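Putting that advice together, here is a minimal sketch in which game simply calls whichever functions it receives. It copies the board so the caller's list is not consumed, and it uses del instead of remove. greedy matches the question's logic in a shorter form, while always_last is a hypothetical stand-in for zoomin that keeps the example short.

```python
def greedy(board):
    # Take the larger end; ties go to the front (same logic as the question's greedy)
    return 0 if board[0] >= board[-1] else len(board) - 1

def always_last(board):
    # Hypothetical placeholder strategy: always take the last item
    return len(board) - 1

def game(strategy1, strategy2, board):
    board = list(board)              # work on a copy so the caller's board survives
    strategies = (strategy1, strategy2)
    scores = [0, 0]
    turn = 0                         # 0 = strategy1's turn, 1 = strategy2's turn
    while board:
        move = strategies[turn](board)  # call whatever function was passed in
        scores[turn] += board[move]
        del board[move]
        turn = 1 - turn
    if scores[0] > scores[1]:
        return 1
    elif scores[0] == scores[1]:
        return 0.5
    return 0

board = [5, 1, 2]
print(game(greedy, always_last, board), game(always_last, greedy, board))
```

Swapping the arguments swaps who moves first, and nothing inside game has to change.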
