I have been trying to build a Tic-Tac-Toe bot in Python. I tried to avoid using the Minimax algorithm, because I was QUITE daunted by how to implement it. Until now.
I (finally) wrote an algorithm that sucked and could lose pretty easily, which kinda defeats the purpose of making a computer play Tic-Tac-Toe. So I finally worked up the courage to TRY to implement the algorithm. I stumbled upon this StackOverflow post. I tried to implement the chosen answer there, but I can't understand most of the stuff. The code in that answer follows:
def minimax(self, player, depth=0):
    if player == "o":
        best = -10
    else:
        best = 10
    if self.complete():
        if self.getWinner() == "x":  # 'X' is the computer
            return -10 + depth, None
        elif self.getWinner() == "tie":
            return 0, None
        elif self.getWinner() == "o":  # 'O' is the human
            return 10 - depth, None
    for move in self.getAvailableMoves():
        self.makeMove(move, player)
        val, _ = self.minimax(self.getEnemyPlayer(player), depth + 1)
        print(val)
        self.makeMove(move, ".")
        if player == "o":
            if val > best:
                best, bestMove = val, move
        else:
            if val < best:
                best, bestMove = val, move
    return best, bestMove
First of all, why are we returning -10 + depth when the computer wins and 10 - depth when the human wins? (I get why we return 0 when it is a draw.) Secondly, what is the depth parameter doing? Is there some way to omit it? Should we omit it?
I'm probably missing something fundamental about the algorithm, but I think I understand the rest well enough. Please bear in mind that I'm very new to recursive algorithms...
EDIT
So, now I made myself the function:
def minimax(self, player):
    won = 10
    lost = -10
    draw = 0
    if self.has_won(HUMAN):
        return lost, None
    elif self.has_won(BOT):
        return won, None
    if not self.board_is_empty():
        return draw, None
    moves = self.get_available_moves()
    for move in moves:
        self.play_move(move[0], move[1], player)
        make_board(self.board)
        if self.board_is_empty():
            val, _ = self.minimax(self.get_enemy_player(player))
        self.rewind_move(move)
        if val == won:
            return val, move
But the problem now is that I can't understand what happens when the move ends in a draw or a loss (for the computer). I think what it's doing is going through a move's consequences to see if SOMEONE wins (that's probably what's happening, because I tested it) and then returning that move if SOMEONE wins. How do I modify this code to work properly?
Note:
This function is in a class, hence the self keywords.
moves is a list containing tuples, e.g. moves = [(0, 1), (2, 2)], so moves contains all the empty squares, and each moves[i][j] is an integer in the range 0-2.
I'm using the exhaustive algorithm suggested by Jacques de Hooge in his answer below.
First note that 10 - depth = - (-10 + depth).
So computer wins have opposite signs from human wins.
In this way they can be added to evaluate the value of a gameboard state.
While with tictactoe this isn't really needed, in a game like chess it is, since it is too time-consuming to try all possible combinations until checkmate. Gameboard states therefore have to be evaluated somehow in terms of losses and wins (losing and winning chess pieces, each worth a certain number of points, with values drawn from experience).
Suppose now we only look at 10 - depth (so human wins).
The most attractive wins are the ones that require the fewest plies (moves).
Since each move or countermove increments depth, more moves make the depth parameter larger, and hence 10 - depth (the "amount" of advantage) smaller. So quick wins are favored over lengthy ones: a win after 3 plies scores 10 - 3 = 7, for example, while a win after 5 plies scores only 5. A base value of 10 is enough, since only 9 moves in total are possible on a 3 x 3 playfield.
So in short: since tictactoe is so simple, the winning combination can in fact be found by an exhaustive recursive search. But the minimax algorithm is suited to more complicated situations like chess, in which intermediate positions have to be evaluated as the sum of losses (negative) and gains (positive).
Should the depth parameter be omitted? If you care about the quickest win: no. If you only care about a win (with tictactoe): it can indeed be omitted, since an exhaustive search is possible.
[EDIT]
Exhaustive with tictactoe just means searching 9 plies deep, since the game can never last longer.
Make a recursive function with a parameter player (o or x) and a return value of win or loss, which is first decided at the deepest recursion level and then passed upward through the recursion tree. Let it call itself with the opposite player as parameter for all free fields. A move is the right one for the machine if there is some branch in which the machine wins no matter which countermoves the human takes at each level.
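For illustration, here is a minimal sketch of that recursion in Python, reusing the method names from the question's class (get_available_moves, play_move, rewind_move, has_won, get_enemy_player) as assumptions. It follows the binary win/loss framing used above; handling ties properly would require a third outcome.

def can_force_win(self, player):
    # Sketch only: returns True if `player` can force a win from the
    # current position, assuming the binary win/loss framing (no ties).
    opponent = self.get_enemy_player(player)
    if self.has_won(player):
        return True
    if self.has_won(opponent):
        return False
    moves = self.get_available_moves()
    if not moves:
        return False  # full board; a tie would need a third outcome
    for move in moves:
        self.play_move(move[0], move[1], player)
        opponent_can_win = self.can_force_win(opponent)
        self.rewind_move(move)
        if not opponent_can_win:
            return True  # some move leaves the opponent without a forced win
    return False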
Note: The assumption I made is that there IS a winning strategy. If that is not the case (ties possible), the algorithm you have may be the best option. In fact, with tictactoe perfect play by both sides always ends in a tie, so the search can in general only guarantee a draw.
With a non-perfect human player it may still win, whenever the human player does something suboptimal.
Related
I have a normal Negamax algorithm with alpha-beta pruning, driven by iterative deepening (ID). I thought that to really benefit from ID I would save the calculated valid moves from depth 1 in a table, so that next time, when I go for depth 2 and the same original position arises, I can just grab the valid moves from the table instead, to save time. However, I find that this idea doesn't really save any time at all, which makes me think:
I have never seen anyone do this; is it not worth it for some reason?
Is my implementation of this wrong?
Am I confused about how Negamax works, and is this maybe impossible to do in the first place?
Here is the original iterative call, along with a snippet of the Negamax function itself:
self.valid_moves_history = {}  # dict keyed by Zobrist hash (a list cannot be indexed by key)

for depth in range(1, s.max_search_depth):
    move, evaluation = self.negamax(gamestate, depth, -math.inf, math.inf, s.start_color)

# ----------------------------------------------------------------------------

def negamax(self, gamestate, depth, alpha, beta, color):
    if self.zobrist_key in self.valid_moves_history:
        children = self.valid_moves_history[self.zobrist_key]
    else:
        children = gamestate.get_valid_moves()
        self.valid_moves_history[self.zobrist_key] = children

    if depth == 0 or gamestate.is_check_mate or gamestate.is_stale_mate:
        return None, e.evaluate(gamestate, depth) * color

    # Negamax loop
    max_eval = -math.inf
    for child in reversed(children):
        gamestate.make_move(child[0], child[1])
        score = -self.negamax(gamestate, depth - 1, -beta, -alpha, -color)[1]
        gamestate.unmake_move()
        if score > max_eval:
            max_eval = score
            best_move = child
        alpha = max(alpha, max_eval)
        if beta <= alpha:
            break
    return best_move, max_eval
The most time-consuming tasks of my complete program are distributed something like this (% of total runtime for a game):
Calculate valid moves: 60%
Evaluation function (medium complexity at the moment): 25%
Negamax itself with lookups, table saves etc: 5%
Make/unmake moves: 4%
Is it normal/reasonable for move generation to take this large a share of the time? That is the main reason I thought of saving the valid moves in a table in the first place.
Or can someone please explain why this is a good/bad idea and what I should do instead? Thank you for any input.
I know this thread is quite old at this point, but I think this could still be useful to some people. The technique you are describing is called a transposition table in Minimax, and you can find many resources on the topic. Negamax is the same as Minimax except that you do not have separate functions for the Max and Min players; instead, every node maximizes the negated score of its children. I think it is probably more useful for you to implement move ordering first, as it can double the speed of your program; a minimal sketch follows. You can also look for a more efficient way to generate valid moves to speed up the program.
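To make the move-ordering suggestion concrete, here is a rough sketch. The attributes it relies on (move.is_capture, move.captured) and the PIECE_VALUES table are assumptions about your move representation, not a known API, so adapt them to however your moves store capture information:

PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}

def order_moves(children):
    def score(move):
        # Search captures of valuable pieces first; quiet moves go last.
        return PIECE_VALUES.get(move.captured, 0) if move.is_capture else 0
    return sorted(children, key=score, reverse=True)

Searching likely-good moves first makes alpha-beta cutoffs happen sooner, which is where the speedup comes from.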
The problem is presented here: https://codingcompetitions.withgoogle.com/codejam/round/00000000000000cb/0000000000007966
An alien robot is threatening the universe, using a beam that will destroy all algorithms knowledge. We have to stop it!
Fortunately, we understand how the robot works. It starts off with a beam with a strength of 1, and it will run a program that is a series of instructions, which will be executed one at a time, in left to right order. Each instruction is of one of the following two types:
C (for "charge"): Double the beam's strength.
S (for "shoot"): Shoot the beam, doing damage equal to the beam's current strength.
For example, if the robot's program is SCCSSC, the robot will do the following when the program runs:
Shoot the beam, doing 1 damage.
Charge the beam, doubling the beam's strength to 2.
Charge the beam, doubling the beam's strength to 4.
Shoot the beam, doing 4 damage.
Shoot the beam, doing 4 damage.
Charge the beam, doubling the beam's strength to 8.
In that case, the program would do a total of 9 damage.
The universe's top algorithmists have developed a shield that can withstand a maximum total of D damage. But the robot's current program might do more damage than that when it runs.
The President of the Universe has volunteered to fly into space to hack the robot's program before the robot runs it. The only way the President can hack (without the robot noticing) is by swapping two adjacent instructions. For example, the President could hack the above program once by swapping the third and fourth instructions to make it SCSCSC. This would reduce the total damage to 7. Then, for example, the President could hack the program again to make it SCSSCC, reducing the damage to 5, and so on.
To prevent the robot from getting too suspicious, the President does not want to hack too many times. What is the smallest possible number of hacks that will ensure the program does no more than D total damage, if it is possible at all?
Input
The first line of the input gives the number of test cases, T. T test cases follow. Each consists of one line containing an integer D and a string P: the maximum total damage our shield can withstand, and the robot's program.
Output
For each test case, output one line containing Case #x: y, where x is the test case number (starting from 1) and y is either the minimum number of hacks needed to accomplish the goal, or IMPOSSIBLE if it is not possible.
I implemented the following logic:
- First calculate the total damage of the program.
- An S does the most damage when it is at the end, so the swaps should start at the end and continue towards the beginning of the list.
- A C at the end becomes useless, so I pop it off the list so I don't iterate over it again.
- To keep the O() complexity down, I decided to subtract the value saved by the last S from theSum every time a swap is made.
The test results seem right, but the judge says: Wrong Answer.
Can you help me find the mistake?
(I know only how to operate with lists and dictionaries in Python 3, and I am an absolute beginner at solving these questions.)
my code is below:
for case in range(1, T):
    D, B = input().split()
    D = int(D)
    Blist = []
    [Blist.append(i) for i in B]

    def beamDamage(Blist):
        theSum = 0
        intS = 1
        Ccount = 0
        for i in Blist:
            if i == 'S':
                theSum = theSum + intS
            if i == 'C':
                Ccount = Ccount + 1
                intS = intS * 2
        return theSum

    def swap(Blist):
        temp = ''
        for i in range(0, len(Blist)):
            if Blist[len(Blist) - 1] == 'C':
                Blist.pop()
            if (Blist[len(Blist) - i - 1]) == 'C' and (Blist[len(Blist) - i] == 'S'):
                temp = Blist[len(Blist) - i - 1]  # C
                Blist[len(Blist) - i - 1] = 'S'
                Blist[len(Blist) - i] = temp
                return Blist

    bd = beamDamage(Blist)
    y = 0
    if 'C' not in B:
        if beamDamage(Blist) > D:
            print("Case #{}: IMPOSSIBLE".format(case))
        else:
            print("Case #{}: 0".format(case))
    else:
        while bd > D:
            swap(Blist)
            pwr = 0
            for ch in Blist:
                if ch == 'C':
                    pwr = pwr + 1
            bd = bd - 2**(pwr - 1)
            y += 1
        print("Case #{}: {}".format(case, y))
I will not give you a complete solution, but here is one issue:
If your input is a series of "S" followed by one or more "C" (like "SSSSC"), and the calculated damage is higher than the shield can withstand, you'll clearly see that the result is wrong. It should be IMPOSSIBLE...
The reason for the failure is that the condition in if 'C' not in B: does not apply, and so the loop kicks in (when it really shouldn't). Consequently pwr remains zero and the calculation uses 2**(-1), which yields a non-integer value.
The solution is to trim the trailing C characters from the list at the very start, even before doing the if test.
Secondly, I don't see the benefit of doing the damage calculation in two different ways: on the one hand you have beamDamage, and on the other the inline loop, which does roughly the same thing (and no faster).
Finally, even if you get this right, I suspect your code might run into a timeout, because it is not doing the job efficiently. Think of keeping track of the damage incrementally, without needing to go through the whole list again.
Once you have that improvement, you may still need to tune performance further. In that case, think of what damage reduction you would get if you moved a "C" immediately to the very end of the list. If that reduction is still not enough to bring the damage below the target, you can make that whole move in one go (but still count the steps correctly).
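To sketch just that incremental idea (still not a complete solution): swapping an adjacent "CS" pair into "SC" halves the strength at which that one S fires, so the total damage drops by exactly that half, with no re-simulation needed. The helper below is a sketch under those assumptions:

def hack_once(prog, damage):
    # prog is a list of 'C'/'S' characters; damage is the current total.
    # Swap the rightmost adjacent "CS" pair and update damage incrementally.
    for i in range(len(prog) - 1, 0, -1):
        if prog[i - 1] == 'C' and prog[i] == 'S':
            strength = 2 ** prog[:i].count('C')  # strength at this S before the swap
            prog[i - 1], prog[i] = 'S', 'C'
            return damage - strength // 2        # this S now fires at half strength
    return None  # no "CS" pair left: no hack can reduce the damage further

The rightmost "CS" pair has the most charges before it, so swapping it gives the largest reduction a single hack can achieve.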
I am creating a chess AI using the minimax method with alpha-beta pruning. I am trying to understand how the alpha-beta pruning works but can't get my head around it when it comes to chess where you set a certain search depth.
How does minimax with alpha-beta pruning handle sacrificing a piece for an advantage 2-3 moves ahead? Won't it just look at the position right after the sacrifice and immediately discard that branch as bad, thereby missing the good "sacrifice"?
Thank you for any clarifications or advice on improvements. Here is my code so far:
def minimax(board, depth, alpha, beta, maximizing_player):
    board.is_human_turn = not maximizing_player
    children = board.get_all_possible_moves()
    if depth == 0 or board.is_draw or board.is_check_mate:
        return None, evaluate(board)
    best_move = random.choice(children)
    if maximizing_player:
        max_eval = -math.inf
        for child in children:
            board_copy = copy.deepcopy(board)
            board_copy.move(child)
            current_eval = minimax(board_copy, depth - 1, alpha, beta, False)[1]
            if current_eval > max_eval:
                max_eval = current_eval
                best_move = child
            alpha = max(alpha, current_eval)
            if beta <= alpha:
                break
        return best_move, max_eval
    else:
        min_eval = math.inf
        for child in children:
            board_copy = copy.deepcopy(board)
            board_copy.move(child)
            current_eval = minimax(board_copy, depth - 1, alpha, beta, True)[1]
            if current_eval < min_eval:
                min_eval = current_eval
                best_move = child
            beta = min(beta, current_eval)
            if beta <= alpha:
                break
        return best_move, min_eval
To make this clear for you, I want to explain the minimax search tree.
In a minimax search tree, the engine assumes that, up to the chosen search depth, both players do their best to maximize (or minimize) the board value (from evaluate()) at the end of the branch (say, 3 moves ahead for a depth-3 search). If a move sequence sacrifices a queen for nothing within those 3 moves, the branch is considered bad. But if it sacrifices a queen for an inevitable checkmate in 2 moves, it is considered good.
By the way, alpha-beta pruning is an optimization of minimax which gives the same result as plain minimax, just faster. For more information, you may want to check out Alpha-beta pruning on Wikipedia.
An easy way to make your engine look for longer-term advantage is to increase the search depth, but since the time cost usually explodes exponentially with depth, that is not really feasible. Here are some suggestions to make your engine faster.
Use the python-chess library. I'm not sure whether you are already using it from the code you've shown, but if not, I suggest you do. It provides decent board-calculation performance, a convenient board representation, printing to FEN or PGN strings, and much more.
Use move ordering. This usually helps find the good moves faster. You can put the move generation and sorting into a function and use functools.lru_cache() to further improve search speed. Notice that lru_cache() uses a hash table to store previous results, but chess.Board isn't hashable by default, so you need to put this after your imports for the caching to work: chess.Board.__hash__ = chess.polyglot.zobrist_hash. See the sketch after this list; for more information, check out Move Ordering.
Use opening data and an opening searcher. You can use the data here to give your engine an opening book, which takes no time to compute while also producing better results in the early game.
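Here is a minimal sketch of the move-ordering suggestion using python-chess together with the lru_cache() trick described above; treat the captures-first heuristic as one simple choice, not the definitive ordering:

import functools
import chess
import chess.polyglot

# Make boards hashable so lru_cache can key on positions (see above).
chess.Board.__hash__ = chess.polyglot.zobrist_hash

@functools.lru_cache(maxsize=None)
def ordered_moves(board):
    # Captures first, quiet moves last: a deliberately simple ordering.
    return sorted(board.legal_moves, key=board.is_capture, reverse=True)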
I was doing a coding problem which I somehow passed all the test cases for, but I did not understand exactly what was going on. The problem was a small twist on the classic Nim game:
There are two players, A and B, and N piles of stones of various sizes. A player may take any number of stones if the pile is smaller than K; otherwise they must take a multiple of K stones. The last person to take stones wins.
# solution -> will A win the game of (piles, k)?
def solution(piles, k):
    gn = 0  # Grundy number
    for pile in piles:
        if pile % 2 != 0:
            gn ^= pile + 1
        else:
            gn ^= pile - 1
    return gn != 0
I'm not sure there were enough test cases, because k is not even used here. To be honest, I am having a difficult time even understanding what gn (the Grundy number) really means. I realize there is a proof that you win the Nim game if the XOR of all pile sizes is not zero, but I don't really understand why this variation requires checking the parity of each pile.
First, the given solution is incorrect. You noticed that it does not use k, and indeed this is a big red flag. You can also look at the results it gives for single-pile games, where you should fairly quickly be able to find pile sizes (and values of k) for which its answer is wrong.
The structure of the answer is sort of correct, though. A lot of the power of the Grundy number is that the Grundy number of a combined game state is the nim sum (XOR in the case of finite ordinals) of the Grundy numbers of the individual game states. (This only works for a very specific way of combining game states, but this turns out to be the natural way of considering Nim piles together.) So, this problem can indeed be solved by finding the Grundy number for each pile (considering k) and XOR-ing them together to get the Grundy number for the full game state. (In Nim where you can take any number of stones from a pile and win by taking the last stone, the Grundy number of a pile is just the size of a pile. That's why the solution to that version of Nim just XOR-s the sizes of the piles.)
So, taking the theory for granted, you can solve the problem by finding the correct Grundy values for a single pile given k. You only need to consider one pile games to do this. This is actually a pretty classic problem, and IMO significantly simpler to correctly analyze than multi-pile Nim. You should give it a go.
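If you want to experiment, here is a brute-force sketch that computes the Grundy number of a single pile directly from the minimum-excluded-value definition, leaving the closed form for you to discover as suggested:

from functools import lru_cache

def grundy(pile, k):
    # Brute-force Grundy number of one pile: take any amount if the pile
    # is smaller than k, otherwise a multiple of k.
    @lru_cache(maxsize=None)
    def g(n):
        if n < k:
            successors = range(n)              # taking 1..n stones
        else:
            successors = range(n - k, -1, -k)  # taking k, 2k, ... stones
        values = {g(s) for s in successors}
        m = 0                                  # mex: smallest value not reachable
        while m in values:
            m += 1
        return m
    return g(pile)

Per the theory above, player A then wins exactly when XOR-ing grundy(pile, k) over all piles gives a nonzero result.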
As for how to think of Grundy numbers, there are plenty of places to read about it, but here's my approach. The thing to understand is why the combination of two game states allows the previous player (B) to win exactly when the Grundy numbers are equal.
To do this, we need only consider what effect moves have on the Grundy numbers of the two states.
By definition as the minimum excluded value of the successor states, there is always a move that changes the Grundy number of a state to any lower value (i.e. n could become any number from 0 up to n - 1). There is never a move that leaves the Grundy number the same. There may or may not be moves that increase the Grundy number.
Then, in the case of the combination of two states with the same Grundy number, the player B can win by employing the "copycat strategy". If player A makes a move that decreases the Grundy number of one state, player B can "copy" by reducing the Grundy number of the other state to the same value. If player A makes a move that increases the Grundy number of one state, player B can "undo" it by making a move on the same state to reduce it to the same value it was before. (Our game is finite, so we don't have to worry about an infinite loop of doing and undoing.) These are the only things A can do. (Remember, importantly, there is no move that leaves a Grundy number unchanged.)
If the states don't have the same Grundy number, then the way for the first player to win is clear: they just reduce the number of the state with the higher value to match the state with the lower value. This reduces things to the previous scenario.
Here we should note that the minimum excluded value definition allows us to construct the Grundy number for any states recursively in terms of their successors (at least for a finite game). There are no choices, so these numbers are in fact well-defined.
The next question to address is why we can calculate the Grundy number of a combined state. I prefer not to think about XOR at all here. We can define this nim sum operation purely from the minimum excluded value property. We abstractly consider the successors of nim_sum(x, y) to be {nim_sum(k, y) for k in 0..x-1} and {nim_sum(x, k) for k in 0..y-1}; in other words, making a move on one sub-state or the other. (We can ignore successor of one of the sub-states that increase the Grundy number, as such a state would have all the successors of the original state plus nim_sum(x, y) itself as another successor, so it must then have a strictly larger Grundy number. Yes, that's a little bit hand-wavy.) This turns out to be the same as XOR. I don't have a particularly nice explanation for this, but I feel it isn't really necessary to a basic understanding. The important thing is that it is a well-defined operation.
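That abstract definition translates directly into code, and a quick check confirms it agrees with XOR on small values (a sanity check, not a proof):

from functools import lru_cache

@lru_cache(maxsize=None)
def nim_sum(x, y):
    # Successors: make a move on either sub-state, lowering its Grundy number.
    reachable = {nim_sum(i, y) for i in range(x)} | {nim_sum(x, j) for j in range(y)}
    m = 0                      # minimum excluded value
    while m in reachable:
        m += 1
    return m

assert all(nim_sum(x, y) == x ^ y for x in range(16) for y in range(16))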
I'm having trouble designing an algorithm for a traversal problem.
I have a Ship that I control on a 2D grid and it starts on the very bottom of the grid. Each tile of the grid has a value (between 0 and 1000) equal to how much 'resource' is in that tile.
The Ship can go_left(), go_up(), go_right() or stay_still().
If the ship stays still, it collects 25% of its current tile's resource (rounded up to the nearest int).
If the ship uses a move command, it must spend 10% of its current tile's resource value, rounded down. Moves that cost more than the ship has collected are illegal. (So if the ship is on a 100, it costs 10 to move off it; if it's on a 9 or less, moving is free.)
The goal is to find a relatively short path that legally collects 1000 resource, returning the list of move commands that corresponds to the path.
I naturally tried a recursive approach. In pseudocode the algorithm is:
alg(position, collected, best_path):
    if ship has 1000:
        return best_path
    alg(stay still)
    if ship has enough to move:
        alg(try left)
        alg(try up)
        alg(try right)
If you want a closer look at the actual Python 3 syntax, here it is:
def get_path_to_1000(self, current_position, collected_resource, path, game_map):
    if collected_resource >= 1000:
        return path
    # Note: list.append() returns None, so copy first and append separately.
    path_stay = path.copy()
    path_stay.append(stay_still())
    self.get_path_to_1000(current_position,
                          collected_resource + math.ceil(0.25 * game_map[current_position].value),
                          path_stay, game_map.copy().collect(current_position))
    cost = math.floor(0.1 * game_map[current_position].value)
    if collected_resource >= cost:
        direction_list = [Direction.West, Direction.North, Direction.East]
        move_list = [go_left(), go_up(), go_right()]
        for i in range(3):
            new_path = path.copy()
            new_path.append(move_list[i])
            self.get_path_to_1000(
                current_position.offset(direction_list[i]),
                collected_resource - cost, new_path, game_map)
The problem with my approach is that the algorithm never completes, because it keeps trying longer and longer sequences of the ship staying still.
How can I alter my algorithm so it actually tries more than one option, and returns a relatively short (or the shortest) path to 1000?
The nature of this problem (ignoring the exact mechanics of the rounding and the variable cost of moving) is to collect 1,000 resources in as few turns as possible. Another way to look at this goal is that the ship should try to make the most efficient move on each turn.
This can be approached with a slightly modified version of Dijkstra's algorithm. Instead of greedily choosing the move with the least weight, choose the move with the greatest weight (most resources gained), and add that value to a running counter that tracks progress toward the 1000-resource total. By greedily taking the most efficient option each turn (while below 1000), you get a short sequence of moves that reaches 1000 resources.
Simply keep a list of the moves made by the algorithm and return that list when the resource counter reaches 1000.
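Here is a sketch of that greedy loop, reusing names from the question (game_map, Direction, stay_still()/go_*() commands, offset(), collect()) as assumptions about the game API rather than known functions:

import math

def greedy_collect(position, game_map, target=1000):
    collected, path = 0, []
    while collected < target:
        # Option 1: stay still and collect 25% of the current tile.
        stay_gain = math.ceil(0.25 * game_map[position].value)
        best = (stay_gain, stay_still(), position)
        # Options 2-4: pay the move cost, judged by what the neighbouring
        # tile would let us collect on the following turn.
        cost = math.floor(0.1 * game_map[position].value)
        if collected >= cost:
            for direction, cmd in [(Direction.West, go_left()),
                                   (Direction.North, go_up()),
                                   (Direction.East, go_right())]:
                neighbour = position.offset(direction)
                gain = math.ceil(0.25 * game_map[neighbour].value) - cost
                if gain > best[0]:
                    best = (gain, cmd, neighbour)
        _, cmd, new_position = best
        path.append(cmd)
        if new_position == position:    # stayed: collect from this tile
            collected += stay_gain
            game_map.collect(position)  # assumed to deplete the tile
        else:                           # moved: pay the cost now
            collected -= cost
            position = new_position
    return path

Unlike the depth-first recursion in the question, this loop makes exactly one decision per turn, so it terminates as long as some reachable tile still holds resources.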
Here's a helpful resource on how to best implement Dijkstra's algorithm:
https://www.geeksforgeeks.org/dijkstras-shortest-path-algorithm-greedy-algo-7/
With these few modifications, it should be your best bet!