I am creating a chess AI using the minimax method with alpha-beta pruning. I am trying to understand how alpha-beta pruning works, but I can't get my head around it when it comes to chess, where you set a certain search depth.
How does minimax with alpha-beta pruning handle sacrificing a piece for an advantage 2-3 moves ahead? Won't it just evaluate the position right after the sacrifice, immediately discard that branch as bad, and therefore miss the good "sacrifice"?
Thank you for any clarifications or advice on improvements. Here is my code so far:
import copy
import math
import random

def minimax(board, depth, alpha, beta, maximizing_player):
    board.is_human_turn = not maximizing_player
    children = board.get_all_possible_moves()

    if depth == 0 or board.is_draw or board.is_check_mate:
        return None, evaluate(board)

    best_move = random.choice(children)

    if maximizing_player:
        max_eval = -math.inf
        for child in children:
            board_copy = copy.deepcopy(board)
            board_copy.move(child)
            current_eval = minimax(board_copy, depth - 1, alpha, beta, False)[1]
            if current_eval > max_eval:
                max_eval = current_eval
                best_move = child
            alpha = max(alpha, current_eval)
            if beta <= alpha:
                break  # beta cutoff: the minimizer will avoid this branch
        return best_move, max_eval
    else:
        min_eval = math.inf
        for child in children:
            board_copy = copy.deepcopy(board)
            board_copy.move(child)
            current_eval = minimax(board_copy, depth - 1, alpha, beta, True)[1]
            if current_eval < min_eval:
                min_eval = current_eval
                best_move = child
            beta = min(beta, current_eval)
            if beta <= alpha:
                break  # alpha cutoff: the maximizer will avoid this branch
        return best_move, min_eval
To make this clear for you, I want to explain the minimax search tree.
In a minimax search tree, the engine assumes that, down to a certain depth, both players play the moves that are best for themselves: one side maximizes the board value (from evaluate()) at the end of the branch, the other minimizes it. Crucially, evaluate() is only applied at the leaves, so a sacrifice is never judged in isolation at the moment it happens. If a move sequence sacrifices a queen for nothing within those 3 moves, the leaf evaluation is bad and the branch is discarded. But if it sacrifices a queen for an inevitable checkmate in 2 moves, the leaf shows the checkmate, so the branch is considered good.
By the way, alpha-beta pruning is an optimization of minimax, which helps get the same result as plain minimax, but faster. For more information, you may want to check out Alpha-beta pruning on Wikipedia.
An easy way to make your engine look for longer-term advantages is to increase the search depth, but since the time cost grows exponentially with depth, this quickly becomes infeasible. Here are some suggestions to make your engine faster.
Use the python-chess library. I'm not sure if you are already using it from the code you've shown, but if not, I suggest you use this library. It provides decent board-calculation performance, a convenient board representation, printing to FEN and PGN strings, and much more.
Use move ordering. This usually helps find good moves faster. You can wrap the move generation and sorting in a function and use functools.lru_cache() to further improve the search speed. Note that lru_cache() uses a hash table to store previous results, but chess.Board isn't hashable by default, so you need to add this after importing for the caching to work: chess.Board.__hash__ = chess.polyglot.zobrist_hash. For more information, check out Move Ordering. A sketch of the idea follows below.
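Here is a minimal sketch of what capture-first move ordering could look like with python-chess, combined with the lru_cache() idea above. The piece values and the priority scheme are common conventions I'm assuming, not something the library prescribes:

import functools
import chess
import chess.polyglot

# Make chess.Board hashable via its Zobrist key so lru_cache can key on it.
chess.Board.__hash__ = chess.polyglot.zobrist_hash

# Conventional piece values for ordering captures (an assumption of this sketch).
PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9}

def capture_priority(board, move):
    # Lower key = searched earlier: valuable captures first, quiet moves last.
    if board.is_capture(move):
        victim = board.piece_type_at(move.to_square)  # None for en passant
        return -PIECE_VALUES.get(victim, 1)
    return 1

@functools.lru_cache(maxsize=None)
def ordered_moves(board):
    # Caching works best when boards aren't mutated after this call,
    # e.g. with the deepcopy-per-move approach in the question's code.
    return sorted(board.legal_moves, key=lambda m: capture_priority(board, m))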
Use opening book data and an opening searcher. You can use opening book data to make your engine play book openings, which takes essentially no time to compute while also giving better results in the early game. A sketch follows below.
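Here is a minimal sketch of probing a Polyglot opening book with python-chess; the book file name is a placeholder for whatever book you actually use:

import chess
import chess.polyglot

def book_move(board, book_path="book.bin"):  # book_path is a placeholder
    # Probe the Polyglot book; return None once the position is out of book.
    try:
        with chess.polyglot.open_reader(book_path) as reader:
            return reader.weighted_choice(board).move
    except (FileNotFoundError, IndexError):
        return None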
Related
I have a normal Negamax algorithm with alpha-beta pruning, initiated with iterative deepening (ID). I thought that to really benefit from ID I would save the valid moves calculated at depth 1 in a table, so that the next time I search to depth 2 and the same original position arises, I can grab the valid moves from the table instead to save time. However, I find that this idea doesn't save any time at all, which makes me wonder:
I have never seen anyone do this; is it not worth it for some reason?
Is my implementation of this wrong?
Am I confused about how Negamax works, and is this impossible to do in the first place?
Here is the original iterative call, along with a snippet of the Negamax function itself:
self.valid_moves_history = {}  # dict keyed by Zobrist hash (a list can't be indexed by key)

for depth in range(1, s.max_search_depth):
    move, evaluation = self.negamax(gamestate, depth, -math.inf, math.inf, s.start_color)

# ----------------------------------------------------------------------------

def negamax(self, gamestate, depth, alpha, beta, color):

    if self.zobrist_key in self.valid_moves_history:
        children = self.valid_moves_history[self.zobrist_key]
    else:
        children = gamestate.get_valid_moves()
        self.valid_moves_history[self.zobrist_key] = children

    if depth == 0 or gamestate.is_check_mate or gamestate.is_stale_mate:
        return None, e.evaluate(gamestate, depth) * color

    # Negamax loop
    max_eval = -math.inf
    for child in reversed(children):
        gamestate.make_move(child[0], child[1])
        score = -self.negamax(gamestate, depth - 1, -beta, -alpha, -color)[1]
        gamestate.unmake_move()
        if score > max_eval:
            max_eval = score
            best_move = child
        alpha = max(alpha, max_eval)
        if beta <= alpha:
            break
The most time consuming tasks of my complete program are distributed something like this (% of total runtime for a game):
Calculate valid moves: 60%
Evaluation function (medium complexity at the moment): 25%
Negamax itself with lookups, table saves etc: 5%
Make/unmake moves: 4%
Is it normal/reasonable for move generation to take this large a share of the time? This is the main reason why I thought to save valid moves in a table in the first place.
Or can someone please explain why this is a good/bad idea and what I should do instead? Thank you for any input.
I know this thread is quite old at this point, but I think this could still be useful to some people. What you are describing is called a transposition table in minimax, and you can find many resources on the topic. Negamax is the same as minimax except that you do not have separate functions for the max and min players; instead you always maximize and negate the value returned by the recursive call. I think it is probably more useful for you to implement move ordering first, as it can double the speed of your program. You can also look for a more efficient way to generate valid moves to speed up the program.
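To illustrate the negation described above, here is a minimal fail-soft negamax sketch. The state API (is_terminal, moves, make, unmake) and evaluate are placeholder names standing in for the asker's gamestate methods, and it returns just the score rather than a (move, score) pair:

import math

def negamax(state, depth, alpha, beta, color):
    # Always maximize from the side-to-move's perspective; the parent call
    # negates the returned score instead of having a separate min player.
    if depth == 0 or state.is_terminal():
        return color * evaluate(state)
    best = -math.inf
    for move in state.moves():
        state.make(move)
        best = max(best, -negamax(state, depth - 1, -beta, -alpha, -color))
        state.unmake()
        alpha = max(alpha, best)
        if alpha >= beta:
            break  # cutoff: the opponent will not allow this line
    return best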
I'm making a Connect 4 AI in Python, using minimax with iterative deepening and alpha-beta pruning. For greater depths it's still quite slow, so I wanted to implement a transposition table. After reading up on it I think I get the general idea, but I haven't been able to quite make it work. Here's part of my code (the maximizing part of the minimax):
if isMaximizing:
    maxEval = -99999999999
    bestMove = None
    # cache.get(hash(board))  Here's where I'd check to see if the hash is
    # already in the table; if so, I search for the best move that was given
    # to that board before.

    # loop through possible moves
    for move in [3, 2, 4, 1, 5, 0, 6]:
        if moves[move] > -1:
            # check if time limit has been reached for iterative deepening
            if startTime - time.time() <= -10:
                timeout = True
                return (maxEval, bestMove, timeout)

            if timeout == False:
                board = makeMove((moves[move], move), True, board)  # make the move
                eval = minimax(depth - 1, board, False, alpha, beta, cache,
                               zobTable, startTime, timeout)[0]
                if eval > maxEval:
                    maxEval = eval
                    bestMove = (moves[move] + 1, move)

                board[moves[move] + 1][move] = '_'  # undo the move on the board
                moves[move] = moves[move] + 1       # undo the move in the list of legal moves

                alpha = max(alpha, maxEval)
                if alpha >= beta:
                    break

    # cache.set(hash(board), (eval, value))  Here's where I would set the
    # value and best move for the current board state
    return (maxEval, bestMove, timeout)
Right now I'm hashing the board with the Zobrist hashing method, and I'm using an ordered dict to add the hashed boards to. Under each hash key I've stored the value for the board and the bestMove for that board. Unfortunately this seems to make the algorithm pick bad moves (it worked before). Does anyone know where you should put the board states in the cache, and where you should get them from the cache?
A few points on your approach:
If you want things to be fast, an efficient implementation in C or C++ is going to be much faster than Python. I've seen 10-100x improvements in performance in this sort of search code by switching away from Python to a good C/C++ implementation. Either way you should try to write code that avoids allocating memory during search, as this is very expensive. That is to say, you could see better returns from coding more efficiently than from adding a transposition table.
When using Zobrist hashing for a transposition table in game tree search, you typically do not store the state explicitly. You only check to see if the hashes are equal. While there is a small chance of error, it requires far less memory to store just the hash, and with a 64-bit hash the chance of collisions is probably vanishingly small for the types of searches you are doing. (The chance of errors resulting is even lower.)
When you store values in the transposition table, you also need to store the alpha and beta bounds used during the search. When you get a value back at a node mid-search, it is either a lower bound on the true value (the search failed high: value >= beta), an upper bound on the true value (the search failed low: value <= alpha), or the exact value of the node (alpha < value < beta). You need to store this in your transposition table. Then, when you want to re-use the value, you have to check that you can use it given your current alpha and beta bounds. (You can validate this by actually doing the search after finding the value in the transposition table and checking that you get the same value from the search as from the table.)
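Here is a minimal sketch of that bookkeeping. The Entry layout and flag names are illustrative conventions from chess programming, not something from the original post:

from collections import namedtuple

Entry = namedtuple("Entry", "depth value flag")
EXACT, LOWERBOUND, UPPERBOUND = 0, 1, 2

def probe(table, key, depth, alpha, beta):
    # Return a reusable cached value, or None if the entry can't be trusted
    # at this depth or under the current alpha/beta window.
    entry = table.get(key)
    if entry is None or entry.depth < depth:
        return None
    if entry.flag == EXACT:
        return entry.value
    if entry.flag == LOWERBOUND and entry.value >= beta:
        return entry.value  # still a guaranteed fail-high
    if entry.flag == UPPERBOUND and entry.value <= alpha:
        return entry.value  # still a guaranteed fail-low
    return None

def store(table, key, depth, value, alpha_orig, beta):
    if value <= alpha_orig:
        flag = UPPERBOUND   # search failed low: value is an upper bound
    elif value >= beta:
        flag = LOWERBOUND   # search failed high: value is a lower bound
    else:
        flag = EXACT
    table[key] = Entry(depth, value, flag)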
I'm having trouble designing an algorithm for a traversal problem.
I have a Ship that I control on a 2D grid and it starts on the very bottom of the grid. Each tile of the grid has a value (between 0 and 1000) equal to how much 'resource' is in that tile.
The Ship can go_left(), go_up(), go_right() or stay_still()
If the ship uses stay_still(), it collects 25% of its current tile's resource (rounded up to the nearest int).
If the ship uses a move command, it must spend 10% of its current tile's resource value, rounded down. Moves that cost more than the ship has collected are illegal. (So if a ship is on a 100, it costs 10 to move off the 100; if it's on a 9 or less, moving is free.)
The goal is to find a relatively short path that legally collects 1000 resource, returning the list of move commands that corresponds to the path.
I naturally tried a recursive approach:
In pseudo-code the algorithm is:
alg(position, collected, best_path):
    if ship has 1000:
        return best_path
    alg(stay still)
    if ship has enough to move:
        alg(try left)
        alg(try up)
        alg(try right)
If you want a closer look at the actual Python 3 syntax, here it is:
def get_path_to_1000(self, current_position, collected_resource, path, game_map):
    if collected_resource >= 1000:
        return path

    path_stay = path.copy()
    path_stay.append(stay_still())  # list.append returns None, so don't chain it after copy()
    self.get_path_to_1000(current_position,
                          collected_resource + math.ceil(0.25 * game_map[current_position].value),
                          path_stay, game_map.copy().collect(current_position))

    cost = math.floor(0.1 * game_map[current_position].value)
    if collected_resource >= cost:
        direction_list = [Direction.West, Direction.North, Direction.East]
        move_list = [go_left(), go_up(), go_right()]
        for i in range(3):
            new_path = path.copy()
            new_path.append(move_list[i])
            self.get_path_to_1000(
                current_position.offset(direction_list[i]),
                collected_resource - cost, new_path, game_map)
The problem with my approach is that the algorithm never completes because it keeps trying longer and longer lists of the ship staying still.
How can I alter my algorithm so it actually tries more than one option, returning a relatively short (or shortest) path to 1000?
The nature of this problem (ignoring the exact mechanics of the rounding and the variable cost of moving) is to find the smallest number of moves needed to acquire 1,000 resources. Another way to look at this goal is that the ship is trying to make the most efficient move each turn.
This can be approached with a slightly modified version of Dijkstra's algorithm. Instead of greedily choosing the move with the least weight, we choose the move with the greatest weight (most resources) and add that value to a running counter until we reach 1,000 resources total. By greedily taking the most efficient move each turn (while below 1,000), we aim for a small number of moves to collect 1,000 resources.
Simply keep a list of the moves made by the algorithm and return that list when the resource counter reaches 1,000.
Here's a helpful resource on how to best implement Dijkstra's algorithm:
https://www.geeksforgeeks.org/dijkstras-shortest-path-algorithm-greedy-algo-7/
With a few modifications, it should be your best bet! Below is a sketch of the idea.
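Here is a minimal sketch of the greedy idea, under assumptions of mine: grid is a mutable 2D list of tile values indexed [row][col], the ship starts at start = (row, col), and go_up() decreases the row index. Each turn it compares staying (one harvest here) against moving (one harvest on the destination tile, minus the move cost):

import math

def greedy_path(grid, start, target=1000, max_steps=10000):
    row, col = start
    collected = 0
    path = []
    while collected < target and len(path) < max_steps:
        stay_gain = math.ceil(0.25 * grid[row][col])
        move_cost = math.floor(0.1 * grid[row][col])
        # Candidates: (estimated gain, command, destination)
        candidates = [(stay_gain, "stay_still", (row, col))]
        if collected >= move_cost:  # moves costing more than we hold are illegal
            for cmd, (dr, dc) in [("go_left", (0, -1)), ("go_up", (-1, 0)),
                                  ("go_right", (0, 1))]:
                r, c = row + dr, col + dc
                if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                    # One harvest on the new tile, minus the cost of moving.
                    gain = math.ceil(0.25 * grid[r][c]) - move_cost
                    candidates.append((gain, cmd, (r, c)))
        gain, cmd, dest = max(candidates, key=lambda t: t[0])
        if cmd == "stay_still":
            collected += stay_gain
            grid[row][col] -= stay_gain  # harvesting depletes the tile
        else:
            collected -= move_cost
            row, col = dest
        path.append(cmd)
    return path

Note that this is greedy, not optimal: it can miss richer tiles two steps away, so treat it as a starting point rather than a guaranteed shortest path.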
So, I am trying to learn Python and I came across this code demonstrating recursion. I know C++, and I thought this code would create an infinite loop, but it doesn't. Any help would be greatly appreciated. This is a program that sorts a list by insertion sort.
def InsertionSort(seq):
    isort(seq, len(seq))

def isort(seq, k):  # Sort slice seq[0:k]
    if k > 1:
        isort(seq, k - 1)   #1
        insert(seq, k - 1)  #2

def insert(seq, k):  # Insert seq[k] into sorted seq[0:k-1]
    pos = k
    while pos > 0 and seq[pos] < seq[pos - 1]:
        (seq[pos], seq[pos - 1]) = (seq[pos - 1], seq[pos])
        pos = pos - 1
Shouldn't the interpreter get to #1, call isort again, and thus end up in an infinite loop, never reaching #2?
Thank you for your help.
This code will terminate because the function is called with k - 1, meaning the condition k > 1 will eventually evaluate to False.
Imagine recursion as a continually growing tree, with a squirrel as the interpreter. When the isort() function is called, the tree branches off and the squirrel runs to the end of that branch. However, the tree has a finite supply of nutrients (k), and a bit is used up (k - 1) each time it grows a new branch. The tree stops branching when it runs out of nutrients (the k > 1 condition). When the tree stops growing, the squirrel reaches the end of the last branch and gets the nut (the return value / next line(s) of code). The squirrel then runs back to the roots (the code, if any, after the call to the recursive function) by going back through the branches (leaving each recursion depth). When the squirrel arrives back at the roots, the program is finished.
(Hope this analogy helps :) )
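To make the call order concrete, here is a short trace using the functions above (the example list is mine, not from the original post):

seq = [3, 1, 2]
InsertionSort(seq)
# Call order:
#   isort(seq, 3)
#     isort(seq, 2)
#       isort(seq, 1)   # k > 1 is False: base case, returns immediately
#       insert(seq, 1)  # seq becomes [1, 3, 2]
#     insert(seq, 2)    # seq becomes [1, 2, 3]
print(seq)  # prints [1, 2, 3]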
I have been trying to build a Tic-Tac-Toe bot in Python. I tried to avoid using the Minimax algorithm, because I was QUITE daunted by how to implement it. Until now.
I (finally) wrote an algorithm that sucked and could lose pretty easily, which kinda defeats the purpose of making a computer play Tic-Tac-Toe. So I finally took the courage to TRY to implement the algorithm. I stumbled upon this StackOverflow post and tried to implement the chosen answer there, but I can't understand most of it. The code in that answer follows:
def minimax(self, player, depth=0):
    if player == "o":
        best = -10
    else:
        best = 10

    if self.complete():
        if self.getWinner() == "x":    # 'X' is the computer
            return -10 + depth, None
        elif self.getWinner() == "tie":
            return 0, None
        elif self.getWinner() == "o":  # 'O' is the human
            return 10 - depth, None

    bestMove = None  # initialize so the final return can't hit an unbound name
    for move in self.getAvailableMoves():
        self.makeMove(move, player)
        val, _ = self.minimax(self.getEnemyPlayer(player), depth + 1)
        print(val)
        self.makeMove(move, ".")
        if player == "o":
            if val > best:
                best, bestMove = val, move
        else:
            if val < best:
                best, bestMove = val, move
    return best, bestMove
First of all, why are we returning -10 + depth when the computer wins and 10 - depth when the human wins? (I get why we return 0 when it is a draw.) Secondly, what is the depth parameter doing? Is there some way to omit it? Should we omit it?
I'm probably missing something fundamental about the algorithm, but I think I understand it well enough. Please bear in mind that I'm very new to recursive algorithms...
EDIT
So now I have written my own function:
def minimax(self, player):
    won = 10
    lost = -10
    draw = 0

    if self.has_won(HUMAN):
        return lost, None
    elif self.has_won(BOT):
        return won, None
    if not self.board_is_empty():
        return draw, None

    moves = self.get_available_moves()
    for move in moves:
        self.play_move(move[0], move[1], player)
        make_board(self.board)
        if self.board_is_empty():
            val, _ = self.minimax(self.get_enemy_player(player))
        self.rewind_move(move)
        if val == won:
            return val, move
But the problem now is that I can't understand what happens when the move ends in a draw or a loss (for the computer). I think what it does is go through a move's consequences to see if SOMEONE wins (that's probably what is happening, because I tested it), and then return that move if SOMEONE wins. How do I modify this code to work properly?
Note:
This function is in a class, hence the self keywords.
moves is a list containing tuples. eg. moves = [(0, 1), (2, 2)] etc. So, moves contains all the empty squares. So each moves[i][j] is an integer modulo 3.
I'm using the exhaustive algorithm suggested by Jacques de Hooge in his answer below.
First note that 10 - depth = - (-10 + depth).
So computer wins have opposite signs from human wins.
In this way they can be added to evaluate the value of a gameboard state.
While with tic-tac-toe this isn't really needed, in a game like chess it is, since it is too time-consuming to try all possible combinations until checkmate; hence game-board states have to be evaluated somehow in terms of losses and wins (losing and winning chess pieces, each worth a certain number of points, with values drawn from experience).
Suppose now we only look at 10 - depth (so human wins).
The most attractive wins are the ones that require the fewest plies (moves). Since each move or countermove increments depth, more moves make depth larger, so 10 - depth (the "amount" of advantage) gets smaller. So quick wins are favored over lengthy ones. A base value of 10 is enough, since only 9 moves in total are possible on a 3 x 3 playfield.
So in short: since tic-tac-toe is so simple, the winning combination can in fact be found by an exhaustive recursive search. But the minimax algorithm is suitable for more complicated situations like chess, in which intermediate situations have to be evaluated as the sum of losses (negative) and gains (positive).
Should the depth parameter be omitted? If you care about the quickest win: no. If you only care about a win (with tic-tac-toe): it can indeed be omitted, since an exhaustive search is possible.
[EDIT]
Exhaustive with tictactoe just means searching 9 plies deep, since the game can never last longer.
Make a recursive function with a parameter player (o or x) and a return value win or loss that is first decided at the deepest recursion level and then passed upward through the recursion tree. Let it call itself with the opposite player as parameter for all free fields. A move is the right one for the machine if any continuation results in the machine winning for all branches the human may take at each level. A sketch of this recursion follows below.
Note: the assumption I made is that there IS a winning strategy. If that is not the case (ties are possible), the algorithm you have may be the best option. I seem to remember that with tic-tac-toe the player who starts can always enforce a win in the way described above (though note that with perfect play from both sides tic-tac-toe is in fact a draw, so treat this assumption with care). Under that assumption the algorithm would win in at least 50% of all games.
With a non-perfect human player it may also win when the computer doesn't start, if the human player does something suboptimal.
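Here is a minimal sketch of the win/loss recursion described above. The board API (winner, moves, play, undo) is a hypothetical stand-in, and draws count as "not a win", matching the assumption stated above:

def opponent(player):
    return "o" if player == "x" else "x"

def can_force_win(board, to_move, me):
    # True if `me` can force a win from this position with `to_move` to play.
    winner = board.winner()        # hypothetical: returns "x", "o", or None
    if winner is not None:
        return winner == me
    moves = board.moves()          # hypothetical: list of free fields
    if not moves:
        return False               # board full with no winner: a draw
    results = []
    for move in moves:
        board.play(move, to_move)  # hypothetical make/undo pair
        results.append(can_force_win(board, opponent(to_move), me))
        board.undo(move)
    # On our own turn one winning reply suffices; on the opponent's turn
    # every reply must still leave us a forced win.
    return any(results) if to_move == me else all(results)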