I am trying to learn reinforcement learning to train AI on custom games in Python, and decided to use gym for the environment and stable-baselines3 for the training. I decided to start off with a basic tic-tac-toe environment. Here's my code:
import gym
from gym import spaces
import numpy as np
from stable_baselines3.common.env_checker import check_env

class tictactoe(gym.Env):
    def __init__(self):
        #creating grid, action and observation space
        self.box = [0,0,0,0,0,0,0,0,0]
        self.done=False
        self.turn = 1
        self.action_space = spaces.Discrete(9)
        self.observation_space = spaces.Discrete(9)

    def _get_obs(self):
        #returns the observation (the grid)
        return np.array(self.box)

    def iswinner(self, b, l):
        #function to check if a side has won
        return (b[1] == l and b[2] == l and b[3] == l) or (b[4] == l and b[5] == l and b[6] == l) or (b[7] == l and b[8] == l and b[9] == l) or (b[1] == l and b[4] == l and b[7] == l) or (b[7] == l and b[5] == l and b[3] == l) or (b[1] == l and b[5] == l and b[9] == l) or (b[8] == l and b[5] == l and b[2] == l) or (b[9] == l and b[6] == l and b[3] == l)

    def reset(self):
        #resets the env (grid, turn and done variable) and returns the observation
        self.box = [0,0,0,0,0,0,0,0,0]
        self.turn = 1
        self.done=False
        return self._get_obs()

    def step(self, action):
        #gives negative reward for illegal move (square occupied)
        if self.box[action] != 0:
            return self._get_obs(), -10, True, {}
        #enters a value (1 or 2) in the grid and flips the turn
        self.box[action] = self.turn
        self.turn = (1 if self.turn == 2 else 2)
        reward = 0
        #checks if the game is over and sets a reward (+5 win, 0 draw)
        if self.iswinner([0]+self.box,1) and self.turn == 1: reward,self.done = 5,True
        elif 0 not in self.box: reward,self.done = 0,True
        #returns the observation (grid), reward, if the game is finished and extra information (empty dict for me)
        return self._get_obs(), reward, self.done, {}

    def render(self):
        #renders the board so it looks like a grid
        print(self.box[:3],self.box[3:6],self.box[6:],sep='\n')

#checking the env
env = tictactoe()
print(check_env(env))
Running this code, I get the error AssertionError: The observation returned by 'reset()' method must be an int. I don't understand how this is supposed to work, since my reset function returns the observation from _get_obs. Is it saying that my observation must be an integer? That makes even less sense, because I have no idea how I'm supposed to do that.
When you do
self.observation_space = spaces.Discrete(9)
you're actually defining your observation space as a single value that can take any value in {0, 1, 2, 3, 4, 5, 6, 7, 8}, since you defined it as a discrete single-dimension space (i.e. an integer).
As you said you were trying to make a tic-tac-toe environment, I presume what you were actually trying to do was something like
self.observation_space = spaces.MultiDiscrete([3, 3, 3, 3, 3, 3, 3, 3, 3])
# or self.observation_space = spaces.MultiDiscrete(9 * [3]), which would be cleaner
which means you have 9 tiles in total and each tile can be in three different states (empty, X or O).
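To make the difference concrete, here is a small pure-Python sketch of the two membership rules described above. The helpers in_discrete and in_multi_discrete are hypothetical stand-ins written for this illustration; they only imitate the idea of "does this observation belong to the space" and are not gym's API.

```python
def in_discrete(obs, n):
    """Stand-in for a Discrete(n) membership test: one integer in [0, n)."""
    return isinstance(obs, int) and 0 <= obs < n


def in_multi_discrete(obs, nvec):
    """Stand-in for a MultiDiscrete(nvec) test: one bounded integer per slot."""
    return len(obs) == len(nvec) and all(
        isinstance(v, int) and 0 <= v < n for v, n in zip(obs, nvec)
    )


board = [0, 0, 0, 0, 0, 0, 0, 0, 0]  # the grid returned by reset(), as a list

print(in_discrete(board, 9))              # False: nine cells are not one integer
print(in_multi_discrete(board, 9 * [3]))  # True: nine cells, each in {0, 1, 2}
```

With the single-integer space, the whole board can never be a valid observation, which is what the assertion in the question is complaining about.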
First I will explain the rules of peg solitaire (for 1 dimension):
You initially start with a 1-dimensional board of length n. n-1 elements are pegs (filled) and 1 element is a hole (empty). So for n = 6 a starting position can be [1, 1, 0, 1, 1, 1], where 1s represent pegs and 0s represent holes.
The goal of the game is to reach a board state where n-1 elements are holes and 1 element is a peg, at any position. So a valid solution for n = 6 can be [0, 0, 0, 1, 0, 0].
The only available move at any position is to move one peg two positions to the right or to the left, if and only if there is a peg between the two positions and the destination is a hole; once you make that move, the middle peg is replaced with a hole.
So for a board such as board = [0, 1, 1, 0, 1, 1, 0] there are four available moves:
move board[1] to board[3] and set board[2] = 0
move board[5] to board[3] and set board[4] = 0
move board[2] to board[0] and set board[1] = 0
move board[4] to board[6] and set board[5] = 0
The goal of the algorithm I am trying to make is to take an input of n where n > 2 and n is an even number, then for a board of length n, find all the positions for a start state at which a hole can be placed to produce a valid solution.
I have created a brute force algorithm which finds all the possible moves until a valid solution is reached, but it starts taking a very long time to find a solution past n > 20. So I was wondering if there are some optimizations I could do or different solution approaches.
Here is my code:
import re

def generateBoard(n):
    return [1]*n

def solve(board):
    if checkBoard(board):
        return True
    elif checkUnsolvable(board):
        return False
    moves = []
    for i in range(len(board)):
        if i < len(board)-2:
            if board[i] and board[i+1] and not board[i+2]:
                moves.append((i, 'right'))
        if i > 1:
            if board[i] and board[i-1] and not board[i-2]:
                moves.append((i, 'left'))
    for move in moves:
        newBoard = makeMove(board, move)
        if solve(newBoard):
            return True
        continue
    return False

def makeMove(board, move):
    index, direction = move
    b = [element for element in board]
    if direction == 'right':
        b[index] = 0
        b[index+1] = 0
        b[index+2] = 1
    elif direction == 'left':
        b[index] = 0
        b[index-1] = 0
        b[index-2] = 1
    return b

def checkBoard(board):
    if sum(board) == 1:
        return True
    return False

def checkUnsolvable(board):
    expression1 = '1000+1' #RE for a proven to be unsolvable board
    expression2 = '00100' #RE for a proven to be unsolvable board
    string = ''.join([str(element) for element in board])
    if re.search(expression1, string) or re.search(expression2, string):
        return True
    return False

def countSolutions(board):
    indices = []
    for i in range(len(board)):
        b = [element for element in board]
        b[i] = 0
        if solve(b):
            indices.append(i+1)
    return indices

n = int(input())
print(countSolutions(generateBoard(n)))
Optimizations I have come up with so far:
A board containing [1, 0, 0, ..., 0, 1] is unsolvable, so when we find this pattern we skip it.
The same goes for a board containing [0, 0, ..., 0, 1, 0, 0, ..., 0].
We only need to check half of the board, as the solutions of the other half would be symmetrical.
But despite these the code is still very slow.
This algorithm for doing the solitaire is covered in this research paper: https://arxiv.org/pdf/math/0006067.pdf.
It claims that a linear time algorithm exists.
A valid solution looks like this:
L = 1
or 011
or 110
or 11 (01)* [ 00 | 00(11)+ | (11)+00 | (11)*1011 | 1101(11)* ] (10)*11 # (A)
or 11 (01)* (11)* 01 # (B)
or 10 (11)* (10)* 11 # (C)
To solve A, you can use regex to check for the string. However, there are multiple cases of it due to the middle section.
First case: 11(01)*00(10)*11
Second case: 11(01)*(00)(11)+(10)*11
Third case: 11(01)*(11)+(00)(10)*11
Fourth case: 11(01)*(11)*(1011)(10)*11
Fifth case: 11(01)*1101(11)*(10)*11
To solve B and C is a simpler regex match:
Solution for B: 11(01)*(11)*01
Solution for C: 10(11)*(10)*11
If the board matches 1, 011, 110, or any of the patterns above, then it is solvable. You will need to convert the board to a string first, for example with ''.join([str(i) for i in mylist]).
I've been working on this leetcode problem, which is essentially finding the number of valid paths in a maze given some obstacleGrid matrix. If obstacleGrid[i][j] == 1, then we have an obstacle at (i,j); otherwise the value is zero, which is a valid spot. We can only move down and right, with the goal of getting from the upper left to the bottom right.
I have written the code below:
def uniquePathsWithObstacles(self, obstacleGrid):
    # obstruction at the start
    if (obstacleGrid[0][0] == 1): return 0
    # obstruction at the end
    if (obstacleGrid[-1][-1] == 1): return 0
    m, n = len(obstacleGrid), len(obstacleGrid[0])
    memo = [[0] * n] * m
    # starting move
    memo[0][0] = 1
    # now check the first row
    for j in range(1, n):
        memo[0][j] = 1 if (obstacleGrid[0][j] == 0 and memo[0][j-1] != 0) else 0
    # now check the first column
    for i in range(1, m):
        memo[i][0] = 1 if (obstacleGrid[i][0] == 0 and memo[i-1][0] != 0) else 0
    # now check everything else
    for i in range(1, m):
        for j in range(1, n):
            if (obstacleGrid[i][j] == 1): memo[i][j] = 0
            else: memo[i][j] = memo[i-1][j] + memo[i][j-1]
    return memo[-1][-1]
I took the obvious DP approach and I know the idea works but something is wrong with the code; for some reason I don't think my memo matrix is being updated properly? I feel like the problem is staring at me in the face but for some reason I can't see it. Any help appreciated!
Edit: For obstacleGrid = [[0,0,0],[0,1,0],[0,0,0]] and if I had a print(memo) right before the return statement, I get [[1, 1, 2], [1, 1, 2], [1, 1, 2]]. This happens to give me the right answer, but the memo matrix is wrong!
One problem lies in the line memo = [[0] * n] * m.
This does not really create m copies of the same list. Instead, it creates the [0] * n list only once and then builds memo as a list of m references to that one list. Any change to any of these lists therefore modifies all the others!
You can try this yourself:
memo = [[0] * 3] * 4
memo[0][1] = 1
print(memo)
This gives [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]].
Instead, you have to initialize each list on its own, e.g.,
memo = []
for i in range(m):
    memo.append([0] * n)
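The aliasing can also be checked directly with the is operator. A short sketch comparing both ways of building the grid (the names shared and separate are only for illustration):

```python
m, n = 3, 3

shared = [[0] * n] * m                  # one row object referenced m times
separate = [[0] * n for _ in range(m)]  # m independent row objects

print(shared[0] is shared[1])      # True
print(separate[0] is separate[1])  # False

shared[0][1] = 1
separate[0][1] = 1
print(shared)    # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(separate)  # [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
```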
I just tried to do this with recursion, as a comparison rather than an answer.
import numpy as np

def number_of_paths(obstacles):
    """
    Calculate the number of paths available in a maze with obstacles, with only right and down moves, from top left
    to bottom right.
    Args:
        obstacles (ndarray): binary matrix with 1 representing obstacle
    Returns:
        int: the number of paths
    """
    if obstacles[0,0] == 1:
        raise ValueError # cannot start on an obstacle
    # a single remaining cell means the bottom right corner has been reached
    if obstacles.shape == (1,1):
        return 1
    count = 0
    # only step right or down onto cells that are free
    if obstacles.shape[1] > 1 and obstacles[0,1] == 0:
        count += number_of_paths(obstacles[:,1:])
    if obstacles.shape[0] > 1 and obstacles[1,0] == 0:
        count += number_of_paths(obstacles[1:,:])
    return count
Your code is essentially correct; only one line needs to change, the initialization of memo:
def uniquePathsWithObstacles(self, obstacleGrid):
    # obstruction at the start
    if (obstacleGrid[0][0] == 1): return 0
    # obstruction at the end
    if (obstacleGrid[-1][-1] == 1): return 0
    m, n = len(obstacleGrid), len(obstacleGrid[0])
    # build m independent rows instead of m references to one row
    memo = [[0] * n for i in range(m)]
    # starting move
    memo[0][0] = 1
    # now check the first row
    for j in range(1, n):
        memo[0][j] = 1 if (obstacleGrid[0][j] == 0 and memo[0][j-1] != 0) else 0
    # now check the first column
    for i in range(1, m):
        memo[i][0] = 1 if (obstacleGrid[i][0] == 0 and memo[i-1][0] != 0) else 0
    # now check everything else
    for i in range(1, m):
        for j in range(1, n):
            if (obstacleGrid[i][j] == 1): memo[i][j] = 0
            else: memo[i][j] = memo[i-1][j] + memo[i][j-1]
    return memo[-1][-1]
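As a side note, the same recurrence only ever reads the row above and the cell to the left, so it can be kept in a single row, which sidesteps the nested-list initialization altogether. This is a sketch of that variant, not the code from the question; unique_paths_with_obstacles is a hypothetical standalone function name.

```python
def unique_paths_with_obstacles(obstacle_grid):
    """Count right/down paths from top left to bottom right, avoiding 1-cells."""
    if not obstacle_grid or not obstacle_grid[0]:
        return 0
    n = len(obstacle_grid[0])
    row = [0] * n   # row[j] = number of paths reaching column j of the current row
    row[0] = 1      # one way to stand on the start cell (cleared below if blocked)
    for grid_row in obstacle_grid:
        for j in range(n):
            if grid_row[j] == 1:
                row[j] = 0              # an obstacle can never be reached
            elif j > 0:
                row[j] += row[j - 1]    # paths from above (old value) plus from the left
    return row[-1]


print(unique_paths_with_obstacles([[0, 0, 0], [0, 1, 0], [0, 0, 0]]))  # 2
```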
A list of integers is said to be a valley if it consists of a sequence of strictly decreasing values followed by a sequence of strictly increasing values. The decreasing and increasing sequences must be of length at least 2. The last value of the decreasing sequence is the first value of the increasing sequence.
Write a Python function valley(l) that takes a list of integers and returns True if l is a valley and False otherwise.
Here are some examples to show how your function should work.
>>> valley([3,2,1,2,3])
True
>>> valley([3,2,1])
False
>>> valley([3,3,2,1,2])
False
I have been sleepless for 2 days, and the best I could write is this code:
def valley(list):
    first =False
    second=False
    midway=0
    if(len(list)<2):
        return False
    else:
        for i in range(0,len(list)):
            if(list[i]<list[i+1]):
                first=True
                midway=i
                break
        for j in range(midway,len(list)-1):
            if(list[j]<list[j+1] and j+1==len(list)):
                Second=True
                break
            if(list[j]>=list[j+1]):
                second=False
                break
    if(first==True and second==True):
        return True
    else:
        return False
The solution I found also works when the numbers are not consecutive and the lowest value is not 1. For example, the list [14,12,10,5,3,6,7,32,41] also forms a valley: the values decrease down to 3 (the lowest) and then increase again. A list such as [4,3,2,1,2,3,4] is a perfect valley.
Solution:
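def valley(lst):
    if len(lst) < 3:
        return False
    p = lst.index(min(lst))
    # the minimum must not sit at either end, so that both runs have at least 2 values
    if p == 0 or p == len(lst) - 1:
        return False
    # every step before the minimum must go strictly down
    x = all(lst[i] > lst[i+1] for i in range(0, p))
    # every step after the minimum must go strictly up
    y = all(lst[q] < lst[q+1] for q in range(p, len(lst)-1))
    return x and y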
Don't forget to accept it if this solves the problem and is most helpful, thank you.
It seems saurav beat me to the punch, but if you'll allow for some NumPy magic:
import numpy as np

def valley(arr):
    diff = arr[:-1] - arr[1:]
    gt = np.where(diff > 0)[0]
    lt = np.where(diff < 0)[0]
    d = np.sum(diff == 0)
    if gt.size == 0 or lt.size == 0:
        # Doesn't have both a descent and an ascent
        return False
    elif d > 0:
        # Has a flat
        return False
    elif gt[-1] > lt[0]:
        # Not strictly one descent into one ascent
        return False
    else:
        return True

a = np.array([3, 2, 1, 2, 3])
b = np.array([3, 3, 2, 1, 2])
c = np.array([3, 2, 1])
d = np.array([1, 2, 3, 2, 1])
print(valley(a), valley(b), valley(c), valley(d))
>>> True False False False
You can also use plain old Python builtins to do it:
def valley(arr):
    diff = [i1-i2 for i1, i2 in zip(arr, arr[1:])]
    gt = [i for i, item in enumerate(diff) if item > 0]
    lt = [i for i, item in enumerate(diff) if item < 0]
    d = sum([True for d in diff if d == 0])
    if len(gt) == 0 or len(lt) == 0:
        # Doesn't have both a descent and an ascent
        return False
    elif d > 0:
        # Has a flat
        return False
    elif gt[-1] > lt[0]:
        # Not strictly one descent into one ascent
        return False
    else:
        return True

a = [3, 2, 1, 2, 3]
print(valley(a), ...)
>>> True False False False
I did not actually want to post a complete solution, but I wanted to solve it, so for the first (and hopefully last) time I'm posting a solution for a task.
Here is my solution. Of course there may be other solutions; this is just the first one my fingers typed.
def valley(heights):
    directions = []
    # Check input
    if not heights:
        return False
    # Traverse array and compare current height with previous one
    # to see if we are going down or up.
    pre = heights[0]
    for h in heights[1:]:
        if h > pre:
            step = 1   # going upward
        elif h < pre:
            step = -1  # going downward
        else:
            return False  # a flat step can never be part of a valley
        # Record a direction only when it differs from the previous one,
        # so a long descent or ascent collapses to a single entry.
        if not directions or directions[-1] != step:
            directions.append(step)
        pre = h
    # A valley goes down once and then up once, which is exactly [-1, 1].
    return directions == [-1, 1]
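In the valley function, the variable n holds the sign of each pairwise difference of the numbers in the input list (-1 for a step down, 1 for a step up, 0 for a flat step). Using the sign rather than the raw difference keeps this working when neighbouring values differ by more than 1. So if
input = [3,2,1,2,3]
n = [-1, -1, 1, 1]
The next variable, h, is again the pairwise difference, this time of n, so h will be
h = ['0', '2', '0']
A valley has exactly one change from going down to going up, so the joined string must be a single "2" surrounded only by zeros. Use the re module in Python to check that the whole string matches the pattern 0*20*:
import re
def valley(f):
    # sign of each step: -1 down, 1 up, 0 flat
    n = [(j > i) - (j < i) for i, j in zip(f[:-1], f[1:])]
    h = [str(j-i) for i, j in zip(n[:-1], n[1:])]
    result = "".join(h)
    # the whole string must match, not just a part of it
    m = re.fullmatch('0*20*', result)
    if m:
        return True
    else:
        return False
Please let me know whether it is correct.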
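The answers above can be cross-checked against a plain single-pass walk: go down while the values strictly decrease, then go up while they strictly increase, and require that both phases moved and that the walk reached the end. This is a sketch written for comparison, not code from any of the answers.

```python
def valley(values):
    """True if values strictly decrease, then strictly increase, each for >= 1 step."""
    n = len(values)
    i = 0
    # Phase 1: walk down the strictly decreasing prefix.
    while i + 1 < n and values[i] > values[i + 1]:
        i += 1
    bottom = i
    # Phase 2: walk up the strictly increasing part that follows.
    while i + 1 < n and values[i] < values[i + 1]:
        i += 1
    # The bottom must be an interior point and the walk must consume the list.
    return 0 < bottom < n - 1 and i == n - 1


print(valley([3, 2, 1, 2, 3]))   # True
print(valley([3, 2, 1]))         # False
print(valley([3, 3, 2, 1, 2]))   # False
```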
To start, sorry for the not-so-descriptive title; I really don't know what to call this problem, so let's start...
I have to design an AI for a 4x4x4 four-in-line game, and we're using the Monte Carlo method to do it. Basically, I have to simulate the possible moves of the computer n times and determine which next move is the best possible. Our professor gave us some base code to play a 4x4x4 four-in-line game and we had to design the AI, so here is my method to simulate the AI:
EDIT1: Added full program.
from __future__ import print_function
import random
from copy import deepcopy
import time
board = [
[
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]
],
[
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]
],
[
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]
],
[
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]
]
]
p1_token = 1
p2_token = -1
draw_token = 0
def slice_winner(state_slice):
    slice_size = len(state_slice)
    sums = [sum(row) for row in state_slice]
    sums.extend([sum([row[i] for row in state_slice]) for i in range(slice_size)])
    if (p1_token * slice_size) in sums:
        return p1_token
    elif (p2_token * slice_size) in sums:
        return p2_token
    return 0

def winner(state):
    for state_slice in state:
        winner_in_slice = slice_winner(state_slice)
        if winner_in_slice != draw_token:
            return winner_in_slice
    state_size = len(state)
    for i in range(state_size):
        state_slice = []
        for j in range(state_size):
            state_slice.append([state[j][i][k] for k in range(state_size)])
        winner_in_slice = slice_winner(state_slice)
        if winner_in_slice != draw_token:
            return winner_in_slice
    diagonals = [0, 0, 0, 0]
    for i in range(state_size):
        diagonals[0] += state[i][i][i]
        diagonals[1] += state[state_size - 1 - i][i][i]
        diagonals[2] += state[i][state_size - 1 - i][i]
        diagonals[3] += state[state_size - 1 - i][state_size - 1 - i][i]
    if (p1_token * state_size) in diagonals:
        return p1_token
    elif (p2_token * state_size) in diagonals:
        return p2_token
    return draw_token

def str_token(cell):
    if cell == p1_token:
        return "X"
    elif cell == p2_token:
        return "O"
    return "."

def draw_board(state):
    result = ""
    state_size = len(state)
    for y in range(state_size):
        for z in range(state_size):
            for x in range(state_size):
                result += str_token(state[x][y][z]) + " "
            result += "\t"
        result += "\n"
    return result

def isInVector(vector, x, y, z):
    n = 0
    while (n < len(vector)):
        if (vector[n][0] == x and vector[n][1] == y and vector[n][2] == z):
            return True
        n += 1
    return False

def getInVector(vector, x, y, z):
    n = 0
    while (n < len(vector)):
        if (vector[n][0] == x and vector[n][1] == y and vector[n][2] == z):
            return n
        n += 1
    return -1

def getBestPlay(vector):
    max_value = -100000
    index_of_max = -1
    for i in range(len(vector)):
        if (vector[i][3] > max_value):
            max_value = vector[i][3]
            index_of_max = i
    return [vector[index_of_max][0], vector[index_of_max][1], vector[index_of_max][2]]
def AISim(main_state, p1x, p1y, p1z, maxIt):
    n = 0 # Number of simulations
    p2Moves = [] # A vector to hold player 2 moves.
    while (n < maxIt): # While 1
        x = p1x
        y = p1y
        z = p1z
        player_turn = False # False because simulation will always start with player 2 turn
        moves = 0
        first_move = [0, 0, 0]
        new_state = deepcopy(main_state)
        while winner(new_state) == draw_token: # While 2
            temp = new_state[x][y][z]
            while temp != draw_token: # While 3
                x = random.randint(0, 3)
                y = random.randint(0, 3)
                z = random.randint(0, 3)
                temp = new_state[x][y][z]
                # THIS IS THE PROBLEEEEEM!!!!
                print (temp)
            # END while 3
            if (moves == 0):
                first_move = [x, y, z]
                if (not isInVector(p2Moves, x, y, z)):
                    p2Moves.append([x, y, z, 0])
                # END if
            # END if
            new_state[x][y][z] = (1 if player_turn else -1)
            player_turn = not player_turn
            moves += 1
        # END while 2
        if (winner(new_state) == 1):
            temPos = getInVector(p2Moves, first_move[0], first_move[1], first_move[2])
            p2Moves[temPos][3] -= 1
        else:
            temPos = getInVector(p2Moves, first_move[0], first_move[1], first_move[2])
            p2Moves[temPos][3] += 1
        # END if-else
        n += 1
    # END while 1
    return getBestPlay(p2Moves)
# --------------------------------------------------------------------------------------------------------
# ----------------------------------------- MAIN PROGRAM -------------------------------------------------
# --------------------------------------------------------------------------------------------------------
AIMove = [0, 0, 0]
player_1_turn = True
while winner(board) == draw_token:
    # Print board state
    print ("")
    print ("Board:")
    print (draw_board(board))
    print ("")
    # Print
    print ("Player %s turn:" % (1 if player_1_turn else 2))
    if (player_1_turn):
        # Get input
        x = int(raw_input("x: "))
        y = int(raw_input("y: "))
        z = int(raw_input("z: "))
    else:
        # Player 2 turn
        start = time.time()
        AIMove = AISim(board, x, y, z, 500)
        end = time.time()
        print ("Thinking time: %0.4f seconds" % (end - start))
        x = AIMove[0]
        print ("x:",x)
        y = AIMove[1]
        print ("y:",y)
        z = AIMove[2]
        print ("z:",z)
    if board[x][y][z] == draw_token:
        board[x][y][z] = 1 if player_1_turn else -1
        player_1_turn = not player_1_turn
    else:
        print ("")
        print ("ERROR: occupied position, please retry in a new position")
        print ("")
print ("Player %s is the winner!" % (1 if winner(board) == 1 else 2))
The code works; I've tested it many times. The problem is in "While 3": for an unknown reason the code SOMETIMES gets stuck in what looks like an infinite loop regardless of the state of the board, as if there were no possible (x, y, z) coordinates that would end the loop, but I know there are! For the same unknown reason, if I place a print (x, y, z) inside the loop, the AI works perfectly 100% of the time. But wait, there's more! As you might notice, printing (x, y, z) inside that loop makes the final output horrible, so I tried print(x, y, z, end='\r') (I'm using from __future__ import print_function) and it didn't work: the same weird behavior again. I've tried so many things, but the only solution seems to be printing the values. I've been trying for hours and I need help, please!
You can test the program here.
My list of failed attempts:
Saving (x, y, z) in another array. It worked only if I printed the vector...
Writing the coordinates to an external file. The coordinates were saved correctly, but it didn't fix the problem.
Assigning the generated random values to other variables and then assigning those variables to x, y and z.
Printing only 1 or 2 values, like x and z.
Swapping the x, y and z values.
EDIT2: Added a link to c9 so you can test it.
EDIT3: If you run it on your PC you might want to change the number of simulations, because printing over 1000 simulations might feel like it is infinite, but it isn't. Try printing around 100 or 300 simulations; it might take a while, but you'll see it working. Then try around 1000 without printing (the time should be the same, since printing takes so much time) and you'll see it doesn't work.
EDIT4: Removed threading code.
UPDATE: I no longer need this for an assignment; I solved it another way without using this code. But I really want to know what is happening and why the print makes the program work or not.
P.S. Sorry again if it's somehow unclear what is happening, but I don't really know how to explain it better than this.
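One thing worth checking, offered as a guess rather than a diagnosis: nothing in the "While 2" loop handles a board that fills up before winner() reports a win. In that situation "While 3" has no empty cell left to find and can never exit. Below is a minimal sketch of a playout that samples only from the remaining empty cells and treats a full board as a draw. The helpers empty_cells and random_playout are hypothetical, and the has_winner callback stands in for a check like winner(new_state) != draw_token.

```python
import random
from copy import deepcopy


def empty_cells(state):
    """List every (x, y, z) whose cell is still 0 (empty)."""
    size = len(state)
    return [(x, y, z)
            for x in range(size) for y in range(size) for z in range(size)
            if state[x][y][z] == 0]


def random_playout(state, has_winner, first_token=-1, rng=random):
    """Play random moves until has_winner(state) is true or the board is full.

    Returns (final_state, first_move); first_move is None if no move was made.
    """
    state = deepcopy(state)   # never modify the caller's board
    token = first_token
    first_move = None
    while not has_winner(state):
        free = empty_cells(state)
        if not free:          # full board without a winner: stop, call it a draw
            break
        x, y, z = rng.choice(free)
        if first_move is None:
            first_move = (x, y, z)
        state[x][y][z] = token
        token = -token        # alternate between -1 and 1
    return state, first_move
```

Choosing from the list of free cells also removes the retry loop entirely, so each simulated move costs one scan of the board instead of an unbounded number of random draws.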