I'm trying to make an RNN predict moves for a card game. In each time step, only certain actions are legal (i.e. some moves cannot be made in certain situations).
So at any given point in the game, one of 12 moves is the correct one. Each move is labeled as an int in the range 0 through 11. In most situations, only a subset of these are actually legal. Say I train the model and it predicts, e.g., move 4 in a situation where only moves 2, 3 and 9 are legal. After that move has been made, a different subset is allowed for the next time step. How do I make it predict from only a subset of the moves?
I haven't gotten as far as coding the model yet, but I intend to use a Keras/TensorFlow LSTM in Python for this.
I would be happy if you could point me in a good direction here!
You can pass a mask as an argument when calling your model.
But the simplest way is to get the prediction as probabilities over all moves and then just ignore those that are not permitted.
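For example, here is a minimal sketch of that second approach, assuming the model's softmax output covers all 12 moves and that legal_moves (a hypothetical name) lists the indices that are currently allowed:

    import numpy as np

    def predict_legal_move(model, state, legal_moves):
        """Return the most probable move among the legal ones only."""
        probs = model.predict(state)[0]           # state shaped (1, timesteps, features); softmax over all 12 moves
        masked = np.full_like(probs, -np.inf)     # start with every move ruled out
        masked[legal_moves] = probs[legal_moves]  # restore only the legal candidates
        return int(np.argmax(masked))             # best legal move

If you sample moves instead of taking the argmax, zero out the illegal entries and divide by the remaining sum so the legal probabilities sum to 1 again.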
So recently I've started experimenting with Python in terms of game dev, and I now want to create a maze game. I've implemented 4 different algorithms (Prim's, Kruskal's, Recursive Backtracker, and Hunt and Kill, to be exact) to generate a maze. Each of them returns the maze as a 2D Python array. What I found interesting is the number of crucial decisions a player should/may take in order to solve differently generated mazes.
Meaning that when a player faces a crossroad or a junction, they have to make a decision about their next step, which might be crucial in terms of completion time and path length. This is fairly simple to analyse by hand for small mazes, but is there an algorithm to count the number of such crossroads and junctions in bigger mazes?
For any given point, say maze[x][y], you should be able to see if you're able to move to maze[x-1][y], maze[x+1][y], maze[x][y-1], and maze[x][y+1]. If you can move to three, you have a junction. If you can get to all four, you have a crossroad.
Just be certain that you do a bounds check when x or y equals 0 or len - 1, so you don't get an exception from running off the edge, or a wraparound from index 0 to -1.
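A minimal sketch of such a count, assuming the maze is a 2D array where 0 marks an open cell (adjust the test to whatever your generators actually produce):

    def count_decision_points(maze):
        """Count junctions (3 open neighbours) and crossroads (4 open neighbours)."""
        junctions = crossroads = 0
        rows, cols = len(maze), len(maze[0])
        for x in range(rows):
            for y in range(cols):
                if maze[x][y] != 0:          # skip walls, only analyse open cells
                    continue
                open_neighbours = sum(
                    1 for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= x + dx < rows and 0 <= y + dy < cols
                    and maze[x + dx][y + dy] == 0
                )
                if open_neighbours == 3:
                    junctions += 1
                elif open_neighbours == 4:
                    crossroads += 1
        return junctions, crossroads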
I'm trying my hand at an algorithm that plays TicTacToe against itself and learns from the games it wins. When it wins, it checks all the moves it made again and increases the probability of playing them the next time the same situation comes up.
I've never done something like this before, so my idea is that I need every combination of possible moves.
In the first round, the PC has to choose from a list of 9 elements, each representing one of the tiles on the board. Then the other player can choose from 8. But there have to be 9 different lists player two can choose from: when player one chooses number 2, player two is allowed to choose from the list of elements that does not include number 2.
So at the first level I need 1 list of 9 elements. At the second level I need 9 lists of 8 elements each, and so on.
This becomes pretty big, so I need to create those combinations automatically.
My idea was to create lists which contain either more lists or the elements to choose from. Then I can navigate through those lists to tell the player which list (or which path through a big list of lists) to choose from. I'm not really sure if there is an easy way to do this, especially the creation of those lists; I couldn't find a way yet. Then I saw the tree data type, which seems powerful, but I'm not sure it is the right one for what I'm looking for. I hope you can give me advice.
Edit: to make it clear, I know there is the minimax algorithm etc. What I want to do is let the game play a lot against itself and let it find its own way of learning, just from knowing whether it won or not.
The approach you plan to follow might be considered an Ant Colony Optimization algorithm. As your description points out, the idea is to explore the available paths according to some heuristic and then backtrack the path followed, increasing or decreasing the probability of that same path being taken in subsequent iterations, effectively weighting the graph edges (the state tree of TicTacToe in this case). At the end of the process the winning paths will have a greater weight than the losing ones, which would allow your engine to play TicTacToe well by following the heaviest edges. Here are some links if you're interested: wiki, seminar slides.
IMO the nature of the algorithm requires some kind of tree/graph data structure to ease backtracking and neighbor discovery, and I would personally go for that instead of using lists of lists. To that effect you may try the NetworkX library, for example.
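For instance, here is a minimal sketch of the weighted-edge idea with NetworkX; board states are assumed hashable (e.g. tuples of tile values), and all function names are placeholders rather than an established API:

    import random
    import networkx as nx

    G = nx.DiGraph()

    def choose_move(state, successors):
        """Pick a successor state with probability proportional to its edge weight."""
        for nxt in successors:
            if not G.has_edge(state, nxt):
                G.add_edge(state, nxt, weight=1.0)   # unseen moves start out equally likely
        weights = [G[state][nxt]["weight"] for nxt in successors]
        return random.choices(successors, weights=weights)[0]

    def reinforce(path, won, factor=1.5):
        """After a finished game, strengthen or weaken every edge along the followed path."""
        for state, nxt in zip(path, path[1:]):
            G[state][nxt]["weight"] *= factor if won else 1 / factor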
Separately, I agree with @martin-wettstein's comments that taking advantage of board symmetries would reduce the number of board states to be considered and would improve performance, at the cost of slightly more complicated logic.
Indeed, I implemented the same approach as you some time ago, and it was really fun, so good luck with it.
I'm working on an AI that should be able to play chess. I want to use Keras neural networks to evaluate positions on the board. I would like to teach the NN by playing plenty of games of AI against AI. I already have alpha-beta pruning implemented.
My idea was to create a CSV file with positions from every single game the AI has played, storing the variables I choose. A very simple example:
"white_pawns","black_pawns","white_queens","black_queens","white_pawns_on_side","white_won"
3,7,1,2,0,False
3,5,3,0,1,True
I would like to train a model using these values and then use it to evaluate the current board position. So the main question is:
How do I make a neural network output the value of a position given these variables? E.g. 0 when it's a draw, or 1 when we are one pawn up. Keras preferred, but I'm open to any other Python library.
I would also be grateful if you could dispel a few other doubts of mine.
Are there any flaws in that approach? Wouldn't every position from a single game make the neural network overfit? Maybe I should pick only a few positions from each game?
I think you know this, but when a human evaluates the board, he is not only looking at the material but also at the positions of the pieces. Secondly, with this CSV you can't decide which move is better when the only thing you see is true or false; this is why an engine's evaluation is numerical. Or do you want it to output a number from -1 to 1 as the score? I'm looking to do the same thing, with 1 for a white win, -1 for a black win, and 0 for a draw (in the dataset file). If you want to do this with me, hit me up (is there a messaging service for Stack Overflow?).
Conclusion
The input should be a numerical representation of the board, in my opinion, and the target should not be a class label but a numerical value, i.e. regression rather than classification. It is actually simpler.
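As a minimal sketch of that numerical target in Keras, assuming the five CSV features above as input and a game outcome in [-1, 1] as target (the layer sizes are arbitrary placeholders, not a recommendation):

    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(5,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1, activation="tanh"),  # numerical output in [-1, 1]
    ])
    model.compile(optimizer="adam", loss="mse")    # regression loss, not cross-entropy
    # model.fit(X, y) with y = 1 (white win), -1 (black win), 0 (draw)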
I have a Python engine that I am working on, and this is an opportunity to meet new people who are interested in the same things I am.
Just saying, this is my first answer, so if something is unclear please leave a comment and I will try to help!
Also, like krish said, this can be implemented with reinforcement learning. But first you need to build a DQN (Deep Q-Network; Q-learning is a really popular reinforcement learning algorithm), and for that you need another network, because otherwise this will take a lot of time to train.
Consider a standard 7*6 board. Suppose I want to apply the Q-learning algorithm. To apply it, I need the set of all possible states and actions. There can be up to 3^(7*6) = 3^42 ≈ 1.1 * 10^20 states. Since it's not feasible to store these, I am only considering legal states.
How can I generate Q(s,a) for all the legal states and actions?
This is not my homework. I am trying to learn about reinforcement learning algorithms. I have been searching about this for two days. The closest I have come is to consider only the legal states.
There are three processes you need to set up: one that generates the next move, one that applies that move to the board, and lastly one that evaluates a 4x4 block through a series of checks to see if there is a winner. NumPy and SciPy will help with this.
Set up a NumPy array of zeros. Change a cell to 1 for player 1's moves and -1 for moves made by player 2. The 4x4 check sums over the rows, then the columns, then the two diagonals; if abs(sum) == 4 for any of those lines, yield the board early, before the end of the game.
This may create duplicates depending on the implementation so put all of these in a set at the end.
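A minimal sketch of the 4x4 window check, under the 1/-1/0 board encoding described above:

    import numpy as np

    def window_winner(w):
        """Return 1 or -1 if a player owns a full line of this 4x4 window, else 0."""
        lines = np.concatenate([
            w.sum(axis=0),                           # the four columns
            w.sum(axis=1),                           # the four rows
            [np.trace(w), np.trace(np.fliplr(w))],   # both diagonals
        ])
        if (lines == 4).any():
            return 1
        if (lines == -4).any():
            return -1
        return 0

Slide the window over every 4x4 sub-array of the 7*6 board (4 horizontal by 3 vertical placements) to test a whole position.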
Edit due to comments and the question modification:
You need to use generators and do a depth-first search. There are at most 7 possible branches from any state, and at most 42 moves in a game. You are only looking for winning or losing states to store (don't save stalemates, as they take the most memory). The states will be 2 sets of locations, one for each player.
When you step forward and find a winning/losing state, store the state with its value, then step backward to the previous move and update the value there, storing this as well.
There are 69 possible lines that win or lose Connect Four on a 7*6 board (24 horizontal, 21 vertical, 24 diagonal), with I don't know how many states associated with each, so I'm not sure how many steps away from winning you want to store.
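A compact sketch of the generator-based depth-first search; legal_moves, apply and winner are hypothetical helpers you would implement on top of the NumPy board:

    def terminal_states(state, path=()):
        """Yield (winning_state, path) pairs depth first, skipping stalemates."""
        if winner(state) != 0:                   # a win/loss ends this branch
            yield state, path
            return
        for move in legal_moves(state):          # at most 7 branches per state
            yield from terminal_states(apply(state, move), path + (move,))

A full board with no winner yields nothing, so stalemates are never stored.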
I have a problem with a game I am making. I think I know the solution (or which solution to apply), but I'm not sure how all the 'pieces' fit together.
How the game works:
(from How to approach number guessing game(with a twist) algorithm? )
Users will be given items with a value (values change every day, and the program is aware of the change in price). For example:
Apple = 1
Pears = 2
Oranges = 3
They will then get a chance to choose any combo of them they like (i.e. 100 apples, 20 pears, and 1 orange). The only output the computer gets is the total value (in this example, it's currently $143). The computer will try to guess what they have, which obviously it won't be able to get right on the first turn.
           Value   quantity(day1)   value(day1)
Apple        1          100             100
Pears        2           20              40
Orange       3            1               3
Total                   121             143
On the next turn the user can modify their numbers, but by no more than 5% of the total quantity (or some other percentage we may choose; I'll use 5% for this example). The prices of fruit can change at random, so the total value may change based on that as well (for simplicity I am not changing fruit prices in this example). Using the above example, on day 2 of the game the user returns a value of $152, and $164 on day 3. Here's an example.
         quantity(day2)   %change(day2)   value(day2)   quantity(day3)   %change(day3)   value(day3)
Apple         104                             104            106                             106
Pears          21                              42             23                              46
Orange          2                               6              4                              12
Total         127            4.96%            152            133            4.72%            164
(I hope the tables show up right; I had to manually space them, so hopefully they're not correct only on my screen. If they don't work, let me know and I'll try to upload a screenshot.)
I am trying to see if I can figure out what the quantities are over time (assuming the user has the patience to keep entering numbers). I know that right now my only restriction is that the change cannot be more than 5% of the total quantity, so I cannot be within 5% accuracy right now, and the user would be entering numbers forever.
What I have done so far:
I have taken all the values of the fruit and the total value of the fruit basket that's given to me and created a large table of all the possibilities. Once I have a list of all the possibilities, I use graph theory and create a node for each possible solution. I then create edges (links) between nodes from consecutive days (for example day 1 to day 2) if the change is within 5%. I then delete all nodes that do not have edges (links to other nodes), and as the user keeps playing I also delete entire paths when a path becomes a dead end.
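As an illustration, here is a minimal sketch of that first enumeration step, assuming three fruits and an upper bound on the quantities (both of which are assumptions; the real bounds depend on the game):

    from itertools import product

    def candidate_baskets(prices, total, max_qty=150):
        """All quantity combinations whose value matches the observed total."""
        return [
            combo for combo in product(range(max_qty + 1), repeat=len(prices))
            if sum(q * p for q, p in zip(combo, prices)) == total
        ]

    day1 = candidate_baskets([1, 2, 3], 143)   # contains (100, 20, 1) among others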
This is great because it narrows the choices down, but now I'm stuck because I want to narrow these choices down even more. I've been told this is a hidden Markov model problem, but a trickier version, because the states are changing (as you can see above, new nodes are added every turn and old/improbable ones are removed).
(If it helps, I got an amazing answer, with sample code, on a Python implementation of the Baum-Welch algorithm (it's used to train the model) here: Example of implementation of Baum-Welch.)
What I think needs to be done (this could be wrong):
Now that I have narrowed the results down, I am basically trying to let the program predict the correct basket from the narrowed result set. I thought this was not possible, but several people are suggesting it can be solved with a hidden Markov model. I think I can run several iterations over the data (using the Baum-Welch algorithm) until the probabilities stabilize (and they should get better with more turns from the user).
This is like the way hidden Markov models are able to check spelling or handwriting and improve as they make errors (an error in this case being to pick a basket that is deleted on the next turn as improbable).
Two questions:
How do I figure out the transition and emission matrices if all states are at first equally likely? Something must be used to determine the probability of states changing. I was thinking of using the graph I made, weighting the nodes with the highest number of edges as part of the calculation of the transition/emission matrices. Does that make sense, or is there a better approach?
How can I keep track of all the changes in states? As new baskets are added and old ones removed, there is the issue of tracking the baskets. I thought a Hierarchical Dirichlet Process hidden Markov model (HDP-HMM) would be what I need, but I'm not exactly sure how to apply it.
(Sorry if I sound a bit frustrated; it's a bit hard knowing a problem is solvable but not being able to conceptually grasp what needs to be done.)
As always, thanks for your time and any advice/suggestions would be greatly appreciated.
Like you've said, this problem can be described with an HMM. You are essentially interested in maintaining a distribution over latent, or hidden, states, which here are the true quantities at each time point. However, it seems you are confusing the problem of learning the parameters of an HMM with simply doing inference in a known HMM. You have the latter problem, but propose employing a solution (Baum-Welch) designed for the former. That is, you have the model already; you just have to use it.
Interestingly, if you go through coding a discrete HMM for your problem, you get an algorithm very similar to what you describe in your graph-theory solution. The big difference is that your solution tracks what is possible, whereas a correct inference algorithm, like the Viterbi algorithm, tracks what is likely. The difference is clear when there is overlap in the 5% range on the domain, that is, when multiple possible states could transition to the same state. Your algorithm might add 2 edges to a point, but I doubt that has any effect when you compute the next day (it should essentially count twice).
Anyway, you could use the Viterbi algorithm if you are only interested in the best guess for the most recent day. Instead, I'll give you a brief idea of how you can modify your graph-theory solution. Rather than maintaining edges between states, maintain a fraction representing the probability that each state is the correct one (this distribution is sometimes called the belief state). At each new day, propagate your belief state forward by incrementing each bucket by the probability of its parent (instead of adding an edge, you're adding a floating-point number). You also have to make sure your belief state is properly normalized (sums to 1), so just divide by its sum after each update. After that, you can weight each state by your observation, but since you don't have a noisy observation you can just set all the impossible states to zero probability and re-normalize. You now have a distribution over the underlying quantities conditioned on your observations.
I'm skipping over a lot of statistical details here, just to give you the idea.
Edit (re: questions):
The answer to your question really depends on what you want. If you want only the distribution for the most recent day, then you can get away with a one-pass algorithm like the one I've described. If, however, you want the correct distribution over the quantities for every single day, you're going to have to do a backward pass as well; hence the aptly named forward-backward algorithm. I get the sense that since you are looking to go back a step and delete edges, you probably want the distribution for all days (unlike I originally assumed). As you noticed, there is information that can be used so that the "future can inform the past", so to speak, and this is exactly why you need the backward pass: it's not really complicated, you just run the exact same algorithm starting at the end of the chain. For a good overview, check out Christopher Bishop's 6-part tutorial on videolectures.net.
Because you mentioned adding/deleting edges, let me clarify the algorithm I described previously; keep in mind this is for a single forward pass. Let there be a total of N possible permutations of quantities, so you will have a belief state that is a sparse vector N elements long (call it v_0). In the first step you receive an observation of the sum, and you populate the vector by setting all the possible values to probability 1.0, then re-normalize. In the next step you create a new sparse vector (v_1) of all 0s, iterate over all non-zero entries in v_0, and increment (by the probability in v_0) all entries in v_1 that are within 5%. Then zero out all the entries in v_1 that are not possible according to the new observation, re-normalize v_1, and throw away v_0. Repeat forever; v_1 will always be the correct distribution over the possibilities.
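A minimal sketch of that forward pass, representing the sparse belief state as a dict from basket tuples to probabilities; within_5_percent is an assumed implementation of the game's movement rule, and successors would be the baskets consistent with the newly observed total:

    def within_5_percent(a, b):
        """True if the total quantity changed by at most 5% between baskets a and b."""
        return abs(sum(b) - sum(a)) <= 0.05 * sum(a)

    def forward_step(belief, successors):
        """One day of propagation: spread v_0 forward, condition, re-normalize."""
        new_belief = {}
        for basket, prob in belief.items():        # only the non-zero entries of v_0
            for nxt in successors:                 # baskets matching the new total
                if within_5_percent(basket, nxt):  # reachable under the 5% rule
                    new_belief[nxt] = new_belief.get(nxt, 0.0) + prob
        total = sum(new_belief.values())
        return {b: p / total for b, p in new_belief.items()}  # sums to 1 again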
By the way, things can get much more complex than this if you have noisy observations, very large state spaces, or continuous states. For this reason some of the literature on statistical inference is quite hard to read; it is very general.