Consider a standard 7*6 board. Suppose I want to apply Q-Learning algorithm. For applying it, I need a set of all possible states and actions. There can be 3^(7*6) = 150094635296999121. Since its not feasible to store these, I am only considering legal states.
How can I generate Q(s,a) for all the legal states and actions?
This is not my homework. I am trying to learn about reinforcement algorithms. I have been searching about this since two days. The closest thing I have come to is consider only the legal states.
There are 3 process you need to set up. One that generates the next move, one that changes where that move leads, and lastly evaluating a block of 4x4 through a series of checks to see if this is a winner . Numpy and scipy will help with this.
Set up a Numpy array of zeroes. Change the number to 1 for player 1 moves and -1 for moves done by player 2. The 4x4 check is summing over the x axis and then the y axis and then the sum of the diagonals if the abs(sum(axis))==4 then yield board earlier than the end.
This may create duplicates depending on the implementation so put all of these in a set at the end.
**Edit due to comments and the question modification.
You need to use generators and do a depth first search. There is a max of 7 possible branches for any state with a possibility of 42 moves. You are only looking for winning or loosing states to store (don't save stalemates as they take the most memory). The states will be 2 sets of locations one for each player.
When you step forward and find a winning/losing state, store the state with the value, step backward to the previous move and update the value there storing this as well.
There are 144 possible ways of winning/losing to connect four with I don't know how many states associated with each. so I'm not sure how many steps away from winning you want to store.
Related
So recently I've started experimenting with python in terms of game dev and I now want to create a maze game. I've implemented 4 different algorithms (Prim's, Kruskal's, Recursive Backtracker, Hunt and Kill to be exact) to generate a maze. Each of them return a maze as a 2D python array. What I found ineresting is the amount of crucial desicions a player should/may take in order to solve a maze in a different generated mazes.
Meaning that whether a player faces a crossroad or a junction
a player have to take a decision on their next step which might be crucial in terms of completion time and path lenght. This is severely simple to analyse by 'hand' when it comes to small mazes, but is there any possible code algorithm to count the amount of such crossroads and junctions in bigger mazes?
For any given point, say maze[x][y], you should be able to see if you're able to move to maze[x-1][y], maze[x+1][y], maze[x][y-1], and maze[x][y+1]. If you can move to three, you have a junction. If you can get to all four, you have a crossroad.
Just be certain that you're doing a safety check on x and y equaling 0 or len - 1 so you don't get an exception on an overflow, or a loopback from 0 to -1.
I'm trying myself on an algorithm that plays ticTacToe against itself and learns out of the winning conditions. When it wins, it checks again all the moves it made and increases the probability for the next time the same situation comes.
I never did something like that before. So my idea is that I need every combination of possible Moves.
In the first round the PC has to chose from a list of 9 elements, each representing one of the tiles on the game. Then the other player can chose from 8. But: there has to be 9 different lists player two can chose from. When player one chose number 2 , player two is allowed to chose from the list of elements which does not include number 2.
So I need in the first row 1 list of 9 Elements. In the Second I need 9 lists of 8 elements each and so on.
This becomes pretty big, so I need to create those combinations automatically.
My idea was to create lists which contains either more lists or the elements to chose from. Then I can navigate through those lists to tell the player out of which list (or path in a big list of lists) to chose from. I‘m not really sure if there is an easy way to do this, especially the creating of those lists. I couldn’t find a way yet. Then I saw the tree datatype, which seems to be powerful, but I’m not sure if this is the right one that I search for. Hope you can give me advice
Edit: to make it clear, I know there is this minmax algorithm etc. What I wanted to do is let the game play a lot against itself and Let it find their own way in learning. Just by getting the result if he won or not.
The approach you plan to follow might be considered as an Ant Colonization Algorithm. As your description points out the idea is to explore available paths according to some heuristic and backtracking the path followed to increase/decrease the probability of that same path to be taken again in subsequent iterations, effectively weighting the graph edges (the state tree of TicTacToe in this case). At the end of the process the winning paths will have a greater weight than the loosing ones which would allow your engine to play TicTacToe well by following the heaviest edges. Here are some links if you're interested: wiki, seminar slides.
IMO the nature of the algorithm requires some kind of tree/graph data structure to ease backtracking and neighbor discovery and I would personally go for that instead of using lists of lists. To that effect you may try the NetworkX library, for example.
Separately I agree with #martin-wettstein comments that taking advantage of board symmetries would reduce the number of board states to be considered and would improve performance at the cost of a slightly more complicated logic.
Indeed I implemented the same approach as you some time ago and it was really fun, so good luck at it.
I'm writing an algorithm to solve skyscrapers puzzles:
Skyscraper puzzles combine the row and column constraints of Sudoku with external clue values that re-imagine each row or column of numbers as a road full of skyscrapers of varying height. Higher numbers represent higher buildings.
To solve a Skyscraper puzzle you must place 1 to 5, or 1 to whatever the size of the puzzle is, once each into every row and column, while also solving each of the given skyscraper clues.
To understand Skyscraper puzzles, you must imagine that each value you place into the grid represents a skyscraper of that number of floors. So a 1 is a 1-floor skyscraper, while a 4 is a 4-floor skyscraper. Now imagine that you go and stand outside the grid where one of the clue numbers is and look back into the grid. That clue number tells you how many skyscrapers you can see from that point, looking only along the row or column where the clue is, and from the point of view of the clue. Taller buildings always obscure lower buildings, so in other words higher numbers always conceal lower numbers.
All the basic techniques are implemented and working, but I've realized that with bigger puzzles (5x5>) I need some sort of recursive algorithm. I found a decent working python script, but I'm not really following what it actually does beyond solving basic clues.
Does anyone know the proper way of solving these puzzles or can anyone reveal the essentials in the code above?
Misha showed you the brute-force way. A much faster recursive algorithm can be made based on constraint propagation. Peter Norvig (head of Google Research) wrote an excellent article about how to use this technique to solve Sudoku with python. Read it and try to understand every detail, you will learn a lot, guaranteed. Since the skyscraper puzzle has a lot in common with Sudoku (without the 3X3 blocks, but with some extra constraints given by the numbers on the edge), you could probably steal a lot of his code.
You start, as with Sudoku, where each field has a list of all the possible numbers from 1..N. After that, you look at one horizontal/vertical line or edge clue at a time and remove illegal options. E.g. in a 5x5 case, an edge of 3 excludes 5 from the first two and 4 from the first squares. The constraint propagation should do the rest. Keep looping over edge constraints until they are fulfilled or you get stuck after cycling through all constraints. As shown by Norvig, you then start guessing and remove numbers in case of a contradiction.
In case of Sudoku, a given clue has to be processed only once, since once you assign a single number to one square (you remove all the other possibilities), all the information of the clue has been used. With the skyscrapers, however, you might have to apply a given clue several times until it is totally satisfied (e.g. when the complete line is solved).
If you're desperate, you can brute-force the puzzle. I usually do this as a first step to become familiar with the puzzle. Basically, you need to populate NxN squares with integers from 1 to N inclusive, following the following constraints:
Each integer appears in every row exactly once
Each integer appears in every column exactly once
The row "clues" are satisfied
The column "clues" are satisfied
The brute force solution would work like this. First, represent the board as a 2D array of integers. Then write a function is_valid_solution that returns True if the board satisfies the above constraints, and False otherwise. This part is relatively easy to do in O(N^2).
Finally, iterate over the possible board permutations, and call is_valid_solution for each permutation. When that returns True, you've found a solution. There are a total of N^(NxN) possible arrangements, so your complete solution will be O(N^(NxN)). You can do better by using the above constraints for reducing the search space.
The above method will take a relatively long while to run (O(N^(NxN)) is pretty horrible for an algorithm), but you'll (eventually) get a solution. When you've got that working, try to think of a better way to to it; if you get stuck, then come back here.
EDIT
A slightly better alternative to the above would be to perform a search (e.g. depth-first) starting with an empty board. At each iteration of the search, you'd populate one cell of the table with a number (while not violating any of the constraints). Once you happen to fill up the board, you're done.
Here's pseudo-code for a recursive brute-force depth-first search. The search will be NxN nodes deep, and the branching factor at each node is at most N. This means you will need to examine at most 1 + N + N^2 + ... + N^(N-1) or (N^N-1)/(N-1) nodes. For each of these nodes, you need to call is_valid_board which is O(N^2) in the worst case (when the board is full).
def fill_square(board, row, column):
if row == column == N-1: # the board is full, we're done
print board
return
next_row, next_col = calculate_next_position(row, col)
for value in range(1, N+1):
next_board = copy.deepcopy(board)
next_board[row][col] = value
if is_valid_board(next_board):
fill_square(next_board, next_row, next_col)
board = initialize_board()
fill_square(board, 0, 0)
The function calculate_next_position selects the next square to fill. The easiest way to do this is just a scanline traversal of the board. A smarter way would be to fill rows and columns alternately.
I have a problem with a game I am making. I think I know the solution(or what solution to apply) but not sure how all the ‘pieces’ fit together.
How the game works:
(from How to approach number guessing game(with a twist) algorithm? )
users will be given items with a value(values change every day and the program is aware of the change in price). For example
Apple = 1
Pears = 2
Oranges = 3
They will then get a chance to choose any combo of them they like (i.e. 100 apples, 20 pears, and 1 oranges). The only output the computer gets is the total value(in this example, its currently $143). The computer will try to guess what they have. Which obviously it won’t be able to get correctly the first turn.
Value quantity(day1) value(day1)
Apple 1 100 100
Pears 2 20 40
Orange 3 1 3
Total 121 143
The next turn the user can modify their numbers but no more than 5% of the total quantity (or some other percent we may chose. I’ll use 5% for example.). The prices of fruit can change(at random) so the total value may change based on that also(for simplicity I am not changing fruit prices in this example). Using the above example, on day 2 of the game, the user returns a value of $152 and $164 on day 3. Here's an example.
quantity(day2) %change(day2) value(day2) quantity(day3) %change(day3) value(day3)
104 104 106 106
21 42 23 46
2 6 4 12
127 4.96% 152 133 4.72% 164
*(I hope the tables show up right, I had to manually space them so hopefully its not just doing it on my screen, if it doesn't work let me know and I'll try to upload a screenshot).
I am trying to see if I can figure out what the quantities are over time(assuming the user will have the patience to keep entering numbers). I know right now my only restriction is the total value cannot be more than 5% so I cannot be within 5% accuracy right now so the user will be entering it forever.
What I have done so far:
I have taken all the values of the fruit and total value of fruit basket that’s given to me and created a large table of all the possibilities. Once I have a list of all the possibilities I used graph theory and created nodes for each possible solution. I then create edges(links) between nodes from each day(for example day1 to day2) if its within 5% change. I then delete all nodes that do not have edges(links to other nodes), and as the user keeps playing I also delete entire paths when the path becomes a dead end.
This is great because it narrows the choices down, but now I’m stuck because I want to narrow these choices even more. I’ve been told this is a hidden markov problem but a trickier version because the states are changing(as you can see above new nodes are being added every turn and old/non-probable ones are being removed).
** if it helps, I got a amazing answer(with sample code) on a python implementation of the baum-welch model(its used to train the data) here: Example of implementation of Baum-Welch **
What I think needs to be done(this could be wrong):
Now that I narrowed the results down, I am basically trying to allow the program to try to predict the correct based the narrowed result base. I thought this was not possible but several people are suggesting this can be solved with a hidden markov model. I think I can run several iterations over the data(using a Baum-Welch model) until the probabilities stabilize(and should get better with more turns from the user).
The way hidden markov models are able to check spelling or handwriting and improve as they make errors(errors in this case is to pick a basket that is deleted upon the next turn as being improbable).
Two questions:
How do I figure out the transition and emission matrix if all states are at first equal? For example, as all states are equally likely something must be used to dedicate the probability of states changing. I was thinking of using the graph I made to weight the nodes with the highest number of edges as part of the calculation of transition/emission states? Does that make sense or is there a better approach?
How can I keep track of all the changes in states? As new baskets are added and old ones are removed, there becomes an issue of tracking the baskets. I though an Hierarchical Dirichlet Process hidden markov model(hdp-hmm) would be what I needed but not exactly sure how to apply it.
(sorry if I sound a bit frustrated..its a bit hard knowing a problem is solvable but not able to conceptually grasp what needs to be done).
As always, thanks for your time and any advice/suggestions would be greatly appreciated.
Like you've said, this problem can be described with a HMM. You are essentially interested in maintaining a distribution over latent, or hidden, states which would be the true quantities at each time point. However, it seems you are confusing the problem of learning the parameters for a HMM opposed to simply doing inference in a known HMM. You have the latter problem but propose employing a solution (Baum-Welch) designed to do the former. That is, you have the model already, you just have to use it.
Interestingly, if you go through coding a discrete HMM for your problem you get an algorithm very similar to what you describe in your graph-theory solution. The big difference is that your solution is tracking what is possible whereas a correct inference algorithm, like the Virterbi algorithm, will track what is likely. The difference is clear when there is overlap in the 5% range on a domain, that is, when multiple possible states could potentially transition to the same state. Your algorithm might add 2 edges to a point, but I doubt that when you compute the next day that has an effect (it should count twice, essentially).
Anyway, you could use the Viterbi algortihm, if you are only interested in the best guess at the most recent day I'll just give you a brief idea how you can just modify your graph-theory solution. Instead of maintaining edges between states maintain a fraction representing the probability that state is the correct one (this distribution is sometimes called the belief state). At each new day, propagate forward your belief state by incrementing each bucket by the probability of it's parent (instead of adding an edge your adding a floating point number). You also have to make sure your belief state is properly normalized (sums to 1) so just divide by its sum after each update. After that, you can weight each state by your observation, but since you don't have a noisy observation you can just go and set all the impossible states to being zero probability and then re-normalize. You now have a distribution over underlying quantities conditioned on your observations.
I'm skipping over a lot of statistical details here, just to give you the idea.
Edit (re: questions):
The answer to your question really depends on what you want, if you want only the distribution for the most recent day then you can get away with a one-pass algorithm like I've described. If, however, you want to have the correct distribution over the quantities at every single day you're going to have to do a backward pass as well. Hence, the aptly named forward-backward algorithm. I get the sense that since you are looking to go back a step and delete edges then you probably want the distribution for all days (unlike I originally assumed). Of course, you noticed there is information that can be used so that the "future can inform the past" so to speak, and this is exactly the reason why you need to do the backward pass as well, it's not really complicated you just have to run the exact same algorithm starting at the end of the chain. For a good overview check out Christopher Bishop's 6-piece tutorial on videolectures.net.
Because you mentioned adding/deleting edges let me just clarify the algorithm I described previously, keep in mind this is for a single forward pass. Let there be a total of N possible permutations of quantities, so you will have a belief state that is a sparse vector N elements long (called v_0). The first step you receive a observation of the sum, and you populate the vector by setting all the possible values to have probability 1.0, then re-normalize. The next step you create a new sparse vector (v_1) of all 0s, iterate over all non-zero entries in v_0 and increment (by the probability in v_0) all entries in v_1 that are within 5%. Then, zero out all the entries in v_1 that are not possible according to the new observation, then re-normalize v_1 and throw away v_0. repeat forever, v_1 will always be the correct distribution of possibilities.
By the way, things can get way more complex than this, if you have noisy observations or very large states or continuous states. For this reason it's pretty hard to read some of the literature on statistical inference; it's quite general.
For a mental exercise I decided to try and solve the bubble breaker game found on many cell phones as well as an example here:Bubble Break Game
The random (N,M,C) board consists N rows x M columns with C colors
The goal is to get the highest score by picking the sequence of bubble groups that ultimately leads to the highest score
A bubble group is 2 or more bubbles of the same color that are adjacent to each other in either x or y direction. Diagonals do not count
When a group is picked, the bubbles disappear, any holes are filled with bubbles from above first, ie shift down, then any holes are filled by shifting right
A bubble group score = n * (n - 1) where n is the number of bubbles in the bubble group
The first algorithm is a simple exhaustive recursive algorithm which explores going through the board row by row and column by column picking bubble groups. Once the bubble group is picked, we create a new board and try to solve that board, recursively descending down
Some of the ideas I am using include normalized memoization. Once a board is solved we store the board and the best score in a memoization table.
I create a prototype in python which shows a (2,15,5) board takes 8859 boards to solve in about 3 seconds. A (3,15,5) board takes 12,384,726 boards in 50 minutes on a server. The solver rate is ~3k-4k boards/sec and gradually decreases as the memoization search takes longer. Memoization table grows to 5,692,482 boards, and hits 6,713,566 times.
What other approaches could yield high scores besides the exhaustive search?
I don't seen any obvious way to divide and conquer. But trending towards larger and larger bubbles groups seems to be one approach
Thanks to David Locke for posting the paper link which talks above a window solver which uses a constant-depth lookahead heuristic.
According to this paper, determining if you can empty the board (which is related to the problem you want to solve) is NP-Complete. That doesn't mean that you won't be able to find a good algorithm, it just means that you likely won't find an efficient one.
I'm thinking you could try a branch and bound search with the following idea:
Given a state of the game S, you branch on S by breaking it up in m sets Si where each Si is the state after taking a legal move of all m legal moves given the state S
You need two functions U(S) and L(S) that compute a lower and upper bound respectively of a given state S.
For the U(S) function I'm thinking calculate the score that you would get if you were able to freely shuffle K bubbles in the board (each move) and arrange the blocks in such a way that would result in the highest score, where K is a value you choose yourself. When your calculating U(S) for a given S it should go quicker if you choose higher K (the conditions are relaxed) so choosing the value of K will be a trade of for quickness of finding U(S) and quality (how tight an upper bound U(S) is.)
For the L(S) function calculate the score that you would get if you simply randomly kept click until you got to a state that could not be solved any further. You can do this several times taking the highest lower bound that you get.
Once you have these two functions you can apply standard Bound and Branch search. Note that the speed of your search is going to greatly depend on how tight your Upper Bound is and how tight your Lower Bound is.
To get a faster solution than exhaustive search, I think what you want is probably dynamic programming. In dynamic programming, you find some sort of "step" that takes you possibly closer to your solution, and keep track of the results of each step in a big matrix. Then, once you have filled in the matrix, you can find the best result, and then work backward to get a path through the matrix that leads to the best result. The matrix is effectively a form of memoization.
Dynamic programming is discussed in The Algorithm Design Manual but there is also plenty of discussion of it on the web. Here's a good intro: http://20bits.com/articles/introduction-to-dynamic-programming/
I'm not sure exactly what the "step" is for this problem. Perhaps you could make a scoring metric for a board that simply sums the points for each of the bubble groups, and then record this score as you try popping balloons? Good steps would tend to cause bubble groups to coalesce, improving the score, and bad steps would break up bubble groups, making the score worse.
You can translate this problem into problem of searching shortest path on graph. http://en.wikipedia.org/wiki/Shortest_path_problem
I would try whit A* and heuristics would include number of islands.
In my chess program I use some ideas which could probably adapted to this problem.
Move Ordering. First find all
possible moves, store them in a list,
and sort them according to some
heuristic. The "better" ones first,
the "bad" ones last. For example,
this could be a function of the size
of the group (prefer medium sized
groups), or the number of adjacent
colors, groups, etc.
Iterative Deepening. Instead of
running a pure depth-first search,
cut of the search after a certain
deep and use some heuristic to assess
the result. Now research the tree
with "better" moves first.
Pruning. Don't search moves which
seems "obviously" bad, according to
some, again, heuristic. This involves
the risk that you won't find the
optimal solution anymore, but
depending on your heuristics you will
very likely find it much earlier.
Hash Tables. No need to store every
board you come accross, just remember
a certain number and overwrite older
ones.
I'm almost finished writing my version of the "solver" in Java. It does both exhaustive search, which takes fricking ages for larger board sizes, and a directed search based on a "pool" of possible paths, which is pruned after every generation, and a fitness function used to prune the pool. I'm just trying to tune the fitness function now...
Update - this is now available at http://bubblesolver.sourceforge.net/
This isn't my area of expertise, but I would like to recommend a book to you. Get a copy of The Algorithm Design Manual by Steven Skiena. This has a whole list of different algorithms, and once you read through it you can use it as a reference. If nothing else it will help you consider your options.