I am a neural-network beginner. I'd like to learn the basics of neural networks by teaching computers to play checkers. Actually, the games I want to learn are Domineering and Hex.
These games are pretty easy to store and the rules are much simpler than chess, but there aren't too many people who play them. If I can get this idea off the ground, it would be great for experimenting with Combinatorial Game Theory.
PyBrain seems to be the clear winner for Python neural networks, but can anyone walk me through how to set up a neural net for my game-playing task? A Google search turned up Blondie24 from 2001, but it uses genetic algorithms, and I don't want to complicate things.
Once you replace "neural networks" with machine learning (or even artificial intelligence, rather, imho), as the comments rightly suggest, I think you're better off starting with alpha-beta pruning, the minimax algorithm, and branch-and-bound ideas.
Basically:
At each step, you build the tree of all possible futures and evaluate the leaf positions with an evaluation function (e.g. board domination, connectivity, material, etc.).
Propagate the results up the tree, choosing the best play you can make and the worst your opponent can make (i.e. the best for them), until you know which move to play in the position you're in.
Rinse, repeat. Branch and bound saves you a lot of computation if you have a few good heuristics, and the strength of your program will basically come down to how deep it can search the game tree.
This will most probably be the basic framework in which anyone would introduce new ideas, so if you're not familiar with it, go for it :-)
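To make that concrete, here is a minimal sketch of depth-limited minimax with alpha-beta pruning. The helpers legal_moves, apply_move, and evaluate are placeholders you would write for your own game (Domineering or Hex):

def alphabeta(position, depth, alpha, beta, maximizing):
    moves = legal_moves(position)
    if depth == 0 or not moves:
        return evaluate(position)          # heuristic score of a leaf position
    if maximizing:
        best = float('-inf')
        for move in moves:
            best = max(best, alphabeta(apply_move(position, move),
                                       depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:              # opponent would avoid this branch: prune
                break
        return best
    else:
        best = float('inf')
        for move in moves:
            best = min(best, alphabeta(apply_move(position, move),
                                       depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best

# To pick a move, take the child position with the best backed-up score:
# best = max(legal_moves(pos), key=lambda m: alphabeta(apply_move(pos, m),
#            depth - 1, float('-inf'), float('inf'), False))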
Related
I'm working on an AI that should be able to play chess. I want to use Keras neural networks to evaluate positions on the board. I would like to teach the NN by playing plenty of games between AI and AI. I already have alpha-beta pruning implemented.
My idea was to create a CSV file with positions from every single game the AI has played. I would choose the variables I'd like to store there. A very simple example:
"white_pawns","black_pawns","white_queens","black_queens","white_pawns_on_side","white_won"
3,7,1,2,0,False
3,5,3,0,1,True
I would like to train a model using these values and then use it to evaluate the current board position. So the main question is:
How do I make a neural network output a value for a position, given these variables? E.g. 0 when it's a draw, or 1 when we are one pawn up. Keras preferred, but I'm open to any other Python library.
I would also be grateful if you could dispel my few other doubts.
Are there any flaws in that approach? Wouldn't using every position from a single game make the neural network overfit? Maybe I should pick only a few positions from each game?
I think you know this, but when a human evaluates the board, they are not only looking at the material, but also at the positions of the pieces. Secondly, with this CSV you can't decide which move is better if the only thing you see is True or False; this is why engine evaluations are numerical. Or do you want it to output a number from -1 to 1, which is then the score? I'm looking to do the same thing, but with 1 for a white win, -1 for a black win, and 0 for a draw (in the dataset file). If you want to do this with me, hit me up (is there a messaging service for Stack Overflow?).
Conclusion
The input should be a numerical representation of the board, in my opinion, and the target should not be a class label but a numerical score, i.e. regression rather than classification. It is actually simpler.
I have a Python engine that I am working on, and this is an opportunity to meet new people who are interested in the same things I am.
Also, this is my first answer, so if something is unclear please leave a comment and I will try to help!
Also, like krish said, this can be implemented with reinforcement learning. But first you need to make a DQN (deep Q-network; Q-learning is a really popular reinforcement learning algorithm), and for that you need another network, because otherwise this will take a lot of time to train.
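To make the main question concrete, here is a minimal Keras regression sketch for tabular features like the CSV above. Note the assumptions: the file name positions.csv is hypothetical, and the target column is assumed to be a numerical score in [-1, 1] (as suggested above) rather than the boolean white_won:

import pandas as pd
from tensorflow import keras

data = pd.read_csv("positions.csv")              # hypothetical file name
X = data.drop(columns=["score"]).to_numpy(dtype="float32")
y = data["score"].to_numpy(dtype="float32")      # -1 black win, 0 draw, 1 white win

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(X.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="tanh"),    # squashes the output into [-1, 1]
])
model.compile(optimizer="adam", loss="mse")      # mean squared error: regression, not classification
model.fit(X, y, epochs=20, batch_size=32)

# Evaluating a new position (same feature order as the CSV):
# score = model.predict(features.reshape(1, -1))[0, 0]

With a tanh output and mean-squared-error loss, the network learns a smooth score rather than a hard win/loss label, which is exactly what an alpha-beta search needs at its leaves.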
So I made Pong using PyGame and I want to use genetic algorithms to have an AI learn to play the game. I want it to know only the location of its paddle, the location of the ball, and the controls. I just don't know how to have the AI move the paddle on its own. I don't want to hard-code rules like "if the ball is above you, go up"; I want it to just try random things until it learns what to do.
So my question is, how do I get the AI to try controls and see what works?
Learning Atari Pong has become a standard task in reinforcement learning. For example, there is the OpenAI Baselines GitHub repo implementing RL algorithms that can be plugged into various tasks.
You definitely don't need those advanced algorithms just to learn Pong the way you describe, but you can learn from the API they use to separate the tasks ("environments" in reinforcement learning terms) from the AI part (the "controller" or "agent"). For this, I suggest reading the OpenAI Gym documentation on how you would add a new environment.
In short, you could either use a few float inputs (position and velocity of the ball, or two consecutive positions instead of velocity, plus the position of the paddle), or discrete inputs (integers, or raw pixels, which are much harder to learn from). Those inputs can be fed into a small neural network.
For the command output, the simplest thing to do is to predict a probability for moving up or down. This is a good idea because when you evaluate your controller, it will have some non-zero chance of scoring points, so your genetic algorithm can compare different controllers (with different weights) against each other. Just use the sigmoid function on your neural net output, and interpret it as probability.
If you initialize all your neural network weights within a good random range, you can probably get a Pong player that doesn't completely suck just by trying random weights for long enough (even without a GA).
PS: if you weren't planning to use a neural network: they are really simple to implement from scratch if you only have to implement the forward pass, e.g. if you skip back-propagation training and instead use a GA (or an evolution strategy, or just random weights) to learn the weights. The hardest part is finding a good range for the initial random weights.
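For illustration, here is a forward-pass-only network of the kind described, with one hidden layer and a sigmoid output interpreted as the probability of moving up. All sizes and names here are assumptions; the weights would come from your genetic algorithm rather than from back-propagation:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(weights, inputs):
    # One hidden layer; the single sigmoid output is P(move up).
    w1, b1, w2, b2 = weights
    hidden = np.tanh(inputs @ w1 + b1)
    return sigmoid(hidden @ w2 + b2)

def random_weights(n_inputs=5, n_hidden=8, scale=0.5):
    # The initial range matters a lot when there is no gradient training.
    rng = np.random.default_rng()
    return (rng.uniform(-scale, scale, (n_inputs, n_hidden)),
            rng.uniform(-scale, scale, n_hidden),
            rng.uniform(-scale, scale, n_hidden),
            rng.uniform(-scale, scale, 1))

# Each game tick: inputs = [paddle_y, ball_x, ball_y, ball_vx, ball_vy]
# p_up = forward(weights, np.asarray(inputs))
# move up if random.random() < p_up, else move down.

The GA then only has to mutate and recombine the four weight arrays and keep the controllers that score the most points.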
One design consideration which may be helpful is to expose a minimal set of display details through another interface, and conversely to allow commands to be sent to the player paddle. For example, you could send a simple structure describing the ball position and both paddles with each frame update, out through a socket to another process. Following the same pattern, you could create a structure that is sent as a reply to that message, describing how to move the player paddle. For example:
# Pong game program
import socket
import struct
# Set up server or client socket
# ... into the game loop
state = (p1_paddle_y, p2_paddle_y, ball_x, ball_y, victory_state)
# assuming pixel locations, and victory_state is -1: Loss, 0: InProgress, 1: Win
myGameStateMsg = struct.pack('>LLLLh', *state)   # pack all five fields (4 longs + 1 short)
sock.send(myGameStateMsg)                        # send game state to player
playerMsg = sock.recv(4)                         # get player command (4 bytes)
playerCmd = struct.unpack('>i', playerMsg)[0]    # same byte order as the sender
# playerCmd is an integer describing direction & speed of paddle motion
# ... process game state update, repeat loop
You could accomplish the same effect using threads and a shared, transacted structure, but you'll need to guard those structures properly (read-while-write problems, etc.).
Personally, I prefer the first approach (sockets & multi-processing) for stability reasons. Suppose there's some sort of bug that causes a crash; if you've already got process separation, it becomes easier to identify the source of the crash. At the thread-level, it's still possible but a little more challenging. One of the other benefits of the multi-processing approach is that you can easily set up multiple players and have the game expand (1vInGameAI, 1v1, 3v3, 4v4). Especially when you expand, you could test out different algorithms, like Q-Learning, adaptive dynamic programming, etc. and have them play each other!
Addendum: Sockets 101
Sockets are a mechanism for getting more than one process (i.e., a running program) to send messages to one another. These processes can run on the same machine or across a network. In a sense, using them is like reading and writing a file that is constantly being modified (that's the abstraction sockets provide), but they also provide blocking calls that make the process wait for information to become available.
There is a lot more detail that can be discussed about sockets (like file-sockets vs network-sockets (FD vs IP); UDP vs TCP, etc.) that could easily fill multiple pages. Instead, please refer to the following tutorial about a basic setup: https://docs.python.org/3/howto/sockets.html. With that, you'll have a basic understanding of what they can provide and where to go for more advanced techniques with them.
You may also want to consult the struct tutorial as well for introductory message packing: https://docs.python.org/3/library/struct.html. There are better ways of doing this, but you won't understand much about how they work and break-down without understanding structs.
So you'd want, as the AI inputs, the position of the paddle and the position of the ball. The AI output is two boolean outputs: whether the AI should press the up or the down button on the next simulation step.
I'd also suggest adding another input value: the ball's velocity. Otherwise, you would likely have needed to add yet another input for the ball's location in the previous simulation step, plus a much more complicated middle layer, for the AI to learn the concept of velocity.
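A sketch of what that per-step interface might look like (ai_forward and the variable names are hypothetical, not part of PyGame):

# Inputs: paddle and ball positions plus ball velocity, so the network
# does not have to infer velocity from the previous frame.
inputs = [paddle_y, ball_x, ball_y, ball_vx, ball_vy]
up_out, down_out = ai_forward(inputs)   # your network's forward pass, two outputs
press_up = up_out > 0.5                 # threshold each output into a boolean
press_down = down_out > 0.5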
I'd like to improve my little robot with machine learning.
Up to now it uses simple while and if-then decisions in its main function to act as a lawn-mowing robot.
My idea is to use SKLearn for that purpose.
Please help me to find the right first steps.
I have a few sensors that tell me about the world outside:
World = {yaw, pan, tilt, distance_to_front_obstacle, ground_color}
I have a state vector
State = {left_motor, right_motor, cutter_motor}
that controls the 3 actors of the robot.
I'd like to build a dataset of input and output values to teach sklearn the desired behaviour; after that, the input values should produce the correct output values for the actors.
One example: if the motors are on and the robot should be moving forward, but the distance meter reports constant values, the robot is probably blocked. It should then decide to back up, turn, and move off in another direction.
First of all, do you think this is possible with sklearn, and second, how should I start?
My (simple) robot control code is here: http://github.com/bgewehr/RPiMower
Please help me with the first steps!
I would suggest using reinforcement learning. Here is a tutorial on Q-learning that fits your problem well.
If you want code in Python: right now I don't think there is an implementation of Q-learning in scikit-learn. However, I can give you some examples of Python code you could use: 1, 2 and 3.
Also keep in mind that reinforcement learning is set up to maximize the sum of all future rewards; you have to focus on the overall picture.
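As a minimal illustration of the idea, here is a tabular Q-learning sketch. The states, actions, and rewards are all assumptions you would adapt to the robot (e.g. a state could be a bucketed distance_to_front_obstacle plus the ground_color):

import random
from collections import defaultdict

ACTIONS = ["forward", "reverse", "turn_left", "turn_right"]
q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate

def choose_action(state):
    if random.random() < epsilon:                        # explore
        return random.choice(ACTIONS)
    return max(q_table[state], key=q_table[state].get)   # exploit

def update(state, action, reward, next_state):
    # Standard Q-learning update: move Q(s, a) toward
    # reward + discounted best future value.
    best_next = max(q_table[next_state].values())
    q_table[state][action] += alpha * (reward + gamma * best_next
                                       - q_table[state][action])

# Loop sketch: read sensors -> state; act; read sensors again -> next_state;
# reward could be e.g. -10 when blocked, +1 for covering new ground.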
Good luck :-)
The sklearn package contains a lot of useful tools for machine learning, so I don't think that's a problem; if it is, there are definitely other useful Python packages. I think collecting data for the supervised learning phase will be the challenging part, and I wonder if it would be smart to make a track with tape within a grid system. That would make it easier to translate the track into labels (x, y positions in the grid). Each cell in the grid should be small if you want to make complex tracks later on, I think. It may also be very smart to check how the Google self-driving car project did it.
I am a machine learning beginner. I'd like to learn the basics by teaching computers to play checkers. Actually, the games I want to learn are Domineering and Hex. My language of choice is Python.
These games are pretty easy to store and the rules are much simpler than chess, but there aren't too many people who play them. If I can get this idea off the ground, it would be great for experimenting with Combinatorial Game Theory, to see if a computer can find the optimal move.
I found this old paper on checkers from the 1960s by a guy at IBM. Originally I had asked about neural networks, but people are saying they're the wrong tool.
EDIT: It could be that machine learning is not the right strategy. In that case, what goes wrong, and what is a better way?
You might want to take a look at the following: Chinook, Upper Confidence Trees, Reinforcement Learning, and Alpha-Beta pruning. I personally like to combine Alpha-Beta Pruning and Upper Confidence Trees (UCT) for perfect information games where each player has less than 10 reasonable moves. You can use Temporal Difference Learning to create a position evaluation function. Game AI is probably the most fun way to learn machine learning.
For links to all of these topics, click on
http://artent.net/blog/2012/09/26/checkers-and-machine-learning/
(I was not able to include more links because the stack overflow software considers me a newbie!)
Get the book "Machine Learning" by Tom Mitchell (McGraw-Hill) and read the first chapter. It's extremely well written, and the first chapter will teach you enough to make a program that plays checkers. Personally, I made a program that plays five-in-a-row on miniclip.com, also in Python.
http://www.amazon.com/Learning-McGraw-Hill-International-Editions-Computer/dp/0071154671
When playing checkers, you seek to gain an advantage over your opponent by taking his or her pieces and crowning your own. Losing your pieces and allowing your opponent to crown his or her pieces is not desirable, so you avoid doing it.
Board game engines usually revolve around a position evaluation function. For checkers, my first guess would be something like this:
score = number of allies - number of opponents
+ 3 * number of crowned allies - 3 * number of crowned opponents
Given a board, this function will return the score of the board. The higher the score, the better your position. The lower the score, the worse your position.
To make a naive checkers "engine", all you need to do is find the best move given a board position, which is just searching through all immediate legal moves and finding the one that maximizes your score.
Your engine won't think ahead more than one move, but it will be able to play against you somewhat.
The next step would be to give your engine the ability to plan ahead, which essentially means predicting your opponent's responses. To do that, just find your opponent's best reply (here comes recursion) and subtract its score from yours.
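In code, the naive engine might look like the sketch below. The board helpers (legal_moves, apply_move, and the count_* functions) are assumptions about your board representation, not a fixed API:

def score(board):
    # The first-guess evaluation function from above.
    return (count_allies(board) - count_opponents(board)
            + 3 * count_crowned_allies(board) - 3 * count_crowned_opponents(board))

def best_move(board):
    # The naive engine: try every immediate legal move, keep the best score.
    return max(legal_moves(board), key=lambda m: score(apply_move(board, m)))

# Planning ahead replaces score(apply_move(board, m)) with a recursive value:
# your score minus the value of the opponent's best reply.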
For as long as I've been a programmer, I've had only a very elementary education in algorithms (because I'm self-taught). Perhaps there is a good beginner book on them that you could suggest in your answer.
As a general note, Introduction to Algorithms. That book will get you through pretty much everything you need to know about general algorithms.
Edit:
As AndrewF mentioned, it doesn't actually contain minimax specifically, but it's still a very good resource for learning to understand and implement algorithms.
Look at the wikipedia article on Negamax: http://en.wikipedia.org/wiki/Negamax. It's a slight simplification of minimax that's easier to implement. There's pseudocode on that page.
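A direct Python transliteration of that pseudocode, as a sketch: children and heuristic_value are placeholders, color is +1 or -1 for the player to move, and heuristic_value scores the node from the first player's point of view.

def negamax(node, depth, color):
    kids = children(node)
    if depth == 0 or not kids:
        return color * heuristic_value(node)
    # One branch instead of minimax's two: each player maximizes the
    # negation of the opponent's value.
    return max(-negamax(child, depth - 1, -color) for child in kids)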
There is an implementation of minimax as part of an Othello game here (and for browsers here).
Stepping through this with a debugger and/or through use of logging statements may supplement theoretical descriptions of the algorithm.
This visualization applet may also help.
At each stage, the player to move will choose the move that is best for himself. What's best for one player is worst for the other. So at one stage the game state with the minimum score will be chosen, and at the next stage the game state with the maximum score will be chosen, and so on.