So I made Pong using PyGame and I want to use genetic algorithms to have an AI learn to play the game. It should only know the location of its paddle, the location of the ball, and the controls. I don't want to hard-code rules like "if the ball is above you, go up"; I want it to try random things until it learns what to do. I just don't know how to have the AI move the paddle on its own.
So my question is, how do I get the AI to try controls and see what works?
Learning Atari Pong has become a standard task in reinforcement learning. For example, there is the OpenAI baselines GitHub repo implementing RL algorithms that can be plugged into various tasks.
You definitely don't need those advanced algorithms just to learn Pong the way you describe, but you can learn from the API they use to separate the tasks ("environments" in reinforcement learning terms) from the AI part (the "controller" or "agent"). For this, I suggest reading the OpenAI Gym documentation on how to add a new environment.
In short, for the inputs you could use a few floating-point numbers (position and velocity of the ball, or two consecutive ball positions instead of velocity, plus the position of the paddle). Or you could use discrete inputs (integers, or raw pixels, which are much harder to learn from). Those inputs can feed a small neural network.
For the control output, the simplest thing is to predict a probability of moving up or down. This is a good idea because when you evaluate your controller, it will have some non-zero chance of scoring points, so your genetic algorithm can compare different controllers (with different weights) against each other. Just apply the sigmoid function to your neural net output and interpret the result as a probability.
If you initialize all your neural network weights to a good random range, you probably can get a pong player that doesn't completely suck just by trying random weights for long enough (even without a GA).
PS: if you weren't planning to use a neural network: they are really simple to implement from scratch if you only need the forward pass, i.e. you skip back-propagation training and use a GA to learn the weights instead (or an evolution strategy, or just random weights). The hardest part is finding a good range for the initial random weights.
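Such a forward-pass-only network really is only a few lines. Here's a minimal sketch; the layer sizes, weight range, and input features are arbitrary choices that a GA would tune or that you'd adapt to your game:

```python
import math
import random

def make_weights(n_in, n_hidden, rng, scale=0.5):
    # Random weights in a small range; a GA mutates these instead of backprop.
    w1 = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-scale, scale) for _ in range(n_hidden)]
    return w1, w2

def forward(weights, inputs):
    # Forward pass only: one tanh hidden layer, sigmoid output.
    w1, w2 = weights
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in w1]
    out = sum(w * h for w, h in zip(w2, hidden))
    return 1.0 / (1.0 + math.exp(-out))  # interpreted as P(move up)

weights = make_weights(n_in=3, n_hidden=4, rng=random.Random(0))
p_up = forward(weights, [0.2, -0.5, 0.7])  # e.g. paddle_y, ball_x, ball_y (normalized)
```

A GA individual is then just the `weights` tuple; fitness is the score the controller achieves when you run the game with it.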
One design consideration which may be helpful is whether you can expose a minimal set of display details through another interface, and conversely accept commands for the player paddle. For example, you could send a simple structure describing the ball position and both paddle positions with each frame update, out through a socket to another process. Following the same pattern, you could define a structure that is sent as a reply, describing how to move the player paddle. For example:
# Pong Game program
import socket
import struct
# Set up server or client socket
# ... Into game loop
state = (p1_paddle_y, p2_paddle_y, ball_x, ball_y, victory_state)
# assuming pixel locations, and victory_state is -1:Loss, 0:InProgress, 1:Win
myGameStateMsg = struct.pack('>LLLLh', *state)  # five format fields, five values
sock.send(myGameStateMsg) # Sending game state to player
playerMsg = sock.recv(4) # Get player command
playerCmd = struct.unpack('>i', playerMsg)[0]  # unpack returns a tuple
# playerCmd is an integer describing direction & speed of paddle motion
# ... Process game state update, repeat loop
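To make the message format concrete, here is a self-contained sketch of one exchange; `socket.socketpair()` stands in for the real TCP connection between the game and player processes, and the "policy" is a placeholder where the learned controller would go:

```python
import socket
import struct

FMT = '>LLLLh'  # p1_paddle_y, p2_paddle_y, ball_x, ball_y, victory_state

game_sock, player_sock = socket.socketpair()  # stand-in for a real TCP connection

# Game side: pack and send one state update.
state = (100, 120, 300, 95, 0)
game_sock.send(struct.pack(FMT, *state))

# Player side: receive, unpack, decide, reply.
raw = player_sock.recv(struct.calcsize(FMT))
p1_y, p2_y, ball_x, ball_y, victory = struct.unpack(FMT, raw)
cmd = 1 if ball_y < p1_y else -1  # placeholder policy; the learned controller goes here
player_sock.send(struct.pack('>i', cmd))

# Game side: read the command back.
(player_cmd,) = struct.unpack('>i', game_sock.recv(4))
```

The same pack/unpack code works unchanged once the two sides are separate processes connected over TCP.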
You could accomplish the same effect using threads and a shared structure, but you'll need to guard that structure properly (read-while-write problems, etc.).
Personally, I prefer the first approach (sockets & multi-processing) for stability reasons. Suppose there's some sort of bug that causes a crash; if you've already got process separation, it becomes easier to identify the source of the crash. At the thread-level, it's still possible but a little more challenging. One of the other benefits of the multi-processing approach is that you can easily set up multiple players and have the game expand (1vInGameAI, 1v1, 3v3, 4v4). Especially when you expand, you could test out different algorithms, like Q-Learning, adaptive dynamic programming, etc. and have them play each other!
Addendum: Sockets 101
Sockets are a mechanism for getting more than one process (i.e., a running program) to send messages to one another. These processes can be running on the same machine or across a network. In a sense, using them is like reading and writing to a file that is constantly being modified (that's the abstraction sockets provide), but they also provide blocking calls that make a process wait until information is available.
There is a lot more detail that can be discussed about sockets (like file-sockets vs network-sockets (FD vs IP); UDP vs TCP, etc.) that could easily fill multiple pages. Instead, please refer to the following tutorial about a basic setup: https://docs.python.org/3/howto/sockets.html. With that, you'll have a basic understanding of what they can provide and where to go for more advanced techniques with them.
You may also want to consult the struct tutorial for introductory message packing: https://docs.python.org/3/library/struct.html. There are better ways of doing this, but you won't understand much about how they work and break down without understanding structs.
So as the AI input you'd want the position of the paddle and the position of the ball. The AI output is two boolean values: whether it should press the up or the down button on the next simulation step.
I'd also suggest adding another input value: the ball's velocity. Otherwise you would likely need to add yet another input for the ball's location in the previous simulation step, plus a much more complicated middle layer, for the AI to learn the concept of velocity.
Related
How can you prevent the agent from endlessly repeating the same action cycle?
Of course, somehow with changes in the reward system. But are there general rules you could follow or try to include in your code to prevent such a problem?
To be more precise, my actual problem is this one:
I'm trying to teach an ANN to learn Doodle Jump using Q-Learning. After only a few generations the agent keeps jumping on the very same platform/stone over and over, non-stop. Increasing the length of the random-exploration phase doesn't help.
My reward system is the following:
+1 when the agent is living
+2 when the agent jumps on a platform
-1000 when it dies
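For reference, that scheme maps to a per-step reward function like this (a sketch; the flag names are illustrative):

```python
def step_reward(landed_on_platform, died):
    # Reward scheme from above: +1 per step alive, +2 extra for landing, -1000 on death.
    if died:
        return -1000
    return 3 if landed_on_platform else 1
```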
An idea would be to reward it negative or at least with 0 when the agent hits the same platform as it did before. But to do so, I'd have to pass a lot of new input-parameters to the ANN: x,y coordinates of the agent and x,y coordinates of the last visited platform.
Furthermore, the ANN then would also have to learn that a platform is 4 blocks thick, and so on.
Therefore, I'm sure that the idea I just mentioned wouldn't solve the problem; on the contrary, I believe the ANN would simply stop learning well, because there would be too many irrelevant and hard-to-interpret inputs.
This is not a direct answer to the very generally asked question.
I found a workaround for my particular DoodleJump example, probably someone does something similar and needs help:
While training: let every platform the agent jumped on disappear afterwards, and spawn a new one somewhere else.
While testing/presenting: you can disable the new "disappear" feature (so it behaves as before), and the player will play well and won't hop on one and the same platform all the time.
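A minimal sketch of that training-time tweak, with hypothetical names for the platform list and spawn bounds:

```python
import random

def handle_landing(platforms, landed_index, rng, width=400, height=600):
    # Training-time workaround: the platform just landed on disappears and a
    # new one spawns elsewhere, so camping on one platform is impossible.
    del platforms[landed_index]
    platforms.append((rng.uniform(0, width), rng.uniform(0, height)))
    return platforms
```

At test time you simply skip calling this and leave the platform in place.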
For my first project in reinforcement learning, I'm trying to train an agent to play a real-time game. The environment changes constantly, so the agent needs to be precise about its timing. To keep a consistent sequence, I figured the agent will have to operate at a fixed frequency: if the agent runs at 10 Hz, it takes inputs and makes a decision every 0.1 s. However, I couldn't find any sources on this problem, probably because I'm not using the right terminology in my searches. Is this a valid way to approach the matter? If so, what can I use? I'm working with Python 3 on Windows (the game only runs on Windows); are there any libraries that could help? I'm guessing time.sleep() is not a viable way out, since it isn't very precise at high frequencies and it just freezes the agent.
EDIT: So my main questions are:
a) Should I use a certain frequency, is this a normal way to operate a reinforcement learning agent?
b) If so what libraries do you suggest?
There isn't a clear answer to this question, as it is influenced by a variety of factors, such as inference time for your model, maximum accepted control rate by the environment and required control rate to solve the environment.
As you are trying to play a game, I am assuming that your eventual goal might be to compare the performance of the agent with the performance of a human.
If so, a good approach would be to select a control rate similar to what a human might use in the same game, which is most likely lower than 10 Hz. You could measure how many actions you take per second when playing to get a good estimate.
However, any reasonable frequency, such as the 10 Hz you suggested, should be a good starting point for working on your agent.
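As a sketch of a fixed-rate loop in Python: scheduling each tick off a monotonic clock (`time.perf_counter`) keeps timing errors from accumulating the way a bare `time.sleep(1/hz)` inside the loop would. The function and parameter names here are illustrative:

```python
import time

def run_agent(step, hz=10.0, steps=50):
    # Fixed-rate control loop: compute each deadline from a monotonic clock
    # so small sleep inaccuracies don't accumulate over time.
    period = 1.0 / hz
    next_tick = time.perf_counter()
    for _ in range(steps):
        step()  # read inputs, run the policy, emit an action
        next_tick += period
        delay = next_tick - time.perf_counter()
        if delay > 0:
            time.sleep(delay)

ticks = []
run_agent(lambda: ticks.append(time.perf_counter()), hz=20.0, steps=5)
```

`time.sleep` is still used for the waiting, but because the next deadline is absolute rather than relative, an overlong step is compensated on the following ticks.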
I'm working on an AI that should be able to play chess. I want to make use of keras neural networks to evaluate position on the board. I would like to teach the NN by playing plenty of games between AI and AI. I already have alpha-beta pruning implemented.
My idea was to create a CSV file with positions from every game the AI has played. I would choose the variables I'd like to store there. A very simple example:
"white_pawns","black_pawns","white_queens","black_queens","white_pawns_on_side","white_won"
3,7,1,2,0,False
3,5,3,0,1,True
I would like to train a model using these values and then use it to evaluate current board position. So the main question is:
How do I make a neural network output the value of a position given these variables? E.g. 0 when it's a draw, or 1 when we are one pawn up. Keras preferred, but I'm open to any other Python library.
I would also be grateful if you could dispel my few other doubts.
Are there any flaws in that approach? Wouldn't using every position from a single game overfit the neural network? Maybe I should pick only a few positions from each game?
I think you know this, but when a human evaluates the board, they are not only looking at the material but also at the positions of the pieces. Secondly, with this CSV you can't decide which move is better if all you see is true or false; this is why engine evaluations are numerical. Or do you want it to output a number from -1 to 1 as the score? I'm looking to do the same thing, but with 1 for a white win, -1 for a black win, and 0 for a draw (in the dataset file). If you want to work on this together, hit me up (is there a messaging service for Stack Overflow?).
Conclusion
The input should be a numerical representation of the board, in my opinion, and the target should be a numerical value (regression) rather than a class label. It is actually simpler.
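As a sketch of that idea in Keras (assuming TensorFlow's bundled Keras is available): the five feature columns come from the CSV above, and the `white_won` label is mapped to -1/+1 (with 0 reserved for draws). A `tanh` output keeps the evaluation in [-1, 1].

```python
import numpy as np
from tensorflow import keras

# Tiny regression network: five numeric board features in, one evaluation out.
# tanh keeps the output in [-1, 1]: -1 = black wins, 0 = draw, +1 = white wins.
model = keras.Sequential([
    keras.layers.Input(shape=(5,)),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(1, activation='tanh'),
])
model.compile(optimizer='adam', loss='mse')

# Toy rows shaped like the CSV above, with white_won mapped to -1/+1.
X = np.array([[3, 7, 1, 2, 0], [3, 5, 3, 0, 1]], dtype='float32')
y = np.array([-1.0, 1.0], dtype='float32')
model.fit(X, y, epochs=5, verbose=0)

score = float(model.predict(X[:1], verbose=0)[0, 0])  # evaluation of the first position
```

With real data you would train on many positions, and the resulting score plugs directly into the alpha-beta search as the leaf evaluation.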
I have a Python engine that I am working on, and this is an opportunity to meet new people interested in the same things I am.
Just saying, this is my first answer, so if something is unclear please leave a comment and I will try to help!
Also, like krish said, this can be implemented with reinforcement learning, but first you need to build a DQN (deep Q-network; Q-learning is a really popular reinforcement learning algorithm), and for that you need another network; otherwise this will take a lot of time to train.
I'm building a UGV (unmanned ground vehicle) prototype. The goal is to perform the desired actions at targets set within a maze. From what I've found online, maze navigation is usually done with a distance sensor, so I'd like to gather some more ideas.
I want to navigate the labyrinth by analyzing the images from a 3D stereo camera. Is there a resource or proven method you can suggest for this? As a secondary problem, the vehicle must start in front of the maze entrance, see the entrance and drive in, and then leave the maze after completing its operations inside.
I would be glad if you suggest a source for this problem. :)
The problem description is a bit vague, but I'll try to highlight some general ideas.
A useful assumption is that the labyrinth is a 2D environment you want to explore. You need to know, at every moment, which part of the map has been explored, which part still needs exploring, and which part is accessible at all (in other words, where the walls are).
An easy initial data structure to help with this is a simple matrix, where each cell represents a square in the real world. Each cell can be then labelled according to its state, starting in an unexplored state. Then you start moving, and exploring. Based on the distances reported by the camera, you can estimate the state of each cell. The exploration can be guided by something such as A* or Q-learning.
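A minimal sketch of such a grid, with an illustrative cell-update rule and a frontier search for cells worth exploring next (all names and the update rule are assumptions, not a fixed API):

```python
UNEXPLORED, FREE, WALL = 0, 1, 2

# 10x10 occupancy grid; every cell starts unexplored.
grid = [[UNEXPLORED] * 10 for _ in range(10)]

def update_cell(grid, x, y, saw_past_it):
    # Illustrative update rule: a cell the camera saw past is free,
    # a cell where the measured distance ended is a wall.
    grid[y][x] = FREE if saw_past_it else WALL

def frontier_cells(grid):
    # Unexplored cells adjacent to known-free space: candidates to explore next.
    cells = []
    for y, row in enumerate(grid):
        for x, cell in enumerate(row):
            if cell != UNEXPLORED:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < len(row) and 0 <= ny < len(grid) and grid[ny][nx] == FREE:
                    cells.append((x, y))
                    break
    return cells

update_cell(grid, 5, 5, True)  # e.g. the robot's starting square is free
```

A planner such as A* can then route the robot to the nearest frontier cell until none remain.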
Now, a rather subtle issue is that you will have to deal with uncertainty and noise. Sometimes you can ignore it, sometimes you can't. The finer the resolution you need, the bigger the issue. A probabilistic framework is most likely the best solution.
There is an entire field of research on so-called SLAM algorithms. SLAM stands for simultaneous localization and mapping: these algorithms build a map from input supplied by various types of cameras or sensors, and while building the map they also solve the localization problem within it. They are usually designed for 3D environments and are more demanding than the simpler solution indicated above, but you can find ready-to-use implementations. For exploration, something like Q-learning still has to be used.
I'd like to improve my little robot with machine learning.
Up to now it uses simple while-loops and if-then decisions in its main function to act as a lawn-mowing robot.
My idea is to use SKLearn for that purpose.
Please help me to find the right first steps.
I have a few sensors that tell me about the world outside:
World ={yaw, pan, tilt, distance_to_front_obstacle, ground_color}
I have a state vector
State = {left_motor, right_motor, cutter_motor}
that controls the robot's three actuators.
I'd like to build a dataset of input and output values to teach sklearn the desired behaviour; after that, the input values should produce the correct output values for the actuators.
One example: if the motors are on and the robot should move forward but the distance meter tells constant values, the robot seems to be blocked. Now it should decide to draw back and turn and move to another direction.
First of all, do you think this is possible with sklearn, and second, how should I start?
My (simple) robot control code is here: http://github.com/bgewehr/RPiMower
Please help me with the first steps!
I would suggest using Reinforcement Learning. Here is a tutorial on Q-Learning that fits your problem well.
If you want code in python, right now I think there is no implementation of Q-learning in scikit-learn. However, I can give you some examples of code in python that you could use: 1, 2 and 3.
Also keep in mind that reinforcement learning aims to maximize the sum of all future rewards, so you have to focus on the long-term view.
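A minimal tabular Q-learning sketch, with a toy environment standing in for your robot (all names and the toy rewards are illustrative; the update rule is the standard one):

```python
import random

def q_learning_step(Q, state, actions, env_step, alpha=0.1, gamma=0.9, eps=0.1, rng=random):
    # One tabular Q-learning update with epsilon-greedy exploration:
    # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    if rng.random() < eps:
        action = rng.choice(actions)
    else:
        action = max(actions, key=lambda a: Q.get((state, a), 0.0))
    next_state, reward = env_step(state, action)
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return next_state

# Toy illustration: a blocked robot should learn that turning beats pushing forward.
def env_step(state, action):
    return ('free', 1.0) if action == 'turn' else ('blocked', -1.0)

Q, rng = {}, random.Random(0)
for _ in range(200):
    q_learning_step(Q, 'blocked', ['forward', 'turn'], env_step, eps=1.0, rng=rng)
```

For the real robot, `state` would be a discretized version of your sensor vector, `actions` the motor commands, and `env_step` the robot's actual interaction with the world.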
Good luck :-)
The sklearn package contains a lot of useful tools for machine learning, so I don't think that's a problem; if it is, there are definitely other useful Python packages. I think collecting data for the supervised learning phase will be the challenging part. I wonder if it would be smart to make a track with tape within a grid system; that would make it easier to translate the track to labels (x, y positions in the grid). Each cell in the grid should be small if you want to make complex tracks later on. It may also be smart to check how they did it with the self-driving Google car.