Multi-agent reinforcement learning environment Public transport problem

For my MSc thesis I want to apply multi-agent RL to a bus control problem. The idea is that the buses operate on a given line, but without a timetable. The line should have bus stops where passengers accumulate over time until a bus picks them up: the longer the interval between buses, the more passengers will be waiting at the stop (on average, since it's a stochastic process). I also want to implement some intersections where buses will have to wait for a green light.
I'm not sure yet what my reward function will look like, but it will be something along the lines of keeping the intervals between buses as regular as possible or minimising the total travel time of the passengers.
The agents in the problem will be the buses, but also the traffic lights. The traffic lights can choose when to show a green light for which road: apart from the buses they will have other demand that has to be processed as well. The buses can choose to speed up, to slow down, to wait longer at a stop or to continue at normal speed.
To be able to put this problem in an RL framework I will need an environment and suitable RL algorithms. Ideally I would have a flexible simulation environment to re-create my case study bus line and connect this to off-the-shelf RL algorithms. However, so far I haven't found this. This means I may have to connect a simulation environment to something like an OpenAI gym myself.
Does anyone have advice on which simulation environment may be suitable? And is it possible to connect this to off-the-shelf RL algorithms?
I feel most comfortable with programming in Python, but other languages are an option as well (but this would mean considerable extra effort from my side).
So far I have found the following simulation environments that may be suitable:
NetLogo
SimPy
Mesa
MATSim (https://www.matsim.org)
Matlab
CityFlow (https://cityflow-project.github.io/#about)
Flatland (https://www.aicrowd.com/challenges/neurips-2020-flatland-challenge/)
For the RL algorithms the options seem to be:
Code them myself
Create the environment according to the OpenAI gym API guidelines and use the OpenAI baselines algorithms.
I would love to hear some suggestions and advice on which environments may be most suitable for my problem!
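To make the "wrap a simulation in a gym-like API" option more concrete, here is a minimal sketch of what such a wrapper could look like. It is not tied to any of the simulators listed above and does not import gym; it only mimics the reset()/step() shape with per-agent dictionaries. The class name BusLineEnv, the two actions, the circular line, the Bernoulli passenger arrivals and the shared reward are all assumptions made for illustration, and traffic lights are left out entirely.

```python
import random


class BusLineEnv:
    """Toy circular bus line with a gym-style reset()/step() interface.

    Everything here is a hypothetical illustration, not a real simulator.
    """

    ACTIONS = ("hold", "go")  # assumed action set for each bus

    def __init__(self, n_stops=6, n_buses=2, arrival_prob=0.3, seed=0):
        self.n_stops = n_stops
        self.n_buses = n_buses
        self.arrival_prob = arrival_prob
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Spread the buses evenly along the line; all stops start empty.
        self.positions = [i * self.n_stops // self.n_buses
                          for i in range(self.n_buses)]
        self.waiting = [0] * self.n_stops
        return self._observations()

    def _observations(self):
        # Each bus observes its stop, the queue there and the gap (in stops)
        # to the next bus by index.
        obs = {}
        for i, pos in enumerate(self.positions):
            ahead = self.positions[(i + 1) % self.n_buses]
            headway = (ahead - pos) % self.n_stops
            obs[f"bus_{i}"] = (pos, self.waiting[pos], headway)
        return obs

    def step(self, actions):
        # Passengers arrive at every stop as independent Bernoulli events.
        for stop in range(self.n_stops):
            if self.rng.random() < self.arrival_prob:
                self.waiting[stop] += 1
        for i in range(self.n_buses):
            if actions[f"bus_{i}"] == "go":
                self.positions[i] = (self.positions[i] + 1) % self.n_stops
            # A bus picks up everyone waiting at its current stop.
            self.waiting[self.positions[i]] = 0
        # Shared reward: the fewer passengers left waiting, the better.
        reward = -sum(self.waiting)
        obs = self._observations()
        rewards = {agent: reward for agent in obs}
        return obs, rewards, False, {}
```

A training loop would call reset() once and then step() with a dictionary such as {"bus_0": "go", "bus_1": "hold"}. Keeping observations and rewards keyed by agent name is one common way to let a multi-agent library address each bus separately.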

You can also check SUMO as a traffic simulator and the RLlib library for multi-agent reinforcement learning.

Related

Reinforcement Learning solution for Flappy Bird with PPO algorithm

The quick summary of my question:
I'm trying to solve a clone of the Flappy Bird game found on the internet with the reinforcement learning algorithm Proximal Policy Optimization. However, I've run into an issue with designing the reward system. How can I specify a reward for the agent given that it's a third-party game, so it does not return anything to me and the only info I get is visual information from the window?
Some details and the background:
Prior to trying to solve a third-party game I've played with several OpenAI Gym environments such as Cart-Pole, Mountain Car, Lunar Lander and recently Car Racing. To solve them I used PG, DQN, Actor-Critic and PPO algorithms. After understanding how to work with problems where the state is an image, I decided to take on a new challenge and try to get out of the sandbox (gym).
I've picked Flappy Bird because it's simple in concept, action space is 1 (actually 2) and it's notoriously hard for humans.
My code can be found here: https://github.com/Mike-Kom/Flappy-Bird-PPO
The agent class and buffer were tested on Car Racing, so there shouldn't be any issues with the RL algorithm. The neural net was changed a little due to a different state size, but conceptually it's the same, so there should not be any problems either.
My current guess is that the reward system is not robust and causes the agent not to learn properly.
Currently I'm just giving the agent 0.025 points each step and 2 points from the 25th frame onwards (I've found that this is exactly the frame at which the agent passes between the first two pipes), but it does not seem to work.
Any suggestions on how to solve an external environment and especially on how to design the reward system are welcome!
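For illustration, one possible way to shape the reward is to tie the bonus to pipes actually passed rather than to a fixed frame number, and to add a penalty when the bird dies. The sketch below assumes that a pipe count and an "alive" flag can be recovered from the screen somehow (for example by reading the score); that part is not shown. The death penalty of -1 and the idea that a frozen screen signals game over are assumptions, not properties of the game in question. The values 0.025 and 2 are the ones mentioned above.

```python
def frame_changed(prev_frame, frame, min_changed_pixels=1):
    """Return True if enough pixels differ between two equally sized frames.

    Frames are flat sequences of pixel values. Treating an unchanged frame
    as "game over" is an assumption about how the game behaves.
    """
    changed = sum(1 for a, b in zip(prev_frame, frame) if a != b)
    return changed >= min_changed_pixels


def step_reward(alive, pipes_passed_now, pipes_passed_before,
                survive_bonus=0.025, pipe_bonus=2.0, death_penalty=-1.0):
    """Reward for one step: small survival bonus, large bonus per new pipe."""
    if not alive:
        return death_penalty
    new_pipes = pipes_passed_now - pipes_passed_before
    return survive_bonus + pipe_bonus * new_pipes
```

With this shape the agent is rewarded every time it clears a pipe, not only once it has survived past a particular frame, and dying is clearly worse than surviving.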
Sorry if the code is messy and not professional; it was originally meant to be just for me :) Programming is just my hobby and my occupation is far from code writing.
Moreover, this is my first question here and I wanted to take the opportunity to thank all of you for writing your answers and suggestions for different questions! You make this community super helpful for so many people! Even though I had not asked a question before, I found a ton of answers and good suggestions here :)

Python Multi-processing architecture

I am currently building a smart RC car demo with a Raspberry Pi 3B, some sensors and actuators.
Basically, the car should run autonomously in a controlled indoor environment: follow a line on the ground, detect and avoid an obstacle in its way, take a picture of the obstacle, etc.
My first idea of architecture looks like below (you can ignore the camera part):
However, I think using a file in the middle to communicate between the different processes is far from optimal.
As the sensors run at different frequencies, I think multiprocessing should probably solve the problem.
I did some searching but am still not clear on how to architect it with multiprocessing.
Any advice would be really appreciated.
best regards,
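One way to replace the intermediate file is a multiprocessing.Queue: each sensor runs in its own process at its own rate and pushes readings onto the queue, while the control loop keeps only the newest value per sensor. The following is a minimal sketch under that assumption. The sensor names, the periods and the two read_* functions are hypothetical stand-ins for the real drivers.

```python
import multiprocessing as mp
import queue
import time


def sensor_loop(name, period_s, read_fn, out_queue, n_samples):
    """Read one sensor at its own rate and publish (name, timestamp, value)."""
    for _ in range(n_samples):
        out_queue.put((name, time.time(), read_fn()))
        time.sleep(period_s)


def drain_latest(in_queue, latest):
    """Empty the queue into `latest`, keeping the newest reading per sensor."""
    while True:
        try:
            name, stamp, value = in_queue.get_nowait()
        except queue.Empty:
            return latest
        latest[name] = (stamp, value)


def read_line_sensor():      # hypothetical stand-in for the real driver
    return 0.0


def read_distance_sensor():  # hypothetical stand-in for the real driver
    return 42.0


def run_demo():
    """Start two sensor processes and poll their readings from one loop."""
    q = mp.Queue()
    workers = [
        mp.Process(target=sensor_loop,
                   args=("line", 0.01, read_line_sensor, q, 20)),
        mp.Process(target=sensor_loop,
                   args=("distance", 0.05, read_distance_sensor, q, 4)),
    ]
    for worker in workers:
        worker.start()
    latest = {}
    while any(worker.is_alive() for worker in workers):
        drain_latest(q, latest)
        # The steering / obstacle-avoidance decision would go here.
        time.sleep(0.02)
    for worker in workers:
        worker.join()
    return drain_latest(q, latest)
```

When run as a script, run_demo() should be called under an `if __name__ == "__main__":` guard so that child processes do not start the demo again on platforms that re-import the main module.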

Time step in reinforcement learning

For my first project in reinforcement learning I'm trying to train an agent to play a real-time game. This means that the environment constantly moves and changes, so the agent needs to be precise about its timing. In order to get a correct sequence, I figured the agent will have to work at a certain frequency. By that I mean that if the agent runs at 10 Hz, it will have to take inputs every 0.1 s and make a decision. However, I couldn't find any sources on this matter, probably because I'm not using the correct terminology in my searches. Is this a valid way to approach it? If so, what can I use? I'm working with python3 on Windows (the game only runs on Windows); are there any libraries that could be used? I'm guessing time.sleep() is not a viable way out, since it isn't very precise (at high frequencies) and since it just freezes the agent.
EDIT: So my main questions are:
a) Should I use a certain frequency, is this a normal way to operate a reinforcement learning agent?
b) If so what libraries do you suggest?

There isn't a clear answer to this question, as it is influenced by a variety of factors, such as inference time for your model, maximum accepted control rate by the environment and required control rate to solve the environment.
As you are trying to play a game, I am assuming that your eventual goal might be to compare the performance of the agent with the performance of a human.
If so, a good approach would be to select a control rate that is similar to what humans might use in the same game, which is most likely lower than 10 Hertz.
You could try to measure how many actions you use when playing to get a good estimate.
However, any reasonable frequency, such as the 10 Hz you suggested, should be a good starting point to begin working on your agent.
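If you do settle on a fixed control rate, one common pattern is to schedule each step against an absolute deadline instead of sleeping a fixed amount, so that the time spent on inference does not accumulate as drift. Below is a minimal sketch using only the standard library; the function name run_fixed_rate and its parameters are made up for this example. Note that the real resolution of time.sleep() depends on the operating system, so this is a starting point, not a hard real-time guarantee.

```python
import time


def run_fixed_rate(step_fn, hz, n_steps,
                   clock=time.perf_counter, sleep=time.sleep):
    """Call step_fn(i) n_steps times at roughly `hz` calls per second.

    Returns the number of steps that took longer than one period.
    `clock` and `sleep` can be swapped out, e.g. for testing.
    """
    period = 1.0 / hz
    next_deadline = clock()
    overruns = 0
    for i in range(n_steps):
        step_fn(i)  # observe, run the model, send the action
        next_deadline += period
        remaining = next_deadline - clock()
        if remaining > 0:
            # Sleep only for what is left of this period.
            sleep(remaining)
        else:
            # The step was too slow: count it and resynchronise.
            overruns += 1
            next_deadline = clock()
    return overruns
```

For a 10 Hz agent this would be called as run_fixed_rate(agent_step, hz=10, n_steps=1000), where agent_step is whatever function grabs the input and picks an action. A non-zero return value indicates that the chosen rate is too high for the model's inference time.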

Course on HPC with Python?

I am developing a simulation and data-processing pipeline in Python. Currently I am still "making it work", but in a few months I will have to "make it fast". I am not very good at HPC in general and particularly not with Python.
What online (or Europe-based) courses are available that contain at least the following topics:
parallel computation in Python, and
interfacing Python and C?
Opinion based part (sorry about this):
The only course I managed to find is Python Academy in Leipzig (http://www.python-academy.com/courses/python_hpc.html). Has anybody tried Python Academy? I couldn't find any independent reviews, and there is a significant cost, so I would not want to go in blind.

None of the items mentioned in the course description qualifies as HPC, which currently refers to GPU utilization for massive parallelisation.
"High-performance computing (HPC) is the use of parallel processing for running advanced application programs efficiently, reliably and quickly. The term applies especially to systems that function above a teraflop or 10^12 floating-point operations per second."
The course contents will enable you to program the current generation of Raspberry Pi, which has a quad-core processor and a dedicated graphics processor. But I would not call that HPC (High Performance Computing).
If you are interested in High Performance Computing, get an NVIDIA graphics card and try pycuda, or try pyopencl, which builds on OpenCL, a more open standard for hybrid computing.
You can find good videos on YouTube explaining both.

Use machine learning for simple robot control

I'd like to improve my little robot with machine learning.
Up to now it uses simple while loops and if/then decisions in its main function to act as a lawn mowing robot.
My idea is to use SKLearn for that purpose.
Please help me to find the right first steps.
I have a few sensors that tell it about the world outside:
World ={yaw, pan, tilt, distance_to_front_obstacle, ground_color}
I have a state vector
State = {left_motor, right_motor, cutter_motor}
that controls the 3 actuators of the robot.
I'd like to build a dataset of input and output values to teach sklearn the desired behaviour; after that, the input values should give the correct output values for the actuators.
One example: if the motors are on and the robot should move forward, but the distance meter keeps reporting constant values, the robot seems to be blocked. Now it should decide to back up, turn and move in another direction.
First of all, do you think that this is possible with sklearn, and second, how should I start?
My (simple) robot control code is here: http://github.com/bgewehr/RPiMower
Please help me with the first steps!
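To illustrate the blocked-robot example from the question, a first step towards a dataset could be a small function that turns raw readings into a feature vector plus a label. This is only a sketch under assumptions: the function names, the tolerance of 0.5 (in whatever unit the distance sensor uses) and the two labels are made up for the example, and the feature order simply follows the World vector above.

```python
def is_blocked(recent_distances, motors_on, tolerance=0.5):
    """Return True if the motors run but the distance reading barely changes."""
    if not motors_on or len(recent_distances) < 2:
        return False
    return max(recent_distances) - min(recent_distances) <= tolerance


def make_sample(world, recent_distances, motors_on):
    """Build one (features, label) row for a supervised learner.

    `world` is a dict with the sensor names from the World vector.
    The labels "forward" and "reverse_and_turn" are hypothetical.
    """
    features = [
        world["yaw"],
        world["pan"],
        world["tilt"],
        world["distance_to_front_obstacle"],
        world["ground_color"],
    ]
    blocked = is_blocked(recent_distances, motors_on)
    label = "reverse_and_turn" if blocked else "forward"
    return features, label
```

Rows collected this way could later be fed to a classifier, with the rule-based label acting as the "teacher" until enough real examples exist.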

I would suggest using Reinforcement Learning. Here you have a tutorial on Q-Learning that fits your problem well.
If you want code in python, right now I think there is no implementation of Q-learning in scikit-learn. However, I can give you some examples of code in python that you could use: 1, 2 and 3.
Also please have in mind that reinforcement learning is set to maximize the sum of all future rewards. You have to focus on the general view.
Good luck :-)
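As a rough idea of what tabular Q-learning looks like in plain Python, here is a minimal sketch on a made-up two-state version of the mower problem. The states, actions, rewards and transitions in toy_step are invented for illustration and have nothing to do with the real robot; only the update rule is the standard Q-learning one.

```python
import random

STATES = ("free", "blocked")
ACTIONS = ("forward", "reverse_turn")


def toy_step(state, action):
    """Hypothetical deterministic dynamics: returns (next_state, reward)."""
    if state == "free":
        # Driving forward mows (+1) but ends at an obstacle.
        return ("blocked", 1.0) if action == "forward" else ("free", 0.0)
    # Blocked: pushing forward is penalised, reversing frees the robot.
    return ("blocked", -1.0) if action == "forward" else ("free", 0.0)


def train_q_table(n_steps=5000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning on the toy problem."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = "free"
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        next_state, reward = toy_step(state, action)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Q-learning update: move towards reward + discounted best next value.
        target = reward + gamma * best_next
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
```

Calling greedy_policy(train_q_table()) gives the learned action per state. Because the update looks at the discounted value of the next state, the table reflects the sum of future rewards and not just the immediate one.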

The sklearn package contains a lot of useful tools for machine learning, so I don't think that's a problem. If it is, then there are definitely other useful Python packages. I think collecting data for the supervised learning phase will be the challenging part, and I wonder if it would be smart to make a track with tape within a grid system. That would make it easier to translate the track to labels (x, y positions in the grid). Each cell in the grid should be small if you want to make complex tracks later on, I think. It may also be worth checking how this was done for Google's self-driving car.
