Time step in reinforcement learning

Time step in reinforcement learning - python

For my first project in reinforcement learning I'm trying to train an agent to play a real time game. This means that the environment constantly moves and makes changes, so the agent needs to be precise about its timing. In order to have a correct sequence, I figured the agent will have to work in certain frequency. By that I mean if the agent has 10Hz frequency, it will have to take inputs every 0.1 secs and make a decision. However, I couldn't find any sources on this problem/matter, but it's probably due to not using correct terminology on my searches. Is this a valid way to approach this matter? If so, what can I use? I'm working with python3 in windows (the game is only ran in windows), are there any libraries that could be used? I'm guessing time.sleep() is not a viable way out, since it isn't very precise (when using high frequencies) and since it just freezes the agent.
EDIT: So my main questions are:
a) Should I use a certain frequency, is this a normal way to operate a reinforcement learning agent?
b) If so what libraries do you suggest?

There isn't a clear answer to this question, as it is influenced by a variety of factors, such as inference time for your model, maximum accepted control rate by the environment and required control rate to solve the environment.
As you are trying to play a game, I am assuming that your eventual goal might be to compare the performance of the agent with the performance of a human.
If so, a good approach would be to select a control rate that is similar to what humans might use in the same game, which is most likely lower than 10 Hertz.
You could try to measure how many actions you use when playing to get a good estimate,
However, any reasonable frequency, such as the 10Hz you suggested, should be a good starting point to begin working on your agent.

Related

PPO2 reinforcement learning 'catastrophic forgetting'?

I'm implementing PPO2 reinforcement learning on my self-build tasks and always encounter such situations where the agent seems to be nearly matured then suddenly catstrophically loses its performance and couldn't hold its stable performance. I don't know what's the right word for it.
I'm just wondering what could be the cause for such catastrophic drop in performance? Any hints or tips?
Many thanks
learningprocess1 learningprocess2

I would guess that your reward function is not capped and can produce extremely high negative rewards in some edge cases.
Two things to prevent this are:
Limit the values from your reward function
Make sure that you can handle situations when your learning environment is unstable like the process crashed, froze, experienced a bug. For example if you give your agent negative reward when he falls (robot trying to walk) and the environment doesn't detect the fall because of some rare bug, then your reward function keeps giving negative rewards until the episode stopped.
Most of the time this is not that big of a deal but if you are unlucky your environment could even produce NaN values and those would corrupt your network

SUMO Simulation. Detecting high Traffic and reducing the speed limit

I am learning SUMO from beggining, I read and learned most of tutorials from: http://sumo.dlr.de/wiki/Tutorials . What I want to do now is to make Cars slow down when there is a Traffic on a Road. I only know how to change the speed limit after a certain Time from here: http://sumo.dlr.de/wiki/Simulation/Variable_Speed_Signs . Do you know how can I change the speed limit when there is Traffic? I think that changing the value of speed Signs is the best Idea here, but I don't how do it.

There is no such thing as event triggered speed signs in SUMO, so you probably need to do that via TraCI. The easiest way is probably to start with the TraCI tutorial for trafficlights and then use functions like traci.edge.getLastStepOccupancy to find out whether vehicles are on the edge in question and then traci.edge.setMaxSpeed to set a new speed.

When building Python with profile guided optimization do I have to leave the computer alone?

This may seem or even be a stupid question: When I build something self-tuning like Python with PGO (or ATLAS or I believe FFTW also does it), does the computer have to be otherwise idle (to not interfere with the measurements) or can I pass the time playing Doom?
The linked README from the python source distribution seems to deem this too trivial a matter to mention, but I'm genuinely unsure about this.

What you do on your computer while it is performing the PGO measurements should have no impact what so ever on the result of the optimization. What PGO do is to use measurments to find the hot paths in the code for a given data set and use this information to make the program as fast as possible for this data set and which path is hot and which is not is independent of other programs running on the computer.
To explain things a bit, when optimizing code there are trade offs. The improvement will be higher in some parts of the code and lower in others depending on which code transforms are used and where they are applied. To get a better final result you want high improvements in code that is executed a lot (hot code in compiler lingo) while you can live with smaller improvements in code that is executed less frequently (cold code). Normally a set of heuristics are used to identify these hot parts of the program and apply optimizations in a way that makes these parts as fast as possible. The problem with this approach is that the heuristics does not know anything about how the program will be used in practice and may misidentify hot code as cold.
Profile guided optimization (PGO) is a method to help the compiler to locate the hot parts of the code using data from real executions. As a first step you tell the compiler to build an instrumented version of the program to measure how the code is executed in practice, typically by adding counters to count the number of iterations in loops and which branch is chosen in if-statements. The second step is to run the instrumented program on real data. At the end of execution the program will output the values of all the added counters and by matching counters with the code it is possible to see which parts of the program are hot (high numbers) and which are cold (low numbers). Finally the program is compiled but this time agumented with the program profile. This implies that the compiler no longer need to guess which parts should be faster and which could be slower it can look it up in the profile.

Machine Learning for optimizing parameters

For my master's thesis I am using a 3rd party program (SExtractor) in addition to a python pipeline to work with astronomical image data. SExtractor takes a configuration file with numerous parameters as input, which influences (after some intermediate steps) the statistics of my data. I've already spent way too much time playing around with the parameters, so I've looked a little bit into machine learning and have gained a very basic understanding.
What I am wondering now is: Is it reasonable to use a machine learning algorithm to optimize the parameters of the SExtractor, when the only method to judge the performance or quality of the parameters is with the final statistics of the analysis run (which takes at least an hour on my machine) and there are more than 6 parameters which influence the statistics.
As an example, I have included 2 different versions of the statistics I am referring to, made from slightly different versions of Sextractor parameters. Red line in the left image is the median value of the standard deviation (as it should be). Blue line is the median of the standard deviation as I get them. The right images display the differences of the objects in the 2 data sets.
I know this is a very specific question, but as I am new to machine learning, I can't really judge if this is possible. So it would be great if someone could suggest me if this is a pointless endeavor and point me in the right.

You can try an educated guess based on the data that you already have. You are trying to optimize the parameters such that the median of the standard deviation has the desired value. You could assume various models and try to estimate the parameters based on the model and the estimated data. But I think you should have a good understanding of machine learning to do so. With good I mean beyound an undergraduate course.

Efficient scheduling of university courses

I'm currently working on a website that will allow students from my university to automatically generate valid schedules based on the courses they'd like to take.
Before working on the site itself, I decided to tackle the issue of how to schedule the courses efficiently.
A few clarifications:
Each course at our university (and I assume at every other
university) comprises of one or more sections. So, for instance,
Calculus I currently has 4 sections available. This means that, depending on the amount of sections, and whether or not the course has a lab, this drastically affects the scheduling process.
Courses at our university are represented using a combination of subject abbreviation and course code. In the case of Calculus I: MATH 1110.
The CRN is a code unique to a section.
The university I study at is not mixed, meaning males and females study in (almost) separate campuses. What I mean by almost is that the campus is divided into two.
The datetimes and timeranges dicts are meant to decreases calls to datetime.datetime.strptime(), which was a real bottleneck.
My first attempt consisted of the algorithm looping continuously until 30 schedules were found. Schedules were created by randomly choosing a section from one of the inputted courses, and then trying to place sections from the remaining courses to try to construct a valid schedule. If not all of the courses fit into the schedule i.e. there were conflicts, the schedule was scrapped and the loop continued.
Clearly, the above solution is flawed. The algorithm took too long to run, and relied too much on randomness.
The second algorithm does the exact opposite of the old one. First, it generates a collection of all possible schedule combinations using itertools.product(). It then iterates through the schedules, crossing off any that are invalid. To ensure assorted sections, the schedule combinations are shuffled (random.shuffle()) before being validated. Again, there is a bit of randomness involved.
After a bit of optimization, I was able to get the scheduler to run in under 1 second for an average schedule consisting of 5 courses. That's great, but the problem begins once you start adding more courses.
To give you an idea, when I provide a certain set of inputs, the amount of combinations possible is so large that itertools.product() does not terminate in a reasonable amount of time, and eats up 1GB of RAM in the process.
Obviously, if I'm going to make this a service, I'm going to need a faster and more efficient algorithm. Two that have popped up online and in IRC: dynamic programming and genetic algorithms.
Dynamic programming cannot be applied to this problem because, if I understand the concept correctly, it involves breaking up the problem into smaller pieces, solving these pieces individually, and then bringing the solutions of these pieces together to form a complete solution. As far as I can see, this does not apply here.
As for genetic algorithms, I do not understand them much, and cannot even begin to fathom how to apply one in such a situation. I also understand that a GA would be more efficient for an extremely large problem space, and this is not that large.
What alternatives do I have? Is there a relatively understandable approach I can take to solve this problem? Or should I just stick to what I have and hope that not many people decide to take 8 courses next semester?
I'm not a great writer, so I'm sorry for any ambiguities in the question. Please feel free to ask for clarification and I'll try my best to help.
Here is the code in its entirety.
http://bpaste.net/show/ZY36uvAgcb1ujjUGKA1d/
Note: Sorry for using a misleading tag (scheduling).

Scheduling is a very famous constraint satisfaction problem that is generally NP-Complete. A lot of work has been done on the subject, even in the same context as you: Solving the University Class Scheduling Problem Using Advanced ILP Techniques. There are even textbooks on the subject.
People have taken many approaches, including:
Dynamic programming
Genetic algorithms
Neural networks
You need to reduce your problem-space and complexity. Make as many assumptions as possible (max amount of classes, block based timing, ect). There is no silver bullet for this problem but it should be possible to find a near-optimal solution.
Some semi-recent publications:
QUICK scheduler a time-saving tool for scheduling class sections
Scheduling classes on a College Campus

Did you ever read anything about genetic programming? The idea behind it is that you let the 'thing' you want solved evolve, just by itsself, until it has grown to the best solution(s) possible.
You generate a thousand schedules, of which usually zero are anywhere in the right direction of being valid. Next, you change 'some' courses, randomly. From these new schedules you select some of the best, based on ratings you give according to the 'goodness' of the schedule. Next, you let them reproduce, by combining some of the courses on both schedules. You end up with a thousand new schedules, but all of them a tiny fraction better than the ones you had. Let it repeat until you are satisfied, and select the schedule with the highest rating from the last thousand you generated.
There is randomness involved, I admit, but the schedules keep getting better, no matter how long you let the algorithm run. Just like real life and organisms there is survival of the fittest, and it is possible to view the different general 'threads' of the same kind of schedule, that is about as good as another one generated. Two very different schedules can finally 'battle' it out by cross breeding.
A project involving school schedules and genetic programming:
http://www.codeproject.com/Articles/23111/Making-a-Class-Schedule-Using-a-Genetic-Algorithm
I think they explain pretty well what you need.
My final note: I think this is a very interesting project. It is quite difficult to make, but once done it is just great to see your solution evolve, just like real life. Good luck!

The way you're currently generating combinations of sections is probably throwing up huge numbers of combinations that are excluded by conflicts between more than one course. I think you could reduce the number of combinations that you need to deal with by generating the product of the sections for only two courses first. Eliminate the conflicts from that set, then introduce the sections for a third course. Eliminate again, then introduce a fourth, and so on. This should see a more linear growth in the processing time required as the number of courses selected increases.

This is a hard problem. It you google something like 'course scheduling problem paper' you will find a lot of references. Genetic algorithm - no, dynamic programming - yes. GAs are much harder to understand and implement than standard DP algos. Usually people who use GAs out of the box, don't understand standard techniques. Do some research and you will find different algorithms. You might be able to find some implementations. Coming up with your own algorithm is way, way harder than putting some effort into understanding DP.

The problem you're describing is a Constraint Satisfaction Problem. My approach would be the following:
Check if there's any uncompatibilities between courses, if yes, record them as constraints or arcs
While not solution is found:
Select the course with less constrains (that is, has less uncompatibilities with other courses)
Run the AC-3 algorithm to reduce search space
I've tried this approach with sudoku solving and it worked (solved the hardest sudoku in the world in less than 10 seconds)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.