I am working on the MADDPG algorithm with many agents (>20) and want to parallelize the action/prediction step for each agent, which is currently done in a for loop. The code is below:
action_n = [agent.action(obs) for agent, obs in zip(trainers,obs_n)]
Here agent is the model object whose action function is called, and obs is the state (observation) for the respective actor.
I am unable to do so. Can anyone help?
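One possible direction (a minimal sketch, not tested against MADDPG specifically): submit each agent's forward pass to a thread pool. Whether this actually helps depends on the backend, since TensorFlow/PyTorch ops typically release the GIL during heavy computation; for pure-Python policies a process pool would be needed instead.
from concurrent.futures import ThreadPoolExecutor

def parallel_actions(trainers, obs_n, max_workers=8):
    # Run each agent's action() call in its own thread.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(agent.action, obs)
                   for agent, obs in zip(trainers, obs_n)]
        # Collect results in the original agent order.
        return [f.result() for f in futures]

action_n = parallel_actions(trainers, obs_n)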
I just started using PsychoPy in order to create my first adaptive staircase experiment.
I tried to set up the experiment by using the Builder interface. The loop type I'm using is the staircase, not the interleaved staircase.
In the experiment, I would like to change the contrast of the image according to the participant's response.
I've designed the experiment far enough that I can present the start stimulus to the participants when the program runs, and the participant can respond. But the problem is that my stimulus does not change at all after a participant responds. I've tried many things to fix this, from inserting every possible stimulus manually to coding it according to the tutorial by Yentl de Kloe, but nothing works: the stimulus remains unchanged, which means the experiment runs forever if I don't cancel it manually.
Can anyone give me a simple (understandable for a beginner) but detailed explanation of how to solve this problem within the PsychoPy Builder?
Thank you in advance!
[Screenshots: experimental structure and staircase loop settings]
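For what it's worth, a minimal sketch of the usual Builder approach (assuming the image component is named image_stim; adjust to your component names): a Builder staircase loop exposes its current intensity as the variable level, so you can either put $level into the image's contrast field (set every repeat) or set it from a Code Component:
# Code Component, "Begin Routine" tab, inside the staircase loop.
# 'level' is provided by the Builder staircase loop; 'image_stim' is an assumed name.
image_stim.contrast = level
Also note that the staircase only steps when it is told whether the response was correct, so the keyboard component needs "Store correct" enabled with a correct answer defined; without that feedback the level never changes, which matches the "runs forever" symptom.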
I am just getting started self-studying reinforcement learning with Stable-Baselines3. My long-term goal is to train an agent to play a specific turn-based board game. Currently I am quite overwhelmed with all the new material, though.
I have implemented a gym-environment that I can use to play my game manually or by having it pick random actions.
Currently I am stuck with trying to get a model to hand me actions in response to an observation. The action-space of my environment is a DiscreteSpace(256). I create the model with the environment as model = PPO('MlpPolicy', env, verbose=1). When I later call model.predict(observation) I do get back a number that looks like an action. When run repeatedly I get different numbers, which I assume is to be expected on an untrained model.
Unfortunately, in my game most of the actions are illegal in most states, and I would like to filter them and pick the best legal one. Or simply dump the output probabilities for all actions to get an insight into what's happening.
In browsing other people's code I have seen references to model.action_probability(observation). Unfortunately, this method is not part of Stable-Baselines3 as far as I can tell. The guide for migration from Stable-Baselines 2 to v3 only mentions that it is not implemented [1].
Can you give me a hint on how to go on?
In case anyone comes across this post in the future, this is how you do it for PPO.
import numpy as np
from stable_baselines3.common.utils import obs_as_tensor

def predict_proba(model, state):
    # Convert the (batched) observation to a tensor on the policy's device.
    obs = obs_as_tensor(state, model.policy.device)
    # Query the policy's action distribution for this observation.
    dis = model.policy.get_distribution(obs)
    probs = dis.distribution.probs
    # Detach and move to CPU before converting to NumPy.
    probs_np = probs.detach().cpu().numpy()
    return probs_np
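One caveat (an assumption based on how get_distribution is typically used): the policy expects a batch dimension, so for a flat Box observation you may need something like:
probs = predict_proba(model, observation[np.newaxis, ...])  # add a leading batch dimension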
Regarding this point:
when I later call model.predict(observation) I do get back a number that looks like an action.
You can prevent that behavior with the following line
model.predict(observation, deterministic=True)
When you add deterministic=True, the predicted action will always be the one with the maximum probability, instead of being sampled from the distribution.
Just to give you an example, let's suppose you have the following probabilities:
25% of action A
75% of action B
If you don't use deterministic=True, the model will sample from those probabilities to return a prediction.
If you use deterministic=True, the model will always return action B.
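In code, the difference is just the flag (a minimal illustration using the question's model and observation):
action_sampled, _state = model.predict(observation)                     # sampled from the distribution
action_greedy, _state = model.predict(observation, deterministic=True)  # always the highest-probability action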
I would like to create some RL algorithm in Python where the algorithm interacts with a very big DataFrame representing stock prices. The algorithm would tell us: knowing all of the prices and price changes in the market, what would be the best places to buy/sell (minimizing loss, maximizing reward)? It has to look at the entire DataFrame at each step (or else it wouldn't have the full information from the market).
Is it possible to build such an algorithm (one that works relatively fast on a large DataFrame)? How should it be done? What should my environment look like, which algorithm (specifically) should I use for this type of RL, and which reward system? Where should I start?
I think you are a little confused here. What I think you want to do is to predict whether the stock price of a particular company will go up or not (or which company's stock price will shoot up), given a dataset you already have for the problem.
As for RL: it does not simply run on a dataset. It is a technique that enables an agent to learn in an interactive environment by trial and error, using feedback from its own actions and experiences.
You can check this blog post for an explanation of the different types of machine learning algorithms, so you don't get confused:
https://towardsdatascience.com/types-of-machine-learning-algorithms-you-should-know-953a08248861
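That said, if you do want to experiment with RL on historical prices, the usual approach is to wrap the DataFrame in a Gym environment so the agent interacts with it step by step rather than seeing the whole table at once. A rough sketch (class name, window size and reward rule are all illustrative assumptions, not a recommendation):
import gym
import numpy as np
import pandas as pd
from gym import spaces

class PriceEnv(gym.Env):
    """Toy environment that steps through a price DataFrame row by row."""

    def __init__(self, df: pd.DataFrame, window: int = 30):
        super().__init__()
        self.df = df
        self.window = window
        self.action_space = spaces.Discrete(3)  # 0 = hold, 1 = buy, 2 = sell
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf,
            shape=(window, df.shape[1]), dtype=np.float32)
        self.t = window

    def _obs(self):
        # The agent sees only the most recent `window` rows, not the whole table.
        return self.df.iloc[self.t - self.window:self.t].to_numpy(np.float32)

    def reset(self):
        self.t = self.window
        return self._obs()

    def step(self, action):
        # Hypothetical reward: next price change, signed by the chosen action.
        change = float(self.df.iloc[self.t, 0] - self.df.iloc[self.t - 1, 0])
        reward = change if action == 1 else -change if action == 2 else 0.0
        self.t += 1
        done = self.t >= len(self.df)
        return self._obs(), reward, done, {}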
I'm trying to apply the concept of a digital twin and would like to update my CAD model in CATIA using real-time data.
For example, a servo motor CAD model in CATIA/SolidWorks would continuously receive data such as speed and acceleration, and I would be able to see the kinematics in CATIA/SolidWorks, like the end result of this video: https://www.youtube.com/watch?v=tbVXumMtH1A . I would also like to see the stress on the motor parts, like the end result of this video: https://youtu.be/9glRJyWWXZw
I want to do all of this using a script that sends commands to CATIA/SolidWorks, which updates the model and returns updated parameters. Since I want it to work continuously, the state of the model must be preserved between consecutive commands.
From all the information I have come across online, I'm pretty sure this is possible, but I can't figure out how to do it. I've tried using pycatia, but the documentation only covers very basic functionality, nothing about analysis or simulations.
Within your kinematic CATIA Product you could add parameters that define the elements whose attributes should change. You could then use pycatia to update those parameters, which in turn would update the kinematic model; a rough sketch is below.
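A minimal sketch of the parameter-update loop (hedged: the parameter name "MotorSpeed" and the exact attribute for setting a value depend on your model and pycatia version, so treat these as assumptions to verify against the pycatia docs):
from pycatia import catia

caa = catia()                      # connect to the running CATIA session
document = caa.active_document     # the currently open document
part = document.part               # assumption: a Part document is active; a Product would use document.product
parameters = part.parameters

# Assumed user parameter that drives the kinematics; create it in CATIA first.
speed = parameters.item("MotorSpeed")
speed.value = 1500                 # assumption: the parameter type exposes .value
part.update()                      # re-evaluate the model with the new value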
Not sure if it is still relevant, but pySW helps you update a predefined CAD model by updating its equation manager.
Dask.distributed supports work stealing, which can speed up the computation and make it more robust; however, it means a task can be run more than once.
Here I am asking for a way to "tidy up" the results of workers that did not contribute to the final result. To illustrate what I am asking for:
Let's assume each worker is doing Monte Carlo-like simulations and saves its ~10 GB simulation result in a results folder. In case of work stealing, the simulation result will be stored several times, making it desirable to keep only one copy. What would be the best way to achieve this? Can dask.distributed automatically call some "tidy up" procedure on tasks that did not end up contributing to the final result?
Edit:
I currently start the simulation using the following code:
c = distributed.Client(myserver)
mytask.compute(get = c.get) #mytask is a delayed object
So I guess that afterwards all data is deleted from the cluster, and if I "look at data that exists in multiple locations" after the computation, it is not guaranteed that I can find the respective tasks? Also, I currently have no clear idea how to map the ID of the future object to the filename to which the respective task saved its results. I currently rely on tempfile to avoid name collisions, which, given the setting of a Monte Carlo simulation, is by far the easiest.
There is currently no difference in Dask between a task that was stolen and a task that was not.
If you want you can look at data that exists in multiple locations and then send commands directly to those workers using the following operations:
http://distributed.readthedocs.io/en/latest/api.html#distributed.client.Client.who_has
http://distributed.readthedocs.io/en/latest/api.html#distributed.client.Client.run
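As a rough sketch of how those two calls could be combined (assumptions: the results are still held on the cluster, e.g. via persist rather than a plain compute, and the result filename can be derived from the task key; neither is something Dask gives you for free):
import os
from distributed import Client

client = Client(myserver)

# Map each key to the workers that currently hold a copy of its result.
locations = client.who_has()  # {key: [worker_address, ...]}

# Keys present on more than one worker were likely stolen and run twice.
duplicated = {key: addrs for key, addrs in locations.items() if len(addrs) > 1}

def remove_result(filename):
    # Hypothetical cleanup: delete this worker's local copy of the result file.
    path = os.path.join("results", filename)
    if os.path.exists(path):
        os.remove(path)

for key, addrs in duplicated.items():
    # Keep the copy on the first worker, tidy up on the rest.
    client.run(remove_result, f"{key}.npy", workers=addrs[1:])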