I am just getting started self-studying reinforcement learning with Stable Baselines 3. My long-term goal is to train an agent to play a specific turn-based board game. Right now, though, I am quite overwhelmed by all the new material.
I have implemented a Gym environment that I can use to play my game manually or by having it pick random actions.
Currently I am stuck trying to get a model to hand me actions in response to an observation. The action space of my environment is Discrete(256). I create the model with the environment as model = PPO('MlpPolicy', env, verbose=1). When I later call model.predict(observation) I do get back a number that looks like an action. When run repeatedly I get different numbers, which I assume is to be expected with an untrained model.
Unfortunately, in my game most of the actions are illegal in most states, and I would like to filter them out and pick the best legal one. Or simply dump the output for all the actions to get some insight into what's happening.
While browsing other people's code I have seen references to model.action_probability(observation). Unfortunately, as far as I can tell, that method is not part of Stable Baselines 3. The guide for migrating from Stable Baselines 2 to v3 only mentions that it is not implemented [1].
Can you give me a hint on how to go on?
In case anyone comes across this post in the future, this is how you do it for PPO.
import numpy as np
from stable_baselines3.common.utils import obs_as_tensor  # note: lives in common.utils, not common.policies

def predict_proba(model, state):
    # state must include a batch dimension, e.g. observation[None] for a single observation
    obs = obs_as_tensor(state, model.policy.device)
    dis = model.policy.get_distribution(obs)
    probs = dis.distribution.probs
    probs_np = probs.detach().cpu().numpy()  # .cpu() so this also works when training on GPU
    return probs_np
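You can then use those probabilities to filter out illegal actions, as asked above. A rough usage sketch; get_legal_actions() is a hypothetical placeholder for however your environment reports which actions are legal:
probs = predict_proba(model, observation[None])[0]  # add a batch dimension, then strip it again
legal_actions = env.get_legal_actions()  # placeholder: e.g. a list of legal action indices
masked = np.zeros_like(probs)  # probabilities are non-negative, so zeroing works as a mask
masked[legal_actions] = probs[legal_actions]
best_legal_action = int(np.argmax(masked))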
About this point:
"When I later call model.predict(observation) I do get back a number that looks like an action."
You can prevent that behavior with the following line
model.predict(observation, deterministic=True)
When you add deterministic=True, the predicted action will always be the one with the maximum probability, instead of being sampled according to the probabilities.
Just to give you an example, let's suppose you have the following probabilities:
25% of action A
75% of action B
If you don't use deterministic=True, the model will sample an action according to those probabilities, so you could get either one.
If you use deterministic=True, the model will always return action B.
So I am doing this homework in one of the Stanford courses, and I managed to solve all the questions, but I am trying to understand the last one. Correct me if I am wrong:
One part says build the original system: that is, building the model.
The other one is the bake-off: that is, comparing different models to each other to see which one performs best.
Am I correct?
This is the link to the homework: https://www.youtube.com/watch?v=vqNj1dr8-HM
It is at the very end. It is just that these terms are very confusing and new to me.
Thanks in advance.
I need to know the exact steps for building the original system. What does it mean? What is the bake-off?
Backoff means you go back to the (n-1)-gram level to calculate the probabilities when you encounter a word with a probability of 0. So in this case, for the word "sunny" after the context "is a very", you would fall back to the 3-gram model.
The most used scheme is called "stupid backoff", and whenever you go back one level you multiply the score by 0.4. So if "sunny" exists in the 3-gram model, the probability would be 0.4 * P("sunny" | "a very").
You can go all the way back to the unigram model if needed, multiplying by 0.4^n, where n is the number of times you backed off.
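A minimal sketch of stupid backoff, assuming the n-gram counts are stored in a plain dictionary (the counts below are made up for illustration):
counts = {
    ("is", "a", "very", "sunny"): 0,  # unseen 4-gram
    ("a", "very", "sunny"): 3,        # seen 3-gram
    ("a", "very"): 12,                # 3-gram context count
}

def stupid_backoff(word, context, counts, alpha=0.4):
    # Score the longest available n-gram, multiplying by alpha for each backoff step.
    score = 1.0
    while context:
        ngram = context + (word,)
        if counts.get(ngram, 0) > 0 and counts.get(context, 0) > 0:
            return score * counts[ngram] / counts[context]
        context = context[1:]  # drop the leftmost word: back off one level
        score *= alpha
    return 0.0  # a full implementation would end with a unigram estimate here

# Backs off once: 0.4 * count("a very sunny") / count("a very") = 0.4 * 3/12 = 0.1
print(stupid_backoff("sunny", ("is", "a", "very"), counts))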
This is for an assignment. I have ~100 publications on a dummy website. Each publication has been given an artificial score indicating how successful it is. I need to predict what affects that value.
So far, I've scraped the pieces of information that might affect the score and saved them into individual lists. The relevant part of my loop looks like:
for publicationurl:
    Predictor 1 = 3.025
    Predictor 2 = Journal A
    Predictor 3 = 0
    Response Variable = 42.5
    Title = Sentence
    Abstract = Paragraph
I can resolve most of that by putting predictors 1-3 and the response into a dataframe and then doing regression. The bit that is tripping me up is the Title and Abstract text. I can strip them of punctuation and remove stopwords, but after that I'm not sure how to actually analyse them alongside the other predictors. I was looking into doing some kind of text-similarity comparison between high-high and high-low scoring pairs and judging whether the title and abstract affect the score based on that, but I'm hoping there is a much neater method that allows me to actually put that text into a predictive model too.
I currently have 5 predictors besides the text, and there are ~40,000 words in total across all titles and abstracts, if any of that affects what kind of method works best. Ideally I'd like to end up being able to put everything into a single predictive model, but any method that leads me to a working solution is good.
This would be an ideal situation for using Multinomial Naive Bayes. It is a relatively simple yet quite powerful method for classifying text. If this is an introductory exercise, I'm 99% sure your prof is expecting something with NB to solve the given problem.
I would recommend a library like sklearn, which should make the task almost trivial. If you're interested in the intuition behind NB, this YouTube video should serve as a good introduction.
Start off by going over some examples/blog posts; Google should provide you with countless examples. Then modify the code to fit your use case.
You could group the articles into two classes, e.g. score <= 5 = bad, score > 5 = good. A next step would be to predict more than two classes, as explained here.
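As a starting point, here is a rough sketch of how the text could feed an sklearn pipeline; the file name, column names, and score threshold are all made up for illustration:
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

df = pd.read_csv("publications.csv")  # hypothetical layout: score, title, abstract, ...
df["label"] = (df["score"] > 5).astype(int)  # e.g. good (1) vs. bad (0)

# Vectorize each text column separately; the remaining numeric predictors could be
# added here too (scaled to be non-negative, since MultinomialNB requires that).
features = ColumnTransformer([
    ("title", TfidfVectorizer(stop_words="english"), "title"),
    ("abstract", TfidfVectorizer(stop_words="english"), "abstract"),
])

model = Pipeline([("features", features), ("nb", MultinomialNB())])
model.fit(df[["title", "abstract"]], df["label"])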
I am trying to look at the running mean and running variance of a trained tensorflow model that is exported via GCMLE (saved_model.pb, assets/* & variables/*). Where are these values kept in the graph? I can access gamma/beta values from tf.GraphKeys.TRAINABLE_VARIABLES but I have not been able to find the running mean and running variance in any of the tf.GraphKeys.MODEL_VARIABLES. Are the running mean and running variance stored somewhere else?
I know that at test time (i.e. Modes.EVAL), the running mean and running variance are used to normalize the incoming data, and the normalized data is then scaled and shifted using gamma and beta. I am trying to look at all of the variables that I need at inference time, but I cannot find the running mean and running variance. Are these only used at test time and not at inference time (Modes.PREDICT)? If so, that would explain why I can't find them in the exported model, but I am expecting them to be there.
Based on tf.GraphKeys I have tried other things like tf.GraphKeys.MOVING_AVERAGE_VARIABLES, but they are also empty. I also saw this line in the batch_normalization documentation: "Note: when training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op." So I then tried looking at tf.GraphKeys.UPDATE_OPS from my saved model, and they contain an assign op batch_normalization/AssignMovingAvg:0, but it is still not clear where I would get the value from.
It appears that the moving mean and moving variance are stored within tf.GraphKeys.GLOBAL_VARIABLES, and it looks like the reason nothing showed up in MODEL_VARIABLES is that you need to use tf.contrib.framework.local_variable.
In addition to @reese0106's answer, if you'd like to take out the moving_mean and moving_variance for BatchNorm, you can index them by name as follows.
import tensorflow as tf

vars = tf.global_variables()  # every variable in the graph
vars_moving_mean_variance = []
for var in vars:
    if ("moving_mean" in var.name) or ("moving_variance" in var.name):
        vars_moving_mean_variance.append(var)
print(vars_moving_mean_variance)
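To read out the actual values rather than the variable objects, you can run them in a session. A sketch, assuming the model has been restored from a checkpoint (the path is a placeholder):
with tf.Session() as sess:
    tf.train.Saver().restore(sess, "path/to/checkpoint")  # placeholder path
    values = sess.run(vars_moving_mean_variance)
    for var, value in zip(vars_moving_mean_variance, values):
        print(var.name, value)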
p.s. Thanks for the question and the answer. I solved my own problem too.
Not sure if the title makes complete sense so sorry about that.
I'm new to Machine Learning and I'm using Scikit and decision trees.
Here's what I want to do: I want to take all of my inputs and include a unique feature, a client ID. Now, the client ID is unique and can't be aggregated the way a normal feature would be in decision tree analysis. What's happening now is that the tree is treating the client IDs as any other integer value and then branching on them, saying, for instance, that client IDs less than 430 go down a different path than those over 430. This isn't correct and not what I want. What I want is to make the decision tree understand that this specific field can't be analysed in such a way and that each client gets their own branch. Is this possible with decision trees?
I do have a couple of workarounds. One would be to develop a separate decision tree for each client, but training those would be a nightmare. Another would be, if we have say 800 clients, to create 800 features as a bit field, but this is also crazy.
This is a fairly common problem in machine learning. A machine learning feature can't be unique to each instance in any case. Intuitively it makes sense: the algorithm can't learn anything from a feature it can't extrapolate from.
What you can do is simply separate that piece of information out before you pass the rest of the features to the decision tree, and re-merge the ID with the prediction after it is made.
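For example, a rough sketch with pandas; the file and column names are hypothetical:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("clients.csv")  # placeholder; assumes 'client_id' and 'target' columns
X = df.drop(columns=["client_id", "target"])  # keep the ID out of the features
y = df["target"]

tree = DecisionTreeClassifier()
tree.fit(X, y)

# Re-merge the IDs with the predictions afterwards.
results = pd.DataFrame({"client_id": df["client_id"], "prediction": tree.predict(X)})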
I would strongly discourage any kind of manipulation of the feature vector to include the ID in any form. Features should only be things you want the algorithm to use to make decisions. Don't give it information you don't want it to use. You're right to avoid using the ID as a feature because (most likely) the ID has no bearing on whatever you're trying to predict.
If you do want individual models (and have enough data for each user to build them), it's not as big a pain as you might think. You can use Scikit's model saving feature and this answer on saving pickles to MySQL to easily create and store personalized models. Unless you have a very large number of users, creating personalized decision trees shouldn't take very long.
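A hedged sketch of that per-client approach, using joblib for persistence (storing the pickles in MySQL would follow the linked answer); df and the column names are the same hypothetical ones as in the sketch above:
from joblib import dump, load
from sklearn.tree import DecisionTreeClassifier

# Train and persist one model per client.
for client_id, group in df.groupby("client_id"):
    model = DecisionTreeClassifier()
    model.fit(group.drop(columns=["client_id", "target"]), group["target"])
    dump(model, f"model_{client_id}.joblib")

# Later: load a specific client's model on demand.
model_430 = load("model_430.joblib")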
I am trying to create an image classifier based on RandomTrees from OpenCV (version 2.4). Right now, I initialize everything like this:
self.model = cv.RTrees()

max_num_trees = 10
max_error = 1
max_d = 5
criteria = cv.TERM_CRITERIA_MAX_ITER + cv.TERM_CRITERIA_EPS
parameters = dict(max_depth=max_d, min_sample_count=5, use_surrogates=False,
                  nactive_vars=0, term_crit=(criteria, max_num_trees, max_error))

self.model.train(dataset, cv.CV_ROW_SAMPLE, responses, params=parameters)
I did it by looking at this question. The only problem is that whatever I change in the parameters, the classification always stays the same (and wrong). Since the Python documentation on this is very scarce, I have no choice but to ask here what to do and how to check what I am doing. How do I get the number of trees it generates and all the other things that are explained for C++ but not for Python, like the train error? For example, I tried:
self.model.tree_count
self.model.get_tree_count()
but got an error every time. Also, am I doing the termination criteria initialization correctly?