Can I create a contextual Multi-Armed Bandit Agent in SB3?

Can I create a contextual Multi-Armed Bandit Agent in SB3? - python

I wonder if it is possible to create an agent equivalent to a contextual Multi-Armed Bandit using the SB3 library.
It seems to me a much simpler agent, but checking the library documentation they say they don't cover that kind of algorithm, and I wonder if it is possible to create a similar agent (without trajectory interpretation) by tuning one of the existing agents.
My first approach was to use any agent by assigning a value of gamma=0, but I think that would not be mathematically correct.

Related

How to set your own value function in Reinforecement learning?

I am new to using reinforcement learning, I only read the first few chapters in R.Sutton (so I have a small theoretical background).
I try to solve a combinatorial optimization problem which can be broken down to:
I am looking for the optimal configuration of points (qubits) on a grid (quantum computer).
I already have a cost function to qualify a configuration. I also have a reward function.
Right now I am using simulated annealing, where I randomly move a qubit or swap two qubits.
However, this ansatz is not working well for more than 30 qubits.
That's why I thought to use a policy, which tells me which qubit to move/swap instead of doing it randomly.
Reading the gym documentation, I couldn't find what option I should use. I don't need Q-Learning or deep reinforcement learning as far as I understood since I only need to learn a policy?
I would also be fine using Pytorch or whatever. With this little amount of information, what do you recommend to chose? More importantly, how can I set my own value function?

There are two categories of RL algorithms.
One category like Q-learning, Deep Q-learning and other ones learn a value function that for a state and an action predicts the estimated reward that you will get. Then, once you know for each state and each action what the reward is, your policy is simply to select for each state the action that provides the biggest reward. Thus, in the case of these algorithms, even if you learn a value function, the policy depends on this value function.
Then, you have other deep rl algorithms where you learn a policy directly, like Reinforce, Actor Critic algorithms or other ones. You still learn a value function, but at the same time you also learn a policy with the help of the value function. The value function will help the system learn the policy during training, but during testing you do not use the value function anymore, but only the policy.
Thus, in the first case, you actually learn a value function and act greedy on this value function, and in the second case you learn a value function and a policy and then you use the policy to navigate in the environment.
In the end, both these algorithms should work for your problem, and if you say that you are new to RL, maybe you could try the Deep Q-learning from the gym documentation.

Merging many statistical methods for Text classification, starting with SVM multiclass classifier

Premise: I am not an expert of Machine Learning/Maths/Statistics. I am a linguist and I am entering the world of ML. Please when answering, try to be the more explicit you can.
My problem: I have 3000 expressions containing some aspects (or characteristics, or features) that users usually review in online reviews. These expressions are recognized and approved by human beings and experts.
Example: “they play a difficult role”
The labels are: Acting (referring to the act of acting and also to actors), Direction, Script, Sound, Image.
The goal: I am trying to classify these expressions according to their aspects.
My system: I am using SkLearn and Python under a Jupyter environment.
Technique used until now:
I built a bag-of-words matrix (so I kept track of the
presence/absence of – stemmed - words for each expression) and
I applied a SVM multiclass classifier with kernel RBF and C = 1 (or I
tuned according to the final accuracy.). The code used is this one from
https://www.geeksforgeeks.org/multiclass-classification-using-scikit-learn/
First attempt showed 0.63 of accuracy. When I tried to create more labels from the class Script accuracy went down to 0.50. I was interested in doing that because I have some expressions that for sure describe the plot or the characters.
I think that the problem is due to the presence of some words that are shared among these aspects.
I searched for a solution to improve the model. I found something called “learning curve”. I use the official code provided by sklearn documentation http://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html .
The result is like the second picture (the right one). I can't understand if it is good or not.
In addition to this, I would like to:
import the expressions from a text file. For the moment I have
just created an array and put inside the expressions and I don't
feel so comfortable.
find a way, if it possible, to communicate to the system that there are some words that are very specific / important to an Aspect and help it to improve the classification.
How can I do this? I read that in some works researchers have used more systems... How should I handle this? From where can I retrieve the resulting numbers from the first system to use them in the second one?
I would like to underline that there are some expressions, verbs, nouns, etc. that are used a lot in some contexts and not in others. There are some names that for sure are names of actors and not directors, for example. In the future I would like to add more linguistic pieces of information to the system and trying to improve it.
I hope to have expressed myself in an enough clear way and to have used an appropriate and understandable language.

Which model to use when mixed-effects, random-effects added regression is needed

So mixed-effects regression model is used when I believe that there is dependency with a particular group of a feature. I've attached the Wiki link because it explains better than me. (https://en.wikipedia.org/wiki/Mixed_model)
Although I believe that there are many occasions in which we need to consider the mixed-effects, there aren't too many modules that support this.
R has lme4 and Python seems to have a similar module, but they are both statistic driven; they do not use the cost function algorithm such as gradient boosting.
In Machine Learning setting, how would you handle the situation that you need to consider mixed-effects? Are there any other models that can handle longitudinal data with mixed-effects(random-effects)?
(R seems to have a package that supports mixed-effects: https://rd.springer.com/article/10.1007%2Fs10994-011-5258-3
But I am looking for a Python solution.

There are, at least, two ways to handle longitudinal data with mixed-effects in Python:
StatsModel for linear mixed effects;
MERF for mixed effects random forest.
If you go for StatsModel, I'd recommend you to do some of the examples provided here. If you go for MERF, I'd say that the best starting point is here.
I hope it helps!

Implementing Policy iteration methods in Open AI Gym

I am currently reading "Reinforcement Learning" from Sutton & Barto and I am attempting to write some of the methods myself.
Policy iteration is the one I am currently working on. I am trying to use OpenAI Gym for a simple problem, such as CartPole or continuous mountain car.
However, for policy iteration, I need both the transition matrix between states and the Reward matrix.
Are these available from the 'environment' that you build in OpenAI Gym.
I am using python.
If not, how do I calculate these values, and use the environment?

No, OpenAI Gym environments will not provide you with the information in that form. In order to collect that information you will need to explore the environment via sampling: i.e. selecting actions and receiving observations and rewards. With these samples you can estimate them.
One basic way to approximate these values is to use LSPI (least square policy iteration), as far as I remember, you will find more about this in Sutton too.

See these comments at toy_text/discrete.py:
P: transitions (*)
(*) dictionary dict of dicts of lists, where
P[s][a] == [(probability, nextstate, reward, done), ...]

What is the optimal topic-modelling workflow with MALLET?

Introduction
I'd like to know what other topic modellers consider to be an optimal topic-modelling workflow all the way from pre-processing to maintenance. While this question consists of a number of sub-questions (which I will specify below), I believe this thread would be useful for myself and others who are interested to learn about best practices of end-to-end process.
Proposed Solution Specifications
I'd like the proposed solution to preferably rely on R for text processing (but Python is fine also) and topic-modelling itself to be done in MALLET (although if you believe other solutions work better, please let us know). I tend to use the topicmodels package in R, however I would like to switch to MALLET as it offers many benefits over topicmodels. It can handle a lot of data, it does not rely on specific text pre-processing tools and it appears to be widely used for this purpose. However some of the issues outline below are also relevant for topicmodels too. I'd like to know how others approach topic modelling and which of the below steps could be improved. Any useful piece of advice is welcome.
Outline
Here is how it's going to work: I'm going to go through the workflow which in my opinion works reasonably well, and I'm going to outline problems at each step.
Proposed Workflow
1. Clean text
This involves removing punctuation marks, digits, stop words, stemming words and other text-processing tasks. Many of these can be done either as part of term-document matrix decomposition through functions such as for example TermDocumentMatrix from R's package tm.
Problem: This however may need to be performed on the text strings directly, using functions such as gsub in order for MALLET to consume these strings. Performing in on the strings directly is not as efficient as it involves repetition (e.g. the same word would have to be stemmed several times)
2. Construct features
In this step we construct a term-document matrix (TDM), followed by the filtering of terms based on frequency, and TF-IDF values. It is preferable to limit your bag of features to about 1000 or so. Next go through the terms and identify what requires to be (1) dropped (some stop words will make it through), (2) renamed or (3) merged with existing entries. While I'm familiar with the concept of stem-completion, I find that it rarely works well.
Problem: (1) Unfortunately MALLET does not work with TDM constructs and to make use of your TDM, you would need to find the difference between the original TDM -- with no features removed -- and the TDM that you are happy with. This difference would become stop words for MALLET. (2) On that note I'd also like to point out that feature selection does require a substantial amount of manual work and if anyone has ideas on how to minimise it, please share your thoughts.
Side note: If you decide to stick with R alone, then I can recommend the quanteda package which has a function dfm that accepts a thesaurus as one of the parameters. This thesaurus allows to to capture patterns (usually regex) as opposed to words themselves, so for example you could have a pattern \\bsign\\w*.?ups? that would match sign-up, signed up and so on.
3. Find optimal parameters
This is a hard one. I tend to break data into test-train sets and run cross-validation fitting a model of k topics and testing the fit using held-out data. Log likelihood is recorded and compared for different resolutions of topics.
Problem: Log likelihood does help to understand how good is the fit, but (1) it often tends to suggest that I need more topics than it is practically sensible and (2) given how long it generally takes to fit a model, it is virtually impossible to find or test a grid of optimal values such as iterations, alpha, burn-in and so on.
Side note: When selecting the optimal number of topics, I generally select a range of topics incrementing by 5 or so as incrementing a range by 1 generally takes too long to compute.
4. Maintenance
It is easy to classify new data into a set existing topics. However if you are running it over time, you would naturally expect that some of your topics may cease to be relevant, while new topics may appear. Furthermore, it might be of interest to study the lifecycle of topics. This is difficult to account for as you are dealing with a problem that requires an unsupervised solution and yet for it to be tracked over time, you need to approach it in a supervised way.
Problem: To overcome the above issue, you would need to (1) fit new data into an old set of topics, (2) construct a new topic model based on new data (3) monitor log likelihood values over time and devise a threshold when to switch from old to new; and (4) merge old and new solutions somehow so that the evolution of topics would be revealed to a lay observer.
Recap of Problems
String cleaning for MALLET to consume the data is inefficient.
Feature selection requires manual work.
Optimal number of topics selection based on LL does not account for what is practically sensible
Computational complexity does not give the opportunity to find an optimal grid of parameters (other than the number of topics)
Maintenance of topics over time poses challenging issues as you have to retain history but also reflect what is currently relevant.
If you've read that far, I'd like to thank you, this is a rather long post. If you are interested in the suggest, feel free to either add more questions in the comments that you think are relevant or offer your thoughts on how to overcome some of these problems.
Cheers

Thank you for this thorough summary!
As an alternative to topicmodels try the package mallet in R. It runs Mallet in a JVM directly from R and allows you to pull out results as R tables. I expect to release a new version soon, and compatibility with tm constructs is something others have requested.
To clarify, it's a good idea for documents to be at most around 1000 tokens long (not vocabulary). Any more and you start to lose useful information. The assumption of the model is that the position of a token within a given document doesn't tell you anything about that token's topic. That's rarely true for longer documents, so it helps to break them up.
Another point I would add is that documents that are too short can also be a problem. Tweets, for example, don't seem to provide enough contextual information about word co-occurrence, so the model often devolves into a one-topic-per-doc clustering algorithm. Combining multiple related short documents can make a big difference.
Vocabulary curation is in practice the most challenging part of a topic modeling workflow. Replacing selected multi-word terms with single tokens (for example by swapping spaces for underscores) before tokenizing is a very good idea. Stemming is almost never useful, at least for English. Automated methods can help vocabulary curation, but this step has a profound impact on results (much more than the number of topics) and I am reluctant to encourage people to fully trust any system.
Parameters: I do not believe that there is a right number of topics. I recommend using a number of topics that provides the granularity that suits your application. Likelihood can often detect when you have too few topics, but after a threshold it doesn't provide much useful information. Using hyperparameter optimization makes models much less sensitive to this setting as well, which might reduce the number of parameters that you need to search over.
Topic drift: This is not a well understood problem. More examples of real-world corpus change would be useful. Looking for changes in vocabulary (e.g. proportion of out-of-vocabulary words) is a quick proxy for how well a model will fit.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.