Refit Python's Surprise recommendation system with new data - python

I've built a recommender system using the Python Surprise library.
The next step is to update the algorithm with new data, for example when a new user or a new item is added.
I've dug into the documentation and found nothing for this case. The only option seems to be to train a new model from scratch from time to time.
It looks like I'm missing something, but I can't figure out what exactly.
Can anybody point out how I can refit an existing algorithm with new data?

Unfortunately, Surprise doesn't support partial fitting yet.
In this thread there are some workarounds and forks that implement a partial fit.
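Until a partial fit lands, the usual workaround is simply to refit on the combined data. A minimal sketch with Surprise's SVD, assuming your ratings live in pandas DataFrames with user/item/rating columns (old_ratings_df and new_ratings_df are hypothetical names):

import pandas as pd
from surprise import Dataset, Reader, SVD

# hypothetical DataFrames: the existing ratings plus the newly collected ones
ratings = pd.concat([old_ratings_df, new_ratings_df], ignore_index=True)

# columns must be passed in (user, item, rating) order to load_from_df
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings[['userID', 'itemID', 'rating']], reader)

# rebuild the full trainset and refit the model from scratch
trainset = data.build_full_trainset()
algo = SVD()
algo.fit(trainset)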

How to set your own value function in Reinforcement learning?

I am new to reinforcement learning; I have only read the first few chapters of R. Sutton's book, so I have a small theoretical background.
I am trying to solve a combinatorial optimization problem which can be broken down as follows:
I am looking for the optimal configuration of points (qubits) on a grid (quantum computer).
I already have a cost function to qualify a configuration, and I also have a reward function.
Right now I am using simulated annealing, where I randomly move a qubit or swap two qubits.
However, this ansatz does not work well for more than 30 qubits.
That's why I thought of using a policy which tells me which qubit to move/swap instead of doing it randomly.
Reading the gym documentation, I couldn't find which option I should use. As far as I understand, I don't need Q-learning or deep reinforcement learning, since I only need to learn a policy?
I would also be fine using PyTorch or whatever. Given this small amount of information, what do you recommend choosing? More importantly, how can I set my own value function?
There are two categories of RL algorithms.
One category, like Q-learning, deep Q-learning and others, learns a value function that, for a state and an action, predicts the estimated reward you will get. Then, once you know for each state and each action what the reward is, your policy is simply to select in each state the action that provides the biggest reward. Thus, in the case of these algorithms, even though you learn a value function, the policy is derived from this value function.
Then you have other deep RL algorithms where you learn a policy directly, like REINFORCE, actor-critic algorithms and others. You still learn a value function, but at the same time you also learn a policy with the help of the value function. The value function helps the system learn the policy during training, but during testing you no longer use the value function, only the policy.
Thus, in the first case you actually learn a value function and act greedily on it, and in the second case you learn a value function and a policy and then use the policy to navigate the environment.
In the end, both of these families should work for your problem, and since you say you are new to RL, maybe you could try the deep Q-learning from the gym documentation.
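To make the first family concrete, here is a minimal sketch of tabular Q-learning on a small discrete gym environment (FrozenLake is just a stand-in for your grid problem; the environment name and hyperparameters are illustrative, not tuned):

import numpy as np
import gym

env = gym.make("FrozenLake-v1")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1  # learning rate, discount, exploration

for episode in range(5000):
    state = env.reset()  # older gym API; newer versions return (obs, info)
    done = False
    while not done:
        # epsilon-greedy: mostly act greedily on the current value estimates
        if np.random.rand() < eps:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        # older gym API; newer versions return (obs, reward, terminated, truncated, info)
        next_state, reward, done, _ = env.step(action)
        # Q-learning update: nudge Q(s, a) toward reward + discounted best next value
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

# the greedy policy is just the argmax over the learned value function
policy = np.argmax(Q, axis=1)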

Efficient way to generate Lime explanations for full dataset

I am working on a binary classification problem with 1000 rows and 15 features.
Currently I am using Lime to explain the prediction for each instance.
I use the code below to generate explanations for the full test dataframe:
test_indx_list = X_test.index.tolist()
test_dict = {}
for n in test_indx_list:
    exp = explainer.explain_instance(X_test.loc[n].values, model.predict_proba, num_features=5)
    a = exp.as_list()
    test_dict[n] = a
But this is not efficient. Is there any alternative approach to generate the explanations / get the feature contributions more quickly?
From what the docs show, there isn't currently an option to do a batch explain_instance, although there are plans for it. That should help a lot with speed in future versions.
The change most likely to improve speed is decreasing the number of samples used to fit the local linear model:
explainer.explain_instance(..., num_features=5, num_samples=2500)
The default value for num_samples is 5000, which can be far more than you need depending on your model, and it is currently the argument that most affects the explainer's speed.
Another approach would be to add parallelization to the snippet: run multiple instances of it at the same time and gather the results at the end. It's a more complex solution; for that I leave a link, as it's really not something I can give as a snippet right out of the box.
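Still, a rough sketch of the shape that idea would take with joblib, assuming explainer, X_test, and model are defined as in the question (if your model or explainer doesn't pickle cleanly, switch to the threading backend):

from joblib import Parallel, delayed

def explain_row(n):
    # each row is explained independently, so the calls can run in parallel
    exp = explainer.explain_instance(X_test.loc[n].values, model.predict_proba,
                                     num_features=5, num_samples=2500)
    return n, exp.as_list()

results = Parallel(n_jobs=-1)(delayed(explain_row)(n) for n in X_test.index)
test_dict = dict(results)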

How to train a model in C++ with tensorflow?

I'm trying to train a deep learning model for an experiment.
I found that tensorflow is the best way to do this.
But there is a problem: tensorflow needs to be written in Python.
And my program contains many nested loops, like this:
for i=1~2000
for j=1~2000
I know this is a big drawback for Python.
It's much slower than C.
I know tensorflow has a C++ API, but it's not clear:
https://www.tensorflow.org/api_docs/cc/index.html
(This is the worst specification I have ever looked at.)
Can someone give me an easy example of this?
All I need is two simple pieces of code:
one showing how to create a graph,
and the other showing how to load that graph and run it.
I urgently need this. I hope someone can help me out.
It's not so easy, but it is possible.
First, you need to create the tensorflow graph in Python and save it to a file.
This article may help you:
https://medium.com/jim-fleming/loading-a-tensorflow-graph-with-the-c-api-4caaff88463f#.krslipabt
Second, you need to compile libtensorflow, link it to your program (you need the tensorflow headers as well, so it's a bit tricky) and load the graph from the file.
This article may help you this time:
https://medium.com/jim-fleming/loading-tensorflow-graphs-via-host-languages-be10fd81876f#.p9s69rn7u
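For the first step, a minimal sketch of what "create the graph in Python and save it to a file" can look like, assuming the TensorFlow 1.x graph API that those articles were written against (the node names 'input' and 'output' are illustrative):

import tensorflow as tf

# build a trivial graph: y = W*x + b, with named input and output nodes
x = tf.placeholder(tf.float32, shape=[None, 1], name='input')
W = tf.Variable(tf.ones([1, 1]), name='W')
b = tf.Variable(tf.zeros([1]), name='b')
y = tf.add(tf.matmul(x, W), b, name='output')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # freeze the variables into constants so the graph file is self-contained
    graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ['output'])
    # write a binary protobuf that the C++ side can then load by node name
    tf.train.write_graph(graph_def, '.', 'graph.pb', as_text=False)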

Sklearn - model persistence without pkl file

I'm interested in saving a model created in sklearn (e.g., EmpiricalCovariance, MinCovDet or OneClassSVM) and re-applying it later on.
I'm familiar with the option of saving a PKL file with joblib; however, I would prefer to save the model explicitly rather than as a serialized Python object.
The main motivation for this is that it makes the model parameters easy to view.
I found one reference to doing this:
http://thiagomarzagao.com/2015/12/07/model-persistence-without-pickles/
The questions are:
Can I count on this working over time (i.e., with new versions of sklearn)? Is this too much of a "hacky" solution?
Does anyone have experience doing this?
Thanks,
Jonathan
I don't think it's a hacky solution. A colleague of mine has done a similar thing, exporting a model to be consumed by a scorer written in Go, which is much faster than the scikit-learn scorer. If you're worried about compatibility with future versions of sklearn, you should consider using an environment manager like conda or virtualenv; in any case, this is just good software engineering practice and something you should get used to anyway.
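As a minimal sketch of the explicit-export approach along the lines of the linked post, for EmpiricalCovariance (the fitted attribute names here are sklearn's; whether they stay stable across versions is exactly the risk you raise):

import json
import numpy as np
from sklearn.covariance import EmpiricalCovariance

# fit on some data
X = np.random.RandomState(0).randn(100, 3)
model = EmpiricalCovariance().fit(X)

# export the fitted attributes as plain, human-readable JSON
params = {'location_': model.location_.tolist(),
          'covariance_': model.covariance_.tolist()}
with open('model.json', 'w') as f:
    json.dump(params, f)

# later: rebuild an estimator from the stored parameters
with open('model.json') as f:
    params = json.load(f)
restored = EmpiricalCovariance()
restored.location_ = np.asarray(params['location_'])
restored.covariance_ = np.asarray(params['covariance_'])
restored.precision_ = np.linalg.pinv(restored.covariance_)
# depending on the sklearn version you may also need to set n_features_in_

# the restored model scores like the original
print(restored.mahalanobis(X[:5]))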

Solving ODEs on networks with PyDSTool

After using scipy.integrate for a while, I am at the point where I need more functionality, like bifurcation analysis or parameter estimation. This is why I'm interested in PyDSTool, but from the documentation I can't figure out how to work with ModelSpec, or whether it is actually what will lead me to the solution.
Here is a toy example of what I am trying to do: I have a network with two nodes, both having the same (SIR) dynamics, described by two ODEs each, but different initial conditions. The equations are coupled between the nodes via the epsilons (see formula below).
The formulas are in a picture for better readability; the 'n' and 'm' are indices, not exponents ~>
http://image.noelshack.com/fichiers/2014/28/1404918182-odes.png
(could not use the upload on Stack, sadly)
In the two-node case my code (using PyDSTool) looks like this:
# multiple SIR metapopulations
# parameter and initial condition definition; a dict is a must
import PyDSTool as pdt

params = {'alpha': 0.7, 'beta': 0.1, 'epsilon1': 0.5, 'epsilon2': 0.5}
ini = {'s1': 0.99, 's2': 1, 'i1': 0.01, 'i2': 0.00}

DSargs = pdt.args(name='SIRtest_multi',
                  ics=ini,
                  pars=params,
                  tdata=[0, 20],
                  # the for-macro generates formulas for s1, s2 and i1, i2;
                  # sum works similarly but sums over the expressions in it
                  varspecs={'s[o]': 'for(o,1,2,-alpha*s[o]*sum(k,1,2,epsilon[k]*i[k]))',
                            'i[l]': 'for(l,1,2,alpha*s[l]*sum(m,1,2,epsilon[m]*i[m]))'})

# generator
DS = pdt.Generator.Vode_ODEsystem(DSargs)

# computation; a trajectory object is generated
trj = DS.compute('test')

# extraction of the points for plotting
pts = trj.sample()

# plotting; pylab is imported along with PyDSTool as plt
pdt.plt.plot(pts['t'], pts['s1'], label='s1')
pdt.plt.plot(pts['t'], pts['i1'], label='i1')
pdt.plt.plot(pts['t'], pts['s2'], label='s2')
pdt.plt.plot(pts['t'], pts['i2'], label='i2')
pdt.plt.legend()
pdt.plt.xlabel('t')
pdt.plt.show()
But in my original problem there are more than 1000 nodes with 5 ODEs each, every node is coupled to a different number of other nodes, and the epsilon values are not equal for all nodes. So tinkering with this syntax has not led me anywhere near a solution yet.
What I am actually thinking of is a way to construct separate sub-models/solvers(?) for every node, each having its own parameters (epsilons, since they differ per node), and then to link them to each other. This is the point where I do not know whether it is possible in PyDSTool, and whether it is the right way to handle this kind of problem.
I looked through the examples and the docs of PyDSTool but could not figure out how to do it, so help is very much appreciated! If the way I'm trying to do things is unorthodox or plain stupid, you are welcome to suggest how to do it more efficiently. (Which is actually the more efficient/faster/better way to solve problems like this: subdividing into many small (still not decoupled) models/solvers, or one model containing all the ODEs at once?)
(I'm neither a mathematician nor a programmer, but willing to learn, so please be patient!)
The solution is definitely not to build separate simulation models. That won't work, because so many variables will be continuously coupled between the sub-models. You absolutely must have all the ODEs together in one place.
It sounds like the solution you need is the ModelSpec object constructs. These let you build the sub-model definitions hierarchically out of symbolic pieces. The pieces can have their own "epsilon" parameters, etc. You declare all the pieces, and when you're finished you let PyDSTool create the final strings containing the ODE definitions for you. I suggest you look at the tutorial example at:
http://www.ni.gsu.edu/~rclewley/PyDSTool/Tutorial/Tutorial_compneuro.html
and at the provided examples ModelSpec_test.py and MultiCompartments.py. But remember that you still need a source for the parameter and coupling data (i.e., a big matrix or dictionary loaded from a file) to be able to automate the process of building the model; otherwise you'd still be writing it all out by hand.
You have to build some classes for the components that you want to have. You might also create a factory function (compare 'makeSoma' in the neuralcomp.py toolbox) that takes all your sub-components and creates an ODE by summing something up from each of the declared components. At the end, you can refer to the parameters by their position in the hierarchy: one might be 's1.epsilon' while another might be 'i4.epsilon'.
Unfortunately, to build models like this efficiently you will have to learn to do some more complex programming! So start by understanding all the steps in the tutorial. You can contact me directly through the SourceForge support discussions or by email once you've gotten started and have specific questions.
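If ModelSpec feels like too much machinery at first, the same automation can also be had by building the varspec strings programmatically against the Generator interface already used in the question. This is not ModelSpec, just plain Python string assembly; the epsilons and adjacency list below are made up for illustration:

import PyDSTool as pdt

N = 4  # number of nodes; in the real problem this would be ~1000
epsilons = [0.5, 0.3, 0.2, 0.4]          # per-node coupling strengths (illustrative)
neighbours = [[1, 2], [0], [0, 3], [2]]  # adjacency list: who couples to whom

params = {'alpha': 0.7, 'beta': 0.1}
ini, varspecs = {}, {}
for n in range(N):
    params['epsilon%d' % n] = epsilons[n]
    ini['s%d' % n] = 0.99
    ini['i%d' % n] = 0.01
    # force of infection on node n: sum over its neighbours only,
    # mirroring the coupling term in the two-node varspecs above
    coupling = '+'.join('epsilon%d*i%d' % (m, m) for m in neighbours[n])
    varspecs['s%d' % n] = '-alpha*s%d*(%s)' % (n, coupling)
    varspecs['i%d' % n] = 'alpha*s%d*(%s)' % (n, coupling)

DSargs = pdt.args(name='SIR_network', ics=ini, pars=params,
                  tdata=[0, 20], varspecs=varspecs)
DS = pdt.Generator.Vode_ODEsystem(DSargs)
pts = DS.compute('test').sample()

Here the per-node epsilon values and the network structure enter only through the coupling string, so loading them from a file scales the same loop to any number of nodes.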
