Using pybrain optimization algorithm to solve search problems - python

I recently started using the pybrain library for classification problems with neural networks and, with some struggle and help from the documentation, I made it work.
Now I would like to use the black-box optimization algorithms from the same library, but not applied to classification.
Basically, I am trying to reproduce some results from Randy's blog http://www.randalolson.com/2015/02/03/heres-waldo-computing-the-optimal-search-strategy-for-finding-waldo/.
So, as a first step, I constructed a supervised dataset with the following snippet:
ds = SupervisedDataSet(2, 2)
for row in range(len(waldo_df)):
    ds.addSample(inp=waldo_df.iloc[row][['Book', 'Page']], target=waldo_df.iloc[row][['X', 'Y']])
return ds
Now, one sample from the dataset looks like:
ds.getSample()
[array([ 5., 8.]), array([ 3.51388889, 4.31944444])]
As the next step, I would like to use the HillClimber algorithm to find the optimal path:
ef = ds.evaluateModuleMSE
init_value = ds.getSample()
learner = HillClimber(evaluator=ef, initEvaluable=init_value, minimize=True)
learner.learn()
What I get back is an exception:
/Users/maestro/anaconda/lib/python2.7/site-packages/pybrain/datasets/supervised.pyc in evaluateModuleMSE(self, module, averageOver, **args)
96 res = 0.
97 for dummy in range(averageOver):
---> 98 module.reset()
99 res += self.evaluateMSE(module.activate, **args)
100 return res/averageOver
AttributeError: 'numpy.ndarray' object has no attribute 'reset'
Can someone help me figure out what I am doing wrong? The documentation on this is very sparse and even searching through the code base did not help.
Thanks
P.S. If I am reading the API correctly
class pybrain.optimization.HillClimber(evaluator=None, initEvaluable=None, **kwargs)
The simplest kind of stochastic search: hill-climbing in the fitness landscape.
the optimization algorithm only needs an evaluator, which in my case would be ds.evaluateModuleMSE.
Update
The whole code snippet is:
import pandas as pd
from pybrain.optimization import HillClimber
from pybrain.datasets import SupervisedDataSet
waldo_df = pd.read_csv('whereis-waldo-locations.csv')
ds = SupervisedDataSet(2, 2)
for row in range(len(waldo_df)):
    ds.addSample(inp=waldo_df.iloc[row][['Book', 'Page']], target=waldo_df.iloc[row][['X', 'Y']])
learner = HillClimber(evaluator=ds.evaluateModuleMSE, initEvaluable=ds.getSample(), minimize=True)
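The traceback shows that evaluateModuleMSE calls module.reset() and module.activate, so it expects a network-like module, whereas ds.getSample() returns plain arrays. To illustrate the general contract (an evaluator maps a candidate to a score, and the climber perturbs the candidate), here is a minimal pure-Python sketch. It is not pybrain's implementation; the names hill_climb and squared_distance, the step size, and the toy objective are all assumptions made for illustration.

```python
import random


def hill_climb(evaluator, init, steps=2000, sigma=0.1, seed=0):
    """Minimize evaluator(candidate) by keeping random perturbations that improve it."""
    rng = random.Random(seed)
    best = list(init)
    best_score = evaluator(best)
    for _ in range(steps):
        # Propose a neighbour by adding Gaussian noise to every coordinate.
        candidate = [x + rng.gauss(0.0, sigma) for x in best]
        score = evaluator(candidate)
        # Accept the neighbour only if it scores strictly better (lower).
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score


def squared_distance(point, target=(3.0, 4.0)):
    """Hypothetical objective: squared distance from a fixed target point."""
    return sum((p - t) ** 2 for p, t in zip(point, target))


best, best_score = hill_climb(squared_distance, init=[0.0, 0.0])
print(best, best_score)
```

The point of the sketch is that the evaluator receives whatever object the climber is mutating, so the evaluator and the initial evaluable have to agree on a type.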

Related

StandardScaler.inverse_transform() return the same array as input :/ Is sklearn broken or am I?

Good evening,
I'm currently pursuing a PhD in chemistry, and in this framework I'm trying to apply my limited knowledge of Python and statistics to discriminate samples based on their IR spectra.
After a few weeks of data acquisition I was finally able to build my data set and was about to see what PCA can offer (this was the easy part).
I was able to build my script and get the loadings, scores and everything else that I could possibly need or want. However, I used the StandardScaler from sklearn.preprocessing to scale my data, so (correct me if I'm wrong) I should get back loadings in this "standard scaled" space.
As my data are actual IR spectra, those loadings have a chemical meaning (even though they are not real spectra), e.g. if my PC1 loadings have a peak at XX cm-1, I know that samples with a high PC1 score are likely to contain compounds that absorb at this wavenumber.
So I want to reverse the StandardScaler transformation. I've tried to use StandardScaler.inverse_transform(); however, it appears to return the same array that I gave it, which is very frustrating.
I tried to do the same thing with my sample spectra, but it gave me the same result again. Here is the portion of my script where I tried this:
Wavenumbers = DFF.columns
#in fact this is a little more complicated but that's the spirit
Spectre = DFF.values.tolist()
#btw DFF is my pandas.dataframe containing spectrum with features = wavenumber
SS = StandardScaler(copy=True)
DFF = SS.fit_transform(DFF) #at this point I use SS for preprocessing before PCA
#I'm then trying to inverse SS and get back the 1rst spectrum of the dataset
D = SS.inverse_transform(DFF[0])
#However at this point DFF[0] and D are almost-exactly the same I'm sure because :
plt.plot(Wavenumbers,D)
plt.plot(Wavenumbers,DFF[0]) #the curves are the sames, and :
for i,j in enumerate(D) :
    if j==DFF[0][i] : pass
    else : print("{}".format(j-DFF[0][i] )) #return nothing bigger than 10e-16
The problem is more than likely syntax or how I used StandardScaler; however, I have no one around me to ask for help with that. Can anyone tell me what I did wrong, or give me a hint on how I could get my loadings back into the "actual real IR spectra" space?
PS: sorry for the wacky English; I hope it is understandable.
Good evening,
After putting the problem aside for a few days, I finally re-coded the function I needed (as suggested by Robert Dodier).
As a reminder, I wanted a function that could take my data from a pandas dataframe and standardize it (mean-center and scale) in order to do PCA, but that could also reverse the preprocessing for later use.
Here is the code I ended up with:
import pandas as pd
import numpy as np
class Scaler:
    std =[]
    mean = []
    def fit(self,DF):
        self.std=[]
        self.mean=[]
        for c in DF.columns:
            self.std.append(DF[c].std())
            self.mean.append(DF[c].mean())
    def transform(self,DF):
        X = np.zeros(shape=DF.shape)
        for i,c in enumerate(DF.columns):
            for j in range(len(DF.index)):
                X[j][i] = (DF[c][j] - self.mean[i]) / self.std[i]
        return X
    def reverse(self,X):
        Y = np.zeros(shape=X.shape)
        for i in range(len(X[0])):
            for j in range(len(X)):
                Y[j][i] = X[j][i] * self.std[i] + self.mean[i]
        return Y
    def fit_transform(self,DF):
        self.fit(DF)
        X = self.transform(DF)
        return X
It's pretty slow and surely very low-tech, but it seems to do the job just fine. I hope it will save other Python beginners some time.
I designed it to be as close as possible to what I think sklearn.preprocessing.StandardScaler does.
example :
S = Scaler() #create scaler object
S.fit(DF) #fit the scaler to the dataframe (calculate mean and std for every columns in DF /!\ DF must be a pd.dataframe)
X=S.transform(DF) # return a np.array with mean centered data
Y = S.reverse(X) # reverse the transformation to get back original data
Again, sorry for the hastily typed English. And thanks to Robert for taking the time to answer.
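The element-by-element loops above can be replaced by NumPy broadcasting, which should be considerably faster on wide spectra. The following is a minimal sketch rather than a drop-in replacement: the class name ArrayScaler and the toy array are made up for illustration, and it takes a 2-D array (for example DF.values) instead of a dataframe. It uses ddof=1 to match what pandas' .std() computes in the class above; note that sklearn's StandardScaler uses the population standard deviation (ddof=0), so the scaled values differ slightly between the two.

```python
import numpy as np


class ArrayScaler:
    """Column-wise standardization with a reversible transform (illustrative name)."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        # ddof=1 gives the sample standard deviation, like pandas' .std().
        self.std_ = X.std(axis=0, ddof=1)
        return self

    def transform(self, X):
        # Broadcasting subtracts and divides every row by the per-column statistics.
        return (np.asarray(X, dtype=float) - self.mean_) / self.std_

    def reverse(self, X):
        # Undo transform(): rescale, then add the column means back.
        return np.asarray(X, dtype=float) * self.std_ + self.mean_


# Toy data: 3 "spectra" (rows) measured at 3 "wavenumbers" (columns).
spectra = np.array([[1.0, 10.0, 100.0],
                    [2.0, 20.0, 300.0],
                    [4.0, 60.0, 200.0]])

scaler = ArrayScaler().fit(spectra)
scaled = scaler.transform(spectra)
restored = scaler.reverse(scaled)
print(np.allclose(restored, spectra))
```

Because reverse() only needs the stored column means and standard deviations, the same call can be applied to any array with the same number of columns, such as a row of loadings.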

Trying to do extremely basic (3 categorical variables) Bayesian inference with PYMC3 and NetworkX

I'm trying to understand this example of a Bayesian network. Figured I'd dumb it down even more such that it's only looking at three variables: D1, D2, and D3. Each is categorical, with their probability tables given at the top of the code below. I'd like to set D3 = 0 and then compute the posterior probabilities of D1 and D2, like a simpler version of what's done at the bottom of this page. I've tried to do this by playing with the code from the first source but have been unsuccessful and I don't understand the error messages.
Any assistance in this would be greatly appreciated - I've really been struggling to implement Bayesian inference. I've tried looking at the PYMC3 Categorical documentation but it's pretty bare-bones. And the example of inference I could find uses continuous variables and seems to be doing a different thing than what I'm trying to do. Or if it isn't, I'm not smart enough to make the connection and use whatever they're demonstrating to meet my needs.
I'm not sure if posting large sections of code is approved here? But I'm not sure how else to do this. Here is my code (a much shorter, simpler version of the code in the first source):
import networkx as nx
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pymc3 as pm
import theano
import theano.tensor as T
from theano.compile.ops import as_op
d1_prob = np.array([0.3,0.7]) # 2 choices
d2_prob = np.array([0.6,0.3,0.1]) # 3 choices
d3_prob = np.array([[[0.1, 0.9], # (2x3)x2 choices
[0.3, 0.7],
[0.4, 0.6]],
[[0.6, 0.4],
[0.8, 0.2],
[0.9, 0.1]]])
BN = nx.DiGraph()
BN.add_node('D1', dtype='Discrete', prob=d1_prob)
BN.add_node('D2', dtype='Discrete', prob=d2_prob)
BN.add_node('D3', dtype='Discrete', prob = d3_prob, observe=np.array([0.]))
BN.add_edges_from([('D1', 'D3'), ('D2', 'D3')])
#print(BN.nodes(data=True))
#print(BN.pred['D3'])
def gpm(BN, node, num=0):
    return BN.node[BN.predecessors(node)[num]]['dist_obj']
with pm.Model() as mod2:
    BN.node['D1']['dist_obj'] = pm.Categorical('D1', p=BN.node['D1']['prob'])
    BN.node['D2']['dist_obj'] = pm.Categorical('D2', p=BN.node['D2']['prob'])
    BN.node['D3']['dist_obj'] = pm.Categorical('D3', p=BN.node['D3']['prob'][
        gpm(BN,'D3', num=1),
        gpm(BN,'D3', num=0)
        ], observed=BN.node['D3']['observe'])
with mod2:
    trace = pm.sample(10000)
pm.summary(trace, varnames=['D3'], start=1000)
pm.traceplot(trace[1000:], varnames=['D3'])
I can't help you with PyMC3, sorry. But maybe you just need the numbers.
Actually, I don't understand why you need an inference algorithm at all here.
The probability tables are fully specified and there is no missing data, so you can just apply Bayes' rule. Admittedly, I don't want to do this with pencil and paper even for such a simple example, so I've used the Java-based GUI tool SamIam to apply Bayes' rule for me.
When nothing is observed:
Interpreting your code gpm() and observe(), you observe d3 = 1. Then the CPT values change to this:
(The state0 values are arbitrary, samiam just assigns default labels stateX). The row-position in the CPT is what matters.
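For a network this small, Bayes' rule can also be applied directly by enumerating the joint table. The sketch below reuses the probability tables from the question and conditions on D3 = 0, as the question asks. It assumes the table is indexed as d3_prob[d1, d2, d3], which is what its (2, 3, 2) shape suggests; if the intended ordering is different, the axes need to be swapped accordingly.

```python
import numpy as np

# Probability tables copied from the question.
d1_prob = np.array([0.3, 0.7])               # P(D1), 2 states
d2_prob = np.array([0.6, 0.3, 0.1])          # P(D2), 3 states
d3_prob = np.array([[[0.1, 0.9],             # P(D3 | D1, D2), assumed index order [d1, d2, d3]
                     [0.3, 0.7],
                     [0.4, 0.6]],
                    [[0.6, 0.4],
                     [0.8, 0.2],
                     [0.9, 0.1]]])

observed_d3 = 0

# Joint P(D1, D2, D3 = observed): prior of D1 times prior of D2 times likelihood.
joint = d1_prob[:, None] * d2_prob[None, :] * d3_prob[:, :, observed_d3]

# Normalizing by the evidence P(D3 = observed) gives the posterior P(D1, D2 | D3).
evidence = joint.sum()
posterior = joint / evidence

# Marginal posteriors of each parent.
post_d1 = posterior.sum(axis=1)
post_d2 = posterior.sum(axis=0)

print("P(D3=0)      =", evidence)
print("P(D1 | D3=0) =", post_d1)
print("P(D2 | D3=0) =", post_d2)
```

These exact numbers can serve as a reference when checking whether a sampler's estimates are in the right place.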

PyMC2 and PyMC3 give different results...?

I'm trying to get a simple PyMC2 model working in PyMC3. I've gotten the model to run but the models give very different MAP estimates for the variables. Here is my PyMC2 model:
import numpy as np
import pymc
theta = pymc.Normal('theta', 0, .88)
X1 = pymc.Bernoulli('X2', p=pymc.Lambda('a', lambda theta=theta:1./(1+np.exp(-(theta-(-0.75))))), value=[1],observed=True)
X2 = pymc.Bernoulli('X3', p=pymc.Lambda('b', lambda theta=theta:1./(1+np.exp(-(theta-0)))), value=[1],observed=True)
model = pymc.Model([theta, X1, X2])
mcmc = pymc.MCMC(model)
mcmc.sample(iter=25000, burn=5000)
trace = (mcmc.trace('theta')[:])
print "\nThe MAP value for theta is", trace.sum()/len(trace)
That seems to work as expected. I had all sorts of trouble figuring out how to use the equivalent of the pymc.Lambda object in PyMC3. I eventually came across the Deterministic object. The following is my code:
import numpy as np
import pymc3
with pymc3.Model() as model:
    theta = pymc3.Normal('theta', 0, 0.88)
    X1 = pymc3.Bernoulli('X1', p=pymc3.Deterministic('b', 1./(1+np.exp(-(theta-(-0.75))))), observed=[1])
    X2 = pymc3.Bernoulli('X2', p=pymc3.Deterministic('c', 1./(1+np.exp(-(theta-(0))))), observed=[1])
    start=pymc3.find_MAP()
    step=pymc3.NUTS(state=start)
    trace = pymc3.sample(20000, step, njobs=1, progressbar=True)
pymc3.traceplot(trace)
The problem I'm having is that my MAP estimate for theta using PyMC2 is ~0.68 (correct), while the estimate PyMC3 gives is ~0.26 (incorrect). I suspect this has something to do with the way I'm defining the deterministic function. PyMC3 won't let me use a lambda function, so I just have to write the expression in-line. When I try to use lambda theta=theta:... I get this error:
AsTensorError: ('Cannot convert <function <lambda> at 0x157323e60> to TensorType', <type 'function'>)
Something to do with Theano?? Any suggestions would be greatly appreciated!
It works when you use a theano tensor instead of a numpy function in your Deterministic.
import numpy as np  # needed for np.median below
import pymc3
import theano.tensor as tt
with pymc3.Model() as model:
    theta = pymc3.Normal('theta', 0, 0.88)
    X1 = pymc3.Bernoulli('X1', p=pymc3.Deterministic('b', 1./(1+tt.exp(-(theta-(-0.75))))), observed=[1])
    X2 = pymc3.Bernoulli('X2', p=pymc3.Deterministic('c', 1./(1+tt.exp(-(theta-(0))))), observed=[1])
    start=pymc3.find_MAP()
    step=pymc3.NUTS(state=start)
    trace = pymc3.sample(20000, step, njobs=1, progressbar=True)
print "\nThe MAP value for theta is", np.median(trace['theta'])
pymc3.traceplot(trace);
Here's the output:
Just in case someone else has the same problem, I think I found an answer. After trying different sampling algorithms I found that:
find_MAP gave the incorrect answer
the NUTS sampler gave the incorrect answer
the Metropolis sampler gave the correct answer, yay!
I read somewhere else that the NUTS sampler doesn't work with Deterministic. I don't know why. Maybe that's the case with find_MAP too? But for now I'll stick with Metropolis.
Also, NUTS doesn't handle discrete variables. If you want to use NUTS, you have to split up the samplers:
step1 = pymc3.NUTS([theta])
step2 = pymc3.BinaryMetropolis([X1,X2])
trace = pymc3.sample(10000, [step1, step2], start)
EDIT:
Missed that 'b' and 'c' were defined inline. Removed them from the NUTS function call
The MAP value is not defined as the mean of a distribution, but as its maximum. With pymc2 you can find it with:
M = pymc.MAP(model)
M.fit()
theta.value
which returns array(0.6253614422469552)
This agrees with the MAP that you find with find_MAP in pymc3, which you call start:
{'theta': array(0.6253614811102668)}
The issue of which is a better sampler is a different one, and does not depend on the calculation of the MAP. The MAP calculation is an optimization.
See: https://pymc-devs.github.io/pymc/modelfitting.html#maximum-a-posteriori-estimates for pymc2.
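To make the "MAP is an optimization" point concrete, the posterior mode of this model can be found without any sampler. The sketch below is a hand-rolled illustration, not what pymc.MAP or find_MAP do internally. It assumes the second argument of Normal('theta', 0, .88) is a precision (tau), as in PyMC2, so the log posterior is -0.5 * tau * theta^2 + log sigmoid(theta + 0.75) + log sigmoid(theta) up to a constant. That function is concave, so bisection on its derivative is enough; the function names and the search interval are made up for illustration.

```python
import math

TAU = 0.88  # assumed to be the precision of the Normal prior on theta


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def grad_log_posterior(theta):
    """Derivative of the unnormalized log posterior with both observations equal to 1."""
    prior_term = -TAU * theta
    # d/dtheta of log sigmoid(theta - d) is 1 - sigmoid(theta - d).
    likelihood_term = (1.0 - sigmoid(theta + 0.75)) + (1.0 - sigmoid(theta))
    return prior_term + likelihood_term


def find_map(lo=-5.0, hi=5.0, tol=1e-12):
    """Bisection on the gradient, which decreases monotonically because the log posterior is concave."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if grad_log_posterior(mid) > 0:
            lo = mid  # still climbing, so the mode lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


theta_map = find_map()
print(theta_map)
```

Under the precision assumption, the result should land close to the 0.6254 reported by both libraries above, while the sample mean of the trace estimates a different quantity (the posterior mean).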

Beginner stats: Predict binary outcome of set of numbers given history (Logistic regression)

I apologize in advance for the simplicity of this question. I have no background in stats and am getting lost in the complexity of it all.
If I have a couple thousand numbers all with a binary outcome
number,outcome
14,0
27,1
88,1
04,0
42,1
How do I predict the outcome for future numbers? For example:
82
45
02
Or is this going to be inaccurate because there is only a single variable? All the examples I have seen use multiple variables.
I have been digging through statsmodels and went through this great tutorial: http://blog.yhathq.com/posts/logistic-regression-and-python.html. And through that I've made this:
import pandas as pd
import statsmodels.api as sm
df = pd.read_csv("binary.csv")
df.columns = ["number", "outcome"]
data = df[['number', 'outcome']]
train_cols = data.columns[0]
logit = sm.Logit(data['outcome'], data[train_cols])
result = logit.fit()
print result.summary()
But that seems to analyze the weight of the current numbers. How would you predict new ones? Am I even going about this the right way?
The result of the fit should have a method predict(). That is what you need to use to predict future values, for example:
result = sm.Logit(outcomes, values).fit()
result.predict([82,45,2])
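What predict() returns for a fitted logistic model is the sigmoid of the linear predictor evaluated at the new inputs. The sketch below shows that idea with NumPy only; it is a stand-in for statsmodels, not its implementation. The data are synthetic (generated with a made-up rule, since binary.csv is not available), the function names fit_logistic and predict_proba are hypothetical, and the model includes an intercept term. Note that sm.Logit does not add an intercept on its own; sm.add_constant is the usual way to include one.

```python
import numpy as np


def fit_logistic(x, y, iterations=50):
    """Fit intercept and slope by Newton's method on the logistic log-likelihood."""
    X = np.column_stack([np.ones_like(x, dtype=float), x])  # column of ones = intercept
    beta = np.zeros(2)
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = (X * (p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hessian, gradient)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def predict_proba(beta, x_new):
    """Probability of outcome 1 for new numbers: sigmoid(intercept + slope * x)."""
    x_new = np.asarray(x_new, dtype=float)
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x_new)))


# Synthetic stand-in for the question's data: larger numbers are more likely to give 1.
rng = np.random.default_rng(0)
x = rng.integers(0, 100, size=2000).astype(float)
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-(x - 40.0) / 10.0))).astype(float)

beta = fit_logistic(x, y)
probs = predict_proba(beta, [82, 45, 2])
print(probs)
```

The outputs are probabilities, so turning them into a 0/1 prediction requires choosing a threshold such as 0.5. A single predictor is fine as long as the outcome really depends on it.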

Neural net with Pybrain will not converge

I am trying to build a simple neural network using Python and the Pybrain package.
As I am just starting to learn both the method and the Pybrain package, I tried to make a very simple neural net with some real data that I have available.
I know there is an underlying relationship in my data; however, the training does not converge at all, and the results after training are basically the same for any set of real validation data that I feed in. Below is my code and a small part of the data. I have over 5000 lines of data available with known g to train my network, but the number of points added to the training makes no difference.
from pybrain.tools.shortcuts import buildNetwork as bld
from pybrain.datasets import SupervisedDataSet as spds
from pybrain.supervised.trainers import BackpropTrainer as bpt
import numpy as np
u,g,r,i,z = np.loadtxt("dataset.dat",unpack=True)
data = spds(4,1)
net = bld(4,1000,1)
# Loop index renamed to k so it does not shadow the i array; len(u) replaces the undefined name umag
for k in range(0,len(u)):
    data.addSample((u[k],r[k],i[k],z[k]),(g[k],))
trainer = bpt(net,data)
trainer.trainUntilConvergence(dataset=data,maxEpochs=300,validationProportion=0.5)
p = net.activate([17.136,15.812,15.693,15.675])
print p
#expected result 16.225
p = net.activate([19.382,17.684,17.511,17.435])
# 18.195 - expected result
print p
18.14981 15.10829 13.96468 -10.8685 13.20411
16.84580 15.17839 14.61974 14.44930 14.44493
16.70895 15.57959 15.28097 15.16538 15.19260
18.44166 16.32709 15.45345 15.14938 15.04544
18.03881 16.49129 15.96768 15.78446 15.77211
21.15679 18.66248 17.46381 16.97513 16.75475
19.25665 17.80023 17.18956 16.97563 16.94967
17.01522 16.08040 15.85172 15.81930 15.92262
19.21695 17.72263 17.17900 16.98280 16.97201
19.98507 18.56911 17.98143 17.80738 17.81714
16.94824 15.97417 15.70555 15.59221 15.64357
21.20893 19.40982 18.68114 18.46647 18.43065
18.72652 17.38880 16.93716 16.73246 16.75096
20.57421 19.55045 19.15475 18.99772 19.02503
22.48833 20.07709 18.68276 17.60561 17.09613
22.27604 20.34056 19.66521 19.37319 19.30457
20.58372 19.18035 18.64691 18.43370 18.39288
22.25103 20.74570 20.16532 19.94144 19.78580
22.49646 19.63043 18.39409 17.97594 17.77803
19.22686 17.55373 16.97127 16.76445 16.70418
20.44500 19.34502 18.96556 18.80437 18.78767
22.69331 21.19628 19.89190 19.39628 19.11377
19.51075 18.02397 17.46963 17.31436 17.27759
19.92604 18.49456 17.97421 17.83519 17.80557
19.18904 18.22256 17.84221 17.70319 17.64457
20.23186 18.43468 17.81423 17.60103 17.54677
19.86590 18.32822 17.75089 17.57386 17.53067
20.84188 19.78345 19.42506 19.27895 19.34572
22.14103 21.86670 21.74832 21.61244 21.99680
18.02018 16.69380 16.23947 16.12869 16.09864
19.92574 18.63316 18.15877 17.95703 17.90224
Generally speaking, I get better results if I have scaled my data to be between 0 and 1, or better yet between 0.1 and 0.9. The neuron output is usually going to be between 0 and 1. You might try scaling your inputs and outputs to be within this range, and see if you get better results.
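A minimal sketch of the scaling suggested above, using NumPy only. It maps each column linearly to the range 0.1 to 0.9 and keeps the per-column minimum and maximum so that network outputs can be mapped back to the original units. The function names are hypothetical, and the three rows are taken from the data sample in the question (columns u, g, r, i, z).

```python
import numpy as np


def fit_minmax(values):
    """Record per-column minimum and maximum of the training data."""
    values = np.asarray(values, dtype=float)
    return values.min(axis=0), values.max(axis=0)


def scale(values, col_min, col_max, low=0.1, high=0.9):
    """Map each column linearly so that col_min -> low and col_max -> high."""
    values = np.asarray(values, dtype=float)
    return low + (values - col_min) * (high - low) / (col_max - col_min)


def unscale(scaled, col_min, col_max, low=0.1, high=0.9):
    """Invert scale(), e.g. to turn a network output back into the original units."""
    scaled = np.asarray(scaled, dtype=float)
    return col_min + (scaled - low) * (col_max - col_min) / (high - low)


# Three rows from the question's data sample (columns u, g, r, i, z).
rows = np.array([[16.84580, 15.17839, 14.61974, 14.44930, 14.44493],
                 [16.70895, 15.57959, 15.28097, 15.16538, 15.19260],
                 [18.44166, 16.32709, 15.45345, 15.14938, 15.04544]])

col_min, col_max = fit_minmax(rows)
scaled = scale(rows, col_min, col_max)
restored = unscale(scaled, col_min, col_max)
print(scaled)
```

The minimum and maximum should come from the training data only and then be reused for any new inputs, so that training and prediction see the same mapping.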
