StatsModel quantile regression ValueError - python

I got an error after running quantile regression in Python StatsModel module. The error is following:
ValueError Traceback (most recent call last)
<ipython-input-221-3547de1b5e0d> in <module>()
16 model = smf.quantreg(fit_formula, train)
17
---> 18 fitted_model = model.fit(0.2)
19
20 #fitted_model.predict(test)
in fit(self, q, vcov, kernel, bandwidth, max_iter, p_tol, **kwargs)
177 resid = np.abs(resid)
178 xstar = exog / resid[:, np.newaxis]
--> 179 diff = np.max(np.abs(beta - beta0))
180 history['params'].append(beta)
181 history['mse'].append(np.mean(resid*resid))
ValueError: operands could not be broadcast together with shapes (178,) (176,)
I was thinking it was possibly caused by constant features, so I removed those, but I still got the same error. I am wondering what is the cause. My code is following:
quantiles = np.arange(.05, .99, .1)
cols = train.columns.tolist()[1:-2]
fit_formula = ''
for c in cols:
fit_formula = fit_formula + ' + ' + c
fit_formula = 'revenue ~ ' + train.columns.tolist()[0] + fit_formula
model = smf.quantreg(fit_formula, train)
fitted_model = model.fit(0.2)

I think your design matrix is singular, i.e. this does not hold for your data:
np.linalg.matrix_rank(model.exog) == model.exog.shape[1]
Guessing from looking at the code: The parameter, beta, is initialized for the iteration loop with
exog_rank = np_matrix_rank(self.exog)
beta = np.ones(exog_rank)
which has different lengtht than the beta from the auxiliary weighted least squares regression, and the convergence check fails. The iteratively reweighted step used a generalized inverse, pinv, which does not raise an exception because of the singular design matrix.
Based on your traceback, (178,) (176,), you would still have two collinear columns that need to be dropped.
(That's a bug: Either it should raise a proper exception for the singular case, or handle it with pinv throughout.)

Related

Find a matrix with optimization Python Scipi

I am trying to calculate the solution to the following equation: 0 = WB-S for B, while setting upper and lower bounds for the results. The matrices have the following shape:
W is mxn
B is zxn
S is mxz
I have tried using the scipy.optimize.lsq_linear function of scipy. However, I run into the following error:
b must have at most 1 dimension.
In this case, b is S on my equation.
df_ret.shape
(2414, 97)
df_matrix.shape
(2414, 336)
df_bd.shape
(97,)
lb = 2,5*df_bd
up = -2.5*df_bd
res = lsq_linear(df_matrix, df_ret, bounds = (lb,up))
--------------------------------------------------------------------------- ValueError Traceback (most recent call
last) in
----> 1 res = lsq_linear(df_matrix, df_ret, bounds = (lb,up))
~\AppData\Local\Continuum\anaconda3\lib\site-packages\scipy\optimize_lsq\lsq_linear.py
in lsq_linear(A, b, bounds, method, tol, lsq_solver, lsmr_tol,
max_iter, verbose)
261 b = np.atleast_1d(b)
262 if b.ndim != 1:
--> 263 raise ValueError("b must have at most 1 dimension.")
264
265 if b.size != m:
ValueError: b must have at most 1 dimension.

Can't differentiate wrt numpy arrays of dtype int64?

I am a newbie to numpy. Today when I use it to work with linear regression, it shows as below:
KeyError Traceback (most recent call
last)
~/anaconda3/lib/python3.6/site-packages/autograd/numpy/numpy_extra.py
in new_array_node(value, tapes)
84 try:
---> 85 return array_dtype_mappings[value.dtype](value, tapes)
86 except KeyError:
KeyError: dtype('int64')
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call
last)
<ipython-input-4-aebe8f7987b0> in <module>()
24 return cost/float(np.size(y))
25
---> 26 weight_h, cost_h = gradient_descent(least_squares, alpha,
max_its, w)
27
28 # a)
<ipython-input-2-1b74c4f818f4> in gradient_descent(g, alpha, max_its,
w)
12 for k in range(max_its):
13 # evaluate the gradient
---> 14 grad_eval = gradient(w)
15
16 # take gradient descent step
~/anaconda3/lib/python3.6/site-packages/autograd/core.py in
gradfun(*args, **kwargs)
19 #attach_name_and_doc(fun, argnum, 'Gradient')
20 def gradfun(*args,**kwargs):
---> 21 return
backward_pass(*forward_pass(fun,args,kwargs,argnum))
22 return gradfun
23
~/anaconda3/lib/python3.6/site-packages/autograd/core.py in
forward_pass(fun, args, kwargs, argnum)
57 tape = CalculationTape()
58 arg_wrt = args[argnum]
---> 59 start_node = new_node(safe_type(getval(arg_wrt)),
[tape])
60 args = list(args)
61 args[argnum] = merge_tapes(start_node, arg_wrt)
~/anaconda3/lib/python3.6/site-packages/autograd/core.py in
new_node(value, tapes)
185 def new_node(value, tapes=[]):
186 try:
--> 187 return Node.type_mappings[type(value)](value, tapes)
188 except KeyError:
189 return NoDerivativeNode(value, tapes)
~/anaconda3/lib/python3.6/site-packages/autograd/numpy/numpy_extra.py
in new_array_node(value, tapes)
85 return array_dtype_mappings[value.dtype](value, tapes)
86 except KeyError:
---> 87 raise TypeError("Can't differentiate wrt numpy arrays
of dtype {0}".format(value.dtype))
88 Node.type_mappings[anp.ndarray] = new_array_node
89
TypeError: Can't differentiate wrt numpy arrays of dtype int64
I really have no idea about what is happened. I guess it might be related to the structure of array in numpy. Or did I forget to download any packages? Below is my original codes.
# import statements
datapath = 'datasets/'
from autograd import numpy as np
# import automatic differentiator to compute gradient module
from autograd import grad
# gradient descent function
def gradient_descent(g,alpha,max_its,w):
# compute gradient module using autograd
gradient = grad(g)
# run the gradient descent loop
weight_history = [w] # weight history container
cost_history = [g(w)] # cost function history container
for k in range(max_its):
# evaluate the gradient
grad_eval = gradient(w)
# take gradient descent step
w = w - alpha*grad_eval
# record weight and cost
weight_history.append(w)
cost_history.append(g(w))
return weight_history,cost_history
# load in dataset
csvname = datapath + 'kleibers_law_data.csv'
data = np.loadtxt(csvname,delimiter=',')
# get input and output of dataset
x = data[:-1,:]
y = data[-1:,:]
x = np.log(x)
y = np.log(y)
#Data Initiation
alpha = 0.01
max_its = 1000
w = np.array([0,0])
#linear model
def model(x, w):
a = w[0] + np.dot(x.T, w[1:])
return a.T
def least_squares(w):
cost = np.sum((model(x,w)-y)**2)
return cost/float(np.size(y))
weight_h, cost_h = gradient_descent(least_squares, alpha, max_its, w)
# a)
k = np.linspace(-5.5, 7.5, 250)
y = weight_h[max_its][0] + k*weight_h[max_its][1]
plt.figure()
plt.plot(x, y, label='Linear Line', color='g')
plt.xlabel('log of mass')
plt.ylabel('log of metabolic rate')
plt.title("Answer Of a")
plt.legend()
plt.show()
# b)
w0 = weight_h[max_its][0]
w1 = weight_h[max_its][1]
print("Nonlinear relationship between the body mass x and the metabolic
rate y is " /
+ str(w0) + " + " + "log(xp)" + str(w1) + " = " + "log(yp)")
# c)
x2 = np.log(10)
Kj = np.exp(w0 + w1*x2)*1000/4.18
print("It needs " + str(Kj) + " calories")
Could someone help me to figure it out? Thanks a lot.
Here's the important parts of your error:
---> 14 grad_eval = gradient(w)
...
Type Error: Can't differentiate wrt numpy arrays of dtype int64
Your gradient function is saying it doesn't like to differentiate arrays of ints, which makes some sense, since it probably wants more precision than an int can give. You probably need them to be doubles or floats. For a simple solution to this, I believe you can just change your initializer from:
w = np.array([0,0])
which is going to automatically cast those 0s as ints, to:
w = np.array([0.0,0.0])
Those decimals after the 0 will let it know you want floats. There's other ways to go about telling it what kind of array you want (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.array.html), but this is a simple way.

pymc3 : Dirichlet with multidimensional concentration factor

I am struggling with implementing a model where the concentration factor of the Dirichlet variable is dependent on another variable.
The situation is the following:
A system fails due to faulty components (there are three components, only one fails at each test/observation).
The probability of failure of the components is dependent on the temperature.
Here is a (commented) short implementation of the situation:
import numpy as np
import pymc3 as pm
import theano.tensor as tt
# Temperature data : 3 cold temperatures and 3 warm temperatures
T_data = np.array([10, 12, 14, 80, 90, 95])
# Data of failures of 3 components : [0,0,1] means component 3 failed
F_data = np.array([[0, 0, 1], \
[0, 0, 1], \
[0, 0, 1], \
[1, 0, 0], \
[1, 0, 0], \
[1, 0, 0]])
n_component = 3
# When temperature is cold : Component 1 fails
# When temperature is warm : Component 3 fails
# Component 2 never fails
# Number of observations :
n_obs = len(F_data)
# The number of failures can be modeled as a Multinomial F ~ M(n_obs, p) with parameters
# - n_test : number of tests (Fixed)
# - p : probability of failure of each component (shape (n_obs, 3))
# The probability of failure of components follows a Dirichlet distribution p ~ Dir(alpha) with parameters:
# - alpha : concentration (shape (n_obs, 3))
# The Dirichlet distributions ensures the probabilities sum to 1
# The alpha parameters (and the the probability of failures) depend on the temperature alpha ~ a + b * T
# - a : bias term (shape (1,3))
# - b : describes temperature dependency of alpha (shape (1,3))
_
# The prior on "a" is a normal distributions with mean 1/2 and std 0.001
# a ~ N(1/2, 0.001)
# The prior on "b" is a normal distribution zith mean 0 and std 0.001
# b ~ N(0, 0.001)
# Coding it all with pymc3
with pm.Model() as model:
a = pm.Normal('a', 1/2, 1/(0.001**2), shape = n_component)
b = pm.Normal('b', 0, 1/(0.001**2), shape = n_component)
# I generate 3 alphas values (corresponding to the 3 components) for each of the 6 temperatures
# I tried different ways to compute alpha but nothing worked out
alphas = pm.Deterministic('alphas', a + b * tt.stack([T_data, T_data, T_data], axis=1))
#alphas = pm.Deterministic('alphas', a + b[None, :] * T_data[:, None])
#alphas = pm.Deterministic('alphas', a + tt.outer(T_data,b))
# I think I should get 3 probabilities (corresponding to the 3 components) for each of the 6 temperatures
#p = pm.Dirichlet('p', alphas, shape = n_component)
p = pm.Dirichlet('p', alphas, shape = (n_obs,n_component))
# Multinomial is observed and take values from F_data
F = pm.Multinomial('F', 1, p, observed = F_data)
with model:
trace = pm.sample(5000)
I get the following error in the sample function:
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 73, in run
self._start_loop()
File "/anaconda3/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 113, in _start_loop
point, stats = self._compute_point()
File "/anaconda3/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 139, in _compute_point
point, stats = self._step_method.step(self._point)
File "/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/arraystep.py", line 247, in step
apoint, stats = self.astep(array)
File "/anaconda3/lib/python3.6/site-packages/pymc3/step_methods/hmc/base_hmc.py", line 117, in astep
'might be misspecified.' % start.energy)
ValueError: Bad initial energy: inf. The model might be misspecified.
"""
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
ValueError: Bad initial energy: inf. The model might be misspecified.
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
<ipython-input-5-121fdd564b02> in <module>()
1 with model:
2 #start = pm.find_MAP()
----> 3 trace = pm.sample(5000)
/anaconda3/lib/python3.6/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, nuts_kwargs, step_kwargs, progressbar, model, random_seed, live_plot, discard_tuned_samples, live_plot_kwargs, compute_convergence_checks, use_mmap, **kwargs)
438 _print_step_hierarchy(step)
439 try:
--> 440 trace = _mp_sample(**sample_args)
441 except pickle.PickleError:
442 _log.warning("Could not pickle model, sampling singlethreaded.")
/anaconda3/lib/python3.6/site-packages/pymc3/sampling.py in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, use_mmap, **kwargs)
988 try:
989 with sampler:
--> 990 for draw in sampler:
991 trace = traces[draw.chain - chain]
992 if trace.supports_sampler_stats and draw.stats is not None:
/anaconda3/lib/python3.6/site-packages/pymc3/parallel_sampling.py in __iter__(self)
303
304 while self._active:
--> 305 draw = ProcessAdapter.recv_draw(self._active)
306 proc, is_last, draw, tuning, stats, warns = draw
307 if self._progress is not None:
/anaconda3/lib/python3.6/site-packages/pymc3/parallel_sampling.py in recv_draw(processes, timeout)
221 if msg[0] == 'error':
222 old = msg[1]
--> 223 six.raise_from(RuntimeError('Chain %s failed.' % proc.chain), old)
224 elif msg[0] == 'writing_done':
225 proc._readable = True
/anaconda3/lib/python3.6/site-packages/six.py in raise_from(value, from_value)
RuntimeError: Chain 1 failed.
Any suggestions ?
Misspecified model. The alphas are taking on nonpositive values under your current parameterization, whereas the Dirichlet distribution requires them to be positive, making the model misspecified.
In Dirichlet-Multinomial regression, one uses an exponential link function to mediate between the range of the linear model and the domain of the Dirichlet-Multinomial, namely,
alpha = exp(beta*X)
There are details on this in the MGLM package documentation.
Dirichlet-Multinomial Regression Model
If we implement this model we can achieve decent model convergence and sampling.
import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt
from sklearn.preprocessing import scale
T_data = np.array([10,12,14,80,90,95])
# standardize the data for better sampling
T_data_z = scale(T_data)
# transform to theano tensor, so it works with tt.outer
T_data_z = theano.shared(T_data_z)
F_data = np.array([
[0,0,1],
[0,0,1],
[0,0,1],
[1,0,0],
[1,0,0],
[1,0,0],
])
# N = num_obs, K = num_components
N, K = F_data.shape
with pm.Model() as dmr_model:
a = pm.Normal('a', mu=0, sd=1, shape=K)
b = pm.Normal('b', mu=0, sd=1, shape=K)
alpha = pm.Deterministic('alpha', pm.math.exp(a + tt.outer(T_data_z, b)))
p = pm.Dirichlet('p', a=alpha, shape=(N, K))
F = pm.Multinomial('F', 1, p, observed=F_data)
trace = pm.sample(5000, tune=10000, target_accept=0.9)
Model Outcomes
The sampling in this model isn't perfect. For example, there are still a number of divergences even with the increased target acceptance rate and additional tuning.
There were 501 divergences after tuning. Increase target_accept or reparameterize.
There were 477 divergences after tuning. Increase target_accept or reparameterize.
The acceptance probability does not match the target. It is 0.5858954056820339, but should be close to 0.8. Try to increase the number of tuning steps.
The number of effective samples is smaller than 10% for some parameters.
Trace Plots
We can see the traces for a and b look good, and the mean locations make sense with data.
Pair Plot
While correlation is less of a problem for NUTS, having uncorrelated posterior sampling is ideal. For the most part we're seeing low correlation, with some slight structure within the a components.
Posterior Plots
Finally, we can look at the posterior plots of p and confirm they make sense with the data.
Alternative Model
The advantage of the Dirichlet-Multinomial is handling overdispersion. It might be worth trying the simpler Multinomial Logisitic Regression / Softmax Regression, since it runs significantly faster and doesn't exhibit any of the sampling problems coming up in the DMR model.
In the end, you could run both and perform model comparison to see if the Dirichlet-Multinomial really is adding explanatory value.
Model
with pm.Model() as softmax_model:
a = pm.Normal('a', mu=0, sd=1, shape=K)
b = pm.Normal('b', mu=0, sd=1, shape=K)
p = pm.Deterministic('p', tt.nnet.softmax(a + tt.outer(T_data_z, b)))
F = pm.Multinomial('F', 1, p, observed = F_data)
trace_sm = pm.sample(5000, tune=10000)
Posterior Plots

hierarchical clustering in scipy - memory error

Here is my code:
import numpy as np
from scipy.cluster.hierarchy import fclusterdata
def mydist(p1,p2):
return 1
Y = np.random.randn(100000,2)
fclust1 = fclusterdata(Y, 1.0, metric=mydist)
It produces the following error:
MemoryError Traceback (most recent call last)
<ipython-input-52-818db8791e96> in <module>()
----> 1 fclust1 = fclusterdata(Y, 1.0, metric=mydist)
C:\Anaconda3\lib\site-packages\scipy\cluster\hierarchy.py in fclusterdata(X, t, criterion, metric, depth, method, R)
1682 'array.')
1683
-> 1684 Y = distance.pdist(X, metric=metric)
1685 Z = linkage(Y, method=method)
1686 if R is None:
C:\Anaconda3\lib\site-packages\scipy\spatial\distance.py in pdist(X, metric, p, w, V, VI)
1218
1219 m, n = s
-> 1220 dm = np.zeros((m * (m - 1)) // 2, dtype=np.double)
1221
1222 wmink_names = ['wminkowski', 'wmi', 'wm', 'wpnorm']
MemoryError:
So I am guessing my vector is too large. I am a bit surprised, since my distance function is trivial. What is max size vector that fclusterdata can accept?
Hierarchical clustering usually requires a pairwise distance matrix.
That means you need O(n^2) memory. And it does not 'see' that your distance is constant (and it doesn't make sense to optimize for this either).
It's not a very scalable algorithm.

Issue with gradient descent implementation of linear regression

I am taking this Coursera class on machine learning / linear regression. Here is how they describe the gradient descent algorithm for solving for the estimated OLS coefficients:
So they use w for the coefficients, H for the design matrix (or features as they call it), and y for the dependent variable. And their convergence criteria is the usual of the norm of the gradient of RSS being less than tolerance epsilon; that is, their definition of "not converged" is:
I am having trouble getting this algorithm to converge and was wondering if I was overlooking something in my implementation. Below is the code. Please note that I also ran the sample dataset I use in it (df) through the statsmodels regression library, just to see that a regression could converge and to get coefficient values to tie out with. It did and they were:
Intercept 4.344435
x1 4.387702
x2 0.450958
Here is my implementation. At each iteration, it prints the norm of the gradient of RSS:
import numpy as np
import numpy.linalg as LA
import pandas as pd
from pandas import DataFrame
# First define the grad function: grad(RSS) = -2H'(y-Hw)
def grad_rss(df, var_name_y, var_names_h, w):
# Set up feature matrix H
H = DataFrame({"Intercept" : [1 for i in range(0,len(df))]})
for var_name_h in var_names_h:
H[var_name_h] = df[var_name_h]
# Set up y vector
y = df[var_name_y]
# Calculate the gradient of the RSS: -2H'(y - Hw)
result = -2 * np.transpose(H.values) # (y.values - H.values # w)
return result
def ols_gradient_descent(df, var_name_y, var_names_h, epsilon = 0.0001, eta = 0.05):
# Set all initial w values to 0.0001 (not related to our choice of epsilon)
w = np.array([0.0001 for i in range(0, len(var_names_h) + 1)])
# Iteration counter
t = 0
# Basic algorithm: keep subtracting eta * grad(RSS) from w until
# ||grad(RSS)|| < epsilon.
while True:
t = t + 1
grad = grad_rss(df, var_name_y, var_names_h, w)
norm_grad = LA.norm(grad)
if norm_grad < epsilon:
break
else:
print("{} : {}".format(t, norm_grad))
w = w - eta * grad
if t > 10:
raise Exception ("Failed to converge")
return w
# ##########################################
df = DataFrame({
"y" : [20,40,60,80,100] ,
"x1" : [1,5,7,9,11] ,
"x2" : [23,29,60,85,99]
})
# Run
ols_gradient_descent(df, "y", ["x1", "x2"])
Unfortunately this does not converge, and in fact prints a norm that is exploding with each iteration:
1 : 44114.31506051333
2 : 98203544.03067812
3 : 218612547944.95386
4 : 486657040646682.9
5 : 1.083355358314664e+18
6 : 2.411675439503567e+21
7 : 5.368670935963926e+24
8 : 1.1951287949674022e+28
9 : 2.660496151835357e+31
10 : 5.922574875391406e+34
11 : 1.3184342751414824e+38
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
......
Exception: Failed to converge
If I increase the maximum number of iterations enough, it doesn't converge, but just blows out to infinity.
Is there an implementation error here, or am I misinterpreting the explanation in the class notes?
Updated w/ Answer
As #Kant suggested, the eta needs to updated at each iteration. The course itself had some sample formulas for this but none of them helped in the convergence. This section of the Wikipedia page about gradient descent mentions the Barzilai-Borwein approach as a good way of updating the eta. I implemented it and altered my code to update the eta with it at each iteration, and the regression converged successfully. Below is my translation of the Wikipedia version of the formula to the variables used in regression, as well as code that implements it. Again, this code is called in the loop of my original ols_gradient_descent to update the eta.
def eta_t (w_t, w_t_minus_1, grad_t, grad_t_minus_1):
delta_w = w_t - w_t_minus_1
delta_grad = grad_t - grad_t_minus_1
eta_t = (delta_w.T # delta_grad) / (LA.norm(delta_grad))**2
return eta_t
Try decreasing the value of eta. Gradient descent can diverge if eta is too high.

Categories

Resources