Error in acorr_ljungbox from statsmodels - python

So I am trying to run a Box-Ljung test on residuals, but I am getting a strange error and am not able to figure out why.
x = diag.acorr_ljungbox(np.random.random(20))
I tried doing the same with a random array as well, and I still get the same error:
ValueError: operands could not be broadcast together with shapes (19,) (40,)

This looks like a bug in the default lag setting, which is set to 40 regardless of the length of the data.
As a workaround, and to get a proper statistic, the number of lags needs to be restricted, for example to the 5 lags used below.
>>> import numpy as np
>>> from statsmodels.stats import diagnostic as diag
>>> diag.acorr_ljungbox(np.random.random(50))[0].shape
(40,)
>>> diag.acorr_ljungbox(np.random.random(20), lags=5)
(array([ 0.36718151, 1.02009595, 1.23734092, 3.75338034, 4.35387236]),
array([ 0.54454461, 0.60046677, 0.74406305, 0.44040973, 0.49966951]))
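A minimal sketch of the same workaround that ties the lag count to the sample size rather than hard-coding it (the min(10, n // 5) cap is a heuristic assumption on my part, not something taken from this answer):

import numpy as np
from statsmodels.stats import diagnostic as diag

resid = np.random.random(20)          # stand-in for the actual residuals
nlags = min(10, len(resid) // 5)      # heuristic cap on the number of lags (assumption)
res = diag.acorr_ljungbox(resid, lags=nlags)
print(res)                            # two arrays on older statsmodels, a DataFrame on newer releases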

Related

Rodrigues function in OpenCV keeps giving error 'ValueError: matrix must be 2-dimensional' even with 2d arrays

So I have this code:
#rotation matrix
R_mtx, jac = cv2.Rodrigues(rvecs[0])
cameraPosition = -R_mtx.T * np.matrix(tvecs)
cameraPosition
and the array rvecs from the calibrateCamera function, which is:
[array([[ 1.8774334 ],
        [-0.02710091],
        [ 0.25779132]])]
arr=rvecs[0]
print (arr.ndim)
This prints 2.
So the code should work, since I am fulfilling the requirements for the function. But for some reason it doesn't. I tried looking into the source code for the error and there is nothing wrong there either.
I corrected the error, but it then also appeared on the second line. I corrected that too with
cameraPosition = -R_mtx.T * np.matrix(tvecs[0])
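For context, a minimal sketch of how the corrected lines fit together, assuming rvecs and tvecs come from cv2.calibrateCamera as lists of (3, 1) arrays; the rotation vector below is the one printed above, while the translation values are made up:

import numpy as np
import cv2

# Stand-ins for rvecs[0] and tvecs[0] from cv2.calibrateCamera
rvec = np.array([[ 1.8774334 ], [-0.02710091], [ 0.25779132]])  # from the question
tvec = np.array([[0.1], [0.2], [5.0]])                          # made-up translation

R_mtx, jac = cv2.Rodrigues(rvec)      # 3x3 rotation matrix for this view
cameraPosition = -R_mtx.T @ tvec      # (3, 1) camera position in world coordinates
print(cameraPosition)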

ValueError using dask QuantileTransformer : unknown shape (1, nan)

I'm trying to use the QuantileTransformer from dask-ml.
For that, I have the following DF:
When I try:
from dask_ml.preprocessing import StandardScaler,QuantileTransformer,MinMaxScaler
scaler = QuantileTransformer()
scaler.fit_transform(df[['LotFrontage','LotArea']])
I get this error:
ValueError: Tried to concatenate arrays with unknown shape (1, nan).
To force concatenation pass allow_unknown_chunksizes=True.
And I can't find where to set the parameter allow_unknown_chunksizes=True, since passing it to the transformer raises an error.
The first error disappears if I compute the df beforehand:
scaler = QuantileTransformer()
scaler.fit_transform(df[['LotFrontage','LotArea']].compute())
But I don't know why this is necessary, or even if it is the right thing to do.
Also, in contrast to StandardScaler, this returns an array instead of a dataframe.
This was a limitation of the previous Dask-ML implementation. It's fixed in https://github.com/dask/dask-ml/pull/533.
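Until that fix is in your installed version, one hedged workaround sketch (the to_dask_array(lengths=True) step and the toy values are assumptions on my part, not from the answer above): computing the partition lengths gives the transformer known chunk sizes without pulling the whole frame into memory with .compute().

import pandas as pd
import dask.dataframe as dd
from dask_ml.preprocessing import QuantileTransformer

# Toy stand-in for the question's dataframe (made-up values)
pdf = pd.DataFrame({'LotFrontage': [65., 80., 68., 60., 84., 85.],
                    'LotArea': [8450., 9600., 11250., 9550., 14260., 14115.]})
df = dd.from_pandas(pdf, npartitions=2)

# Known chunk lengths avoid the unknown-shape concatenation error
X = df[['LotFrontage', 'LotArea']].to_dask_array(lengths=True)
scaler = QuantileTransformer(n_quantiles=6)   # capped at the tiny sample count here
Xt = scaler.fit_transform(X)                  # a dask array, not a dataframe
print(Xt.compute())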

"RuntimeWarning: invalid value encountered in power" Using Scipy's ODR

I'm attempting to fit a function using Scipy's Orthogonal distance regression (odr) package and I keep getting the following error:
"RuntimeWarning: invalid value encountered in power"
This also happened when I used scipy's curve_fit function, but there I could always safely ignore the warning. Now it seems to be causing a numerical error that halts the fitting. I have based my code on the example I found here:
python scipy.odrpack.odr example (with sample input / output)?
Here is my code:
import numpy as np
import scipy.odr.odrpack as odrpack

def divergence(x, xDiv):
    return (1 - (x/xDiv))**(-2.4)
xValues = np.linspace(.25,.37,12)
yValues = np.array([ 6.94970607, 9.12475506, 10.65969954, 12.30241672,
14.44154148, 16.00261267, 19.98693664, 25.93076421,
30.89483997, 35.27106466, 50.81645983, 68.06009144])
xErrors = .0005*np.ones(len(xValues))
yErrors = np.array([ 0.31905094, 0.37956865, 0.24837562, 0.68320078, 1.25915789,
1.40241088, 0.33305157, 1.37165251, 0.32658393, 0.52253429,
1.04506858, 1.30633573])
wcModel = odrpack.Model(divergence)
mydata = odrpack.RealData(xValues, yValues, sx=xErrors, sy=yErrors)
myodr = odrpack.ODR(mydata, wcModel, beta0=[.8])
myoutput = myodr.run()
myoutput.pprint()
From looking at previous questions about this error I found here:
NumPy, RuntimeWarning: invalid value encountered in power
I suspected that the problem is that I'm raising a negative value to a fractional power. But what I'm raising to the power -2.4, (1 - x/xDiv), isn't negative (at least around the initial guess of xDiv=.8). And when I try to make my y-values complex I get a new error:
"ValueError: y could not be made into a suitable array"
from the line with the command
myoutput = myodr.run()
The only examples I can find that use this odr package fit polynomials, so I suspect that might be the problem?
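One thing worth noting, as an aside that is not from this thread: scipy.odr.Model expects the model function with the parameter vector first, fcn(beta, x), whereas divergence above takes x first. A minimal sketch of that calling convention on the same data (whether it makes the warning go away for this particular fit is an assumption):

import numpy as np
from scipy import odr

def divergence(beta, x):
    # beta[0] plays the role of xDiv; ODR passes the parameter vector first, then x
    return (1 - (x / beta[0]))**(-2.4)

xValues = np.linspace(.25, .37, 12)
yValues = np.array([6.94970607, 9.12475506, 10.65969954, 12.30241672,
                    14.44154148, 16.00261267, 19.98693664, 25.93076421,
                    30.89483997, 35.27106466, 50.81645983, 68.06009144])

wcModel = odr.Model(divergence)
mydata = odr.RealData(xValues, yValues)
myodr = odr.ODR(mydata, wcModel, beta0=[.8])
myoutput = myodr.run()
myoutput.pprint()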

How to get scipy.stats.chisquare to function properly

I have 2 input files of identical size/shape; however, the data they contain has a different resolution, and I am looking to perform a chi-squared test on them.
The input files are 500 lines long and contain 4 columns delimited by spaces. I am trying to test the second column of each input file against the other.
My code is as follows:
# Import statements
C = pl.loadtxt("input_1.txt")
D = pl.loadtxt("input_2.txt")
col2_C = C[:,1]
col2_D = D[:,1]
f_obs = np.array([col2_C])
f_exp = np.array([col2_D])
chisquare(f_obs, f_exp)
This gives me an error saying:
ValueError: df <= 0
I don't even understand what it is complaining about here.
I have tried several other syntaxes within the script, each of which also resulted in various errors:
This one was found here.
chisquare = f_obs=[col2_C], f_exp=[col2_D])
TypeError: chisquare() takes at least one positional argument
Then I tried
chisquare = f_obs(col2_C), F_exp=[col2_D)
NameError: name 'f_obs' is not defined
I also tried several other syntactical tweaks but nothing to any avail. If anybody could please help me get this running I would appreciate it greatly.
Thanks in advance.
First, be sure you are importing chisquare from scipy.stats. NumPy has the function numpy.random.chisquare, but that does not perform a statistical test; it generates samples from a chi-square probability distribution.
So be sure you use:
from scipy.stats import chisquare
There is a second problem.
As slices of the two-dimensional array returned by loadtxt, col2_C and col2_D are one-dimensional numpy arrays, so there is no need to use, for example, np.array([col2_C]) when you pass these to chisquare. Just use col2_C and col2_D directly:
chisquare(col2_C, col2_D)
Wrapping the arrays with np.array like you did is causing the problem. chisquare accepts multidimensional arrays and an axis argument. When you do f_exp = np.array([col2_C]) (with the extra square brackets), f_exp is actually a two-dimensional array, with shape (1, 500). Similarly f_obs has shape (1, 500). The default axis argument of chisquare is 0. So when you called chisquare(f_obs, f_exp), you were asking chisquare to perform 500 chi-square tests, with each test having a single observed and expected value.
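To make the shape point concrete, a small sketch with made-up numbers standing in for the two columns:

import numpy as np
from scipy.stats import chisquare

col2_C = np.array([10., 12., 9., 11.])        # toy observed values
col2_D = np.array([10.5, 10.5, 10.5, 10.5])   # toy expected values (same total)

# One chi-square test over the whole column:
print(chisquare(col2_C, col2_D))

# What the original code built: shape (1, 4) arrays. With the default axis=0,
# each of the 4 "tests" sees a single value, so df = 1 - 1 = 0, hence "df <= 0".
print(np.array([col2_C]).shape)               # (1, 4)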

scikit-learn "Mix type of y" error

Hi, I'm a scikit-learn newbie. I'm trying to train a classifier that, given an array of floats, decides between 3 classes. I was labeling the classes as 0, 0.5, and 1. I also tried 0, 1.0, and 2.0. I still get the following error:
File "/Library/Python/2.7/site-packages/sklearn/utils/multiclass.py", line 85, in unique_labels
raise ValueError("Mix type of y not allowed, got types %s" % ys_types)
ValueError: Mix type of y not allowed, got types set(['continuous', 'multiclass'])
I have no idea what that error means.
Try using integer types for your target labels. Or, perhaps better, use string labels like ['a', 'b', 'c'] but with more descriptive names.
If you check the code for this file multiclass.py (code is here) and look for the function type_of_target, you'll see that it is well-documented for this case.
Because some of the data are treated as float type (when 0.5 is included), it will believe you've got continuous-valued outputs, which won't do for multiclass discrete classification.
On the other hand, it will look at [0, 1.0, 2.0] as one integer and two floats, which is why you get both continuous and multiclass. Switching the last example to [0, 1, 2] should work. The documentation also makes it sound like switching to [0.0, 1.0, 2.0] would work, but be careful and test that first.
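A small sketch of how scikit-learn's type_of_target sees these label choices, with made-up feature data just to illustrate the point:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.multiclass import type_of_target

X = np.random.rand(9, 4)                                # toy features
y_float = np.array([0, 0.5, 1, 0, 0.5, 1, 0, 0.5, 1])   # non-integer values
y_int = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])           # integer class labels

print(type_of_target(y_float))   # 'continuous'
print(type_of_target(y_int))     # 'multiclass'

clf = LogisticRegression().fit(X, y_int)                # trains as a 3-class problem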
It's hard to tell for sure without the code, but my guess is that the shape of your y data is not what is expected.
For example, when my code threw this error it was because I was passing y data into classification_report with shape (60000, 10, 2) when it was expecting shape (60000, 10).
I was re-running cells where I called to_categorical(y_test) more than once... When I loaded my code into a proper script and ran it, it worked fine :)
