Basinhopping Causes Error with Statsmodels MNLogit

I tried running the code below to fit a multinomial logit model using the basinhopping method, but it raises the following error:
import numpy as np
import statsmodels.api as sm
x = np.random.randint(0, 100, 1000)
y = np.random.randint(0, 3, 1000)
model = sm.MNLogit(y, sm.add_constant(x))
results = model.fit(method='basinhopping')
print(results.summary())
Traceback (most recent call last):
File "/Users/wagnerpf134/opt/anaconda3/lib/python3.9/site-packages/spyder_kernels/py3compat.py", line 356, in compat_exec
exec(code, globals, locals)
File "/Users/wagnerpf134/Documents/untitled0.py", line 7, in <module>
results = model.fit(method='basinhopping')
File "/Users/wagnerpf134/opt/anaconda3/lib/python3.9/site-packages/statsmodels/discrete/discrete_model.py", line 654, in fit
mnfit = base.LikelihoodModel.fit(self, start_params = start_params,
File "/Users/wagnerpf134/opt/anaconda3/lib/python3.9/site-packages/statsmodels/base/model.py", line 563, in fit
xopt, retvals, optim_settings = optimizer._fit(f, score, start_params,
File "/Users/wagnerpf134/opt/anaconda3/lib/python3.9/site-packages/statsmodels/base/optimizer.py", line 241, in _fit
xopt, retvals = func(objective, gradient, start_params, fargs, kwargs,
File "/Users/wagnerpf134/opt/anaconda3/lib/python3.9/site-packages/statsmodels/base/optimizer.py", line 1040, in _fit_basinhopping
retvals = optimize.basinhopping(f, start_params,
File "/Users/wagnerpf134/opt/anaconda3/lib/python3.9/site-packages/scipy/optimize/_basinhopping.py", line 728, in basinhopping
callback(bh.storage.minres.x, bh.storage.minres.fun, True)
TypeError: <lambda>() takes 1 positional argument but 3 were given
The error does not appear when I specify other optimization methods, nor when I use basinhopping with statsmodels' Logit model on binary dependent variables or with OrderedModel(distr='logit') on ordered categorical dependent variables, so I'm not really sure what's going wrong. Any help in resolving this error would be greatly appreciated.
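For what it's worth, the last traceback line shows the mismatch: scipy's basinhopping calls its callback with three arguments (x, f, accept), while the callback statsmodels hands it is a one-argument lambda. Until that is fixed upstream, a minimal sketch of a workaround, assuming you only need the optimized parameters rather than the fitted results object, is to run scipy's basinhopping directly on the model's negative log-likelihood:

import numpy as np
import statsmodels.api as sm
from scipy import optimize

x = np.random.randint(0, 100, 1000)
y = np.random.randint(0, 3, 1000)
model = sm.MNLogit(y, sm.add_constant(x))

# one coefficient per exog column for each of the J - 1 non-base categories
n_params = model.exog.shape[1] * (len(np.unique(y)) - 1)

res = optimize.basinhopping(
    lambda params: -model.loglike(params),  # minimize the negative log-likelihood
    np.zeros(n_params),
    minimizer_kwargs={"method": "BFGS"},
    niter=25,
)
print(res.x)  # flattened coefficients; no results.summary() this way

This bypasses model.fit(), so you lose the summary table, but it confirms whether basinhopping itself can optimize the likelihood.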

Related

MinMaxScaler for dataframe: ValueError: setting an array element with a sequence

I am preprocessing data to apply K-means clustering to time-series data grouped by hour. When I normalize the data, it shows this error:
Traceback (most recent call last):
File ".venv\lib\site-packages\pandas\core\series.py", line 191, in wrapper
raise TypeError(f"cannot convert the series to {converter}")
TypeError: cannot convert the series to <class 'float'>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File ".venv\timesequence.py", line 210, in <module>
matrix = pd.DataFrame(scaler.fit_transform(x_calls), columns=df_hours.columns, index=df_hours.index)
File ".venv\lib\site-packages\sklearn\base.py", line 867, in fit_transform
return self.fit(X, **fit_params).transform(X)
File ".venv\lib\site-packages\sklearn\preprocessing\_data.py", line 420, in fit
return self.partial_fit(X, y)
File ".venv\lib\site-packages\sklearn\preprocessing\_data.py", line 457, in partial_fit
X = self._validate_data(
File ".venv\lib\site-packages\sklearn\base.py", line 577, in _validate_data
X = check_array(X, input_name="X", **check_params)
File ".venv\lib\site-packages\sklearn\utils\validation.py", line 856, in check_array
array = np.asarray(array, order=order, dtype=dtype)
File ".venv\lib\site-packages\pandas\core\generic.py", line 2064, in __array__
return np.asarray(self._values, dtype=dtype)
ValueError: setting an array element with a sequence.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

#-------------------- Preprocessing ds
counter_ = 0
zero = 0
df_hours = pd.DataFrame({
    'Hour': [],
    'SumView': [],
    'CountStudent': []
}, dtype=object)
while counter_ < 24:
    if counter_ in sub_data_hour['Hour']:
        row = sub_data_hour.loc[(pd.to_numeric(sub_data_hour['Hour'], errors='coerce')) == counter_]
        df_hours.loc[len(df_hours.index)] = [counter_, row['SumView'], row['CountStudent']]
    else:
        df_hours.loc[len(df_hours.index)] = [counter_, zero, zero]
    counter_ += 1

#---------- Normalize dataset ------------
x_calls = df_hours.columns[2:]
scaler = MinMaxScaler()
matrix = pd.DataFrame(scaler.fit_transform(df_hours[x_calls]), columns=x_calls, index=df_hours.index)
I tried .to_numpy(), .values, and [['column1','column2']], following the post "pandas dataframe columns scaling with sklearn", but it did not work. Could anyone please help me fix this? Thanks.
The problem here is the datatype of the df_hours I preprocessed: each cell in those columns holds a one-element Series rather than a scalar, so the DataFrame ends up with object dtype.
Solution: change row['SumView'] to row['SumView'].values[0] and do the same with row['CountStudent'].
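A minimal sketch of the corrected loop body (names as in the question), so that MinMaxScaler receives plain numbers instead of object-dtype cells:

# store scalars, not one-element Series, in each cell
df_hours.loc[len(df_hours.index)] = [
    counter_,
    row['SumView'].values[0],
    row['CountStudent'].values[0],
]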

KDE - Is there something wrong in scipy or numpy? Or is it something I am doing?

I am simply trying to follow an example: https://medium.com/swlh/how-to-analyze-volume-profiles-with-python-3166bb10ff24
I am only on the second step and I am getting errors. Here is my code:
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from scipy import stats

# Load data
df = botc.ib.data_saver.get_df(SYMBOL.lower())
# Separate for vol prof
volume = np.asarray(df['Volume'])
close = np.asarray(df['Close'])
print("Close:")
print(close)
print("VOLUME:")
print(volume)
# Plot volume profile based on close
px.histogram(df, x="Volume", y="Close", nbins=150, orientation='h').show()
# Kernel Density Estimator
kde_factor = 0.05
num_samples = 500
kde = stats.gaussian_kde(close, weights=volume, bw_method=kde_factor)
xr = np.linspace(close.min(), close.max(), num_samples)
kdy = kde(xr)
ticks_per_sample = (xr.max() - xr.min()) / num_samples
def get_dist_plot(c, v, kx, ky):
    fig = go.Figure()
    fig.add_trace(go.Histogram(name="Vol Profile", x=c, y=v, nbinsx=150,
                               histfunc='sum', histnorm='probability density'))
    fig.add_trace(go.Scatter(name="KDE", x=kx, y=ky, mode='lines'))
    return fig
get_dist_plot(close, volume, xr, kdy).show()
And here are the errors:
Traceback (most recent call last):
File "C:/Users/Jagel/PycharmProjects/VolumeBotv1-1-1/main.py", line 80, in <module>
start_bot()
File "C:/Users/Jagel/PycharmProjects/VolumeBotv1-1-1/main.py", line 64, in start_bot
kde = stats.gaussian_kde(close, weights=volume, bw_method=kde_factor)
File "M:\PROGRAMS\Anacondaa\envs\MLStockBot2\lib\site-packages\scipy\stats\_kde.py", line 207, in __init__
self.set_bandwidth(bw_method=bw_method)
File "M:\PROGRAMS\Anacondaa\envs\MLStockBot2\lib\site-packages\scipy\stats\_kde.py", line 555, in set_bandwidth
self._compute_covariance()
File "M:\PROGRAMS\Anacondaa\envs\MLStockBot2\lib\site-packages\scipy\stats\_kde.py", line 564, in _compute_covariance
self._data_covariance = atleast_2d(cov(self.dataset, rowvar=1,
File "<__array_function__ internals>", line 180, in cov
File "M:\PROGRAMS\Anacondaa\envs\MLStockBot2\lib\site-packages\numpy\lib\function_base.py", line 2680, in cov
avg, w_sum = average(X, axis=1, weights=w, returned=True)
File "<__array_function__ internals>", line 180, in average
File "M:\PROGRAMS\Anacondaa\envs\MLStockBot2\lib\site-packages\numpy\lib\function_base.py", line 550, in average
avg = np.multiply(a, wgt,
TypeError: can't multiply sequence by non-int of type 'float'
I have looked all over the internet for over an hour and haven't been able to solve this. Sorry if it is simple, but I'm starting to get quite angry, so any help is very much appreciated.
Other things I have tried: using different bw_method values and converting to a numpy array first.
I don't know your data, but I can reproduce the error as follows:
>>> [5] * 0.1
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_18536/2403475853.py in <module>
----> 1 [5] * 0.1
TypeError: can't multiply sequence by non-int of type 'float'
So check your data: I suspect that in some rows the column holds array-like data (a list or array) instead of a scalar.
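A quick, hedged way to test that theory on the data from the question (the column names come from the post; the check itself is just a suggestion):

import numpy as np
import pandas as pd

# count cells that hold sequences instead of scalars
for col in ('Close', 'Volume'):
    bad = df[col].map(lambda v: isinstance(v, (list, tuple, np.ndarray)))
    print(col, "cells holding sequences:", int(bad.sum()))

# coercing to numeric turns anything non-scalar into NaN, which makes
# the offending rows easy to find before calling gaussian_kde
close_num = pd.to_numeric(df['Close'], errors='coerce')
print(df[close_num.isna()])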

Why am I getting ValueError: zero-size array to reduction operation maximum which has no identity?

I am using the following code.
import pmdarima as pm
import numpy as np
import pandas as pd
data = pd.read_csv('data.csv', index_col=0, header=None)
split = int(0.8 * data.shape[0])
train, test = data.iloc[:split], data.iloc[split:]
arima = pm.auto_arima(train, error_action='ignore', trace=1, suppress_warnings=True, seasonal=True, m=12)
This gives me a ValueError:
ValueError: zero-size array to reduction operation maximum which has no identity
However, when I switch the line
arima = pm.auto_arima(train, error_action='ignore', trace=1, suppress_warnings=True, seasonal=True, m=12)
to
arima = pm.auto_arima(data, error_action='ignore', trace=1, suppress_warnings=True, seasonal=True, m=12)
I get no errors.
The data is a pandas DataFrame with two columns, one containing dates and the other integers. data and train both seem to be of the same type, so I don't know what is causing this error.
Edit: Full error message with traceback
Traceback (most recent call last):
File "C:/Users/user.a/PycharmProjects/forecastingtrial/example.py", line 17, in <module>
arima = pm.auto_arima(train, error_action='ignore', trace=1, suppress_warnings=True, seasonal=True, m=12)
File "C:\Users\user.a\PycharmProjects\forecastingtrial\venv\lib\site-packages\pmdarima\arima\auto.py", line 394, in auto_arima
**seasonal_test_args)
File "C:\Users\user.a\PycharmProjects\forecastingtrial\venv\lib\site-packages\pmdarima\arima\utils.py", line 112, in nsdiffs
dodiff = testfunc(x)
File "C:\Users\user.a\PycharmProjects\forecastingtrial\venv\lib\site-packages\pmdarima\arima\seasonality.py", line 569, in estimate_seasonal_differencing_term
stat = self._compute_test_statistic(x)
File "C:\Users\user.a\PycharmProjects\forecastingtrial\venv\lib\site-packages\pmdarima\arima\seasonality.py", line 509, in _compute_test_statistic
fit = self._fit_ocsb(x, m, lag_term, maxlag)
File "C:\Users\user.a\PycharmProjects\forecastingtrial\venv\lib\site-packages\pmdarima\arima\seasonality.py", line 465, in _fit_ocsb
ar_fit = sm.OLS(y, add_constant(mf)).fit(method='qr')
File "C:\Users\user.a\PycharmProjects\forecastingtrial\venv\lib\site-packages\statsmodels\tools\tools.py", line 305, in add_constant
is_nonzero_const = np.ptp(x, axis=0) == 0
File "<__array_function__ internals>", line 6, in ptp
File "C:\Users\user.a\PycharmProjects\forecastingtrial\venv\lib\site-packages\numpy\core\fromnumeric.py", line 2496, in ptp
return _methods._ptp(a, axis=axis, out=out, **kwargs)
File "C:\Users\user.a\PycharmProjects\forecastingtrial\venv\lib\site-packages\numpy\core\_methods.py", line 230, in _ptp
umr_maximum(a, axis, None, out, keepdims),
ValueError: zero-size array to reduction operation maximum which has no identity
Process finished with exit code 1
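Not an authoritative answer, but the traceback narrows it down: the failure happens inside pmdarima's OCSB seasonal-differencing test, where the lagged design matrix handed to add_constant has ended up with zero rows. That usually means the training slice is too short (or was mis-parsed) relative to m=12. A small diagnostic sketch, assuming the same data.csv layout as in the question:

import pandas as pd

data = pd.read_csv('data.csv', index_col=0, header=None)
print(data.shape)   # how many rows and columns actually got parsed?
print(data.dtypes)  # the remaining column should be numeric
print(data.head())

split = int(0.8 * data.shape[0])
train = data.iloc[:split]
# with m=12 the OCSB test consumes seasonal lags, so train should span
# several full seasonal cycles; a handful of rows will not be enough
print("train rows:", len(train))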

Librosa feature tonnetz ends up in TypeError

I'm trying to extract tonnetz features from the harmonic components of my audio. My code is basically a copy-paste from the tutorial: https://librosa.github.io/librosa/generated/librosa.feature.tonnetz.html
My code:
import librosa

def extract_feature(file_name):
    y, sr = librosa.load(file_name)
    y = librosa.effects.harmonic(y)
    tonnetz = librosa.feature.tonnetz(y=y, sr=sr)
    return tonnetz

print extract_feature("out.wav")
Here's the stack trace:
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/librosa/core/pitch.py:160: DeprecationWarning: object of type <type 'numpy.float64'> cannot be safely interpreted as an integer.
bins = np.linspace(-0.5, 0.5, np.ceil(1./resolution), endpoint=False)
Traceback (most recent call last):
File "test_python.py", line 10, in <module>
print extract_feature("out.wav")
File "test_python.py", line 6, in extract_feature
tonnetz = librosa.feature.tonnetz(y=y, sr=sr)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/librosa/feature/spectral.py", line 1157, in tonnetz
chroma = chroma_cqt(y=y, sr=sr)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/librosa/feature/spectral.py", line 936, in chroma_cqt
real=False))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/librosa/core/constantq.py", line 251, in cqt
cqt_resp.append(__cqt_response(my_y, n_fft, my_hop, fft_basis))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/librosa/core/constantq.py", line 531, in __cqt_response
D = stft(y, n_fft=n_fft, hop_length=hop_length, window=np.ones)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/librosa/core/spectrum.py", line 167, in stft
y_frames = util.frame(y, frame_length=n_fft, hop_length=hop_length)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/librosa/util/utils.py", line 102, in frame
strides=(y.itemsize, hop_length * y.itemsize))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/stride_tricks.py", line 102, in as_strided
array = np.asarray(DummyArray(interface, base=x))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py", line 531, in asarray
return array(a, dtype, copy=False, order=order)
TypeError: 'float' object cannot be interpreted as an index
Any idea how to fix this?
I rolled back to numpy 1.10.1 to fix this issue (although I was running chroma_cqt when I hit the error). In numpy > 1.11, as_strided builds a DummyArray, and asarray does not handle the float-valued strides properly; numpy 1.10.1 apparently has more forgiving stride_tricks.
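For what it's worth, the class of failure is easy to reproduce on a recent numpy; the float stride below stands in for the hop_length * y.itemsize value in librosa's frame():

import numpy as np
from numpy.lib.stride_tricks import as_strided

y = np.arange(10, dtype=np.float32)
try:
    # 2.0 * y.itemsize is a float, like the stride librosa computed
    as_strided(y, shape=(2, 5), strides=(y.itemsize, 2.0 * y.itemsize))
except TypeError as e:
    print(e)  # floats are rejected where integer strides are required

Upgrading librosa (whose later releases cast these values to int) should therefore also work; that is an assumption to verify against the librosa changelog rather than a tested fix.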

Classifier.fit for OneClassSVM complaining about float type: TypeError: a float is required

I'm trying to fit two One-Class SVMs to small sets of data, called m1 and m2 respectively. m1 and m2 are lists of decimals, which are converted to numpy arrays of float type, t1 and t2.
When I attempt to fit the OneClassSVM models to these data, I see errors saying that the fit function will only accept a float. Can someone help me fix this problem?
Example Values:
m1 =[0.020000000000000018, 0.22799999999999998, 0.15799999999999992, 0.18999999999999995, 0.264]
m2 = [0.1279999999999999, 0.07400000000000007, 0.75, 1.0, 1.0]
Code below:
import numpy as np
import sklearn.svm

t1, t2 = [], []  # accumulators for the float-converted values
classifier1 = sklearn.svm.OneClassSVM(kernel='linear', nu='0.5', gamma='auto')
classifier2 = sklearn.svm.OneClassSVM(kernel='linear', nu='0.5', gamma='auto')
for x in xrange(len(m1)):
    print " Iteration " + str(x)
    t1.append(float(m1[x]))
    t2.append(float(m2[x]))
tx = np.array(t1).astype(float)
ty = np.array(t2).astype(float)
t1 = np.r_[tx + 1.0, tx - 1.0]
t2 = np.r_[ty + 1.0, ty - 1.0]
print t1
print t2
clfit1 = classifier1.fit(t1.astype(float))
clfit2 = classifier2.fit(t2.astype(float))
Error on commandline:
/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
Traceback (most recent call last):
File "normalize_data.py", line 108, in <module>
main()
File "normalize_data.py", line 15, in main
trainSVM(result1[0],yval1,result2[0],yval2,0.04)
File "normalize_data.py", line 99, in trainSVM
clfit1 = classifier1.fit(t1.astype(float))
File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/classes.py", line 1029, in fit
**params)
File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py", line 193, in fit
fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)
File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py", line 251, in _dense_fit
max_iter=self.max_iter, random_seed=random_seed)
File "sklearn/svm/libsvm.pyx", line 59, in sklearn.svm.libsvm.fit (sklearn/svm/libsvm.c:1571)
TypeError: a float is required
I made an error and set nu as a string instead of a float.
Passing a float (e.g. nu=0.05) instead of the string fixes the problem.
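A corrected sketch (using rounded m1 values from the question; the reshape follows the DeprecationWarning in the traceback, since scikit-learn wants a 2-D feature matrix):

import numpy as np
from sklearn.svm import OneClassSVM

m1 = [0.02, 0.228, 0.158, 0.19, 0.264]
tx = np.array(m1, dtype=float)
t1 = np.r_[tx + 1.0, tx - 1.0]

clf = OneClassSVM(kernel='linear', nu=0.05, gamma='auto')  # nu as a float
clf.fit(t1.reshape(-1, 1))  # one feature per row
print(clf.predict(t1.reshape(-1, 1)))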
