Scipy Kmeans exits with TypeError

When running the code below, I'm getting a TypeError that says:
"File "_vq.pyx", line 342, in scipy.cluster._vq.update_cluster_means
TypeError: type other than float or double not supported"
from PIL import Image
import scipy, scipy.misc, scipy.cluster
NUM_CLUSTERS = 5
im = Image.open('d:/temp/test.jpg')
ar = scipy.misc.fromimage(im)
shape = ar.shape
ar = ar.reshape(scipy.product(shape[:2]), shape[2])
codes, dist = scipy.cluster.vq.kmeans(ar, NUM_CLUSTERS)
vecs, dist = scipy.cluster.vq.vq(ar, codes)
counts, bins = scipy.histogram(vecs, len(codes))
peak = codes[scipy.argmax(counts)]
print 'Most frequent color: %s (#%s)' % (peak, ''.join(chr(c) for c in peak).encode('hex'))
I have no idea how to fix this.
Update:
Full traceback:
Traceback (most recent call last):
File "...\temp.py", line 110, in <module>
codes, dist = scipy.cluster.vq.kmeans2(ar, NUM_CLUSTERS)
File "...\site-packages\scipy\cluster\vq.py", line 642, in kmeans2
new_code_book, has_members = _vq.update_cluster_means(data, label, nc)
File "_vq.pyx", line 342, in scipy.cluster._vq.update_cluster_means
TypeError: type other than float or double not supported

Doing:
ar = ar.reshape(scipy.product(shape[:2]), shape[2])
print(ar.dtype)
you will see that you are calling kmeans with data of type uint8.
As k-means is, in theory, defined on d-dimensional real vectors, scipy does not accept integer data either (as the error says)!
So just do:
ar = ar.reshape(scipy.product(shape[:2]), shape[2]).astype(float)
Casting like that makes my example run up to the print, which also needs to be changed to handle the new float types.
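Putting the answer together, here is a Python 3 sketch of the same pipeline with the float cast in place and a print adapted to float centroids. It is an illustration under assumptions, not the asker's exact program: a synthetic random image stands in for d:/temp/test.jpg, and the scipy.misc.fromimage, scipy.product and scipy.histogram helpers (deprecated or removed in recent SciPy releases) are replaced by plain NumPy, with np.bincount counting cluster members.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

NUM_CLUSTERS = 5

# Assumption: a synthetic 20x20 RGB uint8 image instead of a decoded JPEG
rng = np.random.default_rng(0)
ar = rng.integers(0, 256, size=(20, 20, 3), dtype=np.uint8)

shape = ar.shape
# Flatten to (n_pixels, n_channels) and cast to float: uint8 data is rejected
pixels = ar.reshape(-1, shape[2]).astype(float)

codes, dist = kmeans(pixels, NUM_CLUSTERS)  # may return fewer than 5 centroids
labels, _ = vq(pixels, codes)               # nearest centroid for each pixel
counts = np.bincount(labels, minlength=len(codes))
peak = codes[np.argmax(counts)]

# Centroids are floats now, so round them back to 0-255 before formatting
r, g, b = (int(round(c)) for c in peak)
print('Most frequent color: %s (#%02x%02x%02x)' % (peak, r, g, b))
```

The `%02x` formatting replaces the Python 2 only `''.join(chr(c) ...).encode('hex')` idiom, which would not work on float centroids anyway.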

Related

KDE - Is there something wrong in scipy or numpy? Or is it something I am doing?

I am simply trying to follow an example: https://medium.com/swlh/how-to-analyze-volume-profiles-with-python-3166bb10ff24
I am only on the second step and I am getting errors. Here is my code:
# Load data
df = botc.ib.data_saver.get_df(SYMBOL.lower())
# Separate for vol prof
volume = np.asarray(df['Volume'])
close = np.asarray(df['Close'])
print("Close:")
print(close)
print("VOLUME:")
print(volume)
# Plot volume profile based on close
px.histogram(df, x="Volume", y="Close", nbins=150, orientation='h').show()
# Kernel Density Estimator
kde_factor = 0.05
num_samples = 500
kde = stats.gaussian_kde(close, weights=volume, bw_method=kde_factor)
xr = np.linspace(close.min(), close.max(), num_samples)
kdy = kde(xr)
ticks_per_sample = (xr.max() - xr.min()) / num_samples
def get_dist_plot(c, v, kx, ky):
    fig = go.Figure()
    fig.add_trace(go.Histogram(name="Vol Profile", x=c, y=v, nbinsx=150,
                               histfunc='sum', histnorm='probability density'))
    fig.add_trace(go.Scatter(name="KDE", x=kx, y=ky, mode='lines'))
    return fig
get_dist_plot(close, volume, xr, kdy).show()
And here are the errors:
Traceback (most recent call last):
File "C:/Users/Jagel/PycharmProjects/VolumeBotv1-1-1/main.py", line 80, in <module>
start_bot()
File "C:/Users/Jagel/PycharmProjects/VolumeBotv1-1-1/main.py", line 64, in start_bot
kde = stats.gaussian_kde(close, weights=volume, bw_method=kde_factor)
File "M:\PROGRAMS\Anacondaa\envs\MLStockBot2\lib\site-packages\scipy\stats\_kde.py", line 207, in __init__
self.set_bandwidth(bw_method=bw_method)
File "M:\PROGRAMS\Anacondaa\envs\MLStockBot2\lib\site-packages\scipy\stats\_kde.py", line 555, in set_bandwidth
self._compute_covariance()
File "M:\PROGRAMS\Anacondaa\envs\MLStockBot2\lib\site-packages\scipy\stats\_kde.py", line 564, in _compute_covariance
self._data_covariance = atleast_2d(cov(self.dataset, rowvar=1,
File "<__array_function__ internals>", line 180, in cov
File "M:\PROGRAMS\Anacondaa\envs\MLStockBot2\lib\site-packages\numpy\lib\function_base.py", line 2680, in cov
avg, w_sum = average(X, axis=1, weights=w, returned=True)
File "<__array_function__ internals>", line 180, in average
File "M:\PROGRAMS\Anacondaa\envs\MLStockBot2\lib\site-packages\numpy\lib\function_base.py", line 550, in average
avg = np.multiply(a, wgt,
TypeError: can't multiply sequence by non-int of type 'float'
I have looked all over the internet for over an hour and haven't been able to solve this. Sorry if it is simple, but I'm starting to get quite angry, so any help is very much appreciated.
Other things I have tried: using different bw_methods and converting to a NumPy array first.

I don't know your data, but I can reproduce the same error as follows:
>>> [5] * 0.1
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_18536/2403475853.py in <module>
----> 1 [5] * 0.1
TypeError: can't multiply sequence by non-int of type 'float'
So check your data: I suspect that some rows of that column contain sequences (lists or arrays) instead of plain numbers.
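In the traceback, np.average multiplies the dataset (close) by the weights, which suggests the sequence sits in close rather than in volume. The sketch below shows one way to find and clean such rows before calling stats.gaussian_kde. The data is made up: one hypothetical close value is wrapped in a list, which gives the array an object dtype, and the cleanup assumes each bad entry wraps exactly one number.

```python
import numpy as np
from scipy import stats

# Hypothetical data: the third "close" value is a list instead of a scalar
raw_close = [10.0, 10.5, [11.0], 10.8, 10.2, 10.9]
close = np.empty(len(raw_close), dtype=object)
for i, v in enumerate(raw_close):
    close[i] = v
volume = np.array([100.0, 250.0, 300.0, 120.0, 80.0, 90.0])

# An object dtype is a strong hint that some element is not a plain number
print(close.dtype)

# Locate the offending rows: scalars have ndim 0, sequences do not
bad_rows = [i for i, v in enumerate(close) if np.ndim(v) != 0]
print(bad_rows)

# One possible fix, assuming every bad entry holds exactly one value
clean_close = np.array([np.asarray(v).item() for v in close], dtype=float)
kde = stats.gaussian_kde(clean_close, weights=volume, bw_method=0.05)
densities = kde(np.linspace(clean_close.min(), clean_close.max(), 5))
print(densities)
```

If the data comes from a pandas DataFrame, checking `df['Close'].dtype` for `object` is the same first diagnostic.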

pymc3 - stochastic volatility model with latent AR(1) process

I've been trying to implement and estimate, with pymc3, a basic stochastic volatility (SV) model of the following form:
r_t = exp{h_t/2}*e_t
h_t = r_0 + r_1*h_{t-1} + n_t
where r_t is the return process and h_t the (latent) log-variance process following an AR(1) process. My code (MWE) for this looks as follows:
import numpy as np
import pymc3 as pm
# simulate some random data
np.random.seed(13)
data = np.random.randn(10)
# SV model with AR
with pm.Model() as model:
    nu = 2
    rho = pm.Uniform("rho", -1, 1)
    h = pm.AR("h", rho=rho, sigma=1, shape=len(data))
    volatility_process = pm.Deterministic(
        "volatility_process", pm.math.exp(h / 2) ** 0.5
    )
    r = pm.StudentT("r", nu=nu, sigma=volatility_process, observed=data)
    prior = pm.sample_prior_predictive(10)
    # trace = pm.sample(10)
But running the above results in the following error message:
Traceback (most recent call last):
File "C:\Users\jrilla\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\distributions\distribution.py", line 801, in _draw_value
return dist_tmp.random(point=point, size=size)
File "C:\Users\jrilla\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\distributions\continuous.py", line 1979, in random
point=point, size=size)
File "C:\Users\jrilla\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\distributions\distribution.py", line 638, in draw_values
raise ValueError('Cannot resolve inputs for {}'.format([str(params[j]) for j in to_eval]))
ValueError: Cannot resolve inputs for ['Elemwise{mul,no_inplace}.0']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 9, in <module>
File "C:\Users\jrilla\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\sampling.py", line 1495, in sample_prior_predictive
values = draw_values([model[name] for name in names], size=samples)
File "C:\Users\jrilla\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\distributions\distribution.py", line 620, in draw_values
size=size)
File "C:\Users\jrilla\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\distributions\distribution.py", line 810, in _draw_value
size=None))
File "C:\Users\jrilla\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\distributions\continuous.py", line 1979, in random
point=point, size=size)
File "C:\Users\jrilla\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\distributions\distribution.py", line 638, in draw_values
raise ValueError('Cannot resolve inputs for {}'.format([str(params[j]) for j in to_eval]))
ValueError: Cannot resolve inputs for ['Elemwise{mul,no_inplace}.0']
It is specifically the prior = ... line that causes the error. Note that I am using pm.AR() instead of pm.AR1(), but the latter results in the same error. I don't really understand why it does not work as expected. I am able to run the (simplified) SV example provided in the pymc3 documentation:
# SV model with GaussianRandomWalk
with pm.Model() as model:
    nu = 2
    sigma = pm.Exponential("sigma", 1.0, testval=1.0)
    s = pm.GaussianRandomWalk("s", sigma=sigma, shape=len(data))
    volatility_process = pm.Deterministic(
        "volatility_process", pm.math.exp(-2 * s) ** 0.5
    )
    r = pm.StudentT("r", nu=nu, sigma=volatility_process, observed=data)
    prior = pm.sample_prior_predictive(10)
    # trace = pm.sample(10)
where they show the example for a Gaussian random walk (GRW) instead of the general AR process I want to use. Since the GRW is just a special case of an AR process and the example works for the GRW, I don't see why it shouldn't also work for the general AR. As can be seen in the code, I basically just replace pm.GaussianRandomWalk(...) with pm.AR(...) (each with its required arguments). I am also able to implement and estimate the AR process on its own:
# Simple AR
with pm.Model() as model:
    rho = pm.Uniform("rho", -1, 1)
    h = pm.AR("h", rho=rho, sigma=1, shape=len(data), observed=data)
    prior = pm.sample_prior_predictive(10)
    # trace = pm.sample(10)
which works fine as well, so I assume I am not making a mistake in defining the AR. The error only arises when the AR is used as the latent process. The pymc3 documentation covers both the GRW and AR distributions.
Any idea on what the issue is here or what I'm doing wrong?
Thanks!

I asked the same question on the pymc3 Discourse forum. The developers responded that the reason for the above error is that pm.sample_prior_predictive needs the distribution's random() method, which is only implemented for the GRW. However, pm.sample works just fine.
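Since the missing piece is a random() method, one workaround (my own assumption, not something the pymc3 developers suggested) is to draw prior samples by hand with NumPy. The sketch follows the model equations from the question with r_0 = 0 and r_1 = rho, and assumes h_0 = 0, standard normal n_t and Student-t(nu=2) e_t to mirror the MWE. It uses exp(h_t / 2) as written in the equations, not the exp(h / 2) ** 0.5 from the MWE code. sample_sv_prior is a hypothetical helper.

```python
import numpy as np

def sample_sv_prior(n_steps, n_samples, nu=2.0, seed=13):
    """Draw prior-predictive paths of the AR(1) SV model by hand.

    Assumptions: rho ~ Uniform(-1, 1), h_0 = 0, no intercept,
    n_t ~ N(0, 1) and e_t ~ Student-t(nu).
    """
    rng = np.random.default_rng(seed)
    rho = rng.uniform(-1.0, 1.0, size=n_samples)
    h = np.zeros((n_samples, n_steps))
    prev = np.zeros(n_samples)
    for t in range(n_steps):
        # h_t = rho * h_{t-1} + n_t, one independent path per prior draw
        prev = rho * prev + rng.standard_normal(n_samples)
        h[:, t] = prev
    # r_t = exp(h_t / 2) * e_t
    r = np.exp(h / 2) * rng.standard_t(nu, size=(n_samples, n_steps))
    return rho, h, r

rho, h, r = sample_sv_prior(n_steps=10, n_samples=10)
print(r.shape)  # (10, 10)
```

Each row is one prior draw of a return path of the same length as data, which is roughly what sample_prior_predictive(10) would have returned for r.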

Getting ValueError: setting an array element with a sequence from tf.contrib.keras.preprocessing.image.ImageDatagenerator.flow

I am trying to do data augmentation in TensorFlow. I have written this code:
import numpy as np
import tensorflow as tf
import tensorflow.contrib.keras as keras
import time, random
def get_image_data_generator():
    return keras.preprocessing.image.ImageDataGenerator(
        rotation_range=get_random_rotation_angle(),
        width_shift_range=get_random_wh_shift(),
        height_shift_range=get_random_wh_shift(),
        shear_range=get_random_shear(),
        zoom_range=get_random_zoom(),
        horizontal_flip=get_random_flip(),
        vertical_flip=get_random_flip(),
        preprocessing_function=get_random_function())

def augment_data(image_array,label_array):
    print image_array.shape
    images_array = image_array.copy()
    labels_array = label_array.copy()
    #Create a list of various datagenerators with different arguments
    datagenerators = []
    ndg = 10
    #Creating 10 different generators
    for ndata in xrange(ndg):
        datagenerators.append(get_image_data_generator())
    #Setting batch_size to be equal to no.of images
    bsize = image_array.shape[0]
    print bsize
    #Obtaining the augmented data
    for dgen in datagenerators:
        dgen.fit(image_array)
        (aug_img,aug_label) = dgen.flow(image_array,label_array,batch_size=bsize,shuffle=True)
        print aug_img.shape
        #Concatenating with the original data
        images_array = np.concatenate([images_array,aug_img],axis=0)
        labels_array = np.concatenate([labels_array,aug_label],axis=0)
    return (images_array,labels_array)
When I run the code using
augment_data(image_array,label_array)
I get an error which says
Traceback (most recent call last):
File "cnn_model.py", line 40, in <module>
images_array,labels_array = augment_data(image_array,label_array)
File "/media/siladittya/d801fb13-809a-41b4-8f2e-d617ba103aba/ISI/code/2. known_object_detection/aug_data.py", line 47, in augment_data
(aug_img,aug_label) = dgen.flow(image_array,label_array,batch_size=10000,shuffle=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/_impl/keras/preprocessing/image.py", line 1018, in next
return self._get_batches_of_transformed_samples(index_array)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/_impl/keras/preprocessing/image.py", line 991, in _get_batches_of_transformed_samples
batch_x[i] = x
ValueError: setting an array element with a sequence.
Edit :: I am getting this error even if I pass a single image as argument.
What am I doing wrong here? I can't understand. Please help.

Regarding "Edit :: I am getting this error even if I pass a single image as argument.": can you pass the single element wrapped in an array and see whether the error persists? For example:
image_array, label_array = augment_data(np.array([image]), np.array([label]))
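One common cause of "setting an array element with a sequence" at batch_x[i] = x is an image_array with object dtype, for example one built from images of different sizes, so each image cannot be copied into the preallocated batch. That is an assumption about the asker's data. The hypothetical helper to_image_batch below checks for it and builds a proper (N, H, W, C) float array before anything reaches ImageDataGenerator.

```python
import numpy as np

def to_image_batch(images):
    """Stack images into one (N, H, W, C) float32 array.

    Raises a readable error when the images do not share one shape,
    instead of letting Keras fail later with an opaque ValueError.
    """
    shapes = {np.shape(img) for img in images}
    if len(shapes) != 1:
        raise ValueError("images have differing shapes: %s" % sorted(shapes))
    batch = np.stack([np.asarray(img, dtype=np.float32) for img in images])
    if batch.ndim == 3:
        # Grayscale images: add a trailing channel axis
        batch = batch[..., np.newaxis]
    return batch

single = np.zeros((32, 32, 3))
print(to_image_batch([single]).shape)  # (1, 32, 32, 3)
```

Note also that flow() returns an iterator rather than a single (images, labels) pair, so one batch is normally obtained with next(dgen.flow(x, y, batch_size=len(x))) instead of unpacking the iterator directly.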

Classifier.fit for oneclassSVM complaining about float Type. TypeError float is required

I'm trying to fit two One-Class SVMs to small sets of data, called m1 and m2 respectively. m1 and m2 are lists of decimals, which are converted to NumPy arrays of type float, t1 and t2.
When I attempt to fit the one-class SVMs to these sets of data, I see errors saying that the fit function will only accept a float. Can someone help me fix this problem?
Example Values:
m1 =[0.020000000000000018, 0.22799999999999998, 0.15799999999999992, 0.18999999999999995, 0.264]
m2 = [0.1279999999999999, 0.07400000000000007, 0.75, 1.0, 1.0]
Code below:
classifier1 =sklearn.svm.OneClassSVM(kernel='linear', nu ='0.5',gamma ='auto')
classifier2 = sklearn.svm.OneClassSVM(kernel='linear', nu ='0.5',gamma='auto')
for x in xrange(len(m1)):
    print" Iteration "+str(x)
    t1.append(float(m1[x]))
    t2.append(float(m2[x]))
tx = np.array(t1).astype(float)
ty = np.array(t2).astype(float)
t1 = np.r_[tx+1.0,tx-1.0]
t2 = np.r_[ty+1.0,ty-1.0]
print t1
print t2
clfit1 = classifier1.fit(t1.astype(float))
clfit2 = classifier2.fit(t2.astype(float))
Error on commandline:
/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
Traceback (most recent call last):
File "normalize_data.py", line 108, in <module>
main()
File "normalize_data.py", line 15, in main
trainSVM(result1[0],yval1,result2[0],yval2,0.04)
File "normalize_data.py", line 99, in trainSVM
clfit1 = classifier1.fit(t1.astype(float))
File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/classes.py", line 1029, in fit
**params)
File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py", line 193, in fit
fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)
File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py", line 251, in _dense_fit
max_iter=self.max_iter, random_seed=random_seed)
File "sklearn/svm/libsvm.pyx", line 59, in sklearn.svm.libsvm.fit (sklearn/svm/libsvm.c:1571)
TypeError: a float is required

I made an error and set nu as a string instead of a float.
Setting nu=0.05 fixes the problem.
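For reference, here is a sketch of the question's fit with nu passed as a float and the data reshaped to one feature per sample, as the DeprecationWarning suggests. It keeps the question's ±1.0 shift and uses nu=0.5 to match the original intent; fit_one_class is a hypothetical helper, and gamma='auto' is kept even though the linear kernel ignores it.

```python
import numpy as np
from sklearn.svm import OneClassSVM

m1 = [0.020000000000000018, 0.22799999999999998, 0.15799999999999992, 0.18999999999999995, 0.264]
m2 = [0.1279999999999999, 0.07400000000000007, 0.75, 1.0, 1.0]

def fit_one_class(values, nu=0.5):
    x = np.asarray(values, dtype=float)
    # Same augmentation as in the question: shift every value by +1 and -1
    x = np.r_[x + 1.0, x - 1.0]
    # A single feature per sample: reshape to (n_samples, 1)
    X = x.reshape(-1, 1)
    # nu must be a number, not the string '0.5'
    clf = OneClassSVM(kernel='linear', nu=nu, gamma='auto')
    return clf.fit(X), X

clf1, X1 = fit_one_class(m1)
clf2, X2 = fit_one_class(m2)
print(clf1.predict(X1))  # +1 for inliers, -1 for outliers
```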

Strange TypeError with Theano

Traceback (most recent call last):
File "test.py", line 37, in <module>
print convLayer1.output.shape.eval({x:xTrain})
File "/Volumes/TONY/anaconda/lib/python2.7/site-packages/theano/gof/graph.py", line 415, in eval
rval = self._fn_cache[inputs](*args)
File "/Volumes/TONY/anaconda/lib/python2.7/site-packages/theano/compile/function_module.py", line 513, in __call__
allow_downcast=s.allow_downcast)
File "/Volumes/TONY/anaconda/lib/python2.7/site-packages/theano/tensor/type.py", line 180, in filter
"object dtype", data.dtype)
TypeError
And here is my code:
import scipy.io as sio
import numpy as np
import theano.tensor as T
from theano import shared
from convnet3d import ConvLayer, NormLayer, PoolLayer, RectLayer
from mlp import LogRegr, HiddenLayer, DropoutLayer
from activations import relu, tanh, sigmoid, softplus
dataReadyForCNN = sio.loadmat("DataReadyForCNN.mat")
xTrain = dataReadyForCNN["xTrain"]
# xTrain = np.random.rand(10, 1, 5, 6, 2).astype('float64')
xTrain.shape
dtensor5 = T.TensorType('float64', (False,)*5)
x = dtensor5('x') # the input data
yCond = T.ivector()
# input = (nImages, nChannel(nFeatureMaps), nDim1, nDim2, nDim3)
kernel_shape = (5,6,2)
fMRI_shape = (51, 61, 23)
n_in_maps = 1 # channel
n_out_maps = 5 # num of feature maps, aka the depth of the neurons
num_pic = 2592
layer1_input = x
# layer1_input.eval({x:xTrain}).shape
# layer1_input.shape.eval({x:numpy.zeros((2592, 1, 51, 61, 23))})
convLayer1 = ConvLayer(layer1_input, n_in_maps, n_out_maps, kernel_shape, fMRI_shape,
num_pic, tanh)
print convLayer1.output.shape.eval({x:xTrain})
It is really weird: the error was not thrown in Jupyter (although it took a very long time to run and eventually the kernel died, and I really don't know why), but when I moved it to the shell and ran python fileName.py, the error was thrown.

The problem lies in loadmat from scipy. The TypeError you are getting is thrown by this code in Theano:
if not data.flags.aligned:
...
raise TypeError(...)
Now, when you create a new array in numpy from raw data, it would usually be aligned:
>>> a = np.array(2)
>>> a.flags.aligned
True
But if you savemat / loadmat it, the value of the flag gets lost:
>>> savemat('test', {'a':a})
>>> a2 = loadmat('test')['a']
>>> a2.flags.aligned
False
(seems like this particular issue is discussed here)
One quick and dirty way to address it is to create a new numpy array from the array you loaded:
>>> a2 = loadmat('test')['a']
>>> a3 = np.array(a2)
>>> a3.flags.aligned
True
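So, for your code, create a new array from the one you loaded (loadmat returns a dict, so wrap the array inside it, not the dict itself):
dataReadyForCNN = sio.loadmat("DataReadyForCNN.mat")
xTrain = np.array(dataReadyForCNN["xTrain"])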
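Whether loadmat returns unaligned arrays may depend on the SciPy version, so rather than relying on the savemat/loadmat round trip, the sketch below builds an unaligned float64 array directly (a view offset by one byte) and fixes it. It uses np.require as an alternative to np.array(...) that only copies when the input is not already aligned; ensure_aligned is a hypothetical helper.

```python
import numpy as np

# Deliberately build an unaligned float64 view by offsetting a byte buffer
buf = np.zeros(4 * 8 + 1, dtype=np.uint8)
unaligned = buf[1:].view(np.float64)
print(unaligned.flags.aligned)  # False on typical platforms

def ensure_aligned(a):
    """Return `a` unchanged if it is aligned, otherwise an aligned copy."""
    return np.require(a, requirements=['ALIGNED'])

fixed = ensure_aligned(unaligned)
print(fixed.flags.aligned)               # True
print(np.array_equal(fixed, unaligned))  # True
```

Applied to the question, that would be xTrain = ensure_aligned(dataReadyForCNN["xTrain"]).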
