numpy.unique throwing error

I'm trying to use the following function:
def randomChose(bp, xsteps, ysteps, bs):
    # Number of points to be chosen
    s = int((bp * xsteps * ysteps) / (bs * bs))
    # Generating an array representing the input indexes
    indices = numpy.arange(xsteps * ysteps)
    # Resampling without replacement
    cs = npr.choice(indices, size=s, replace=False)
    f = []
    for idx in cs:
        nb = indices[max(idx-(bs*bs/2), 0):min(idx+(bs*bs/2)+1, xsteps*ysteps)]
        f.append(nb)
    f = numpy.array(f).flatten()
    fix = numpy.unique(numpy.array(f))
    return fix
It takes as parameters a fraction bp (the percentage of blocks), the data dimensions xsteps * ysteps, and a block size bs.
What I want to do is choose a number of valid indexes, together with a neighborhood around each of them, in this image.
However, I intermittently get an error when calling numpy.unique:
ValueError Traceback (most recent call last)
<ipython-input-35-1b5914c3cbc7> in <module>()
9 svf_y = []
10 for s in range(samples):
---> 11 fix = randomChose(bp, xsteps, ysteps, bs)
12 rs_z0, rs_z1, rs_z2 = interpolate(len(fix), xsteps, ysteps, mean_rs)
13 ds_z0, ds_z1, ds_z2 = interpolate(len(fix), xsteps, ysteps, mean_ds)
<ipython-input-6-def08adce84b> in randomChose(bp, xsteps, ysteps, bs)
14 f.append(nb)
15 f = numpy.array(f).flatten()
---> 16 fix = numpy.unique(numpy.array(f))
17
18 return f
/usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.pyc in unique(ar, return_index, return_inverse, return_counts)
198 ar.sort()
199 aux = ar
--> 200 flag = np.concatenate(([True], aux[1:] != aux[:-1]))
201
202 if not optional_returns:
ValueError: all the input arrays must have same number of dimensions
This is how I call it:
nx = 57.2
ny = 24.0
xsteps = 144
ysteps = 106
bs = 5 # Block size
bp = 0.1 # Percentage of blocks
fix = randomChose(bp, xsteps, ysteps, bs)
I'm trying to understand what is wrong. As far as I understand, numpy.unique expects an ndarray as input, and that is what I am passing.
Thank you for any help.

First:
f.append(nb)
should become:
f.append(list(nb))
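That makes f a list of lists, which NumPy can convert to a 2-D array of integers, but only if all the lists have the same length. If they do not, you get a one-dimensional array of list objects (dtype object), and flatten() has no effect. (Recent NumPy versions refuse to build such a ragged array at all and raise a ValueError instead.)
To check, add
print(type(f[0]))
after the flattening: if it prints a list type rather than an integer type, the array was not flattened.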
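One way to sidestep the ragged-array problem is to join the slices with numpy.concatenate, which accepts 1-D arrays of different lengths. The following is a minimal sketch under a few assumptions, not the asker's exact code: the name random_choose, the rng parameter, and the use of integer division (//) for the half-width are introduced here for illustration.

```python
import numpy as np

def random_choose(bp, xsteps, ysteps, bs, rng=None):
    """Pick random centre indices and return the unique indices of their neighbourhoods."""
    # rng is an assumed optional argument so that results can be reproduced
    rng = np.random.default_rng() if rng is None else rng
    total = xsteps * ysteps
    s = int((bp * total) / (bs * bs))   # number of centres to draw
    half = (bs * bs) // 2               # integer half-width of each neighbourhood
    indices = np.arange(total)
    centres = rng.choice(indices, size=s, replace=False)
    # Slices near the edges are shorter than the others, so keep them in a plain list
    blocks = [indices[max(c - half, 0):min(c + half + 1, total)] for c in centres]
    if not blocks:                      # nothing was drawn
        return np.array([], dtype=indices.dtype)
    # concatenate joins 1-D arrays of unequal length into one flat array
    return np.unique(np.concatenate(blocks))

fix = random_choose(0.1, 144, 106, 5, rng=np.random.default_rng(0))
print(fix.shape, fix.dtype)
```

Because the slices are never stacked into a 2-D array, it no longer matters that those at the edges are shorter.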

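The problem is with the edges. E.g., if idx=0,
nb = indices[max(idx-(bs*bs/2), 0):min(idx+(bs*bs/2)+1, xsteps*ysteps)]
is truncated at the boundary: with bs=5 it holds only 13 indices (0 to 12) instead of the 25 that an interior idx gives. The entries of f then have different lengths, so numpy.array(f) cannot build a regular 2-D array and flatten() does not flatten it properly.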

Related

Index issue whilst running fmin_l_bfgs

I am trying to use the fmin_l_bfgs_b function from scipy.optimize in Python to maximize the log-likelihood function below:
def loglik(x0):
    p = np.zeros((NCS,1)) #vector to hold the probabilities for each observation
    data['v'] = (data.iloc[:, [3,4]]).dot(x0) #calculate deterministic utility
    for i in range(NCS):
        vv = data.v[(data.idcase == i + 1)]
        vy = data.v[(data.idcase == i + 1) & (data.depvar == 1)]
        p[i][0] = np.maximum(np.exp(vy)/ sum(np.exp(vv)),0.00000001)
    #print("p", p)
    ll = -sum(np.log(p)) #Negative since neg of ll is minimized
    return ll
The input data being used is:
data = pd.read_csv("drive/My Drive/example_data.csv") #read data
data.iloc[:, [3,4]] = data.iloc[:, [3,4]]/100 #scale costs
B = np.zeros((1,2)) #give starting values of beta; 1xK vector; 2alternatives so 1x2 vector
NCS = data['idcase'].nunique() # number of choice situations in the dataset
x0 = B.T
estimation
optim2 = fmin_l_bfgs_b(loglik, x0, fprime=None, args=(), approx_grad=0, bounds=None, m=10, factr=10000000.0, pgtol=1e-05, epsilon=1e-08,iprint=0, maxfun=15000, maxiter=15000, disp=None, callback=None)
However, I keep getting this:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-77-2821f2269a8c> in <module>()
83 print('which is the same as maximizing the log-likelihood.')
84
---> 85 optim2 = fmin_l_bfgs_b(loglik, x0, fprime=None, args=(), approx_grad=0, bounds=None, m=10, factr=10000000.0, pgtol=1e-05, epsilon=1e-08, iprint=0, maxfun=15000, maxiter=15000, disp=None, callback=None)
86
87 print(optim2)
4 frames
/usr/local/lib/python3.6/dist-packages/scipy/optimize/optimize.py in __call__(self, x, *args)
64 self.x = numpy.asarray(x).copy()
65 fg = self.fun(x, *args)
---> 66 self.jac = fg[1]
67 return fg[0]
68
IndexError: index 1 is out of bounds for axis 0 with size 1
Can someone kindly advise me on what to do? I am quite new to numerical optimization methods.
Thanks
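A note on the traceback above: the failing line self.jac = fg[1] shows SciPy trying to read a gradient out of the objective's return value. With fprime=None and approx_grad=0, fmin_l_bfgs_b expects the function to return both the value and the gradient; with approx_grad=True it expects only the value and estimates the gradient numerically. The sketch below illustrates the second option on a stand-in quadratic objective (neg_loglik and its target values are hypothetical, not the log-likelihood from the question).

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def neg_loglik(beta):
    # Hypothetical stand-in objective with its minimum at beta = (1, -2);
    # it returns a single float, not a (value, gradient) pair
    target = np.array([1.0, -2.0])
    return float(np.sum((beta - target) ** 2))

x0 = np.zeros(2)  # 1-D starting point

# approx_grad=True tells SciPy to approximate the gradient by finite differences,
# so the objective only has to return the function value
x_opt, f_opt, info = fmin_l_bfgs_b(neg_loglik, x0, approx_grad=True)
print(x_opt, f_opt)
```

The alternative is to keep approx_grad=0 and have the objective return a tuple (value, gradient), or to pass the gradient separately through fprime.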

Can't differentiate wrt numpy arrays of dtype int64?

I am new to numpy. Today, when I used it for a linear regression, I got the error below:
KeyError Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/autograd/numpy/numpy_extra.py in new_array_node(value, tapes)
84 try:
---> 85 return array_dtype_mappings[value.dtype](value, tapes)
86 except KeyError:
KeyError: dtype('int64')
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-4-aebe8f7987b0> in <module>()
24 return cost/float(np.size(y))
25
---> 26 weight_h, cost_h = gradient_descent(least_squares, alpha, max_its, w)
27
28 # a)
<ipython-input-2-1b74c4f818f4> in gradient_descent(g, alpha, max_its, w)
12 for k in range(max_its):
13 # evaluate the gradient
---> 14 grad_eval = gradient(w)
15
16 # take gradient descent step
~/anaconda3/lib/python3.6/site-packages/autograd/core.py in gradfun(*args, **kwargs)
19 #attach_name_and_doc(fun, argnum, 'Gradient')
20 def gradfun(*args,**kwargs):
---> 21 return backward_pass(*forward_pass(fun,args,kwargs,argnum))
22 return gradfun
23
~/anaconda3/lib/python3.6/site-packages/autograd/core.py in forward_pass(fun, args, kwargs, argnum)
57 tape = CalculationTape()
58 arg_wrt = args[argnum]
---> 59 start_node = new_node(safe_type(getval(arg_wrt)), [tape])
60 args = list(args)
61 args[argnum] = merge_tapes(start_node, arg_wrt)
~/anaconda3/lib/python3.6/site-packages/autograd/core.py in new_node(value, tapes)
185 def new_node(value, tapes=[]):
186 try:
--> 187 return Node.type_mappings[type(value)](value, tapes)
188 except KeyError:
189 return NoDerivativeNode(value, tapes)
~/anaconda3/lib/python3.6/site-packages/autograd/numpy/numpy_extra.py in new_array_node(value, tapes)
85 return array_dtype_mappings[value.dtype](value, tapes)
86 except KeyError:
---> 87 raise TypeError("Can't differentiate wrt numpy arrays of dtype {0}".format(value.dtype))
88 Node.type_mappings[anp.ndarray] = new_array_node
89
TypeError: Can't differentiate wrt numpy arrays of dtype int64
I really have no idea what is happening. I guess it might be related to the structure of arrays in numpy. Or did I forget to install a package? Below is my original code.
# import statements
datapath = 'datasets/'
from autograd import numpy as np
import matplotlib.pyplot as plt
# import automatic differentiator to compute gradient module
from autograd import grad
# gradient descent function
def gradient_descent(g,alpha,max_its,w):
    # compute gradient module using autograd
    gradient = grad(g)
    # run the gradient descent loop
    weight_history = [w] # weight history container
    cost_history = [g(w)] # cost function history container
    for k in range(max_its):
        # evaluate the gradient
        grad_eval = gradient(w)
        # take gradient descent step
        w = w - alpha*grad_eval
        # record weight and cost
        weight_history.append(w)
        cost_history.append(g(w))
    return weight_history,cost_history
# load in dataset
csvname = datapath + 'kleibers_law_data.csv'
data = np.loadtxt(csvname,delimiter=',')
# get input and output of dataset
x = data[:-1,:]
y = data[-1:,:]
x = np.log(x)
y = np.log(y)
#Data Initiation
alpha = 0.01
max_its = 1000
w = np.array([0,0])
#linear model
def model(x, w):
    a = w[0] + np.dot(x.T, w[1:])
    return a.T
def least_squares(w):
    cost = np.sum((model(x,w)-y)**2)
    return cost/float(np.size(y))
weight_h, cost_h = gradient_descent(least_squares, alpha, max_its, w)
# a)
k = np.linspace(-5.5, 7.5, 250)
y = weight_h[max_its][0] + k*weight_h[max_its][1]
plt.figure()
plt.plot(x, y, label='Linear Line', color='g')
plt.xlabel('log of mass')
plt.ylabel('log of metabolic rate')
plt.title("Answer Of a")
plt.legend()
plt.show()
# b)
w0 = weight_h[max_its][0]
w1 = weight_h[max_its][1]
print("Nonlinear relationship between the body mass x and the metabolic rate y is "
      + str(w0) + " + " + "log(xp)" + str(w1) + " = " + "log(yp)")
# c)
x2 = np.log(10)
Kj = np.exp(w0 + w1*x2)*1000/4.18
print("It needs " + str(Kj) + " calories")
Could someone help me to figure it out? Thanks a lot.

Here are the important parts of your error:
---> 14 grad_eval = gradient(w)
...
TypeError: Can't differentiate wrt numpy arrays of dtype int64
The gradient function refuses to differentiate with respect to an array of ints, which makes sense: a derivative is taken with respect to a continuous (real-valued) variable, so autograd expects a floating-point array. The simplest fix is to change your initializer from:
w = np.array([0,0])
which NumPy infers as an integer array, to:
w = np.array([0.0,0.0])
The decimal points tell NumPy that you want floats. There are other ways to specify the array type (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.array.html), such as passing dtype=float, but this is the simplest.
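The dtype inference described above can be checked directly. This short sketch uses plain NumPy only (no autograd), and the variable names are chosen here for illustration.

```python
import numpy as np

w_int = np.array([0, 0])                 # integer literals give an integer dtype
w_float = np.array([0.0, 0.0])           # float literals give float64
w_cast = np.array([0, 0], dtype=float)   # explicit dtype overrides the inference
w_zeros = np.zeros(2)                    # np.zeros defaults to float64
w_fixed = w_int.astype(float)            # convert an existing integer array

for name, arr in [("w_int", w_int), ("w_float", w_float), ("w_cast", w_cast),
                  ("w_zeros", w_zeros), ("w_fixed", w_fixed)]:
    print(name, arr.dtype)
```

Any of the last four arrays is a floating-point starting point of the kind the error message asks for.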

How can I map a vectorized function to a numpy array without using a for loop?

So here's what I already have:
import numpy as np
import matplotlib.pyplot as plt
def monteCarloPi(n):
    np.random.seed() #seed the random number generator
    y = np.random.rand(n)*2 - 1 #n random samples on (-1,1)
    x = np.linspace(-1,1,n) #x axis to plot against
    square = np.array([x,y]) #collecting axes as a single object
    mask1 = ((x**2 + y**2) < 1) #filters
    hits = np.sum(mask1) #calculating approximation
    ratio = hits/n
    pi_approx = ratio * 4
    return pi_approx
Here is what I would like to do:
x = np.arange(100,1000)
y = monteCarloPi(x)
plt.scatter(x,y)
However, when I run the above code block, I get the following error:
---------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-52-bf4dcedaa309> in <module>()
1 x = np.arange(100,1000)
----> 2 y = monteCarloPi(x)
3 plt.scatter(x,y)
<ipython-input-51-8d5b36e22d4b> in monteCarloPi(n)
1 def monteCarloPi(n):
2 np.random.seed() #seed the random number generator
----> 3 y = np.random.rand(n)*2 - 1 #n random samples on (-1,1)
4 x = np.linspace(-1,1,n) #x axis to plot against
5
mtrand.pyx in mtrand.RandomState.rand()
mtrand.pyx in mtrand.RandomState.random_sample()
mtrand.pyx in mtrand.cont0_array()
TypeError: only integer scalar arrays can be converted to a scalar index
Based on my understanding of how broadcasting works in numpy, this should work. I could just use a for loop but that gets really slow really quickly as the number of samples goes up.
halp

Here is one option: draw a single sample of the maximum size n, then take subsamples from it for the smaller sizes when start>0 (error handling not included).
import numpy as np
import matplotlib.pyplot as plt
def monteCarloPi(n,start=0,stride=1):
    np.random.seed() # seed the random number generator
    y = np.random.rand(n)*2 - 1 # n random samples on (-1,1)
    x = np.linspace(-1,1,n) # x axis to plot against
    mask = ( x**2 + y**2 ) < 1 # masking
    samples = {}
    inds = np.arange(n)
    for k in range(n-start,n+1,stride):
        sub_inds = np.random.choice(inds,k,replace=False)
        sub_mask = mask[sub_inds]
        sub_hits = np.sum(sub_mask)
        ratio = sub_hits/k # fraction of the k subsampled points inside the circle
        pi_approx = ratio * 4
        samples[k]=pi_approx
    return samples # maps each sample size k to its estimate
This still requires a for loop, but it runs quickly inside the function, since you're subsampling from one large random sample. To recreate your original call (running from n=100 to n=1000 [note that I am going up to n=1000 here]):
estimates = monteCarloPi(1000,start=900)
plt.plot(list(estimates.keys()),list(estimates.values()))
You could of course pass the original x=np.arange(100,1001), but then there would need to be error checking in the function (to make sure an array or list was passed), n would be equal to the last element of x (n=x[-1]), and the loop would run over the elements of x (for k in x:).
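If a loop-free version is wanted, one possibility is to draw the largest sample once and use a cumulative sum of the hits, so that the estimate for each n is computed from the first n points. This is a sketch under stated assumptions rather than a drop-in replacement: the name monte_carlo_pi_curve and the rng parameter are introduced here, and both coordinates are drawn at random (the code in the question places x on a regular grid instead).

```python
import numpy as np

def monte_carlo_pi_curve(ns, rng=None):
    """Estimate pi for every sample size in ns from one shared set of random points."""
    rng = np.random.default_rng() if rng is None else rng
    ns = np.asarray(ns)                  # integer sample sizes, each >= 1
    n_max = int(ns.max())
    x = rng.uniform(-1.0, 1.0, n_max)
    y = rng.uniform(-1.0, 1.0, n_max)
    inside = x**2 + y**2 < 1.0           # True where a point falls inside the unit circle
    hits = np.cumsum(inside)             # hits[k-1] = number of hits among the first k points
    return 4.0 * hits[ns - 1] / ns       # one estimate per requested sample size

ns = np.arange(100, 1000)
estimates = monte_carlo_pi_curve(ns, rng=np.random.default_rng(0))
print(estimates[:3], estimates[-1])
```

Because every estimate reuses a prefix of the same points, the values for neighbouring n are strongly correlated; that is the price of avoiding the loop.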

__ValueError: setting an array element with a sequence

I was trying to calculate the trends of temperature
ntimes, ny, nx = tempF.shape
print tempF.shape
trend = MA.zeros((ny,nx),dtype=float)
print trend.shape
for y in range (ny):
    for x in range(nx):
        trend[y,x] = numpy.polyfit(tdum, tempF[:,y,x],1)
print trend
the result is
(24, 241, 480)
(241, 480)
ValueError Traceback (most recent call last)
<ipython-input-31-4ac068601e48> in <module>()
12 for y in range (0,ny):
13 for x in range (0,nx):
---> 14 trend[y,x] = numpy.polyfit(tdum, tempF[:,y,x],1)
15
16
/home/charcoalp/anaconda2/envs/pyn_test/lib/python2.7/site-packages/numpy/ma/core.pyc in __setitem__(self, indx, value)
3272 if _mask is nomask:
3273 # Set the data, then the mask
-> 3274 _data[indx] = dval
3275 if mval is not nomask:
3276 _mask = self._mask = make_mask_none(self.shape, _dtype)
ValueError: setting an array element with a sequence.
I have only been using Python for a few days. Can anyone help me? Thank you.

When you create the ny by nx array of zeros, you can specify which type to store in its elements. numpy.polyfit with degree 1 returns an array of two floats (slope and intercept), which does not fit into a single float cell. To store both values per cell, use a sub-array type instead of float:
trend = numpy.zeros((ny,nx), dtype='2f')
NumPy expands this into an array of shape (ny, nx, 2), so trend[y,x] accepts the two coefficients. Note that 'f' means single precision (float32); use '2d' if you need double precision.
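If only the slope is needed, the double loop can also be avoided, because numpy.polyfit accepts a 2-D y whose columns are fitted independently. The sketch below uses a small synthetic grid (the sizes, the random slopes and the constant 50.0 are made up for illustration) and plain arrays rather than masked arrays.

```python
import numpy as np

ntimes, ny, nx = 24, 3, 4                        # small hypothetical grid
tdum = np.arange(ntimes, dtype=float)            # time axis
rng = np.random.default_rng(0)
true_slope = rng.normal(size=(ny, nx))           # synthetic per-cell trends
tempF = true_slope * tdum[:, None, None] + 50.0  # shape (ntimes, ny, nx)

# Flatten the two spatial axes so that each column is one grid cell's time series
coeffs = np.polyfit(tdum, tempF.reshape(ntimes, ny * nx), 1)  # shape (2, ny*nx)
trend = coeffs[0].reshape(ny, nx)                # slopes (highest power comes first)
intercept = coeffs[1].reshape(ny, nx)            # intercepts

print(trend.shape)
```

For masked input, missing values would need separate handling, since this sketch does not look at any mask.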

ValueError: Chunks and shape must be of the same length/dimension

I am reading the book "Introducing Data Science: Big data, machine learning, and more, using Python tools".
Chapter 4 contains the following code for a blocked matrix calculation:
import dask.array as da
import bcolz as bc
import numpy as np
import dask
n = 1e4 #A
ar = bc.carray(np.arange(n).reshape(n/2,2) , dtype='float64', rootdir = 'ar.bcolz', mode = 'w') #B
y = bc.carray(np.arange(n/2), dtype='float64', rootdir = 'yy.bcolz', mode = 'w') #B,
dax = da.from_array(ar, chunks=(5,5)) #C
dy = da.from_array(y,chunks=(5,5)) #C
XTX = dax.T.dot(dax) #D
Xy = dax.T.dot(dy) #E
coefficients = np.linalg.inv(XTX.compute()).dot(Xy.compute()) #F
coef = da.from_array(coefficients,chunks=(5,5)) #G
ar.flush() #H
y.flush() #H
predictions = dax.dot(coef).compute() #I
print (predictions)
I get ValueError:
ValueError Traceback (most recent call last)
<ipython-input-4-7ae8e9cf2346> in <module>()
10
11 dax = da.from_array(ar, chunks=(5,5)) #C
---> 12 dy = da.from_array(y,chunks=(5,5)) #C
13
14 XTX = dax.T.dot(dax) #D
C:\Users\F\Anaconda3\lib\site-packages\dask\array\core.py in from_array(x, chunks, name, lock, fancy, getitem)
1868 >>> a = da.from_array(x, chunks=(1000, 1000), lock=True) # doctest: +SKIP
1869 """
-> 1870 chunks = normalize_chunks(chunks, x.shape)
1871 if len(chunks) != len(x.shape):
1872 raise ValueError("Input array has %d dimensions but the supplied "
C:\Users\F\Anaconda3\lib\site-packages\dask\array\core.py in normalize_chunks(chunks, shape)
1815 raise ValueError(
1816 "Chunks and shape must be of the same length/dimension. "
-> 1817 "Got chunks=%s, shape=%s" % (chunks, shape))
1818
1819 if shape is not None:
ValueError: Chunks and shape must be of the same length/dimension. Got chunks=(5, 5), shape=(5000,)
What is the problem?

The problem is here:
np.arange(n/2).reshape(n)
You create an array of size n/2 and then try to reshape it to size n. reshape cannot change the number of elements.
It's probably a copy/paste mistake? It's not in your original code, and it seems you're doing
np.arange(n).reshape(n/2,2)
elsewhere, which works as long as n is an even number (be careful: if n is odd, this will also fail).
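Separately, the error message in the question reports chunks=(5, 5) against shape=(5000,): the chunks tuple has two entries while y has only one dimension. A chunks tuple needs one entry per array dimension. The sketch below builds such tuples with a hypothetical helper, chunks_for (not part of dask), using NumPy arrays in place of the bcolz ones.

```python
import numpy as np

def chunks_for(shape, block=5):
    """Hypothetical helper: one chunk length per dimension, capped at the axis size."""
    return tuple(min(block, size) for size in shape)

n = 10_000                                        # an integer, so n // 2 is a valid shape
ar = np.arange(n, dtype="float64").reshape(n // 2, 2)
y = np.arange(n // 2, dtype="float64")

ar_chunks = chunks_for(ar.shape)   # two entries for the 2-D array
y_chunks = chunks_for(y.shape)     # one entry for the 1-D array

print(ar.shape, ar_chunks)
print(y.shape, y_chunks)
# These tuples are what would be passed as chunks= to da.from_array,
# e.g. chunks=(5,) for y instead of chunks=(5, 5)
```

The point is only that len(chunks) must equal the number of dimensions; the block length of 5 is taken from the question.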
