ValueError: Chunks and shape must be of the same length/dimension - python

I am reading the book "Introducing Data Science: Big data, machine learning, and more, using Python tools".
There is code in Chapter 4 about blocked matrix calculation:
import dask.array as da
import bcolz as bc
import numpy as np
import dask
n = 1e4 #A
ar = bc.carray(np.arange(n).reshape(n/2, 2), dtype='float64', rootdir='ar.bcolz', mode='w') #B
y = bc.carray(np.arange(n/2), dtype='float64', rootdir='yy.bcolz', mode='w') #B
dax = da.from_array(ar, chunks=(5,5)) #C
dy = da.from_array(y,chunks=(5,5)) #C
XTX = dax.T.dot(dax) #D
Xy = dax.T.dot(dy) #E
coefficients = np.linalg.inv(XTX.compute()).dot(Xy.compute()) #F
coef = da.from_array(coefficients,chunks=(5,5)) #G
ar.flush() #H
y.flush() #H
predictions = dax.dot(coef).compute() #I
print (predictions)
I get ValueError:
ValueError Traceback (most recent call last)
<ipython-input-4-7ae8e9cf2346> in <module>()
10
11 dax = da.from_array(ar, chunks=(5,5)) #C
---> 12 dy = da.from_array(y,chunks=(5,5)) #C
13
14 XTX = dax.T.dot(dax) #D
C:\Users\F\Anaconda3\lib\site-packages\dask\array\core.py in from_array(x, chunks, name, lock, fancy, getitem)
1868 >>> a = da.from_array(x, chunks=(1000, 1000), lock=True) # doctest: +SKIP
1869 """
-> 1870 chunks = normalize_chunks(chunks, x.shape)
1871 if len(chunks) != len(x.shape):
1872 raise ValueError("Input array has %d dimensions but the supplied "
C:\Users\F\Anaconda3\lib\site-packages\dask\array\core.py in normalize_chunks(chunks, shape)
1815 raise ValueError(
1816 "Chunks and shape must be of the same length/dimension. "
-> 1817 "Got chunks=%s, shape=%s" % (chunks, shape))
1818
1819 if shape is not None:
ValueError: Chunks and shape must be of the same length/dimension. Got chunks=(5, 5), shape=(5000,)
What is the problem?

The problem is here:
np.arange(n/2).reshape(n)
You create an array of size n/2 and then try to reshape it to size n. You can't change the total size with reshape.
It's probably a copy/paste mistake? It's not in your original code, and it seems you're doing
np.arange(n).reshape(n/2,2)
elsewhere, which works as long as n is an even number (be careful: if n isn't even, this will also fail).
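For completeness, the traceback you posted also shows a second, independent mismatch: y has shape (5000,), i.e. it is one-dimensional, so its chunks must be one-dimensional as well. A minimal sketch of both pitfalls (the names are illustrative, not the book's code):
import numpy as np
import dask.array as da

n = 10000                               # an int: reshape rejects float dimensions like 1e4/2
X = np.arange(n).reshape(n // 2, 2)     # fine: 10000 elements -> (5000, 2)
# np.arange(n // 2).reshape(n)          # ValueError: reshape cannot change the total size

y = np.arange(n // 2)                   # 1-D, shape (5000,)
dax = da.from_array(X, chunks=(5, 2))   # 2-D chunks for a 2-D array
dy = da.from_array(y, chunks=5)         # 1-D chunks for a 1-D array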

Related

Using autograd to compute the Jacobian of a matrix runs into an error

Can someone enlighten me as to why the following code to compute the Jacobian of a kernel matrix doesn't work:
import autograd.numpy as np
# import numpy as np
from autograd import grad
from autograd import jacobian
from numpy import linalg as LA

def kernel(x1, x2, l):
    return np.exp(-((x1-x2)**2).sum()/(2*(l**2)))

def kernel_matrixx(top_k_history):
    k_t_X_list = []
    for i in range(k-1):
        # print(kernel(top_k_history[i],observation,l))
        k_t_X_list.append(np.expand_dims(np.expand_dims((kernel(top_k_history[0],top_k_history[i+1],l)), axis=0), axis=0))
        # print(k_t_X_list[0].item())
    # k_t_X = np.expand_dims(np.asarray(k_t_X_list), axis=0)
    k_t_X = np.expand_dims(np.expand_dims((kernel(top_k_history[0],top_k_history[0],l)), axis=0), axis=0)
    for i in range(k-1):
        # temp = np.expand_dims(np.expand_dims(np.asarray(kernel(observation,top_k_history[i+1],l)), axis=0), axis=0)
        k_t_X = np.concatenate([k_t_X, k_t_X_list[i]], axis=1)
    k_t_X_first = k_t_X
    k_t_X_list_list = []
    for j in range(k-1):
        k_t_X_list = []
        for i in range(k-1):
            # print(kernel(top_k_history[i],observation,l))
            k_t_X_list.append(np.expand_dims(np.expand_dims((kernel(top_k_history[j+1],top_k_history[i+1],l)), axis=0), axis=0))
            # print(k_t_X_list[0].item())
        # k_t_X = np.expand_dims(np.asarray(k_t_X_list), axis=0)
        k_t_X = np.expand_dims(np.expand_dims((kernel(top_k_history[j+1],top_k_history[0],l)), axis=0), axis=0)
        for i in range(k-1):
            # temp = np.expand_dims(np.expand_dims(np.asarray(kernel(observation,top_k_history[i+1],l)), axis=0), axis=0)
            k_t_X = np.concatenate([k_t_X, k_t_X_list[i]], axis=1)
        k_t_X_list_list.append(k_t_X)
    for i in range(k-1):
        k_t_X_first = np.concatenate([k_t_X_first, k_t_X_list_list[i]], axis=0)
    return k_t_X_first

k = 10
l = 19
top_k_history = []
for i in range(10):
    top_k_history.append(np.random.rand(10))

jac = jacobian(kernel_matrixx)
jac(top_k_history)
the error I got is:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15016/2419460232.py in <module>
1 jac = jacobian(kernel_matrixx)
----> 2 jac(top_k_history)
~\Anaconda3\envs\unlearning\lib\site-packages\autograd\wrap_util.py in nary_f(*args, **kwargs)
18 else:
19 x = tuple(args[i] for i in argnum)
---> 20 return unary_operator(unary_f, x, *nary_op_args, **nary_op_kwargs)
21 return nary_f
22 return nary_operator
~\Anaconda3\envs\unlearning\lib\site-packages\autograd\differential_operators.py in jacobian(fun, x)
57 vjp, ans = _make_vjp(fun, x)
58 ans_vspace = vspace(ans)
---> 59 jacobian_shape = ans_vspace.shape + vspace(x).shape
60 grads = map(vjp, ans_vspace.standard_basis())
61 return np.reshape(np.stack(grads), jacobian_shape)
TypeError: can only concatenate tuple (not "list") to tuple
I am already aware that I cannot create a zero matrix (or identity matrix) and then fill in the values with a nested for loop. Therefore I create np.arrays and then concatenate them. I used the same approach to compute the grad of some other output of the same kernel matrix and it did work, so I'm not sure why it doesn't work for the Jacobian.
Edit: the error should now be reproducible
There is a datatype problem: in your code, top_k_history is of type list and contains ten 1-D arrays, each of length 10. If you convert it into one 2-D array of shape (10, 10), the error should vanish:
# <original code except the last line>
top_k_history = np.array(top_k_history) # new
jac(top_k_history) # original last line
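With that one change the rest of the snippet runs unchanged. A short usage sketch (assuming k, l, and kernel_matrixx exactly as defined in the question; the shape comment follows from jacobian building its output shape as ans_vspace.shape + vspace(x).shape, visible in the traceback):
import autograd.numpy as np
from autograd import jacobian

top_k_history = np.array([np.random.rand(10) for _ in range(10)])  # 2-D, shape (10, 10)
jac = jacobian(kernel_matrixx)
J = jac(top_k_history)
print(J.shape)  # (10, 10, 10, 10): output shape (k, k) + input shape (10, 10)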

LinAlgError: 0-dimensional array given. Array must be at least two-dimensional

I am getting the above error; the message complains about a 0-dimensional array.
---------------------------------------------------------------------------
LinAlgError Traceback (most recent call last)
<ipython-input-110-2e59b52b853b> in <module>()
8
9 # compute det and C_N
---> 10 const = np.sqrt(np.linalg.det((dof-2)/dof)*C_N)
11 print(const)
<__array_function__ internals> in det(*args, **kwargs)
/anaconda3/lib/python3.6/site-packages/numpy/linalg/linalg.py in det(a)
2110 """
2111 a = asarray(a)
-> 2112 _assert_stacked_2d(a)
2113 _assert_stacked_square(a)
2114 t, result_t = _commonType(a)
/anaconda3/lib/python3.6/site-packages/numpy/linalg/linalg.py in _assert_stacked_2d(*arrays)
205 if a.ndim < 2:
206 raise LinAlgError('%d-dimensional array given. Array must be '
--> 207 'at least two-dimensional' % a.ndim)
208
209 def _assert_stacked_square(*arrays):
LinAlgError: 0-dimensional array given. Array must be at least two-dimensional
My code is:
import numpy as np
npts = 5000
dof = 3
X_r = np.arange(npts)
product = X_r * X_r.transpose()
Rowsum = [np.sum(product[i]) for i in range(npts)]
C_N = np.sum(Rowsum)/(npts - 1)
# compute det and C_N
const = np.sqrt(np.linalg.det(((dof-2)/dof)*C_N))
print(const)
Many thanks to the intrepid who care and dare to read through all this.
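For what it's worth, the traceback already names the cause: _assert_stacked_2d rejects anything with fewer than two dimensions, and ((dof-2)/dof)*C_N is a plain scalar here, because C_N is a sum divided by a number. A minimal sketch of the boundary np.linalg.det enforces (the values are illustrative, not the asker's data):
import numpy as np

np.linalg.det(np.array(2.5))      # LinAlgError: 0-dimensional array given
np.linalg.det(np.array([[2.5]]))  # fine: a 1x1 matrix, returns 2.5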

Mean of 5 X 5 neighbor pixels of raster in Python

I have multiple rasters with a pixel resolution of 10 x 10 meters; the raster values are floats, and I want the output raster values to stay in float format as well.
I want to apply 5 x 5 window pixel averaging to remove the noise from the images and create a smooth raster in Python. I worked out the code below from different Stack Overflow sources, but it is not working and produces an error.
I would appreciate any help.
Below is the code:
import time
import glob
import os
import gdal
import osr
import numpy as np

start_time_script = time.clock()

path_ras = r'D:\Firm_SM\F1A/'
for rasterfile in glob.glob(os.path.join(path_ras, '*.tif')):
    rasterfile_name = str(rasterfile[rasterfile.find('IMG'):rasterfile.find('.tif')])
    print ('Processing:' + ' ' + str(rasterfile_name))
    ds = gdal.Open(rasterfile, gdal.GA_ReadOnly)
    ds_xform = ds.GetGeoTransform()
    print (ds_xform)
    ds_driver = gdal.GetDriverByName('Gtiff')
    srs = osr.SpatialReference()
    #srs.ImportFromEPSG(4726)
    ds_array = ds.ReadAsArray()
    sz = ds_array.itemsize
    print ('This is the size of the neighbourhood:' + ' ' + str(sz))
    h, w = ds_array.shape
    print ('This is the size of the Array:' + ' ' + str(h) + ' ' + str(w))
    bh, bw = 5, 5
    shape = (h/bh, w/bw, bh, bw)
    print ('This is the new shape of the Array:' + ' ' + str(shape))
    strides = sz*np.array([w*bh, bw, w, 1])
    blocks = np.lib.stride_tricks.as_strided(ds_array, shape=shape, strides=strides)
    resized_array = ds_driver.Create(rasterfile_name + '_resized_to_52m.tif', shape[1], shape[0], 1, gdal.GDT_Float32)
    resized_array.SetGeoTransform((ds_xform[0], ds_xform[1]*2, ds_xform[2], ds_xform[3], ds_xform[4], ds_xform[5]*2))
    resized_array.SetProjection(srs.ExportToWkt())
    band = resized_array.GetRasterBand(1)
    zero_array = np.zeros([shape[0], shape[1]], dtype=np.float32)
    print ('I start calculations using neighbourhood')
    start_time_blocks = time.clock()
    for i in xrange(len(blocks)):
        for j in xrange(len(blocks[i])):
            zero_array[i][j] = np.mean(blocks[i][j])
    print ('I finished calculations and I am going to write the new array')
    band.WriteArray(zero_array)
    end_time_blocks = time.clock() - start_time_blocks
    print ('Image Processed for:' + ' ' + str(end_time_blocks) + 'seconds' + '\n')

end_time = time.clock() - start_time_script
print ('Program ran for: ' + str(end_time) + 'seconds')
Error message:
This is the size of the neighbourhood: 4
This is the size of the Array: 106 144
This is the new shape of the Array: (21.2, 28.8, 5, 5)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-13-fbee5d0233b0> in <module>
21 strides = sz*np.array([w*bh,bw,w,1])
22
---> 23 blocks = np.lib.stride_tricks.as_strided(ds_array,shape=shape,strides=strides)
24
25 resized_array = ds_driver.Create(rasterfile_name + '_resized_to_52m.tif',shape[1],shape[0],1,gdal.GDT_Float32)
c:\python37\lib\site-packages\numpy\lib\stride_tricks.py in as_strided(x, shape, strides, subok, writeable)
101 interface['strides'] = tuple(strides)
102
--> 103 array = np.asarray(DummyArray(interface, base=x))
104 # The route via `__interface__` does not preserve structured
105 # dtypes. Since dtype should remain unchanged, we set it explicitly.
c:\python37\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order)
83
84 """
---> 85 return array(a, dtype, copy=False, order=order)
86
87
TypeError: 'float' object cannot be interpreted as an integer
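The printed shape explains the exception: an array shape must be made of integers, and h/bh, w/bw are floats under Python 3 (106/5 = 21.2). A hedged sketch of one way around both that and the non-divisible image size, using a plain reshape instead of raw strides (the random array is a stand-in for ds.ReadAsArray(), not the asker's data):
import numpy as np

bh, bw = 5, 5
h, w = 106, 144
ds_array = np.random.rand(h, w).astype(np.float32)   # stand-in for the GDAL band

hc, wc = (h // bh) * bh, (w // bw) * bw              # 105, 140: crop to a multiple of 5
blocks = ds_array[:hc, :wc].reshape(hc // bh, bh, wc // bw, bw)
means = blocks.mean(axis=(1, 3))                     # 5x5 block means, float32, shape (21, 28)
print(means.shape, means.dtype)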

Can't differentiate wrt numpy arrays of dtype int64?

I am a newbie to numpy. Today, when I used it to work on linear regression, it failed as below:
KeyError Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/autograd/numpy/numpy_extra.py in new_array_node(value, tapes)
84 try:
---> 85 return array_dtype_mappings[value.dtype](value, tapes)
86 except KeyError:
KeyError: dtype('int64')
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-4-aebe8f7987b0> in <module>()
24 return cost/float(np.size(y))
25
---> 26 weight_h, cost_h = gradient_descent(least_squares, alpha, max_its, w)
27
28 # a)
<ipython-input-2-1b74c4f818f4> in gradient_descent(g, alpha, max_its, w)
12 for k in range(max_its):
13 # evaluate the gradient
---> 14 grad_eval = gradient(w)
15
16 # take gradient descent step
~/anaconda3/lib/python3.6/site-packages/autograd/core.py in gradfun(*args, **kwargs)
19 #attach_name_and_doc(fun, argnum, 'Gradient')
20 def gradfun(*args,**kwargs):
---> 21 return backward_pass(*forward_pass(fun,args,kwargs,argnum))
22 return gradfun
23
~/anaconda3/lib/python3.6/site-packages/autograd/core.py in forward_pass(fun, args, kwargs, argnum)
57 tape = CalculationTape()
58 arg_wrt = args[argnum]
---> 59 start_node = new_node(safe_type(getval(arg_wrt)), [tape])
60 args = list(args)
61 args[argnum] = merge_tapes(start_node, arg_wrt)
~/anaconda3/lib/python3.6/site-packages/autograd/core.py in new_node(value, tapes)
185 def new_node(value, tapes=[]):
186 try:
--> 187 return Node.type_mappings[type(value)](value, tapes)
188 except KeyError:
189 return NoDerivativeNode(value, tapes)
~/anaconda3/lib/python3.6/site-packages/autograd/numpy/numpy_extra.py in new_array_node(value, tapes)
85 return array_dtype_mappings[value.dtype](value, tapes)
86 except KeyError:
---> 87 raise TypeError("Can't differentiate wrt numpy arrays of dtype {0}".format(value.dtype))
88 Node.type_mappings[anp.ndarray] = new_array_node
89
TypeError: Can't differentiate wrt numpy arrays of dtype int64
I really have no idea what happened. I guess it might be related to the structure of arrays in numpy. Or did I forget to install some package? Below is my original code.
# import statements
datapath = 'datasets/'
from autograd import numpy as np
# import automatic differentiator to compute gradient module
from autograd import grad
import matplotlib.pyplot as plt  # needed for the plots in part a)

# gradient descent function
def gradient_descent(g, alpha, max_its, w):
    # compute gradient module using autograd
    gradient = grad(g)
    # run the gradient descent loop
    weight_history = [w]   # weight history container
    cost_history = [g(w)]  # cost function history container
    for k in range(max_its):
        # evaluate the gradient
        grad_eval = gradient(w)
        # take gradient descent step
        w = w - alpha*grad_eval
        # record weight and cost
        weight_history.append(w)
        cost_history.append(g(w))
    return weight_history, cost_history

# load in dataset
csvname = datapath + 'kleibers_law_data.csv'
data = np.loadtxt(csvname, delimiter=',')

# get input and output of dataset
x = data[:-1,:]
y = data[-1:,:]
x = np.log(x)
y = np.log(y)

# Data Initiation
alpha = 0.01
max_its = 1000
w = np.array([0,0])

# linear model
def model(x, w):
    a = w[0] + np.dot(x.T, w[1:])
    return a.T

def least_squares(w):
    cost = np.sum((model(x,w)-y)**2)
    return cost/float(np.size(y))

weight_h, cost_h = gradient_descent(least_squares, alpha, max_its, w)

# a)
k = np.linspace(-5.5, 7.5, 250)
y = weight_h[max_its][0] + k*weight_h[max_its][1]
plt.figure()
plt.plot(x, y, label='Linear Line', color='g')
plt.xlabel('log of mass')
plt.ylabel('log of metabolic rate')
plt.title("Answer Of a")
plt.legend()
plt.show()

# b)
w0 = weight_h[max_its][0]
w1 = weight_h[max_its][1]
print("Nonlinear relationship between the body mass x and the metabolic rate y is "
      + str(w0) + " + " + "log(xp)" + str(w1) + " = " + "log(yp)")

# c)
x2 = np.log(10)
Kj = np.exp(w0 + w1*x2)*1000/4.18
print("It needs " + str(Kj) + " calories")
Could someone help me to figure it out? Thanks a lot.
Here are the important parts of your error:
---> 14 grad_eval = gradient(w)
...
TypeError: Can't differentiate wrt numpy arrays of dtype int64
Your gradient function is saying it doesn't like differentiating arrays of ints, which makes some sense, since it probably wants more precision than an int can give. You probably need them to be doubles or floats. For a simple solution, I believe you can just change your initializer from:
w = np.array([0,0])
which is going to automatically cast those 0s as ints, to:
w = np.array([0.0,0.0])
The decimals after the 0 let it know you want floats. There are other ways to tell it what kind of array you want (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.array.html), but this is a simple one.
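For reference, a few equivalent ways to get a float array; these are plain NumPy facts and any of them works here:
import numpy as np

w = np.array([0.0, 0.0])           # float literals
w = np.array([0, 0], dtype=float)  # explicit dtype
w = np.zeros(2)                    # float64 by default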

numpy.unique throwing error

I'm trying to use the following function:
def randomChose(bp, xsteps, ysteps, bs):
    # Number of points to be chosen
    s = int((bp * xsteps * ysteps) / (bs * bs))
    # Generating an array representing the input indexes
    indices = numpy.arange(xsteps * ysteps)
    # Resampling without replacement
    cs = npr.choice(indices, size=s, replace=False)
    f = []
    for idx in cs:
        nb = indices[max(idx-(bs*bs/2), 0):min(idx+(bs*bs/2)+1, xsteps*ysteps)]
        f.append(nb)
    f = numpy.array(f).flatten()
    fix = numpy.unique(numpy.array(f))
    return fix
It takes as parameters a percentage bp, the data dimensions xsteps * ysteps, and a block size bs.
What I want to do is to choose a number of valid indexes, considering some neighborhood in the image.
However, I keep receiving an error when calling numpy.unique (not always, though):
ValueError Traceback (most recent call last)
<ipython-input-35-1b5914c3cbc7> in <module>()
9 svf_y = []
10 for s in range(samples):
---> 11 fix = randomChose(bp, xsteps, ysteps, bs)
12 rs_z0, rs_z1, rs_z2 = interpolate(len(fix), xsteps, ysteps, mean_rs)
13 ds_z0, ds_z1, ds_z2 = interpolate(len(fix), xsteps, ysteps, mean_ds)
<ipython-input-6-def08adce84b> in randomChose(bp, xsteps, ysteps, bs)
14 f.append(nb)
15 f = numpy.array(f).flatten()
---> 16 fix = numpy.unique(numpy.array(f))
17
18 return f
/usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.pyc in unique(ar, return_index, return_inverse, return_counts)
198 ar.sort()
199 aux = ar
--> 200 flag = np.concatenate(([True], aux[1:] != aux[:-1]))
201
202 if not optional_returns:
ValueError: all the input arrays must have same number of dimensions
This is how I call it:
nx = 57.2
ny = 24.0
xsteps = 144
ysteps = 106
bs = 5 # Block size
bp = 0.1 # Percentage of blocks
fix = randomChose(bp, xsteps, ysteps, bs)
I'm trying to understand what is wrong. As far as I understand, this method expects an ndarray as input, which is what is being given.
Thank you for any help.
First:
f.append(nb)
should become:
f.append(list(nb))
That makes f a list of lists, which NumPy will have a chance to convert to a NumPy array of integers, BUT ONLY if all the lists have the same length. If not, you will only get a one-dimensional NumPy array of lists, and flatten() will have no effect.
You may add a
print(type(f[0]))
after the flattening.
The problem is with the edges. E.g., if idx=0,
nb = indices[max(idx-(bs*bs/2), 0):min(idx+(bs*bs/2)+1, xsteps*ysteps)]
is clipped at the lower bound, so it holds fewer values than a slice around an interior index does. With neighbourhoods of different lengths, you will not be able to flatten your array properly.
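A minimal sketch of why unique fails only sometimes (the arrays below are illustrative): when edge slices are shorter, NumPy can only build a 1-D object array of arrays, flatten() is a no-op on it, and unique then trips over the mixed dimensions. Concatenating the neighbourhoods instead of converting the list sidesteps the ragged shapes; note this is an alternative to the list-of-lists suggestion above:
import numpy as np

f = [np.arange(3), np.arange(5)]  # ragged: an edge slice and an interior slice
arr = np.array(f, dtype=object)   # 1-D object array, shape (2,)
print(arr.flatten().shape)        # still (2,): flatten() has no effect

flat = np.concatenate(f)          # joins the ragged pieces into one 1-D array
print(np.unique(flat))            # [0 1 2 3 4]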
