ValueError: setting an array element with a sequence

I was trying to calculate the trends of temperature:
import numpy
import numpy.ma as MA

ntimes, ny, nx = tempF.shape
print tempF.shape
trend = MA.zeros((ny, nx), dtype=float)
print trend.shape
for y in range(ny):
    for x in range(nx):
        trend[y, x] = numpy.polyfit(tdum, tempF[:, y, x], 1)
print trend
The result is:
(24, 241, 480)
(241, 480)
ValueError Traceback (most recent call last)
<ipython-input-31-4ac068601e48> in <module>()
12 for y in range (0,ny):
13 for x in range (0,nx):
---> 14 trend[y,x] = numpy.polyfit(tdum, tempF[:,y,x],1)
15
16
/home/charcoalp/anaconda2/envs/pyn_test/lib/python2.7/site-packages/numpy/ma/core.pyc in __setitem__(self, indx, value)
3272 if _mask is nomask:
3273 # Set the data, then the mask
-> 3274 _data[indx] = dval
3275 if mval is not nomask:
3276 _mask = self._mask = make_mask_none(self.shape, _dtype)
ValueError: setting an array element with a sequence.
I've only been using Python for a few days. Can anyone help me? Thank you.

When you create an ny by nx ndarray of zeros, you can specify which type is stored in its elements. polyfit with degree 1 returns an array of two float coefficients, so if you want to store such a pair in each cell of your zeros array, you can choose the following dtype instead of float:
trend = numpy.zeros((ny,nx), dtype='2f')
After that, you can easily store your coefficient arrays as elements of the trend ndarray.
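Alternatively, here is a minimal sketch (assuming tdum and tempF as in the question) that keeps a plain float array by adding a trailing axis of length 2 for the two polyfit coefficients:
import numpy

# polyfit with deg=1 returns two coefficients (slope and intercept) per
# pixel, so give the output array a trailing axis of length 2
trend = numpy.zeros((ny, nx, 2), dtype=float)
for y in range(ny):
    for x in range(nx):
        trend[y, x, :] = numpy.polyfit(tdum, tempF[:, y, x], 1)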

Dask looping over library function call

Goal
I would like to parallelize a loop with dask that uses a library function inside the loop. This function, mhw.detect(), calculates some statistics on a slice of a numpy array. None of the slices of the array depend on the other slices, so I was hoping that dask could be used to compute them in parallel and store them all in the same output array.
Code
The flow of the code I am working on is:
import numpy as np
import marineHeatWaves as mhw
from dask import delayed
# Create fake input data
lat_size, long_size = 100, 100
data = np.random.random_integers(0, 30, size=(10_000, long_size, lat_size)) # size = (time, longitude, latitude)
time = np.arange(730_000, 740_000) # time in ordinal days
# Initialize an empty array to hold the output
output_array = np.empty(data.shape)
# loop through each pixel in the data array
for idx_lat in range(lat_size):
    for idx_long in range(long_size):
        # Extract a slice of data
        data_slice = data[:, idx_lat, idx_long]
        # Use the library function to calculate the stats for the pixel
        # `library_output` is a dictionary that has a numpy array inside it
        _, library_output = delayed(mhw.detect)(time, data_slice)
        # Update the output array with the calculated values from the library
        output_array[:, idx_lat, idx_long] = library_output['seas']
Previous efforts
When I run this code I get the error TypeError: Delayed objects of unspecified length are not iterable. Another Stack Overflow post discusses this issue and resolves it by converting the output of the delayed function to a delayed object. However, because I didn't create the output object myself, I am not sure I can convert it to a delayed object.
I've also tried wrapping the last line in da.from_delayed(), as in output_array[:, idx_lat, idx_long] = da.from_delayed(library_output['seas']), and initializing the output_array with da.empty(data.shape). I get the same error, though, since I think the code doesn't make it past the line with the library function, delayed(mhw.detect)(time, data_slice).
Is it possible to parallelize this? Is this approach of asking dask to compute all the slices in parallel and put them together in an output array even a reasonable approach?
Full Traceback
TypeError Traceback (most recent call last)
/home/rwegener/mhw-ocetrac-census/notebooks/ejoliver_subset_MUR.ipynb Cell 44' in <cell line: 10>()
13 data_slice = data[:, idx_lat, idx_long]
14 # Use the library function to calculate the stats for the pixel
---> 15 _, point_clim = delayed(mhw.detect)(time_ordinal, data_slice)
16 # Update the output array with the calculated values from the library
17 output_array[:, idx_lat, idx_long] = point_clim['seas']
File ~/.conda/envs/dask/lib/python3.10/site-packages/dask/delayed.py:581, in Delayed.__iter__(self)
579 def __iter__(self):
580 if self._length is None:
--> 581 raise TypeError("Delayed objects of unspecified length are not iterable")
582 for i in range(self._length):
583 yield self[i]
TypeError: Delayed objects of unspecified length are not iterable
Update
Using .apply_along_axis() as suggested:
# Create fake input data
lat_size, long_size = 100, 100
data = np.random.randint(0, 30, size=(10_000, long_size, lat_size)) # size = (time, longitude, latitude)
data = dask.array.from_array(data, chunks=(-1, 100, 100))
time = np.arange(730_000, 740_000) # time in ordinal days
# Initialize an empty array to hold the output
output_array = np.empty(data.shape)
# define a wrapper to rearrange arguments
def func1d(arr, time, shape=(10000,)):
    print(arr.shape)
    return mhw.detect(time, arr)
res = dask.array.apply_along_axis(func1d, 0, data, time=time)
With the output:
(1,)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/homes/metogra/rwegener/mhw-ocetrac-census/notebooks/ejoliver_subset_MUR.ipynb Cell 48' in <cell line: 15>()
12 print(arr.shape)
13 return mhw.detect(time, arr)
---> 15 res = dask.array.apply_along_axis(func1d, 0, data, time=time)
File ~/.conda/envs/dask/lib/python3.10/site-packages/dask/array/routines.py:508, in apply_along_axis(func1d, axis, arr, dtype, shape, *args, **kwargs)
506 if shape is None or dtype is None:
507 test_data = np.ones((1,), dtype=arr.dtype)
--> 508 test_result = np.array(func1d(test_data, *args, **kwargs))
509 if shape is None:
510 shape = test_result.shape
/homes/metogra/rwegener/mhw-ocetrac-census/notebooks/ejoliver_subset_MUR.ipynb Cell 48' in func1d(arr, time, shape)
11 def func1d(arr, time, shape=(10000,)):
12 print(arr.shape)
---> 13 return mhw.detect(time, arr)
File ~/.conda/envs/dask/lib/python3.10/site-packages/marineHeatWaves-0.28-py3.10.egg/marineHeatWaves.py:280, in detect(t, temp, climatologyPeriod, pctile, windowHalfWidth, smoothPercentile, smoothPercentileWidth, minDuration, joinAcrossGaps, maxGap, maxPadLength, coldSpells, alternateClimatology, Ly)
278 tt = tt[tt>=0] # Reject indices "before" the first element
279 tt = tt[tt<TClim] # Reject indices "after" the last element
--> 280 thresh_climYear[d-1] = np.nanpercentile(tempClim[tt.astype(int)], pctile)
281 seas_climYear[d-1] = np.nanmean(tempClim[tt.astype(int)])
282 # Special case for Feb 29
IndexError: index 115 is out of bounds for axis 0 with size 1
Rather than using delayed, this seems like a good case for dask.array.
You can create a dask array by partitioning the numpy input array (chunks=(-1, 10, 10) keeps the whole time axis inside each block, so every pixel's complete time series stays together):
data = dask.array.from_array(data, chunks=(-1, 10, 10))
Now you can call mhw.detect using dask.array.map_blocks alongside np.apply_along_axis within each block:
# define a wrapper to rearrange arguments
def func1d(arr, time):
    return mhw.detect(time, arr)

def block_func(block, **kwargs):
    return np.apply_along_axis(func1d, 0, block, **kwargs)

res = data.map_blocks(block_func, meta=data, time=time)
res = res.compute()
The map_blocks answer above works great! Additionally, apply_along_axis() was suggested and discussed in the comments. I was able to get that method to work, but for it to function properly you need to supply both the dtype and shape arguments to da.apply_along_axis(). If these aren't supplied, the function can't figure out the shape of the data it should pass as an argument.
So, another solution:
import dask.array as da
# Create fake input data
lat_size, long_size = 100, 100
data = da.random.random_integers(0, 30, size=(1_000, long_size, lat_size), chunks=(-1, 10, 10)) # size = (time, longitude, latitude)
time = np.arange(730_000, 731_000) # time in ordinal days
# define a wrapper to rearrange arguments
def func1d(arr, time):
    return mhw.detect(time, arr)

result = da.apply_along_axis(func1d, 0, data, time=time, dtype=data.dtype, shape=(1000,))
result.compute()

IndexError: index 9 is out of bounds for axis 0 with size 1

I'm trying to run the following code. Unfortunately, I get an error that I'm not able to solve myself yet.
import numpy as np
import random

def reallyrandom(arg1, arg2, arg3):
    int1 = int(arg1)
    int2 = int(arg2)
    int3 = int(arg3)
    np.random.seed(42)
    x = np.random.randint(0, 10, size=int1)
    print(x)
    print(x.shape)
    y = x * int2
    print(y)
    print(y.shape)
    z = y[int3]
    print(z)
    print(f"Your random value is {z}")

reallyrandom(1, 2, 9)
Error:
IndexError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15232/4039400629.py in <module>
23
24 #reallyrandom(59,2,7)
---> 25 reallyrandom(1,2,9)
~\AppData\Local\Temp/ipykernel_15232/4039400629.py in reallyrandom(arg1, arg2, arg3)
16 print(y)
17 print(y.shape)
---> 18 z=y[int3]
19 #print(z)
20
IndexError: index 9 is out of bounds for axis 0 with size 1
The problem seems to come from the line z = y[int3].
I have no idea how to solve it. Could someone explain what I'm doing wrong? I found on the internet that it is an index error.
Thank you in advance!
The line z = y[int3] is attempting to get the element at index 9 of the array y (because int3 is 9), but there's only one value in the array. This line:
x = np.random.randint(0, 10, size=int1)
is creating an array with only one random value in it (because the value of int1 is 1). If you want to create an array of 10 random numbers, for example, use:
x = np.random.randint(0, 10, size=10)
Alternatively, you can use some other variable for size, but it needs to be larger than the index you pass in z = y[int3] or you'll get the same IndexError.
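As a defensive sketch (a hypothetical guard, not part of the original function), you could also check the index before using it:
# Hypothetical bounds check before indexing y
if int3 < y.shape[0]:
    z = y[int3]
    print(f"Your random value is {z}")
else:
    print(f"Index {int3} is out of range for an array of size {y.shape[0]}")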

Index Error: Index 206893 is out of bounds for axis 0 with size 206893, griddata issue

I have been stuck for the last 4 days trying to understand a Python error:
IndexError: index 206893 is out of bounds for axis 0 with size 206893
which occurs when applying griddata with the "nearest" interpolation method, using the following lines:
import numpy as np
from scipy.interpolate import griddata

# Create a matrix where I will store the first interpolated file
tempnew = np.ones(np.asarray(w1[0,0,:,:]).shape) * np.nan
# The lon, lat coordinate points of the original grid
lonl, latl = np.meshgrid(lon, lat)
points = np.vstack((np.array(lonl).flatten(), np.array(latl).flatten())).transpose()
# The values of the original file
values = np.array([np.asarray(temp[0,0,:,:])]).flatten()
# The dimensions of the grid that I want to interpolate to
lons = np.array(nav_lon)
lats = np.array(nav_lat)
X, Y = np.meshgrid(lons, lats)
# Interpolation
tempnew = griddata(points, values, (X, Y), method="nearest", fill_value=-3)
Here are the dimensions of each of the variables used above:
#tempnew.shape: (728, 312) #(Dimensions of tempnew is (lats,lons))
#lat.shape: (661,) #(original latitude)
#lon.shape: (313,) #(original longitude)
#points.shape: (206893, 2)
#values.shape: (206893,)
#X.shape: (728, 312)
#Y.shape: (728, 312)
Can you help me? I would like to note here that the original file is on a regular (A-type) grid, whereas the grid I want to interpolate to is not regular (it is C-grid data).
The error looks like this:
In [36]: tempnew = sp.interpolate.griddata(points,values, (X,Y), method = "nearest
...: ",fill_value=-3)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-36-0d0b46a3542f> in <module>
----> 1 tempnew = sp.interpolate.griddata(points,values, (X,Y), method =
"nearest",fill_value=-3)
~/software/anaconda3/envs/mhw/lib/python3.7/site-packages/scipy/interpolate/ndgriddata.py in
griddata(points, values, xi, method, fill_value, rescale)
217 elif method == 'nearest':
218 ip = NearestNDInterpolator(points, values, rescale=rescale)
--> 219 return ip(xi)
220 elif method == 'linear':
221 ip = LinearNDInterpolator(points, values, fill_value=fill_value,
~/software/anaconda3/envs/mhw/lib/python3.7/site-packages/scipy/interpolate/ndgriddata.py in
__call__(self, *args)
79 xi = self._scale_x(xi)
80 dist, i = self.tree.query(xi)
---> 81 return self.values[i]
82
83
IndexError: index 206893 is out of bounds for axis 0 with size 206893
Thanks in advance,
Sofi
I encountered this error in my Python code using the scipy.interpolate.NearestNDInterpolator class. The error message that is returned is not very clear. In the end, I found that one of the values I was inserting into my interpolant was 1e184, which caused this error message. After resetting this value to 0.0, my Python script ran successfully.
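If you want to guard against this, here is a minimal sketch of a sanitation step (assuming points, values, X and Y as in the question; the 1e30 cutoff is an arbitrary illustration, not a scipy requirement):
import numpy as np
from scipy.interpolate import griddata

# Replace non-finite or absurdly large values before interpolating
values = np.asarray(values, dtype=float)
bad = ~np.isfinite(values) | (np.abs(values) > 1e30)
values[bad] = 0.0
tempnew = griddata(points, values, (X, Y), method="nearest", fill_value=-3)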

ValueError: Chunks and shape must be of the same length/dimension

I am reading the book "Introducing Data Science: Big data, machine learning, and more, using Python tools".
There is some code in Chapter 4 about blocked matrix computation:
import dask.array as da
import bcolz as bc
import numpy as np
import dask
n = 1e4 #A
ar = bc.carray(np.arange(n).reshape(n/2,2) , dtype='float64', rootdir = 'ar.bcolz', mode = 'w') #B
y = bc.carray(np.arange(n/2), dtype='float64', rootdir = 'yy.bcolz', mode = 'w') #B,
dax = da.from_array(ar, chunks=(5,5)) #C
dy = da.from_array(y,chunks=(5,5)) #C
XTX = dax.T.dot(dax) #D
Xy = dax.T.dot(dy) #E
coefficients = np.linalg.inv(XTX.compute()).dot(Xy.compute()) #F
coef = da.from_array(coefficients,chunks=(5,5)) #G
ar.flush() #H
y.flush() #H
predictions = dax.dot(coef).compute() #I
print (predictions)
I get a ValueError:
ValueError Traceback (most recent call last)
<ipython-input-4-7ae8e9cf2346> in <module>()
10
11 dax = da.from_array(ar, chunks=(5,5)) #C
---> 12 dy = da.from_array(y,chunks=(5,5)) #C
13
14 XTX = dax.T.dot(dax) #D
C:\Users\F\Anaconda3\lib\site-packages\dask\array\core.py in from_array(x, chunks, name, lock, fancy, getitem)
1868 >>> a = da.from_array(x, chunks=(1000, 1000), lock=True) # doctest: +SKIP
1869 """
-> 1870 chunks = normalize_chunks(chunks, x.shape)
1871 if len(chunks) != len(x.shape):
1872 raise ValueError("Input array has %d dimensions but the supplied "
C:\Users\F\Anaconda3\lib\site-packages\dask\array\core.py in normalize_chunks(chunks, shape)
1815 raise ValueError(
1816 "Chunks and shape must be of the same length/dimension. "
-> 1817 "Got chunks=%s, shape=%s" % (chunks, shape))
1818
1819 if shape is not None:
ValueError: Chunks and shape must be of the same length/dimension. Got chunks=(5, 5), shape=(5000,)
What is the problem?
The problem is here:
np.arange(n/2).reshape(n)
You create an array of size n/2 and then try to reshape it to size n. You can't change the total size with reshape.
It's probably a copy/paste mistake? It's not in the code you posted, where you do
np.arange(n).reshape(n/2, 2)
instead, which works as long as n is an even number (be careful: if n isn't even, this will also fail).
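Note also that the ValueError quoted in the traceback comes from the chunks argument itself: y wraps a one-dimensional array, so its chunks must be one-dimensional too. A minimal sketch of the idea, assuming an integer n:
import numpy as np
import dask.array as da

n = 10_000  # use an int so arange and reshape get integer arguments
y = np.arange(n // 2, dtype='float64')
# y has shape (5000,), so chunks must be a 1-tuple, not (5, 5)
dy = da.from_array(y, chunks=(5,))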

numpy.unique throwing error

I'm trying to use the following function:
import numpy
import numpy.random as npr

def randomChose(bp, xsteps, ysteps, bs):
    # Number of points to be chosen
    s = int((bp * xsteps * ysteps) / (bs * bs))
    # Generating an array representing the input indexes
    indices = numpy.arange(xsteps * ysteps)
    # Resampling without replacement
    cs = npr.choice(indices, size=s, replace=False)
    f = []
    for idx in cs:
        nb = indices[max(idx-(bs*bs/2), 0):min(idx+(bs*bs/2)+1, xsteps*ysteps)]
        f.append(nb)
    f = numpy.array(f).flatten()
    fix = numpy.unique(numpy.array(f))
    return fix
It takes as parameters a percentage of blocks bp, the data dimensions xsteps and ysteps, and a block size bs.
What I want to do is to choose a number of valid indexes considering some neighborhood in this image.
However, I keep receiving an error when calling numpy.unique (though not always):
ValueError Traceback (most recent call last)
<ipython-input-35-1b5914c3cbc7> in <module>()
9 svf_y = []
10 for s in range(samples):
---> 11 fix = randomChose(bp, xsteps, ysteps, bs)
12 rs_z0, rs_z1, rs_z2 = interpolate(len(fix), xsteps, ysteps, mean_rs)
13 ds_z0, ds_z1, ds_z2 = interpolate(len(fix), xsteps, ysteps, mean_ds)
<ipython-input-6-def08adce84b> in randomChose(bp, xsteps, ysteps, bs)
14 f.append(nb)
15 f = numpy.array(f).flatten()
---> 16 fix = numpy.unique(numpy.array(f))
17
18 return f
/usr/local/lib/python2.7/dist-packages/numpy/lib/arraysetops.pyc in unique(ar, return_index, return_inverse, return_counts)
198 ar.sort()
199 aux = ar
--> 200 flag = np.concatenate(([True], aux[1:] != aux[:-1]))
201
202 if not optional_returns:
ValueError: all the input arrays must have same number of dimensions
This is how I call it:
nx = 57.2
ny = 24.0
xsteps = 144
ysteps = 106
bs = 5 # Block size
bp = 0.1 # Percentage of blocks
fix = randomChose(bp, xsteps, ysteps, bs)
I'm trying to understand what is wrong. As far as I understand, the method expects an ndarray as input, which is what I am passing.
Thank you for any help.
First:
f.append(nb)
should become:
f.append(list(nb))
That makes f a list of lists, which NumPy will convert to a two-dimensional array of integers, BUT ONLY if all the lists have the same length. If not, you will only get a one-dimensional NumPy array of lists, and flatten() will have no effect.
You may add a
print(type(f[0]))
after the flattening to check.
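A small illustration of the difference (a sketch; recent NumPy versions require dtype=object for ragged input):
import numpy

# Equal-length rows become a 2-D integer array, so flatten() works:
a = numpy.array([[1, 2], [3, 4]])
print(a.flatten())  # [1 2 3 4]
# Ragged rows become a 1-D object array of lists; flatten() changes
# nothing, and numpy.unique then fails on the mixed-length lists.
b = numpy.array([[1, 2], [3]], dtype=object)
print(b.flatten())  # [list([1, 2]) list([3])]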
The problem is with the edges. E.g., if idx=0,
nb = indices[max(idx-(bs*bs/2), 0):min(idx+(bs*bs/2)+1, xsteps*ysteps)]
is going to be shorter than an interior window (13 values instead of 25 when bs=5), because the left half of the neighborhood is clipped at 0. The rows of f then have different lengths, and you will not be able to flatten your array properly.
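A minimal workaround, assuming the variable names from the question: since the neighborhood arrays can have different lengths, concatenate them instead of going through numpy.array(f).flatten():
# f is a list of 1-D index arrays of possibly different lengths;
# numpy.concatenate handles the ragged case directly
f = numpy.concatenate(f)
fix = numpy.unique(f)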
