So I have an array I am trying to slice and index using two other boolean arrays and then set a value on that subset of the array. I saw this post:
Setting values in a numpy arrays indexed by a slice and two boolean arrays
and suspect I am getting a copy instead of a view of my array, so the values I am setting are not being saved. I think I managed to reproduce the problem with much shorter code, but I am very out of my depth.
import numpy as np

#first array
a = np.arange(0,100).reshape(10,10)
#conditional array of same size
b = np.random.rand(10,10)
b = b < 0.8
#create out array of same size as a
out = np.zeros(a.shape)
#define neighborhood to slice values from
nhood = (slice(3,6), slice(5,7))
#define subset where b == True in neighborhood
subset = b[nhood]
#define values in out that are in the neighborhood but not excluded by b
candidates = out[nhood][subset]
#get third values from neighborhood using math
c = np.random.rand(len(candidates))
#this is in a for loop, so this checks whether a value has already been changed earlier - returns all True for now
update_these = candidates < c
#set sliced, indexed subset of array with the values from c that are appropriate
out[nhood][subset][update_these] = c[update_these]
print(out) ##PRODUCES - ARRAY OF ALL ZEROS STILL
I have also tried chaining the boolean index with
out[nhood][(subset)&(update_these)] = c[update_these]
But that made an array of the wrong size.
Help?
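A minimal sketch of one possible way around this, based on the setup above: fold both boolean conditions into a single mask, so that the assignment uses only one indexing step on out[nhood] (which is a view, since it comes from basic slicing):
import numpy as np

a = np.arange(0,100).reshape(10,10)
b = np.random.rand(10,10) < 0.8
out = np.zeros(a.shape)
nhood = (slice(3,6), slice(5,7))

subset = b[nhood]                   # boolean mask within the neighborhood
candidates = out[nhood][subset]     # a copy of the candidate values
c = np.random.rand(len(candidates))
update_these = candidates < c

# Combine both conditions into one mask over the neighborhood.
mask = subset.copy()
mask[subset] = update_these         # True only where an update applies
out[nhood][mask] = c[update_these]  # single boolean index on a view: writes through

print(out[nhood])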
Suppose I have a 6x6 matrix I want to add into a 9x9 matrix, but I also want to add it at a specified location and not necessarily in a 6x6 block.
The below code summarizes what I want to accomplish, the only difference is that I want to use variables instead of the rows 0:6 and 3:9.
import numpy as np
a = np.zeros((9,9))
b = np.ones((6,6))
a[0:6,3:9] += b #Inserts the 6x6 ones matrix into the top right corner of the 9x9 zeros
Now using variables:
rows = np.array([0,1,2,3,4,5])
cols = np.array([3,4,5,6,7,8])
a[rows,3:9] += b #This works fine
a[0:6,cols] += b #This also works fine
a[rows,cols] += b #But this gives me the following error: ValueError: shape mismatch: value array of shape (6,6) could not be broadcast to indexing result of shape (6,)
I have spent hours reading through forums and trying different solutions, but nothing has ever worked. The reason I need to use variables is that these are input by the user and could be any combination of rows and columns. This notation worked perfectly in MATLAB, where I could add b into a with any combination of rows and columns.
Explanation:
rows = np.array([0,1,2,3,4,5])
cols = np.array([3,4,5,6,7,8])
a[rows,cols] += b
You could translate the last line to the following code:
for x, y, z in zip(rows, cols, b):
    a[x, y] += z
That means: rows contains the x-coordinates and cols the y-coordinates of the fields you want to manipulate. Both arrays contain 6 values, so you effectively manipulate 6 values, and b must therefore also contain exactly 6 values. But your b contains 6x6 = 36 values, hence the "shape mismatch". The numpy documentation on indexing should contain all you need to know about indexing np.arrays.
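If the goal is to add b as a full 6x6 block at arbitrary row/column positions (the MATLAB-style behavior the question describes), one option is np.ix_, which turns the two index vectors into an open mesh so that every row/column combination is addressed. A minimal sketch:
import numpy as np

a = np.zeros((9,9))
b = np.ones((6,6))
rows = np.array([0,1,2,3,4,5])
cols = np.array([3,4,5,6,7,8])

# np.ix_ reshapes rows to (6,1) and cols to (1,6); broadcasting then
# addresses all 36 row/column combinations instead of 6 (row, col) pairs.
a[np.ix_(rows, cols)] += b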
My apologies if the title seems vague, but I tried my best. In any case, I have a dataframe with three columns: one containing datetime values (the time of observation), another containing the range (the distance from the instrument at which the observation was made), and the last containing the intensity of the observations. The scatter plot for this data is shown below.
I need to filter out the random, isolated 'salt and pepper' observations, and plan to use a median filter to do this. However, I'm not sure how to do this efficiently. I've tried to create a 2D array containing the intensity values, indexed by time and range, so that 00:00 UT corresponds to row 0, 0 km corresponds to column 0, and so on, with empty positions containing NaNs. I then apply a median filter (scipy.ndimage.median_filter) to this 2D array.
My issue is that this seems inefficient, as I'm having to loop over large series of data to create the array. And, of course, converting the filtered 2D array back to a corresponding 1D series is difficult.
Here's the code I am using to obtain a 2D array:
import numpy as np
import pandas as pd

def get2DData(df, filt_size):
    '''
    Implementing this method: We want a 2D array that stores all the LoS
    velocities, so that we can ultimately apply median filtering to it.
    To do this, iterate over all unique datetime values and all unique
    range values, assigning LoS velocity values to the appropriate
    positions in a 2D array.
    '''
    arr = np.empty((len(df['time'].unique()), len(df['slist'].unique())))
    arr[:] = np.nan
    times_ = sorted(df['time'].unique())
    times_index = np.arange(len(times_))
    range_ = sorted(df['slist'].unique())
    range_index = np.arange(len(range_))
    times_dict = {A: B for A, B in zip(times_, times_index)}
    range_dict = {A: B for A, B in zip(range_, range_index)}
    times_dict_rev = {A: B for A, B in zip(times_index, times_)}
    range_dict_rev = {A: B for A, B in zip(range_index, range_)}
    for dt, rng_, v in zip(df['time'].values, df['slist'].values, df['v'].values):
        arr[times_dict[dt]][range_dict[rng_]] = v
    # applyFilt wraps the median filter (defined elsewhere)
    medfilt_arr = applyFilt(arr, filt_size)
    dt_list = []
    rng_list = []
    v_list = []
    for ix, iy in np.ndindex(medfilt_arr.shape):
        dt_list.append(times_dict_rev[ix])
        rng_list.append(range_dict_rev[iy])
        v_list.append(medfilt_arr[ix][iy])
    df_filtered = pd.DataFrame({'time': dt_list, 'slist': rng_list, 'v': v_list})
    return arr, df_filtered
One way to accelerate the creation of the array could be:
def get2DArray(df):
    alltime = df['time'].to_numpy()
    allrange = df['slist'].to_numpy()
    unique_time = np.unique(alltime)
    unique_rang = np.unique(allrange)
    row = unique_time.searchsorted(alltime)
    col = unique_rang.searchsorted(allrange)
    arr = np.full((len(unique_time), len(unique_rang)), fill_value=np.nan)
    arr[row, col] = df["v"].to_numpy()
    return arr
The idea is to use the relative position of each time and range among all the possible values to find the associated row/col, instead of using a dictionary. In some tests that I did, this version was ~5-7 times faster, which is not a huge amount but is already nice.
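Building on the same idea, the row/col arrays can also be reused to map the filtered 2D array straight back to a series aligned with the original dataframe, avoiding the np.ndindex loop. A sketch under the same assumptions about the dataframe's columns (filter2DArray is just an illustrative name):
import numpy as np
from scipy.ndimage import median_filter

def filter2DArray(df, filt_size):
    alltime = df['time'].to_numpy()
    allrange = df['slist'].to_numpy()
    row = np.unique(alltime).searchsorted(alltime)
    col = np.unique(allrange).searchsorted(allrange)
    arr = np.full((row.max() + 1, col.max() + 1), np.nan)
    arr[row, col] = df['v'].to_numpy()
    medfilt_arr = median_filter(arr, size=filt_size)
    # Read the filtered values back with the same (row, col) pairs,
    # preserving the original row order of the dataframe.
    return df.assign(v=medfilt_arr[row, col])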
I have a space of zeros with a variable dimension and an array of ones with a variable dimension, for instance:
import numpy
space = numpy.zeros((1000,5))
a = numpy.ones((150))
I would like to insert the ones from the array into the matrix so that they are homogeneously distributed within it.
You can use numpy.linspace to obtain the indices.
It's not obvious whether you'd like to assign a row of five ones at every index location or just assign a single one to the first column. This is how both of these would work:
space = numpy.zeros((1000,5))
a = numpy.ones((150, 5))
b = numpy.ones((150,))
index = numpy.rint(numpy.linspace(start=0, stop=999, num=150)).astype(int)
# This would assign five ones to every location
space[index] = a
# This would assign a one to the first element at every location
space[index, 0] = b
One question about masking 2-d np.array data.
For example:
one 2-d np.array value in the shape of 20 x 20.
An index t = [(1,2),(3,4),(5,7),(12,13)]
How can I mask the 2-d array value by the (y,x) pairs in the index?
Usually, replacing with np.nan is based on a specific value, like y[y==7] = np.nan
In my example, I want to replace the values at specific locations with np.nan.
For now, I can do it by:
Creating a new array value_mask in the shape of 20 x 20
Looping over the values and testing each location with (i,j) == t[k]
If True, value_mask[i,j] = value[i,j]; otherwise, value_mask[i,j] = np.nan
My method is too bulky, especially for huge data (3 levels of loops).
Is there a more efficient method to achieve that? Any advice would be appreciated.
You are nearly there.
You can pass arrays of indices to arrays. You probably know this with 1D arrays.
With a 2D array you need to pass the array a tuple of sequences (one per axis; the sequences must be of equal length, with one element for each array element you want to choose). You have a list of tuples, so you just have to "transpose" it:
t1 = tuple(zip(*t))
gives you the right shape for your index array (the tuple() around zip matters in Python 3, where zip returns an iterator), which you can now use as an index for any assignment, for example: value[t1] = np.nan
(There are lots of nice explanations of this trick (with zip and *) in Python tutorials, if you don't know it yet.)
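A self-contained example of the trick (a minimal sketch; the array contents are just illustrative):
import numpy as np

value = np.arange(400, dtype=float).reshape(20, 20)  # float, so it can hold NaN
t = [(1,2),(3,4),(5,7),(12,13)]

# zip(*t) "transposes" the (row, col) pairs into one sequence per axis:
# rows (1, 3, 5, 12) and cols (2, 4, 7, 13).
t1 = tuple(zip(*t))
value[t1] = np.nan

print(value[1,2], value[12,13])  # nan nan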
You can use np.logical_and:
arr = np.zeros((20,20))
You can select by location; this is just an example location:
arr[4:8,4:8] = 1
You can create a mask with the same shape as arr:
mask = np.ones((20,20)).astype(bool)
Then you can use np.logical_and:
mask = np.logical_and(mask, arr == 1)
And finally, you can replace the 1s with np.nan:
arr[mask] = np.nan
Suppose I have a two-dimensional numpy array with a given shape, and I would like to get a view of the values that satisfy a predicate based on the value's position. That is, if x and y are the column and row indices respectively, and the predicate is x > y, the function should return only the array's values for which the column index is greater than the row index.
The easy way to do this is a double loop, but I would like a faster (vectorized, maybe?) approach.
Is there a better way?
In general, you could do this by constructing an open mesh grid corresponding to the row/column indices, apply your predicate to get a boolean mask, then index into your array using this mask:
import numpy as np

A = np.zeros((10,20))
y, x = np.ogrid[:A.shape[0], :A.shape[1]]
mask = x > y
A[mask] = 1
Your specific example happens to be the strict upper triangle - you can get a copy of it using np.triu, or you can get the corresponding row/column indices using np.triu_indices.
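For completeness, a short sketch of both routes, assuming the strict predicate x > y (diagonal excluded, hence k=1):
import numpy as np

A = np.zeros((10,20))

# Row/column indices of the strict upper triangle of a 10x20 array.
rows, cols = np.triu_indices(A.shape[0], k=1, m=A.shape[1])
A[rows, cols] = 1

# Same selection via the open mesh grid approach above.
y, x = np.ogrid[:A.shape[0], :A.shape[1]]
assert np.array_equal(A == 1, x > y)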