Python: how to perform a conditional operation on an array

I have a NumPy array M of dimensions N×M and a dataframe tmp containing information about the cells of the array.
If I have to assign values to cells of M, I do
M[tmp.a, tmp.b] = tmp.n
However, I would like to assign the values only to those cells where M is smaller than tmp.n, something like
M[M[tmp.a, tmp.b] < tmp.n] = tmp.n
I solved it this way:
s = M.shape
M0 = np.zeros(s)  # same shape as M
M0[tmp.a, tmp.b] += tmp.n
idx = np.where(M < M0)
M[idx[0], idx[1]] = M0[idx[0], idx[1]]

If I understood you correctly, you can do something like:
M[tmp.a, tmp.b] = np.maximum(tmp.n, M[tmp.a, tmp.b])
(np.maximum is the element-wise maximum; the built-in max would fail on multi-element arrays.)
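For illustration, a minimal runnable sketch, with hypothetical arrays a, b, n standing in for the dataframe columns tmp.a, tmp.b, tmp.n:
import numpy as np

M = np.zeros((3, 3))
a = np.array([0, 1, 2])        # row indices (hypothetical)
b = np.array([1, 2, 0])        # column indices (hypothetical)
n = np.array([5.0, -1.0, 2.0])

# keep the larger of the current cell value and the new value
M[a, b] = np.maximum(M[a, b], n)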

This can be done using NumPy boolean (logical) indexing:
# a logical (boolean) array
log = M < tmp.n
# apply it to source and target and use `+=` to add the values
M[log] += tmp.n[log]
If the arrays don't have the same shape, you can also pick a specific dimension:
log = M[:, 0] < tmp.n
# apply it to source and target and use `+=` to add the values
M[log, 0] += tmp.n[log]
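As a quick check, here is a tiny runnable sketch of the masked += on hypothetical same-shape arrays:
import numpy as np

M = np.array([1.0, 5.0, 2.0])
n = np.array([3.0, 3.0, 3.0])

log = M < n          # boolean mask: which cells to update
M[log] += n[log]     # only those cells are touched
print(M)             # [4. 5. 5.]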

Related

Removing indexes to match array dimensions

I have two arrays (x, y) with different values, and I am trying to find the median of y for the values where x < 100. My problem is that I have filtered out some values in array y, so the arrays are no longer the same shape. Is there a way to remove from x the same indexes that I removed from y?
For example, both start out as (24, 36), but after the filtering y is (22, 32) while x is still (24, 36). How can I remove the same indexes? Let's say I removed indexes 4, 7 and 9, 14 from y; how can I remove those exact same ones from x?
My code, if needed (data_mg is y and data_dg is x):
data_mg = image_data_mg[0].data[0:x, 0:y].astype('float')
data_err = image_data_err[0].data[0:x, 0:y].astype('float')
data_dg = image_data_dg[0].data[0:x, 0:y].astype('float')
data_mg[data_mg == 0] = np.nan
data_err[data_err == 0] = np.nan
data_dg[data_dg == 0] = np.nan
data_mg = data_mg[data_mg/data_err > 2]
data_dg = np.ndarray.flatten(data_dg)
data_dg = data_dg[data_mg]  # <- the line that raises the IndexError below: data_mg holds floats, not indices
data_mg = np.ndarray.flatten(data_mg)
data_mg = data_mg[np.logical_not(np.isnan(data_mg))]
data_dg = np.ndarray.flatten(data_dg)
data_dg = data_dg[np.logical_not(np.isnan(data_dg))]
b = np.where(np.array(data_dg > 100))
median = np.median(data_mg[b])
print('Flux median at dispersion > 100 km/s is ' + str(median))
a = np.where(data_dg <= 100)
median1 = np.median(data_mg[a])
print('Flux median at dispersion <= 100 km/s is ' + str(median1))
IndexError: arrays used as indices must be of integer (or boolean) type, line 10
It looks like data_mg and data_dg start with the same shape, and you use boolean indexing to keep the values that are not NaN in each. The trouble is that different values are NaN in each array. I would suggest making a combined index that you can use for both arrays:
data_mg = np.ndarray.flatten(data_mg)
data_dg = np.ndarray.flatten(data_dg)
ix_mg = np.logical_not(np.isnan(data_mg))
ix_dg = np.logical_not(np.isnan(data_dg))
ix_combined = np.logical_and(ix_mg, ix_dg)
data_mg = data_mg[ix_combined]
data_dg = data_dg[ix_combined]
First, you could just do the same indexing operation on each array so they'll be of the same shape. I believe that would look something like this:
idx = data_mg / data_err > 2
data_mg = data_mg[idx]
data_dg = data_dg[idx]
But the error you're getting may not be due to this. It looks like your error is coming from the line:
data_dg = data_dg[data_mg]
Giving the error:
IndexError: arrays used as indices must be of integer (or boolean) type, line 10
I'm not sure what your intent is here, so I'm not sure what to recommend. If this is you trying to get them to be the same shape, the lines I included above should do that for you.
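To make the combined-mask approach concrete, here is a minimal end-to-end sketch, with small hypothetical arrays in place of the image data:
import numpy as np

# hypothetical stand-ins for data_dg (x) and data_mg (y)
x = np.array([50.0, 120.0, np.nan, 80.0, 300.0])
y = np.array([1.0, np.nan, 2.0, 3.0, 4.0])

valid = ~np.isnan(x) & ~np.isnan(y)  # one mask applied to both arrays
x, y = x[valid], y[valid]            # arrays stay aligned

print(np.median(y[x > 100]))         # 4.0
print(np.median(y[x <= 100]))        # 2.0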

How to insert elements from a vector into a matrix based on an array of random indexes

Basically, I'm trying to insert elements from a vector into a matrix at random indexes:
import numpy as np
import torch

size = 100000
answer_count = 4
num_range = int(1e4)
a = torch.randint(-num_range, num_range, size=(size, ))
b = torch.randint(-num_range, num_range, size=(size, ))
answers = torch.randint(-num_range, num_range, size=(size, answer_count))
for i in range(size): answers[i, np.random.randint(answer_count)] = a[i] + b[i]
I tried something like
c = a + b
pos = torch.randint(answer_count, size=(size, ))
answers[:, pos] = c
But I'm certainly doing something wrong.
I think you need to change the last line like this:
answers[np.arange(size), pos] = c
The problem lies in incorrect use of advanced indexing. To understand the difference between the two forms, try printing answers[:, pos] vs. answers[np.arange(size), pos] and you will see why the former does not work: answers[np.arange(size), pos] pairs each row with a single column, while answers[:, pos] selects ALL rows for each pos. There is more information on advanced indexing in the NumPy docs.
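To see the difference, here is a small sketch with a hypothetical 3×4 array (plain NumPy; PyTorch index tensors behave the same way):
import numpy as np

answers = np.zeros((3, 4), dtype=int)
pos = np.array([1, 3, 0])      # one target column per row
c = np.array([10, 20, 30])

# answers[:, pos] would address a (3, 3) block (ALL rows x each pos);
# pairing explicit row indices with pos hits exactly one cell per row:
answers[np.arange(3), pos] = c
print(answers)
# [[ 0 10  0  0]
#  [ 0  0  0 20]
#  [30  0  0  0]]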

How do I make this function able to use a numpy array as an argument and return an array in python?

How do I make this function able to take a numpy array as an argument and return an array of the same size, with tanc() applied element-wise, in Python?
My current code is shown below, but it does not produce a complete array covering both branches. How do I create an output array of tanc() values?
def tanc(x):
    if x == 0:
        return 1
    else:
        return np.tan(x)/x
I want an output such as:
array([ 1.0, 0.27323654e+00, -4.89610183e-17])
You can use numpy.where, and the where parameter to np.divide and np.tan.
np.where(cond, a, b) gives an array where values from a are used for elements of cond that are truthy, and elements of b for the falsy elements of cond.
np.divide and np.tan's where argument tells them to only do their operation at locations that are true in another array, and leave the other elements uninitialized (so they could be anything, but it doesn't matter, because we're not going to use them here).
nonzero = x != 0 # we only care about places where x isn't 0
# Get tan, then divide by x, but only where x is not 0
nonzero_tan = np.tan(x, where=nonzero)
nonzero_tanc = np.divide(nonzero_tan, x, where=nonzero)
# Where x is not zero, use tan(x)/x, and use 1 everywhere else
tanc = np.where(nonzero, nonzero_tanc, 1)
As suggested by hpaulj in their comment, you can combine the last two steps by also using the out parameter of np.divide to define the default values of the output array:
nonzero = x != 0
nonzero_tan = np.tan(x, where=nonzero)
tanc = np.divide(nonzero_tan, x, out=np.ones_like(x), where=nonzero)
Use a mask to encode your condition for each element:
mask = (x != 0)
You can apply numpy operations to the portions of the data that satisfy your condition:
output = np.zeros(x.shape, dtype=float)
output[~mask] = 1
output[mask] = np.tan(x[mask]) / x[mask]
All together (with redundant operations removed):
def tanc(x):
    mask = (x != 0)
    output = np.zeros(x.shape, dtype=float)
    output[~mask] = 1
    selected = x[mask]
    output[mask] = np.tan(selected) / selected
    return output
Post Scriptum
@jirasaimok's excellent answer is, in my opinion, a more elegant (numpythonic, if you will) way to accomplish the same thing: avoid more than one computation per element, and avoid division by zero. I would suggest that their answer can be further enhanced by using the out keyword of np.tan and np.divide to avoid allocating and copying unnecessary temporary arrays:
def tanc(x):
    mask = (x != 0)
    output = np.tan(x, where=mask)
    np.divide(output, x, where=mask, out=output)
    output[~mask] = 1
    return output
Or better yet:
def tanc(x):
    mask = (x != 0)
    output = np.tan(x, where=mask, out=np.ones(x.shape, float))
    return np.divide(output, x, where=mask, out=output)
You could simply do the following, using the identity np.sinc(z) == sin(πz)/(πz), so that np.sinc(x/np.pi) == sin(x)/x (and np.sinc already returns 1 at 0, which handles the x == 0 case):
def tanc(x):
    return np.sinc(x/np.pi)/np.cos(x)
def tanc(x):
    if x == 0:
        return 1
    else:
        return np.tan(x)/x

def return_array(some_array):
    return np.array(list(map(tanc, some_array)))
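A quick usage sketch with a hypothetical input; every vectorized version above should compute element-wise tan(x)/x and map the zero entry to 1:
import numpy as np

x = np.array([0.0, 1.0, np.pi])
print(return_array(x))  # [1.0, ~1.5574 (= tan(1)/1), ~0.0 up to floating-point rounding]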

filling numpy array by index

I have a function which gives me the index for a given value. E.g.,
def F(value):
    index = do_something(value)
    return index
I want to use this index to fill a huge numpy array with 1s. Let's call the array features:
l = [1,4,2,3,7,5,3,6,.....]
NOTE: features.shape[0] = len(l)
for i in range(features.shape[0]):
    idx = F(l[i])
    features[i, idx] = 1
Is there a pythonic way to perform this (as the loop takes a lot of time if the array is huge)?
If you can vectorize F(value), you could write something like:
indices = np.arange(features.shape[0])
feature_indices = F(l)
features[indices, feature_indices] = 1  # pairs row i with column feature_indices[i] (.flat would need a single flattened index)
try this:
i = np.arange(features.shape[0]) # rows
j = np.vectorize(F)(np.array(l)) # columns
features[i,j] = 1
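For completeness, a self-contained sketch with a hypothetical F (a stand-in for do_something) showing the fully vectorized one-hot fill:
import numpy as np

l = np.array([1, 4, 2, 3, 7, 5, 3, 6])
n_cols = 8

def F(values):
    # hypothetical vectorized stand-in for do_something
    return values % n_cols

features = np.zeros((len(l), n_cols), dtype=int)
features[np.arange(len(l)), F(l)] = 1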

Row, column assignment without for-loop

I wrote a small script to assign values to a numpy array by knowing their row and column coordinates:
gridarray = np.zeros([3,3])
gridarray_counts = np.zeros([3,3])
cols = np.random.randint(0, 3, 15)  # random_integers is deprecated; randint's upper bound is exclusive
rows = np.random.randint(0, 3, 15)
data = np.random.randint(0, 10, 15)
for nn in np.arange(len(data)):
    gridarray[rows[nn], cols[nn]] += data[nn]
    gridarray_counts[rows[nn], cols[nn]] += 1
This way I know how many values are stored in each grid cell and what their sum is. However, performing this on arrays of length 100,000+ gets quite slow. Is there another way, without using a for-loop?
Is an approach similar to this possible? I know this does not work yet:
gridarray[rows,cols] += data
gridarray_counts[rows,cols] += 1
I would use bincount for this, but for now bincount only takes 1-d arrays, so you'll need to write your own ndbincount, something like:
def ndbincount(x, weights=None, shape=None):
    if shape is None:
        shape = x.max(1) + 1
    x = np.ravel_multi_index(x, shape)
    out = np.bincount(x, weights, minlength=np.prod(shape))
    out.shape = shape
    return out
Then you can do:
gridarray = np.zeros([3,3])
cols = np.random.randint(0, 3, 15)
rows = np.random.randint(0, 3, 15)
data = np.random.randint(0, 10, 15)
x = np.vstack([rows, cols])
temp = ndbincount(x, data, gridarray.shape)
gridarray = gridarray + temp
gridarray_counts = ndbincount(x, shape=gridarray.shape)
You might be tempted to do this directly:
gridarray[(rows, cols)] += data
gridarray_counts[(rows, cols)] += 1
but with repeated (row, col) pairs the buffered += applies each duplicate only once. np.add.at performs the unbuffered equivalent and accumulates duplicates correctly:
np.add.at(gridarray, (rows, cols), data)
np.add.at(gridarray_counts, (rows, cols), 1)
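A small check, using hypothetical data with duplicate indices, that np.add.at matches the original loop where a fancy-indexed += would not:
import numpy as np

rows = np.array([0, 0, 1])
cols = np.array([1, 1, 2])
data = np.array([5.0, 7.0, 2.0])

grid = np.zeros((3, 3))
np.add.at(grid, (rows, cols), data)
print(grid[0, 1])  # 12.0 (5 + 7); grid[rows, cols] += data would give 7.0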
