numpy broadcast multiply on condition? - python

I have two arrays, one of shape arr1.shape = (1000,2) and the other of shape arr2.shape = (100,).
I'd like to multiply arr1[:,1] by the element of arr2 whose index matches arr1[:,0], so that I get a final shape of arr_out.shape = (1000,). The first column of arr1 is essentially an id for which the following condition holds: set(arr1[:,0]) == set(range(0,100)), i.e. every index of arr2 appears at least once in arr1[:,0].
I can't quite see how to do this with the numpy library, but I feel there should be a way using numpy.multiply if its where condition could be configured correctly.
I considered that perhaps a dummy index dimension for arr2 might help.
A toy example can be produced with the following code snippet:
import numpy as np

arr2_length = 100
arr1_length = 1000

arr1 = np.column_stack((
    np.random.randint(0, arr2_length, arr1_length),
    np.random.rand(arr1_length),
))
arr2 = np.random.rand(arr2_length)

# Doesn't work
arr2_b = np.column_stack((
    np.arange(arr2_length),
    np.random.rand(arr2_length),
))
# np.multiply(arr1[:,1], arr2_b[:,1], where=(arr1[:,0] == arr2_b[:,0]))
One sort of solution I had was to leverage a left join in Pandas to broadcast the smaller array to a same-length array and then multiply, as follows:
import pandas as pd

df = pd.DataFrame(arr1).set_index(0).join(pd.DataFrame(arr2))
arr_out = (df[0] * df[1]).values
But I'd really like to understand whether there's a native numpy way of doing this, since using dataframe joins for a multiplication isn't a very readable solution and possibly suffers from excess memory overhead.
Thanks for your help.

I believe this does exactly what you want:
indices, values = arr1[:, 0].astype(int), arr1[:, 1]  # split the id column from the value column
arr_out = values * arr2[indices]                      # fancy indexing broadcasts arr2 up to length 1000
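A quick way to convince yourself this matches the intended per-row lookup is to compare against an explicit Python loop; a minimal check, assuming the toy arr1 and arr2 from the question:
# Sanity check: fancy indexing vs. a plain Python loop over the rows of arr1
# (assumes the toy arr1 and arr2 defined in the question above).
import numpy as np

indices, values = arr1[:, 0].astype(int), arr1[:, 1]
arr_out = values * arr2[indices]

looped = np.array([row[1] * arr2[int(row[0])] for row in arr1])
print(arr_out.shape)                 # (1000,)
print(np.allclose(arr_out, looped))  # True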

Is this what you're looking for?
import numpy as np
arr1 = np.random.randint(1, 5, (10, 2))
arr2 = np.random.randint(1, 5, (5,))
arr2_sampled = arr2[arr1[:, 0]]
result = arr1[:, 1]*arr2_sampled
Output:
arr1 =
[[4 2]
 [3 3]
 [2 3]
 [3 1]
 [2 1]
 [2 4]
 [1 1]
 [4 2]
 [4 1]
 [3 4]]
arr2 =
[4 1 2 1 2]
arr2_sampled =
[2 1 2 1 2 2 1 2 2 1]
result =
[4 3 6 1 2 8 1 4 2 4]

Related

Substitute row in Numpy if a condition is met - Variation

I am still figuring out Numpy syntax! I have something that works, but there must be a more concise way to perform this task. In the example below, I replace selected rows of an array with new entries, where the condition depends on just one element of each row.
import numpy as np
big_array = np.random.randint(10, size=(5, 2))                   # multi-dimensional array
print(big_array)
bad_values = np.less_equal(big_array[:, 0], 4)                   # condition on one element per row
bad_rows = np.nonzero(bad_values)[0]                             # indexes to change, e.g. rows
print(f'these are the rows to replace {bad_rows}')
new_rows = np.random.randint(10, size=(bad_rows.size, 2)) + 10   # smaller multi-dim array
np.put(big_array[:, 0], bad_rows, new_rows[:, 0])                # should be a single line to combine this
np.put(big_array[:, 1], bad_rows, new_rows[:, 1])                # with this?
print(big_array)
Sample output that I want might look like:
[[2 4]
 [5 9]
 [6 6]
 [6 7]
 [0 6]]
these are the rows to replace [0 4]
[[16 17]
 [ 5  9]
 [ 6  6]
 [ 6  7]
 [18 17]]
I don't know how to format put for arguments with different dimensions. This seems like it should be a one-liner. (If I try np.where I get broadcasting length issues.) What am I missing?
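In case it helps, boolean-mask assignment does this in a single statement; a minimal sketch using the same variable names as the question (illustrative, not from an accepted answer):
# Minimal sketch: boolean-mask row assignment replaces both np.put calls above
# (same names as the question; illustrative only).
import numpy as np

big_array = np.random.randint(10, size=(5, 2))
bad_values = big_array[:, 0] <= 4                                # condition on one column
new_rows = np.random.randint(10, size=(bad_values.sum(), 2)) + 10
big_array[bad_values] = new_rows                                 # one assignment replaces the selected rows
print(big_array)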

Perform numpy product over non-zero elements of a row

I have a 2d array r. What I want to do is to take the product of each row, excluding the zero elements in that row. For example, if I have:
r = np.array([[1, 2, 0, 3, 4],
              [0, 2, 5, 0, 1],
              [1, 2, 3, 4, 0]])
Then what I want is to have another 2d array result such that:
result = [[24],
[10],
[24]]
How can I achieve this using numpy.prod?
I think I figured it out:
np.prod(r, axis=1, where=(r > 0), keepdims=True)
Output:
array([[24],
       [10],
       [24]])
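One caveat, in case r can also contain negative values: where=r > 0 drops those as well. If only the zeros should be excluded, the condition that matches the stated goal would be:
# Exclude only zeros; where=r > 0 would also silently drop negative entries.
np.prod(r, axis=1, where=(r != 0), keepdims=True)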

Tensorflow scan multiple matrix rows with offset

Question
I want to scan a matrix analogous to Tensorflow's tf.scan(), but using multiple rows at a time. So given an [m, n] matrix, I want to be able to iterate over the m rows (each with n elements) from i + j to m, giving m - j slices of shape [i - j, n].
How can this be achieved?
I know how tf.scan does something like this, returning the accumulated value of each iteration. But I don't think shifting the matrix as multiple inputs solves this, since the values that have an offset cannot be precomputed.
Example
To give an example for n = 3 and m = 5, let's say I have a matrix that looks like the following:
# [[1 0 0]
#  [1 1 0]
#  [0 0 0]   row 3
#  [0 0 0]   row 4
#  [0 0 0]]  row 5
matrix_shape = [5, 3]
matrix_idx = tf.constant([[0, 0], [1, 0], [1, 1]])
matrix = tf.scatter_nd(matrix_idx,
                       tf.ones(tf.shape(matrix_idx)[0],
                               dtype=tf.int32),
                       matrix_shape)
I want to apply the following function from row 3 to row 5:
# [[ 1  0  0] ┌ a
#  [ 1  1  0] ├ b
#  [ 6  4  2] <─┴ output / current line
#  [16 12  6]
#  [46 34 18]]
def compute(x):
    a = x[0]
    b = x[1]
    return (a + b + 1) * 2
Does Tensorflow have a function specific to this problem?
The following code I wrote does exactly what I wanted.
The important part here is the return value of the function used by tf.scan, which gives back not only the current computation c but also the row from the previous step b. It is therefore important to later cut off this excess from the computation by selecting only the second tensor of this tuple with [1].
#!/usr/bin/env python3
import tensorflow as tf

def compute(x, _):
    a = x[0]
    b = x[1]
    c = (a + b + 1) * 2
    return (b, c)

matrix_shape = tf.constant([3, 3])
init_data = [[1, 0, 0], [1, 1, 0]]
initializer = (
    tf.constant(init_data[0]),
    tf.constant(init_data[1]),
)
matrix = tf.zeros(matrix_shape, dtype=tf.int32)
computation = tf.scan(compute, matrix, initializer)[1]
result = tf.concat((tf.constant(init_data), computation), axis=0)

with tf.Session() as sess:
    sess.run(result)
    print(result.eval())
Since I'm still lacking experience: could this solution be bad for performance because the function returns a tuple and therefore doesn't use TensorFlow's speed optimizations?

Can't append numpy arrays after for loop?

After a for loop, I cannot append the array from each iteration into a single array:
in:
for a in l:
    arr = np.asarray(a_lis)
    print(arr)
How can I append these three arrays and return them as a single array?
[[ 0.55133 0.58122 0.66129032 0.67562724 0.69354839 0.70609319
0.6702509 0.63799283 0.61827957 0.6155914 0.60842294 0.60215054
0.59946237 0.625448 0.60215054 0.60304659 0.59856631 0.59677419
0.59408602 0.61021505]
[ 0.58691756 0.6784946 0.64964158 0.66397849 0.67114695 0.66935484
0.67293907 0.66845878 0.65143369 0.640681 0.63530466 0.6344086
0.6281362 0.6281362 0.62634409 0.6281362 0.62903226 0.63799283
0.63709677 0.6978495]
[ 0.505018 0.53405018 0.59408602 0.65143369 0.66577061 0.66487455
0.65412186 0.64964158 0.64157706 0.63082437 0.62634409 0.6218638
0.62007168 0.6648746 0.62096774 0.62007168 0.62096774 0.62007168
0.62275986 0.81362 ]]
I tried appending to a list, and using numpy's append, merge, and hstack. None of them worked. Any idea how to get the output shown above?
Use numpy.concatenate to join the arrays:
import numpy as np
a = np.array([[1, 2, 3, 4]])
b = np.array([[5, 6, 7, 8]])
arr = np.concatenate((a, b), axis=0)
print(arr)
# [[1 2 3 4]
# [5 6 7 8]]
Edit1: To do it inside the loop (as mentioned in the comment) you can use numpy.vstack:
import numpy as np

for i in range(0, 3):
    a = np.random.randint(0, 10, size=4)
    if i == 0:
        arr = a
    else:
        arr = np.vstack((arr, a))
print(arr)
# [[1 1 8 7]
#  [2 4 9 1]
#  [8 4 7 5]]
Edit2: Citing Iguananaut from the comments:
That said, using concatenate repeatedly can be costly. If you know the
size of the output in advance it's better to pre-allocate an array and
fill it as you go.
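A minimal sketch of that pre-allocation pattern, assuming (as in the Edit1 example) three rows of four random integers each:
# Pre-allocate the output array and fill it row by row
# (sketch of the advice quoted above; shapes follow the Edit1 example).
import numpy as np

n_rows, n_cols = 3, 4
arr = np.empty((n_rows, n_cols), dtype=int)
for i in range(n_rows):
    arr[i] = np.random.randint(0, 10, size=n_cols)
print(arr)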

Operations on 'N' dimensional numpy arrays

I am attempting to generalize some Python code to operate on arrays of arbitrary dimension. The operations are applied to each vector in the array. So for a 1D array there is simply one operation; for a 2D array it is applied both row- and column-wise (linearly, so the order does not matter). For example, a 1D array (a) is simple:
b = operation(a)
where 'operation' expects a 1D array. For a 2D array, the operation might proceed as:
for ii in range(0, a.shape[0]):
    b[ii, :] = operation(a[ii, :])
for jj in range(0, b.shape[1]):
    c[:, jj] = operation(b[:, jj])
I would like to make this general where I do not need to know the dimension of the array beforehand, and not have a large set of if/elif statements for each possible dimension.
Solutions that are general for 1 or 2 dimensions are ok, though a completely general solution would be preferred. In reality, I do not imagine needing this for any dimension higher than 2, but if I can see a general example I will learn something!
Extra information:
I have some MATLAB code that uses cells to do something similar, but I do not fully understand how it works. In this example, each vector is rearranged (basically the same function as fftshift in numpy.fft). Not sure if this helps, but it operates on an array of arbitrary dimension.
function aout = foldfft(ain)
nd = ndims(ain);
for k = 1:nd
    nx = size(ain, k);
    kx = floor(nx/2);
    idx{k} = [kx:nx 1:kx-1];
end
aout = ain(idx{:});
In Octave, your MATLAB code does:
octave:19> size(ain)
ans =
   2   3   4
octave:20> idx
idx =
{
  [1,1] =
     1   2
  [1,2] =
     1   2   3
  [1,3] =
     2   3   4   1
}
and then it uses the idx cell array to index ain. With these dimensions it 'rolls' the size 4 dimension.
For 5 and 6 the index lists would be:
2 3 4 5 1
3 4 5 6 1 2
The equivalent in numpy is:
In [161]: ain=np.arange(2*3*4).reshape(2,3,4)
In [162]: idx=np.ix_([0,1],[0,1,2],[1,2,3,0])
In [163]: idx
Out[163]:
(array([[[0]],
[[1]]]), array([[[0],
[1],
[2]]]), array([[[1, 2, 3, 0]]]))
In [164]: ain[idx]
Out[164]:
array([[[ 1, 2, 3, 0],
[ 5, 6, 7, 4],
[ 9, 10, 11, 8]],
[[13, 14, 15, 12],
[17, 18, 19, 16],
[21, 22, 23, 20]]])
Besides the 0 based indexing, I used np.ix_ to reshape the indexes. MATLAB and numpy use different syntax to index blocks of values.
The next step is to construct [0,1], [0,1,2], [1,2,3,0] with code, a straightforward translation.
I can use np.r_ as a short cut for turning 2 slices into an index array:
In [201]: idx = []
In [202]: for nx in ain.shape:
     ...:     kx = int(np.floor(nx/2.))
     ...:     kx = kx - 1
     ...:     idx.append(np.r_[kx:nx, 0:kx])
     ...:
In [203]: idx
Out[203]: [array([0, 1]), array([0, 1, 2]), array([1, 2, 3, 0])]
and pass this through np.ix_ to make the appropriate index tuple:
In [204]: ain[np.ix_(*idx)]
Out[204]:
array([[[ 1, 2, 3, 0],
[ 5, 6, 7, 4],
[ 9, 10, 11, 8]],
[[13, 14, 15, 12],
[17, 18, 19, 16],
[21, 22, 23, 20]]])
In this case, where 2 dimensions don't roll anything, slice(None) could replace those:
In [210]: idx=(slice(None),slice(None),[1,2,3,0])
In [211]: ain[idx]
======================
np.roll does:
indexes = concatenate((arange(n - shift, n), arange(n - shift)))
res = a.take(indexes, axis)
np.apply_along_axis is another function that constructs an index array (and turns it into a tuple for indexing).
If you are looking for a programmatic way to index the k-th dimension of an n-dimensional array, then numpy.take might help you.
An implementation of foldfft is given below as an example:
In[1]:
import numpy as np

def foldfft(ain):
    result = ain
    nd = len(ain.shape)
    for k in range(nd):
        nx = ain.shape[k]
        kx = (nx + 1) // 2
        shifted_index = list(range(kx, nx)) + list(range(kx))
        result = np.take(result, shifted_index, k)
    return result

a = np.indices([3, 3])
print("Shape of a = ", a.shape)
print("\nStarting array:\n\n", a)
print("\nFolded array:\n\n", foldfft(a))
Out[1]:
Shape of a = (2, 3, 3)
Starting array:
[[[0 0 0]
[1 1 1]
[2 2 2]]
[[0 1 2]
[0 1 2]
[0 1 2]]]
Folded array:
[[[2 0 1]
[2 0 1]
[2 0 1]]
[[2 2 2]
[0 0 0]
[1 1 1]]]
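Since the question notes that the operation is essentially fftshift, it may be worth mentioning that the shift convention used by this foldfft matches numpy's built-in np.fft.fftshift applied to all axes; a quick check, reusing the foldfft and a from the code above:
# The take-based foldfft above follows the same shift convention as
# np.fft.fftshift, which shifts every axis by default.
import numpy as np

a = np.indices([3, 3])
print(np.array_equal(foldfft(a), np.fft.fftshift(a)))   # prints True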
You could use numpy.ndarray.flat, which allows you to linearly iterate over an n-dimensional numpy array. Your code should then look something like this:
b = np.array(x)   # copy, so x is left unchanged
for i in range(len(x.flat)):
    b.flat[i] = operation(x.flat[i])   # .flat yields scalars, so operation acts element-wise here
The folks above provided multiple appropriate solutions. For completeness, here is my final solution. In this toy example for the case of 3 dimensions, the function 'ops' replaces the first and last element of a vector with 1.
import numpy as np

def ops(s):
    s[0] = 1
    s[-1] = 1
    return s

a = np.random.rand(4, 4, 3)
print('------')
print('Array a')
print(a)
print('------')
for ii in np.arange(a.ndim):
    a = np.apply_along_axis(ops, ii, a)
    print('------')
    print(' Axis', str(ii))
    print(a)
    print('------')
    print(' ')
The resulting 3D array has a 1 in every element on the 'border' with the numbers in the middle of the array unchanged. This is of course a toy example; however ops could be any arbitrary function that operates on a 1D vector.
Flattening the vector will also work; I chose not to pursue that simply because the book-keeping is more difficult and apply_along_axis is the simplest approach.
apply_along_axis reference page
