I am still figuring out NumPy syntax! I have something that works, but there must be a more concise way to perform this task. In the example below, I replace selected rows of an array with new entries, where the condition depends on just one element of each row.
import numpy as np
big_array = np.random.randint(10, size=(5, 2)) # multi-dimension array
print(big_array)
bad_values = np.less_equal(big_array[:,0], 4) # condition value in one dimension
bad_rows = np.nonzero(bad_values)[0] # indexes to change, e.g. rows
print(f'these are the rows to replace {bad_rows}')
new_rows = np.random.randint(10, size=(bad_rows.size, 2)) + 10 # smaller multi-dim array
np.put(big_array[:,0], bad_rows, new_rows[:,0]) # should be a single line to combine this
np.put(big_array[:,1], bad_rows, new_rows[:,1]) # with this?
print(big_array)
sample output that I want might look like
[[2 4]
 [5 9]
 [6 6]
 [6 7]
 [0 6]]
these are the rows to replace [0 4]
[[16 17]
 [ 5  9]
 [ 6  6]
 [ 6  7]
 [18 17]]
I don't know how to call put for arguments with different dimensions. This seems like it should be a one-liner. (If I try np.where I get broadcasting length issues.) What am I missing?
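A possible one-liner, as a sketch: a boolean mask can index and assign whole rows directly, so the two put calls could likely be replaced by a single masked assignment (assuming new_rows has exactly one row per matching row):

import numpy as np

big_array = np.random.randint(10, size=(5, 2))
mask = big_array[:, 0] <= 4                                  # condition on the first column
new_rows = np.random.randint(10, size=(mask.sum(), 2)) + 10  # one replacement row per match
big_array[mask] = new_rows                                   # replace whole rows in one step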
I have a 2d array r. What I want to do is to take the product of each row (excluding the zero elements in that row). For example if I have:
r = [[1 2 0 3 4],
[0 2 5 0 1],
[1 2 3 4 0]]
Then what I want is to have another 2d array result such that:
result = [[24],
[10],
[24]]
How can I achieve this using numpy.prod?
I think I figured it out:
np.prod(r, axis=1, where=r > 0, keepdims=True)
Output:
array([[24],
[10],
[24]])
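One caveat, in case r can ever contain negative values: where=r > 0 would drop those as well, so where=r != 0 is likely the safer way to exclude only the zeros (skipped elements contribute the multiplicative identity 1):

np.prod(r, axis=1, where=r != 0, keepdims=True)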
Question
I want to scan a matrix, analogous to TensorFlow's tf.scan(), but using multiple rows at a time. So given an [m, n] matrix, I want to iterate over the m rows (each with n elements) from row j + 1 to m, so that each of the m - j steps sees the j preceding rows as a slice of shape [j, n].
How can this be achieved?
I know tf.scan does something like this, returning the accumulated value of each iteration. But I don't think passing shifted copies of the matrix as multiple inputs solves this, since the values that depend on an offset cannot be precomputed.
Example
To give an example for n = 3 and m = 5, let's say I have a matrix that looks like the following:
# [[1 0 0]
#  [1 1 0]
#  [0 0 0]  row 3
#  [0 0 0]  row 4
#  [0 0 0]] row 5
matrix_shape = [5, 3]
matrix_idx = tf.constant([[0, 0], [1, 0], [1, 1]])
matrix = tf.scatter_nd(matrix_idx,
                       tf.ones(tf.shape(matrix_idx)[0],
                               dtype=tf.int32),
                       matrix_shape)
I want to apply the following function from row 3 to row 5:
# [[ 1  0  0] ┌ a
#  [ 1  1  0] ├ b
#  [ 6  4  2] <─┴ output / current line
#  [16 12  6]
#  [46 34 18]]
def compute(x):
    a = x[0]
    b = x[1]
    return (a + b + 1) * 2
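For reference, here is a plain-Python sketch (not TensorFlow) of how this recurrence fills rows 3 to 5 from the first two rows, just to verify the expected output above:

# each new row is (row two steps back + previous row + 1) * 2, element-wise
rows = [[1, 0, 0], [1, 1, 0]]
for _ in range(3):
    a, b = rows[-2], rows[-1]
    rows.append([(x + y + 1) * 2 for x, y in zip(a, b)])
print(rows[2:])  # [[6, 4, 2], [16, 12, 6], [46, 34, 18]]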
Does TensorFlow have a function specific to this problem?
The following code I wrote does exactly what I wanted.
The important part here is the return value of the function used by tf.scan, which gives back not only the current computation c but also the row from the previous step b. It is therefore important to cut off this excess afterwards by selecting only the latter tensor of that pair with [1].
#!/usr/bin/env python3
import tensorflow as tf

def compute(x, _):
    a = x[0]
    b = x[1]
    c = (a + b + 1) * 2
    return (b, c)

matrix_shape = tf.constant([3, 3])
init_data = [[1, 0, 0], [1, 1, 0]]
initializer = (
    tf.constant(init_data[0]),
    tf.constant(init_data[1]),
)
matrix = tf.zeros(matrix_shape, dtype=tf.int32)
computation = tf.scan(compute, matrix, initializer)[1]
result = tf.concat((tf.constant(init_data), computation), axis=0)

with tf.Session() as sess:
    sess.run(result)
    print(result.eval())
Since I still lack experience: might this solution be bad for performance because the function returns a tuple and therefore does not use TensorFlow's speed optimizations?
After a for loop, I cannot append the result of each iteration into a single array:
in:
for a in l:
    arr = np.asarray(a_lis)
    print(arr)
How can I append the three arrays printed by the loop above and return them as the single array below?:
[[ 0.55133 0.58122 0.66129032 0.67562724 0.69354839 0.70609319
0.6702509 0.63799283 0.61827957 0.6155914 0.60842294 0.60215054
0.59946237 0.625448 0.60215054 0.60304659 0.59856631 0.59677419
0.59408602 0.61021505]
[ 0.58691756 0.6784946 0.64964158 0.66397849 0.67114695 0.66935484
0.67293907 0.66845878 0.65143369 0.640681 0.63530466 0.6344086
0.6281362 0.6281362 0.62634409 0.6281362 0.62903226 0.63799283
0.63709677 0.6978495]
[ 0.505018 0.53405018 0.59408602 0.65143369 0.66577061 0.66487455
0.65412186 0.64964158 0.64157706 0.63082437 0.62634409 0.6218638
0.62007168 0.6648746 0.62096774 0.62007168 0.62096774 0.62007168
0.62275986 0.81362 ]]
I tried appending to a list and using numpy's append, merge, and hstack. None of them worked. Any idea how to get the output above?
Use numpy.concatenate to join the arrays:
import numpy as np
a = np.array([[1, 2, 3, 4]])
b = np.array([[5, 6, 7, 8]])
arr = np.concatenate((a, b), axis=0)
print(arr)
# [[1 2 3 4]
# [5 6 7 8]]
Edit1: To do it inside a loop (as mentioned in the comments) you can use numpy.vstack:
import numpy as np

for i in range(0, 3):
    a = np.random.randint(0, 10, size=4)
    if i == 0:
        arr = a
    else:
        arr = np.vstack((arr, a))

print(arr)
# [[1 1 8 7]
# [2 4 9 1]
# [8 4 7 5]]
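An alternative sketch that avoids the special case for the first iteration: collect the rows in a plain Python list and stack once at the end (assuming all rows have the same length):

import numpy as np

rows = []
for i in range(3):
    rows.append(np.random.randint(0, 10, size=4))
arr = np.vstack(rows)  # one stacking call instead of one per iteration
print(arr)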
Edit2: Citing Iguananaut from the comments:
That said, using concatenate repeatedly can be costly. If you know the
size of the output in advance it's better to pre-allocate an array and
fill it as you go.
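A minimal sketch of that pre-allocation approach, assuming three rows of four values as in Edit1:

import numpy as np

n_rows, n_cols = 3, 4
arr = np.empty((n_rows, n_cols), dtype=int)        # pre-allocate the output
for i in range(n_rows):
    arr[i] = np.random.randint(0, 10, size=n_cols)  # fill one row per iteration
print(arr)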
I am attempting to generalize some Python code to operate on arrays of arbitrary dimension. The operations are applied to each vector in the array. So for a 1D array there is simply one operation; for a 2-D array it would be applied both row- and column-wise (the operation is linear, so order does not matter). For example, a 1D array (a) is simple:
b = operation(a)
where 'operation' expects a 1D array. For a 2D array, the operation might proceed as
for ii in range(0, a.shape[0]):
    b[ii, :] = operation(a[ii, :])
for jj in range(0, b.shape[1]):
    c[:, jj] = operation(b[:, jj])
I would like to make this general where I do not need to know the dimension of the array beforehand, and not have a large set of if/elif statements for each possible dimension.
Solutions that are general for 1 or 2 dimensions are ok, though a completely general solution would be preferred. In reality, I do not imagine needing this for any dimension higher than 2, but if I can see a general example I will learn something!
Extra information:
I have MATLAB code that uses cell arrays to do something similar, but I do not fully understand how it works. In this example, each vector is rearranged (basically the same function as fftshift in numpy.fft). Not sure if this helps, but it operates on an array of arbitrary dimension.
function aout=foldfft(ain)
nd = ndims(ain);
for k = 1:nd
    nx = size(ain,k);
    kx = floor(nx/2);
    idx{k} = [kx:nx 1:kx-1];
end
aout = ain(idx{:});
In Octave, your MATLAB code does:
octave:19> size(ain)
ans =
   2   3   4
octave:20> idx
idx =
{
  [1,1] =
     1   2
  [1,2] =
     1   2   3
  [1,3] =
     2   3   4   1
}
and then it uses the idx cell array to index ain. With these dimensions it 'rolls' the size 4 dimension.
For 5 and 6 the index lists would be:
2 3 4 5 1
3 4 5 6 1 2
The equivalent in numpy is:
In [161]: ain=np.arange(2*3*4).reshape(2,3,4)
In [162]: idx=np.ix_([0,1],[0,1,2],[1,2,3,0])
In [163]: idx
Out[163]:
(array([[[0]],
        [[1]]]),
 array([[[0],
         [1],
         [2]]]),
 array([[[1, 2, 3, 0]]]))
In [164]: ain[idx]
Out[164]:
array([[[ 1,  2,  3,  0],
        [ 5,  6,  7,  4],
        [ 9, 10, 11,  8]],
       [[13, 14, 15, 12],
        [17, 18, 19, 16],
        [21, 22, 23, 20]]])
Besides the 0 based indexing, I used np.ix_ to reshape the indexes. MATLAB and numpy use different syntax to index blocks of values.
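As a small illustration of that difference (my own sketch, not from the original post): plain integer-array indexing in numpy pairs the indices up element by element, while np.ix_ selects a whole block:

a = np.arange(12).reshape(3, 4)
a[[0, 1], [1, 2]]            # pairs indices: a[0,1], a[1,2]  ->  array([1, 6])
a[np.ix_([0, 1], [1, 2])]    # block: rows 0,1 x columns 1,2  ->  array([[1, 2],
                             #                                           [5, 6]])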
The next step is to construct [0,1],[0,1,2],[1,2,3,0] with code, a straightforward translation.
I can use np.r_ as a shortcut for turning 2 slices into an index array:
In [201]: idx=[]
In [202]: for nx in ain.shape:
   .....:     kx = int(np.floor(nx/2.))
   .....:     kx = kx-1
   .....:     idx.append(np.r_[kx:nx, 0:kx])
   .....:
In [203]: idx
Out[203]: [array([0, 1]), array([0, 1, 2]), array([1, 2, 3, 0])]
and pass this through np.ix_ to make the appropriate index tuple:
In [204]: ain[np.ix_(*idx)]
Out[204]:
array([[[ 1,  2,  3,  0],
        [ 5,  6,  7,  4],
        [ 9, 10, 11,  8]],
       [[13, 14, 15, 12],
        [17, 18, 19, 16],
        [21, 22, 23, 20]]])
In this case, where 2 dimensions don't roll anything, slice(None) could replace those:
In [210]: idx=(slice(None),slice(None),[1,2,3,0])
In [211]: ain[idx]
======================
np.roll does:
indexes = concatenate((arange(n - shift, n), arange(n - shift)))
res = a.take(indexes, axis)
np.apply_along_axis is another function that constructs an index array (and turns it into a tuple for indexing).
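For what it's worth, a roll-based sketch of the same fold (my own addition, using the same kx convention as the In [202] loop above), since taking indexes [kx:nx, 0:kx] along an axis is just a roll by -kx:

out = ain
for axis, nx in enumerate(ain.shape):
    kx = nx // 2 - 1                    # same kx as in the loop above
    out = np.roll(out, -kx, axis=axis)  # negative shift moves index kx to the front
# out now matches Out[204]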
If you are looking for a programmatic way to index the k-th dimension of an n-dimensional array, then numpy.take might help you.
An implementation of foldfft is given below as an example:
In[1]:
import numpy as np

def foldfft(ain):
    result = ain
    nd = len(ain.shape)
    for k in range(nd):
        nx = ain.shape[k]
        kx = (nx+1)//2
        shifted_index = list(range(kx,nx)) + list(range(kx))
        result = np.take(result, shifted_index, k)
    return result

a = np.indices([3,3])
print("Shape of a = ", a.shape)
print("\nStarting array:\n\n", a)
print("\nFolded array:\n\n", foldfft(a))
Out[1]:
Shape of a = (2, 3, 3)
Starting array:
[[[0 0 0]
  [1 1 1]
  [2 2 2]]
 [[0 1 2]
  [0 1 2]
  [0 1 2]]]
Folded array:
[[[2 0 1]
  [2 0 1]
  [2 0 1]]
 [[2 2 2]
  [0 0 0]
  [1 1 1]]]
You could use numpy.ndarray.flat, which allows you to linearly iterate over an n-dimensional numpy array. Your code would then look something like this:
b = np.asarray(x)
for i in range(len(x.flat)):
    b.flat[i] = operation(x.flat[i])
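A minimal runnable sketch of that idea (my own toy example, assuming a simple scalar operation such as squaring, and copying x so the original stays untouched):

import numpy as np

def operation(v):
    return v ** 2                  # toy element-wise operation

x = np.arange(8).reshape(2, 2, 2)
b = np.array(x)                    # real copy, so writing through b.flat leaves x untouched
for i in range(x.size):
    b.flat[i] = operation(x.flat[i])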
The folks above provided multiple appropriate solutions. For completeness, here is my final solution. In this toy example for the case of 3 dimensions, the function 'ops' replaces the first and last element of a vector with 1.
import numpy as np

def ops(s):
    s[0] = 1
    s[-1] = 1
    return s

a = np.random.rand(4, 4, 3)
print('------')
print('Array a')
print(a)
print('------')
for ii in np.arange(a.ndim):
    a = np.apply_along_axis(ops, ii, a)
    print('------')
    print(' Axis', str(ii))
    print(a)
    print('------')
    print(' ')
The resulting 3D array has a 1 in every element on the 'border', with the numbers in the middle of the array unchanged. This is of course a toy example; however, ops could be any arbitrary function that operates on a 1D vector.
Flattening the array will also work; I chose not to pursue that simply because the bookkeeping is more difficult, and apply_along_axis is the simplest approach.
apply_along_axis reference page