Repeat a NumPy array in multiple dimensions at once? - python

np.repeat(np.repeat([[1, 2, 3]], 3, axis=0), 3, axis=1)
works as expected and produces
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3]])
However,
np.repeat([[1, 2, 3]], [3, 3])
and
np.repeat([[1, 2, 3]], [3, 3], axis=0)
produce errors.
Is it possible to repeat an array in multiple dimensions at once?

First off, I think the original method you propose is totally fine. It's readable, it makes sense, and it's not very slow.
You could use the repeat method instead of the function, which reads a bit more nicely:
>>> x = np.array([[1, 2, 3]])
>>> x.repeat(3, 1).repeat(3, 0)
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3]])
With NumPy's broadcasting rules, there are likely dozens of ways to create the repeated data and move it around into the shape you want, too. One approach is to use np.broadcast_to() to repeat the data in D+1 dimensions, where D is the number of dimensions you need, and then collapse it back down to D.
For example:
>>> x = np.array([[1, 2, 3]])
>>> np.broadcast_to(x.T, (3, 3, 3)).reshape((3, 9))
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3]])
And without reshaping (so that you don't need to know the final length):
>>> np.hstack(np.broadcast_to(x, (3, 3, 3)).T)
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3],
[1, 1, 1, 2, 2, 2, 3, 3, 3]])
And there are likely a dozen other ways to do this. But I still think your original version is more idiomatic, as pushing the data into extra dimensions only to collapse them again is weird.
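If you need this for an arbitrary number of axes, the chained calls can be wrapped in a small loop. This is a sketch: the helper name repeat_axes is hypothetical, not a NumPy function. np.kron with a block of ones is swapped in as a different technique that, for this 2D input, gives the same result.

```python
import numpy as np

def repeat_axes(arr, repeats):
    """Hypothetical helper: repeat arr `repeats[k]` times along axis k."""
    out = np.asarray(arr)
    for axis, count in enumerate(repeats):
        # Each pass is an ordinary np.repeat along a single axis.
        out = np.repeat(out, count, axis=axis)
    return out

x = np.array([[1, 2, 3]])
looped = repeat_axes(x, (3, 3))

# Alternative: the Kronecker product with a 3x3 block of ones
# replaces every element of x by a 3x3 block of that element.
kron = np.kron(x, np.ones((3, 3), dtype=x.dtype))

print(looped)
print(np.array_equal(looped, kron))  # True
```

The loop still performs one np.repeat per axis, so it is no faster than the original; it only hides the chaining.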

It isn't possible; see the np.repeat documentation. Your array has shape (1, 3), so along axis=0 the repeats list can only have one entry:
np.repeat(X, [2], axis=0)
because np.repeat(X, [2, 2], axis=0) requires X to have two rows, i.e. shape (2, n). For example:
X = np.array([[1, 2, 3], [5, 6, 7]])
np.repeat(X, [2, 5], axis=0)
the output looks like:
[[1 2 3]
[1 2 3]
[5 6 7]
[5 6 7]
[5 6 7]
[5 6 7]]
Here [2, 5] means: repeat the first row 2 times and the second row 5 times. The repeats list needs one entry per row (X has shape (2, n), where n doesn't matter) because axis=0 means you are repeating rows.
Therefore you first have to generate an array of shape (3, *) by repeating the rows, and then repeat that array along the columns.
If you want to repeat your array, X2 = np.array([[1, 2, 3]]):
np.repeat(X2, [5], axis=0)
produces:
[[1 2 3]
[1 2 3]
[1 2 3]
[1 2 3]
[1 2 3]]
because your array has only one row (shape (1, 3)).
In your original code, the first call of np.repeat produces the array with repeated rows and the second call duplicates the columns. Even starting from np.repeat(X2, [5], axis=0), you still have to call np.repeat a second time, along axis=1, on its output to get the result from your post.
In my opinion your use of np.repeat is the easiest and best way to achieve your output.
Edit: Hopefully the answer is clearer now.
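To make the rule about repeat lists concrete, here is a small check. X is the two-row array from the example above; the column counts [1, 2, 3] are arbitrary values chosen for illustration.

```python
import numpy as np

X = np.array([[1, 2, 3], [5, 6, 7]])

# One count per row: the list length must equal X.shape[0] (here 2).
rows = np.repeat(X, [2, 5], axis=0)
print(rows.shape)  # (7, 3)

# A list of counts works along axis=1 too: one count per column.
cols = np.repeat(X, [1, 2, 3], axis=1)
print(cols[0])  # [1 2 2 3 3 3]

# A counts list whose length does not match the axis length raises an error.
try:
    np.repeat(np.array([[1, 2, 3]]), [3, 3], axis=0)
except ValueError as err:
    print("ValueError:", err)
```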

Related

Permute a single row or column of a matrix

I have a large matrix where I want to permute (or shift) one row of it.
For example:
np.array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
The desired output (shifting the second row by 1, in this example) is:
np.array([[1, 2, 3, 4],
[2, 3, 4, 1],
[1, 2, 3, 4],
[1, 2, 3, 4]])
This can be done naively by extracting the row of interest, permuting it and sticking it back into the matrix.
I want a better solution that is in-place and efficient.
How can I shift a given row or column by n places?
How can I permute it (change the order as desired)?
Can this be done efficiently for more than one row? For example, shift the i-th row i places forward:
np.array([[1, 2, 3, 4],
[2, 3, 4, 1],
[3, 4, 1, 2],
[4, 1, 2, 3]])
You can do it by indexing the rows and rolling them:
import numpy as np
a = np.array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
shift = 2
rows = [1, 3]
a[rows] = np.roll(a[rows], shift, axis=1)
array([[1, 2, 3, 4],
[3, 4, 1, 2],
[1, 2, 3, 4],
[3, 4, 1, 2]])
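For the last part of the question (shift the i-th row i places forward), the per-row shifts can be turned into a column-index array and applied with np.take_along_axis. This is a sketch assuming "forward" means to the left, as in the example output. It still builds a temporary array before writing back, so it is in place only in the sense that a keeps its buffer. The permutation [3, 0, 2, 1] at the end is an arbitrary example order.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [1, 2, 3, 4],
              [1, 2, 3, 4]])
n_rows, n_cols = a.shape

# Row i moves i places to the left, so the element that ends up
# at column j comes from column (j + i) % n_cols.
shifts = np.arange(n_rows)
cols = (np.arange(n_cols) + shifts[:, None]) % n_cols

# take_along_axis picks a[i, cols[i, j]] for every (i, j); assigning
# to a[:] writes the result back into a's existing buffer.
a[:] = np.take_along_axis(a, cols, axis=1)
print(a)
# [[1 2 3 4]
#  [2 3 4 1]
#  [3 4 1 2]
#  [4 1 2 3]]

# Any permutation of a single row is plain fancy indexing on that row.
perm = [3, 0, 2, 1]
a[0] = a[0, perm]
print(a[0])  # [4 1 3 2]
```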

Indexing numpy array with index array of lower dim yields array of higher dim than both

a = np.zeros((5,4,3))
v = np.ones((5, 4), dtype=int)
data = a[v]
shp = data.shape
This code gives shp==(5,4,4,3)
I don't understand why. How can the output be larger than both arrays? It makes no sense to me, and I would love an explanation.
This is known as advanced indexing. Advanced indexing allows you to select arbitrary elements in the input array based on an N-dimensional index.
Let's use another example to make it clearer:
a = np.random.randint(1, 5, (5,4,3))
v = np.ones((5, 4), dtype=int)
Say in this case a is:
array([[[2, 1, 1],
[3, 4, 4],
[4, 3, 2],
[2, 2, 2]],
[[4, 4, 1],
[3, 3, 4],
[3, 4, 2],
[1, 3, 1]],
[[3, 1, 3],
[4, 3, 1],
[2, 1, 4],
[1, 2, 2]],
...
By indexing with an array of np.ones:
print(v)
array([[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]])
You will simply be indexing a with 1 along the first axis, once for every element of v. Put another way, when you do:
a[1]
[[4, 4, 1],
[3, 3, 4],
[3, 4, 2],
[1, 3, 1]]
You're indexing along the first axis only, as no index is specified for the remaining axes. It is the same as doing a[1, ...], i.e. taking a full slice along the remaining axes. Hence, by indexing with a 2D array of ones, you get the above 2D array stacked in a (5, 4) grid, resulting in an ndarray of shape (5, 4, 4, 3). In other words, a[1], of shape (4, 3), stacked 5*4 = 20 times.
Hence, in this case you'd be getting:
array([[[[4, 4, 1],
[3, 3, 4],
[3, 4, 2],
[1, 3, 1]],
[[4, 4, 1],
[3, 3, 4],
[3, 4, 2],
[1, 3, 1]],
...
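The shape rule described above can be checked directly. A minimal sketch using the same random a and all-ones v; np.take with axis=0 is shown as an equivalent spelling of a[v].

```python
import numpy as np

a = np.random.randint(1, 5, (5, 4, 3))
v = np.ones((5, 4), dtype=int)

data = a[v]
# Indexing the first axis with an integer array replaces that axis by the
# index array's shape: (5, 4) + a.shape[1:] -> (5, 4, 4, 3).
print(data.shape == v.shape + a.shape[1:])  # True

# a[v] matches np.take(a, v, axis=0): every entry of v selects a full
# (4, 3) block, and here all 20 blocks are copies of a[1].
print(np.array_equal(data, np.take(a, v, axis=0)))  # True
print(all(np.array_equal(block, a[1]) for block in data.reshape(-1, 4, 3)))  # True
```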
The value of v is:
[[1 1 1 1]
[1 1 1 1]
[1 1 1 1]
[1 1 1 1]
[1 1 1 1]]
Every single 1 indexes a complete "row" in a, but every "element" in said "row" is a matrix. So every "row" in v indexes a "row" of "matrices" in a.
(Does this make any sense to you?)
So you get 5 * 4 ones, each selecting a 4*3 "matrix".
If instead of zeros you define a as a = np.arange(5*4*3).reshape((5, 4, 3)),
it might be easier to understand, because you get to see which parts of a are being chosen:
import numpy as np
a = np.arange(5*4*3).reshape((5, 4, 3))
v = np.ones((5,4), dtype=int)
data = a[v]
print(data)
(The output is pretty long, so I won't paste it here.)
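Instead of printing the whole result, looking at a single block is enough to see which part of a was chosen. A sketch with the same arange-based a; v2 is a hypothetical modified copy of v, added to show that each entry of the index array picks its own block.

```python
import numpy as np

a = np.arange(5 * 4 * 3).reshape((5, 4, 3))
v = np.ones((5, 4), dtype=int)
data = a[v]

# Every data[i, j] is a[1], i.e. the values 12..23.
print(data[0, 0])
# [[12 13 14]
#  [15 16 17]
#  [18 19 20]
#  [21 22 23]]

# Changing one entry of the index array changes the block at that position.
v2 = v.copy()
v2[0, 0] = 3
print(a[v2][0, 0, 0])  # [36 37 38]
```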

Numpy: Imposing row dependent maximum on array

Suppose I have the following array:
a = [[1, 4, 2, 3],
     [3, 1, 5, 4],
     [4, 3, 1, 2]]
What I'd like to do is impose a maximum value on the array, but have that maximum vary by row. For instance, if I wanted to limit the 1st and 3rd rows to a maximum value of 3 and the 2nd row to a maximum of 4, I would want to get:
[[1, 3, 2, 3],
 [3, 1, 4, 4],
 [3, 3, 1, 2]]
Is there any better way than just looping over each row individually and setting it with 'nonzero'?
With numpy.clip (using the method version here, which requires a to be a NumPy array):
a.clip(max=np.array([3, 4, 3])[:, None]) # np.clip(a, ...)
# array([[1, 3, 2, 3],
# [3, 1, 4, 4],
# [3, 3, 1, 2]])
Generalized:
def clip_2d_rows(a, maxs):
    maxs = np.asanyarray(maxs)
    if maxs.ndim == 1:
        # Turn a 1D array of per-row limits into a column so it broadcasts across each row
        maxs = maxs[:, np.newaxis]
    return np.clip(a, a_min=None, a_max=maxs)
You might be safer using the module-level function (np.clip) rather than the method (np.ndarray.clip). The former names its parameter a_max, while the latter uses max, the name of a Python builtin, as a parameter name, which is never a great idea.
With masking -
In [50]: row_lims = np.array([3,4,3])
In [51]: np.where(a > row_lims[:,None], row_lims[:,None], a)
Out[51]:
array([[1, 3, 2, 3],
[3, 1, 4, 4],
[3, 3, 1, 2]])
With
>>> a
array([[1, 4, 2, 3],
[3, 1, 5, 4],
[4, 3, 1, 2]])
Say you have
>>> maxs = np.array([[3],[4],[3]])
>>> maxs
array([[3],
[4],
[3]])
What about doing
>>> a.clip(max=maxs)
array([[1, 3, 2, 3],
[3, 1, 4, 4],
[3, 3, 1, 2]])
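When only an upper bound is needed, np.minimum (a different function from the clip and where answers, swapped in here as an alternative) does the same thing through broadcasting. A minimal sketch, reusing the row_lims name from the masking answer:

```python
import numpy as np

a = np.array([[1, 4, 2, 3],
              [3, 1, 5, 4],
              [4, 3, 1, 2]])
row_lims = np.array([3, 4, 3])

# Broadcasting a (3, 4) array against a (3, 1) column of limits takes the
# element-wise smaller value, which is the same as clipping from above.
capped = np.minimum(a, row_lims[:, None])
print(capped)
# [[1 3 2 3]
#  [3 1 4 4]
#  [3 3 1 2]]

# Same result as the np.clip version.
print(np.array_equal(capped, np.clip(a, None, row_lims[:, None])))  # True
```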

How do I use numpy to form a 2D array from another array's columns (dimension is 2*4), given an array of column indices, efficiently?

I'm trying to make a 2-by-n array using numpy whose elements come from specific columns, selected by an array of column numbers.
For example if I have something like this
[[1, 2, 3],
[2, 3, 4]]
as my input array, and I want to have columns
[2, 3, 1, 2, 3],
I should get
[[2, 3, 1, 2, 3],
[3, 4, 2, 3, 4]]
as my output array
You want to index along the second dimension. However, keep in mind that NumPy uses zero-based indexing, so you'll need [1, 2, 0, 1, 2] instead of [2, 3, 1, 2, 3]:
import numpy as np
a = np.array([
[1, 2, 3],
[2, 3, 4]])
a[:, [1, 2, 0, 1, 2]]
array([[2, 3, 1, 2, 3],
[3, 4, 2, 3, 4]])
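If the column numbers arrive 1-based, as in the question, they can be converted in one step. A sketch; np.take with axis=1 is shown as the equivalent function form of a[:, idx].

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])

# The question's column numbers are 1-based; subtract 1 for NumPy's 0-based indices.
wanted = np.array([2, 3, 1, 2, 3]) - 1

by_indexing = a[:, wanted]
by_take = np.take(a, wanted, axis=1)  # equivalent function form

print(by_indexing)
# [[2 3 1 2 3]
#  [3 4 2 3 4]]
print(np.array_equal(by_indexing, by_take))  # True
```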

How can I take out (or slice) the elements in a rank-2 tensor whose first element is unique?

My title might be ambiguous due to my awkward English, but this is what I mean:
Suppose I have a tensor a like this:
array([[1, 2, 3],
[2, 2, 3],
[2, 2, 4],
[3, 2, 3],
[4, 2, 3]], dtype=int32)
The 'first column' of this tensor could contain duplicate elements (e.g. [1, 2, 2, 3, 4] or [1, 1, 2, 3, 3, 4, 5, 5]), and which element is duplicated is not known beforehand.
I want to take out a tensor like this:
array([[1, 2, 3],
[2, 2, 3],
[3, 2, 3],
[4, 2, 3]], dtype=int32)
As you can see, for each distinct value in the first column of a, I take out the first row in which it appears.
I first wanted to use the function tf.unique(), but the idx value returned by it doesn't indicate the first index of each value of the output tensor in the original tensor.
tf.unique() works like this:
# tensor 'x' is [1, 1, 2, 3, 3, 3, 7, 8, 8]
y, idx = tf.unique(x)
y ==> [1, 2, 3, 7, 8]
idx ==> [0, 0, 1, 2, 2, 2, 3, 4, 4]
The function tf.unique(x, name=None) finds the unique elements in a 1-D tensor. It returns two values: y and idx. y contains all of the unique elements of x in the same order that they occur in x. idx contains the index of each value of x in the unique output y.
I wish it had a third return value containing the first index of each value of y in the original tensor x. It might work like this:
# tensor 'x' is [1, 1, 2, 3, 3, 3, 7, 8, 8]
y, idx, idx_ori = tf.unique(x)
y ==> [1, 2, 3, 7, 8]
idx ==> [0, 0, 1, 2, 2, 2, 3, 4, 4]
idx_ori ==> [0, 2, 3, 6, 7]
Just like its equivalent in Numpy does:
# array 'x' is [1, 1, 2, 3, 3, 3, 7, 8, 8]
y, idx_ori = np.unique(x, return_index=True)
y ==> [1, 2, 3, 7, 8]
idx_ori ==> [0, 2, 3, 6, 7]
If I had this idx_ori, I could solve my problem with tf.gather():
_, _1, idx_ori = tf.unique(a[:, 0])
result = tf.gather(a, idx_ori)
Any ideas for working around this problem, or for getting the indices I want?
P.S. I know my description is tediously long ... :-p
This is a bit gross, but you could do:
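import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.to_int32)

a = np.array([[1, 2, 3], [2, 2, 3], [2, 2, 4], [3, 2, 3], [4, 2, 3]], dtype=np.int32)
print(a)
y, idx = tf.unique(a[:,0])
z = tf.one_hot(idx, tf.shape(y)[0])  # row i marks which unique value a[i, 0] is
s = tf.cumsum(z)  # running count of each value down the rows
e = tf.equal(s, 1) # only seen once so far
ss = tf.to_int32(e) * tf.to_int32(z) # and this row holds that value
m = tf.reduce_max(ss, reduction_indices=1)
out = tf.boolean_mask(a, tf.equal(m, 1))
sess = tf.Session()
print(sess.run(out))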
[[1 2 3]
[2 2 3]
[2 2 4]
[3 2 3]
[4 2 3]]
[[1 2 3]
[2 2 3]
[3 2 3]
[4 2 3]]
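If the data can be handled in NumPy (for example, before it enters the graph), np.unique with return_index=True already returns the idx_ori the question wishes tf.unique had. A sketch assuming a is available as a NumPy array; the second half reproduces the one-hot/cumsum mask from the TensorFlow answer in NumPy to show what each step computes.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 2, 3],
              [2, 2, 4],
              [3, 2, 3],
              [4, 2, 3]], dtype=np.int32)

# return_index=True gives the first index of each unique value,
# ordered by sorted value rather than by first appearance.
_, first_idx = np.unique(a[:, 0], return_index=True)

# Sorting the indices keeps the selected rows in their original order.
result = a[np.sort(first_idx)]
print(result)
# [[1 2 3]
#  [2 2 3]
#  [3 2 3]
#  [4 2 3]]

# The same first-occurrence mask as the TensorFlow answer, in NumPy:
_, inverse = np.unique(a[:, 0], return_inverse=True)
one_hot = np.eye(inverse.max() + 1, dtype=int)[inverse]  # (rows, n_unique)
running = np.cumsum(one_hot, axis=0)                      # occurrences so far
first_seen = ((running == 1) & (one_hot == 1)).any(axis=1)
print(np.array_equal(a[first_seen], result))  # True
```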
