Maintaining shape of output as of input after Boolean indexing in python

I would like help with the following problem.
Suppose X = [1 3 0 8
1 4 6 0
2 0 7 8 ]
mask = (X != 0)
mask = [ T T F T
T T T F
T F T T]
X1 = X[(mask,np.newaxis)]
The output X1 has shape (9,1).
But I want X1 to have shape (3,3), i.e., to keep the row structure of X with the masked entries removed.
X1 = [1 3 8
1 4 6
2 7 8 ]
Can someone help me, please? Thank you.
Every row of X will contain a zero, and I don't want to use reshape(). Here is the working code:
X= np.array([[1,3,0,8],[1,4,6,0],[2,0,7,8]])
mask = (X!=0)
X1=X[(mask,np.newaxis)]
The output X1 has shape (9,1). Is there any way for X1 to have shape (3,3), as mentioned above?

I think you might want to start on something easier in Python, since your question doesn't even contain correct syntax. I'm hoping this was just a pseudocode attempt. However, here's some code to do the masking you desire.
import numpy as np
X = np.array([1, 3, 0, 8, 1, 4, 6, 0, 2, 0, 7, 8])
indices_we_want = np.where(X != 0)[0] # 1-D array containing the indices of X we want to keep
result = np.take(X, indices_we_want) # Filter by these indices
result = result.reshape(3, 3) # Reshape to desired result
print(result)
This code could be condensed considerably, but I wanted to show each step as you have in your question for clarity.
As pointed out in the comments section, the reshape typically isn't a good idea unless you somehow know after filtering out 0s that you'll be left with 9 elements. In the case you described, we certainly know this, but for a given array, not so much.

In [173]: x=[[1,3,0,8],[1,4,6,0],[2,0,7,8]]
In [174]: xa=np.array(x)
solution with reshape:
In [175]: xa[xa!=0].reshape(3,3)
Out[175]:
array([[1, 3, 8],
       [1, 4, 6],
       [2, 7, 8]])
a solution without reshape:
In [176]: np.array([i[i!=0] for i in xa])
Out[176]:
array([[1, 3, 8],
       [1, 4, 6],
       [2, 7, 8]])
Obviously both depend on there being only one deletion per row.
You aren't deleting a common column; nothing in your code tells the underlying numpy that the result will be reshapeable. So boolean indexing operates on the flattened array.
In [177]: xa[xa!=0]
Out[177]: array([1, 3, 8, 1, 4, 6, 2, 7, 8])
In [178]: xa.flat[xa.flat!=0]
Out[178]: array([1, 3, 8, 1, 4, 6, 2, 7, 8])
I could throw in an extra 0, and this indexing would still work the same; but the efforts to reshape it to 3x3 will fail.
Keep in mind that the underlying data buffer is flat (1d), and that it only displays as 2d because of the shape and strides attributes. Selecting elements (or skipping some) produces a copy, and a 1d copy is just as easy to make as a 2d one, possibly even faster. reshape doesn't change the data buffer, just the shape attribute.
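As a rough sketch of the point above (not part of the original answer): boolean indexing returns a flat array, and reshaping it back is only safe when every row keeps the same number of elements. The names kept_per_row, x1 and xb are made up for this illustration.

```python
import numpy as np

xa = np.array([[1, 3, 0, 8], [1, 4, 6, 0], [2, 0, 7, 8]])
mask = xa != 0

# Boolean indexing with a full-size mask always returns a 1-D array.
flat = xa[mask]

# Reshaping is only valid if each row keeps the same number of entries.
kept_per_row = mask.sum(axis=1)
if np.all(kept_per_row == kept_per_row[0]):
    x1 = flat.reshape(xa.shape[0], -1)
else:
    x1 = None  # rows would be ragged; no rectangular result exists
print(x1)

# With one extra zero the rows keep different numbers of entries.
xb = xa.copy()
xb[0, 0] = 0
ragged_counts = (xb != 0).sum(axis=1)
print(ragged_counts)  # [2 3 3]
```

The check on kept_per_row is just one way to guard the reshape; a plain reshape(3, 3) raises a ValueError on the second array instead.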

Related

How to get max (top) N values across entire numpy matrix

I want to get the top N (maximal) args & values across an entire numpy matrix, as opposed to across a single dimension (rows / columns).
Example input (with N=3):
import numpy as np
mat = np.matrix([[9,8, 1, 2], [3, 7, 2, 5], [0, 3, 6, 2], [0, 2, 1, 5]])
print(mat)
[[9 8 1 2]
 [3 7 2 5]
 [0 3 6 2]
 [0 2 1 5]]
Desired output: [9, 8, 7]
Since the overall top N values need not come from distinct rows or columns, taking the max along a single dimension doesn't work.
# by rows, no 8
np.squeeze(np.asarray(mat.max(1).reshape(-1)))[:3]
array([9, 7, 6])
# by cols, no 7
np.squeeze(np.asarray(mat.max(0)))[:3]
array([9, 8, 6])
I have code that works, but looks really clunky to me.
# reshape into single vector
mat_as_vector = np.squeeze(np.asarray(mat.reshape(-1)))
# get top 3 arg positions
top3_args = mat_as_vector.argsort()[::-1][:3]
# subset the reshaped matrix
top3_vals = mat_as_vector[top3_args]
print(top3_vals)
[9 8 7]
Would appreciate any shorter way / more efficient way / magic numpy function to do this!

Using numpy.partition() is significantly faster than performing full sort for this purpose:
np.partition(np.asarray(mat), mat.size - N, axis=None)[-N:]
assuming N<=mat.size.
If you need the final result to be sorted as well (besides being the top N), sort the previous result; you will presumably be sorting a much smaller array than the original one:
np.sort(np.partition(np.asarray(mat), mat.size - N, axis=None)[-N:])
If you need the result sorted from largest to smallest, append [::-1] to the previous command:
np.sort(np.partition(np.asarray(mat), mat.size - N, axis=None)[-N:])[::-1]
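The question also asks for the positions of the top N values. A possible sketch, assuming a plain ndarray instead of np.matrix and using np.argpartition plus np.unravel_index (neither appears in the answer above, so treat this as one option rather than the answer's method):

```python
import numpy as np

mat = np.array([[9, 8, 1, 2], [3, 7, 2, 5], [0, 3, 6, 2], [0, 2, 1, 5]])
N = 3

flat = mat.ravel()
# argpartition puts the indices of the N largest entries in the last N slots,
# in no particular order.
top_flat_idx = np.argpartition(flat, flat.size - N)[-N:]
# Order just those N indices from largest value to smallest.
top_flat_idx = top_flat_idx[np.argsort(flat[top_flat_idx])[::-1]]

top_vals = flat[top_flat_idx]
# Convert flat positions back to (row, column) coordinates.
rows, cols = np.unravel_index(top_flat_idx, mat.shape)

print(top_vals)                                 # [9 8 7]
print(list(zip(rows.tolist(), cols.tolist())))  # [(0, 0), (0, 1), (1, 1)]
```

Only the N selected entries are fully sorted, which is where the saving over a full argsort comes from.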

One way is to flatten the matrix, sort it, and slice off the top n values:
sorted(mat.flatten().tolist()[0], reverse=True)[:3]
Result:
[9, 8, 7]

The idea is from this answer: How to get indices of N maximum values in a numpy array?
import numpy as np
import heapq
mat = np.matrix([[9,8, 1, 2], [3, 7, 2, 5], [0, 3, 6, 2], [0, 2, 1, 5]])
ind = heapq.nlargest(3, range(mat.size), mat.take)
print(mat.take(ind).tolist()[0])
Output
[9, 8, 7]

Zipping and reshaping in Tensorflow

Problem:
Let's say I have two tensors, a and b. Both have the same shape: [?, 10, 4096].
How do I zip the two in a manner that the resulting tensor has a shape of [?, 20, 4096], but also such that the ith element of a comes right before the ith element of b.
Example with lists:
a = [1, 3, 5]
b = [2, 4, 6]
and now I want a tensor that looks like [1, 2, 3, 4, 5, 6] and not [1, 3, 5, 2, 4, 6], which is what would happen if I were to tf.stack the two and then use tf.reshape, right?.
Or perhaps a more general question would be, how do you know in what order tf.reshape reshapes a tensor?

First, it looks like stacking and then reshaping does the job:
import tensorflow as tf
a = tf.constant([1, 3, 5])
b = tf.constant([2, 4, 6])
c = tf.stack([a, b], axis = 1)
d = tf.reshape(c, (-1,))
with tf.Session() as sess:
    print(sess.run(c)) # [[1 2],[3 4],[5 6]]
    print(sess.run(d)) # [1 2 3 4 5 6]
To answer your second question, the TensorFlow reshape operation uses the same order as NumPy's default order, a.k.a. C order. Quoting from the NumPy reshape documentation:
Read the elements of a using this index order, and place the elements into the reshaped array using this index order. ‘C’ means to read / write the elements using C-like index order, with the last axis index changing fastest, back to the first axis index changing slowest.
import numpy as np
a = np.array([1, 3, 5])
b = np.array([2, 4, 6])
c = np.stack([a, b], axis=1)
c.reshape((-1,), order='C') # array([1, 2, 3, 4, 5, 6])
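For the 3-D case in the question, the same idea can be sketched in NumPy with small stand-in sizes (batch=2, n=3, d=4 instead of [?, 10, 4096]). This assumes tf.stack and tf.reshape follow the same axis and C-order semantics as their NumPy counterparts, as the answer describes; the variable names are made up for the example.

```python
import numpy as np

batch, n, d = 2, 3, 4
a = np.arange(batch * n * d).reshape(batch, n, d)
b = -a - 1  # distinct values so a and b are easy to tell apart

# Stack on a new axis right after the one to interleave: (batch, n, 2, d).
stacked = np.stack([a, b], axis=2)
# Merging axes 1 and 2 in C order yields a[:, 0], b[:, 0], a[:, 1], b[:, 1], ...
zipped = stacked.reshape(batch, 2 * n, d)

print(zipped.shape)  # (2, 6, 4)
print(np.array_equal(zipped[:, 0::2], a))  # True
print(np.array_equal(zipped[:, 1::2], b))  # True
```

Stacking on axis 0 or 1 and then reshaping would instead place all of a before all of b within each sample.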

Remove head and tail from numpy array PYTHON

I have a numpy.ndarray, and want to remove first h elements and last t.
As I see, the more general way is by selecting:
h, t = 1, 1
my_array = [0,1,2,3,4,5]
middle = my_array[h:-t]
and middle is [1,2,3,4]. This is correct, but when I don't want to remove anything I use h = 0 and t = 0, and this returns an empty array. I know this is because of t = 0, and I also know that an if condition for this border case would solve it with my_array[h:], but I don't want that solution (my problem is a little more complex, with more dimensions, and the code would become ugly).
Any ideas?

Instead, use
middle = my_array[h:len(my_array)-t]
For completeness, here's the trial run:
my_array = [0,1,2,3,4,5]
h,t = 0,0
middle = my_array[h:len(my_array)-t]
print(middle)
Output: [0, 1, 2, 3, 4, 5]
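A short sketch of why the original slice fails and two ways around it. The `-t or None` idiom is an addition for comparison and is not part of the answer above.

```python
my_array = [0, 1, 2, 3, 4, 5]
h, t = 0, 0

# -0 is just 0, so this slice is my_array[0:0] and comes back empty.
broken = my_array[h:-t]

# An explicit end index behaves correctly for t == 0.
explicit = my_array[h:len(my_array) - t]

# Alternative: turn a zero tail into None, which means "up to the end".
with_none = my_array[h:-t or None]

print(broken)     # []
print(explicit)   # [0, 1, 2, 3, 4, 5]
print(with_none)  # [0, 1, 2, 3, 4, 5]
```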
This example was just for a plain Python list. Since your ultimate goal is to work with multidimensional numpy arrays, the problem is actually a bit trickier. When you say you want to remove the first h elements and the last t elements, are we guaranteed that h and t satisfy the proper divisibility criteria so that the result will be a well-formed array?
I actually think the cleanest solution is simply to use this solution, but divide out by the appropriate factor first. For example, in two dimensions:
import numpy
h = 3
t = 6
a = numpy.array([[ 1,  2,  3],
                 [ 4,  5,  6],
                 [ 7,  8,  9],
                 [10, 11, 12]])
d = numpy.prod(numpy.shape(a)[1:]) # number of elements per row (3 here)
mid_a = a[h // d : len(a) - t // d] # drop h/d rows from the head and t/d rows from the tail
print(mid_a)
Output: [[4 5 6]]
I have used integer division (//) in the indices because in Python 3 the / operator always returns a float, even when the numerator is evenly divisible by the denominator.

The i:j can be replaced with a slice object, ':j' with slice(None,j), etc.:
In [55]: alist = [0,1,2,3,4,5]
In [56]: h,t=1,-1; alist[slice(h,t)]
Out[56]: [1, 2, 3, 4]
In [57]: h,t=None,-1; alist[slice(h,t)]
Out[57]: [0, 1, 2, 3, 4]
In [58]: h,t=None,None; alist[slice(h,t)]
Out[58]: [0, 1, 2, 3, 4, 5]
This works for lists and arrays. For multidimensional arrays use a tuple of indices, which can include slice objects
x[i:j, k:l]
x[(slice(i,j), Ellipsis, slice(k,l))]

Numpy 3d array indexing

I have a 3d numpy array of shape (n_samples x num_components x 2). In the example below, n_samples = 5 and num_components = 7.
I have another array (indices) which is the selected component for each sample which is of shape (n_samples,).
I want to select from the data array given the indices so that the resulting array is n_samples x 2.
The code is below:
import numpy as np
np.random.seed(77)
data=np.random.randint(low=0, high=10, size=(5, 7, 2))
indices = np.array([0, 1, 6, 4, 5])
#how can I select indices from the data array?
For example for data 0, the selected component should be the 0th and for data 1 the selected component should be 1.
Note that I can't use any for loops because I'm using it in Theano and the solution should be solely based on numpy.

Is this what you are looking for?
In [36]: data[np.arange(data.shape[0]),indices,:]
Out[36]:
array([[7, 4],
       [7, 3],
       [4, 5],
       [8, 2],
       [5, 8]])
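To check that this indexing really picks one component per sample, it can be compared against an explicit loop. The np.take_along_axis variant is an extra alternative that is not in the answer above, and the loop is only for verification, since the question rules loops out.

```python
import numpy as np

np.random.seed(77)
data = np.random.randint(low=0, high=10, size=(5, 7, 2))
indices = np.array([0, 1, 6, 4, 5])

# Pair sample i with component indices[i]; the last axis is kept whole.
selected = data[np.arange(data.shape[0]), indices, :]

# Reference result built with a plain loop.
looped = np.array([data[i, indices[i]] for i in range(len(indices))])

# Same selection with take_along_axis; indices needs the same ndim as data.
via_take = np.take_along_axis(data, indices[:, None, None], axis=1)[:, 0, :]

print(selected.shape)                      # (5, 2)
print(np.array_equal(selected, looped))    # True
print(np.array_equal(selected, via_take))  # True
```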

To get component #0, use
data[:, 0]
i.e. we get every entry on axis 0 (samples), and only entry #0 on axis 1 (components), and implicitly everything on the remaining axes.
This can be easily generalized to
data[:, indices]
to select all relevant components.
But what the OP really wants is just the diagonal of this array, i.e. (data[0, indices[0]], data[1, indices[1]], ...). The diagonal of a high-dimensional array can be extracted using the diagonal function:
>>> np.diagonal(data[:, indices])
array([[7, 7, 4, 8, 5],
       [4, 3, 5, 2, 8]])
(You may need to transpose the result.)

You have a variety of ways to do so, but this is my loop recommendation:
selection = np.array([ datum[indices[k]] for k,datum in enumerate(data)])
The resulting array, selection, has the desired shape.

Concatenate two NumPy arrays vertically

I tried the following:
>>> a = np.array([1,2,3])
>>> b = np.array([4,5,6])
>>> np.concatenate((a,b), axis=0)
array([1, 2, 3, 4, 5, 6])
>>> np.concatenate((a,b), axis=1)
array([1, 2, 3, 4, 5, 6])
However, I'd expect at least that one result looks like this
array([[1, 2, 3],
       [4, 5, 6]])
Why is it not concatenated vertically?

Because both a and b have only one axis, as their shape is (3,), and the axis parameter specifically refers to the axis of the elements to concatenate.
This example should clarify what concatenate does with axis. Take two arrays with two axes, each with shape (2,3):
a = np.array([[1,5,9], [2,6,10]])
b = np.array([[3,7,11], [4,8,12]])
concatenates along the 1st axis (rows of the 1st, then rows of the 2nd):
np.concatenate((a,b), axis=0)
array([[ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])
concatenates along the 2nd axis (columns of the 1st, then columns of the 2nd):
np.concatenate((a, b), axis=1)
array([[ 1,  5,  9,  3,  7, 11],
       [ 2,  6, 10,  4,  8, 12]])
to obtain the output you presented, you can use vstack
a = np.array([1,2,3])
b = np.array([4,5,6])
np.vstack((a, b))
array([[1, 2, 3],
       [4, 5, 6]])
You can still do it with concatenate, but you need to reshape them first:
np.concatenate((a.reshape(1,3), b.reshape(1,3)))
array([[1, 2, 3],
       [4, 5, 6]])
Finally, as proposed in the comments, one way to reshape them is to use newaxis:
np.concatenate((a[np.newaxis,:], b[np.newaxis,:]))

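If the actual problem at hand is to combine two 1-D arrays into a 2-D array, and we are not fixated on using concatenate to perform this operation, I would suggest np.column_stack. Note that it places each input in its own column, so the result is the transpose of the output shown in the question: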
In []: a = np.array([1,2,3])
In []: b = np.array([4,5,6])
In []: np.column_stack((a, b))
array([[1, 4],
       [2, 5],
       [3, 6]])

A lesser-known feature of numpy is r_. It is a simple way to build up arrays quickly:
import numpy as np
a = np.array([1,2,3])
b = np.array([4,5,6])
c = np.r_[a[None,:],b[None,:]]
print(c)
#[[1 2 3]
# [4 5 6]]
The purpose of a[None,:] is to add an axis to array a.

a = np.array([1,2,3])
b = np.array([4,5,6])
np.array((a,b))
works just as well as
np.array([[1,2,3], [4,5,6]])
Regardless of whether it is a list of lists or a list of 1d arrays, np.array tries to create a 2d array.
But it's also a good idea to understand how np.concatenate and its family of stack functions work. In this context concatenate needs a list of 2d arrays (or anything that np.array will turn into a 2d array) as inputs.
np.vstack first loops through the inputs, making sure they are at least 2d, then calls concatenate. Functionally it's the same as expanding the dimensions of the arrays yourself.
np.stack is a newer function that joins the arrays along a new dimension. By default it behaves just like np.array.
Look at the code for these functions. If written in Python you can learn quite a bit. For vstack:
return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
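The approaches mentioned in this thread can be compared side by side. This is only a quick sanity check; the variable names are chosen for the example.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
expected = np.array([[1, 2, 3], [4, 5, 6]])

via_vstack = np.vstack((a, b))
# concatenate needs 2-D inputs, so add a leading axis to each array first.
via_concat = np.concatenate((a[np.newaxis, :], b[np.newaxis, :]), axis=0)
via_stack = np.stack((a, b))  # joins along a new axis 0 by default
via_array = np.array((a, b))
# column_stack puts each input in a column, so transpose to get rows.
via_column = np.column_stack((a, b)).T

results = [via_vstack, via_concat, via_stack, via_array, via_column]
print(all(np.array_equal(r, expected) for r in results))  # True
```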

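Suppose you have 3 two-dimensional NumPy arrays (A, B, C) with the same number of columns. You can concatenate these arrays vertically like this:
import numpy as np
A = np.array([[1, 2, 3]])
B = np.array([[4, 5, 6]])
C = np.array([[7, 8, 9]])
result = np.concatenate((A, B, C), axis=0) # axis=0 stacks the rows on top of each other
print(result.shape) # (3, 3)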
