Numpy: operations on columns (or rows) of NxM array - python

This may be a silly question, but I've just started using numpy and I have to figure out how to perform some simple operations.
Suppose that I have the 2x3 array
array([[1, 3, 5],
[2, 4, 6]])
And that I want to perform some operation on the first column, for example subtract 1 to all the elements to get
array([[0, 3, 5],
[1, 4, 6]])
How can I perform such an operation?

arr
# array([[1, 3, 5],
# [2, 4, 6]])
arr[:,0] = arr[:,0] - 1 # choose the first column here, subtract one and
# assign it back to the same column
arr
# array([[0, 3, 5],
# [1, 4, 6]])

Related

Comparing three numpy arrays so wherever one is not a given value, the other are not that given value either

Is there an effective way in which to compare all three numpy arrays at once?
For example, if the given value to check is 5, wherever the value is not 5, it should be not 5 for all three arrays.
The only way I've thought of how to do this would be checking that occurrences that arr1 != 5 & arr2 == 5 is 0. However this only checks one direction between the two arrays, and then I need to also incorporate arr3. This seems inefficient and might end up with some logical hole.
This should pass:
arr1 = numpy.array([[1, 7, 3],
[4, 5, 6],
[4, 5, 2]])
arr2 = numpy.array([[1, 2, 3],
[4, 5, 6],
[8, 5, 6]])
arr3 = numpy.array([[1, 1, 3],
[4, 5, 6],
[9, 5, 6]])
However this should fail due to arr2 having a 3 where other arrays have 5s
arr1 = numpy.array([[1, 2, 3],
[8, 5, 6],
[4, 5, 6]])
arr2 = numpy.array([[1, 2, 3],
[2, 3, 1],
[2, 5, 6]])
arr3 = numpy.array([[1, 2, 3],
[4, 5, 6],
[4, 5, 3]])
There is a general solution (regardless number of arrays). And it's quite educational:
import numpy as np #a recommended way of import
arr = np.array([arr1, arr2, arr3])
is_valid = np.all(arr==5, axis=0) == np.any(arr==5, axis=0) #introduce axis
out = np.all(is_valid)
#True for the first case, False for the second one
Is this a valid solution?
numpy.logical_and(((arr1==5)==(arr2==5)).all(), ((arr2==5)==(arr3==5)).all())
You could AND all comparisons to 5 and compare to any one of the comparisons:
A = (arr1==5)
(A==(A&(arr2==5)&(arr3==5))).all()
Output: True for the first example, False for the second
NB. This works for any number of arrays

Asymmetric slicing python

Consider the following matrix:
X = np.arange(9).reshape(3,3)
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
Let say I want to subset the following array
array([[0, 4, 2],
[3, 7, 5]])
It is possible with some indexing of rows and columns, for instance
col=[0,1,2]
row = [[0,1],[1,2],[0,1]]
Then if I store the result in a variable array I can do it with the following code:
array=np.zeros([2,3],dtype='int64')
for i in range(3):
array[:,i]=X[row[i],col[i]]
Is there a way to broadcast this kind of operation ? I have to do this as a data cleaning stage for a large file ~ 5 Gb, and I would like to use dask to parallelize it. But in a first time if I could avoid using a for loop I would feel great.
For arrays with NumPy's advanced-indexing, it would be -
X[row, np.asarray(col)[:,None]].T
Sample run -
In [9]: X
Out[9]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [10]: col=[0,1,2]
...: row = [[0,1],[1,2],[0,1]]
In [11]: X[row, np.asarray(col)[:,None]].T
Out[11]:
array([[0, 4, 2],
[3, 7, 5]])

Efficiently change order of numpy array

I have a 3 dimensional numpy array. The dimension can go up to 128 x 64 x 8192. What I want to do is to change the order in the first dimension by interchanging pairwise.
The only idea I had so far is to create a list of the indices in the correct order.
order = [1,0,3,2...127,126]
data_new = data[order]
I fear, that this is not very efficient but I have no better idea so far
You could reshape to split the first axis into two axes, such that latter of those axes is of length 2 and then flip the array along that axis with [::-1] and finally reshape back to original shape.
Thus, we would have an implementation like so -
a.reshape(-1,2,*a.shape[1:])[:,::-1].reshape(a.shape)
Sample run -
In [170]: a = np.random.randint(0,9,(6,3))
In [171]: order = [1,0,3,2,5,4]
In [172]: a[order]
Out[172]:
array([[0, 8, 5],
[4, 5, 6],
[0, 0, 2],
[7, 3, 8],
[1, 6, 3],
[2, 4, 4]])
In [173]: a.reshape(-1,2,*a.shape[1:])[:,::-1].reshape(a.shape)
Out[173]:
array([[0, 8, 5],
[4, 5, 6],
[0, 0, 2],
[7, 3, 8],
[1, 6, 3],
[2, 4, 4]])
Alternatively, if you are looking to efficiently create those constantly flipping indices order, we could do something like this -
order = np.arange(data.shape[0]).reshape(-1,2)[:,::-1].ravel()

how to search for unique elements by the first column of a multidimensional array

I am trying to find a way how to create a new array from a multidimensional array by taking only elements that are unique in the first column, for example if I have an array
[[1,2,3],
[1,2,3],
[5,2,3]]
After the operation I would like to get this output
[[1,2,3],
[5,2,3]]
Obviously the second an third columns do not need to be unique.
Thanks
Since you are looking to keep the first row of first column uniqueness, you can just use np.unique with its optional return_index argument which will give you the first occurring index (thus fulfils the first row criteria) among the uniqueness on A[:,0] elements, where A is the input array. Thus, we would have a vectorized solution, like so -
_,idx = np.unique(A[:,0],return_index=True)
out = A[idx]
Sample run -
In [16]: A
Out[16]:
array([[1, 2, 3],
[5, 2, 3],
[1, 4, 3]])
In [17]: _,idx = np.unique(A[:,0],return_index=True)
...: out = A[idx]
...:
In [18]: out
Out[18]:
array([[1, 2, 3],
[5, 2, 3]])
main = [[1, 2, 3], [1, 3, 4], [2, 4, 5], [3, 6, 5]]
used = []
new = [[sub, used.append(sub[0])][0] for sub in main if sub[0] not in used]
print(new)
# Output: [[1, 2, 3], [2, 3, 4], [3, 6, 5]]

numpy, Select elements in rows by 1d indexes array

Let we have square array, n*n. For example, n=3 and array is this:
arr = array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
And let we have array of indices in the each ROW. For example:
myidx=array([1, 2, 1], dtype=int64)
I want to get:
[1, 5, 7]
Because in line [0,1,2] take element with index 1, in line [3,4,5] get element with index 2, in line [6,7,8] get element with index 1.
I'm confused, and can't take elements this way using standard numpy indexing.
Thank you for answer.
There's no real pretty way but this does what you are looking for :)
In [1]: from numpy import *
In [2]: arr = array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [3]: myidx = array([1, 2, 1], dtype=int64)
In [4]: arr[arange(len(myidx)), myidx]
Out[4]: array([1, 5, 7])
Simpler way to reach the goal is using choose numpy function:
numpy.choose(myidx, arr.transpose())

Categories

Resources