Transform an array to 1s and 0s using another array - python

I have two arrays
arr1 = np.array([[4, 1, 3, 2, 5], [5, 2, 4, 1, 3]])
arr2 = np.array([[2], [1]])
I want to transform array 1 to a binary array using the elements of the array 2 in the following way
For row 1 of array 1, I want to use the row 1 of array 2 i.e. 2 - to make the top 2 values of array 1 as 1s and the rest as 0s
Similarly for row 2 of array 1, I want to use the row 2 of array 2 i.e. 1 - to make the top 1 value of array 1 as 1s and the rest as 0s
So arr1 would get transformed as follows
arr1_transformed = np.array([[1, 0, 0, 0, 1], [1, 0, 0, 0, 0]])
Here is what I tried.
arr1_sorted_indices = np.argsort(-arr1)
This gave me the indices of the sorted array
array([[1, 3, 2, 0, 4],
[3, 1, 4, 2, 0]])
Now I think I need to mask this array with the help of arr2 to get the desired output and I'm not sure how to do it.

This should do the job in the mentioned case:
def transform_arr(arr1, arr2):
    for i in range(len(arr1)):
        if i >= len(arr2):
            # no count given for this row: set the whole row to 0
            arr1[i] = [0 for x in arr1[i]]
        else:
            # the arr2[i][0] largest values of the row
            # (assumes the count is at least 1 and the row has no duplicates)
            sorted_arr = sorted(arr1[i])[-arr2[i][0]:]
            arr1[i] = [1 if x in sorted_arr else 0 for x in arr1[i]]

arr1 = [[4, 1, 3, 2, 5], [5, 2, 4, 1, 3]]
arr2 = [[2], [1]]
transform_arr(arr1, arr2)
print(arr1)

You can try the following:
import numpy as np
arr1 = np.array([[4, 1, 3, 2, 5], [5, 2, 4, 1, 3]])
arr2 = np.array([[2],[1]])
r, c = arr1.shape
s = np.argsort(np.argsort(-arr1))
out = (np.arange(c) < arr2)[np.c_[0:r], s] * 1
print(out)
It gives:
[[1 0 0 0 1]
[1 0 0 0 0]]
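To make the double argsort less opaque, here is a step-by-step sketch of the same idea. The names order and ranks are only illustrative, and the sketch assumes the values within each row are distinct:

```python
import numpy as np

arr1 = np.array([[4, 1, 3, 2, 5], [5, 2, 4, 1, 3]])
arr2 = np.array([[2], [1]])

# Column indices of each row, ordered from largest value to smallest.
order = np.argsort(-arr1, axis=1)

# Argsort of the order gives the rank of every element:
# 0 for the largest value in its row, 1 for the second largest, ...
ranks = np.argsort(order, axis=1)

# arr2 has shape (2, 1), so it broadcasts against ranks row by row.
out = (ranks < arr2).astype(int)
print(out)
```

Comparing the ranks with arr2 directly avoids the extra fancy-indexing step, since a rank below the per-row count is exactly what "top k" means here.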

Related

Reorder a square array using a sorted 1D array

Let's say I have a symmetric n-by-n array A and a 1D array x of length n, where the rows/columns of A correspond to the entries of x, and x is ordered. Now assume both A and x are randomly rearranged, so that the rows/columns still correspond but they're no longer in order. How can I manipulate A to recover the correct order?
As an example: x = array([1, 3, 2, 0]) and
A = array([[1, 3, 2, 0],
[3, 9, 6, 0],
[2, 6, 4, 0],
[0, 0, 0, 0]])
so the mapping from x to A in this example is A[i][j] = x[i]*x[j]. x should be sorted like array([0, 1, 2, 3]) and I want to arrive at
A = array([[0, 0, 0, 0],
[0, 1, 2, 3],
[0, 2, 4, 6],
[0, 3, 6, 9]])
I guess that OP is looking for a flexible way to use indices that sort both the rows and the columns of the mapping at once. What is more, OP might be interested in doing it in reverse, i.e. recovering the initial view of the mapping if it's lost.
def mapping(x, my_map, return_index=True, return_inverse=True):
    idx = np.argsort(x)
    out = my_map(x[idx], x[idx])
    inv = np.empty_like(idx)
    inv[idx] = np.arange(len(idx))
    return out, idx, inv
x = np.array([1, 3, 2, 0])
a, idx, inv = mapping(x, np.multiply.outer) #sorted mapping
b = np.multiply.outer(x, x) #straight mapping
print(b)
>>> [[1 3 2 0]
[3 9 6 0]
[2 6 4 0]
[0 0 0 0]]
print(a)
>>> [[0 0 0 0]
[0 1 2 3]
[0 2 4 6]
[0 3 6 9]]
np.array_equal(b, a[np.ix_(inv, inv)]) #sorted to straight
>>> True
np.array_equal(a, b[np.ix_(idx, idx)]) #straight to sorted
>>> True
A simple implementation would be
idx = np.argsort(x)
A = A[idx, :]
A = A[:, idx]
Another possibility would be (all credit to @mathfux):
A[np.ix_(idx, idx)]
You can use argsort and fancy indexing:
idx = np.argsort(x)
A2 = A[idx[None], idx[:,None]]
output:
array([[0, 0, 0, 0],
[0, 1, 2, 3],
[0, 2, 4, 6],
[0, 3, 6, 9]])
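As a quick check of how these indexing forms relate, here is a small sketch. The asymmetric matrix M is a made-up test input, not part of the question. For a symmetric A all forms agree, but for a non-symmetric matrix the idx[None], idx[:, None] form returns the transpose of the np.ix_ result:

```python
import numpy as np

x = np.array([1, 3, 2, 0])
A = np.multiply.outer(x, x)          # symmetric, as in the question
idx = np.argsort(x)

two_step = A[idx, :][:, idx]         # reorder rows, then columns
with_ix = A[np.ix_(idx, idx)]        # rows and columns in one step
broadcast = A[idx[None], idx[:, None]]

# Hypothetical asymmetric matrix, used only to expose the difference.
M = np.arange(16).reshape(4, 4)
M_ix = M[np.ix_(idx, idx)]
M_broadcast = M[idx[None], idx[:, None]]

print(np.array_equal(two_step, with_ix))      # all forms agree for symmetric A
print(np.array_equal(M_broadcast, M_ix.T))    # transposed for asymmetric M
```

If the matrix is not guaranteed to be symmetric, swapping the two index arrays, A[idx[:, None], idx[None]], gives the same result as np.ix_.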

Return difference of two 2D arrays

I have 2 2d arrays and I would like to return all values that are differing in the second array while keeping the existing dimensions.
I've tried something like diff = arr2[np.nonzero(arr2-arr1)], which gives me the differing elements, but how do I keep the dimensions and relative position of the elements?
Example Input:
arr1 = [[0 1 2]
        [3 4 5]
        [6 7 8]]
arr2 = [[0 1 2]
        [3 5 5]
        [6 7 8]]
Expected output:
diff = [[0 0 0]
[0 5 0]
[0 0 0]]
How about the following:
import numpy as np
arr1 = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
arr2 = np.array([[0, 1, 2], [3, 5, 5], [6, 7, 8]])
diff = arr2 * ((arr2 - arr1) != 0)
print(diff)
# [[0 0 0]
# [0 5 0]
# [0 0 0]]
EDIT: Surprisingly to me, the following first version of my answer (corrected by OP) might be faster:
diff = arr2 * np.abs(np.sign(arr2 - arr1))
If they are numpy arrays, you could do
ans = ar1 * 0
ans[ar1 != ar2] = ar2[ar1 != ar2]
ans
# array([[0, 0, 0],
# [0, 5, 0],
# [0, 0, 0]])
Without numpy, you can use map
list(map(lambda a, b: list(map(lambda x, y: y if x != y else 0, a, b)), arr1, arr2))
# [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
Data
import numpy as np
arr1 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr2 = [[0, 1, 2], [3, 5, 5], [6, 7, 8]]
ar1 = np.array(arr1)
ar2 = np.array(arr2)
I am surprised no one proposed the numpy.where method:
diff = np.where(arr1!=arr2, arr2, 0)
Literally, where arr1 and arr2 are different take the values of arr2, else take 0.
Output:
array([[0, 0, 0],
[0, 5, 0],
[0, 0, 0]])
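One caveat, shown as a small sketch: the comparison is only element-wise when the inputs are NumPy arrays. With plain nested lists, != yields a single bool, so np.where would not build a per-element mask. The names list1 and list2 are just illustrative:

```python
import numpy as np

list1 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
list2 = [[0, 1, 2], [3, 5, 5], [6, 7, 8]]

# Comparing the lists themselves gives one bool, not a mask.
print(list1 != list2)

# Convert first, then compare element-wise.
arr1 = np.asarray(list1)
arr2 = np.asarray(list2)
diff = np.where(arr1 != arr2, arr2, 0)
print(diff)
```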
np.copyto
You can check for inequality between the two arrays then use np.copyto with np.zeros/ np.zeros_like.
out = np.zeros_like(arr2) # np.zeros(arr2.shape) also works but gives a float array
np.copyto(out, arr2, where=arr1!=arr2)
print(out)
# [[0 0 0]
# [0 5 0]
# [0 0 0]]
np.where
You can use np.where and specify x, y args.
out = np.where(arr1!=arr2, arr2, 0)
# [[0 0 0]
# [0 5 0]
# [0 0 0]]

Numpy assign an array value based on the values of another array with column selected based on a vector

I have a 2 dimensional array
X
array([[2, 3, 3, 3],
[3, 2, 1, 3],
[2, 3, 1, 2],
[2, 2, 3, 1]])
and a 1 dimensional array
y
array([1, 0, 0, 1])
For each row of X, I want to find the column index where X has the lowest value and y has a value of 1, and set the corresponding row-column pair in a third matrix to 1.
For example, in the case of the first row of X, the column index corresponding to the minimum X value (for the first row only) and y = 1 is 0, so I want Z[0,0] = 1 and all other Z[0,i] = 0.
Similarly, for the second row, column index 0 or 3 gives the lowest X value with y = 1. Then I want either Z[1,0] or Z[1,3] = 1 (preferably Z[1,0] = 1 and all other Z[1,i] = 0, since column 0 is the first occurrence).
My final Z array will look like
Z
array([[1, 0, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0],
[0, 0, 0, 1]])
One way to do this is using masked arrays.
import numpy as np
X = np.array([[2, 3, 3, 3],
[3, 2, 1, 3],
[2, 3, 1, 2],
[2, 2, 3, 1]])
y = np.array([1, 0, 0, 1])
#get a mask in the shape of X. (True for places to ignore.)
y_mask = np.vstack([y == 0] * len(X))
X_masked = np.ma.masked_array(X, y_mask)
out = np.zeros_like(X)
mins = np.argmin(X_masked, axis=1) # per-row minimum over the unmasked columns
#Output: array([0, 0, 0, 3], dtype=int64)
#Now just set the indexes to 1 on the minimum for each axis.
out[np.arange(len(out)), mins] = 1
print(out)
[[1 0 0 0]
[1 0 0 0]
[1 0 0 0]
[0 0 0 1]]
You can use numpy.argmin() to get the index of the minimum value in each row of X. For example:
import numpy as np
a = np.arange(6).reshape(2,3) + 10
ids = np.argmin(a, axis=1)
Similarly, you can get the indexes where y is 1 with either numpy.nonzero or numpy.where.
Once you have the two index arrays setting the values in third array should be quite easy.
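A minimal sketch of how the two index arrays could be combined, assuming at least one entry of y equals 1. The names cols and local are only illustrative:

```python
import numpy as np

X = np.array([[2, 3, 3, 3],
              [3, 2, 1, 3],
              [2, 3, 1, 2],
              [2, 2, 3, 1]])
y = np.array([1, 0, 0, 1])

# Columns that are allowed, i.e. where y == 1.
cols = np.nonzero(y == 1)[0]

# Position of each row's minimum within the allowed columns only.
# argmin returns the first occurrence when there are ties.
local = np.argmin(X[:, cols], axis=1)

# Map those positions back to column indices of X.
mins = cols[local]

Z = np.zeros_like(X)
Z[np.arange(len(X)), mins] = 1
print(Z)
```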

Sum over rows in scipy.sparse.csr_matrix

I have a big csr_matrix and I want to add over rows and obtain a new csr_matrix with the same number of columns but reduced number of rows. (Context: The matrix is a document-term matrix obtained from sklearn CountVectorizer and I want to be able to quickly combine documents according to codes associated with these documents)
For a minimal example, this is my matrix:
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse import vstack
row = np.array([0, 4, 1, 3, 2])
col = np.array([0, 2, 2, 0, 1])
dat = np.array([1, 2, 3, 4, 5])
A = csr_matrix((dat, (row, col)), shape=(5, 5))
print(A.toarray())
[[1 0 0 0 0]
[0 0 3 0 0]
[0 5 0 0 0]
[4 0 0 0 0]
[0 0 2 0 0]]
Now let's say I want a new matrix B in which rows (1, 4) and (2, 3, 5) are combined by summing them, which would look something like this:
[[5 0 0 0 0]
[0 5 5 0 0]]
And should be again in sparse format (because the real data I'm working with is large). I tried to sum over slices of the matrix and then stack it:
idx1 = [1, 4]
idx2 = [2, 3, 5]
A_sub1 = A[idx1, :].sum(axis=1)
A_sub2 = A[idx2, :].sum(axis=1)
B = vstack((A_sub1, A_sub2))
But this gives me the summed up values just for the non-zero columns in the slice, so I can't combine it with the other slices because the number of columns in the summed slices are different.
I feel like there must be an easy way to do this. But I couldn't find any discussion of this online or in the documentation. What am I missing?
Thank you for your help
Note that you can do this by carefully constructing another matrix. Here's how it would work for a dense matrix:
>>> S = np.array([[1, 0, 0, 1, 0,], [0, 1, 1, 0, 1]])
>>> np.dot(S, A.toarray())
array([[5, 0, 0, 0, 0],
[0, 5, 5, 0, 0]])
>>>
The sparse version is only a little more complicated. The information about which rows should be summed together is encoded in row:
col = range(5)
row = [0, 1, 1, 0, 1]
dat = [1, 1, 1, 1, 1]
S = csr_matrix((dat, (row, col)), shape=(2, 5))
result = S * A
# check that the result is another sparse matrix
print(type(result))
# check that the values are the ones we want
print(result.toarray())
Output:
<class 'scipy.sparse.csr.csr_matrix'>
[[5 0 0 0 0]
[0 5 5 0 0]]
You can handle more rows in your output by including higher values in row and extending the shape of S accordingly.
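To generalise this, the selector matrix can be built from one group label per row. The helper below is a hypothetical sketch rather than a SciPy function. Its name and the groups / n_groups parameters are assumptions made for illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

def sum_rows_by_group(A, groups, n_groups):
    """Sum the rows of sparse A that share a group label (hypothetical helper)."""
    groups = np.asarray(groups)
    n_rows = A.shape[0]
    # S[g, r] = 1 when row r of A belongs to output row g.
    S = csr_matrix(
        (np.ones(n_rows, dtype=A.dtype), (groups, np.arange(n_rows))),
        shape=(n_groups, n_rows),
    )
    return S @ A  # sparse-sparse product, the result stays sparse

row = np.array([0, 4, 1, 3, 2])
col = np.array([0, 2, 2, 0, 1])
dat = np.array([1, 2, 3, 4, 5])
A = csr_matrix((dat, (row, col)), shape=(5, 5))

# Rows 1 and 4 go to output row 0; rows 2, 3 and 5 go to output row 1.
B = sum_rows_by_group(A, [0, 1, 1, 0, 1], n_groups=2)
print(B.toarray())
```

For the document-term use case, groups would hold one integer code per document, for example obtained with np.unique(codes, return_inverse=True).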
The indexing should be:
idx1 = [0, 3] # rows 1 and 4
idx2 = [1, 2, 4] # rows 2,3 and 5
Then you need to keep A_sub1 and A_sub2 in sparse format and use axis=0:
A_sub1 = csr_matrix(A[idx1, :].sum(axis=0))
A_sub2 = csr_matrix(A[idx2, :].sum(axis=0))
B = vstack((A_sub1, A_sub2))
B.toarray()
array([[5, 0, 0, 0, 0],
[0, 5, 5, 0, 0]])
Note, I think the A[idx, :].sum(axis=0) operations involve conversion from sparse matrices - so @Mr_E's answer is probably better.
Alternatively, it works when you use axis=0 and np.vstack (as opposed to scipy.sparse.vstack):
A_sub1 = A[idx1, :].sum(axis=0)
A_sub2 = A[idx2, :].sum(axis=0)
np.vstack((A_sub1, A_sub2))
Giving:
matrix([[5, 0, 0, 0, 0],
[0, 5, 5, 0, 0]])

1D to 2D array - python

I would like to change the data stored in 1D into 2D:
I mean:
from
x|y|a
1|1|a(1,1)
2|1|a(2,1)
3|1|a(3,1)
1|2|a(1,2)
...
into:
x\y|1 |2 |3
1 |a(1,1)|a(1,2)|a(1,3)...
2 |a(2,1)|a(2,2)|a(2,3)...
3 |a(3,1)|a(3,2)|a(3,3)...
...
I did it by 2 loops:
(rows - array of x,y,a)
for n in range(len(rows)):
    for k in range(x_len):
        for l in range(y_len):
            if ((a[2, n] == x[0, k]) and (a[3, n] == y[0, l])):
                c[k, l] = a[0, n]
but it takes ages, so my question is if there is a smart and quick
solution for that in Python.
So to clarify what I want to do:
I know the return() function, the point is that it's randomly in array a.
So:
a = np.empty([4, len(rows)])
I read the data into array a from the database which has 4 columns (1,2,x,y) and 'len(rows)' rows.
I am interested in '1' column - this one I want to put to the new modified array.
x = np.zeros([1, x_len], float)
y = np.zeros([1, y_len], float)
x is a vector of the sorted column (x) from the array a, without duplicates, with a length x_len
(I read it with the SQL query: select distinct ... )
y is a vector of the sorted column (y) from the array a (without duplicates) with a length y_len
Then I am making the array:
c = np.zeros([x_len, y_len], float)
and put by 3 loops (sorry for the mistake before) the data from array a:
for n in range(len(rows)):
    for k in range(x_len):
        for l in range(y_len):
            if ((a[2, n] == x[0, k]) and (a[3, n] == y[0, l])):
                c[k, l] = a[0, n]
Example:
Array a
array([[1, 3, 6, 5, 6],
[1, 2, 5, 5, 6],
[1, 4, 7, 1, 2], ## x
[2, 5, 3, 3, 4]]) ## y
Vectors: x and y
[[1,2,4,7]] ## x with x_len=4
[[2,3,4,5]] ## y with y_len=4
Array c
array([[1, 5, 0, 0],
[0, 0, 6, 0],
[0, 0, 0, 3],
[0, 6, 0, 0]])
As a table, the array c looks like this (the values come from the first row of a, a[0]):
x\y 2|3|4|5
-----------
1 1|5|0|0
2 0|0|6|0
4 0|0|0|3
7 0|6|0|0
I hope I didn't make mistake how it's written into the array c.
Thanks a lot for any help.
You could use numpy:
>>> import numpy as np
>>> a = np.arange(9)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
>>> a.reshape(3,3)
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
#or:
>>> a.reshape(3,3).transpose()
array([[0, 3, 6],
[1, 4, 7],
[2, 5, 8]])
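Since the (x, y) pairs arrive in arbitrary order, reshape alone may not be enough. A possible sketch of a loop-free version of the triple loop from the question, assuming each (x, y) pair occurs at most once in a (with duplicates it is best not to rely on which value ends up in c). The index names k and l follow the question:

```python
import numpy as np

a = np.array([[1, 3, 6, 5, 6],
              [1, 2, 5, 5, 6],
              [1, 4, 7, 1, 2],   # x
              [2, 5, 3, 3, 4]])  # y

# Sorted distinct coordinates, like the "select distinct" vectors.
x = np.unique(a[2])
y = np.unique(a[3])

# Position of every sample's x and y value inside those sorted vectors.
k = np.searchsorted(x, a[2])
l = np.searchsorted(y, a[3])

# Scatter the values of a[0] into the grid in one assignment.
c = np.zeros((len(x), len(y)), dtype=float)
c[k, l] = a[0]
print(c)
```

This replaces the search over all (k, l) combinations with two binary searches per sample, so the cost grows with the number of rows rather than with rows times x_len times y_len.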
