Collating row entries in 2D array - python

I have a 2d numpy array consisting of 1s and 0s.
I want to concatenate the 1s and 0s of each row into a single string.
arr =
[[0 1 0]
 [0 0 0]
 [1 1 1]
 [0 1 1]]
Desired output (each element is dtype str, to make sure leading zeros are not omitted)
['010', '000', '111', '011']
How can I manipulate the 2d array to get this output? Is it possible in numpy or regex packages, by using their functions? Can a for loop be avoided to do this array transformation?

Using strings:
import numpy as np
arr = np.array([[0, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 1]])
binaries = []
for row in arr:
    strings = [str(integer) for integer in row]  # each digit as a string
    a_string = "".join(strings)                  # concatenate the digits of the row
    binaries.append(a_string)
>>> binaries
['010', '000', '111', '011']
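If you want to avoid the Python-level loop, one option is to turn each digit into its ASCII code and reinterpret each row's bytes as one fixed-width byte string with ndarray.view. This is a sketch that assumes every entry is a single digit (0-9):

```python
import numpy as np

arr = np.array([[0, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 1]])

# Map each digit to its ASCII code ('0' is 48, '1' is 49) in a C-contiguous uint8 array,
# so that each row occupies arr.shape[1] consecutive bytes.
codes = np.ascontiguousarray(arr + ord("0"), dtype=np.uint8)

# Reinterpret each row's bytes as one bytes string of width arr.shape[1],
# flatten the (n_rows, 1) result and decode it to a unicode string array.
binaries = codes.view(f"S{arr.shape[1]}").ravel().astype(str)
print(binaries)  # ['010' '000' '111' '011']
```

The work still happens row by row, but inside NumPy. Whether it beats the join-based loop for your array sizes is worth measuring.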

The question is quite unclear. Assuming integers in and out, you could use:
a = np.array([[0, 1, 0],
              [0, 0, 0],
              [1, 1, 1],
              [0, 1, 1]])
out = (a[:,::-1]*(10**np.arange(a.shape[1]))).sum(1)
But you won't have leading zeros…
output:
array([ 10, 0, 111, 11])
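If you do want strings, one way to get the leading zeros back from these integers is to left-pad their decimal strings to the row width with np.char.zfill. This is a sketch; note that 10**k overflows int64 once a row has more than about 18 columns.

```python
import numpy as np

a = np.array([[0, 1, 0],
              [0, 0, 0],
              [1, 1, 1],
              [0, 1, 1]])
out = (a[:, ::-1] * (10 ** np.arange(a.shape[1]))).sum(1)  # integers: 10, 0, 111, 11

# Convert to strings and left-pad with '0' up to the number of columns.
padded = np.char.zfill(out.astype(str), a.shape[1])
print(padded)  # ['010' '000' '111' '011']
```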
Assuming you really want to convert from binary, you should probably use np.packbits:
out = np.packbits(np.pad(a, ((0,0), (8-a.shape[1],0))), axis=1, bitorder='big')
output:
array([[2],
       [0],
       [7],
       [3]], dtype=uint8)
or as flat version:
out = (np.packbits(np.pad(a, ((0,0), (8-a.shape[1],0))), axis=1, bitorder='big')
.ravel()
)
# array([2, 0, 7, 3], dtype=uint8)
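The packbits version assumes at most 8 columns (otherwise the pad width 8-a.shape[1] is negative). A sketch that handles wider rows, up to 63 columns when NumPy's default integer is 64-bit, is a dot product with descending powers of two:

```python
import numpy as np

a = np.array([[0, 1, 0],
              [0, 0, 0],
              [1, 1, 1],
              [0, 1, 1]])

# Big-endian bit weights: for 3 columns this is [4, 2, 1].
weights = 1 << np.arange(a.shape[1])[::-1]
out = a @ weights
print(out)  # [2 0 7 3]
```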


Indexing the max elements in a multidimensional tensor in PyTorch

I'm trying to index the maximum elements along the last dimension in a multidimensional tensor. For example, say I have a tensor
A = torch.randn((5, 2, 3))
_, idx = torch.max(A, dim=2)
Here idx stores the maximum indices, which may look something like
>>> A
tensor([[[ 1.0503, 0.4448, 1.8663],
[ 0.8627, 0.0685, 1.4241]],
[[ 1.2924, 0.2456, 0.1764],
[ 1.3777, 0.9401, 1.4637]],
[[ 0.5235, 0.4550, 0.2476],
[ 0.7823, 0.3004, 0.7792]],
[[ 1.9384, 0.3291, 0.7914],
[ 0.5211, 0.1320, 0.6330]],
[[ 0.3292, 0.9086, 0.0078],
[ 1.3612, 0.0610, 0.4023]]])
>>> idx
tensor([[ 2, 2],
[ 0, 2],
[ 0, 0],
[ 0, 2],
[ 1, 0]])
I want to be able to access these indices and assign to another tensor based on them. Meaning I want to be able to do
B = A.new_zeros(A.size())
B[idx] = A[idx]
where B is 0 everywhere except where A is maximum along the last dimension. That is B should store
>>> B
tensor([[[ 0, 0, 1.8663],
[ 0, 0, 1.4241]],
[[ 1.2924, 0, 0],
[ 0, 0, 1.4637]],
[[ 0.5235, 0, 0],
[ 0.7823, 0, 0]],
[[ 1.9384, 0, 0],
[ 0, 0, 0.6330]],
[[ 0, 0.9086, 0],
[ 1.3612, 0, 0]]])
This is proving to be much more difficult than I expected, as the idx does not index the array A properly. Thus far I have been unable to find a vectorized solution to use idx to index A.
Is there a good vectorized way to do this?
You can use torch.meshgrid to create an index tuple:
>>> index_tuple = torch.meshgrid([torch.arange(x) for x in A.size()[:-1]]) + (idx,)
>>> B = torch.zeros_like(A)
>>> B[index_tuple] = A[index_tuple]
Note that you can also mimic meshgrid via (for the specific case of 3D):
>>> index_tuple = (
... torch.arange(A.size(0))[:, None],
... torch.arange(A.size(1))[None, :],
... idx
... )
A bit more explanation:
The indices will look something like this:
In [173]: idx
Out[173]:
tensor([[2, 1],
[2, 0],
[2, 1],
[2, 2],
[2, 2]])
From this, we want to go to three indices (since our tensor is 3D, we need three numbers to retrieve each element). Basically we want to build a grid in the first two dimensions, as shown below. (And that's why we use meshgrid).
In [174]: A[0, 0, 2], A[0, 1, 1]
Out[174]: (tensor(0.6288), tensor(-0.3070))
In [175]: A[1, 0, 2], A[1, 1, 0]
Out[175]: (tensor(1.7085), tensor(0.7818))
In [176]: A[2, 0, 2], A[2, 1, 1]
Out[176]: (tensor(0.4823), tensor(1.1199))
In [177]: A[3, 0, 2], A[3, 1, 2]
Out[177]: (tensor(1.6903), tensor(1.0800))
In [178]: A[4, 0, 2], A[4, 1, 2]
Out[178]: (tensor(0.9138), tensor(0.1779))
In the five lines above, the first two numbers of each index are the grid that we build with meshgrid, and the third number comes from idx. That is, the first two numbers form this grid:
(0, 0) (0, 1)
(1, 0) (1, 1)
(2, 0) (2, 1)
(3, 0) (3, 1)
(4, 0) (4, 1)
An ugly hackaround is to create a binary mask out of idx and use it to index the arrays. The basic code looks like this:
import torch
torch.manual_seed(0)
A = torch.randn((5, 2, 3))
_, idx = torch.max(A, dim=2)
mask = torch.arange(A.size(2)).reshape(1, 1, -1) == idx.unsqueeze(2)
B = torch.zeros_like(A)
B[mask] = A[mask]
print(A)
print(B)
The trick is that torch.arange(A.size(2)) enumerates the possible values in idx, and mask is True exactly where they equal idx. Remarks:
If you really discard the first output of torch.max, you can use torch.argmax instead.
I assume that this is a minimal example of some wider problem, but be aware that you are currently reinventing torch.nn.functional.max_pool3d with kernel of size (1, 1, 3).
Also, be aware that in-place modification of tensors with masked assignment can cause issues with autograd, so you may want to use torch.where instead.
I would expect that somebody comes up with a cleaner solution (avoiding the intermediate allocation of the mask array), likely making use of torch.index_select, but I can't get it to work right now.
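An out-of-place version of the mask approach with torch.where could look like this. It is a sketch that reuses the mask construction above and builds a new tensor instead of assigning into B:

```python
import torch

torch.manual_seed(0)
A = torch.randn((5, 2, 3))
idx = torch.argmax(A, dim=2)  # index of the max along the last dimension, shape (5, 2)

# (3,) compared with (5, 2, 1) broadcasts to a (5, 2, 3) boolean mask.
mask = torch.arange(A.size(2)) == idx.unsqueeze(2)

# Keep A where mask is True, 0 elsewhere; no in-place masked assignment.
B = torch.where(mask, A, torch.zeros_like(A))
print(B)
```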
You could use torch.scatter here:
>>> import torch
>>> a = torch.randn(4,2,3)
>>> a
tensor([[[ 0.1583, 0.1102, -0.8188],
[ 0.6328, -1.9169, -0.5596]],
[[ 0.5335, 0.4069, 0.8403],
[-1.2537, 0.9868, -0.4947]],
[[-1.2830, 0.4386, -0.0107],
[ 1.3384, 0.5651, 0.2877]],
[[-0.0334, -1.0619, -0.1144],
[ 0.1954, -0.7371, 1.7001]]])
>>> ind = torch.max(a,1,keepdim=True)[1]
>>> ind
tensor([[[1, 0, 1]],
[[0, 1, 0]],
[[1, 1, 1]],
[[1, 1, 1]]])
>>> torch.zeros_like(a).scatter(1,ind,a)
tensor([[[ 0.0000, 0.1102, 0.0000],
[ 0.1583, 0.0000, -0.8188]],
[[ 0.5335, 0.0000, 0.8403],
[ 0.0000, 0.4069, 0.0000]],
[[ 0.0000, 0.0000, 0.0000],
[-1.2830, 0.4386, -0.0107]],
[[ 0.0000, 0.0000, 0.0000],
[-0.0334, -1.0619, -0.1144]]])
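Note that the snippet above takes the max over dim 1 and scatters the entries of a itself, so the values it places are not the maxima. A sketch of the same scatter idea applied to the last dimension, as the question asks, gathers the maxima first with torch.gather:

```python
import torch

torch.manual_seed(0)
a = torch.randn(4, 2, 3)

ind = torch.argmax(a, dim=2, keepdim=True)         # (4, 2, 1): position of each row's max
maxima = torch.gather(a, 2, ind)                   # (4, 2, 1): the max values themselves
out = torch.zeros_like(a).scatter(2, ind, maxima)  # write each max back at its position
print(out)
```

This avoids building the full boolean mask, at the cost of the small (4, 2, 1) index and value tensors.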

Python Numpy. Manipulating with 2 matrices

I have 2 CSV files of the same size. Values are 1s and 0s.
I need to loop over the 2 files (matrices) and create a new matrix using the following logic: if the matrix A value is 1 and the matrix B value is 1, the result value is 0; if 1 and 0, then 0; if 0 and 0, then 0.
A = [
[1, 0, 1],
[1, 1, 1]
]
B = [
[1, 0, 0],
[1, 0, 0]
]
=>
C = [
[0, 0, 1],
[0, 1, 1]
]
I know that NumPy is used to manipulate matrices and arrays, but I'm stuck on how to do this properly.
Here is one way to get your desired output, but I think the logic you described was not quite what you meant. This outputs an array of 1 where your matrices are different from one another, and 0 where they are alike.
import numpy as np
A = np.array([[1, 0, 1],
              [1, 1, 1]])
B = np.array([[1, 0, 0],
              [1, 0, 0]])
C = (A != B).astype(int)
# C is:
array([[0, 0, 1],
       [0, 1, 1]])
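To read the two CSV files and combine them, a sketch along these lines should work. The io.StringIO buffers stand in for your files (pass your own paths instead, e.g. the hypothetical "a.csv" and "b.csv"), and np.logical_xor gives the same result as the != comparison for 0/1 data:

```python
import io
import numpy as np

# Stand-ins for the two CSV files; np.loadtxt also accepts a path such as "a.csv" (hypothetical name).
file_a = io.StringIO("1,0,1\n1,1,1\n")
file_b = io.StringIO("1,0,0\n1,0,0\n")

A = np.loadtxt(file_a, delimiter=",", dtype=int)
B = np.loadtxt(file_b, delimiter=",", dtype=int)

# 1 where exactly one of A, B is 1; no explicit Python loop needed.
C = np.logical_xor(A, B).astype(int)
print(C)
```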

Insert a list into numpy-based matrix

I want to insert a list into numpy-based matrix in a specific index. For instance, the following code (python 2.7) is supposed to insert the list [5,6,7] into M in the second place:
M = [[0, 0], [0, 1], [1, 0], [1, 1]]
M = np.asarray(M)
X = np.insert(M, 1, [5,6,7])
print(X)
This, however, does not output what I want: it messes up M by merging all the rows into one flat list. How can I insert any list at any position of a numpy-based matrix?
Thank you
In [80]: M = [[0, 0], [0, 1], [1, 0], [1, 1]]
...: M1 = np.asarray(M)
...:
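List slice assignment (note that M[1:2] replaces the second row; M[1:1] = [[5,6,7]] would insert before it instead):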
In [81]: M[1:2] = [[5,6,7]]
In [82]: M
Out[82]: [[0, 0], [5, 6, 7], [1, 0], [1, 1]]
Contrast the array made from the original M and the modified one:
In [83]: M1
Out[83]:
array([[0, 0],
[0, 1],
[1, 0],
[1, 1]])
In [84]: np.array(M)
Out[84]:
array([list([0, 0]), list([5, 6, 7]), list([1, 0]), list([1, 1])],
dtype=object)
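The second one is not a 2d array, just a 1d object array of lists (recent NumPy versions raise a ValueError for such ragged input unless you pass dtype=object explicitly).
Without an axis, np.insert ravels the array first (check the docs):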
In [85]: np.insert(M1,1,[5,6,7])
Out[85]: array([0, 5, 6, 7, 0, 0, 1, 1, 0, 1, 1])
If I specify an axis it complains about a mismatch in shapes:
In [86]: np.insert(M1,1,[5,6,7],axis=0)
...
5071 new[slobj] = arr[slobj]
5072 slobj[axis] = slice(index, index+numnew)
-> 5073 new[slobj] = values
5074 slobj[axis] = slice(index+numnew, None)
5075 slobj2 = [slice(None)] * ndim
ValueError: could not broadcast input array from shape (1,3) into shape (1,2)
It creates a (1,2) shape slot to receive the new value, but [5,6,7] won't fit.
In [87]: np.insert(M1,1,[5,6],axis=0)
Out[87]:
array([[0, 0],
[5, 6],
[0, 1],
[1, 0],
[1, 1]])
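If the new row is longer than the existing ones, one option is to widen the array first with np.pad and then use np.insert with axis=0. This is a sketch: it assumes the new row is at least as long as the existing rows, and FILL = -1 is an arbitrary placeholder value you would choose yourself.

```python
import numpy as np

M1 = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
new_row = np.array([5, 6, 7])
FILL = -1  # hypothetical placeholder for the entries the old rows don't have

extra = new_row.size - M1.shape[1]  # number of columns to add (assumed >= 0)
widened = np.pad(M1, ((0, 0), (0, extra)), mode="constant", constant_values=FILL)

X = np.insert(widened, 1, new_row, axis=0)  # insert before row 1
print(X)
```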
import numpy
arr = numpy.array([input().split() for i in range(int(input().split()[0]))])
print(arr)
INPUT:
2
1 2 3 4
5 6 7 8
OUTPUT:
[['1' '2' '3' '4']
['5' '6' '7' '8']]

Using a numpy array to assign values to another array

I have the following numpy array, matrix:
matrix = np.zeros((3,5), dtype = int)
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
Suppose I also have this numpy array, indices:
indices = np.array([[1,3], [2,4], [0,4]])
array([[1, 3],
[2, 4],
[0, 4]])
Question: How can I assign 1s to the elements in the matrix where their indices are specified by the indices array. A vectorized implementation is expected.
For more clarity, the output should look like:
array([[0, 1, 0, 1, 0], #[1,3] elements are changed
[0, 0, 1, 0, 1], #[2,4] elements are changed
[1, 0, 0, 0, 1]]) #[0,4] elements are changed
Here's one approach using NumPy's fancy-indexing -
matrix[np.arange(matrix.shape[0])[:,None],indices] = 1
Explanation
We create the row indices with np.arange(matrix.shape[0]) -
In [16]: idx = np.arange(matrix.shape[0])
In [17]: idx
Out[17]: array([0, 1, 2])
In [18]: idx.shape
Out[18]: (3,)
The column indices are already given as indices -
In [19]: indices
Out[19]:
array([[1, 3],
[2, 4],
[0, 4]])
In [20]: indices.shape
Out[20]: (3, 2)
Let's make a schematic diagram of the shapes of row and column indices, idx and indices -
idx (row) : 3
indices (col) : 3 x 2
For using the row and column indices for indexing into input array matrix, we need to make them broadcastable against each other. One way would be to introduce a new axis into idx, making it 2D by pushing the elements into the first axis and allowing a singleton dim as the last axis with idx[:,None], as shown below -
idx (row) : 3 x 1
indices (col) : 3 x 2
Internally, idx would be broadcasted, like so -
In [22]: idx[:,None]
Out[22]:
array([[0],
[1],
[2]])
In [23]: indices
Out[23]:
array([[1, 3],
[2, 4],
[0, 4]])
In [24]: np.repeat(idx[:,None],2,axis=1) # indices has length of 2 along cols
Out[24]:
array([[0, 0], # Internally broadcasting would be like this
[1, 1],
[2, 2]])
Thus, the broadcast elements from idx are used as row indices, and the entries of indices as column indices, when indexing into matrix to set its elements. Since we had idx = np.arange(matrix.shape[0]), we end up with -
matrix[np.arange(matrix.shape[0])[:,None],indices] for setting elements.
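NumPy also provides np.put_along_axis, which does this row-index broadcasting internally. A sketch of the same assignment with it:

```python
import numpy as np

matrix = np.zeros((3, 5), dtype=int)
indices = np.array([[1, 3], [2, 4], [0, 4]])

# For each row i, set matrix[i, indices[i, j]] = 1 (modifies matrix in place).
np.put_along_axis(matrix, indices, 1, axis=1)
print(matrix)
```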
This involves a Python loop and hence may not be very efficient for large arrays:
for i in range(len(indices)):
    matrix[i, indices[i]] = 1
In [73]: matrix
Out[73]:
array([[0, 1, 0, 1, 0],
[0, 0, 1, 0, 1],
[1, 0, 0, 0, 1]])

A row of zeroes is added to an array instead of the row I want

I have an array which I'm doing some calculations on. The array begins as
e = array([[1,2,1,3],
[3,-1,-3,-1],
[2,3,1,4]])
I modify it a bit to convert it to:
array([[ 1, 2, 1, 3],
[ 0, -7, -6, -10],
[ 0, -1, -1, -2]])
Then I run this code on it:
import numpy as np
from fractions import Fraction

def ref(x):
    dimension = x.shape
    row_counter = 1
    first_values = [x[i][row_counter] for i in range(dimension[0])]  # gets a list of elements of the column
    first_values = [number != 0 for number in first_values]  # 0 is a pivot element?
    if False in first_values:
        true_index = first_values.index(True); false_index = first_values.index(False)
        if true_index > false_index:  # not any more
            x[[false_index, true_index]] = x[[true_index, false_index]]
    for i in range(row_counter+1, dimension[0]):
        multiplier = Fraction(x[row_counter][row_counter], x[i][row_counter])**-1
        row1 = multiplier*x[row_counter]
        row1 = x[i]-row1
        print(row1)
        x[i] = row1
    return x
Running this returns:
[0 0 -1/7 -4/7]
array([[ 1, 2, 1, 3],
[ 0, -7, -6, -10],
[ 0, 0, 0, 0]])
So the result should be
array([[ 1, 2, 1, 3],
[ 0, -7, -6, -10],
[ 0, 0, -1/7, -4/7]])
It prints the correct row entry but it doesn't get added to the array, and a row of zeroes is added instead. Could someone please tell me why? Thanks.
In general, numpy arrays are homogeneous with specific types. For example:
>>> a = np.array([1,2,3])
>>> a
array([1, 2, 3])
>>> a.dtype
dtype('int64')
When you set an element or slice specifically, what you add gets coerced to the current dtype, so:
>>> a[0] = 5
>>> a
array([5, 2, 3])
but
>>> a[0] = 4.3
>>> a
array([4, 2, 3])
You can get upcasting when you're not acting in-place and so numpy is going to have to make a copy (i.e. a new object) anyway:
>>> a = np.array([1,2,3])
>>> a + 4.3
array([ 5.3, 6.3, 7.3])
>>> (a + 4.3).dtype
dtype('float64')
In your case, you can get the behaviour you want if you start with a numpy array of dtype object:
>>> e = np.array([[ 1, 2, 1, 3],
... [ 0, -7, -6, -10],
... [ 0, -1, -1, -2]], dtype=object)
>>>
>>> ref(e)
array([[1, 2, 1, 3],
[0, -7, -6, -10],
[0, 0, -1/7, -4/7]], dtype=object)
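Rather than relying on the caller to pass an object array, a routine can make its own object-dtype copy up front. The helper name eliminate_below is hypothetical, and this Python 3 sketch only covers the elimination step (no row swapping):

```python
import numpy as np
from fractions import Fraction

def eliminate_below(x, pivot_row):
    """Zero out column `pivot_row` below the pivot, keeping exact Fractions."""
    x = np.array(x, dtype=object)  # object-dtype copy, so Fractions are not truncated to int
    pivot = x[pivot_row][pivot_row]
    for i in range(pivot_row + 1, x.shape[0]):
        factor = Fraction(x[i][pivot_row], pivot)
        x[i] = x[i] - factor * x[pivot_row]
    return x

e = np.array([[1, 2, 1, 3],
              [0, -7, -6, -10],
              [0, -1, -1, -2]])
result = eliminate_below(e, 1)
print(result)
```

Because the copy is made inside the function, the caller's integer array e is left unchanged.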
