Python Numpy. Manipulating with 2 matrices

I have two CSV files of the same size. The values are 1s and 0s.
I need to loop over the two files (matrices) and create a new matrix using the following logic:
if the matrix A value is 1 and the matrix B value is 1, then the result value is 0;
if 1 and 0, then 0;
if 0 and 0, then 0.
A = [
[1, 0, 1],
[1, 1, 1]
]
B = [
[1, 0, 0],
[1, 0, 0]
]
=>
C = [
[0, 0, 1],
[0, 1, 1]
]
I know that NumPy can be used to loop over and manipulate matrices and arrays, but I am stuck on how to do this properly.

Here is one way to get your desired output, but I think the logic you described was not quite what you meant. This outputs an array of 1 where your matrices are different from one another, and 0 where they are alike.
import numpy as np

A = np.array([
    [1, 0, 1],
    [1, 1, 1]
])
B = np.array([
    [1, 0, 0],
    [1, 0, 0]
])
# 1 where A and B differ, 0 where they are alike
C = (A != B).astype('int')
print(C)
[[0 0 1]
 [0 1 1]]
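Since the question starts from two CSV files, the comparison can be sketched end to end as below. This is only a minimal sketch, assuming comma-separated files with no header row. The real file paths are not given in the question, so `io.StringIO` objects with the hypothetical names `csv_a` and `csv_b` stand in for them.

```python
import io
import numpy as np

# Stand-ins for the two CSV files (hypothetical contents matching the example).
csv_a = io.StringIO("1,0,1\n1,1,1\n")
csv_b = io.StringIO("1,0,0\n1,0,0\n")

# np.loadtxt accepts either a path or a file-like object.
A = np.loadtxt(csv_a, delimiter=",", dtype=int)
B = np.loadtxt(csv_b, delimiter=",", dtype=int)

# Element-wise XOR: 1 where the matrices differ, 0 where they agree.
C = np.logical_xor(A, B).astype(int)
print(C)
```

With real files, the two `io.StringIO` objects would be replaced by the file paths.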

Related

Performing operation on 2D array using indices from 1D array

I have the following array in python:
a = np.array([[1,1,1],[1,1,1],[1,1,1]])
and the following index array:
b = np.array([0,1,2])
I want to index a using b such that I can subtract 1 from the matching row/column and get the following result:
[[0,1,1],[1,0,1],[1,1,0]]
I can do it using loops, wanted to know if there was a "non-loop" way of doing it.
for i in range(len(b)):
    a[i][b[i]] = a[i][b[i]] - 1

It looks like there is some confusion on how to handle this.
You want a simple indexing:
a[np.arange(len(a)), b] -= 1
Output:
array([[0, 1, 1],
       [1, 0, 1],
       [1, 1, 0]])
Output for b = np.array([2,0,1]):
array([[1, 1, 0],
       [0, 1, 1],
       [1, 0, 1]])
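To see why this works: two index arrays of the same length are paired element by element, so the expression touches `a[0, b[0]]`, `a[1, b[1]]`, and so on. The sketch below contrasts this with `a[:, b]`, which selects whole columns instead. The names `paired` and `columns` and the example values are illustrative only.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)   # [[0 1 2], [3 4 5], [6 7 8]]
b = np.array([2, 0, 1])

# Paired indexing: picks a[0, 2], a[1, 0], a[2, 1].
paired = a[np.arange(len(a)), b]

# Slice plus index array: reorders whole columns as 2, 0, 1.
columns = a[:, b]

print(paired)         # one element per row
print(columns.shape)  # still a full 3x3 array
```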

Your code produces the following output:
a = np.array([[1,1,1],[1,1,1],[1,1,1]])
b = np.array([0,1,2])
for i in range(len(b)):
    a[i][b[i]] = a[i][b[i]] - 1
Output:
array([[0, 1, 1],
       [1, 0, 1],
       [1, 1, 0]])
This can be done without a loop as follows (starting again from the original a):
a[np.arange(len(b)),b] -= 1
print(a)
Output:
[[0 1 1]
 [1 0 1]
 [1 1 0]]

Python NumPy: assigning values to a 3-D array according to a 2-D array's values

Let's say there is an array A with shape (2, 3) whose values are in 0, 1, 2, 3,
and another array B with shape (2, 3, 4).
Goal: add 1 to B according to the positions and values in A, without using a loop. Maybe numpy.where? Is this possible?
Example:
A = [[0, 1, 3],[2, 1, 0]]
B = np.zeros((2, 3, 4))
The result I'm looking for:
B = [[[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 0, 1]],
     [[0, 0, 1, 0],
      [0, 1, 0, 0],
      [1, 0, 0, 0]]]
Furthermore, if a value in A is NaN, what will happen? Can we just do nothing?

Check out this code:
Method-1
B[0,[0,1,2], A[0]] = 1
B[1,[0,1,2], A[1]] = 1
Method-2
import numpy as np
A = [[0, 1, 3],[2, 1, 0]]
B = np.zeros((2, 3, 4))
for i,j in zip(range(len(A)),A):
    for k,l in zip(range(len(j)),j):
        B[i][k][l] = 1
print(B)
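The two per-row assignments in Method-1 can also be written as one indexing operation, and NaN entries can be skipped with a mask. This is a sketch under an assumption: the question asks what happens with NaN but does not specify the desired behavior, so here NaN positions simply leave B untouched. The names `valid`, `rows`, `cols`, and `depth` are introduced for this example.

```python
import numpy as np

# A must be a float array to hold NaN; the NaN position is made up for the demo.
A = np.array([[0, 1, np.nan], [2, 1, 0]])
B = np.zeros((2, 3, 4))

valid = ~np.isnan(A)             # positions that hold a usable value
rows, cols = np.nonzero(valid)   # coordinates of those positions (row-major order)
depth = A[valid].astype(int)     # values become indices along the last axis

# Each (row, col) pair occurs once, so += adds exactly 1 per valid position.
B[rows, cols, depth] += 1
print(B)
```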

I've got an idea: one-hot encoding.
numpy.eye(4)[A]
This expands A so that it has the same shape as B. The result can then be added to B:
numpy.eye(4)[A] + B
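A runnable sketch of the one-hot idea follows. Converting A to an array first is a choice made here (not part of the original answer) so that it is used as an integer index array regardless of NumPy version. The name `one_hot` is also introduced for this example.

```python
import numpy as np

A = np.array([[0, 1, 3], [2, 1, 0]])
B = np.zeros((2, 3, 4))

# Row k of the 4x4 identity matrix is the one-hot vector for value k,
# so indexing the identity with A yields an array of shape (2, 3, 4).
one_hot = np.eye(4)[A]

B += one_hot
print(one_hot.shape)
print(B)
```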

Python pandas: set row and col to zeros if element is zero?

I'm trying to solve the following Python interview question using pandas:
Given an m x n matrix, if an element is 0, set its entire row and column to 0. Do it in-place,
without using enumerate.
Here are some examples:
Example 1
[[1, 1, 1], [1, 0, 1], [1, 1, 1]] # input
[[1, 0, 1], [0, 0, 0], [1, 0, 1]] # output
Example 2
[[0, 1, 2, 0], [3, 4, 5, 2], [1, 3, 1, 5]] # input
[[0, 0, 0, 0], [0, 4, 5, 0], [0, 3, 1, 0]] # output

You can try this:
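import pandas as pd

lst = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
df = pd.DataFrame(lst)
df_result = df.copy(deep=True)
# Zero every row that contains a 0 in the original data
df_result.loc[df.eq(0).any(axis=1)] = 0
# Zero every column that contains a 0 in the original data
df_result.loc[:, df.eq(0).any(axis=0)] = 0
result = df_result.values.tolist()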
output:
[[1, 0, 1], [0, 0, 0], [1, 0, 1]]

Using only built-in Python functions:
# Example data (list)
lst = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
# For each row, if any of the values in the row is 0, replace all the values with 0
# Obs: I'm using a `list comprehension` to make the code shorter
for row in lst:
    if any([value==0 for value in row]):
        row[:] = [0] * len(row)
Using numpy:
# Import and create the array from the list
import numpy as np
a = np.array(lst)
# Set zeros in-place
a[(a==0).any(axis=1), :] = 0
Using pandas:
# Import and create the dataframe from the list
import pandas as pd
df = pd.DataFrame(lst)
# Set zeros in-place (.loc accepts a boolean Series as a row mask; .iloc does not)
df.loc[df.eq(0).any(axis=1), :] = 0
The output is the same for all of them (a row becomes all zeros if it contains at least one original zero); that logic was applied in all examples here. As you're still learning nested lists in Python, I would recommend continuing your studies with Python's built-in classes, methods, functions, etc. Afterwards you may want to look at how indexing works in numpy and pandas so that you can better understand the code here.
Output:
print(lst)
[[1, 1, 1], [0, 0, 0], [1, 1, 1]]
print(a)
[[1 1 1]
 [0 0 0]
 [1 1 1]]
# ignore the first line and column,
# as they indicate the row and column names, respectively:
print(df)
   0  1  2
0  1  1  1
1  0  0  0
2  1  1  1
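The snippets in this answer zero only the rows. To zero the columns as well, as the question asks, both masks have to be computed before anything is overwritten. A NumPy sketch follows; the helper name `zero_rows_and_cols` is made up for this example.

```python
import numpy as np

def zero_rows_and_cols(a):
    """Zero, in place, every row and column of `a` that contains a 0."""
    zero_mask = (a == 0)
    rows = zero_mask.any(axis=1)   # rows with at least one zero
    cols = zero_mask.any(axis=0)   # columns with at least one zero
    # Both masks are taken from the original data before any write happens.
    a[rows, :] = 0
    a[:, cols] = 0
    return a

m1 = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])
m2 = np.array([[0, 1, 2, 0], [3, 4, 5, 2], [1, 3, 1, 5]])
zero_rows_and_cols(m1)
zero_rows_and_cols(m2)
print(m1)
print(m2)
```

Computing `cols` after zeroing the rows would be a mistake, because the freshly written zeros would then mark every column.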

Python/NumPy: find the first index of zero, then replace all elements with zero after that for each row

I have a NumPy array like this:
a = np.array([[1, 0, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1]])
Question 1:
As stated in the title, I want to replace all elements after the first zero in each row with zero. The result should look like this:
a = np.array([[1, 0, 0, 0, 0],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0]])
Question 2: how can I slice different columns for each row, as in this example?
I am dealing with a large array, so I would appreciate an efficient way to solve this. Thank you very much.

One way to accomplish question 1 is to use numpy.cumprod
>>> np.cumprod(a, axis=1)
array([[1, 0, 0, 0, 0],
       [1, 1, 1, 1, 0],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0]])
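Note that `cumprod` gives the desired result here because the array contains only 0s and 1s. For arrays holding other nonzero values, a similar running reduction can be applied to a boolean mask instead. The sketch below uses `np.logical_and.accumulate`; the example array `x` and the name `keep` are made up for illustration.

```python
import numpy as np

x = np.array([[3, 0, 5, 2],
              [2, 4, 7, 1],
              [6, 8, 0, 9]])

# True for every position before the first zero in its row, False from there on.
keep = np.logical_and.accumulate(x != 0, axis=1)

# Keep the original values where the mask is True, write 0 elsewhere.
result = np.where(keep, x, 0)
print(result)
```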

Question 1:
You could iterate over the array like so:
for i in range(a.shape[0]):
    j = 0
    row = a[i]
    # Advance to the first zero; the bounds check covers rows without any zero
    while j < len(row) and row[j]>0:
        j += 1
    row[j+1:] = 0
This will change the array in-place. If you are interested in very high performance, the answers to this question could be of use to find the first zero faster. np.where scans the entire array for this and therefore is not optimal for the task.
Actually, the fastest solution depends somewhat on the distribution of your array entries. If the array holds many floats and zeros are rare, the while loops in the code above will stop late on average, so only a few zeros have to be written. If, however, there are only two possible entries as in your sample array, and they occur with similar probability (i.e. ~50%), a lot of zeros would have to be written to a, and the following will be faster:
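# Start from zeros and copy only the leading nonzero entries of each row
b = np.zeros(a.shape, dtype=a.dtype)
for i in range(a.shape[0]):
    j = 0
    a_row = a[i]
    b_row = b[i]
    # The bounds check covers rows without any zero
    while j < len(a_row) and a_row[j]>0:
        b_row[j] = a_row[j]
        j += 1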
Question 2:
If you mean to slice each row individually using a similar criterion based on the first occurrence of some kind, you could simply adapt this iteration pattern. If the criterion is more global (finding the maximum of the row, for example), built-in methods like np.where exist that will be more efficient, but which choice is best probably depends on the criterion itself.

Question 1: An efficient way to do this would be the following.
import numpy as np
a = np.array([[1, 0, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1]])
for row in a:
    zeros = np.where(row == 0)[0]
    if (len(zeros)):# Check if zero exists
        row[zeros[0]:] = 0
print(a)
Output:
[[1 0 0 0 0]
 [1 1 1 1 0]
 [1 0 0 0 0]
 [1 0 0 0 0]]
Question 2: Using the same array, for each row rowIdx, you can have an array of columns colIdxs that you want to extract.
rowIdx = 2
colIdxs = [1, 3, 4]
print(a[rowIdx, colIdxs])
Output:
[0 1 1]

I prefer Ayrat's creative answer for the first question, but if you need to slice different columns for different rows in a large array, this could help you:
# Note: argmax returns 0 for a row with no zero, which would zero that whole row
indexer = tuple(np.s_[i:a.shape[1]] for i in (a==0).argmax(axis=1))
for i,j in enumerate(indexer):
    a[i,j]=0
indexer:
(slice(1, 5, None), slice(4, 5, None), slice(1, 5, None), slice(1, 5, None))
or:
indexer = (a==0).argmax(axis=1)
for i in range(a.shape[0]):
    a[i,indexer[i]:]=0
indexer:
[1 4 1 1]
output:
[[1 0 0 0 0]
 [1 1 1 1 0]
 [1 0 0 0 0]
 [1 0 0 0 0]]

Creating array with relations between identifiers

Consider the following toy array a:
a = np.array([[1074279, 937077, 1445858, 1679465],
              [1074280, 1023600, 1679465, 937077],
              [1074281, 908450, 1932761, 1100360],
              [1074282, 1445858, 893656, 908183],
              [1074283, 1958030, 1932761, 1445858]])
The first column is an identifier.
How can I transform the array in a way that shows when an identifier is related
to another? A relation exists if two identifiers have in common at least one
value in columns 2-4 of a.
The end result should be the array b below:
b = np.array([[1, 1, 0, 1, 1],
              [1, 1, 0, 0, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 1],
              [1, 0, 1, 1, 1]])
This can perhaps better be understood as follows:
         1074279  1074280  1074281  1074282  1074283
1074279        1        1        0        1        1
1074280        1        1        0        0        0
1074281        0        0        1        0        1
1074282        1        0        0        1        1
1074283        1        0        1        1        1
I have tried (double) looping over elements to find all the combinations and
then reduce that to the desired array but I cannot get it right.

Outer-equality does the job for a vectorized solution -
In [90]: np.equal.outer(a[:,1:],a[:,1:]).any(axis=(1,3)).view('i1')
Out[90]:
array([[1, 1, 0, 1, 1],
       [1, 1, 0, 0, 0],
       [0, 0, 1, 0, 1],
       [1, 0, 0, 1, 1],
       [1, 0, 1, 1, 1]], dtype=int8)
Explanation
Basically, np.equal.outer(..) performs a pairwise equality comparison between every element of the first operand and every element of the second. For the slice a[:,1:] of shape (m,n), this gives a 4D boolean array of shape (m,n,m,n). We then ANY-reduce it along axes 1 and 3 to get a 2D boolean array of shape (m,m), which becomes our final output after conversion to an int array.
An alternative with explicit dimension-expansion would be -
In [92]: (a[:,1:,None,None]==a[:,1:]).any(axis=(1,3)).view('i1')
Out[92]:
array([[1, 1, 0, 1, 1],
       [1, 1, 0, 0, 0],
       [0, 0, 1, 0, 1],
       [1, 0, 0, 1, 1],
       [1, 0, 1, 1, 1]], dtype=int8)
So, the only change is that we are adding new axes for the first version of the slice with None/np.newaxis to create a 4D version. This is then compared against the original 2D version to result in the 4D equality compared boolean array.
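The shapes described in this explanation can be checked step by step. The sketch below splits the one-liner into named intermediates (`vals`, `eq`, and `related` are names chosen here) and uses `astype(int)` in place of `.view('i1')` for the final conversion.

```python
import numpy as np

a = np.array([[1074279, 937077, 1445858, 1679465],
              [1074280, 1023600, 1679465, 937077],
              [1074281, 908450, 1932761, 1100360],
              [1074282, 1445858, 893656, 908183],
              [1074283, 1958030, 1932761, 1445858]])

vals = a[:, 1:]                    # drop the identifier column, shape (5, 3)
eq = np.equal.outer(vals, vals)    # eq[i, p, j, q] is vals[i, p] == vals[j, q]
related = eq.any(axis=(1, 3))      # True if rows i and j share any value

print(eq.shape)
print(related.astype(int))
```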

A simpler classic solution that is easily understandable:
import numpy as np

def has_in_common(a1, a2):
    """
    :param a1, a2: two input arrays
    :returns: True if a1 and a2 have at least one value in common, otherwise False
    """
    # Skip index 0 of each row, which holds the identifier
    for v1 in a1[1:]:
        for v2 in a2[1:]:
            if v1 == v2:
                return True
    return False

def relation_matrix(a):
    """
    :param a: an input array
    :returns: m, a matrix specifying the relationship between the rows of a
    ex: a = [[1074279, 937077, 1445858, 1679465],
             [1074280, 1023600, 1679465, 937077],
             [1074281, 908450, 1932761, 1100360],
             [1074282, 1445858, 893656, 908183],
             [1074283, 1958030, 1932761, 1445858]]
        m = [[1, 1, 0, 1, 1],
             [1, 1, 0, 0, 0],
             [0, 0, 1, 0, 1],
             [1, 0, 0, 1, 1],
             [1, 0, 1, 1, 1]]
    more precisely
        m =          1074279  1074280  1074281  1074282  1074283
            1074279        1        1        0        1        1
            1074280        1        1        0        0        0
            1074281        0        0        1        0        1
            1074282        1        0        0        1        1
            1074283        1        0        1        1        1
    """
    m = np.zeros((a.shape[0], a.shape[0]))
    for i in range(len(a)):
        for j in range(len(a)):
            if has_in_common(a[i], a[j]):
                m[i, j] = 1
    return m.astype('int')
Demo:
In [1]: relation_matrix(a)
Out[1]:
array([[1, 1, 0, 1, 1],
       [1, 1, 0, 0, 0],
       [0, 0, 1, 0, 1],
       [1, 0, 0, 1, 1],
       [1, 0, 1, 1, 1]])
