Python pandas: set row and col to zeros if element is zero?

I'm trying to solve the following Python interview question using pandas:
Given an m x n matrix, if an element is 0, set its entire row and column to 0. Do it in-place.
Do not use enumerate!
Here are some examples:
Example 1
[[1, 1, 1], [1, 0, 1], [1, 1, 1]] # input
[[1, 0, 1], [0, 0, 0], [1, 0, 1]] # output
Example 2
[[0, 1, 2, 0], [3, 4, 5, 2], [1, 3, 1, 5]] # input
[[0, 0, 0, 0], [0, 4, 5, 0], [0, 3, 1, 0]] # output

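You can try this:
import pandas as pd
lst = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
df = pd.DataFrame(lst)
# Write into a copy so the zeros set for rows don't mark extra columns
df_result = df.copy(deep=True)
df_result.loc[df.eq(0).any(axis=1)] = 0     # zero every row that contains a 0
df_result.loc[:, df.eq(0).any(axis=0)] = 0  # zero every column that contains a 0
result = df_result.values.tolist()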
output:
[[1, 0, 1], [0, 0, 0], [1, 0, 1]]
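The answer above works on a copy, while the question asks for an in-place change. A minimal sketch of one way to do that, assuming a pandas DataFrame as input: compute both zero masks from the original values first, then write the zeros into the same frame. The function name set_zeroes_inplace is hypothetical.

```python
import pandas as pd


def set_zeroes_inplace(df: pd.DataFrame) -> None:
    """Zero every row and column of df that contains a 0, modifying df in place."""
    # Compute both masks from the original values before writing anything;
    # otherwise the zeros written for rows would also mark every column.
    zero_rows = df.eq(0).any(axis=1)
    zero_cols = df.eq(0).any(axis=0)
    df.loc[zero_rows, :] = 0
    df.loc[:, zero_cols] = 0


df = pd.DataFrame([[0, 1, 2, 0], [3, 4, 5, 2], [1, 3, 1, 5]])
set_zeroes_inplace(df)
print(df.values.tolist())  # [[0, 0, 0, 0], [0, 4, 5, 0], [0, 3, 1, 0]]
```

Computing the masks up front is what makes the copy unnecessary: if the row mask were applied first and the column mask computed afterwards, every column would then contain a zero.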

Using only built-in Python functions:
# Example data (list)
lst = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
# For each row, if any of the values in the row is 0, replace all the values with 0
# Obs: I'm using a `list comprehension` to make the code shorter
for row in lst:
    if any([value == 0 for value in row]):
        row[:] = [0] * len(row)
Using numpy:
# Import and create the array from the list
import numpy as np
a = np.array(lst)
# Set zeros in-place
a[(a==0).any(1), :] = 0
Using pandas:
# Import and create the dataframe from the list
import pandas as pd
df = pd.DataFrame(lst)
# Set zeros in-place
df.loc[df.eq(0).any(axis=1), :] = 0
The output is the same for all of them: every row that contains at least one zero in the original data becomes all zeros. Note that this only handles the rows; the question also asks to zero the columns, which needs the same check along axis 0. As you're still learning nested lists in Python, I would recommend continuing your studies with Python's built-in classes, methods, and functions. Afterwards, you may want to take a look at how indexing works in NumPy and pandas to get a better understanding of the code here.
Output:
print(lst)
[[1, 1, 1], [0, 0, 0], [1, 1, 1]]
print(a)
[[1 1 1]
 [0 0 0]
 [1 1 1]]
# ignore the first line and the first column,
# as they are the column labels and the row index, respectively:
print(df)
   0  1  2
0  1  1  1
1  0  0  0
2  1  1  1
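The snippets above only zero the rows. A sketch of the built-in-only version extended to the columns as well, still without enumerate (it uses range and zip instead); the helper name set_zeroes is made up for illustration, and it assumes a non-ragged list of lists.

```python
def set_zeroes(matrix):
    """Zero, in place, every row and column of a list of lists that holds a 0."""
    n_cols = len(matrix[0]) if matrix else 0
    # Decide everything from the original values before writing any zero.
    zero_cols = [col for col in range(n_cols)
                 if any(row[col] == 0 for row in matrix)]
    row_has_zero = [any(value == 0 for value in row) for row in matrix]
    # zip pairs each row with its flag, so enumerate is not needed.
    for row, has_zero in zip(matrix, row_has_zero):
        if has_zero:
            row[:] = [0] * len(row)
        for col in zero_cols:
            row[col] = 0


lst = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
set_zeroes(lst)
print(lst)  # [[1, 0, 1], [0, 0, 0], [1, 0, 1]]
```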

Related

Performing operation on 2D array using indices from 1D array

I have the following array in python:
a = np.array([[1,1,1],[1,1,1],[1,1,1]])
and the following index array:
b = np.array([0,1,2])
I want to index a using b such that I can subtract 1 from the matching row/column and get the following result:
[[0,1,1],[1,0,1],[1,1,0]]
I can do it using loops, but I wanted to know if there was a "non-loop" way of doing it:
for i in range(len(b)):
    a[i][b[i]] = a[i][b[i]] - 1
It looks like there is some confusion about how to handle this.
What you want is simple integer-array (fancy) indexing, pairing each row number with the matching entry of b:
a[np.arange(len(a)), b] -= 1
Output:
array([[0, 1, 1],
       [1, 0, 1],
       [1, 1, 0]])
Output for b = np.array([2,0,1]):
array([[1, 1, 0],
       [0, 1, 1],
       [1, 0, 1]])
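To see why this works: when two integer index arrays of the same length are passed, NumPy pairs them element by element, so a[np.arange(len(a)), b] addresses exactly one element per row. The sketch below contrasts that with np.ix_, which selects the whole rows-by-columns block instead.

```python
import numpy as np

a = np.ones((3, 3), dtype=int)
b = np.array([2, 0, 1])
rows = np.arange(len(a))

# Paired element by element: touches (0, 2), (1, 0) and (2, 1), one per row.
print(a[rows, b])                 # [1 1 1]

# np.ix_ builds an open mesh instead and selects the full 3x3 block,
# which is not what this question needs.
print(a[np.ix_(rows, b)].shape)   # (3, 3)

a[rows, b] -= 1
print(a)
```

One caveat: if the same (row, column) pair appears more than once, a[rows, b] -= 1 subtracts only once for that pair because the assignment is buffered; np.subtract.at(a, (rows, b), 1) applies every occurrence.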
Your code produces output as follows:
a = np.array([[1,1,1],[1,1,1],[1,1,1]])
b = np.array([0,1,2])
for i in range(len(b)):
    a[i][b[i]] = a[i][b[i]] - 1
Output:
array([[0, 1, 1],
       [1, 0, 1],
       [1, 1, 0]])
Starting again from the original a, this can be done in a non-loopy way as follows:
a[np.arange(len(b)),b] -= 1
print(a)
Output:
array([[0, 1, 1],
       [1, 0, 1],
       [1, 1, 0]])

Python/NumPy: find the first index of zero, then replace all elements with zero after that for each row

I have a NumPy array like this:
a = np.array([[1, 0, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1]])
Question 1:
As shown in the title, I want to replace all elements with zero after the first zero appears. The result should look like this:
a = np.array([[1, 0, 0, 0, 0],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0]])
Question 2: How can I slice different columns for each row, as in this example?
I am dealing with a large array, so I'm looking for an efficient way to solve this. Thank you very much.
Since the entries are only 0s and 1s, one way to accomplish question 1 is to use numpy.cumprod:
>>> np.cumprod(a, axis=1)
array([[1, 0, 0, 0, 0],
       [1, 1, 1, 1, 0],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0]])
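The cumprod trick relies on the entries being 0 or 1; with other values it would also multiply the nonzero entries together (for example [2, 3, 0] becomes [2, 6, 0]). A sketch for arbitrary values, assuming you want to keep the original numbers before the first zero, is to take the cumulative product of the "is nonzero" mask instead (the sample values are made up):

```python
import numpy as np

# Made-up sample with values other than 0 and 1.
a = np.array([[3, 0, 7, 2, 5],
              [4, 9, 1, 6, 0],
              [8, 0, 0, 2, 1],
              [5, 0, 3, 0, 4]])

# The cumulative product of the 0/1 "is nonzero" mask stays 1 up to the first
# zero in each row and is 0 from there on.
keep = np.cumprod((a != 0).astype(int), axis=1)

# Multiplying keeps the original values before the first zero.
result = a * keep
print(result)
```

For an in-place update, a *= keep performs the same multiplication on a itself.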
Question 1:
You could iterate over the array like so:
for i in range(a.shape[0]):
    j = 0
    row = a[i]
    # advance to the first zero (stop at the end if the row has none)
    while j < len(row) and row[j] != 0:
        j += 1
    row[j+1:] = 0
This will change the array in-place. If you need very high performance, the answers to this question could help you find the first zero faster; np.where scans the entire array, so it is not optimal for this task.
Actually, the fastest solution depends on the distribution of your array entries: if they are mostly floats and a zero is rare, the while loops above will stop late on average and only "a few" zeros need to be written. If, however, there are only two possible entries, as in your sample array, and they occur with similar probability (i.e. ~50%), many zeros have to be written to a, and the following will be faster:
b = np.zeros_like(a)
for i in range(a.shape[0]):
    j = 0
    a_row = a[i]
    b_row = b[i]
    # copy values up to the first zero; b is already zero from there on
    while j < len(a_row) and a_row[j] != 0:
        b_row[j] = a_row[j]
        j += 1
Question 2:
If you mean to slice each row individually on a similar criterion involving a first occurrence of some kind, you could simply adapt this iteration pattern. If the criterion is more global (like finding the maximum of the row), built-in functions such as np.where or np.argmax will be more efficient, but which choice is best depends on the criterion itself.
Question 1: An efficient way to do this would be the following.
import numpy as np
a = np.array([[1, 0, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1]])
for row in a:
    zeros = np.where(row == 0)[0]
    if len(zeros):  # check whether the row contains a zero
        row[zeros[0]:] = 0
print(a)
Output:
[[1 0 0 0 0]
 [1 1 1 1 0]
 [1 0 0 0 0]
 [1 0 0 0 0]]
Question 2: Using the original array a (before the in-place change above), for each row rowIdx you can have an array of columns colIdxs that you want to extract:
rowIdx = 2
colIdxs = [1, 3, 4]
print(a[rowIdx, colIdxs])
Output:
[0 1 1]
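If every row should pick the same number of columns, np.take_along_axis can do the per-row selection in one call. A sketch with made-up column indices per row, also showing the equivalent fancy-indexing form:

```python
import numpy as np

a = np.array([[1, 0, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1]])

# Hypothetical choice of columns: one row of column indices per row of a.
col_idx = np.array([[0, 1],
                    [3, 4],
                    [1, 3],
                    [2, 4]])

# take_along_axis picks a[i, col_idx[i, k]] for every i and k.
picked = np.take_along_axis(a, col_idx, axis=1)

# Same selection with fancy indexing: a (4, 1) row index broadcast against col_idx.
fancy = a[np.arange(a.shape[0])[:, None], col_idx]
print(picked)
```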
I prefer Ayrat's creative answer for the first question, but if you need to slice different columns for different rows in a large array, this could help you:
# argmax returns 0 for a row without any zero, so send such rows past the last column
first_zero = np.where((a==0).any(axis=1), (a==0).argmax(axis=1), a.shape[1])
indexer = tuple(np.s_[i:a.shape[1]] for i in first_zero)
for i, j in enumerate(indexer):
    a[i, j] = 0
indexer:
(slice(1, 5, None), slice(4, 5, None), slice(1, 5, None), slice(1, 5, None))
or:
indexer = np.where((a==0).any(axis=1), (a==0).argmax(axis=1), a.shape[1])
for i in range(a.shape[0]):
    a[i, indexer[i]:] = 0
indexer:
[1 4 1 1]
output:
[[1 0 0 0 0]
 [1 1 1 1 0]
 [1 0 0 0 0]
 [1 0 0 0 0]]
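Both variants above still loop over the rows in Python. A loop-free sketch of the same idea compares each column position against the per-row first-zero index via broadcasting; the last row of the sample is changed to have no zero, to show that such a row is left untouched:

```python
import numpy as np

a = np.array([[1, 0, 1, 1, 1],
              [1, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 1, 1, 1, 1]])   # last row has no zero

is_zero = a == 0
n_cols = a.shape[1]
# Column of the first zero per row; rows without a zero get n_cols so that
# nothing in them is cleared (argmax alone would return 0 for those rows).
first_zero = np.where(is_zero.any(axis=1), is_zero.argmax(axis=1), n_cols)

# Broadcast column positions (n_cols,) against first_zero as a column (n_rows, 1).
clear = np.arange(n_cols) >= first_zero[:, None]
a[clear] = 0
print(a)
```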

Appending a list based on the column that the data comes from

I'm attempting to append a binary numpy array to another numpy array to feed into a neural network. The binary list is dependent on the column that the array is coming from.
For example, the binary array for data coming from the third column is [0 0 1 0 0 0 0 0 0].
Here is an example:
Data (list of arrays):
[[0, 1, 1, 1, 0], [0, 1, 0, 0, 1], [1, 0, 0, 0, 0]]
Let's say that the first two elements came from the first column of a dataframe and the third element came from the second column. After appending the binary array the data would look something like this:
[([0, 1, 1, 1, 0],
[1 0 0 0 0 0 0 0 0]),
([0, 1, 0, 0, 1],
[1 0 0 0 0 0 0 0 0]),
([1, 0, 0, 0, 0],
[0 1 0 0 0 0 0 0 0])]
For context, I was originally training on just a single column of a dataframe, however I want to be able to train over the entire dataframe now.
Is there a way to automatically append this array to my data depending on the column the data is coming from so that the neural network can train on the whole data set rather than just going column by column?
Additionally, would this require two input layers or just one?
Maybe you could add a more concrete example to your question. But anyway, is this what you're expecting?
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'col1': [[0,0,1], [1,1,1]], 'col2': [[1,1,0],[0,0,0]]})
In [3]: df
Out[3]:
col1 col2
0 [0, 0, 1] [1, 1, 0]
1 [1, 1, 1] [0, 0, 0]
In [4]: for col_index, col_name in enumerate(df.columns):
   ...:     array_to_append = [0] * len(df.columns)
   ...:     array_to_append[col_index] = 1
   ...:     df[col_name] = df[col_name].map(lambda x: (x, array_to_append))
   ...:
In [5]: df
Out[5]:
col1 col2
0 ([0, 0, 1], [1, 0]) ([1, 1, 0], [0, 1])
1 ([1, 1, 1], [1, 0]) ([0, 0, 0], [0, 1])
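A different layout that may be closer to what a model needs: one (values, one-hot column id) pair per cell, built from rows of an identity matrix. This is a sketch assuming every cell holds a list of equal length; concatenating the two parts into one vector, as done at the end, is one way to feed a single input layer, but whether one or two input layers suits your network is a modelling choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [[0, 0, 1], [1, 1, 1]],
                   'col2': [[1, 1, 0], [0, 0, 0]]})

# Row k of the identity matrix is the one-hot vector for column k.
one_hot = np.eye(len(df.columns), dtype=int)

# One (values, one-hot column id) pair per cell, column by column.
samples = [
    (np.asarray(value), one_hot[col_index])
    for col_index, col_name in enumerate(df.columns)
    for value in df[col_name]
]

# Optionally join both parts into a single flat vector per sample.
inputs = np.array([np.concatenate([values, col_id]) for values, col_id in samples])
print(inputs)
```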

How can I unnest a list in a column when calling df.values?

I have a pandas DataFrame. When I call df.values, the result is like:
np.array([
[1, 2, [0, 0, 0], 3],
[1, 2, [0, 0, 0], 3]
])
and I want it to look like this when calling df.values:
np.array([
[1, 2, 0, 0, 0, 3],
[1, 2, 0, 0, 0, 3]
])
Please help me out.
Assuming your dataframe is:
df = pd.DataFrame([
[1, 2, [0, 0, 0], 3],
[1, 2, [0, 0, 0], 3]
])
I'd use the insight from this post by wim; my modified version of the function is below.
This flattens arbitrarily nested collections.
from collections.abc import Iterable

def flatten(collection):
    for element in collection:
        # recurse into nested containers, but treat strings as scalars
        if isinstance(element, Iterable) and not isinstance(element, str):
            yield from flatten(element)
        else:
            yield element
I can then use this to flatten each row of the dataframe:
pd.DataFrame([*map(list, map(flatten, df.values))])
0 1 2 3 4 5
0 1 2 0 0 0 3
1 1 2 0 0 0 3
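If the nesting is only one level deep, as in the sample, a shorter alternative (not from the linked post) is to let np.hstack join each row's scalars and lists. It assumes every row flattens to the same length:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [1, 2, [0, 0, 0], 3],
    [1, 2, [0, 0, 0], 3]
])

# np.hstack turns each scalar into a length-1 array and concatenates it with the
# list cells; this handles one level of nesting, not arbitrary depth.
flat = np.array([np.hstack(list(row)) for row in df.values])
print(flat)
```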

Python Numpy. Manipulating with 2 matrices

I have 2 CSV files of the same size. Values are 1s and 0s.
I need to loop over the 2 files (matrices) and create a new matrix using the following logic:
if the matrix A value = 1 and the matrix B value = 1, then the result value is 0;
if 1 and 0, then 0;
if 0 and 0, then 0.
A = [
[1, 0, 1],
[1, 1, 1]
]
B = [
[1, 0, 0],
[1, 0, 0]
]
=>
C = [
[0, 0, 1],
[0, 1, 1]
]
I know that NumPy is used to manipulate matrices and arrays, but I'm stuck on how to do this properly.
Here is one way to get your desired output, but I think the logic you described was not quite what you meant. This outputs an array with 1 where your matrices differ from one another and 0 where they are alike.
import numpy as np
A = np.array([
    [1, 0, 1],
    [1, 1, 1]
])
B = np.array([
    [1, 0, 0],
    [1, 0, 0]])
C = (A != B).astype(int)
print(C)
[[0 0 1]
 [0 1 1]]
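Since the data are only 0s and 1s, "different from each other" is the same as exclusive or, so np.logical_xor or the ^ operator gives the same result. A small sketch:

```python
import numpy as np

A = np.array([[1, 0, 1],
              [1, 1, 1]])
B = np.array([[1, 0, 0],
              [1, 0, 0]])

# For 0/1 data, "different" is exactly exclusive or.
C_xor = np.logical_xor(A, B).astype(int)
C_bitwise = A ^ B    # bitwise XOR on the integer arrays
print(C_xor)
```

To load the two CSV files, something like np.loadtxt(path, delimiter=',', dtype=int) should work for plain comma-separated 0/1 files; the paths are yours to fill in.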
