How Do You Check if an Entry Is in Pandas DataFrame?

How do I check if an entry is in a pandas DataFrame? For example, given
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
How do I check whether a row with A = 1 and B = 4 exists, regardless of what is in C?

You can pass a dictionary (to isin) with the values to search by column:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
result = df.isin({'A': [1], 'B': [4]})
print(result)
Output
A B
0 True True
1 False False
2 False False
Afterwards you can find which rows match on every column using all along axis 1:
result = df.isin({'A': [1], 'B': [4]}).all(axis=1)
print(result)
Output
0 True
1 False
2 False
dtype: bool
To use it in an if statement, restricted to the columns ['A', 'B'], reduce the row-wise result with any, for example:
if df[['A', 'B']].isin({'A': [1], 'B': [4]}).all(axis=1).any():
    print('found')
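
The same check can also be written as a plain boolean mask, which skips the intermediate isin frame. This is a minimal sketch, assuming exact equality on each column; the helper name row_exists is hypothetical and not part of pandas:

```python
import pandas as pd

def row_exists(df, **values):
    """Return True if some row matches every column=value pair given."""
    # Start with an all-True mask and narrow it one column at a time.
    mask = pd.Series(True, index=df.index)
    for column, value in values.items():
        mask &= df[column] == value
    return bool(mask.any())

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
found = row_exists(df, A=1, B=4)       # column C is ignored
not_found = row_exists(df, A=1, B=5)   # 1 and 5 never share a row
print(found, not_found)
```

Columns that are not passed to the helper do not take part in the comparison, so there is no need to subset the frame first.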

Related

How to fill column with condition in polars

I would like to add a new column whose values are taken from other columns, depending on a condition.
In pandas, I do it like this:
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df['c'] = df['a']
df.loc[df['b']==4, 'c'] = df['b']
The result is
a b c
1 3 1
2 4 4
Could you teach me how to do this with polars?

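Use when/then/otherwise. Note that with_columns returns a new DataFrame, so assign the result:
import polars as pl
df = pl.DataFrame({'a': [1, 2], 'b': [3, 4]})
df = df.with_columns(
    # take b where b == 4, otherwise fall back to a
    pl.when(pl.col('b') == 4).then(pl.col('b')).otherwise(pl.col('a')).alias('c')
)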

Filling column of dataframe based on 'groups' of values of another column

I am trying to fill values of a column based on the value of another column. Suppose I have the following dataframe:
import numpy as np
import pandas as pd
data = {'A': [4, 4, 5, 6],
        'B': ['a', np.nan, np.nan, 'd']}
df = pd.DataFrame(data)
And I would like to fill column B but only if the value of column A equals 4. Hence, all rows that have the same value as another in column A should have the same value in column B (by filling this).
Thus, the desired output should be:
data = {'A': [4, 4, 5, 6],
        'B': ['a', 'a', np.nan, 'd']}
df = pd.DataFrame(data)
I am aware of forward filling with fillna/ffill, but this gives the wrong output because the third row also gets the value 'a' assigned:
df['B'] = df['B'].ffill()  # equivalent to fillna(method="ffill") in older pandas
data = {'A': [4, 4, 5, 6],
        'B': ['a', 'a', 'a', 'd']}
df = pd.DataFrame(data)
How can I get the desired output?

Try this:
df['B'] = df.groupby('A')['B'].ffill()
Output:
>>> df
A B
0 4 a
1 4 a
2 5 NaN
3 6 d
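
To see why grouping matters here, the two fills can be compared side by side. This is a small sketch on the question's data; the variable names plain and grouped are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [4, 4, 5, 6],
                   'B': ['a', np.nan, np.nan, 'd']})

# Plain forward fill ignores column A, so row 2 (A == 5) inherits 'a'.
plain = df['B'].ffill()

# Grouped forward fill only propagates values within the same A group,
# so the lone A == 5 row has nothing to copy from and stays missing.
grouped = df.groupby('A')['B'].ffill()

print(plain.tolist())
print(grouped.isna().tolist())
```

The grouped result keeps the original index, which is why it can be assigned straight back to df['B'].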

How to convert Pandas data frame to dict with values in a list

I have a huge pandas DataFrame structured like the example below:
import pandas as pd
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'C', 'C', 'C'], 'col2': [1, 2, 5, 2, 4, 6]})
df
col1 col2
0 A 1
1 A 2
2 B 5
3 C 2
4 C 4
5 C 6
The task is to build a dictionary with elements in col1 as keys and corresponding elements in col2 as values. For the example above the output should be:
A -> [1, 2]
B -> [5]
C -> [2, 4, 6]
I have written a solution as
from collections import defaultdict
dd = defaultdict(list)  # list, not set: append and ordered values are needed
for row in df.itertuples():
    dd[row.col1].append(row.col2)
but I wonder if somebody is aware of a more "pandas-native" solution, using built-in pandas functions.

Without apply, you can use a dict comprehension over the groups:
{x : y.tolist() for x , y in df.col2.groupby(df.col1)}
{'A': [1, 2], 'B': [5], 'C': [2, 4, 6]}

Use GroupBy.apply with list for Series of lists and then Series.to_dict:
d = df.groupby('col1')['col2'].apply(list).to_dict()
print(d)
{'A': [1, 2], 'B': [5], 'C': [2, 4, 6]}
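
Both answers should give the same mapping. A quick sketch that builds the dictionary both ways on the question's data and compares them (the variable names are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'C', 'C', 'C'],
                   'col2': [1, 2, 5, 2, 4, 6]})

# Dict comprehension over the groups of col2, keyed by col1.
by_comprehension = {key: group.tolist()
                    for key, group in df['col2'].groupby(df['col1'])}

# GroupBy.apply(list) followed by Series.to_dict.
by_apply = df.groupby('col1')['col2'].apply(list).to_dict()

print(by_comprehension == by_apply)
```

One difference worth knowing: tolist() converts the values to plain Python ints, while list() over a Series keeps NumPy scalar types. They compare equal, but that can matter when the dictionary is serialized, for example to JSON.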

How to cast a Pandas Dataframe column into 'list' type?

Following the Pandas explode() method documentation, one can explode a list in a row into multiple rows:
df = pd.DataFrame({'A': [[1, 2, 3], 'foo', None, [3, 4]], 'B': 1})
df.explode('A')
However, the DataFrame I get from a database contains lists that are stored as strings:
df = pd.DataFrame({'A': ["[1, 2, 3]", 'foo', None, "[3, 4]"], 'B': 1})
df.explode('A')
# Does not fail, but does not explode "[1, 2, 3]"
Outside pandas, I use ast.literal_eval(), but I don't know how to make it work on my column.
How can I cast my 'A' column to lists so that explode('A') works?

You could try with Series.apply:
import ast
def f(x):
    # parse strings such as "[1, 2, 3]"; leave anything unparsable as it is
    try:
        return ast.literal_eval(x)
    except Exception:
        return x
df['A'] = df['A'].apply(f)
df.explode('A')
A B
0 1 1
0 2 1
0 3 1
1 foo 1
2 None 1
3 3 1
3 4 1
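
One caveat with a catch-all parser is that any string that happens to be a valid literal gets converted too, so "42" would become the integer 42. A stricter sketch, assuming only list literals should be converted, is shown below; the helper name to_list_or_keep is hypothetical:

```python
import ast
import pandas as pd

def to_list_or_keep(value):
    """Convert a string holding a list literal to a list; keep anything else."""
    if not isinstance(value, str):
        return value
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value  # not a Python literal, e.g. 'foo'
    # Only accept lists, so strings such as "42" are left untouched.
    return parsed if isinstance(parsed, list) else value

df = pd.DataFrame({'A': ["[1, 2, 3]", 'foo', None, "[3, 4]"], 'B': 1})
df['A'] = df['A'].apply(to_list_or_keep)
exploded = df.explode('A')
print(exploded)
```

Narrowing the except clause to ValueError and SyntaxError covers the usual failures of ast.literal_eval on malformed text while letting unexpected errors surface.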

Pandas: Convert dataframe to dict of lists

I have a dataframe like this:
col1, col2
A 0
A 1
B 2
C 3
I would like to get this:
{ A: [0,1], B: [2], C: [3] }
I tried:
df.set_index('col1')['col2'].to_dict()
but that is not quite correct. The first issue I have is 'A' is repeated, I end up getting A:1 only (0 gets overwritten). How to fix?

You can use a dictionary comprehension on a groupby.
>>> {idx: group['col2'].tolist()
...  for idx, group in df.groupby('col1')}
{'A': [0, 1], 'B': [2], 'C': [3]}

Alternatively, collect each group into a list and convert the result with to_dict:
df.groupby('col1')['col2'].apply(lambda x: x.tolist()).to_dict()
{'A': [0, 1], 'B': [2], 'C': [3]}
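
The overwriting described in the question comes from the duplicate index labels: a dictionary can hold only one value per key, so the last row for 'A' wins. A short sketch contrasting the two approaches, with the question's table rebuilt as a DataFrame and agg(list) used as one more way to collect the groups:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'C'], 'col2': [0, 1, 2, 3]})

# Duplicate index labels collapse: the last value for 'A' overwrites the first.
flat = df.set_index('col1')['col2'].to_dict()

# Grouping first keeps every value for each key.
grouped = df.groupby('col1')['col2'].agg(list).to_dict()

print(flat)
print(grouped)
```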
