How Do You Check if an Entry Is in Pandas DataFrame? - python

How do I check if an entry is in a pandas DataFrame? For example, given:
import pandas as pd
df = pd.DataFrame({'A' : [1,2,3], 'B' : [4,5,6], 'C' : [7,8,9]})
How do I check whether a row with the values 1 and 4 in columns A and B exists, regardless of what is in C?

You can pass isin a dictionary that maps each column to the values to search for in it:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
result = df.isin({'A': [1], 'B': [4]})
print(result)
Output
A B
0 True True
1 False False
2 False False
Then use all(1) (that is, along axis 1) to see which rows match in every column:
result = df.isin({'A': [1], 'B': [4]}).all(1)
print(result)
Output
0 True
1 False
2 False
dtype: bool
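To use it in an if statement, restrict the check to columns ['A', 'B'] (isin marks any column missing from the dictionary, such as C, as False) and reduce the result with any, for example: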
if df[['A', 'B']].isin({'A': [1], 'B': [4]}).all(1).any():
    print('found')
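A self-contained sketch of the same check on the original three-column frame, next to a plain boolean mask (the mask is an alternative added here, not part of the answer above). One caveat: isin tests each column independently, so with several values per column it can report a match for a pair that is not in the frame. The last part shows this, using an inner merge on both columns as one way to match whole pairs; the pairs frame is hypothetical example data.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# isin version: select only A and B so column C is ignored
found_isin = bool(df[['A', 'B']].isin({'A': [1], 'B': [4]}).all(axis=1).any())

# Boolean-mask version: checks one exact (A, B) pair row by row
found_mask = bool(((df['A'] == 1) & (df['B'] == 4)).any())

# Caveat: searching for the pairs (1, 5) or (2, 4) with isin still "finds"
# rows (1, 4) and (2, 5), because each value appears in its own column list.
pairs = pd.DataFrame({'A': [1, 2], 'B': [5, 4]})
false_positive = bool(df[['A', 'B']].isin({'A': [1, 2], 'B': [5, 4]}).all(axis=1).any())

# An inner merge on both columns only keeps rows whose whole pair matches
exact = not df.merge(pairs, on=['A', 'B']).empty

print(found_isin, found_mask, false_positive, exact)  # True True True False
```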

Related

How to fill column with condition in polars

I would like to add a new column whose values come from other columns, depending on a condition.
In pandas, I do it like this:
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df['c'] = df['a']
df.loc[df['b']==4, 'c'] = df['b']
The result is:
a  b  c
1  3  1
2  4  4
Could you teach me how to do this with polars?
Use when/then/otherwise:
import polars as pl
df = pl.DataFrame({'a': [1, 2], 'b': [3, 4]})
# with_columns returns a new DataFrame, so assign the result back
df = df.with_columns(
    pl.when(pl.col('b') == 4).then(pl.col('b')).otherwise(pl.col('a')).alias('c')
)
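For comparison with the pandas code in the question, the same either/or rule can be written in pandas as one vectorized step with Series.where, which keeps values where the condition is True and takes them from another Series where it is False. This is a sketch of the pandas side only, not part of the polars answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Keep b where b == 4, otherwise fall back to a
# (the same logic as when(b == 4).then(b).otherwise(a) in polars)
df['c'] = df['b'].where(df['b'] == 4, df['a'])
print(df)
```

Column c ends up as [1, 4], matching the table in the question.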

Filling column of dataframe based on 'groups' of values of another column

I am trying to fill values of a column based on the value of another column. Suppose I have the following dataframe:
import numpy as np
import pandas as pd
data = {'A': [4, 4, 5, 6],
        'B': ['a', np.nan, np.nan, 'd']}
df = pd.DataFrame(data)
I would like to fill the missing values in column B, but only from rows that share the same value in column A (here, the two rows where A equals 4). In other words, rows with the same value in column A should end up with the same value in column B.
Thus, the desired output is:
data = {'A': [4, 4, 5, 6],
        'B': ['a', 'a', np.nan, 'd']}
df = pd.DataFrame(data)
I am aware of the fillna method, but a plain forward fill gives the wrong output, because the third row (where A is 5) also gets the value 'a':
df['B'] = df['B'].ffill()
which gives the equivalent of:
data = {'A': [4, 4, 5, 6],
        'B': ['a', 'a', 'a', 'd']}
df = pd.DataFrame(data)
How can I get the desired output?
Try this:
df['B'] = df.groupby('A')['B'].ffill()
Output:
>>> df
A B
0 4 a
1 4 a
2 5 NaN
3 6 d
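A self-contained version of the answer, including the numpy import the question's snippet needs. groupby(...).ffill() only carries values forward within a group, so a group whose first row is missing stays missing. The second half is a hedged variant, assuming that case can occur in your data, which also back-fills inside each group via transform.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [4, 4, 5, 6], 'B': ['a', np.nan, np.nan, 'd']})

# Forward-fill B inside each group of equal A values only,
# so the 'a' from the A == 4 rows never leaks into the A == 5 row.
df['B'] = df.groupby('A')['B'].ffill()
print(df)

# Variant: if a group can start with a missing value, also back-fill within it.
df2 = pd.DataFrame({'A': [4, 4, 5], 'B': [np.nan, 'a', np.nan]})
df2['B'] = df2.groupby('A')['B'].transform(lambda s: s.ffill().bfill())
print(df2)
```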

How to convert Pandas data frame to dict with values in a list

I have a huge pandas DataFrame with a structure like the following example:
import pandas as pd
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'C', 'C', 'C'], 'col2': [1, 2, 5, 2, 4, 6]})
df
col1 col2
0 A 1
1 A 2
2 B 5
3 C 2
4 C 4
5 C 6
The task is to build a dictionary with elements in col1 as keys and corresponding elements in col2 as values. For the example above the output should be:
A -> [1, 2]
B -> [5]
C -> [2, 4, 6]
I have written this solution:
from collections import defaultdict
dd = defaultdict(list)  # a set has no append, so collect into lists
for row in df.itertuples():
    dd[row.col1].append(row.col2)
but I wonder whether there is a more idiomatic solution using built-in pandas functions.
Without apply, you can loop over the groups in a dictionary comprehension:
{x : y.tolist() for x , y in df.col2.groupby(df.col1)}
{'A': [1, 2], 'B': [5], 'C': [2, 4, 6]}
Use GroupBy.apply with list to get a Series of lists, then convert it with Series.to_dict:
d = df.groupby('col1')['col2'].apply(list).to_dict()
print(d)
{'A': [1, 2], 'B': [5], 'C': [2, 4, 6]}
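A runnable comparison of the approaches in this question: the asker's loop (with defaultdict(list), since a set has no append), the dictionary comprehension, and GroupBy.agg(list), which is another way of spelling apply(list). All three should produce the same dictionary.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'C', 'C', 'C'],
                   'col2': [1, 2, 5, 2, 4, 6]})

# 1. Plain loop: defaultdict(list) creates an empty list for each new key
loop = defaultdict(list)
for row in df.itertuples():
    loop[row.col1].append(row.col2)

# 2. Dictionary comprehension over the groups of col2, keyed by col1
comprehension = {key: grp.tolist() for key, grp in df.col2.groupby(df.col1)}

# 3. Built-in aggregation: collect each group into a list, then convert
aggregated = df.groupby('col1')['col2'].agg(list).to_dict()

print(aggregated)
```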

How to cast a Pandas Dataframe column into 'list' type?

Following the Pandas explode() method documentation, one can explode a list in a row into multiple rows:
df = pd.DataFrame({'A': [[1, 2, 3], 'foo', None, [3, 4]], 'B': 1})
df.explode('A')
However, the DataFrame I get from a database contains lists stored as strings:
df = pd.DataFrame({'A': ["[1, 2, 3]", 'foo', None, "[3, 4]"], 'B': 1})
df.explode('A')
# Does not fail, but does not explode "[1, 2, 3]"
Outside Pandas, I use ast.literal_eval() but don't know how to make it work for my column.
How can I convert column 'A' to lists so that explode('A') works?
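You could try Series.apply with ast.literal_eval, keeping the original value whenever it cannot be parsed:
import ast
def f(x):
    try:
        return ast.literal_eval(x)
    except (ValueError, SyntaxError):
        # 'foo' and None are not valid literals, so return them unchanged
        return x
df['A'] = df['A'].apply(f)
df.explode('A')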
A B
0 1 1
0 2 1
0 3 1
1 foo 1
2 None 1
3 3 1
3 4 1
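A self-contained sketch of the same idea, including the ast import. It narrows the parsing slightly, as an assumption about the data: only strings are parsed, and only results that are actually lists replace the original value, so a string like "42" is not silently turned into an integer. parse_list is a hypothetical helper name.

```python
import ast

import pandas as pd


def parse_list(value):
    """Turn a string like "[1, 2, 3]" into a real list; leave everything else as-is."""
    if isinstance(value, str):
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value  # e.g. 'foo' is not a Python literal
        if isinstance(parsed, list):
            return parsed
    return value


df = pd.DataFrame({'A': ["[1, 2, 3]", 'foo', None, "[3, 4]"], 'B': 1})
df['A'] = df['A'].map(parse_list)

# Each list element gets its own row; scalars such as 'foo' and None are kept as-is
exploded = df.explode('A')
print(exploded)
```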

Pandas: Convert dataframe to dict of lists

I have a dataframe like this:
col1  col2
A     0
A     1
B     2
C     3
I would like to get this:
{ A: [0,1], B: [2], C: [3] }
I tried:
df.set_index('col1')['col2'].to_dict()
but that is not quite right: because 'A' is repeated, I only get A: 1 (the 0 is overwritten). How can I fix this?
You can use a dictionary comprehension on a groupby.
>>> {idx: group['col2'].tolist()
...  for idx, group in df.groupby('col1')}
{'A': [0, 1], 'B': [2], 'C': [3]}
Alternatively, as a single chain:
df.groupby('col1')['col2'].apply(lambda x: x.tolist()).to_dict()
{'A': [0, 1], 'B': [2], 'C': [3]}
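A short sketch of why the attempt in the question loses values and how grouping avoids it: a dict holds each key only once, so Series.to_dict on an index with a repeated 'A' keeps only the last value, while grouping first collects every value per key.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'C'], 'col2': [0, 1, 2, 3]})

# Repeated index labels: later pairs overwrite earlier ones in the dict
overwritten = df.set_index('col1')['col2'].to_dict()
print(overwritten)  # {'A': 1, 'B': 2, 'C': 3}

# Grouping first gives one list per key, so nothing is lost
grouped = {key: grp['col2'].tolist() for key, grp in df.groupby('col1')}
print(grouped)  # {'A': [0, 1], 'B': [2], 'C': [3]}
```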
