I have a dataframe like this:
fly_frame:
day place
0 [1,2,3,4,5] A
1 [1,2,3,4] B
2 [1,2] C
3 [1,2,3,4] D
I want to find the max value of each entry in the day column.
For example:
fly_frame:
day place
0 5 A
1 4 B
2 2 C
3 4 D
What should I do?
Thanks for your help.
df.day.apply(max)
#0 5
#1 4
#2 2
#3 4
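For a self-contained version of that one-liner, here is a minimal sketch (assuming the day column already holds Python lists):
import pandas as pd

fly_frame = pd.DataFrame({'day': [[1,2,3,4,5], [1,2,3,4], [1,2], [1,2,3,4]],
                          'place': ['A', 'B', 'C', 'D']})

# max() is called once per list in the column
fly_frame['day'] = fly_frame.day.apply(max)
print(fly_frame)
#   day place
#0    5     A
#1    4     B
#2    2     C
#3    4     D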
Use apply with max:
#if strings
#import ast
#print (type(df.loc[0, 'day']))
#<class 'str'>
#df['day'] = df['day'].apply(ast.literal_eval)
print (type(df.loc[0, 'day']))
<class 'list'>
df['day'] = df['day'].apply(max)
Or list comprehension:
df['day'] = [max(x) for x in df['day']]
print (df)
day place
0 5 A
1 4 B
2 2 C
3 4 D
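If the lists are stored as strings (e.g. after reading the frame from a CSV), a minimal end-to-end sketch of the ast.literal_eval route hinted at above:
import ast
import pandas as pd

df = pd.DataFrame({'day': ['[1,2,3,4,5]', '[1,2,3,4]', '[1,2]', '[1,2,3,4]'],
                   'place': ['A', 'B', 'C', 'D']})

df['day'] = df['day'].apply(ast.literal_eval)  # str -> list
df['day'] = df['day'].apply(max)               # list -> max value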
Try a combination of pd.concat() and df.apply() with:
import numpy as np
import pandas as pd
fly_frame = pd.DataFrame({'day':[[1,2,3,4,5],[1,2,3,4],[1,2],[1,2,3,4]],'place':['A','B','C','D']})
df = pd.concat([fly_frame['day'].apply(max),fly_frame.drop('day',axis=1)],axis=1)
print(df)
day place
0 5 A
1 4 B
2 2 C
3 4 D
Edit
You can also use df.join() with:
fly_frame.drop('day',axis=1).join(fly_frame['day'].apply(np.max))
place day
0 A 5
1 B 4
2 C 2
3 D 4
I suggest bringing your dataframe into a better format first.
>>> df
day place
0 [1, 2, 3, 4, 5] A
1 [1, 2, 3, 4] B
2 [1, 2] C
3 [1, 2, 3, 4] D
>>>
>>> df = pd.concat([df.pop('day').apply(pd.Series), df], axis=1)
>>> df
0 1 2 3 4 place
0 1.0 2.0 3.0 4.0 5.0 A
1 1.0 2.0 3.0 4.0 NaN B
2 1.0 2.0 NaN NaN NaN C
3 1.0 2.0 3.0 4.0 NaN D
Now everything is easier, for example computing the row-wise maximum of the numeric values.
>>> df.max(axis=1)
0 5.0
1 4.0
2 2.0
3 4.0
dtype: float64
Edit: renaming the index might also be useful to you.
>>> df.max(axis=1).rename(df['place'])
A 5.0
B 4.0
C 2.0
D 4.0
dtype: float64
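To reattach those row-wise maxima to the place labels, a sketch under the same setup (numeric_only=True keeps the string column out of the reduction):
>>> df[['place']].assign(day=df.max(axis=1, numeric_only=True))
  place  day
0     A  5.0
1     B  4.0
2     C  2.0
3     D  4.0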
I have a dictionary that looks like this:
d = {
"A": [1,2,3],
"B": [4]
}
When I try to create a pandas DataFrame I use:
output_df = pd.DataFrame.from_dict(d, orient='index')
Output:
   0    1    2
A  1  2.0  3.0
B  4  NaN  NaN
What I want:
A    1
A    2
A    3
B    4
Thanks for your help! :)
Try:
df.stack().swaplevel(0,1)
0  A    1.0
1  A    2.0
2  A    3.0
0  B    4.0
dtype: float64
df.stack().swaplevel(0,1).reset_index(level=[1], name='a').reset_index(drop=True)
level_1 a
0 A 1.0
1 A 2.0
2 A 3.0
3 B 4.0
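If you are starting from the dictionary anyway, a minimal sketch that builds the long format directly (assuming the d from the question):
d = {"A": [1, 2, 3], "B": [4]}

# one row per (key, value) pair, keys as the index
out = pd.DataFrame([(k, v) for k, vals in d.items() for v in vals],
                   columns=['key', 'value']).set_index('key')
print(out)
#      value
# key
# A        1
# A        2
# A        3
# B        4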
I have two data frames. I need to search through dataframe 2 to find which entries match those in dataframe 1, and replace each string with its index.
So I want a third data frame giving, for each string in dataframe 1, the index of the matching string in dataframe 2.
X = pd.DataFrame(np.array(['A','B','C','D','AA','AB','AC','AD','BA','BB','BC','AD']).reshape(4,3),columns=['a','b','c'])
a b c
0 A B C
1 D AA AB
2 AC AD BA
3 BB BC AD
Y = pd.DataFrame(np.array(['A','AA','AC','D','B','AB','C','AD','BC','BB']).reshape(10,1),columns=['X'])
X
0 A
1 AA
2 AC
3 D
4 B
5 AB
6 C
7 AD
8 BC
9 BB
Resulting dataframe:
a b c
0 0 4 6
1 3 1 5
2 2 7 NA
3 9 8 7
Someone suggested the following code, but it does not seem to work:
t = pd.merge(df1.stack().reset_index(), df2.reset_index(), left_on = 0, right_on = "0")
res = t.set_index(["level_0", "level_1"]).drop([0, "0"], axis=1).unstack()
print(res)
Use apply with map:
# build a lookup Series: string -> its index in Y
Y = Y.reset_index().set_index('X')['index']
# map every column of X through the lookup
X = X.apply(lambda x: x.map(Y))
print(X)
a b c
0 0 4 6.0
1 3 1 5.0
2 2 7 NaN
3 9 8 7.0
Step 1: create a mapping from Y:
mapping = {value: key for key, value in Y.T.to_dict("records")[0].items()}
mapping
{'A': 0,
'AA': 1,
'AC': 2,
'D': 3,
'B': 4,
'AB': 5,
'C': 6,
'AD': 7,
'BC': 8,
'BB': 9}
Step 2: stack X, map the mapping onto the stacked dataframe, and unstack to get back the original shape:
X.stack().map(mapping).unstack()
a b c
0 0.0 4.0 6.0
1 3.0 1.0 5.0
2 2.0 7.0 NaN
3 9.0 8.0 7.0
Alternatively, you can avoid the stack/unstack step and use replace with pd.to_numeric:
X.replace(mapping).apply(pd.to_numeric, errors="coerce")
No tests done, just my gut feeling that mapping should be faster.
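If you want to measure instead of guessing, a hypothetical micro-benchmark sketch with timeit (using the X, Y and mapping objects defined above; numbers will vary by machine and pandas version):
import timeit

n = 1000
t_map = timeit.timeit("X.stack().map(mapping).unstack()",
                      globals=globals(), number=n)
t_rep = timeit.timeit("X.replace(mapping).apply(pd.to_numeric, errors='coerce')",
                      globals=globals(), number=n)
print(f"stack/map/unstack: {t_map:.3f}s  replace/to_numeric: {t_rep:.3f}s")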
Short solution based on applymap:
X.applymap(lambda x: Y[Y.X==x].index.max())
result:
a b c
0 0 4 6.0
1 3 1 5.0
2 2 7 NaN
3 9 8 7.0
Y = pd.Series(Y.index, index=Y.X).sort_index()
will give you a more easily searchable object... then something like
flat = X.to_numpy().flatten()
Y = Y.reindex(np.unique(flat))  # all items need to be in the index to be able to use loc[list]
res = pd.DataFrame(Y.loc[flat].to_numpy().reshape(X.shape), columns=X.columns)
Let us do the following (the where/isin step first turns strings missing from Y into NaN, so replace only has to map known values):
X = X.where(X.isin(Y.X.tolist())).replace(dict(zip(Y.X,Y.index)))
Out[15]:
a b c
0 0 4 6.0
1 3 1 5.0
2 2 7 NaN
3 9 8 7.0
I have a minimal working example (MWE) that can be reproduced with the following code:
import pandas as pd
a = pd.DataFrame([[1,2],[3,4]], columns=['A', 'B'])
b = pd.DataFrame([[True,False],[False,True]], columns=['A', 'B'])
Which creates the following dataframes:
In [8]: a
Out[8]:
A B
0 1 2
1 3 4
In [9]: b
Out[9]:
A B
0 True False
1 False True
My question is: how can I change the values in dataframe a based on the boolean values in dataframe b?
Say, for example, I want to put NaN in dataframe a wherever dataframe b has False.
If you need to replace False with NaN:
print (a[b])
A B
0 1.0 NaN
1 NaN 4.0
or:
print (a.where(b))
A B
0 1.0 NaN
1 NaN 4.0
And if you need to replace True with NaN:
print (a[~b])
A B
0 NaN 2.0
1 3.0 NaN
or:
print (a.mask(b))
A B
0 NaN 2.0
1 3.0 NaN
You can also use where or mask with a scalar value:
print (a.where(b, 7))
A B
0 1 7
1 7 4
print (a.mask(b, 7))
A B
0 7 2
1 3 7
print (a.where(b, 'TEST'))
A B
0 1 TEST
1 TEST 4
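If you prefer to work at the NumPy level, the same masking can be sketched with np.where (assuming the a and b frames from the question):
import numpy as np

# keep a where b is True, NaN elsewhere (like a.where(b))
print(pd.DataFrame(np.where(b, a, np.nan), index=a.index, columns=a.columns))
     A    B
0  1.0  NaN
1  NaN  4.0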
I want to append l to column 'A', but it creates a new column and appends l to that one instead. Why is this happening, and how can I do what I want?
import pandas as pd
l=[1,2,3]
df = pd.DataFrame(columns =['A'])
df = df.append(l, ignore_index=True)
df = df.append(l, ignore_index=True)
print(df)
A 0
0 NaN 1.0
1 NaN 2.0
2 NaN 3.0
3 NaN 1.0
4 NaN 2.0
5 NaN 3.0
Edited
Is this what you want to do:
In[6]:df=df.A.append(pd.Series(l)).reset_index().drop('index',axis=1).rename(columns={0:'A'})
In[7]:df
Out[7]:
A
0 1
1 2
2 3
Then you can append a list of any length.
Suppose:
a=[9,8,7,6,5]
In[11]:df=df.A.append(pd.Series(a)).reset_index().drop('index',axis=1).rename(columns={0:'A'})
In[12]:df
Out[12]:
A
0 1
1 2
2 3
3 9
4 8
5 7
6 6
7 5
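Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the same idea can be sketched with pd.concat:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
a = [9, 8, 7, 6, 5]

# wrap the new values in a one-column frame, then concatenate
df = pd.concat([df, pd.DataFrame({'A': a})], ignore_index=True)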
Previously
Are you looking for this:
df=pd.DataFrame(l,columns=['A'])
df
Out[5]:
A
0 1
1 2
2 3
You can just pass a dictionary to the DataFrame constructor, if I understand your question correctly.
l = [1,2,3]
df = pd.DataFrame({'A': l})
df
A
0 1
1 2
2 3
I have a df that looks like this:
group val
A 1
A 1
A 2
B 1
B 2
B 3
I want to get the value_counts for each group separately, but with all possible values shown in each result:
> df[df['group']=='A']['val'].value_counts()
1 2
2 1
3 NaN
Name: val, dtype: int64
But it currently looks like this:
> df[df['group']=='A']['val'].value_counts()
1 2
2 1
Name: val, dtype: int64
Does anyone know a way to show value_counts with all possible values represented?
In [185]: df.groupby('group')['val'].value_counts().unstack('group')
Out[185]:
group A B
val
1 2.0 1.0
2 1.0 1.0
3 NaN 1.0
In [186]: df.groupby('group')['val'].value_counts().unstack('group')['A']
Out[186]:
val
1 2.0
2 1.0
3 NaN
Name: A, dtype: float64
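If you only need one group at a time, a minimal sketch with reindex (assuming the df from the question; the exact Series naming varies across pandas versions):
df[df['group']=='A']['val'].value_counts().reindex(sorted(df['val'].unique()))
1    2.0
2    1.0
3    NaN
Name: val, dtype: float64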
This works:
from io import StringIO
import pandas as pd
import numpy as np
data = StringIO("""group,val
A,1
A,1
A,2
B,1
B,2
B,3""")
df = pd.read_csv(data)
print(df, '\n')
res_idx = pd.MultiIndex.from_product([df['group'].unique(), df['val'].unique()])
res = pd.concat([pd.DataFrame(index=res_idx),
                 df.groupby('group').apply(lambda x: x['val'].value_counts())],
                axis=1)
print(res)
Produces:
group val
0 A 1
1 A 1
2 A 2
3 B 1
4 B 2
5 B 3
     val
A 1  2.0
  2  1.0
  3  NaN
B 1  1.0
  2  1.0
  3  1.0
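To pull a single group's counts back out of res, a short usage sketch:
print(res.loc['A'])
   val
1  2.0
2  1.0
3  NaN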