I have a dataframe like this:
fly_frame:
day place
0 [1,2,3,4,5] A
1 [1,2,3,4] B
2 [1,2] C
3 [1,2,3,4] D
I want to find the max value of each entry in the day column.
For example:
fly_frame:
day place
0 5 A
1 4 B
2 2 C
3 4 D
What should I do?
Thanks for your help.
df.day.apply(max)
#0 5
#1 4
#2 2
#3 4
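For a self-contained version of that one-liner, here is a minimal sketch (assuming the day column already holds Python lists):
import pandas as pd

fly_frame = pd.DataFrame({'day': [[1,2,3,4,5], [1,2,3,4], [1,2], [1,2,3,4]],
                          'place': ['A', 'B', 'C', 'D']})

# max() is called once per list in the column
fly_frame['day'] = fly_frame.day.apply(max)
print(fly_frame)
#   day place
#0    5     A
#1    4     B
#2    2     C
#3    4     D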
Use apply with max:
#if strings
#import ast
#print (type(df.loc[0, 'day']))
#<class 'str'>
#df['day'] = df['day'].apply(ast.literal_eval)
print (type(df.loc[0, 'day']))
<class 'list'>
df['day'] = df['day'].apply(max)
Or list comprehension:
df['day'] = [max(x) for x in df['day']]
print (df)
day place
0 5 A
1 4 B
2 2 C
3 4 D
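If the lists are stored as strings (e.g. after reading the frame from a CSV), a minimal end-to-end sketch of the ast.literal_eval route hinted at above:
import ast
import pandas as pd

df = pd.DataFrame({'day': ['[1,2,3,4,5]', '[1,2,3,4]', '[1,2]', '[1,2,3,4]'],
                   'place': ['A', 'B', 'C', 'D']})

df['day'] = df['day'].apply(ast.literal_eval)  # str -> list
df['day'] = df['day'].apply(max)               # list -> max value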
Try a combination of pd.concat() and df.apply() with:
import numpy as np
import pandas as pd
fly_frame = pd.DataFrame({'day':[[1,2,3,4,5],[1,2,3,4],[1,2],[1,2,3,4]],'place':['A','B','C','D']})
df = pd.concat([fly_frame['day'].apply(max),fly_frame.drop('day',axis=1)],axis=1)
print(df)
day place
0 5 A
1 4 B
2 2 C
3 4 D
Edit
You can also use df.join() with:
fly_frame.drop('day',axis=1).join(fly_frame['day'].apply(np.max))
place day
0 A 5
1 B 4
2 C 2
3 D 4
I suggest bringing your dataframe into a better format first.
>>> df
day place
0 [1, 2, 3, 4, 5] A
1 [1, 2, 3, 4] B
2 [1, 2] C
3 [1, 2, 3, 4] D
>>>
>>> df = pd.concat([df.pop('day').apply(pd.Series), df], axis=1)
>>> df
0 1 2 3 4 place
0 1.0 2.0 3.0 4.0 5.0 A
1 1.0 2.0 3.0 4.0 NaN B
2 1.0 2.0 NaN NaN NaN C
3 1.0 2.0 3.0 4.0 NaN D
Now everything is easier, for example computing the row-wise maximum of the numeric values.
>>> df.max(axis=1)
0 5.0
1 4.0
2 2.0
3 4.0
dtype: float64
Edit: renaming the index might also be useful to you.
>>> df.max(axis=1).rename(df['place'])
A 5.0
B 4.0
C 2.0
D 4.0
dtype: float64
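To reattach those row-wise maxima to the place labels, a sketch under the same setup (numeric_only=True keeps the string column out of the reduction):
>>> df[['place']].assign(day=df.max(axis=1, numeric_only=True))
  place  day
0     A  5.0
1     B  4.0
2     C  2.0
3     D  4.0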
I have a dictionary that looks like this:
d = {
"A": [1,2,3],
"B": [4]
}
When I try to create a pandas DataFrame I use:
output_df = pd.DataFrame.from_dict(d, orient='index')
Output:
   0    1    2
A  1  2.0  3.0
B  4  NaN  NaN
What I want:
A    1
A    2
A    3
B    4
Thanks for your help! :)
Try:
df.stack().swaplevel(0,1)
0  A    1.0
1  A    2.0
2  A    3.0
0  B    4.0
dtype: float64
df.stack().swaplevel(0,1).reset_index(level=[1], name='a').reset_index(drop=True)
level_1 a
0 A 1.0
1 A 2.0
2 A 3.0
3 B 4.0
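If you are starting from the dictionary anyway, a minimal sketch that builds the long format directly (assuming the d from the question):
d = {"A": [1, 2, 3], "B": [4]}

# one row per (key, value) pair, keys as the index
out = pd.DataFrame([(k, v) for k, vals in d.items() for v in vals],
                   columns=['key', 'value']).set_index('key')
print(out)
#      value
# key
# A        1
# A        2
# A        3
# B        4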
I have two data frames. I need to search through dataframe 2 to find which entries match those in dataframe 1, and replace each string with its index.
So I want a third data frame giving, for each string in dataframe 1, the index of the matching string in dataframe 2.
X = pd.DataFrame(np.array(['A','B','C','D','AA','AB','AC','AD','BA','BB','BC','AD']).reshape(4,3),columns=['a','b','c'])
a b c
0 A B C
1 D AA AB
2 AC AD BA
3 BB BC AD
Y = pd.DataFrame(np.array(['A','AA','AC','D','B','AB','C','AD','BC','BB']).reshape(10,1),columns=['X'])
X
0 A
1 AA
2 AC
3 D
4 B
5 AB
6 C
7 AD
8 BC
9 BB
Resulting dataframe:
a b c
0 0 4 6
1 3 1 5
2 2 7 NA
3 9 8 7
Someone suggested the following code, but it does not seem to work:
t = pd.merge(df1.stack().reset_index(), df2.reset_index(), left_on = 0, right_on = "0")
res = t.set_index(["level_0", "level_1"]).drop([0, "0"], axis=1).unstack()
print(res)
Use apply with map:
# build a lookup Series: string -> its index in Y
Y = Y.reset_index().set_index('X')['index']
# map every column of X through the lookup
X = X.apply(lambda x: x.map(Y))
print(X)
a b c
0 0 4 6.0
1 3 1 5.0
2 2 7 NaN
3 9 8 7.0
Step 1: create a mapping from Y:
mapping = {value: key for key, value in Y.T.to_dict("records")[0].items()}
mapping
{'A': 0,
'AA': 1,
'AC': 2,
'D': 3,
'B': 4,
'AB': 5,
'C': 6,
'AD': 7,
'BC': 8,
'BB': 9}
Step 2: stack X, map the mapping onto the stacked dataframe, and unstack to get back the original shape:
X.stack().map(mapping).unstack()
a b c
0 0.0 4.0 6.0
1 3.0 1.0 5.0
2 2.0 7.0 NaN
3 9.0 8.0 7.0
Alternatively, you can avoid the stack/unstack step and use replace with pd.to_numeric:
X.replace(mapping).apply(pd.to_numeric, errors="coerce")
No tests done, just my gut feeling that mapping should be faster.
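If you want to measure instead of guessing, a hypothetical micro-benchmark sketch with timeit (using the X, Y and mapping objects defined above; numbers will vary by machine and pandas version):
import timeit

n = 1000
t_map = timeit.timeit("X.stack().map(mapping).unstack()",
                      globals=globals(), number=n)
t_rep = timeit.timeit("X.replace(mapping).apply(pd.to_numeric, errors='coerce')",
                      globals=globals(), number=n)
print(f"stack/map/unstack: {t_map:.3f}s  replace/to_numeric: {t_rep:.3f}s")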
Short solution based on applymap:
X.applymap(lambda x: Y[Y.X==x].index.max())
result:
a b c
0 0 4 6.0
1 3 1 5.0
2 2 7 NaN
3 9 8 7.0
Y = pd.Series(Y.index, index=Y.X).sort_index()
will give you a more easily searchable object... then something like
flat = X.to_numpy().flatten()
Y = Y.reindex(np.unique(flat))  # all items need to be in the index to be able to use loc[list]
res = pd.DataFrame(Y.loc[flat].to_numpy().reshape(X.shape), columns=X.columns)
Let us do the following (the where/isin step first turns strings missing from Y into NaN, so replace only has to map known values):
X = X.where(X.isin(Y.X.tolist())).replace(dict(zip(Y.X,Y.index)))
Out[15]:
a b c
0 0 4 6.0
1 3 1 5.0
2 2 7 NaN
3 9 8 7.0
I have a minimal working example (MWE) that can be reproduced with the following code:
import pandas as pd
a = pd.DataFrame([[1,2],[3,4]], columns=['A', 'B'])
b = pd.DataFrame([[True,False],[False,True]], columns=['A', 'B'])
Which creates the following dataframes:
In [8]: a
Out[8]:
A B
0 1 2
1 3 4
In [9]: b
Out[9]:
A B
0 True False
1 False True
My question is: how can I change the values in dataframe a based on the boolean values in dataframe b?
Say, for example, I want to put NaN in dataframe a wherever dataframe b has False.
If you need to replace False with NaN:
print (a[b])
A B
0 1.0 NaN
1 NaN 4.0
or:
print (a.where(b))
A B
0 1.0 NaN
1 NaN 4.0
And if you need to replace True with NaN:
print (a[~b])
A B
0 NaN 2.0
1 3.0 NaN
or:
print (a.mask(b))
A B
0 NaN 2.0
1 3.0 NaN
You can also use where or mask with a scalar value:
print (a.where(b, 7))
A B
0 1 7
1 7 4
print (a.mask(b, 7))
A B
0 7 2
1 3 7
print (a.where(b, 'TEST'))
A B
0 1 TEST
1 TEST 4
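If you prefer to work at the NumPy level, the same masking can be sketched with np.where (assuming the a and b frames from the question):
import numpy as np

# keep a where b is True, NaN elsewhere (like a.where(b))
print(pd.DataFrame(np.where(b, a, np.nan), index=a.index, columns=a.columns))
     A    B
0  1.0  NaN
1  NaN  4.0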
I want to append l to column 'A', but it creates a new column and appends l to that one instead. Why is this happening, and how can I do what I want?
import pandas as pd
l=[1,2,3]
df = pd.DataFrame(columns =['A'])
df = df.append(l, ignore_index=True)
df = df.append(l, ignore_index=True)
print(df)
A 0
0 NaN 1.0
1 NaN 2.0
2 NaN 3.0
3 NaN 1.0
4 NaN 2.0
5 NaN 3.0
Edited
Is this what you want to do:
In[6]:df=df.A.append(pd.Series(l)).reset_index().drop('index',axis=1).rename(columns={0:'A'})
In[7]:df
Out[7]:
A
0 1
1 2
2 3
Then you can append a list of any length.
Suppose:
a=[9,8,7,6,5]
In[11]:df=df.A.append(pd.Series(a)).reset_index().drop('index',axis=1).rename(columns={0:'A'})
In[12]:df
Out[12]:
A
0 1
1 2
2 3
3 9
4 8
5 7
6 6
7 5
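Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the same idea can be sketched with pd.concat:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
a = [9, 8, 7, 6, 5]

# wrap the new values in a one-column frame, then concatenate
df = pd.concat([df, pd.DataFrame({'A': a})], ignore_index=True)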
Previously
Are you looking for this:
df=pd.DataFrame(l,columns=['A'])
df
Out[5]:
A
0 1
1 2
2 3
You can just pass a dictionary to the DataFrame constructor, if I understand your question correctly.
l = [1,2,3]
df = pd.DataFrame({'A': l})
df
A
0 1
1 2
2 3
I have a df that looks like this:
group val
A 1
A 1
A 2
B 1
B 2
B 3
I want to get the value_counts for each group separately, but with all possible values shown in each result:
> df[df['group']=='A']['val'].value_counts()
1 2
2 1
3 NaN
Name: val, dtype: int64
But it currently looks like this:
> df[df['group']=='A']['val'].value_counts()
1 2
2 1
Name: val, dtype: int64
Does anyone know a way to show value_counts with all possible values represented?
In [185]: df.groupby('group')['val'].value_counts().unstack('group')
Out[185]:
group A B
val
1 2.0 1.0
2 1.0 1.0
3 NaN 1.0
In [186]: df.groupby('group')['val'].value_counts().unstack('group')['A']
Out[186]:
val
1 2.0
2 1.0
3 NaN
Name: A, dtype: float64
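If you only need one group at a time, a minimal sketch with reindex (assuming the df from the question; the exact Series naming varies across pandas versions):
df[df['group']=='A']['val'].value_counts().reindex(sorted(df['val'].unique()))
1    2.0
2    1.0
3    NaN
Name: val, dtype: float64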
This works:
from io import StringIO
import pandas as pd
import numpy as np
data = StringIO("""group,val
A,1
A,1
A,2
B,1
B,2
B,3""")
df = pd.read_csv(data)
print(df, '\n')
res_idx = pd.MultiIndex.from_product([df['group'].unique(), df['val'].unique()])
res = pd.concat([pd.DataFrame(index=res_idx),
                 df.groupby('group').apply(lambda x: x['val'].value_counts())],
                axis=1)
print(res)
Produces:
group val
0 A 1
1 A 1
2 A 2
3 B 1
4 B 2
5 B 3
     val
A 1  2.0
  2  1.0
  3  NaN
B 1  1.0
  2  1.0
  3  1.0
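To pull a single group's counts back out of res, a short usage sketch:
print(res.loc['A'])
   val
1  2.0
2  1.0
3  NaN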