I want to get the max value grouped by 2 columns in a pandas dataframe. I managed to do this for 1 column but not for 2.
For 1 column:
import numpy as np
import pandas as pd
df = pd.DataFrame({"name": list("ABABCD"), "value": np.arange(6)})
maxes = df.groupby(["name"]).agg("max")
df["maxvalue"]=df["name"].apply(lambda x: maxes.loc[x])
>>> df
name value maxvalue
0 A 0 2
1 B 1 3
2 A 2 2
3 B 3 3
4 C 4 4
5 D 5 5
For 2 columns, I've tried this but it doesn't work:
import numpy as np
import pandas as pd
df = pd.DataFrame({"name": list("ABABCD"),"name2": list("MNOMNO"), "value": np.arange(6)})
maxes = df.groupby(["name","name2"]).agg("max")
df["maxvalue"]=df[["name","name2"]].apply(lambda x: maxes.loc[x])
How can this be done for multiple columns?
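Use transform instead of agg. transform returns a result aligned with the original rows, so the approach is exactly the same for one or two grouping columns. Selecting the "value" column before calling transform gives a Series that can be assigned directly. For two columns it will be as follows:
df["maxvalue"] = df.groupby(["name", "name2"])["value"].transform("max")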
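A minimal runnable sketch of the transform approach on the two-column frame from the question. transform("max") broadcasts each group's maximum back to every row of that group, so the result lines up with df's index; the extra column name max_by_name is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": list("ABABCD"),
                   "name2": list("MNOMNO"),
                   "value": np.arange(6)})

# One grouping column: each row gets the max "value" of its "name" group
df["max_by_name"] = df.groupby("name")["value"].transform("max")

# Two grouping columns: identical call, only the key list changes
df["maxvalue"] = df.groupby(["name", "name2"])["value"].transform("max")

print(df)
```

In this particular sample every (name, name2) pair occurs only once, so maxvalue equals value; with repeated pairs, each row would receive its group's maximum.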
I have dataframe df:
0
0 a
1 b
2 c
3 d
4 e
The output should be:
a b c d e
0
1
2
3
4
5
I want the column containing (a, b, c, d, e) to become the header of my dataframe.
Could anyone help?
If your dataframe is a pandas DataFrame named df, you can solve it with pandas alone:
First convert the contents of the initial column to a list, then create a new dataframe whose columns are defined by that list.
import pandas as pd
cols = df[0].tolist()  # df[0] is the content of the first column; avoid shadowing the built-in list
dfSolved = pd.DataFrame([], columns=cols)
You could also provide more details, such as the index and values of the expected output or the operation you want to do, so that we can give a solution specific to your case.
Here is the solution:
import pandas as pd
import io
import numpy as np
data_string = """ columns_name
0 a
1 b
2 c
3 d
4 e
"""
df = pd.read_csv(io.StringIO(data_string), sep=r'\s+')  # raw string avoids the invalid '\s' escape warning
# Solution: one row of NaN under the new header
df_result = pd.DataFrame(data=[[np.nan]*5],
                         columns=df['columns_name'].tolist())
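If you also want the empty rows labeled 0 to 5, as in the expected output, one option (assuming empty cells are acceptable as NaN) is to pass both index and columns to the DataFrame constructor. The single-column input below mirrors the question's df, with column label 0:

```python
import pandas as pd

# Single-column frame as in the question (column label 0)
df = pd.DataFrame({0: list("abcde")})

cols = df[0].tolist()  # ['a', 'b', 'c', 'd', 'e'] become the new header
# index=range(6) gives rows 0..5; every cell is NaN because no data is passed
df_result = pd.DataFrame(index=range(6), columns=cols)
print(df_result)
```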
I have a data frame like this
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[0,2,5,3], [3,1,4,2], [1,3,5,2], [5,1,3,4], [4,2,5,1], [2,3,5,1]]))
df
Now I need to take the first row of the data frame and make it another data frame, like:
row
0 0
1 2
2 5
3 3
Use DataFrame.iloc like:
df1 = df.iloc[0].to_frame('row')
For a Series:
s = df.iloc[0]
You can also use double [[ ]] and transpose it:
>>> df.iloc[[0]].T
0
0 0
1 2
2 5
3 3
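A self-contained check of the forms above on the question's data. to_frame("row") names the single column "row", while the [[0]] form keeps the original row label 0 as the column name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[0, 2, 5, 3], [3, 1, 4, 2], [1, 3, 5, 2],
                            [5, 1, 3, 4], [4, 2, 5, 1], [2, 3, 5, 1]]))

df1 = df.iloc[0].to_frame("row")  # first row as a one-column DataFrame named "row"
s = df.iloc[0]                    # first row as a Series
df2 = df.iloc[[0]].T              # [[0]] keeps a DataFrame; .T turns the row into a column
print(df1)
```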
I have this dataset
In [4]: df = pd.DataFrame({'A':[1, 2, 3, 4, 5]})
In [5]: df
Out[5]:
A
0 1
1 2
2 3
3 4
4 5
I want to add a new column to the dataset based on the previous value of each item, like this:
A  New Column
1
2  1
3  2
4  3
5  4
I tried to use apply with iloc, but it didn't work. Can you help?
Thank you
With your shown samples, please try the following. You can use the shift function to build the new column: it moves every element of the given column down by one row, leaving NaN in the first element.
import pandas as pd
df['New_Col'] = df['A'].shift()
OR
If you would like to fill the NaN with zero, try the following; the approach is the same as above.
import pandas as pd
df['New_Col'] = df['A'].shift().fillna(0)
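A runnable version on the question's data. shift() defaults to periods=1, and the result is float because NaN cannot be stored in an integer column, so the fillna(0) variant yields 0.0, 1.0, and so on. Series.shift also accepts a fill_value argument; filling at shift time keeps the integer dtype:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5]})

# Previous value of A; the first row has no predecessor, so it becomes NaN
df["New_Col"] = df["A"].shift()

# Same shift, but the missing first value is 0 and the column stays integer
df["New_Col_filled"] = df["A"].shift(fill_value=0)
print(df)
```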
Suppose I have a pandas dataframe given by
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5,2))
df
0 1
0 0.264053 -1.225456
1 0.805492 -1.072943
2 0.142433 -0.469905
3 0.758322 0.804881
4 -0.281493 0.602433
I want to return a Series object with 4 rows, containing max(df[0,0], df[1,1]), max(df[1,0], df[2,1]), max(df[2,0], df[3,1]), max(df[3,0], df[4,1]). More generally, what is the best way to take the max of column 0 and column 1 offset by n rows?
Thanks.
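You want the row-wise max after shifting the first column down by one row. DataFrame.max with skipna=False leaves the first row (whose shifted value is NaN) as NaN, so dropna() removes it:
pd.concat([df.iloc[:, 0].shift(), df.iloc[:, 1]], axis=1).max(axis=1, skipna=False).dropna()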
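For the general offset of n rows, here is a sketch that assumes n is a non-negative integer; max_with_offset is a hypothetical helper name. Slicing off the first n rows by position, instead of calling dropna(), avoids also discarding rows that were NaN in the original data:

```python
import pandas as pd

def max_with_offset(df: pd.DataFrame, n: int) -> pd.Series:
    """Row-wise max of column 0 at row i and column 1 at row i + n (hypothetical helper)."""
    shifted = df.iloc[:, 0].shift(n)  # row i + n now holds the column-0 value from row i
    pairs = pd.concat([shifted, df.iloc[:, 1]], axis=1)
    # The first n rows have no partner in column 0, so drop them by position
    return pairs.max(axis=1).iloc[n:]
```

With n=1 this pairs df[0,0] with df[1,1], df[1,0] with df[2,1], and so on, matching the question.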
In Python to check if a value is in a list you can simply do the following:
>>>9 in [1,2,3,6,9]
True
I would like to do the same for a Pandas DataFrame but unfortunately Pandas does not recognise that notation:
>>>import pandas as pd
>>>df = pd.DataFrame([[1,2,3,4],[5,6,7,8]],columns=["a","b","c","d"])
a b c d
0 1 2 3 4
1 5 6 7 8
>>>7 in df
False
How would I achieve this using Pandas DataFrame without iterating through each column/row or anything complicated?
Basically, you have to check the underlying values (the array without the column labels), so:
7 in df.values
x in df checks if x is in the columns:
for x in df:
    print(x, end=" ")
out: a b c d
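A small sketch on the question's frame contrasting label and value membership. to_numpy() is the current spelling of .values; (df == 7).any().any() and df.isin(...) are alternative vectorized checks that stay within pandas:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=["a", "b", "c", "d"])

# `in` on a DataFrame looks at the column labels, like a dict's keys
label_hit = "a" in df   # True
value_miss = 7 in df    # False: 7 is not a column label

# Checking cell values instead
in_array = 7 in df.to_numpy()                   # NumPy compares every element
via_eq = bool((df == 7).any().any())            # element-wise comparison, then reduce twice
via_isin = bool(df.isin([7, 100]).any().any())  # isin tests several values at once
```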