Adding a new column where some values are manipulated [duplicate] - python

This question already has answers here:
Pandas conditional creation of a series/dataframe column
(13 answers)
Closed 3 years ago.
I have a dataframe where, say, one column is filled with dates and a second column is filled with Ages. I want to add a third column which looks at the Ages column and multiplies the value by 2 if the value in the row is < 20, else just puts the Age in that row. The lambda function below multiplies every Age by 2.
def fun(df):
    change = df.loc[:, "AGE"].apply(lambda x: x * 2 if x < 20 else x)
    df.insert(2, "NEW_AGE", change)
    return df

Use pandas.Series.where:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(15, 25), columns=['AGE'])
df['AGE'].where(df['AGE'] >= 20, df['AGE'] * 2)
Output:
0 30
1 32
2 34
3 36
4 38
5 20
6 21
7 22
8 23
9 24
Name: AGE, dtype: int64
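As a sketch of a common alternative (assuming the same single-column frame as above), numpy.where evaluates the condition once for the whole column and picks values from two arrays, then the result can be assigned as the new column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(15, 25), columns=['AGE'])

# Double the age where it is below 20, otherwise keep it unchanged.
df['NEW_AGE'] = np.where(df['AGE'] < 20, df['AGE'] * 2, df['AGE'])

print(df)
```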

Related

Pandas - find second largest value in each row [duplicate]

This question already has answers here:
How do I obtain the second highest value in a row?
(3 answers)
Closed 10 months ago.
Good morning! I have a three-column dataframe and need to find the second largest value in each row.
DATA=pd.DataFrame({"A":[10,11,4,5],"B":[23,8,3,4],"C":[12,7,11,9]})
A B C
0 10 23 12
1 11 8 7
2 4 3 11
3 5 4 9
I tried using nlargest but it seems to be column based and can't find a pandas solution for this problem. Thank you in advance!
import pandas as pd
df=pd.DataFrame({"A":[10,11,4,5],"B":[23,8,3,4],"C":[12,7,11,9]})
# find the second largest value for each row
df['largest2'] = df.apply(lambda x: x.nlargest(2).iloc[1], axis=1)
print(df.head())
result:
A B C largest2
0 10 23 12 12
1 11 8 7 8
2 4 3 11 4
3 5 4 9 5
In A Python List
mylist = [1, 2, 8, 3, 12]
print(sorted(mylist, reverse=True)[1])
In A Python Pandas List
import pandas as pd
df=pd.DataFrame({"A":[10,11,4,5],"B":[23,8,3,4],"C":[12,7,11,9]})
print(sorted(df['A'].nlargest(4))[2])
print(sorted(df['B'].nlargest(4))[2])
print(sorted(df['C'].nlargest(4))[2])
In A Python Pandas List mk.2
import pandas as pd
df=pd.DataFrame({"A":[10,11,4,5],"B":[23,8,3,4],"C":[12,7,11,9]})
num_of_rows = len(df.index)
second_highest = num_of_rows - 2
print(sorted(df['A'].nlargest(num_of_rows))[second_highest])
print(sorted(df['B'].nlargest(num_of_rows))[second_highest])
print(sorted(df['C'].nlargest(num_of_rows))[second_highest])
In A Python Pandas List mk.3
import pandas as pd
df=pd.DataFrame({"A":[10,11,4,5],"B":[23,8,3,4],"C":[12,7,11,9]})
col_names = df.columns
num_of_rows = len(df.index)
second_highest = num_of_rows - 2
for col_name in col_names:
    print(sorted(df[col_name].nlargest(num_of_rows))[second_highest])
In A Python Pandas List mk.4
import pandas as pd
df=pd.DataFrame({"A":[10,11,4,5],"B":[23,8,3,4],"C":[12,7,11,9]})
top_n = len(df.columns)
pd.DataFrame({n: df.T[col].nlargest(top_n).index.tolist()
              for n, col in enumerate(df.T)}).T
df.apply(pd.Series.nlargest, axis=1, n=2)
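Applying a Python lambda per row gets slow on wide frames; a sketch of a vectorized alternative sorts the underlying NumPy array along each row once and takes the next-to-last column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 11, 4, 5], "B": [23, 8, 3, 4], "C": [12, 7, 11, 9]})

# Sort each row ascending, then take the next-to-last value: the second largest.
df['largest2'] = np.sort(df.to_numpy(), axis=1)[:, -2]

print(df)
```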

How to add a new pandas column whose value is conditioned on one column, but value depends on other columns? [duplicate]

This question already has answers here:
Pandas conditional creation of a series/dataframe column
(13 answers)
Closed 1 year ago.
I have a dataframe that looks like this:
idx group valA valB
-----------------------
0 A 10 5
1 A 22 7
2 B 9 0
3 B 6 1
I want to add a new column 'val' that takes 'valA' if group = 'A' and takes 'valB' if group = 'B'.
idx group valA valB val
---------------------------
0 A 10 5 10
1 A 22 7 22
2 B 9 0 0
3 B 6 1 1
How can I do this?
This should do the trick:
df['val'] = df.apply(lambda x: x['valA'] if x['group'] == 'A' else x['valB'], axis=1)
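When there are only two groups, a vectorized sketch with numpy.where (assuming the same column names as the question) avoids the row-wise apply entirely:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'B', 'B'],
                   'valA': [10, 22, 9, 6],
                   'valB': [5, 7, 0, 1]})

# Pick valA where group is 'A', otherwise valB.
df['val'] = np.where(df['group'] == 'A', df['valA'], df['valB'])

print(df)
```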

Adding a entire column data below the other column in pandas [duplicate]

This question already has answers here:
Convert columns into rows with Pandas
(6 answers)
Closed 2 years ago.
I have a dataframe like this:
time a b
0 10 20
1 11 21
Now I need a dataframe like this:
time a
0 10
1 11
0 20
1 21
This can be done with melt. Note that recent pandas versions reject a value_name that matches an existing column, so melt to a temporary name and rename:
df.melt('time', value_name='val').drop('variable', axis=1).rename(columns={'val': 'a'})
Output:
time a
0 0 10
1 1 11
2 0 20
3 1 21
Or, if you have columns other than a and b in your data, pass the value columns explicitly:
df.melt('time', ['a', 'b'], value_name='val').drop('variable', axis=1).rename(columns={'val': 'a'})
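An equivalent sketch with pandas.concat stacks the two columns manually; it is more verbose than melt but sidesteps the value_name naming question and can be easier to follow:

```python
import pandas as pd

df = pd.DataFrame({'time': [0, 1], 'a': [10, 11], 'b': [20, 21]})

# Stack column 'b' underneath column 'a', keeping 'time' alongside each value.
out = pd.concat([df[['time', 'a']],
                 df[['time', 'b']].rename(columns={'b': 'a'})],
                ignore_index=True)

print(out)
```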

Python: drop value=0 row in specific columns [duplicate]

This question already has answers here:
How to delete rows from a pandas DataFrame based on a conditional expression [duplicate]
(6 answers)
How do you filter pandas dataframes by multiple columns?
(10 answers)
Closed 4 years ago.
I want to drop rows with zero value in specific columns
>>> df
salary age gender
0 10000 23 1
1 15000 34 0
2 23000 21 1
3 0 20 0
4 28500 0 1
5 35000 37 1
Some data in the salary and age columns are missing. The third column, gender, is a binary variable where 1 means male and 0 means female, so a 0 there is not missing data.
I want to drop the rows where either salary or age is missing (zero), so I can get:
>>> df
salary age gender
0 10000 23 1
1 15000 34 0
2 23000 21 1
3 35000 37 1
Option 1
You can filter your dataframe using pd.DataFrame.loc:
df = df.loc[~((df['salary'] == 0) | (df['age'] == 0))]
Option 2
Or a smarter way to implement your logic:
df = df.loc[df['salary'] * df['age'] != 0]
This works because if either salary or age are 0, their product will also be 0.
Option 3
The following method can be easily extended to several columns:
df.loc[(df[['salary', 'age']] != 0).all(axis=1)]
Explanation
In all 3 cases, Boolean arrays are generated which are used to index your dataframe.
All these methods can be further optimised by using numpy representation, e.g. df['salary'].values.
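Putting option 3 together with the example data, a runnable sketch of the full filter:

```python
import pandas as pd

df = pd.DataFrame({'salary': [10000, 15000, 23000, 0, 28500, 35000],
                   'age': [23, 34, 21, 20, 0, 37],
                   'gender': [1, 0, 1, 0, 1, 1]})

# Keep rows where every listed column is non-zero; 'gender' is deliberately
# left out of the list, since 0 is a valid value there.
cols = ['salary', 'age']
df = df.loc[(df[cols] != 0).all(axis=1)]

print(df)
```

Note that the original index labels (here 0, 1, 2, 5) are preserved by the filter; use reset_index(drop=True) if you want them renumbered.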

Pandas how to delete alternate rows [duplicate]

This question already has answers here:
Deleting DataFrame row in Pandas based on column value
(18 answers)
Closed 7 years ago.
I have a pandas dataframe with duplicate ids. Below is my dataframe
id nbr type count
7 21 High 4
7 21 Low 6
8 39 High 2
8 39 Low 3
9 13 High 5
9 13 Low 7
How do I delete only the rows having type Low?
You can also just slice your df using iloc:
df.iloc[::2]
This selects every second row, which works here only because the High/Low rows strictly alternate.
You can try this way :
df = df[df.type != "Low"]
Another possible solution is to use drop_duplicates
df = df.drop_duplicates('nbr')
print(df)
id nbr type count
0 7 21 High 4
2 8 39 High 2
4 9 13 High 5
You can also do:
df.drop_duplicates('nbr', inplace=True)
That way you don't have to reassign it.
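A runnable sketch of the condition-based approach, which filters on the value rather than the row position and so keeps working even if the rows stop alternating:

```python
import pandas as pd

df = pd.DataFrame({'id': [7, 7, 8, 8, 9, 9],
                   'nbr': [21, 21, 39, 39, 13, 13],
                   'type': ['High', 'Low', 'High', 'Low', 'High', 'Low'],
                   'count': [4, 6, 2, 3, 5, 7]})

# Keep only the rows whose type is not 'Low'.
df = df[df['type'] != 'Low']

print(df)
```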
