Pandas: change a column value based on another column with a lambda function - Python

I have a pandas DataFrame:
star = pd.DataFrame({'Country':['Canada','USA', 'Mexico'],'Rating':[1,2,3], 'Score':[70,80,90]})
I want to give Canada a Rating of 3, and this code works:
star.loc[star['Country'] == 'Canada', 'Rating'] = 3
But I want to do it with lambda function:
star.Rating.map(lambda x: 3 if star.Country == 'Canada')
This gives a syntax error:
File "<ipython-input-41-544a311d7f86>", line 1
star.Rating.map(lambda x: 3 if star.Country == 'Canada')
^
SyntaxError: invalid syntax
I want help with the lambda function.

That is indeed a syntax error: a conditional expression requires an else clause. You should do:
star.apply(lambda x: 3 if x.Country == 'Canada' else x.Rating, axis=1)
However, your original solution is much better.

I suggest you avoid apply (or map) for this kind of problem; np.where is faster and easier to implement:
star["Rating"] = np.where(star.Country=="Canada", 3, star.Rating)
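As a quick check, here is the np.where approach run on the sample frame from the question:

```python
import numpy as np
import pandas as pd

star = pd.DataFrame({'Country': ['Canada', 'USA', 'Mexico'],
                     'Rating': [1, 2, 3],
                     'Score': [70, 80, 90]})

# Vectorised conditional assignment: no row-wise Python loop needed.
star["Rating"] = np.where(star.Country == "Canada", 3, star.Rating)
print(star.Rating.tolist())  # -> [3, 2, 3]
```

Only Canada's rating changes; the other rows keep their original values.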

Related

How to write conditionals across multiple columns in dataframe?

I have the following pandas dataframe (shown in a screenshot in the original post).
I am trying to write some conditional Python statements: if we have an issue_status of 10 or 40 AND a market_phase of 0 AND a trading_state of ' ' (a blank, which is what we have in all of the cases in the above screenshot), then I want to call a function called resolve_collision_mp(...).
Can I write the conditional in Python as follows?
# Collision for issue_status == 10
if market_info_df['issue_status'].eq('10').all() and market_info_df['market_phase'].eq('0').all() \
        and market_info_df['trading_state'] == ' ':  # need to change this, can't have equality for a dataframe, need loc[...]
    return resolve_collision_mp_10(market_info_df)
# Collision for issue_status == 40
if market_info_df['issue_status'].eq('40').all() and market_info_df['market_phase'].eq('0').all() \
        and not market_info_df['trading_state']:
    return resolve_collision_mp_40(market_info_df)
I don't think the above is correct, any help would be much appreciated!
You can use .apply() with the relevant conditions,
df['new_col'] = df.apply(lambda row: resolve_collision_mp_10(row) if (row['issue_status'] == 10 and row['market_phase'] == 0 and row['trading_state'] == '') else None, axis=1)
df['new_col'] = df.apply(lambda row: resolve_collision_mp_40(row) if (row['issue_status'] == 40 and row['market_phase'] == 0 and row['trading_state'] == '') else None, axis=1)
Note: I am assuming that you are trying to create a new column with the return values of the resolve_collision_mp_10 and resolve_collision_mp_40 functions.
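An alternative sketch, assuming the same placeholder resolver functions as the question, is to build boolean masks once and assign only the matching rows; this also avoids the second apply overwriting the results of the first:

```python
import pandas as pd

# Placeholder resolvers standing in for the question's real functions.
def resolve_collision_mp_10(row):
    return "resolved_10"

def resolve_collision_mp_40(row):
    return "resolved_40"

# Hypothetical sample data matching the question's column names.
df = pd.DataFrame({
    "issue_status": [10, 40, 20],
    "market_phase": [0, 0, 1],
    "trading_state": ["", "", "X"],
})

# Build each mask once, then run the resolver only on matching rows.
base = df["market_phase"].eq(0) & df["trading_state"].eq("")
mask_10 = df["issue_status"].eq(10) & base
mask_40 = df["issue_status"].eq(40) & base

df["new_col"] = None
df.loc[mask_10, "new_col"] = df.loc[mask_10].apply(resolve_collision_mp_10, axis=1)
df.loc[mask_40, "new_col"] = df.loc[mask_40].apply(resolve_collision_mp_40, axis=1)
print(df["new_col"].tolist())  # -> ['resolved_10', 'resolved_40', None]
```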

Selecting value from Pandas without going through .values[0]

An example dataset I'm working with
df = pd.DataFrame({"competitorname": ["3 Musketeers", "Almond Joy"], "winpercent": [67.602936, 50.347546] }, index = [1, 2])
I am trying to see whether 3 Musketeers or Almond Joy has a higher winpercent. The code I wrote is:
more_popular = '3 Musketeers' if df.loc[df["competitorname"] == '3 Musketeers', 'winpercent'].values[0] > df.loc[df["competitorname"] == 'Almond Joy', 'winpercent'].values[0] else 'Almond Joy'
My question is
Can I select the values I am interested in without python returning a Series? Is there a way to just do
df[df["competitorname"] == 'Almond Joy', 'winpercent']
and then it would return a simple
50.347546
?
I know this doesn't make my code significantly shorter but I feel like I am missing something about getting values from pandas that would help me avoid constantly adding
.values[0]
The underlying issue is that there could be multiple matches, so we will always need to extract the match(es) at some point in the pipeline:
Use Series.idxmax on the boolean mask
Since False is 0 and True is 1, using Series.idxmax on the boolean mask will give you the index of the first True:
df.loc[df['competitorname'].eq('Almond Joy').idxmax(), 'winpercent']
# 50.347546
This assumes there is at least 1 True match, otherwise it will return the first False.
Or use Series.item on the result
This is basically just an alias for Series.values[0]:
df.loc[df['competitorname'].eq('Almond Joy'), 'winpercent'].item()
# 50.347546
This assumes there is exactly 1 True match, otherwise it will throw a ValueError.
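One defensive pattern, sketched below, is to materialise the filtered Series once and branch on its length before calling .item(), so an empty or multi-row match does not raise:

```python
import pandas as pd

df = pd.DataFrame({"competitorname": ["3 Musketeers", "Almond Joy"],
                   "winpercent": [67.602936, 50.347546]}, index=[1, 2])

matches = df.loc[df["competitorname"] == "Almond Joy", "winpercent"]
# .item() raises ValueError unless there is exactly one element,
# so guard on the length and fall back to None otherwise.
value = matches.item() if len(matches) == 1 else None
print(value)  # -> 50.347546
```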
How about simply sorting the dataframe by "winpercent" and then taking the top row?
df.sort_values(by="winpercent", ascending=False, inplace=True)
then to see the winner's row
df.head(1)
or to get the values
df.iloc[0]["winpercent"]
If you're sure that the returned Series has a single element, you can simply use .item() to get it:
import pandas as pd

df = pd.DataFrame({
    "competitorname": ["3 Musketeers", "Almond Joy"],
    "winpercent": [67.602936, 50.347546]
}, index=[1, 2])

s = df.loc[df["competitorname"] == 'Almond Joy', 'winpercent']  # a pandas Series
print(s)
# output:
# 2    50.347546
# Name: winpercent, dtype: float64

v = df.loc[df["competitorname"] == 'Almond Joy', 'winpercent'].item()  # a scalar value
print(v)
# output:
# 50.347546

How to add round brackets to strings in column

I have a data frame with strings in Column_A: row1: Anna, row2: Mark, row3: Emy
I would like to get something like: row1: (Anna), row2: (Mark), row3: (Emy)
I have found some examples on how to remove the brackets, however have not found anything on how to add them.
Hence, any clue would be much appreciated.
Using apply from pandas you can create a function which adds the brackets. In this case the function is a lambda function using an f-string.
df['Column_A'] = df['Column_A'].apply(lambda x: f'({x})')
# Example:
l = ['Anna', 'Mark', 'Emy']
df = pd.DataFrame(l, columns=['Column_A'])
#   Column_A
# 0     Anna
# 1     Mark
# 2      Emy
df['Column_A'] = df['Column_A'].apply(lambda x: f'({x})')
#   Column_A
# 0   (Anna)
# 1   (Mark)
# 2    (Emy)
Approach 1: Using direct Concatenation (Simpler)
df['Column_A'] = '(' + df['Column_A'] + ')'
Approach 2: Using apply() function and f-strings
df.Column_A.apply(lambda val: f'({val})')
Approach 3: Using Series.map()
df['Column_A'] = df['Column_A'].map(lambda val: f'({val})')
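All three approaches produce the same result; a small sketch on the sample data confirms this:

```python
import pandas as pd

df = pd.DataFrame({'Column_A': ['Anna', 'Mark', 'Emy']})

concat = '(' + df['Column_A'] + ')'                  # Approach 1: vectorised concatenation
applied = df['Column_A'].apply(lambda v: f'({v})')   # Approach 2: apply with an f-string
mapped = df['Column_A'].map(lambda v: f'({v})')      # Approach 3: Series.map

print(concat.tolist())  # -> ['(Anna)', '(Mark)', '(Emy)']
```

Approach 1 is usually preferred since it avoids calling a Python function per element.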

How to include two lambda operations in transform function?

I have a dataframe like as given below
df = pd.DataFrame({
    'date': ['2173-04-03 12:35:00', '2173-04-03 17:00:00', '2173-04-03 20:00:00',
             '2173-04-04 11:00:00', '2173-04-04 12:00:00', '2173-04-04 11:30:00',
             '2173-04-04 16:00:00', '2173-04-04 22:00:00', '2173-04-05 04:00:00'],
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 1, 1],
    'val': [5, 5, 5, 10, 10, 5, 5, 8, 8]
})
I would like to apply a couple of logics (logic_1 on the val column and logic_2 on the date column). Please find the logic below:
logic_1 = lambda x: (x.shift(2).ge(x.shift(1))) & (x.ge(x.shift(2).add(3))) & (x.eq(x.shift(-1)))
logic_2 = lambda y: (y.shift(1).ge(1)) & (y.shift(2).ge(2)) & (y.shift(-1).ge(1))
credit to SO users for helping me with logic
This is what I tried below
df['label'] = ''
df['date'] = pd.to_datetime(df['date'])
df['tdiff'] = df['date'].shift(-1) - df['date']
df['tdiff'] = df['tdiff'].dt.total_seconds()/3600
df['lo_1'] = df.groupby('subject_id')['val'].transform(logic_1).map({True:'1',False:''})
df['lo_2'] = df.groupby('subject_id')['tdiff'].transform(logic_2).map({True:'1',False:''})
How can I make both logic_1 and logic_2 part of one logic statement? Is it even possible? I might have more than 2 logics as well. Instead of writing one line for each logic, is it possible to couple all the logics together in one statement?
I expect my output to be with label column being filled with 1 when both logic_1 and logic_2 are satisfied
You have a few things to fix.
First, in logic_2 you have lambda x but use y, so you need to change that as below:
logic_2 = lambda y: (y.shift(1).ge(1)) & (y.shift(2).ge(2)) & (y.shift(-1).ge(1))
Then you can use the logics together.
There is no need to create a blank label column first; you can create the label column directly, as below.
df['label'] = ((df.groupby('subject_id')['val'].transform(logic_1))
& (df.groupby('subject_id')['tdiff'].transform(logic_2))).map({True:'0',False:'1'})
Note: your logic produces all False values on this sample data, so you will only get 1s if False is mapped to '1', not True.
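Since the question mentions there may be more than two logics, one sketch (using simplified placeholder logics and sample data, not the question's originals) is to keep a list of (column, function) pairs and reduce the transformed masks into a single boolean Series:

```python
from functools import reduce

import pandas as pd

# Hypothetical sample data; the columns mirror the question's names.
df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1],
    'val': [5, 5, 10, 10],
    'tdiff': [4.5, 0.5, 1.0, 6.0],
})

# Each entry pairs a column with the boolean logic to run on it.
# These logics are simplified placeholders, not the question's originals.
logics = [
    ('val', lambda x: x.ge(5)),
    ('tdiff', lambda y: y.ge(1)),
]

# AND all of the per-column transformed masks together in one pass.
mask = reduce(
    lambda acc, cl: acc & df.groupby('subject_id')[cl[0]].transform(cl[1]),
    logics,
    pd.Series(True, index=df.index),
)
df['label'] = mask.map({True: '1', False: ''})
print(df['label'].tolist())  # -> ['1', '', '1', '1']
```

Adding a third logic is then just one more entry in the logics list.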

How to use "apply" with a dataframe and avoid SettingWithCopyWarning?

I am using the following function with a DataFrame:
df['error_code'] = df.apply(lambda row: replace_semi_colon(row), axis=1)
The embedded function is:
import re

def replace_semi_colon(row):
    errrcd = str(row['error_code'])
    semi_colon_pat = re.compile(r'.*;.*')
    if pd.notnull(errrcd):
        if semi_colon_pat.match(errrcd):
            mod_error_code = str(errrcd.replace(';', ':'))
            return mod_error_code
    return errrcd
But I am receiving the (in)famous
SettingWithCopyWarning
I have read many posts but still do not know how to prevent it.
The strange thing is that I use other apply functions the same way but they do not trigger the same warning.
Can someone explain why I am getting this warning?
Before the apply there was another statement:
df = df.query('error_code != "BM" and error_code != "PM"')
I modified that to:
df.loc[:] = df.query('error_code != "BM" and error_code != "PM"')
That solved it.
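Another commonly recommended fix, sketched below on hypothetical sample data, is to take an explicit .copy() of the filtered frame so later assignments target an independent DataFrame rather than a possible view of the original:

```python
import pandas as pd

df = pd.DataFrame({'error_code': ['AB;CD', 'BM', 'EF']})

# .copy() makes the filtered result independent of the original frame,
# so the subsequent column assignment cannot trigger the warning.
df = df.query('error_code != "BM"').copy()
df['error_code'] = df['error_code'].str.replace(';', ':', regex=False)
print(df['error_code'].tolist())  # -> ['AB:CD', 'EF']
```

The vectorised .str.replace also removes the need for the row-wise apply entirely.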
