I have the following pandas dataframe:
I am trying to write some conditional Python statements: if we have an issue_status of 10 or 40 AND a market_phase of 0 AND a blank trading_state (which is what we have in all of the cases in the above screenshot), then I want to call a function called resolve_collision_mp(...).
Can I write the conditional in Python as follows?
# Collision for issue_status == 10
if market_info_df['issue_status'].eq('10').all() and market_info_df['market_phase'].eq('0').all() \
        and market_info_df['trading_state'] == ' ':  # need to change this, can't have equality for a DataFrame column, need loc[...]
    return resolve_collision_mp_10(market_info_df)
# Collision for issue_status == 40
if market_info_df['issue_status'].eq('40').all() and market_info_df['market_phase'].eq('0').all() \
        and not market_info_df['trading_state']:
    return resolve_collision_mp_40(market_info_df)
I don't think the above is correct, any help would be much appreciated!
You can use .apply() with the relevant conditions:
df['new_col'] = df.apply(lambda row: resolve_collision_mp_10(row) if (row['issue_status'] == 10 and row['market_phase'] == 0 and row['trading_state'] == ' ') else None, axis=1)
df['new_col'] = df.apply(lambda row: resolve_collision_mp_40(row) if (row['issue_status'] == 40 and row['market_phase'] == 0 and row['trading_state'] == ' ') else row['new_col'], axis=1)
The second call falls back to row['new_col'] instead of None so that it does not overwrite the values produced by the first one.
Note: I am assuming that you are trying to create a new column with the return values of the resolve_collision_mp_10 and resolve_collision_mp_40 functions.
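If you actually want to test the whole DataFrame at once and then hand it to one of the functions, as in your original attempt, you can keep the .eq(...).all() pattern and apply it to trading_state as well. A minimal sketch, assuming the columns hold the string values used in your code ('10', '40', '0' and a blank ' '):

def resolve_collisions(market_info_df):
    # Each flag is True only if *every* row satisfies the condition
    phase_0 = market_info_df['market_phase'].eq('0').all()
    blank_state = market_info_df['trading_state'].eq(' ').all()

    if market_info_df['issue_status'].eq('10').all() and phase_0 and blank_state:
        return resolve_collision_mp_10(market_info_df)
    if market_info_df['issue_status'].eq('40').all() and phase_0 and blank_state:
        return resolve_collision_mp_40(market_info_df)
    return None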
df["load_weight"] = df.loc[(df["dropoff_site"] == "HORNSBY BEND") & (df['load_type'] == "BRUSH")].fillna(1000, inplace=True)
I want to change the NaN values in the "load_weight" column, but only for the rows that contain "HORNSBY BEND" and "BRUSH". However, the code above set the whole "load_weight" column to None. What did I do wrong?
I would use a mask for boolean indexing:
m = (df["dropoff_site"] == "HORNSBY BEND") & (df['load_type'] == "BRUSH")
df.loc[m, "load_weight"] = df.loc[m, 'load_weight'].fillna(1000)
NB: you can't keep inplace=True when you assign the output. That is what was replacing your data with None: methods called with inplace=True modify the object and return nothing.
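To see why, here is a quick throwaway example of what an inplace=True call actually returns:

import pandas as pd

s = pd.Series([1.0, None, 3.0])
result = s.fillna(1000, inplace=True)

print(result)      # None -- in-place methods modify the object and return nothing
print(s.tolist())  # [1.0, 1000.0, 3.0]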
Alternative with only boolean indexing:
m1 = (df["dropoff_site"] == "HORNSBY BEND") & (df['load_type'] == "BRUSH")
m2 = df['load_weight'].isna()
df.loc[m1&m2, "load_weight"] = 1000
Instead of fillna, you can use df.loc directly to do the required imputation:
df.loc[(df['dropoff_site'] == 'HORNSBY BEND')
       & (df['load_type'] == 'BRUSH')
       & (df['load_weight'].isnull()), 'load_weight'] = 1000
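For completeness, a small self-contained example (the data is made up; the column names are taken from the question) showing that only the matching NaN rows are filled:

import pandas as pd

df = pd.DataFrame({
    'dropoff_site': ['HORNSBY BEND', 'HORNSBY BEND', 'OTHER SITE'],
    'load_type':    ['BRUSH',        'MULCH',        'BRUSH'],
    'load_weight':  [None,           2500.0,         None],
})

m = (df['dropoff_site'] == 'HORNSBY BEND') & (df['load_type'] == 'BRUSH')
df.loc[m, 'load_weight'] = df.loc[m, 'load_weight'].fillna(1000)

print(df)
# Only the first row's NaN becomes 1000.0; the NaN in the last row is left alone.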
I want to do multiple queries. Here is my data frame:
data = {'Name': ['Penny', 'Ben', 'Benny', 'Mark', 'Ben1', 'Ben2', 'Ben3'],
        'Eng': [5, 1, 4, 3, 1, 2, 3],
        'Math': [1, 5, 3, 2, 2, 2, 3],
        'Physics': [2, 5, 3, 1, 1, 2, 3],
        'Sports': [4, 5, 2, 3, 1, 2, 3],
        'Total': [12, 16, 12, 9, 5, 8, 12],
        'Group': ['A', 'A', 'A', 'A', 'A', 'B', 'B']}
df1 = pd.DataFrame(data, columns=['Name', 'Eng', 'Math', 'Physics', 'Sports', 'Total', 'Group'])
df1
I have 3 queries:
Group A or B
Math > Eng
Name starts with 'B'
I tried to do them one by one:
df1[df1.Name.str.startswith('B')]
df1.query('Math > Eng')
df1[df1.Group == 'A'] #I cannot run the code with df1[df1.Group == 'A' or 'B']
Then I tried to merge those queries:
df1.query("'Math > Eng' & 'df1[df1.Name.str.startswith('B')]' & 'df1[df1.Group == 'A']")
TokenError: ('EOF in multi-line statement', (2, 0))
I also tried to pass str.startswith() into df.query()
df1.query("df1.Name.str.startswith('B')")
UndefinedVariableError: name 'df1' is not defined
I have tried lots of ways but none of them works. How can I put these queries together?
The long way to solve this, and the most transparent one (so best for beginners), is to create a boolean column for each filter and then combine those columns into one final filter:
df1['filter_1'] = df1['Group'].isin(['A','B'])
df1['filter_2'] = df1['Math'] > df1['Eng']
df1['filter_3'] = df1['Name'].str.startswith('B')
# If all are true
df1['filter_final'] = df1[['filter_1', 'filter_2', 'filter_3']].all(axis=1)
You can certainly combine these steps into one:
mask = ((df1['Group'].isin(['A', 'B'])) &
        (df1['Math'] > df1['Eng']) &
        (df1['Name'].str.startswith('B')))
df1['filter_final'] = mask
Lastly, selecting rows which satisfy your filter is done as follows:
df_filtered = df1[df1['filter_final']]
This selects the rows of df1 where filter_final is True.
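If the helper columns were only there to make the logic visible, you can drop them afterwards (the column names are the ones created above):

df1 = df1.drop(columns=['filter_1', 'filter_2', 'filter_3', 'filter_final'])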
Firstly, the answer is:
df1.query("Math > Eng & Name.str.startswith('B') & Group=='A'")
Additional comments
Inside query, refer to columns by their bare names, without the DataFrame prefix (write Math, not df1.Math); that is why your df1.query("df1.Name.str.startswith('B')") attempt raised UndefinedVariableError.
Use df1[df1.Group.isin(['A', 'B'])] or df1.query("Group in ['A', 'B']") instead of df1[df1.Group == 'A' or 'B'], which is not valid.
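For reference, the plain boolean-indexing equivalent of the combined filter is shown below; with the sample data above, both it and the query version should return the rows for Ben and Ben1:

result = df1[
    df1['Group'].isin(['A', 'B'])
    & (df1['Math'] > df1['Eng'])
    & df1['Name'].str.startswith('B')
]
print(result)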
I have a huge dataset, and I need to change the value of a column according to a certain condition.
In Pandas I execute the following code to accomplish what I want:
df.loc[
(df['ID_CRITERIO_APURACAO'] == TipoDestinatario.RESIDENCIAL.value) &
(df['CODG_GRUPO_TENSAO'] == 8) &
(df['CONSUMO'].between(0, 30)),
'DESCONTO'
] = 35
How can I do something similar in Dask?
Dask doesn't support the kind of in-place .loc assignment you are using in pandas. Try this instead (assuming the DESCONTO column already exists):
condition = ((df['ID_CRITERIO_APURACAO'] == TipoDestinatario.RESIDENCIAL.value) &
             (df['CODG_GRUPO_TENSAO'] == 8) &
             (df['CONSUMO'].between(0, 30)))

# Use 35 where the condition holds, keep the existing value otherwise
df['DESCONTO'] = df['DESCONTO'].mask(condition, 35)
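If you would rather keep the original pandas .loc logic exactly as it is, another option is to run it per partition with map_partitions. This is only a sketch: the helper name apply_desconto is made up, and it assumes DESCONTO already exists in the DataFrame:

def apply_desconto(pdf):
    # pdf is a plain pandas DataFrame (one Dask partition)
    pdf = pdf.copy()
    pdf.loc[
        (pdf['ID_CRITERIO_APURACAO'] == TipoDestinatario.RESIDENCIAL.value) &
        (pdf['CODG_GRUPO_TENSAO'] == 8) &
        (pdf['CONSUMO'].between(0, 30)),
        'DESCONTO'
    ] = 35
    return pdf

df = df.map_partitions(apply_desconto)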
I want to drop rows in my dataset using:
totes = df3.loc[(df3['Reporting Date'] != '18/08/2017') & (df3['Business Line'] != 'Bondy')]
However, it does not do what I expect; I know that the number of rows I want to drop is 496, after using:
totes = df3.loc[(df3['Reporting Date'] == '18/08/2017') & (df3['Business Line'] == 'Bondy')]
When I run my drop code, it gives back far fewer rows than the size of my dataset minus 496.
Does anyone know how to fix this?
You are correct to use &, but it is being misused. This is a logic problem. Note:
(NOT X) AND (NOT Y) != NOT(X AND Y)
Instead, you can negate a boolean condition with the ~ operator:
totes = df3.loc[~((df3['Reporting Date'] == '18/08/2017') & (df3['Business Line'] == 'Bondy'))]
Those parentheses and masks can get confusing, so you can write this more clearly:
m1 = df3['Reporting Date'].eq('18/08/2017')
m2 = df3['Business Line'].eq('Bondy')
totes = df3.loc[~(m1 & m2)]
Alternatively, note that:
NOT(X AND Y) == (NOT X) OR (NOT Y)
So you can use:
m1 = df3['Reporting Date'].ne('18/08/2017')
m2 = df3['Business Line'].ne('Bondy')
totes = df3.loc[m1 | m2]
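As a sanity check against the count from the question, the number of rows removed by either version should be exactly 496:

print(len(df3) - len(totes))  # expected: 496, the number of rows matching both conditions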