Dropping rows at specific minutes - python

I am trying to drop rows at specific minutes (05, 10, 20).
I have a datetime index, from which I derive these columns:
df5['Year'] = df5.index.year
df5['Month'] = df5.index.month
df5['Day']= df5.index.day
df5['Day_of_Week']= df5.index.day_name()
df5['hour']= df5.index.strftime('%H')
df5['Min']= df5.index.strftime('%M')
df5
Then I run the code below:
def clean(df5):
    for i in range(len(df5)):
        hour = pd.Timestamp(df5.index[i]).hour
        minute = pd.Timestamp(df5.index[i]).minute
        if df5 = df5[(df5.index.minute ==5) | (df5.index.minute == 10)| (df5.index.minute == 20)]
            df.drop(axis=1, index=i, inplace=True)
It returns an invalid syntax error.

Looping is not necessary here, and it is not recommended either.
Use DatetimeIndex.minute with Index.isin to build a mask, invert it with ~, and filter by boolean indexing:
df5 = df5[~df5.index.minute.isin([5, 10, 20])]
To reuse the df5['Min'] column instead, compare against string values:
df5 = df5[~df5['Min'].isin(['05', '10', '20'])]
All together:
def clean(df5):
    return df5[~df5.index.minute.isin([5, 10, 20])]
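As a minimal sketch of the isin approach above, the following builds a small frame and checks which minutes survive. The sample index, the value column and the minutes parameter are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical sample data: one row every 5 minutes for one hour.
idx = pd.date_range("2021-01-01 00:00", periods=12, freq="5min")
df5 = pd.DataFrame({"value": range(12)}, index=idx)

def clean(df, minutes=(5, 10, 20)):
    """Return the rows of df whose index minute is NOT in `minutes`."""
    # isin builds a boolean array; ~ inverts it so matching rows are dropped.
    return df[~df.index.minute.isin(list(minutes))]

cleaned = clean(df5)
print(cleaned.index.minute.tolist())
```

The function returns a filtered frame rather than modifying its argument, so the original df5 stays intact.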

You can just do it using boolean indexing, assuming that the index is already parsed as datetime.
df5 = df5[~((df5.index.minute == 5) | (df5.index.minute == 10) | (df5.index.minute == 20))]
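Or, equivalently (by De Morgan's law), keep only the rows whose minute differs from all three values. Note that the conditions must be joined with &, because joining them with | would be true for every row:
df5 = df5[(df5.index.minute != 5) & (df5.index.minute != 10) & (df5.index.minute != 20)]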

Generally speaking, the right syntax to combine conditions with a logical OR inside an if statement is the following (note that Python's or is lowercase):
today = 'Saturday'
if today == 'Sunday' or today == 'Saturday':
    print('Today is off. Rest at home')
In your case, inside the loop, you should probably test the minute value directly, something like this:
if minute == 5 or minute == 10 or minute == 20:
    ......
FINAL NOTE:
You made some mistakes using == and =
In Python (and many other programming languages), a single equals sign = assigns a value to a variable, whereas two consecutive equals signs == check whether two expressions have the same value.
= is an assignment operator
== is an equality operator

Related

How to write conditionals across multiple columns in dataframe?

I have the following pandas dataframe:
I am trying to write some conditional Python statements: if issue_status is 10 or 40 AND market_phase is 0 AND trading_state is blank (which is the case for every row in the screenshot above), then I want to call a function called resolve_collision_mp(...).
Can I write the conditional in Python as follows?
# Collision for issue_status == 10
if market_info_df['issue_status'].eq('10').all() and market_info_df['market_phase'].eq('0').all() \
and market_info_df['trading_state'] == ' ': # need to change this, can't have equality for dataframe, need loc[...]
return resolve_collision_mp_10(market_info_df)
# Collision for issue_status == 40
if market_info_df['issue_status'].eq('40').all() and market_info_df['market_phase'].eq('0').all() \
and not market_info_df['trading_state']:
return resolve_collision_mp_40(market_info_df)
I don't think the above is correct, any help would be much appreciated!
You can use .apply() with the relevant conditions (note that the comparisons must use ==, not =):
df['new_col'] = df.apply(lambda row: resolve_collision_mp_10(row) if (row['issue_status'] == 10 and row['market_phase'] == 0 and row['trading_state'] == '') else None, axis=1)
df['new_col'] = df.apply(lambda row: resolve_collision_mp_40(row) if (row['issue_status'] == 40 and row['market_phase'] == 0 and row['trading_state'] == '') else None, axis=1)
Note: I am assuming that you are trying to create a new column with the return values of the resolve_collision_mp_10 and resolve_collision_mp_40 functions.
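If the goal is instead to check whether every row meets the conditions before calling a function once, as the question's .all() calls suggest, a boolean mask may be simpler. This is only a sketch: the sample values are invented, all_rows_match is a hypothetical helper, and it assumes the columns hold strings and that a "blank" trading_state means empty or whitespace-only.

```python
import pandas as pd

# Hypothetical data; column names follow the question, values are made up.
market_info_df = pd.DataFrame({
    "issue_status": ["10", "10", "10"],
    "market_phase": ["0", "0", "0"],
    "trading_state": [" ", " ", " "],
})

def all_rows_match(df, issue_status):
    """True only if every row has the given issue_status, market_phase '0',
    and a blank (empty or whitespace-only) trading_state."""
    mask = (
        df["issue_status"].eq(issue_status)
        & df["market_phase"].eq("0")
        & df["trading_state"].str.strip().eq("")
    )
    # Reduce the row-wise mask to a single Python bool usable in an if statement.
    return bool(mask.all())

print(all_rows_match(market_info_df, "10"))
print(all_rows_match(market_info_df, "40"))
```

Reducing the mask with .all() gives a plain bool, which avoids the ambiguous-truth-value error raised when a whole Series is used in an if statement.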

How can i use .fillna with specific values?

df["load_weight"] = df.loc[(df["dropoff_site"] == "HORNSBY BEND") & (df['load_type'] == "BRUSH")].fillna(1000, inplace=True)
I want to change the NaN values in the "load_weight" column, but only for the rows that contain "HORNSBY BEND" and "BRUSH". The code above set the whole "load_weight" column to None. What did I do wrong?
I would use a mask for boolean indexing:
m = (df["dropoff_site"] == "HORNSBY BEND") & (df['load_type'] == "BRUSH")
df.loc[m, "load_weight"] = df.loc[m, 'load_weight'].fillna(1000)
NB. you can't keep inplace=True when you assign the output. This is what was causing your data to be replaced with None as methods called with inplace=True return nothing.
Alternative with only boolean indexing:
m1 = (df["dropoff_site"] == "HORNSBY BEND") & (df['load_type'] == "BRUSH")
m2 = df['load_weight'].isna()
df.loc[m1&m2, "load_weight"] = 1000
Instead of fillna, you can directly use df.loc to do the required imputation
df.loc[((df['dropoff_site']=='HORNSBY BEND')&(df['load_type']=='BRUSH')
&(df['load_weight'].isnull())),'load_weight'] = 1000
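A small worked example may make the masked assignment easier to follow. The rows below are invented for illustration; only the column names and the 1000 fill value come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows covering each case.
df = pd.DataFrame({
    "dropoff_site": ["HORNSBY BEND", "HORNSBY BEND", "OTHER SITE", "HORNSBY BEND"],
    "load_type": ["BRUSH", "BRUSH", "BRUSH", "SWEEPING"],
    "load_weight": [np.nan, 250.0, np.nan, np.nan],
})

# Rows that match both the site and the load type.
m = (df["dropoff_site"] == "HORNSBY BEND") & (df["load_type"] == "BRUSH")

# Fill only the missing weights inside that subset; other rows are untouched.
df.loc[m & df["load_weight"].isna(), "load_weight"] = 1000

print(df)
```

Only the first row is filled: the second already has a weight, and the last two do not match both conditions, so their NaN values remain.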

How to do multiple queries?

I want to run multiple queries. Here is my data frame:
data = {'Name':['Penny','Ben','Benny','Mark','Ben1','Ben2','Ben3'],
'Eng':[5,1,4,3,1,2,3],
'Math':[1,5,3,2,2,2,3],
'Physics':[2,5,3,1,1,2,3],
'Sports':[4,5,2,3,1,2,3],
'Total':[12,16,12,9,5,8,12],
'Group':['A','A','A','A','A','B','B']}
df1=pd.DataFrame(data, columns=['Name','Eng','Math','Physics','Sports','Total','Group'])
df1
I have 3 queries:
Group A or B
Math > Eng
Name starts with 'B'
I tried to do it one by one
df1[df1.Name.str.startswith('B')]
df1.query('Math > Eng')
df1[df1.Group == 'A'] #I cannot run the code with df1[df1.Group == 'A' or 'B']
Then, I tried to merge those queries
df1.query("'Math > Eng' & 'df1[df1.Name.str.startswith('B')]' & 'df1[df1.Group == 'A']")
TokenError: ('EOF in multi-line statement', (2, 0))
I also tried to pass str.startswith() into df.query()
df1.query("df1.Name.str.startswith('B')")
UndefinedVariableError: name 'df1' is not defined
I have tried lots of ways, but none of them work. How can I put those queries together?
The long way to solve this, and the most transparent one (so best for beginners), is to create a boolean column for each filter. Then combine those columns into one final filter:
df1['filter_1'] = df1['Group'].isin(['A','B'])
df1['filter_2'] = df1['Math'] > df1['Eng']
df1['filter_3'] = df1['Name'].str.startswith('B')
# If all are true
df1['filter_final'] = df1[['filter_1', 'filter_2', 'filter_3']].all(axis=1)
You can certainly combine these steps into one:
mask = ((df1['Group'].isin(['A','B'])) &
(df1['Math'] > df1['Eng']) &
(df1['Name'].str.startswith('B'))
)
df1['filter_final'] = mask
Lastly, selecting rows which satisfy your filter is done as follows:
df_filtered = df1[df1['filter_final']]
This selects the rows from df1 where filter_final == True.
Firstly, the answer is:
df1.query("Math > Eng & Name.str.startswith('B') & Group=='A'")
Additional comments
Inside query, refer to columns by name alone, without the data frame's name in front.
df1[df1.Group.isin(['A', 'B'])] or df1.query("Group in ['A', 'B']") instead of df1[df1.Group == 'A' or 'B']
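The two styles above can be compared side by side on the question's data. This is a sketch rather than a definitive recipe: passing engine="python" to query is an assumption made here so that the .str accessor is evaluated by plain Python, which some pandas versions require.

```python
import pandas as pd

data = {'Name': ['Penny', 'Ben', 'Benny', 'Mark', 'Ben1', 'Ben2', 'Ben3'],
        'Eng': [5, 1, 4, 3, 1, 2, 3],
        'Math': [1, 5, 3, 2, 2, 2, 3],
        'Physics': [2, 5, 3, 1, 1, 2, 3],
        'Sports': [4, 5, 2, 3, 1, 2, 3],
        'Total': [12, 16, 12, 9, 5, 8, 12],
        'Group': ['A', 'A', 'A', 'A', 'A', 'B', 'B']}
df1 = pd.DataFrame(data)

# Style 1: combine boolean masks with & (each comparison in parentheses).
mask = (
    df1['Group'].isin(['A', 'B'])
    & (df1['Math'] > df1['Eng'])
    & df1['Name'].str.startswith('B')
)
by_mask = df1[mask]

# Style 2: one query string; columns are referenced by bare name.
by_query = df1.query(
    "Group in ['A', 'B'] and Math > Eng and Name.str.startswith('B')",
    engine="python",
)

print(by_mask['Name'].tolist())
print(by_query['Name'].tolist())
```

Both selections should contain the same rows; the mask version is more verbose but does not depend on how query parses the string.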

Change column value with Dask DataFrame loc

I have a huge dataset in which I need to change the value of a column according to a certain condition.
In Pandas I execute the following code to accomplish what I want:
df.loc[
(df['ID_CRITERIO_APURACAO'] == TipoDestinatario.RESIDENCIAL.value) &
(df['CODG_GRUPO_TENSAO'] == 8) &
(df['CONSUMO'].between(0, 30)),
'DESCONTO'
] = 35
How can I do something similar in Dask?
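Dask doesn't support in-place mutation through loc. Instead, build the condition (wrapped in parentheses so it can span several lines), replace the matching values with Series.mask, and assign the result back to the column:
condition = (
    (df['ID_CRITERIO_APURACAO'] == TipoDestinatario.RESIDENCIAL.value) &
    (df['CODG_GRUPO_TENSAO'] == 8) &
    (df['CONSUMO'].between(0, 30))
)
df['DESCONTO'] = df['DESCONTO'].mask(condition, 35)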
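The replace-where-true pattern can be tried out on a small pandas frame first. This sketch uses pandas only and assumes that the Dask version of Series.mask follows the same semantics; RESIDENCIAL is a hypothetical stand-in for TipoDestinatario.RESIDENCIAL.value, and the rows are made up.

```python
import pandas as pd

RESIDENCIAL = 1  # hypothetical stand-in for TipoDestinatario.RESIDENCIAL.value

# Made-up rows: only the first one satisfies all three conditions.
df = pd.DataFrame({
    "ID_CRITERIO_APURACAO": [1, 1, 2, 1],
    "CODG_GRUPO_TENSAO": [8, 8, 8, 3],
    "CONSUMO": [10, 50, 20, 5],
    "DESCONTO": [0, 0, 0, 0],
})

condition = (
    (df["ID_CRITERIO_APURACAO"] == RESIDENCIAL)
    & (df["CODG_GRUPO_TENSAO"] == 8)
    & (df["CONSUMO"].between(0, 30))
)

# mask replaces values where the condition is True and keeps the rest,
# which mirrors df.loc[condition, 'DESCONTO'] = 35 without in-place mutation.
df["DESCONTO"] = df["DESCONTO"].mask(condition, 35)

print(df["DESCONTO"].tolist())
```

Note the difference from where, which keeps values where the condition is True and replaces the others.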

Dropping rows in Python using != operator is not working

I want to drop rows in my dataset using:
totes = df3.loc[(df3['Reporting Date'] != '18/08/2017') & (df3['Business Line'] != 'Bondy')]
However, the result is not what I expect. I know that the number of rows I want to drop is 496, which I found by using:
totes = df3.loc[(df3['Reporting Date'] == '18/08/2017') & (df3['Business Line'] == 'Bondy')]
When I run my drop statement, it gives back far fewer rows than my dataset minus 496.
Does anyone know how to fix this?
You are correct to use &, but it is being misused. This is a logic problem. Note:
(NOT X) AND (NOT Y) != NOT(X AND Y)
Instead, you can calculate the negative of a Boolean condition via the ~ operator:
totes = df3.loc[~((df3['Reporting Date'] == '18/08/2017') & (df3['Business Line'] == 'Bondy'))]
Those parentheses and masks can get confusing, so you can write this more clearly:
m1 = df3['Reporting Date'].eq('18/08/2017')
m2 = df3['Business Line'].eq('Bondy')
totes = df3.loc[~(m1 & m2)]
Alternatively, note that:
NOT(X & Y) == NOT(X) | NOT(Y)
So you can use:
m1 = df3['Reporting Date'].ne('18/08/2017')
m2 = df3['Business Line'].ne('Bondy')
totes = df3.loc[m1 | m2]
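The difference between the three expressions can be seen on a tiny frame with one row per combination. The data below is invented; only the column names and the two compared values come from the question.

```python
import pandas as pd

# Hypothetical rows: one for each combination of matching / not matching.
df3 = pd.DataFrame({
    "Reporting Date": ["18/08/2017", "18/08/2017", "19/08/2017", "19/08/2017"],
    "Business Line": ["Bondy", "Paris", "Bondy", "Paris"],
})

x = df3["Reporting Date"].eq("18/08/2017")
y = df3["Business Line"].eq("Bondy")

kept_not_and = df3.loc[~(x & y)]  # drop only rows matching BOTH conditions
kept_or = df3.loc[~x | ~y]        # De Morgan equivalent of the line above
kept_wrong = df3.loc[~x & ~y]     # the question's version: drops too much

print(len(kept_not_and), len(kept_or), len(kept_wrong))
```

The first two selections keep every row except the one where both conditions hold, while the question's version also drops rows that match only one of the conditions.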
