I'm trying to insert a new column in Python using Pandas based on an "or" condition, but I'm having trouble with the code. Here's what I'm trying to do:
If the column "Regulatory Body" says FDIC, Fed, or Treasury, then I want a new column "Filing" to say "Yes"; otherwise, say "No". This is what I've written. My dataframe is df200.
df200["Filing"] = np.where(df200["Regulatory Body"]=="FDIC","Yes","No")
Is there a way to have an "or" condition in this code to fit the other two variables?
Thanks!
Yes. Use pd.Series.isin:
bodies = {'FDIC', 'Fed', 'Treasury'}
df200['Filing'] = np.where(df200['Regulatory Body'].isin(bodies), 'Yes', 'No')
Alternatively, use pd.Series.map with the Boolean array you receive from pd.Series.isin:
df200['Filing'] = df200['Regulatory Body'].isin(bodies).map({True: 'Yes', False: 'No'})
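For reference, here is a minimal self-contained sketch of both approaches. The sample rows (including the "SEC" and "OCC" entries) are made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df200 = pd.DataFrame(
    {"Regulatory Body": ["FDIC", "SEC", "Fed", "Treasury", "OCC"]}
)

bodies = ["FDIC", "Fed", "Treasury"]

# Boolean Series: True where the value is one of the listed bodies.
is_filing = df200["Regulatory Body"].isin(bodies)

# Option 1: np.where picks "Yes" where the mask is True, else "No".
via_where = np.where(is_filing, "Yes", "No")

# Option 2: map the booleans directly to labels.
via_map = is_filing.map({True: "Yes", False: "No"})

df200["Filing"] = via_where
print(df200)
```

Both options should give the same labels, so the choice is mostly a matter of taste.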
Related
I am trying to write some Python logic to fill a csv file/pandas dataframe table called (table) with certain conditions, but I can't seem to get it to do what I want.
I have two columns in table: 1. trade_type and 2. execution_venue.
Conditional statement I want to write in Python:
The execution_venue entry will only be filled with either AQXE or AQEU, depending on the trade_type.
When the trade_type is filled with the string DARK, I want the execution_venue to be filled with XUBS (if it was filled with AQXE before), and AQED (if it was filled with AQEU before).
Here is my code to do this:
security_mic = ('AQXE', 'AQEU')
table.loc[table['trade_type'] == 'DARK', 'execution_venue'] = {'AQXE': 'XUBS',
'AQEU': 'AQED'}.get(security_mic)
When I replace the right hand side of the equality with a string test, I am getting the same error, so I suspect the error is to do with the left hand side, in that it is not accessing the correct place in the dataframe!
Let's use replace to substitute the old values where trade_type is DARK:
d = {'AQXE': 'XUBS', 'AQEU': 'AQED'}
table.loc[table['trade_type'] == 'DARK', 'execution_venue'] = table['execution_venue'].replace(d)
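A small runnable sketch of this, assuming made-up rows (the "LIT" trade type is hypothetical and only there to show that other rows are left alone). It also shows why the original attempt could not work: dict.get looks up the whole tuple as a single key, finds nothing, and returns None.

```python
import pandas as pd

# Hypothetical sample data; "LIT" is an invented non-DARK trade type.
table = pd.DataFrame(
    {
        "trade_type": ["DARK", "LIT", "DARK", "LIT"],
        "execution_venue": ["AQXE", "AQXE", "AQEU", "AQEU"],
    }
)

d = {"AQXE": "XUBS", "AQEU": "AQED"}

# The tuple is treated as one key, which is not in the dict.
lookup_with_tuple = d.get(("AQXE", "AQEU"))

# Replace values only on the rows where trade_type is DARK.
dark = table["trade_type"] == "DARK"
table.loc[dark, "execution_venue"] = table.loc[dark, "execution_venue"].replace(d)
print(table)
```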
I'm trying to filter a Pandas dataframe based on a set of or conditions, but they're all very similar, and I'm wondering if there's a more efficient way to write this.
Specifically, I want to include rows from the dataframe (df) where any of a set of variables is 1:
df.query("Q50r5==1 or Q50r6==1 or Q50r7==1 or Q50r8==1 or Q50r9==1 or Q50r10==1 or Q50r11==1")
This filters correctly to rows where any of these variables is 1.
However, I expect to have a lot more situations where I need to filter my dataframe to something similar, e.g.:
df.query("Q20r1==1 or Q20r2==1 or Q20r3==1")
df.query("Q23r2==1 or Q23r5==1 or Q23r7==1 or Q23r8==1")
The pandas documentation on .query() doesn't specify any more than that you can use and and or like you can elsewhere in Python, so it's possible this is the only way to do this query, but is there some kind of sum or count I could do across these columns within the query? Something like "any(1,Q20r1,Q20r2,Q20r3)" or "sum(Q20r1,Q20r2,Q20r3)>0"?
EDIT: For example, using this small dataframe:
I would want to retrieve ID #s 1,2,4,5,7 and exclude #s 3 and 6, because 3 and 6 do not have any 1's across the columns I'm referring to.
You can use any with axis = 1 to check that at least one value is True in a row.
For example, you can run
df[(df[["Q20r1", "Q20r2", "Q20r3"]] == 1).any(axis = 1)]
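Since you expect to repeat this for many column sets, you could wrap it in a small helper. This is only a sketch: the function name any_equal and the sample data are invented for illustration and are not the dataframe from the question.

```python
import pandas as pd

# Hypothetical sample data, not the dataframe from the question.
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Q20r1": [1, 0, 0, 0],
        "Q20r2": [0, 0, 0, 1],
        "Q20r3": [0, 1, 0, 1],
    }
)


def any_equal(frame, columns, value=1):
    """Return the rows of `frame` where at least one of `columns` equals `value`."""
    mask = (frame[columns] == value).any(axis=1)
    return frame[mask]


result = any_equal(df, ["Q20r1", "Q20r2", "Q20r3"])
print(result)
```

Each new filter is then a single call with a different list of column names.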
My dataframe looks roughly like this
Job
blue-collar
management
self-employed
admin
student
technician
I want to create a new column called "trades" that will have a "yes" for every entry where the job is blue-collar or technician.
This is what I am currently using:
df['trades'] = np.where(df['job']=="blue-collar", 'yes', df['trades'] = np.where(df['job']=="technician", 'yes','no'))
But this did not work
Any help is appreciated!
If you want to work with conditions you can do this:
df.loc[:,"trades"] = "no"
df.loc[(df['job']=="blue-collar")|(df['job']=="technician"),"trades"] = "yes"
With loc you can add new columns to your dataframe. The indexer takes two arguments inside the brackets: the first selects the rows you want to change (so : refers to every row), and the second is the name of the column.
So if you write df['job']=="blue-collar" as the first argument, only the rows that match this condition are changed.
df.loc[:,"trades"] = "no"
df.loc[df['job']=="blue-collar","trades"] = "yes"
df.loc[df['job']=="technician","trades"] = "yes"
I'd use either .isin or .str.contains method with np.where
df["trades"] = np.where(
df["job"].isin(["blue-collar", "technician"]),
# df["job"].str.contains("blue-collar|technician"), as an alternative
"yes",
"no"
)
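One thing to keep in mind when choosing between the two: .isin matches whole values, while .str.contains matches substrings (and treats the pattern as a regular expression by default). A quick sketch with made-up rows, where "lab technician" is an invented value added to show the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "lab technician" is invented for illustration.
df = pd.DataFrame(
    {"job": ["blue-collar", "management", "technician", "lab technician", "student"]}
)

# Exact match against the listed values.
exact = np.where(df["job"].isin(["blue-collar", "technician"]), "yes", "no")

# Substring/regex match: also catches "lab technician".
partial = np.where(df["job"].str.contains("blue-collar|technician"), "yes", "no")

df["trades_exact"] = exact
df["trades_partial"] = partial
print(df)
```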
I'm wanting to put multiple conditions into one variable so that I can determine the value I will insert into my column 'EmptyCol'. Please see below. Note: This works with one condition but I believe I'm missing something with multiple conditions
Condition = ((df['status']=='Live') and
(df['name'].str.startswith('A') and
(df['true']==1))
df.loc[Condition, 'EmptyCol'] = 'True'
Use "&" instead of "and"
Condition = ((df['status']=='Live') &
(df['name'].str.startswith('A')) &
(df['true']==1))
I also recommend using df.at.
I have sometimes had trouble with df.loc.
Condition = ((df['status']=='Live') &
(df['name'].str.startswith('A')) &
(df['true']==1))
def ChangeValueFunc(Record):
    # Record['index'] holds the original row label after reset_index()
    df.at[Record['index'],'EmptyCol'] = 'True'
df.loc[Condition, :].reset_index().apply(ChangeValueFunc, axis=1)
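To see why "and" fails and "&" works, here is a minimal sketch with made-up rows. Python's "and" asks each Series for a single truth value, which pandas refuses to give, while "&" combines the masks element by element.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame(
    {
        "status": ["Live", "Live", "Dead", "Live"],
        "name": ["Alpha", "Beta", "Apple", "Avocado"],
        "true": [1, 1, 1, 0],
        "EmptyCol": ["", "", "", ""],
    }
)

# "and" needs one True/False per operand, so pandas raises ValueError.
and_error = None
try:
    (df["status"] == "Live") and (df["true"] == 1)
except ValueError as exc:
    and_error = str(exc)

# "&" combines the boolean masks row by row; each comparison is parenthesized
# because & binds more tightly than ==.
condition = (
    (df["status"] == "Live")
    & df["name"].str.startswith("A")
    & (df["true"] == 1)
)
df.loc[condition, "EmptyCol"] = "True"
print(df)
```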
I am trying to use a for loop to assign a column with one of two values based on the value of another column. I created a list of the items I want to assign to one element, using else to assign the others. However, my code is only assigning the else value to the column. I also tried elif and it did not work. Here is my code:
#create list of aggressive reasons
aggressive = ['AGGRESSIVE - ANIMAL', 'AGGRESSIVE - PEOPLE', 'BITES']
#create new column assigning 'Aggressive' or 'Not Aggressive'
for reason in top_dogs_reason['Reason']:
    if reason in aggressive:
        top_dogs_reason['Aggression'] = 'Aggressive'
    else:
        top_dogs_reason['Aggression'] = 'Not Aggressive'
My new column top_dogs_reason['Aggression'] only has the value of Not Aggressive. Can someone please tell me why?
You should use loc for assignments like this, since it isolates the part of the dataframe you want to update. The first line sets the "Aggression" column for rows whose "Reason" value is in the list `aggressive`. The second line does the same for rows whose "Reason" value is not in the list.
top_dogs_reason.loc[top_dogs_reason['Reason'].isin(aggressive), 'Aggression'] = 'Aggressive'
top_dogs_reason.loc[~top_dogs_reason['Reason'].isin(aggressive), 'Aggression'] = 'Not Aggressive'
Or do it in one line, as Roganjosh explained, using np.where, which works much like an Excel IF statement. Here we're saying: if the reason is in aggressive, give us "Aggressive", otherwise "Not Aggressive", and assign that to the "Aggression" column:
top_dogs_reason['Aggression'] = np.where(top_dogs_reason['Reason'].isin(aggressive), "Aggressive", "Not Aggressive")
Or use anky_91's answer, which uses .map to map values. This is an effective way to feed a dictionary to a pandas Series: for each value in the Series, it looks up that value as a key in the dictionary and returns the corresponding value:
top_dogs_reason['Aggression'] = top_dogs_reason['Reason'].isin(aggressive).map({True:'Aggressive',False:'Not Aggressive'})
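As for why the original loop only produced "Not Aggressive": each pass assigns a single value to the entire column, so whatever the last row decides overwrites everything before it. A small sketch with made-up rows (the "STRAY" and "OWNER SURRENDER" reasons are invented for illustration) shows the difference:

```python
import pandas as pd

# Hypothetical sample data; "STRAY" and "OWNER SURRENDER" are invented values.
top_dogs_reason = pd.DataFrame(
    {"Reason": ["BITES", "STRAY", "AGGRESSIVE - PEOPLE", "OWNER SURRENDER"]}
)
aggressive = ["AGGRESSIVE - ANIMAL", "AGGRESSIVE - PEOPLE", "BITES"]

# Loop from the question: every iteration overwrites the whole column,
# so only the result for the last row survives.
for reason in top_dogs_reason["Reason"]:
    if reason in aggressive:
        top_dogs_reason["Aggression"] = "Aggressive"
    else:
        top_dogs_reason["Aggression"] = "Not Aggressive"
loop_result = top_dogs_reason["Aggression"].tolist()

# Vectorized version: one label per row.
top_dogs_reason["Aggression"] = (
    top_dogs_reason["Reason"]
    .isin(aggressive)
    .map({True: "Aggressive", False: "Not Aggressive"})
)
vectorized_result = top_dogs_reason["Aggression"].tolist()
print(loop_result)
print(vectorized_result)
```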