Python: Define subgroup of data with multiple conditions

I have a table with several dummy variables.
I would now like to create a subgroup listing the winpercent values of the rows where fruity=1 and hard=0. My first attempt was this one, but it was unsuccessful:
df6=full_data[full_data['fruity'&'hard']==['1'&'0'].iloc[:,-1]
Can anyone help, please?

Write the conditions one by one, each wrapped in parentheses and separated by the & operator:
full_data.loc[(full_data['fruity'] == 1) &
              (full_data['hard'] == 0), 'winpercent']
You can also query it:
full_data.query("fruity == 1 and hard == 0", inplace=False)['winpercent']
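For reference, here is a minimal runnable sketch on invented data (only the column names fruity, hard, and winpercent come from the question; full_data's real contents are unknown):
import pandas as pd
full_data = pd.DataFrame({
    'fruity':     [1, 1, 0, 1],
    'hard':       [0, 1, 0, 0],
    'winpercent': [66.7, 42.3, 50.1, 71.5],
})
# boolean mask: each condition in parentheses, combined with &
df6 = full_data.loc[(full_data['fruity'] == 1) &
                    (full_data['hard'] == 0), 'winpercent']
# query() returns the same Series
df6_query = full_data.query("fruity == 1 and hard == 0")['winpercent']
print(df6.tolist())  # [66.7, 71.5]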

Related

comparing two columns of a row in python dataframe

I know that one can compare a whole column of a dataframe and build a list of all rows that contain a certain value with:
values = parsedData[parsedData['column'] == valueToCompare]
But is it also possible to build such a list by comparing two columns against two values, like:
values = parsedData[parsedData['column01'] == valueToCompare01 and parsedData['column02'] == valueToCompare02]
Thank you!
It is completely possible, but and will not mask the dataframe: Python's and tries to reduce each boolean Series to a single True/False, which pandas refuses, so the element-wise & operator is what you want here. Note that each condition needs its own parentheses, because & binds more tightly than ==:
values = parsedData[(parsedData['column01'] == valueToCompare01) & (parsedData['column02'] == valueToCompare02)]
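To see the difference concretely, here is a small sketch on invented data (parsedData's real contents are an assumption):
import pandas as pd
parsedData = pd.DataFrame({'column01': [1, 2, 1],
                           'column02': ['a', 'a', 'b']})
# works: element-wise & with each comparison in parentheses
values = parsedData[(parsedData['column01'] == 1) &
                    (parsedData['column02'] == 'a')]
print(values)  # only the first row matches both conditions
# using 'and' instead raises: ValueError: The truth value of a Series is ambiguous
# parsedData[(parsedData['column01'] == 1) and (parsedData['column02'] == 'a')]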

Multiple Conditions in 1 Variable

I want to put multiple conditions into one variable so that I can determine the value I will insert into my column 'EmptyCol'. Please see below. Note: this works with one condition, but I believe I'm missing something with multiple conditions.
Condition = ((df['status']=='Live') and
             (df['name'].str.startswith('A') and
             (df['true']==1)))
df.loc[Condition, 'EmptyCol'] = 'True'
Use "&" instead of "and"
Condition = ((df['status']=='Live') &
(df['name'].str.startswith('A') &
(df['true']==1))
I also recommend using df.at; I have had trouble with df.loc sometimes!
Condition = ((df['status']=='Live') &
             (df['name'].str.startswith('A') &
             (df['true']==1)))
def ChangeValueFunc(Record):
    df.at[Record['index'], 'EmptyCol'] = 'True'
df.loc[Condition, :].reset_index().apply(lambda x: ChangeValueFunc(x), axis=1)
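For completeness, here is a self-contained sketch of the pattern on invented data (only the column names come from the question), using the plain df.loc assignment from the question itself:
import pandas as pd
df = pd.DataFrame({
    'status':   ['Live', 'Live', 'Closed'],
    'name':     ['Alpha', 'Beta', 'Apex'],
    'true':     [1, 1, 1],
    'EmptyCol': [None, None, None],
})
Condition = ((df['status'] == 'Live') &
             (df['name'].str.startswith('A')) &
             (df['true'] == 1))
df.loc[Condition, 'EmptyCol'] = 'True'
print(df)  # only the 'Alpha' row has EmptyCol set to 'True'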

Python DataFrames: finding "almost" identical rows

I have a DataFrame loaded with orders. Some of them contain negative quantities, and the reason for that is that they are actually cancellations of prior orders.
Problem: there is no unique key that can help me find which order corresponds to which cancellation.
So I've built the following code ('cancelations' is a subset of the original data containing only the rows that correspond to... well... cancellations):
for i, item in cancelations.iterrows():
    # find a row similar to the cancelation we are currently studying;
    # iterrows() yields (index, row) pairs, so item is the row as a Series
    mask1 = (copy['CustomerID'] == item['CustomerID'])
    mask2 = (copy['Quantity'] == item['Quantity'])
    mask3 = (copy['Description'] == item['Description'])
    subset = copy[mask1 & mask2 & mask3]
    if subset.shape[0] > 0:  # if we find one or several corresponding orders:
        print('possible corresponding orders:', subset.index.tolist())
        copy = copy.drop(subset.index.tolist()[0])  # remove only the first of them from the copy of the data
So, this works, but: first, it takes forever to run; and second, I read somewhere that whenever you find yourself writing complex code to manipulate dataframes, there's already a method for it.
So perhaps one of you knows something that could help me?
Thank you for your time!
Edit: note that sometimes we can have several orders that could correspond to the cancellation at hand. This is why I didn't use drop_duplicates with only some columns specified: it eliminates all duplicates (or all but one), and I need to drop only one of them.
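One vectorized direction, sketched here on invented data rather than as a definitive answer: a single merge on the matching columns finds every candidate (cancellation, order) pair at once, instead of scanning the frame once per cancellation. Like the loop above, it matches equal quantities (real data might need -Quantity), and picking "only one order per cancellation" still needs a rule, such as taking the first candidate:
import pandas as pd
# toy data: two identical orders, one cancellation that matches both
orders = pd.DataFrame({
    'CustomerID':  [1, 1, 2],
    'Quantity':    [5, 5, 3],
    'Description': ['mug', 'mug', 'bowl'],
})
cancelations = pd.DataFrame({
    'CustomerID':  [1],
    'Quantity':    [5],
    'Description': ['mug'],
})
# one merge finds all candidate pairs; reset_index() keeps the original row labels
candidates = cancelations.reset_index().merge(
    orders.reset_index(),
    on=['CustomerID', 'Quantity', 'Description'],
    suffixes=('_cancel', '_order'))
# keep only the first candidate order per cancellation, then drop it
first_match = candidates.groupby('index_cancel').first()
cleaned = orders.drop(first_match['index_order'])
print(cleaned)  # one of the two identical 'mug' orders is gone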

Searching a list of dataframes for a specific value

I scraped a bunch of tables of financial data using pandas.read_excel. I am trying to search through the list of dataframes and select only the ones that contain a certain value/string. Is it possible to do that? I had thought I could do something like:
search = [x.isin('string') for x in df_list]
You might want this (for each frame):
(df == 'foo').any().any()
That will return True if 'foo' is anywhere in the frame: the first any() collapses each column, the second collapses across columns.
df.isin(['string']).any().sum() > 0
This checks whether the word exists in each column and sums the boolean values across columns, so it is True if the word exists in at least one of them. Note that isin() requires a list-like argument.
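Tying both answers back to the original goal, a minimal sketch on invented data (df_list's real contents are unknown) that keeps only the frames containing the value:
import pandas as pd
df_list = [
    pd.DataFrame({'a': ['x', 'y'], 'b': ['z', 'w']}),
    pd.DataFrame({'a': ['string', 'y'], 'b': ['z', 'w']}),
]
matches = [df for df in df_list if (df == 'string').any().any()]
print(len(matches))  # 1: only the second frame contains 'string'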

Pandas - Selecting multiple dataframe criteria

I have a DataFrame with multiple columns and I need to set the criteria to access specific values from two different columns. I'm able to do it successfully on one column as shown here:
status_filter = df[df['STATUS'] == 'Complete']
But I'm struggling to specify values from two columns. I've tried something like this but get errors:
status_filter = df[df['STATUS'] == 'Complete' and df['READY TO INVOICE'] == 'No']
It may be a simple answer, but any help is appreciated.
Your code has two very small errors: 1) you need parentheses around each criterion when combining two or more, and 2) you need to use the ampersand (&) between your criteria instead of and:
status_filter = df[(df['STATUS'] == 'Complete') & (df['READY TO INVOICE'] == 'No')]
status_filter = df.loc[(df['STATUS'] == 'Complete') & (df['READY TO INVOICE'] == 'No'), :]
(The older .ix indexer is deprecated, so .loc is used here.)
You're welcome.
