Change a value based on another value in a DataFrame [duplicate] - python

If the product type is 'Option', I want to replace the value in the PRICE column with the value in the STRIKE column.
How can I do this without a for loop, to make it faster?
This is what I have now, but it is slow:
for i in range(df.shape[0]):
    if df.loc[i, 'type'] == 'Option':
        df.loc[i, 'PRICE'] = df.loc[i, 'STRIKE']

Use .loc with a boolean mask so the assignment is vectorized:
df.loc[df['type'] == 'Option', 'PRICE'] = df['STRIKE']
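A minimal sketch of this assignment on made-up data (the three rows below are invented for illustration; only the column names come from the question). The right-hand side is aligned on the index, so only the masked rows receive their STRIKE value.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "type": ["Option", "Stock", "Option"],
    "PRICE": [1.0, 2.0, 3.0],
    "STRIKE": [10.0, 20.0, 30.0],
})

# Rows whose type is 'Option' take the STRIKE value; other rows are untouched.
mask = df["type"] == "Option"
df.loc[mask, "PRICE"] = df["STRIKE"]

print(df)
```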

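Build a boolean mask and assign through .loc. Assigning to df[mask].PRICE would modify a temporary copy and leave df unchanged.
mask = (df['type'] == 'Option')
df.loc[mask, 'PRICE'] = df.loc[mask, 'STRIKE']
See:
https://www.geeksforgeeks.org/boolean-indexing-in-pandas/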

Related

Overwriting values using .loc [duplicate]

I want to conditionally overwrite some values for a given column in my DataFrame using this command
enq.dropna().loc[q16.apply(lambda x: x[:3].lower()) == 'oui', q16_] = 'OUI' # q16 = enq[column_name].dropna()
which has the form
df.dropna().loc[something == something_else, column_name] = new_value
I don't get any error but when I check the result, I see that nothing has changed.
Thanks for reading and helping.
The problem is that dropna() returns a new DataFrame, a copy of enq, so the assignment modifies that copy and not enq. Do it in two steps:
enq.dropna(inplace=True)
enq.loc[q16.apply(lambda x: x[:3].lower()) == 'oui', q16_] = 'OUI'
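To illustrate the point about copies, here is a small sketch on invented data (the frame and its single column q16 are assumptions, not the asker's data). It also shows a variation on the answer above: building the mask on the original frame, so no rows need to be dropped, since missing values simply compare as False.

```python
import pandas as pd

# Hypothetical stand-in for the survey frame `enq`.
enq = pd.DataFrame({"q16": ["oui, souvent", "Non", None, "OUI merci"]})

# dropna() returns a new DataFrame; enq itself still contains the missing row.
cleaned = enq.dropna()
print(len(cleaned), len(enq))

# Build the mask on enq and assign on enq, so the change is kept.
# Rows with a missing value produce NaN here, which compares as False.
mask = enq["q16"].str[:3].str.lower() == "oui"
enq.loc[mask, "q16"] = "OUI"

print(enq)
```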

How can I use multiple .contains() inside a .when() in pySpark? [duplicate]

I am trying to create classes in a new column, based on words that appear in another column. For that, I need to combine multiple .contains() conditions, but none of my attempts work.
def classes_creation(data):
    df = data.withColumn("classes", when(data.where(F.col("MISP_RFW_Title").like('galleys') | F.col("MISP_RFW_Title").like('coffee')),"galleys") ).otherwise(lit(na))
    return df
# RETURNS ERROR
def classes_creation(data):
    df = data.withColumn("classes", when(col("MISP_RFW_Title").contains("galleys").contains("word"), 'galleys').otherwise(lit(na)))
    return df
# RETURNS COLUMN OF NA ONLY
def classes_creation(data):
    df = data.withColumn("classes", when(col("MISP_RFW_Title").contains("galleys" | "word"), 'galleys').otherwise(lit(na)))
    return df
# RETURNS COLUMN OF NA ONLY
If I understood your requirements correctly, you can match with a regular expression using rlike:
data.withColumn("classes", when(col("MISP_RFW_Title").rlike("galleys|word"), 'galleys').otherwise('a'))
Alternatively, combine several contains() conditions with |, wrapping the combined condition in parentheses. This form also works when the conditions are on different columns:
data.withColumn("classes", when((col("MISP_RFW_Title").contains("galleys") | col("MISP_RFW_Title").contains("word")), 'galleys').otherwise('a'))

Count instances in a dataframe [duplicate]

I have a dataframe containing a column of values (X).
df = pd.DataFrame({'X' : [2,3,5,2,2,3,7,2,2,7,5,2]})
For each row, I would like to find how many times its value of X appears (A).
My expected output is:
Create a temporary column of ones, group by X, and use transform to count the rows in each group:
import pandas as pd
df = pd.DataFrame({'X' : [2,3,5,2,2,3,7,2,2,7,5,2]})
df['temp'] = 1
# transform returns one count per original row, aligned on the index
df['count'] = df.groupby('X')['temp'].transform('count')
del df['temp']
print(df)
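The temporary column is not strictly required. As a sketch of two alternatives on the same data, the count can be taken on X itself, or looked up from value_counts(). The column names A and A_alt are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"X": [2, 3, 5, 2, 2, 3, 7, 2, 2, 7, 5, 2]})

# Count within each group of X and broadcast the result back to every row.
df["A"] = df.groupby("X")["X"].transform("count")

# Same result as a lookup: count each distinct value once, then map it back.
df["A_alt"] = df["X"].map(df["X"].value_counts())

print(df)
```

With this input, the value 2 appears six times and 3, 5 and 7 appear twice each, so both columns hold 6 on the rows where X is 2 and 2 elsewhere.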

Signal conditional for column [duplicate]

Using OHLCV data for NVIDIA downloaded from Yahoo Finance, I am creating a buy/don't-buy signal column.
When I try to flag the rows where the volume is above the average volume, the whole column comes out as either all 'Buy' or all 'Dont Buy'.
df=pd.read_csv('NVDA.csv',dtype={'label':str})
df['Price%delta']=((df['Close']/df['Open'])*100)
df['Avg_volume']=df['Volume'].rolling(7).mean()
df['Signal']=0
for index, row in df.iterrows():
    if row['Volume'] > row['Avg_volume']:
        df['Signal']='Buy'
    else:
        df['Signal']='Dont Buy'
You don't really need the for loop at all:
mask = df["Volume"] > df["Avg_volume"]
df.loc[mask, "Signal"] = "Buy"
df.loc[~mask, "Signal"] = "Don't buy"
You are not specifying the row in which to assign 'Buy' or 'Dont Buy', so each iteration overwrites the whole column. Use loc with the row index instead:
for index, row in df.iterrows():
    if row['Volume'] > row['Avg_volume']:
        df.loc[index, 'Signal']='Buy'
    else:
        df.loc[index, 'Signal']='Dont Buy'
A vectorized solution using np.where():
import numpy as np
df['Signal'] = np.where(df['Volume'] > df['Avg_volume'], 'Buy', 'Dont Buy')
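A small sketch of the np.where() approach on invented volumes (the numbers and the window of 3 are assumptions chosen to keep the example short; the question uses a window of 7 on NVDA.csv). Note that the first rows of a rolling mean are NaN, and any comparison with NaN is False, so those rows fall into the 'Dont Buy' branch.

```python
import numpy as np
import pandas as pd

# Hypothetical volumes; real data would be read from NVDA.csv.
df = pd.DataFrame({"Volume": [10, 20, 30, 5, 50]})

# Rolling mean over 3 rows; the first two results are NaN.
df["Avg_volume"] = df["Volume"].rolling(3).mean()

# Element-wise choice: 'Buy' where the condition holds, 'Dont Buy' otherwise.
df["Signal"] = np.where(df["Volume"] > df["Avg_volume"], "Buy", "Dont Buy")

print(df)
```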

remove rows from dataframe where contents could be a choice of strings [duplicate]

I can do something like:
data = df[ df['Proposal'] != 'C000' ]
to remove all proposals with the string C000, but how can I do something like:
data = df[ df['Proposal'] not in ['C000','C0001'] ]
to remove all proposals that match either C000 or C0001 (and so on)?
You can drop the matching rows by their index:
df = df.drop(df[df['Proposal'].isin(['C000','C0001'])].index)
Or select only the rows you want to keep:
df = df[~df['Proposal'].isin(['C000','C0001'])]
import numpy as np
data = df.loc[np.logical_not(df['Proposal'].isin({'C000','C0001'})), :]
# or
data = df.loc[ ~df['Proposal'].isin({'C000','C0001'}) , :]
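As a quick check of the "not in" pattern, here is a sketch on made-up proposals (the values A123 and B456 and the name excluded are invented for illustration). isin() builds a boolean mask of matches and ~ inverts it, so only the non-matching rows remain.

```python
import pandas as pd

# Hypothetical data; only the column name comes from the question.
df = pd.DataFrame({"Proposal": ["C000", "A123", "C0001", "B456"]})

excluded = ["C000", "C0001"]

# True where Proposal is one of the excluded values.
is_excluded = df["Proposal"].isin(excluded)

# Keep the rows that are NOT in the excluded list.
data = df[~is_excluded]

print(data)
```

The original index labels are kept on the filtered rows; calling reset_index(drop=True) on the result would renumber them if that is preferred.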
