I have a column called trip_cost and I want to create a new column called level ('Nivel' in my code).
I am doing this:
df['Nivel'] = ''
for i in df.trip_cost:
    if i < 65.0:
        df['Nivel'] = 'low'
    elif 65.0 <= i <= 82.0:
        df['Nivel'] = 'medium'
    else:
        df['Nivel'] = 'high'
The problem is that the whole column gets the level 'low', even for rows that should be 'medium' or 'high'.
Am I doing something wrong?
Each assignment inside your loop sets the entire column, so the column ends up holding whatever level the last row gets; your last record probably has trip_cost < 65. I suggest something like this:
def label_nivel(row):
    if row['trip_cost'] < 65:
        return 'low'
    if row['trip_cost'] <= 82:  # 82 itself counts as 'medium', as in the question
        return 'medium'
    return 'high'

# axis=1 passes each row to label_nivel
df['Nivel'] = df.apply(label_nivel, axis=1)
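For larger frames, a vectorized alternative to apply is numpy.select, which evaluates each condition on the whole column at once and picks the first matching label per row. A minimal sketch, assuming the thresholds from the question (below 65 is low, 65 to 82 inclusive is medium, above 82 is high); the sample data is made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'trip_cost': [40.0, 65.0, 70.5, 82.0, 90.0]})

# Conditions are checked in order; the first True one wins for each row
conditions = [
    df['trip_cost'] < 65.0,
    df['trip_cost'] <= 82.0,  # only reached when trip_cost >= 65
]
choices = ['low', 'medium']

# Rows matching neither condition fall through to the default
df['Nivel'] = np.select(conditions, choices, default='high')
print(df)
```

Note that a NaN trip_cost fails both comparisons and would be labelled 'high' here, so missing values may need separate handling.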
If the product type is 'Option', I replace the value in the PRICE column with the value of the STRIKE column.
How can I do this without a for loop, to make it faster?
Currently I have the following, but it's slow:
for i in range(df.shape[0]):
    if df.loc[i, 'type'] == 'Option':
        df.loc[i, 'PRICE'] = df.loc[i, 'STRIKE']
Use .loc in a vectorized fashion:
df.loc[df['type'] == 'Option', 'PRICE'] = df['STRIKE']
mask = (df['type'] == 'Option')
# df[mask].PRICE = ... would write to a temporary copy and leave df unchanged
df.loc[mask, 'PRICE'] = df.loc[mask, 'STRIKE']
See:
https://www.geeksforgeeks.org/boolean-indexing-in-pandas/
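The reason a single .loc call is needed: chained indexing such as df[mask].PRICE = ... first builds a new DataFrame with df[mask], so the assignment never reaches df. A small sketch with made-up data comparing the .loc assignment with a non-mutating Series.where version:

```python
import pandas as pd

df = pd.DataFrame({
    'type': ['Option', 'Stock', 'Option'],
    'PRICE': [10.0, 20.0, 30.0],
    'STRIKE': [1.0, 2.0, 3.0],
})
original = df.copy()
mask = df['type'] == 'Option'

# A single .loc call selects rows and column together and writes into df
df.loc[mask, 'PRICE'] = df.loc[mask, 'STRIKE']

# Non-mutating alternative: take STRIKE where the mask is True, else PRICE
alt = original['STRIKE'].where(mask, original['PRICE'])

print(df['PRICE'].tolist())  # [1.0, 20.0, 3.0]
```

The where form is convenient when you want a new Series (for example to assign to a different column) instead of modifying PRICE in place.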
I want to check whether each value in a column exists in another DataFrame (df2) and whether a matching row in df2 has a ticket date within 3 days after its date, or whether the row meets other conditions.
The code I've written works, but I want to know if there's a better or more efficient solution to this problem.
Example:
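import pandas as pd

def check_answer(row):
    # row is one row of df; df2 is the second DataFrame
    if row.ticket_count == 1:
        return 'Yes'
    elif (row.ticket_count > 0) and (row.occurrences == 1):
        return 'Yes'
    # Ticket dates in df2 for the same part number
    ticket_dates = df2.loc[df2.partnumber == row.partnumber, 'ticket_date']
    # One ticket must fall in [date, date + 3 days]; two separate any() calls
    # could be satisfied by two different tickets
    in_window = (ticket_dates >= row['date']) & (ticket_dates <= row['date'] + pd.DateOffset(days=3))
    if in_window.any():
        return 'Yes'
    return 'No'

df['result'] = df.apply(check_answer, axis=1)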
You could try using a list comprehension instead of apply.
Here's an example:
list comprehension in pandas
And if you need to create a copy of your DataFrame with new columns containing the results of your conditions, you can check this example: Pandas DataFrame Comprehensions
I hope this helps.
Best regards.
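A different technique (swapped in here, not the one linked above) is to drop the row-wise function entirely: merge df with df2 on partnumber, test the date window on all pairs at once, and collapse back to one flag per row. A minimal sketch, assuming date and ticket_date are datetime64 columns and using made-up sample data; pd.Timedelta(days=3) stands in for pd.DateOffset(days=3), which is equivalent for whole days:

```python
import numpy as np
import pandas as pd

def check_answer_vectorized(df, df2, days=3):
    # Tag each row of df with its position so results can be mapped back
    left = df[['partnumber', 'date']].assign(_row=np.arange(len(df)))

    # Pair every row with every ticket for the same part number;
    # rows without any ticket keep NaT as ticket_date (how='left')
    merged = left.merge(df2[['partnumber', 'ticket_date']],
                        on='partnumber', how='left')

    # A single ticket must satisfy both bounds; NaT compares as False
    in_window = (
        (merged['ticket_date'] >= merged['date'])
        & (merged['ticket_date'] <= merged['date'] + pd.Timedelta(days=days))
    )

    # Collapse back to one boolean per original row of df
    has_ticket = (
        in_window.groupby(merged['_row']).any()
        .reindex(np.arange(len(df)), fill_value=False)
        .to_numpy()
    )

    cond = (
        (df['ticket_count'] == 1).to_numpy()
        | ((df['ticket_count'] > 0) & (df['occurrences'] == 1)).to_numpy()
        | has_ticket
    )
    return np.where(cond, 'Yes', 'No')


# Hypothetical sample data
df = pd.DataFrame({
    'partnumber': ['A', 'B', 'C', 'D'],
    'date': pd.to_datetime(['2021-01-01'] * 4),
    'ticket_count': [1, 2, 0, 0],
    'occurrences': [5, 1, 3, 3],
})
df2 = pd.DataFrame({
    'partnumber': ['C', 'C', 'D'],
    'ticket_date': pd.to_datetime(['2021-01-03', '2020-12-01', '2021-01-10']),
})

df['result'] = check_answer_vectorized(df, df2)
print(df)
```

The trade-off is memory: the merge creates one row per (row, ticket) pair, so a part number with many tickets can make the intermediate frame large.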
I have a DataFrame with missing (NA) values, blank cells, and other values. I want to replace the missing values with the string 'noted'.
I want to do this without changing the blank cells.
I already tried
df['A']=df['A'].replace(regex=['NaN'], value='needed')
and
df['A'].replace(regex=['NA'], value='noted')
You can use fillna(); it returns a new Series, so assign the result back:
df['A'] = df['A'].fillna('noted')
Alternatively, if NA is the string 'NA' and not np.nan, then you can use replace():
df['A'] = df['A'].replace(['NA'], 'noted')
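You can use the fillna() method on the DataFrame, passing a dict that maps the column to its fill value (calling fillna(..., inplace=True) on df['A'] is chained assignment, which recent pandas versions warn about):
df.fillna({'A': 'Noted'}, inplace=True)
OR
df['A'] = df['A'].fillna('Noted')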
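To fill missing values without changing blank strings, you can use a mapping function:
import pandas as pd

def fillna_not_blanks(value):
    # Leave blank (empty or whitespace-only) strings untouched
    if isinstance(value, str) and value.strip() == '':
        return value
    # value == np.nan is always False, so test with pd.isna instead
    elif pd.isna(value):
        return 'Noted'
    else:
        return value

df['A'] = df['A'].map(fillna_not_blanks)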
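A quick check of how fillna treats blank strings, using made-up data: fillna only fills real missing values (NaN or None), so an empty string is left alone even without a custom function. The literal 'NA' string in the sample is an assumption, covering the case where missing values were stored as text:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['x', np.nan, '', 'NA', None, 'y']})

# fillna fills NaN and None; '' is an ordinary string and is kept
filled = df['A'].fillna('noted')

# If some missing values are the literal string 'NA', replace those too
filled = filled.replace('NA', 'noted')

df['A'] = filled
print(df['A'].tolist())  # ['x', 'noted', '', 'noted', 'noted', 'y']
```

If the data comes from read_csv, note that by default both empty fields and the string 'NA' are parsed as NaN, so blanks and NA can no longer be told apart afterwards; the keep_default_na and na_values parameters of read_csv control this.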
I downloaded OHLCV data for NVIDIA from Yahoo Finance.
I am creating a Signal column with 'Buy'/'Dont Buy', but when I test whether Volume is above Avg_volume, the whole column comes out either all 'Buy' or all 'Dont Buy'.
df = pd.read_csv('NVDA.csv', dtype={'label': str})
df['Price%delta'] = ((df['Close'] / df['Open']) * 100)
df['Avg_volume'] = df['Volume'].rolling(7).mean()
df['Signal'] = 0
for index, row in df.iterrows():
    if row['Volume'] > row['Avg_volume']:
        df['Signal'] = 'Buy'
    else:
        df['Signal'] = 'Dont Buy'
You don't really need the for loop at all:
mask = df["Volume"] > df["Avg_volume"]
df.loc[mask, "Signal"] = "Buy"
df.loc[~mask, "Signal"] = "Dont Buy"
Inside the loop you assign to the whole column rather than to the current row, so every iteration overwrites all rows and only the last row's result survives. Use loc with the row index instead:
for index, row in df.iterrows():
    if row['Volume'] > row['Avg_volume']:
        df.loc[index, 'Signal'] = 'Buy'
    else:
        df.loc[index, 'Signal'] = 'Dont Buy'
A vectorized solution using np.where():
df['Signal'] = np.where(df['Volume'] > df['Avg_volume'], 'Buy', 'Dont Buy')
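One detail worth checking with np.where: the rolling mean is NaN for the first window-1 rows, and any comparison with NaN is False, so those rows silently get 'Dont Buy'. A small sketch with made-up volumes and a 3-row window (the question uses 7) that shows this and optionally blanks out those warm-up rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Volume': [100, 120, 90, 200, 80, 150]})
df['Avg_volume'] = df['Volume'].rolling(3).mean()

# Element-wise comparison over whole columns; NaN rows compare as False
df['Signal'] = np.where(df['Volume'] > df['Avg_volume'], 'Buy', 'Dont Buy')

# Optional: no signal while the rolling mean is not yet defined
df.loc[df['Avg_volume'].isna(), 'Signal'] = None
print(df)
```

Whether the warm-up rows should be 'Dont Buy' or empty is a modelling choice; the last assignment can simply be dropped if the default is acceptable.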
I am attempting to loop through two columns in my dataframe and add either a 1 or 0 to a new column based on the two aforementioned column values. For example, if Column A is > Column B then add a 1 to Column C. However, I keep receiving the following error and I'm not sure why.
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
My code:
for i in df.itertuples():
    if df['AdjClose'] > df['30ma']:
        df['position'] = 1
    elif df['AdjClose'] < df['30ma']:
        df['position'] = 0
You aren't actually using the rows you loop over. In your if statement, df['AdjClose'] > df['30ma'] compares the whole columns, so the condition is a boolean Series with one value per row rather than True or False. Python can't decide whether a whole Series is true, hence the error. A more correct way to write your code would be:
for i in range(len(df)):
    if df.loc[i, 'AdjClose'] > df.loc[i, '30ma']:
        df.loc[i, 'position'] = 1
    elif df.loc[i, 'AdjClose'] < df.loc[i, '30ma']:
        df.loc[i, 'position'] = 0
A shorter, cleaner, and more pandas-y way to write the code that also has the benefit of running faster would be:
df.loc[df.AdjClose > df['30ma'], 'position'] = 1
df.loc[df.AdjClose < df['30ma'], 'position'] = 0
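A small sketch with made-up prices showing what the two .loc lines produce: rows where AdjClose equals 30ma match neither mask and are left as NaN. If ties should simply count as 0, converting a single boolean mask to int gives a complete 0/1 column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'AdjClose': [10.0, 12.0, 11.0, 9.0],
    '30ma': [11.0, 11.0, 11.0, 10.0],
})

# The two .loc assignments; the first one creates the column (NaN elsewhere)
df.loc[df['AdjClose'] > df['30ma'], 'position'] = 1
df.loc[df['AdjClose'] < df['30ma'], 'position'] = 0

# Row 2 is a tie (11.0 == 11.0), so its position stays NaN
print(df)

# Alternative if ties should be 0: True -> 1, False -> 0
df['position_int'] = (df['AdjClose'] > df['30ma']).astype(int)
```

Because of the NaN, position is a float column; position_int stays integer.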
I highly recommend reading the docs on indexing; it can be a bit tricky in pandas at first. https://pandas.pydata.org/pandas-docs/stable/indexing.html
Edit:
Note that the for-loop code assumes your index consists of unique values ranging from 0 to n-1. It's a bit more complicated if you have a different index. See https://pandas.pydata.org/pandas-docs/stable/whatsnew.html#deprecate-ix
Your code is calling df.itertuples, but not using the result. You could fix that using one of Ian Kent's suggestions, or something like this:
for row in df[['AdjClose', '30ma']].itertuples():
    if row[1] > row[2]:  # note: row[0] is the index value
        df.loc[row.Index, 'position'] = 1
    elif row[1] < row[2]:
        df.loc[row.Index, 'position'] = 0
If your columns all had names that were valid Python identifiers (for example, if '30ma' were renamed to 'ma30', since an identifier cannot start with a digit), you could use something neater:
for row in df.itertuples():
    if row.AdjClose > row.ma30:
        df.loc[row.Index, 'position'] = 1
    elif row.AdjClose < row.ma30:
        df.loc[row.Index, 'position'] = 0
Note that neither of these will work if the index for df has duplicate values.
You might also be able to use df.apply, like this:
import numpy as np

def pos(row):
    if row['AdjClose'] > row['30ma']:
        return 1
    elif row['AdjClose'] < row['30ma']:
        return 0
    else:
        return np.nan  # undefined when the two values are equal

# axis=1 passes each row (not each column) to the function
df['position'] = df.apply(pos, axis=1)
or just
df['position'] = df.apply(lambda row: 1 if row['AdjClose'] > row['30ma'] else 0, axis=1)
This should work even if the index has duplicate values. However, you have to define a value for every row, even the ones where row['AdjClose'] == row['30ma'].
Overall, you're probably best off with Ian Kent's second recommendation.
You're trying to test a boolean over multiple values (similar to if pd.Series([False, True, False]), where it is not clear what the result should be), so pandas raises that error.
The message suggests you could use any() to check whether any value in the comparison is True.
So maybe something like this:
for i in df.itertuples():
    if (df['AdjClose'] > df['30ma']).any():
        df['position'] = 1
    elif (df['AdjClose'] < df['30ma']).any():
        df['position'] = 0
Note that this only avoids the error: each branch still assigns one value to the entire column, so every row ends up with the same position.
See the pandas documentation section "Using if/truth statements with pandas" for further details.
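A minimal sketch of the behaviour described above, with made-up values: an if statement on a Series raises the ValueError, an explicit any() or all() reduces it to one bool, and element-wise logic needs & and | (with each comparison in parentheses) instead of and/or:

```python
import pandas as pd

s = pd.Series([False, True, False])

# Python's `if` calls bool() on the Series, which pandas refuses to guess
try:
    if s:
        pass
    ambiguous = False
except ValueError as exc:
    ambiguous = True
    print(exc)

# Reduce explicitly when you really want one answer for the whole Series
print(s.any(), s.all())  # True False

# Element-wise combination of two conditions, one result per row
a = pd.Series([1, 5, 3])
b = pd.Series([2, 4, 3])
both = (a > 2) & (b > 2)
print(both.tolist())  # [False, True, True]
```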