Overwriting values using .loc [duplicate] - python

This question already has answers here:
Try to replace a specific value in a dataframe, but does not overwritte it
(1 answer)
Changing values in pandas dataframe does not work
(1 answer)
Closed 2 years ago.
I want to conditionally overwrite some values for a given column in my DataFrame using this command
enq.dropna().loc[q16.apply(lambda x: x[:3].lower()) == 'oui', q16_] = 'OUI' # q16 = enq[column_name].dropna()
which has the form
df.dropna().loc[something == something_else, column_name] = new_value
I don't get any error but when I check the result, I see that nothing has changed.
Thanks for reading and helping.

The problem is that dropna() returns a new DataFrame, a copy of enq, so the assignment modifies that temporary copy and enq itself never changes. Do it in two steps:
enq.dropna(inplace=True)
enq.loc[q16.apply(lambda x: x[:3].lower()) == 'oui', q16_] = 'OUI'
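As a minimal sketch of the two-step fix, assuming a small made-up frame in place of the real enq (the column name q16 and the sample values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's `enq` DataFrame
enq = pd.DataFrame({"q16": ["oui merci", "Oui", "non", np.nan]})

# dropna() hands back a separate object, so writes to it never reach `enq`
cleaned = enq.dropna()
is_same_object = cleaned is enq

# Step 1: drop the missing rows on the original frame
enq.dropna(inplace=True)

# Step 2: build the mask from the cleaned frame and assign with .loc
mask = enq["q16"].str.lower().str.startswith("oui")
enq.loc[mask, "q16"] = "OUI"

result = enq["q16"].tolist()
```

Building the mask after the rows are dropped keeps the mask and the frame on the same index.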

Related

change a value based on other value in dataframe [duplicate]

This question already has answers here:
Pandas DataFrame: replace all values in a column, based on condition
(8 answers)
Conditional Replace Pandas
(7 answers)
Closed 1 year ago.
If product type == option, I replace the value in the PRICE column with the value of the STRIKE column.
How can I do this without using the for loop? (to make it faster)
Now I have the following but it's slow:
for i in range(df.shape[0]):
    if df.loc[i, 'type'] == 'Option':
        df.loc[i, 'PRICE'] = df.loc[i, 'STRIKE']
Use .loc in a vectorized fashion
df.loc[df['type'] == 'Option', 'PRICE'] = df['STRIKE']
mask = (df['type'] == 'Option')
df.loc[mask, 'PRICE'] = df.loc[mask, 'STRIKE']
see:
https://www.geeksforgeeks.org/boolean-indexing-in-pandas/
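The vectorized replacement can be checked on a small frame. This is only a sketch; the sample values are invented, and only the column names come from the question:

```python
import pandas as pd

# Hypothetical sample data using the question's column names
df = pd.DataFrame({
    "type": ["Option", "Stock", "Option"],
    "PRICE": [1.0, 2.0, 3.0],
    "STRIKE": [10.0, 20.0, 30.0],
})

# Boolean mask selecting the rows to overwrite
mask = df["type"] == "Option"

# A single .loc call writes to the original frame; the right-hand side
# is aligned on the index, so only the masked rows receive new values
df.loc[mask, "PRICE"] = df.loc[mask, "STRIKE"]

prices = df["PRICE"].tolist()
```

Selecting with df[mask] first and assigning to the result would write to a temporary copy, which is why the row mask and the column label go into one .loc call.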

Is there any method in Python to replace NaN in a DataFrame with a string without affecting blank cells? [duplicate]

This question already has an answer here:
Replacing the missing values in pandas
(1 answer)
Closed 3 years ago.
I have a DataFrame with NA values, blank cells and other values. I want to replace NA with the string value "noted".
I want to transform the data without changing the blank cells.
I already tried
df['A']=df['A'].replace(regex=['NaN'], value='needed')
and
df['A'].replace(regex=['NA'], value='noted')
You can use fillna(), assigning the result back because it returns a new Series:
df['A'] = df['A'].fillna('noted')
Alternatively, if NA is a string and not np.nan, you can use replace():
df['A'] = df['A'].replace(['NA'], 'noted')
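You can use the fillna() method:
df['A'] = df['A'].fillna('Noted')
The inplace form, df['A'].fillna('Noted', inplace=True), operates on a selected column and may not update df in recent pandas versions, so assigning the result back is safer.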
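To replace NaN without changing blanks, you can use a mapping function:
import pandas as pd

def fillna_not_blanks(value):
    # Test for missing values first: NaN == np.nan is always False,
    # and a float NaN has no .strip() method
    if pd.isna(value):
        return 'Noted'
    # Blank strings and all other values pass through unchanged
    return value

df['A'] = df['A'].map(fillna_not_blanks)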
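A short sketch of how the two cases differ, assuming a made-up column that holds an empty string, a real missing value and the literal text "NA":

```python
import numpy as np
import pandas as pd

# Hypothetical column: a normal value, a blank string, a real NaN, the text "NA"
df = pd.DataFrame({"A": ["x", "", np.nan, "NA"]})

# fillna() only touches real missing values; the blank string is not missing
filled = df["A"].fillna("noted")

# replace() handles the case where "NA" is stored as ordinary text
replaced = filled.replace("NA", "noted")

filled_values = filled.tolist()
replaced_values = replaced.tolist()
```

Since an empty string is not treated as missing, fillna() already leaves blank cells alone.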

How can I use multiple .contains() inside a .when() in pySpark? [duplicate]

This question already has answers here:
PySpark: multiple conditions in when clause
(4 answers)
Closed 3 years ago.
I am trying to create classes in a new column, based on words found in another column. For that, I need to combine multiple .contains() conditions, but none of the attempts below works.
def classes_creation(data):
    df = data.withColumn("classes", when(data.where(F.col("MISP_RFW_Title").like('galleys') | F.col("MISP_RFW_Title").like('coffee')),"galleys") ).otherwise(lit(na))
    return df
# RETURNS ERROR
def classes_creation(data):
    df = data.withColumn("classes", when(col("MISP_RFW_Title").contains("galleys").contains("word"), 'galleys').otherwise(lit(na))
    return df
# RETURNS COLUMN OF NA ONLY
def classes_creation(data):
    df = data.withColumn("classes", when(col("MISP_RFW_Title").contains("galleys" | "word"), 'galleys').otherwise(lit(na))
    return df
# RETURNS COLUMN OF NA ONLY
If I understood your requirements correctly, you can use regex for matching with rlike
data.withColumn("classes", when(col("MISP_RFW_Title").rlike("galleys|word"), 'galleys').otherwise('a'))
Or, if the conditions involve different columns, you can combine separate contains() calls with |, wrapping the combined condition in parentheses:
data.withColumn("classes", when((col("MISP_RFW_Title").contains("galleys")|col("MISP_RFW_Title").contains("word")), 'galleys').otherwise('a'))

remove rows from dataframe where contents could be a choice of strings [duplicate]

This question already has answers here:
dropping rows from dataframe based on a "not in" condition [duplicate]
(2 answers)
Closed 4 years ago.
I can do something like:
data = df[ df['Proposal'] != 'C000' ]
to remove all proposals with the string C000, but how can I do something like:
data = df[ df['Proposal'] not in ['C000','C0001'] ]
to remove all proposals that match either C000 or C0001 (and so on)?
You can try this,
df = df.drop(df[df['Proposal'].isin(['C000','C0001'])].index)
Or to select the required ones,
df = df[~df['Proposal'].isin(['C000','C0001'])]
import numpy as np
data = df.loc[np.logical_not(df['Proposal'].isin({'C000','C0001'})), :]
# or
data = df.loc[ ~df['Proposal'].isin({'C000','C0001'}) , :]
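As a quick check of the negated isin() filter, here is a sketch on invented data (the value column and the extra proposal codes are hypothetical):

```python
import pandas as pd

# Hypothetical data using the question's 'Proposal' column
df = pd.DataFrame({
    "Proposal": ["C000", "C0001", "C0002", "C0003"],
    "value": [1, 2, 3, 4],
})

excluded = ["C000", "C0001"]

# isin() gives a boolean Series; ~ inverts it to keep the non-matching rows
data = df[~df["Proposal"].isin(excluded)]

kept = data["Proposal"].tolist()
```

Python's `not in` cannot be applied to a whole Series, which is why the element-wise isin() combined with ~ is used instead.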

Correct way to set value on a slice in pandas [duplicate]

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 6 years ago.
I have a pandas DataFrame, data, with columns ["name", 'A', 'B'].
What I want to do (and works) is:
d2 = data[data['name'] == 'fred'] #This gives me multiple rows
d2['A'] = 0
This will set the column A on the fred rows to 0.
I've also done:
indexes = d2.index
data['A'][indexes] = 0
However, both give me the same warning:
/Users/brianp/work/cyan/venv/lib/python2.7/site-packages/pandas/core/indexing.py:128: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
How does pandas WANT me to do this?
This is a very common warning from pandas. It means you are writing to a copy of a slice rather than to the original data, so the change may not reach the original columns; the usual cause is chained assignment. Please read this post. It has a detailed discussion of SettingWithCopyWarning. In your case, I think you can try a single .loc assignment:
data.loc[data['name'] == 'fred', 'A'] = 0
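A minimal sketch of both situations, assuming made-up values for the name, A and B columns: a single .loc call when the original frame should change, and an explicit .copy() when an independent subset is wanted.

```python
import pandas as pd

# Hypothetical data with the question's columns
data = pd.DataFrame({
    "name": ["fred", "anna", "fred"],
    "A": [1, 2, 3],
    "B": [4, 5, 6],
})

# Row mask and column label in one .loc call: writes to `data` itself
data.loc[data["name"] == "fred", "A"] = 0

# If a separate subset is wanted, copy it explicitly; later edits to d2
# are then clearly independent of `data`
d2 = data[data["name"] == "fred"].copy()
d2["B"] = -1

a_values = data["A"].tolist()
b_values = data["B"].tolist()
d2_b_values = d2["B"].tolist()
```

The explicit copy states the intent that d2 is a separate object, which is the ambiguity the warning is pointing at.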
