Avoid raising SettingWithCopyWarning - python

Suppose I have a dataframe df with columns a, b, c, d and I want to subtract mean of the columns from columns a,b,d. How do I achieve the same?
I have tried df[['a','b','d']] = df[['a','b','d']] - df[['a','b','d']].mean() but I get SettingWithCopyWarning. How do I achieve the same without the warning?

df[['a','b','d']] is like a view of the original dataframe, and trying to set values in a view may or may not work every time.
Do it separately for each column:
df['a'] = df['a'] - df['a'].mean()
df['b'] = df['b'] - df['b'].mean()
df['d'] = df['d'] - df['d'].mean()
It doesn't make much difference in performance.

When you modify a selection of a dataframe, e.g. df[['a','b','d']], pandas cannot always tell whether you are writing to the original data or to a temporary copy, which can lead to unexpected behavior. The warning is raised to tell you that the assignment may be landing on a copy rather than on the dataframe you meant to change. To avoid it, you can try:
mean = df[['a','b','d']].mean()
df[['a','b','d']] = df[['a','b','d']] - mean
or
df.loc[:, ['a','b','d']] = df[['a','b','d']] - df[['a','b','d']].mean()

Are you sure you're getting the warning at that statement/line?
Anyway, to be more pandorable and reduce visual noise, I would do:
cols = ["a", "b", "d"]
df[cols] = df[cols].sub(df[cols].mean())
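A common reason the warning shows up on a line like this is that df was itself produced by filtering another dataframe. The following is a minimal sketch under that assumption; the frame raw, its values, and the filter are hypothetical and not taken from the question. Taking an explicit .copy() makes df independent before the means are subtracted.

```python
import pandas as pd

# Hypothetical data; `raw` stands in for whatever frame df was derived from.
raw = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 20.0, 30.0, 40.0],
    "c": [5.0, 6.0, 7.0, 8.0],
    "d": [0.0, 2.0, 4.0, 6.0],
})

# An explicit copy makes df an independent frame, so later assignments
# are not treated as writes to a slice of `raw`.
df = raw[raw["a"] > 1].copy()

# Subtract each column's mean from columns a, b and d; c is left alone.
cols = ["a", "b", "d"]
df[cols] = df[cols] - df[cols].mean()

print(df)
```

After this runs, columns a, b and d of df are centered on zero, while column c and the original raw frame are unchanged.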

Related

Get a KeyError in Pandas

I am trying to call a function from a different module as below:
module1 - func1: returns a dataframe
module1 - func2(p_df_in_fromfunc1)
function 2:
for i in range(0, len(p_df_in_fromfunc1)):
# Trying to retrieve row values of individual columns and assign to variables
v_tmp = p_df_in_fromfunc1.loc[i,"Col1"]
When trying to run the above code, I get the error:
KeyError 0
Could the issue be because I don't have a zero numbered row?
Without knowing much of your code, my guess is that you want positional indexing: try using iloc instead of loc if you're interested in going by position rather than by label.
Something like:
v_tmp = p_df_in_fromfunc1["Col1"].iloc[i]
You may have missed closing the quote after Col1 in the loc call?
v_tmp = p_df_in_fromfunc1.loc[i,"Col1"]
For retrieving a row for specific columns do:
columns = ['Col1', 'Col2']
df[columns].iloc[index]
If you only want one column, you can simplify it to: df['Col1'].iloc[index]
As per your comment, you do not need to reset the index, you can iterate over the values of your index array: df.index
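The difference between label-based and position-based lookup can be sketched as follows. The frame and its index labels (10, 11, 12) are made up for illustration; the point is only that .loc[0, ...] raises KeyError when no row carries the label 0, while .iloc[0] and iteration over df.index both work.

```python
import pandas as pd

# Hypothetical frame whose index labels start at 10, so there is no label 0.
df = pd.DataFrame({"Col1": ["x", "y", "z"]}, index=[10, 11, 12])

# Label-based lookup: fails because the label 0 does not exist.
try:
    df.loc[0, "Col1"]
    raised = False
except KeyError:
    raised = True

# Position-based lookup works regardless of the index labels.
first_by_position = df["Col1"].iloc[0]

# Iterating over the actual labels keeps .loc usable without a reset_index().
values_by_label = [df.loc[label, "Col1"] for label in df.index]

print(raised, first_by_position, values_by_label)
```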

Can't get around Pandas Series SettingWithCopyWarning

I'd like to fetch a Series and make changes to it, which I'd like reflected in the DataFrame later on. However I can't understand how to do it without the SettingWithCopyWarning. Is this a false positive or am I doing something wrong?
df = pd.DataFrame([[1,2,3],[4,5,6]], columns=list('abc'))
df['d'] = df['a'].diff()
d = df.loc[:, 'd']
d.loc[d>0] *= 3
I've read the docs (and yes, I did read this question before asking, but it only deals with DataFrames and not with Series), but I am not able to work out how to fix this. I would prefer not to disable the warning, as I have code where I don't want to make this type of mistake inadvertently.
"I'd like to fetch a Series and make changes to it, which I'd like reflected in the DataFrame later on."
In this case, you should temporarily disable this warning and proceed as you are now. Using .copy() would mean your original df is left unmodified by changes to d.
with pd.option_context('mode.chained_assignment', None):
    df = pd.DataFrame([[1,2,3],[4,5,6]], columns=list('abc'))
    df['d'] = df['a'].diff()
    d = df.loc[:, 'd']
    d.loc[d>0] *= 3
# Code you run outside of `with` will maintain your original setting:
# pd.get_option('chained_assignment')
option_context is a context manager, meaning it can be used with with, and the option only applies to code within the block.
Read more: pandas > Getting & Setting Options
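As an alternative to silencing the warning, and not something the answer above proposes, the update can be written as a single .loc call on df itself, so there is no intermediate Series to be ambiguous about. This is a minimal sketch using the question's own data.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=list("abc"))
df["d"] = df["a"].diff()

# Select the rows and the column in one .loc call on df, then update in place.
# Rows where d is NaN fail the comparison and are left untouched.
df.loc[df["d"] > 0, "d"] *= 3

print(df)
```

The trade-off is that the row condition has to be expressed against df rather than against a separately held Series.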

Is this Pandas 'SettingWithCopyWarning' a False Positive?

I have a dataframe that I subset to produce a new dataframe:
temp_df = initial_df.loc[initial_df['col'] == val]
And then I add columns to this dataframe, setting all values to np.nan:
temp_df[new_col] = np.nan
This triggers a 'SettingWithCopyWarning', as it should, and tells me:
Try using .loc[row_indexer,col_indexer] = value instead
However, when I do that, like so:
temp_df.loc[:,new_col] = np.nan
I still get the same warning. In fact, I get one instance of the warning using the 1st method, but get two instances of the warning using .loc:
Is this warning incorrect here? I don't care that the new column I am adding doesn't make it back to the initial_df. Is it a false positive? And why are there two warnings?

Proper way to utilize .loc in python's pandas

When trying to change a column of numbers from object to float dtypes using pandas dataframes, I receive the following warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Now, the code runs just fine, but what would be the proper and intended way to avoid this warning and still achieve the goal of:
df2[col] = df2[col].astype('float')
Let it be noted that df2 is a subset of df1 using a condition similar to:
df2 = df1[df1[some col] == value]
Use the copy method. Instead of:
df2 = df1[df1[some col] == value]
Just write:
df2 = df1[df1[some col] == value].copy()
Initially, df2 is a slice of df1 rather than an independent dataframe, which is why pandas raises the warning when you try to modify it.
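A minimal, self-contained sketch of the .copy() approach is shown here. The column names key and num and the sample values are hypothetical stand-ins for the question's "some col" and col.

```python
import pandas as pd

# Hypothetical data: the numbers in "num" are stored as strings.
df1 = pd.DataFrame({"key": ["x", "y", "x"], "num": ["1.5", "2.5", "3.5"]})

# .copy() makes df2 an independent frame rather than a slice of df1.
df2 = df1[df1["key"] == "x"].copy()

# Converting the column now affects only df2.
df2["num"] = df2["num"].astype("float")

print(df2.dtypes)
```

df1 keeps its original string values, which is the intended outcome when the subset is meant to be worked on independently.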

Assignment through chained indexers

I would like to be able to assign to a DataFrame through chained indexers. Notionally like this:
subset = df.loc[mask]
... # much later
subset.loc[mask2, 'column'] += value
This does not work because, as I understand it, the second .loc triggers a copy-on-write. Is there a way to do this?
I could pass df and mask around so that the later code could combine mask and mask2 before making an assignment, but it feels much cleaner to be able to pass around the subset view instead so that the later code only has to worry about its own mask.
When you get to:
subset.loc[mask2, 'column']
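assign this to another variable so you can access its index and name attributes (selecting a single column returns a Series, which has a name rather than columns).
subsubset = subset.loc[mask2, 'column']
Then you can update df using subsubset's index and name, provided the index labels of df are unique:
df.loc[subsubset.index, subsubset.name] += value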
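The idea of writing back to df through the subset's index labels can be sketched end to end as follows. The data, the column group, and the increment of 10 are hypothetical, and the sketch assumes the index labels of df are unique so that they identify the same rows in df and in subset.

```python
import pandas as pd

# Hypothetical data with a default, unique integer index.
df = pd.DataFrame({"group": ["a", "a", "b", "b"], "column": [1, 2, 3, 4]})

mask = df["group"] == "a"       # first filter, built against df
subset = df.loc[mask]           # passed around and only read from

mask2 = subset["column"] > 1    # later filter, built against subset only

# Index labels of the rows that pass both filters.
target = subset.loc[mask2].index

# Assign on df itself, using the labels collected from the subset.
df.loc[target, "column"] += 10

print(df)
```

The later code still only needs to know about its own mask; the one extra requirement is access to df for the final assignment.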
