When trying to change a column of numbers from object to float dtypes using pandas dataframes, I receive the following warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Now, the code runs just fine, but what would be the proper and intended way to avoid this warning and still achieve the goal of:
df2[col] = df2[col].astype('float')
Let it be noted that df2 is a subset of df1 using a condition similar to:
df2 = df1[df1[some col] == value]
Use the copy method. Instead of:
df2 = df1[df1[some col] == value]
Just write:
df2 = df1[df1[some col] == value].copy()
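Initially, df2 is derived from a slice of df1, and pandas cannot tell whether it is a view or an independent copy. That is why, when you try to modify it, pandas emits the SettingWithCopyWarning (a warning, not an error). Calling .copy() makes df2 an independent dataframe.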
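As a minimal sketch of the fix, assuming a made-up df1 with hypothetical column names "group" and "amount" (neither appears in the question):

```python
import pandas as pd

# Hypothetical data: the numbers are stored as strings
df1 = pd.DataFrame({"group": ["a", "b", "a"], "amount": ["1.5", "2.0", "3.25"]})

# Take an explicit copy of the filtered rows so df2 owns its data
df2 = df1[df1["group"] == "a"].copy()

# Converting the column on the copy does not touch df1
df2["amount"] = df2["amount"].astype("float")

print(df2.dtypes)
```

df1 keeps its original string values, since only the copy was converted.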
I want to modify only the values that are greater than 750 on a column of a pandas dataframe
datf.iloc[:,index][datf.iloc[:,index] > 750] = datf.iloc[:,index][datf.iloc[:,index] > 750]/10.7639
I think the syntax is fine, but I get a pandas warning, so I don't know whether it is correct this way:
<ipython-input-24-72eef50951a4>:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
What is the correct way to do this without getting this warning?
You can use the apply method to make your modification to your column using your custom function.
N.B. You can also use applymap to apply a function to multiple columns.
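def my_func(x):
    # Rescale only values above 750; return everything else unchanged
    if x > 750:
        return x / 10.7639
    return x

# apply returns a new Series; assign it back to datf['col_name'] to update the column
new_dta = datf['col_name'].apply(my_func)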
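Another option, which avoids the chained indexing in the question altogether, is a single .loc assignment with a boolean mask. This is only a sketch on made-up data; the column name "col_name" follows the answer above, and the sample values are assumptions:

```python
import pandas as pd

# Hypothetical sample data
datf = pd.DataFrame({"col_name": [100.0, 800.0, 1076.39]})

# Select the rows to change once, then read and write through .loc
mask = datf["col_name"] > 750
datf.loc[mask, "col_name"] = datf.loc[mask, "col_name"] / 10.7639

print(datf)
```

Because the read and the write each go through one .loc call on datf itself, pandas has no intermediate object that it could mistake for a copy.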
I need to sort a pandas dataframe df by a datetime column my_date. Whenever I use .loc, the sorting does not apply.
df = df.loc[(df.some_column == 'filter'),]
df.sort_values(by=['my_date'])
print(df)
# ...
# Not sorted!
# ...
df = df.loc[(df.some_column == 'filter'),].sort_values(by=['my_date'])
# ...
# sorting WORKS!
What is the difference of these two uses? What am I missing about dataframes?
In the first case, you didn't perform an operation in-place: you should have used either df = df.sort_values(by=['my_date']) or df.sort_values(by=['my_date'], inplace=True).
In the second case, the result of .sort_values() was saved to df, hence printing df shows the sorted dataframe.
In the first snippet, df = df.loc[(df.some_column == 'filter'),] and df.sort_values(by=['my_date']) are two separate statements, so the sorted dataframe returned by sort_values() is never assigned to anything.
In the second snippet, you chain the calls as df.loc[...].sort_values(...) and assign the result, which is the correct way. You don't have to use the df. notation twice.
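The difference can be reproduced on a small made-up dataframe. The column names follow the question; the data and the variable names filtered and sorted_df are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "some_column": ["filter", "other", "filter"],
    "my_date": pd.to_datetime(["2021-03-01", "2021-01-15", "2021-02-01"]),
})

filtered = df.loc[df.some_column == "filter"]

# sort_values returns a new dataframe; without an assignment the result is lost
filtered.sort_values(by=["my_date"])
print(filtered.index.tolist())   # row order is unchanged

# Assigning the result keeps the sorted dataframe
sorted_df = filtered.sort_values(by=["my_date"])
print(sorted_df.index.tolist())  # rows are now ordered by my_date
```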
I have a pandas DF with 56 columns. 2 of those columns (X and Y) are empty, and I would like to fill them with the values stored in 2 other columns of the same DF. Right now, I managed to do it, but I get a warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
I tried this version as well, but I still get the caveat warning. Here's my syntax so far:
subset = df[(df.Longitude.isnull()) & (df.Latitude.isnull())]
subset.Longitude = subset.x2
subset.Latitude = subset.y2
Any idea on how to do this without getting the warning notification? Thanks.
fillna
This is how you should be doing it. Pass a dictionary to fillna specifying what to fill each column with. The keys of the dictionary are mapped to column names. So below, fill the missing values of the 'Longitude' column with corresponding values from df.x2.
df.fillna({'Longitude': df.x2, 'Latitude': df.y2})
loc
But to answer your question directly, and barring any other issues, assign through .loc with a boolean mask:
mask = df.Longitude.isna() & df.Latitude.isna()
df.loc[mask, ['Longitude', 'Latitude']] = df.loc[mask, ['x2', 'y2']].to_numpy()
Not super useful
Because most people find this difficult to read
mask = df.Longitude.isna() & df.Latitude.isna()
df.loc[mask, 'Longitude'], df.loc[mask, 'Latitude'] = map(df.get, ['x2', 'y2'])
df
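To see the fillna and loc approaches side by side, here is a small sketch on made-up coordinates. The column names follow the question; the values and the variable name filled are assumptions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Longitude": [np.nan, 10.0, np.nan],
    "Latitude": [np.nan, 20.0, np.nan],
    "x2": [1.0, 2.0, 3.0],
    "y2": [4.0, 5.0, 6.0],
})

# fillna returns a new dataframe; df itself is left untouched here
filled = df.fillna({"Longitude": df.x2, "Latitude": df.y2})

# .loc with a mask writes into df directly, only where both values are missing
mask = df.Longitude.isna() & df.Latitude.isna()
df.loc[mask, ["Longitude", "Latitude"]] = df.loc[mask, ["x2", "y2"]].to_numpy()

print(df[["Longitude", "Latitude"]])
```

Note that the two are not strictly equivalent: fillna fills each column independently, while the mask only touches rows where both Longitude and Latitude are missing. On this sample they give the same result.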
I have a dataframe that I subset to produce a new dataframe:
temp_df = initial_df.loc[initial_df['col'] == val]
And then I add columns to this dataframe, setting all values to np.nan:
temp_df[new_col] = np.nan
This triggers a 'SettingWithCopyWarning', as it should, and tells me:
Try using .loc[row_indexer,col_indexer] = value instead
However, when I do that, like so:
temp_df.loc[:,new_col] = np.nan
I still get the same warning. In fact, I get one instance of the warning using the 1st method, but get two instances of the warning using .loc:
Is this warning incorrect here? I don't care that the new column I am adding doesn't make it back to the initial_df. Is it a false positive? And why are there two warnings?
city        state  neighborhoods  categories
Dravosburg  PA     [asas,dfd]     ['Nightlife']
Dravosburg  PA     [adad]         ['Auto_Repair','Automotive']
I have the dataframe above. I want to convert each element of the lists into a column, e.g.:
city        state  asas  dfd  adad  Nightlife  Auto_Repair  Automotive
Dravosburg  PA     1     1    0     1          1            0
I am using the following code to do this:
def list2columns(df):
    """
    to convert list in the columns
    of a dataframe
    """
    columns=['categories','neighborhoods']
    for col in columns:
        for i in range(len(df)):
            for element in eval(df.loc[i,"categories"]):
                if len(element)!=0:
                    if element not in df.columns:
                        df.loc[:,element]=0
                    else:
                        df.loc[i,element]=1
How can I do this in a more efficient way?
Why do I still get the warning below when I am already using df.loc?
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
Since you're using eval(), I assume each column holds a string representation of a list, rather than a list itself. Also, unlike your example above, I'm assuming there are quotes around the items in the lists in your neighborhoods column (df.loc[0, 'neighborhoods'] == "['asas','dfd']"), because otherwise your eval() would fail.
If this is all correct, you could try something like this:
def list2columns(df):
    """
    to convert list in the columns of a dataframe
    """
    columns = ['categories', 'neighborhoods']
    new_cols = set()  # set of all new columns added
    for col in columns:
        for i in range(len(df[col])):
            # get the list of columns to set (select the row by position)
            set_cols = eval(df[col].iloc[i])
            # set the values of these columns to 1 in the current row
            # (if this causes new columns to be added, other rows will get nans)
            for name in set_cols:
                df.loc[df.index[i], name] = 1
            # remember which new columns have been added
            new_cols.update(set_cols)
    # convert any un-set values in the new columns to 0
    # (assign the result back; an in-place fillna on df[...] would only change a temporary copy)
    new_cols = list(new_cols)
    df[new_cols] = df[new_cols].fillna(value=0)
I can only speculate on an answer to your second question, about the SettingWithCopy warning.
It's possible (but unlikely) that using df.iloc instead of df.loc will help, since that is intended to select by row number (in your case, df.loc[i, col] only works because you haven't set an index, so pandas uses the default index, which matches the row number).
Another possibility is that the df that is passed in to your function is already a slice from a larger dataframe, and that is causing the SettingWithCopy warning.
I've also found that using df.loc with mixed indexing modes (logical selectors for rows and column names for columns) produces the SettingWithCopy warning; it's possible that your slice selectors are causing similar problems.
Hopefully the simpler and more direct indexing in the code above will solve any of these problems. But please report back (and provide code to generate df) if you are still seeing that warning.
Use this instead
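def list2columns(df):
    """
    to convert list in the columns
    of a dataframe
    """
    # work on an independent copy so the assignments below never hit a slice
    df = df.copy()
    columns = ['categories', 'neighborhoods']
    for col in columns:
        for i in range(len(df)):
            # parse the string representation of the list in the current column
            for element in eval(df.loc[i, col]):
                if len(element) != 0:
                    if element not in df.columns:
                        # create the indicator column, defaulting to 0
                        df.loc[:, element] = 0
                    # mark the element as present in the current row
                    df.loc[i, element] = 1
    return df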
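On the efficiency part of the question, a loop-free variant is possible. The sketch below is one possible approach, not the method from either answer: it assumes the cells hold quoted string representations of lists (as discussed above), uses ast.literal_eval in place of eval, and the function name lists_to_indicator_columns is hypothetical:

```python
import ast
import pandas as pd

# Sample data modelled on the question, with quoted list items
df = pd.DataFrame({
    "city": ["Dravosburg", "Dravosburg"],
    "state": ["PA", "PA"],
    "neighborhoods": ["['asas','dfd']", "['adad']"],
    "categories": ["['Nightlife']", "['Auto_Repair','Automotive']"],
})

def lists_to_indicator_columns(df, columns):
    """Return a new dataframe with one 0/1 column per list element."""
    parts = [df.drop(columns=columns)]
    for col in columns:
        # Parse each string into a list, then give every element its own row
        exploded = df[col].apply(ast.literal_eval).explode()
        # One indicator column per distinct element, collapsed back to one row per original row
        dummies = pd.get_dummies(exploded).groupby(level=0).max().astype(int)
        parts.append(dummies)
    return pd.concat(parts, axis=1)

result = lists_to_indicator_columns(df, ["neighborhoods", "categories"])
print(result)
```

Because the function builds a new dataframe instead of assigning into the one passed in, it should not trigger SettingWithCopyWarning even if df is itself a slice of a larger dataframe.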