Add column Python [duplicate] - python

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 1 year ago.
Hi, I am trying to add a new column ("A") to an existing data frame, where each value will be 1 or 3 depending on the value in another column ("B"):
df["A"] = np.where(df["B"] == "reported-public", 1,3)
When doing so, I get the following warning:
<ipython-input-239-767754e40f8a>:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Any idea why?
Thanks

Any idea why?
A very simple explanation is that you are slicing the data and trying to assign a value to the slice (most likely, your df was itself created by filtering or selecting from another DataFrame). Is this slice the same as your original dataframe? From the outside we can't tell exactly what pandas is doing underneath: in some situations the value is written into your original dataframe, in others only into a copy. If it works, it probably got assigned correctly, which is why this is a warning and not an error.
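A minimal sketch of that situation, assuming (hypothetically) that your df was filtered out of a larger frame called raw. Calling .copy() when taking the subset gives df its own data, so the new column is assigned to df without any ambiguity about which frame it lands in:

```python
import numpy as np
import pandas as pd

# Hypothetical larger frame; the question's df is assumed to be a filtered subset of it
raw = pd.DataFrame({
    "Number": [1, 2, 3, 4],
    "B": ["reported-private", "reported-private", "reported-public", "draft"],
})

# .copy() makes df an independent DataFrame rather than a slice of raw
df = raw[raw["B"] != "draft"].copy()

# Same np.where call as in the question: 1 where B matches, 3 otherwise
df["A"] = np.where(df["B"] == "reported-public", 1, 3)

print(df)
print("A" in raw.columns)  # the new column exists only in df
```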
The following link gives a more detailed explanation:
How to deal with SettingWithCopyWarning in Pandas

I have made dummy data as follows, as best I could from your limited sample:
import numpy as np
import pandas as pd
data = []
data.append([1, "reported-private"])
data.append([2, "reported-private"])
data.append([3, "reported-public"])
df = pd.DataFrame(data, columns=['Number', 'B'])
Running the command you provided with numpy 1.19.5 and pandas 1.2.4:
df["A"] = np.where(df["B"] == "reported-public", 1,3)
gives the following output, which is probably what you are expecting:
Number B A
1 reported-private 3
2 reported-private 3
3 reported-public 1
The warning hints that you might want to use .loc from pandas itself, or perhaps .apply for extra flexibility. For example:
df['A'] = df.apply(lambda row: 1 if row.B == 'reported-public' else 3, axis = 1)
Output for this way is the same as previous:
Number B A
1 reported-private 3
2 reported-private 3
3 reported-public 1
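Since the warning text points at .loc, here is a sketch of the same assignment written with .loc on the dummy data above. If df is itself a slice of another frame, this alone may still warn; taking the slice with .copy() avoids that.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, "reported-private"], [2, "reported-private"], [3, "reported-public"]],
    columns=["Number", "B"],
)

# Default every row to 3, then overwrite only the matching rows with 1
df["A"] = 3
df.loc[df["B"] == "reported-public", "A"] = 1

print(df)
```

Unlike .apply with a Python lambda, the boolean mask here is evaluated in one vectorized step.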
To sum up: this might be a version problem. If so, try changing the version, or try the second approach. Cheers.
You can also disable this warning altogether, as shown below (taken from this post):
import pandas as pd
pd.options.mode.chained_assignment = None # default='warn'

Related

Figuring out if an entire column in a Pandas dataframe is the same value or not

I have a pandas dataframe that works just fine. I am trying to figure out how to tell whether a column, whose label I know is correct, does not contain all the same values.
The code below errors out when I try to check whether the column contains -1 in every cell:
# column = "TheColumnLabelThatIsCorrect"
# df = "my correct dataframe"
# I get a "() takes 1 or 2 arguments but 3 is passed" error
if (not df.loc(column, estimate.eq(-1).all())):
I just learned about .eq() and .all() and hopefully I am using them correctly.
It's a syntax issue; see the docs for .loc/indexing. Specifically, you want to be using [] instead of ().
You can do something like
if not df[column].eq(-1).all():
...
If you want to use .loc specifically, you'd do something similar:
if not df.loc[:, column].eq(-1).all():
...
Also, note that you don't need .eq(); you can just do (df[column] == -1).all() if you prefer.
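A small runnable version of the corrected check; the frame and its column names ("estimate", "other") are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"estimate": [-1, -1, -1], "other": [-1, 2, -1]})
column = "estimate"

# Square brackets select the column; .eq(-1) gives a boolean Series; .all() reduces it
estimate_all_minus_one = df[column].eq(-1).all()

# Equivalent .loc form, plus the == spelling
other_all_minus_one = df.loc[:, "other"].eq(-1).all()
same_with_operator = (df[column] == -1).all()

if not other_all_minus_one:
    print("'other' does not contain -1 in every cell")
```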
You could drop duplicates: if only one record remains, all records are the same.
import pandas as pd
df = pd.DataFrame({'col': [1, 1, 1, 1]})
len(df['col'].drop_duplicates()) == 1
> True
The question is not entirely clear, but let's try the following:
Contains only -1 in each cell
df['estimate'].eq(-1).all()
Contains -1 in any cell
df['estimate'].eq(-1).any()
Select the rows where the value is -1 (keeping all columns)
df.loc[df['estimate'].eq(-1),:]
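The three expressions above, run against a made-up frame (the "estimate" and "name" columns are assumptions):

```python
import pandas as pd

df = pd.DataFrame({"estimate": [-1, 5, -1], "name": ["a", "b", "c"]})

only_minus_one = df["estimate"].eq(-1).all()       # is every cell -1?
has_minus_one = df["estimate"].eq(-1).any()        # is at least one cell -1?
minus_one_rows = df.loc[df["estimate"].eq(-1), :]  # matching rows, all columns

print(only_minus_one, has_minus_one)
print(minus_one_rows)
```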
df['column'].value_counts() gives you all unique values in a column together with their counts. As for checking whether all the values are the same, you can collect the unique values and check that there is exactly one:
len(set(df['column'])) == 1
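Two small helpers wrapping these ideas. The helper names are hypothetical, and Series.nunique(dropna=False) is used here as an alternative to building a set (it counts NaN as a value too):

```python
import pandas as pd

def all_same(s: pd.Series) -> bool:
    # Exactly one distinct value (NaN counted too); an empty Series gives False
    return s.nunique(dropna=False) == 1

def all_equal_to(s: pd.Series, value) -> bool:
    # Every cell equals `value`; note this is True for an empty Series
    return bool(s.eq(value).all())

df = pd.DataFrame({"column": [7, 7, 7], "mixed": [1, 2, 1]})
print(all_same(df["column"]), all_same(df["mixed"]))
print(all_equal_to(df["column"], 7), all_equal_to(df["column"], -1))
print(df["mixed"].value_counts())
```

All-same and all-equal-to-a-specific-value are different questions, which is why they are kept as separate helpers.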

Returning a copy versus a view warning when using Python pandas dataframe

My goal is to convert the date column of DataFrame df from object dtype to datetime, but I keep running into view-versus-copy warnings when running the program.
I've found some useful information at this link: https://stackoverflow.com/a/25254087/3849539
I tested the following three solutions. All of them work as expected, but each produces a different warning message. Could anyone explain the differences, and why I still get the warning about returning a view versus a copy? Thanks.
Solution 1: df['date'] = df['date'].astype('datetime64')
test.py:85: SettingWithCopyWarning: A value is trying to be set on a
copy of a slice from a DataFrame. Try using
.loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['date'] = df['date'].astype('datetime64')
Solution 2: df['date'] = pd.to_datetime(df['date'])
~/report/lib/python3.8/site-packages/pandas/core/frame.py:3188:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using
.loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
test.py:85: SettingWithCopyWarning: A value is
trying to be set on a copy of a slice from a DataFrame. Try using
.loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Solution 3: df.loc[:, 'date'] = pd.to_datetime(df.loc[:, 'date'])
~/report/lib/python3.8/site-packages/pandas/core/indexing.py:1676:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_column(ilocs[0], value, pi)
Changing how you do the datetime conversion will not fix the SettingWithCopyWarning. You get it because the df you are working with is already a slice of some larger data frame. Pandas is simply warning you that you are working with the slice and not the full data. Try creating a new column in df instead: you'll get the warning, and the column will exist in your slice, but not in the original data set.
You can turn off these warnings, if you know what you are doing, by using pd.options.mode.chained_assignment = None # default='warn'
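A sketch of the point above, assuming a hypothetical larger frame big from which df is filtered. Converting the column on an explicit .copy() changes only df; the original column in big is left untouched:

```python
import pandas as pd

big = pd.DataFrame({
    "date": ["2021-01-01", "2021-02-01", "2021-03-01"],
    "keep": [True, False, True],
})

# Explicit copy: df no longer refers back to big
df = big[big["keep"]].copy()
df["date"] = pd.to_datetime(df["date"])

print(df.dtypes)
print(big.dtypes)  # big["date"] is still text, not datetime
```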
I got similar warnings recently. After several tries, I found that, at least in my case, the problem is not related to your 3 solutions. It is probably your 'df' itself.
If your df is a slice of another pandas df, such as:
df = dfOrigin.loc[some_rows, :] or
df = dfOrigin[[some columns]] or
df = dfOrigin[one column]
then any assignment on df will trigger that warning. Try using df = dfOrigin[[some columns]].copy() instead.
Code to reproduce this:
import numpy as np
import pandas as pd
np.random.seed(2021)
dfOrigin = pd.DataFrame(np.random.choice(10, (4, 3)), columns=list('ABC'))
print("Original dfOrigin")
print(dfOrigin)
# A B C
# 0 4 5 9
# 1 0 6 5
# 2 8 6 6
# 3 6 6 1
df = dfOrigin[['B', 'C']] # New frame, but pandas flags it as derived from dfOrigin
df.loc[:,'B'] = df['B'].astype(str) # Triggers SettingWithCopyWarning
df2 = dfOrigin[['B', 'C']].copy() # Explicit, independent copy
df2['B'] = df2['B'].astype(str) # No warning

Python/Pandas - Query a MultiIndex Column [duplicate]

This question already has answers here:
Select columns using pandas dataframe.query()
(5 answers)
Closed 4 years ago.
I'm trying to use query on a MultiIndex column. It works on a MultiIndex row, but not the column. Is there a reason for this? The documentation shows examples like the first one below, but it doesn't indicate that it won't work for a MultiIndex column.
I know there are other ways to do this, but I'm specifically trying to do it with the query function.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((4,4)))
df.index = pd.MultiIndex.from_product([[1,2],['A','B']])
df.index.names = ['RowInd1', 'RowInd2']
# This works
print(df.query('RowInd2 in ["A"]'))
df = pd.DataFrame(np.random.random((4,4)))
df.columns = pd.MultiIndex.from_product([[1,2],['A','B']])
df.columns.names = ['ColInd1', 'ColInd2']
# query on index works, but not on the multiindexed column
print(df.query('index < 2'))
print(df.query('ColInd2 in ["A"]'))
To answer my own question, it looks like query shouldn't be used at all (regardless of using MultiIndex columns) for selecting certain columns, based on the answer(s) here:
Select columns using pandas dataframe.query()
You can use query on the row levels (e.g. ilevel_0) and pd.IndexSlice to select columns by level:
df.query('ilevel_0>2')
Out[327]:
ColInd1 1 2
ColInd2 A B A B
3 0.652576 0.639522 0.52087 0.446931
df.loc[:,pd.IndexSlice[:,'A']]
Out[328]:
ColInd1 1 2
ColInd2 A A
0 0.092394 0.427668
1 0.326748 0.383632
2 0.717328 0.354294
3 0.652576 0.520870
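A self-contained sketch of selecting the "A" columns, with deterministic data in place of np.random. The row filter uses a boolean mask on the index rather than query, and DataFrame.xs is shown as a second option (it drops the matched level):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4))
df.columns = pd.MultiIndex.from_product(
    [[1, 2], ["A", "B"]], names=["ColInd1", "ColInd2"]
)

# All columns whose ColInd2 level is "A"; the two-level labels are kept
sel = df.loc[:, pd.IndexSlice[:, "A"]]

# Same selection via xs; the result is labelled by ColInd1 only
alt = df.xs("A", axis=1, level="ColInd2")

# Rows with index < 2, restricted to the "A" columns
combined = df.loc[df.index < 2, pd.IndexSlice[:, "A"]]

print(sel)
print(alt)
print(combined)
```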

Correct way to set value on a slice in pandas [duplicate]

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 6 years ago.
I have a pandas DataFrame, data, with columns ['name', 'A', 'B'].
What I want to do (and works) is:
d2 = data[data['name'] == 'fred'] #This gives me multiple rows
d2['A'] = 0
This will set the column A on the fred rows to 0.
I've also done:
indexes = d2.index
data['A'][indexes] = 0
However, both give me the same warning:
/Users/brianp/work/cyan/venv/lib/python2.7/site-packages/pandas/core/indexing.py:128: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
How does pandas WANT me to do this?
This is a very common warning from pandas. It means you may be writing into a copy of a slice rather than the original data, so, because of chained assignment, the change might not reach the original columns. Please read this post; it has a detailed discussion of SettingWithCopyWarning. In your case, I think you can try:
data.loc[data['name'] == 'fred', 'A'] = 0
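A runnable sketch of that .loc call on a small made-up data frame:

```python
import pandas as pd

data = pd.DataFrame({
    "name": ["fred", "joe", "fred"],
    "A": [5, 6, 7],
    "B": [1, 2, 3],
})

# Row mask and column label in a single .loc call, so the write goes to data itself
data.loc[data["name"] == "fred", "A"] = 0

print(data)
```

Because the row selection and the column selection happen in one indexing operation, there is no intermediate object that might be a copy, unlike d2['A'] = 0 or data['A'][indexes] = 0.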

python pandas 0.16: SettingWithCopyWarning incorrectly reported

As per my other question:
Python Anaconda: how to test if updated libraries are compatible with my existing code?
I curse the day I was forced to upgrade to pandas 0.16.
One of the things I don't understand is why I get a chained assignment warning when I do something as banal as adding a new field to an existing dataframe and initialising it with 1:
mydataframe['x']=1
causes the following warning:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
See the the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
mydataframe['x']=1
I understand there can be problems when assigning values to a copy of a dataframe, but here I am just adding a new field to a dataframe! How am I supposed to change my code (which worked perfectly in previous versions of pandas)?
Here's an attempt at an answer, or at least an attempt to reproduce the message. (Note that you may only get this message once and might need to start a new shell or do %reset in ipython to get this message.)
In [1]: %reset
Once deleted, variables cannot be recovered. Proceed (y/[n])? y
In [2]: import pandas as pd
In [3]: pd.__version__
Out[3]: '0.16.0'
Here are 3 variations of setting a new column to '1'. The first two do not generate the warning, but the third one does. (The second one is thanks to @Jeff's suggestion.)
In [4]: df = pd.DataFrame({ 'x':[1,2,3], 'y':[77,88,99] })
...: df['z'] = 1
In [5]: df = pd.DataFrame({ 'x':[1,2,3], 'y':[77,88,99] })
...: df = df[1:]
...: df['z'] = 1
In [6]: df = pd.DataFrame({ 'x':[1,2,3], 'y':[77,88,99] })
...: df2 = df[1:]
...: df2['z'] = 1
-c:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
Perhaps others can correct me if I'm wrong, but I believe the warning here relates to df2 being a copy of a slice of df. However, that's not really an issue, as the resulting df and df2 are what I would have expected:
In [7]: df
Out[7]:
x y
0 1 77
1 2 88
2 3 99
In [8]: df2
Out[8]:
x y z
1 2 88 1
2 3 99 1
I know this is going to be terrible to say, but when I get that message I just check to see whether the command did what I wanted or not and don't overly think about the warning. But whether you get a warning message or not, checking that a command did what you expected is really something you need to do all the time in pandas (or matlab, or R, or SAS, or Stata, ... )
This will not generate the warning:
df = pd.DataFrame({ 'x':[1,2,3], 'y':[77,88,99] })
df2 = df[1:].copy()
df2['z'] = 1
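Another option, not from the original answers: DataFrame.assign returns a new frame with the extra column, so nothing is written into the slice itself. A sketch:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [77, 88, 99]})

# assign builds and returns a new DataFrame; df itself is not modified
df2 = df[1:].assign(z=1)

print(df2)
print(df.columns.tolist())  # no 'z' here
```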
