Pandas replace function wrongly changes in all dataframes [duplicate]

This question already has answers here:
why should I make a copy of a data frame in pandas
(8 answers)
Closed 2 years ago.
I use pandas replace function to replace a value. Please see the code below:
import pandas as pd
d = {'color' : pd.Series(['white', 'blue', 'orange']),
'second_color': pd.Series(['white', 'black', 'blue']),
'value' : pd.Series([1., 2., 3.])}
df1 = pd.DataFrame(d)
print(df1)
df = df1
df['color'] = df['color'].replace('white','red')
print(df1)
print(df)
I intend to change a value in df, but why is the same value in df1 also changed?
The code below, by contrast, works as expected:
df=df.replace('white','red')

You need to use .copy():
df = df1.copy()
That way, changes you make to df will not propagate to df1.

Because both names refer to the same object.
When you do df = df1, no new data frame is created; df simply becomes a second name for the object that df1 refers to. Using id() you can see that both refer to the same address.
>>> df = df1
>>> id(df)
41633008
>>> id(df1)
41633008
To make a new, independent copy, use the DataFrame.copy method:
>>> df = df1.copy()
>>> id(df)
31533376
>>> id(df1)
41633008
Now you can see that the two names refer to different locations.
There is more to learn about shallow versus deep copies; see the documentation for details.
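The three situations discussed above can be put side by side in a short sketch. It uses a cut-down version of the question's data, and the names make_df, alias, independent and rebound are made up for illustration. The third case is a likely explanation of why the question's df = df.replace('white', 'red') leaves df1 alone: DataFrame.replace returns a new frame by default, and the assignment only rebinds the name df.

```python
import pandas as pd


def make_df():
    # Hypothetical helper: a reduced version of the question's data.
    return pd.DataFrame({"color": ["white", "blue", "orange"],
                         "value": [1.0, 2.0, 3.0]})


# Case 1: plain assignment. Both names point at the same object,
# so a change made through one name is visible through the other.
original = make_df()
alias = original
alias["color"] = alias["color"].replace("white", "red")
same_object = alias is original

# Case 2: explicit copy. The copy can be changed independently.
original2 = make_df()
independent = original2.copy()
independent["color"] = independent["color"].replace("white", "red")

# Case 3: rebinding. replace() returns a new frame by default, so the
# assignment makes `rebound` point at that new frame and leaves
# original3 untouched.
original3 = make_df()
rebound = original3
rebound = rebound.replace("white", "red")

print(same_object)
print(original["color"].tolist())
print(original2["color"].tolist())
print(original3["color"].tolist())
```

Using `is` instead of comparing id() values by eye gives the same information as the interpreter session above in a single expression.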

Related

How to assign new column to existing DataFrame in pandas

I'm new to pandas. I'm trying to add new columns to my existing DataFrame, but they are not getting assigned and I don't know why. Can anyone explain what I'm missing? This is what I tried:
import pandas as pd
df = pd.DataFrame(data = {"test":["mkt1","mkt2","mkt3"],
"test2":["cty1","cty2","cty3"]})
print("Before",df.columns)
df.assign(test3="Hello")
print("After",df.columns)
Output
Before Index(['test', 'test2'], dtype='object')
After Index(['test', 'test2'], dtype='object')
The pandas assign method returns a new dataframe with the new column added; it does not modify the original in place.
import pandas as pd
df = pd.DataFrame(data = {"test":["mkt1","mkt2","mkt3"],
"test2":["cty1","cty2","cty3"]})
print("Before",df.columns)
df = df.assign(test3="Hello") # <--- Note the variable reassignment
print("After",df.columns)

Reindex dataframe inside loop [duplicate]

This question already has answers here:
How to change variables fed into a for loop in list form
(4 answers)
Closed 5 months ago.
I'm trying to reindex the columns in a set of dataframes inside a loop. This only seems to work outside the loop. See sample code below
import pandas as pd
data1 = [[1,2,3],[4,5,6],[7,8,9]]
data2 = [[10,11,12],[13,14,15],[16,17,18]]
data3 = [[19,20,21],[22,23,24],[25,26,27]]
index = ['a','b','c']
columns = ['d','e','f']
df1 = pd.DataFrame(data=data1,index=index,columns=columns)
df2 = pd.DataFrame(data=data2,index=index,columns=columns)
df3 = pd.DataFrame(data=data3,index=index,columns=columns)
columns2 = ['f','e','d']
for i in [df1,df2,df3]:
    i = i.reindex(columns=columns2)
print(df1)
df2 = df2.reindex(columns=columns2)
print(df2)
df1 is not reindexed as desired, however if I reindex df2 outside of the loop it works. Why is that?
Thanks
Andrew
That happens for the same reason this happens:
a = 5
b = 6
for i in [a, b]:
    i = 4
>>> a
5
Why? Assigning to i only rebinds the loop variable; it does not change a, b, or the objects in the list. See this accepted answer.
As for your problem, one way to go about it is to create a list of reindexed dataframes like so:
reindexed_dfs = [df.reindex(columns=columns2) for df in [df1, df2, df3]]
and then reassign df1, df2 and df3. But it's better to just keep using your newly created list anyway.
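A minimal sketch of both options mentioned above, using two of the question's dataframes. The dict keyed by name is an assumption added here for illustration; it is one way to "keep using the collection" while still being able to look up each frame.

```python
import pandas as pd

index = ["a", "b", "c"]
columns = ["d", "e", "f"]
columns2 = ["f", "e", "d"]

df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=index, columns=columns)
df2 = pd.DataFrame([[10, 11, 12], [13, 14, 15], [16, 17, 18]], index=index, columns=columns)

# Option 1: build the reindexed frames, then unpack them back into the names.
df1, df2 = [df.reindex(columns=columns2) for df in (df1, df2)]

# Option 2 (illustrative): keep the frames in a dict and replace each entry.
frames = {"first": df1, "second": df2}
frames = {name: frame.reindex(columns=columns) for name, frame in frames.items()}

print(list(df1.columns))
print(list(frames["first"].columns))
```

Both options work because the reindexed result is bound to a name or container slot that is used afterwards, instead of to a throwaway loop variable.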

What is the correct way to add a list as a column to a dataframe?

I want to add a list as a new column to a dataframe. I am doing:
df['Intervention'] = interventionList
It gives me
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
I read Pandas add a series to dataframe column where the accepted answer is very similar.
I believe one option would be to use:
df.assign(Intervention = interventionList)
or to make a copy of the dataframe:
df2 = df.copy()
You can try something like this
import pandas as pd
li = [1,2,3,4,5]
li2 =[6,7,8,9,10]
df = pd.DataFrame()
#Using pd.Series to add lists to dataframe
df['col1'] = pd.Series(li)
df['col2'] = pd.Series(li2)
df
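The question does not show how df was created. A common cause of this warning is that df is itself a filtered slice of another dataframe, so the sketch below assumes that; source, patient, age and intervention_list are made-up names. It also shows a caveat of wrapping the list in pd.Series as in the answer above: a Series is aligned on index labels, while a plain list is assigned by position.

```python
import pandas as pd

# Hypothetical parent frame; the question does not show where df comes from.
source = pd.DataFrame({"patient": ["a", "b", "c", "d"],
                       "age": [30, 41, 25, 58]})

# Taking an explicit copy of the filtered rows makes df an independent frame,
# so adding a column to it is unambiguous.
df = source[source["age"] > 28].copy()   # keeps index labels 0, 1, 3

intervention_list = ["x", "y", "z"]      # one value per row of df

# A plain list is assigned by position.
df["Intervention"] = intervention_list

# A Series is aligned on index labels (0, 1, 2 here), so label 3 gets NaN
# and the value stored under label 2 is dropped.
df["ViaSeries"] = pd.Series(intervention_list)

print(df)
```

If the row labels of df are not the default 0..n-1, assigning the list directly (or resetting the index first) avoids the silent misalignment.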

How to change specific row value in dataframe using pandas? [duplicate]

This question already has answers here:
Set value for particular cell in pandas DataFrame using index
(23 answers)
Closed 2 years ago.
I have attached my data frame here. I am trying to change a specific value in a row, but I am not succeeding. Any leads would be appreciated.
df.replace(to_replace ="Agriculture, forestry and fishing ",
value ="Agriculture")
Image of My data frame
Try this:
df['Name'] = df['Name'].str.replace('Agriculture, forestry and fishing', 'Agriculture')
This should work for any data type:
df.loc[df.loc[:, 'Name']=='Agriculture, forestry and fishing', 'Name'] = 'Agriculture'
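For context, here is a sketch of two likely reasons the question's df.replace call appears to do nothing. The data is invented, since the original frame is only available as an image. First, replace returns a new dataframe by default, so the result has to be assigned back. Second, a non-regex replace matches whole cell values exactly, so the trailing space in the question's search string prevents a match if the cell has none.

```python
import pandas as pd

# Invented sample data; the real frame is only shown as an image.
df = pd.DataFrame({"Name": ["Agriculture, forestry and fishing", "Mining"],
                   "Value": [1, 2]})

# 1. The returned frame is discarded, so df keeps its old values.
df.replace(to_replace="Agriculture, forestry and fishing", value="Agriculture")
still_old = df.loc[0, "Name"]

# 2. With a trailing space the search string equals no cell, so nothing changes.
no_match = df.replace(to_replace="Agriculture, forestry and fishing ",
                      value="Agriculture")

# 3. Exact string, result assigned back: the value is replaced.
df = df.replace(to_replace="Agriculture, forestry and fishing",
                value="Agriculture")

print(still_old)
print(no_match.loc[0, "Name"])
print(df.loc[0, "Name"])
```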
You can easily get all the column names by calling df.columns.
Then you can copy this list, replace the name of any column, and reassign the list to df.columns.
For example:
import pandas as pd
df = pd.DataFrame(data=[[1, 2], [10, 20], [100, 200]], columns=['A', 'B'])
df.columns
In a Jupyter notebook the output will be:
Index(['A', 'B'], dtype='object')
So you copy that list, replace what you want to change, and reassign it:
df.columns = ['C', 'D']
You will then have a dataframe whose column names have changed from A and B to C and D. You can check this by calling
df.head()

how to append a dataframe without overwriting existing dataframe using for loop in python

I have an empty dataframe and want to append additional dataframes to it in a for loop without overwriting what is already there. The regular append method overwrites the existing dataframe, and only the last appended dataframe shows up in the output.
Use concat() from the pandas module.
import pandas as pd
df_new = pd.concat([df_empty, df_additional])
Read more about it in the pandas docs.
Regarding the question in the comment:
df = pd.DataFrame()  # or pass columns=[...] matching the dataframes you will append
for i in range(10):
    df_new = function_to_get_df_new()  # assign the result so it can be concatenated
    df = pd.concat([df, df_new])
Suppose you have a list of dataframes, list_of_df = [df1, df2, df3],
and an empty dataframe, df = pd.DataFrame().
To append every dataframe in the list to that empty dataframe df:
for i in list_of_df:
    df = df.append(i)
The loop above does not change df1, df2, or df3; only df changes.
Note that df.append(df1) on its own does not change df; you must assign the result back, as in df = df.append(df1). (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; on current versions write df = pd.concat([df, i]) instead.)
You cannot pass a set, however:
df_new = pd.concat({df_empty, df_additional})
This fails because pandas.DataFrame objects are not hashable, and set elements must be hashable.
A tuple works:
df_new = pd.concat((df_empty, df_additional))
It may be slightly quicker than a list, though any difference is likely negligible.
Update for the for loop:
df = pd.DataFrame(data)
for i in range(n):  # n is the number of dataframes you want to add
    df_new = function_to_get_df_new()
    df = pd.concat([df, df_new])  # or a tuple: df = pd.concat((df, df_new)); a set would raise TypeError
The question is already well answered; my 5 cents is the suggestion to use the ignore_index=True option to get a continuous new index instead of repeating the old ones.
import pandas as pd
df_to_append = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB')) # sample
df = pd.DataFrame() # this is a placeholder for the destination
for i in range(3):
    # pd.concat stands in for DataFrame.append, which was removed in pandas 2.0
    df = pd.concat([df, df_to_append], ignore_index=True)
I don't think you need a for loop here; try concat():
import pandas
result = pandas.concat([emptydf,additionaldf])
pandas.concat documentation
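Putting the answers above together, one common pattern is to collect the dataframes in a list inside the loop and call pd.concat once at the end. This is a sketch under assumptions: make_frame is a hypothetical stand-in for whatever produces each new dataframe in the asker's code.

```python
import pandas as pd


def make_frame(i):
    """Hypothetical stand-in for the code that produces each new dataframe."""
    return pd.DataFrame({"A": [i, i + 1], "B": [i * 10, i * 10 + 1]})


frames = []
for i in range(3):
    frames.append(make_frame(i))   # list.append, not DataFrame.append

# One concat at the end; ignore_index=True gives a fresh 0..n-1 index.
result = pd.concat(frames, ignore_index=True)

print(result)
```

Concatenating once avoids re-copying the growing dataframe on every iteration, and none of the individual frames are modified.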
