How to change specific row value in dataframe using pandas? [duplicate] - python

This question already has answers here:
Set value for particular cell in pandas DataFrame using index
(23 answers)
Closed 2 years ago.
I have attached my data frame below. I am trying to change a specific value in a row, but I have not succeeded. Any leads would be appreciated.
df.replace(to_replace="Agriculture, forestry and fishing ", value="Agriculture")
[Image: my data frame]

Try this:
df['Name'] = df['Name'].str.replace('Agriculture, forestry and fishing', 'Agriculture')

This should work for any data type:
df.loc[df.loc[:, 'Name']=='Agriculture, forestry and fishing', 'Name'] = 'Agriculture'
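A minimal sketch comparing the approaches above on a small, made-up frame (the sample values are hypothetical; only the 'Name' column name comes from the answers). One likely reason the original attempt appeared to do nothing is that DataFrame.replace returns a new DataFrame unless the result is reassigned; another possible cause is the trailing space inside to_replace, since a plain string replace must match the whole cell exactly.

```python
import pandas as pd

# Hypothetical sample data; 'Name' is the column name used in the answers above.
df = pd.DataFrame({"Name": ["Agriculture, forestry and fishing", "Mining", "Construction"]})
old, new = "Agriculture, forestry and fishing", "Agriculture"

# DataFrame.replace matches whole cell values exactly and returns a NEW frame,
# so the result has to be assigned; calling it without assignment changes nothing.
replaced = df.replace(to_replace=old, value=new)

# With a trailing space, to_replace no longer equals the cell, so nothing changes.
unchanged = df.replace(to_replace=old + " ", value=new)

# Series.str.replace replaces substrings; regex=False treats `old` literally.
via_str = df["Name"].str.replace(old, new, regex=False)

# Boolean-mask assignment with .loc modifies df itself.
df.loc[df["Name"] == old, "Name"] = new
print(df)
```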

You can get all the column names by calling df.columns.
Then you can copy this list, replace the name of any column, and reassign the list to df.columns.
For example:
import pandas as pd
df = pd.DataFrame(data=[[1, 2], [10, 20], [100, 200]], columns=['A', 'B'])
df.columns
In a Jupyter notebook, the output will be:
Index(['A', 'B'], dtype='object')
So you copy that list, replace the names you want to change, and reassign it:
df.columns = ['C', 'D']
You will then get a dataframe whose columns are renamed from A and B to C and D, which you can check by calling
df.head()
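A small sketch of the same idea that edits the copied list programmatically instead of retyping every name, which is handy when only a few of many columns change:

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 2], [10, 20], [100, 200]], columns=["A", "B"])

# Copy the current names into a plain list, change the ones you want,
# and assign the list back; the cell values are not touched.
new_columns = list(df.columns)  # ['A', 'B']
new_columns[0] = "C"
new_columns[1] = "D"
df.columns = new_columns
print(df.columns.tolist())  # ['C', 'D']
```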

Related

How to extract a value after colon in all the rows from a pandas dataframe column? [duplicate]

This question already has answers here:
access value from dict stored in python df
(3 answers)
Closed 3 months ago.
Edit: the dummy dataframe has been edited.
I have a pandas data frame with 200 rows and a column like the one below.
Let's say the data frame is named data.
B
{'animal':'cat', 'bird':'peacock'...}
I want to extract the value of animal to a separate column C for all the rows.
I tried the code below, but it doesn't work.
data['C'] = data["B"].apply(lambda x: x.split(':')[-2] if ':' in x else x)
Please help.
The dictionary is unpacked with pd.json_normalize
import pandas as pd
data = pd.DataFrame({'B': [{0: {'animal': 'cat', 'bird': 'peacock'}}]})
data['C'] = pd.json_normalize(data['B'])['0.animal']
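Assuming each cell of B already holds a Python dict (as in the question), the split(':') attempt fails because dicts have no split method; one sketch is to look the key up directly. The helper get_animal and the sample rows are hypothetical, and the ast.literal_eval branch only matters if the column was loaded from a CSV, where the dicts arrive as strings.

```python
import ast
import pandas as pd

# Hypothetical data: column B holds one dict per row, as in the question.
data = pd.DataFrame({"B": [{"animal": "cat", "bird": "peacock"},
                           {"animal": "dog", "bird": "robin"}]})


def get_animal(cell):
    # A column read from a CSV may contain the string form of a dict;
    # ast.literal_eval safely parses it back into a real dict.
    if isinstance(cell, str):
        cell = ast.literal_eval(cell)
    # dict.get returns None instead of raising if 'animal' is missing.
    return cell.get("animal")


data["C"] = data["B"].apply(get_animal)
print(data["C"].tolist())  # ['cat', 'dog']
```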
I'm not totally sure of the structure of your data. Does this look right?
import pandas as pd
import re
df = pd.DataFrame({
"B": ["'animal':'cat'", "'bird':'peacock'"]
})
df["C"] = df.B.apply(lambda x: re.sub(r".*?\:(.*$)", r"\1", x))

Pandas replace function wrongly changes in all dataframes [duplicate]

This question already has answers here:
why should I make a copy of a data frame in pandas
(8 answers)
Closed 2 years ago.
I use the pandas replace function to replace a value. Please see the code below:
import pandas as pd
d = {'color' : pd.Series(['white', 'blue', 'orange']),
'second_color': pd.Series(['white', 'black', 'blue']),
'value' : pd.Series([1., 2., 3.])}
df1 = pd.DataFrame(d)
print(df1)
df = df1
df['color'] = df['color'].replace('white','red')
print(df1)
print(df)
I intend to change a value in df, but why is the same value in df1 also changed?
By contrast, the code below leaves df1 unchanged:
df = df.replace('white', 'red')
You need to use .copy():
df = df1.copy()
so that changes you make to df do not propagate to df1.
This happens because both names refer to the same object in memory.
When you do df = df1, no new data frame is created; df simply becomes another reference to the object that df1 refers to. Using id(), you can see that both names point to the same object:
>>> df = df1
>>> id(df)
41633008
>>> id(df1)
41633008
To make a new copy you can use DataFrame.copy method
>>> df = df1.copy()
>>> id(df)
31533376
>>> id(df1)
41633008
Now the two names refer to different objects.
There is more to learn about shallow and deep copies; see the pandas documentation for DataFrame.copy for details.
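A short sketch showing the same difference with `is` instead of id(); the frame here is a trimmed-down version of the one in the question.

```python
import pandas as pd

df1 = pd.DataFrame({"color": ["white", "blue", "orange"], "value": [1.0, 2.0, 3.0]})

alias = df1               # no copy: a second name for the same object
independent = df1.copy()  # deep copy by default (deep=True)

print(alias is df1)        # True
print(independent is df1)  # False

# Changing the copy leaves df1 untouched.
independent["color"] = independent["color"].replace("white", "red")
print(df1["color"].tolist())          # ['white', 'blue', 'orange']
print(independent["color"].tolist())  # ['red', 'blue', 'orange']
```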

Add column containing list value if column contains string in list

I'm trying to scan a particular column in a dataframe, e.g. df['x'], for values that I have in a separate list, list = ['y', 'z', 'a', 'b']. How do I make pandas fill a new column with the matching list value(s) if df['x'] contains one or more of the values from the list?
Thanks!
Use this:
import pandas as pd
if df['x'].str.contains('|'.join(list)).any():
    df = pd.concat([df, pd.DataFrame(list)], axis=1)
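The answer above adds the whole list as a new column whenever any row matches, which is not quite what was asked. A different sketch, assuming substring matching is what you want (the same semantics as str.contains), collects the matching values per row with Series.str.findall. The sample data is hypothetical, and the list is named values here to avoid shadowing the built-in list. Rows with no match get an empty string; NaN cells in x would need a fillna('') first.

```python
import re
import pandas as pd

# Hypothetical sample data; 'values' plays the role of the list in the question.
df = pd.DataFrame({"x": ["y", "zb", "cd", "bab"]})
values = ["y", "z", "a", "b"]

# Build an alternation pattern; re.escape keeps special characters literal.
pattern = "|".join(re.escape(v) for v in values)

# findall returns a list of matches per row; dict.fromkeys drops duplicates
# while keeping first-seen order, and rows without a match become "".
df["matches"] = df["x"].str.findall(pattern).apply(lambda found: ", ".join(dict.fromkeys(found)))
print(df)
```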

How to change dataframe column names without changing the values? [duplicate]

This question already has answers here:
Renaming column names in Pandas
(35 answers)
Closed 2 years ago.
I have a bunch of CSV files which are read as dataframes. For each dataframe, I want to rename some columns if they exist in that dataframe:
column_name_update_map = {'aa': 'xx', 'bb': 'yy'}
With this map, if 'aa' or 'bb' exists in a dataframe, I want to rename 'aa' to 'xx' and 'bb' to 'yy'. No values should be changed.
for file in files:
    print('Current file: ', file)
    df = pd.read_csv(file, sep='\t')
    df = df.replace(np.nan, '', regex=True)
    for index, row in df.iterrows():
        pass
I don't think I should use the inner loop, but if I have to, what's the right way to change only the column names?
You can use DataFrame.rename:
column_name_update_map = {'aa': 'xx', 'bb': 'yy'}
df = df.rename(columns=column_name_update_map)
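Since rename only touches the labels that appear in the mapping (unknown keys are ignored by default), the inner iterrows loop is not needed. A sketch of the per-file step, using in-memory frames as hypothetical stand-ins for the CSV files:

```python
import pandas as pd

column_name_update_map = {"aa": "xx", "bb": "yy"}

# Hypothetical stand-ins for the DataFrames read from the CSV files.
frames = [
    pd.DataFrame({"aa": [1, 2], "bb": [3, 4], "c": [5, 6]}),
    pd.DataFrame({"bb": [7], "d": [8]}),  # has no 'aa' column
    pd.DataFrame({"e": [9]}),             # has neither
]

# rename() changes only labels present in the mapping; missing keys are
# ignored by default (errors="ignore"), and cell values are left unchanged.
renamed = [df.rename(columns=column_name_update_map) for df in frames]
for df in renamed:
    print(df.columns.tolist())
```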
To rename specific columns, follow this code:
Code:
import pandas as pd
import numpy as np
#creating sample dataframe
df=pd.DataFrame({'aa':[1, 2], 'bb':[3, 4], 'c':[5, 6], '':[7, 8]})
# rename 'aa' to 'xx', 'bb' to 'yy', and the empty-string column '' to NaN
df.rename(columns={'aa':'xx', 'bb':'yy', '':np.nan}, inplace=True)
#display resulting dataframe
print(df)
I hope this helps.

python split pandas numeric vector column into multiple columns [duplicate]

This question already has answers here:
Split a Pandas column of lists into multiple columns
(11 answers)
Closed 4 years ago.
I have a dataframe in pandas, with a column which is a vector:
df = pd.DataFrame({'ID':[1,2], 'Averages':[[1,2,3],[4,5,6]]})
and I want to split it into separate columns, so that it looks like this:
df2 = pd.DataFrame({'ID':[1,2], 'A':[1,4], 'B':[2,5], 'C':[3,6]})
I have tried df['Averages'].astype(str).str.split(' '), but with no luck. Any help would be appreciated.
pd.concat([df['ID'], df['Averages'].apply(pd.Series)], axis = 1).rename(columns = {0: 'A', 1: 'B', 2: 'C'})
This will work:
df[['A', 'B', 'C']] = pd.DataFrame(df['Averages'].tolist(), index=df.index)
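A self-contained sketch of the list-to-columns approach that also keeps ID and drops the original Averages column, reproducing the df2 from the question:

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2], "Averages": [[1, 2, 3], [4, 5, 6]]})

# One list per row becomes one row of the new frame; reusing df.index keeps
# the rows aligned when the pieces are concatenated side by side.
parts = pd.DataFrame(df["Averages"].tolist(), index=df.index, columns=["A", "B", "C"])

# Keep ID, attach the new columns, and leave the original list column out.
df2 = pd.concat([df[["ID"]], parts], axis=1)
print(df2)
```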
