How to assign new column to existing DataFrame in pandas - python

I'm new to pandas. I'm trying to add a new column to my existing DataFrame, but it's not getting assigned and I don't know why. Can anyone explain what I'm missing? This is what I tried:
import pandas as pd
df = pd.DataFrame(data = {"test":["mkt1","mkt2","mkt3"],
"test2":["cty1","cty2","cty3"]})
print("Before",df.columns)
df.assign(test3="Hello")
print("After",df.columns)
Output
Before Index(['test', 'test2'], dtype='object')
After Index(['test', 'test2'], dtype='object')

The pandas assign method returns a new, modified DataFrame with the added column; it does not modify the original in place, so you have to assign the result back:
import pandas as pd
df = pd.DataFrame(data = {"test":["mkt1","mkt2","mkt3"],
"test2":["cty1","cty2","cty3"]})
print("Before",df.columns)
df = df.assign(test3="Hello") # <--- Note the variable reassignment
print("After",df.columns)
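A short sketch on the same toy frame, contrasting the two usual ways to add a column: assign() leaves the original untouched and returns a new DataFrame, while item assignment with df["test3"] = ... adds the column in place.

```python
import pandas as pd

df = pd.DataFrame(data={"test": ["mkt1", "mkt2", "mkt3"],
                        "test2": ["cty1", "cty2", "cty3"]})

# assign() builds and returns a new DataFrame; df itself is unchanged.
df_assigned = df.assign(test3="Hello")
print(list(df.columns))           # ['test', 'test2']
print(list(df_assigned.columns))  # ['test', 'test2', 'test3']

# Item assignment modifies df in place; no reassignment is needed.
df["test3"] = "Hello"
print(list(df.columns))           # ['test', 'test2', 'test3']
```

assign() is handy in method chains because it returns a new object; item assignment is the more direct choice when you just want to change the frame you already have.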

Related

DataFrame returns empty after .update()

I am trying to create a new DataFrame that contains a calculation based on an original DataFrame.
To that end, I run a for loop with the calculation for each column, but I still get back the original, empty DataFrame, and I don't see where the error comes from.
May I ask for some help here?
import yfinance as yf
import pandas as pd
df = yf.download(["YPFD.BA", "GGAL.BA"], period='6mo')
df2 = pd.DataFrame()
for i in ["YPFD.BA", "GGAL.BA"]:
    df2.update(df["Volume"][i] * df["Close"][i])
df2
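I expected to get a new DataFrame with the original index, containing the calculation obtained from the original DataFrame.
DataFrame.update() modifies the calling DataFrame in place with values from the other object, aligned on index and columns; it never adds new rows or columns, so an empty df2 stays empty. I think this is what you are looking to do: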
import yfinance as yf
import pandas as pd
df = yf.download(["YPFD.BA", "GGAL.BA"], period='6mo')
df2 = pd.DataFrame()
for i in ["YPFD.BA", "GGAL.BA"]:
    df2[i] = df["Volume"][i] * df["Close"][i]
df2
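For comparison, a loop-free sketch. It assumes that yf.download with several tickers returns its columns as a MultiIndex of (field, ticker), which is the default layout as far as I know; the frame below is a hypothetical stand-in with made-up numbers so the snippet runs offline.

```python
import pandas as pd

# Hypothetical stand-in for yf.download(...): columns are (field, ticker) pairs,
# and the numbers are invented purely for illustration.
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["YPFD.BA", "GGAL.BA"]])
index = pd.date_range("2023-01-02", periods=3, freq="D")
df = pd.DataFrame(
    [[10.0, 20.0, 100, 200],
     [11.0, 21.0, 110, 210],
     [12.0, 22.0, 120, 220]],
    index=index,
    columns=columns,
)

# df["Volume"] and df["Close"] are sub-DataFrames with one column per ticker.
# Multiplying them aligns on those ticker columns, so no loop is needed.
df2 = df["Volume"] * df["Close"]
print(df2)
```

Because the multiplication aligns on both index and columns, df2 keeps the original dates as its index and gets one column per ticker.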

Set a list as the index of a pandas DataFrame

I have a list, sens_fac = [0.8, 1, 1.2], and a DataFrame df defined this way:
import pandas as pd
df = pd.DataFrame(index=range(len(sens_fac)),columns=range(len(factors)))
However, I want to modify the index. I know I can do this in the definition, and it works.
import pandas as pd
df = pd.DataFrame(index=sens_fac,columns=range(len(factors)))
But what if I want to modify the index after it was created? I tried doing this
df.set_index(sens_fac)
But I get this error:
KeyError: 'None of [0.8, 1.2] are in the columns'
DataFrame.set_index() treats a plain Python list as a list of column labels, which is why you get the KeyError. If you are creating the DataFrame anyway, there is an index argument in the constructor:
import pandas as pd
import numpy as np
sens_fact = np.random.rand(5)
df = pd.DataFrame(index=sens_fact)
If you want to manipulate an existing pandas.DataFrame, then pandas.DataFrame.set_index() is the correct method, but it expects the name of a column of the table. So you go with:
df = pd.DataFrame(sens_fact, columns=['sens_fact'])
print(df) # dataframe with standard enumerated indices
df.set_index('sens_fact', inplace=True)
print(df) # dataframe with no columns and a non-standard index
Or you can just manipulate the index directly:
df = pd.DataFrame(np.random.rand(len(sens_fact)))
df.index = sens_fact
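A minimal sketch of both ways to replace the index of the frame from the question after it has been created. The question never shows factors, so the two-element list below is a hypothetical placeholder.

```python
import pandas as pd

sens_fac = [0.8, 1, 1.2]
factors = ["f1", "f2"]  # hypothetical: the question does not define `factors`

df = pd.DataFrame(index=range(len(sens_fac)), columns=range(len(factors)))

# Option 1: wrap the list in pd.Index so set_index() uses it as the new index
# values instead of looking its elements up as column labels.
df1 = df.set_index(pd.Index(sens_fac))  # returns a new DataFrame

# Option 2: overwrite the index attribute directly (changes df in place).
df.index = sens_fac
```

Either form needs exactly as many values as the DataFrame has rows; otherwise pandas raises a length-mismatch error.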

DataFrame values converted to NaN after applying df.iloc[]

I ran into a problem after running pd.DataFrame(): the whole DataFrame became NaN (empty), and I could not reverse it. I also assigned column names to the DataFrame, but their values disappeared too:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('PuntaCapi.csv', header=None, sep='\n')
df = df[0].str.split(',', expand=True)
df.to_csv("PuntaCapi.tab",sep="\t",header=None, index=False)
print(df)
Akim =df.iloc[:,0:1]
A= pd.DataFrame(data =Akim ,columns=['Akim'])
veriler2 = pd.DataFrame(data = df, columns=['Akim','Kuvvet','Zaman','Soguma','Yaklasma','Baski','SacKalinliği','PuntaCapi'])
print(veriler2)
The code above produces the following result:
[Image: Spyder View DataFrame code]
There are no NaN values in the CSV file, but after .iloc[] the entire DataFrame became NaN. I have tried to solve the problem but could not. I need help solving it.
I do not understand your question.
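You read the data with pd.read_csv('PuntaCapi.csv', header=None, sep='\n') and save it as df, but then you overwrite df with df[0].str.split(',', expand=True), so its columns are labelled 0, 1, 2, ... When you pass a DataFrame together with columns=[...] to pd.DataFrame(), pandas selects those labels from it instead of renaming; none of 'Akim', 'Kuvvet', ... exist, so every value comes out as NaN. Passing df.values (a plain NumPy array without labels) makes the constructor use your names as the new column labels.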
Try this code.
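# Assumes each line holds the 8 comma-separated values, so the default sep=',' splits them
df = pd.read_csv('PuntaCapi.csv', header=None)
# df.values has no column labels, so these names are assigned rather than selected
veriler2 = pd.DataFrame(data = df.values, columns=['Akim','Kuvvet','Zaman','Soguma','Yaklasma','Baski','SacKalinliği','PuntaCapi'])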
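A small reproduction of the NaN effect and two ways around it. The real PuntaCapi.csv is not shown, so the three-column CSV text below is hypothetical; only the column names are taken from the question.

```python
import io
import pandas as pd

# Hypothetical CSV content with no header row.
csv_text = "1,2,3\n4,5,6\n"
names = ["Akim", "Kuvvet", "Zaman"]

df = pd.read_csv(io.StringIO(csv_text), header=None)  # columns labelled 0, 1, 2

# DataFrame + columns SELECTS those labels; none exist, so everything is NaN.
selected = pd.DataFrame(data=df, columns=names)

# A bare NumPy array has no labels to match, so `columns` simply names them.
renamed = pd.DataFrame(data=df.values, columns=names)

# Alternatively, let read_csv assign the names while reading.
named = pd.read_csv(io.StringIO(csv_text), header=None, names=names)
```

Passing names= to read_csv avoids the extra DataFrame construction entirely, which is usually the simplest fix.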

Trying to iterate over DataFrame created from excel

I have an Excel file which I open with pandas and load into a DataFrame. It all works well until I try to iterate over a column of the DataFrame using a for loop. I get either an error that df does not exist, or iterrows() missing 1 required positional argument: 'self'.
I tried adding the lines from pandas import dataframe and import dataframe as df to the code; neither works.
import pandas as pd
from pandas import DataFrame as df
def getFunc():
    df = pd.read_excel('filename.xlsx')
for index, row in df.iterrows(): #this throws exception
    some_list = row[ColName] * some_val
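The df inside getFunc is a local variable, so it is not visible outside the function. At module level the name df still refers to the DataFrame class imported with from pandas import DataFrame as df, which is why calling df.iterrows() complains about a missing 'self'. Return the DataFrame from the function and use the returned value: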
import pandas as pd
def getFunc():
    df = pd.read_excel('filename.xlsx')
    return df  # hand the local DataFrame back to the caller
df1 = getFunc()
for index, row in df1.iterrows():
    some_list = row[ColName] * some_val
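A self-contained version of the same pattern. To run without an Excel file, the function builds a small DataFrame in memory; the column name "price" and the multiplier are hypothetical placeholders for ColName and some_val.

```python
import pandas as pd


def get_func():
    # Stand-in for pd.read_excel('filename.xlsx'); "price" is a hypothetical column.
    df = pd.DataFrame({"price": [1.5, 2.0, 3.0]})
    return df  # return the local DataFrame so callers can use it


df1 = get_func()
some_val = 2  # hypothetical multiplier

# Row by row, collecting every product in a list.
some_list = []
for index, row in df1.iterrows():
    some_list.append(row["price"] * some_val)

# Usually simpler and faster: operate on the whole column at once.
scaled = df1["price"] * some_val
```

The vectorized form avoids building a Series per row, which is what iterrows() does internally.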

how to append a dataframe without overwriting existing dataframe using for loop in python

I have an empty DataFrame and want to append additional DataFrames to it in a for loop without overwriting the existing data. The regular append method overwrites the existing DataFrame, and only the last appended DataFrame shows up in the output.
Use concat() from the pandas module:
import pandas as pd
df_new = pd.concat([df_empty, df_additional])
Read more about it in the pandas docs.
Regarding the question in the comments:
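df = pd.DataFrame(columns=["A", "B"])  # use the same columns your to-be-appended DataFrames have ("A", "B" are placeholders)
for i in range(10):
    df_new = function_to_get_df_new()  # hypothetical helper that returns a DataFrame
    df = pd.concat([df, df_new])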
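Calling pd.concat inside the loop copies the accumulated data on every iteration. A common alternative, sketched below, is to collect the pieces in a list and concatenate once at the end; function_to_get_df_new is a hypothetical helper that fabricates a one-row frame per iteration.

```python
import pandas as pd


def function_to_get_df_new(i):
    # Hypothetical helper standing in for whatever produces each new DataFrame.
    return pd.DataFrame({"A": [i], "B": [i * 10]})


frames = []
for i in range(10):
    frames.append(function_to_get_df_new(i))  # collect, don't concatenate yet

# A single concat at the end; ignore_index=True renumbers rows 0..n-1.
df = pd.concat(frames, ignore_index=True)
```

Appending to a Python list is cheap, so the cost of copying is paid only once, in the final concat.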
Suppose you have a list of DataFrames, list_of_df = [df1, df2, df3],
and an empty DataFrame, df = pd.DataFrame().
If you want to append all DataFrames in the list to that empty DataFrame df:
for i in list_of_df:
    df = df.append(i)
The loop above will not change df1, df2 or df3, but df will change.
Note that df.append(df1) alone does not change df unless you assign the result back: df = df.append(df1). DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions write df = pd.concat([df, i]) instead.
You can't use a set, though:
df_new = pd.concat({df_empty, df_additional})
This fails because DataFrame objects are not hashable, and a set can only hold hashable objects.
A tuple works:
df_new = pd.concat((df_empty, df_additional))
Tuples may be a little quicker.
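A quick check of the set-versus-tuple point, using two toy frames made up for illustration:

```python
import pandas as pd

df_a = pd.DataFrame({"A": [1, 2]})
df_b = pd.DataFrame({"A": [3, 4]})

# A list or a tuple of DataFrames both work and give the same result.
from_list = pd.concat([df_a, df_b])
from_tuple = pd.concat((df_a, df_b))

# The set cannot even be built: hashing a DataFrame raises TypeError.
try:
    frames = {df_a, df_b}
    set_failed = False
except TypeError:
    set_failed = True
```

The error comes from building the set itself, before pd.concat is ever called.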
Updated for a for loop:
df = pd.DataFrame(data)
for i in range(n):  # n: however many new DataFrames you need
    df_new = function_to_get_df_new()
    df = pd.concat([df, df_new])  # or a tuple: df = pd.concat((df, df_new))
The question is already well answered; my 5 cents: use the ignore_index=True option to get a continuous new index instead of duplicating the old ones.
import pandas as pd
df_to_append = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB')) # sample
df = pd.DataFrame() # this is a placeholder for the destination
for i in range(3):
    # DataFrame.append was removed in pandas 2.0; pd.concat takes the same ignore_index option
    df = pd.concat([df, df_to_append], ignore_index=True)
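DataFrame.append was removed in pandas 2.0, but pd.concat accepts the same ignore_index option. A small sketch of what it changes, using the sample frame from the answer:

```python
import pandas as pd

df_to_append = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))  # sample

# Without ignore_index the original row labels are repeated.
without = pd.concat([df_to_append] * 3)
# With ignore_index=True the result gets a fresh 0..n-1 index.
with_ignore = pd.concat([df_to_append] * 3, ignore_index=True)

print(without.index.tolist())      # [0, 1, 0, 1, 0, 1]
print(with_ignore.index.tolist())  # [0, 1, 2, 3, 4, 5]
```

Duplicate labels matter later: df.loc[0] on the first result returns three rows rather than one.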
I don't think you need a for loop here; try concat():
import pandas
result = pandas.concat([emptydf,additionaldf])
pandas.concat documentation
