Set a list as the index of a pandas DataFrame - python

I have a list called: sens_fac = [0.8, 1, 1.2], and a dataframe df defined this way:
import pandas as pd
df = pd.DataFrame(index=range(len(sens_fac)),columns=range(len(factors)))
However, I want to modify the index. I know I can do this in the definition, and it works.
import pandas as pd
df = pd.DataFrame(index=sens_fac,columns=range(len(factors)))
But what if I want to modify the index after it was created? I tried doing this
df.set_index(sens_fac)
But I get this error:
KeyError: 'None of [0.8, 1.2] are in the columns'

set_index can only be called on an existing pandas.DataFrame, and any array you pass to it must have the same length as the frame (so the frame must not be empty). If you are building the frame from scratch, there is an index argument in the constructor:
import pandas as pd
import numpy as np
sens_fact = np.random.rand(5)
df = pd.DataFrame(index=sens_fact)
If you want to manipulate an existing pandas.DataFrame, then pandas.DataFrame.set_index() is the correct method, but it expects the name of a column of the table (a plain list is read as a list of column names, which explains the KeyError). So you go with:
df = pd.DataFrame(sens_fact, columns=['sens_fact'])  # a list, since recent pandas versions reject a set here
print(df) # dataframe with standard enumerated indices
df.set_index('sens_fact', inplace=True)
print(df) # dataframe with no columns and a non-standard index
Or you can just manipulate the index directly:
df = pd.DataFrame(np.random.rand(len(sens_fact)))
df.index = sens_fact
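To tie this back to the question, here is a minimal sketch of both ways to put sens_fac on an existing frame. The factors list is a made-up stand-in, since the question does not show it, and wrapping the list in pd.Index is one way (not the only one) to make set_index read it as index values rather than column names.

```python
import pandas as pd

sens_fac = [0.8, 1, 1.2]
factors = ["a", "b"]  # hypothetical stand-in; the question does not define `factors`

# Option 1: assign the list directly to the index attribute (in place)
df = pd.DataFrame(index=range(len(sens_fac)), columns=range(len(factors)))
df.index = sens_fac

# Option 2: wrap the list in pd.Index so set_index treats it as index values,
# not as column labels; set_index returns a new frame
df2 = pd.DataFrame(index=range(len(sens_fac)), columns=range(len(factors)))
df2 = df2.set_index(pd.Index(sens_fac))

print(df.index)
print(df2.index)
```

Both frames end up with the same three index labels; the difference is only whether the change happens in place or on a returned copy.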

Related

DataFrame returns empty after .update()

I am trying to create a new DataFrame which contains a calculation from an original DF.
To that end, I run a for loop that does the calculation for each column, but I still end up with the empty DataFrame I started with, and I can't see where the error comes from.
May I ask for some help here?
import yfinance as yf
import pandas as pd
df = yf.download(["YPFD.BA", "GGAL.BA"], period='6mo')
df2 = pd.DataFrame()
for i in ["YPFD.BA", "GGAL.BA"]:
    df2.update(df["Volume"][i] * df["Close"][i])
df2
I expected to get a new DF that keeps the original index and holds the calculation obtained from the original DF.
I think this is what you are looking to do:
import yfinance as yf
import pandas as pd
df = yf.download(["YPFD.BA", "GGAL.BA"], period='6mo')
df2 = pd.DataFrame()
for i in ["YPFD.BA", "GGAL.BA"]:
    df2[i] = df["Volume"][i] * df["Close"][i]
df2
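The reason update() leaves the frame empty is that it only overwrites values in columns that already exist, and an empty frame has none. The sketch below shows this on a small invented frame whose two-level columns are assumed to mimic the layout returned by yf.download (the numbers are made up, and no network call is made). It also shows that multiplying the two sub-frames directly gives the same result as the loop.

```python
import pandas as pd

tickers = ["YPFD.BA", "GGAL.BA"]
# Assumed layout: first level is the field, second level is the ticker
columns = pd.MultiIndex.from_product([["Close", "Volume"], tickers])
df = pd.DataFrame(
    [[10.0, 20.0, 100, 200], [11.0, 21.0, 110, 210]],  # invented values
    index=pd.to_datetime(["2023-01-02", "2023-01-03"]),
    columns=columns,
)

# update() has no matching columns to overwrite, so nothing is added
empty = pd.DataFrame()
empty.update(df["Volume"]["YPFD.BA"] * df["Close"]["YPFD.BA"])

# Multiplying the sub-frames aligns on date and ticker, no loop needed
df2 = df["Volume"] * df["Close"]
print(df2)
```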

How to assign new column to existing DataFrame in pandas

I'm new to pandas. I'm trying to add new columns to my existing DataFrame, but they are not getting assigned and I don't know why. Can anyone explain what I'm missing? This is what I tried:
import pandas as pd
df = pd.DataFrame(data = {"test":["mkt1","mkt2","mkt3"],
"test2":["cty1","cty2","cty3"]})
print("Before",df.columns)
df.assign(test3="Hello")
print("After",df.columns)
Output
Before Index(['test', 'test2'], dtype='object')
After Index(['test', 'test2'], dtype='object')
The pandas assign method returns a new DataFrame with the added column; it does not modify the original in place.
import pandas as pd
df = pd.DataFrame(data = {"test":["mkt1","mkt2","mkt3"],
"test2":["cty1","cty2","cty3"]})
print("Before",df.columns)
df = df.assign(test3="Hello") # <--- Note the variable reassignment
print("After",df.columns)
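As a small side-by-side sketch, the snippet below keeps the result of assign under a separate name to show that the original frame is untouched, and contrasts it with plain item assignment, which does change the frame in place. The names with_col and test4 are invented for the illustration.

```python
import pandas as pd

df = pd.DataFrame({"test": ["mkt1", "mkt2", "mkt3"],
                   "test2": ["cty1", "cty2", "cty3"]})

# assign() leaves df untouched and hands back a copy with the extra column
with_col = df.assign(test3="Hello")
print(list(df.columns))        # still only the two original columns
print(list(with_col.columns))  # original columns plus test3

# Item assignment is the in-place alternative
df["test4"] = "World"
print(list(df.columns))
```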

Trying to iterate over DataFrame created from excel

I have an Excel file which I open with pandas and put into a dataframe. It all works well until I try to iterate over a column in the dataframe using a for loop. I get either an error that df does not exist, or iterrows() missing 1 required positional argument: 'self'.
I tried adding from pandas import dataframe and import dataframe as df to the code; neither works.
import pandas as pd
from pandas import DataFrame as df
def getFunc():
    df = pd.read_excel('filename.xlsx')
for index, row in df.iterrows(): #this throws exception
    some_list = row[ColName] * some_val
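The df created inside getFunc is local to that function, while the module-level name df refers to the DataFrame class itself, hence the missing 'self' error. Return the frame from the function and iterate over the result: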
import pandas as pd
def getFunc():
    df = pd.read_excel('filename.xlsx')
    return df
df1 = getFunc()
for index, row in df1.iterrows():
    some_list = row[ColName] * some_val
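For a version that runs without an Excel file, here is a rough sketch in which a small invented frame stands in for pd.read_excel. The names get_frame, price and some_val are hypothetical. It also shows that multiplying the whole column at once gives the same numbers as the row-by-row loop.

```python
import pandas as pd

def get_frame():
    # Stand-in for pd.read_excel('filename.xlsx'); the data is invented
    return pd.DataFrame({"price": [1.0, 2.5, 4.0]})

some_val = 2  # hypothetical multiplier
df1 = get_frame()

# Row by row, as in the answer, but collecting every result
looped = [row["price"] * some_val for _, row in df1.iterrows()]

# Vectorized equivalent: multiply the whole column at once
vectorized = (df1["price"] * some_val).tolist()

print(looped)
print(vectorized)
```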

Pandas: Fill new column by condition row-wise

import pandas as pd
import numpy as np
df = pd.DataFrame([np.random.rand(100),100*[0.1],100*[0.3]]).T
df.columns = ["value","lower","upper"]
df.head()
How can I create a new column which indicates whether value is between lower and upper?
You can use between for this purpose.
df['new_col'] = df['value'].between(df['lower'], df['upper'])
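To see what between does at the edges, the sketch below uses a few fixed values instead of random ones (the numbers are chosen for the illustration). By default both bounds are included, so it matches the explicit comparison built from >= and <=.

```python
import pandas as pd

df = pd.DataFrame({
    "value": [0.05, 0.1, 0.2, 0.3, 0.5],
    "lower": 0.1,
    "upper": 0.3,
})

# between() compares row by row when given Series as bounds
df["new_col"] = df["value"].between(df["lower"], df["upper"])

# Same check written out by hand (inclusive on both sides)
explicit = (df["value"] >= df["lower"]) & (df["value"] <= df["upper"])

print(df)
```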

Pandas Re-indexing command

Re: Add missing dates to pandas dataframe, a previously asked question
import pandas as pd
import numpy as np
idx = pd.date_range('09-01-2013', '09-30-2013')
df = pd.DataFrame(data = [2,10,5,1], index = ["09-02-2013","09-03-2013","09-06-2013","09-07-2013"], columns = ["Events"])
df.index = pd.DatetimeIndex(df.index); #question (1)
df = df.reindex(idx, fill_value=np.nan)
print(df)
In the above script what does the command noted as question one do? If you leave this
command out of the script, the df will be re-indexed but the data portion of the
original df will not be retained. As there is no reference to the df data in the
DatetimeIndex command, why is the data from the starting df lost?
Short answer: df.index = pd.DatetimeIndex(df.index); converts the string index of df to a DatetimeIndex.
You have to make the distinction between different types of indexes. In
df = pd.DataFrame(data = [2,10,5,1], index = ["09-02-2013","09-03-2013","09-06-2013","09-07-2013"], columns = ["Events"])
you have an index containing strings. When using
df.index = pd.DatetimeIndex(df.index);
you convert this standard index with strings to an index with datetimes (a DatetimeIndex). So the values of these two types of indexes are completely different.
Now you reindex with
idx = pd.date_range('09-01-2013', '09-30-2013')
df = df.reindex(idx)
where idx is also an index with datetimes. When you reindex the original df with a string index, there are no matching index values, so no column values of the original df are retained. When you reindex the second df (after converting the index to a datetime index), there are matching index values, so the column values at those indexes are retained.
See also http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.reindex.html
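The difference can be checked directly with a short sketch that reindexes the same data twice, once with the string index and once after conversion. The names lost and kept are invented, and pd.to_datetime with an explicit format is used here as an alternative to pd.DatetimeIndex for the conversion.

```python
import pandas as pd

idx = pd.date_range("2013-09-01", "2013-09-30")
df = pd.DataFrame(
    {"Events": [2, 10, 5, 1]},
    index=["09-02-2013", "09-03-2013", "09-06-2013", "09-07-2013"],
)

# String labels never match the Timestamps in idx, so every row is NaN
lost = df.reindex(idx)

# After converting the index to datetimes, the four dates line up
kept = df.copy()
kept.index = pd.to_datetime(kept.index, format="%m-%d-%Y")
kept = kept.reindex(idx)

print(lost["Events"].notna().sum())
print(kept["Events"].notna().sum())
```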
