Add a column to a DataFrame in Python pandas

How do I loop through my Excel sheets and add each 'Adjusted Close' to a DataFrame? I want to combine all the adjusted closes and build a stock index.
When I try the code below, the DataFrame Percent_Change is empty.
xls = pd.ExcelFile('databas.xlsx')
countSheets = len(xls.sheet_names)
Percent_Change = pd.DataFrame()
x = 0
for x in range(countSheets):
    data = pd.read_excel('databas.xlsx', sheet_name=x, index_col='Date')
    # Calculate the percent change from day to day
    Percent_Change[x] = pd.Series(data['Adj Close'].pct_change()*100, index=Percent_Change.index)
    stock_index = data['Percent_Change'].cumsum()

Unfortunately I do not have the data to replicate your complete example. However, there appears to be a bug in your code.
You are looping over "x", which is just an integer sheet index from range(). You probably want to loop over the sheet names and append each one to your DataFrame. If you want to do that, your code should be:
import pandas as pd

xls = pd.ExcelFile('databas.xlsx')
# PEP 8 unto thyself only: it is conventional to use "_" (snake_case) instead of camelCase and to avoid long names where possible
sheets = xls.sheet_names
Percent_Change = pd.DataFrame()
# using "sheet" instead of "x" is more Pythonic
for sheet in sheets:
    data = pd.read_excel('databas.xlsx', sheet_name=sheet, index_col='Date')
    # Calculate the percent change from day to day
    Percent_Change[sheet] = pd.Series(data['Adj Close'].pct_change()*100, index=Percent_Change.index)
    stock_index = data['Percent_Change'].cumsum()
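One remaining catch beyond the original answer: assigning a Series built with index=Percent_Change.index while Percent_Change is still empty aligns everything to an empty index, which is likely why the DataFrame stays empty, and data has no 'Percent_Change' column to cumsum. The following is only a sketch of one way around both, assuming every sheet has a 'Date' index and an 'Adj Close' column and that an equal-weight average of the daily changes is an acceptable way to build the index:

import pandas as pd

xls = pd.ExcelFile('databas.xlsx')

changes = {}
for sheet in xls.sheet_names:
    data = pd.read_excel(xls, sheet_name=sheet, index_col='Date')
    # per-sheet percent change, keyed by sheet name
    changes[sheet] = data['Adj Close'].pct_change() * 100

# align every sheet on its Date index (outer join) into one DataFrame
Percent_Change = pd.concat(changes, axis=1)

# a simple equal-weight index: average the daily changes across sheets, then accumulate
stock_index = Percent_Change.mean(axis=1).cumsum()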

Related

Python. Pandas. Loop. Add Excel Sheet names

I have my code ready; I just can't seem to wrap my head around how to pass the sheet name into the column I created called "Month" (s['Month'] = 0).
The purpose of the loop is to go to each sheet, clean up the data, and add a column containing the current sheet name.
import pandas as pd

# read every sheet; .values() keeps only the DataFrames (the sheet names are the dict keys)
sheets = pd.read_excel('Royalties Jan to Dec 21.xlsx', sheet_name=None).values()

# list to collect the cleaned-up data before concatenating
merged_df = []

# loop through the sheets and clean up each one
for s in sheets:
    s['Month'] = 0  ## This is the line I need to change so that instead of 0 I get the sheet name.
    s = s.fillna('')
    s.columns = (s.iloc[2] + ' ' + s.iloc[3])
    s = s[s.iloc[:, 0] == 'TOTAL']
    # append the data from each sheet to the merged list
    merged_df.append(s)

merged_df = pd.concat(merged_df)
merged_df
Any help would be appreciated! Thank you!
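A minimal sketch of one way to do this (an assumption, not code from the original post): pd.read_excel(..., sheet_name=None) returns a dictionary keyed by sheet name, so iterating over .items() keeps the name next to each DataFrame. The cleanup steps are copied from the question; the 'Month' column is added after the header rows are consumed so its label is not overwritten:

import pandas as pd

# sheet_name=None returns {sheet_name: DataFrame}
sheets = pd.read_excel('Royalties Jan to Dec 21.xlsx', sheet_name=None)

merged = []
for name, s in sheets.items():
    s = s.fillna('')
    s.columns = (s.iloc[2] + ' ' + s.iloc[3])
    s = s[s.iloc[:, 0] == 'TOTAL']
    s = s.assign(Month=name)   # the sheet name instead of 0
    merged.append(s)

merged_df = pd.concat(merged)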

Why can't I get all the looped dataframes into one dataframe?

I have a listing of 5 stock symbols in a .csv file. I am using the loop below to get options data for each of the symbols. Ideally, the output for all 5 symbols would be saved in one .xlsx file.
If I execute print(df_puts) I see all symbols in the DataFrame. However, the output .xlsx file only has data from the last symbol in the .csv file; basically, it writes data from the last looped symbol rather than from all symbols in the loop.
I'm new to pandas and Python in general, and I would like to understand why this happens, for future projects.
stocklist = pd.read_excel(filePath)

for i in stocklist.index:
    stock = str(stocklist["Symbols"][i])
    #df = pdr.get_data_yahoo(stock, start, now, threads=False)
    option_dict = options.get_options_chain(stock)
    #print(option_dict)
    df_puts = pd.DataFrame.from_dict(option_dict.get("puts"))
    df_calls = pd.DataFrame.from_dict(option_dict.get("calls"))

newFile = os.path.dirname(filePath1) + "/OptionsOutput.xlsx"
writer = ExcelWriter(newFile)
df_puts.to_excel(writer, "puts", float_format="%.3f")
df_calls.to_excel(writer, "calls", float_format="%.3f")
writer.save()
You need to save all the DataFrames in a list and then concat them into a final DataFrame. After that you can save it to an Excel file.
stocklist = pd.read_excel(filePath)

call_df_arr = []  # Create new lists to save the DataFrame for each stock
put_df_arr = []

for i in stocklist.index:
    stock = str(stocklist["Symbols"][i])
    #df = pdr.get_data_yahoo(stock, start, now, threads=False)
    option_dict = options.get_options_chain(stock)
    df_puts = pd.DataFrame.from_dict(option_dict.get("puts"))
    df_calls = pd.DataFrame.from_dict(option_dict.get("calls"))
    call_df_arr.append(df_calls)  # Append DFs
    put_df_arr.append(df_puts)

final_call_df = pd.concat(call_df_arr)  # Concat DFs
final_put_df = pd.concat(put_df_arr)

newFile = os.path.dirname(filePath1) + "/OptionsOutput.xlsx"
writer = ExcelWriter(newFile)
final_put_df.to_excel(writer, "puts", float_format="%.3f")  # Changed name of df to final_put_df
final_call_df.to_excel(writer, "calls", float_format="%.3f")
writer.save()
Edit: added comments for the code changes.
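As a side note beyond the original answer: in newer pandas versions ExcelWriter.save() is deprecated, so it can be safer to use the writer as a context manager, which saves and closes the workbook automatically. A sketch assuming the same filePath1, final_put_df, and final_call_df as above:

import os
import pandas as pd

newFile = os.path.join(os.path.dirname(filePath1), "OptionsOutput.xlsx")

# the context manager saves and closes the workbook on exit, no writer.save() needed
with pd.ExcelWriter(newFile) as writer:
    final_put_df.to_excel(writer, sheet_name="puts", float_format="%.3f")
    final_call_df.to_excel(writer, sheet_name="calls", float_format="%.3f")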

Using a for loop to add values across different Excel sheets, but I need the values kept separate

Hi :) I am trying to write a for loop to reduce redundancy in my code. I need to access a number of different sheets within an Excel file, count the number of specific values, and later plot a graph.
My for loop looks like this at the moment:
df = pd.read_excel('C:/Users/julia/OneDrive/Documents/python assignment/2016 data -EU values.xlsx',
                   skiprows=6)
sheets_1 = ["PM10 ", "PM2.5", "O3 ", "NO2 ", "BaP", "SO2"]
resultM1 = 0
for sheet in sheets_1:
    print(sheet[0:5])
    for row in df.iterrows():
        if row[1]['Country'] == 'Malta':
            resultM1 += row[1]['AirPollutionLevel']
print(resultM1)
I would like the output to look something like this:
PM10 142
PM2.5 53
O3 21
NO2 3
BaP 21
SO2 32
but what I'm getting is just the sheet names printed one after another, followed by the total of the specific value I need summed across all sheets, i.e.:
PM10
PM2.5
O3
NO2
BaP
SO2
284.913786
I really need the values separated by their respective sheet, not added together.
Attached is a screenshot of the Excel file. As you can see, there are different sheets with many values in each; I need to add up the values for a specific country in each sheet.
Any help would be greatly appreciated!
import pandas as pd

# Open as an ExcelFile object
xls = pd.ExcelFile('C:/Users/julia/OneDrive/Documents/python assignment/2016 data -EU values.xlsx')

# Get sheet names
sheets_1 = xls.sheet_names

# Dictionary of {sheet_name: DataFrame}
sheet_to_df_map = {}
for sheet_name in xls.sheet_names:
    sheet_to_df_map[sheet_name] = xls.parse(sheet_name)

# create a list to store the results
resultM1 = []

# Loop over the keys and DataFrames in the dictionary
for key, df in sheet_to_df_map.items():
    # remove the top 5 blank rows
    df = df.iloc[5:]
    # set the column names from the first remaining row
    headers = df.iloc[0]
    df = pd.DataFrame(df.values[1:], columns=headers)
    # select the Malta results from this sheet
    results = df.loc[df['Country'] == "Malta", 'AirPollutionLevel']
    # append the sheet name followed by each value to a flat list
    for i in results:
        resultM1.append(key)
        resultM1.append(i)

# Convert the flat list to a DataFrame
df = pd.DataFrame(resultM1)
# Rename the column
df = df.rename({0: 'Sheet'}, axis=1)
# split the alternating sheet/value entries into two columns
final = pd.DataFrame({'Sheet': df['Sheet'].iloc[::2].values,
                      'Value': df['Sheet'].iloc[1::2].values})
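A shorter alternative sketch (an assumption, not part of the original answer) that sums Malta's AirPollutionLevel per sheet, assuming every sheet has the same layout so that skiprows=6, as in the question's own read_excel call, leaves 'Country' and 'AirPollutionLevel' as column headers:

import pandas as pd

path = 'C:/Users/julia/OneDrive/Documents/python assignment/2016 data -EU values.xlsx'

# sheet_name=None reads every sheet into a {sheet_name: DataFrame} dictionary
all_sheets = pd.read_excel(path, sheet_name=None, skiprows=6)

totals = {}
for name, df in all_sheets.items():
    # sum Malta's pollution values for this sheet only
    totals[name] = df.loc[df['Country'] == 'Malta', 'AirPollutionLevel'].sum()

for name, total in totals.items():
    print(name, total)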

create new column in dataframe conditionally

Updated question:
Using the code below I can only access the DataFrame after the for loop completes, but I want to use the most recently created column of the DataFrame at intermediate times, i.e. every 5 minutes, whichever column is the last one in the DataFrame. How can I achieve this?
@app.route("/sortbymax")
def sortbymax():
    df = updated_data()
    #### here I want to use the most recently created column
    df = create_links(df)
    df = df.sort_values(by=['perc_change'], ascending=False)
    return render_template('sortbymax.html', tables=[df.to_html(escape=False)], titles=df.columns.values)

def read_data():
    filename = r'c:\Users\91956\Desktop\bk.xlsm'
    df = pd.read_excel(filename)
    return df

def updated_data():
    df = read_data()
    for i in range(288):
        temp = read_data()
        x = datetime.datetime.now().strftime("%H:%M:%S")
        df['perc_change_' + x] = temp['perc_change']
        time.sleep(300)
    return df
I see you have an .xlsm file, which is a macro-enabled Excel workbook. You can read it, but if you write it back with Python you will most probably lose the macro part of the workbook.
For the Python part:
this will copy the perc_change column every 5 minutes, with the respective name. However, bear in mind that this will only work for one day (after that it will start replacing existing columns). If you want it to work over longer periods, let me know and I will add the day-month-year (whatever you want) to the column names.
import datetime
import time
import pandas as pd

def read_data():
    filename = r'c:\Users\91956\Desktop\bk.xlsm'
    df = pd.read_excel(filename)
    return df

def write_data(df):
    filename = r'c:\Users\91956\Desktop\bk.xlsm'
    df.to_excel(filename)

df = read_data()  # read the Excel file for the first time
for i in range(288):  # this will run for exactly one day
    temp = read_data()
    x = datetime.datetime.now().strftime("%H:%M")
    df['perc_change_' + x] = temp['perc_change']
    time.sleep(300)
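On the asker's actual question of using the most recently created column at intermediate times: since each snapshot column is appended last, one minimal sketch (an assumption about intent, not part of the original answer) is to pick it off by position:

# a sketch, assuming df already holds one or more 'perc_change_...' snapshot columns
latest_col = df.columns[-1]       # the most recently appended column label
latest_values = df[latest_col]    # its values

# e.g. sort by the latest snapshot instead of the original 'perc_change'
df_sorted = df.sort_values(by=latest_col, ascending=False)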

How to obtain the mean of selected columns from multiple sheets within the same Excel file

I am working with a large Excel file that has 22 sheets, where each sheet has the same column headings but not the same number of rows. I would like to obtain the mean values (excluding zeros) of columns AA to AX for all 22 sheets. The columns have titles, which I use in my code.
Rather than reading each sheet separately, I want to loop through the sheets and get the mean values as output.
With help from answers to other posts, I have this:
import pandas as pd

xls = pd.ExcelFile('myexcelfile.xlsx')
xls.sheet_names
#print(xls.sheet_names)

out_df = pd.DataFrame()
for sheets in xls.sheet_names:
    df = pd.read_excel('myexcelfile.xlsx', sheet_names=None)
    df1 = df[df[:] != 0]
    df2 = df1.loc[:, 'aa':'ax'].mean()
    out_df.append(df2)  ## This will append rows of one dataframe to another (just like your expected output)
    print(out_df2)
    ## out_df will have data from all the sheets
The code works so far, but only for one of the sheets. How do I get it to work for all 22 sheets?
You can use numpy to perform basic math on a pandas.DataFrame or pandas.Series.
Take a look at my code below:
import pandas as pd
import numpy as np

XL_PATH = r'C:\Users\YourName\PythonProject\Book1.xlsx'

xlFile = pd.ExcelFile(XL_PATH)
xlSheetNames = xlFile.sheet_names

dfList = []  # list to store all the DataFrames
for shName in xlSheetNames:
    df = pd.read_excel(XL_PATH, sheet_name=shName)  # read sheet X as a DataFrame
    dfList.append(df)  # put the DataFrame into the list

for df in dfList:
    print(df)
    dfAverage = np.average(df)  # use numpy to get the DataFrame average
    print(dfAverage)
# Try the code below
import os
import pandas as pd
import numpy as np

XL_PATH = "YOUR EXCEL FULL PATH"
SH_NAMES = []   # will contain the list of Excel sheet names
DF_DICT = {}    # will contain a dictionary of DataFrames

def readExcel():
    if not os.path.isfile(XL_PATH):
        raise FileNotFoundError(XL_PATH)
    SH_NAMES = pd.ExcelFile(XL_PATH).sheet_names
    # pandas.read_excel() has a 'sheet_name' argument;
    # when you pass a list to 'sheet_name', pandas returns a
    # dictionary of DataFrames with the sheet names as keys
    DF_DICT = pd.read_excel(XL_PATH, sheet_name=SH_NAMES)
    return SH_NAMES, DF_DICT

# Now you have DF_DICT, which contains a DataFrame for each sheet in the Excel file
SH_NAMES, DF_DICT = readExcel()

# The next step is to append all rows of data from Sheet1 to SheetX.
# This only works if all the DataFrames have the same columns.
def appendAllSheets():
    dfAp = pd.DataFrame()
    for sheet_name in DF_DICT:
        df = DF_DICT[sheet_name]
        dfAp = pd.concat([dfAp, df])  # DataFrame.append is deprecated; concat does the same job here
    return dfAp

# you can now call the function as below:
dfWithAllData = appendAllSheets()

# now you have one DataFrame with all rows combined from Sheet1 to SheetX
# you can clean up the data, for example drop all rows that contain 0 or NaN in a column
dfZero_Removed = dfWithAllData[dfWithAllData['Column_Name'] != 0]
dfNA_Removed = dfWithAllData[dfWithAllData['Column_Name'].notna()]

# last step: to find the average or another math operation, just let numpy do the job
average_of_all_1 = np.average(dfZero_Removed)
average_of_all_2 = np.average(dfNA_Removed)

# show result: this prints the average over all rows of data
# from Sheet1 to SheetX in your target Excel file
print(average_of_all_1, average_of_all_2)
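For the original question specifically (per-sheet means of columns 'aa' to 'ax', excluding zeros), here is a minimal sketch. It assumes those column labels exist in every sheet, as in the question's own df1.loc[:, 'aa':'ax'] slice:

import pandas as pd

# sheet_name=None reads all 22 sheets into a {sheet_name: DataFrame} dictionary
all_sheets = pd.read_excel('myexcelfile.xlsx', sheet_name=None)

rows = {}
for name, df in all_sheets.items():
    cols = df.loc[:, 'aa':'ax']
    # turn zeros into NaN so mean() skips them
    rows[name] = cols.where(cols != 0).mean()

# one row of column means per sheet
out_df = pd.DataFrame.from_dict(rows, orient='index')
print(out_df)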
