Pandas - Loop through sheets - python

I have 5 sheets and created a script that applies a number of formatting steps. I tested it sheet by sheet, and it works perfectly.
import numpy as np
import pandas as pd
FileLoc = r'C:\T.xlsx'
Sheets = ['Alex','Elvin','Gerwin','Jeff','Joshua',]
df = pd.read_excel(FileLoc, sheet_name= 'Alex', skiprows=6)
df = df[df['ENDING'] != 0]
df = df.head(30).T
df = df[~df.index.isin(['Unnamed: 2','Unnamed: 3','Unnamed: 4','ENDING' ,3])]
df.index.rename('STORE', inplace=True)
df['index'] = df.index
df2 = df.melt(id_vars=['index', 2 ,0, 1] ,value_name='SKU' )
df2 = df2[df2['variable']!= 3]
df2['SKU2'] = np.where(df2['SKU'].astype(str).fillna('0').str.contains('ALF|NOB|MET'),df2.SKU, None)
df2['SKU2'] = df2['SKU2'].ffill()
df2 = df2[~df2[0].isnull()]
df2 = df2[df2['SKU'] != 0]
df2[1] = pd.to_datetime(df2[1]).dt.date
df2.to_excel(r'C:\test.xlsx', index=False)
but when I tried to pass the list via sheet_name=Sheets, it always produced KeyError: 'ENDING'. This is the part of the code:
Sheets = ['Alex','Elvin','Gerwin','Jeff','Joshua',]
df = pd.read_excel(FileLoc,sheet_name='Sheets',skiprows=6)
Is there a proper way to do this, like looping?
My expected result is to execute the formatting that I have created and consolidate it into one excel file.
NOTE: All sheets have the same format.

If you give the read_excel method the parameter sheet_name=None, it returns a dict (an OrderedDict in older pandas versions) with the sheet names as keys and the corresponding DataFrames as values. You can then loop through the dictionary using .items().
The code would look something like this,
dfs = pd.read_excel('your-excel.xlsx', sheet_name=None)
for key, value in dfs.items():
    # apply logic to value
If you wish to combine the data in the sheets, collect the frames in a list and concatenate them once at the end (DataFrame.append was removed in pandas 2.0; pd.concat is the supported way). After the logic has been applied to the data in each sheet, that would look something like this,
frames = []
dfs = pd.read_excel('your-excel.xlsx', sheet_name=None)
for key, value in dfs.items():
    # apply logic to value, which is a DataFrame
    frames.append(value)
combined_df = pd.concat(frames, ignore_index=True)
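Applied to the original question, a minimal sketch of the pattern; the per-sheet formatting is reduced to one hypothetical step, and the dict is simulated in-memory (in the real script it would come from pd.read_excel(FileLoc, sheet_name=None, skiprows=6)):

```python
import pandas as pd

def format_sheet(df):
    # stand-in for the full per-sheet formatting shown above
    return df[df['ENDING'] != 0]

# pd.read_excel(path, sheet_name=None, skiprows=6) returns a dict like this,
# keyed by sheet name; simulated here so the sketch runs without a workbook
dfs = {
    'Alex':  pd.DataFrame({'ENDING': [0, 5], 'SKU': ['ALF1', 'NOB2']}),
    'Elvin': pd.DataFrame({'ENDING': [7, 0], 'SKU': ['MET3', 'ALF4']}),
}
combined = pd.concat(
    (format_sheet(df) for df in dfs.values()), ignore_index=True
)
# combined now holds the formatted rows from every sheet, ready for to_excel()
```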

Related

Combine excel files

Can someone help me get the output in an Excel-readable format? I am getting the output as a DataFrame, but the data is embedded as a string in rows 2 and 3.
import pandas as pd
import os
input_path = 'C:/Users/Admin/Downloads/Test/'
output_path = 'C:/Users/Admin/Downloads/Test/'
excel_file_list = os.listdir(input_path)
df = pd.DataFrame()
for file in excel_file_list:
    if file.endswith('.xlsx'):
        df1 = pd.read_excel(input_path + file, sheet_name=None)
        df = df.append(df1, ignore_index=True)
writer = pd.ExcelWriter('combined.xlsx', engine='xlsxwriter')
for sheet_name in df.keys():
    df[sheet_name].to_excel(writer, sheet_name=sheet_name, index=False)
writer.save()
Your issue may be in using sheet_name=None. If any of the files have multiple sheets, a dictionary will be returned by pd.read_excel() with {'sheet_name':dataframe} format.
To .append() with this, you can try something like this, using python's Dictionary.items() method:
def combotime(dfinput):
    # dfinput is the {'sheet_name': DataFrame} dict from read_excel
    df1 = pd.DataFrame()
    for k, v in dfinput.items():
        df1 = pd.concat([df1, v])  # DataFrame.append was removed in pandas 2.0
    return df1
EDIT: If you mean to keep the sheets separate, as implied by your writer loop, do not use a pd.DataFrame() object like your df to collect the dictionary items. Instead, add them to a dictionary:
sheets = {}
sheets.update(df1)  # df1 is your read_excel dictionary; update() mutates in place and returns None
for sheet in sheets.keys():
    sheets[sheet].to_excel(writer, sheet_name=sheet, index=False)
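Extending this to several workbooks while keeping same-named sheets aligned, a hedged sketch (hypothetical in-memory data standing in for the dictionaries that pd.read_excel(file, sheet_name=None) would return for each file):

```python
import pandas as pd

# simulated read_excel(..., sheet_name=None) results for two hypothetical workbooks
file_a = {'Jan': pd.DataFrame({'x': [1]}), 'Feb': pd.DataFrame({'x': [2]})}
file_b = {'Feb': pd.DataFrame({'x': [3]}), 'Mar': pd.DataFrame({'x': [4]})}

sheets = {}
for d in (file_a, file_b):
    for name, frame in d.items():
        # stack same-named sheets from different files; new names start fresh
        sheets[name] = pd.concat([sheets[name], frame]) if name in sheets else frame

# each entry can then be written out with to_excel(writer, sheet_name=name)
```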

Iterate and Concat multiple Dataframe pandas DF python

I have the below code for a pandas operation that parses a JSON payload, picks certain columns, and concatenates them along axis 1:
df_columns_raw_1 = df_tables_normalized['columns'][1]
df_columns_normalize_1 = pd.json_normalize(df_columns_raw_1)
df_colName_1 = df_columns_normalize_1['columnName']
df_table_1 = df_columns_normalize_1['tableName']
df_colLen_1 = df_columns_normalize_1['columnLength']
df_colDataType_1 = df_columns_normalize_1['columnDatatype']
result_1 = pd.concat([df_table_1, df_colName_1,df_colLen_1,df_colDataType_1], axis=1)
bigdata = pd.concat([result_1, result_2....result_500], ignore_index=True, sort=False)
I need to iterate and automate the above code so it concatenates everything up to result_500 into the bigdata variable, instead of writing it out manually for all the DataFrames.
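One way is to build the 500 intermediate results in a list comprehension rather than naming each one; a sketch with hypothetical data standing in for df_tables_normalized['columns']:

```python
import pandas as pd

# hypothetical stand-in for df_tables_normalized['columns']: each entry is a
# list of column-metadata records, as produced by the original JSON
columns_lists = [
    [{'tableName': 'T1', 'columnName': 'a', 'columnLength': 5,  'columnDatatype': 'int'}],
    [{'tableName': 'T2', 'columnName': 'b', 'columnLength': 10, 'columnDatatype': 'str'}],
]

wanted = ['tableName', 'columnName', 'columnLength', 'columnDatatype']
# one normalize-and-select per entry replaces result_1 ... result_500
results = [pd.json_normalize(raw)[wanted] for raw in columns_lists]
bigdata = pd.concat(results, ignore_index=True, sort=False)
```

In the real code, `columns_lists` would be `df_tables_normalized['columns']` itself, so the loop scales to 500 entries without any manual naming.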

How to iterate with For loops using Excel and Pandas

I am working on combining two Excel files that have the same columns but different values. I would like to convert all numbers into currency form ($ and commas). I've been able to do this but would like to find a simpler way to write the code.
Also, I need help with the output file: I cannot open it unless I close Python. It says "Cannot access this file" and is always syncing. Does anyone know any solutions?
Here is my code
import pandas as pd
import openpyxl
import xlsxwriter
outputfile = "Outputfile.xlsx"
excel_files = ["File1.xlsx",
"File2.xlsx"]
def combine_excel(excel_files, sheet_name):
    sheet_frames = [pd.read_excel(x, sheet_name=sheet_name) for x in excel_files]
    combined_df = pd.concat(sheet_frames).reset_index(drop=True)
    return combined_df
df1 = combine_excel(excel_files, 0)
df2 = combine_excel(excel_files, 1)
df3 = combine_excel(excel_files, 2)
df4 = combine_excel(excel_files, 3)
df5 = combine_excel(excel_files, 4)
df6 = combine_excel(excel_files, 5)
df7 = combine_excel(excel_files, 6)
for x in df1.iloc[:, [10,11,12,13,14,15,16,17,18,19,20,26,27,28,29,30]]:
    df1[x] = df1[x].apply(lambda x: f"${x:,.0f}")
for x in df2.iloc[:, [10,11,12,13,14,15,16,17,18,19,20,26,27,28,29,30]]:
    df2[x] = df2[x].apply(lambda x: f"${x:,.0f}")
...
writer = pd.ExcelWriter(outputfile, engine='xlsxwriter')
df1.to_excel(writer, sheet_name ='Column1', index = False)
df2.to_excel(writer, sheet_name='Column2', index = False)
df3.to_excel(writer, sheet_name='Column3', index = False)
df4.to_excel(writer, sheet_name='Column4', index = False)
df5.to_excel(writer, sheet_name='Column5', index = False)
df6.to_excel(writer, sheet_name='Column6', index = False)
df7.to_excel(writer, sheet_name ='Column7', index = False)
writer.save()
As you can see I would like to make this part more simple to read and write:
for x in df1.iloc[:, [10,11,12,13,14,15,16,17,18,19,20,26,27,28,29,30]]:
    df1[x] = df1[x].apply(lambda x: f"${x:,.0f}")
for x in df2.iloc[:, [10,11,12,13,14,15,16,17,18,19,20,26,27,28,29,30]]:
    df2[x] = df2[x].apply(lambda x: f"${x:,.0f}")
...
There is a total 12 lines of code just to convert a number of columns into currency form. Is there a way to do this with 2 lines of code? Also the reason there are multiple df(s) is because I am combining 6 sheets within each Excel file.
I can't test this, but this simplification by refactoring should work:
# instead of df1 = ..., df2 = ..., etc., store them in a list
combined_frames = [combine_excel(excel_files, i) for i in range(7)]
# instead of explicitly enumerating all column indices, build the list once;
# instead of applying to each column individually, use applymap to apply to
# all cells at once (note the original skips columns 21-25, so a plain
# 10:31 slice would format too many columns)
cols = list(range(10, 21)) + list(range(26, 31))
for i, df in enumerate(combined_frames):
    combined_frames[i].iloc[:, cols] = df.iloc[:, cols].applymap(lambda x: f"${x:,.0f}")
writer = pd.ExcelWriter(outputfile, engine='xlsxwriter')
# instead of exporting each individual df, export them in a loop,
# dynamically setting the sheet_name
for i, df in enumerate(combined_frames, start=1):
    df.to_excel(writer, sheet_name=f'Column{i}', index=False)

How to obtain the mean of selected columns from multiple sheets within same Excel File

I am working with a large Excel file having 22 sheets, where each sheet has the same column headings but not the same number of rows. I would like to obtain the mean values (excluding zeros) of columns AA to AX for all 22 sheets. The columns have titles, which I use in my code.
Rather than reading each sheet, I want to loop through the sheets and get as output the mean values.
With help from answers to other posts, I have this:
import pandas as pd
xls = pd.ExcelFile('myexcelfile.xlsx')
xls.sheet_names
#print(xls.sheet_names)
out_df = pd.DataFrame()
for sheets in xls.sheet_names:
    df = pd.read_excel('myexcelfile.xlsx', sheet_names=None)
    df1 = df[df[:] != 0]
    df2 = df1.loc[:, 'aa':'ax'].mean()
    out_df.append(df2)  ## This will append rows of one dataframe to another (just like your expected output)
    print(out_df2)
## out_df will have data from all the sheets
The code runs so far, but only for one of the sheets. How do I get it to work for all 22 sheets?
You can use numpy to perform basic math on pandas.DataFrame or pandas.Series
Take a look at my code below:
import pandas as pd, numpy as np
XL_PATH = r'C:\Users\YourName\PythonProject\Book1.xlsx'
xlFile = pd.ExcelFile(XL_PATH)
xlSheetNames = xlFile.sheet_names
dfList = []  # list to store all the DataFrames
for shName in xlSheetNames:
    df = pd.read_excel(XL_PATH, sheet_name=shName)  # read sheet X as a DataFrame
    dfList.append(df)  # put the DataFrame into the list
for df in dfList:
    print(df)
    dfAverage = np.average(df)  # use numpy to get the DataFrame average
    print(dfAverage)
#Try the code below
import pandas as pd, numpy as np, os

XL_PATH = "YOUR EXCEL FULL PATH"
SH_NAMES = []  # will contain the list of Excel sheet names
DF_DICT = {}   # will contain a dictionary of DataFrames

def readExcel():
    if not os.path.isfile(XL_PATH):
        raise FileNotFoundError(XL_PATH)
    SH_NAMES = pd.ExcelFile(XL_PATH).sheet_names
    # pandas.read_excel() has a 'sheet_name' argument;
    # when you pass it a list, pandas returns a dictionary
    # of DataFrames with the sheet names as keys
    DF_DICT = pd.read_excel(XL_PATH, sheet_name=SH_NAMES)
    return SH_NAMES, DF_DICT
#Now you have DF_DICT, which contains a DataFrame for each sheet in the workbook.
#The next step is to append all rows of data from Sheet1 to SheetX.
#This only works if all the DataFrames share the same columns.
def appendAllSheets():
    dfAp = pd.DataFrame()
    for key in DF_DICT:
        df = DF_DICT[key]
        dfAp = pd.concat([dfAp, df])  # DataFrame.append was removed in pandas 2.0
    return dfAp
#you can now call the function as below:
dfWithAllData = appendAllSheets()
#now you have one DataFrame with all rows combined from Sheet1 to SheetX
#you can clean the data, for example drop all rows that contain 0 or NaN
dfZero_Removed = dfWithAllData[dfWithAllData['Column_Name'] != 0]
dfNA_removed = dfWithAllData[~pd.isna(dfWithAllData['Column_Name'])]
#last step: to find the average or do other math operations,
#just let numpy do the job
average_of_all_1 = np.average(dfZero_Removed)
average_of_all_2 = np.average(dfNA_removed)
#show the result: the average over all rows of data
#from Sheet1 to SheetX in your target Excel file
print(average_of_all_1, average_of_all_2)
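Putting it together for the original question (per-sheet means of columns 'aa' to 'ax', excluding zeros), a sketch using a simulated sheet_name=None dictionary with two hypothetical columns in place of the full aa:ax range:

```python
import numpy as np
import pandas as pd

# pd.read_excel('myexcelfile.xlsx', sheet_name=None) returns a dict like this
dfs = {
    'Sheet1': pd.DataFrame({'aa': [0, 2, 4], 'ab': [1, 0, 3]}),
    'Sheet2': pd.DataFrame({'aa': [6, 0],    'ab': [0, 5]}),
}

rows = {}
for name, df in dfs.items():
    cols = df.loc[:, 'aa':'ab']                  # 'aa':'ax' in the real file
    rows[name] = cols.replace(0, np.nan).mean()  # zeros excluded from the mean
out_df = pd.DataFrame(rows).T  # one row of column means per sheet
```

Turning zeros into NaN before calling .mean() is what implements "excluding zeros", since pandas skips NaN by default.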

Returning in a DataFrame - Python

Good morning.
I have a question regarding Python. I have an if/else conditional; the else branch reads more than one file, and I need to save all the information it reads into a DataFrame. Is there a way to do this?
The code I am using:
for idx, folder in enumerate(fileLista):
    if folder == 'filename_for_treatment':
        df1 = pd.read_excel(folder, sheet_name=sheetName[idx], skiprows=1)
        df1.columns = df1.columns.str.strip()
        tratativaUm = df1[[column information to be used]]
    else:
        df2 = pd.read_excel(folder, sheet_name=sheetName[idx], skiprows=1)
        df2.columns = df2.columns.str.strip()
        tratativaDois = df2[[column information to be used]]
#### assign the result of each file received in the else
frames = [tratativaUm, tratativaDois]
titEmpresa = pd.concat(frames)
Can someone help me? Is it possible to do this? Thanks.
You can do it by appending your DataFrames to a list, for example:
list_df_tratativaDois = []
for idx, folder in enumerate(fileLista):
    df = pd.read_excel(folder, sheet_name=sheetName[idx], skiprows=1)
    df.columns = df.columns.str.strip()
    if folder == 'filename_for_treatment':
        tratativaUm = df[[column information to be used]]
    else:
        list_df_tratativaDois.append(df[[column information to be used]])
titEmpresa = pd.concat([tratativaUm] + list_df_tratativaDois)
Note that instead of df1 and df2 you can create a single df, since the read_excel call is the same in both branches, and then perform a different action on df depending on whether folder is the right one.
