Can someone help me get the output in an Excel-readable format? I am getting the output as a DataFrame, but the data is embedded as a string in rows 2 and 3.
import pandas as pd
import os
input_path = 'C:/Users/Admin/Downloads/Test/'
output_path = 'C:/Users/Admin/Downloads/Test/'
excel_file_list = os.listdir(input_path)
df = pd.DataFrame()
for file in excel_file_list:
    if file.endswith('.xlsx'):
        df1 = pd.read_excel(input_path + file, sheet_name=None)
        df = df.append(df1, ignore_index=True)
writer = pd.ExcelWriter('combined.xlsx', engine='xlsxwriter')
for sheet_name in df.keys():
    df[sheet_name].to_excel(writer, sheet_name=sheet_name, index=False)
writer.save()
Your issue may be in using sheet_name=None. If any of the files has multiple sheets, pd.read_excel() returns a dictionary in {'sheet_name': dataframe} format rather than a single DataFrame.
To .append() with this, you can try something like the following, using Python's dictionary .items() method:
def combotime(dfinput):
    df1 = pd.DataFrame()
    for k, v in dfinput.items():
        df1 = df1.append(v)  # v is the DataFrame for sheet k
    return df1
EDIT: If you mean to keep the sheets separate, as implied by your writer loop, do not use a pd.DataFrame() object like your df to collect the dictionary items. Instead, add them to an existing dictionary:
sheets = {}
sheets.update(df1)  # df1 is your read_excel dictionary; update() mutates in place and returns None
for sheet in sheets.keys():
    sheets[sheet].to_excel(writer, sheet_name=sheet, index=False)
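Putting the pieces together, here is a minimal sketch of the whole idea with in-memory toy data standing in for the two workbooks (in real code, each `book` dict would come from `pd.read_excel(file, sheet_name=None)`; the sheet names and values below are made up for illustration):

```python
import pandas as pd

# Simulated read_excel(..., sheet_name=None) results for two workbooks:
# each is a {sheet_name: DataFrame} dictionary.
book1 = {'Sales': pd.DataFrame({'a': [1, 2]}), 'Costs': pd.DataFrame({'a': [3]})}
book2 = {'Sales': pd.DataFrame({'a': [4]}), 'Costs': pd.DataFrame({'a': [5, 6]})}

# Collect the per-sheet frames from every workbook, keyed by sheet name.
sheets = {}
for book in (book1, book2):
    for name, frame in book.items():
        sheets.setdefault(name, []).append(frame)

# Concatenate same-named sheets across workbooks; each entry is now one
# DataFrame, ready to be written to its own tab with to_excel.
combined = {name: pd.concat(frames, ignore_index=True)
            for name, frames in sheets.items()}
print(combined['Sales']['a'].tolist())  # → [1, 2, 4]
```

Writing each `combined[name]` to its own sheet then works exactly as in the loop above.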
I have the below code for a pandas operation that parses a JSON, picks certain columns, and concatenates them along axis 1:
df_columns_raw_1 = df_tables_normalized['columns'][1]
df_columns_normalize_1 = pd.json_normalize(df_columns_raw_1)
df_colName_1 = df_columns_normalize_1['columnName']
df_table_1 = df_columns_normalize_1['tableName']
df_colLen_1 = df_columns_normalize_1['columnLength']
df_colDataType_1 = df_columns_normalize_1['columnDatatype']
result_1 = pd.concat([df_table_1, df_colName_1,df_colLen_1,df_colDataType_1], axis=1)
bigdata = pd.concat([result_1, result_2....result_500], ignore_index=True, sort=False)
I need to iterate and automate the above code to concatenate everything up to the result_500 DataFrame into the bigdata variable, instead of writing it out manually for all the DataFrames.
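The repeated blocks can be collapsed into a list comprehension over the entries of `df_tables_normalized['columns']` followed by a single concat. A sketch with made-up records in place of the real `df_tables_normalized['columns']` data (the field names match those used above):

```python
import pandas as pd

# Hypothetical stand-in for df_tables_normalized['columns']: each entry is a
# list of records, which is what pd.json_normalize expects.
columns_raw = [
    [{'columnName': 'id', 'tableName': 't1', 'columnLength': 10, 'columnDatatype': 'int'}],
    [{'columnName': 'name', 'tableName': 't2', 'columnLength': 50, 'columnDatatype': 'varchar'}],
]

wanted = ['tableName', 'columnName', 'columnLength', 'columnDatatype']

# One result frame per entry, then a single concat instead of 500 manual ones.
# In real code, iterate over df_tables_normalized['columns'] instead.
results = [pd.json_normalize(raw)[wanted] for raw in columns_raw]
bigdata = pd.concat(results, ignore_index=True, sort=False)
print(bigdata.shape)  # → (2, 4)
```

Selecting `[wanted]` after `json_normalize` picks and orders the four columns in one step, replacing the four per-column variables.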
I am working on combining two Excel files that have the same columns but different values. I would like to convert all numbers into currency form ($ and commas). I've been able to do this, but would like to find a simpler way to write the code.
Also, I need help with the output file: I cannot open it unless I close Python. It says "Cannot access this file" and is always syncing. Does anyone know a solution?
Here is my code:
import pandas as pd
import openpyxl
import xlsxwriter
outputfile = "Outputfile.xlsx"
excel_files = ["File1.xlsx",
               "File2.xlsx"]
def combine_excel(excel_files, sheet_name):
    sheet_frames = [pd.read_excel(x, sheet_name=sheet_name) for x in excel_files]
    combined_df = pd.concat(sheet_frames).reset_index(drop=True)
    return combined_df
df1 = combine_excel(excel_files, 0)
df2 = combine_excel(excel_files, 1)
df3 = combine_excel(excel_files, 2)
df4 = combine_excel(excel_files, 3)
df5 = combine_excel(excel_files, 4)
df6 = combine_excel(excel_files, 5)
df7 = combine_excel(excel_files, 6)
for x in df1.iloc[:, [10,11,12,13,14,15,16,17,18,19,20,26,27,28,29,30]]:
    df1[x] = df1[x].apply(lambda x: f"${x:,.0f}")
for x in df2.iloc[:, [10,11,12,13,14,15,16,17,18,19,20,26,27,28,29,30]]:
    df2[x] = df2[x].apply(lambda x: f"${x:,.0f}")
# ... repeated for df3 through df7
writer = pd.ExcelWriter(outputfile, engine='xlsxwriter')
df1.to_excel(writer, sheet_name='Column1', index=False)
df2.to_excel(writer, sheet_name='Column2', index=False)
df3.to_excel(writer, sheet_name='Column3', index=False)
df4.to_excel(writer, sheet_name='Column4', index=False)
df5.to_excel(writer, sheet_name='Column5', index=False)
df6.to_excel(writer, sheet_name='Column6', index=False)
df7.to_excel(writer, sheet_name='Column7', index=False)
writer.save()
As you can see I would like to make this part more simple to read and write:
for x in df1.iloc[:, [10,11,12,13,14,15,16,17,18,19,20,26,27,28,29,30]]:
    df1[x] = df1[x].apply(lambda x: f"${x:,.0f}")
# ... repeated for df2 through df7
There are a total of 12 lines of code just to convert a number of columns into currency form. Is there a way to do this with 2 lines of code? Also, the reason there are multiple df(s) is that I am combining the sheets within each Excel file.
I can't test this, but this simplification by refactoring should work:
# instead of df1 = ..., df2 = ..., etc., store them in a list
combined_frames = [combine_excel(excel_files, i) for i in range(7)]

# instead of explicitly enumerating every column index, build the list from
# two ranges (your indices skip columns 21-25); instead of applying to each
# column individually, use applymap to apply to all cells at once
money_cols = [*range(10, 21), *range(26, 31)]
for i, df in enumerate(combined_frames):
    combined_frames[i].iloc[:, money_cols] = df.iloc[:, money_cols].applymap(lambda x: f"${x:,.0f}")
writer = pd.ExcelWriter(outputfile, engine='xlsxwriter')
# instead of exporting each individual df, export them in a loop,
# dynamically setting the sheet_name
for i, df in enumerate(combined_frames, start=1):
    df.to_excel(writer, sheet_name=f'Column{i}', index=False)
writer.save()  # don't forget to save, or nothing is written to disk
I am working with a large Excel file that has 22 sheets, where each sheet has the same column headings but not the same number of rows. I would like to obtain the mean values (excluding zeros) of columns AA to AX for all 22 sheets. The columns have titles, which I use in my code.
Rather than reading each sheet by hand, I want to loop through the sheets and get the mean values as output.
With help from answers to other posts, I have this:
import pandas as pd

xls = pd.ExcelFile('myexcelfile.xlsx')
xls.sheet_names
#print(xls.sheet_names)
out_df = pd.DataFrame()
for sheets in xls.sheet_names:
    df = pd.read_excel('myexcelfile.xlsx', sheet_names=None)
    df1 = df[df[:] != 0]
    df2 = df1.loc[:, 'aa':'ax'].mean()
    out_df.append(df2)  # this will append rows of one dataframe to another (just like your expected output)
print(out_df2)
## out_df will have data from all the sheets
The code works so far, but only for one of the sheets. How do I get it to work for all 22 sheets?
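Two details stand out in the loop above: the argument is misspelled (`sheet_names` instead of `sheet_name`, so the loop variable is never used), and `out_df.append(df2)` is not assigned back. A minimal sketch of the per-sheet mean logic, with a toy two-sheet dict standing in for `pd.read_excel('myexcelfile.xlsx', sheet_name=None)` and columns 'aa'/'ab' standing in for the real 'aa':'ax' range:

```python
import pandas as pd

# Simulated pd.read_excel(..., sheet_name=None): {sheet_name: DataFrame}.
sheets = {
    'Sheet1': pd.DataFrame({'aa': [0, 2, 4], 'ab': [1, 0, 3]}),
    'Sheet2': pd.DataFrame({'aa': [5, 0], 'ab': [0, 7]}),
}

rows = {}
for name, df in sheets.items():
    masked = df.mask(df == 0)            # zeros become NaN, so mean() skips them
    rows[name] = masked.loc[:, 'aa':'ab'].mean()

out_df = pd.DataFrame(rows).T            # one row of column means per sheet
print(out_df.loc['Sheet1', 'aa'])        # → 3.0 (mean of 2 and 4, zero excluded)
```

With the real file, replace the `sheets` dict with `pd.read_excel('myexcelfile.xlsx', sheet_name=None)` and the column slice with `'aa':'ax'`.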
You can use numpy to perform basic math on a pandas.DataFrame or pandas.Series. Take a look at my code below:
import pandas as pd, numpy as np

XL_PATH = r'C:\Users\YourName\PythonProject\Book1.xlsx'
xlFile = pd.ExcelFile(XL_PATH)
xlSheetNames = xlFile.sheet_names
dfList = []  # list to store every DataFrame
for shName in xlSheetNames:
    df = pd.read_excel(XL_PATH, sheet_name=shName)  # read sheet X as DataFrame
    dfList.append(df)                               # put DataFrame into the list
for df in dfList:
    print(df)
    dfAverage = np.average(df)  # use numpy to get the DataFrame average
    print(dfAverage)
# Try the code below
import pandas as pd, numpy as np, os

XL_PATH = "YOUR EXCEL FULL PATH"
SH_NAMES = []  # will contain the list of Excel sheet names
DF_DICT = {}   # will contain a dictionary of DataFrames

def readExcel():
    if not os.path.isfile(XL_PATH):
        raise FileNotFoundError(XL_PATH)
    sh_names = pd.ExcelFile(XL_PATH).sheet_names
    # pandas.read_excel() has a 'sheet_name' argument;
    # when you pass a list to 'sheet_name',
    # pandas returns a dictionary of DataFrames with the sheet names as keys
    df_dict = pd.read_excel(XL_PATH, sheet_name=sh_names)
    return sh_names, df_dict

SH_NAMES, DF_DICT = readExcel()

# Now you have DF_DICT, which contains a DataFrame for each sheet in the Excel file.
# The next step is to append all rows of data from Sheet1 to SheetX.
# This only works if all the DataFrames share the same columns.
def appendAllSheets():
    dfAp = pd.DataFrame()
    for key in DF_DICT:
        dfAp = dfAp.append(DF_DICT[key])
    return dfAp

# You can now call the function as below:
dfWithAllData = appendAllSheets()

# Now you have one DataFrame with all rows combined from Sheet1 to SheetX.
# You can clean the data, for example dropping all rows that contain 0 or NaN:
dfZero_Removed = dfWithAllData[dfWithAllData['Column_Name'] != 0]
dfNA_Removed = dfWithAllData[~pd.isna(dfWithAllData['Column_Name'])]

# Last step: to find the average (or run another math operation),
# just let numpy do the job
average_of_all_1 = np.average(dfZero_Removed)
average_of_all_2 = np.average(dfNA_Removed)

# Show the result: the average of all rows of data
# from Sheet1 to SheetX in your target Excel file
print(average_of_all_1, average_of_all_2)
Good morning.
I have a question regarding Python. I have an if with a condition and an else; the else reads more than one file, and I need to save all the information it reads inside a DataFrame. Is there a way to do this?
The code I am using:
for idx, folder in enumerate(fileLista):
    if folder == 'filename_for_treatment':
        df1 = pd.read_excel(folder, sheet_name=sheetName[idx], skiprows=1)
        df1.columns = df1.columns.str.strip()
        tratativaUm = df1[[column information to be used]]
    else:
        df2 = pd.read_excel(folder, sheet_name=sheetName[idx], skiprows=1)
        df2.columns = df2.columns.str.strip()
        tratativaDois = df2[[column information to be used]]

#### assign the result of each file received in the else
frames = [tratativaUm, tratativaDois]
titEmpresa = pd.concat(frames)
Can someone help me? Is it possible to do this? Thanks.
You can do it by appending your DataFrames to a list, for example:
list_df_tratativaDois = []
for idx, folder in enumerate(fileLista):
    df = pd.read_excel(folder, sheet_name=sheetName[idx], skiprows=1)
    df.columns = df.columns.str.strip()
    if folder == 'filename_for_treatment':
        tratativaUm = df[[column information to be used]]
    else:
        list_df_tratativaDois.append(df[[column information to be used]])

titEmpresa = pd.concat([tratativaUm] + list_df_tratativaDois)
Note that instead of df1 and df2 you can create a single df, since both branches make the same read_excel call, and then perform a different action on df depending on whether folder is the right one.