I need Python code that runs in an Azure Function, reads multiple Excel files from a blob container, and converts several sheets ('outlet', 'product', 'delivery') of each workbook to CSV files.
Each CSV file should be named after the A1 value of the Excel data and saved in a separate blob folder. (The corresponding sheets have the same format in every workbook, and the contents are numbers and letters.)
I apologize for my lack of experience, but I don't know how to write this in Python for an Azure Function. Please let me know.
A rough sketch of the Python code I have in mind is as follows.
import pandas as pd
import openpyxl
from azure.storage.blob import BlobClient, BlobServiceClient, ContentSettings
connectionstring = "XXXXXXXXXXXXXXXX"
# blob
container = "workspace/folder_excel"
save_container = "workspace/folder_csv"
excel_files = "*.xlsm"
sheet_names = ['outlet', 'product', 'delivery']
# read_excelfile_list
for excel_file in excel_files:
    # ExcelBook
    input_book = pd.ExcelFile(container + "/" + excel_file, engine="openpyxl")
    # excel_sheet
    # added A1cell number
    for sheet_name in sheet_names:
        sheet = pd.read_excel(input_book, sheet_name=sheet_name)
        sheet.to_csv(save_container + "/%s.csv" % sheet_name, index=False)
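One possible way to make the sketch above concrete is shown below. It is a minimal sketch under several assumptions, not a finished Azure Function: "workspace" is assumed to be the container and "folder_excel/" and "folder_csv/" blob name prefixes inside it (container names cannot contain a slash), the A1 value is taken from each sheet separately, and each CSV is stored as "<sheet>/<A1 value>.csv". The function names workbook_to_csvs and convert_container are hypothetical. The conversion itself uses openpyxl and the csv module so that A1 is read as a plain cell value rather than as a pandas header.

```python
import csv
import io

from openpyxl import load_workbook

SHEETS = ("outlet", "product", "delivery")


def workbook_to_csvs(data, sheets=SHEETS):
    """Return {"<sheet>/<A1 value>.csv": csv_text} for one workbook given as bytes."""
    wb = load_workbook(io.BytesIO(data), read_only=True, data_only=True)
    result = {}
    try:
        for sheet in sheets:
            rows = list(wb[sheet].iter_rows(values_only=True))
            # A1 is the first cell of the first row; None if the sheet is empty.
            a1 = rows[0][0] if rows and rows[0] else None
            buf = io.StringIO()
            csv.writer(buf, lineterminator="\n").writerows(rows)
            result[f"{sheet}/{a1}.csv"] = buf.getvalue()
    finally:
        wb.close()
    return result


def convert_container(connection_string, container="workspace",
                      src_prefix="folder_excel/", dst_prefix="folder_csv/"):
    """Convert every workbook under src_prefix and upload the CSVs under dst_prefix."""
    # Imported here so the conversion above can be used without the Azure SDK.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    client = service.get_container_client(container)
    for blob in client.list_blobs(name_starts_with=src_prefix):
        if not blob.name.lower().endswith((".xlsx", ".xlsm")):
            continue
        data = client.download_blob(blob.name).readall()
        for name, text in workbook_to_csvs(data).items():
            client.upload_blob(name=dst_prefix + name,
                               data=text.encode("utf-8"), overwrite=True)
```

Inside a function app, convert_container could be called from whatever trigger fits (timer, HTTP, or blob), with the connection string read from the app settings instead of being hard-coded.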
Related
I'm a super beginner and still learning Python.
I have an Excel workbook that contains multiple sheets. I only want certain sheets to be copied into a newly created workbook, and I'm having some trouble.
Below is my code.
import pandas as pd
import openpyxl
df = pd.read_excel('AMT.xlsb', sheet_name=['Roster','LOA'])
# print whole sheet data
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name=['Roster','LOA'])
I get the error "IndexError: At least one sheet must be visible", although none of the sheets in the AMT file are hidden.
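It looks like you are calling to_excel on a dict: when sheet_name is a list, read_excel returns a dict of DataFrames rather than a single DataFrame. Try reading each sheet separately: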
import pandas as pd
import openpyxl
df = pd.read_excel('AMT.xlsb', sheet_name='Roster')
df1 = pd.read_excel('AMT.xlsb', sheet_name='LOA')
# print whole sheet data
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name="Roster", index=False)
    df1.to_excel(writer, sheet_name="LOA", index=False)
You may still have some clean up after...
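If you would rather keep a single read_excel call, a hedged alternative is to loop over the dict it returns and write each DataFrame to its own sheet. The helper name copy_sheets is made up for illustration; reading an .xlsb file additionally needs the pyxlsb package installed.

```python
import pandas as pd


def copy_sheets(src_path, dst_path, sheet_names):
    """Copy the named sheets from src_path into a new workbook at dst_path."""
    # With a list for sheet_name, read_excel returns {sheet name: DataFrame}.
    frames = pd.read_excel(src_path, sheet_name=list(sheet_names))
    with pd.ExcelWriter(dst_path) as writer:
        for name, frame in frames.items():
            frame.to_excel(writer, sheet_name=name, index=False)
    return list(frames)
```

For the files in the question this would be called as copy_sheets('AMT.xlsb', 'output.xlsx', ['Roster', 'LOA']).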
I want to combine multiple Excel files/sheets into one Excel file with multiple sheets without changing any formatting. Basically, it is to append all sheets in multiple Excel files into One Excel file with multiple sheets.
For example,
File1 with Sheet1
File2 with Sheet2, Sheet3
File3 with Sheet4, Sheet5
Outcome would be File0 with Sheet1, Sheet2, Sheet3, Sheet4, Sheet5 (as one Excel file).
Here is my code:
from pandas import ExcelWriter
import glob
import os
import pandas as pd
# Collect the inputs first so the output file is not picked up by the pattern
filenames = [f for f in glob.glob("File*.xlsx") if f != "File0.xlsx"]
# The context manager saves and closes File0.xlsx on exit
with ExcelWriter("File0.xlsx") as writer:
    for filename in filenames:
        excel_file = pd.ExcelFile(filename)
        #(_, f_name) = os.path.split(filename)
        #(f_short_name, _) = os.path.splitext(f_name)
        for sheet_name in excel_file.sheet_names:
            df_excel = pd.read_excel(filename, sheet_name=sheet_name)
            df_excel.to_excel(writer, sheet_name=sheet_name, index=False)
The code works, but it re-writes the sheets. So I am losing all formats. Is there another way to append all sheets into one Excel file without consolidating them or losing the formatting?
Thank you.
Try loading all the sheets into a list, then writing them into one workbook as sheets with different names!
import glob
import pandas as pd
list_of_sheets = []
for filename in glob.glob("File*.xlsx"):
    excel_file = pd.ExcelFile(filename)
    # parse every sheet of every file into a DataFrame
    for sheet_name in excel_file.sheet_names:
        list_of_sheets.append(excel_file.parse(sheet_name))
# now add them as different sheets in the same Excel file
with pd.ExcelWriter('multiple.xlsx', engine='xlsxwriter') as writer:
    for i, df in enumerate(list_of_sheets):
        df.to_excel(writer, sheet_name='Sheet{}'.format(i))
# in this way, there will be one workbook called multiple.xlsx whose sheets are named Sheet0, Sheet1... and so on!
#Please accept and upvote the answer if it works, or comment if you have a doubt or error!
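Note that pandas only carries cell values across, so any approach that goes through DataFrames drops the formatting. One possible workaround, sketched here with openpyxl rather than pandas, is to copy each cell together with its style objects. The function name append_sheets is hypothetical, and this is a partial sketch: it copies values, fonts, fills, borders, alignment, number formats and column widths, but not merged cells, charts, images or conditional formatting.

```python
from copy import copy

from openpyxl import Workbook, load_workbook


def append_sheets(paths, out_path):
    """Copy every sheet of every workbook in paths into one new workbook."""
    out = Workbook()
    out.remove(out.active)  # drop the default empty sheet
    for path in paths:
        src = load_workbook(path)
        for ws in src.worksheets:
            dst = out.create_sheet(title=ws.title)
            for row in ws.iter_rows():
                for cell in row:
                    new = dst.cell(row=cell.row, column=cell.column,
                                   value=cell.value)
                    if cell.has_style:
                        # Style objects belong to the source workbook, so copy them.
                        new.font = copy(cell.font)
                        new.fill = copy(cell.fill)
                        new.border = copy(cell.border)
                        new.alignment = copy(cell.alignment)
                        new.number_format = cell.number_format
            for key, dim in ws.column_dimensions.items():
                if dim.width is not None:
                    dst.column_dimensions[key].width = dim.width
    out.save(out_path)
```

For the example in the question, the call would look like append_sheets(["File1.xlsx", "File2.xlsx", "File3.xlsx"], "File0.xlsx").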
I have 5 sheets in an excel workbook. I would like to export each sheet to csv using python libraries.
Each sheet shows the sales for one year (for example, 2019), and I have named the sheets according to the year they represent.
I have read the excel spreadsheet using pandas. I have used the for loop since I am interested in saving the csv file like the_sheet_name.csv. This is my code in a jupyter notebook:
import pandas as pd
df = pd.DataFrame()
myfile = 'sampledata.xlsx'
xl = pd.ExcelFile(myfile)
for sheet in xl.sheet_names:
    df_tmp = xl.parse(sheet)
    print(df_tmp)
    df = df.append(df_tmp, ignore_index=True, sort=False)
csvfile = f'{sheet}.csv'
df.to_csv(csvfile, index=False)
Executing the code is producing just one csv file that has the data for all the other sheets. I would like to know if there is a way to customize my code so that I can produce individual sheets e.g sales2011.csv, sales2012.csv and so on.
Using sheet_name=None makes read_excel return a dictionary of DataFrames, keyed by sheet name:
dfs = pd.read_excel('file.xlsx', sheet_name=None)
for sheet_name, data in dfs.items():
    data.to_csv(f"{sheet_name}.csv", index=False)
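The single combined file in the question comes from accumulating every sheet into one DataFrame before writing; writing inside the loop, once per sheet, avoids that. The same idea can be wrapped in a small helper. This is only a sketch: the name export_sheets and the out_dir parameter are made up for illustration.

```python
from pathlib import Path

import pandas as pd


def export_sheets(xlsx_path, out_dir="."):
    """Write each sheet of xlsx_path to <out_dir>/<sheet name>.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # sheet_name=None loads every sheet as {sheet name: DataFrame}.
    for sheet_name, frame in pd.read_excel(xlsx_path, sheet_name=None).items():
        target = out_dir / f"{sheet_name}.csv"
        frame.to_csv(target, index=False)
        written.append(target)
    return written
```

Calling export_sheets('sampledata.xlsx', 'csv') would then produce files such as csv/sales2011.csv and csv/sales2012.csv, one per sheet.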
I want to write a DataFrame into an existing Excel sheet without overwriting the original file, and save the result under another file name, using Python.
import pandas as pd
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows
# Open the template workbook; keep_vba=True preserves any macros it contains
srcfile = openpyxl.load_workbook('C:\\Users\\kavitha.j\\Desktop\\Automation\\Report builder\\template.xlsx', read_only=False, keep_vba=True)
df = pd.read_excel('C:\\Users\\kavitha.j\\Desktop\\Automation\\Report builder\\CVS Report Builder test.xlsx', sheet_name='Landing Page', skiprows=8)
sheet = srcfile['Sheet1']  # get the worksheet by name
# openpyxl worksheets have no .range() (that is xlwings syntax), so write the
# header and rows of df cell by cell, starting at A9
for r, row in enumerate(dataframe_to_rows(df, index=False, header=True), start=9):
    for c, value in enumerate(row, start=1):
        sheet.cell(row=r, column=c, value=value)
# Save under a new name; the original file is untouched. It is saved as .xlsm (the m denotes macros).
srcfile.save('C:\\Users\\kavitha.j\\Desktop\\Automation\\Report builder\\finaltemplate.xlsm')
I am fairly new to Python, but I'm getting stuck trying to pass an image file into a header during the DataFrame.to_excel() portion of my file.
Basically what I want is a picture in the first cell of the Excel table, followed by a couple of rows (5 to be exact) of text which will include a date (probably from datetime.date.today().ctime() if possible).
I already have the code to output the table portion as:
mydataframe.to_excel(my_path_name, sheet_name= my_sheet_name, index=False, startrow=7,startcol=0)
Is there a way to output the image and text portion directly from Python?
UPDATE:
For clarity, mydataframe is exporting the meat and potatoes of the worksheet (data rows and columns). I already have it starting on row 7 of the worksheet in Excel. The header portion is the trouble spot.
I found the solution and thanks for all of the help.
The simple answer is to use the xlsxwriter package as the engine. In other words assume that the image is saved at the path /image.png. Then the code to insert the data into the excel file with the image located at the top of the data would be:
# Importing packages and storing string for image file
import pandas as pd
import xlsxwriter
import numpy as np
image_file = '/image.png'
# Creating a fictitious data set since the actual data doesn't matter
dataframe = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
# Opening the xlsxwriter object to a path on the C:/ drive
writer = pd.ExcelWriter('C:/file.xlsx',engine='xlsxwriter')
dataframe.to_excel(writer,sheet_name = 'Arbitrary', startrow=3)
# Accessing the workbook / worksheet
workbook = writer.book
worksheet = writer.sheets['Arbitrary']
# Inserting the image into the workbook in cell A1
worksheet.insert_image('A1',image_file)
# Closing the workbook and saving the file to the specified path and filename
writer.close()  # ExcelWriter.save() was removed in pandas 2.0; close() writes the file
And now I have an image on the top of my excel file. Huzzah!
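The question also asked for a few rows of text, including a date, between the image and the table. A minimal sketch of one way to do that with the same xlsxwriter engine is below. The function name write_report, its parameters and the "Report" sheet name are hypothetical, and the layout (image in A1, up to five text lines in A2 to A6, table below) is an assumption about what was wanted.

```python
import pandas as pd


def write_report(df, path, header_lines, image_path=None, sheet_name="Report"):
    """Write df below an optional image and up to five lines of header text."""
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        # startrow is 0-based, so the table's column headers land on Excel row 8.
        df.to_excel(writer, sheet_name=sheet_name, index=False, startrow=7)
        worksheet = writer.sheets[sheet_name]
        if image_path is not None:
            worksheet.insert_image("A1", image_path)
        # Text lines go into A2..A6 (0-based rows 1..5, column 0).
        for row, line in enumerate(header_lines[:5], start=1):
            worksheet.write(row, 0, line)
```

With the names from the question, a call could look like write_report(mydataframe, my_path_name, ["Sales report", datetime.date.today().ctime()], image_path="/image.png", sheet_name=my_sheet_name), after importing datetime.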