I have a folder with 100 Excel files. I need only 20 of them, and I want to create a single Excel file containing selected sheets (the same sheets from every file).
I did the following:
# Working directory
data_folder = Path("C:/Users/.../myfiles")
#My working files:
excel01 = data_folder / "excel01.xls"
excel02 = data_folder / "excel02.xls"
...
excel20 = data_folder / "excel20.xls"
How do I create the single excel file?
I tried with
df = pd.read_excel([excel01, excel02, ..., excel20], sheet_name=['sheet1', 'sheet2', 'sheet3'], skiprows=4)
but it's not working. Any suggestions or more efficient ways are welcome. Thanks
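You can create an ExcelWriter and pass it to df.to_excel:
import random
import pandas as pd
excels = [excel01, excel02, ...]
# Three sheet numbers picked at random, as a placeholder for the sheets you need
sheets = random.sample(range(1, 16), 3)
i = 1
# The with block saves and closes the file on exit, so no writer.save() is needed
with pd.ExcelWriter('output.xlsx') as writer:
    for excel in excels:
        for sheet in sheets:
            df = pd.read_excel(excel, sheet_name=f'Sheet{sheet}')
            df.to_excel(writer, sheet_name=f'Sheet{i}')
            i += 1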
import pandas as pd
# Create some Pandas dataframes from some data.
df1 = pd.DataFrame({'Data': [11, 12, 13, 14]})
df2 = pd.DataFrame({'Data': [21, 22, 23, 24]})
df3 = pd.DataFrame({'Data': [31, 32, 33, 34]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_multiple.xlsx', engine='xlsxwriter')
# Write each dataframe to a different worksheet.
df1.to_excel(writer, sheet_name='Sheet1')
df2.to_excel(writer, sheet_name='Sheet2')
df3.to_excel(writer, sheet_name='Sheet3')
# Close the Pandas Excel writer and output the Excel file.
writer.close()  # writer.save() was removed in pandas 2.0
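For the original question (a fixed set of workbooks, the same few sheets from each), a minimal sketch might look like the one below. It relies on pd.read_excel returning a dict of DataFrames when sheet_name is a list. The helper name combine_sheets and the "file stem + sheet name" naming scheme are made up for illustration, and the demo builds two throwaway .xlsx files as stand-ins for the real ones. Reading legacy .xls files additionally needs xlrd installed.

```python
from pathlib import Path
import tempfile

import pandas as pd


def combine_sheets(files, sheet_names, output_path, skiprows=0):
    """Copy the chosen sheets of every workbook into one output workbook."""
    with pd.ExcelWriter(output_path) as writer:
        for file in files:
            # A list for sheet_name returns a dict {sheet name: DataFrame}
            frames = pd.read_excel(file, sheet_name=sheet_names, skiprows=skiprows)
            for sheet, df in frames.items():
                # Excel caps sheet names at 31 characters
                target = f"{Path(file).stem}_{sheet}"[:31]
                df.to_excel(writer, sheet_name=target, index=False)


# Demo: two small throwaway workbooks (stand-ins for excel01 ... excel20)
tmp = Path(tempfile.mkdtemp())
files = []
for name in ["excel01", "excel02"]:
    path = tmp / f"{name}.xlsx"
    with pd.ExcelWriter(path) as w:
        for sheet in ["sheet1", "sheet2", "sheet3"]:
            pd.DataFrame({"value": [1, 2, 3]}).to_excel(w, sheet_name=sheet, index=False)
    files.append(path)

combine_sheets(files, ["sheet1", "sheet3"], tmp / "combined.xlsx")

with pd.ExcelFile(tmp / "combined.xlsx") as xf:
    combined_names = xf.sheet_names
print(combined_names)
```

Truncating to 31 characters can make two long names collide, so with long file names a shorter prefix (or a running counter) may be the safer choice.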
I am trying to append data to an existing Excel sheet using the pandas ExcelWriter functionality.
According to the pandas documentation, if_sheet_exists='overlay' writes contents to the existing sheet without removing the old contents.
Code I tried:
import pandas as pd
df = pd.DataFrame({'Data': [10, 20, 30]})
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter', mode='w')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()
df = pd.DataFrame({'Data': [100, 200, 300]})
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='openpyxl', mode='a', if_sheet_exists='overlay')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()
Output I am getting: (overwriting new data instead of appending)
Output I am expecting:
Version details:
Python : 3.9.2
Pandas : pandas==1.4.3
openpyxl : openpyxl==3.0.10
xlsx : XlsxWriter==3.0.3
Trials:
I tried engine='xlsxwriter' for append mode, but got ValueError: Append mode is not supported with xlsxwriter!
I would suggest ignoring xlsxwriter here.
My approach would be as below:
import pandas as pd
import openpyxl
df = pd.DataFrame({'Data': [10, 20, 30]})
df.to_excel('pandas_simple.xlsx', sheet_name='Sheet1', index=False) #saving initial dataframe to file
df1 = pd.DataFrame({'Data': [100, 200, 300]}) # new data
wb = openpyxl.load_workbook('pandas_simple.xlsx') # open old file
ws = wb["Sheet1"] # assign sheet to work with or as below
# ws = wb.active
for index, row in df1.iterrows():
    ws.append(row.values.tolist())
wb.save("pandas_simple.xlsx")
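If you would rather stay within pandas, the overlay mode from the question can also work, as far as I can tell: to_excel starts at startrow=0 by default, which is why the new data landed on top of the old. A minimal sketch, assuming pandas 1.4 or later with openpyxl installed, and writing to a temporary folder rather than the real file:

```python
import tempfile
from pathlib import Path

import pandas as pd

path = Path(tempfile.mkdtemp()) / "pandas_simple.xlsx"

# Initial file: one header row plus three data rows
pd.DataFrame({"Data": [10, 20, 30]}).to_excel(path, sheet_name="Sheet1", index=False)

new_rows = pd.DataFrame({"Data": [100, 200, 300]})
with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="overlay") as writer:
    # max_row is the last used row (1-based); used as a 0-based startrow it
    # points at the first empty row below the existing data
    start = writer.sheets["Sheet1"].max_row
    new_rows.to_excel(writer, sheet_name="Sheet1", startrow=start,
                      header=False, index=False)

result = pd.read_excel(path, sheet_name="Sheet1")
print(result["Data"].tolist())
```

header=False keeps the column name from being written a second time in the middle of the sheet.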
I'm running into an issue that I think relates to needing to:
1. Export a dataframe to a new Excel worksheet (created at time of export)
2. Write specific values to an existing sheet in the same workbook
3. Do both of the above in a loop
I can get 1 and 3 to work by themselves, and I can get 2 and 3 to work by themselves, but when I try to do all three things it doesn't work. I think there is some issue with the pandas to_excel using xlsxwriter engine conflicting with the sheets.write(row,column, value) to the same workbook.
For instance, this works by itself (note that I have the "writer" stuff to export dataframe to new sheet commented out):
import pandas as pd
import xlsxwriter
loopList = ["A","B","C","D","E"]
data = [['tom', 10], ['nick', 15], ['juli', 14]]
counter = 1
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
workbook = xlsxwriter.Workbook('C:\\Test\\Test.xlsx')
totalsSheet = workbook.add_worksheet('Totals')
writer = pd.ExcelWriter('C:\\Test\\Test.xlsx', engine = 'xlsxwriter')
for sheets in loopList:
    #df.to_excel(writer, sheet_name = sheets, index=False)
    totalsSheet.write(counter, counter, sheets + str(counter))
    counter+=1
#writer.save()
#writer.close()
workbook.close()
The above makes the test.xlsx workbook, with a Totals worksheet, with "A1", "B2", etc. in incrementing row/column.
Likewise, when I comment out the workbook stuff and UN comment the pandas-export dataframe to new sheets, that also works:
import pandas as pd
import xlsxwriter
loopList = ["A","B","C","D","E"]
data = [['tom', 10], ['nick', 15], ['juli', 14]]
counter = 1
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
workbook = xlsxwriter.Workbook('C:\\Test\\Test.xlsx')
totalsSheet = workbook.add_worksheet('Totals')
writer = pd.ExcelWriter('C:\\Test\\Test.xlsx', engine = 'xlsxwriter')
for sheets in loopList:
    df.to_excel(writer, sheet_name = sheets, index=False)
    #totalsSheet.write(counter, counter, sheets + str(counter))
    counter+=1
writer.save()
writer.close()
#workbook.close()
The above gives me a new Test workbook with 5 sheets (A, B, C, etc.) with the same dataframe exported to each.
However, I can't seem to do both; depending on the order in which I have the lines that write to Excel, it still only does one or the other (I don't get errors, I just get a result that's not both things I'm trying to do).
Is there a way to accomplish both of these things in the same loop?
I'm using python 3.x.x. Thanks for any help.
Could you not just run it in two separate loops, each of which opens and closes the file so that it is available to the other step? Something like...
import pandas as pd
import xlsxwriter
loopList = ["A","B","C","D","E"]
data = [['tom', 10], ['nick', 15], ['juli', 14]]
counter = 1
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
with pd.ExcelWriter('C:\\Test\\Test.xlsx', engine = 'xlsxwriter') as writer:
    for sheets in loopList:
        df.to_excel(writer, sheet_name = sheets, index=False)
workbook = xlsxwriter.Workbook('C:\\Test\\Test.xlsx')
totalsSheet = workbook.add_worksheet('Totals')
for sheets in loopList:
    totalsSheet.write(counter, counter, sheets + str(counter))
    counter+=1
workbook.close()
Update
Looking into it more, the xlsxwriter docs say that it:
cannot read or modify existing Excel XLSX files.
So what you were trying before caused the overwrite. However, if you look further into the docs, you will see that the key is to take the workbook object from the pd.ExcelWriter object. That way pandas and xlsxwriter both write to the same workbook.
I installed xlsxwriter and the code below works for me:
import pandas as pd
import xlsxwriter
# data to write
loopList = ["A","B","C","D","E"]
data = [['tom', 10], ['nick', 15], ['juli', 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
# create the writer object
writer = pd.ExcelWriter('Test.xlsx', engine='xlsxwriter')
# create the workbook object from the current writer object
# this means that pandas and xlsxwriter can both write to it
workbook = writer.book
totalsSheet = workbook.add_worksheet('Totals')
# set the counter
counter = 1
# loop through and use xlsxwriter to write specific cells
for sheets in loopList:
    totalsSheet.write(counter, counter, sheets + str(counter))
    counter+=1
# loop through generating new sheets and writing dfs to the file
for sheets in loopList:
    df.to_excel(writer, sheet_name = sheets, index=False)
# save the written data (writer.save() was removed in pandas 2.0)
writer.close()
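Since the question asked for both things in the same loop, here is a sketch of how that could look once the Totals sheet is created from writer.book. It assumes xlsxwriter is installed and writes to a temporary folder instead of C:\Test. The variable names are mine.

```python
import tempfile
from pathlib import Path

import pandas as pd

loop_list = ["A", "B", "C", "D", "E"]
df = pd.DataFrame([["tom", 10], ["nick", 15], ["juli", 14]], columns=["Name", "Age"])
path = Path(tempfile.mkdtemp()) / "Test.xlsx"

with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
    # The Totals sheet lives in the same workbook that pandas writes to
    totals_sheet = writer.book.add_worksheet("Totals")
    for counter, name in enumerate(loop_list, start=1):
        # 1. export the dataframe to a new sheet
        df.to_excel(writer, sheet_name=name, index=False)
        # 2. write a single value to the already existing Totals sheet
        totals_sheet.write(counter, counter, f"{name}{counter}")

with pd.ExcelFile(path) as xf:
    sheet_names = xf.sheet_names
print(sheet_names)
```

The with block closes the writer once, at the end, so nothing is saved or reopened inside the loop.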
I have 5 tabs in a certain Excel file that I need to copy and paste text files into. I know how to write to a certain cell of a normal Excel sheet with one tab, but I have no idea how to copy and paste each text file into the correct tab, as there are formulas in each one.
Any ideas?
You can achieve this using Python's pandas:
import pandas as pd
# Create some Pandas dataframes from some data.
df1 = pd.DataFrame({'Data': [11, 12, 13, 14]})
df2 = pd.DataFrame({'Data': [21, 22, 23, 24]})
df3 = pd.DataFrame({'Data': [31, 32, 33, 34]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_multiple.xlsx', engine='xlsxwriter')
# Write each dataframe to a different worksheet.
df1.to_excel(writer, sheet_name='Sheet1')
df2.to_excel(writer, sheet_name='Sheet2')
df3.to_excel(writer, sheet_name='Sheet3')
# Close the Pandas Excel writer and output the Excel file.
writer.close()  # writer.save() was removed in pandas 2.0
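Because the tabs already contain formulas, another option is to open the existing workbook with openpyxl and write the text file's lines into the right tab, which leaves formula cells in place. This is only a sketch under assumptions: the file names, the tab name Tab1, the one-number-per-line text format and the target column A are all made up, and the demo first builds a stand-in workbook in a temporary folder.

```python
import tempfile
from pathlib import Path

import openpyxl

base = Path(tempfile.mkdtemp())
book_path = base / "report.xlsx"  # hypothetical workbook name
text_path = base / "tab1.txt"     # hypothetical text file for the first tab

# Stand-in workbook: one tab with a formula that sums column A
wb = openpyxl.Workbook()
ws = wb.active
ws.title = "Tab1"
ws["C1"] = "=SUM(A1:A3)"
wb.save(book_path)

text_path.write_text("1\n2\n3\n")

# Load the workbook and paste the text file into column A of the matching tab
wb = openpyxl.load_workbook(book_path)
ws = wb["Tab1"]
for row_number, line in enumerate(text_path.read_text().splitlines(), start=1):
    ws.cell(row=row_number, column=1, value=int(line))
wb.save(book_path)

# Check that the data arrived and the formula cell is untouched
check = openpyxl.load_workbook(book_path)["Tab1"]
pasted = [check.cell(row=r, column=1).value for r in (1, 2, 3)]
formula = check["C1"].value
print(pasted, formula)
```

For five tabs, the same loop could run over a dict that maps each tab name to its text file.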
I'm trying to save the generated Excel file in two different folders (output1, output2).
I tried this and it didn't work:
writer = pd.ExcelWriter([output1,output2], engine='xlsxwriter')
df1.to_excel(writer, sheet_name='sheeta', index = None)
Thanks
You can copy the file after creating it once (see "How do I copy a file in Python?"), or simply write it twice:
import pandas as pd
output1 = "p.xlsx"
output2 = "q.xlsx"
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
for o in [output1, output2]:
    writer = pd.ExcelWriter(o, engine='xlsxwriter')
    df.to_excel(writer, sheet_name='Sheet1')
    writer.close()  # writer.save() was removed in pandas 2.0
Results in 2 files being written, containing the data:
Docs:
XlsxWriter and pandas
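The copy variant mentioned at the start could be sketched like this: write the file once, then duplicate it with shutil.copy2 from the standard library. The folder names output1 and output2 follow the question, the file name p.xlsx follows the answer, and everything is created under a temporary directory for the demo.

```python
import shutil
import tempfile
from pathlib import Path

import pandas as pd

base = Path(tempfile.mkdtemp())
output1 = base / "output1" / "p.xlsx"
output2 = base / "output2" / "p.xlsx"
for target in (output1, output2):
    # Make sure both destination folders exist
    target.parent.mkdir(parents=True, exist_ok=True)

df = pd.DataFrame({"Data": [10, 20, 30, 20, 15, 30, 45]})

# Write the workbook once, then copy the finished file to the second folder
df.to_excel(output1, sheet_name="sheeta", index=False)
shutil.copy2(output1, output2)

copied = pd.read_excel(output2, sheet_name="sheeta")
print(copied["Data"].tolist())
```

Copying avoids building the workbook twice, which may matter more for large files or files with a lot of formatting.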
I have a python script I am running that currently does three separate things and outputs each result to a different excel file. Is it possible to instead have all of my outputs in one excel file on different sheets? It seems that the latest result always overwrites the whole excel file.
Below was my thinking:
df_finit1.to_excel('OutFile.xlsx', sheet_name = 'Sheet1')
df_finit2.to_excel('OutFile.xlsx', sheet_name = 'Sheet2')
df_finit3.to_excel('OutFile.xlsx', sheet_name = 'Sheet3')
I also tried to use xlsx writer to create a file with 3 different sheets, and output to those sheets but I got the same result. Any tips?
You should use ExcelWriter; it lets you open a single .xlsx file and write several sheets to it.
import pandas as pd
# Initialize xlsx writer
writer = pd.ExcelWriter('output_file.xlsx', engine='xlsxwriter')
workbook = writer.book
df1 = pd.DataFrame({"a": [1,2,3],
                    "b": [1,2,3]})
df2 = pd.DataFrame({"c": [1,2,3],
                    "d": [1,2,3]})
df3 = pd.DataFrame({"e": [1,2,3],
                    "f": [1,2,3]})
df1.to_excel(writer,
             sheet_name="sheet1",
             startrow=0,
             startcol=0)
df2.to_excel(writer,
             sheet_name="sheet2",
             startrow=0,
             startcol=0)
df3.to_excel(writer,
             sheet_name="sheet3",
             startrow=0,
             startcol=0)
writer.close()  # writer.save() was removed in pandas 2.0
You have to use ExcelWriter like this (you may need to install the xlsxwriter module first):
##############################################################################
#
# An example of writing multiple dataframes to worksheets using Pandas and
# XlsxWriter.
import pandas as pd
# Create some Pandas dataframes from some data.
df1 = pd.DataFrame({'Data': [11, 12, 13, 14]})
df2 = pd.DataFrame({'Data': [21, 22, 23, 24]})
df3 = pd.DataFrame({'Data': [31, 32, 33, 34]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('result_multiple.xlsx', engine='xlsxwriter')
# Write each dataframe to a different worksheet.
df1.to_excel(writer, sheet_name='Sheet1')
df2.to_excel(writer, sheet_name='Sheet2')
df3.to_excel(writer, sheet_name='Sheet3')
# Close the Pandas Excel writer and output the Excel file.
writer.close()  # writer.save() was removed in pandas 2.0
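The reason the attempt in the question kept only the last sheet is that each to_excel call given a file path creates a fresh file. A compact variant of the answers above, using a with block and a dict of sheet names to frames, might look like this. The three small frames are placeholders for df_finit1 to df_finit3, and the file goes to a temporary folder.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Placeholder frames standing in for df_finit1, df_finit2, df_finit3
frames = {
    "Sheet1": pd.DataFrame({"a": [1, 2, 3]}),
    "Sheet2": pd.DataFrame({"b": [4, 5, 6]}),
    "Sheet3": pd.DataFrame({"c": [7, 8, 9]}),
}

out = Path(tempfile.mkdtemp()) / "OutFile.xlsx"

# One writer for the whole file; the with block saves it on exit
with pd.ExcelWriter(out) as writer:
    for sheet_name, frame in frames.items():
        frame.to_excel(writer, sheet_name=sheet_name, index=False)

# sheet_name=None reads every sheet back as {name: DataFrame}
written = pd.read_excel(out, sheet_name=None)
print(list(written))
```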