How to append a dataframe to an existing sheet of an Excel file using Python

You can find what I've tried so far below:
import pandas
from openpyxl import load_workbook
book = load_workbook('C:/Users/Abhijeet/Downloads/New Project/Masterfil.xlsx')
writer = pandas.ExcelWriter('C:/Users/Abhijeet/Downloads/New Project/Masterfiles.xlsx', engine='openpyxl',mode='a',if_sheet_exists='replace')
df.to_excel(writer,'b2b')
writer.save()
writer.close()

Generate Sample data
import pandas as pd
# dataframe with Col1 and Col2 columns
df = pd.DataFrame({'Col1': ['A', 'B', 'C', 'D'],
                   'Col2': [10, 0, 30, 50]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('sample.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1', index=False)
# Close the Pandas Excel writer and output the Excel file
# (writer.save() was removed in pandas 2.0; close() also saves).
writer.close()
This code will add two columns, Col1 and Col2, with data to Sheet1 of sample.xlsx.
To append data to an existing Excel file:
import pandas as pd
# new dataframe with same columns
df = pd.DataFrame({'Col1': ['E','F','G','H'],
                   'Col2': [100,70,40,60]})
# read existing file to count the rows already present
reader = pd.read_excel(r'sample.xlsx')
# open the existing workbook in append mode; 'overlay' writes into the
# existing sheet instead of replacing it (requires pandas 1.4 or later;
# assigning writer.book no longer works in pandas 2.0)
with pd.ExcelWriter('sample.xlsx', engine='openpyxl', mode='a',
                    if_sheet_exists='overlay') as writer:
    # write the new rows below the existing ones, without a header
    df.to_excel(writer, sheet_name='Sheet1', index=False, header=False, startrow=len(reader)+1)
This code will append the data below the existing rows of Sheet1.
Check these as well
how to append data using openpyxl python to excel file from a specified row?

Suppose you have an Excel file abc.xlsx and a dataframe to be appended, "df1".
1. Read the file using pandas
import pandas as pd
df = pd.read_excel("abc.xlsx")
2. Concatenate the two dataframes and write to 'abc.xlsx'
finaldf = pd.concat([df, df1], ignore_index=True)
# write finaldf to abc.xlsx and you are done
finaldf.to_excel("abc.xlsx", index=False)
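The read, concatenate, and write steps can be sketched end to end as follows. This is a minimal sketch, assuming the workbook has a single sheet with a header row and that openpyxl is installed for reading .xlsx files. The helper name append_by_concat and the temporary demo file are hypothetical, not part of the answer above.

```python
import os
import tempfile

import pandas as pd


def append_by_concat(path, new_rows, sheet_name="Sheet1"):
    """Read the sheet, append new_rows, and rewrite the file (hypothetical helper)."""
    existing = pd.read_excel(path, sheet_name=sheet_name)
    combined = pd.concat([existing, new_rows], ignore_index=True)
    # Rewriting the whole file drops any other sheets and any cell formatting.
    combined.to_excel(path, sheet_name=sheet_name, index=False)
    return combined


# Demo on a temporary file so that no real workbook is overwritten.
demo_path = os.path.join(tempfile.mkdtemp(), "abc.xlsx")
pd.DataFrame({"Col1": ["A", "B"], "Col2": [10, 20]}).to_excel(demo_path, index=False)

df1 = pd.DataFrame({"Col1": ["C"], "Col2": [30]})
result = append_by_concat(demo_path, df1)
```

Because the file is rewritten from scratch, this approach is probably only suitable for single-sheet workbooks without formatting that needs to be preserved.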

Related

export dataframe to new excel worksheet and also write specific values to specific sheet

I'm running into an issue that I think relates to needing to:
1. export a dataframe to a new Excel worksheet (created at time of export)
2. write specific values to an existing sheet in same workbook
3. Doing both of the above in a loop
I can get 1 and 3 to work by themselves, and I can get 2 and 3 to work by themselves, but when I try to do all three things it doesn't work. I think there is some issue with the pandas to_excel using xlsxwriter engine conflicting with the sheets.write(row,column, value) to the same workbook.
For instance, this works by itself (note that I have the "writer" stuff to export dataframe to new sheet commented out):
import pandas as pd
import xlsxwriter
loopList = ["A","B","C","D","E"]
data = [['tom', 10], ['nick', 15], ['juli', 14]]
counter = 1
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
workbook = xlsxwriter.Workbook('C:\\Test\\Test.xlsx')
totalsSheet = workbook.add_worksheet('Totals')
writer = pd.ExcelWriter('C:\\Test\\Test.xlsx', engine = 'xlsxwriter')
for sheets in loopList:
    #df.to_excel(writer, sheet_name = sheets, index=False)
    totalsSheet.write(counter, counter, sheets + str(counter))
    counter+=1
#writer.save()
#writer.close()
workbook.close()
The above makes the test.xlsx workbook, with a Totals worksheet, with "A1", "B2", etc. in incrementing row/column.
Likewise, when I comment out the workbook stuff and uncomment the pandas export of the dataframe to new sheets, that also works:
import pandas as pd
import xlsxwriter
loopList = ["A","B","C","D","E"]
data = [['tom', 10], ['nick', 15], ['juli', 14]]
counter = 1
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
workbook = xlsxwriter.Workbook('C:\\Test\\Test.xlsx')
totalsSheet = workbook.add_worksheet('Totals')
writer = pd.ExcelWriter('C:\\Test\\Test.xlsx', engine = 'xlsxwriter')
for sheets in loopList:
    df.to_excel(writer, sheet_name = sheets, index=False)
    #totalsSheet.write(counter, counter, sheets + str(counter))
    counter+=1
writer.save()
writer.close()
#workbook.close()
The above gives me a new Test workbook with 5 sheets (A, B, C, etc.) with the same dataframe exported to each.
However, I can't seem to do both; depending on the order in which I have the lines that write to Excel, it still only does one or the other (I don't get errors, I just get a result that's not both things I'm trying to do).
Is there a way to accomplish both of these things in the same loop?
I'm using python 3.x.x. Thanks for any help.
Could you not just run it in two separate loops that each open and close the file, to ensure that it is available to each process? Something like...
import pandas as pd
import xlsxwriter
loopList = ["A","B","C","D","E"]
data = [['tom', 10], ['nick', 15], ['juli', 14]]
counter = 1
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
with pd.ExcelWriter('C:\\Test\\Test.xlsx', engine = 'xlsxwriter') as writer:
    for sheets in loopList:
        df.to_excel(writer, sheet_name = sheets, index=False)
workbook = xlsxwriter.Workbook('C:\\Test\\Test.xlsx')
totalsSheet = workbook.add_worksheet('Totals')
for sheets in loopList:
    totalsSheet.write(counter, counter, sheets + str(counter))
    counter+=1
workbook.close()
Update
Looking into it more, as the docs say, xlsxwriter:
cannot read or modify existing Excel XLSX files.
So what you were trying before caused the file to be overwritten. However, if you look into the docs more you will see that the key is to take the workbook object from the pd.ExcelWriter object. This means that both libraries write to the same workbook.
I installed xlsxwriter and the code below works for me:
import pandas as pd
import xlsxwriter
# data to write
loopList = ["A","B","C","D","E"]
data = [['tom', 10], ['nick', 15], ['juli', 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
# create the writer object
writer = pd.ExcelWriter('Test.xlsx', engine='xlsxwriter')
# create the workbook object from the current writer object
# this means that pandas and xlsxwriter can both write to it
workbook = writer.book
totalsSheet = workbook.add_worksheet('Totals')
# set the counter
counter = 1
# loop through and use xlsxwriter to write specific cells
for sheets in loopList:
    totalsSheet.write(counter, counter, sheets + str(counter))
    counter+=1
# loop through generating new sheets and writing dfs to the file
for sheets in loopList:
    df.to_excel(writer, sheet_name = sheets, index=False)
# save the written data (writer.save() was removed in pandas 2.0)
writer.close()
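Since the question asked for both writes in the same loop, here is a minimal sketch of that variant. It assumes xlsxwriter is installed for writing and openpyxl for reading the result back. The temporary output path and the snake_case variable names are illustrative only.

```python
import os
import tempfile

import pandas as pd

loop_list = ["A", "B", "C", "D", "E"]
df = pd.DataFrame([["tom", 10], ["nick", 15], ["juli", 14]], columns=["Name", "Age"])

# Temporary location so the sketch does not touch C:\Test\Test.xlsx.
out_path = os.path.join(tempfile.mkdtemp(), "Test.xlsx")

with pd.ExcelWriter(out_path, engine="xlsxwriter") as writer:
    # writer.book is the xlsxwriter Workbook that pandas itself writes to.
    totals_sheet = writer.book.add_worksheet("Totals")
    for counter, name in enumerate(loop_list, start=1):
        # One new sheet per name, plus one cell on the Totals sheet.
        df.to_excel(writer, sheet_name=name, index=False)
        totals_sheet.write(counter, counter, name + str(counter))

# Read the sheet names back to confirm that both kinds of write landed.
with pd.ExcelFile(out_path) as xls:
    sheet_names = xls.sheet_names
```

The with block closes the writer on exit, so no explicit save or close call is needed.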

Problem with different indexing in pandas and xlsxwriter

Here is the code which works fine:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Apply a conditional format to the cell range.
worksheet.conditional_format(1,1,1,1, {'type': '3_color_scale'}) ##CHANGES THE COLOR OF SECOND ROW
# Close the Pandas Excel writer and output the Excel file.
writer.save()
This creates below output.
My question is: is it possible to include the header row in the pandas indexing? I want indexing to start from the first row, so the header should have index 0. This would be useful because in xlsxwriter the first row has index 0.
Firstly, the index is an object, and the default index starts from 0. You can shift it by typing:
df.index += 1
As for the header's index name, the pandas method to_excel takes an argument called index_label. So your code should be:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
df.index += 1
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_conditional.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1', index_label='0')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Apply a conditional format to the cell range.
worksheet.conditional_format(1,1,1,1, {'type': '3_color_scale'}) ##CHANGES THE COLOR OF SECOND ROW
# Close the Pandas Excel writer and output the Excel file.
writer.close()
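An alternative to shifting the index is to compute the zero-based worksheet coordinates of the data directly. The sketch below assumes a frame with a single-level header and index, written by to_excel with its default layout. The helper name data_range is hypothetical.

```python
import pandas as pd


def data_range(df, startrow=0, startcol=0, header=True, index=True):
    """Return (first_row, first_col, last_row, last_col), zero-based, for df's values.

    Hypothetical helper: it mirrors how to_excel lays out a frame that has a
    single-level header and a single-level index.
    """
    first_row = startrow + (1 if header else 0)   # the header occupies one row
    first_col = startcol + (1 if index else 0)    # the index occupies one column
    last_row = first_row + len(df) - 1
    last_col = first_col + len(df.columns) - 1
    return first_row, first_col, last_row, last_col


df = pd.DataFrame({"Data": [10, 20, 30, 20, 15, 30, 45]})
cell_range = data_range(df)
```

The resulting tuple could then be unpacked into the xlsxwriter call, for example worksheet.conditional_format(*cell_range, {'type': '3_color_scale'}), so that the format covers every data row rather than only the second row.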

Creating a Master excel file from dynamic CSV output using Python

I am trying to create a repository "Master" excel file from a CSV which will be generated and overwritten every couple of hours. The code below creates a new excel file and writes the content from "combo1.csv" to "master.xlsx". However, whenever the combo1 file is updated, the code basically overwrites the contents in the "master.xlsx" file. I need to append the contents from "combo1" to "Master" without the headers being inserted every time. Can someone help me with this?
import pandas as pd
writer = pd.ExcelWriter('master.xlsx', engine='xlsxwriter')
df = pd.read_csv('combo1.csv')
df.to_excel(writer, sheet_name='sheetname')
writer.save()
Refer to the "Append Data at the End of an Excel Sheet" section in this Medium article:
Using Python Pandas with Excel Sheets
(Credit to Nensi Trambadiya for the article)
Basically you'll have to first read the Excel file and find the number of rows before pushing the new data.
reader = pd.read_excel(r'master.xlsx')
df.to_excel(writer,index=False,header=False,startrow=len(reader)+1)
First read the Excel file to count its rows, then append the new rows as below.
import pandas as pd
df = pd.DataFrame({'Name': ['abc','def','xyz','ysv'],
                   'Age': [8,45,32,26]})
# count the rows already in the file
reader = pd.read_excel(r'master.xlsx')
# xlsxwriter cannot modify existing files, so append with openpyxl
# (mode='a' with if_sheet_exists='overlay' requires pandas 1.4 or later)
with pd.ExcelWriter('master.xlsx', engine='openpyxl', mode='a',
                    if_sheet_exists='overlay') as writer:
    df.to_excel(writer,index=False,header=False,startrow=len(reader)+1)
import pandas as pd
# new dataframe with same columns
df = pd.read_csv('combo.csv')
# read existing file to count the rows already present
reader = pd.read_excel(r'master.xlsx')
# open the existing workbook in append mode; 'overlay' writes into the
# existing sheet instead of replacing it (requires pandas 1.4 or later;
# assigning writer.book no longer works in pandas 2.0)
with pd.ExcelWriter('master.xlsx', engine='openpyxl', mode='a',
                    if_sheet_exists='overlay') as writer:
    # write the new rows below the existing ones, without a header
    df.to_excel(writer, index=False, header=False, startrow=len(reader) + 1)
Note that master.xlsx has to be created before running the script.
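The requirement that the master file already exists can be lifted by writing the header on the first run and appending afterwards. This is a minimal sketch, assuming pandas 1.4 or later (for if_sheet_exists='overlay') and openpyxl. The helper name append_to_master, the default sheet name, and the demo data are hypothetical.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook


def append_to_master(df, master_path, sheet_name="Sheet1"):
    """Append df below the existing rows, creating the file on first use.

    Hypothetical helper; assumes sheet_name already exists in an existing file.
    """
    if not os.path.exists(master_path):
        # First run: create the workbook and write the header once.
        df.to_excel(master_path, sheet_name=sheet_name, index=False)
        return
    # max_row is the 1-based number of the last used row, which equals the
    # 0-based index of the first free row that startrow expects.
    workbook = load_workbook(master_path)
    first_free_row = workbook[sheet_name].max_row
    workbook.close()
    with pd.ExcelWriter(master_path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False,
                    header=False, startrow=first_free_row)


# Demo in a temporary directory: the first call creates, the second appends.
master = os.path.join(tempfile.mkdtemp(), "master.xlsx")
batch = pd.DataFrame({"Name": ["abc", "def"], "Age": [8, 45]})
append_to_master(batch, master)
append_to_master(batch, master)
combined = pd.read_excel(master)
```

Asking openpyxl for max_row avoids parsing the whole sheet into a dataframe just to count rows, which may matter as the master file grows.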

Output to excel file without overwriting sheets

I have a python script I am running that currently does three separate things and outputs each result to a different excel file. Is it possible to instead have all of my outputs in one excel file on different sheets? It seems that the latest result always overwrites the whole excel file.
Below was my thinking:
df_finit1.to_excel('OutFile.xlsx', sheet_name = 'Sheet1')
df_finit2.to_excel('OutFile.xlsx', sheet_name = 'Sheet2')
df_finit3.to_excel('OutFile.xlsx', sheet_name = 'Sheet3')
I also tried to use xlsx writer to create a file with 3 different sheets, and output to those sheets but I got the same result. Any tips?
You should use ExcelWriter; it lets you open a single .xlsx file and manipulate it.
import pandas as pd
# Initialize xlsx writer
writer = pd.ExcelWriter('output_file.xlsx', engine='xlsxwriter')
workbook = writer.book
df1 = pd.DataFrame({"a": [1,2,3],
"b": [1,2,3]})
df2 = pd.DataFrame({"c": [1,2,3],
"d": [1,2,3]})
df3 = pd.DataFrame({"e": [1,2,3],
"f": [1,2,3]})
df1.to_excel(writer,
sheet_name="sheet1",
startrow=0,
startcol=0)
df2.to_excel(writer,
sheet_name="sheet2",
startrow=0,
startcol=0)
df3.to_excel(writer,
sheet_name="sheet3",
startrow=0,
startcol=0)
writer.close()
You have to use ExcelWriter like this (you may have to install the xlsxwriter module):
##############################################################################
#
# An example of writing multiple dataframes to worksheets using Pandas and
# XlsxWriter.
import pandas as pd
# Create some Pandas dataframes from some data.
df1 = pd.DataFrame({'Data': [11, 12, 13, 14]})
df2 = pd.DataFrame({'Data': [21, 22, 23, 24]})
df3 = pd.DataFrame({'Data': [31, 32, 33, 34]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('result_multiple.xlsx', engine='xlsxwriter')
# Write each dataframe to a different worksheet.
df1.to_excel(writer, sheet_name='Sheet1')
df2.to_excel(writer, sheet_name='Sheet2')
df3.to_excel(writer, sheet_name='Sheet3')
# Close the Pandas Excel writer and output the Excel file.
writer.close()

Creating Excel sheets in same workbook in python

Need a suggestion in my code.
I have a data frame in sheet1 of workbook:
column 1 column 2
000A0000 2
000B0000 3
000A0001 5
000B0001 1
My desired result:
in sheet 2 of Workbook:
column 1 column 2
000A0000 2
000A0001 5
In sheet 3 of Workbook:
column 1 column 2
000B0000 3
000B0001 1
I have done my coding:
import pandas as pd
file="workbook.xlsx"
data = pd.ExcelFile(file)
print(data.sheet_names)
df=data.parse("sheet1")
substrings = ['A', 'B']
T = {x: df[df['column 1'].str.contains(x, na=False, regex=False)] for x in substrings}
for key, var in T.items():
    var.to_excel(f'{key}.xlsx', index=False)
With this I can create new workbooks, but I need to create new worksheets in the same workbook.
Any suggestion would be appreciated.
To add sheets to the same Excel file, use the openpyxl engine in append mode as follows:
import pandas as pd
#reading the sheet1 using read_excel
df = pd.read_excel('workbook.xlsx', sheet_name='Sheet1')
#opening the existing excel file in append mode with the `openpyxl` engine
with pd.ExcelWriter('workbook.xlsx', engine='openpyxl', mode='a') as df_writer:
    #checking string in column 1 and writing those to respective sheets in same workbook
    for string in ['A','B']:
        df[df['column 1'].str.contains(string)].to_excel(df_writer, sheet_name=string)
#leaving the with block saves and closes the writer
to_excel will not append sheets to your existing file by itself.
Use the openpyxl engine in append mode instead (something like below):
import pandas
# mode='a' opens the existing workbook and keeps its sheets
# (assigning writer.book and writer.sheets no longer works in pandas 2.0)
writer = pandas.ExcelWriter('path+filename_you_want_to_write_in.xlsx', engine='openpyxl', mode='a')
df.to_excel(writer, sheet_name="Sheet_name_as_per_your_choice", index=False)
writer.close()
Also, if you want to read through all the sheets dynamically rather than specific ones:
f = pd.ExcelFile(file)
sheet_names = f.sheet_names
for i in sheet_names:
    df = pd.read_excel(f, sheet_name=i)
This iterates through all your sheets and builds a dataframe from each one.
Try using the xlsxwriter engine. Note that it always writes a new file and cannot modify an existing one.
writer = pd.ExcelWriter('<< file_name >>', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet2')
writer.close()
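For the data in the question, the append-mode approach can be sketched end to end as below. It assumes pandas 1.3 or later (for if_sheet_exists) and openpyxl. The workbook is built in a temporary directory first so that the example is self-contained, and the variable names are illustrative.

```python
import os
import tempfile

import pandas as pd

# Build a workbook like the one in the question (temporary location).
path = os.path.join(tempfile.mkdtemp(), "workbook.xlsx")
source = pd.DataFrame({
    "column 1": ["000A0000", "000B0000", "000A0001", "000B0001"],
    "column 2": [2, 3, 5, 1],
})
source.to_excel(path, sheet_name="sheet1", index=False)

# dtype=str keeps codes such as 000A0000 as text.
df = pd.read_excel(path, sheet_name="sheet1", dtype={"column 1": str})

# mode="a" keeps sheet1 and adds one new sheet per substring.
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="replace") as writer:
    for key in ["A", "B"]:
        subset = df[df["column 1"].str.contains(key, na=False, regex=False)]
        subset.to_excel(writer, sheet_name=key, index=False)

# sheet_name=None returns a dict that maps every sheet name to a dataframe.
sheets = pd.read_excel(path, sheet_name=None)
```

With if_sheet_exists="replace", re-running the loop overwrites sheets A and B instead of raising an error, while sheet1 is left untouched.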
