Python reading an updated excel file - python

I'm really stuck on what should be an easy problem.
I have an Excel workbook in which I update 2 columns of one record on the clean_data sheet. From there, I save and close the file.
After that, I try to pull in the updated roll-up sheet values as a data frame (graphs_rolling); that sheet has formulas that use the clean_data sheet.
When I view the data frame, all the values are NaN. I can open the Excel file and see the updated values on the graphs_rolling sheet. What can I do to get the data frame to populate with values?
The code is shown below:
import pandas as pd
import openpyxl
from openpyxl import load_workbook
#Import Data with Correct Rows and Columns for SSM Commercial
book = load_workbook('//CPI Projects//Test//SampleSSM//NewSSM.xlsx')
writer = pd.ExcelWriter('//CPI Projects//Test//SampleSSM//NewSSM.xlsx', engine = 'openpyxl')
writer.book = book
df1 = pd.read_excel('//CPI Projects//Test//SampleSSM//NewSSM.xlsx',sheet_name='clean_data')
df1.loc[df1['ev_id']==20201127, 'commercial_weight'] = 0
df1.loc[df1['ev_id']==20201127, 'commercial'] = 0
book.remove(book['clean_data'])
df1.to_excel(writer, sheet_name = 'clean_data',index=False)
writer.save()
writer.close()
df5 = pd.read_excel('//CPI Projects//Test//SampleSSM//NewSSM.xlsx',sheet_name='graphs_rolling_avg',skiprows=30)
print(df5)
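One likely explanation, offered as an assumption since the workbook itself is not shown: openpyxl stores formulas as text and never evaluates them, so a file saved through it carries no cached results until Excel recalculates and saves it again, and pd.read_excel reads only those cached results. The sketch below reproduces the effect with a made-up sheet name and cell layout.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook

# Hypothetical workbook: one value cell and one formula cell that depends on it.
wb = Workbook()
ws = wb.active
ws.title = "graphs"
ws["A1"] = "value"
ws["B1"] = "doubled"
ws["A2"] = 21
ws["B2"] = "=A2*2"  # stored as a formula string; openpyxl does not compute it

path = os.path.join(tempfile.mkdtemp(), "formula_demo.xlsx")
wb.save(path)

# pandas reads cached results only; none were written, so "doubled" comes back empty.
df = pd.read_excel(path, sheet_name="graphs")
print(df)
```

If this is the cause, opening the saved file in Excel and saving it again (or driving Excel through a tool such as xlwings) should restore the cached values; computing the roll-up directly in pandas would avoid the dependency on Excel altogether.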

Related

How to write pandas.DataFrame data with array values back to excel

I have this output data as a pandas.DataFrame and I want to write it back to Excel.
This is the Excel sheet format.
I want to write each dataframe row into an Excel cell; for example, Kosten EK goes in cell D4 and IRR mit Finanzierung goes in cell D5. I have a similar dataframe in which the Soll-SOC 1-12 value is a single value rather than an array, and that works properly, but in this case I cannot write the data because of the array. How can I solve this?
I am using xlwings and xlwriter to write data back to Excel.
import xlwings as xw
wb = xw.Book(file_path) # wb = xw.Book(filename) would open an existing file
Working_Sheet = wb.sheets["sheet_name"] # activating working sheet
Working_Sheet.range('D4:D15').options(index=False,header=False).value = Data[20000][0.25]
You should try converting to a pd.DataFrame object.
import pandas as pd
import xlwings as xw
df = pd.DataFrame(...)
wb = xw.Book(file_path) # wb = xw.Book(filename) would open an existing file
Working_Sheet = wb.sheets["sheet_name"] # activating working sheet
Working_Sheet.range('D4:D15').options(convert=pd.DataFrame, index=False,header=False).value = Data[20000][0.25]

Overwrite a sheet in excel using python

I'm trying to overwrite one sheet of my Excel file with data from a .txt file. The Excel file has several sheets, but I only want to overwrite the 'Previous Month' sheet. Every time I run this code and open the Excel file, only the Previous Month sheet is there and nothing else. Many solutions on here show how to add more sheets, but I'm trying to update an existing sheet in a workbook with 8 sheets total.
How can I fix my code so that only the one sheet is edited but all of them stay there?
import pandas as pd
#importing previous month data#
writer = pd.ExcelWriter('file.xlsx')
df = pd.read_csv('file.txt', sep='\t')
df.to_excel(writer, sheet_name='Previous Month', startrow=4, startcol=2)
writer.save()
writer.close()
Edited code: whatever is happening here keeps corrupting my original file.
import pandas as pd
import openpyxl
#importing previous month data#
writer= pd.ExcelWriter('file.xlsx', mode= 'a', engine="openpyxl", if_sheet_exists="replace")
df = pd.read_csv('file.txt', sep='\t')
df.to_excel(writer, sheet_name="Previous Month", startrow=2, startcol=4)
writer.save()
writer.close()
You can use openpyxl.load_workbook() to do what you are looking for. I tried the above suggestions, but they didn't work for me, whereas load_workbook() usually runs without issues, so I hope this works for you as well.
I open the output file using load_workbook(), delete the existing sheet (Sheet2 here) if it exists, then create it again and write the data using create_sheet() and dataframe_to_rows (Ref). Let me know in case of questions or issues.
import pandas as pd
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows
df = pd.read_csv('file.txt', sep='\t')
wb = openpyxl.load_workbook('output.xlsx') # Open workbook
if "Sheet2" in wb.sheetnames: # If sheet exists, delete it
    del wb['Sheet2']
ws = wb.create_sheet(title='Sheet2') # Create new sheet
rows = dataframe_to_rows(df, index=False, header=True) # Convert dataframe to rows
for r_idx, row in enumerate(rows, 1):
    for c_idx, value in enumerate(row, 1):
        # The 2 and 4 are offsets, similar to startrow and startcol in your code
        ws.cell(row=r_idx+2, column=c_idx+4, value=value)
wb.save('output.xlsx')
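For comparison, here is a minimal sketch of the pandas-only route from the question's edited code, assuming pandas 1.3 or newer (where if_sheet_exists is available) and using made-up sheet contents. The with-block saves and closes the writer once, so separate writer.save() and writer.close() calls are not needed. Whether that also avoids the corruption described in the question is an assumption, not something the thread confirms.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

# Build a hypothetical workbook with two sheets to stand in for the real file.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Previous Month", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Other", index=False)

# Append mode keeps the other sheets; "replace" swaps out only the named sheet.
new_data = pd.DataFrame({"a": [10, 20, 30]})
with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    new_data.to_excel(writer, sheet_name="Previous Month", index=False)

sheets = pd.read_excel(path, sheet_name=None)  # dict of sheet name -> DataFrame
print(list(sheets))
```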

How do I create a table using Openpyxl's table module?

I'm attempting to create a script to process several Excel sheets at once, and one of the steps I'm trying to get Python to handle is creating a table from data passed in from a pandas data frame. Creating a table seems pretty straightforward looking at the documentation.
Following the example from here:
# define a table style
mediumstyle = TableStyleInfo(name='TableStyleMedium2', showRowStripes=True)
# create a table
table = Table(displayName='IdlingReport', ref='A1:C35', tableStyleInfo=mediumstyle)
# add the table to the worksheet
sheet2.add_table(table)
# Saving the report
wb.save(openexcel.filename)
print('Report Saved')
However this creates an empty table, instead of using the data present in cells 'A1:C35'. I can't seem to find any examples anywhere that go beyond these steps so any help with what I may be doing wrong is greatly appreciated.
The data in 'A1:C35' is being written to Excel as follows:
while i < len(self.sheets):
    with pd.ExcelWriter(filename, engine='openpyxl') as writer:
        writer.book = excelbook
        writer.sheets = dict((ws.title, ws) for ws in excelbook.worksheets)
        self.df_7.to_excel(writer, self.sheets[i], index=False, header=True, startcol=0, startrow=0)
        writer.save()
    i += 1
The output looks something like this
Time       Location                 Duration
1/01/2019  [-120085722,-254580042]  5 Min
1/02/2019  [-120085722,-254580042]  15 Min
1/02/2019  [-120085722,-254580042]  7 Min
Just to clarify: right now I first write my data frame to Excel, and then format the data I've written as a table. Reversing these steps, by creating the table first and then writing to Excel, fills the table but gets rid of the formatting (font color, font type, size, etc.). That means I'd have to add an additional step to fix the formatting, which I'd like to avoid if possible.
Your command
# create a table
table = Table(displayName='IdlingReport', ref='A1:C35', tableStyleInfo=mediumstyle)
creates a special Excel object: an empty table with the name IdlingReport.
You probably want something else: to fill a sheet of your Excel workbook with data from a Pandas dataframe.
For this purpose there is a function dataframe_to_rows():
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
wb = Workbook()
ws = wb.active # to rename this sheet: ws.title = "some_name"
# to create a new sheet: ws = wb.create_sheet("some_name")
for row in dataframe_to_rows(df, index=True, header=True):
    ws.append(row) # appends this row after the previous one
wb.save("something.xlsx")
See Working with Pandas Dataframes and Tutorial.
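The two steps can also be combined: write the rows first, then declare a table over exactly the range that was written. The following is only a sketch under assumptions: the sample data mirrors the question's output, the file name is made up, and the thread does not establish that a mismatched range was what left the original table empty.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook
from openpyxl.utils import get_column_letter
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl.worksheet.table import Table, TableStyleInfo

# Hypothetical sample data standing in for the real report.
df = pd.DataFrame({
    "Time": ["1/01/2019", "1/02/2019", "1/02/2019"],
    "Location": ["[-120085722,-254580042]"] * 3,
    "Duration": ["5 Min", "15 Min", "7 Min"],
})

wb = Workbook()
ws = wb.active

# Write the header row and the data rows (index=False keeps only real columns).
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)

# Derive the table range from what was written instead of hard-coding 'A1:C35'.
ref = f"A1:{get_column_letter(ws.max_column)}{ws.max_row}"
table = Table(displayName="IdlingReport", ref=ref)
table.tableStyleInfo = TableStyleInfo(name="TableStyleMedium2", showRowStripes=True)
ws.add_table(table)

wb.save(os.path.join(tempfile.mkdtemp(), "idling_report.xlsx"))
print(ref)
```

Table headers must be strings, which is why the sketch writes the DataFrame's column names as the first row of the range.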

Overwrite sheets in Excel with Python

I'm new to Python (and programming in general) and am running into a problem when writing data out to sheets in Excel.
I'm reading in an Excel file, performing a sum calculation on specific columns, and then writing the results out to a new workbook. Then at the end, it creates two charts based on the results.
The code works, except every time I run it, it creates new sheets with numbers appended to the end. I really just want it to overwrite the sheet names I provide, instead of creating new ones.
I'm not familiar enough with all the modules to understand all the options that are available. I've researched openpyxl, and pandas, and similar examples to what I'm trying to do either aren't easy to find, or don't seem to work when I try them.
import pandas as pd
import xlrd
import openpyxl as op
from openpyxl import load_workbook
import matplotlib.pyplot as plt
# declare the input file
input_file = 'TestData.xlsx'
# declare the output_file name to be written to
output_file = 'TestData_Output.xlsx'
book = load_workbook(output_file)
writer = pd.ExcelWriter(output_file, engine='openpyxl')
writer.book = book
# read the source Excel file and calculate sums
excel_file = pd.read_excel(input_file)
num_events_main = excel_file.groupby(['Column1']).sum()
num_events_type = excel_file.groupby(['Column2']).sum()
# create dataframes and write names and sums out to new workbook/sheets
df_1 = pd.DataFrame(num_events_main)
df_2 = pd.DataFrame(num_events_type)
df_1.to_excel(writer, sheet_name = 'TestSheet1')
df_2.to_excel(writer, sheet_name = 'TestSheet2')
# save and close
writer.save()
writer.close()
# dataframe for the first sheet
df = pd.read_excel(output_file, sheet_name='TestSheet1')
values = df[['Column1', 'Column3']]
# dataframe for the second sheet
df = pd.read_excel(output_file, sheet_name='TestSheet2')
values_2 = df[['Column2', 'Column3']]
# create the graphs
events_graph = values.plot.bar(x = 'Column1', y = 'Column3', rot = 60) # rot = rotation
type_graph = values_2.plot.bar(x = 'Column2', y = 'Column3', rot = 60) # rot = rotation
plt.show()
I get the expected results, and the charts work fine. I'd really just like to get the sheets to overwrite with each run.
From the pd.DataFrame.to_excel documentation:
Multiple sheets may be written to by specifying unique sheet_name.
With all data written to the file it is necessary to save the changes.
Note that creating an ExcelWriter object with a file name that already
exists will result in the contents of the existing file being erased.
Try writing to the book like
import pandas as pd
df = pd.DataFrame({'col1':[1,2,3],'col2':[4,5,6]})
writer = pd.ExcelWriter('g.xlsx')
df.to_excel(writer, sheet_name = 'first_df')
df.to_excel(writer, sheet_name = 'second_df')
writer.save()
If you inspect the workbook, you will have two worksheets.
Then let's say you wanted to write new data to the same workbook:
writer = pd.ExcelWriter('g.xlsx')
df.to_excel(writer, sheet_name = 'new_df')
writer.save()
If you inspect the workbook now, you will have just one worksheet, named new_df.
If there are other worksheets in the Excel file that you want to keep, and you only want to overwrite specific worksheets, you need to use load_workbook.
Before you write any data, you can delete the sheets you want to write to with:
std = book['<sheet_name>'] # newer openpyxl spelling of the deprecated book.get_sheet_by_name(...)
book.remove(std) # newer openpyxl spelling of the deprecated book.remove_sheet(...)
That will stop the behavior where a number gets appended to the worksheet name once you attempt to write a workbook with a duplicate sheet name.
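The renaming behavior is easy to reproduce with openpyxl alone. This is a small sketch using an in-memory workbook and the sheet name from the question; the exact suffix openpyxl chooses is not asserted here.

```python
from openpyxl import Workbook

wb = Workbook()
wb.create_sheet("TestSheet1")

# Creating a sheet whose title already exists does not overwrite it;
# openpyxl gives the new sheet a title with a number appended.
duplicate = wb.create_sheet("TestSheet1")
duplicate_title = duplicate.title
print(duplicate_title)

# Removing the existing sheets first lets the next create_sheet keep the plain title.
wb.remove(duplicate)
wb.remove(wb["TestSheet1"])
fresh = wb.create_sheet("TestSheet1")
print(fresh.title)
```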

Write to an existing excel file using Openpyxl starting in existing sheet starting at a specific column and row

I have been searching for a way to write to an existing Excel sheet starting from a specific row and column; however, methods like dataframe_to_rows do not write from a specific position.
I am now using a custom loop to do this, but I was wondering if there is a better approach.
The loop works like this:
import pandas as pd
import numpy as np
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter
df = pd.DataFrame(np.random.randn(20, 4), columns=list('ABCD'))
file = "C:\\somepath\\some_existing_file.xlsx"
wb = load_workbook(filename=file, read_only=False)
ws = wb['some_existing_sheet']
## Starting row and column
stcol = 5
strow = 5
## Writing the column header
for c in range(0,len(df.columns)):
    ws[get_column_letter(c+stcol)+str(strow)].value = df.columns[c]
## Writing the data
for r in range(0,len(df)):
    for c in range(0,len(df.columns)):
        ws[get_column_letter(c+stcol)+str(strow+r+1)].value = df.iloc[r, c]
wb.save(file)
Please let me know if there is a better way to write to a specific position in a sheet. If this turns out to be a duplicate question, I'm happy to merge it into the original thread.
I do have another approach using XlsxWriter, but it removes all other data from the existing sheet:
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application') # opens Excel
writer = pd.ExcelWriter(file, engine='xlsxwriter')
df.to_excel(writer, sheet_name='abc', startrow=5, startcol=5,index=False)
writer.save()
Instead of
ws[get_column_letter(c+stcol)+str(strow)]
you can use
ws.cell(column=c+stcol, row=strow)
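Building on ws.cell(), dataframe_to_rows can still be used for offset writes if the offsets are passed as the start values of enumerate. Here is a minimal sketch, with made-up data and an in-memory workbook in place of the existing file:

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Hypothetical data and a fresh workbook standing in for the existing file.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
wb = Workbook()
ws = wb.active

strow, stcol = 5, 5  # top-left cell of the block (E5), 1-based as in the question

# enumerate(..., start) applies the row and column offsets directly.
for r_idx, row in enumerate(dataframe_to_rows(df, index=False, header=True), strow):
    for c_idx, value in enumerate(row, stcol):
        ws.cell(row=r_idx, column=c_idx, value=value)

print(ws["E5"].value, ws["F7"].value)
```

With an existing file, load_workbook(file) and wb[sheet_name] would replace the Workbook() lines, and cells outside the written block are left untouched.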
