Copy dataframe from one Excel file to a different Excel file - python

I am trying to use openpyxl to open an Excel file, create a dataframe from filtered data in one of the sheets, and then write that data to an existing sheet in another file, but I keep getting an error saying that the permission is denied, I think because the way I'm calling the dataframe in the append step is somehow opening the file again after I've closed it or something. So I guess I'm wondering if there's a way to somehow get the dataframe of source data into Python, and then close out that source file, open the destination file, and write the dataframe to it. I apologize if that doesn't make sense; I'm pretty new to Python.
My code is below, and any suggestions or simplifications are welcome.
# Get latest source report using list
list_of_files_source = glob.glob(r'C:[my_path]/*')
latest_file_source = max(sorted(list_of_files_source, key = os.path.getctime))
# Load "Employee OT Data" sheet from workbook
file_source = pd.ExcelFile(latest_file_source)
df_source_Employee_OT = pd.read_excel(latest_file_source, 'Employee OT Data')
# Identify 6 most recent weeks (based on week ending date)
wk_end_source = pd.DataFrame(df_source_Employee_OT, columns = ['WEEK_ENDING']).drop_duplicates().apply(pd.to_datetime)
recent_wk_end_source = wk_end_source.sort_values(['WEEK_ENDING'], ascending=False).groupby('WEEK_ENDING').head(1)
recent_wk_end_source = recent_wk_end_source.head(6)
print(recent_wk_end_source)
# Filter source employee data for only 6 most recent weeks
df_source_Employee_OT = recent_wk_end_source.merge(df_source_Employee_OT, on='WEEK_ENDING', how='inner')
file_source.close()
# Make sure Excel instances are closed
import os
os.system("taskkill /f /im EXCEL.exe")
# Load destination workbook, targeting 'SOURCEDATA' sheet
dst = r'C:[my_other_path]/Pivots.xlsm'
pivots = xw.Book(str(r'C:[my_other_path]/Pivots.xlsm'))
pivots_source_sheet = pivots.sheets['SOURCEDATA']
# Clear out old data from sheet
pivots_source_sheet.range('2:100000').api.Delete(DeleteShiftDirection.xlShiftUp)
# Save report and close
pivots.save(dst)
# Append with source data
with pd.ExcelWriter(dst, engine='openpyxl', mode='a') as writer:
    df_source_Employee_OT.to_excel(writer, sheet_name=pivots_source_sheet, startrow = 2)
pivots.save(dst)
pivots.close()

This part should not be needed, since the Excel application is never opened: Python just reads the data from the Excel file, not from the software itself.
import os
os.system("taskkill /f /im EXCEL.exe")
You can pass the name of the Excel file directly to pandas.read_excel and pandas.DataFrame.to_excel.
Check out the official documentation for pandas.read_excel and pandas.DataFrame.to_excel. The first returns a dataframe given the name of an Excel file, while the second is called on a dataframe and saves it to an Excel file given the target file name. That should be all you need for file I/O. If these functions do not work for some reason, please include the error message you are getting.
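As a rough illustration of that read-then-write flow, here is a minimal sketch. The helper name copy_sheet, the temporary files, and the sample data are assumptions made up for the demo; the sheet names are borrowed from the question. It assumes pandas 1.3 or later with openpyxl installed, since it relies on the if_sheet_exists option of pd.ExcelWriter.

```python
import tempfile
from pathlib import Path

import pandas as pd


def copy_sheet(src_path, src_sheet, dst_path, dst_sheet):
    """Read one sheet into memory, then write it into an existing workbook."""
    # read_excel loads the sheet into a DataFrame; the source file is not
    # kept open afterwards.
    df = pd.read_excel(src_path, sheet_name=src_sheet)
    # mode="a" keeps the other sheets of the destination workbook;
    # if_sheet_exists="replace" overwrites the target sheet instead of raising.
    with pd.ExcelWriter(dst_path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        df.to_excel(writer, sheet_name=dst_sheet, index=False)
    return df


# Demo with throwaway .xlsx files (hypothetical data, not the asker's).
tmp = Path(tempfile.mkdtemp())
src, dst = tmp / "source.xlsx", tmp / "pivots.xlsx"
pd.DataFrame(
    {"WEEK_ENDING": ["2024-01-07", "2024-01-14"], "HOURS": [5, 8]}
).to_excel(src, sheet_name="Employee OT Data", index=False)
with pd.ExcelWriter(dst, engine="openpyxl") as writer:
    pd.DataFrame({"old": [1]}).to_excel(writer, sheet_name="SOURCEDATA", index=False)
    pd.DataFrame({"keep": [2]}).to_excel(writer, sheet_name="Other", index=False)

copied = copy_sheet(src, "Employee OT Data", dst, "SOURCEDATA")
result = pd.read_excel(dst, sheet_name=None)  # dict of all sheets
```

Note that sheet_name must be a string here, not a sheet object. Whether macros in an .xlsm destination survive this round trip is a separate concern that the sketch does not address.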

Related

How do I make a table for historic data?

How do I make a dataset that shows historic data from snapshots?
I have a csv-file that is updated and overwritten with new snapshot data once a day. I would like to make a python-script that regularly updates the snapshot data with the current snapshots.
One way I thought of was the following:
import pandas as pd
# Read csv-file
snapshot = pd.read_csv('C:/source/snapshot_data.csv')
# Try to read potential trend-data
try:
    historic = pd.read_csv('C:/merged/historic_data.csv')
    # Merge the two dfs and write back to historic file-path
    historic.merge(snapshot).to_csv('C:/merged/historic_data.csv')
except:
    snapshot.to_csv('C:/merged/historic_data.csv')
However, I don't like the fact that I use a try-function to get the historic data if the file-path exists or write the snapshot data to the historic path if the path doesn't exist.
Is there anyone that knows a better way of creating a trend dataset?
You can use the os module to check whether the file exists, and the mode argument of to_csv to append data to the file.
The code below will:
Read from snapshot.csv.
Check whether the historic.csv file exists.
If it does not exist yet, write the header row; otherwise skip the header.
Save the file. If the file already exists, the new data is appended to it instead of overwriting it.
import os
import pandas as pd
# Read snapshot file
snapshot = pd.read_csv("snapshot.csv")
# Check if historic data file exists
file_path = "historic.csv"
header = not os.path.exists(file_path) # whether the header needs to be written
# Create or append to the historic data file
snapshot.to_csv(file_path, header=header, index=False, mode="a")
You could easily make it a one-liner by using the mode parameter of `to_csv`.
pandas.read_csv('snapshot.csv').to_csv('historic.csv', mode='a')
It will create the file if it doesn't already exist, or will append if it does.
What happens if you don't have a new snapshot file? You might want to wrap that in a try... except block. The pythonic way is typically ask for forgiveness instead of permission.
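A minimal sketch of that "ask for forgiveness" approach might look like the following. The function name append_snapshot and the temporary file paths are hypothetical. It catches only FileNotFoundError rather than using a bare except, so other problems still surface.

```python
import os
import tempfile

import pandas as pd


def append_snapshot(snapshot_path, historic_path):
    """Append the snapshot to the historic file; return the rows appended."""
    try:
        snapshot = pd.read_csv(snapshot_path)
    except FileNotFoundError:
        # No new snapshot yet: leave the historic file untouched.
        return 0
    # Write the header only when the historic file is being created.
    write_header = not os.path.exists(historic_path)
    snapshot.to_csv(historic_path, mode="a", header=write_header, index=False)
    return len(snapshot)


# Demo in a temporary folder (hypothetical data).
folder = tempfile.mkdtemp()
snapshot_file = os.path.join(folder, "snapshot.csv")
historic_file = os.path.join(folder, "historic.csv")

missing = append_snapshot(snapshot_file, historic_file)  # snapshot absent
pd.DataFrame({"id": [1, 2], "value": ["a", "b"]}).to_csv(snapshot_file, index=False)
first = append_snapshot(snapshot_file, historic_file)
second = append_snapshot(snapshot_file, historic_file)
historic = pd.read_csv(historic_file)
```

Passing index=False and writing the header only once keeps the historic file readable as a single table after repeated appends.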
I wouldn't even bother with an external library like pandas, as the standard library has all you need to 'append' to a file.
with open('snapshot.csv', 'r') as snapshot:
    with open('historic.csv', 'a') as historic:
        for line in snapshot:
            historic.write(line)
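One caveat with a plain line-by-line copy is that the snapshot's header row is appended every time. A possible standard-library sketch that writes the header only once is shown below. The function name append_csv and the demo files are hypothetical, and it assumes the snapshot's first line is the header and that the file ends with a newline.

```python
import os
import tempfile


def append_csv(snapshot_path, historic_path):
    """Append snapshot rows to the historic file without repeating the header."""
    is_new = (not os.path.exists(historic_path)
              or os.path.getsize(historic_path) == 0)
    with open(snapshot_path, "r", newline="") as snapshot, \
            open(historic_path, "a", newline="") as historic:
        header = next(snapshot, None)  # first line of the snapshot
        if is_new and header is not None:
            historic.write(header)
        for line in snapshot:  # the remaining lines are data rows
            historic.write(line)


# Demo in a temporary folder (hypothetical data).
folder = tempfile.mkdtemp()
snapshot_file = os.path.join(folder, "snapshot.csv")
historic_file = os.path.join(folder, "historic.csv")
with open(snapshot_file, "w", newline="") as handle:
    handle.write("id,value\n1,a\n2,b\n")

append_csv(snapshot_file, historic_file)
append_csv(snapshot_file, historic_file)
with open(historic_file, newline="") as handle:
    historic_lines = handle.read().splitlines()
```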

Openpyxl-Made changes to excel and store it in a dataframe, how to kill the Excel without saving all the changes and avoid further recovery dialogue?

I need to open and edit my Excel with openpyxl, store the excel as a dataframe, and close the excel without any changes. Are there any ways to kill the excel and disable the auto-recovery dialogue which may pop out later?
The reason I'm asking is that my code worked perfectly fine in PyCharm; however, after I packed it into an .exe with PyInstaller, the code stopped working. The error said "Excel cannot access the file. There are several possible reasons: the file name or path does not exist, the file is being used by another program, or the workbook you are saving has the same name as a currently open workbook."
I assume it is because the openpyxl did not really close the excel, and I exported it to a different folder with the same file name.
Here is my code:
wb1 = openpyxl.load_workbook(my_path, keep_vba=True)
ws1 = wb1["sheet name"]
making changes...
ws1_df = pd.DataFrame(ws1.values)
wb1.close()
Many thanks ahead :)
You can do this the following way:
from win32com.client import Dispatch
# Start excel application
xl = Dispatch('Excel.Application')
# Open existing excel file
book = xl.Workbooks.Open('workbook.xlsx')
# Some arbitrary excel operations ...
# Close excel application without saving file
book.Close(SaveChanges=False)
xl.Quit()
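For comparison, openpyxl reads and writes the file directly and does not start the Excel application, so in-memory edits are discarded unless save() is called. The sketch below illustrates this with a throwaway workbook; the file, data, and cell edit are made up for the demo, and keep_vba is omitted because the demo file has no macros.

```python
import tempfile
from pathlib import Path

import openpyxl
import pandas as pd

# Build a small throwaway workbook (hypothetical data).
path = Path(tempfile.mkdtemp()) / "book.xlsx"
wb = openpyxl.Workbook()
ws = wb.active
ws.title = "sheet name"
ws.append(["name", "hours"])
ws.append(["Ann", 5])
wb.save(path)

# Load, edit in memory, copy the values out, and never call save().
wb1 = openpyxl.load_workbook(path)
ws1 = wb1["sheet name"]
ws1["B2"] = 99  # in-memory change only
rows = list(ws1.values)
ws1_df = pd.DataFrame(rows[1:], columns=list(rows[0]))  # first row as header
wb1.close()

# Reloading shows that the file on disk still holds the original value.
on_disk = openpyxl.load_workbook(path)["sheet name"]["B2"].value
```

The dataframe carries the edited value while the file keeps the original one, so there is nothing for Excel to recover later.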

Extracting data from excel using python and writing to an empty excel file

I have a large set of data that I am trying to extract from multiple Excel files that have multiple sheets using Python and then write that data into a new Excel file. I am new to Python and have tried to use various tutorials to come up with code that can help me automate the process. However, I have reached a point where I am stuck and need some guidance on how to write the data that I extract to a new Excel file. If someone could point me in the right direction, it would be greatly appreciated. See code below:
import os
from pandas.core.frame import DataFrame
path = r"Path where all excel files are located"
os.chdir(path)
for WorkingFile in os.listdir(path):
    if os.path.isfile(WorkingFile):
        DataFrame = pd.read_excel(WorkingFile, sheet_name = None, header = 12, skipfooter = 54)
        DataFrame.to_excel(r'Empty excel file where to write all the extracted data')
When I execute the code I get the error "AttributeError: 'dict' object has no attribute 'to_excel'", so I am not sure how to rectify it. Any help would be appreciated.
A little more background on what I am trying to do: I have a folder with about 50 Excel files, and each file might have multiple sheets. The data I need is located in a table that consists of one row and 14 columns and is in the same location in each file and each sheet. I need to pull that data and compile it into a single Excel file. When I run the code above and add a print statement, it shows me the exact data I want, but when I try to write it to Excel it doesn't work.
Thanks for help in advance!
Not sure why you're importing DataFrame instead of pandas; it looks like your code is incomplete. The code below should clear up your doubts. (It does not include any conditions for excluding non-Excel files, directories, etc.)
import pandas as pd
import os
path = "Dir path to excel files" #Path
frames = [] # Collect one DataFrame per file
for file in os.listdir(path):
    data = pd.read_excel(os.path.join(path, file)) # Read each file from dir
    frames.append(data)
df = pd.concat(frames, ignore_index=True) # DataFrame.append was removed in pandas 2.0
# process df
df.to_excel("path/file.xlsx")
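The AttributeError in the question arises because pd.read_excel with sheet_name=None returns a dict that maps each sheet name to a DataFrame, rather than a single DataFrame. A minimal sketch of one way to handle that is to concatenate the dict's values. The function name combine_workbooks, the source_file and source_sheet columns, and the demo files are assumptions for illustration; it needs openpyxl for reading .xlsx files.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_workbooks(folder, **read_kwargs):
    """Stack every sheet of every .xlsx file in `folder` into one DataFrame."""
    frames = []
    for file in sorted(Path(folder).glob("*.xlsx")):
        # sheet_name=None returns {sheet name: DataFrame} for all sheets.
        sheets = pd.read_excel(file, sheet_name=None, **read_kwargs)
        for sheet_name, frame in sheets.items():
            # Record where each row came from (illustrative extra columns).
            frames.append(
                frame.assign(source_file=file.name, source_sheet=sheet_name)
            )
    return pd.concat(frames, ignore_index=True)


# Demo: two throwaway workbooks, one of them with two sheets.
root = Path(tempfile.mkdtemp())
folder = root / "inputs"
folder.mkdir()
with pd.ExcelWriter(folder / "a.xlsx") as writer:
    pd.DataFrame({"value": [1]}).to_excel(writer, sheet_name="S1", index=False)
    pd.DataFrame({"value": [2]}).to_excel(writer, sheet_name="S2", index=False)
pd.DataFrame({"value": [3]}).to_excel(folder / "b.xlsx", sheet_name="S1", index=False)

combined = combine_workbooks(folder)
combined.to_excel(root / "combined.xlsx", index=False)  # written outside `folder`
```

Options such as header=12 and skipfooter=54 from the question could be passed through read_kwargs. The output file is written outside the input folder so that a later run does not read it back in.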

Pandas: ValueError: Worksheet index 0 is invalid, 0 worksheets found

Simple problem that has me completely dumbfounded. I am trying to read an Excel document with pandas but I am stuck with this error:
ValueError: Worksheet index 0 is invalid, 0 worksheets found
My code snippet works well for all but one Excel document linked below. Is this an issue with my Excel document (which definitely has sheets when I open it in Excel) or am I missing something completely obvious?
Excel Document
EDIT - Forgot the code. It is quite simply:
import pandas as pd
df = pd.read_excel(FOLDER + 'omx30.xlsx')
FOLDER Is the absolute path to the folder in which the file is located.
Your file is saved as Strict Open XML Spreadsheet (*.xlsx). Because it shares the same extension as Excel Workbook, it isn't obvious that the format is different. Open the file in Excel and Save As. If the selected option is Strict Open XML Spreadsheet (*.xlsx), change it to Excel Workbook (*.xlsx), save it and try loading it again with pandas.
EDIT: with the info that you have the original .csv, re-do your cleaning and save it as a .csv from Excel; or, if you prefer, pd.read_csv the original, and do your cleaning from the CLI with pandas directly.
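To check for the Strict Open XML case described above without opening Excel, one option is to look at the XML namespace declared in the workbook part, since an .xlsx file is a zip archive. This is only a sketch: the function name is hypothetical, and it assumes the workbook part lives at the conventional location xl/workbook.xml. The demo builds two fake archives in memory rather than using real spreadsheets.

```python
import io
import zipfile

# Namespace used by Strict Open XML spreadsheet parts. Transitional files use
# http://schemas.openxmlformats.org/spreadsheetml/2006/main instead.
STRICT_NS = b"http://purl.oclc.org/ooxml/spreadsheetml/main"


def is_strict_ooxml(path_or_file):
    """Return True if the workbook part declares the Strict namespace."""
    with zipfile.ZipFile(path_or_file) as archive:
        workbook_xml = archive.read("xl/workbook.xml")
    return STRICT_NS in workbook_xml


def _fake_workbook(namespace):
    """Build a minimal in-memory archive with only a workbook part (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("xl/workbook.xml", f'<workbook xmlns="{namespace}"/>')
    buffer.seek(0)
    return buffer


strict = _fake_workbook("http://purl.oclc.org/ooxml/spreadsheetml/main")
transitional = _fake_workbook(
    "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
)
strict_result = is_strict_ooxml(strict)
transitional_result = is_strict_ooxml(transitional)
```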
It may be that the first sheet (index 0) was deleted from your Excel file, so the actual index is now greater than 0; because the sheet_name parameter of pd.read_excel defaults to 0, the error is raised.
It seems there indeed is a problem with my excel file. We have not been able to figure out what though. For now the path of least resistance is simply saving as a .csv in excel and using pd.read_csv to read this instead.

xlsxwriter: is there a way to open an existing worksheet in my workbook?

I'm able to open my pre-existing workbook, but I don't see any way to open pre-existing worksheets within that workbook. Is there any way to do this?
You cannot append to an existing xlsx file with xlsxwriter.
There is a module called openpyxl which allows you to read and write to preexisting excel file, but I am sure that the method to do so involves reading from the excel file, storing all the information somehow (database or arrays), and then rewriting when you call workbook.close() which will then write all of the information to your xlsx file.
Similarly, you can use a method of your own to "append" to xlsx documents. I recently had to append to a xlsx file because I had a lot of different tests in which I had GPS data coming in to a main worksheet, and then I had to append a new sheet each time a test started as well. The only way I could get around this without openpyxl was to read the excel file with xlrd and then run through the rows and columns...
i.e.
cells = []
for row in range(sheet.nrows):
    cells.append([])
    for col in range(sheet.ncols):
        cells[row].append(sheet.cell(row, col).value)
You don't need arrays, though. For example, this works perfectly fine:
import xlrd
import xlsxwriter
from os.path import expanduser
home = expanduser("~")
# this writes test data to an excel file
wb = xlsxwriter.Workbook("{}/Desktop/test.xlsx".format(home))
sheet1 = wb.add_worksheet()
for row in range(10):
    for col in range(20):
        sheet1.write(row, col, "test ({}, {})".format(row, col))
wb.close()
# open the file for reading
wbRD = xlrd.open_workbook("{}/Desktop/test.xlsx".format(home))
sheets = wbRD.sheets()
# open the same file for writing (just don't write yet)
wb = xlsxwriter.Workbook("{}/Desktop/test.xlsx".format(home))
# run through the sheets and store sheets in workbook
# this still doesn't write to the file yet
for sheet in sheets: # write data from old file
    newSheet = wb.add_worksheet(sheet.name)
    for row in range(sheet.nrows):
        for col in range(sheet.ncols):
            newSheet.write(row, col, sheet.cell(row, col).value)
for row in range(10, 20): # write NEW data
    for col in range(20):
        newSheet.write(row, col, "test ({}, {})".format(row, col))
wb.close() # THIS writes
However, I found that it was easier to read the data and store it in a 2-dimensional array, because I was manipulating the data and receiving input over and over again, and did not want to write to the Excel file until the test was over (which you could just as easily do with xlsxwriter, since that is probably what it does anyway until you call .close()).
After searching a bit for a method to open an existing sheet in an xlsx file, I discovered
existingWorksheet = wb.get_worksheet_by_name('Your Worksheet name goes here...')
existingWorksheet.write_row(0,0,'xyz')
You can now append/write any data to the open worksheet.
You can use the workbook.get_worksheet_by_name() feature:
https://xlsxwriter.readthedocs.io/workbook.html#get_worksheet_by_name
According to https://xlsxwriter.readthedocs.io/changes.html the feature has been added on May 13, 2016.
"Release 0.8.7 - May 13 2016
-Fix for issue when inserting read-only images on Windows. Issue #352.
-Added get_worksheet_by_name() method to allow the retrieval of a worksheet from a workbook via its name.
-Fixed issue where internal file creation and modification dates were in the local timezone instead of UTC."
Although it is mentioned in the last two answers with its documentation link, and from the documentation it seems there are indeed new methods for working with worksheets, I was not able to find these methods in the latest package, "xlsxwriter==3.0.3".
"xlrd" has now removed support for anything other than xls files.
Hence I was able to work it out with "openpyxl", which gives you the expected functionality, as mentioned in the first answer above.
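A minimal sketch of the openpyxl route could look like this. The file name, sheet names, and data are made up for the demo. The workbook is loaded from disk, an existing worksheet is fetched by name, rows are added, and nothing is written until save() is called.

```python
import tempfile
from pathlib import Path

import openpyxl

path = Path(tempfile.mkdtemp()) / "test.xlsx"

# Create a workbook with one existing sheet (stand-in for a real file).
wb = openpyxl.Workbook()
wb.active.title = "Data"
wb["Data"].append(["test", 1])
wb.save(path)

# Reopen it, fetch the existing sheet by name, and add rows below the data.
wb = openpyxl.load_workbook(path)
existing = wb["Data"]
existing.append(["test", 2])  # lands after the last used row
wb.create_sheet("Run 2").append(["new sheet"])
wb.save(path)  # nothing is written to disk until save()

reloaded = openpyxl.load_workbook(path)
rows = list(reloaded["Data"].values)
sheet_names = reloaded.sheetnames
```

Be aware that openpyxl may not round-trip every feature of a workbook, so it is worth testing on a copy before overwriting an important file.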
