Embed CSV in Excel and import the data

I wrote a tool that extracts data from a large DB and outputs it to an Excel file along with (conditional) formatting to improve readability. For this I use Python with openpyxl on a Linux machine. It works great, but this package is rather slow for writing Excel.
It seems to be a lot quicker to dump the table as (compressed) csv, import that into Excel and apply formatting there using a macro/vba.
To automate the process I'd like to create an empty Excel file pre-loaded with the required VBA to do the formatting; a template. For every data dump, the data is embedded (compressed using deflate) into the Excel file and loaded into the Workbook upon opening the document (or using a "LOAD" button to circumvent macro related security things).
However, just adding some file into the Excel file raises an error when opened:
We found a problem with some content in 'Werkmap1_test_embed.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.
Clicking Yes opens the file and shows some tracing information as XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>Repair Result to Werkmap1_OLE_Word0.xml</logFileName>
<summary>Errors were detected in file '/Users/joostk/mnt/cluster/Werkmap1_OLE_Word.xlsx'</summary>
<additionalInfo>
<info>Excel completed file level validation and repair. Some parts of this workbook may have been repaired or discarded.</info>
</additionalInfo>
</recoveryLog>
Is it possible to avoid this? How would I embed a file into the Excel ZIP? Do I need to update some file table (which I could not find easily)?
When that's done, I'd like to import the data. Can I access files in the Excel ZIP from VBA? I guess not, and I need to extract the data to some temporary path and load it from there.
I have found these helpful answers elsewhere to load ZIP and plain text:
https://stackoverflow.com/a/35781621/4998990
https://stackoverflow.com/a/11267603/4998990
Many thanks for sharing your thoughts!

My answer is that this is caused by named ranges, an underlying table, or an embedded query/connection in the workbook. When you start manipulating such a file you will get the error you describe.
There is no harm to the file if you click "Yes" and open it. Excel will open it in repaired mode, which requires you to re-save the file.
The way I've worked around this is to re-read the "repaired" file in Python and save it as another file, or replace the original. Essentially you do one extra step: read the data into memory and write it to a new file. The error will go away. As always, test this method before deploying to production to ensure no records are lost. I solve it with a few lines of pandas:
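import pandas as pd
# read_excel loads only the first sheet by default; formatting and macros are not carried over
repair = pd.read_excel('PATH_TO_REPAIR_FILE')
# to_excel writes the file and returns None, so there is nothing to assign
repair.to_excel('PATH_TO_WHERE_NEW_FILE_GOES', index=False)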
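On the "file table" part of the question: an .xlsx file is an Open Packaging Conventions ZIP, and every part in it needs a content type declared in [Content_Types].xml. A part without one is a plausible cause of the recovery prompt. The sketch below copies a workbook, adds the payload, and registers a Default entry for its extension. The helper name embed_part, the part path, and the text/csv content type are all assumptions for illustration. I have not verified that this alone satisfies every Excel version, and Excel may drop the extra part when it re-saves the workbook, so treat it as a starting point.

```python
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def embed_part(src_xlsx, dst_xlsx, part_name, payload,
               extension="csv", content_type="text/csv"):
    """Copy src_xlsx to dst_xlsx, add payload as part_name, and declare its type."""
    # Serialize the content-types document with its usual default namespace.
    ET.register_namespace("", CT_NS)
    with zipfile.ZipFile(src_xlsx) as src, \
            zipfile.ZipFile(dst_xlsx, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "[Content_Types].xml":
                root = ET.fromstring(data)
                known = {el.get("Extension", "").lower()
                         for el in root.findall(f"{{{CT_NS}}}Default")}
                if extension.lower() not in known:
                    # Assumed content type; adjust if the payload is not CSV.
                    root.insert(0, ET.Element(
                        f"{{{CT_NS}}}Default",
                        {"Extension": extension, "ContentType": content_type},
                    ))
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            dst.writestr(item, data)
        # The new part is compressed with deflate, like the rest of the package.
        dst.writestr(part_name, payload)
```

Usage would look like embed_part('template.xlsx', 'report.xlsx', 'xl/embeddings/data.csv', csv_bytes), where all four arguments are placeholders.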

Related

openpyxl corrupts spreadsheet if it contains a data source

I use openpyxl to interact with Excel files using Python 3.7. I open and save my .xlsx spreadsheets as follows:
from openpyxl import load_workbook
wb = load_workbook('file.xlsx', read_only=False)
wb.save('file.xlsx')
If file.xlsx contains no links to external data sources (such as SQL Server or PostgreSQL), then there is no problem with the saved file and it opens fine in Excel after being processed by my Python script.
However, if file.xlsx does contain a link to external data, then upon executing the above script, the output file is now corrupted. When opening the file in Excel, the following error is reported and I have the option of attempting to recover it. When recovering, the data remains but all links to the data source are gone.
> We found a problem with some content in file.xlsx. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.
It is easy to reproduce this error as follows:
Create a blank spreadsheet and save it as file.xlsx.
Run the above three lines of Python code to open and save the file. You will see this works fine and has no impact on the spreadsheet.
Now open file.xlsx in Excel and, from the Data tab, choose a data source. You can choose any data source (link to a csv file, a table within Excel, or an external data source - it doesn't matter).
Save the spreadsheet, then run the above Python script (which again, simply opens and saves it).
Open file.xlsx in Excel. You will see that it is now corrupted.
My conclusion is that, at the moment, openpyxl doesn't support spreadsheets that contain links to external data. It would be useful to have this confirmed, or for a workaround to the above issue to be proposed.
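Until this is confirmed or fixed, one defensive option is to check for external-data parts before letting openpyxl rewrite a file. This is a minimal sketch, assuming connections are stored in xl/connections.xml and query tables under xl/queryTables/ (the usual locations in files written by Excel, though not guaranteed). The name has_external_data is a hypothetical helper, not part of openpyxl.

```python
import zipfile

# Package paths where Excel usually stores external data definitions (assumed).
SUSPECT_PREFIXES = ("xl/connections.xml", "xl/queryTables/")


def has_external_data(xlsx_path):
    """Return True if the workbook package contains connection or query-table parts."""
    with zipfile.ZipFile(xlsx_path) as package:
        return any(name.startswith(SUSPECT_PREFIXES)
                   for name in package.namelist())
```

If it returns True, the script can skip the openpyxl round trip for that file or work on a copy instead.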
Thanks!!

Error opening text file saved with .xls extension in python

I'm using LabVIEW to create and save data from an experiment. LabVIEW itself creates a text file but saves it automatically with a .xls extension (the Excel 97-2003 format; it's an old setup that was never changed because it never broke). Whenever I go to open one of the data files, Excel spits out this:
"The file you are trying to open, 'name.ext', is in a different format than specified by the file extension. Verify that the file is not corrupted and is from a trusted source before opening the file. Do you want to open the file now?"
I'm generating a lot of data, so I want to use Python to sort it out and do some quick analysis over files in a directory.
The problem is that Python doesn't like that it's a text file saved with a .xls extension. It can cycle through the directory just fine to get the file names, but whenever I actually try to open a file or do anything with it, I get the error in the attached image. This happens whether I change the extension to .xls or .xlsx, or leave it alone and let it try to open the original filename.
error message
I literally have hundreds of these .xls files. I know I can go through them, open each one in Excel and save it as a real Excel file by hand, but that will take hours. Can someone please help me figure out a way around this error in Python?
Dropbox Data File set
Update: MATLAB, when trying to read one of the files using xlsread, says this:
Error using xlsread (line 251)
File C:\Users\zane\Documents\Research Projects\PneuFish Project\Data\Nov 28 2016 ATI
Data\ATI_Data_2016Y_11M_28D_16h_36m_01s.xls not in Microsoft Excel Format.
Thank you!

You can use the module xlrd.
import csv
import xlrd

def csv_from_excel():
    # xlrd reads genuine binary .xls workbooks
    wb = xlrd.open_workbook('your_workbook.xls')
    sh = wb.sheet_by_name('Sheet1')
    # Python 3: open in text mode with newline='' so the csv module controls line endings
    with open('your_csv_file.csv', 'w', newline='') as your_csv_file:
        wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
        for rownum in range(sh.nrows):
            wr.writerow(sh.row_values(rownum))
Taken from This Post
This will convert from .xls to .csv, which is easily manipulated with Python.

You've said that the file is a text file, so don't tell Python that it's an Excel file. Just use Python's open and read it as text, then do whatever you want with it. open doesn't care what extension a file has.
I'm going to guess that the format is actually tab-delimited. From memory, earlier versions of Excel would read in tab-delimited text files with the .xls extension without complaint, whereas csv files would always bring up the text import wizard, so this was a common dodge if saving data intended for Excel from a program that didn't support writing real Excel files.
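A minimal sketch of that approach, assuming the files really are tab-delimited text (that is still a guess; pass a different delimiter if they are not). The function names and the "*.xls" pattern are placeholders:

```python
import csv
from pathlib import Path


def read_labview_text(path, delimiter="\t"):
    """Read a delimited text file as a list of rows, ignoring its extension."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter=delimiter)]


def read_directory(folder, pattern="*.xls"):
    """Map each matching file name in folder to its parsed rows."""
    return {p.name: read_labview_text(p)
            for p in sorted(Path(folder).glob(pattern))}
```

Every value comes back as a string, so convert the numeric columns with float() before doing any analysis.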
If you want the LabVIEW code to write real Excel files in future, the Write to Measurement File express VI has an option to write in xlsx format. I'm not sure which version of LabVIEW first introduced this but it's been there for a few years now.

XLRD vs Win32 COM performance comparison

I have this huge Excel (xls) file that I have to read data from. I tried using the xlrd library, but it is pretty slow. I then found out that converting the Excel file to CSV manually and reading the CSV file is orders of magnitude faster.
But I cannot ask my client to save the xls as csv manually every time before importing the file. So I thought of converting the file on the fly, before reading it.
Has anyone done any benchmarking as to which procedure is faster:
Open the Excel file with the xlrd library and save it as a CSV file, or
Open the Excel file with win32com library and save it as CSV file?
I am asking because the slowest part is opening the file, so if I can get a performance boost from using win32com I would gladly try it.

If you need to read the file frequently, I think it is better to save it as CSV. Otherwise, just read it on the fly.
For performance, I think win32com is faster. For cross-platform compatibility, however, xlrd is the better choice.
win32com is also more powerful: it can drive Excel in every way (e.g. reading and writing cells or ranges).
However, if you only need a quick file conversion, pandas.read_excel also works.
I use another package, xlwings, so I would also be interested in a comparison among these packages.
In my opinion:
I would use pandas.read_excel for a quick file conversion.
If more processing in Excel is needed, I would choose win32com.
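No numbers are given here, so the safest route is to time each candidate on your own file. This is a rough sketch: best_time is a hypothetical helper built on the standard library, and convert_with_pandas shows one candidate using pandas.read_excel and DataFrame.to_csv. It assumes pandas and an Excel reader engine are installed.

```python
import time


def best_time(func, *args, repeat=3):
    """Return the fastest wall-clock time in seconds over `repeat` calls."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


def convert_with_pandas(xls_path, csv_path):
    """Convert the first sheet of an Excel file to CSV."""
    # Imported lazily so the timing helper works without pandas installed.
    import pandas as pd
    pd.read_excel(xls_path, sheet_name=0).to_csv(csv_path, index=False)
```

For example, best_time(convert_with_pandas, 'big.xls', 'big.csv') with placeholder file names. An xlrd or win32com converter with the same two arguments can be timed the same way.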

Converting HTML to Excel with Django

I have a reporting module in my Django app that gives the user the ability to see their reports on screen or to export them and have the export opened by Excel.
The export is a cheat. I take the exact same output as the screen version and save it to a file with an .xls extension and
response = HttpResponse(body, content_type='application/vnd.ms-excel')
and badda-boom, badda-bing I have an Excel file that is lightly formatted, i.e. it respects the css styling that I've applied.
The nice thing for the user is that the file auto-opens in Excel; there aren't any extra steps for them. (find the download, import a text file, etc.)
Unfortunately it looks like Excel 2016 has decided (I'm guessing) that that's a security issue and no longer opens the file.
I'm aware of various python -> Excel tools. openpyxl looks promising. But that's going to require me to touch each report.
So, what I'm looking for is something that would give me what I have now, take an html file and have Excel open it as a native file and recognize the existing formatting.

The behavior change has been noted by Microsoft, and there are workarounds for the user:
https://support.microsoft.com/en-us/kb/3181507
It sounds like they're working on a fix.

Creating a Case in PSSE

I have data in an Excel file that I would like to use to create a case in PSSE. The data is organized as it would appear in a case in PSSE (i.e. for a bus: bus number, name, base kV, and so on). Of course the data can be entered manually, but I'm working with over 500 buses. I have tried copying and pasting, but that seems to work only sometimes. For machine data, it barely works.
Is there a way to import this data to PSSE from an Excel file? I have recently started running PSSE with Python, and maybe there is a way to do this?
--
MK.

Yes. You can import data from an Excel file into PSSE using the Python package xlrd. However, I would recommend converting your Excel file to CSV first and importing that, as it is much easier. Importing data using the API is not just a copy-and-paste job into the nicely tabulated spreadsheet that PSSE shows for its case data.
Refer to the API documentation for PSSE, chapter II. Search for the function BUS_DATA_2. You will see that you can create buses with this function.
So your job is threefold.
Import the CSV file data, with each line becoming a list of the data parameters for one bus (voltage, name, base kV, PU, etc.). Store each of these in another list.
Iterate through the new list you just created and call:
ierr = bus_data_2(i, intgar, realar, name)
and pass in your data from the CSV file (see the PSSE API documentation on how to do this). This will effectively load data from the CSV file into your case (in the form of nodes or buses).
After you are finished, you will need to call psspy.save("Casename.sav") to save your work in a new PSSE case.
Note: there are also functions to load line data, fixed shunt data, generator data, etc.
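The three steps above can be sketched roughly as follows. The CSV column names are hypothetical, and the layout of intgar (type code, area, zone, owner) and realar (base kV, voltage magnitude, voltage angle) is my assumption about BUS_DATA_2, so check it against the API documentation for your PSSE version. The API function is passed in as an argument so the loop can be tried outside PSSE.

```python
import csv


def load_bus_rows(csv_path):
    """Step 1: read the CSV into a list of dicts, one per bus."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


def create_buses(rows, bus_data_fn):
    """Step 2: call the bus API once per row and collect non-zero error codes."""
    failures = []
    for row in rows:
        number = int(row["number"])
        # Assumed argument layout; verify against the PSSE API documentation.
        intgar = [int(row["type"]), int(row["area"]),
                  int(row["zone"]), int(row["owner"])]
        realar = [float(row["base_kv"]), float(row["vm_pu"]),
                  float(row["va_deg"])]
        ierr = bus_data_fn(number, intgar, realar, row["name"])
        if ierr != 0:
            failures.append((number, ierr))
    return failures
```

Inside a PSSE Python session this would be called as create_buses(load_bus_rows('buses.csv'), psspy.bus_data_2), followed by psspy.save("Casename.sav") for step 3. The file name buses.csv is a placeholder.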
Your other option is to call up the PTI folks as they can give you training.
Good luck

If you have an Excel data file with exactly the same "format" and the same "info" as a regular case file (.sav), try this:
Open any small example .sav file from the Example sub-folder of the PSSE installation folder.
In the PSSE GUI, copy each spreadsheet from Excel into the matching spreadsheet of the working case (shown in spreadsheet view), i.e. the one holding the same "info" (say, bus, branch, etc.).
After you have finished copying everything, save the edited working case from the GUI as a new case.
If this doesn't work, I suggest asking this question on the "Python for Power Systems" forum:
https://psspy.org/psse-help-forum/questions/
