Converting HTML to Excel with Django

Converting HTML to Excel with Django - python

I have a reporting module in my Django app that gives the user the ability to see their reports on screen or to export them and have the export opened by Excel.
The export is a cheat. I take the exact same output as the screen version and save it to a file with an .xls extension and
response = HttpResponse(body, content_type='application/vnd.ms-excel')
and badda-boom, badda-bing I have an Excel file that is lightly formatted, i.e. it respects the css styling that I've applied.
The nice thing for the user is that the file auto-opens in Excel; there aren't any extra steps for them. (find the download, import a text file, etc.)
Unfortunately it looks like Excel 2016 has decided (I'm guessing) that that's a security issue and no longer opens the file.
I'm aware of various python -> Excel tools. openpyxl looks promising. But that's going to require me to touch each report.
So, what I'm looking for is something that would give me what I have now, take an html file and have Excel open it as a native file and recognize the existing formatting.

The behavior change has been noted by Microsoft and there are work arounds, for the user:
https://support.microsoft.com/en-us/kb/3181507
It sounds like they're working on a fix.

Related

Python + Linux - Excel to HTML (keeping format)

I'm looking for a way to convert excel to html while preserving formatting.
I know this is doable on windows due to the availability of some underlying win32 libraries, (eg via xlwings
Python - Excel to HTML (keeping format))
But I'm looking for a solution on Linux.
I've also come by Aspose Cells but this requires a paid license or else it will add a lot of extra junk to the output that needs to be scrubbed out.
And lastly I tried the python lib xlsx2html but it does a very poor job at preserving formatting.
Are there any suggestions for a Linux based solution? I'd also be interested in tools written in other languages that can be easily wrapped around via python.
Thanks in advance!
Update:
Here is an example of a random excel sheet I converted via excel itself that I would like to reproduce. It has some colors, some border variations, some merged cells and some font sizes to see if they all work.

You can use LibreOffice to convert an Excel file to a HTML file using the command line:
# --convert-to implies --headless so it's not mandatory to specify --headless
soffice --headless --convert-to html data.xlsx
You can refer to the documentation to know more about other CLI parameters.

I think you should search for Excel to HTML in the JS world not python (I am not saying it is not possible, but It's more usual in JS), I promise you will get better results.
In my opinion, finding a JS-based solution and make a python wrapper can be more helpful. Because in JS community, they struggled more than another communities to import and work with Excels.
Another idea is to change your approach, look for how you can import an Excel file in an embedded way or iframe inside an HTML page with JS and then export it.
But again, I highly recommend to check JS libraries or GitHub repositories, some of them care about formatting.

Jupyter Notebook issue

I ran some commands on Jupyter Notebook and expected to get a printed output containing data in tabulated form in a .csv file, but then i get an uncompleted output
This is the result i get from the .csv file
I ran this command;
df1=pandas.read_csv("supermarkets.csv", on_bad_lines='skip')
df1
I expected to get a printed output in a tabulated like in the image attached......
The data get printed in well tabulated form here
Here is a link to the online version of the file
[pythonhow.com/supermarkets.csv]

Getting good, clean quality data where the file extension correctly matches the actual content is often a challenge. Assessing the state of the input data is generally always a very important first step.
It appears the data you are trying to get is also online here. Github will render that as a table in the browser because it has a viewer mode. To look at the 'raw' file content, click here. You'll see it is nice comma-delimited file with columns separated by commas and rows each on a different line. The header with the column names is on the first line.
Now open in a good text editor the file you have that you are working with and compare it to the content I pointed you at. That should guide you on what is the issue.
At this point you may just wish to switch to using the version of the file that I pointed you at.
Use the link below to obtain it as proper csv file:
https://raw.githubusercontent.com/kenvilar/data-analysis-using-python/master/supermarkets.csv
You should be able to paste that link in your browser and then right click on the page and choose 'Save as..' to download it to your locak machine. The obtained file should open just fine using the code you showed in the screenshot in your post here.
Please work on writing better questions with specific titles, see here for guidance. The title at present is overly broad and is actually not accurate. This code would not work with the data you apparently have even if you were running it inside a Python code-based script. And so it is not a Jupyter notebook issue. For how to think about making it specific, a good thing to keep in mind is to write for your future self. If you continue to use notebooks you'll have hundreds that would be considered a 'Jupyter Notebook issue', but what makes this issue different from those?

I believe there is an issue with your csv file, not the code.
To me it looks like the data in your csv file are written in json format.
Have you opened the supermarkets.csv file using excel? it should look like a table, not a json formatted file.

did you try df1.show() to see if the csv got read in the first place?

Outputting A .xls File In Python

I have been teaching myself Python to automate some of our work processes. So far reading from Excel files (.xls, .xlsx) has gone great.
Currently I have hit a bit of a snag. Although I can output .xlsx files fine, the software system that we have to use for our primary work task can only take .xls files as an input - it cannot handle .xlsx files, and the vendor sees no reason to add .xlsx support at any point in the foreseeable future.
When I try to output a .xls file using either Pandas or OpenPyXl, and open that file in Excel, I get a warning that the file format and extension of the file do not match, which leads me to think that attempting to open this file using our software could lead to some pretty unexpected consequences (because it's actually a .xlsx file, just not named as such)
I've tried to search for how to fix this all on Google, but all I can find are guides for how to convert a .xls file to a .xlsx file (which is almost the opposite of what I need). So I was wondering if anybody could please help me on whether this can be achieved, and if it can, how.
Thank you very much for your time

Under the pandas.DataFrame.to_excel documentation you should notice a parameter called engine, which states:
engine : str, optional
Write engine to use, openpyxl or xlsxwriter. You can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.
What it does not state is that the engine param is automatically picked based on your file extension -- therefore, easy fix:
import pandas as pd
df = pd.DataFrame({"data": [1, 2, 3]})
df.to_excel("file.xls") # Notice desired file extension.
This will automatically use the xlwt engine, so make sure you have it installed via pip install xlwt.

Felipe is right the filename extension will set the engine parameter.
So basically all it's saying is that the old Excel format ".xls" extension is no longer supported in Pandas. So if you specify the output spreadsheet with the ".xlsx" extension the warning message disappears.

I FINALLY have the answer!
I have libreoffice installed and am using the following in the command line on windows:
"C:\Program Files\LibreOffice\program\soffice.exe" --headless --convert-to xlsx test2.xls
Currently trying to use subprocess to automate this.

Embed CSV in Excel and import the data

I wrote a tool that extracts data from a large DB and outputs it to an Excel file along with (conditional) formatting to improve readability. For this I use Python with openpyxl on a Linux machine. It works great, but this package is rather slow for writing Excel.
It seems to be a lot quicker to dump the table as (compressed) csv, import that into Excel and apply formatting there using a macro/vba.
To automate the process I'd like to create an empty Excel file pre-loaded with the required VBA to do the formatting; a template. For every data dump, the data is embedded (compressed using deflate) into the Excel file and loaded into the Workbook upon opening the document (or using a "LOAD" button to circumvent macro related security things).
However, just adding some file into the Excel file raises an error when opened:
We found a problem with some content in 'Werkmap1_test_embed.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.
Clicking Yes opens the file and shows some tracing information as XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>Repair Result to Werkmap1_OLE_Word0.xml</logFileName>
<summary>Errors were detected in file '/Users/joostk/mnt/cluster/Werkmap1_OLE_Word.xlsx'</summary>
<additionalInfo>
<info>Excel completed file level validation and repair. Some parts of this workbook may have been repaired or discarded.</info>
</additionalInfo>
</recoveryLog>
Is it possible to avoid this? How would I embed a file into the Excel ZIP? Do I need to update some file table (which I could not file easily).
When that's done, I'd like to import the data. Can I access files in the Excel ZIP from VBA? I guess not, and I need to extract the data to some temporary path and load it from there.
I have found these helpful answers elsewhere to load ZIP and plain text:
https://stackoverflow.com/a/35781621/4998990
https://stackoverflow.com/a/11267603/4998990
Many thanks for sharing your thoughts!

so my "Answer" here is that this is caused by using Named Ranges, or an underlying table, or an embedded Query/Connection. When you start manipulating this file you will get the error that you are talking about:
There is no harm to the file if you click "yes" and open. Excel will open this in Repaired Mode which will require you to re-save the file.
The way I've worked around this is to re-read the "repaired" file, in python, and save it as another file or replace it. Essentially just do an extra step of re-reading the data into memory, and write it to a new file. The error will go away. As always, test this method before deploying to production to ensure no records are lost. The way I solve it is with two lines of pandas.
import pandas as pd
repair = pd.read_excel('PATH_TO_REPAIR_FILE')
new_file = repair.to_excel('PATH_TO_WHERE_NEW_FILE_GOES')

Generating correct excel xls format

I created a little script in python to generate an excel compatible xml file (saved with xls extension). The file is generated from a part database so I can place an order with the extracted data.
On the website for ordering the parts, you can import the excel file so the order fills automatically. The problem here is that each time I want to make an order, I have to open excel and save the file with xls extension of type MS Excel 97-2003 to get the import working.
The excel document then looks exactly the same, but when opened with notepad, we cannot see the xml anymore, only binary dump.
Is there a way to automate this process, by running a bat file or maybe adding some line to my python script so it is converted in the proper format?
(I know that question has been asked before, but it never has been answered)

There are two basic approaches to this.
You asked about the first: Automating Excel to open and save the file. There are in fact two ways to do that. The second is to use Python tools that can create the file directly in Python without Excel's help. So:
1a: Automating Excel through its automation interface.
Excel is designed to be controlled by external apps, through COM automation. Python has a great COM-automation interface inside of pywin32. Unfortunately, the documentation on pywin32 is not that great, and all of the documentation on Excel's COM automation interface is written for JScript, VB, .NET, or raw COM in C. Fortunately, there are a number of questions on this site about using win32com to drive Excel, such as this one, so you can probably figure it out yourself. It would look something like this:
import win32com.client
excel = win32com.client.Dispatch('Excel.Application')
spreadsheet = excel.Workbooks.Open('C:/path/to/spreadsheet.xml')
spreadsheet.SaveAs('C:/path/to/spreadsheet.xls', fileformat=excel.xlExcel8)
That isn't tested in any way, because I don't have a Windows box with Excel handy. And I vaguely remember having problems getting access to the fileformat names from win32com and just punting and looking up the equivalent numbers (a quick google for "fileformat xlExcel8" shows that the numerical equivalent is 56, and confirms that's the right format for 97-2003 binary xls).
Of course if you don't need to do it in Python, MSDN is full of great examples in JScript, VBA, etc.
The documentation you need is all on MSDN (since the Office Developer Network for Excel was merged into MSDN, and then apparently became a 404 page). The top-level page for Excel is Welcome to the Excel 2013 developer reference (if you want a different version, click on "Office client development" in the navigation thingy above and pick a different version), and what you mostly care about is the Object model reference. You can also find the same documentation (often links to the exact same webpages) in Excel's built-in help. For example, that's where you find out that the Application object has a Workbooks property, which is a Workbooks object, which has Open and Add methods that return a Workbook object, which has a SaveAs method, which takes an optional FileFormat parameter of type XlFileFormat, which has a value xlExcel8 = 56.
As I implied earlier, you may not be able to access enumeration values like xlExcel8 for some reason which I no longer remember, but you can look the value up on MSDN (or just Google it) and put the number 56 instead.
The other documentation (both here and elsewhere within MSDN) is usually either stuff you can guess yourself, or stuff that isn't relevant from win32com. Unfortunately, the already-sparse win32com documentation expects you to have read that documentation—but fortunately, the examples are enough to muddle your way through almost everything but the object model.
1b: Automating Excel via its GUI.
Automating a GUI on Windows is a huge pain, but there are a number of tools that make it a whole lot easier, such as pywinauto. You may be able to just use swapy to write the pywinauto script for you.
If you don't need to do it in Python, separate scripting systems like AutoIt have an even larger user base and even more examples to make your life easier.
2: Doing it all in Python.
xlutils, part of python-excel, may be able to do what you want, without touching Excel at all.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.