pandas - python export as xls instead of xlsx - ExcelWriter

I would like to export my pandas dataframe as an xls file rather than an xlsx file.
I use ExcelWriter.
I have done:
xlsxWriter = pd.ExcelWriter(str(outputName + "- Advanced.xls"))
Unfortunately, nothing is output.
I think I have to change the engine, but I don't know how.

You can use to_excel and pass a file name with the .xls extension:
df.to_excel('file_name_blah.xls')
pandas will use a different module to write the Excel sheet; note that this requires the prerequisite third-party module to be installed.

If for some reason you do need to explicitly call pd.ExcelWriter, here's how:
outputName = "xxxx"
xlsWriter = pd.ExcelWriter(str(outputName + "- Advanced.xls"), engine = 'xlwt')
# Write the dataframe to the Excel writer object.
test.to_excel(xlsWriter, sheet_name='Sheet1')
# Close the Pandas Excel writer and output the Excel file.
xlsWriter.save()
It's critical not to forget the save() command. That was your problem.
Note that you can also set the engine directly like so: test.to_excel('test.xls', engine='xlwt')

The easiest way to do this is to install the "xlwt" package in your active environment.
pip install xlwt
Then simply use the code below:
df.to_excel('test.xls')

Related

Converting xlsx files to xls to use with pandas [duplicate]

I have a repetitive task, where I download multiple Excel files (I'm forced to download in xlsx format). I then take column G from each Excel file and concatenate them into "final.xlsx". Then "final.xlsx" is compared to another Excel workbook to see if all number instances are matched in each workbook.
I'm now working on a cross-platform Python app to solve this. However, pandas won't allow xlsx files anymore, and manually opening and saving them as xls files just adds more repetitive manual labour.
Is there a cross-platform way for Python to convert xlsx files to xls?
Or should I abandon pandas and go with openpyxl since I'm forced to handle the xlsx format?
I tried this without success:
from pathlib import Path
import openpyxl
import os
# get files
os.chdir(os.path.abspath(os.path.dirname(__file__)))
pdir = Path('.')
filelist = [filename for filename in pdir.iterdir() if filename.suffix == '.xlsx']
for filename in filelist:
    print(filename.name)
for infile in filelist:
    workbook = openpyxl.load_workbook(infile)
    outfile = f"{infile.name.split('.')[0]}.xls"
    workbook.save(outfile)
You can still use pandas, but you need openpyxl. Since your code already imports it, you presumably have it installed.
Otherwise, you can install it via: pip install openpyxl.
The following illustrates how this can work:
import pandas as pd
fpath = r".\test.xlsx"
df = pd.read_excel(fpath, engine='openpyxl')
print(df)
   A  B
0  1  2
1  1  2
Previously, the default argument engine=None to read_excel() would result in using the xlrd engine in many cases, including new Excel 2007+ (.xlsx) files. If openpyxl is installed, many of these cases will now default to using the openpyxl engine. See the read_excel() documentation for more details.
Thus, it is strongly encouraged to install openpyxl to read Excel 2007+ (.xlsx) files. Please do not report issues when using xlrd to read .xlsx files. This is no longer supported, switch to using openpyxl instead.
https://pandas.pydata.org/docs/whatsnew/v1.2.0.html

importing an excel file to python

I have a basic question about importing xlsx files to Python. I have checked many responses about the same topic, however I still cannot import my files to Python whatever I try. Here's my code and the error I receive:
import pandas as pd
import xlrd
file_location = 'C:\Users\cagdak\Desktop\python_self_learning\Coursera\sample_data.xlsx'
workbook = xlrd.open_workbook(file_location)
Error:
IOError: [Errno 2] No such file or directory: 'C:\\Users\\cagdak\\Desktop\\python_self_learning\\Coursera\\sample_data.xlsx'
With pandas you can read a column of an Excel file directly. Here is the code:
import pandas
df = pandas.read_excel('sample.xls')
# print the column names
print(df.columns)
#get the values for a given column
values = df['column_name'].values
#get a data frame with selected columns
FORMAT = ['Col_1', 'Col_2', 'Col_3']
df_selected = df[FORMAT]
You should use a raw string or escape your backslashes, for example:
file_location = r'C:\Users\cagdak\Desktop\python_self_learning\Coursera\sample_data.xlsx'
or
file_location = 'C:\\Users\\cagdak\\Desktop\\python_self_learning\\Coursera\\sample_data.xlsx'
Go ahead and try this:
file_location = 'C:/Users/cagdak/Desktop/python_self_learning/Coursera/sample_data.xlsx'
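As a quick check that these three spellings name the same file, here is a small sketch using only the standard library. The path is a made-up placeholder, not the asker's real file, and PureWindowsPath is used so the comparison behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

# Three spellings of the same (hypothetical) Windows path.
raw = r'C:\Users\someone\data\sample_data.xlsx'
escaped = 'C:\\Users\\someone\\data\\sample_data.xlsx'
forward = 'C:/Users/someone/data/sample_data.xlsx'

# The raw literal and the escaped literal produce the identical string.
same_string = raw == escaped

# PureWindowsPath accepts both separators, so the forward-slash
# spelling compares equal to the backslash spelling.
paths_equal = PureWindowsPath(raw) == PureWindowsPath(forward)

print(same_string, paths_equal)
```

Note that in Python 3 an unescaped 'C:\Users...' literal is a syntax error, because \U starts a Unicode escape, so one of the three forms is required there.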
As pointed out above, pandas supports reading Excel spreadsheets with its read_excel() function. However, it relies on different external libraries depending on which Excel/ODF format is being read. By default it selects one automatically, though one can be specified using the engine parameter. Here's an excerpt from the docs:
"xlrd" supports old-style Excel files (.xls).
"openpyxl" supports newer Excel file formats.
"odf" supports OpenDocument file formats (.odf, .ods, .odt).
"pyxlsb" supports Binary Excel files.
If the required library is not already installed, you'll see an error message suggesting which library you need to install.
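The extension-to-engine pairing in the excerpt can be sketched as a small lookup. This is only an illustration: guess_engine and ENGINE_BY_SUFFIX are hypothetical names, not part of pandas, and the .xlsm and .xlsb entries are my reading of "newer Excel file formats" and "Binary Excel files". pandas already makes its own choice when engine is left as None.

```python
from pathlib import Path

# Hypothetical lookup table built from the docs excerpt; not a pandas API.
ENGINE_BY_SUFFIX = {
    '.xls': 'xlrd',        # old-style Excel files
    '.xlsx': 'openpyxl',   # newer Excel file formats
    '.xlsm': 'openpyxl',
    '.odf': 'odf',         # OpenDocument file formats
    '.ods': 'odf',
    '.odt': 'odf',
    '.xlsb': 'pyxlsb',     # binary Excel files
}


def guess_engine(filename):
    """Return the engine name for a file name, or None if the suffix is unknown."""
    return ENGINE_BY_SUFFIX.get(Path(filename).suffix.lower())


# Possible use (requires pandas and the chosen engine to be installed):
# df = pandas.read_excel(path, engine=guess_engine(path))
```

Returning None for an unknown suffix is deliberate in this sketch, since passing engine=None leaves the selection to pandas.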

Preserve formatting when modifying an Excel (xlsx) file with Python

Is there any Python module out there that can be used to create an Excel XLSX file replicating the format from a template?
As far as I understand, openpyxl supports this. Here is an example from the docs:
from openpyxl import load_workbook
wb = load_workbook('sample_book.xltx')
ws = wb.active
ws['D2'] = 42
wb.save('sample_book.xlsx')
You can use openpyxl to open a template file and then populate it with data and save it as something else to preserve the original template for use later. Check out this answer: Working with Excel In Python

How to append to an existing excel sheet with XLWT in Python

I have created an Excel sheet using the xlwt package in Python. Now I need to re-open the Excel sheet and append new sheets / columns to it. Is it possible to do this in Python?
After investigating today (2014-2-18), I cannot see a way to read in an XLS file using xlwt; you can only write a new file from scratch. I think it is better to use openpyxl, which reads and writes the xlsx format. Here is a simple example:
from openpyxl import Workbook, load_workbook
wb = Workbook()
ws = wb.create_sheet()
ws.title = 'Pi'
ws['F5'] = 3.14156265
wb.save(filename=r'C:\book2.xlsx')
# Re-opening the file:
wb_re_read = load_workbook(filename=r'C:\book2.xlsx')
sheet = wb_re_read['Pi']
print(sheet['F5'].value)
See other examples here: http://pythonhosted.org/openpyxl/usage.html (where this modified example is taken from)
You read in the file using xlrd, and then 'copy' it to an xlwt Workbook using xlutils.copy.copy().
Note that you'll need to install both xlrd and xlutils libraries.
Note also that not everything gets copied over. Things like images and print settings are not copied, for example, and have to be reset.

MySQLdb to Excel

I have a Django project which has a mysql database backend. How can I export contents from my db to an Excel (xls, xlsx) format?
phpMyAdmin has an Export tab, and you can export in CSV. This can be imported into Excel.
http://pypi.python.org/pypi/xlwt
If you need an xlsx (Excel 2007) exporter, you can use openpyxl. Otherwise xlwt is an option.
openpyxl is a great choice, but if you don't want to learn a new library you can simply write your own export function.
For example, you can export to CSV format like this:
def CSVExport(database_array):
    # Write the first two fields of each row, separated by ";;;;;"
    with open('mydatabase.csv', 'w') as f_csv:
        for row in database_array:
            f_csv.write('"%s";;;;;"%s"\n' % (row[0], row[1]))
When you open the exported file in Excel, set ";;;;;" as the separator.
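A hand-rolled writer like the one above does not escape quotes inside the values. As an alternative sketch, and not the answerer's method, the standard library's csv module handles quoting for you. The function name export_rows_to_csv and the single ";" delimiter are assumptions chosen for illustration.

```python
import csv


def export_rows_to_csv(rows, path):
    """Write an iterable of row sequences to a semicolon-delimited CSV file."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        # QUOTE_ALL wraps every field in double quotes, so values that
        # contain the delimiter or quotes are still read back correctly.
        writer = csv.writer(f, delimiter=';', quoting=csv.QUOTE_ALL)
        writer.writerows(rows)


# Possible use, with rows fetched from the database beforehand:
# export_rows_to_csv([('id', 'name'), (1, 'Alice')], 'mydatabase.csv')
```

A single-character delimiter also means Excel's import dialog can be pointed at ";" directly.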
