Export Python Dataframe to Excel - python

I'm trying to export a Python Dataframe to excel using xlsx or csv...
Here is the code I tried to use:
export_word_count = word_count.to_excel (r'C:\Users\OTR\PycharmProjects\MyProjects\word_count.xlsx', index = None, header=True)
I keep getting the following error messages:
Traceback (most recent call last):
File "C:/Users/OTR/PycharmProjects/MyProjects/CAP_Test_MotsCles.py", line 35,
in <module>
export_word_count = word_count.to_excel
(r'C:\Users\OTR\PycharmProjects\MyProjects\word_count_CAP.xlsx', index = None,
header=True)
File "C:\Users\OTR\PycharmProjects\MyProjects\venv\lib\site-
packages\pandas\core\generic.py", line 2127, in to_excel
engine=engine)
File "C:\Users\OTR\PycharmProjects\MyProjects\venv\lib\site-packages\pandas\io\formats\excel.py", line 656, in write
writer = ExcelWriter(_stringify_path(writer), engine=engine)
File "C:\Users\OTR\PycharmProjects\MyProjects\venv\lib\site-packages\pandas\io\excel.py", line 1204, in __init__
from openpyxl.workbook import Workbook
ModuleNotFoundError: No module named 'openpyxl'
Any output on this would be greatly appreciated. I tried to tweek the code, but still wouldn't export. Thank you.
EDIT:
Managed to export, but having issues with full data export
Sample Python Data:
products 58
company 53
cannabis 42
business 39

You dont have python openpyxl module installed.
Install it with:
pip install openpyxl

Your words are your index. Right now you are not exporting the index.
Try changing your code to:
word_count.to_excel (r'C:\Users\OTR\PycharmProjects\MyProjects\word_count.xlsx', index =True, header=True)
‘index=True’ is the default behavior, so not actually necessary.

Related

Creating and using Excel with multiple sheets in Python

I am creating an Excel file with two sheets from 2 different data frames. Here is a sample code:
import pandas as pd
import openpyxl
import xlsxwriter
with pd.ExcelWriter('output.xlsx',engine='xlsxwriter') as wr:
df1.to_excel(wr, sheet_name='Final',index=False)
df2.to_excel(wr, sheet_name='Tie_Line_Data',index=False)
When I run it in any Python IDE, it runs perfectly. However, when I run it in Terminal, it gives the following error:
Traceback (most recent call last):
File "/home/memiit/public_html/Pooja_MemiIT_Server/weather_demand_data_automated_mail.py", line 382, in <module>
with pd.ExcelWriter('output.xlsx',engine='xlsxwriter') as wr:
File "/home/memiit/public_html/python3/lib/python3.6/site-packages/pandas/io/excel/_xlsxwriter.py", line 187, in __init__
self.book = xlsxwriter.Workbook(path, **engine_kwargs)
AttributeError: module 'xlsxwriter' has no attribute 'Workbook'
How do I solve this, or is there any alternative way of doing it.

Pandas ExcelWriter yields AttributeError

While trying to create an xlsx file with pandas, I receive the following error:
Traceback (most recent call last):
File "<ipython-input-24-201aac2da411>", line 1, in <module>
writer = pd.ExcelWriter('test_file2.xlsx', engine='xlsxwriter', options={'constant_memory': True})
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\excel.py", line 1945, in __init__
self.book = xlsxwriter.Workbook(path, **engine_kwargs)
AttributeError: module 'xlsxwriter' has no attribute 'Workbook'
My code :
import pandas pd
writer = pd.ExcelWriter('test_file.xlsx', engine='xlsxwriter', options={'constant_memory': True})
Versions
Python 3.7.3
Pandas 0.24.2
I resolved the issue. Previously, I had downloaded the xlsxwriter package and added it to my python pathmanager in Spyder. This conflicted with pandas. Deleting the path and xlsxwriter resolved the issue.

Why I am not able to load excel files generated in the morning, but can load them in the afternoon in Python using Openpyxl

I am using Python Openpyxl to import excel files which are generated by a online tool. When I import the files generated in the morning, I got an error like this:
Traceback (most recent call last):
File "test4.py", line 8, in <module>
wb = openpyxl.load_workbook (temp2)
File "C:\Python27\lib\site-packages\openpyxl\reader\excel.py", line 201, in load_workbook
wb.properties = DocumentProperties.from_tree(src)
File "C:\Python27\lib\site-packages\openpyxl\descriptors\serialisable.py", line 89, in from_tree
return cls(**attrib)
File "C:\Python27\lib\site-packages\openpyxl\packaging\core.py", line 106, in__init__
self.modified = modified
File "C:\Python27\lib\site-packages\openpyxl\descriptors\base.py", line 267, in __set__
value = W3CDTF_to_datetime(value)
File "C:\Python27\lib\site-packages\openpyxl\utils\datetime.py", line 40, in W3CDTF_to_datetime
dt = [int(v) for v in match.groups()[:6]]
AttributeError: 'NoneType' object has no attribute 'groups'
The strange thing is I only got this error when I importing the files which are generated by the online tool in the morning. I tried the same file but generated in the afternoon, it works very well. I'm confused where the problem is. There are no fields in the excel files related to time. And the files generated in the morning and in the afternoon are exactly the same except the modified time. Does anybody can help me with it? Thank you.
Excel files created from this online tool isn't well compatible with openpyxl
The function load_workbook will get workbook-level information and assign to Workbook()'s wb.properties from 'docProps/core.xml' by opening excel file through zipfile. One piece of information is modified time.
The value of modified raise the error, it can't be transported into datetime. The pattern of 'modified' must be openpyxl.utils.datetime.W3CDTF_REGEX, which is W3CDTF|W3C Date and Time Formats
You can check the excel's modified time if it corresponds to W3CDTF. Here is the code:
from openpyxl.reader.excel import _validate_archive
archive = _validate_archive('/path/to/yourexcel.xlsx')
valid_files = archive.namelist()
# you'll find 'xx/core.xml' I'm not sure if it's 'docProps/core.xml'
print valid_files
# read 'xx/core.xml'
wb_info = archive.read('docProps/core.xml')
print wb_info
In wb_info, you will find something like
<dcterms:modified xsi:type="dcterms:W3CDTF">2017-04-01T22:48:48Z</dcterms:modified>.
Contrast wb_info of excel files from online tool and your pc.

Getting error while opening excel file in Python

Hi I am very new to python, here i m trying to open a xls file in python code but it is showing me some error as below.
Code:
from xlrd import open_workbook
import os.path
wb = open_workbook('C:\Users\xxxx\Desktop\a.xlsx')
Error:Traceback (most recent call last):
File "C:\Python27\1.py", line 3, in <module>
wb = open_workbook('C:\Users\xxxx\Desktop\a.xlsx')
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 429, in open_workbook
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 1545, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 1539, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found 'PK\x03\x04\x14\x00\x06\x00'
need help guyz
This is a version conflict issue. Your Excel sheet format and the format that xlrd expects are different. You could try to save the Excel sheet in a different format until you find what xlrd expects.
Not familiar with xlrd, but nothing wrong appears on my Mac.
According to #jewirth, you can try to rename the suffix to xls which is the old version, and then reopen it or convert it into xlsx.
from xlrd import open_workbook
import os.path
wb = open_workbook(r'C:\Users\XXXX\Desktop\a.xlsx')
print wb
Output : <xlrd.book.Book object at 0x0260E490>
Opened the excel in 'r' format and it shows the excel object. Its working normally. Try to get the xlrd version and update it. Change the excel file format to '.xls' from '.xlsx' and try
You are getting that error because you are using an old version of xlrd which doesn't support xlsx.
You need to upgrade to a recent version of xlrd.

Can't load xlsx file

I am trying to read attached xlsx (Click here to download ) file using python openpyxl. However, workbook cannot be loaded. Here is my attempt to open xlsx file in python -
>>> from openpyxl import load_workbook
>>> workbook = load_workbook(filename = "test.xlsx")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\openpyxl\reader\excel.py", line 136, in load_workbook
_load_workbook(wb, archive, filename, use_iterators, keep_vba)
File "C:\Python27\lib\site-packages\openpyxl\reader\excel.py", line 198, in _load_workbook
keep_vba=keep_vba)
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 332, in read_worksheet
fast_parse(ws, xml_source, string_table, style_table, color_index)
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 320, in fast_parse
parser.parse()
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 137, in parse
dispatcher[tag_name](element)
File "C:\Python27\lib\site-packages\openpyxl\reader\worksheet.py", line 176, in parse_merge
self.ws.merge_cells(mergeCell.get('ref'))
File "C:\Python27\lib\site-packages\openpyxl\worksheet.py", line 815, in merge_cells
raise InsufficientCoordinatesException(msg)
openpyxl.shared.exc.InsufficientCoordinatesException: Range must be a cell range (e.g. A1:E1)
It appears that your .xlsx file is damaged or permanently corrupted. The reasons could be many. One of them could be that you might have renamed the extension of the file to .xlsx which would invalidate the file. To confirm this beahviour, please try to open this file in Microsoft Excel.
I tried reading the file through, openpyxl, xlrd and pandas but none of them worked.
>>> import xlrd
>>> xlrd.open_workbook('test.xlsx')
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<html> <'
>>> from openpyxl import load_workbook
>>> workbook = load_workbook(filename = "test.xlsx")
InvalidFileException: File is not a zip file
>>> import pandas
>>> pandas.ExcelFile('test.xlsx')
InvalidFileException: File is not a zip file
I ran into this issue trying to open every file in a directory ending in *.xlsx .
I later found the file that caused the error was named ~$filename.xlsx . I'm guessing that Microsoft indicates that a file is currently opened by creating a file with the same name, prepended with the ~$. Once I closed the file, everything worked as expected.
The problem was that some merged cells were, in fact, merged with themselves. openpyxl expected a merged cell reference always to be a range of cells. A fix for the problem which ignores meaningless merges has been added to the 2.0 branch.
I like openpyxl and use it for creating xlsx documents. It could be a bug or a missing compatibility with excel feature that takes place in your specific document. I would report it to the openpyxl community
OK Guys.. I have reported this bug to openpyxl developers and they have provided a quick fix on this. Here is the complete thread.
I did never try openpyxl but I use xlrd for reading excel files (.xls and .xlsx). its work great.
see the examples and documentation at http://www.python-excel.org/

Categories

Resources