Converting Excel to CSV in Python

I am using xlrd to convert my .xls Excel file to a CSV file, yet when I try to open the workbook my program crashes with this error message:
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xlrd/book.py", line 1224, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found 'Chrom\tPo'
The Chrom\tPo is part of the header of my Excel file, but I don't understand what is wrong with the file or how to fix it.
The program crashes as soon as I try to open the Excel file using xlrd.open_workbook('Excel File')

I would use openpyxl for this.
import openpyxl
wb = openpyxl.load_workbook(file_name)
ws = wb.worksheets[page_number]
table = []
# openpyxl numbers rows and columns from 1, not 0
for row_num in range(1, ws.max_row + 1):
    temp_row = []
    for col_num in range(1, ws.max_column + 1):
        # the keyword argument is `column`, not `col`
        temp_row.append(ws.cell(row=row_num, column=col_num).value)
    table.append(temp_row)
This will give you the contents of the sheet as a 2-D list, which you can then write out to a CSV file or use as you wish.
If you're stuck with xlrd for whatever reason, you may just need to convert your file from .xls to .xlsx.

Here is an answer from a previous question: How to save an Excel worksheet as CSV from Python (Unix)?
The answer covers both openpyxl and xlrd.
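The 'Chrom\tPo' in the error message suggests that the file is tab-delimited text saved with an .xls extension rather than a real workbook. If that is the case, neither xlrd nor openpyxl is needed. The following is a minimal sketch under that assumption, written for Python 3 (the question uses 2.7); the function name tsv_to_csv and the file names in the usage note are made up for illustration.

```python
import csv


def tsv_to_csv(src_path, dst_path, encoding="utf-8"):
    """Copy a tab-delimited text file to a comma-separated file.

    Returns the number of rows written. The source encoding is an
    assumption; pass a different one if the file is not UTF-8.
    """
    rows_written = 0
    # newline="" lets the csv module handle line endings itself
    with open(src_path, newline="", encoding=encoding) as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

It could be called as tsv_to_csv('input.xls', 'output.csv'). If the file turns out to be a genuine workbook after all, this sketch does not apply and one of the library-based answers is the way to go.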

Related

Error opening a file after saving it with storeFile of pysmb

I am reading an Excel file (.xlsx) with pysmb.
import tempfile
from smb.SMBConnection import SMBConnection
conn = SMBConnection(userID, password, client_machine_name, server_name, use_ntlm_v2 = True)
conn.connect(server_ip, 139)
file_obj = tempfile.TemporaryFile()
file_attributes, filesize = conn.retrieveFile(service_name, 'test.xlsx', file_obj)
This step works: I am able to load the file into a pandas.DataFrame
import pandas as pd
pd.read_excel(file_obj)
Next, I want to save the file. The file is saved, but when I try to open it with Excel I get the error message "Excel has run into an error"
Here is the code to save the file
conn.storeFile(service_name, 'test_save.xlsx', file_obj)
file_obj.close()
How can I save the file correctly so that it opens in Excel?
Thank you
I tried with a .txt file and it works. The error occurs with .xlsx, .xls and .pdf files. I have also tried without an extension: same issue, impossible to open the file.
I would like to save the file with the .pdf and .xlsx extensions and open it.
Thank you.
I found a solution and I will post it here in case someone faces a similar issue.
An Excel file can be saved as a binary stream.
from io import BytesIO
df = pd.read_excel(file_obj)
output = BytesIO()
writer = pd.ExcelWriter(output, engine='xlsxwriter')
df.to_excel(writer, sheet_name='data', index = False)
writer.close()  # writes the workbook into `output`; ExcelWriter.save() was removed in pandas 2.0
output.seek(0)
conn.storeFile(service_name, 'test_save.xlsx', output)
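The output.seek(0) call in the answer matters because a reader that consumes a stream "until EOF" starts from the current position, and as far as the pysmb documentation describes it, storeFile reads its file_obj that way. The sketch below shows the effect using only the standard library. fake_store is a hypothetical stand-in for an uploader, not a pysmb function, and the payload bytes are made up.

```python
import tempfile


def fake_store(file_obj):
    """Hypothetical uploader: reads from the current position to EOF."""
    return file_obj.read()


payload = b"PK\x03\x04 example bytes"  # made-up content

with tempfile.TemporaryFile() as file_obj:
    file_obj.write(payload)             # the position is now at the end
    not_rewound = fake_store(file_obj)  # nothing left to read
    file_obj.seek(0)                    # rewind to the start
    rewound = fake_store(file_obj)      # the full content

print(len(not_rewound), len(rewound))
```

This may also be what went wrong in the question, where file_obj is handed to storeFile without being rewound after retrieveFile and read_excel, but that is an assumption and not something the thread confirms.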

Python Pandas XLRDError when reading .xls files

I'm having a problem with reading .xls files in Pandas.
Here's the code
df = pd.read_excel('sample.xls')
And the output states,
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\xff\xfeD\x00A\x00T\x00'
Anyone experiencing the same issue? How to fix it?
# Changing the data types of all strings in the module at once
from __future__ import unicode_literals
# Used to save the file as an Excel workbook
# Need to install this library
from xlwt import Workbook
# Used to open the corrupt Excel file
import io
filename = r'sample.xls'
# Opening the file using 'utf-16' encoding
file1 = io.open(filename, "r", encoding="utf-16")
data = file1.readlines()
# Creating a workbook object
xldoc = Workbook()
# Adding a sheet to the workbook object
sheet = xldoc.add_sheet("Sheet1", cell_overwrite_ok=True)
# Iterating and saving the data to the sheet
for i, row in enumerate(data):
    # Two things are done here:
    # removing the '\n' which comes while reading the file using io.open
    # getting the values after splitting on '\t'
    for j, val in enumerate(row.replace('\n', '').split('\t')):
        sheet.write(i, j, val)
# Saving the file as an Excel file
xldoc.save('1.xls')
Credits to this Medium Article
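The bytes in the error message, b'\xff\xfeD\x00A\x00T\x00', are a UTF-16 little-endian byte order mark followed by the text "DAT", which is why opening the file with encoding="utf-16" works. If the goal is only to get at the data, going through xlwt may not be necessary. Here is a minimal Python 3 sketch, assuming the file is tab-delimited UTF-16 text; the function name utf16_text_to_rows is made up for illustration.

```python
import codecs
import csv
import io


def utf16_text_to_rows(raw):
    """Decode UTF-16 tab-delimited bytes into a list of rows."""
    if not raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        raise ValueError("no UTF-16 byte order mark found")
    text = raw.decode("utf-16")  # the codec consumes the byte order mark
    # newline="" leaves line endings for the csv module to handle
    return list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))
```

The raw bytes would come from open('sample.xls', 'rb').read(). The resulting rows can be written out with csv.writer or passed to pandas.DataFrame.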

Python - XLRDError: Unsupported format, or corrupt file: Expected BOF record

I am trying to open an Excel file that was given to me for my project; it is an export from an SAP system. When I try to open it with pandas I get the following error:
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\xff\xfe\r\x00\n\x00\r\x00'
The following is my code:
import pandas as pd
# To open an excel file
df = pd.ExcelFile('myexcel.xls').parse('Sheet1')
I don't know whether it will work for you, but it worked for me once, so you could try the following:
from __future__ import unicode_literals
from xlwt import Workbook
import io
filename = r'myexcel.xls'
# Opening the file using 'utf-16' encoding
file1 = io.open(filename, "r", encoding="utf-16")
data = file1.readlines()
# Creating a workbook object
xldoc = Workbook()
# Adding a sheet to the workbook object
sheet = xldoc.add_sheet("Sheet1", cell_overwrite_ok=True)
# Iterating and saving the data to the sheet
for i, row in enumerate(data):
    # Two things are done here:
    # removing the '\n' which comes while reading the file using io.open
    # getting the values after splitting on '\t'
    for j, val in enumerate(row.replace('\n', '').split('\t')):
        sheet.write(i, j, val)
# Saving the file as an Excel file
xldoc.save('myexcel.xls')
I faced the same xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; error and solved it by writing an XML-to-XLSX converter. You can call pd.ExcelFile('myexcel.xlsx') after the conversion. The reason is that pandas uses xlrd to read Excel files, and xlrd does not support the XML Spreadsheet (*.xml) format, which is neither XLS nor XLSX.
import pandas as pd
from bs4 import BeautifulSoup
def convert_to_xlsx():
    with open('sample.xls') as xml_file:
        soup = BeautifulSoup(xml_file.read(), 'xml')
        writer = pd.ExcelWriter('sample.xlsx')
        # One output sheet per <Worksheet> element
        for sheet in soup.findAll('Worksheet'):
            sheet_as_list = []
            for row in sheet.findAll('Row'):
                sheet_as_list.append([cell.Data.text if cell.Data else '' for cell in row.findAll('Cell')])
            pd.DataFrame(sheet_as_list).to_excel(writer, sheet_name=sheet.attrs['ss:Name'], index=False, header=False)
        # close() writes the file; ExcelWriter.save() was removed in pandas 2.0
        writer.close()
What worked for me was applying this advice:
How to cope with an XLRDError
There you will also find an explanation that matched my case: the file had not been saved in a proper Excel format. When I opened the .xls file, Excel offered to save it as HTML. I saved it as .xlsx instead, and that solved the problem.
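The answers in this thread each deal with a different file type hiding behind an .xls extension: UTF-16 text, XML Spreadsheet and HTML. The first bytes of the file are usually enough to tell them apart before picking a reader. The function below is a rough sketch and not an exhaustive detector. Its name and return labels are made up for illustration, and a ZIP signature only means the file might be .xlsx.

```python
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (OLE2 container)
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx is a ZIP archive


def guess_spreadsheet_format(head):
    """Guess the real format from the first bytes of a file."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    if head.startswith((b"\xff\xfe", b"\xfe\xff")):  # UTF-16 byte order marks
        return "utf-16 text"
    stripped = head.lstrip().lower()
    if stripped.startswith(b"<?xml"):
        return "xml"
    if stripped.startswith((b"<html", b"<!doctype html")):
        return "html"
    return "plain text or unknown"
```

A typical call reads a small prefix in binary mode, for example guess_spreadsheet_format(open('myexcel.xls', 'rb').read(16)). The bytes quoted in the question's error message, '\xff\xfe\r\x00\n\x00\r\x00', begin with a UTF-16 byte order mark, so this sketch would label that file as UTF-16 text.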

Saving attachments from outlook, error when loading with pandas/xlrd

I have this script, which has previously worked for other emails, to download attachments:
import win32com.client as win
import xlrd
outlook = win.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder("6")
all_inbox = inbox.Items
subject = 'Email w/Attachment'
attachment1 = 'Attachment - 20160715.xls'
for msg in all_inbox:
    if msg.subject == subject:
        break
for att in msg.Attachments:
    if att.FileName == attachment1:
        break
att.SaveAsFile('L:\\My Documents\\Desktop\\' + attachment1)
workbook = xlrd.open_workbook('L:\\My Documents\\Desktop\\' + attachment1)
However, when I try to open the file using xlrd (or with pandas) I get this:
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\nVisit '
Can anyone explain what's gone wrong here?
Is there a way I can open the attachment, without saving it, and just copy a worksheet and save that copy as a .csv file instead?
Thank you
Take a look at this question. It's possible the file you are trying to download is not a true Excel file but a CSV saved with an .xls extension. The evidence is the error message Expected BOF record; found b'\r\nVisit '. A genuine .xls file starts with a binary header, not readable text like this. You could get around it with a try/except:
import pandas as pd
import xlrd
from xlrd.biffh import XLRDError
try:  # try to read it as an .xls file
    workbook = xlrd.open_workbook(path)
except XLRDError:  # if that fails, read it as CSV
    workbook = pd.read_csv(path)

How to write in to already opened excel file by using openpyxl

I opened an Excel file using the following code:
from openpyxl import load_workbook
wb = load_workbook('path of the file')
DriverTableSheet = wb.get_sheet_by_name(name = 'name of the sheet')
After that, I have to append some values to that Excel file.
For that I used the following code:
DriverTableSheet.cell(row=1, column=2).value="value"
But nothing happens. Can you please guide me on how to write or append data to that Excel file and then save it?
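One likely explanation is that openpyxl changes only the in-memory copy of the workbook: nothing reaches the file until save() is called. The following is a minimal sketch assuming that is the missing step. The helper name set_cell_and_save is made up for illustration, and the sheet is looked up with wb[sheet_name] because get_sheet_by_name is deprecated in recent openpyxl releases.

```python
from openpyxl import load_workbook


def set_cell_and_save(path, sheet_name, row, column, value):
    """Load a workbook, change one cell, and write the file back to disk."""
    wb = load_workbook(path)
    ws = wb[sheet_name]  # look up the sheet by name
    ws.cell(row=row, column=column).value = value
    wb.save(path)  # the change reaches the file only here
```

To add a whole row below the existing data instead of setting one cell, ws.append([...]) followed by wb.save(path) works the same way. Saving may fail with a permission error if the file is open in Excel at the same time, so it is worth closing it there first.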
