Tablib xlsx file badZip file issue

Tablib xlsx file badZip file issue - python

I am getting error on opening xlsx extension file in windows 8 using tablib library.
python version - 2.7.14
error is as follows:
python suit_simple_sheet_product.py
Traceback (most recent call last):
File "suit_simple_sheet_product.py", line 19, in <module>
data = tablib.Dataset().load(open(BASE_PATH).read())
File "C:\Python27\lib\site-packages\tablib\core.py", line 446, in load
format = detect_format(in_stream)
File "C:\Python27\lib\site-packages\tablib\core.py", line 1157, in detect_format
if fmt.detect(stream):
File "C:\Python27\lib\site-packages\tablib\formats\_xls.py", line 25, in detect
xlrd.open_workbook(file_contents=stream)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 120, in open_workbook
zf = zipfile.ZipFile(timemachine.BYTES_IO(file_contents))
File "C:\Python27\lib\zipfile.py", line 770, in __init__
self._RealGetContents()
File "C:\Python27\lib\zipfile.py", line 811, in _RealGetContents
raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file
path location is as follows =
BASE_PATH = 'C:\Users\anju\Downloads\automate\catalog-5090 fabric detail and price list.xlsx'

Excel .xlsx files are actually zip files. In order for the unzip to work correctly, the file must be opened in binary mode, as such your need to open the file using:
import tablib
BASE_PATH = r'c:\my folder\my_test.xlsx'
data = tablib.Dataset().load(open(BASE_PATH, 'rb').read())
print data
Add r before your string to stop Python from trying to interpret the backslash characters in your path.

Related

zipfile.BadZipfile: Bad CRC-32 for file | Read only file

Got a read-only file within a zip file which are password protected and I need to extract it to the /tmp directory.
I get a CRC-32 error which suggests that the file would be corrupted yet I know it isn't and is in fact a read-only file. Any Suggestions?
Error:
Traceback (most recent call last):
File "/tmp/usercode.py", line 45, in <module>
zip.extractall('/tmp',pwd = "piso")
File "/usr/lib64/python2.7/zipfile.py", line 1040, in extractall
self.extract(zipinfo, path, pwd)
File "/usr/lib64/python2.7/zipfile.py", line 1028, in extract
return self._extract_member(member, path, pwd)
File "/usr/lib64/python2.7/zipfile.py", line 1084, in _extract_member
shutil.copyfileobj(source, target)
File "/usr/lib64/python2.7/shutil.py", line 49, in copyfileobj
buf = fsrc.read(length)
File "/usr/lib64/python2.7/zipfile.py", line 632, in read
data = self.read1(n - len(buf))
File "/usr/lib64/python2.7/zipfile.py", line 672, in read1
self._update_crc(data, eof=(self._compress_left==0))
File "/usr/lib64/python2.7/zipfile.py", line 647, in _update_crc
raise BadZipfile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipfile: Bad CRC-32 for file 'alien-12.txt'
Code:
# importing required modules
from zipfile import ZipFile
# specifying the zip file name
file_name = "/tmp/alien-12.zip"
# opening the zip file in READ mode
with ZipFile(file_name, 'r') as zip:
# printing all the contents of the zip file
zip.printdir()
# extracting all the files
print('Extracting all the files now...')
zip.extractall('/tmp',pwd = "piso")
print('Done!')
If I change the line of:
zip.extractall('/tmp',pwd = "piso")
then I get the error of:
IOError: [Errno 30] Read-only file system:
Then go on to try and fix it first by trying to output what is in the zip file.
zipfile.testzip() returns which then errors
Error:
RuntimeError: File alien-12.txt is encrypted, password required for extraction

Pandas unable to open this Excel file

I am trying to use python pandas to open an Excel file. Code is simple as shown below;
import pandas as pd
df = pd.read_excel('../TestXLWings.xlsm', sheetname="TestSheet")
I got an error below;
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2\helpers\pydev\pydevd.py", line 1599, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2\helpers\pydev\pydevd.py", line 1026, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/testing/Dropbox/Test-XLwings/test.py", line 3, in <module>
df = pd.read_excel('../TestXLWings.xlsm', sheetname="TestSheet")
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\excel.py", line 203, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\excel.py", line 260, in __init__
self.book = xlrd.open_workbook(io)
File "C:\ProgramData\Anaconda3\lib\site-packages\xlrd\__init__.py", line 441, in open_workbook
ragged_rows=ragged_rows,
File "C:\ProgramData\Anaconda3\lib\site-packages\xlrd\book.py", line 87, in open_workbook_xls
ragged_rows=ragged_rows,
File "C:\ProgramData\Anaconda3\lib\site-packages\xlrd\book.py", line 595, in biff2_8_load
raise XLRDError("Can't find workbook in OLE2 compound document")
xlrd.biffh.XLRDError: Can't find workbook in OLE2 compound document
My Excel file is xlsm and protected by password. What does OLE2 compound document mean exactly? Does pandas have problems opening this kind of Excel files? I am using python v3.6

I will answer my own question. In one of the comments from ayhan, Excel-protected files cannot be read by xlrd. One solution is to remove the protection.
I need the command to unprotect an Excel file from python
Another solution to read the Excel-protected file is to use xlwings. I have verified that xlwings is able to read protected Excel files when the Excel file is opened.

I would create a new excel file and remove sensitivity label in excel. Then be able to read the file with pd.

Unzipping file results in "BadZipFile"

I know there are similar questions but neither of them provided a solution for my problem. I am using the following code:
import os, glob
import zipfile
root = 'E:\\xx\\fashion\\*'
directory = 'E:\\xx\\fashion\\'
extension = ".zip"
date_file_list = []
for folder in glob.glob(root):
if folder.endswith(extension): # check for ".zip" extension
print(folder)
zipfile.ZipFile(os.path.join(directory, folder)).extractall(os.path.join(directory, os.path.splitext(folder)[0]))
os.remove(folder) # delete zipped file_name
And I get the following error:
Traceback (most recent call last):
File "C:/Users/xx/unzip.py", line 12, in <module>
zipfile.ZipFile(os.path.join(directory, folder)).extractall(os.path.join(directory, os.path.splitext(folder)[0]))
File "C:\Users\xx\AppData\Local\Programs\Python\Python35\lib\zipfile.py", line 1026, in __init__
self._RealGetContents()
File "C:\Users\xx\AppData\Local\Programs\Python\Python35\lib\zipfile.py", line 1094, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
Some of the files are compressed in winzip some of them are in 7zip. But there are too many files to unzip.
Anybody know why this error is occurring?

Reading password protected Word Documents with zipfile

I am trying to read a password protected word document on Python using zipfile.
The following code works with a non-password protected document, but gives an error when used with a password protected file.
try:
from xml.etree.cElementTree import XML
except ImportError:
from xml.etree.ElementTree import XML
import zipfile
psw = "1234"
WORD_NAMESPACE = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
PARA = WORD_NAMESPACE + 'p'
TEXT = WORD_NAMESPACE + 't'
def get_docx_text(path):
document = zipfile.ZipFile(path, "r")
document.setpassword(psw)
document.extractall()
xml_content = document.read('word/document.xml')
document.close()
tree = XML(xml_content)
paragraphs = []
for paragraph in tree.getiterator(PARA):
texts = [node.text
for node in paragraph.getiterator(TEXT)
if node.text]
if texts:
paragraphs.append(''.join(texts))
return '\n\n'.join(paragraphs)
When running get_docx_text() with a password protected file, I received the following error:
Traceback (most recent call last):
File "<ipython-input-15-d2783899bfe5>", line 1, in <module>
runfile('/Users/username/Workspace/Python/docx2txt.py', wdir='/Users/username/Workspace/Python')
File "/Applications/Spyder-Py2.app/Contents/Resources/lib/python2.7/spyderlib/widgets/externalshell/sitecustomize.py", line 680, in runfile
execfile(filename, namespace)
File "/Applications/Spyder-Py2.app/Contents/Resources/lib/python2.7/spyderlib/widgets/externalshell/sitecustomize.py", line 78, in execfile
builtins.execfile(filename, *where)
File "/Users/username/Workspace/Python/docx2txt.py", line 41, in <module>
x = get_docx_text("/Users/username/Desktop/file.docx")
File "/Users/username/Workspace/Python/docx2txt.py", line 23, in get_docx_text
document = zipfile.ZipFile(path, "r")
File "zipfile.pyc", line 770, in __init__
File "zipfile.pyc", line 811, in _RealGetContents
BadZipfile: File is not a zip file
Does anyone have any advice to get this code to work?

I don't think this is an encryption problem, for two reasons:
Decryption is not attempted when the ZipFile object is created. Methods like ZipFile.extractall, extract, and open, and read take an optional pwd parameter containing the password, but the object constructor / initializer does not.
Your stack trace indicates that the BadZipFile is being raised when you create the ZipFile object, before you call setpassword:
document = zipfile.ZipFile(path, "r")
I'd look carefully for other differences between the two files you're testing: ownership, permissions, security context (if you have that on your OS), ... even filename differences can cause a framework to "not see" the file you're working on.
Also --- the obvious one --- try opening the encrypted zip file with your zip-compatible command of choice. See if it really is a zip file.
I tested this by opening an encrypted zip file in Python 3.1, while "forgetting" to provide a password. I could create the ZipFile object (the variable zfile below) without any error, but got a RuntimeError --- not a BadZipFile exception --- when I tried to read a file without providing a password:
Traceback (most recent call last):
File "./zf.py", line 35, in <module>
main()
File "./zf.py", line 29, in main
print_checksums(zipfile_name)
File "./zf.py", line 22, in print_checksums
for checksum in checksum_contents(zipfile_name):
File "./zf.py", line 13, in checksum_contents
inner_file = zfile.open(inner_filename, "r")
File "/usr/lib64/python3.1/zipfile.py", line 903, in open
"password required for extraction" % name)
RuntimeError: File apache.log is encrypted, password required for extraction
I was also able to raise a BadZipfile exception, once by trying to open an empty file and once by trying to open some random logfile text that I'd renamed to a ".zip" extension. The two test files produced identical stack traces, down to the line numbers.
Traceback (most recent call last):
File "./zf.py", line 35, in <module>
main()
File "./zf.py", line 29, in main
print_checksums(zipfile_name)
File "./zf.py", line 22, in print_checksums
for checksum in checksum_contents(zipfile_name):
File "./zf.py", line 10, in checksum_contents
zfile = zipfile.ZipFile(zipfile_name, "r")
File "/usr/lib64/python3.1/zipfile.py", line 706, in __init__
self._GetContents()
File "/usr/lib64/python3.1/zipfile.py", line 726, in _GetContents
self._RealGetContents()
File "/usr/lib64/python3.1/zipfile.py", line 738, in _RealGetContents
raise BadZipfile("File is not a zip file")
zipfile.BadZipfile: File is not a zip file
While this stack trace isn't exactly the same as yours --- mine has a call to _GetContents, and the pre-3.2 "small f" spelling of BadZipfile --- but they're close enough that I think this is the kind of problem you're dealing with.

python - extract csv files from 7z

I have lots of csv files contained in different 7z files. I want to find specific csv files in those 7z files and save them decompressed in a different directory.
I have tried
import os
import py7zlib
tree = r'Where_the_7zfiles_are_stored'
dst = r'Where_I_want_to_store_the_csvfiles'
for dirpath, dirname, filename in os.walk(tree):
for myfile in filename:
if myfile.endswith('2008-01-01_2008-04-30_1.7z'):
myZip = py7zlib.Archive7z(open(os.path.join(dirpath,myfile), 'rb'))
csvInZipFile = zip(myZip.filenames,myZip.files)
for myCsvFileName, myCsvFile in csvInZipFile:
if '2008-01' in myCsvFileName:
with open(os.path.join(dst,myCsvFileName),'wb') as outfile:
outfile.write(myCsvFile.read())
but I get the following error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\'\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
execfile(filename, namespace)
File "C:\Users\'\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "C:/Users//'/Documents/Example/unzipfiles.py", line 23, in <module>
outfile.write(myCsvFile.read())
File "C:\Users\'\Anaconda3\lib\site-packages\py7zlib.py", line 576, in read
data = getattr(self, decoder)(coder, data)
File "C:\Users\'\Anaconda3\lib\site-packages\py7zlib.py", line 634, in _read_lzma
return self._read_from_decompressor(coder, dec, input, checkremaining=True, with_cache=True)
File "C:\Users\'\Anaconda3\lib\site-packages\py7zlib.py", line 611, in _read_from_decompressor
tmp = decompressor.decompress(data)
ValueError: data error during decompression
The odd thing is that the method seems to work fine for the first two csv files. I have no idea how to get to the root of the problem. At least the data in the csv files do not seem to be different. Manually unpacking the different csv files using IZArc goes without problem. (The problem occurred in both python 2.7 and 3.4).
I have also tried to use the lzma module, but here I could not figure out how to retrieve the different csv files contained in the 7z file.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Tablib xlsx file badZip file issue - python

Related

zipfile.BadZipfile: Bad CRC-32 for file | Read only file

Pandas unable to open this Excel file

Unzipping file results in "BadZipFile"

Reading password protected Word Documents with zipfile

python - extract csv files from 7z

Categories

Resources