Reading password protected Word Documents with zipfile - python

I am trying to read a password-protected Word document in Python using zipfile.
The following code works with a document that is not password-protected, but raises an error when used with a password-protected file.
try:
    from xml.etree.cElementTree import XML
except ImportError:
    from xml.etree.ElementTree import XML
import zipfile
psw = "1234"
WORD_NAMESPACE = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
PARA = WORD_NAMESPACE + 'p'
TEXT = WORD_NAMESPACE + 't'
def get_docx_text(path):
    document = zipfile.ZipFile(path, "r")
    document.setpassword(psw)
    document.extractall()
    xml_content = document.read('word/document.xml')
    document.close()
    tree = XML(xml_content)
    paragraphs = []
    for paragraph in tree.getiterator(PARA):
        texts = [node.text
                 for node in paragraph.getiterator(TEXT)
                 if node.text]
        if texts:
            paragraphs.append(''.join(texts))
    return '\n\n'.join(paragraphs)
When running get_docx_text() with a password protected file, I received the following error:
Traceback (most recent call last):
File "<ipython-input-15-d2783899bfe5>", line 1, in <module>
runfile('/Users/username/Workspace/Python/docx2txt.py', wdir='/Users/username/Workspace/Python')
File "/Applications/Spyder-Py2.app/Contents/Resources/lib/python2.7/spyderlib/widgets/externalshell/sitecustomize.py", line 680, in runfile
execfile(filename, namespace)
File "/Applications/Spyder-Py2.app/Contents/Resources/lib/python2.7/spyderlib/widgets/externalshell/sitecustomize.py", line 78, in execfile
builtins.execfile(filename, *where)
File "/Users/username/Workspace/Python/docx2txt.py", line 41, in <module>
x = get_docx_text("/Users/username/Desktop/file.docx")
File "/Users/username/Workspace/Python/docx2txt.py", line 23, in get_docx_text
document = zipfile.ZipFile(path, "r")
File "zipfile.pyc", line 770, in __init__
File "zipfile.pyc", line 811, in _RealGetContents
BadZipfile: File is not a zip file
Does anyone have any advice to get this code to work?

I don't think this is an encryption problem, for two reasons:
Decryption is not attempted when the ZipFile object is created. Methods like ZipFile.extractall, extract, open, and read take an optional pwd parameter containing the password, but the object constructor / initializer does not.
Your stack trace indicates that BadZipfile is being raised when you create the ZipFile object, before you call setpassword:
document = zipfile.ZipFile(path, "r")
I'd look carefully for other differences between the two files you're testing: ownership, permissions, security context (if you have that on your OS), ... even filename differences can cause a framework to "not see" the file you're working on.
Also, and this is the obvious one, try opening the encrypted file with your zip-compatible command of choice, to see whether it really is a zip file.
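The "is it really a zip file?" check can also be done from Python. The sketch below is only an illustration: the function name sniff_container is made up, and the OLE signature test rests on an assumption the answer above does not state, namely that Word saves password-protected documents inside an OLE compound file rather than a plain zip archive. If that holds for your file, zipfile cannot open it no matter which password you supply.

```python
import zipfile

# Leading bytes of an OLE compound file (the container assumed here to be
# used by Word for password-protected documents).
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_container(path):
    """Classify a file as 'zip', 'ole', or 'unknown' (hypothetical helper)."""
    # is_zipfile looks for the zip end-of-central-directory record.
    if zipfile.is_zipfile(path):
        return "zip"
    with open(path, "rb") as handle:
        head = handle.read(len(OLE_MAGIC))
    if head == OLE_MAGIC:
        return "ole"
    return "unknown"
```

Calling sniff_container on the protected and the unprotected document should show quickly whether the two files are even the same kind of container.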
I tested the first point by opening an encrypted zip file in Python 3.1, while "forgetting" to provide a password. I could create the ZipFile object (the variable zfile below) without any error, but got a RuntimeError, not a BadZipFile exception, when I tried to read a file without providing a password:
Traceback (most recent call last):
File "./zf.py", line 35, in <module>
main()
File "./zf.py", line 29, in main
print_checksums(zipfile_name)
File "./zf.py", line 22, in print_checksums
for checksum in checksum_contents(zipfile_name):
File "./zf.py", line 13, in checksum_contents
inner_file = zfile.open(inner_filename, "r")
File "/usr/lib64/python3.1/zipfile.py", line 903, in open
"password required for extraction" % name)
RuntimeError: File apache.log is encrypted, password required for extraction
I was also able to raise a BadZipfile exception, once by trying to open an empty file and once by trying to open some random logfile text that I'd renamed to a ".zip" extension. The two test files produced identical stack traces, down to the line numbers.
Traceback (most recent call last):
File "./zf.py", line 35, in <module>
main()
File "./zf.py", line 29, in main
print_checksums(zipfile_name)
File "./zf.py", line 22, in print_checksums
for checksum in checksum_contents(zipfile_name):
File "./zf.py", line 10, in checksum_contents
zfile = zipfile.ZipFile(zipfile_name, "r")
File "/usr/lib64/python3.1/zipfile.py", line 706, in __init__
self._GetContents()
File "/usr/lib64/python3.1/zipfile.py", line 726, in _GetContents
self._RealGetContents()
File "/usr/lib64/python3.1/zipfile.py", line 738, in _RealGetContents
raise BadZipfile("File is not a zip file")
zipfile.BadZipfile: File is not a zip file
This stack trace isn't exactly the same as yours (mine has a call to _GetContents, and the pre-3.2 "small f" spelling of BadZipfile), but they're close enough that I think this is the kind of problem you're dealing with.
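The experiment with the empty file and the renamed logfile is easy to repeat. This is a minimal sketch for Python 3.2 or later, where the exception is spelled BadZipFile; the helper name is_readable_zip is hypothetical.

```python
import zipfile


def is_readable_zip(path):
    """Return True if ZipFile can open path, False if it raises BadZipFile."""
    try:
        # Only the archive directory is read here; no password is involved.
        with zipfile.ZipFile(path, "r"):
            return True
    except zipfile.BadZipFile:
        return False
```

An empty file and a text file renamed to ".zip" should both return False, because the failure happens in the constructor, before any member is read or decrypted.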

Related

Python error :OSError: [Errno 9] Bad file descriptor

I want to transform a DOCX file using the docx library. Every time I run it, I get this error:
OSError: [Errno 9] Bad file descriptor
The code is:
from docx import Document
def bionify(path_to_text: str) -> None:
    doc = Document(path_to_text)
    new_doc = Document()
    all_paragraphs = doc.paragraphs
    for paragraph in all_paragraphs:
        word_list = paragraph.text.split(' ')
        new_paragraph = new_doc.add_paragraph()
        for word in word_list:
            i = 0
            while i < len(word):
                if i == 0 or i == 1:
                    new_paragraph.add_run(word[i]).bold = True
                else:
                    new_paragraph.add_run(word[i]).bold = False
                i += 1
            new_paragraph.add_run(' ')
    # Input the path to the document that you wish to save to:
    new_doc.save('sample_output.docx')
if __name__ == '__main__':
    # Input the path to the document containing your text file you wish to read from:
    bionify(r'C:\Users\###\Desktop\bionic python reader transformer\BionicTexterizer\sample_input.docx')
I have tried changing the destination, the Python package, and the Python version, but every time I get OSError: [Errno 9] Bad file descriptor
Complete traceback:
Traceback (most recent call last):
File "c:\Users\####\Desktop\bionic python reader transformer\BionicTexterizer\main.py", line 62, in <module>
bionify(r'C:\Users\####\Desktop\bionic python reader transformer\BionicTexterizer\sample_input.docx')
File "c:\Users\####\Desktop\bionic python reader transformer\BionicTexterizer\main.py", line 57, in bionify
new_doc.save('sample_output.docx')
File "C:\Python310\lib\site-packages\docx\document.py", line 135, in save
self._part.save(path_or_stream)
File "C:\Python310\lib\site-packages\docx\parts\document.py", line 111, in save
self.package.save(path_or_stream)
File "C:\Python310\lib\site-packages\docx\opc\package.py", line 172, in save
PackageWriter.write(pkg_file, self.rels, self.parts)
File "C:\Python310\lib\site-packages\docx\opc\pkgwriter.py", line 33, in write
PackageWriter._write_content_types_stream(phys_writer, parts)
File "C:\Python310\lib\site-packages\docx\opc\pkgwriter.py", line 45, in _write_content_types_stream
phys_writer.write(CONTENT_TYPES_URI, cti.blob)
File "C:\Python310\lib\site-packages\docx\opc\phys_pkg.py", line 155, in write
self._zipf.writestr(pack_uri.membername, blob)
File "C:\Python310\lib\zipfile.py", line 1810, in writestr
with self.open(zinfo, mode='w') as dest:
File "C:\Python310\lib\zipfile.py", line 1176, in close
self._fileobj.seek(self._zinfo.header_offset)
OSError: [Errno 9] Bad file descriptor
Exception ignored in: <function ZipFile.__del__ at 0x0000022D9BF4BEB0>
Traceback (most recent call last):
File "C:\Python310\lib\zipfile.py", line 1815, in __del__
self.close()
File "C:\Python310\lib\zipfile.py", line 1837, in close
self._fpclose(fp)
File "C:\Python310\lib\zipfile.py", line 1937, in _fpclose
fp.close()
This is a problem with Windows 11. I have run the code without any problems on Windows 10. There seem to be some package permission issues.

Getting error 'NotImplementedError("That compression method is not supported")' when extracting zipfile in python3.9

I have read through the Python documentation about zip files and watched a couple of videos, but nothing worked. I'm using Kali Linux, and the password has to be encoded as bytes.
Here is the code I have tried:
import zipfile
import string
import traceback
def try_function(zip, pwd):
    try:
        zip.extractall(pwd=pwd.encode())
        print("Yes")
    except TypeError:
        print("No")
z = zipfile.ZipFile("test.txt.zip")
pwd_local = "abc"
if __name__ == '__main__':
    try_function(z, pwd_local)
But I always get the same error:
Traceback (most recent call last):
File "ZipWorker.py", line 22, in <module>
try_function(z, pwd_list)
File "ZipWorker.py", line 11, in crack
zip.extractall(pwd.encode())
File "/usr/lib/python3.9/zipfile.py", line 1633, in extractall
self._extract_member(zipinfo, path, pwd)
File "/usr/lib/python3.9/zipfile.py", line 1686, in _extract_member
with self.open(member, pwd=pwd) as source, \
File "/usr/lib/python3.9/zipfile.py", line 1559, in open
return ZipExtFile(zef_file, mode, zinfo, pwd, True)
File "/usr/lib/python3.9/zipfile.py", line 797, in __init__
self._decompressor = _get_decompressor(self._compress_type)
File "/usr/lib/python3.9/zipfile.py", line 698, in _get_decompressor
_check_compression(compress_type)
File "/usr/lib/python3.9/zipfile.py", line 678, in _check_compression
raise NotImplementedError("That compression method is not supported")
NotImplementedError: That compression method is not supported
Does anyone know how to do this? I'm using python3.9.
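I finally found out why the code above doesn't work.
When you create an encrypted zip file with, for example, 7-Zip, you choose the encryption method: AES-256 or ZipCrypto.
So the problem is not that the password has to be bytes. Python's zipfile module can only decrypt the legacy ZipCrypto scheme, so an archive encrypted with AES-256 fails with this NotImplementedError.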
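The traceback shows the error being raised while zipfile picks a decompressor, so listing each member's compression method can confirm the diagnosis before any extraction is attempted. This is a sketch, not a definitive check. The helper name describe_members is hypothetical, and it assumes, following the WinZip AES convention, that AES-encrypted members are recorded with compression method 99, which zipfile does not implement.

```python
import zipfile

# Compression methods that the standard zipfile module knows how to read.
SUPPORTED_METHODS = {
    zipfile.ZIP_STORED,
    zipfile.ZIP_DEFLATED,
    zipfile.ZIP_BZIP2,
    zipfile.ZIP_LZMA,
}


def describe_members(path):
    """Return (name, method, encrypted, supported) for every archive member."""
    report = []
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            encrypted = bool(info.flag_bits & 0x1)  # bit 0 marks encryption
            supported = info.compress_type in SUPPORTED_METHODS
            report.append((info.filename, info.compress_type, encrypted, supported))
    return report
```

A member reported as encrypted with an unsupported method is the case zipfile cannot extract; re-creating the archive with ZipCrypto, or using a third-party library that handles AES, are the usual ways around it.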

How can read Minecraft .mca files so that in python I can extract individual blocks?

I can't find a way of reading Minecraft world files that I could use from Python.
I've looked around the internet but found no tutorials, and only a few libraries that claim they can do this but never actually work.
from nbt import *
nbtfile = nbt.NBTFile("r.0.0.mca",'rb')
I expected this to work, but instead I got errors about the file not being compressed, or something of the sort.
Full error:
Traceback (most recent call last):
File "C:\Users\rober\Desktop\MinePy\MinecraftWorldReader.py", line 2, in <module>
nbtfile = nbt.NBTFile("r.0.0.mca",'rb')
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nbt\nbt.py", line 628, in __init__
self.parse_file()
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nbt\nbt.py", line 652, in parse_file
type = TAG_Byte(buffer=self.file)
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nbt\nbt.py", line 99, in __init__
self._parse_buffer(buffer)
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nbt\nbt.py", line 105, in _parse_buffer
self.value = self.fmt.unpack(buffer.read(self.fmt.size))[0]
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\gzip.py", line 276, in read
return self._buffer.read(size)
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\gzip.py", line 463, in read
if not self._read_gzip_header():
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\gzip.py", line 411, in _read_gzip_header
raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'\x00\x00')
Use anvil-parser (install it with pip install anvil-parser).
Reading
import anvil
region = anvil.Region.from_file('r.0.0.mca')
# You can also provide the region file name instead of the object
chunk = anvil.Chunk.from_region(region, 0, 0)
# If `section` is not provided, will get it from the y coords
# and assume it's global
block = chunk.get_block(0, 0, 0)
print(block) # <Block(minecraft:air)>
print(block.id) # air
print(block.properties) # {}
https://pypi.org/project/anvil-parser/
According to this page, an .mca file is not a plain NBT file. It begins with an 8 KiB header that holds the offsets of the chunks within the region file and the timestamps of the last updates of those chunks.
I recommend reading the official announcement and this page for more information.
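To make the header layout concrete, here is a minimal sketch that decodes only those first 8 KiB. It assumes the commonly documented region layout, which is not spelled out above: 1024 location entries of 4 bytes each (a 3-byte big-endian sector offset plus a 1-byte sector count, in 4 KiB sectors), followed by 1024 big-endian 4-byte timestamps. The function name parse_region_header is hypothetical.

```python
import struct

SECTOR_SIZE = 4096  # assumed sector size; the header is two sectors long


def parse_region_header(header):
    """Map chunk index -> (byte_offset, byte_length, timestamp) for stored chunks."""
    if len(header) < 2 * SECTOR_SIZE:
        raise ValueError("region header must be at least 8 KiB")
    chunks = {}
    for index in range(1024):
        entry = header[index * 4:index * 4 + 4]
        sector_offset = int.from_bytes(entry[:3], "big")
        sector_count = entry[3]
        if sector_offset == 0 and sector_count == 0:
            continue  # no data stored for this chunk
        (timestamp,) = struct.unpack_from(">I", header, SECTOR_SIZE + index * 4)
        chunks[index] = (
            sector_offset * SECTOR_SIZE,
            sector_count * SECTOR_SIZE,
            timestamp,
        )
    return chunks
```

In practice you would pass it the first 8192 bytes read from the region file in binary mode. The chunk payloads at those offsets are compressed NBT, which is why handing the whole file to an NBT reader fails.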

Tablib xlsx file badZip file issue

I am getting an error when opening an .xlsx file on Windows 8 using the tablib library.
Python version: 2.7.14
The error is as follows:
python suit_simple_sheet_product.py
Traceback (most recent call last):
File "suit_simple_sheet_product.py", line 19, in <module>
data = tablib.Dataset().load(open(BASE_PATH).read())
File "C:\Python27\lib\site-packages\tablib\core.py", line 446, in load
format = detect_format(in_stream)
File "C:\Python27\lib\site-packages\tablib\core.py", line 1157, in detect_format
if fmt.detect(stream):
File "C:\Python27\lib\site-packages\tablib\formats\_xls.py", line 25, in detect
xlrd.open_workbook(file_contents=stream)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 120, in open_workbook
zf = zipfile.ZipFile(timemachine.BYTES_IO(file_contents))
File "C:\Python27\lib\zipfile.py", line 770, in __init__
self._RealGetContents()
File "C:\Python27\lib\zipfile.py", line 811, in _RealGetContents
raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file
The path is as follows:
BASE_PATH = 'C:\Users\anju\Downloads\automate\catalog-5090 fabric detail and price list.xlsx'
Excel .xlsx files are actually zip files. For the unzip to work correctly, the file must be opened in binary mode, so you need to open the file using:
import tablib
BASE_PATH = r'c:\my folder\my_test.xlsx'
data = tablib.Dataset().load(open(BASE_PATH, 'rb').read())
print data
Add r before your string to stop Python from trying to interpret the backslash characters in your path.
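The effect of the r prefix can be seen without touching any file. The sketch below uses a made-up path (automate\book.xlsx is hypothetical) in which both backslash sequences happen to be valid escapes, so the plain literal silently ends up with control characters in it; has_control_chars is likewise a hypothetical helper.

```python
# Plain literal: "\a" becomes the BEL character and "\b" becomes BACKSPACE.
plain = 'C:\automate\book.xlsx'

# Raw literal: every backslash is kept exactly as typed.
raw = r'C:\automate\book.xlsx'


def has_control_chars(path):
    """Return True if the string contains ASCII control characters."""
    return any(ord(ch) < 32 for ch in path)


print(repr(plain))
print(repr(raw))
print(has_control_chars(plain), has_control_chars(raw))
```

A path mangled this way points at a file that does not exist, which is a separate failure from the text-mode problem and worth ruling out first.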

Python script to unzip and print one line of a file

I am trying a simple example of retrieving data from a file and printing only one line of the output. I get semicolon error around encoded and 'r'.
import gzip
data = gzip.open('pagecounts-20130601-000000.gz', 'r')
encoded=data.read()
print encoded[2]
It gives this error:
Traceback (most recent call last):
File "filter_articles.scpt", line 4, in <module>
encoded=data.read()
File "/usr/lib/python2.7/gzip.py", line 249, in read
self._read(readsize)
File "/usr/lib/python2.7/gzip.py", line 308, in _read
self._add_read_data( uncompress )
File "/usr/lib/python2.7/gzip.py", line 326, in _add_read_data
self.extrabuf = self.extrabuf[offset:] + data
MemoryError
I guess this is because the file is huge and its content could not be read into memory? What would be a better way to print a few lines of the file?
I am assuming that:
You meant to have quotes around the file name in your script.
You actually want the third line (as your post suggests) and not the third character (as your script suggests)
In this case the following should work:
import gzip
data = gzip.open('pagecounts-20130601-000000.gz', 'r')
data.readline()
data.readline()
print data.readline()
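The same idea carries over to Python 3 and generalises to "the first N lines". This is a sketch under a few assumptions: the file is text, decoding it as UTF-8 with replacement of bad bytes is acceptable, and first_lines is a hypothetical helper name. Because the gzip file object is iterated lazily, only as much data as needed is decompressed, which avoids the MemoryError from reading everything at once.

```python
import gzip
from itertools import islice


def first_lines(path, count):
    """Return the first `count` lines of a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        # islice stops after `count` lines; the rest is never decompressed.
        return [line.rstrip("\n") for line in islice(handle, count)]
```

For the third line only, take the last element of first_lines(path, 3).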
