Can't unzip a folder in Python

Can't unzip a folder in Python - python

I tried unzipping a file through Python using zipfile.extractAll but it gave BAD zip file, hence I tried this:
zipfile cant handle some type of zip data?
As mentioned in this answer, i used the code:
def fixBadZipfile(zipFile):
f = open(zipFile, 'r+b')
data = f.read()
pos = data.find('\x50\x4b\x05\x06') # End of central directory signature
if (pos > 0):
self._log("Truncating file at location " + str(pos + 22) + ".")
f.seek(pos + 22) # size of 'ZIP end of central directory record'
f.truncate()
f.close()
else:
# raise error, file is truncated enter code here
but it gave the error
Message File Name Line Position Traceback
C:\Users\aditya1.r\Desktop\Python_pyscripter\module1.py 50
main C:\Users\aditya1.r\Desktop\Python_pyscripter\module1.py 17
fixBadZipfile C:\Users\aditya1.r\Desktop\Python_pyscripter\module1.py 37
TypeError: 'str' does not support the buffer interface
I'm using Python 3.4
How can i unzip this file?

import subprocess
subprocess.Popen('unzip ' + file_name, shell = True).wait()
Hope this help you :)

You are reading the file as a bytes object but trying to find passing a string object so just simply change this line -
pos = data.find('\x50\x4b\x05\x06')
to
pos = data.find(b'\x50\x4b\x05\x06')
Note that I have casted it to a byte object by simply prepending a b.
You don't need to do this is Python 2.X but in python 3.X you need to explicitly serialize a string object to a byte object.

Related

Stuck at translation of FTP-uploadscript from Python2.x towards Python3.x

Python script for ftp-upload of various types of files from local Raspberry to remote Webserver:
original is running on several Raspberries under Python2.x & Raspian_Buster (and earlier Raspian_versions) without any problems.
The txt-file for this upload is generated by a lua-script-setup like the one below
file = io.open("/home/pi/PVOutput_Info.txt", "w+")
-- Opens a file named PVOutput_Info.txt (stored under the designated sub-folder of Domoticz)
file:write(" === PV-generatie & Consumptie === \n")
file:write(" Datum = " .. WSDatum .. "\n")
file:write(" Tijd = " .. WSTijd .. "\n")
file:close() -- closes the open file
os.execute("chmod a+rw /home/pi/PVTemp_Info.txt")
Trying to upgrade this simplest version towards use with Python3.x & Raspian_Bullseye, but stuck with solving the reported error.
It looks as if the codec now has a problem with a byte 0xb0 in the txt-file.
Any remedy or hint to circumvent this problem?
#!/usr/bin/python3
# (c)2017 script compiled by Toulon7559 from various material from forums, version 0.1 for upload of *.txt to /
# Original script running under Python2.x and Raspian_Buster
# Version 0165P3 of 20230201 is an experimental adaptation towards Python3.x and Raspian_Bullseye
# --------------------------------------------------
# Line006 = Function for FTP_UPLOAD to Server
# --------------------------------------------------
# Imports for script-operation
import ftplib
import os
# Definition of Upload_function
def upload(ftp, file):
ext = os.path.splitext(file)[1]
if ext in (".txt", ".htm", ".html"):
ftp.storlines("STOR " + file, open(file))
else:
ftp.storbinary("STOR " + file, open(file, "rb"), 1024)
# --------------------------------------------------
# Line020 = Actual FTP-Login & -Upload
# --------------------------------------------------
ftp = ftplib.FTP("<FTP_server>")
ftp.login("<Login_UN>", "<login_PW>")
# set path to destination directory
ftp.cwd('/')
# set path to source directory
os.chdir("/home/pi/")
# upload of TXT-files
upload(ftp, "PVTemp_Info.txt")
upload(ftp, "PVOutput_Info.txt")
# reset path to root
ftp.cwd('/')
print ('End of script Misc_Upload_0165P3')
print
Putty_CLI_Command
sudo python3 /home/pi/domoticz/scripts/python/Misc_upload_0165P3a.py
Resulting report at Putty's CLI
Start of script Misc_Upload_0165P3
Traceback (most recent call last):
File "/home/pi/domoticz/scripts/python/Misc_upload_0165P3a.py", line 39, in <module>
upload(ftp, "PVTemp_Info.txt")
File "/home/pi/domoticz/scripts/python/Misc_upload_0165P3a.py", line 25, in upload
ftp.storlines("STOR " + file, open(file))
File "/usr/lib/python3.9/ftplib.py", line 519, in storlines
buf = fp.readline(self.maxline + 1)
File "/usr/lib/python3.9/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 175: invalid start byte

I'm afraid that there's no easy mapping to the Python 3. Two simple, but not 1:1 solutions for Python 3 would be:
Consider uploading all files using a binary mode. I.e. get rid of the
if ext in (".txt", ".htm", ".html"):
ftp.storlines("STOR " + file, open(file))
else:
Or open the text file using the actual encoding that the files use (you have to find out):
open(file, encoding='cp1252')
See Error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
If you really need the exact functionality that you had in Python 2 (that is: Upload any text file, in whatever encoding, using FTP text transfer mode), it would be more complicated. The Python 2 basically just translates any of CR/LF EOL sequences in the file to CRLF (what is the requirement of the FTP specification), keeping the rest of the file intact.
You can copy FTP.storbinary code and implement the above translation of buf byte-wise (without decoding/recording which Python 3 FTP.storlines/readline does).
If the files are not huge, a simple implementation is to load whole file to memory, convert in memory and upload. This is not difficult, if you know that all your files use the same EOL sequence. If not, the translation might be more difficult.
Or you may even give up on the translation, as most FTP servers do not care (they can handle any common EOL sequence). Just use the FTP.storbinary code as it is, only change TYPE I to TYPE A (what you need to do even if you implement the translation as per the previous point).
Btw, you also need to close the file in any case, so the correct code would be like:
with open(file) as f:
ftp.storlines("STOR " + file, f)
Likewise for storbinary.

Python script that executes another python script

I'm pretty new to the world of python. I decided to do a project but came to a stop, after my script wouldn't execute the right way. In which I mean the script that I need to be executed on its own through another script keeps on giving me nothing or some syntax error instead of all the stuff that is supposed to be happening (converting files). The other script in question writes new lines into the other script to change the file name (to be converted) to the newest file. The file looks something like this:
import glob
import os.path
folder_path = r'C:\User\Desktop\Folder\Audio'
file_type = r'\*mp4'
files = glob.glob(folder_path + file_type)
max_file = max(files, key=os.path.getctime)
mp3_file = max_file.replace('.mp4', '')
with open ("file.py", 'w') as f:
f.write("")
with open ("file.py", 'w') as f:
f.write('from moviepy.editor import *\n' "mp4_file = '{}'\n"
"mp3_file = '{}.mp3'\n" 'videoclip = VideoFileClip(mp4_file)\n' 'audioclip = videoclip.audio\n'
'audioclip.write_audiofile(mp3_file)\n' 'audioclip.close()\n' 'videoclip.close()\n'.format(max_file, mp3_file))
exec(open("file.py").read())
Right now it gives this error:
Traceback (most recent call last):
File "C:\Users\Desktop\Folder\Audio\File Manager.py", line 19, in <module>
exec(open("file.py").read())
File "<string>", line 2
mp4_file = 'C:\User\Desktop\Folder\Audio\test.mp4'
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
I plan not on using that exact line of code to execute my python file since there are many alternatives, but if I was on the right trail, then I might as well. The other file that's supposed to be executed has generic file converting code:
from moviepy.editor import *
mp4_file = 'C:\User\Desktop\Folder\Audio\test.mp4'
mp3_file = 'C:\User\Desktop\Folder\Audio\test.mp3'
videoclip = VideoFileClip(mp4_file)
audioclip = videoclip.audio
audioclip.write_audiofile(mp3_file)
audioclip.close()
videoclip.close()
Other solutions mostly gave me a blank inactive shell; if the answer to this problem that it's impossible, then it might as well be, and I'll take that as a valid answer, but please explain why.

Corrections
You are using different quotes while writing to the file, from single quotes ' to double ", update it to be more consistent.
The error is suggesting that while writing to the file it is also writing some unicode characters which it cannot read hence the unicode error (look at where the carrot ^ is pointing at, it's a blank space since it's not a printable character).
Suggestions
Don't just write to a file and then immediately read from it. Different operating systems have different behaviour for such repeated access which will give you strange issues (this is not your issue tho)
Just create a function extractMp3FromVideoFile which takes two arguements max_file and mp3_file
Instead of writing to a file and increasing the HDD IO simply put the file's code into a variable and then exec it.
Solution
import glob
import os.path
folder_path = r'C:\User\Desktop\Folder\Audio'
file_type = r'\*mp4'
files = glob.glob(folder_path + file_type)
max_file = max(files, key=os.path.getctime)
mp3_file = max_file.replace('.mp4', '')
code = "from moviepy.editor import *\nmp4_file = '{}'\nmp3_file = '{}.mp3'\nvideoclip = VideoFileClip(mp4_file)\naudioclip = videoclip.audio\naudioclip.write_audiofile(mp3_file)\naudioclip.close()\nvideoclip.close()\n".format(max_file, mp3_file)
exec(code)

Getting "TypeError: pwd: expected bytes, got str" when extracting zip file with password

I want to unzip a specific named file located in a specific directory.
The name of the file = happy.zip.
The location = C:/Users/desktop/Downloads.
I want to extract all the files to C:/Users/desktop/Downloads(the same location)
I tried:
import zipfile
import os
in_Zip = r"C:/Users/desktop/Downloads/happy.zip"
outDir = r"C:/Users/desktop/Downloads"
z = zipfile.ZipFile(in_Zip, 'r')
z.extractall(outDir, pwd='1234!')
z.close
But I got:
"TypeError: pwd: expected bytes, got str"

In Python 2: '1234!' = byte string
In Python 3: '1234!' = unicode string
Assuming you are using Python 3, you need to either use b'1234!' or encode the string to get byte string using str.encode() this is useful if you have the password saved as a string passwd = '1234!' then you can use:
z.extractall(outDir, pwd=passwd.encode())
or use byte string directly:
z.extractall(outDir, pwd=b'1234!')

Please note that this will only work if the zip file has been encrypted using the "Zip legacy encryption" option when setting up the password.

Check Size of File Assigned to a Variable in Python 2

I'm working on some code (that I have been given), as part of a coursework at University.
Part of the code requires that, we check the file we are writing to contains any data.
The file has been opened for writing within the code already:
f = open('newfile.txt', 'w')
Initially, I thought that I would just find the length of the file, however if I try: len(f)>512, I get an error:
TypeError: object of type 'file' has no len()
I have had a bit of a google and found various links, such as here, however when I try using the line: os.stat(f).st_size > 512, I get the following error message:
TypeError: coercing to Unicode: need string or buffer, file found
If I try and use the filename itself: os.stat("newfile.txt").st_size > 512, it works fine.
My question is, is there a way that I can use the variable that the file has been assigned to, f, or is this just not possible?
For context, the function looks like this:
def doData ():
global data, newblock, lastblock, f, port
if f.closed:
print "File " + f.name + " closed"
elif os.stat(f).st_size>512:
f.write(data)
lastblock = newblock
doAckLast()
EDIT: Thanks for the link to the other post Morgan, however this didn't work for me. The main thing is that the programs are still referencing the file by file path and name, whereas, I need to reference it by variable name instead.

According to effbot's Getting Information About a File page,
The os module also provides a fstat function, which can be used on an
opened file. It takes an integer file handle, not a file object, so
you have to use the fileno method on the file object:
This function returns the same values as a corresponding call to os.stat.
f = open("file.dat")
st = os.fstat(f.fileno())
if f.closed:
print "File " + f.name + " closed"
elif st.st_size>512:
f.write(data)
lastblock = newblock
doAckLast()

unzipping file results in "BadZipFile: File is not a zip file"

I have two zip files, both of them open well with Windows Explorer and 7-zip.
However when i open them with Python's zipfile module [ zipfile.ZipFile("filex.zip") ], one of them gets opened but the other one gives error "BadZipfile: File is not a zip file".
I've made sure that the latter one is a valid Zip File by opening it with 7-Zip and looking at its properties (says 7Zip.ZIP). When I open the file with a text editor, the first two characters are "PK", showing that it is indeed a zip file.
I'm using Python 2.5 and really don't have any clue how to go about for this. I've tried it both with Windows as well as Ubuntu and problem exists on both platforms.
Update: Traceback from Python 2.5.4 on Windows:
Traceback (most recent call last):
File "<module1>", line 5, in <module>
zipfile.ZipFile("c:/temp/test.zip")
File "C:\Python25\lib\zipfile.py", line 346, in init
self._GetContents()
File "C:\Python25\lib\zipfile.py", line 366, in _GetContents
self._RealGetContents()
File "C:\Python25\lib\zipfile.py", line 378, in _RealGetContents
raise BadZipfile, "File is not a zip file"
BadZipfile: File is not a zip file
Basically when the _EndRecData function is called for getting data from End of Central Directory" record, the comment length checkout fails [ endrec[7] == len(comment) ].
The values of locals in the _EndRecData function are as following:
END_BLOCK: 4096,
comment: '\x00',
data: '\xd6\xf6\x03\x00\x88,N8?<e\xf0q\xa8\x1cwK\x87\x0c(\x82a\xee\xc61N\'1qN\x0b\x16K-\x9d\xd57w\x0f\xa31n\xf3dN\x9e\xb1s\xffu\xd1\.....', (truncated)
endrec: ['PK\x05\x06', 0, 0, 4, 4, 268, 199515, 0],
filesize: 199806L,
fpin: <open file 'c:/temp/test.zip', mode 'rb' at 0x045D4F98>,
start: 4073

files named file can confuse python - try naming it something else. if it STILL wont work, try this code:
def fixBadZipfile(zipFile):
f = open(zipFile, 'r+b')
data = f.read()
pos = data.find('\x50\x4b\x05\x06') # End of central directory signature
if (pos > 0):
self._log("Trancating file at location " + str(pos + 22)+ ".")
f.seek(pos + 22) # size of 'ZIP end of central directory record'
f.truncate()
f.close()
else:
# raise error, file is truncated

I run into the same issue. My problem was that it was a gzip instead of a zip file. I switched to the class gzip.GzipFile and it worked like a charm.

astronautlevel's solution works for most cases, but the compressed data and CRCs in the Zip can also contain the same 4 bytes. You should do an rfind (not find), seek to pos+20 and then add write \x00\x00 to the end of the file (tell zip applications that the length of the 'comments' section is 0 bytes long).
# HACK: See http://bugs.python.org/issue10694
# The zip file generated is correct, but because of extra data after the 'central directory' section,
# Some version of python (and some zip applications) can't read the file. By removing the extra data,
# we ensure that all applications can read the zip without issue.
# The ZIP format: http://www.pkware.com/documents/APPNOTE/APPNOTE-6.3.0.TXT
# Finding the end of the central directory:
# http://stackoverflow.com/questions/8593904/how-to-find-the-position-of-central-directory-in-a-zip-file
# http://stackoverflow.com/questions/20276105/why-cant-python-execute-a-zip-archive-passed-via-stdin
# This second link is only losely related, but echos the first, "processing a ZIP archive often requires backwards seeking"
content = zipFileContainer.read()
pos = content.rfind('\x50\x4b\x05\x06') # reverse find: this string of bytes is the end of the zip's central directory.
if pos>0:
zipFileContainer.seek(pos+20) # +20: see secion V.I in 'ZIP format' link above.
zipFileContainer.truncate()
zipFileContainer.write('\x00\x00') # Zip file comment length: 0 byte length; tell zip applications to stop reading.
zipFileContainer.seek(0)
return zipFileContainer

I had the same problem and was able to solve this issue for my files, see my answer at
zipfile cant handle some type of zip data?

I'm very new at python and i was facing the exact same issue, none of the previous methods were working.
Trying to print the 'corrupted' file just before unzipping it returned an empty byte object.
Turned out, I was trying to unzip the file right after writing it to disk, without closing the file handler.
with open(path, 'wb') as outFile:
outFile.write(data)
outFile.close() # was missing this
with zipfile.ZipFile(path, 'r') as zip:
zip.extractall(destination)
Closing the file stream then unzipping the file resolved my issue.

Sometime there are zip file which contain corrupted files and upon unzipping the zip gives badzipfile error. but there are tools like 7zip winrar which ignores these errors and successfully unzip the zip file. you can create a sub process and use this code to unzip your zip file without getting BadZipFile Error.
import subprocess
ziploc = "C:/Program Files/7-Zip/7z.exe" #location where 7zip is installed
cmd = [ziploc, 'e',your_Zip_file.zip ,'-o'+ OutputDirectory ,'-r' ]
sp = subprocess.Popen(cmd, stderr=subprocess.STDOUT, stdout=subprocess.PIPE)

Show the full traceback that you got from Python -- this may give a hint as to what the specific problem is. Unanswered: What software produced the bad file, and on what platform?
Update: Traceback indicates having problem detecting the "End of Central Directory" record in the file -- see function _EndRecData starting at line 128 of C:\Python25\Lib\zipfile.py
Suggestions:
(1) Trace through the above function
(2) Try it on the latest Python
(3) Answer the question above.
(4) Read this and anything else found by google("BadZipfile: File is not a zip file") that appears to be relevant

I faced this problem and was looking for a good and clean solution; But there was no solution until I found this answer. I had the same problem that #marsl (among the answers) had. It was a gzipfile instead of a zipfile in my case.
I could unarchive and decompress my gzipfile with this approach:
with tarfile.open(archive_path, "r:gz") as gzip_file:
gzip_file.extractall()

Have you tried a newer python, or if that is too much trouble, simply a newer zipfile.py? I have successfully used a copy of zipfile.py from Python 2.6.2 (latest at the time) with Python 2.5 in order to open some zip files that weren't supported by Py2.5s zipfile module.

In some cases, you have to confirm if the zip file is actually in gzip format. this was the case for me and i solved it by :
import requests
import tarfile
url = ".tar.gz link"
response = requests.get(url, stream=True)
file = tarfile.open(fileobj=response.raw, mode="r|gz")
file.extractall(path=".")

for this this happened when the file wasn't downloaded fully I think. So I just delete it in my download code.
def download_and_extract(url: str,
path_used_for_zip: Path = Path('~/data/'),
path_used_for_dataset: Path = Path('~/data/tmp/'),
rm_zip_file_after_extraction: bool = True,
force_rewrite_data_from_url_to_file: bool = False,
clean_old_zip_file: bool = False,
gdrive_file_id: Optional[str] = None,
gdrive_filename: Optional[str] = None,
):
"""
Downloads data and tries to extract it according to different protocols/file types.
note:
- to force a download do:
force_rewrite_data_from_url_to_file = True
clean_old_zip_file = True
- to NOT remove file after extraction:
rm_zip_file_after_extraction = False
Tested with:
- zip files, yes!
Later:
- todo: tar, gz, gdrive
force_rewrite_data_from_url_to_file = remvoes the data from url (likely a zip file) and redownloads the zip file.
"""
path_used_for_zip: Path = expanduser(path_used_for_zip)
path_used_for_zip.mkdir(parents=True, exist_ok=True)
path_used_for_dataset: Path = expanduser(path_used_for_dataset)
path_used_for_dataset.mkdir(parents=True, exist_ok=True)
# - download data from url
if gdrive_filename is None: # get data from url, not using gdrive
import ssl
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
print("downloading data from url: ", url)
import urllib
import http
response: http.client.HTTPResponse = urllib.request.urlopen(url, context=ctx)
print(f'{type(response)=}')
data = response
# save zipfile like data to path given
filename = url.rpartition('/')[2]
path2file: Path = path_used_for_zip / filename
else: # gdrive case
from torchvision.datasets.utils import download_file_from_google_drive
# if zip not there re-download it or force get the data
path2file: Path = path_used_for_zip / gdrive_filename
if not path2file.exists():
download_file_from_google_drive(gdrive_file_id, path_used_for_zip, gdrive_filename)
filename = gdrive_filename
# -- write downloaded data from the url to a file
print(f'{path2file=}')
print(f'{filename=}')
if clean_old_zip_file:
path2file.unlink(missing_ok=True)
if filename.endswith('.zip') or filename.endswith('.pkl'):
# if path to file does not exist or force to write down the data
if not path2file.exists() or force_rewrite_data_from_url_to_file:
# delete file if there is one if your going to force a rewrite
path2file.unlink(missing_ok=True) if force_rewrite_data_from_url_to_file else None
print(f'about to write downloaded data from url to: {path2file=}')
# wb+ is used sinze the zip file was in bytes, otherwise w+ is fine if the data is a string
with open(path2file, 'wb+') as f:
# with open(path2file, 'w+') as f:
print(f'{f=}')
print(f'{f.name=}')
f.write(data.read())
print(f'done writing downloaded from url to: {path2file=}')
elif filename.endswith('.gz'):
pass # the download of the data doesn't seem to be explicitly handled by me, that is done in the extract step by a magic function tarfile.open
# elif is_tar_file(filename):
# os.system(f'tar -xvzf {path_2_zip_with_filename} -C {path_2_dataset}/')
else:
raise ValueError(f'File type {filename=} not supported.')
# - unzip data written in the file
extract_to = path_used_for_dataset
print(f'about to extract: {path2file=}')
print(f'extract to target: {extract_to=}')
if filename.endswith('.zip'):
import zipfile # this one is for zip files, inspired from l2l
zip_ref = zipfile.ZipFile(path2file, 'r')
zip_ref.extractall(extract_to)
zip_ref.close()
if rm_zip_file_after_extraction:
path2file.unlink(missing_ok=True)
elif filename.endswith('.gz'):
import tarfile
file = tarfile.open(fileobj=response, mode="r|gz")
file.extractall(path=extract_to)
file.close()
elif filename.endswith('.pkl'):
# no need to extract it, but when you use the data make sure you torch.load it or pickle.load it.
print(f'about to test torch.load of: {path2file=}')
data = torch.load(path2file) # just to test
assert data is not None
print(f'{data=}')
pass
else:
raise ValueError(f'File type {filename=} not supported, edit code to support it.')
# path_2_zip_with_filename = path_2_ziplike / filename
# os.system(f'tar -xvzf {path_2_zip_with_filename} -C {path_2_dataset}/')
# if rm_zip_file:
# path_2_zip_with_filename.unlink(missing_ok=True)
# # raise ValueError(f'File type {filename=} not supported.')
print(f'done extracting: {path2file=}')
print(f'extracted at location: {path_used_for_dataset=}')
print(f'-->Succes downloading & extracting dataset at location: {path_used_for_dataset=}')
you can use my code with pip install ultimate-utils for the most up to date version.

In the other case, this warning showing up when the ml/dl model has different format.
For the example:
you want to open pickle, but the model format is .sav
Solution:
you need to change the format to original format
pickle --> .pkl
tensorflow --> .h5
etc.

In my case, the zip file itself was missing from that directory - thus when I tried to unzip it, I got the error "BadZipFile: File is not a zip file". It got resolved after I moved the .zip file to the directory. Please confirm that the file is indeed present in your directory before running the python script.

In my case, the zip file was corrupted. I was trying to download the zip file with urllib.request.urlretrieve but the file wouldn't completely download for some reason.
I connected to a VPN, the file downloaded just fine, and I was able to open the file.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.