How to zip a file in Python?

I have been trying to make a Python script to zip a file with the zipfile module. Although the text file is made into a zip file, it doesn't seem to be compressing it: testtext.txt is 1024 KB, and testtext.zip (the file my code creates) is also 1024 KB. However, if I compress testtext.txt manually in File Explorer, the resulting zip file is compressed (to 2 KB, specifically). How, if possible, can I fix this?
Below is the script that I have used to (unsuccessfully) zip a text file.
from zipfile import ZipFile
textFile = ZipFile("compressedtextstuff.zip", "w")
textFile.write("testtext.txt")
textFile.close()

Well, that's odd. Python's zipfile defaults to the ZIP_STORED compression method, which does not compress at all! (Why would they do that?)
You need to specify a compression method. Use ZIP_DEFLATED, which is the most widely supported.
import zipfile
zf = zipfile.ZipFile("stuff.zip", "w", zipfile.ZIP_DEFLATED)
zf.write("test.txt")
zf.close()
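To check that the compression argument is what makes the difference, here is a small sketch (the sample data and archive names are made up for illustration) that writes the same file once with ZIP_STORED and once with ZIP_DEFLATED, then reads back the sizes recorded in each archive's ZipInfo entry:

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample: about 1 MiB of highly repetitive text, like testtext.txt.
    sample = os.path.join(tmp, "testtext.txt")
    with open(sample, "wb") as f:
        f.write(b"hello world\n" * 87382)

    sizes = {}
    for method in (zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED):
        archive = os.path.join(tmp, f"probe_{method}.zip")
        # compresslevel (Python 3.7+) is ignored when method is ZIP_STORED.
        with zipfile.ZipFile(archive, "w", compression=method, compresslevel=9) as zf:
            zf.write(sample, arcname="testtext.txt")
        # Read back the original and compressed sizes stored in the archive.
        with zipfile.ZipFile(archive) as zf:
            info = zf.getinfo("testtext.txt")
            sizes[method] = (info.file_size, info.compress_size)

for method, (raw, packed) in sizes.items():
    name = "ZIP_STORED" if method == zipfile.ZIP_STORED else "ZIP_DEFLATED"
    print(f"{name}: {raw} bytes -> {packed} bytes")
```

With ZIP_STORED the compressed size equals the original size; with ZIP_DEFLATED, repetitive text like this should shrink to a tiny fraction of it.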

The zipfile documentation (https://docs.python.org/3/library/zipfile.html#zipfile-objects) suggests this example:
with ZipFile('spam.zip', 'w') as myzip:
    myzip.write('eggs.txt')
So your code will be:
from zipfile import ZipFile, ZIP_DEFLATED
with ZipFile('compressedtextstuff.zip', 'w', ZIP_DEFLATED) as myzip:
    myzip.write('testtext.txt')

https://docs.python.org/3/library/zipfile.html#:~:text=with%20ZipFile(%27spam.zip%27%2C%20%27w%27)%20as%20myzip%3A%0A%20%20%20%20myzip.write(%27eggs.txt%27)
In the docs it's written with a with statement, so I would try that first.
Edit:
I just came back to say that you have to specify your compression method, but Mark beat me to the punch.
Here is a link to a Stack Overflow post about it:
https://stackoverflow.com/questions/4166447/python-zipfile-module-doesnt-seem-to-be-compressing-my-files#:~:text=This%20is%20because%20ZipFile%20requires,the%20method%20to%20be%20zipfile.

Related

Force python to read file with arbitrary extension as .txt file from archive using zipfile [duplicate]

So say I have a zip file named "files.zip" that contains two files:
"text1.txt", containing: words
"text2.txt", containing: other words
How do I tell Python to open and read the text1.txt file? I know that usually, to open a text file outside of a zip file, I would just do this:
file = open('text1.txt','r')
If you need to open a file inside a ZIP archive in text mode, e.g. to pass it to csv.reader, you can do so with io.TextIOWrapper:
import io
import zipfile
with zipfile.ZipFile("files.zip") as zf:
    with io.TextIOWrapper(zf.open("text1.txt"), encoding="utf-8") as f:
        ...
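As a sketch of the csv.reader case (the archive and its data.csv member are hypothetical and built in memory so the example runs on its own):

```python
import csv
import io
import zipfile

# Build a hypothetical archive in memory so the sketch is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.csv", "name,qty\napple,3\npear,5\n")

with zipfile.ZipFile(buf) as zf:
    # zf.open() yields bytes; TextIOWrapper decodes them as they are read.
    # newline="" is what the csv module expects for files passed to csv.reader.
    with io.TextIOWrapper(zf.open("data.csv"), encoding="utf-8", newline="") as f:
        rows = list(csv.reader(f))

print(rows)  # [['name', 'qty'], ['apple', '3'], ['pear', '5']]
```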
You can use the zipfile module like so (ZipFile.read returns bytes, so decode them to get text; UTF-8 is assumed here):
import zipfile
with zipfile.ZipFile('test.zip') as zf:
    text = zf.read('text1.txt').decode('utf-8')
Since Python 3.8, it's been possible to construct Path objects for zipfile contents, and use their read_text method to read them as text. Since Python 3.9 it's been possible to specify text mode in the path object's open method.
import zipfile
with zipfile.ZipFile('spam.zip') as zf:
    # Create a path object.
    path = zipfile.Path(zf, at='somedir/somefile.txt')
    # Read all the contents (Python 3.8+):
    contents = path.read_text(encoding='UTF-8')
    # Or open it as a file (Python 3.9+):
    with path.open(encoding='UTF-8') as f:
        # Do stuff
        ...
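A self-contained sketch (Python 3.8+) that builds a hypothetical archive in memory and reads a member with a non-.txt extension as text; zipfile.Path supports the / operator and iterdir() much like pathlib.Path:

```python
import io
import zipfile

# Hypothetical archive built in memory: one file in a subfolder and one
# text file whose extension is not .txt.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("somedir/somefile.txt", "hello from the archive\n")
    zf.writestr("notes.cfg", "key=value\n")

with zipfile.ZipFile(buf) as zf:
    root = zipfile.Path(zf)
    # The extension doesn't matter: read_text() decodes any member as text.
    cfg_text = (root / "notes.cfg").read_text(encoding="utf-8")
    # Navigate into subfolders with "/", like pathlib.Path.
    nested_text = (root / "somedir" / "somefile.txt").read_text(encoding="utf-8")
    # iterdir() lists only the top level ("somedir" shows up as a directory).
    top_level = sorted(p.name for p in root.iterdir())

print(cfg_text, nested_text, top_level)
```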

Large Zip Files with Zipfile Module Python

I have never used the zipfile module before. I have a directory that contains thousands of zip files I need to process. These files can be up to 6 GB. I have looked through some documentation, but a lot of it is not clear on the best methods for reading large zip files without needing to extract them.
I stumbled upon this: Read a large zipped text file line by line in python
So in my solution I tried to emulate it and use it the way I would when reading a normal text file with the built-in open function:
with open(odfslogp_obj, 'rb', buffering=102400) as odfslog:
    ...
So I wrote the following based off the answer from that link:
for odfslogp_obj in odfslogs_plist:
    with zipfile.ZipFile(odfslogp_obj, mode='r') as z:
        with z.open(buffering=102400) as f:
            for line in f:
                print(line)
But this gives me an "unexpected keyword" error for z.open()
The question is: is there documentation that explains which keywords the z.open() function takes? I only found one for the ZipFile() constructor.
I want to make sure my code isn't using up too much memory while processing these files line by line.
odfslogp_obj is a Path object btw
When I take off the buffering and just have z.open(), I get an error saying: TypeError: open() missing 1 required positional argument: 'name'
Once you've opened the zipfile, you still need to open the individual files it contains. That's the second z.open you had problems with. It's not the built-in Python open, and it doesn't have a "buffering" parameter. See ZipFile.open.
Once the zipfile is opened, you can enumerate its files and open them in turn. ZipFile.open opens in binary mode, which may be a different problem, depending on what you want to do with the file.
for odfslogp_obj in odfslogs_plist:
    with zipfile.ZipFile(odfslogp_obj, mode='r') as z:
        for name in z.namelist():
            with z.open(name) as f:
                for line in f:
                    print(line)
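Building on that, here is a sketch of a generator that streams every member line by line as text (iter_lines and the sample archive are hypothetical names for illustration; UTF-8 is an assumed encoding):

```python
import io
import os
import tempfile
import zipfile


def iter_lines(zip_path, encoding="utf-8"):
    """Yield (member_name, line) for every line of every file in one archive.

    ZipFile.open() returns a file-like object that decompresses on demand,
    so memory use is bounded by the read buffers and the longest line,
    not by the size of the member.
    """
    with zipfile.ZipFile(zip_path) as z:
        for info in z.infolist():
            if info.is_dir():
                continue  # directory entries have no content to read
            # ZipFile.open(name, mode="r", pwd=None, *, force_zip64=False)
            # always returns a binary stream; wrap it to get decoded text lines.
            with io.TextIOWrapper(z.open(info), encoding=encoding) as f:
                for line in f:
                    yield info.filename, line.rstrip("\n")


# Usage with a hypothetical archive of two small log files.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "logs.zip")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("a.log", "first\nsecond\n")
        z.writestr("sub/b.log", "third\n")
    collected = list(iter_lines(archive))

for name, line in collected:
    print(name, line)
```

If you don't need text, iterate over z.open(info) directly and you'll get bytes lines instead, skipping the decoding step.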

Does gzip have an extract method like tarfile.extract?

For example, tarfile.extractall(path) extracts the contents to the specified directory. Similarly, does gzip have any extract method to write the gzip member (which, per the standard, contains only one file) to a specified directory, or is there a workaround?
Edit: I don't want to read the complete file into memory.
No. Since gzip is only a compression format and not an archive format, there are no extract methods in the gzip module or in the gzip.GzipFile class. But there is no reason to load the complete file into memory; you can just copy it in chunks. The manual gives an example of how to compress a file, and it can easily be adapted to uncompress one:
import gzip
import shutil
# Open the .gz input first so a missing source doesn't leave an empty output file behind
with gzip.open('/home/joe/file.txt.gz', 'rb') as f_in:
    with open('/home/joe/file.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
shutil.copyfileobj copies the data in chunks, so the whole file never has to be in memory at once.
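If you want something closer to tarfile.extractall, the same idea can be wrapped in a small helper (gunzip_to_dir is a hypothetical name; deriving the output name by stripping .gz is an assumption, since the original name optionally stored in the gzip header is ignored here):

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_to_dir(gz_path, dest_dir, chunk_size=1024 * 1024):
    """Decompress one .gz file into dest_dir without loading it all into memory."""
    gz_path = Path(gz_path)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: output name = input name minus ".gz" ("file.txt.gz" -> "file.txt").
    out_name = gz_path.stem if gz_path.suffix == ".gz" else gz_path.name + ".out"
    out_path = dest_dir / out_name
    with gzip.open(gz_path, "rb") as f_in, open(out_path, "wb") as f_out:
        # Copy chunk_size bytes at a time instead of reading everything.
        shutil.copyfileobj(f_in, f_out, chunk_size)
    return out_path


# Usage with a hypothetical file created on the fly.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "file.txt.gz"
    with gzip.open(src, "wb") as f:
        f.write(b"some text\n" * 1000)
    result = gunzip_to_dir(src, Path(tmp) / "extracted")
    result_name = result.name
    restored = result.read_bytes()
```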

How to compress a processed text file in Python?

I have a text file that I constantly append data to. When processing is done, I need to gzip the file. I tried several options like shutil.make_archive, tarfile, and gzip, but in the end I couldn't get it to work. Is there no simple way to compress a file without actually writing to it?
Let's say I have mydata.txt file and I want it to be gzipped and saved as mydata.txt.gz.
I don't see the problem. You should be able to use e.g. the gzip module just fine, something like this:
import gzip
with open("mydata.txt", "rb") as inf, gzip.open("mydata.txt.gz", "wb") as outf:
    outf.write(inf.read())
The original file won't be overwritten: the name given to gzip.open() is completely independent of the name given to plain open().
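For files too large to read() in one go, the same thing can be done in chunks with shutil.copyfileobj. A sketch (gzip_file is a hypothetical helper name) that leaves the original file in place and writes mydata.txt.gz next to it:

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_file(src, compresslevel=9):
    """Write src + '.gz' next to src, streaming so the file is never fully in memory.

    The original file is left untouched.
    """
    src = Path(src)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as f_in, gzip.open(dst, "wb", compresslevel=compresslevel) as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


# Usage with a hypothetical data file.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "mydata.txt"
    data_file.write_text("row\n" * 10_000)
    gz = gzip_file(data_file)
    gz_name = gz.name
    original = data_file.read_bytes()
    roundtrip = gzip.decompress(gz.read_bytes())
```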
If you want to compress a file without writing to it yourself, you could run a shell command such as gzip from Python with the subprocess module (or the older os.popen or os.system).

Using GZIP Module with Python

I'm trying to use the Python GZIP module to simply uncompress several .gz files in a directory. Note that I do not want to read the files, only uncompress them. After searching this site for a while, I have this code segment, but it does not work:
import gzip
import glob
import os
import shutil
for file in glob.glob(PATH_TO_FILE + "/*.gz"):
    #print file
    if os.path.isdir(file) == False:
        shutil.copy(file, FILE_DIR)
        # uncompress the file
        inF = gzip.open(file, 'rb')
        s = inF.read()
        inF.close()
The .gz files are in the correct location, and I can print the full path + filename with the print command, but the gzip module isn't getting executed properly. What am I missing?
If you get no error, the gzip module probably is being executed properly, and the file is already getting decompressed.
The precise definition of "decompressed" depends on context:
I do not want to read the files, only uncompress them
The gzip module doesn't work as a desktop archiving program like 7-zip - you can't "uncompress" a file without "reading" it. Note that "reading" (in programming) usually just means "storing (temporarily) in the computer RAM", not "opening the file in the GUI".
What you probably mean by "uncompress" (as in a desktop archiving program) is more precisely described (in programming) as "read an in-memory stream/buffer from a compressed file, and write it to a new file (and possibly delete the compressed file afterwards)".
inF = gzip.open(file, 'rb')
s = inF.read()
inF.close()
With these lines, you're just reading the stream. If you expect a new "uncompressed" file to be created, you just need to write the buffer to a new file:
with open(out_filename, 'wb') as out_file:
    out_file.write(s)
If you're dealing with very large files (larger than the amount of your RAM), you'll need to adopt a different approach. But that is the topic for another question.
You're decompressing the file into the s variable and doing nothing with it. You should stop searching Stack Overflow and read at least the Python tutorial. Seriously.
Anyway, there are several things wrong with your code:
you need to STORE the unzipped data in s into some file.
there's no need to copy the actual *.gz files, because in your code you're unpacking the original gzip file and not the copy.
you're using file, which shadows the built-in file type in Python 2, as a variable. This is not an error, just a very bad practice.
This should probably do what you wanted:
import gzip
import glob
import os
import os.path
for gzip_path in glob.glob(PATH_TO_FILE + "/*.gz"):
    if not os.path.isdir(gzip_path):
        inF = gzip.open(gzip_path, 'rb')
        # uncompress the gzip_path INTO THE 's' variable
        s = inF.read()
        inF.close()
        # get gzip filename (without directories)
        gzip_fname = os.path.basename(gzip_path)
        # get original filename (remove 3 characters from the end: ".gz")
        fname = gzip_fname[:-3]
        uncompressed_path = os.path.join(FILE_DIR, fname)
        # store uncompressed file data from 's' variable ('wb' because s holds bytes)
        with open(uncompressed_path, 'wb') as out_file:
            out_file.write(s)
You should use with to open files and, of course, store the result of reading the compressed file. See the gzip documentation:
import gzip
import glob
import os
import os.path
for gzip_path in glob.glob("%s/*.gz" % PATH_TO_FILE):
    if not os.path.isdir(gzip_path):
        with gzip.open(gzip_path, 'rb') as in_file:
            s = in_file.read()
        # Now store the uncompressed data
        gzip_fname = os.path.basename(gzip_path)
        path_to_store = gzip_fname[:-3]  # remove the '.gz' from the filename
        # store uncompressed file data from 's' variable ('wb' because s holds bytes)
        with open(path_to_store, 'wb') as f:
            f.write(s)
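Putting the advice together, a sketch that decompresses every *.gz file in a directory into a working directory without holding any whole file in memory (gunzip_all is a hypothetical name; src_dir and dest_dir play the roles of PATH_TO_FILE and FILE_DIR from the question):

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_all(src_dir, dest_dir):
    """Decompress every *.gz file in src_dir into dest_dir, streaming each one.

    Returns the list of files written. Subdirectories are skipped, and the
    .gz originals are left where they are.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for gz_path in sorted(src_dir.glob("*.gz")):
        if not gz_path.is_file():
            continue
        out_path = dest_dir / gz_path.stem  # "a.txt.gz" -> "a.txt"
        with gzip.open(gz_path, "rb") as f_in, open(out_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)  # chunked copy
        written.append(out_path)
    return written


# Usage with hypothetical directories and files.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "incoming"
    src.mkdir()
    for name, payload in [("a.txt.gz", b"alpha\n"), ("b.log.gz", b"beta\n")]:
        with gzip.open(src / name, "wb") as f:
            f.write(payload)
    outputs = gunzip_all(src, Path(tmp) / "work")
    results = {p.name: p.read_bytes() for p in outputs}
```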
Depending on what exactly you want to do, you might want to have a look at tarfile and its 'r:gz' option for opening files (relevant if the .gz files are actually gzipped tar archives).
I was able to resolve this issue by using the subprocess module:
import glob
import os
import shutil
import subprocess
for file in glob.glob(PATH_TO_FILE + "/*.gz"):
    if not os.path.isdir(file):
        shutil.copy(file, FILE_DIR)
        # uncompress the copy with the external gunzip command
        subprocess.call(["gunzip", os.path.join(FILE_DIR, os.path.basename(file))])
Since my goal was simply to uncompress the archives, the above code accomplishes this. The archived files are located in a central location and are copied to a working area, uncompressed, and used in a test case. The gzip module was too complicated for what I was trying to accomplish.
Thanks for everyone's help. It is much appreciated!
I think there is a much simpler solution than the others presented, given the OP only wanted to extract all the files in a directory:
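import glob
from setuptools import archive_util
for fn in glob.glob('*.gz'):
    archive_util.unpack_archive(fn, '.')
Note that archive_util only unpacks zip and tar archives (tarfile detects the gzip compression of a .tar.gz), so this works when the .gz files are gzipped tarballs; a plain single-file .gz raises archive_util.UnrecognizedFormat.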
