Python ZipFile: remove embedded archive from a containing file - python

There is a technique of storing a ZIP archive concatenated with some other file (e.g. with an EXE to store additional resources, or with a JPEG for steganography). Python's ZipFile supports such files (e.g. if you open a ZipFile in "a" mode on a non-ZIP file, it appends the ZIP data to the end). I would like to update such an archive (possibly adding, updating, and deleting files in the ZIP archive).
Python's ZipFile doesn't support deleting or overwriting files inside the archive, only appending, so the only way for me is to completely recreate the ZIP archive with new contents. But I need to preserve the main file in which the ZIP was embedded. If I just open it in "w" mode, the whole file is completely overwritten.
I need a way to remove a ZIP archive from the end of an ordinary file. I'd prefer to use only functions that are available in the Python 3 standard library.

I found a solution:
    from zipfile import ZipFile

    # Find where the embedded archive starts: the smallest local header offset
    min_header_offset = None
    with ZipFile(output_filename, "r") as zip_file:
        for info in zip_file.infolist():
            if min_header_offset is None or info.header_offset < min_header_offset:
                min_header_offset = info.header_offset
        # Existing entries can also be saved here if they are needed for the update
    if min_header_offset is not None:
        # Cut the file right before the archive, keeping the containing file intact
        with open(output_filename, "r+b") as f:
            f.truncate(min_header_offset)
    # Populate the new archive contents
    with ZipFile(output_filename, "a") as zip_file:
        for input_filename in input_filenames:
            zip_file.write(input_filename)

It clears the archive but doesn't touch anything that comes before the archive.
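The same idea can be packaged as a small helper and checked end to end. This is only a sketch: the names `strip_embedded_zip` and `demo`, the file name `host.bin`, and the stand-in prefix bytes are made up for illustration, and it assumes the archive was appended to the host file so that `header_offset` is an absolute position in the file.

```python
import os
import tempfile
from zipfile import ZipFile


def strip_embedded_zip(path):
    """Truncate `path` at the first local file header of its trailing ZIP archive.

    Returns the offset the file was cut at, or None if the archive has no entries.
    """
    with ZipFile(path, "r") as zf:
        offsets = [info.header_offset for info in zf.infolist()]
    if not offsets:
        return None
    cut = min(offsets)
    with open(path, "r+b") as f:
        f.truncate(cut)
    return cut


def demo():
    prefix = b"HOST-FILE-BYTES"  # stand-in for the EXE/JPEG content
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "host.bin")
        with open(path, "wb") as f:
            f.write(prefix)
        # Append an archive to the non-ZIP file, then strip it again.
        with ZipFile(path, "a") as zf:
            zf.writestr("old.txt", "old")
        strip_embedded_zip(path)
        with open(path, "rb") as f:
            restored = f.read() == prefix
        # Append a fresh archive; the host bytes stay at the front.
        with ZipFile(path, "a") as zf:
            zf.writestr("new.txt", "new")
        with ZipFile(path, "r") as zf:
            names = zf.namelist()
        with open(path, "rb") as f:
            prefix_kept = f.read().startswith(prefix)
    return restored, names, prefix_kept
```

Calling `demo()` exercises the full cycle: append, strip, append again.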

Related

How to avoid subfolders creation while zipping files?

I am trying to zip the files from the list localpath_list into one zip file, reports.zip.
It works as expected, but when I extract reports.zip, folders are created inside it,
i.e. all the .xls files end up under files/sample/.
What I need is just the .xls files, without any folder structure.
    import zipfile

    localpath_list = ["files/sample/sample1.xls", "files/sample/sample2.xls", "files/sample/sample3.xls"]
    with zipfile.ZipFile("downloads/reports.zip", 'w') as zipF:
        for file in localpath_list:
            zipF.write(file, compress_type=zipfile.ZIP_DEFLATED)
According to [Python.Docs]: zipfile - ZipFile.write(filename, arcname=None, compress_type=None, compresslevel=None) (emphasis is mine):
    Write the file named filename to the archive, giving it the archive name arcname (by default, this will be the same as filename, ...
So you should use (with os imported):
    zipF.write(file, arcname=os.path.basename(file), compress_type=zipfile.ZIP_DEFLATED)
If you plan to extend this to include files from multiple folders, pay attention to duplicate file base names (in different folders).
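One possible way to handle such duplicate base names is to add a numeric suffix when a name is already taken. The sketch below is an illustration only: `zip_flat`, `demo`, and the `_1`, `_2` suffix scheme are assumptions, not part of the zipfile API.

```python
import os
import tempfile
import zipfile


def zip_flat(paths, zip_path):
    """Store `paths` in `zip_path` without folders, renaming on name clashes."""
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            stem, ext = os.path.splitext(os.path.basename(path))
            arcname, n = stem + ext, 1
            # Try report.xls, then report_1.xls, report_2.xls, ...
            while arcname in names:
                arcname = f"{stem}_{n}{ext}"
                n += 1
            names.append(arcname)
            zf.write(path, arcname=arcname)
    return names


def demo():
    # Two files with the same base name in different folders.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for folder in ("a", "b"):
            os.makedirs(os.path.join(tmp, folder))
            path = os.path.join(tmp, folder, "report.xls")
            with open(path, "w") as f:
                f.write(folder)
            paths.append(path)
        zip_path = os.path.join(tmp, "reports.zip")
        zip_flat(paths, zip_path)
        with zipfile.ZipFile(zip_path) as zf:
            return zf.namelist()
```

Without the rename step, zipfile would store two entries under the same name and emit a warning, and most extractors would keep only one of them.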

How can I save all the generated images from the following code to a Zip file using python?

I need to decode the data in the content_arrays list and generate an image from each item. The following code does that:
    import base64

    content_arrays = ['ljfdslkfjaslkfjsdlf', 'sdfasfsdfsdfsafs']  # list of base64-encoded data (placeholders)
    i = 0
    for content in content_arrays:
        img_data = content_arrays[i]
        filename = f"image{i + 1}.png"  # image1.png, image2.png, ...
        with open(filename, "wb") as fh:
            fh.write(base64.b64decode(img_data))
        i = i + 1
How can I store all the generated images directly in a single zip file, containing every image produced by decoding the base64 strings in the list above (content_arrays)?
Current file structure of the downloaded data:
    -- Desktop
        -- image1.png
        -- image2.png
Required file structure of the downloaded data:
    -- Desktop
        -- Data.zip
            -- image1.png
            -- image2.png
I've tried the Python zipfile module but couldn't figure it out.
If there is any way to do this, please share your suggestions.
You can just use the zipfile module and write each item to a separate file inside the zip. In this example I write the content to one file in the zip for each item in the contents list. I use the writestr method so I don't need physical files on disk: I can create the content in memory and write it straight into the zip, rather than first creating a file on the OS and then adding that file to the zip.
    from zipfile import ZipFile

    with ZipFile("data.zip", "w") as my_zip:
        content_arrays = ['ljfdslkfjaslkfjsdlf', 'sdfasfsdfsdfsafs']
        for index, content in enumerate(content_arrays):
            # do whatever you need to do here with your content
            my_zip.writestr(f'file_{index}.txt', content)
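Applied to the question, the same writestr approach can take the decoded bytes directly. A minimal sketch, assuming the entries should be named image1.png, image2.png as in the question; `zip_base64_images` is a hypothetical helper and the sample payloads are stand-in bytes, not real PNG data.

```python
import base64
import io
from zipfile import ZipFile, ZIP_DEFLATED


def zip_base64_images(encoded_items, zip_target):
    """Decode each base64 string and store it as imageN.png in one archive."""
    with ZipFile(zip_target, "w", compression=ZIP_DEFLATED) as archive:
        for index, encoded in enumerate(encoded_items, start=1):
            archive.writestr(f"image{index}.png", base64.b64decode(encoded))


# Stand-in payloads, encoded only to exercise the function.
samples = [
    base64.b64encode(b"first image bytes").decode("ascii"),
    base64.b64encode(b"second image bytes").decode("ascii"),
]
buffer = io.BytesIO()  # in-memory target; a path such as "Data.zip" works too
zip_base64_images(samples, buffer)

with ZipFile(buffer) as archive:
    names = archive.namelist()
    first = archive.read("image1.png")
```

Nothing is written to disk apart from the archive itself (and here not even that, since the target is a BytesIO object).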
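In your case, if the images already exist on disk, you can iterate over the list of filenames:
    from zipfile import ZipFile

    # filenames: the list of image paths to add
    with ZipFile('images.zip', 'w') as zip_obj:
        # Add multiple files to the zip
        for filename in filenames:
            zip_obj.write(filename)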

Is it possible to collect comment data from multiple zip files without unzipping?

Hello, is it possible to collect the comment data from multiple zip files (the optional comment you see on the side when opening a Zip or Rar file)?
And if so, where exactly does the comment get stored?
You can do something like:
    from zipfile import ZipFile

    zipfiles = ["example.zip",]
    for zfile in zipfiles:
        print("Opening: {}".format(zfile))
        with ZipFile(zfile, 'r') as testzip:
            print(testzip.comment)  # comment for the entire zip
            l = testzip.infolist()  # list all files in the archive
            for finfo in l:
                # per file/directory comments
                print("{}:{}".format(finfo.filename, finfo.comment))
Check http://www.artpol-software.com/ZipArchive/KB/0610242300.aspx for more information on how and where metadata is stored in zip files.
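As for where the comment lives: in the ZIP format the archive-level comment is the last field of the end-of-central-directory record at the very end of the file, and per-entry comments are stored in the central directory, so both can be read without extracting anything. The sketch below builds a small archive in memory to show this; the entry name and comment texts are made up.

```python
import io
from zipfile import ZipFile, ZipInfo

buffer = io.BytesIO()
with ZipFile(buffer, "w") as zf:
    info = ZipInfo("a.txt")
    info.comment = b"per-file note"   # stored in the central directory
    zf.writestr(info, "hello")
    zf.comment = b"archive note"      # stored in the end-of-central-directory record

raw = buffer.getvalue()
# The archive comment is the final bytes of the file.
tail_is_comment = raw.endswith(b"archive note")

with ZipFile(io.BytesIO(raw)) as zf:
    archive_comment = zf.comment
    entry_comment = zf.getinfo("a.txt").comment
```

Note that zipfile exposes both comments as bytes, so decode them if you need text.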

Unzip folder by chunks in python

I have a big zip file containing many files that I'd like to unzip in chunks to avoid consuming too much memory.
I tried the Python zipfile module, but I didn't find a way to load the archive in chunks and extract it to disk.
Is there a simple way to do that in Python?
EDIT
@steven-rumbalski correctly pointed out that zipfile already handles big files by unzipping the files one by one without loading the full archive.
My problem is that my zip file is on AWS S3 and my EC2 instance cannot load such a big file into RAM, so I download it in chunks and would like to unzip it chunk by chunk.
You don't need a special way to extract a large archive to disk. The source Lib/zipfile.py shows that zipfile is already memory efficient. Creating a zipfile.ZipFile object does not read the whole file into memory. Rather it just reads in the table of contents for the ZIP file. ZipFile.extractall() extracts files one at a time using shutil.copyfileobj() copying from a subclass of io.BufferedIOBase.
If all you want to do is a one-time extraction Python provides a shortcut from the command line:
python -m zipfile -e archive.zip target-dir/
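To make the memory behaviour concrete, here is a rough sketch of copying a single member with a bounded buffer, using ZipFile.open() and shutil.copyfileobj() as described above. `stream_member`, the 8 KiB chunk size, and the in-memory test archive are illustrative assumptions.

```python
import io
import shutil
from zipfile import ZipFile


def stream_member(archive, member, destination, chunk_size=64 * 1024):
    """Copy one member to a writable binary file object, chunk_size bytes at a time."""
    with archive.open(member) as source:
        shutil.copyfileobj(source, destination, chunk_size)


# Build a small archive in memory so the example is self-contained.
payload = b"x" * 200_000
source_zip = io.BytesIO()
with ZipFile(source_zip, "w") as zf:
    zf.writestr("big.bin", payload)

out = io.BytesIO()
with ZipFile(source_zip) as zf:
    for info in zf.infolist():
        stream_member(zf, info, out, chunk_size=8192)
copied = out.getvalue()
```

With a real file opened in "wb" mode as the destination, only about one chunk of decompressed data is held in memory at a time.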
You can use zipfile (or possibly tarfile) as follows:
    import zipfile

    def extract_chunk(fn, directory, ix_begin, ix_end):
        # Extract only the members with index in [ix_begin, ix_end)
        with zipfile.ZipFile(fn, 'r') as zf:
            infos = zf.infolist()
            for ix in range(max(0, ix_begin), min(ix_end, len(infos))):
                zf.extract(infos[ix], directory)

    directory = "path"
    extract_chunk("{}/file.zip".format(directory), directory, 0, 50)

How to create an empty zip file?

I'm using zipfile, and under some circumstances I need to create an empty zip file as a placeholder. How can I do this?
I know this:
Changed in version 2.7.1: If the file is created with mode 'a' or 'w'
and then closed without adding any files to the archive, the
appropriate ZIP structures for an empty archive will be written to the
file.
but my server uses a lower version, 2.6.
You can create an empty zip file without using zipfile at all:
    empty_zip_data = b'PK\x05\x06\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
    with open('empty.zip', 'wb') as f:
        f.write(empty_zip_data)
empty_zip_data is the raw content of an empty zip file.
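Those 22 bytes are just an end-of-central-directory record with every count and offset set to zero. A quick check, sketched here with Python 3's zipfile and an in-memory buffer, suggests the module accepts them as a valid archive with no entries; `EMPTY_ZIP` is simply a shorter spelling of the same bytes.

```python
import io
import zipfile

# Signature "PK\x05\x06" followed by 18 zero bytes: no entries, no comment.
EMPTY_ZIP = b"PK\x05\x06" + b"\x00" * 18

buffer = io.BytesIO(EMPTY_ZIP)
is_zip = zipfile.is_zipfile(buffer)
with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
size = len(EMPTY_ZIP)
```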
You can simply do:
    from zipfile import ZipFile

    archive_name = 'test_file.zip'
    with ZipFile(archive_name, 'w') as file:
        pass
