I'm using zipfile, and under some circumstances I need to create an empty zip file as a placeholder. How can I do this?
I know this:
Changed in version 2.7.1: If the file is created with mode 'a' or 'w'
and then closed without adding any files to the archive, the
appropriate ZIP structures for an empty archive will be written to the
file.
but my server runs an older version, 2.6.
You can create an empty zip file without using the zipfile module at all:
empty_zip_data = b'PK\x05\x06\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
with open('empty.zip', 'wb') as f:
    f.write(empty_zip_data)
empty_zip_data is the complete content of an empty zip file: a 22-byte end-of-central-directory record in which every count and offset is zero.
You can simply do:
from zipfile import ZipFile
archive_name = 'test_file.zip'
with ZipFile(archive_name, 'w') as file:
    pass
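The two answers above can be cross-checked against each other. The following is a minimal sketch, assuming Python 3 (or 2.7.1+), that parses the hand-written bytes with zipfile and compares them with what ZipFile writes when an archive is closed without any members. The variable names names, buf and written are only illustrative.

```python
import io
import zipfile

# The 22-byte end-of-central-directory record: signature plus 18 zero bytes.
empty_zip_data = b'PK\x05\x06' + b'\x00' * 18

# The hand-written bytes parse as a valid archive with no members.
with zipfile.ZipFile(io.BytesIO(empty_zip_data)) as zf:
    names = zf.namelist()

# Closing an untouched archive opened in 'w' mode writes the empty-archive
# structures (the behaviour documented for 2.7.1 and later).
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w'):
    pass
written = buf.getvalue()

print(names, len(written), written == empty_zip_data)
```

On an interpreter older than 2.7.1 only the hand-written bytes are an option, which is why the first answer avoids zipfile entirely.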
Related
There is a technique of storing a ZIP archive concatenated with some other file (e.g. with an EXE to store additional resources, or with a JPEG for steganography). Python's ZipFile supports such files (e.g. if you open a ZipFile in "a" mode on a non-ZIP file, it will append the ZIP structures to the end). I would like to update such an archive (possibly adding, updating and deleting files in the ZIP archive).
Python's ZipFile doesn't support deleting or overwriting files inside the archive, only appending, so the only way for me is to completely recreate the ZIP file with new contents. But I need to preserve the main file in which the ZIP was embedded. If I just open it in "w" mode, the whole file is completely overwritten.
I need a way to remove a ZIP file from the end of an ordinary file. I'd prefer to use only functions that are available in the Python 3 standard library.
I found a solution:
from zipfile import ZipFile

min_header_offset = None
with ZipFile(output_filename, "r") as zip_file:
    for info in zip_file.infolist():
        if min_header_offset is None or info.header_offset < min_header_offset:
            min_header_offset = info.header_offset
        # Existing files can also be saved here if they are needed for the update
if min_header_offset is not None:
    # Cut the file at the first local file header, i.e. where the archive starts
    with open(output_filename, "r+b") as f:
        f.truncate(min_header_offset)
# Somehow populate new archive contents
with ZipFile(output_filename, "a") as zip_file:
    for input_filename in input_filenames:
        zip_file.write(input_filename)
It clears the archive but doesn't touch anything that comes before it.
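The truncation step can be tried in isolation. This is a minimal, self-contained sketch, assuming the archive was appended with ZipFile in "a" mode as described in the question. The helper name strip_trailing_zip, the file name host.bin and the prefix bytes are made up for the demonstration.

```python
import os
import tempfile
import zipfile


def strip_trailing_zip(path):
    """Truncate `path` where the appended ZIP archive begins.

    Returns the new file size, or None if the archive has no members
    (in which case nothing is truncated).
    """
    with zipfile.ZipFile(path, "r") as zf:
        offsets = [info.header_offset for info in zf.infolist()]
    if not offsets:
        return None
    cut = min(offsets)  # offset of the earliest local file header
    with open(path, "r+b") as f:
        f.truncate(cut)
    return cut


# Build a host file and append an archive to it (hypothetical content).
workdir = tempfile.mkdtemp()
host_path = os.path.join(workdir, "host.bin")
prefix = b"HOST-FILE-CONTENT"
with open(host_path, "wb") as f:
    f.write(prefix)
with zipfile.ZipFile(host_path, "a") as zf:
    zf.writestr("a.txt", "hello")

cut = strip_trailing_zip(host_path)
with open(host_path, "rb") as f:
    remaining = f.read()
print(cut, remaining == prefix)
```

An archive with no members is left untouched by this sketch, because there is no local file header to measure from. That case would need the position of the end-of-central-directory record instead.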
I am having a peculiar problem when writing zip files through to_csv.
Using GZIP:
df.to_csv(path_or_buf = 'sample.csv.gz', compression="gzip", index = None, sep = ",", header=True, encoding='utf-8-sig')
gives a neat gzip file with name 'sample.csv.gz' and inside it I get my csv 'sample.csv'
However, things change when using ZIP
df.to_csv(path_or_buf = 'sample.csv.zip', compression="zip", index = None, sep = ",", header=True, encoding='utf-8-sig')
gives a zip file with name 'sample.csv.zip', but inside it the csv has been renamed to 'sample.csv.zip' as well.
Removing the extra '.zip' from the file gives the csv back.
How can I write a .zip file without this issue?
Producing zip files is a requirement that I can't bypass.
I am using Python 2.7 on a Windows 10 machine.
Thanks in advance for help.
Since pandas 1.0.0 this is pretty straightforward: pass a dict as the compression option:
filename = 'sample'
compression_options = dict(method='zip', archive_name=f'{filename}.csv')
df.to_csv(f'{filename}.zip', compression=compression_options, ...)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
As the thread linked in the comment discusses, ZIP's directory-like nature makes it hard to do what you want without making a lot of assumptions or complicating the arguments for to_csv
If your goal is to write the data directly to a ZIP file, that's harder than you'd think.
If you can bear temporarily writing your data to the filesystem, you can use Python's zipfile module to put that file in a ZIP with the name you preferred, and then delete the file.
import zipfile
import os
df.to_csv('sample.csv',index=None,sep=",",header=True,encoding='utf-8-sig')
with zipfile.ZipFile('sample.zip', 'w') as zf:
    zf.write('sample.csv')
os.remove('sample.csv')
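The name stored inside the archive does not have to match the temporary file's name, because ZipFile.write accepts an arcname argument. Below is a minimal sketch using only the standard library. The csv module stands in for df.to_csv, and the names tmp_export.csv, sample.zip and the sample rows are made up for illustration.

```python
import csv
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "tmp_export.csv")  # hypothetical temporary name
zip_path = os.path.join(workdir, "sample.zip")

# Stand-in for df.to_csv(csv_path, ...): write a small CSV to disk.
with open(csv_path, "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "value"])
    writer.writerow([1, "a"])

# arcname sets the member name, independent of the path on disk.
with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write(csv_path, arcname="sample.csv")
os.remove(csv_path)

with zipfile.ZipFile(zip_path) as zf:
    member_names = zf.namelist()
print(member_names)
```

Passing zipfile.ZIP_DEFLATED is a choice made for this sketch. Without it, ZipFile stores members uncompressed.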
Since pandas 1.0.0 it's possible to pass a dict as the compression argument of to_csv() and set the name of the file inside the archive.
Example in one line:
df.to_csv('sample.zip', compression={'method': 'zip', 'archive_name': 'sample.csv'})
I am currently using the extractall function in Python to unzip. After unzipping, it also creates a folder, like: myfile.zip -> myfile/myfile.zip. How do I get rid of the myfile folder and just unzip into the current folder without it? Is that possible?
I use the standard module zipfile. It has the method extract, which provides what I think you want. This method has an optional argument path, so the content is extracted either to the current working directory or to the given path.
import os, zipfile
os.chdir('path/of/my.zip')
with zipfile.ZipFile('my.zip') as Z:
    for elem in Z.namelist():
        Z.extract(elem, 'path/where/extract/to')
If you omit 'path/where/extract/to', the files from the ZIP file are extracted to the current working directory, which here is the directory of the ZIP file.
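import os
import shutil
import zipfile

# 'my.zip' is a placeholder archive name
with zipfile.ZipFile('my.zip') as myzip:
    # loop over everything in the zip
    for name in myzip.namelist():
        filename = os.path.basename(name)
        if not filename:
            # skip directory entries such as 'myfile/'
            continue
        # open the entry so we can copy it
        with myzip.open(name) as member, open(filename, 'wb') as outfile:
            # copy it directly to the output directory,
            # without creating the intermediate directory
            shutil.copyfileobj(member, outfile)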
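The flattening approach can be exercised without touching a real archive. This is a minimal sketch that builds a small archive in memory and extracts it without its folders. The function name extract_flat and the member names are hypothetical.

```python
import io
import os
import shutil
import tempfile
import zipfile


def extract_flat(zip_source, dest_dir):
    """Extract every file straight into dest_dir, dropping folder paths."""
    written = []
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            name = os.path.basename(info.filename)
            if not name:  # directory entry such as "myfile/"
                continue
            target = os.path.join(dest_dir, name)
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(name)
    return written


# Build a sample archive with a top-level folder (made-up content).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("myfile/", "")
    zf.writestr("myfile/a.txt", "A")
    zf.writestr("myfile/sub/b.txt", "B")
buf.seek(0)

out_dir = tempfile.mkdtemp()
written = extract_flat(buf, out_dir)
print(sorted(os.listdir(out_dir)))
```

Because only the base name is kept, two members with the same file name in different folders would overwrite each other. Archives where that can happen need a renaming rule on top of this sketch.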
Can anyone provide an example of how to create a zip file from a csv file using Python/pandas?
Thank you
Use
df.to_csv('my_file.gz', compression='gzip')
From the docs:
compression : string, optional
a string representing the compression to use in the output file, allowed values are ‘gzip’, ‘bz2’, ‘xz’, only used when the first
argument is a filename
See discussion of support of zip files here.
In the to_csv() method of pandas, besides the compression type (gz, zip etc) you can specify the archive file name - just pass the dict with necessary params as the compression parameter:
compression_opts = dict(method='zip',
                        archive_name='out.csv')
df.to_csv('out.zip', compression=compression_opts)
In the example above, the first argument of the to_csv method defines the name of the [ZIP] archive file, the method key of the dict defines [ZIP] compression type and the archive_name key of the dict defines the name of the [CSV] file inside the archive file.
Result:
├─ out.zip
│ └─ out.csv
See details in to_csv() pandas docs
In response to Stefan's answer, add '.csv.gz' for the zip csv file to work
df.to_csv('my_file.csv.gz', compression='gzip')
Hope that helps
The pandas to_csv compression has a security weakness: on Linux machines it leaves the absolute path of the file in the zip archive. In addition, one might want to save the file at the top level of the zip archive. The following function addresses both points by using zipfile. On top of that, it doesn't suffer from the pickle protocol change (4 to 5).
from pathlib import Path
import zipfile

def save_compressed_df(df, dirPath, fileName):
    """Save a Pandas dataframe as a zipped .csv file.

    Parameters
    ----------
    df : pandas.core.frame.DataFrame
        Input dataframe.
    dirPath : str or pathlib.PosixPath
        Parent directory of the zipped file.
    fileName : str
        File name without extension.
    """
    dirPath = Path(dirPath)
    path_zip = dirPath / f'{fileName}.csv.zip'
    txt = df.to_csv(index=False)
    with zipfile.ZipFile(path_zip, 'w', zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f'{fileName}.csv', txt)
I'm trying to extract a specific file from a zip archive using python.
In this case, extract an apk's icon from the apk itself.
I am currently using
with zipfile.ZipFile('/path/to/my_file.apk') as z:
    # extract /res/drawable/icon.png from apk to /temp/...
    z.extract('/res/drawable/icon.png', 'temp/')
This does work: in my script directory it creates temp/res/drawable/icon.png, which is temp plus the same path the file has inside the apk.
What I actually want is to end up with temp/icon.png.
Is there any way of doing this directly with a zip command, or do I need to extract, then move the file, then remove the directories manually?
You can use zipfile.ZipFile.open:
import shutil
import zipfile
with zipfile.ZipFile('/path/to/my_file.apk') as z:
    with z.open('/res/drawable/icon.png') as zf, open('temp/icon.png', 'wb') as f:
        shutil.copyfileobj(zf, f)
Or use zipfile.ZipFile.read:
import zipfile
with zipfile.ZipFile('/path/to/my_file.apk') as z:
    with open('temp/icon.png', 'wb') as f:
        f.write(z.read('/res/drawable/icon.png'))
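Either variant can be tried against a throwaway archive. This is a minimal sketch of the open-and-copy approach that builds a fake "apk" in memory, so the member names and contents are made up. The member name is written without a leading slash here, which is how ZIP member names are normally stored. If a lookup fails with KeyError, z.namelist() shows the exact names in the archive.

```python
import io
import os
import shutil
import tempfile
import zipfile

# Build a stand-in archive in memory (hypothetical content).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("res/drawable/icon.png", b"fake-png-bytes")
    z.writestr("AndroidManifest.xml", b"<manifest/>")

member = "res/drawable/icon.png"
out_dir = tempfile.mkdtemp()
# Keep only the base name so no intermediate directories are created.
target = os.path.join(out_dir, os.path.basename(member))

with zipfile.ZipFile(buf) as z:
    with z.open(member) as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)

with open(target, "rb") as f:
    extracted = f.read()
print(os.listdir(out_dir))
```

Streaming with shutil.copyfileobj avoids holding the whole member in memory, which matters more for large files than for an icon. z.read is the shorter option when the member is small.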