Create zip file from in memory file in python - python

I am retrieving files from an S3 bucket using the following code, and it works fine.
file = io.BytesIO()
k.get_contents_to_file(file)
Now I want to add this in-memory file to a zip file. The code below takes a filename as its argument, but I have an in-memory file.
zip_file.write(filename, zip_path)
I am using python 3.4 for my project.

Try using writestr:
Signature: writestr(zinfo_or_arcname, data, compress_type=None)
Docstring: Write a file into the archive. The contents is 'data',
which may be either a 'str' or a 'bytes' instance; if it is a 'str',
it is encoded as UTF-8 first. 'zinfo_or_arcname' is either a ZipInfo
instance or the name of the file in the archive.
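A minimal sketch of how writestr might be applied to the in-memory file from the question. The S3 download is replaced by literal bytes so the example runs on its own, and the helper name add_buffer_to_zip and the archive path "data/report.txt" are made up for illustration.

```python
import io
import zipfile


def add_buffer_to_zip(zip_file, buffer, arcname):
    """Write the full contents of an in-memory buffer into an open ZipFile."""
    # getvalue() returns every byte in the buffer regardless of the
    # current stream position, so no seek(0) is needed first.
    zip_file.writestr(arcname, buffer.getvalue())


# Stand-in for the buffer filled by k.get_contents_to_file(file).
file = io.BytesIO()
file.write(b"hello from S3")

# The archive itself is also kept in memory here; a path on disk works too.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zip_file:
    add_buffer_to_zip(zip_file, file, "data/report.txt")
```

Using getvalue() rather than read() avoids the common mistake of reading from a buffer whose position is already at the end after the download.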

Related

Python creating binary zip file

How do I create the binary contents of a zip file from .csv file binary contents? I don't want to actually write any files to disk.
For instance, I have tried zipObj = ZipFile(outputZipFileName, 'w'), but that requires a file name, which means it is not just a file in binary format.
EDIT: I just found the answer at https://www.neilgrogan.com/py-bin-zip/
The file in zipfile.ZipFile() can be an actual disk file or a file-like-object. Your solution probably lies in the io.BytesIO or io.StringIO classes. These let you create bytes or strings in memory and treat them like files in other functions and classes that take file-like-objects.
Answer is at https://www.neilgrogan.com/py-bin-zip/.
Turns out, using BytesIO alongside zipfile was the ticket!
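For reference, the BytesIO approach could look roughly like this. The function name zip_bytes, the member name "people.csv" and the sample CSV rows are placeholders, not taken from the linked post.

```python
import io
import zipfile


def zip_bytes(member_name, payload):
    """Return the bytes of a zip archive holding one member, built in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member_name, payload)
    # Read the buffer only after the with-block: closing the ZipFile is
    # what writes the central directory at the end of the archive.
    return buffer.getvalue()


csv_bytes = b"id,name\n1,alice\n2,bob\n"
zipped = zip_bytes("people.csv", csv_bytes)
```

The result in zipped is an ordinary bytes object, so it can be sent over a socket or stored in a database without touching the file system.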

Python: how to pass a file from a zip to a function that reads data from that file

I have a zip-file that contains .nrrd type files. The pynrrd lib comes with a read function. How can I pull the .nrrd file from the zip and pass it to the nrrd.read() function?
I tried the following, but it gives this error at the nrrd.read() line:
TypeError was unhandled by user code, file() argument 1 must be encoded string without NULL bytes, not str
in_dir = r'D:\Temp\Slikvideo\JPEG\SV_4_1_mask'
zip_file = 'Annotated.mitk'
zf = zipfile.ZipFile(in_dir + '\\' + zip_file)
f_name = 'datafile.nrrd' # .nrrd file in zip
file_nrrd = zf.read(f_name) # pull the file from the zip
img_nrrd, options = nrrd.read(file_nrrd) # read the .nrrd image data from the file
I could write the file pulled from the .zip to disk, and then read it from disk with nrrd.read() but I am sure there is a better way.
I think your approach is a good one.
There is a similar question with this answer:
I think the problem may be that you do not pass the mode argument when calling zipfile.ZipFile. Try using:
zipfile.ZipFile(path, "r")
The following works:
file_nrrd = zf.extract(f_name) # extract the file from the zip
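The reason extract() helps is that it returns a path on disk, while zf.read() returns the file's contents, and a path-based reader such as nrrd.read() treats its argument as a file name. Below is a rough sketch that extracts into a temporary directory so nothing is left behind. The helper read_member_via_path and the stand-in reader read_raw_bytes are hypothetical, and the archive is a dummy built on the spot rather than a real .nrrd file.

```python
import os
import tempfile
import zipfile


def read_member_via_path(zip_path, member, reader):
    """Extract one member to a temporary directory and hand its path to reader."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        with zipfile.ZipFile(zip_path) as zf:
            # extract() returns the path of the file it created.
            extracted_path = zf.extract(member, path=tmp_dir)
        # The reader must finish with the file before the directory is removed.
        return reader(extracted_path)


def read_raw_bytes(path):
    # Placeholder for a path-based reader such as nrrd.read.
    with open(path, "rb") as fh:
        return fh.read()


# Build a dummy archive so the sketch runs without the real data.
with tempfile.TemporaryDirectory() as work_dir:
    archive_path = os.path.join(work_dir, "Annotated.mitk")
    with zipfile.ZipFile(archive_path, "w") as zf:
        zf.writestr("datafile.nrrd", b"dummy payload")
    content = read_member_via_path(archive_path, "datafile.nrrd", read_raw_bytes)
```

With pynrrd installed, passing nrrd.read as the reader argument should behave the same way, assuming it loads the data fully before returning.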

Reading individual bz2 files from a tar file

I'm trying to read many bz2 files within a tar file. The file has the following structure:
2013-01.tar
01\01\00\X.json.bz2\X.json
01\01\02\X.json.bz2\X.json
I'm able to get the filenames as follows:
import tarfile
tar = tarfile.open(filepath, 'r')
tar_members_names = [filename for filename in tar.getnames()]
# Side question: How would I only return files and no directories?
Which returns a list of the .bz2 files. Now I'm trying to extract them (temporarily) using:
inner_filename = tar_members_names[0]
t_extract = tar.extractfile(inner_filename)
The following code to extract the json file returns an error, however. How would I go about retrieving the JSON files line by line?
import bz2
txt = bz2.BZ2File(t_extract)
TypeError: coercing to Unicode: need string or buffer, ExFileObject found
txt = bz2.decompress(t_extract)
TypeError: must be convertible to a buffer, not ExFileObject
I've been unable to figure out how to return a buffer from the tar file instead of the current ExFileObject (how to convert it to a buffer?), any suggestions are greatly appreciated.
In Python 2, BZ2File expects a file name as its first argument, but you are passing a file object (that is, an object with the same API as what open() returns).
To do what you want, you'll have to read all the bytes from t_extract yourself and call bz2.decompress(data) or use BZ2Decompressor to stream the data through it.
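A sketch of the first suggestion (read the bytes, then call bz2.decompress), written for Python 3. The generator name iter_bz2_lines is made up, and the tar archive is built in memory with two sample JSON lines so the example does not depend on the real 2013-01.tar. The isfile() filter also covers the side question about skipping directories.

```python
import bz2
import io
import tarfile


def iter_bz2_lines(tar, member):
    """Yield decoded text lines from one bz2-compressed member of an open tar."""
    extracted = tar.extractfile(member)  # file-like object, or None for non-files
    if extracted is None:
        return
    # Read the compressed bytes, then decompress them in memory.
    text = bz2.decompress(extracted.read()).decode("utf-8")
    for line in text.splitlines():
        yield line


# Build a small tar in memory so the sketch runs without the real archive.
payload = bz2.compress(b'{"id": 1}\n{"id": 2}\n')
tar_buffer = io.BytesIO()
with tarfile.open(fileobj=tar_buffer, mode="w") as tar:
    info = tarfile.TarInfo("01/01/00/X.json.bz2")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

tar_buffer.seek(0)
with tarfile.open(fileobj=tar_buffer, mode="r") as tar:
    # isfile() keeps regular files and drops directory entries.
    file_members = [m for m in tar.getmembers() if m.isfile()]
    lines = list(iter_bz2_lines(tar, file_members[0]))
```

This loads each member fully into memory, which is fine for small files; for very large members, streaming through bz2.BZ2Decompressor as mentioned above would be the alternative.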

How can I find the path of a tempfile in Django

I am using pdftk like this
pdftk template.pdf fill_form /temp/input.fdf output /temp/output.pdf
This works fine.
But now I generate a temporary file to use in place of /temp/input.fdf:
myfile = tempfile.NamedTemporaryFile()
myfile.write(fdf)
myfile.seek(0)
myfile.close()
Now I don't know how to pass myfile as input to pdftk.
myfile.name will get you the file path.
Note that tempfiles do not exist after close(). From the docs:
tempfile.TemporaryFile([mode='w+b'[, bufsize=-1[, suffix=''[,
prefix='tmp'[, dir=None]]]]])
Return a file-like object that can be used as a temporary storage
area. The file is created using mkstemp(). It will be destroyed as
soon as it is closed (including an implicit close when the object is
garbage collected). Under Unix, the directory entry for the file is
removed immediately after the file is created. Other platforms do not
support this; your code should not rely on a temporary file created
using this function having or not having a visible name in the file
system.
Source: http://docs.python.org/2/library/tempfile.html
Can't you get the name like this?
myfile = tempfile.NamedTemporaryFile()
myfile.write(fdf)
myfile.seek(0)
myfile.close()
print(myfile.name)
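Since the file is gone once close() is called, the name is only useful if the external program runs while the file still exists. One possible pattern is sketched below: delete=False keeps the file on disk after closing, and cleanup is done by hand. The helper with_temp_file and the placeholder action file_size are invented for this example, the FDF bytes are dummy data, and pdftk itself is not invoked.

```python
import os
import tempfile


def with_temp_file(data, action, suffix=""):
    """Write data to a named temporary file, run action(path), then delete it."""
    # delete=False keeps the file on disk after close(), so another
    # process can open it by name; cleanup is done by hand below.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.write(data)
        tmp.close()  # flushes the data to disk
        return action(tmp.name)
    finally:
        tmp.close()  # harmless if already closed
        os.unlink(tmp.name)


def file_size(path):
    # Placeholder action; a real one might call
    # subprocess.run(["pdftk", "template.pdf", "fill_form", path, "output", out_path])
    # where out_path is a hypothetical output location.
    return os.path.getsize(path)


size = with_temp_file(b"%FDF-1.2 dummy", file_size, suffix=".fdf")
```

Closing the file before handing its name to another program also sidesteps the case where the data is still sitting in Python's write buffer.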

Validating a zip file coming from stdin

After some frustration with unzip(1L), I've been trying to create a script that will unzip and print out raw data from all of the files inside a zip archive that is coming from stdin. I currently have the following, which works:
import sys, zipfile, StringIO
stdin = StringIO.StringIO(sys.stdin.read())
zipselect = zipfile.ZipFile(stdin)
filelist = zipselect.namelist()
for filename in filelist:
    print filename, ':'
    print zipselect.read(filename)
When I try to add validation to check if it truly is a zip file, however, it doesn't like it.
...
zipcheck = zipfile.is_zipfile(zipselect)
if zipcheck is not None:
    print 'Input is not a zip file.'
    sys.exit(1)
...
results in
  File "/home/chris/simple/zipcat/zipcat.py", line 13, in <module>
    zipcheck = zipfile.is_zipfile(zipselect)
  File "/usr/lib/python2.7/zipfile.py", line 149, in is_zipfile
    result = _check_zipfile(fp=filename)
  File "/usr/lib/python2.7/zipfile.py", line 135, in _check_zipfile
    if _EndRecData(fp):
  File "/usr/lib/python2.7/zipfile.py", line 203, in _EndRecData
    fpin.seek(0, 2)
AttributeError: ZipFile instance has no attribute 'seek'
I assume it can't seek because it is not a file, as such?
Sorry if this is obvious, this is my first 'go' with Python.
You should pass stdin to is_zipfile, not zipselect. is_zipfile takes a path to a file or a file object, not a ZipFile.
See the zipfile.is_zipfile documentation
You are correct that a ZipFile can't seek because it isn't a file. It's an archive, so it can contain many files.
To do this entirely in memory will take some work. The AttributeError message means that the is_zipfile method is trying to use the seek method of the file handle you provide. But standard input is not seekable, and therefore your file object for it has no seek method.
If you really, really can't store the file on disk temporarily, then you could buffer the entire file in memory (you would need to enforce a size limit for security), and then implement some "duck" code that looks and acts like a seekable file object but really just uses the byte-string in memory.
It is possible that you could cheat and buffer only enough of the data for is_zipfile to do its work, but I seem to recall that the table-of-contents for ZIP is at the end of the file. I could be wrong about that though.
Your 2011 python2 fragment was: StringIO.StringIO(sys.stdin.read())
In 2018 a python3 programmer might phrase that as: io.StringIO(...).
What you wanted was the following python3 fragment: io.BytesIO(...).
Certainly that works well for me when using the requests module to download binary ZIP files from webservers:
zf = zipfile.ZipFile(io.BytesIO(req.content))
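Putting the answers together, a Python 3 version of the script might look something like the sketch below. The function names read_zip_members and main are invented here. Note that is_zipfile() returns True or False, so the check is written as "if not ..." rather than a comparison with None.

```python
import io
import sys
import zipfile


def read_zip_members(data):
    """Return {member name: bytes} for zip data held in memory."""
    buffer = io.BytesIO(data)
    # is_zipfile() accepts a path or a seekable file object and returns a bool.
    if not zipfile.is_zipfile(buffer):
        raise ValueError("Input is not a zip file.")
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def main():
    # sys.stdin.buffer is the binary view of standard input in Python 3.
    try:
        members = read_zip_members(sys.stdin.buffer.read())
    except ValueError as exc:
        sys.exit(str(exc))
    for name, content in members.items():
        print(name, ":")
        print(content)
```

Calling main() from a script and piping an archive into it should print each member name followed by its raw bytes. The whole input is buffered in memory, so a size limit would be sensible for untrusted input.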
