How do I persist to disk a temporary file using Python?

How do I persist to disk a temporary file using Python? - python

I am attempting to use the 'tempfile' module for manipulating and creating text files. Once the file is ready I want to save it to disk. I thought it would be as simple as using 'shutil.copy'. However, I get a 'permission denied' IOError:
>>> import tempfile, shutil
>>> f = tempfile.TemporaryFile(mode ='w+t')
>>> f.write('foo')
>>> shutil.copy(f.name, 'bar.txt')
Traceback (most recent call last):
File "<pyshell#5>", line 1, in <module>
shutil.copy(f.name, 'bar.txt')
File "C:\Python25\lib\shutil.py", line 80, in copy
copyfile(src, dst)
File "C:\Python25\lib\shutil.py", line 46, in copyfile
fsrc = open(src, 'rb')
IOError: [Errno 13] Permission denied: 'c:\\docume~1\\me\\locals~1\\temp\\tmpvqq3go'
>>>
Is this not intended when using the 'tempfile' library? Is there a better way to do this? (Maybe I am overlooking something very trivial)

hop is right, and dF. is incorrect on why the error occurs.
Since you haven't called f.close() yet, the file is not removed.
The doc for NamedTemporaryFile says:
Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
And for TemporaryFile:
Under Unix, the directory entry for the file is removed immediately after the file is created. Other platforms do not support this; your code should not rely on a temporary file created using this function having or not having a visible name in the file system.
Therefore, to persist a temporary file (on Windows), you can do the following:
import tempfile, shutil
f = tempfile.NamedTemporaryFile(mode='w+t', delete=False)
f.write('foo')
file_name = f.name
f.close()
shutil.copy(file_name, 'bar.txt')
os.remove(file_name)
The solution Hans Sjunnesson provided is also off, because copyfileobj only copies from file-like object to file-like object, not file name:
shutil.copyfileobj(fsrc, fdst[, length])
Copy the contents of the file-like object fsrc to the file-like object fdst. The integer length, if given, is the buffer size. In particular, a negative length value means to copy the data without looping over the source data in chunks; by default the data is read in chunks to avoid uncontrolled memory consumption. Note that if the current file position of the fsrc object is not 0, only the contents from the current file position to the end of the file will be copied.

The file you create with TemporaryFile or NamedTemporaryFile is automatically removed when it's closed, which is why you get an error. If you don't want this, you can use mkstemp instead (see the docs for tempfile).
>>> import tempfile, shutil, os
>>> fd, path = tempfile.mkstemp()
>>> os.write(fd, 'foo')
>>> os.close(fd)
>>> shutil.copy(path, 'bar.txt')
>>> os.remove(path)

Starting from python 2.6 you can also use NamedTemporaryFile with the delete= option set to False. This way the temporary file will be accessible, even after you close it.
Note that on Windows (NT and later) you cannot access the file a second time while it is still open. You have to close it before you can copy it. This is not true on Unix systems.

You could always use shutil.copyfileobj, in your example:
new_file = open('bar.txt', 'rw')
shutil.copyfileobj(f, new_file)

Related

Do I need to close the OS-level handle returned to me by tempfile.mkstemp if not using it?

I need to create a temporary file to write some data out to in Python 3. The file will be written to via a separate module which deals with opening the file from a path given as a string.
I'm using tempfile.mkstemp() to create this temporary file and according to the docs:
mkstemp() returns a tuple containing an OS-level handle to an open file (as would be returned by os.open()) and the absolute pathname of that file, in that order.
Given I'm not going to be using the open OS-level file handle given to me, do I need to close it? I understand about regular Python file handles and closing them but I'm not familiar with OS-level file descriptors/handles.
So is this better:
fd, filename = tempfile.mkstemp()
os.close(fd)
Or can I simply just do this:
_, output_filename = tempfile.mkstemp()

The returned file descriptor is not a file object, the garbage collector will not close it for you.
You should use:
fd, filename = tempfile.mkstemp()
os.close(fd)
The returned file descriptor is useful to avoid race conditions where the filename is replaced with a symbolic link to a file that the attacker can not read but you can which can result in data exposure.

Specifying file name with mkstemp, file not found

I need to be able to create a temporary file with a specified file name and write data to it, then zip said file with filename up along with other files:
fd, path = tempfile.mkstemp(".bin", "filename", "~/path/to/working/directory/")
try:
with os.fdopen(fd, "wb") as tmp:
tmp.write(data)
with ZipFile("zip.zip", "w") as zip:
zip.write("filename")
zip.writestr("file2", file2_str)
zip.writestr("file3", file3_str)
# ...
finally:
os.remove(path)
I think I must be misunderstanding how mkstemp works, I get the error at the first line of code here:
FileNotFoundError: [Errno 2] No such file or directory: '~/path/to/working/directory/filenameq5st7dey.bin'
It looks like a bunch of garbage gets added to the file name before the suffix is put on the file. I've tried this without a suffix and I still get garbage at the end of the file name.
Aside from the garbage in the file name, why do I get a file not found error instead of having a temporary file created in my directory with that name (plus garbage)?

You supplied this argument:
"~/path/to/working/directory/"
Perfectly natural, it makes sense why you would supply it. But it is wrong. If you ls . you likely will not find a ~ directory.
What you were hoping for was expansion to ${HOME}, as the Bourne shell does. In python we must call this function:
os.path.expanduser("~/path/to/working/directory/")
Print the result it returns and you'll see why it's essential.
Some folks prefer to have pathlib do the work for them:
from pathlib import Path
Path("~/path/to/working/directory/").expanduser()

shutil.move deletes the file on windows when new name contains a colon (and an extension)

Try:
import os, shutil
wd = os.path.abspath(os.path.curdir)
newfile = os.path.join(wd, 'testfile')
print str(newfile)
with open(newfile, 'w') as f: f.write('Hello bugs')
shutil.move(newfile, os.path.join(wd, 'testfile:.txt')) # note the :
Now check the directory - newfile is deleted and no other file is created - Process finished with exit code 0.
If however you issue:
shutil.move(newfile, os.path.join(wd, 'testfile:')) # note no extension
it blows with:
Traceback (most recent call last):
File "C:/Users/MrD/.PyCharm40/config/scratches/scratch_3", line 9, in <module>
shutil.move(newfile, os.path.join(wd, 'testfile:'))
File "C:\_\Python27\lib\shutil.py", line 302, in move
copy2(src, real_dst)
File "C:\_\Python27\lib\shutil.py", line 130, in copy2
copyfile(src, dst)
File "C:\_\Python27\lib\shutil.py", line 83, in copyfile
with open(dst, 'wb') as fdst:
IOError: [Errno 22] invalid mode ('wb') or filename: 'C:\\Users\\MrD\\.PyCharm40\\config\\scratches\\testfile:'
as it should.
Is it a bug ?
Context: I was testing the behavior of my code when illegal filenames were given (: is illegal in windows filenames) when to my amazement my program deleted the original file (bad!) and created a zero size file with the attributes of the original (yes in my case the file was created, just empty) and filename the filename given up to the : - soo a filename like textfile:.jpg gave me a zero byte textfile. It took a lot of debugging - here is the little critter inside the Python27\lib\shutil.py copyfile() (the line that blows above and did not blow):
I don't know why in my case the file was created though while when running the script no.

This isn't a bug in Python's shutil or os modules, it's just a weirdness in Windows. Peter Wood's link in the comments discusses "Advanced Data Streams" -- a Windows filesystem mechanism that attaches a hidden file containing metadata to a regular, visible file. A key word there is attached; The hidden file is deleted if the file it is attached to is deleted.
It appears that a colon is used to separate the path of the regular file from the hidden file. For example, if in the command line you write:
> notepad foo
Then close notepad, and write
> notepad foo.txt:bar
Notepad will open the hidden file. Go ahead and write something in it, save, and close. Typing > dir and the command line will only show foo.txt, not foo.txt:bar.txt. But sure enough, if you write
> notepad foo.txt:bar.txt
the file you just edited will appear, and your changes will be intact.
So what is happening with your Python code? The documentation for shutil.move says:
src is copied (using shutil.copy2()) to dst and then removed.
So when you move testfile to testfile:.txt, Python first copies testfile to the hidden testfile:.txt. But then it removes testfile, and by doing so removes the hidden testfile:.txt. Therefore it appears to you that the original file has been deleted, and no new file has been created.
The following snippet of code might make this clearer (I've saved it as demo.py, and I'm running it in the same, other-wise empty directory):
import os, shutil
with open('test', 'w') as f:
f.write('Hello bugs')
shutil.copy2('test', 'test:foo.txt')
with open('test:foo.txt') as f:
print(f.read())
print 'test: exists? ', os.path.exists('test')
print 'test:foo.txt exists? ', os.path.exists('test:foo.txt')
print os.listdir('.')
print('removing...')
os.remove('test')
print 'test: exists? ', os.path.exists('test')
print 'test:foo.txt exists? ', os.path.exists('test:foo.txt')
print os.listdir('.')
This prints:
Hello bugs
test exists? True
test:foo.txt exists? True
['demo.py', 'test']
removing...
test: exists? False
test:foo.txt exists? False
['demo.py']
This shows that we can create a normal file, write to it, and copy that normal file to its hidden stream, open, and read it just fine, and the result is as expected. Then we see that os.path.exists shows that both test and it's hidden attachment test:foo.txt exist, even though os.listdir only shows test. Then we delete test and we see that test:foo.txt no longer exists as well.
Lastly, you can't create a hidden data stream without a name, therefore test: is an invalid path. Python correctly throws an exception in this case.
So the Python code is actually functioning as it should under Windows -- "Alternate Data Streams" are just such a little-known "feature" that this behavior is surprising.

How to generate temporary file in django and then destroy

I am doing some file processing and for generating the file i need to generate some temporary file from existing data and then use that file as input to my function.
But i am confused where should i save that file and then delete it.
Is there any temp location where files automatically gets deleted after user session

Python has the tempfile module for exactly this purpose. You do not need to worry about the location/deletion of the file, it works on all supported platforms.
There are three types of temporary files:
tempfile.TemporaryFile - just basic temporary file,
tempfile.NamedTemporaryFile - "This function operates exactly as TemporaryFile() does, except that the file is guaranteed to have a visible name in the file system (on Unix, the directory entry is not unlinked). That name can be retrieved from the name attribute of the file object.",
tempfile.SpooledTemporaryFile - "This function operates exactly as TemporaryFile() does, except that data is spooled in memory until the file size exceeds max_size, or until the file’s fileno() method is called, at which point the contents are written to disk and operation proceeds as with TemporaryFile().",
EDIT: The example usage you asked for could look like this:
>>> with TemporaryFile() as f:
f.write('abcdefg')
f.seek(0) # go back to the beginning of the file
print(f.read())
abcdefg

You should use something from the tempfile module. I think that it has everything you need.

I would add that Django has a built-in NamedTemporaryFile functionality in django.core.files.temp which is recommended for Windows users over using the tempfile module. This is because the Django version utilizes the O_TEMPORARY flag in Windows which prevents the file from being re-opened without the same flag being provided as explained in the code base here.
Using this would look something like:
from django.core.files.temp import NamedTemporaryFile
temp_file = NamedTemporaryFile(delete=True)
Here is a nice little tutorial about it and working with in-memory files, credit to Mayank Jain.

I just added some important changes: convert str to bytes and a command call to show how external programs can access the file when a path is given.
import os
from tempfile import NamedTemporaryFile
from subprocess import call
with NamedTemporaryFile(mode='w+b') as temp:
# Encode your text in order to write bytes
temp.write('abcdefg'.encode())
# put file buffer to offset=0
temp.seek(0)
# use the temp file
cmd = "cat "+ str(temp.name)
print(os.system(cmd))

Validating a zip file coming from stdin

After some frustration with unzip(1L), I've been trying to create a script that will unzip and print out raw data from all of the files inside a zip archive that is coming from stdin. I currently have the following, which works:
import sys, zipfile, StringIO
stdin = StringIO.StringIO(sys.stdin.read())
zipselect = zipfile.ZipFile(stdin)
filelist = zipselect.namelist()
for filename in filelist:
print filename, ':'
print zipselect.read(filename)
When I try to add validation to check if it truly is a zip file, however, it doesn't like it.
...
zipcheck = zipfile.is_zipfile(zipselect)
if zipcheck is not None:
print 'Input is not a zip file.'
sys.exit(1)
...
results in
File "/home/chris/simple/zipcat/zipcat.py", line 13, in <module>
zipcheck = zipfile.is_zipfile(zipselect)
File "/usr/lib/python2.7/zipfile.py", line 149, in is_zipfile
result = _check_zipfile(fp=filename)
File "/usr/lib/python2.7/zipfile.py", line 135, in _check_zipfile
if _EndRecData(fp):
File "/usr/lib/python2.7/zipfile.py", line 203, in _EndRecData
fpin.seek(0, 2)
AttributeError: ZipFile instance has no attribute 'seek'
I assume it can't seek because it is not a file, as such?
Sorry if this is obvious, this is my first 'go' with Python.

You should pass stdin to is_zipfile, not zipselect. is_zipfile takes a path to a file or a file object, not a ZipFile.
See the zipfile.is_zipfile documentation
You are correct that a ZipFile can't seek because it isn't a file. It's an archive, so it can contain many files.

To do this entirely in memory will take some work. The AttributeError message means that the is_zipfile method is trying to use the seek method of the file handle you provide. But standard input is not seekable, and therefore your file object for it has no seek method.
If you really, really can't store the file on disk temporarily, then you could buffer the entire file in memory (you would need to enforce a size limit for security), and then implement some "duck" code that looks and acts like a seekable file object but really just uses the byte-string in memory.
It is possible that you could cheat and buffer only enough of the data for is_zipfile to do its work, but I seem to recall that the table-of-contents for ZIP is at the end of the file. I could be wrong about that though.

Your 2011 python2 fragment was: StringIO.StringIO(sys.stdin.read())
In 2018 a python3 programmer might phrase that as: io.StringIO(...).
What you wanted was the following python3 fragment: io.BytesIO(...).
Certainly that works well for me when using the requests module to download binary ZIP files from webservers:
zf = zipfile.ZipFile(io.BytesIO(req.content))

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How do I persist to disk a temporary file using Python? - python

You could always use shutil.copyfileobj, in your example: new_file = open('bar.txt', 'rw') shutil.copyfileobj(f, new_file)

Related

Do I need to close the OS-level handle returned to me by tempfile.mkstemp if not using it?

Specifying file name with mkstemp, file not found

shutil.move deletes the file on windows when new name contains a colon (and an extension)

How to generate temporary file in django and then destroy

Validating a zip file coming from stdin

Categories

Resources