Can't access temporary files created with tempfile


I am using tempfile.NamedTemporaryFile() to store some text until the program ends. On Unix it works without any issues, but on Windows the returned file isn't accessible for reading or writing: Python gives Errno 13. The only way around it is to set delete=False and delete the file manually with os.remove(). Why?

You get the IOError because the file can be opened only once after it is created.
The reason is that NamedTemporaryFile creates the file with the FILE_SHARE_DELETE flag on Windows. On Windows, once a file has been created or opened with a specific share flag, all subsequent open operations have to pass the same share flag. Python's open function does not pass FILE_SHARE_DELETE, so the second open fails. See my answer to the question "How to create a temporary file that can be read by a subprocess?" for more details and a workaround.

Take a look: http://docs.python.org/2/library/tempfile.html
tempfile.NamedTemporaryFile([mode='w+b'[, bufsize=-1[, suffix=''[, prefix='tmp'[, dir=None[, delete=True]]]]]])
This function operates exactly as TemporaryFile() does, except that the file is guaranteed to have a visible name in the file system (on Unix, the directory entry is not unlinked). That name can be retrieved from the name attribute of the file object. Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later). If delete is true (the default), the file is deleted as soon as it is closed.

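Thanks to #Rnhmjoj here is a working solution:
from tempfile import NamedTemporaryFile
file = NamedTemporaryFile(delete=False)
file.close()
You have to create the file with delete=False and then close it after creation. This way, Windows unlocks the file and you can open it again by name. Since the file is no longer deleted automatically, remove it yourself with os.remove(file.name) when you are done.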
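A minimal sketch of this pattern, assuming Python 3. The helper name write_then_reopen is hypothetical and only serves the illustration: it creates the file with delete=False, closes it so the handle is released, reopens it by name, and removes it manually in a finally block.

```python
import os
import tempfile


def write_then_reopen(text):
    """Write text to a named temporary file, then read it back by name."""
    # delete=False keeps the file on disk after close(); we remove it ourselves.
    tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False)
    try:
        tmp.write(text)
        tmp.close()  # closing releases the handle, so the name can be reopened
        with open(tmp.name) as reopened:
            return reopened.read()
    finally:
        tmp.close()  # harmless if the file is already closed
        os.remove(tmp.name)  # manual cleanup replaces delete=True
```

The try/finally block stands in for the automatic deletion that delete=True would otherwise provide, so the file is removed even if reading it fails.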

Related

Finding which files are being read from during a session (python code)

I have a large system written in python. when I run it, it reads all sorts of data from many different files on my filesystem. There are thousands lines of code, and hundreds of files, most of them are not actually being used. I want to see which files are actually being accessed by the system (ubuntu), and hopefully, where in the code they are being opened. Filenames are decided dynamically using variables etc., so the actual filenames cannot be determined just by looking at the code.
I have access to the code, of course, and can change it.
I try to figure how to do this efficiently, with minimal changes in the code:
is there a Linux way to determine which files are accessed, and at what times? this might be useful, although it won't tell me where in the code this happens
is there a simple way to make an "open file" command also log the file name, time, etc... of the open file? hopefully without having to go into the code and change every open command, there are many of them, and some are not being used at runtime.
Thanks

You can trace file accesses without modifying your code by using strace.
Either start your program under strace, like this:
strace -f -e trace=file your_program.py
or attach strace to a running program, like this:
strace -f -e trace=file -p <PID>

For 1: you can use
ls -la /proc/<PID>/fd
replacing <PID> with your process ID.
Note that this lists all open file descriptors. Some of them are stdin, stdout and stderr, and often other things such as open websockets (which also use a file descriptor), but filtering the list for files should be easy.
For 2: see the solution proposed in
Override python open function when used with the 'as' keyword to print anything
i.e. override the open function with your own, which can include the additional logging.

One possible method is to "overload" the open function. This will have many effects that depend on the code, so I would do that very carefully if needed, but basically here's an example:
>>> _open = open
>>> def open(filename):
...     print(filename)
...     return _open(filename)
...
>>> open('somefile.txt')
somefile.txt
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in open
FileNotFoundError: [Errno 2] No such file or directory: 'somefile.txt'
As you can see, my new open function calls the original open (renamed to _open) but first prints its argument (the filename). This can be done with more sophistication to log the filename if needed, but the most important thing is that the replacement has to be defined before any use of open in your code.

Python temporary directory to execute other processes?

I have a string of Java source code in Python that I want to compile, execute, and collect the output (stdout and stderr). Unfortunately, as far as I can tell, javac and java require real files, so I have to create a temporary directory.
What is the best way to do this? The tempfile module seems to be oriented towards creating files and directories that are only visible to the Python process. But in this case, I need Java to be able to see them too. However, I also want the other stuff to be handled intelligently if possible (such as deleting the folder when done or using the appropriate system temp folder)

tempfile.NamedTemporaryFile and tempfile.TemporaryDirectory work perfectly fine for your purposes. The resulting objects have a .name attribute that provides a file system visible name that java/javac can handle just fine, just make sure to:
Set the suffix appropriately if the compiler insists on files being named with a .java extension
Always call .flush() on the file handle before handing the .name of a NamedTemporaryFile to an external process or it may (usually will) see an incomplete file
If you don't want Python cleaning up the files when you close the objects, either pass delete=False to NamedTemporaryFile's constructor, or use the mkstemp and mkdtemp functions (which create the objects, but don't clean them up for you).
So for example, you might do:
import os
import subprocess
import tempfile
# Create temporary directory for source and class files
with tempfile.TemporaryDirectory() as d:
    # Write source code ("as d" binds the directory path as a plain string)
    srcpath = os.path.join(d, "myclass.java")
    with open(srcpath, "w") as srcfile:
        srcfile.write('source code goes here')
    # Compile source code; javac puts myclass.class next to the source file
    subprocess.check_call(['javac', srcpath])
    # Run the compiled class: java takes the class name (no .java or .class)
    # and locates it through the classpath, not through a file path.
    # This assumes the source defines a class named myclass.
    subprocess.check_call(['java', '-cp', d, 'myclass'])
# with block for TemporaryDirectory done, temp directory cleaned up

tempfile.mkstemp creates a file that is normally visible in the filesystem and returns you the path as well. You should be able to use this to create your input and output files - assuming javac will atomically overwrite the output file if it exists there should be no race condition if other processes on your system don't misbehave.
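A minimal sketch of the mkstemp route, assuming Python 3.7 or later. The helper name run_on_temp_file is hypothetical, and a second Python interpreter stands in for javac/java as the external process that reads the file by its path.

```python
import os
import subprocess
import sys
import tempfile


def run_on_temp_file(text):
    """Write text to an mkstemp file and let a child process read it by path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # mkstemp returns an OS-level descriptor; wrap it to write, then close.
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        # The child interpreter is only a stand-in for an external tool.
        child_code = "import sys; print(open(sys.argv[1]).read())"
        result = subprocess.run(
            [sys.executable, "-c", child_code, path],
            capture_output=True, text=True, check=True,
        )
        return result.stdout
    finally:
        os.remove(path)  # mkstemp never cleans up for you
```

Closing the descriptor before starting the child avoids both the incomplete-file problem and the Windows restriction on opening a file twice.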

Lock file for access on windows

Using portalocker we can lock a file for access through the following way:
f=open("M99","r+")
portalocker.lock(f,portalocker.LOCK_EX)
The lock over the file can be removed using
f.close() #or
portalocker.unlock(file) #needs `file` ie reference to file it locked ..pretty obvious too
Can this same thing be done by any other way in python wherein
We can lock the file for access
Restart Python (so no longer have the original Python file object or file number).
Unlock the file for access in the new process.
I cannot save f or file object so can't use pickle or something either. Is there a way using the Python standard library or some win32api call?
Any windows utility will also do...any command line from windows?

It appears you want to lock access to resources where the lock persists between program invocations. You need a different strategy for that.
Create a lock file using exclusive create mode; in Python 2 this requires using the os.open() call (followed by os.fdopen() to produce a Python file object), in Python 3 you can use the 'x' mode when using the built-in open().
In Python 2:
import errno
import os
LOCKFILE = r'some\path\to\lockfile'
class AlreadyLocked(Exception):
    pass
def lock():
    try:
        fd = os.open(LOCKFILE, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except OSError as e:
        # os.open() raises OSError, not IOError
        if e.errno != errno.EEXIST:
            raise
        # file already exists
        raise AlreadyLocked()
    with os.fdopen(fd, 'w') as lockfile:
        # write the PID of the current process so you can debug
        # later if a lockfile can be deleted after a program crash
        lockfile.write(str(os.getpid()))
def unlock():
    os.remove(LOCKFILE)
In Python 3 the lock() function would be:
def lock():
    try:
        with open(LOCKFILE, 'x') as lockfile:
            # write the PID of the current process so you can debug
            # later if a lockfile can be deleted after a program crash
            lockfile.write(str(os.getpid()))
    except FileExistsError:
        # file already exists
        raise AlreadyLocked()
You need to use exclusive create mode to avoid race conditions; in exclusive create mode the file can only be created if it doesn't yet exist, a condition checked by the Operating System, rather than by a separate step in Python which would open a window for another program to create the lock as well.
Now you can lock and unlock without tracking the file descriptor. The lockfile is now a signal file; if it is present something has claimed a lock, and deleting the file means something is unlocked.
This does mean that the files or directories you are trying to protect are only protected because all your code honours this lock system, not because the OS enforces locks on them. The scheme therefore only works if every process that accesses the shared resource cooperates in it; it cannot be used if another process doesn't honour it. In that case your only option is OS-level locking, and you have to keep your process running for the full duration of the lock.
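Because the lockfile stores a PID, a later program run can check who claimed the lock before deciding whether a stale file may be deleted. A minimal sketch, assuming Python 3 and a lockfile that contains the PID as decimal text. The helper name lock_owner and the lock path are hypothetical.

```python
import os
import tempfile

# Hypothetical lock path used only for this illustration.
LOCKFILE = os.path.join(tempfile.gettempdir(), "example_app.lock")


def lock_owner(path=LOCKFILE):
    """Return the PID stored in the lockfile, or None if no lock is held."""
    try:
        with open(path) as lockfile:
            return int(lockfile.read().strip())
    except FileNotFoundError:
        # No lockfile means nothing has claimed the lock.
        return None
```

Whether the process with that PID is still alive has to be checked separately, and the PID may have been reused by an unrelated process since the lock was taken.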

There is a method in win32api to set file attributes. Have a read of the following:
python SetFileAttributes
MSDN file attributes
These give you the Python method to set file attributes:
win32api.SetFileAttributes(file, win32con.FILE_ATTRIBUTE_NORMAL)
where file is the name/path of the file, not a file object,
and the second argument is an attribute mask. If you want to set several attributes at once, you can combine them with bitwise OR:
win32con.FILE_ATTRIBUTE_HIDDEN | win32con.FILE_ATTRIBUTE_READONLY
There are more constants named on the MSDN page.
EDIT:
For file locking you can also look at the win32file.LockFileEx method.
I haven't used this before, so it may take some playing around, but it appears to need a file handle (not a path) and certain constants to control the lock; more info on the constants can be found on MSDN.

You could use subprocess to open the file in notepad or excel:
import subprocess, time
subprocess.call('start excel.exe "lockThisFile.txt"', shell = True)
time.sleep(10) # if you need the file locked before executing the next commands, you may need to sleep it for a few seconds
or
subprocess.call('notepad > lockThisFile.txt', shell = True)
As written you need shell = True, otherwise windows will give you a syntax error.
(subprocess.Popen() works as well)
You can then close the process later using:
subprocess.call('taskkill /f /im notepad.exe') # or excel.exe
Other options include
-write some C++ code and call it from python (https://msdn.microsoft.com/en-us/library/windows/desktop/aa365203(v=vs.85).aspx)
-call 3rd party programs with subprocess.call():
FileLocker http://www.jensscheffler.de/filelocker (https://superuser.com/questions/294826/how-to-purposefully-exclusively-lock-a-file)
Easy File Locker http://www.xoslab.com/efl.html and Dispatch (from win32com.client import Dispatch), although last choice is the most complex

with open inside try - except block, too many files open?

Quite simply, I am cycling through all sub folders in a specific location, and collecting a few numbers from three different files.
def GrepData():
    import glob as glob
    import os as os
    os.chdir('RUNS')
    RUNSDir = os.getcwd()
    Directories = glob.glob('*.*')
    ObjVal = []
    ParVal = []
    AADVal = []
    for dir in Directories:
        os.chdir(dir)
        (X,Y) = dir.split(sep='+')
        AADPath = glob.glob('Aad.out')
        ObjPath = glob.glob('fobj.out')
        ParPath = glob.glob('Par.out')
        try:
            with open(os.path.join(os.getcwd(),ObjPath[0])) as ObjFile:
                for line in ObjFile:
                    ObjVal.append(list([X,Y,line.split()[0]]))
                ObjFile.close()
        except(IndexError):
            ObjFile.close()
        try:
            with open(os.path.join(os.getcwd(),ParPath[0])) as ParFile:
                for line in ParFile:
                    ParVal.append(list([X,Y,line.split()[0]]))
                ParFile.close()
        except(IndexError):
            ParFile.close()
        try:
            with open(os.path.join(os.getcwd(),AADPath[0])) as AADFile:
                for line in AADFile:
                    AADVal.append(list([X,Y,line.split()[0]]))
                AADFile.close()
        except(IndexError):
            AADFile.close()
        os.chdir(RUNSDir)
Each file open command is placed in a try - except block, as in a few cases the file that is opened will be empty, and thus appending the line.split() will lead to an index error since the list is empty.
However, when running this script I get the following error: "OSError: [Errno 24] Too Many open files"
I was under the impression that the idea of the "with open..." statement was that it took care of closing the file after use? Clearly that is not happening.
So what I am asking for is two things:
The answer to: "Is my understanding of with open correct?"
How can I correct whatever error is inducing this problem?
(And yes, I know the code is not exactly elegant. The whole try - except ought to be a single object that is reused, but I will fix that after figuring out this error.)

Try moving your try-except inside the with like so:
with open(os.path.join(os.getcwd(),ObjPath[0])) as ObjFile:
    for line in ObjFile:
        try:
            ObjVal.append(list([X,Y,line.split()[0]]))
        except(IndexError):
            pass
Notes: there is no need to close your file manually; that is what with is for. Also, there is no need to write as os (or as glob) in your imports when the alias is the same as the module name.

"Too many open files" has nothing to do with writing semantically incorrect python code, and you are using with correctly. The key is the part of your error that says "OSError," which refers to the underlying operating system.
When you call open(), the python interpreter will execute a system call. The details of the system call vary a bit by which OS you are using, but on linux this call is open(2). The operating system kernel will handle the system call. While the file is open, it has an entry in the system file table and takes up OS resources -- this means effectively it is "taking up space" whilst it is open. As such the OS has a limit to the number of files that can be opened at any one time.
Your problem is that while you call open(), you don't call close() quickly enough. If your directory structure really requires many thousands of files to be open at once, so that you approach this cap, the cap can be changed temporarily (at least on Linux; I'm less familiar with other OSes, so I don't want to go into too many details about how to do this across platforms).
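On Linux and other Unix systems the cap for the current process can be inspected and raised from Python with the standard resource module. A minimal sketch under that assumption (the module does not exist on Windows). The helper name raise_open_file_limit is hypothetical.

```python
import resource  # Unix only; not available on Windows


def raise_open_file_limit(wanted):
    """Raise the soft limit on open files towards wanted; return the soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process may only raise its soft limit up to the hard limit.
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    if soft == resource.RLIM_INFINITY or wanted <= soft:
        return soft  # already high enough, nothing to change
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

The change applies only to the calling process and its children and is gone once the process exits, which matches the "temporarily" above. Closing files promptly is still the better fix when it is possible.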

Opening file - Performing a function

I was wondering if someone could give me a direction on how to give functions to a file... This is a bit hard to explain, so I'll try my best.
Let's say I have an application (using wxPython) and let's say that I have a file. Now this file is assigned to open with the application. So, I double-click the file and it opens the application. Now my question is, what would have to be written on the file to, for example, open up a dialog? So we double-click the file and it opens a dialog on the application?
PS: I know that I first have to associate the program with a certain file type to be able to double-click it, but that's not the question.

AFAIK most platforms just call the helper app with the file you clicked on as an argument, so your filepath will be in sys.argv[1]

I think what he wants to do is associate a file extension to his application so when he opens the file by double clicking it, it sends the contents of the file to his app; in this case, display the contents within a Dialog?
If this is the case, then the first thing you need to do (provided you are on Windows) is create the appropriate file association for your file extension. This can be done through the registry and, when set up correctly, will open your app with the path/filename of the file that was executed as the first argument. Ideally it is the same as executing it from the command line like:
C:\your\application.exe "C:\The\Path\To\my.file"
Now, as suggested above, you then need to use sys.argv to obtain the arguments passed to your application; in this case C:\The\Path\To\my.file would be the first argument. Simply put, sys.argv is a list of arguments passed to the application: the first entry, sys.argv[0], is always the path to your application, and, as mentioned above, sys.argv[1] is the path to your custom file.
Example:
import sys
myFile = sys.argv[1]
# open() works in both Python 2 and 3; the file() builtin exists only in Python 2
with open(myFile, "r") as f:
    contents = f.read()
Then you will be able to pass the variable contents to your dialog to do whatever with.
