Swift task open file descriptor of pipe - python

I am trying to open a Swift Pipe from a Python script that is executed via a Swift Process (Task).
Swift code
import Foundation

let pipe = Pipe()
let task = Process()
var env = ProcessInfo.processInfo.environment
task.launchPath = "/pythonscript.py"
let fh = pipe.fileHandleForWriting
task.arguments = ["\(fh.fileDescriptor)"]
task.launch()
Python
#!/usr/local/bin/python
import os
import sys
fd=int(sys.argv[1])
print(os.fdopen(fd, u'w'))
What I get back from the python script is
Traceback (most recent call last):
File "./test.py", line 7, in <module>
print(os.fdopen(fd, u'w'))
OSError: [Errno 9] Bad file descriptor
Why can't python open the file descriptor I created in Swift?

Why can't python open the file descriptor I created in Swift?
Short answer (fudging a little): because a file descriptor is a process-local identifier which the OS uses to link to the open file information it keeps for each process. You cannot simply copy descriptor values between processes.
Long answer:
In macOS/Unix/Linux (*nix) a file descriptor is just a process-local value which is used by the OS to link to the appropriate open file information within the OS. Different processes can have exactly the same file descriptor values which identify completely different files. Therefore you cannot simply copy a file descriptor value between processes.
In *nix a child process inherits the open files, and their associated descriptors, from its parent. This is the only way file descriptors get passed between processes. In outline the steps are (sketched in code after the list):
The parent process forks, creating a clone of itself
The clone then closes any files the child should not access (usually all of them except the standard input, output and error files).
If the parent has pre-opened files that should be the child's standard input, output or error, the clone then reassigns the file descriptors for those files to the standard file descriptors for standard input, output and error.
After all this file descriptor work is done, the clone replaces its code with the code the child needs to run; this keeps the open files and file descriptors.
The child code now executes unaware of all this setup.
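To make those steps concrete, here is a minimal Python sketch of the same fork/dup2/exec sequence. It is illustrative only: in Swift, Process does all of this for you, and it assumes /pythonscript.py (from the question) is an executable script.

import os

r, w = os.pipe()                 # parent creates the pipe before forking

pid = os.fork()                  # step 1: clone the current process
if pid == 0:                     # in the child (the clone)
    os.close(w)                  # step 2: close ends the child should not use
    os.dup2(r, 0)                # step 3: put the read end on fd 0 (stdin)
    os.close(r)
    # step 4: replace the clone's code; open descriptors survive exec
    os.execv('/pythonscript.py', ['/pythonscript.py'])
else:                            # in the parent
    os.close(r)
    os.write(w, b'hello child\n')
    os.close(w)
    os.waitpid(pid, 0)           # wait for the child to finish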
In Swift all of the above is handled by Process; in Terminal it is handled by the shell, which uses it to set up file redirection, pipes, etc.
To get your pipe to your Python process you can (a) use the Process methods to attach it to the spawned process's standard input or output; (b) create a named pipe, that is, one with a file path, and pass the file path to your Python code to open; or (c) go low-level and write some interfacing C code which does the fork/dup(2)/exec calls and starts up your Python code with the pipe on a known descriptor other than standard input or output.
(a) is easiest! (b) requires you to do some research on named pipes; it's not hard, but you'll need to work with sandboxing if it's enabled and create the pipe in a directory both processes can access. (c) is best avoided.
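For option (a), the Swift side attaches the pipe via Process's standardInput property, and the Python side then needs no descriptor juggling at all. A minimal child script, assuming the parent has wired its pipe to the child's standard input:

#!/usr/local/bin/python
import sys

# Option (a): the parent attached its pipe to this process's standard
# input, so sys.stdin is already the read end of the pipe.
for line in sys.stdin:
    sys.stdout.write('child got: ' + line)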
Have fun, and if you get stuck ask a new question showing what you've tried, where it goes wrong, etc. and someone will undoubtedly help you along.
HTH

Related

Finding which files are being read from during a session (python code)

I have a large system written in Python. When I run it, it reads all sorts of data from many different files on my filesystem. There are thousands of lines of code and hundreds of files, most of which are not actually being used. I want to see which files are actually being accessed by the system (Ubuntu) and, hopefully, where in the code they are being opened. Filenames are decided dynamically using variables etc., so the actual filenames cannot be determined just by looking at the code.
I have access to the code, of course, and can change it.
I am trying to figure out how to do this efficiently, with minimal changes in the code:
1. Is there a Linux way to determine which files are accessed, and at what times? This might be useful, although it won't tell me where in the code this happens.
2. Is there a simple way to make an "open file" command also log the file name, time, etc. of the opened file? Hopefully without having to go into the code and change every open command; there are many of them, and some are not even used at runtime.
Thanks
You can trace file accesses without modifying your code, using strace.
Either start your program under strace, like this:
strace -f -e trace=file python your_program.py
or attach strace to an already-running program, like this:
strace -f -e trace=file -p <PID>
For 1 - You can use
ls -la /proc/<PID>/fd
Replacing <PID> with your process id.
Note that it will give you all the open file descriptors; some of them are stdin, stdout and stderr, and often other things such as open sockets (which use file descriptors), but filtering it for regular files should be easy.
For 2- See the great solution proposed here -
Override python open function when used with the 'as' keyword to print anything
e.g. overriding the open function with your own, which could include the additional logging.
One possible method is to "overload" the open function. This will have many effects that depend on the code, so I would do that very carefully if needed, but basically here's an example:
>>> _open = open
>>> def open(filename):
... print(filename)
... return _open(filename)
...
>>> open('somefile.txt')
somefile.txt
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in open
FileNotFoundError: [Errno 2] No such file or directory: 'somefile.txt'
As you can see, my new open function calls the original open (saved as _open) but first prints out the argument (the filename). This can be done with more sophistication to log the filename if needed, but the most important thing is that this needs to run before any use of open in your code.
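For logging rather than printing, a slightly more robust variant of the same idea is to shadow the built-in for the whole interpreter via the builtins module. This is a sketch, assuming Python 3; the log path /tmp/file_access.log and the log format are arbitrary choices, and format_stack gives you the "where in the code" part of the question:

import builtins
import time
import traceback

_open = builtins.open

def logging_open(file, *args, **kwargs):
    # record the filename, a timestamp, and where open() was called from
    with _open('/tmp/file_access.log', 'a') as log:
        log.write('%s %r\n' % (time.strftime('%H:%M:%S'), file))
        log.write(''.join(traceback.format_stack(limit=3)))
    return _open(file, *args, **kwargs)

builtins.open = logging_open  # must run before any code that calls open()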

Use pipe() and fdopen() to pass data from Python script to C++ application in Windows

We have a Linux/macOS application which can communicate with the outside world by being passed a file descriptor and reading data from it. Usually this is done to pass stdin/stdout descriptors, but we use pipe(), and this works pretty well. Except under MinGW/Windows.
What would be the recommended way of doing the same job under Windows? Pass the whole file handle, or are there good ways to simulate a small-int-like descriptor?
In Windows, C file descriptors are inherited in the process STARTUPINFO record in the reserved fields cbReserved2 and lpReserved2. The protocol is undocumented, but the source is distributed with Visual C++. C functions that use this feature include the _[w]spawn family of functions and [_w]system. (The [_w]system function, however, is not generally useful in this regard because only the immediate cmd.exe process inherits the parent's file descriptors. CMD does not pass them on to child processes.)
In Python 2.7, os.pipe is implemented by calling CreatePipe, which returns non-inheritable file handles for the read and write ends of the pipe. These handles are then manually wrapped with inheritable file descriptors via _open_osfhandle, but the underlying OS handle is still non-inheritable. To work around this, duplicate the file descriptor via os.dup, which internally duplicates an inheritable file handle, and then close the source file descriptor. For example:
pread, pwrite = os.pipe()
pwrite_child = os.dup(pwrite)
os.close(pwrite)
Python's subprocess module is usually the preferred way to create a child process. However, we can't use subprocess in this case because it doesn't support inheriting file descriptors via STARTUPINFO (*). Here's an example that uses os.spawnv instead:
rc = os.spawnv(os.P_WAIT, 'path/to/spam.exe', ['spam', 'arg1', 'arg2'])
(*) It's an awkward situation that Windows Python internally uses the C runtime file API (e.g. _wopen, _read, _write), but in places fails to support C file descriptors. It should bite the bullet and use the Windows file API directly with OS file handles, then it would at least be consistent.
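Putting those pieces together, a minimal parent-side sketch under the assumptions above (Python 2.7 on Windows, where os.dup yields an inheritable handle; child.py is a hypothetical script that does os.fdopen(int(sys.argv[1]), 'w') and writes to it):

import os
import sys

pread, pwrite = os.pipe()        # fds wrapping non-inheritable OS handles
pwrite_child = os.dup(pwrite)    # duplicate as an inheritable handle
os.close(pwrite)

# spawnv passes the parent's C file descriptors through STARTUPINFO,
# so the child can use the descriptor number we hand it in argv
pid = os.spawnv(os.P_NOWAIT, sys.executable,
                ['python', 'child.py', str(pwrite_child)])

os.close(pwrite_child)           # parent keeps only the read end open
with os.fdopen(pread) as reader:
    print(reader.read())         # everything the child wrote to its fd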

what is the os.close(3) for?

What is the os.close(3) for?
I am reading the Python Cookbook, 2nd edition, recipe 2.9, which explains how importing modules from a zip file works. There is one snippet of code in it that I don't really get.
import zipfile, tempfile, os, sys
handle, filename = tempfile.mkstemp('.zip')
os.close(handle) # <- handle is int 3 here
z = zipfile.ZipFile(filename, 'w')
z.writestr('hello.py', 'def f(): return "hello world from "+__file__\n')
z.close()
sys.path.insert(0, filename)
import hello
print hello.f()
os.unlink(filename)
The os.close() explanation in the Python docs:
This function is intended for low-level I/O and must be applied to a file descriptor as returned by os.open() or pipe(). To close a “file object” returned by the built-in function open() or by popen() or fdopen(), use its close() method.
The file descriptors 0, 1 & 2 in Linux are stdin, stdout & stderr; I don't get what fd 3 is for, even though I have read "What is the file descriptor 3 assigned by default?".
If I comment the os.close(handle) out, the output is no different.
Even though Python mostly deals in "file objects", these are an abstraction around OS-level file handles; when actually reading or writing content to a file (or network stream, or other file-like object) at the operating system level, one passes the OS the handle number associated with the file with which one wants to interact. Thus, every file object in Python that's actually backed by an OS-level file handle has such a file descriptor number associated.
File handles are stored in a table, each associated with an integer. On Linux, you can look at the directory /proc/self/fd (substituting a PID number for self to look at a different process) to see which handles have which numbers for a given process.
handle, filename = tempfile.mkstemp('.zip'); os.close(handle), thus, closes the OS-level file handle which was returned to you by mkstemp.
By the way: It's important to note that there's absolutely nothing special about the number 3, and that there is no default or conventional behavior for same implemented at the operating system level; it just happened to be the next available position in the file handle table when mkstemp was called (or, to be more precise, when the C standard library implementation of mkstemp called the OS-level syscall open).
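You can watch this happen directly. A small sketch, Linux-only because of the /proc listing:

import os
import tempfile

handle, filename = tempfile.mkstemp('.zip')
print(handle)                        # typically 3: the lowest free slot after 0, 1, 2
print(os.listdir('/proc/self/fd'))   # this process's open descriptor numbers
os.close(handle)
print(os.listdir('/proc/self/fd'))   # 3 is gone again after the close
os.unlink(filename)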
You are getting file descriptor 3 because in this case it is the next available file descriptor. As you mentioned, stdin (0), stdout (1) and stderr (2) are automatically opened for you. The link you cited ( https://unix.stackexchange.com/questions/41421/what-is-the-file-descriptor-3-assigned-by-default ) points this out, too.

Lock file for access on windows

Using portalocker we can lock a file for access in the following way:
f=open("M99","r+")
portalocker.lock(f,portalocker.LOCK_EX)
The lock over the file can be removed using
f.close()  # or
portalocker.unlock(f)  # needs the file object it locked, pretty obvious too
Can this same thing be done any other way in Python, wherein we can:
Lock the file for access,
restart Python (so we no longer have the original Python file object or file number),
unlock the file for access in the new process.
I cannot save f or the file object, so I can't use pickle or anything like that either. Is there a way using the Python standard library or some win32api call?
Any Windows utility will also do... any command line from Windows?
It appears you want to lock access to resources where the lock persists between program invocations. You need a different strategy for that.
Create a lock file using exclusive create mode; in Python 2 this requires using the os.open() call (followed by os.fdopen() to produce a Python file object), in Python 3 you can use the 'x' mode when using the built-in open().
In Python 2:
import os

LOCKFILE = r'some\path\to\lockfile'

class AlreadyLocked(Exception):
    pass

def lock():
    try:
        fd = os.open(LOCKFILE, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except OSError:
        # file already exists
        raise AlreadyLocked()
    with os.fdopen(fd, 'w') as lockfile:
        # write the PID of the current process so you can debug
        # later whether a lockfile can be deleted after a program crash
        lockfile.write(str(os.getpid()))

def unlock():
    os.remove(LOCKFILE)
In Python 3 the lock() function would be:
def lock():
    try:
        with open(LOCKFILE, 'x') as lockfile:
            # write the PID of the current process so you can debug
            # later whether a lockfile can be deleted after a program crash
            lockfile.write(str(os.getpid()))
    except FileExistsError:
        # file already exists
        raise AlreadyLocked()
You need to use exclusive create mode to avoid race conditions; in exclusive create mode the file can only be created if it doesn't yet exist, a condition checked by the operating system rather than by a separate step in Python, which would leave a window for another program to create the lock as well.
Now you can lock and unlock without tracking the file descriptor. The lockfile is now a signal file; if it is present something has claimed a lock, and deleting the file means something is unlocked.
This does mean that access to the files or directories you are trying to protect is only protected because all your code honours this lock system, not because the OS is enforcing locks on those files or directories.
This all means that this only works if all access to the shared resource is handled by processes that cooperate in this strategy. It cannot be used if another process doesn't honour this scheme. In that case your only option is to use OS level locking and you have to keep your process running for the full duration of the lock.
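A possible usage pattern for the lock()/unlock() helpers above; do_work() is a placeholder for whatever touches the shared resource:

try:
    lock()
except AlreadyLocked:
    print('another process holds the lock')
else:
    try:
        do_work()   # placeholder: access the shared resource here
    finally:
        unlock()    # always remove the lockfile, even on error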
There is a method in win32api to set file attributes. Have a read of the following:
python SetFileAttributes
MSDN file attributes
These give you the Python call to set file attributes:
win32api.SetFileAttributes(file, win32con.FILE_ATTRIBUTE_NORMAL)
where file is the name/path of the file, and not a file object
and the second argument is an attribute mask; if you want to set several attributes at once, you can combine them with bitwise OR:
win32con.FILE_ATTRIBUTE_HIDDEN | win32con.FILE_ATTRIBUTE_READONLY
and there are more constants named in the MSDN page.
EDIT:
For file locking you can also look at the win32file.LockFileEx method.
I haven't used this before, so it may take some playing around, but it appears to need a file object (not a path) passed to it, plus certain constants to set the access permissions; more info on the constants can be found on MSDN.
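A rough sketch of what that could look like, modelled on how portalocker itself drives this API; treat the exact LockFileEx argument order and the byte-range values as assumptions to verify against the pywin32 documentation:

import msvcrt
import pywintypes
import win32con
import win32file

f = open('M99', 'r+')
hfile = msvcrt.get_osfhandle(f.fileno())   # OS handle behind the C descriptor

# exclusive lock, failing immediately instead of blocking if already locked
win32file.LockFileEx(
    hfile,
    win32con.LOCKFILE_EXCLUSIVE_LOCK | win32con.LOCKFILE_FAIL_IMMEDIATELY,
    0, -0x10000,                           # byte range, as portalocker passes it
    pywintypes.OVERLAPPED())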
You could use subprocess to open the file in notepad or excel:
import subprocess, time
subprocess.call('start excel.exe "\\lockThisFile.txt"', shell=True)
time.sleep(10)  # if you need the file locked before the next commands execute, sleep for a few seconds
or
subprocess.call('notepad > lockThisFile.txt', shell = True)
As written you need shell=True, otherwise Windows will give you a syntax error.
(subprocess.Popen() works as well)
You can then close the process later using:
subprocess.call('taskkill /f /im notepad.exe') # or excel.exe
Other options include:
- writing some C++ code and calling it from Python (https://msdn.microsoft.com/en-us/library/windows/desktop/aa365203(v=vs.85).aspx)
- calling 3rd party programs with subprocess.call(), e.g. FileLocker http://www.jensscheffler.de/filelocker (https://superuser.com/questions/294826/how-to-purposefully-exclusively-lock-a-file) or Easy File Locker http://www.xoslab.com/efl.html
- using Dispatch (from win32com.client import Dispatch), although this last choice is the most complex.

Detect file handle leaks in python?

My program appears to be leaking file handles. How can I find out where?
My program uses file handles in a few different places: output from child processes, calls to a ctypes API (ImageMagick) that opens files, and files that get copied.
It crashes in shutil.copyfile, but I'm pretty sure this is not the place where it is leaking.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Python25\Lib\site-packages\magpy\magpy.py", line 874, in main
magpy.run_all()
File "C:\Python25\Lib\site-packages\magpy\magpy.py", line 656, in run_all
[operation.operate() for operation in operations]
File "C:\Python25\Lib\site-packages\magpy\magpy.py", line 417, in operate
output_file = self.place_image(output_file)
File "C:\Python25\Lib\site-packages\magpy\magpy.py", line 336, in place_image
shutil.copyfile(str(input_file), str(self.full_filename))
File "C:\Python25\Lib\shutil.py", line 47, in copyfile
fdst = open(dst, 'wb')
IOError: [Errno 24] Too many open files: 'C:\\Documents and Settings\\stuart.axon\\Desktop\\calzone\\output\\wwtbam4\\Nokia_NCD\\nl\\icon_42x42_V000.png'
Press any key to continue . . .
I had similar problems, running out of file descriptors during subprocess.Popen() calls. I used the following script to debug what is happening:
import os
import stat

_fd_types = (
    ('REG', stat.S_ISREG),
    ('FIFO', stat.S_ISFIFO),
    ('DIR', stat.S_ISDIR),
    ('CHR', stat.S_ISCHR),
    ('BLK', stat.S_ISBLK),
    ('LNK', stat.S_ISLNK),
    ('SOCK', stat.S_ISSOCK)
)

def fd_table_status():
    result = []
    for fd in range(100):
        try:
            s = os.fstat(fd)
        except OSError:
            # fd is not open
            continue
        for fd_type, func in _fd_types:
            if func(s.st_mode):
                break
        else:
            fd_type = str(s.st_mode)
        result.append((fd, fd_type))
    return result

def fd_table_status_logify(fd_table_result):
    return ('Open file handles: ' +
            ', '.join(['{0}: {1}'.format(*i) for i in fd_table_result]))

def fd_table_status_str():
    return fd_table_status_logify(fd_table_status())

if __name__ == '__main__':
    print fd_table_status_str()
You can import this module and call fd_table_status_str() to log the file descriptor table status at different points in your code.
Also, make sure that subprocess.Popen instances are destroyed. Keeping references to Popen instances on Windows prevents the GC from running, and if the instances are kept, the associated pipes are not closed. More info here.
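A small pattern that avoids relying on the GC for this; some_tool is a stand-in for whatever child process you run:

import subprocess

p = subprocess.Popen(['some_tool'], stdout=subprocess.PIPE)
try:
    output = p.stdout.read()
finally:
    p.stdout.close()   # close the pipe explicitly instead of waiting for GC
    p.wait()           # reap the child so nothing lingers on the Popen object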
Use Process Explorer, select your process, View->Lower Pane View->Handles - then look for what seems out of place - usually lots of the same or similar files open points to the problem.
lsof -p <process_id> works well on several UNIX-like systems including FreeBSD.
Look at the output from ls -l /proc/$pid/fd/ (substituting the PID of your process, of course) to see which files are open [or, on win32, use Process Explorer to list open files]; then figure out where in your code you're opening them, and make sure that close() is being called. (Yes, the garbage collector will eventually close things, but it's not always fast enough to avoid running out of fds).
Checking for any circular references which might be preventing garbage collection is also a good practice. (The cycle collector will eventually dispose of these -- but it may not run frequently enough to avoid file descriptor exhaustion; I've been bitten by this personally).
While the OP has a Windows system, I'm sure plenty of people here (such as myself) are looking for other platforms too (the question isn't even tagged Windows).
The psutil package (hosted on Google Code) has a get_open_files() method. It looks like an excellent interface, but it seems it hasn't been maintained in a couple of years. I actually wrote an implementation for my own Python 2 project on Linux, and I'm using it with unittest to make sure my functions clean up their resources (a sketch of that pattern follows the code).
import os

# calling this **synchronously** will accurately relay open files on Linux
def get_open_files(pid):
    # directory spawned by the Python process, containing its file descriptors
    path = "/proc/%d/fd" % pid
    # list the abspaths belonging to that directory
    links = ["%s/%s" % (path, f) for f in os.listdir(path)]
    # filter out the bad ones returned by os.listdir()
    valid_links = filter(lambda f: os.path.exists(f), links)
    # these links are fd integers, so map them to their actual file devices
    devices = map(lambda f: os.readlink(f), valid_links)
    # remove any that are stdin, stdout, stderr, etc.
    return filter(lambda f: "/dev/pts" not in f, devices)
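And a sketch of the unittest pattern mentioned above; do_work() is a placeholder for the function under test:

import os
import unittest

class TestNoFdLeaks(unittest.TestCase):
    def test_function_closes_its_files(self):
        before = set(get_open_files(os.getpid()))
        do_work()   # placeholder: the function that should clean up after itself
        after = set(get_open_files(os.getpid()))
        self.assertEqual(before, after)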
Python's own test suite has a refleak module that utilizes fd_count; it works across operating systems and is available on full installs:
>>> from test.support.os_helper import fd_count
>>> fd_count()
27
On Python 3.9 and earlier, os_helper doesn't exist; use from test.support import fd_count instead.
