What is the os.close(3) for?
I am reading the Python Cookbook, 2nd edition, recipe 2.9, which explains how Python's zip-file import works. There is one snippet of code in it that I don't really get.
import zipfile, tempfile, os, sys
handle, filename = tempfile.mkstemp('.zip')
os.close(handle) # <- handle is int 3 here
z = zipfile.ZipFile(filename, 'w')
z.writestr('hello.py', 'def f(): return "hello world from "+__file__\n')
z.close()
sys.path.insert(0, filename)
import hello
print hello.f()
os.unlink(filename)
The os.close() explanation in the Python docs says:
This function is intended for low-level I/O and must be applied to a file descriptor as returned by os.open() or pipe(). To close a “file object” returned by the built-in function open() or by popen() or fdopen(), use its close() method.
File descriptors 0, 1 and 2 on Linux are stdin, stdout and stderr; I don't get what fd 3 is for, even though I have read "What is the file descriptor 3 assigned by default?".
If I comment the os.close(handle) out, the output is no different.
Even though Python mostly deals in "file objects", these are an abstraction around OS-level file handles; when actually reading or writing content to a file (or network stream, or other file-like object) at the operating system level, one passes the OS the handle number associated with the file one wants to interact with. Thus, every file object in Python that's actually backed by an OS-level file handle has such a file descriptor number associated with it.
File handles are stored in a table, each associated with an integer. On Linux, you can look at the directory /proc/self/fd (substituting a PID number for self to look at a different process) to see which handles have which numbers for a given process.
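For instance, a quick Linux-only snippet to inspect that table for the current process (note that reading the directory itself briefly uses a descriptor, which may show up in the listing):

import os
# Each entry in /proc/self/fd is a symlink from a descriptor number to the
# file (or pipe, socket, ...) it references.
for fd in sorted(os.listdir('/proc/self/fd'), key=int):
    try:
        target = os.readlink('/proc/self/fd/%s' % fd)
    except OSError:
        target = '?'                  # the listing's own fd may be gone already
    print(fd, '->', target)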
handle, filename = tempfile.mkstemp('.zip'); os.close(handle) thus closes the OS-level file handle which mkstemp returned to you.
By the way: it's important to note that there's absolutely nothing special about the number 3, and that the operating system implements no default or conventional behavior for it; it just happened to be the next available position in the file handle table when mkstemp was called (or, more precisely, when the C standard library implementation of mkstemp called the OS-level open syscall).
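You can watch the slot reuse yourself; a minimal sketch, assuming no other files are open in the process:

import os, tempfile
handle, filename = tempfile.mkstemp('.zip')
print(handle)                        # typically 3: the lowest free slot after 0, 1 and 2
os.close(handle)                     # slot 3 is free again
fd = os.open(filename, os.O_RDONLY)
print(fd)                            # typically 3 again: the freed slot is reused
os.close(fd)
os.unlink(filename)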
You are getting file descriptor 3 because in this case it is the next available file descriptor. As you mentioned, stdin (0), stdout (1) and stderr (2) are automatically opened for you. The link you cited ( https://unix.stackexchange.com/questions/41421/what-is-the-file-descriptor-3-assigned-by-default ) points this out, too.
Related
Say hello to bug.py:
import os, sys
stdout2 = os.dup(sys.stdout.fileno())   # keep a copy of the real stdout
(r, w) = os.pipe()
os.dup2(w, sys.stdout.fileno())         # fd 1 now writes into the pipe
print("Hail Stan")                      # crashes on Windows, as shown below
z = os.read(r, 1000)                    # read the captured output back
os.write(stdout2, z)                    # and send it to the real stdout
If you run this on OSX (and I imagine, on Linux), this works great. However, in Windows, we get this:
PS Z:\...> python --version
Python 3.9.2
PS Z:\...> python bug.py
Traceback (most recent call last):
File "Z:\...\bug.py", line 6, in <module>
print("Hail Stan")
OSError: [WinError 1] Incorrect function
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
OSError: [WinError 1] Incorrect function
I don't know much about anything, but this smells of some deep Python+Windows mysticism, and pdb isn't cutting it, so I can't debug my way out of this. Does anyone know how to make this work?
The slightly bigger context is that I'm trying to build some tee-like functionality into my application, but all other methods I've found of capturing stdout are incomplete. You can use popen and subprocess for various things, but if you truly want to grab your whole process's stdout and stderr, at least on Unix, the only way I found is to dup2 the file descriptors through pipes.
I don't know if I've truly got to the bottomest bottom but I've mostly figured this one out.
sys.stdout is an object of type TextIOWrapper, whose sys.stdout.write() method eventually calls either os.write() or the C equivalent on the file descriptor sys.stdout.fileno(), which is 1.
There are several types of file descriptors: files, sockets, serial ports, terminals, pipes, etc. The C library functions write(), close(), and so on work on mostly any type of file descriptor, but some features only work when the file descriptor is of a suitable type.
At creation time, the TextIOWrapper that is sys.stdout examines its file descriptor (1) and determines that it's a terminal. On Windows only, sys.stdout.write() ends up doing some operations that are only valid if file descriptor 1 truly is a terminal. Once you os.dup2() so that 1 becomes an os.pipe() rather than a terminal, the Windows Python implementation crashes when it attempts to do terminal-specific operations on a pipe.
It's not clear to me that there's a way of causing sys.stdout to re-examine its filedescriptor so that it notices that it's not a terminal anymore, and avoid crashing the Python interpreter.
As a workaround, on Windows only, I'm doing sys.stdout.write = lambda z: os.write(sys.stdout.fileno(), z.encode() if hasattr(z, 'encode') else z)
I haven't done a deep dive through the Python code base to see whether this is sufficient, but it does appear to allow my programs to run correctly, and print() no longer causes the Python interpreter to crash.
Many other people instead do sys.stdout = ..., but I did not want to do this for many reasons. For one, I was worried that some script or module might store a local cached copy of sys.stdout. Although it is possible to cache sys.stdout.write directly, I thought that this was less likely. Furthermore, it allows me to monkeypatch in my constructor, e.g. StreamCapture(sys.stdout), without having to refer to the global variable sys.stdout directly in my constructor (a "side-effect"). If you believe that a function is allowed to mutate its inputs, then this is side-effect-free.
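For reference, here is a rough sketch of the whole tee-like capture described above, combining the dup2-through-a-pipe trick with the Windows write monkeypatch. The log-file handling, buffer size, and threading details are illustrative assumptions, not a hardened implementation:

import os, sys, threading

class StreamCapture:
    """Tee an OS-level stream (e.g. sys.stdout) into a log file by
    redirecting its file descriptor through a pipe. Sketch only."""
    def __init__(self, stream, log_path):
        self.fd = stream.fileno()            # e.g. 1 for stdout
        self.saved = os.dup(self.fd)         # keep the original destination
        self.log = open(log_path, 'wb')
        r, w = os.pipe()
        os.dup2(w, self.fd)                  # the stream's fd now feeds the pipe
        os.close(w)
        if sys.platform == 'win32':
            # avoid the terminal-only code path that crashes once fd 1 is a pipe
            stream.write = lambda z: os.write(
                self.fd, z.encode() if hasattr(z, 'encode') else z)
        threading.Thread(target=self._pump, args=(r,), daemon=True).start()

    def _pump(self, r):
        while True:
            chunk = os.read(r, 65536)
            if not chunk:                    # write end closed; we're done
                break
            os.write(self.saved, chunk)      # echo to the original stream
            self.log.write(chunk)            # and keep a copy

Usage would be StreamCapture(sys.stdout, 'out.log'), after which print() output lands both on the console and in the log (modulo Python-level buffering, so flush when it matters).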
We have a Linux/macOS application which can communicate with the outside world by being passed a file descriptor and reading data from it. Usually this is done to pass the stdin/stdout descriptors, but we use pipe() and this works pretty well. Except on MinGW/Windows.
What would be the recommended way of doing the same job under Windows? Pass the whole file handle, or are there good ways to simulate a small-int-like descriptor?
In Windows, C file descriptors are inherited in the process STARTUPINFO record in the reserved fields cbReserved2 and lpReserved2. The protocol is undocumented, but the source is distributed with Visual C++. C functions that use this feature include the _[w]spawn family of functions and [_w]system. (The [_w]system function, however, is not generally useful in this regard because only the immediate cmd.exe process inherits the parent's file descriptors. CMD does not pass them on to child processes.)
In Python 2.7, os.pipe is implemented by calling CreatePipe, which returns non-inheritable file handles for the read and write ends of the pipe. These handles are then manually wrapped with inheritable file descriptors via _open_osfhandle, but the underlying OS handle is still non-inheritable. To work around this, duplicate the file descriptor via os.dup, which internally duplicates an inheritable file handle, and then close the source file descriptor. For example:
pread, pwrite = os.pipe()
pwrite_child = os.dup(pwrite)
os.close(pwrite)
Python's subprocess module is usually the preferred way to create a child process. However, we can't use subprocess in this case because it doesn't support inheriting file descriptors via STARTUPINFO (*). Here's an example that uses os.spawnv instead:
rc = os.spawnv(os.P_WAIT, 'path/to/spam.exe', ['spam', 'arg1', 'arg2'])
(*) It's an awkward situation that Windows Python internally uses the C runtime file API (e.g. _wopen, _read, _write), but in places fails to support C file descriptors. It should bite the bullet and use the Windows file API directly with OS file handles, then it would at least be consistent.
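Putting the pieces together, a sketch of the parent side; spam.exe is a hypothetical child program that takes the fd number in argv and writes to it:

import os

pread, pwrite = os.pipe()
pwrite_child = os.dup(pwrite)        # the duplicate's underlying handle is inheritable
os.close(pwrite)

# Pass the fd number on the command line; the child inherits the descriptor
# through the STARTUPINFO mechanism described above.
rc = os.spawnv(os.P_WAIT, 'path/to/spam.exe',
               ['spam', str(pwrite_child)])

os.close(pwrite_child)               # close our copy so read() can see EOF
print(os.read(pread, 4096))
os.close(pread)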
I am trying to open a Swift Pipe from a python script that is executed via a Swift Task
Swift code
let pipe=Pipe()
let task = Process()
var env=ProcessInfo.processInfo.environment
task.launchPath = "/pythonscript.py"
let fh=pipe.fileHandleForWriting
task.arguments = ["\(fh.fileDescriptor)"]
task.launch()
Python
#!/usr/local/bin/python
import os
import sys
fd=int(sys.argv[1])
print(os.fdopen(fd, u'w'))
What I get back from the python script is
Traceback (most recent call last):
File "./test.py", line 7, in <module>
print(os.fdopen(fd, u'w'))
OSError: [Errno 9] Bad file descriptor
Why can't python open the file descriptor I created in Swift?
Why can't python open the file descriptor I created in Swift?
Short answer (fudging a little): because a file descriptor is a process-local identifier which the OS uses to link to the open-file information it keeps for each process. You cannot copy them between processes.
Long answer:
In macOS/Unix/Linux (*nix) a file descriptor is just a process-local value which is used by the OS to link to the appropriate open file information within the OS. Different processes can have exactly the same file descriptor values which identify completely different files. Therefore you cannot simply copy a file descriptor value between processes.
In *nix a child process inherits the open files, and their associated descriptors, from its parent. This is the only way file descriptors get passed between processes. In outline the steps are (sketched in Python just after this list):
The parent process forks, creating a clone of itself
The clone then closes any files the child should not access (usually all of them except the standard input, output and error files).
If the parent has pre-opened files that should be the child's standard input, output or error the clone then reassigns the file descriptors for those files to the standard file descriptors for standard input, output and error.
After all this file descriptor work is done the clone then replaces its code with the code the child needs to run - this keeps the open files and file descriptors.
The child code now executes unaware of all this setup.
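Those steps look roughly like this in Python, which exposes fork and exec directly through the os module (the pipe-on-stdin choice is illustrative; the script path is the one from the question):

import os, sys

r, w = os.pipe()                 # pre-open a file the child will inherit
pid = os.fork()                  # step 1: clone ourselves
if pid == 0:                     # in the clone
    os.close(w)                  # step 2: close what the child shouldn't touch
    os.dup2(r, 0)                # step 3: the pipe becomes standard input
    os.close(r)
    # step 4: replace the clone's code; open files survive the exec
    os.execv(sys.executable, [sys.executable, '/pythonscript.py'])
else:                            # in the parent
    os.close(r)
    os.write(w, b'hello child\n')
    os.close(w)
    os.waitpid(pid, 0)           # step 5: the child ran, unaware of the setup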
In Swift all the above is handled by Process, in Terminal it is handled by the shell which uses it to set up file redirection, pipes etc.
To get your pipe to your Python process you can (a) use the Process methods to attach it to the spawned process's standard input or output; (b) create a named pipe, that is, one with a file path, and pass the file path to your Python to open; or (c) go low-level and write some interfacing C code which does the fork/dup(2)/exec calls and starts up your Python code with the pipe on a known descriptor other than standard input or output.
(a) is easiest! (b) requires you to do some research on named pipes; it's not hard, but you'll need to work with sandboxing if it's enabled and create the pipe in a directory both processes can access. (c) is best avoided.
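For option (b), the Python side might look like the sketch below; the FIFO path is a made-up example that both processes would have to agree on:

import os

fifo_path = '/tmp/myapp.fifo'        # hypothetical agreed-upon path
if not os.path.exists(fifo_path):
    os.mkfifo(fifo_path)             # either side may create the FIFO
# open() on a FIFO blocks until the other end opens it for writing
with open(fifo_path) as fifo:
    for line in fifo:
        print('received:', line.strip())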
Have fun, and if you get stuck ask a new question showing what you've tried, where it goes wrong, etc. and someone will undoubtedly help you along.
HTH
A script I am trying to fix uses the following paradigm for redirecting stdout to a file.
import os
stdio_file = 'temp.out'
flag = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
stdio_fp = os.open(stdio_file, flag)   # low-level descriptor for temp.out
os.dup2(stdio_fp, 1)                   # make fd 1 (stdout) refer to temp.out
print("hello")
On Python 2, this works. On Python 3, you get an OSError
Traceback (most recent call last):
File "test.py", line 6, in <module>
print("hello")
OSError: [WinError 6] The handle is invalid
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
OSError: [WinError 6] The handle is invalid
I assume there are more preferable methods to routing stdout through a file but I am wondering why this method stopped working in Python 3 and if there is an easy way to fix it?
Code such as os.dup2(stdio_fp, 1) will work in Python 3.5 and earlier, or in 3.6+ with the environment variable PYTHONLEGACYWINDOWSSTDIO defined.
The issue is that print writes to a sys.stdout object that's only meant for console I/O. Specifically, in 3.6+ the raw layer of Python 3's standard output file (i.e. sys.stdout.buffer.raw) is an io._WindowsConsoleIO instance when stdout is initially a console file [1]. This object caches the initial handle value of the stdout file descriptor [2]. Subsequently, dup2 closes this handle while re-associating the file descriptor with a duplicate handle for "temp.out". At this point the cached handle is no longer valid. (Really, it shouldn't be caching the handle, since calling _get_osfhandle is relatively cheap compared to the cost of console I/O.) However, even if it had a valid handle for "temp.out", sys.stdout.write would fail anyway since _WindowsConsoleIO uses the console-only function WriteConsoleW instead of generic WriteFile.
You need to reassign sys.stdout instead of bypassing Python's I/O stack with low-level operations such as dup2. I know it's not ideal from the Unix developer's point of view. I wish we could re-implement the way Unicode is supported for the Windows console without introducing this console-only _WindowsConsoleIO class, which disrupts low-level patterns that people have relied on for decades.
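A minimal sketch of that reassignment, reusing temp.out from the question (on 3.4+, contextlib.redirect_stdout wraps the same pattern in a context manager):

import sys

stdio_file = 'temp.out'
saved = sys.stdout
sys.stdout = open(stdio_file, 'w')   # route print through Python's I/O stack
print("hello")                       # lands in temp.out, no dup2 involved
sys.stdout.close()
sys.stdout = saved                   # restore the console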
[1] _WindowsConsoleIO was added to support the full range of Unicode in the Windows console (at least as well as the console can support it). For this it uses the console's UTF-16 wide-character API (e.g. ReadConsoleW and WriteConsoleW). Previously CPython's console support was limited to text that was encoded with Windows codepages, using generic byte-based I/O (e.g. ReadFile and WriteFile).
[2] Windows uses handles to reference kernel objects such as File objects. This system isn't compatible in behavior with POSIX file descriptors (FDs). The C runtime (CRT) thus has a "low I/O" compatibility layer that associates POSIX-style FDs with Windows file handles, and it also implements POSIX I/O functions such as open and write. The CRT's _open_osfhandle function associates a native file handle with an FD, and _get_osfhandle returns the handle associated with an FD. Sometimes CPython uses the CRT low I/O layer, and sometimes it uses the Windows API directly. It's really kind of a mess, if you ask me.
I have a service running on a Linux box that creates a named pipe character device-special file, and I want to write a Python3 program that communicates with the service by writing text commands and reading text replies from the pipe device. I don't have source code for the service.
I can use os.open(named_pipe_pathname, os.O_RDWR), and I can use os.read(...) and os.write(...) to read and write it, but that's a pain because I have to write my own code to convert between bytes and strings, I have to write my own readline(...) function, etc.
I would much rather use a Python3 io object to read and write the pipe device, but every way I can think of to create one returns the same error:
io.UnsupportedOperation: File or stream is not seekable.
For example, I get that message if I try open(pathname, "r+"), and I get that same message if I try fd=os.open(...) followed by os.fdopen(fd, "r+", ...).
Q: What is the preferred way for a Python3 program to write and read text to and from a named pipe character device?
Edit:
Oops! I assumed that I was dealing with a named pipe because the documentation for the service describes it as a "pipe" and because it doesn't appear in the file system until the user-mode service runs. But the Linux file utility says it is, in fact, a character device special file.
The problem occurs because attempting to use io.open in read-write mode implicitly tries to wrap the underlying file in io.BufferedRandom (which is then wrapped in io.TextIOWrapper if in text mode), which assumes the underlying file is not only read/write, but random access, and it takes liberties (seeking implicitly) based on this. There is a separate class, io.BufferedRWPair, intended for use with read/write pipes (the docstring specifically mentions it being used for sockets and two way pipes).
You can mimic the effects of io.open by manually wrapping layer by layer to produce the same end result. Specifically, for a text mode wrapper, you'd do something like:
rawf = io.FileIO(named_pipe_pathname, mode="rb+")
with io.TextIOWrapper(io.BufferedRWPair(rawf, rawf), encoding='utf-8', write_through=True) as txtf:
    del rawf  # Remove separate reference to rawf; txtf manages lifetime now
    # Example use that works (but is terrible form, since communicating with
    # oneself without threading, select module, etc., is highly likely to deadlock)
    # It works for this super-simple case; presumably you have some parallel real code
    txtf.write("abcé\n")
    txtf.flush()
    print(txtf.readline(), flush=True)
I believe this will close rawf twice when txtf is closed, but luckily, double-close is harmless here (the second close does nothing, realizing it's already closed).
Solution
You can use pexpect. Here is an example using two Python scripts:
caller.py
import pexpect
proc = pexpect.spawn('python3 backwards.py')
proc.expect(' > ')
while True:
    n = proc.sendline(input('Feed me - '))
    proc.expect(' > ')
    print(proc.before[n+1:].decode())
backwards.py
x = ''
while True:
    x = input(x[::-1] + ' > ')
Explanation
caller.py is using a "pseudo-TTY device" to talk to backwards.py. We are providing input with sendline and capturing output with expect (and the before attribute).
It looks like you need to create separate handles for reading and for writing: opening a single handle in read/write mode requires a seekable file. I couldn't figure out how to time out reads, so it's convenient to supply an opener (see the docstring for io.open) that opens the reader in non-blocking mode. I set up a simple echo service on a named pipe called /tmp/test_pipe:
In [1]: import io
In [2]: import os
In [3]: nonblockingOpener = lambda name, flags:os.open(name, flags|os.O_NONBLOCK)
In [4]: reader = io.open('/tmp/test_pipe', 'r', opener = nonblockingOpener)
In [5]: writer = io.open('/tmp/test_pipe', 'w')
In [6]: writer.write('Hi have a line\n')
In [7]: writer.flush()
In [8]: reader.readline()
Out[8]: 'You said: Hi have a line\n'
In [9]: reader.readline()
Out[9]: ''
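How the echo service behind that session was implemented isn't shown; one illustrative stand-in, assuming a pty slave (which is a genuine character device) symlinked to /tmp/test_pipe:

import os, pty, tty

master, slave = pty.openpty()
tty.setraw(slave)                        # no echo, no newline translation
if os.path.lexists('/tmp/test_pipe'):
    os.unlink('/tmp/test_pipe')
os.symlink(os.ttyname(slave), '/tmp/test_pipe')

while True:
    data = os.read(master, 1024)         # whatever a client wrote to the device
    os.write(master, b'You said: ' + data)   # becomes input for client readers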