dup2 and pipe shenanigans with Python and Windows - python

Say hello to bug.py:
import os, sys
stdout2 = os.dup(sys.stdout.fileno())
(r,w) = os.pipe()
os.dup2(w,sys.stdout.fileno())
print("Hail Stan")
z = os.read(r,1000)
os.write(stdout2,z)
If you run this on OSX (and I imagine, on Linux), this works great. However, in Windows, we get this:
PS Z:\...> python --version
Python 3.9.2
PS Z:\...> python bug.py
Traceback (most recent call last):
File "Z:\...\bug.py", line 6, in <module>
print("Hail Stan")
OSError: [WinError 1] Incorrect function
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
OSError: [WinError 1] Incorrect function
I don't know much about anything but this smells of some deep Python+Windows mysticism and PDB isn't cutting cheese so I can't debug my way out of this. Anyone knows how to make this work?
The slightly bigger context is that I'm trying to build some tee-like functionality into my application, but all other methods I've found of capturing stdout are incomplete. You can use popen and subprocess for various things, but if you truly want to grab your whole process's stdout and stderr, at least on Unix, the only way I found is to dup2 the filedescriptors through pipes.

I don't know if I've truly got to the bottomest bottom but I've mostly figured this one out.
sys.stdout is an object of type TextIOWrapper, whose sys.stdout.write() method eventually calls either os.write() or the C equivalent, on the filedescriptor sys.stdout.fileno(), which is 1. There are several types of filedescriptors: files, sockets, serial ports, terminals, pipes, etc... The C library functions write(), close(), etc... work on mostly any type of filedescriptor, but some features only work when the filedescriptor is of a suitable type. At creation time, the TextIOWrapper that is sys.stdout examines its filedescriptor (1) and determines that it's a terminal. On Windows only, sys.stdout.write() ends up doing some operations that are only valid if the filedescriptor 1 is truly a terminal. Once you os.dup2() so that 1 becomes an os.pipe(), and not a terminal, the Windows Python implementation crashes when it attempts to do terminal-specific operations on a pipe.
It's not clear to me that there's a way of causing sys.stdout to re-examine its filedescriptor so that it notices that it's not a terminal anymore, and avoid crashing the Python interpreter.
As a workaround, on Windows only, I'm doing stdout.write = lambda z: os.write(stdout.fileno(),z.encode() if hasattr(z,'encode') else z)
I haven't done a deep dive through the Python code base to see whether this is sufficient, but it does appear to allow my programs to run correctly, and print() no longer causes the Python interpreter to crash.
Many other people instead do sys.stdout = ..., but I did not want to do this for many reasons. For one, I was worried that some script or module might store a local cached copy of sys.stdout. Although it is possible to cache sys.stdout.write directly, I thought that this was less likely. Furthermore, it allows me to monkeypatch in my constructor, e.g. StreamCapture(sys.stdout), without having to refer to the global variable sys.stdout directly in my constructor (a "side-effect"). If you believe that a function is allowed to mutate its inputs, then this is side-effect-free.

Related

Use pipe() and fdopen() to pass data from Python script to C++ application in Windows

We have some Linux/macOS application, which can communicate with outer world by passing file descriptor and reading data from it. Usually this is done to pass stdin/stdout descriptors, however we use pipe() and this works pretty well. Except MinGW/Windows.
What would be the recommended way of doing the same job under the Windows? Pass the whole file handle, or there are good ways to simulate small-int-like descriptor?
In Windows, C file descriptors are inherited in the process STARTUPINFO record in the reserved fields cbReserved2 and lpReserved2. The protocol is undocumented, but the source is distributed with Visual C++. C functions that use this feature include the _[w]spawn family of functions and [_w]system. (The [_w]system function, however, is not generally useful in this regard because only the immediate cmd.exe process inherits the parent's file descriptors. CMD does not pass them on to child processes.)
In Python 2.7, os.pipe is implemented by calling CreatePipe, which returns non-inheritable file handles for the read and write ends of the pipe. These handles are then manually wrapped with inheritable file descriptors via _open_osfhandle, but the underlying OS handle is still non-inheritable. To work around this, duplicate the file descriptor via os.dup, which internally duplicates an inheritable file handle, and then close the source file descriptor. For example:
pread, pwrite = os.pipe()
pwrite_child = os.dup(pwrite)
os.close(pwrite)
Python's subprocess module is usually the preferred way to create a child process. However, we can't use subprocess in this case because it doesn't support inheriting file descriptors via STARTUPINFO (*). Here's an example that uses os.spawnv instead:
rc = os.spawnv(os.P_WAIT, 'path/to/spam.exe', ['spam', 'arg1', 'arg2'])
(*) It's an awkward situation that Windows Python internally uses the C runtime file API (e.g. _wopen, _read, _write), but in places fails to support C file descriptors. It should bite the bullet and use the Windows file API directly with OS file handles, then it would at least be consistent.

Python on Windows "Handle Invalid" when redirecting stdout writing to file

A script I am trying to fix uses the following paradigm for redirecting stdout to a file.
import os
stdio_file = 'temp.out'
flag = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
stdio_fp = os.open(stdio_file, flag)
os.dup2(stdio_fp, 1)
print("hello")
On Python 2, this works. On Python 3, you get an OSError
Traceback (most recent call last):
File "test.py", line 6, in <module>
print("hello")
OSError: [WinError 6] The handle is invalid
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
OSError: [WinError 6] The handle is invalid
I assume there are more preferable methods to routing stdout through a file but I am wondering why this method stopped working in Python 3 and if there is an easy way to fix it?
Code such as os.dup2(stdio_fp, 1) will work in Python 3.5 and earlier, or in 3.6+ with the environment variable PYTHONLEGACYWINDOWSSTDIO defined.
The issue is that print writes to a sys.stdout object that's only meant for console I/O. Specifically, in 3.6+ the raw layer of Python 3's standard output file (i.e. sys.stdout.buffer.raw) is an io._WindowsConsoleIO instance when stdout is initially a console file1. This object caches the initial handle value of the stdout file descriptor2. Subsequently, dup2 closes this handle while re-associating the file descriptor with a duplicate handle for "temp.out". At this point the cached handle is no longer valid. (Really, it shouldn't be caching the handle, since calling _get_osfhandle is relatively cheap compared to the cost of console I/O.) However, even if it had a valid handle for "temp.out", sys.stdout.write would fail anyway since _WindowsConsoleIO uses the console-only function WriteConsoleW instead of generic WriteFile.
You need to reassign sys.stdout instead of bypassing Python's I/O stack with low-level operations such as dup2. I know it's not ideal from the Unix developer's point of view. I wish we could re-implement the way Unicode is supported for the Windows console without introducing this console-only _WindowsConsoleIO class, which disrupts low-level patterns that people have relied on for decades.
1. _WindowsConsoleIO was added to support the full range of Unicode in the Windows console (at least as well as the console can support it). For this it uses the console's UTF-16 wide-character API (e.g. ReadConsoleW and WriteConsoleW). Previously CPython's console support was limited to text that was encoded with Windows codepages, using generic byte-based I/O (e.g. ReadFile and WriteFile).
2. Windows uses handles to reference kernel objects such as File objects. This system isn't compatible in behavior with POSIX file descriptors (FDs). The C runtime (CRT) thus has a "low I/O" compatibility layer that associates POSIX-style FDs with Windows file handles, and it also implements POSIX I/O functions such as open and write. The CRT's _open_osfhandle function associates a native file handle with an FD, and _get_osfhandle returns the handle associated with an FD. Sometimes CPython uses the CRT low I/O layer, and sometimes it uses the Windows API directly. It's really kind of a mess, if you ask me.

what is the os.close(3) for?

What is the os.close(3) for?
I am reading the python cookbook 2nd chapter 2.9, it explains how the python zip file work. There is one snippet of code in it I don't really got it.
import zipfile, tempfile, os, sys
handle, filename = tempfile.mkstemp('.zip')
os.close(handle) # <- handle is int 3 here
z = zipfile.ZipFile(filename, 'w')
z.writestr('hello.py', 'def f(): return "hello world from "+__file__\n')
z.close()
sys.path.insert(0, filename)
import hello
print hello.f()
os.unlink(filename)
os.close() explaination in python docs:
This function is intended for low-level I/O and must be applied to a file descriptor as returned by os.open() or pipe(). To close a “file object” returned by the built-in function open() or by popen() or fdopen(), use its close() method.
The file descriptor in linux from 0,1 & 2 are stdin, stdout & stderror, I don't get what the fd 3 for? Even though I have read this "What is the file descriptor 3 assigned by default? ".
I comment the os.close(handle) out, but the output make no different.
Even though Python mostly deals in "file objects", these are an abstraction around OS-level file handles; when actually reading or writing content to a file (or network stream, or other file-like object) at the operating system level, one passes the OS the handle number associated with the file with which one wants to interact. Thus, every file object in Python that's actually backed by an OS-level file handle has such a file descriptor number associated.
File handles are stored in a table, each associated with an integer. On Linux, you can look at the directory /proc/self/fds (substituting a PID number for self to look at a different process) to see which handles have which numbers for a given process.
handle, filename = tempfile.mkstemp('.zip'); os.close(handle), thus, closes the OS-level file handle which was returned to you by mkstemp.
By the way: It's important to note that there's absolutely nothing special about the number 3, and that there is no default or conventional behavior for same implemented at the operating system level; it just happened to be the next available position in the file handle table when mkstemp was called (or, to be more precise, when the C standard library implementation of mkstemp called the OS-level syscall open).
You are getting file descriptor 3 because in this case it is the next available file descriptor. As you mentioned, stdin (0), stdout (1) and stderr (2) are automatically opened for you. The link you cited ( https://unix.stackexchange.com/questions/41421/what-is-the-file-descriptor-3-assigned-by-default ) points this out, too.

Deciphering large program flow in Python

I'm in the process of learning how a large (356-file), convoluted Python program is set up. Besides manually reading through and parsing the code, are there any good methods for following program flow?
There are two methods which I think would be useful:
Something similar to Bash's "set -x"
Something that displays which file outputs each line of output
Are there any methods to do the above, or any other ways that you have found useful?
I don't know if this is actually a good idea, but since I actually wrote a hook to display the file and line before each line of output to stdout, I might as well give it to you…
import inspect, sys
class WrapStdout(object):
_stdout = sys.stdout
def write(self, buf):
frame = sys._getframe(1)
try:
f = inspect.getsourcefile(frame)
except TypeError:
f = 'unknown'
l = frame.f_lineno
self._stdout.write('{}:{}:{}'.format(f, l, buf))
def flush(self):
self._stdout.flush()
sys.stdout = WrapStdout()
Just save that as a module, and after you import it, every chunk of stdout will be prefixed with file and line number.
Of course this will get pretty ugly if:
Anyone tries to print partial lines (using stdout.write directly, or print magic comma in 2.x, or end='' in 3.x).
You mix Unicode and non-Unicode in 2.x.
Any of the source files have long pathnames.
etc.
But all the tricky deep-Python-magic bits are there; you can build on top of it pretty easily.
Could be very tedious, but using a debugger to trace the flow of execution, instruction by instruction could probably help you to some extent.
import pdb
pdb.set_trace()
You could look for a cross reference program. There is an old program called pyxr that does this. The aim of cross reference is to let you know how classes refer to each other. Some of the IDE's also do this sort of thing.
I'd recommend running the program inside an IDE like pydev or pycharm. Being able to stop the program and inspect its state can be very helpful.

Using python subprocess to fake running a cmd from a terminal

We have a vendor-supplied python tool ( that's byte-compiled, we don't have the source ). Because of this, we're also locked into using the vendor supplied python 2.4. The way to the util is:
source login.sh
oupload [options]
The login.sh just sets a few env variables, and then 2 aliases:
odownload () {
${PYTHON_CMD} ${OCLIPATH}/ocli/commands/word_download_command.pyc "$#"
}
oupload () {
${PYTHON_CMD} ${OCLIPATH}/ocli/commands/word_upload_command.pyc "$#"
}
Now, when I run it their way - works fine. It will prompt for a username and password, then do it's thing.
I'm trying to create a wrapper around the tool to do some extra steps after it's run and provide some sane defaults for the utility. The problem I'm running into is I cannot, for the life of me, figure out how to use subprocess to successfully do this. It seems to realize that the original command isn't running directly from the terminal and bails.
I created a '/usr/local/bin/oupload' and copied from the original login.sh. Only difference is instead of doing an alias at the end, I actually run the command.
Then, in my python script, I try to run my new shell script:
if os.path.exists(options.zipfile):
try:
cmd = string.join(cmdargs,' ')
p1 = Popen(cmd, shell=True, stdin=PIPE)
But I get:
Enter Opsware Username: Traceback (most recent call last):
File "./command.py", line 31, in main
File "./controller.py", line 51, in handle
File "./controllers/word_upload_controller.py", line 81, in _handle
File "./controller.py", line 66, in _determineNew
File "./lib/util.py", line 83, in determineNew
File "./lib/util.py", line 112, in getAuth
Empty Username not legal
Unknown Error Encountered
SUMMARY:
Name: Empty Username not legal
Description: None
So it seemed like an extra carriage return was getting sent ( I tried rstripping all the options, didn't help ).
If I don't set stdin=PIPE, I get:
Enter Opsware Username: Traceback (most recent call last):
File "./command.py", line 31, in main
File "./controller.py", line 51, in handle
File "./controllers/word_upload_controller.py", line 81, in _handle
File "./controller.py", line 66, in _determineNew
File "./lib/util.py", line 83, in determineNew
File "./lib/util.py", line 109, in getAuth
IOError: [Errno 5] Input/output error
Unknown Error Encountered
I've tried other variations of using p1.communicate, p1.stdin.write() along with shell=False and shell=True, but I've had no luck in trying to figure out how to properly send along the username and password. As a last result, I tried looking at the byte code for the utility they provided - it didn't help - once I called the util's main routine with the proper arguments, it ended up core dumping w/ thread errors.
Final thoughts - the utility doesn't want to seem to 'wait' for any input. When run from the shell, it pauses at the 'Username' prompt. When run through python's popen, it just blazes thru and ends, assuming no password was given. I tried to lookup ways of maybe preloading the stdin buffer - thinking maybe the process would read from that if it was available, but couldn't figure out if that was possible.
I'm trying to stay away from using pexpect, mainly because we have to use the vendor's provided python 2.4 because of the precompiled libraries they provide and I'm trying to keep distribution of the script to as minimal a footprint as possible - if I have to, I have to, but I'd rather not use it ( and I honestly have no idea if it works in this situation either ).
Any thoughts on what else I could try would be most appreciated.
UPDATE
So I solved this by diving further into the bytecode and figuring out what I was missing from the compiled command.
However, this presented two problems -
The vendor code, when called, was doing an exit when it completed
The vendor code was writing to stdout, which I needed to store and operate on ( it contains the ID of the uploaded pkg ). I couldn't just redirect stdout, because the vendor code was still asking for the username/password.
1 was solved easy enough by wrapping their code in a try/except clause.
2 was solved by doing something similar to: https://stackoverflow.com/a/616672/677373
Instead of a log file, I used cStringIO. I also had to implement a fake 'flush' method, since it seems the vendor code was calling that and complaining that the new obj I had provided for stdout didn't supply it - code ends up looking like:
class Logger(object):
def __init__(self):
self.terminal = sys.stdout
self.log = StringIO()
def write(self, message):
self.terminal.write(message)
self.log.write(message)
def flush(self):
self.terminal.flush()
self.log.flush()
if os.path.exists(options.zipfile):
try:
os.environ['OCLI_CODESET'] = 'ISO-8859-1'
backup = sys.stdout
sys.stdout = output = Logger()
# UploadCommand was the command found in the bytecode
upload = UploadCommand()
try:
upload.main(cmdargs)
except Exception, rc:
pass
sys.stdout = backup
# now do some fancy stuff with output from output.log
I should note that the only reason I simply do a 'pass' in the except: clause is that the except clause is always called. The 'rc' is actually the return code from the command, so I will probably add handling for non-zero cases.
I tried to lookup ways of maybe preloading the stdin buffer
Do you perhaps want to create a named fifo, fill it with username/password info, then reopen it in read mode and pass it to popen (as in popen(..., stdin=myfilledbuffer))?
You could also just create an ordinary temporary file, write the data to it, and reopen it in read mode, again, passing the reopened handle as stdin. (This is something I'd personally avoid doing, since writing username/passwords to temporary files is often of the bad. OTOH it's easier to test than FIFOs)
As for the underlying cause: I suspect that the offending software is reading from stdin via a non-blocking method. Not sure why that works when connected to a terminal.
AAAANYWAY: no need to use pipes directly via Popen at all, right? I kinda laugh at the hackishness of this, but I'll bet it'll work for you:
# you don't actually seem to need popen here IMO -- call() does better for this application.
statuscode = call('echo "%s\n%s\n" | oupload %s' % (username, password, options) , shell=True)
tested with status = call('echo "foo\nbar\nbar\nbaz" |wc -l', shell = True) (output is '4', naturally.)
The original question was solved by just avoiding the issue and not using the terminal and instead importing the python code that was being called by the shell script and just using that.
I believe J.F. Sebastian's answer would probably work better for what was originally asked, however, so I'd suggest people looking for an answer to a similar question look down the path of using the pty module.

Categories

Resources