I was trying to debug an issue with abc.ABCMeta - in particular a subclass check that didn't work as expected - and I wanted to start by simply adding a print to the __subclasscheck__ method (I know there are better ways to debug code, but pretend for the sake of this question that there's no alternative). However, when starting Python afterwards, it crashes (like a segmentation fault) and I get this exception:
Fatal Python error: Py_Initialize: can't initialize sys standard streams
Traceback (most recent call last):
File "C:\...\lib\io.py", line 84, in <module>
File "C:\...\lib\abc.py", line 158, in register
File "C:\...\lib\abc.py", line 196, in __subclasscheck__
RuntimeError: lost sys.stdout
So it probably wasn't a good idea to put the print in there. But where exactly does the exception come from? I only changed Python code, and that shouldn't crash, right?
Does someone know where this exception is coming from and if/how I can avoid it but still put a print in the abc.ABCMeta.__subclasscheck__ method?
I'm using Windows 10 and Python 3.5 (just in case it might be important).
This exception stems from the fact that CPython imports io, and, indirectly, abc.py during the initialization of the standard streams:
if (!(iomod = PyImport_ImportModule("io"))) {
    goto error;
}
io imports the abc module and registers FileIO as a virtual subclass of RawIOBase, a couple of other classes for BufferedIOBase and others for TextIOBase. ABCMeta.register invokes __subclasscheck__ in the process.
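To see why register triggers the check, here's a rough sketch of the relevant shape of ABCMeta.register (not the actual abc.py source, just an illustration):

class ABCMeta(type):
    def register(cls, subclass):
        # register() first asks whether subclass already is one;
        # issubclass() dispatches to cls.__subclasscheck__, so an
        # unguarded print() in __subclasscheck__ runs right here,
        # before sys.stdout exists
        if issubclass(subclass, cls):
            return subclass  # already a subclass, nothing to do
        cls._abc_registry.add(subclass)
        return subclass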
As you understand, using print in __subclasscheck__ when sys.stdout isn't set up yet is a big no-no; the initialization fails and you get back your error:
if (initstdio() < 0)
    Py_FatalError(
        "Py_NewInterpreter: can't initialize sys standard streams");
You can get around it by guarding the print with hasattr(sys, 'stdout'): sys has been initialized by this point, while stdout hasn't (and, as such, won't exist in sys during the early initialization phase):
import sys  # add at the top of abc.py if it isn't imported already

if hasattr(sys, 'stdout'):
    print("Debug")
You should get a good amount of output when firing Python up now.
Related
Say hello to bug.py:
import os, sys

# keep a duplicate of the real stdout so we can still write to it later
stdout2 = os.dup(sys.stdout.fileno())
# redirect fd 1 into a pipe
(r, w) = os.pipe()
os.dup2(w, sys.stdout.fileno())
print("Hail Stan")
# read back what the pipe captured and forward it to the real stdout
z = os.read(r, 1000)
os.write(stdout2, z)
If you run this on OSX (and, I imagine, on Linux), it works great. However, on Windows, we get this:
PS Z:\...> python --version
Python 3.9.2
PS Z:\...> python bug.py
Traceback (most recent call last):
File "Z:\...\bug.py", line 6, in <module>
print("Hail Stan")
OSError: [WinError 1] Incorrect function
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
OSError: [WinError 1] Incorrect function
I don't know much about anything, but this smells of some deep Python+Windows mysticism, and pdb isn't cutting it, so I can't debug my way out of this. Does anyone know how to make this work?
The slightly bigger context is that I'm trying to build some tee-like functionality into my application, but all the other methods I've found of capturing stdout are incomplete. You can use popen and subprocess for various things, but if you truly want to grab your whole process's stdout and stderr, at least on Unix, the only way I've found is to dup2 the file descriptors through pipes.
I don't know if I've truly got to the bottomest bottom but I've mostly figured this one out.
sys.stdout is an object of type TextIOWrapper, whose sys.stdout.write() method eventually calls either os.write() or the C equivalent on the file descriptor sys.stdout.fileno(), which is 1. There are several types of file descriptors: files, sockets, serial ports, terminals, pipes, etc. The C library functions write(), close(), etc. work on almost any type of file descriptor, but some features only work when the file descriptor is of a suitable type. At creation time, the TextIOWrapper that is sys.stdout examines its file descriptor (1) and determines that it's a terminal. On Windows only, sys.stdout.write() ends up doing some operations that are only valid if file descriptor 1 truly is a terminal. Once you os.dup2() so that 1 becomes an os.pipe(), and not a terminal, the Windows Python implementation crashes when it attempts to do terminal-specific operations on a pipe.
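You can see the console-specific raw object for yourself (the class name is an implementation detail, so treat this as illustrative only):

import sys

# on a Windows console this prints <class '_io._WindowsConsoleIO'>;
# on Unix, or with stdout redirected, it's <class '_io.FileIO'>
print(type(sys.stdout.buffer.raw))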
It's not clear to me that there's a way of causing sys.stdout to re-examine its filedescriptor so that it notices that it's not a terminal anymore, and avoid crashing the Python interpreter.
As a workaround, on Windows only, I'm doing:

stdout.write = lambda z: os.write(stdout.fileno(), z.encode() if hasattr(z, 'encode') else z)
I haven't done a deep dive through the Python code base to see whether this is sufficient, but it does appear to allow my programs to run correctly, and print() no longer causes the Python interpreter to crash.
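Spelled out, that monkeypatch looks something like this (the explicit win32 check is just one way to apply it on Windows only; treat it as a sketch rather than a vetted implementation):

import os
import sys

if sys.platform == 'win32':
    # bypass TextIOWrapper.write() entirely and write straight to the
    # file descriptor; os.write() only accepts bytes, so encode str first
    sys.stdout.write = lambda z: os.write(
        sys.stdout.fileno(), z.encode() if hasattr(z, 'encode') else z)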
Many other people instead do sys.stdout = ..., but I did not want to do this for many reasons. For one, I was worried that some script or module might store a local cached copy of sys.stdout. Although it is possible to cache sys.stdout.write directly, I thought that this was less likely. Furthermore, it allows me to monkeypatch in my constructor, e.g. StreamCapture(sys.stdout), without having to refer to the global variable sys.stdout directly in my constructor (a "side-effect"). If you believe that a function is allowed to mutate its inputs, then this is side-effect-free.
Background
Consider the following minimal example. When I save this script and run it from a terminal,
import time
time.sleep(5)
raise Exception
the code raises an error after sleeping for five seconds, leaving the following traceback:
Traceback (most recent call last):
File "test/minimal_error.py", line 4, in <module>
raise Exception
Exception
Now, say, I run the script, and during the 5-second sleep I add a line in the middle:
import time
time.sleep(5)
a = 1
raise Exception
After the Python interpreter wakes up from the sleep and reaches the next statement, raise Exception, it raises the error, but it leaves the following traceback:
Traceback (most recent call last):
File "test/minimal_error.py", line 4, in <module>
a = 1
Exception
So the obvious problem is that it doesn't print the actual code that caused the error. Although it gives the correct line number (correct for the version of the script that is running, which is understandably useless once the file has changed) and a proper error message, I can't really know what piece of code actually caused the error.
In real practice, I implement one part of a program, run it to see if that part is doing fine, and while it is still running, I move on to the next thing I have to implement. And when the script throws an error, I have to find which actual line of code caused it. I usually just read the error message and try to deduce the original code that caused it. Sometimes it isn't easy to guess, so I copy the script to the clipboard and roll back the code by undoing what I've written since running the script, check the line that caused the error, and paste back from the clipboard.
Question
Is there any understandable reason why the interpreter shows a = 1, which is line 4 of the "current" version of the code, instead of raise Exception, which is line 4 of the "running" version of the code? If the interpreter knows "line 4" caused the error and the error message is "Exception", why can't it say the command raise Exception raised it?
I'm not really sure if this question is on-topic here, but I don't think I can conclude it off-topic from what the help center says. It is about "[a] software [tool] commonly used by programmers" (the Python interpreter) and is "a practical, answerable problem that is unique to software development," I think. I don't think it's opinion-based, because there should be a reason for this choice of implementation.
(Observed the same in Python 2.7.16, 3.6.8, 3.7.2, and 3.7.3, so it doesn't seem to be version-specific, but a thing that just happens in Python.)
The immediate reason is that Python re-opens the file and reads the specified line again to print it in error messages. So why would it need to do that when it already read the file in the beginning? Because it doesn't keep the source code in memory, only the generated byte code.
In fact, Python will never hold the entire contents of the source file in memory at one time. Instead the lexer will read from the file and produce one token at a time, which the parser then parses and turns into byte code. Once the parser is done with a token, it's gone.
So the only way to get back at the original source code is to open the source file again.
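You can watch this happen with the linecache module, which is what the traceback machinery uses to fetch source lines. A small self-contained demonstration:

import linecache
import traceback

# write a throwaway script and run it
with open('demo.py', 'w') as f:
    f.write('raise Exception\n')

try:
    exec(compile(open('demo.py').read(), 'demo.py', 'exec'))
except Exception:
    # edit the file before the traceback is printed
    with open('demo.py', 'w') as f:
        f.write('a = 1  # edited after compilation\n')
    linecache.checkcache('demo.py')  # drop any stale cached copy
    traceback.print_exc()  # shows the edited line, not the original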
I think it's a classic problem, which is described here.
sleep uses an OS system call to pause the execution of that thread.
A script I am trying to fix uses the following paradigm for redirecting stdout to a file.
import os
stdio_file = 'temp.out'
flag = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
stdio_fp = os.open(stdio_file, flag)
os.dup2(stdio_fp, 1)
print("hello")
On Python 2, this works. On Python 3, you get an OSError:
Traceback (most recent call last):
File "test.py", line 6, in <module>
print("hello")
OSError: [WinError 6] The handle is invalid
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
OSError: [WinError 6] The handle is invalid
I assume there are preferable methods for routing stdout through a file, but I am wondering why this method stopped working in Python 3 and whether there is an easy way to fix it.
Code such as os.dup2(stdio_fp, 1) will work in Python 3.5 and earlier, or in 3.6+ with the environment variable PYTHONLEGACYWINDOWSSTDIO defined.
The issue is that print writes to a sys.stdout object that's only meant for console I/O. Specifically, in 3.6+ the raw layer of Python 3's standard output file (i.e. sys.stdout.buffer.raw) is an io._WindowsConsoleIO instance when stdout is initially a console file [1]. This object caches the initial handle value of the stdout file descriptor [2]. Subsequently, dup2 closes this handle while re-associating the file descriptor with a duplicate handle for "temp.out". At this point the cached handle is no longer valid. (Really, it shouldn't be caching the handle, since calling _get_osfhandle is relatively cheap compared to the cost of console I/O.) However, even if it had a valid handle for "temp.out", sys.stdout.write would fail anyway since _WindowsConsoleIO uses the console-only function WriteConsoleW instead of generic WriteFile.
You need to reassign sys.stdout instead of bypassing Python's I/O stack with low-level operations such as dup2. I know it's not ideal from the Unix developer's point of view. I wish we could re-implement the way Unicode is supported for the Windows console without introducing this console-only _WindowsConsoleIO class, which disrupts low-level patterns that people have relied on for decades.
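In code, the suggested reassignment looks something like this (the file name is arbitrary):

import sys

# reassign the high-level stream object instead of re-pointing fd 1
sys.stdout = open('temp.out', 'w')
print("hello")       # now lands in temp.out
sys.stdout.flush()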
[1] _WindowsConsoleIO was added to support the full range of Unicode in the Windows console (at least as well as the console can support it). For this it uses the console's UTF-16 wide-character API (e.g. ReadConsoleW and WriteConsoleW). Previously CPython's console support was limited to text that was encoded with Windows codepages, using generic byte-based I/O (e.g. ReadFile and WriteFile).
[2] Windows uses handles to reference kernel objects such as File objects. This system isn't compatible in behavior with POSIX file descriptors (FDs). The C runtime (CRT) thus has a "low I/O" compatibility layer that associates POSIX-style FDs with Windows file handles, and it also implements POSIX I/O functions such as open and write. The CRT's _open_osfhandle function associates a native file handle with an FD, and _get_osfhandle returns the handle associated with an FD. Sometimes CPython uses the CRT low I/O layer, and sometimes it uses the Windows API directly. It's really kind of a mess, if you ask me.
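The FD-to-handle mapping is visible from Python via the msvcrt module (Windows only):

import msvcrt

# the native Windows handle currently behind file descriptor 1;
# after os.dup2(), this value changes, but _WindowsConsoleIO keeps
# using the old cached one
print(msvcrt.get_osfhandle(1))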
We have a vendor-supplied Python tool (it's byte-compiled; we don't have the source). Because of this, we're also locked into using the vendor-supplied Python 2.4. The way to run the util is:
source login.sh
oupload [options]
The login.sh just sets a few env variables and then defines two shell functions:
odownload () {
    ${PYTHON_CMD} ${OCLIPATH}/ocli/commands/word_download_command.pyc "$@"
}

oupload () {
    ${PYTHON_CMD} ${OCLIPATH}/ocli/commands/word_upload_command.pyc "$@"
}
Now, when I run it their way - works fine. It will prompt for a username and password, then do its thing.
I'm trying to create a wrapper around the tool to do some extra steps after it's run and provide some sane defaults for the utility. The problem I'm running into is I cannot, for the life of me, figure out how to use subprocess to successfully do this. It seems to realize that the original command isn't running directly from the terminal and bails.
I created a '/usr/local/bin/oupload' and copied from the original login.sh. Only difference is instead of doing an alias at the end, I actually run the command.
Then, in my python script, I try to run my new shell script:
if os.path.exists(options.zipfile):
    try:
        cmd = string.join(cmdargs, ' ')
        p1 = Popen(cmd, shell=True, stdin=PIPE)
But I get:
Enter Opsware Username: Traceback (most recent call last):
File "./command.py", line 31, in main
File "./controller.py", line 51, in handle
File "./controllers/word_upload_controller.py", line 81, in _handle
File "./controller.py", line 66, in _determineNew
File "./lib/util.py", line 83, in determineNew
File "./lib/util.py", line 112, in getAuth
Empty Username not legal
Unknown Error Encountered
SUMMARY:
Name: Empty Username not legal
Description: None
So it seemed like an extra carriage return was getting sent (I tried rstripping all the options; that didn't help).
If I don't set stdin=PIPE, I get:
Enter Opsware Username: Traceback (most recent call last):
File "./command.py", line 31, in main
File "./controller.py", line 51, in handle
File "./controllers/word_upload_controller.py", line 81, in _handle
File "./controller.py", line 66, in _determineNew
File "./lib/util.py", line 83, in determineNew
File "./lib/util.py", line 109, in getAuth
IOError: [Errno 5] Input/output error
Unknown Error Encountered
I've tried other variations of using p1.communicate and p1.stdin.write(), along with shell=False and shell=True, but I've had no luck figuring out how to properly send along the username and password. As a last resort, I tried looking at the byte code for the utility they provided - it didn't help - once I called the util's main routine with the proper arguments, it ended up core dumping with thread errors.
Final thoughts - the utility doesn't seem to want to 'wait' for any input. When run from the shell, it pauses at the 'Username' prompt. When run through Python's Popen, it just blazes through and ends, assuming no password was given. I tried to look up ways of maybe preloading the stdin buffer - thinking maybe the process would read from that if it were available - but couldn't figure out if that was possible.
I'm trying to stay away from using pexpect, mainly because we have to use the vendor's provided python 2.4 because of the precompiled libraries they provide and I'm trying to keep distribution of the script to as minimal a footprint as possible - if I have to, I have to, but I'd rather not use it ( and I honestly have no idea if it works in this situation either ).
Any thoughts on what else I could try would be most appreciated.
UPDATE
So I solved this by diving further into the bytecode and figuring out what I was missing from the compiled command.
However, this presented two problems:

1. The vendor code, when called, was doing an exit when it completed.
2. The vendor code was writing to stdout, which I needed to store and operate on (it contains the ID of the uploaded pkg). I couldn't just redirect stdout, because the vendor code was still asking for the username/password.

Problem 1 was solved easily enough by wrapping their code in a try/except clause.
Problem 2 was solved by doing something similar to: https://stackoverflow.com/a/616672/677373
Instead of a log file, I used cStringIO. I also had to implement a fake 'flush' method, since it seems the vendor code was calling it and complaining that the new object I had provided for stdout didn't supply one. The code ends up looking like:
import sys
from cStringIO import StringIO

class Logger(object):
    def __init__(self):
        self.terminal = sys.stdout
        self.log = StringIO()

    def write(self, message):
        # write to the real stdout and keep a copy for later inspection
        self.terminal.write(message)
        self.log.write(message)

    def flush(self):
        self.terminal.flush()
        self.log.flush()
if os.path.exists(options.zipfile):
    try:
        os.environ['OCLI_CODESET'] = 'ISO-8859-1'
        backup = sys.stdout
        sys.stdout = output = Logger()
        # UploadCommand was the command found in the bytecode
        upload = UploadCommand()
        try:
            upload.main(cmdargs)
        except Exception, rc:
            pass
        sys.stdout = backup
        # now do some fancy stuff with output from output.log
I should note that the only reason I simply do a 'pass' in the except: clause is that the except clause is always called. The 'rc' is actually the return code from the command, so I will probably add handling for non-zero cases.
I tried to lookup ways of maybe preloading the stdin buffer
Do you perhaps want to create a named fifo, fill it with username/password info, then reopen it in read mode and pass it to popen (as in popen(..., stdin=myfilledbuffer))?
You could also just create an ordinary temporary file, write the data to it, and reopen it in read mode, again, passing the reopened handle as stdin. (This is something I'd personally avoid doing, since writing username/passwords to temporary files is often of the bad. OTOH it's easier to test than FIFOs)
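A rough sketch of the temporary-file variant (the tool name, option, and credentials are placeholders):

import os
import tempfile
from subprocess import call

# write the two prompt answers to a temp file, then hand it to the
# child process as its stdin
fd, path = tempfile.mkstemp()
os.write(fd, 'myuser\nmypassword\n')
os.close(fd)

stdin_file = open(path, 'r')
status = call('oupload --some-option', shell=True, stdin=stdin_file)
stdin_file.close()
os.remove(path)  # don't leave credentials lying around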
As for the underlying cause: I suspect that the offending software is reading from stdin via a non-blocking method. Not sure why that works when connected to a terminal.
AAAANYWAY: no need to use pipes directly via Popen at all, right? I kinda laugh at the hackishness of this, but I'll bet it'll work for you:
# you don't actually seem to need popen here IMO -- call() does better for this application.
statuscode = call('echo "%s\n%s\n" | oupload %s' % (username, password, options), shell=True)
tested with status = call('echo "foo\nbar\nbar\nbaz" |wc -l', shell = True) (output is '4', naturally.)
The original question was solved by just avoiding the issue: instead of going through the terminal, I imported the Python code that the shell script was calling and used that directly.
I believe J.F. Sebastian's answer would probably work better for what was originally asked, however, so I'd suggest people looking for an answer to a similar question look down the path of using the pty module.
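For the curious, the pty direction looks roughly like this (Unix only; the command and option are placeholders):

import pty

# pty.spawn() connects the child to a pseudo-terminal, so a tool that
# insists on reading its prompts from a real terminal behaves normally
pty.spawn(['oupload', '--some-option'])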
I want to disallow access to the file system from client code, so I thought I could override the open function:
from StringIO import StringIO  # Python 2

env = {
    'open': lambda *a: StringIO("you can't use open")
}
exec(open('user_code.py').read(), env)
but I got this
unqualified exec is not allowed in function 'my function' it contains a
nested function with free variables
I also tried:
def open_exception(*a):
    raise Exception("you can't use open")

env = {
    'open': open_exception
}
but got the same Exception (not "you can't use open")
I want to prevent executing this:
"""def foo():
return open('some_file').read()
print foo()"""
and evaluating this:
"open('some_file').write('some text')"
I also use a session to store code that was evaluated previously, so I need to prevent executing this:
"""def foo(s):
return open(s)"""
and then evaluating this
"foo('some').write('some text')"
I can't use a regex, because someone could hide the call inside a string and eval it:

"eval(\"opxx('some file').write('some text')\".replace('xx', 'en'))"
Is there any way to prevent access to file system inside exec/eval? (I need both)
There's no way to prevent access to the file system inside exec/eval. Here's an example that demonstrates a way for user code to call otherwise restricted classes; it always works:
import subprocess

code = """[x for x in ().__class__.__bases__[0].__subclasses__()
           if x.__name__ == 'Popen'][0](['ls', '-la']).wait()"""

# Executing the `code` will always run `ls`...
exec code in dict(__builtins__=None)
And don't think about filtering the input, especially with regex.
You might consider a few alternatives:
ast.literal_eval, if you can limit yourself to simple expressions only (a small sketch follows this list)
Using another language for user code. You might look at Lua or JavaScript - both are sometimes used to run unsafe code inside sandboxes.
There's the pysandbox project, though I can't guarantee you that the sandboxed code is really safe. Python wasn't designed to be sandboxed, and in particular the CPython implementation wasn't written with sandboxing in mind. Even the author seems to doubt the possibility to implement such sandbox safely.
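For illustration, here's roughly how the ast.literal_eval route behaves (a sketch, not a full sandbox):

import ast

print ast.literal_eval("[1, 2, {'a': 3}]")  # simple literals are fine

# anything involving a call, attribute access, etc. is rejected outright
try:
    ast.literal_eval("open('some_file').write('some text')")
except (ValueError, SyntaxError):
    print "rejected"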
You can't turn exec() and eval() into a safe sandbox. You can always get access to the builtin module, as long as the sys module is available:
sys.modules[().__class__.__bases__[0].__module__].open
And even if sys is unavailable, you can still get access to any new-style class defined in any imported module by basically the same way. This includes all the IO classes in io.
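For example, something along these lines reaches FileIO with empty builtins and no sys (assuming the io module has been imported somewhere in the process, which it nearly always has):

# climb from a tuple up to object, then down to every loaded subclass
file_cls = [c for c in ().__class__.__bases__[0].__subclasses__()
            if c.__name__ == 'FileIO'][0]
print file_cls('test.py').read()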
This actually can be done.
That is, practically just what you describe can be accomplished on Linux, contrary to other answers here. That is, you can achieve a setup where you can have an exec-like call which runs untrusted code under security which is reasonably difficult to penetrate, and which allows output of the result. Untrusted code is not allowed to access the filesystem at all except for reading specifically allowed parts of the Python vm and standard library.
If that's close enough to what you wanted, read on.
I'm envisioning a system where your exec-like function spawns a subprocess under a very strict AppArmor profile, such as the one used by Straitjacket (see here and here). This will limit all filesystem access at the kernel level, other than files specifically allowed to be read. This will also limit the process's stack size, max data segment size, max resident set size, CPU time, the number of signals that can be queued, and the address space size. The process will have locked memory, cores, flock/fcntl locks, POSIX message queues, etc, wholly disallowed. If you want to allow using size-limited temporary files in a scratch area, you can mkstemp it and make it available to the subprocess, and allow writes there under certain conditions (make sure that hard links are absolutely disallowed). You'd want to make sure to clear out anything interesting from the subprocess environment and put it in a new session and process group, and close all FDs in the subprocess except for the stdin/stdout/stderr, if you want to allow communication with those.
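A rough sketch of the spawning side (assuming an AppArmor profile named untrusted_python has already been loaded and that aa-exec is available; every detail here will vary with your setup):

from subprocess import PIPE, Popen

untrusted_code = "print(40 + 2)"   # stands in for the client's code

# run it in a subprocess confined by the AppArmor profile; the kernel,
# not Python, enforces the filesystem restrictions
p = Popen(['aa-exec', '-p', 'untrusted_python', '--', 'python', '-'],
          stdin=PIPE, stdout=PIPE, stderr=PIPE)
out, err = p.communicate(untrusted_code)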
If you want to be able to get a Python object back out from the untrusted code, you could wrap it in something which prints the result's repr to stdout, and after you check its size, you evaluate it with ast.literal_eval(). That pretty severely limits the possible types of object that can be returned, but really, anything more complicated than those basic types probably carries the possibility of sekrit maliciousness intended to be triggered within your process. Under no circumstances should you use pickle for the communication protocol between the processes.
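The receiving end of that protocol might look roughly like this (the size cap is arbitrary; out is the stdout captured from the confined subprocess above):

import ast

if len(out) > 65536:
    raise ValueError("result suspiciously large")
# literal_eval only accepts Python literals, so nothing in the
# payload can execute code in this process
result = ast.literal_eval(out)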
As #Brian suggested, overriding open doesn't work:
def raise_exception(*a):
    raise Exception("you can't use open")

open = raise_exception

print eval("open('test.py').read()", {})
this displays the content of the file, but this (merging #Brian's and #lunaryorn's answers):
import sys

def raise_exception(*a):
    raise Exception("you can't use open")

__open = sys.modules['__builtin__'].open
sys.modules['__builtin__'].open = raise_exception

print eval("open('test.py').read()", {})
will throw this:
Traceback (most recent call last):
File "./test.py", line 11, in <module>
print eval("open('test.py').read()", {})
File "<string>", line 1, in <module>
File "./test.py", line 5, in raise_exception
raise Exception("you can't use open")
Exception: you can't use open
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python2.6/dist-packages/apport_python_hook.py", line 48, in apport_excepthook
if not enabled():
File "/usr/lib/python2.6/dist-packages/apport_python_hook.py", line 23, in enabled
conf = open(CONFIG).read()
File "./test.py", line 5, in raise_exception
raise Exception("you can't use open")
Exception: you can't use open
Original exception was:
Traceback (most recent call last):
File "./test.py", line 11, in <module>
print eval("open('test.py').read()", {})
File "<string>", line 1, in <module>
File "./test.py", line 5, in raise_exception
raise Exception("you can't use open")
Exception: you can't use open
and you can still access open outside the user code via __open.
"Nested function" refers to the fact that it's declared inside another function, not that it's a lambda. Declare your open override at the top level of your module and it should work the way you want.
Also, I don't think this is totally safe. Preventing open is just one of the things you need to worry about if you want to sandbox Python.