Using python subprocess to fake running a cmd from a terminal - python

We have a vendor-supplied python tool ( that's byte-compiled, we don't have the source ). Because of this, we're also locked into using the vendor supplied python 2.4. The way to the util is:
source login.sh
oupload [options]
The login.sh just sets a few env variables, and then 2 aliases:
odownload () {
${PYTHON_CMD} ${OCLIPATH}/ocli/commands/word_download_command.pyc "$#"
}
oupload () {
${PYTHON_CMD} ${OCLIPATH}/ocli/commands/word_upload_command.pyc "$#"
}
Now, when I run it their way - works fine. It will prompt for a username and password, then do it's thing.
I'm trying to create a wrapper around the tool to do some extra steps after it's run and provide some sane defaults for the utility. The problem I'm running into is I cannot, for the life of me, figure out how to use subprocess to successfully do this. It seems to realize that the original command isn't running directly from the terminal and bails.
I created a '/usr/local/bin/oupload' and copied from the original login.sh. Only difference is instead of doing an alias at the end, I actually run the command.
Then, in my python script, I try to run my new shell script:
if os.path.exists(options.zipfile):
try:
cmd = string.join(cmdargs,' ')
p1 = Popen(cmd, shell=True, stdin=PIPE)
But I get:
Enter Opsware Username: Traceback (most recent call last):
File "./command.py", line 31, in main
File "./controller.py", line 51, in handle
File "./controllers/word_upload_controller.py", line 81, in _handle
File "./controller.py", line 66, in _determineNew
File "./lib/util.py", line 83, in determineNew
File "./lib/util.py", line 112, in getAuth
Empty Username not legal
Unknown Error Encountered
SUMMARY:
Name: Empty Username not legal
Description: None
So it seemed like an extra carriage return was getting sent ( I tried rstripping all the options, didn't help ).
If I don't set stdin=PIPE, I get:
Enter Opsware Username: Traceback (most recent call last):
File "./command.py", line 31, in main
File "./controller.py", line 51, in handle
File "./controllers/word_upload_controller.py", line 81, in _handle
File "./controller.py", line 66, in _determineNew
File "./lib/util.py", line 83, in determineNew
File "./lib/util.py", line 109, in getAuth
IOError: [Errno 5] Input/output error
Unknown Error Encountered
I've tried other variations of using p1.communicate, p1.stdin.write() along with shell=False and shell=True, but I've had no luck in trying to figure out how to properly send along the username and password. As a last result, I tried looking at the byte code for the utility they provided - it didn't help - once I called the util's main routine with the proper arguments, it ended up core dumping w/ thread errors.
Final thoughts - the utility doesn't want to seem to 'wait' for any input. When run from the shell, it pauses at the 'Username' prompt. When run through python's popen, it just blazes thru and ends, assuming no password was given. I tried to lookup ways of maybe preloading the stdin buffer - thinking maybe the process would read from that if it was available, but couldn't figure out if that was possible.
I'm trying to stay away from using pexpect, mainly because we have to use the vendor's provided python 2.4 because of the precompiled libraries they provide and I'm trying to keep distribution of the script to as minimal a footprint as possible - if I have to, I have to, but I'd rather not use it ( and I honestly have no idea if it works in this situation either ).
Any thoughts on what else I could try would be most appreciated.
UPDATE
So I solved this by diving further into the bytecode and figuring out what I was missing from the compiled command.
However, this presented two problems -
The vendor code, when called, was doing an exit when it completed
The vendor code was writing to stdout, which I needed to store and operate on ( it contains the ID of the uploaded pkg ). I couldn't just redirect stdout, because the vendor code was still asking for the username/password.
1 was solved easy enough by wrapping their code in a try/except clause.
2 was solved by doing something similar to: https://stackoverflow.com/a/616672/677373
Instead of a log file, I used cStringIO. I also had to implement a fake 'flush' method, since it seems the vendor code was calling that and complaining that the new obj I had provided for stdout didn't supply it - code ends up looking like:
class Logger(object):
def __init__(self):
self.terminal = sys.stdout
self.log = StringIO()
def write(self, message):
self.terminal.write(message)
self.log.write(message)
def flush(self):
self.terminal.flush()
self.log.flush()
if os.path.exists(options.zipfile):
try:
os.environ['OCLI_CODESET'] = 'ISO-8859-1'
backup = sys.stdout
sys.stdout = output = Logger()
# UploadCommand was the command found in the bytecode
upload = UploadCommand()
try:
upload.main(cmdargs)
except Exception, rc:
pass
sys.stdout = backup
# now do some fancy stuff with output from output.log
I should note that the only reason I simply do a 'pass' in the except: clause is that the except clause is always called. The 'rc' is actually the return code from the command, so I will probably add handling for non-zero cases.

I tried to lookup ways of maybe preloading the stdin buffer
Do you perhaps want to create a named fifo, fill it with username/password info, then reopen it in read mode and pass it to popen (as in popen(..., stdin=myfilledbuffer))?
You could also just create an ordinary temporary file, write the data to it, and reopen it in read mode, again, passing the reopened handle as stdin. (This is something I'd personally avoid doing, since writing username/passwords to temporary files is often of the bad. OTOH it's easier to test than FIFOs)
As for the underlying cause: I suspect that the offending software is reading from stdin via a non-blocking method. Not sure why that works when connected to a terminal.
AAAANYWAY: no need to use pipes directly via Popen at all, right? I kinda laugh at the hackishness of this, but I'll bet it'll work for you:
# you don't actually seem to need popen here IMO -- call() does better for this application.
statuscode = call('echo "%s\n%s\n" | oupload %s' % (username, password, options) , shell=True)
tested with status = call('echo "foo\nbar\nbar\nbaz" |wc -l', shell = True) (output is '4', naturally.)

The original question was solved by just avoiding the issue and not using the terminal and instead importing the python code that was being called by the shell script and just using that.
I believe J.F. Sebastian's answer would probably work better for what was originally asked, however, so I'd suggest people looking for an answer to a similar question look down the path of using the pty module.

Related

Pyinstaller not allowing multiprocessing with MacOS

I have a python file that I would like to package as an executable for MacOS 11.6.
The python file (called Service.py) relies on one other json file and runs perfectly fine when running with python. My file uses argparse as the arguments can differ depending on what is needed.
Example of how the file is called with python:
python3 Service.py -v Zephyr_Scale_Cloud https://myurl.cloud/ philippa#email.com password1 group3
The file is run in exactly the same way when it is an executable:
./Service.py -v Zephyr_Scale_Cloud https://myurl.cloud/ philippa#email.com password1 group3
I can package the file using PyInstaller and the executable runs.
Command used to package the file:
pyinstaller --paths=/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/ Service.py
However, when I get to the point that requires multiprocessing, the arguments get lost. My second argument (here noted as https://myurl.cloud) is a URL that I require.
The error I see is:
[MAIN] Starting new process RUNID9157
url before constructing the client recognised as pipe_handle=15
usage: Service [-h] test_management_tool url
Service: error: the following arguments are required: url
Traceback (most recent call last):
File "urllib3/connection.py", line 174, in _new_conn
File "urllib3/util/connection.py", line 72, in create_connection
File "socket.py", line 954, in getaddrinfo
I have done some logging and the url does get correctly read. But as soon as the process started and picking up what it needs to, the url is changed to 'pipe_handle=x', in the code above it is pipe_handle=15.
I need the url to retrieve an authentication token, but it just stops being read as the correct value and is changed to this pipe_handle value. I have no idea why.
Has anyone else seen this?!
I am using Python 3.9, PyInstaller 4.4 and ArgParse.
I have also added
if __name__ == "__main__":
if sys.platform.startswith('win'):
# On Windows - multiprocessing is different to Unix and Mac.
multiprocessing.freeze_support()
to my if name = "main" section as I saw this on other posts but it doesn't help.
Can someone please assist?
Sending commands via sys.argv is complicated by the fact that multiprocessing's "spawn" start method uses that to pass the file descriptors for the initial communication pipes between the parent and child.
I'm projecting here a little because you did not share the code of how/where you call argparse, and how/where you call multiprocessing
If you are parsing args outside of if __name__ == "__main__":, the args may get parsed (re-parsed on child import __main__) before sys.argv gets automatically cleaned up by multiprocessing.spawn.prepare() in the child. You should be able to fix this by moving the argparse stuff inside your target function. It also may be easier to parse the args in the parent, and simply send the parsed results as an argument to the target function. See this answer of mine for further discussion on sys.argv with multiprocessing.

Finding which files are being read from during a session (python code)

I have a large system written in python. when I run it, it reads all sorts of data from many different files on my filesystem. There are thousands lines of code, and hundreds of files, most of them are not actually being used. I want to see which files are actually being accessed by the system (ubuntu), and hopefully, where in the code they are being opened. Filenames are decided dynamically using variables etc., so the actual filenames cannot be determined just by looking at the code.
I have access to the code, of course, and can change it.
I try to figure how to do this efficiently, with minimal changes in the code:
is there a Linux way to determine which files are accessed, and at what times? this might be useful, although it won't tell me where in the code this happens
is there a simple way to make an "open file" command also log the file name, time, etc... of the open file? hopefully without having to go into the code and change every open command, there are many of them, and some are not being used at runtime.
Thanks
You can trace file accesses without modifying your code, using strace.
Either you start your program with strace, like this
strace -f -e trace=file your_program.py
Otherwise you attach strace to a running program like this
strace -f -e trace=file -p <PID>
For 1 - You can use
ls -la /proc/<PID>/fd`
Replacing <PID> with your process id.
Note that it will give you all the open file descriptors, some of them are stdin stdout stderr, and often other things, such as open websockets (which use a file descriptor), however filtering it for files should be easy.
For 2- See the great solution proposed here -
Override python open function when used with the 'as' keyword to print anything
e.g. overriding the open function with your own, which could include the additional logging.
One possible method is to "overload" the open function. This will have many effects that depend on the code, so I would do that very carefully if needed, but basically here's an example:
>>> _open = open
>>> def open(filename):
... print(filename)
... return _open(filename)
...
>>> open('somefile.txt')
somefile.txt
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in open
FileNotFoundError: [Errno 2] No such file or directory: 'somefile.txt'
As you can see my new open function will return the original open (renamed as _open) but will first print out the argument (the filename). This can be done with more sophistication to log the filename if needed, but the most important thing is that this needs to run before any use of open in your code

How do I embed my shell scanning-script into a Python script?

Iv'e been using the following shell command to read the image off a scanner named scanner_name and save it in a file named file_name
scanimage -d <scanner_name> --resolution=300 --format=tiff --mode=Color 2>&1 > <file_name>
This has worked fine for my purposes.
I'm now trying to embed this in a python script. What I need is to save the scanned image, as before, into a file and also capture any std output (say error messages) to a string
I've tried
scan_result = os.system('scanimage -d {} --resolution=300 --format=tiff --mode=Color 2>&1 > {} '.format(scanner, file_name))
But when I run this in a loop (with different scanners), there is an unreasonably long lag between scans and the images aren't saved until the next scan starts (the file is created as an empty file and is not filled until the next scanning command). All this with scan_result=0, i.e. indicating no error
The subprocess method run() has been suggested to me, and I have tried
with open(file_name, 'w') as scanfile:
input_params = '-d {} --resolution=300 --format=tiff --mode=Color 2>&1 > {} '.format(scanner, file_name)
scan_result = subprocess.run(["scanimage", input_params], stdout=scanfile, shell=True)
but this saved the image in some kind of an unreadable file format
Any ideas as to what may be going wrong? Or what else I can try that will allow me to both save the file and check the success status?
subprocess.run() is definitely preferred over os.system() but neither of them as such provides support for running multiple jobs in parallel. You will need to use something like Python's multiprocessing library to run several tasks in parallel (or painfully reimplement it yourself on top of the basic subprocess.Popen() API).
You also have a basic misunderstanding about how to run subprocess.run(). You can pass in either a string and shell=True or a list of tokens and shell=False (or no shell keyword at all; False is the default).
with_shell = subprocess.run(
"scanimage -d {} --resolution=300 --format=tiff --mode=Color 2>&1 > {} ".format(
scanner, file_name), shell=True)
with open(file_name) as write_handle:
no_shell = subprocess.run([
"scanimage", "-d", scanner, "--resolution=300", "--format=tiff",
"--mode=Color"], stdout=write_handle)
You'll notice that the latter does not support redirection (because that's a shell feature) but this is reasonably easy to implement in Python. (I took out the redirection of standard error -- you really want error messages to remain on stderr!)
If you have a larger working Python program this should not be awfully hard to integrate with a multiprocessing.Pool(). If this is a small isolated program, I would suggest you peel off the Python layer entirely and go with something like xargs or GNU parallel to run a capped number of parallel subprocesses.
I suspect the issue is you're opening the output file, and then running the subprocess.run() within it. This isn't necessary. The end result is, you're opening the file via Python, then having the command open the file again via the OS, and then closing the file via Python.
JUST run the subprocess, and let the scanimage 2>&1> filename command create the file (just as it would if you ran the scanimage at the command line directly.)
I think subprocess.check_output() is now the preferred method of capturing the output.
I.e.
from subprocess import check_output
# Command must be a list, with all parameters as separate list items
command = ['scanimage',
'-d{}'.format(scanner),
'--resolution=300',
'--format=tiff',
'--mode=Color',
'2>&1>{}'.format(file_name)]
scan_result = check_output(command)
print(scan_result)
However, (with both run and check_output) that shell=True is a big security risk ... especially if the input_params come into the Python script externally. People can pass in unwanted commands, and have them run in the shell with the permissions of the script.
Sometimes, the shell=True is necessary for the OS command to run properly, in which case the best recommendation is to use an actual Python module to interface with the scanner - versus having Python pass an OS command to the OS.

Can't seem to get fortran executable to run correctly through python

I have read a bunch of different topics on SO and other sites and cannot get a direct answer to my question/problem. Currently I have this python script that runs completely fine, with the exception of no calls made to run a fortran program are working correctly. I have tried using subprocess commands, os.system commands, opening bash script files that are opened through python, and no luck. Here are some examples and errors I'm getting.
One attmept:
subprocess.Popen(["sh", "{0}{1}".format(SCRIPTS,"qlmtconvertf.sh"), "qlmt"], shell=False, stdout=subprocess.PIPE)
This gives an error that the program has trouble reading the file correctly.
forrtl: severe (24): end-of-file during read, unit 1, file /home/akoufos/lapw/Ar/lda/bcc55_mt1.5_lo_e8_o4/DOS/lat70/qlmt
Another attempt:
subprocess.Popen(["./{0}{1}".format(SOURCE,"qlmtconvertf"), "qlmt"], shell=False, stdout=subprocess.PIPE)
This gives an error of not finding the file.
File "/home/akoufos/lapw/Scripts_Plots/LAPWanalysis.py", line 59, in DOS
subprocess.Popen(["./{0}{1}".format(SOURCE,"qlmtconvertf"), "qlmt"], shell=False, stdout=subprocess.PIPE)
File "/usr/lib64/python2.7/subprocess.py", line 672, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1202, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
Yet another attempt:
os.system("{0}{1}".format(SOURCE,"qlmtconvertf qlmt"))
This gives an error equivalent to the first example. In all examples SOURCE="/home/myusername/lapw/Source/", where the fortran source files are, SCRIPTS="/home/myusername/lapw/Scripts_Plots/", where I have other files and the python scripts in, qlmtconvertf is a compiled fortran program, and qlmt is a file the qlmtconvertf reads. This source code works completely fine if I call it in the shell, like I have done countless times, but I'm trying to automate calling these codes. I have written a bash script as well, that does what I need, but I'm trying to do everything through python instead. Any ideas, suggestions, or answers to what I am doing incorrectly and what is going on would be greatly appreciated. Thank you all in advance.
EDIT: I got it working with the suggestion given below by Francis. I had to keep the complete paths (i.e. /home/username/etc) and the os.path.join to call the program correctly.
import os.path
LAPW = "/home/myusername/lapw/"
SOURCE = os.path.join(LAPW,'Source')
SCRIPTS = os.path.join(LAPW,'Scripts_Plots')
QLMTCONVERT = os.path.join(SOURCE,'qlmtconvertf')
qargs = [QLMTCONVERT,'qlmt']
#CALLING PROGRAM
subprocess.Popen(qargs, stdout=subprocess.PIPE).communicate(input=None)
To get it to work correctly I had to also close the 'qlmt' file I had created during the python script. Also I am working in the directory that contains the 'qlmt' file.
(edit Also added .communicate(input=None) to the end of the subprocess. This was unnecessary for this process call, but it was important for a latter one I made in the script that tried to use a file the process was creating. From my understanding the .communicate talks to the process and basically waits for it to finish before the next python line is executed. Similar to .wait(), but more advanced. If someone who understands this more wants to elaborate, please feel free. edit)
I'm not exactly sure why this method worked, but using strings as inputs for the subprocess was giving errors. If any one has any insight on this I would be very thankful if you could pass on your knowledge. Thank you everyone for the help.
I think you forgot a slash in your filenames:
"{0}{1}".format(SOURCE,"qlmtconvertf qlmt") == '/home/myusername/lapw/Sourceqlmtconvertf qlmt'
I assume you mean this?
"{0}/{1}".format(SOURCE,"qlmtconvertf qlmt") == '/home/myusername/lapw/Source/qlmtconvertf qlmt'
I recommend using os.path.join rather than direct string construction for pathname creation:
import os.path
executable = os.path.join(SOURCE, 'qlmtconvertf')
args = ['qlmt']
subprocess.Popen(executable+args, stdout=subprocess.PIPE)

disallow access to filesystem inside exec and eval in Python

I want to disallow access to file system from clients code, so I think I could overwrite open function
env = {
'open': lambda *a: StringIO("you can't use open")
}
exec(open('user_code.py'), env)
but I got this
unqualified exec is not allowed in function 'my function' it contains a
nested function with free variables
I also try
def open_exception(*a):
raise Exception("you can't use open")
env = {
'open': open_exception
}
but got the same Exception (not "you can't use open")
I want to prevent of:
executing this:
"""def foo():
return open('some_file').read()
print foo()"""
and evaluate this
"open('some_file').write('some text')"
I also use session to store code that was evaluated previously so I need to prevent of executing this:
"""def foo(s):
return open(s)"""
and then evaluating this
"foo('some').write('some text')"
I can't use regex because someone could use (eval inside string)
"eval(\"opxx('some file').write('some text')\".replace('xx', 'en')"
Is there any way to prevent access to file system inside exec/eval? (I need both)
There's no way to prevent access to the file system inside exec/eval. Here's an example code that demonstrates a way for the user code to call otherwise restricted classes that always works:
import subprocess
code = """[x for x in ().__class__.__bases__[0].__subclasses__()
if x.__name__ == 'Popen'][0](['ls', '-la']).wait()"""
# Executing the `code` will always run `ls`...
exec code in dict(__builtins__=None)
And don't think about filtering the input, especially with regex.
You might consider a few alternatives:
ast.literal_eval if you could limit yourself only to simple expressions
Using another language for user code. You might look at Lua or JavaScript - both are sometimes used to run unsafe code inside sandboxes.
There's the pysandbox project, though I can't guarantee you that the sandboxed code is really safe. Python wasn't designed to be sandboxed, and in particular the CPython implementation wasn't written with sandboxing in mind. Even the author seems to doubt the possibility to implement such sandbox safely.
You can't turn exec() and eval() into a safe sandbox. You can always get access to the builtin module, as long as the sys module is available::
sys.modules[().__class__.__bases__[0].__module__].open
And even if sys is unavailable, you can still get access to any new-style class defined in any imported module by basically the same way. This includes all the IO classes in io.
This actually can be done.
That is, practically just what you describe can be accomplished on Linux, contrary to other answers here. That is, you can achieve a setup where you can have an exec-like call which runs untrusted code under security which is reasonably difficult to penetrate, and which allows output of the result. Untrusted code is not allowed to access the filesystem at all except for reading specifically allowed parts of the Python vm and standard library.
If that's close enough to what you wanted, read on.
I'm envisioning a system where your exec-like function spawns a subprocess under a very strict AppArmor profile, such as the one used by Straitjacket (see here and here). This will limit all filesystem access at the kernel level, other than files specifically allowed to be read. This will also limit the process's stack size, max data segment size, max resident set size, CPU time, the number of signals that can be queued, and the address space size. The process will have locked memory, cores, flock/fcntl locks, POSIX message queues, etc, wholly disallowed. If you want to allow using size-limited temporary files in a scratch area, you can mkstemp it and make it available to the subprocess, and allow writes there under certain conditions (make sure that hard links are absolutely disallowed). You'd want to make sure to clear out anything interesting from the subprocess environment and put it in a new session and process group, and close all FDs in the subprocess except for the stdin/stdout/stderr, if you want to allow communication with those.
If you want to be able to get a Python object back out from the untrusted code, you could wrap it in something which prints the result's repr to stdout, and after you check its size, you evaluate it with ast.literal_eval(). That pretty severely limits the possible types of object that can be returned, but really, anything more complicated than those basic types probably carries the possibility of sekrit maliciousness intended to be triggered within your process. Under no circumstances should you use pickle for the communication protocol between the processes.
As #Brian suggest overriding open doesn't work:
def raise_exception(*a):
raise Exception("you can't use open")
open = raise_exception
print eval("open('test.py').read()", {})
this display the content of the file but this (merging #Brian and #lunaryorn answers)
import sys
def raise_exception(*a):
raise Exception("you can't use open")
__open = sys.modules['__builtin__'].open
sys.modules['__builtin__'].open = raise_exception
print eval("open('test.py').read()", {})
will throw this:
Traceback (most recent call last):
File "./test.py", line 11, in <module>
print eval("open('test.py').read()", {})
File "<string>", line 1, in <module>
File "./test.py", line 5, in raise_exception
raise Exception("you can't use open")
Exception: you can't use open
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python2.6/dist-packages/apport_python_hook.py", line 48, in apport_excepthook
if not enabled():
File "/usr/lib/python2.6/dist-packages/apport_python_hook.py", line 23, in enabled
conf = open(CONFIG).read()
File "./test.py", line 5, in raise_exception
raise Exception("you can't use open")
Exception: you can't use open
Original exception was:
Traceback (most recent call last):
File "./test.py", line 11, in <module>
print eval("open('test.py').read()", {})
File "<string>", line 1, in <module>
File "./test.py", line 5, in raise_exception
raise Exception("you can't use open")
Exception: you can't use open
and you can access to open outside user code via __open
"Nested function" refers to the fact that it's declared inside another function, not that it's a lambda. Declare your open override at the top level of your module and it should work the way you want.
Also, I don't think this is totally safe. Preventing open is just one of the things you need to worry about if you want to sandbox Python.

Categories

Resources