My question is more theoretical than practical. I've found plenty of answers that explain how, but not why, we should pass a list to a subprocess.Popen call.
For example:
Python 2.7.10 (default, Oct 14 2015, 16:09:02)
[GCC 5.2.1 20151010] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> cmd = subprocess.Popen(["python", "-V"], stdout=subprocess.PIPE)
Python 2.7.10
Then I was messing around on UNIX and found something interesting:
mvarge#ubuntu:~$ strace -f python -V 2>&1
execve("/usr/bin/python", ["python", "-V"], [/* 29 vars */]) = 0
Probably both execve and the list model that subprocess uses are somehow related, but can anyone give a good explanation for this?
Thanks in advance.
The underlying C-level representation is a char *[] array. Representing this as a list in Python is just a very natural and transparent mapping.
You can use a string instead of a list with shell=True; the shell is then responsible for parsing the command line into a char *[] array. However, the shell adds a number of pesky complexities; see the many questions about why you want to avoid shell=True for a detailed explanation.
The command-line arguments argv and the environment envp are just two of many OS-level structures that are essentially null-terminated arrays of strings.
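To make the difference concrete, here is a small comparison (just an illustrative sketch; grep and the file name are placeholders): with a list, each element becomes exactly one argv entry, whereas with shell=True you hand a single string to the shell and have to quote it yourself.
import subprocess

# Each list element becomes one entry of the char *[] handed to execve();
# no quoting is needed, even for arguments that contain spaces.
subprocess.Popen(["grep", "hello world", "my file.txt"])

# With shell=True the string is passed to /bin/sh, which re-splits it,
# so you are responsible for getting the quoting right yourself.
subprocess.Popen('grep "hello world" "my file.txt"', shell=True)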
A process is an OS-level abstraction: to create a process, you have to use the OS API, and that API dictates what you pass in. A list is not strictly necessary; e.g., a string (lpCommandLine) is the native interface on Windows (CreateProcess()). POSIX uses execv(), and therefore the native interface is a sequence of arguments (argv). Naturally, the Python subprocess module uses these interfaces to run external commands (create new processes).
The technical (uninteresting) answer is that in "why we must", the "must" part is not correct, as Windows demonstrates.
To understand "why it is", you would have to ask the creators of the CreateProcess() and execv() functions.
To understand "why we should" use a list, look at the table of contents for Unix (list) and Windows (string): How Command Line Parameters Are Parsed — the task that should be simple is complicated on Windows.
The main difference is that on POSIX the caller is responsible for splitting a command line into separate parameters, while on Windows the command itself parses its parameters. Different programs may, and do, use different algorithms to parse their parameters. The subprocess module uses the MS C runtime rules (subprocess.list2cmdline()) to combine the args list into a command line. It is much harder for a programmer to understand how the parameters might be parsed on Windows.
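You can see the Windows-side joining for yourself by calling subprocess.list2cmdline() (a quick illustration; the helper is an undocumented implementation detail, so don't rely on it in production code):
import subprocess

# list2cmdline() applies the MS C runtime quoting rules when it turns
# an args list into the single command-line string for CreateProcess().
print(subprocess.list2cmdline(["prog", "plain", "has space", 'has"quote']))
# prog plain "has space" has\"quote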
Related
I'm sure it's intentional, so can someone explain the rationale for this behavior:
Python 2.7.2 (default, Oct 13 2011, 15:27:47)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from os.path import isdir,expanduser
>>> isdir("~amosa/pdb")
False
>>> isdir(expanduser("~amosa/pdb"))
True
>>>
>>> from os import chdir
>>> chdir("~amosa/pdb")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 2] No such file or directory: '~amosa/pdb'
>>> chdir(expanduser("~amosa/pdb"))
>>>
It's really annoying since, after all, a path with a username in it can be resolved unambiguously. I want to write code that can handle any sort of input a user might give me, but this behavior requires me to call expanduser on every path my code deals with. It also means that anywhere I print that path for the user to see, it will be slightly less legible than what they gave me.
This seems inconsistent with the concept of "duck typing", which I generalize to mean that I expect Python not to whine at me unless there's actually a problem...
Because the underlying system calls don't recognize user paths, and the file access APIs are a fairly thin wrapper over them.
Additionally, it would be fairly surprising for non-Unix users if (for example) fopen("~foo") returned a "foo: no such user" error, since "~foo" is a valid file name on, for example, Windows. It would be similarly surprising if fopen("~administrator") returned an error like "Is a directory: C:\Documents and Settings\Administrator\".
Finally, as commenters have noted: you're confusing "duck typing" with "helpful shortcuts", which are two entirely different things:
- Duck typing allows me to substitute for a duck anything which quacks like a duck.
- Helpful shortcuts allow me to substitute for a duck anything which could be made to quack like a duck. (Python does not "try to make it quack" like some other languages do.)
In normal Unix utilities, the ~amosa syntax is handled by the shell, which is the program that invokes utilities. The utilities themselves do not know about the special ~ syntax (generally).
So if your python program is invoked by a shell on Unix, it will Just Work:
$ python -c 'import sys; print sys.argv[1]' ~drj
/home/drj
Notice how the python program above prints the expanded path, even though it clearly has no code to do the expansion itself. The shell expanded it.
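Inside a Python program you therefore have to do the expansion yourself; a tiny helper (just a sketch of the idea) keeps that in one place:
import os
from os.path import expanduser

def chdir_user(path):
    # Expand ~ and ~user the way a shell would before handing the path
    # to the OS-level call, which knows nothing about the ~ syntax.
    os.chdir(expanduser(path))

chdir_user("~amosa/pdb")  # the path from the question; only works where ~amosa exists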
I am using a piece of software that my lab has developed; let's call it cool_software. When I type cool_software in the terminal, I basically get a new prompt, cool_software >, and I can input commands to this software from the terminal.
Now I would like to automate this in Python, however I am not sure how to pass the commands to cool_software. Here's my MWE:
import os
os.system("cool_software")
os.system("command_for_cool_software")
The problem with the code above is that command_for_cool_software is executed by the usual Unix shell; it is not executed by cool_software.
Based on @Barmar's suggestion in the comments, using pexpect is pretty neat. From the documentation:
The spawn class is the more powerful interface to the Pexpect system. You can use this to spawn a child program then interact with it by sending input and expecting responses (waiting for patterns in the child’s output).
This is a working example using the python prompt as an example:
import pexpect
child = pexpect.spawn("python") # mimics running $ python
child.sendline('print("hello")') # >>> print("hello")
child.expect("hello") # expects hello
print(child.after) # prints "hello"
child.close()
In your case, it will be like this:
import pexpect
child = pexpect.spawn("cool_software")
child.sendline(command_for_cool_software)
child.expect(expected_output) # catch the expected output
print(child.after)
child.close()
NOTE
child.expect() matches only what you expect. If you don't expect anything and want to get all the output since you started the spawn, you can use child.expect('.+'), which will match everything.
This is what I got:
b'Python 3.8.10 (default, Jun 2 2021, 10:49:15) \r\n[GCC 9.4.0] on linux\r\nType "help", "copyright", "credits" or "license" for more information.\r\n>>> print("hello")\r\nhello\r\n>>> '
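If you don't need to react to cool_software's prompts one at a time, a plain subprocess pipe may also be enough (a sketch, assuming cool_software reads its commands from stdin and does not insist on a real terminal):
import subprocess

# Feed all commands in one batch and collect whatever the tool prints.
proc = subprocess.Popen(
    ["cool_software"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
output, _ = proc.communicate("command_for_cool_software\n")
print(output)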
In Julia, calling a function with the @edit macro from the REPL will open the editor and put the cursor at the line where the method is defined. So, doing this:
julia> @edit 1 + 1
jumps to julia/base/int.jl and puts the cursor on the line:
(+)(x::T, y::T) where {T<:BitInteger} = add_int(x, y)
As does the function form: edit(+, (Int, Int))
Is there an equivalent decorator/function in Python that does the same from the Python REPL?
Disclaimer: In the Python ecosystem, this is not the job of the core language/runtime but rather of tools such as IDEs. For example, the IPython shell has the ?? special syntax to get improved help, including source code.
Python 3.8.5 (default, Jul 21 2020, 10:42:08)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.18.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import random
In [2]: random.uniform??
Signature: random.uniform(a, b)
Source:
def uniform(self, a, b):
    "Get a random number in the range [a, b) or [a, b] depending on rounding."
    return a + (b-a) * self.random()
File: /usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/random.py
Type: method
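As far as I know, IPython also ships an %edit magic that opens the file where an object is defined in your configured editor, which is even closer to Julia's @edit:
In [3]: %edit random.uniform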
The Python runtime itself allows viewing the source code of objects via inspect.getsource. This uses a heuristic to search for the source code where it is available; the objects themselves do not carry their source code.
Python 3.8.5 (default, Jul 21 2020, 10:42:08)
[Clang 11.0.0 (clang-1100.0.33.17)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import inspect
>>> print(inspect.getsource(inspect.getsource))
def getsource(object):
    """Return the text of the source code for an object.
    The argument may be a module, class, method, function, traceback, frame,
    or code object. The source code is returned as a single string. An
    OSError is raised if the source code cannot be retrieved."""
    lines, lnum = getsourcelines(object)
    return ''.join(lines)
It is not possible to resolve arbitrary expressions or statements to their source; since all names in Python are resolved dynamically, the vast majority of expressions do not have a well-defined implementation unless executed. A debugger, e.g. as provided by pdb.set_trace(), allows inspecting an expression as it is executed.
In most IDEs, like PyCharm or VSCode, you can Ctrl+click on a function / class to jump to its definition, even if it is part of the standard library or a 3rd-party library (in VSCode, this also works for Julia, btw.).
A limitation is that this only works for "pure Python" code; C library code, etc., is not shown.
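If you want something closer to Julia's @edit from a plain REPL, you can approximate it yourself with inspect (a rough sketch; the EDITOR fallback and the +line syntax are assumptions that hold for vi/vim/nano/emacs, and it shares getsource's pure-Python limitation):
import inspect
import os
import subprocess

def edit(obj):
    # Find the source file and the first line of the object's definition,
    # then open the editor positioned at that line.
    filename = inspect.getsourcefile(obj)
    _, lineno = inspect.getsourcelines(obj)
    editor = os.environ.get("EDITOR", "vi")
    subprocess.call([editor, "+{}".format(lineno), filename])

# e.g. edit(inspect.getsource) jumps to inspect.py at the def getsource line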
Inspired by another question here, I would like to retrieve the Python interpreter's full command line in a portable way. That is, I want to get the original argv of the interpreter, not the sys.argv which excludes options to the interpreter itself (like -m, -O, etc.).
sys.flags tells us which boolean options were set, but it doesn't tell us about -m arguments, and the set of flags is bound to change over time, creating a maintenance burden.
On Linux you can use procfs to retrieve the original command line, but this is not portable (and it's sort of gross):
open('/proc/{}/cmdline'.format(os.getpid())).read().split('\0')
You can use ctypes:
~$ python2 -B -R -u
Python 2.7.9 (default, Dec 11 2014, 04:42:00)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Persistent session history and tab completion are enabled.
>>> import ctypes
>>> argv = ctypes.POINTER(ctypes.c_char_p)()
>>> argc = ctypes.c_int()
>>> ctypes.pythonapi.Py_GetArgcArgv(ctypes.byref(argc), ctypes.byref(argv))
1227013240
>>> argc.value
4
>>> argv[0]
'python2'
>>> argv[1]
'-B'
>>> argv[2]
'-R'
>>> argv[3]
'-u'
I'm going to add another answer to this. @bav had the right answer for Python 2.7, but it breaks in Python 3, as @szmoore points out (not just 3.7). The code below, however, works in both Python 2 and Python 3 (the key is c_wchar_p in Python 3 instead of c_char_p in Python 2) and properly converts argv into a Python list so that it's safe to use in other Python code without segfaulting:
import ctypes
import sys

def get_python_interpreter_arguments():
    argc = ctypes.c_int()
    argv = ctypes.POINTER(ctypes.c_wchar_p if sys.version_info >= (3,) else ctypes.c_char_p)()
    ctypes.pythonapi.Py_GetArgcArgv(ctypes.byref(argc), ctypes.byref(argv))

    # Ctypes are weird. They can't be used in list comprehensions, you can't use `in` with them,
    # and you can't use a for-each loop on them. We have to do an old-school for-i loop.
    arguments = list()
    for i in range(argc.value - len(sys.argv) + 1):
        arguments.append(argv[i])

    return arguments
You'll notice that it also returns only the interpreter arguments and excludes the arguments found in sys.argv. You can eliminate this behavior by removing - len(sys.argv) + 1.
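As a side note, on Python 3.10 and later this should no longer be necessary: as far as I'm aware, sys.orig_argv exposes the interpreter's original command line directly.
import sys

# Python 3.10+: the arguments originally passed to the interpreter,
# including flags such as -B or -O that sys.argv leaves out.
print(sys.orig_argv)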
So I am running a Python script within which I am calling Python's debugger, pdb, by writing:
import ipdb; ipdb.set_trace()
(iPython's version of PDB, though for that matter I don't think it makes a difference; I use it only for the colored output).
Now, when I get to the debugger I want to execute a multi-line statement such as an if clause or a for loop, but as soon as I type
if condition:
and hit the return key, I get the error message *** SyntaxError: invalid syntax (<stdin>, line 1)
How can one execute multi-line statements within PDB? If that is not possible, is there a way around it that still lets me execute an if clause or a for loop?
You could do this while in pdb to launch a temporary interactive Python session with all the local variables available:
(pdb) !import code; code.interact(local=vars())
Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>>
When you're done, use Ctrl-D to return to the regular pdb prompt.
Just don't hit Ctrl-C, that will terminate the entire pdb session.
In Python 3, ipdb (and pdb) have a command called interact. It can be used to:
Start an interactive interpreter (using the code module) whose global namespace contains all the (global and local) names found in the current scope.
To use it, simply enter interact at the pdb prompt. Among other things, it's useful for running code that spans multiple lines, and also for avoiding accidentally triggering other pdb commands.
My recommendation is to use IPython embedding.
ipdb> from IPython import embed; embed()
Inside the Python (2.7.1) interpreter or debugger (import pdb), you can execute a multi-line statement with the following syntax.
for i in range(5): print("Hello"); print("World"); print(i)
Note: When I'm inside the interpreter, I have to hit return twice before the code will execute. Inside the debugger, however, I only have to hit return once.
There is a special case if you want a couple of commands to be executed when hitting a breakpoint: the debugger command commands. It allows you to enter multiple lines of commands and then end the whole sequence with the end keyword. More with (pdb) help commands.
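For example (a minimal sketch; breakpoint number 1 and the variable x are just placeholders):
(Pdb) commands 1
(com) print("x is now", x)
(com) if x > 10: print("x is big")
(com) end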
I don't know if you can do this; it'd be a great feature for ipdb though. You can use list comprehensions of course, and execute simple multi-line expressions like:
if y == 3: print y; print y; print y;
You could also write some functions beforehand to do whatever it is you need done that would normally take multiple lines.