How can I get subprocess.check_call to give me the raw binary output of a command? It seems to be encoding it incorrectly somewhere.
Details:
I have a command that returns text like this:
some output text “quote” ...
(Those quotes are Unicode; the closing one is e2 80 9d in UTF-8.)
Here's how I'm calling the command:
f_output = SpooledTemporaryFile()
subprocess.check_call(cmd, shell=True, stdout=f_output)
f_output.seek(0)
output = f_output.read()
The problem is I get this:
>>> repr(output)
some output text ?quote? ...
>>> type(output)
<str>
(And if I call ord() on the '?' I get 63.)
I'm on Python 2.7 on Linux.
Note: Running the same code on OS X works correctly for me. The problem only occurs when I run it on a Linux server.
Wow, this was the weirdest issue ever but I've fixed it!
It turns out that the program it was calling (a java program) was returning different encoding depending on where it was called from!
On my dev OS X machine it returns the characters fine. On the Linux server, called from the command line, it returns them fine. Called from a Django app, they turn into "?"s.
To fix this I ended up adding this argument to the command:
-Dfile.encoding=utf-8
I got that idea here, and it seems to work. There's also a way to modify the Java program internally to do that.
Sorry I blamed Python! You guys had the right idea.
The redirection (stdout=file) happens at the file descriptor level. Python has nothing to do with what is written to the file if you see ? instead of “ in the file itself (not in a REPL).
If it works on OS X and "doesn't work" on the Linux server, the likely reason is a difference in the environment. Check the LC_ALL, LC_CTYPE, and LANG environment variables: python, /bin/sh (due to shell=True), and the cmd itself may all use your locale encoding, which is ASCII if the environment is not set (C, POSIX locale).
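To compare environments, a small diagnostic like the one below could be run once from the shell and once from inside the web app. It is only a sketch in Python 3; the helper name locale_report is made up for illustration.

```python
import locale
import os


def locale_report(environ=None):
    """Collect the locale-related variables from an environment mapping."""
    environ = os.environ if environ is None else environ
    report = {name: environ.get(name) for name in ("LC_ALL", "LC_CTYPE", "LANG")}
    # This value always describes the current process, not the mapping passed in.
    report["preferred_encoding"] = locale.getpreferredencoding(False)
    return report


if __name__ == "__main__":
    for key, value in locale_report().items():
        print("{0}={1!r}".format(key, value))
```

If the variables are all unset in one environment and set to a UTF-8 locale in the other, that difference is a likely suspect.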
To get "raw binary" from a subprocess:
#!/usr/bin/env python
import subprocess
raw_binary = subprocess.check_output(['cmd', 'arg 1', 'arg 2'])
print(repr(raw_binary))
Note:
no shell=True—don't use it unless it is necessary
many programs may change their behavior if they detect that the output is not a tty, example.
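As a self-contained illustration (Python 3, with a child Python process standing in for the real command, which is an assumption of this sketch), the bytes come back untouched and decoding is a separate, explicit step:

```python
import subprocess
import sys

# Stand-in for the real command: a child Python process that writes the
# UTF-8 bytes for the curly-quoted word "quote" straight to its stdout.
child_code = r"import sys; sys.stdout.buffer.write(b'\xe2\x80\x9cquote\xe2\x80\x9d')"
raw = subprocess.check_output([sys.executable, "-c", child_code])

# check_output returns bytes here; nothing has been decoded yet.
text = raw.decode("utf-8")  # decode explicitly, with the encoding you expect

print(repr(raw))
print(ascii(text))  # ascii() keeps the print safe on non-UTF-8 terminals
```

If the bytes already contain ? (0x3f) at this point, the substitution happened inside the child process, not in Python.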
When using the os.spawnl() or subprocess.run() functions from Python, Unicode arguments (for example, path values) get passed as gibberish, and the called executable can't use them.
>>> subprocess.run(["ping", "ğü"], stdout=subprocess.PIPE).stdout
b'Ping request could not find host \xa7\x81. Please check the name and try again.\r\n'
PS C:\Users\flc> C:\Windows\system32\PING.EXE ğü
Ping request could not find host ğü. Please check the name and try again.
Strangely, this doesn't happen with nslookup:
>>> subprocess.run(["nslookup", "ğü"], stdout=subprocess.PIPE).stdout
*** one.one.one.one can't find ğü: Non-existent domain
b'Server: one.one.one.one\r\nAddress: 1.1.1.1\r\n\r\n'
I tested this with dir and echo, but since they require shell=True, I wasn't sure if it was making things seem OK. Hence I picked ping and nslookup.
The characters "ğü" appear as "\xa7\x81" in the output of the called program, whereas in PowerShell ping works with Unicode characters. The replacement characters seem to use cp1254. It seems that even if I instruct Python to use utf-8, it passes arguments in my local encoding.
What's happening here? How can I safely pass arguments to spawned executables? Is this a problem with Python, or the target executable?
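One possible reading, offered as a hypothesis rather than a diagnosis: the bytes \xa7\x81 are exactly what "ğü" looks like in code page 857, the Turkish OEM code page that console programs commonly use for their output. If that is the case here, the argument arrived intact and only the captured output is in an encoding other than the one expected. The sketch below (Python 3) compares the candidate encodings:

```python
text = "\u011f\u00fc"  # the two characters "ğü"

oem_bytes = text.encode("cp857")    # Turkish OEM (console) code page
ansi_bytes = text.encode("cp1254")  # Turkish ANSI code page
utf8_bytes = text.encode("utf-8")

print(oem_bytes, ansi_bytes, utf8_bytes)

# Decoding the bytes from the ping output with cp857 recovers the original text.
decoded = b"\xa7\x81".decode("cp857")
print(decoded == text)
```

Under that assumption, decoding the captured stdout with the console's OEM code page instead of UTF-8 would give readable text.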
I'm trying to pass a Python command from R (on Windows x64, RStudio) to a Python script via the command prompt. It works if I type it directly into cmd, but not if I do it via R using the R function system(). The format is (this is EXACTLY how I would write it in the Windows cmd shell/prompt):
python C:/some/path/script <C:/some/input.file> C:/some/output.file
This works in the cmd prompt: it runs the script with the input file (in <>) and produces the output file. I thought that in R I could do:
system('python C:/some/path/script <C:/some/input.file> C:/some/output.file')
But this gives an error from python:
error: unparsable arguments: ['<C:/some/input.file>', 'C:/some/output.file']
It seems as if R or Windows interprets the whitespace differently than when I simply write (or copy-paste) the line into the cmd prompt. How can I do this?
From ?system
This interface has become rather complicated over the years: see
system2 for a more portable and flexible interface which is
recommended for new code.
system2 accepts a parameter args for the arguments of your command.
So you can try:
system2('python', c('C:\\some\\path\\script', 'C:\\some\\input.file', 'C:\\some\\output.file'))
On Windows:
The R documentation is not really clear on this point (or maybe it's just me), but on Windows the suggested approach seems to be shell(), which is less raw than system and system2 and seems to work better with redirection operators (like < or >).
shell ('python C:\\some\\path\\script < C:\\some\\input.file > C:\\some\\output.file')
What this command does:
Calls python.
Tells python to execute the script C:\some\path\script. Here we need to escape each '\' with another '\'.
Passes the input to the script using the '<' operator and input.file.
Redirects the output (using '>') to the output file.
Scratching my head... this curl command will work fine from the command line when I copy it from here and paste it in my Windows 7 command line, but I can't get it to execute in my Python 2.7.9 script. Says the system cannot find the specified file. Popen using 'ping' or something like that works just fine, so I'm sure this is a goober typo that I'm just not seeing. I would appreciate a separate set of eyes and any comments as to what is wrong.
proc = subprocess.Popen("curl --ntlm -u : --upload-file c:\\temp\\test.xlsx http://site.domain.com/sites/site/SiteDirectory/folder/test.xlsx")
Have a look at the following two paragraphs of the subprocess.Popen documentation if you haven't already:
args should be a sequence of program arguments or else a single string. By default, the program to execute is the first item in args if args is a sequence. If args is a string, the interpretation is platform-dependent and described below. See the shell and executable arguments for additional differences from the default behavior. Unless otherwise stated, it is recommended to pass args as a sequence.
On Unix, if args is a string, the string is interpreted as the name or path of the program to execute. However, this can only be done if not passing arguments to the program. [emphasis mine]
Instead you should pass in a list in which each argument to the program (including the executable name itself) is given as a separate item in the list. This is generally going to be safer in a cross-platform context anyways.
Update: I see now that you're using Windows in which case the advice on UNIX doesn't apply. On Windows though things are even more hairy. The best advice remains to use a list :)
Update 2: Another possible issue (and in fact the OP's issue as reported in the comments on this answer) is that because the full path to the curl executable was not given, it may not be found if the Python interpreter is running in an environment with a different PATH environment variable.
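Both points can be sketched together. Note the assumptions: this uses Python 3 (shutil.which does not exist in Python 2.7), and the helper names build_curl_argv and run_upload are invented for illustration.

```python
import shutil
import subprocess


def build_curl_argv(local_path, url, curl="curl"):
    """Return the curl command as a list, one item per argument."""
    return [curl, "--ntlm", "-u", ":", "--upload-file", local_path, url]


def run_upload(local_path, url):
    # shutil.which searches the PATH seen by this Python process, which may
    # differ from the PATH of an interactive command prompt.
    curl = shutil.which("curl")
    if curl is None:
        raise FileNotFoundError("curl not found on PATH; pass its full path instead")
    return subprocess.call(build_curl_argv(local_path, url, curl))
```

Because each argument is its own list item, no quoting or escaping of the file path is needed beyond normal Python string syntax.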
I am a bit confused as to how to get this done.
What I need to do is call an external command, from within a Python script, that takes as input several arguments, and a file name.
Let's call the executable that I am calling "prog", the input file "file", so the command line (in Bash terminal) looks like this:
$ prog --{arg1} {arg2} < {file}
In the above {arg1} is a string, and {arg2} is an integer.
If I use the following:
#!/usr/bin/python
import subprocess as sbp
sbp.call(["prog","--{arg1}","{arg2}","<","{file}"])
The result is an error output from "prog", where it claims that the input is missing {arg2}
The following produces an interesting error:
#!/usr/bin/python
import subprocess as sbp
sbp.call(["prog","--{arg1} {arg2} < {file}"])
All the spaces seem to have been removed from the second string, and an equal sign appended at the very end:
command not found --{arg1}{arg2}<{file}=
None of this behavior seems to make any sense to me, and there isn't much that one can go by from the Python man pages found online. Please note that replacing sbp.call with sbp.Popen does not fix the problem.
The issue is that < {file} isn’t actually an argument to the program, but is syntax for the shell to set up redirection. You can tell Python to use the shell, or you can set up the redirection yourself.
from subprocess import check_call
# have the shell interpret the redirection
check_call('wc -l < /etc/hosts', shell=True)
# set up the redirection in Python: the open file becomes the child's stdin
with open('/etc/hosts', 'r') as f:
    check_call(['wc', '-l'], stdin=f)
The advantage of the first method is that it’s faster and easier to type. There are a lot of disadvantages, though: it’s potentially slower since you’re launching a shell; it’s potentially non-portable because it depends on the operating system shell’s syntax; and it can easily break when there are spaces or other special characters in filenames.
So the second method is preferred.
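Applied to the shape of the original command (arguments plus a file on stdin), the second method might look like the following Python 3 sketch. A child Python process stands in for "prog", and the argument values --name and 42 are placeholders, not anything from the question.

```python
import subprocess
import sys
import tempfile

# Stand-in for "prog": counts the lines on stdin and echoes its arguments.
child_code = "import sys; print(len(sys.stdin.readlines()), sys.argv[1:])"

with tempfile.TemporaryFile() as f:
    f.write(b"one\ntwo\nthree\n")
    f.seek(0)  # rewind so the child reads from the start of the file
    out = subprocess.check_output(
        [sys.executable, "-c", child_code, "--name", "42"],
        stdin=f,  # plays the role of "< file" in the shell
    )

print(out.decode().strip())
```

Every argument, including the integer, is passed as a separate string, and the "<" never appears in the list at all.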
I've seen similar questions (e.g. Running a command in a new Mac OS X Terminal window) but I need to confirm this command and its expected behavior on a Mac (which I don't have). Could anyone run the following in Python 3 on a Mac?
import subprocess, os
def runcom(bashCommand):
    sp = subprocess.Popen(['osascript'], stdin=subprocess.PIPE, stderr=subprocess.PIPE)
    sp.communicate('''tell application "Terminal"\nactivate\ndo script with command "{0} $EXIT"\nend tell'''.format(bashCommand))
runcom('''echo \\"This is a test\\n\\nThis should come two lines later; press any key\\";read throwaway''')
runcom('''echo \\"This is a test\\"\n\necho \\"This should come one line later; press any key\\";read throwaway''')
runcom('''echo \\"This is testing whether I can have you enter your sudo pw on separate terminal\\";sudo ls;\necho \\"You should see your current directory; press any key\\";read throwaway''')
Firstly, and most basically, is the "spawn new terminal and execute" command correct? (For reference, this version of the runcom function came from this answer below, and is much cleaner than my original.)
As for the actual tests: the first one tests that internal double escaped \\n characters really work. The second tests that we can put (unescaped) newlines into the "script" and still have it work just like semicolon. Finally, the last one tests whether you can call a sudo process in a separate terminal (my ultimate goal).
In all cases, the new terminal should disappear as soon as you "press any key". Please also confirm this.
If one of these doesn't work, a correction/diagnosis would be most appreciated. Also appreciated: is there a more pythonic way of spawning a terminal on a Mac and then executing (sudo, extended) bash commands in it?
Thanks!
[...] its expected behavior [...]
This is hard to answer, since those commands do what I expect them to do, which might not be what you expect them to do.
As for the actual tests: the first one tests that internal double escaped \n characters really work.
The \\n with the doubled backslash does indeed work correctly in that it causes echo to emit a newline character. However, no double quotes are emitted by echo.
The second tests that we can put (unescaped) newlines into the "script" and still have it work just like semicolon.
That works also.
Finally, the last one tests whether you can call a sudo process in a separate terminal (my ultimate goal).
There is no reason why this should not work also, and indeed it does.
In all cases, the new terminal should disappear as soon as you "press any key". Please also confirm this.
That will not work, for several reasons:
read in bash will by default read a whole line, not just one character
after the script you supply is executed, there is no reason for the shell within the terminal to exit
even if the shell would exit, the user can configure Terminal.app not to close a window after the shell exits (this is even the default setting)
Other problems:
the script you supply to osascript will appear in the terminal window before it is executed. In the examples above, the user will see every "This is a test [...]" twice.
I cannot figure out what $EXIT is supposed to do
The ls command will show the user "the current directory" only in the sense that the current working directory in a new terminal window will always be the user's home directory
throwaway will not be available after the script bashCommand exits
Finally, this script will not work at all under Python 3, because it crashes with a TypeError: communicate() takes a byte string as argument, not a string.
Also appreciated: is there a more pythonic way of spawning a terminal on Mac [...]
You should look into PyObjC! It's not necessarily more pythonic, but at least you would eliminate some layers of indirection.
I don't have Python 3, but I edited your runcom function a little and it should work:
def runcom(bashCommand):
    sp = subprocess.Popen(['osascript'], stdin=subprocess.PIPE, stderr=subprocess.PIPE)
    sp.communicate('''tell application "Terminal"\nactivate\ndo script with command "{0} $EXIT"\nend tell'''.format(bashCommand))
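For Python 3, the TypeError mentioned in the other answer can probably be avoided by encoding the script to bytes before handing it to communicate(). The following is an untested sketch (no Mac was available to check it); it keeps the AppleScript text from the question unchanged, and build_applescript is a hypothetical helper name.

```python
import subprocess


def build_applescript(bash_command):
    """Build the AppleScript text from the question and encode it for stdin."""
    script = (
        'tell application "Terminal"\n'
        'activate\n'
        'do script with command "{0} $EXIT"\n'
        'end tell'
    ).format(bash_command)
    return script.encode("utf-8")


def runcom(bash_command):
    sp = subprocess.Popen(["osascript"], stdin=subprocess.PIPE, stderr=subprocess.PIPE)
    # With binary pipes (the default), communicate() expects bytes in Python 3.
    return sp.communicate(build_applescript(bash_command))
```

Separating the string building from the process call also makes it possible to inspect the generated script on a machine without osascript.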