Short context: GitHub's gh client adds ANSI colors to its JSON output, which is nice but makes post-processing hard. If you pipe the output, gh instead produces slightly different, plainly encoded output. For example:
gh api organizations | cat
prints something like
[{"login":"errfree","id":44,"node_id":"MDEyOk9yZ2FuaXphdGlvbjQ0","url":"https://api.github.com/orgs/errfree","repos_url":"https://api.github.com/orgs/errfree/repos","events_url":"https://api.github.com/orgs/errfree/events","hooks_url":"https://api.github.com/orgs/errfree/hooks","issues_url":"https://api.github.com/orgs/errfree/issues","members_url":"https://api.github.com/orgs/errfree/members{/member}","public_members_url":"https://api.github.com/orgs/errfree/public_members{/member}","avatar_url":"https://avatars.githubusercontent.com/u/44?v=4","description":null},
However, if I do the same with subprocess via
p = subprocess.run('gh api organizations | cat', shell=True, capture_output=True)
print(p.stdout.decode())
the result is completely different:
Why? And is there an option to force the command (in my case gh) to behave as it would when piped?
Related
I am trying to pass the output of one os.system() to another os.system().
However, I am getting no output.
user = os.system("whoami")
print (user)
box=os.system("docker ps -a --format \"{{.Names}}\" | grep user")
print (box)
Output:
xvision
256
There are a couple issues with the posted code:
The user in the grep command is a string, not the variable I believe you are intending to use.
The return value from os.system is simply the exit status of the command; not the values which you are looking to retrieve.
If I'm not mistaken, docker will require elevated permissions to execute a ps command. Perhaps your sudoers setup differs from mine and allows the command without them - but something to be aware of.
Additionally, the first system call to get the username is unneeded, as the shell substitution grep $( whoami ) can be used instead. However, if you are expecting a different username on the docker system, you can use an f-string (the doubled braces escape the literal {{.Names}}):
f'docker ps -a --format {{{{.Names}}}} | grep {user}'
Instead, the subprocess library should be used here as you can retrieve the values from the subprocess call.
For example:
import subprocess
# Note: sudo might be optional in your case, depending on setup.
rtn = subprocess.Popen('sudo docker ps -a --format {{.Names}} | grep $( whoami )',
                       shell=True,
                       stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE).communicate()
>>> rtn
(b'username', None) # <-- tuple of (stdout, stderr)
Getting the username:
value = rtn[0].decode()
>>> value
'username'
A note on the shell call:
Some might argue that 'shell should be avoided'. However, in this case I'm choosing to use it for the following reasons:
To make the command string a bit easier to read for the OP.
To (more easily) facilitate the pipe into grep.
Without the pipe into grep, the command could be split into a list of arguments; thus alleviating the need for the shell call.
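For completeness, a minimal sketch of that shell-free variant: run docker as an argument list and do the grep step in Python. This is only a sketch; it assumes the local username is the pattern you want, and sudo is omitted (as noted above, it may or may not be needed):
import getpass
import subprocess

# Each argument is its own list element, so no shell quoting is needed.
result = subprocess.run(['docker', 'ps', '-a', '--format', '{{.Names}}'],
                        capture_output=True, text=True)

# Filter in Python instead of piping into grep.
user = getpass.getuser()
matches = [name for name in result.stdout.splitlines() if user in name]
print(matches)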
How do I execute the following shell command using the Python subprocess module?
echo "input data" | awk -f script.awk | sort > outfile.txt
The input data will come from a string, so I don't actually need echo. I've got this far; can anyone explain how I get it to pipe through sort too?
p_awk = subprocess.Popen(["awk", "-f", "script.awk"],
                         stdin=subprocess.PIPE,
                         stdout=file("outfile.txt", "w"))
p_awk.communicate("input data")
UPDATE: Note that while the accepted answer below doesn't actually answer the question as asked, I believe S.Lott is right and it's better to avoid having to solve that problem in the first place!
You'd be a little happier with the following.
import subprocess
awk_sort = subprocess.Popen("awk -f script.awk | sort > outfile.txt",
                            stdin=subprocess.PIPE, shell=True)
awk_sort.communicate(b"input data\n")
Delegate part of the work to the shell. Let it connect two processes with a pipeline.
You'd be a lot happier rewriting 'script.awk' into Python, eliminating awk and the pipeline.
Edit. Some of the reasons for suggesting that awk isn't helping.
[There are too many reasons to respond via comments.]
Awk is adding a step of no significant value. There's nothing unique about awk's processing that Python doesn't handle.
The pipelining from awk to sort, for large sets of data, may improve elapsed processing time. For short sets of data, it has no significant benefit. A quick measurement of awk >file ; sort file versus awk | sort will reveal whether concurrency helps. With sort, it rarely helps because sort is not a once-through filter.
The simplicity of "Python to sort" processing (instead of "Python to awk to sort") prevents the exact kind of questions being asked here.
Python -- while wordier than awk -- is also explicit where awk has certain implicit rules that are opaque to newbies, and confusing to non-specialists.
Awk (like the shell script itself) adds Yet Another Programming language. If all of this can be done in one language (Python), eliminating the shell and the awk programming eliminates two programming languages, allowing someone to focus on the value-producing parts of the task.
Bottom line: awk can't add significant value. In this case, awk is a net cost; it added enough complexity that it was necessary to ask this question. Removing awk will be a net gain.
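To make that concrete, here is a minimal sketch of the "Python to sort" shape, assuming (purely for illustration) that the awk script printed the second field of each line:
import subprocess

data = "input data\nmore input\n"

# The awk work, done in Python: keep the second whitespace-separated field.
fields = [line.split()[1] for line in data.splitlines() if len(line.split()) > 1]

with open("outfile.txt", "wb") as out:
    subprocess.run(["sort"], input="\n".join(fields).encode() + b"\n",
                   stdout=out, check=True)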
Sidebar: why building a pipeline (a | b) is so hard.
When the shell is confronted with a | b it has to do the following.
Fork a child process of the original shell. This will eventually become b.
Build an OS pipe: not a Python subprocess.PIPE, but a call to os.pipe(), which returns two new file descriptors connected via a common buffer. At this point the process has stdin, stdout, stderr from its parent, plus a file that will be "a's stdout" and "b's stdin".
Fork a child. The child replaces its stdout with the new a's stdout. Exec the a process.
The b child replaces its stdin with the new b's stdin. Exec the b process.
The b child waits for a to complete.
The parent is waiting for b to complete.
I think that the above can be used recursively to spawn a | b | c, but you have to implicitly parenthesize long pipelines, treating them as if they're a | (b | c).
Since Python has os.pipe(), os.exec() and os.fork(), and you can replace sys.stdin and sys.stdout, there's a way to do the above in pure Python. Indeed, you may be able to work out some shortcuts using os.pipe() and subprocess.Popen.
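As a rough sketch of those shortcuts (assuming a and b are real executables on PATH):
import os
import subprocess

rd, wr = os.pipe()                            # two connected descriptors
proc_a = subprocess.Popen(["a"], stdout=wr)   # a writes into the pipe
proc_b = subprocess.Popen(["b"], stdin=rd)    # b reads from the pipe
os.close(wr)   # drop the parent's copies so b sees EOF when a exits
os.close(rd)
proc_a.wait()
proc_b.wait()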
However, it's easier to delegate that operation to the shell.
import subprocess

some_string = b'input_data'

sort_out = open('outfile.txt', 'wb', 0)
sort_proc = subprocess.Popen('sort', stdin=subprocess.PIPE, stdout=sort_out)
subprocess.Popen(['awk', '-f', 'script.awk'], stdout=sort_proc.stdin,
                 stdin=subprocess.PIPE).communicate(some_string)
sort_proc.stdin.close()  # close the parent's copy of the pipe so sort sees EOF
sort_proc.wait()
To emulate a shell pipeline:
from subprocess import check_call
check_call('echo "input data" | a | b > outfile.txt', shell=True)
To do the same without invoking the shell (see 17.1.4.2. Replacing shell pipeline):
#!/usr/bin/env python
from subprocess import Popen, PIPE

a = Popen(["a"], stdin=PIPE, stdout=PIPE)
with a.stdin:
    with a.stdout, open("outfile.txt", "wb") as outfile:
        b = Popen(["b"], stdin=a.stdout, stdout=outfile)
    a.stdin.write(b"input data")
statuses = [a.wait(), b.wait()]  # both a.stdin/stdout are closed already
plumbum provides some syntax sugar:
#!/usr/bin/env python
from plumbum.cmd import a, b # magic
(a << "input data" | b > "outfile.txt")()
The analog of:
#!/bin/sh
echo "input data" | awk -f script.awk | sort > outfile.txt
is:
#!/usr/bin/env python
from plumbum.cmd import awk, sort
(awk["-f", "script.awk"] << "input data" | sort > "outfile.txt")()
The accepted answer is sidestepping the actual question. Here is a snippet that chains the output of multiple processes.
Note that it also prints the (somewhat) equivalent shell command so you can run it and make sure the output is correct.
#!/usr/bin/env python3
from subprocess import Popen, PIPE

# Shell equivalent: dd if=/dev/zero bs=1M count=100 | tee | wc -c
cmd1 = ['dd', 'if=/dev/zero', 'bs=1M', 'count=100']
cmd2 = ['tee']
cmd3 = ['wc', '-c']

print(f"Shell style : {' '.join(cmd1)} | {' '.join(cmd2)} | {' '.join(cmd3)}")

p1 = Popen(cmd1, stdout=PIPE, stderr=PIPE)  # stderr=PIPE optional, dd is chatty
p2 = Popen(cmd2, stdin=p1.stdout, stdout=PIPE)
p3 = Popen(cmd3, stdin=p2.stdout, stdout=PIPE)
p1.stdout.close()  # drop the parent's copies so SIGPIPE can propagate
p2.stdout.close()

print("Output from last process : " + (p3.communicate()[0]).decode())

# Theoretically p1 and p2 may still be running; this ensures we collect
# their return codes.
p1.wait()
p2.wait()
print("p1 return: ", p1.returncode)
print("p2 return: ", p2.returncode)
print("p3 return: ", p3.returncode)
http://www.python.org/doc/2.5.2/lib/node535.html covered this pretty well. Is there some part of this you didn't understand?
Your program would be pretty similar, but the second Popen would have stdout= to a file, and you wouldn't need the output of its .communicate().
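A minimal sketch of that shape, reusing the question's script.awk:
from subprocess import Popen, PIPE

with open("outfile.txt", "wb") as outfile:
    p_awk = Popen(["awk", "-f", "script.awk"], stdin=PIPE, stdout=PIPE)
    p_sort = Popen(["sort"], stdin=p_awk.stdout, stdout=outfile)
    p_awk.stdout.close()              # drop the parent's copy of the pipe
    p_awk.stdin.write(b"input data\n")
    p_awk.stdin.close()
    p_awk.wait()
    p_sort.wait()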
Inspired by @Cristian's answer. I met the same issue, but with a different command, so I'm posting my tested example, which I believe could be helpful:
import subprocess

grep_proc = subprocess.Popen(["grep", "rabbitmq"],
                             stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE)
subprocess.Popen(["ps", "aux"], stdout=grep_proc.stdin)
out, err = grep_proc.communicate()
This is tested.
What has been done:
Started grep with its stdin coming from a pipe; it is effectively lazy, doing its work once ps fills that pipe with its stdout.
Called the primary command ps with stdout directed to the pipe used by the grep command.
Called communicate() on grep to collect its stdout from the pipe.
I like this way because it is the natural pipe concept, gently wrapped with subprocess interfaces.
The previous answers missed an important point. Replacing the shell pipeline is basically correct, as pointed out by geocar. It is almost sufficient to run communicate on the last element of the pipe.
The remaining problem is passing the input data to the pipeline. With multiple subprocesses, a simple communicate(input_data) on the last element doesn't work - it hangs forever. You need to create a pipeline and a child manually, like this:
import os
import subprocess

input_data = b"""\
input data
more input
""" * 10

rd, wr = os.pipe()
if os.fork() != 0:  # parent
    os.close(wr)
else:  # child
    os.close(rd)
    os.write(wr, input_data)
    os.close(wr)
    os._exit(0)  # exit the child without running the parent's cleanup

p_awk = subprocess.Popen(["awk", "{ print $2; }"],
                         stdin=rd,
                         stdout=subprocess.PIPE)
p_sort = subprocess.Popen(["sort"],
                          stdin=p_awk.stdout,
                          stdout=subprocess.PIPE)
p_awk.stdout.close()
out, err = p_sort.communicate()
print(out.rstrip())
Now the child provides the input through the pipe, and the parent calls communicate(), which works as expected. With this approach, you can create arbitrarily long pipelines without resorting to "delegating part of the work to the shell". Unfortunately the subprocess documentation doesn't mention this.
There are ways to achieve the same effect without pipes:
from tempfile import TemporaryFile

tf = TemporaryFile()
tf.write(input_data)
tf.seek(0, 0)
Now use stdin=tf for p_awk. It's a matter of taste what you prefer.
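That is, a continuation sketch reusing tf and subprocess from above:
p_awk = subprocess.Popen(["awk", "{ print $2; }"],
                         stdin=tf,               # read input from the file
                         stdout=subprocess.PIPE)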
The above is still not 100% equivalent to bash pipelines because the signal handling is different. You can see this if you add another pipe element that truncates the output of sort, e.g. head -n 10. With the code above, sort will print a "Broken pipe" error message to stderr. You won't see this message when you run the same pipeline in the shell. (That's the only difference though, the result in stdout is the same.) The reason seems to be that Python's Popen sets SIG_IGN for SIGPIPE, whereas the shell leaves it at SIG_DFL, and sort's signal handling is different in these two cases.
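For reference, a sketch of the Python 2-era workaround: restore SIG_DFL in the child with preexec_fn, so it dies silently on a broken pipe as it would under a shell. (Python 3.2+ already resets SIGPIPE to the default in children via restore_signals=True, so this is mostly of historical interest.)
import signal
import subprocess

def restore_sigpipe():
    # Undo the inherited SIG_IGN so a broken pipe kills the child quietly.
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)

p_sort = subprocess.Popen(["sort"],
                          stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE,
                          preexec_fn=restore_sigpipe)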
EDIT: pipes is available on Windows but, crucially, doesn't appear to actually work on Windows. See comments below.
The Python standard library now includes the pipes module for handling this:
https://docs.python.org/2/library/pipes.html, https://docs.python.org/3.4/library/pipes.html
I'm not sure how long this module has been around, but this approach appears to be vastly simpler than mucking about with subprocess.
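For example, a minimal sketch of the question's pipeline using pipes; note that this module only generates POSIX shell pipelines, and it was deprecated in Python 3.11 and removed in 3.13:
import pipes

t = pipes.Template()
t.append('awk -f script.awk', '--')  # '--' : reads stdin, writes stdout
t.append('sort', '--')

f = t.open('outfile.txt', 'w')       # returns a writable file-like object
f.write('input data\n')
f.close()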
For me, the approach below is the cleanest and easiest to read:
from subprocess import Popen, PIPE

def string_to_2_procs_to_file(input_s, first_cmd, second_cmd, output_filename):
    with open(output_filename, 'wb') as out_f:
        p2 = Popen(second_cmd, stdin=PIPE, stdout=out_f)
        p1 = Popen(first_cmd, stdout=p2.stdin, stdin=PIPE)
        p1.communicate(input=input_s.encode())
        p1.wait()
        p2.stdin.close()
        p2.wait()
which can be called like so:
string_to_2_procs_to_file('input data', ['awk', '-f', 'script.awk'], ['sort'], 'output.txt')
I am trying to run the following bash script in Python and store the readlist output. The readlist, which I want stored as a Python list, is a list of all files in the current directory ending in *concat_001.fastq.
I know it may be easier to do this in Python, i.e.:
import os
readlist = [f for f in os.listdir(os.getcwd()) if f.endswith("concat_001.fastq")]
readlist = sorted(readlist)
However, this is problematic, as I need Python to sort the list in EXACTLY the same way as bash, and I was finding that bash and Python sort certain things in different orders (e.g. Python and bash deal with capitalised and uncapitalised names differently). But when I tried
readlist = np.asarray(sorted(flist, key=str.lower))
I still found that two files starting with ML_ and M_ were sorted in a different order by bash and Python. Hence I am trying to run my exact bash script through Python, and then use the bash-sorted list in my subsequent Python code.
input_suffix="concat_001.fastq"
ender=`echo $input_suffix | sed "s/concat_001.fastq/\*concat_001.fastq/g" `
readlist="$(echo $ender)"
I have tried
proc = subprocess.call(command1, shell=True, stdout=subprocess.PIPE)
proc = subprocess.call(command2, shell=True, stdout=subprocess.PIPE)
proc = subprocess.Popen(command3, shell=True, stdout=subprocess.PIPE)
But I just get: <subprocess.Popen object at 0x7f31cfcd9190>
Also - I don't understand the difference between subprocess.call and subprocess.Popen. I have tried both.
Thanks,
Ruth
So your question is a little confusing and does not exactly explain what you want. However, I'll try to give some suggestions to help you update it, or, failing that, answer it.
I will assume the following: your Python script passes 'input_suffix' on the command line, and you want your Python program to receive the contents of 'readlist' when the external script finishes.
To make our lives simpler (and to allow the commands to grow more complicated later), I would put your commands in the following bash script:
script.sh
#!/bin/bash
input_suffix=$1
ender=`echo $input_suffix | sed "s/concat_001.fastq/\*concat_001.fastq/g"`
readlist="$(echo $ender)"
echo $readlist
You would execute this as script.sh "concat_001.fastq", where $1 takes in the first argument passed on the command line.
To use python to execute external scripts, as you quite rightly found, you can use subprocess (or as noted by another response, os.system - although subprocess is recommended).
The docs tell you that subprocess.call:
"Wait for command to complete, then return the returncode attribute."
and that
"For more advanced use cases when these do not meet your needs, use the underlying Popen interface."
Given you want to pipe the output from the bash script into your Python script, let's use Popen as suggested by the docs. As in the other Stack Overflow answer I posted, it could look like the following:
import subprocess

# Execute our script and capture its output
# ('./script.sh' assumes the script is executable in the current directory)
process = subprocess.Popen(['./script.sh', 'concat_001.fastq'],
                           stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE)

# Obtain the standard out and standard error
stdout, stderr = process.communicate()
and then:
>>> print(stdout.decode())
*concat_001.fastq
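To get the Python list the question asks for, split the captured output (this assumes, as here, that the filenames contain no embedded whitespace):
readlist = stdout.decode().split()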
I would like to process a file line by line. However I need to sort it first, which I normally do by piping:
sort --key=1,2 data | ./script.py
What's the best way to call sort from within Python? Searching online, I see that subprocess or the sh module might be possibilities. I don't want to read the file into memory and sort in Python, as the data is very big.
It's easy. Use subprocess.Popen to run sort and read its stdout to get your data.
import subprocess

myfile = 'data'
sort = subprocess.Popen(['sort', '--key=1,2', myfile],
                        stdout=subprocess.PIPE)
for line in sort.stdout:
    your_code_here
sort.wait()
assert sort.returncode == 0, 'sort failed'
I think this page will answer your question
The answer I prefer, from #Eli Courtwright is (all quoted verbatim):
Here's a summary of the ways to call external programs and the advantages and disadvantages of each:
os.system("some_command with args") passes the command and arguments to your system's shell. This is nice because you can actually run multiple commands at once in this manner and set up pipes and input/output redirection. For example,
os.system("some_command < input_file | another_command > output_file")
However, while this is convenient, you have to manually handle the escaping of shell characters such as spaces, etc. On the other hand, this also lets you run commands which are simply shell commands and not actually external programs.
http://docs.python.org/lib/os-process.html
stream = os.popen("some_command with args") will do the same thing as os.system except that it gives you a file-like object that you can use to access standard input/output for that process. There are 3 other variants of popen that all handle the i/o slightly differently. If you pass everything as a string, then your command is passed to the shell; if you pass them as a list then you don't need to worry about escaping anything.
http://docs.python.org/lib/os-newstreams.html
The Popen class of the subprocess module. This is intended as a replacement for os.popen but has the downside of being slightly more complicated by virtue of being so comprehensive. For example, you'd say
print Popen("echo Hello World", stdout=PIPE, shell=True).stdout.read()
instead of
print os.popen("echo Hello World").read()
but it is nice to have all of the options there in one unified class instead of 4 different popen functions.
http://docs.python.org/lib/node528.html
The call function from the subprocess module. This is basically just like the Popen class and takes all of the same arguments, but it simply waits until the command completes and gives you the return code. For example:
return_code = call("echo Hello World", shell=True)
http://docs.python.org/lib/node529.html
The os module also has all of the fork/exec/spawn functions that you'd have in a C program, but I don't recommend using them directly.
The subprocess module should probably be what you use.
I believe sort will read all data into memory, so I'm not sure you will win anything, but you can use shell=True in subprocess and use a pipeline:
>>> subprocess.check_output("ls", shell = True)
'1\na\na.cpp\nA.java\na.php\nerase_no_module.cpp\nerase_no_module.cpp~\nWeatherSTADFork.cpp\n'
>>> subprocess.check_output("ls | grep j", shell = True)
'A.java\n'
Warning
Invoking the system shell with shell=True can be a security hazard if combined with untrusted input. See the warning under Frequently Used Arguments for details.
I wrote a simple python script for my application and predefined some fast commands like make etc.
I've written a function for running system commands (linux):
import subprocess

def runCommand(commandLine):
    print('############## Running command: ' + commandLine)
    p = subprocess.Popen(commandLine, shell=True, stdout=subprocess.PIPE)
    print(p.stdout.read().decode('utf-8'))
Everything works well except a few things:
I'm using cmake and its output is colored. Any chance of keeping the colors in the output?
I can only look at the output after the process has finished. For example, make runs for a long time, but I can see the output only after the full compilation. How can I read it asynchronously?
I'm not sure about colors, but here's how to poll the subprocess's stdout one line at a time:
import subprocess

proc = subprocess.Popen('cmake', shell=True, stdout=subprocess.PIPE)
while proc.poll() is None:
    output = proc.stdout.readline()
    print(output)
Don't forget to read from stderr as well, as I'm sure cmake will emit information there.
You're not getting color because cmake detects whether its stdout is a terminal; if it's not, it doesn't color its own output. Some programs give you an option to force colored output. Unfortunately cmake does not, so you're out of luck there - unless you want to patch cmake yourself.
Lots of programs do this, for example grep:
# grep test test.txt
test
^
|
|------- this word is red
Now pipe it to cat:
# grep test test.txt | cat
test
^
|
|------- no longer red
grep has the option --color=always to force color:
# grep test test.txt --color=always | cat
test
^
|
|------- red again
Regarding how to get the output of your process before it finishes, it should be possible to do that replacing:
p.stdout.read
with:
for line in p.stdout:
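For example, a minimal sketch of that change, streaming a long-running command (make stands in here for any chatty build step):
import subprocess

p = subprocess.Popen('make', shell=True, stdout=subprocess.PIPE)
for line in p.stdout:                  # yields lines as they are produced
    print(line.decode('utf-8'), end='')
p.wait()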
Regarding how to save colored output, there isn't anything special about that. For example, if the raw output is saved to a file, then the next time cat <logfile> is executed, the console will interpret the escape sequences and display the colors as expected.
For CMake specifically, you can force color output using the option CLICOLOR_FORCE=1:
import shlex
import subprocess

command = 'make CLICOLOR_FORCE=1'
args = shlex.split(command)
proc = subprocess.Popen(args, stdout=subprocess.PIPE)
Then print as in the accepted answer:
while proc.poll() is None:
    output = proc.stdout.readline()
    print(output.decode('utf-8'))
If you decode to utf-8 before printing, you should see colored output.
If you print the result as a byte literal (i.e. without decoding), you should see the escape sequences for the various colors.
Consider trying the option universal_newlines=True:
proc = subprocess.Popen(args, stdout=subprocess.PIPE, universal_newlines=True)
This causes the call to proc.stdout.readline() to return a string instead of a byte literal so you can/must skip the call to decode().
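A sketch combining the two suggestions; text=True is the modern alias for universal_newlines=True (Python 3.7+):
import subprocess

proc = subprocess.Popen(['make', 'CLICOLOR_FORCE=1'],
                        stdout=subprocess.PIPE, text=True)
while proc.poll() is None:
    line = proc.stdout.readline()   # already a str, no decode() needed
    print(line, end='')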
To do async output do something like: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440554
Not sure if you can capture the coloured output. If you can get the escaped colour codes you may be able to.
Worth noting here is the use of the script command as a pseudo-terminal, so the child is detected as a tty instead of a redirected (piped) file descriptor; see:
bash command preserve color when piping
Works like a charm...
As per the example in the question, simply let script execute cmake:
import subprocess

# 'script -qc cmake /dev/null' is util-linux syntax (-q quiet, -c command);
# on BSD/macOS the command follows the file: 'script -q /dev/null cmake'.
proc = subprocess.Popen('script -qc cmake /dev/null', shell=True,
                        stdout=subprocess.PIPE)
while proc.poll() is None:
    output = proc.stdout.readline()
    print(output)
This tricks cmake into thinking it's being executed from a terminal and will produce the ANSI candy you're after.
nJoy!