I have a bash script that returns the admin email for a domain, like the following.
whois -h $(whois "stackoverflow.com" | grep 'Registrar WHOIS Server:' | cut -f2- -d:) "stackoverflow.com" | grep 'Admin Email:' | cut -f2- -d:
I want to run this in a python file. I believe I need to use a subprocess but can't seem to get it working with the pipes and flags. Any help?
Yes, you can use subprocess with pipes.
I will illustrate with an example:
import subprocess

ps = subprocess.Popen(('whois', 'stackoverflow.com'), stdout=subprocess.PIPE)
output = subprocess.check_output(('grep', 'Registrar WHOIS'), stdin=ps.stdout)
ps.wait()
You can adjust it to your needs.
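For example, here is a sketch of extending the snippet above with the asker's cut step to capture the registrar host (continuing the snippet above; not tested against live whois output):
grep = subprocess.Popen(('grep', 'Registrar WHOIS Server:'), stdin=ps.stdout, stdout=subprocess.PIPE)
ps.stdout.close()  # let whois receive SIGPIPE if grep exits early
server = subprocess.check_output(('cut', '-f2-', '-d:'), stdin=grep.stdout).decode().strip()
grep.stdout.close()
ps.wait()
grep.wait()
print(server)  # the registrar's whois host, ready for the second whois call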
The easiest solution is to write the commands into a script file and execute that file.
If you don't want that, you can execute any command with
bash -c 'command'
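For instance, a minimal sketch that hands the asker's original pipeline to the shell via subprocess.run (capture_output/text need Python 3.7+; not verified against live whois output):
import subprocess

cmd = ("whois -h $(whois stackoverflow.com | grep 'Registrar WHOIS Server:' | cut -f2- -d:) "
       "stackoverflow.com | grep 'Admin Email:' | cut -f2- -d:")

result = subprocess.run(["bash", "-c", cmd], capture_output=True, text=True)
print(result.stdout.strip())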
This is covered in the Replacing Older Functions with the subprocess Module section of the docs.
The example there is this bash pipeline:
output=`dmesg | grep hda`
rewritten for subprocess as:
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
Note that in many cases, you don't need to handle all of the same edge cases that the shell handles in exactly the same way. But if you don't know what you need, it's better to be fully general like this.
Your $() does the same thing as the backticks in that example, your pipes are the same as the example's pipes, and your arguments aren't anything special.
So:
whois = Popen(['whois', 'stackoverflow.com'], stdout=PIPE)
grep = Popen(['grep', 'Registrar WHOIS Server:'], stdin=whois.stdout, stdout=PIPE)
whois.stdout.close()
cut = Popen(['cut', '-f2-', '-d:'], stdin=grep.stdout, stdout=PIPE)
grep.stdout.close()
inneroutput, _ = cut.communicate()
whois = Popen(['whois', '-h', inneroutput.strip(), 'stackoverflow.com'], stdout=PIPE)  # strip the leading space and trailing newline left by cut
grep = Popen(['grep', 'Admin Email:'], stdin=whois.stdout, stdout=PIPE)
whois.stdout.close()
cut = Popen(['cut', '-f2-', '-d:'], stdin=grep.stdout)
grep.stdout.close()
cut.communicate()
If this seems like a mess, consider that:
Your original shell command is a mess.
If you actually know exactly what you're expecting the pipeline to do, you can skip a lot of it.
All of the stuff you're doing here could just be done directly in Python without the need for this whole mess.
You may be happier using a third-party library like plumbum.
How could you write the whole thing in Python without all this piping? For example, instead of using grep, you could use Python's re module. Or, since you're not even using a regular expression at all, just a simple in check. And likewise for cut:
import subprocess
from subprocess import PIPE

whois = subprocess.run(['whois', 'stackoverflow.com'],
                       check=True, stdout=PIPE, encoding='utf-8').stdout
for line in whois.splitlines():
    if 'Registrar WHOIS Server:' in line:
        registrar = line.split(':', 1)[1].strip()
        break

whois = subprocess.run(['whois', '-h', registrar, 'stackoverflow.com'],
                       check=True, stdout=PIPE, encoding='utf-8').stdout
for line in whois.splitlines():
    if 'Admin Email:' in line:
        admin = line.split(':', 1)[1].strip()
        break
Related
import os
val = os.popen("ls | grep a").read()
Let's say I want to check whether a given directory has any file with a in its name. If the directory doesn't have such a file, val is empty; otherwise val should be assigned the output of executing that command.
Are there any cases where val could still hold something even with empty output? In this case there are no files with a, but could val still have some value? Are there any cases where the output looks empty when we execute in a terminal, but the value still holds something (e.g. whitespace)?
Is it an effective approach to use in general? (I am not really trying to check for files with certain names. This is actually just an example.)
Are there any better ways of doing such a thing?
I'd recommend using Python 3's subprocess, where you can use the check parameter. Then your command will throw an error if it does not succeed:
import subprocess
proc = subprocess.run(["ls | grep a"], shell=True, check=True, stdout=subprocess.PIPE)
# proc = subprocess.run(["ls | grep a"], shell=True, check=True, capture_output=True) # starting python3.7
print(proc.stdout)
But as @JohnKugelman suggested, in this case you'd be better off using glob:
import glob
files_with_a = glob.glob("*a*")
If you really do want to run an OS command from Python, I'd recommend using the subprocess package:
import subprocess
command = "ls | grep a"
process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
process.wait()
print(process.returncode)
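Note that grep exits with status 1 when it finds nothing, so checking the return code is usually a more reliable test than checking whether the output string is empty. A small sketch:
import subprocess

proc = subprocess.run("ls | grep a", shell=True, stdout=subprocess.PIPE, text=True)
if proc.returncode == 0:   # grep found at least one match
    print("matches:", proc.stdout.splitlines())
else:                      # grep exits 1 on no match (2 on error)
    print("no files with 'a' in the name")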
How do I execute the following shell command using the Python subprocess module?
echo "input data" | awk -f script.awk | sort > outfile.txt
The input data will come from a string, so I don't actually need echo. I've got this far, can anyone explain how I get it to pipe through sort too?
p_awk = subprocess.Popen(["awk","-f","script.awk"],
stdin=subprocess.PIPE,
stdout=file("outfile.txt", "w"))
p_awk.communicate( "input data" )
UPDATE: Note that while the accepted answer below doesn't actually answer the question as asked, I believe S.Lott is right and it's better to avoid having to solve that problem in the first place!
You'd be a little happier with the following.
import subprocess
awk_sort = subprocess.Popen( "awk -f script.awk | sort > outfile.txt",
stdin=subprocess.PIPE, shell=True )
awk_sort.communicate( b"input data\n" )
Delegate part of the work to the shell. Let it connect two processes with a pipeline.
You'd be a lot happier rewriting 'script.awk' into Python, eliminating awk and the pipeline.
Edit. Some of the reasons for suggesting that awk isn't helping.
[There are too many reasons to respond via comments.]
Awk is adding a step of no significant value. There's nothing unique about awk's processing that Python doesn't handle.
The pipelining from awk to sort, for large sets of data, may improve elapsed processing time. For short sets of data, it has no significant benefit. A quick measurement of awk >file ; sort file versus awk | sort will reveal whether concurrency helps. With sort, it rarely helps because sort is not a once-through filter.
The simplicity of "Python to sort" processing (instead of "Python to awk to sort") prevents the exact kind of questions being asked here.
Python -- while wordier than awk -- is also explicit where awk has certain implicit rules that are opaque to newbies, and confusing to non-specialists.
Awk (like the shell script itself) adds Yet Another Programming language. If all of this can be done in one language (Python), eliminating the shell and the awk programming eliminates two programming languages, allowing someone to focus on the value-producing parts of the task.
Bottom line: awk can't add significant value. In this case, awk is a net cost; it added enough complexity that it was necessary to ask this question. Removing awk will be a net gain.
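For instance, if script.awk does nothing more than pick out one column, a rough pure-Python replacement of the whole awk-plus-sort stage might look like this (the real script.awk isn't shown, so the field choice here is an assumption):
input_data = "input data\nmore input\n"

# hypothetical stand-in for `awk '{ print $2 }' | sort > outfile.txt`
fields = (line.split()[1] for line in input_data.splitlines() if len(line.split()) > 1)
with open("outfile.txt", "w") as out:
    for value in sorted(fields):
        out.write(value + "\n")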
Sidebar: Why building a pipeline (a | b) is so hard.
When the shell is confronted with a | b it has to do the following.
Fork a child process of the original shell. This will eventually become b.
Build an OS pipe (not a Python subprocess.PIPE) by calling os.pipe(), which returns two new file descriptors connected through a common buffer. At this point the process has stdin, stdout, and stderr from its parent, plus a file that will be "a's stdout" and "b's stdin".
Fork a child. The child replaces its stdout with the new a's stdout. Exec the a process.
The b child replaces its stdin with the new b's stdin, then execs the b process.
The b child waits for a to complete.
The parent is waiting for b to complete.
I think that the above can be used recursively to spawn a | b | c, but you have to implicitly parenthesize long pipelines, treating them as if they're a | (b | c).
Since Python has os.pipe(), os.exec() and os.fork(), and you can replace sys.stdin and sys.stdout, there's a way to do the above in pure Python. Indeed, you may be able to work out some shortcuts using os.pipe() and subprocess.Popen.
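Here is a rough sketch of those steps in pure Python, using os.pipe(), os.fork(), os.dup2() and os.execvp() (POSIX only, and deliberately skimping on error handling):
import os

def pipeline(a_cmd, b_cmd):
    # run a_cmd | b_cmd roughly the way a shell would
    rd, wr = os.pipe()                 # shared buffer: a's stdout -> b's stdin
    a_pid = os.fork()
    if a_pid == 0:                     # child that becomes `a`
        os.dup2(wr, 1)                 # replace stdout with the write end
        os.close(rd); os.close(wr)
        os.execvp(a_cmd[0], a_cmd)
    b_pid = os.fork()
    if b_pid == 0:                     # child that becomes `b`
        os.dup2(rd, 0)                 # replace stdin with the read end
        os.close(rd); os.close(wr)
        os.execvp(b_cmd[0], b_cmd)
    os.close(rd); os.close(wr)         # parent keeps neither end open
    os.waitpid(a_pid, 0)
    _, status = os.waitpid(b_pid, 0)
    return status

# pipeline(['dmesg'], ['grep', 'hda'])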
However, it's easier to delegate that operation to the shell.
import subprocess
some_string = b'input_data'
sort_out = open('outfile.txt', 'wb', 0)
sort_in = subprocess.Popen('sort', stdin=subprocess.PIPE, stdout=sort_out).stdin
subprocess.Popen(['awk', '-f', 'script.awk'], stdout=sort_in,
stdin=subprocess.PIPE).communicate(some_string)
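Note that, as written, the parent never closes sort's stdin, so sort may not see end-of-file when awk finishes. A slightly expanded sketch that closes it and waits explicitly:
import subprocess

some_string = b'input_data'

with open('outfile.txt', 'wb', 0) as sort_out:
    sort_proc = subprocess.Popen('sort', stdin=subprocess.PIPE, stdout=sort_out)
    awk_proc = subprocess.Popen(['awk', '-f', 'script.awk'],
                                stdout=sort_proc.stdin, stdin=subprocess.PIPE)
    awk_proc.communicate(some_string)
    sort_proc.stdin.close()   # let sort see EOF once awk is done
    sort_proc.wait()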
To emulate a shell pipeline:
from subprocess import check_call
check_call('echo "input data" | a | b > outfile.txt', shell=True)
without invoking the shell (see 17.1.4.2. Replacing shell pipeline):
#!/usr/bin/env python
from subprocess import Popen, PIPE
a = Popen(["a"], stdin=PIPE, stdout=PIPE)
with a.stdin:
with a.stdout, open("outfile.txt", "wb") as outfile:
b = Popen(["b"], stdin=a.stdout, stdout=outfile)
a.stdin.write(b"input data")
statuses = [a.wait(), b.wait()] # both a.stdin/stdout are closed already
plumbum provides some syntax sugar:
#!/usr/bin/env python
from plumbum.cmd import a, b # magic
(a << "input data" | b > "outfile.txt")()
The analog of:
#!/bin/sh
echo "input data" | awk -f script.awk | sort > outfile.txt
is:
#!/usr/bin/env python
from plumbum.cmd import awk, sort
(awk["-f", "script.awk"] << "input data" | sort > "outfile.txt")()
The accepted answer sidesteps the actual question.
Here is a snippet that chains the output of multiple processes:
Note that it also prints the (somewhat) equivalent shell command so you can run it and make sure the output is correct.
#!/usr/bin/env python3
from subprocess import Popen, PIPE
# cmd1 : dd if=/dev/zero bs=1M count=100
# cmd2 : tee
# cmd3 : wc -c
cmd1 = ['dd', 'if=/dev/zero', 'bs=1M', 'count=100']
cmd2 = ['tee']
cmd3 = ['wc', '-c']
print(f"Shell style : {' '.join(cmd1)} | {' '.join(cmd2)} | {' '.join(cmd3)}")
p1 = Popen(cmd1, stdout=PIPE, stderr=PIPE) # stderr=PIPE optional, dd is chatty
p2 = Popen(cmd2, stdin=p1.stdout, stdout=PIPE)
p3 = Popen(cmd3, stdin=p2.stdout, stdout=PIPE)
print("Output from last process : " + (p3.communicate()[0]).decode())
# theoretically p1 and p2 may still be running; this ensures we collect their return codes
p1.wait()
p2.wait()
print("p1 return: ", p1.returncode)
print("p2 return: ", p2.returncode)
print("p3 return: ", p3.returncode)
http://www.python.org/doc/2.5.2/lib/node535.html covered this pretty well. Is there some part of this you didn't understand?
Your program would be pretty similar, but the second Popen would have stdout= to a file, and you wouldn't need the output of its .communicate().
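Something along these lines, assuming script.awk exists (a sketch, not tested):
from subprocess import Popen, PIPE

with open("outfile.txt", "w") as outfile:
    p_awk = Popen(["awk", "-f", "script.awk"], stdin=PIPE, stdout=PIPE)
    p_sort = Popen(["sort"], stdin=p_awk.stdout, stdout=outfile)
    p_awk.stdout.close()              # let awk get SIGPIPE if sort exits early
    p_awk.communicate(b"input data\n")
    p_sort.wait()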
Inspired by #Cristian's answer. I met just the same issue, but with a different command. So I'm putting my tested example, which I believe could be helpful:
grep_proc = subprocess.Popen(["grep", "rabbitmq"],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE)
subprocess.Popen(["ps", "aux"], stdout=grep_proc.stdin)
out, err = grep_proc.communicate()
This is tested.
What has been done
Declared the grep command with stdin coming from a pipe. It consumes its input once the ps command runs and fills that pipe with its stdout.
Called the primary command ps with stdout directed to the pipe used by the grep command.
Called communicate() on grep to collect its stdout from the pipe.
I like this way because it follows the natural pipe concept, gently wrapped with the subprocess interfaces.
The previous answers missed an important point. Replacing shell pipeline is basically correct, as pointed out by geocar. It is almost sufficient to run communicate on the last element of the pipe.
The remaining problem is passing the input data to the pipeline. With multiple subprocesses, a simple communicate(input_data) on the last element doesn't work - it hangs forever. You need to create a pipeline and a child manually, like this:
import os
import subprocess

input = """\
input data
more input
""" * 10

rd, wr = os.pipe()
if os.fork() != 0:  # parent
    os.close(wr)
else:  # child
    os.close(rd)
    os.write(wr, input.encode())  # os.write() needs bytes on Python 3
    os.close(wr)
    exit()

p_awk = subprocess.Popen(["awk", "{ print $2; }"],
                         stdin=rd,
                         stdout=subprocess.PIPE)
p_sort = subprocess.Popen(["sort"],
                          stdin=p_awk.stdout,
                          stdout=subprocess.PIPE)
p_awk.stdout.close()
out, err = p_sort.communicate()
print(out.rstrip())
Now the child provides the input through the pipe, and the parent calls communicate(), which works as expected. With this approach, you can create arbitrarily long pipelines without resorting to "delegating part of the work to the shell". Unfortunately the subprocess documentation doesn't mention this.
There are ways to achieve the same effect without pipes:
from tempfile import TemporaryFile
tf = TemporaryFile()
tf.write(input.encode())  # TemporaryFile is opened in binary mode by default
tf.seek(0, 0)
Now use stdin=tf for p_awk. It's a matter of taste what you prefer.
The above is still not 100% equivalent to bash pipelines because the signal handling is different. You can see this if you add another pipe element that truncates the output of sort, e.g. head -n 10. With the code above, sort will print a "Broken pipe" error message to stderr. You won't see this message when you run the same pipeline in the shell. (That's the only difference though, the result in stdout is the same). The reason seems to be that python's Popen sets SIG_IGN for SIGPIPE, whereas the shell leaves it at SIG_DFL, and sort's signal handling is different in these two cases.
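If that difference matters to you, one common workaround on POSIX is to reset SIGPIPE to its default disposition in the child before exec, e.g. via preexec_fn (a sketch; newer Python 3 versions already restore SIGPIPE to SIG_DFL in the child by default):
import signal
import subprocess

def restore_sigpipe():
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)

p_sort = subprocess.Popen(["sort"], stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE, preexec_fn=restore_sigpipe)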
EDIT: pipes is available on Windows but, crucially, doesn't appear to actually work on Windows. See comments below.
The Python standard library now includes the pipes module for handling this:
https://docs.python.org/2/library/pipes.html, https://docs.python.org/3.4/library/pipes.html
I'm not sure how long this module has been around, but this approach appears to be vastly simpler than mucking about with subprocess.
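For reference, the module's basic usage looks roughly like this (note that pipes was deprecated in Python 3.11 and removed in 3.13, so on current Pythons you are back to subprocess):
import pipes

t = pipes.Template()
t.append('grep a', '--')        # '--' means: reads stdin, writes stdout
f = t.open('outfile.txt', 'w')  # writes to f go through the pipeline into outfile.txt
f.write('banana\ncherry\n')
f.close()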
For me, the approach below is the cleanest and easiest to read:
from subprocess import Popen, PIPE
def string_to_2_procs_to_file(input_s, first_cmd, second_cmd, output_filename):
    with open(output_filename, 'wb') as out_f:
        p2 = Popen(second_cmd, stdin=PIPE, stdout=out_f)
        p1 = Popen(first_cmd, stdout=p2.stdin, stdin=PIPE)
        p1.communicate(input=input_s.encode())  # communicate() expects bytes here
        p1.wait()
        p2.stdin.close()
        p2.wait()
which can be called like so:
string_to_2_procs_to_file('input data', ['awk', '-f', 'script.awk'], ['sort'], 'output.txt')
I want to run some shell commands from Python. I have a main.py which calls successive functions, and I find some of them easier to do in the shell. The problem: I want to do all of this automatically!
I want to run this kind of command:
sort fileIn | uniq > fileOut
My problem is doing it with the pipe character. I tried:
from subprocess import call
call(['sort ',FileOut,'|',' uniq '])
or
p1 = subprocess.Popen(['sort ', FileOut], stdout=subprocess.PIPE)
p2 = subprocess.Popen([" wc","-l"], stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output,err = p2.communicate()
But none of this worked.
(NB: FileOut is a string)
You need to use shell=True, which causes your command to be run by the shell instead of via an exec syscall:
call('sort {0} | uniq'.format(FileOut), shell=True)
It's worth noting that, if you simply want unique lines of a file in python (in no particular order), it may be easier to do so without the shell:
unique_lines = set(open('filename').readlines())
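And if you want the full `sort fileIn | uniq > fileOut` behaviour in pure Python (file names here are illustrative):
with open('fileIn') as f:
    unique_sorted = sorted(set(f.read().splitlines()))
with open('fileOut', 'w') as f:
    f.write('\n'.join(unique_sorted) + '\n')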
I got tired of always looking up the Popen documentation, so this is an abridged version of the utility function I use to wrap Popen. You can take the so value returned by one call and pass it as the input to the next call. You can also do error checking/parsing if you need to.
import subprocess

def run(command, input=None):
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                               stdin=subprocess.PIPE, shell=True)
    if input:
        so, se = process.communicate(input)
    else:
        so, se = process.communicate()
    rc = process.returncode
    return so, se, rc
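For example, chaining two calls by feeding the first command's stdout into the second (commands are just illustrative):
so, se, rc = run('ls | grep a')
if rc == 0:
    so2, se2, rc2 = run('wc -l', input=so)
    print(so2.decode().strip())   # number of matching lines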
Executing this in shell gets me tangible results:
wget -O c1 --no-cache "http://some.website" | sed "1,259d" c1 | sed "4,2002d"
Doing this in Python gets me nothing:
subprocess.call(shlex.split("wget -O c1 --no-cache \"http://some.website/tofile\""))
c1 = open("c1",'w')
first = subprocess.Popen(shlex.split("sed \"1,259d\" c1"), stdout=subprocess.PIPE)
subprocess.Popen(shlex.split("sed \"4,2002d\""), stdin=first.stdout, stdout=c1)
c1.close()
Doing this also gets me no results:
c1.write(subprocess.Popen(shlex.split("sed \"4,2002d\""), stdin=first.stdout, stdout=subprocess.PIPE).communicate()[0])
By 'gets me nothing' I mean blank output in the file. Does anyone see anything out of the ordinary here?
I always use plumbum for running external commands. It provides a very intuitive interface and, of course, takes care of escaping for me.
Would look something like:
from plumbum.cmd import wget, sed
cmd1 = wget['-O', 'c1']['--no-cache']["http://some.website"]
cmd2 = sed["1,259d"]['c1'] | sed["4,2002d"]
print(cmd1)
cmd1()  # run it
print(cmd2)
cmd2()  # run it
The statement c1 = open("c1",'w') opens file c1 for writing and truncates any existing data, so everything wget wrote to the file gets erased before you call sed.
Anyway, I think shlex.split is generally awkward. I prefer to build the args list manually:
from subprocess import Popen, PIPE
p0 = Popen(['wget', '-O', '-', 'http://www.google.com'], stdout=PIPE)
p1 = Popen(['sed', '2,8d'], stdin=p0.stdout, stdout=PIPE)
with open('c1', 'w') as c1:
    p2 = Popen(['sed', '2,7d'], stdin=p1.stdout, stdout=c1)
    p2.wait()
However, there's no obvious reason a Python programmer should have to call out to sed. Python has string methods and regular expressions. Also, instead of wget you can use urllib2.urlopen.
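A rough pure-Python sketch of the same idea (urllib.request is the Python 3 name for urllib2; the line ranges just mirror the sed expressions):
from urllib.request import urlopen

lines = urlopen("http://www.google.com").read().decode(errors="replace").splitlines()
kept = lines[259:]                   # sed "1,259d": drop the first 259 lines
kept = kept[:3] + kept[2002:]        # sed "4,2002d": drop lines 4..2002 of what's left
with open('c1', 'w') as f:
    f.write('\n'.join(kept) + '\n')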
Why not just do everything all in pipes and send the output to a file?
wget -O - "http://www.google.com" | sed "1,259d" | sed "4,2002d" > c1
Or if you don't want to send it to a file, and want it on stdout instead:
wget -O - "http://www.google.com" | sed "1,259d" | sed "4,2002d"
And if you want to do it in Python:
pipe = subprocess.Popen('wget -O - "http://www.google.com" | sed "1,259d" | sed "4,2002d"',
                        shell=True, stdout=subprocess.PIPE)
result = pipe.communicate()[0]
In the interest of making life easier for people who more-or-less may be running into the same type of problem, I have decided to post the final revised code, which factored in comments about c1 and overwriting of data. Of particular interest is the usage of communicate() which helped to completely eliminate any manifestations of zombie processes, which were quite irritating. Also, I found it useful to use subprocess.call in portions where piping wasn't necessary. No wait() was necessary in the end. Ultimately, staying away from sed and wget is a good idea, especially with Python's inbuilt tools and urllib2.
p0 = subprocess.call(shlex.split("wget -Oc1 --no-cache \"http://Some.website/tofile\""))
p1 = subprocess.Popen(shlex.split("sed \"1,261d\" c1"), stdout=subprocess.PIPE)
with open("cc1", 'w') as cc1:
p2 = subprocess.Popen(shlex.split("sed \"3,2002d\""), stdin=p1.stdout, stdout=cc1)
p2.communicate()
p1.communicate()
p3 = subprocess.call(shlex.split("mv cc1 c1"))
I have a problem piping a simple subprocess.Popen.
Code:
import subprocess
cmd = 'cat file | sort -g -k3 | head -20 | cut -f2,3'
p = subprocess.Popen(cmd,shell=True,stdout=subprocess.PIPE)
for line in p.stdout:
    print(line.decode().strip())
Output for file ~1000 lines in length:
...
sort: write failed: standard output: Broken pipe
sort: write error
Output for file >241 lines in length:
...
sort: fflush failed: standard output: Broken pipe
sort: write error
Output for file <241 lines in length is fine.
I have been reading the docs and googling like mad but there is something fundamental about the subprocess module that I'm missing ... maybe to do with buffers. I've tried p.stdout.flush() and playing with the buffer size and p.wait(). I've tried to reproduce this with commands like 'sleep 20; cat moderatefile' but this seems to run without error.
From the recipes on subprocess docs:
# To replace shell pipeline like output=`dmesg | grep hda`
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
This is because you shouldn't use "shell pipes" in the command passed to subprocess.Popen, you should use the subprocess.PIPE like this:
from subprocess import Popen, PIPE
p1 = Popen(['cat', 'file'], stdout=PIPE)
p2 = Popen(['sort', '-g', '-k', '3'], stdin=p1.stdout, stdout=PIPE)
p3 = Popen(['head', '-20'], stdin=p2.stdout, stdout=PIPE)
p4 = Popen(['cut', '-f2,3'], stdin=p3.stdout, stdout=PIPE)
final_output = p4.stdout.read()
But I have to say that what you're trying to do could be done in pure Python instead of calling a bunch of shell commands.
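For instance, a rough pure-Python version of that particular pipeline (assuming tab-separated columns and a numeric third column):
with open('file') as f:
    rows = [line.rstrip('\n').split('\t') for line in f]

rows.sort(key=lambda r: float(r[2]))   # sort -g -k3: numeric sort on column 3
for row in rows[:20]:                  # head -20
    print('\t'.join(row[1:3]))         # cut -f2,3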
I have been having the same error. Even put the pipe in a bash script and executed that instead of the pipe in Python. From Python it would get the broken pipe error, from bash it wouldn't.
It seems to me that perhaps the last command prior to head is throwing an error because its (the sort's) stdout is closed. Python must be picking up on this, whereas with the shell the error is silent. I've changed my code to consume the entire input and the error went away.
That would also explain why smaller files work: the pipe probably buffers the entire output before head exits, which would explain the breaks on larger files.
e.g., instead of a 'head -1' (in my case, I was only wanting the first line), I did an awk 'NR == 1'
There are probably better ways of doing this depending on where the 'head -X' occurs in the pipe.
You don't need shell=True. Don't invoke the shell. This is how I would do it:
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
stdout_value = p.communicate()[0]
stdout_value # the output
See if you still face the buffering problem after using this.
Try using communicate() rather than reading directly from stdout.
The Python docs say this:
"Warning: Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process."
http://docs.python.org/library/subprocess.html#subprocess.Popen.stdout
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
output = p.communicate()[0]
for line in output.splitlines():
    print(line)  # do stuff with each line