sort and uniq in Python

I want to run some shell commands from Python. I have a main.py that calls successive functions, and I find some of them easier to do in the shell. The problem: I want to do all of this automatically!
I want to run this kind of command:
sort fileIn | uniq > fileOut
My problem is the pipe character. I tried:
from subprocess import call
call(['sort ',FileOut,'|',' uniq '])
or
p1 = subprocess.Popen(['sort ', FileOut], stdout=subprocess.PIPE)
p2 = subprocess.Popen([" wc","-l"], stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output,err = p2.communicate()
But none of this worked.
(NB: FileOut is a string)

You need to use shell=True, which causes your command to be run by the shell instead of via an exec syscall:
call('sort {0} | uniq'.format(FileOut), shell=True)
It's worth noting that, if you simply want unique lines of a file in python (in no particular order), it may be easier to do so without the shell:
unique_lines = set(open('filename').readlines())
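If you also want them sorted and written out, as in the original sort fileIn | uniq > fileOut, a minimal pure-Python sketch might be (fileIn/fileOut are the placeholder names from the question):
# read all lines, de-duplicate with a set, sort, and write them back out
with open('fileIn') as fin:
    unique_lines = sorted(set(fin))
with open('fileOut', 'w') as fout:
    fout.writelines(unique_lines)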

I got tired of always looking up the Popen documentation, so this is an abridged version of the utility function I use to wrap Popen. You can take the so (stdout) value returned by one call and pass it as the input to the next call. You can also do error checking/parsing if you need to.
import subprocess

def run(command, input=None):
    # run a shell command, optionally feeding `input` to its stdin;
    # returns (stdout, stderr, returncode)
    process = subprocess.Popen(command, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, stdin=subprocess.PIPE,
                               shell=True)
    if input:
        so, se = process.communicate(input)
    else:
        so, se = process.communicate()
    rc = process.returncode
    return so, se, rc
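For example, you might chain two calls like this (the command strings are just illustrative):
# run one command, then feed its stdout to the next as stdin
# (on Python 3, `input` must be bytes here, since the pipes are in binary mode)
so, se, rc = run('sort fileIn | uniq')
so, se, rc = run('grep -c pattern', input=so)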

Related

Running shell commands from Python and printing the output in real time

I want to write a function that will execute multiple shell commands one at a time and print what the shell returns in real time.
I currently have the following code, which does not print the shell output (I am using Windows 10 and Python 3.6.2):
import subprocess

commands = ["foo", "foofoo"]
p = subprocess.Popen("cmd.exe", shell=True, stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
for command in commands:
    p.stdin.write((command + "\n").encode("utf-8"))
p.stdin.close()
p.stdout.read()
How can I see what the shell returns in real time?
Edit: This question is not a duplicate of the first two links in the comments; they do not help with printing in real time.
It is possible to handle stdin and stdout in different threads. That way, one thread can handle printing the output from stdout while another writes new commands on stdin. However, since stdin and stdout are independent streams, I do not think this can guarantee the order between them. For the current example it seems to work as intended, though.
import subprocess
import threading

def stdout_printer(p):
    for line in p.stdout:
        print(line.rstrip())

commands = ["foo", "foofoo"]
p = subprocess.Popen("cmd.exe", stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                     universal_newlines=True)
t = threading.Thread(target=stdout_printer, args=(p,))
t.start()
for command in commands:
    p.stdin.write(command + "\n")
    p.stdin.flush()
p.stdin.close()
t.join()
Also, note that I am printing stdout line by line, which is normally OK, since it tends to be buffered and generated a line (or more) at a time. I guess it is possible to handle an unbuffered stdout stream (or e.g. stderr) character by character instead, if that is preferable.
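For instance, a character-by-character variant of the printer thread could look like this (a sketch; it assumes the process was opened with universal_newlines=True, so p.stdout is a text stream whose read(1) returns '' at EOF):
import sys

def stdout_char_printer(p):
    # read one character at a time until EOF
    for ch in iter(lambda: p.stdout.read(1), ''):
        sys.stdout.write(ch)
        sys.stdout.flush()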
I believe you need something like this:
import subprocess

commands = ["foo", "foofoo"]
p = subprocess.Popen("cmd.exe", shell=True, stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
for command in commands:
    p.stdin.write((command + "\n").encode("utf-8"))
out, err = p.communicate()
print("{}".format(out))
print("{}".format(err))
Assuming you want control of the output in your Python code, you might need to do something like this:
import subprocess

def run_process(exe):
    'Define a function for running commands and capturing stdout line by line'
    p = subprocess.Popen(exe.split(), stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return iter(p.stdout.readline, b'')

if __name__ == '__main__':
    commands = ["foo", "foofoo"]
    for command in commands:
        for line in run_process(command):
            print(line)

Python Popen shell script fails

I want to execute the bash command
'/bin/echo </verbosegc> >> /tmp/jruby.log'
in Python using Popen. The code does not raise any exception, but no change is made to jruby.log after execution. The Python code is shown below.
>>> command = '/bin/echo </verbosegc> >> ' + fullpath
>>> command
'/bin/echo </verbosegc> >> /tmp/jruby.log'
>>> process = subprocess.Popen(command.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
>>> output = process.communicate()[0]
>>> output
'</verbosegc> >> /tmp/jruby.log\n'
I also printed out process.pid and then checked the pid using ps -ef | grep pid. The result shows that the process has finished.
Just pass a file object if you want to append the output to a file; you cannot use shell redirection unless you set shell=True:
command = ['/bin/echo', '</verbosegc>']
with open('/tmp/jruby.log', "a") as f:
    subprocess.check_call(command, stdout=f, stderr=subprocess.STDOUT)
The first argument to subprocess.Popen is the array ['/bin/echo', '</verbosegc>', '>>', '/tmp/jruby.log']. When the first argument to subprocess.Popen is an array, it does not launch a shell to run the command, and the shell is what's responsible for interpreting >> /tmp/jruby.log to mean "write output to jruby.log".
In order to make the >> redirection work in this command, you'll need to pass command directly to subprocess.Popen() without splitting it into a list. You'll also need to quote the first argument (or else the shell will interpret the "<" and ">" characters in ways you don't want):
command = '/bin/echo "</verbosegc>" >> /tmp/jruby.log'
process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
Consider the following:
command = ['printf "%s\n" "$1" >>"$2"',  # shell script to execute
           '',                           # $0 in shell
           '</verbosegc>',               # $1
           '/tmp/jruby.log']             # $2
subprocess.Popen(command, shell=True)
The first argument is a shell script referring to $1 and $2, which are in turn passed as separate arguments. Keeping data separate from code, rather than trying to substitute the former into the latter, is a precaution against shell injection (think of this as an analog to SQL injection).
Of course, don't actually do anything like this in Python -- the native primitives for file IO are far more appropriate.
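For example, the whole command reduces to a couple of lines of plain Python:
# append the line to the log file directly; no subprocess needed
with open('/tmp/jruby.log', 'a') as f:
    f.write('</verbosegc>\n')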
Have you tried without splitting the command and using shell=True? My usual format is:
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
output = process.stdout.read() # or .readlines()

Running shell command from Python script

I'm trying to run a shell command from within a Python script which needs to do several things:
1. The shell command is 'hspice tran.deck >! tran.lis'
2. The script should wait for the shell command to complete before proceeding
3. I need to check the return code from the command and
4. Capture STDOUT if it completed successfully else capture STDERR
I went through the subprocess module and tried out a couple of things but couldn't find a way to do all of the above.
- with subprocess.call() I could check the return code but not capture the output.
- with subprocess.check_output() I could capture the output but not the code.
- with subprocess.Popen() and Popen.communicate(), I could capture STDOUT and STDERR but not the return code.
I'm not sure how to use Popen.wait() or the returncode attribute. I also couldn't get Popen to accept '>!' or '|' as arguments.
Can someone please point me in the right direction? I'm using Python 2.7.1
EDIT: Got things working with the following code
process = subprocess.Popen('ls | tee out.txt', shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = process.communicate()
if process.returncode == 0:
    print out
else:
    print err
Also, should I use a process.wait() after the process = line or does it wait by default?
Just use .returncode after .communicate(); communicate() itself waits for the process to exit, so a separate .wait() is unnecessary. Also, tell Popen that what you're trying to run is a shell command, rather than a raw command line:
p = subprocess.Popen('ls | tee out.txt', shell=True, ...)
p.communicate()
print p.returncode
From the docs:
Popen.returncode
The child return code, set by poll() and wait() (and indirectly by communicate()). A None value indicates that the process hasn’t terminated yet.
A negative value -N indicates that the child was terminated by signal N (Unix only).
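As a quick illustration of the negative-value case (a sketch, Unix only):
import signal
import subprocess

p = subprocess.Popen(['sleep', '100'])
p.terminate()                           # sends SIGTERM to the child
p.wait()
print(p.returncode)                     # -15 on Unix
assert p.returncode == -signal.SIGTERM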
Here is an example of how to interact with the shell:
>>> process = subprocess.Popen(['/bin/bash'], shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
>>> process.stdin.write('echo it works!\n')
>>> process.stdout.readline()
'it works!\n'
>>> process.stdin.write('date\n')
>>> process.stdout.readline()
'wto, 13 mar 2012, 17:25:35 CET\n'
>>>

Getting shell output with Python?

I have a shell script that gets whois info for domains, and outputs "taken" or "available" to the shell depending on the domain.
I'd like to execute the script, and be able to read this value inside my Python script.
I've been playing around with subprocess.call but can't figure out how to get the output.
e.g.,
subprocess.call('myscript www.google.com', shell=True)
will output "taken" to the shell.
subprocess.call() does not give you the output, only the return code. For the output you should use subprocess.check_output() instead. These are friendly wrappers around the popen family of functions, which you could also use directly.
For more details, see: http://docs.python.org/library/subprocess.html
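For example (a sketch; myscript stands in for the actual script from the question):
import subprocess

# check_output returns the command's stdout (and raises CalledProcessError
# on a non-zero exit status); on Python 3 the result is bytes
output = subprocess.check_output('myscript www.google.com', shell=True)
print(output.strip())  # 'taken' or 'available'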
Manually using stdin and stdout with Popen was such a common pattern that it has been abstracted into a very useful Popen method: communicate.
Example:
p = subprocess.Popen(['myscript', 'www.google.com'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
(stdoutdata, stderrdata) = p.communicate(input="myinputstring")
# all done!
import subprocess as sp

p = sp.Popen(["/usr/bin/svn", "update"], stdin=sp.PIPE, stdout=sp.PIPE, close_fds=True)
(stdout, stdin) = (p.stdout, p.stdin)
data = stdout.readline()
while data:
    # Do stuff with data, linewise.
    data = stdout.readline()
stdout.close()
stdin.close()
This is the idiom I use; obviously, in this case I was updating an svn repository.
Try subprocess.check_output.

python subprocess: "write error: Broken pipe"

I have a problem piping a simple subprocess.Popen.
Code:
import subprocess
cmd = 'cat file | sort -g -k3 | head -20 | cut -f2,3'
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
for line in p.stdout:
    print(line.decode().strip())
Output for file ~1000 lines in length:
...
sort: write failed: standard output: Broken pipe
sort: write error
Output for file >241 lines in length:
...
sort: fflush failed: standard output: Broken pipe
sort: write error
Output for file <241 lines in length is fine.
I have been reading the docs and googling like mad but there is something fundamental about the subprocess module that I'm missing ... maybe to do with buffers. I've tried p.stdout.flush() and playing with the buffer size and p.wait(). I've tried to reproduce this with commands like 'sleep 20; cat moderatefile' but this seems to run without error.
From the recipes on subprocess docs:
# To replace shell pipeline like output=`dmesg | grep hda`
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
This is because you shouldn't use "shell pipes" in the command passed to subprocess.Popen; you should use subprocess.PIPE, like this:
from subprocess import Popen, PIPE

# each command is a list of arguments; the pipe is built with subprocess.PIPE
p1 = Popen(['cat', 'file'], stdout=PIPE)
p2 = Popen(['sort', '-g', '-k', '3'], stdin=p1.stdout, stdout=PIPE)
p3 = Popen(['head', '-20'], stdin=p2.stdout, stdout=PIPE)
p4 = Popen(['cut', '-f2,3'], stdin=p3.stdout, stdout=PIPE)
final_output = p4.stdout.read()
But I have to say that what you're trying to do could be done in pure Python instead of calling a bunch of shell commands.
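For instance, something along these lines (a sketch; it assumes tab-separated fields, which is cut's default, and a numeric third column):
# pure-Python version of: sort -g -k3 | head -20 | cut -f2,3
with open('file') as f:
    rows = [line.rstrip('\n').split('\t') for line in f]
rows.sort(key=lambda r: float(r[2]))   # sort -g -k3: numeric sort on field 3
for row in rows[:20]:                  # head -20
    print('\t'.join(row[1:3]))         # cut -f2,3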
I have been having the same error. I even put the pipe in a bash script and executed that instead of the pipe in Python; from Python it would get the broken pipe error, from bash it wouldn't.
It seems to me that the last command prior to the head is throwing an error, as its (the sort's) stdout is closed. Python must be picking up on this, whereas with the shell the error is silent. I've changed my code to consume the entire input and the error went away.
It would also make sense that smaller files work, as the pipe probably buffers the entire output before head exits; this would explain the breaks on larger files.
E.g., instead of 'head -1' (in my case, I only wanted the first line), I used awk 'NR == 1', which consumes all of its input before exiting.
There are probably better ways of doing this, depending on where the 'head -X' occurs in the pipe.
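In subprocess terms, that workaround might look like this (a sketch; 'file' is a placeholder):
import subprocess

# awk reads its whole input before exiting, so sort never sees a
# prematurely closed pipe (unlike head, which exits after 20 lines)
cmd = "cat file | sort -g -k3 | awk 'NR <= 20' | cut -f2,3"
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
for line in p.stdout:
    print(line.decode().rstrip())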
You don't need shell=True. Don't invoke the shell. This is how I would do it:
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
stdout_value = p.communicate()[0]
stdout_value # the output
See if you still face the buffering problem after using this.
Try using communicate() rather than reading directly from stdout. The Python docs say this:
"Warning Use communicate() rather than
.stdin.write, .stdout.read or
.stderr.read to avoid deadlocks due to
any of the other OS pipe buffers
filling up and blocking the child
process."
http://docs.python.org/library/subprocess.html#subprocess.Popen.stdout
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
output = p.communicate()[0]
for line in output.splitlines():
    # do stuff with each line
    pass
