I am trying to use multiprocessing for my research project. As multiple processes will read the same large file at the same time, I try to make a copy of it within a specific range for each process using tail. The specific code is shown below:
result = subprocess.call(["tail", "-n", "+" + str(skip+1), resolution_file, ">", skipped_file], shell=True)
The result is 0, which should mean it succeeded. But the skipped file I want is not generated. I also tried the code in the Python console, but it takes an unreasonably long time, so I have to interrupt it with a KeyboardInterrupt.
Does anyone have any ideas?
Your code hangs because it's waiting for an EOF on stdin, not reading resolution_file at all.
Taking out shell=True fixes this:
result = subprocess.call(["tail", "-n", "+" + str(skip+1), resolution_file],
stdout=open(skipped_file, 'w'))
...now, why did it behave that way? Because shell=True prepends ['sh', '-c'] to your argument list. Thus, your original code was actually doing the following:
result = subprocess.call(["sh", "-c", "tail", "-n", "+" + str(skip+1), resolution_file, ">", skipped_file])
And what does that do? Well, it runs sh -c 'tail', with the subsequent arguments made available to that one-word shell script as $0, $1, and so on. Except that script doesn't look at those arguments at all, so they're simply ignored. And when it's given no arguments of its own, tail just waits for an EOF on stdin... one which, in the case at hand, never comes.
So, what if you did want to use shell=True, and to open the output file from inside the shell rather than from your Python code? In that case, you might write the code as such:
result = subprocess.call([
'tail -n +"$1" -- "$2" >"$3"', '_', # script itself, then $0 it's run with
str(skip+1), # this is $1 for the script
resolution_file, # ...its $2...
skipped_file # ...and its $3
], shell=True)
Related
I would like to be able to run a subprocess from Python code, and both see the output in real time and, once the process is finished, have the output in a variable.
Right now I do one of two things:
1) Run the subprocess using subprocess.call. In that case I get the output in real time, but at the end I don't have the output in a variable (I want to parse it and extract values from it).
2) Run the subprocess using subprocess.check_output. In that case I have the output in a variable, but if I want to see it I have to print it "manually".
Is there a way to get both things "together" ?
Hope it is clear, I can add my code if you need
Thanks !!!
EDIT:
This is my current code
I added an optional timeout parameter (default value is 1200) and also deal with shell (for some reason, the same commands that work in Linux do not work in Windows if I don't have shell=True). The "mode" parameter is the one I use to differentiate the cases where I want the output in "real time" and don't have to parse it from the other cases.
I was wondering if there is a cleaner and better way to achieve same results
Assuming you are trying to run some command your_command, you can use the following:
some_proc = subprocess.Popen(['your_command'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
The stdout=subprocess.PIPE captures the process's standard output instead of letting it print to the terminal. Afterwards, you can read the output as follows:
store_in_var = some_proc.stdout.read()
Now you can parse your store_in_var.
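To actually get both things at once, as the question asks, one option is to read the pipe line by line, echoing and accumulating as you go. A minimal sketch (your_command is the placeholder from above, and merging stderr into stdout is an assumption for simplicity):
import subprocess

proc = subprocess.Popen(['your_command'], stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT, universal_newlines=True)
lines = []
for line in iter(proc.stdout.readline, ''):
    print(line, end='')    # show it in real time...
    lines.append(line)     # ...and keep it for parsing later
proc.stdout.close()
proc.wait()
output = ''.join(lines)    # the full output, in a variable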
import subprocess
import shlex
from subprocess import PIPE

comd = input('command here : ')
comds = shlex.split(comd)  # split like the shell would, respecting quotes
# no shell=True here: with a list argument, the shell would ignore
# everything after the first element
f = subprocess.run(comds, stdout=PIPE, stderr=PIPE)
result = f.stdout.decode()
errors = f.stderr.decode()
I am using Python's subprocess module to call some Linux command line functions. The documentation explains the shell=True argument as
If shell is True, the specified command will be executed through the shell
There are two examples, which seem the same to me from a descriptive viewpoint (i.e. both of them call some command-line command), but one of them uses shell=True and the other does not
>>> subprocess.call(["ls", "-l"])
0
>>> subprocess.call("exit 1", shell=True)
1
My question is:
What does running the command with shell=False do, in contrast to shell=True?
I was under the impression that subprocess.call and check_call and check_output all must execute the argument through the shell. In other words, how can it possibly not execute the argument through the shell?
It would also be helpful to get some examples of:
Things that can be done with shell=True that can't be done with shell=False, and why they can't be done.
Vice versa (although it seems that there are no such examples)
Things for which it does not matter whether shell=True or False and why it doesn't matter
UNIX programs start each other with the following three calls, or derivatives/equivalents thereto:
fork() - Create a new copy of yourself.
exec() - Replace yourself with a different program (do this if you're the copy!).
wait() - Wait for another process to finish (optional, if not running in background).
Thus, with shell=False, you do just that (as Python-syntax pseudocode below -- exclude the wait() if not a blocking invocation such as subprocess.call()):
pid = fork()
if pid == 0: # we're the child process, not the parent
execlp("ls", "ls", "-l", NUL);
else:
retval = wait(pid) # we're the parent; wait for the child to exit & get its exit status
whereas with shell=True, you do this:
pid = fork()
if pid == 0:
execlp("sh", "sh", "-c", "ls -l", NUL);
else:
retval = wait(pid)
Note that with shell=False, the command we executed was ls, whereas with shell=True, the command we executed was sh.
That is to say:
subprocess.Popen(foo, shell=True)
is exactly the same as:
subprocess.Popen(
["sh", "-c"] + ([foo] if isinstance(foo, basestring) else foo),
shell=False)
That is to say, you execute a copy of /bin/sh, and direct that copy of /bin/sh to parse the string into an argument list and execute ls -l itself.
So, why would you use shell=True?
You're invoking a shell builtin.
For instance, the exit command is actually part of the shell itself, rather than an external command. That said, this is a fairly small set of commands, and it's rare for them to be useful in the context of a shell instance that only exists for the duration of a single subprocess.call() invocation.
You have some code with shell constructs (i.e. redirections) that would be difficult to emulate without it.
If, for instance, your command is cat one two >three, the syntax >three is a redirection: It's not an argument to cat, but an instruction to the shell to set stdout=open('three', 'w') when running the command ['cat', 'one', 'two']. If you don't want to deal with redirections and pipelines yourself, you need a shell to do it.
A slightly trickier case is cat foo bar | baz. To do that without a shell, you need to start both sides of the pipeline yourself: p1 = Popen(['cat', 'foo', 'bar'], stdout=PIPE), p2=Popen(['baz'], stdin=p1.stdout).
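Spelled out as runnable code, that shell-free pipeline might look like this (closing p1.stdout in the parent is a common refinement, so the downstream process gets a proper EOF and SIGPIPE behaves correctly):
from subprocess import Popen, PIPE

# cat foo bar | baz, with no shell involved
p1 = Popen(['cat', 'foo', 'bar'], stdout=PIPE)
p2 = Popen(['baz'], stdin=p1.stdout)
p1.stdout.close()   # allow p1 to receive SIGPIPE if p2 exits first
p2.communicate()    # wait for the downstream end of the pipeline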
You don't give a damn about security bugs.
...okay, that's a little bit too strong, but not by much. Using shell=True is dangerous. You can't do this: Popen('cat -- %s' % (filename,), shell=True) without a shell injection vulnerability: If your code were ever invoked with a filename containing $(rm -rf ~), you'd have a very bad day. On the other hand, ['cat', '--', filename] is safe with all possible filenames: The filename is purely data, not parsed as source code by a shell or anything else.
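To see the difference concretely, consider the hostile filename from the paragraph above; with the list form it is inert data (a minimal demonstration, with the dangerous variant left commented out):
import subprocess

filename = '$(rm -rf ~)'   # hostile input from the example above
# DANGEROUS: the shell would expand $(...) and actually run rm -rf ~
# subprocess.call('cat -- %s' % (filename,), shell=True)
# Safe: cat just reports that no file with this odd name exists
subprocess.call(['cat', '--', filename])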
It is possible to write safe scripts in shell, but you need to be careful about it. Consider the following:
filenames = ['file1', 'file2'] # these can be user-provided
subprocess.Popen(['cat -- "$@" | baz', '_'] + filenames, shell=True)
That code is safe (well -- as safe as letting a user read any file they want ever is), because it's passing your filenames out-of-band from your script code -- but it's safe only because the string being passed to the shell is fixed and hardcoded, and the parameterized content is external variables (the filenames list). And even then, it's "safe" only to a point -- a bug like Shellshock that triggers on shell initialization would impact it as much as anything else.
I was under the impression that subprocess.call and check_call and check_output all must execute the argument through the shell.
No, subprocess is perfectly capable of starting a program directly (via an operating system call). It does not need a shell.
Things that can be done with shell=True that can't be done with shell=False
You can use shell=False for any command that simply runs some executable optionally with some specified arguments.
You must use shell=True if your command uses shell features: pipelines (|), redirections, or compound statements combined with ;, &&, ||, etc.
Thus, one can use shell=False for a command like grep string file. But a command like grep string file | xargs something will, because of the |, require shell=True.
Because the shell has powerful features that Python programmers do not always find intuitive, it is considered better practice to use shell=False unless you really, truly need a shell feature. As an example, pipelines are not truly needed, because they can also be built with subprocess's PIPE feature, as shown below.
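For instance, the grep string file | xargs something pipeline above can be rewritten without a shell by capturing the first command's output and feeding it to the second (a sketch using the placeholder command names from the text):
from subprocess import run, PIPE

# grep string file | xargs something, without shell=True
grep = run(['grep', 'string', 'file'], stdout=PIPE)
run(['xargs', 'something'], input=grep.stdout)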
I need to write a script that will receive several parameters, of which the most important one is a string that contains a command (in Linux).
I need to be able to run it, keep the output in STDOUT (the usual), but also time it, and later output some .csv file.
Say it looks something like this:
timing_script.py param "echo hello world; cat /tmp/foo_bar"
The command will output stuff to STDOUT every couple of milliseconds, and I need that output to stay there. I'm saying this because my previous attempt at this script was in bash, where I had to cut the output of the time command apart to actually extract the timing, which also meant having to disregard the output of the command itself.
I'll also have to append something like param,0.345 to a csv file.
How do I execute a command from a string and also time it?
You can use subprocess to run a Linux command from a string and time to calculate the execution time:
import time
from subprocess import Popen, PIPE
start = time.time()
p1 = Popen(["my_linux_cmd"], stdout=PIPE)
print(p1.communicate())  # (stdout, stderr)
end = time.time()
exec_time = end - start
print(exec_time)  # execution time
Check subprocess.Popen for more details about the available options.
Warning: to print the stdout you could also use Popen.stdout.read, but prefer communicate() to avoid deadlocks caused by OS pipe buffers filling up and blocking the child process.
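To also satisfy the question's other requirements (leave the command's output on STDOUT in real time and append a param,seconds row to a csv file), a minimal sketch might look like the following; the timings.csv filename is an assumption:
import csv
import time
from subprocess import Popen

param = "param"                                  # the script's first parameter
command = "echo hello world; cat /tmp/foo_bar"   # the command string to time

start = time.time()
# shell=True because the command is a single shell string (it contains ';');
# stdout is inherited, so the command's output stays on STDOUT in real time
proc = Popen(command, shell=True)
proc.wait()
elapsed = time.time() - start

# append a row like "param,0.345" to the csv file
with open('timings.csv', 'a') as f:
    csv.writer(f).writerow([param, '%.3f' % elapsed])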
A simpler way, which stays in the shell, uses the formatting option -f of the time command. You can use it like this:
$ param="foo"
$ command="echo bar ; cat /tmp/foobar"
$ /usr/bin/time -f "$param,%e" bash -c "$command"
bar
#Beginning of foobar file
#End of foobar file
foo,0.00
Please have a look at man time for further examples about formatting the output of time
Of course, you can also directly run the following command (i.e. without using variables):
/usr/bin/time -f "myparam,%e" bash -c "echo bar ; cat /tmp/foobar"
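If you need to drive this from the Python script itself, the same trick can be wrapped with subprocess (a sketch; note that /usr/bin/time writes its report to stderr, and the command's own stderr would be captured along with it):
import subprocess

param = "foo"
command = "echo bar ; cat /tmp/foobar"
# the command's stdout stays on STDOUT; time's "param,%e" report goes to stderr
proc = subprocess.run(['/usr/bin/time', '-f', param + ',%e',
                       'bash', '-c', command], stderr=subprocess.PIPE)
print(proc.stderr.decode())   # e.g. "foo,0.00"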
Have fun
I am trying to run the following bash script in Python and store the readlist output. The readlist that I want stored as a Python list is a list of all files in the current directory ending in *concat_001.fastq.
I know it may be easier to do this in Python, i.e.
import os
readlist = [f for f in os.listdir(os.getcwd()) if f.endswith("concat_001.fastq")]
readlist = sorted(readlist)
However, this is problematic, as I need Python to sort the list in EXACTLY the same way as bash, and I was finding that bash and Python sort certain things in different orders (e.g. Python and bash deal with capitalised and uncapitalised things differently) - but when I tried
readlist = np.asarray(sorted(flist, key=str.lower))
I still found that two files starting with ML_ and M_ were sorted in different orders by bash and Python. Hence I am trying to run my exact bash script through Python, and then to use the bash-generated sorted list in my subsequent Python code.
input_suffix="concat_001.fastq"
ender=`echo $input_suffix | sed "s/concat_001.fastq/\*concat_001.fastq/g" `
readlist="$(echo $ender)"
I have tried
proc = subprocess.call(command1, shell=True, stdout=subprocess.PIPE)
proc = subprocess.call(command2, shell=True, stdout=subprocess.PIPE)
proc = subprocess.Popen(command3, shell=True, stdout=subprocess.PIPE)
But I just get: <subprocess.Popen object at 0x7f31cfcd9190>
Also - I don't understand the difference between subprocess.call and subprocess.Popen. I have tried both.
Thanks,
Ruth
So your question is a little confusing and does not exactly explain what you want. However, I'll try to give some suggestions to help you update it or, failing that, to answer it.
I will assume the following: your Python script passes 'input_suffix' on the command line, and you want your Python program to receive the contents of 'readlist' when the external script finishes.
To make our lives simpler, and to allow for more complicated logic later, I would put your commands into the following bash script:
script.sh
#!/bin/bash
input_suffix=$1
ender=`echo $input_suffix | sed "s/concat_001.fastq/\*concat_001.fastq/g"`
readlist="$(echo $ender)"
echo $readlist
You would execute this as script.sh "concat_001.fastq", where $1 takes in the first argument passed on the command line.
To use python to execute external scripts, as you quite rightly found, you can use subprocess (or as noted by another response, os.system - although subprocess is recommended).
The docs tell you that subprocess.call:
"Wait for command to complete, then return the returncode attribute."
and that
"For more advanced use cases when these do not meet your needs, use the underlying Popen interface."
Given you want to pipe the output from the bash script to your Python script, let's use Popen as suggested by the docs. As I posted in the other Stack Overflow answer, it could look like the following:
import subprocess
from subprocess import Popen, PIPE
# Execute our script, capturing its standard output
process = subprocess.Popen(['./script.sh', 'concat_001.fastq'],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
# Obtain the standard out, and standard error
stdout, stderr = process.communicate()
and then:
>>> print stdout
*concat_001.fastq
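To get the Python list the question asks for, you can then split the captured output on whitespace (assuming, as the glob pattern implies, that none of the filenames contain spaces):
readlist = stdout.split()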
I am trying to use rsync with Python. I have read that the preferred way of passing arguments to Popen is using an array.
The code I tried:
p = Popen(["rsync",
"\"{source}\"".format(source=latestPath),
"\"{user}#{host}:{dir}\"".format(user=user, host=host, dir=dir)],
stdout=PIPE, stderr=PIPE)
The result is rsync asking for password, even though I have set up SSH keys to do the authentication.
I think this is a problem with the environment the new process gets executed in. What I tried next is:
p = Popen(["rsync",
"\"{source}\"".format(source=latestPath),
"\"{user}#{host}:{dir}\"".format(user=user, host=host, dir=dir)],
stdout=PIPE, stderr=PIPE, shell=True)
This results in rsync printing its "correct usage" message, so the arguments are passed incorrectly to rsync. I am not sure if this is even supposed to work (passing an array with shell=True).
If I remove the array altogether like this:
p = Popen("rsync \"{source}\" \"{user}#{host}:{dir}\"".format(
source=latestPath, user=user, host=host, dir=dir),
stdout=PIPE, stderr=PIPE, shell=True)
The program works fine. It really doesn't matter for the sake of this script, but I'd like to know what the difference is. Why don't the other two (mainly the first one) work?
Is it just that the shell environment is required, and the second one is incorrect?
EDIT: Contents of the variables
latestPath='/home/tomcat/.jenkins/jobs/MC 4thworld/workspace/target/FourthWorld-0.1-SNAPSHOT.jar'
user='mc'
host='192.168.0.32'
dir='/mc/test/plugins/'
I'd like to know what's the difference?
When shell=True, the entire command is passed to the shell. The quotes are there so the shell can correctly pick the command apart again. In particular, passing
foo "bar baz"
to the shell causes it to parse the command as (Python syntax) ['foo', 'bar baz'] so that it can execute the foo command with the argument bar baz.
By contrast, when shell=False, Python will pass the arguments in the list to the program immediately. For example, try the following subprocess commands:
>>> import subprocess
>>> subprocess.call(["echo", '"Hello!"'])
"Hello!"
0
>>> subprocess.call('echo "Hello!"', shell=True)
Hello!
0
and note that in the first, the quotes are echoed back at you by the echo program, while in the second case, the shell has stripped them off prior to executing echo.
In your specific case, rsync gets the quotes but doesn't know how it's supposed to handle them; it's not itself a shell, after all.
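So the fix for the first form is simply to drop the hand-written quotes and let the list supply the word boundaries (a sketch using the question's own variables):
p = Popen(["rsync",
           "{source}".format(source=latestPath),
           "{user}@{host}:{dir}".format(user=user, host=host, dir=dir)],
          stdout=PIPE, stderr=PIPE)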
Could it be to do with the cwd or env parameters? Maybe in the first syntax, it can't find the SSH keys...
Just a suggestion, it might be easier for you to use sh instead of subprocess:
import sh
sh.rsync(latestPath, user+"@"+host+":"+dir)