Python subprocess command execution got stuck

I have an issue where a Unix command executed with the Python subprocess module gets stuck:
(The full code is here:
https://github.com/discoproject/disco/blob/master/lib/disco/worker/classic/func.py)
The Unix command is a simple in-place sort.
The way the process is created:
env = os.environ.copy()
env['LC_ALL'] = 'C'
cmd, shell = sort_cmd(filename, sort_buffer_size)
subprocess.check_call(cmd, env=env, shell=shell)
where the sort_cmd is:
def sort_cmd(filename, sort_buffer_size):
    return (r"sort -z -t$'\xff' -k 1,1 -T . -S {0} -o {1} {1}"
            .format(sort_buffer_size, filename), True)
The input file (which is also the output file) of the sort command is empty. The file was not empty before calling this command (its contents are printed just before the call).
The question is: if this is a Python issue, how could the file end up empty? (One hypothesis is this Python 2.7 bug: http://bugs.python.org/issue19809.)
Issuing strace on the sort process showed that it was stuck on a futex. Unfortunately, I haven't been able to reproduce this problem and I do not have the input file. When the sort process was killed manually, it returned (with an error of course).
I am using GNU coreutils 8.10.

This cannot be a Python issue: it all happens inside the subshell, and Python has no notion that the string contains a filename at all.
In fact, the sort command opens its output file for writing, which empties it. If that is also the input file, you are lost.
A solution could be to write everything to a temporary file and rename it afterwards.
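A minimal sketch of that approach, reusing the snippet from the question (the .sorting suffix and the extra return value are illustrative, not part of the original code):

import os, subprocess

def sort_cmd(filename, sort_buffer_size):
    # sort into a temporary file instead of sorting in place
    tmp = filename + '.sorting'
    return (r"sort -z -t$'\xff' -k 1,1 -T . -S {0} -o {1} {2}"
            .format(sort_buffer_size, tmp, filename), tmp, True)

env = os.environ.copy()
env['LC_ALL'] = 'C'
cmd, tmp, shell = sort_cmd(filename, sort_buffer_size)
subprocess.check_call(cmd, env=env, shell=shell)
# the rename is atomic on POSIX (same filesystem), so the original
# file is never left truncated halfway through
os.rename(tmp, filename)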

Related

How do I embed my shell scanning-script into a Python script?

I've been using the following shell command to read the image off a scanner named scanner_name and save it in a file named file_name:
scanimage -d <scanner_name> --resolution=300 --format=tiff --mode=Color 2>&1 > <file_name>
This has worked fine for my purposes.
I'm now trying to embed this in a Python script. What I need is to save the scanned image, as before, into a file, and also capture any standard output (error messages, say) in a string.
I've tried
scan_result = os.system('scanimage -d {} --resolution=300 --format=tiff --mode=Color 2>&1 > {} '.format(scanner, file_name))
But when I run this in a loop (with different scanners), there is an unreasonably long lag between scans, and the images aren't saved until the next scan starts (the file is created empty and is not filled until the next scanning command runs). All this with scan_result = 0, i.e. indicating no error.
The subprocess method run() has been suggested to me, and I have tried
with open(file_name, 'w') as scanfile:
    input_params = '-d {} --resolution=300 --format=tiff --mode=Color 2>&1 > {} '.format(scanner, file_name)
    scan_result = subprocess.run(["scanimage", input_params], stdout=scanfile, shell=True)
but this saved the image in some kind of an unreadable file format
Any ideas as to what may be going wrong? Or what else I can try that will allow me to both save the file and check the success status?
subprocess.run() is definitely preferred over os.system() but neither of them as such provides support for running multiple jobs in parallel. You will need to use something like Python's multiprocessing library to run several tasks in parallel (or painfully reimplement it yourself on top of the basic subprocess.Popen() API).
You also have a basic misunderstanding about how to run subprocess.run(). You can pass in either a string and shell=True or a list of tokens and shell=False (or no shell keyword at all; False is the default).
with_shell = subprocess.run(
    "scanimage -d {} --resolution=300 --format=tiff --mode=Color 2>&1 > {} ".format(
        scanner, file_name), shell=True)
with open(file_name, 'wb') as write_handle:  # open for (binary) writing
    no_shell = subprocess.run([
        "scanimage", "-d", scanner, "--resolution=300", "--format=tiff",
        "--mode=Color"], stdout=write_handle)
You'll notice that the latter does not support redirection (because that's a shell feature) but this is reasonably easy to implement in Python. (I took out the redirection of standard error -- you really want error messages to remain on stderr!)
If you have a larger working Python program this should not be awfully hard to integrate with a multiprocessing.Pool(). If this is a small isolated program, I would suggest you peel off the Python layer entirely and go with something like xargs or GNU parallel to run a capped number of parallel subprocesses.
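For instance, a minimal sketch of the Pool route, assuming a list scanners of SANE device names and a made-up naming scheme for the output files:

import subprocess
from multiprocessing import Pool

def scan_one(scanner):
    # hypothetical naming scheme; derive file_name however suits you
    file_name = scanner.replace(':', '_') + '.tiff'
    with open(file_name, 'wb') as write_handle:
        result = subprocess.run(
            ["scanimage", "-d", scanner, "--resolution=300",
             "--format=tiff", "--mode=Color"],
            stdout=write_handle, stderr=subprocess.PIPE)
    return scanner, result.returncode, result.stderr.decode()

if __name__ == '__main__':
    scanners = ['net:host1:dev', 'net:host2:dev']  # placeholder device names
    with Pool(4) as pool:  # run at most four scans in parallel
        for name, status, errors in pool.map(scan_one, scanners):
            print(name, status, errors)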
I suspect the issue is that you're opening the output file in Python, then running subprocess.run() inside that with block. This isn't necessary: the end result is that you open the file via Python, have the command open the same file again via the OS, and then close it via Python.
JUST run the subprocess, and let the scanimage 2>&1 > filename command create the file (just as it would if you ran scanimage at the command line directly).
I think subprocess.check_output() is now the preferred method of capturing the output.
I.e.
from subprocess import check_output

# The command must be a list, with every parameter as a separate item.
# A shell redirection such as 2>&1>file cannot go into the list: with no
# shell, nothing interprets it, so check_output() captures stdout instead.
command = ['scanimage',
           '-d{}'.format(scanner),
           '--resolution=300',
           '--format=tiff',
           '--mode=Color']
scan_result = check_output(command)  # the raw TIFF bytes from stdout
with open(file_name, 'wb') as image_file:
    image_file.write(scan_result)
However (with both run and check_output), shell=True is a big security risk, especially if the input parameters come into the Python script from outside. People can pass in unwanted commands and have them run in the shell with the permissions of the script.
Sometimes shell=True is necessary for the OS command to run properly, in which case the best recommendation is to use an actual Python module to interface with the scanner, versus having Python pass an OS command to the OS.
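For example, a rough sketch using the third-party python-sane module (pip install python-sane); the attribute names follow its documented API, but treat the details as assumptions to verify against your scanner:

import sane

sane.init()
dev = sane.open(scanner)  # a device name, e.g. from sane.get_devices()
dev.resolution = 300
dev.mode = 'color'
image = dev.scan()        # returns a PIL (Pillow) Image
image.save(file_name)     # Pillow infers TIFF from a .tiff extension
dev.close()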

batch file to convert .mp4 to .mp3 crashes half the time

I am using a batch file to access my portable VLC executable to convert an mp4 to an mp3:
set arg1=%1 REM -> arg1={my_mp4_full_path}
set arg2=%2 REM -> arg2={my_mp3_full_path}
echo %arg1%
echo %arg2%
REM batch file is in the same directory as "VLCPlayer" folder
"%~dp0\VLCPlayer\VLCPortable.exe" -I dummy %arg1% --sout=#transcode{acodec=mp3,ab=128,vcodec=dummy}:std{access="file",mux="raw",dst=%arg2%} vlc://quit
When I run this script the first time, vlc crashes and I get an unplayable mp3 file, however when I run the script again the script works and I get a playable mp3. Is there a way to remedy this, or make it consistent? I don't see why running it twice would yield different outcomes.
No, I don't have ffmpeg on my computer; it is not recognized as an internal or external command.
Note that I face the same problem when using powershell to perform the same task, when I import my function from a .psm1 script:
function ConvertToMp3(
    [switch] $inputObject,
    [string] $vlc = '{PAth_TO_PORTABLE_VLC}\VLCPortable.exe')
{
    PROCESS {
        $codec = 'mp3';
        $oldFile = $_;
        $newFile = $oldFile.FullName.Replace($oldFile.Extension, ".$codec").Replace("'","");
        &"$vlc" -I dummy "$oldFile" ":sout=#transcode{acodec=$codec,
            vcodec=dummy}:standard{access=file,mux=raw,dst=`'$newFile`'}" vlc://quit | out-null;
        # delete the original file
        Remove-Item $oldFile;
    }
}
I get the same random behavior: sometimes it works, sometimes it crashes.
Update:
I feel like I should add more info about how I use the batch file:
I have a Python script, Convert.py, and I call my batch file inside it using os.system():
mp4_to_convert = arguments.file
full_path_mp4 = os.path.join(outdir,mp4_to_convert)
mp3_to_convert_to = mp4_to_convert.replace(".mp4",".mp3")
full_path_mp3 = os.path.join(outdir,mp3_to_convert_to)
command_string = """Convert_Script.bat \"{}\" \"{}\"""".format(full_path_mp4, full_path_mp3)
os.system(command_string)
This is the documentation of os.system():
os.system(command)
Execute the command (a string) in a subshell. This
is implemented by calling the Standard C function system(), and has
the same limitations. Changes to sys.stdin, etc. are not reflected in
the environment of the executed command. If command generates any
output, it will be sent to the interpreter standard output stream.
On Unix, the return value is the exit status of the process encoded in
the format specified for wait(). Note that POSIX does not specify the
meaning of the return value of the C system() function, so the return
value of the Python function is system-dependent.
On Windows, the return value is that returned by the system shell
after running command. The shell is given by the Windows environment
variable COMSPEC: it is usually cmd.exe, which returns the exit status
of the command run; on systems using a non-native shell, consult your
shell documentation.
Any pointers or suggestions would be helpful, thank you in advance for your help.
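One improvement worth making regardless of the VLC crash: calling the batch file through subprocess.run() instead of os.system() quotes the paths for you and lets you check the exit status directly. A minimal sketch, reusing full_path_mp4 and full_path_mp3 from the snippet above:

import subprocess

# the list form quotes arguments with spaces automatically, replacing
# the hand-built escaped-quote string passed to os.system()
completed = subprocess.run(
    ['Convert_Script.bat', full_path_mp4, full_path_mp3],
    check=True)  # raises CalledProcessError if the batch file exits non-zero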

Issue using subprocess to run a PDAL bash command from Python [duplicate]

This question already has answers here:
File not found error when launching a subprocess containing piped commands
(6 answers)
Closed 2 years ago.
Issue:
I cannot run a pdal bash command from Python using subprocess.
Here is the code, based on Running Bash commands in Python:
import os, subprocess
input = '/path/to/file.ply'
output = '/path/to/statfile.json'
if not os.path.isfile(output):
    open(output, 'a').close()
bashcmd = ("pdal info --boundary "
           + input
           + " > "
           + output
           )
print("Bash command is:\n{}\n".format(bashcmd))
process = subprocess.Popen(bashcommand.split(),
                           stdout=subprocess.PIPE,
                           shell=True)
output, error = process.communicate()
print("Output:\n{}\n".format(output))
print("Error:\n{}\n".format(error))
Which gives me this output in the Python console:
Bash command is:
pdal info --boundary /path/to/file.ply > /path/to/statfile.json
Output:
Usage:
    pdal <options>
    pdal <command> <command options>

  --command          The PDAL command
  --debug            Sets the output level to 3 (option deprecated)
  --verbose, -v      Sets the output level (0-8)
  --drivers          List available drivers
  --help, -h         Display help text
  --list-commands    List available commands
  --version          Show program version
  --options          Show options for specified driver (or 'all')
  --log              Log filename (accepts stderr, stdout, stdlog, devnull
                     as special cases)
  --logtiming        Turn on timing for log messages
The following commands are available:
- delta
- diff
- fauxplugin
- ground
- hausdorff
- info
- merge
- pcl
- pipeline
- random
- smooth
- sort
- split
- tindex
- translate
See http://pdal.io/apps/ for more detail
Error:
None
It looks as if it had stopped reading the command's arguments after the bare pdal call, which prints this help message.
If I copy the output of the first print and paste it in a bash terminal, it works properly, giving me the output file with the desired metadata. But from Python no output file is created.
Question:
I wonder why (e.g. is there anything wrong with the redirection, or with the fact that the computation itself normally takes ~20 s?), and how can I execute this command from Python?
This doesn't provide a clear enough answer to the present issue.
There are multiple errors here.
You are using an undefined variable bashcommand instead of the one you defined above, bashcmd.
You are mixing output to a Python file handle with shell redirection.
You are not capturing the stderr of the process. (I will vaguely assume you do not need the standard error anyway.)
You should not split() the command if you run it with shell=True.
More broadly, you should probably avoid the shell=True and let Python take care of the redirection for you by connecting the output to the file you open; and in modern times, you really should not use subprocess.Popen() if you can use subprocess.run() or subprocess.check_call() or friends.
import subprocess
input = '/path/to/file.ply'
output = '/path/to/statfile.json'
with open(output, 'a') as handle:
    bashcmd = ['pdal', 'info', '--boundary', input]
    #print("Bash command is:\n{}\n".format(bashcmd))
    result = subprocess.run(bashcmd, stdout=handle, stderr=subprocess.PIPE)
    # No can do because output goes straight to the file now
    ##print("Output:\n{}\n".format(output))
    #print("Error:\n{}\n".format(result.stderr))

Execute batch file in different directory

I have a file structure like the following (Windows):
D:\
    dir_1\
        batch_1.bat
        dir_1a\
            batch_2.bat
    dir_2\
        main.py
For the sake of this question, batch_1.bat simply calls batch_2.bat, and looks like:
cd dir_1a
start batch_2.bat %*
Opening batch_1.bat from a command prompt indeed opens batch_2.bat as it's supposed to, and from there on, everything is golden.
Now I want my Python file, D:\dir_2\main.py, to spawn a new process which starts batch_1.bat, which in turn should start batch_2.bat. So I figured the following Python code should work:
import subprocess
subprocess.Popen(['cd "D:/dir_1"', "start batch_1.bat"], shell=True)
This results in "The system cannot find the path specified" being printed to my Python console. (No error is raised, of course.) This is due to the first command. I get the same result even if I cut it down to:
subprocess.Popen(['cd "D:/"'], shell=True)
I also tried starting the batch file directly, like so:
subprocess.Popen("start D:/dir_1/batch_1.bat", shell=True)
For reasons that I don't entirely get, this seems to just open a Windows command prompt in dir_2.
If I forego the start part of this command, then my Python process is going to end up waiting for batch_1 to finish, which I don't want. But it does get a little further:
subprocess.Popen("D:/dir_1/batch_1.bat", shell=True)
This results in batch_1.bat successfully executing... in dir_2, the directory of the Python script, rather than the directory of batch_1.bat. It therefore cannot find dir_1a\, and hence batch_2.bat is not executed at all.
I am left highly confused. What am I doing wrong, and what should I be doing instead?
Your question is answered here: Python specify popen working directory via argument
In a nutshell, just pass an optional cwd argument to Popen:
subprocess.Popen(["batch_1.bat"], shell=True, cwd=r'd:\<your path>\dir1')

Python rsync error in reading remote root-level files

I am trying to set up a cron job to rsync remote files (including root-level files) to my local server. If I run the command in a shell, it works, but if I run it from Python, I get a strange 'command not found' error:
This works if run it in a shell:
rsync -ave ssh --rsync-path='sudo rsync' --delete root@192.168.1.100:/tmp/test2 ./test
But this Python script doesn't:
#!/usr/bin/python
from subprocess import call
....
for src_dir in backup_list:
    call(["rsync", "-ave", "ssh", "--rsync-path='sudo rsync'", "--delete", src_host+src_dir, dst_dir])
It fails with:
local server:$ backup.py
bash: sudo rsync: command not found
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: remote command not found (code 127) at io.c(226) [Receiver=3.1.0]
...
It is most likely a spacing error or something small; the way I debug commands is to print them out and check them. os.system is an easier alternative, although subprocess is better. I am not around my computer to test it, but you can either set your subprocess up like that, or use this example. This assumes you're on Linux or Mac.
import os
# build the command string; note the spaces between the pieces
cmd = ("rsync -ave ssh --rsync-path='sudo rsync' --delete "
       + str(src_host) + str(src_dir) + ' ' + str(dst_dir))
status = os.system(cmd)  # actually performs the command
print(status)  # check the exit status to make sure it worked
Quotes around an argument with spaces, like you have in "--rsync-path='sudo rsync'", are needed when the shell splits a long string up into arguments; they stop rsync from being treated as a separate argument. In your call(), you are providing the individual arguments yourself, so no such splitting is performed, and the quotes end up as part of the argument passed to rsync. Just drop them. Here's a working example of the list passed to a call() for a very similar rsync invocation:
['rsync',
'-arvz',
'-delete',
'-e',
'ssh',
'--rsync-path=sudo rsync',
'192.168.0.17:/remote/directory/',
'/local/directory/']
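A fixed-up version of the call from the question (same variables as in the script) would accordingly be:

from subprocess import call

for src_dir in backup_list:
    # no quotes inside --rsync-path when there is no shell to strip them
    call(["rsync", "-ave", "ssh", "--rsync-path=sudo rsync",
          "--delete", src_host + src_dir, dst_dir])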
I have been facing the same issue. This piece of code works for me: join the command into a single string when passing it to call() or Popen(), and add shell=True.
from subprocess import call
for src_dir in backup_list:
    call(" ".join(["rsync", "-ave", "ssh", "--rsync-path='sudo rsync'", "--delete", src_host+src_dir, dst_dir]), shell=True)
With shell=True, the shell performs the word splitting itself, so the quotes around 'sudo rsync' are honored.
