Run sed for a subset of arguments at a time - python

I am currently running sed in a python subprocess, however I am receiving the error:
"OSError: [Errno 7] Argument list too long: 'sed'"
The Python code is:
subprocess.run(['sed', '-i',
                '-e', 's/#/pau/g',
                *glob.glob('label_POS/label_phone_align/dump/*')], check=True)
Where the /dump/ directory has ~13,000 files in it. I have been told that I need to run the command for subsets of the argument list, but I can't find how to do that.

Whoever told you that probably meant that you need to split up the glob and run multiple separate commands:
files = glob.glob('label_POS/label_phone_align/dump/*')
i = 0
scale = 100
# process in units of 100 filenames until we have them all
while scale*i < len(files):
    subprocess.run(['sed', '-i',
                    '-e', 's/#/pau/g',
                    *files[scale*i:scale*(i+1)]], check=True)
    i += 1
Since sed -i edits the files in place, there is no output to amalgamate afterwards. The limit is not really a count of files: the operating system caps the total size of the argument list, and ~13,000 path names evidently exceed it. You can keep changing scale until it doesn't error.
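The batching idea can also be pulled out into a small helper. This is only a sketch: the names chunked and run_in_batches are hypothetical, and the batch size of 100 is the same guess as above rather than a measured limit.

```python
import subprocess


def chunked(items, size):
    """Yield successive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(base_cmd, files, size=100):
    """Run `base_cmd` once per batch, appending that batch's file names."""
    for batch in chunked(files, size):
        subprocess.run(base_cmd + batch, check=True)
```

A call such as run_in_batches(['sed', '-i', '-e', 's/#/pau/g'], glob.glob('label_POS/label_phone_align/dump/*')) would then replace the while loop.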

Please scroll down to the end of this answer for the solution I recommend for your specific problem. There's a bit of background here for context and/or future visitors grappling with other "argument list too long" errors.
The exec() system call has a size limit; you cannot pass more than ARG_MAX bytes as arguments to a process, where this system constant's value can usually be queried with the getconf ARG_MAX command on modern systems.
import glob
import subprocess
arg_max = subprocess.run(['getconf', 'ARG_MAX'],
    text=True, check=True, capture_output=True
    ).stdout.strip()
arg_max = int(arg_max)
cmd = ['sed', '-i', '-e', 's/#/pau/g']
files = glob.glob('label_POS/label_phone_align/dump/*')
while files:
    # size of the fixed part of the command, plus one byte per argument
    base = sum(len(x) for x in cmd) + len(cmd)
    for l in range(len(files)):
        base += 1 + len(files[l])
        if base > arg_max:
            # this file no longer fits; run with the ones before it
            l -= 1
            break
    subprocess.run(cmd + files[0:l+1], check=True)
    files = files[l+1:]
Of course, the xargs command already does exactly this for you.
import subprocess
import glob
subprocess.run(
['xargs', '-r', '-0', 'sed', '-i', '-e', 's/#/pau/g'],
input=b'\0'.join([x.encode() for x in glob.glob('label_POS/label_phone_align/dump/*') + ['']]),
check=True)
Simply removing the long path might be enough in your case, though. You are repeating label_POS/label_phone_align/dump/ in front of every file name in the argument array.
import glob
import subprocess
import os
path = 'label_POS/label_phone_align/dump'
files = [os.path.basename(file)
for file in glob.glob(os.path.join(path, '*'))]
subprocess.run(
['sed', '-i', '-e', 's/#/pau/g', *files],
cwd=path, check=True)
Eventually, perhaps prefer a pure Python solution.
import glob
import fileinput
for line in fileinput.input(glob.glob('label_POS/label_phone_align/dump/*'), inplace=True):
    # each line keeps its own newline, so suppress the one print() would add
    print(line.replace('#', 'pau'), end='')
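For comparison, here is a sketch of the same pure Python idea without fileinput. The function name replace_in_files is hypothetical, and it assumes every file is text in the default encoding and small enough to read into memory at once.

```python
from pathlib import Path


def replace_in_files(directory, old, new):
    """Rewrite every regular file in `directory`, replacing `old` with `new`.

    Returns the number of files whose content actually changed.
    """
    changed = 0
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue  # skip subdirectories and other non-files
        text = path.read_text()
        updated = text.replace(old, new)
        if updated != text:
            path.write_text(updated)
            changed += 1
    return changed
```

Calling replace_in_files('label_POS/label_phone_align/dump', '#', 'pau') would cover the case in the question. No external process is started, so no argument-list limit applies.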

Related

Running a C executable inside a python program

I have written a C program that converts one file format to another. It takes one command-line argument: filestem.
I run it with: ./executable_file filestem > outputfile
and get the desired output in outputfile.
Now I want to run that executable from Python code.
I am trying:
import subprocess
import sys
filestem = sys.argv[1];
subprocess.run(['/home/dev/executable_file', filestem , 'outputfile'])
But it does not create the outputfile. I think something needs to be added to handle the > redirection, but I can't figure out what. Please help.
subprocess.run has an optional stdout argument to which you can pass a file handle, so in your case something like
import subprocess
import sys
filestem = sys.argv[1]
with open('outputfile','wb') as f:
    subprocess.run(['/home/dev/executable_file', filestem],stdout=f)
should work. I am not able to test it, so please run it and report whether it works as intended.
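The same idea, wrapped in a function that can be tried with any command. This is a sketch only, and the name run_to_file is hypothetical.

```python
import subprocess


def run_to_file(cmd, output_path):
    """Run `cmd` with its standard output sent to `output_path`.

    Returns the exit code of the command.
    """
    with open(output_path, 'wb') as f:
        return subprocess.run(cmd, stdout=f).returncode
```

For the question above, the call would be run_to_file(['/home/dev/executable_file', filestem], 'outputfile'), which plays the role of the shell's > redirection.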
You have several options:
NOTE - Tested in CentOS 7, using Python 2.7
1. Try pexpect:
"""Usage: executable_file argument ("ex. stack.py -lh")"""
import sys
import pexpect
filestem = sys.argv[1]
# Using ls -lh >> outputfile as an example
cmd = "ls {0} >> outputfile".format(filestem)
command_output, exitstatus = pexpect.run("/usr/bin/bash -c '{0}'".format(cmd), withexitstatus=True)
if exitstatus == 0:
    print(command_output)
else:
    print("Houston, we've had a problem.")
2. Run subprocess with shell=True (not recommended):
"""Usage: executable_file argument ("ex. stack.py -lh")"""
import sys
import subprocess
filestem = sys.argv[1]
# Using ls -lh >> outputfile as an example
cmd = "ls {0} >> outputfile".format(filestem)
# with shell=True, pass the command as a single string, not a split list
result = subprocess.check_output(cmd, shell=True) # or subprocess.call(cmd, shell=True)
print(result)
It works, but python.org frowns upon this, due to the chance of a shell injection: see "Security Considerations" in the subprocess documentation.
3. If you must use subprocess, run each command separately and pipe the STDOUT of the previous command into the STDIN of the next command:
p1 = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
p2 = subprocess.Popen(cmd2, stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # so that p1 gets SIGPIPE if p2 exits first
stdout_data, stderr_data = p2.communicate()
etc...
Good luck with your code!
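Option 3 can be sketched as a small helper that connects two commands without a shell. The name pipe_two is hypothetical, and the sketch handles exactly two commands.

```python
import subprocess


def pipe_two(cmd1, cmd2):
    """Run the equivalent of `cmd1 | cmd2` and return cmd2's stdout as bytes."""
    p1 = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(cmd2, stdin=p1.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so p1 sees SIGPIPE if p2 exits early.
    p1.stdout.close()
    output = p2.communicate()[0]
    p1.wait()
    return output
```

Each command is given as an argument list, so no shell parsing (and no shell injection risk) is involved.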

Pass_fds alternative in python 2.7

I am currently using popen to call a Unix command which accepts multiple files as arguments and instead of using files I would like to pass the data from memory as a variable/file object. With this command actual files need to be specified with the command as it does not read them from STDIN. I can pass one file to the command by using '/dev/fd/0' as an argument and passing the contents of the file to STDIN, via communicate() but I am looking for a way to pass multiple files.
I believe I need to use file descriptors here in order to achieve this and from looking I can see python 3+ has an option called pass_fds, but no such option exists in 2.7.
Is there any way to do this in python 2.7, I guess you'd need to use os.pipe perhaps?
Thanks
I am sure there's a much better way of doing this, but I managed to do what I needed:
from subprocess import PIPE, Popen
import os
fakefiles = []
fd2 = 10 # Arbitrary starting fd number
fakefiles.append("""The entire
content of
file one
""")
fakefiles.append("content of file two")
def fd_file_list(fd, maxrange):
    fdlist = []
    for i in range(0, maxrange):
        fdlist.append('/dev/fd/' + str(fd))
        fd += 1
    return fdlist
def create_fds(fd, files):
    for content in files:
        r, w = os.pipe()
        w = os.fdopen(w, 'w')
        w.write(content)
        w.close()
        os.dup2(r, fd)
        fd += 1
fd_files = fd_file_list(fd2, len(fakefiles))
# Note: create_fds() is called right here, in the parent, and preexec_fn receives its
# return value (None); the child simply inherits the descriptors set up by that call.
p2 = Popen(['/home/pi/myscript.sh'] + fd_files, stdin=PIPE, stdout=PIPE, stderr=PIPE, preexec_fn=create_fds(fd2, fakefiles))
out, err = p2.communicate()
print out
Where the content of /home/pi/myscript.sh is:
#!/bin/bash
((!$#)) && exit
for i; do
    echo -e "\n\nfile is $i"
    cat "$i"
done
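For readers on Python 3, the pass_fds option mentioned in the question makes this shorter. The following is a sketch under some assumptions: the helper name run_with_fd_inputs is hypothetical, the platform provides /dev/fd, and each payload is small enough to fit in the pipe buffer (larger data would need a writer thread or a temporary file).

```python
import os
import subprocess


def run_with_fd_inputs(command, contents):
    """Run `command` with one /dev/fd/N argument per bytes payload in `contents`.

    Returns the command's standard output as bytes.
    """
    read_fds = []
    try:
        for data in contents:
            r, w = os.pipe()
            read_fds.append(r)
            os.write(w, data)  # assumes the payload fits in the pipe buffer
            os.close(w)        # closing the write end marks end-of-file
        args = list(command) + ['/dev/fd/%d' % fd for fd in read_fds]
        # pass_fds keeps exactly these descriptors open in the child.
        result = subprocess.run(args, pass_fds=read_fds,
                                stdout=subprocess.PIPE, check=True)
        return result.stdout
    finally:
        for fd in read_fds:
            os.close(fd)
```

With this helper, run_with_fd_inputs(['/home/pi/myscript.sh'], [b'file one', b'file two']) would hand the script two readable paths, and nothing is written to disk.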

Redirecting shell command output to a file does not work using subprocess.Popen in Python

I am using Python 2.6.6 and failed to re-direct the Beeline(Hive) SQL query output returning multiple rows to a file on Unix using ">". For simplicity's sake, I replaced the SQL query with simple "ls" command on current directory and outputting to a text file.
Please ignore syntax of function sendfile. I want help to tweak the function "callcmd" to pipe the stdout onto the text file.
def callcmd(cmd, shl):
    logging.info('> '+' '.join(map(str,cmd)))
    #return 0;
    start_time = time.time()
    command_process = subprocess.Popen(cmd, shell=shl, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True)
    command_output = command_process.communicate()[0]
    logging.info(command_output)
    elapsed_time = time.time() - start_time
    logging.info(time.strftime("%H:%M:%S",time.gmtime(elapsed_time))+' = time to complete (hh:mm:ss)')
    if (command_process.returncode != 0):
        logging.error('ERROR ON COMMAND: '+' '.join(map(str,cmd)))
        logging.error('ERROR CODE: '+str(command_process.returncode))
    return command_process.returncode
cmd=['ls', ' >', '/home/input/xyz.txt']
ret_code = callcmd(cmd, False)
Your command (i.e. cmd) could be ['sh', '-c', 'ls > ~/xyz.txt']. That would mean that the output of ls is never passed to Python, it happens entirely in the spawned shell – so you can't log the output. In that case, I'd have used return_code = subprocess.call(cmd), no need for Popen and communicate.
Equivalently, assuming you use bash or similar, you can simply use
subprocess.call('ls > ~/test.txt', shell=True)
If you want to access the output, e.g. for logging, you could use
s = subprocess.check_output(['ls'])
and then write that to a file like you would regularly in Python. To check for a non-zero exit code, handle the CalledProcessError that is raised in such cases.
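A sketch of that last approach, with the error handling included. The function name capture_to_file is hypothetical, and subprocess.check_output requires Python 2.7 or later.

```python
import subprocess


def capture_to_file(cmd, output_path):
    """Capture cmd's stdout in Python, write it to output_path, return the exit code."""
    try:
        output = subprocess.check_output(cmd)
    except subprocess.CalledProcessError as exc:
        # exc.output holds whatever the command printed before failing.
        return exc.returncode
    with open(output_path, 'wb') as f:
        f.write(output)
    return 0
```

Because the output passes through Python, it can be logged as well as written, which the shell redirection variants do not allow.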
Here the stdout in command_output is written to a file. You don't need to use any redirection, although an alternative might be to have the Python script print to stdout and then redirect that to a file in your shell.
#!/usr/bin/python
import logging
import subprocess
cmd=['ls']
# no shell is needed for a plain argument list
command_process = subprocess.Popen(
    cmd,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True
)
command_output = command_process.communicate()[0]
if (command_process.returncode != 0):
    logging.error('ERROR ON COMMAND: '+' '.join(map(str,cmd)))
    logging.error('ERROR CODE: '+str(command_process.returncode))
f = open('listing.txt','w')
f.write(command_output)
f.close()
I added this piece of code to my code and it works fine. Thanks to @Snohdo
f = open('listing.txt','w')
f.write(command_output)
f.close()

Python subprocess: pipe an image blob to imagemagick shell command

I have an image in-memory and I wish to execute the convert method of imagemagick using Python's subprocess. While this line works well using Ubuntu's terminal:
cat image.png | convert - new_image.jpg
This piece of code doesn't work using Python:
jpgfile = Image.open('image.png');
proc = Popen(['convert', '-', 'new_image.jpg'], stdin=PIPE, shell=True)
print proc.communicate(jpgfile.tostring())
I've also tried reading the image as a regular file without using PIL, I've tried switching between subprocess methods and different ways to write to stdin.
The best part is, nothing is happening but I'm not getting a real error. When printing stdout I can see imagemagick help on terminal, followed by the following:
By default, the image format of `file' is determined by its magic
number. To specify a particular image format, precede the filename
with an image format name and a colon (i.e. ps:image) or specify the
image type as the filename suffix (i.e. image.ps). Specify 'file' as
'-' for standard input or output. (None, None)
Maybe there's a hint in here I'm not getting.
Please point me in the right direction, I'm new to Python but from my experience with PHP this should be an extremely easy task, or so I hope.
Edit:
This is the solution I eventually used to process PIL image object without saving a temporary file. Hope it helps someone. (in the example I'm reading the file from the local drive, but the idea is to read an image from a remote location)
out = StringIO()
jpgfile = Image.open('image.png')
jpgfile.save(out, 'png', quality=100);
out.seek(0);
proc = Popen(['convert', '-', 'image_new.jpg'], stdin=PIPE)
proc.communicate(out.read())
It is not subprocess that is causing the issue; what you are passing to imagemagick is incorrect. tostring() does get passed to imagemagick, but it is raw pixel data, not an encoded image file that convert can identify. If you actually want to replicate the Linux command, you can pipe from one process to another:
from subprocess import Popen,PIPE
proc = Popen(['cat', 'image.jpg'], stdout=PIPE)
p2 = Popen(['convert', '-', 'new_image.jpg'],stdin=proc.stdout)
proc.stdout.close()
out,err = p2.communicate()
print(out)
When you pass a list of args you don't need shell=True, if you wanted to use shell=True you would pass a single string:
from subprocess import check_call
check_call('cat image.jpg | convert - new_image.jpg',shell=True)
Generally I would avoid shell=True. This answer outlines what exactly shell=True does.
You can also pass a file object to stdin:
with open('image.jpg', 'rb') as jpgfile:
    proc = Popen(['convert', "-", 'new_image.jpg'], stdin=jpgfile)
    out, err = proc.communicate()
    print(out)
But as there is no output when the code runs successfully you can use check_call which will raise a CalledProcessError if there is a non-zero exit status which you can catch and take the appropriate action:
from subprocess import check_call, CalledProcessError
with open('image.jpg', 'rb') as jpgfile:
    try:
        check_call(['convert', "-", 'new_image.jpg'], stdin=jpgfile)
    except CalledProcessError as e:
        print(e)
If you wanted to write to stdin using communicate you could also pass the file contents using .read:
with open('image.jpg', 'rb') as jpgfile:
    proc = Popen(['convert', '-', 'new_image.jpg'], stdin=PIPE)
    proc.communicate(jpgfile.read())
If you don't want to store the image on disk, use a tempfile:
import tempfile
import requests
r = requests.get("http://www.reptileknowledge.com/images/ball-python.jpg")
out = tempfile.TemporaryFile()
out.write(r.content)
out.seek(0)
from subprocess import check_call
check_call(['convert',"-", 'new_image.jpg'], stdin=out)
Using a cStringIO.StringIO object (Python 2):
import cStringIO
import requests
r = requests.get("http://www.reptileknowledge.com/images/ball-python.jpg")
out = cStringIO.StringIO(r.content)
from subprocess import check_call,Popen,PIPE
p = Popen(['convert',"-", 'new_image.jpg'], stdin=PIPE)
p.communicate(out.read())

executing a subprocess from python

I think something is getting subtly mangled when I attempt to execute a subprocess from a Python script.
I attempt to execute vlc with a lot of arguments.
The instance of vlc that starts complains:
Your input can't be opened:
VLC is unable to open the MRL ' -vvv rtsp://192.168.1.201:554/ch0_multicast_one --sout=#transcode{acodec=none}:duplicate{dst=rtp{sdp=rtsp://:5544/user_hash.sdp},dst=display} :no-sout-rtp-sap :no-sout-standard-sap :ttl=1 :sout-keep'. Check the log for details.
Here is the python code
pid = subprocess.Popen(["vlc "," -vvv rtsp://%s" % target_nvc.ip_address + ":554/ch0_multicast_one --sout=#transcode{acodec=none}:duplicate{dst=rtp{sdp=rtsp://:5544/user_hash.sdp},dst=display} :no-sout-rtp-sap :no-sout-standard-sap :ttl=1 :sout-keep" ], stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)
I have examined the output of the subprocess function (using a shell), and if I copy paste that string into my cmd window, the vlc instance works fine... Is this a privilege thing?
Since you're passing a list to subprocess.Popen, each parameter must be in its own element. So you'd want something like:
pid = subprocess.Popen([
    "vlc",
    "-vvv",
    "rtsp://%s:554/ch0_multicast_one" % target_nvc.ip_address,
    # etc
], ...)
Each parameter (that the shell would normally parse apart for you) must be in a separate list element.
You can also pass a single command line string and let the shell pull it apart:
pid = subprocess.Popen("vlc -vvv rtsp://...", shell=True, ...)
Using the first form is better for commands that have lots of arguments.
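To see how a shell-style command line maps onto separate list elements, the standard library's shlex.split can do the splitting. This is an illustrative sketch: the address 192.0.2.1 is a placeholder, not the address from the question, and the command line is shortened.

```python
import shlex

# A shortened, hypothetical command line in the style of the question.
command_line = 'vlc -vvv "rtsp://192.0.2.1:554/ch0_multicast_one" :ttl=1 :sout-keep'

# Each whitespace-separated (or quoted) token becomes its own list element,
# which is the form subprocess.Popen expects when no shell is used.
args = shlex.split(command_line)
print(args)
# ['vlc', '-vvv', 'rtsp://192.0.2.1:554/ch0_multicast_one', ':ttl=1', ':sout-keep']
```

The resulting list can be passed straight to subprocess.Popen(args, ...).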
You should use this...
pid = subprocess.Popen(["vlc", "-vvv",
"rtsp://%s" % target_nvc.ip_address + ":554/ch0_multicast_one",
"--sout=#transcode{acodec=none}:duplicate{dst=rtp{sdp=rtsp://:5544/user_hash.sdp},dst=display}",
":no-sout-rtp-sap", ":no-sout-standard-sap",
":ttl=1", ":sout-keep" ], stdout=subprocess.PIPE,
stderr=subprocess.PIPE, stdin=subprocess.PIPE)
import difflib
import glob
import subprocess
movies_path = glob.glob(r"D:\MOVIES\**\*\*\*.mp4", recursive=True) + \
    glob.glob(r"D:\MOVIES\**\*\*\*.mkv", recursive=True) + \
    glob.glob(r"D:\MOVIES\**\*\*\*.avi", recursive=True)
# probably the right movie (which_movie is the title to look for, defined elsewhere)
rightMoviePath = difflib.get_close_matches(which_movie, movies_path, len(movies_path), 0)
movie_name = rightMoviePath[0].split("\\")[-1]
hebrew_subtitle_path = glob.glob(rightMoviePath[0].replace(movie_name, "Hebrew.srt"))[0]
english_subtitle_path = glob.glob(rightMoviePath[0].replace(movie_name, "English.srt"))[0]
# Popen returns a single object, so it cannot be unpacked into two names
process = subprocess.Popen(["C:\\Users\\yonat\\Downloads\\VLC\\vlc.exe", "--sub-file", hebrew_subtitle_path, rightMoviePath[0]],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
