Calling 'mv' from Python Popen with wildcard - python

I can't seem to get the 'mv' command to work from Python subprocess.Popen with a wildcard.
The code:
def moveFilesByType(source, destination, extension):
    params = []
    params.append("mv")
    params.append(source + "/*." + extension)
    params.append(destination + "/")
    print params
    pipe = subprocess.Popen(params, shell=True, stdout=PIPE)
    result, err = pipe.communicate()
    return result
The output from print params:
['mv', '/full_path_to_folder_source/*.nib', '/full_path_to_folder_target/']
The paths here are shortened just for readability, but I assure you they are valid. Running this exact command from a terminal works, but calling it from Python gives the standard usage message about improper use of mv:
usage: mv [-f | -i | -n] [-v] source target
mv [-f | -i | -n] [-v] source ... directory
I read that in order for wildcards to work, I would need the parameter shell=True in the Popen call, which is present. Any ideas why this doesn't work? Removing shell=True ends up treating the asterisk as a literal character, as expected.

Use a string instead of an array:
params = "mv /full_path_to_folder_source/*.nib /full_path_to_folder_target/"
When you pass a list with shell=True on POSIX systems, only the first element ('mv' here) is taken as the command string; the remaining items become arguments to the shell itself, not to mv. The wildcard is therefore never handed to the shell for expansion. Pass the whole command as a single string and the shell will expand '/full_path_to_folder_source/*.nib' before mv ever sees it.
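A minimal sketch of the corrected function under that approach (same names as in the question; note that a shell command built by string concatenation breaks if the paths contain spaces or untrusted input):
import subprocess

def moveFilesByType(source, destination, extension):
    # One command string: with shell=True the shell expands the wildcard.
    cmd = "mv " + source + "/*." + extension + " " + destination + "/"
    pipe = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
    result, err = pipe.communicate()
    return result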

You can do it without starting a new process by using the glob and shutil modules:
import glob
import shutil
def moveFilesByType(source, destination, extension):
    for path in glob.glob(source + "/*." + extension):
        shutil.move(path, destination)

You shouldn't need to use subprocess for this; check out shutil.copytree.
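Note that shutil.copytree copies rather than moves, and by default copies everything, so you would need its ignore parameter to keep only one extension. A sketch, assuming .nib files as in the question (the destination must not already exist on Python < 3.8):
import os
import shutil

def ignore_non_nib(directory, names):
    # Ignore everything that is neither a directory nor a *.nib file.
    return [n for n in names
            if not n.endswith('.nib')
            and not os.path.isdir(os.path.join(directory, n))]

shutil.copytree('/full_path_to_folder_source',
                '/full_path_to_folder_target',
                ignore=ignore_non_nib)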

Related

SyntaxError: missing ; before statement - Python

Let's say I have this snippet:
list_command = 'mongo --host {host} --port {port} ' \
               '--username {username} --password {password} --authenticationDatabase {database} < {path}'
def shell_exec(cmd: str):
    import subprocess
    p = subprocess.call(cmd, shell=True)
    return p
Let's say these are the commands I'm trying to run in the mongo shell:
use users
show collections
db.base.find().pretty()
If I format the string list_command with the appropriate values and pass it to the function with shell=True, it works fine, but I'm trying to avoid shell=True for security reasons.
If I call it with shell=False, I get the following error:
2020-08-31T14:08:49.291+0100 E QUERY [thread1] SyntaxError: missing ; before statement #./mongo/user-01-09-2020:1:4
failed to load: ./mongo/user-01-09-2020
253
Your list_command is a shell command: in particular, it includes input redirection (via < {path}), which is a syntactic feature of the shell. To use it you need shell=True.
If you don't want to use shell=True, you need to change the way you construct the arguments (separate arguments need to be passed as separate items of a list rather than as a single string), and you need to pass the script to standard input via an explicit pipe, by setting the input parameter:
cmd = ['mongo', '--host', '{host}', '--port', …]
subprocess.run(cmd, input=mongodb_script)
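A fuller sketch of that approach, with hypothetical connection values filled in for illustration (text=True, the Python 3.7+ spelling of universal_newlines, lets input be a str rather than bytes):
import subprocess

cmd = ['mongo', '--host', 'localhost', '--port', '27017',
       '--username', 'user', '--password', 'secret',
       '--authenticationDatabase', 'users']

# Feed the script to mongo's standard input instead of using < redirection.
with open('./mongo/user-01-09-2020') as f:
    subprocess.run(cmd, input=f.read(), text=True, check=True)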
Using input raised the following error: TypeError: __init__() got an unexpected keyword argument 'input'.
I ended up doing the following:
import subprocess
def shell_exec(cmd: str, stdin=None):
    with open(stdin, 'rb') as f:
        return subprocess.call(cmd.split(), stdin=f)
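One caveat with that workaround: str.split breaks on any whitespace, so an argument containing spaces (a quoted password, say) would be mangled. shlex.split understands shell-style quoting; a sketch:
import shlex
import subprocess

def shell_exec(cmd: str, stdin=None):
    # shlex.split keeps quoted arguments such as --password 'p w' intact.
    with open(stdin, 'rb') as f:
        return subprocess.call(shlex.split(cmd), stdin=f)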

Run sed for a subset of arguments at a time

I am currently running sed in a python subprocess, however I am receiving the error:
"OSError: [Errno 7] Argument list too long: 'sed'"
The Python code is:
subprocess.run(['sed', '-i',
                '-e', 's/#/pau/g',
                *glob.glob('label_POS/label_phone_align/dump/*')], check=True)
Where the /dump/ directory has ~13,000 files in it. I have been told that I need to run the command for subsets of the argument list, but I can't find out how to do that.
Whoever told you that probably meant that you need to split up the glob and run multiple separate commands:
files = glob.glob('label_POS/label_phone_align/dump/*')
i = 0
scale = 100
# process in units of 100 filenames until we have them all
while scale*i < len(files):
    subprocess.run(['sed', '-i',
                    '-e', 's/#/pau/g',
                    *files[scale*i:scale*(i+1)]], check=True)
    i += 1
and then amalgamate all that output however you need after the fact. I don't know how many arguments sed can accept on the command line, but it's apparently fewer than 13,000. You can keep adjusting scale until it no longer errors.
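The same slicing can be factored into a small reusable helper; a sketch (the batch size of 100 is arbitrary):
import glob
import subprocess

def chunked(seq, size):
    # Yield successive slices of at most `size` items.
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

files = glob.glob('label_POS/label_phone_align/dump/*')
for batch in chunked(files, 100):
    subprocess.run(['sed', '-i', '-e', 's/#/pau/g', *batch], check=True)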
Please scroll down to the end of this answer for the solution I recommend for your specific problem. There's a bit of background here for context and/or future visitors grappling with other "argument list too long" errors.
The exec() system call has a size limit; you cannot pass more than ARG_MAX bytes as arguments to a process, where this system constant's value can usually be queried with the getconf ARG_MAX command on modern systems.
import glob
import subprocess

arg_max = subprocess.run(['getconf', 'ARG_MAX'],
                         text=True, check=True, capture_output=True
                         ).stdout.strip()
arg_max = int(arg_max)
cmd = ['sed', '-i', '-e', 's/#/pau/g']
files = glob.glob('label_POS/label_phone_align/dump/*')
while files:
    # Start from the size of the base command, then add file names
    # (plus one byte each for the separator) until the next one
    # would push the total past ARG_MAX.
    base = sum(len(x) for x in cmd) + len(cmd)
    for l in range(len(files)):
        base += 1 + len(files[l])
        if base > arg_max:
            l -= 1
            break
    subprocess.run(cmd + files[0:l+1], check=True)
    files = files[l+1:]
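As an aside, Python can query the same constant without spawning getconf; a sketch using os.sysconf (POSIX only), leaving some headroom since environment variables also count against the ARG_MAX budget:
import os

# SC_ARG_MAX is the same constant that `getconf ARG_MAX` reports.
arg_max = os.sysconf('SC_ARG_MAX') - 2048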
Of course, the xargs command already does exactly this for you.
import subprocess
import glob
subprocess.run(
    ['xargs', '-r', '-0', 'sed', '-i', '-e', 's/#/pau/g'],
    input=b'\0'.join([x.encode() for x in glob.glob('label_POS/label_phone_align/dump/*') + ['']]),
    check=True)
Simply removing the long path might be enough in your case, though. You are repeating label_POS/label_phone_align/dump/ in front of every file name in the argument array.
import glob
import subprocess
import os
path = 'label_POS/label_phone_align/dump'
files = [os.path.basename(file)
         for file in glob.glob(os.path.join(path, '*'))]
subprocess.run(
    ['sed', '-i', '-e', 's/#/pau/g', *files],
    cwd=path, check=True)
Alternatively, perhaps prefer a pure Python solution:
import glob
import fileinput
for line in fileinput.input(glob.glob('label_POS/label_phone_align/dump/*'), inplace=True):
    # Each line keeps its trailing newline, so suppress print's own.
    print(line.replace('#', 'pau'), end='')

Running vulture from a python script

I'm trying to find a way to run vulture (which finds unused code in python projects) inside a python script.
vulture documentation can be found here:
https://pypi.org/project/vulture/
Does anyone know how to do it?
The only way I know to use vulture is by shell commands.
I tried to run the shell commands from the script using the subprocess module, something like this:
process = subprocess.run(['vulture', '.'], check=True,
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                         universal_newlines=True)
which I thought would have the same effect as running the shell command "vulture ."
but it doesn't work.
Can anyone help?
Thanks
Vulture dev here.
The Vulture package exposes an API, called scavenge - which it uses internally for running the analysis after parsing command line arguments (here in vulture.main).
It takes in a list of Python files/directories. For each directory, Vulture analyzes all contained *.py files.
To analyze the current directory:
import vulture
v = vulture.Vulture()
v.scavenge(['.'])
If you just want to print the results to stdout, you can call:
v.report()
However, it's also possible to perform custom analysis/filters over Vulture's results. The method vulture.get_unused_code returns a list of vulture.Item objects - which hold the name, type and location of unused code.
For the sake of this answer, I'm just gonna print the names of all unused objects:
for item in v.get_unused_code():
    print(item.name)
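If you only want high-confidence findings, get_unused_code also accepts a min_confidence argument mirroring the CLI's --min-confidence flag; a sketch (typ and first_lineno are assumed here to be the Item fields holding the kind and location of the unused code):
# Keep only findings vulture is at least 80% confident about.
for item in v.get_unused_code(min_confidence=80):
    print(item.typ, item.name, item.first_lineno)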
For more info, see - https://github.com/jendrikseipp/vulture
I see you want to capture the output shown in the console. The code below might help:
import tempfile
import subprocess
def run_command(args):
    with tempfile.TemporaryFile() as t:
        try:
            out = subprocess.check_output(args, shell=True, stderr=t)
            t.seek(0)
            # check_output and t.read() return bytes on Python 3, so decode them.
            console_output = '--- Provided Command: --- ' + '\n' + args + '\n' + t.read().decode() + out.decode() + '\n'
            return_code = 0
        except subprocess.CalledProcessError as e:
            t.seek(0)
            console_output = '--- Provided Command: --- ' + '\n' + args + '\n' + t.read().decode() + e.output.decode() + '\n'
            return_code = e.returncode
        return return_code, console_output
Your expected output will be displayed in console_output
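For example, a hypothetical call capturing vulture's report:
rc, console_output = run_command('vulture .')
print(console_output)  # the command, its stderr, and its stdout, in that order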
Link:
https://docs.python.org/3/library/subprocess.html

output the command line called by subprocess?

I'm using the subprocess.Popen call, and in another question I found out that I had been misunderstanding how Python was generating arguments for the command line.
My Question
Is there a way to find out what the actual command line was?
Example Code :-
proc = subprocess.Popen(....)
print "the commandline is %s" % proc.getCommandLine()
How would you write getCommandLine ?
It depends on the version of Python you are using. In Python 3.3+, the args are saved in proc.args:
proc = subprocess.Popen(....)
print("the commandline is {}".format(proc.args))
In Python 2.7, the args are not saved; they are just passed on to other functions like _execute_child. So, in that case, the best way to get the command line is to save it when you have it:
proc = subprocess.Popen(shlex.split(cmd))
print "the commandline is %s" % cmd
Note that if you have the list of arguments (such as the list returned by shlex.split(cmd)), you can recover the command-line string cmd using the undocumented function subprocess.list2cmdline:
In [14]: import subprocess
In [15]: import shlex
In [16]: cmd = 'foo -a -b --bar baz'
In [17]: shlex.split(cmd)
Out[17]: ['foo', '-a', '-b', '--bar', 'baz']
In [18]: subprocess.list2cmdline(['foo', '-a', '-b', '--bar', 'baz'])
Out[18]: 'foo -a -b --bar baz'
The correct answer to my question is actually that there IS no command line. The point of subprocess is that it does everything through IPC. list2cmdline comes as close as can be expected, but in reality the best thing to do is look at the args list, and just know that it will be argv in the called program.
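On POSIX systems, Python 3.8+ also offers shlex.join, the inverse of shlex.split, which quotes for a POSIX shell (list2cmdline follows Windows quoting rules); a sketch:
import shlex

args = ['foo', '-a', '--bar', 'b az']
# shlex.join quotes each argument so the string round-trips through a shell.
print(shlex.join(args))                       # foo -a --bar 'b az'
print(shlex.split(shlex.join(args)) == args)  # True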
Beautiful and scalable method
I have been using something like this:
#!/usr/bin/env python3
import os
import shlex
import subprocess
import sys

def run_cmd(cmd, cwd=None, extra_env=None, extra_paths=None, dry_run=False):
    if extra_env is None:
        extra_env = {}
    newline_separator = ' \\\n'
    out = []
    kwargs = {}
    env = os.environ.copy()
    # cwd
    if cwd is not None:
        kwargs['cwd'] = cwd
    # extra_env
    env.update(extra_env)
    for key in extra_env:
        out.append('{}={}'.format(shlex.quote(key), shlex.quote(extra_env[key])) + newline_separator)
    # extra_paths
    if extra_paths is not None:
        path = ':'.join(extra_paths)
        if 'PATH' in env:
            path += ':' + env['PATH']
        env['PATH'] = path
        out.append('PATH="{}:${{PATH}}"'.format(':'.join(extra_paths)) + newline_separator)
    # Command itself.
    for arg in cmd:
        out.append(shlex.quote(arg) + newline_separator)
    # Print and run.
    kwargs['env'] = env
    print('+ ' + ' '.join(out) + ';')
    if not dry_run:
        subprocess.check_call(cmd, **kwargs)

run_cmd(
    sys.argv[1:],
    cwd='/bin',
    extra_env={'ASDF': 'QW ER'},
    extra_paths=['/some/path1', '/some/path2']
)
Sample run:
./a.py echo 'a b' 'c d'
Output:
+ ASDF='QW ER' \
PATH="/some/path1:/some/path2:${PATH}" \
echo \
'a b' \
'c d' \
;
a b c d
Feature summary:
makes huge command lines readable with one option per line
adds a + before commands, like sh -x, so users can easily tell commands apart from their output
shows cwd and extra environment variables if they are given to the command; these are only printed when given, generating a minimal shell command
All of this allows users to easily copy the commands and run them manually if something fails, or to see what is going on.
Tested on Python 3.5.2, Ubuntu 16.04. GitHub upstream.
You can see it by passing the process id to the ps command, if you are on a POSIX OS:
import subprocess
proc = subprocess.Popen(["ls", "-la"])
subprocess.Popen(["ps", "-p", str(proc.pid)])
Output (see the CMD column):
PID TTY TIME CMD
7778 ttys004 0:00.01 ls -la
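On Linux you can also read the command line straight from /proc without spawning ps; argv entries are separated by NUL bytes there (read promptly, before the child exits):
import subprocess

proc = subprocess.Popen(["ls", "-la"])
# /proc/<pid>/cmdline holds argv joined and terminated by NUL bytes.
with open('/proc/%d/cmdline' % proc.pid, 'rb') as f:
    argv = f.read().split(b'\0')[:-1]
print([a.decode() for a in argv])  # ['ls', '-la']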
On Windows, I used @catwith's trick (thanks, btw):
wmic process where "name like '%mycmd%'" get processid,commandline
where "mycmd" is a part of the command unique to yours (used to filter out irrelevant system commands).
That's how I revealed another bug in the subprocess vs. Windows saga: one of the arguments I had had its double quotes escaped à la Unix! \"asdasd\"

calling rsync from python subprocess.call

I'm trying to execute rsync over ssh from a subprocess in a python script to copy images from one server to another. I have a function defined as:
def rsyncBookContent(bookIds, serverEnv):
    bookPaths = ""
    if len(bookIds) > 1:
        bookPaths = "{" + ",".join(("book_" + str(x)) for x in bookIds) + "}"
    else:
        bookPaths = "book_" + str(bookIds[0])
    for host in serverEnv['content.hosts']:
        args = ["rsync", "-avz", "--include='*/'", "--include='*.jpg'", "--exclude='*'", "-e", "ssh", options.bookDestDir + "/" + bookPaths, "jill@" + host + ":/home/jill/web/public/static/"]
        print "executing " + ' '.join(args)
        subprocess.call(args)
What I'm ultimately trying to do is have Python execute this (which works from a bash shell):
rsync -avz --include='*/' --include='*.jpg' --exclude='*' -e ssh /shared/books/{book_482,book_347} jill@10.12.27.20:/home/jill/web/public/static/
And indeed my print statement outputs:
executing rsync -avz --include='*/' --include='*.jpg' --exclude='*' -e ssh /shared/books/{book_482,book_347} jill@10.12.27.20:/home/jill/web/public/static/
But when executed from within this python script, there are two problems:
if len(bookIds) > 1, the list of sub-directories under /shared/books/ is somehow misinterpreted by bash or rsync. The error message is:
rsync: link_stat "/shared/books/{book_482,book_347}" failed: No such file or directory (2))
if len(bookIds) == 1, all files under the source directory are rsynced (not just *.jpg, as is my intention)
Seems as if the subprocess.call function requires some characters to be escaped or something, no?
Figured out my issues. My problems were the result of my misunderstanding of how the subprocess.call function executes and bash's expansion of lists inside curly braces.
When I was issuing the rsync command in a bash shell with subdirectories in curly braces, bash was really expanding that into multiple arguments which were being passed to rsync (/shared/books/book_1 /shared/books/book_2, etc.). When passing the same string with curly braces "/shared/books/{book_1,book_2}" to the subprocess.call function, the expansion wasn't happening, since it wasn't going through bash, so my argument to rsync was literally "/shared/books/{book_1,book_2}".
Similarly, the single quotes around the file patterns ('*', '*.jpg', etc.) work on the bash command line (only the values inside the single quotes are passed to rsync), but inside subprocess.call, the single quotes are passed to rsync as the file pattern ("'*.jpg'").
New (working) code looks like this:
def rsyncBookContent(bookIds, serverEnv):
    bookPaths = []
    for b in bookIds:
        bookPaths.append(options.bookDestDir + "/book_" + str(b))
    args = []
    for host in serverEnv['content.hosts']:
        # copy all *.jpg files via ssh
        args = ["rsync", "-avz", "--include", "*/", "--include", "*.jpg", "--exclude", "*", "-e", "ssh"]
        args.extend(bookPaths)
        args.append("jill@" + host + ":/home/jill/web/public/static/")
        print "executing " + ' '.join(args)
        subprocess.call(args)
