How to pass parameters from xargs to python script? - python

I have a command.list file containing command-line parameters for my Python script my_script.py, which takes 3 parameters.
One line of it looks like:
<path1> <path2> -sc 4
It doesn't work like this, presumably because the parameters need to be split?
cat command.list | xargs -I {} python3 my_script.py {}
How can I split each line into parameters and pass them to the Python script?

What about cat command.list | xargs -L 1 python3 my_script.py? This passes one line at a time (-L 1) to your script.

The documentation of -I from man xargs
-I replace-str
Replace occurrences of replace-str in the initial-arguments with names read from standard input. Also, unquoted blanks do not terminate input items; instead the separator is the newline character. Implies -x and -L 1.
What you want is
xargs -L1 python3 my_script.py
By the way: cat is not necessary. Use one of the following commands
< command.list xargs -L1 python3 my_script.py
xargs -a command.list -L1 python3 my_script.py
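To see why -I {} behaves differently from -L 1, here is a small self-contained demo (using sh -c 'echo "$#"' as a hypothetical stand-in for my_script.py) that prints how many arguments each invocation receives:

```shell
# One line in the same shape as command.list
printf '%s\n' '/tmp/a /tmp/b -sc 4' > demo.list

# -L 1: the line is split on blanks into 4 separate arguments
xargs -L 1 sh -c 'echo "$#"' argv0 < demo.list      # prints 4

# -I {}: the whole line is substituted as ONE argument
xargs -I {} sh -c 'echo "$#"' argv0 {} < demo.list  # prints 1
```

This is exactly why the -I form breaks the script: it receives one argument containing embedded spaces instead of three separate parameters.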

Not sure what you are trying to do with xargs -I {} python3 my_script.py {} there.
But perhaps you are looking for:
$ cat file
<path1> <path2> -sc 4
....
<path1n> <path2n> -sc 4
$ while read -r path1 path2 flag value; do python3 my_script.py "$path1" "$path2" "$flag" "$value"; done < file
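How read splits a line into fields can be seen in a tiny standalone sketch (the paths are placeholders):

```shell
# read splits on $IFS (whitespace by default); any extra fields
# would all land in the last variable
read -r path1 path2 flag value <<'EOF'
/tmp/a /tmp/b -sc 4
EOF
echo "path1=$path1 path2=$path2 flag=$flag value=$value"
# -> path1=/tmp/a path2=/tmp/b flag=-sc value=4
```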

Related

Python argument missed if I pass $ inside string as input parameter

I am trying to pass a shell command into one of my Python scripts so that I can run it against hosts. However, when I pass it on the command line, the $ is dropped for some reason. How can I avoid this?
I use parser and setup like this:
parser = argparse.ArgumentParser(description='check size', add_help=True)
parser.add_argument('-c', "--cmd", dest="cmd",type=str, help="Command to run Ex: df -h /boot/|grep -i boot|awk '{print \$4}' 2>/dev/null ")
Ex:
This is working with a backslash
python3.4 disk_file_check.py -e env -op chksize -c "df -h /boot/|grep -i boot|awk {'print \$4'} 2>/dev/null"
since anybody can use this script and all they should have to pass is the command they want to run. But I want it to work like this:
python3.4 disk_file_check.py -e env -op chksize -c "df -h /boot/|grep -i boot|awk {'print $4'} 2>/dev/null"
But when I do this, The string is turned into:
df -h /boot/|grep -i boot|awk '{print }' 2>/dev/null
As you can see, the $ is cut out, which gives me the wrong result...
Is there any setting in argparse that can handle this? Or should I try a different way? Thanks.
Thanks. I think I will not pass this through the command line, since the shell expands it. Thanks for pointing that out.
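The root cause can be reproduced without argparse at all: inside double quotes the shell expands $4 before Python ever runs, while single quotes (or a backslash) leave it alone. A minimal sketch:

```shell
# $4 is an unset positional parameter here, so it expands to nothing
echo "awk {'print $4'}"     # -> awk {'print '}

# Single quotes suppress expansion entirely
echo 'awk {print $4}'       # -> awk {print $4}

# Inside double quotes, a backslash protects the $
echo "awk {'print \$4'}"    # -> awk {'print $4'}
```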

Inserting bash script in python code

I am trying to execute a bash script from Python code. The bash script has some grep commands in a pipe inside a for loop. When I run the bash script by itself it gives no errors, but when I call it from the Python code it says: grep: write error.
The command that I call in python is:
subprocess.call("./change_names.sh",shell=True)
The bash script is:
#!/usr/bin/env bash
for file in *.bam;do new_file=`samtools view -h $file | grep -P '\tSM:' | head -n 1 | sed 's/.\+SM:\(.\+\)/\1/' | sed 's/\t.\+//'`;rename s/$file/$new_file.bam/ $file;done
What am I missing?
You should not use shell=True when you are running a simple command which doesn't require the shell for anything in the command line.
subprocess.call(["./change_names.sh"])
There are multiple problems in the shell script. Here is a commented refactoring.
#!/usr/bin/env bash
for file in *.bam; do
    # Use modern command substitution syntax; fix quoting
    new_file=$(samtools view -h "$file" |
        grep -P '\tSM:' |
        # refactor to a single sed script
        sed -n 's/.\+SM:\([^\t]\+\).*/\1/p;q')
    # Fix quoting some more; don't use rename
    mv "$file" "$new_file.bam"
done
grep -P doesn't seem to be necessary or useful here, but without an example of what the input looks like, I'm hesitant to refactor that into the sed script too. I hope I have guessed correctly what your sed version does with the \+ and \t escapes which aren't entirely portable.
This will still produce a warning that you are not reading all of the output from grep in some circumstances. A better solution is probably to refactor even more of this into your Python script.
import glob
import os
import subprocess

for file in glob.glob('*.bam'):
    # check_output returns bytes on Python 3; decode before splitting
    output = subprocess.check_output(['samtools', 'view', '-h', file]).decode()
    for line in output.split('\n'):
        if '\tSM:' in line:
            # take the text after the last 'SM:' up to the next tab
            dest = line.split('SM:')[-1].split('\t')[0] + '.bam'
            os.rename(file, dest)
            break
Try the following modification, which may fix your issue:
for file in *.bam; do new_file=`unbuffer samtools view -h $file | grep -P '\tSM:' | head -n 1 | sed 's/.\+SM:\(.\+\)/\1/' | sed 's/\t.\+//'`; rename s/$file/$new_file.bam/ $file; done
Or else redirect standard error to /dev/null, like below:
for file in *.bam; do new_file=`samtools view -h $file 2>/dev/null | grep -P '\tSM:' | head -n 1 | sed 's/.\+SM:\(.\+\)/\1/' | sed 's/\t.\+//'`; rename s/$file/$new_file.bam/ $file; done
Your actual issue is with the command samtools view -h $file. When you run the script from Python, you should provide the full path, like below:
/fullpath/samtools view -h $file

How to add argparse flags based on shell find results

I have a python script that accepts a -f flag, and appends multiple uses of the flag.
For example, if I run python myscript -f file1.txt -f file2.txt, I would have a list of files, files=['file1.txt', 'file2.txt']. This works great, but I am wondering how I can automatically use the results of a find command to append as many -f flags as there are files.
I've tried:
find ./ -iname '*.txt' -print0 | xargs python myscript.py -f
But it only grabs the first file.
With the caveat that this will fail if there are more files than will fit on a single command line (whereas xargs would run myscript.py multiple times, each with a subset of the full list of arguments):
#!/usr/bin/env bash
args=( )
while IFS= read -r -d '' name; do
    args+=( -f "$name" )
done < <(find . -iname '*.txt' -print0)
python myscript.py "${args[@]}"
If you want to do this safely in a way that tolerates an arbitrary number of filenames, you're better off using a long-form option -- such as --file rather than -f -- with the = separator allowing the individual name to be passed as part of the same argv entry, thus preventing xargs from splitting a filename apart from the sigil that precedes it:
#!/usr/bin/env bash
# This requires -printf, a GNU find extension
find . -iname '*.txt' -printf '--file=%p\0' | xargs -0 python myscript.py
...or, more portably (this also runs on macOS, albeit still requiring a shell -- such as bash -- that can handle NUL-delimited reads):
#!/usr/bin/env bash
# requires find -print0 and xargs -0; these extensions are available on BSD as well as GNU
find . -iname '*.txt' -print0 |
while IFS= read -r -d '' f; do printf '--file=%s\0' "$f"; done |
xargs -0 python myscript.py
Your title seems to imply that you can modify the script. In that case, use the nargs (number of args) option to allow more arguments for the -f flag:
parser = argparse.ArgumentParser()
parser.add_argument('--files', '-f', nargs='+')
args = parser.parse_args()
print(args.files)
Then you can use your find command easily:
$ find . -depth 1 | xargs python args.py -f
['./args.py', './for_clint', './install.sh', './sys_user.json']
Otherwise, if you can't modify the script, see @CharlesDuffy's answer.
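For comparison, a self-contained sketch of both argparse styles: action='append' (a plausible reconstruction of the repeated -f behavior the question describes) and the nargs='+' variant from this answer:

```python
import argparse

# Repeated flags: -f a.txt -f b.txt (the behavior the question describes)
append_parser = argparse.ArgumentParser()
append_parser.add_argument('-f', '--file', dest='files', action='append')
print(append_parser.parse_args(['-f', 'a.txt', '-f', 'b.txt']).files)
# -> ['a.txt', 'b.txt']

# One flag, many values: -f a.txt b.txt
nargs_parser = argparse.ArgumentParser()
nargs_parser.add_argument('-f', '--files', nargs='+')
print(nargs_parser.parse_args(['-f', 'a.txt', 'b.txt']).files)
# -> ['a.txt', 'b.txt']
```

Both end up with the same list; the nargs form is what lets a single xargs invocation pass many files after one -f.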

Why do I get different results if I run the command in Linux directly vs. running it via Python's os module?

I am doing two "similar" things:
(1) in Python:
import os
os.system('cat ...input | awk -f ...awk -v seed=$RANDOM')
(2) in a Linux terminal:
cat ...input | awk -f ...awk -v seed=$RANDOM
My awk script returns a randomized version of the input file. If I run it way (1) many times, the result is always the same (only one result), but if I run it way (2), I get a differently randomized file every time. What's wrong?
If I want to run this command in Python, how should I do it?
Thank you so much for your answers.
EDIT:
Adding the actual code:
(1) in python
import os
os.system("cat data/MD-00001-00000100.input | awk -f utils/add_random_real_weights.awk -v seed=$RANDOM")
(2) in linux:
cat data/MD-00001-00000100.input | awk -f utils/add_random_real_weights.awk -v seed=$RANDOM
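One likely explanation, offered as a guess: $RANDOM is a bash/ksh feature, but os.system runs its command through /bin/sh, where $RANDOM may expand to an empty string, so awk receives the same (missing) seed on every run. A sketch that generates the seed in Python instead (paths copied from the question):

```python
import os
import random

# $RANDOM yields 0..32767 in bash; reproduce that range in Python
seed = random.randint(0, 32767)
os.system("cat data/MD-00001-00000100.input | "
          f"awk -f utils/add_random_real_weights.awk -v seed={seed}")
```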

Use GNU parallel to parallelise a bash for loop

I have a for loop which runs a Python script ~100 times on 100 different input folders. The python script is most efficient on 2 cores, and I have 50 cores available. So I'd like to use GNU parallel to run the script on 25 folders at a time.
Here's my for loop (works fine, but is sequential of course), the python script takes a bunch of input variables including the -p 2 which runs it on two cores:
for folder in $(find /home/rob/PartitionFinder/ -maxdepth 2 -type d); do
    python script.py --raxml --quick --no-ml-tree $folder --force -p 2
done
and here's my attempt to parallelise it, which does not work:
folders=$(find /home/rob/PartitionFinder/ -maxdepth 2 -type d)
echo $folders | parallel -P 25 python script.py --raxml --quick --no-ml-tree {} --force -p 2
The issue I'm hitting (perhaps it's just the first of many) is that my folders variable is not a list, so it just passes one long string of 100 folders as the {} to the script.
All hints gratefully received.
Replace echo $folders | parallel ... with echo "$folders" | parallel ....
Without the double quotes, the shell word-splits $folders and passes the folders as separate arguments to echo, which causes them all to be printed on one line. parallel then provides each line as the argument for one job.
To avoid such quoting issues altogether, it is always a good idea to pipe find to parallel directly, and use the null character as the delimiter:
find ... -print0 | parallel -0 ...
This will work even when encountering file names that contain multiple spaces or a newline character.
You can pipe find directly to parallel:
find /home/rob/PartitionFinder/ -maxdepth 2 -type d | parallel -P 25 python script.py --raxml --quick --no-ml-tree {} --force -p 2
If you want to keep the string in $folder, you can pipe the echo to xargs.
echo $folders | xargs -n 1 | parallel -P 25 python script.py --raxml --quick --no-ml-tree {} --force -p 2
You can create a Makefile like this:
#!/usr/bin/make -f
FOLDERS=$(shell find /home/rob/PartitionFinder/ -maxdepth 2 -type d)

all: ${FOLDERS}

# To execute the find before the all
find_folders:
	echo $(FOLDERS) > /dev/null

${FOLDERS}: find_folders
	python script.py --raxml --quick --no-ml-tree $@ --force -p 2
and then run make -j 25.
Be careful: use tabs, not spaces, to indent the recipes in your Makefile.
Also, folder names containing spaces won't work.
