Pass output of 'find' command to Python with docopt (issue with spaces) - python

Consider this simple Python command-line script:
"""foobar
Description
Usage:
foobar [options] <files>...
Arguments:
<files> List of files.
Options:
-h, --help Show help.
--version Show version.
"""
import docopt
args = docopt.docopt(__doc__)
print(args['<files>'])
And consider that I have the following files in a folder:
file1.pdf
file 2.pdf
Now I want to pass the output of the find command to my simple command-line script. But when I try
foobar `find . -iname '*.pdf'`
I don't get the list of files that I wanted, because the input is split on spaces. I.e. I get:
['./file', '2.pdf', './file1.pdf']
How can I correctly do this?

This isn't a Python question; it is all about how the shell tokenizes command lines. The shell splits the output of an unquoted command substitution on whitespace, which is why file 2.pdf shows up as two separate arguments.
You can combine find and xargs to do what you want like this:
find . -iname '*.pdf' -print0 | xargs -0 foobar
The -print0 argument to find tells it to output filenames separated by ASCII NUL characters rather than newlines, and the -0 argument to xargs tells it to expect that form of input. xargs will then call your foobar script with the correct arguments.
Compare:
$ ./foobar $(find . -iname '*.pdf' )
['./file', '2.pdf', './file1.pdf']
To:
$ find . -iname '*.pdf' -print0 | xargs -0 ./foobar
['./file 2.pdf', './file1.pdf']
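The difference between the two invocations can be reproduced in Python. This is only a rough sketch: the sample strings stand in for the output of find, and shell_like_split and nul_split are hypothetical helper names, not part of docopt or of any shell. Real shell word splitting also performs glob expansion, which is ignored here.

```python
# Simulated output of `find . -iname '*.pdf'` (newline-terminated names).
newline_output = "./file 2.pdf\n./file1.pdf\n"
# Simulated output of `find . -iname '*.pdf' -print0` (NUL-terminated names).
nul_output = "./file 2.pdf\0./file1.pdf\0"


def shell_like_split(text):
    """Split on any run of whitespace, roughly what an unquoted $(...) does."""
    return text.split()


def nul_split(text):
    """Split on NUL and drop the empty string left after the final terminator."""
    return [name for name in text.split("\0") if name]


print(shell_like_split(newline_output))  # ['./file', '2.pdf', './file1.pdf']
print(nul_split(nul_output))             # ['./file 2.pdf', './file1.pdf']
```

Because NUL cannot appear inside a filename, splitting on it never cuts a name in two, whereas a space or newline can.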

Related

Running python scripts for different input directory through bash terminal

I am trying to automate my task through the terminal using bash. I have a python script that takes two parameters (paths of input and output) and then the script runs and saves the output in a text file.
All the input directories have a pattern that starts from "g-" whereas the output directory remains static.
So, I want to write a script that could run on its own so that I don't have to manually run it on hundreds of directories.
$ python3 program.py ../g-changing-directory/ ~/static-directory/ > ~/static-directory/final/results.txt
You can do it like this:
find .. -maxdepth 1 -type d -name "g-*" | xargs -n1 -P1 -I{} python3 program.py {} ~/static-directory/ >> ~/static-directory/final/results.txt
find .. looks in the parent directory; -maxdepth 1 restricts the search to the top level, without descending into subdirectories; -type d matches only directories; -name "g-*" matches names starting with g- (use -iname "g-*" if you want names starting with g- or G-).
We pipe it to xargs, which applies the input from stdin to the command specified. -n1 tells it to start a process per input word, -P1 tells it to only run one process at a time, and -I{} tells it to replace {} with the input in the command.
Then we specify the command to run for the input, where {} is replaced by xargs: python3 program.py {} ~/static-directory/. Note the >> at the end: it appends to the file if it exists, while > would overwrite the file if it exists.
With -P4 you could start four processes in parallel. But you do not want to do that, as you are writing into one file and multi-processing can mess up your output file. If every process wrote into its own file, you could do multi-processing safely.
Refer to man find and man xargs for further details.
There are many other ways to do this, as well. E.g. for loops like this:
for F in ../g-*/; do
    python3 program.py "$F" ~/static-directory/ >> ~/static-directory/final/results.txt
done
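As noted above, running in parallel is only safe if every process writes to its own file. A minimal sketch of how the per-directory command and output file could be derived in Python follows. The function name plan_run and the naming scheme final/<directory name>.txt are assumptions made for illustration; the question only specifies program.py, the g- prefix and the static output directory.

```python
from pathlib import PurePosixPath


def plan_run(input_dir, static_dir, script="program.py"):
    """Return (argv, output_file) for one input directory.

    The output file name is derived from the input directory name, so two
    runs never write to the same file (hypothetical naming scheme).
    """
    input_dir = PurePosixPath(input_dir)
    static_dir = PurePosixPath(static_dir)
    argv = ["python3", script, str(input_dir), str(static_dir)]
    output_file = static_dir / "final" / (input_dir.name + ".txt")
    return argv, str(output_file)


argv, output_file = plan_run("../g-alpha/", "/home/me/static-directory")
print(argv)
print(output_file)
```

Each argv could then be started with subprocess.run(argv, stdout=fh), where fh is the opened output file for that run.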
There are many ways to do this, here's what I would write:
find .. -type d -name "g-*" -exec python3 program.py {} ~/static-directory/ \; > ~/static-directory/final/results.txt
You haven't mentioned whether you want nested directories to be included. If not, add the -maxdepth parameter as in @toydarian's answer.

How to add argparse flags based on shell find results

I have a python script that accepts a -f flag, and appends multiple uses of the flag.
For example, if I run python myscript -f file1.txt -f file2.txt, I would have a list of files, files=['file1.txt', 'file2.txt']. This works great, but I am wondering how I can automatically use the results of a find command to append as many -f flags as there are files.
I've tried:
find ./ -iname '*.txt' -print0 | xargs python myscript.py -f
But it only grabs the first file
With the caveat that this will fail if there are more files than will fit on a single command line (whereas xargs would run myscript.py multiple times, each with a subset of the full list of arguments):
#!/usr/bin/env bash
args=( )
while IFS= read -r -d '' name; do
args+=( -f "$name" )
done < <(find . -iname '*.txt' -print0)
python myscript.py "${args[@]}"
If you want to do this safely in a way that tolerates an arbitrary number of filenames, you're better off using a long-form option -- such as --file rather than -f -- with the = separator allowing the individual name to be passed as part of the same argv entry, thus preventing xargs from splitting a filename apart from the sigil that precedes it:
#!/usr/bin/env bash
# This requires -printf, a GNU find extension
find . -iname '*.txt' -printf '--file=%p\0' | xargs -0 python myscript.py
...or, more portably (running on MacOS, albeit still requiring a shell -- such as bash -- that can handle NUL-delimited reads):
#!/usr/bin/env bash
# requires find -print0 and xargs -0; these extensions are available on BSD as well as GNU
find . -iname '*.txt' -print0 |
while IFS= read -r -d '' f; do printf '--file=%s\0' "$f"; done |
xargs -0 python myscript.py
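For the --file=NAME form to work, the script has to accept a long option as well. The question does not show myscript.py, so the following is only a sketch that assumes it uses argparse with action="append"; the build_parser name and the dest name files are illustrative.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # action="append" collects every occurrence of the option into one list.
    # Both "-f NAME" and "--file=NAME" are accepted.
    parser.add_argument("-f", "--file", dest="files", action="append", default=[])
    return parser


# The list below stands in for the argv that xargs -0 would build.
args = build_parser().parse_args(["--file=./a b.txt", "--file=./c.txt"])
print(args.files)  # ['./a b.txt', './c.txt']
```

Since each name travels in the same argv entry as its option, it does not matter where xargs decides to split a long list into several invocations.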
Your title seems to imply that you can modify the script. In that case, use the nargs (number of args) option to allow more arguments for the -f flag:
parser = argparse.ArgumentParser()
parser.add_argument('--files', '-f', nargs='+')
args = parser.parse_args()
print(args.files)
Then you can use your find command easily:
15:44 $ find . -depth 1 | xargs python args.py -f
['./args.py', './for_clint', './install.sh', './sys_user.json']
Otherwise, if you can't modify the script, see @CharlesDuffy's answer.

Piping find output to xargs to run python script

I'm seeing the weirdest results here and was hoping somebody can explain this to me.
So, I'm using a find command to locate all files of type .log, and piping the results of that command to a python script. ONLY the first result of the find command is being piped to xargs, or xargs is receiving all results and passing them as a string to the python script.
Example:
# Find returns 3 .log files
find . -name "*.log"
file1.log
file2.log
file3.log
# xargs only runs the python script for first found file (or all 3 are being piped to script as a single string, and only first result is read in)
find . -name "*.log" | xargs python myscript.py -infile
Success: parsed file1.log
What I want to happen is the python script to run for all 3 files found.
find . -name "*.log" | xargs python myscript.py -infile
Success: parsed file1.log
Success: parsed file2.log
Success: parsed file3.log
A safer way to do this is as follows:
find . -name "*.log" -print0 | \
xargs -0 -I {} python myscript.py -infile {}
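To see what -I {} changes compared with plain xargs, the two ways of building command lines can be modelled in Python. This is a simplified sketch: xargs_append and xargs_replace are hypothetical names, and the model ignores the command-line length limits that make real xargs split long lists.

```python
def xargs_append(command, items):
    """Plain xargs: a single command with all items appended at the end."""
    return [command + items]


def xargs_replace(command, items, placeholder="{}"):
    """xargs -I {}: one command per item, placeholder replaced in every argument."""
    return [[arg.replace(placeholder, item) for arg in command] for item in items]


files = ["./file1.log", "./file2.log", "./file3.log"]
base = ["python", "myscript.py", "-infile"]

for argv in xargs_append(base, files):
    print(argv)  # one invocation: -infile is followed by all three names
for argv in xargs_replace(base + ["{}"], files):
    print(argv)  # three invocations, one name each
```

In the first form the script is started once and receives all three names after -infile, which matches the symptom in the question if the script only reads one value for that option.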
When passing file names from find, it is very important to use \0 (NUL) as the separator: -print0 on the find side, and the -0 or -d option on the xargs side. Filenames cannot contain / or \0 characters, so \0 is a safe separator.
With xargs, you must supply -0 to tell it that the input uses \0 separators. You also need one of:
"-L 1" if you just need the filename as the last argument.
"-I {}" to place the filename anywhere in the command; xargs then runs the command once per filename.

How to pass UNIX find output as arguments for a Python script?

I have a list of files that I can obtain using the UNIX 'find' command such as:
$ find . -name "*.txt"
foo/foo.txt
bar/bar.txt
How can I pass this output into a Python script like hello.py so I can parse it using Python's argparse library?
Thanks!
If you want just text output of find(1), then use a pipe:
~$ find . -name "*.txt" | python hello.py
If you are looking to pass list of files as arguments to the script, use xargs(1):
~$ find . -name "*.txt" -print0 | xargs -0 python hello.py
or use -exec option of find(1).
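The two approaches hand the names to hello.py through different channels: with a pipe they arrive on stdin, with xargs they arrive in sys.argv. A sketch of a script that accepts either is shown below. The function name collect_paths and the positional argument name paths are assumptions; the question does not show hello.py.

```python
import argparse
import sys


def collect_paths(argv, stdin_lines):
    """Return file names from the command line, falling back to stdin lines."""
    parser = argparse.ArgumentParser()
    parser.add_argument("paths", nargs="*", help="files to process")
    args = parser.parse_args(argv)
    if args.paths:
        # Invoked via xargs or -exec: names are arguments.
        return args.paths
    # Invoked via a plain pipe: one name per line on stdin.
    return [line.rstrip("\n") for line in stdin_lines if line.strip()]


# In hello.py this would be called as: collect_paths(sys.argv[1:], sys.stdin)
print(collect_paths(["foo/foo.txt", "bar/bar.txt"], []))
print(collect_paths([], ["foo/foo.txt\n", "bar/bar.txt\n"]))
```

Note that the stdin branch reads newline-separated names, so it shares the limitation of plain find output for names that contain newlines.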
Use xargs:
find . -name "*.txt" | xargs python -c 'import sys; print(sys.argv[1:])'
From man find:
-exec command ;
Execute command; true if 0 status is returned. All following
arguments to find are taken to be arguments to the command until
an argument consisting of `;' is encountered. The string `{}'
is replaced by the current file name being processed everywhere
it occurs in the arguments to the command, not just in arguments
where it is alone, as in some versions of find. Both of these
constructions might need to be escaped (with a `\') or quoted to
protect them from expansion by the shell. See the EXAMPLES section
for examples of the use of the -exec option. The specified
command is run once for each matched file. The command is executed
in the starting directory. There are unavoidable security
problems surrounding use of the -exec action; you should
use the -execdir option instead.
-exec command {} +
This variant of the -exec action runs the specified command on
the selected files, but the command line is built by appending
each selected file name at the end; the total number of invocations
of the command will be much less than the number of
matched files. The command line is built in much the same way
that xargs builds its command lines. Only one instance of `{}'
is allowed within the command. The command is executed in the
starting directory.
So you can do
find . -name "*.txt" -exec python myscript.py {} +
This helps if you need to pass further arguments after the list of files from the find output:
$ python hello.py `find . -name "*.txt"`
I used it to concat pdf files into another one:
$ pdfunite `find . -name "*.pdf" | sort` all.pdf

xargs, python, reading files from stdin

In a directory with 30 CSV files, running:
find . -name "*.csv" | (xargs python ~/script.py)
How can I have python properly run on each file passed by xargs? I do print sys.stdin and it's just one file. I try for file in stdin loop, but there's nothing there. What am I missing?
In fact, xargs does not pass anything to stdin. It passes everything it reads from its own stdin as arguments to the command you give it.
You can debug your command invocation with echo:
find . -name "*.csv" | (xargs echo python ./script.py)
You will see all your files output on one line.
So, to access your files from the argument list in Python, use this in your script:
import sys
for argument in sys.argv[1:]:
    print(argument)
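If the script really should read the names from stdin, as the question attempted, find can be piped into it directly without xargs. A minimal sketch, assuming find is run with -print0 so that names are NUL-terminated; parse_nul_names is a hypothetical helper name.

```python
import os


def parse_nul_names(data):
    """Split NUL-terminated bytes into a list of file names (str)."""
    return [os.fsdecode(chunk) for chunk in data.split(b"\0") if chunk]


# In a real script the bytes would come from sys.stdin.buffer.read(), e.g.
#   find . -name "*.csv" -print0 | python script.py
sample = b"./a b.csv\0./c.csv\0"
for name in parse_nul_names(sample):
    print(name)
```

Reading bytes from sys.stdin.buffer and decoding with os.fsdecode avoids guessing the encoding of the filenames.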
script.py is being run exactly once for each csv file
python ~/script.py file1.csv
python ~/script.py file2.csv
python ~/script.py file3.csv
python ~/script.py file4.csv
etc
If you want to run it like
python ~/script.py file1.csv file2.csv file3.csv
then do
python ~/script.py `find . -name "*.csv"`
or
python ~/script.py `ls *.csv`
(the " may have to be escaped, not sure)
EDIT: note the difference between ` and '
