xargs, python, reading files from stdin - python

In a directory with 30 CSV files, running:
find . -name "*.csv" | (xargs python ~/script.py)
How can I have python properly run on each file passed by xargs? I do print sys.stdin and it's just one file. I try for file in stdin loop, but there's nothing there. What am I missing?

In fact xargs does not pass anything to stdin. It reads items from its own stdin and passes them as arguments to the command you give it.
You can debug your command invocation with an echo:
find . -name "*.csv" | (xargs echo python ./script.py)
You will see all your files output on one line.
So to access your files from the argument list in Python, use this in your script:
import sys
for argument in sys.argv[1:]:
    print(argument)
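To go one step further than printing the arguments, here is a minimal sketch of a script that does some work on each file name it receives. The names count_lines and main are made up for illustration, and counting lines is only a stand-in for whatever script.py really does with a CSV file.

```python
import sys


def count_lines(path):
    """Hypothetical per-file work: count the lines in one file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def main(paths):
    """Process every path that xargs appended to the command line."""
    results = {}
    for path in paths:
        results[path] = count_lines(path)
        print(f"{path}: {results[path]} lines")
    return results


if __name__ == "__main__":
    # sys.argv[0] is the script name; the file names start at index 1.
    main(sys.argv[1:])
```

Run through xargs, all 30 names would normally arrive in a single invocation, so the loop inside main is what visits each file.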

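By default xargs does not run script.py once per csv file; it puts as many file names as fit on one command line. With xargs -n 1, script.py is run exactly once for each csv file: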
python ~/script.py file1.csv
python ~/script.py file2.csv
python ~/script.py file3.csv
python ~/script.py file4.csv
etc
If you want to run it like
python ~/script.py file1.csv file2.csv file3.csv
then do
python ~/script.py `find . -name "*.csv"`
or
python ~/script.py `ls *.csv`
(the " may have to be escaped, not sure)
EDIT: note the difference between ` and '

Related

Running python scripts for different input directory through bash terminal

I am trying to automate my task through the terminal using bash. I have a python script that takes two parameters (paths of input and output) and then the script runs and saves the output in a text file.
All the input directories have a pattern that starts from "g-" whereas the output directory remains static.
So, I want to write a script that could run on its own so that I don't have to manually run it on hundreds of directories.
$ python3 program.py ../g-changing-directory/ ~/static-directory/ > ~/static-directory/final/results.txt
You can do it like this:
find .. -maxdepth 1 -type d -name "g-*" | xargs -n1 -P1 -I{} python3 program.py {} ~/static-directory/ >> ~/static-directory/final/results.txt
find .. will look in the parent directory; -maxdepth 1 will look only at the top level and not descend into subdirectories; -type d only takes directories; -name "g-*" takes objects starting with g- (use -iname "g-*" if you want objects starting with g- or G-).
We pipe it to xargs which will apply the input from stdin to the command specified. -n1 tells it to start a process per input word, -P1 tells it to only run one process at a time, -I{} tells it to replace {} with the input in the command.
Then we specify the command to run for the input, where {} is replaced by xargs: python3 program.py {} ~/static-directory/. The trailing >> ~/static-directory/final/results.txt is applied by the shell to the output of the whole pipeline; >> appends to the file if it exists, while > overwrites it.
With -P4 you could start four processes in parallel. But you do not want to do that, as you are writing into one file and multi-processing can mess up your output file. If every process wrote into its own file, you could do multi-processing safely.
Refer to man find and man xargs for further details.
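The "one output file per process" idea can also be sketched in Python. This is only an illustration under assumptions: output_file_for, run_one and run_all are hypothetical helper names, and naming each result file after its input directory is my choice, not something the question asks for.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def output_file_for(input_dir, results_dir):
    """Give each input directory its own result file, e.g. g-one.txt."""
    return Path(results_dir) / (Path(input_dir).name + ".txt")


def run_one(make_command, input_dir, results_dir):
    """Run the command for one directory, sending stdout to that directory's file."""
    target = output_file_for(input_dir, results_dir)
    with open(target, "w") as out:
        completed = subprocess.run(make_command(input_dir), stdout=out)
    return target, completed.returncode


def run_all(make_command, input_dirs, results_dir, workers=4):
    """Run up to `workers` processes at once, similar in spirit to xargs -P4."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda d: run_one(make_command, d, results_dir), input_dirs))
```

Here make_command would be something like lambda d: ["python3", "program.py", str(d), "/path/to/static-directory/"], with input_dirs coming from sorted(Path("..").glob("g-*")). Because no two processes share a file, running them in parallel cannot interleave their output.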
There are many other ways to do this, as well. E.g. for loops like this:
for F in ../g-*/; do
    python3 program.py "$F" ~/static-directory/ >> ~/static-directory/final/results.txt
done
There are many ways to do this, here's what I would write:
find .. -type d -name "g-*" -exec python3 program.py {} ~/static-directory/ \; > ~/static-directory/final/results.txt
You haven't mentioned whether you want nested directories to be included. If the answer is no, then you have to add the -maxdepth parameter as in @toydarian's answer.

I need to pass as parameter of a python script the name of the last .csv file created in the folder

I need to pass to a csv2sql script the name of the last .csv file I've created in the working folder.
Could I use this :
ls -t *.csv | head -1
to get the output as parameter?
python csv2sql.py -t products last_created_file.csv > sql.output
If I try:
python csv2sql.py -t products ls -t *.csv | head -1 > sql.output
I get this:
csv2sql.py: error: argument csvFile: can't open 'ls': [Errno 2] No such file or directory: 'ls'
This is a shell problem. The common syntax for interpreting command-line output as a single arg for another command line, is the back-quote. For what I think you're doing, this would read:
python csv2sql.py -t products `ls -t *.csv | head -1` > sql.output
This will
Execute the ls command
Pipe that to head
Return the single line as an argument to python
Divert the stdout of python to sql.output
You're getting an error because bash does not see ls -t *.csv as a command but as arguments to the python command (so each word becomes an argument, for example: args[2] would be 'ls', args[3] would be '-t' etc.); the | head -1 part is then applied to Python's output.
To fix this you could save the output of ls -t *.csv | head -1 to a variable and then pass the variable to the python command, by saving this as a bash file and then executing it:
#!/bin/bash
output=$(ls -t *.csv | head -1)
python csv2sql.py -t products "$output" > sql.output
Please do note that I currently don't have access to my linux machine and can't test anything, so I'm assuming that the ls -t *.csv | head -1 command does what you want it to do.
The solution given above is also not a very tidy one since it uses both bash and Python, so personally I would recommend researching the subprocess module and using that for an all-Python solution.
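For the all-Python route, picking the file does not even need ls. A rough sketch, assuming that "last created" can be approximated by the most recent modification time (which is what ls -t sorts by); newest_file is a made-up helper name.

```python
import glob
import os


def newest_file(pattern="*.csv"):
    """Return the most recently modified path matching pattern, or None."""
    candidates = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    if not candidates:
        return None
    # os.path.getmtime gives the modification time, the same key ls -t sorts on.
    return max(candidates, key=os.path.getmtime)
```

The result could then be handed to csv2sql.py with subprocess.run(["python", "csv2sql.py", "-t", "products", newest_file()]), after checking that it is not None.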

python, read the filename from directory , use the string in bash script

I am writing a bash script (e.g. program.sh) in which I call a Python script that reads a list of files from a directory.
The Python script (read_files.py) is as follows:
import os
def files(path):
    for filename in os.listdir('/home/testfiles'):
        if os.path.isfile(os.path.join('/home/testfiles', filename)):
            yield filename
for filename in files("."):
    print(filename)
Now I want to keep the string filename and use it in the bash script.
e.g.
program.sh:
#!/bin/bash
python read_files.py
$Database_maindir/filename
.
.
.
How could I keep the string filename (the names of files in the directory) and write a loop in order to execute commands in bash script for each filename?
The Python script in the question doesn't do anything that Bash cannot already do all by itself, and simpler and easier. Use simple native Bash instead:
shopt -s nullglob
for path in /home/testfiles/*; do
    if [[ -f "$path" ]]; then
        filename=$(basename "$path")
        echo "do something with $filename"
    fi
done
If the Python script does something more than what you wrote in the question,
for example it does some complex computation and spits out filenames,
which would be complicated to do in Bash,
then you do have a legitimate use case to keep it.
In that case, you can iterate over the lines in the output like this:
python read_files.py | while IFS= read -r filename; do
    echo "do something with $filename"
done
Are you looking for something like this?
for filename in $(python read_files.py); do
    someCommand "$filename"
done
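If read_files.py is kept, note that the version in the question ignores its path argument and always lists /home/testfiles. One possible variant, purely as a sketch: it takes the directory from the command line, which is an assumption on my part and not something the question does.

```python
import os
import sys


def files(path):
    """Yield the names of regular files directly inside path."""
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file():
                yield entry.name


if __name__ == "__main__":
    # Hypothetical usage: python read_files.py /home/testfiles
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in sorted(files(directory)):
        print(name)
```

Each name is printed on its own line, so a bash loop that reads the output line by line can consume it unchanged.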

Piping find output to xargs to run python script

I'm seeing the weirdest results here and was hoping somebody can explain this to me.
So, I'm using a find command to locate all files of type .log, and piping the results of that command to a python script. ONLY the first result of the find command is being piped to xargs, or xargs is receiving all results and passing them as a string to the python script.
Example:
# Find returns 3 .log files
find . -name "*.log"
file1.log
file2.log
file3.log
# xargs only runs the python script for first found file (or all 3 are being piped to script as a single string, and only first result is read in)
find . -name "*.log" | xargs python myscript.py -infile
Success: parsed file1.log
What I want to happen is the python script to run for all 3 files found.
find . -name "*.log" | xargs python myscript.py -infile
Success: parsed file1.log
Success: parsed file2.log
Success: parsed file3.log
A safer way to do this is as follows:
find . -name "*.log" -print0 | \
xargs -0 -I {} python myscript.py -infile {}
When passing file names from find, it is very important to use -print0 on the find side and the -0 (or -d) option on the xargs side to set the separator to \0 (null). Paths cannot contain \0 characters, so the separator can never be confused with part of a file name.
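Why the separator matters can be shown with a small Python sketch. split_on_whitespace is only a simplified stand-in for xargs' default behaviour (the real thing also interprets quotes and backslashes), and both function names are made up for illustration.

```python
def split_on_whitespace(data):
    """Simplified stand-in for xargs' default splitting on blanks and newlines."""
    return data.split()


def split_on_nul(data):
    """What -print0 and -0 agree on: every NUL character ends exactly one name."""
    return [name for name in data.split("\0") if name]


names = ["./file1.log", "./my file.log"]
print(split_on_whitespace("\n".join(names)))   # the name with a space is broken in two
print(split_on_nul("\0".join(names) + "\0"))   # both names survive intact
```

With newline-separated output, a name containing a space turns into two arguments; with NUL separators it stays one.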
With xargs, you must supply -0 to tell it that the input uses \0 separators. You also need one of:
"-L 1" if you just need the filename as the last argument.
"-I {}" if you need to place the filename anywhere in the command.

How to pass UNIX find output as arguments for a Python script?

I have a list of files that I can obtain using the UNIX 'find' command such as:
$ find . -name "*.txt"
foo/foo.txt
bar/bar.txt
How can I pass this output into a Python script like hello.py so I can parse it using Python's argparse library?
Thanks!
If you want just text output of find(1), then use a pipe:
~$ find . -name "*.txt" | python hello.py
If you are looking to pass list of files as arguments to the script, use xargs(1):
~$ find . -name "*.txt" -print0 | xargs -0 python hello.py
or use -exec option of find(1).
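On the Python side, hello.py could support both styles. A minimal sketch with argparse, assuming a positional argument named files and a fallback that reads one name per line from stdin; parse_args, names_from_stream and collect_files are hypothetical names, not part of the question.

```python
import argparse
import sys


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Handle files found by find.")
    # nargs="*" accepts any number of names appended by xargs or find -exec.
    parser.add_argument("files", nargs="*", help="file names to process")
    return parser.parse_args(argv)


def names_from_stream(stream):
    """Read one file name per line, as produced by a plain find pipe."""
    return [line.rstrip("\n") for line in stream if line.strip()]


def collect_files(argv=None, stream=None):
    """Prefer command-line arguments; fall back to names piped on stdin."""
    args = parse_args(argv)
    if args.files:
        return args.files
    return names_from_stream(stream if stream is not None else sys.stdin)
```

With the xargs form the names show up in args.files; with the plain pipe form the list is empty and the names are read from stdin instead.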
Use xargs:
find . -name "*.txt" | xargs python -c 'import sys; print(sys.argv[1:])'
From man find:
-exec command ;
Execute command; true if 0 status is returned. All following
arguments to find are taken to be arguments to the command until
an argument consisting of `;' is encountered. The string `{}'
is replaced by the current file name being processed everywhere
it occurs in the arguments to the command, not just in arguments
where it is alone, as in some versions of find. Both of these
constructions might need to be escaped (with a `\') or quoted to
protect them from expansion by the shell. See the EXAMPLES section
for examples of the use of the -exec option. The specified
command is run once for each matched file. The command is executed
in the starting directory. There are unavoidable security
problems surrounding use of the -exec action; you should
use the -execdir option instead.
-exec command {} +
This variant of the -exec action runs the specified command on
the selected files, but the command line is built by appending
each selected file name at the end; the total number of invocations
of the command will be much less than the number of
matched files. The command line is built in much the same way
that xargs builds its command lines. Only one instance of `{}'
is allowed within the command. The command is executed in the
starting directory.
So you can do
find . -name "*.txt" -exec python myscript.py {} +
This helps, if you need to pass arguments after the list of arguments from the find output:
$ python hello.py `find . -name "*.txt"`
I used it to concat pdf files into another one:
$ pdfunite `find . -name "*.pdf" | sort` all.pdf
