I would like to call a Python script from within a Bash while loop. However, I do not understand very well how to use Bash's while loop (and variable) syntax appropriately. The behaviour I am looking for is: while a file still contains lines (DNA sequences), I call a Python script to extract groups of sequences so that another program (dialign2) can align them. Finally, I add the alignments to a result file. Note: I am not trying to iterate over the file. What should I change for the Bash while loop to work? I also want to be sure that the while loop re-checks the changing file.txt on each iteration. Here is my attempt:
#!/bin/bash
# Call a python script as many times as needed to treat a text file
c=1
while [ `wc -l file.txt` > 0 ] ; # Stop when file.txt has no more lines
do
    echo "Python script called $c times"
    python script.py # Uses file.txt and removes lines from it
    # The Python script also returns a temp.txt file containing DNA sequences
    c=$c + 1
    dialign -f temp.txt # aligns DNA sequences
    cat temp.fa >>results.txt # append DNA alignments to result file
done
Thanks!
No idea why you want to do this.
c=1
while [[ -s file.txt ]] ; # Stop when file.txt has no more lines
do
echo "Python script called $c times"
python script.py # Uses file.txt and removes lines from it
c=$(($c + 1))
done
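For completeness, here is a fuller sketch of the OP's whole loop with this fix applied; the names script.py, temp.txt, temp.fa and dialign are taken from the question and assumed to behave as described there:
#!/bin/bash
c=1
# [[ -s file.txt ]] is re-evaluated on every iteration, so the loop
# sees the shrinking file.txt each time around.
while [[ -s file.txt ]]
do
    echo "Python script called $c times"
    python script.py # uses file.txt and removes lines from it, writes temp.txt
    dialign -f temp.txt # aligns DNA sequences
    cat temp.fa >> results.txt # append the alignment to the result file
    c=$((c + 1))
done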
Try -gt to eliminate the shell metacharacter >. Note that wc -l file.txt prints the file name after the count, so read the file via stdin to get a bare number:
while [ $(wc -l < file.txt) -gt 0 ]
do
    ...
    c=$((c + 1))
done
#OP if you want to loop through a file, just use a while read loop. Also, you are not using the variable $c or the line itself. Are you passing each line to your Python script, or are you calling your Python script once per line? (Your script is going to be slow if you do that.)
while true
do
    while read -r line
    do
        # If you are taking STDIN in myscript.py, then something must be passed to
        # myscript.py; if not, I really don't understand what you are doing.
        echo "$line" | python myscript.py > temp.txt
        dialign -f temp.txt # aligns DNA sequences
        cat temp.txt >> results.txt
    done < "file.txt"
    if [ ! -s "file.txt" ]; then break; fi
done
Lastly, you could have done everything in Python. The way to iterate over "file.txt" in Python is simply:
f = open("file.txt")
for line in f:
    print "do something with line"
    print "or bring what you have in myscript.py here"
f.close()
The following should do what you say you want:
#!/bin/bash
c=1
while read line;
do
    echo "Python script called $c times"
    # $line contains a line of text from file.txt
    python script.py
    c=$((c + 1))
done < file.txt
However, there is no need to use bash, to iterate over the lines in a file. You can do that quite easily without ever leaving python:
myfile = open('file.txt', 'r')
for count, line in enumerate(myfile):
    print '%i lines in file' % (count + 1,)
    # the variable "line" contains the line of text from the file.txt
    # Do your thing here.
Related
The following Perl script (my.pl) can read from either the file in the command line arguments or from standard input (STDIN):
while (<>) {
print($_);
}
perl my.pl will read from standard input, while perl my.pl a.txt will read from a.txt. This is very handy.
Is there an equivalent in Bash?
The following solution reads from a file if the script is called with a file name as the first parameter $1 and otherwise from standard input.
while read line
do
echo "$line"
done < "${1:-/dev/stdin}"
The substitution ${1:-...} takes $1 if defined. Otherwise, the file name of the standard input of the current process is used.
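For instance, if the loop above is saved as echo-lines.sh (a made-up name for illustration), both modes work:
$ printf 'a\nb\n' > input.txt
$ bash echo-lines.sh input.txt # reads from the named file
a
b
$ printf 'a\nb\n' | bash echo-lines.sh # reads from standard input
a
b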
Perhaps the simplest solution is to redirect standard input with a merging redirect operator:
#!/bin/bash
less <&0
Standard input is file descriptor zero. The above sends the input piped to your bash script into less's standard input.
Read more about file descriptor redirection.
Here is the simplest way:
#!/bin/sh
cat -
Usage:
$ echo test | sh my_script.sh
test
To assign stdin to a variable, you may use STDIN=$(cat -), or simply STDIN=$(cat), as the - operator is not necessary (as per #mklement0's comment).
To parse each line from the standard input, try the following script:
#!/bin/bash
while IFS= read -r line; do
printf '%s\n' "$line"
done
To read from the file or stdin (if argument is not present), you can extend it to:
#!/bin/bash
file=${1--} # POSIX-compliant; ${1:--} can be used as well.
while IFS= read -r line; do
    printf '%s\n' "$line" # Or: env POSIXLY_CORRECT=1 echo "$line"
done < <(cat -- "$file")
Notes:
- read -r - Do not treat a backslash character in any special way. Consider each backslash to be part of the input line.
- Without setting IFS, sequences of spaces and tabs at the beginning and end of each line are ignored (trimmed) by default.
- Use printf instead of echo to avoid printing empty lines when the line consists of a single -e, -n or -E. However there is a workaround by using env POSIXLY_CORRECT=1 echo "$line" which executes your external GNU echo which supports it. See: How do I echo "-e"?
See: How to read stdin when no arguments are passed? at stackoverflow SE
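To see why both IFS= and -r matter, compare the two read variants on a line with leading spaces and a backslash (a small demonstration, not part of the original answer):
$ printf '   a\\tb\n' | { read line; printf '%s\n' "$line"; }
atb
$ printf '   a\\tb\n' | { IFS= read -r line; printf '%s\n' "$line"; }
   a\tb
The first variant trims the leading whitespace and swallows the backslash; the second preserves the line exactly.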
I think this is the straightforward way:
$ cat reader.sh
#!/bin/bash
while read line; do
echo "reading: ${line}"
done < /dev/stdin
--
$ cat writer.sh
#!/bin/bash
for i in {0..5}; do
echo "line ${i}"
done
--
$ ./writer.sh | ./reader.sh
reading: line 0
reading: line 1
reading: line 2
reading: line 3
reading: line 4
reading: line 5
The echo solution adds new lines whenever IFS breaks the input stream. #fgm's answer can be modified a bit:
cat "${1:-/dev/stdin}" > "${2:-/dev/stdout}"
The Perl loop in the question reads from all the file name arguments on the command line, or from standard input if no files are specified. The answers I see all seem to process a single file or standard input if there is no file specified.
Although often (and accurately) derided as a UUOC (Useless Use of cat), there are times when cat is the best tool for the job, and it is arguable that this is one of them:
cat "$#" |
while read -r line
do
echo "$line"
done
The only downside to this is that it creates a pipeline running in a sub-shell, so things like variable assignments in the while loop are not accessible outside the pipeline. The bash way around that is Process Substitution:
while read -r line
do
    echo "$line"
done < <(cat "$@")
This leaves the while loop running in the main shell, so variables set in the loop are accessible outside the loop.
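A small sketch of the difference (illustrative, not from the original answer): a counter incremented in a piped while loop is lost, while the process-substitution form keeps it:
n=0
printf 'x\ny\n' | while read -r line; do n=$((n+1)); done
echo "$n" # prints 0: the loop ran in a subshell
n=0
while read -r line; do n=$((n+1)); done < <(printf 'x\ny\n')
echo "$n" # prints 2: the loop ran in the current shell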
Perl's behavior, with the code given in the OP, is to take zero or more arguments, and if an argument is a single hyphen -, it is understood as stdin. Moreover, it's always possible to get the filename with $ARGV.
None of the answers given so far really mimic Perl's behavior in these respects. Here's a pure Bash possibility. The trick is to use exec appropriately.
#!/bin/bash
(($#)) || set -- -
while (($#)); do
    { [[ $1 = - ]] || exec < "$1"; } &&
        while read -r; do
            printf '%s\n' "$REPLY"
        done
    shift
done
The filename is available in $1.
If no arguments are given, we artificially set - as the first positional parameter. We then loop on the parameters. If a parameter is not -, we redirect standard input from filename with exec. If this redirection succeeds we loop with a while loop. I'm using the standard REPLY variable, and in this case you don't need to reset IFS. If you want another name, you must reset IFS like so (unless, of course, you don't want that and know what you're doing):
while IFS= read -r line; do
printf '%s\n' "$line"
done
More accurately...
while IFS= read -r line ; do
printf "%s\n" "$line"
done < file
Please try the following code:
while IFS= read -r line; do
echo "$line"
done < file
I combined all of the above answers and created a shell function that suits my needs. This is from the Cygwin terminals of my two Windows 10 machines, which have a shared folder between them. I need to be able to handle the following:
cat file.cpp | tx
tx < file.cpp
tx file.cpp
Where a specific filename is specified, I need to use the same filename during the copy. Where the input data stream has been piped through, I need to generate a temporary filename containing the hour, minute, and second. The shared main folder has subfolders for the days of the week. This is for organizational purposes.
Behold, the ultimate script for my needs:
tx ()
{
    if [ $# -eq 0 ]; then
        local TMP=/tmp/tx.$(date +'%H%M%S')
        while IFS= read -r line; do
            echo "$line"
        done < /dev/stdin > "$TMP"
        cp "$TMP" //$OTHER/stargate/$(date +'%a')/
        rm -f "$TMP"
    else
        [ -r "$1" ] && cp "$1" //$OTHER/stargate/$(date +'%a')/ || echo "cannot read file"
    fi
}
If there is any way that you can see to further optimize this, I would like to know.
#!/usr/bin/bash
if [ -p /dev/stdin ]; then
    #for FILE in "$@" /dev/stdin
    for FILE in /dev/stdin
    do
        while IFS= read -r LINE
        do
            echo "$@" "$LINE" #print line arguments and stdin
        done < "$FILE"
    done
else
    printf "[ -p /dev/stdin ] is false\n"
    #dosomething
fi
Running:
echo var var2 | bash std.sh
Result:
var var2
Running:
bash std.sh < <(cat /etc/passwd)
Result:
root:x:0:0::/root:/usr/bin/bash
bin:x:1:1::/:/usr/bin/nologin
daemon:x:2:2::/:/usr/bin/nologin
mail:x:8:12::/var/spool/mail:/usr/bin/nologin
Two principal ways:
Either pipe the argument files and stdin into a single stream and process that like stdin (stream approach)
Or redirect stdin (and argument files) into a named pipe and process that like a file (file approach)
Stream approach
Minor revisions to earlier answers:
Use cat, not less. It's faster and you don't need pagination.
Use $1 to read from the first argument file (if present) or $* to read from all files (if present). If these variables are empty, read from stdin (like cat does).
#!/bin/bash
cat $* | ...
File approach
Writing into a named pipe is a bit more complicated, but this allows you to treat stdin (or files) like a single file:
Create pipe with mkfifo.
Parallelize the writing process. If the named pipe is not read from, it may block otherwise.
For redirecting stdin into a subprocess (as necessary in this case), use <&0 (unlike what others have been commenting, this is not optional here).
#!/bin/bash
mkfifo /tmp/myStream
cat $* <&0 > /tmp/myStream & # separate subprocess (!)
AddYourCommandHere /tmp/myStream # process input like a file,
rm /tmp/myStream # cleaning up
File approach: Variation
Create named pipe only if no arguments are given. This may be more stable for reading from files as named pipes can occasionally block.
#!/bin/bash
FILES=$*
if echo $FILES | egrep -v . >&/dev/null; then # if $FILES is empty
    mkfifo /tmp/myStream
    cat <&0 > /tmp/myStream &
    FILES=/tmp/myStream
fi
AddYourCommandHere $FILES # do something ;)
if [ -e /tmp/myStream ]; then
    rm /tmp/myStream
fi
Also, it allows you to iterate over files and stdin rather than concatenate all into a single stream:
for file in $FILES; do
    AddYourCommandHere $file
done
The following works with standard sh (tested with Dash on Debian) and is quite readable, but that's a matter of taste:
if [ -n "$1" ]; then
cat "$1"
else
cat
fi | commands_and_transformations
Details: If the first parameter is non-empty then cat that file, else cat standard input. Then the output of the whole if statement is processed by the commands_and_transformations.
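As a concrete (made-up) instantiation of the pattern, with tr standing in for commands_and_transformations:
#!/bin/sh
if [ -n "$1" ]; then
    cat "$1"
else
    cat
fi | tr 'a-z' 'A-Z'
Saved as upper.sh (a hypothetical name), sh upper.sh notes.txt upcases the file, and echo hi | sh upper.sh upcases standard input.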
The code ${1:-/dev/stdin} only handles the first argument, so you can use this:
ARGS='$*'
if [ -z "$*" ]; then
    ARGS='-'
fi
eval "cat -- $ARGS" | while read line
do
echo "$line"
done
Reading from stdin into a variable or from a file into a variable.
Most examples in the existing answers use loops that immediately echo each line as it is read from stdin. This might not be what you really want to do.
In many cases you need to write a script that calls a command which only accepts a file argument. But in your script you may want to support stdin also. In this case you need to first read full stdin and then provide it as a file.
Let's see an example. The script below prints the certificate details of a certificate (in PEM format) that is passed either as a file or via stdin.
# cert-print script
content=""
while read line
do
content="$content$line\n"
done < "${1:-/dev/stdin}"
# Remove the last newline appended in the above loop
content=${content%\\n}
# Keytool accepts certificate only via a file, but in our script we fix this.
keytool -printcert -v -file <(echo -e $content)
# Read from file
cert-print mycert.crt
# Owner: CN=....
# Issuer: ....
# ....
# Or read from stdin (by pasting)
cert-print
#..paste the cert here and press enter
# Ctl-D
# Owner: CN=....
# Issuer: ....
# ....
# Or read from stdin by piping to another command (which just prints the cert(s) ). In this case we use openssl to fetch directly from a site and then print its info.
echo "" | openssl s_client -connect www.google.com:443 -prexit 2>/dev/null \
| sed -n -e '/BEGIN\ CERTIFICATE/,/END\ CERTIFICATE/ p' \
| cert-print
# Owner: CN=....
# Issuer: ....
# ....
This one is easy to use on the terminal (printf is used here because a plain echo does not interpret \n escapes in Bash):
$ printf '1\n2\n3\n' | while read -r; do echo "$REPLY"; done
1
2
3
I don't find any of these answers acceptable. In particular, the accepted answer only handles the first command line parameter and ignores the rest. The Perl program that it is trying to emulate handles all the command line parameters. So the accepted answer doesn't even answer the question.
Other answers use Bash extensions, add unnecessary 'cat' commands, only work for the simple case of echoing input to output, or are just unnecessarily complicated.
However, I have to give them some credit, because they gave me some ideas. Here is the complete answer:
#!/bin/sh
if [ $# = 0 ]
then
    DEFAULT_INPUT_FILE=/dev/stdin
else
    DEFAULT_INPUT_FILE=
fi
# Iterates over all parameters or /dev/stdin
for FILE in "$@" $DEFAULT_INPUT_FILE
do
    while IFS= read -r LINE
    do
        # Do whatever you want with LINE here.
        echo "$LINE"
    done < "$FILE"
done
As a workaround, you can use the stdin device in the /dev directory:
....| for item in `cat /dev/stdin` ; do echo $item ;done
With...
while read line
do
echo "$line"
done < "${1:-/dev/stdin}"
I got the following output:
Ignored 1265 characters from standard input. Use "-stdin" or "-" to tell how to handle piped input.
Then I decided to go with for:
Lnl=$(cat file.txt | wc -l)
echo "Last line: $Lnl"
nl=1
for num in `seq $nl +1 $Lnl`;
do
    echo "Number line: $nl"
    line=$(cat file.txt | head -n $nl | tail -n 1)
    echo "Read line: $line"
    nl=$((nl+1))
done
Use:
for line in `cat`; do
    something($line);
done
Basically I want to take as input text from a file, remove a line from that file, and send the output back to the same file. Something along these lines if that makes it any clearer.
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > file_name
However, when I do this I end up with a blank file.
Any thoughts?
Use sponge for this kind of task. It's part of moreutils.
Try this command:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | sponge file_name
You cannot do that because bash processes the redirections first, then executes the command. So by the time grep looks at file_name, it is already empty. You can use a temporary file though.
#!/bin/sh
tmpfile=$(mktemp)
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > ${tmpfile}
cat ${tmpfile} > file_name
rm -f ${tmpfile}
Something like that. Consider using mktemp to create the tmpfile as shown, but note that it's not POSIX.
Use sed instead:
sed -i '/seg[0-9]\{1,\}\.[0-9]\{1\}/d' file_name
Try this simple one:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name
Your file will not be blank this time :) and your output is also printed to your terminal.
You can't use a redirection operator (> or >>) to the same file, because the shell sets up the redirection, creating or truncating the file, before the command is even invoked. To avoid that, you should use appropriate tools such as tee, sponge, sed -i or any other tool which can write results to the file (e.g. sort file -o file).
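A quick demonstration of the ordering problem, and of a tool that writes its result back safely (sort -o), using a throwaway file:
$ printf 'b\na\n' > demo.txt
$ grep . demo.txt > demo.txt # shell truncates demo.txt before grep runs
$ wc -c < demo.txt
0
$ printf 'b\na\n' > demo.txt
$ sort demo.txt -o demo.txt # sort reads everything before opening the output
$ cat demo.txt
a
b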
Basically redirecting input to the same original file doesn't make sense and you should use appropriate in-place editors for that, for example Ex editor (part of Vim):
ex '+g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' -scwq file_name
where:
'+cmd'/-c - run any Ex/Vim command
g/pattern/d - remove lines matching a pattern using global (help :g)
-s - silent mode (man ex)
-c wq - execute :write and :quit commands
You may use sed to achieve the same (as already shown in other answers), however in-place (-i) is non-standard FreeBSD extension (may work differently between Unix/Linux) and basically it's a stream editor, not a file editor. See: Does Ex mode have any practical use?
One-liner alternative: set the content of the file as a variable:
VAR=`cat file_name`; echo "$VAR"|grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' > file_name
Since this question is the top result in search engines, here's a one-liner based on https://serverfault.com/a/547331 that uses a subshell instead of sponge (which often isn't part of a vanilla install like OS X):
echo "$(grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name)" > file_name
The general case is:
echo "$(cat file_name)" > file_name
Edit: the above solution has some caveats:
printf '%s' <string> should be used instead of echo <string> so that files containing -n don't cause undesired behavior.
Command substitution strips trailing newlines (this is a bug/feature of shells like bash) so we should append a postfix character like x to the output and remove it on the outside via parameter expansion of a temporary variable like ${v%x}.
Using a temporary variable $v stomps the value of any existing variable $v in the current shell environment, so we should nest the entire expression in parentheses to preserve the previous value.
Another bug/feature of shells like bash is that command substitution strips unprintable characters like null from the output. I verified this by calling dd if=/dev/zero bs=1 count=1 >> file_name and viewing it in hex with cat file_name | xxd -p. But echo $(cat file_name) | xxd -p is stripped. So this answer should not be used on binary files or anything using unprintable characters, as Lynch pointed out.
The general solution (albeit slightly slower, more memory intensive and still stripping unprintable characters) is:
(v=$(cat file_name; printf x); printf '%s' "${v%x}" > file_name)
Test from https://askubuntu.com/a/752451:
printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do (v=$(cat file_uniquely_named.txt; printf x); printf '%s' "${v%x}" > file_uniquely_named.txt); done; cat file_uniquely_named.txt; rm file_uniquely_named.txt
Should print:
hello
world
Whereas calling cat file_uniquely_named.txt > file_uniquely_named.txt in the current shell:
printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do cat file_uniquely_named.txt > file_uniquely_named.txt; done; cat file_uniquely_named.txt; rm file_uniquely_named.txt
Prints an empty string.
I haven't tested this on large files (probably over 2 or 4 GB).
I have borrowed this answer from Hart Simha and kos.
This is very much possible, you just have to make sure that by the time you write the output, you're writing it to a different file. This can be done by removing the file after opening a file descriptor to it, but before writing to it:
exec 3<file ; rm file; COMMAND <&3 >file ; exec 3>&-
Or line by line, to understand it better :
exec 3<file # open a file descriptor reading 'file'
rm file # remove file (but fd3 will still point to the removed file)
COMMAND <&3 >file # run command, with the removed file as input
exec 3>&- # close the file descriptor
It's still a risky thing to do, because if COMMAND fails to run properly, you'll lose the file contents. That can be mitigated by restoring the file if COMMAND returns a non-zero exit code :
exec 3<file ; rm file; COMMAND <&3 >file || cat <&3 >file ; exec 3>&-
We can also define a shell function to make it easier to use :
# Usage: replace FILE COMMAND
replace() { exec 3<"$1"; rm -- "$1"; "${@:2}" <&3 >"$1" || cat <&3 >"$1"; exec 3>&-; }
Example :
$ echo aaa > test
$ replace test tr a b
$ cat test
bbb
Also, note that this will keep a full copy of the original file (until the third file descriptor is closed). If you're using Linux, and the file you're processing on is too big to fit twice on the disk, you can check out this script that will pipe the file to the specified command block-by-block while unallocating the already processed blocks. As always, read the warnings in the usage page.
The following will accomplish the same thing that sponge does, without requiring moreutils:
shuf --output=file --random-source=/dev/zero
The --random-source=/dev/zero part tricks shuf into doing its thing without doing any shuffling at all, so it will buffer your input without altering it.
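Applied to the question's command, the pattern would look like this (assuming, as this answer states, that shuf buffers all of its input before writing the output file):
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | shuf --output=file_name --random-source=/dev/zero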
However, it is true that using a temporary file is best, for performance reasons. So, here is a function that I have written that will do that for you in a generalized way:
# Pipes a file into a command, and pipes the output of that command
# back into the same file, ensuring that the file is not truncated.
# Parameters:
# $1: the file.
# $2: the command. (With $3... being its arguments.)
# See https://stackoverflow.com/a/55655338/773113
siphon()
{
    local tmp file rc=0
    [ "$#" -ge 2 ] || { echo "Usage: siphon filename [command...]" >&2; return 1; }
    file="$1"; shift
    tmp=$(mktemp -- "$file.XXXXXX") || return
    "$@" <"$file" >"$tmp" || rc=$?
    mv -- "$tmp" "$file" || rc=$(( rc | $? ))
    return "$rc"
}
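For the question's grep, the call would look like this (using the file name from the question):
siphon file_name grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}'
The command's output is written to a temporary file first and then moved over file_name; the function's exit status reflects any failure in the command or the move.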
There's also ed (as an alternative to sed -i):
# cf. http://wiki.bash-hackers.org/howto/edit-ed
printf '%s\n' H 'g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' wq | ed -s file_name
You can use slurp with POSIX Awk:
!/seg[0-9]\{1,\}\.[0-9]\{1\}/ {
    q = q ? q RS $0 : $0
}
END {
    print q > ARGV[1]
}
Example
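Assuming the program above is saved as slurp.awk (a made-up file name), it can be applied in place to the question's file like this:
awk -f slurp.awk file_name
The non-matching lines accumulate in q while the file is read, and print q > ARGV[1] overwrites the input file only at END, after reading has finished.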
This does the trick pretty nicely in most of the cases I faced:
cat <<< "$(do_stuff_with f)" > f
Note that while $(…) strips trailing newlines, <<< ensures a final newline, so generally the result is magically satisfying.
(Look for “Here Strings” in man bash if you want to learn more.)
Full example:
#! /usr/bin/env bash
get_new_content() {
    sed 's/Initial/Final/g' "${1:?}"
}
echo 'Initial content.' > f
cat f
cat <<< "$(get_new_content f)" > f
cat f
This does not truncate the file and yields:
Initial content.
Final content.
Note that I used a function here for the sake of clarity and extensibility, but that’s not a requirement.
A common usecase is JSON edition:
echo '{ "a": 12 }' > f
cat f
cat <<< "$(jq '.a = 24' f)" > f
cat f
This yields:
{ "a": 12 }
{
"a": 24
}
Try this
echo -e "AAA\nBBB\nCCC" > testfile
cat testfile
AAA
BBB
CCC
echo "$(grep -v 'AAA' testfile)" > testfile
cat testfile
BBB
CCC
I usually use the tee program to do this:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name
This relies on grep reading the whole file before tee truncates it; it tends to work for small files, but it is a race, so it is not guaranteed, and no temporary file is involved.
I am using a bash script to call google-api's upload_video.py (https://developers.google.com/youtube/v3/guides/uploading_a_video )
I have an mp4 called output.mp4 which I would like to upload.
The problem is I cannot get my array to work how I would like.
This newline character is "required" because my arguments to the Python script contain spaces.
Here is a simplified version of my bash script:
# Operator may change these
hold=100
location="Foo, Montana "
declare -a file_array=("unique_ID_0" "unique_ID_1")
upload_file=upload_file.txt
upload_movie=output.mp4
# Hit enter at end b/c \n not recognized
upload_title=$location' - '${file_array[0]}' - Hold '$hold' Sweeps
'
upload_description='The spectrum recording was made in at '$location'.
'
# Overwrite with 1st call > else apppend >>
echo "$upload_title" > $upload_file
echo "$upload_description" >> $upload_file
# Load each line of text file into array
IFS=$'\n'
cmd_google=$(<$upload_file)
unset IFS
nn=1
for i in "${cmd_google[#]}"
do
echo "$i"
# Delete last character: \n
#i=${i[-nn]%?}
#i=${i: : -nn}
#i=${i::${#i}-nn}
i=${i%?}
#i=${i#"\n"}
#i=${i%"\n"}
echo "$i"
done
python upload_video.py --file=$upload_movie --title="${cmd_google[0]}" --description="${cmd_google[1]}"
At first I attempted to remove the newline character, but it appears that the Enter or \n is not working how I would like: each line is not separate. It writes the title and description as one line.
How do I modify my bash script to recognize a newline character?
This is much simpler than you are making it.
# Operator may change these
hold=100
location="Foo, Montana"
declare -a file_array=("unique_ID_0" "unique_ID_1")
upload_file=upload_file.txt
upload_movie=output.mp4
upload_title="$location - ${file_array[0]} - Hold $hold Sweeps"
upload_description="The spectrum recording was made in at $location."
cat <<EOF > "$upload_file"
$upload_title
$upload_description
EOF
# ...
readarray -t cmd_google < "$upload_file"
python upload_video.py --file="$upload_movie" --title="${cmd_google[0]}" --description="${cmd_google[1]}"
I suspect the readarray command is all you are really looking for, since much of the above code is simply creating a file that I assume you are receiving already created.
I figured it out with help from chepner's answer. My question hid the fact that I wanted to write newline characters into the video's description.
Instead of adding a newline character in the bash script, it is much easier to have a text file which contains the correctly formatted text, read it in, and then concatenate it with the run-time-specific variables.
In my case the correctly formatted text is called description.txt:
Here is a snip of my description.txt which contains newline characters
Here is my final version of the script:
# Operator may change these
hold=100
location="Foo, Montana"
declare -a file_array=("unique_ID_0" "unique_ID_1")
upload_title="$location - ${file_array[0]} - Hold $hold Sweeps"
upload_description="The spectrum recording was made in at $location. "
# Read in script which contains newline
temp=$(<description.txt)
# Concatenate them
upload_description="$upload_description$temp"
upload_movie=output.mp4
python upload_video.py --file="$upload_movie" --title="$upload_title" --description="$upload_description"
Trying to print the filenames of files that don't have 12 columns.
This works at the command line:
for i in *dim*; do awk -F',' '{if (NR==1 && NF!=12)print FILENAME}' $i; done;
When I try to embed this in subprocess.call in a python script, it doesn't work:
subprocess.call("""for %i in (*dim*.csv) do (awk -F, '{if ("NR==1 && NF!=12"^) {print FILENAME}}' %i)""", shell=True)
The first error I received was "Print is unexpected at this time", so I googled and added ^ within the parentheses. The next error was "unexpected newline or end of string", so I googled again and added the quotes around NR==1 && NF!=12. With the current code it's printing many lines in each file, so I suspect something is wrong with the if statement. I've used awk and for loops in this style in subprocess.call before, but not combined and with an if statement.
Multiple input files in AWK
In the string you are passing to subprocess.call(), your if statement is evaluating a string (probably not the comparison you want). It might be easier to just simplify the shell command by doing everything in AWK. You are executing AWK for every $i in the shell's for loop. Since you can give multiple input files to AWK, there is really no need for this loop.
You might want to scan through the entire files until you find any line that has other than 12 fields, and not only check the first line (NR==1). In this case, the condition would be only NF!=12.
If you want to check only the first line of each file, then NR==1 becomes FNR==1 when using multiple files. NR is the "number of records" (across all input files) and FNR is "file number of records" for the current input file only. These are special built-in variables in AWK.
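A quick way to see the difference between NR and FNR (illustrative, with two throwaway files):
$ printf 'a\nb\n' > one.txt; printf 'c\n' > two.txt
$ awk '{print FILENAME, NR, FNR}' one.txt two.txt
one.txt 1 1
one.txt 2 2
two.txt 3 1
NR keeps counting across files, while FNR restarts at 1 for each input file, which is why the first-line test must be FNR==1.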
Also, the syntax of AWK allows for the blocks to be executed only if the line matches some condition. Giving no condition (as you did) runs the block for every line. For example, to scan through all files given to AWK and print the name of a file with other than 12 fields on the first line, try:
awk -F, 'FNR==1 && NF!=12{print FILENAME; nextfile}' *dim*.csv
I have added the .csv to your wildcard *dim* as you had in the Python version. The -F, of course changes the field separator to a comma from the default space. For every line in each file, AWK checks if the number of fields NF is 12, if it's not, it executes the block of code, otherwise it goes on to the next line. This block prints the FILENAME of the current file AWK is processing, then skips to the beginning of the next file with nextfile.
Try running this AWK version with your subprocess module in Python:
subprocess.call("""awk -F, 'FNR==1 && NF!=12{print FILENAME; nextfile}' *dim*.csv""", shell=True)
The triple quotes make it a literal string. The output of AWK goes to stdout, and I'm assuming you know how to use this in Python with the subprocess module.
Using only Python
Don't forget that Python is itself an expressive and powerful language. If you are already using Python, it may be simpler, easier, and more portable to use only Python instead of a mixture of Python, bash, and AWK.
You can find the names of files (selected from *dim*.csv) with the first line of each file having other than 12 comma-separated fields with:
import glob
files_found = []
for filename in glob.glob('*dim*.csv'):
    with open(filename, 'r') as f:
        firstline = f.readline()
    if len(firstline.split(',')) != 12:
        files_found.append(filename)
print(files_found)
The glob module gives the listing of files matching the wildcard pattern *dim*.csv. The first line of each of these files is read and split into fields separated by commas. If the number of these fields is not 12, it is added to the list files_found.
I have 500 files to plot, and I want to do this automatically. I have the gnuplot script that does the plotting, with the file name hard-coded. I would like to have a loop that calls gnuplot on every iteration with a different file name, but it does not seem that gnuplot supports command line arguments.
Is there an easy way? I also installed the gnuplot-python package in case I can do it via a Python script; however, I couldn't find the API, so it's a bit difficult to figure out.
Thank you!
You can transform your gnuplot script to a shell script by prepending the lines
#!/bin/sh
gnuplot << EOF
appending the line
EOF
and substituting every $ by \$. Then, you can substitute every occurence of the filename by $1 and call the shell script with the filename as parameter.
Regarding the $'s in Sven Marnach's solution (the lines between the EOF markers are called a "here document" in shell parlance): in my experience, one uses shell variables as usual, but $s that are meant for gnuplot itself must be escaped.
Here is an example:
for distrib in "uniform" "correlated" "clustered"
do
gnuplot << EOF
# gnuplot preamble omitted for brevity
set output "../plots/$distrib-$eps-$points.pdf"
set title "$distrib distribution, eps = $eps, #points = $points"
plot "../plots/dat/$distrib-$eps-$points.dat" using 1:(\$2/$points) title "exact NN"
EOF
done
Note the backslash escaping the dollar so that gnuplot sees it.
A simple way is to generate 500 gnuplot scripts like so:
for filename in list_of_files:
    with open(filename + '-script', 'w') as outfile:
        outfile.write(script_text % (filename,))
where script_text is the text of your gnuplot script with the filename replaced with a %s.
I've done this several times. But don't use EOF, because you cannot run Bash code between the << EOF and EOF tags. Depending on the names of the files you can do it in different ways.
a) If the filenames are loopable (of the sort 1.dat 2.dat 3.dat, etc.):
#!/bin/bash
for((i=0;i<1;i++)) do
    echo "plot '-' u 1:2"
    for((j=1;j<=3;j++)) do
        cat $j.dat
        echo "e"
    done
done | gnuplot -persist
The first loop is a kind of buffer to feed it all to gnuplot.
b) If the filenames aren't loopable (such as ñlasjkd.dat ajñljd.mov añlsjkd.gif) you first need to move them to a new folder. Then do
#!/bin/bash
ffiles=$(ls | xargs) # a list of the folder's files
# Use the list to pipe all the data to gnuplot using cat
for((i=0;i<1;i++)) do
    echo "plot '-' u 1:2 w lp";
    cat $ffiles;
    echo "e";
done | gnuplot -persist
c) If you want some more, that is, to keep the information of the separated files in just one file while keeping the datasheets alive, use gnuplot's "index" (if gnuplot reads two blank lines, it assumes another datasheet follows):
#!/bin/bash
ffiles=$(ls|xargs)
ls $ffiles > ffiles.list # A file with the folder's files
while read line
do
    cat $line;
    echo -e; echo -e;
done < ffiles.list > alldata.dat
# ^-feeding ffiles.list to the while loop
# and writing the file alldata.dat
Now you can go to gnuplot and access one datasheet:
plot 'alldata.dat' index 1 u 1:2
and you will see the first file appearing on the list "ffiles.list". If you want to see more than one, say 4:
plot 'alldata.dat' index 1:4 u 1:2
tricky but easy.
Unless I'm misunderstanding your question, there's an easier way.
In the gnuplot script, replace all occurrences of your filename with $0, and save the script as, say, yourscript.gp.
Then in bash, loop over your plot files (let's say they are .txt files in $plotsDir) and pass each one to the script through gnuplot's call:
for f in "$plotsDir"/*.txt; do
    echo "call 'yourscript.gp' '$f'" | gnuplot
done
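A hedged sketch of what yourscript.gp might contain; the terminal settings and plot command are placeholders, not from the answer:
set terminal png
set output '$0.png'
plot '$0' using 1:2 with lines
With gnuplot <= 4.6.6, call substitutes its first argument for $0 before running the script; gnuplot 5 replaced this mechanism with ARG0-style variables.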
Here is one way to do this. Your gnuplot input file, say plot.gp, contains a line
plot "data"
or something like that. Save the lines before this line in plot1.gp and the lines after this line in plot2.gp. Now call gnuplot as
gnuplot plot1.gp -e 'plot "data"' plot2.gp
which facilitates passing the name of data on the command line.
I create a file named test.txt containing plot [0:20] x;
I run gnuplot test.txt and I see that gnuplot has indeed read the contents of my file, so it does accept a script file as a run-time argument.
Different methods of solving this problem are also covered in How to pass command line argument to gnuplot? :
Using -e
Using -c or call with ARG0 like variables (gnuplot >= 5)
Using call with $0 like variables (gnuplot <= 4.6.6)
Using environment variables and system()
Piping the program into gnuplot
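As a quick illustration of the first two methods (the script and variable names here are examples, not from the linked question):
# gnuplot >= 5: -c passes arguments, available as ARG1, ARG2, ...
gnuplot -c plot.gp data1.dat
# inside plot.gp: plot ARG1 using 1:2

# any version: -e defines variables before the script runs
gnuplot -e "filename='data1.dat'" plot.gp
# inside plot.gp: plot filename using 1:2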