I need to pick some numbers out of some text files. I can pick out the lines I need with grep, but didn't know how to extract the numbers from the lines. A colleague showed me how to do this from bash with perl:
cat results.txt | perl -pe 's/.+(\d\.\d+)\.\n/\1 /'
However, I usually code in Python, not Perl. So my question is, could I have used Python in the same way? I.e., could I have piped something from bash to Python and then gotten the result straight to stdout? ... if that makes sense. Or is Perl just more convenient in this case?
Yes, you can use Python from the command line. python -c <stuff> will run <stuff> as Python code. Example:
python -c "import sys; print sys.path"
There isn't a direct equivalent to the -p option for Perl (the automatic input/output line-by-line processing), but that's mostly because Python doesn't use the same concept of $_ and whatnot that Perl does - in Python, all input and output is done manually (via raw_input()/input(), and print/print()).
For your particular example:
cat results.txt | python -c "import re, sys; print ''.join(re.sub(r'.+(\d\.\d+)\.\n', r'\1 ', line) for line in sys.stdin)"
(Obviously somewhat more unwieldy. It's probably better to just write the script to do it in actual Python.)
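A stand-alone script along those lines might look like the sketch below. It assumes Python 3; the name extract_numbers and the sample lines are made up for illustration. Unlike perl -p, it drops lines that do not match instead of echoing them.

```python
import re

# Same shape as the Perl pattern: a decimal number followed by a full stop
# at the very end of the line.
NUMBER_AT_END = re.compile(r"(\d\.\d+)\.$")


def extract_numbers(lines):
    """Return the captured number from every line that matches."""
    found = []
    for line in lines:
        match = NUMBER_AT_END.search(line.rstrip("\n"))
        if match:
            found.append(match.group(1))
    return found


# Invented sample input standing in for results.txt
sample = ["accuracy was 0.93.\n", "no number here\n", "loss fell to 1.25.\n"]
print(" ".join(extract_numbers(sample)))  # prints: 0.93 1.25
```

To use it as a filter, pass sys.stdin instead of the sample list.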
You can use:
$ python -c '<your code here>'
You can in theory, but Python doesn't have anywhere near as much regex magic as Perl does, so the resulting command will be much more unwieldy, especially as you can't use regular expressions without importing re (and you'll probably need sys for sys.stdin too).
The Python equivalent of your colleague's Perl one-liner is approximately:
import sys, re
for line in sys.stdin:
    # write() adds no newline of its own, so each line is emitted exactly as substituted
    sys.stdout.write(re.sub(r'.+(\d\.\d+)\.\n', r'\1 ', line))
You have a problem which can be solved several ways.
I think you should consider using regular expressions (which is what Perl is doing in your example) directly from Python. Regular expressions live in the re module. An example would be:
import re
filecontent = open('somefile.txt').read()
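print re.findall(r'.+(\d\.\d+)\.$', filecontent, re.MULTILINE)
(I would prefer using $ instead of '\n' for line endings, because line endings differ between operating systems and file encodings. Note that $ only matches at the end of every line when re.MULTILINE is passed; without it, it matches only at the end of the whole string.)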
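The effect of re.MULTILINE on $ can be seen in a short sketch (Python 3 syntax; the sample text is invented):

```python
import re

text = "first run scored 0.75.\nsecond run scored 0.81.\n"
pattern = r"(\d\.\d+)\.$"

# Without the flag, $ anchors only at the end of the whole string.
last_only = re.findall(pattern, text)
# With re.MULTILINE, $ also anchors before every newline.
every_line = re.findall(pattern, text, re.MULTILINE)

print(last_only)   # ['0.81']
print(every_line)  # ['0.75', '0.81']
```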
If you want to call bash commands from inside Python, you could use:
import os
os.system(mycommand)
Where mycommand is the shell command to run. I use it all the time, because some operations are better to perform in bash than in Python.
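os.system only hands back the exit status. If the command's output is needed inside Python, subprocess.run can capture it. A minimal sketch, assuming Python 3.7 or later, with sys.executable standing in for whatever external program you actually want to run:

```python
import subprocess
import sys

# sys.executable is only a stand-in for a real external program.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,  # collect stdout and stderr instead of inheriting them
    text=True,            # decode bytes to str
    check=True,           # raise CalledProcessError on a non-zero exit status
)
captured = result.stdout.strip()
print(captured)  # hello from a child process
```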
Finally, if you want to extract the numbers with grep, use the -o option, which prints only the matched part.
Perl (or sed) is more convenient. However it is possible, if ugly:
python -c 'import sys, re; print "\n".join(re.sub(r".+(\d\.\d+)\.\n", r"\1 ", l) for l in sys.stdin)'
Quoting from https://stackoverflow.com/a/12259852/411282:
for ln in __import__("fileinput").input(): print ln.rstrip()
See the explanation linked above, but this does much more of what perl -p does, including support for multiple file names and stdin when no filename is given.
https://docs.python.org/3/library/fileinput.html#fileinput.input
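As a rough illustration of the same idea in script form (Python 3; the helper name stripped_lines and the throwaway file are assumptions made for the demo):

```python
import fileinput
import os
import tempfile


def stripped_lines(paths):
    """Yield each line of the named files without trailing whitespace."""
    # With an empty list, fileinput falls back to standard input.
    with fileinput.input(files=paths) as stream:
        for line in stream:
            yield line.rstrip()


# Demo on a temporary file so the sketch runs without piped input.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("alpha  \nbeta\t\n")
result = list(stripped_lines([handle.name]))
os.unlink(handle.name)
print(result)  # ['alpha', 'beta']
```

In a real filter you would call fileinput.input() with no arguments, so that file names given on the command line are used and standard input is read when there are none.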
You can use python to execute code directly from your bash command line by using python -c, or you can process input piped to stdin using sys.stdin.
I would like to retrieve output from a shell command that contains spaces and quotes. It looks like this:
import subprocess
cmd = "docker logs nc1 2>&1 |grep mortality| awk '{print $1}'|sort|uniq"
subprocess.check_output(cmd)
This fails with "No such file or directory". What is the best/easiest way to pass commands such as these to subprocess?
The absolutely best solution here is to refactor the code to replace the entire tail of the pipeline with native Python code.
import subprocess
from collections import Counter
s = subprocess.run(
    ["docker", "logs", "nc1"],
    text=True, capture_output=True, check=True)
count = Counter()
for line in s.stdout.splitlines():
    if "mortality" in line:
        count[line.split()[0]] += 1
# most_common() yields (word, occurrences) pairs, most frequent first
for word, occurrences in count.most_common():
    print(occurrences, word)
There are minor differences in how Counter objects resolve ties (if two words have the same count, the one which was seen first is returned first, rather than by sort order), but I'm guessing that's unimportant here.
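The tie-breaking difference is easy to see on made-up data (the log lines below are invented; Python 3.7 or later is assumed for the first-seen ordering of ties):

```python
from collections import Counter

log_lines = [
    "zebra mortality rate rising",
    "ant mortality rate stable",
    "zebra mortality rate rising",
    "ant mortality rate stable",
    "heron population stable",
]
count = Counter(line.split()[0] for line in log_lines if "mortality" in line)

# Ties come back in first-seen order, not alphabetical order.
print(count.most_common())  # [('zebra', 2), ('ant', 2)]
# Sorting the keys reproduces what sort | uniq would list.
print(sorted(count))        # ['ant', 'zebra']
```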
I am also ignoring standard error from the subprocess; if you genuinely want to include output from error messages too, just include s.stderr in the loop driver as well.
However, my hunch is that you don't realize your code was doing that, which drives home the point nicely: Mixing shell script and Python raises the maintainability burden, because now you have to understand both shell script and Python to understand the code.
(And in terms of shell script style, I would definitely get rid of the useless grep by refactoring it into the Awk script, and probably also fold in the sort | uniq which has a trivial and more efficient replacement in Awk. But here, we are replacing all of that with Python code anyway.)
If you really wanted to stick to a pipeline, then you need to add shell=True to use shell features like redirection, pipes, and quoting. Without shell=True, Python looks for a command whose file name is the entire string you were passing in, which of course doesn't exist.
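A minimal sketch of the shell=True variant, assuming Python 3.7 or later and a POSIX shell with printf, grep, awk, sort and uniq available. Here printf fakes a few log lines in place of docker logs nc1:

```python
import subprocess

# printf stands in for `docker logs nc1 2>&1`; the rest mirrors the original pipeline.
cmd = (
    r"printf 'b mortality\na mortality\nb mortality\nc other\n'"
    r" | grep mortality | awk '{print $1}' | sort | uniq"
)
# shell=True hands the whole string to /bin/sh, which sets up the pipes and quoting.
words = subprocess.check_output(cmd, shell=True, text=True).split()
print(words)  # ['a', 'b']
```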
I'm trying to use Python to extract info from some JSON (on a system where I can't install jq). My current approach runs afoul of the syntax restrictions described in Why can't use semi-colon before for loop in Python?. How can I modify this code to still work in light of this limitation?
My current code looks like the following:
$ SHIFT=$(aws ec2 describe-images --region "$REGION" --filters "Name=tag:Release,Values=$RELEASE_CODE_1.2003.2")
$ echo "$SHIFT" | python -c "import sys, json; for image in json.load(sys.stdin)['Images']: print image['ImageId'];"
File "<string>", line 1
import sys, json; for image in json.load(sys.stdin)['Images']: print image['ImageId'];
^
SyntaxError: invalid syntax
Since Python's syntax doesn't allow a for loop to be separated from a prior command with a semicolon, how can I work around this limitation?
There are several options here:
Pass your code as a multi-line string. Note that " is used to delimit Python strings rather than the original ' here for the sake of simplicity: A POSIX-compatible mechanism to embed a literal ' in a single-quoted string is possible, but quite ugly.
extractImageIds() {
  python -c '
import sys, json
for image in json.load(sys.stdin)["Images"]:
    print image["ImageId"]
' "$@"
}
Use bash's C-style escaped string syntax ($'') to embed newlines, as with $'\n'. Note that the leading $ is critical, and that this doesn't work with /bin/sh. See the bash-hackers' wiki on ANSI C-like strings for details.
extractImageIds() { python -c $'import sys, json\nfor image in json.load(sys.stdin)["Images"]:\n\tprint image["ImageId"]' "$@"; }
Use __import__() to avoid the need for a separate import command.
extractImageIds() { python -c 'for image in __import__("json").load(__import__("sys").stdin)["Images"]: print image["ImageId"]' "$@"; }
Pass the code on stdin and move the input onto argv; note that this only works if the input doesn't overwhelm your operating system's allowed maximum command-line size. Consider the following example:
extractImageIds() {
  # capture function's input to a variable; declaring it separately lets
  # "|| return" see the exit status of the substitution rather than of "local"
  local input
  input=$(</dev/stdin) || return
  # ...and expand that variable on the Python interpreter's command line
  python - "$input" "$@" <<'EOF'
import sys, json
for image in json.loads(sys.argv[1])["Images"]:
    print image["ImageId"]
EOF
}
Note that $(</dev/stdin) is a more efficient bash-only alternative to $(cat); due to shell builtin support, it works even on operating systems where /dev/stdin doesn't exist as a file.
All of these have been tested as follows:
extractImageIds <<<'{"Images": [{"ImageId": "one"}, {"ImageId": "two"}]}'
To efficiently provide stdin from a variable, one could run extractImageIds <<<"$variable" instead. Note that the "$@" elements in the wrapper are there to ensure that sys.argv is populated with arguments to the shell function -- where sys.argv isn't referenced by the Python code being run, this syntax is optional.
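On Python 3, where print is a function, another way around the restriction is to avoid the for statement altogether: a generator expression is an expression, so it may follow a semicolon. The sketch below is an illustration under that assumption, not one of the options tested above. It runs such a one-liner in a child interpreter (sys.executable) and feeds it the same sample JSON:

```python
import subprocess
import sys

# The code a shell user would pass to python3 -c, all on one line.
ONE_LINER = (
    'import sys, json; '
    'print("\\n".join(i["ImageId"] for i in json.load(sys.stdin)["Images"]))'
)
sample = '{"Images": [{"ImageId": "one"}, {"ImageId": "two"}]}'

result = subprocess.run(
    [sys.executable, "-c", ONE_LINER],
    input=sample, capture_output=True, text=True, check=True,
)
image_ids = result.stdout.split()
print(image_ids)  # ['one', 'two']
```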
I am trying to run a piece of Python code in a Bash script, so I wanted to understand the difference between:
#!/bin/bash
#your bash code
python -c "
#your py code
"
vs
python - <<DOC
#your py code
DOC
I checked the web but couldn't compile the bits around the topic. Do you think one is better over the other?
If you wanted to return a value from Python code block to your Bash script then is a heredoc the only way?
The main flaw of using a here document is that the script's standard input will be the here document. So if you have a script which wants to process its standard input, python -c is pretty much your only option.
On the other hand, using python -c '...' ties up the single-quote for the shell's needs, so you can only use double-quoted strings in your Python script; using double-quotes instead to protect the script from the shell introduces additional problems (strings in double-quotes undergo various substitutions, whereas single-quoted strings are literal in the shell).
As an aside, notice that you probably want to single-quote the here-doc delimiter, too, otherwise the Python script is subject to similar substitutions.
python - <<'____HERE'
print("""Look, we can have double quotes!""")
print('And single quotes! And `back ticks`!')
print("$(and what looks to the shell like command substitutions and $variables!)")
____HERE
As an alternative, escaping the delimiter works identically, if you prefer that (python - <<\____HERE).
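The standard-input point can be checked from Python itself. In this sketch (Python 3.7 or later; the helper name run is hypothetical) the same small program is started once with -c and once with -, where the program text arrives on stdin just as it would from a here document:

```python
import subprocess
import sys


def run(args, stdin_text):
    """Start a child interpreter with the given arguments and stdin; return its stdout."""
    return subprocess.run(
        [sys.executable, *args],
        input=stdin_text, capture_output=True, text=True, check=True,
    ).stdout.strip()


program = "import sys; print(repr(sys.stdin.read()))"

# With -c, stdin is still free to carry data.
via_c = run(["-c", program], "data\n")
# With -, stdin carries the program itself, so nothing is left to read.
via_stdin = run(["-"], program + "\n")

print(via_c)      # 'data\n'
print(via_stdin)  # ''
```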
If you are using bash, you can avoid heredoc problems by adding a little more boilerplate:
python <(cat <<EoF
name = input()
print(f'hello, {name}!')
EoF
)
This lets you run your embedded Python script without giving up standard input. The overhead is mostly the same as that of cmda | cmdb. This technique is known as process substitution.
If you want to be able to somehow validate the script, I suggest that you dump it to a temporary file:
#!/bin/bash
temp_file=$(mktemp my_generated_python_script.XXXXXX.py)
cat > "$temp_file" <<EoF
# embedded python script
EoF
python3 "$temp_file" && rm "$temp_file"
This will keep the script if it fails to run.
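One possible way to validate such a dumped file before running it is Python's built-in compile(), which parses source without executing it. This is only a sketch; the helper names syntax_ok and write_temp are hypothetical:

```python
import os
import tempfile


def syntax_ok(path):
    """Return True if the file at path parses as Python source."""
    with open(path, encoding="utf-8") as handle:
        source = handle.read()
    try:
        compile(source, path, "exec")  # parse only; nothing is executed
    except SyntaxError:
        return False
    return True


def write_temp(source):
    """Write source to a new temporary .py file and return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".py", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(source)
    return handle.name


good = write_temp("print('hello')\n")
bad = write_temp("import sys; for x in sys.argv: print(x)\n")
results = (syntax_ok(good), syntax_ok(bad))
os.unlink(good)
os.unlink(bad)
print(results)  # (True, False)
```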
If you prefer to use python -c '...' without having to escape double quotes, you can first load the code into a bash variable using a here-document:
read -r -d '' CMD << '--END'
print ("'quoted'")
--END
python -c "$CMD"
The python code is loaded verbatim into the CMD variable and there's no need to escape double quotes.
How to use here-docs with input
tripleee's answer has all the details, but there are Unix tricks to work around this limitation:
So if you have a script which wants to process its standard input, python -c is pretty much your only option.
This trick applies to all programs that want to read from a redirected stdin (e.g., ./script.py < myinputs) and also take user input:
python - <<'____HERE'
import os
os.dup2(1, 0)
print(input("--> "))
____HERE
Running this works:
$ bash heredocpy.sh
--> Hello World!
Hello World!
If you want to get the original stdin, run os.dup(0) first. Here is a real-world example.
This works because as long as either stdout or stderr is a tty, one can read from it as well as write to it. (Otherwise, you could just open /dev/tty. This is what less does.)
In case you want to process inputs from a file instead, that's possible too -- you just have to use a new fd:
Example with a file
cat <<'____HERE' > file.txt
With software there are only two possibilities:
either the users control the programme
or the programme controls the users.
____HERE
python - <<'____HERE' 4< file.txt
import os
for line in os.fdopen(4):
    print(line.rstrip().upper())
____HERE
Example with a command
Unfortunately, pipelines don't work here -- but process substitution does:
python - <<'____HERE' 4< <(fortune)
import os
for line in os.fdopen(4):
    print(line.rstrip().upper())
____HERE
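What os.fdopen does with a descriptor handed over by the shell can be tried without any redirection. In this sketch os.pipe() stands in for the shell's 4< redirection, which is an assumption made purely so the example is self-contained:

```python
import os

# os.pipe() plays the role of the descriptor the shell would open with "4< file.txt".
read_fd, write_fd = os.pipe()

with os.fdopen(write_fd, "w") as writer:
    writer.write("either the users control the programme\n")
    writer.write("or the programme controls the users.\n")

# os.fdopen wraps the already-open descriptor in an ordinary file object.
with os.fdopen(read_fd) as reader:
    shouted = [line.rstrip().upper() for line in reader]

print(shouted)
```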
I have this awk/sed command
awk '{full=full$0}END{print full;}' initial.xml | sed 's|</Product>|</Product>\
|g' > final.xml
to break up an XML document containing a large number of tags,
such that the new file has all the contents of each Product node on a single line.
I am trying to run it using os.system and the subprocess module; however, this wraps all the contents of the file into one line.
Can anyone convert it into equivalent python script?
Thanks!
Something like this?
from __future__ import print_function
import fileinput
for line in fileinput.input('initial.xml'):
    print(line.rstrip('\n').replace('</Product>', '</Product>\n'), end='')
I'm using the print function because the default print in Python 2.x will add a space or newline after each set of output. There are various other ways to work around that, some of which involve buffering your output before printing it.
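The same transformation as a small Python 3 function, which is easier to try on sample data. The function name split_products and the sample lines are invented; this is a sketch that still works line by line rather than buffering the whole file:

```python
def split_products(lines):
    """Drop each input newline, then start a new line after every </Product>."""
    for line in lines:
        yield line.rstrip("\n").replace("</Product>", "</Product>\n")


sample = [
    "<Products><Product>\n",
    "<Id>1</Id></Product><Product>\n",
    "<Id>2</Id></Product></Products>\n",
]
output = "".join(split_products(sample))
print(output)
```

With the sample above, output holds one Product element per line, followed by the closing </Products> tag.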
For the record, your problem could equally well be solved in just a simple Awk script.
awk '{ gsub(/<\/Product>/,"&\n"); printf "%s", $0 }' initial.xml
Printing output as it arrives without a trailing newline is going to be a lot more efficient than buffering the whole file and then printing it at the end, and of course, Awk has all the necessary facilities to do the substitution as well. (gsub is not available in all dialects of Awk, though.)
I have a python file (/home/test.py) that has a mixture of spaces and tabs in it.
Is there a programmatic way (note: programmatic, NOT using an editor) to convert this file to use only tabs? (meaning, replace any existing 4-spaces occurrences with a single tab)?
Would be grateful for either a python code sample or a linux command to do the above. Thanks.
Sounds like a task for sed:
sed -e 's/    /\t/g' test.py > test.new
[Put a real tab instead of \t]
However...
Use 4 spaces per indentation level.
--PEP 8 -- Style Guide for Python Code
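You can try iterating over the file and replacing as you go, e.g.:
import fileinput
import sys
for line in fileinput.FileInput("file", inplace=1):
    # line already ends with a newline, so write it back without adding another
    sys.stdout.write(line.replace("    ", "\t"))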
Or you can try a *nix tool like awk or ruby:
$ awk '{gsub(/    /,"\t")}1' file > temp && mv temp file
$ ruby -i.bak -pe '$_.gsub!(/    /,"\t")' file
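A blanket replacement also rewrites runs of four spaces inside string literals and comments. If only the indentation should change, something like the following Python 3 sketch may be safer. The function name indent_to_tabs and the default width of 4 are assumptions:

```python
def indent_to_tabs(line, width=4):
    """Rewrite only the leading whitespace of line, using tabs where possible."""
    body = line.lstrip(" \t")
    # Normalise existing tabs to spaces first so mixed indentation is measured consistently.
    indent = line[: len(line) - len(body)].expandtabs(width)
    tabs, spaces = divmod(len(indent), width)
    return "\t" * tabs + " " * spaces + body


converted = [
    indent_to_tabs("        x = 'a    b'\n"),  # eight spaces become two tabs
    indent_to_tabs("\t    y = 1\n"),           # a tab plus four spaces become two tabs
    indent_to_tabs("  z = 2\n"),               # two leftover spaces stay as they are
]
print(converted)
```

Note that the four spaces inside the string literal on the first sample line are left untouched.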