Is there anything similar to the "perl -pe" option in Python? - python

TL;DR: I want to know which Python option, if any, roughly corresponds to Perl's "-pe" option.
I used Perl for some time, but these days I have been switching to Python for its ease and simplicity. When I needed to do simple text processing at the shell prompt, I used the perl -pe option. The following is an example.
grep foo *.txt | perl -pe 's/foo/bar/g'
Could someone explain how I can do a similar thing in Python?

-pe is a combination of two arguments, -p and -e. The -e option is roughly equivalent to Python's -c option, which lets you specify code to run on the command line. There is no Python equivalent of -p, which effectively adds code to run your passed code in a loop that reads from standard input. To get that, you'll actually have to write the corresponding loop in your code, along with the code that reads from standard input.
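A minimal sketch of the loop described above, assuming Python 3. The function names substitute_lines and main are illustrative, not part of any standard tool; the explicit for loop stands in for what perl's -p switch adds implicitly.

```python
import re
import sys


def substitute_lines(lines, pattern, replacement):
    """Yield each line with every match of pattern replaced (like s/.../.../g)."""
    regex = re.compile(pattern)
    for line in lines:
        yield regex.sub(replacement, line)


def main(stream_in=sys.stdin, stream_out=sys.stdout):
    # This explicit loop over standard input is what perl -p provides for free.
    for line in substitute_lines(stream_in, r"foo", "bar"):
        stream_out.write(line)
```

Saved as a hypothetical pe.py that ends with a call to main(), this could be used as: grep foo *.txt | python3 pe.py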

Perl, although a fully fledged programming language, was initially conceived as, and evolved as, a tool for text manipulation.
Python, on the other hand, has always been a general-purpose programming language. It can handle text with enormous flexibility compared with, say, Java or C++, but it does so within its general syntax, without exceptions to its rules. In particular, it has no special syntax for regular expressions that, in the absence of other directives, become a program in themselves. The same goes for opening, reading and writing files, given their names.
So, you can use "python -c ..." to run Python code from the command line, but your command must be a full program, with a beginning, a middle and an end.
So, for a simple regular expression substitution in several files whose names are passed on stdin, you could try something along these lines (grep -l prints only the names of the matching files, and each name is stripped of its trailing newline before use):
grep -l foo *.txt | python3 -c "import sys, re, os; [(open(filename + 'new', 'wt').write(re.sub('foo', 'bar', open(filename).read())), os.rename(filename + 'new', filename)) for filename in map(str.strip, sys.stdin)]"

Related

Bash script to read file line by line with some words quoted and containing spaces [duplicate]

This question already has an answer here:
Honoring quotes while reading shell arguments from a file
(1 answer)
Closed 1 year ago.
I am trying to create a shell script that reads a file with parameters for a python script the shell script is supposed to execute in a loop:
#!/bin/bash
while read -r i ; do python catfisher.py $i ; done < fishes
where 'fishes' might contain:
vesper "purplish green" fender
vespa "dimmer grey" "stradivarius veil"
The problem is that the python script's argparser interprets the parameters like so:
python purplish green fender
even when echoing $i in the bash script outputs:
vesper "purplish green" fender
The python script is fine if run manually:
python catfisher.py "purplish green" fender
so I'm assuming it's my lacking bash script skills that are the culprit, rather than my lacking argparser skills, but please advise if I'm mistaken.
Well, the problem is that to do this 100% correctly, you'd have to define a data format for your fishes file and create a parser for it.
You could approximate your desired results by saying "ok, I'd like anything in fishes to be interpreted like a shell would interpret the arguments to a command", but that also entails other metacharacters than the spaces and quotes you already have, such as dollar signs, braces, tildes, parentheses, etc.
Since you are using variable expansion to feed your line to Python, your shell is already past the quoting phase, thus not interpreting quotes contained in the variable.
Unfortunately, there are no mechanisms to selectively carry out shell parsing phases, like you want to do, ignoring dangerous parts.
You can pass the text to a new shell, but that allows your data to contain commands, making your software vulnerable to code injection:
while read -r i ; do bash -c "python catfisher.py $i" ; done < fishes
The best solution would probably be close to the first suggestion, except rather than defining your own data format, you could use an existing one, such as CSV, some dialects of which already allow for quoting, and are widely supported by libraries, like the csv module, built into Python's standard library.
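As a rough illustration of the CSV suggestion, here is a sketch that treats each line of fishes as a space-delimited CSV record with double-quote quoting. The helper name parse_fishes and the choice of a space as delimiter are assumptions made for this example, not something the original file format defines.

```python
import csv
import io


def parse_fishes(text):
    """Split each non-empty line into arguments, honoring double quotes."""
    reader = csv.reader(io.StringIO(text), delimiter=" ", quotechar='"')
    return [row for row in reader if row]


sample = 'vesper "purplish green" fender\nvespa "dimmer grey" "stradivarius veil"\n'
for args in parse_fishes(sample):
    print(args)
# ['vesper', 'purplish green', 'fender']
# ['vespa', 'dimmer grey', 'stradivarius veil']
```

Each resulting list could then be handed to the script as separate arguments without going through a shell at all, for example with subprocess.run(["python", "catfisher.py", *args]).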

python subprocess multiple commands with win path [duplicate]

This question already has answers here:
How do you activate an Anaconda environment within a Python Script?
(5 answers)
Closed 2 years ago.
I'm trying to trigger the execution of a python script via conda.
I would then capture the output and report it to command prompt where this is executed.
This is the basic concept, in its simplest form:
wrap.py - wrapper intended to execute the following script multiple times
import subprocess
def wrap():
    while True:
        cmd1=r"call C:\\Users\\my_user\\anaconda3\\Scripts\\activate.bat"
        cmd2=r"cd C:\\myfolder\\mysubfolder"
        cmd3=r"C:\\Users\\my_user\\anaconda3\\python.exe C:\\myfolder\\mysubfolder\\test.py"
        proc = subprocess.run([cmd1,cmd2,cmd3])
if __name__ == '__main__':
    wrap()
test.py - script that has to be executed
def mytest():
    print("success")
if __name__ == '__main__':
    mytest()
since mytest prints success once, I would like the output of the wrapper (run on anaconda) to be
(base) C:\myfolder\mysubfolder> python wrap.py
success
success
success
...
I tried with
1 - subprocess.Popen
2 - using shell=True or not
3 - using a list ["first command","second command","third command"] or a single string "first;second;third"
4 - using or removing "r" in front of the string, here the blanks are breaking the game
5 - using single or double ""
6- in my_user the underscore is also resulting in an encoding error
I actually tried to replicate at least 20 different stackoverflow "solutions" but none of them really worked for me. I also carefully read the subprocess page of the Python documentation, but this didn't help.
Any hint is appreciated, I'm lost.
The syntax subprocess.run([cmd1, cmd2, cmd3]) means run cmd1 with cmd2 and cmd3 as command-line arguments to cmd1. You instead want to execute a single sequence of shell commands; several of the things you are trying to do here require the shell, so you do want shell=True, which dictates the use of a single string as input, rather than a list consisting of a command and its arguments.
(Windows has some finicky processing behind the scenes which makes it not completely impossible to use a list of strings as the first argument with shell=True; but this really isn't portable or obvious. Just don't.)
Regarding the requirement for shell=True here, commands like call and cd (and source or . in Bourne-family shells) are shell built-ins which do not exist as separate binaries; if you don't have shell=True you will simply get "command not found" or your local equivalent. (Under other circumstances, you should generally avoid shell=True when you can. But this is not one of those cases; here, it really is unavoidable without major code changes.)
If your shell is cmd I guess the command might look like
subprocess.run(
r"call C:\Users\my_user\anaconda3\Scripts\activate.bat & C:\Users\my_user\anaconda3\python.exe C:\myfolder\mysubfolder\test.py",
shell=True)
or equivalently the same without r before the string and with all backslashes doubled; the only difference between an r"..." string and a regular "..." string is how the former allows you to put in literal backslashes, whereas the latter requires you to escape them; in the former case, everything in the string is literal, whereas in the latter case, you can use symbolic notations like \n for a newline character, \t for tab, etc.
In Python, it doesn't really matter whether you use single or double quotes; you can switch between them freely, obviously as long as you use the same opening and closing quotes. If you need literal single quotes in the string, use double quotes so you don't have to backslash-escape the literal quote, and vice versa. There's also the triple-quoted string which accepts either quoting character, but is allowed to span multiple lines, i.e. contain literal newlines without quoting them.
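The quoting rules in the last two paragraphs can be checked directly. This is only a small sketch; the variable names are made up for the illustration, and the paths are the ones from the question.

```python
# A raw string and a regular string with doubled backslashes are the same value.
raw_path = r"C:\myfolder\mysubfolder\test.py"
escaped_path = "C:\\myfolder\\mysubfolder\\test.py"
print(raw_path == escaped_path)  # True

# Without the r prefix or doubling, \t is read as a tab character.
tab_path = "mysubfolder\test.py"
print("\t" in tab_path)  # True

# Single and double quotes are interchangeable; triple quotes may span lines.
print('success' == "success")  # True
multi_line = """first line
second line"""
print(multi_line.count("\n"))  # 1
```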
If your preferred shell is sh or bash, the same syntax would look like
subprocess.run(r"""
source C:\Users\my_user\anaconda3\Scripts\activate.bat &&
C:\Users\my_user\anaconda3\python.exe C:\myfolder\mysubfolder\test.py""",
shell=True)
I left out the cd in both cases because nothing in your code seems to require the subprocess to run in a particular directory. If you do actually have that requirement, you can add cwd=r'C:\myfolder\mysubfolder' after shell=True to run the entire subprocess in a separate directory.
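To tie these pieces together, here is a hedged sketch of a small wrapper that joins several commands into one shell string, runs it with shell=True, and returns the captured output. The helper name run_chain is hypothetical, and echo is used as a stand-in for the activate and python commands so that the example runs anywhere; joining with && (rather than the single & used in the cmd example) stops at the first failing command.

```python
import subprocess


def run_chain(commands, cwd=None):
    """Run several shell commands as one shell string and return their stdout."""
    command_line = " && ".join(commands)
    result = subprocess.run(
        command_line,
        shell=True,           # needed for shell built-ins such as call or cd
        cwd=cwd,              # optional working directory for the whole chain
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


print(run_chain(["echo first", "echo second"]), end="")
```

Calling run_chain repeatedly from a loop and printing the result would give the repeated "success" lines the question asks for, assuming the real commands are substituted for the echo placeholders.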
There are situations where the facilities of subprocess.run() are insufficient, and you need to drop down to bare subprocess.Popen() and do the surrounding plumbing yourself; but this emphatically is not one of those scenarios. You should stay far away from Popen() if you can, especially if your understanding of subprocesses is not very sophisticated.

running python in an argument of a program & how to give hex when string expected

I've come across this command and want to understand how it works. program is just a simple C program for Mac or Linux.
./program `python -c 'print "\xC8\xCE\xC5\x06"'`
1) Can someone explain how this command works?
2) is this the only way to give a hex value to a program when string is expected?
This is a way of evaluating python expressions from the command line in bash. It has nothing to do with C. The only python code here is print "\xC8\xCE\xC5\x06". The rest is bash code.
You can try this command in bash: python -c "print 'Hello World'"
You can also read man python for more information on python command line flags.
In python strings \xHH is used to translate a hex value into characters. In this case u"\xC8\xCE\xC5" == u"ÈÎÅ". If you don't use unicode strings, the output will be some non-ascii characters, which program might make sense of, but that cannot be entered or printed in a regular bash session. program might not care if the string is printable, and instead just treat it as binary data.
The backticks in bash run the enclosed command first, and then substitute its output into the outer command line, where it is passed to program as an argument. Another way of doing this in bash would be this:
./program $(python -c "print '\xC8\xCE\xC5\x06'")
To answer your second question: There are other ways of doing this as well. You could probably use printf instead of python. Like this:
./program $(printf "\xC8\xCE\xC5\x06")
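The commands above use Python 2's print statement. Under Python 3 (an assumption beyond what this thread covers), print("\xC8") would encode the character with the terminal's encoding, typically producing two UTF-8 bytes instead of the single byte 0xC8. A minimal sketch of emitting the exact bytes instead, using the hypothetical helper names hex_to_bytes and emit:

```python
import sys


def hex_to_bytes(hex_digits):
    """Convert a string of hex digits such as 'C8CEC506' into raw bytes."""
    return bytes.fromhex(hex_digits)


def emit(payload, stream=None):
    """Write raw bytes, by default to the binary layer underneath sys.stdout."""
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(payload)
    stream.flush()


payload = hex_to_bytes("C8CEC506")
print(payload == b"\xC8\xCE\xC5\x06")  # True
```

With a final emit(payload) call in a hypothetical script payload.py, the shell side would look like ./program $(python3 payload.py).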

Passing arguments to a Python script from a shell variable containing quoted spaces

I am calling a python script through a bash wrapper, but I'm having trouble dealing with arguments that contain quoted spaces.
I assemble the arguments to the python script into a bash variable, such as
opt="-c start.txt"
opt+=" --self 'name Na'"
Then call the python script with something like:
python test_args.py $opt
When printing sys.argv in Python, I get
['test_args.py', '-c', 'start.txt', '--self', "'name", "Na'"]
instead of the expected
['test_args.py', '-c', 'start.txt', '--self', 'name Na']
I tried using an array when calling the script, such as
python test_args.py ${opt[@]}
but then I get
['test_args.py', "-c start.txt --self 'name Na'"]
Any other ideas?
Use an array, but store each argument as a separate element in the array:
opt=(-c start.txt)
opt+=(--self 'name Na')
python test_args.py "${opt[@]}"
See BashFAQ #050.
This is what the shlex module is for.
The shlex class makes it easy to write lexical analyzers for simple
syntaxes resembling that of the Unix shell. This will often be useful
for writing minilanguages, (for example, in run control files for
Python applications) or for parsing quoted strings.
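A short sketch of how shlex could be applied on the Python side, assuming the whole option string reaches the script as a single value (for example as one quoted argument or through an environment variable):

```python
import shlex

# The option string exactly as assembled in the bash variable.
opt = "-c start.txt --self 'name Na'"

# shlex.split applies shell-like quoting rules, so the quoted space survives.
args = shlex.split(opt)
print(args)
# ['-c', 'start.txt', '--self', 'name Na']
```

The resulting list can be passed to argparse via parser.parse_args(args).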
Your instinct to embed spaces inside the variable's value was good, but when the value is simply expanded during the command line parsing their special meaning is lost as you saw. You need to expand the variable before the command line to your python script is parsed:
set -f
eval python test_args.py $opt
set +f
That will expand to:
python test_args.py -c start.txt --self 'name Na'
Which will then be parsed correctly with the quotes regaining their special meaning.
Edit: I've added set -f/+f (aka -/+o noglob) around the eval to disable file globbing. Globbing wasn't an issue in the OP's example, but it is a known hazard with eval. (Another, stronger caveat: never eval user input unless you take extreme care to make sure it won't blow up into something nasty. If you don't control the value being eval-ed, you can't be sure what will happen.)

Maximum characters that can be stuffed into raw_input() in Python

For an InterviewStreet challenge, we have to be able to accommodate a 10,000-character string input from the keyboard, but when I copy/paste a 10k-long word into my local testing, it cuts off at a thousand or so.
What's the official limit in Python? And is there a way to change this?
Thanks guys
Here's the challenge by-the-by:
http://www.interviewstreet.com/recruit/challenges/solve/view/4e1491425cf10/4edb8abd7cacd
Are you sure of the fact that your 10k long word doesn't contain newlines?
raw_input([prompt])
If the prompt argument is present, it is written to standard output without a trailing newline. The function then reads a line from input, converts it to a string (stripping a trailing newline), and returns that. When EOF is read, EOFError is raised.
...
If the readline module was loaded, then raw_input() will use it to provide elaborate line editing and history features.
There is no maximum limit (in Python) on the size of the buffer returned by raw_input, and when I tested with some very long inputs on stdin I could not reproduce your result. I tried to search the web for information regarding this but came up with nothing that would help me answer your question.
my tests
:/tmp% python -c 'print "A"*1000000' | python -c 'print len (raw_input ())';
1000000
:/tmp% python -c 'print "A"*210012300' | python -c 'print len (raw_input ())';
210012300
:/tmp% python -c 'print "A"*100+"\n"+"B"*100' | python -c 'print len (raw_input ())';
100
I had this same experience, and found python limits the length of input to raw_input if you do not import the readline module. Once I imported the readline module, it lifted the limit (or at least raised it significantly enough to where the text I was using worked just fine). This was on my Mac with Python 2.7.15. Additionally, it’s been confirmed working on at least 3.9.5.
I guess this is part of the challenge. The FAQ suggests raw_input() might not be the optimal approach:
The most common (possibly naive) methods are listed below. (...)
There are indeed Python standard modules helping to handle system input/output.
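One such standard facility is sys.stdin. The sketch below reads a single long line through it; the helper name read_first_line is made up for the example, and an in-memory stream stands in for real input so the snippet is self-contained. It mirrors the piped tests above and is not a confirmed fix for the truncation seen when pasting into a terminal.

```python
import io
import sys


def read_first_line(stream=None):
    """Read one line from the stream (sys.stdin by default), minus the newline."""
    if stream is None:
        stream = sys.stdin
    return stream.readline().rstrip("\n")


# Simulated input: a 10,000-character line followed by a shorter one.
fake_stdin = io.StringIO("A" * 10000 + "\n" + "B" * 100 + "\n")
print(len(read_first_line(fake_stdin)))  # 10000
```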
