Un-escape spaces with Python pathlib [duplicate]

I have command line arguments in a string and I need to split it to feed to argparse.ArgumentParser.parse_args.
I see that the documentation uses string.split() liberally. However, this does not work in more complex cases, such as
--foo "spaces in brackets" --bar escaped\ spaces
Is there a built-in way to do that in Python?
(A similar question for java was asked here).

This is what shlex.split was created for.
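A minimal sketch of feeding the shlex.split output to argparse, using the command line from the question (the --foo and --bar option names are just the question's placeholders):

```python
import argparse
import shlex

# The argument string from the question: a double-quoted value and a value
# containing a backslash-escaped space.
arg_string = r'--foo "spaces in brackets" --bar escaped\ spaces'

# shlex.split applies POSIX shell rules by default: quotes group words and
# are removed, and a backslash escapes the character that follows it.
argv = shlex.split(arg_string)
print(argv)  # ['--foo', 'spaces in brackets', '--bar', 'escaped spaces']

parser = argparse.ArgumentParser()
parser.add_argument('--foo')
parser.add_argument('--bar')
args = parser.parse_args(argv)  # parse_args accepts any list of strings
print(args.foo, '|', args.bar)  # spaces in brackets | escaped spaces
```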

If you're parsing a windows-style command line, then shlex.split doesn't work correctly - calling subprocess functions on the result will not have the same behavior as passing the string directly to the shell.
In that case, the most reliable way to split a string like the command-line arguments to python is... to pass command line arguments to python:
import sys
import os
import subprocess
import shlex
import json  # json is an easy way to send arbitrary ASCII-safe lists of strings out of Python
def shell_split(cmd):
    """
    Like `shlex.split`, but uses the Windows splitting syntax when run on Windows.
    On Windows, this is the inverse of subprocess.list2cmdline
    """
    if os.name == 'posix':
        return shlex.split(cmd)
    else:
        # TODO: write a version of this that doesn't invoke a subprocess
        if not cmd:
            return []
        # Let a child Python process parse the string and report its argv as JSON
        full_cmd = '{} {}'.format(
            subprocess.list2cmdline([
                sys.executable, '-c',
                'import sys, json; print(json.dumps(sys.argv[1:]))'
            ]), cmd
        )
        ret = subprocess.check_output(full_cmd).decode()
        return json.loads(ret)
One example of how these differ:
# windows does not treat all backslashes as escapes
>>> shell_split(r'C:\Users\me\some_file.txt "file with spaces"')
['C:\\Users\\me\\some_file.txt', 'file with spaces']
# posix does
>>> shlex.split(r'C:\Users\me\some_file.txt "file with spaces"')
['C:Usersmesome_file.txt', 'file with spaces']
# non-posix does not mean Windows - this produces extra quotes
>>> shlex.split(r'C:\Users\me\some_file.txt "file with spaces"', posix=False)
['C:\\Users\\me\\some_file.txt', '"file with spaces"']
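The docstring above describes the Windows branch as the inverse of subprocess.list2cmdline, the helper subprocess uses on Windows to turn an argument list into one command-line string. As far as I know it is not part of the documented public API, so treat this as an illustration rather than something to depend on. It is pure Python, so the sketch runs on any platform; the argument values are hypothetical:

```python
import subprocess

# Hypothetical arguments as the child process should receive them in argv.
args = [r'C:\Users\me\some_file.txt', 'file with spaces', 'say "hi"']

# Arguments containing spaces are wrapped in double quotes, embedded quotes
# are backslash-escaped, and backslashes not followed by a quote are kept as-is.
cmdline = subprocess.list2cmdline(args)
print(cmdline)  # C:\Users\me\some_file.txt "file with spaces" "say \"hi\""
```

On Windows, passing cmdline to shell_split is expected to give back the original list.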

You could use the split_arg_string helper function from the click package:
import re
def split_arg_string(string):
    """Given an argument string this attempts to split it into small parts."""
    rv = []
    for match in re.finditer(r"('([^'\\]*(?:\\.[^'\\]*)*)'"
                             r'|"([^"\\]*(?:\\.[^"\\]*)*)"'
                             r'|\S+)\s*', string, re.S):
        arg = match.group().strip()
        if arg[:1] == arg[-1:] and arg[:1] in '"\'':
            arg = arg[1:-1].encode('ascii', 'backslashreplace') \
                .decode('unicode-escape')
        try:
            arg = type(string)(arg)
        except UnicodeError:
            pass
        rv.append(arg)
    return rv
For example:
>>> print(split_arg_string('"this is a test" 1 2 "1 \\" 2"'))
['this is a test', '1', '2', '1 " 2']
The click package is starting to dominate for command-arguments parsing, but I don't think it supports parsing arguments from string (only from argv). The helper function above is used only for bash completion.
Edit: I can only recommend using shlex.split(), as suggested in the answer by @ShadowRanger. The only reason I'm not deleting this answer is that it provides slightly faster splitting than the full-blown pure-Python tokenizer used in shlex (around 3.5x faster for the example above: 5.9 µs vs 20.5 µs). However, this shouldn't be a reason to prefer it over shlex.
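A quick check that shlex.split gives the same result as split_arg_string on the example above (in POSIX mode, the default, a backslash inside double quotes escapes the quote):

```python
import shlex

# The same input used for split_arg_string above.
sample = '"this is a test" 1 2 "1 \\" 2"'

print(shlex.split(sample))  # ['this is a test', '1', '2', '1 " 2']
```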

Related

For some reason sys.argv() is not accepting command lines. For example, I type hello on the command line, but sys.argv only shows the filename [duplicate]

How do I read from stdin? Some code golf challenges require using stdin for input.

Use the fileinput module:
import fileinput
for line in fileinput.input():
    pass
fileinput will loop through all the lines in the input specified as file names given in command-line arguments, or the standard input if no arguments are provided.
Note: line will contain a trailing newline; to remove it use line.rstrip().
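A minimal sketch that also reports where each line came from, using the filename() and filelineno() methods of the object fileinput.input() returns. The helper name read_lines is hypothetical; passing an explicit list of paths instead of relying on sys.argv just makes it easy to try out:

```python
import fileinput

def read_lines(paths=None):
    """Collect (filename, line number within that file, text) for each line.

    With paths=None, fileinput uses sys.argv[1:], or stdin if that is empty.
    """
    rows = []
    # The object returned by fileinput.input() works as a context manager.
    with fileinput.input(files=paths) as stream:
        for line in stream:
            # Each line keeps its trailing newline, so strip it here.
            rows.append((stream.filename(), stream.filelineno(), line.rstrip('\n')))
    return rows
```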

There are a few ways to do it.
sys.stdin is a file-like object on which you can call read() to read everything, or readlines() to read everything and split it into lines automatically. (You need to import sys for this to work.)
If you want to prompt the user for input, you can use raw_input in Python 2.X, and just input in Python 3.
If you actually just want to read command-line options, you can access them via the sys.argv list.
You will probably find this Wikibook article on I/O in Python to be a useful reference as well.

import sys
for line in sys.stdin:
    print(line)
Note that this will include a newline character at the end. To remove the newline at the end, use line.rstrip() as @brittohalloran said.

Python also has built-in functions input() and raw_input(). See the Python documentation under Built-in Functions.
For example,
name = raw_input("Enter your name: ") # Python 2.x
or
name = input("Enter your name: ") # Python 3

Here's an example from Learning Python (updated to Python 3's print function):
import sys
data = sys.stdin.readlines()
print("Counted", len(data), "lines.")
On Unix, you could test it by doing something like:
% cat countlines.py | python countlines.py
Counted 3 lines.
On Windows or DOS, you'd do:
C:\> type countlines.py | python countlines.py
Counted 3 lines.
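A minimal sketch of the same count written as a function that takes the stream as a parameter, so it can be exercised with an in-memory io.StringIO instead of a pipe. The name count_lines is hypothetical:

```python
import io
import sys

def count_lines(stream=None):
    """Count the lines in a text stream; defaults to sys.stdin."""
    if stream is None:
        stream = sys.stdin
    # Iterating counts lines one at a time instead of building a list with readlines().
    return sum(1 for _ in stream)

# io.StringIO stands in for piped input; a script would call count_lines().
print("Counted", count_lines(io.StringIO("a\nb\nc\n")), "lines.")  # Counted 3 lines.
```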

How do you read from stdin in Python?
I'm trying to do some of the code golf challenges, but they all require the input to be taken from stdin. How do I get that in Python?
You can use:
sys.stdin - A file-like object - call sys.stdin.read() to read everything.
input(prompt) - pass it an optional prompt to output, it reads from stdin up to the first newline, which it strips. You'd have to do this repeatedly to get more lines, at the end of the input it raises EOFError. (Probably not great for golfing.) In Python 2, this is raw_input(prompt).
open(0).read() - In Python 3, the builtin function open accepts file descriptors (integers representing operating system IO resources), and 0 is the descriptor of stdin. It returns a file-like object like sys.stdin - probably your best bet for golfing. In Python 2, this is io.open.
open('/dev/stdin').read() - similar to open(0), works on Python 2 and 3, but not on Windows (or even Cygwin).
fileinput.input() - returns an iterator over lines in all files listed in sys.argv[1:], or stdin if not given. Use like ''.join(fileinput.input()).
Both sys and fileinput must be imported, respectively, of course.
Quick sys.stdin examples compatible with Python 2 and 3, Windows, Unix
You just need to read from sys.stdin, for example, if you pipe data to stdin:
$ echo foo | python -c "import sys; print(sys.stdin.read())"
foo
We can see that sys.stdin is in default text mode:
>>> import sys
>>> sys.stdin
<_io.TextIOWrapper name='<stdin>' mode='r' encoding='UTF-8'>
file example
Say you have a file, inputs.txt, we can accept that file and write it back out:
python -c "import sys; sys.stdout.write(sys.stdin.read())" < inputs.txt
Longer answer
Here's a complete, easily replicable demo, using two methods, the builtin function, input (use raw_input in Python 2), and sys.stdin. The data is unmodified, so the processing is a non-operation.
To begin with, let's create a file for inputs:
$ python -c "print('foo\nbar\nbaz')" > inputs.txt
And using the code we've already seen, we can check that we've created the file:
$ python -c "import sys; sys.stdout.write(sys.stdin.read())" < inputs.txt
foo
bar
baz
Here's the help on sys.stdin.read from Python 3:
read(size=-1, /) method of _io.TextIOWrapper instance
Read at most n characters from stream.
Read from underlying buffer until we have n characters or we hit EOF.
If n is negative or omitted, read until EOF.
Builtin function, input (raw_input in Python 2)
The builtin function input reads from standard input up to a newline, which is stripped (complementing print, which adds a newline by default.) This occurs until it gets EOF (End Of File), at which point it raises EOFError.
Thus, here's how you can use input in Python 3 (or raw_input in Python 2) to read from stdin - so we create a Python module we call stdindemo.py:
$ python -c "print('try:\n    while True:\n        print(input())\nexcept EOFError:\n    pass')" > stdindemo.py
And let's print it back out to ensure it's as we expect:
$ python -c "import sys; sys.stdout.write(sys.stdin.read())" < stdindemo.py
try:
    while True:
        print(input())
except EOFError:
    pass
Again, input reads up until the newline and essentially strips it from the line. print adds a newline. So while they both modify the input, their modifications cancel. (So they are essentially each other's complement.)
And when input gets the end-of-file character, it raises EOFError, which we ignore and then exit from the program.
And on Linux/Unix, we can pipe from cat:
$ cat inputs.txt | python -m stdindemo
foo
bar
baz
Or we can just redirect the file from stdin:
$ python -m stdindemo < inputs.txt
foo
bar
baz
We can also execute the module as a script:
$ python stdindemo.py < inputs.txt
foo
bar
baz
Here's the help on the builtin input from Python 3:
input(prompt=None, /)
Read a string from standard input. The trailing newline is stripped.
The prompt string, if given, is printed to standard output without a
trailing newline before reading input.
If the user hits EOF (*nix: Ctrl-D, Windows: Ctrl-Z+Return), raise EOFError.
On *nix systems, readline is used if available.
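The stdindemo.py loop can also be wrapped in a generator so the rest of a script simply iterates over it. This is a sketch: the name input_lines is hypothetical, and the demo temporarily swaps sys.stdin for an io.StringIO, which input() reads from when stdin is not an interactive terminal:

```python
import io
import sys

def input_lines():
    """Yield lines from stdin via input(), stopping cleanly at end of file."""
    while True:
        try:
            # input() strips the trailing newline and raises EOFError at EOF.
            yield input()
        except EOFError:
            return

# Demo: replace sys.stdin with an in-memory stream, then restore it.
saved = sys.stdin
sys.stdin = io.StringIO("foo\nbar\nbaz\n")
try:
    lines = list(input_lines())
finally:
    sys.stdin = saved
print(lines)  # ['foo', 'bar', 'baz']
```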
sys.stdin
Here we make a demo script using sys.stdin. The efficient way to iterate over a file-like object is to use the file-like object as an iterator. The complementary method to write to stdout from this input is to simply use sys.stdout.write:
$ python -c "print('import sys\nfor line in sys.stdin:\n    sys.stdout.write(line)')" > stdindemo2.py
Print it back out to make sure it looks right:
$ python -c "import sys; sys.stdout.write(sys.stdin.read())" < stdindemo2.py
import sys
for line in sys.stdin:
    sys.stdout.write(line)
And redirecting the inputs into the file:
$ python -m stdindemo2 < inputs.txt
foo
bar
baz
Golfed into a command:
$ python -c "import sys; sys.stdout.write(sys.stdin.read())" < inputs.txt
foo
bar
baz
File Descriptors for Golfing
Since the file descriptors for stdin and stdout are 0 and 1 respectively, we can also pass those to open in Python 3 (not 2, and note that we still need the 'w' for writing to stdout).
If this works on your system, it will shave off more characters.
$ python -c "open(1,'w').write(open(0).read())" < inputs.txt
foo
bar
baz
Python 2's io.open does this as well, but the import takes a lot more space:
$ python -c "from io import open; open(1,'w').write(open(0).read())" < inputs.txt
foo
bar
baz
Addressing other comments and answers
One comment suggests ''.join(sys.stdin) for golfing but that's actually longer than sys.stdin.read() - plus Python must create an extra list in memory (that's how str.join works when not given a list) - for contrast:
''.join(sys.stdin)
sys.stdin.read()
The top answer suggests:
import fileinput
for line in fileinput.input():
    pass
But, since sys.stdin implements the file API, including the iterator protocol, that's just the same as this:
import sys
for line in sys.stdin:
    pass
Another answer does suggest this. Just remember that if you do it in an interpreter, you'll need to do Ctrl-d if you're on Linux or Mac, or Ctrl-z on Windows (after Enter) to send the end-of-file character to the process. Also, that answer suggests print(line) - which adds a '\n' to the end - use print(line, end='') instead (if in Python 2, you'll need from __future__ import print_function).
The real use-case for fileinput is for reading in a series of files.

The answer proposed by others:
for line in sys.stdin:
    print(line)
is very simple and Pythonic, but it must be noted that the script will wait until EOF before starting to iterate over the lines of input.
This means that tail -f error_log | myscript.py will not process lines as expected.
The correct script for such a use case would be:
import sys
while True:
    try:
        line = sys.stdin.readline()
    except KeyboardInterrupt:
        break
    if not line:
        break
    print(line, end='')
UPDATE
As clarified in the comments, this only happens on Python 2, where iterating over a file uses a read-ahead buffer, so you end up waiting for that buffer to fill (or for EOF) before print is called.
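The readline() loop above is often written with the two-argument form of iter(), which keeps calling readline() until it returns '' at end of file. A minimal sketch, with a hypothetical echo_lines helper that takes its streams as parameters (KeyboardInterrupt handling is left out for brevity):

```python
import io
import sys

def echo_lines(stream=None, out=None):
    """Copy lines from stream to out one readline() call at a time."""
    stream = sys.stdin if stream is None else stream
    out = sys.stdout if out is None else out
    # iter(callable, sentinel) calls stream.readline() until it returns ''
    # (end of file), handing back each line as soon as it has been read.
    for line in iter(stream.readline, ''):
        out.write(line)
        out.flush()  # push each line out immediately, e.g. when piped from tail -f

# io.StringIO objects stand in for stdin/stdout in this demo.
buf = io.StringIO()
echo_lines(io.StringIO("first\nsecond\n"), buf)
print(buf.getvalue(), end="")
```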

This will echo standard input to standard output:
import sys
line = sys.stdin.readline()
while line:
    sys.stdout.write(line)
    line = sys.stdin.readline()

Building on all the answers using sys.stdin, you can also do something like the following to read from an argument file if at least one argument exists, and fall back to stdin otherwise:
import sys
f = open(sys.argv[1]) if len(sys.argv) > 1 else sys.stdin
for line in f:
    pass  # Do your stuff
and use it as either
$ python do-my-stuff.py infile.txt
or
$ cat infile.txt | python do-my-stuff.py
or even
$ python do-my-stuff.py < infile.txt
That would make your Python script behave like many GNU/Unix programs such as cat, grep and sed.
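The snippet above never closes the file it opens. One way to get a with-statement for both cases is contextlib.nullcontext (Python 3.7+), which hands back sys.stdin without closing it. A sketch under those assumptions; open_input and the word-counting count_words are hypothetical helpers:

```python
import sys
from contextlib import nullcontext

def open_input(argv=None):
    """Return a context manager for argv[1] if given, otherwise for sys.stdin."""
    argv = sys.argv if argv is None else argv
    if len(argv) > 1:
        return open(argv[1])
    # nullcontext yields sys.stdin and does nothing on exit, so stdin stays open.
    return nullcontext(sys.stdin)

def count_words(argv=None):
    """Hypothetical example task: count whitespace-separated words."""
    with open_input(argv) as f:
        return sum(len(line.split()) for line in f)
```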

argparse is an easy solution
Example compatible with both Python versions 2 and 3:
#!/usr/bin/python
import argparse
import sys
parser = argparse.ArgumentParser()
parser.add_argument('infile',
                    default=sys.stdin,
                    type=argparse.FileType('r'),
                    nargs='?')
args = parser.parse_args()
data = args.infile.read()
You can run this script in many ways:
1. Using stdin
echo 'foo bar' | ./above-script.py
or, more briefly, with a here string instead of echo:
./above-script.py <<< 'foo bar'
2. Using a filename argument
echo 'foo bar' > my-file.data
./above-script.py my-file.data
3. Using stdin through the special filename -
echo 'foo bar' | ./above-script.py -

The following snippet will help you (it reads all of stdin, blocking until EOF, into one string):
import sys
input_str = sys.stdin.read()
print(input_str.split())

I am pretty amazed no one has mentioned this hack so far:
python -c "import sys; set(map(sys.stdout.write,sys.stdin))"
In Python 2 you can drop the set() call, but it works either way. In Python 3, map is lazy, so something (here set()) has to consume it before anything is written.

Try this:
import sys
print(sys.stdin.read().upper())
and check it with:
$ echo "Hello World" | python myFile.py

You can read from stdin and then store inputs into "data" as follows:
import sys
data = ""
for line in sys.stdin:
    data += line

Read from sys.stdin, but to read binary data on Windows you need to be extra careful, because sys.stdin there is opened in text mode and it will corrupt the data by replacing \r\n with \n.
The solution is to set the mode to binary if Windows + Python 2 is detected, and on Python 3 to use sys.stdin.buffer.
import sys
PY3K = sys.version_info >= (3, 0)
if PY3K:
    source = sys.stdin.buffer
else:
    # Python 2 on Windows opens sys.stdin in text mode, and
    # binary data read from it is corrupted at \r\n
    if sys.platform == "win32":
        # set sys.stdin to binary mode
        import os, msvcrt
        msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
    source = sys.stdin
b = source.read()
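On Python 3 the PY3K branch above reduces to using the .buffer attributes. A minimal sketch of a binary stdin-to-stdout copy; copy_binary is a hypothetical helper, and io.BytesIO objects stand in for the real streams in the demo:

```python
import io
import shutil
import sys

def copy_binary(src=None, dst=None):
    """Copy raw bytes from src to dst; defaults to the binary stdin/stdout buffers.

    sys.stdin.buffer and sys.stdout.buffer are binary streams, so CRLF
    sequences and NUL bytes pass through without newline translation.
    """
    src = sys.stdin.buffer if src is None else src
    dst = sys.stdout.buffer if dst is None else dst
    shutil.copyfileobj(src, dst)  # copies in chunks rather than one big read()

out = io.BytesIO()
copy_binary(io.BytesIO(b"a\r\nb\x00c"), out)
print(out.getvalue())  # b'a\r\nb\x00c'
```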

I use the following function; it returns a string from stdin (I use it for JSON parsing).
It works with a pipe and with a prompt on Windows (not tested on Linux yet).
When prompting, two consecutive blank lines indicate the end of input.
import sys
def get_from_stdin():
    lb = 0  # number of consecutive blank lines seen
    stdin = ''
    for line in sys.stdin:
        if line == "\n":
            lb += 1
            if lb == 2:
                break
        else:
            lb = 0
        stdin += line
    return stdin

For Python 3 that would be:
#!/usr/bin/env python3
# Filename e.g. cat.py
import sys
for line in sys.stdin:
    print(line, end="")
This is basically a simple form of cat(1): end="" stops print from adding a second newline after each line. After marking the file executable with chmod +x cat.py, you can use it like this:
echo Hello | ./cat.py

Since Python 3.8 you can use an assignment expression:
while (line := input()):
print(line)
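Note that the loop above stops at the first empty line (input() returns '', which is falsy) and raises EOFError at the end of input. A sketch of an assignment-expression loop that avoids both, using readline() instead; read_until_eof is a hypothetical helper:

```python
import io
import sys

def read_until_eof(stream=None):
    """Read lines with an assignment expression (Python 3.8+).

    readline() keeps the trailing newline, so a blank line is '\\n' (truthy)
    and only end of file yields '' (falsy), which ends the loop.
    """
    stream = sys.stdin if stream is None else stream
    lines = []
    while (line := stream.readline()):
        lines.append(line.rstrip('\n'))
    return lines

print(read_until_eof(io.StringIO("a\n\nb\n")))  # ['a', '', 'b']
```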

The problem I have with solution
import sys
for line in sys.stdin:
print(line)
is that if you don't pass any data to stdin, it will block forever. That's why I love this answer: check whether there is data on stdin first, and only then read it. This is what I ended up doing (on Windows, select.select only supports sockets, so this is Unix-only):
import sys
import select
# select(files to read from, files to write to, exceptional conditions, timeout)
# timeout=0.0 is essential b/c we want to know the answer right away
if select.select([sys.stdin], [], [], 0.0)[0]:
    help_file_fragment = sys.stdin.read()
else:
    print("No data passed to stdin", file=sys.stderr)
    sys.exit(2)

When using -c, a tricky (and in some cases more flexible) alternative to reading stdin is to pass the output of a shell command as an argument to your python command, using command substitution: put the shell command inside $( ... ) and wrap that in double quotes.
e.g.
python3 -c "import sys; print(len(sys.argv[1].split('\n')))" "$(cat ~/.goldendict/history)"
This will count the number of lines from goldendict's history file.

I had some issues getting this to work when reading from a socket piped to the script: once the socket was closed, readline() kept returning an empty string in a busy loop. So this is my solution (tested only on Linux, but I hope it works on other systems):
import sys
# Text-mode stdin translates line endings to '\n' on every platform,
# so compare against '\n' rather than os.linesep ('\r\n' on Windows).
sep = '\n'
while sep == '\n':
    data = sys.stdin.readline()
    sep = data[-1:]  # '' at EOF (socket closed), which ends the loop
    print('> "%s"' % data.strip())
So if you start listening on a socket it will work properly (e.g. in bash):
while :; do nc -l 12345 | python test.py ; done
And you can call it with telnet or just point a browser to localhost:12345

Regarding this:
for line in sys.stdin:
I just tried it on python 2.7 (following someone else's suggestion) for a very large file, and I don't recommend it, precisely for the reasons mentioned above (nothing happens for a long time).
I ended up with a slightly more pythonic solution (and it works on bigger files):
import sys
with open(sys.argv[1], 'r') as f:
    for line in f:
        print(line, end='')  # process each line here
Then I can run the script locally as:
python myscript.py input.txt  # sys.argv[1] must be the name of the file to read

There is
os.read(0, x)
which reads up to x bytes from file descriptor 0, i.e. stdin, and returns them as bytes. This is an unbuffered read, lower-level than sys.stdin.read().
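os.read may return fewer bytes than requested and returns b'' only at end of file, so reading everything means looping. A minimal sketch; read_fd is a hypothetical helper, and any readable file descriptor can stand in for 0:

```python
import os

def read_fd(fd=0, chunk_size=65536):
    """Read all remaining bytes from a file descriptor (0 is stdin)."""
    chunks = []
    while True:
        # Returns at most chunk_size bytes; b'' means end of file.
        chunk = os.read(fd, chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b''.join(chunks)
```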

Nonblocking, bytemode, stdin -> stdout:
# pipe.py
import os, sys, time
os.set_blocking(0, False)
sys.stdin = os.fdopen(0, 'rb', 0)
sys.stdout = os.fdopen(1, 'wb', 0)
while 1:
    time.sleep(.1)
    try: out = sys.stdin.read()
    except:
        sys.stdout.write(b"E")
        continue
    if out is None:
        sys.stdout.write(b"N")
        continue
    if not out:
        sys.stdout.write(b"_")
        break
    # working..
    out = b"<" + out + b">"
    sys.stdout.write(out)
sys.stdout.write(b".\n")
Usage:
$ for i in 1 2 3; do sleep 1; printf "===$i==="; done | python3 pipe.py
NNNNNNNNN<===1===>NNNNNNNNN<===2===>NNNNNNNNN<===3===>_.
Minimal code:
import os, sys
os.set_blocking(0, False)
fd0 = os.fdopen(0, 'rb', 0)
fd1 = os.fdopen(1, 'wb', 0)
while 1:
    bl = fd0.read()
    if bl is None: continue
    if not bl: break
    fd1.write(bl)
Tested on Linux, Python 3.9.2

Worth noting: for short command-line one-liners, input() is preferable to fileinput and sys.stdin, as it requires no import and is shorter to type (it reads only the first line, though):
$ echo hello word | python3 -c "print(input().upper())"
HELLO WORD

Whitespace in string escape in batch file

I am unable to run my batch command because of a character-escaping issue.
Input from python:
import subprocess
print(data) ==> --i "test - testing"
subprocess.call(["c:/foo/boo/file.bat", data])
Batch file:
SET #tt=%1
Output:
C:\foo\boo>SET #tt=" --i \"test
Expected:
C:\foo\boo>SET #tt=--i "test - testing"
Is there a way to escape the whitespace so the actual input is passed to the batch file? Kindly suggest.

The proper quoting and escaping of commands can be tricky. Python's shlex module has a function to help make this easier, shlex.split: "Split the string s using shell-like syntax."
Documentation: shlex.split
I don't have a way to test your code. Here is an example of what I think you're trying to achieve.
import shlex
import subprocess
command = 'c:/foo/boo/file.bat --i "test - testing"'
split_command = shlex.split(command)
print(split_command) # shlex.split handles all the proper escaping
subprocess.call(split_command)
OUTPUT from print: ['c:/foo/boo/file.bat', '--i', 'test - testing']
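To see why the batch file received escaped quotes, subprocess.list2cmdline (the helper subprocess uses on Windows to build the command line from a list; as far as I know it is not documented as public API) can be called directly. Passing the whole data string as one list element makes it a single quoted argument with escaped inner quotes; splitting it first yields separate arguments. With the split version, the batch file would need %* (all arguments) rather than %1 to see the whole string:

```python
import shlex
import subprocess

data = '--i "test - testing"'

# Whole string as ONE argument: quoted as a unit, inner quotes escaped.
print(subprocess.list2cmdline(["c:/foo/boo/file.bat", data]))
# c:/foo/boo/file.bat "--i \"test - testing\""

# Split first: the batch file receives --i and "test - testing" separately.
print(subprocess.list2cmdline(["c:/foo/boo/file.bat"] + shlex.split(data)))
# c:/foo/boo/file.bat --i "test - testing"
```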

How do I input strings in Linux terminal that points to file path using subprocess.call command?

I'm using Ubuntu and have been trying to do a simple automation that requires me to input the [name of website] and the [file path] onto a list of command lines. I'm using subprocess and call function. I tried something simpler first using the "ls" command.
from subprocess import call
text = raw_input("> ")
("ls", "%s") % (text)
This returned "bufsize must be an integer". I tried to find out what that meant, and apparently I had to pass the command as a list. So I tried doing that in the main thing I'm trying to code.
from subprocess import call
file_path = raw_input("> ")
site_name = raw_input("> ")
call("thug", -FZM -W "%s" -n "%s") % (site_name, file_path)
This raised an invalid syntax error at the first "%s". Can anyone point me in the right direction?

You cannot use % on a tuple.
("ls", "%s") % text # Broken
You probably mean
("ls", "%s" % text)
But just "%s" % string is obviously going to return simply string, so there is no need to use formatting here.
("ls", text)
This still does nothing useful; did you forget the call?
You also cannot have unquoted strings in the argument to call.
call("thug", -FZM -W "%s" -n "%s") % (site_name, file_path) # broken
needs to have -FZM and -W quoted, and again, if you use format strings, the formatting needs to happen adjacent to the format string.
call(["thug", "-FZM", "-W", site_name, "-n", file_path])
Notice also how the first argument to call() is either a proper list, or a long single string (in which case you need shell=True, which you want to avoid if you can).
If you are writing new scripts, you most definitely should be thinking seriously about targeting Python 3 (in which case you want to switch to subprocess.run() and input() instead of raw_input() too). Python 2 reached its end of life on January 1, 2020; the date had already been pushed back from 2015 because Py3k adoption was slow. Adoption is no longer slow, and you want to be on Python 3: that's where the future is.
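A minimal Python 3 sketch of the list-based call with subprocess.run. To keep it runnable anywhere, a child Python process that just prints its arguments stands in for thug, and the site_name/file_path values are hypothetical:

```python
import subprocess
import sys

# Hypothetical stand-ins for the values read with input(); note the space.
site_name = "example.com"
file_path = "/tmp/my report.txt"

# A child process that echoes its argv, standing in for "thug".
child = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

# Each list element reaches the child as exactly one argument, spaces included;
# no shell is involved, so no quoting is needed.
result = subprocess.run(child + ["-FZM", "-W", site_name, "-n", file_path],
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())  # ['-FZM', '-W', 'example.com', '-n', '/tmp/my report.txt']
```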

Here is a complete example of how you would call an executable Python file with subprocess.call, using argparse to parse the input properly.
Your python file to be called (sandboxArgParse.py):
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--filePath", help="Just A test", dest='filePath')
parser.add_argument("--siteName", help="Just A test", dest='siteName')
args = parser.parse_args()
print(args.siteName)
print(args.filePath)
Your calling python file:
from subprocess import call
call(["python","/users/dev/python/sandboxArgParse.py", "--filePath", "abcd.txt", "--siteName", "www.google.com"])

passing sentence as a argument in subprocess command

I am executing python script inside another script and want to pass two arguments to it
lines = [line.strip('\n') for line in open('que.txt')]
for l in lines:
    print('my sentence : ')
    print(l)
    #os.system("find_entity.py")  # this also does not work
    subprocess.call(" python find_entity.py l 1", shell=True)  # this works, but l is not treated as the sentence that was read
what is the correct approach?
update:
lines = [line.strip('\n') for line in open('q0.txt')]
for line_num, line in enumerate(lines):
    cmd = ["python", "find_entity.py", line]
    subprocess.call(cmd, shell=True)
Then it just drops me into the interactive Python interpreter.

You can use one of string substitution mechanics:
C-style string formatting
in your case it would look like
subprocess.call("python find_entity.py %s %d" % (line, line_num))
C#-style string formatting
subprocess.call("python find_entity.py {} {}".format(line, line_num))
or templates
Or, in case with subprocess library you should pass arguments as list to call function:
subprocess.call(["python", "find_entity.py", line, str(line_num)])
Note that line and line_num are passed without any extra quoting: each list element reaches the child process as exactly one argument.
This solution is recommended because the code is cleaner and more obvious, and the arguments are processed correctly (for example, whitespace inside an argument needs no escaping).
However, if you want to use the shell=True flag of subprocess.call, the list-of-arguments solution will not work as intended on POSIX; use one of the string-substitution solutions instead. By the way, subprocess and os provide the powerful shell features themselves, such as piping between processes and expanding the user's home directory (~) with os.path.expanduser. So if you are writing a big, complicated script, you should use the Python libraries instead of shell=True.
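If you do build a command string for shell=True, a sentence containing spaces, quotes or characters like ; has to be quoted for the shell. A minimal sketch with shlex.quote, which targets POSIX shells such as sh and bash (cmd.exe on Windows uses different rules); the sentence is hypothetical:

```python
import shlex

# Hypothetical sentence with a space, apostrophes and a shell metacharacter.
line = "It's 5 o'clock; see you"
line_num = 3

# shlex.quote wraps the value so a POSIX shell treats it as one literal word.
cmd = "python find_entity.py {} {}".format(shlex.quote(line), line_num)
print(cmd)

# Splitting the string back with POSIX rules recovers the intended arguments.
print(shlex.split(cmd))  # ['python', 'find_entity.py', "It's 5 o'clock; see you", '3']
```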

If you already have the command name and its arguments in separate variables, or already in a list, you almost never want to use shell=True. (It's not illegal, but on POSIX systems the first list item becomes the shell command and the remaining items become arguments to the shell itself, which is generally not what is wanted.)
cmd = ["python", "find_entity.py", line]
subprocess.call(cmd)

you need the contents of variable l (I renamed it to line), not the string literal "l"
for line_num, line in enumerate(lines):
    cmd = ["python",
           "find_entity.py",
           line,
           str(line_num)]
    subprocess.call(cmd)  # no shell=True: the list items go straight to python as arguments

Passing empty string to argparse

I'm using argparse (Python 3.2). A parameter mode is defined simply as:
p.add_argument('--mode', dest='mode')
I want to call the program from the command line in such a way that parameter mode is set to an empty string; or at least to a single space ' ' (I can strip whitespace later).
I tried to use (under Windows) program_name --mode "" and program_name --mode " ", but neither worked.

This seems to work for me under OS-X:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--mode')
p = parser.parse_args()
print(p)
and then:
python test.py --mode=
I don't have a windows machine, so I don't know anything about how those operate...
Of course, the other forms you mentioned above would work as well on OS X. The advantage of --mode= is that it doesn't rely on how the shell splits arguments enclosed in quotes, which I would think makes it work in more places.
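Passing an explicit list to parse_args bypasses the shell entirely, which is a quick way to confirm what argparse itself does with empty or blank values (if the real command line still loses the value, the shell or launcher is dropping it before Python sees it):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--mode')

print(parser.parse_args(['--mode=']))      # Namespace(mode='')
print(parser.parse_args(['--mode', '']))   # Namespace(mode='')
print(parser.parse_args(['--mode', ' ']))  # Namespace(mode=' ')
```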
