Whitespace in string escape in batch file - python

I am unable to run my batch command because of a character-escaping issue.
Input from Python:
import subprocess
print(data)  # ==> --i "test - testing"
subprocess.call(["c:/foo/boo/file.bat", data])
Batch file:
SET #tt=%1
Output:
C:\foo\boo>SET #tt=" --i \"test
Expected:
C:\foo\boo>SET #tt=--i "test - testing"
Is there a way to escape the whitespace so that the actual input is passed through to the batch file? Kindly suggest.

Proper quoting and escaping of commands can be tricky. Python's standard library has a function to make this easier, shlex.split: "Split the string s using shell-like syntax."
Documentation: shlex.split
I don't have a way to test your code. Here is an example of what I think you're trying to achieve.
import shlex
import subprocess
command = 'c:/foo/boo/file.bat --i "test - testing"'
split_command = shlex.split(command)
print(split_command) # shlex.split handles all the proper escaping
subprocess.call(split_command)
OUTPUT from print: ['c:/foo/boo/file.bat', '--i', 'test - testing']
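As an aside (not part of the original answer), subprocess.list2cmdline shows the command line that subprocess builds on Windows, which is where the escaped quotes come from; a small illustrative sketch reusing the paths from the question:

import subprocess

data = '--i "test - testing"'
# Arguments containing spaces get wrapped in quotes, and embedded double
# quotes get backslash-escaped, which is why the batch file sees \" sequences.
print(subprocess.list2cmdline(["c:/foo/boo/file.bat", data]))
# c:/foo/boo/file.bat "--i \"test - testing\""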

Un-escape spaces with Python pathlib [duplicate]

I have command line arguments in a string and I need to split it to feed to argparse.ArgumentParser.parse_args.
I see that the documentation uses string.split() liberally. However, in complex cases this does not work, such as
--foo "spaces in brakets" --bar escaped\ spaces
Is there functionality to do that in Python?
(A similar question for Java was asked here.)
This is what shlex.split was created for.
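As a quick illustration (not from the original answer), shlex.split handles both the quoted and the backslash-escaped spaces from the example above:

import shlex

# POSIX shell rules: double quotes group words, and a backslash escapes the space
print(shlex.split(r'--foo "spaces in brakets" --bar escaped\ spaces'))
# ['--foo', 'spaces in brakets', '--bar', 'escaped spaces']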
If you're parsing a Windows-style command line, then shlex.split doesn't work correctly: calling subprocess functions on the result will not have the same behavior as passing the string directly to the shell.
In that case, the most reliable way to split a string like the command-line arguments to Python is... to pass command-line arguments to Python:
import os
import sys
import subprocess
import shlex
import json  # json is an easy way to send arbitrary ascii-safe lists of strings out of python

def shell_split(cmd):
    """
    Like `shlex.split`, but uses the Windows splitting syntax when run on Windows.
    On Windows, this is the inverse of subprocess.list2cmdline
    """
    if os.name == 'posix':
        return shlex.split(cmd)
    else:
        # TODO: write a version of this that doesn't invoke a subprocess
        if not cmd:
            return []
        full_cmd = '{} {}'.format(
            subprocess.list2cmdline([
                sys.executable, '-c',
                'import sys, json; print(json.dumps(sys.argv[1:]))'
            ]), cmd
        )
        ret = subprocess.check_output(full_cmd).decode()
        return json.loads(ret)
One example of how these differ:
# windows does not treat all backslashes as escapes
>>> shell_split(r'C:\Users\me\some_file.txt "file with spaces"')
['C:\\Users\\me\\some_file.txt', 'file with spaces']
# posix does
>>> shlex.split(r'C:\Users\me\some_file.txt "file with spaces"')
['C:Usersmesome_file.txt', 'file with spaces']
# non-posix does not mean Windows - this produces extra quotes
>>> shlex.split(r'C:\Users\me\some_file.txt "file with spaces"', posix=False)
['C:\\Users\\me\\some_file.txt', '"file with spaces"']
You could use the split_arg_string helper function from the click package:
import re

def split_arg_string(string):
    """Given an argument string this attempts to split it into small parts."""
    rv = []
    for match in re.finditer(r"('([^'\\]*(?:\\.[^'\\]*)*)'"
                             r'|"([^"\\]*(?:\\.[^"\\]*)*)"'
                             r'|\S+)\s*', string, re.S):
        arg = match.group().strip()
        if arg[:1] == arg[-1:] and arg[:1] in '"\'':
            arg = arg[1:-1].encode('ascii', 'backslashreplace') \
                .decode('unicode-escape')
            try:
                arg = type(string)(arg)
            except UnicodeError:
                pass
        rv.append(arg)
    return rv
For example:
>>> split_arg_string('"this is a test" 1 2 "1 \\" 2"')
['this is a test', '1', '2', '1 " 2']
The click package is becoming the dominant choice for command-argument parsing, but I don't think it supports parsing arguments from a string (only from argv). The helper function above is used only for bash completion.
Edit: I can only recommend using shlex.split(), as suggested in the answer by @ShadowRanger. The only reason I'm not deleting this answer is that it splits a little faster than the full-blown pure-Python tokenizer used in shlex (around 3.5x faster for the example above, 5.9 µs vs 20.5 µs). However, that shouldn't be a reason to prefer it over shlex.
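For anyone curious how such a timing comparison might be made, here is a rough micro-benchmark sketch (not from the original answer). It assumes the split_arg_string function defined above is in scope, and the absolute numbers will vary by machine and Python version:

import shlex
import timeit

sample = '"this is a test" 1 2 "1 \\" 2"'

# Time 10,000 iterations of each splitter on the same input string.
print(timeit.timeit(lambda: split_arg_string(sample), number=10_000))
print(timeit.timeit(lambda: shlex.split(sample), number=10_000))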

How to enter a long and not-a-string data into the argument?

I have a problem with Python and need your help.
Take a look at this code:
import os
os.chdir('''C:\\Users\\Admin\\Desktop\\Automate_the_Boring_Stuff_onlimematerials_v.2\\automate_online-materials\\example.xlsx''')
The os.chdir() did not work because the path I put between the ''' and ''' is treated as a raw string. Note that a line can be no longer than 125 characters, so I have to break it onto a new line.
So how can I fix this?
You can split your statement into multiple lines by using the backslash \ to indicate that a statement is continued on the new line.
message = 'This message will not generate an error because \
it was split by using the backslash on your \
keyboard'
print(message)
Output
This message will not generate an error because it was split by using the backslash on your keyboard
Lines can be longer than 125 characters, but you should probably avoid that. You have a few solutions:
x = ('hi'
     'there')
# x is now the string "hithere"

os.chdir('hi'
         'there')  # does a similar thing (os.chdir('hithere'))
You could also set up a variable:
root_path = "C:\\Users\\Admin\\Desktop"
filepath = "other\\directories" # why not just rename it though
os.chdir(os.path.join(root_path, filepath))
Do these work for you?
I'm also curious why you have to chdir there; if possible, you should just run the Python script from that directory.
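Putting those suggestions together, here is a rough sketch using the path from the question; raw strings plus implicit concatenation keep each source line short. Note that os.chdir expects a directory, so only the folder part of the original path is used here:

import os

folder = (r"C:\Users\Admin\Desktop"
          r"\Automate_the_Boring_Stuff_onlimematerials_v.2"
          r"\automate_online-materials")
os.chdir(folder)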

Concatenating strings retrieved by regex from a subprocess STDERR results in disorder

I have an audio file, Sample.flac. The title and length can be read with ffprobe, which sends its output to STDERR.
I want to run ffprobe through subprocess, and have done so successfully. I then retrieve the output (piped to subprocess.PIPE) with *.communicate()[1].decode(), as the Python docs indicate I should.
communicate() returns a tuple, (stdout, stderr), with the output from the Popen() object. The proper index for stderr is then accessed and decoded from a byte string into a Python 3 UTF-8 string.
This decoded output is then parsed with a multiline regex pattern matching the format of the ffprobe metadata output. The match groups are then placed appropriately into a dictionary, with each first group converted to lowercase, and used as the key for the second group (value).
Here is an example of the output and the working regex.
The data can be accessed through the dictionary keys as expected. But upon concatenating the values together (all are strings), the output appears mangled.
This is the output I would expect:
Believer (Kaskade Remix) 190
Instead, this is what I get:
190ever (Kaskade Remix)
I don't understand why the strings appear to "overlap" each other and result in a mangled form. Can anyone explain this and what I have done wrong?
Below is the complete code that was run to produce the results above. It is a reduced section of my full project.
#! /usr/bin/env python3
# -*- coding: utf-8 -*-

import os
from re import findall, MULTILINE
from subprocess import Popen, PIPE

def media_metadata(file_path):
    """Use FFPROBE to get information about a media file."""
    stderr = Popen(("ffprobe", file_path), shell=True, stderr=PIPE).communicate()[1].decode()
    metadata = {}
    for match in findall(r"(\w+)\s+:\s(.+)$", stderr, MULTILINE):
        metadata[match[0].lower()] = match[1]
    return metadata

if __name__ == "__main__":
    meta = media_metadata("C:/Users/spike/Music/Sample.flac")
    print(meta["title"], meta["length"])
    # The above and below have the same result in the console
    # print(meta["title"] + " " + meta["length"])
    # print("{title} {length}".format(**meta))
Can anyone explain this unpredictable output?
I asked this question here earlier, but I don't think it was very clear. In the raw output when this is run on multiple files, you can see that towards the end the strings become so unpredictable that part of the title value is not printed at all.
Thanks.
You are capturing the "\r" character. When printing, the cursor returns to the beginning of the line, so the next part of the output overwrites the first part. Stripping whitespace (which also removes the trailing "\r") should solve the problem:
metadata[match[0].lower()] = match[1].strip()
Reproduce:
print('Believer (Kaskade Remix)\r 190')
Output:
190ever (Kaskade Remix)
Issue:
The end of line is \r\n. With re.MULTILINE, $ matches just before the \n, so the \r remains in the matching group.
Fix:
Insert \r before $ in your re pattern. i.e. (\w+)\s+:\s(.+)\r$
Or use universal_newlines=True as a Popen argument and remove .decode(), as the output will be text with \n instead of \r\n.
Or stderr = stderr.replace('\r', '') before re processing.
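For illustration, here is a sketch of the second fix applied to the function from the question (universal_newlines=True added, .decode() removed; shell=True is also dropped here since it is not needed when the command is a sequence):

from re import findall, MULTILINE
from subprocess import Popen, PIPE

def media_metadata(file_path):
    """Use ffprobe to get information about a media file (text-mode output)."""
    # With universal_newlines=True, communicate() returns text with plain
    # newline endings, so no carriage returns end up in the regex groups.
    stderr = Popen(("ffprobe", file_path), stderr=PIPE,
                   universal_newlines=True).communicate()[1]
    metadata = {}
    for match in findall(r"(\w+)\s+:\s(.+)$", stderr, MULTILINE):
        metadata[match[0].lower()] = match[1]
    return metadata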
Alternative:
ffprobe can output a JSON string. Use the json module, which loads the string and returns a dictionary, i.e. the command:
['ffprobe', '-show_format', '-of', 'json', file_path]
The JSON string will be on the stdout stream.
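A minimal sketch of that alternative (assuming a reasonably recent ffprobe; the exact layout of the JSON, e.g. whether the title sits under ["format"]["tags"], depends on the ffprobe version and the file):

import json
from subprocess import Popen, PIPE

def media_metadata_json(file_path):
    # -show_format prints container-level metadata; -of json emits it as JSON
    # on stdout, which json.loads turns into a nested dictionary.
    cmd = ["ffprobe", "-show_format", "-of", "json", file_path]
    stdout = Popen(cmd, stdout=PIPE, universal_newlines=True).communicate()[0]
    return json.loads(stdout)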

Xcopy with Python

I'm trying to get xcopy working with python to copy files to a remote system. I am using a very simple test example:
import os
src = "C:\<Username>\Desktop\test2.txt"
dst = "C:\Users\<Username>"
print os.system("xcopy %s %s" % (src, dst))
But for some reason when I run this I get:
Invalid number of parameters
4
Running the xcopy directly from the command line works fine. Any ideas?
Thanks
\t is a tab character. I'd suggest using raw strings for windows paths:
src = r"C:\<Username>\Desktop\test2.txt"
dst = r"C:\Users\<Username>"
This will stop Python from surprising you by interpreting some of your backslashes as escape sequences.
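A quick illustration with a made-up path (not the asker's): \t is a tab and \n a newline in a normal string literal, while a raw string keeps the backslashes as typed.

print("C:\temp\new_file.txt")   # \t and \n are interpreted as tab and newline
print(r"C:\temp\new_file.txt")  # prints C:\temp\new_file.txt unchanged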
In addition to using raw string literals, use the subprocess module instead of os.system - it will take care of quoting your arguments properly if they contain spaces. Thus:
import subprocess
src = r'C:\<Username>\Desktop\test2.txt'
dst = r'C:\Users\<Username>'
subprocess.call(['xcopy', src, dst])
Try prefixing your strings with r. So r"C:\<Username>\Desktop\test2.txt". The problem is that a backslash is treated as a special character within strings.

Using Tshark in Python Subprocess is giving syntax error

I am trying to develop a script to read a pcap file and extract some fields from it, using tshark as a subprocess. However, I am getting a syntax error on the cmd definition. Can anyone help me out with this?
def srcDestDport (filename):
    cmd = r"tshark -o column.format:"Source","%s", "Destination","%d", "dstport"," %uD"' -r %s"%(filename)
    subcmd = cmd.split(' ')
    lines = subprocess.Popen(subcmd, stdout=subprocess.PIPE)
    return lines
As far as Python is concerned, you appear to be missing some commas in your cmd definition:
cmd = r"tshark -o column.format:"Source","%s", "Destination","%d", "dstport"," %uD"' -r %s"%(filename)
# -- no comma here -^ ----^ ----^ --^
because the first string literal ends as soon as the next " is encountered, at "Source"; a raw string does not preclude you from escaping embedded quotes.
If you wanted to produce a list of arguments, just make it a list directly; that saves you from interpolating the filename too:
cmd = ["tshark", "-o",
'column.format:"Source","%s","Destination","%d","dstport"," %uD"',
"-r", filename]
Note the single quotes around the 3rd argument to preserve the quotes in the command line argument.
This eliminates the need to split as well and preserves whitespace in the filename.
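Put together, a sketch of the corrected function using that list (keeping the asker's name srcDestDport and, as in the original, returning the Popen object so its stdout can be read by the caller):

import subprocess

def srcDestDport(filename):
    # Build the argument list directly; no string quoting or splitting needed.
    cmd = ["tshark", "-o",
           'column.format:"Source","%s","Destination","%d","dstport"," %uD"',
           "-r", filename]
    return subprocess.Popen(cmd, stdout=subprocess.PIPE)

Iterating over the returned object's stdout then yields the tshark output one line at a time.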
