There are times that I automagically create small shell scripts from Python, and I want to make sure that the filename arguments do not contain non-escaped special characters. I've rolled my own solution, that I will provide as an answer, but I am almost certain I've seen such a function lost somewhere in the standard library. By “lost” I mean I didn't find it in an obvious module like shlex, cmd or subprocess.
Do you know of such a function in the stdlib? If yes, where is it?
Even a negative (but definite and correct :) answer will be accepted.
pipes.quote():
>>> from pipes import quote
>>> quote("""some'horrible"string\with lots of junk!$$!""")
'"some\'horrible\\"string\\\\with lots of junk!\\$\\$!"'
Although note that it's arguably got a bug where a zero-length arg will return nothing:
>>> quote("")
''
Probably it would be better if it returned '""'.
The function I use is:
def quote_filename(filename):
    # Backslashes are doubled first so the escapes added later are not re-escaped.
    return '"%s"' % (
        filename
        .replace('\\', '\\\\')
        .replace('"', '\\"')
        .replace('$', '\\$')
        .replace('`', '\\`')
    )
that is: I always enclose the filename in double quotes, and then quote the only characters special inside double quotes.
Related
using re.escape() on this directory:
C:\Users\admin\code
Should theoretically return this, right?
C:\\Users\\admin\\code
However, what I actually get is this:
C\:\\Users\\admin\\code
Notice the backslash immediately after C. This makes the string unusable, and trying to use directory.replace('\', '') just bugs out Python because it can't deal with a single backslash string, and treats everything after it as string.
Any ideas?
Update
This was a dumb question :p
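No, it should not. Its help says "Escape all the characters in pattern except ASCII letters, numbers and '_'". (Since Python 3.7, only characters that can have special meaning in a regular expression are escaped, so the colon is left alone.)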
What you are reporting you are getting is after calling the print function on the resulting string. In console, if you type directory and press enter, it would give something like: C\\:\\\\Users\\\\admin\\\\code. When using directory.replace('\\','') it would replace all backslashes. For example: directory.replace('\\','x') gives Cx:xxUsersxxadminxxcode. What might work in this case is replacing both the backslash and colon with ':' i.e. directory.replace('\\:',':'). This will work.
However, I will suggest doing something else. A neat way to work with Windows directories in Python is to use forward slash. Python and the OS will work out a way to understand your paths with forward slashes. Further, if you aren't using absolute paths, as far as the paths are concerned, your code will be portable to Unix-style OSes.
It also seems to me that you are calling re.escape unnecessarily. If printing the directory gives you C:\Users\admin\code, then it is already a perfectly fine directory string and you don't need to escape it. If it weren't escaped, print('C:\Users\admin\code') would give something like C:\Usersdmin\code, since \a has a special meaning (bell). In Python 3 that literal is rejected outright, because \U starts a Unicode escape.
I am using replace string method in Python and I am finding something that I cannot understand.
While converting a folder path written in Python to Windows notation, I find that the replace method changes each double / into a double \ instead of a single \ as intended.
folder_im_wdows = folder_im_wdows.replace("//","\\")
Even more surprising, the same thing happens when I try the following workaround:
folder_im_wdows = folder_im_wdows.replace("//",chr(92))
Python does the same...
The original variable is: //xxxxx//xxxx//xxxx//xxxx//xxx//xxxxx
And I want to get -> \xxx\x\x\x
What's happening with replace method?
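This is because Python's interactive interpreter echoes the repr of a string, in which backslashes are shown escaped. The string itself contains single backslashes.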
Example from python's CLI:
>>> str = "abc//def//fgh"
>>> str.replace("//", "\\")
'abc\\def\\fgh'
>>> print(str.replace("//", "\\"))
abc\def\fgh
>>>
Also, you do need to write \\ rather than a single \ in the literal, because the backslash character itself has to be escaped.
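To make the difference between the two displays concrete, here is a minimal sketch. The variable name converted is just for illustration; the input is the same string as in the session above.

```python
converted = "abc//def//fgh".replace("//", "\\")

# repr() is what the interactive prompt echoes: each backslash is shown doubled.
print(repr(converted))

# print() writes the actual characters: one backslash per separator.
print(converted)

# The string really holds single backslashes, and chr(92) is that same character.
assert converted == r"abc\def\fgh"
assert len(converted) == 11
assert chr(92) == "\\"
```

So replace is doing exactly what was asked; only the way the result is displayed differs.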
Use os.path for working with path names:
import os
os.path.normpath('C:/Users/Bob/My Documents')
os.path.abspath would do the job too (it uses os.path.normpath)
Note: this requires the host to be Windows. If that's not the case, you can use ntpath.normpath directly.
https://docs.python.org/library/os.path.html#os.path.normpath
Avoid regexes, replaces and all that. You're going to get it wrong in some subtle way.
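As a small illustration of the ntpath variant mentioned above, the following sketch should behave the same on any host OS, since ntpath always applies Windows rules. The sample paths are only examples.

```python
import ntpath

# Forward slashes become backslashes under Windows path rules.
windows_path = ntpath.normpath("C:/Users/Bob/My Documents")
print(windows_path)

# Doubled separators inside a relative path are collapsed as well.
collapsed = ntpath.normpath("xxxxx//xxxx//xxx")
print(collapsed)
```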
I'm getting some content from the Twitter API and I have a small problem: I sometimes get a tweet ending with a single backslash.
More precisely, I'm using simplejson to parse Twitter stream.
How can I escape this backslash ?
From what I have read, such raw string shouldn't exist ...
Even if I add one backslash (with two in fact) I still get an error as I suspected (since I have an odd number of backslashes)
Any idea ?
I can just forget about these tweets too, but I'm still curious about that.
Thanks : )
Prefixing a string literal with r (which stands for "raw") turns off escape processing, so backslashes inside the literal are kept as-is. For example:
print(r'\b\n\\')
will output
\b\n\\
Have I understood the question correctly?
I guess you are looking for a method similar to stripslashes in PHP. So, here you go:
Python version of PHP's stripslashes
You can try using raw strings by prepending an r (so nothing has to be escaped) to the string or re.escape().
I'm not really sure what you need considering I haven't seen the text of the response. If none of the methods you come up with on your own or get from here work, you may have to forget about those tweets.
Unless you update your question and come back with a real problem, I'm asserting that you don't have an issue except confusion.
You get the string from the Twitter API, ergo the string does not show up in your code. “Raw strings” exist only in your code, and it is “raw strings” in code that can't end in a backslash.
Consider this:
def some_obscure_api():
    "This exists in a library, so you don't know what it does"
    return r"hello" + "\\"  # addition just for fun

my_string = some_obscure_api()
print(my_string)
See? my_string happily ends in a backslash and your code couldn't care less.
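The same point can be sketched with JSON, which is closer to the tweet scenario. This example uses the standard-library json module rather than simplejson, and the payload is made up for illustration; it is not real Twitter output.

```python
import json

# Hypothetical payload: on the wire, a trailing backslash is escaped as \\.
payload = r'{"text": "look at this \\"}'

tweet = json.loads(payload)["text"]
print(tweet)

# After decoding, the Python string simply ends in one backslash.
assert tweet.endswith("\\")
```

No escaping is needed on the decoded value; escaping only matters when the text is written as a literal in source code or re-encoded.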
I am trying to run a program from the command prompt in windows. I am having some issues. The code is below:
commandString = "'C:\Program Files\WebShot\webshotcmd.exe' //url '" + columns[3] + "' //out '"+columns[1]+"~"+columns[2]+".jpg'"
os.system(commandString)
time.sleep(10)
So with the single quotes I get "The filename, directory name, or volume label syntax is incorrect." If I replace the single quotes with \" then it says something to the effect of "'C:\Program' is not a valid executable."
I realize it is a syntax error, but I am not quite sure how to fix this....
column[3] contains a full url copy pasted from a web browser (so it should be url encoded). column[1] will only contain numbers and periods. column[2] contains some text, double quotes and colons are replaced. Mentioning just in case...
Thanks!
Windows requires double quotes in this situation, and you used single quotes.
Use the subprocess module rather than os.system, which is more robust and avoids calling the shell directly, making you not have to worry about confusing escaping issues.
Don't use + to put together long strings. Use string formatting ("string %s" % (formatting,)), which is more readable, efficient, and idiomatic.
In this case, don't form a long string as a shell command anyhow, make a list and pass it to subprocess.call.
As best as I can tell you are escaping your forward slash but not your backslashes, which is backwards. A string literal with // has both slashes in the string it makes. In any event, rather than either you should use the os.path module which avoids any confusion from parsing escapes and often makes scripts more portable.
Use the subprocess module for calling system commands. Also, try removing the single quotes and using double quotes.
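A rough sketch of the list-based approach suggested above. The helper names build_webshot_args and take_screenshot are hypothetical; the executable path and the //url and //out switches are taken from the question, and the column layout is assumed to match it.

```python
import subprocess

# Path from the question; a raw string keeps the backslashes literal.
WEBSHOT = r"C:\Program Files\WebShot\webshotcmd.exe"

def build_webshot_args(columns):
    """Return the argument list; each element is passed as exactly one argument."""
    out_name = "%s~%s.jpg" % (columns[1], columns[2])
    return [WEBSHOT, "//url", columns[3], "//out", out_name]

def take_screenshot(columns):
    # No shell is involved, so the space in "Program Files" needs no quoting.
    return subprocess.call(build_webshot_args(columns))
```

Because the arguments never pass through a shell, neither single nor double quotes are needed around the path or the URL.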
When using os.system() it's often necessary to escape filenames and other arguments passed as parameters to commands. How can I do this? Preferably something that would work on multiple operating systems/shells but in particular for bash.
I'm currently doing the following, but am sure there must be a library function for this, or at least a more elegant/robust/efficient option:
def sh_escape(s):
    return s.replace("(","\\(").replace(")","\\)").replace(" ","\\ ")

os.system("cat %s | grep something | sort > %s"
          % (sh_escape(in_filename),
             sh_escape(out_filename)))
Edit: I've accepted the simple answer of using quotes, don't know why I didn't think of that; I guess because I came from Windows where ' and " behave a little differently.
Regarding security, I understand the concern, but, in this case, I'm interested in a quick and easy solution which os.system() provides, and the source of the strings is either not user-generated or at least entered by a trusted user (me).
shlex.quote() does what you want since Python 3.3.
(Use pipes.quote to support both Python 2 and Python 3,
though note that pipes has been deprecated since 3.11
and was removed in 3.13.)
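A short sketch of what shlex.quote produces for a few sample names (the names are arbitrary). It uses shlex.split, which parses with POSIX shell rules, to check that each quoted form comes back as the original single argument.

```python
import shlex

for name in ["plain.txt", "my file.txt", "it's", ""]:
    quoted = shlex.quote(name)
    # The quoted form must parse back into exactly one argument.
    assert shlex.split(quoted) == [name]
    print(repr(name), "->", quoted)
```

Note that the empty string is quoted as '' rather than disappearing, which is the zero-length case that older pipes.quote versions got wrong.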
This is what I use:
def shellquote(s):
    return "'" + s.replace("'", "'\\''") + "'"
The shell will always accept a quoted filename and remove the surrounding quotes before passing it to the program in question. Notably, this avoids problems with filenames that contain spaces or any other kind of nasty shell metacharacter.
Update: If you are using Python 3.3 or later, use shlex.quote instead of rolling your own.
Perhaps you have a specific reason for using os.system(). But if not you should probably be using the subprocess module. You can specify the pipes directly and avoid using the shell.
The following is from PEP324:
Replacing shell pipe line
-------------------------
output=`dmesg | grep hda`
==>
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
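For reference, a self-contained variant of the same idea might look like the sketch below. The function name pipe_through is made up here; closing p1.stdout in the parent follows the pattern shown in the subprocess documentation.

```python
import subprocess

def pipe_through(producer_args, consumer_args):
    """Run `producer | consumer` without a shell and return the consumer's stdout."""
    p1 = subprocess.Popen(producer_args, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(consumer_args, stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()  # lets p1 receive SIGPIPE if p2 exits first
    output = p2.communicate()[0]
    p1.wait()
    return output
```

With this helper, the PEP example would be written as pipe_through(["dmesg"], ["grep", "hda"]), and no argument ever needs shell quoting.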
Maybe subprocess.list2cmdline is a better shot?
Note that pipes.quote is actually broken in Python 2.5 and Python 3.1 and not safe to use: it doesn't handle zero-length arguments.
>>> from pipes import quote
>>> args = ['arg1', '', 'arg3']
>>> print 'mycommand %s' % (' '.join(quote(arg) for arg in args))
mycommand arg1 arg3
See Python issue 7476; it has been fixed in Python 2.6 and 3.2 and newer.
I believe that os.system just invokes whatever command shell is configured for the user, so I don't think you can do it in a platform independent way. My command shell could be anything from bash, emacs, ruby, or even quake3. Some of these programs aren't expecting the kind of arguments you are passing to them and even if they did there is no guarantee they do their escaping the same way.
Notice: This is an answer for Python 2.7.x.
According to the source, pipes.quote() is a way to "Reliably quote a string as a single argument for /bin/sh". (Although it is deprecated since version 2.7 and finally exposed publicly in Python 3.3 as the shlex.quote() function.)
On the other hand, subprocess.list2cmdline() is a way to "Translate a sequence of arguments into a command line string, using the same rules as the MS C runtime".
Here we are, the platform independent way of quoting strings for command lines.
import sys
mswindows = (sys.platform == "win32")

if mswindows:
    from subprocess import list2cmdline
    quote_args = list2cmdline
else:
    # POSIX
    from pipes import quote

    def quote_args(seq):
        return ' '.join(quote(arg) for arg in seq)
Usage:
# Quote a single argument
print quote_args(['my argument'])
# Quote multiple arguments
my_args = ['This', 'is', 'my arguments']
print quote_args(my_args)
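On Python 3 the same idea could be sketched as follows, with shlex.quote standing in for pipes.quote. The names quote_args_posix and quote_args_windows are introduced here for illustration; subprocess.list2cmdline exists in Python 3 but is not part of the documented API, so treat its use as an assumption.

```python
import shlex
import subprocess
import sys

def quote_args_posix(seq):
    # Quote each argument for /bin/sh and join with spaces.
    return " ".join(shlex.quote(arg) for arg in seq)

def quote_args_windows(seq):
    # Follows the MS C runtime rules, not cmd.exe metacharacter rules.
    return subprocess.list2cmdline(seq)

# Pick the implementation that matches the current platform.
quote_args = quote_args_windows if sys.platform == "win32" else quote_args_posix

print(quote_args(["This", "is", "my arguments"]))
```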
The function I use is:
def quote_argument(argument):
    return '"%s"' % (
        argument
        .replace('\\', '\\\\')
        .replace('"', '\\"')
        .replace('$', '\\$')
        .replace('`', '\\`')
    )
that is: I always enclose the argument in double quotes, and then backslash-quote the only characters special inside double quotes.
On UNIX shells like Bash, you can use shlex.quote in Python 3 to escape special characters that the shell might interpret, like whitespace and the * character:
import os
import shlex
os.system("rm " + shlex.quote(filename))
However, this is not enough for security purposes! You still need to be careful that the command argument is not interpreted in unintended ways. For example, what if the filename is actually a path like ../../etc/passwd? Running os.system("rm " + shlex.quote(filename)) might delete /etc/passwd when you only expected it to delete filenames found in the current directory! The issue here isn't with the shell interpreting special characters, it's that the filename argument isn't interpreted by the rm as a simple filename, it's actually interpreted as a path.
Or what if the valid filename starts with a dash, for example, -f? It's not enough to merely pass the escaped filename, you need to disable options using -- or you need to pass a path that doesn't begin with a dash like ./-f. The issue here isn't with the shell interpreting special characters, it's that the rm command interprets the argument as a filename or a path or an option if it begins with a dash.
Here is a safer implementation:
if os.sep in filename:
    raise Exception("Did not expect to find file path separator in file name")
os.system("rm -- " + shlex.quote(filename))
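The checks above can be wrapped in a small helper that only builds the command string, which makes it easy to inspect before anything is executed. This is a sketch: the name build_rm_command is hypothetical, and the extra os.altsep check is an addition not present in the snippet above.

```python
import os
import shlex

def build_rm_command(filename):
    """Return an rm command for a bare file name, or raise if it looks like a path."""
    separators = [os.sep] + ([os.altsep] if os.altsep else [])
    if any(sep in filename for sep in separators):
        raise ValueError("Did not expect to find file path separator in file name")
    # "--" ends option parsing, so a name such as "-f" is treated as a file.
    return "rm -- " + shlex.quote(filename)

print(build_rm_command("my file.txt"))
print(build_rm_command("-f"))
```

The resulting string can then be passed to os.system, or the quoting can be avoided altogether by passing ["rm", "--", filename] to subprocess.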
I think these answers are a bad idea for escaping command-line arguments on Windows. Based on the results: people are trying to apply a black-list approach to filtering 'bad' characters, assuming (and hoping) they got them all. Windows is very complex and there could be all manner of characters found in the future that might allow an attacker to hijack command line arguments.
I've already seen some answers neglect to filter basic meta-characters in Windows (like the semi-colon.) The approach I take is far simpler:
Make a list of allowed ASCII characters.
Remove all chars that aren't in that list.
Escape slashes and double-quotes.
Surround entire command with double quotes so the command argument cannot be maliciously broken and commandeered with spaces.
A basic example:
def win_arg_escape(arg, allow_vars=0):
    allowed_list = """'"/\\abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-. """
    if allow_vars:
        allowed_list += "~%$"

    # Filter out anything that isn't a
    # standard character.
    buf = ""
    for ch in arg:
        if ch in allowed_list:
            buf += ch

    # Escape all slashes.
    buf = buf.replace("\\", "\\\\")

    # Escape double quotes.
    buf = buf.replace('"', '""')

    # Surround entire arg with quotes.
    # This avoids spaces breaking a command.
    buf = '"%s"' % (buf)

    return buf
The function has an option to enable use of environment variables and other shell variables. Enabling this poses more risk, so it's disabled by default.