Escape space in filepath - python

I'm trying to write a python tool that will read a logfile and process it
One thing it should do is use the paths listed in the logfile (it's a logfile for a backup tool)
/Volumes/Live_Jobs/Live_Jobs/*SCANS\ and\ LE\ Docs/_LE_PROOFS_DOCS/JEM_lj/JEM/0002_OXO_CorkScrew/3\ Delivery/GG_Double\ Lever\ Waiters\ Corkscrew_072613_Mike_RETOUCHED/gg_3110200_2_V3_Final.tif
Unfortunately the paths that I'm provided with aren't appropriately escaped and I've had trouble properly escaping in python. Perhaps python isn't the best tool for this, but I like it's flexibility - it will allow me to extend whatever I write
Using the regex escape function escapes too many characters, pipes.quote method doesn't escape the spaces, and if I use a regex to replace ' ' with '\ ' I end up getting
/Volumes/Live_Jobs/Live_Jobs/*SCANS\\ and\\ LE\\ Docs/_LE_PROOFS_DOCS/JEM_lj/JEM/0002_OXO_CorkScrew/3\\ Delivery/GG_Double\\ Lever\\ Waiters\\ Corkscrew_072613_Mike_RETOUCHED/gg_3110200_2_V3_Final.tif
which are double escaped and wont pass to python functions like os.path.getsize().
What am I doing wrong??

If you're reading paths out of a file, and passing them to functions like os.path.getsize, you don't need to escape them. For example:
>>> with open('name with spaces', 'w') as f:
... f.write('abc\n')
>>> os.path.getsize('name with spaces')
4
In fact, there are only a handful of functions in Python that need spaces escaped, either because they're passing a string to the shell (like os.system) or because they're trying to do shell-like parsing on your behalf (like subprocess.foo with an arg string instead of an arg list).
So, let's say logfile.txt looks like this:
/Volumes/My Drive/My Scans/Batch 1/foo bar.tif
/Volumes/My Drive/My Scans/Batch 1/spam eggs.tif
/Volumes/My Drive/My Scans/Batch 2/another long name.tif
… then something like this will work fine:
with open('logfile.txt') as logf:
for line in logf:
with open(line.rstrip()) as f:
do_something_with_tiff_file(f)
Noticing those * characters in your example, if these are glob patterns, that's fine too:
with open('logfile.txt') as logf:
for line in logf:
for path in glob.glob(line.rstrip()):
with open(path) as f:
do_something_with_tiff_file(f)
If your problem is the exact opposite of what you described, and the file is full of strings that are escaped, and you want to unescape them, decode('string_escape') will undo Python-style escaping, and there are different functions to undo different kinds of escaping, but without knowing what kind of escaping you want to undo it's hard to say which function you want…

Try this:
myfile = open(r'c:\tmp\junkpythonfile','w')
The 'r' stands for a raw string.
You could also use \ like
myfile = open('c:\\tmp\\junkpythonfile','w')

This command will escape the spaces in a string.
# sample_string = sample_string.replace(key, value)
file_path = file_path.replace(' ','\ ')
For more details see https://thispointer.com/python-replace-multiple-characters-in-a-string

Related

Why does python add additional backslashes to the path?

I have a text file with a path that goes like this:
r"\\user\data\t83\rf\Desktop\QA"
When I try to read this file a print a line it returns the following string, I'm unable to open the file from this location:
'r"\\\\user\\data\\t83\\rf\\Desktop\\QA"\n'
Seems you've got Python code in your text file, so either sanitize your file, so it only includes the actual path (not a Python string representation) or you can try to fiddle with string replace until you're satisfied, or just evaluate the Python string.
Note that using eval() opens Padora's box (it as unsafe as it gets), it's safer to use ast.literal_eval() instead.
import ast
file_content = 'r"\\\\user\\data\\t83\\rf\\Desktop\\QA"\n'
print(eval(file_content)) # do not use this, it's only shown for the sake of completeness
print(ast.literal_eval(file_content))
Output:
\\user\data\t83\rf\Desktop\QA
\\user\data\t83\rf\Desktop\QA
Personally, I'd prefer to sanitize the file, so it only contains \\user\data\t83\rf\Desktop\QA
\ will wait for another character to form one like \n (new line) or \t (tab) therefore a single backslash will merge with the next character. To solve this if the next character is \\ it will represent the single backslash.

replacing string from '\' into '/' Python

I've been strugling with some code where i need to change simple \ into / in Python. Its a path of file- Python doesn't read path of file in Windows'es way, so i simply want to change Windows path for Python to read file correctly.
I want to parse some text from game to count statistics. Im Doing it this way:
import re
pathNumbers = "D:\Gry\Tibia\packages\TibiaExternal\log\test server.txt"
pathNumbers = re.sub(r"\\", r"/",pathNumbers)
fileNumbers = open (pathNumbers, "r")
print(fileNumbers.readline())
fileNumbers.close()
But the Error i get back is
----> 6 fileNumbers = open (pathNumbers, "r") OSError: [Errno 22] Invalid argument: 'D:/Gry/Tibia/packages/TibiaExternal\test server.txt'
And the problem is, that function re.sub() and .replace(), give the same result- almost full path is replaced, but last char to change always stays untouched.
Do you have any solution for this, because it seems like changing those chars are for python a sensitive point.
Simple answer:
If you want to use paths on different plattforms join them with
os.path.join(path,*paths)
This way you don't have to work with the different separators at all.
Answer to what you intended to do:
The actual problem is, that your pathNumbers variable is not raw (leading r in definition), meaning that the backslashes are used as escape characters. In most cases this does not change anything, because the combinations with the following characters don't have a meaning. \t is the tab character, \n would be the newline character, so these are not simple backslash characters any more.
So simply write
pathNumbers = r"D:\Gry\Tibia\packages\TibiaExternal\log\test server.txt"

Own pretty print option in python script

I'm outputting pretty huge XML structure to file and I want user to be able to enable/disable pretty print.
I'm working with approximately 150MB of data,when I tried xml.etree.ElementTree and build tree structure from it's element objects, it used awfully lot of memory, so I do this manually by storing raw strings and outputing by .write(). My output sequence looks like this:
ofile.write(pretty_print(u'\
\t\t<LexicalEntry id="%s">\n\
\t\t\t<feat att="languageCode" val="cz"/>\n\
\t\t\t<Lemma>\n\
\t\t\t\t<FormRepresentation>\n\
\t\t\t\t\t<feat att="writtenForm" val="%s"/>\n\
\t\t\t\t</FormRepresentation>\n\
\t\t\t</Lemma>\n\
\t\t\t<Sense>%s\n' % (str(lex_id), word['word'], '' if word['pos']=='' else '\n\t\t\t\t<feat att="partOfSpeech" val="%s"/>' % word['pos'])))
inside the .write() I call my function pretty_print which, depending on command line option, SHOULD strip all tab and newline characters
o_parser = OptionParser()
# ....
o_parser.add_option("-p", "--prettyprint", action="store_true", dest="pprint", default=False)
# ....
def pretty_print(string):
if not options.pprint:
return string.strip('\n\t')
return string
I wrote 'should', because it does not, in this particular case it does not strip any of the characters.
BUT in this case, it works fine:
for ss in word['synsets']:
ofile.write(pretty_print(u'\t\t\t\t<Sense synset="%s-synset"/>\n' % ss))
First thing that came on my mind was that there might be some issues with the substitution, but when i print passed string inside the pretty_print function it looks perfectly fine.
Any suggestiones what might cause that .strip() does not work?
Or if there is any better way to do this, I'll accept any advice
Your issue is that str.strip() only removes from the beginning and end of a string.
You either want str.replace() to remove all instances, or to split it into lines and strip each line, if you want to remove them from the beginning and end of lines.
Also note that for your massive string, Python supports multi-line strings with triple quotes that will make it a lot easier to type out, and the old style string formatting with % has been superseded by str.format() - which you probably want to use instead in new code.

How can I read blackslashes from a file correctly?

The following code:
key = open("C:\Scripts\private.ppk",'rb').read()
reads the file and assigns its data to the var key.
For a reason, backslashes are multiplied in the process. How can I make sure they don't get multiplied?
You ... don't. They are escaped when they are read in so that they will process properly when they are written out / used. If you're declaring strings and don't want to double up the back slashes you can use raw strings r'c:\myfile.txt', but that doesn't really apply to the contents of a file you're reading in.
>>> s = r'c:\boot.ini'
>>> s
'c:\\boot.ini'
>>> repr(s)
"'c:\\\\boot.ini'"
>>> print s
c:\boot.ini
>>>
As you can see, the extra slashes are stored internally, but when you use the value in a print statement (write a file, test for values, etc.) they're evaluated properly.
You should read this great blog post on python and the backslash escape character.
And under some circumstances, if
Python prints information to the
console, you will see the two
backslashes rather than one. For
example, this is part of the
difference between the repr() function
and the str() function.
myFilename =
"c:\newproject\typenames.txt" print
repr(myFilename), str(myFilename)
produces
'c:\newproject\typenames.txt'
c:\newproject\typenames.txt
Backslashes are represented as escaped. You'll see two backslashes for each real one existing on the file, but that is normal behaviour.
The reason is that the backslash is used in order to create codes that represent characters that cannot be easily represented, such as new line '\n' or tab '\t'.
Are you trying to put single backslashes in a string? Strings with backslashes require and escape character, in this case "\". It will print to the screen with a single slash
In fact there is a solution - using eval, as long as the file content can be wrapped into quotes of some kind. Following worked for me (PATH contains some script that executes Matlab):
MATLAB_EXE = "C:\Program Files (x86)\MATLAB\R2012b\bin\matlab.exe"
content = open(PATH).read()
MATLAB_EXE in content # False
content = eval(f'r"""{content}"""')
MATLAB_EXE in content # True
This works by evaluating the content as python string literal, making double escapes transform into single ones. Raw string is used to prevent escapes forming special characters.

How to escape os.system() calls?

When using os.system() it's often necessary to escape filenames and other arguments passed as parameters to commands. How can I do this? Preferably something that would work on multiple operating systems/shells but in particular for bash.
I'm currently doing the following, but am sure there must be a library function for this, or at least a more elegant/robust/efficient option:
def sh_escape(s):
return s.replace("(","\\(").replace(")","\\)").replace(" ","\\ ")
os.system("cat %s | grep something | sort > %s"
% (sh_escape(in_filename),
sh_escape(out_filename)))
Edit: I've accepted the simple answer of using quotes, don't know why I didn't think of that; I guess because I came from Windows where ' and " behave a little differently.
Regarding security, I understand the concern, but, in this case, I'm interested in a quick and easy solution which os.system() provides, and the source of the strings is either not user-generated or at least entered by a trusted user (me).
shlex.quote() does what you want since python 3.
(Use pipes.quote to support both python 2 and python 3,
though note that pipes has been deprecated since 3.10
and slated for removal in 3.13)
This is what I use:
def shellquote(s):
return "'" + s.replace("'", "'\\''") + "'"
The shell will always accept a quoted filename and remove the surrounding quotes before passing it to the program in question. Notably, this avoids problems with filenames that contain spaces or any other kind of nasty shell metacharacter.
Update: If you are using Python 3.3 or later, use shlex.quote instead of rolling your own.
Perhaps you have a specific reason for using os.system(). But if not you should probably be using the subprocess module. You can specify the pipes directly and avoid using the shell.
The following is from PEP324:
Replacing shell pipe line
-------------------------
output=`dmesg | grep hda`
==>
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
Maybe subprocess.list2cmdline is a better shot?
Note that pipes.quote is actually broken in Python 2.5 and Python 3.1 and not safe to use--It doesn't handle zero-length arguments.
>>> from pipes import quote
>>> args = ['arg1', '', 'arg3']
>>> print 'mycommand %s' % (' '.join(quote(arg) for arg in args))
mycommand arg1 arg3
See Python issue 7476; it has been fixed in Python 2.6 and 3.2 and newer.
I believe that os.system just invokes whatever command shell is configured for the user, so I don't think you can do it in a platform independent way. My command shell could be anything from bash, emacs, ruby, or even quake3. Some of these programs aren't expecting the kind of arguments you are passing to them and even if they did there is no guarantee they do their escaping the same way.
Notice: This is an answer for Python 2.7.x.
According to the source, pipes.quote() is a way to "Reliably quote a string as a single argument for /bin/sh". (Although it is deprecated since version 2.7 and finally exposed publicly in Python 3.3 as the shlex.quote() function.)
On the other hand, subprocess.list2cmdline() is a way to "Translate a sequence of arguments into a command line string, using the same rules as the MS C runtime".
Here we are, the platform independent way of quoting strings for command lines.
import sys
mswindows = (sys.platform == "win32")
if mswindows:
from subprocess import list2cmdline
quote_args = list2cmdline
else:
# POSIX
from pipes import quote
def quote_args(seq):
return ' '.join(quote(arg) for arg in seq)
Usage:
# Quote a single argument
print quote_args(['my argument'])
# Quote multiple arguments
my_args = ['This', 'is', 'my arguments']
print quote_args(my_args)
The function I use is:
def quote_argument(argument):
return '"%s"' % (
argument
.replace('\\', '\\\\')
.replace('"', '\\"')
.replace('$', '\\$')
.replace('`', '\\`')
)
that is: I always enclose the argument in double quotes, and then backslash-quote the only characters special inside double quotes.
On UNIX shells like Bash, you can use shlex.quote in Python 3 to escape special characters that the shell might interpret, like whitespace and the * character:
import os
import shlex
os.system("rm " + shlex.quote(filename))
However, this is not enough for security purposes! You still need to be careful that the command argument is not interpreted in unintended ways. For example, what if the filename is actually a path like ../../etc/passwd? Running os.system("rm " + shlex.quote(filename)) might delete /etc/passwd when you only expected it to delete filenames found in the current directory! The issue here isn't with the shell interpreting special characters, it's that the filename argument isn't interpreted by the rm as a simple filename, it's actually interpreted as a path.
Or what if the valid filename starts with a dash, for example, -f? It's not enough to merely pass the escaped filename, you need to disable options using -- or you need to pass a path that doesn't begin with a dash like ./-f. The issue here isn't with the shell interpreting special characters, it's that the rm command interprets the argument as a filename or a path or an option if it begins with a dash.
Here is a safer implementation:
if os.sep in filename:
raise Exception("Did not expect to find file path separator in file name")
os.system("rm -- " + shlex.quote(filename))
I think these answers are a bad idea for escaping command-line arguments on Windows. Based on the results: people are trying to apply a black-list approach to filtering 'bad' characters, assuming (and hoping) they got them all. Windows is very complex and there could be all manner of characters found in the future that might allow an attacker to hijack command line arguments.
I've already seen some answers neglect to filter basic meta-characters in Windows (like the semi-colon.) The approach I take is far simpler:
Make a list of allowed ASCII characters.
Remove all chars that aren't in that list.
Escape slashes and double-quotes.
Surround entire command with double quotes so the command argument cannot be maliciously broken and commandeered with spaces.
A basic example:
def win_arg_escape(arg, allow_vars=0):
allowed_list = """'"/\\abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-. """
if allow_vars:
allowed_list += "~%$"
# Filter out anything that isn't a
# standard character.
buf = ""
for ch in arg:
if ch in allowed_list:
buf += ch
# Escape all slashes.
buf = buf.replace("\\", "\\\\")
# Escape double quotes.
buf = buf.replace('"', '""')
# Surround entire arg with quotes.
# This avoids spaces breaking a command.
buf = '"%s"' % (buf)
return buf
The function has an option to enable use of environmental variables and other shell variables. Enabling this poses more risk so its disabled by default.

Categories

Resources