Rewriting a path - duplicate the backslash "\" - python

Let's say that I have this path :
path = "\\main\user\program\mathlab\test\count"
I want to rewrite this path to be able to use it. So I need to duplicate the backslash to get :
new_path = "\\\\main\\user\\program\\mathlab\\test\\count"

This should do the job:
path.replace("\\", "\\\\") # replaces each \ with \\
We need two backslashes in the source to represent one actual backslash. This is because a single backslash starts an escape sequence (usually two characters in total) that lets you represent characters such as whitespace (e.g. \n for newline, \t for tab). To get a literal backslash, you add a second backslash to complete the sequence.
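A quick way to check what the replacement does is to count the backslashes before and after. This is only a sketch: it writes the original path as a raw string (an assumption, since in Python 3 the sequence \u inside a normal literal is itself an escape and would not compile), and the name doubled is made up for illustration.

```python
# Raw literal: every backslash typed here is stored as-is.
path = r"\\main\user\program\mathlab\test\count"

# "\\" is one backslash and "\\\\" is two, so each backslash is doubled.
doubled = path.replace("\\", "\\\\")

print(path)      # the stored path
print(doubled)   # the same path with every backslash doubled
print(path.count("\\"), doubled.count("\\"))
```

Doubling is only needed when some other tool will interpret the backslashes again; the stored path itself is already usable as it is.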

Related

Python path string adds an extra slash except for the last one

I have a path to a directory where I want to iterate over all of the files. It looks like this myPath = "D:\workspace\test\main\test docs" and when I print out myPath it looks like "D:\\workspace\test\\main\test docs" . As you can see it added a backslash to every backslash except the last one.
When I do for path, dirs, files in os.walk(myPath): it doesn't work if there isn't the extra backslash. Why isn't Python adding the extra backslash to the last one?
It was working on a different computer.
Because '\t' is a valid escape sequence: it is a tab, as specified in section 2.4.1, String literals. The other sequences (\w and \m) are not recognized escapes, so Python keeps the backslash and shows it doubled for you (for free).
You can thus add the extra backslash like:
myPath = "D:\\workspace\\test\\main\\test docs"
or you can use a raw string, by prefixing it with r:
myPath = r"D:\workspace\test\main\test docs"
In that case:
Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.
So that means the backslash (\) is not interpreted as something special, but only as a backslash.
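The difference between the spellings can be checked directly. The sketch below uses made-up variable names and only valid escape sequences, so it runs without warnings; with_tab is a shortened stand-in for the problematic part of the path.

```python
escaped = "D:\\workspace\\test\\main\\test docs"   # doubled backslashes
raw = r"D:\workspace\test\main\test docs"          # raw string, typed as-is
with_tab = "main\test docs"                        # \t here becomes a tab

print(escaped == raw)      # both spellings store the same characters
print("\t" in raw)         # the raw string contains no tab
print("\t" in with_tab)    # the plain literal does
print(repr(with_tab))      # repr shows the tab as \t, with no backslash before "est"
```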

Using os.chdir to access a file in which a folder starts with '\f'

I know that \f is a form feed. I want to access my folder the following way:
os.chdir("C:\Python27\BGT_Python\skills\fuzzymatching")
The folder 'fuzzymatching' starts with the \f symbol which breaks the string.
What's the easiest way to get around these types of symbols?
Add an r character in front of the string:
os.chdir(r"C:\Python27\BGT_Python\skills\fuzzymatching")
See the Python docs.
In triple-quoted strings, unescaped newlines and quotes are allowed (and are retained), except that three unescaped quotes in a row terminate the string. (A "quote" is the character used to open the string, i.e. either ' or ".)
and
Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.
For completeness, I'll add:
os.chdir("C:/Python27/BGT_Python/skills/fuzzymatching")
About the only part of Windows that actually requires backslashes is the command line.
This should work:
os.chdir("C:\Python27\BGT_Python\skills\\fuzzymatching")
I just added a \ to escape \f.
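To compare the suggestions in this thread without touching the file system, one option is pathlib.PureWindowsPath, which parses Windows-style paths on any platform. This is a sketch with made-up variable names, not something the answers above used.

```python
from pathlib import PureWindowsPath

raw = r"C:\Python27\BGT_Python\skills\fuzzymatching"
escaped = "C:\\Python27\\BGT_Python\\skills\\fuzzymatching"
forward = "C:/Python27/BGT_Python/skills/fuzzymatching"
broken = "skills\fuzzymatching"   # \f is stored as a form feed character

print(raw == escaped)                                     # same characters
print(PureWindowsPath(forward) == PureWindowsPath(raw))   # same Windows path
print("\f" in broken)                                     # the form feed is really there
```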

Backslashes in Python Regex

I'm writing a quick Python script to do a bit of inspection on some of our Hibernate mapping files. I'm trying to use this bit of Python to get the table name of a POJO, whether or not its class path is fully defined:
searchObj = re.search(r'<class name="(.*\\.|)' + pojo + '".*table="(.*?)"', contents)
However - say pojo is 'MyObject' - the regex is not matching it to this line:
<class name="com.place.package.MyObject" table="my_cool_object" dynamic-insert="true" dynamic-update="true">
If I print the string (while stopped in Pdb) I'm searching with, I see this:
'<class name="(.*\\\\.|)MyObject".*table="(.*?)"'
I'm quite confused as to what's going wrong here. For one, I was under the impression that the 'r' prefix made it so that the backslashes wouldn't be escaped. Even so, if I remove one of the backslashes such that my search string is this:
searchObj = re.search(r'<class name="(.*\.|)' + pojo + '".*table="(.*?)"', contents)
And the string searched becomes
'<class name="(.*\\.|)MyObject".*table="(.*?)"'
It still doesn't return a match. What's going wrong here? The regex expression I'm intending to use works on regex101.com (with just one backslash in the apparently problematic area.) Any idea what is going wrong here?
Given this:
re.search(r'<class name="(.*\\.|)' + pojo + '".*table="(.*?)"', contents)
The first part of the pattern is interpreted like this:
1. <class name=" a literal string beginning with < and ending with "
2. ( the beginning of a group
3. .* zero or more of any characters
4. \\ a literal single backslash
5. . any single character
6. | OR
7. nothing
8. ) end of the group
Since the string you're searching does not contain a literal backslash, it won't match.
If what you intend is for \\. to mean "a literal period", you need a single backslash since it is inside a raw string: \.
Also, ending the group with a pipe seems weird. I'm not sure what you think that's accomplishing. If you mean to say "any number of characters ending in a dot, or nothing", you can do that with (.*\.)?, since the ? means "zero or one of the preceding match".
This seems to work for me:
import re
contents1 = '''<class name="com.place.package.MyObject" table="my_cool_object" dynamic-insert="true" dynamic-update="true">'''
contents2 = '''<class name="MyObject" table="my_cool_object" dynamic-insert="true" dynamic-update="true">'''
pojo="MyObject"
pattern = r'<class name="(.*\.)?' + pojo + '.*table="(.*?)"'
assert(re.search(pattern, contents1))
assert(re.search(pattern, contents2))
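A slightly stricter variant, as a sketch: it passes the class name through re.escape so that any regex metacharacters in it are matched literally, and it returns the table name instead of just asserting a match. The helper name find_table is hypothetical.

```python
import re

def find_table(pojo, contents):
    """Return the table attribute for the given class name, or None."""
    # (?:.*\.)? optionally matches a package prefix ending in a literal dot.
    pattern = r'<class name="(?:.*\.)?' + re.escape(pojo) + r'".*?table="(.*?)"'
    match = re.search(pattern, contents)
    return match.group(1) if match else None

qualified = '<class name="com.place.package.MyObject" table="my_cool_object" dynamic-insert="true">'
unqualified = '<class name="MyObject" table="my_cool_object" dynamic-insert="true">'

print(find_table("MyObject", qualified))
print(find_table("MyObject", unqualified))
print(find_table("Other", qualified))
```

With the doubled backslash from the question, r'(.*\\.|)', the same search on the qualified line finds nothing, because the line contains no literal backslash.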
On Pythex, I tried this regex:
<class name="(.*)\.MyObject" table="([^"]*)"
on this string:
<class name="com.place.package.MyObject" table="my_cool_object" dynamic-insert="true" dynamic-update="true">
and got these two match captures:
com.place.package
my_cool_object
So I think in your case, this line
searchObj = re.search(r'<class name="(.*)\.' + pojo + '" table="([^"]*)"', contents)
will produce the result you want.
About the confusing backslashes (you add two and then four show up): section 7.2 of the Python documentation, on the re module, explains that r'' is "raw string notation", used to circumvent Python's regular character escaping, which uses a backslash. So:
'\\' means "a string composed of one backslash", since the first backslash in the string escapes the second backslash. Python sees the first backslash and thinks, 'the next character is a special one'; then it sees the second and says, 'the special character is an actual backslash'. It's stored as a single character \. If you print it, you see \; if you ask Python for its representation (as Pdb and the interactive prompt do), it escapes the output and shows you '\\'.
r'\\' means "a string composed of two actual backslashes". It's stored as character \ followed by character \. Printed, it shows \\; its representation is '\\\\'.
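The difference between what is stored, what print shows, and what repr shows can be verified in a few lines. The names one and two are only for illustration.

```python
one = '\\'    # normal literal: the two typed backslashes store one character
two = r'\\'   # raw literal: both typed backslashes are stored

for s in (one, two):
    # len() counts stored characters, print shows them as they are,
    # and repr() shows the escaped form that Pdb and the prompt display.
    print(len(s), s, repr(s))
```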

Is os.path.basename meant for file system files?

In Windows, os.path.basename('D:\\abc\def.txt') returns abc\def.txt, whereas os.path.basename('/abc/def.txt') returns def.txt.
Shouldn't the first also return def.txt?
You have an escape code in your filename, not a \ directory separator. You must've simplified your problem by using def for the filename, but had you actually tested with that simplified filename you'd have noticed that the backslash would be doubled:
>>> 'D:\\abc\def.txt'
'D:\\abc\\def.txt'
Note that the \d in the string literal became a \\ escaped backslash in the Python representation of the value. That's because there is no valid \d escape sequence. On a Windows system the os.path.basename() call works as expected for that path:
>>> import os.path
>>> os.path.basename('D:\\abc\\def.txt')
'def.txt'
In your case, however, you created an escape sequence, either \n, \r or \t, because you either forgot to double the backslash or you forgot to use a raw string. You do not have a \ character in that part of the filename, so there is nothing to split on at that location.
Use an r'...' raw string to prevent single backslashes from forming escape sequences, or double your backslashes in all locations, or use forward slashes (Windows accepts either).
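The behaviour can be reproduced on any operating system with the ntpath module, which is the Windows implementation behind os.path. In this sketch the filename test.txt is an assumption, chosen because \t is a real escape sequence; the original filename was not given.

```python
import ntpath      # Windows flavour of os.path, importable everywhere
import posixpath   # POSIX flavour of os.path

good = 'D:\\abc\\def.txt'      # both backslashes are real separators
tabbed = 'D:\\abc\test.txt'    # \t is a tab, so there is no second separator

print(ntpath.basename(good))
print(repr(ntpath.basename(tabbed)))   # the tab stays inside the "filename"
print(posixpath.basename('/abc/def.txt'))
```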

How can I put an actual backslash in a string literal (not use it for an escape sequence)?

I have this code:
import os
path = os.getcwd()
final = path +'\xulrunner.exe ' + path + '\application.ini'
print(final)
I want output like:
C:\Users\me\xulrunner.exe C:\Users\me\application.ini
But instead I get an error that looks like:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \xXX escape
I don't want the backslashes to be interpreted as escape sequences, but as literal backslashes. How can I do it?
Note that if the string should only contain a backslash - more generally, should have an odd number of backslashes at the end - then raw strings cannot be used.
To answer your question directly, put r in front of the string.
final= path + r'\xulrunner.exe ' + path + r'\application.ini'
But a better solution would be os.path.join:
final = os.path.join(path, 'xulrunner.exe') + ' ' + \
        os.path.join(path, 'application.ini')
(the backslash there is escaping a newline, but you could put the whole thing on one line if you want)
I will mention that you can use forward slashes in file paths: Python does not rewrite the string, but Windows file APIs accept forward slashes as separators. So
final = path + '/xulrunner.exe ' + path + '/application.ini'
should work. But it's still preferable to use os.path.join because that makes it clear what you're trying to do.
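To see what the join produces without depending on the current directory or operating system, the sketch below uses ntpath.join (the Windows implementation of os.path.join) and a fixed directory as a stand-in for os.getcwd(). Both substitutions are assumptions made for a reproducible example.

```python
import ntpath   # Windows flavour of os.path, importable on any platform

path = r'C:\Users\me'   # stand-in for os.getcwd()

# join inserts the separator, so no backslash literals are needed.
final = ntpath.join(path, 'xulrunner.exe') + ' ' + ntpath.join(path, 'application.ini')
print(final)
```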
You can escape the backslash. Use \\ and you get just one backslash.
You can escape the backslash with another backslash (\\), but it won’t look nicer. To solve that, put an r in front of the string to signal a raw string. A raw string will ignore all escape sequences, treating backslashes as literal text. It cannot contain the closing quote unless it is preceded by a backslash (which will be included in the string), and it cannot end with a single backslash (or odd number of backslashes).
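The two limitations of raw strings are easy to check. This is a small sketch with made-up names; one common workaround for a trailing backslash is to append it from a normal literal.

```python
# A raw string cannot end in a single backslash, so add it separately.
folder = r'C:\dir' + '\\'

# Inside a raw string, \" keeps both characters: the backslash and the quote.
quoted = r'\"'

print(folder)
print(len(quoted), quoted)
```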
Another simple (and arguably more readable) approach is to combine a raw string with format and replacement fields, like so:
import os
path = os.getcwd()
final = r"{0}\xulrunner.exe {0}\application.ini".format(path)
print(final)
or using the os path method (and a microfunction for readability):
import os
def add_cwd(path):
    return os.path.join(os.getcwd(), path)
xulrunner = add_cwd("xulrunner.exe")
inifile = add_cwd("application.ini")
# in production you would use xulrunner+" "+inifile
# but the purpose of this example is to show a version where you could use any character
# including backslash
final = r"{} {}".format( xulrunner, inifile )
print(final)
