In Python, how would I write the string '"['BOS']"'?
I tried entering "\"['BOS']\"", but this gives the output '"[\'BOS\']"', with backslashes added in front of each '.
You can use triple quotes:
'''"['BOS']"'''
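As a quick check (a sketch, not part of the original answer), the triple-quoted literal, the backslash-escaped literal and the repr-style spelling all produce the same nine-character string:

```python
# Three spellings of the same string: "['BOS']"
triple = '''"['BOS']"'''
escaped = "\"['BOS']\""
repr_style = '"[\'BOS\']"'

print(triple == escaped == repr_style)  # True
print(len(triple))                      # 9 characters: " [ ' B O S ' ] "
print(triple)                           # "['BOS']"
```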
What you did ("\"['BOS']\"") is fine too. You get the backslashes on output, but they aren't part of the string:
>>> a = "\"['BOS']\""
>>> a
'"[\'BOS\']"' # this is the representation of the string
>>> print(a)
"['BOS']" # this is the actual content
When you type an expression such as a into the console, it's the same as writing print(repr(a)). repr(a) returns a string that can be used to reconstruct the original value, hence the quotes around the string and the backslashes.
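To see that the backslashes belong only to the displayed representation, you can compare repr() with print() and check the string itself. This is a small sketch; ast.literal_eval is used here only to show that repr() round-trips:

```python
import ast

a = "\"['BOS']\""

# The console echo is repr(a): outer quotes, and the inner ' escaped
print(repr(a))
# print() writes the characters themselves
print(a)                              # "['BOS']"
# No backslash is actually stored in the string
print("\\" in a)                      # False
# repr() output can be read back into an equal string
print(ast.literal_eval(repr(a)) == a) # True
```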
You should use triple quotes so that you don't need to use backslashes.
'''"['BOS']"'''
You got backslashes in your output because the Python console adds them when it echoes the string's repr():
>>> s = '''"['BOS']"'''
>>> s
'"[\'BOS\']"'
>>>
In cases like these, enclose the entire string in """ or ''' to make things simpler (use ''' if the outermost quotation marks are ").
"""'"['BOS']"'"""
You can build it dynamically as well:
>>> print('"{}"'.format("['BOS']"))
"['BOS']"
>>> print('"' + "['BOS']" + '"')
"['BOS']"
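On Python 3.6+ an f-string does the same job. In this sketch, inner is a name chosen for illustration and holds the part of the question's string between the outer double quotes:

```python
inner = "['BOS']"            # the text between the outer double quotes
via_format = '"{}"'.format(inner)
via_concat = '"' + inner + '"'
via_fstring = f'"{inner}"'   # requires Python 3.6+

print(via_format == via_concat == via_fstring)  # True
print(via_fstring)                              # "['BOS']"
```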
I want to read regular expressions from a file, where each line contains a regex:
lorem.*
dolor\S*
The following code is supposed to read each line and append it to a list of regex strings:
vocabulary = []
with open(path, "r") as vocabularyFile:
    for term in vocabularyFile:
        term = term.rstrip()
        vocabulary.append(term)
This code seems to escape the \ special character in the file as \\. How can I either avoid escaping or unescape the string so that it can be worked with as if I wrote this?
regex = r"dolor\S*"
You are getting confused by echoing the value. The Python interpreter echoes a value by printing the result of repr(), which escapes any metacharacters:
>>> regex = r"dolor\S*"
>>> regex
'dolor\\S*'
regex is still an 8-character string, not 9, and the character at index 5 is a single backslash:
>>> regex[4]
'r'
>>> regex[5]
'\\'
>>> regex[6]
'S'
Printing the string writes out all characters verbatim, so no escaping takes place:
>>> print(regex)
dolor\S*
The same process is applied to the contents of containers, like a list or a dict:
>>> container = [regex, 'foo\nbar']
>>> print(container)
['dolor\\S*', 'foo\nbar']
Note that I printed there rather than echoing. str(list_object) produces the same output as repr(list_object) here, because a list formats its elements with repr().
If you were to print individual elements from the list, you get the same unescaped result again:
>>> print(container[0])
dolor\S*
>>> print(container[1])
foo
bar
Note how the \n in the second element was written out as a newline this time. This is why containers use repr() for their contents: it makes otherwise hard-to-detect or non-printable data visible.
In other words, the strings you read from the file do not contain doubled backslashes; the doubling only appears in their repr().
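A sketch that reproduces the situation without a real file: io.StringIO stands in for the vocabulary file (an assumption made for the demo), and each line read from it compares equal to the raw-string literal and works as a pattern:

```python
import io
import re

# Stand-in for the vocabulary file; its text contains single backslashes
fake_file = io.StringIO("lorem.*\ndolor\\S*\n")

vocabulary = []
for term in fake_file:
    vocabulary.append(term.rstrip())

print(vocabulary)                    # list display uses repr(), so \ shows as \\
print(vocabulary[1] == r"dolor\S*")  # True
print(len(vocabulary[1]))            # 8
print(re.match(vocabulary[1], "dolorsit") is not None)  # True
```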
s='\'(-inf-24.5]\''  # this is not working
What should be put before \ to include it in the string?
We need s to contain '\'(-inf-24.5]\''
The last two characters are two single quotes, not a single double quote.
The string should literally contain the single backslashes shown, because it is to be inserted as-is into a column.
You can try this:
>>> s="\\'(-inf-24.5]\\'"
>>> print(s)
\'(-inf-24.5]\'
or
>>> s="'\\'(-inf-24.5]\\''"
>>> print(s)
'\'(-inf-24.5]\''
Basically, you need to escape the backslash: when you write \' normally, Python treats it as an escaped ', and the backslash disappears. Also, Python strings can be delimited by either "" or '', so you can mix the two to get the desired result.
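Checking the two literals from this answer (a sketch): each stored backslash is written as \\ in the source, and count() confirms that only single backslashes end up in the string:

```python
s1 = "\\'(-inf-24.5]\\'"
s2 = "'\\'(-inf-24.5]\\''"

print(s1)                    # \'(-inf-24.5]\'
print(s2)                    # '\'(-inf-24.5]\''
print(s2.count("\\"))        # 2 single backslashes
print(s2 == "'" + s1 + "'")  # True: s2 is s1 wrapped in single quotes
```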
>>> s = r"'\'(-inf-24.5]\''"
>>> s
"'\\'(-inf-24.5]\\''"
>>> print(s)
'\'(-inf-24.5]\''
Prefixing a string literal with r makes it a raw string, which tells the interpreter to take the string's characters literally: backslashes are kept instead of starting escape sequences. The only thing a raw string can't do is end with an odd number of backslashes (such a trailing backslash has to be concatenated from a separate string).
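A sketch of both points: the raw literal equals the escaped one, and a string that must end in a backslash can be built by concatenating a normal "\\". The Windows-style path is a made-up example:

```python
raw = r"'\'(-inf-24.5]\''"
escaped = "'\\'(-inf-24.5]\\''"
print(raw == escaped)         # True

# r"C:\temp\" would be a syntax error, so append the trailing backslash separately
folder = r"C:\temp" + "\\"    # hypothetical example path
print(folder)                 # C:\temp\
print(folder.endswith("\\"))  # True
```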
For static strings, putting an r in front of the string would give the raw string (e.g. r'some \' string'). Since it is not possible to put r in front of a unicode string variable, what is the minimal approach to dynamically convert a string variable to its raw form? Should I manually substitute all backslashes with double backslashes?
str_var = u"some text with escapes e.g. \( \' \)"
raw_str_var = ???
If you really need to escape a string, say to print a newline as \n, you can use the encode method with the Python-specific string_escape encoding (this codec exists only in Python 2):
>>> s = "hello\nworld"
>>> e = s.encode("string_escape")
>>> e
'hello\\nworld'
>>> print s
hello
world
>>> print e
hello\nworld
You didn't mention anything about unicode, or which Python version you are using, but if you are dealing with unicode strings you should use unicode_escape instead.
>>> u = u"föö\nbär"
>>> print u
föö
bär
>>> print u.encode('unicode_escape')
f\xf6\xf6\nb\xe4r
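On Python 3, where string_escape no longer exists, str.encode('unicode_escape') gives the escaped text as bytes, which can be decoded back to str with the ASCII codec. A minimal sketch:

```python
s = "hello\nworld"
e = s.encode("unicode_escape").decode("ascii")
print(e)               # hello\nworld
print(len(s), len(e))  # 11 12

u = "föö\nbär"
print(u.encode("unicode_escape").decode("ascii"))  # f\xf6\xf6\nb\xe4r

# Decoding with unicode_escape reverses the process
print(e.encode("ascii").decode("unicode_escape") == s)  # True
```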
Your post originally had the regex tag; maybe re.escape is what you're actually looking for?
>>> re.escape(u"foo\nbar\'baz")
u"foo\\\nbar\\'baz"
Note the "double escapes"; i.e., printing the above string yields:
foo\
bar\'baz
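In Python 3.7 and later, re.escape only escapes characters that are special in regular expressions, so its exact output differs from the Python 2 session above. What stays true is that the escaped pattern matches the original text literally. A sketch with a made-up input string:

```python
import re

text = "a+b (c)? $5.00"
pattern = re.escape(text)
print(pattern)
print(re.fullmatch(pattern, text) is not None)  # True: matches the text literally
print(re.fullmatch(text, text) is None)         # True: unescaped, + ( ) ? $ . act as metacharacters
```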
There is nothing to convert - the r prefix is only significant in source code notation, not for program logic.
As a rule, if a backslash in a normal string literal doesn't start a valid escape sequence, it is kept as a literal backslash, which repr() then displays doubled (newer Python 3 versions also warn about such invalid escape sequences):
>>> "\n \("
'\n \\('
Since it may be difficult to remember all the valid/invalid escape sequences, raw string notation was introduced. But there is no way and no need to convert a string after it has been defined.
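In your case, the correct approach would be to use
str_var = ur"some text with escapes e.g. \( \' \)"
which keeps every backslash and is more explicit. Unlike the non-raw version, it also keeps the backslash in \', which a normal literal turns into a plain '. (The ur prefix exists only in Python 2; in Python 3, where every string is Unicode, use r"...".)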
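A small sketch of why there is nothing to convert (Python 3 syntax): where a backslash doesn't start an escape, the raw literal equals the explicitly escaped one; where it does (\'), the two literals differ; and doubling backslashes afterwards adds characters rather than "converting" anything:

```python
# A backslash that doesn't start an escape: raw and explicitly escaped forms match
print(r"\(" == "\\(")         # True
# \' IS a valid escape, so raw and non-raw literals differ
print(len("\'"), len(r"\'"))  # 1 2

s = r"some text with escapes e.g. \( \' \)"
doubled = s.replace("\\", "\\\\")  # changes the data: every backslash becomes two
print(s.count("\\"), doubled.count("\\"))  # 3 6
```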
I want to print the string, but in my code I am not getting the right string.
import re
line="\\python\001tag\file.txt"
str=re.search(r"\[(0-9)+]",line)  # (don't use raw_string here)
print(str.group())
This gives nothing. I want to extract 001 from there.
Note: I don't want to use a raw string, because the user gets the path from another source. Is it possible to replace each single backslash with a double backslash to solve this problem?
You need to use a raw-string so that escape sequences are not processed:
sat = r"\\Python\001tag\file.txt"
Demo:
>>> sat = r"\\Python\001tag\file.txt"
>>> sat
'\\\\Python\\001tag\\file.txt'
>>> print(sat)
\\Python\001tag\file.txt
>>>
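There are three errors. First: '\001' is an octal escape, so it gives the single character at code point 1, not the text 001. Use a double \\ or a raw string.
Second: in the pattern, \[ matches a literal '[', which isn't in the path, and (0-9) is a group matching the literal text 0-9, not a digit range. To match a backslash followed by digits, use a double \\ and a character class: r'\\([0-9]+)'.
Third: str.group() is the same as str.group(0) and returns the whole match; with the capturing group above, str.group(1) returns just the digits.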
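A sketch of a working version, assuming the goal is the run of digits after a backslash. A path that arrives at run time (from a file, an argument, or user input) already contains literal backslashes and needs no replacement; only a path typed into source code needs the raw-string form. The helper name extract_digits is hypothetical:

```python
import re

def extract_digits(path):
    """Return the first run of digits that follows a backslash, or None (hypothetical helper)."""
    # r"\\" matches one literal backslash; ([0-9]+) captures the digits
    match = re.search(r"\\([0-9]+)", path)
    return match.group(1) if match else None

print(repr("\001"), len("\001"))    # '\x01' 1: an octal escape, not the text 001
path = r"\\python\001tag\file.txt"  # typed in source code, so written as a raw string
print(extract_digits(path))         # 001
```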
I want to use a variable in a regex, like this:
variables = ['variableA','variableB']
for i in range(len(variables)):
    regex = r"'('+variables[i]+')[:|=|\(](-?\d+(?:\.\d+)?)(?:\))?'"
    pattern_variable = re.compile(regex)
    match = re.search(pattern_variable, line)
The problem is that Python (in IPython) adds an extra backslash for each backslash in my regex string, which makes my regex invalid:
In [76]: regex
Out[76]: "'('+variables[i]+')[:|=|\\(](-?\\d+(?:\\.\\d+)?)(?:\\))?'"
Any tips on how I can avoid this?
No, it only displays extra backslashes so that the string could be read in again and have the correct number of backslashes. Try
print(regex)
and you will see the difference.
There is no problem there. What you're seeing is the output of the repr() of the string. Since the repr is supposed to be more-or-less reversible back into the original object, it doubles up all backslashes, as well as escaping the type of quote used at the ends of the repr.
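Separately from the repr() question: in the question's code everything, including '+variables[i]+', sits inside one raw string literal, so the variable names are never inserted (Out[76] shows them literally). A sketch that builds the pattern by concatenation, using re.escape in case a name contains metacharacters; the sample line is made up:

```python
import re

variables = ["variableA", "variableB"]
line = "variableA=12.5 variableB(-3)"  # hypothetical sample input

for name in variables:
    # Only the fixed parts are raw-string literals; the name is concatenated in
    regex = "(" + re.escape(name) + r")[:|=|\(](-?\d+(?:\.\d+)?)(?:\))?"
    match = re.search(regex, line)
    if match:
        print(match.group(1), match.group(2))
# variableA 12.5
# variableB -3
```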