I am trying to create a string in Python using the format method, but one of the arguments is a regular expression. I tried:
>>> 'foo{}bar'.format('[\s]+')
'foo[\\s]+bar'
Because of the escaping, I can't use the result as a re.search pattern. Is there a way not to escape it?
Thanks.
In fact, you don't need to do anything, because the result is already what you expect. 'foo[\\s]+bar' is the repr that the interactive prompt displays, not the real value, which is foo[\s]+bar. Try this:
>>> print('foo{}bar'.format('[\s]+'))
foo[\s]+bar
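As a quick check, the formatted string can be passed straight to re.search. This is a minimal sketch; the sample text and the variable names are assumptions made for illustration, and the raw-string prefix is used only to avoid invalid-escape warnings on newer Python versions:

```python
import re

# Build the pattern exactly as in the question, with a raw literal for the regex part
pattern = 'foo{}bar'.format(r'[\s]+')

# repr() shows the doubled backslash; print() shows the real characters
print(repr(pattern))
print(pattern)

# The pattern contains a single backslash, so it matches whitespace as intended
sample_text = 'xx foo \t bar yy'
match = re.search(pattern, sample_text)
print(match is not None)
```

The doubled backslash only exists in the displayed representation, so no unescaping step is needed before calling re.search.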
Related
I'm using format() in Python and I want to use a variable, pokablelio, so that the user can choose how many digits to output after the decimal point. When I put the variable alone after the comma, I get: ValueError: Invalid format specifier. I tried replacing some characters and wrapping the whole string in parentheses, but that didn't work.
Right now I'm wondering: can I even use a string variable in place of the format specifier?
(Note: the variable should hold a string such as '.10f'.)
It is possible to use variables as part of the format specifier, just include them inside additional curly braces:
>>> n_places = 10
>>> f'{1.23:.{n_places}f}'
'1.2300000000'
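The same idea also works without f-strings. The sketch below shows three equivalent spellings; value, n_places, and spec are example names, and spec stands in for the '.10f' string mentioned in the question:

```python
value = 1.23
n_places = 10

# Nested replacement field inside an f-string (Python 3.6+)
with_fstring = f'{value:.{n_places}f}'

# Nested replacement field with str.format; fields are filled left to right
with_format = '{:.{}f}'.format(value, n_places)

# A complete specifier held in a string can be passed to the built-in format()
spec = '.10f'
with_builtin = format(value, spec)

print(with_fstring, with_format, with_builtin)
```

The last form is probably the closest fit when the whole specifier, not just the precision, comes from a variable.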
I have a string s, its contents are variable. How can I make it a raw string? I'm looking for something similar to the r'' method.
I believe what you're looking for is str.encode with an escape codec: 'string-escape' on Python 2, or 'unicode_escape' on Python 3. For example, if you have a variable that you want to turn into its "raw" form:
a = '\x89'
a.encode('unicode_escape')
b'\\x89'
Note: on Python 2.x, use 'string-escape' instead; the result there is a str, '\\x89', rather than bytes.
I was searching for a similar solution and found the solution via:
casting raw strings python
Raw strings are not a different kind of string. They are a different way of describing a string in your source code. Once the string is created, it is what it is.
Since strings in Python are immutable, you cannot "make it" anything different. You can, however, create a new string from s like this (the result is equal to s; no escaping is added):
raw_s = r'{}'.format(s)
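A small check can make the point about literals concrete. The sketch below uses made-up sample values and compares a raw literal with its escaped equivalent, then looks at what the format call returns:

```python
# Two ways of writing the same text in source code
raw_literal = r'C:\new\table'
escaped_literal = 'C:\\new\\table'

same_value = raw_literal == escaped_literal
same_type = type(raw_literal) is type(escaped_literal)

# A string that already contains a real newline character
s = 'line one\nline two'

# The r prefix applies only to the literal '{}', so the value of s passes through untouched
formatted = r'{}'.format(s)
unchanged = formatted == s

print(same_value, same_type, unchanged)
```

All three comparisons evaluate to True, which suggests that the r prefix affects only how a literal is read from source code, not the string object that results.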
As of Python 3.6, you can use the following (similar to @slashCoder):
def to_raw(string):
    return fr"{string}"
my_dir ="C:\data\projects"
to_raw(my_dir)
yields 'C:\\data\\projects'. I'm using it on a Windows 10 machine to pass directories to functions.
Raw strings apply only to string literals. They exist so that you can more conveniently express strings that would otherwise be modified by escape-sequence processing. This is especially useful when writing regular expressions or other forms of code in string literals. If you want a unicode string without escape processing in Python 2, just prefix it with ur, like ur'somestring'. (Python 3 has no ur prefix; a plain r'somestring' is already a Unicode string there.)
For Python 3, the way to do this that doesn't add double backslashes and simply preserves \n, \t, etc. is:
a = 'hello\nbobby\nsally\n'
a = a.encode('unicode-escape').decode().replace('\\\\', '\\')
print(a)
Which gives a value that can be written as CSV:
hello\nbobby\nsally\n
There doesn't seem to be a solution for other special characters, however, that may get a single \ before them. It's a bummer. Solving that would be complex.
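If the escaped text needs to be read back later, one option is to skip the replace step and keep the codec's output as is, since it can then be reversed. This is a sketch under that assumption; escape_line and unescape_line are hypothetical helper names, not part of the answer above:

```python
def escape_line(text):
    """Write control and non-ASCII characters as backslash escapes."""
    return text.encode('unicode_escape').decode('ascii')


def unescape_line(escaped):
    """Reverse escape_line for text that it produced."""
    return escaped.encode('ascii').decode('unicode_escape')


original = 'hello\nbobby\tsally \\ done'
escaped = escape_line(original)

# The escaped form fits on one physical line
print(escaped)

# Decoding with the same codec restores the original text
restored = unescape_line(escaped)
print(restored == original)
```

The trade-off is that literal backslashes stay doubled in the escaped form, which is what makes the round trip unambiguous.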
For example, to serialize a pandas.Series containing lists of strings with special characters to a text file in the format BERT expects, with one sentence per line and a blank line between documents:
with open('sentences.csv', 'w') as f:
    current_idx = 0
    for idx, doc in sentences.items():
        # Insert a newline to separate documents
        if idx != current_idx:
            f.write('\n')
            current_idx = idx
        # Write each sentence exactly as it appeared, one per line
        for sentence in doc:
            f.write(sentence.encode('unicode-escape').decode().replace('\\\\', '\\') + '\n')
This outputs (for the Github CodeSearchNet docstrings for all languages tokenized into sentences):
Makes sure the fast-path emits in order.
@param value the value to emit or queue up\n@param delayError if true, errors are delayed until the source has terminated\n@param disposable the resource to dispose if the drain terminates
Mirrors the one ObservableSource in an Iterable of several ObservableSources that first either emits an item or sends\na termination notification.
Scheduler:\n{@code amb} does not operate by default on a particular {@link Scheduler}.
@param the common element type\n@param sources\nan Iterable of ObservableSource sources competing to react first.
A subscription to each source will\noccur in the same order as in the Iterable.
@return an Observable that emits the same sequence as whichever of the source ObservableSources first\nemitted an item or sent a termination notification\n@see ReactiveX operators documentation: Amb
...
Just format like that:
s = "your string"; raw_s = r'{0}'.format(s)
With a small correction to @Jolly1234's answer, here is the code:
raw_string=path.encode('unicode_escape').decode()
s = "hel\nlo"
raws = '%r' % s  # repr-style conversion: escapes are shown and quotes are added
# print(raws) prints 'hel\nlo' with the single quotes
print(raws[1:-1])  # slicing off the first and last character prints hel\nlo without quotes
The solution that worked for me was:
fr"{original_string}"
Suggested in the comments by @ChemEnger.
I suppose the repr function can help you:
>>> s = 't\n'
>>> repr(s)
"'t\\n'"
>>> repr(s)[1:-1]
't\\n'
Just simply use the encode function.
my_var = 'hello'
my_var_bytes = my_var.encode()
print(my_var_bytes)
And then, to convert it back to a regular string, do this:
my_var_bytes = b'hello'
my_var = my_var_bytes.decode()
print(my_var)
--EDIT--
The code above does not make the string raw; it only encodes it to bytes and decodes it back.
I just want to print the original string.
[Case1] I know that putting "r" before the string works:
print r'123\r\n567"45'
>>
123\r\n567"45
[Case2] But when it is a variable:
aaa = '123\r\n567"45'
print aaa
>>
123
567"45
Is there any function that can print aaa with the same effect as in Case1?
The obvious way to make Case 2 work like Case 1 is to use a raw string in your assignment statement:
aaa = r'123\r\n567"45'
Now when you print aaa, you'll get the actual backslashes and r and n characters, rather than a carriage return and a newline.
If you're actually loading aaa from some other source (rather than using a string literal), your task is a little bit more complicated. You'll actually need to transform the string in some way to get the output you want.
One simple way of doing something close to what you want is to use the repr function:
aaa = some_function() # returns '123\r\n567"45' and some_function can't be changed
print repr(aaa)
This will not quite do what you want though, since it will add quotation marks around the string's text. If you care about that, you could remove them with a slice:
print repr(aaa)[1:-1]
Another approach to take is to manually transform the characters you want escaped, e.g. with str.replace or str.translate. This is easy to do if you only care about escaping a few special characters and not others.
print aaa.replace('\r', r'\r').replace('\n', r'\n')
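The str.translate variant mentioned above could look roughly like this in Python 3. It is a sketch only; the choice of which characters to escape and the name escape_table are assumptions:

```python
# Map each character to escape onto its two-character backslash spelling
escape_table = str.maketrans({
    '\r': r'\r',
    '\n': r'\n',
    '\t': r'\t',
})

aaa = '123\r\n567"45'

# One pass over the string; characters not in the table are left alone
shown = aaa.translate(escape_table)
print(shown)
```

Compared with chained replace calls, a single table keeps the list of escaped characters in one place and leaves quotes and other characters untouched.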
A final option is to use str.encode with the special codec called unicode-escape, which escapes every character that is not printable ASCII:
print aaa.encode('unicode-escape')
This only works as intended in Python 2 however. In Python 3, str.encode always returns a bytes instance which you'd need to decode again to get a str (aaa.encode('unicode-escape').decode('ascii') would work, but it's really ugly).
You can do it using the repr function in Python: print repr(aaa) if you are using Python 2, or print(repr(aaa)) if you are using Python 3.
If what you really want is just to print the original string, instead of prepending an r before the literal, you may want to use the python native function repr. E.g.
>>> aaa = '123\r\n567"45'
>>> print repr(aaa)
'123\r\n567"45'
which is equivalent in this (exact) case to
>>> print repr('123\r\n567"45')
'123\r\n567"45'
I am writing a simple Python script and I want to replace every / character with \ in a text variable. I have a problem with the \ character, because it is the escape character. When I use the replace() method:
unix_path='/path/to/some/directory'
unix_path.replace('/','\\')
then it returns the following string: \\path\\to\\some\\directory. Of course, I can't use unix_path.replace('/','\'), because \ is the escape character.
When I use regular expression:
import re
unix_path='/path/to/some/directory'
re.sub('/', r'\\', unix_path)
then it gives the same result: \\path\\to\\some\\directory. I would like to get this result: \path\to\some\directory.
Note: I am aware of os.path, but I did not find any suitable method in that module.
You missed something: the Python interpreter shows it as \\, but your result is correct. '\\' is just how Python represents the single character \ in a normal string literal. That is strictly equivalent to \ in a raw string, e.g. 'some\\path' is the same as r'some\path'.
Also, Python on Windows knows very well how to use / in paths.
You can use the following trick, though, if you want your display to be OS-dependent:
In [0]: os.path.abspath('c:/some/path')
Out[0]: 'c:\\some\\path'
You don't need a regex for this:
>>> unix_path='/path/to/some/directory'
>>> unix_path.replace('/', '\\')
'\\path\\to\\some\\directory'
>>> print(_)
\path\to\some\directory
And, more than likely, you should be using something in os.path instead of messing with this sort of thing manually.
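For the conversion itself, the standard library has Windows-flavoured path helpers that can be imported on any operating system. The sketch below uses ntpath (the Windows implementation behind os.path) and pathlib.PureWindowsPath; neither is named in the answers here, so treat this as one possible alternative:

```python
import ntpath
from pathlib import PureWindowsPath

unix_path = '/path/to/some/directory'

# ntpath.normpath converts forward slashes to backslashes while normalising
via_ntpath = ntpath.normpath(unix_path)

# PureWindowsPath only manipulates the text; it never touches the filesystem
via_pathlib = str(PureWindowsPath(unix_path))

print(via_ntpath)
print(via_pathlib)
```

Both calls produce a string containing single backslashes; they appear doubled only when the value is displayed with repr.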
This worked for me:
unix_path= '/path/to/some/directory'
print(unix_path.replace('/','\\'))
result:
\path\to\some\directory
I'm parsing an XML file from which I get basic expressions (like id*10+2). What I am trying to do is evaluate each expression to get its actual value. To do so, I use eval(), which works very well.
The only catch is that the numbers are in fact hexadecimal. eval() would work if every hex number were prefixed with '0x', but I could not find a way to add the prefix, nor could I find a similar question here. How could this be done cleanly?
Use the re module.
>>> import re
>>> re.sub(r'([\dA-F]+)', r'0x\1', 'id*A+2')
'id*0xA+0x2'
>>> eval(re.sub(r'([\dA-F]+)', r'0x\1', 'CAFE+BABE'))
99772
Be warned, though: this won't work if the input to eval is invalid, and using eval carries many risks.
If your hex numbers have lowercase letters, then you could use this:
>>> re.sub(r'(?<!i)([\da-fA-F]+)', r'0x\1', 'id*a+b')
'id*0xa+0xb'
This uses a negative lookbehind assertion to ensure that the letter i does not come directly before the section it is trying to convert (preventing 'id' from turning into 'i0xd'). Replace i with I if the variable is Id.
If you can split the expression into individual numbers, then I would suggest using the int function:
>>> int("CAFE", 16)
51966
Be careful with eval! Never use it on untrusted input.
If it's just simple arithmetic, I'd use a custom parser (there are tons of examples out in the wild)... And using parser generators (flex/bison, antlr, etc.) is a skill that is useful and easily forgotten, so it could be a good chance to refresh or learn it.
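As an illustration of the custom-parser idea, here is a minimal sketch that avoids eval by walking the tree produced by the standard ast module. The function name eval_hex_expr, the supported operator set, and the rule that names found in variables are never treated as hex are all assumptions made for this example:

```python
import ast
import operator
import re

# Only these operators are accepted; anything else is rejected
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

# Whole words made up solely of hex digits, e.g. 10, CAFE, a
_HEX_TOKEN = re.compile(r'\b[0-9A-Fa-f]+\b')


def eval_hex_expr(expr, variables=None):
    """Evaluate simple integer arithmetic whose bare numbers are hexadecimal."""
    variables = variables or {}

    def add_prefix(match):
        token = match.group(0)
        # Leave known variable names alone even if they look like hex
        return token if token in variables else '0x' + token

    tree = ast.parse(_HEX_TOKEN.sub(add_prefix, expr), mode='eval')

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError('unsupported expression element: %r' % node)

    return walk(tree)


print(eval_hex_expr('CAFE+BABE'))
print(eval_hex_expr('id*10+2', {'id': 3}))
```

Because every node type is checked explicitly, function calls and attribute access in the input raise ValueError instead of being executed.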
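One option is to use the parser module (available in Python 2 and early Python 3; it was deprecated in Python 3.9 and removed in 3.10):
import parser, token, re

def hexify(ast):
    # Leaves of the parse tree are plain values; return them unchanged
    if not isinstance(ast, list):
        return ast
    # Rewrite name or number tokens that consist only of hex digits
    if ast[0] in (token.NAME, token.NUMBER) and re.match('[0-9a-fA-F]+$', ast[1]):
        return [token.NUMBER, '0x' + ast[1]]
    return [hexify(child) for child in ast]

def hexified_eval(expr, *args):
    ast = parser.sequence2st(hexify(parser.expr(expr).tolist()))
    return eval(ast.compile(), *args)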
>>> hexified_eval('id*10 + BABE', {'id':0xcafe})
567466
This is somewhat cleaner than a regex solution in that it only attempts to replace tokens that have been positively identified as either names or numbers (and look like hex numbers). It also correctly handles more general python expressions such as id*10 + len('BABE') (it won't replace 'BABE' with '0xBABE').
OTOH, the regex solution is simpler and might cover all the cases you need to deal with anyway.
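On Python versions that no longer ship the parser module, a comparable token-level rewrite can be sketched with the standard tokenize module. This is an assumption-laden illustration rather than a port of the code above: hexify_source is a hypothetical name, and the sketch does not handle hex numbers that start with a digit and contain letters (such as 1F), because the tokenizer does not read those as a single token:

```python
import io
import re
import tokenize

_HEX_ONLY = re.compile(r'[0-9a-fA-F]+$')


def hexify_source(expr):
    """Return expr with hex-looking name and number tokens prefixed by 0x."""
    rewritten = []
    for tok in tokenize.generate_tokens(io.StringIO(expr).readline):
        if tok.type in (tokenize.NAME, tokenize.NUMBER) and _HEX_ONLY.match(tok.string):
            # String literals have type STRING, so 'BABE' in quotes is left alone
            rewritten.append((tokenize.NUMBER, '0x' + tok.string))
        else:
            rewritten.append((tok.type, tok.string))
    return tokenize.untokenize(rewritten)


source = hexify_source("id*10 + len('BABE')")
# eval is used only to mirror the approach above; it is unsafe on untrusted input
result = eval(source, {'id': 0xcafe})
print(result)
```

As with the parser version, only tokens positively identified as names or numbers are rewritten, so text inside string literals is not touched.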