eval() seems to be dangerous to use when processing unknown strings, which is what a part of my project is doing.
For my project I have a string, called:
stringAsByte = "b'a'"
I've tried to do the following to convert that string directly (without using eval):
byteRepresentation = str.encode(stringAsByte)
print(byteRepresentation) # prints b"b'a'"
Clearly, that didn't work, so instead of doing:
byteRepresentation = eval(stringAsByte) # Uses eval!
print(byteRepresentation) # prints b'a'
Is there another way where I can get the output b'a'?
Yes, with ast.literal_eval, which is safe because it only evaluates literals.
>>> import ast
>>> stringAsByte = "b'a'"
>>> ast.literal_eval(stringAsByte)
b'a'
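For untrusted input it may help to wrap the call so that anything other than a bytes literal is rejected. A minimal sketch, assuming a hypothetical helper named parse_bytes_literal (not part of the ast module); it relies on ast.literal_eval raising ValueError or SyntaxError when the text is not a plain literal:

```python
import ast

def parse_bytes_literal(text):
    """Return the bytes object described by text, or raise ValueError."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        # Function calls, names, and malformed input all end up here.
        raise ValueError(f"not a Python literal: {text!r}") from exc
    if not isinstance(value, bytes):
        raise ValueError(f"expected a bytes literal, got {type(value).__name__}")
    return value

print(parse_bytes_literal("b'a'"))  # b'a'
```

An input such as "__import__('os')" is rejected with ValueError instead of being executed.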
I have a string s, its contents are variable. How can I make it a raw string? I'm looking for something similar to the r'' method.
I believe what you're looking for is str.encode with the unicode_escape codec. For example, if you have a variable that you want to turn into its 'raw' form:
a = '\x89'
a.encode('unicode_escape')
b'\\x89'
Note: On Python 2.x, use the string-escape codec instead.
I was searching for a similar solution and found the solution via:
casting raw strings python
Raw strings are not a different kind of string. They are a different way of describing a string in your source code. Once the string is created, it is what it is.
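A short sketch of what that means in practice (the variable names are illustrative). The r prefix only changes how a literal in the source code is parsed, so it has no effect on a string that already exists:

```python
# Both literals describe the same two-character string: a backslash and an "n".
raw = r'\n'
escaped = '\\n'
print(raw == escaped)        # True
print(len(raw))              # 2

# Formatting an existing string into a raw template changes nothing,
# because escape processing happened when the original literal was parsed.
s = 'a\nb'                   # contains a real newline character
print(r'{}'.format(s) == s)  # True
```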
Since strings in Python are immutable, you cannot "make it" anything different. You can, however, create a new raw string from s, like this:
raw_s = r'{}'.format(s)
As of Python 3.6, you can use the following (similar to @slashCoder):
def to_raw(string):
    return fr"{string}"
my_dir = "C:\data\projects"
to_raw(my_dir)
yields 'C:\\data\\projects'. I'm using it on a Windows 10 machine to pass directories to functions.
Raw strings apply only to string literals. They exist so that you can more conveniently express strings that would otherwise be modified by escape sequence processing. This is especially useful when writing regular expressions or other forms of code in string literals. If you want a unicode string without escape processing in Python 2, just prefix it with ur, like ur'somestring' (the ur prefix is not valid in Python 3, where r'somestring' is already a unicode string).
For Python 3, the way to do this that doesn't add double backslashes and simply preserves \n, \t, etc. is:
a = 'hello\nbobby\nsally\n'
escaped = a.encode('unicode-escape').decode().replace('\\\\', '\\')
print(escaped)
Which gives a value that can be written as CSV:
hello\nbobby\nsally\n
There doesn't seem to be a solution for other special characters, however, that may get a single \ before them. It's a bummer. Solving that would be complex.
For example, to serialize a pandas.Series containing lists of strings with special characters to a text file in the format BERT expects, with a line break between each sentence and a blank line between each document:
with open('sentences.csv', 'w') as f:
    current_idx = 0
    for idx, doc in sentences.items():
        # Insert a newline to separate documents
        if idx != current_idx:
            f.write('\n')
            current_idx = idx
        # Write each sentence exactly as it appeared, one per line
        for sentence in doc:
            f.write(sentence.encode('unicode-escape').decode().replace('\\\\', '\\') + '\n')
This outputs (for the Github CodeSearchNet docstrings for all languages tokenized into sentences):
Makes sure the fast-path emits in order.
@param value the value to emit or queue up\n@param delayError if true, errors are delayed until the source has terminated\n@param disposable the resource to dispose if the drain terminates
Mirrors the one ObservableSource in an Iterable of several ObservableSources that first either emits an item or sends\na termination notification.
Scheduler:\n{@code amb} does not operate by default on a particular {@link Scheduler}.
@param the common element type\n@param sources\nan Iterable of ObservableSource sources competing to react first.
A subscription to each source will\noccur in the same order as in the Iterable.
@return an Observable that emits the same sequence as whichever of the source ObservableSources first\nemitted an item or sent a termination notification\n@see ReactiveX operators documentation: Amb
...
Just format like that:
s = "your string"; raw_s = r'{0}'.format(s)
With a small correction to @Jolly1234's answer, here is the code:
raw_string = path.encode('unicode_escape').decode()
s = "hel\nlo"
raws = '%r' % s  # conversion to the repr form of the string
# print(raws) will print 'hel\nlo' with single quotes.
print(raws[1:-1])  # will print hel\nlo without single quotes.
# raws[1:-1] slices off the surrounding quotes.
The solution that worked for me was:
fr"{original_string}"
Suggested in the comments by @ChemEnger.
I suppose the repr function can help you:
s = 't\n'
repr(s)
"'t\\n'"
repr(s)[1:-1]
't\\n'
Just simply use the encode function.
my_var = 'hello'
my_var_bytes = my_var.encode()
print(my_var_bytes)
And then to convert it back to a regular string, do this:
my_var_bytes = b'hello'
my_var = my_var_bytes.decode()
print(my_var)
--EDIT--
The following does not make the string raw but instead encodes it to bytes and decodes it.
I'm writing some stuff with Django and it's becoming a bit of a mess. I'm trying to create an app that can run and return data from up-loadable python modules with a text input field for calling functions.
Most of it's working and I can call functions with no arguments.
The problem is with entering functions which require arguments, as any input that's passed to python is in string form, meaning if I enter: param('a',[foo]) to call the function:
def param(a,b):
return "Hello world" + a + b
The function will return:
"Hello world'a'[foo]"
Basically I need a way to take a string and convert it to literal python code... if that's possible so that I can treat this input box as if it were the python console.
Any ideas? Any help would be greatly welcome! I do realize this isn't a very pythonic way of doing things.
EDIT due to concerns about security: I am not too worried about security issues as this will only ever be a local project.
You can use ast.literal_eval. It is more limited than eval but does not pose the same risks:
Safely evaluate an expression node or a Unicode or Latin-1 encoded string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
from ast import literal_eval
def param(a, b):
    return "Hello world" + literal_eval(a) + literal_eval(b)
In [9]: param('" foo"', "'bar'")
Out[9]: 'Hello world foobar'
You can use a try/except to handle the case where a or b is just meant to be a plain string.
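The try/except idea could look roughly like this. It is a sketch only: the helper name to_value and the str() calls around the results are assumptions added for illustration, so that non-string literals such as lists can still be concatenated:

```python
from ast import literal_eval

def to_value(text):
    """Evaluate text as a Python literal, or return it unchanged as a string."""
    try:
        return literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a literal (for example a bare word like foo), so keep the raw text.
        return text

def param(a, b):
    # str() is an assumption here; it lets lists and numbers be concatenated too.
    return "Hello world" + str(to_value(a)) + str(to_value(b))

print(param("' foo'", "bar"))  # Hello world foobar
```

Quoted input is unwrapped by literal_eval, while unquoted words fall through the except branch and are used as they are.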
You can use eval for that.
def param(a, b):
    return "Hello world" + eval(a) + eval(b)
It will surely produce an error if you try to concatenate a string and a list:
>>> param('", an unsafe world!"', '"!!!"')
'Hello world, an unsafe world!!!!'
>>> param('", an unsafe world!"', '[]')
TypeError: cannot concatenate 'str' and 'list' objects
As for getting rid of the quotation marks, I guess you can either add the quotation marks yourself (like eval('"{0}"'.format(a))) or use ast.literal_eval() as proposed by Padraic.
Is there a python equivalent to echo -e?
In other words, is there a built-in function to convert r"\x50\x79\x74\x68\x6f\x6e" to "Python" in Python?
Edit
I added the 'r' prefix, to make sure everyone understands that I do not want the python interpreter to convert this. Rather, I want to convert that 24-character string to a 6-character one.
The correct way to do this, which I just found, is:
>>> a = r"\x50\x79\x74\x68\x6f\x6e"
>>> print a
\x50\x79\x74\x68\x6f\x6e
>>> a.decode('string_escape')
'Python'
Make sure you are escaping the backslashes (or using the raw 'r' prefix) when testing this!
References:
http://docs.python.org/library/stdtypes.html#str.decode
http://docs.python.org/library/codecs.html#standard-encodings
No conversion is necessary. They are already the same string:
>>> "\x50\x79\x74\x68\x6f\x6e" == "Python"
True
If you actually have a different string, "\\x50\\x79\\x74\\x68\\x6f\\x6e", which really does contain backslashes ("\x50\x79\x74\x68\x6f\x6e" does not contain any), then you would do:
>>> s
'\\x50\\x79\\x74\\x68\\x6f\\x6e'
>>> s.decode('string-escape')
'Python'
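The snippets in this thread rely on Python 2, where str has a decode method and the string_escape codec exists. A minimal sketch of one possible Python 3 equivalent, assuming the input contains only Latin-1 characters (the variable names are illustrative):

```python
s = r"\x50\x79\x74\x68\x6f\x6e"   # 24 characters, including the backslashes

# Encode to bytes first, then let the unicode_escape codec interpret the escapes.
decoded = s.encode("latin-1").decode("unicode_escape")

print(len(s), decoded)            # 24 Python
```

Text containing characters outside Latin-1 would raise UnicodeEncodeError at the encode step, so this sketch is not a general-purpose solution.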
I have a config file like this.
[rects]
rect1=(2,2,10,10)
rect2=(12,8,2,10)
I need to loop through the values and convert them to tuples.
I then need to make a tuple of the tuples like
((2,2,10,10), (12,8,2,10))
Instead of using a regex or int/string functions, you could also use the ast module's literal_eval function, which only evaluates strings that are valid Python literals. This function is safe (according to the docs).
http://docs.python.org/library/ast.html#ast.literal_eval
import ast
ast.literal_eval("(1,2,3,4)") # (1,2,3,4)
And, like others have said, ConfigParser works for parsing the INI file.
To turn the strings into tuples of ints (which is, I assume, what you want), you can use a regex like this:
import re
x = "(1,2,3)"
t = tuple(int(v) for v in re.findall("[0-9]+", x))
And you can use, say, configparser to parse the config file.
Assuming cp is the ConfigParser object for a config file with these contents:
[rects]
rect1=(2,2,10,10)
rect2=(12,8,2,10)
>>> import ast
>>> tuple(ast.literal_eval(v[1]) for v in cp.items('rects'))
((2, 2, 10, 10), (12, 8, 2, 10))
Edit: Changed eval() to the safer literal_eval().
From the Python docs, literal_eval() does the following:
Safely evaluate an expression node or a string containing a Python
expression. The string or node provided may only consist of the following
Python literal structures: strings, numbers, tuples, lists, dicts, booleans,
and None
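Putting the pieces together, a self-contained sketch for Python 3 might look like this. The config text is inlined through read_string so the example runs without a file on disk; CONFIG_TEXT and rects are illustrative names:

```python
import ast
import configparser

CONFIG_TEXT = """
[rects]
rect1=(2,2,10,10)
rect2=(12,8,2,10)
"""

cp = configparser.ConfigParser()
cp.read_string(CONFIG_TEXT)   # cp.read("some_file.cfg") would load from disk instead

# Each value is a string such as "(2,2,10,10)"; literal_eval turns it into a tuple.
rects = tuple(ast.literal_eval(value) for _, value in cp.items("rects"))
print(rects)                  # ((2, 2, 10, 10), (12, 8, 2, 10))
```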
You can simply make a tuple of tuples like
new_tuple = (rect1,rect2) # ((2,2,10,10), (12,8,2,10))
If you want to loop through the values:
for i in rect1 + rect2:
    print(i)
If you want to regroup the numbers, you could do:
tuple_regrouped = tuple(zip(rect1, rect2))  # ((2, 12), (2, 8), (10, 2), (10, 10))
EDIT:
Didn't notice the string part. If you have lines in strings, like from reading a config file, you can do something like
# line = "rect1 = (1,2,3,4)"
config_dict = {}
var_name, tuple_as_str = line.replace(" ","").split("=")
config_dict[var_name] = tuple([int(i) for i in tuple_as_str[1:-1].split(',')])
# and now you'd have config_dict['rect1'] = (1,2,3,4)
The easiest way to do this would be to use Michael Foord's ConfigObj library. It has an unrepr mode, which will convert the string directly into a tuple for you.
In python, are strings mutable? The line someString[3] = "a" throws the error
TypeError: 'str' object does not support item assignment
I can see why (as I could have written someString[3] = "test" and that would obviously be illegal) but is there a method to do this in python?
Python strings are immutable, which means that they do not support item or slice assignment. You'll have to build a new string using e.g. someString[:3] + 'a' + someString[4:] or some other suitable approach.
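The slicing approach can be wrapped in a small helper. This is a sketch; the name replace_at is hypothetical:

```python
def replace_at(text, index, replacement):
    """Return a copy of text with the character at index replaced."""
    if not 0 <= index < len(text):
        raise IndexError("string index out of range")
    # Build a new string from the parts before and after the target position.
    return text[:index] + replacement + text[index + 1:]

print(replace_at("1234567890", 3, "a"))  # 123a567890
```

The original string is left untouched, and the caller rebinds the name if needed, for example someString = replace_at(someString, 3, "a").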
Instead of storing your value as a string, you could use a list of characters:
>>> l = list('foobar')
>>> l[3] = 'f'
>>> l[5] = 'n'
Then if you want to convert it back to a string to display it, use this:
>>> ''.join(l)
'foofan'
If you are changing a lot of characters one at a time, this method will be considerably faster than building a new string each time you change a character.
In new enough pythons you can also use the builtin bytearray type, which is mutable. See the stdlib documentation. But "new enough" here means 2.6 or up, so that's not necessarily an option.
In older pythons you have to create a fresh str as mentioned above, since those are immutable. That's usually the most readable approach, but sometimes using a different kind of mutable sequence (like a list of characters, or possibly an array.array) makes sense. array.array is a bit clunky though, and usually avoided.
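For reference, a minimal sketch of the bytearray approach as it looks in Python 3, where the items of a bytearray are integers rather than one-character strings. It assumes ASCII-only data:

```python
buf = bytearray(b"1234567890")
buf[3] = ord("a")            # items are integers in range(256)
text = buf.decode("ascii")   # convert back to str for display
print(text)                  # 123a567890
```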
>>> import ctypes
>>> s = "1234567890"
>>> mutable = ctypes.create_string_buffer(s)
>>> mutable[3] = "a"
>>> print mutable.value
123a567890
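Use this (note that str.replace substitutes every occurrence of that character, not only the one at index 3, and returns a new string):
someString = someString.replace(someString[3], "a")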
Just define a new string equal to what you want to do with your current string.
def remove_char(s, n):
    # Note: replace() removes every occurrence of s[n], not only the one at index n.
    return s.replace(s[n], "")