eval() seems to be dangerous to use when processing unknown strings, which is what a part of my project is doing.
For my project I have a string, called:
stringAsByte = "b'a'"
I've tried to do the following to convert that string directly (without using eval):
byteRepresentation = str.encode(stringAsByte)
print(byteRepresentation) # prints b"b'a'"
Clearly, that didn't work. The following does work, but it relies on eval:
byteRepresentation = eval(stringAsByte) # Uses eval!
print(byteRepresentation) # prints b'a'
Is there another way where I can get the output b'a'?
Yes, with ast.literal_eval, which is safe because it only evaluates literals.
>>> import ast
>>> stringAsByte = "b'a'"
>>> ast.literal_eval(stringAsByte)
b'a'
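If the input is untrusted, it may also be worth checking that the literal really is a bytes object. A minimal sketch, assuming a helper named parse_bytes_literal (a hypothetical name, not part of the ast module):

```python
import ast


def parse_bytes_literal(text):
    """Parse a string such as "b'a'" into a bytes object without eval()."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        # literal_eval rejects anything that is not a plain literal,
        # for example function calls or attribute access.
        raise ValueError("not a Python literal: %r" % (text,)) from exc
    if not isinstance(value, bytes):
        raise ValueError("literal is not bytes: %r" % (text,))
    return value


print(parse_bytes_literal("b'a'"))  # prints b'a'
```

A string such as "__import__('os').getcwd()" is rejected with a ValueError instead of being executed.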
I have a string that represents a number which uses commas to separate thousands. How can I convert this to a number in Python?
>>> int("1,000,000")
Generates a ValueError.
I could replace the commas with empty strings before I try to convert it, but that feels wrong somehow. Is there a better way?
For float values, see How can I convert a string with dot and comma into a float in Python, although the techniques are essentially the same.
import locale
locale.setlocale( locale.LC_ALL, 'en_US.UTF-8' )
locale.atoi('1,000,000')
# 1000000
locale.atof('1,000,000.53')
# 1000000.53
There are several ways to parse numbers with thousands separators, and I doubt that the way described by @unutbu is the best in all cases. That's why I list other ways too.
The proper place to call setlocale() is in the __main__ module. It's a global setting that affects the whole program and even C extensions (although note that the LC_NUMERIC setting is not set at system level, but is emulated by Python). Read the caveats in the documentation and think twice before going this way. It's probably OK in a single application, but never use it in libraries for a wide audience. You should probably avoid requesting a locale with a particular charset encoding, since it might not be available on some systems.
Use one of the third-party libraries for internationalization. For example, PyICU allows using any available locale without affecting the whole process (and even parsing numbers with particular thousands separators without using locales):
NumberFormat.createInstance(Locale('en_US')).parse("1,000,000").getLong()
Write your own parsing function, if you don't want to install third-party libraries to do it the "right way". It can be as simple as int(data.replace(',', '')) when strict validation is not needed.
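When strict validation is needed, the hand-written parser can check the grouping before stripping the commas. A minimal sketch, assuming US-style input (comma for thousands, dot for decimals); the name parse_grouped_number and the exact rules are illustrative choices, not a standard API:

```python
import re

# Optional sign, then either properly grouped digits (1-3 digits followed by
# one or more ",ddd" groups) or plain digits, then an optional fraction.
_GROUPED = re.compile(r"[+-]?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?")


def parse_grouped_number(text):
    """Return an int or float, rejecting misplaced thousands separators."""
    text = text.strip()
    if not _GROUPED.fullmatch(text):
        raise ValueError("badly formatted number: %r" % (text,))
    cleaned = text.replace(",", "")
    return float(cleaned) if "." in cleaned else int(cleaned)


print(parse_grouped_number("1,000,000"))     # prints 1000000
print(parse_grouped_number("1,000,000.53"))  # prints 1000000.53
```

Unlike a bare replace(), this raises a ValueError for input such as "1,00,0" or "-1,234,567,89.0123".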
Replace the commas with empty strings, and turn the resulting string into an int or a float.
>>> a = '1,000,000'
>>> int(a.replace(',' , ''))
1000000
>>> float(a.replace(',' , ''))
1000000.0
I got a locale error from the accepted answer, but the following change works here in Finland (Windows XP):
import locale
locale.setlocale( locale.LC_ALL, 'english_USA' )
print(locale.atoi('1,000,000'))
# 1000000
print(locale.atof('1,000,000.53'))
# 1000000.53
This works:
(A dirty but quick way)
>>> a='-1,234,567,89.0123'
>>> "".join(a.split(","))
'-123456789.0123'
I tried this. It goes a bit beyond the question:
You get an input. It will be converted to string first (if it is a list, for example from Beautiful soup);
then to int,
then to float.
It goes as far as it can get. In the worst case, it returns everything unconverted as a string.
def to_normal(soupCell):
    ''' converts a html cell from beautiful soup to text, then to int, then to float: as far as it gets.
    US thousands separators are taken into account.
    needs import locale'''
    locale.setlocale( locale.LC_ALL, 'english_USA' )
    output = unicode(soupCell.findAll(text=True)[0].string)
    try:
        return locale.atoi(output)
    except ValueError:
        try:
            return locale.atof(output)
        except ValueError:
            return output
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "")
'en_US.UTF-8'
>>> print locale.atoi('1,000,000')
1000000
>>> print locale.atof('1,000,000.53')
1000000.53
This was done on Linux in the US.
A little late, but the babel library has parse_decimal and parse_number which do exactly what you want:
from babel.numbers import parse_decimal, parse_number
parse_decimal('10,3453', locale='es_ES')
>>> Decimal('10.3453')
parse_number('20.457', locale='es_ES')
>>> 20457
parse_decimal('10,3453', locale='es_MX')
>>> Decimal('103453')
You can also pass a Locale class instead of a string:
from babel import Locale
parse_decimal('10,3453', locale=Locale('es_MX'))
>>> Decimal('103453')
If you're using pandas and you're trying to parse a CSV that includes numbers with a comma for thousands separators, you can just pass the keyword argument thousands=',' like so:
df = pd.read_csv('your_file.csv', thousands=',')
Try this:
def changenum(data):
    foo = ""
    for i in list(data):
        if i == ",":
            continue
        else:
            foo += i
    return float(int(foo))
I have a variable with a value like a = "\x01" from my database. How can I convert it into an integer? I have searched the internet but had no success in finding anything.
Anyone have an idea?
In PHP, there is a built-in module to convert it. Is there any similar module for that function in Python?
The simple answer is to use ord().
>>> a = '\x01'
>>> ord(a)
1
But if performance is what you are looking for, then refer to @chepner's answer.
You can use the struct module for fixed-length values.
>>> a = '\x01'
>>> import struct
>>> struct.unpack("B", a)
(1,)
unpack always returns a tuple, since you can extract multiple values from a single string.
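In Python 3, struct.unpack expects a bytes-like object rather than a str. A short sketch of extracting several values at once, assuming the database value arrives as bytes; the sample bytes and the format string are made up for illustration:

```python
import struct

raw = b"\x01\x02\x00"  # hypothetical 3-byte value

# "<" = little-endian with no padding, "B" = one unsigned byte,
# "H" = one unsigned 16-bit integer.
first, second = struct.unpack("<BH", raw)
print(first, second)  # prints 1 2

# For a single value of arbitrary length, int.from_bytes is an alternative.
print(int.from_bytes(b"\x01", "big"))  # prints 1

# Indexing a bytes object already yields an int in Python 3.
print(b"\x01"[0])  # prints 1
```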
See this code:
my_src_str = '"""hello"""'
my_real_str = get_real_string_from_python_src_string(my_src_str)
In this case, my_src_str is a string representation in python source code format. I want to interpret it as a real python string. Here I want to get hello to my_real_str. How can I do this?
>>> import ast
>>> my_src_str = '"""hello"""'
>>> ast.literal_eval(my_src_str)
'hello'
I have unicode u"{'code1':1,'code2':1}" and I want it in dictionary format.
I want it in {'code1':1,'code2':1} format.
I tried unicodedata.normalize('NFKD', my_data).encode('ascii','ignore') but it returns string not dictionary.
Can anyone help me?
You can use built-in ast package:
import ast
d = ast.literal_eval("{'code1':1,'code2':1}")
Help on function literal_eval in module ast:
literal_eval(node_or_string)
Safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
You can use literal_eval. You may also want to be sure you are creating a dict and not something else. Instead of assert, use your own error handling.
from ast import literal_eval
from collections.abc import MutableMapping  # plain "collections" no longer exports this in Python 3.10+
my_dict = literal_eval(my_str_dict)
assert isinstance(my_dict, MutableMapping)
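One way the assert could be replaced with explicit error handling is sketched below. The function name parse_dict_literal and the choice of exception types are assumptions made for illustration:

```python
from ast import literal_eval
from collections.abc import MutableMapping


def parse_dict_literal(text):
    """Parse a dict literal such as "{'code1':1,'code2':1}" without eval()."""
    try:
        value = literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError("not a Python literal: %r" % (text,)) from exc
    if not isinstance(value, MutableMapping):
        # A valid literal of the wrong kind, for example a list or a number.
        raise TypeError("expected a dict, got %s" % type(value).__name__)
    return value


print(parse_dict_literal(u"{'code1':1,'code2':1}"))  # prints {'code1': 1, 'code2': 1}
```

Unlike assert, these checks are not skipped when Python runs with the -O flag.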
EDIT: Turns out my assumption was incorrect; because the keys are not wrapped in double-quote marks ("), the string isn't JSON. See here for some ways around this.
I'm guessing that what you have might be JSON, a.k.a. JavaScript Object Notation.
You can use Python's built-in json module to do this:
import json
result = json.loads(u"{'code1':1,'code2':1}") # will NOT work; see above
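If the input may be either real JSON or a Python dict literal, one possible workaround is to try json first and fall back to ast.literal_eval. This is a sketch under that assumption; load_mapping is a hypothetical helper name:

```python
import ast
import json


def load_mapping(text):
    """Try strict JSON first, then fall back to a Python literal."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Single-quoted keys are not valid JSON but are valid Python literals.
        return ast.literal_eval(text)


print(load_mapping(u"{'code1':1,'code2':1}"))    # prints {'code1': 1, 'code2': 1}
print(load_mapping('{"code1": 1, "code2": 1}'))  # prints {'code1': 1, 'code2': 1}
```

Input that is neither valid JSON nor a valid literal still raises ValueError or SyntaxError from literal_eval.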
I was getting a unicode error when reading JSON from a file, so this is what worked for me.
import ast
import json  # needed for json.loads below

job1 = {}
with open('hostdata2.json') as f:
    job1 = json.loads(f.read())  # the with block closes the file, no f.close() needed

# the type before converting from unicode to dict would be <type 'unicode'>
print(type(job1))
job1 = ast.literal_eval(job1)
print("printing type after ast")
print(type(job1))
# this should result in <type 'dict'>
for each in job1:
    print(each)
print("printing keys")
print(job1.keys())
print("printing values")
print(job1.values())
You can use the built-in eval function to convert the string to a Python object:
>>> string_dict = u"{'code1':1, 'code2':1}"
>>> eval(string_dict)
{'code1': 1, 'code2': 1}