How to convert unusual unicode string with number to integer in python


I have some fairly hairy Unicode strings with numbers in them whose values I'd like to test. Normally, I'd just use str.isnumeric to test whether a string could be converted via int(), but I'm encountering cases where isnumeric returns True but int() raises an exception.
Here's an example program:
>>> s = '⒍'
>>> s.isnumeric()
True
>>> int(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '⒍'
Unicode is always full of surprises, so I'm happy to just be robust to this case and use a try/except block to catch unusual numbers. However, I'd be happier if I could still convert them to integers. Is there a consistent way to do this?

If you want to test whether a string can be passed to int, use str.isdecimal. Both str.isnumeric and str.isdigit also accept characters that aren't compatible with int, such as superscript digits or '⒍'.
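To see the difference concretely, here is a small sketch that runs the three predicates and int() over a few single characters. Apart from '⒍', the sample characters are my own picks, and int_accepts is a hypothetical helper name, not part of any library.

```python
def int_accepts(s):
    """Return True if int() can parse s (hypothetical helper)."""
    try:
        int(s)
        return True
    except ValueError:
        return False

# '6' ASCII digit, '\u0666' ARABIC-INDIC DIGIT SIX, '\u00b2' SUPERSCRIPT TWO,
# '\u248d' DIGIT SIX FULL STOP ('⒍'), '\u00bd' VULGAR FRACTION ONE HALF
samples = ['6', '\u0666', '\u00b2', '\u248d', '\u00bd']

for ch in samples:
    print(repr(ch), 'isdecimal', ch.isdecimal(), 'isdigit', ch.isdigit(),
          'isnumeric', ch.isnumeric(), 'int ok', int_accepts(ch))
```

For these characters, "int ok" lines up with isdecimal, while isdigit and isnumeric are progressively more permissive.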
And as @abarnert has mentioned in the comments, the most reliable way to test whether a string can be passed to int is simply to do it in a try block.
On the other hand, '⒍' can be converted to an actual digit with the help of the unicodedata module, e.g.
import unicodedata
print(unicodedata.digit('⒍'))
would output 6.

I don't know how much luck you'll have, but unicodedata may handle some cases (Python 3 code):
>>> import unicodedata
>>> unicodedata.normalize('NFKC', '⒍')
'6.'
Slightly better. As to testing, if you want an int you could just int() it and catch the exception.
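Combining the two suggestions in this answer (normalize, then try int() and catch the exception) could look like the sketch below. int_via_nfkc is a hypothetical name, and returning None on failure is my own choice. NFKC helps for characters like superscripts or circled digits, but not for '⒍', whose normalized form '6.' still isn't an int literal.

```python
import unicodedata

def int_via_nfkc(s):
    """Return int(NFKC(s)), or None if the normalized text still isn't an int literal."""
    normalized = unicodedata.normalize('NFKC', s)
    try:
        return int(normalized)
    except ValueError:
        return None

print(int_via_nfkc('\u00b2'))  # SUPERSCRIPT TWO normalizes to '2'
print(int_via_nfkc('\u2460'))  # CIRCLED DIGIT ONE normalizes to '1'
print(int_via_nfkc('\u248d'))  # DIGIT SIX FULL STOP normalizes to '6.', so None
```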

The best way to find out if a string can be converted to int is to just try it:
s = '⒍'
try:
    num = int(s)
except ValueError:
    pass  # handle it here
Sure, you can try to figure out the right way to test the string in advance, but why? If the rule you want is "whatever int accepts", just use int.
If you want to convert something that is a digit, but isn't a decimal, use the unicodedata module:
import unicodedata
s = '⒍'
num = unicodedata.digit(s)    # 6
num = unicodedata.numeric(s)  # 6.0
num = unicodedata.decimal(s)  # ValueError: not a decimal
The DIGIT SIX FULL STOP character's entry in the Unicode database has Digit and Numeric values, but its general category is Number, Other (No) rather than Number, Decimal Digit (Nd), so it is not compatible with int.
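If you want int() when it works and the Unicode digit values otherwise, the two ideas in this answer can be combined. This is only a sketch: to_int is a hypothetical name, and the fallback assumes (my assumption, not something the answers state) that each character is one base-10 digit, so a string like '²³' would be read as 23.

```python
import unicodedata

def to_int(s):
    """Convert s with int(), falling back to per-character Unicode digit values.

    Assumption: in the fallback, every character is one base-10 digit.
    """
    try:
        return int(s)
    except ValueError:
        if not s:
            raise  # empty string: nothing to fall back on
    value = 0
    for ch in s:
        # unicodedata.digit raises ValueError if ch has no digit value
        value = value * 10 + unicodedata.digit(ch)
    return value

print(to_int('\u248d'))  # '⒍' -> 6 via the fallback
print(to_int(' -7 '))    # int() handles this directly -> -7
```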

Related

Can't Convert Object to Float. What's the Best Workaround?

I'm struggling to convert an object to a float.
df_final['INBCS'] = df_final['INBCS'].astype(float)
It keeps saying: ValueError: could not convert string to float: '1,620,000'
If I try a different approach, I get mostly NaN results.
print(pd.to_numeric(df_final['INBCS'], errors='coerce'))
I tried one more approach, and I still get errors.
df_final = df_final[df_final['INBCS'].apply(lambda x: x.isnumeric())]
There are no NaNs in the data; I already converted them to zeros. When I print the data, it shows commas, but there are no commas at all. I even ran a replace function to get rid of any potential commas, but again, there are no commas in the data. Any idea what's wrong here? Thanks.

The reason you can't convert that string to a float is that Python doesn't know what to do with the commas. You can reproduce this easily:
>>> float('1,000')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: '1,000'
It's tempting to just remove the commas and parse the number, but there's an internationalization concern. In some countries, a comma separates thousands (e.g., "1,000,000" is one million). In other countries, a comma is the decimal separator (e.g., "1,05" is one and five hundredths).
For that reason, it's best to use localization to parse a number like that if you can't get it in a native form. See this answer for details on that.
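If you know which separators a given data source uses, a minimal sketch is to pass them explicitly instead of relying on the process locale (locale.atof() is the locale-based route, but it needs the right locale installed and selected). parse_number is a hypothetical helper, and it assumes the input contains only digits, thousands separators, and at most one decimal separator.

```python
def parse_number(text, thousands_sep=',', decimal_sep='.'):
    """Parse text as a float using explicitly given separators (hypothetical helper)."""
    # Drop the thousands separators first, then map the decimal separator to '.'
    cleaned = text.replace(thousands_sep, '').replace(decimal_sep, '.')
    return float(cleaned)

print(parse_number('1,620,000'))                                        # comma groups thousands
print(parse_number('1.620.000,5', thousands_sep='.', decimal_sep=','))  # comma is the decimal point
```

The order of the two replace() calls matters: removing the thousands separator first keeps it from being confused with the decimal separator.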

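The reason is that you have commas (,) in there. Series.replace only replaces whole values, so use the .str accessor to remove the commas inside each string (astype(str) first, in case the column mixes numbers and strings):
df_final['INBCS'] = df_final['INBCS'].astype(str).str.replace(',', '', regex=False)
df_final['INBCS'] = df_final['INBCS'].astype(float)
This should work.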

Try this:
string = '1,620,000'
decimal = float(''.join(string.split(',')))
print(type(decimal), decimal)
# Prints (Python 3): <class 'float'> 1620000.0
This first splits the string on the commas with split(','), then joins the pieces back together with ''.join(), which drops the commas. Finally, it converts the result to a float with float().

Creating one for loop for 3 strings

So I have a string called 'Number' with the value 'abf573'. The task is to find out whether the string 'Number' contains only letters and digits from the hexadecimal system.
My plan was to make a for loop that goes through each position of the string 'Number' and checks with an if statement whether it is something from the hexadecimal system. To check that, I thought about writing down A-F, a-f and 0-9 into lists or separate strings.
My problem now is that I have never done something like this in Python. I know how to write for loops and if/else/elif statements, but I don't know how to apply them to this problem.
It would be nice if someone could give me a hint on how to do it, or tell me whether my way of thinking is right or not.

I find it quite smart and fast to try converting the string to an integer with int(..., 16) and to handle the ValueError exception that is raised if that is not possible.
Here is the beautiful short code:
my_string = 'abf573'
try:
    result = int(my_string, 16)
    print("OK")
except ValueError:
    print("NOK")
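One caveat worth knowing (this goes beyond the original answer): int() with base 16 is more permissive than "only hexadecimal characters". It also accepts surrounding whitespace, a sign, an optional 0x prefix, and, on Python 3.6+, underscores between digits. A quick check, using the hypothetical helper name is_hex_via_int:

```python
def is_hex_via_int(s):
    """True if int(s, 16) succeeds (hypothetical helper)."""
    try:
        int(s, 16)
        return True
    except ValueError:
        return False

# All but the last are accepted by int(..., 16), even though only the
# first one consists purely of hex characters.
for candidate in ['abf573', '0xabf573', ' ff ', '-ff', 'ab_f5', 'abg']:
    print(repr(candidate), is_hex_via_int(candidate))
```

If the task really is "only 0-9, a-f, A-F", a character-by-character check is stricter.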

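Strings are iterables, so you can check each character in a loop:
Number = '12ab'
for character in Number:
    if character not in '0123456789abcdefABCDEF':
        print('not HEX')
        break
else:
    print('it is HEX')
Also, strings have an isdigit method: if Number.isdigit() is True, the string contains only decimal digits (which are also valid hex digits); if it is False, you still need to check the remaining characters against a-f and A-F.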
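The character-by-character idea can also be written more compactly with the standard library's string.hexdigits (which is '0123456789abcdefABCDEF') and all(). A sketch; is_hex is a hypothetical name, and treating the empty string as not hex is my own choice:

```python
import string

HEX_CHARS = set(string.hexdigits)  # '0123456789abcdefABCDEF'

def is_hex(s):
    """True if s is non-empty and every character is a hex digit."""
    return bool(s) and all(ch in HEX_CHARS for ch in s)

print(is_hex('abf573'))  # True
print(is_hex('12ab'))    # True
print(is_hex('0xff'))    # False: 'x' is not a hex digit
print(is_hex(''))        # False
```

Unlike int(s, 16), this rejects prefixes, signs, and whitespace.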

Storing int or str in the list

I created a text file and opened it in Python using:
for word_in_line in open("test.txt"):
to loop through the lines in the txt file.
The text file only has one line, which is:
int 111 = 3 ;
When I make a list using .split():
print("Input: {}".format(word_in_line))
line_list = word_in_line.split()
It creates:
['int', '111', '=', '3', ';']
And I was looking for a way to check if line_list[1] ('111') is an integer.
But when I try type(line_list[1]), it says that it's str, because of the quotes ('').
My goal is to read through the txt file and determine whether each item is an integer, a str, or some other data type.

What you have in your list is a string, so the type you are getting is correct and expected.
What you are looking to do is check whether your string consists entirely of digits. To do that, use the isdigit string method:
line_list[1].isdigit()
Depending on what exactly you are trying to validate, there are cases where you want purely digits, and this solution provides exactly that.
There could be other cases where you want to check whether you have some kind of number, for example 10.5. This is where isdigit will fail. For cases like that, take a look at this answer, which provides an approach to check whether you have a float.
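The float check follows the same try-it pattern. A minimal sketch (is_float is a hypothetical name); note that float() also accepts strings such as 'nan', 'inf' and '1e-3', which may or may not be what you want:

```python
def is_float(s):
    """True if float() can parse s (hypothetical helper)."""
    try:
        float(s)
        return True
    except ValueError:
        return False

for s in ['10.5', '.50', '1e-3', 'nan', 'abc']:
    print(repr(s), is_float(s))
```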

I don't agree with the above answer.
Any string parsing like @idjaw's answer of line_list[1].isdigit() will fail on some odd edge case. For example, what if the number is a float like .50 that starts with a dot? The above approach won't work. Technically we only care about ints in this example, so this won't matter, but in general it is risky.
In general if you are trying to check whether a string is a valid number, it is best to just try to convert the string to a number and then handle the error accordingly.
def isNumber(string):
    try:
        int(string)
        return True
    except ValueError:
        return False
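To make the difference concrete, here is a small comparison of isdigit() against actually calling int() on a few edge cases. The examples are my own, int_accepts is a hypothetical name for the same try-and-see check, and the '1_000' case needs Python 3.6+.

```python
def int_accepts(s):
    """True if int() can parse s (hypothetical helper)."""
    try:
        int(s)
        return True
    except ValueError:
        return False

# '\u00b2' is SUPERSCRIPT TWO: a digit for isdigit(), but not for int()
for s in ['111', ' +10 ', '-3', '1_000', '.50', '\u00b2']:
    print(repr(s), 'isdigit:', s.isdigit(), 'int() ok:', int_accepts(s))
```

The two checks disagree in both directions: isdigit() rejects signs, whitespace and underscores that int() accepts, and accepts superscript digits that int() rejects.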

How to check if a variable is an integer or a string? [duplicate]

This question already has answers here:
How can I check if a string represents an int, without using try/except?
(23 answers)
I have an application that has a couple of commands.
When you type a certain command, you have to type in additional info about something/someone.
Now that info has to be strictly an integer or a string, depending on the situation.
However, whatever you type into Python using raw_input() is always a string, so, more specifically: how would I, shortly and without try...except, check whether a variable is made of digits or characters?

In my opinion you have two options:
Just try to convert it to an int, but catch the exception:
try:
    value = int(value)
except ValueError:
    pass  # it was a string, not an int.
This is the Ask Forgiveness approach.
Explicitly test if there are only digits in the string:
value.isdigit()
str.isdigit() returns True only if all characters in the string are digits (0-9).
The unicode / Python 3 str type equivalent is unicode.isdecimal() / str.isdecimal(); only Unicode decimals can be converted to integers, as not all digits have an actual integer value (U+00B2 SUPERSCRIPT 2 is a digit, but not a decimal, for example).
This is often called the Ask Permission approach, or Look Before You Leap.
The latter will not detect all valid int() values, as whitespace and + and - are also allowed in int() values. The first form will happily accept ' +10 ' as a number, the latter won't.
If you expect that the user will normally input an integer, use the first form. It is easier (and faster) to ask for forgiveness rather than for permission in that case.

If you want to check what it is:
>>> isinstance(1, str)
False
>>> isinstance('stuff', str)
True
>>> isinstance(1, int)
True
>>> isinstance('stuff', int)
False
If you want to get ints from raw_input:
>>> x = raw_input('enter thing:')
enter thing: 3
>>> try:
...     x = int(x)
... except ValueError:
...     pass
...
>>> isinstance(x, int)
True

The isdigit method of the str type returns True iff the given string is nothing but one or more digits. If it's not, you know the string should be treated as just a string.

Depending on your definition of shortly, you could use one of the following options:
try int(your_input) and catch ValueError (try/except)
your_input.isdigit()
use a regex
use the third-party parse module, which is kind of the opposite of format
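For the regex option, a minimal Python 3 sketch that approximates what int() accepts in base 10: optional surrounding whitespace, an optional sign, then digits. It is an approximation, not a reimplementation: it ignores the underscores int() allows since Python 3.6. In Python 3, \d matches any Unicode decimal digit (category Nd), which is the same set int() accepts. looks_like_int and INT_PATTERN are hypothetical names.

```python
import re

# Optional whitespace, optional sign, one or more digits, optional whitespace.
# Underscore-grouped numbers such as '1_000' are deliberately not handled.
INT_PATTERN = re.compile(r'\s*[+-]?\d+\s*')

def looks_like_int(s):
    """True if the whole string matches INT_PATTERN."""
    return INT_PATTERN.fullmatch(s) is not None

for s in ['42', ' +10 ', '-3', '4.5', 'abc', '']:
    print(repr(s), looks_like_int(s))
```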

Don't check. Go ahead and assume that it is the right input, and catch an exception if it isn't.
intresult = None
while intresult is None:
    user_input = raw_input()
    try:
        intresult = int(user_input)
    except ValueError:
        pass

Parsing a string representing a float with an exponent in Python

I have a large file with numbers in the form of 6,52353753563E-7. So there's an exponent in that string. float() dies on this.
While I could write custom code to pre-process the string into something float() can eat, I'm looking for the Pythonic way of converting these into a float (something like a format string passed somewhere). I must say I'm surprised float() can't handle strings with such an exponent; this is pretty common stuff.
I'm using python 2.6, but 3.1 is an option if need be.

Nothing to do with exponent. Problem is comma instead of decimal point.
>>> float("6,52353753563E-7")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for float(): 6,52353753563E-7
>>> float("6.52353753563E-7")
6.5235375356299998e-07
For a general approach, see locale.atof().

Your problem is not in the exponent but in the comma.
With Python 3.1:
>>> a = "6.52353753563E-7"
>>> float(a)
6.52353753563e-07
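If, as in the question, the only problem is that a comma is used as the decimal point (and there are no thousands separators), a minimal sketch is to swap the comma for a dot before calling float(); the exponent needs no special handling. parse_comma_float is a hypothetical name; for locale-aware parsing, locale.atof() is the standard-library route.

```python
def parse_comma_float(text):
    """Parse a float that uses ',' as the decimal separator.

    Assumes there are no thousands separators in text.
    """
    return float(text.replace(',', '.'))

value = parse_comma_float('6,52353753563E-7')
print(value)  # 6.52353753563e-07
```

For a large file, the same function can be applied to each line; float() already tolerates the trailing newline.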

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.
