How to convert hexadecimal string to character with that code point?

I have the string x = '0x32' and would like to turn it into y = '\x32'.
Note that len(x) == 4 and len(y) == 1.
I've tried to use z = x.replace("0", "\\"), but that causes z = '\\x32' and len(z) == 4. How can I achieve this?

You do not have to make it that hard: int(.., 16) parses a hex string of the form 0x..., and chr(..) then converts that number into the character with that Unicode code point (which is also its ASCII code when the number is below 128):
y = chr(int(x,16))
This results in:
>>> chr(int(x,16))
'2'
But \x32 is equal to '2' (you can look it up in the ASCII table):
>>> chr(int(x,16)) == '\x32'
True
and:
>>> len(chr(int(x,16)))
1

Try this (Python 2 only, where str has a 'hex' codec):
z = x[2:].decode('hex')
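A rough Python 3 counterpart, since str.decode and the 'hex' codec are not available there. This is only a sketch: it assumes the input always starts with 0x and has an even number of hex digits, and the helper name hex_to_char is made up for illustration.

```python
def hex_to_char(hex_string):
    """Decode a '0x'-prefixed hex string into text (hypothetical helper)."""
    digits = hex_string[2:]          # drop the leading '0x'
    raw = bytes.fromhex(digits)      # '32' becomes the single byte 0x32
    return raw.decode('latin-1')     # latin-1 maps every byte 0-255 to one character

print(hex_to_char('0x32'))
```

For a single code point, chr(int(x, 16)) is shorter; bytes.fromhex is mainly useful when the string holds several bytes.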

The ability to include code points like '\x32' inside a quoted string is a convenience for the programmer that only works in literal values inside the source code. Once you're manipulating strings in memory, that option is no longer available to you, but there are other ways of getting a character into a string based on its code point value.
Also note that '\x32' results in exactly the same string as '2'; it's just typed out differently.
Given a string containing a hexadecimal literal, you can convert it to its numeric value with int(str,16). Once you have a numeric value, you can convert it to the character with that code point via chr(). So putting it all together:
x = '0x32'
print(chr(int(x,16)))
#=> 2

Related

how to check for a presence of a character in a byte array

This sounds too basic for me to ask but here goes. I have a bytearray. I want to check for the presence of let's say 'a' or 'A' in the array and print the count of them. I do the following but I don't see any even though I know there is 'a' in there -
a_bytes = bytearray.fromhex(hex_string)
count = 0
for x in a_bytes:
    if ( (x=='a') or (x == 'A') ):
        count = count+1
return count
Why doesn't the above code work? I printed out the byte values as integers and I see 65 repeating multiple times.
Then again I try to convert the constant 'a' to integer using int('a') but I get an error --
ValueError: invalid literal for int() with base 10: 'a'

The values in the bytearray are stored as integers, not as characters. You need to search for 65 or 97, not "A" or "a".
If you want to use this to look up strings, just use a list. If you're not interested in the integer values of the bytes, a bytearray is not the right choice. Also, if you use a list, you can just use the .count method of lists to directly count occurrences of a particular value.

An int never compares equal to a str, so x == 'a' is always False here: you are comparing a byte value (an integer) with a one-character string. To get the Unicode code point of a character, use the ord() function. Note that a Unicode code point is an integer between 0 and about 1.1 million (unlike a byte, whose range is 0-255), but for ASCII characters in data encoded as ASCII or UTF-8, using ord() is fine. Explaining the relation between byte arrays and strings (encoding) is out of scope for this answer.
A solution to correct your code is to replace 'a' and 'A' with ord('a') and ord('A') respectively as others recommended.
However instead of your solution I would do this:
count = a_bytes.count(b'a') + a_bytes.count(b'A')
This makes the code much simpler and more readable in your scenario.
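A small self-contained comparison of the two approaches, written for Python 3. The value of hex_string is an assumed sample, since the question does not show the real data.

```python
hex_string = '61416261'   # assumed sample: the bytes for 'a', 'A', 'b', 'a'
a_bytes = bytearray.fromhex(hex_string)

# Iterating a bytearray yields ints, so compare against ord() values.
loop_count = sum(1 for value in a_bytes if value in (ord('a'), ord('A')))

# bytearray.count accepts a bytes-like pattern and does the scan itself.
method_count = a_bytes.count(b'a') + a_bytes.count(b'A')

print(loop_count, method_count)
```

Both counts agree; the second form avoids the explicit loop.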

Converting between characters and their integer-value and back is done with the functions ord and chr. So ord('A') == 65.
And as the byte-array stores the values as ints, the result is
a_bytes = bytearray.fromhex(hex_string)
count = sum(c == ord('A') or c == ord('a') for c in a_bytes)
This works because True counts as 1 and False as 0 when summed.

If you want to work with strings and characters, you need to decode the byte data first:
a_bytes = bytearray.fromhex(hex_string).decode('ascii')
count = 0
for x in a_bytes:
    if x == 'a' or x == 'A':
        count += 1
return count

Float converted to 2.dp reverts to original number of decimal places when inserted into a string

I have created the following snippet of code and I am trying to convert my 5 dp DNumber to a 2 dp one and insert this into a string. However, whichever method I try always seems to revert DNumber to its original number of decimal places (5).
Code snippet below:
if key == (1, 1):
    DNumber = '{r[csvnum]}'.format(r=row)
    # returns 7.65321
    DNumber = """%.2f""" % (float(DNumber))
    # returns 7.65
    Check2 = False
    if DNumber:
        if DNumber <= float(8):
            Check2 = True
    if Check2:
        print DNumber
        # returns 7.65
        string = 'test {r[csvhello]} TESTHERE test'.format(r=row).replace("TESTHERE", str("""%.2f""" % (float(gtpe))))
        # returns: test Hello 7.65321 test
        string = 'test {r[csvhello]} TESTHERE test'.format(r=row).replace("TESTHERE", str(DNumber))
        # returns: test Hello 7.65321 test
What I hoped it would return: test Hello 7.65 test
Any Ideas or suggestion on alternative methods to try?

It seems like you were hoping that converting the float to a 2-decimal-place string and then back to a float would give you a 2-decimal-place float.
The first problem is that your code doesn't actually do that anywhere. If you'd done that, you would get something very close to 7.65, not 7.65321.
But the bigger problem is that what you're trying to do doesn't make any sense. A float always has 53 binary digits, no matter what. If you round it to two decimal digits (no matter how you do it, including by converting to string and back), what you actually get is a float rounded to two decimal digits and then rounded to 53 binary digits. The closest float to 7.65 is not exactly 7.65, but 7.650000000000000355271368.* So, that's what you'd end up with. And there's no way around that; it's inherent to the way float is stored.
However, there is a different type you can use for this: decimal.Decimal. For example:
>>> import decimal
>>> f = 7.65321
>>> s = '%.2f' % f
>>> d = decimal.Decimal(s)
>>> f, s, d
(7.65321, '7.65', Decimal('7.65'))
Or, of course, you could just pass around a string instead of a float (as you're accidentally doing in your code already), or you could remember to use the .2f format every time you want to output it.
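The options in this paragraph can be sketched as follows, for Python 3. The sample value and the surrounding 'test Hello' text come from the question; the variable names and the choice of ROUND_HALF_UP are assumptions for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

value = 7.65321                       # sample value from the question

# Option 1: keep the float and apply .2f only when producing output.
as_text = '{:.2f}'.format(value)
sentence = 'test Hello {:.2f} test'.format(value)

# Option 2: carry an exact two-place Decimal around instead of a float.
as_decimal = Decimal(str(value)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)

print(as_text)
print(sentence)
print(as_decimal)
```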
As a side note, since your DNumber ends up as a string, this line is not doing anything useful:
if DNumber <= 8:
In Python 2.x, comparing two values of different types gives you a consistent but arbitrary and meaningless answer. With CPython 2.x, it will always be False.** In a different Python 2.x implementation, it might be different. In Python 3.x, it raises a TypeError.
And changing it to this doesn't help in any way:
if DNumber <= float(8):
Now, instead of comparing a str to an int, you're comparing a str to a float. This is exactly as meaningless, and follows the exact same rules. (Also, float(8) means the same thing as 8.0, but less readable and potentially slower.)
For that matter, this:
if DNumber:
… is always going to be true. For a number, if foo checks whether it's non-zero. That's a bad idea for float values (you should check whether it's within some absolute or relative error range of 0). But again, you don't have a float value; you have a str. And for strings, if foo checks whether the string is non-empty. So, even if you started off with 0, your string "0.00" is going to be true.
* I'm assuming here that you're using CPython, on a platform that uses IEEE-754 double for its C double type, and that all those extra conversions back and forth between string and float aren't introducing any additional errors.
** The rule is, slightly simplified: If you compare two numbers, they're converted to a type that can hold them both; otherwise, if either value is None it's smaller; otherwise, if either value is a number, it's smaller; otherwise, whichever one's type has an alphabetically earlier name is smaller.
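One way to avoid the str-versus-number comparison described above is to keep the numeric value for the checks and format it only at the end. A minimal Python 3 sketch, assuming the CSV field arrives as the text '7.65321' (the variable names are hypothetical):

```python
raw = '7.65321'              # assumed text read from the CSV column
number = float(raw)          # keep a real float for comparisons

is_small = number <= 8       # numeric comparison, not str against number
label = '%.2f' % number      # round only when building the output text

print(is_small, label)
print(bool('0.00'))          # a non-empty string is truthy even if it spells zero
```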

I think you're trying to do the following - combine the formatting with the getter:
>>> a = 123.456789
>>> row = {'csvnum': a}
>>> print 'test {r[csvnum]:.2f} hello'.format(r=row)
test 123.46 hello

If your number always has a single digit before the decimal point (such as 7.65321), you might want to try:
print "%r" % float(str(x)[:4])
where x is the float in question. Note that this truncates rather than rounds.
Example:
>>> x = 1.11111
>>> print "%r" % float(str(x)[:4])
1.11

Length of hexadecimal number

How can we get the length of a hexadecimal number in the Python language?
I tried using this code but even this is showing some error.
i = 0
def hex_len(a):
    if a > 0x0:
        # i = 0
        i = i + 1
        a = a/16
    return i
b = 0x346
print(hex_len(b))
Here I just used 346 as the hexadecimal number, but my actual numbers are very big to be counted manually.

Use the function hex:
>>> b = 0x346
>>> hex(b)
'0x346'
>>> len(hex(b))-2
3
or using string formatting:
>>> len("{:x}".format(b))
3

Using the string representation as an intermediate result is simple, but it wastes some time and memory. I'd prefer a mathematical solution (returning just the number of digits, without any 0x prefix):
from math import ceil, log
def numberLength(n, base=16):
    return ceil(log(n+1)/log(base))
The +1 adjustment accounts for the fact that an exact power of the base needs a leading "1". Because this relies on floating-point logarithms, the result can be off by one for very large values.
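An integer-only alternative for base 16 is sketched below. It swaps the logarithm for int.bit_length, so it is not the method of the answer above; the function name hex_digit_count and the choice to report 1 digit for zero are assumptions.

```python
def hex_digit_count(n):
    """Number of hex digits in a non-negative int, using integer arithmetic only."""
    if n < 0:
        raise ValueError('expected a non-negative integer')
    if n == 0:
        return 1                       # treat 0 as the single digit '0'
    return (n.bit_length() + 3) // 4   # each hex digit covers 4 bits, round up

print(hex_digit_count(0x346))
```

Since Python integers are arbitrary precision, this stays exact however large the number is.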

As Ashwini wrote, the hex function does the hard work for you:
hex(x)
Convert an integer number (of any size) to a hexadecimal string. The result is a valid Python expression.

Why do I have to convert numbers into strings to get its location?

Why will it not print out the position of an integer/float until I have converted it to a string?
example
x = 123
print x[0] # error
To fix this I have to do
x = 123
print str(x)[0]
But why do I have to make it into a string for it to work?

Well, why should this work in the first place? What is the nth index of a number; what is index 0 of the decimal number 123?
Is it 1 because of its decimal representation?
Is it 7 because of its hexadecimal representation (7B)?
Is it 0 because of its hexadecimal representation (0x7B)?
Is it 1 because of its octal representation (173)?
Is it 0 because of its octal representation (0173)?
Is it 1 because of its binary representation (1111011)?
Is it 1 because of its binary representation with the least significant bit first (1101111)?
Is it { because that’s what 123 is in ASCII?
…
As you can see, this is very unclear, and it does not make sense to begin with. When using the index access on a list, you get the nth element of the list. When you use the index access on a sequence, you get the nth element of the sequence. A string is a sequence of characters, so when using the index access on a string, you get the nth element of the string sequence; the nth character. A number is not a sequence; only its string representation in some format is.

123 is but one representation of the integer value. Python int values are not sequences or mappings, so [item] indexing has no meaning for them.
By turning the number into a string, you 'capture' the representation into a series of digit characters and you then can get the first one.
Another way to do it would be to divide by 10 until you have a number lower than 10:
x = 123
while x >= 10:
    x //= 10
print x # prints the number 1
Note that x then holds an int still, not a single-character string.
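The same division loop, wrapped in a function for Python 3. The name leading_digit and the handling of negative input via abs are assumptions added for illustration.

```python
def leading_digit(n):
    """Return the most significant decimal digit of an integer as an int."""
    n = abs(n)            # ignore the sign (an assumption, not in the answer above)
    while n >= 10:        # stop once a single digit is left
        n //= 10
    return n

print(leading_digit(123))
```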

The simple answer is because you have the wrong type. Integers don't support indexing -- and I really don't think they should (they're not sequences or mappings and I can't think of any way that indexing an integer actually makes sense).
Note that there is more than one way to represent an integer as well:
>>> 0x7b == 123
True
So in this case, who is to say that x[0] should return 1 instead of 0 (or 7) depending on how you want to think of it?

As a side note to the excellent answers above, this is one way you can convert a number to a list of numbers to allow indexing:
In [2]: map(int,str(123))
Out[2]: [1, 2, 3]
In [3]: map(int,str(123))[0]
Out[3]: 1
In [4]: type(map(int,str(123))[0])
Out[4]: int
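The session above is Python 2, where map returns a list. In Python 3, map returns a lazy iterator that cannot be indexed, so a list comprehension is one way to get the same result (variable names are illustrative):

```python
number = 123
digits = [int(ch) for ch in str(number)]   # one int per decimal digit
first = digits[0]

print(digits, first)
```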

How do I calculate the numeric value of a string with unicode components in python?

Along the lines of my previous question, How do I convert unicode characters to floats in Python? , I would like to find a more elegant solution to calculating the value of a string that contains unicode numeric values.
For example, take the strings "1⅕" and "1 ⅕". I would like these to resolve to 1.2
I know that I can iterate through the string by character, check for unicodedata.category(x) == "No" on each character, and convert the unicode characters by unicodedata.numeric(x). I would then have to split the string and sum the values. However, this seems rather hacky and unstable. Is there a more elegant solution for this in Python?

I think this is what you want...
import unicodedata
def eval_unicode(s):
    #sum all the unicode fractions
    u = sum(map(unicodedata.numeric, filter(lambda x: unicodedata.category(x)=="No",s)))
    #eval the regular digits (with optional dot) as a float, or default to 0
    n = float("".join(filter(lambda x:x.isdigit() or x==".", s)) or 0)
    return n+u
or the "comprehensive" solution, for those who prefer that style:
import unicodedata
def eval_unicode(s):
    #sum all the unicode fractions
    u = sum(unicodedata.numeric(i) for i in s if unicodedata.category(i)=="No")
    #eval the regular digits (with optional dot) as a float, or default to 0
    n = float("".join(i for i in s if i.isdigit() or i==".") or 0)
    return n+u
But beware, there are many unicode values that seem to not have a numeric value assigned in python (for example ⅜⅝ don't work... or maybe is just a matter with my keyboard xD).
Another note on the implementation: it's "too robust": it will work even with malformed numbers like "123½3 ½" and will eval it to 1234.0... but it won't work if there is more than one dot.

>>> import unicodedata
>>> b = '10 ⅕'
>>> int(b[:-1]) + unicodedata.numeric(b[-1])
10.2
def convert_dubious_strings(s):
    try:
        return int(s)
    except ValueError:  # also covers UnicodeEncodeError, which is a subclass
        return int(s[:-1]) + unicodedata.numeric(s[-1])
and if it might have no integer part, then another try-except sub-block needs to be added.

This might be sufficient for you, depending on the strange edge cases you want to deal with:
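import unicodedata
val = 0
for c in my_unicode_string:
    # c is already a character, so pass it to category() directly
    if unicodedata.category(c) == 'No':
        cval = unicodedata.numeric(c)
    elif c.isdigit():
        cval = int(c)
    else:
        continue
    # a whole-number value is treated as the next digit of the number
    if cval == int(cval):
        val *= 10
    val += cval
print val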
Whole digits are assumed to be another digit in the number, fractional characters are assumed to be fractions to add to the number. Doesn't do the right thing with spaces between digits, repeated fractions, etc.
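A stricter Python 3 sketch of the same idea, which rejects unexpected characters instead of skipping them. The helper name mixed_number_value is hypothetical, and the sketch shares some limitations with the answers above: it does not check the order of digits and fractions, and any character in category 'No' (including superscript digits) is added as if it were a fraction.

```python
import unicodedata

def mixed_number_value(text):
    """Value of text such as '1⅕' or '1 ⅕' (hypothetical helper)."""
    digits = ''
    fraction = 0.0
    for ch in text:
        if ch in '0123456789':
            digits += ch                           # plain ASCII digit of the integer part
        elif unicodedata.category(ch) == 'No':
            fraction += unicodedata.numeric(ch)    # e.g. '⅕' contributes 0.2
        elif not ch.isspace():
            raise ValueError('unsupported character: %r' % ch)
    return (int(digits) if digits else 0) + fraction

print(mixed_number_value('1⅕'))
print(mixed_number_value('1 ⅕'))
```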

I think you'll need a regular expression, explicitly listing the characters that you want to support. Not all numerical characters are suitable for the kind of composition that you envision - for example, what should be the numerical value of
u"4\N{CIRCLED NUMBER FORTY TWO}2\N{SUPERSCRIPT SIX}"
???
Do
for i in range(65536):
    if unicodedata.category(unichr(i)) == 'No':
        print hex(i), unicodedata.name(unichr(i))
and go through the list defining which ones you really want to support.
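The listing loop above is Python 2 (unichr, print statement). A Python 3 sketch of the same scan over the Basic Multilingual Plane follows; collecting the results in a list named candidates and the '<unnamed>' fallback are choices made here for illustration.

```python
import unicodedata

# Collect every BMP code point in category 'No' together with its name.
candidates = []
for code_point in range(0x10000):
    ch = chr(code_point)
    if unicodedata.category(ch) == 'No':
        candidates.append((hex(code_point), unicodedata.name(ch, '<unnamed>')))

# Show the first few entries; print the whole list to review all of them.
for code, name in candidates[:5]:
    print(code, name)
```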
