I am sure this question has already been asked, so forgive me for the duplicate.
Python's chr() function returns the Unicode character for a single ordinal value. How can I get the string for a run of concatenated ordinals? For example:
john:
j - 106
o - 111
h - 104
n - 110
The full unicode string is: 106111104110
My current method is:
from textwrap import wrap
ct = "106111104110" # unicode string
Split = wrap(ct,3) # split into threes list
inInt = list(map(int, Split)) # convert list of string into list of int
answer=''.join([chr(num) for num in inInt]) # return unicode string for each 3 character string
print(answer)
The above works correctly, printing "john".
However, this does not work when a character's ordinal has fewer than 3 digits, i.e. is less than 100. For example:
apple:
a - 97
p - 112
p - 112
l - 108
e - 101
The full unicode string is: 97112112108101
However doing:
ct="97112112108101"
Split = wrap(ct,3)
inInt = list(map(int, Split))
answer=''.join([chr(num) for num in inInt])
print(answer)
will print ϋyyQ because the ordinal of a is 97, which is only 2 digits, so every 3-digit group after it is misaligned. I would like not to be restricted to characters with ordinals of 100 or more.
Is there a python library that has the functionality I am looking for? Many thanks in advance.
Unicode code points can be up to six hexadecimal digits or seven decimal digits, so you could use leading zeros for consistency:
>>> ''.join(format(ord(x),'06x') for x in 'john')
'00006a00006f00006800006e'
>>> ''.join(chr(int(_[i:i+6],16)) for i in range(0,len(_),6)) # _ gets previous result from REPL.
'john'
>>> ''.join(format(ord(x),'06x') for x in '你好吗')
'004f6000597d005417'
>>> ''.join(chr(int(_[i:i+6],16)) for i in range(0,len(_),6))
'你好吗'
However, typical encoding is performed on byte strings, so encode to UTF-8 first, then you can use bytes methods to get two-digit hex strings:
>>> 'apple'.encode('utf8').hex()
'6170706c65'
>>> bytes.fromhex(_).decode()
'apple'
>>> '你好吗'.encode('utf8').hex()
'e4bda0e5a5bde59097'
>>> bytes.fromhex(_).decode('utf8')
'你好吗'
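Since the question uses decimal ordinals, the same fixed-width idea can be sketched in decimal as well. This is only a minimal sketch, not the answerer's code: the helper names encode_ordinals and decode_ordinals and the WIDTH constant are hypothetical, and it assumes every character is padded to seven digits (enough for the largest code point, 1114111).

```python
WIDTH = 7  # assumption: seven decimal digits cover the largest code point, 1114111


def encode_ordinals(text):
    """Concatenate each character's ordinal, zero-padded to WIDTH digits."""
    return ''.join(format(ord(ch), '0{}d'.format(WIDTH)) for ch in text)


def decode_ordinals(digits):
    """Split the digit string into WIDTH-sized groups and map each back with chr()."""
    if len(digits) % WIDTH != 0:
        raise ValueError('length must be a multiple of {}'.format(WIDTH))
    return ''.join(chr(int(digits[i:i + WIDTH])) for i in range(0, len(digits), WIDTH))


encoded = encode_ordinals('apple')
decoded = decode_ordinals(encoded)
print(encoded)
print(decoded)  # apple
```

The cost is a longer string, but characters below 100 (such as a, 97) no longer shift the grouping.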
Related
I have a string of the following form
\xNN\xNN\xNN\xNN…
N can be any digit from 0 to 9. For example:
str = "\x41\x42\x43"
\xNN is a hexadecimal number that represents a character according to ASCII code.
Is there a simple way to convert this type of string to a normal string? For example "\x41\x42\x43" is equivalent to "ABC".
How about
>>> s = b"\x41\x42\x43"
>>> print(s)
b'ABC'
Or
>>> s = "\x41\x42\x43"
>>> print(s.encode())
b'ABC'
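If the string instead contains the literal characters backslash, x, and two hex digits (for example, because it was read from a file), the escapes are not interpreted automatically. A minimal sketch of two ways to decode such text, assuming it holds only \xNN sequences; the variable names are illustrative:

```python
raw = r"\x41\x42\x43"  # raw literal: 12 characters, backslashes included

# Option 1: strip the "\x" markers and parse the remaining hex digits.
via_hex = bytes.fromhex(raw.replace("\\x", "")).decode("ascii")

# Option 2: let the unicode_escape codec interpret the escape sequences.
via_codec = raw.encode("ascii").decode("unicode_escape")

print(via_hex)    # ABC
print(via_codec)  # ABC
```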
I have some string text in unicode, containing some numbers as below:
txt = '３６fsdfdsf１４'
However, int(txt[:2]) does not recognize the characters as number. How to change the characters to have them recognized as number?
If you actually have Unicode (or decode your byte string to Unicode), you can normalize the data with compatibility normalization (NFKC), which replaces the fullwidth digits with their ASCII equivalents:
>>> s = u'３６fsdfdsf１４'
>>> s
u'\uff13\uff16fsdfdsf\uff11\uff14'
>>> import unicodedata as ud
>>> ud.normalize('NFKC',s)
u'36fsdfdsf14'
If compatibility normalization changes too much for you, you can make a translation table of just the replacements you want:
#coding:utf8
repl = u'0123456789'
# Fullwidth digits are U+FF10 to U+FF19.
# This makes a lookup table from Unicode ordinal to the ASCII character equivalent.
xlat = dict(zip(range(0xff10,0xff1a),repl))
s = u'３６fsdfdsf１４'
print(s.translate(xlat))
Output:
36fsdfdsf14
On Python 3:
import re
[int(x) for x in re.findall(r'\d+', '３６fsdfdsf１４')]
# [36, 14]
On Python 2:
import re
[int(x) for x in re.findall(r'\d+', u'３６fsdfdsf１４', re.U)]
# [36, 14]
In the Python 2 example, note the u prefix on the string and the re.U flag. You can convert an existing str variable, such as txt in your question, to unicode with txt.decode('utf8').
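In Python 3, int() accepts any Unicode decimal digit in a str, so normalization is mainly needed when later steps expect ASCII digits. A small sketch, with the fullwidth digits written as escapes so they are unambiguous:

```python
import unicodedata

txt = '\uff13\uff16fsdfdsf\uff11\uff14'  # fullwidth 3 and 6, then fullwidth 1 and 4

leading = int(txt[:2])                          # int() understands Unicode decimal digits
ascii_txt = unicodedata.normalize('NFKC', txt)  # fullwidth digits become ASCII digits

print(leading)    # 36
print(ascii_txt)  # 36fsdfdsf14
```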
Hi, I'm fairly new to Python and I'm trying to convert a string of characters to ASCII, but I don't know how to do that.
So the relevant parts of my code are probably this
string = input("Enter a line of text: ")
l = list(string)
return(l)
This puts the input in a list, so I have separate characters instead of a whole string, but I don't know how to convert them to ASCII. I know I have to use ord(), but I don't understand how to apply it to more than one character: I don't know what the input will be, so I can't just write something like ord('A').
How can I do it?
You might use a list comprehension, like so:
string = input("Enter a line of text: ")
list_of_characters = list(string)
list_of_ord_values = [ord(character) for character in list_of_characters]
Note that, since strings are iterable, you can skip the intermediate step:
list_of_ord_values = [ord(character) for character in string]
You need to make a for loop to loop over each character and convert them to ASCII.
You already have l which is a list of each character in the string. So loop over l and convert each character in the list to the ASCII value using ord().
Here's a concise way of doing it with map(). Assuming your list holds single characters:
l = ['a', 'b', 'c']
ascii_values = list(map(ord, l))  # map() returns an iterator in Python 3
This maps each character to its ASCII value.
If you are using Python 2, str objects are already byte strings (ASCII for plain text). If you are using Python 3, you can convert a string to ASCII bytes using the built-in string method encode:
ascii = string.encode('ascii')
If you want this as a list, note that iterating over a bytes object yields integers (try [x for x in b'binary string!']). To get a list of one-byte bytes objects instead, encode each character separately:
ascii_chars = [char.encode('ascii') for char in string]
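To make the difference concrete, here is a short Python 3 sketch comparing the three forms; the variable names are illustrative only:

```python
text = 'abc'

code_points = [ord(ch) for ch in text]              # integers via ord()
byte_values = list(text.encode('ascii'))            # iterating bytes also yields integers
single_bytes = [ch.encode('ascii') for ch in text]  # one-byte bytes objects

print(code_points)   # [97, 98, 99]
print(byte_values)   # [97, 98, 99]
print(single_bytes)  # [b'a', b'b', b'c']
```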
string = input("enter string: ")
for i in string:
    if i != ' ':  # ignoring space char
        print(i, ord(i))
Input
sample string
Output
s 115
a 97
m 109
p 112
l 108
e 101
s 115
t 116
r 114
i 105
n 110
g 103
The output can even be saved to another list.
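For instance, the pairs can be collected instead of printed. A minimal sketch with a fixed input in place of input(); the names are illustrative:

```python
text = 'sample string'

# Collect (character, ordinal) pairs, skipping spaces.
pairs = [(ch, ord(ch)) for ch in text if ch != ' ']
ordinals = [code for _, code in pairs]

print(pairs[:3])     # [('s', 115), ('a', 97), ('m', 109)]
print(ordinals[-1])  # 103
```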
What is the most 'Pythonic' way of translating
'\xff\xab\x12'
into
'ffab12'
I looked for functions that can do it, but they all want to translate to ASCII (so '\x41' to 'A'). I want to have the hexadecimal digits in ASCII.
There's a module called binascii that contains functions for just this:
>>> import binascii
>>> binascii.hexlify('\xff\xab\x12')
'ffab12'
>>> binascii.unhexlify('ffab12')
'\xff\xab\x12'
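The session above is Python 2. On Python 3 the same functions take bytes rather than str, and bytes objects also have hex() and fromhex() methods. A brief sketch:

```python
import binascii

data = b'\xff\xab\x12'

hex_bytes = binascii.hexlify(data)    # returns bytes
hex_text = data.hex()                 # returns str
round_trip = bytes.fromhex(hex_text)  # back to the original bytes

print(hex_bytes)  # b'ffab12'
print(hex_text)   # ffab12
```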
original = '\xff\xab\x12'
result = original.replace('\\x', '')
print result
It's \x because it's escaped. a.replace(b, c) just replaces all occurrences of b with c in a.
What you want is not ascii, because ascii translates 0x41 to 'A'. You just want it in hexadecimal base without the \x (or 0x, in some cases)
Edit!!
Sorry, I thought the \x was escaped. In fact, \x followed by 2 hex digits represents a single char, not 4.
print "\x41"
Will print
A
So what we have to do is to convert each char to hex, then print it like that:
res = ""
for i in original:
    res += hex(ord(i))[2:].zfill(2)
print res
Now let's go over this line:
hex(ord(i))[2:]
ord(c) - returns the numerical value of the char c
hex(i) - returns the hex string value of the int i (e.g. if i=65 it will return 0x41).
[2:] - cutting the "0x" prefix out of the hex string.
.zfill(2) - padding with zeroes
So, making that with a list comprehension will be much shorter:
result = "".join([hex(ord(c))[2:].zfill(2) for c in original])
print result
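The same idea can be written with format(), which pads to two digits without slicing off the 0x prefix. A Python 3 sketch, assuming every character in the string has an ordinal below 256:

```python
original = '\xff\xab\x12'

# '02x' formats each ordinal as lowercase hex, zero-padded to two digits.
result = ''.join(format(ord(c), '02x') for c in original)

print(result)  # ffab12
```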
For example, I get a string:
str = "please answer my question"
I want to write it to a file.
But I need to know the size of the string before writing the string to the file. What function can I use to calculate the size of the string?
If you are talking about the length of the string, you can use len():
>>> s = 'please answer my question'
>>> len(s) # number of characters in s
25
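If you need the size of the string object in memory, use sys.getsizeof(). Note that this includes interpreter overhead and varies by Python version and platform, so it is not the number of bytes written to a file: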
>>> import sys
>>> sys.getsizeof(s)
58
Also, don't call your string variable str. It shadows the built-in str() function.
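If the goal is to know how many bytes will end up in the file, the encoded length is usually the relevant number. A minimal Python 3 sketch, assuming the file is written as UTF-8:

```python
s = 'please answer my question'
cyrillic = 'йцы'

# Characters versus bytes after encoding.
print(len(s), len(s.encode('utf-8')))                # 25 25
print(len(cyrillic), len(cyrillic.encode('utf-8')))  # 3 6
```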
Python 3:
user225312's answer is correct:
A. To count number of characters in str object, you can use len() function:
>>> print(len('please answer my question'))
25
B. To get the memory size in bytes allocated to store a str object, you can use the sys.getsizeof() function:
>>> from sys import getsizeof
>>> print(getsizeof('please answer my question'))
50
Python 2:
It gets complicated for Python 2.
A. The len() function in Python 2 returns the number of bytes used to store the encoded characters in a str object.
Sometimes it will be equal to character count:
>>> print(len('abc'))
3
But sometimes, it won't:
>>> print(len('йцы')) # String contains Cyrillic symbols
6
That's because a Python 2 str holds encoded bytes, and the encoding may be variable-length (UTF-8 here). So, to count characters in a str, you need to know which encoding it uses. Then you can decode it to a unicode object and get the character count:
>>> print(len('йцы'.decode('utf8'))) #String contains Cyrillic symbols
3
B. The sys.getsizeof() function does the same thing as in Python 3: it returns the number of bytes allocated to store the whole string object:
>>> print(getsizeof('йцы'))
27
>>> print(getsizeof('йцы'.decode('utf8')))
32
>>> s = 'abcd'
>>> len(s)
4
In pandas, you can also use the .str.len() accessor to get the length of each element in a column:
data['name of column'].str.len()
The most Pythonic way is to use len(). Keep in mind that an escape sequence such as '\f' counts as a single character, and that an invalid escape sequence is a syntax error:
>>> len('foo')
3
>>> len('\foo')
3
>>> len('\xoo')
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \xXX escape
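To keep backslashes as literal characters, either double them or use a raw string. A short sketch; the variable names are illustrative:

```python
escaped = '\foo'   # '\f' is a single form-feed character
doubled = '\\foo'  # escaped backslash followed by 'foo'
raw = r'\foo'      # raw string keeps the backslash as-is

print(len(escaped))  # 3
print(len(doubled))  # 4
print(len(raw))      # 4
```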
Do you want to find the length of a string in Python? You can use the len() function:
string = input("Enter the string : ")
print("The string length is : ",len(string))
Output:
Enter the string : viral
The string length is : 5