Print a string as hexadecimal bytes - python

I have this string: Hello, World! and I want to print it using Python as '48:65:6c:6c:6f:2c:20:57:6f:72:6c:64:21'.
hex() works only for integers.
How can it be done?

You can turn your string into a generator of integers, apply hexadecimal formatting to each element, and join the results with a separator:
>>> s = "Hello, World!"
>>> ":".join("{:02x}".format(ord(c)) for c in s)
'48:65:6c:6c:6f:2c:20:57:6f:72:6c:64:21'

For Python 2.x:
':'.join(x.encode('hex') for x in 'Hello, World!')
The code above will not work with Python 3.x. For 3.x, the code below will work (note that hex() does not zero-pad, so values below 0x10 come out as a single digit):
':'.join(hex(ord(x))[2:] for x in 'Hello, World!')

Another answer in two lines that some might find easier to read, and helps with debugging line breaks or other odd characters in a string:
For Python 2.7:
for character in string:
    print character, character.encode('hex')
For Python 3.7 (not tested on all releases of 3):
for character in string:
    print(character, character.encode('utf-8').hex())

Some complements to Fedor Gogolev's answer:
First, if the string contains characters whose code is below 0x10 (16), they will not be zero-padded to two digits as required. In that case, the correct format is {:02x}:
>>> s = "Hello Unicode \u0005!!"
>>> ":".join("{0:x}".format(ord(c)) for c in s)
'48:65:6c:6c:6f:20:75:6e:69:63:6f:64:65:20:5:21:21'
                                           ^
>>> ":".join("{:02x}".format(ord(c)) for c in s)
'48:65:6c:6c:6f:20:75:6e:69:63:6f:64:65:20:05:21:21'
                                           ^^
Second, if your "string" is in reality a "byte string" -- and since the difference matters in Python 3 -- you might prefer the following:
>>> s = b"Hello bytes \x05!!"
>>> ":".join("{:02x}".format(c) for c in s)
'48:65:6c:6c:6f:20:62:79:74:65:73:20:05:21:21'
Please note there is no need for conversion in the above code as a bytes object is defined as "an immutable sequence of integers in the range 0 <= x < 256".
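On Python 3.8 and later, the same colon-separated output can likely be had without a generator expression, because bytes.hex() accepts a separator argument. The following is a minimal sketch, assuming the text should be encoded as UTF-8 first; the helper name hex_with_sep is made up for illustration and is not from any of the answers above.

```python
def hex_with_sep(text, sep=":", encoding="utf-8"):
    """Return the encoded bytes of `text` as two-digit hex values joined by `sep`.

    Hypothetical helper. Relies on the `sep` parameter of bytes.hex(),
    which was added in Python 3.8 and must be a single ASCII character.
    """
    return text.encode(encoding).hex(sep)


print(hex_with_sep("Hello, World!"))   # 48:65:6c:6c:6f:2c:20:57:6f:72:6c:64:21
print(b"Hello bytes \x05!!".hex(":"))  # bytes objects need no encoding step
```

Because bytes.hex() always emits two digits per byte, the zero-padding problem described above does not arise here.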

Print a string as hex bytes?
The accepted answer gives:
s = "Hello world !!"
":".join("{:02x}".format(ord(c)) for c in s)
returns:
'48:65:6c:6c:6f:20:77:6f:72:6c:64:20:21:21'
The accepted answer works only so long as you use bytes (mostly ascii characters). But if you use unicode, e.g.:
a_string = u"Привет мир!!" # "Prevyet mir", or "Hello World" in Russian.
You need to convert to bytes somehow.
If your terminal doesn't accept these characters, you can decode from UTF-8 or use the names (so you can paste and run the code along with me):
a_string = (
"\N{CYRILLIC CAPITAL LETTER PE}"
"\N{CYRILLIC SMALL LETTER ER}"
"\N{CYRILLIC SMALL LETTER I}"
"\N{CYRILLIC SMALL LETTER VE}"
"\N{CYRILLIC SMALL LETTER IE}"
"\N{CYRILLIC SMALL LETTER TE}"
"\N{SPACE}"
"\N{CYRILLIC SMALL LETTER EM}"
"\N{CYRILLIC SMALL LETTER I}"
"\N{CYRILLIC SMALL LETTER ER}"
"\N{EXCLAMATION MARK}"
"\N{EXCLAMATION MARK}"
)
So we see that:
":".join("{:02x}".format(ord(c)) for c in a_string)
returns
'41f:440:438:432:435:442:20:43c:438:440:21:21'
a poor/unexpected result - these are the code points that combine to make the graphemes we see in Unicode, from the Unicode Consortium - representing languages all over the world. This is not how we actually store this information so it can be interpreted by other sources, though.
To allow another source to use this data, we would usually need to convert to UTF-8 encoding, for example, to save this string in bytes to disk or to publish to html. So we need that encoding to convert the code points to the code units of UTF-8 - in Python 3, ord is not needed because bytes are iterables of integers:
>>> ":".join("{:02x}".format(c) for c in a_string.encode('utf-8'))
'd0:9f:d1:80:d0:b8:d0:b2:d0:b5:d1:82:20:d0:bc:d0:b8:d1:80:21:21'
Or perhaps more elegantly, using f-strings (available in Python 3.6 and later):
>>> ":".join(f'{c:02x}' for c in a_string.encode('utf-8'))
'd0:9f:d1:80:d0:b8:d0:b2:d0:b5:d1:82:20:d0:bc:d0:b8:d1:80:21:21'
In Python 2, pass c to ord first, i.e. ord(c) - more examples:
>>> ":".join("{:02x}".format(ord(c)) for c in a_string.encode('utf-8'))
'd0:9f:d1:80:d0:b8:d0:b2:d0:b5:d1:82:20:d0:bc:d0:b8:d1:80:21:21'
>>> ":".join(format(ord(c), '02x') for c in a_string.encode('utf-8'))
'd0:9f:d1:80:d0:b8:d0:b2:d0:b5:d1:82:20:d0:bc:d0:b8:d1:80:21:21'
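The encode-then-format step above can be wrapped in a small pair of helpers that also go back from the hex text to the original string. This is only a sketch for Python 3: the names to_hex and from_hex are hypothetical, UTF-8 is assumed as the default encoding, and the separator is assumed not to contain hex digits.

```python
def to_hex(text, encoding="utf-8", sep=":"):
    """Encode `text` and format every resulting byte as two lowercase hex digits."""
    return sep.join(f"{byte:02x}" for byte in text.encode(encoding))


def from_hex(hex_text, encoding="utf-8", sep=":"):
    """Reverse to_hex(): strip the separator, parse the hex digits, decode the bytes."""
    return bytes.fromhex(hex_text.replace(sep, "")).decode(encoding)


a_string = "Привет мир!!"
encoded = to_hex(a_string)
print(encoded)            # d0:9f:d1:80:d0:b8:d0:b2:d0:b5:d1:82:20:d0:bc:d0:b8:d1:80:21:21
print(from_hex(encoded))  # Привет мир!!
```

Formatting the encoded bytes rather than the code points is what makes the round trip possible: each byte is always exactly two hex digits wide.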

You can use the hexdump module's dump function:
import hexdump
hexdump.dump("Hello, World!", sep=":")
(append .lower() if you require lower-case). This works for both Python 2 and 3.

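Using map with a lambda function produces a list of hex values, which can be printed (or used for other purposes). The output below is from Python 2; in Python 3, map returns an iterator, so wrap the call in list() to see the values: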
>>> s = 'Hello 1 2 3 \x01\x02\x03 :)'
>>> map(lambda c: hex(ord(c)), s)
['0x48', '0x65', '0x6c', '0x6c', '0x6f', '0x20', '0x31', '0x20', '0x32', '0x20', '0x33', '0x20', '0x1', '0x2', '0x3', '0x20', '0x3a', '0x29']

A bit more general for those who don't care about Python 3 or colons:
from codecs import encode
data = open('/dev/urandom', 'rb').read(20)
print(encode(data, 'hex')) # Data
print(encode(b"hello", 'hex')) # String

This can be done as follows (Python 2 only, since it relies on str.encode('hex')):
from __future__ import print_function
text = "Hello, World!"
for char in text:
    mm = int(char.encode('hex'), 16)
    print(hex(mm), end=' ')
The output of this will be in hexadecimal as follows:
0x48 0x65 0x6c 0x6c 0x6f 0x2c 0x20 0x57 0x6f 0x72 0x6c 0x64 0x21

For something that offers more performance than ''.format(), you can use this:
>>> ':'.join('%02x' % (v if type(v) is int else ord(v)) for v in 'Hello, World!')
'48:65:6c:6c:6f:2c:20:57:6f:72:6c:64:21'
>>>
>>> ':'.join('%02x' % (v if type(v) is int else ord(v)) for v in b'Hello, World!')
'48:65:6c:6c:6f:2c:20:57:6f:72:6c:64:21'
>>>
I am sorry this couldn't look nicer.
It would be nice if one could simply write '%02x' % v, but %x only accepts an int, so that form works only when iterating over a byte string (b'') in Python 3. The ord(v) fallback is what lets the same expression handle str as well.

With f-string:
"".join(f"{ord(c):x}" for c in "Hello")
Use any delimiter:
>>> "⚡".join(f"{ord(c):x}" for c in "Hello")
'48⚡65⚡6c⚡6c⚡6f'

Just for convenience, very simple.
def hexlify_byteString(byteString, delim="%"):
    ''' Very simple way to hexlify a byte string using delimiters '''
    retval = ""
    for intval in byteString:
        retval += ('0123456789ABCDEF'[int(intval / 16)])
        retval += ('0123456789ABCDEF'[int(intval % 16)])
        retval += delim
    return(retval[:-1])

hexlify_byteString(b'Hello, World!', ":")
# Out[439]: '48:65:6C:6C:6F:2C:20:57:6F:72:6C:64:21'

Related

Python strings as long number

I have a long string and I want to represent it as one long number.
I tried:
l = [ord(i) for i in str1]
but this is not what I need.
I need a single long number, not a list of numbers.
This line gives me [23,21,45,34,242,32]
and I want to turn it into one long number that I can convert back to the same string.
Any ideas?
Here is a translation of Paulo Bu's answer (with base64 encoding) into Python 3:
>>> import base64
>>> s = 'abcde'
>>> e = base64.b64encode(s.encode('utf-8'))
>>> print(e)
b'YWJjZGU='
>>> base64.b64decode(e).decode('utf-8')
'abcde'
Basically the difference is that your workflow has gone from:
string -> base64
base64 -> string
To:
string -> bytes
bytes -> base64
base64 -> bytes
bytes -> string
Is this what you are looking for?
>>> str = 'abcdef'
>>> ''.join([chr(y) for y in [ ord(x) for x in str ]])
'abcdef'
>>>
#! /usr/bin/python2
# coding: utf-8

def encode(s):
    result = 0
    for ch in s.encode('utf-8'):
        result *= 256
        result += ord(ch)
    return result

def decode(i):
    result = []
    while i:
        result.append(chr(i%256))
        i /= 256
    result = reversed(result)
    result = ''.join(result)
    result = result.decode('utf-8')
    return result

orig = u'Once in Persia reigned a king …'
cipher = encode(orig)
clear = decode(cipher)
print '%s -> %s -> %s'%(orig, cipher, clear)
This is code that I found that works.
str='sdfsdfsdfdsfsdfcxvvdfvxcvsdcsdcs sdcsdcasd'
I=int.from_bytes(bytes([ord (i)for i in str]),byteorder='big')
print(I)
print(I.to_bytes(len(str),byteorder='big'))
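The int.from_bytes / to_bytes idea can be sketched as a reversible pair of functions for Python 3. The names text_to_int and int_to_text are hypothetical, UTF-8 is an assumed encoding (the snippet above uses ord(), which only works for code points below 256), and the byte length is recomputed from the integer instead of being passed along separately.

```python
def text_to_int(text, encoding="utf-8"):
    """Interpret the encoded bytes of `text` as one big-endian integer."""
    return int.from_bytes(text.encode(encoding), byteorder="big")


def int_to_text(number, encoding="utf-8"):
    """Reverse text_to_int(): rebuild the bytes from the integer and decode them."""
    length = (number.bit_length() + 7) // 8  # smallest byte count that holds the number
    return number.to_bytes(length, byteorder="big").decode(encoding)


n = text_to_int("abc")
print(n)               # 6382179
print(int_to_text(n))  # abc
```

One caveat of deriving the length from the number: leading NUL characters ("\x00") in the original text contribute nothing to the integer and are therefore lost on the way back. If that matters, keep the length alongside the number, as the snippet above does with len(str).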
What about using base 64 encoding? Are you fine with it? Here's an example:
>>> import base64
>>> s = 'abcde'
>>> e = base64.b64encode(s)
>>> print e
YWJjZGU=
>>> base64.b64decode(e)
'abcde'
The encoding is not pure numbers but you can go back and forth from string without much trouble.
You can also try encoding the string to hexadecimal. This will yield numbers, although I'm not sure you can always get back from the encoded string to the original string:
>>> s = 'abc'
>>> n = s.encode('hex')
>>> print n
616263
>>> n.decode('hex')
'abc'
If you need it to be actual integers then you can extend the trick:
>>> s = 'abc'
>>> n = int(s.encode('hex'), 16)  # convert it to integer
>>> print n
6382179
>>> hex(n)[2:].decode('hex')  # return from integer to string
'abc'
Note: I'm not sure this works out of the box in Python 3.
UPDATE: To make it work with Python 3 I suggest using binascii module this way:
>>> import binascii
>>> s = 'abcd'
>>> n = int(binascii.hexlify(s.encode()), 16)  # encode is needed to convert unicode to bytes
>>> print(n)
1633837924
>>> binascii.unhexlify(hex(n)[2:].encode()).decode()
'abcd'
The encode and decode methods are needed to convert between strings and bytes. If you plan to include special (non-ASCII) characters, you will probably need to specify encodings.
Hope this helps!

How to convert string to binary?

I am in need of a way to get the binary representation of a string in python. e.g.
st = "hello world"
toBinary(st)
Is there a module or some neat way of doing this?
Something like this?
>>> st = "hello world"
>>> ' '.join(format(ord(x), 'b') for x in st)
'1101000 1100101 1101100 1101100 1101111 100000 1110111 1101111 1110010 1101100 1100100'
#using `bytearray`
>>> ' '.join(format(x, 'b') for x in bytearray(st, 'utf-8'))
'1101000 1100101 1101100 1101100 1101111 100000 1110111 1101111 1110010 1101100 1100100'
If by binary you mean bytes type, you can just use encode method of the string object that encodes your string as a bytes object using the passed encoding type. You just need to make sure you pass a proper encoding to encode function.
In [9]: "hello world".encode('ascii')
Out[9]: b'hello world'
In [10]: byte_obj = "hello world".encode('ascii')
In [11]: byte_obj
Out[11]: b'hello world'
In [12]: byte_obj[0]
Out[12]: 104
Otherwise, if you want them in form of zeros and ones --binary representation-- as a more pythonic way you can first convert your string to byte array then use bin function within map :
>>> st = "hello world"
>>> map(bin,bytearray(st))
['0b1101000', '0b1100101', '0b1101100', '0b1101100', '0b1101111', '0b100000', '0b1110111', '0b1101111', '0b1110010', '0b1101100', '0b1100100']
Or you can join it:
>>> ' '.join(map(bin,bytearray(st)))
'0b1101000 0b1100101 0b1101100 0b1101100 0b1101111 0b100000 0b1110111 0b1101111 0b1110010 0b1101100 0b1100100'
Note that in python3 you need to specify an encoding for bytearray function :
>>> ' '.join(map(bin,bytearray(st,'utf8')))
'0b1101000 0b1100101 0b1101100 0b1101100 0b1101111 0b100000 0b1110111 0b1101111 0b1110010 0b1101100 0b1100100'
You can also use binascii module in python 2:
>>> import binascii
>>> bin(int(binascii.hexlify(st),16))
'0b110100001100101011011000110110001101111001000000111011101101111011100100110110001100100'
hexlify returns the hexadecimal representation of the binary data, which you can then convert to an int by specifying 16 as its base and finally convert to binary with bin.
We just need to encode it.
'string'.encode('ascii')
You can access the code values for the characters in your string using the ord() built-in function. If you then need to format this in binary, the string.format() method will do the job.
a = "test"
print(' '.join(format(ord(x), 'b') for x in a))
(Thanks to Ashwini Chaudhary for posting that code snippet.)
While the above code works in Python 3, this matter gets more complicated if you're assuming any encoding other than UTF-8. In Python 2, strings are byte sequences, and ASCII encoding is assumed by default. In Python 3, strings are assumed to be Unicode, and there's a separate bytes type that acts more like a Python 2 string. If you wish to assume any encoding other than UTF-8, you'll need to specify the encoding.
In Python 3, then, you can do something like this:
a = "test"
a_bytes = bytes(a, "ascii")
print(' '.join(["{0:b}".format(x) for x in a_bytes]))
The differences between UTF-8 and ascii encoding won't be obvious for simple alphanumeric strings, but will become important if you're processing text that includes characters not in the ascii character set.
In Python version 3.6 and above you can use f-string to format result.
str = "hello world"
print(" ".join(f"{ord(i):08b}" for i in str))
01101000 01100101 01101100 01101100 01101111 00100000 01110111 01101111 01110010 01101100 01100100
The left side of the colon, ord(i), is the actual object whose value
will be formatted and inserted into the output. Using ord() gives you
the base-10 code point for a single str character.
The right hand side of the colon is the format specifier. 08 means
width 8, 0 padded, and the b functions as a sign to output the
resulting number in base 2 (binary).
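The fixed-width 08b format also makes the conversion easy to reverse. Here is a minimal sketch for Python 3, with hypothetical helper names to_binary and from_binary. Unlike the ord()-based line above, it formats the encoded bytes (UTF-8 is an assumption), so every group is exactly 8 bits even for non-ASCII text.

```python
def to_binary(text, encoding="utf-8"):
    """Return the encoded bytes of `text` as space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


def from_binary(bits, encoding="utf-8"):
    """Reverse to_binary(): parse each group as a base-2 integer and decode the bytes."""
    return bytes(int(group, 2) for group in bits.split()).decode(encoding)


encoded = to_binary("hello world")
print(encoded)
print(from_binary(encoded))  # hello world
```

For plain ASCII input such as "hello world", to_binary gives the same result as the f-string one-liner shown above.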
def method_a(sample_string):
    binary = ' '.join(format(ord(x), 'b') for x in sample_string)

def method_b(sample_string):
    binary = ' '.join(map(bin,bytearray(sample_string,encoding='utf-8')))

if __name__ == '__main__':
    from timeit import timeit
    sample_string = 'Convert this ascii strong to binary.'
    print(
        timeit(f'method_a("{sample_string}")',setup='from __main__ import method_a'),
        timeit(f'method_b("{sample_string}")',setup='from __main__ import method_b')
    )
# 9.564299999998184 2.943955828988692
method_b is substantially more efficient at converting to a byte array because it makes low level function calls instead of manually transforming every character to an integer, and then converting that integer into its binary value.
This is an update for the existing answers which used bytearray() and can not work that way anymore:
>>> st = "hello world"
>>> map(bin, bytearray(st))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding
Because, as the bytearray documentation explains, if the source is a string, you must also give the encoding:
>>> map(bin, bytearray(st, encoding='utf-8'))
<map object at 0x7f14dfb1ff28>
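''.join(format(i, 'b') for i in bytearray(str, encoding='utf-8'))
This produces the bits of each byte without padding them with zeros to a full 8 bits, so there are no added zeros to strip when reverting. Be aware, though, that the groups then vary in width and are joined without a separator, so the original string can only be recovered reliably if every byte happens to yield the same number of bits; otherwise pad with format(i, '08b') or join with a separator.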
a = list(input("Enter a string\t: "))
def fun(a):
    c = ' '.join(['0'*(8-len(bin(ord(i))[2:]))+(bin(ord(i))[2:]) for i in a])
    return c
print(fun(a))

Python get character code in different encoding?

Given a character code as integer number in one encoding, how can you get the character code in, say, utf-8 and again as integer?
UTF-8 is a variable-length encoding, so I'll assume you really meant "Unicode code point". Use chr() to convert the character code to a character, decode it, and use ord() to get the code point.
>>> ord(chr(145).decode('koi8-r'))
9618
You can only map an "integer number" from one encoding to another if they are both single-byte encodings.
Here's an example using "iso-8859-15" and "cp1252" (aka "ANSI"):
>>> s = u'€'
>>> s.encode('iso-8859-15')
'\xa4'
>>> s.encode('cp1252')
'\x80'
>>> ord(s.encode('cp1252'))
128
>>> ord(s.encode('iso-8859-15'))
164
Note that ord is here being used to get the ordinal number of the encoded byte. Using ord on the original unicode string would give its unicode code point:
>>> ord(s)
8364
The reverse operation to ord can be done in Python 2 using either chr (for codes in the range 0 to 255) or unichr (for codes in the range 0 to sys.maxunicode):
>>> print chr(65)
A
>>> print unichr(8364)
€
For multi-byte encodings, a simple "integer number" mapping is usually not possible.
Here's the same example as above, but using "iso-8859-15" and "utf-8":
>>> s = u'€'
>>> s.encode('iso-8859-15')
'\xa4'
>>> s.encode('utf-8')
'\xe2\x82\xac'
>>> [ord(c) for c in s.encode('iso-8859-15')]
[164]
>>> [ord(c) for c in s.encode('utf-8')]
[226, 130, 172]
The "utf-8" encoding uses three bytes to encode the same character, so a one-to-one mapping is not possible. Having said that, many encodings (including "utf-8") are designed to be ASCII-compatible, so a mapping is usually possible for codes in the range 0-127 (but only trivially so, because the code will always be the same).
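The examples above are written for Python 2. In Python 3 the same decode-then-encode route might look like the sketch below, where recode_byte is a hypothetical helper: it takes a single-byte character code, decodes it with the source encoding, and returns both the Unicode code point and the list of byte values in the target encoding (UTF-8 by default, as an assumption).

```python
def recode_byte(code, source_encoding, target_encoding="utf-8"):
    """Map a single-byte character code to (code point, byte values in the target encoding)."""
    char = bytes([code]).decode(source_encoding)  # one byte -> one character
    return ord(char), list(char.encode(target_encoding))


print(recode_byte(145, "koi8-r"))       # (9618, [226, 150, 146])
print(recode_byte(164, "iso-8859-15"))  # (8364, [226, 130, 172])
print(recode_byte(128, "cp1252"))       # (8364, [226, 130, 172])
```

The last two calls show the euro sign arriving at the same code point from two different single-byte codes, while its UTF-8 form needs three bytes, which is why no single "integer to integer" mapping exists for multi-byte encodings.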
Here's an example of how the encode/decode dance works:
>>> s = b'd\x06' # perhaps start with bytes encoded in utf-16
>>> map(ord, s) # show those bytes as integers
[100, 6]
>>> u = s.decode('utf-16') # turn the bytes into unicode
>>> print u # show what the character looks like
٤
>>> print ord(u) # show the unicode code point as an integer
1636
>>> t = u.encode('utf-8') # turn the unicode into bytes with a different encoding
>>> map(ord, t) # show that encoding as integers
[217, 164]
Hope this helps :-)
If you need to construct the unicode directly from an integer, use unichr:
>>> u = unichr(1636)
>>> print u
٤

How do I lowercase a string in Python?

Is there a way to convert a string to lowercase?
"Kilometers" → "kilometers"
Use str.lower():
"Kilometer".lower()
The canonical Pythonic way of doing this is
>>> 'Kilometers'.lower()
'kilometers'
However, if the purpose is to do case insensitive matching, you should use case-folding:
>>> 'Kilometers'.casefold()
'kilometers'
Here's why:
>>> "Maße".casefold()
'masse'
>>> "Maße".lower()
'maße'
>>> "MASSE" == "Maße"
False
>>> "MASSE".lower() == "Maße".lower()
False
>>> "MASSE".casefold() == "Maße".casefold()
True
This is a str method in Python 3, but in Python 2, you'll want to look at PyICU or py2casefold.
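For Python 3, the case-insensitive matching described above can be sketched as a tiny comparison helper plus a sort key. The name equal_ignore_case and the sample word list are made up for illustration.

```python
def equal_ignore_case(a, b):
    """Compare two strings caselessly using str.casefold() (Python 3.3+)."""
    return a.casefold() == b.casefold()


print(equal_ignore_case("MASSE", "Maße"))  # True
print("MASSE".lower() == "Maße".lower())   # False

# casefold also works as a sort key for case-insensitive ordering.
words = ["straße", "Zebra", "apple", "STRASSE"]
print(sorted(words, key=str.casefold))     # ['apple', 'straße', 'STRASSE', 'Zebra']
```

This sketch does not normalize the strings first, so text that mixes precomposed and combining characters may still compare unequal; unicodedata.normalize can be applied before casefolding if that is a concern.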
Unicode Python 3
Python 3 handles plain string literals as unicode:
>>> string = 'Километр'
>>> string
'Километр'
>>> string.lower()
'километр'
Python 2, plain string literals are bytes
In Python 2, the below, pasted into a shell, encodes the literal as a string of bytes, using utf-8.
And lower doesn't map any changes that bytes would be aware of, so we get the same string.
>>> string = 'Километр'
>>> string
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> string.lower()
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> print string.lower()
Километр
In scripts, Python will object to non-ascii (as of Python 2.5, and warning in Python 2.4) bytes being in a string with no encoding given, since the intended coding would be ambiguous. For more on that, see the Unicode how-to in the docs and PEP 263
Use Unicode literals, not str literals
So we need a unicode string to handle this conversion, accomplished easily with a unicode string literal, which disambiguates with a u prefix (and note the u prefix also works in Python 3):
>>> unicode_literal = u'Километр'
>>> print(unicode_literal.lower())
километр
Note that the bytes are completely different from the str bytes - the escape character is '\u' followed by the 2-byte width, or 16 bit representation of these unicode letters:
>>> unicode_literal
u'\u041a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> unicode_literal.lower()
u'\u043a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
Now if we only have it in the form of a str, we need to convert it to unicode. Python's Unicode type is a universal encoding format that has many advantages relative to most other encodings. We can either use the unicode constructor or str.decode method with the codec to convert the str to unicode:
>>> unicode_from_string = unicode(string, 'utf-8') # "encoding" unicode from string
>>> print(unicode_from_string.lower())
километр
>>> string_to_unicode = string.decode('utf-8')
>>> print(string_to_unicode.lower())
километр
>>> unicode_from_string == string_to_unicode == unicode_literal
True
Both methods convert to the unicode type - and same as the unicode_literal.
Best Practice, use Unicode
It is recommended that you always work with text in Unicode.
Software should only work with Unicode strings internally, converting to a particular encoding on output.
Can encode back when necessary
However, to get the lowercase back in type str, encode the python string to utf-8 again:
>>> print string
Километр
>>> string
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> string.decode('utf-8')
u'\u041a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> string.decode('utf-8').lower()
u'\u043a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> string.decode('utf-8').lower().encode('utf-8')
'\xd0\xba\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> print string.decode('utf-8').lower().encode('utf-8')
километр
So in Python 2, Unicode can encode into Python strings, and Python strings can decode into the Unicode type.
With Python 2, this doesn't work for non-English words in UTF-8. In this case decode('utf-8') can help:
>>> s='Километр'
>>> print s.lower()
Километр
>>> print s.decode('utf-8').lower()
километр
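Also, you can store the result in a new variable:
s = input('UPPER CASE')
lower = s.lower()
Note that lower() returns a new string and leaves the original unchanged:
s = "Kilometer"
print(s.lower())  # kilometer
print(s)          # Kilometer
The lowercase form exists only where lower() is called, unless you assign the result.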
Don't try this, totally un-recommend, don't do this:
import string
s='ABCD'
print(''.join([string.ascii_lowercase[string.ascii_uppercase.index(i)] for i in s]))
Output:
abcd
Since no one has written it yet: you can use swapcase, which turns uppercase letters into lowercase and vice versa. Use it only when that swap is what you want, since any lowercase letters in the input will come out uppercase:
s='ABCD'
print(s.swapcase())
Output:
abcd
I would like to provide the summary of all possible methods
.lower() method.
str.lower()
combination of str.translate() and str.maketrans()
.lower() method
original_string = "UPPERCASE"
lowercase_string = original_string.lower()
print(lowercase_string) # Output: "uppercase"
str.lower()
original_string = "UPPERCASE"
lowercase_string = str.lower(original_string)
print(lowercase_string) # Output: "uppercase"
combination of str.translate() and str.maketrans()
import string
original_string = "UPPERCASE"
lowercase_string = original_string.translate(str.maketrans(string.ascii_uppercase, string.ascii_lowercase))
print(lowercase_string) # Output: "uppercase"
lowercasing
This method not only converts all uppercase letters of the Latin alphabet into lowercase ones, but also shows how such logic is implemented. You can test this code in any online Python sandbox.
def turnIntoLowercase(string):
    lowercaseCharacters = ''
    abc = ['a','b','c','d','e','f','g','h','i','j','k','l','m',
           'n','o','p','q','r','s','t','u','v','w','x','y','z',
           'A','B','C','D','E','F','G','H','I','J','K','L','M',
           'N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
    for character in string:
        if character not in abc:
            lowercaseCharacters += character
        elif abc.index(character) <= 25:
            lowercaseCharacters += character
        else:
            lowercaseCharacters += abc[abc.index(character) - 26]
    return lowercaseCharacters

string = str(input("Enter your string, please: " ))
print(turnIntoLowercase(string = string))
Performance check
Now, let's enter the following string (and press Enter) to make sure everything works as intended:
# Enter your string, please:
"PYTHON 3.11.2, 15TH FeB 2023"
Result:
"python 3.11.2, 15th feb 2023"
If you want to convert a list of strings to lowercase, you can map str.lower:
list_of_strings = ['CamelCase', 'in', 'Python']
list(map(str.lower, list_of_strings)) # ['camelcase', 'in', 'python']

chr() equivalent returning a bytes object, in py3k

Python 2.x has chr(), which converts a number in the range 0-255 to a byte string with one character with that numeric value, and unichr(), which converts a number in the range 0-0x10FFFF to a Unicode string with one character with that Unicode codepoint. Python 3.x replaces unichr() with chr(), in keeping with its "Unicode strings are default" policy, but I can't find anything that does exactly what the old chr() did. The 2to3 utility (from 2.6) leaves chr calls alone, which is not right in general :(
(This is for parsing and serializing a file format which is explicitly defined in terms of 8-bit bytes.)
Try the following:
b = bytes([x])
For example:
>>> bytes([255])
b'\xff'
Consider using bytearray((255,)) which works the same in Python2 and Python3. In both Python generations the resulting bytearray-object can be converted to a bytes(obj) which is an alias for a str() in Python2 and real bytes() in Python3.
# Python2
>>> x = bytearray((32,33))
>>> x
bytearray(b' !')
>>> bytes(x)
' !'
# Python3
>>> x = bytearray((32,33))
>>> x
bytearray(b' !')
>>> bytes(x)
b' !'
In case you want to write Python 2/3 compatible code, use six.int2byte
Yet another alternative (Python 3.5+):
>>> b'%c' % 65
b'A'
>>> import struct
>>> struct.pack('B', 10)
b'\n'
>>> import functools
>>> bchr = functools.partial(struct.pack, 'B')
>>> bchr(10)
b'\n'
simple replacement based on small range memoization (should work on 2 and 3), good performance on CPython and pypy
binchr = tuple([bytes(bytearray((b,))) for b in range(256)]).__getitem__
binchr(1) -> b'\x01'
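Pulling the Python 3 suggestions together, a stand-in for the old chr() might look like the sketch below. The name bchr is an assumption borrowed from the functools.partial example above. The square brackets matter: bytes() called with a bare integer builds that many NUL bytes instead of a single byte with that value.

```python
def bchr(value):
    """Return a one-byte bytes object for an integer in range(256) (Python 3)."""
    return bytes([value])  # raises ValueError for values outside 0..255


print(bchr(255))                # b'\xff'
print(bchr(65))                 # b'A'
print(bytes(3))                 # b'\x00\x00\x00' -- three NUL bytes, not b'\x03'
print((65).to_bytes(1, "big"))  # b'A' -- an equivalent spelling via int.to_bytes
```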
