I was trying to build this bytes object in Python 3:
b'3\r\n'
so I tried the obvious (for me), and found a weird behaviour:
>>> bytes(3) + b'\r\n'
b'\x00\x00\x00\r\n'
Apparently:
>>> bytes(10)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
I've been unable to find any pointers in the documentation on why the bytes conversion works this way. However, I did find some surprised messages in this Python issue about adding format to bytes (see also Python 3 bytes formatting):
http://bugs.python.org/issue3982
This interacts even more poorly with oddities like bytes(int) returning zeroes now
and:
It would be much more convenient for me if bytes(int) returned the ASCIIfication of that int; but honestly, even an error would be better than this behavior. (If I wanted this behavior - which I never have - I'd rather it be a classmethod, invoked like "bytes.zeroes(n)".)
Can someone explain to me where this behaviour comes from?
Since Python 3.2 you can use int.to_bytes:
>>> (1024).to_bytes(2, byteorder='big')
b'\x04\x00'
def int_to_bytes(x: int) -> bytes:
    return x.to_bytes((x.bit_length() + 7) // 8, 'big')

def int_from_bytes(xbytes: bytes) -> int:
    return int.from_bytes(xbytes, 'big')
Accordingly, x == int_from_bytes(int_to_bytes(x)).
Note that the above encoding works only for unsigned (non-negative) integers.
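A quick illustration of that limitation and of the zero edge case (the exact traceback text may vary between Python versions):
>>> int_to_bytes(0)    # zero has bit_length 0, so this gives an empty bytes object
b''
>>> int_to_bytes(-1)   # negative values are rejected by to_bytes without signed=True
Traceback (most recent call last):
  ...
OverflowError: can't convert negative int to unsigned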
For signed integers, the bit length is a bit more tricky to calculate:
def int_to_bytes(number: int) -> bytes:
    return number.to_bytes(length=(8 + (number + (number < 0)).bit_length()) // 8, byteorder='big', signed=True)

def int_from_bytes(binary_data: bytes) -> int:
    return int.from_bytes(binary_data, byteorder='big', signed=True)
That's the way it was designed - and it makes sense because usually, you would call bytes on an iterable instead of a single integer:
>>> bytes([3])
b'\x03'
The docs state this, as well as the docstring for bytes:
>>> help(bytes)
...
bytes(int) -> bytes object of size given by the parameter initialized with null bytes
You can use struct's pack:
In [11]: struct.pack(">I", 1)
Out[11]: '\x00\x00\x00\x01'
The ">" is the byte-order (big-endian) and the "I" is the format character. So you can be specific if you want to do something else:
In [12]: struct.pack("<H", 1)
Out[12]: '\x01\x00'
In [13]: struct.pack("B", 1)
Out[13]: '\x01'
This works the same on both Python 2 and Python 3.
Note: the inverse operation (bytes to int) can be done with unpack.
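For example, a rough sketch of that inverse (struct.unpack returns a tuple, so take the first element):
In [14]: struct.unpack(">I", struct.pack(">I", 1))[0]
Out[14]: 1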
Python 3.5+ introduces %-interpolation (printf-style formatting) for bytes:
>>> b'%d\r\n' % 3
b'3\r\n'
See PEP 0461 -- Adding % formatting to bytes and bytearray.
On earlier versions, you could build a str and .encode('ascii') the result:
>>> s = '%d\r\n' % 3
>>> s.encode('ascii')
b'3\r\n'
Note: It is different from what int.to_bytes produces:
>>> n = 3
>>> n.to_bytes((n.bit_length() + 7) // 8, 'big') or b'\0'
b'\x03'
>>> b'3' == b'\x33' != b'\x03'
True
The documentation says:
bytes(int) -> bytes object of size given by the parameter
initialized with null bytes
The sequence:
b'3\r\n'
is the character '3' (decimal 51), the character '\r' (13), and the character '\n' (10).
Therefore, one approach is to treat it as such, for example:
>>> bytes([51, 13, 10])
b'3\r\n'
>>> bytes('3', 'utf8') + b'\r\n'
b'3\r\n'
>>> n = 3
>>> bytes(str(n), 'ascii') + b'\r\n'
b'3\r\n'
Tested on IPython 1.1.0 & Python 3.2.3
The ASCIIfication of 3 is "\x33" not "\x03"!
That is what Python does for str(3), but it would be totally wrong for bytes, as they should be considered arrays of binary data and not abused as strings.
The easiest way to achieve what you want is bytes((3,)), which is better than bytes([3]) because initializing a list is much more expensive, so never use lists when you can use tuples. Bigger integers can be converted with int.to_bytes, for example (1024).to_bytes(2, "little").
Initializing bytes with a given length makes sense and is the most useful, as they are often used to create some type of buffer for which you need some memory of given size allocated. I often use this when initializing arrays or expanding some file by writing zeros to it.
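A small sketch of that buffer use case (the size and filename below are just examples):
buf = bytes(4096)                  # 4096 null bytes, e.g. a zero-filled buffer
with open('padded.bin', 'wb') as f:
    f.write(bytes(1024))           # pad/extend the file with 1024 zero bytes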
I was curious about the performance of various methods for a single int in the range [0, 255], so I decided to do some timing tests.
Based on the timings below, and on the general trend I observed from trying many different values and configurations, struct.pack seems to be the fastest, followed by int.to_bytes, bytes, and with str.encode (unsurprisingly) being the slowest. Note that the results showed more variation than is represented here, and int.to_bytes and bytes sometimes switched speed ranking during testing, but struct.pack was clearly the fastest.
Results in CPython 3.7 on Windows:
Testing with 63:
bytes_: 100000 loops, best of 5: 3.3 usec per loop
to_bytes: 100000 loops, best of 5: 2.72 usec per loop
struct_pack: 100000 loops, best of 5: 2.32 usec per loop
chr_encode: 50000 loops, best of 5: 3.66 usec per loop
Test module (named int_to_byte.py):
"""Functions for converting a single int to a bytes object with that int's value."""
import random
import shlex
import struct
import timeit


def bytes_(i):
    """From Tim Pietzcker's answer:
    https://stackoverflow.com/a/21017834/8117067
    """
    return bytes([i])


def to_bytes(i):
    """From brunsgaard's answer:
    https://stackoverflow.com/a/30375198/8117067
    """
    return i.to_bytes(1, byteorder='big')


def struct_pack(i):
    """From Andy Hayden's answer:
    https://stackoverflow.com/a/26920966/8117067
    """
    return struct.pack('B', i)


# Originally, jfs's answer was considered for testing,
# but the result is not identical to the other methods
# https://stackoverflow.com/a/31761722/8117067


def chr_encode(i):
    """Another method, from Quuxplusone's answer here:
    https://codereview.stackexchange.com/a/210789/140921
    Similar to g10guang's answer:
    https://stackoverflow.com/a/51558790/8117067
    """
    return chr(i).encode('latin1')


converters = [bytes_, to_bytes, struct_pack, chr_encode]


def one_byte_equality_test():
    """Test that results are identical for ints in the range [0, 255]."""
    for i in range(256):
        results = [c(i) for c in converters]
        # Test that all results are equal
        start = results[0]
        if any(start != b for b in results):
            raise ValueError(results)


def timing_tests(value=None):
    """Test each of the functions with a random int."""
    if value is None:
        # random.randint takes more time than int to byte conversion
        # so it can't be a part of the timeit call
        value = random.randint(0, 255)
    print(f'Testing with {value}:')
    for c in converters:
        print(f'{c.__name__}: ', end='')
        # Uses technique borrowed from https://stackoverflow.com/q/19062202/8117067
        timeit.main(args=shlex.split(
            f"-s 'from int_to_byte import {c.__name__}; value = {value}' " +
            f"'{c.__name__}(value)'"
        ))
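A possible way to drive it, assuming the module is saved as int_to_byte.py as described above:
>>> import int_to_byte
>>> int_to_byte.one_byte_equality_test()   # passes silently; raises ValueError on any mismatch
>>> int_to_byte.timing_tests(63)           # or timing_tests() to pick a random value in [0, 255]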
The behaviour comes from the fact that in Python prior to version 3, bytes was just an alias for str. In Python 3.x, bytes is an immutable version of bytearray - a completely new type, not backwards compatible.
From bytes docs:
Accordingly, constructor arguments are interpreted as for bytearray().
Then, from bytearray docs:
The optional source parameter can be used to initialize the array in a few different ways:
If it is an integer, the array will have that size and will be initialized with null bytes.
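For comparison, a quick sketch of the main constructor forms described there (shown with bytes, whose constructor mirrors bytearray's):
>>> bytes(3)             # an integer: that many null bytes
b'\x00\x00\x00'
>>> bytes([3])           # an iterable of ints in range(0, 256)
b'\x03'
>>> bytes('3', 'ascii')  # a string plus an encoding
b'3'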
Note that this differs from the 2.x (where x >= 6) behaviour, where bytes is simply str:
>>> bytes is str
True
PEP 3112:
The 2.6 str differs from 3.0’s bytes type in various ways; most notably, the constructor is completely different.
An int (including Python 2's long) can be converted to bytes using the following function:
import codecs
def int2bytes(i):
    hex_value = '{0:x}'.format(i)
    # make length of hex_value a multiple of two
    hex_value = '0' * (len(hex_value) % 2) + hex_value
    return codecs.decode(hex_value, 'hex_codec')
The reverse conversion can be done by another one:
import codecs
import six # should be installed via 'pip install six'
long = six.integer_types[-1]
def bytes2int(b):
    return long(codecs.encode(b, 'hex_codec'), 16)
Both functions work on both Python 2 and Python 3.
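For example, a quick round trip with these helpers (output shown for Python 3):
>>> int2bytes(3)
b'\x03'
>>> bytes2int(int2bytes(255))
255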
Although the prior answer by brunsgaard is an efficient encoding, it works only for unsigned integers. This one builds upon it to work for both signed and unsigned integers.
def int_to_bytes(i: int, *, signed: bool = False) -> bytes:
    length = ((i + ((i * signed) < 0)).bit_length() + 7 + signed) // 8
    return i.to_bytes(length, byteorder='big', signed=signed)

def bytes_to_int(b: bytes, *, signed: bool = False) -> int:
    return int.from_bytes(b, byteorder='big', signed=signed)

# Test unsigned:
for i in range(1025):
    assert i == bytes_to_int(int_to_bytes(i))

# Test signed:
for i in range(-1024, 1025):
    assert i == bytes_to_int(int_to_bytes(i, signed=True), signed=True)
For the encoder, (i + ((i * signed) < 0)).bit_length() is used instead of just i.bit_length() because the latter leads to an inefficient encoding of -128, -32768, etc.
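Concretely, here is the -128 case (it fits in one signed byte, but its bit length alone suggests two):
>>> (-128).bit_length()   # bit_length ignores the sign, so this is 8
8
>>> (-128).to_bytes(2, byteorder='big', signed=True)   # length from the naive formula
b'\xff\x80'
>>> int_to_bytes(-128, signed=True)                    # length from the adjusted formula
b'\x80'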
Credit: CervEd for fixing a minor inefficiency.
As you want to deal with binary representations, the best option is to use ctypes.
import ctypes
x = ctypes.c_int(1234)
bytes(x)
You must use the specific integer representation (signed/unsigned and the number of bits: c_uint8, c_int8, c_uint16, ...).
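A small sketch of picking a width explicitly (the raw bytes come out in the machine's native byte order, so the exact output below assumes a little-endian host):
>>> import ctypes
>>> bytes(ctypes.c_uint16(1024))
b'\x00\x04'
>>> bytes(ctypes.c_int8(-1))
b'\xff'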
Some answers don't work with large numbers.
Convert the integer to its hex representation, then convert that to bytes:
def int_to_bytes(number):
    hrepr = hex(number).replace('0x', '')
    if len(hrepr) % 2 == 1:
        hrepr = '0' + hrepr
    return bytes.fromhex(hrepr)
Result:
>>> int_to_bytes(2**256 - 1)
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
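The reverse direction can be sketched the same way by going back through hex (bytes.hex() requires Python 3.5+):
>>> int(int_to_bytes(2**256 - 1).hex(), 16) == 2**256 - 1
True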
You can convert the int to str first, before converting it to bytes.
That should produce the format you want.
bytes(str(your_number),'UTF-8') + b'\r\n'
It works for me in Python 3.8.
If the question is how to convert an integer itself (not its string equivalent) into bytes, I think the robust answer is:
>>> i = 5
>>> i.to_bytes(2, 'big')
b'\x00\x05'
>>> int.from_bytes(i.to_bytes(2, 'big'), byteorder='big')
5
More information on these methods here:
https://docs.python.org/3.8/library/stdtypes.html#int.to_bytes
https://docs.python.org/3.8/library/stdtypes.html#int.from_bytes
>>> chr(116).encode()
b't'
I am using the answer provided here to convert a string IP to an integer, but I am not getting the expected output. Specifically, the method is:
>>> ipstr = '1.2.3.4'
>>> parts = ipstr.split('.')
>>> (int(parts[0]) << 24) + (int(parts[1]) << 16) + \
(int(parts[2]) << 8) + int(parts[3])
but when I provide the value 172.31.22.98, I get 2887718498 back. However, I expect to see the value -1407248798, as produced by Google's Guava library: https://google.github.io/guava/releases/20.0/api/docs/com/google/common/net/InetAddresses.html#fromInteger-int-
Also, I've verified that this service provides the expected output, but all of the approaches in the aforementioned Stack Overflow answer return 2887718498.
Note that I cannot use any third-party library, so I am pretty much limited to hand-written code (no imports).
A better way is to use the library method
>>> from socket import inet_aton
>>> int.from_bytes(inet_aton('172.31.22.98'), byteorder="big")
2887718498
This is still the same result you had above.
Here is one way to view it as a signed int:
>>> from ctypes import c_long
>>> c_long(2887718498)
c_long(-1407248798)
To do it without imports (but why? The above is all first party CPython)
>>> x = 2887718498
>>> if x >= (1<<31): x-= (1<<32)
>>> x
-1407248798
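Putting the two steps together without any imports, a rough sketch:
def ip_to_signed_int(ipstr):
    parts = [int(p) for p in ipstr.split('.')]
    x = (parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]
    # reinterpret the unsigned 32-bit value as a signed one
    return x - (1 << 32) if x >= (1 << 31) else x

>>> ip_to_signed_int('172.31.22.98')
-1407248798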
Found this post whilst trying to do the same as the OP. I developed the snippet below to be easy to understand and hopefully useful to someone.
baseIP = '192.168.1.0'
baseIPFirstOctet = int(baseIP.split('.')[0])
baseIPSecondOctet = int(baseIP.split('.')[1])
baseIPThirdOctet = int(baseIP.split('.')[2])
baseIPFourthOctet = int(baseIP.split('.')[3])
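From there the octets can be folded back into a single integer with the same shifts used in the accepted approach:
baseIPAsInt = (baseIPFirstOctet << 24) + (baseIPSecondOctet << 16) + \
              (baseIPThirdOctet << 8) + baseIPFourthOctet
# 192.168.1.0 -> 3232235776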
I have looked at the modules humanize and humanfriendly, and neither can convert a large bit value to human-readable bit output (e.g. Mbits, Gbits, Tbits, etc.). Has anyone come across such a module? Example:
mbits = 1000000
gbits = 1000000000
Then
print(human.bits(mbits)) # would output "1 Mbit"
print(human.bits(gbits)) # would output "1 Gbit"
...etc, up to exabit.
You can try hurry.filesize
>>> from hurry.filesize import size
>>> size(11000)
'10K'
There is another library bitmath
>>> from bitmath import *
>>> small_number = MiB(10000)
>>> print small_number.best_prefix()
9.765625 GiB
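If you would rather avoid a third-party module, here is a minimal hand-rolled sketch for decimal bit prefixes; the unit names and cut-offs are just one reasonable choice:
def human_bits(n):
    """Format a bit count with decimal (SI) prefixes, up to exabits."""
    for unit in ('bit', 'kbit', 'Mbit', 'Gbit', 'Tbit', 'Pbit', 'Ebit'):
        if n < 1000 or unit == 'Ebit':
            return f'{n:g} {unit}'
        n /= 1000

>>> human_bits(1000000)
'1 Mbit'
>>> human_bits(1000000000)
'1 Gbit'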
I need to save a tuple of 4 numbers inside a column that only accepts numbers (ints or floats).
I have a list of 4 numbers like -0.0123445552, -29394.2393339, 0.299393333, 0.00002345556.
How can I "store" all these numbers inside a single number and be able to retrieve the original tuple in Python?
Thanks
Following up on @YevgenYampolskiy's idea of using numpy:
You could use numpy to convert the numbers to 16-bit floats, and then view the array as one 64-bit int:
import numpy as np
data = np.array((-0.0123445552, -29394.2393339, 0.299393333, 0.00002345556))
stored_int = data.astype('float16').view('int64')[0]
print(stored_int)
# 110959187158999634
recovered = np.array([stored_int], dtype='int64').view('float16')
print(recovered)
# [ -1.23443604e-02 -2.93920000e+04 2.99316406e-01 2.34842300e-05]
Note: This requires numpy version 1.6 or better, as this was the first version to support 16-bit floats.
If by int you mean the Python int datatype (which has unlimited precision in current versions), you may use the following solution:
>>> x
(-0.0123445552, -29394.2393339, 0.299393333, 2.345556e-05)
>>> def encode(data):
...     sz_data = str(data)
...     import base64
...     b64_data = base64.b16encode(sz_data)
...     int_data = int(b64_data, 16)
...     return int_data
>>> encode(x)
7475673073900173755504583442986834619410853148159171975880377161427327210207077083318036472388282266880288275998775936614297529315947984169L
>>> def decode(data):
...     int_data = data
...     import base64
...     hex_data = hex(int_data)[2:].upper()
...     if hex_data[-1] == 'L':
...         hex_data = hex_data[:-1]
...     b64_data = base64.b16decode(hex_data)
...     import ast
...     sz_data = ast.literal_eval(b64_data)
...     return sz_data
>>> decode(encode(x))
(-0.0123445552, -29394.2393339, 0.299393333, 2.345556e-05)
You can combine 4 small integers into a single integer, or two floats into a double, using the struct module:
from struct import *
s = pack('hhhh', 1, -2, 3,-4)
i = unpack('Q', s)
print i
print unpack('hhhh', s)
s = pack('ff', 1.12, -2.32)
f = unpack('d', s)
print f
print unpack('ff', pack('d', f[0]))
prints
(18445618190982447105L,)
(1, -2, 3, -4)
(-5.119999879002571,)
(1.1200000047683716, -2.319999933242798)
Basically, in this example the tuple (1, -2, 3, -4) gets packed into the integer 18445618190982447105, and the tuple (1.12, -2.32) gets packed into the double -5.119999879002571.
To pack 4 floats into a single float you will need to use half-floats; however, this is a problem here:
It looks like there is no native support for half-floats in Python as of now:
http://bugs.python.org/issue11734
However, the numpy module does have some support for half-floats (http://docs.scipy.org/doc/numpy/user/basics.types.html). Maybe you can use it somehow to pack 4 floats into a single float.
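Along the same lines as the numpy-based answer above, here is a sketch of packing four half-floats into one 64-bit float (with the usual float16 precision loss):
import numpy as np
halves = np.array([1.12, -2.32, 3.3, 4.4], dtype=np.float16)  # 4 values x 2 bytes each
packed = halves.view(np.float64)[0]                           # reinterpret the 8 bytes as one double
unpacked = np.array([packed]).view(np.float16)                # back to four float16 values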
This does not really answer your question, but what you're trying to do violates 1NF. Is changing the DB schema to introduce an intersection table really not an option?
My idea is weird, but will it work?
In [31]: nk="-0.0123445552, -29394.2393339, 0.299393333, 0.00002345556"
In [32]: nk1="".join(str(ord(x)) for x in nk)
In [33]: nk1
Out[33]: '454846484950515252535353504432455057515752465051575151515744324846505757515751515151443248464848484850515253535354'
In [34]: import math
In [35]: math.log(long(nk1), 1000)
Out[35]: 37.885954947611985
In [36]: math.pow(1000, _)
Out[36]: 4.548464849505043e+113
You can easily unpack this string (Out[33]); for example, split it at 32, which is the code for a space.
Also, this string is very long; we can reduce it to a small number with math.log, as we got in Out[35].