Matlab's vectorized sprintf like function in python - python

After using Matlab for some time I grew quite fond of its sprintf function, which is vectorized (vectorization is the crucial part of the question).
Assuming one has a list li = [1,2,3,4,5,6],
sprintf("%d %d %d\n", li)
would apply the format on the elements in li one after another returning
"1 2 3\n4 5 6\n"
as string.
My current solution does not strike me as very Pythonic:
def my_sprintf(formatstr, args):
    # number of arguments for format string:
    n = formatstr.count('%')
    res = ""
    # if there are k*n+m elements in the list, leave the last m out
    for i in range(n, len(args)+1, n):
        res += formatstr % tuple(args[i-n:i])
    return res
What would be the usual/better way of doing it in python?
Would it be possible, without explicitly eliciting the number of expected parameters from the format string (n=formatstr.count('%') feels like a hack)?
PS: For the sake of simplicity one could assume, that the number of elements in the list is a multiple of number of arguments in the format string.

You could use a variation of the grouper recipe if you get the user to pass in the chunk size.
def sprintf(iterable, fmt, n):
    args = zip(*[iter(iterable)] * n)
    return "".join([fmt % t for t in args])
Output:
In [144]: sprintf(li,"%.2f %.2f %d\n", 3)
Out[144]: '1.00 2.00 3\n4.00 5.00 6\n'
In [145]: sprintf(li,"%d %d %d\n", 3)
Out[145]: '1 2 3\n4 5 6\n'
You could handle a list whose length is not a multiple of the chunk size by using izip_longest (zip_longest in Python 3) and str.format, but then you cannot put type-specific codes in the format string without raising an error on the padded values:
from itertools import izip_longest
def sprintf(iterable, fmt, n, fillvalue=""):
    args = izip_longest(*[iter(iterable)] * n, fillvalue=fillvalue)
    return "".join([fmt.format(*t) for t in args])
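For readers on Python 3, the padded grouping described above can be sketched with itertools.zip_longest. This is only an illustration under assumptions: the name sprintf_chunks and the plain "{}" placeholders are hypothetical choices, picked so that the empty-string fill value formats without error.

```python
from itertools import zip_longest


def sprintf_chunks(fmt, values, n, fillvalue=""):
    """Apply a str.format template to consecutive groups of n values."""
    # The same iterator repeated n times makes zip_longest pull n items per group.
    groups = zip_longest(*[iter(values)] * n, fillvalue=fillvalue)
    return "".join(fmt.format(*group) for group in groups)


# Seven values with groups of three: the last group is padded with "".
print(repr(sprintf_chunks("{} {} {}\n", [1, 2, 3, 4, 5, 6, 7], 3)))
```

With a type-specific template such as "{:d} {:d} {:d}\n" the padded group would raise a ValueError, which is the limitation the answer points out.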
If you split the placeholders or get the user to pass an iterable of placeholders you could catch all the potential issues.
def sprintf(iterable, fmt, sep=" "):
    obj = object()
    args = izip_longest(*[iter(iterable)] * len(fmt), fillvalue=obj)
    return "".join(["{sep}".join([f % i for f, i in zip(fmt, t) if i is not obj]).format(sep=sep) + "\n"
                    for t in args])
Demo:
In [165]: sprintf(li, ["%.2f", "%d", "%.2f", "%2.f"])
Out[165]: '1.00 2 3.00 4\n5.00 6\n'
In [166]: sprintf(li, ["%d", "%d", "%d"])
Out[166]: '1 2 3\n4 5 6\n'
In [167]: sprintf(li, ["%f", "%f", "%.4f"])
Out[167]: '1.000000 2.000000 3.0000\n4.000000 5.000000 6.0000\n'
In [168]: sprintf(li, ["%.2f", "%d", "%.2f", "%2.f"])
Out[168]: '1.00 2 3.00 4\n5.00 6\n'

You may want to remove the += in the for loop. The following version is approximately three times faster than yours. It also works when the format string contains '%%' to print a literal % symbol in the output.
def my_sprintf(format_str, li):
    n = format_str.count('%') - 2*format_str.count('%%')
    repeats = len(li)//n
    return (format_str*repeats) % tuple(li[:repeats*n])
A less hacky way is possible if you use the newer .format method instead of %. In such a case, you can use the string.Formatter().parse() method to get the list of fields used in the format_str.
The function then looks like this:
import string
li = [1, 2, 3, 4, 5, 6, 7]
format_str = '{:d} {:d} {:d}\n'
def my_sprintf(format_str, li):
    formatter = string.Formatter()
    n = len(list(filter(lambda a: a[2] is not None,
                        formatter.parse(format_str))))
    repeats = len(li)//n
    return (format_str*repeats).format(*li[:repeats*n])
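As a small worked example of the field-counting idea, the sketch below counts replacement fields by checking the field_name item that string.Formatter().parse() yields, so escaped braces ({{ and }}) are not counted. The helper names count_fields and sprintf_format are hypothetical, and raising ValueError for a template without fields is an assumption about the desired behaviour.

```python
import string


def count_fields(format_str):
    """Count replacement fields; literal text and {{ }} escapes are skipped."""
    parsed = string.Formatter().parse(format_str)
    return sum(1 for _, field_name, _, _ in parsed if field_name is not None)


def sprintf_format(format_str, values):
    """Repeat format_str over values, dropping any incomplete trailing group."""
    n = count_fields(format_str)
    if n == 0:
        raise ValueError("format string has no replacement fields")
    repeats = len(values) // n
    return (format_str * repeats).format(*values[:repeats * n])


print(repr(sprintf_format("{:d} {:d} {:d}\n", [1, 2, 3, 4, 5, 6, 7])))
```

The seventh value is dropped because it does not fill a complete group of three.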

Related

Combining numbers together to form multiple digit number

I'm trying to combine multiple numbers together in python 3.7 but I'm having no luck.
I want it to be like such:
1 + 4 + 5 = 145
I know this is simple but I'm getting nowhere!
You can use reduce to do this in a mathematical way
>>> l = [1, 4, 5]
>>>
>>> from functools import reduce
>>> reduce(lambda x,y: 10*x+y, l)
145
Alternatively, you can use string concat
>>> int(''.join(map(str, l)))
145
If you want to do this numerically, consider what base-10 numerals means:
145 = 1 * 10**2 + 4 * 10**1 + 5 * 10**0
So, you need to get N numbers that range from N-1 to 0, in lockstep with the digits. One way to do this is with enumerate plus a bit of extra arithmetic:
def add_digits(*digits):
    total = 0
    for i, digit in enumerate(digits):
        total += digit * 10**(len(digits)-i-1)
    return total
Now:
>>> add_digits(1, 4, 5)
145
Of course this only works with sequences of digits, where you know how many digits you have in advance. What if you wanted to work with any iterable of digits, even an iterator coming from a generator expression or something? Then you can rethink the problem:
1456 = ((1 * 10 + 4) * 10 + 5) * 10 + 6
So:
def add_digits(digits):
    total = 0
    for digit in digits:
        total = total * 10 + digit
    return total
>>> add_digits((1, 3, 5, 6))
1356
>>> add_digits(n for n in range(10) if n%2)
13579
Notice that you can easily extend either version to other bases:
def add_digits(*digits, base=10):
    total = 0
    for i, digit in enumerate(digits):
        total += digit * base**(len(digits)-i-1)
    return total
>>> hex(add_digits(1, 0xF, 2, 0xA, base=16))
'0x1f2a'
… which isn't quite as easy to do with the stringy version; you can't just do int(''.join(map(str, digits)), base), but instead need to replace that str with a function that converts to a string in a given base. Which there are plenty of solutions for, but no obvious and readable one-liner.
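One possible way to make the string-based version work for other bases is sketched below. It assumes every digit is an integer in range(base) and that base is at most 36, which is the limit int() accepts; the names DIGIT_CHARS and join_digits are hypothetical.

```python
import string

# Characters int() understands for bases up to 36: 0-9 followed by a-z.
DIGIT_CHARS = string.digits + string.ascii_lowercase


def join_digits(digits, base=10):
    """Turn an iterable of digit values into an integer via a string."""
    text = "".join(DIGIT_CHARS[d] for d in digits)
    return int(text, base)


print(join_digits([1, 4, 5]))
print(hex(join_digits([1, 0xF, 2, 0xA], base=16)))
```

The lookup table replaces str as the per-digit converter, which is the extra step the paragraph above refers to.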
You should try casting the numbers as strings! When you do something like this
str(1)+str(4)+str(5)
You will get 145, but it will be a string. If you want it to be a number afterwards, then you can cast the whole thing as an integer.
int(str(1)+str(4)+str(5))
or just set the answer to a new variable and cast that as an integer.
You could just write a function that concatenates numbers or any other object/datatype as a string
concatenate = lambda *args : ''.join([str(arg) for arg in args])
a = 1
print(concatenate(4, 5, 6))
print(concatenate(a, MagicNumber(1), "3"))
But also in python you can make a class and write magic functions that control the way that objects of your class are added, subtracted etc. You could make a class to store a number and add it like you want to. You could save this code in a file and import it or paste it into your own script.
class MagicNumber():
    value = 0
    def __init__(self, value):
        self.value = int(value)
    def __str__(self):
        return str(self.value)
    def __int__(self):
        return self.value
    def __repr__(self):
        # __repr__ must return a string, not an int
        return repr(self.value)
    def __add__(self, b):
        return MagicNumber(str(self)+str(b))
if __name__ == "__main__":
    a = MagicNumber(4)
    b = MagicNumber(5)
    c = MagicNumber(6)
    print(a+b+c)
    # You could even do this but I strongly advise against it
    print(a+5+6)
And here's a link to the documentation about these "magic methods":
https://docs.python.org/3/reference/datamodel.html
The easiest way to do this is to concat them as strings, and then parse it back into a number.
x = str(1) + str(4) + str(5)
print(int(x))
or
int(str(1) + str(4) + str(5))

Python: format negative number with parentheses

Is there a way to use either string interpolation or string.format to render negative numbers into text formatted using parentheses instead of "negative signs"?
I.e. -3.14 should be (3.14).
I had hoped to do this using string interpolation or string.format rather than needing an import specifically designed for currencies or accounting.
Edit to clarify: Please assume the variable to be formatted is either an int or a float. I.e. while this can be done with regular expressions (see good answers below), I was thinking this would be a more native operation for Python's formatting functionality.
So to be clear:
import numpy as np
list_of_inputs = [-10, -10.5, -10 * np.sqrt(2), 10, 10.5, 10 * np.sqrt(2)]
for i in list_of_inputs:
    # your awesome solution goes here
should return:
(10)
(10.5)
(14.14)
10
10.5
14.14
Clearly there is some flexibility about that last one. I had hoped the "put negative numbers in parentheses" would be a natural argument of string interpolation or string.format so that I could use other formatting language while setting the display style of negative numbers.
If you just need to handle possibly-negative numeric input:
print '{0:.2f}'.format(num) if num>=0 else '({0:.2f})'.format(abs(num))
This is what subclassing the formatter class is for. Try the following:
import string
class NegativeParenFormatter(string.Formatter):
    def format_field(self, value, format_spec):
        try:
            if value<0:
                return "(" + string.Formatter.format_field(self, -value, format_spec) + ")"
            else:
                return string.Formatter.format_field(self, value, format_spec)
        except:
            return string.Formatter.format_field(self, value, format_spec)
f = NegativeParenFormatter()
print f.format("{0} is positive, {1} is negative, {2} is a string", 3, -2, "-4")
this prints:
'3 is positive, (2) is negative, -4 is a string'
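A Python 3 variant of the same subclassing idea might look like the sketch below. It is an assumption-laden illustration rather than the answer's code: the class name ParenFormatter is hypothetical, and the isinstance check against numbers.Real replaces the bare try/except, so types such as Decimal are left untouched.

```python
import string
from numbers import Real


class ParenFormatter(string.Formatter):
    """Render negative real numbers as (x) instead of -x."""

    def format_field(self, value, format_spec):
        # Only real numbers are compared with 0; strings pass through unchanged.
        if isinstance(value, Real) and value < 0:
            return "(" + super().format_field(-value, format_spec) + ")"
        return super().format_field(value, format_spec)


f = ParenFormatter()
print(f.format("{0} is positive, {1:.2f} is negative, {2} is a string", 3, -2.5, "-4"))
```

Because the sign handling lives in format_field, any other format spec (width, precision) still applies to the absolute value inside the parentheses.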
Pandas has a display option for floats and numpy has a display option for any dtype:
In [11]: df = pd.DataFrame([[1., -2], [-3., 4]], columns=['A', 'B'])
Note: A is a float column, B is an int column.
We can just write a simple formatter depending on the sign of the number:
In [12]: formatter = lambda x: '(%s)' % str(x)[1:] if x < 0 else str(x)
In [13]: pd.options.display.float_format = formatter
In [14]: df  # doesn't work for the int column :(
Out[14]:
       A  B
0    1.0 -2
1  (3.0)  4
In [15]: df.astype(float)
Out[15]:
       A      B
0    1.0  (2.0)
1  (3.0)    4.0
You can also configure numpy's print options:
In [21]: df.values  # float
Out[21]:
array([[ 1., -2.],
       [-3.,  4.]])
In [22]: df['B'].values  # int
Out[22]: array([-2,  4])
In [23]: np.set_printoptions(formatter={'int': formatter, 'float': formatter})
In [24]: df.values # float
Out[24]:
array([[1.0, (2.0)],
[(3.0), 4.0]])
In [25]: df['B'].values # int
Out[25]: array([(2), 4])
Note: this doesn't change the way the data is stored, just the way you view it.
Your easiest approach would be to use a ternary expression.
import math
num = -3.14
output = "({})".format(math.fabs(num)) if num < 0 else "{}".format(num)
I can't remember if this works with a straight print statement instead of an assignment. I will check this when I get by an interpreter.
Thanks LartS for 3.x confirmation: I further confirmed against(3.x and 2.x)
print("({})".format(math.fabs(num)) if num < 0 else "{}".format(num))
Does work
Maybe you're looking for something like this
float = -3.14
num= "(%(key)s)" %{ 'key': str(abs(float))} if float < 0 else str(float)
You can use conditionals in a Python print statement:
print "%s%d%s" % ( "(" if (i<0) else(""), i, ")" if (i<0) else("") )

What is the quickest way to get a number with unique digits in python?

Lemme clarify:
What would be the fastest way to get every number with all unique digits between two numbers. For example, 10,000 and 100,000.
Some obvious ones would be 12,345 or 23,456. I'm trying to find a way to gather all of them.
for i in xrange(LOW, HIGH):
    str_i = str(i)
    ...?
Use itertools.permutations:
from itertools import permutations
result = [
    a * 10000 + b * 1000 + c * 100 + d * 10 + e
    for a, b, c, d, e in permutations(range(10), 5)
    if a != 0
]
I relied on the following facts:
numbers between 10000 and 100000 have either 5 or 6 digits, and the only 6-digit number (100000) does not have unique digits,
itertools.permutations creates all selections of the given length, in every ordering (so both 12345 and 54321 will appear in the result),
you can run permutations directly on a sequence of integers (so there is no overhead for converting types).
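The same idea can be generalized to an arbitrary half-open range. The sketch below is an assumption-based extension, not part of the answer: the function name unique_digit_numbers is hypothetical, and it assumes 0 <= low < high. It enumerates permutations for each possible digit count and filters by range.

```python
from itertools import permutations


def unique_digit_numbers(low, high):
    """Return all n with low <= n < high whose decimal digits are all distinct."""
    result = []
    shortest = len(str(low))
    longest = min(len(str(high - 1)), 10)  # at most 10 distinct digits exist
    for length in range(shortest, longest + 1):
        for digits in permutations(range(10), length):
            if digits[0] == 0 and length > 1:
                continue  # skip numbers with a leading zero
            n = 0
            for d in digits:
                n = n * 10 + d
            if low <= n < high:
                result.append(n)
    return result


print(len(unique_digit_numbers(10000, 100000)))
print(unique_digit_numbers(95, 105))
```

For narrow ranges with many digits this does more work than a direct scan, so it is mainly useful when the range covers most numbers of a given length.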
EDIT:
Thanks for accepting my answer, but here is the data for the others, comparing mentioned results:
>>> from timeit import timeit
>>> stmt1 = '''
a = []
for i in xrange(10000, 100000):
    s = str(i)
    if len(set(s)) == len(s):
        a.append(s)
'''
>>> stmt2 = '''
result = [
    int(''.join(digits))
    for digits in permutations('0123456789', 5)
    if digits[0] != '0'
]
'''
>>> setup2 = 'from itertools import permutations'
>>> stmt3 = '''
result = [
    x for x in xrange(10000, 100000)
    if len(set(str(x))) == len(str(x))
]
'''
>>> stmt4 = '''
result = [
    a * 10000 + b * 1000 + c * 100 + d * 10 + e
    for a, b, c, d, e in permutations(range(10), 5)
    if a != 0
]
'''
>>> setup4 = setup2
>>> timeit(stmt1, number=100)
7.955858945846558
>>> timeit(stmt2, setup2, number=100)
1.879319190979004
>>> timeit(stmt3, number=100)
8.599710941314697
>>> timeit(stmt4, setup4, number=100)
0.7493319511413574
So, to sum up:
solution no. 1 took 7.96 s,
solution no. 2 (my original solution) took 1.88 s,
solution no. 3 took 8.6 s,
solution no. 4 (my updated solution) took 0.75 s,
Last solution looks around 10x faster than solutions proposed by others.
Note: My solution has some imports that I did not measure. I assumed your imports will happen once, and code will be executed multiple times. If it is not the case, please adapt the tests to your needs.
EDIT #2: I have added another solution, as operating on strings is not even necessary: the same result can be achieved with permutations of real integers. I bet this can be sped up even more.
Cheap way to do this:
for i in xrange(LOW, HIGH):
    s = str(i)
    if len(set(s)) == len(s):
        # number has unique digits
        pass
This uses a set to collect the unique digits, then checks to see that there are as many unique digits as digits in total.
List comprehension will work a treat here (logic stolen from nneonneo):
[x for x in xrange(LOW,HIGH) if len(set(str(x)))==len(str(x))]
And a timeit for those who are curious:
> python -m timeit '[x for x in xrange(10000,100000) if len(set(str(x)))==len(str(x))]'
10 loops, best of 3: 101 msec per loop
Here is an answer from scratch:
def permute(L, max_len):
    allowed = L[:]
    results, seq = [], range(max_len)
    def helper(d):
        if d==0:
            results.append(''.join(seq))
        else:
            for i in xrange(len(L)):
                if allowed[i]:
                    allowed[i]=False
                    seq[d-1]=L[i]
                    helper(d-1)
                    allowed[i]=True
    helper(max_len)
    return results
A = permute(list("1234567890"), 5)
print A
print len(A)
print all(map(lambda a: len(set(a))==len(a), A))
It perhaps could be further optimized by using an interval representation of the allowed elements, although for n=10, I'm not sure it will make a difference. I could also transform the recursion into a loop, but in this form it is more elegant and clear.
Edit: Here are the timings of the various solutions
2.75808000565 (My solution)
8.22729802132 (Sol 1)
1.97218298912 (Sol 2)
9.659760952 (Sol 3)
0.841020822525 (Sol 4)
no_list=['115432', '555555', '1234567', '5467899', '3456789', '987654', '444444']
rep_list=[]
nonrep_list=[]
for no in no_list:
    u=[]
    for digit in no:
        # print(digit)
        if digit not in u:
            u.append(digit)
    # print(u)
    # a repeated digit is present
    if len(no) != len(u):
        # print(no)
        rep_list.append(no)
    # no repeated digit
    else:
        nonrep_list.append(no)
print('Numbers which have no repetition are=',nonrep_list)
print('Numbers which have repetition are=',rep_list)

What is the simplest way to swap each pair of adjoining chars in a string with Python?

I want to swap each pair of characters in a string. '2143' becomes '1234', 'badcfe' becomes 'abcdef'.
How can I do this in Python?
oneliner:
>>> s = 'badcfe'
>>> ''.join([ s[x:x+2][::-1] for x in range(0, len(s), 2) ])
'abcdef'
s[x:x+2] returns string slice from x to x+2; it is safe for odd len(s).
[::-1] reverses the string in Python
range(0, len(s), 2) returns 0, 2, 4, 6 ... while x < len(s)
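The three notes above can be wrapped into a small function to check the odd-length behaviour. This is a minimal sketch; the name swap_pairs is a hypothetical choice.

```python
def swap_pairs(s):
    """Swap each pair of adjacent characters; a trailing single char is kept."""
    # s[i:i+2] is at most two characters, so reversing it never raises IndexError.
    return "".join(s[i:i + 2][::-1] for i in range(0, len(s), 2))


print(swap_pairs("badcfe"))
print(swap_pairs("2143"))
print(swap_pairs("abc"))
```

For the odd-length input the final slice holds a single character, which reverses to itself and is appended unchanged.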
The usual way to swap two items in Python is:
a, b = b, a
So it would seem to me that you would just do the same with an extended slice. However, it is slightly complicated because strings aren't mutable; so you have to convert to a list and then back to a string.
Therefore, I would do the following:
>>> s = 'badcfe'
>>> t = list(s)
>>> t[::2], t[1::2] = t[1::2], t[::2]
>>> ''.join(t)
'abcdef'
Here's one way...
>>> s = '2134'
>>> def swap(c, i, j):
...     c = list(c)
...     c[i], c[j] = c[j], c[i]
...     return ''.join(c)
...
>>> swap(s, 0, 1)
'1234'
>>>
''.join(s[i+1]+s[i] for i in range(0, len(s), 2)) # 10.6 usec per loop
or
''.join(x+y for x, y in zip(s[1::2], s[::2])) # 10.3 usec per loop
or if the string can have an odd length:
''.join(x+y for x, y in itertools.izip_longest(s[1::2], s[::2], fillvalue=''))
Note that this won't work with old versions of Python (if I'm not mistaken, older than 2.5).
The benchmark was run on python-2.7-8.fc14.1.x86_64 and a Core 2 Duo 6400 CPU with s='0123456789'*4.
If performance or elegance is not an issue, and you just want clarity and have the job done then simply use this:
def swap(text, ch1, ch2):
    text = text.replace(ch2, '!',)
    text = text.replace(ch1, ch2)
    text = text.replace('!', ch1)
    return text
This allows you to swap or simply replace chars or substring.
For example, to swap 'ab' <-> 'de' in a text:
_str = "abcdefabcdefabcdef"
print swap(_str, 'ab','de') #decabfdecabfdecabf
Loop over length of string by twos and swap:
def oddswap(st):
    s = list(st)
    for c in range(0,len(s),2):
        t=s[c]
        s[c]=s[c+1]
        s[c+1]=t
    return "".join(s)
giving:
>>> s
'foobar'
>>> oddswap(s)
'ofbora'
and fails on odd-length strings with an IndexError exception.
There is no need to make a list. The following works for even-length strings:
r = ''
for i in range(0, len(s), 2):
    r += s[i + 1] + s[i]
s = r
A more general answer... you can do any single pairwise swap with tuples or strings using this approach:
# item can be a string or tuple and swap can be a list or tuple of two
# indices to swap
def swap_items_by_copy(item, swap):
    s0 = min(swap)
    s1 = max(swap)
    if isinstance(item,str):
        return item[:s0]+item[s1]+item[s0+1:s1]+item[s0]+item[s1+1:]
    elif isinstance(item,tuple):
        return item[:s0]+(item[s1],)+item[s0+1:s1]+(item[s0],)+item[s1+1:]
    else:
        raise ValueError("Type not supported")
Then you can invoke it like this:
>>> swap_items_by_copy((1,2,3,4,5,6),(1,2))
(1, 3, 2, 4, 5, 6)
>>> swap_items_by_copy("hello",(1,2))
'hlelo'
>>>
Thankfully python gives empty strings or tuples for the cases where the indices refer to non existent slices.
To swap the characters at positions l and r of a string a (with l < r):
def swap(a, l, r):
    a = a[0:l] + a[r] + a[l+1:r] + a[l] + a[r+1:]
    return a
Example:
swap("aaabcccdeee", 3, 7) returns "aaadcccbeee"
Do you want the digits sorted? Or are you swapping odd/even indexed digits? Your example is totally unclear.
Sort:
s = '2143'
p=list(s)
p.sort()
s = "".join(p)
s is now '1234'. The trick is here that list(string) breaks it into characters.
Like so:
>>> s = "2143658709"
>>> ''.join([s[i+1] + s[i] for i in range(0, len(s), 2)])
'1234567890'
>>> s = "badcfe"
>>> ''.join([s[i+1] + s[i] for i in range(0, len(s), 2)])
'abcdef'
re.sub(r'(.)(.)',r"\2\1",'abcdef1234')
However re is a bit slow.
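As a runnable version of the regex approach, the sketch below adds the missing import and wraps the call in a function. The name swap_pairs_regex is hypothetical, and the re.DOTALL flag is an addition so that newline characters are swapped too.

```python
import re


def swap_pairs_regex(text):
    """Swap adjacent characters by capturing two at a time and reversing them."""
    # (.)(.) matches two characters; \2\1 writes them back in swapped order.
    return re.sub(r"(.)(.)", r"\2\1", text, flags=re.DOTALL)


print(swap_pairs_regex("abcdef1234"))
print(swap_pairs_regex("abc"))
```

A trailing unpaired character is left in place because the pattern needs two characters to match.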
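def swap(s):
    i = iter(s)
    # zip(i, i) pulls two characters at a time and stops cleanly at the end
    # (a bare next() here would raise RuntimeError in Python 3.7+)
    for a, b in zip(i, i):
        yield b
        yield a
''.join(swap("abcdef1234"))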
One more way:
>>> s='123456'
>>> ''.join([''.join(el) for el in zip(s[1::2], s[0::2])])
'214365'
>>> import ctypes
>>> s = 'abcdef'
>>> mutable = ctypes.create_string_buffer(s)
>>> for i in range(0,len(s),2):
...     mutable[i], mutable[i+1] = mutable[i+1], mutable[i]
...
>>> s = mutable.value
>>> print s
badcfe
def revstr(a):
    b=''
    if len(a)%2==0:
        for i in range(0,len(a),2):
            b += a[i + 1] + a[i]
        a=b
    else:
        c=a[-1]
        for i in range(0,len(a)-1,2):
            b += a[i + 1] + a[i]
        b=b+a[-1]
        a=b
    return b
a=raw_input('enter a string')
n=revstr(a)
print n
A bit late to the party, but there is actually a pretty simple way to do this:
The index sequence you are looking for can be expressed as the sum of two sequences:
0 1 2 3 ...
+1 -1 +1 -1 ...
Both are easy to express. The first one is just range(N). A sequence that toggles for each i in that range is i % 2. You can adjust the toggle by scaling and offsetting it:
i % 2 -> 0 1 0 1 ...
1 - i % 2 -> 1 0 1 0 ...
2 * (1 - i % 2) -> 2 0 2 0 ...
2 * (1 - i % 2) - 1 -> +1 -1 +1 -1 ...
The entire expression simplifies to i + 1 - 2 * (i % 2), which you can use to join the string almost directly:
result = ''.join(string[i + 1 - 2 * (i % 2)] for i in range(len(string)))
This will work only for an even-length string, so you can check for overruns using min:
N = len(string)
result = ''.join(string[min(i + 1 - 2 * (i % 2), N - 1)] for i in range(N))
Basically a one-liner, doesn't require any iterators beyond a range over the indices, and some very simple integer math.
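The index arithmetic above can be checked with a short sketch. The function names are hypothetical; the comparison with i ^ 1 (flipping the lowest bit) is an extra observation that is not part of the answer.

```python
def partner_index(i):
    """Index of the character that should appear at position i after swapping."""
    return i + 1 - 2 * (i % 2)


def swap_adjacent(text):
    n = len(text)
    # min(..., n - 1) keeps the last index in range for odd-length strings.
    return "".join(text[min(partner_index(i), n - 1)] for i in range(n))


print([partner_index(i) for i in range(6)])
print(swap_adjacent("badcfe"))
print(swap_adjacent("abc"))
```

Even indices map to the next index and odd indices to the previous one, which is why the expression agrees with i ^ 1 for non-negative i.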
While the above solutions do work, there is a very simple solution, in layman's terms. Someone still learning Python and strings can use the other answers without really understanding how they work or what each part of the code is doing, since they are posted as "this works" without a full explanation. The following swaps every pair of adjacent characters in a string and is easy for beginners to follow.
It simply iterates through the string (of any length) in steps of two, starting from index 0, and builds a new string (swapped_pair) by appending the character at the current index + 1 (the second character of the pair) and then the character at the current index (the first character). That is, index 1 is put at index 0 and index 0 is put at index 1, and this repeats for each pair in the string.
The code also includes a step to make the string even length, since the swap only works on complete pairs.
DrSanjay Bhakkad's post above is also a good one that works for even or odd strings and does essentially the same thing as the code below.
string = "abcdefghijklmnopqrstuvwxyz123"
# use this before the loop if the string must be even length but may be odd
if len(string) % 2 != 0:
    string = string[:-1]
# loop to swap every pair of characters; stopping at len(string) - 1 avoids
# an IndexError when the string has odd length
swapped_pair = ""
for i in range(0, len(string) - 1, 2):
    swapped_pair += (string[i + 1] + string[i])
# use this after the loop instead, to keep the last character of an odd-length string
if len(string) % 2 != 0:
    swapped_pair += string[-1]
print(swapped_pair)
badcfehgjilknmporqtsvuxwzy21 # output if the "needs to be even" code used
badcfehgjilknmporqtsvuxwzy213 # output if the "even or odd" code used
One of the easiest way to swap first two characters from a String is
inputString = '2134'
extractChar = inputString[0:2]
swapExtractedChar = extractChar[::-1]  # reverse the order of the two characters
swapFirstTwoChar = swapExtractedChar + inputString[2:]
# swapFirstTwoChar = inputString[0:2][::-1] + inputString[2:]  # one-line version
print(swapFirstTwoChar)
#Works on even/odd size strings
str = '2143657'
newStr = ''
for i in range(len(str)//2):
    newStr += str[i*2+1] + str[i*2]
if len(str)%2 != 0:
    newStr += str[-1]
print(newStr)
#Think about how index works with string in Python,
>>> a = "123456"
>>> a[::-1]
'654321'

How to convert a string of bytes into an int?

How can I convert a string of bytes into an int in python?
Say like this: 'y\xcc\xa6\xbb'
I came up with a clever/stupid way of doing it:
sum(ord(c) << (i * 8) for i, c in enumerate('y\xcc\xa6\xbb'[::-1]))
I know there has to be something builtin or in the standard library that does this more simply...
This is different from converting a string of hex digits for which you can use int(xxx, 16), but instead I want to convert a string of actual byte values.
UPDATE:
I kind of like James' answer a little better because it doesn't require importing another module, but Greg's method is faster:
>>> from timeit import Timer
>>> Timer('struct.unpack("<L", "y\xcc\xa6\xbb")[0]', 'import struct').timeit()
0.36242198944091797
>>> Timer("int('y\xcc\xa6\xbb'.encode('hex'), 16)").timeit()
1.1432669162750244
My hacky method:
>>> Timer("sum(ord(c) << (i * 8) for i, c in enumerate('y\xcc\xa6\xbb'[::-1]))").timeit()
2.8819329738616943
FURTHER UPDATE:
Someone asked in comments what's the problem with importing another module. Well, importing a module isn't necessarily cheap, take a look:
>>> Timer("""import struct\nstruct.unpack(">L", "y\xcc\xa6\xbb")[0]""").timeit()
0.98822188377380371
Including the cost of importing the module negates almost all of the advantage that this method has. I believe that this will only include the expense of importing it once for the entire benchmark run; look what happens when I force it to reload every time:
>>> Timer("""reload(struct)\nstruct.unpack(">L", "y\xcc\xa6\xbb")[0]""", 'import struct').timeit()
68.474128007888794
Needless to say, if you're doing a lot of executions of this method per import, then this becomes proportionally less of an issue. It's also probably I/O cost rather than CPU, so it may depend on the capacity and load characteristics of the particular machine.
In Python 3.2 and later, use
>>> int.from_bytes(b'y\xcc\xa6\xbb', byteorder='big')
2043455163
or
>>> int.from_bytes(b'y\xcc\xa6\xbb', byteorder='little')
3148270713
according to the endianness of your byte-string.
This also works for bytestring-integers of arbitrary length, and for two's-complement signed integers by specifying signed=True. See the docs for from_bytes.
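A short sketch of the points above, for Python 3.2 or later: both byte orders, the round trip back to bytes with int.to_bytes, and a signed two's-complement value. The variable names are illustrative only.

```python
raw = b"y\xcc\xa6\xbb"

big = int.from_bytes(raw, byteorder="big")
little = int.from_bytes(raw, byteorder="little")

# to_bytes is the inverse operation; the length must be given explicitly.
restored = big.to_bytes(len(raw), byteorder="big")

# signed=True interprets the bytes as a two's-complement number.
negative = int.from_bytes(b"\xff\xfe", byteorder="big", signed=True)

print(big, little, restored, negative)
```

The two-byte value 0xfffe reads as -2 when signed, and as 65534 when the signed argument is left at its default of False.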
You can also use the struct module to do this:
>>> struct.unpack("<L", "y\xcc\xa6\xbb")[0]
3148270713L
As Greg said, you can use struct if you are dealing with binary values, but if you just have a "hex number" but in byte format you might want to just convert it like:
s = 'y\xcc\xa6\xbb'
num = int(s.encode('hex'), 16)
...this is the same as:
num = struct.unpack(">L", s)[0]
...except it'll work for any number of bytes.
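The 'hex' codec used above exists only in Python 2. In Python 3.5 and later, a comparable sketch can use bytes.hex(); this is an assumed substitution, not the answer's code, and it fails with ValueError on empty input because int('', 16) is invalid.

```python
s = b"y\xcc\xa6\xbb"

# bytes.hex() gives the big-endian hex digits, which int() parses in base 16.
num = int(s.hex(), 16)
print(num)

# The same expression works for any non-empty length, e.g. five bytes.
print(int(b"\x01\x00\x00\x00\x00".hex(), 16))
```

As in the original, the bytes are interpreted as an unsigned big-endian number.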
I use the following function to convert data between int, hex and bytes.
def bytes2int(str):
    return int(str.encode('hex'), 16)
def bytes2hex(str):
    return '0x'+str.encode('hex')
def int2bytes(i):
    h = int2hex(i)
    return hex2bytes(h)
def int2hex(i):
    return hex(i)
def hex2int(h):
    if len(h) > 1 and h[0:2] == '0x':
        h = h[2:]
    if len(h) % 2:
        h = "0" + h
    return int(h, 16)
def hex2bytes(h):
    if len(h) > 1 and h[0:2] == '0x':
        h = h[2:]
    if len(h) % 2:
        h = "0" + h
    return h.decode('hex')
Source: http://opentechnotes.blogspot.com.au/2014/04/convert-values-to-from-integer-hex.html
import array
integerValue = array.array("I", 'y\xcc\xa6\xbb')[0]
Warning: the above is strongly platform-specific. Both the "I" specifier and the endianness of the string->int conversion are dependent on your particular Python implementation. But if you want to convert many integers/strings at once, then the array module does it quickly.
In Python 2.x, you could use the format specifiers <B for unsigned bytes, and <b for signed bytes with struct.unpack/struct.pack.
E.g:
Let x = '\xff\x10\x11'
data_ints = struct.unpack('<' + 'B'*len(x), x) # [255, 16, 17]
And:
data_bytes = struct.pack('<' + 'B'*len(data_ints), *data_ints) # '\xff\x10\x11'
That * is required!
See https://docs.python.org/2/library/struct.html#format-characters for a list of the format specifiers.
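The same per-byte pack and unpack also runs on Python 3 if the input is a bytes object. The sketch below is an illustration under that assumption and uses a repeat count ("3B") instead of repeating the format character; note that struct.unpack returns a tuple rather than a list.

```python
import struct

x = b"\xff\x10\x11"

# "<3B" means three unsigned bytes; "<3b" reads the same bytes as signed.
data_ints = struct.unpack("<%dB" % len(x), x)
signed_ints = struct.unpack("<%db" % len(x), x)

# The * unpacks the tuple into separate arguments, one per format item.
data_bytes = struct.pack("<%dB" % len(data_ints), *data_ints)

print(data_ints, signed_ints, data_bytes)
```

The first byte 0xff shows the difference between the two specifiers: 255 as unsigned and -1 as signed.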
>>> reduce(lambda s, x: s*256 + x, bytearray("y\xcc\xa6\xbb"))
2043455163
Test 1: inverse:
>>> hex(2043455163)
'0x79cca6bb'
Test 2: Number of bytes > 8:
>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAAA"))
338822822454978555838225329091068225L
Test 3: Increment by one:
>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAAB"))
338822822454978555838225329091068226L
Test 4: Append one byte, say 'A':
>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAABA"))
86738642548474510294585684247313465921L
Test 5: Divide by 256:
>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAABA"))/256
338822822454978555838225329091068226L
Result equals the result of Test 3, as expected.
I was struggling to find a solution for arbitrary length byte sequences that would work under Python 2.x. Finally I wrote this one, it's a bit hacky because it performs a string conversion, but it works.
Function for Python 2.x, arbitrary length
def signedbytes(data):
    """Convert a bytearray into an integer, considering the first bit as
    sign. The data must be big-endian."""
    negative = data[0] & 0x80 > 0
    if negative:
        inverted = bytearray(~d % 256 for d in data)
        return -signedbytes(inverted) - 1
    encoded = str(data).encode('hex')
    return int(encoded, 16)
This function has two requirements:
The input data needs to be a bytearray. You may call the function like this:
s = 'y\xcc\xa6\xbb'
n = signedbytes(bytearray(s))
The data needs to be big-endian. In case you have a little-endian value, you should reverse it first:
n = signedbytes(bytearray(s)[::-1])
Of course, this should be used only if arbitrary length is needed. Otherwise, stick with more standard ways (e.g. struct).
int.from_bytes is the best solution if you are at version >=3.2.
The "struct.unpack" solution requires a string so it will not apply to arrays of bytes.
Here is another solution:
def bytes2int(tb, order='big'):
    if order == 'big': seq=[0,1,2,3]
    elif order == 'little': seq=[3,2,1,0]
    i = 0
    for j in seq: i = (i<<8)+tb[j]
    return i
hex( bytes2int( [0x87, 0x65, 0x43, 0x21])) returns '0x87654321'.
It handles big and little endianness and is easily modifiable for 8 bytes
As mentioned above using unpack function of struct is a good way. If you want to implement your own function there is an another solution:
def bytes_to_int(bytes):
    result = 0
    for b in bytes:
        result = result * 256 + int(b)
    return result
In python 3 you can easily convert a byte string into a list of integers (0..255) by
>>> list(b'y\xcc\xa6\xbb')
[121, 204, 166, 187]
A decently speedy method utilizing array.array I've been using for some time:
predefined variables:
offset = 0
size = 4
big = True # endian
arr = array('B')
arr.fromstring("\x00\x00\xff\x00") # 5 bytes (encoding issues) [0, 0, 195, 191, 0]
to int: (read)
val = 0
for v in arr[offset:offset+size][::pow(-1,not big)]: val = (val<<8)|v
from int: (write)
val = 16384
arr[offset:offset+size] = \
array('B',((val>>(i<<3))&255 for i in range(size)))[::pow(-1,not big)]
It's possible these could be faster though.
EDIT:
For some numbers, here's a performance test (Anaconda 2.3.0) showing stable averages on read in comparison to reduce():
========================= byte array to int.py =========================
5000 iterations; threshold of min + 5000ns:
code                                      | min        | max           | avg          | efficiency
val = 0 \nfor v in arr: val = (val<<8)|v  | 5373.848ns | 850009.965ns  | ~8649.64ns   | 62.128%
val = reduce( shift, arr )                | 6489.921ns | 5094212.014ns | ~12040.269ns | 53.902%
This is a raw performance test, so the endian pow-flip is left out.
The shift function shown applies the same shift-oring operation as the for loop, and arr is just array.array('B',[0,0,255,0]) as it has the fastest iterative performance next to dict.
I should probably also note efficiency is measured by accuracy to the average time.
