I am trying to decompile asm code to python. I encountered the following line
movsx eax, byte ptr [edi]
I am looking for a way to do signed extension of a byte in python. I am currently using bytearray to get the individual bytes. After getting the individual bytes I need to do a signed extension for each of them.
I use the following snippet:
# sign extend b low bits in x
# from "Bit Twiddling Hacks"
def SIGNEXT(x, b):
m = 1 << (b - 1)
x = x & ((1 << b) - 1)
return (x ^ m) - m
In your case b will be 8. You can probably precalculate the masks for a bit of speedup.
The referenced hack can be found here.
Related
I'm trying to implement the djb2 hash in Python.
Here it is in C:
/* djb2 hash http://www.cse.yorku.ca/~oz/hash.html */
uint64_t djb2(size_t len, char const str[len]) {
uint64_t hash = 5381;
uint8_t c;
for(size_t i = 0; i < len; i++) {
c = str[i];
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
}
return hash;
}
And here's my attempt in Python:
from ctypes import c_uint64, c_byte, cast, POINTER
def djb2(string: str) -> c_uint64:
hash = c_uint64(5381)
raw_bytes = cast(string, POINTER(c_byte * len(string)))[0]
for i in range(0, len(raw_bytes)):
hash = c_uint64((((((hash.value << 5) & 0xffffffffffffffff) + hash.value) & 0xffffffffffffffff) + raw_bytes[i]) & 0xffffffffffffffff) # hash * 33 + c
return hash
However, I'm getting different results between the two, which I suspect is because of different overflow behavior, or otherwise mathematical differences.
The reason for the masking in the python version was to attempt to force an overflow (based on this answer).
You can implement the algorithm being run by the C code very easily in pure Python, without needing any ctypes stuff. Just do it all with regular Python integers, and take a modulus at the end (the high bits won't effect the lower ones for the operations you're doing):
def djb2(string: bytes) -> int: # note, use a bytestring for this, not a Unicode string!
h = 5381
for c in string: # iterating over the bytestring directly gives integer values
h = h * 33 + c # use the computation from the C comments, but consider ^ instead of +
return h % 2**64 # note you may actually want % 2**32, as this hash is often 32-bit
As I commented in the code, since this is an operation defined on bytestrings, you should use a bytes instance as the argument. Note that there are a bunch of different implementations of this algorithm. Some use use ^ (bitwise xor) instead of + in the step where you update the hash value, and it's often defined to use an unsigned long which was usually 32-bits instead of the explicitly 64-bit integer the C version in your question uses.
When calculating DJB2 hash in Python, you have to avoid using long arithmetic. For this purpose, you have to do hash &= 0xFFFFFFFFFFFFFFFF after each iteration.
Here is a proper one-liner implementation of DJB2 in Python:
import functools, itertools
djb2 = lambda x: functools.reduce(lambda x,c: (x*33 + c) & ((1<<64)-1), itertools.chain([5381], x))
Notes:
because Python is a scripting language, doing the (x << 5) + x instead of x*33 is not more efficient
((1<<64)-1) is just a short for 0xFFFFFFFFFFFFFFFF
Need some help understanding python solutions of leetcode 371. "Sum of Two Integers". I found https://discuss.leetcode.com/topic/49900/python-solution/2 is the most voted python solution, but I am having problem understand it.
How to understand the usage of "% MASK" and why "MASK = 0x100000000"?
How to understand "~((a % MIN_INT) ^ MAX_INT)"?
When sum beyond MAX_INT, the functions yells negative value (for example getSum(2147483647,2) = -2147483647), isn't that incorrect?
class Solution(object):
def getSum(self, a, b):
"""
:type a: int
:type b: int
:rtype: int
"""
MAX_INT = 0x7FFFFFFF
MIN_INT = 0x80000000
MASK = 0x100000000
while b:
a, b = (a ^ b) % MASK, ((a & b) << 1) % MASK
return a if a <= MAX_INT else ~((a % MIN_INT) ^ MAX_INT)
Let's disregard the MASK, MAX_INT and MIN_INT for a second.
Why does this black magic bitwise stuff work?
The reason why the calculation works is because (a ^ b) is "summing" the bits of a and b. Recall that bitwise xor is 1 when the bits differ, and 0 when the bits are the same. For example (where D is decimal and B is binary), 20D == 10100B, and 9D = 1001B:
10100
1001
-----
11101
and 11101B == 29D.
But, if you have a case with a carry, it doesn't work so well. For example, consider adding (bitwise xor) 20D and 20D.
10100
10100
-----
00000
Oops. 20 + 20 certainly doesn't equal 0. Enter the (a & b) << 1 term. This term represents the "carry" for each position. On the next iteration of the while loop, we add in the carry from the previous loop. So, if we go with the example we had before, we get:
# First iteration (a is 20, b is 20)
10100 ^ 10100 == 00000 # makes a 0
(10100 & 10100) << 1 == 101000 # makes b 40
# Second iteration:
000000 ^ 101000 == 101000 # Makes a 40
(000000 & 101000) << 1 == 0000000 # Makes b 0
Now b is 0, we are done, so return a. This algorithm works in general, not just for the specific cases I've outlined. Proof of correctness is left to the reader as an exercise ;)
What do the masks do?
All the masks are doing is ensuring that the value is an integer, because your code even has comments stating that a, b, and the return type are of type int. Thus, since the maximum possible int (32 bits) is 2147483647. So, if you add 2 to this value, like you did in your example, the int overflows and you get a negative value. You have to force this in Python, because it doesn't respect this int boundary that other strongly typed languages like Java and C++ have defined. Consider the following:
def get_sum(a, b):
while b:
a, b = (a ^ b), (a & b) << 1
return a
This is the version of getSum without the masks.
print get_sum(2147483647, 2)
outputs
2147483649
while
print Solution().getSum(2147483647, 2)
outputs
-2147483647
due to the overflow.
The moral of the story is the implementation is correct if you define the int type to only represent 32 bits.
Here is solution works in every case
cases
- -
- +
+ -
+ +
solution
python default int size is not 32bit, it is very large number, so to prevent overflow and stop running into infinite loop, we use 32bit mask to limit int size to 32bit (0xffffffff)
a,b=-1,-1
mask=0xffffffff
while (b & mask):
carry=a & b
a=a^b
b=carray <<1
print( (a&Mask) if b>0 else a)
For me, Matt's solution stuck in inifinite loop with inputs Solution().getSum(-1, 1)
So here is another (much slower) approach based on math:
import math
def getSum(a: int, b: int) -> int:
return int(math.log2(2**a * 2**b))
Below a and b (hex), representing two's complement signed binary numbers.
For example:
a = 0x17c7cc6e
b = 0xc158a854
Now I want to know the signed representation of a & b in base 10. Sorry I'm a low level programmer and new to python; feel very stupid for asking this. I don't care about additional library's but the answer should be simple and straight forward. Background: a & b are extracted data from a UDP packet. I have no control over the format. So please don't give me an answer that would assume I can change the format of those varibles before hand.
I have converted a & b into the following with this:
aBinary = bin(int(a, 16))[2:].zfill(32) => 00010111110001111100110001101110 => 398969966
bBinary = bin(int(b, 16))[2:].zfill(32) => 11000001010110001010100001010100 => -1051154348
I was trying to do something like this (doesn't work):
if aBinary[1:2] == 1:
aBinary = ~aBinary + int(1, 2)
What is the proper way to do this in python?
why not using ctypes ?
>>> import ctypes
>>> a = 0x17c7cc6e
>>> ctypes.c_int32(a).value
398969966
>>> b = 0xc158a854
>>> ctypes.c_int32(b).value
-1051154348
A nice way to do this in Python is using bitwise operations. For example, for 32-bit values:
def s32(value):
return -(value & 0x80000000) | (value & 0x7fffffff)
Applying this to your values:
>>> s32(a)
398969966
>>> s32(b)
-1051154348
What this function does is sign-extend the value so it's correctly interpreted with the right sign and value.
Python is a bit tricky in that it uses arbitrary precision integers, so negative numbers are treated as if there were an infinite series of leading 1 bits. For example:
>>> bin(-42 & 0xff)
'0b11010110'
>>> bin(-42 & 0xffff)
'0b1111111111010110'
>>> bin(-42 & 0xffffffff)
'0b11111111111111111111111111010110'
>>> import numpy
>>> numpy.int32(0xc158a854)
-1051154348
You'll have to know at least the width of your data. For instance, 0xc158a854 has 8 hexadecimal digits so it must be at least 32 bits wide; it appears to be an unsigned 32 bit value. We can process it using some bitwise operations:
In [232]: b = 0xc158a854
In [233]: if b >= 1<<31: b -= 1<<32
In [234]: b
Out[234]: -1051154348L
The L here marks that Python 2 has switched to processing the value as a long; it's usually not important, but in this case indicates that I've been working with values outside the common int range for this installation. The tool to extract data from binary structures such as UDP packets is struct.unpack; if you just tell it that your value is signed in the first place, it will produce the correct value:
In [240]: s = '\xc1\x58\xa8\x54'
In [241]: import struct
In [242]: struct.unpack('>i', s)
Out[242]: (-1051154348,)
That assumes two's complement representation; one's complement (such as the checksum used in UDP), sign and magnitude, or IEEE 754 floating point are some less common encodings for numbers.
Another modern solution:
>>> b = 0xc158a854
>>> int.from_bytes(bytes.fromhex(hex(b)[2:]), byteorder='big', signed=True)
-1051154348
2^31 = 0x80000000 (sign bit, which in two's compliment represents -2^31)
2^31-1 = 0x7fffffff (all the positive bits)
and hence (n & 0x7fffffff) - (n & 0x80000000) will apply the sign correctly
you could even do, n - ((n & 0x80000000)<<1) to subtract the msb value twice
or finally there's, (n & 0x7fffffff) | -(n & 0x80000000) to just merge the negative bit, not subtract
def signedHex(n): return (n & 0x7fffffff) | -(n & 0x80000000)
signedHex = lambda n: (n & 0x7fffffff) | -(n & 0x80000000)
value=input("enter hexa decimal value for getting compliment values:=")
highest_value="F"*len(value)
resulting_decimal=int(highest_value,16)-int(value,16)
ones_compliment=hex(resulting_decimal)
twos_compliment=hex(r+1)
print(f'ones compliment={ones_compliment}\n twos complimet={twos_compliment}')
Just to be fully up front, this is regarding a homework, but this in itself is not the assignment. The assignment is using noise to create cool graphics. I have a bit of experience in Python but not enough to figure this simple thing out.
I'm having trouble generating a seeded-random [-1,1]. The pseudocode my teacher gave me is from Hugo Elias.
Pseudocode:
function Noise1(integer x, integer y)
n = x + y * 57
n = (n<<13) ^ n;
return ( 1.0 - ( (n * (n * n * 15731 + 789221) + 1376312589) & 7fffffff) / 1073741824.0);
end function
My attempt in Python:
def noise(x, y):
n = x + y * 57
n = (n<<5) ^ n;
return ( 1.0 - ( (n * (n * n * 15731 + 789221) + 1376312589) & 7fffffff) / 1073741824.0)
The problem is the & 7fffffff bit in return statement. First, I'm not sure what that operation is. Maybe a bit-shift? Second, I'm not sure how to do that operation in Python. I had just removed that part, but I am getting huge negative numbers, nowhere near a [-1,1]
The & symbol stands for bitwise AND
The ^ symbol stands for bitwise XOR
and the 7FFFFFFF is a HEX number.
In programming you can represent a hex number using 0x where 7FFFFFFF is 0x7FFFFFFF
Some further reading on hex numbers.
To do a binary AND in python you just use & 0x7FFFFFFF
see more here about bitwise operations in python
I replaced 7FFFFFFF with 0x7FFFFFFF in your existing code and tried plugging in some random values and all the answers which I got were in [-1, 1].
The problem is the & 7fffffff bit in return statement. First, I'm not sure what that operation is. Maybe a bit-shift?
A bit mask. << is used for shifting. & is bitwise-AND. 7fffffff, interpreted as a hexadecimal number, is a quantity that, if re-written in binary, has 31 bits, all of which are 1. The effect is to select the low-order 31 bits of the value.
To tell Python that 7fffffff should be interpreted as a hexadecimal number, you must prefix it with 0x, hence 0x7fffffff.
Second, I'm not sure how to do that operation in Python.
The same way as in the pseudocode (i.e. with &).
I have a bit counting method that I am trying to make as fast as possible. I want to try the algorithm below from Bit Twiddling Hacks, but I don't know C. What is 'type T' and what is the python equivalent of (T)~(T)0/3?
A generalization of the best bit
counting method to integers of
bit-widths upto 128 (parameterized by
type T) is this:
v = v - ((v >> 1) & (T)~(T)0/3); // temp
v = (v & (T)~(T)0/15*3) + ((v >> 2) & (T)~(T)0/15*3); // temp
v = (v + (v >> 4)) & (T)~(T)0/255*15; // temp
c = (T)(v * ((T)~(T)0/255)) >> (sizeof(v) - 1) * CHAR_BIT; // count
T is a integer type, which I'm assuming is unsigned. Since this is C, it'll be fixed width, probably (but not necessarily) one of 8, 16, 32, 64 or 128. The fragment (T)~(T)0 that appears repeatedly in that code sample just gives the value 2**N-1, where N is the width of the type T. I suspect that the code may require that N be a multiple of 8
for correct operation.
Here's a direct translation of the given code into Python, parameterized in terms of N, the width of T in bits.
def count_set_bits(v, N=128):
mask = (1 << N) - 1
v = v - ((v >> 1) & mask//3)
v = (v & mask//15*3) + ((v >> 2) & mask//15*3)
v = (v + (v >> 4)) & mask//255*15
return (mask & v * (mask//255)) >> (N//8 - 1) * 8
Caveats:
(1) the above will only work for numbers up to 2**128. You might be able to generalize it for larger numbers, though.
(2) There are obvious inefficiencies: for example, 'mask//15' is computed twice. This doesn't matter for C, of course, because the compiler will almost certainly do the division at compile time rather than run time, but Python's peephole optimizer may not be so clever.
(3) The fastest C method may well not translate to the fastest Python method. For Python speed, you should probably be looking for an algorithm that minimizes the number of Python bitwise operations. As Alexander Gessler said: profile!
What you copied is a template for generating code. It's not a good idea to transliterate that template into another language and expect it to run fast. Let's expand the template.
(T)~(T)0 means "as many 1-bits as fit in type T". The algorithm needs 4 masks which we will compute for the various T-sizes we might be interested in.
>>> for N in (8, 16, 32, 64, 128):
... all_ones = (1 << N) - 1
... constants = ' '.join([hex(x) for x in [
... all_ones // 3,
... all_ones // 15 * 3,
... all_ones // 255 * 15,
... all_ones // 255,
... ]])
... print N, constants
...
8 0x55 0x33 0xf 0x1
16 0x5555 0x3333 0xf0f 0x101
32 0x55555555L 0x33333333L 0xf0f0f0fL 0x1010101L
64 0x5555555555555555L 0x3333333333333333L 0xf0f0f0f0f0f0f0fL 0x101010101010101L
128 0x55555555555555555555555555555555L 0x33333333333333333333333333333333L 0xf0f0f0f0f0f0f0f0f0f0f0f0f0f0f0fL 0x1010101010101010101010101010101L
>>>
You'll notice that the masks generated for the 32-bit case match those in the hardcoded 32-bit C code. Implementation detail: lose the L suffix from the 32-bit masks (Python 2.x) and lose all L suffixes for Python 3.x.
As you can see the whole template and (T)~(T)0 caper is merely obfuscatory sophistry. Put quite simply, for a k-byte type, you need 4 masks:
k bytes each 0x55
k bytes each 0x33
k bytes each 0x0f
k bytes each 0x01
and the final shift is merely N-8 (i.e. 8*(k-1)) bits. Aside: I doubt if the template code would actually work on a machine whose CHAR_BIT was not 8, but there aren't very many of those around these days.
Update: There is another point that affects the correctness and the speed when transliterating such algorithms from C to Python. The C algorithms often assume unsigned integers. In C, operations on unsigned integers work silently modulo 2**N. In other words, only the least significant N bits are retained. No overflow exceptions. Many bit twiddling algorithms rely on this. However (a) Python's int and long are signed (b) old Python 2.X will raise an exception, recent Python 2.Xs will silently promote int to long and Python 3.x int == Python 2.x long.
The correctness problem usually requires register &= all_ones at least once in the Python code. Careful analysis is often required to determine the minimal correct masking.
Working in long instead of int doesn't do much for efficiency. You'll notice that the algorithm for 32 bits will return a long answer even from input of 0, because the 32-bits all_ones is long.