Two's complement sign extension in Python?

I'm wondering if there's a way in Python to do two's complement sign extension as you would in C/C++, using standard libraries (preferably on a bitarray).
C/C++:
// Example program
#include <iostream>
#include <string>

int main()
{
    int x = 0xFF;
    x <<= (32 - 8);
    x >>= (32 - 8);
    std::cout << x;
    return 0;
}
And here's a Python function I've written which (in my testing) accomplishes the same thing. I'm simply wondering if there's a built-in (or just faster) way of doing it:
def sign_extend(value, bits):
    highest_bit_mask = 1 << (bits - 1)
    remainder = 0
    for i in xrange(bits - 1):
        remainder = (remainder << 1) + 1
    if value & highest_bit_mask == highest_bit_mask:
        value = (value & remainder) - highest_bit_mask
    else:
        value = value & remainder
    return value

The following code delivers the same results as your function, but is a bit shorter. Also, obviously, if you are going to apply this to a lot of data, you can pre-calculate both the masks.
def sign_extend(value, bits):
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)
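As a quick check of my own (not part of the original answer), this matches the C++ snippet above, which prints -1 for the 8-bit value 0xFF:
>>> sign_extend(0xFF, 8)
-1
>>> sign_extend(0x7F, 8)
127
Positive values (sign bit clear) pass through unchanged.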

Related

Mismatch in integer calculation results between C++ and Python (random number generator)

I needed a pseudo random number generation scheme (independent of standard libraries) for one of my projects and I tried a simple LCG based RNG. It seems to work fine except that it produces different values in C++ and Python. I am giving the relevant code for both below. I am not able to find the error. Any help will be greatly appreciated!
(c++)
// file: lgc.cc
// compile and run with: g++ lgc.cc -o lgc && ./lgc
#include <cstdio>
#include <cstdint>
#include <vector>
using namespace std;

uint64_t LGC(uint64_t x) {
    uint64_t A = 1103515245;
    uint64_t C = 12345;
    uint64_t M = (1 << 31);
    return (A * x + C) % M;
}

int main(int argc, char* argv[]) {
    for (auto x : {485288831, 485288832, 10, 16, 255, 756}) {
        uint64_t y = LGC(x);
        printf("%u %u\n", (uint32_t)x, (uint32_t)y);
    }
    return 0;
}
(python)
# file: lgc.py
# run with: python3 lgc.py
def LGC(x):
    A = 1103515245
    C = 12345
    M = int(2 ** 31)
    return (A * x + C) % M

for x in [485288831, 485288832, 10, 16, 255, 756]:
    y = LGC(x)
    print(x, y)
(Results: c++)
485288831 3822790476
485288832 631338425
10 2445230203
16 476387081
255 2223525580
756 1033882141
(Results: python)
485288831 1675306828
485288832 631338425
10 297746555
16 476387081
255 76041932
756 1033882141
The problem is the shift in uint64_t M = (1 << 31);. The expression (1 << 31) is evaluated in signed 32-bit int arithmetic, so it overflows to the negative number -2147483648. To fix it, use uint64_t M = (1U << 31); to make the shift use unsigned 32-bit integers. You could also use uint64_t M = (1UL << 31); to have the calculation use 64-bit unsigned integers.
The following link shows the output of printing (1 << 31), (1U << 31) and (1UL << 31) on a system with a 32-bit int:
https://ideone.com/vU0eyX
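To illustrate, here is a sketch of my own (not part of the original answer) that reproduces the C++ output from Python by simulating the bug: the signed result -2147483648 converts to uint64_t as 2**64 - 2**31, and printf("%u", (uint32_t)y) then keeps only the low 32 bits.
# my simulation of the buggy C++ program
A, C = 1103515245, 12345
M_buggy = (1 << 64) - (1 << 31)   # what uint64_t M actually holds after the overflow
for x in [485288831, 485288832, 10, 16, 255, 756]:
    y = (A * x + C) % M_buggy
    print(x, y % (1 << 32))       # mimic the (uint32_t) cast in printf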

How can I convert C++ code of a CRC16-CCITT algorithm to Python code?

I have an example code for CRC16-CCITT algorithm written in C++ and I need help converting it to Python.
Example C++ code:
#include <iostream>
using namespace std;

unsigned short calculateCRC(unsigned char data[], unsigned int length)
{
    unsigned int i;
    unsigned short crc = 0;
    for (i = 0; i < length; i++) {
        crc = (unsigned char)(crc >> 8) | (crc << 8);
        crc ^= data[i];
        crc ^= (unsigned char)(crc & 0xff) >> 4;
        crc ^= crc << 12;
        crc ^= (crc & 0x00ff) << 5;
    }
    return crc;
}

int main()
{
    unsigned int length;
    length = 15;
    unsigned char data[length] = {0x01,0x08,0x00,0x93,0x50,0x2e,0x42,0x83,0x3e,0xf1,0x3f,0x48,0xb5,0x04,0xbb};
    unsigned int crc;
    crc = calculateCRC(data, length);
    cout << std::hex << crc << '\n';
}
This code gives 9288 as output which is correct.
I tried the following in Python:
#!/usr/bin/env python3
def calculateCRC(data):
    crc = 0
    for dat in data:
        crc = (crc >> 8) or (crc << 8)
        crc ^= dat
        crc ^= (crc and 0xff) >> 4
        crc ^= crc << 12
        crc ^= (crc and 0x00ff) << 5
    crc = hex(crc)
    return crc

data = [0x01,0x08,0x00,0x93,0x50,0x2e,0x42,0x83,0x3e,0xf1,0x3f,0x48,0xb5,0x04,0xbb]
print(calculateCRC(data))
This outputs 0xf988334b0799be2081.
Could you please help me understand what I am doing wrong?
Thank you.
Python's int type is unbounded, but C/C++ unsigned short values are stored in 2 bytes, so they overflow when shifted to the left. You need to add masking in Python to achieve the same effect, dropping any bits above the 16 least-significant ones. This is only needed where values are shifted to the left, as right-shifting already drops the rotated-out bits.
Next, you are confusing the | and & bitwise operators with or and and boolean logical operators. The C++ code uses bitwise operators, use the same operators in Python.
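A quick illustration of the difference (my example, not from the original answer):
>>> 0x0100 | 0x0001    # bitwise OR combines the bits
257
>>> 0x0100 or 0x0001   # boolean or returns the first truthy operand
256
>>> 0x0100 & 0xff      # bitwise AND masks bits
0
>>> 0x0100 and 0xff    # boolean and returns the second operand here
255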
Last but not least, leave conversion to hex to the caller, don't do this in the CRC function itself:
UNSIGNED_SHORT_MASK = 0xFFFF  # 2 bytes, 16 bits.

def calculateCRC(data):
    crc = 0
    for dat in data:
        crc = (crc >> 8) | (crc << 8 & UNSIGNED_SHORT_MASK)
        crc ^= dat
        crc ^= (crc & 0xff) >> 4
        crc ^= crc << 12 & UNSIGNED_SHORT_MASK
        crc ^= (crc & 0x00ff) << 5
    return crc
Now you get the same output:
>>> print(format(calculateCRC(data), '04x'))
9288
I used the format() function rather than hex() to create hex output without a 0x prefix.
As Mark Adler rightly points out, we don't need to mask for every left shift; just because the C/C++ operations would naturally produce a masked value doesn't mean we have to mask as often here. Masking once per iteration is enough:
def calculateCRC(data):
    crc = 0
    for dat in data:
        crc = (crc >> 8) | (crc << 8)
        crc ^= dat
        crc ^= (crc & 0xFF) >> 4
        crc ^= crc << 12
        crc ^= (crc & 0x00FF) << 5
        crc &= 0xFFFF
    return crc
There may be more shortcuts we could apply to shave off operations and speed this up further, but if speed really is an issue, I'd re-implement this in Cython or C or another natively-compiled option anyway.
Also note you can use a bytes object, you don't have to use a list of integers:
data = b'\x01\x08\x00\x93\x50\x2e\x42\x83\x3e\xf1\x3f\x48\xb5\x04\xbb'
Looping over a bytes object still gives you integers between 0 and 255, just like the char array in C++.
Finally, you don't actually have to translate the code yourself, you could just use an existing project like crccheck, which implements this specific CRC16 variant as well as many others:
>>> from crccheck.crc import CrcXmodem
>>> print(format(CrcXmodem.calc(data), '04x'))
9288
crccheck is written in pure Python. For native implementations, there is crcmod. This library's documentation is a little lacking, but it is also very flexible and powerful, and actually includes predefined functions:
>>> from crcmod.predefined import mkPredefinedCrcFun
>>> xmodem = mkPredefinedCrcFun('xmodem')
>>> print(format(xmodem(data), '04x'))
9288

How to implement this C++ logic in Python?

I want to implement the C++ logic below in Python.
struct hash_string
{
    hash_string() {}
    uint32_t operator ()(const std::string &text) const
    {
        //std::cout << text << std::endl;
        static const uint32_t primes[16] =
        {
            0x01EE5DB9, 0x491408C3, 0x0465FB69, 0x421F0141,
            0x2E7D036B, 0x2D41C7B9, 0x58C0EF0D, 0x7B15A53B,
            0x7C9D3761, 0x5ABB9B0B, 0x24109367, 0x5A5B741F,
            0x6B9F12E9, 0x71BA7809, 0x081F69CD, 0x4D9B740B,
        };
        //std::cout << text.size() << std::endl;
        uint32_t sum = 0;
        for (size_t i = 0; i != text.size(); i++) {
            sum += primes[i & 15] * (unsigned char)text[i];
            //std::cout << text[i] << std::endl;
            //std::cout << (unsigned char)text[i] << std::endl;
        }
        return sum;
    }
};
The Python version is like this; it's not complete yet, since I haven't found a way to convert the text to unsigned char. So, please help!
# -*- coding: utf-8 -*-
text = u'连衣裙女韩范'
primes = [0x01EE5DB9, 0x491408C3, 0x0465FB69, 0x421F0141,
0x2E7D036B, 0x2D41C7B9, 0x58C0EF0D, 0x7B15A53B,
0x7C9D3761, 0x5ABB9B0B, 0x24109367, 0x5A5B741F,
0x6B9F12E9, 0x71BA7809, 0x081F69CD, 0x4D9B740B]
# text[i] does not work (of course), but how to mimic the logic above?
rand = [primes[i & 15] * text[i] for i in range(len(text))]
print rand
sum_agg = sum(rand)
print sum_agg
Take text = u'连衣裙女韩范' for example: the C++ version returns 18 for text.size() and the sum is 2422173716, while in Python I don't know how to make the size come out as 18.
The equality of text size is essential, as a start at least.
Because you are using Unicode, for an exact reproduction you will need to turn text into a series of bytes (chars in C++):
bytes_ = text.encode("utf8")
# when iterated over this will yield ints (in python 3)
# or single character strings in python 2
You should use more Pythonic idioms for iterating over a pair of sequences:
pairs = zip(bytes_, primes)
What if bytes_ is longer than primes? Use itertools.cycle
from itertools import cycle
pairs = zip(bytes_, cycle(primes))
All together:
from itertools import cycle
text = u'连衣裙女韩范'
primes = [0x01EE5DB9, 0x491408C3, 0x0465FB69, 0x421F0141,
0x2E7D036B, 0x2D41C7B9, 0x58C0EF0D, 0x7B15A53B,
0x7C9D3761, 0x5ABB9B0B, 0x24109367, 0x5A5B741F,
0x6B9F12E9, 0x71BA7809, 0x081F69CD, 0x4D9B740B]
# if python 3
rand = [byte * prime for byte, prime in zip(text.encode("utf8"), cycle(primes))]
# else if python 2 (use ord to convert single character string to int)
rand = [ord(byte) * prime for byte, prime in zip(text.encode("utf8"), cycle(primes))]
hash_ = sum(rand)
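One caveat of my own: the C++ operator() returns uint32_t, so every product and the running sum wrap modulo 2**32, while Python ints are unbounded. To reproduce the C++ sum (2422173716 for this text) exactly, mask the result; masking once at the end is enough, because wrap-around commutes with addition and multiplication:
hash_ = sum(rand) & 0xFFFFFFFF  # match the C++ uint32_t wrap-around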

Bitwise Rotate Right

I am trying to convert this C function into Python;
typedef unsigned long var;

/* Bit rotate rightwards */
var ror(var v, unsigned int bits) {
    return (v >> bits) | (v << (8 * sizeof(var) - bits));
}
I have tried Googling for some solutions, but I can't seem to get any of them to give the same results as the one here.
This is one solution I have found from another program;
def mask1(n):
    """Return a bitmask of length n (suitable for masking against an
    int to coerce the size to a given length)
    """
    if n >= 0:
        return 2**n - 1
    else:
        return 0

def ror(n, rotations=1, width=8):
    """Return a given number of bitwise right rotations of an integer n,
    for a given bit field width.
    """
    rotations %= width
    if rotations < 1:
        return n
    n &= mask1(width)
    return (n >> rotations) | ((n << (8 * width - rotations)))
I am trying to bit-shift key = 0xf0f0f0f0f123456. The C code gives 000000000f0f0f12 when it is called with ror(key, 8 << 1), and Python gives 0x0f0f0f0f0f123456 (the original input!).
Your C output doesn't match the function that you provided. That is presumably because you are not printing it correctly. This program:
#include <stdio.h>
#include <stdint.h>

uint64_t ror(uint64_t v, unsigned int bits)
{
    return (v >> bits) | (v << (8 * sizeof(uint64_t) - bits));
}

int main(void)
{
    printf("%llx\n", ror(0x0123456789abcdef, 4));
    printf("%llx\n", ror(0x0123456789abcdef, 8));
    printf("%llx\n", ror(0x0123456789abcdef, 12));
    printf("%llx\n", ror(0x0123456789abcdef, 16));
    return 0;
}
produces the following output:
f0123456789abcde
ef0123456789abcd
def0123456789abc
cdef0123456789ab
To produce an ror function in Python I refer you to this excellent article: http://www.falatic.com/index.php/108/python-and-bitwise-rotation
This Python 2 code produces the same output as the C program above:
ror = lambda val, r_bits, max_bits: \
    ((val & (2**max_bits - 1)) >> r_bits % max_bits) | \
    (val << (max_bits - (r_bits % max_bits)) & (2**max_bits - 1))
print "%x" % ror(0x0123456789abcdef, 4, 64)
print "%x" % ror(0x0123456789abcdef, 8, 64)
print "%x" % ror(0x0123456789abcdef, 12, 64)
print "%x" % ror(0x0123456789abcdef, 16, 64)
The shortest way I've found in Python:
(note this works only with integers as inputs)
def ror(n, rotations, width):
    return (2**width - 1) & (n >> rotations | n << (width - rotations))
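A quick check of my own against the C output above:
>>> hex(ror(0x0123456789abcdef, 8, 64))
'0xef0123456789abcd'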
There are different problems in your question.
C part:
You use a value of key that is a 64-bit value (0x0f0f0f0f0f123456), but the output shows that for your compiler unsigned long is only 32 bits wide. So what the C code does is rotate the 32-bit value 0x0f123456 by 16 bits, giving 0x34560f12.
If you had used unsigned long long (assuming it is 64 bits on your architecture, as it is on mine), you would have got 0x34560f0f0f0f0f12 (a 16-bit rotation of a 64-bit value).
Python part:
The definition of width is not consistent between mask1 and ror: mask1 takes a width in bits, whereas ror takes a width in bytes, and one byte = 8 bits.
The ror function should be :
def ror(n, rotations=1, width=8):
    """Return a given number of bitwise right rotations of an integer n,
    for a given bit field width.
    """
    rotations %= width * 8  # width bytes give 8*width bits
    if rotations < 1:
        return n
    mask = mask1(8 * width)  # store the mask
    n &= mask
    return (n >> rotations) | ((n << (8 * width - rotations)) & mask)  # apply the mask to the result
That way with key = 0x0f0f0f0f0f123456, you get :
>>> hex(ror(key, 16))
'0x34560f0f0f0f0f12L'
>>> hex(ror(key, 16, 4))
'0x34560f12L'
exactly the same as C output
I know it's nearly six years old, but I always find it easier to use string slices than bitwise operations:
def rotate_left(x, n):
    return int(f"{x:032b}"[n:] + f"{x:032b}"[:n], 2)

def rotate_right(x, n):
    return int(f"{x:032b}"[-n:] + f"{x:032b}"[:-n], 2)
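A quick sanity check of mine, using the 32-bit width these helpers assume:
>>> hex(rotate_right(0x01234567, 8))
'0x67012345'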
def rotation_value(value, rotations, width=32):
    """Return a given number of bitwise left or right rotations of an integer
    value, for a given bit field width; a negative rotation count rotates
    left, a positive one rotates right.
    """
    if int(rotations) != abs(int(rotations)):
        rotations = width + int(rotations)
    return (int(value) << (width - (rotations % width)) | (int(value) >> (rotations % width))) & ((1 << width) - 1)

Number of ones in a binary number stored in base 10 [duplicate]

Efficient way to count the number of 1s in the binary representation of a number in O(1), if you have enough memory to play with. This is an interview question I found on an online forum, but it had no answer. Can somebody suggest something? I can't think of a way to do it in O(1) time.
That's the Hamming weight problem, a.k.a. population count. The link mentions efficient implementations. Quoting:
With unlimited memory, we could simply create a large lookup table of the Hamming weight of every 64 bit integer
I've got a solution that counts the bits in O(Number of 1's) time:
bitcount(n):
    count = 0
    while n > 0:
        count = count + 1
        n = n & (n - 1)
    return count
In worst case (when the number is 2^n - 1, all 1's in binary) it will check every bit.
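For example (my own trace), with n = 12 = 0b1100:
n = 0b1100, n - 1 = 0b1011, n & (n - 1) = 0b1000   # count = 1
n = 0b1000, n - 1 = 0b0111, n & (n - 1) = 0b0000   # count = 2
Each iteration clears exactly one set bit, so the loop runs twice.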
Edit:
Just found a very nice constant-time, constant memory algorithm for bitcount. Here it is, written in C:
int BitCount(unsigned int u)
{
    unsigned int uCount;
    uCount = u - ((u >> 1) & 033333333333) - ((u >> 2) & 011111111111);
    return ((uCount + (uCount >> 3)) & 030707070707) % 63;
}
You can find proof of its correctness here.
Please note the fact that: n&(n-1) always eliminates the least significant 1.
Hence we can write the code for calculating the number of 1's as follows:
count = 0;
while (n != 0) {
    n = n & (n - 1);
    count++;
}
cout << "Number of 1's in n is: " << count;
The complexity of the program would be the number of 1's in n (which is always < 32 for a 32-bit int).
I saw the following solution from another website:
int count_one(int x) {
    x = (x & (0x55555555)) + ((x >> 1) & (0x55555555));
    x = (x & (0x33333333)) + ((x >> 2) & (0x33333333));
    x = (x & (0x0f0f0f0f)) + ((x >> 4) & (0x0f0f0f0f));
    x = (x & (0x00ff00ff)) + ((x >> 8) & (0x00ff00ff));
    x = (x & (0x0000ffff)) + ((x >> 16) & (0x0000ffff));
    return x;
}
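The same divide-and-conquer idea ports directly to Python (a sketch of mine; the masks keep every intermediate value within 32 bits, so Python's unbounded ints don't change the result):
def count_one(x):
    # assumes 0 <= x < 2**32
    x = (x & 0x55555555) + ((x >> 1) & 0x55555555)   # sums of adjacent bit pairs
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)   # sums within nibbles
    x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F)   # sums within bytes
    x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF)   # sums within 16-bit halves
    return (x & 0x0000FFFF) + ((x >> 16) & 0x0000FFFF)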
public static void main(String[] args) {
    int a = 3;
    int orig = a;
    int count = 0;
    while (a > 0)
    {
        a = a >> 1 << 1;
        if (orig - a == 1)
            count++;
        orig = a >> 1;
        a = orig;
    }
    System.out.println("Number of 1s are: " + count);
}
countBits(x) {
    y = 0;
    while (x) {
        y += x & 1;
        x = x >> 1;
    }
    return y;
}
That's it.
Below are two simple examples (in C++), among the many ways you can do this.
We can simply count set bits (1's) using __builtin_popcount().
int numOfOnes(int x) {
    return __builtin_popcount(x);
}
Loop through all bits in an integer, check if a bit is set and if it is then increment the count variable.
int hammingDistance(int x) {
    int count = 0;
    for (int i = 0; i < 32; i++)
        if (x & (1 << i))
            count++;
    return count;
}
That will be the shortest answer in my SO life: lookup table.
Apparently, I need to explain a bit: "if you have enough memory to play with" means we've got all the memory we need (never mind technical possibility). Now, you don't need to store a lookup table for more than a byte or two. While it'll technically be Ω(log(n)) rather than O(1), just reading the number you need is Ω(log(n)), so if that's a problem, then the answer is "impossible", which is even shorter.
Which of the two answers they expect from you in an interview, no one knows.
There's yet another trick: while engineers can take a number and talk about Ω(log(n)), where n is the number, computer scientists will say that we actually have to measure running time as a function of the length of the input, so what engineers call Ω(log(n)) is actually Ω(k), where k is the number of bytes. Still, as I said before, just reading the input is Ω(k), so there's no way we can do better than that.
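For concreteness, a byte-wide lookup-table sketch of my own (the names and table layout are illustrative, one of many possible):
TABLE = [bin(i).count('1') for i in range(256)]   # 256-entry table, built once

def bitcount32(x):
    # assumes 0 <= x < 2**32; four table lookups per call
    return (TABLE[x & 0xFF] + TABLE[(x >> 8) & 0xFF] +
            TABLE[(x >> 16) & 0xFF] + TABLE[(x >> 24) & 0xFF])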
Below will work as well.
int nofone(int x) {
    int a = 0;
    while (x != 0) {
        if (x & 1)   // test the low bit before shifting, so bit 0 is counted too
            a++;
        x >>= 1;
    }
    return a;
}
The following is a C solution using bit operators:
int numberOfOneBitsInInteger(int input) {
    int numOneBits = 0;
    int currNum = input;
    while (currNum != 0) {
        if ((currNum & 1) == 1) {
            numOneBits++;
        }
        currNum = currNum >> 1;
    }
    return numOneBits;
}
The following is a Java solution using powers of 2:
public static int numOnesInBinary(int n) {
    if (n < 0) return -1;
    int j = 0;
    while (n > Math.pow(2, j)) j++;
    int result = 0;
    for (int i = j; i >= 0; i--) {
        if (n >= Math.pow(2, i)) {
            n = (int) (n - Math.pow(2, i));
            result++;
        }
    }
    return result;
}
The following function takes an int and returns the number of ones in its binary representation:
private static int count = 0;  // accumulates across the recursive calls; reset before reuse

public static int findOnes(int number)
{
    if (number < 2)
    {
        if (number == 1)
        {
            count++;
        }
        else
        {
            return 0;
        }
    }
    int value = number % 2;
    if (number != 1 && value == 1)
        count++;
    number /= 2;
    findOnes(number);
    return count;
}
I came here with a great belief that I knew a beautiful solution for this problem. Code in C:
short numberOfOnes(unsigned int d) {
    short count = 0;
    for (; d != 0; d &= (d - 1))
        ++count;
    return count;
}
But after doing a little research on this topic (read: the other answers :)) I found 5 more efficient algorithms. Love SO!
There is even a CPU instruction designed specifically for this task: popcnt.
(mentioned in this answer)
A description and benchmarking of many algorithms can be found here.
The best way in JavaScript to do this is:
function getBinaryValue(num) {
    return num.toString(2);
}
function checkOnces(binaryValue) {
    return binaryValue.toString().replace(/0/g, "").length;
}
where binaryValue is the binary string, e.g. 1100.
There's only one way I can think of to accomplish this task in O(1)... that is to 'cheat' and use a physical device (with linear or even parallel programming I think the limit is O(log(k)) where k represents the number of bytes of the number).
However, you could very easily imagine a physical device that connects each bit to an output line with a 0/1 voltage. Then you could just electronically read off the total voltage on a 'summation' line in O(1). It would be quite easy to make this basic idea more elegant with some basic circuit elements to produce the output in whatever form you want (e.g. a binary encoded output), but the essential idea is the same and the electronic circuit would produce the correct output state in fixed time.
I imagine there are also possible quantum computing possibilities, but if we're allowed to do that, I would think a simple electronic circuit is the easier solution.
I have actually done this using a bit of sleight of hand: a single lookup table with 16 entries will suffice and all you have to do is break the binary rep into nibbles (4-bit tuples). The complexity is in fact O(1) and I wrote a C++ template which was specialized on the size of the integer you wanted (in # bits)… makes it a constant expression instead of indeterminate.
fwiw you can use the fact that (i & -i) will return you the LS one-bit and simply loop, stripping off the lsbit each time, until the integer is zero — but that’s an old parity trick.
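That trick looks like this in Python (my sketch, positive integers only):
def bitcount(i):
    count = 0
    while i:
        i -= i & -i   # (i & -i) isolates the lowest set bit; strip it off
        count += 1
    return count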
The below method can count the number of 1s in negative numbers as well.
private static int countBits(int number) {
    int result = 0;
    while (number != 0) {
        result += number & 1;
        number = number >>> 1;
    }
    return result;
}
However, a number like -1 is represented in binary as 11111111111111111111111111111111 and so will require a lot of shifting. If you don't want to do so many shifts for small negative numbers, another way could be as follows:
private static int countBits(int number) {
    boolean negFlag = false;
    if (number < 0) {
        negFlag = true;
        number = ~number;
    }
    int result = 0;
    while (number != 0) {
        result += number & 1;
        number = number >> 1;
    }
    return negFlag ? (32 - result) : result;
}
In Python (or any other language), convert to a binary string, split on '0' to get rid of the zeros, then join the pieces and take the length, minus 1 for the 'b' in the '0b' prefix:
len(''.join(bin(122011).split('0'))) - 1
Using JavaScript string operations, one can do the following:
0b1111011.toString(2).split(/0|(?=.)/).length // returns 6
or
0b1111011.toString(2).replace(/0/g, "").length // returns 6
I had to golf this in ruby and ended up with
l=->x{x.to_s(2).count ?1}
Usage :
l[2**32-1] # returns 32
Obviously not efficient but does the trick :)
Ruby implementation (note: this one finds the longest run of consecutive 1s, not the total count):
def find_consecutive_1(n)
  num = n.to_s(2)
  arr = num.split("")
  counter = 0
  max = 0
  arr.each do |x|
    if x.to_i == 1
      counter += 1
    else
      max = counter if counter > max
      counter = 0
    end
    max = counter if counter > max
  end
  max
end

puts find_consecutive_1(439)
Two ways:

/* Method-1 */
int count1s(long num)
{
    int tempCount = 0;
    while (num)
    {
        tempCount += (num & 1);  // inc, based on the right-most bit checked
        num = num >> 1;          // right shift by 1
    }
    return tempCount;
}

/* Method-2 */
int count1s_(int num)
{
    int tempCount = 0;
    std::string strNum = std::bitset<16>(num).to_string();  // string conversion
    cout << "strNum=" << strNum << endl;
    for (int i = 0; i < strNum.size(); i++)
    {
        if ('1' == strNum[i])
        {
            tempCount++;
        }
    }
    return tempCount;
}

/* Method-3 (algorithmically - boost string split could be used) */
1) Split the binary string on '1'.
2) count = (vector containing the splits) size - 1

Usage:

int count = 0;
count = count1s(0b00110011);
cout << "count(0b00110011) = " << count << endl;  // 4
count = count1s(0b01110110);
cout << "count(0b01110110) = " << count << endl;  // 5
count = count1s(0b00000000);
cout << "count(0b00000000) = " << count << endl;  // 0
count = count1s(0b11111111);
cout << "count(0b11111111) = " << count << endl;  // 8
count = count1s_(0b1100);
cout << "count(0b1100) = " << count << endl;  // 2
count = count1s_(0b11111111);
cout << "count(0b11111111) = " << count << endl;  // 8
count = count1s_(0b0);
cout << "count(0b0) = " << count << endl;  // 0
count = count1s_(0b1);
cout << "count(0b1) = " << count << endl;  // 1
A Python one-liner:
def countOnes(num):
    return bin(num).count('1')
