Related
I'm having successfully embedded a Python script into a C module. The Python script produces a multi-dimensional Numpy array. Whereas the entire calculation in python takes 9 ms, the final tolist() conversion in order to return it to C takes 4 ms alone. I would like to change that by passing the Numpy array as reference and do the iterations in C again. But I can't currently figure out, how this can be done.
There are a lot of samples around, which use the other way around: Passing a Numpy array to a C function which is called from Python, but this is not my use case.
Any pointer welcome.
Ok, it's a while ago but I solved it like so:
My python process delivers an array, containing one array, containing one array, containing N arrays of M floats each. The input is a JPEG image.
Unwrapping it like so:
int predict(PyObject *pyFunction, unsigned char *image_pointer, unsigned long image_len) {
int result = -1;
PyObject *pImage = NULL;
PyObject *pList = NULL;
pImage = PyBytes_FromStringAndSize((const char *)image_pointer, image_len);
if (!pImage) {
fprintf(stderr, "Cannot provide image to python 'predict'\n");
return result;
}
pList = PyObject_CallFunctionObjArgs(pyFunction, pImage, NULL);
Py_DECREF(pImage);
PyArrayObject *pPrediction = reinterpret_cast<PyArrayObject *>(pList);
if (!pPrediction) {
fprintf(stderr, "Cannot predict, for whatever reason\n");
return result;
}
if (PyArray_NDIM(pPrediction) != 4) {
fprintf(stderr, "Prediction failed, returned array with wrong dimensions\n");
} else {
RESULTPTR pResult = reinterpret_cast<RESULTPTR>(PyArray_DATA(pPrediction));
int len0 = PyArray_SHAPE(pPrediction)[0];
int len1 = PyArray_SHAPE(pPrediction)[1];
int len2 = PyArray_SHAPE(pPrediction)[2];
int len3 = PyArray_SHAPE(pPrediction)[3];
for (int i = 0; i < len0; i++) {
int offs1 = i * len1;
for (int j = 0; j < len1; j++) {
int offs2 = j * len2;
for (int k = 0; k < len2; k++) {
int offs3 = k * len3;
for (int l = 0; l < len3; l++) {
float f = (*pResult)[offs1 + offs2 + offs3 + l];
//printf("data: %.8f\n", f);
}
}
}
}
result = 0;
}
Py_XDECREF(pList);
return result;
}
HTH
I need to know how I would go about recreating a version of the int() function in Python so that I can fully understand it and create my own version with multiple bases that go past base 36. I can convert from a decimal to my own base (base 54, altered) just fine, but I need to figure out how to go from a string in my base's format to an integer (base 10).
Basically, I want to know how to go from my base, which I call base 54, to base 10. I don't need specifics, because if I have an example, I can work it out on my own. Unfortunately, I can't find anything on the int() function, though I know it has to be somewhere, since Python is open-source.
This is the closest I can find to it, but it doesn't provide source code for the function itself. Python int() test.
If you can help, thanks. If not, well, thanks for reading this, I guess?
You're not going to like this answer, but int(num, base) is defined in C (it's a builtin)
I went searching around and found it:
https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Objects/longobject.c
/* Parses an int from a bytestring. Leading and trailing whitespace will be
* ignored.
*
* If successful, a PyLong object will be returned and 'pend' will be pointing
* to the first unused byte unless it's NULL.
*
* If unsuccessful, NULL will be returned.
*/
PyObject *
PyLong_FromString(const char *str, char **pend, int base)
{
int sign = 1, error_if_nonzero = 0;
const char *start, *orig_str = str;
PyLongObject *z = NULL;
PyObject *strobj;
Py_ssize_t slen;
if ((base != 0 && base < 2) || base > 36) {
PyErr_SetString(PyExc_ValueError,
"int() arg 2 must be >= 2 and <= 36");
return NULL;
}
while (*str != '\0' && Py_ISSPACE(Py_CHARMASK(*str))) {
str++;
}
if (*str == '+') {
++str;
}
else if (*str == '-') {
++str;
sign = -1;
}
if (base == 0) {
if (str[0] != '0') {
base = 10;
}
else if (str[1] == 'x' || str[1] == 'X') {
base = 16;
}
else if (str[1] == 'o' || str[1] == 'O') {
base = 8;
}
else if (str[1] == 'b' || str[1] == 'B') {
base = 2;
}
else {
/* "old" (C-style) octal literal, now invalid.
it might still be zero though */
error_if_nonzero = 1;
base = 10;
}
}
if (str[0] == '0' &&
((base == 16 && (str[1] == 'x' || str[1] == 'X')) ||
(base == 8 && (str[1] == 'o' || str[1] == 'O')) ||
(base == 2 && (str[1] == 'b' || str[1] == 'B')))) {
str += 2;
/* One underscore allowed here. */
if (*str == '_') {
++str;
}
}
if (str[0] == '_') {
/* May not start with underscores. */
goto onError;
}
start = str;
if ((base & (base - 1)) == 0) {
int res = long_from_binary_base(&str, base, &z);
if (res < 0) {
/* Syntax error. */
goto onError;
}
}
else {
/***
Binary bases can be converted in time linear in the number of digits, because
Python's representation base is binary. Other bases (including decimal!) use
the simple quadratic-time algorithm below, complicated by some speed tricks.
First some math: the largest integer that can be expressed in N base-B digits
is B**N-1. Consequently, if we have an N-digit input in base B, the worst-
case number of Python digits needed to hold it is the smallest integer n s.t.
BASE**n-1 >= B**N-1 [or, adding 1 to both sides]
BASE**n >= B**N [taking logs to base BASE]
n >= log(B**N)/log(BASE) = N * log(B)/log(BASE)
The static array log_base_BASE[base] == log(base)/log(BASE) so we can compute
this quickly. A Python int with that much space is reserved near the start,
and the result is computed into it.
The input string is actually treated as being in base base**i (i.e., i digits
are processed at a time), where two more static arrays hold:
convwidth_base[base] = the largest integer i such that base**i <= BASE
convmultmax_base[base] = base ** convwidth_base[base]
The first of these is the largest i such that i consecutive input digits
must fit in a single Python digit. The second is effectively the input
base we're really using.
Viewing the input as a sequence <c0, c1, ..., c_n-1> of digits in base
convmultmax_base[base], the result is "simply"
(((c0*B + c1)*B + c2)*B + c3)*B + ... ))) + c_n-1
where B = convmultmax_base[base].
Error analysis: as above, the number of Python digits `n` needed is worst-
case
n >= N * log(B)/log(BASE)
where `N` is the number of input digits in base `B`. This is computed via
size_z = (Py_ssize_t)((scan - str) * log_base_BASE[base]) + 1;
below. Two numeric concerns are how much space this can waste, and whether
the computed result can be too small. To be concrete, assume BASE = 2**15,
which is the default (and it's unlikely anyone changes that).
Waste isn't a problem: provided the first input digit isn't 0, the difference
between the worst-case input with N digits and the smallest input with N
digits is about a factor of B, but B is small compared to BASE so at most
one allocated Python digit can remain unused on that count. If
N*log(B)/log(BASE) is mathematically an exact integer, then truncating that
and adding 1 returns a result 1 larger than necessary. However, that can't
happen: whenever B is a power of 2, long_from_binary_base() is called
instead, and it's impossible for B**i to be an integer power of 2**15 when
B is not a power of 2 (i.e., it's impossible for N*log(B)/log(BASE) to be
an exact integer when B is not a power of 2, since B**i has a prime factor
other than 2 in that case, but (2**15)**j's only prime factor is 2).
The computed result can be too small if the true value of N*log(B)/log(BASE)
is a little bit larger than an exact integer, but due to roundoff errors (in
computing log(B), log(BASE), their quotient, and/or multiplying that by N)
yields a numeric result a little less than that integer. Unfortunately, "how
close can a transcendental function get to an integer over some range?"
questions are generally theoretically intractable. Computer analysis via
continued fractions is practical: expand log(B)/log(BASE) via continued
fractions, giving a sequence i/j of "the best" rational approximations. Then
j*log(B)/log(BASE) is approximately equal to (the integer) i. This shows that
we can get very close to being in trouble, but very rarely. For example,
76573 is a denominator in one of the continued-fraction approximations to
log(10)/log(2**15), and indeed:
>>> log(10)/log(2**15)*76573
16958.000000654003
is very close to an integer. If we were working with IEEE single-precision,
rounding errors could kill us. Finding worst cases in IEEE double-precision
requires better-than-double-precision log() functions, and Tim didn't bother.
Instead the code checks to see whether the allocated space is enough as each
new Python digit is added, and copies the whole thing to a larger int if not.
This should happen extremely rarely, and in fact I don't have a test case
that triggers it(!). Instead the code was tested by artificially allocating
just 1 digit at the start, so that the copying code was exercised for every
digit beyond the first.
***/
twodigits c; /* current input character */
Py_ssize_t size_z;
Py_ssize_t digits = 0;
int i;
int convwidth;
twodigits convmultmax, convmult;
digit *pz, *pzstop;
const char *scan, *lastdigit;
char prev = 0;
static double log_base_BASE[37] = {0.0e0,};
static int convwidth_base[37] = {0,};
static twodigits convmultmax_base[37] = {0,};
if (log_base_BASE[base] == 0.0) {
twodigits convmax = base;
int i = 1;
log_base_BASE[base] = (log((double)base) /
log((double)PyLong_BASE));
for (;;) {
twodigits next = convmax * base;
if (next > PyLong_BASE) {
break;
}
convmax = next;
++i;
}
convmultmax_base[base] = convmax;
assert(i > 0);
convwidth_base[base] = i;
}
/* Find length of the string of numeric characters. */
scan = str;
lastdigit = str;
while (_PyLong_DigitValue[Py_CHARMASK(*scan)] < base || *scan == '_') {
if (*scan == '_') {
if (prev == '_') {
/* Only one underscore allowed. */
str = lastdigit + 1;
goto onError;
}
}
else {
++digits;
lastdigit = scan;
}
prev = *scan;
++scan;
}
if (prev == '_') {
/* Trailing underscore not allowed. */
/* Set error pointer to first underscore. */
str = lastdigit + 1;
goto onError;
}
/* Create an int object that can contain the largest possible
* integer with this base and length. Note that there's no
* need to initialize z->ob_digit -- no slot is read up before
* being stored into.
*/
double fsize_z = (double)digits * log_base_BASE[base] + 1.0;
if (fsize_z > (double)MAX_LONG_DIGITS) {
/* The same exception as in _PyLong_New(). */
PyErr_SetString(PyExc_OverflowError,
"too many digits in integer");
return NULL;
}
size_z = (Py_ssize_t)fsize_z;
/* Uncomment next line to test exceedingly rare copy code */
/* size_z = 1; */
assert(size_z > 0);
z = _PyLong_New(size_z);
if (z == NULL) {
return NULL;
}
Py_SIZE(z) = 0;
/* `convwidth` consecutive input digits are treated as a single
* digit in base `convmultmax`.
*/
convwidth = convwidth_base[base];
convmultmax = convmultmax_base[base];
/* Work ;-) */
while (str < scan) {
if (*str == '_') {
str++;
continue;
}
/* grab up to convwidth digits from the input string */
c = (digit)_PyLong_DigitValue[Py_CHARMASK(*str++)];
for (i = 1; i < convwidth && str != scan; ++str) {
if (*str == '_') {
continue;
}
i++;
c = (twodigits)(c * base +
(int)_PyLong_DigitValue[Py_CHARMASK(*str)]);
assert(c < PyLong_BASE);
}
convmult = convmultmax;
/* Calculate the shift only if we couldn't get
* convwidth digits.
*/
if (i != convwidth) {
convmult = base;
for ( ; i > 1; --i) {
convmult *= base;
}
}
/* Multiply z by convmult, and add c. */
pz = z->ob_digit;
pzstop = pz + Py_SIZE(z);
for (; pz < pzstop; ++pz) {
c += (twodigits)*pz * convmult;
*pz = (digit)(c & PyLong_MASK);
c >>= PyLong_SHIFT;
}
/* carry off the current end? */
if (c) {
assert(c < PyLong_BASE);
if (Py_SIZE(z) < size_z) {
*pz = (digit)c;
++Py_SIZE(z);
}
else {
PyLongObject *tmp;
/* Extremely rare. Get more space. */
assert(Py_SIZE(z) == size_z);
tmp = _PyLong_New(size_z + 1);
if (tmp == NULL) {
Py_DECREF(z);
return NULL;
}
memcpy(tmp->ob_digit,
z->ob_digit,
sizeof(digit) * size_z);
Py_DECREF(z);
z = tmp;
z->ob_digit[size_z] = (digit)c;
++size_z;
}
}
}
}
if (z == NULL) {
return NULL;
}
if (error_if_nonzero) {
/* reset the base to 0, else the exception message
doesn't make too much sense */
base = 0;
if (Py_SIZE(z) != 0) {
goto onError;
}
/* there might still be other problems, therefore base
remains zero here for the same reason */
}
if (str == start) {
goto onError;
}
if (sign < 0) {
Py_SIZE(z) = -(Py_SIZE(z));
}
while (*str && Py_ISSPACE(Py_CHARMASK(*str))) {
str++;
}
if (*str != '\0') {
goto onError;
}
long_normalize(z);
z = maybe_small_long(z);
if (z == NULL) {
return NULL;
}
if (pend != NULL) {
*pend = (char *)str;
}
return (PyObject *) z;
onError:
if (pend != NULL) {
*pend = (char *)str;
}
Py_XDECREF(z);
slen = strlen(orig_str) < 200 ? strlen(orig_str) : 200;
strobj = PyUnicode_FromStringAndSize(orig_str, slen);
if (strobj == NULL) {
return NULL;
}
PyErr_Format(PyExc_ValueError,
"invalid literal for int() with base %d: %.200R",
base, strobj);
Py_DECREF(strobj);
return NULL;
}
If you want to defined it in C, go ahead and try using this- if not, you'll have to write it yourself
>>> hash("\x01")
128000384
>>> hash("\x02")
256000771
>>> hash("\x03")
384001154
>>> hash("\x04")
512001541
Interesting part is 128000384 x 2 is not 256000771, and also others
I am just wondering how that algorithm works and want to learn something on it.
If you download the source code of Python, you will find it for sure!
But bear in mind the hash function is implemented for each kind of objects differently.
For example, you will find the unicode hash function in Objects/unicodeobject.c in the function unicode_hash. You might have to look a bit more to find the string hash function. Find the structure defining the object you are interested in, and in the field tp_hash, you will find the function that compute the hash code of that object.
For the string object: The exact code is found in Objects/stringobject.c in the function string_hash:
static long string_hash(PyStringObject *a)
{
register Py_ssize_t len;
register unsigned char *p;
register long x;
if (a->ob_shash != -1)
return a->ob_shash;
len = Py_SIZE(a);
p = (unsigned char *) a->ob_sval;
x = *p << 7;
while (--len >= 0)
x = (1000003*x) ^ *p++;
x ^= Py_SIZE(a);
if (x == -1)
x = -2;
a->ob_shash = x;
return x;
}
I don't think the accepted answer is really representative of cPython's internal hash implementations, which can be found in pyhash.c:
Description of hashing algorithm for numeric types:
/* For numeric types, the hash of a number x is based on the reduction
of x modulo the prime P = 2**_PyHASH_BITS - 1. It's designed so that
hash(x) == hash(y) whenever x and y are numerically equal, even if
x and y have different types.
A quick summary of the hashing strategy:
(1) First define the 'reduction of x modulo P' for any rational
number x; this is a standard extension of the usual notion of
reduction modulo P for integers. If x == p/q (written in lowest
terms), the reduction is interpreted as the reduction of p times
the inverse of the reduction of q, all modulo P; if q is exactly
divisible by P then define the reduction to be infinity. So we've
got a well-defined map
reduce : { rational numbers } -> { 0, 1, 2, ..., P-1, infinity }.
(2) Now for a rational number x, define hash(x) by:
reduce(x) if x >= 0
-reduce(-x) if x < 0
If the result of the reduction is infinity (this is impossible for
integers, floats and Decimals) then use the predefined hash value
_PyHASH_INF for x >= 0, or -_PyHASH_INF for x < 0, instead.
_PyHASH_INF and -_PyHASH_INF are also used for the
hashes of float and Decimal infinities.
NaNs hash with a pointer hash. Having distinct hash values prevents
catastrophic pileups from distinct NaN instances which used to always
have the same hash value but would compare unequal.
A selling point for the above strategy is that it makes it possible
to compute hashes of decimal and binary floating-point numbers
efficiently, even if the exponent of the binary or decimal number
is large. The key point is that
reduce(x * y) == reduce(x) * reduce(y) (modulo _PyHASH_MODULUS)
provided that {reduce(x), reduce(y)} != {0, infinity}. The reduction of a
binary or decimal float is never infinity, since the denominator is a power
of 2 (for binary) or a divisor of a power of 10 (for decimal). So we have,
for nonnegative x,
reduce(x * 2**e) == reduce(x) * reduce(2**e) % _PyHASH_MODULUS
reduce(x * 10**e) == reduce(x) * reduce(10**e) % _PyHASH_MODULUS
and reduce(10**e) can be computed efficiently by the usual modular
exponentiation algorithm. For reduce(2**e) it's even better: since
P is of the form 2**n-1, reduce(2**e) is 2**(e mod n), and multiplication
by 2**(e mod n) modulo 2**n-1 just amounts to a rotation of bits.
*/
Hashing of doubles:
Py_hash_t
_Py_HashDouble(double v)
{
int e, sign;
double m;
Py_uhash_t x, y;
if (!Py_IS_FINITE(v)) {
if (Py_IS_INFINITY(v))
return v > 0 ? _PyHASH_INF : -_PyHASH_INF;
else
return _PyHASH_NAN;
}
m = frexp(v, &e);
sign = 1;
if (m < 0) {
sign = -1;
m = -m;
}
/* process 28 bits at a time; this should work well both for binary
and hexadecimal floating point. */
x = 0;
while (m) {
x = ((x << 28) & _PyHASH_MODULUS) | x >> (_PyHASH_BITS - 28);
m *= 268435456.0; /* 2**28 */
e -= 28;
y = (Py_uhash_t)m; /* pull out integer part */
m -= y;
x += y;
if (x >= _PyHASH_MODULUS)
x -= _PyHASH_MODULUS;
}
/* adjust for the exponent; first reduce it modulo _PyHASH_BITS */
e = e >= 0 ? e % _PyHASH_BITS : _PyHASH_BITS-1-((-1-e) % _PyHASH_BITS);
x = ((x << e) & _PyHASH_MODULUS) | x >> (_PyHASH_BITS - e);
x = x * sign;
if (x == (Py_uhash_t)-1)
x = (Py_uhash_t)-2;
return (Py_hash_t)x;
}
Hashing of pointers:
Py_hash_t
_Py_HashPointerRaw(const void *p)
{
size_t y = (size_t)p;
/* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
excessive hash collisions for dicts and sets */
y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
return (Py_hash_t)y;
}
Py_hash_t
_Py_HashPointer(const void *p)
{
Py_hash_t x = _Py_HashPointerRaw(p);
if (x == -1) {
x = -2;
}
return x;
}
Hashing of bytes for very short strings uses DJBX33A, otherwise uses the default hash:
Py_hash_t
_Py_HashBytes(const void *src, Py_ssize_t len)
{
Py_hash_t x;
/*
We make the hash of the empty string be 0, rather than using
(prefix ^ suffix), since this slightly obfuscates the hash secret
*/
if (len == 0) {
return 0;
}
#ifdef Py_HASH_STATS
hashstats[(len <= Py_HASH_STATS_MAX) ? len : 0]++;
#endif
#if Py_HASH_CUTOFF > 0
if (len < Py_HASH_CUTOFF) {
/* Optimize hashing of very small strings with inline DJBX33A. */
Py_uhash_t hash;
const unsigned char *p = src;
hash = 5381; /* DJBX33A starts with 5381 */
switch(len) {
/* ((hash << 5) + hash) + *p == hash * 33 + *p */
case 7: hash = ((hash << 5) + hash) + *p++; /* fallthrough */
case 6: hash = ((hash << 5) + hash) + *p++; /* fallthrough */
case 5: hash = ((hash << 5) + hash) + *p++; /* fallthrough */
case 4: hash = ((hash << 5) + hash) + *p++; /* fallthrough */
case 3: hash = ((hash << 5) + hash) + *p++; /* fallthrough */
case 2: hash = ((hash << 5) + hash) + *p++; /* fallthrough */
case 1: hash = ((hash << 5) + hash) + *p++; break;
default:
Py_UNREACHABLE();
}
hash ^= len;
hash ^= (Py_uhash_t) _Py_HashSecret.djbx33a.suffix;
x = (Py_hash_t)hash;
}
else
#endif /* Py_HASH_CUTOFF */
x = PyHash_Func.hash(src, len);
if (x == -1)
return -2;
return x;
}
The file also implements modified FNV hashing:
#if Py_HASH_ALGORITHM == Py_HASH_FNV
/* **************************************************************************
* Modified Fowler-Noll-Vo (FNV) hash function
*/
static Py_hash_t
fnv(const void *src, Py_ssize_t len)
{
const unsigned char *p = src;
Py_uhash_t x;
Py_ssize_t remainder, blocks;
union {
Py_uhash_t value;
unsigned char bytes[SIZEOF_PY_UHASH_T];
} block;
#ifdef Py_DEBUG
assert(_Py_HashSecret_Initialized);
#endif
remainder = len % SIZEOF_PY_UHASH_T;
if (remainder == 0) {
/* Process at least one block byte by byte to reduce hash collisions
* for strings with common prefixes. */
remainder = SIZEOF_PY_UHASH_T;
}
blocks = (len - remainder) / SIZEOF_PY_UHASH_T;
x = (Py_uhash_t) _Py_HashSecret.fnv.prefix;
x ^= (Py_uhash_t) *p << 7;
while (blocks--) {
PY_UHASH_CPY(block.bytes, p);
x = (_PyHASH_MULTIPLIER * x) ^ block.value;
p += SIZEOF_PY_UHASH_T;
}
/* add remainder */
for (; remainder > 0; remainder--)
x = (_PyHASH_MULTIPLIER * x) ^ (Py_uhash_t) *p++;
x ^= (Py_uhash_t) len;
x ^= (Py_uhash_t) _Py_HashSecret.fnv.suffix;
if (x == -1) {
x = -2;
}
return x;
}
static PyHash_FuncDef PyHash_Func = {fnv, "fnv", 8 * SIZEOF_PY_HASH_T,
16 * SIZEOF_PY_HASH_T};
#endif /* Py_HASH_ALGORITHM == Py_HASH_FNV */
According to PEP 456, SipHash (MIT License) is the default string and bytes hash algorithm:
/* byte swap little endian to host endian
* Endian conversion not only ensures that the hash function returns the same
* value on all platforms. It is also required to for a good dispersion of
* the hash values' least significant bits.
*/
#if PY_LITTLE_ENDIAN
# define _le64toh(x) ((uint64_t)(x))
#elif defined(__APPLE__)
# define _le64toh(x) OSSwapLittleToHostInt64(x)
#elif defined(HAVE_LETOH64)
# define _le64toh(x) le64toh(x)
#else
# define _le64toh(x) (((uint64_t)(x) << 56) | \
(((uint64_t)(x) << 40) & 0xff000000000000ULL) | \
(((uint64_t)(x) << 24) & 0xff0000000000ULL) | \
(((uint64_t)(x) << 8) & 0xff00000000ULL) | \
(((uint64_t)(x) >> 8) & 0xff000000ULL) | \
(((uint64_t)(x) >> 24) & 0xff0000ULL) | \
(((uint64_t)(x) >> 40) & 0xff00ULL) | \
((uint64_t)(x) >> 56))
#endif
#ifdef _MSC_VER
# define ROTATE(x, b) _rotl64(x, b)
#else
# define ROTATE(x, b) (uint64_t)( ((x) << (b)) | ( (x) >> (64 - (b))) )
#endif
#define HALF_ROUND(a,b,c,d,s,t) \
a += b; c += d; \
b = ROTATE(b, s) ^ a; \
d = ROTATE(d, t) ^ c; \
a = ROTATE(a, 32);
#define DOUBLE_ROUND(v0,v1,v2,v3) \
HALF_ROUND(v0,v1,v2,v3,13,16); \
HALF_ROUND(v2,v1,v0,v3,17,21); \
HALF_ROUND(v0,v1,v2,v3,13,16); \
HALF_ROUND(v2,v1,v0,v3,17,21);
static uint64_t
siphash24(uint64_t k0, uint64_t k1, const void *src, Py_ssize_t src_sz) {
uint64_t b = (uint64_t)src_sz << 56;
const uint64_t *in = (uint64_t*)src;
uint64_t v0 = k0 ^ 0x736f6d6570736575ULL;
uint64_t v1 = k1 ^ 0x646f72616e646f6dULL;
uint64_t v2 = k0 ^ 0x6c7967656e657261ULL;
uint64_t v3 = k1 ^ 0x7465646279746573ULL;
uint64_t t;
uint8_t *pt;
uint8_t *m;
while (src_sz >= 8) {
uint64_t mi = _le64toh(*in);
in += 1;
src_sz -= 8;
v3 ^= mi;
DOUBLE_ROUND(v0,v1,v2,v3);
v0 ^= mi;
}
t = 0;
pt = (uint8_t *)&t;
m = (uint8_t *)in;
switch (src_sz) {
case 7: pt[6] = m[6]; /* fall through */
case 6: pt[5] = m[5]; /* fall through */
case 5: pt[4] = m[4]; /* fall through */
case 4: memcpy(pt, m, sizeof(uint32_t)); break;
case 3: pt[2] = m[2]; /* fall through */
case 2: pt[1] = m[1]; /* fall through */
case 1: pt[0] = m[0]; /* fall through */
}
b |= _le64toh(t);
v3 ^= b;
DOUBLE_ROUND(v0,v1,v2,v3);
v0 ^= b;
v2 ^= 0xff;
DOUBLE_ROUND(v0,v1,v2,v3);
DOUBLE_ROUND(v0,v1,v2,v3);
/* modified */
t = (v0 ^ v1) ^ (v2 ^ v3);
return t;
}
static Py_hash_t
pysiphash(const void *src, Py_ssize_t src_sz) {
return (Py_hash_t)siphash24(
_le64toh(_Py_HashSecret.siphash.k0), _le64toh(_Py_HashSecret.siphash.k1),
src, src_sz);
}
uint64_t
_Py_KeyedHash(uint64_t key, const void *src, Py_ssize_t src_sz)
{
return siphash24(key, 0, src, src_sz);
}
#if Py_HASH_ALGORITHM == Py_HASH_SIPHASH24
static PyHash_FuncDef PyHash_Func = {pysiphash, "siphash24", 64, 128};
#endif
Objects such as tuples in (in tupleobject.c) have their own hashing methods. See the source for more examples:
static Py_hash_t
tuplehash(PyTupleObject *v)
{
Py_uhash_t x; /* Unsigned for defined overflow behavior. */
Py_hash_t y;
Py_ssize_t len = Py_SIZE(v);
PyObject **p;
Py_uhash_t mult = _PyHASH_MULTIPLIER;
x = 0x345678UL;
p = v->ob_item;
while (--len >= 0) {
y = PyObject_Hash(*p++);
if (y == -1)
return -1;
x = (x ^ y) * mult;
/* the cast might truncate len; that doesn't change hash stability */
mult += (Py_hash_t)(82520UL + len + len);
}
x += 97531UL;
if (x == (Py_uhash_t)-1)
x = -2;
return x;
}
Background
I have created a python module that wraps a c++ program using SWIG. It works just fine, but it has a pretty serious memory leak issue that I think is a result of poorly handled pointers to large map objects. I have very little experience with c++, and I have questions as to whether delete[] can be used on an object created with new in a different function or method.
The program was written in 2007, so excuse the lack of useful c++11 tricks.
The swig extension basically just wraps a single c++ class (Matrix) and a few functions.
Matrix.h
#ifndef __MATRIX__
#define __MATRIX__
#include <string>
#include <vector>
#include <map>
#include <cmath>
#include <fstream>
#include <cstdlib>
#include <stdio.h>
#include <unistd.h>
#include "FileException.h"
#include "ParseException.h"
#define ROUND_TO_INT(n) ((long long)floor(n))
#define MIN(a,b) ((a)<(b)?(a):(b))
#define MAX(a,b) ((a)>(b)?(a):(b))
using namespace std;
class Matrix {
private:
/**
* Split a string following delimiters
*/
void tokenize(const string& str, vector<string>& tokens, const string& delimiters) {
// Skip delimiters at beginning.
string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
string::size_type pos = str.find_first_of(delimiters, lastPos);
while (string::npos != pos || string::npos != lastPos)
{
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
}
public:
// used for efficiency tests
long long totalMapSize;
long long totalOp;
double ** mat; // the matrix as it is stored in the matrix file
int length;
double granularity; // the real granularity used, greater than 1
long long ** matInt; // the discrete matrix with offset
double errorMax;
long long *offsets; // offset of each column
long long offset; // sum of offsets
long long *minScoreColumn; // min discrete score at each column
long long *maxScoreColumn; // max discrete score at each column
long long *sum;
long long minScore; // min total discrete score (normally 0)
long long maxScore; // max total discrete score
long long scoreRange; // score range = max - min + 1
long long *bestScore;
long long *worstScore;
double background[4];
Matrix() {
granularity = 1.0;
offset = 0;
background[0] = background[1] = background[2] = background[3] = 0.25;
}
Matrix(double pA, double pC, double pG, double pT) {
granularity = 1.0;
offset = 0;
background[0] = pA;
background[1] = pC;
background[2] = pG;
background[3] = pT;
}
~Matrix() {
for (int k = 0; k < 4; k++ ) {
delete[] matInt[k];
}
delete[] matInt;
delete[] mat;
delete[] offsets;
delete[] minScoreColumn;
delete[] maxScoreColumn;
delete[] sum;
delete[] bestScore;
delete[] worstScore;
}
void toLogOddRatio () {
for (int p = 0; p < length; p++) {
double sum = mat[0][p] + mat[1][p] + mat[2][p] + mat[3][p];
for (int k = 0; k < 4; k++) {
mat[k][p] = log((mat[k][p] + 0.25) /(sum + 1)) - log (background[k]);
}
}
}
void toLog2OddRatio () {
for (int p = 0; p < length; p++) {
double sum = mat[0][p] + mat[1][p] + mat[2][p] + mat[3][p];
for (int k = 0; k < 4; k++) {
mat[k][p] = log2((mat[k][p] + 0.25) /(sum + 1)) - log2 (background[k]);
}
}
}
/**
* Transforms the initial matrix into an integer and offseted matrix.
*/
void computesIntegerMatrix (double granularity, bool sortColumns = true);
// computes the complete score distribution between score min and max
void showDistrib (long long min, long long max) {
map<long long, double> *nbocc = calcDistribWithMapMinMax(min,max);
map<long long, double>::iterator iter;
// computes p values and stores them in nbocc[length]
double sum = 0;
map<long long, double>::reverse_iterator riter = nbocc[length-1].rbegin();
while (riter != nbocc[length-1].rend()) {
sum += riter->second;
nbocc[length][riter->first] = sum;
riter++;
}
iter = nbocc[length].begin();
while (iter != nbocc[length].end() && iter->first <= max) {
//cout << (((iter->first)-offset)/granularity) << " " << (iter->second) << " " << nbocc[length-1][iter->first] << endl;
iter ++;
}
}
/**
* Computes the pvalue associated with the threshold score requestedScore.
*/
void lookForPvalue (long long requestedScore, long long min, long long max, double *pmin, double *pmax);
/**
* Computes the score associated with the pvalue requestedPvalue.
*/
long long lookForScore (long long min, long long max, double requestedPvalue, double *rpv, double *rppv);
/**
* Computes the distribution of scores between score min and max as the DP algrithm proceeds
* but instead of using a table we use a map to avoid computations for scores that cannot be reached
*/
map<long long, double> *calcDistribWithMapMinMax (long long min, long long max);
void readMatrix (string matrix) {
vector<string> str;
tokenize(matrix, str, " \t|");
this->length = 0;
this->length = str.size() / 4;
mat = new double*[4];
int idx = 0;
for (int j = 0; j < 4; j++) {
this->mat[j] = new double[this->length];
for (int i = 0; i < this->length; i++) {
mat[j][i] = atof(str.at(idx).data());
idx++;
}
}
str.clear();
}
}; /* Matrix */
#endif
Matrix.cpp
#include "Matrix.h"
#define MEMORYCOUNT
void Matrix::computesIntegerMatrix (double granularity, bool sortColumns) {
double minS = 0, maxS = 0;
double scoreRange;
// computes precision
for (int i = 0; i < length; i++) {
double min = mat[0][i];
double max = min;
for (int k = 1; k < 4; k++ ) {
min = ((min < mat[k][i])?min:(mat[k][i]));
max = ((max > mat[k][i])?max:(mat[k][i]));
}
minS += min;
maxS += max;
}
// score range
scoreRange = maxS - minS + 1;
if (granularity > 1.0) {
this->granularity = granularity / scoreRange;
} else if (granularity < 1.0) {
this->granularity = 1.0 / granularity;
} else {
this->granularity = 1.0;
}
matInt = new long long *[length];
for (int k = 0; k < 4; k++ ) {
matInt[k] = new long long[length];
for (int p = 0 ; p < length; p++) {
matInt[k][p] = ROUND_TO_INT((double)(mat[k][p]*this->granularity));
}
}
this->errorMax = 0.0;
for (int i = 1; i < length; i++) {
double maxE = mat[0][i] * this->granularity - (matInt[0][i]);
for (int k = 1; k < 4; k++) {
maxE = ((maxE < mat[k][i] * this->granularity - matInt[k][i])?(mat[k][i] * this->granularity - (matInt[k][i])):(maxE));
}
this->errorMax += maxE;
}
if (sortColumns) {
// sort the columns : the first column is the one with the greatest value
long long min = 0;
for (int i = 0; i < length; i++) {
for (int k = 0; k < 4; k++) {
min = MIN(min,matInt[k][i]);
}
}
min --;
long long *maxs = new long long [length];
for (int i = 0; i < length; i++) {
maxs[i] = matInt[0][i];
for (int k = 1; k < 4; k++) {
if (maxs[i] < matInt[k][i]) {
maxs[i] = matInt[k][i];
}
}
}
long long **mattemp = new long long *[4];
for (int k = 0; k < 4; k++) {
mattemp[k] = new long long [length];
}
for (int i = 0; i < length; i++) {
long long max = maxs[0];
int p = 0;
for (int j = 1; j < length; j++) {
if (max < maxs[j]) {
max = maxs[j];
p = j;
}
}
maxs[p] = min;
for (int k = 0; k < 4; k++) {
mattemp[k][i] = matInt[k][p];
}
}
for (int k = 0; k < 4; k++) {
for (int i = 0; i < length; i++) {
matInt[k][i] = mattemp[k][i];
}
}
for (int k = 0; k < 4; k++) {
delete[] mattemp[k];
}
delete[] mattemp;
delete[] maxs;
}
// computes offsets
this->offset = 0;
offsets = new long long [length];
for (int i = 0; i < length; i++) {
long long min = matInt[0][i];
for (int k = 1; k < 4; k++ ) {
min = ((min < matInt[k][i])?min:(matInt[k][i]));
}
offsets[i] = -min;
for (int k = 0; k < 4; k++ ) {
matInt[k][i] += offsets[i];
}
this->offset += offsets[i];
}
// look for the minimum score of the matrix for each column
minScoreColumn = new long long [length];
maxScoreColumn = new long long [length];
sum = new long long [length];
minScore = 0;
maxScore = 0;
for (int i = 0; i < length; i++) {
minScoreColumn[i] = matInt[0][i];
maxScoreColumn[i] = matInt[0][i];
sum[i] = 0;
for (int k = 1; k < 4; k++ ) {
sum[i] = sum[i] + matInt[k][i];
if (minScoreColumn[i] > matInt[k][i]) {
minScoreColumn[i] = matInt[k][i];
}
if (maxScoreColumn[i] < matInt[k][i]) {
maxScoreColumn[i] = matInt[k][i];
}
}
minScore = minScore + minScoreColumn[i];
maxScore = maxScore + maxScoreColumn[i];
//cout << "minScoreColumn[" << i << "] = " << minScoreColumn[i] << endl;
//cout << "maxScoreColumn[" << i << "] = " << maxScoreColumn[i] << endl;
}
this->scoreRange = maxScore - minScore + 1;
bestScore = new long long[length];
worstScore = new long long[length];
bestScore[length-1] = maxScore;
worstScore[length-1] = minScore;
for (int i = length - 2; i >= 0; i--) {
bestScore[i] = bestScore[i+1] - maxScoreColumn[i+1];
worstScore[i] = worstScore[i+1] - minScoreColumn[i+1];
}
}
/**
* Computes the pvalue associated with the threshold score requestedScore.
*/
void Matrix::lookForPvalue (long long requestedScore, long long min, long long max, double *pmin, double *pmax) {
map<long long, double> *nbocc = calcDistribWithMapMinMax(min,max);
map<long long, double>::iterator iter;
// computes p values and stores them in nbocc[length]
double sum = nbocc[length][max+1];
long long s = max + 1;
map<long long, double>::reverse_iterator riter = nbocc[length-1].rbegin();
while (riter != nbocc[length-1].rend()) {
sum += riter->second;
if (riter->first >= requestedScore) s = riter->first;
nbocc[length][riter->first] = sum;
riter++;
}
//cout << " s found : " << s << endl;
iter = nbocc[length].find(s);
while (iter != nbocc[length].begin() && iter->first >= s - errorMax) {
iter--;
}
//cout << " s - E found : " << iter->first << endl;
#ifdef MEMORYCOUNT
// for tests, store the number of memory bloc necessary
for (int pos = 0; pos <= length; pos++) {
totalMapSize += nbocc[pos].size();
}
#endif
*pmax = nbocc[length][s];
*pmin = iter->second;
}
/**
* Computes the score associated with the pvalue requestedPvalue.
*/
long long Matrix::lookForScore (long long min, long long max, double requestedPvalue, double *rpv, double *rppv) {
map<long long, double> *nbocc = calcDistribWithMapMinMax(min,max);
map<long long, double>::iterator iter;
// computes p values and stores them in nbocc[length]
double sum = 0.0;
map<long long, double>::reverse_iterator riter = nbocc[length-1].rbegin();
long long alpha = riter->first+1;
long long alpha_E = alpha;
nbocc[length][alpha] = 0.0;
while (riter != nbocc[length-1].rend()) {
sum += riter->second;
nbocc[length][riter->first] = sum;
if (sum >= requestedPvalue) {
break;
}
riter++;
}
if (sum > requestedPvalue) {
alpha_E = riter->first;
riter--;
alpha = riter->first;
} else {
if (riter == nbocc[length-1].rend()) { // path following the remark of the mail
riter--;
alpha = alpha_E = riter->first;
} else {
alpha = riter->first;
riter++;
sum += riter->second;
alpha_E = riter->first;
}
nbocc[length][alpha_E] = sum;
//cout << "Pv(S) " << riter->first << " " << sum << endl;
}
#ifdef MEMORYCOUNT
// for tests, store the number of memory bloc necessary
for (int pos = 0; pos <= length; pos++) {
totalMapSize += nbocc[pos].size();
}
#endif
if (alpha - alpha_E > errorMax) alpha_E = alpha;
*rpv = nbocc[length][alpha];
*rppv = nbocc[length][alpha_E];
delete[] nbocc;
return alpha;
}
// computes the distribution of scores between score min and max as the DP algrithm proceeds
// but instead of using a table we use a map to avoid computations for scores that cannot be reached
map<long long, double> *Matrix::calcDistribWithMapMinMax (long long min, long long max) {
// maps for each step of the computation
// nbocc[length] stores the pvalue
// nbocc[pos] for pos < length stores the qvalue
map<long long, double> *nbocc = new map<long long, double> [length+1];
map<long long, double>::iterator iter;
long long *maxs = new long long[length+1]; // # pos i maximum score reachable with the suffix matrix from i to length-1
maxs[length] = 0;
for (int i = length-1; i >= 0; i--) {
maxs[i] = maxs[i+1] + maxScoreColumn[i];
}
// initializes the map at position 0
for (int k = 0; k < 4; k++) {
if (matInt[k][0]+maxs[1] >= min) {
nbocc[0][matInt[k][0]] += background[k];
}
}
// computes q values for scores greater or equal than min
nbocc[length-1][max+1] = 0.0;
for (int pos = 1; pos < length; pos++) {
iter = nbocc[pos-1].begin();
while (iter != nbocc[pos-1].end()) {
for (int k = 0; k < 4; k++) {
long long sc = iter->first + matInt[k][pos];
if (sc+maxs[pos+1] >= min) {
// the score min can be reached
if (sc > max) {
// the score will be greater than max for all suffixes
nbocc[length-1][max+1] += nbocc[pos-1][iter->first] * background[k]; //pow(4,length-pos-1) ;
totalOp++;
} else {
nbocc[pos][sc] += nbocc[pos-1][iter->first] * background[k];
totalOp++;
}
}
}
iter++;
}
//cerr << " map size for " << pos << " " << nbocc[pos].size() << endl;
}
delete[] maxs;
return nbocc;
}
pytfmpval.i
%module pytfmpval
%{
#include "../src/Matrix.h"
#define SWIG_FILE_WITH_INIT
%}
%include "cpointer.i"
%include "std_string.i"
%include "std_vector.i"
%include "typemaps.i"
%include "../src/Matrix.h"
%pointer_class(double, doublep)
%pointer_class(int, intp)
%nodefaultdtor Matrix;
The c++ functions are called in a python module.
I worry that nbocc in Matrix.cpp is not being properly dereferenced or deleted. Is this use valid?
I have tried using gc.collect() and I am using the multiprocessing module as recommended in this question to call these functions from my python program. I've also tried deleting the Matrix object from within python to no avail.
I'm out of characters, but will provide any additional needed info in the comments as well as I can.
UPDATE: I've removed all of the python code, as it wasn't the issue and made for an absurdly long post. As I stated in the comments below, this was ultimately solved by taking the suggestion of many users and creating a minimal example that exhibited the issue in pure C++. I then used valgrind to identify the problematic pointers created with new and made sure that they were properly dereferenced. This fixed almost all memory leaks. One remains, but it leaks only a few hundred bytes over thousands of iterations and would require refactoring the entire Matrix class, which simply isn't worth the time for what it is. Bad practice, I know. To any other newbie in C++ out there, seriously try to avoid dynamic memory allocation or utilize std::unique_ptr or std::shared_ptr.
Thanks again to everyone who provided input and suggestions.
It’s hard to follow what’s happening, but I’m pretty sure your matrices are not being cleaned up correctly.
In readMatrix, you have a loop over j which contains the line this->mat[j] = new double[this->length];. This allocates memory, which mat[j] points to. This memory needs to be freed at some point, by calling delete[] mat[j] (or some other loop variable). However, in the destructor, you just call delete[] mat, which leaks all of the arrays inside it.
Some general suggestions on cleaning this up:
If you know the bounds of an array, such as that matInt will always have a length of 4, you should declare it with that fixed length (long long* matInt[4] will make an array of four pointers to long long, each of which could be a pointer to an array); this will mean you don’t need to either new or delete it.
If you have a double pointer like double ** mat, and you allocate both the first and second layers of pointers with new[], you need to deallocate the inner layer with delete[] (and you need to do it before you delete[] the outer layer).
If you still have trouble, more of your code will be clear if you remove the methods which don’t seem relevant to the problem. For example, toLogOddRatio doesn’t allocate or deallocate memory at all; it almost certainly isn’t contributing to the problem and you can remove it from the code you post here (once you’ve removed the parts which you think don’t contribute, test again to make sure the problem’s still there; if not then you know that it was one of those parts somehow causing the leak).
Two questions are in play here: managing memory in C++, and then nudging the C++ side from the Python side to clean up. I'm guessing SWIG is generating a wrapper for the Matrix destructor and calling the destructor at some useful time. (I might convince myself of that by having the dtor make some noise.) That should handle the second question.
So let's focus on the C++ side. Passing around a bare map * is a well-known invitation to mischief. Here are two alternatives.
Alternative one: make the map a member of Matrix. Then it gets cleaned up automatically by ~Matrix(). This is the easiest thing. If the lifetime of the map does not exceed the lifetime of the Matrix, then this route will work.
Alternative two: if the map needs to persist after the Matrix object, then instead of passing around map *, use a shared pointer, std::shared_ptr<map>. The shared pointer reference counts the pointee (i.e. the dynamically allocated Matrix). When the ref count goes to zero, it deletes the underlying object.
They both build on the rule to allocate resources (memory in this case) in constructors and deallocate in destructors. This is called RAII (Resource Allocation Is Initialization). Another application of RAII in your code would be to use std::vector<long long> offsets instead of long long *offsets etc. Then you just resize the vectors as needed. When the Matrix is destroyed, the vectors are deleted with no intervention on your part. For the matrix, you could use a vector of vectors, and so on.
to answer your question, yes you can use delete on diffrent function or method. and you should, any memory you allocate in c/c++ you need to free (delete in c++ lingo)
python isn't aware of this memory, it's not a python object, so gc.collect() won't help.
you should add a c function that would take a Matrix struct and free/delete the memory use on that struct. and call it from python, swig in not handling memory allocation (only for the objects swig creates)
I would recommended looking into newer packages other then swig, like cython or cffi (or even NumPy matrix handling, I've heard he's good at)
Efficient way to count number of 1s in the binary representation of a number in O(1) if you have enough memory to play with. This is an interview question I found on an online forum, but it had no answer. Can somebody suggest something, I cant think of a way to do it in O(1) time?
That's the Hamming weight problem, a.k.a. population count. The link mentions efficient implementations. Quoting:
With unlimited memory, we could simply create a large lookup table of the Hamming weight of every 64 bit integer
I've got a solution that counts the bits in O(Number of 1's) time:
bitcount(n):
count = 0
while n > 0:
count = count + 1
n = n & (n-1)
return count
In worst case (when the number is 2^n - 1, all 1's in binary) it will check every bit.
Edit:
Just found a very nice constant-time, constant memory algorithm for bitcount. Here it is, written in C:
int BitCount(unsigned int u)
{
unsigned int uCount;
uCount = u - ((u >> 1) & 033333333333) - ((u >> 2) & 011111111111);
return ((uCount + (uCount >> 3)) & 030707070707) % 63;
}
You can find proof of its correctness here.
Please note the fact that: n&(n-1) always eliminates the least significant 1.
Hence we can write the code for calculating the number of 1's as follows:
count=0;
while(n!=0){
n = n&(n-1);
count++;
}
cout<<"Number of 1's in n is: "<<count;
The complexity of the program would be: number of 1's in n (which is constantly < 32).
I saw the following solution from another website:
int count_one(int x){
x = (x & (0x55555555)) + ((x >> 1) & (0x55555555));
x = (x & (0x33333333)) + ((x >> 2) & (0x33333333));
x = (x & (0x0f0f0f0f)) + ((x >> 4) & (0x0f0f0f0f));
x = (x & (0x00ff00ff)) + ((x >> 8) & (0x00ff00ff));
x = (x & (0x0000ffff)) + ((x >> 16) & (0x0000ffff));
return x;
}
public static void main(String[] args) {
int a = 3;
int orig = a;
int count = 0;
while(a>0)
{
a = a >> 1 << 1;
if(orig-a==1)
count++;
orig = a >> 1;
a = orig;
}
System.out.println("Number of 1s are: "+count);
}
countBits(x){
y=0;
while(x){
y += x & 1 ;
x = x >> 1 ;
}
}
thats it?
Below are two simple examples (in C++) among many by which you can do this.
We can simply count set bits (1's) using __builtin_popcount().
int numOfOnes(int x) {
return __builtin_popcount(x);
}
Loop through all bits in an integer, check if a bit is set and if it is then increment the count variable.
int hammingDistance(int x) {
int count = 0;
for(int i = 0; i < 32; i++)
if(x & (1 << i))
count++;
return count;
}
That will be the shortest answer in my SO life: lookup table.
Apparently, I need to explain a bit: "if you have enough memory to play with" means, we've got all the memory we need (nevermind technical possibility). Now, you don't need to store lookup table for more than a byte or two. While it'll technically be Ω(log(n)) rather than O(1), just reading a number you need is Ω(log(n)), so if that's a problem, then the answer is, impossible—which is even shorter.
Which of two answers they expect from you on an interview, no one knows.
There's yet another trick: while engineers can take a number and talk about Ω(log(n)), where n is the number, computer scientists will say that actually we're to measure running time as a function of a length of an input, so what engineers call Ω(log(n)) is actually Ω(k), where k is the number of bytes. Still, as I said before, just reading a number is Ω(k), so there's no way we can do better than that.
Below will work as well.
nofone(int x) {
a=0;
while(x!=0) {
x>>=1;
if(x & 1)
a++;
}
return a;
}
The following is a C solution using bit operators:
int numberOfOneBitsInInteger(int input) {
int numOneBits = 0;
int currNum = input;
while (currNum != 0) {
if ((currNum & 1) == 1) {
numOneBits++;
}
currNum = currNum >> 1;
}
return numOneBits;
}
The following is a Java solution using powers of 2:
public static int numOnesInBinary(int n) {
if (n < 0) return -1;
int j = 0;
while ( n > Math.pow(2, j)) j++;
int result = 0;
for (int i=j; i >=0; i--){
if (n >= Math.pow(2, i)) {
n = (int) (n - Math.pow(2,i));
result++;
}
}
return result;
}
The function takes an int and returns the number of Ones in binary representation
public static int findOnes(int number)
{
if(number < 2)
{
if(number == 1)
{
count ++;
}
else
{
return 0;
}
}
value = number % 2;
if(number != 1 && value == 1)
count ++;
number /= 2;
findOnes(number);
return count;
}
I came here having a great belief that I know beautiful solution for this problem. Code in C:
short numberOfOnes(unsigned int d) {
short count = 0;
for (; (d != 0); d &= (d - 1))
++count;
return count;
}
But after I've taken a little research on this topic (read other answers:)) I found 5 more efficient algorithms. Love SO!
There is even a CPU instruction designed specifically for this task: popcnt.
(mentioned in this answer)
Description and benchmarking of many algorithms you can find here.
The best way in javascript to do so is
function getBinaryValue(num){
return num.toString(2);
}
function checkOnces(binaryValue){
return binaryValue.toString().replace(/0/g, "").length;
}
where binaryValue is the binary String eg: 1100
There's only one way I can think of to accomplish this task in O(1)... that is to 'cheat' and use a physical device (with linear or even parallel programming I think the limit is O(log(k)) where k represents the number of bytes of the number).
However you could very easily imagine a physical device that connects each bit an to output line with a 0/1 voltage. Then you could just electronically read of the total voltage on a 'summation' line in O(1). It would be quite easy to make this basic idea more elegant with some basic circuit elements to produce the output in whatever form you want (e.g. a binary encoded output), but the essential idea is the same and the electronic circuit would produce the correct output state in fixed time.
I imagine there are also possible quantum computing possibilities, but if we're allowed to do that, I would think a simple electronic circuit is the easier solution.
I have actually done this using a bit of sleight of hand: a single lookup table with 16 entries will suffice and all you have to do is break the binary rep into nibbles (4-bit tuples). The complexity is in fact O(1) and I wrote a C++ template which was specialized on the size of the integer you wanted (in # bits)… makes it a constant expression instead of indetermined.
fwiw you can use the fact that (i & -i) will return you the LS one-bit and simply loop, stripping off the lsbit each time, until the integer is zero — but that’s an old parity trick.
The below method can count the number of 1s in negative numbers as well.
private static int countBits(int number) {
int result = 0;
while(number != 0) {
result += number & 1;
number = number >>> 1;
}
return result;
}
However, a number like -1 is represented in binary as 11111111111111111111111111111111 and so will require a lot of shifting. If you don't want to do so many shifts for small negative numbers, another way could be as follows:
private static int countBits(int number) {
boolean negFlag = false;
if(number < 0) {
negFlag = true;
number = ~number;
}
int result = 0;
while(number != 0) {
result += number & 1;
number = number >> 1;
}
return negFlag? (32-result): result;
}
In python or any other convert to bin string then split it with '0' to get rid of 0's then combine and get the length.
len(''.join(str(bin(122011)).split('0')))-1
By utilizing string operations of JS one can do as follows;
0b1111011.toString(2).split(/0|(?=.)/).length // returns 6
or
0b1111011.toString(2).replace("0","").length // returns 6
I had to golf this in ruby and ended up with
l=->x{x.to_s(2).count ?1}
Usage :
l[2**32-1] # returns 32
Obviously not efficient but does the trick :)
Ruby implementation
def find_consecutive_1(n)
num = n.to_s(2)
arr = num.split("")
counter = 0
max = 0
arr.each do |x|
if x.to_i==1
counter +=1
else
max = counter if counter > max
counter = 0
end
max = counter if counter > max
end
max
end
puts find_consecutive_1(439)
Two ways::
/* Method-1 */
int count1s(long num)
{
int tempCount = 0;
while(num)
{
tempCount += (num & 1); //inc, based on right most bit checked
num = num >> 1; //right shift bit by 1
}
return tempCount;
}
/* Method-2 */
int count1s_(int num)
{
int tempCount = 0;
std::string strNum = std::bitset< 16 >( num ).to_string(); // string conversion
cout << "strNum=" << strNum << endl;
for(int i=0; i<strNum.size(); i++)
{
if('1' == strNum[i])
{
tempCount++;
}
}
return tempCount;
}
/* Method-3 (algorithmically - boost string split could be used) */
1) split the binary string over '1'.
2) count = vector (containing splits) size - 1
Usage::
int count = 0;
count = count1s(0b00110011);
cout << "count(0b00110011) = " << count << endl; //4
count = count1s(0b01110110);
cout << "count(0b01110110) = " << count << endl; //5
count = count1s(0b00000000);
cout << "count(0b00000000) = " << count << endl; //0
count = count1s(0b11111111);
cout << "count(0b11111111) = " << count << endl; //8
count = count1s_(0b1100);
cout << "count(0b1100) = " << count << endl; //2
count = count1s_(0b11111111);
cout << "count(0b11111111) = " << count << endl; //8
count = count1s_(0b0);
cout << "count(0b0) = " << count << endl; //0
count = count1s_(0b1);
cout << "count(0b1) = " << count << endl; //1
A Python one-liner
def countOnes(num):
return bin(num).count('1')