Dear Stack Exchangers,
I encountered a strange result while attempting to calculate the sum of values in a Python dictionary. If my dictionary gets past a certain size, sum(dict.values()) appears to give incorrect results: the result suddenly becomes negative for no apparent reason. I am using Python 2.7 on Windows 7. (I apologize in advance for my coding style.)
Please note that I encountered this behaviour while working on https://projecteuler.net/problem=72, but I am not asking for help to get the answer, I am just perplexed by the behaviour of the built-in function. (I also apologise for posting a partial solution on a public forum, please look away now if you don't want any hints).
The goal of the program is explained on the project Euler link (above), but I will attempt to briefly explain my code:
The first function uses a Sieve of Eratosthenes to produce a list of prime numbers, and a modified sieve to produce a dictionary of {composite_number:[prime_factor_list]}, within a specified range.
The second function attempts to count the number of fractions of the form n/d that can be produced where n < d and d <= 1000000. The problem states that I should only count reduced proper fractions, so the bulk of this function is concerned with weeding out reducible fractions. My strategy is to loop through each numerator between 1 and d-1, and discard unsuitable denominators. For primes this is simple, and for non-primes I am close to a working solution, but still discard some values more than once. For the purposes of this post, the important detail is the way that I tally up the count:
Initially I used a simple counter (initialise count to 0 then increment as needed), but decided to try a dictionary instead. What surprised me was that the two methods gave different results, but only when the upper limit (d) exceeded a certain size. I probed deeper and managed to isolate the exact moment that the counts diverge. The line if 88000 < i < 88055: near the bottom identifies the point at which the sum of dict values begins to differ from the simple count. For values up to i = 88032 the values are the same, but when i = 88033, the values diverge dramatically:
from collections import defaultdict
def primeset(limit):
    pr = [0]*(limit+1)
    for i in range(2,limit+1):
        j = i
        i += j
        while i <= limit:
            pr[i] = 1
            i += j
    primes = [k for k in range(2,limit+1) if pr[k] == 0]
    composites = defaultdict(list)
    for p in primes:
        q = p
        p += q
        while p <= limit:
            composites[p].append(q)
            p += q
    return primes, composites
def method2(limit):
    primes, composites = primeset(limit)
    prf = {}
    count = 0
    count += limit-1
    count += (limit-2)/2
    prf[1] = limit-1
    prf[2] = (limit-2)/2
    for i in primes:
        if i != 2:
            tally = limit-i-(limit/i)+1
            count += tally
            prf[i] = tally
    for i in composites:
        tally = limit-i
        for item in composites[i]:
            tally -= (limit/item-i/item)
        count += tally
        prf[i] = tally
        if 88000 < i < 88055:
            print i, count, tally, sum(prf.values())
    return count, prf
result, index = method2(88547)
print result,sum(index.values())
I expect I have done something really stupid, but I felt compelled to put it out there in case something really is amiss.
Regards,
You are having a problem with integer overflow, which in Python is not supposed to happen. You have a 32-bit machine, so the largest normal integer is (2^31 - 1). Once your calculation exceeds that, Python should automatically switch to doing calculations using a long, which isn't limited in the size of the number it can support. I only have 64-bit machines, but the same thing applies, except the max integer is (2^63 - 1). You can tell from the shell when you have a long because of the L that is printed after the number. Here is an example from my shell:
>>> 2**62 - 1 + 2**62 # This is max int
9223372036854775807
>>> 2**63 # This is a long
9223372036854775808L
>>> num = 2**62 - 1 + 2**62
>>> num
9223372036854775807
>>> num+1
9223372036854775808L
>>> d = {1:2**62,2:-1,3:2**62}
>>> sum(d.values())
9223372036854775807
>>> d = {1:2**62,2:-1,3:2**62,4:1}
>>> sum(d.values())
9223372036854775808L
In my case with Python 2.7 on Linux on a 64-bit machine this all works as expected.
Now I run the same thing using Spyder and I get the wrong answer:
>>> d = {1:2**62,2:-1,3:2**62,4:1}
>>> sum(d.values())
-9223372036854775808
It promotes correctly when I just do normal addition, but this sum from a dictionary gives the wrong answer. This isn't specific to dictionaries, just the sum function. The same thing happens with a list:
>>> list = [2**62, -1, 2**62, 1]
>>> sum(list)
-9223372036854775808
So the problem is isolated to the sum() function in Spyder and happens for both 32 and 64-bit machines.
The real answer turns out to be that Spyder automatically imports numpy. Numpy has its own version of the sum function. It is described as follows: "Arithmetic is modular when using integer types, and no error is raised on overflow." You are using that version of sum, and it is causing the problem. If you don't want to use that sum, you can put the following at the top of your file:
from __builtin__ import sum
That will cause the built-in version of sum to be used and you will get the correct answer.
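For instance, here is a minimal sketch of the difference (assuming numpy is installed and its default integer dtype is 64-bit, as on most Linux builds; the numbers mirror the shell sessions above):
import numpy as np
from __builtin__ import sum as builtin_sum  # Python 2; on Python 3 the module is named builtins

values = [2**62, -1, 2**62, 1]
print(builtin_sum(values))  # 9223372036854775808 -- promoted to a Python long, no overflow
print(np.sum(values))       # -9223372036854775808 -- int64 arithmetic wraps around silently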
To figure out that sum was not coming from where I thought it was, I could have used the following:
>>> import inspect
>>> inspect.getmodule(sum)
<module 'numpy.core.fromnumeric' from '/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.pyc'>
Related
I am trying to find the smallest num for which 2**num exceeds 1000000000, using a while loop, but I have not been able to write the code.
I was trying the below code:
base = 2
num = 1
while base**num > 1000000000:
    print(num)
    num += 1
Here's a solution using log.
import math
res = math.log(1000000000,2)
num = int(res)+1 # taking ceiling and not floor of the log as 2^floor will result in a number less than 1000000000
print(num)
Using a while loop will be inefficient and computationally very expensive, especially for large numbers.
Your condition is reversed; it should be base**num < 1000000000. That's an extremely inefficient way of calculating powers, though.
Use:
>>> math.ceil(math.log2(1000000000))
30
math.log2 is a specialized version of log. You could even use int.bit_length, because in binary a number is represented with roughly as many digits as its base-2 logarithm. Unless the number is an exact power of 2, 2**digits is guaranteed to be a larger number:
>>> int.bit_length(1000000000)
30
The power to which you need to raise a "base" (in this case 2) to get a number is the base-N logarithm of that number. In this case you're looking for the base-2 logarithm of 1000000000.
The result of math.log2(1000000000) is about 29.897, though, so you need its ceiling to get the next higher integer:
>>> math.log2(1000000000)
29.897352853986263
Logarithms are so useful that CPUs have built-in instructions to handle them.
I don't know if I understood your question properly, but if so...
You are trying to find a number that, used as the exponent of 2, gives you a number greater than 1'000'000'000.
The condition is wrong, because it should be:
while (base**num < 1000000000):
    num += 1
print(num)
So, it increments num while base**num isn't greater than your "limit" of 1'000'000'000.
I suggest using the logarithm function to avoid the loop:
import math
num = int(math.log(1000000000, 2))
print(num)
You can try with this:
base = 2
power = 1
value = 1000000000
output = base**power
while output <= value:
    power += 1
    output = base**power
    print(output, power)
print(power)
After defining the variables with the right starting values, the output comes out as expected; check the code below.
base = 2
num = 0
while (base**num < 1000000000):
    num += 1
print(num)
Output: 30
I am trying to solve a contest challenge on Hackerrank (Hack the Interview II - Global Product Distribution) using Kotlin
I started getting annoyed because my code always passed on the test cases with a small number of inputs and failed on the larger ones, even timing out on one.
So I went online and found this python code that solved all test cases neatly. I went as far as converting the Python code line for line into Kotlin. But my Kotlin code always retained the same poor performance as before.
These are the two pieces of code.
Python:
def maxScore(a, m):
    a.sort()
    print(a)
    x = len(a)
    if x % m == 0:
        y = int(x/m)
    else:
        y = int(x/m) - 1
    summ = 0
    count = 1
    #print(y)
    i = 0
    for _ in range(y):
        summ = summ + (sum(a[i:i+m]) * count)
        count = count + 1
        i = i + m
        print(summ)
    summ = summ + sum(a[i:]) * count
    print(summ)
    return summ % 1000000007
Kotlin:
fun maxScore(a: Array<Int>, m: Int): Int {
    a.sort()
    // print(a)
    val x = a.size
    val y = if (x % m == 0) x / m
            else (x / m) - 1
    var summ = 0
    var count = 1
    // print(y)
    var i = 0
    for (s in 0 until y) {
        summ += a.sliceArray(i until (i + m)).sum() * count
        count++
        i += m
        // print(summ)
    }
    summ += a.sliceArray(i until a.size).sum() * count
    // print(summ)
    return summ % 1000000007
}
Is there something wrong with the code translation? How can I make the Kotlin code work on the larger test cases?
UPDATE: copyOfRange() performs better than sliceArray(). Code no longer times out on any test case, but still fails on all the large test cases
There are three issues I can see here. I'll point you in the right direction for now.
Both the Python and the Kotlin copy the array each time. This might or might not be a problem. You have up to a million elements and each is copied only once. I'd be surprised if that exceeds your time limits but it might do. It looks like you can avoid the copy with .subList().
It looks like you're treating the leftover items as if they're in a bin of their own. But this bin is smaller than m, which isn't allowed. Check that this is really what you intend.
Kotlin Ints are 32-bit signed integers. You can only store numbers up to about 2 billion before they overflow. You need to avoid this! Look at the constraints - you can have up to a million products with individual values up to a billion each. (This is different from Python ints, which never overflow, and so will always give the right answer, but can use a lot of memory and slow down if you try to do operations on really big numbers, which might well be causing your program to time out.) Here is a hint: (a + b) % n is equal to ((a % n) + (b % n)) % n
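To make that hint concrete, here is a small Python check of the identity (just an illustration of the idea; the actual fix in Kotlin is to accumulate into a Long and reduce with % 1000000007 at each step):
M = 1000000007
values = [10**9] * 10**6           # worst case allowed by the constraints

total_then_mod = sum(values) % M   # fine in Python, ints never overflow

running = 0
for v in values:
    running = (running + v) % M    # the running total never exceeds M
print(total_then_mod == running)   # True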
I am trying to create a generator that returns numbers in a given range that pass a particular test given by a function foo. However I would like the numbers to be tested in a random order. The following code will achieve this:
from random import shuffle
def MyGenerator(foo, num):
    order = list(range(num))
    shuffle(order)
    for i in order:
        if foo(i):
            yield i
The Problem
The problem with this solution is that sometimes the range will be quite large (num might be of the order 10**8 and upwards). This function can become slow, having such a large list in memory. I have tried to avoid this problem, with the following code:
from random import randint
def MyGenerator(foo, num):
    tried = set()
    while len(tried) <= num - 1:
        i = randint(0, num-1)
        if i in tried:
            continue
        tried.add(i)
        if foo(i):
            yield i
This works well most of the time, since in most cases num will be quite large, foo will pass a reasonable number of numbers, and the total number of times the __next__ method will be called will be relatively small (say, a maximum of 200, often much smaller). Therefore it's reasonably likely that we stumble upon a value that passes the foo test, and the size of tried never gets large. (Even if it only passes 10% of the time, we wouldn't expect tried to get larger than about 2000, roughly.)
However, when num is small (close to the number of times that the __next__ method is called), or foo fails most of the time, the above solution becomes very inefficient - randomly guessing numbers until it guesses one that isn't in tried.
My attempted solution...
I was hoping to use some kind of function that maps the numbers 0,1,2,...,n onto themselves in a roughly random way. (This isn't being used for any security purposes, so it doesn't matter if it isn't the most 'random' function in the world.) The function here (Create a random bijective function which has same domain and range) maps signed 32-bit integers onto themselves, but I am not sure how to adapt the mapping to a smaller range. Given num, I don't even need a bijection on 0,1,...,num; just a bijection on 0,1,...,n for some value of n larger than and 'close' to num (using whatever definition of close you see fit). Then I can do the following:
def mix_function_factory(num):
    # something here???
    def foo(index):
        # something else here??
    return foo

def MyGenerator(foo, num):
    mix_function = mix_function_factory(num)
    for i in range(num):
        index = mix_function(i)
        if index <= num:
            if foo(index):
                yield index
(So long as the bijection isn't on a set of numbers massively larger than num, the number of times index <= num isn't True will be small.)
My Question
Can you think of one of the following:
A potential solution for mix_function_factory or even a few other potential functions for mix_function that I could attempt to generalise for different values of num?
A better way of solving the original problem?
Many thanks in advance....
The problem is basically generating a random permutation of the integers in the range 0..n-1.
Luckily for us, these numbers have a very useful property: they all have a distinct value modulo n. If we can apply some mathematical operations to these numbers while taking care to keep each number distinct modulo n, it's easy to generate a permutation that appears random. And the best part is that we don't need any memory to keep track of numbers we've already generated, because each number is calculated with a simple formula.
Examples of operations we can perform on every number x in the range include:
Addition: We can add any integer c to x.
Multiplication: We can multiply x with any number m that shares no prime factors with n.
Applying just these two operations on the range 0..n-1 already gives quite satisfactory results:
>>> n = 7
>>> c = 1
>>> m = 3
>>> [((x+c) * m) % n for x in range(n)]
[3, 6, 2, 5, 1, 4, 0]
Looks random, doesn't it?
If we generate c and m from a random number, it'll actually be random, too. But keep in mind that there is no guarantee that this algorithm will generate all possible permutations, or that each permutation has the same probability of being generated.
Implementation
The difficult part about the implementation is really just generating a suitable random m. I used the prime factorization code from this answer to do so.
import random
# credit for prime factorization code goes
# to https://stackoverflow.com/a/17000452/1222951
def prime_factors(n):
    gaps = [1,2,2,4,2,4,2,4,6,2,6]
    length, cycle = 11, 3
    f, fs, next_ = 2, [], 0
    while f * f <= n:
        while n % f == 0:
            fs.append(f)
            n //= f  # integer division, so the factors stay ints on Python 3
        f += gaps[next_]
        next_ += 1
        if next_ == length:
            next_ = cycle
    if n > 1: fs.append(n)
    return fs
def generate_c_and_m(n, seed=None):
    # we need to know n's prime factors to find a suitable multiplier m
    p_factors = set(prime_factors(n))

    def is_valid_multiplier(m):
        # m must not share any prime factors with n
        factors = prime_factors(m)
        return not p_factors.intersection(factors)

    # if no seed was given, generate random values for c and m
    if seed is None:
        c = random.randint(0, n)   # randint needs both bounds
        m = random.randint(1, 2*n)
    else:
        c = seed
        m = seed

    # make sure m is valid
    while not is_valid_multiplier(m):
        m += 1

    return c, m
Now that we can generate suitable values for c and m, creating the permutation is trivial:
def random_range(n, seed=None):
    c, m = generate_c_and_m(n, seed)
    for x in range(n):
        yield ((x + c) * m) % n
And your generator function can be implemented as
def MyGenerator(foo, num):
    for x in random_range(num):
        if foo(x):
            yield x
That may be a case where the best algorithm depends on the value of num, so why not use 2 selectable algorithms wrapped in one generator?
You could mix your shuffle and set solutions with a threshold on the value of num. That's basically assembling your first 2 solutions in one generator:
from random import shuffle, randint

def MyGenerator(foo, num):
    if num < 100000:  # threshold has to be adjusted by experiments
        order = list(range(num))
        shuffle(order)
        for i in order:
            if foo(i):
                yield i
    else:  # big values, few collisions with the random generator
        tried = set()
        while len(tried) < num:
            i = randint(0, num-1)
            if i in tried:
                continue
            tried.add(i)
            if foo(i):
                yield i
The randint solution (for big values of num) works well because there aren't so many repeats in the random generator.
Getting the best performance in Python is much trickier than in lower-level languages. For example, in C, you can often save a little bit in hot inner loops by replacing a multiplication with a shift. The overhead of Python's bytecode orientation erases this. Of course, this changes again when you consider which variant of "python" you're targeting (pypy? numpy? cython?) - you really have to write your code based on which one you're using.
But even more important is arranging operations to avoid serialized dependencies, since all CPUs are superscalar these days. Of course, real compilers know about this, but it still matters when choosing an algorithm.
One of the easiest ways to gain a little bit over the existing answers would be by generating numbers in chunks using numpy.arange() and applying the ((x + c) * m) % n to the numpy ndarray directly. Every python-level loop that can be avoided helps.
If the function can be applied directly to numpy ndarrays, that might be even better. Of course, a sufficiently-small function in python will be dominated by function-call overhead anyway.
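A rough sketch of that chunked idea (hedged: the chunk size is arbitrary, and int64 is assumed to be wide enough for (x + c) * m, which holds for num up to around 10**8 with the c and m ranges used above):
import numpy as np

def random_range_chunked(n, c, m, chunk=1 << 16):
    # Same ((x + c) * m) % n permutation, but the arithmetic is applied to a
    # whole chunk at once instead of one Python-level operation per element.
    for start in range(0, n, chunk):
        x = np.arange(start, min(start + chunk, n), dtype=np.int64)
        yield from (((x + c) * m) % n).tolist()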
The best fast random-number-generator today is PCG. I wrote a pure-python port here but concentrated on flexibility and ease-of-understanding rather than speed.
Xoroshiro128+ is second-best-quality and faster, but less informative to study.
Python's (and many others') default choice of Mersenne Twister is among the worst.
(there's also something called splitmix64 which I don't know enough about to place - some people say it's better than xoroshiro128+, but it has a period problem - of course, you might want that here)
Both default-PCG and xoroshiro128+ use a 2N-bit state to generate N-bit numbers. This is generally desirable, but means numbers will be repeated. PCG has alternate modes that avoid this, however.
Of course, much of this depends on whether num is (close to) a power of 2. In theory, PCG variants can be created for any bit width, but currently only various word sizes are implemented since you'd need explicit masking. I'm not sure exactly how to generate the parameters for new bit sizes (perhaps it's in the paper?), but they can be tested simply by doing a period/2 jump and verifying that the value is different.
Of course, if you're only making 200 calls to the RNG, you probably don't actually need to avoid duplicates on the math side.
Alternatively, you could use an LFSR, which does exist for every bit size (though note that it never generates the all-zeros value (or equivalently, the all-ones value)). LFSRs are serial and (AFAIK) not jumpable, and thus can't be easily split across multiple tasks. Edit: I figured out that this is untrue; simply represent the advance step as a matrix and exponentiate it to jump.
Note that LFSRs do have the same obvious biases as simply generating numbers in sequential order based on a random start point - for example, if rng_outputs[a:b] all fail your foo function, then rng_outputs[b] will be much more likely as a first output regardless of starting point. PCG's "stream" parameter avoids this by not generating numbers in the same order.
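For concreteness, here is what a tiny LFSR looks like (a generic 16-bit Galois LFSR with the common 0xB400 tap mask, shown purely for illustration - it is not the project mentioned in Edit2 below, and its period is only 2**16 - 1):
def lfsr16(seed=0xACE1):
    # 16-bit Galois LFSR; visits every 16-bit value except 0, then repeats.
    # The seed must be non-zero.
    state = seed & 0xFFFF
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400
        yield state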
Edit2: I have completed what I thought was a "brief project" implementing LFSRs in python, including jumping, fully tested.
I'm currently learning Python on repl.it and I have a problem with one of my exercises.
My code is supposed to:
1. Input a given integer X.
2. Find the greatest integer n where 2ⁿ is less than or equal to X.
3. Print the exponent value (n) and the result of the expression 2ⁿ.
But my code fails as soon as the machine inserts a big number like 10^8+2; the program completely freezes.
Here is the piece of code that I'm working on:
X = int(input())
a = X//2
while a > -1:
    if (2**a) <= X:
        print(a)
        print(2**a)
        break
    else:
        a -= 1
Can anyone find me another solution to this problem, or improve the runtime of the code I'm working on? It works with small numbers (less than 10^6), but otherwise the program freezes.
Thanks in advance!
Of course, I can't refer to the "too big input" that you mention (since you didn't provide it), but as for the problem itself, it can be solved more easily in the following way:
import numpy as np
a = int(np.log2(your_input))
The first issue I see is that in your code
if (2**a) <= X:
    print(a)
    print(2**a)
you calculate the value of 2**a twice. A good start could be to save the value of 2**a into a variable. However, since you are only working with powers of 2, you could also take a look at bitwise operations. So instead of doing a = X//2 you could also write
a = X >> 1
and instead of doing 2**a write
temp = 1 << a
When working with powers of 2 it can be significantly faster to work with bitwise operations.
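Putting those pieces together, a minimal sketch of the fully bitwise approach (assuming X >= 1; int.bit_length counts the binary digits of X, so the greatest n with 2**n <= X is bit_length - 1):
X = int(input())
n = X.bit_length() - 1   # greatest n such that 2**n <= X
print(n)
print(1 << n)            # 2**n computed with a shift instead of a power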
I did it! (using some of your solutions of course)
This is my teacher's code:
x = int(input())
n = 1
while 2 ** n <= x:
    n += 1
print(n - 1, 2 ** (n - 1))
I'm trying to generate 0 or 1 with a 50/50 chance of each, using random.uniform instead of random.getrandbits.
Here's what I have
0 if random.uniform(0, 1e-323) == 0.0 else 1
But if I run this long enough, the average shows that 1 is generated about 70% of the time, as seen here:
sum(0 if random.uniform(0, 1e-323) == 0.0
else 1
for _ in xrange(1000)) / 1000.0 # --> 0.737
If I change it to 1e-324 it will always be 0. And if I change it to 1e-322, the average will be ~90%.
I made a dirty program that will try to find the sweet spot between 1e-322 and 1e-324, by dividing and multiplying it several times:
v = 1e-323
n_runs = 100000
target = n_runs/2
result = 0
while True:
    result = sum(0 if random.uniform(0, v) == 0.0 else 1 for _ in xrange(n_runs))
    if result > target:
        v /= 1.5
    elif result < target:
        v *= 1.5 / 1.4
    else:
        break
print v
This ends up with 4.94065645841e-324.
But it will still be wrong if I run it enough times.
Is there a way to find this number without the dirty script I wrote? I know that Python has an internal minimum float value, shown in sys.float_info.min, which on my PC is 2.22507385851e-308. But I don't see how to use it to solve this problem.
Sorry if this feels more like a puzzle than a proper question, but I'm not able to answer it myself.
I know that Python has an internal minimum float value, shown in sys.float_info.min, which on my PC is 2.22507385851e-308. But I don't see how to use it to solve this problem.
2.22507385851e-308 is not the smallest positive float value; it is the smallest positive normalized float value. The smallest positive float value is 2^-52 times that, that is, near 5e-324.
2^-52 is called the “machine epsilon”, and it is usual for the “min” of a floating-point type to be a value that is neither the least of all comparable values (that is, -inf), nor the least of finite values (that is, -max), nor the least of positive values, but the least of the positive normalized values.
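You can check both values directly in the shell (the exact digits shown assume standard IEEE-754 doubles):
>>> import sys
>>> sys.float_info.min            # smallest positive normalized float
2.2250738585072014e-308
>>> sys.float_info.min * 2**-52   # smallest positive float (subnormal), i.e. 2**-1074
5e-324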
Then, the next problem you face is that random.uniform is not uniform to that level. It probably works OK when you pass it a normalized number, but if you pass it the smallest positive representable float number, the computation it does with it internally may be very approximate and lead it to behave differently than the documentation says. Although it appears to work surprisingly well according to the results of your “dirty script”.
Here's the random.uniform implementation, according to the source:
from os import urandom as _urandom
BPF = 53 # Number of bits in a float
RECIP_BPF = 2**-BPF
def uniform(self, a, b):
    "Get a random number in the range [a, b) or [a, b] depending on rounding."
    return a + (b-a) * self.random()

def random(self):
    """Get the next random number in the range [0.0, 1.0)."""
    return (int.from_bytes(_urandom(7), 'big') >> 3) * RECIP_BPF
So, your problem boils down to finding a number b that will give 0 when multiplied by a number less than 0.5 and another result when multiplied by a number larger than 0.5. I've found out that, on my machine, that number is 5e-324.
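You can see that threshold behaviour directly in the shell (again assuming round-to-nearest IEEE-754 doubles; 5e-324 is the smallest subnormal, so the product rounds either down to 0.0 or up to 5e-324 itself):
>>> 5e-324 * 0.4
0.0
>>> 5e-324 * 0.6
5e-324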
To test it, I've made the following script:
from random import uniform
def test():
    runs = 1000000
    results = [0, 0]
    for i in range(runs):
        if uniform(0, 5e-324) == 0:
            results[0] += 1
        else:
            results[1] += 1
    print(results)
Which returned results consistent with a 50% probability:
>>> test()
[499982, 500018]
>>> test()
[499528, 500472]
>>> test()
[500307, 499693]