how to compute log factorial of an array of numbers - python

The quantity to be computed is log(k!), where k could be 4000 or even higher, but of course the log will compensate. I tried computing sum(log(i)) for i = 1..k, which is equivalent.
So, I am given a large array of integers and I want to efficiently compute sum(log(k)) for each of them. This was my attempt:
integers = np.asarray([435, 535, 242,])
score = np.sum(np.log(np.arange(1,integers+1)))
This would work, except that np.arange would generate an array of different size for each integer, so when I run that, it gives me an error (as it should).
The problem could be easily solved with a for loop as follows:
scores = []
for i in range(integers.shape[0]):
    score = np.sum(np.log(np.arange(1, integers[i] + 1)))
    scores.append(score)
but that's too slow. My actual integers array has millions of values to be computed.
Is there an efficient implementation for this that doesn't need a for loop? I was thinking of a lambda function or something like that, but I am not really sure how to apply it. Any help is appreciated!

How about math.lgamma? The gamma function extends the factorial (n! = G(n+1)), and lgamma is the log of the gamma function.
You don't need to compute the factorial and then take the log.
There is also gammaln in SciPy.
Code, Python 3.9 x64 Win 10
import numpy as np
from scipy.special import gammaln
startf = 1   # start of factorial sequence
stopf = 400  # end of factorial sequence
q = gammaln(range(startf + 1, stopf + 2))  # log(k!) for k = startf..stopf, since n! = G(n+1)
print(q)
looks reasonable to me
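Applied directly to the array from the question, a minimal sketch (using the identity log(k!) = gammaln(k + 1), so no loop is needed):
import numpy as np
from scipy.special import gammaln
integers = np.asarray([435, 535, 242])
scores = gammaln(integers + 1)  # log(k!) for every k in the array
print(scores)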

You can vectorize with something like this:
mi = integers.max()
ls = np.log(np.arange(2, mi + 1))
Two optimizations so far: you only need the range up to the maximum, since the other numbers are covered by that, and you don't need log(1).
Now you take the cumulative sum:
cs = np.cumsum(ls)
The desired elements can be indexed directly:
result = cs[integers - 2]
If this is something you need to do many times, and you know the upper bound, this solution will be much faster than using math.lgamma or scipy.special.gammaln once you precompute cs up to the upper bound.
If this is a one-time call, here is the obligatory one-liner:
np.cumsum(np.log(np.arange(2, np.max(integers) + 1)))[integers - 2]
You can do most of the operations in-place if memory is a concern (I think it also makes them faster):
mi = integers.max()
cs = np.arange(2, mi + 1, dtype=float)  # float dtype so np.log can write into it
np.cumsum(np.log(cs, out=cs), out=cs)
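Putting the pieces together, here is a minimal sketch of the precomputed-table variant; MAX_K is a hypothetical known upper bound on the integers, and every value is assumed to be at least 2:
import numpy as np

MAX_K = 10000                                        # hypothetical upper bound
table = np.cumsum(np.log(np.arange(2, MAX_K + 1)))   # table[k - 2] == log(k!)

def log_factorials(ints):
    # one fancy-indexing lookup instead of a loop
    return table[np.asarray(ints) - 2]

integers = np.array([435, 535, 242])
print(log_factorials(integers))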

Related

my code is giving me the wrong output sometimes, how to solve it?

I am trying to solve this problem: 'Your task is to construct a building which will be a pile of n cubes. The cube at the bottom will have a volume of n^3, the cube above will have the volume of (n-1)^3 and so on until the top which will have a volume of 1^3.
You are given the total volume m of the building. Being given m can you find the number n of cubes you will have to build?
The parameter of the function findNb (find_nb, find-nb, findNb) will be an integer m and you have to return the integer n such as n^3 + (n-1)^3 + ... + 1^3 = m if such a n exists or -1 if there is no such n.'
I tried to first create an arithmetic sequence, then transform it into a sigma sum using the nth term of the arithmetic sequence, then get a formula whose value I can compare with m.
I used this code and it works 70-80% of the time: most of the calculations it does are correct, but some aren't.
import math
def find_nb(m):
    n = 0
    while n < 100000:
        if (math.pow(((math.pow(n, 2)) + n), 2)) == 4*m:
            return n
            break
        n += 1
    return -1
print(find_nb(4183059834009))
>>> output 2022, which is correct
print(find_nb(24723578342962))
>>> output -1, which is also correct
print(find_nb(4837083252765022010))
>>> real output -1, which is incorrect
>>> expected output 57323
As mentioned, this is a math problem, which is mainly what I am better at :).
Sorry for the in-line mathematical formulas, as I cannot do any math formula rendering on SO.
I do not see any problem with your code and I believe your sample test case is wrong. However, I'll still give some optimisation "tricks" below to make your code run quicker.
Firstly, as you know, the sum of the cubes from 1^3 to n^3 is n^2(n+1)^2/4. Therefore we want to find integer solutions of the equation
n^2(n+1)^2/4 = m, i.e. n^4 + 2n^3 + n^2 - 4m = 0
Running a loop for n from 1 (or in your case, 2021) to 100000 is inefficient. If m is a large number (1e100+), the complexity of your code is O(m^0.25). Considering Python's runtime, you can run your code in a reasonable time only if m is less than around 1e32.
To optimise your code, you have two approaches.
1) Use binary search. I will not get into the details here, but basically you can halve the search range with a simple comparison (a runnable sketch is given at the end of this answer). For the initial bounds you can use lower = 0 & upper = k. A better bound for k will be given below, but let's use k = m for now.
Complexity: O(log(k)) = O(log(m))
Feasible range for m: m < 10^(3e7)
2) Use the almighty Newton-Raphson!! Using the iteration formula x_(n+1) = x_n - f(x_n) / f'(x_n), where f'(x) can be calculated explicitly, and a reasonable initial guess, let's say k = m again, the complexity is (I believe) O(log(k)) + O(1) = O(log(m)).
Complexity: O(log(k)) = O(log(m))
Feasible range for m: m < 10^(3e7)
Finally, I'll give a better initial guess for k in the above methods, also given in Ian's answer to this question. Since n^4 + 2n^3 + n^2 = O(n^4), we can actually take k ~ m^0.25 = (m^0.5)^0.5. To calculate this, we can take k = 2^(log(m)/4), where log is base 2. The log should be O(1), but I'm not sure about big numbers/dynamically sized ints in Python; I'm not a theorist. Using this better guess and Newton-Raphson, since the guess is within a constant range of the result, the algorithm is nearly O(1). Again, check out the links for better understanding.
Finally
Since your goal is to find whether n exists such that the equation is "exactly satisfied", use Newton-Raphson and iterate until the next guess is less than 0.5 from the current guess. If your implementation is "floppy", you can also do a range +/- 10 from the guess to ensure that you find the solution.
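A minimal sketch of the binary-search route from option 1, doing the final comparison in exact integer arithmetic so that large m doesn't run into floating-point equality problems (the Newton-Raphson route follows the same pattern: converge on a candidate, then check it exactly):
import math

def find_nb(m):
    # 1^3 + ... + n^3 == (n*(n+1)//2)**2, so binary-search for n
    # m >= n^4/4 implies n <= (4m)^(1/4); two isqrt calls (Python 3.8+) give an exact integer fourth root
    lo, hi = 0, 2 * math.isqrt(math.isqrt(m)) + 2
    while lo <= hi:
        mid = (lo + hi) // 2
        total = (mid * (mid + 1) // 2) ** 2
        if total == m:
            return mid
        if total < m:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(find_nb(4183059834009))        # 2022
print(find_nb(4837083252765022010))  # -1, consistent with the sample test case being wrong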
I think this is a Math question rather than a programming question.
Firstly, I would advise you to start iterating from a function of your input m. Right now you are initialising your n value arbitrarily (though of course it might be a requirement of the question), but I think there are ways to optimise it. Maybe, just maybe, you can iterate from the cube root, so that if n reaches zero or if at any point the sum becomes smaller than m, you can safely assume there is no possible building that can be built.
Secondly, the equation you derived from your summation doesn't seem to be correct. I substituted your expected n and input m into the condition in your if clause and they don't match. So either 1) your equation is wrong or 2) the expected output is wrong. I suggest that you relook at your derivation of the condition. Are you using the sum of cubes factorisation? There might be some edge cases that you neglected (maybe odd n) but my Math is rusty so I can't help much.
Of course, as mentioned, the break is unnecessary and will never be executed.

Nested for loops in python for Ising Model

I'm working on statistical mechanics currently, and trying to apply some programming to it since they fit so well together! I'm working on finding the partition function for a finite number of particles. However, the partition function is defined as a sum of a sum! I guess we could write this as a list of a list, so we would use nested for-loops, but I just can't quite figure out the correct way of writing it.
Z = \sum_{s_1, ..., s_N} e^{s_1 s_2 + ... + s_{N-1} s_N} is the partition function.
The possible values of s_i are -1 and +1.
Effectively the Ising model (1D) is a chain with N points on it, and each point can have s_i = -1 or +1. The energy of the system depends on the values of s_i, and each possible combination is called a state. The total sum over these states is called Z, the partition function.
So for a chain of length N = 5 (hence 2^5 = 32 possible states), how would I calculate this Z? I don't really have any code to show, but I know from the formula the result should be something like e^(+1+1+1+1+1) + e^(-1+1+1+1+1) + ... + e^(-1-1-1-1-1). The question is, how on earth do I go about doing that? I've generated the set of possible states:
import itertools
counting = 0
for state in itertools.product([1, -1], repeat=5):
    print(state)
    counting += 1
print('the total possible number of states is', counting)
but how can I use this to get a value for Z?
I'd use a function to calculate the sum for each state, then do the overall sum afterwards:
import itertools
from math import exp
def each_state(products):
    for state in products:
        yield sum(state)

Z = sum(exp(x) for x in each_state(itertools.product([1, -1], repeat=5)))
The benefit of this approach is that it is in keeping with the spirit of itertools: to not aggregate everything into memory at once. So while a numpy solution might be faster, say you wanted to calculate Z for many states, a numpy implementation would start to hit memory issues whereas the generator expression will not:
from itertools import product
import numpy as np
from math import exp
# this will yield a single number, and product will yield
# each state one at a time, never aggregating the
# full set of objects into memory (even though it might seem slow)
x = sum(exp(sum(x)) for x in product([1,-1], repeat=500))
# On my 16GB MacBook, this process will be killed because
# we collect all of the states into memory
x = np.array(list(product([1, -1], repeat=500)))
[1] 7743 killed python
The general rule of thumb is that list(giant_iterable) runs out of space whereas for item in giant_iterable will run out of time
Based on your description of the problem, you can calculate it using numpy as follows:
import itertools
import numpy as np
states = np.array([state for state in itertools.product([1,-1], repeat=5)])
print("There are %d states" % states.shape[0]) # 32 states
# calculate the sum for each state
sum_over_each_state = np.sum(states, axis=1)
print(sum_over_each_state)
# calculate e^(sum(state)) for each state
exp_of_all_states = np.exp(sum_over_each_state)
print(exp_of_all_states)
# sum up all exponentials
Z = np.sum(exp_of_all_states)
print("Z:", Z)
This gives Z = 279.96.
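As a sanity check: the question's formula has nearest-neighbour products s_i s_{i+1} in the exponent, but the code above (following the e^(+1+1+1+1+1) example) sums the spins themselves, and in that simplified case Z factorizes over independent spins as (e^{+1} + e^{-1})^N = (2 cosh 1)^N:
from math import cosh
print((2 * cosh(1)) ** 5)  # ~279.96, matching the numerical value above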

Summation without using for loops - python

I am looking to perform a summation that, for the sake of this question, can take the form f(t) = \sum_n sin(n^2 t).
NOTE: Apologies, I have changed the equation as the original was not getting the problem across in the way I intended.
Ideally I would like to solve f(t) without using a for loop, as there are lots of different values of n and t to pass in. For example, the worst case scenario would be two for loops, taking the form:
import numpy as np
import math
for t in range(len(ft)):
    total = 0
    for n in range(1, N):
        total += np.sin(math.pow(n, 2) * t)
    ft[t] = total
I have improved this to only have one for loop, taking the form:
for t in range(len(ft)):
    n = np.arange(1, N)
    ft[t] = np.sum(np.sin(n ** 2 * t))
Is there a way of further simplifying this to avoid having to iterate through all values of t? For my purposes, the summation is too expensive when I have to loop through every value of t.
UPDATE: The actual equation I am trying to solve, since simplifying it for the sake of finding a solution appears to be causing confusion, is:
I can simplify it down to a single for loop through the range of t values (similar to the example shown). I am hoping to simplify it further as there are about 90000 t values to iterate through.
Sine is a periodic function, and you can exploit this to perform fewer than N sums, since values will repeat. If they do not repeat exactly because of the number spacing, you can find the relative phase shift to adjust the summed values for each new period within the N terms.
You can use Einstein summation in numpy to avoid the for loops; note that the sine has to be applied elementwise before the sum over n:
np.einsum('tn->t', np.sin(np.arange(len(ft))[:, None] * np.arange(1, N) ** 2))
Maybe I didn't get the question, but isn't it as simple as:
from math import sin

def mysum(t, N):
    return sum(map(lambda n: sin(t * n * n), range(N)))
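For completeness, here is a broadcasting sketch that handles every t value at once; the sizes are hypothetical stand-ins, and t is assumed to run over the loop indices 0..T-1 as in the question's code (for T = 90000 and N = 100 the intermediate (T, N-1) matrix is only about 70 MB, but chunk over t if N is much larger):
import numpy as np

T, N = 90000, 100                            # hypothetical sizes
t = np.arange(T)
n_sq = np.arange(1, N) ** 2
ft = np.sin(t[:, None] * n_sq).sum(axis=1)   # ft[i] = sum over n of sin(n^2 * t[i])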

Exponentially distributed random generator (log function) in python?

I really need help as I am stuck at the beginning of the code.
I am asked to create a function to investigate the exponential distribution with a histogram. The function is x = −log(1−y)/λ. λ is a constant that I referred to as lamdr in the code and simply set to 10. I gave N (the number of random numbers) the value 10 and ran the code, yet the results and the generated random numbers gave me totally different results; below you can find the code. I don't know what went wrong, hope you guys can help me!! (I use Python 2.)
import random
import math
N = raw_input('How many random numbers you request?: ')
N = int(N)
lamdr = raw_input('Enter a value:')
lamdr = int(lamdr)
def exprand(lamdr):
    y = []
    for i in range(N):
        y.append(random.uniform(0, 1))
    return y
y = exprand(lamdr)
print 'Randomly generated numbers:', (y)
x = []
for w in y:
    x.append((math.log((1 - w) / lamdr)) * -1)
print 'Results:', x
After viewing the code you provided, it looks like you have the pieces you need but you're not putting them together.
You were asked to write function exprand(lambdr) using the specified formula. Python already provides a function called random.expovariate(lambd) for generating exponentials, but what the heck, we can still make our own. Your formula requires a "random" value for y which has a uniform distribution between zero and one. The documentation for the random module tells us that random.random() will give us a uniform(0,1) distribution. So all we have to do is replace y in the formula with that function call, and we're in business:
def exprand(lambdr):
    return -math.log(1.0 - random.random()) / lambdr
An historical note: Mathematically, if y has a uniform(0,1) distribution, then so does 1-y. Implementations of the algorithm dating back to the 1950s would often leverage this fact to simplify the calculation to -math.log(random.random()) / lambdr. Mathematically this gives distributionally correct results since P{X = c} = 0 for any continuous random variable X and constant c, but computationally it will blow up in Python for the 1 in 2^64 occurrence where you get a zero from random.random(). One historical basis for doing this was that when computers were many orders of magnitude slower than now, ditching the one additional arithmetic operation was considered worth the minuscule risk. Another was that Prime Modulus Multiplicative PRNGs, which were popular at the time, never yield a zero. These days it's primarily of historical interest, and an interesting example of where math and computing sometimes diverge.
Back to the problem at hand. Now you just have to call that function N times and store the results somewhere. Likely candidates to do so are loops or list comprehensions. Here's an example of the latter:
abuncha_exponentials = [exprand(0.2) for _ in range(5)]
That will create a list of 5 exponentials with λ=0.2. Replace 0.2 and 5 with suitable values provided by the user, and you're in business. Print the list, make a histogram, use it as input to something else...
Replacing exprand with random.expovariate in the list comprehension should produce equivalent results using Python's built-in exponential generator. That's the beauty of functions as an abstraction: once somebody writes them, you can just use them to your heart's content.
Note that because of the use of randomness, this will give different results every time you run it unless you "seed" the random generator to the same value each time.
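A minimal end-to-end sketch tying the pieces together, with a fixed seed so the run is reproducible; random.expovariate is included only as a distributional cross-check, not to produce identical numbers:
import math
import random

def exprand(lambdr):
    return -math.log(1.0 - random.random()) / lambdr

random.seed(12345)                              # fixed seed -> reproducible output
samples = [exprand(0.2) for _ in range(5)]
print(samples)

check = [random.expovariate(0.2) for _ in range(5)]  # same distribution, different draws
print(check)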
What @pjs wrote is true up to a point. While the statement "mathematically, if y has a uniform(0,1) distribution, so does 1-y" appears to be correct, the proposal to replace the code with -math.log(random.random()) / lambdr is just wrong. Why? Because Python's random module provides U(0,1) in the range [0,1) (as mentioned here), which makes such a replacement non-equivalent.
In more layman terms, if your U(0,1) is actually generating numbers in the [0,1) range, then the code
import math
import random
def exprand(lambd):
    return -math.log(1.0 - random.random()) / lambd
is correct, but code
import math
import random
def exprand(lambd):
    return -math.log(random.random()) / lambd
is wrong; it will sometimes raise an exception, as log(0) will eventually be called.

Work with logarithms of very large numbers

I have written a program that must output a file with the values of a function. This function produces very large values, but I only need the logarithm, which can go up to values of around 10000 or even a million (large, but manageable with 32-bit integer variables).
Now, obviously the function itself will be of the order of exp(10000), and that's huge. So I'm looking for any tricks to calculate the logarithm. I'm using Python since I thought that its native support for very large numbers would be useful, but it isn't enough in the case of numbers this large.
The value of the function is calculated as:
a*(x1+x2+x3+x4)
and I have to take the logarithm of that. I already preprocess the logarithms of all factors and then sum them all, but I can't do anything (at least anything simple) with log(x1+x2+x3+x4).
The results from Python are NaN because the x1, x2, x3, x4 variables grow too much. They are calculated as:
x = [1, 1, 1, 1]
for i in range(1, K):
    x[j] *= a*cosh(b*g[i])  # even values of i
    x[j] *= a*sinh(b*g[i])  # odd values of i
for some constants a, b and a vector g[]. That's just pseudo code, I write each x[1], x[2].
Is there any trick by which I could calculate the logarithm of that sum without running into the NaN problem?
Thank you very much
P.S.: I was using python because of what I said, if there's any special library for C(++) or something like that to deal with very large numbers, I would really appreciate it.
P.S.: The constant b inside the cosh can be of the order of 100 and that can make things blow up, so if there's anything to do with taking that constant out somehow...
I see that in your loop you are multiplying each x by a constant a every time. If you skip that factor and take log(x1+x2+x3+x4), which may then be manageable, you just add log(a) to that to get the final result. Or n*log(a) if you're multiplying by a several times.
That idea is language independent. :-)
Scaling the summands like this:
x = []
scale_factor = max(b * g[i] for i in range(1, K))
for i in range(1, K):
    x.append(cosh(b * g[i] - scale_factor))
result = log(a) * K + sum(log(xi) for xi in x) + log(scale_factor)
edit:
uh-oh, one detail is wrong:
result = log(a) * K + sum(log(xi) for xi in x) + scale_factor
The last term is just the factor, not its log. Sorry.
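A closely related trick for the log(x1+x2+x3+x4) part is log-sum-exp: subtract the largest log-term before exponentiating so nothing overflows (scipy.special.logsumexp does the same job). A minimal sketch, assuming you can compute each log(x_i) term by term:
import numpy as np

def log_of_sum(log_xs):
    # returns log(x1 + ... + xn) given only log(x1), ..., log(xn)
    log_xs = np.asarray(log_xs, dtype=float)
    m = log_xs.max()
    return m + np.log(np.exp(log_xs - m).sum())

# terms whose plain values (~exp(10000)) would overflow a double
print(log_of_sum([10000.0, 9999.5, 9998.0, 9995.0]))  # ~10000.56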
