I'm working on statistical mechanics at the moment, and trying to apply some programming to it, since the two fit so well together! I'm working on finding the partition function for a finite number of particles. However, the partition function is defined as a sum over sums! I guess we could write this as a list of lists and use nested for-loops, but I just can't quite figure out the correct way of writing it.
Z = \sum_{s_1} \cdots \sum_{s_N} e^{s_1 s_2 + s_2 s_3 + \dots + s_{N-1} s_N} is the partition function.
The possible values of s_i are -1 and +1.
Effectively, the (1D) Ising model is a chain with N points on it, and each point can have s_i = -1 or +1. The energy of the system depends on the values of the s_i, and each possible combination is called a state. The sum over all these states is called Z, the partition function.
So for a chain of length N=5 (hence 2^5 = 32 possible states), how would I calculate this Z? I don't really have any code to show, but I know from the formula the result should be something like e^(+1+1+1+1+1) + e^(-1+1+1+1+1) + ... + e^(-1-1-1-1-1). The question is: how on earth do I go about doing that? I've generated the set of possible states:
import itertools

counting = 0
for state in itertools.product([1, -1], repeat=5):
    print(state)
    counting += 1
print('the total possible number of states is', counting)
but how can I use this to get a value for Z?
I'd use a function to calculate the sum for each state, then do the overall sum afterwards:
import itertools
from math import exp

def each_state(products):
    # yield the spin sum of each state, one state at a time
    for state in products:
        yield sum(state)

Z = sum(exp(x) for x in each_state(itertools.product([1, -1], repeat=5)))
The benefit of this approach is that it is in keeping with the spirit of itertools: don't aggregate everything into memory at once. So while a numpy solution might be faster, if you wanted to calculate Z over many more states, a numpy implementation would start to hit memory issues, whereas the generator expression will not:
from itertools import product
import numpy as np
from math import exp

# this will yield a single number, and product will yield
# each state one at a time, never aggregating the
# full set of objects into memory (even though it might seem slow)
x = sum(exp(sum(state)) for state in product([1, -1], repeat=500))

# On my 16GB MacBook, this process will be killed because
# we collect all of the states into memory
x = np.array(list(product([1, -1], repeat=500)))
[1] 7743 killed python
The general rule of thumb is that list(giant_iterable) runs out of space, whereas for item in giant_iterable runs out of time.
Based on your description of the problem, you can calculate it using numpy as follows:
import itertools
import numpy as np
states = np.array([state for state in itertools.product([1,-1], repeat=5)])
print("There are %d states" % states.shape[0]) # 32 states
# calculate the sum for each state
sum_over_each_state = np.sum(states, axis=1)
print(sum_over_each_state)
# calculate e^(sum(state)) for each state
exp_of_all_states = np.exp(sum_over_each_state)
print(exp_of_all_states)
# sum up all exponentials
Z = np.sum(exp_of_all_states)
print("Z:", Z)
This gives Z = 279.96.
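Both snippets above exponentiate the plain sum of the spins, matching the e^(+1+1+1+1+1) example in the question. If instead you want the nearest-neighbour products s_1 s_2 + ... + s_(N-1) s_N exactly as written in the formula, here is a minimal sketch (the helper name neighbour_energy is just for illustration):

import itertools
from math import exp

def neighbour_energy(state):
    # s_1*s_2 + s_2*s_3 + ... + s_(N-1)*s_N
    return sum(a * b for a, b in zip(state, state[1:]))

Z = sum(exp(neighbour_energy(s)) for s in itertools.product([1, -1], repeat=5))
print(Z)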
Related
The quantity to be computed is log(k!), where k could be 4000 or even higher, but of course the log will compensate. I tried computing sum(log(k)), which is the same thing.
So, I am given a large array of integers, and I want to efficiently compute sum(log(k)) for each of them. This was my attempt:
integers = np.asarray([435, 535, 242,])
score = np.sum(np.log(np.arange(1,integers+1)))
This would work, except that np.arange generates an array of a different size for each integer, so when I run it, I get an error (as it should).
The problem could be easily solved with a for loop as follows:
scores = []
for i in range(integers.shape[0]):
    score = np.sum(np.log(np.arange(1, integers[i] + 1)))
    scores.append(score)
but that's too slow. My actual integers array has millions of values to be computed.
Is there an efficient implementation for this that doesn't need a for loop? I was thinking of a lambda function or something like that, but I am not really sure how to apply it. Any help is appreciated!
How about math.lgamma? The gamma function generalizes the factorial, with n! = Γ(n+1), and lgamma is the log of the gamma function, so you don't need to compute the factorial and then take its log.
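For instance, a quick sketch computing log(400!) two ways (the bound 400 matches the stopf used below):

import math

# log(400!) = log(G(401)) = lgamma(401)
print(math.lgamma(401))

# compare against the naive sum of logs
print(sum(math.log(i) for i in range(1, 401)))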
There is also gammaln in SciPy.
Code (Python 3.9 x64, Win 10):

import numpy as np
from scipy.special import gammaln

startf = 1   # start of factorial sequence
stopf = 400  # end of factorial sequence

q = gammaln(range(startf + 1, stopf + 1))  # n! = G(n+1)
print(q)
Looks reasonable to me.
You can vectorize with something like this:
mi = integers.max()
ls = np.log(np.arange(2, mi + 1))
Two optimizations so far: you only need the range up to the maximum, since the other numbers are covered by that, and you don't need log(1).
Now you take the cumulative sum:
cs = np.cumsum(ls)
The desired elements can be indexed directly:
result = cs[integers - 2]
If this is something you need to do many times, and you know the upper bound in advance, this solution will be much faster than math.lgamma or scipy.special.gammaln once you have precomputed cs up to that bound.
If this is a one-time call, here is the obligatory one-liner:
np.cumsum(np.log(np.arange(2, np.max(integers) + 1)))[integers - 2]
You can do most of the operations in-place if memory is a concern (I think it also makes them faster):
mi = integers.max()
cs = np.arange(2, mi + 1, dtype=float)  # float dtype so np.log can write into it in-place
np.cumsum(np.log(cs, out=cs), out=cs)
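As a sanity check (assuming scipy is available), the cumulative-sum result should agree with gammaln, since log(n!) = gammaln(n+1); the integers array below is the example from the question:

import numpy as np
from scipy.special import gammaln

integers = np.array([435, 535, 242])
cs = np.cumsum(np.log(np.arange(2, integers.max() + 1)))
result = cs[integers - 2]
print(np.allclose(result, gammaln(integers + 1)))  # True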
I really need help, as I am stuck at the beginning of the code.
I am asked to create a function to investigate the exponential distribution on a histogram. The function is x = −log(1−y)/λ. λ is a constant that I refer to as lamdr in the code, and I simply gave it the value 10. I gave N (the number of random numbers) the value 10 and ran the code, yet the results and the generated random numbers gave me totally different values; below you can find the code. I don't know what went wrong; hope you guys can help me! (I use Python 2.)
import random
import math

N = raw_input('How many random numbers you request?: ')
N = int(N)
lamdr = raw_input('Enter a value:')
lamdr = int(lamdr)

def exprand(lamdr):
    y = []
    for i in range(N):
        y.append(random.uniform(0, 1))
    return y

y = exprand(lamdr)
print 'Randomly generated numbers:', (y)

x = []
for w in y:
    x.append((math.log((1 - w) / lamdr)) * -1)
print 'Results:', x
After viewing the code you provided, it looks like you have the pieces you need but you're not putting them together.
You were asked to write function exprand(lambdr) using the specified formula. Python already provides a function called random.expovariate(lambd) for generating exponentials, but what the heck, we can still make our own. Your formula requires a "random" value for y which has a uniform distribution between zero and one. The documentation for the random module tells us that random.random() will give us a uniform(0,1) distribution. So all we have to do is replace y in the formula with that function call, and we're in business:
def exprand(lambdr):
    return -math.log(1.0 - random.random()) / lambdr
An historical note: Mathematically, if y has a uniform(0,1) distribution, then so does 1-y. Implementations of the algorithm dating back to the 1950's would often leverage this fact to simplify the calculation to -math.log(random.random()) / lambdr. Mathematically this gives distributionally correct results, since P{X = c} = 0 for any continuous random variable X and constant c, but computationally it will blow up in Python for the 1 in 2^64 occurrence where you get a zero from random.random(). One historical basis for doing this was that when computers were many orders of magnitude slower than now, ditching the one additional arithmetic operation was considered worth the minuscule risk. Another was that Prime Modulus Multiplicative PRNGs, which were popular at the time, never yield a zero. These days it's primarily of historical interest, and an interesting example of where math and computing sometimes diverge.
Back to the problem at hand. Now you just have to call that function N times and store the results somewhere. Likely candidates to do so are loops or list comprehensions. Here's an example of the latter:
abuncha_exponentials = [exprand(0.2) for _ in range(5)]
That will create a list of 5 exponentials with λ=0.2. Replace 0.2 and 5 with suitable values provided by the user, and you're in business. Print the list, make a histogram, use it as input to something else...
Replacing exprand with random.expovariate in the list comprehension should produce equivalent results using Python's built-in exponential generator. That's the beauty of functions as an abstraction: once somebody writes them, you can just use them to your heart's content.
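For example, the built-in version of the same comprehension:

import random

# same five draws with lambda = 0.2, using the standard library generator
abuncha_exponentials = [random.expovariate(0.2) for _ in range(5)]
print(abuncha_exponentials)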
Note that because of the use of randomness, this will give different results every time you run it unless you "seed" the random generator to the same value each time.
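A minimal sketch of seeding for reproducibility (the seed value 12345 is arbitrary):

import math
import random

def exprand(lambdr):
    return -math.log(1.0 - random.random()) / lambdr

random.seed(12345)  # any fixed integer makes the draws repeatable
print([exprand(0.2) for _ in range(5)])  # identical list on every run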
What @pjs wrote is true up to a point. While the statement "mathematically, if y has a uniform(0,1) distribution, so does 1-y" is correct, the proposal to replace the code with -math.log(random.random()) / lambdr is just wrong. Why? Because the Python random module provides U(0,1) in the range [0,1) (as mentioned in the documentation), which makes that replacement non-equivalent.
In more layman's terms: if your U(0,1) actually generates numbers in the [0,1) range, then the code
import math
import random

def exprand(lambdr):  # 'lambda' is a reserved word in Python, so use another name
    return -math.log(1.0 - random.random()) / lambdr
is correct, but the code

import math
import random

def exprand(lambdr):
    return -math.log(random.random()) / lambdr

is wrong: it will occasionally fail, because log(0) will be called whenever random.random() returns exactly 0.
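For illustration, in CPython the failure mode is an exception rather than a NaN:

import math

math.log(0.0)  # raises ValueError: math domain error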
I am using the numpy random generator to draw a number from (0,1) with
np.random.uniform(0, 1)
I can't find a way to seed the generator and still have the numbers uniformly and randomly chosen between 0 and 1. I am also desperately new at coding.
I personally would recommend creating a new numpy.random.RandomState object rather than using np.random.seed. For example:
import numpy as np
rs = np.random.RandomState(0)
x = rs.randn(10)
will give an equivalent result to:
np.random.seed(0)
x = np.random.randn(10)
However, the first method is much more explicit and makes it easier to keep track of the RNG state, for example in cases where you need multiple random number generators with different internal states.
If you want to set the seed for the global generator, use numpy.random.seed.
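Applied to the uniform draw from the question, a minimal sketch (the seed 0 is arbitrary):

import numpy as np

rs = np.random.RandomState(0)  # seeded generator
x = rs.uniform(0, 1, size=10)  # still uniform on [0, 1)
print(x)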
I'm trying to run a Monte Carlo simulator for a Markov chain that is uniformly distributed among all NxN matrices that have no neighboring 1's. My algorithm is supposed to fill up the state space by running the chain a bunch of times. However, there's something horribly wrong with my logic somewhere, and the state space just isn't filling up. Any help would be greatly appreciated. Here is my code:
import random
import numpy

M = numpy.zeros((52, 52), dtype=int)
z = 0
State_Space = []
for i in range(1, 100):
    x = random.randint(1, 50)
    y = random.randint(1, 50)
    T = M
    if T[x][y] == 1:
        T[x][y] = 0
    if T[x][y] == 0:
        T[x][y] = 1
    if T not in State_Space:
        if T[x+1][y+1] == 0 and T[x+1][y-1] == 0 and T[x-1][y-1] == 0 and T[x-1][y+1] == 0:
            State_Space.append(T)
            M = T
    else:
        if T[x+1][y+1] == 0 and T[x+1][y-1] == 0 and T[x-1][y-1] == 0 and T[x-1][y+1] == 0:
            M = T
print State_Space
I notice two things:
First, on the line T=M, I assume you want T=M.copy(). Doing T=M makes T and M reference the same matrix, so changing a value in T will affect M as well. If you assign a copy of M to T, then this won't happen.
Second, T not in State_Space is not actually checking whether T is in the State_Space list. Because of how numpy comparison works, the in operator cannot be used with arrays: if you tried T in State_Space with a non-empty State_Space, you would get a ValueError about truth-value ambiguity. Instead you need to check whether any element of State_Space is equal to T, using if not any(numpy.array_equal(T, X) for X in State_Space):
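A quick illustration of that error (the arrays here are just placeholders):

import numpy

a = numpy.zeros((2, 2))
lst = [numpy.ones((2, 2))]
# `in` compares a against each list element with ==, which yields a
# boolean array whose overall truth value is ambiguous:
a in lst  # raises ValueError: The truth value of an array ... is ambiguous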
In the end, my code looks like this:
import random
import numpy

M = numpy.zeros((52, 52), dtype=int)
z = 0
State_Space = []
for i in range(1, 100):
    x = random.randint(1, 50)
    y = random.randint(1, 50)
    T = M.copy()
    if T[x][y] == 1:
        T[x][y] = 0
    if T[x][y] == 0:
        T[x][y] = 1
    if not any(numpy.array_equal(T, X) for X in State_Space):
        if T[x+1][y+1] == 0 and T[x+1][y-1] == 0 and T[x-1][y-1] == 0 and T[x-1][y+1] == 0:
            State_Space.append(T)
            M = T
    else:
        if T[x+1][y+1] == 0 and T[x+1][y-1] == 0 and T[x-1][y-1] == 0 and T[x-1][y+1] == 0:
            M = T
print len(State_Space)
After running, I have ~90 entries in State_Space.
I regularly find myself needing a random index into an array or a list, where the probabilities of the indices are not uniformly distributed but follow certain positive weights. What's a fast way to obtain one? I know I can pass weights to numpy.random.choice as the optional argument p, but the function seems quite slow, and building an arange to pass it is not ideal either. The sum of the weights can be an arbitrary positive number and is not guaranteed to be 1, which makes the approach of generating a random number in (0,1] and then subtracting weight entries until the result is 0 or less impossible.
While there are answers on how to implement similar things (mostly not about obtaining the array index, but the corresponding element) in a simple manner, such as Weighted choice short and simple, I'm looking for a fast solution, because the appropriate function is executed very often. My weights change frequently, so the overhead of building something like an alias table (a detailed introduction can be found at http://www.keithschwarz.com/darts-dice-coins/) should be considered part of the calculation time.
Cumulative summing and bisect
If speed is a concern, in the generic case it seems advisable to calculate the cumulative sum of the weights and use bisect from the bisect module to find a random point in the resulting sorted array:

import bisect
import numpy

def weighted_choice(weights):
    cs = numpy.cumsum(weights)
    return bisect.bisect(cs, numpy.random.random() * cs[-1])

A more detailed analysis is given below.
Note: If the array is not flat, numpy.unravel_index can be used to transform a flat index into a shaped index, as seen in https://stackoverflow.com/a/19760118/1274613
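For instance, a sketch for a 2-D weights array (the array contents are just placeholders), reusing weighted_choice from above:

import bisect
import numpy

def weighted_choice(weights):
    cs = numpy.cumsum(weights)
    return bisect.bisect(cs, numpy.random.random() * cs[-1])

weights = numpy.arange(1.0, 13.0).reshape(3, 4)  # placeholder 2-D weights
flat_idx = weighted_choice(weights.ravel())
print(numpy.unravel_index(flat_idx, weights.shape))  # shaped (row, col) index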
Experimental Analysis
There are four more or less obvious solutions using numpy builtin functions. Comparing all of them using timeit gives the following result:
import timeit

weighted_choice_functions = [
"""import numpy
wc = lambda weights: numpy.random.choice(
    range(len(weights)),
    p=weights/weights.sum())
""",
"""import numpy
# Adapted from https://stackoverflow.com/a/19760118/1274613
def wc(weights):
    cs = numpy.cumsum(weights)
    return cs.searchsorted(numpy.random.random() * cs[-1], 'right')
""",
"""import numpy, bisect
# Using bisect mentioned in https://stackoverflow.com/a/13052108/1274613
def wc(weights):
    cs = numpy.cumsum(weights)
    return bisect.bisect(cs, numpy.random.random() * cs[-1])
""",
"""import numpy
wc = lambda weights: numpy.random.multinomial(
    1,
    weights/weights.sum()).argmax()
"""]

for setup in weighted_choice_functions:
    for ps in ["numpy.ones(40)",
               "numpy.arange(10)",
               "numpy.arange(200)",
               "numpy.arange(199,-1,-1)",
               "numpy.arange(4000)"]:
        print(timeit.timeit("wc(%s)" % ps, setup=setup))
    print()
The resulting output is
178.45797914802097
161.72161589498864
223.53492237901082
224.80936180002755
1901.6298267539823

15.197789980040397
19.985687876993325
20.795070077001583
20.919113760988694
41.6509403079981

14.240949985047337
17.335801470966544
19.433710905024782
19.52205040602712
35.60536142199999

26.6195822560112
20.501282756973524
31.271995796996634
27.20013752405066
243.09768892999273
This means that numpy.random.choice is surprisingly slow; even the dedicated numpy searchsorted method is slower than the type-naive bisect variant. (These results were obtained using Python 3.3.5 with numpy 1.8.1, so things may differ in other versions.) The function based on numpy.random.multinomial is less efficient for large weights than the methods based on cumulative summing. Presumably the fact that argmax has to iterate over the whole array and run comparisons at each step plays a significant role, as can also be seen from the four-second difference between an increasing and a decreasing weight list.