Markov Chain Monte Carlo Simulation Problem - python

I'm trying to run a MC simulator for a Markov Chain that is uniformly distributed among all NxN matrices that have no neighboring 1's. My algo is supposed to fill up the state space by running the chain a bunch of times. However there's something horribly wrong with my logic somewhere and the state space just isn't filling up. Any help would be greatly appreciated. Here is my code.
import random
import numpy
M=numpy.zeros((52,52),dtype=int)
z=0
State_Space=[]
for i in range(1,100):
    x=random.randint(1,50)
    y=random.randint(1,50)
    T=M
    if T[x][y]==1:
        T[x][y]=0
    if T[x][y]==0:
        T[x][y]=1
    if T not in State_Space:
        if T[x+1][y+1]==0 and T[x+1][y-1]==0 and T[x-1][y-1]==0 and T[x-1][y+1]==0:
            State_Space.append(T)
            M=T
    else:
        if T[x+1][y+1]==0 and T[x+1][y-1]==0 and T[x-1][y-1]==0 and T[x-1][y+1]==0:
            M=T
print State_Space

I notice two things:
First, where you have T=M I assume you want T=M.copy(). Doing T=M makes T and M refer to the same matrix, so changing a value in T also changes M. If you assign a copy of M to T, this won't happen.
Second, T not in State_Space does not check whether T is in the State_Space list. The in operator compares elements with ==, and for NumPy arrays == works element-wise and returns an array rather than a single boolean, so in cannot be used on a list of arrays. If you tried T in State_Space with a non-empty State_Space, you would get a ValueError about truth value ambiguity. Instead, you need to check whether any element of State_Space is equal to T, so the test becomes if not any(numpy.array_equal(T, X) for X in State_Space):
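Both points can be checked on a small array. This is a minimal sketch; the 3x3 size and the names alias, dup, and states are made up for illustration:

```python
import numpy as np

M = np.zeros((3, 3), dtype=int)

# Plain assignment creates a second name for the same array.
alias = M
alias[0, 0] = 1          # M[0, 0] is now 1 as well

# copy() creates an independent array.
dup = M.copy()
dup[1, 1] = 1            # M[1, 1] stays 0

# `in` on a list of arrays falls back to ==, which is element-wise.
states = [np.zeros((3, 3), dtype=int)]
try:
    dup in states
    raised = False
except ValueError:
    raised = True        # truth value of an array is ambiguous

# Explicit comparison of whole arrays works as intended.
found = any(np.array_equal(dup, s) for s in states)
print(M[0, 0], M[1, 1], raised, found)
```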
In the end, my code looks like this:
import random
import numpy
M=numpy.zeros((52,52),dtype=int)
z=0
State_Space=[]
for i in range(1,100):
    x=random.randint(1,50)
    y=random.randint(1,50)
    T=M.copy()
    if T[x][y]==1:
        T[x][y]=0
    if T[x][y]==0:
        T[x][y]=1
    if not any(numpy.array_equal(T, X) for X in State_Space):
        if T[x+1][y+1]==0 and T[x+1][y-1]==0 and T[x-1][y-1]==0 and T[x-1][y+1]==0:
            State_Space.append(T)
            M=T
    else:
        if T[x+1][y+1]==0 and T[x+1][y-1]==0 and T[x-1][y-1]==0 and T[x-1][y+1]==0:
            M=T
print(len(State_Space))
After running, I have ~90 entries in State_Space.
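Two further details may be worth checking, although this is my reading of the problem rather than something stated in the question: the two consecutive if statements turn a 1 into a 0 and then straight back into a 1, and the validity test looks at the diagonal cells. Below is a sketch under the assumption that "neighboring" means the four orthogonally adjacent cells. It uses elif for the flip and a set of byte strings for the membership test; run_chain, n, steps, and seed are hypothetical names:

```python
import random
import numpy as np

def run_chain(n=50, steps=100, seed=0):
    """Run the flip chain and collect the distinct states visited."""
    rng = random.Random(seed)
    # A border of zeros means neighbour lookups never leave the array.
    M = np.zeros((n + 2, n + 2), dtype=int)
    seen = set()
    for _ in range(steps):
        x = rng.randint(1, n)
        y = rng.randint(1, n)
        T = M.copy()
        if T[x, y] == 1:
            T[x, y] = 0  # removing a 1 can never create a conflict
        elif (T[x + 1, y] == 0 and T[x - 1, y] == 0
              and T[x, y + 1] == 0 and T[x, y - 1] == 0):
            T[x, y] = 1  # ASSUMPTION: only orthogonal neighbours matter
        else:
            continue     # proposal rejected, the chain stays where it is
        seen.add(T.tobytes())  # hashable snapshot of the whole matrix
        M = T
    return M, seen

M, seen = run_chain()
print(len(seen))
```

Storing T.tobytes() in a set keeps the membership test at constant time per step, whereas scanning a list with array_equal grows with the number of stored states.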

Related

Is there any way to make this code more efficient? I am running it on a sparse matrix (m) that is around 190k x 30k. Perhaps a way to get rid of loops?

from collections import Counter
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.metrics.pairwise import pairwise_distances
import sys
import heapq
from .kmeanspp import kmeanspp
from .utils import log
from fbpca import pca
import math as math
import statistics
import random
import operator as op
from operator import itemgetter
import time
lil=m.tolil()
gene_avg=[]
cell_rndm=random.sample(range(lil.shape[0]), round(lil.shape[0]*rndm_cell_pcnt))
print('calculating gene averages')
for gene in range(lil.shape[1]):
    print('calculating mean')
    gene_avg.append(lil[:,gene].todense().mean())
print('setting chosen cells to cell type random. changing gene expression')
for cell in cell_rndm:
    nonzero=set(lil[cell,:].nonzero()[1])
    rndm_nonzero=random.sample(nonzero,round(len(nonzero)*gene_prcnt))
    zero =list(set(list(range(lil.shape[1])))-nonzero)
    rndm_zero=random.sample(zero,round(len(zero)*gene_prcnt))
    print('setting celltype to random')
    labels[cell] = 'random'
    print('reraranging some gene expresion')
    lil[cell,rndm_nonzero]=0.0
    lil[cell,rndm_zero] = list(itemgetter(*rndm_zero)(gene_avg))
In the first for loop, I am giving all 30k genes their mean expression. In the second I am going through all 190k cells and setting some nonzero genes to zero and zero ones to the average. This process takes a very long time.
for gene in range(lil.shape[1]):
    print('calculating mean')
    gene_avg.append(lil[:,gene].todense().mean())
should be replaceable with
arr = lil.toarray() # dense array
gene_avg = arr.mean(axis=0) # one mean per gene (column)
With numpy random, you should be able to take all 'random-samples' at once. I haven't studied your code enough to give you the details.
While lil assignment is the best for sparse matrices, assigning values to dense arrays is better - if they fit in memory.
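If the dense array does not fit in memory (190k x 30k floats is on the order of tens of gigabytes), the per-gene means can also be taken on the sparse matrix directly. A minimal sketch, using a tiny made-up matrix in place of the real m:

```python
import numpy as np
from scipy import sparse

# Tiny stand-in for the real matrix: rows are cells, columns are genes.
dense = np.array([[0.0, 2.0, 0.0],
                  [4.0, 0.0, 0.0],
                  [0.0, 6.0, 0.0]])
m = sparse.csr_matrix(dense)

# mean(axis=0) averages over the rows (cells), giving one value per gene.
# np.asarray(...).ravel() flattens the 1 x n_genes result to a 1-D array.
gene_avg = np.asarray(m.mean(axis=0)).ravel()
print(gene_avg)
```

Because gene_avg is then a NumPy array, it can be indexed with a list of column indices later on.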
Trying to understand the action:
Each cell comes from a random list with no repeats, so lil[cell,:] is one row of the lil matrix.
Get the nonzero indices as a set; set(lil.rows[cell]) may do the same thing:
nonzero=set(lil[cell,:].nonzero()[1])
then get a random sample of those:
rndm_nonzero=random.sample(nonzero,round(len(nonzero)*gene_prcnt))
And a random sample of the zeros, using set difference. Use of set feels like it might be slower than needed, but I haven't worked out an alternative.
zero =list(set(list(range(lil.shape[1])))-nonzero)
rndm_zero=random.sample(zero,round(len(zero)*gene_prcnt))
Prints are great for debugging, but they do slow down the run.
print('setting celltype to random')
labels[cell] = 'random'
print('reraranging some gene expresion')
And finally set some elements of that row to 0
lil[cell,rndm_nonzero]=0.0
If gene_avg is an array, then you can use gene_avg[rndm_zero] instead of this itemgetter. I don't think itemgetter is any faster than [gene_avg[i] for i in rndm_zero].
lil[cell,rndm_zero] = list(itemgetter(*rndm_zero)(gene_avg))
While it would be nice to work with all sampled rows at once
arr[cell_rndm]
it would take a lot of work to get the details right. So for a start I'd focus on streamlining the row by row operation.
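The row-by-row streamlining suggested above could look roughly like the following sketch. It assumes gene_avg is a NumPy array and that lil.rows[cell] holds exactly the nonzero columns of the row (explicitly stored zeros would break that assumption). perturb_row is a hypothetical helper name, and the small matrix is made up for the demonstration:

```python
import numpy as np
from scipy import sparse

def perturb_row(lil, cell, gene_avg, gene_prcnt, rng):
    """Zero a sample of nonzero genes and fill a sample of zero genes."""
    n_genes = lil.shape[1]
    nonzero = np.array(lil.rows[cell], dtype=int)
    # Columns not stored in the row are the zero entries.
    zero = np.setdiff1d(np.arange(n_genes), nonzero, assume_unique=True)
    n_drop = round(len(nonzero) * gene_prcnt)
    n_fill = round(len(zero) * gene_prcnt)
    if n_drop > 0:
        drop = rng.choice(nonzero, size=n_drop, replace=False)
        lil[cell, drop] = 0.0
    if n_fill > 0:
        fill = rng.choice(zero, size=n_fill, replace=False)
        lil[cell, fill] = gene_avg[fill]  # array indexing, no itemgetter

# Demo: row 0 has 4 nonzero genes and 6 zero genes.
dense = np.zeros((2, 10))
dense[0, :4] = 1.0
lil = sparse.lil_matrix(dense)
gene_avg = np.full(10, 2.5)       # made-up averages
rng = np.random.default_rng(0)
perturb_row(lil, 0, gene_avg, 0.5, rng)
row = lil[0].toarray().ravel()
print(row)
```

The set difference is replaced by np.setdiff1d and the sampling by Generator.choice with replace=False; whether this is actually faster on the real data would need to be timed.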

How to extract numpy arrays (pairs) from a list which are subjected to condition depending on both the elements in pair?

I want to generate a number of random points in a hexagon. To do so, I generate random points in a square and then try to use conditions to drop unsuitable pairs. I tried solutions like this:
import scipy.stats as sps
import numpy as np
size=100
kx = 1/np.sqrt(3)*sps.uniform.rvs(loc=-1,scale=2,size=size)
ky = 2/3*sps.uniform.rvs(loc=-1,scale=2,size=size)
pairs = [(i, j) for i in kx for j in ky]
def conditions(pair):
    return (-1/np.sqrt(3)<pair[0]<1/np.sqrt(3)) & (-2/3<pair[1]<2/3)
mask = np.apply_along_axis(conditions, 1, pairs)
hex_pairs = np.extract(mask, pairs)
L=len(hex_pairs)
print(L)
In this example I try to construct a logical mask for later use with np.extract to extract the needed values. I try to apply a conditional function to all pairs from a list. But I seem to be misunderstanding something, because with this mask the output of this code is:
10000
That means that no pairs were dropped and all boolean numbers in mask were True. Can anyone suggest how to correct this solution or maybe to put it another way (with a set of randomly distributed points in hexagon as a result)?
The reason none of your pairs gets eliminated is that they are generated so that the condition is already fulfilled: all x-values lie in [-1/sqrt(3), 1/sqrt(3)], and similarly all y-values lie in [-2/3, 2/3]. The condition only describes the sampling rectangle, not a hexagon.
I think an intuitive and easy way to get there is to create a hexagonal polygon, generate uniformly distributed random points within a square that encloses this hexagon, and then apply the point-in-polygon test from one of the existing polygon libraries, such as shapely. See e.g. https://stackoverflow.com/a/36400130/7084566
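The same rejection idea can also be sketched with NumPy alone. This assumes the intended shape is the regular hexagon inscribed in the sampling rectangle, with vertices at (0, ±2/3) and (±1/sqrt(3), ±1/3); that geometry is my inference from the bounds in the question, not something it states. The names half_width, half_height, and inside are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
size = 100
half_width = 1 / np.sqrt(3)   # bounding rectangle in x
half_height = 2 / 3           # bounding rectangle in y

# One (kx, ky) point per sample, not every combination of kx and ky.
kx = rng.uniform(-half_width, half_width, size)
ky = rng.uniform(-half_height, half_height, size)

# ASSUMPTION: slanted edges run from (0, 2/3) to (1/sqrt(3), 1/3),
# which gives |y| <= 2/3 - |x|/sqrt(3) inside the hexagon.
inside = np.abs(ky) <= half_height - np.abs(kx) / np.sqrt(3)
hex_points = np.column_stack((kx[inside], ky[inside]))
print(len(hex_points))
```

For this geometry the hexagon covers three quarters of the rectangle, so roughly a quarter of the points are rejected; oversample and truncate if an exact count is needed.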

Nested for loops in python for Ising Model

I'm currently working on statistical mechanics and trying to apply some programming to it, since they fit so well together! I'm working on finding the partition function for a finite number of particles. However, the partition function is defined as a sum of a sum! I guess we could write this as a list of lists and use nested for-loops, but I just can't quite figure out the correct way of writing it.
Z = \sum_{s_1, \dots, s_N} e^{s_1 s_2 + \dots + s_{N-1} s_N} is the partition function.
The possible values of s_i are -1 and +1.
Effectively, the 1D Ising model is a chain with N points on it, and each point can have s_i = -1 or +1. The energy of the system depends on the values of s_i, and each possible combination is called a state. The sum over all of these states is called Z, the partition function.
So for a chain of length N=5 (hence 2^5=32 possible states), how would I calculate this Z? I don't really have any code to show, but I know from the formula the result should be something like e^(+1+1+1+1+1)+e^(-1+1+1+1+1)+...+e^(-1-1-1-1-1). The question is, how on earth do I go about doing that? I've generated the set of possible states:
import itertools
counting=0
for state in itertools.product([1,-1],repeat=5):
    print(state)
    counting+=1
print('the total possible number of states is',counting)
but how can I use this to get to a value for Z?
I'd use a function to calculate the sum for each state, then do the overall sum afterwards:
import itertools
from math import exp
def each_state(products):
    for state in products:
        yield sum(state)
Z = sum(exp(x) for x in each_state(itertools.product([1,-1],repeat=5)))
The benefit of this approach is that it is in keeping with the spirit of itertools: it does not aggregate everything into memory at once. So while a numpy solution might be faster for small N, if you wanted to calculate Z for a very large number of states, a numpy implementation would start to hit memory issues, whereas the generator expression will not:
from itertools import product
import numpy as np
from math import exp
# this will yield a single number, and product will yield
# each state one at a time, never aggregating the
# full set of objects into memory (even though it might seem slow)
x = sum(exp(sum(x)) for x in product([1,-1], repeat=500))
# On my 16GB MacBook, this process will be killed because
# we collect all of the states into memory
x = np.array(list(product([1, -1], repeat=500)))
[1] 7743 killed python
The general rule of thumb is that list(giant_iterable) runs out of space, whereas for item in giant_iterable runs out of time.
Based on your description of the problem, you can calculate it using numpy as follows:
import itertools
import numpy as np
states = np.array([state for state in itertools.product([1,-1], repeat=5)])
print("There are %d states" % states.shape[0]) # 32 states
# calculate the sum for each state
sum_over_each_state = np.sum(states, axis=1)
print(sum_over_each_state)
# calculate e^(sum(state)) for each state
exp_of_all_states = np.exp(sum_over_each_state)
print(exp_of_all_states)
# sum up all exponentials
Z = np.sum(exp_of_all_states)
print("Z:", Z)
This gives Z = 279.96.
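Both answers sum the spins themselves, following the worked example in the question. If the formula with products of neighbouring spins is what is intended, the exponent changes to s_1*s_2 + ... + s_(N-1)*s_N. A sketch of that variant, checked against the known closed form 2*(2*cosh(1))^(N-1) for an open chain without an external field (partition_function is a hypothetical name):

```python
import itertools
from math import cosh, exp

def partition_function(n):
    """Sum exp(s_1*s_2 + ... + s_{n-1}*s_n) over all 2**n spin states."""
    total = 0.0
    for state in itertools.product([1, -1], repeat=n):
        # Pair each spin with its right-hand neighbour.
        coupling = sum(a * b for a, b in zip(state, state[1:]))
        total += exp(coupling)
    return total

Z = partition_function(5)
closed_form = 2 * (2 * cosh(1)) ** 4   # open chain, N = 5
print(Z, closed_form)
```

The loop still visits one state at a time, so memory use stays flat, but the running time doubles with every extra spin.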

Numpy Array index problems

I am having a small issue understanding indexing in Numpy arrays. I think a simplified example is best to get an idea of what I am trying to do.
So first I create an array of zeros of the size I want to fill:
x = range(0,10,2)
y = range(0,10,2)
a = zeros(len(x),len(y))
so that will give me an array of zeros that will be 5X5. Now, I want to fill the array with a rather complicated function that I can't get to work with grids. My problem is that I'd like to iterate as:
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        .........
        "do function and fill the array corresponding to (i,j)"
however, right now what I would like to be a[2,10] is a function of 2 and 10 but instead the index for a function of 2 and 10 would be a[1,4] or whatever.
Again, maybe this is elementary, I've gone over the docs and find myself at a loss.
EDIT:
In the end I vectorized as much as possible and wrote the simulation loops that I could not in Cython. Further I used Joblib to Parallelize the operation. I stored the results in a list because an array was not filling right when running in Parallel. I then used Itertools to split the list into individual results and Pandas to organize the results.
Thank you for all the help
Some tips to get this done while keeping good performance:
- avoid Python `for` loops
- create a function that can deal with vectorized inputs
Example:
def f(xs, ys):
    return xs**2 + ys**2 + xs*ys
where you can pass xs and ys as arrays and the operation will be done element-wise:
xs = np.random.random((100,200))
ys = np.random.random((100,200))
f(xs,ys)
You should read more about numpy broadcasting to get a better understanding of how array operations work. This will help you design a function that handles arrays properly.
First, you are missing a pair of parentheses in the call to zeros; the first argument should be a tuple:
a = zeros((len(x),len(y)))
Then, the corresponding indices for your table are i//2 and j//2 (integer division, so the index stays an int):
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        # do function and fill the array corresponding to (i,j)
        a[i//2, j//2] = 1
But I second Saullo Castro, you should try to vectorize your computations.
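To make the mapping between values and indices explicit, enumerate gives both at once, and broadcasting produces the same table without Python loops. In this sketch f is only a placeholder for the "rather complicated function" from the question:

```python
import numpy as np

def f(x, y):
    # Placeholder for the real function; works on scalars and arrays.
    return x**2 + y**2 + x*y

x = np.arange(0, 10, 2)   # values 0, 2, 4, 6, 8
y = np.arange(0, 10, 2)

# Loop version: i, j are array positions, xi, yj are the actual values.
a = np.zeros((len(x), len(y)))
for i, xi in enumerate(x):
    for j, yj in enumerate(y):
        a[i, j] = f(xi, yj)

# Broadcast version: a column of x values against a row of y values.
b = f(x[:, None], y[None, :])
print(np.array_equal(a, b))
```

With this layout, a[1, 4] holds the result for the values x = 2 and y = 8.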

How do I create the environment for my genetic algorithm?

I'm writing a genetic algorithm in python about the optimal series of moves for a virtual organism that will get it the most randomly-placed food in a 2D-grid. It does not have intelligence; it just moves in a pattern, i.e. a circle or square. My code for creating the 2D array for the environment that the organisms reside in is this:
grid = ([])
for i in range(5):
    grid[i]=0
    for j in range(5):
        grid[i][j]=0
(board[4][5] means 4,5 in x, y; and the value of board[4][5] is 0 or 1, depending on
whether or not the space is occupied. Right now the program is really just assigning
a zero-value to each space, indicating no individual is there)
It just says "list assignment index out of range." How can I fix this? By the way, does anyone know of a better way to create the 2D environment for the organisms?
Right now your grid is an empty list, so any index you assign to is out of range. Try this
grid = [[[] for x in xrange(5)] for y in xrange(5)]
in place of your grid. This gives you a 5 by 5 grid (each cell starts as an empty list), and you can now index grid[3][4].
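If each cell should start as 0, as in the question, the same nested comprehension works with 0 in place of the empty list. A small sketch, which also shows a common pitfall with list multiplication (the names size and aliased are illustrative):

```python
size = 5

# Each row is built separately, so rows are independent lists.
grid = [[0 for _ in range(size)] for _ in range(size)]
grid[3][4] = 1            # mark one space as occupied

# Pitfall: multiplying the outer list repeats the SAME row object.
aliased = [[0] * size] * size
aliased[0][0] = 1         # changes column 0 in every row

print(grid[3])
print(aliased[4][0])
```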
A genetic algorithm is likely to require a lot of evaluations of your fitness function, so an efficient implementation might be very beneficial for you. Numpy offers multi-dimensional arrays out of the box:
numpy.zeros((5, 5))
This gives you a 5x5 array filled with zeros. Numpy also offers nice things like counting occurrences of a value, which will be much faster than a pure Python implementation.
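As a sketch of how that could look for the food grid: the number of food items, the seed, and the names n_food and cells are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.zeros((5, 5), dtype=int)

# Pick distinct flat positions and mark them as food (value 1).
n_food = 6
cells = rng.choice(grid.size, size=n_food, replace=False)
grid.flat[cells] = 1

# Counting occurrences of a value is a single vectorized call.
food_count = np.count_nonzero(grid == 1)
print(grid)
print(food_count)
```

Sampling with replace=False guarantees that no two food items land on the same space.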
