Chinese Restaurant Process implementation in Python

I have written some Python code for the CRP problem. The problem itself can be found here:
http://cog.brown.edu/~mj/classes/cg168/slides/ChineseRestaurants.pdf
And to give a short description of it:
Suppose we want to assign people entering a restaurant to a potentially infinite number of tables. If $z_i$ represents the random variable giving the table assigned to the $i$'th person entering the restaurant, and $n_a$ is the number of people already sitting at table $a$, the following should hold:
With probability $p(z_i=a|z_1,...,z_{i-1})=\frac{n_a}{i-1+\alpha}$ for $n_a>0$, the $i$'th person will sit at table $a$, and with probability $p(z_i=a|z_1,...,z_{i-1})=\frac{\alpha}{i-1+\alpha}$ for $n_a=0$, the $i$'th person will sit at a new table.
I am not quite sure if my code is correct, because I am surprised how small the final number of tables is.
I would be happy if somebody could say whether the implementation is correct and, if so, whether there are any possible improvements.
import numpy as np
def CRP(alpha,N):
    """Chinese Restaurant Process with alpha as concentration parameter and N
    the number of samples"""
    #summed[k] holds the cumulative number of people sitting at
    #tables 0..k
    summed=np.ones(1) #first person assigned to the first table
    for i in range(1,N):
        #A loop that assigns the people to tables; i people are already seated
        #randind is a random number from the interval [0,i+alpha)
        randind=(float(i)+alpha)*np.random.uniform(low=0.0, high=1.0)
        #update is the index of the table at which the person should be placed;
        #if randind exceeds the number of seated people, a new table is opened
        update=np.searchsorted(summed,randind,side='left')
        if randind>i:
            summed=np.append(summed,i+1)
        else:
            zerovec=np.zeros(update)
            onevec=np.ones(summed.size-update)
            summed+=np.append(zerovec,onevec)
    #This part converts the summed array to the tables array, which holds the
    #number of persons assigned to each table
    tables=np.zeros(summed.size)
    tables[0]=summed[0]
    for i in range(1,summed.size):
        tables[i]=summed[i]-summed[i-1]
    return tables
a=CRP(0.9999,1000)
print(a)

Suggestion. Forget about the code you have written. Construct declarative tests of the code. By taking that approach, you start with examples for which you know the correct answer. That would have answered Brainiac's question, for example.
Then write your program. You will likely find that if you start approaching problems this way, you may create sub-problems first, for which you can also write tests. Until they all pass, there is no need to rush on to the full problem.
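As a rough illustration of what such declarative tests could look like, here is a minimal sketch. It does not test the code from the question; it uses a small pure-Python sampler (`crp_table_sizes`) and a helper (`expected_tables`), both of which are names made up for this example. The checks rely on two facts about the process: the table sizes must add up to the number of people, and person $i$ opens a new table with probability $\frac{\alpha}{i-1+\alpha}$, so the expected number of tables is the sum of those terms. For $\alpha=1$ and 1000 people that sum is about 7.49, which suggests that a small number of tables is not by itself a sign of a bug.

```python
import random

def crp_table_sizes(alpha, n, rng):
    """Hypothetical reference sampler: returns the list of table sizes."""
    sizes = []
    for seated in range(n):
        # seated people are already in the room; draw from [0, seated + alpha]
        r = rng.uniform(0, seated + alpha)
        if r >= seated:
            # landed in the alpha-sized slice: open a new table
            sizes.append(1)
        else:
            # walk the cumulative counts to find the chosen table
            cumulative = 0
            for k, size in enumerate(sizes):
                cumulative += size
                if r < cumulative:
                    sizes[k] += 1
                    break
    return sizes

def expected_tables(alpha, n):
    """Expected number of occupied tables after n people."""
    return sum(alpha / (alpha + i) for i in range(n))

# Declarative checks with known answers
rng = random.Random(0)
sizes = crp_table_sizes(1.0, 1000, rng)
assert sum(sizes) == 1000             # everybody is seated somewhere
assert all(s > 0 for s in sizes)      # no empty tables are recorded
assert expected_tables(1.0, 1) == 1.0  # one person always means one table

# Statistical check: the average table count should be near the expectation
counts = [len(crp_table_sizes(1.0, 1000, rng)) for _ in range(50)]
print(sum(counts) / len(counts), expected_tables(1.0, 1000))
```

The same assertions can be pointed at any other implementation that returns per-table counts, including the NumPy version in the question.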

Related

Python convert list into split lists

So I have been given the task of using an API to pull student records and learner IDs to put into an in-house application. The JSON formatting is dreadful, and the only way I have managed to split the students individually is on the last value of each record.
Now I am at the next stumbling block: I need to split these student lists into smaller sections, so I implemented a for loop like so:
student = request.text.split('"SENMajorNeedsDetails"')
for students in student:
    r = str(student).split(',')
    print (student[0], student[1])
    print (r[0], r[1])
This works perfectly, except that it puts everything into a single list again, and each student record isn't a set length (some have more values/fields than others).
What I am looking to do is have a list for each student, split on the comma, so student1 would equal [learnerID,personID,name,etc...]
This way, when I want to reference the learnerID I can call student1[0]
It is also very possible that I am going about this the wrong way and should be using some form of list comprehension instead.
The step-by-step process I am aiming towards is:
1. pull data from system - DONE
2. split data into individual students - DONE
3. take learnerID, name, group of each student and add a database entry
I have split step 3 into two stages: the first involves my issue above, and the second is creating the database records.
Below is a shortened example of the list item student[0], followed by student[1]. If more is needed then say.
:null},{"LearnerId":XXXXXX,"PersonId":XXXXXX,"LearnerCode":"XXXX-XXXXXX","UPN":"XXXXXXXXXXX","ULN":"XXXXXXXXXX","Surname":"XXXXX","Forename":"XXXXX","LegalSurname":"XXXXX","LegalForename":"XXXXXX","DateOfBirth":"XX/XX/XXXX 00:00:00","Year":"XX","Course":"KS5","DateOfEntry":"XX/XX/XXXX 00:00:00","Gender":"X","RegGroup":"1XX",],
:null},{"LearnerId":YYYYYYY,"PersonId":YYYYYYYY,"LearnerCode":"XXXX-YYYYYYYY","UPN":"YYYYYYYYYY","ULN":"YYYYYYYYYY","Surname":"YYYYYYYY","Forename":"YYYYYY","LegalSurname":"YYYYYY","LegalForename":"YYYYYYY","DateOfBirth":"XX/XX/XXXX 00:00:00","Year":"XX","Course":"KS5","DateOfEntry":"XX/XX/XXXX 00:00:00","Gender":"X","RegGroup":"1YY",],
Sorry, it doesn't like putting it on separate lines.
EDIT* changed wording at the end and added a redacted student record

Just to clarify, the resolution to my issue was to learn how to parse JSON properly. This was pointed out by @Patrick Haugh, and all credit should go to him for pointing me in the right direction. The second most helpful person was @ArndtJonasson.
The problem was that I was manually trying to do the job of the JSON library, and I am nowhere near that level of competency yet. As stated originally, it was entirely likely that I was going about it in completely the wrong way.
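For anyone landing here with the same problem, this is a minimal sketch of what parsing with the standard json module can look like. The payload below is invented for illustration: it assumes the API returns a JSON array of student objects using the field names from the redacted record, which may not match the real response (the real one may be nested differently).

```python
import json

# Hypothetical payload: assumed to be a JSON array of student objects.
payload = """
[
  {"LearnerId": 101, "PersonId": 9001, "Surname": "Smith",
   "Forename": "Ann", "RegGroup": "12A", "SENMajorNeedsDetails": null},
  {"LearnerId": 102, "PersonId": 9002, "Surname": "Jones",
   "Forename": "Bob", "RegGroup": "12B", "SENMajorNeedsDetails": null}
]
"""

students = json.loads(payload)  # a list of dicts, one per student

# Pick out just the fields needed for the database entry.
# .get() returns None instead of raising if a record lacks the field.
rows = [
    (s["LearnerId"], s["Forename"] + " " + s["Surname"], s.get("RegGroup"))
    for s in students
]

for learner_id, name, group in rows:
    print(learner_id, name, group)
```

Because each student becomes a dict, records with extra or missing fields no longer shift positions the way comma-split lists do. If request in the question is a response object from the requests library (an assumption), request.json() would give the same parsed structure directly.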

Minimum and Maximum query not working properly (Python 3.5)

I wonder if you can help, because I've been looking at this for a good half hour and I'm completely baffled. I think I must be missing something, so I hope you can shed some light on this.
In this area of my program I am coding a query which will search a list of tuples for the salary of each person. Each tuple in the list is a separate record of a person's details, hence I have used two indexes: one for the record, which is looped over, and one for the salary of the employee. What I am aiming for is for the program to ask for a minimum and maximum salary and then print the names of the employees who are in that salary range.
It all seemed to work fine, until I realised that when entering the value '100000' as the maximum, the query would output nothing. Completely baffled, I tried entering '999999', which then worked, and all records were printed. The only thing I can think of is that the program is ignoring the extra digit, but I could not figure out why that would be.
Below is my code for that specific section and the output for a maximum value of 999999 (I would prefer not to paste the whole program, as this is for a coursework project and I want to prevent anyone on the same course potentially copying my work; sorry if this makes my question unclear!):
The maximum salary out of all the records is 55000, which is why it doesn't make sense that a minimum of 0 and a maximum of 100000 does not work, but a maximum of 999999 does!
If any more information is needed to help, please ask! This probably seems unclear but, like I said above, I don't want anyone from the class to plagiarise and my work to be void because of that, so I have tried to ask this without posting all my code here.

Given your use of the print function (instead of the Python 2 print statement), it looks like you're writing Python 3 code. In Python 3, input returns a str. I'm guessing your data is also storing the salaries as str (otherwise the comparison would raise a TypeError). You need to convert both stored values and the result of input to int so it performs numerical comparisons, not ASCIIbetical comparisons.
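To make the difference concrete, here is a small sketch. The records and the helper name names_in_range are made up for illustration; the only thing taken from the question is the assumption that the salary is a string stored at index 2 of each tuple.

```python
# Hypothetical records: (name, department, salary stored as a string)
records = [("Ann", "Sales", "55000"), ("Bob", "IT", "32000")]

# String comparison goes character by character, so '5' > '1' decides it
print("55000" < "100000")            # False
print("55000" < "999999")            # True
print(int("55000") < int("100000"))  # True

def names_in_range(records, minimum, maximum):
    """Return names whose salary lies in [minimum, maximum], compared as ints."""
    low, high = int(minimum), int(maximum)
    return [rec[0] for rec in records if low <= int(rec[2]) <= high]

# The bounds arrive as strings, just as input() would return them
print(names_in_range(records, "0", "100000"))
```

Converting once, right after reading the input, keeps every later comparison numeric.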

When you read from standard input in Python, no matter what input you get, you receive it as a string. That means that your comparison is effectively evaluating as:
if tuplist[x][2] > "0" and tuplist[x][2] < "999999" :
Can you see what the problem is now? Because it's a homework assignment, I don't want to give you the answer straight away.

Adding +1 to specific matrix elements

I'm currently coding an algorithm for the 4-parametric RAINFLOW method. The idea of this method is to eliminate load cycles from a cycle history, which is normally given as a load (for example force) versus time diagram. This method is used very frequently in mechanical engineering to determine the life span of a product/element that is exposed to a certain number of load cycles.
The result of this method is a so-called FROM-TO table or FROM-TO matrix, where the rows represent the FROM values and the columns represent the TO values, as shown in the picture below:
[Figure: example of from-to table/matrix]
This example is not realistic, as you normally get a file with millions of measurement points, which means that some cycles won't occur only once (1) or twice (2) as shown in the table, but may occur thousands of times.
Now to the problem:
I coded the algorithm of the method and, as a result, formed a vector with the FROM values and a vector with the TO values, like this:
vek_from=[]
vek_to=[]
d=len(a)/2
for i in range(int(d)):
    vek_from.append(a[2*i]) # FROM
    vek_to.append(a[2*i+1]) # TO
a is the vector with all values, like a=[from, to, from, to,...]
Now I'm trying to form a matrix out of this, like this:
mat_from_to = np.zeros(shape=(int(d),int(d)))
MAT = np.zeros(shape=(int(d),int(d)))
s=int(d-1)
for i in range(s):
    mat_from_to[vek_from[i]-2, vek_to[i]-2] += 1
The problem is that I don't know how to handle a load cycle that occurs several times (that is, one with the same from-to values): I want to add +1 to that FROM-TO combination every time it happens, but with what I've coded, it only replaces the previous value with 1, so I can never exceed 1...
To put it more briefly: whenever a combination of FROM-TO values determines the position of an element in the matrix, how do I add +1 at that position?
Hopefully I didn't make it too complicated, and someone will be happy to help me with this.
Regards,
Luka
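One possible way to accumulate the counts, sketched under some assumptions: the FROM and TO values are small integers that can be used as indices after subtracting a fixed offset (the question subtracts 2), and the matrix is sized by the number of distinct levels rather than by the number of cycles. The function name from_to_matrix and its parameters are made up for this example. It uses np.add.at, which performs unbuffered in-place addition, so index pairs that repeat are counted every time they occur.

```python
import numpy as np

def from_to_matrix(vek_from, vek_to, size, offset=0):
    """Count how often each (from, to) pair occurs.

    size and offset are assumptions for this sketch: levels are expected
    to lie in the range offset .. offset + size - 1.
    """
    mat = np.zeros((size, size), dtype=int)
    rows = np.asarray(vek_from) - offset
    cols = np.asarray(vek_to) - offset
    # Unbuffered add: a pair that appears k times adds k to its cell
    np.add.at(mat, (rows, cols), 1)
    return mat

# a = [from, to, from, to, ...] as in the question; values are invented
a = [2, 4, 3, 5, 2, 4, 2, 4]
vek_from = a[0::2]   # every even position is a FROM value
vek_to = a[1::2]     # every odd position is a TO value

mat = from_to_matrix(vek_from, vek_to, size=4, offset=2)
print(mat)
```

In this invented data the pair (2, 4) occurs three times, so the cell at row 0, column 2 ends up holding 3.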

Random number based on probability python

Hi, I have done some research and I believe I ended up in the right direction when I found this recipe:
http://code.activestate.com/recipes/117241/
Basically my question is: what is the code in the link doing, line by line? You could potentially ignore everything I wrote below if your explanation helps me understand what the code in the link does to a satisfactory extent.
I BELIEVE that the code at that link generates a random number, BUT the random number is directly related to the probability.
In my own code I am attempting to take a "number" and its probability of appearing, and get an output "number" that will appear according to that probability. I know this is confusing, but if you look at the link above then I hope it will be clear what I am trying to do. My code below is in reference to the link above.
So in my program, these are my global variables:
HIGH= 3
MED= 2
LOW= 1
This is the list I am working with:
n= [(LOW,lowAttackProb),(MED,medAttackProb),(HIGH,highAttackProb)]
#lowAttackProb, medAttackProb, etc. are based on user input; they are just percents converted to decimals that add up to 1 in every case
This is how I implemented the random code as per the link above:
x= random.uniform(0,1)
for alevel,probability in n:
    if x<probability:
        break
    x=x-probability
return alevel
I am unsure exactly what is happening inside the for loop and what x=x-probability is doing.
Let's say that x=0.90
and that in my list, the chance of the second list entry occurring is 0.60. Then, since x<probability is False (I'm not too sure what if x<probability even does), the code moves on to x=x-probability.
I really hope this makes sense. If it does not, please let me know what is unclear and I will try to fix it up. Thank you for any and all help.

This code selects an event while taking the probabilities of the possible events into account. Here is the idea behind it.
There are three events (or levels, as you call them), LOW, MED and HIGH, each with a certain nonzero probability, and all the probabilities sum to exactly 1. Using the standard tools in Python, one can generate a random number between 0 and 1. So how can we "map" them to each other? Let's line up our probabilities (let's call them L, M, and H for brevity) along the number line in the following way:
0__________________L______________L+M_________________________L+M+H ( = 1)
Now, taking our randomly generated number x, we can say that:
If x lies in the interval [0, L) then the first event occurred.
If x lies in the interval [L, L+M) then the second event occurred.
If x lies in the interval [L+M, L+M+H) then the third event occurred.
The code you are asking about simply matches x to one of these intervals and returns the corresponding event (or level). Subtracting each probability from x as the loop goes along has the same effect as comparing the original x against the running totals L, L+M and L+M+H.
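Here is a small sketch of the same loop wrapped in a function, with the random draw made injectable so that the steps can be traced by hand. The function name pick_level, the optional x parameter, and the example probabilities 0.2, 0.6 and 0.2 are all made up for illustration.

```python
import random

LOW, MED, HIGH = 1, 2, 3

def pick_level(levels, x=None):
    """Pick a level from (level, probability) pairs that sum to 1.

    x is normally drawn at random; passing it in lets us trace the loop.
    """
    if x is None:
        x = random.uniform(0, 1)
    for level, probability in levels:
        if x < probability:
            # x fell inside this level's slice of the number line
            return level
        # Not in this slice: shift x so the next slice starts at 0
        x -= probability
    # Guard against floating-point round-off: fall back to the last level
    return level

n = [(LOW, 0.2), (MED, 0.6), (HIGH, 0.2)]

# Trace for x = 0.90:
#   0.90 < 0.2 is False, so x becomes about 0.70
#   0.70 < 0.6 is False, so x becomes about 0.10
#   0.10 < 0.2 is True, so HIGH is returned
print(pick_level(n, 0.90))
```

With these example probabilities, x = 0.90 lies beyond L+M = 0.8, which is why it ends up in the third slice.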

Random selection ideas

I am thinking of giving one or more sets of introductory lectures to introduce people in my department to Python and related scientific tools, as I did last summer with py4science @ UND.
To make the meetings more interesting and attract more attention, I gave away two Python learning materials to lucky members of the audience in the following ways:
1-) Get the names, assign each a number, and pick one at random from the resulting dictionary as the winner.
import random
lucky = {1:'Lucky1',...}
random.choice(list(lucky.keys()))
2-) Similar to the previous one but pop items from the dictionary, thus the last one becomes the luckiest.
import random
lucky = {1:'Lucky1',...}
lucky.pop(random.choice(list(lucky.keys())))
Right now, I am looking for at least one more idea that has randomness inherently and demonstrates a useful language feature, to help me make the lottery at the end of one of the sessions more fun.

Cards are also a source of popular (and familiar!) games of chance.
Perhaps you could show how easy it is to generate, shuffle and sample cards:
#!/usr/bin/env python
import random
import itertools
numname={1:'Ace',11:'Jack',12:'Queen',13:'King'}
suits=['Clubs','Diamonds','Hearts','Spades']
numbers=range(1,14)
cards=['%s-%s'%(numname.get(number,number),suit)
       for number,suit in itertools.product(numbers,suits)]
print(cards)
random.shuffle(cards)
print(cards)
hand=random.sample(cards,5)
print(hand)

One of the cutest uses of random numbers for mid-sized crowds is finding cycles. I will describe the physical method, and then some explorations. The Python code is fairly trivial.
Start with your group of about 100 people with their names on pieces of paper in a bowl. Everyone descends on the bowl and takes a random piece of paper. Each person goes to the person with that name. This leads to groups clumping together in various sizes. Not always what people expect.
For example, if Alice picks Bob, Bob picks Charlie, and Charlie picks Alice, then these three people will end up in their own clump. For some groups, have people join hands with their matches to see everyone being pulled this way and that. Also to see how the matches create chains or clumps.
Now write software to watch the number of clumps. Do the math on clumps, asking, for example, "how often is the biggest clump less than half the people?" As another example, for N students, on average a fraction 1/N of them (that is, one student) will draw their own name.
Do you need code?
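In case it helps, here is a minimal sketch of the clump counting. It assumes that every person draws exactly one name and every name is drawn exactly once, so the draw is a random permutation and each clump is a cycle of that permutation. The function name cycle_lengths and the seeded generator are choices made for this example only.

```python
import random

def cycle_lengths(n, rng=random):
    """Shuffle n names and return the size of each clump (cycle)."""
    drawn = list(range(n))   # drawn[p] is the person whose name p picked
    rng.shuffle(drawn)
    seen = [False] * n
    lengths = []
    for start in range(n):
        if seen[start]:
            continue
        # Follow the chain start -> drawn[start] -> ... until it closes
        length = 0
        person = start
        while not seen[person]:
            seen[person] = True
            person = drawn[person]
            length += 1
        lengths.append(length)
    return lengths

rng = random.Random(0)       # seeded so a demo can be repeated
trials = 1000
big = 0
for _ in range(trials):
    lengths = cycle_lengths(100, rng)
    if max(lengths) > 50:
        big += 1
print("biggest clump holds more than half the people in",
      big / trials, "of the trials")
```

A clump of size 1 is someone who drew their own name, so lengths.count(1) gives the number of such people in one draw.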

Computing Pi is always fun ;-)
import random
def approx_pi( n ):
    # n random (x,y) pairs (as a generator)
    data = ( (random.random(),random.random()) for _ in range(n) )
    return 4.0*sum( 1 for x,y in data if x**2 + y**2 < 1 )/n
print(approx_pi(100000))
