Rosalind: Mendel's first law

Rosalind: Mendel's first law - python

I'm trying to solve the problem at http://rosalind.info/problems/iprb/
Given: Three positive integers k, m, and n, representing a population
containing k+m+n organisms: k individuals are homozygous dominant for
a factor, m are heterozygous, and n are homozygous recessive.
Return: The probability that two randomly selected mating organisms
will produce an individual possessing a dominant allele (and thus
displaying the dominant phenotype). Assume that any two organisms can
mate.
My solution works for the sample, but not for any problems generated. After further research it seems that I should find the probability of choosing any one organism at random, find the probability of choosing the second organism, and then the probability of that pairing producing offspring with a dominant allele.
My question is then: what does my code below find the probability of? Does it find the percentage of offspring with a dominant allele for all possible matings -- so rather than the probability of one random mating, my code is solving for the percentage of offspring with dominant alleles if all pairs were tested?
f = open('rosalind_iprb.txt', 'r')
r = f.read()
s = r.split()
############# k = # homozygotes dominant, m = #heterozygotes, n = # homozygotes recessive
k = float(s[0])
m = float(s[1])
n = float(s[2])
############# Counts for pairing between each group and within groups
k_k = 0
k_m = 0
k_n = 0
m_m = 0
m_n = 0
n_n = 0
##############
if k > 1:
k_k = 1.0 + (k-2) * 2.0
k_m = k * m
k_n = k * n
if m > 1:
m_m = 1.0 + (m-2) * 2.0
m_n = m * n
if n> 1:
n_n = 1.0 + (n-2) * 2.0
#################
dom = k_k + k_m + k_n + 0.75*m_m + 0.5*m_n
total = k_k + k_m + k_n + m_m + m_n + n_n
chance = dom/total
print chance

Looking at your code, I'm having a hard time figuring out what it's supposed to do. I'll work through the problem here.
Let's simplify the wording. There are n1 type 1, n2 type 2, and n3 type 3 items.
How many ways are there to choose a set of size 2 out of all the items? (n1 + n2 + n3) choose 2.
Every pair of items will have item types corresponding to one of the six following unordered multisets: {1,1}, {2,2}, {3,3}, {1,2}, {1,3}, {2,3}
How many multisets of the form {i,i} are there? ni choose 2.
How many multisets of the form {i,j} are there, where i != j? ni * nj.
The probabilities of the six multisets are thus the following:
P({1,1}) = [n1 choose 2] / [(n1 + n2 + n3) choose 2]
P({2,2}) = [n2 choose 2] / [(n1 + n2 + n3) choose 2]
P({3,3}) = [n3 choose 2] / [(n1 + n2 + n3) choose 2]
P({1,2}) = [n1 * n2] / [(n1 + n2 + n3) choose 2]
P({1,3}) = [n1 * n3] / [(n1 + n2 + n3) choose 2]
P({2,3}) = [n2 * n3] / [(n1 + n2 + n3) choose 2]
These sum to 1. Note that [X choose 2] is just [X * (X - 1) / 2] for X > 1 and 0 for X = 0 or 1.
Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype).
To answer this question, you simply need to identify which of the six multisets correspond to this event. Lacking the genetics knowledge to answer that question, I'll leave that to you.
For example, suppose that a dominant allele results if either of the two parents was type 1. Then the events of interest are {1,1}, {1,2}, {1,3} and the probability of the event is P({1,1}) + P({1,2}) + P({1,3}).

I spend some time in this question, so, to clarify in python:
lst = ['2', '2', '2']
k, m, n = map(float, lst)
t = sum(map(float, lst))
# organize a list with allele one * allele two (possibles) * dominant probability
# multiplications by one were ignored
# remember to substract the haplotype from the total when they're the same for the second haplotype choosed
couples = [
k*(k-1), # AA x AA
k*m, # AA x Aa
k*n, # AA x aa
m*k, # Aa x AA
m*(m-1)*0.75, # Aa x Aa
m*n*0.5, # Aa x aa
n*k, # aa x AA
n*m*0.5, # aa x Aa
n*(n-1)*0 # aa x aa
]
# (t-1) indicate that the first haplotype was select
print(round(sum(couples)/t/(t-1), 5))

If you are interested, I just found a solution and put it in C#.
public double mendel(double k, double m, double n)
{
double prob;
prob = ((k*k - k) + 2*(k*m) + 2*(k*n) + (.75*(m*m - m)) + 2*(.5*m*n))/((k + m + n)*(k + m + n -1));
return prob;
}
Our parameters are k (dominant), m (hetero), & n (recessive).
First I found the probability for each possible breeding pair selection in terms of percentage of the population. So, a first round choice for k would look like k/(k+m+n), and a second round choice of k after a first round choice of k would look like (k-1)/(k+m+n). Then multiply these two to get the outcome. Since there were three identified populations, there were nine possible outcomes.
Then I multiplied each outcome by it's dominance probability - 100% for anything with k, 75% for m&m, 50% for m&n, n&m, and 0% for n&n. Now add the outcomes together, and you have your solution.
http://rosalind.info/problems/iprb/

Here is the code I did in python:
We don't want the offspring to be completely recessive, so we should make the probability tree and look at the cases and the probabilities of the cases that event might happen.
Then the probability that we want is 1 - p_reccesive. More explanation is provided in the comment section of the following code.
"""
Let d: dominant, h: hetero, r: recessive
Let a = k+m+n
Let X = the r.v. associated with the first person randomly selected
Let Y = the r.v. associated with the second person randomly selected without replacement
Then:
k = f_d => p(X=d) = k/a => p(Y=d| X=d) = (k-1)/(a-1) ,
p(Y=h| X=d) = (m)/(a-1) ,
p(Y=r| X=d) = (n)/(a-1)
m = f_h => p(X=h) = m/a => p(Y=d| X=h) = (k)/(a-1) ,
p(Y=h| X=h) = (m-1)/(a-1)
p(Y=r| X=h) = (n)/(a-1)
n = f_r => p(X=r) = n/a => p(Y=d| X=r) = (k)/(a-1) ,
p(Y=h| X=r) = (m)/(a-1) ,
p(Y=r| X=r) = (n-1)/(a-1)
Now the joint would be:
| offspring possibilites given X and Y choice
-------------------------------------------------------------------------
X Y | P(X,Y) | d(dominant) h(hetero) r(recessive)
-------------------------------------------------------------------------
d d k/a*(k-1)/(a-1) | 1 0 0
d h k/a*(m)/(a-1) | 1/2 1/2 0
d r k/a*(n)/(a-1) | 0 1 0
|
h d m/a*(k)/(a-1) | 1/2 1/2 0
h h m/a*(m-1)/(a-1) | 1/4 1/2 1/4
h r m/a*(n)/(a-1) | 0 1/2 1/2
|
r d n/a*(k)/(a-1) | 0 0 0
r h n/a*(m)/(a-1) | 0 1/2 1/2
r r n/a*(n-1)/(a-1) | 0 0 1
Here what we don't want is the element in the very last column where the offspring is completely recessive.
so P = 1 - those situations as follow
"""
path = 'rosalind_iprb.txt'
with open(path, 'r') as file:
lines = file.readlines()
k, m, n = [int(i) for i in lines[0].split(' ')]
a = k + m + n
p_recessive = (1/4*m*(m-1) + 1/2*m*n + 1/2*m*n + n*(n-1))/(a*(a-1))
p_wanted = 1 - p_recessive
p_wanted = round(p_wanted, 5)
print(p_wanted)

Related

How to slice array quickly based on conditions?

I have a giant nested for loop....10 in all, but for illustration here, i am including 6. I am doing a summation (over multiple indices; the incides are not independent!). The index in any inner for loop depends on the index of the outer loop (except for one instance). The inner-most loop contains an operation where i slice an array (named 'w') based on 8 different conditions all combined using '&' and '|'. There is also this 'HB' function that takes as an argument this sliced array (named 'wrange'), performs some operations on it and returns an array of the same size.
The timescale for this slicing and the 'HB' function to execute is 300-400 microseconds and 100 microseconds respectively. I need to bring it down drastically. To nanoseconds.!!
Tried using dictionary instead of array (where i am slicing). It is much slower. Tried storing the sliced array for all possible values. That is a very huge computation in its own right since there are many many possible combinations of the conditions (these conditions depend indirectly on the indices of the for loop)
s goes from 1 to 49
t goes from -s to s
and there are 641 combinations of l,n
Here, i have posted one value of s,t and an l,n combination for illustration.
s = 7
t = -7
l = 72
n = 12
Nl = Dictnorm[n,l]
Gamma_l = Dictfwhm[n,l]
Dictc1 = {}
Dictc2 = {}
Dictwrange = {}
DictH = {}
DictG = {}
product = []
startm = max(-l-t,-l)
endm = min(l-t,l)+1
sum5 = 0
for sp in range(s-2,s+3): #s'
sum4 = 0
for tp in range(-sp,-sp+1): #t'
#print(tp)
sum3 = 0
integral = 1
for lp in range(l-2,l+3): #l'
sum2 = 0
if (n,lp) in Dictknl2.keys():
N1 = Dictnorm[n,lp]
Gamma_1 = Dictfwhm[n,lp]
for lpp in range(l-2,l+3): #l"
sum1 = 0
if ((sp+lpp-lp)%2 == 1 and sp>=abs(lpp-lp) and
lp>=abs(sp-lpp) and lpp>=abs(sp-lp) and
(n,lpp) in Dictknl2.keys()):
F = f(lpp,lp,sp)
N2 = Dictnorm[n,lpp]
Gamma_2 = Dictfwhm[n,lpp]
for m in range(startm, endm): #m
sum0 = 0
L1 = LKD(n,l,m,l,m)
L2 = LKD(n,l,m+t,l,m+t)
for mp in range(max(m+t-tp-5,m-5),
min(m+5,m+t-tp+5)+1): #m'
if (abs(mp)<=lp and abs(mp)<=lpp and
abs(mp+tp)<=lp and abs(mp+tp)<=lpp
and LKD(n,l,m,lp,mp)!=0
and LKD(n,l,m+t,lpp,mp+tp)!=0):
c3 = Dictomega[n,lp,mp+tp]
c4 = Dictomega[n,lpp,mp]
wrange = np.unique(np.concatenate
((Dictwrange[m],
w[((w>=(c3-Gamma_1))&
((c3+Gamma_1)>=w))|
((w>=(c4-Gamma_2))&
((c4+Gamma_2)>=w))])))
factor = (sum(
HB(Dictc1[n,l,m+t],
Dictc2[n,l,m],Nl,
Nl,Gamma_l,
Gamma_l,wrange,
Sigma).conjugate()
*HB(c3,c4,N1,N2,Gamma_1,
Gamma_2,wrange,0)*L1*L2)
*LKD(n,l,m,lp,mp)
*LKD(n,l,m+t,lpp,mp+tp) *DictG[m]
*gamma(lpp,sp,lp,tp,mp)
*F)
sum0 = sum0 + factor #sum over m'
sum1 = sum1 + sum0 #sum over m
sum2 = sum2 + sum1 #sum over l"
sum3 = sum3 + sum2 #sum over l'
sum4 = sum4 + sum3*integral #sum over t'
sum5 = sum5 + sum4 #sum over s'
z = (1/(sum(product)))*sum5
print(z.real,z.imag,l,n)
TL;DR
def HB(a,...f,array1): #########timesucker
perform_some_operations_on_array1_using_a_b_c_d
return operated_on_array1
for i in ():
for j in ():
...
...
for o in ():
array1 = w[w>some_function1(i,j,..k) &
w<some_function2(i,j,..k) |.....] #########timesucker
factor = HB(a,....f,array1) * HB(g,...k,array1) *
alpha*beta*gamma....
It takes about 30 seconds to run this whole section once. I need to bring it down to as low as possible. 1 second is the minimum target

Python Gtin 8 Code not totaling

Hi I have been experimenting for some time to try and total 7 variables at once. I am trying to calculate the 8th number for GTIN 8 codes. I have tried many things and so far I am using float. I Don't know what it does but people say use it. I need to times the 1,3,5,7 number by 3 and 2,4,6 number by 1. Then find the total of all of them added together. I have looked everywhere and I cant find anything. Anything will help. Thanks Ben
code = input ("enter 7 digit code? ")
sum1 = 3 * (code[0] + ',')
sum2 = code[1] + ','
sum3 = 3 * (code[2] + ',')
sum4 = code[3] + ','
sum5 = 3 * (code[4] + ',')
sum6 = code[5] + ','
sum7 = 3 * (code[6] + ',')
checksum_value = sum1 + sum2 + sum3+ sum4 + sum5+ sum6 + sum7
b = str(checksum_value)
print(b)

Quick solution:
x = "1234567"
checksum_value = sum(int(v) * 3 if i in (0,2,4,6) else int(v) for (i, v) in enumerate(x[:7]))
# (1*3) + 2 + (3*3) + 4 + (5*3) + 6 + (7*3)
# ==
# 3 + 2 + 9 + 4 + 15 + 6 + 21
# ==
# sum(int(v) * 3 if i in (0,2,4,6) else int(v) for (i, v) in enumerate(x[:7]))
Explanation:
# Sum the contained items
sum(
# multiply by three if the index is 0,2,4 or 6
int(v) * 3 if i in (0,2,4,6) else int(v)
# grab our index `i` and value `v` from `enumerate()`
for (i, v) in
# Provide a list of (index, value) from the iterable
enumerate(
# use the first 7 elements
x[:7]
)
)

`enter code here`code = input ("enter 7 digit code? ")
sum1 = 3 * (code[0] + ',')
sum2 = code[1] + ','
sum3 = 3 * (code[2] + ',')
sum4 = code[3] + ','
sum5 = 3 * (code[4] + ',')
sum6 = code[5] + ','
sum7 = 3 * (code[6] + ',')
checksum_value = sum1 + sum2 + sum3+ sum4 + sum5+ sum6 + sum7
b = str(checksum_value)
print(b)

GS1 codes come in different lengths, ranging from GTIN-8 (8 digits) to SSCC (2 digit application ID + 18 digits). Here's a simple, general Python formula that works for any length GS1 identifier:
cd = lambda x: -sum(int(v) * [3,1][i%2] for i, v in enumerate(str(x)[::-1])) % 10
Explanation:
Convert input to string, so input can be numeric or string - just a convenience factor.
Reverse the string - simple way to align the 3x/1x pattern with variable-length input.
The weighting factor is selected based on odd and even input character position by calculating i mod 2. The last character in the input string (i=0 after the string has been reversed) gets 3x.
Calculate the negative weighted sum mod 10. Equivalent to the (10 - (sum mod 10)) mod 10 approach you'd get if you follow the GS1 manual calculation outline exactly, but that's ugly.
Test Cases
## GTIN-8
>>> cd(1234567)
0
>>> cd(9505000)
3
## GTIN-12
>>> cd(71941050001)
6
>>> cd('05042833241')
2
## GTIN-13
>>> cd(900223631103)
6
>>> cd(501234567890)
0
## GTIN-14
>>> cd(1038447886180)
4
>>> cd(1001234512345)
7
## SSCC (20 digits incl. application identifier)
>>> cd('0000718908562723189')
6
>>> cd('0037612345000001009')
1

Rosalind "Mendel's First Law" IPRB

As preparation for an upcoming bioinformatics course, I am doing some assignments from rosalind.info. I am currently stuck in the assignment "Mendel's First Law".
I think I could brute force myself through this, but that somehow my thinking must be too convoluted. My approach would be this:
Build a tree of probabilities which has three levels. There are two creatures that mate, creature A and creature B. First level is, what is the probability for picking as creature A homozygous dominant (k), heterozygous (m) or homozygous recessive (n). It seems that for example for homozygous dominant, since there are a total of (k+m+n) creatures and k of them are homozygous dominant, the probability is k/(k+m+n).
Then in this tree, under each of these would come the probability of creature B being k / m / n given that we know what creature A got picked as. For example if creature A was picked to be heterozygous (m), then the probability that creature B would also be heterozygous is (m-1)/(k+m+n-1) because there is now one less heterozygous creature left.
This would give the two levels of probabilities, and would involve a lot of code just to get this far, as I would literally be building a tree structure and for each branch have manually written code for that part.
Now after choosing creatures A and B, each of them has two chromosomes. One of these chromosomes can randomly be picked. So for A chromosome 1 or 2 can be picked and same for B. So there are 4 different options: pick 1 of A, 1 of B. Pick 2 of A, 1 of B. Pick 1 of A, 2 of B. Pick 2 of A, 2 of B. The probability of each of these would be 1/4. So finally this tree would have these leaf probabilities.
Then from there somehow by magic I would add up all of these probabilities to see what is the probability that two organisms would produce a creature with a dominant allele.
I doubt that this assignment was designed to take hours to solve. What am I thinking too hard?
Update:
Solved this in the most ridiculous brute-force way possible. Just ran thousands of simulated matings and figured out the portion that ended up having a dominant allele, until there was enough precision to pass the assignment.
import random
k = 26
m = 18
n = 25
trials = 0
dominants = 0
while True:
s = ['AA'] * k + ['Aa'] * m + ['aa'] * n
first = random.choice(s)
s.remove(first)
second = random.choice(s)
has_dominant_allele = 'A' in [random.choice(first), random.choice(second)]
trials += 1
if has_dominant_allele:
dominants += 1
print "%.5f" % (dominants / float(trials))

Species with dominant alleles are either AA or Aa.
Your total ppopulation (k + n + m consists of k (hom) homozygous dominant organisms with AA, m (het) heterozygous dominant organisms with Aa and n (rec) homozygous recessive organisms with aa. Each of these can mate with any other.
The probability for organisms with the dominant allele is:
P_dom = n_dominant/n_total or 1 - n_recessive/n_total
Doing the Punnett squares for each of these combinations is not a bad idea:
hom + het
| A | a
-----------
A | AA | Aa
a | Aa | aa
het + rec
| a | a
-----------
A | Aa | Aa
a | aa | aa
Apparently, mating of of two organisms results in four possible children. hom + het yields 1 of 4 organisms with the recessive allele, het + rec yields 2 of 4 organisms with the recessive allele.
You might want to do that for the other combinations as well.
Since we're not just mating the organisms one on one, but throw together a whole k + m + n bunch, the total number of offspring and the number of 'children' with a particular allele would be nice to know.
If you don't mind a bit of Python, comb from scipy.misc might be helpful here. in the calculation, don't forget (a) that you get 4 children from each combination and (b) that you need a factor (from the Punnett squares) to determine the recessive (or dominant) offspring from the combinations.
Update
# total population
pop_total = 4 * comb(hom + het + rec, 2)
# use PUNNETT squares!
# dominant organisms
dom_total = 4*comb(hom,2) + 4*hom*het + 4*hom*rec + 3*comb(het,2) + 2*het*rec
# probability for dominant organisms
phom = dom_total/pop_total
print phom
# probability for dominant organisms +
# probability for recessive organisms should be 1
# let's check that:
rec_total = 4 * comb(rec, 2) + 2*rec*het + comb(het, 2)
prec = totalrec/totalpop
print 1 - prec

This is more a probability/counting question than coding. It's easier to calculate the probability of an offspring having only recessive traits first. Let me know if you have any trouble understanding anything. I ran the following code and my output passed the rosalind grader.
def mendel(x, y, z):
#calculate the probability of recessive traits only
total = x+y+z
twoRecess = (z/total)*((z-1)/(total-1))
twoHetero = (y/total)*((y-1)/(total-1))
heteroRecess = (z/total)*(y/(total-1)) + (y/total)*(z/(total-1))
recessProb = twoRecess + twoHetero*1/4 + heteroRecess*1/2
print(1-recessProb) # take the complement
#mendel(2, 2, 2)
with open ("rosalind_iprb.txt", "r") as file:
line =file.readline().split()
x, y, z= [int(n) for n in line]
print(x, y, z)
file.close()
print(mendel(x, y, z))

Klaus's solution has most of it correct; however, the error occurs when calculating the number of combinations that have at least one dominant allele. This part is incorrect, because while there are 4 possibilities when combining 2 alleles to form an offspring, only one possibility is actually executed. Therefore, Klaus's solution calculates a percentage that is markedly higher than it should be.
The correct way to calculate the number of combos of organisms with at least one dominant allele is the following:
# k = number of homozygous dominant organisms
# n = number of heterozygous organisms
# m = number of homozygous recessive organisms
dom_total = comb(k, 2) + k*m + k*n + .5*m*n + .75*comb(m, 2)
# Instead of:
# 4*comb(k,2) + 4*k*n + 4*k*m + 3*comb(n,2) + 2*n*m
The above code segment works for calculating the total number of dominant combos because it multiplies each part by the percentage (100% being 1) that it will produce a dominant offspring. You can think of each part as being the number of punnet squares for combos of each type (k&k, k&m, k&n, m&n, m&m).
So the entire correct code segment would look like this:
# Import comb (combination operation) from the scipy library
from scipy.special import comb
def calculateProbability(k, m, n):
# Calculate total number of organisms in the population:
totalPop = k + m + n
# Calculate the number of combos that could be made (valid or not):
totalCombos = comb(totalPop, 2)
# Calculate the number of combos that have a dominant allele therefore are valid:
validCombos = comb(k, 2) + k*m + k*n + .5*m*n + .75*comb(m, 2)
probability = validCombos/totalCombos
return probability
# Example Call:
calculateProbability(2, 2, 2)
# Example Output: 0.783333333333

You dont need to run thousands of simulations in while loop. You can run one simulation, and calculate probability from it results.
from itertools import product
k = 2 # AA homozygous dominant
m = 2 # Aa heterozygous
n = 2 # aa homozygous recessive
population = (['AA'] * k) + (['Aa'] * m) + (['aa'] * n)
all_children = []
for parent1 in population:
# remove selected parent from population.
chosen = population[:]
chosen.remove(parent1)
for parent2 in chosen:
# get all possible children from 2 parents. Punnet square
children = product(parent1, parent2)
all_children.extend([''.join(c) for c in children])
dominants = filter(lambda c: 'A' in c, all_children)
# float for python2
print float(len(list(dominants))) / len(all_children)
# 0.7833333

Here I am adding my answer to explain it more clearly:
We don't want the offspring to be completely recessive, so we should make the probability tree and look at the cases and the probabilities of the cases that event might happen.
Then the probability that we want is 1 - p_reccesive. More explanation is provided in the comment section of the following code.
"""
Let d: dominant, h: hetero, r: recessive
Let a = k+m+n
Let X = the r.v. associated with the first person randomly selected
Let Y = the r.v. associated with the second person randomly selected without replacement
Then:
k = f_d => p(X=d) = k/a => p(Y=d| X=d) = (k-1)/(a-1) ,
p(Y=h| X=d) = (m)/(a-1) ,
p(Y=r| X=d) = (n)/(a-1)
m = f_h => p(X=h) = m/a => p(Y=d| X=h) = (k)/(a-1) ,
p(Y=h| X=h) = (m-1)/(a-1)
p(Y=r| X=h) = (n)/(a-1)
n = f_r => p(X=r) = n/a => p(Y=d| X=r) = (k)/(a-1) ,
p(Y=h| X=r) = (m)/(a-1) ,
p(Y=r| X=r) = (n-1)/(a-1)
Now the joint would be:
| offspring possibilites given X and Y choice
-------------------------------------------------------------------------
X Y | P(X,Y) | d(dominant) h(hetero) r(recessive)
-------------------------------------------------------------------------
d d k/a*(k-1)/(a-1) | 1 0 0
d h k/a*(m)/(a-1) | 1/2 1/2 0
d r k/a*(n)/(a-1) | 0 1 0
|
h d m/a*(k)/(a-1) | 1/2 1/2 0
h h m/a*(m-1)/(a-1) | 1/4 1/2 1/4
h r m/a*(n)/(a-1) | 0 1/2 1/2
|
r d n/a*(k)/(a-1) | 0 0 0
r h n/a*(m)/(a-1) | 0 1/2 1/2
r r n/a*(n-1)/(a-1) | 0 0 1
Here what we don't want is the element in the very last column where the offspring is completely recessive.
so P = 1 - those situations as follow
"""
path = 'rosalind_iprb.txt'
with open(path, 'r') as file:
lines = file.readlines()
k, m, n = [int(i) for i in lines[0].split(' ')]
a = k + m + n
p_recessive = (1/4*m*(m-1) + 1/2*m*n + 1/2*m*n + n*(n-1))/(a*(a-1))
p_wanted = 1 - p_recessive
p_wanted = round(p_wanted, 5)
print(p_wanted)

I just found the formula for the answer. You have 8 possible mating interactions that can yield a dominant offspring:
DDxDD, DDxDd, DdxDD, DdxDd, DDxdd, ddxDD, Ddxdd, ddxDd
With the respective probabilities of producing dominant offspring of:
1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 0.5, 0.5
Initially it seemed odd to me that DDxdd and ddxDD were two separate mating events, but if you think about it they are slightly different conceptually. The probability of DDxdd is k/(k+m+n) * n/((k-1)+m+n) and the probability of ddxDD is n/(k+m+n) * k/(k+m+(n-1)). Mathematically these are identical, but speaking from a probability stand point these are two separate events. So your total probability is the sum of the probabilities of each of these different mating events multiplied by the probability of that mating event producing a dominant offspring. I won't simplify it here step by step but that gives you the code:
total_probability = ((k ** 2 - k) + (2 * k * m) + (3 / 4 * (m ** 2 - m)) + (2* k * n) + (m * n)) / (total_pop ** 2 - total_pop)
All you need to do is plug in your values of k, m, and n and you'll get the probability they ask for.

I doubt that this assignment was designed to take hours to solve. What am I thinking too hard?
I also had the same question. After reading the whole thread, I came up with the code.
I hope the code itself will explain the probability calculation:
def get_prob_of_dominant(k, m, n):
# A - dominant factor
# a - recessive factor
# k - amount of organisms with AA factors (homozygous dominant)
# m - amount of organisms with Aa factors (heterozygous)
# n - amount of organisms with aa factors (homozygous recessive)
events = ['AA+Aa', 'AA+aa', 'Aa+aa', 'AA+AA', 'Aa+Aa', 'aa+aa']
# get the probability of dominant traits (set up Punnett square)
punnett_probabilities = {
'AA+Aa': 1,
'AA+aa': 1,
'Aa+aa': 1 / 2,
'AA+AA': 1,
'Aa+Aa': 3 / 4,
'aa+aa': 0,
}
event_probabilities = {}
totals = k + m + n
# Event: AA+Aa -> P(X=k, Y=m) + P(X=m, Y=k):
P_km = k / totals * m / (totals - 1)
P_mk = m / totals * k / (totals - 1)
event_probabilities['AA+Aa'] = P_km + P_mk
# Event: AA+aa -> P(X=k, Y=n) + P(X=n, Y=k):
P_kn = k / totals * n / (totals - 1)
P_nk = n / totals * k / (totals - 1)
event_probabilities['AA+aa'] = P_kn + P_nk
# Event: Aa+aa -> P(X=m, Y=n) +P(X=n, Y=m):
P_mn = m / totals * n / (totals - 1)
P_nm = n / totals * m / (totals - 1)
event_probabilities['Aa+aa'] = P_mn + P_nm
# Event: AA+AA -> P(X=k, Y=k):
P_kk = k / totals * (k - 1) / (totals - 1)
event_probabilities['AA+AA'] = P_kk
# Event: Aa+Aa -> P(X=m, Y=m):
P_mm = m / totals * (m - 1) / (totals - 1)
event_probabilities['Aa+Aa'] = P_mm
# Event: aa+aa -> P(X=n, Y=n) + P(X=n, Y=n) = 0 (will be * 0, so just don't use it)
event_probabilities['aa+aa'] = 0
# Total probability is the sum of (prob of dominant factor * prob of the event)
total_probability = 0
for event in events:
total_probability += punnett_probabilities[event] * event_probabilities[event]
return round(total_probability, 5)

Approximate periods of strings - port Python code to F#

Given two strings u and v we can compute the edit distance using the popular Levenshtein algorithm. Using a method introduced in [1] by Sim et al. I was able to compute k-approximate periods of strings in Python with the following code
def wagnerFischerTable(a, b):
D = [[0]]
[D.append([i]) for i, s in enumerate(a, 1)]
[D[0].append(j) for j, t in enumerate(b, 1)]
for j, s in enumerate(b, 1):
for i, t in enumerate(a, 1):
if s == t:
D[i].append(D[i-1][j-1])
else:
D[i].append(
min(
D[i-1][j] + 1,
D[i][j-1] + 1,
D[i-1][j-1] +1
)
)
return D
def simEtAlTables(s, p):
D = []
for i in xrange(len(s)):
D.append(wagnerFischerTable(p, s[i:]))
return D
def approx(s, p):
D = simEtAlTables(s, p)
t = [0]
for i in xrange(1, len(s)+1):
cmin = 9000
for h in xrange(0, i):
cmin = min(
cmin,
max(t[h], D[h][-1][i-h])
)
t.append(cmin)
return t[len(s)]
I wanted to port this to F# however I wasn't successful yet and I am looking forward to get some feedback what might be wrong.
let inline min3 x y z =
min (min x y) z
let wagnerFischerTable (u: string) (v: string) =
let m = u.Length
let n = v.Length
let d = Array2D.create (m + 1) (n + 1) 0
for i = 0 to m do d.[i, 0] <- i
for j = 0 to n do d.[0, j] <- j
for j = 1 to n do
for i = 1 to m do
if u.[i-1] = v.[j-1] then
d.[i, j] <- d.[i-1, j-1]
else
d.[i, j] <-
min3
(d.[i-1, j ] + 1) // a deletion
(d.[i , j-1] + 1) // an insertion
(d.[i-1, j-1] + 1) // a substitution
d
let simEtAlTables (u: string) (v: string) =
let rec tabulate n lst =
if n <> u.Length then
tabulate (n+1) (lst # [wagnerFischerTable (u.Substring(n)) v])
else
lst
tabulate 0 []
let approx (u: string) (v: string) =
let tables = simEtAlTables u v
let rec kApprox i (ks: int list) =
if i = u.Length + 1 then
ks
else
let mutable curMin = 9000
for h = 0 to i-1 do
curMin <- min curMin (max (ks.Item h) ((tables.Item h).[i-h, v.Length - 1]))
kApprox (i+1) (ks # [curMin])
List.head (List.rev (kApprox 1 [0]))
The reason why it "doesn't work" is just that I am getting wrong values. The Python code passes all test cases while the F# code fails every test. I presume that I have errors in the functions simEtAlTables and/or approx. Probably something with the indices, especially accessing the three dimensional list of table in approx.
So here are three test cases which should cover different results:
Test 1: approx "abcdabcabb" "abc" -> 1
Test 2: approx "abababababab" "ab" -> 0
Test 3: approx "abcdefghijklmn" "xyz" -> 3
[1] http://www.lirmm.fr/~rivals/ALGOSEQ/DOC/SimApprPeriodsTCS262.pdf

This isn't functional in the least (neither is your Python solution), but here's a more direct translation to F#. Maybe you can use it as a starting point and make it more functional from there (although I'll hazard a guess it won't improve performance).
let wagnerFischerTable (a: string) (b: string) =
let d = ResizeArray([ResizeArray([0])])
for i = 1 to a.Length do d.Add(ResizeArray([i]))
for j = 1 to b.Length do d.[0].Add(j)
for j = 1 to b.Length do
for i = 1 to a.Length do
let s, t = b.[j-1], a.[i-1]
if s = t then
d.[i].Add(d.[i-1].[j-1])
else
d.[i].Add(
Seq.min [
d.[i-1].[j] + 1
d.[i].[j-1] + 1
d.[i-1].[j-1] + 1
])
d
let simEtAlTables (s: string) (p: string) =
let d = ResizeArray()
for i = 0 to s.Length - 1 do
d.Add(wagnerFischerTable p s.[i..])
d
let approx (s: string) (p: string) =
let d = simEtAlTables s p
let t = ResizeArray([0])
for i = 1 to s.Length do
let mutable cmin = 9000
for h = 0 to i - 1 do
let dh = d.[h]
cmin <- min cmin (max t.[h] dh.[dh.Count-1].[i-h])
t.Add(cmin)
t.[s.Length]

This code may help:
let levenshtein word1 word2 =
let preprocess = fun (str : string) -> str.ToLower().ToCharArray()
let chars1, chars2 = preprocess word1, preprocess word2
let m, n = chars1.Length, chars2.Length
let table : int[,] = Array2D.zeroCreate (m + 1) (n + 1)
for i in 0..m do
for j in 0..n do
match i, j with
| i, 0 -> table.[i, j] <- i
| 0, j -> table.[i, j] <- j
| _, _ ->
let delete = table.[i-1, j] + 1
let insert = table.[i, j-1] + 1
//cost of substitution is 2
let substitute =
if chars1.[i - 1] = chars2.[j - 1]
then table.[i-1, j-1] //same character
else table.[i-1, j-1] + 2
table.[i, j] <- List.min [delete; insert; substitute]
table.[m, n], table //return tuple of the table and distance
//test
levenshtein "intention" "execution" //|> ignore
You might also want to check this blog posting from Rick Minerich.

Trying to create a heatmap for a planet wars bot which shows which army has the most influence

I'm trying to create a heatmap for a planetwars bot which indicates which influence a planet is under.
The initial map looks like: http://imgur.com/a/rPVnl#0
Ideally the Red planet should have a value of -1, the Blue planet should have a value of 1, and the planet marked 1 should have a value of 0. (Or 0 to 1, mean of 0.5 would work)
My initial analysis code is below, but the results it outputs are between 0.13 and 7.23.
for p in gameinfo.planets: #gameinfo.planets returns {pid:planet_object}
planet = gameinfo.planets[p]
own_value = 1
for q in gameinfo.my_planets.values():
if q != planet:
q_value = q.num_ships / planet.distance_to(q)
own_value = own_value + q_value
enemy_value = 1
for q in gameinfo.enemy_planets.values():
if q != planet:
q_value = q.num_ships / planet.distance_to(q)
enemy_value = enemy_value + q_value
self.heatmap[p] = own_value/enemy_value
I've also tried to add some code to curb the range from 0 to 1
highest = self.heatmap.keys()[0]
lowest = self.heatmap.keys()[0]
for p in gameinfo.planets.keys():
if self.heatmap[p] > highest:
highest = self.heatmap[p]
elif self.heatmap[p] < lowest:
lowest = self.heatmap[p]
map_range = highest-lowest
for p in gameinfo.planets.keys():
self.heatmap[p] = self.heatmap[p]/map_range
self.heatmap_mean = sum(self.heatmap.values(), 0.0) / len(self.heatmap)
The values ended up between 0 and 1, but the mean was 0.245? (Also the values actually ranged from 0.019 to 1.019).

I've solved my problem, this is what the solution looks like.
#HEATMAP ANALYSIS
for p in gameinfo.planets:
ave_self_value = 0
for q in gameinfo.my_planets:
if q != p:
ave_self_value = ave_self_value + (self.planet_distances[p][q] * gameinfo.planets[q].num_ships / self.own_strength)
ave_enemy_value = 0
for q in gameinfo.enemy_planets:
if q != p:
ave_enemy_value = ave_enemy_value + (self.planet_distances[p][q] * gameinfo.planets[q].num_ships / self.enemy_strength)
self.heatmap[p] = ave_enemy_value - ave_self_value
hmin, hmax = min(self.heatmap.values()), max(self.heatmap.values())
for h in self.heatmap.keys():
self.heatmap[h] = 2 * (self.heatmap[h] - hmin) / (hmax - hmin) - 1
self.heatmap_mean = sum(self.heatmap.values(), 0.0) / len(self.heatmap)
#END HEATMAP ANALYSIS

for p in foo:
...
...
for q in bar:
...
if q != p:
q_value = some_value / another_value
own_value = own_value + q_value
Apologies for the gross simplification. Say foo is [1, 2, 3, 4, 5] and bar is [1, 5].
First time through, p is 1. q takes 1, so q==p. Next, q takes 5, now q!=p, own_value accumulates the q_value which I presume is some positive number less than one.
But second time through, p is 2. q takes 1, so q!=p, so own_value goes up by some fraction of one. Then q takes 5, so q!=p still, so own_value goes up by that same fraction again. This is where the problem lies: (some_value / another_value) + (some_value / another_value) breaks the -1 to 1 scale. You get 7.23 sometimes because that's how many times q did not equal p.
There is nothing in the
for x in foo:
for y in bar:
construction that cares about normalising for x in foo - just for q in bar.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Rosalind: Mendel's first law - python

Related

How to slice array quickly based on conditions?

Python Gtin 8 Code not totaling

Rosalind "Mendel's First Law" IPRB

Approximate periods of strings - port Python code to F#

Trying to create a heatmap for a planet wars bot which shows which army has the most influence

Categories

Resources