Approximate periods of strings - port Python code to F# - python

Given two strings u and v, we can compute their edit distance using the well-known Levenshtein algorithm. Using a method introduced by Sim et al. in [1], I was able to compute k-approximate periods of strings in Python with the following code:
def wagnerFischerTable(a, b):
    D = [[0]]
    [D.append([i]) for i, s in enumerate(a, 1)]
    [D[0].append(j) for j, t in enumerate(b, 1)]
    for j, s in enumerate(b, 1):
        for i, t in enumerate(a, 1):
            if s == t:
                D[i].append(D[i-1][j-1])
            else:
                D[i].append(
                    min(
                        D[i-1][j] + 1,
                        D[i][j-1] + 1,
                        D[i-1][j-1] +1
                    )
                )
    return D

def simEtAlTables(s, p):
    D = []
    for i in xrange(len(s)):
        D.append(wagnerFischerTable(p, s[i:]))
    return D

def approx(s, p):
    D = simEtAlTables(s, p)
    t = [0]
    for i in xrange(1, len(s)+1):
        cmin = 9000
        for h in xrange(0, i):
            cmin = min(
                cmin,
                max(t[h], D[h][-1][i-h])
            )
        t.append(cmin)
    return t[len(s)]
I wanted to port this to F#, but I have not been successful so far, and I would appreciate feedback on what might be wrong.
let inline min3 x y z =
    min (min x y) z

let wagnerFischerTable (u: string) (v: string) =
    let m = u.Length
    let n = v.Length
    let d = Array2D.create (m + 1) (n + 1) 0
    for i = 0 to m do d.[i, 0] <- i
    for j = 0 to n do d.[0, j] <- j
    for j = 1 to n do
        for i = 1 to m do
            if u.[i-1] = v.[j-1] then
                d.[i, j] <- d.[i-1, j-1]
            else
                d.[i, j] <-
                    min3
                        (d.[i-1, j  ] + 1) // a deletion
                        (d.[i  , j-1] + 1) // an insertion
                        (d.[i-1, j-1] + 1) // a substitution
    d

let simEtAlTables (u: string) (v: string) =
    let rec tabulate n lst =
        if n <> u.Length then
            tabulate (n+1) (lst @ [wagnerFischerTable (u.Substring(n)) v])
        else
            lst
    tabulate 0 []

let approx (u: string) (v: string) =
    let tables = simEtAlTables u v
    let rec kApprox i (ks: int list) =
        if i = u.Length + 1 then
            ks
        else
            let mutable curMin = 9000
            for h = 0 to i-1 do
                curMin <- min curMin (max (ks.Item h) ((tables.Item h).[i-h, v.Length - 1]))
            kApprox (i+1) (ks @ [curMin])
    List.head (List.rev (kApprox 1 [0]))
By "doesn't work" I mean that I am getting wrong values: the Python code passes all test cases, while the F# code fails every one. I presume the errors are in simEtAlTables and/or approx, probably in the indexing, especially where approx accesses the three-dimensional list of tables.
Here are three test cases that should cover different results:
Test 1: approx "abcdabcabb" "abc" -> 1
Test 2: approx "abababababab" "ab" -> 0
Test 3: approx "abcdefghijklmn" "xyz" -> 3
[1] http://www.lirmm.fr/~rivals/ALGOSEQ/DOC/SimApprPeriodsTCS262.pdf

This isn't functional in the least (neither is your Python solution), but here's a more direct translation to F#. Maybe you can use it as a starting point and make it more functional from there (although I'll hazard a guess it won't improve performance).
let wagnerFischerTable (a: string) (b: string) =
    let d = ResizeArray([ResizeArray([0])])
    for i = 1 to a.Length do d.Add(ResizeArray([i]))
    for j = 1 to b.Length do d.[0].Add(j)
    for j = 1 to b.Length do
        for i = 1 to a.Length do
            let s, t = b.[j-1], a.[i-1]
            if s = t then
                d.[i].Add(d.[i-1].[j-1])
            else
                d.[i].Add(
                    Seq.min [
                        d.[i-1].[j] + 1
                        d.[i].[j-1] + 1
                        d.[i-1].[j-1] + 1
                    ])
    d

let simEtAlTables (s: string) (p: string) =
    let d = ResizeArray()
    for i = 0 to s.Length - 1 do
        d.Add(wagnerFischerTable p s.[i..])
    d

let approx (s: string) (p: string) =
    let d = simEtAlTables s p
    let t = ResizeArray([0])
    for i = 1 to s.Length do
        let mutable cmin = 9000
        for h = 0 to i - 1 do
            let dh = d.[h]
            cmin <- min cmin (max t.[h] dh.[dh.Count-1].[i-h])
        t.Add(cmin)
    t.[s.Length]
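When checking a port like this, it can help to have a compact reference implementation to compare against. The following is only a sketch of the same recurrence in Python 3, not the original code: the helper name edit_distance_row is made up for illustration, and it keeps only the last row of each Wagner-Fischer table, since that is the only row approx reads (D[h][-1] in the Python version, dh.[dh.Count-1] in the F# version).

```python
def edit_distance_row(p, s):
    """Last row of the Wagner-Fischer table: distance from p to every prefix of s."""
    prev = list(range(len(s) + 1))  # distances from the empty prefix of p
    for i, pc in enumerate(p, 1):
        cur = [i]  # distance from p[:i] to the empty prefix of s
        for j, sc in enumerate(s, 1):
            cost = 0 if pc == sc else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev


def approx(s, p):
    """Smallest k such that s splits into blocks, each within edit distance k of p."""
    # rows[h][L] is the distance between p and s[h:h+L]
    rows = [edit_distance_row(p, s[h:]) for h in range(len(s))]
    t = [0]
    for i in range(1, len(s) + 1):
        t.append(min(max(t[h], rows[h][i - h]) for h in range(i)))
    return t[len(s)]


for s, p in [("abcdabcabb", "abc"), ("abababababab", "ab"), ("abcdefghijklmn", "xyz")]:
    print(s, p, approx(s, p))
```

Running it on the three test cases from the question gives a quick way to compare any F# port value by value.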

This code may help:
let levenshtein word1 word2 =
    let preprocess = fun (str : string) -> str.ToLower().ToCharArray()
    let chars1, chars2 = preprocess word1, preprocess word2
    let m, n = chars1.Length, chars2.Length
    let table : int[,] = Array2D.zeroCreate (m + 1) (n + 1)
    for i in 0..m do
        for j in 0..n do
            match i, j with
            | i, 0 -> table.[i, j] <- i
            | 0, j -> table.[i, j] <- j
            | _, _ ->
                let delete = table.[i-1, j] + 1
                let insert = table.[i, j-1] + 1
                //cost of substitution is 2
                let substitute =
                    if chars1.[i - 1] = chars2.[j - 1]
                        then table.[i-1, j-1] //same character
                        else table.[i-1, j-1] + 2
                table.[i, j] <- List.min [delete; insert; substitute]
    table.[m, n], table //return tuple of the distance and the table
//test
levenshtein "intention" "execution" //|> ignore
You might also want to check this blog posting from Rick Minerich.

Related

Binary search: Not getting upper & lower bound for very large values

I'm trying to solve this competitive programming problem, UVA - The Playboy Chimp, using Python, but for some reason the answer comes out wrong for very large values, for example with this input:
5
3949 45969 294854 9848573 2147483647
5
10000 6 2147483647 4959 5949583
Accepted output:
3949 45969
X 3949
9848573 X
3949 45969
294854 9848573
My output:
X 294854
X 294854
9848573 X
X 294854
45969 9848573
My code:
def bs(target, search_space):
    l, r = 0, len(search_space) - 1
    while l <= r:
        m = (l + r) >> 1
        if target == search_space[m]:
            return m - 1, m + 1
        elif target > search_space[m]:
            l = m + 1
        else:
            r = m - 1
    return r, l

n = int(input())
f_heights = list(set([int(a) for a in input().split()]))
q = int(input())
heights = [int(b) for b in input().split()]
for h in heights:
    a, b = bs(h, f_heights)
    print(f_heights[a] if a >= 0 else 'X', f_heights[b] if b < len(f_heights) else 'X')
Any help would be appreciated!
This is because you are passing the first input through set, which changes the order of the numbers in the list, and binary search only works on a sorted list. If you are using Python 3.6 or newer (guaranteed by the language from 3.7), dict maintains insertion order, so you can use dict.fromkeys to remove duplicates while keeping the order:
f_heights = list(dict.fromkeys(int(a) for a in input().split()))
Example:
f_heights = list(set([int(a) for a in input().split()]))
print(f_heights) # [294854, 3949, 45969, 9848573, 2147483647]
f_heights = list(dict.fromkeys(int(a) for a in input().split()))
print(f_heights) # [3949, 45969, 294854, 9848573, 2147483647]
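As an alternative to relying on the input order, the list can be sorted explicitly and the two neighbours looked up with the standard bisect module. This is a sketch rather than the asker's approach; the function name neighbours is made up for illustration, and the data is hard-coded from the question instead of being read with input().

```python
from bisect import bisect_left, bisect_right


def neighbours(heights, q):
    """Return (tallest height below q, shortest height above q); 'X' if missing.

    `heights` must be sorted in ascending order.
    """
    lo = bisect_left(heights, q) - 1   # index of the last element strictly below q
    hi = bisect_right(heights, q)      # index of the first element strictly above q
    below = heights[lo] if lo >= 0 else 'X'
    above = heights[hi] if hi < len(heights) else 'X'
    return below, above


# sorted(set(...)) removes duplicates and restores the order that set() loses
ladies = sorted(set([3949, 45969, 294854, 9848573, 2147483647]))
for q in [10000, 6, 2147483647, 4959, 5949583]:
    print(*neighbours(ladies, q))
```

On the input from the question this prints the five lines listed there as the accepted output.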

Check the difference between string i and p

How would I go about checking what the difference between strings p and i is, so that the second line can be made equal to the first?
t=int(input())
print(t)
for i in range(t):
    print(i)
    i=input()
    p=input()
    print(i,p)
    print('Case #'+(str(i+1))+': ')
    if len(i)==0:
        #print(len(p))
    else:
        #print((len(p)-len(i)))
Help Barbara find out how many extra letters she needs to remove in order to obtain I or if I cannot be obtained from P by removing letters then output IMPOSSIBLE.
input:
2
aaaa
aaaaa
bbbbb
bbbbc
output:
Case #1: 1
Case #2: IMPOSSIBLE
You can use Levenshtein distance to calculate the difference and decide what is possible and impossible yourself.
You can find more resources on YouTube to understand the concept better. E.g. https://www.youtube.com/watch?v=We3YDTzNXEk
I have provided a version of the code for your convenience as well.
import numpy as np

def calculate_edit_distance(source, target):
    '''Calculate the edit distance from source to target
    [In] source="ab" target="bc"
    [Out] return 2
    '''
    num_row = len(target) + 1
    num_col = len(source) + 1
    distance_table = np.array([[0] * num_col for _ in range(num_row)])
    # getting from an empty source string to target[0...i] requires i insertions
    distance_table[:, 0] = [i for i in range(num_row)]
    # getting from source[0...j] to an empty target string requires j deletions
    distance_table[0] = [i for i in range(num_col)]
    # loop through all the characters and calculate their respective distances
    for i in range(num_row - 1):
        for j in range(num_col - 1):
            insert = distance_table[i + 1, j]
            delete = distance_table[i, j + 1]
            substitute = distance_table[i, j]
            # if target char and source char are the same,
            # just copy the diagonal value
            if target[i] == source[j]:
                distance_table[i + 1, j + 1] = substitute
            else:
                operations = [delete, insert, substitute]
                best_operation = np.argmin(operations)
                if best_operation == 2:  # +2 if the operation is to substitute
                    distance_table[i + 1, j + 1] = substitute + 2
                else:  # same formula for both delete and insert operation
                    distance_table[i + 1, j + 1] = operations[best_operation] + 1
    return distance_table[num_row - 1, num_col - 1]
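Since the task only allows removing letters from P, a plain subsequence check may be enough, without a full edit-distance table. The following is a sketch of that different technique, not the Levenshtein approach above: it assumes each test case is the pair (I, P) from the problem statement, and the function name extra_letters is made up for illustration.

```python
def extra_letters(i_str, p_str):
    """Number of letters to delete from p_str to get i_str, or 'IMPOSSIBLE'."""
    remaining = iter(p_str)
    # `ch in remaining` consumes the iterator up to the first match, so the
    # characters of i_str must appear in p_str in the same order.
    if all(ch in remaining for ch in i_str):
        return len(p_str) - len(i_str)
    return "IMPOSSIBLE"


cases = [("aaaa", "aaaaa"), ("bbbbb", "bbbbc")]
for number, (i_str, p_str) in enumerate(cases, 1):
    print(f"Case #{number}: {extra_letters(i_str, p_str)}")
```

For the sample input this reproduces the two lines of the expected output.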

Is there an easy way to map elements of a list to each other?

I am using python. Basically I have a large list with each value having a triple index: (i, j, t):
Y_1,2,1
Y_1,3,1
and so on.
What I want to do is extract specifically the values: Y_i,j,t and Y_j,i,t but I am having some difficulty.
So for example, I would want to be able to extract: Y_1,2,1 and Y_2,1,1. Y_1,3,4 and Y_3,1,4...
To populate my data I use:
N = 6
T = 2 * N - 2
list_ijt = []
for t in range(1, T + 1):
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            # Avoid making Y_i,j=i,t index
            if j == i:
                continue
            element = "Y" + str(i) + ',' + str(j) + ',' + str(t)
            list_ijt.append(element)
Is there some way to do this for a general case n? Would it be easier done with a dictionary? I have tried and tried to come up with some algorithm or equation, like
for n in range(len(list_ijt)):
    match_index = 4 * (n + 1) + (n + 1)
    print(list_ijt[n], list_ijt[match_index])
But to no avail, and with no clue how this would be generalized for any n (the above example was for n = 6).
Example list:
print(list_ijt)
['Y1,2,1', 'Y1,3,1', 'Y1,4,1', 'Y1,5,1', 'Y1,6,1', 'Y2,1,1', 'Y2,3,1', 'Y2,4,1', 'Y2,5,1', 'Y2,6,1', 'Y3,1,1', 'Y3,2,1', 'Y3,4,1', 'Y3,5,1', 'Y3,6,1', 'Y4,1,1', 'Y4,2,1', 'Y4,3,1', 'Y4,5,1', 'Y4,6,1', 'Y5,1,1', 'Y5,2,1', 'Y5,3,1', 'Y5,4,1', 'Y5,6,1', 'Y6,1,1', 'Y6,2,1', 'Y6,3,1', 'Y6,4,1', 'Y6,5,1', 'Y1,2,2', 'Y1,3,2', 'Y1,4,2', 'Y1,5,2', 'Y1,6,2', 'Y2,1,2', 'Y2,3,2', 'Y2,4,2', 'Y2,5,2', 'Y2,6,2', 'Y3,1,2', 'Y3,2,2', 'Y3,4,2', 'Y3,5,2', 'Y3,6,2', 'Y4,1,2', 'Y4,2,2', 'Y4,3,2', 'Y4,5,2', 'Y4,6,2', 'Y5,1,2', 'Y5,2,2', 'Y5,3,2', 'Y5,4,2', 'Y5,6,2', 'Y6,1,2', 'Y6,2,2', 'Y6,3,2', 'Y6,4,2', 'Y6,5,2', 'Y1,2,3', 'Y1,3,3', 'Y1,4,3', 'Y1,5,3', 'Y1,6,3', 'Y2,1,3', 'Y2,3,3', 'Y2,4,3', 'Y2,5,3', 'Y2,6,3', 'Y3,1,3', 'Y3,2,3', 'Y3,4,3', 'Y3,5,3', 'Y3,6,3', 'Y4,1,3', 'Y4,2,3', 'Y4,3,3', 'Y4,5,3', 'Y4,6,3', 'Y5,1,3', 'Y5,2,3', 'Y5,3,3', 'Y5,4,3', 'Y5,6,3', 'Y6,1,3', 'Y6,2,3', 'Y6,3,3', 'Y6,4,3', 'Y6,5,3', 'Y1,2,4', 'Y1,3,4', 'Y1,4,4', 'Y1,5,4', 'Y1,6,4', 'Y2,1,4', 'Y2,3,4', 'Y2,4,4', 'Y2,5,4', 'Y2,6,4', 'Y3,1,4', 'Y3,2,4', 'Y3,4,4', 'Y3,5,4', 'Y3,6,4', 'Y4,1,4', 'Y4,2,4', 'Y4,3,4', 'Y4,5,4', 'Y4,6,4', 'Y5,1,4', 'Y5,2,4', 'Y5,3,4', 'Y5,4,4', 'Y5,6,4', 'Y6,1,4', 'Y6,2,4', 'Y6,3,4', 'Y6,4,4', 'Y6,5,4', 'Y1,2,5', 'Y1,3,5', 'Y1,4,5', 'Y1,5,5', 'Y1,6,5', 'Y2,1,5', 'Y2,3,5', 'Y2,4,5', 'Y2,5,5', 'Y2,6,5', 'Y3,1,5', 'Y3,2,5', 'Y3,4,5', 'Y3,5,5', 'Y3,6,5', 'Y4,1,5', 'Y4,2,5', 'Y4,3,5', 'Y4,5,5', 'Y4,6,5', 'Y5,1,5', 'Y5,2,5', 'Y5,3,5', 'Y5,4,5', 'Y5,6,5', 'Y6,1,5', 'Y6,2,5', 'Y6,3,5', 'Y6,4,5', 'Y6,5,5', 'Y1,2,6', 'Y1,3,6', 'Y1,4,6', 'Y1,5,6', 'Y1,6,6', 'Y2,1,6', 'Y2,3,6', 'Y2,4,6', 'Y2,5,6', 'Y2,6,6', 'Y3,1,6', 'Y3,2,6', 'Y3,4,6', 'Y3,5,6', 'Y3,6,6', 'Y4,1,6', 'Y4,2,6', 'Y4,3,6', 'Y4,5,6', 'Y4,6,6', 'Y5,1,6', 'Y5,2,6', 'Y5,3,6', 'Y5,4,6', 'Y5,6,6', 'Y6,1,6', 'Y6,2,6', 'Y6,3,6', 'Y6,4,6', 'Y6,5,6', 'Y1,2,7', 'Y1,3,7', 'Y1,4,7', 'Y1,5,7', 'Y1,6,7', 'Y2,1,7', 'Y2,3,7', 'Y2,4,7', 'Y2,5,7', 'Y2,6,7', 'Y3,1,7', 'Y3,2,7', 'Y3,4,7', 'Y3,5,7', 'Y3,6,7', 'Y4,1,7', 'Y4,2,7', 'Y4,3,7', 'Y4,5,7', 
'Y4,6,7', 'Y5,1,7', 'Y5,2,7', 'Y5,3,7', 'Y5,4,7', 'Y5,6,7', 'Y6,1,7', 'Y6,2,7', 'Y6,3,7', 'Y6,4,7', 'Y6,5,7', 'Y1,2,8', 'Y1,3,8', 'Y1,4,8', 'Y1,5,8', 'Y1,6,8', 'Y2,1,8', 'Y2,3,8', 'Y2,4,8', 'Y2,5,8', 'Y2,6,8', 'Y3,1,8', 'Y3,2,8', 'Y3,4,8', 'Y3,5,8', 'Y3,6,8', 'Y4,1,8', 'Y4,2,8', 'Y4,3,8', 'Y4,5,8', 'Y4,6,8', 'Y5,1,8', 'Y5,2,8', 'Y5,3,8', 'Y5,4,8', 'Y5,6,8', 'Y6,1,8', 'Y6,2,8', 'Y6,3,8', 'Y6,4,8', 'Y6,5,8', 'Y1,2,9', 'Y1,3,9', 'Y1,4,9', 'Y1,5,9', 'Y1,6,9', 'Y2,1,9', 'Y2,3,9', 'Y2,4,9', 'Y2,5,9', 'Y2,6,9', 'Y3,1,9', 'Y3,2,9', 'Y3,4,9', 'Y3,5,9', 'Y3,6,9', 'Y4,1,9', 'Y4,2,9', 'Y4,3,9', 'Y4,5,9', 'Y4,6,9', 'Y5,1,9', 'Y5,2,9', 'Y5,3,9', 'Y5,4,9', 'Y5,6,9', 'Y6,1,9', 'Y6,2,9', 'Y6,3,9', 'Y6,4,9', 'Y6,5,9', 'Y1,2,10', 'Y1,3,10', 'Y1,4,10', 'Y1,5,10', 'Y1,6,10', 'Y2,1,10', 'Y2,3,10', 'Y2,4,10', 'Y2,5,10', 'Y2,6,10', 'Y3,1,10', 'Y3,2,10', 'Y3,4,10', 'Y3,5,10', 'Y3,6,10', 'Y4,1,10', 'Y4,2,10', 'Y4,3,10', 'Y4,5,10', 'Y4,6,10', 'Y5,1,10', 'Y5,2,10', 'Y5,3,10', 'Y5,4,10', 'Y5,6,10', 'Y6,1,10', 'Y6,2,10', 'Y6,3,10', 'Y6,4,10', 'Y6,5,10']
Tried:
string = '\n'.join(list_ijt)
for t in range(T):
    for i in range(n):
        for j in range(i, n):
            s = get(i, j, t, string)
            if s:
                list_ijt.append(s)
Use
def get(i, j, t, string):
    def _get(i, j):
        pat = f'Y{i},{j},{t}'
        ind = string.find(pat)
        if ind >=0:
            return string[ind:ind+len(pat)]
    a, b = _get(i, j), _get(j, i)
    if a and b:
        return a, b

n = 3
T = 2*n-2
list_ijt = []
string = '\n'.join(your_list)
for t in range(T):
    for i in range(n):
        for j in range(i, n):
            if s := get(i, j, t, string):
                list_ijt.append(s)
print(list_ijt)
[('Y1,2,1', 'Y2,1,1'), ('Y1,2,2', 'Y2,1,2'), ('Y1,2,3', 'Y2,1,3')]
If your data is exactly as shown (no missing elements or anything like that), you can analyze the placement of elements pretty easily.
There are a total of T blocks of N * (N - 1) elements. Each block consists of N segments of N - 1 elements each. Each segment has a constant value of i. The first i - 1 elements are for j < i and the remainder for j > i.
So for a given choice of i, j, t, the index in the list is
(t - 1) * N * (N - 1) + (i - 1) * (N - 1) + j - (j > i) - 1
The expression (j > i) evaluates to a bool, which is an integer that's 0 or 1.
That means that the index for j, i, t is given by
(t - 1) * N * (N - 1) + (j - 1) * (N - 1) + i - (i > j) - 1
So if you have an index in the list, k, you can break it down into components and apply the second formula. The components are
t = (k // (N * (N - 1))) + 1
i = (k % (N * (N - 1))) // (N - 1) + 1
j = k % (N - 1) + 1
j += (j >= i)
So you can compute a match index for any k totally deterministically with arithmetic and boolean operations. You don't need loops or dictionaries in this particular case.
You can write your final loop as something like this:
for k in range(len(list_ijt)):
    t = (k // (N * (N - 1))) + 1
    i = (k % (N * (N - 1))) // (N - 1) + 1
    j = k % (N - 1) + 1
    j += (j >= i)
    match_index = (t - 1) * N * (N - 1) + (j - 1) * (N - 1) + i - (i > j) - 1
    print(list_ijt[k], '->', list_ijt[match_index])
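The question also asks whether a dictionary would make this easier. If the data may have gaps, or the ordering is not guaranteed, keying the values by the (i, j, t) tuple avoids the index arithmetic entirely. A minimal sketch, assuming the same N, T and label format as in the question (the names values and pairs are made up for illustration):

```python
N = 6
T = 2 * N - 2

# Key each value by its (i, j, t) index instead of relying on list position.
values = {
    (i, j, t): f"Y{i},{j},{t}"
    for t in range(1, T + 1)
    for i in range(1, N + 1)
    for j in range(1, N + 1)
    if i != j
}

# Pair Y_i,j,t with Y_j,i,t; the i < j filter keeps each pair only once.
pairs = [
    (values[i, j, t], values[j, i, t])
    for (i, j, t) in values
    if i < j and (j, i, t) in values
]

print(pairs[:3])
print(len(pairs))
```

The membership test `(j, i, t) in values` is what makes this tolerant of missing elements, which the closed-form index above is not.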

Rosalind: Mendel's first law

I'm trying to solve the problem at http://rosalind.info/problems/iprb/
Given: Three positive integers k, m, and n, representing a population
containing k+m+n organisms: k individuals are homozygous dominant for
a factor, m are heterozygous, and n are homozygous recessive.
Return: The probability that two randomly selected mating organisms
will produce an individual possessing a dominant allele (and thus
displaying the dominant phenotype). Assume that any two organisms can
mate.
My solution works for the sample, but not for any problems generated. After further research it seems that I should find the probability of choosing any one organism at random, find the probability of choosing the second organism, and then the probability of that pairing producing offspring with a dominant allele.
My question is then: what does my code below find the probability of? Does it find the percentage of offspring with a dominant allele for all possible matings -- so rather than the probability of one random mating, my code is solving for the percentage of offspring with dominant alleles if all pairs were tested?
f = open('rosalind_iprb.txt', 'r')
r = f.read()
s = r.split()
############# k = # homozygotes dominant, m = #heterozygotes, n = # homozygotes recessive
k = float(s[0])
m = float(s[1])
n = float(s[2])
############# Counts for pairing between each group and within groups
k_k = 0
k_m = 0
k_n = 0
m_m = 0
m_n = 0
n_n = 0
##############
if k > 1:
    k_k = 1.0 + (k-2) * 2.0
k_m = k * m
k_n = k * n
if m > 1:
    m_m = 1.0 + (m-2) * 2.0
m_n = m * n
if n> 1:
    n_n = 1.0 + (n-2) * 2.0
#################
dom = k_k + k_m + k_n + 0.75*m_m + 0.5*m_n
total = k_k + k_m + k_n + m_m + m_n + n_n
chance = dom/total
print chance
Looking at your code, I'm having a hard time figuring out what it's supposed to do. I'll work through the problem here.
Let's simplify the wording. There are n1 type 1, n2 type 2, and n3 type 3 items.
How many ways are there to choose a set of size 2 out of all the items? (n1 + n2 + n3) choose 2.
Every pair of items will have item types corresponding to one of the six following unordered multisets: {1,1}, {2,2}, {3,3}, {1,2}, {1,3}, {2,3}
How many multisets of the form {i,i} are there? ni choose 2.
How many multisets of the form {i,j} are there, where i != j? ni * nj.
The probabilities of the six multisets are thus the following:
P({1,1}) = [n1 choose 2] / [(n1 + n2 + n3) choose 2]
P({2,2}) = [n2 choose 2] / [(n1 + n2 + n3) choose 2]
P({3,3}) = [n3 choose 2] / [(n1 + n2 + n3) choose 2]
P({1,2}) = [n1 * n2] / [(n1 + n2 + n3) choose 2]
P({1,3}) = [n1 * n3] / [(n1 + n2 + n3) choose 2]
P({2,3}) = [n2 * n3] / [(n1 + n2 + n3) choose 2]
These sum to 1. Note that [X choose 2] is just [X * (X - 1) / 2] for X > 1 and 0 for X = 0 or 1.
Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype).
To answer this question, you simply need to identify which of the six multisets correspond to this event. Lacking the genetics knowledge to answer that question, I'll leave that to you.
For example, suppose that a dominant allele results if either of the two parents was type 1. Then the events of interest are {1,1}, {1,2}, {1,3} and the probability of the event is P({1,1}) + P({1,2}) + P({1,3}).
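The counting argument above can be turned into code once each multiset is given a weight for producing a dominant offspring. The sketch below assumes type 1 = homozygous dominant (k), type 2 = heterozygous (m), type 3 = homozygous recessive (n), and uses the weights quoted by the other answers on this page: 1 for any pair containing a homozygous dominant parent, 0.75 for two heterozygotes, 0.5 for heterozygous with recessive, 0 for two recessives. The function name is made up for illustration, and math.comb needs Python 3.8 or newer.

```python
from math import comb


def dominant_probability(k, m, n):
    """Probability that a random unordered pair yields a dominant-phenotype child."""
    total_pairs = comb(k + m + n, 2)
    weighted = (
        comb(k, 2)            # {1,1}: AA x AA, always dominant
        + k * m               # {1,2}: AA x Aa, always dominant
        + k * n               # {1,3}: AA x aa, always dominant
        + 0.75 * comb(m, 2)   # {2,2}: Aa x Aa, dominant 3 times out of 4
        + 0.5 * m * n         # {2,3}: Aa x aa, dominant half the time
        # {3,3}: aa x aa never gives a dominant allele, so it adds nothing
    )
    return weighted / total_pairs


print(round(dominant_probability(2, 2, 2), 5))
```

Because the pairs are unordered, each mixed pairing is counted once here, whereas approaches that pick a first and then a second organism count it twice and divide by twice as many ordered pairs; the resulting probability is the same.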
I spent some time on this question, so, to clarify in Python:
lst = ['2', '2', '2']
k, m, n = map(float, lst)
t = sum(map(float, lst))
# build a list of (first pick) * (second pick) * probability of a dominant offspring
# multiplications by one were omitted
# remember to subtract one from the group size when the second pick comes from the same group as the first
couples = [
    k*(k-1),        # AA x AA
    k*m,            # AA x Aa
    k*n,            # AA x aa
    m*k,            # Aa x AA
    m*(m-1)*0.75,   # Aa x Aa
    m*n*0.5,        # Aa x aa
    n*k,            # aa x AA
    n*m*0.5,        # aa x Aa
    n*(n-1)*0       # aa x aa
]
# (t-1) accounts for the first organism having already been selected
print(round(sum(couples)/t/(t-1), 5))
If you are interested, I just found a solution and put it in C#.
public double mendel(double k, double m, double n)
{
    double prob;
    prob = ((k*k - k) + 2*(k*m) + 2*(k*n) + (.75*(m*m - m)) + 2*(.5*m*n))/((k + m + n)*(k + m + n -1));
    return prob;
}
Our parameters are k (dominant), m (hetero), & n (recessive).
First I found the probability of each possible breeding pair selection in terms of the share of the population. So, a first-round choice of k would look like k/(k+m+n), and a second-round choice of k after a first-round choice of k would look like (k-1)/(k+m+n-1). Then multiply these two to get the outcome. Since there were three identified populations, there were nine possible outcomes.
Then I multiplied each outcome by its dominance probability: 100% for anything with k, 75% for m&m, 50% for m&n and n&m, and 0% for n&n. Now add the outcomes together, and you have your solution.
http://rosalind.info/problems/iprb/
Here is my solution in Python.
We don't want the offspring to be completely recessive, so we build the probability tree and add up the probabilities of the cases in which that can happen.
The probability that we want is then 1 - p_recessive. More explanation is provided in the docstring of the following code.
"""
Let d: dominant, h: hetero, r: recessive
Let a = k+m+n
Let X = the r.v. associated with the first person randomly selected
Let Y = the r.v. associated with the second person randomly selected without replacement
Then:
k = f_d => p(X=d) = k/a => p(Y=d| X=d) = (k-1)/(a-1) ,
                           p(Y=h| X=d) = (m)/(a-1) ,
                           p(Y=r| X=d) = (n)/(a-1)
m = f_h => p(X=h) = m/a => p(Y=d| X=h) = (k)/(a-1) ,
                           p(Y=h| X=h) = (m-1)/(a-1) ,
                           p(Y=r| X=h) = (n)/(a-1)
n = f_r => p(X=r) = n/a => p(Y=d| X=r) = (k)/(a-1) ,
                           p(Y=h| X=r) = (m)/(a-1) ,
                           p(Y=r| X=r) = (n-1)/(a-1)
Now the joint would be:
                           | offspring possibilities given X and Y choice
-------------------------------------------------------------------------
X  Y   P(X,Y)              | d(dominant)   h(hetero)   r(recessive)
-------------------------------------------------------------------------
d  d   k/a*(k-1)/(a-1)     | 1             0           0
d  h   k/a*(m)/(a-1)       | 1/2           1/2         0
d  r   k/a*(n)/(a-1)       | 0             1           0
                           |
h  d   m/a*(k)/(a-1)       | 1/2           1/2         0
h  h   m/a*(m-1)/(a-1)     | 1/4           1/2         1/4
h  r   m/a*(n)/(a-1)       | 0             1/2         1/2
                           |
r  d   n/a*(k)/(a-1)       | 0             1           0
r  h   n/a*(m)/(a-1)       | 0             1/2         1/2
r  r   n/a*(n-1)/(a-1)     | 0             0           1
Here what we don't want is the element in the very last column, where the offspring is completely recessive,
so P = 1 - the sum of those situations, as follows
"""
path = 'rosalind_iprb.txt'
with open(path, 'r') as file:
    lines = file.readlines()
k, m, n = [int(i) for i in lines[0].split(' ')]
a = k + m + n
p_recessive = (1/4*m*(m-1) + 1/2*m*n + 1/2*m*n + n*(n-1))/(a*(a-1))
p_wanted = 1 - p_recessive
p_wanted = round(p_wanted, 5)
print(p_wanted)

What's wrong with my Extended Euclidean Algorithm (python)?

My algorithm to find the HCF of two numbers, with displayed justification in the form r = a*aqr + b*bqr, is only partially working, even though I'm pretty sure that I have entered all the correct formulae - basically, it can and will find the HCF, but I am also trying to provide a demonstration of Bezout's Lemma, so I need to display the aforementioned displayed justification. The program:
# twonumbers.py
inp = 0
a = 0
b = 0
mul = 0
s = 1
r = 1
q = 0
res = 0
aqc = 1
bqc = 0
aqd = 0
bqd = 1
aqr = 0
bqr = 0
res = 0
temp = 0
fin_hcf = 0
fin_lcd = 0
seq = []
inp = input('Please enter the first number, "a":\n')
a = inp
inp = input('Please enter the second number, "b":\n')
b = inp
mul = a * b # Will come in handy later!
if a < b:
    print 'As you have entered the first number as smaller than the second, the program will swap a and b before proceeding.'
    temp = a
    a = b
    b = temp
else:
    print 'As the inputted value a is larger than or equal to b, the program has not swapped the values a and b.'
print 'Thank you. The program will now compute the HCF and simultaneously demonstrate Bezout\'s Lemma.'
print `a`+' = ('+`aqc`+' x '+`a`+') + ('+`bqc`+' x '+`b`+').'
print `b`+' = ('+`aqd`+' x '+`a`+') + ('+`bqd`+' x '+`b`+').'
seq.append(a)
seq.append(b)
c = a
d = b
while r != 0:
    if s != 1:
        c = seq[s-1]
        d = seq[s]
    res = divmod(c,d)
    q = res[0]
    r = res[1]
    aqr = aqc - (q * aqd)#These two lines are the main part of the justification
    bqr = bqc - (q * aqd)#-/
    print `r`+' = ('+`aqr`+' x '+`a`+') + ('+`bqr`+' x '+`b`+').'
    aqd = aqr
    bqd = bqr
    aqc = aqd
    bqc = bqd
    s = s + 1
    seq.append(r)
fin_hcf = seq[-2] # Finally, the HCF.
fin_lcd = mul / fin_hcf
print 'Using Euclid\'s Algorithm, we have now found the HCF of '+`a`+' and '+`b`+': it is '+`fin_hcf`+'.'
print 'We can now also find the LCD (LCM) of '+`a`+' and '+`b`+' using the following method:'
print `a`+' x '+`b`+' = '+`mul`+';'
print `mul`+' / '+`fin_hcf`+' (the HCF) = '+`fin_lcd`+'.'
print 'So, to conclude, the HCF of '+`a`+' and '+`b`+' is '+`fin_hcf`+' and the LCD (LCM) of '+`a`+' and '+`b`+' is '+`fin_lcd`+'.'
I would greatly appreciate it if you could help me to find out what is going wrong with this.
Hmm, your program is rather verbose and hence hard to read. For example, you don't need to initialise lots of those variables in the first few lines. And there is no need to assign to the inp variable and then copy that into a and then b. And you don't use the seq list or the s variable at all.
Anyway that's not the problem. There are two bugs. I think that if you had compared the printed intermediate answers to a hand-worked example you should have found the problems.
The first problem is that you have a typo in the second line here:
aqr = aqc - (q * aqd)#These two lines are the main part of the justification
bqr = bqc - (q * aqd)#-/
in the second line, aqd should be bqd
The second problem is that in this bit of code
aqd = aqr
bqd = bqr
aqc = aqd
bqc = bqd
you make aqd be aqr and then aqc be aqd. So aqc and aqd end up the same. Whereas you actually want the assignments in the other order:
aqc = aqd
bqc = bqd
aqd = aqr
bqd = bqr
Then the code works. But I would prefer to see it written more like this which is I think a lot clearer. I have left out the prints but I'm sure you can add them back:
a = input('Please enter the first number, "a":\n')
b = input('Please enter the second number, "b":\n')
if a < b:
    a,b = b,a
r1,r2 = a,b
s1,s2 = 1,0
t1,t2 = 0,1
while r2 > 0:
    q,r = divmod(r1,r2)
    r1,r2 = r2,r
    s1,s2 = s2,s1 - q * s2
    t1,t2 = t2,t1 - q * t2
print r1,s1,t1
Finally, it might be worth looking at a recursive version which expresses the structure of the solution even more clearly, I think.
Hope this helps.
Here is a simple version of Bezout's identity; given a and b, it returns x, y, and g = gcd(a, b):
function bezout(a, b)
    if b == 0
        return 1, 0, a
    else
        q, r := divide(a, b)
        x, y, g := bezout(b, r)
        return y, x - q * y, g
The divide function returns both the quotient and remainder.
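The pseudocode translates almost line for line into Python 3, with the built-in divmod playing the role of divide. This is only a sketch for non-negative integers; the sample values 240 and 46 are arbitrary and not taken from the question.

```python
def bezout(a, b):
    """Return (x, y, g) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return 1, 0, a
    q, r = divmod(a, b)      # a == q*b + r
    x, y, g = bezout(b, r)   # b*x + r*y == g
    # substitute r = a - q*b to get a*y + b*(x - q*y) == g
    return y, x - q * y, g


a, b = 240, 46
x, y, g = bezout(a, b)
print(f"{g} = ({x} x {a}) + ({y} x {b})")
assert a * x + b * y == g
```

The final print uses the same "r = (x × a) + (y × b)" layout as the justification lines in the question, so the output can be compared directly.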
A Python program that does what you want (please note that the extended Euclidean algorithm gives only one pair of Bezout coefficients) might be:
import sys

def egcd(a, b):
    if a == 0:
        return (b, 0, 1)
    g, y, x = egcd(b % a, a)
    return (g, x - (b // a) * y, y)

def main():
    if len(sys.argv) != 3:
        print '''%s program calculates HCF, LCM and Bezout identity of two integers
usage %s a b''' % (sys.argv[0], sys.argv[0])
        sys.exit(1)
    a = int(sys.argv[1])
    b = int(sys.argv[2])
    g, x, y = egcd(a, b)
    print 'HCF =', g
    print 'LCM =', a*b/g
    print 'Bezout identity: %i * (%i) + %i * (%i) = %i' % (a, x, b, y, g)

main()
