Random mutation - python

I am a beginner Python coder, and I am writing code to generate a random mutation at a random position.
I have written a function which includes:
The sequence where the mutation happens
A list of nucleotides from which one nucleotide is selected at random and substituted for the nucleotide in the original sequence.
Basic concept of the code:
Say we have to pick one ball from basket (A) and replace it with a ball from another basket (B). The colors of the two balls must be different.
I know I need to use a while loop, but I am not able to get it to work.
def random(s)
    length = len(s)
    seq = list(s)
    nucl = "ATGC" ## pick one nucleotide from this list
    lengthnucl= len(nucleotide_list)
    position_orgseq = np.random.choice(range(0,length))
    position_nucl = np.random.choice(range(0,lengthnucl))
    #while c < length:
    ##if the two nucleotides chosen are not equal then:
    #two nucleotides are from
    # TTTTGGGCCCCAAA - original seq, ATGC = nucleotide list
    if seq[position_orgseq] != nucleotide_list[position_nucl]:
        seq[position_orgseq] = nucleotide_list[position_nucl]
    final = "".join(seq)
    return s,final
actual_seq, mut_seq = random("TTTTGGGCCCCAAA")
print(actual_seq)
print(mut_seq)

First, as Error - Syntatical Remorse pointed out in the comments, there is no need to import numpy; use the built-in random module instead (specifically, you can use random.randint()).
Your code as is doesn't run; you have misnamed variables. Other than that, you are close. Your hunch about using a while loop is correct. You can simply keep looping until your two random values don't give the same nucleotide in the two lists, like so:
from random import randint
def random(s):
    length = len(s)
    seq = list(s)
    nucl = "ATGC"
    lengthnucl = len(nucl)
    position_orgseq = randint(0, length - 1)
    position_nucl = randint(0, lengthnucl - 1)
    # keep drawing until the chosen nucleotide differs from the one at that position
    while seq[position_orgseq] == nucl[position_nucl]:
        position_orgseq = randint(0, length - 1)
        position_nucl = randint(0, lengthnucl - 1)
    seq[position_orgseq] = nucl[position_nucl]
    final = "".join(seq)
    return s, final
actual_seq, mut_seq = random("TTTTGGGCCCCAAA")
print(actual_seq)
print(mut_seq)
This may be optimized further.
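One possible optimization, sketched here as an assumption rather than the answerer's own method: pick the position once, then choose only from the nucleotides that differ from the one already there. No retry loop is needed, and when every character of the sequence is one of ATGC this gives the same distribution as the loop (each position equally likely, each of the three alternatives equally likely). The name point_mutation is hypothetical.

```python
import random

NUCLEOTIDES = "ATGC"

def point_mutation(s, alphabet=NUCLEOTIDES):
    """Return (original, mutated) where exactly one position has changed."""
    seq = list(s)
    pos = random.randrange(len(seq))  # random position in the sequence
    # offer only nucleotides that differ from the current one, so no retry loop is needed
    choices = [n for n in alphabet if n != seq[pos]]
    seq[pos] = random.choice(choices)
    return s, "".join(seq)

original, mutated = point_mutation("TTTTGGGCCCCAAA")
print(original)
print(mutated)
```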

Related

efficient way to find domino's board order

I have a small dominoes game that works this way: I'm given N*N tiles, each with 4 numbers, and I need to order them so that every two adjacent tiles have the same number on their touching sides. The tiles may be rotated. For example, here is my 2*2 board:
a,b,c,d = [1,2,3,4], [7,9,6,2], [6,8,8,5], [3,5,0,0]
They can be visualized by:
print(print_2_tiles(a,b,'a','b'))
print(print_2_tiles(d,c,'d','c'))
##############
#**1**##**7**#
#4*A*2##2*B*9#
#**3**##**6**#
##############
##############
#**3**##**6**#
#0*D*5##5*C*8#
#**0**##**8**#
##############
It can be seen that the only way to "win" this board is the way I ordered the tiles, since <a,b> are only connected via 2, <a,d> only via 3, and so on, while <a,c> and <b,d> are not connected at all. No other rotation or arrangement of the tiles gives a "win".
I wrote functions to:
find connections between any given 2 tiles
figure out how many rotations are needed to connect given 2 tiles
check all possibilities and find the correct order
However, this was only a simple case with 16*12*8 options, where I could rule out many options since there were unique connectors (e.g. the '2' that connects <a,b> was not present in other tiles). If I get a bigger board (a bigger alphabet could also complicate things), say 5*5, the number of options will be 100*96*92..., and brute force will not cut it.
How can I find the right order efficiently (the board is guaranteed to have exactly one correct order)?
Here are my efforts:
import numpy as np
from itertools import combinations, product
# returns list of [<connector element>, <indices of element in a>, <indices of element in b>]
def find_connections(a,b):
    intersected_elem = np.array(list(set(a).intersection(b)))
    possible_connections = []
    for val in intersected_elem:
        x = list(np.where(np.array(a) == val)[0])
        y = list(np.where(np.array(b) == val)[0])
        possible_connections.append([val,x,y])
    return possible_connections
def str_tile(t, name):
    template = '''#######
#**{}**#
#{}*{}*{}#
#**{}**#
#######'''
    up,right,down,left = t
    return template.format(up,left,name.upper(),right,down)
def print_2_tiles(a,b, name_a, name_b):
    res = ''
    for line in zip(str_tile(a,name_a).splitlines(), str_tile(b,name_b).splitlines()):
        res += ''.join(line)
        res += '\n'
    return res[:-1]
def find_final_connections(tiles_ls):
    tiles_combinations = list(combinations(tiles_ls, 2))
    a_idx,b_idx = 0,1
    final_connections = []
    for comb in tiles_combinations:
        connections = find_connections(comb[0], comb[1])
        print('({},{})'.format(a_idx,b_idx), connections, end='\t')
        if len(connections):
            print('this means {},{} are connected via {} in directions {},{}'.format(a_idx,b_idx, connections[0][0], connections[0][1][0], connections[0][2][0]))
            final_connections.append((a_idx,b_idx))
        else:
            print()
        # is there a neater way, using enumerate on itertools.combinations?
        b_idx += 1
        if b_idx == len(tiles_ls):
            a_idx += 1
            b_idx = a_idx + 1
    print(final_connections)
a,b,c,d = [1,2,3,4], [7,9,6,2], [6,8,8,5], [3,5,0,0]
tiles_ls = [a,b,c,d]
find_final_connections(tiles_ls) # prints a 4-elem list -> success
print('#'*30)
a,b,c,d = [1,2,3,4], [7,9,6,2], [6,8,8,5], [0,5,0,0]
tiles_ls = [a,b,c,d]
find_final_connections(tiles_ls) # prints a 3-elem list -> fail
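On the inline question about a neater way: enumerate can go inside combinations, so each pair arrives together with its indices and the manual a_idx/b_idx bookkeeping disappears. The sketch below (connected_pairs is a hypothetical name) only records which pairs share at least one number; it does not compute the directions that find_connections reports.

```python
from itertools import combinations

def connected_pairs(tiles):
    """Return (i, j) index pairs of tiles that share at least one number."""
    pairs = []
    # enumerate first, so combinations yields ((i, tile_i), (j, tile_j)) with i < j
    for (i, t1), (j, t2) in combinations(enumerate(tiles), 2):
        if set(t1) & set(t2):
            pairs.append((i, j))
    return pairs

a, b, c, d = [1, 2, 3, 4], [7, 9, 6, 2], [6, 8, 8, 5], [3, 5, 0, 0]
print(connected_pairs([a, b, c, d]))  # [(0, 1), (0, 3), (1, 2), (2, 3)]
```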
Are you sure brute force can't do it?
I would try a systematic solution: pick every domino in turn and place it top-left, trying all four rotations. Then pick another domino, place it in the second position, try all four rotations, and check whether it matches the first one.
And so on: at each stage, pick a domino from the remaining ones, try it in all four rotations, and check compatibility with its already-placed neighbors.
This is best written as a recursive procedure.
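A minimal sketch of that recursive procedure, assuming tiles are lists in the question's [up, right, down, left] order (as unpacked in str_tile) and that the board is filled row by row; solve and rotations are hypothetical names. A partial board is abandoned as soon as a tile disagrees with its left or upper neighbor, which usually prunes the search far below the full 100*96*92... count, although the worst case is still exponential.

```python
def rotations(tile):
    """Yield the four rotations of a tile given as [up, right, down, left]."""
    for _ in range(4):
        yield tile
        # rotate 90 degrees clockwise: the old left side becomes the new top
        tile = [tile[3], tile[0], tile[1], tile[2]]

def solve(tiles, n):
    """Backtracking search over an n*n board; returns rows of oriented tiles, or None."""
    board = [[None] * n for _ in range(n)]
    used = [False] * len(tiles)

    def fits(tile, r, c):
        # the left neighbor's right side must equal this tile's left side
        if c > 0 and board[r][c - 1][1] != tile[3]:
            return False
        # the upper neighbor's bottom side must equal this tile's top side
        if r > 0 and board[r - 1][c][2] != tile[0]:
            return False
        return True

    def place(pos):
        if pos == n * n:
            return True  # every cell is filled consistently
        r, c = divmod(pos, n)
        for i, tile in enumerate(tiles):
            if used[i]:
                continue
            for rotated in rotations(tile):
                if fits(rotated, r, c):
                    board[r][c] = rotated
                    used[i] = True
                    if place(pos + 1):
                        return True
                    used[i] = False  # undo and try the next option
        board[r][c] = None
        return False

    return board if place(0) else None

a, b, c, d = [1, 2, 3, 4], [7, 9, 6, 2], [6, 8, 8, 5], [3, 5, 0, 0]
print(solve([a, b, c, d], 2))
print(solve([a, b, c, [0, 5, 0, 0]], 2))  # None: the second board has no solution
```

Because rotating a whole solved board gives another valid board, "exactly one correct order" should be read as unique up to rotating the whole board; the search simply returns the first arrangement it finds.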

Word Ladder without replacement in python

I have a question where I need to implement the word ladder problem with different logic.
In each step, the player must either add one letter to the word from the previous step or take away one letter, and then rearrange the letters to make a new word.
croissant(-C) -> arsonist(-S) -> aroints(+E)->notaries(+B)->baritones(-S)->baritone
Each new word must be a valid word from wordList.txt, which is a dictionary of words.
My code looks like this.
First I calculate the characters to remove ("remove_list") and to add ("add_list") and store them in lists.
Then I read the file and store the words in a dictionary keyed by their sorted letters.
Then I start removing letters from and adding letters to the start word, matching against the dictionary.
But the challenge now is that some words, after a deletion or addition, don't match anything in the dictionary, and the search misses the goal.
In that case, it should backtrack to the previous step and add instead of subtracting.
I am looking for some sort of recursive function that could help with this, or a completely new approach that would help me achieve the output.
A sample of my code:
start = 'croissant'
goal = 'baritone'
list_start = map(list,start)
list_goal = map(list, goal)
remove_list = [x for x in list_start if x not in list_goal]
add_list = [x for x in list_goal if x not in list_start]
file = open('wordList.txt','r')
dict_words = {}
for word in file:
    strip_word = word.rstrip()
    dict_words[''.join(sorted(strip_word))]=strip_word
file.close()
final_list = []
flag_remove = 0
for i in remove_list:
    sorted_removed_list = sorted(start.replace(''.join(map(str, i)),"",1))
    sorted_removed_string = ''.join(map(str, sorted_removed_list))
    if sorted_removed_string in dict_words.keys():
        print dict_words[sorted_removed_string]
        final_list.append(sorted_removed_string)
        flag_remove = 1
        start = sorted_removed_string
print final_list
flag_add = 0
for i in add_list:
    first_character = ''.join(map(str,i))
    sorted_joined_list = sorted(''.join([first_character, final_list[-1]]))
    sorted_joined_string = ''.join(map(str, sorted_joined_list))
    if sorted_joined_string in dict_words.keys():
        print dict_words[sorted_joined_string]
        final_list.append(sorted_joined_string)
        flag_add = 1
        sorted_removed_string = sorted_joined_string
Recursion-based backtracking isn't a good idea for a search problem of this sort. It blindly goes deep into the search tree without exploiting the fact that connected words are almost never 10-12 steps away from each other, which can cause a stack overflow (or exceed the recursion limit in Python).
The solution here uses breadth-first search. It uses mate(s) as a helper that, given a word s, finds all the words we can travel to next. mate in turn uses a global dictionary, wdict, pre-processed at the beginning of the program, which maps the sorted letters of a word to all of its anagrams (i.e. rearrangements of its letters).
from queue import Queue
# one word per line; strip() drops the newline (slicing off the last character would also cut a letter from a final line without a newline)
words = set(line.strip() for line in open("wordsEn.txt"))
# map the sorted letters of a word to every word made of those letters (its anagrams)
wdict = {}
for w in words:
    s = ''.join(sorted(w))
    if s in wdict: wdict[s].append(w)
    else: wdict[s] = [w]
def mate(s):
    global wdict
    # every string with one letter removed...
    ans = [''.join(s[:c]+s[c+1:]) for c in range(len(s))]
    # ...and every string with one lowercase letter added
    for c in range(97,123): ans.append(s + chr(c))
    for m in ans: yield from wdict.get(''.join(sorted(m)),[])
def bfs(start,goal,depth=0):
    already = set([start])
    prev = {}
    q = Queue()
    q.put(start)
    while not q.empty():
        cur = q.get()
        if cur==goal:
            ans = []
            while cur: ans.append(cur);cur = prev.get(cur)
            return ans[::-1] #reverse the array
        for m in mate(cur):
            if m not in already:
                already.add(m)
                q.put(m)
                prev[m] = cur
print(bfs('croissant','baritone'))
which outputs: ['croissant', 'arsonist', 'rations', 'senorita', 'baritones', 'baritone']
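To try the breadth-first search without a word file, here is a self-contained sketch of the same idea. It swaps queue.Queue (a thread-safe queue) for collections.deque, the usual choice for single-threaded BFS, and lets the prev dictionary double as the visited set. The tiny word list is an assumption made up from the question's example chain, and ladder, neighbors and build_index are hypothetical names; with a full dictionary the path found may differ, as the output above shows.

```python
from collections import deque
from string import ascii_lowercase

def build_index(words):
    """Map each word's sorted letters to the list of words spelled with those letters."""
    index = {}
    for w in words:
        index.setdefault(''.join(sorted(w)), []).append(w)
    return index

def neighbors(word, index):
    """Yield words reachable by removing or adding one letter, then rearranging."""
    candidates = [word[:i] + word[i + 1:] for i in range(len(word))]
    candidates += [word + ch for ch in ascii_lowercase]
    for cand in candidates:
        yield from index.get(''.join(sorted(cand)), [])

def ladder(start, goal, words):
    index = build_index(words)
    prev = {start: None}  # predecessor of each visited word
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path = []
            while cur is not None:  # walk back to the start
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in neighbors(cur, index):
            if nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    return None  # goal not reachable

words = ['croissant', 'arsonist', 'aroints', 'notaries', 'baritones', 'baritone']
print(ladder('croissant', 'baritone', words))
# ['croissant', 'arsonist', 'aroints', 'notaries', 'baritones', 'baritone']
```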

different result from recursive and dynamic programming

I am working on the problem below.
Problem:
Given an m * n grid, where one is allowed to move only up or right, find all the different paths between two grid points.
I wrote a recursive version and a dynamic programming version, but they return different results. Any thoughts on what is wrong?
Source code:
from collections import defaultdict
def move_up_right(remaining_right, remaining_up, prefix, result):
    if remaining_up == 0 and remaining_right == 0:
        result.append(''.join(prefix[:]))
        return
    if remaining_right > 0:
        prefix.append('r')
        move_up_right(remaining_right-1, remaining_up, prefix, result)
        prefix.pop(-1)
    if remaining_up > 0:
        prefix.append('u')
        move_up_right(remaining_right, remaining_up-1, prefix, result)
        prefix.pop(-1)
def move_up_right_v2(remaining_right, remaining_up):
    # key is a tuple (given remaining_right, given remaining_up),
    # value is solutions in terms of list
    dp = defaultdict(list)
    dp[(0,1)].append('u')
    dp[(1,0)].append('r')
    for right in range(1, remaining_right+1):
        for up in range(1, remaining_up+1):
            for s in dp[(right-1,up)]:
                dp[(right,up)].append(s+'r')
            for s in dp[(right,up-1)]:
                dp[(right,up)].append(s+'u')
    return dp[(right, up)]
if __name__ == "__main__":
    result = []
    move_up_right(2,3,[],result)
    print result
    print '============'
    print move_up_right_v2(2,3)
In version 2 you should start your for loops at 0, not at 1. By starting at 1 you miss the paths that begin by traversing the bottom row or the leftmost column.
Change version 2 to:
def move_up_right_v2(remaining_right, remaining_up):
    # key is a tuple (given remaining_right, given remaining_up),
    # value is solutions in terms of list
    dp = defaultdict(list)
    dp[(0,1)].append('u')
    dp[(1,0)].append('r')
    for right in range(0, remaining_right+1):
        for up in range(0, remaining_up+1):
            for s in dp[(right-1,up)]:
                dp[(right,up)].append(s+'r')
            for s in dp[(right,up-1)]:
                dp[(right,up)].append(s+'u')
    return dp[(right, up)]
And then:
result = []
move_up_right(2,3,[],result)
set(move_up_right_v2(2,3)) == set(result)
True
And just for fun... another way to do it:
from itertools import permutations
list(map(''.join, set(permutations('r'*2+'u'*3, 5))))
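The permutation trick works because every path is just an arrangement of the 'r' and 'u' moves, so the number of paths is the binomial coefficient C(right + up, right), which is 10 for the 2 * 3 case. The sketch below (function names are hypothetical) checks that count and adds a variant that picks the positions of the 'r' moves with itertools.combinations, which produces each path exactly once instead of generating all (right + up)! permutations and deduplicating them.

```python
from itertools import combinations, permutations
from math import comb  # Python 3.8+

def paths_by_permutation(right, up):
    # every path is an arrangement of `right` 'r' moves and `up` 'u' moves;
    # the set removes duplicates caused by swapping identical letters,
    # but permutations still generates (right + up)! tuples first
    return {''.join(p) for p in permutations('r' * right + 'u' * up)}

def paths_by_positions(right, up):
    # choose which of the right + up steps are 'r'; the rest are 'u'
    n = right + up
    paths = []
    for r_steps in combinations(range(n), right):
        chosen = set(r_steps)
        paths.append(''.join('r' if i in chosen else 'u' for i in range(n)))
    return paths

print(len(paths_by_permutation(2, 3)), len(paths_by_positions(2, 3)), comb(5, 2))  # 10 10 10
```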
The problem with the dynamic programming version is that it doesn't account for paths that start with more than one move up ('uu...') or more than one move right ('rr...').
Before executing the main loop you need to fill dp[(x,0)] for every x from 1 to remaining_right and dp[(0,y)] for every y from 1 to remaining_up.
In other words, replace this:
dp[(0,1)].append('u')
dp[(1,0)].append('r')
with this:
for right in range(1, remaining_right+1):
    dp[(right,0)].append('r'*right)
for up in range(1, remaining_up+1):
    dp[(0,up)].append('u'*up)

How to NOT repeat a random.choice when printing a piece of text in python

I have three lists in text files, and I am trying to generate a four-word message at random, choosing from the prefix and suprafix lists for the first three words and from the suffix file for the fourth word.
However, I want to prevent it from picking a word that random.choice has already chosen.
import random
a= random.random
prefix = open('prefix.txt','r').readlines()
suprafix = open('suprafix.txt','r').readlines()
suffix = open('suffix.txt','r').readlines()
print (random.choice(prefix + suprafix), random.choice(prefix + suprafix), random.choice(prefix + suprafix), random.choice(suffix))
As you can see, it chooses the first three words at random from the combined prefix and suprafix lists.
random.sample(pop, k) selects k items from pop without replacement. Hence:
prefix1, prefix2, prefix3 = random.sample(prefix, 3)
suprafix1, suprafix2, suprafix3 = random.sample(suprafix, 3)
suffix = random.choice(suffix)
print (prefix1 + suprafix1, prefix2 + suprafix2, prefix3 + suprafix3, suffix)
Thank you xnx, that helped me sort out the problem: I used random.sample first and then printed one word from each pair afterwards. I might have done it the long way round, but this is how I did it:
import random
a= random.random
prefix = open('prefix.txt','r').readlines()
suprafix = open('suprafix.txt','r').readlines()
suffix = open('suffix.txt','r').readlines()
prefix1, prefix2, prefix3 = random.sample(prefix, 3)
suprafix1, suprafix2, suprafix3 = random.sample(suprafix, 3)
suffix = random.choice(suffix)
one = prefix1, suprafix1
two = prefix2, suprafix2
three = prefix3, suprafix3
print (random.choice(one), random.choice(two), random.choice(three), suffix)
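If the goal is exactly what the question asks, three distinct words drawn from the combined prefix and suprafix pool, one random.sample call on the pooled list is enough. A sketch, assuming the files have already been read into lists; the word lists below are hypothetical stand-ins for the .txt files, and make_message is a hypothetical name. Pooling through a set means a word listed in both files cannot be picked twice.

```python
import random

def make_message(prefix, suprafix, suffix):
    # pool the first two lists; the set drops words that appear in both,
    # so the three picks are distinct by value, not just by position
    pool = sorted(set(prefix) | set(suprafix))
    first_three = random.sample(pool, 3)  # sampling without replacement
    return first_three + [random.choice(suffix)]

# hypothetical word lists standing in for prefix.txt, suprafix.txt and suffix.txt
prefix = ['alpha', 'beta', 'gamma']
suprafix = ['delta', 'beta', 'epsilon']
suffix = ['omega', 'zeta']
print(' '.join(make_message(prefix, suprafix, suffix)))
```

When reading the real files, [line.strip() for line in f] instead of readlines() avoids carrying the trailing newlines into the message.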

I need a Python function that will output a random string of 4 different characters when given the desired probabilities of the characters

For example,
The function could be something like def RandABCD(n, .25, .34, .25, .25):
Where n is the length of the string to be generated and the following numbers are the desired probabilities of A, B, C, D.
I would imagine this is quite simple; however, I am having trouble creating a working program. Any help would be greatly appreciated.
Here's the code to select a single weighted value. You should be able to take it from here. It uses bisect and random to accomplish the work.
from bisect import bisect
from random import random
def WeightedABCD(*weights):
    chars = 'ABCD'
    breakpoints = [sum(weights[:x+1]) for x in range(4)]
    return chars[bisect(breakpoints, random())]
Call it like this: WeightedABCD(.25, .34, .25, .25).
EDIT: Here is a version that works even if the weights don't add up to 1.0:
from bisect import bisect_left
from random import uniform
def WeightedABCD(*weights):
    chars = 'ABCD'
    breakpoints = [sum(weights[:x+1]) for x in range(4)]
    return chars[bisect_left(breakpoints, uniform(0.0,breakpoints[-1]))]
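On Python 3.6 and later, the standard library covers this directly: random.choices(population, weights=..., k=...) draws k items independently with replacement, and the weights are relative, so they need not sum to 1 (the question's example .25 + .34 + .25 + .25 sums to 1.09). This is a swapped-in technique, not part of the answer above; rand_abcd is a hypothetical name.

```python
import random

def rand_abcd(n, pA, pB, pC, pD):
    # draw n characters independently; weights are relative, not required to sum to 1
    return ''.join(random.choices('ABCD', weights=[pA, pB, pC, pD], k=n))

print(rand_abcd(20, .25, .34, .25, .25))
```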
The random module is quite powerful in Python. You can generate a list with the desired characters at the appropriate weights and then use random.choice to obtain a selection.
First, make sure you import random.
For example, let's say you wanted a uniformly random string of A, B, C, or D.
1. Generate a list with the characters
li = ['A','B','C','D']
Then obtain values from it using random.choice
output = "".join([random.choice(li) for i in range(0, n)])
You could easily make that a function with n as a parameter.
In the above case, you have an equal chance of getting A,B,C, or D.
You can use duplicate entries in the list to give characters higher probabilities. For example, say you wanted a 50% chance of A and a 25% chance each of B and C. You could have a list like this:
li = ['A','A','B','C']
And so on.
It would not be hard to parameterize the characters with their desired weights; to model that, I'd use a dictionary.
characterbasis = {'A':25, 'B':25, 'C':25, 'D':25}
Make that the first parameter and the length of the string the second, and use the above code to generate your string.
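A sketch of that dictionary-based idea, assuming integer weights, since the list-of-duplicates trick cannot represent arbitrary real-valued probabilities exactly; rand_string_from_counts is a hypothetical name.

```python
import random

def rand_string_from_counts(characterbasis, n):
    # expand e.g. {'A': 25, 'B': 25, ...} into a list where each character
    # appears as many times as its integer weight
    pool = [ch for ch, count in characterbasis.items() for _ in range(count)]
    # draw each of the n characters independently and uniformly from that list
    return ''.join(random.choice(pool) for _ in range(n))

characterbasis = {'A': 25, 'B': 25, 'C': 25, 'D': 25}
print(rand_string_from_counts(characterbasis, 10))
```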
For four letters, here's something quick off the top of my head:
from random import random
def randABCD(n, pA, pB, pC, pD):
    # assumes pA + pB + pC + pD == 1
    cA = pA
    cB = cA + pB
    cC = cB + pC
    def choose():
        r = random()
        if r < cA:
            return 'A'
        elif r < cB:
            return 'B'
        elif r < cC:
            return 'C'
        else:
            return 'D'
    return ''.join([choose() for i in xrange(n)])
I have no doubt that this can be made much cleaner/shorter, I'm just in a bit of a hurry right now.
The reason I wouldn't be content with David in Dakota's answer of using a list of duplicate characters is that depending on your probabilities, it may not be possible to create a list with duplicates in the right numbers to simulate the probabilities you want. (Well, I guess it might always be possible, but you might wind up needing a huge list - what if your probabilities were 0.11235442079, 0.4072777384, 0.2297927874, 0.25057505341?)
EDIT: here's a much cleaner generic version that works with any number of letters with any weights:
from bisect import bisect
from random import uniform
def rand_string(n, content):
    ''' Creates a string of letters (or substrings) chosen independently
    with specified probabilities. content is a dictionary mapping
    a substring to its "weight" which is proportional to its probability,
    and n is the desired number of elements in the string.
    This does not assume the sum of the weights is 1.'''
    l, cdf = zip(*[(l, w) for l, w in content.iteritems()])
    cdf = list(cdf)
    for i in xrange(1, len(cdf)):
        cdf[i] += cdf[i - 1]
    return ''.join([l[bisect(cdf, uniform(0, cdf[-1]))] for i in xrange(n)])
Here is a rough idea of what might suit you:
import random as r
def distributed_choice(probs):
    rnd = r.random()  # a distinct name, so the module alias r is not shadowed
    cum = 0.0
    for pair in probs:
        if (rnd < cum + pair[1]):
            return pair[0]
        cum += pair[1]
The parameter probs takes a list of pairs of the form (object, probability). It is assumed that the probabilities sum to 1 (otherwise, it's trivial to normalize).
To use it, just execute (calling distributed_choice once per character, so each one is drawn independently):
''.join([distributed_choice(probs) for _ in range(4)])
Hmm, something like:
import random
class RandomDistribution:
    def __init__(self, kv):
        self.entries = kv.keys()
        self.where = []
        cnt = 0
        for x in self.entries:
            self.where.append(cnt)
            cnt += kv[x]
        self.where.append(cnt)
    def find(self, key):
        l, r = 0, len(self.where)-1
        while l+1 < r:
            m = (l+r)/2
            if self.where[m] <= key:
                l=m
            else:
                r=m
        return self.entries[l]
    def randomselect(self):
        return self.find(random.random()*self.where[-1])
rd = RandomDistribution( {"foo": 5.5, "bar": 3.14, "baz": 2.8 } )
for x in range(1000):
    print rd.randomselect()
should get you most of the way...
Thank you all for your help; I was able to figure something out, mostly with this info.
For my particular need, I did something like this:
import random
#Create a function to randomize a given string
def makerandom(seq):
    return ''.join(random.sample(seq, len(seq)))
def randomDNA(n, probA=0.25, probC=0.25, probG=0.25, probT=0.25):
    notrandom=''
    A=int(n*probA)
    C=int(n*probC)
    T=int(n*probT)
    G=int(n*probG)
    #The remainder part here is used to make sure all n are used, as one cannot
    #have half an A for example.
    remainder=''
    for i in range(0, n-(A+G+C+T)):
        remainder+=random.choice("ATGC")
    notrandom=notrandom+ 'A'*A+ 'C'*C+ 'G'*G+ 'T'*T + remainder
    return makerandom(notrandom)
