For Loop efficiency when using enumerate or other functions - python

Someone suggested replacing my:
for m in hazardflr:
    safetiles.append((m, step))
i = 0
with a more reasonable approach such as:
for i, m in enumerate(hazardflr):
    safetiles.append((m, step))
I see now how this saves lines of code and says the same thing. I didn't know about the enumerate() function. My question now is whether there are any other modifications I can make so that this code is more efficient and shorter:
def missingDoor(trapdoor, roomwidth, roomheight, step):
    safezone = []
    hazardflr = givenSteps(roomwidth, step, True)
    safetiles = []
    for i, m in enumerate(hazardflr):
        safetiles.append((m,step))
    while i < len(safetiles):
        nextSafe = safetiles[i]
        if knownSafe(roomwidth, roomheight, nextSafe[0], nextSafe[1]):
            if trapdoor[nextSafe[0]/roomwidth][nextSafe[0]%roomwidth] is "0":
                if nextSafe[0] not in safezone:
                    safezone.append(nextSafe[0])
                for e in givenSteps(roomwidth, nextSafe[0], True):
                    if knownSafe(roomwidth, roomheight, e, nextSafe[0]):
                        if trapdoor[e/roomwidth][e%roomwidth] is "0" and (e,nextSafe[0]) not in safetiles:
                            safetiles.append((e,nextSafe[0]))
        i += 1
    return sorted(safezone)

Assign nextSafe[0] to a local variable
Your code uses the expression nextSafe[0] nine times (if I count correctly).
Indexing into a list or tuple is more expensive than reading the value from a local variable.
Modification as follows:
for i,m in enumerate(hazardflr):
    safetiles.append((m,step))
while i < len(safetiles):
    nextSafe = safetiles[i]
    ns0 = nextSafe[0]
    if knownSafe(roomwidth, roomheight, ns0, nextSafe[1]):
        # == compares values; `is` only tests object identity
        # // keeps the index an integer on Python 3 as well
        if trapdoor[ns0 // roomwidth][ns0 % roomwidth] == "0":
            if ns0 not in safezone:
                safezone.append(ns0)
            for e in givenSteps(roomwidth,ns0,True):
                if knownSafe(roomwidth, roomheight, e, ns0):
                    if trapdoor[e // roomwidth][e % roomwidth] == "0" and (e, ns0) not in safetiles:
                        safetiles.append((e, ns0))
    i += 1
could speed it up a bit.
Turn safezone into a set
When list_var is a list, the test item in list_var scans the list element by element, so it gets slower as the list grows.
If you change the test to item in set_var, the result is known almost immediately regardless of the size of set_var, because a set is backed by a hash table that works like a database index for lookups.
In your code change safezone = [] into safezone = set()
In fact, you can completely skip the membership test in your case:
if ns0 not in safezone:
    safezone.append(ns0)
can be turned into:
safezone.add(ns0)
as set will take care of keeping only unique items.
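The same reasoning applies to the (e, ns0) not in safetiles test, which also scans a list. Here is a minimal sketch of one way around it, assuming a generic worklist loop rather than your actual givenSteps/knownSafe helpers (reachable and neighbours are hypothetical names, not from your code): keep the list for ordering and a parallel set for membership tests.

```python
def reachable(start, neighbours):
    """Visit every node reachable from start, each exactly once.

    `neighbours` is a hypothetical callable returning the nodes adjacent
    to a given node; it stands in for givenSteps/knownSafe in the question.
    """
    worklist = [start]   # ordered: nodes still to process
    queued = {start}     # same nodes, kept in a set for fast membership tests
    i = 0
    while i < len(worklist):
        node = worklist[i]
        for nxt in neighbours(node):
            if nxt not in queued:   # hash lookup instead of a list scan
                queued.add(nxt)
                worklist.append(nxt)
        i += 1
    return sorted(queued)


graph = {1: [2, 3], 2: [1], 3: [4], 4: []}
result = reachable(1, graph.__getitem__)
print(result)
```

The list still drives the while loop exactly as in your code; the set only answers the "have I queued this already?" question.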

Related

Turn python code into a generator function

How can I turn this code into a generator function? Or can I do it another way that avoids reading all the data into memory?
The problem right now is that my memory gets full. I get KILLED after a long time when executing the code.
Code:
data = [3,4,3,1,2]
def convert(data):
    for index in range(len(data)):
        if data[index] == 0:
            data[index] = 6
            data.append(8)
        elif data[index] == 1:
            data[index] = 0
        elif data[index] == 2:
            data[index] = 1
        elif data[index] == 3:
            data[index] = 2
        elif data[index] == 4:
            data[index] = 3
        elif data[index] == 5:
            data[index] = 4
        elif data[index] == 6:
            data[index] = 5
        elif data[index] == 7:
            data[index] = 6
        elif data[index] == 8:
            data[index] = 7
    return data
for i in range(256):
    output = convert(data)
    print(len(output))
Output:
266396864
290566743
316430103
346477329
376199930
412595447
447983143
490587171
534155549
582826967
637044072
692630033
759072776
824183073
903182618
982138692
1073414138
1171199621
1275457000
1396116848
1516813106
Killed
To answer the question: to turn a function into a generator function, all you have to do is yield something. You might do it like this:
def convert(data):
    while True:  # keep producing a new state each time the generator is advanced
        for index in range(len(data)):
            ...
        yield data
Then, you can iterate over the output like this:
iter_converted_datas = convert(data)
for _, converted in zip(range(256), iter_converted_datas):
    print(len(converted))
I also would suggest some improvements to this code. The first thing that jumps out at me, is to get rid of all those elif statements.
One helpful thing for this might be to supply a dictionary argument to your generator function that tells it how to convert the data values (the first one is a special case since it also appends).
Here is what that dict might look like:
replacement_dict = {
0: 6,
1: 0,
2: 1,
3: 2,
4: 3,
5: 4,
6: 5,
7: 6,
8: 7,
}
By the way: replacing a series of elif statements with a dictionary is a pretty typical thing to do in python. It isn't always appropriate, but it often works well.
Now you can write your generator like this:
def convert(data, replacement_dict):
    while True:
        for index in range(len(data)):
            if data[index] == 0:
                # the special case: a zero also appends a new 8
                data.append(8)
            data[index] = replacement_dict[data[index]]
        yield data
And use it like this:
iter_converted_datas = convert(data, replacement_dict)
for _, converted in zip(range(256), iter_converted_datas):
    print(len(converted))
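Putting the pieces together, here is a small self-contained sketch of the dictionary-driven generator. The names convert_forever and REPLACEMENTS are mine, and I only run 18 iterations so the list stays tiny; this still stores the whole list, so it does not fix the memory problem.

```python
from itertools import islice

# Same mapping as the replacement dict: 0 wraps around to 6, everything else counts down.
REPLACEMENTS = {0: 6, 1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 7}


def convert_forever(data, replacement_dict):
    """Yield the (mutated) list after each conversion pass, indefinitely."""
    data = list(data)  # work on a copy so the caller's list is untouched
    while True:
        # range(len(data)) is fixed before the pass, so 8s appended
        # during this pass are not converted until the next one
        for index in range(len(data)):
            if data[index] == 0:
                data.append(8)
            data[index] = replacement_dict[data[index]]
        yield data


# islice stops the otherwise infinite generator after 18 passes
lengths = [len(state) for state in islice(convert_forever([3, 4, 3, 1, 2], REPLACEMENTS), 18)]
print(lengths[-1])
```

Note that the generator yields the same list object every time, so take what you need from it (here, the length) before advancing to the next state.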
But we haven't yet addressed the underlying memory problem.
For that, we need to step back a second: the reason your memory is filling up is you have created a routine that grows very large very fast. And if you were to keep going beyond 256 iterations, the list would get longer without end.
If you want to compute the Xth output for some member of the list without storing the entire list into memory, you have to change things around quite a bit.
My suggestion on how you might get started: create a function to get the Xth iteration for any starting input value.
Here is a generator that just produces outputs based on the replacement dict. Depending on the contents of the replacement dict, this could be infinite, or it might have an end (in which case it would raise a KeyError). In your case, it is infinite.
def process_replacements(value, replacement_dict):
    while True:
        yield (value := replacement_dict[value])
Next we can write our function to process the Xth iteration for a starting value:
def process_xth(value, xth, replacement_dict):
    # emit the xth value from the original value
    for _, value in zip(range(xth), process_replacements(value, replacement_dict)):
        pass
    return value
Now you can process the Xth iteration for any value in your starting data list:
index = 0
xth = 256
process_xth(data[index], xth, replacement_dict)
However, we have not appended 8 to the data list anytime we encounter the 0 value. We could do this, but as you have discovered, eventually the list of 8s will get too big. Instead, what we need to do is keep COUNT of how many 8s we have added to the end.
So I suggest adding a zero_tracker function to increment the count:
def zero_tracker():
    global eights_count
    eights_count += 1
Now you can call that function in the generator every time a zero is encountered, resetting the global eights_count to zero at the start of the iteration:
def process_replacements(value, replacement_dict):
    global eights_count
    eights_count = 0
    while True:
        if value == 0:
            zero_tracker()
        yield (value := replacement_dict[value])
Now, for any Xth iteration you perform at some point in the list, you can know how many 8s were appended at the end, and when they were added.
But unfortunately simply counting the 8s isn't enough to get the final sequence; you also have to keep track of WHEN (ie, which iteration) they were added to the sequence, so you can know how deeply to iterate them. You could store this in memory pretty efficiently by keeping track of each iteration in a dictionary; that dictionary would look like this:
eights_dict = {
# iteration: count of 8s
}
And of course you can also calculate what each of these 8s will become at any arbitrary depth:
depth = 1
process_xth(8, depth, replacement_dict)
Once you know how many 8s there are added for every iteration given some finite number of Xth iterations, you can construct the final sequence by just yielding the correct value the right number of times over and over again, in a generator, without storing anything. I leave it to you to figure out how to construct your eights_dict and do this final part. :)
Here are a few things you can do to optimize it:
Instead of range(len(data)) you can use enumerate(data). This gives you access to both the element and its index. Example:
for index, element in enumerate(data):
    if element == 0:
        data[index] = 6
EDIT: According to this post, range is faster than enumerate. If you care about speed, you could ignore this change.
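Secondly, most of the if statements follow a predictable pattern: every value except 0 is simply decremented. So you can rewrite them like this:
def convert(data):
    new_eights = 0
    for idx, elem in enumerate(data):
        if elem == 0:
            data[idx] = 6
            new_eights += 1
        elif elem <= 8:
            data[idx] = elem - 1
    # append after the loop so enumerate never visits the new 8s in this pass
    data.extend([8] * new_eights)
Since lists are mutable, you don't need to return data; the function modifies it in place.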
I see that you ask about generator functions, but that won't solve your memory issues. You run out of memory because, well, you keep everything in memory...
The memory complexity of your solution is O*((8/7)^n), where n is the number of calls to convert. This is because every time you call convert(), the data structure grows by 1/7 of its elements (on average), since every number in your structure has (roughly) a 1/7 probability of being zero.
So the memory complexity is O*((8/7)^n), hence exponential. But can we do better?
Yes, we can (assuming that the conversion function remains this "nice and predictable"). We can keep in memory just the number of zeros that were present in the structure each time we called convert(). That way, we get linear memory complexity, O*(n). Does that come with a cost?
Yes. Element access no longer has constant complexity O(1); it has linear complexity O(n), where n is the number of calls to convert() (at least that's what I came up with).
But it resolves the out-of-memory issue.
I also assumed that there would be a need to iterate over the computed list. If you are only interested in the length, it is sufficient to keep a count of how many times each value occurs and work with those counts. That way you would use just a few integers of memory.
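For the length-only case, a minimal sketch could look like this. It swaps the list for a collections.Counter keyed by value; the function name population_after is my own, not something from the question.

```python
from collections import Counter


def population_after(seed, iterations):
    """Return the list length after `iterations` conversion passes,
    tracking only how many elements currently hold each value 0..8."""
    counts = Counter(seed)
    for _ in range(iterations):
        zeros = counts[0]
        # every non-zero value counts down by one
        counts = Counter({value - 1: n for value, n in counts.items() if value > 0})
        counts[6] += zeros   # each zero resets to 6 ...
        counts[8] += zeros   # ... and appends one new 8
    return sum(counts.values())


print(population_after([3, 4, 3, 1, 2], 256))
```

Memory use stays at nine counters no matter how many iterations you run, at the price of no longer knowing the order of the elements.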
Here is the code for the version that still supports indexing and iteration:
from copy import deepcopy # to keep original list untouched ;)
class Data:
    def __init__(self, seed):
        self.seed = deepcopy(seed)
        self.iteration = 0
        self.zero_counts = list()
        self.len = len(seed)
    def __len__(self):
        return self.len
    def __iter__(self):
        return DataIterator(self)
    def __repr__(self):
        """not necessary for a solution, but helps with debugging"""
        return "[" + (", ".join(f"{n}" for n in self)) + "]"
    def __getitem__(self, index: int):
        if index >= self.len:
            raise IndexError
        if index < len(self.seed):
            ret = self.seed[index] - self.iteration
        else:
            inner_it_idx = index - len(self.seed)
            for i, cnt in enumerate(self.zero_counts):
                if inner_it_idx < cnt:
                    ret = 9 + i - self.iteration
                    break
                else:
                    inner_it_idx -= cnt
        ret = ret if ret > 6 else ret % 7
        return ret
    def convert(self):
        zero_count = sum((self[i] == 0) for i, _ in enumerate(self.seed))
        for i, count in enumerate(self.zero_counts):
            i = 9 + i - self.iteration
            i = i if i > 6 else i % 7
            if i == 0:
                zero_count += count
        self.zero_counts.append(zero_count)
        self.len += self.zero_counts[self.iteration]
        self.iteration += 1
class DataIterator:
    """Iterator class for the Data class"""
    def __init__(self, seed_data):
        self.seed_data = seed_data
        self.index = 0
    def __next__(self):
        if self.index >= self.seed_data.len:
            raise StopIteration
        ret = self.seed_data[self.index]
        self.index += 1
        return ret
Here is code that tests logical equality against your original convert() and prints the required output:
original_data = [3,4,3,1,2]
data = deepcopy(original_data)
d = Data(data)
for _ in range(30):
    output = convert(data)
    d.convert()
    print("---------------------------------------")
    print(len(output))
    assert len(output) == len(d)
    for i, e in enumerate(output):
        assert e == d[i]
data = deepcopy(original_data)
d = Data(data)
for _ in range(256):
    d.convert()
    print(len(d))
Results after your program crashed are:
1516813106
1662255394 <<< Killed here
1806321765
1976596756
2153338313
2348871138
2567316469
2792270106
3058372242
3323134871
3638852150
3959660078
4325467894
4720654782
5141141244
5625688711
6115404977
6697224392
7282794949
7964320044
8680314860
9466609138
10346343493
11256546221
12322913103
13398199926
14661544436
15963109809
17430929182
19026658353
20723155359
22669256596
24654746147
26984457539

Somewhere inside my loop it's not appending results to a list. Why?

So I have two files/dictionaries I want to compare, using a binary search implementation (yes, this is very obviously homework).
One file is
american-english
Amazon
Americana
Americanization
Civilization
And the other file is
british-english
Amazon
Americana
Americanisation
Civilisation
The code below should be pretty straight forward. Import files, compare them, return differences. However, somewhere near the bottom, where it says entry == found_difference: I feel as if the debugger skips right over, even though I can see the two variables in memory being different, and I only get the final element returned in the end. Where am I going wrong?
# File importer
def wordfile_to_list(filename):
    """Converts a list of words to a Python list"""
    wordlist = []
    with open(filename) as f:
        for line in f:
            wordlist.append(line.rstrip("\n"))
    return wordlist
# Binary search algorithm
def binary_search(sorted_list, element):
    """Search for element in list using binary search. Assumes sorted list"""
    matches = []
    index_start = 0
    index_end = len(sorted_list)
    while (index_end - index_start) > 0:
        index_current = (index_end - index_start) // 2 + index_start
        if element == sorted_list[index_current]:
            return True
        elif element < sorted_list[index_current]:
            index_end = index_current
        elif element > sorted_list[index_current]:
            index_start = index_current + 1
        return element
# Check file differences using the binary search algorithm
def wordfile_differences_binarysearch(file_1, file_2):
    """Finds the differences between two plaintext lists,
    using binary search algorithm, and returns them in a new list"""
    wordlist_1 = wordfile_to_list(file_1)
    wordlist_2 = wordfile_to_list(file_2)
    matches = []
    for entry in wordlist_1:
        found_difference = binary_search(sorted_list=wordlist_2, element=entry)
        if entry == found_difference:
            pass
    else:
        matches.append(found_difference)
    return matches
# Check if it works
differences = wordfile_differences_binarysearch(file_1="british-english", file_2="american-english")
print(differences)
You don't have an else suite for your if statement. Your if statement does nothing (it uses pass when the test is true, skipped otherwise).
You do have an else suite for the for loop:
for entry in wordlist_1:
    # ...
else:
    matches.append(found_difference)
A for loop can have an else suite as well; it is executed when a loop completes without a break statement. So when your for loop completes, the current value for found_difference is appended; so whatever was assigned last to that name.
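If the for ... else behaviour is new to you, here is a tiny standalone illustration (first_negative is just a made-up example function, unrelated to your word lists):

```python
def first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            result = n
            break            # leaving via break skips the else suite
    else:
        result = None        # runs only when the loop finished without break
    return result


print(first_negative([1, -2, 3]))
print(first_negative([1, 2, 3]))
```

In your code there is no break at all, so the else suite always runs, exactly once, after the last word has been processed.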
Fix your indentation if the else suite was meant to be part of the if test:
for entry in wordlist_1:
    found_difference = binary_search(sorted_list=wordlist_2, element=entry)
    if entry == found_difference:
        pass
    else:
        matches.append(found_difference)
However, you shouldn't use a pass statement there, just invert the test:
matches = []
for entry in wordlist_1:
    found_difference = binary_search(sorted_list=wordlist_2, element=entry)
    if entry != found_difference:
        matches.append(found_difference)
Note that the variable name matches feels off here; you are appending words that are missing in the other list, not words that match. Perhaps missing is a better variable name here.
Note that your binary_search() function always returns element, the word you searched on. That'll always be equal to the element you passed in, so you can't use that to detect if a word differed! You need to unindent that last return line and return False instead:
def binary_search(sorted_list, element):
    """Search for element in list using binary search. Assumes sorted list"""
    matches = []
    index_start = 0
    index_end = len(sorted_list)
    while (index_end - index_start) > 0:
        index_current = (index_end - index_start) // 2 + index_start
        if element == sorted_list[index_current]:
            return True
        elif element < sorted_list[index_current]:
            index_end = index_current
        elif element > sorted_list[index_current]:
            index_start = index_current + 1
    return False
Now you can use a list comprehension in your wordfile_differences_binarysearch() loop:
[entry for entry in wordlist_1 if not binary_search(wordlist_2, entry)]
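Last but not least, you don't have to re-invent the binary search wheel, just use the bisect module:
from bisect import bisect_left
def binary_search(sorted_list, element):
    index = bisect_left(sorted_list, element)
    # bisect_left returns len(sorted_list) when element sorts after every entry
    return index < len(sorted_list) and sorted_list[index] == element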
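As a rough end-to-end sketch with the sample words from the question (the helper names contains and missing_words are mine, and the lists are typed inline here instead of being read from your files):

```python
from bisect import bisect_left


def contains(sorted_list, element):
    """True if element occurs in sorted_list (which must already be sorted)."""
    index = bisect_left(sorted_list, element)
    return index < len(sorted_list) and sorted_list[index] == element


def missing_words(words, sorted_reference):
    """Words from `words` that do not occur in `sorted_reference`."""
    return [word for word in words if not contains(sorted_reference, word)]


american = ["Amazon", "Americana", "Americanization", "Civilization"]
british = ["Amazon", "Americana", "Americanisation", "Civilisation"]

only_british = missing_words(british, american)
only_american = missing_words(american, british)
print(only_british)
print(only_american)
```

This relies on both word files being sorted, just as your own binary search does.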
With sets
Binary search is used to improve efficiency of an algorithm, and decrease complexity from O(n) to O(log n).
Since the naive approach would be to check every word in wordlist1 for every word in wordlist2, the complexity would be O(n**2).
Using binary search would help to get O(n * log n), which is already much better.
Using sets, you could get O(n):
american = """Amazon
Americana
Americanization
Civilization"""
british = """Amazon
Americana
Americanisation
Civilisation"""
american = {line.strip() for line in american.split("\n")}
british = {line.strip() for line in british.split("\n")}
You could get the american words not present in the british dictionary:
print(american - british)
# {'Civilization', 'Americanization'}
You could get the british words not present in the american dictionary:
print(british - american)
# {'Civilisation', 'Americanisation'}
You could get the union of the two last sets. I.e. words that are present in exactly one dictionary:
print(american ^ british)
# {'Americanisation', 'Civilisation', 'Americanization', 'Civilization'}
This approach is faster and more concise than any binary search implementation. But if you really want to use binary search, as usual, you cannot go wrong with @MartijnPieters' answer.
With two iterators
Since you know the two lists are sorted, you could simply iterate in parallel over the two sorted lists and look for any difference:
american = """Amazon
Americana
Americanism
Americanization
Civilization"""
british = """Amazon
Americana
Americanisation
Americanism
Civilisation"""
american = [line.strip() for line in american.split("\n")]
british = [line.strip() for line in british.split("\n")]
n1, n2 = len(american), len(british)
i, j = 0, 0
while True:
    try:
        w1 = american[i]
        w2 = british[j]
        if w1 == w2:
            i += 1
            j += 1
        elif w1 < w2:
            print('%s is in american dict only' % w1)
            i += 1
        else:
            print('%s is in british dict only' % w2)
            j += 1
    except IndexError:
        break
for w1 in american[i:]:
    print('%s is in american dict only' % w1)
for w2 in british[j:]:
    print('%s is in british dict only' % w2)
It outputs:
Americanisation is in british dict only
Americanization is in american dict only
Civilisation is in british dict only
Civilization is in american dict only
It's O(n) as well.

More on dynamic programming

Two weeks ago I posted THIS question here about dynamic programming. User Andrea Corbellini answered precisely what I wanted, but I wanted to take the problem one more step further.
This is my function
def Opt(n):
    if len(n) == 1:
        return 0
    else:
        return sum(n) + min(Opt(n[:i]) + Opt(n[i:])
                            for i in range(1, len(n)))
Let's say you would call
Opt( [ 1,2,3,4,5 ] )
The previous question solved the problem of computing the optimal value. Now, instead of computing the optimum value 33 for the above example, I want to print how we got to the optimal solution (the path to it). So, I want to print, as a list, the indices where the list was cut/divided to reach the optimal solution. The answer to the above example would be:
[ 3,2,1,4 ] ( Cut the pole/list at third marker/index, then after second index, then after first index and lastly at fourth index).
That is the answer should be in the form of a list. The first element of the list will be the index where the first cut/division of the list should happen in the optimal path. The second element will be the second cut/division of the list and so on.
There can also be a different solution:
[ 3,4,2,1 ]
They both would still lead you to the correct output. So, it doesn't matter which one you printed. But, I have no idea how to trace and print the optimal path taken by the Dynamic Programming solution.
By the way, I figured out a non-recursive solution to the problem that was solved in my previous question. But I still can't figure out how to print the path for the optimal solution. Here is the non-recursive code for the previous question; it might be helpful for solving the current problem.
def Opt(numbers):
    prefix = [0]
    for i in range(1,len(numbers)+1):
        prefix.append(prefix[i-1]+numbers[i-1])
    results = [[]]
    for i in range(0,len(numbers)):
        results[0].append(0)
    for i in range(1,len(numbers)):
        results.append([])
        for j in range(0,len(numbers)):
            results[i].append([])
    for i in range(2,len(numbers)+1): # for all lengths (off by 1)
        for j in range(0,len(numbers)-i+1): # for all beginnings
            results[i-1][j] = results[0][j]+results[i-2][j+1]+prefix[j+i]-prefix[j]
            for k in range(1,i-1): # for all splits
                if results[k][j]+results[i-2-k][j+k+1]+prefix[j+i]-prefix[j] < results[i-1][j]:
                    results[i-1][j] = results[k][j]+results[i-2-k][j+k+1]+prefix[j+i]-prefix[j]
    return results[len(numbers)-1][0]
Here is one way of printing the selected splits:
I used the recursive solution with memoization provided by @Andrea Corbellini in your previous question. It is shown below:
cache = {}
def Opt(n):
    # tuple objects are hashable and can be put in the cache.
    n = tuple(n)
    if n in cache:
        return cache[n]
    if len(n) == 1:
        result = 0
    else:
        result = sum(n) + min(Opt(n[:i]) + Opt(n[i:])
                              for i in range(1, len(n)))
    cache[n] = result
    return result
Now, we have the cache values for all the tuples including the selected ones.
Using this, we can print the selected tuples as shown below:
import math
selectedList = []
def printSelected(n, low):
    if len(n) == 1:
        # No need to print because it's
        # already printed at previous recursion level.
        return
    minVal = math.inf
    minTupleLeft = ()
    minTupleRight = ()
    splitI = 0
    for i in range(1, len(n)):
        tuple1ToI = tuple(n[:i])
        tupleiToN = tuple(n[i:])
        if (cache[tuple1ToI] + cache[tupleiToN]) < minVal:
            minVal = cache[tuple1ToI] + cache[tupleiToN]
            minTupleLeft = tuple1ToI
            minTupleRight = tupleiToN
            splitI = low + i
    print(minTupleLeft, minTupleRight, minVal)
    print(splitI)  # OP just wants the split index 'i'.
    selectedList.append(splitI)  # or add to the list as requested by OP
    printSelected(list(minTupleLeft), low)
    printSelected(list(minTupleRight), splitI)
Call Opt(n) first so that the cache is populated, then call the above function as shown below:
printSelected (n, 0)
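If you would rather get the cuts back as a return value than rely on the global cache and selectedList, here is an alternative sketch. It is not the code above: it memoizes on (lo, hi) index pairs with functools.lru_cache, and optimal_cuts and solve are names I made up for the illustration.

```python
from functools import lru_cache


def optimal_cuts(numbers):
    """Return (optimal cost, list of cut positions in the order they are made)."""
    numbers = tuple(numbers)

    @lru_cache(maxsize=None)
    def solve(lo, hi):
        # cost and cut order for the slice numbers[lo:hi]
        if hi - lo == 1:
            return 0, ()
        best = None
        for mid in range(lo + 1, hi):
            left_cost, left_cuts = solve(lo, mid)
            right_cost, right_cuts = solve(mid, hi)
            cost = left_cost + right_cost
            if best is None or cost < best[0]:
                # cut here first, then finish the left part, then the right part
                best = (cost, (mid,) + left_cuts + right_cuts)
        return sum(numbers[lo:hi]) + best[0], best[1]

    cost, cuts = solve(0, len(numbers))
    return cost, list(cuts)


print(optimal_cuts([1, 2, 3, 4, 5]))
```

Because the comparison is strict, ties are resolved in favour of the leftmost split, so only one of the possible optimal cut orders is returned.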

in-place modification of strings within a list?

I'm trying to change some elements of a list based on the properties of previous ones. Because I need to assign an intermediate variable, I don't think this can be done as a list comprehension. The following code, with comment, is what I'm trying to achieve:
for H in header:
    if "lower" in H.lower():
        pref="lower"
    elif "higher" in H.lower():
        pref="higher"
    if header.count(H) > 1:
        # change H inplace
        H = pref+H
The best solution I've come up with is:
for ii,H in enumerate(header):
    if "lower" in H.lower():
        pref="lower"
    elif "higher" in H.lower():
        pref="higher"
    if header.count(H) > 1:
        header[ii] = pref+H
It doesn't quite work, and feels un-pythonic to me because of the indexing. Is there a better way to do this?
Concrete example:
header = ['LowerLevel','Term','J','UpperLevel','Term','J']
desired output:
header = ['LowerLevel','LowerTerm','LowerJ','UpperLevel','UpperTerm','UpperJ']
Note that neither of my solutions work: the former never modifies header at all, the latter only returns
header = ['LowerLevel','LowerTerm','LowerJ','UpperLevel','Term','J']
because count is wrong after the modifications.
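One way to sidestep the changing count, sketched here under the assumption that the section markers are the words "Lower" and "Upper" as in the concrete example (prefix_duplicates is a hypothetical helper name): take a snapshot of the counts with collections.Counter before modifying anything, then assign by index.

```python
from collections import Counter


def prefix_duplicates(header, prefixes=("Lower", "Upper")):
    """Prefix every repeated name with the most recently seen section marker.

    Modifies `header` in place and also returns it for convenience.
    """
    counts = Counter(header)   # snapshot taken before any modification
    current = ""
    for index, name in enumerate(header):
        for prefix in prefixes:
            if prefix.lower() in name.lower():
                current = prefix
        if counts[name] > 1:
            header[index] = current + name
    return header


header = ['LowerLevel', 'Term', 'J', 'UpperLevel', 'Term', 'J']
prefix_duplicates(header)
print(header)
```

Indexing through enumerate is a normal way to modify a list in place; the only real problem in the second attempt was calling header.count() on a list that was being changed at the same time.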
header = ['LowerLevel','Term','J','UpperLevel','Term','J']
prefixes = ['lower', 'upper']
def prefixed(header):
    prefix = ''
    for h in header:
        for p in prefixes:
            if h.lower().startswith(p):
                prefix, h = h[:len(p)], h[len(p):]
        yield prefix + h
print(list(prefixed(header)))
I don't really know that this is better than what you had. It's different...
$ ./lower.py
['LowerLevel', 'LowerTerm', 'LowerJ', 'UpperLevel', 'UpperTerm', 'UpperJ']
Something like this, using a generator function:
In [62]: def func(lis):
    pref=""
    for x in lis:
        if "lower" in x.lower():
            pref="Lower"
        elif "upper" in x.lower():
            pref="Upper"
        if header.count(x)>1:
            yield pref+x
        else:
            yield x
   ....:
In [63]: list(func(header))
Out[63]: ['LowerLevel', 'LowerTerm', 'LowerJ', 'UpperLevel', 'UpperTerm', 'UpperJ']
This should work for the data you presented.
from collections import defaultdict
def find_dups(seq):
    '''Finds duplicates in a sequence and returns a dict
    of value:occurrences'''
    seen = defaultdict(int)
    for curr in seq:
        seen[curr] += 1
    d = dict([(i, seen[i]) for i in seen if seen[i] > 1])
    return d
if __name__ == '__main__':
    header = ['LowerLevel','Term','J','UpperLevel','Term','J']
    d = find_dups(header)
    for i, s in enumerate(header):
        if s in d:
            if d[s] % 2:
                pref = 'Upper'
            else:
                pref = 'Lower'
            header[i] = pref + s
            d[s] -= 1
But it gives me the creeps to suggest anything when I know so little about the entire set of data you will be working with.
good luck,
Mike

translate my sequence?

I have to write a script to translate this sequence:
dict = {"TTT":"F|Phe","TTC":"F|Phe","TTA":"L|Leu","TTG":"L|Leu","TCT":"S|Ser","TCC":"S|Ser",
"TCA":"S|Ser","TCG":"S|Ser", "TAT":"Y|Tyr","TAC":"Y|Tyr","TAA":"*|Stp","TAG":"*|Stp",
"TGT":"C|Cys","TGC":"C|Cys","TGA":"*|Stp","TGG":"W|Trp", "CTT":"L|Leu","CTC":"L|Leu",
"CTA":"L|Leu","CTG":"L|Leu","CCT":"P|Pro","CCC":"P|Pro","CCA":"P|Pro","CCG":"P|Pro",
"CAT":"H|His","CAC":"H|His","CAA":"Q|Gln","CAG":"Q|Gln","CGT":"R|Arg","CGC":"R|Arg",
"CGA":"R|Arg","CGG":"R|Arg", "ATT":"I|Ile","ATC":"I|Ile","ATA":"I|Ile","ATG":"M|Met",
"ACT":"T|Thr","ACC":"T|Thr","ACA":"T|Thr","ACG":"T|Thr", "AAT":"N|Asn","AAC":"N|Asn",
"AAA":"K|Lys","AAG":"K|Lys","AGT":"S|Ser","AGC":"S|Ser","AGA":"R|Arg","AGG":"R|Arg",
"GTT":"V|Val","GTC":"V|Val","GTA":"V|Val","GTG":"V|Val","GCT":"A|Ala","GCC":"A|Ala",
"GCA":"A|Ala","GCG":"A|Ala", "GAT":"D|Asp","GAC":"D|Asp","GAA":"E|Glu",
"GAG":"E|Glu","GGT":"G|Gly","GGC":"G|Gly","GGA":"G|Gly","GGG":"G|Gly"}
seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA"
a=""
for y in range( 0, len ( seq)):
    c=(seq[y:y+3])
    #print(c)
    for k, v in dict.items():
        if seq[y:y+3] == k:
            alle_amino = v[::3] # all amino acids in a row: a1.1 - a2.1 - a3.1 - a1.2 and so on
            print (v)
With this script I get the amino acids from the 3 frames under each other, but how can I sort this and get all the amino acids from frame 1 next to each other, and all the amino acids from frame 2 next to each other, and the same for frame 3?
for example , my results must be :
+3 SerIleLeuAlaStpProLysTrpGluProProTyrValAlaStpProIleTyrIleTyrIle
+2 PheAsnThrSerMetThrLysValGlyThrProLeuArgSerMetThrHisIleTyrIleTyr
+1 PheGlnTyrStpHisAspGlnSerGlyAsnProLeuThrStpHisAspProTyrIleTyrIle
TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA
I use Python 3.
I have one more question: can I get these results by making some changes to my own script?
You can use the following (note that this would be much easier using Biopython's translate method):
dicti = {your dictionary here}
def translate(seq):
    x = 0
    aaseq = []
    while True:
        try:
            aaseq.append(dicti[seq[x:x+3]])
            x += 3
        except (IndexError, KeyError):
            break
    return aaseq
seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA"
for frame in range(3):
    print('+%i' %(frame+1), ''.join(item.split('|')[1] for item in translate(seq[frame:])))
Note that I changed the name of your dictionary to dicti (so as not to overwrite the built-in dict).
Some comments to help you understand:
translate takes your sequence and returns it as a list in which each item is the amino acid translation of the triplet coding that position. Like:
aaseq = ["L|Leu","L|Leu","P|Pro", ....]
You could process this data further inside translate (to get only the one-letter or three-letter code), or return it as it is to be processed later, as I have done.
translate is called in
''.join(item.split('|')[1] for item in translate(seq[frame:]))
for each frame. For frame value being 0, 1 or 2 it sends seq[frame:] as a parameter to translate. That is, you are sending the sequences corresponding to the three different reading frames processing them in series. Then, in
''.join(item.split('|')[1]
I split the one-letter and three-letter codes for each amino acid and take the one at index 1 (the second). Then they are joined into a single string.
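To make the frame handling concrete, here is a small self-contained sketch. To keep it short it uses a shortened demo sequence and a reduced table containing only the codons that occur in it (entries copied from your dictionary); translate_frame, CODONS and demo are names I chose for the illustration.

```python
# Reduced codon table: only the codons that occur in the short demo sequence.
CODONS = {
    "TTT": "F|Phe", "CAA": "Q|Gln", "TAC": "Y|Tyr",
    "TTC": "F|Phe", "AAT": "N|Asn",
    "TCA": "S|Ser", "ATA": "I|Ile",
}


def translate_frame(seq, frame, table):
    """Three-letter codes for the reading frame that starts at offset `frame`."""
    # step by 3 so codons do not overlap; stop before an incomplete codon
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(table[codon].split("|")[1] for codon in codons)


demo = "TTTCAATAC"
frames = {frame + 1: translate_frame(demo, frame, CODONS) for frame in range(3)}
for number in sorted(frames, reverse=True):
    print("+%d %s" % (number, frames[number]))
print(demo)
```

With the full dictionary and the full sequence, the same function should give the three lines you listed as the desired result.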
Not too pretty, but does what you want
dct = {"TTT":"F|Phe","TTC":"F|Phe","TTA":"L|Leu","TTG":"L|Leu","TCT":"S|Ser","TCC":"S|Ser",
"TCA":"S|Ser","TCG":"S|Ser", "TAT":"Y|Tyr","TAC":"Y|Tyr","TAA":"*|Stp","TAG":"*|Stp",
"TGT":"C|Cys","TGC":"C|Cys","TGA":"*|Stp","TGG":"W|Trp", "CTT":"L|Leu","CTC":"L|Leu",
"CTA":"L|Leu","CTG":"L|Leu","CCT":"P|Pro","CCC":"P|Pro","CCA":"P|Pro","CCG":"P|Pro",
"CAT":"H|His","CAC":"H|His","CAA":"Q|Gln","CAG":"Q|Gln","CGT":"R|Arg","CGC":"R|Arg",
"CGA":"R|Arg","CGG":"R|Arg", "ATT":"I|Ile","ATC":"I|Ile","ATA":"I|Ile","ATG":"M|Met",
"ACT":"T|Thr","ACC":"T|Thr","ACA":"T|Thr","ACG":"T|Thr", "AAT":"N|Asn","AAC":"N|Asn",
"AAA":"K|Lys","AAG":"K|Lys","AGT":"S|Ser","AGC":"S|Ser","AGA":"R|Arg","AGG":"R|Arg",
"GTT":"V|Val","GTC":"V|Val","GTA":"V|Val","GTG":"V|Val","GCT":"A|Ala","GCC":"A|Ala",
"GCA":"A|Ala","GCG":"A|Ala", "GAT":"D|Asp","GAC":"D|Asp","GAA":"E|Glu",
"GAG":"E|Glu","GGT":"G|Gly","GGC":"G|Gly","GGA":"G|Gly","GGG":"G|Gly"}
seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA"
def get_amino_list(s):
    for y in range(3):
        yield [s[x:x+3] for x in range(y, len(s) - 2, 3)]
for n, amn in enumerate(get_amino_list(seq), 1):
    print ("+%d " % n + "".join(dct[x][2:] for x in amn))
print(seq)
Here's my solution. I've called your "dict" variable "aminos". The function method3 returns a list of the values to the right of the "|". To merge them into a single string, just join them on "".
From looking at your code, I believe that your aminos dict contains all possible three-letter combinations. Therefore, I've removed the checks that verify this. It should run a lot faster as a result.
def codon_groups(seq, group_len=3):
    """Yields consecutive, non-overlapping groups of `group_len` items
    """
    # step by group_len so each reading frame is read codon by codon,
    # and include the last complete group
    for i in range(0, len(seq) - group_len + 1, group_len):
        yield seq[i:i+group_len]
def method3(seq, aminos):
    return [aminos[k][2:] for k in codon_groups(seq, 3)]
for i in range(3):
    print("%d: %s" % (i, "".join(method3(seq[i:], aminos))))
