python program runtime taking so long/won't end - python

I'm doing the DNA problem for CS50, in which I have to count the number of times an STR repeats in a DNA sequence. I had an idea for how to solve it, so I took one of the data sets and ran my code on it, but the program never finishes. It has been running for about 10 minutes now and still hasn't ended. Here's the code:
text="AAGGTAAGTTTAGAATATAAAAGGTGAGTTAAATAGAATAGGTTAAAATTAAAGGAGATCAGATCAGATCAGATCTATCTATCTATCTATCTATCAGAAAAGAGTAAATAGTTAAAGAGTAAGATATTGAATTAATGGAAAATATTGTTGGGGAAAGGAGGGATAGAAGG"
length=len(text)
AGAT=0
tmp=0
for i in range(length):
    while text[i]=="A" and text[i+1]=="G" and text[i+2]=="A" and text[i+3]=="T":
        tmp+=1
        if tmp>AGAT:
            AGAT=tmp
        else:
            AGAT=AGAT
print("done")

As mentioned in the comments, your while loop never terminates: nothing in its body changes i, so once the condition is true it stays true forever. You could remove the loop and use a sliding-window technique instead, going over the text and comparing each slice of 4 adjacent characters:
text = "AAGGTAAGTTTAGAATATAAAAGGTGAGTTAAATAGAATAGGTTAAAATTAAAGGAGATCAGATCAGATCAGATCTATCTATCTATCTATCTATCAGAAAAGAGTAAATAGTTAAAGAGTAAGATATTGAATTAATGGAAAATATTGTTGGGGAAAGGAGGGATAGAAGG"
search_seq = "AGAT"
count = 0
for i in range(len(text) - len(search_seq) + 1):
    if text[i:i+len(search_seq)] == search_seq:
        count += 1
print(f"Sequence {search_seq} found {count} times")
Output:
Sequence AGAT found 5 times
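The tmp and AGAT variables in the question suggest the goal may be the longest run of back-to-back repeats rather than the total number of matches. If so, a minimal sketch could look like the following. The function name longest_run and the short test string are assumptions for illustration, not part of the original code:

```python
def longest_run(sequence, pattern):
    """Return the largest number of back-to-back copies of pattern in sequence."""
    if not pattern:
        return 0  # an empty pattern would otherwise match forever
    best = 0
    step = len(pattern)
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Keep stepping forward while the next slice is another copy of pattern.
        # pos advances on every iteration, so this loop always terminates.
        while sequence[pos:pos + step] == pattern:
            run += 1
            pos += step
        best = max(best, run)
    return best


print(longest_run("AGATAGATCAGAT", "AGAT"))  # 2
```

Unlike the loop in the question, the inner while loop here moves pos forward each time, so its condition eventually becomes false.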

I know this is a weird way to solve your problem, but I wanted to do something a bit different...
Try this:
agat_string="AAGGTAAGTTTAGAATATAAAAGGTGAGTTAAATAGAATAGGTTAAAATTAAAGGAGATCAGATCAGATCAGATCTATCTATCTATCTATCTATCAGAAAAGAGTAAATAGTTAAAGAGTAAGATATTGAATTAATGGAAAATATTGTTGGGGAAAGGAGGGATAGAAGG"
agat_list=[AGAT for AGAT in range(len(agat_string)) if agat_string.find("AGAT", AGAT) == AGAT] #finds the indices of "AGAT" ;-)
print(len(agat_list))
The output:
5
Also, as someone said, tmp has nothing to do with the while condition, so incrementing it can never end the loop. It just throws you into an infinite loop.

Related

Translation and Use of Stop Codon using list comprehension python

I'm trying to write a function that finds the end of an open reading frame, using a dictionary that contains only the stop codons. The program reads three letters at a time, and if those three letters form a stop codon, it stops and counts the number of letters read so far (the stop codon is NOT counted, nor is anything after it). For example, nextStop2('AAAAAAAGTGGGTGCTAGGTTGGC') should return 15. I'm not sure why, but the code I wrote below doesn't work. Can anyone give me advice on how to fix it? Thanks!
def nextStop2(Seq):
    GeneticCodeStop = {'TAA':'X', 'TAG':'X', 'TGA':'X'}
    seq2 = ''.join(end_of_loop() if GeneticCodeStop[i]=='X' else i for i in Seq)
    return len((seq2)/3)
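Your parentheses are off: you should write len(seq2)/3. But you don't want to divide by 3 at all (you expect 15, not 5), so just return len(seq2). Also, iterating over Seq yields one character at a time rather than one codon, so step through it three letters at a time: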
def nextStop2(Seq):
    GeneticCodeStop = ['TAA', 'TAG', 'TGA']
    seq2=''
    for i in range(0,len(Seq),3) :
        codon=Seq[i:i+3]
        if codon in GeneticCodeStop:
            break
        seq2+=codon
    return len(seq2)
print(nextStop2('AAAAAAAGTGGGTGCTAGGTTGGC') )
>>> 15
I don't know Biopython, though, and I think it should have a function to do this.
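Since the question title asks about a comprehension, here is a hedged sketch of the same idea using a generator expression with next(). The names next_stop and STOP_CODONS are assumptions for illustration. It returns the index of the first in-frame stop codon, which equals the number of letters before it, and falls back to the full length when no stop codon is found, as the loop version above does:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def next_stop(seq):
    """Return how many letters precede the first in-frame stop codon."""
    return next(
        # Visit the start index of every complete codon: 0, 3, 6, ...
        (i for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS),
        len(seq),  # default when no stop codon occurs
    )


print(next_stop("AAAAAAAGTGGGTGCTAGGTTGGC"))  # 15
```

The generator stops at the first match, so nothing after the stop codon is examined.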

Runtime Error (Python3) when you manipulate lists with very long strings

I wrote Python 3 code to manipulate lists of strings, but it gives a Runtime Error for long strings. Here is my code for the problem:
string = "BANANA"
slist= list (string)
mark = list(range(len(slist)))
vowel_substrings = list()
consonants_substrings = list()
#print(mark)
for i in range(len(slist)):
    if slist[i]=='A' or slist[i]=='E' or slist[i]=='I' or slist[i]=='O' or slist[i]=='U':
        mark[i] = 1
    else:
        mark[i] = 0
#print(mark)
for j in range(len(slist)):
    if mark[j] == 1:
        for l in range(j,len(string)):
            vowel_substrings.append(string[j:l+1])
            #print(string[j:l+1])
    else:
        for l in range(j,len(string)):
            consonants_substrings.append(string[j:l+1])
#print(consonants_substrings)
unique_consonants = list(set(consonants_substrings))
unique_vowels = list(set(vowel_substrings))
##add two lists
all_substrings = consonants_substrings+(vowel_substrings)
#print(all_substrings)
##Find points earned by vowel guy and consonant guy
vowel_guy_score = 0
consonant_guy_score = 0
for strng in unique_vowels:
    vowel_guy_score += vowel_substrings.count(strng)
for strng in unique_consonants:
    consonant_guy_score += consonants_substrings.count(strng)
#print(vowel_guy_score) #Kevin
#print(consonant_guy_score) #Stuart
if vowel_guy_score > consonant_guy_score:
    print("Kevin ",vowel_guy_score)
elif vowel_guy_score < consonant_guy_score:
    print("Stuart ",consonant_guy_score)
else:
    print("Draw")
This gives the right answer. But with a long string, like the ones shown below, it fails.
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
I think initialization or memory allocation might be a problem but I don't know how to allocate memory before even knowing how much memory the code will need. Thank you in advance for any help you can provide.
In the middle there, you generate a data structure of size O(n³): for each starting position × each ending position × length of the substring. That's probably where your memory problems appear (you haven't posted a traceback).
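To make the cubic growth concrete, here is a small sketch that counts how many characters the substring lists hold in total for a string of length n. The helper name stored_characters and the sample lengths are assumptions chosen for illustration:

```python
def stored_characters(n):
    """Total characters across all substrings of a length-n string."""
    # A substring starting at j and ending at l has length l - j + 1.
    return sum(l - j + 1 for j in range(n) for l in range(j, n))


for n in (6, 100):
    # The closed form n(n+1)(n+2)/6 grows as n**3.
    assert stored_characters(n) == n * (n + 1) * (n + 2) // 6
    print(n, stored_characters(n))
```

For BANANA (n = 6) that is only 56 characters, but the total grows with the cube of the length.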
One possible optimisation would be, instead of having a list of substrings and then generating the set, use instead a Counter class. That would let you know how many times each substring appears without storing all the copies:
import collections

vowel_substrings = collections.Counter()
consonants_substrings = collections.Counter()
for j in range(len(slist)):
    if mark[j] == 1:
        for l in range(j,len(string)):
            vowel_substrings[string[j:l+1]] += 1
            #print(string[j:l+1])
    else:
        for l in range(j,len(string)):
            consonants_substrings[string[j:l+1]] += 1
Even better would be to calculate the scores as you go along, without storing any of the substrings. If I'm reading the code correctly, the substrings aren't actually used for anything — each letter is effectively scored based on its distance from the end of the string, and the scores are added up. This can be calculated in a single pass through the string, without making any additional copies or keeping track of anything other than the cumulative scores and the length of the string.
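The single pass described above could be sketched like this. The function name minion_scores and the VOWELS constant are assumptions for illustration. The key observation is that exactly len(string) - i substrings start at position i:

```python
VOWELS = set("AEIOU")


def minion_scores(string):
    """Return (vowel_score, consonant_score) without building any substrings."""
    vowel_score = 0
    consonant_score = 0
    n = len(string)
    for i, letter in enumerate(string):
        # n - i substrings start at position i, one for each possible end.
        if letter in VOWELS:
            vowel_score += n - i
        else:
            consonant_score += n - i
    return vowel_score, consonant_score


print(minion_scores("BANANA"))  # (9, 12)
```

This uses constant extra memory regardless of the input length, so the long strings from the question are no longer a problem.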

Optimising a Python script

I've been trying to complete the below task in Python:
http://codeforces.com/problemset/problem/4/C
I wrote a simple script for it, shown below, but it returns a runtime error on the 7th test. I suspect the code is taking too long, so I'd like help optimising it. I have looked at map and filter and tried using them, without success.
a=int(input())
entered_usernames=[]
n=0
while n<a:
    y=input()
    entered_usernames.append(y)
    n+=1
valid_usernames=[]
for i in entered_usernames:
    if i not in valid_usernames:
        valid_usernames.append(i)
        print('OK')
    else:
        count=1
        while i+str(count) in valid_usernames:
            count+=1
        valid_usernames.append(i+str(count))
        print(i+str(count))
You can try changing valid_usernames to a set instead of a list.
For a list list_a, the operation x in list_a takes linear time on average.
For a set set_a, the operation x in set_a takes constant time on average.
(source: https://wiki.python.org/moin/TimeComplexity)
This simple change could improve runtime a bit.
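A minimal sketch of that change, keeping the question's renaming logic but storing the names in a set. The function name register and the list-in, list-out interface are assumptions made so the example can run without stdin:

```python
def register(names):
    """Return the system's response for each requested name, in order."""
    taken = set()  # membership tests on a set are O(1) on average
    responses = []
    for name in names:
        if name not in taken:
            taken.add(name)
            responses.append("OK")
        else:
            count = 1
            # Same search as in the question, but each lookup is now cheap.
            while name + str(count) in taken:
                count += 1
            taken.add(name + str(count))
            responses.append(name + str(count))
    return responses


print(register(["abacaba", "acaba", "abacaba", "acab"]))
# ['OK', 'OK', 'abacaba1', 'OK']
```

Each lookup is faster, but the inner while loop still probes suffixes one by one when the same name is requested many times.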
What also strikes me as potentially very slow is this fragment:
while i+str(count) in valid_usernames:
    count+=1
However, if you want to improve this, you need to think about using a completely different data structure.
Why don't you use a lookup dict with a counter and solve this in O(N) time?
total = int(input()) # get the first input (total usernames)
database = {} # our 'database' / lookup dict
candidates = [input() for _ in range(total)] # pick usernames from the input
for candidate in candidates: # loop through each candidate
    if candidate in database: # already used, print with a counter
        print(candidate + str(database[candidate]))
        database[candidate] += 1 # increase the counter
    else: # the candidate doesn't exist in the 'database'...
        print("OK")
        database[candidate] = 1 # initialize counter for the next time
Why don't you try
valid_usernames.append(i+str(valid_usernames.count(i)))
print(i+str(valid_usernames.count(i)))

while loop in python issue

I started learning Python a few weeks ago (no prior programming knowledge) and got stuck on the following issue, which I do not understand. Here is the code:
def run():
    count = 1
    while count<11:
        return count
        count=count+1
print run()
What confuses me is why does printing this function result in: 1?
Shouldn't it print: 10?
I do not want to make a list of values from 1 to 10 (just to make myself clear), so I do not want to append the values. I just want to increase the value of my count until it reaches 10.
What am I doing wrong?
Thank you.
The first thing that you do in the while loop is return the current value of count, which happens to be 1. The loop never actually runs past the first iteration. Python is indentation sensitive (and all languages that I know of are order-sensitive).
Move your return after the while loop.
def run():
    count = 1
    while count<11:
        count=count+1
    return count
Change to:
def run():
    count = 1
    while count<11:
        count=count+1
    return count
print run()
so you're returning the value after your loop.
Return ends the function early, prohibiting it from going on to the adding part.
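One detail worth checking: with the return moved after the loop, the function returns 11 rather than the 10 the question expects, because the loop only stops once count is no longer below 11. A short Python 3 sketch of both behaviours follows; run_until is a hypothetical variant added for illustration:

```python
def run():
    count = 1
    while count < 11:
        count = count + 1
    # The loop exits once count is no longer below 11, i.e. at 11.
    return count


def run_until(limit):
    """Hypothetical variant: stop as soon as count reaches limit."""
    count = 1
    while count < limit:
        count = count + 1
    return count


print(run())          # 11
print(run_until(10))  # 10
```

To end at 10, the condition can be written as count < 10 (or count != 10).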

How do I calculate something inside a 'for' loop?

I'm doing an assignment in which I have to move a simulated robot across the screen in a loop. I've got that part down. However, between loops I also have to print the percentage of area covered with each movement, and that's my issue.
I googled a bit and even found someone with the same problem, however, I'm not sure if I'm doing it properly.
This code was offered:
percent_complete = 0
for i in range(5):
    percent_complete += 20
    print('{}% complete'.format(percent_complete))
However, after an error, further googling revealed that only worked with certain versions
so I used this code:
percent_complete = 0
for i in range(5):
    percent_complete += 20
    print '% complete' % (percent_complete)
And, at the very least, it now executes the code, however, the output when printing is the following:
Here we go!
hello
omplete
hello
(omplete
hello
<omplete
hello
Pomplete
hello
domplete
What is the cause of this? I assume because one of the codes had to be edited, the other parts do as well, but I'm not sure what needs to be done.
percent_complete = 0
for i in range(5):
    percent_complete += 20
    print '%d%% complete' % (percent_complete)
You were missing the d specifier. Note that a literal percent sign has to be written as %%.
The first version only works in Python 3 because it uses print as a function. You're probably looking for the following:
percent_complete = 0
for i in xrange(5):
    percent_complete += 20
    print '{0}% complete'.format(percent_complete)
Your other code doesn't do what you intend because "% c" is interpreted as a character conversion, so the number is displayed as the character with that code (40 becomes "(", 100 becomes "d") followed by "omplete". What you want is for the number to be converted to a string first and then inserted into the text. The format function does that for you.
You can also use Ansari's approach which explicitly specifies that percent_complete is a number (with the d specifier).
To add to/correct above answers:
The reason your first example didn't work isn't that print isn't a function, but that you left out the argument specifier. Try print('{0}% complete'.format(percent_complete)). The 0 inside the braces is the crucial part: automatic numbering of empty {} fields was only added in Python 2.7 and 3.1.
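For reference, here is a Python 3 sketch that puts the approaches from these answers side by side. The function name progress_lines and its parameters are assumptions for illustration, and the f-string variant goes beyond what the answers mention:

```python
def progress_lines(steps=5, increment=20):
    """Build one '<n>% complete' line per step, checking three equivalent ways."""
    lines = []
    percent_complete = 0
    for _ in range(steps):
        percent_complete += increment
        with_percent_op = "%d%% complete" % percent_complete    # %% is a literal %
        with_format = "{0}% complete".format(percent_complete)  # explicit index
        with_fstring = f"{percent_complete}% complete"          # Python 3.6+
        assert with_percent_op == with_format == with_fstring
        lines.append(with_format)
    return lines


print("\n".join(progress_lines()))
```

With str.format and f-strings the percent sign needs no escaping. Only the % operator treats it specially.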
