Multiple index matches with a for loop in Python

I'm trying to understand how a Python for loop iterates. I know how to iterate in C++, but I have been asked to write this program in Python. Forgive my limited knowledge of Python; I am by no means an expert on the subject.
I've googled many possible solutions, but none of them gave real guidance on my issue: there was never an explanation of how the code iterates one element at a time while matching three consecutive indexes.
for i in range(0, len(dna)):
if dna[i] == 'A' & dna[i+1] == 'T' & dna[i+2] == 'G':
protein_sequence[dna[i:i+3]]
//for i in range(0, len(dna)-(3+len(dna)%3), 3):
// if protein[dna[i:i+3]] == "ATG":
// protein_sequence += protein[dna[i:i+3]]
if protein[dna[i:i+3]] == "STOP" :
break
protein_sequence += protein[dna[i:i+3]]
What I am trying to do is iterate through the string and match an exact three-character sequence. Once that sequence is found, I can iterate in steps of three until I match the "Stop" sequence. The commented-out for loop didn't work either: it never found the "Start" trigger. Thank you in advance for your assistance.

In Python, there is no such thing as a multiple index match; in case you need to look up the surrounding values of an element in an array, use a sliding window of size len(pattern):
def match(s, pattern):  # returns the index of the FIRST match, or None
    # the "+ 1" makes sure a match at the very end of s is also checked
    for start in range(len(s) - len(pattern) + 1):
        if s[start: start + len(pattern)] == pattern:
            return start
    return None
idx = match(dna, "ATG")
if idx is not None:
    pass  # do something witty with it instead
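Of course, this can perform poorly on large data: each window comparison costs up to len(pattern) steps, so the worst case is O(n*m) for a string of length n and a pattern of length m. For a three-letter pattern that is effectively linear, but for long patterns you'll want a faster algorithm such as KMP, or Aho-Corasick when searching for several patterns at once.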
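If every occurrence is needed rather than only the first, the built-in str.find can be called repeatedly with a moving start offset. This is a minimal sketch; find_all is a hypothetical helper name, not something from the question:

```python
def find_all(s, pattern):
    """Return the start index of every (possibly overlapping) occurrence of pattern in s."""
    if not pattern:
        return []  # an empty pattern would otherwise match at every position
    positions = []
    start = s.find(pattern)
    while start != -1:
        positions.append(start)
        # resume one character later so overlapping matches are found too
        start = s.find(pattern, start + 1)
    return positions

print(find_all("cgatgxggctatgaatcttccggtaatg", "atg"))  # [2, 10, 25]
```

Advancing by one character after each hit is a deliberate choice so that overlapping matches are reported; advance by len(pattern) instead if they should be skipped.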

You could simplify this by using split, limited to the first occurrence of 'atg', and then running your three-letter loop on what follows:
dna='cgatgxggctatgaatcttccggtaatg'
z=dna.split('atg',1)
Output:
z
['cg', 'xggctatgaatcttccggtaatg']
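The same idea, locate the first start codon and then walk forward in steps of three until a stop codon, might look like the sketch below. The CODONS table is a deliberately tiny, hypothetical stand-in for the asker's protein dictionary (a real table has 64 entries), and translate is a made-up name:

```python
# Hypothetical toy codon table; only a handful of the 64 real codons are listed.
CODONS = {
    "ATG": "M", "GCT": "A", "AAA": "K",
    "TAA": "STOP", "TAG": "STOP", "TGA": "STOP",
}

def translate(dna, table=CODONS):
    """Translate from the first ATG, codon by codon, until a stop codon is reached."""
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon anywhere in the sequence
    protein = []
    # step by 3 from the start codon; stop before a trailing partial codon
    for i in range(start, len(dna) - 2, 3):
        amino = table.get(dna[i:i + 3], "?")  # "?" marks codons missing from the toy table
        if amino == "STOP":
            break
        protein.append(amino)
    return "".join(protein)

print(translate("CCATGGCTAAATAGGG"))  # MAK
```

Here the start codon itself is translated (as "M"); drop the first element if only the codons after the start trigger are wanted.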

Related

writing an adaptor removal tool, advice on ignoring case on the sequence

I am learning how to code. I need to write, among other things, an adaptor removal tool. My script works fine except in cases where the sequence is a mix of lower and upper case.
adaptor sequence== TATA
sequence == TAtaGATTACA
This is the function for the adaptor removal
elif operation == "adaptor-removal":
    adaptor = args.adaptor
    reads = sequences(args.input, format)
    num_reads = len(reads)
    bases = "".join([read["seq"] for read in reads])
    adaptors_found = 0
    for read in reads:
        for i, j in read.items():
            if i == "seq":
                if j.startswith(adaptor.upper()) or j.startswith(adaptor.lower()):
                    adaptors_found += 1
                    j = j.replace(adaptor.upper(), "", 1)
                    j = j.replace(adaptor.lower(), "", 1)
                args.output.write("%s\n" % j)
    print_summary(operation)
    print("%s adaptors found" % adaptors_found)
I tried with:
if j.startswith(adaptor,re.I):
but it doesn't work, and I don't really understand why. Can anyone experienced guide me through this?
Thank you very much
Let's suppose j is TAtaGATTACA and adaptor is TATA.
Is j.startswith(adaptor.upper()) true? No, because j doesn't start with TATA.
Is j.startswith(adaptor.lower()) true? No, because j doesn't start with tata.
The easiest way to compare two strings case-insensitively is to convert both of them to the same case, upper or lower, and then compare those two strings as if you were comparing them case-sensitively. It doesn't matter whether you choose upper-case or lower-case, as long as you choose the same for both.
Is j.lower().startswith(adaptor.lower()) true? Yes, because j.lower() starts with tata.
Also, take care with your two .replace() calls: it's possible that one of them may end up removing text further along in j, which I don't believe you want. If you just want to trim the adaptor off the front of j, you are better off using a string slice:
if j.lower().startswith(adaptor.lower()):
    adaptors_found += 1
    j = j[len(adaptor):]
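Putting the case-insensitive check and the slice together, the trimming step could be pulled into a small helper. This is a minimal sketch; trim_adaptor is a hypothetical name, not part of the script in the question:

```python
def trim_adaptor(seq, adaptor):
    """Return (sequence, found): the adaptor is cut from the front if present, ignoring case."""
    if adaptor and seq.lower().startswith(adaptor.lower()):
        # slice by length so the remaining bases keep their original case
        return seq[len(adaptor):], True
    return seq, False

print(trim_adaptor("TAtaGATTACA", "TATA"))  # ('GATTACA', True)
print(trim_adaptor("GATTACA", "TATA"))      # ('GATTACA', False)
```

Returning the flag alongside the trimmed sequence lets the caller keep counting adaptors_found without repeating the comparison.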
Finally, you also ask why
if j.startswith(adaptor,re.I):
doesn't do what you want. The answer is that if you pass a second parameter to .startswith(), the value of this second parameter is the start position that you search from, not a flag that controls the matching:
"abcd".startswith("cd") # False
"abcd".startswith("cd", 2) # True
It happens that re.I can be converted to the integer 2. So the following is also True, although it looks odd:
"abcd".startswith("cd", re.I)

Function result varies on each run

I have the following function that generates the longest palindrome of a string by removing and re-ordering the characters:
from collections import Counter
def find_longest_palindrome(s):
    count = Counter(s)
    chars = list(set(s))
    beg, mid, end = '', '', ''
    for i in range(len(chars)):
        if count[chars[i]] % 2 != 0:
            mid = chars[i]
            count[chars[i - 1]] -= 1
        else:
            for j in range(0, int(count[chars[i]] / 2)):
                beg += chars[i]
    end = beg
    end = ''.join(list(reversed(end)))
    return beg + mid + end
out = find_longest_palindrome('aacggg')
print(out)
I got this function by 'translating' this example from C++.
Whenever I run my function, I seem to get one of the following outputs at random:
a
aca
agcga
The correct one in this case is 'agcga' as this is the longest palindrome for the input string 'aacggg'.
Could anyone suggest why this is occurring and how I could get the function to reliably return the longest palindrome?
P.S. The C++ code does not have this issue.
Your code depends on the order of list(set(s)).
But sets are unordered.
In CPython 3.4-3.7, the specific order you happen to get for sets of strings depends on the hash values for strings, which are explicitly randomized at startup, so it makes sense that you’d get different results on each run.
The reason you don’t see this in C++ is that the C++ set class template is not an unordered set, but a sorted set (based on a binary search tree, instead of a hash table), so you always get the same order in every run.
You could get the same behavior in Python by calling sorted on the set instead of just copying it to a list in whatever order it has.
But the code still isn’t correct; it just happens to work for some examples because the sorted order happens to give you the characters in most-repeated order. But that’s obviously not true in general, so you need to rethink your logic.
The most obvious difference introduced in your translation is this:
count[ch--]--;
… or, since you're looping over the characters by index instead of directly, more like:
count[chars[i--]]--;
Either way, this decrements the count of the current character, and then decrements the current character so that the loop will re-check the same character the next time through. You've turned this into something completely different:
count[chars[i - 1]] -= 1
This just decrements the count of the previous character.
In a for-each loop, you can't just change the loop variable and have any effect on the looping. To exactly replicate the C++ behavior, you'd either need to switch to a while loop, or put a while True: loop inside the for loop to get the same "repeat the same character" effect.
And, of course, you have to decrement the count of the current character, not decrement the count of the previous character that you're never going to see again.
for i in range(len(chars)):
    while True:
        if count[chars[i]] % 2 != 0:
            mid = chars[i]
            count[chars[i]] -= 1
        else:
            for j in range(0, int(count[chars[i]] / 2)):
                beg += chars[i]
            break
Of course you could simplify this, starting with just looping for ch in chars:, and if you think about how the two loops work together, you should be able to see how to remove a whole level of indentation here. But this seems to be the smallest change to your code.
Notice that if you do this change, without the sorted change, the answer is chosen randomly when the correct answer is ambiguous—e.g., your example will give agcga one time, then aggga the next time.
Adding the sorted will make that choice consistent, but no less arbitrary.
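One way to drop the dependence on set order altogether is to work directly from the counts: every character contributes count // 2 copies to the left half, and any one character with an odd count can sit in the middle. This is a sketch rather than a translation of the C++ original; longest_palindrome is a hypothetical name, and taking the first odd-count character in sorted order as the middle is an arbitrary choice:

```python
from collections import Counter

def longest_palindrome(s):
    """Build one longest palindrome obtainable by removing and reordering characters of s."""
    counts = Counter(s)
    half = []
    mid = ''
    for ch in sorted(counts):              # sorted() makes the result reproducible
        half.append(ch * (counts[ch] // 2))  # each pair puts one copy in the left half
        if counts[ch] % 2 and not mid:
            mid = ch                       # first odd-count character becomes the centre
    left = ''.join(half)
    return left + mid + left[::-1]

print(longest_palindrome('aacggg'))  # agcga
```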

how can i search for common elements in two integers with while loop

In my code I'm having a problem because I cannot compare two lists the way I want. What I'm trying to do is compare the first element of the shorter input against the elements of the longer input (guess1) one by one, moving on to the next index of the longer input whenever they differ. After finishing with the first element I want to compare the second one, and so on. In other words, first check (A-C)(A-A)(A-T), then (C-A)(C-T), then (T-T)...
I want the resulting list to be (A,T), because of the ATT part of guess1.
However, I'm stuck because I always get ACT, not A and T.
Where am I going wrong? I would be very glad if you could enlighten me.
Edit:
What I'm trying to do is look for the best similarity in the longer list guess1 and find the most similar part, which is ATT.
GUESS1="CATTCG"
GUESS2="ACT"
if len(str(GUESS1))>len(str(GUESS2)):
DNA_input_list=list((GUESS1))
DNA_input1_list=list((GUESS2))
common_elements=[]
i=0
while i<len(DNA_input1_list)-1:
j=0
while j<len(DNA_input_list)-len(DNA_input1_list):
if DNA_input_list[i] == DNA_input1_list[j]:
common_elements.append(DNA_input1_list[j])
i+=1
j+=1
if j>len(DNA_input1_list)-1:
break
print(common_elements)
As far as I understand, you want to find a shorter string inside a longer string, and if it is not found, remove an element from the shorter string and repeat the search.
You can use the string find method in Python for that, i.e. "CATTCG".find('ACT'). This returns -1 because there is no substring ACT. What you can then do is remove an element from the front of the shorter string using the slice operator [x:] and repeat the search, like this:
>>> for x in range(len('ACT')):
...     if "CATTCG".find('ACT'[x:]) > -1:
...         print("CATTCG".find('ACT'[x:]))
...         print("Match found for " + 'ACT'[x:])
In the code here, first a range of offsets is generated, i.e. [0, 1, 2]; this is the number of items we're going to slice off from the beginning.
In the second line we do the slicing with 'ACT'[x:] (for x == 0 we get 'ACT', for x == 1 we get 'CT', and for x == 2 we get 'T').
The last two lines print out the position and the string that matched.
If I have understood everything correctly, you want to return the longest leading part of GUESS2 which is contained in GUESS1.
I would use something like this.
common_elements = ""
for count in range(1, len(GUESS2) + 1):  # prefix lengths 1..len(GUESS2)
    if GUESS2[:count] in GUESS1:
        common_elements = GUESS2[:count]
print(common_elements)  # if inside a function, return common_elements
The loop runs once for each prefix length of the search string.
It then checks whether that prefix of GUESS2 is contained in GUESS1.
If so, the prefix is saved to a variable, which is printed/returned after the loop has finished. Note that only prefixes of GUESS2 are tried.
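If the goal is the longest piece of the shorter string that appears anywhere in the longer one, not only a prefix or a suffix, every substring can be tried from longest to shortest. This is a brute-force sketch that is fine for short strings like these; longest_common_substring is a hypothetical name:

```python
def longest_common_substring(short, long_):
    """Return the longest substring of short that also occurs in long_ ('' if none)."""
    for size in range(len(short), 0, -1):           # try the longest candidates first
        for start in range(len(short) - size + 1):  # every window of that size
            candidate = short[start:start + size]
            if candidate in long_:
                return candidate                    # first hit at this size is a longest match
    return ""

print(longest_common_substring("ACT", "CATTCG"))  # A
```

When several substrings of the same length match, the one that starts earliest in the shorter string is returned; that tie-break is an arbitrary choice of this sketch.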

Python IndexError : string index out of range in substring program

I am writing code for a class that asks me to check whether one string is a substring of another using nested loops.
Basically my teacher wants to demonstrate how the 'in' operator works, as in:
"ana" in "banana" returns True.
The goal of the program is to make a function of 2 parameters,
substring(subStr,fullStr)
that will print out a sentence saying whether subStr is a substring of fullStr. My program is as follows:
def substring(subStr,fullStr):
    tracker=""
    for i in (0,(len(fullStr)-1)):
        for j in (0,(len(subStr)-1)):
            if fullStr[i]==subStr[j]:
                tracker=tracker+subStr[j]
                i+=1
                if i==(len(fullStr)-1):
                    break
    if tracker==subStr:
        print "Yes",subStr,"is a substring of",fullStr
When I called the function in the interpreter with substring("ana","banana"), it printed a traceback for line 5 saying string index out of range:
if fullStr[i]==subStr[j]:
I'm banging my head trying to find the error. Any help would be appreciated
There are a few separate issues.
You are not resetting tracker in every iteration of the outer loop. This means that the leftovers from previous iterations contaminate later iterations.
You are not using range, and are instead looping over a tuple containing just 0 and the last index of each string.
You increment the outer counter i inside the inner loop, which skips the checks the outer loop would otherwise make for those positions.
You are not doing the bounds check correctly before trying to index into the outer string.
Here is a corrected version.
def substring(subStr,fullStr):
    for i in range(0,(len(fullStr))):
        tracker=""
        for j in range(0,(len(subStr))):
            if i + j >= len(fullStr):
                break
            if fullStr[i+j]==subStr[j]:
                tracker=tracker+subStr[j]
        if tracker==subStr:
            print "Yes",subStr,"is a substring of",fullStr
            return
substring("ana", "banana")
First off, your loops should be
for i in xrange(0,(len(fullStr))):
for example. i in (0, len(fullStr)-1) will have i take on the value of 0 the first time around, then take on len(fullStr)-1 the second time. I assume by your algorithm you want it to take on the intermediate values as well.
Now as for the error, consider i on the very last pass of the for loop. i is going to be equal to len(fullStr)-1. When we execute i+=1, i becomes len(fullStr). This does not fulfill the condition i==len(fullStr)-1, so we do not break; we loop again and crash. It would be better to either make the check if i>=len(fullStr)-1 or to test for i==len(fullStr)-1 before the if fullStr[i]==subStr[j]: statement.
Lastly, though not related to the question specifically, you do not reset tracker each time you stop checking a certain match. You should place tracker = "" after the for i in xrange(0,(len(fullStr))): line. You also do not check whether tracker is correct after looping through the string starting at i, nor do you break out of the loop when you hit a mismatch (instead you continue and possibly pick up more letters that match, but not consecutively).
Here is a fully corrected version:
def substring(subStr,fullStr):
    for i in xrange(0,(len(fullStr))):
        tracker="" #this is going to contain the consecutive matches we find
        for j in xrange(0,(len(subStr))):
            if i==(len(fullStr)): #end of i; no match.
                break
            if fullStr[i]==subStr[j]: #okay, looks promising, check the next letter to see if it is a match,
                tracker=tracker+subStr[j]
                i+=1
            else: #found a mismatch, leave inner loop and check what we have so far.
                break
        if tracker==subStr:
            print "Yes",subStr,"is a substring of",fullStr
            return #we already know it is a substring, so we don't need to check the rest
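For comparison, the nested-loop check can also be written without a tracker string by giving up on a start position at the first mismatch. This is a sketch in Python 3 syntax; is_substring is a hypothetical name, and it returns a boolean instead of printing so that it can be compared against the in operator:

```python
def is_substring(sub, full):
    """Nested-loop equivalent of `sub in full`."""
    # only start positions that leave room for all of sub need to be tried
    for i in range(len(full) - len(sub) + 1):
        for j in range(len(sub)):
            if full[i + j] != sub[j]:
                break          # mismatch: give up on this start position
        else:
            return True        # inner loop finished without a break: full match
    return False

print(is_substring("ana", "banana"))  # True
print(is_substring("nab", "banana"))  # False
```

Limiting the outer range means i + j can never run past the end of full, so no separate bounds check is needed.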

Printing one character at a time from a string, using the while loop

I'm reading "Core Python Programming 2nd Edition". They ask me to print a string one character at a time using a "while" loop.
I know how the while loop works, but for some reason I cannot come up with an idea for how to do this. I've been looking around and only see examples using for loops.
So here is what I have to do:
The user gives input:
text = raw_input("Give some input: ")
I know how to read each piece of data from an array, but I can't remember how to do it with a string.
Now all I need is a working while loop that prints every character of the string, one at a time.
I guess I have to use len(text), but I'm not 100% sure how to use it in this problem.
Some help would be awesome! I'm sure this is a very simple issue, but for some reason I cannot come up with it!
Thanks in advance! :)
I'm quite sure that the internet is full of Python while loops, but here is one example:
i=0
while i < len(text):
    print text[i]
    i += 1
Strings can be used in for loops too:
for a in string:
    print a
Other answers have already given you the code you need to iterate through a string using a while loop (or a for loop), but I thought it might be useful to explain the difference between the two types of loops.
while loops repeat some code until a certain condition is met. For example:
import random
total = 0  # named total rather than sum to avoid shadowing the built-in sum()
while total < 100:
    total += random.randint(0,100) #add a random number between 0 and 100 to the total
print total
This code will keep adding random numbers between 0 and 100 until the total is greater than or equal to 100. The important point is that this loop could run exactly once (if the first random number is 100) or it could run forever (if it keeps selecting 0 as the random number). We can't predict how many times the loop will run until after it completes.
for loops are basically just while loops, but we use them when we want a loop to run a preset number of times. Java for loops usually use some sort of counter variable (below I use i), and generally make the similarity between while and for loops much more explicit.
for (int i=0; i < 10; i++) { //starting from 0, until i is 10, adding 1 each iteration
    System.out.println(i);
}
This loop will run exactly 10 times. This is just a nicer way to write this:
int i = 0;
while (i < 10) { //until i is 10
    System.out.println(i);
    i++; //add one to i
}
The most common usage for a for loop is to iterate through a list (or a string), which Python makes very easy:
for item in myList:
    print item
or
for character in myString:
    print character
However, you didn't want to use a for loop. In that case, you'll need to look at each character using its index. Like this:
print myString[0] #print the first character
print myString[len(myString) - 1] # print the last character.
Knowing that you can make a for loop using only a while loop and a counter and knowing that you can access individual characters by index, it should now be easy to access each character one at a time using a while loop.
HOWEVER in general you'd use a for loop in this situation because it's easier to read.
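Combining a counter with index access gives something like the following sketch. It uses Python 3 syntax; print_chars is a hypothetical name, and returning the number of characters printed is only there to make the function easy to check:

```python
def print_chars(text):
    """Print each character of text on its own line using a while loop and an index."""
    i = 0
    while i < len(text):   # stop once the index reaches the length of the string
        print(text[i])
        i += 1             # without this increment the loop would never end
    return i               # number of characters printed

print_chars("hello")
```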
Try this procedure:
def procedure(input):
    a=0
    print input[a]
    ecs = input[a] #ecs stands for each character separately
    while ecs != input:
        a = a + 1
        print input[a]
To use it you have to know how to define and call procedures. It prints every character, but because ecs is never updated the loop only stops when input[a] raises an IndexError after the last character, so you have to work that out too.
Python allows you to iterate over a string directly:
for character in 'string':
    print(character)
I'm guessing it's your job to figure out how to turn that into a while loop.
# make a list out of text - ['h','e','l','l','o']
text = list('hello')
while text:
    print text.pop()
:)
In Python, empty containers evaluate as false.
The .pop() call removes and returns the last item of a list, and that's why it prints in reverse!
This can be fixed by using:
text.pop( 0 )
Python Code:
for s in myStr:
    print s
OR
for i in xrange(len(myStr)):
    print myStr[i]
Try this instead ...
Printing each character using while loop
i=0
x="abc"
while i<len(x) :
    print(x[i],end=" ")
    print(i)
    i+=1
This will print each character in text
text = raw_input("Give some input: ")
for i in range(0,len(text)):
    print(text[i])
