Finding instances of a string inside a string

I'm working through the bioinformatics problems on rosalind.org and I've come across a problem where the Python script I've written works on a smaller dataset, but when I apply it to a larger one I get the message IndexError: list index out of range.
Basically I have both a smaller motif and a larger DNA sequence and I have to find instances of the motif in the DNA sequence. When I put the sample dataset in the question into my script, it works fine and I get the right answer. However, using significantly larger motifs and sequences yields the previously mentioned error.
This is my code:
motif = "<motif around 9 characters>"
cMotif = list(motif)
motifLength = len(cMotif)
dna = "<DNA sequence around 900 characters>"
dnArray = list(dna)
locations = ""
position = 0
for nt in dnArray:
    if (nt == cMotif[0]):
        for x in range(0, (motifLength)):
            if ((x + position) > len(dnArray)):
                break
            if (dnArray[position + x] == cMotif[x]):
                if (x >= (motifLength - 1)):
                    locations += (str(position + 1) + " ")
                    break
            else:
                break
    position += 1
print(locations)
The IndexError: list index out of range error occurs at the line if (dnArray[position + x] == cMotif[x]): so I added the check
if ((x + position) > len(dnArray)):
    break
but this doesn't make a difference.
Cheers

Python's lists are zero-based, so when (x + position) == len(dnArray) trying to access dnArray[x + position] will be one past the last index. You should change your test to if (x + position) >= len(dnArray): to solve your problem.
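
A slice-based sketch sidesteps the bounds check entirely, because slicing a string never raises IndexError. The function name find_motif and the sample strings are assumptions for illustration, not part of the original script:

```python
def find_motif(dna, motif):
    """Return the 1-based start positions of every match, overlapping ones included."""
    positions = []
    # Stop early enough that a full-length window always fits inside dna.
    for start in range(len(dna) - len(motif) + 1):
        if dna[start:start + len(motif)] == motif:
            positions.append(start + 1)  # 1-based, like the original script
    return positions


print(*find_motif("GATATATGCATATACTT", "ATAT"))
```

Because the loop range already excludes windows that would run past the end, no explicit index test is needed.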

I suggest using Python's regex module instead, which is simpler.
import re
motif = "abc"
dna = "helloabcheyabckjlkjsabckjetc"
for i in re.finditer(motif, dna):
    print(i.start(), i.end())
It gives you the start and end index in the string for every occurrence of motif in dna.
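
One caveat: re.finditer reports non-overlapping matches only, and motifs in a DNA sequence can overlap. A possible workaround, sketched here with a hypothetical helper name motif_starts, is to wrap the motif in a lookahead so that no characters are consumed; re.escape guards against motifs that contain regex metacharacters:

```python
import re


def motif_starts(dna, motif):
    """Return 0-based start indexes of all matches, including overlapping ones."""
    # A lookahead matches at a position without consuming characters,
    # so the scan can also report matches that overlap each other.
    pattern = "(?=" + re.escape(motif) + ")"
    return [m.start() for m in re.finditer(pattern, dna)]


dna = "GATATATGCATATACTT"
print(motif_starts(dna, "ATAT"))                       # overlapping matches included
print([m.start() for m in re.finditer("ATAT", dna)])   # plain finditer skips overlaps
```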

Here is your program that throws an error:
motif = "abcd"
cMotif = list(motif)
motifLength = len(cMotif)
dna = "I am a dna which has abcd in it.a"
dnArray = list(dna)
locations = ""
position = 0
for nt in dnArray:
    if (nt == cMotif[0]):
        for x in range(0, (motifLength)):
            if ((x + position) > len(dnArray)):
                break
            if (dnArray[position + x] == cMotif[x]):
                if (x >= (motifLength - 1)):
                    locations += (str(position + 1) + " ")
                    break
            else:
                break
    position += 1
print(locations)
I changed if ((x + position) > len(dnArray)): to if ((x + position) >= len(dnArray)): and the error goes away. Your program never reached the break statement because the test did not cover the "equal to" case. Remember that Python indexes start at 0, so the last valid index is len(dnArray) - 1.
Put this line above your condition if ((x + position) > len(dnArray)): and you will see the reason:
print("My position is: " + str(x+position) + " and the length is: " + str(len(dnArray)))
The last line printed will be My position is: 33 and the length is: 33
You have reached the end of the sequence, but 33 > 33 is false, so the break is not taken.

Related

Draw a centered triforce surrounded by hyphens using Python

I want to draw a triangle of asterisks from a given n which is an odd number and at least equal to 3. So far I did the following:
def main():
    num = 5
    for i in range(num):
        if i == 0:
            print('-' * num + '*' * (i + 1) + '-' * num)
        elif i % 2 == 0:
            print('-' * (num-i+1) + '*' * (i + 1) + '-' * (num-i+1))
        else:
            continue
if __name__ == "__main__":
    main()
And got this as the result:
-----*-----
----***----
--*****--
But how do I edit the code so the number of hyphens corresponds to the desired result:
-----*-----
----***----
---*****---
--*-----*--
-***---***-
*****-*****

There's probably a better way but this seems to work:
def triangle(n):
    assert n % 2 != 0 # make sure n is an odd number
    hyphens = n
    output = []
    for stars in range(1, n+1, 2):
        h = '-'*hyphens
        s = '*'*stars
        output.append(h + s + h)
        hyphens -= 1
    pad = n // 2
    mid = n
    for stars in range(1, n+1, 2):
        fix = '-'*pad
        mh = '-'*mid
        s = '*'*stars
        output.append(fix + s + mh + s + fix)
        pad -= 1
        mid -= 2
    print(*output, sep='\n')
triangle(5)
Output:
-----*-----
----***----
---*****---
--*-----*--
-***---***-
*****-*****

Think about what it is you're iterating over and what you're doing with your loop. Currently you're iterating up to the maximum number of hyphens you want, and you seem to be treating this as the number of asterisks to print, but if you look at the edge of your triforce, the number of hyphens is decreasing by 1 each line, from 5 to 0. To me, this would imply you need to print num-i hyphens each iteration, iterating over line number rather than the max number of hyphens/asterisks (these are close in value, but the distinction is important).
I'd recommend trying to make one large solid triangle first, i.e.
-----*-----
----***----
---*****---
--*******--
-*********-
***********
since this is a simpler problem to solve and is just one modification away from what you're trying to do (this is where the distinction between number of asterisks and line number will be important, as your pattern changes dependent on what line you're on).
I'll help get you started; for any odd n, the number of lines you need to print is going to be (n+1). If you modify your range to be over this value, you should be able to figure out how many hyphens and asterisks to print on each line to make a large triangle, and then you can just modify it to cut out the centre.
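
As a rough sketch of that intermediate step (the function name solid_triangle is an assumption, not from the question), line k of the solid triangle has n - k hyphens on each side and 2k + 1 asterisks in the middle:

```python
def solid_triangle(n):
    """Return the n + 1 lines of a solid triangle, padded with hyphens."""
    lines = []
    for line in range(n + 1):
        hyphens = '-' * (n - line)      # one fewer hyphen per side on each line
        stars = '*' * (2 * line + 1)    # two more asterisks on each line
        lines.append(hyphens + stars + hyphens)
    return lines


print(*solid_triangle(5), sep='\n')
```

Cutting the centre out of the lower half is then a matter of replacing some of the middle asterisks with hyphens, depending on the line number.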

get single index number of a duplicate element in a list

I want to turn "RqaEzty" into "R-Qq-Aaa-Eeee-Zzzzz-Tttttt-Yyyyyyy". Basically, each letter is printed as many times as its position in the string (counting from 1).
I created the following method, but it has one mistake.
Every time a letter occurs twice or more in the input, my output is wrong, because my code uses the index of the letter's first occurrence for every later occurrence as well.
E.g. "ZpglnRxqenU" should be
"Z-Pp-Ggg-Llll-Nnnnn-Rrrrrr-Xxxxxxx-Qqqqqqqq-Eeeeeeeee-Nnnnnnnnnn-Uuuuuuuuuuu"
but I get:
"Z-Pp-Ggg-Llll-Nnnnn-Rrrrrr-Xxxxxxx-Qqqqqqqq-Eeeeeeeee-Nnnnn-Uuuuuuuuuuu"
because my code takes the same index for the first "n" as for the second "n"
def accum(s):
    x = list(s)
    s = ""
    for y in x:
        amount = 1 + (x.index(y))
        word = y * amount
        s += word.capitalize() + "-"
    s = s.rstrip('-')
    print(s)
My idea is to add an if statement inside the for loop to check whether a letter occurs more than once and, if so, to use the index of the second (or third, or fourth...) occurrence.
My question:
How do I get the index of a duplicate as a single value?

Instead of looking for the index every time, you could iterate over the indexes themselves:
for i in range(len(x)):
    y = x[i]
    amount = 1 + i
    word = y * amount
    s += word.capitalize() + "-"

You can use a counter variable instead of (x.index(y)):
def accum(s):
    x = list(s)
    s = ""
    i = 1 #change happens here
    for y in x:
        word = y * i #change happens here
        s += word.capitalize() + "-"
        i += 1 #increment the counter for the next letter
    s = s.rstrip('-')
    print(s)
This fixes the error.

If you need a value that increases each step, you can use enumerate().
Then it will work.
def accum(s):
    x = list(s)
    s = ""
    for idx, y in enumerate(x):
        amount = 1 + idx
        word = y * amount
        s += word.capitalize() + "-"
    s = s.rstrip('-')
    print(s)
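
A slightly more compact variant, offered only as a sketch, builds the pieces in a list and joins them, which removes the need for rstrip. Returning the string instead of printing it is an assumption made here so the result is easy to check:

```python
def accum(s):
    """Repeat the character at 0-based index i exactly i + 1 times and join with '-'."""
    parts = []
    for idx, ch in enumerate(s):
        # capitalize() upper-cases the first letter and lower-cases the rest
        parts.append((ch * (idx + 1)).capitalize())
    return "-".join(parts)


print(accum("ZpglnRxqenU"))
```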

Python: How to find all ways to decode a string?

I'm trying to solve this problem but it fails with input "226".
Problem:
A message containing letters from A-Z is being encoded to numbers using the following mapping:
'A' -> 1
'B' -> 2
...
'Z' -> 26
Given a non-empty string containing only digits, determine the total number of ways to decode it.
My Code:
class Solution:
    def numDecodings(self, s: str) -> int:
        decode =[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]
        ways = []
        for d in decode:
            for i in s:
                if str(d) == s or str(d) in s:
                    ways.append(d)
                if int(i) in decode:
                    ways.append(str(i))
        return len(ways)
My code returns 2. It only takes care of combinations (22,6) and (2,26).
It should be returning 3, so I'm not sure how to take care of the (2,2,6) combination.

It looks like this problem can be broken down into subproblems and can therefore be solved recursively.
Subproblem 1: when the last digit of the string is valid (i.e. non-zero), recur on the remaining (n-1) digits:
if s[n-1] > "0":
    count = number_of_decodings(s, n-1)
Subproblem 2: when the last 2 digits form a valid number (less than 27), recur on the remaining (n-2) digits:
if (s[n - 2] == '1' or (s[n - 2] == '2' and s[n - 1] < '7')):
    count += number_of_decodings(s, n - 2)
Base case: the length of the string is 0 or 1:
if n == 0 or n == 1:
    return 1
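
Put together, the fragments above might look like the following sketch. Two details are assumptions rather than part of the answer: count starts at 0 before the two checks, and a lone "0" is treated as having no decoding (the base case as written would return 1 for it).

```python
def number_of_decodings(s, n):
    """Count the decodings of the first n digits of s."""
    if n == 0:
        return 1
    if n == 1:
        # Assumption: a single "0" cannot be decoded, so it counts as 0 ways.
        return 1 if s[0] != "0" else 0
    count = 0
    # Subproblem 1: the last digit stands on its own.
    if s[n - 1] > "0":
        count = number_of_decodings(s, n - 1)
    # Subproblem 2: the last two digits form a number from 10 to 26.
    if s[n - 2] == "1" or (s[n - 2] == "2" and s[n - 1] < "7"):
        count += number_of_decodings(s, n - 2)
    return count


print(number_of_decodings("226", 3))
```

Without memoization this recursion repeats work, so it is best suited to short inputs.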
EDIT: After a quick search online, I found another (more interesting) method that solves this particular problem with dynamic programming:
# A Dynamic Programming based function
# to count decodings
def countDecodingDP(digits, n):
    count = [0] * (n + 1)  # A table to store
                           # results of subproblems
    count[0] = 1
    count[1] = 1
    for i in range(2, n + 1):
        count[i] = 0
        # If the last digit is not 0, then last
        # digit must add to the number of words
        if digits[i - 1] > '0':
            count[i] = count[i - 1]
        # If the second last digit is 1, or it is 2
        # and the last digit is smaller than 7, then
        # the last two digits form a valid character
        if (digits[i - 2] == '1' or
                (digits[i - 2] == '2' and
                 digits[i - 1] < '7')):
            count[i] += count[i - 2]
    return count[n]
The above solution runs in O(n) time and uses a recurrence similar to the one for Fibonacci numbers.
source: https://www.geeksforgeeks.org/count-possible-decodings-given-digit-sequence/

This seemed like a natural for recursion. Since I was bored, and the first answer didn't use recursion and didn't return the actual decodings, I thought there was room for improvement. For what it's worth...
def encodings(str, prefix = ''):
    encs = []
    if len(str) > 0:
        es = encodings(str[1:], (prefix + ',' if prefix else '') + str[0])
        encs.extend(es)
    if len(str) > 1 and int(str[0:2]) <= 26:
        es = encodings(str[2:], (prefix + ',' if prefix else '') + str[0:2])
        encs.extend(es)
    return encs if len(str) else [prefix]
This returns a list of the possible decodings. To get the count, you just take the length of the list. Here is a sample run:
encs = encodings("123")
print("{} {}".format(len(encs), encs))
with result:
3 ['1,2,3', '1,23', '12,3']
Another sample run:
encs = encodings("123123")
print("{} {}".format(len(encs), encs))
with result:
9 ['1,2,3,1,2,3', '1,2,3,1,23', '1,2,3,12,3', '1,23,1,2,3', '1,23,1,23', '1,23,12,3', '12,3,1,2,3', '12,3,1,23', '12,3,12,3']

Check result using 4 operations based with Python

I'm struggling to make a Python program that can solve riddles such as:
get 23 using [1,2,3,4] and the 4 basic operations however you'd like.
I expect the program to output something such as
# 23 reached by 4*(2*3)-1
So far I've come up with the following approach: reduce the input list by 1 item at a time by checking every possible 2-combo that can be picked and every possible result you can get from it.
With [1,2,3,4] you can pick:
[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]
With x and y you can get to:
(x+y),(x-y),(y-x),(x*y),(x/y),(y/x)
Then I'd store the operation computed so far in a variable, and run the 'reducing' function again onto every result it has returned, until the arrays are just 2 items long: then I can just run the x,y -> possible outcomes function.
My problem is this "recursive" approach isn't working at all, because my function ends as soon as I return an array.
If I input [1,2,3,4] I'd get
[(1+2),3,4] -> [3,3,4]
[(3+3),4] -> [6,4]
# [10,2,-2,24,1.5,0.6666666666666666]
My code so far:
from collections import Counter
def genOutputs(x,y,op=None):
    results = []
    if op == None:
        op = str(y)
    else:
        op = "("+str(op)+")"
    ops = ['+','-','*','/','rev/','rev-']
    z = 0
    #will do every operation to x and y now.
    #op stores the last computated bit (of other functions)
    while z < len(ops):
        if z == 4:
            try:
                results.append(eval(str(y) + "/" + str(x)))
                #yield eval(str(y) + "/" + str(x)), op + "/" + str(x)
            except:
                continue
        elif z == 5:
            results.append(eval(str(y) + "-" + str(x)))
            #yield eval(str(y) + "-" + str(x)), op + "-" + str(x)
        else:
            try:
                results.append(eval(str(x) + ops[z] + str(y)))
                #yield eval(str(x) + ops[z] + str(y)), str(x) + ops[z] + op
            except:
                continue
        z = z+1
    return results
def pickTwo(array):
    #returns an array with every 2-combo
    #from input array
    vomit = []
    a,b = 0,1
    while a < (len(array)-1):
        choice = [array[a],array[b]]
        vomit.append((choice,list((Counter(array) - Counter(choice)).elements())))
        if b < (len(array)-1):
            b = b+1
        else:
            b = a+2
            a = a+1
    return vomit
def reduceArray(array):
    if len(array) == 2:
        print("final",array)
        return genOutputs(array[0],array[1])
    else:
        choices = pickTwo(array)
        print(choices)
        for choice in choices:
            opsofchoices = genOutputs(choice[0][0],choice[0][1])
            for each in opsofchoices:
                newarray = list([each] + choice[1])
                print(newarray)
                return reduceArray(newarray)
reduceArray([1,2,3,4])

The largest issues when dealing with problems like this are handling operator precedence and parenthesis placement so that every possible number is produced from a given set. The easiest way to do this is to handle operations on a stack, corresponding to the reverse Polish notation of the infix expression. Once you do this, you can draw numbers and/or operations recursively until all n numbers and n-1 operations have been exhausted, and store the result. The code below generates all possible permutations of numbers (without replacement), operators (with replacement), and parenthesis placements to generate every possible value. Note that this is highly inefficient: addition and multiplication are commutative, so a + b equals b + a and only one of the two is necessary. Similarly, by the associative property, a + (b + c) equals (a + b) + c. The algorithm below is meant to be a simple example, however, and does not make such optimizations.
def expr_perm(values, operations="+-*/", stack=[]):
    solution = []
    if len(stack) > 1:
        for op in operations:
            new_stack = list(stack)
            new_stack.append("(" + new_stack.pop() + op + new_stack.pop() + ")")
            solution += expr_perm(values, operations, new_stack)
    if values:
        for i, val in enumerate(values):
            new_values = values[:i] + values[i+1:]
            solution += expr_perm(new_values, operations, stack + [str(val)])
    elif len(stack) == 1:
        return stack
    return solution
Usage:
result = expr_perm([4,5,6])
print("\n".join(result))
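
One way to finish the riddle from the question is to evaluate each generated string and keep those that hit the target. The sketch below repeats expr_perm from the answer so that it runs on its own; find_target is a hypothetical helper name, and eval is used only because every string is built by the program itself, never taken from user input.

```python
def expr_perm(values, operations="+-*/", stack=[]):
    # Copied from the answer above: builds fully parenthesised expressions.
    solution = []
    if len(stack) > 1:
        for op in operations:
            new_stack = list(stack)
            new_stack.append("(" + new_stack.pop() + op + new_stack.pop() + ")")
            solution += expr_perm(values, operations, new_stack)
    if values:
        for i, val in enumerate(values):
            new_values = values[:i] + values[i+1:]
            solution += expr_perm(new_values, operations, stack + [str(val)])
    elif len(stack) == 1:
        return stack
    return solution


def find_target(values, target):
    """Return every generated expression whose value equals target."""
    hits = []
    for expr in expr_perm(values):
        try:
            value = eval(expr)  # acceptable here: the strings are program-generated
        except ZeroDivisionError:
            continue  # skip expressions that divide by zero
        if abs(value - target) < 1e-9:  # tolerate float rounding from '/'
            hits.append(expr)
    return hits


hits = find_target([1, 2, 3, 4], 23)
print(len(hits), "expressions reach 23, for example", hits[0])
```

Comparing with a small tolerance rather than == avoids missing results that pass through a division.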

Python IndexError: list index out of range with long string

My code seems to work with shorter strings, but inexplicably to me gets stuck on others. The function of this is to replace characters with digits, and I have it print out the new string after each part is replaced. Any help you can give me is appreciated, thanks!
By the way, I did look at the similar questions on this and they did not answer my particular question, please don't remove my question.
possibleChars = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W',
'X','Y','Z','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v',
'w','x','y','z','1','2','3','4','5','6','7','8','9','0',' ',',','.','?','!','/','\\','[',']','{','}',
'|','<','>',';',':','+','=','-','_','(',')','#','#','$','%','^','&','*','~','`'] #0-92
possibleCharsToDigit = ['1','2','3','4','5','6','7','8','9','0','1','2','3','4','5','6','7','8','9','0','1','2','3',
'4','5','6','7','8','9','0','1','2','3','4','5','6','7','8','9','0','1','2','3','4','5','6','7','8',
'9','0','1','2','3','4','5','6','7','8','9','0','1','2','3','4','5','6','7','8','9','0','1','2','3',
'4','5','6','7','8','9','0','1','2','3','4','5','6','7','8','9','0','1','2','3'] #0-92
password = "How is your day today?"
def passwordToDigit(passToConvert):
    passLen = len(passToConvert) #puts the length of the password in a variable
    i = 0 #i is the selected character in the password
    j = 0 #j is the selected possible char, i.e. '0' is 'A' in possibleChars or '1' in possibleCharsToDigit
    while i < passLen:
        if passToConvert[i] == possibleChars[j]:
            passToConvert = passToConvert[0:i] + possibleCharsToDigit[j] + passToConvert[i + 1:]
            i += 1
            print passToConvert
        else:
            j += 1
    print passToConvert
passwordToDigit(password)

When you increment the j variable inside the while loop, notice that once j reaches the length of the possibleChars list, you are trying to access an element with an out-of-bounds index.

You should set j = 0 in the if passToConvert[i] == possibleChars[j] clause:
def passwordToDigit(passToConvert):
    passLen = len(passToConvert) #puts the length of the password in a variable
    i = 0 #i is the selected character in the password
    j = 0 #j is the selected possible char, i.e. '0' is 'A' in possibleChars or '1' in possibleCharsToDigit
    while i < passLen:
        if passToConvert[i] == possibleChars[j]:
            passToConvert = passToConvert[0:i] + possibleCharsToDigit[j] + passToConvert[i + 1:]
            i += 1
            j = 0
            print passToConvert
        else:
            j += 1
    print passToConvert

You increment j within your while loop without ever resetting it when you successfully match a character and move on to the next one. This will cause your code to fail as soon as a character appears earlier in possibleChars than the one matched before it.
To illustrate:
passwordToDigit('ABCDEFGHIJKLMNOP') #will work correctly
passwordToDigit('BA') #will fail with IndexError
Quick Solution
The quickest solution would be to reset the j index when you find a match, i.e.
# ...
if passToConvert[i] == possibleChars[j]:
    passToConvert = passToConvert[0:i] + possibleCharsToDigit[j] + passToConvert[i + 1:]
    i += 1
    print passToConvert
    j = 0 #Reset the j index to start searching from the beginning
else:
    #...
Dict Solution
You could also spend some time refactoring your code to use a dict to map characters to digits as in:
import string
charopts = string.ascii_uppercase + string.ascii_lowercase + string.digits[1:] + r'0 ,.?!/\[]{}|<>;:+=-_()##$%^&*~`'
char2dig = dict((k,str((i+1)%10)) for i,k in enumerate(charopts))
def passwordToDigitDic(passToConvert):
    newpass = ''
    for c in passToConvert:
        newpass += char2dig[c]
        print(newpass + passToConvert[len(newpass):])
passwordToDigitDic('ABCDEFGH')
passwordToDigitDic('HGEFBCA')
Note: if you are ever interested in doing the translation in one go, as opposed to step by step with prints, look into the string.translate function (str.translate together with str.maketrans in Python 3).
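
For the one-go translation, a Python 3 sketch with str.maketrans and str.translate could look like this. It reuses the charopts string from the dict example; password_to_digits is a hypothetical name chosen for illustration:

```python
import string

# Same character set as the dict example above.
charopts = (string.ascii_uppercase + string.ascii_lowercase + string.digits[1:]
            + r'0 ,.?!/\[]{}|<>;:+=-_()##$%^&*~`')
# The character at index i maps to the digit (i + 1) % 10, as in char2dig.
digits = ''.join(str((i + 1) % 10) for i in range(len(charopts)))
table = str.maketrans(charopts, digits)


def password_to_digits(password):
    """Translate the whole password in a single call."""
    return password.translate(table)


print(password_to_digits('HGEFBCA'))
```

Unlike the dict lookup, translate leaves any character that is missing from the table unchanged instead of raising KeyError.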

When i=passLen-1, then you are trying to access passToConvert[i+1], which is out of the range of passToConvert. Hence you are getting this error. Try this:
if passToConvert[i] == possibleChars[j]:
    if i<passLen-1:
        passToConvert = passToConvert[0:i] + possibleCharsToDigit[j] + passToConvert[i + 1:]
    else:
        passToConvert = passToConvert[0:i] + possibleCharsToDigit[j]
    i += 1
    print passToConvert
