CodingBat Warm up 2 Python last2 - python

I think I have the solution to this problem but when I run it on pythonfiddle.com or Canopy nothing comes up.
The problem is:
Given a string, return the count of the number of times that a
substring length 2 appears in the string and also as the last 2 chars
of the string, so "hixxxhi" yields 1 (we won't count the end
substring).
last2('hixxhi') → 1
last2('xaxxaxaxx') → 1
last2('axxxaaxx') → 2
My solution is :
def last2(str):
test2 = str[-2:]
start = 0
count = 0
while True:
if str.find(test2, start, -2) > 0:
count +=1
start = str.find(test2, start, -2) + 1
else:
break
return count
When I call the function last2, I get nothing. Is there something I'm missing?

str.find() returns -1 if a match is not found. If the match is found at the start of the string, 0 is returned, but your test condition excludes this case.
if str.find(test2, start, -1) > 0:
You want to match 0 too:
if str.find(test2, start, -2) >= 0:
You could avoid using str.find() twice here, and you want to allow for the one but last character to count too (xxxx has xx in there twice outside of matching the last two characters). Last but not least, if the string is shorter than length three, there never will be any matches:
def last2(value):
if len(value) < 3:
return 0
test2 = value[-2:]
start = 0
count = 0
while True:
index = value.find(test2, start, -1)
if index == -1:
break
count +=1
start = index + 1
return count
I've avoided shadowing the built-in str() function here too.
Demo:
>>> last2('hixxhi')
1
>>> last2('xaxxaxaxx')
1
>>> last2('axxxaaxx')
2

You've got an off by one error. str.find returns -1 if the string does not contain the substring, but it will return 0 if the substring is at the beginning of the string. Notice that your method works fine on the third example.
It should be:
if str.find(test2, start -2) >= 0:

Related

How to find the max number of times a sequence of characters repeats consecutively in a string? [duplicate]

This question already has answers here:
How to count consecutive repetitions of a substring in a string?
(4 answers)
Closed 1 year ago.
I'm working on a cs50/pset6/dna project. I'm struggling with finding a way to analyze a sequence of strings, and gather the maximum number of times a certain sequence of characters repeats consecutively. Here is an example:
String: JOKHCNHBVDBVDBVDJHGSBVDBVD
Sequence of characters I should look for: BVD
Result: My function should be able to return 3, because in one point the characters BVD repeat three times consecutively, and even though it repeats again two times, I should look for the time that it repeats the most number of times.
It's a bit lame, but one "brute-force"ish way would be to just check for the presence of the longest substring possible. As soon as a substring is found, break out of the loop:
EDIT - Using a function might be more straight forward:
def get_longest_repeating_pattern(string, pattern):
if not pattern:
return ""
for i in range(len(string)//len(pattern), 0, -1):
current_pattern = pattern * i
if current_pattern in string:
return current_pattern
return ""
string = "JOKHCNHBVDBVDBVDJHGSBVDBVD"
pattern = "BVD"
longest_repeating_pattern = get_longest_repeating_pattern(string, pattern)
print(len(longest_repeating_pattern))
EDIT - explanation:
First, just a simple for-loop that starts at a larger number and goes down to a smaller number. For example, we start at 5 and go down to 0 (but not including 0), with a step size of -1:
>>> for i in range(5, 0, -1):
print(i)
5
4
3
2
1
>>>
if string = "JOKHCNHBVDBVDBVDJHGSBVDBVD", then len(string) would be 26, if pattern = "BVD", then len(pattern) is 3.
Back to my original code:
for i in range(len(string)//len(pattern), 0, -1):
Plugging in the numbers:
for i in range(26//3, 0, -1):
26//3 is an integer division which yields 8, so this becomes:
for i in range(8, 0, -1):
So, it's a for-loop that goes from 8 to 1 (remember, it doesn't go down to 0). i takes on the new value for each iteration, first 8 , then 7, etc.
In Python, you can "multiply" strings, like so:
>>> pattern = "BVD"
>>> pattern * 1
'BVD'
>>> pattern * 2
'BVDBVD'
>>> pattern * 3
'BVDBVDBVD'
>>>
A slightly less bruteforcey solution:
string = 'JOKHCNHBVDBVDBVDJHGSBVDBVD'
key = 'BVD'
len_k = len(key)
max_l = 0
passes = 0
curr_len=0
for i in range(len(string) - len_k + 1): # split the string into substrings of same len as key
if passes > 0: # If key was found in previous sequences, pass ()this way, if key is 'BVD', we will ignore 'VD.' and 'D..'
passes-=1
continue
s = string[i:i+len_k]
if s == key:
curr_len+=1
if curr_len > max_l:
max_l=curr_len
passes = len(key)-1
if prev_s == key:
if curr_len > max_l:
max_l=curr_len
else:
curr_len=0
prev_s = s
print(max_l)
You can do that very easily, elegantly and efficiently using a regex.
We look for all sequences of at least one repetition of your search string. Then, we just need to take the maximum length of these sequences, and divide by the length of the search string.
The regex we use is '(:?<your_sequence>)+': at least one repetition (the +) of the group (<your_sequence>). The :? is just here to make the group non capturing, so that findall returns the whole match, and not just the group.
In case there is no match, we use the default parameter of the max function to return 0.
The code is very short, then:
import re
def max_consecutive_repetitions(search, data):
search_re = re.compile('(?:' + search + ')+')
return max((len(seq) for seq in search_re.findall(data)), default=0) // len(search)
Sample run:
print(max_consecutive_repetitions("BVD", "JOKHCNHBVDBVDBVDJHGSBVDBVD"))
# 3
This is my contribution, I'm not a professional but it worked for me (sorry for bad English)
results = {}
# Loops through all the STRs
for i in range(1, len(reader.fieldnames)):
STR = reader.fieldnames[i]
j = 0
s=0
pre_s = 0
# Loops through all the characters in sequence.txt
while j < (len(sequence) - len(STR)):
# checks if the character we are currently looping is the same than the first STR character
if STR[0] == sequence[j]:
# while the sub-string since j to j - STR lenght is the same than STR, I called this a streak
while sequence[j:(j + len(STR))] == STR:
# j skips to the end of sub-string
j += len(STR)
# streaks counter
s += 1
# if s > 0 means that that the whole STR and sequence coincided at least once
if s > 0:
# save the largest streak as pre_s
if s > pre_s:
pre_s = s
# restarts the streak counter to continue exploring the sequence
s=0
j += 1
# assigns pre_s value to a dictionary with the current STR as key
results[STR] = pre_s
print(results)

Is there a way to increment the iterator if an 'if' condition is met

I'm solving this HackerRank challenge:
Alice has a binary string. She thinks a binary string is beautiful if and only if it doesn't contain the substring '010'.
In one step, Alice can change a 0 to a 1 or vice versa. Count and print the minimum number of steps needed to make Alice see the string as beautiful.
So basically count the number of '010' occurrences in the string 'b' passed to the function.
I want to increment i by 2 once the if statement is true so that I don't include overlapping '010' strings in my count.
And I do realize that I can just use the count method but I wanna know why my code isn't working the way I want to it to.
def beautifulBinaryString(b):
count = 0
for i in range(len(b)-2):
if b[i:i+3]=='010':
count+=1
i+=2
return count
Input: 0101010
Expected Output: 2
Output I get w/ this code: 3
You are counting overlapping sequences. For your input 0101010 you find 010 three times, but the middle 010 overlaps with the outer two 010 sequences:
0101010
--- ---
---
You can't increment i in a for loop, because the for loop construct sets i at the top. Giving i a different value inside the loop body doesn't change this.
Don't use a for loop; you could use a while loop:
def beautifulBinaryString(b):
count = 0
i = 0
while i < len(b) - 2:
if b[i:i+3]=='010':
count += 1
i += 2
i += 1
return count
A simpler solution is to just use b.count("010"), as you stated.
If you want to do it using a for loop, you can add a delta variable to keep track of the number of positions that you have to jump over the current i value.
def beautifulBinaryString(b):
count = 0
delta = 0
for i in range(len(b)-2):
try:
if b[i+delta:i+delta+3]=='010':
count+=1
delta=delta+2
except IndexError:
break
return count
You don't need to count the occurrences; as soon as you find one occurrence, the string is "ugly". If you never find one, it's beautiful.
def is_beautiful(b):
for i in range(len(b) - 2):
if b[i:i+3] == '010':
return False
return True
You can also avoid the slicing by simply keeping track of whether you've started to see 010:
seen_0 = False
seen_01 = False
for c in b:
if seen_01 and c == '0':
return False
elif seen_1 and c == '1':
seen_01 = True
elif c == '0':
seen_0 = True
else:
# c == 1, but it doesn't follow a 0
seen_0 = False
seen_01 = False
return True

Intersection between two list of strings python

This is in reference to problem that is done in Java:
Finding the intersection between two list of string candidates.
I tried to find an equivalent solution in python:
def solution(S):
length = len(S)
if length == 0:
return 1
count = 0
i = 0
while i <= length:
prefix = S[0:i:]
suffix = S[length-i:length]
print suffix
if prefix == suffix:
count += 1
i += 1
return count
print solution("")
print solution("abbabba")
print solution("codility")
I know I am not doing the step in terms of suffix.
I am getting the values as 1,4,2 instead of 1,4,0
Your current code runs through giving the following prefix, suffix pairs for the second two examples:
a a
ab ba
abb bba
abba abba
abbab babba
abbabb bbabba
abbabba abbabba
c y
co ty
cod ity
codi lity
codil ility
codili dility
codilit odility
codility codility
I presume you are trying to return the number of times the first n characters of a string are the same as the last n. The reason the word codility is returning 2 is because you are starting the index from zero, so it is matching the empty string and the full string. Try instead:
def solution(S):
length = len(S)
if length == 0:
return 1
count = 0
i = 1 # Don't test the empty string!
while i < length: # Don't test the whole string! Use '<'
prefix = S[:i] # Up to i
suffix = S[length-i:] # length - i to the end
print prefix, suffix
if prefix == suffix:
count += 1
i += 1
return count
print solution("")
print solution("abbabba")
print solution("codility")
This returns 1, 2, 0.
Perhaps you were looking to test if the prefix is the same as the suffix backwards, and only test half the length of the string? In this case you should try the following:
def solution(S):
length = len(S)
if length == 0:
return 1
count = 0
i = 1 # Don't test the empty string!
while i <= (length + 1)/2: # Test up to halfway
prefix = S[:i] # Up to i
suffix = S[:length-i-1:-1] # Reverse string, up to length - i - 1
print prefix, suffix
if prefix == suffix:
count += 1
i += 1
return count
print solution("")
print solution("abbabba")
print solution("codility")
This returns 1, 4, 0.

Finding the length of longest repeating?

I have tried plenty of different methods to achieve this, and I don't know what I'm doing wrong.
reps=[]
len_charac=0
def longest_charac(strng)
for i in range(len(strng)):
if strng[i] == strng[i+1]:
if strng[i] in reps:
reps.append(strng[i])
len_charac=len(reps)
return len_charac
Remember in Python counting loops and indexing strings aren't usually needed. There is also a builtin max function:
def longest(s):
maximum = count = 0
current = ''
for c in s:
if c == current:
count += 1
else:
count = 1
current = c
maximum = max(count,maximum)
return maximum
Output:
>>> longest('')
0
>>> longest('aab')
2
>>> longest('a')
1
>>> longest('abb')
2
>>> longest('aabccdddeffh')
3
>>> longest('aaabcaaddddefgh')
4
Simple solution:
def longest_substring(strng):
len_substring=0
longest=0
for i in range(len(strng)):
if i > 0:
if strng[i] != strng[i-1]:
len_substring = 0
len_substring += 1
if len_substring > longest:
longest = len_substring
return longest
Iterates through the characters in the string and checks against the previous one. If they are different then the count of repeating characters is reset to zero, then the count is incremented. If the current count beats the current record (stored in longest) then it becomes the new longest.
Compare two things and there is one relation between them:
'a' == 'a'
True
Compare three things, and there are two relations:
'a' == 'a' == 'b'
True False
Combine these ideas - repeatedly compare things with the things next to them, and the chain gets shorter each time:
'a' == 'a' == 'b'
True == False
False
It takes one reduction for the 'b' comparison to be False, because there was one 'b'; two reductions for the 'a' comparison to be False because there were two 'a'. Keep repeating until the relations are all all False, and that is how many consecutive equal characters there were.
def f(s):
repetitions = 0
while any(s):
repetitions += 1
s = [ s[i] and s[i] == s[i+1] for i in range(len(s)-1) ]
return repetitions
>>> f('aaabcaaddddefgh')
4
NB. matching characters at the start become True, only care about comparing the Trues with anything, and stop when all the Trues are gone and the list is all Falses.
It can also be squished into a recursive version, passing the depth in as an optional parameter:
def f(s, depth=1):
s = [ s[i] and s[i]==s[i+1] for i in range(len(s)-1) ]
return f(s, depth+1) if any(s) else depth
>>> f('aaabcaaddddefgh')
4
I stumbled on this while trying for something else, but it's quite pleasing.
You can use itertools.groupby to solve this pretty quickly, it will group characters together, and then you can sort the resulting list by length and get the last entry in the list as follows:
from itertools import groupby
print(sorted([list(g) for k, g in groupby('aaabcaaddddefgh')],key=len)[-1])
This should give you:
['d', 'd', 'd', 'd']
This works:
def longestRun(s):
if len(s) == 0: return 0
runs = ''.join('*' if x == y else ' ' for x,y in zip(s,s[1:]))
starStrings = runs.split()
if len(starStrings) == 0: return 1
return 1 + max(len(stars) for stars in starStrings)
Output:
>>> longestRun("aaabcaaddddefgh")
4
First off, Python is not my primary language, but I can still try to help.
1) you look like you are exceeding the bounds of the array. On the last iteration, you check the last character against the character beyond the last character. This normally leads to undefined behavior.
2) you start off with an empty reps[] array and compare every character to see if it's in it. Clearly, that check will fail every time and your append is within that if statement.
def longest_charac(string):
longest = 0
if string:
flag = string[0]
tmp_len = 0
for item in string:
if item == flag:
tmp_len += 1
else:
flag = item
tmp_len = 1
if tmp_len > longest:
longest = tmp_len
return longest
This is my solution. Maybe it will help you.
Just for context, here is a recursive approach that avoids dealing with loops:
def max_rep(prev, text, reps, rep=1):
"""Recursively consume all characters in text and find longest repetition.
Args
prev: string of previous character
text: string of remaining text
reps: list of ints of all reptitions observed
rep: int of current repetition observed
"""
if text == '': return max(reps)
if prev == text[0]:
rep += 1
else:
rep = 1
return max_rep(text[0], text[1:], reps + [rep], rep)
Tests:
>>> max_rep('', 'aaabcaaddddefgh', [])
4
>>> max_rep('', 'aaaaaabcaadddddefggghhhhhhh', [])
7

Count overlapping substring in a string [duplicate]

This question already has answers here:
String count with overlapping occurrences [closed]
(25 answers)
Closed 1 year ago.
Say I have string = 'hannahannahskdjhannahannah' and I want to count the number of times the string hannah occurs, I can't simply use count, because that only counts the substring once in each case. That is, I am expecting to return 4 but only returns 2 when I run this with string.count('hannah').
You could use a running index to fetch the next occurance:
bla = 'hannahannahskdjhannahannah'
cnt = 0
idx = 0
while True:
idx = bla.find('hannah', idx)
if idx >= 0:
cnt += 1
idx += 1
else:
break
print(cnt)
Gives:
>> 4
How about something like this?
>>> d = {}
>>> string = 'hannahannahskdjhannahannah'
>>> for i in xrange(0,len(string)-len('hannah')+1):
... if string[i:i+len('hannah')] == 'hannah':
... d['hannah'] = d.get('hannah',0)+1
...
>>> d
{'hannah': 4}
>>>
This searches the string for hannah by splicing the string iteratively from index 0 all the way up to the length of the string minus the length of hannah
'''
s: main string
sub: sub-string
count: number of sub-strings found
p: use the found sub-string's index in p for finding the next occurrence of next sub-string
'''
count=0
p=0
for letter in s:
p=s.find(sub,p)
if(p!=-1):
count+=1
p+=1
print count
If you want to count also nonconsecutive substrings, this is the way to do it
def subword(lookup,whole):
if len(whole)<len(lookup):
return 0
if lookup==whole:
return 1
if lookup=='':
return 1
if lookup[0]==whole[0]:
return subword(lookup[1:],whole[1:])+subword(lookup,whole[1:])
return subword(lookup,whole[1:])
def Count_overlap(string, substring):
count = 0
start = 0
while start < len(string):
pos = string.find(substring, start)
if pos != -1:
start = pos + 1
count += 1
else:
break
return count
string = "hannahannahskdjhannahannah"
print(Count_overlap(string, "hannah"))
Don't want to answer this for you as it's simple enough to work out yourself.
But if I were you I'd use the string.find() method which takes the string you're looking for and the position to start looking from, combined with a while loop which uses the result of the find method as it's condition in some way.
That should in theory give you the answer.

Categories

Resources