Accessing two following indexes in a loop - python

I'm new to Python (although I believe this is more of a logic question than a syntax question), and I wonder what the proper way is to access two consecutive objects in a loop.
I can't really provide a specific example without getting too cumbersome with my explanation but let's just say that I usually try to tackle this with either [index + 1] or [index - 1] and both are problematic when it comes to either the last (IndexError) or first (addresses the last position right at the beginning) iterations respectively.
Is there a well known way to address this? I haven't really seen any questions regarding this floating around so it made me think it's basic logic I'm missing here.
For example, this piece of code wouldn't have worked had I not wrapped everything in try/except, and the second inner loop works only because it checks for identical characters; otherwise it could have been a mess.
(Explanation for clarity: it receives a string (my_string) and a number (k) and checks whether a sequence of identical characters of length k exists in my_string.)
# ex2 5
my_string = 'abaadddefggg'
sub_my_string = ''
k = 9
count3 = 0
try:
    for index in range(len(my_string)):
        i = 0
        while i < k:
            sub_my_string += my_string[index + i]
            i += 1
        for index2 in range(len(sub_my_string)):
            if sub_my_string[index2] == sub_my_string[index2 - 1]:
                count3 += 1
        if count3 == k:
            break
        else:
            sub_my_string = ""
            count3 = 0
    print(f"For length {k}, found the substring {sub_my_string}!")
except IndexError:
    print(f"Didn't find a substring of length {k}")
Thanks a lot

First off, by definition you need to give special attention to the first or last element, because they really don't have a pair.
Second-off, I personally tend to use list-comprehensions of the following type for these cases -
[something_about_the_two_consecutive_elements(x, y) for x, y in zip(my_list, my_list[1:])]
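To make the zip pattern concrete, here is a minimal sketch. The helper names adjacent_pairs and has_adjacent_duplicate are made up for illustration; the only technique used is zipping a sequence with a copy of itself shifted by one, which never indexes past either end.

```python
def adjacent_pairs(items):
    """Return (current, next) tuples; the last item has no partner and is skipped."""
    # zip stops at the shorter input, so no IndexError and no wrap-around to items[-1].
    return list(zip(items, items[1:]))


def has_adjacent_duplicate(text):
    """True if any character is immediately followed by the same character."""
    return any(a == b for a, b in zip(text, text[1:]))


print(adjacent_pairs("abc"))            # [('a', 'b'), ('b', 'c')]
print(has_adjacent_duplicate("abaad"))  # True
```

A single-element or empty sequence simply produces no pairs, which is usually the behaviour you want for the first/last element problem.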
And last but not least, the whole code snippet seems like major overkill. How about a simple one-liner -
my_string = 'abaadddefggg'
k = 3
existing_substrings = ([x * k for x in set(my_string) if x * k in my_string])
print(f'For length {k}, found substrings {existing_substrings}')
(To be adapted by one's needs of course)
Explanation:
For each of the unique characters in the string, we can check if a string of that character repeated k times appears in my_string.
set(my_string) gives a set of the unique characters over which we iterate (that's the for x in set(my_string) in the list comprehension).
Taking a character x and multiplying by k gives a string xx...x of length k.
So x * k in my_string tests whether my_string includes the substring xx...x.
Summing up the list-comprehension, we return only characters for which x * k in my_string is True.

If I am understanding what you are trying to achieve, I would approach this differently using string slices and a set.
my_string = "abaadddefggg"
sub_my_string = ""
k = 3
count3 = 0
found = False
for index, _ in enumerate(my_string):
    if index + k > len(my_string):
        continue
    sub_my_string = my_string[index : index + k]
    if len(set(sub_my_string)) == 1:
        found = True
        break
if found:
    print(f"For length {k}, found the substring {sub_my_string}!")
else:
    print(f"Didn't find a substring of length {k}")
Here we use:
enumerate, as this usually signals that we are looking at the indices of an iterable.
A check on whether the slice would take us past the end of the string, as there's no point in checking those positions.
A string slice to take a substring of the string.
A set to see whether all the characters are the same.
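The steps listed above can also be folded into a small reusable function. This is only a sketch: the name find_run is hypothetical, and it bounds the range up front instead of skipping out-of-range slices with continue.

```python
def find_run(text, k):
    """Return the first length-k run of one repeated character, or None."""
    if k <= 0:
        return None
    # Only start positions that leave room for a full window of k characters.
    for start in range(len(text) - k + 1):
        window = text[start:start + k]
        # A window made of a single repeated character collapses to a one-element set.
        if len(set(window)) == 1:
            return window
    return None


print(find_run("abaadddefggg", 3))  # ddd
print(find_run("abaadddefggg", 9))  # None
```

Returning None instead of printing keeps the search separate from the reporting, so the caller decides what message to show.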

Related

Reducing a string by detecting patterns

Given a string, I would like to detect the repeating substrings, and then reduce abab to (ab)2.
For instance, ababababacdecdecdeababab would reduce to (ab)4a(cde)3(ab)3.
The string does not have the same character twice in a row. So, aaab is an invalid string.
Here is the Python that I wrote:
import collections
def superscript(n):
    return "".join(["⁰¹²³⁴⁵⁶⁷⁸⁹"[ord(c)-ord('0')] for c in str(n)])
signature = 'hdhdhdhdcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdhdcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdhdcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdhdcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdhdcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdhdcgcgcgcgcbfb'
d = {}
processed = []
for k in range(2, len(signature)):
    i = 0
    j = i + k
    while j <= len(signature):
        repeat_count = 0
        while signature[i:i+k] == signature[j:j+k]:
            repeat_count += 1
            j += k
        if repeat_count > 0 and i not in processed:
            d[i] = [i+k, repeat_count + 1]
            for j in range(i, (i+k)*repeat_count + 1):
                processed.append(j)
            i = j
            j = i + k
        else:
            i += 1
            j = i + k
od = collections.OrderedDict(sorted(d.items()))
output = ''
for k,v in od.items():
    print(k, v)
    output += '(' + signature[k:v[0]] + ')' + superscript(v[1])
Which aims to detect the repeating substrings of length 2, 3, 4, and so on. I mark the start and the end of a repeating substring by using a dict. I also mark the index of the processed characters by keeping a list to avoid replacing (ab)4 by (abab)2 (since the latter one will overwrite the beginning index in the dict).
The example string I work with is hdhdhdhdcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdhdcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdhdcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdhdcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdhdcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdhdcgcgcgcgcbfb which should output (hd)4(cg)4c(bf)4b(ae)4a(dh)4d(cg)4c(bf)4b(ae)4a(dh)4d(cg)4c(bf)4b(ae)4a(dh)4d(cg)4cbfb.
However, I get this output:
(hd)4(dcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdh)5(cg)4(ea)2(dh)4(hd)2(cg)4
I don't know whether this is a well-known problem, but I couldn't find any resources. I don't mind the time complexity of the algorithm.
Where did I make a mistake?
The algorithm I try to describe looks like this:
First, find the repeating substrings of length 2, then 3, then 4, ..., up to the length of the input string.
Then, do the same operation until there is no repetition at all.
A step-by-step example looks like this:
abcabcefefefghabcdabcdefefefghabcabcefefefghabcdabcdefefefgh
abcabc(ef)²ghabcdabcd(ef)²ghabcabc(ef)²ghabcdabcd(ef)²gh
(abc)³(ef)²ghabcdabcd(ef)²gh(abc)³(ef)²ghabcdabcd(ef)²gh
(abc)³(ef)²gh(abcd)²(ef)²gh(abc)³(ef)²gh(abcd)²(ef)²gh
((abc)³(ef)²gh(abcd)²(ef)²gh)²
You can use re.sub to match any repeating two chars and then pass a replacement function that formats the pattern you desire
import re
def superscript(n):
    return "".join(["⁰¹²³⁴⁵⁶⁷⁸⁹"[ord(c)-ord('0')] for c in str(n)])
s = 'hdhdhdhdcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdhdcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdhdcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdhdcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdhdcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdhdcgcgcgcgcbfb'
max_length = 5
result = re.sub(
    rf'(\w{{2,{max_length}}}?)(\1+)', # Omit the second number in the repetition to match any number of repeating chars (\w{2,}?)(\1+)
    lambda m: f'({m.group(1)}){superscript(len(m.group(0))//len(m.group(1)))}',
    s
)
print(result) # (hd)⁴(cg)⁴c(bf)⁴b(ae)⁴a(dh)⁴d(cg)⁴c(bf)⁴b(ae)⁴a(dh)⁴d(cg)⁴c(bf)⁴b(ae)⁴a(dh)⁴d(cg)⁴c(bf)⁴b(ae)⁴a(dh)⁴d(cg)⁴c(bf)⁴b(ae)⁴a(dh)⁴d(cg)⁴cbfb
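For a smaller input that is easier to trace by hand, the same idea can be applied to the short example from the question. This is a sketch under assumptions: the function name reduce_repeats and the translation table SUPERSCRIPTS are invented here, and the pattern uses the unbounded form (\w{2,}?) mentioned in the comment above rather than a max_length cap.

```python
import re

# Maps ASCII digits to their superscript forms.
SUPERSCRIPTS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def reduce_repeats(text):
    """Collapse immediately repeated units of 2+ characters into (unit)count."""
    def fmt(match):
        unit = match.group(1)
        # Whole match length divided by unit length gives the repeat count.
        count = len(match.group(0)) // len(unit)
        return f"({unit}){str(count).translate(SUPERSCRIPTS)}"

    # Lazy quantifier: prefer the shortest repeating unit, so (ab)4 wins over (abab)2.
    return re.sub(r"(\w{2,}?)\1+", fmt, text)


print(reduce_repeats("ababababacdecdecdeababab"))  # (ab)⁴a(cde)³(ab)³
```

Note that this makes a single left-to-right pass; it does not perform the nested, repeated reduction described in the question's step-by-step example.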
The problem in your code happens when you put together the list of repeating patterns. When you are merging patterns of length 2 and patterns of length 3, you are using patterns that are not compatible with each other.
hdhdhdhd = (hd)4 starts at index 0 and ends at index 7 (included).
(dcgcgcgcgcbfbfbfbfbaeaeaeaeadhdhdhdh)5, which is a correct pattern in your string, starts at index 7 (included).
This means when you merge the two patterns, you get an incorrect end result because the letter at index 7 is shared.
This problem stems from the fact that one pattern is even in length, while the other is odd and their limits are not aligning. So, they don't even overwrite each other in d and you end up with your result.
I think you tried to solve this problem using the dictionary d (with the starting index as key) together with the processed list, but there are still a couple of problems.
for j in range(i, (i+k)*repeat_count + 1): should be for l in range(i, j), otherwise you are skipping the last index of the pattern (j). Also, I changed the loop index to l because j was already used. This fixed the problem I described above.
Even with that fixed, there is still an issue. You check for patterns starting from length 2 (for k in range(2, len(signature))), so single letters not belonging to any pattern, like the c in (hd)4(cg)4c(bf)4 will never make it in the dictionary and therefore you will still have overlapping patterns with different lengths at those positions.

Check how many character need to be deleted to make an anagram in Python

I wrote python code to check how many characters need to be deleted from two strings for them to become anagrams of each other.
This is the problem statement: "Given two strings, a and b, that may or may not be of the same length, determine the minimum number of character deletions required to make a and b anagrams. Any characters can be deleted from either of the strings."
def makeAnagram(a, b):
    # Write your code here
    ac=0 # to count the no. of occurrences of a character in a
    bc=0 # to count the no. of occurrences of a character in b
    p=False # used to store the result of whether an element is in that string
    c=0 # count of characters to be deleted to make these two strings anagrams
    t=[] # list of previously checked characters
    for x in a:
        if x in t == True:
            continue
        ac=a.count(x)
        t.insert(0,x)
        for y in b:
            p = x in b
            if p==True:
                bc=b.count(x)
                if bc!=ac:
                    d=ac-bc
                    c=c+abs(d)
            elif p==False:
                c=c+1
    return(c)
You can use collections.Counter for this:
from collections import Counter
def makeAnagram(a, b):
    return sum((Counter(a) - Counter(b) | Counter(b) - Counter(a)).values())
Counter(x) (where x is a string) returns a dictionary that maps characters to how many times they appear in the string.
Counter(a) - Counter(b) gives you a dictionary that maps characters which are overabundant in a to how many more times they appear in a than in b. Characters whose resulting count would be zero or negative are dropped.
Counter(b) - Counter(a) is like above, but for characters which are overabundant in b.
The | merges the two resulting counters. We then take the values of this, and sum them to get the total number of characters which are overrepresented in either string. This is equivalent to the minimum number of characters that need to be deleted to form an anagram.
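A short sketch of what each piece evaluates to may help. The function name deletions_to_anagram and the sample strings are chosen for illustration; only collections.Counter and its subtraction operator are relied on.

```python
from collections import Counter


def deletions_to_anagram(a, b):
    """Count the characters that must be deleted so a and b become anagrams."""
    surplus_in_a = Counter(a) - Counter(b)  # keeps only positive differences
    surplus_in_b = Counter(b) - Counter(a)
    return sum(surplus_in_a.values()) + sum(surplus_in_b.values())


print(Counter("bcadeh") - Counter("hea"))     # Counter({'b': 1, 'c': 1, 'd': 1})
print(Counter("hea") - Counter("bcadeh"))     # Counter()
print(deletions_to_anagram("bcadeh", "hea"))  # 3
```

Adding the two sums gives the same total as merging with |, because a character can be in surplus in at most one of the two strings.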
As for why your code doesn't work, I can't pin down any one problem with it. To obtain the code below, all I did was some simplification (e.g. removing unnecessary variables, looping over a and b together, removing == True and == False, replacing t with a set, giving variables descriptive names, etc.), and the code began working. Here is that simplified working code:
def makeAnagram(a, b):
    c = 0 # count of characters to be deleted to make these two strings anagrams
    seen = set() # set of previously checked characters
    for character in a + b:
        if character not in seen:
            seen.add(character)
            c += abs(a.count(character) - b.count(character))
    return c
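One detail in the question's code that may be worth checking is the test x in t == True. This is an observation about Python's comparison chaining, not a complete diagnosis of the original code: in and == are both comparison operators, so the expression is evaluated as (x in t) and (t == True). The variable names below mirror the question; the values are made up.

```python
t = ["a", "b"]  # previously seen characters
x = "a"

# Chained comparison: equivalent to (x in t) and (t == True).
chained = x in t == True
# Plain membership test, which is what was presumably intended.
plain = x in t

print(chained, plain)  # False True
```

Because a list never equals True, the chained form is always False, so the continue that should skip already-seen characters would never run.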
I recommend you make it a point to learn how to write simple/short code. It may not seem important compared to actually tackling the algorithms and getting results. It may seem like cleanup or styling work. But it pays off enormously. Bugs are harder to introduce in simple code, and easier to spot. Oftentimes simple code will be more performant than equivalent complex code too, either because the programmer was able to more easily see ways to improve it, or because the more performant approach just arose naturally from the cleaner code.
Assuming there are only lowercase letters
The idea is to build a character-count array for each string, storing the frequency of each character. Then iterate over the two count arrays: the difference in frequency for each letter, abs(count1[i] - count2[i]), is the number of occurrences of that letter that must be removed from one string or the other.
CHARS = 26
# function to calculate minimum
# numbers of characters
# to be removed to make two
# strings anagram
def remAnagram(str1, str2):
    count1 = [0]*CHARS
    count2 = [0]*CHARS
    i = 0
    while i < len(str1):
        count1[ord(str1[i])-ord('a')] += 1
        i += 1
    i =0
    while i < len(str2):
        count2[ord(str2[i])-ord('a')] += 1
        i += 1
    # traverse count arrays to find
    # number of characters
    # to be removed
    result = 0
    for i in range(26):
        result += abs(count1[i] - count2[i])
    return result
Here time complexity is O(n + m) where n and m are the length of the two strings
Space complexity is O(1) as we use only array of size 26
This can be further optimised by just using a single array for taking the count.
In this case for string s1 -> we increment the counter
for string s2 -> we decrement the counter
def makeAnagram(a, b):
    buffer = [0] * 26
    for char in a:
        buffer[ord(char) - ord('a')] += 1
    for char in b:
        buffer[ord(char) - ord('a')] -= 1
    return sum(map(abs, buffer))
if __name__ == "__main__" :
    str1 = "bcadeh"
    str2 = "hea"
    print(makeAnagram(str1, str2))
Output : 3

Optimal brute force solution for finding longest palindromic substring

This is my current approach:
def isPalindrome(s):
    if (s[::-1] == s):
        return True
    return False
def solve(s):
    l = len(s)
    ans = ""
    for i in range(l):
        subStr = s[i]
        for j in range(i + 1, l):
            subStr += s[j]
            if (j - i + 1 <= len(ans)):
                continue
            if (isPalindrome(subStr)):
                ans = max(ans, subStr, key=len)
    return ans if len(ans) > 1 else s[0]
print(solve(input()))
My code exceeds the time limit according to the auto scoring system. I've already spent some time searching on Google; all of the solutions I found either use the same idea with no optimization or use dynamic programming, but sadly I must use only brute force to solve this problem. I tried to cut the work down by skipping all substrings that are shorter than the longest palindromic substring found so far, but it still fails to meet the time requirement. Is there any other way to break out of these loops earlier, or a more time-efficient approach than the above?
With subStr += s[j], a new string is created over the length of the previous subStr. And with s[::-1], the substring from the previous offset j is copied over and over again. Both are inefficient because strings are immutable in Python and have to be copied as a new string for any string operation. On top of that, the string comparison in s[::-1] == s is also inefficient because you've already compared all of the inner substrings in the previous iterations and need to compare only the outermost two characters at the current offset.
You can instead keep track of just the index and the offset of the longest palindrome so far, and only slice the string upon return. To account for palindromes of both odd and even lengths, you can either increase the index by 0.5 at a time, or double the length to avoid having to deal with float-to-int conversions:
def solve(s):
    length = len(s) * 2
    index_longest = offset_longest = 0
    for index in range(length):
        offset = 0
        for offset in range(1 + index % 2, min(index, length - index), 2):
            if s[(index - offset) // 2] != s[(index + offset) // 2]:
                offset -= 2
                break
        if offset > offset_longest:
            index_longest = index
            offset_longest = offset
    return s[(index_longest - offset_longest) // 2: (index_longest + offset_longest) // 2 + 1]
Solved by using the "Expand Around Center" approach, thanks @Maruthi Adithya
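The post above does not include its code, so what follows is only a minimal sketch of the commonly described "expand around center" technique, not the poster's actual solution. The function name longest_palindrome is an assumption. Each index is tried as the middle of an odd-length palindrome and as the left-middle of an even-length one, and the window grows while the outer characters match.

```python
def longest_palindrome(s):
    """Return the longest palindromic substring of s (the first one on ties)."""
    if not s:
        return ""
    best_start, best_len = 0, 1
    for center in range(len(s)):
        # (center, center) covers odd lengths; (center, center + 1) covers even lengths.
        for left, right in ((center, center), (center, center + 1)):
            while left >= 0 and right < len(s) and s[left] == s[right]:
                left -= 1
                right += 1
            # The loop overshoots by one position on each side.
            length = right - left - 1
            if length > best_len:
                best_start, best_len = left + 1, length
    return s[best_start:best_start + best_len]


print(longest_palindrome("xyzabccbalmn"))  # abccba
```

Only the two outermost characters are compared at each step and no intermediate strings are built, which addresses both inefficiencies described earlier in this thread. The worst case is still quadratic in the length of the string.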
This modification of your code should improve performance. You can stop as soon as the longest substring still possible is no longer than the answer you have already found. Also, you should start your second loop at i + len(ans) + 1 instead of i + 1 to avoid useless iterations:
def solve(s):
    l = len(s)
    ans = ""
    for i in range(l):
        # no substring starting at i can be longer than the current answer
        if (l-i+1 <= len(ans)):
            break
        # only try end positions that give a substring longer than ans
        for j in range(i + len(ans) + 1, l+1):
            subStr = s[i:j]
            if (isPalindrome(subStr)):
                ans = subStr
    return ans if len(ans) > 1 else s[0]
This is a solution that has a time complexity greater than the solutions provided.
Note: This post is to think about the problem better and does not specifically answer the question. I have taken a mathematical approach to find a time complexity greater than 2^L (where L is size of input string)
Note: This is a post to discuss potential algorithms. You will not find the answer here. And the logic shown here has not been proven extensively.
Do let me know if there is something that I haven't considered.
Approach: Create the set of possible substrings. Compare and find the maximum pair* from this set that has the longest possible palindrome.
Example case with input string: "abc".
In this example, substring set has: "a","b","c","ab","ac","bc","abc".
7 elements.
Comparing each element with all other elements will involve: 7^2 = 49 calculations.
Hence, input size is 3 & no of calculations is 49.
Time Complexity:
First compute time complexity for generating the substring set:
$$\sum_{a=1}^{L} \left( C_{a}^{L} \right)$$
Here, we are adding all the different substring size combination from the input size L.
To make it clear: In the above example input size is 3. So we find all the pairs with size =1 (i.e: "a","b","c"). Then size =2 (i.e: "ab","ac","bc") and finally size = 3 (i.e: "abc").
So choosing 1 character from input string = combination of taking L things 1 at a time without repetition.
In our case number of combinations = 3.
This can be mathematically shown as (where a = 1):
$$C_{a}^{L}$$
Similarly choosing 2 char from input string = 3
Choosing 3 char from input string = 1
Finding time complexity of palindrome pair from generated set with maximum length:
Size of generated set: N
For this we have to compare each string in set with all other strings in set.
So N*N, or 2 for loops. Hence the final time complexity is:
$$\sum_{a=1}^{L} \left( C_{a}^{L} \right)^{2}$$
This is a diverging function greater than 2^L for L > 1.
However, there can be multiple optimizations applied to this. For example: there is no need to compare "a" with "abc" as "a" will also be compared with "a". Even if this optimization is applied, it will still have a time complexity > 2^L (For the most cases).
Hope this gave you a new perspective to the problem.
PS: This is my first post.
Rather than growing substrings from the beginning of the string, you should start from the middle of a candidate palindrome and expand outward.
For example, for the string xyzabccbalmn, your solution will cost ~ 6 * 11 comparison but searching from the middle will cost ~ 11 * 2 + 2 operations
But anyhow, brute-forcing will never ensure that your solution will run fast enough for any arbitrary string.
Try this:
def solve(s):
    if len(s)==1:
        print(0)
        return '1'
    if len(s)<=2 and not(isPalindrome(s)):
        print (0)
        return '1'
    elif isPalindrome(s):
        print( len(s))
        return '1'
    elif isPalindrome(s[0:len(s)-1]) or isPalindrome(s[1:len(s)]):
        print (len(s)-1)
        return '1'
    elif len(s)>=2:
        solve(s[0:len(s)-1])
        return '1'
    return 0

Find repeats with certain length within a string using python

I am trying to use the regex module to find non-overlapping repeats (duplicated sub-strings) within a given string (30 char), with the following requirements:
I am only interested in non-overlapping repeats that are 6-15 char long.
allow 1 mis-match
return the positions for each match
One way I thought of is that for each possible repeat length, let python loop through the 30char string input. For example,
import regex  # third-party module (pip install regex), not the standard-library re
string = "ATAGATATATGGCCCGGCCCATAGATATAT" #input
#for 6char repeats, first one in loop would be for the following event:
text = "ATAGAT"
text2 ="(" + text + ")"+ "{e<=1}" #this is to allow 1 mismatch later in regex
string2="ATATGGCCCGGCCCATAGATATAT" #string after excluding text
for x in regex.finditer(text2,string2,overlapped=True):
    print(x.span())
#then still for 6char repeats, I will move on to text = "TAGATA"...
#after 6char, loop again for 7char...
There should be two outputs for this particular string = "ATAGATATATGGCCCGGCCCATAGATATAT". 1. The bold two "ATAGATATAT" + 1 mismatch: "ATAGATATATG" &"CATAGATATAT" with position index returned as (0,10)&(19, 29); 2. "TGGCCC" & "GGCCCA" (need add one mismatch to be at least 6 char), with index (9,14)&(15,20). Numbers can be in a list or table.
I'm sorry that I didn't include a real loop, but I hope the idea is clear. As you can see, this is a very inefficient method, not to mention that it would create redundancy: for example, 10-char repeats will be counted more than once, because they also qualify in the 9-, 8-, 7- and 6-char repeat loops. Moreover, I have a lot of such 30-char strings to work with, so I would appreciate your advice on some cleaner methods.
Thank you very much:)
I'd try a straightforward algorithm instead of regex (which is quite confusing in this instance):
s = "ATAGATATATGGCCCGGCCCATAGATATAT"
def fuzzy_compare(s1, s2):
    # sanity check
    if len(s1) != len(s2):
        return False
    diffs = 0
    for a, b in zip(s1, s2):
        if a != b:
            diffs += 1
            if diffs > 1:
                return False
    return True
slen = len(s) # 30
for l in range(6, 16):
    i = 0
    while (i + l * 2) <= slen:
        sub1 = s[i:i+l]
        # + 1 so that the window ending at the very end of s is also checked
        for j in range(i+l, slen - l + 1):
            sub2 = s[j:j+l]
            if fuzzy_compare(sub1, sub2):
                # checking if this could be partial
                partial = False
                if i + l < j and j + l < slen:
                    extsub1 = s[i:i+l+1]
                    extsub2 = s[j:j+l+1]
                    # if it is partial, we'll get it later in the main loop
                    if fuzzy_compare(extsub1, extsub2):
                        partial = True
                if not partial:
                    print((i, i+l), (j, j+l))
        i += 1
It's a first draft, so feel free to experiment with it. It also seems to be clunky and not optimal, but try running it first - it may be sufficient enough.

Basic indexing recurrences of a substring within a string (python)

I'm working on teaching myself basic programming.
One simple project is to find the index of recurrences of a substring within a string. So for example, in string "abcdefdef" and substring "def", I would like the output to be 3 and 6. I have some code written, but I'm not getting the answers I want. Following is what I have written
Note: I'm aware that there may be easier ways to produce the result by leveraging built-in features/packages of the language, such as regular expressions. I'm also aware that my approach is probably not an optimal algorithm. Nevertheless, at this time, I'm only seeking advice on fixing the following logic, rather than on using more idiomatic approaches.
import string
def MIT(String, substring): # "String" is the main string I'm searching within
    String_list = list(String)
    substring_list = list(substring)
    i = 0
    j = 0
    counter = 0
    results = []
    while i < (len(String)-1):
        if [j] == [i]:
            j = j + 1
            i = i + 1
            counter = counter + 1
            if counter == len(substring):
                results.append([i - len(substring)+1])
                counter = 0
                j = 0
                i = i+1
        else:
            counter = 0
            j = 0
            i = i+1
    print results
    return
My line of reasoning is as such. I turn the String and substring into a list. That allows for indexing of each letter in the string. I set i and j = 0--these will be my first values in the String and substring index, respectively. I also have a new variable, counter, which I set = to 0. Basically, I'm using counter to count how many times the letter in position [i] is equal to the element in position [j]. If counter equals the length of substring, then I know that [i - len(substring) + 1] is a position where my substring starts, so I add it to a list called results. Then I reset counter and j and continue searching for more substrings.
I know the code is awkward, but I thought that I should still be able to get the answer. Instead I get:
>>> MIT("abcdefghi", "def")
[[3]]
>>> MIT("abcdefghi", "efg")
[[3]]
>>> MIT("abcdefghi", "b")
[[1]]
>>> MIT("abcdefghi", "k")
[[1]]
Any thoughts?
The regular expressions module (re) is much more suited for this task.
Good reference:
http://docs.python.org/howto/regex.html
Also:
http://docs.python.org/library/re.html
EDIT:
A more 'manual' way may be to use slicing
s = len(String)
l = len(substring)
for i in range(s-l+1):
    if String[i:i+l] == substring:
        pass #add to results or whatever
I'm not clear on whether you want to learn some good string searching algorithms, or a straightforward way to do it in Python. If it's the latter, then string.find is your friend. Something like
def find_all_indexes(needle, haystack):
    """Find the index for the beginning of each occurrence of ``needle`` in ``haystack``. Overlaps are allowed."""
    indexes = []
    last_index = haystack.find(needle)
    while -1 != last_index:
        indexes.append(last_index)
        last_index = haystack.find(needle, last_index + 1)
    return indexes
if __name__ == '__main__':
    print(find_all_indexes('is', 'This is my string.'))
While this is a pretty naive approach, it should be easily understandable.
If you're looking for something that uses even less of the standard library (and will actually teach you a fairly common algorithm used when implementing libraries), you could try implementing the Boyer-Moore string search algorithm.
The main problems are the following:
for the comparison, use: if String[i] == substring[j]
you increment i twice when you find a match; remove the second increment.
the loop condition should be: while i < len(String):
and of course it won't find overlapping matches (e.g. MIT("aaa", "aa"))
There are some minor "problems", it's not really pythonic, there is no need for building lists, increment is clearer if written i += 1, a useful function should return the values not print them, etc...
If you want proper and fast code, check the classic algorithm book: http://www.amazon.com/Introduction-Algorithms-Thomas-H-Cormen/dp/0262033844 . It has a whole chapter about string search.
If you want a pythonic solution without implementing the whole thing check the other answers.
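Putting the fixes listed above together, a corrected version of the two-index approach might look like the sketch below. The function name find_occurrences is made up, and one change goes beyond the listed fixes: on a mismatch, and after a match, i is moved back to one position past where the current attempt started. That backtracking is an addition of this sketch; it is what lets the function find overlapping matches and matches that begin inside a failed partial match.

```python
def find_occurrences(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    results = []
    if not pattern:
        return results
    i = 0  # position in text
    j = 0  # position in pattern
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == len(pattern):
                start = i - len(pattern)
                results.append(start)
                # Resume one character after this match began, so overlaps are found.
                i = start + 1
                j = 0
        else:
            # Back up to one character after where this attempt began.
            i = i - j + 1
            j = 0
    return results


print(find_occurrences("abcdefdef", "def"))  # [3, 6]
print(find_occurrences("aaa", "aa"))         # [0, 1]
```

This is the naive quadratic search; it returns the indexes instead of printing them, as suggested above.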
First, I added some comments to your code to give some tips
import string
def MIT(String, substring):
    String_list = list(String) # this doesn't need to be done; you can index strings
    substring_list = list(substring)
    i = 0
    j = 0
    counter = 0
    results = []
    while i < (len(String)-1):
        if [j] == [i]: # here you're comparing two one-item lists; you need to compare String[i] and substring[j]
            j = j + 1
            i = i + 1
            counter = counter + 1
            if counter == len(substring):
                results.append([i - len(substring)+1]) # remove the brackets; append doesn't require them
                counter = 0
                j = 0
                i = i+1 # remove this
        else:
            counter = 0
            j = 0
            i = i+1
    print results
    return
Here's how I would do it without using built-in libraries and such:
def MIT(fullstring, substring):
    results = []
    sub_len = len(substring)
    for i in range(len(fullstring)): # range yields the values from 0 to (len(fullstring) - 1)
        if fullstring[i:i+sub_len] == substring: # this is slice notation; it means take characters i up to (but not including) i + the length of the substring
            results.append(i)
    return results
For finding the position of substring in a string this algorithm will do:
def posnof_substring(string,sub_string):
    l=len(sub_string)
    for i in range(len(string)-len(sub_string)+1):
        if(string[i:i+len(sub_string)] == sub_string ):
            posn=i+1
            return posn
I myself checked this algorithm and it worked!
Based on @Hank Gay's answer. Using regex plus adding an option to search for words.
import re
def find_all(item, text, as_word=False):
    indexes = []
    re_term = rf'\b{item}\b' if as_word else item
    for r in re.finditer(re_term, text.lower()):
        indexes.append(r.start())
    return indexes
if __name__ == '__main__':
    word = 'for'
    text = 'Now for a bold step forward.'
    print(find_all(word, text), find_all(word, text, as_word=True))
