Python increment to the next position in the string to search [duplicate] - python

This question already has answers here:
Finding multiple occurrences of a string within a string in Python [duplicate]
(19 answers)
Closed 1 year ago.
I am trying to implement the function below. I get the expected empty tuple when sub is not found in the given string text. However, I am having trouble advancing within the for-loop to find the subsequent positions of the same sub in the remainder of text.
def findall(text,sub):
    """
    Returns the tuple of all positions of substring sub in text.
    If sub does not appear anywhere in text, this function returns the
    empty tuple ().
    Examples:
    findall('how now brown cow','ow') returns (1, 5, 10, 15)
    findall('how now brown cow','cat') returns ()
    findall('jeeepeeer','ee') returns (1,2,5,6)
    Parameter text: The text to search
    Precondition: text is a string
    Parameter sub: The substring to search for
    Precondition: sub is a nonempty string
    """
    tup = ()
    for pos in range(len(text)):
        if sub not in text:
            tup = ()
        else:
            pos1 = introcs.find_str(text,sub)
            tup = tup + (pos1,)
            # increment to the next pos and look for the sub again, not sure how to
            # move beyond the first instance of the substring in the text ???
            pos1 = pos1 + 1
    return tup

I think your problem can be solved easily with regular expressions; it is best to refer to this answer.
But if you insist on doing it your way, you could delete the substring once you find it. The position of the next occurrence in the original text is then the position that find returns in the shortened text, plus the total length of everything deleted so far.
For example, for findall('how now brown cow','ow') the first position is 1. After deleting that occurrence we are left with 'h now brown cow', where find returns 3, and 3 + len(sub) = 5 is the actual position of the second occurrence. Note that deleting matches this way skips overlapping occurrences such as those in 'jeeepeeer'. Try it out; I hope it helps.
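Another way to step past each match, without deleting anything, is to restart the search one character after the previous hit. The following is a minimal sketch, assuming the built-in str.find (with its optional start argument) is acceptable in place of introcs.find_str from the question. Because the search restarts only one character later, overlapping occurrences are still found.

```python
def findall(text, sub):
    """Return a tuple of every index at which sub occurs in text."""
    positions = []
    pos = text.find(sub)                 # first occurrence, or -1
    while pos != -1:
        positions.append(pos)
        # Restart one character past the last hit so that
        # overlapping matches (as in 'jeeepeeer') are not skipped.
        pos = text.find(sub, pos + 1)
    return tuple(positions)


print(findall('how now brown cow', 'ow'))   # (1, 5, 10, 15)
print(findall('how now brown cow', 'cat'))  # ()
print(findall('jeeepeeer', 'ee'))           # (1, 2, 5, 6)
```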

Related

How to merge strings with overlapping characters in python?

I'm working on a Python project which reads in a URL-encoded list of overlapping strings. Each string is 15 characters long and overlaps with the next string in the sequence by at least 3 characters and at most 15 characters (identical).
The goal of the program is to go from a list of overlapping strings, either ordered or unordered, to a compressed URL-encoded string.
My current method fails on duplicate segments in the overlapping strings. For example, my program incorrectly combines:
StrList1 = [ 'd+%7B%0A++++public+', 'public+static+v','program%0Apublic+', 'ublic+class+Hel', 'lass+HelloWorld', 'elloWorld+%7B%0A+++', '%2F%2F+Sample+progr', 'program%0Apublic+']
to output:
output = ['ublic+class+HelloWorld+%7B%0A++++public+', '%2F%2F+Sample+program%0Apublic+static+v']
when correct output is:
output = ['%2F%2F+Sample+program%0Apublic+class+HelloWorld+%7B%0A++++public+static+v']
I am using simple python, not biopython or sequence aligners, though perhaps I should be?
Would greatly appreciate any advice on the matter or suggestions of a nice way to do this in python!
Thanks!
You can start with one of the strings in the list (stored as string), and for each of the remaining strings in the list (stored as candidate) where:
candidate is part of string,
candidate contains string,
candidate's tail matches the head of string,
or, candidate's head matches the tail of string,
assemble the two strings according to how they overlap, and then recursively repeat the procedure with the overlapping string removed from the remaining strings and the assembled string appended, until there is only one string left in the list, at which point it is a valid fully assembled string that can be added to the final output.
Since there can potentially be multiple ways several strings can overlap with each other, some of which can result in the same assembled strings, you should make output a set of strings instead:
def assemble(str_list, min=3, max=15):
    if len(str_list) < 2:
        return set(str_list)
    output = set()
    # note: pop() removes the last string from the caller's list
    string = str_list.pop()
    for i, candidate in enumerate(str_list):
        matches = set()
        if candidate in string:
            matches.add(string)
        elif string in candidate:
            matches.add(candidate)
        for n in range(min, max + 1):
            # candidate's head matches the tail of string
            if candidate[:n] == string[-n:]:
                matches.add(string + candidate[n:])
            # candidate's tail matches the head of string
            if candidate[-n:] == string[:n]:
                matches.add(candidate[:-n] + string)
        for match in matches:
            # pass min and max along so non-default limits apply at every level
            output.update(assemble(str_list[:i] + str_list[i + 1:] + [match], min, max))
    return output
so that with your sample input:
StrList1 = ['d+%7B%0A++++public+', 'public+static+v','program%0Apublic+', 'ublic+class+Hel', 'lass+HelloWorld', 'elloWorld+%7B%0A+++', '%2F%2F+Sample+progr', 'program%0Apublic+']
assemble(StrList1) would return:
{'%2F%2F+Sample+program%0Apublic+class+HelloWorld+%7B%0A++++public+static+v'}
or as an example of an input with various overlapping possibilities (that the second string can match the first by being inside, having tail matching the head, and having head matching the tail):
assemble(['abcggggabcgggg', 'ggggabc'])
would return:
{'abcggggabcgggg', 'abcggggabcggggabc', 'abcggggabcgggggabc', 'ggggabcggggabcgggg'}
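
To look at a single overlap in isolation, the tail-to-head case can be pulled out into a small helper. This is only a sketch: merge_tail_to_head and its min_overlap parameter are hypothetical names that are not part of the answer above, and unlike assemble it keeps only the longest overlap instead of exploring every possibility.

```python
def merge_tail_to_head(left, right, min_overlap=3):
    """Join left and right where the tail of left equals the head of right.

    Tries the longest possible overlap first and returns None when no
    overlap of at least min_overlap characters exists.
    """
    longest = min(len(left), len(right))
    for n in range(longest, min_overlap - 1, -1):
        if left[-n:] == right[:n]:
            return left + right[n:]
    return None


# Two of the 15-character segments from the sample input
merged = merge_tail_to_head('ublic+class+Hel', 'lass+HelloWorld')
print(merged)  # ublic+class+HelloWorld
```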

Does find() first find the index of the first matching letter and then check whether the entire substring is contained in the string?

I don't really understand the find() method. At first, I thought it was similar to the remove() method, in the sense that
remove() removes the first matching item in a list, while find() returns the index of the first letter of the substring.
But if the entire substring is not contained in the string, it returns -1.
So is my understanding correct? find() first finds the index of the first letter of the substring, and then checks whether
the entire substring is contained in the string. If it is, the index of the first letter of the substring is returned.
Otherwise, -1 is returned. If the first letter of the substring does not exist in the string, is -1 returned? Or does it look at each letter in turn and return -1 if any letter is not in the string?
Yes, your understanding is correct. The find() method returns -1 if the entire substring has not been found in the target string, and returns the lowest index of the target string where substring has been found (the index of the first match of substring in the target string).
For example:
# First example (found a substring and there is only one in the target string)
In [1]: target_str = 'hello world'
In [2]: target_str.find('lo')
Out[2]: 3
# Second example (there are multiple substrings in the target string, but the start index of the first one is returned)
In [3]: target_str ='hello hello world'
In [4]: target_str.find('lo')
Out[4]: 3
# Third example (no substring has been found in the target string, so -1 is returned)
In [5]: target_str = 'bye world'
In [6]: target_str.find('lo')
Out[6]: -1
You can read more about the find method and its arguments in the python documentation. You can find it here.
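As a mental model, the behaviour described above can be sketched with a plain loop that compares a whole slice at each index. naive_find is a hypothetical name used only for illustration, and this is not how CPython actually implements str.find; it only returns the same results.

```python
def naive_find(text, sub):
    """Return the lowest index where sub occurs in text, or -1."""
    # Try every index at which sub could still fit inside text.
    for i in range(len(text) - len(sub) + 1):
        # The whole slice must equal sub, not just its first letter.
        if text[i:i + len(sub)] == sub:
            return i
    return -1


print(naive_find('hello world', 'lo'))  # 3
print(naive_find('hello world', 'lz'))  # -1, even though 'l' is present
print(naive_find('bye world', 'lo'))    # -1
```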

How to remove only one of a certain character from a string that appears multiple times in Python 3 [duplicate]

This question already has answers here:
How to delete a character from a string using Python
(17 answers)
Closed 5 years ago.
I was trying to figure out how to remove only a certain character from a string that appears more than once.
Example:
>>> x = 'a,b,c,d'
>>> x = x.someremovingfunction(',', 3)
>>> print(x)
a,b,cd
If anyone can help, that would be greatly appreciated!
Split the original string on the character you want to remove. Then rejoin the parts before the offending occurrence and the parts after it separately, and concatenate the two results:
def remove_nth(text, separator, position):
    parts = text.split(separator)
    return separator.join(parts[:position]) + separator.join(parts[position:])

remove_nth(x,",",3)
# 'a,b,cd'
Assuming that the argument 3 means the occurrence of the character in question, you could just iterate over the string and count. When you reach that occurrence, build a new string without it.
def someremovingfunction(text, char, occurence):
    pos = 0
    for i in text:
        pos += 1
        if i == char:
            occurence -= 1
            if not occurence:
                # pos is one past the matched character, so skip text[pos-1]
                return text[:pos-1] + text[pos:]
    return text
Usage example:
print(someremovingfunction('a,b,c,d', ',', 3))
This may help, although it removes every comma rather than only one:
>>> x = 'a,b,c,d'
>>> ''.join(x.split(','))
'abcd'

How to find the index of undetermined pattern in a string? [duplicate]

This question already has answers here:
Python Regex - How to Get Positions and Values of Matches
(4 answers)
Closed 6 years ago.
I want to find the index of multiple occurrences of at least two zeros followed by at least two ones (e.g., '0011','00011', '000111' and so on), from a string (called 'S')
The string S may look like:
'00111001100011'
The code I tried can only spot occurrences of '0011', and strangely returns the index of the first '1'. For example for the S above, my code returns 2 instead of 0:
index = []
index = [n for n in range(len(S)) if S.find('0011', n) == n]
Then I tried to use regular expressions, but the regex I found can't express the specific digits I want (like '0' and '1').
Could anyone kindly come up with a solution, and tell me why my first attempt returns the index of '1' instead of '0'? Many thanks in advance!
In the following code the regex defines a single instance of the required pattern of digits. The code then uses the regex's finditer iterator to identify successive matches in the given string S. match.start() gives the starting position of each of these matches, and the resulting list is stored in starts.
import re

S = '00111001100011'
r = re.compile(r'(0{2,}1{2,})')
starts = [match.start() for match in r.finditer(S)]
print(starts)
# [0, 5, 9]
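If the matched text is wanted alongside each position, the same iterator can supply both. This is a small sketch that reuses the pattern above without the capturing group; the names pattern and hits are only illustrative.

```python
import re

S = '00111001100011'
# At least two zeros followed by at least two ones
pattern = re.compile(r'0{2,}1{2,}')

# Pair each match's start index with the text it matched
hits = [(m.start(), m.group()) for m in pattern.finditer(S)]
print(hits)
# [(0, '00111'), (5, '0011'), (9, '00011')]
```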

Why is the empty string in every string? [duplicate]

This question already has answers here:
Why is True returned when checking if an empty string is in another?
(5 answers)
Closed 5 years ago.
For example:
>>> s = 'python'
>>> s.index('')
0
>>> s.index('p')
0
This is because the substring of length 0 starting at index 0 in 'python' is equal to the empty string:
>>> s[0:0]
''
Of course every substring of length zero of any string is equal to the empty string.
You can see "python" as "the empty string, followed by a p, followed by fifteen more empty strings, followed by a y, followed by forty-two empty strings, ...".
Point being, empty strings don't take any space, so there's no reason why it should not be there.
The index method could be specified like this:
s.index(t) returns a value i such that s[i : i+len(t)] is equal to t
If you substitute the empty string for t, this reads: "returns a value i such that s[i:i] is equal to """. And indeed, the value 0 is a correct return value according to this specification.
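The specification above can be checked directly. The sketch below uses only built-in str methods, and the variable names are just for illustration. It also shows that an empty match exists at every position, including the very end of the string.

```python
s = 'python'
t = ''

i = s.index(t)
# s[i : i+len(t)] is s[0:0], which is the empty string
spec_holds = s[i:i + len(t)] == t
print(i, spec_holds)       # 0 True

# Searching from any start index n finds an empty match right at n
empty_positions = [n for n in range(len(s) + 1) if s.find(t, n) == n]
print(empty_positions)     # [0, 1, 2, 3, 4, 5, 6]

# count() agrees: one empty match per position, len(s) + 1 in total
empty_count = s.count(t)
print(empty_count)         # 7
```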
