Find if char precedes substring

I'm trying to find out if a substring ("xyz") is in a string, and if it is, if it has "." in the index to its left. If the substring has the period before it, it is not counted, and if the substring appears without the period it returns true.
I started by checking if the substring is in the string, and appending the index of the substring if it appears. Then I iterated through that list and checked if the index-1 was a ".", and if it was, removed the index. Then if the list still had anything in it, I returned True since the conditions would be met.
I cannot import any module since this is part of a competition, so no regex.
Here is what I have so far:
def xyz_there(a_str):
    #Finds all indexes that xyz starts at
    indexes=[i for i in range(len(a_str)) if a_str.startswith("xyz", i)]
    #Check if sub not in string or string too short
    if len(a_str)<3 or "xyz" not in a_str:
        return False
    #Iterate through indexes, check for preceding "."
    for i in indexes:
        if a_str[i-1] == ".":
            indexes.remove(i)
    if len(indexes)>0:
        return True
    else:
        return False
It works well for the most part, but it has an issue using this test:
xyz_there('1.xyz.xyz2.xyz') #Should return False
Given 3 instances of the substring, it finds the period in the first and third instances, but not the second, and I'm not seeing why it would skip that one.

What about using count:
def xyz_there(s):
    return s.count('xyz') - s.count('.xyz') > 0
And example usage:
xyz_there('1.xyz.xyz2.xyz')
xyz_there('1.xyz.xyz2xyz')
Output:
False
True
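The reasoning behind the count approach can be made explicit: every '.xyz' match contains exactly one 'xyz' match, so the difference is the number of occurrences that are not preceded by a period. A small sketch (the helper name undotted_count is made up for illustration):

```python
def undotted_count(s):
    # Each ".xyz" match contains exactly one "xyz" match, so the
    # difference counts the "xyz" occurrences with no period before them.
    return s.count("xyz") - s.count(".xyz")

for s in ["1.xyz.xyz2.xyz", "1.xyz.xyz2xyz", "xyz"]:
    print(s, s.count("xyz"), s.count(".xyz"), undotted_count(s))
```

The function from the answer is then simply undotted_count(s) > 0.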

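Your problem is that you call indexes.remove(i) while you are iterating over indexes. Removing an element shifts the remaining elements one position to the left, so the loop skips the element that follows each removal; that is why the second occurrence is never checked. indexes.pop(i) would not help, because i is a position in the string, not a position in the list. Iterate over a copy of the list instead (indexes[:]), and skip the check when i is 0, since a_str[-1] is the last character of the string. Also make sure the length check runs after the loop has finished, not inside it:
for i in indexes[:]:
    if i > 0 and a_str[i-1] == ".":
        indexes.remove(i)
if len(indexes)>0:
    return True
else:
    return False
You can replace those if-else lines with return len(indexes) > 0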
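To see what happens when a list is modified while it is being iterated, here is a minimal sketch that contrasts the in-place removal with building a new list. The function names are hypothetical and only serve the comparison.

```python
def starts_undotted(a_str, i):
    """True if the 'xyz' starting at index i is not preceded by a period."""
    return i == 0 or a_str[i - 1] != "."

def xyz_there_buggy(a_str):
    indexes = [i for i in range(len(a_str)) if a_str.startswith("xyz", i)]
    for i in indexes:              # iterating over the list being mutated
        if a_str[i - 1] == ".":
            indexes.remove(i)      # later items shift left, so the next one is skipped
    return len(indexes) > 0

def xyz_there_fixed(a_str):
    indexes = [i for i in range(len(a_str)) if a_str.startswith("xyz", i)]
    # Build a new list instead of removing from the one being iterated.
    kept = [i for i in indexes if starts_undotted(a_str, i)]
    return len(kept) > 0

print(xyz_there_buggy("1.xyz.xyz2.xyz"))  # True (wrong: index 6 was never checked)
print(xyz_there_fixed("1.xyz.xyz2.xyz"))  # False
```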

A one-line solution using a list comprehension (you could also do it with filter):
def xyz_there(a_str):
    return a_str[:3] == 'xyz' or any([not a_str[i-1] == '.' and a_str[i:i+3] == 'xyz' for i in range(1,len(a_str)-2)])

import re
def xyz_there(a_str):
    all_indexes = re.finditer("xyz", a_str)
    # n.start() > 0 avoids wrapping around to the last character when the match is at index 0
    with_dot_preceding = [[n.start(), n.end()] for n in all_indexes if n.start() > 0 and a_str[n.start() - 1] == "."]
    return with_dot_preceding
test = xyz_there(".xyz.xyz")
if len(test) > 0:
    print(True)
    print("True in %d places" % len(test))

Related

Algorithm for figuring out if two strings are palindromes at a specific index

Given two strings, A and B, of equal length, find whether it is possible to split
both strings at the same index such that merging the first part of A and the second
part of B forms a palindrome. Return the location of the split. Palindrome is a word
that reads the same backward as forward.
If the solution can not be found, return -1.
Here is what I have so far
def palindrome(str1, str2):
    if len(str1) != len(str2):
        return None
    for i in range(len(str1)):
        firstStr = str1[i:]
        secondStr = str2[:i]
        if isPalindrome(firstStr+secondStr):
            return i
    return -1
def isPalindrome(s):
    return s == s[::-1]
print(palindrome('abcdefgh', 'dasedcba'))
My solution returns -1 for the test case though it should return 4
I'm not sure what to change to make sure that the correct index is returned.

According to your problem, you want to merge the first part of A (str1) and the second part of B (str2), but in your code you're doing the reverse. Just switch the slices on str1 and str2:
firstStr = str1[:i] # from the start up to i (first part of str1)
secondStr = str2[i:] # from i up to the end (second part of str2)
Test:
def palindrome(str1, str2):
    if len(str1) != len(str2):
        return None
    for i in range(len(str1)):
        firstStr = str1[:i]
        secondStr = str2[i:]
        if isPalindrome(firstStr+secondStr):
            return i
    return -1
def isPalindrome(s):
    return s == s[::-1]
print(palindrome('abcdefgh', 'dasedcba'))
Output:
4

You are not correctly taking the first part of the first string and the second part of the second string. Keep in mind that when you slice a string:
>>> s = 'abcdefgh'
>>> s[:4] # this will take the first 4 elements of s
'abcd'
>>> s[4:] # this will take a sub-string from the 5th element at index 4 until the end of s
'efgh'
Therefore, you should change your code:
def palindrome(str1, str2):
    if len(str1) != len(str2):
        raise ValueError('str1 and str2 should have the same length')
    for i in range(len(str1)):
        firstStr = str1[:i]
        secondStr = str2[i:]
        if isPalindrome(firstStr + secondStr):
            return i
    return -1
Also keep in mind that if str1 and str2 do not have the same length, you are returning None; you could just write return, since that returns None implicitly. However, I think it is better to raise an exception that tells the user that str1 and str2 do not have the same length.
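To check the slicing by eye, it can help to print the merged string for every split index. This is only an illustrative sketch; merged_candidates is a made-up helper, not part of the original code.

```python
def merged_candidates(str1, str2):
    """Yield (i, str1[:i] + str2[i:]) for every split index i."""
    for i in range(len(str1)):
        yield i, str1[:i] + str2[i:]

for i, merged in merged_candidates('abcdefgh', 'dasedcba'):
    # The last column shows whether the merged string is a palindrome.
    print(i, merged, merged == merged[::-1])
```

For the test case, the first row whose last column is True is the one for i = 4, where the merged string is 'abcddcba'.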

You can also check for a palindrome by reversing the word and comparing it with the original:
reverse_word = word[::-1]

check if letters of a string are in sequential order in another string

If it were just checking whether letters in a test_string are also in a control_string,
I would not have had this problem.
I will simply use the code below.
if set(test_string.lower()) <= set(control_string.lower()):
    return True
But I also face a rather convoluted task of discerning whether the overlapping letters in the
control_string are in the same sequential order as those in test_string.
For example,
test_string = 'Dih'
control_string = 'Danish'
True
test_string = 'Tbl'
control_string = 'Bottle'
False
I thought of using a for loop to compare the indices of the letters, but it is quite hard to think of the appropriate algorithm.
for i in test_string.lower():
    for j in control_string.lower():
        if i==j:
            index_factor = control_string.index(j)
My plan is to compare the primary index factor to the next factor, and if primary index factor turns out to be larger than the other, the function returns False.
I am stuck on how to compare those index_factors in a for loop.
How should I approach this problem?

You could just join the characters in your test string to a regular expression, allowing for any other characters .* in between, and then re.search that pattern in the control string.
>>> import re
>>> test, control = "Dih", "Danish"
>>> re.search('.*'.join(test), control) is not None
True
>>> test, control = "Tbl", "Bottle"
>>> re.search('.*'.join(test), control) is not None
False
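One caveat with joining the raw characters: if the test string contains regex metacharacters such as '.' or '*', they are interpreted as part of the pattern. A hedged sketch that escapes each character first with re.escape (the function name in_order is an assumption for illustration):

```python
import re

def in_order(test, control):
    # re.escape keeps characters such as '.' or '*' in test literal.
    pattern = '.*'.join(re.escape(ch) for ch in test)
    return re.search(pattern, control) is not None

print(in_order("Dih", "Danish"))   # True
print(in_order("Tbl", "Bottle"))   # False
print(in_order("a.c", "abc"))      # False: there is no literal '.' in "abc"
```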
Without using regular expressions, you can create an iterator from the control string and use two nested loops [1], breaking from the inner loop on a match and returning False from its else clause when the iterator runs out, until all the characters in test are found in control. It is important to create the iterator, even though control is already iterable, so that the inner loop will continue where it last stopped.
def check(test, control):
    it = iter(control)
    for a in test:
        for b in it:
            if a == b:
                break
        else:
            return False
    return True
You could even do this in one (well, two) lines using all and any:
def check(test, control):
    it = iter(control)
    return all(any(a == b for b in it) for a in test)
Complexity for both approaches should be O(n), with n being the max number of characters.
[1] This is conceptually similar to what @jpp does, but IMHO a bit clearer.
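The role of the shared iterator is easiest to see next to a variant that omits it. The sketch below is for illustration only; check_restarting is a deliberately wrong, hypothetical counterpart.

```python
def check(test, control):
    it = iter(control)  # one shared iterator: each search resumes where the last one stopped
    return all(any(a == b for b in it) for a in test)

def check_restarting(test, control):
    # Deliberately wrong variant: iterating over control itself restarts
    # the search from the beginning for every character, so order is ignored.
    return all(any(a == b for b in control) for a in test)

print(check("tbl", "bottle"))             # False
print(check_restarting("tbl", "bottle"))  # True
```

With the shared iterator, the search for 'b' starts after the 't' that was already consumed, so the out-of-order match is rejected.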

Here's one solution. The idea is to iterate through the control string first and yield a value if it matches the next test character. If the total number of matches equals the length of test, then your condition is satisfied.
def yield_in_order(x, y):
    iterstr = iter(x)
    current = next(iterstr)
    for i in y:
        if i == current:
            yield i
            current = next(iterstr)
def checker(test, control):
    x = test.lower()
    return sum(1 for _ in zip(x, yield_in_order(x, control.lower()))) == len(x)
test1, control1 = 'Tbl', 'Bottle'
test2, control2 = 'Dih', 'Danish'
print(checker(test1, control1)) # False
print(checker(test2, control2)) # True
@tobias_k's answer has a cleaner version of this. If you want some additional information, e.g. how many letters align before there's a break found, you can trivially adjust the checker function to return sum(1 for _ in zip(x, yield_in_order(...))).

You can use find(letter, last_index) to find the next occurrence of the desired letter after the letters already processed.
def same_order_in(test, control):
    index = 0
    control = control.lower()
    for i in test.lower():
        index = control.find(i, index)
        if index == -1:
            return False
        # index += 1 # uncomment to check multiple occurrences of same letter in test string
    return True
If the test string has duplicate letters, like:
test_string = 'Diih'
control_string = 'Danish'
With the line commented out, same_order_in(test_string, control_string) == True,
and with the line uncommented, same_order_in(test_string, control_string) == False.
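Both behaviours can be compared side by side by turning the optional line into a parameter. This is a sketch only; the advance flag is an assumption added for the comparison and is not part of the original function.

```python
def same_order_in(test, control, advance=True):
    """Return True if the letters of test appear in control in order.

    advance=True moves past each matched letter, so a repeated letter
    in test needs its own separate occurrence in control.
    """
    index = 0
    control = control.lower()
    for ch in test.lower():
        index = control.find(ch, index)
        if index == -1:
            return False
        if advance:
            index += 1
    return True

print(same_order_in('Diih', 'Danish', advance=False))  # True
print(same_order_in('Diih', 'Danish', advance=True))   # False
```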

Recursion is one way to solve such problems.
Here's a version that checks for sequential ordering.
def sequentialOrder(test_string, control_string, len1, len2):
    if len1 == 0: # base case 1
        return True
    if len2 == 0: # base case 2
        return False
    if test_string[len1 - 1] == control_string[len2 - 1]:
        return sequentialOrder(test_string, control_string, len1 - 1, len2 - 1) # Recursion
    return sequentialOrder(test_string, control_string, len1, len2-1)
test_string = 'Dih'
control_string = 'Danish'
print(sequentialOrder(test_string, control_string, len(test_string), len(control_string)))
Outputs:
True
and False for
test_string = 'Tbl'
control_string = 'Bottle'
Here's an iterative approach that does the same thing:
def sequentialOrder(test_string,control_string,len1,len2):
    i = 0
    j = 0
    while j < len1 and i < len2:
        if test_string[j] == control_string[i]:
            j = j + 1
        i = i + 1
    return j==len1
test_string = 'Dih'
control_string = 'Danish'
print(sequentialOrder(test_string,control_string,len(test_string) ,len(control_string)))

An elegant solution using a generator:
def foo(test_string, control_string):
    if all(c in control_string for c in test_string):
        gen = (char for char in control_string if char in test_string)
        if all(x == test_string[i] for i, x in enumerate(gen)):
            return True
    return False
print(foo('Dzn','Dahis')) # False
print(foo('Dsi','Dahis')) # False
print(foo('Dis','Dahis')) # True
First check if all the letters in the test_string are contained in the control_string. Then check if the order is similar to the test_string order.

A simple way is making use of the key argument in sorted, which serves as a key for the sort comparison:
def seq_order(l1, l2):
    intersection = ''.join(sorted(set(l1) & set(l2), key = l2.index))
    return True if intersection == l1 else False
Thus this is computing the intersection of the two sets and sorting it according to the longer string. Having done so you only need to compare the result with the shorter string to see if they are the same.
The function returns True or False accordingly. Using your examples:
seq_order('Dih', 'Danish')
#True
seq_order('Tbl', 'Bottle')
#False
seq_order('alp','apple')
#False

Search a string for a given key

I've been doing some more CodeEval challenges and came across one on the hard tab.
You are given two strings. Determine if the second string is a substring of the first (Do NOT use any substr type library function). The second string may contain an asterisk (*) which should be treated as a regular expression i.e. matches zero or more characters. The asterisk can be escaped by a \ char in which case it should be interpreted as a regular '*' character. To summarize: the strings can contain alphabets, numbers, * and \ characters.
So you are given two strings in a file that look something like this: Hello,ell. Your job is to figure out whether ell is in Hello. Here is what I do:
I haven't quite gotten it perfect, but I did get it to the point where it passes with a score of 65%. It runs through the string and the key and checks whether the characters match. If the characters match, it appends the character to a list. After this it divides the length of the string by 2 and checks whether the length of the list is greater than or equal to half the length of the string. I figured half of the string length would be enough to verify whether it matches or not. Example of how it works:
h == e -> no
e == e -> yes -> list
l == e -> no
l == e -> no
...
My question is: what can I do better so that I can handle the wildcards described above?
import sys
def search_string(string, key):
    """ Search a string for a specified key.
    If the key exists out put "true" if it doesn't output "false"
    >>> search_string("test", "est")
    true
    >>> search_string("testing", "rawr")
    false"""
    results = []
    for c in string:
        for ch in key:
            if c == ch:
                results.append(c)
    if len(string) / 2 < len(results) or len(string) / 2 == len(results):
        return "true"
    else:
        return "false"
if __name__ == '__main__':
    with open(sys.argv[1]) as data:
        for line in data.readlines():
            data_list = line.rstrip().split(",")
            search_key = data_list[1]
            word = data_list[0]
            print(search_string(word, search_key))

I've come up with a solution to this problem. You've said "Do NOT use any substr type library function", I'm not sure If some of the functions I used are allowed or not, so tell me if I've broken any rules :D
Hope this helps you :)
def search_string(string, key):
    key = key.replace("\\*", "<NormalStar>") # every \* becomes <NormalStar>
    key = key.split("*") # splitting up the key makes it easier to work with
    #print(key)
    point = 0 # for checking order, e.g. test = t*est, test != est*t
    found = "true" # default
    for k in key:
        k = k.replace("<NormalStar>", "*") # every <NormalStar> becomes *
        if k in string[point:]: # the next part of the key is after the part before
            point = string.index(k, point) + len(k) # search from point, then move point after this
        else: # k not found, return false
            found = "false"
            break
    return found
print(search_string("test", "est")) # true
print(search_string("t....est", "t*est")) # true
print(search_string("n....est", "t*est")) # false
print(search_string("est....t", "t*est")) # false
print(search_string("anything", "*")) # true
print(search_string("test", "t\\*est")) # false
print(search_string("t*est", "t\\*est")) # true
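The placeholder trick assumes the text never contains "<NormalStar>" itself. An alternative is to scan the key character by character and split it on unescaped asterisks. The sketch below is one possible way to do that, not the challenge's reference solution; split_key is a made-up helper, and str.find is still used for the segment search, which the challenge rules may not allow.

```python
def split_key(key):
    """Split key on unescaped '*'; a backslash-escaped '*' stays literal."""
    parts, current, i = [], [], 0
    while i < len(key):
        if key[i] == "\\" and i + 1 < len(key) and key[i + 1] == "*":
            current.append("*")             # escaped asterisk: literal character
            i += 2
        elif key[i] == "*":
            parts.append("".join(current))  # wildcard: close the current segment
            current = []
            i += 1
        else:
            current.append(key[i])
            i += 1
    parts.append("".join(current))
    return parts

def search_string(string, key):
    point = 0
    for part in split_key(key):
        found = string.find(part, point)    # search only after the previous segment
        if found == -1:
            return "false"
        point = found + len(part)
    return "true"

print(split_key("t*est"))                    # ['t', 'est']
print(search_string("t....est", "t*est"))    # true
print(search_string("est....t", "t*est"))    # false
print(search_string("t*est", "t\\*est"))     # true
```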

string.find indexing in Python

I'm trying to understand why the following python code incorrectly returns the string "dining":
def remove(somestring, sub):
    """Return somestring with sub removed."""
    location = somestring.find(sub)
    length = len(sub)
    part_before = somestring[:location]
    part_after = somestring[location + length:]
    return part_before + part_after
print(remove('ding', 'do'))
I realize the way to make the code run correctly is to add an if statement so that if the location variable returns a -1 it will simply return the original string (in this case "ding"). The code, for example, should be:
def remove(somestring, sub):
    """Return somestring with sub removed."""
    location = somestring.find(sub)
    if location == -1:
        return somestring
    length = len(sub)
    part_before = somestring[:location]
    part_after = somestring[location + length:]
    return part_before + part_after
print(remove('ding', 'do'))
Without using the if statement to fix the function, the part_before variable will return the string "din". I would love to know why this happens. Reading the python documentation on string.find (which is ultimately how part_before is formulated) I see that the location variable would become a -1 because "do" is NOT found. But if the part_before variable holds all letters before the -1 index, shouldn't it be blank and not "din"? What am I missing here?
For reference, Python documentation for string.find states:
string.find(s, sub[, start[, end]])
Return the lowest index in s where the substring sub is found such that sub is wholly contained in s[start:end]. Return -1 on failure. Defaults for start and end and interpretation of negative values is the same as for slices.

>>> string = 'ding'
>>> string[:-1]
'din'
Using a negative number as an index in Python returns the nth element from the right-hand side. Accordingly, a slice with :-1 returns all but the last element of the string.

If you have a string 'ding' and you are searching for 'do', str.find() will return -1. 'ding'[:-1] is equal to 'din' and 'ding'[-1 + len(sub):] equals 'ding'[1:] which is equal to 'ing'. Putting the two together results in 'dining'. To get the right answer, try something like this:
def remove(string, sub):
    index = string.find(sub)
    if index == -1:
        return string
    else:
        return string[:index] + string[index + len(sub):]
The reason that string[:-1] is not equal to the whole string is that in slicing, the first number (in this case blank so equal to None, or for our purposes equivalent to 0) is inclusive, but the second number (-1) is exclusive. string[-1] is the last character, so that character is not included.
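The two slices can be inspected step by step. A short sketch using the values from the question (the variable names are chosen for illustration):

```python
s, sub = 'ding', 'do'
location = s.find(sub)                 # -1, because 'do' is not in 'ding'
part_before = s[:location]             # s[:-1]: everything except the last character
part_after = s[location + len(sub):]   # s[1:]: everything after the first character
print(location, part_before, part_after, part_before + part_after)
# prints: -1 din ing dining
```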

Create Your Own Find String Function

For a school project I have to create a function called find_str that essentially does the same thing as the .find string method, but we cannot use any string methods in our definition.
The project description reads: "Function find_str has two parameters (both strings). It returns the lowest index where the second parameter is found within the first parameter (it returns -1 if the second parameter is not found within the first parameter)."
I have spent a lot of time working on this project and have yet to come to a solution. This is the current definition that I have come up with:
def find_str (string, substring):
    index = 0
    length = len (substring)
    for ch in string:
        if ch == substring [0]:
            subindex1 = 0
            subindex2 = index
            for i in range (length):
                if ch == substring [i]:
                    subindex1 +=1
                    if subindex1 == length:
                        return index
                    ch = string [(subindex2)+1]
                    subindex2 +=1
        index += 1
    return "-1"
This sample of code only works in some instances, but not all.
For example:
print (find_str ("hello", "llo"))
returns:
2
as it should.
But
print (find_str ("hello", "el"))
returns:
ch = string [(subindex2)+1]
IndexError: string index out of range
I feel like I am overthinking this and there must be an easier way to do it. Any input or help would be great! Thanks.

Using a sub-function to clear your thoughts often helps.
def find_str (string, substring):
    # only try start positions where the whole substring still fits
    for j in range(len(string) - len(substring) + 1):
        if is_next_sub(string, substring, j):
            return j
    return -1
def is_next_sub(string, substring, index):
    for i in range(len(substring)):
        if substring[i] != string[index + i]:
            return False
    return True

I'm not sure we should be helping you with 'homework'
How about this:
def find_str(string, substring):
    for off in range(len(string)):
        if string[off:].startswith(substring):
            return off
    return -1

I haven't checked through your code in detail, but it looks like you're trying to compare characters that don't exist.
Suppose you're searching "aaaaa" for the substring "aaa", and you need to find all matches...
String : aaaaa
Match at 0 : aaa..
Match at 1 : .aaa.
Match at 2 : ..aaa
Even though the characters always match, and there are five characters in the string, there are only three positions that you might need to consider.
So before you look at the actual characters at all, you can restrict the number of start positions you might need to consider based on the lengths of the string and substring. You only loop for those start positions. That means you're not looping for start positions that cannot match. Also, if you don't do this...
String : aaaaa
Match at 0 : aaa..
Match at 1 : .aaa.
Match at 2 : ..aaa
Match at 3 : ...aa!
Match at 4 : ....a!!
Those exclamation points are places where you try to match a character in the substring with a character that doesn't exist, after the end of the string. You can check for that within the loop to avoid the error each time it occurs, but why not eliminate all those cases at once by not looping for the match positions that cannot occur?
The number of start positions you may need to check is len(fullstring) + 1 - len(substring), so you can derive a range of possible start positions using range(0, len(fullstring) + 1 - len(substring)).
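Putting that range to use, a minimal sketch of find_str that relies only on len, indexing and range, and uses no string methods. It is one possible implementation under the constraints stated in the question, not the only one.

```python
def find_str(fullstring, substring):
    """Return the lowest index of substring in fullstring, or -1."""
    # Only start positions where the whole substring still fits.
    for start in range(0, len(fullstring) + 1 - len(substring)):
        for offset in range(len(substring)):
            if fullstring[start + offset] != substring[offset]:
                break          # mismatch: try the next start position
        else:
            return start       # inner loop finished without a mismatch
    return -1

print(find_str("hello", "llo"))  # 2
print(find_str("hello", "el"))   # 1
print(find_str("hello", "xyz"))  # -1
```

If the substring is longer than the full string, the range is empty and the function returns -1 without comparing any characters.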
