Search a string for a given key - python

I've been doing some more CodeEval challenges and came across one on the hard tab.
You are given two strings. Determine if the second string is a substring of the first (Do NOT use any substr type library function). The second string may contain an asterisk (*) which should be treated as a regular expression i.e. matches zero or more characters. The asterisk can be escaped by a \ char in which case it should be interpreted as a regular '*' character. To summarize: the strings can contain alphabets, numbers, * and \ characters.
So you are given two strings in a file that look something like this: Hello,ell. Your job is to figure out whether ell is in Hello. Here is what I do:
I haven't quite gotten it perfect, but I did get it to the point where it passes with a score of 65%. It runs through the string and the key and checks whether the characters match. If the characters match, it appends the character to a list. After this it divides the length of the string by 2 and checks whether the length of the list is greater than or equal to half of the string length. I figured half of the string length would be enough to verify whether it indeed matches or not. Example of how it works:
h == e -> no
e == e -> yes -> list
l == e -> no
l == e -> no
...
My question is what can I do better to the point where I can verify the wildcards that are said above?
import sys
def search_string(string, key):
    """ Search a string for a specified key.
    If the key exists out put "true" if it doesn't output "false"
    >>> search_string("test", "est")
    true
    >>> search_string("testing", "rawr")
    false"""
    results = []
    for c in string:
        for ch in key:
            if c == ch:
                results.append(c)
    if len(string) / 2 < len(results) or len(string) / 2 == len(results):
        return "true"
    else:
        return "false"
if __name__ == '__main__':
    with open(sys.argv[1]) as data:
        for line in data.readlines():
            data_list = line.rstrip().split(",")
            search_key = data_list[1]
            word = data_list[0]
            print(search_string(word, search_key))

I've come up with a solution to this problem. You've said "Do NOT use any substr type library function". I'm not sure if some of the functions I used are allowed or not, so tell me if I've broken any rules :D
Hope this helps you :)
def search_string(string, key):
    key = key.replace("\\*", "<NormalStar>") # every \* becomes <NormalStar>
    key = key.split("*") # splitting up the key makes it easier to work with
    #print(key)
    point = 0 # for checking order, e.g. test = t*est, test != est*t
    found = "true" # default
    for k in key:
        k = k.replace("<NormalStar>", "*") # every <NormalStar> becomes *
        if k in string[point:]: # the next part of the key is after the part before
            point = string.index(k, point) + len(k) # search from point on, then move point after this
        else: # k not found, return false
            found = "false"
            break
    return found
print(search_string("test", "est")) # true
print(search_string("t....est", "t*est")) # true
print(search_string("n....est", "t*est")) # false
print(search_string("est....t", "t*est")) # false
print(search_string("anything", "*")) # true
print(search_string("test", "t\\*est")) # false
print(search_string("t*est", "t\\*est")) # true
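Since the challenge forbids substring-style library functions, here is a rough sketch of the same segment-by-segment idea with the search written by hand instead of using in, index or replace. The names parse_key, find_from and search_string_manual are made up for this illustration, and whether the challenge would accept "".join is an assumption on my part.

```python
def parse_key(key):
    """Split key into literal segments at every unescaped '*'."""
    segments, current, i = [], [], 0
    while i < len(key):
        if key[i] == "\\" and i + 1 < len(key) and key[i + 1] == "*":
            current.append("*")  # escaped star: keep it as a literal character
            i += 2
        elif key[i] == "*":
            segments.append("".join(current))  # wildcard: close the current segment
            current = []
            i += 1
        else:
            current.append(key[i])
            i += 1
    segments.append("".join(current))
    return segments


def find_from(string, segment, start):
    """Return the first index >= start where segment occurs, or -1."""
    for pos in range(start, len(string) - len(segment) + 1):
        matched = True
        for offset in range(len(segment)):
            if string[pos + offset] != segment[offset]:
                matched = False
                break
        if matched:
            return pos
    return -1


def search_string_manual(string, key):
    point = 0  # everything before point is already used by earlier segments
    for segment in parse_key(key):
        pos = find_from(string, segment, point)
        if pos == -1:
            return "false"
        point = pos + len(segment)
    return "true"


print(search_string_manual("Hello", "ell"))
print(search_string_manual("t*est", "t\\*est"))
```

Taking the leftmost possible position for each segment is enough here, because an earlier match never leaves less room for the segments that follow.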

Related

Substitution Cipher Verification

I am tasked to write a function that returns whether two strings are substitution ciphers of each other. It is assumed that one isn't given a key. The output is expected to return True or False.
Here is what I have written so far on this (borrowed from a CodeFights question). The idea is to take the count of each character in each string and append it to the string1count and string2count lists. Then, compare how often each count occurs in the two lists, and if they are not equal, we can assume that it is not a valid substitution cipher, since each character needs to occur the same number of times as its counterpart in order for the strings to be substitution ciphers of each other.
def isSubstitutionCipher(string1, string2):
    string1count = []
    string2count = []
    for i in range(0,len(string1)):
        string1count.append(string1.count(string1[i]))
    for i in range(0,len(string2)):
        string2count.append(string2.count(string2[i]))
    for i in range(0,len(string1count)):
        if string1count.count(string1count[i])!=string2count.count(string1count[i]):
            return False
    return True
Does anyone else have other proposals on how to solve this very general question / problem statement?
You could try to re-create the substitution:
def isSubstitutionCipher(string1, string2):
    if len(string1) != len(string2):
        return False
    subst = {}
    for c1, c2 in zip(string1, string2):
        if c1 in subst:
            if c2 != subst[c1]:
                return False
        else:
            if c2 in subst.values():
                return False
            subst[c1] = c2
    return True
For all the characters you have already seen, make sure the substitution matches. For the new ones, store them in the substitution and make sure they are not already a substitution target.
This will return False at the first character that does not match.
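A possible variation, in case the scan over subst.values() bothers you for large alphabets: keep a second dictionary for the reverse direction so both checks are plain lookups. This is only a sketch; the name is_substitution_cipher and the forward/backward dictionaries are my own naming, not something from the question.

```python
def is_substitution_cipher(string1, string2):
    if len(string1) != len(string2):
        return False
    forward, backward = {}, {}
    for c1, c2 in zip(string1, string2):
        # setdefault stores the pairing the first time a character is seen
        # and returns the stored pairing on every later sighting
        if forward.setdefault(c1, c2) != c2 or backward.setdefault(c2, c1) != c1:
            return False
    return True


print(is_substitution_cipher('banana', 'cololo'))
print(is_substitution_cipher('ab', 'aa'))
```

The second call is the case the reverse dictionary exists for: two different source letters may not share one target letter.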
Here is a variation on hiro's excellent answer:
def is_sub(s,t):
    if len(s) != len(t):
        return False
    d = dict(zip(s,t))
    # the mapping must be one-to-one and must reproduce t from s
    return len(set(d.values())) == len(d) and t == ''.join(d[c] for c in s)
We can use word patterns to check if one string is the ciphertext of another.
Word pattern: the first letter gets the number 0, and the first occurrence of each different letter after that gets the next number.
The advantage is that this has O(n) complexity.
Code
def isSubstitutionCipher(s1, s2):
    def word_pattern(s):
        ' Generates word pattern of s '
        seen, pattern = {}, []
        for c in s:
            seen.setdefault(c, len(seen))
            pattern.append(seen[c])
        return pattern
    return word_pattern(s1) == word_pattern(s2) # related by ciphertext if same word patterns
Test
print(isSubstitutionCipher('banana', 'cololo')) # Output: True
print(isSubstitutionCipher('dog', 'cat')) # Output: True
print(isSubstitutionCipher('banana', 'cololl')) # Output: False

How do I check if the next item in a string is the alphabetical successor of the one before? + Inverse

I'm trying to compress a string in a way that any sequence of letters in strict alphabetical order is swapped with the first letter plus the length of the sequence.
For example, the string "abcdefxylmno", would become: "a6xyl4"
Single letters that aren't in order with the one before or after just stay the way they are.
How do I check that two letters are successors (a,b) and not simply in alphabetical order (a,c)? And how do I keep iterating on the string until I find a letter that doesn't meet this requirement?
I'm also trying to do this in a way that makes it easier to write an inverse function (that given the result string gives me back the original one).
EDIT :
I've managed to get the function working, thanks to your suggestion of using the alphabet string as comparison; now I'm very much stuck on the inverse function: given "a6xyl4" expand it back into "abcdefxylmno".
After quite some time I managed to split the string every time there's a number and I made a function that expands a 2 char string, but it fails to work when I use it on a longer string:
from string import ascii_lowercase as abc
def subString(start,n):
    L=[]
    ind = abc.index(start)
    newAbc = abc[ind:]
    for i in range(len(newAbc)):
        while i < n:
            L.append(newAbc[i])
            i+=1
        res = ''.join(L)
        return res
def unpack(S):
    for i in range(len(S)-1):
        if S[i] in abc and S[i+1] not in abc:
            lett = str(S[i])
            num = int(S[i+1])
            return subString(lett,num)
def separate(S):
    lst = []
    for i in S:
        lst.append(i)
    for el in lst:
        if el.isnumeric():
            ind = lst.index(el)
            lst.insert(ind+1,"-")
    a = ''.join(lst)
    L = a.split("-")
    if S[-1].isnumeric():
        L.remove(L[-1])
        return L
    else:
        return L
def inverse(S):
    L = separate(S)
    for i in L:
        return unpack(i)
Each of these functions work singularly, but inverse(S) doesn't output anything. What's the mistake?
You can use the ord() function, which returns an integer representing the Unicode code point of a character. Sequential letters in alphabetical order differ by 1. With that, you can implement a simple function:
def is_successor(a,b):
    # check for marginal cases if we dont ensure
    # input restriction somewhere else
    # (the upper bounds are +1 because range() excludes its end value)
    if ord(a) not in range(ord('a'), ord('z') + 1) and ord(a) not in range(ord('A'), ord('Z') + 1):
        return False
    if ord(b) not in range(ord('a'), ord('z') + 1) and ord(b) not in range(ord('A'), ord('Z') + 1):
        return False
    # returns true if they are sequential
    return ((ord(b) - ord(a)) == 1)
You can use the chr() function for your reversing stage, as it returns a string representing the character whose Unicode code point is the integer given as argument.
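To make the chr() remark a bit more concrete, here is a minimal sketch of what the reversing stage could look like. The function name expand is my own, and it assumes the packed string is well formed: it starts with a letter, and a count never runs past 'z'.

```python
def expand(packed):
    """Expand a packed string such as 'a6xyl4' back into its original form."""
    result = []
    i = 0
    while i < len(packed):
        letter = packed[i]
        i += 1
        # collect every digit that follows the letter (handles counts above 9)
        digits = ""
        while i < len(packed) and packed[i].isdigit():
            digits += packed[i]
            i += 1
        count = int(digits) if digits else 1  # a lone letter stands for itself
        # chr(ord(letter) + k) is the k-th successor of letter
        result.append("".join(chr(ord(letter) + k) for k in range(count)))
    return "".join(result)


print(expand("a6xyl4"))
```

Because the index moves forward by hand, there is no need to split the string at the numbers first, and both 'a6xyl4' and 'a6x2l4' expand to the same text.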
This builds on the idea that acceptable subsequences will be substrings of the ABC:
from string import ascii_lowercase as abc # 'abcdefg...'
text = 'abcdefxylmno'
stack = []
cache = ''
# collect subsequences
for char in text:
    if cache + char in abc:
        cache += char
    else:
        stack.append(cache)
        cache = char
# if present, append the last sequence
if cache:
    stack.append(cache)
# stack is now ['abcdef', 'xy', 'lmno']
# Build the final string 'a6x2l4'
result = ''.join(f'{s[0]}{len(s)}' if len(s) > 1 else s for s in stack)

Find if char precedes substring

I'm trying to find out if a substring ("xyz") is in a string, and if it is, if it has "." in the index to its left. If the substring has the period before it, it is not counted, and if the substring appears without the period it returns true.
I started by checking if the substring is in the string, and appending the index of the substring if it appears. Then I iterated through that list and checked if the index-1 was a ".", and if it was, removed the index. Then if the list still had anything in it, I returned True since the conditions would be met.
I cannot import any module since this is part of a competition, so no regex.
Here is what I have so far:
def xyz_there(a_str):
    #Finds all indexes that xyz starts at
    indexes=[i for i in range(len(a_str)) if a_str.startswith("xyz", i)]
    #Check if sub not in string or string too short
    if len(a_str)<3 or "xyz" not in a_str:
        return False
    #Iterate through indexes, check for preceding "."
    for i in indexes:
        if a_str[i-1] == ".":
            indexes.remove(i)
    if len(indexes)>0:
        return True
    else:
        return False
It works well for the most part, but it has an issue using this test:
xyz_there('1.xyz.xyz2.xyz') #Should return False
Given 3 instances of the substring, it finds the period in the first and third instances, but not the second, and I'm not seeing why it would skip that one.
What about using count:
def xyz_there(s):
    return s.count('xyz') - s.count('.xyz') > 0
And example usage:
xyz_there('1.xyz.xyz2.xyz')
xyz_there('1.xyz.xyz2xyz')
Output:
False
True
Your first problem is that you call indexes.remove(i) while you are iterating over indexes. Removing an element shifts the remaining ones to the left, so the loop skips the element that follows each removal; that is why the second occurrence is never checked. Iterate over a copy of the list (indexes[:]) instead. Also, make sure you check the length of indexes only after you are done with it, so those lines must not be indented inside the for loop:
for i in indexes[:]:
    if a_str[i-1] == ".":
        indexes.remove(i)
if len(indexes)>0:
    return True
else:
    return False
You can replace those if-else lines with return len(indexes) > 0
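One way to sidestep the problem entirely is to never change the list you are looping over and build a filtered list instead. This is only a sketch of that idea, keeping the question's approach and names; the variable kept is my addition. It also treats an occurrence at index 0 as valid, since a_str[-1] would otherwise look at the last character of the string.

```python
def xyz_there(a_str):
    # every index where "xyz" starts
    indexes = [i for i in range(len(a_str)) if a_str.startswith("xyz", i)]
    # keep only the occurrences that are not directly preceded by "."
    # (i == 0 has no preceding character, so it always counts)
    kept = [i for i in indexes if i == 0 or a_str[i - 1] != "."]
    return len(kept) > 0


print(xyz_there('1.xyz.xyz2.xyz'))
print(xyz_there('1.xyz.xyz2xyz'))
```

No imports are needed, so this should still be fine for the competition rules mentioned in the question.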
A one-line solution, if you want to use a list comprehension (you could also do it with a filter):
def xyz_there(a_str):
    return a_str[:3] == 'xyz' or any([not a_str[i-1] == '.' and a_str[i:i+3] == 'xyz' for i in range(1,len(a_str)-2)])
import re
def xyz_there(a_str):
    all_indexes = re.finditer("xyz", a_str)
    # keep the matches that have a "." directly before them
    with_dot_preceding = [[n.start(), n.end()] for n in all_indexes if n.start() > 0 and a_str[n.start() - 1] == "."]
    return with_dot_preceding
test = xyz_there(".xyz.xyz")
if len(test) > 0:
    print(True)
    print("True in %d places" % len(test))

trying to find if a character appears successively in a string

Simple script to find if the second argument appears 3 times successively in the first argument. I am able to find if the second argument is in the first and how many times etc., but how do I see if it's present 3 times successively or not?
#!/usr/bin/python
import string
def three_consec(s1,s2) :
    for i in s1 :
        total = s1.count(s2)
        if total > 2:
            return "True"
print(three_consec("ABABA","A"))
total = s1.count(s2) will give you the number of s2 occurrences in s1 regardless of your position i.
Instead, just iterate through the string, and keep counting as you see characters s2:
def three_consec (string, character):
    found = 0
    for c in string:
        if c == character:
            found += 1
        else:
            found = 0
        if found > 2:
            return True
    return False
Alternatively, you could also do it the other way around, and just look if “three times the character” appears in the string:
def three_consec (string, character):
    return (character * 3) in string
This uses the feature that you can multiply a string by a number to repeat that string (e.g. 'A' * 3 will give you 'AAA') and that the in operator can be used to check whether a substring exists in a string.
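If the run length should be configurable rather than fixed at three, one more option is itertools.groupby, which splits the string into runs of equal characters. This is just a sketch; the name has_run and the parameter n are made up here.

```python
from itertools import groupby


def has_run(string, character, n=3):
    """Return True if character occurs at least n times in a row in string."""
    return any(ch == character and sum(1 for _ in group) >= n
               for ch, group in groupby(string))


print(has_run("ABABA", "A"))
print(has_run("BAAAB", "A"))
```

For a single fixed character the (character * 3) in string check is shorter; groupby mainly pays off when you want the lengths of all runs anyway.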

Python - packing/unpacking by letters

I'm just starting to learn python and I have this exercise that's puzzling me:
Create a function that can pack or unpack a string of letters.
So aaabb would be packed a3b2 and vice versa.
For the packing part of the function, I wrote the following
def packer(s):
    if s.isalpha(): # Defines if unpacked
        stack = []
        for i in s:
            if s.count(i) > 1:
                if (i + str(s.count(i))) not in stack:
                    stack.append(i + str(s.count(i)))
            else:
                stack.append(i)
        print("".join(stack))
    else:
        print("Something's not quite right..")
        return False
packer("aaaaaaaaaaaabbbccccd")
This seems to work all proper. But the assignment says that
if the input has (for example) the letter a after b or c, then
it should later be unpacked into its original form.
So "aaabbkka" should become a3b2k2a, not a4b2k2.
I hence figured, that I cannot use the "count()" command, since
that counts all occurrences of the item in the whole string, correct?
What would be my options here then?
On to the unpacking -
I've thought of the basics what my code needs to do -
between the " if s.isalpha():" and else, I should add an elif that
checks whether or not the string has digits in it. (I figured this would be
enough to determine whether it's the packed version or unpacked).
Create a for loop and inside of it an if statement, which then checks for every element:
2.1. If it has a number following it > Return (or add to an empty stack) the element repeated that many times
2.2. If it has no number following it > Return just the element.
Big question number 2 - how do I check whether it's a number or just another
alphabetical element following an element in the list? I guess this must be done with
slicing, but those only take integers. Could this be achieved with the index command?
Also - if this is of any relevance - so far I've basically covered lists, strings, if and for
and I've been told this exercise is doable with just those (...so if you wouldn't mind keeping this really basic)
All help appreciated for the newbie enthusiast!
SOLVED:
def packer(s):
    if s.isalpha(): # Defines if unpacked
        groups= []
        last_char = None
        for c in s:
            if c == last_char:
                groups[-1].append(c)
            else:
                groups.append([c])
            last_char = c
        return ''.join('%s%s' % (g[0], len(g)>1 and len(g) or '') for g in groups)
    else: # Seems to be packed
        stack = ""
        for i in range(len(s)):
            if s[i].isalpha():
                if i+1 < len(s) and s[i+1].isdigit():
                    digit = s[i+1]
                    char = s[i]
                    i += 2
                    while i < len(s) and s[i].isdigit():
                        digit +=s[i]
                        i+=1
                    stack += char * int(digit)
                else:
                    stack+= s[i]
            else:
                ""
        return "".join(stack)
print (packer("aaaaaaaaaaaabbbccccd"))
print (packer("a4b19am4nmba22"))
So this is my final code. Almost managed to pull it all off with just for loops and if statements.
In the end though I had to bring in the while loop to solve reading the multiple-digit numbers issue. I think I still managed to keep it simple enough. Thanks a ton millimoose and everyone else for chipping in!
A straightforward solution:
If a char is different, make a new group. Otherwise append it to the last group. Finally count all groups and join them.
def packer(s):
    groups = []
    last_char = None
    for c in s:
        if c == last_char:
            groups[-1].append(c)
        else:
            groups.append([c])
        last_char = c
    return ''.join('%s%s'%(g[0], len(g)) for g in groups)
Another approach is using re.
Regex r'(.)\1+' can match consecutive characters longer than 1. And with re.sub you can easily encode it:
import re
regex = re.compile(r'(.)\1+')
def replacer(match):
    return match.group(1) + str(len(match.group(0)))
regex.sub(replacer, 'aaabbkka')
#=> 'a3b2k2a'
I think you can use the itertools.groupby function,
for example
import itertools
data = 'aaassaaasssddee'
groupped_data = ((c, len(list(g))) for c, g in itertools.groupby(data))
result = ''.join(c + (str(n) if n > 1 else '') for c, n in groupped_data)
Of course, one can make this code more readable by using a generator function instead of a generator expression.
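For what it's worth, the generator-function version could look roughly like this. The names runs and pack are only chosen for this sketch.

```python
from itertools import groupby


def runs(data):
    """Yield (character, run_length) for each run of identical characters."""
    for char, group in groupby(data):
        yield char, len(list(group))


def pack(data):
    # a run of length 1 is written without a count
    return ''.join(c + (str(n) if n > 1 else '') for c, n in runs(data))


print(pack('aaassaaasssddee'))
print(pack('aaabbkka'))
```

Since groupby only groups neighbouring characters, a letter that shows up again later starts a new run, which is what the exercise asks for.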
This is an implementation of the algorithm I outlined in the comments:
from itertools import takewhile, count, islice
def consume(items):
    from collections import deque
    deque(items, maxlen=0)
def ilen(items):
    result = count()
    consume(zip(items, result))
    return next(result)
def pack_or_unpack(data):
    start = 0
    result = []
    while start < len(data):
        if data[start].isdigit():
            # `data` is packed, bail
            return unpack(data)
        run = run_len(data, start)
        # append the character that might repeat
        result.append(data[start])
        if run > 1:
            # append the length of the run of characters
            result.append(str(run))
        start += run
    return ''.join(result)
def run_len(data, start):
    """Return the length of the run of identical characters starting at
    `start`"""
    return ilen(takewhile(lambda c: c == data[start],
                          islice(data, start, None)))
def unpack(data):
    result = []
    for i in range(len(data)):
        if data[i].isdigit():
            # skip digits, we'll look for them below
            continue
        # packed character
        c = data[i]
        # number of repetitions
        n = 1
        if (i+1) < len(data) and data[i+1].isdigit():
            # if the next character is a digit, grab all the digits in the
            # substring starting at i+1
            n = int(''.join(takewhile(str.isdigit, data[i+1:])))
        # append the repeated character
        result.append(c*n) # multiplying a string with a number repeats it
    return ''.join(result)
print(pack_or_unpack('aaabbc'))
print(pack_or_unpack('a3b2c'))
print(pack_or_unpack('a10'))
print(pack_or_unpack('b5c5'))
print(pack_or_unpack('abc'))
A regex-flavoured version of unpack() would be:
import re
UNPACK_RE = re.compile(r'(?P<char> [a-zA-Z]) (?P<count> \d+)?', re.VERBOSE)
def unpack_re(data):
    matches = UNPACK_RE.finditer(data)
    pairs = ((m.group('char'), m.group('count')) for m in matches)
    return ''.join(char * (int(count) if count else 1)
                   for char, count in pairs)
This code demonstrates the most straightforward (or "basic") approach of implementing that algorithm. It's not particularly elegant or idiomatic or necessarily efficient. (It would be if written in C, but Python has the caveats such as: indexing a string copies the character into a new string, and algorithms that seem to copy data excessively might be faster than trying to avoid this if the copying is done in C and the workaround was implemented with a Python loop.)
