Comparing string differences to a list of strings - python

I have a method that computes the number of differences between two strings and outputs where the differences are.
def method(a):
    count=0
    s1="ABC"
    for i in range (len(a)):
        if not a[i]==s1[i]:
            count=count+1
        else:
            count=count+0
    return a,count,difference(a, s1)
On input CBB, for example, this method outputs
('CBB', 2, [1, 0, 1])
What I really need is for this method to do the same, but comparing not just to the single string in s1 but to a list of strings:
s1 = ['ACB', 'ABC', 'ABB']
Does anyone have a smart way to do this?

OK, after clarification: instead of hardcoding s1, make your method take it as an argument:
def method(a, s1):
    count=0
    for i in range (len(a)):
        if not a[i]==s1[i]:
            count=count+1
        else:
            count=count+0
    return a,count,difference(a, s1)
Then use a list comprehension (strings is your list of strings; avoid naming it list, which would shadow the built-in):
result = [method(a, s1) for s1 in strings]
Be careful though: your method will raise an IndexError if a is longer than s1. Since you don't say what the result should be in that case, I left it as is.
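The question never shows difference(), so here is a minimal sketch of how the pieces might fit together. It assumes difference() returns a list with 1 where the characters differ and 0 where they match, which is consistent with the [1, 0, 1] output shown in the question. The names targets and results are illustrative only. Note that zip() stops at the shorter string, so unequal lengths are silently truncated here rather than raising an error.

```python
def difference(a, s1):
    # Hypothetical helper (not shown in the question):
    # 1 where the characters differ, 0 where they match.
    return [0 if x == y else 1 for x, y in zip(a, s1)]

def method(a, s1):
    diff = difference(a, s1)
    # The count is simply the number of 1s in the difference map.
    return a, sum(diff), diff

targets = ['ACB', 'ABC', 'ABB']
results = [method('CBB', s1) for s1 in targets]
for r in results:
    print(r)
```

With this helper the count no longer needs its own loop, because it can be derived from the difference map.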

The compare function calculates the number of differences (and the map of differences that you had been creating with difference()). I rewrote it to take a base string to compare against, src, so that you are not stuck comparing to "ABC" all the time.
def compare(src, test):
    if len(src) != len(test):
        return # must be the same length
    diffmap = [0]*len(src)
    count = 0
    for i, c in enumerate(src):
        if not c == test[i]:
            count = count+1
            diffmap[i] = 1
    return test, count, diffmap
The compare_to_many function simply goes through a list of strings to compare to, srcs, and creates a list of the comparisons between those base strings and a test string test.
def compare_to_many(srcs, test):
    return map(lambda x: compare(x, test), srcs)
EDIT:
After clarification in the comments, @X-Pender needs the source list to be hardcoded. This can be reflected by the following, single function:
def compare(test):
    def compare_one(src, test):
        diffmap = [0]*len(src)
        count = 0
        for i, c in enumerate(src):
            if not c == test[i]:
                count = count+1
                diffmap[i] = 1
        return test, count, diffmap
    sources = ["ABC", "CDB", "EUA"] # this is your hardcoded list
    return map(lambda x: compare_one(x, test), sources)
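One thing to watch for on Python 3: map() returns a lazy iterator rather than a list, so printing the result of the functions above shows a map object instead of the comparisons. The following sketch wraps the call in list() to get the same behaviour on Python 3. Raising ValueError on a length mismatch is my own assumption (the answer's version returns None in that case), and the sample inputs are illustrative.

```python
def compare(src, test):
    # Assumption: treat a length mismatch as an error instead of returning None.
    if len(src) != len(test):
        raise ValueError("strings must have the same length")
    diffmap = [int(s != t) for s, t in zip(src, test)]
    return test, sum(diffmap), diffmap

def compare_to_many(srcs, test):
    # list() forces the lazy map object into a concrete list on Python 3.
    return list(map(lambda src: compare(src, test), srcs))

comparisons = compare_to_many(["ABC", "CDB", "EUA"], "CBB")
print(comparisons)
```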

Related

Scatter palindrome - How to parse through a dictionary to figure out combinations of substrings that form a palindrome

I was given this HC problem at an interview. I was able to come up with what I'll call a brute force method.
Here is the problem statement:
Find all the scatter palindromes in a given string, "aabb". The
substrings can be scattered but rearranged to form a palindrome.
example: a, aa, aab, aabb, a, abb, b, bb, bba and b are the substrings
that satisfy this criteria.
My logic:
divide the str into substrings
counter = 0
if the len(substr) is even:
    and substr == reverse(substr)
        increment counter
else:
    store all the number of occurrences of each str of substr in a dict
    process dict somehow to figure out if it can be arranged into a palindrome
    ###### this is where I ran out of ideas ######
My code:
class Solution:
    def countSubstrings(self, s: str) -> int:
        n = len(s)
        c=0
        for i in range(0,n-1): #i=0
            print('i:',i)
            for j in range(i+1,n+1): #j=1
                print('j',j)
                temp = s[i:j]
                if len(temp) == 1:
                    c+=1
                # if the len of substring is even,
                # check if the reverse of the string is same as the string
                elif(len(temp)%2 == 0):
                    if (temp == temp[::-1]):
                        c+=1
                        print("c",c)
                else:
                    # create a dict to check how many times
                    # each value has occurred
                    d = {}
                    for l in range(len(temp)):
                        if temp[l] in d:
                            d[temp[l]] = d[temp[l]]+1
                        else:
                            d[temp[l]] = 1
                    print(d)
        return c
op = Solution()
op.countSubstrings('aabb')
By now, it must be obvious I'm a beginner. I'm sure there are better, more complicated ways to solve this. Some of my code is adapted from visleck's logic here and I wasn't able to follow the second half of it. If someone can explain it, that would be great as well.
As a partial answer, the test for a string being a scattered palindrome is simple: if the number of letters which occur an odd number of times is at most 1, it is a scattered palindrome. Otherwise it isn't.
It can be implemented like this:
from collections import Counter
def scattered_palindrome(s):
    counts = Counter(s)
    return len([count for count in counts.values() if count % 2 == 1]) <= 1
For example,
>>> scattered_palindrome('abb')
True
>>> scattered_palindrome('abbb')
False
Note that at no stage is it necessary to compare a string with its reverse. Also, note that I used a Counter object to keep track of the letter counts. This is a streamlined way of creating a dictionary-like collection of letter counts.
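To turn this partial answer into a count, the test can be applied to every contiguous substring. This is only a brute-force sketch under that assumption (every slice s[i:j] is checked independently), and the function names are mine, not from the problem statement. It rebuilds a Counter for each substring, so it is not meant to be efficient.

```python
from collections import Counter

def is_scatter_palindrome(s):
    # At most one letter may occur an odd number of times.
    odd = sum(1 for n in Counter(s).values() if n % 2 == 1)
    return odd <= 1

def count_scatter_palindromes(s):
    total = 0
    # Start positions run over the whole string, including the last character.
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            if is_scatter_palindrome(s[i:j]):
                total += 1
    return total

print(count_scatter_palindromes('aabb'))
```

Unlike the loop in the question, the outer range here goes up to len(s), so substrings starting at the final character are counted as well.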

Match two strings (char to char) till the first non-match using python

I am trying to match two strings sequentially up to the first non-matching character and then determine the percentage of exact match. My code is like this:
def match(a, b):
    a, b = list(a), list(b)
    count = 0
    for i in range(len(a)):
        if (a[i]!= b[i]): break
        else: count = count + 1
    return count/len(a)
a = '354575368987943'
b = '354535368987000'
c = '354575368987000'
print(match(a,b)) # return 0.267
print(match(a,c)) # return 0.8
Is there a built-in method in Python that can do this faster? For simplicity, assume that both strings are of the same length.
There's no built-in to do the entire thing, but you can use a built-in for computing the common prefix:
import os
def match(a, b):
    common = os.path.commonprefix([a, b])
    return float(len(common))/len(a)
I don't think there is such a built-in method.
But you can improve your implementation:
No need to wrap the inputs in list(...). Strings are indexable.
No need for the count variable; i already carries the same meaning. And you can return immediately when you know the result.
Like this, with some doctests added as a bonus:
def match(a, b):
    """
    >>> match('354575368987943', '354535368987000')
    0.26666666666666666
    >>> match('354575368987943', '354575368987000')
    0.8
    >>> match('354575368987943', '354575368987943')
    1
    """
    for i in range(len(a)):
        if a[i] != b[i]:
            return i / len(a)
    return 1
An alternative:
(I only just saw that the answer below mine came up with the same idea while I was editing this post.)
def match(l1, l2):
    # find mismatch
    try:
        stop = next(i for i, (el1, el2) in enumerate(zip(l1, l2)) if el1 != el2)
        return stop/len(l1)
    except StopIteration:
        return 1
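The answers above assume equal-length, non-empty strings. As a hedged variation, the sketch below counts the common prefix with itertools.takewhile over zip(), which stops at the shorter input, so it does not raise an IndexError when b is shorter than a. The name match_ratio and the choice to return 0.0 for an empty a are my own assumptions.

```python
from itertools import takewhile

def match_ratio(a, b):
    # Assumption: an empty reference string yields 0.0 rather than dividing by zero.
    if not a:
        return 0.0
    # Count leading pairs that are equal; takewhile stops at the first mismatch.
    prefix_len = sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1], zip(a, b)))
    return prefix_len / len(a)

print(match_ratio('354575368987943', '354535368987000'))
print(match_ratio('354575368987943', '354575368987000'))
```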

Code optimisation ideas

I wrote some Python code that prints the Fibonacci sequence truncated at a given threshold.
m_char=input('threshold: ')
m=int(m_char)
def fibonacci(m):
    lst=[0, 1]
    while lst[-1] <= m:
        a = lst[-2]+lst[-1]
        if a <= m:
            lst.append(a)
        else:
            print(lst)
            return
fibonacci(m)
I don't like the double check on the variable m in the while and if statements: I'm pretty sure it is redundant, so there should be a way to write more efficient code. I would like to keep using lists. Do you have any ideas?
def fibonacci(m):
    lst=[0, 1]
    a = lst[-2]+lst[-1]
    while a <= m:
        lst.append(a)
        a = lst[-2]+lst[-1]
    return lst
You can calculate a once per loop iteration and use it to determine whether the loop continues.
Just use
    while True:
since it is the check inside the loop that actually determines how often the loop runs.
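Applied to the code in the question, the while True suggestion might look like the sketch below. It keeps the list-based approach and leaves a single comparison against m per iteration. Returning the list instead of printing it is my own choice here, not part of the suggestion.

```python
def fibonacci(m):
    lst = [0, 1]
    while True:
        a = lst[-2] + lst[-1]
        # The only check against m: stop as soon as the next term exceeds it.
        if a > m:
            return lst
        lst.append(a)

print(fibonacci(50))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```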
It would be slightly more efficient to not use list indexing at all but instead maintain the last two Fibonacci numbers with two variables. Furthermore, it is more idiomatic to return the list rather than print it. Let the calling code print the list if it wants:
def fibonacci(m):
    lst=[0, 1]
    a,b = lst
    while True:
        a,b = b, a+b
        if b <= m:
            lst.append(b)
        else:
            return lst

Python - packing/unpacking by letters

I'm just starting to learn python and I have this exercise that's puzzling me:
Create a function that can pack or unpack a string of letters.
So aaabb would be packed a3b2 and vice versa.
For the packing part of the function, I wrote the following
def packer(s):
    if s.isalpha(): # Defines if unpacked
        stack = []
        for i in s:
            if s.count(i) > 1:
                if (i + str(s.count(i))) not in stack:
                    stack.append(i + str(s.count(i)))
            else:
                stack.append(i)
        print "".join(stack)
    else:
        print "Something's not quite right.."
        return False
packer("aaaaaaaaaaaabbbccccd")
This seems to work all proper. But the assignment says that
if the input has (for example) the letter a after b or c, then
it should later be unpacked into its original form.
So "aaabbkka" should become a3b2k2a, not a4b2k2.
I hence figured, that I cannot use the "count()" command, since
that counts all occurrences of the item in the whole string, correct?
What would be my options here then?
On to the unpacking -
I've thought of the basics of what my code needs to do -
1. Between the "if s.isalpha():" and else, I should add an elif that
checks whether or not the string has digits in it. (I figured this would be
enough to determine whether it's the packed version or unpacked).
2. Create a for loop and inside of it an if statement, which then checks for every element:
2.1. If it has a number behind it > Return (or add to an empty stack) the element repeated that number of times
2.2. If it has no number following it > Return just the element.
Big question number 2 - how do I check whether it's a number or just another
alphabetical element following an element in the list? I guess this must be done with
slicing, but those only take integers. Could this be achieved with the index command?
Also - if this is of any relevance - so far I've basically covered lists, strings, if and for
and I've been told this exercise is doable with just those (...so if you wouldn't mind keeping this really basic)
All help appreciated for the newbie enthusiast!
SOLVED:
def packer(s):
    if s.isalpha(): # Defines if unpacked
        groups= []
        last_char = None
        for c in s:
            if c == last_char:
                groups[-1].append(c)
            else:
                groups.append([c])
            last_char = c
        return ''.join('%s%s' % (g[0], len(g)>1 and len(g) or '') for g in groups)
    else: # Seems to be packed
        stack = ""
        for i in range(len(s)):
            if s[i].isalpha():
                if i+1 < len(s) and s[i+1].isdigit():
                    digit = s[i+1]
                    char = s[i]
                    i += 2
                    while i < len(s) and s[i].isdigit():
                        digit +=s[i]
                        i+=1
                    stack += char * int(digit)
                else:
                    stack+= s[i]
            else:
                ""
        return "".join(stack)
print (packer("aaaaaaaaaaaabbbccccd"))
print (packer("a4b19am4nmba22"))
So this is my final code. Almost managed to pull it all off with just for loops and if statements.
In the end though I had to bring in the while loop to solve reading the multiple-digit numbers issue. I think I still managed to keep it simple enough. Thanks a ton millimoose and everyone else for chipping in!
A straightforward solution:
If a char is different, make a new group. Otherwise append it to the last group. Finally count all groups and join them.
def packer(s):
    groups = []
    last_char = None
    for c in s:
        if c == last_char:
            groups[-1].append(c)
        else:
            groups.append([c])
        last_char = c
    return ''.join('%s%s'%(g[0], len(g)) for g in groups)
Another approach is to use re.
The regex r'(.)\1+' matches runs of two or more identical consecutive characters, and with re.sub you can easily encode them:
import re
regex = re.compile(r'(.)\1+')
def replacer(match):
    return match.group(1) + str(len(match.group(0)))
regex.sub(replacer, 'aaabbkka')
#=> 'a3b2k2a'
I think you can use the itertools.groupby function,
for example:
import itertools
data = 'aaassaaasssddee'
groupped_data = ((c, len(list(g))) for c, g in itertools.groupby(data))
result = ''.join(c + (str(n) if n > 1 else '') for c, n in groupped_data)
Of course, one can make this code more readable by using a generator function instead of a generator expression.
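As a rough illustration of that last remark, the run lengths can be produced by a small generator function and then joined. The names runs and pack are hypothetical, chosen just for this sketch.

```python
from itertools import groupby

def runs(data):
    # Yield (character, run length) for each group of consecutive equal characters.
    for char, group in groupby(data):
        yield char, sum(1 for _ in group)

def pack(data):
    # Omit the count for runs of length 1, as the exercise requires.
    return ''.join(c + (str(n) if n > 1 else '') for c, n in runs(data))

print(pack('aaabbkka'))  # a3b2k2a
```

Because groupby only groups adjacent equal items, the trailing a stays separate from the leading run of a's, which is exactly the behaviour the exercise asks for.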
This is an implementation of the algorithm I outlined in the comments:
from itertools import takewhile, count, islice
def consume(items):
    # exhaust an iterator without storing its items
    from collections import deque
    deque(items, maxlen=0)
def ilen(items):
    # count the items of an iterator; on Python 2 use itertools.izip instead of zip
    result = count()
    consume(zip(items, result))
    return next(result)
def pack_or_unpack(data):
    start = 0
    result = []
    while start < len(data):
        if data[start].isdigit():
            # `data` is packed, bail
            return unpack(data)
        run = run_len(data, start)
        # append the character that might repeat
        result.append(data[start])
        if run > 1:
            # append the length of the run of characters
            result.append(str(run))
        start += run
    return ''.join(result)
def run_len(data, start):
    """Return the length of the run of identical characters starting at
    `start`"""
    return ilen(takewhile(lambda c: c == data[start],
                          islice(data, start, None)))
def unpack(data):
    result = []
    for i in range(len(data)):
        if data[i].isdigit():
            # skip digits, we'll look for them below
            continue
        # packed character
        c = data[i]
        # number of repetitions
        n = 1
        if (i+1) < len(data) and data[i+1].isdigit():
            # if the next character is a digit, grab all the digits in the
            # substring starting at i+1
            n = int(''.join(takewhile(str.isdigit, data[i+1:])))
        # append the repeated character
        result.append(c*n) # multiplying a string with a number repeats it
    return ''.join(result)
print(pack_or_unpack('aaabbc'))
print(pack_or_unpack('a3b2c'))
print(pack_or_unpack('a10'))
print(pack_or_unpack('b5c5'))
print(pack_or_unpack('abc'))
A regex-flavoured version of unpack() would be:
import re
UNPACK_RE = re.compile(r'(?P<char> [a-zA-Z]) (?P<count> \d+)?', re.VERBOSE)
def unpack_re(data):
    matches = UNPACK_RE.finditer(data)
    pairs = ((m.group('char'), m.group('count')) for m in matches)
    return ''.join(char * (int(count) if count else 1)
                   for char, count in pairs)
This code demonstrates the most straightforward (or "basic") approach to implementing that algorithm. It's not particularly elegant or idiomatic, nor necessarily efficient. (It would be if written in C, but Python has caveats: indexing a string copies the character into a new string, and algorithms that seem to copy data excessively might still be faster than trying to avoid the copying, if the copying is done in C and the workaround is implemented with a Python loop.)

Higher order function in Python exercise

I am learning Python, and while solving an exercise, the function filter() returns an empty list and I can't understand why. Here is my source code:
"""
Using the higher order function filter(), define a function filter_long_words()
that takes a list of words and an integer n and returns
the list of words that are longer than n.
"""
def filter_long_words(input_list, n):
    print 'n = ', n
    lengths = map(len, input_list)
    print 'lengths = ', lengths
    dictionary = dict(zip(lengths, input_list))
    filtered_lengths = filter(lambda x: x > n, lengths) #i think error is here
    print 'filtered_lengths = ', filtered_lengths
    print 'dict = ',dictionary
    result = [dictionary[i] for i in filtered_lengths]
    return result
input_string = raw_input("Enter a list of words\n")
input_list = []
input_list = input_string.split(' ')
n = raw_input("Display words, that longer than...\n")
print filter_long_words(input_list, n)
Your function filter_long_words works fine, but the error stems from the fact that when you do:
n = raw_input("Display words, that longer than...\n")
print filter_long_words(input_list, n)
n is a string, not an integer.
Unfortunately, a string is always "greater" than an integer in Python 2 (but you shouldn't compare them anyway!):
>>> 2 > '0'
False
If you're curious why, this question has the answer: How does Python compare string and int?
Regarding the rest of your code, you should not create a dictionary that maps the lengths of the strings to the strings themselves.
What happens when you have two strings of equal length? You should map the other way around: strings to their length.
But better yet: you don't even need to create a dictionary:
filtered_words = filter(lambda word: len(word) > n, words)
n is a string. Convert it to an int before using it:
n = int(raw_input("Display words, that longer than...\n"))
Python 2.x will attempt to produce a consistent-but-arbitrary ordering for objects with no meaningful ordering relationship to make sorting easier. This was deemed a mistake and changed in the backwards-incompatible 3.x releases; in 3.x, this would have raised a TypeError.
I don't know what your function does, or what you think it does, just looking at it gives me a headache.
Here's a correct answer to your exercise:
def filter_long_words(input_list, n):
    return filter(lambda s: len(s) > n, input_list)
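The answers above are written for Python 2. For anyone reading this on Python 3, here is a hedged sketch of the same exercise: filter() returns a lazy iterator there, so it is wrapped in list(), and the threshold is converted with int() before any comparison. The sample words and the literal "4" stand in for user input and are illustrative only.

```python
def filter_long_words(words, n):
    # list() is needed on Python 3, where filter() returns an iterator.
    return list(filter(lambda w: len(w) > n, words))

words = "the quick brown fox jumps".split()
n = int("4")  # stands in for int(input("Display words longer than...\n"))
print(filter_long_words(words, n))  # ['quick', 'brown', 'jumps']
```

On Python 3, forgetting the int() conversion raises a TypeError at the comparison instead of silently returning an empty list.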
My answer:
def filter_long_words():
    a = raw_input("Please give a list of word's and a number: ").split()
    print "You word's without your Number...", filter(lambda x: x != a, a)[:-1]
filter_long_words()
