Is this list comparison in Python unnecessary?

The Count and Compare anagram solution provided at interactivepython.org iterates through the lists one final time, checking whether the count for each ASCII value is the same.
j = 0
stillOK = True
while j<26 and stillOK:
    if c1[j]==c2[j]:
        j = j + 1
    else:
        stillOK = False
return stillOK
Why not use the comparison operator?
return (c1 == c2)
Full code:
def anagramSolution4(s1,s2):
    c1 = [0]*26
    c2 = [0]*26
    for i in range(len(s1)):
        pos = ord(s1[i])-ord('a')
        c1[pos] = c1[pos] + 1
    for i in range(len(s2)):
        pos = ord(s2[i])-ord('a')
        c2[pos] = c2[pos] + 1
    j = 0
    stillOK = True
    while j<26 and stillOK:
        if c1[j]==c2[j]:
            j = j + 1
        else:
            stillOK = False
    return stillOK
print(anagramSolution4('apple','pleap'))
Edited to add:
I've tested with:
anagramSolution4('abc','cba') #returns True
anagramSolution4('abc','cbd') #returns False
anagramSolution4('abc','cbah') #returns False
..they all pass. What would be an appropriate test to show c1==c2 fails?

Using == on the two lists would produce the same result, but would also hide some implementation details. Given that the script comes from a website used for learning, I guess this is for learning purposes.
Also, I see that in the webpage you are asked some questions about complexities. Well, using c1 == c2 instead of the loop would probably mislead some people and make them think that the operation is O(1) instead of O(min(len(c1), len(c2)))[1].
Finally, note that there are many languages which have no notion of lists.
[1] This isn't necessarily true either, as the two lists may contain items with custom and complex __eq__() methods.
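To make the equivalence concrete, here is a minimal sketch of the same count-and-compare approach with the final loop replaced by ==. The function name anagram_by_counts is made up for this example, and it assumes both inputs contain only lowercase a-z, as the original does.

```python
def anagram_by_counts(s1, s2):
    """Count-and-compare anagram check; assumes lowercase a-z input only."""
    c1 = [0] * 26
    c2 = [0] * 26
    for ch in s1:
        c1[ord(ch) - ord('a')] += 1
    for ch in s2:
        c2[ord(ch) - ord('a')] += 1
    # List equality compares the two lists element by element,
    # which is the same work the explicit while loop does.
    return c1 == c2


print(anagram_by_counts('abc', 'cba'))   # True
print(anagram_by_counts('abc', 'cbd'))   # False
print(anagram_by_counts('abc', 'cbah'))  # False
```

Since the counts are plain integers, there is no input for which the loop and == would disagree.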

Related

Remove equal characters from two python strings

I am writing Python code to remove characters that are equal at the same index in two strings. For example, remove_same('ABCDE', 'ACBDE') should reduce the two arguments to BC and CB. I know that strings are immutable, so I have converted them to lists. I am getting an index-out-of-range error.
def remove_same(l_string, r_string):
    l_list = list(l_string)
    r_list = list(r_string)
    i = 0
    while i != len(l_list):
        print(f'in {i} length is {len(l_list)}')
        while l_list[i] == r_list[i]:
            l_list.pop(i)
            r_list.pop(i)
            if i == len(l_list) - 1:
                break
        if i != len(l_list):
            i += 1
    return l_list[0] == r_list[0]
I would avoid using a while loop in this case. I think this is a better and clearer solution:
def remove_same(s1, s2):
    l1 = list(s1)
    l2 = list(s2)
    out1 = []
    out2 = []
    # walk both strings in parallel and keep only the differing pairs
    for c1, c2 in zip(l1, l2):
        if c1 != c2:
            out1.append(c1)
            out2.append(c2)
    s1_out = "".join(out1)
    s2_out = "".join(out2)
    print(s1_out)
    print(s2_out)
It could be shortened using list comprehensions, but I was trying to be as explicit as possible.
I feel this could be a problem.
while l_list[i] == r_list[i]:
    l_list.pop(i)
    r_list.pop(i)
Each pop shrinks the lists, so their length can drop to i or below, at which point l_list[i] no longer exists.
Do a dry run on this, if l_list = ["a"] and r_list = ["a"].
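The dry run can be reproduced with a short sketch. It copies only the inner loop from the question and catches the exception so the failure is visible; the variable error is added purely for illustration.

```python
# Dry run of the inner loop from the question with single-character inputs.
l_list = ["a"]
r_list = ["a"]
i = 0
error = None
try:
    while l_list[i] == r_list[i]:
        l_list.pop(i)
        r_list.pop(i)
        if i == len(l_list) - 1:  # 0 == -1 is False, so the break is skipped
            break
except IndexError as exc:
    # The loop condition is evaluated again on lists that are now empty.
    error = exc

print(l_list, r_list)  # [] []
print(repr(error))
```

After the first pop both lists are empty, so the next evaluation of l_list[i] is what raises.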
It is in general not a good idea to modify a list in a loop. Here is a cleaner, more Pythonic solution. The two strings are zipped and processed in parallel. Each pair of equal characters is discarded, and the remaining characters are arranged into new strings.
a = 'ABCDE'
b = 'ACFDE'
def remove_same(s1, s2):
    return ["".join(s) for s
            in zip(*[(x,y) for x,y in zip(s1,s2) if x!=y])]
remove_same(a, b)
#['BC', 'CF']
Here you go:
def remove_same(l_string, r_string):
    # if either string is empty, return False
    if not l_string or not r_string:
        return False
    l_list = list(l_string)
    r_list = list(r_string)
    limit = min(len(l_list), len(r_list))
    i = 0
    while i < limit:
        if l_list[i] == r_list[i]:
            l_list.pop(i)
            r_list.pop(i)
            limit -= 1
        else:
            i += 1
    return l_list[0] == r_list[0]
print(remove_same('ABCDE', 'ACBDE'))
Output:
False

How to write a function to find the longest common subsequence using dynamic programming?

To be clear I am looking for the subsequence itself and not the length. I have written this function which works the majority of the time but in some cases it doesn't work. I have to write this recursively without any loops or imports. I used a memoise function to be more efficient but didn't include it here.
This function works when s1 = "abcde" and s2 = "qbxxd" (which correctly returns "bd") but it doesn't work for when s1 = "Look at me, I can fly!" and s2 = "Look at that, it's a fly" which should return "Look at , a fly" but I get instead "Look at a fly". For whatever reason the comma and the space is ignored. I've tried s1 = "ab, cde" and s2 = "qbxx, d" which correctly returns "b, d".
def lcs(s1, s2):
    """y5tgr"""
    i = len(s1)
    j = len(s2)
    if i == 0 or j == 0:
        return ""
    if s1[i-1] == s2[j-1]:
        return lcs(s1[:-1], s2[:-1]) + s1[i-1]
    else:
        return max(lcs(s1[:-1], s2), lcs(s1, s2[:-1]))
I have a feeling the problem is with the last line and the max function. I've seen solutions with for and while loops but not without.
Only a slight change is needed to fix your code (you're right that the problem was in max).
Just change max so that it finds the longest string, using its key function.
def lcs(s1, s2):
    """y5tgr"""
    i = len(s1)
    j = len(s2)
    if i == 0 or j == 0:
        return ""
    if s1[i-1] == s2[j-1]:
        return lcs(s1[:-1], s2[:-1]) + s1[i-1]
    else:
        # Find max based upon the string length
        return max(lcs(s1[:-1], s2), lcs(s1, s2[:-1]), key=len)
However, this is very slow without memoization
Code with Memoization (to improve performance)
Memoization Decorator Reference
import functools
def memoize(obj):
    # cache maps the positional arguments to the result computed for them
    cache = obj.cache = {}
    @functools.wraps(obj)
    def memoizer(*args, **kwargs):
        if args not in cache:
            cache[args] = obj(*args, **kwargs)
        return cache[args]
    return memoizer
@memoize
def lcs(s1, s2):
    """y5tgr"""
    i = len(s1)
    j = len(s2)
    if i == 0 or j == 0:
        return ""
    if s1[i-1] == s2[j-1]:
        return lcs(s1[:-1], s2[:-1]) + s1[i-1]
    else:
        return max(lcs(s1[:-1], s2), lcs(s1, s2[:-1]), key=len)
Test
s1 = "Look at me, I can fly!"
s2 = "Look at that, it's a fly"
print(lcs(s1, s2))
Output
Look at , a fly
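As a sketch of the same idea using only the standard library, functools.lru_cache can stand in for the hand-written memoize decorator. The name lcs_cached is made up for this example; the recursion is the same as in the answer above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # remember the result for every (s1, s2) pair seen so far
def lcs_cached(s1, s2):
    """Return one longest common subsequence of s1 and s2."""
    if not s1 or not s2:
        return ""
    if s1[-1] == s2[-1]:
        return lcs_cached(s1[:-1], s2[:-1]) + s1[-1]
    # key=len picks the longer candidate, not the lexicographically later one
    return max(lcs_cached(s1[:-1], s2), lcs_cached(s1, s2[:-1]), key=len)


print(lcs_cached("abcde", "qbxxd"))  # bd
```

Because the function is still recursive, very long inputs can hit Python's recursion limit; for the sentence-length strings in the question that is not a concern.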
For strings, max takes the string which lexicographically goes last:
>>> max("a", "b")
'b'
>>> max("aaaaa", "b")
'b'
>>>
Certainly not what you need; you seem to look for the longer of the two.
You don't need a loop, just a comparison:
lcs1 = lcs(s1[:-1], s2)
lcs2 = lcs(s1, s2[:-1])
return lcs1 if len(lcs1) > len(lcs2) else lcs2

backtracking not trying all possibilities

So I've got a list of questions as a dictionary, e.g.
{"Question1": 3, "Question2": 5 ... }
That means the "Question1" has 3 points, the second one has 5, etc.
I'm trying to create all subsets of questions whose number of questions and total points each fall within a given range.
I've tried something like
questions = {"Q1":1, "Q2":2, "Q3": 1, "Q4" : 3, "Q5" : 1, "Q6" : 2}
u = 3 #
v = 5 # between u and v questions
x = 5 #
y = 10 #between x and y points
solution = []
n = 0
def main(n_):
    global n
    n = n_
    global solution
    solution = []
    finalSolution = []
    for x in questions.keys():
        solution.append("_")
    finalSolution.extend(Backtracking(0))
    return finalSolution
def Backtracking(k):
    finalSolution = []
    for c in questions.keys():
        solution[k] = c
        print ("candidate: ", solution)
        if not reject(k):
            print ("not rejected: ", solution)
            if accept(k):
                finalSolution.append(list(solution))
            else:
                finalSolution.extend(Backtracking(k+1))
    return finalSolution
def reject(k):
    if solution[k] in solution: #if the question already exists
        return True
    if k > v: #too many questions
        return True
    points = 0
    for x in solution:
        if x in questions.keys():
            points = points + questions[x]
    if points > y: #too many points
        return True
    return False
def accept(k):
    points = 0
    for x in solution:
        if x in questions.keys():
            points = points + questions[x]
    if points in range (x, y+1) and k in range (u, v+1):
        return True
    return False
print(main(len(questions.keys())))
but it's not trying all possibilities, only putting all the questions on the first index..
I have no idea what I'm doing wrong.
There are three problems with your code.
The first issue is that the first check in your reject function is always True. You can fix that in a variety of ways (you commented that you're now using solution.count(solution[k]) != 1).
The second issue is that your accept function uses the variable name x for what it intends to be two different things (a question from solution in the for loop and the global x that is the minimum number of points). That doesn't work, and you'll get a TypeError when trying to pass it to range. A simple fix is to rename the loop variable (I suggest q since it's a key into questions). Checking if a value is in a range is also a bit awkward. It's usually much nicer to use chained comparisons: if x <= points <= y and u <= k <= v
The third issue is that you're not backtracking at all. The backtracking step needs to reset the global solution list to the same state it had before Backtracking was called. You can do this at the end of the function, just before you return, using solution[k] = "_" (you commented that you've added this line, but I think you put it in the wrong place).
Anyway, here's a fixed version of your functions:
def Backtracking(k):
    finalSolution = []
    for c in questions.keys():
        solution[k] = c
        print ("candidate: ", solution)
        if not reject(k):
            print ("not rejected: ", solution)
            if accept(k):
                finalSolution.append(list(solution))
            else:
                finalSolution.extend(Backtracking(k+1))
    solution[k] = "_" # backtracking step here!
    return finalSolution
def reject(k):
    if solution.count(solution[k]) != 1: # fix this condition
        return True
    if k > v:
        return True
    points = 0
    for q in solution:
        if q in questions:
            points = points + questions[q]
    if points > y: #too many points
        return True
    return False
def accept(k):
    points = 0
    for q in solution: # change this loop variable (also done above, for symmetry)
        if q in questions:
            points = points + questions[q]
    if x <= points <= y and u <= k <= v: # chained comparisons are much nicer than range
        return True
    return False
There are still things that could probably be improved in there. I think having solution be a fixed-size global list with dummy values is especially unpythonic (a dynamically growing list that you pass as an argument would be much more natural). I'd also suggest using sum to add up the points rather than using an explicit loop of your own.
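Here is a rough sketch of what those suggestions might look like: the partial solution is a growing list passed as an argument, and sum adds up the points. The names subsets_in_range and extend are made up for this example. It also deviates from the code above in two ways, both of which are my assumptions about what is wanted: questions are only added in dictionary order, so each subset appears once rather than once per ordering, and an accepted subset is still extended further. The pruning assumes every question is worth a positive number of points.

```python
def subsets_in_range(questions, min_count, max_count, min_points, max_points):
    """Return every subset of questions within the count and point limits."""
    names = list(questions)
    results = []

    def extend(start, chosen):
        points = sum(questions[q] for q in chosen)
        # Reject: adding more questions can only increase both totals.
        if len(chosen) > max_count or points > max_points:
            return
        # Accept: record a copy, because chosen keeps changing.
        if len(chosen) >= min_count and points >= min_points:
            results.append(list(chosen))
        for idx in range(start, len(names)):
            chosen.append(names[idx])
            extend(idx + 1, chosen)
            chosen.pop()  # backtracking step: undo the last choice

    extend(0, [])
    return results


questions = {"Q1": 1, "Q2": 2, "Q3": 1, "Q4": 3, "Q5": 1, "Q6": 2}
for subset in subsets_in_range(questions, 3, 5, 5, 10):
    print(subset, sum(questions[q] for q in subset))
```

Passing the limits as parameters also removes the need for the global u, v, x and y.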

Checking if word segmentation is possible

This is a follow up question to this response and the pseudo-code algorithm that the user posted. I didn't comment on that question because of its age. I am only interested in validating whether or not a string can be split into words. The algorithm doesn't need to actually split the string. This is the response from the linked question:
Let S[1..length(w)] be a table with Boolean entries. S[i] is true if
the word w[1..i] can be split. Then set S[1] = isWord(w[1]) and for
i=2 to length(w) calculate
S[i] = (isWord(w[1..i]) or for any j in {2..i}: S[j-1] and
isWord(w[j..i])).
I'm translating this algorithm into simple python code, but I'm not sure if I'm understanding it properly. Code:
def is_all_words(a_string, dictionary):
    str_len = len(a_string)
    S = [False] * str_len
    S[0] = is_word(a_string[0], dictionary)
    for i in range(1, str_len):
        check = is_word(a_string[0:i], dictionary)
        if (check):
            S[i] = check
        else:
            for j in range(1, str_len):
                check = (S[j - 1] and is_word(a_string[j:i]), dictionary)
                if (check):
                    S[i] == True
                    break
    return S
I have two related questions. 1) Is this code a proper translation of the linked algorithm into Python, and if it is, 2) Now that I have S, how do I use it to tell if the string is only comprised of words? In this case, is_word is a function that simply looks a given word up in a list. I haven't implemented it as a trie yet.
UPDATE: After updating the code to include the suggested change, it doesn't work. This is the updated code:
def is_all_words(a_string, dictionary):
    str_len = len(a_string)
    S = [False] * str_len
    S[0] = is_word(a_string[0], dictionary)
    for i in range(1, str_len):
        check = is_word(a_string[0:i], dictionary)
        if (check):
            S[i] = check
        else:
            for j in range(1, i): #THIS LINE WAS UPDATED
                check = (S[j - 1] and is_word(a_string[j:i]), dictionary)
                if (check):
                    S[i] == True
                    break
    return S
a_string = "carrotforever"
S = is_all_words(a_string, dictionary)
print(S[len(S) - 1]) #prints FALSE
a_string = "hello"
S = is_all_words(a_string, dictionary)
print(S[len(S) - 1]) #prints TRUE
It should return True for both of these.
Here is a modified version of your code that should return good results.
Notice that your mistake was simply in the translation from pseudocode array indexing (starting at 1) to Python array indexing (starting at 0): S[0] and S[1] were populated with the same value, while S[L-1] was never actually computed. You can easily trace this mistake by printing all of S. You will find that S[3] is set to True in the first example, where it should be S[2] for the word "car".
Also you could speed up the process by storing the index of composite words found so far, instead of testing each position.
def is_all_words(a_string, dictionary):
    str_len = len(a_string)
    S = [False] * (str_len)
    # I replaced is_word function by a simple list lookup,
    # feel free to replace it with whatever function you use.
    # tries or suffix tree are best for this.
    S[0] = (a_string[0] in dictionary)
    for i in range(1, str_len):
        check = a_string[0:i+1] in dictionary # i+1 instead of i
        if (check):
            S[i] = check
        else:
            # i+1 instead of i; start at 1 so that S[j-1] never wraps around to S[-1]
            for j in range(1,i+1):
                if (S[j-1] and (a_string[j:i+1] in dictionary)): # i+1 instead of i
                    S[i] = True
                    break
    return S
a_string = "carrotforever"
S = is_all_words(a_string, ["a","car","carrot","for","eve","forever"])
print(S[len(a_string)-1]) #prints TRUE
a_string = "helloworld"
S = is_all_words(a_string, ["hello","world"])
print(S[len(a_string)-1]) #prints TRUE
For a real-world example of how to do English word segmentation, look at the source of the Python wordsegment module. It's a little more sophisticated because it uses word and phrase frequency tables but it illustrates the recursive approach. By modifying the score function you can prioritize longer matches.
Installation is easy with pip:
$ pip install wordsegment
And segment returns a list of words:
>>> import wordsegment
>>> wordsegment.segment('carrotforever')
['carrot', 'forever']
1) At first glance, it looks good. One thing: for j in range(1, str_len): should be for j in range(1, i): I think.
2) If S[str_len-1] == True, then the whole string consists of whole words only.
After all, S[i] is true iff
the whole string from 0 to i consists of a single dictionary word,
OR there exists an S[j-1] == True with j<i, and string[j:i] is a single dictionary word,
so if S[str_len-1] is true, then the whole string is composed out of dictionary words
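One way to sidestep the off-by-one issues discussed in this thread is to let the table describe prefixes by length instead of by last index. The following is a minimal sketch under that convention, not a translation of the pseudocode line by line; the name can_segment is made up for this example, and the dictionary is assumed to be any iterable of words.

```python
def can_segment(text, words):
    """Return True if text can be split entirely into words from `words`."""
    words = set(words)  # constant-time membership tests
    n = len(text)
    # splittable[i] is True when the prefix text[:i] can be split into words.
    splittable = [False] * (n + 1)
    splittable[0] = True  # the empty prefix needs no words at all
    for i in range(1, n + 1):
        # text[:i] works if some shorter splittable prefix text[:j]
        # is followed by a single dictionary word text[j:i].
        splittable[i] = any(
            splittable[j] and text[j:i] in words for j in range(i)
        )
    return splittable[n]


vocabulary = ["a", "car", "carrot", "for", "eve", "forever"]
print(can_segment("carrotforever", vocabulary))        # True
print(can_segment("helloworld", ["hello", "world"]))   # True
print(can_segment("carrotx", vocabulary))              # False
```

With this layout the answer to the original question is simply the last entry of the table.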

Python set intersection question

I have lists of three sets each:
s0 = [set([16,9,2,10]), set([16,14,22,15]), set([14,7])] # true, 16 and 14
s1 = [set([16,9,2,10]), set([16,14,22,15]), set([7,8])] # false
I want a function that will return True if every set in the list intersects with at least one other set in the list. Is there a built-in for this or a simple list comprehension?
all(any(a & b for a in s if a is not b) for b in s)
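For reference, here is that one-liner wrapped in a function and run against the two sample lists. The function name is made up for this example, and the set literals assume Python 2.7 or later.

```python
def every_set_intersects_another(sets):
    """True if each set shares at least one element with some other set in the list."""
    return all(
        # compare by identity so a set is never tested against itself
        any(a & b for a in sets if a is not b)
        for b in sets
    )


s0 = [{16, 9, 2, 10}, {16, 14, 22, 15}, {14, 7}]
s1 = [{16, 9, 2, 10}, {16, 14, 22, 15}, {7, 8}]
print(every_set_intersects_another(s0))  # True
print(every_set_intersects_another(s1))  # False
```

Note the edge cases that fall out of all and any: an empty list gives True, and a list containing a single set gives False.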
Here's a very simple solution that's very efficient for large inputs:
def g(s):
    import collections
    count = collections.defaultdict(int)
    for a in s:
        for x in a:
            count[x] += 1
    return all(any(count[x] > 1 for x in a) for a in s)
It's a little verbose but I think it's a pretty efficient solution. It takes advantage of the fact that when two sets intersect, we can mark them both as connected. It does this by keeping a list of flags as long as the list of sets. When set i and set j intersect, it sets the flag for both of them. It then loops over the list of sets and only tries to find an intersection for sets that haven't already been intersected. After reading the comments, I think this is what @Victor was talking about.
s0 = [set([16,9,2,10]), set([16,14,22,15]), set([14,7])] # true, 16 and 14
s1 = [set([16,9,2,10]), set([16,14,22,15]), set([7,8])] # false
def connected(sets):
    L = len(sets)
    if not L: return True
    if L == 1: return False
    passed = [False] * L
    i = 0
    while True:
        # skip sets that are already known to intersect another set
        while passed[i]:
            i += 1
            if i == L:
                return True
        for j, s in enumerate(sets):
            if j == i: continue
            if sets[i] & s:
                passed[i] = passed[j] = True
                break
        else:
            # no other set intersects sets[i]
            return False
print(connected(s0))
print(connected(s1))
I decided that an empty list of sets is connected (If you produce an element of the list, I can produce an element that it intersects ;). A list with only one element is dis-connected trivially. It's one line to change in either case if you disagree.
Here's a more efficient (if much more complicated) solution, that performs a linear number of intersections and a number of unions of order O( n*log(n) ), where n is the length of s:
def f(s):
    import math
    j = int(math.log(len(s) - 1, 2)) + 1
    unions = [set()] * (j + 1)
    for i, a in enumerate(s):
        unions[:j] = [set.union(set(), *s[i+2**k:i+2**(k+1)]) for k in range(j)]
        if not (a & set.union(*unions)):
            return False
        j = int(math.log(i ^ (i + 1), 2))
        unions[j] = set.union(a, *unions[:j])
    return True
Note that this solution only works on Python >= 2.6.
As usual I'd like to give the inevitable itertools solution ;-)
from itertools import combinations, groupby
from operator import itemgetter
def any_intersects( sets ):
    # we are doing stuff with combinations of sets
    combined = combinations(sets,2)
    # group these combinations by their first set
    grouped = (g for k,g in groupby( combined, key=itemgetter(0)))
    # are any intersections in each group
    intersected = (any((a&b) for a,b in group) for group in grouped)
    return all( intersected )
s0 = [set([16,9,2,10]), set([16,14,22,15]), set([14,7])]
s1 = [set([16,9,2,10]), set([16,14,22,15]), set([7,8])]
print(any_intersects( s0 )) # True
print(any_intersects( s1 )) # False
This is really lazy and will only do the intersections that are required. It can also be a very confusing and unreadable oneliner ;-)
To answer your question: no, there isn't a built-in or simple list comprehension that does what you want. Here's another itertools-based solution that is very efficient. Surprisingly, it is about twice as fast as @THC4k's itertools answer using groupby() in timing tests with your sample input. It could probably be optimized a bit further, but is very readable as presented. Like @AaronMcSmooth, I arbitrarily decided what to return when the input list contains no sets or only one set.
from itertools import combinations
def all_intersect(sets):
    N = len(sets)
    if not N: return True
    if N == 1: return False
    intersected = [False] * N
    # range works on both Python 2 and 3 (the original used xrange)
    for i,j in combinations(range(N), 2):
        if not intersected[i] or not intersected[j]:
            if sets[i] & sets[j]:
                intersected[i] = intersected[j] = True
    return all(intersected)
This strategy isn't likely to be as efficient as @Victor's suggestion, but might be more efficient than jchl's answer due to increased use of set arithmetic (union).
s0 = [set([16,9,2,10]), set([16,14,22,15]), set([14,7])]
s1 = [set([16,9,2,10]), set([16,14,22,15]), set([7,8])]
def freeze(list_of_sets):
    """Transform a list of sets into a frozenset of frozensets."""
    return frozenset(frozenset(set_) for set_ in list_of_sets)
def all_sets_have_relatives(set_of_sets):
    """Check if all sets have another set that they intersect with.
    >>> all_sets_have_relatives(s0) # true, 16 and 14
    True
    >>> all_sets_have_relatives(s1) # false
    False
    """
    set_of_sets = freeze(set_of_sets)
    def has_relative(set_):
        return set_ & frozenset.union(*(set_of_sets - set((set_,))))
    return all(has_relative(set) for set in set_of_sets)
This may give better performance depending on the distribution of the sets.
def all_intersect(s):
    count = 0
    for x, a in enumerate(s):
        for y, b in enumerate(s):
            if a & b and x!=y:
                count += 1
                break
    return count == len(s)
