Remove equal characters from two python strings

Remove equal characters from two python strings - python

I am writing a Python code to remove equal same characters from two strings which lies on the same indices. For example remove_same('ABCDE', 'ACBDE') should make both arguments as BC and CB. I know that string is immutable here so I have converted them to list. I am getting an out of index error.
def remove_same(l_string, r_string):
l_list = list(l_string)
r_list = list(r_string)
i = 0
while i != len(l_list):
print(f'in {i} length is {len(l_list)}')
while l_list[i] == r_list[i]:
l_list.pop(i)
r_list.pop(i)
if i == len(l_list) - 1:
break
if i != len(l_list):
i += 1
return l_list[0] == r_list[0]

I would avoid using a while loop in that case, I think this is a better and more clear solution:
def remove_same(s1, s2):
l1 = list(s1)
l2 = list(s2)
out1 = []
out2 = []
for c1, c2 in zip(l1, l2):
if c1 != c2:
out1.append(c1)
out2.append(c2)
s1_out = "".join(out1)
s2_out = "".join(out2)
print(s1_out)
print(s2_out)
It could be shortened using some list comprehensions but I was trying to be as explicit as possible

I feel this could be a problem.
while l_list[i] == r_list[i]:
l_list.pop(i)
r_list.pop(i)
This could reduce size of list and it can go below i.
Do a dry run on this, if l_list = ["a"] and r_list = ["a"].

It is in general not a good idea to modify a list in a loop. Here is a cleaner, more Pythonic solution. The two strings are zipped and processed in parallel. Each pair of equal characters is discarded, and the remaining characters are arranged into new strings.
a = 'ABCDE'
b = 'ACFDE'
def remove_same(s1, s2):
return ["".join(s) for s
in zip(*[(x,y) for x,y in zip(s1,s2) if x!=y])]
remove_same(a, b)
#['BC', 'CF']

Here you go:
def remove_same(l_string, r_string):
# if either string is empty, return False
if not l_string or not r_string:
return False
l_list = list(l_string)
r_list = list(r_string)
limit = min(len(l_list), len(r_list))
i = 0
while i < limit:
if l_list[i] == r_list[i]:
l_list.pop(i)
r_list.pop(i)
limit -= 1
else:
i += 1
return l_list[0] == r_list[0]
print(remove_same('ABCDE', 'ACBDE'))
Output:
False

Related

How to compare a list of string and a string, return the best match string from the list?

My problem is on my second function code.
This is my code so far....
def simi(d1,d2):
dna_1 = d1.lower()
dna_2 = d2.lower()
lst = []
i = 0
while i < len(dna_1):
if dna_1[i] == dna_2[i]:
lst.append(1)
i += 1
return len(lst) / len(d1)
def match(list_1, d , s):
dna = []
for item in list_1:
dna.append(simi(item, d))
if max(dna) < s:
return None
return list_1[max(dna)]

You have two problems, the first is you return in the loop before you have tried all the elements, secondly your function simi(item, d)returns a float if it works correctly so trying to index a list with a float will also fail. There is no way your code could do anything but error or return None.
I imagine you want to keep track of the best each iteration and return the item that is the best based on what it's simi distance calc is and if the simi is > s or else return None:
def match(list_1, d , s):
best = None
mx = float("-inf")
for item in list_1:
f = simi(item, d)
if f > mx:
mx = f
best = item
return best if mx > s else None
You can also use range in simi instead of your while loop with a list comp:
def simi(d1,d2):
dna_1 = d1.lower()
dna_2 = d2.lower()
lst = [1 for i in range(len(dna_1)) if dna_1[i] == dna_2[i] ]
return len(lst) / len(dna_1)
But if you just want to add 1 each time they condition is True you can use sum:
def simi(d1,d2):
dna_1 = d1.lower()
dna_2 = d2.lower()
sm = sum(dna_1[i] == dna_2[i] for i in range(len(dna_1)))
return sm / len(dna_1)

Using some builtins:
from functools import partial
similarity_with_sample = partial(simi, 'TACgtAcGaCGT')
Now similarity_with_sample is a function that takes one argument, and returns its similarity with 'TACgtAcGaCGT'.
Now use that as the key argument of the builtin max function:
best_match = max(list_of_samples, key=similarity_with_sample)
I'm not sure what your s variable is doing.

Is there a way to check if a list is a sublist of another list? [duplicate]

I want to write a function that determines if a sublist exists in a larger list.
list1 = [1,0,1,1,1,0,0]
list2 = [1,0,1,0,1,0,1]
#Should return true
sublistExists(list1, [1,1,1])
#Should return false
sublistExists(list2, [1,1,1])
Is there a Python function that can do this?

Let's get a bit functional, shall we? :)
def contains_sublist(lst, sublst):
n = len(sublst)
return any((sublst == lst[i:i+n]) for i in xrange(len(lst)-n+1))
Note that any() will stop on first match of sublst within lst - or fail if there is no match, after O(m*n) ops

If you are sure that your inputs will only contain the single digits 0 and 1 then you can convert to strings:
def sublistExists(list1, list2):
return ''.join(map(str, list2)) in ''.join(map(str, list1))
This creates two strings so it is not the most efficient solution but since it takes advantage of the optimized string searching algorithm in Python it's probably good enough for most purposes.
If efficiency is very important you can look at the Boyer-Moore string searching algorithm, adapted to work on lists.
A naive search has O(n*m) worst case but can be suitable if you cannot use the converting to string trick and you don't need to worry about performance.

No function that I know of
def sublistExists(list, sublist):
for i in range(len(list)-len(sublist)+1):
if sublist == list[i:i+len(sublist)]:
return True #return position (i) if you wish
return False #or -1
As Mark noted, this is not the most efficient search (it's O(n*m)). This problem can be approached in much the same way as string searching.

My favourite simple solution is following (however, its brutal-force, so i dont recommend it on huge data):
>>> l1 = ['z','a','b','c']
>>> l2 = ['a','b']
>>>any(l1[i:i+len(l2)] == l2 for i in range(len(l1)))
True
This code above actually creates all possible slices of l1 with length of l2, and sequentially compares them with l2.
Detailed explanation
Read this explanation only if you dont understand how it works (and you want to know it), otherwise there is no need to read it
Firstly, this is how you can iterate over indexes of l1 items:
>>> [i for i in range(len(l1))]
[0, 1, 2, 3]
So, because i is representing index of item in l1, you can use it to show that actuall item, instead of index number:
>>> [l1[i] for i in range(len(l1))]
['z', 'a', 'b', 'c']
Then create slices (something like subselection of items from list) from l1 with length of2:
>>> [l1[i:i+len(l2)] for i in range(len(l1))]
[['z', 'a'], ['a', 'b'], ['b', 'c'], ['c']] #last one is shorter, because there is no next item.
Now you can compare each slice with l2 and you see that second one matched:
>>> [l1[i:i+len(l2)] == l2 for i in range(len(l1))]
[False, True, False, False] #notice that the second one is that matching one
Finally, with function named any, you can check if at least one of booleans is True:
>>> any(l1[i:i+len(l2)] == l2 for i in range(len(l1)))
True

The efficient way to do this is to use the Boyer-Moore algorithm, as Mark Byers suggests. I have done it already here: Boyer-Moore search of a list for a sub-list in Python, but will paste the code here. It's based on the Wikipedia article.
The search() function returns the index of the sub-list being searched for, or -1 on failure.
def search(haystack, needle):
"""
Search list `haystack` for sublist `needle`.
"""
if len(needle) == 0:
return 0
char_table = make_char_table(needle)
offset_table = make_offset_table(needle)
i = len(needle) - 1
while i < len(haystack):
j = len(needle) - 1
while needle[j] == haystack[i]:
if j == 0:
return i
i -= 1
j -= 1
i += max(offset_table[len(needle) - 1 - j], char_table.get(haystack[i]));
return -1
def make_char_table(needle):
"""
Makes the jump table based on the mismatched character information.
"""
table = {}
for i in range(len(needle) - 1):
table[needle[i]] = len(needle) - 1 - i
return table
def make_offset_table(needle):
"""
Makes the jump table based on the scan offset in which mismatch occurs.
"""
table = []
last_prefix_position = len(needle)
for i in reversed(range(len(needle))):
if is_prefix(needle, i + 1):
last_prefix_position = i + 1
table.append(last_prefix_position - i + len(needle) - 1)
for i in range(len(needle) - 1):
slen = suffix_length(needle, i)
table[slen] = len(needle) - 1 - i + slen
return table
def is_prefix(needle, p):
"""
Is needle[p:end] a prefix of needle?
"""
j = 0
for i in range(p, len(needle)):
if needle[i] != needle[j]:
return 0
j += 1
return 1
def suffix_length(needle, p):
"""
Returns the maximum length of the substring ending at p that is a suffix.
"""
length = 0;
j = len(needle) - 1
for i in reversed(range(p + 1)):
if needle[i] == needle[j]:
length += 1
else:
break
j -= 1
return length
Here is the example from the question:
def main():
list1 = [1,0,1,1,1,0,0]
list2 = [1,0,1,0,1,0,1]
index = search(list1, [1, 1, 1])
print(index)
index = search(list2, [1, 1, 1])
print(index)
if __name__ == '__main__':
main()
Output:
2
-1

Here is a way that will work for simple lists that is slightly less fragile than Mark's
def sublistExists(haystack, needle):
def munge(s):
return ", "+format(str(s)[1:-1])+","
return munge(needle) in munge(haystack)

def sublistExists(x, y):
occ = [i for i, a in enumerate(x) if a == y[0]]
for b in occ:
if x[b:b+len(y)] == y:
print 'YES-- SUBLIST at : ', b
return True
if len(occ)-1 == occ.index(b):
print 'NO SUBLIST'
return False
list1 = [1,0,1,1,1,0,0]
list2 = [1,0,1,0,1,0,1]
#should return True
sublistExists(list1, [1,1,1])
#Should return False
sublistExists(list2, [1,1,1])

Might as well throw in a recursive version of #NasBanov's solution
def foo(sub, lst):
'''Checks if sub is in lst.
Expects both arguments to be lists
'''
if len(lst) < len(sub):
return False
return sub == lst[:len(sub)] or foo(sub, lst[1:])

def sublist(l1,l2):
if len(l1) < len(l2):
for i in range(0, len(l1)):
for j in range(0, len(l2)):
if l1[i]==l2[j] and j==i+1:
pass
return True
else:
return False

I know this might not be quite relevant to the original question but it might be very elegant 1 line solution to someone else if the sequence of items in both lists doesn't matter. The result below will show True if List1 elements are in List2 (regardless of order). If the order matters then don't use this solution.
List1 = [10, 20, 30]
List2 = [10, 20, 30, 40]
result = set(List1).intersection(set(List2)) == set(List1)
print(result)
Output
True

if iam understanding this correctly, you have a larger list, like :
list_A= ['john', 'jeff', 'dave', 'shane', 'tim']
then there are other lists
list_B= ['sean', 'bill', 'james']
list_C= ['cole', 'wayne', 'jake', 'moose']
and then i append the lists B and C to list A
list_A.append(list_B)
list_A.append(list_C)
so when i print list_A
print (list_A)
i get the following output
['john', 'jeff', 'dave', 'shane', 'tim', ['sean', 'bill', 'james'], ['cole', 'wayne', 'jake', 'moose']]
now that i want to check if the sublist exists:
for value in list_A:
value= type(value)
value= str(value).strip('<>').split()[1]
if (value == "'list'"):
print "True"
else:
print "False"
this will give you 'True' if you have any sublist inside the larger list.

Is this list comparison in Python unnecessary?

The Count and Compare Anagram solution provided at interactivepython.org checks iterates through the lists for a final time checking if the count for each ASCII value is the same.
j = 0
stillOK = True
while j<26 and stillOK:
if c1[j]==c2[j]:
j = j + 1
else:
stillOK = False
return stillOK
Why not use the comparison operator?
return (c1 == c2)
Full code:
def anagramSolution4(s1,s2):
c1 = [0]*26
c2 = [0]*26
for i in range(len(s1)):
pos = ord(s1[i])-ord('a')
c1[pos] = c1[pos] + 1
for i in range(len(s2)):
pos = ord(s2[i])-ord('a')
c2[pos] = c2[pos] + 1
j = 0
stillOK = True
while j<26 and stillOK:
if c1[j]==c2[j]:
j = j + 1
else:
stillOK = False
return stillOK
print(anagramSolution4('apple','pleap'))
Edited to add:
I've tested with:
anagramSolution4('abc','cba') #returns True
anagramSolution4('abc','cbd') #returns False
anagramSolution4('abc','cbah') #returns False
..they all pass. What would be an appropriate test to show c1==c2 fails?

Using == on the two lists would produce the same result, but would also hide some implementation details. Given that the script comes from a website used for learning, I guess this is for learning purposes.
Also, I see that in the webpage you are asked some questions about complexities. Well, using c1 == c2 instead of the loop would probably mislead some people and make them think that the operation is O(1) instead of O(min(len(c1), len(c2)))[1].
Finally, note that there are many languages which have no notion of lists.
[1] This isn't necessarily true either, as the two lists may contain items with custom and complex __eq__() methods.

How to merge two sorted lists?

I'm supposed to write a code to merge sorted lists, I got an error in the second function which says:
TypeError: object of type 'NoneType' has no len()
my code is :
def merge(lst1, lst2):
""" merging two ordered lists using
the three pointer algorithm """
n1 = len(lst1)
n2 = len(lst2)
lst3 = [0 for i in range(n1 + n2)] # alocates a new list
i = j = k = 0 # simultaneous assignment
while (i < n1 and j < n2):
if (lst1[i] <= lst2[j]):
lst3[k] = lst1[i]
i = i +1
else:
lst3[k] = lst2[j]
j = j + 1
k = k + 1 # incremented at each iteration
lst3[k:] = lst1[i:] + lst2[j:] # append remaining elements
def multi_merge_v3(lst_of_lsts):
m = len(lst_of_lsts)
merged = []
for i in range(m):
merged= merge((merged),(lst_of_lsts)[i])
return(merged)
what does this error mean?
what should I fix in my code?

You're not returning anything from function merge(), so by default it returns None that you assigned to merged. So, during the second call to merge() you'll be doing len(None).
for i in range(m):
#Assigning None here
merged = merge(merged, lst_of_lsts[i])
return(merged)
So, at the end of that function:
return lst3

Here's a simple take on the sorted-list merging problem.
l1 = [1,12,15,26,38]
l2 = [2,13,17,30,45,50]
# merged list
ml= [0]*(len(l1)+len(l2))
i1 = i2 = 0
for i in range(len(l1)+len(l2)):
if i1 == len(l1):
ml = ml[:i] + l2[i2:]
break
if i2 == len(l2):
ml = ml[:i] + l1[i1:]
break
if l1[i1] < l2[i2]:
ml[i] = l1[i1]
i1 = i1 + 1
else:
ml[i] = l2[i2]
i2 = i2 + 1
print ml

Here's an option: rely on Python's built-in sorted function:
merged = sorted(lst1 + lst2)

Here's another answer: use the "merge" function of heapq:
import heapq
merged = heapq.merge(lst1, lst2)
This is very efficient as merge expects lst1 and lst2 to already be sorted, so it only looks at the first element as it goes.
Note that merge also allow multiple (already sorted) arguments:
merged = heapq.merge(lst1, lst2, lst3, lst4)
At this point, merged is actually itself an iterator. To get an actual list, instead use:
merged = list(heapq.merge(lst1, lst2))
Remember: with Python, the "batteries are included".

Python set intersection question

I have three sets:
s0 = [set([16,9,2,10]), set([16,14,22,15]), set([14,7])] # true, 16 and 14
s1 = [set([16,9,2,10]), set([16,14,22,15]), set([7,8])] # false
I want a function that will return True if every set in the list intersects with at least one other set in the list. Is there a built-in for this or a simple list comprehension?

all(any(a & b for a in s if a is not b) for b in s)

Here's a very simple solution that's very efficient for large inputs:
def g(s):
import collections
count = collections.defaultdict(int)
for a in s:
for x in a:
count[x] += 1
return all(any(count[x] > 1 for x in a) for a in s)

It's a little verbose but I think it's a pretty efficient solution. It takes advantage of the fact that when two sets intersect, we can mark them both as connected. It does this by keeping a list of flags as long as the list of sets. when set i and set j intersect, it sets the flag for both of them. It then loops over the list of sets and only tries to find a intersection for sets that haven't already been intersected. After reading the comments, I think this is what #Victor was talking about.
s0 = [set([16,9,2,10]), set([16,14,22,15]), set([14,7])] # true, 16 and 14
s1 = [set([16,9,2,10]), set([16,14,22,15]), set([7,8])] # false
def connected(sets):
L = len(sets)
if not L: return True
if L == 1: return False
passed = [False] * L
i = 0
while True:
while passed[i]:
i += 1
if i == L:
return True
for j, s in enumerate(sets):
if j == i: continue
if sets[i] & s:
passed[i] = passed[j] = True
break
else:
return False
print connected(s0)
print connected(s1)
I decided that an empty list of sets is connected (If you produce an element of the list, I can produce an element that it intersects ;). A list with only one element is dis-connected trivially. It's one line to change in either case if you disagree.

Here's a more efficient (if much more complicated) solution, that performs a linear number of intersections and a number of unions of order O( n*log(n) ), where n is the length of s:
def f(s):
import math
j = int(math.log(len(s) - 1, 2)) + 1
unions = [set()] * (j + 1)
for i, a in enumerate(s):
unions[:j] = [set.union(set(), *s[i+2**k:i+2**(k+1)]) for k in range(j)]
if not (a & set.union(*unions)):
return False
j = int(math.log(i ^ (i + 1), 2))
unions[j] = set.union(a, *unions[:j])
return True
Note that this solution only works on Python >= 2.6.

As usual I'd like to give the inevitable itertools solution ;-)
from itertools import combinations, groupby
from operator import itemgetter
def any_intersects( sets ):
# we are doing stuff with combinations of sets
combined = combinations(sets,2)
# group these combinations by their first set
grouped = (g for k,g in groupby( combined, key=itemgetter(0)))
# are any intersections in each group
intersected = (any((a&b) for a,b in group) for group in grouped)
return all( intersected )
s0 = [set([16,9,2,10]), set([16,14,22,15]), set([14,7])]
s1 = [set([16,9,2,10]), set([16,14,22,15]), set([7,8])]
print any_intersects( s0 ) # True
print any_intersects( s1 ) # False
This is really lazy and will only do the intersections that are required. It can also be a very confusing and unreadable oneliner ;-)

To answer your question, no, there isn't a built-in or simple list comprehension that does what you want. Here's another itertools based solution that is very efficient -- surprisingly about twice as fast as #THC4k's itertools answer using groupby() in timing tests using your sample input. It could probably be optimized a bit further, but is very readable as presented. Like #AaronMcSmooth, I arbitrarily decided what to return when there are no or is only one set in the input list.
from itertools import combinations
def all_intersect(sets):
N = len(sets)
if not N: return True
if N == 1: return False
intersected = [False] * N
for i,j in combinations(xrange(N), 2):
if not intersected[i] or not intersected[j]:
if sets[i] & sets[j]:
intersected[i] = intersected[j] = True
return all(intersected)

This strategy isn't likely to be as efficient as #Victor's suggestion, but might be more efficient than jchl's answer due to increased use of set arithmetic (union).
s0 = [set([16,9,2,10]), set([16,14,22,15]), set([14,7])]
s1 = [set([16,9,2,10]), set([16,14,22,15]), set([7,8])]
def freeze(list_of_sets):
"""Transform a list of sets into a frozenset of frozensets."""
return frozenset(frozenset(set_) for set_ in list_of_sets)
def all_sets_have_relatives(set_of_sets):
"""Check if all sets have another set that they intersect with.
>>> all_sets_have_relatives(s0) # true, 16 and 14
True
>>> all_sets_have_relatives(s1) # false
False
"""
set_of_sets = freeze(set_of_sets)
def has_relative(set_):
return set_ & frozenset.union(*(set_of_sets - set((set_,))))
return all(has_relative(set) for set in set_of_sets)

This may give better performance depending on the distribution of the sets.
def all_intersect(s):
count = 0
for x, a in enumerate(s):
for y, b in enumerate(s):
if a & b and x!=y:
count += 1
break
return count == len(s)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Remove equal characters from two python strings - python

I feel this could be a problem. while l_list[i] == r_list[i]: l_list.pop(i) r_list.pop(i) This could reduce size of list and it can go below i. Do a dry run on this, if l_list = ["a"] and r_list = ["a"].

Related

How to compare a list of string and a string, return the best match string from the list?

Is there a way to check if a list is a sublist of another list? [duplicate]

Is this list comparison in Python unnecessary?

How to merge two sorted lists?

Python set intersection question

Categories

Resources