Complex list comparisons in python - python

I want to do a complex list comparison with python. I want to see if listB contains all of the items from listA and if they are in the same order. But I do not care if listB has extra items or interleaved items.
Examples:
listA = ['A','B','C','D','E']
listB = [':','A','*','B','C','D','E','`']
A, B, C, D, and E all appear in listB and are presented in the same order even though A and B have an item in between them and items at the start and end of listB.
Extra complicated:
listA = ['A','B','C','D','E']
listB = ['A','*','C','B','C','D','E']
A, B, C, D, and E all appear in listB and are presented in the same order even though A and B have two items in between them and one of those items happens to be something we are searching for. But since we are looking if A -> B is sequential and B -> C is sequential the fact that we also have C -> B -> C shouldn't matter.
So,
listA = ['A','B','C','D','E']
listB = [':','A','*','B','C','D','E','`']
Would be True
listA = ['A','B','C','D','E']
listB = ['A','*','C','B','C','D','E']
Would be True
But something like:
listA = ['A','B','C','D','E']
listB = ['A','B','C','D','F']
or even
listB = ['A','B','C','D']
Would be False
If it get a False answer, I'd ideally like to be able to point to where the break in sequence happened -- i.e. E is missing.

Simple solution using a nested loop. Walk over listA and search the elements in listB in order. Should you fail at any point -> this is not a substring:
def check(listA, listB):
start = 0
for a in listA:
for i in range(start, len(listB)):
if a == listB[i]:
start = i+1
break
else: # triggered only if no break
# print(f'{a} not found after position {start}')
return False
return True
check('ABCDE', 'A*CBCDE')
# True
check('ABCDEF', 'A*CBCDE')
# False
check ('', '')
# True
check('ABA', 'ABBA')
# True
NB. Using strings here for clarity, but this works for any iterable.
To get information on the non-found item, you can uncomment the print.
Example:
check('ABCA', 'ABBA')
C not found after position 2
# False

As always Python comes with batteries included, use SequenceMatcher.get_matching_blocks for a one-line solution:
from difflib import SequenceMatcher
def check(lstA, lstB):
sm = SequenceMatcher(isjunk=lambda x: x not in set(lstA), a=lstA, b=lstB)
return sum(block.size for block in sm.get_matching_blocks()) == len(lstA)
print(check(['A', 'B', 'C', 'D', 'E'], [':', 'A', '*', 'B', 'C', 'D', 'E', '`']))
print(check(['A', 'B', 'C', 'D', 'E'], ['A', '*', 'C', 'B', 'C', 'D', 'E']))
print(check(['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'C', 'D', 'F']))
Output
True
True
False
Notice that this also works for any sequence, like strings:
print(check('ABCDE', 'A*CBCDE'))
print(check('ABCDEF', 'A*CBCDE'))
print(check('', ''))
print(check('ABA', 'ABBA'))
Output (for strings)
True
False
True
True
SequenceMatcher will even give information on how to transform one sequence into the other:
listA = ['A', 'B', 'C', 'D', 'E']
listB = [':', 'A', '*', 'B', 'C', 'D', 'E', '`']
s = SequenceMatcher(None, listA, listB)
for tag, i1, i2, j1, j2 in s.get_opcodes():
print('{:7} a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(tag, i1, i2, j1, j2, listA[i1:i2], listB[j1:j2]))
Output
insert a[0:0] --> b[0:1] [] --> [':']
equal a[0:1] --> b[1:2] ['A'] --> ['A']
insert a[1:1] --> b[2:3] [] --> ['*']
equal a[1:5] --> b[3:7] ['B', 'C', 'D', 'E'] --> ['B', 'C', 'D', 'E']
insert a[5:5] --> b[7:8] [] --> ['`']

Recursive Solution
def is_contained(a, b):
# Base case
if not a or not b:
return True if not a else False
# if first letters match check remaining, else check b remaining
return is_contained(a[1:], b[1:]) if a[0] == b[0] else is_contained(a, b[1:])
Test
print(is_contained(['A','B','C','D','E'], ['A','*','C','B','C','D','E'])) # True
print(is_contained(['A','B','C','D','E'], ['A','B','C','D','F'])) # False

Related

complete list if the first and last element is equal

I have a problem trying to transform a list.
The original list is like this:
[['a','b','c',''],['c','e','f'],['c','g','h']]
now I want to make the output like this:
[['a','b','c','e','f'],['a','b','c','g','h']]
When the blank is found ( '' ) merge the three list into two lists.
I need to write a function to do this for me.
Here is what I tried:
for x in mylist:
if x[len(x) - 1] == '':
m = x[len(x) - 2]
for y in mylist:
if y[0] == m:
combine(x, y)
def combine(x, y):
for m in y:
if not m in x:
x.append(m)
return(x)
but its not working the way I want.
try this :
mylist = [['a','b','c',''],['c','e','f'],['c','g','h']]
def combine(x, y):
for m in y:
if not m in x:
x.append(m)
return(x)
result = []
for x in mylist:
if x[len(x) - 1] == '':
m = x[len(x) - 2]
for y in mylist:
if y[0] == m:
result.append(combine(x[0:len(x)-2], y))
print(result)
your problem was with
combine(x[0:len(x)-2], y)
output :
[['a', 'b', 'c', 'e', 'f'], ['a', 'b', 'c', 'g', 'h']]
So you basically want to merge 2 lists? If so, you can use one of 2 ways :
Either use the + operator, or use the
extend() method.
And then you put it into a function.
I made it with standard library only with comments. Please refer it.
mylist = [['a','b','c',''],['c','e','f'],['c','g','h']]
# I can't make sure whether the xlist's item is just one or not.
# So, I made it to find all
# And, you can see how to get the last value of a list as [-1]
xlist = [x for x in mylist if x[-1] == '']
ylist = [x for x in mylist if x[-1] != '']
result = []
# combine matrix of x x y
for x in xlist:
for y in ylist:
c = x + y # merge
c = [i for i in c if i] # drop ''
c = list(set(c)) # drop duplicates
c.sort() # sort
result.append(c) # add to result
print (result)
The result is
[['a', 'b', 'c', 'e', 'f'], ['a', 'b', 'c', 'g', 'h']]
Your code almost works, except you never do anything with the result of combine (print it, or add it to some result list), and you do not remove the '' element. However, for a longer list, this might be a bit slow, as it has quadratic complexity O(n²).
Instead, you can use a dictionary to map first elements to the remaining elements of the lists. Then you can use a loop or list comprehension to combine the lists with the right suffixes:
lst = [['a','b','c',''],['c','e','f'],['c','g','h']]
import collections
replacements = collections.defaultdict(list)
for first, *rest in lst:
replacements[first].append(rest)
result = [l[:-2] + c for l in lst if l[-1] == "" for c in replacements[l[-2]]]
# [['a', 'b', 'c', 'e', 'f'], ['a', 'b', 'c', 'g', 'h']]
If the list can have more than one placeholder '', and if those can appear in the middle of the list, then things get a bit more complicated. You could make this a recursive function. (This could be made more efficient by using an index instead of repeatedly slicing the list.)
def replace(lst, last=None):
if lst:
first, *rest = lst
if first == "":
for repl in replacements[last]:
yield from replace(repl + rest)
else:
for res in replace(rest, first):
yield [first] + res
else:
yield []
for l in lst:
for x in replace(l):
print(x)
Output for lst = [['a','b','c','','b',''],['c','b','','e','f'],['c','g','b',''],['b','x','y']]:
['a', 'b', 'c', 'b', 'x', 'y', 'e', 'f', 'b', 'x', 'y']
['a', 'b', 'c', 'g', 'b', 'x', 'y', 'b', 'x', 'y']
['c', 'b', 'x', 'y', 'e', 'f']
['c', 'g', 'b', 'x', 'y']
['b', 'x', 'y']
try my solution
although it will change the order of list but it's quite simple code
lst = [['a', 'b', 'c', ''], ['c', 'e', 'f'], ['c', 'g', 'h']]
lst[0].pop(-1)
print([list(set(lst[0]+lst[1])), list(set(lst[0]+lst[2]))])

Subtract 2 lists by duplicate elements in python

Hello I want to know how to subtract 2 lists by duplicate elements, not by value, in python.
ListA = [G, A, H, I, J, B]
ListB = [A, B, C]
ListC = [G, H, I, J]
So we subtract the ListB values, if they are found in ListA as duplicates, and the ListC will give back the non-duplicate values in ListA.
Mathematically written it would be:
ListC = ListA - (ListA ∩ ListB)
(I don't want to remove the duplicates in ListA, only the intersection between ListA and ListB, as described in the above formula, so this question is not a duplicate of questions/48242432
You can do a list comprehension..
[x for x in listA if x not in listB]
Try this
>>> def li(li1,li2):
li3=li1
for i in li2:
if i in li1:
li3.remove(i)
return(li3)
>>> li(["G","A","H","I","J","B"],["A","B","C"])
['G', 'H', 'I', 'J']
Use the sets library in Python.
from sets import Set
setA = Set(['G', 'A', 'H', 'I', 'J', 'B'])
setB = Set(['A', 'B', 'C'])
# get difference between setA and intersection of setA and setB
setC = setA - (setA & setB)
The cool thing about sets is that they tend to operate faster than list comprehensions. For instance, this operation would tend to run at O(len(setA)) + O(min(len(setA), len(setB))) = O(len(setA)) whereas a list comprehension would run at O(len(setA) * len(setB)) to achieve the same result. Of course, these are average cases not worst cases. Worst case, they'd be the same. Either way, you should use the object that best fits your operations, right?
See the Python documentation for more.
This is what you want?
L1 = ['A', 'G', 'H', 'I', 'J', 'B']
L2 = ['A', 'B', 'C']
for i in L1:
if i not in L2:
print(i)
On basis of using mathematical set notations, why not use sets?
ListA = [G,A,H,I,J,B]
ListB = [A,B,C]
SetC = set(ListA) - set(ListB)
But then you get sets out and have to go back to lists... also the order might change and any character that was twice in the list is then only once in it
https://docs.python.org/3/tutorial/datastructures.html#sets
>>> a = set('abracadabra') # sets have only unique elements and are unordered
>>> b = set('alacazam')
>>> a # unique letters in a
{'a', 'r', 'b', 'c', 'd'}
>>> a - b # letters in a but not in b
{'r', 'd', 'b'}
>>> a | b # letters in a or b or both
{'a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'}
>>> a & b # letters in both a and b
{'a', 'c'}
>>> a ^ b # letters in a or b but not both
{'r', 'd', 'b', 'm', 'z', 'l'}
list1 = ['string1','string2','string3']
list2 = ['string1','string2','string3','pussywagon']
newList = list(set(list2)-set(list1))
# output
print(newList)
# type
print(type(newList))
test code

Would like to prevent dupes in a python list of lists

I am creating a list of lists and want to prevent dupes. For example, I have:
mainlist = [[a,b],[c,d],[a,d]]
the next item (list) to be added is [b,a] which is considered a duplicate of [a,b].
UPDATE
mainlist = [[a,b],[c,d],[a,d]]
swap = [b,a]
for item in mainlist:
if set(item) & set(swap):
print "match was found", item
else:
mainlist.append(swap)
Any suggestions as to how I can test whether the next item to be added is already in the list?
Here's an approach using frozensets within a set to check for duplicates. It's a bit ugly since I'm invoking a function that works with global variables.
def add_to_mainlist(new_list):
if frozenset(new_list) not in dups:
mainlist.append(new_list)
mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]
dups = set()
for l in mainlist:
dups.add(frozenset(l))
print("Before:", mainlist)
add_to_mainlist(['a', 'b'])
print("After:", mainlist)
This outputs:
Before: [['a', 'b'], ['c', 'd'], ['a', 'd']]
After: [['a', 'b'], ['c', 'd'], ['a', 'd']]
Showing that the new list was indeed not added to the original.
Here's a cleaner version that calculates the existing set on the fly inside a function that does everything locally:
def add_to_mainlist(mainlist, new_list):
dups = set()
for l in mainlist:
dups.add(frozenset(l))
if frozenset(new_list) not in dups:
mainlist.append(new_list)
return mainlist
mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]
print("Before:", mainlist)
mainlist = add_to_mainlist(mainlist, ['a', 'b']) # the assignment isn't needed, but done anyway :-)
print("After:", mainlist)
Why doesn't your existing code work?
This is what you're doing:
...
for item in mainlist:
if set(item) & set(swap):
print "match was found", item
else:
mainlist.append(swap)
You're intersecting two sets and checking the truthiness of the result. While this might be okay for 0 intersections, in the event that even one of the elements are common (example, ['a', 'b'] and ['b', 'd']), you'd still declare a match which is false.
Ideally you'd want to check the length of the resultant set and make sure its length is equal to than 2:
dups = False
for item in mainlist:
if len(set(item) & set(swap)) == 2:
dups = True
break
if dups == False:
mainlist.append(swap)
You'd also ideally want a flag to ensure that you did not find duplicates. Your previous code would add without checking all items first.
If the order of your inner lists doesn't matter, then this can trivially be accomplished using frozenset()s:
>>> mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]
>>> mainlist = [frozenset(sublist) for sublist in mainlist]
>>>
>>> def add_to_list(lst, sublist):
... if frozenset(sublist) not in lst:
... lst.append(frozenset(sublist))
...
>>> mainlist
[frozenset({'a', 'b'}), frozenset({'d', 'c'}), frozenset({'a', 'd'})]
>>> add_to_list(mainlist, ['b', 'a'])
>>> mainlist
[frozenset({'a', 'b'}), frozenset({'d', 'c'}), frozenset({'a', 'd'})]
>>>
If the order does matter you can either do what #Coldspeed suggested - Construct a set() from your list, construct a frozenset() from the list to be added, and test for membership - or you can use all() and sorted() to test if the list to be added is equivalent to any of the other lists:
>>> def add_to_list(lst, sublist):
... for l in lst:
... if all(a == b for a, b in zip(sorted(sublist), sorted(l))):
... return
... lst.append(sublist)
...
>>> mainlist
[['a', 'b'], ['c', 'd'], ['a', 'd']]
>>> add_to_list(mainlist, ['b', 'a'])
>>> mainlist
[['a', 'b'], ['c', 'd'], ['a', 'd']]
>>>

How can I group a list of objects by continuity?

Given a very large (gigabytes) list of arbitrary objects (I've seen a similar solution to this for ints), can I either group it easily into sublists by equivalence? Either in-place or by generator which consumes the original list.
l0 = [A,B, A,B,B, A,B,B,B,B, A, A, A,B] #spaces for clarity
Desired result:
[['A', 'B'], ['A', 'B', 'B'], ['A', 'B', 'B', 'B', 'B'], ['A'], ['A'], ['A', 'B']]
I wrote a looping version like so:
#find boundaries
b0 = []
prev = A
group = A
for idx, elem in enumerate(l0):
if elem == group:
b0.append(idx)
prev = elem
b0.append(len(l0)-1)
for idx, b in enumerate(b0):
try:
c = b0[idx+1]
except:
break
if c == len(l0)-1:
l1.append(l0[b:])
else:
l1.append(l0[b:c])
Can this be done as a generator gen0(l) that will work like:
for g in gen(l0):
print g
....
['A', 'B']
['A', 'B', 'B']
['A', 'B', 'B', 'B', 'B']
....
etc?
EDIT: using python 2.6 or 2.7
EDIT: preferred solution, mostly based on the accepted answer:
def gen_group(f, items):
out = [items[0]]
while items:
for elem in items[1:]:
if f(elem, out[0]):
break
else:
out.append(elem)
for _i in out:
items.pop(0)
yield out
if items:
out = [items[0]]
g = gen_group(lambda x, y: x == y, l0)
for out in g:
print out
Maybe something like this:
def subListGenerator(f,items):
i = 0
n = len(items)
while i < n:
sublist = [items[i]]
i += 1
while i < n and not f(items[i]):
sublist.append(items[i])
i += 1
yield sublist
Used like:
>>> items = ['A', 'B', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'B']
>>> g = subListGenerator(lambda x: x == 'A',items)
>>> for x in g: print(x)
['A', 'B']
['A', 'B', 'B']
['A', 'B', 'B', 'B', 'B']
['A']
['A']
['A', 'B']
I assume that A is your breakpoint.
>>> A, B = 'A', 'B'
>>> x = [A,B, A,B,B, A,B,B,B,B, A, A, A,B]
>>> map(lambda arr: [i for i in arr[0]], map(lambda e: ['A'+e], ''.join(x).split('A')[1:]))
[['A', 'B'], ['A', 'B', 'B'], ['A', 'B', 'B', 'B', 'B'], ['A'], ['A'], ['A', 'B']]
Here's a simple generator to perform your task:
def gen_group(L):
DELIMETER = "A"
out = [DELIMETER]
while L:
for ind, elem in enumerate(L[1:]):
if elem == DELIMETER :
break
else:
out.append(elem)
for i in range(ind + 1):
L.pop(0)
yield out
out = [DELIMETER ]
The idea is to cut down the list and yield the sublists until there is nothing left. This assumes the list starts with "A" (DELIMETER variable).
Sample output:
for out in gen_group(l0):
print out
Produces
['A', 'B']
['A', 'B', 'B']
['A', 'B', 'B', 'B', 'B']
['A']
['A']
['A', 'B']
['A']
Comparitive Timings:
timeit.timeit(s, number=100000) is used to test each of the current answers, where s is the multiline string of the code (listed below):
Trial 1 Trial 2 Trial 3 Trial 4 | Avg
This answer (s1): 0.08247 0.07968 0.08635 0.07133 0.07995
Dilara Ismailova (s2): 0.77282 0.72337 0.73829 0.70574 0.73506
John Coleman (s3): 0.08119 0.09625 0.08405 0.08419 0.08642
This answer is the fastest, but it is very close. I suspect the difference is the additional argument and anonymous function in John Coleman's answer.
s1="""l0 = ["A","B", "A","B","B", "A","B","B","B","B", "A", "A", "A","B"]
def gen_group(L):
out = ["A"]
while L:
for ind, elem in enumerate(L[1:]):
if elem == "A":
break
else:
out.append(elem)
for i in range(ind + 1):
L.pop(0)
yield out
out = ["A"]
out =gen_group(l0)"""
s2 = """A, B = 'A', 'B'
x = [A,B, A,B,B, A,B,B,B,B, A, A, A,B]
map(lambda arr: [i for i in arr[0]], map(lambda e: ['A'+e], ''.join(x).split('A')[1:]))"""
s3 = """def subListGenerator(f,items):
i = 0
n = len(items)
while i < n:
sublist = [items[i]]
i += 1
while i < n and not f(items[i]):
sublist.append(items[i])
i += 1
yield sublist
items = ['A', 'B', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'B']
g = subListGenerator(lambda x: x == 'A',items)"""
The following works in this case. You could change the l[0] != 'A' condition to be whatever. I would probably pass it as an argument, so that you can reuse it somewhere else.
def gen(l_arg, boundary):
l = l_arg.copy() # Optional if you want to save memory
while l:
sub_list = [l.pop(0)]
while l and l[0] != boundary: # Here boundary = 'A'
sub_list.append(l.pop(0))
yield sub_list
It assumes that there is an 'A' at the beginning of your list. And it copies the list, which isn't great when the list is in the range of Gb. you could remove the copy to save memory if you don't care about keeping the original list.

Python Remove SOME duplicates from a list while maintaining order?

I want to remove certain duplicates in my python list.
I know there are ways to remove all duplicates, but I wanted to remove only consecutive duplicates, while maintaining the list order.
For example, I have a list such as the following:
list1 = [a,a,b,b,c,c,f,f,d,d,e,e,f,f,g,g,c,c]
However, I want to remove the duplicates, and maintain order, but still keep the 2 c's and 2 f's, such as this:
wantedList = [a,b,c,f,d,e,f,g,c]
So far, I have this:
z = 0
j=0
list2=[]
for i in list1:
if i == "c":
z = z+1
if (z==1):
list2.append(i)
if (z==2):
list2.append(i)
else:
pass
elif i == "f":
j = j+1
if (j==1):
list2.append(i)
if (j==2):
list2.append(i)
else:
pass
else:
if i not in list2:
list2.append(i)
However, this method gives me something like:
wantedList = [a,b,c,c,d,e,f,f,g]
Thus, not maintaining the order.
Any ideas would be appreciated! Thanks!
Not completely sure if c and f are special cases, or if you want to compress consecutive duplicates only. If it is the latter, you can use itertools.groupby():
>>> import itertools
>>> list1
['a', 'a', 'b', 'b', 'c', 'c', 'f', 'f', 'd', 'd', 'e', 'e', 'f', 'f', 'g', 'g', 'c', 'c']
>>> [k for k, g in itertools.groupby(list1)]
['a', 'b', 'c', 'f', 'd', 'e', 'f', 'g', 'c']
To remove consecutive duplicates from a list, you can use the following generator function:
def remove_consecutive_duplicates(a):
last = None
for x in a:
if x != last:
yield x
last = x
With your data, this gives:
>>> list1 = ['a','a','b','b','c','c','f','f','d','d','e','e','f','f','g','g','c','c']
>>> list(remove_consecutive_duplicates(list1))
['a', 'b', 'c', 'f', 'd', 'e', 'f', 'g', 'c']
If you want to ignore certain items when removing duplicates...
list2 = []
for item in list1:
if item not in list2 or item in ('c','f'):
list2.append(item)
EDIT: Note that this doesn't remove consecutive items
EDIT
Never mind, I read your question wrong. I thought you were wanting to keep only certain sets of doubles.
I would recommend something like this. It allows a general form to keep certain doubles once.
list1 = ['a','a','b','b','c','c','f','f','d','d','e','e','f','f','g','g','c','c']
doubleslist = ['c', 'f']
def remove_duplicate(firstlist, doubles):
newlist = []
for x in firstlist:
if x not in newlist:
newlist.append(x)
elif x in doubles:
newlist.append(x)
doubles.remove(x)
return newlist
print remove_duplicate(list1, doubleslist)
The simple solution is to compare this element to the next or previous element
a=1
b=2
c=3
d=4
e=5
f=6
g=7
list1 = [a,a,b,b,c,c,f,f,d,d,e,e,f,f,g,g,c,c]
output_list=[list1[0]]
for ctr in range(1, len(list1)):
if list1[ctr] != list1[ctr-1]:
output_list.append(list1[ctr])
print output_list
list1 = ['a', 'a', 'b', 'b', 'c', 'c', 'f', 'f', 'd', 'd', 'e', 'e', 'f', 'f', 'g', 'g', 'c', 'c']
wantedList = []
for item in list1:
if len(wantedList) == 0:
wantedList.append(item)
elif len(wantedList) > 0:
if wantedList[-1] != item:
wantedList.append(item)
print(wantedList)
Fetch each item from the main list(list1).
If the 'temp_list' is empty add that item.
If not , check whether the last item in the temp_list is
not same as the item we fetched from 'list1'.
if items are different append into temp_list.

Categories

Resources