Subtract 2 lists by duplicate elements in python

Hello, I want to know how to subtract two lists by duplicate elements, not by value, in Python.
ListA = ['G', 'A', 'H', 'I', 'J', 'B']
ListB = ['A', 'B', 'C']
ListC = ['G', 'H', 'I', 'J']
So we remove the ListB values that are also found in ListA (the duplicates), and ListC holds the remaining, non-duplicate values of ListA.
Written mathematically, it would be:
ListC = ListA - (ListA ∩ ListB)
(I don't want to remove the duplicates within ListA, only the intersection between ListA and ListB, as described in the formula above, so this question is not a duplicate of questions/48242432.)

You can use a list comprehension:
[x for x in listA if x not in listB]
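For longer lists, the membership test against listB can be sped up by converting it to a set first. The following is a minimal sketch, assuming the items are hashable; the function name subtract_lists is made up for illustration. It keeps the order of the first list and any repeated items in it.

```python
def subtract_lists(list_a, list_b):
    """Return the items of list_a that do not occur in list_b, in order."""
    exclude = set(list_b)  # assumption: items are hashable
    return [x for x in list_a if x not in exclude]


list_c = subtract_lists(['G', 'A', 'H', 'I', 'J', 'B'], ['A', 'B', 'C'])
print(list_c)  # ['G', 'H', 'I', 'J']
```

Unlike a pure set difference, this returns a list directly and does not collapse items that appear more than once in the first list.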

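Try this:
>>> def li(li1, li2):
...     li3 = list(li1)  # copy, so the caller's list is not modified
...     for i in li2:
...         if i in li3:
...             li3.remove(i)  # removes only the first occurrence
...     return li3
...
>>> li(["G","A","H","I","J","B"],["A","B","C"])
['G', 'H', 'I', 'J']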

Use Python's built-in set type. (The old sets module and its Set class were deprecated in Python 2.6 and removed in Python 3.)
setA = set(['G', 'A', 'H', 'I', 'J', 'B'])
setB = set(['A', 'B', 'C'])
# get difference between setA and intersection of setA and setB
setC = setA - (setA & setB)
The nice thing about sets is that they tend to be faster than a list comprehension that tests membership against a list. On average the intersection costs O(min(len(setA), len(setB))) and the difference costs O(len(setA)), so the whole operation is O(len(setA)), whereas the list comprehension needs O(len(ListA) * len(ListB)) to achieve the same result. These are average cases; in the worst case (many hash collisions) set lookups degrade and the two approaches cost about the same. Note that setA - setB gives the same result directly, and that sets discard duplicates and ordering. Either way, use the object that best fits your operations.
See the Python documentation for more.

Is this what you want?
L1 = ['A', 'G', 'H', 'I', 'J', 'B']
L2 = ['A', 'B', 'C']
for i in L1:
    if i not in L2:
        print(i)

Since you are using mathematical set notation, why not use sets?
ListA = ['G', 'A', 'H', 'I', 'J', 'B']
ListB = ['A', 'B', 'C']
SetC = set(ListA) - set(ListB)
But then you get a set back and have to convert it to a list again. Also, the order might change, and any element that appeared twice in the list appears only once in the result.
https://docs.python.org/3/tutorial/datastructures.html#sets
>>> a = set('abracadabra') # sets have only unique elements and are unordered
>>> b = set('alacazam')
>>> a # unique letters in a
{'a', 'r', 'b', 'c', 'd'}
>>> a - b # letters in a but not in b
{'r', 'd', 'b'}
>>> a | b # letters in a or b or both
{'a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'}
>>> a & b # letters in both a and b
{'a', 'c'}
>>> a ^ b # letters in a or b but not both
{'r', 'd', 'b', 'm', 'z', 'l'}
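To make the caveats about order and duplicates concrete, here is a small sketch; the variable names and the sample data with a repeated 'G' are made up for illustration. It takes the set difference and then restores the original order by sorting on each item's first position in the source list.

```python
list_a = ['G', 'A', 'H', 'G', 'J', 'B']   # 'G' appears twice
list_b = ['A', 'B', 'C']

diff = set(list_a) - set(list_b)          # unordered, duplicates collapsed
# Restore the original order by sorting on each item's first position in list_a.
ordered = sorted(diff, key=list_a.index)
print(ordered)  # ['G', 'H', 'J']
```

The second 'G' is gone, which is exactly the loss described above. list.index is a linear search, so this ordering step is only reasonable for short lists.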

list1 = ['string1','string2','string3']
list2 = ['string1','string2','string3','pussywagon']
newList = list(set(list2)-set(list1))
# output
print(newList)
# type
print(type(newList))

Related

Complex list comparisons in python

I want to do a complex list comparison with python. I want to see if listB contains all of the items from listA and if they are in the same order. But I do not care if listB has extra items or interleaved items.
Examples:
listA = ['A','B','C','D','E']
listB = [':','A','*','B','C','D','E','`']
A, B, C, D, and E all appear in listB and are presented in the same order even though A and B have an item in between them and items at the start and end of listB.
Extra complicated:
listA = ['A','B','C','D','E']
listB = ['A','*','C','B','C','D','E']
A, B, C, D, and E all appear in listB and are presented in the same order even though A and B have two items in between them and one of those items happens to be something we are searching for. But since we are looking if A -> B is sequential and B -> C is sequential the fact that we also have C -> B -> C shouldn't matter.
So,
listA = ['A','B','C','D','E']
listB = [':','A','*','B','C','D','E','`']
Would be True
listA = ['A','B','C','D','E']
listB = ['A','*','C','B','C','D','E']
Would be True
But something like:
listA = ['A','B','C','D','E']
listB = ['A','B','C','D','F']
or even
listB = ['A','B','C','D']
Would be False
If I get a False answer, I'd ideally like to be able to point to where the break in sequence happened, i.e. E is missing.

Simple solution using a nested loop. Walk over listA and search for its elements in listB in order. Should you fail at any point, listA is not a subsequence of listB:
def check(listA, listB):
    start = 0
    for a in listA:
        for i in range(start, len(listB)):
            if a == listB[i]:
                start = i+1
                break
        else: # triggered only if no break
            # print(f'{a} not found after position {start}')
            return False
    return True
check('ABCDE', 'A*CBCDE')
# True
check('ABCDEF', 'A*CBCDE')
# False
check('', '')
# True
check('ABA', 'ABBA')
# True
NB. Using strings here for clarity, but this works whenever listA is iterable and listB is a sequence (supports len() and indexing).
To get information on the item that was not found, you can uncomment the print.
Example:
check('ABCA', 'ABBA')
C not found after position 2
# False
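Since the question also asks where the sequence breaks, the same walk can return the offending item instead of printing it. This is a sketch only, with a made-up function name first_missing; it assumes the second argument is a list, tuple, or string, since it relies on their index(value, start) method.

```python
def first_missing(needles, haystack):
    """Return (item, position) for the first item of `needles` that cannot be
    found in `haystack` in order, or None if every item is found."""
    start = 0
    for a in needles:
        try:
            # index(value, start) searches from `start` onwards
            start = haystack.index(a, start) + 1
        except ValueError:
            return a, start
    return None


print(first_missing(['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'C', 'D', 'F']))  # ('E', 4)
print(first_missing(['A', 'B', 'C', 'D', 'E'], ['A', '*', 'C', 'B', 'C', 'D', 'E']))  # None
```

A return value of None means the first sequence is contained in order; otherwise the tuple names the missing item and the position from which the search for it started.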

As always, Python comes with batteries included; use SequenceMatcher.get_matching_blocks for a compact solution:
from difflib import SequenceMatcher
def check(lstA, lstB):
    wanted = set(lstA)  # build the lookup set once, not on every isjunk call
    sm = SequenceMatcher(isjunk=lambda x: x not in wanted, a=lstA, b=lstB)
    return sum(block.size for block in sm.get_matching_blocks()) == len(lstA)
print(check(['A', 'B', 'C', 'D', 'E'], [':', 'A', '*', 'B', 'C', 'D', 'E', '`']))
print(check(['A', 'B', 'C', 'D', 'E'], ['A', '*', 'C', 'B', 'C', 'D', 'E']))
print(check(['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'C', 'D', 'F']))
Output
True
True
False
Notice that this also works for any sequence, like strings:
print(check('ABCDE', 'A*CBCDE'))
print(check('ABCDEF', 'A*CBCDE'))
print(check('', ''))
print(check('ABA', 'ABBA'))
Output (for strings)
True
False
True
True
SequenceMatcher will even give information on how to transform one sequence into the other:
listA = ['A', 'B', 'C', 'D', 'E']
listB = [':', 'A', '*', 'B', 'C', 'D', 'E', '`']
s = SequenceMatcher(None, listA, listB)
for tag, i1, i2, j1, j2 in s.get_opcodes():
    print('{:7} a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(tag, i1, i2, j1, j2, listA[i1:i2], listB[j1:j2]))
Output
insert  a[0:0] --> b[0:1]       [] --> [':']
equal   a[0:1] --> b[1:2]    ['A'] --> ['A']
insert  a[1:1] --> b[2:3]       [] --> ['*']
equal   a[1:5] --> b[3:7] ['B', 'C', 'D', 'E'] --> ['B', 'C', 'D', 'E']
insert  a[5:5] --> b[7:8]       [] --> ['`']

Recursive Solution
def is_contained(a, b):
    # Base case: a is exhausted (contained) or b ran out first (not contained)
    if not a or not b:
        return not a
    # If the first items match, check the remainder of both; otherwise skip ahead in b
    return is_contained(a[1:], b[1:]) if a[0] == b[0] else is_contained(a, b[1:])
Test
print(is_contained(['A','B','C','D','E'], ['A','*','C','B','C','D','E'])) # True
print(is_contained(['A','B','C','D','E'], ['A','B','C','D','F'])) # False
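For long lists, the recursive version copies a slice on every call and can hit Python's recursion limit. A non-recursive alternative is sketched below using a single iterator over the second list; the name is_subsequence is made up for illustration.

```python
def is_subsequence(a, b):
    """True if every item of a appears in b, in the same order."""
    it = iter(b)
    # `x in it` advances the iterator until it finds x (or runs out),
    # so each later search resumes where the previous one stopped.
    return all(x in it for x in a)


print(is_subsequence(['A', 'B', 'C', 'D', 'E'], ['A', '*', 'C', 'B', 'C', 'D', 'E']))  # True
print(is_subsequence(['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'C', 'D', 'F']))  # False
```

Because the iterator is consumed as it goes, the second list is traversed at most once.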

How would you re-order the reordered list back to its previous form?

element = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
index_list = [3,5,6,1,2,4,0]
result = [element[i] for i in index_list]
print(result)
This gives me a list reordered according to the index list:
['d', 'f', 'g', 'b', 'c', 'e', 'a']
How would you re-order this already re-ordered list back to its previous form, ['a', 'b', 'c', 'd', 'e', 'f', 'g']? I tried applying the given index list again, but that did not restore the original; it simply gave me yet another ordering. Is there any way I could still use the given index list to reorder the list back?

You can do the opposite:
reordered = [None] * len(result)
for index, e in zip(index_list, result):
    reordered[index] = e

You can process index_list to do the reverse permutation:
index_list = [3,5,6,1,2,4,0]
reverse = [i for i, n in sorted(enumerate(index_list), key=lambda x: x[1])]
original = [result[i] for i in reverse]

something like this
print([a for a, b in sorted(zip(result, index_list), key=lambda x: x[1])])
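Applying index_list a second time does not undo the reordering because a permutation is generally not its own inverse. One way to see this, sketched below with the question's data, is to build the inverse permutation explicitly and index with that instead; the names inverse and restored are made up for illustration.

```python
element = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
index_list = [3, 5, 6, 1, 2, 4, 0]
result = [element[i] for i in index_list]

# inverse[j] is the position in `result` that holds element[j]
inverse = [0] * len(index_list)
for pos, idx in enumerate(index_list):
    inverse[idx] = pos

restored = [result[p] for p in inverse]
print(restored)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```

Once computed, the inverse list can be reused to restore any other list that was reordered with the same index_list.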

complete list if the first and last element is equal

I have a problem trying to transform a list.
The original list is like this:
[['a','b','c',''],['c','e','f'],['c','g','h']]
now I want to make the output like this:
[['a','b','c','e','f'],['a','b','c','g','h']]
When the blank is found ( '' ) merge the three list into two lists.
I need to write a function to do this for me.
Here is what I tried:
for x in mylist:
    if x[len(x) - 1] == '':
        m = x[len(x) - 2]
        for y in mylist:
            if y[0] == m:
                combine(x, y)
def combine(x, y):
    for m in y:
        if not m in x:
            x.append(m)
    return(x)
but it's not working the way I want.

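Try this:
mylist = [['a','b','c',''],['c','e','f'],['c','g','h']]
def combine(x, y):
    for m in y:
        if not m in x:
            x.append(m)
    return(x)
result = []
for x in mylist:
    if x[len(x) - 1] == '':
        m = x[len(x) - 2]
        for y in mylist:
            if y[0] == m:
                result.append(combine(x[0:len(x)-2], y))
print(result)
Your problem was in the call to combine: you passed x itself and discarded the return value. Passing a slice gives combine a fresh copy each time (without the trailing 'c' and ''), and the returned list is collected in result:
combine(x[0:len(x)-2], y)
Output:
[['a', 'b', 'c', 'e', 'f'], ['a', 'b', 'c', 'g', 'h']]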

So you basically want to merge 2 lists? If so, you can use one of two ways: either the + operator or the extend() method. Then put it into a function.

I did it with the standard library only and added comments. Please refer to it.
mylist = [['a','b','c',''],['c','e','f'],['c','g','h']]
# I can't be sure whether there is exactly one list ending in '' or several,
# so I made it find all of them.
# Also note how to get the last value of a list with [-1].
xlist = [x for x in mylist if x[-1] == '']
ylist = [x for x in mylist if x[-1] != '']
result = []
# combine every x with every y
for x in xlist:
    for y in ylist:
        c = x + y                  # merge
        c = [i for i in c if i]    # drop ''
        c = list(set(c))           # drop duplicates
        c.sort()                   # sort
        result.append(c)           # add to result
print(result)
The result is
[['a', 'b', 'c', 'e', 'f'], ['a', 'b', 'c', 'g', 'h']]

Your code almost works, except you never do anything with the result of combine (print it, or add it to some result list), and you do not remove the '' element. However, for a longer list, this might be a bit slow, as it has quadratic complexity O(n²).
Instead, you can use a dictionary to map first elements to the remaining elements of the lists. Then you can use a loop or list comprehension to combine the lists with the right suffixes:
lst = [['a','b','c',''],['c','e','f'],['c','g','h']]
import collections
replacements = collections.defaultdict(list)
for first, *rest in lst:
    replacements[first].append(rest)
# l[:-1] drops only the '' placeholder; l[-2] is the element to continue from
result = [l[:-1] + c for l in lst if l[-1] == "" for c in replacements[l[-2]]]
# [['a', 'b', 'c', 'e', 'f'], ['a', 'b', 'c', 'g', 'h']]
If the list can have more than one placeholder '', and if those can appear in the middle of the list, then things get a bit more complicated. You could make this a recursive function. (This could be made more efficient by using an index instead of repeatedly slicing the list.)
def replace(lst, last=None):
    if lst:
        first, *rest = lst
        if first == "":
            for repl in replacements[last]:
                yield from replace(repl + rest)
        else:
            for res in replace(rest, first):
                yield [first] + res
    else:
        yield []
for l in lst:
    for x in replace(l):
        print(x)
Output for lst = [['a','b','c','','b',''],['c','b','','e','f'],['c','g','b',''],['b','x','y']]:
['a', 'b', 'c', 'b', 'x', 'y', 'e', 'f', 'b', 'x', 'y']
['a', 'b', 'c', 'g', 'b', 'x', 'y', 'b', 'x', 'y']
['c', 'b', 'x', 'y', 'e', 'f']
['c', 'g', 'b', 'x', 'y']
['b', 'x', 'y']

Try my solution. It may change the order of the elements, but the code is quite simple:
lst = [['a', 'b', 'c', ''], ['c', 'e', 'f'], ['c', 'g', 'h']]
lst[0].pop(-1)
print([list(set(lst[0]+lst[1])), list(set(lst[0]+lst[2]))])

Would like to prevent dupes in a python list of lists

I am creating a list of lists and want to prevent dupes. For example, I have:
mainlist = [['a', 'b'], ['c', 'd'], ['a', 'd']]
The next item (list) to be added is ['b', 'a'], which is considered a duplicate of ['a', 'b'].
UPDATE
mainlist = [['a', 'b'], ['c', 'd'], ['a', 'd']]
swap = ['b', 'a']
for item in mainlist:
    if set(item) & set(swap):
        print("match was found", item)
    else:
        mainlist.append(swap)
Any suggestions as to how I can test whether the next item to be added is already in the list?

Here's an approach using frozensets within a set to check for duplicates. It's a bit ugly since I'm invoking a function that works with global variables.
def add_to_mainlist(new_list):
    if frozenset(new_list) not in dups:
        mainlist.append(new_list)
        dups.add(frozenset(new_list))  # keep the lookup set in sync
mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]
dups = set()
for l in mainlist:
    dups.add(frozenset(l))
print("Before:", mainlist)
add_to_mainlist(['a', 'b'])
print("After:", mainlist)
This outputs:
Before: [['a', 'b'], ['c', 'd'], ['a', 'd']]
After: [['a', 'b'], ['c', 'd'], ['a', 'd']]
Showing that the new list was indeed not added to the original.
Here's a cleaner version that calculates the existing set on the fly inside a function that does everything locally:
def add_to_mainlist(mainlist, new_list):
    dups = set()
    for l in mainlist:
        dups.add(frozenset(l))
    if frozenset(new_list) not in dups:
        mainlist.append(new_list)
    return mainlist
mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]
print("Before:", mainlist)
mainlist = add_to_mainlist(mainlist, ['a', 'b']) # the assignment isn't needed, but done anyway :-)
print("After:", mainlist)
Why doesn't your existing code work?
This is what you're doing:
...
for item in mainlist:
    if set(item) & set(swap):
        print("match was found", item)
    else:
        mainlist.append(swap)
You're intersecting two sets and checking the truthiness of the result. That is fine when the lists share no elements, but if even one element is common (for example, ['a', 'b'] and ['b', 'd']), you still report a match, which is wrong.
Instead, check the size of the resulting set and make sure it equals 2:
dups = False
for item in mainlist:
    if len(set(item) & set(swap)) == 2:
        dups = True
        break
if not dups:
    mainlist.append(swap)
You also need a flag like this to record whether a duplicate was found. Your previous code would append without checking all items first.
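If many items are added over time, rebuilding the set or rescanning the list on every insert gets expensive. One possible arrangement, sketched here with made-up names (make_adder, add), keeps a set of frozensets alongside the list and updates both together. It assumes the inner items are hashable and that repeated items within one inner list do not matter, since a frozenset collapses them.

```python
def make_adder(mainlist):
    """Return an add(new_list) function that skips order-insensitive duplicates."""
    seen = {frozenset(item) for item in mainlist}

    def add(new_list):
        key = frozenset(new_list)
        if key in seen:
            return False          # duplicate, nothing appended
        seen.add(key)             # keep the lookup set in sync with the list
        mainlist.append(new_list)
        return True

    return add


mainlist = [['a', 'b'], ['c', 'd'], ['a', 'd']]
add = make_adder(mainlist)
print(add(['b', 'a']))  # False
print(add(['b', 'c']))  # True
print(add(['c', 'b']))  # False
print(mainlist)         # [['a', 'b'], ['c', 'd'], ['a', 'd'], ['b', 'c']]
```

The inner lists stay as ordinary lists in their original order; only the lookup keys are frozensets.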

If the order of your inner lists doesn't matter, then this can trivially be accomplished using frozenset()s:
>>> mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]
>>> mainlist = [frozenset(sublist) for sublist in mainlist]
>>>
>>> def add_to_list(lst, sublist):
...     if frozenset(sublist) not in lst:
...         lst.append(frozenset(sublist))
...
>>> mainlist
[frozenset({'a', 'b'}), frozenset({'d', 'c'}), frozenset({'a', 'd'})]
>>> add_to_list(mainlist, ['b', 'a'])
>>> mainlist
[frozenset({'a', 'b'}), frozenset({'d', 'c'}), frozenset({'a', 'd'})]
>>>
If the order does matter (that is, you need to keep the inner lists as ordered lists), you can either do what @Coldspeed suggested (construct a set() from your list, construct a frozenset() from the list to be added, and test for membership) or you can use all() and sorted() to test whether the list to be added is equivalent to any of the other lists:
>>> def add_to_list(lst, sublist):
...     for l in lst:
...         # compare lengths too, because zip() stops at the shorter list
...         if len(sublist) == len(l) and all(a == b for a, b in zip(sorted(sublist), sorted(l))):
...             return
...     lst.append(sublist)
...
>>> mainlist
[['a', 'b'], ['c', 'd'], ['a', 'd']]
>>> add_to_list(mainlist, ['b', 'a'])
>>> mainlist
[['a', 'b'], ['c', 'd'], ['a', 'd']]
>>>

Removing every third item from list (Deleting entries at regular interval in list)

I want to remove every 3rd item from a list.
For example:
list1 = list(['a','b','c','d','e','f','g','h','i','j'])
After removing every item whose position (counting from 1) is a multiple of three, the list will be:
['a','b','d','e','g','h','j']
How can I achieve this?

You may use enumerate():
>>> x = ['a','b','c','d','e','f','g','h','i','j']
>>> [i for j, i in enumerate(x) if (j+1)%3]
['a', 'b', 'd', 'e', 'g', 'h', 'j']
Alternatively, you may create a copy of the list and delete the values at a regular interval. For example:
>>> y = list(x) # where x is the list mentioned in above example
>>> del y[2::3] # y[2::3] = ['c', 'f', 'i']
>>> y
['a', 'b', 'd', 'e', 'g', 'h', 'j']

[v for i, v in enumerate(list1) if (i + 1) % 3 != 0]
It seems like you want the third item in the list, which is actually at index 2, gone. This is what the +1 is for.
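The slice-deletion idea generalizes to any interval. A minimal sketch, with a made-up function name drop_every_nth and positions counted from 1 as in the question:

```python
def drop_every_nth(items, n):
    """Return a new list without every n-th item (positions counted from 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = list(items)   # copy so the caller's list is left untouched
    del result[n - 1::n]   # delete positions n, 2n, 3n, ...
    return result


list1 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
print(drop_every_nth(list1, 3))  # ['a', 'b', 'd', 'e', 'g', 'h', 'j']
```

The slice starts at index n - 1 because the n-th item of a list lives at that zero-based index.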
