Removing common elements in two lists - python

I have two sorted lists of positive integers which can have repeated elements and I must remove matching pairs of numbers, one from each list:
a=[1,2,2,2,3]
b=[2,3,4,5,5]
should become:
a=[1,2,2]
b=[4,5,5]
That is, the 2's and the 3's have been removed because they appear in both lists.
Set intersection can't be used here because of the repeated elements.
How do I go about this?

To remove elements appearing in both lists, use the following:
for i in a[:]:
    if i in b:
        a.remove(i)
        b.remove(i)
To create a function which does it for you, simply do:
def removeCommonElements(a, b):
    for e in a[:]:
        if e in b:
            a.remove(e)
            b.remove(e)
Or to return new lists and not to edit the old ones:
def getWithoutCommonElements(a, b): # Name subject to change
    a2 = a.copy()
    b2 = b.copy()
    for e in a:
        if e in b2:  # test the shrinking copy so each element of b is matched only once
            a2.remove(e)
            b2.remove(e)
    return a2, b2
However, the same result can be obtained by reusing removeCommonElements on copies:
a2, b2 = a.copy(), b.copy()
removeCommonElements(a2, b2)
This keeps a and b intact and creates copies without the common elements.

The Counter object from collections can do this quite concisely:
from collections import Counter
a=Counter([1,2,2,2,3])
b=Counter([2,3,4,5,5])
print(list((a-b).elements()))
print(list((b-a).elements()))
The idea is:
Count up how often each element appears (e.g. 2 appears 3 times in a, and 1 time in b)
Subtract the counts to work out how many extra times the element appears (e.g. 2 appears 3-1=2 times more in a than b)
Output each element the extra number of times it appears (Counter subtraction keeps only positive counts, and the elements() method skips any element whose count is less than 1)
(Warning: the output lists won't necessarily be sorted)
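If sorted output matters, the counting steps above can be wrapped in a small helper that sorts the leftovers. This is only a sketch: the function name remove_common_pairs is made up for illustration, and it assumes the elements are hashable and orderable.

```python
from collections import Counter

def remove_common_pairs(a, b):
    """Return copies of a and b with matching pairs removed, each sorted.

    Hypothetical helper, not part of the original answer.
    """
    count_a, count_b = Counter(a), Counter(b)
    # Counter subtraction keeps only the elements with a positive surplus.
    only_a = sorted((count_a - count_b).elements())
    only_b = sorted((count_b - count_a).elements())
    return only_a, only_b

result = remove_common_pairs([1, 2, 2, 2, 3], [2, 3, 4, 5, 5])
print(result)
```

Unlike the in-place loops, this leaves the input lists untouched and does not rely on them being sorted.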

Given that the lists are sorted, you can merge/distribute element-wise, like for example:
x, y = [], []
while a and b:
    if a[0] < b[0]:
        x.append(a.pop(0))
    elif a[0] > b[0]:
        y.append(b.pop(0))
    else: # a[0]==b[0]
        a.pop(0)
        b.pop(0)
x += a
y += b
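The same merge idea can be written with two indices instead of pop(0), which avoids emptying the input lists and the cost of repeatedly removing from the front of a list. A minimal sketch, assuming both inputs are already sorted; the name split_unmatched is hypothetical.

```python
def split_unmatched(a, b):
    """Walk two sorted lists in step and return the unmatched leftovers.

    Hypothetical helper; a and b are assumed to be sorted ascending.
    """
    x, y = [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            x.append(a[i])      # a[i] has no partner in b
            i += 1
        elif a[i] > b[j]:
            y.append(b[j])      # b[j] has no partner in a
            j += 1
        else:
            i += 1              # matching pair: drop one from each list
            j += 1
    x.extend(a[i:])             # whatever remains is unmatched
    y.extend(b[j:])
    return x, y

print(split_unmatched([1, 2, 2, 2, 3], [2, 3, 4, 5, 5]))
```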

The solution given by @Mahi is nearly correct. The simplest way to achieve what you want is this:
def remove_common_elements(a, b):
    for i in a[:]:
        if i in b:
            a.remove(i)
            b.remove(i)
    return a, b
The important thing here is to make a copy of a by writing a[:]. If you iterate through a list while removing elements from it, you won't get correct results.
If you don't want to modify the lists in place, make a copy of both lists beforehand and return the copied lists.
def remove_common_elements(a, b):
    a_new = a[:]
    b_new = b[:]
    for i in a:
        if i in b_new:
            a_new.remove(i)
            b_new.remove(i)
    return a_new, b_new

One solution would be to build a new copy of a while removing the common elements from b.
a = [1,2,2,2,3]
b = [2,2,3,4,5]
a_new = []
for ai in a:
    if ai in b:
        b.remove(ai)
    else:
        a_new.append(ai)
print(a_new)
print(b)

Related

Check if list is sublist of another list and the elements are in the same order

#sub list test 1
a = [1,1,2]
b = [0, 1,1,1,2,1]
#sub list test 2
c = [4,5]
d = [2,3,4,5,6]
if all(i in a for i in b):
    print("Yes I have all the elements but not sure about their order :( ")
I have tried all() and Counter from the collections module, but I can't figure out how to check the order as well.
The second function below checks every slice of b whose length is len(a) and returns True if a is found in b.
If order does not matter:
def isSublist(a,b):
    #checking all unique elements and their count
    for i in set(a):
        if i not in b or a.count(i) > b.count(i):
            return False
    return True
If order matters:
def isSublist(a,b):
    for i in range(len(b)-len(a)+1):
        if b[i:i+len(a)] == a:
            return True
    return False
In most cases, when the list elements are simple types, you can convert them to str first and then do
"#".join(sublist) in "#".join(list)
where # is a symbol that doesn't occur in any element.
Edit: as no comment pointed out, there is indeed a bug in that approach: it reports a false match when the sublist's last element is a prefix of an element of the list, or when its first element is a suffix of one. We can handle that by adding a sentinel at the beginning and end, so the correct solution would be
"#".join([""] + sublist + [""]) in "#".join([""] + list + [""])
You can use all() by performing the check for membership of a in b through an iterator on b:
a = [1,1,2]
b = [0, 1,1,1,2,1]
r = all(n in ib for ib in [iter(b)] for n in a)
print(r) # True
It will also find a match for items in the same relative order (that are not consecutive):
a=[1,2,1]
b=[1,2,3,1]
r = all(n in ib for ib in [iter(b)] for n in a)
print(r) # True
How it works:
Every time n in ib is executed, the ib iterator advances up to the next matching value in b.
Going through all values of a matches items sequentially over the remainder of b after each match check.
If ib is exhausted before getting to the end of a, then the elements aren't all present or their order doesn't match.
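The consumption behaviour described above can be seen directly by looking at what is left in the iterator after one membership test. A small sketch; the function name is_subsequence is made up here, and the one-liner is simply rewritten with the iterator created on its own line.

```python
def is_subsequence(a, b):
    """True if the items of a appear in b in the same relative order.

    Hypothetical wrapper around the all()/iterator idiom.
    """
    ib = iter(b)
    # Each `n in ib` consumes ib up to and including the first match,
    # so the next test can only look at what comes after it.
    return all(n in ib for n in a)

ib = iter([0, 1, 1, 1, 2, 1])
found = 1 in ib          # consumes 0 and the first 1
remaining = list(ib)     # the part of b that later tests would still see
print(found, remaining)
```

Because the iterator never rewinds, items of a that appear in b in the wrong order cannot be matched.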
If you are looking for consecutive matches, you can use the in operator on a generator that yields sublists of the corresponding length:
a = [1,1,2]
b = [0, 1,1,1,2,1]
r = a in (b[i:i+len(a)] for i in range(len(b)-len(a)+1))
print(r) # True

Compare two lists for same entries at same place

I am trying to compare two lists for the same element at the same index. The idea is to verify whether both lists contain same element at the same index. If yes, I want to count such occurrences. Here is my code:
count = 0
a = ['.ps2\n >|<4 *|*.ps2xml', '.c\n >|<2 *|*.wsc', '.h\n >|<2 *|*.wsh', '.c\n >|<2 *|*.chm', '.h\n >|<2 *|*.hta' ]
b = ['.ps2xml', '.chm', '.hta']
for x in a:
    for y in b:
        if y==x[x.index(" *|*")+4:]:
            print("match")
            count += 1
print(count)
This gives me a count of 3. What I expect is 1 because only first element of b matched with a's first element. The second element of both lists differ. The third elements are also different. The remaining elements in list a should not count as there is no such index in b.
Hope it makes sense. Thanks
In that case you should not use nested loops, since they repeat the search over all of b for each line in a. Use zip() instead:
for x, y in zip(a, b):
    if y == x[x.index(" *|*")+4:]:
        print("match")
        count += 1
print(count)
zip takes iterables and yields tuples; here the i-th tuple is (a[i], b[i]), and iteration stops as soon as the shorter list is exhausted.
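To make the pairing concrete, the loop can be wrapped in a function and run on the lists from the question. This is a sketch only: the name count_positional_matches and the marker parameter are made up, and it uses split() rather than index(), so an entry without the marker is compared whole instead of raising ValueError.

```python
def count_positional_matches(a, b, marker=" *|*"):
    """Count indices i where the text after marker in a[i] equals b[i].

    Hypothetical helper; zip() stops at the end of the shorter list.
    """
    count = 0
    for x, y in zip(a, b):
        if x.split(marker)[-1] == y:
            count += 1
    return count

a = ['.ps2\n >|<4 *|*.ps2xml', '.c\n >|<2 *|*.wsc', '.h\n >|<2 *|*.wsh',
     '.c\n >|<2 *|*.chm', '.h\n >|<2 *|*.hta']
b = ['.ps2xml', '.chm', '.hta']
pairs = list(zip(a, b))   # three pairs, because b has three items
print(len(pairs), count_positional_matches(a, b))
```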
A short solution using min() to limit the comparison to the length of the shorter list:
for i in range(min(len(a), len(b))):
    if a[i][a[i].index('*|*') + 3:] == b[i]:
        count += 1
print(count)
The output:
1
Does the match have to be qualified as following '*|*'?
If not, then a really simple way is:
sum([1 for e, f in zip(a, b) if f in e])
or, passing a generator expression straight to sum():
sum(f in e for e, f in zip(a, b)) # relies on bools True, False = ints 1, 0
If the match is just the last bit, you could split:
'.ps2\n >|<4 *|*.ps2xml'.split(" *|*")
Out[13]: ['.ps2\n >|<4', '.ps2xml']
'.ps2\n >|<4 *|*.ps2xml'.split(" *|*")[1]
Out[14]: '.ps2xml'
sum([1 for e, f in zip(a, b) if f in e.split(" *|*")[1]])
And while sum() is more "intentional", len() of the filtered list could be used for a speed advantage, since it doesn't have to iterate over the list.

How to check if sum of 3 integers in list matches another integer? (python)

Here's my issue:
I have a large integer (anywhere between 0 and 2^32-1). Let's call this number X.
I also have a list of integers, unsorted currently. They are all unique numbers, greater than 0 and less than X. Assume that there is a large amount of items in this list, let's say over 100,000 items.
I need to find up to 3 numbers in this list (let's call them A, B and C) that add up to X.
A, B and C all need to be inside of the list, and they can be repeated (for example, if X is 4, I can have A=1, B=1 and C=2 even though 1 would only appear once in the list).
There can be multiple solutions for A, B and C but I just need to find one possible solution for each the quickest way possible.
I've tried creating a for loop structure like this:
for A in itemlist:
    for B in itemlist:
        for C in itemlist:
            if A + B + C == X:
                exit("Done")
But since my list of integers contains over 100,000 items, this uses too much memory and would take far too long.
Is there any way to find a solution for A, B and C without using an insane amount of memory or taking an insane amount of time? Thanks in advance.
You can reduce the running time from O(n^3) to O(n^2) by using a set, something like this:
s = set(itemlist)
for A in itemlist:
    for B in itemlist:
        if X-(A+B) in s:
            print(A, B, X-(A+B))
            break  # leaves only the inner loop
You can also sort the list and use binary search if you want to save memory.
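The sort-and-binary-search variant could look roughly like the following. It is a sketch under stated assumptions: the name find_triple is made up, values may be reused as the question allows, and the search is restricted to A <= B <= C so each combination is tried once. The worst case is still quadratic in the number of pairs, with a logarithmic lookup for each.

```python
from bisect import bisect_left

def find_triple(items, target):
    """Return (A, B, C) taken from items with A + B + C == target, or None.

    Hypothetical helper; a value may be used more than once.
    """
    ordered = sorted(items)
    n = len(ordered)
    for i, a in enumerate(ordered):
        for j in range(i, n):           # start at i so B may equal A
            b = ordered[j]
            c = target - a - b
            if c < b:                   # B only grows and C only shrinks from here
                break
            k = bisect_left(ordered, c) # binary search for C
            if k < n and ordered[k] == c:
                return a, b, c
    return None

print(find_triple([1, 2], 4))
```

Returning from the function as soon as a match is found stops both loops at once, which a bare break does not.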
import collections
nums = collections.Counter(itemlist)
target = X # the target sum (X in the question)
for i in range(len(itemlist)):
    if itemlist[i] > target: continue
    for j in range(i+1, len(itemlist)):
        if itemlist[i]+itemlist[j] > target: continue
        if target - (itemlist[i]+itemlist[j]) in nums - collections.Counter([itemlist[i], itemlist[j]]):
            print("Found", itemlist[i], itemlist[j], target - (itemlist[i]+itemlist[j]))
Borrowing from @inspectorG4dget's code, this has two modifications:
If C <= B, the candidate is skipped, since we only look for C above B.
Use bisect_left() instead of collections.Counter().
This seems to run more quickly.
from random import randint
from bisect import bisect_left
X = randint(0, 2**32 - 1)
itemset = set(randint(0,X) for _ in range(100000))
itemlist = sorted(list(itemset)) # sort the list for binary search
l = len(itemlist)
for i,A in enumerate(itemlist):
    for j in range(i+1, l): # use numbers above A
        B = itemlist[j]
        C = X - A - B # calculate C
        if C <= B: continue
        # see https://docs.python.org/2/library/bisect.html#searching-sorted-lists
        k = bisect_left(itemlist, C) # separate name so the loop index i is not overwritten
        if k != l and itemlist[k] == C:
            print("Found", A, B, C)
To reduce the number of comparisons, we enforce A < B < C.

Python Itertools Permutations

I am currently writing a program that uses itertools, and one piece of it does not seem to be functioning properly. I would like the length of the tuples that the permutations function outputs to be equal to the length of the list from which it generates them. In other words, I have
import itertools
b = 0
c = 9
d = [0,1,2]
e = len(d)
while b < c:
    d.append(b)
    b = b+1
print([x for x in itertools.permutations(d,e)])
And I would like this to generate all the possible permutations of d that are equal to this length. I have been experimenting with this, and it seems that the second argument must be an integer. I even tried making a new variable, f, with f = int(e), and then replacing e with f in the print statement, but with no success. With either of these all I got was [()]
Thanks for your help.
You need to set e after you build the list. len(d) returns a value, not a reference to the list's length.
d = list(range(0, 9)) # build an arbitrary list here
# this creates a list of numbers: [0,1,2,3,4,5,6,7,8]
e = len(d)
print(list(itertools.permutations(d, e)))
Note that the number of permutations is very large, so storing all of them in a list will consume large amounts of memory - you'd be better off with this:
d = range(0,9)
e = len(d)
for p in itertools.permutations(d, e):
    print(p)
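To get a feel for how large the result is without materialising it, the count can be computed directly and only the first few permutations pulled from the iterator. A short sketch; the variable names total and first_three are just for illustration.

```python
import itertools
import math

d = list(range(9))

# Nine distinct items have 9! full-length permutations.
total = math.factorial(len(d))

# permutations() is lazy, so islice() can take a few without building them all.
# With no second argument, the length defaults to len(d).
first_three = list(itertools.islice(itertools.permutations(d), 3))

print(total)
for p in first_three:
    print(p)
```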

Finding median without using a sort function

As a homework assignment I have to write a script which finds the median of 3 given numbers without using a standard sort function of Python.
This is my first week in class and my first programming experience so I find it very difficult to get any further than I am right now.
Here's what I have so far:
def med3(a,b,c):
    list = [a, b, c]
    newlist = []
    if list:
        minimum = list[0]
        for x in list:
            if x < minimum:
                minimum = x
                newlist.append(minimum)
                list.remove(minimum)
            elif x >= minimum:
                newlist.append(x)
                list.remove(x)
    return newlist[1]
This seems to do the trick, but only for the first two entries of the list. The loop doesn't include the third entry.
How can I make the script include all three entries?
Thanks in advance!
Sander
sum([a, b, c]) - min(a, b, c) - max(a, b, c) - no sorting!
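Written out as a function, that one-liner might look like this. A minimal sketch with a made-up name, med3_arith; it works because removing the smallest and largest of three values from their total leaves the middle one. With floats the subtraction can introduce rounding error, so it is safest with integers.

```python
def med3_arith(a, b, c):
    """Median of three numbers: the total minus the smallest and the largest.

    Hypothetical helper illustrating the sum/min/max trick.
    """
    return a + b + c - min(a, b, c) - max(a, b, c)

print(med3_arith(3, 1, 2))
```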
You are modifying the list in-place while looping over it, which has consequences for what elements you see:
>>> numbers = [1,2,3]
>>> for i in numbers:
...     if i == 2: numbers.remove(i)
...     print(i)
...
1
2
Note how 3 is never printed; by removing the second entry in the list, we've shortened it by one element and the loop finds the list exhausted early.
Note that you don't need to loop over the items, a few simple comparisons will tell you what item is the median if you think about it for a second. :-)
There are a number of simpler ways to go about this, but as for your approach:
You're modifying list inside of your loop. Don't do that. :)
In your case, you should be removing elements from newlist:
def med3(a,b,c):
    list = [a, b, c]
    newlist = []
    if list:
        minimum = list[0]
        for x in list:
            if x < minimum:
                minimum = x
                newlist.pop()
                newlist.append(minimum)
            elif x >= minimum:
                newlist.append(x)
    return newlist[1]
But as an exercise, you might want to think about a few things:
Why are you putting the elements in a list and looping over them? What advantage does this have over comparing a, b and c with simple if statements?
Why the if list:?
The fastest way to do it:
def medianFast(a, b, c):
    if a > b:
        if b > c:
            return b
        elif a > c:
            return c
        else:
            return a
    else:
        if b < c:
            return b
        elif a > c:
            return a
        else:
            return c
This guarantees 3 comparisons in the worst case and 2 in the best case. If the six possible orderings of three distinct values are equally likely, that works out to 16/6, or about 2.67, comparisons on average.
Using ternary conditional we can write it shorter as:
def medianTernary(a, b, c):
    return (b if b > c else (c if a > c else a)) if a > b else (b if b < c else (a if a > c else c))
If you could use sorting you would have the shortest version:
def medianSorted(a, b, c):
    return sorted([a, b, c])[1]
