Separating sets using tuples - python

In the list of tuples called mixed_sets, three separate sets exist. Each set contains tuples with values that intersect. A tuple from one set will not intersect with a tuple from another set.
I've come up with the following code to sort out the sets. I found that Python's set functionality was limited when tuples are involved. It would be nice if the set intersection operation could look into each tuple index and not stop at the enclosing tuple object.
Here's the code:
```python
mixed_sets = [(1, 15), (2, 22), (2, 23), (3, 13), (3, 15),
              (3, 17), (4, 22), (4, 23), (5, 15), (5, 17),
              (6, 21), (6, 22), (6, 23), (7, 15), (8, 12),
              (8, 15), (9, 19), (9, 20), (10, 19), (10, 20),
              (11, 14), (11, 16), (11, 18), (11, 19)]

def sort_sets(a_set):
    idx = 0
    idx2 = 0
    while len(mixed_sets) > idx and len(a_set) > idx2:
        if a_set[idx2][0] == mixed_sets[idx][0] or a_set[idx2][1] == mixed_sets[idx][1]:
            a_set.append(mixed_sets[idx])
            mixed_sets.pop(idx)
            idx = 0
        else:
            idx += 1
            if idx == len(mixed_sets):
                idx2 += 1
                idx = 0
    a_set.pop(0)  # remove first item; duplicate
    print(a_set, 'a returned set')
    return a_set

sorted_sets = []
for new_set in mixed_sets:
    sorted_sets.append(sort_sets([new_set]))
print(mixed_sets)  # Now empty.
```
OUTPUT:
```
[(1, 15), (3, 15), (5, 15), (7, 15), (8, 15), (3, 13), (3, 17), (5, 17), (8, 12)] a returned set
[(2, 22), (2, 23), (4, 23), (6, 23), (4, 22), (6, 22), (6, 21)] a returned set
[(9, 19), (10, 19), (10, 20), (11, 19), (9, 20), (11, 14), (11, 16), (11, 18)] a returned set
```
Now this doesn't look like the most pythonic way of doing this task. This code is intended for large lists of tuples (approx 2E6) and I felt the program would run quicker if it didn't have to check tuples already sorted. Therefore I used pop() to shrink the mixed_sets list. I found using pop() made list comprehensions, for loops or any iterators problematic, so I've used the while loop instead.
It does work, but is there a more Pythonic way of carrying out this task that doesn't use while loops and the idx and idx2 counters?

Probably you can increase the speed by first computing a set of all the first elements in the tuples in the mixed_sets, and a set of all the second elements. Then in your iteration you can check if the first or the second element is in one of these sets, and find the correct complete tuple using binary search.
Actually you'd need multi-sets, which you can simulate using dictionaries.
Something like this (currently not tested):
```python
from collections import defaultdict

# define the mixed_sets list.
mixed_sets.sort()
first_els = defaultdict(int)
second_els = defaultdict(int)
for first, second in mixed_sets:
    first_els[first] += 1
    second_els[second] += 1

def sort_sets(a_set):
    index = 0
    while mixed_sets and len(a_set) > index:
        first, second = a_set[index]
        if first in first_els or second in second_els:
            if first in first_els:
                element = find_tuple(mixed_sets, first, index=0)
                first_els[first] -= 1
                if first_els[first] <= 0:
                    del first_els[first]
            else:
                element = find_tuple(mixed_sets, second, index=1)
                second_els[second] -= 1
                if second_els[second] <= 0:
                    del second_els[second]
            a_set.append(element)
            mixed_sets.remove(element)
        index += 1
    a_set.pop(0)  # remove first item; duplicate
    print(a_set, 'a returned set')
    return a_set
```
Here find_tuple(mixed_sets, first, index=0) (or index=1 for the second element) is a helper that returns the tuple in mixed_sets that has the given value at the given index.
You will probably also have to duplicate mixed_sets, sorting one copy by the first element and the other by the second element.
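As a rough sketch of that binary search, the standard bisect module can look up a value in a copy of the tuples sorted by one position. The helpers build_lookup and find_tuple here are hypothetical (this find_tuple takes a prebuilt lookup rather than mixed_sets itself), and the sketch ignores the extra bookkeeping needed once tuples are removed from mixed_sets:

```python
from bisect import bisect_left

def build_lookup(tuples, index):
    """Hypothetical helper: a copy of `tuples` sorted by the element at
    `index`, plus the parallel list of keys that bisect searches."""
    ordered = sorted(tuples, key=lambda t: t[index])
    keys = [t[index] for t in ordered]
    return ordered, keys

def find_tuple(lookup, value):
    """Binary search: return the first tuple whose key equals `value`, or None."""
    ordered, keys = lookup
    pos = bisect_left(keys, value)
    if pos < len(keys) and keys[pos] == value:
        return ordered[pos]
    return None

mixed_sets = [(1, 15), (2, 22), (3, 13), (3, 15), (8, 12)]
by_first = build_lookup(mixed_sets, 0)   # copy ordered by first element
by_second = build_lookup(mixed_sets, 1)  # copy ordered by second element
print(find_tuple(by_first, 3))
print(find_tuple(by_second, 15))
```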
Or you could use dictionaries again, storing a sorted list of tuples alongside the counts in first_els and second_els.
I don't know how the performance will scale, but with data on the order of 2 million tuples I don't think you have much to worry about.
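Taking the dictionary idea one step further, a breadth-first search over two indexes (one mapping first elements to tuples, one mapping second elements to tuples) collects each group without rescanning tuples that are already placed and without popping from mixed_sets while iterating over it. This is a sketch of a standard connected-components search, not the approach from the answer above; group_linked_tuples is a hypothetical name, and it assumes the tuples are hashable and that two tuples belong together when they share their first or their second element (the same test the question's code uses):

```python
from collections import defaultdict, deque

mixed_sets = [(1, 15), (2, 22), (2, 23), (3, 13), (3, 15),
              (3, 17), (4, 22), (4, 23), (5, 15), (5, 17),
              (6, 21), (6, 22), (6, 23), (7, 15), (8, 12),
              (8, 15), (9, 19), (9, 20), (10, 19), (10, 20),
              (11, 14), (11, 16), (11, 18), (11, 19)]

def group_linked_tuples(pairs):
    """Split `pairs` into groups in which tuples are linked when they share
    the same first element or the same second element."""
    by_first = defaultdict(list)   # first element -> tuples starting with it
    by_second = defaultdict(list)  # second element -> tuples ending with it
    for pair in pairs:
        by_first[pair[0]].append(pair)
        by_second[pair[1]].append(pair)

    seen = set()
    groups = []
    for start in pairs:
        if start in seen:
            continue
        seen.add(start)
        group = []
        queue = deque([start])
        while queue:
            current = queue.popleft()
            group.append(current)
            first, second = current
            # pop() each bucket so it is scanned only once
            for neighbour in by_first.pop(first, []) + by_second.pop(second, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(group)
    return groups

groups = group_linked_tuples(mixed_sets)
for group in groups:
    print(group)
```

Each bucket is popped the first time it is used, so every tuple is touched a constant number of times and the total work grows roughly linearly with the number of tuples.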

Related

Sort a list of tuples in consecutive order

I want to sort a list of tuples in a consecutive order, so the first element of each tuple is equal to the last element of the previous one.
For example:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
output = [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
I have developed a search like this:
```python
output = []
given = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
t = given[0][0]
for i in range(len(given)):
    # search tuples starting with element t
    output += [e for e in given if e[0] == t]
    t = output[-1][-1]  # Get the next element to search
print(output)
```
Is there a pythonic way to achieve such order?
And a way to do it "in-place" (with only a list)?
In my problem, the input can always be reordered into a cycle that uses all the tuples, so it does not matter which element is chosen first.
Assuming the tuples in your list form a cycle, you can use a dict to do it in O(n):
```python
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
input_dict = dict(input)  # Convert list of `tuples` to dict
elem = input[0][0]  # start point in the new list
new_list = []  # List of tuples for holding the values in required order
for _ in range(len(input)):
    new_list.append((elem, input_dict[elem]))
    elem = input_dict[elem]
    if elem not in input_dict:
        # Raise exception in case list of tuples is not circular
        raise Exception('key {} not found in dict'.format(elem))
```
The final value held by new_list will be:
```
>>> new_list
[(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
```
If you are not afraid to waste some memory, you could create a dictionary start_dict containing the start integers as keys and the tuples as values, and do something like this:
```python
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
start_dict = {item[0]: item for item in tpl}
start = tpl[0][0]
res = []
while start_dict:
    item = start_dict[start]
    del start_dict[start]
    res.append(item)
    start = item[-1]
print(res)
```
If two tuples start with the same number, you will lose one of them, and if the chain closes before all the tuples have been used, start_dict[start] raises a KeyError.
But maybe this is something to build on.
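One way to build on it is to turn both failure cases into explicit errors. The sketch below is hypothetical (chain_tuples is not from the answer) and assumes the input is a list of 2-tuples:

```python
def chain_tuples(tuples):
    """Order `tuples` so each one starts where the previous one ended.
    Raises ValueError instead of silently dropping tuples or hitting a KeyError."""
    start_dict = {}
    for item in tuples:
        if item[0] in start_dict:
            raise ValueError('two tuples start with {}'.format(item[0]))
        start_dict[item[0]] = item
    if not tuples:
        return []
    start = tuples[0][0]
    res = []
    while start_dict:
        if start not in start_dict:
            raise ValueError('chain closed before all tuples were used')
        item = start_dict.pop(start)  # remove and return in one step
        res.append(item)
        start = item[-1]
    return res

print(chain_tuples([(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]))
```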
Actually, there are many open questions about what output you intend and what should happen if the input list doesn't have the structure you need.
Assume you have an input of pairs in which each number appears exactly twice. Then we can treat the input as a graph where the numbers are nodes and each pair is an edge. As far as I understand your question, you assume that this graph is a single cycle that looks like this:
10 - 7 - 13 - 4 - 9 - 10 (same 10 as at the beginning)
This shows that the whole graph can be stored as the reduced list [10, 7, 13, 4, 9]. Here is a script that sorts the input list:
```python
# input
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
# sorting and archiving
first = input[0][0]
last = input[0][1]
output_in_place = [first, last]
while last != first:
    for item in input:
        if item[0] == last:
            last = item[1]
            if last != first:
                output_in_place.append(last)
print(output_in_place)
# output
output = []
for i in range(len(output_in_place) - 1):
    output.append((output_in_place[i], output_in_place[i + 1]))
output.append((output_in_place[-1], output_in_place[0]))
print(output)
```
I would first create a dictionary of the form
{first_value: [list of tuples with that first value], ...}
Then work from there:
```python
from collections import defaultdict

input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
chosen_tuples = input[:1]  # Start from the first
first_values = defaultdict(list)  # list factory, so append() works on new keys
for tup in input[1:]:
    first_values[tup[0]].append(tup)

while first_values:  # Loop will end when all lists are removed
    value = chosen_tuples[-1][1]  # Second item of last tuple
    tuples_with_that_value = first_values[value]
    chosen_tuples.append(tuples_with_that_value.pop())
    if not tuples_with_that_value:
        del first_values[value]  # List empty, remove it
```
You can try this:
```python
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
output = [input[0]]  # output contains the first element of input
temp = input[1:]  # temp contains the rest of elements in input
while temp:
    # We compare each element with output[-1]
    item = [i for i in temp if i[0] == output[-1][1]].pop()
    output.append(item)  # We add the right item to output
    temp.remove(item)  # We remove each handled element from temp
```
Output:
```
>>> output
[(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
```
Here is a robust solution using the sorted function and a custom key function:
```python
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]

def consec_sort(lst):
    def key(x):
        nonlocal index
        if index <= lower_index:
            index += 1
            return -1
        return abs(x[0] - lst[index - 1][1])
    for lower_index in range(len(lst) - 2):
        index = 0
        lst = sorted(lst, key=key)
    return lst

output = consec_sort(input)
print(output)
```
The original list is not modified. Note that sorted is called 3 times for your input list of length 5. In each call, one additional tuple is placed correctly. The first tuple keeps its original position.
I have used the nonlocal keyword, which means this code is for Python 3 only (one could use global instead to make it legal Python 2 code).
My two cents:
```python
def match_tuples(input):
    # making a copy so the original list is not modified
    tuples = input[:]  # [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
    last_elem = tuples.pop(0)  # (10, 7)
    # {first element of each tuple: its index in the list}
    indexes = {tup[0]: i for i, tup in enumerate(tuples)}  # {9: 3, 4: 0, 13: 1, 7: 2}
    yield last_elem  # yields the first element
    for _ in range(len(tuples)):
        # find the tuple whose first element matches the second element of the last tuple
        list_index = indexes.get(last_elem[1])
        last_elem = tuples[list_index]  # just get that tuple
        yield last_elem
```
Output:
```python
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
print(list(match_tuples(input)))
# output: [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
```
This is a variant (less efficient than the dictionary version) that changes the list in place:
```python
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
for i in range(1, len(tpl) - 1):  # iterate over the indices of the list
    item = tpl[i - 1]  # the tuple already in place; find its successor
    for j, next_item in enumerate(tpl[i:]):  # find the next item
                                             # in the remaining list
        if next_item[0] == item[1]:
            next_index = i + j
            break
    tpl[i], tpl[next_index] = tpl[next_index], tpl[i]  # now swap the items
print(tpl)
```
Here is a more efficient version of the same idea:
```python
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
start_index = {item[0]: i for i, item in enumerate(tpl)}
item = tpl[0]
next_index = start_index[item[-1]]
for i in range(1, len(tpl) - 1):
    tpl[i], tpl[next_index] = tpl[next_index], tpl[i]
    # need to update the start indices:
    start_index[tpl[next_index][0]] = next_index
    start_index[tpl[i][0]] = i
    next_index = start_index[tpl[i][-1]]
print(tpl)
```
The list is changed in place; the dictionary only contains the starting values of the tuples and their indices in the list.
To get an O(n) algorithm, you need to make sure you don't do a double loop over the list. One way to do this is to keep already processed values in some sort of lookup table (a dict would be a good choice).
For example, something like this (I hope the inline comments explain the functionality well). It modifies the list in place and should avoid unnecessary (even implicit) looping over the list:
```python
inp = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
# A dictionary containing processed elements, first element is
# the key and the value represents the tuple. This is used to
# avoid the double loop
seen = {}
# The second value of the first tuple. This must match the first
# item of the next tuple
current = inp[0][1]
# Iteration to insert the next element
for insert_idx in range(1, len(inp)):
    # print('insert', insert_idx, seen)
    # If the next value was already found no need to search, just
    # pop it from the seen dictionary and continue with the next loop
    if current in seen:
        item = seen.pop(current)
        inp[insert_idx] = item
        current = item[1]
        continue
    # Search the list until the next value is found saving all
    # other items in the dictionary so we avoid unnecessary iterations
    # over the list.
    for search_idx in range(insert_idx, len(inp)):
        # print('search', search_idx, inp[search_idx])
        item = inp[search_idx]
        first, second = item
        if first == current:
            # Found the next tuple, break out of the inner loop!
            inp[insert_idx] = item
            current = second
            break
        else:
            seen[first] = item
print(inp)  # [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
```

Organizing list of tuples

I have a list of tuples which I create dynamically.
The list appears as:
List = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
Each tuple (a, b) of list represents the range of indexes from a certain table.
In my situation, the ranges (a, b) and (b, d) are equivalent to (a, d).
I want to merge the tuples where the 2nd element matches the first of any other.
So, in the example above, I want to merge (8, 10) and (10, 13) to obtain (8, 13), and remove (8, 10) and (10, 13).
Merging (19, 25) and (25, 30) should yield (19, 30).
I don't have a clue where to start. The tuples are non-overlapping.
Edit: I have been trying to avoid any kind of for loop, as I have a pretty large list.
If you need to take into account things like skovorodkin's example in the comment,
[(1, 4), (4, 8), (8, 10)]
(or even more complex examples), then one efficient way to do it would be to use graphs.
Say you create a digraph (possibly using networkx), where each pair is a node, and there is an edge from node (a, b) to node (c, d) if b == c. Now run a topological sort, iterate according to the order, and merge accordingly. You should take care to handle nodes with two (or more) outgoing edges properly.
I realize your question states you'd like to avoid loops on account of the long list size. However, for long lists, I doubt you'll find an efficient linear-time solution using a list comprehension (or something like that). Note, for example, that you cannot sort the list in linear time.
Here is a possible implementation:
Say we start with
```python
l = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
```
It simplifies the following to remove duplicates, so let's do:
```python
l = list(set(l))
```
Now to build the digraph:
```python
import networkx as nx
import collections

g = nx.DiGraph()
```
The vertices are simply the pairs:
```python
g.add_nodes_from(l)
```
To build the edges, we need a dictionary:
```python
froms = collections.defaultdict(list)
for p in l:
    froms[p[0]].append(p)
```
Now we can add the edges:
```python
for p in l:
    for from_p in froms[p[1]]:
        g.add_edge(p, from_p)
```
The next two lines are unneeded; they're just here to show what the graph looks like at this point:
```
>>> g.nodes()
[(25, 30), (14, 16), (10, 13), (8, 10), (1, 4), (19, 25)]
>>> g.edges()
[((8, 10), (10, 13)), ((19, 25), (25, 30))]
```
Now, let's sort the pairs by topological sort. In networkx 2.x, topological_sort returns a generator, so materialize it with list(), since l is iterated twice below:
```python
l = list(nx.topological_sort(g))
```
Finally, here's the tricky part. The result will be a DAG. We have to traverse things recursively, but remember what we visited already.
Let's create a dict of what we visited:
```python
visited = {p: False for p in l}
```
Now a recursive function that, given a node, returns the largest range end reachable from it:
```python
def visit(p):
    # list() so the emptiness test also works when neighbors() returns an iterator (networkx 2.x)
    neighbs = list(g.neighbors(p))
    if visited[p] or not neighbs:
        visited[p] = True
        return p[1]
    mx = max([visit(neighb_p) for neighb_p in neighbs])
    visited[p] = True
    return mx
```
We're all ready. Let's create a list for the final pairs:
```python
final_l = []
```
and visit all nodes:
```python
for p in l:
    if visited[p]:
        continue
    final_l.append((p[0], visit(p)))
```
Here's the final result (sorted, since the order of final_l follows the topological sort):
```
>>> sorted(final_l)
[(1, 4), (8, 13), (14, 16), (19, 30)]
```
If they don't overlap, then you can sort them and then just combine adjacent ones.
Here's a generator that yields the new tuples:
```python
def combine_ranges(L):
    L = sorted(L)  # Make a copy as we're going to remove items!
    while L:
        start, end = L.pop(0)  # Get the first item
        while L and L[0][0] == end:
            # While the first of the rest connects to it, adjust
            # the end and remove the first of the rest
            _, end = L.pop(0)
        yield (start, end)

print(list(combine_ranges(List)))
```
If speed is important, use a collections.deque instead of a list and call popleft() instead of pop(0), so that removing the first item takes constant time (deque.pop() does not accept an index).
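Applied to the generator above, that change might look like this (a sketch: only the container and the pop calls differ, and List is the question's example data):

```python
from collections import deque

def combine_ranges(ranges):
    pending = deque(sorted(ranges))  # sorted copy; popleft() is O(1) on a deque
    while pending:
        start, end = pending.popleft()  # take the first remaining range
        # absorb every following range that starts where this one ends
        while pending and pending[0][0] == end:
            _, end = pending.popleft()
        yield (start, end)

List = [(1, 4), (8, 10), (19, 25), (10, 13), (14, 16), (25, 30)]
print(list(combine_ranges(List)))
```

Sorting still costs O(n log n); the deque only removes the quadratic cost of calling list.pop(0) repeatedly.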
Here is a non-recursive approach using sorting (I've added more nodes to handle a more complex case):
```python
l = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30), (30,34), (38,40)]
l = sorted(l)
r = []
idx = 0
while idx < len(l):
    local = idx + 1
    previous_value = l[idx][1]
    # search for the longest chain
    while local < len(l):
        if l[local][0] != previous_value:
            break
        previous_value = l[local][1]
        local += 1
    # store tuple
    r.append((l[idx][0], l[local - 1][1]))
    idx = local
print(r)
```
Result:
```
[(1, 4), (8, 13), (14, 16), (19, 34), (38, 40)]
```
The only drawback is that the original order is not preserved. I don't know if that's a problem.
Here is a recursive approach:
```python
def find_intersection(m_list):
    for i, (v1, v2) in enumerate(m_list):
        for j, (k1, k2) in enumerate(m_list[i + 1:], i + 1):
            if v2 == k1:
                m_list[i] = (v1, m_list.pop(j)[1])
                return find_intersection(m_list)
    return m_list
```
Demo:
```
In [45]: lst = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
In [46]: find_intersection(lst)
Out[46]: [(1, 4), (8, 13), (19, 30), (14, 16)]
```
You can use a dictionary to map the different end indices to the range ending at that index; then just iterate over the list sorted by start index and merge the segments accordingly:
```python
def join_lists(lst):
    ending = {}  # will map end position to range
    for start, end in sorted(lst):  # iterate in sorted order
        if start in ending:
            ending[end] = (ending[start][0], end)  # merge
            del ending[start]  # remove old value
        else:
            ending[end] = (start, end)
    return list(ending.values())  # return remaining values from dict
```
Alternatively, as pointed out by Tomer W in the comments, you can do without the sorting by iterating over the list twice, which makes this solution take only linear time (O(n)) in the length of the list. Note that this two-pass variant only merges a chain of three or more ranges correctly when its pieces appear in left-to-right order: join_lists([(8, 10), (4, 8), (1, 4)]) returns [(4, 10), (1, 8)].
```python
def join_lists(lst):
    ending = {}  # will map end position to range
    # first pass: add to dictionary
    for start, end in lst:
        ending[end] = (start, end)
    # second pass: lookup and merge
    for start, end in lst:
        if start in ending:
            ending[end] = (ending[start][0], end)
            del ending[start]
    # return remaining values from dict
    return list(ending.values())
```
Example output, for both cases:
```
>>> join_lists([(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)])
[(1, 4), (8, 13), (14, 16), (19, 30)]
>>> join_lists([(1, 4), (4, 8), (8, 10)])
[(1, 10)]
```
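If you want linear time without depending on the input order, one option is to index the ranges by their start and follow each chain from its first range, that is, from a start that is not the end of any other range. This is a sketch, not taken from the answers above; join_ranges_linear is a hypothetical name, and it assumes start < end for every range and that the ranges do not overlap, as the question states:

```python
def join_ranges_linear(ranges):
    """Merge touching ranges such as (a, b), (b, c) into (a, c) without sorting.
    Assumes start < end and non-overlapping ranges (unique starts and ends)."""
    end_of = {start: end for start, end in ranges}  # start -> end
    ends = set(end_of.values())
    merged = []
    for start in end_of:
        if start in ends:
            continue  # another range ends here, so this is not the head of a chain
        end = end_of[start]
        while end in end_of:  # follow the chain to its last range
            end = end_of[end]
        merged.append((start, end))
    return merged

print(join_ranges_linear([(8, 10), (4, 8), (1, 4)]))
print(join_ranges_linear([(1, 4), (8, 10), (19, 25), (10, 13), (14, 16), (25, 30)]))
```

Each range is visited once while its chain is followed, so the total work is linear in the number of ranges (ignoring hash collisions).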
The list is first sorted and adjacent pairs of (min1, max1), (min2, max2) are merged together if they overlap.
```python
MIN = 0
MAX = 1

def normalize(intervals):
    isort = sorted(intervals)
    for i in range(len(isort) - 1):
        if isort[i][MAX] >= isort[i + 1][MIN]:
            vmin = isort[i][MIN]
            vmax = max(isort[i][MAX], isort[i + 1][MAX])
            isort[i] = None
            isort[i + 1] = (vmin, vmax)
    return [r for r in isort if r is not None]

List1 = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
List2 = [(1, 4), (4, 8), (8, 10)]
print(normalize(List1))
print(normalize(List2))
#[(1, 4), (8, 13), (14, 16), (19, 30)]
#[(1, 10)]
```
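The following should work. It breaks the tuples into individual numbers, then finds the bounds of each cluster of consecutive numbers. This should work even with difficult overlaps, like [(4, 10), (9, 12)].
It's a very simple fix.
```python
# First turn your list of tuples into a set of the numbers they cover:
numbers = set()
for item in List:
    numbers.update(range(item[0], item[1] + 1))
# Then create tuple pairs from runs of consecutive numbers:
output = []
a = None  # start of the current run (None rather than False, so a run can start at 0)
for x in range(max(numbers) + 1):
    if a is None and x in numbers:
        a = x
    if a is not None and x + 1 not in numbers:
        output.append((a, x))
        a = None
print(output)
```
Note that this also joins ranges that are only adjacent as integers, such as (10, 13) and (14, 16), so for the question's List it returns [(1, 4), (8, 16), (19, 30)].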

Insert element to list based on previous and next elements

I'm trying to add a new tuple to a list of tuples (sorted by first element in tuple), where the new tuple contains elements from both the previous and the next element in the list.
Example:
oldList = [(3, 10), (4, 7), (5,5)]
newList = [(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]
(4,10) was constructed from and added in between (3,10) and (4,7).
Construct (x,y) from (a,y) and (x,b)
I've tried using enumerate() to insert at the specific position, but that doesn't really let me access the next element.
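```python
oldList = [(3, 10), (4, 7), (5, 5)]

def pair(lst):
    # create two iterators
    it1, it2 = iter(lst), iter(lst)
    # move second to the second tuple
    next(it2, None)
    for ele in it1:
        # yield original
        yield ele
        nxt = next(it2, None)
        if nxt is None:  # no tuple after the last one; stop cleanly
            return
        # yield first element from the next tuple and second from the current one
        yield (nxt[0], ele[1])
```
Which will give you:
```
In [3]: oldList = [(3, 10), (4, 7), (5, 5)]
In [4]: list(pair(oldList))
Out[4]: [(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]
```
The explicit return matters on Python 3.7 and later, where a StopIteration raised by next() inside a generator is turned into a RuntimeError (PEP 479).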
Obviously we need to do some error handling to handle different possible situations.
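You could also do it using a single iterator if you prefer:
```python
def pair(lst):
    it = iter(lst)
    prev = next(it, None)
    if prev is None:  # empty input
        return
    for ele in it:
        yield prev
        yield (ele[0], prev[1])  # first from the next tuple, second from the current one
        prev = ele
    yield prev  # the last tuple
```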
You can use itertools.tee in place of calling iter:
```python
from itertools import tee

def pair(lst):
    # create two iterators
    it1, it2 = tee(lst)
    # move second to the second tuple
    next(it2, None)
    for ele in it1:
        # yield original
        yield ele
        nxt = next(it2, None)
        if nxt is None:  # no tuple after the last one
            return
        # yield first element from the next tuple and second from the current one
        yield (nxt[0], ele[1])
```
You can use a list comprehension and itertools.chain():
```
>>> from itertools import chain
>>> list(chain.from_iterable([((i, j), (x, j)) for (i, j), (x, y) in zip(oldList, oldList[1:])])) + oldList[-1:]
[(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]
```
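On Python 3.10 and later, itertools.pairwise yields the overlapping (current, next) pairs directly, so no second iterator has to be advanced by hand. The sketch below falls back to zip on older versions and assumes lst is a list or other sequence, since it reads lst[-1]:

```python
try:
    from itertools import pairwise  # Python 3.10+
except ImportError:  # older Pythons: build the same overlapping pairs with zip
    def pairwise(seq):
        seq = list(seq)
        return zip(seq, seq[1:])

def pair(lst):
    for prev, nxt in pairwise(lst):
        yield prev
        yield (nxt[0], prev[1])  # first from the next tuple, second from the current one
    if lst:
        yield lst[-1]  # the last tuple has no successor

oldList = [(3, 10), (4, 7), (5, 5)]
print(list(pair(oldList)))
```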
Not being a big fan of one-liners (or complexity) myself, I will propose a very explicit and readable (which is usually a good thing!) solution to your problem.
So, in a very simplistic approach, you could do this:
```python
def insertElements(oldList):
    """
    Return a new list, alternating oldList tuples with
    new tuples in the form (oldList[i+1][0], oldList[i][1])
    """
    newList = []
    for i in range(len(oldList) - 1):
        # take one tuple as is
        newList.append(oldList[i])
        # then add a new one with items from current and next tuple
        newList.append((oldList[i + 1][0], oldList[i][1]))
    else:
        # don't forget the last tuple
        newList.append(oldList[-1])
    return newList

oldList = [(3, 10), (4, 7), (5, 5)]
newList = insertElements(oldList)
```
That will give you the desired result in newList:
```
print(newList)
[(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]
```
This is not much longer code than other more sophisticated (and memory efficient!) solutions, like using generators, AND I consider it a lot easier to read than intricate one-liners. Also, it would be easy to add some checks to this simple function (like making sure you have a list of tuples).
Unless you already know you need to optimize this particular piece of your code (assuming this is part of a bigger project), this should be good enough. At the same time it is: easy to implement, easy to read, easy to explain, easy to maintain, easy to extend, easy to refactor, etc.
Note: all other previous answers to your question are also better solutions than this simple one, in many ways. Just wanted to give you another choice. Hope this helps.

Manually enumerating a list

I'm trying to figure out how to manually enumerate a list, but I'm stuck because I can't figure out how to split up the data list. This is the code I have so far:
```python
enumerated_list = []
data = [5, 10, 15]
for x in (data):
    print(x)
for i in range(len(data)):
    enumerate_rule = (i, x)
    enumerated_list.append(enumerate_rule)
print(enumerated_list)
```
This prints out:
```
5
10
15
[(0, 15), (1, 15), (2, 15)]
```
What I'm after is [(0, 5), (1, 10), (2, 15)]. How would I go about this?
Use the enumerate() built-in:
```
>>> list(enumerate([5, 10, 15]))
[(0, 5), (1, 10), (2, 15)]
```
The fault in your original code is that you use x in your second loop, but x doesn't change in that loop; it is simply left over from the previous loop, where you printed the values.
However, this is a bad way of doing it. Fixing it would require looping by index, which Python isn't designed for: it's slow and hard to read. Instead, we loop by value. The enumerate() built-in is there to do this job for us, as it's a reasonably common task.
If you really don't want to use enumerate() (which rarely makes sense, except perhaps as an arbitrary restriction meant to teach you about something else), then there are still better ways:
```
>>> from itertools import count
>>> list(zip(count(), [5, 10, 15]))
[(0, 5), (1, 10), (2, 15)]
```
Here we use zip(), which is the Python function for looping over two iterables at once. It returns tuples of the first value from each iterable, then the second from each, and so on. Combined with itertools.count(), which does what it says on the tin, this gives us the result we want.
If you really feel the need to build a list manually, the more Pythonic way of doing something rather unpythonic would be:
```python
enumerated_list = []
count = 0
for item in data:
    enumerated_list.append((count, item))
    count += 1
```
Note, however, that one would generally use a list comprehension to build a list like this, and as soon as you do that, it makes more sense to use one of the earlier methods. Building a list this way is inefficient and hard to read.
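If the point of the exercise is to see what enumerate() does, a generator that keeps its own counter is a close manual equivalent (the Python documentation describes enumerate() with essentially this code). my_enumerate is a hypothetical name chosen so it doesn't shadow the built-in:

```python
def my_enumerate(iterable, start=0):
    """Yield (index, item) pairs, like the built-in enumerate()."""
    n = start
    for item in iterable:
        yield n, item
        n += 1

data = [5, 10, 15]
print(list(my_enumerate(data)))
```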
Since x goes through every element in data, at the end of:
```python
for x in (data):
    print(x)
```
x will be the last element, which is why you get 15 as the second element in each tuple:
[(0, 15), (1, 15), (2, 15)]
You only need one loop:
```python
for i in range(len(data)):
    enumerate_rule = (i, data[i])  # data[i] gets the ith element of data
    enumerated_list.append(enumerate_rule)
```
enumerate_rule = (i, x) is the problem. You are using the same value (x, the last item in the list) each time. Change it to enumerate_rule = (i, data[i]).
I would use a normal for loop, but with enumerate(), so you can use an index i in the loop:
```python
enumerated_list = []
data = [5, 10, 15]
for i, f in enumerate(data):
    enumerated_list.append((i, f))
print(enumerated_list)
```
Result:
```
[(0, 5), (1, 10), (2, 15)]
```

checking to see if same value in two lists [duplicate]

This question already has answers here:
How to test if a list contains another list as a contiguous subsequence?
(19 answers)
Closed 9 years ago.
I have two lists. One:
[(12,23),(12,45),(12,23),(2,5),(1,2),(2,4),(7,34)] which goes up to around 1000 elements
and another:
[(12,23),(12,45),(12,23),(2,5),(1,2),(2,66),(34,7)] which goes up to around 241 elements.
What I want is to check to see if the lists contain any of the same elements and then put them in a new list.
so the new list becomes
[(12,23),(12,45),(12,23),(2,5),(1,2)]
This doesn't include duplicates:
```
>>> A = [(12,23),(12,45),(12,23),(2,5),(1,2),(2,4),(7,34)]
>>> B = [(12,23),(12,45),(12,23),(2,5),(1,2),(2,66),(34,7)]
>>> set(B).intersection(A) # note: turning the smaller list into a set is faster
set([(12, 45), (1, 2), (12, 23), (2, 5)])
```
Or:
```
>>> A = [(12,23),(12,45),(12,23),(2,5),(1,2),(2,4),(7,34)]
>>> B = [(12,23),(12,45),(12,23),(2,5),(1,2),(2,66),(34,7)]
>>> list(filter(set(B).__contains__, A))
[(12, 23), (12, 45), (12, 23), (2, 5), (1, 2)]
```
This returns every item of A that also occurs in B (including repeats), which produces the result you give in your example; however, the set is probably what you want.
Since I don't know exactly what you are using this for, I'll suggest one more solution, which returns a list containing the items that occur in both lists, each repeated the minimum number of times it occurs in either list (unordered). This differs from the filter solution above, which returns each item as many times as it occurs in A, regardless of how many times it occurs in B. This uses Counter for the intersection of multisets.
```
>>> from collections import Counter
>>> A = [(12,23),(12,45),(12,23),(2,5),(1,2),(2,4),(7,34)]
>>> B = [(12,23),(12,45),(12,23),(2,5),(1,2),(2,66),(34,7)]
>>> list((Counter(A) & Counter(B)).elements())
[(1, 2), (12, 45), (12, 23), (12, 23), (2, 5)]
```
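If you want each shared tuple only once, but in the order it first appears in A, a set for membership tests can be combined with dict.fromkeys() for order-preserving de-duplication. A sketch, assuming Python 3.7 or later, where dicts keep insertion order:

```python
A = [(12, 23), (12, 45), (12, 23), (2, 5), (1, 2), (2, 4), (7, 34)]
B = [(12, 23), (12, 45), (12, 23), (2, 5), (1, 2), (2, 66), (34, 7)]

b_set = set(B)  # constant-time membership tests
# dict.fromkeys(A) drops repeated tuples but keeps A's order
common = [item for item in dict.fromkeys(A) if item in b_set]
print(common)
```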
