This question already has answers here:
How to test if a list contains another list as a contiguous subsequence?
(19 answers)
Closed 9 years ago.
I have two lists. One:
[(12,23),(12,45),(12,23),(2,5),(1,2),(2,4),(7,34)] which has around 1000 elements
and another:
[(12,23),(12,45),(12,23),(2,5),(1,2),(2,66),(34,7)] which has around 241 elements.
I want to check whether the lists contain any of the same elements and then put those common elements in a new list,
so the new list would be:
[(12,23),(12,45),(12,23),(2,5),(1,2)]
This doesn't include duplicates:
>>> A = [(12,23),(12,45),(12,23),(2,5),(1,2),(2,4),(7,34)]
>>> B = [(12,23),(12,45),(12,23),(2,5),(1,2),(2,66),(34,7)]
>>> set(B).intersection(A) # note: converting the smaller list to a set is faster
set([(12, 45), (1, 2), (12, 23), (2, 5)])
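The set([...]) display above is Python 2; Python 3 prints a set as {...}, and in both versions the element order is arbitrary. A minimal Python 3 sketch of the same intersection using the & operator, sorted only so the printed order is reproducible:

```python
A = [(12, 23), (12, 45), (12, 23), (2, 5), (1, 2), (2, 4), (7, 34)]
B = [(12, 23), (12, 45), (12, 23), (2, 5), (1, 2), (2, 66), (34, 7)]

# Both operands must be sets for & (set.intersection accepts any iterable, & does not).
common = set(A) & set(B)

# Sorting is only for a deterministic display order.
print(sorted(common))  # [(1, 2), (2, 5), (12, 23), (12, 45)]
```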
Or
>>> A = [(12,23),(12,45),(12,23),(2,5),(1,2),(2,4),(7,34)]
>>> B = [(12,23),(12,45),(12,23),(2,5),(1,2),(2,66),(34,7)]
>>> list(filter(set(B).__contains__, A))
[(12, 23), (12, 45), (12, 23), (2, 5), (1, 2)]
This returns every item of A that also occurs in B (keeping A's order and any repeats), which produces the result you give in your example; however, the set is probably what you want.
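Since filter() returns a lazy iterator in Python 3, a list comprehension is a common equivalent for the order-preserving version. The sketch below also shows one way to drop the repeats while keeping A's order, using dict.fromkeys and relying on dicts preserving insertion order (guaranteed since Python 3.7):

```python
A = [(12, 23), (12, 45), (12, 23), (2, 5), (1, 2), (2, 4), (7, 34)]
B = [(12, 23), (12, 45), (12, 23), (2, 5), (1, 2), (2, 66), (34, 7)]

# Build the set once so each membership test is O(1) on average.
in_b = set(B)

# Every item of A that also occurs in B, in A's order, repeats kept.
common = [item for item in A if item in in_b]
print(common)  # [(12, 23), (12, 45), (12, 23), (2, 5), (1, 2)]

# The same items without repeats, still in first-seen order.
unique_common = list(dict.fromkeys(common))
print(unique_common)  # [(12, 23), (12, 45), (2, 5), (1, 2)]
```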
Since I don't know exactly what you are using this for, I'll suggest one more solution: it returns a list containing the items that occur in both lists, each repeated the minimum number of times it occurs in either list (unordered). This differs from the filter solution above, which returns each item of A as many times as it occurs in A, as long as it occurs in B at all, regardless of how many times it occurs there. This uses Counter for the intersection of multisets.
>>> from collections import Counter
>>> A = [(12,23),(12,45),(12,23),(2,5),(1,2),(2,4),(7,34)]
>>> B = [(12,23),(12,45),(12,23),(2,5),(1,2),(2,66),(34,7)]
>>> list((Counter(A) & Counter(B)).elements())
[(1, 2), (12, 45), (12, 23), (12, 23), (2, 5)]
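To see the "minimum count" behaviour in isolation, here is a sketch with two small made-up lists in which one value repeats a different number of times in each. Counter's & keeps min(count in a, count in b) for every key and drops keys whose result is not positive:

```python
from collections import Counter

# Made-up example: 'x' appears 3 times in a and twice in b.
a = ['x', 'x', 'x', 'y']
b = ['x', 'x', 'z']

both = Counter(a) & Counter(b)
print(both)                     # Counter({'x': 2})
print(sorted(both.elements()))  # ['x', 'x']

# With the tuples from the question: (12, 23) occurs twice in each list, so it is kept twice.
A = [(12, 23), (12, 45), (12, 23), (2, 5), (1, 2), (2, 4), (7, 34)]
B = [(12, 23), (12, 45), (12, 23), (2, 5), (1, 2), (2, 66), (34, 7)]
print((Counter(A) & Counter(B))[(12, 23)])  # 2
```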
This question already has an answer here:
How to get combinations of elements from a list?
(1 answer)
Closed 4 years ago.
Let's say I have a list of four values. I want to find all combinations of two of the values. For example, I would like to get an output like:
((0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3))
As you can see, I do not want repetitions such as having both (0, 1) and (1, 0).
This needs to work with larger numbers, not just 4, and I will have to iterate through all of the combinations.
I am using Python 3 and Windows, and this would ideally be a built-in function, a simple bit of list comprehension code, or something I can import. I have tried making this with range, but I do not know how to exclude the numbers that I have already done from it.
It is very easy with itertools.combinations:
from itertools import combinations
list(combinations([0,1,2,3],2))
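combinations() emits tuples lazily, in lexicographic order according to the input order, so for larger inputs you can loop over it directly instead of building the whole list first. A short sketch (n = 100 is just an example size; math.comb needs Python 3.8+):

```python
from itertools import combinations
from math import comb

print(list(combinations([0, 1, 2, 3], 2)))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

# For larger inputs, iterate without materialising the list.
n = 100  # example size
count = 0
for i, j in combinations(range(n), 2):
    count += 1  # process the pair (i, j) here; i < j always holds
print(count, comb(n, 2))  # 4950 4950
```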
Alternatively, take just one triangle of the index matrix (pairs with i < j) if you only need distinct pairs:
a = [1,2,10,20]
[(a[i], a[j+i+1]) for i in range(len(a)) for j in range(len(a[i+1:]))]
[(1, 2), (1, 10), (1, 20), (2, 10), (2, 20), (10, 20)]
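The index arithmetic above can be written a little more directly by starting the inner range at i + 1. This sketch does that and checks that it produces the same pairs, in the same order, as itertools.combinations:

```python
from itertools import combinations

a = [1, 2, 10, 20]

# Pairs (a[i], a[j]) with i < j: one triangle of the i/j grid, diagonal excluded.
pairs = [(a[i], a[j]) for i in range(len(a)) for j in range(i + 1, len(a))]
print(pairs)  # [(1, 2), (1, 10), (1, 20), (2, 10), (2, 20), (10, 20)]

print(pairs == list(combinations(a, 2)))  # True
```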
This question already has answers here:
Remove partially duplicate tuples from list of tuples
(3 answers)
Making a sequence of tuples unique by a specific element
(1 answer)
Closed 3 years ago.
I have a list of tuples like this one here:
test = [('ent1', 24), ('ent2',12), ('ent3',4.5), ('ent1', 4), ('ent2', 3.5)]
I would like to remove those tuples from the list whose first element has already appeared. So the desired output would be:
[('ent1', 24), ('ent2',12), ('ent3',4.5)]
I have no idea how to do this. Normally, if I wanted to remove exact duplicate tuples, I would use
list(set(test))
but that does not work in this case. Does anybody have an appropriate approach for this problem?
How do you like the output of dict(test)?
{'ent1': 4, 'ent2': 3.5, 'ent3': 4.5}
Or you may want to convert this back to a list of tuples with
>>> list(dict(test).items())
[('ent1', 4), ('ent2', 3.5), ('ent3', 4.5)]
Edit: This keeps the last assigned value, but you can also keep the first assigned value by reversing your list first:
>>> list(dict(reversed(test)).items())
[('ent2', 12), ('ent1', 24), ('ent3', 4.5)]
Edit2: If you want to preserve list order, as well, this seems to be a good one-liner solution (inspired by Julien's answer):
>>> [(uk,next(v for k,v in test if k == uk)) for uk in dict(test).keys()]
[('ent1', 24), ('ent2', 12), ('ent3', 4.5)]
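The next() one-liner rescans test for every unique key, so it is quadratic in the worst case. A single pass with dict.setdefault keeps the first value seen for each key; like the one-liner, this sketch relies on dicts preserving insertion order (Python 3.7+):

```python
test = [('ent1', 24), ('ent2', 12), ('ent3', 4.5), ('ent1', 4), ('ent2', 3.5)]

first_seen = {}
for key, value in test:
    # setdefault only stores value if key is not present yet,
    # so later duplicates are ignored.
    first_seen.setdefault(key, value)

result = list(first_seen.items())
print(result)  # [('ent1', 24), ('ent2', 12), ('ent3', 4.5)]
```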
And finally, with functools.reduce you can get another one-liner:
>>> from functools import reduce
>>> reduce(lambda lu, i: lu if i[0] in dict(lu) else lu + [i], test, [])
[('ent1', 24), ('ent2', 12), ('ent3', 4.5)]
Explanation: lu is the list with only unique keys, and i is the next item from the test list. If i[0], i.e. the key of the next element, is already in lu, we keep lu unchanged; otherwise we append i.
Using a check flag (a set of the keys already seen)
Ex:
test = [('ent1', 24), ('ent2',12), ('ent3',4.5), ('ent1', 4), ('ent2', 3.5)]
check_val = set()  # Check flag: first elements seen so far
res = []
for i in test:
    if i[0] not in check_val:
        res.append(i)
        check_val.add(i[0])
print(res)
Output:
[('ent1', 24), ('ent2', 12), ('ent3', 4.5)]
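The same check-set idea can be packaged as a small generator that takes a key function, so it works for any "unique by some field" problem. unique_by_key is a hypothetical name used only for this sketch; the itertools recipes in the Python documentation describe a similar unique_everseen(iterable, key=None).

```python
from operator import itemgetter

def unique_by_key(items, key):
    """Yield items whose key(item) has not been seen before, in input order."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item

test = [('ent1', 24), ('ent2', 12), ('ent3', 4.5), ('ent1', 4), ('ent2', 3.5)]
res = list(unique_by_key(test, key=itemgetter(0)))
print(res)  # [('ent1', 24), ('ent2', 12), ('ent3', 4.5)]
```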
test = [('ent1', 24), ('ent2',12), ('ent3',4.5), ('ent1', 4), ('ent2', 3.5)]
deduplicated_test = [(s,[t[1] for t in test if t[0] == s][0]) for s in sorted(set([t[0] for t in test]))]
Short and painful to read, sorry.
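Note that set() on its own gives the keys in arbitrary order; sorted(set()) puts them in alphabetical order, which happens to match the original order here but will not in general.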
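Sorting the keys gives alphabetical order, which only coincides with first-appearance order when the keys happen to appear alphabetically, as they do in test. With made-up data where they don't, the two results differ:

```python
data = [('b', 1), ('a', 2), ('b', 3)]  # made-up example

# Keys in sorted order, first value per key (the comprehension from this answer).
by_sorted_key = [(s, [t[1] for t in data if t[0] == s][0])
                 for s in sorted(set(t[0] for t in data))]
print(by_sorted_key)  # [('a', 2), ('b', 1)]

# First-appearance order instead: walk the list and keep unseen keys.
seen = set()
by_appearance = []
for k, v in data:
    if k not in seen:
        seen.add(k)
        by_appearance.append((k, v))
print(by_appearance)  # [('b', 1), ('a', 2)]
```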
I have created a program using a Python dictionary. In this simple program I cannot understand the memory structure of the dictionary: when I retrieve the data from the dictionary, it does not come back in sequence.
Digit = {1: "One", 2: "Two", 3: "Three", 4: "Four", 5: "Five", 6: "Six", 7: "Seven", 8: "Eight", 9: "Nine", 0: "Zero"}
print(Digit)
It gives me output in an order like Two, Three, Five, Four, etc. If I want it in sequence, what do I have to do?
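In Python versions before 3.7, dictionaries are arbitrarily ordered: the order is not guaranteed and you should not rely on it. (Since Python 3.7, dicts preserve insertion order, but that is still not sorted key order.) If you need an ordered collection, use either OrderedDict or a list.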
If you want to access the dictionary in key order, first get a sorted list of the keys and then step through it:
keys = sorted(Digit.keys())
for i in keys:
    print(Digit[i])
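Two other ways to get key order, sketched with the Digit dictionary from the question (values quoted as strings): sorted() accepts the dict directly and returns a sorted list of its keys, and since a dict keeps insertion order in Python 3.7+, rebuilding it from sorted items gives a dict that iterates in key order.

```python
Digit = {1: "One", 2: "Two", 3: "Three", 4: "Four", 5: "Five",
         6: "Six", 7: "Seven", 8: "Eight", 9: "Nine", 0: "Zero"}

# Iterating over a dict yields its keys, so sorted(Digit) is the sorted key list.
for key in sorted(Digit):
    print(key, Digit[key])

# A new dict built from sorted (key, value) pairs iterates in key order.
by_key = dict(sorted(Digit.items()))
print(list(by_key.values()))
# ['Zero', 'One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight', 'Nine']
```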
If you absolutely want to store ordered data, you could use OrderedDict as Burhan Khalid suggested in his answer:
>>> from collections import OrderedDict
>>> Digit = [(1, "One"), (2, "Two"), (3, "Three"), (4, "Four"), (5, "Five"), (6, "Six"), (7, "Seven"), (8, "Eight"), (9, "Nine"), (0, "Zero")]
>>> Digit = OrderedDict(Digit)
>>> Digit
OrderedDict([(1, 'One'), (2, 'Two'), (3, 'Three'), (4, 'Four'), (5, 'Five'), (6, 'Six'), (7, 'Seven'), (8, 'Eight'), (9, 'Nine'), (0, 'Zero')])
>>> for k, v in Digit.items():
...     print(k, v)
...
1 One
2 Two
3 Three
4 Four
5 Five
6 Six
7 Seven
8 Eight
9 Nine
0 Zero
In the list of tuples called mixed_sets, three separate sets exist. Each set contains tuples with values that intersect. A tuple from one set will not intersect with a tuple from another set.
I've come up with the following code to sort out the sets. I found that Python's set functionality was limited when tuples are involved. It would be nice if the set intersection operation could look into each tuple index and not stop at the enclosing tuple object.
Here's the code:
mixed_sets = [(1,15),(2,22),(2,23),(3,13),(3,15),
              (3,17),(4,22),(4,23),(5,15),(5,17),
              (6,21),(6,22),(6,23),(7,15),(8,12),
              (8,15),(9,19),(9,20),(10,19),(10,20),
              (11,14),(11,16),(11,18),(11,19)]
def sort_sets(a_set):
    idx = 0
    idx2 = 0
    while len(mixed_sets) > idx and len(a_set) > idx2:
        if a_set[idx2][0] == mixed_sets[idx][0] or a_set[idx2][1] == mixed_sets[idx][1]:
            a_set.append(mixed_sets[idx])
            mixed_sets.pop(idx)
            idx = 0
        else:
            idx += 1
            if idx == len(mixed_sets):
                idx2 += 1
                idx = 0
    a_set.pop(0)  # remove first item; duplicate
    print(a_set, 'a returned set')
    return a_set
sorted_sets = []
for new_set in mixed_sets:
    sorted_sets.append(sort_sets([new_set]))
print(mixed_sets)  # Now empty.
OUTPUT:
[(1, 15), (3, 15), (5, 15), (7, 15), (8, 15), (3, 13), (3, 17), (5, 17), (8, 12)] a returned set
[(2, 22), (2, 23), (4, 23), (6, 23), (4, 22), (6, 22), (6, 21)] a returned set
[(9, 19), (10, 19), (10, 20), (11, 19), (9, 20), (11, 14), (11, 16), (11, 18)] a returned set
Now this doesn't look like the most pythonic way of doing this task. This code is intended for large lists of tuples (approx 2E6) and I felt the program would run quicker if it didn't have to check tuples already sorted. Therefore I used pop() to shrink the mixed_sets list. I found using pop() made list comprehensions, for loops or any iterators problematic, so I've used the while loop instead.
It does work, but is there a more pythonic way of carrying out this task that doesn't use while loops and the idx and idx2 counters?
Probably you can increase the speed by first computing a set of all the first elements in the tuples in the mixed_sets, and a set of all the second elements. Then in your iteration you can check if the first or the second element is in one of these sets, and find the correct complete tuple using binary search.
Actually you'd need multi-sets, which you can simulate using dictionaries.
Something like this (currently untested):
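from collections import defaultdict
# define the mixed_sets list (as in the question) before this point.
mixed_sets.sort()
# Multisets simulated with dicts: how many remaining tuples have a given
# first element / a given second element.
first_els = defaultdict(int)
second_els = defaultdict(int)
for first, second in mixed_sets:
    first_els[first] += 1
    second_els[second] += 1
def decrement(counts, key):
    # Remove one occurrence of key from a dict-based multiset.
    counts[key] -= 1
    if counts[key] <= 0:
        del counts[key]
def sort_sets(a_set):
    index = 0
    while mixed_sets and len(a_set) > index:
        first, second = a_set[index]
        if first in first_els:
            # find_tuple is not defined here; see the note below.
            element = find_tuple(mixed_sets, first, index=0)
        elif second in second_els:
            element = find_tuple(mixed_sets, second, index=1)
        else:
            # No remaining tuple shares an element with a_set[index]; move on.
            index += 1
            continue
        a_set.append(element)
        mixed_sets.remove(element)
        # Keep both multisets in sync with what is left in mixed_sets.
        decrement(first_els, element[0])
        decrement(second_els, element[1])
    a_set.pop(0)  # remove first item; duplicate
    print(a_set, 'a returned set')
    return a_set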
Here find_tuple(mixed_sets, value, index) is a helper (not written out) that returns a tuple from mixed_sets that has value at the given index (0 or 1).
You will probably also have to duplicate mixed_sets and sort one copy by the first element and the other by the second element, so that both lookups can use binary search.
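One possible building block for find_tuple, sketched with the standard bisect module under the assumption that you keep the two sorted copies suggested above: one of the original tuples and one with each tuple swapped to (second, first). find_by_first is a hypothetical helper that only searches position 0; position 1 is handled by searching the swapped copy and swapping the hit back.

```python
from bisect import bisect_left

def find_by_first(sorted_tuples, value):
    """Hypothetical helper: return the first tuple in sorted_tuples whose
    element 0 equals value, or None. sorted_tuples must be sorted."""
    # (value,) compares less than any tuple that starts with value
    # (a shorter prefix sorts first), so bisect_left lands on the
    # first such tuple if there is one.
    pos = bisect_left(sorted_tuples, (value,))
    if pos < len(sorted_tuples) and sorted_tuples[pos][0] == value:
        return sorted_tuples[pos]
    return None

mixed_sets = [(1, 15), (2, 22), (3, 13), (3, 15), (5, 15), (8, 12)]  # shortened sample

by_first = sorted(mixed_sets)
by_second = sorted((b, a) for a, b in mixed_sets)  # copy keyed on the second element

print(find_by_first(by_first, 3))   # (3, 13)
hit = find_by_first(by_second, 15)  # search on the second element
print(hit[::-1] if hit else None)   # (1, 15)
print(find_by_first(by_first, 4))   # None
```

This only speeds up the lookup: removing a matched tuple from a Python list (as the answer's code does with remove) is still O(n) per removal.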
Alternatively, you could use dictionaries again, storing a sorted list of the matching tuples alongside each count in first_els and second_els.
I don't know how the performance will scale, but I think that with data on the order of 2 million tuples you shouldn't have too much to worry about.
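For comparison, a different technique from the one in this answer: treat each distinct first value and each distinct second value as a node, each tuple as an edge joining its two nodes, and group tuples by connected component with a union-find (disjoint-set) structure. This is a sketch under the question's definition (tuples belong together if they are linked through shared first elements or shared second elements); group_tuples is a hypothetical name. It makes one pass over the data, does not mutate mixed_sets, and needs no idx counters.

```python
from collections import defaultdict

def group_tuples(pairs):
    """Split (a, b) tuples into groups: two tuples end up in the same group if
    they are linked by a chain of shared first elements or shared second elements."""
    parent = {}

    def find(node):
        # Follow parent links to the root, halving the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    for a, b in pairs:
        # Tag each value with its position: the question only compares
        # first with first and second with second.
        first_node, second_node = ("first", a), ("second", b)
        parent.setdefault(first_node, first_node)
        parent.setdefault(second_node, second_node)
        union(first_node, second_node)  # the tuple links its two values

    groups = defaultdict(list)
    for a, b in pairs:
        groups[find(("first", a))].append((a, b))
    return list(groups.values())

mixed_sets = [(1, 15), (2, 22), (2, 23), (3, 13), (3, 15),
              (3, 17), (4, 22), (4, 23), (5, 15), (5, 17),
              (6, 21), (6, 22), (6, 23), (7, 15), (8, 12),
              (8, 15), (9, 19), (9, 20), (10, 19), (10, 20),
              (11, 14), (11, 16), (11, 18), (11, 19)]

groups = group_tuples(mixed_sets)
for group in groups:
    print(group)
```

Groups come out in order of first appearance, with tuples in input order within each group. Path halving keeps lookups cheap, though I have not benchmarked this on 2E6 tuples.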