Python sets: "in" set command not working - python

I have a list of tuples that I want to get all the combinations for as a set and filter out certain sets based on criteria.
For example
pairs = [(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]
combs = []
for i in range(len(pairs)):
    intermediate = (set(list(combinations(pairs, i))))
    if ((2,3) and (2,2) and (3,2)) in intermediate:
        combs.append(intermediate)
But it doesn't detect if any of the tuples are in the set, so I tried a more basic test version.
pairs = [(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]
test = set(combinations(pairs,1))
if (1,1) in test:
    print("true")
This also doesn't work even though I can clearly see in my variable explorer that the set contains (1,1).
I've also tried adding an integer 1 as one of the elements in pairs and checking if 1 is in the set but that still doesn't work. I've run out of ideas and some help would be appreciated.

There are two issues here...
First, it would seem like you are not testing membership at the correct depth of a nested data structure. When you call set(combinations(pairs, i)), you get a structure that is 3 levels deep: A set of tuples of tuples of ints (ints 3 containers deep).
>>> pairs = [(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]
>>> test = set(combinations(pairs,1))
>>> test
{((3, 2),), ((2, 3),), ((1, 1),), ((2, 2),), ((3, 1),), ((1, 3),), ((1, 2),), ((3, 3),), ((2, 1),)}
It is perfectly valid to test if a specific tuple of tuples of ints is contained within the set, but those tuples aren't automatically flattened for you to be able to test against a simple tuple of ints.
>>> ((1,1),) in test
True
>>> (1,1) in test
False
If you want to check if any tuples within the set contain a specific sub-tuple, you'll have to iterate over the set and check each top level tuple individually (hint: things like map can make this iteration a little shorter and sweeter)
for top_tuple in test:
    if (1,1) in top_tuple:
        print("found it!")
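As a rough sketch of the shorter form hinted at above, any() with a generator expression can replace the explicit loop, and a list comprehension can collect the matching top-level tuples. The names found and matches are only illustrative and are not part of the original code.

```python
from itertools import combinations

pairs = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
test = set(combinations(pairs, 1))

# True if at least one top-level tuple contains the sub-tuple (1, 1)
found = any((1, 1) in top_tuple for top_tuple in test)

# Every top-level tuple that contains (1, 1); hypothetical helper name
matches = [top_tuple for top_tuple in test if (1, 1) in top_tuple]

print(found)    # True
print(matches)  # [((1, 1),)]
```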
The second issue is a fairly common trap for new Python programmers: chaining logical operators. Think of and, or, in, and so on as operators in the same way as the mathematical operators + - * /: each one evaluates to a value. The other important point is how the logical operators treat values that aren't True or False. Python treats empty containers (empty lists, strings, tuples, sets, etc.) as false, along with None and anything equal to 0. Almost everything else is treated as true. When Python evaluates an and, if the first value (on the left) is truthy, the result is whatever is on the right; if the first value is falsy, the result is that first value. Chained ands are evaluated left to right, so ((2,3) and (2,2) and (3,2)) in intermediate only tests whether (3, 2) is in intermediate.
>>> (1,1) and "cookies"
'cookies'
>>> False and "cookies"
False
>>> (2,3) and (2,2) and (3,2)
(3, 2)
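Putting both points together, one possible way to express the filter from the question is to test each required tuple separately against each combination. This is only a sketch under assumptions: the name required is hypothetical, sizes start at len(required) because smaller combinations cannot contain all three tuples, and the full-length combination is included as well, which the original range(len(pairs)) did not do.

```python
from itertools import combinations

pairs = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
required = {(2, 3), (2, 2), (3, 2)}  # hypothetical name for the tuples that must all be present

combs = []
for size in range(len(required), len(pairs) + 1):
    for combo in combinations(pairs, size):
        # combo is a tuple of pairs; issubset checks every required pair individually
        if required.issubset(combo):
            combs.append(combo)

print(len(combs))  # 64
```

An equivalent condition is all(t in combo for t in required), which spells out the three separate membership tests that the chained and was meant to perform.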

Here is your test set, which does not contain (1,1) as an isolated tuple. It is a tuple inside a tuple.
{((3, 2),), ((2, 3),), ((1, 1),), ((2, 2),), ((3, 1),), ((1, 3),), ((1, 2),), ((3, 3),), ((2, 1),)}
To detect it, you can:
for combo in test:
    if (1,1) in combo:
        print("true")
# output: true

Related

How to circumvent slow search with Index() method for a large list

I have a large list myList containing tuples.
I need to remove the duplicates in this list (that is the tuples with same elements in the same order). I also need to keep track of this list's indices in a separate list, indexList. If I remove a duplicate, I need to change its index in indexList to first identical value's index.
To demonstrate what I mean, if myList looks like this:
myList = [(6, 2), (4, 3), (6, 2), (8, 1), (5, 4), (4, 3), (2, 1)]
Then I need to construct indexList like this:
indexList = (0, 1, 0, 2, 3, 1, 4)
Here the third value is identical to first, so it (third value) gets index 0. Also the subsequent value gets an updated index of 2 and so on.
Here is how I achieved this:
unique = set()
indexList = []
i = 0
for v in myList[:]:
    if v not in unique:
        unique.add(v)
        indexList.append(i)
        i = i+1
    else:
        myList.pop(i)
        indexList.append(myList.index(v))
This does what I need. However, the index() method makes the script very slow when myList contains hundreds of thousands of elements. As I understand it, this is because it's an O(n) operation.
So what changes could I make to achieve the same result but make it faster?
If you make a dict to store the first index of each value, you can do the lookup in O(1) instead of O(n). So in this case, before the for loop, do indexes = {}, and then in the if block, do indexes[v] = i and in the else block use indexes[v] instead of myList.index(v).
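A minimal sketch of that idea follows. Note one deviation from the question's code: instead of popping from myList in place (pop(i) is itself O(n)), this version builds a new list of unique items. The function name dedupe_with_indices and the variable names are made up for illustration.

```python
def dedupe_with_indices(items):
    """Return (unique_items, index_list) using a dict for O(1) index lookups."""
    first_index = {}    # value -> index of its first occurrence in unique_items
    unique_items = []
    index_list = []
    for item in items:
        if item not in first_index:
            first_index[item] = len(unique_items)
            unique_items.append(item)
        # Works for both new items and duplicates
        index_list.append(first_index[item])
    return unique_items, index_list


my_list = [(6, 2), (4, 3), (6, 2), (8, 1), (5, 4), (4, 3), (2, 1)]
unique_items, index_list = dedupe_with_indices(my_list)
print(unique_items)  # [(6, 2), (4, 3), (8, 1), (5, 4), (2, 1)]
print(index_list)    # [0, 1, 0, 2, 3, 1, 4]
```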

Why is python's sorted called stable despite not preserving the original order?

Summary
Sorting in Python is guaranteed to be stable since Python 2.2, as documented here and here.
Wikipedia explains what the property of being stable means for the behavior of the algorithm:
A sorting algorithm is stable if whenever there are two records R and S with the same key, and R appears before S in the original list, then R will always appear before S in the sorted list.
However, when sorting objects, such as tuples, sorting appears to be unstable.
For example,
>>> a = [(1, 3), (3, 2), (2, 4), (1, 2)]
>>> sorted(a)
[(1, 2), (1, 3), (2, 4), (3, 2)]
However, to be considered stable, I thought the new sequence should've been
[(1, 3), (1, 2), (2, 4), (3, 2)]
because, in the original sequence, the tuple (1, 3) appears before tuple (1, 2). The sorted function is relying on the 2-ary "keys" when the 1-ary "keys" are equal. (To clarify, the 1-ary key of some tuple t would be t[0] and the 2-ary t[1].)
To produce the expected result, we have to do the following:
>>> sorted(a, key=lambda t: t[0])
[(1, 3), (1, 2), (2, 4), (3, 2)]
I'm guessing there's a false assumption on my part, either about sorted or maybe on how tuple and/or list types are treated during comparison.
Questions
Why is the sorted function said to be "stable" even though it alters the original sequence in this manner?
Wouldn't setting the default behavior to that of the lambda version be more consistent with what "stable" means? Why is it not set this way?
Is this behavior simply a side-effect of how tuples and/or lists are inherently compared (i.e. the false assumption)?
Thanks.
Please note that this is not about whether the default behavior is or isn't useful, common, or something else. It's about whether the default behavior is consistent with the definition of what it means to be stable (which, IMHO, does not appear to be the case) and the guarantee of stability mentioned in the docs.
Think about it - (1, 2) comes before (1, 3), does it not? Sorting a list by default does not automatically mean "just sort it based off the first element". Otherwise you could say that apple comes before aardvark in the alphabet. In other words, this has nothing to do with stability.
The docs also have a nice explanation about how data structures such as lists and tuples are sorted lexicographically:
In particular, tuples and lists are compared lexicographically by comparing corresponding elements. This means that to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length.
Stable sort keeps the order of those elements which are considered equal from the sorting point of view. Because tuples are compared element by element lexicographically, (1, 2) precedes (1, 3), so it should go first:
>>> (1, 2) < (1, 3)
True
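To see stability in isolation, it may help to sort records whose keys really are equal. The sketch below uses made-up data: with a key that looks only at the number, the three records with key 2 keep their original relative order, while the default sort compares whole tuples and so has no ties to preserve.

```python
records = [("b", 2), ("a", 2), ("c", 1), ("d", 2)]  # illustrative data only

# Only the number is compared, so ("b", 2), ("a", 2), ("d", 2) are "equal"
# and a stable sort must keep them in their original order.
by_number = sorted(records, key=lambda rec: rec[1])
print(by_number)  # [('c', 1), ('b', 2), ('a', 2), ('d', 2)]

# Whole tuples are compared lexicographically, so no two records are equal.
by_whole_tuple = sorted(records)
print(by_whole_tuple)  # [('a', 2), ('b', 2), ('c', 1), ('d', 2)]
```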
A tuple's key is made out of all of its items.
>>> (1,2) < (1,3)
True

Lambda function behavior

I'm trying to understand the behavior of the lambda below. What value is actually passing on to argument pair? This will help me understand the return part pair[1].
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
pairs.sort(key=lambda pair: pair[1])
print (pairs)
As I understand, sort will sort the list pairs, and it will use a function for the comparison if one is passed as a parameter. So how am I getting the output below:
OUTPUT:
[(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]
If you want to sort by the numeric value, change your lambda to
pairs.sort(key=lambda pair: pair[0])
Python is zero-indexed. The first element of each tuple has index 0. pair[1] would refer to the words in the tuple, not the numbers. So if you want to sort by the text, alphabetically, what you have works.
If you want to see what's being passed through the lambda --- which was your question:
from __future__ import print_function #Needed if you're on Python 2
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
# print() returns None, so "or pair[1]" still hands the real key back to sort
pairs.sort(key=lambda pair: print(pair[1]) or pair[1])
Which prints
one
two
three
four
Verify this by checking the output if you print(pair[0]) or print(pair).
The lambda function receives one parameter, which in your case is a tuple of size 2, and returns its second element (the number written as a word in your case). The sort method sorts your pairs list according to the key you pass to it, which is the lambda function in your code. In Python, strings are compared lexicographically, so your code sorts the pairs so that their 2nd elements are in lexicographic order.
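One way to picture this is to compute the keys by hand before sorting. In this sketch, second_item is a hypothetical named stand-in for the lambda in the question; sort calls it once per element and then orders the elements by the returned values.

```python
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]


def second_item(pair):
    """Same job as `lambda pair: pair[1]`: receives one tuple, returns its word."""
    return pair[1]


# What the key function returns for each element, in list order
keys = [second_item(pair) for pair in pairs]
print(keys)  # ['one', 'two', 'three', 'four']

# Elements are ordered by those keys: 'four' < 'one' < 'three' < 'two'
sorted_pairs = sorted(pairs, key=second_item)
print(sorted_pairs)  # [(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]
```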
Try to avoid lambda expressions where possible. At one point, they were going to be eliminated from the language altogether, and various helper functions and classes were introduced to fill the void that would be left behind. (Ultimately, lambda expressions survived, but the helpers remained.)
One of those was the itemgetter class, which can be used to define functions suitable for use with sort.
from operator import itemgetter
pairs.sort(key=itemgetter(0))
(As @metropolis points out, you want to use 0, not 1, to sort by the initial integer component of each pair.)

Why does map work like izip_longest with fill=None?

When map has different-length inputs, a fill value of None is used for the missing inputs:
>>> x = [[1,2,3,4],[5,6]]
>>> map(lambda *x:x, *x)
[(1, 5), (2, 6), (3, None), (4, None)]
This is the same behavior as:
>>> import itertools
>>> list(itertools.izip_longest(*x))
[(1, 5), (2, 6), (3, None), (4, None)]
What's the reason map provides this behavior, and not the following?
>>> map(lambda *x:x, *x)
[(1, 5), (2, 6), (3,), (4,)]
…and is there an easy way to get the latter behavior either with some flavor of zip or map?
I think this is a design decision that the core devs opted for at the time they implemented map. There's no universally defined behavior for map when it is used with multiple iterables, quoting from Map (higher-order function):
Map with 2 or more lists encounters the issue of handling when the
lists are of different lengths. Various languages differ on this; some
raise an exception, some stop after the length of the shortest list
and ignore extra items on the other lists; some continue on to the
length of the longest list, and for the lists that have already ended,
pass some placeholder value to the function indicating no value.
So, Python core devs opted for None as placeholder for shorter iterables at the time map was introduced to Python in 1993.
itertools.imap, on the other hand, stops at the shortest iterable because its design is heavily inspired by languages like Standard ML, Haskell and APL. In Standard ML and Haskell, map ends with the shortest iterable (I am not sure about APL, though).
Python 3 also removed the map(None, ...) construct (or rather, itertools.imap took over: Python 3's map is almost exactly itertools.imap, see "Move map() from itertools to builtins"). The construct was present in Python 2 only because there was no zip() function at the time map was added to Python.
From Issue2186: map and filter shouldn't support None as first argument (in Py3k only):
I concur with Guido that we never would have created map(None, ...) if
zip() had existed. The primary use case is obsolete.
To get the result you want, I would suggest using itertools.izip_longest with a sentinel value (object()) rather than the default None; None will break things if the iterables themselves contain None:
from itertools import izip_longest
def solve(seq):
    sentinel = object()
    return [tuple(x for x in item if x is not sentinel) for item in
            izip_longest(*seq, fillvalue=sentinel)]
print solve([[1,2,3,4],[5,6]])
# [(1, 5), (2, 6), (3,), (4,)]
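The code above is Python 2. On Python 3, izip_longest was renamed to itertools.zip_longest and print is a function, so a roughly equivalent sketch would look like this. The second call is an added illustration of why the sentinel matters when the data itself contains None.

```python
from itertools import zip_longest


def solve(seq):
    # A fresh object() can never be equal to, or identical with, real data
    sentinel = object()
    return [
        tuple(x for x in item if x is not sentinel)
        for item in zip_longest(*seq, fillvalue=sentinel)
    ]


result = solve([[1, 2, 3, 4], [5, 6]])
print(result)  # [(1, 5), (2, 6), (3,), (4,)]

# A None that is genuine data survives, because only the sentinel is filtered out
with_none = solve([[1, None, 3], [5]])
print(with_none)  # [(1, 5), (None,), (3,)]
```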
Given that the first list is always longer and that there are only two lists, you would do something like this:
x = [1,2,3,4,5]
y = ['a','b']
zip(x,y) + [(i,) for i in x[len(y):]]
[(1, 'a'), (2, 'b'), (3,), (4,), (5,)]

sorting a list of tuples in Python

While working on a problem from Google's Python class, I put together the following solution using 2-3 examples from Stack Overflow:
def sort_last(tuples):
    return [b for a,b in sorted((tup[1], tup) for tup in tuples)]
print sort_last([(1, 3), (3, 2), (2, 1)])
I learned list comprehensions yesterday, so I know a little about them, but I am confused about how this solution works overall. Please help me understand it (the 2nd line in the function).
That pattern is called decorate-sort-undecorate.
You turn each (1, 3) into (3, (1, 3)), wrapping each tuple in a new tuple, with the item you want to sort by first.
You sort, with the outer tuple ensuring that the second item in the original tuple is sorted on first.
You go back from (3, (1, 3)) to (1, 3) while maintaining the order of the list.
In Python, explicitly decorating is almost always unnecessary. Instead, use the key argument of sorted:
sorted(list_of_tuples, key=lambda tup: tup[1]) # or key=operator.itemgetter(1)
Or, if you want to sort on the reversed version of the tuple, no matter its length:
sorted(list_of_tuples, key=lambda tup: tup[::-1])
# or key=operator.itemgetter(slice(None, None, -1))
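For comparison, here is a sketch of both approaches side by side. The function names sort_last_dsu and sort_last_key are invented for this example. The two agree on the question's input, but they can differ on ties: the decorated version falls back to comparing the original tuples, while the key version is stable and keeps tied items in their input order.

```python
from operator import itemgetter


def sort_last_dsu(tuples):
    """Decorate-sort-undecorate, spelled out step by step."""
    decorated = [(tup[1], tup) for tup in tuples]  # decorate
    decorated.sort()                               # sort
    return [tup for _, tup in decorated]           # undecorate


def sort_last_key(tuples):
    """Same goal using the key argument."""
    return sorted(tuples, key=itemgetter(1))


data = [(1, 3), (3, 2), (2, 1)]
print(sort_last_dsu(data))  # [(2, 1), (3, 2), (1, 3)]
print(sort_last_key(data))  # [(2, 1), (3, 2), (1, 3)]

tied = [(2, 1), (1, 1)]     # both tuples have the same last element
print(sort_last_dsu(tied))  # [(1, 1), (2, 1)]
print(sort_last_key(tied))  # [(2, 1), (1, 1)]
```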
Let's break it down:
In : [(tup[1],tup) for tup in tuples]
Out: [(3, (1, 3)), (2, (3, 2)), (1, (2, 1))]
So we just created new tuples whose first value is the last value of the inner tuple; this way the list is sorted by the 2nd value of each tuple in 'tuples'.
Now we sort the returned list:
In: sorted([(3, (1, 3)), (2, (3, 2)), (1, (2, 1))])
Out: [(1, (2, 1)), (2, (3, 2)), (3, (1, 3))]
So we now have our list sorted by the 2nd value of each tuple. All that remains is to extract the original tuple, and this is done by taking only b from the for loop.
The list comprehension iterates over the given list (the result of sorted(...) in this case) and returns the extracted values in order.
Your example works by creating a new list with the element at index 1 followed by the original tuple for each tuple in the list, e.g. (3,(1,3)) for the first element. The sorted function compares each element starting from index 0, so the list is sorted by the second item of the original tuples. The function then goes through each item in the new list and returns the original tuples.
Another way of doing this is by using the key parameter in the sorted function which sorts based on the value of the key. In this case you want the key to be the item in each tuple at index 1.
>>> from operator import itemgetter
>>> sorted([(1, 3), (3, 2), (2, 1)],key=itemgetter(1))
Please refer to the accepted answer. Here is an example for better visualization.
key is a function that will be called to transform the collection's items for comparison, similar in purpose to the compareTo method in Java.
The parameter passed to key must be something that is callable. Here, the use of lambda creates an anonymous function (which is a callable).
The syntax of lambda is the word lambda followed by a parameter list, a colon, and then a single expression.
In the example below, we are sorting a list of tuples that hold the time of a certain event and the actor name.
We are sorting this list by time of event occurrence - which is the 0th element of a tuple.
Shout out for the Ready Player One fans! =)
>>> gunters = [('2044-04-05', 'parzival'), ('2044-04-07', 'aech'), ('2044-04-06', 'art3mis')]
>>> gunters.sort(key=lambda tup: tup[0])
>>> print gunters
[('2044-04-05', 'parzival'), ('2044-04-06', 'art3mis'), ('2044-04-07', 'aech')]
Note - s.sort([cmp[, key[, reverse]]]) sorts the items of s in place
