Is there a situation where the use of a list leads to an error, and you must use a tuple instead?
I know something about the properties of both tuples and lists, but not enough to answer this question. If the question were the other way around, the answer would be that lists can be modified but tuples can't.
You can use tuples as dictionary keys, because they are immutable, but you can't use lists. For example:
d = {(1, 2): 'a', (3, 8, 1): 'b'} # Valid.
d = {[1, 2]: 'a', [3, 8, 1]: 'b'} # Error.
Because of their immutable nature, tuples (unlike lists) are hashable. This is what allows tuples to be keys in dictionaries and also members of sets. Strictly speaking, it is their hashability, not their immutability, that counts.
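A quick illustration that hashability is the deciding factor: a tuple containing a mutable value is still immutable as a tuple, yet it is not hashable and so can't serve as a key:
>>> t = (1, 2, [3, 4])  # the tuple itself can't be changed, but its list element can
>>> hash(t)
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'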
So in addition to the dictionary key answer already given, a couple of other things that will work for tuples but not lists are:
>>> hash((1, 2))
3713081631934410656
>>> set([(1, 2), (2, 3, 4), (1, 2)])
set([(1, 2), (2, 3, 4)])
In %-style string formatting, tuples are mandatory:
"You have %s new %s" % ('5', 'mails') # must be a tuple, not a list!
Using a list in that example produces the error "not enough arguments for format string", because the list is treated as a single argument. Weird but true.
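A minimal reproduction of that failure, for the record:
>>> "You have %s new %s" % ['5', 'mails']
Traceback (most recent call last):
  ...
TypeError: not enough arguments for format string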
I have a list of dictionaries like the following:
a = [{1000976: 975},
{1000977: 976},
{1000978: 977},
{1000979: 978},
{1000980: 979},
{1000981: 980},
{1000982: 981},
{1000983: 982},
{1000984: 983},
{1000985: 984}]
I could be thinking about this wrong, but I'm comparing this list of dicts to another list of dicts and am attempting to remove the dictionaries in one list that appear in the other. To do this with set operations, I want to transform both lists into sets and perform set subtraction. However, I'm getting the following error when attempting the conversion.
set_a = set(a)
TypeError: unhashable type: 'dict'
Am I thinking about this incorrectly?
Try this:
>>> a = [{1000976: 975},
... {1000977: 976},
... {1000978: 977},
... {1000979: 978},
... {1000980: 979},
... {1000981: 980},
... {1000982: 981},
... {1000983: 982},
... {1000984: 983},
... {1000985: 984}]
>>> a.extend(a) # just to add some duplicates
>>> len(a)
20
>>> dict_set = set(frozenset(d.items()) for d in a)
>>> b = [dict(s) for s in dict_set]
>>> b
[{1000982: 981}, {1000983: 982}, {1000981: 980}, {1000985: 984}, {1000978: 977}, {1000980: 979}, {1000977: 976}, {1000976: 975}, {1000984: 983}, {1000979: 978}]
>>> len(b)
10
If you want to do set subtraction between two lists of dicts, just apply the same conversion to sets as above to both lists, do the subtraction, then convert back.
Note: at the very least, all the values in your dicts must also be hashable (as must the keys, but that goes without saying). If not, you need a similar transformation of the values into some hashable, immutable type.
Note: this also does not preserve the original order; if that's important to you, you need to adapt this to an order-preserving algorithm like this one. The key, though, is converting the dicts to some immutable type.
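For instance, a minimal order-preserving sketch using the same frozenset trick, where a seen set remembers which dicts have already appeared:
seen = set()
b = []
for d in a:
    key = frozenset(d.items())  # hashable stand-in for the dict
    if key not in seen:
        seen.add(key)
        b.append(d)  # keep only the first occurrence, in original order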
You could turn the dictionaries into tuples, since each dict here holds only a single key-value pair, like so:
a_set = set(t for d in a for t in d.items())
And then use set operations to compare two sets from that point. To convert back into a list of dictionaries, you can use:
a_list = [{key: value} for key, value in a_set]
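Putting it together, a sketch of the whole round trip; b here is a hypothetical second list of single-entry dicts:
b = [{1000976: 975}, {1000985: 984}]  # hypothetical second list
a_set = set(t for d in a for t in d.items())
b_set = set(t for d in b for t in d.items())
result = [{key: value} for key, value in a_set - b_set]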
For filtering, there's a one-liner (b is the filter list of dicts). This is by far the fastest approach, unless you are using the same filter against multiple sets.
c = [d for d in a if d not in b]
Or, using the built-in filter, another (slower) one-liner:
c = list(filter(lambda d: d not in b, a))
If you are really asking how to convert a list of dicts into a set-operable variable, then you can do this with yet another one-liner:
a_set = set(map(lambda i: frozenset(i.items()), a))
Again, if we have b as a list of dicts as our filter:
b_set = set(map(lambda i: frozenset(i.items()), b))
... and we can now use set operations on them:
c_set = a_set - b_set
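To get a list of dicts back, reverse the conversion; each frozenset of items rebuilds a dict:
c = [dict(fs) for fs in c_set]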
The frozenset method of converting a dict to a set is about 25% faster than using a list comprehension, but it's much slower to convert everything to sets and then perform the set operations than it is to just use a simple list-comprehension filter like the one at the top of my answer. Obviously, if one is going to apply many filters, it may be cost-effective to convert the objects to immutables; but in that case, it may be better to change the underlying data structure of the objects and convert the entire structure to a class.
If you don't want to use frozenset and your dicts are arbitrary, rather than single-entry dicts, you can tuple-ize the dicts:
a_set = set(map(lambda d: tuple(d.items()), a))
You suggest in the question that you don't want ANY nested loop, and so far all the answers (including mine) have a 'for' (or a lambda).
When we want to use a set method for filtering two dictionaries, it's not too shabby to do exactly that as follows:
c = a.items() - b.items()
Of course, if we want c to be a dict, we need to wrap it again:
c = dict(a.items() - b.items())
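For example (this relies on Python 3's set-like dict views, and the values must be hashable):
a = {'x': 1, 'y': 2, 'z': 3}
b = {'y': 2, 'z': 30}  # 'z' differs in value, so it survives the subtraction
c = dict(a.items() - b.items())  # {'x': 1, 'z': 3}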
Likewise, for lists of immutable types, we can do the same (by coercing our lists into sets):
x = [3, 4, 5, 6, 7]
y = [3, 2, 1, 7]
z = set(x) - set(y)
or, since tuples are immutable:
x = [(3, 1), (4, 1), (5, 1), (6, 2), (7, 5)]
y = [(4, 1), (4, 2), (5, 1)]
z = set(x) - set(y)
but (mutable) lists fail (as do your dicts):
x = [[3, 1], [4, 1], [5, 1], [6, 2], [7, 5]]
y = [[4, 1], [4, 2], [5, 1]]
z = set(x) - set(y)
TypeError: unhashable type: 'list'
This is because lists are mutable: their contents can change after insertion, so Python refuses to hash them, and without a hash a set can't check their uniqueness. One can handle it by creating a class, but then that is not using a list of dicts anymore, and your for is just being buried inside a class method.
So you will need a nested loop somewhere, even if it is hidden by a lambda or a function.
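For completeness, a minimal sketch of that class-based approach; the name FrozenDict is hypothetical, and the buried iteration lives inside frozenset():
class FrozenDict:
    """Hashable, immutable view of a dict, usable as a set member."""
    def __init__(self, d):
        self._items = frozenset(d.items())  # here is the hidden loop
    def __hash__(self):
        return hash(self._items)
    def __eq__(self, other):
        return isinstance(other, FrozenDict) and self._items == other._items

c = set(map(FrozenDict, a)) - set(map(FrozenDict, b))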
I saw this and this question, and I'd like to have the same effect, only done efficiently with itertools.izip.
From itertools.izip's documentation:
Like zip() except that it returns an iterator instead of a list
I need an iterator because I can't fit all the values in memory, so instead I'm using a generator and iterating over the values.
More specifically, I have a generator that yields three-value tuples; instead of iterating over it, I'd like to feed three lists of values to three functions, each list representing a single position in the tuple.
Of those three tuple values, only one has big items in it (memory-consumption-wise); let's call it data. The other two contain only values that need a small amount of memory to hold, so iterating over data's "list of values" first should work for me, consuming the data values one by one and caching the small ones.
I can't think of a smart way to generate one "list of values" at a time, because I might decide to remove instances of a three-value tuple occasionally, depending on the big value of the tuple.
Using the widely suggested zip solution, similar to:
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
results in the argument-unpacking part (*[...]) triggering a full iteration over the entire iterator and (I assume) caching all the results in memory, which, as I said, is an issue for me.
I can build a mask list (True/False for small values to keep), but I'm looking for a cleaner more pythonic way. If all else fails, I'll do that.
What's wrong with a traditional loop?
>>> def gen():
... yield 'first', 0, 1
... yield 'second', 2, 3
... yield 'third', 4, 5
...
>>> numbers = []
>>> for data, num1, num2 in gen():
... print data
... numbers.append((num1, num2))
...
first
second
third
>>> numbers
[(0, 1), (2, 3), (4, 5)]
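And if, as the question hints, some tuples should be dropped based on the big value, that test can live inside the same loop, so each large item can be discarded before the next one is generated. The keep predicate here is hypothetical:
>>> def keep(data):
...     return data != 'second'  # hypothetical filter on the big item
...
>>> numbers = []
>>> for data, num1, num2 in gen():
...     if keep(data):
...         numbers.append((num1, num2))
...
>>> numbers
[(0, 1), (4, 5)]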
I want to sort by the first element of each tuple and, if the first elements of some tuples are equal, by the second element.
For example, I have [(5,1),(1,2),(1,1),(4,3)] and I want to get [(1,1),(1,2),(4,3),(5,1)].
How can I do it in a Pythonic way?
d = [(5,1),(1,2),(1,1),(4,3)]
print(sorted(d, key=lambda x: (x[0], x[1])))
If you want better performance, use itemgetter:
import operator
l = [(5,1),(1,2),(1,1),(4,3)]
print(sorted(l, key=operator.itemgetter(0, 1)))
You don't really need to specify the key, since you want to sort on the list items themselves; tuples compare lexicographically by default.
>>> d = [(5,1),(1,2),(1,1),(4,3)]
>>> sorted(d)
[(1, 1), (1, 2), (4, 3), (5, 1)]
Remember, sorted() returns a new list; in this case, d itself remains unsorted.
If you want to sort d in place, you can use
>>> d.sort()
Hope it helps
I was reading through some older code of mine and came across this line
itertools.starmap(lambda x,y: x + (y,),
itertools.izip(itertools.repeat(some_tuple,
len(list_of_tuples)),
itertools.imap(lambda x: x[0],
list_of_tuples)))
To be clear, I have some list_of_tuples from which I want to get the first item out of each tuple (the itertools.imap), I have another tuple that I want to repeat (itertools.repeat) such that there is a copy for each tuple in list_of_tuples, and then I want to get new, longer tuples based on the items from list_of_tuples (itertools.starmap).
For example, suppose some_tuple = (1, 2, 3) and list_of_tuples = [(1, other_info), (5, other), (8, 12)]. I want something like [(1, 2, 3, 1), (1, 2, 3, 5), (1, 2, 3, 8)]. This isn't the exact IO (it uses some pretty irrelevant and complex classes) and my actual lists and tuples are very big.
Is there a point to nesting the iterators like this? It seems to me like each function from itertools would have to iterate over the iterator I gave it and store the information from it somewhere, meaning that there is no benefit to putting the other iterators inside of starmap. Am I just completely wrong? How does this work?
There is no reason to nest the iterators. Using intermediate variables won't have a noticeable impact on performance or memory:
first_items = itertools.imap(lambda x: x[0], list_of_tuples)
repeated_tuple = itertools.repeat(some_tuple, len(list_of_tuples))
items = itertools.izip(repeated_tuple, first_items)
result = itertools.starmap(lambda x,y: x + (y,), items)
The iterator objects used and returned by itertools do not store all the items in memory, but simply calculate the next item when it is needed. You can read more about how they work here.
I don't believe the combobulation above is necessary in this case.
It appears to be equivalent to this generator expression:
(some_tuple + (y[0],) for y in list_of_tuples)
However, itertools can occasionally have a performance advantage, especially in CPython.
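As a quick check against the example from the question (with placeholder values standing in for other_info and other):
>>> some_tuple = (1, 2, 3)
>>> list_of_tuples = [(1, 'a'), (5, 'b'), (8, 12)]
>>> list(some_tuple + (y[0],) for y in list_of_tuples)
[(1, 2, 3, 1), (1, 2, 3, 5), (1, 2, 3, 8)]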
I have a list like
[(1, 3), (6, 7)]
and a string
'AABBCCDD'
I need to get the result AABCD.
I know I can get the integers from the tuple with nameOfTuple[0][0], yielding 1.
I also know that I can get the chars from the string with nameOfString[0], yielding A.
My question is, how do I iterate through the two values in each tuple, in order to save the integers (to a list, maybe) and then get the chars from the string?
In [1]: l = [(1, 3), (6, 7)]
In [2]: s = 'AABBCCDD'
In [3]: ''.join(s[start-1:end] for (start,end) in l)
Out[3]: 'AABCD'
Here, pairs of indices from l are assigned to start and end, one pair at a time. The relevant portion of the string is then extracted using s[start-1:end], yielding a sequence of strings. The strings are then merged using join().
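If the generator expression is hard to read, here is the same logic as an explicit loop; the start - 1 converts the 1-based positions in l to 0-based slice indices:
In [4]: pieces = []
In [5]: for start, end in l:
   ...:     pieces.append(s[start-1:end])
   ...:
In [6]: ''.join(pieces)
Out[6]: 'AABCD'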