I have a list of lists of tuples:
x = [[(0,0),(0,1)],[(1,2),(2,3)],[(0,0),(0,1)],[(1,2),(2,3)]]
I want all the unique lists present in the list x
My output should be:
x=[[(0,0),(0,1)],[(1,2),(2,3)]]
I tried x = list(set(x)), but it raises TypeError: unhashable type: 'list'. I also tried numpy.unique, but it does not give the desired output. How can I implement this?
Lists are mutable and hence cannot be used as elements of a set. However, you can convert each list to a tuple, which is immutable, and then get the unique elements using a set. Here I am using map() to convert all sub-lists to tuples:
>>> x = [[(0,0),(0,1)],[(1,2),(2,3)],[(0,0),(0,1)],[(1,2),(2,3)]]
>>> set(map(tuple, x))
{((1, 2), (2, 3)), ((0, 0), (0, 1))}
To convert the set back to a list, and the nested tuples back to lists, you can use map() again:
>>> list(map(list, set(map(tuple, x))))
[[(1, 2), (2, 3)], [(0, 0), (0, 1)]]
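Note that a set does not preserve the original order. If order matters, here is a minimal sketch of an alternative, assuming Python 3.7+ (where plain dicts preserve insertion order), using dict.fromkeys for the deduplication:
>>> x = [[(0,0),(0,1)],[(1,2),(2,3)],[(0,0),(0,1)],[(1,2),(2,3)]]
>>> [list(t) for t in dict.fromkeys(map(tuple, x))]
[[(0, 0), (0, 1)], [(1, 2), (2, 3)]]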
I would do something like this:
x = [[(0,0),(0,1)],[(1,2),(2,3)],[(0,0),(0,1)],[(1,2),(2,3)]]
result = []
for i in x:
    if i not in result:
        result.append(i)
print(result)
Maybe it is not the fastest way, but it is certainly the simplest.
Otherwise you can use the "cooler" way: sets. Sets are like lists that don't allow duplicate elements.
x = [[(0,0),(0,1)],[(1,2),(2,3)],[(0,0),(0,1)],[(1,2),(2,3)]]
result = list(map(list,set(map(tuple,x))))
I have a list of lists. Each list has string values in it.
The same value often appears in several different lists. I want to find the values that occur in different lists at least k times.
For example, in the case below, 127-0-0-1-59928 appears 3 times and 3-7-3-final-0 appears 4 times, and similarly there are other values that repeat.
[['127-0-0-1-59924'],
['127-0-0-1-59922'],
['127-0-0-1-59926'],
['127-0-0-1-59926', '3-8-0', '4-15-0-76', '3-7-3-final-0'],
['127-0-0-1-59928'],
['127-0-0-1-59928', '3-8-0', '4-15-0-76', '3-7-3-final-0'],
['127-0-0-1-59928'],
['127-0-0-1-59926'],
['127-0-0-1-34426'],
['127-0-0-1-34426', '3-8-0', '4-15-0-76', '3-7-3-final-0'],
['127-0-0-1-34428'],
['127-0-0-1-34428', '3-8-0', '4-15-0-76', '3-7-3-final-0'],
['127-0-0-1-34428'],
['127-0-0-1-34426']]
Is there an efficient way to calculate the frequencies of the values, and/or to find the values that occur in multiple lists more often than a certain threshold k?
Thanks a lot for the help!
You could just create a collections.Counter with the elements of all the lists:
lst = [['127-0-0-1-59924'], ...]
import collections
counts = collections.Counter(c for l in lst for c in l)
print(counts.most_common())
# [('3-8-0', 4), ('4-15-0-76', 4), ('3-7-3-final-0', 4), ('127-0-0-1-59926', 3), ('127-0-0-1-59928', 3), ('127-0-0-1-34426', 3), ('127-0-0-1-34428', 3), ('127-0-0-1-59924', 1), ('127-0-0-1-59922', 1)]
Note that this will be the accumulated counts of all the lists, so if an element appears twice in the same list, that counts as two occurrences, too.
If, instead, you do not want to consider multiple occurrences in the same list, but just want to count the number of different lists an element appears in, you can do the same, but convert each sublist to a set first (the result is the same in this case):
counts = collections.Counter(c for l in lst for c in set(l))
Neither of those methods considers the position of the element in the list, in case that's a concern.
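As a follow-up for the threshold part of the question, here is a small sketch that keeps only the values appearing at least k times (k = 3 is just an assumed threshold):
k = 3  # assumed threshold
frequent = [value for value, count in counts.items() if count >= k]
# frequent now holds e.g. '3-8-0', '4-15-0-76', '3-7-3-final-0' and the addresses seen 3 times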
I saw this and this question, and I'd like to have the same effect, only done efficiently with itertools.izip.
From itertools.izip's documentation:
Like zip() except that it returns an iterator instead of a list
I need an iterator because I can't fit all values to memory so instead I'm using a generator and iterating over the values.
More specifically, I have a generator that yields three-value tuples, and instead of iterating over it directly I'd like to feed three lists of values to three functions, where each list represents a single position in the tuple.
Of those three positions, only one holds big items (memory-consumption wise; let's call it data), while the other two hold values that need only a small amount of memory, so iterating over the data position's "list of values" first should work for me: I can consume the data values one by one and cache the small ones.
I can't think of a smart way to generate one "list of values" at a time, because I might occasionally decide to drop a three-value tuple entirely, depending on its big data value.
Using the widely suggested zip solution, similar to:
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
results in the "unpacking argument list" part (*[...]) triggering a full iteration over the entire iterator and (I assume) caching all the results in memory, which, as I said, is an issue for me.
I can build a mask list (True/False for the small values to keep), but I'm looking for a cleaner, more Pythonic way. If all else fails, I'll do that.
What's wrong with a traditional loop?
>>> def gen():
... yield 'first', 0, 1
... yield 'second', 2, 3
... yield 'third', 4, 5
...
>>> numbers = []
>>> for data, num1, num2 in gen():
... print data
... numbers.append((num1, num2))
...
first
second
third
>>> numbers
[(0, 1), (2, 3), (4, 5)]
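If the goal is really to feed each tuple position to its own function without materializing the big column, a hedged sketch along the same lines, reusing gen() from above, could look like this (process_data, process_num1 and process_num2 are hypothetical placeholders for your three functions):
def process_data(value):       # hypothetical consumer of the large values
    print('data: %s' % value)

def process_num1(values):      # hypothetical consumers of the cached small values
    print('num1 column: %s' % values)

def process_num2(values):
    print('num2 column: %s' % values)

num1s, num2s = [], []
for data, num1, num2 in gen():
    process_data(data)         # handle the big item immediately, then let it go
    num1s.append(num1)         # the small values are cheap to cache
    num2s.append(num2)

process_num1(num1s)
process_num2(num2s)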
I want to sort by the first element of each tuple and, if the first elements of some tuples are equal, by the second element.
For example, I have [(5,1),(1,2),(1,1),(4,3)] and I want to get [(1,1),(1,2),(4,3),(5,1)]
How can I do it in pythonic way?
d = [(5,1),(1,2),(1,1),(4,3)]
print(sorted(d,key=lambda x:(x[0],x[1])))
If you want better performance, use operator.itemgetter:
import operator
l = [(5,1),(1,2),(1,1),(4,3)]
print(sorted(l, key=operator.itemgetter(0,1)))
You don't really need to specify a key here, since you want to sort on the tuples themselves and Python compares tuples element by element by default.
>>> d = [(5,1),(1,2),(1,1),(4,3)]
>>> sorted(d)
[(1, 1), (1, 2), (4, 3), (5, 1)]
Remember, sorted() returns a new list; in this case d itself remains unsorted.
If you want to sort d in place, you can use
>>> d.sort()
Hope it helps
I have a dataset of events (tweets to be specific) that I am trying to bin / discretize. The following code seems to work fine so far (assuming 100 bins):
import datetime

HOUR = datetime.timedelta(hours=1)
start = datetime.datetime(2009, 1, 1)
z = [start + x*HOUR for x in xrange(1, 100)]
But then I came across this fateful line in the Python docs: 'This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n)'. The zip idiom does indeed work, but I can't understand how (what is the * operator for, for instance?). How could I use it to make my code prettier? I'm guessing this means I should make a generator / iterable for time that yields times in graduations of an HOUR?
I will try to explain zip(*[iter(s)]*n) in terms of a simpler example:
imagine you have the list s = [1, 2, 3, 4, 5, 6]
iter(s) gives you a listiterator object that will yield the next number from s each time you ask for an element.
[iter(s)] * n gives you the list with iter(s) in it n times e.g. [iter(s)] * 2 = [<listiterator object>, <listiterator object>] - the key here is that these are 2 references to the same iterator object, not 2 distinct iterator objects.
zip takes a number of sequences and returns a list of tuples where each tuple contains the ith element from each of the sequences. e.g. zip([1,2], [3,4], [5,6]) = [(1, 3, 5), (2, 4, 6)] where (1, 3, 5) are the first elements from the parameters passed to zip and (2, 4, 6) are the second elements from the parameters passed to zip.
The * in front of [iter(s)]*n converts the list into multiple parameters passed to zip, so if n is 2 we get zip(<listiterator object>, <listiterator object>).
zip will request the next element from each of its parameters, but because these are both references to the same iterator, this results in (1, 2). It does the same again, producing (3, 4), and again, producing (5, 6); then there are no more elements, so it stops. Hence the result is [(1, 2), (3, 4), (5, 6)]. This is the "clustering a data series into n-length groups" mentioned in the docs.
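A short runnable check of that explanation, using the same s and n = 2 (this assumes Python 2, as in the question; in Python 3, zip returns an iterator, so wrap the call in list()):
>>> s = [1, 2, 3, 4, 5, 6]
>>> zip(*[iter(s)] * 2)
[(1, 2), (3, 4), (5, 6)]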
The expression from the docs looks like this:
zip(*[iter(s)]*n)
This is equivalent to:
it = iter(s)
zip(*[it, it, ..., it]) # n times
The [...]*n repeats the list n times, which results in a list containing n references to the same iterator.
This is again equal to:
it = iter(s)
zip(it, it, ..., it) # turning a list into positional parameters
The * before the list turns the list elements into positional parameters of the function call.
Now, when zip is called, it starts from left to right to call the iterators to obtain elements that should be grouped together. Since all parameters refer to the same iterator, this yields the first n elements of the initial sequence. Then that process continues for the second group in the resulting list, and so on.
The result is the same as if you had constructed the list like this (evaluated from left to right):
it = iter(s)
[(it.next(), it.next(), ..., it.next()), (it.next(), it.next(), ..., it.next()), ...]
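Coming back to the original binning question, here is a hedged sketch of how the idiom might be applied to the hourly timestamps (assuming Python 2 as in the question; the 24-hour bin size is an arbitrary choice, and note that zip silently drops a trailing group that is not completely filled):
import datetime

HOUR = datetime.timedelta(hours=1)
start = datetime.datetime(2009, 1, 1)
times = [start + x*HOUR for x in xrange(96)]  # 96 hourly timestamps

bins = zip(*[iter(times)] * 24)               # group into 24-hour bins
for b in bins:
    print('%s ... %s' % (b[0], b[-1]))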