Question about combining sum() with zip() - python

a=[1,2,3]
b=[3,4,5,2]
c=[60,70,80]
sum(zip(a,b,c),())
What's the logic of the sum() function here? Why does it return a single tuple? And especially, why won't the following work?
sum(zip(a,b,c))

The sum() function simply combines its items with "+", starting from an initial value. The zip() function, for its part, groups corresponding items together into tuples. Explicitly:
list(zip(a,b,c)) # [(1, 3, 60), (2, 4, 70), (3, 5, 80)]
sum([1,2,3],0) # 0 + 1 + 2 + 3
sum(zip(a,b,c),()) # () + (1,3,60) + (2,4,70) + (3,5,80)
Hope this helps explain sum() and zip(). It can be tricky to see what zip() is doing, since it produces an iterator rather than a concrete sequence; if you want to see its output, wrap it in list().
sum(zip(a,b,c)) fails because the default initial value is 0. Python therefore tries to evaluate 0 + (1,3,60) + ..., which raises a TypeError because an int cannot be added to a tuple.
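To see both behaviours side by side, here is a quick demonstration with the lists from the question (the second call is commented out because it raises the error described above):
a = [1, 2, 3]
b = [3, 4, 5, 2]
c = [60, 70, 80]

# zip stops at the shortest input, so b's trailing 2 is dropped
print(sum(zip(a, b, c), ()))   # (1, 3, 60, 2, 4, 70, 3, 5, 80)
# sum(zip(a, b, c))            # TypeError: unsupported operand type(s) for +: 'int' and 'tuple'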

The other answers are useful in resolving any confusion, but perhaps the result you are actually looking for is achieved by doing this:
sum(a+b+c)
because the + operator, when applied to lists, concatenates them into a single list, whereas zip does not.
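For the question's lists, this sums every element, including the trailing 2 in b that zip() would have dropped:
a = [1, 2, 3]
b = [3, 4, 5, 2]
c = [60, 70, 80]

print(a + b + c)       # [1, 2, 3, 3, 4, 5, 2, 60, 70, 80]
print(sum(a + b + c))  # 230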

zip() does not do what you think it does. sum() adds up the items of its input and returns the result; in your case, you want to sum the numbers from three lists. zip() returns tuples containing the elements at the same index from each input, and when its result is passed to sum(), the tuples are concatenated, leaving you with the undesired result. The fix is to use itertools.chain to combine the lists, then use sum() to add up the numbers in them.
To show exactly how zip() works, an example should be useful:
a = ["a", "b", "c"]
b = [1, 2, 3]
list(zip(a, b)) -> [('a', 1), ('b', 2), ('c', 3)]
zip returns an iterator of tuples (converted to a list here), each containing the element from each input at the corresponding index, i.e., list(zip(a, b))[index] == (a[index], b[index]).
What you want is this:
sum(itertools.chain(a, b, c))
EDIT: Make sure to import itertools first.
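Putting it together with the lists from the question, a complete version might look like:
import itertools

a = [1, 2, 3]
b = [3, 4, 5, 2]
c = [60, 70, 80]

print(sum(itertools.chain(a, b, c)))  # 230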

Related

Fast, pythonic way to get all tuples obtained by dropping the elements from a given tuple?

Given a tuple T that contains all different integers, I want to get all the tuples that result from dropping individual integers from T. I came up with the following code:
def drop(T):
    S = set(T)
    for i in S:
        yield tuple(S.difference({i}))

for t in drop((1,2,3)):
    print(t)
# (2,3)
# (1,3)
# (1,2)
I'm not unhappy with this, but I wonder if there is a better/faster way: with large tuples, difference() has to look up the item in the set, even though I already know I'll be removing the items sequentially. However, the following code is only about 2x faster:
def drop(T):
    for i in range(len(T)):
        yield T[:i] + T[i+1:]
and in any case, neither scales linearly with the size of T.
Instead of looking at it as "remove one item each time", you can look at it as "use all but one", and then with itertools it becomes straightforward:
from itertools import combinations
T = (1, 2, 3, 4)
for t in combinations(T, len(T)-1):
    print(t)
Which gives:
(1, 2, 3)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)
* Assuming the order doesn't really matter
From your description, you're looking for combinations of the elements of T. With itertools.combinations, you can ask for all r-length tuples, in sorted order, without repeated elements. For example:
import itertools
T = [1,2,3]
for i in itertools.combinations(T, len(T) - 1):
    print(i)
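If the output order does matter, note that for a sorted T the combinations come out in the opposite order to the original generator; a possible sketch of an equivalent drop() (assuming the elements of T are in ascending order, as in the example) is:
from itertools import combinations

def drop(T):
    # For a sorted T, combinations() omits T[-1] first and T[0] last,
    # so reversing recovers the original "drop T[0] first" order.
    return reversed(list(combinations(T, len(T) - 1)))

for t in drop((1, 2, 3)):
    print(t)
# (2, 3)
# (1, 3)
# (1, 2)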

How to sort list of tuples using both elements of tuple?

I want to sort by the first element of each tuple and, if the first elements are equal, by the second element.
For example, I have [(5,1),(1,2),(1,1),(4,3)] and I want to get [(1,1),(1,2),(4,3),(5,1)]
How can I do it in a Pythonic way?
d = [(5,1),(1,2),(1,1),(4,3)]
print(sorted(d,key=lambda x:(x[0],x[1])))
If you want better performance, use operator.itemgetter:
import operator
l = [(5,1),(1,2),(1,1),(4,3)]
print(sorted(l, key=operator.itemgetter(0,1)))
You don't really need to specify a key at all, since tuples compare element by element by default (first items first, then second items):
>>> d = [(5,1),(1,2),(1,1),(4,3)]
>>> sorted(d)
[(1, 1), (1, 2), (4, 3), (5, 1)]
Remember, sorted() returns a new list object; in this case d itself remains unsorted.
If you want to sort d in place you can use
>>> d.sort()
Hope it helps.
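A quick illustration of the sorted()-versus-sort() point above:
d = [(5, 1), (1, 2), (1, 1), (4, 3)]

s = sorted(d)  # returns a new sorted list; d is left unchanged
d.sort()       # sorts d in place and returns None

print(s)       # [(1, 1), (1, 2), (4, 3), (5, 1)]
print(d)       # [(1, 1), (1, 2), (4, 3), (5, 1)]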

Is there a point to using nested iterators?

I was reading through some older code of mine and came across this line
itertools.starmap(lambda x, y: x + (y,),
                  itertools.izip(itertools.repeat(some_tuple,
                                                  len(list_of_tuples)),
                                 itertools.imap(lambda x: x[0],
                                                list_of_tuples)))
To be clear, I have some list_of_tuples from which I want to get the first item out of each tuple (the itertools.imap), I have another tuple that I want to repeat (itertools.repeat) such that there is a copy for each tuple in list_of_tuples, and then I want to get new, longer tuples based on the items from list_of_tuples (itertools.starmap).
For example, suppose some_tuple = (1, 2, 3) and list_of_tuples = [(1, other_info), (5, other), (8, 12)]. I want something like [(1, 2, 3, 1), (1, 2, 3, 5), (1, 2, 3, 8)]. This isn't the exact IO (it uses some pretty irrelevant and complex classes) and my actual lists and tuples are very big.
Is there a point to nesting the iterators like this? It seems to me like each function from itertools would have to iterate over the iterator I gave it and store the information from it somewhere, meaning that there is no benefit to putting the other iterators inside of starmap. Am I just completely wrong? How does this work?
There is no reason to nest the iterators like this. Assigning them to intermediate variables won't have a noticeable impact on performance or memory:
first_items = itertools.imap(lambda x: x[0], list_of_tuples)
repeated_tuple = itertools.repeat(some_tuple, len(list_of_tuples))
items = itertools.izip(repeated_tuple, first_items)
result = itertools.starmap(lambda x,y: x + (y,), items)
The iterator objects used and returned by itertools do not store all the items in memory; they simply compute the next item when it is needed. You can read more about how iterators work in the Python documentation.
I don't believe the convoluted construction above is necessary in this case; it appears to be equivalent to this generator expression:
(some_tuple + (y[0],) for y in list_of_tuples)
However, itertools can occasionally have a performance advantage, especially in CPython.
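With the example data from the question (placeholder strings standing in for the undefined names other_info and other), the generator expression gives:
some_tuple = (1, 2, 3)
list_of_tuples = [(1, "other_info"), (5, "other"), (8, 12)]

result = list(some_tuple + (y[0],) for y in list_of_tuples)
print(result)  # [(1, 2, 3, 1), (1, 2, 3, 5), (1, 2, 3, 8)]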

Accessing grouped items in arrays

I'm new to Python and have a list of numbers. e.g.
5,10,32,35,64,76,23,53....
and I've grouped them into fours (5,10,32,35, then 64,76,23,53, etc.) using the code from this post.
def group_iter(iterator, n=2, strict=False):
""" Transforms a sequence of values into a sequence of n-tuples.
e.g. [1, 2, 3, 4, ...] => [(1, 2), (3, 4), ...] (when n == 2)
If strict, then it will raise ValueError if there is a group of fewer
than n items at the end of the sequence. """
accumulator = []
for item in iterator:
accumulator.append(item)
if len(accumulator) == n: # tested as fast as separate counter
yield tuple(accumulator)
accumulator = [] # tested faster than accumulator[:] = []
# and tested as fast as re-using one list object
if strict and len(accumulator) != 0:
raise ValueError("Leftover values")
How can I access the individual arrays so that I can perform functions on them? For example, I'd like to get the average of the first values of every group (e.g. 5 and 64 in my example numbers).
Let's say you have the following tuple of tuples:
a=((5,10,32,35), (64,76,23,53))
To access the first element of each tuple, use a for-loop:
for i in a:
    print i[0]
To calculate average for the first values:
elements=[i[0] for i in a]
avg=sum(elements)/float(len(elements))
Ok, this is yielding a tuple of four numbers each time it's iterated. So, convert the whole thing to a list:
L = list(group_iter(your_list, n=4))
Then you'll have a list of tuples:
>>> L
[(5, 10, 32, 35), (64, 76, 23, 53), ...]
You can get the first item in each tuple this way:
firsts = [tup[0] for tup in L]
(There are other ways, of course.)
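Continuing that sketch, the average the question asks about could then be computed like this (using the two example groups):
L = [(5, 10, 32, 35), (64, 76, 23, 53)]
firsts = [tup[0] for tup in L]
avg = sum(firsts) / float(len(firsts))
print(avg)  # 34.5, i.e. the average of 5 and 64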
You've created a tuple of tuples, or a list of tuples, or a list of lists, or a tuple of lists, or whatever...
You can access any element of any nested list directly:
toplist[x][y] # yields the yth element of the xth nested list
You can also access the nested structures by iterating over the top structure:
for sublist in lists:   # "sublist" rather than "list", to avoid shadowing the built-in
    print sublist[y]
Might be overkill for your application but you should check out my library, pandas. Stuff like this is pretty simple with the GroupBy functionality:
http://pandas.sourceforge.net/groupby.html
To do the 4-at-a-time thing you would need to compute a bucketing array:
import numpy as np
bucket_size = 4
n = len(your_list)
buckets = np.arange(n) // bucket_size
Then it's as simple as:
data.groupby(buckets).mean()
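A rough sketch of how this fits together with a current pandas/numpy install; your_list holds the example numbers, and data is assumed to be a pandas Series built from it:
import numpy as np
import pandas as pd

your_list = [5, 10, 32, 35, 64, 76, 23, 53]
data = pd.Series(your_list)

bucket_size = 4
buckets = np.arange(len(your_list)) // bucket_size  # array([0, 0, 0, 0, 1, 1, 1, 1])

print(data.groupby(buckets).mean())  # 20.5 for the first group, 54.0 for the second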

Binning into timeslots - Is there a better way than using list comp?

I have a dataset of events (tweets to be specific) that I am trying to bin / discretize. The following code seems to work fine so far (assuming 100 bins):
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)
start = datetime(2009, 1, 1)
z = [start + x*HOUR for x in xrange(1, 100)]
But then, I came across this fateful line in the Python docs: 'This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n)'. The zip idiom does indeed work, but I can't understand how (what is the * operator for, for instance?). How could I use it to make my code prettier? I'm guessing this means I should make a generator/iterable for time that yields the time in graduations of an HOUR?
I will try to explain zip(*[iter(s)]*n) in terms of a simpler example:
Imagine you have the list s = [1, 2, 3, 4, 5, 6].
iter(s) gives you a listiterator object that will yield the next number from s each time you ask for an element.
[iter(s)] * n gives you the list with iter(s) in it n times e.g. [iter(s)] * 2 = [<listiterator object>, <listiterator object>] - the key here is that these are 2 references to the same iterator object, not 2 distinct iterator objects.
zip takes a number of sequences and returns a list of tuples where each tuple contains the ith element from each of the sequences. e.g. zip([1,2], [3,4], [5,6]) = [(1, 3, 5), (2, 4, 6)] where (1, 3, 5) are the first elements from the parameters passed to zip and (2, 4, 6) are the second elements from the parameters passed to zip.
The * in front of [iter(s)]*n unpacks the list into separate arguments passed to zip, so if n is 2 we get zip(<listiterator object>, <listiterator object>).
zip requests the next element from each of its parameters, but because these are both references to the same iterator, the first tuple it builds is (1, 2). It does the same again, yielding (3, 4), and again, yielding (5, 6); then there are no more elements, so it stops. Hence the result [(1, 2), (3, 4), (5, 6)]. This is the 'clustering a data series into n-length groups' mentioned in the docs.
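Putting it together with that example list:
s = [1, 2, 3, 4, 5, 6]
n = 2
print(list(zip(*[iter(s)] * n)))  # [(1, 2), (3, 4), (5, 6)]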
The expression from the docs looks like this:
zip(*[iter(s)]*n)
This is equivalent to:
it = iter(s)
zip(*[it, it, ..., it]) # n times
The [...]*n repeats the list n times, and this results in a list that contains n references to the same iterator.
This in turn is equivalent to:
it = iter(s)
zip(it, it, ..., it) # turning a list into positional parameters
The * before the list turns the list elements into positional parameters of the function call.
Now, when zip is called, it starts from left to right to call the iterators to obtain elements that should be grouped together. Since all parameters refer to the same iterator, this yields the first n elements of the initial sequence. Then that process continues for the second group in the resulting list, and so on.
The result is the same as if you had constructed the list like this (evaluated from left to right):
it = iter(s)
[(it.next(), it.next(), ..., it.next()), (it.next(), it.next(), ..., it.next()), ...]
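Doing that by hand for the pairs example (next(it) is the modern spelling of it.next()):
s = [1, 2, 3, 4, 5, 6]
it = iter(s)
groups = [(next(it), next(it)) for _ in range(len(s) // 2)]
print(groups)  # [(1, 2), (3, 4), (5, 6)]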
