Why is there a difference between the following two lines of code? - python

total = sum([float(item) for item in s.split(",")])
total = sum(float(item) for item in s.split(","))
Source: https://stackoverflow.com/a/21212727/1825083

The first one uses a list comprehension to build a list of all of the float values.
The second one uses a generator expression to build a generator that only produces each float value as requested, one at a time. This saves a lot of memory when the list would be very large.
The generator expression may also be either faster (because it allows work to be pipelined, and avoids memory allocation times) or slower (because it adds a bit of overhead), but that's usually not a good reason to choose between them. Just follow this simple rule of thumb:
If you need a list (or, more likely, just something you can store, loop over multiple times, print out, etc.), build a list. If you just need to loop over the values, don't build a list.
In this case, obviously, you don't need a list, so leave the square brackets off.
In Python 2.x, there are some other minor differences; in 3.x, a list comprehension is actually defined as just calling the list function on a generator expression. (Although there is a minor bug in at least 3.0-3.3 which you will only find if you go looking for it very hard…)
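
A quick way to see the memory difference is the standard tracemalloc module. This is just a sketch; the input string below is invented for illustration, and note that both versions still pay for the list of strings that split() returns, so the difference comes from the list of floats alone:

import tracemalloc

s = ",".join(str(i) for i in range(100000))

tracemalloc.start()
total = sum([float(item) for item in s.split(",")])  # builds the full list of floats first
list_peak = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

tracemalloc.start()
total = sum(float(item) for item in s.split(","))  # only one float alive at a time
gen_peak = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

print(list_peak > gen_peak)  # True: the list version needs a much higher peak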

The first one makes a list while the second one is a generator expression. Try them without the sum() function call.
In [25]: [float(a) for a in s.split(',')]
Out[25]: [1.23, 2.4, 3.123]
In [26]: (float(a) for a in s.split(','))
Out[26]: <generator object <genexpr> at 0x0698EF08>
In [27]: m = (float(a) for a in s.split(','))
In [28]: next(m)
Out[28]: 1.23
In [29]: next(m)
Out[29]: 2.4
In [30]: next(m)
Out[30]: 3.123
So the first expression creates the whole list in memory first and then computes the sum, whereas the second one just gets the next item in the expression and adds it to its running total, which is more memory efficient.

As others have said, the first creates a list, while the second creates a generator that generates all the values. The reason you might care about this is that creating the list puts all the elements into memory at once, whereas with the generator, you can process them as they are generated without having to store them all, which might matter for very large amounts of data.

The first one creates a list and then sums the numbers in the list. It is a list comprehension inside a sum().
The second one computes each item in turn, adds it to a running total, and returns that running total as the sum once all items are exhausted. This is a generator expression.
It does not create a list at all, which means it doesn't take extra time to allocate memory for a list and populate it. It also has better space complexity, since it only uses constant space (for the current float; aside from the call to split, which both lines make).
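
Written out as an explicit loop, the generator version is roughly equivalent to this sketch:

total = 0.0
for item in s.split(","):  # split still builds a list of strings (both versions pay this)
    total += float(item)   # each float is consumed immediately, never stored in a list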

Related

Compare whether the first two elements on a nested list are equal to a comparison list in python

In Python 2.7, I would like to verify whether a shorter list of elements is included in a longer nested list, comparing, let's say, only the first two elements.
Let's say we have a big list of nested elements (this big_list will have over 10k elements, so looping for every comparison is very inefficient and I'd like to avoid it). For this example, let's say we only have 4 nested lists in big_list:

big_list = ((2,3,5,6,7), (4,5,6,7,8), (6,7,8,8), (8,4,2,7))

If I have a single list, say (4,5,11,11,11), I am looking for an operation that will return True when compared to big_list, since the second list in big_list starts with (4,5,...) and matches the first two elements of my single_list. Essentially, I want to know whether the first two elements of a single list (e.g. (4,5,11,11,11)) appear at the start of any list in my big list, regardless of the numbers that follow (e.g. 11,11, ...).
My operation should also return False if another single_list (e.g. (4,8,11,11,11)) does not match the first two elements of any list in big_list.
I hope this is clearer. Any help?
Thanks in advance,
Since you have a huge list, to avoid iterating over the whole thing on every search (O(n) time complexity per lookup), you can do a constant-time lookup using a set.
tup_truth_set = set([tup[:2] for tup in big_list])  # set of the first two elements of each tuple
then you would simply do something like this to check in constant time:
tuple_of_interest[:2] in tup_truth_set
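
Putting it together with the data from the question, a quick sketch:

big_list = ((2, 3, 5, 6, 7), (4, 5, 6, 7, 8), (6, 7, 8, 8), (8, 4, 2, 7))
tup_truth_set = set([tup[:2] for tup in big_list])  # one-time O(n) preprocessing

print((4, 5, 11, 11, 11)[:2] in tup_truth_set)  # True: a nested tuple starts with (4, 5)
print((4, 8, 11, 11, 11)[:2] in tup_truth_set)  # False: none starts with (4, 8)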
I don't think you can avoid looping over your list. Even if you don't run the loop yourself and suppose there were a built-in function (that I am not aware of) which could do what you are asking, I am pretty sure it would loop over the list in the background. So I suggest a single line of code that does it, including a loop, obviously.
(4,5,11,11,11)[:2] in [i[:2] for i in big_list]

similar list.append statements returning different results

I have the two expressions below, which to me are basically the same, but the first line gives a list with a generator inside rather than the values, while the second one works fine.
I just wanted to know why this happens, what a generator is, and how it is used.
newer_list.append([sum(i)] for i in new_list)
for i in new_list:
    newer_list.append([sum(i)])
The first one passes a generator expression, ([sum(i)] for i in new_list), to append, while the second one just loops, appending the sums.
It is possible you wanted something like newer_list.extend([sum(i) for i in new_list]), where extend concatenates lists instead of just appending, and the whole thing is wrapped in brackets so it's a list comprehension instead of a generator.
A generator is a way for Python to avoid storing everything in memory. The expression ([sum(i)] for i in new_list) is a recipe for generating the items of a list. Instead of storing that list in memory, Python stores only the state it needs to produce the items on demand, which has a smaller memory footprint.
To turn a generator into a list, you can just do list([sum(i)] for i in new_list), or in this case [[sum(i)] for i in new_list].
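
A quick sketch contrasting the two (the values in new_list are invented for illustration):

new_list = [[1, 2], [3, 4], [5, 6]]

newer_list = []
newer_list.append([sum(i)] for i in new_list)
print(newer_list)  # [<generator object <genexpr> at 0x...>] -- the generator itself was appended

newer_list = []
newer_list.extend([sum(i)] for i in new_list)
print(newer_list)  # [[3], [7], [11]] -- extend consumes the generator item by item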

Is this the most efficient way to vertically slice a list of dictionaries for unique values?

I've got a list of dictionaries, and I'm looking for a unique list of values for one of the keys.
This is what I came up with, but I can't help but wonder if it's efficient, time- and/or memory-wise:
list(set([d['key'] for d in my_list]))
Is there a better way?
This:
list(set([d['key'] for d in my_list]))
… constructs a list of all values, then constructs a set of just the unique values, then constructs a list out of the set.
Let's say you had 10000 items, of which 1000 are unique. You've reduced final storage from 10000 items to 1000, which is great—but you've increased peak storage from 10000 to 11000 (because there clearly has to be a time when the entire list and almost the entire set are both in memory simultaneously).
There are two very simple ways to avoid this.
First (as long as you've got Python 2.4 or later) use a generator expression instead of a list comprehension. In most cases, including this one, that's just a matter of removing the square brackets or turning them into parentheses:
list(set(d['key'] for d in my_list))
Or, even more simply (with Python 2.7 or later), just construct the set directly by using a set comprehension instead of a list comprehension:
list({d['key'] for d in my_list})
If you're stuck with Python 2.3 or earlier, you'll have to write an explicit loop. And with 2.2 or earlier, there are no sets, so you'll have to fake it with a dict mapping each key to None or similar.
Beyond space, what about time? Well, clearly you have to traverse the entire list of 10000 dictionaries, and do an O(1) dict.get for each one.
The original version does a list.append (actually a slightly faster internal equivalent) for each of those steps, and then the set conversion is a traversal of a list of the same size with a set.add for each one, and then the list conversion is a traversal of a smaller set with a list.append for each one. So, it's O(N), which is clearly optimal algorithmically, and only worse by a smallish multiplier than just iterating the list and doing nothing.
The set version skips over the list.appends, and only iterates once instead of twice. So, it's also O(N), but with an even smaller multiplier. And the savings in memory management (if N is big enough to matter) may help as well.
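
A small sketch of all three forms on the 10000-items/1000-unique example from above (the data is invented for illustration):

my_list = [{'key': i % 1000} for i in range(10000)]  # 10000 dicts, 1000 unique values

a = list(set([d['key'] for d in my_list]))  # list comp -> set -> list (peak ~11000 items)
b = list(set(d['key'] for d in my_list))    # genexpr -> set -> list (peak ~1000 items)
c = list({d['key'] for d in my_list})       # set comprehension -> list (same, one step fewer)

assert sorted(a) == sorted(b) == sorted(c)  # all three give the same unique values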

Efficient use of Python list comprehensions

I have a Python list of objects that could be pretty long. At particular times, I'm interested in all of the elements in the list that have a certain attribute, say flag, that evaluates to False. To do so, I've been using a list comprehension, like this:
objList = list()
# ... populate list
[x for x in objList if not x.flag]
Which seems to work well. After forming the sublist, I have a few different operations that I might need to do:
Subscript the sublist to get the element at index ind.
Calculate the length of the sublist (i.e. the number of elements that have flag == False).
Search the sublist for the first instance of a particular object (i.e. using the list's .index() method).
I've implemented these using the naive approach of just forming the sublist and then using its methods to get at the data I want. I'm wondering if there are more efficient ways to go about these. #1 and #3 at least seem like they could be optimized, because in #1 I only need the first ind + 1 matching elements of the sublist, not necessarily the entire result set, and in #3 I only need to search through the sublist until I find a matching element.
Is there a good Pythonic way to do this? I'm guessing I might be able to use the () syntax in some way to get a generator instead of creating the entire list, but I haven't happened upon the right way yet. I obviously could write loops manually, but I'm looking for something as elegant as the comprehension-based method.
If you're going to do any of these operations more than a couple of times, the overhead of the other methods will be higher; the list is the best way. It's also probably the clearest, so if memory isn't a problem, then I'd recommend just going with it.
If memory or speed is a problem, then there are alternatives. Note that speed-wise, these might actually be slower, depending on the common case for your software.
For your scenarios:
#value = sublist[n]
value = nth((x for x in objList if not x.flag), n)
#value = len(sublist)
value = sum(not x.flag for x in objList)
#value = sublist.index(target)
value = next(i for i, x in enumerate(y for y in objList if not y.flag) if x == target)
Using the nth() recipe from the itertools docs.
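For reference, that recipe is a thin wrapper around itertools.islice():

from itertools import islice

def nth(iterable, n, default=None):
    "Returns the nth item or a default value"
    return next(islice(iterable, n, None), default)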
I'm going to assume you might do any of these three things, and you might do them more than once.
In that case, what you want is basically to write a lazily evaluated list class. It would keep two pieces of data, a real list cache of evaluated items, and a generator of the rest. You could then do ll[10] and it would evaluate up to the 10th item, ll.index('spam') and it would evaluate until it finds 'spam', and then len(ll) and it would evaluate the rest of the list, all the while caching in the real list what it sees so nothing is done more than once.
Constructing it would look like this:
LazyList(x for x in objList if not x.flag)
But nothing would actually be computed until you actually start using it as above.
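
For illustration, a minimal sketch of such a class might look like this (the LazyList name and all details are hypothetical, not a standard library facility):

class LazyList:
    def __init__(self, iterable):
        self._cache = []             # items evaluated so far
        self._rest = iter(iterable)  # items not yet evaluated

    def _fill_to(self, n):
        # Evaluate and cache items until the cache holds at least n + 1 of them.
        while len(self._cache) <= n:
            try:
                self._cache.append(next(self._rest))
            except StopIteration:
                raise IndexError(n)

    def __getitem__(self, n):
        self._fill_to(n)
        return self._cache[n]

    def index(self, value):
        # Check cached items first, then keep evaluating until value is found.
        i = 0
        while True:
            try:
                if self[i] == value:
                    return i
            except IndexError:
                raise ValueError(value)
            i += 1

    def __len__(self):
        self._cache.extend(self._rest)  # evaluate whatever is left
        return len(self._cache)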
Since you commented that your objList can change, if you don't also need to index or search objList itself, then you might be better off just storing two different lists, one with .flag = True and one with .flag = False. Then you can use the second list directly instead of constructing it with a list comprehension each time.
If this works in your situation, it is likely the most efficient way to do it.
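
A rough sketch of that two-list approach (the list names are just illustrative):

flagged, unflagged = [], []
for x in objList:
    (flagged if x.flag else unflagged).append(x)

# The three operations become plain list operations on unflagged:
# unflagged[ind], len(unflagged), unflagged.index(target)
# with the caveat that both lists must be kept in sync whenever objList changes.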

Do list comprehensions in Python reduce in a memory efficient manner?

I am a beginner at Python, and this is my first post, so don't be too harsh :). I've been playing around with Python lately and was wondering if something like
max([x for x in range(25)])
would result in Python first creating a list of all the elements and then finding the max, resulting in O(2n) time, or whether it would keep track of the max as it iterated, for Θ(n). Also, since range differs in Python 3 (being an iterable rather than a list), would that make it different than in Python 2?
Your example will result in Python first building the entire list. If you want to avoid that, you can use a generator expression instead:
max((x for x in range(25)))
or simply:
max(x for x in range(25))
Of course (in Python 2), range itself builds an entire list, so what you really want in this case is:
max(x for x in xrange(25))
However, regarding the time taken, all these expressions have the same complexity. The important difference is that the last one requires O(1) space, whereas the others require O(n) space.
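
You can see the space difference directly with sys.getsizeof (a rough sketch in Python 3; exact byte counts vary by version and platform):

import sys

nums_list = [x for x in range(25)]
nums_gen = (x for x in range(25))

print(sys.getsizeof(nums_list))  # grows with the number of elements
print(sys.getsizeof(nums_gen))   # small and constant, no matter how many elements

print(max(nums_list), max(nums_gen))  # both print 24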
List comprehensions always generate a list (unless something throws an exception). Using a generator expression instead is recommended in most cases.
max(x for x in xrange(25))
