Does Python calculate list comprehension condition in each step? - python

In a list comprehension with a condition that contains a function call, does Python (specifically CPython 3.9.4) call the function on each iteration, or does it compute the value once and then reuse it?
For example if you have:
list_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
list_2 = [x for x in list_1 if x > np.average(list_1)]
Will Python actually calculate the np.average(list_1) len(list_1) times? So would it be more optimized to write
list_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
np_avg = np.average(list_1)
list_2 = [x for x in list_1 if x > np_avg]
instead? Or does Python already "know" to just calculate the average beforehand?

Python has to call the function each time. It cannot optimize that part, because successive calls of the function might return different results (for example because of side effects). There is no easy way for Python’s compiler to be sure that this can’t happen.
Therefore, if you (the programmer) know that the result will always be the same – like in this case – it is probably advisable to calculate the result of the function in advance and use it inside the list comprehension.
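A minimal sketch of such a side effect, using a counter whose value changes on every call (a contrived example, not from the question):

```python
import itertools

counter = itertools.count()

# The condition's value depends on how many times it has been called,
# so evaluating it once up front would change the result.
result = [x for x in range(5) if next(counter) % 2 == 0]
print(result)  # [0, 2, 4]
```

Because the compiler cannot rule out behavior like this, it must re-evaluate the condition every time.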

Assuming standard CPython - Short answer: Yes. Your second snippet is more efficient.
A function call in the filter part of a list comprehension will be called for each element.
We can test this quite easily with a trivial example:
def f(value):
    """Allow even values only."""
    print('function called')
    return value % 2 == 0

mylist = [x for x in range(5) if f(x)]
# 'function called' will be printed 5 times
The above is somewhat equivalent to doing:
mylist = []
for x in range(5):
    if f(x):
        mylist.append(x)
Since you're comparing against the same average each time, you can indeed just calculate it beforehand and use the same value as you did in your second code snippet.
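To quantify the difference, here is a rough timeit sketch using statistics.mean as a stdlib stand-in for np.average (timings are machine-dependent; the ratio is what matters):

```python
import timeit

setup = "from statistics import mean\nlist_1 = list(range(1000))"

# mean() is re-evaluated for every element: O(n) calls, each doing O(n) work.
inline = timeit.timeit(
    "[x for x in list_1 if x > mean(list_1)]", setup=setup, number=5)

# mean() is computed once before the comprehension runs.
hoisted = timeit.timeit(
    "m = mean(list_1); [x for x in list_1 if x > m]", setup=setup, number=5)

print(f"inline:  {inline:.4f}s")
print(f"hoisted: {hoisted:.4f}s")  # expect the hoisted version to win by a wide margin
```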

Related

How to use lambda with two parameters to find the element with the highest value?

I was given a task which was to find the largest value in a list using a lambda function. With the lambda function, I must have two parameters, and I am having a hard time figuring out how to retrieve elements from the list for use in the lambda function.
I understand how to perform the task by defining a function but I do not know how to translate that to a lambda function.
myList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Here I can find the max value by defining a function
def maxVal(list_a):
    max = 0
    for i in list_a:
        if i > max:
            max = i
    return max

print(maxVal(myList))

# Here I attempt to use a lambda function to find the max value
maxMyList = map(lambda x, y: x[0] >= y[0], myList, myList)
print(maxMyList(myList, myList))
Edit: Sorry for any confusion as this is my first time posting here. Just for clarity, I CANNOT define any functions for use in this program. I just wanted to post the code for the defined function maxVal to show that I understand the logic of what I need to do. Thank you for all of your responses, I really appreciate it!
There is really nothing wrong with the way you were going about this. You just needed to incorporate your lambda into the rest of your code in the right way:
import sys

myList = [1, 2, 3, 10, 4, 5, 6, 7, 8, 9]

def maxVal(list_a, compare_function):
    max = -sys.maxsize - 1  # a starting value smaller than any practical element
    for i in list_a:
        if compare_function(i, max):
            max = i
    return max

my_compare_function = lambda x, y: x > y
print(maxVal(myList, my_compare_function))
Result:
10
I did have to change at what level return max was being called. I don't know if this was a logic error, or just a transcription error when you put your code into a S.O. question. Also note that I moved the 10 to a different place in your input data to make sure it didn't have to be at the end to be found to be the largest value.
If you want a "cooler" and more modern answer, reduce is really designed to do just what you want. Here's a simple solution.
import functools
myList = [1,2,3,10,4,5,6,7,8,9]
r = functools.reduce(lambda v1, v2: max(v1, v2), myList)
print(r)
Result:
10
Also note that max itself is a function that takes two parameters, so you could use it directly to get something very simple (though this doesn't satisfy your assignment, which requires a lambda):

import functools

myList = [1, 2, 3, 10, 4, 5, 6, 7, 8, 9]
r = functools.reduce(max, myList)
print(r)
These functional equivalents to writing loops are all the rage, and for good reason. When combined with processing data as streams, it's a super powerful idea. And as it becomes more popular, I think it will be thought to be more readable. As an old timer, I'm just getting on board with all of this functional/streams programming...more in Java than Python, but still. It's a powerful set of tools that would be to your benefit to really understand well.
You can define a recursive lambda that takes a list as the first parameter and, optionally, a current maximum as the second. While the list is non-empty, it calls itself with the rest of the list as the first argument and, as the second, the first item if that item is bigger than the current maximum (or if the current maximum is None), otherwise the current maximum. Once the list is empty, it returns the current maximum:
maxVal = lambda lst, m=None: maxVal(lst[1:], m if m is not None and lst[0] < m else lst[0]) if lst else m
so that:
from random import shuffle
myList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
shuffle(myList)
print(maxVal(myList))
outputs: 10
# note: naming this max would shadow the built-in function
pairwise_max = lambda x, y: x if x > y else y
pairwise_max(1, 2)  # 2
Python has a built-in function for sorting lists: sorted().
For example, sorted([2, 8, 8, 3, 7, 4, 2, 93, 1]) returns [1, 2, 2, 3, 4, 7, 8, 8, 93].
We can use the slice [::-1] to reverse the list, so the completed lambda function is:
maxVal = lambda l: sorted(l)[::-1][0]
(Equivalently, sorted(l)[-1] takes the last element of the ascending sort directly.)

Using map to sum the elements of list

I was wondering if map can be used at all to sum the elements of a list.
assume a = [1, 2, 3, 4]
list(map(sum, a)) will give an error that int object is not iterable because list wants iterables.
map(sum, a) is a valid statement but given the object, I do not see an easy way to dereference it.
[map(sum, a)] will return an object inside the list
this answer states that it should be easy. What am I missing here?
map applies a function to every element in the list. Instead, you can use reduce:
a = [1, 2, 3, 4]
sum_a = reduce(lambda x, y:x+y, a)
In this case, purely sum can be used, however, to be more functional, reduce is a better option.
Or, in Python3:
from functools import reduce
a = [1, 2, 3, 4]
sum_a = reduce(lambda x, y:x+y, a)
x = list(map(sum, a))
Is equivalent to
x = []
for i in a:
    x.append(sum(i))
sum needs an iterable to add across. If you check the docs, the signature is sum(iterable[, start]). Since int is not an iterable, you get that error.
Of course, if you just want to sum the elements of a list, you should simply call sum(list_).
Now, coming to your question: map, both the Python built-in and the general pattern, refers to applying a function to a data sequence and yielding another sequence, with a separate result for each element of the initial sequence.
sum does not do that - it yields a single result for the whole sequence. That pattern is called reduce, and so is named the Python (ex-)built-in that does it. In Python 3 it was "demoted" to the functools module, as it is rarely used compared to the map pattern.
The sum built-in itself employs the "reduce" pattern alone - but if you were to explicitly recreate sum using the reduce pattern it goes like:
from functools import reduce
a = [1, 2, 3, 4]
reduce(lambda result, value: result + value, a, 0)
The first parameter is a callable that takes the "accumulated result so far" and the next value, the second parameter is the sequence of items you want to reduce over, and the third parameter is the initial value for the accumulated result (so it starts at zero). For a product, we could use:
reduce(lambda result, value: result * value, a, 1)
update: Python 3.8 implemented the "multiplicatory" in the standard library as math.prod.
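A quick check that math.prod (Python 3.8+) agrees with the reduce formulation:

```python
import math
from functools import reduce

a = [1, 2, 3, 4]

product_reduce = reduce(lambda result, value: result * value, a, 1)
product_builtin = math.prod(a)  # standard library, Python 3.8+

print(product_reduce, product_builtin)  # 24 24
```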
The error int object is not iterable is not because list expects an iterable, but sum expected an iterable.
The following code:
map(sum , [1,2,3,4])
Is somewhat equivalent to:
[sum(x) for x in [1,2,3,4]]
Executing the last expression yields the same error.
reduce(lambda x,y:x+y, L) #summing all elements of a list L
Using map reduce and printing the elapsed time in seconds
import time
from functools import reduce  # six.moves is unnecessary on Python 3

start = time.time()
L = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
result = reduce(lambda x, y: x + y, L)
end = time.time()

print("sum of list L ", L, " is equal to", result)
print("elapsed time is ", end - start, ' seconds')
output:
sum of list L [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] is equal to 55
elapsed time is 0.00014519691467285156 seconds
using python's build-in sum function and elapsed time
start=time.time()
s = sum([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
end=time.time()
print("elapsed time is ", end-start, ' seconds')
output:
elapsed time is 9.226799011230469e-05 seconds
Here the built-in sum is slightly faster (9e-05 s vs 1.4e-04 s). Single timings like these are noisy, but the direction is expected: sum is implemented in C and avoids calling a Python-level lambda for every element.
Here's one way to do it purely functionally:
from operator import add
from functools import reduce

a = [1, 2, 3, 4]
result = reduce(add, a)
Indirectly you can add all the elements of a list using a map function using a global variable like below:
# reading the file
with open('numbers.txt') as f:
    lines = [line.strip() for line in f]

numbers = [int(line) for line in lines]

all_sum = 0

def add(n):
    global all_sum
    all_sum += n
    return all_sum

result = map(add, numbers)
print(list(result)[-1])
There is only one number in one line in the text file.
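As an alternative to the global-variable trick, the standard library's itertools.accumulate yields the same running sums without mutable global state (a sketch with an inline list standing in for the file contents):

```python
from itertools import accumulate

numbers = [1, 2, 3, 4]  # stand-in for the values read from numbers.txt

running = list(accumulate(numbers))  # running sums, like successive add() calls
print(running)      # [1, 3, 6, 10]
print(running[-1])  # 10 -- the total, equivalent to sum(numbers)
```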

Optimization on Python list comprehension

[getattr(x, contact_field_map[communication_type])
for x in curr_role_group.contacts if
getattr(x, contact_field_map[communication_type])]
The above is my list comprehension. The initial function and the filter clause call getattr twice. Will Python run this twice or does it optimize the calculation internally knowing it can cache the result after the first call?
If Python doesn't do the optimization, how can I rewrite it to run faster?
Python will run the getattr twice -- it doesn't do any optimization (after all, how could it know that the first attribute fetch doesn't change the value of the second one?).
To optimize the query, you can do it in 2 stages. The first stage computes the values using a generator expression, the second stage filters those values:
gen = (getattr(x, contact_field_map[communication_type])
       for x in curr_role_group.contacts)
result = [item for item in gen if item]
Give this a try:
[res for x in curr_role_group.contacts
for res in [getattr(x, contact_field_map[communication_type])] if res]
For example, instead of
[i**2 for i in range(10) if i**2 < 10]
Out: [0, 1, 4, 9]
You can do
[res for i in range(10) for res in [i**2] if res < 10]
Out: [0, 1, 4, 9]
Here, you are computing i**2 only once.
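On Python 3.8+ an assignment expression (the "walrus" operator) gives the same compute-once behavior without the nested for; a sketch on the squaring example:

```python
# (y := i**2) binds the computed value, so i**2 is evaluated once per item
# and the bound y is reused as the output element.
result = [y for i in range(10) if (y := i**2) < 10]
print(result)  # [0, 1, 4, 9]
```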
Could you use a generator for the entire thing? Something like:
def gen():
    for x in curr_role_group.contacts:
        value = getattr(x, contact_field_map[communication_type])
        if value:
            yield value

result = list(gen())

What is the difference between `sorted(list)` vs `list.sort()`?

list.sort() sorts the list and replaces the original list, whereas sorted(list) returns a sorted copy of the list, without changing the original list.
When is one preferred over the other?
Which is more efficient? By how much?
Can a list be reverted to the unsorted state after list.sort() has been performed?
sorted() returns a new sorted list, leaving the original list unaffected. list.sort() sorts the list in place and returns None (like all in-place operations).
sorted() works on any iterable, not just lists. Strings, tuples, dictionaries (you'll get the keys), generators, etc., returning a list containing all elements, sorted.
Use list.sort() when you want to mutate the list, sorted() when you want a new sorted object back. Use sorted() when you want to sort something that is an iterable, not a list yet.
For lists, list.sort() is faster than sorted() because it doesn't have to create a copy. For any other iterable, you have no choice.
No, you cannot retrieve the original positions. Once you called list.sort() the original order is gone.
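To illustrate the point above that sorted() accepts any iterable, not just lists:

```python
# Strings, tuples, dicts (keys), and generators all work; a list comes back.
print(sorted("banana"))                   # ['a', 'a', 'a', 'b', 'n', 'n']
print(sorted((3, 1, 2)))                  # [1, 2, 3]
print(sorted({"b": 1, "a": 2}))           # ['a', 'b'] -- iterating a dict yields its keys
print(sorted(x * x for x in [3, -1, 2]))  # [1, 4, 9]
```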
What is the difference between sorted(list) vs list.sort()?
list.sort mutates the list in-place & returns None
sorted takes any iterable & returns a new list, sorted.
sorted is equivalent to this Python implementation, but the CPython builtin function should run measurably faster as it is written in C:
def sorted(iterable, key=None):
    new_list = list(iterable)  # make a new list
    new_list.sort(key=key)     # sort it
    return new_list            # return it
when to use which?
Use list.sort when you do not wish to retain the original sort order
(Thus you will be able to reuse the list in-place in memory.) and when
you are the sole owner of the list (if the list is shared by other code
and you mutate it, you could introduce bugs where that list is used.)
Use sorted when you want to retain the original sort order or when you
wish to create a new list that only your local code owns.
Can a list's original positions be retrieved after list.sort()?
No - unless you made a copy yourself, that information is lost because the sort is done in-place.
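If you might need the original order back later, record it yourself before sorting; one sketch keeps the permutation that the sort applies:

```python
data = [3, 1, 2]

# Remember where each element came from: perm[k] is the original index of
# the element that ends up at position k after sorting.
perm = sorted(range(len(data)), key=data.__getitem__)
data.sort()  # in-place sort; the original order is now gone from data itself

# Undo the sort using the recorded permutation.
restored = [None] * len(data)
for new_pos, old_pos in enumerate(perm):
    restored[old_pos] = data[new_pos]

print(data)      # [1, 2, 3]
print(restored)  # [3, 1, 2]
```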
"And which is faster? And how much faster?"
To illustrate the penalty of creating a new list, use the timeit module, here's our setup:
import timeit

setup = """
import random
lists = [list(range(10000)) for _ in range(1000)]  # list of lists
for l in lists:
    random.shuffle(l)  # shuffle each list
shuffled_iter = iter(lists)  # wrap as iterator so next() yields one at a time
"""
And here are our results for lists of 10,000 randomly arranged integers. As we can see, we've disproven an older myth about the expense of list creation:
Python 2.7
>>> timeit.repeat("next(shuffled_iter).sort()", setup=setup, number = 1000)
[3.75168503401801, 3.7473005310166627, 3.753129180986434]
>>> timeit.repeat("sorted(next(shuffled_iter))", setup=setup, number = 1000)
[3.702025591977872, 3.709248117986135, 3.71071034099441]
Python 3
>>> timeit.repeat("next(shuffled_iter).sort()", setup=setup, number = 1000)
[2.797430992126465, 2.796825885772705, 2.7744789123535156]
>>> timeit.repeat("sorted(next(shuffled_iter))", setup=setup, number = 1000)
[2.675589084625244, 2.8019039630889893, 2.849375009536743]
After some feedback, I decided another test would be desirable with different characteristics. Here I provide the same randomly ordered list of 100,000 in length for each iteration 1,000 times.
import timeit
setup = """
import random
random.seed(0)
lst = list(range(100000))
random.shuffle(lst)
"""
I interpret the difference in this larger sort as coming from the copying mentioned by Martijn, but it does not dominate to the degree stated in the older, more popular answer here; the increase in time is only about 10%:
>>> timeit.repeat("lst[:].sort()", setup=setup, number = 10000)
[572.919036605, 573.1384446719999, 568.5923951]
>>> timeit.repeat("sorted(lst[:])", setup=setup, number = 10000)
[647.0584738299999, 653.4040515829997, 657.9457361929999]
I also ran the above on a much smaller sort, and saw that the new sorted copy version still takes about 2% longer running time on a sort of 1000 length.
Poke ran his own code as well; here's the code:
from timeit import timeit

setup = '''
import random
random.seed(12122353453462456)
lst = list(range({length}))
random.shuffle(lst)
lists = [lst[:] for _ in range({repeats})]
it = iter(lists)
'''

t1 = 'l = next(it); l.sort()'
t2 = 'l = next(it); sorted(l)'

length = 10 ** 7
repeats = 10 ** 2
print(length, repeats)

for t in t1, t2:
    print(t)
    print(timeit(t, setup=setup.format(length=length, repeats=repeats), number=repeats))
He found a similar result for sorting lists of 10,000,000 elements (run 100 times), with only about a 5% increase in time; here's the output:
10000000 100
l = next(it); l.sort()
610.5015971539542
l = next(it); sorted(l)
646.7786222379655
Conclusion:
For a large list, the copy that sorted makes adds measurable overhead, but the sorting itself dominates the operation, and organizing your code around these differences would be premature optimization. I would use sorted when I need a new sorted list of the data, and list.sort when I need to sort a list in place, and let that determine my usage.
The main difference is that sorted(some_list) returns a new list:
a = [3, 2, 1]
print(sorted(a))  # new list
print(a)          # is not modified
and some_list.sort() sorts the list in place:
a = [3, 2, 1]
print(a.sort())   # in place
print(a)          # it's modified
Note that since a.sort() doesn't return anything, print(a.sort()) will print None.
Can a list original positions be retrieved after list.sort()?
No, because it modifies the original list.
Here are a few simple examples to see the difference in action:
See the list of numbers here:
nums = [1, 9, -3, 4, 8, 5, 7, 14]
When calling sorted on this list, sorted will make a copy of the list. (Meaning your original list will remain unchanged.)
Let's see.
sorted(nums)
returns
[-3, 1, 4, 5, 7, 8, 9, 14]
Looking at nums again
nums
we see the original list (unaltered and NOT sorted). sorted did not change the original list:
[1, 9, -3, 4, 8, 5, 7, 14]
Taking the same nums list and applying the sort method on it will change the actual list.
Let's see.
Starting with our nums list to make sure the content is still the same:
nums
[1, 9, -3, 4, 8, 5, 7, 14]
nums.sort()
Now the original nums list is changed; looking at nums we see it has been sorted in place:
nums
[-3, 1, 4, 5, 7, 8, 9, 14]
Note: The simplest difference between sort() and sorted() is that sort()
doesn't return any value, while sorted() returns a new list.
sort() doesn't return any value.
The sort() method just sorts the elements of a given list in a specific order - Ascending or Descending without returning any value.
The syntax of sort() method is:
list.sort(key=..., reverse=...)
Alternatively, you can also use Python's in-built function sorted()
for the same purpose. The sorted function returns a new sorted list:
sorted_list = sorted(original_list, key=..., reverse=...)
The .sort() method overwrites the list in place, so the answer to your third question is no: once sorted, the original order is gone.
sorted(list), on the other hand, leaves the list untouched; you have to store its return value in a variable explicitly if you want to keep the sorted result.
For short lists the speed difference is negligible; for long lists .sort() is somewhat faster because it skips the copy, but the operation is irreversible.
With list.sort() you are altering the list variable but with sorted(list) you are not altering the variable.
Using sort() (note: naming a variable list would shadow the built-in, so use another name):
my_list = [4, 5, 20, 1, 3, 2]
my_list.sort()
print(my_list)
print(type(my_list))
print(type(my_list.sort()))
Should print this:
[1, 2, 3, 4, 5, 20]
<class 'list'>
<class 'NoneType'>
But using sorted():
my_list = [4, 5, 20, 1, 3, 2]
print(sorted(my_list))
print(my_list)
print(type(sorted(my_list)))
Should print this:
[1, 2, 3, 4, 5, 20]
[4, 5, 20, 1, 3, 2]
<class 'list'>

Optimized method of cutting/slicing sorted lists

Is there any pre-made optimized tool/library in Python to cut/slice lists for values "less than" something?
Here's the issue: Let's say I have a list like:
a=[1,3,5,7,9]
and I want to delete all the numbers which are <= 6, so the resulting list would be
[7,9]
6 is not in the list, so I can't use the built-in index(6) method of the list. I can do things like:
#!/usr/bin/env python3
a = [1, 3, 5, 7, 9]
cut = 6
for i in range(len(a) - 1, -2, -1):
    if a[i] <= cut:
        break
b = a[i + 1:]
print("Cut list: %s" % b)
which is a fairly quick method if the index to cut from is close to the end of the list, but inefficient if it is close to the beginning (for a small cut value, the loop walks almost the whole list).
I can also implement my own find method using binary search or such, but I was wondering if there's a more... wide-scope built in library to handle this type of things that I could reuse in other cases (for instance, if I need to delete all the number which are >=6).
Thank you in advance.
You can use the bisect module to perform a sorted search:
>>> import bisect
>>> a[bisect.bisect_left(a, 6):]
[7, 9]
bisect.bisect_left is what you are looking for, I guess.
If you just want to filter the list for all elements that fulfil a certain criterion, then the most straightforward way is to use the built-in filter function.
Here is an example:
a_list = [10, 2, 3, 8, 1, 9]
# filter all elements smaller than 6:
filtered_list = list(filter(lambda x: x < 6, a_list))
filtered_list will contain:
[2, 3, 1]
(In Python 3, filter returns a lazy iterator, hence the list() call.)
Note: This method does not rely on the ordering of the list, so for very large lists it might be that a method optimised for ordered searching (as bisect) performs better in terms of speed.
Bisect left and right helper function
#!/usr/bin/env python3
import bisect

def get_slice(list_, left, right):
    return list_[
        bisect.bisect_left(list_, left):
        bisect.bisect_left(list_, right)
    ]

assert get_slice([0, 1, 1, 3, 4, 4, 5, 6], 1, 5) == [1, 1, 3, 4, 4]
Tested in Ubuntu 16.04, Python 3.5.2.
Adding to Jon's answer: if you need to actually delete the elements up to and including 6, keeping the same list reference rather than returning a new one:
del a[:bisect.bisect_right(a, 6)]
You should note as well that bisect will only work on a sorted list.
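Putting it together: on a sorted list, bisect_left and bisect_right cover all four cut directions (a sketch using the question's data):

```python
import bisect

a = [1, 3, 5, 7, 9]
cut = 6

print(a[bisect.bisect_right(a, cut):])  # keep elements >  cut -> [7, 9]
print(a[bisect.bisect_left(a, cut):])   # keep elements >= cut -> [7, 9]
print(a[:bisect.bisect_left(a, cut)])   # keep elements <  cut -> [1, 3, 5]
print(a[:bisect.bisect_right(a, cut)])  # keep elements <= cut -> [1, 3, 5]
```

When the cut value is actually present in the list, bisect_left and bisect_right differ, which is exactly what distinguishes the >= and > (or < and <=) cases.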
