[getattr(x, contact_field_map[communication_type])
for x in curr_role_group.contacts if
getattr(x, contact_field_map[communication_type])]
The above is my list comprehension. Both the output expression and the filter clause call getattr. Will Python run getattr twice per element, or does it optimize internally, knowing it can cache the result after the first call?
If Python doesn't do the optimization, how can I rewrite it to run faster?
Python will run the getattr twice -- it doesn't do any optimization (after all, how would it know that the first attribute fetch doesn't change the result of the second one?).
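A quick way to see the double evaluation (a contrived sketch; the Contact class and the counting wrapper are made up purely for illustration):
class Contact:
    def __init__(self, email):
        self.email = email

calls = 0

def counted_getattr(obj, name):
    # thin wrapper around getattr so we can count how often it runs
    global calls
    calls += 1
    return getattr(obj, name)

contacts = [Contact("a@example.com"), Contact(""), Contact("b@example.com")]
emails = [counted_getattr(x, "email") for x in contacts if counted_getattr(x, "email")]
print(calls)  # 5: twice per contact that passes the filter, once for the one that doesn't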
To optimize the query, you can do it in 2 stages. The first stage computes the values using a generator expression, the second stage filters those values:
gen = (getattr(x, contact_field_map[communication_type])
for x in curr_role_group.contacts)
result = [item for item in gen if item]
Give this a try:
[res for x in curr_role_group.contacts
for res in [getattr(x, contact_field_map[communication_type])] if res]
For example, instead of
[i**2 for i in range(10) if i**2 < 10]
Out: [0, 1, 4, 9]
You can do
[res for i in range(10) for res in [i**2] if res < 10]
Out: [0, 1, 4, 9]
Here, you are computing i**2 only once per element.
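On Python 3.8 and later, an assignment expression (the walrus operator) gets the same single evaluation without the nested-for trick (a small sketch):
# the filter runs first and binds y, which the output expression then reuses
[y for i in range(10) if (y := i**2) < 10]
Out: [0, 1, 4, 9]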
Could you use a generator for the entire thing? Something like:
def gen():
    for x in curr_role_group.contacts:
        value = getattr(x, contact_field_map[communication_type])
        if value:
            yield value
result = list(gen())
In a list comprehension with a condition that contains a function call, does Python (specifically CPython 3.9.4) call the function for each element, or does it calculate the value once and then reuse it?
For example if you have:
list_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
list_2 = [x for x in list_1 if x > np.average(list_1)]
Will Python actually calculate the np.average(list_1) len(list_1) times? So would it be more optimized to write
list_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
np_avg = np.average(list_1)
list_2 = [x for x in list_1 if x > np_avg]
instead? Or does Python already "know" to just calculate the average beforehand?
Python has to call the function each time. It cannot optimize that part, because successive calls of the function might return different results (for example because of side effects). There is no easy way for Python’s compiler to be sure that this can’t happen.
Therefore, if you (the programmer) know that the result will always be the same – like in this case – it is probably advisable to calculate the result of the function in advance and use it inside the list comprehension.
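To see why the compiler cannot hoist the call on its own, consider a contrived filter function whose result changes between calls (a sketch; noisy_threshold is made up for illustration):
import random

def noisy_threshold():
    # returns a different value on every call, so hoisting it out of the loop would change behavior
    return random.random() * 10

values = [x for x in range(10) if x > noisy_threshold()]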
Assuming standard CPython - Short answer: Yes. Your second snippet is more efficient.
A function call in the filter part of a list comprehension will be called for each element.
We can test this quite easily with a trivial example:
def f(value):
    """ Allow even values only """
    print('function called')
    return value % 2 == 0

mylist = [x for x in range(5) if f(x)]
# 'function called' will be printed 5 times
The above is somewhat equivalent to doing:
mylist = []
for x in range(5):
    if f(x):
        mylist.append(x)
Since you're comparing against the same average each time, you can indeed just calculate it beforehand and use the same value as you did in your second code snippet.
I was given a task which was to find the largest value in a list using a lambda function. The lambda function must have two parameters, and I am having a hard time figuring out how to retrieve elements from the list to use in it.
I understand how to perform the task by defining a function but I do not know how to translate that to a lambda function.
myList = [1,2,3,4,5,6,7,8,9,10]
#Here I can find the max value by defining a function
def maxVal(list_a):
    max = 0
    for i in list_a:
        if i > max:
            max = i
        return max
print(maxVal(myList))
#Here I attempt to use a lambda function to find the max value
maxMyList = map(lambda x,y: x[0] >= y[0], myList, myList)
print(maxMyList(myList,myList))
Edit: Sorry for any confusion as this is my first time posting here. Just for clarity, I CANNOT define any functions for use in this program. I just wanted to post the code for the defined function maxVal to show that I understand the logic of what I need to do. Thank you for all of your responses, I really appreciate it!
There is really nothing wrong with the way you were going about this. You just needed to incorporate your lambda into the rest of your code in the right way:
import sys

myList = [1, 2, 3, 10, 4, 5, 6, 7, 8, 9]

def maxVal(list_a, compare_function):
    max = -sys.maxsize - 1  # a conventional "smallest possible" sentinel, though Python ints are actually unbounded
    for i in list_a:
        if compare_function(i, max):
            max = i
    return max

my_compare_function = lambda x, y: x > y
print(maxVal(myList, my_compare_function))
Result:
10
I did have to change the level at which return max was called. I don't know whether this was a logic error or just a transcription error when you put your code into a S.O. question. Also note that I moved the 10 to a different place in your input data, to make sure the largest value doesn't have to be at the end of the list to be found.
If you want a "cooler" and more modern answer, reduce is really designed to do just what you want. Here's a simple solution.
import functools
myList = [1,2,3,10,4,5,6,7,8,9]
r = functools.reduce(lambda v1, v2: max(v1, v2), myList)
print(r)
Result:
10
Also note that max itself is a function that takes two (or more) parameters, so you could use it directly to get something very simple...though that doesn't satisfy your case, where the assignment is to use a lambda:
myList = [1,2,3,10,4,5,6,7,8,9]
r = functools.reduce(max, myList)
print(r)
These functional equivalents to writing loops are all the rage, and for good reason. When combined with processing data as streams, it's a super powerful idea. And as it becomes more popular, I think it will be thought to be more readable. As an old timer, I'm just getting on board with all of this functional/streams programming...more in Java than Python, but still. It's a powerful set of tools that would be to your benefit to really understand well.
You can define a function that takes a list as its first parameter and, optionally, a current maximum as its second. It recursively calls itself with the rest of the list as the first argument and, as the second, the first item of the list if that item is bigger than the current maximum (or if the current maximum is None), otherwise the current maximum. When the list is empty, it returns the current maximum:
maxVal = lambda lst, m=None: maxVal(lst[1:], m if m is not None and lst[0] < m else lst[0]) if lst else m
so that:
from random import shuffle
myList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
shuffle(myList)
print(maxVal(myList))
outputs: 10
max = (lambda x, y: x if x > y else y)
max(1, 2)
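This compares only two values and also shadows the built-in max; to apply the same two-argument lambda across the whole list, one option is to fold it with functools.reduce (a sketch, with the lambda renamed to avoid the shadowing):
from functools import reduce

largest = lambda x, y: x if x > y else y  # same comparison, renamed so the built-in max stays usable

myList = [1, 2, 3, 10, 4, 5, 6, 7, 8, 9]
print(reduce(largest, myList))  # 10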
Python has a built-in function for sorting lists. It's sorted()
Ex: sorted([2, 8, 8, 3, 7, 4, 2, 93, 1]) = [1, 2, 2, 3, 4, 7, 8, 8, 93]
We can use the slice [::-1] to reverse the sorted list, so the completed lambda function is:
maxVal = lambda l: sorted(l)[::-1][0]
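A minor variant of the same idea: index the sorted list from the end instead of building a reversed copy (still O(n log n) overall, unlike a plain max or reduce):
maxVal = lambda l: sorted(l)[-1]
print(maxVal([2, 8, 8, 3, 7, 4, 2, 93, 1]))  # 93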
I was wondering if map can be used at all to sum the elements of a list.
assume a = [1, 2, 3, 4]
list(map(sum, a)) will give an error that int object is not iterable because list wants iterables.
map(sum, a) is a valid statement, but given the resulting map object, I do not see an easy way to get a value out of it.
[map(sum, a)] will return an object inside the list
this answer states that it should be easy. What am I missing here?
map applies a function to every element in the list. Instead, you can use reduce:
a = [1, 2, 3, 4]
sum_a = reduce(lambda x, y:x+y, a)
In this case plain sum would do, but to stay functional, reduce is an option.
Or, in Python 3:
from functools import reduce
a = [1, 2, 3, 4]
sum_a = reduce(lambda x, y:x+y, a)
x = list(map(sum, a))
Is equivalent to
x = []
for i in a:
    x.append(sum(i))
sum needs an iterable to operate on. The documented signature is sum(iterable[, start]); since an int is not an iterable, you get that error.
Of course, if one just wants to sum the elements of a list, they should simply call sum(list_).
Now, coming to your question: map, both the Python built-in and the pattern, refers to applying a function to a sequence of data and yielding another sequence, with a separate result for each element of the initial sequence.
sum does not do that -- it yields a single result for the whole sequence. That pattern is called reduce, and so is the Python function that implements it. In Python 3 it was "demoted" from a built-in to the functools module, as it is rarely used compared with the map pattern.
The sum built-in itself employs the "reduce" pattern alone - but if you were to explicitly recreate sum using the reduce pattern it goes like:
from functools import reduce
a = [1, 2, 3, 4]
reduce(lambda result, value: result + value, a, 0)
The first parameter is a callable that takes the accumulated result so far and the next value; the second is the sequence of items you want to reduce; and the third is the initial value passed in as the accumulated result (so it starts at zero). For a product, we could use:
reduce(lambda result, value: result * value, a, 1)
Update: Python 3.8 added this product reduction to the standard library as math.prod.
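For instance, on Python 3.8 or newer (a small usage sketch):
import math

print(math.prod([1, 2, 3, 4]))  # 24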
The error int object is not iterable is raised not because list expects an iterable, but because sum does.
The following code:
map(sum, [1, 2, 3, 4])
Is somewhat equivalent to:
[sum(x) for x in [1,2,3,4]]
Executing the last expression yields the same error.
reduce(lambda x,y:x+y, L) #summing all elements of a list L
Using reduce and printing the elapsed time in seconds:
import time
from functools import reduce
start=time.time()
L = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
result = reduce(lambda x,y:x+y, L)
end=time.time()
print("sum of list L ", L, " is equal to", result)
print("elapsed time is ", end-start, ' seconds')
output:
sum of list L [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] is equal to 55
elapsed time is 0.00014519691467285156 seconds
Using Python's built-in sum function and the elapsed time:
start=time.time()
s = sum([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
end=time.time()
print("elapsed time is ", end-start, ' seconds')
output:
elapsed time is 9.226799011230469e-05 seconds
In this single run, sum was slightly faster (about 9.2e-05 seconds versus 1.5e-04 seconds).
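A single time.time() measurement like this is quite noisy; if you want a more reliable comparison, the standard timeit module runs each snippet many times (a sketch):
import timeit
from functools import reduce

L = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(timeit.timeit(lambda: sum(L), number=100000))
print(timeit.timeit(lambda: reduce(lambda x, y: x + y, L), number=100000))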
Here's one way to do it purely functionally.
from operator import add
from functools import reduce
result = reduce(add, a)
Indirectly, you can add all the elements of a list with a map call by accumulating into a global variable, like below:
# reading the file
with open('numbers.txt') as f:
    lines = [line.strip() for line in f]
numbers = [int(line) for line in lines]

all_sum = 0

def add(n):
    global all_sum
    all_sum += n
    return all_sum

result = map(add, numbers)
print(list(result)[-1])
The text file contains one number per line.
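As an aside, itertools.accumulate produces the same running sums without a global variable; its last element is the total (a small sketch on a plain list of numbers):
from itertools import accumulate

numbers = [3, 1, 4, 1, 5]
running = list(accumulate(numbers))  # [3, 4, 8, 9, 14]
print(running[-1])  # 14, the total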
I am trying to implement lazy partitioning of an iterator object that yields slices of the iterator when a function of an element of the iterator changes value. This would mimic the behavior of Clojure's partition-by (although the semantics of the output would be different, since Python would genuinely "consume" the elements). My implementation is optimal in the number of operations it performs but not in the memory it requires. I don't see why a good implementation would need more than O(1) memory, but my implementation takes up O(k) memory, where k is the size of the partition. I would like to be able to handle cases where k is large. Does anyone know of a good implementation?
The correct behavior should be something like
>>> unagi = [-1, 3, 4, 7, -2, 1, -3, -5]
>>> parts = partitionby(lambda x: x < 0,unagi)
>>> print [[y for y in x] for x in parts]
[[-1], [3, 4, 7], [-2], [1], [-3, -5]]
Here is my current version
from itertools import *

def partitionby(f, iterable):
    seq = iter(iterable)
    current = next(seq)
    justseen = next(seq)
    partition = iter([current])
    while True:
        if f(current) == f(justseen):
            partition = chain(partition, iter([justseen]))
            try:
                justseen = next(seq)
            except StopIteration:
                yield partition
                break
        else:
            yield partition
            current = justseen
            partition = iter([])
Why not reuse groupby? I think it is O(1).
from itertools import groupby

def partitionby(f, iterable):
    return (g[1] for g in groupby(iterable, f))
The difference between groupby's implementation and yours is that each partition is a specialized iterator object, rather than a chain of chain of chain ...
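A quick check of the groupby-based partitionby above against the example input from the question (each group is materialized before the next one is requested, which groupby requires):
unagi = [-1, 3, 4, 7, -2, 1, -3, -5]
print([[y for y in g] for g in partitionby(lambda x: x < 0, unagi)])
# [[-1], [3, 4, 7], [-2], [1], [-3, -5]]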
It was bugging me that partition could be a normal list instead of an iterator, i.e.:
partition = iter([current])
partition = chain(partition, iter([justseen]))
partition = iter([])
could be:
partition = [current]
partition.append(justseen)
partition = []
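Putting those substitutions back into the original function gives a list-based version (a sketch that keeps the original control flow, including its assumption of at least two input elements):
def partitionby(f, iterable):
    seq = iter(iterable)
    current = next(seq)
    justseen = next(seq)
    partition = [current]
    while True:
        if f(current) == f(justseen):
            partition.append(justseen)
            try:
                justseen = next(seq)
            except StopIteration:
                yield partition
                break
        else:
            yield partition
            current = justseen
            partition = []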
I'm looking to write a function that takes the integers within a list, such as [1, 2, 3], and returns a new list with the squared integers; [1, 4, 9]
How would I go about this?
PS - just before I was about to hit submit I noticed Chapter 14 of O'Reilly's 'Learning Python' seems to provide the explanation I'm looking for (Pg. 358, 4th Edition)
But I'm still curious to see what other solutions are possible
You can (and should) use list comprehension:
squared = [x**2 for x in lst]
map makes one function call per element and while lambda expressions are quite handy, using map + lambda is mostly slower than list comprehension.
Python Patterns - An Optimization Anecdote is worth a read.
Besides lambdas and list comprehensions, you can also use generators. A list comprehension calculates all the squares as soon as it is evaluated, while a generator calculates each square as you iterate. Generators are better when the input is large or when you only need some initial part of the results.
def generate_squares(a):
    for x in a:
        yield x**2

# this is equivalent to above
b = (x**2 for x in a)
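For example, to use only the first few squares without computing the rest, you can slice the generator lazily (a small sketch with itertools.islice):
from itertools import islice

a = range(10000)
first_three = list(islice((x**2 for x in a), 3))
print(first_three)  # [0, 1, 4]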
squared = lambda li: map(lambda x: x*x, li)
You should know about the map built-in, which takes a function as its first argument and an iterable as its second, and returns a list (in Python 2; an iterator in Python 3) of the results of applying the function to each item.
For e.g.
>>> def sqr(x):
... return x*x
...
>>> map(sqr,range(1,10))
[1, 4, 9, 16, 25, 36, 49, 64, 81]
>>>
There is a shorter way of writing the sqr function above, namely an anonymous lambda, whose somewhat quirky syntax can confuse beginners looking for the return statement.
>>> map(lambda x: x*x,range(1,10))
[1, 4, 9, 16, 25, 36, 49, 64, 81]
Apart from that you can use list comprehension too.
result = [x*x for x in range(1,10)]
a = [1, 2, 3]
b = [x ** 2 for x in a]
Good remark by kefeizhou, but then there is no need for a generator function; a generator expression will do:
for sq in (x*x for x in li):
    ...  # do something with sq
You can use lambda with map to get this.
lst=(3,8,6)
sqrs=map(lambda x:x**2,lst)
print sqrs