I encountered this problem while adopting the lazy object pattern: at some point, one of the functions (a user input) might be a constant function. I want to check whether the function is constant before feeding it into the loop.
My current solution is a somewhat ugly workaround using np.allclose:
import numpy as np

def is_constant(func, arr):
    return np.allclose(func(arr), func(arr[0]))
You can also compare the extremes of the output, e.g. func(arr).max() == func(arr).min(), which can be slightly faster.
But I was wondering whether there is a faster way to do this, since the above already evaluates the function over a fairly large array.
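One cheaper option (my own sketch, assuming arr is a 1-D NumPy array and func is vectorized): evaluate the function on a small random sample first, and only run the full check if the sample looks constant.

import numpy as np

def might_be_constant(func, arr, n_samples=16):
    # Hypothetical pre-check: evaluate the function on a few sampled points only.
    idx = np.random.choice(len(arr), size=min(n_samples, len(arr)), replace=False)
    sample = arr[idx]
    return np.allclose(func(sample), func(sample[0]))

A False result here is conclusive (the function is not constant), while a True result should still be confirmed with the full is_constant check.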
I'm working on mortgage calculations and many functions calculate multiple different values that I need to return.
For example:
def calculate_something(incomes, debts):
    ...
    return {'value': 150000, 'extra': {'dsti_gross': 450, 'dsti_supergross': 480}}
has many intermediate calculations that are useful in other calculations. These functions cannot be split into smaller functions, as that would make the code slower.
I don't want to return tuples as the code might change in the future and I'd need to check all the function calls.
A dictionary is currently the way to go, but dictionary key access isn't statically analyzed by most IDEs, so you always need to re-check the function to be sure you are accessing the correct keys.
Is there a better way to work with such return values? I'm considering classes/objects, but every function returns different keys.
To expand on the suggestion to use classes and subclasses, you can take advantage of Python dataclasses, which are pretty much designed for this kind of problem.
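A minimal sketch, mirroring the dictionary in the question (the class and field names here are made up; each function would declare its own result class):

from dataclasses import dataclass

@dataclass
class DstiExtra:
    dsti_gross: float
    dsti_supergross: float

@dataclass
class CalculationResult:
    value: float
    extra: DstiExtra

def calculate_something(incomes, debts):
    ...
    return CalculationResult(value=150000,
                             extra=DstiExtra(dsti_gross=450, dsti_supergross=480))

With this, an IDE can autocomplete result.extra.dsti_gross and flag typos statically, which addresses the key-checking problem with dictionaries.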
I'm using Python 3.8. I'm trying to stop using loops and instead use vectorization to speed up my code. I'm not too sure how to vectorize an equation that uses the result from the step before.
I know how to do basic vectorization, like changing this:
for i in range(5):
    j = i * 2
into this:
import numpy as np

i = np.arange(5)
j = i * 2
but how would I translate something like this, which uses the value from the previous step, into a vectorized equation?
j = 0
for i in range(1, 5):
    k = i * 2 + j
    j = i
If a value in the vector depends on previous components, this cannot be fully parallelised. Depending on the particular operation, however, you can make the algorithm more efficient by using other operations in a smart way: for instance, here the cumulative sum is used to do better than a naive for loop. And here you have another alternative (and a benchmark, although it's done in MATLAB).
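For the specific loop in the question the dependency is shallow: j is just the previous value of i, so the whole thing can be expressed with a shifted copy of the index array. A sketch:

import numpy as np

i = np.arange(1, 5)
j = np.concatenate(([0], i[:-1]))  # previous i, with the initial j = 0
k = i * 2 + j                      # matches k from the original loop

# For a true running total such as j += i, np.cumsum does the same job:
totals = np.cumsum(np.arange(1, 5))

Deeper recurrences, where each value is a nontrivial function of the previous result, generally cannot be vectorized this way.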
I have many if statements and I want to find the most efficient ordering for them.
Each case has a different time complexity and gets triggered a different number of times. For example, case1 might be Θ(n^2) vs. case3 at Θ(log(n)), but the second one stops the function earlier, so it could be better to place it first.
How would I go about finding the most efficient way to order the if statements?
def function1():
    if case1:
        return False
    if case2:
        return False
    if case3:
        return False
    # caseN...
    return True
To make an informed design decision, you need to define your goal:
If the function should return as fast as possible, then the fastest case should probably go first.
But if you want the function to return fast on average over all input data, then you need to know which case is most likely for your input data; the case that you expect most input data to fall into should probably go first. That way, the average run time of the function is reduced.
But to make the best decision you need to consider a lot of factors:
the type of your data (sets, lists, strings, ...)
the size of the input and how many times the function will be invoked
the operations that are done in each case (is it a search, some heavy computation, a simple arithmetic operation, ...)
investigate the possibility of including short-circuit conditions in your cases so that they fail early
If you want more help, you should add more specific information to your question.
What is your goal? Is it the reduction of the average time, the worst-case time or some other measure?
Minimum average time:
If you have statistical information about the time needed for each case, its relative probability, and the cost of a non-satisfied if condition, you can calculate the expected run time for each ordering of if-clauses and choose the best one.
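A sketch of that expectation calculation, with entirely made-up probabilities and costs (the real numbers would come from profiling your input data):

from itertools import permutations

# (probability the case triggers, cost of evaluating its condition)
cases = {'case1': (0.6, 5.0), 'case2': (0.3, 1.0), 'case3': (0.1, 0.2)}

def expected_cost(order):
    total, p_reached = 0.0, 1.0
    for name in order:
        p, cost = cases[name]
        total += p_reached * cost  # we pay this check whenever control reaches it
        p_reached *= 1.0 - p       # probability of falling through to the next check
    return total

best_order = min(permutations(cases), key=expected_cost)

This assumes the cases are statistically independent, which is worth verifying before trusting the result.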
(If applicable, avoiding the sequence of if-clauses - as Austin and Shadab Ahmed have already proposed - would be a good idea.)
Try the code below if you want something like a switch statement:
def f(x):
    return {
        'a': 1,
        'b': 2,
    }[x]
If you want to run all the if conditions anyway, you can try running the functions in parallel; the worst-case time complexity will then be the maximum over all the functions. For an example of running functions in parallel, see Python: How can I run python functions in parallel?
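A minimal sketch of that idea using the standard library (the checks list and its callables are hypothetical; threads only pay off here if the checks release the GIL, e.g. they are I/O-bound):

from concurrent.futures import ThreadPoolExecutor

def function1_parallel(checks):
    # `checks` is a list of zero-argument callables, one per case.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda check: check(), checks))
    return not any(results)  # False if any case triggered, True otherwise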
I am new-ish to Python and I am finding that I am writing the same pattern of code over and over again:
def foo(list):
    results = []
    for n in list:
        # do some or a lot of processing on n, possibly using other variables
        nprime = operation(n)
        results.append(nprime)
    return results
I am thinking in particular about the creation of the empty list followed by the append call. Is there a more Pythonic way to express this pattern? append might not have the best performance characteristics, but I am not sure how else I would approach it in Python.
I often know exactly the length of my output, so calling append each time seems like it might be causing memory fragmentation, or performance problems, but I am also wondering if that is just my old C ways tripping me up. I am writing a lot of text parsing code that isn't super performance sensitive on any particular loop or piece because all of the performance is really contained in gensim or NLTK code and is in much more capable hands than mine.
Is there a better/more pythonic pattern for doing this type of operation?
First, a list comprehension may be all you need (if all the processing mentioned in your comment occurs in operation):
def foo(list):
    return [operation(n) for n in list]
If a list comprehension will not work in your situation, consider whether foo really needs to build the list and could be a generator instead.
def foo(list):
    for n in list:
        # Processing...
        yield operation(n)
In this case, you can iterate over the sequence, and each value is calculated on demand:
for x in foo(myList):
...
or you can let the caller decide if a full list is needed:
results = list(foo(myList))
If neither of the above is suitable, then building up the return list in the body of the loop as you are now is perfectly reasonable.
[..] so calling append each time seems like it might be causing memory fragmentation, or performance problems, but I am also wondering if that is just my old C ways tripping me up.
If you are worried about this, don't be. Python over-allocates whenever a list needs to be resized (lists are dynamically resized based on their length) in order to make appends amortized O(1). Whether you call list.append manually or build the list with a comprehension (which internally also uses append), the effect, memory-wise, is similar.
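You can observe the over-allocation directly; a small demonstration (the exact numbers vary by CPython version):

import sys

lst, sizes = [], []
for i in range(20):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# The reported size grows in occasional jumps rather than on every append,
# which is the over-allocation at work.
print(sizes)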
The list comprehension just performs a bit better speed-wise; it is optimized for creating lists, with specialized byte-code instructions that aid it (mainly LIST_APPEND, which calls the list's append directly in C).
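If you want to see the difference on your own machine, a quick timeit comparison (numbers will vary; operation here is a stand-in for your processing):

from timeit import timeit

setup = "def operation(n): return n * 2"
loop = """
results = []
for n in range(1000):
    results.append(operation(n))
"""
comp = "[operation(n) for n in range(1000)]"

print("for loop:      ", timeit(loop, setup=setup, number=1000))
print("comprehension: ", timeit(comp, setup=setup, number=1000))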
Of course, if memory usage is of concern, you could always opt for the generator approach, as highlighted in chepner's answer, to lazily produce your results.
In the end, for loops are still great. They might seem clunky in comparison to comprehensions and maps but they still offer a recognizable and readable way to achieve a goal. for loops deserve our love too.
I find myself constantly having to change and adapt old code back and forth repeatedly for different purposes, but occasionally to implement the same purpose it had two versions ago.
One example of this is a function which deals with prime numbers. Sometimes what I need from it is a list of n primes. Sometimes what I need is the nth prime. Maybe I'll come across a third need from the function down the road.
Whichever way I do it, though, I have to run the same process and just return different values. I thought there must be a better way to do this than constantly changing the same code. The possible alternatives I have come up with are:
Return a tuple or a list, but this seems kind of messy, since there will be all kinds of data types within, including lists of thousands of items.
Use input statements to direct the code, though I would rather just have it do everything for me when I click run.
Figure out how to utilize class features to return class properties and access them where I need them. This seems to be the cleanest solution to me, but I am not sure since I am still new to this.
Just make five versions of every reusable function.
I don't want to be a bad programmer, so which choice is the correct choice? Or maybe there is something I could do which I have not thought of.
Modular, reusable code
Your question is indeed important. It's important in a programmer's everyday life. It is the question:
Is my code reusable?
If it's not, you will run into code redundancy, having the same lines of code in more than one place. This is the best starting point for bugs. Imagine you want to change the behavior somehow, e.g., because you discovered a potential problem. Then you change it in one place, but forget the second location. This gets especially likely once your code reaches dimensions like 1,000, 10,000 or 100,000 lines of code.
This is summarized in the SRP, the Single-Responsibility Principle. It states that every class (and, by extension, every function) should have only one responsibility, that it "should do just one thing". If a function does more than one thing, you should break it apart into smaller chunks, smaller tasks.
Every time you come across (or write) a function with more than 10 or 20 lines of (real) code, you should be skeptical. Such functions rarely stick to this principle.
For your example, you could identify as individual tasks:
generate prime numbers, one by one (generate implies using yield for me)
collect n prime numbers. Uses 1. and puts them into a list
get nth prime number. Uses 1., but does not save every number, just waits for the nth. Does not consume as much memory as 2. does.
Find pairs of primes: Uses 1., remembers the previous number and, if the difference to the current number is two, yields this pair
collect all pairs of primes: Uses 4. and puts them into a list
...
...
The list is extensible, and you can reuse it at any level. Every function will have no more than 10 lines of code, and you will not be reinventing the wheel every time.
Put them all into a module, and use it from every script for an Euler Problem related to primes.
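A minimal sketch of tasks 1 through 4 (trial division is used for brevity, not speed; a sieve-based generator could replace primes() behind the same interface):

from itertools import islice

def primes():
    # Task 1: yield prime numbers one by one.
    n = 2
    while True:
        if all(n % p for p in range(2, int(n ** 0.5) + 1)):
            yield n
        n += 1

def first_n_primes(n):
    # Task 2: collect the first n primes into a list.
    return list(islice(primes(), n))

def nth_prime(n):
    # Task 3: return just the nth prime, without storing the earlier ones.
    return next(islice(primes(), n - 1, None))

def twin_primes():
    # Task 4: yield pairs of primes whose difference is two.
    gen = primes()
    prev = next(gen)
    for p in gen:
        if p - prev == 2:
            yield (prev, p)
        prev = p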
Along these lines, I started a small library for my Euler Problem scripts. You really can get used to writing reusable code in "Project Euler".
Keyword arguments
Another option you didn't mention (as far as I understand) is the use of optional keyword arguments. If you regard small, atomic functions as too complicated (though I really insist you should get used to it) you could add a keyword argument to control the return value. E.g., in some scipy functions there is a parameter full_output, that takes a bool. If it's False (default), only the most important information is returned (e.g., an optimized value), if it's True some supplementary information is returned as well, e.g., how well the optimization performed and how many iterations it took to converge.
You could define a parameter output_mode, with possible values "list", "last" or whatever.
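A small sketch of what that could look like (the function and parameter names are just for illustration):

def primes_up_to(limit, output_mode="list"):
    found = [n for n in range(2, limit)
             if all(n % p for p in range(2, int(n ** 0.5) + 1))]
    if output_mode == "list":
        return found        # all primes below `limit`
    if output_mode == "last":
        return found[-1]    # only the largest one
    raise ValueError(f"unknown output_mode: {output_mode!r}")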
Recommendation
Stick to small, reusable chunks of code. Getting used to this is one of the most valuable things you can pick up at "Project Euler".
Remark
If you try to implement the pattern I propose for reusable functions, you might run into a problem immediately at point 1: How to create a generator-style function for this? E.g., if you use the sieve method. But it's not too bad.
My suggestion: create a module that contains (see the sketch after this list):
a private core function (example: return a list of the first n primes, or even something more general)
several wrapper/utility functions that use the core one and prepare the output in different ways (example: the nth prime number)
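A sketch of that layout (the module and function names are made up):

# primes.py
def _first_primes(n):
    # Private core: list of the first n primes (trial division for brevity).
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def n_first_primes(n):
    # Wrapper: the first n primes as a list.
    return _first_primes(n)

def nth_prime(n):
    # Wrapper: just the nth prime.
    return _first_primes(n)[-1]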
Try to reduce your functions as much as possible, and reuse them.
For example you might have a function next_prime which is called repeatedly by n_primes and n_th_prime.
This also makes your code more maintainable: if you come up with a more efficient way to generate primes, all you have to do is change the code in next_prime.
Furthermore, you should make your output as neutral as possible. If your function returns several values, it should return a list or a generator, not a comma-separated string.