Optimizing Python polynomial evaluation - python

I have a function which evaluates terms of a polynomial in several variables. The inputs are lists of powers of each variable. For example, for two variables and 2nd order it looks like this,
def f(x,y):
return [1, x[1], y[1], x[1]*y[1], x[2], y[2]]
x = [2**0, 2**1, 2**2]
y = [3**0, 3**1, 3**2]
>>> f(x,y)
[1,2,3,6,4,9]
In reality the function is higher order and has many variables so on average there are a few thousand terms (in fact, I create the function at run time with an eval statement, but that's not important). The function is on an inner most loop and is currently a speed bottleneck. The profiler tells me I spend most of the time in __times__.
Short of creating a C extension module, can anyone see any room for optimization?
Edit: The example above is trying to evaulate 1 + x + y + xy + x^2 + y^2 with x = 2and y = 3, except without adding them, just putting each term in a list.
Adding them is fine (with some coefficients A, B, ...) i.e. all I'm trying to do is compute:
A + B*x + C*y + D*x*y + E*x^2 + F*y^2.

I'm not sure from which version, but numpy should have a polyval2d(x,y,c) function into the polynomial module, that will perfectly apply to your example.
You seemed interested in expanding your example to a much higher dimension.
In the same module there's a polyval3d(x,y,z,c), if that's not enought I'd suggest (as I guess you're already doing) to look at the source code. It shouldn't be too hard to implement what best suits your needs, and you can always ask here on SO :)

The function is on an inner most loop and is currently a speed
bottleneck.
You could try to get rid of the loop altogether, by using NumPy and replacing your variables with arrays of higher dimension.

Related

How do I set 2 expressions equal to eachother and solve for x [duplicate]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
Let's say I have an equation:
2x + 6 = 12
With algebra we can see that x = 3. How can I make a program in Python that can solve for x? I'm new to programming, and I looked at eval() and exec() but I can't figure out how to make them do what I want. I do not want to use external libraries (e.g. SAGE), I want to do this in just plain Python.
How about SymPy? Their solver looks like what you need. Have a look at their source code if you want to build the library yourself…
There are two ways to approach this problem: numerically and symbolically.
To solve it numerically, you have to first encode it as a "runnable" function - stick a value in, get a value out. For example,
def my_function(x):
return 2*x + 6
It is quite possible to parse a string to automatically create such a function; say you parse 2x + 6 into a list, [6, 2] (where the list index corresponds to the power of x - so 6*x^0 + 2*x^1). Then:
def makePoly(arr):
def fn(x):
return sum(c*x**p for p,c in enumerate(arr))
return fn
my_func = makePoly([6, 2])
my_func(3) # returns 12
You then need another function which repeatedly plugs an x-value into your function, looks at the difference between the result and what it wants to find, and tweaks its x-value to (hopefully) minimize the difference.
def dx(fn, x, delta=0.001):
return (fn(x+delta) - fn(x))/delta
def solve(fn, value, x=0.5, maxtries=1000, maxerr=0.00001):
for tries in xrange(maxtries):
err = fn(x) - value
if abs(err) < maxerr:
return x
slope = dx(fn, x)
x -= err/slope
raise ValueError('no solution found')
There are lots of potential problems here - finding a good starting x-value, assuming that the function actually has a solution (ie there are no real-valued answers to x^2 + 2 = 0), hitting the limits of computational accuracy, etc. But in this case, the error minimization function is suitable and we get a good result:
solve(my_func, 16) # returns (x =) 5.000000000000496
Note that this solution is not absolutely, exactly correct. If you need it to be perfect, or if you want to try solving families of equations analytically, you have to turn to a more complicated beast: a symbolic solver.
A symbolic solver, like Mathematica or Maple, is an expert system with a lot of built-in rules ("knowledge") about algebra, calculus, etc; it "knows" that the derivative of sin is cos, that the derivative of kx^p is kpx^(p-1), and so on. When you give it an equation, it tries to find a path, a set of rule-applications, from where it is (the equation) to where you want to be (the simplest possible form of the equation, which is hopefully the solution).
Your example equation is quite simple; a symbolic solution might look like:
=> LHS([6, 2]) RHS([16])
# rule: pull all coefficients into LHS
LHS, RHS = [lh-rh for lh,rh in izip_longest(LHS, RHS, 0)], [0]
=> LHS([-10,2]) RHS([0])
# rule: solve first-degree poly
if RHS==[0] and len(LHS)==2:
LHS, RHS = [0,1], [-LHS[0]/LHS[1]]
=> LHS([0,1]) RHS([5])
and there is your solution: x = 5.
I hope this gives the flavor of the idea; the details of implementation (finding a good, complete set of rules and deciding when each rule should be applied) can easily consume many man-years of effort.
Python may be good, but it isn't God...
There are a few different ways to solve equations. SymPy has already been mentioned, if you're looking for analytic solutions.
If you're happy to just have a numerical solution, Numpy has a few routines that can help. If you're just interested in solutions to polynomials, numpy.roots will work. Specifically for the case you mentioned:
>>> import numpy
>>> numpy.roots([2,-6])
array([3.0])
For more complicated expressions, have a look at scipy.fsolve.
Either way, you can't escape using a library.
If you only want to solve the extremely limited set of equations mx + c = y for positive integer m, c, y, then this will do:
import re
def solve_linear_equation ( equ ):
"""
Given an input string of the format "3x+2=6", solves for x.
The format must be as shown - no whitespace, no decimal numbers,
no negative numbers.
"""
match = re.match(r"(\d+)x\+(\d+)=(\d+)", equ)
m, c, y = match.groups()
m, c, y = float(m), float(c), float(y) # Convert from strings to numbers
x = (y-c)/m
print ("x = %f" % x)
Some tests:
>>> solve_linear_equation("2x+4=12")
x = 4.000000
>>> solve_linear_equation("123x+456=789")
x = 2.707317
>>>
If you want to recognise and solve arbitrary equations, like sin(x) + e^(i*pi*x) = 1, then you will need to implement some kind of symbolic maths engine, similar to maxima, Mathematica, MATLAB's solve() or Symbolic Toolbox, etc. As a novice, this is beyond your ken.
Use a different tool. Something like Wolfram Alpha, Maple, R, Octave, Matlab or any other algebra software package.
As a beginner you should probably not attempt to solve such a non-trivial problem.

Getting coefficient of term in sympy

I need to find the coefficient of a term in a rather long, nasty expansion. I have a polynomial, say f(x) = (x+x^2)/2 and then a function that is defined recursively: g_k(x,y) = y*f(g_{k-1}(x,y)) with g_0(x,y)=yx.
I want to know, say, the coefficient of x^2y^4 in g_10(x,y)
I've coded this up as
import sympy
x, y = sympy.symbols('x y')
def f(x):
return (x+x**2)/2
def g(x,y,k):
if k==0:
return y*x
else:
return y*f(g(x,y,k-1))
fxn = g(x,y,2)
fxn.expand().coeff(x**2).coeff(y**4)
> 1/4
So far so good.
But now I want to find a coefficient for k = 10. Now fxn = g(x,y,10) and then fxn.expand() is very slow. Obviously there are a lot of steps going on, so it's not a surprise. But my knowledge of sympy is rudimentary - I've only started using it specifically because I need to be able to find these coefficients. I could imagine that there may be a way to get sympy to recognize that everything is a polynomial and so it can more quickly find a particular coefficient, but I haven't been able to find examples doing that.
Is there another approach through sympy to get this coefficient, or anything I can do to speed it up?
I assume you are only interested in the coefficients given and not the whole polynomial g(x,y,10). So you can redefine your function g to get rid of higher orders in every step of the recursion. This will significantly speed up your calculation.
def g(x,y,k):
if k==0:
return y*x
else:
temp = y*f(g(x,y,k-1)) + sympy.O(y**5) + sympy.O(x**3)
return temp.expand().removeO()
Works as follows: First everything of the order O(y**5), O(x**3) (and higher) will be grouped and then discarded. Keep in mind you loose lots of information!
Also have a look here: Sympy: Drop higher order terms in polynomial

Exponentially distributed random generator (log function) in python?

I really need help as I am stuck at the begining of the code.
I am asked to create a function to investigate the exponential distribution on histogram. The function is x = −log(1−y)/λ. λ is a constant and I referred to that as lamdr in the code and simply gave it 10. I gave N (the number of random numbers) 10 and ran the code yet the results and the generated random numbers gave me totally different results; below you can find the code, I don't know what went wrong, hope you guys can help me!! (I use python 2)
import random
import math
N = raw_input('How many random numbers you request?: ')
N = int(N)
lamdr = raw_input('Enter a value:')
lamdr = int(lamdr)
def exprand(lamdr):
y = []
for i in range(N):
y.append(random.uniform(0,1))
return y
y = exprand(lamdr)
print 'Randomly generated numbers:', (y)
x = []
for w in y:
x.append((math.log((1 - w) / lamdr)) * -1)
print 'Results:', x
After viewing the code you provided, it looks like you have the pieces you need but you're not putting them together.
You were asked to write function exprand(lambdr) using the specified formula. Python already provides a function called random.expovariate(lambd) for generating exponentials, but what the heck, we can still make our own. Your formula requires a "random" value for y which has a uniform distribution between zero and one. The documentation for the random module tells us that random.random() will give us a uniform(0,1) distribution. So all we have to do is replace y in the formula with that function call, and we're in business:
def exprand(lambdr):
return -math.log(1.0 - random.random()) / lambdr
An historical note: Mathematically, if y has a uniform(0,1) distribution, then so does 1-y. Implementations of the algorithm dating back to the 1950's would often leverage this fact to simplify the calculation to -math.log(random.random()) / lambdr. Mathematically this gives distributionally correct results since P{X = c} = 0 for any continuous random variable X and constant c, but computationally it will blow up in Python for the 1 in 264 occurrence where you get a zero from random.random(). One historical basis for doing this was that when computers were many orders of magnitude slower than now, ditching the one additional arithmetic operation was considered worth the minuscule risk. Another was that Prime Modulus Multiplicative PRNGs, which were popular at the time, never yield a zero. These days it's primarily of historical interest, and an interesting example of where math and computing sometimes diverge.
Back to the problem at hand. Now you just have to call that function N times and store the results somewhere. Likely candidates to do so are loops or list comprehensions. Here's an example of the latter:
abuncha_exponentials = [exprand(0.2) for _ in range(5)]
That will create a list of 5 exponentials with λ=0.2. Replace 0.2 and 5 with suitable values provided by the user, and you're in business. Print the list, make a histogram, use it as input to something else...
Replacing exporand with expovariate in the list comprehension should produce equivalent results using Python's built-in exponential generator. That's the beauty of functions as an abstraction, once somebody writes them you can just use them to your heart's content.
Note that because of the use of randomness, this will give different results every time you run it unless you "seed" the random generator to the same value each time.
WHat #pjs wrote is true to a point. While statement mathematically, if y has a uniform(0,1) distribution, so does 1-y appears to be correct, proposal to replace code with -math.log(random.random()) / lambdr is just wrong. Why? Because Python random module provide U(0,1) in the range [0,1) (as mentioned here), thus making such replacement non-equivalent.
In more layman term, if your U(0,1) is actually generating numbers in the [0,1) range, then code
import random
def exprand(lambda):
return -math.log(1.0 - random.random()) / lambda
is correct, but code
import random
def exprand(lambda):
return -math.log(random.random()) / lambda
is wrong, it will sometimes generate NaN/exception, as log(0) will be called

Python how to get function formula given it's inputs and results

Assume we have a function with unknown formula, given few inputs and results of this function, how can we get the function's formula.
For example we have inputs x and y and result r in format (x,y,r)
[ (2,4,8) , (3,6,18) ]
And the desired function can be
f(x,y) = x * y
As you post the question, the problem is too generic. If you want to find any formula mapping the given inputs to the given result, there are simply too many possible formulas. In order to make sense of this, you need to somehow restrict the set of functions to consider. For example you could say that you're only interested in polynomial solutions, i.e. where
r = sum a_ij * x^i * y^j for i from 0 to n and j from 0 to n - i
then you have a system of equations, with the a_ij as parameters to solve for. The higher the degree n the more such parameters you'd have to find, so the more input-output combinations you'd need to know. Variations of this use rational functions (so you divide by another polynomial), or allow some trigonometric functions, or something like that.
If your setup were particularly easy, you'd have just linear equations, i.e. r = a*x + b*y + c. As you can see, even that has three parameters a,b,c so you can't uniquely find all three of them just given the two inputs you provided in your question. And even then the result would not be the r = x*y you were aiming for, since that's technically of degree 2.
If you want to point out that r = x*y is a particularly simple formula, and you would like to look for simple formulas, then one approach would be enumerating formulas in order of increasing complexity. But if you do this without parameters (since ugly parameters will make a simple formula like a*x + b*y + c appear complex), then it's hard to guilde this enumeration towards the one you want, so you'd really have to enumerate all possible formulas, which will become infeasible very quickly.

Optimization on a set of data using python

Optimization on a set of data using python.
Following data sets available
x, y, f(x), f(y).
Function to be optimized (maximize):
f(x,y) = f(x)*y - f(y)*x
based on following contraints:
V >= sqrt(f(x)^2+f(y)^2)
I >= sqrt(x^2+y2)
where V and I are constants.
Can anyone please let me know what optimization module do I need to use? From what I understand I need to perform a discrete optimization as I have set f values for x, y, f(x) and f(y).
Using complex optimizers (http://docs.scipy.org/doc/scipy/reference/optimize.html) for such a problem is rather a bad idea.
It looks like a problem which can be quite easily solved in under O(n^2) where n=max(|x|,|y|), simply:
sort x,y,f(x),f(y) creating sorted(x), sorted(y), sorted(f(x)), sorted(f(y))
for each x find the positions in sorted(y) for which I^2 >= x^2+y^2 holds and similarly for f(x) and sorted(f(y)) and V^2 >= f(x)^2 + f(y)^2 (two binary searches, as I^2 >= x^2+y^2 <=> |y| <= sqrt(I^2-x^2) so you can find the "barrier"in constant time and then use bin searches to find actual data points which are the closest ones "on the right side of inequality")
Iterate through sorted(x) and for each x:
Iterate simultanously through elements of y and f(y) and discard (in this loop) points which are not in borth intervals found in step 2. (linear complexity)
Record argument pairs x_max,y_max for which f(x_max,y_max) is maximized
Return x_max,y_max
Total complexity is under quadratic, as step 1 takes O(nlgn), each iteration of loop in step 2 is O(lgn) so the whole step 2 takes O(nlgn), loop in step 3 is O(n) and loop in first substep of step 3 is O(n) (but in real life it should be almost constant due to the constraints), which makes the whole algorithm O(n^2) (and in most cases it will behave as O(nlgn)). It also does not depend on the definition of f(x,y) (it uses it as a black box) so you can optimize an arbitrary function is such a way.

Categories

Resources