Getting coefficient of term in sympy - python

I need to find the coefficient of a term in a rather long, nasty expansion. I have a polynomial, say f(x) = (x + x^2)/2, and a function that is defined recursively: g_k(x,y) = y*f(g_{k-1}(x,y)) with g_0(x,y) = y*x.
I want to know, say, the coefficient of x^2*y^4 in g_10(x,y).
I've coded this up as
import sympy

x, y = sympy.symbols('x y')

def f(x):
    return (x + x**2)/2

def g(x, y, k):
    if k == 0:
        return y*x
    else:
        return y*f(g(x, y, k-1))

fxn = g(x, y, 2)
fxn.expand().coeff(x**2).coeff(y**4)
> 1/4
So far so good.
But now I want to find a coefficient for k = 10. Now fxn = g(x,y,10) and then fxn.expand() is very slow. Obviously there are a lot of steps going on, so it's not a surprise. But my knowledge of sympy is rudimentary - I've only started using it specifically because I need to be able to find these coefficients. I could imagine that there may be a way to get sympy to recognize that everything is a polynomial and so it can more quickly find a particular coefficient, but I haven't been able to find examples doing that.
Is there another approach through sympy to get this coefficient, or anything I can do to speed it up?

I assume you are only interested in the coefficients you ask for and not the whole polynomial g(x,y,10), so you can redefine your function g to get rid of higher orders at every step of the recursion. This will significantly speed up your calculation.
def g(x, y, k):
    if k == 0:
        return y*x
    else:
        temp = y*f(g(x, y, k-1)) + sympy.O(y**5) + sympy.O(x**3)
        return temp.expand().removeO()
This works as follows: first, everything of order O(y**5) or O(x**3) (and higher) is grouped together and then discarded. Keep in mind that you lose a lot of information!
Also have a look here: Sympy: Drop higher order terms in polynomial
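If you would rather make the polynomial structure explicit, the same truncation can be done with sympy.Poly. Below is a minimal sketch under the same assumption (only the coefficient of x**2*y**4 is wanted; since the powers of x and y never decrease through the recursion, higher monomials can never contribute and are safe to drop). The function name g_truncated and the power bounds are my own:

import sympy

x, y = sympy.symbols('x y')

def f(expr):
    return (expr + expr**2)/2

def g_truncated(k, max_x=2, max_y=4):
    # Powers of x and y only ever grow through f and the multiplication
    # by y, so monomials above the target powers are safe to discard.
    expr = y*x
    for _ in range(k):
        poly = sympy.Poly(sympy.expand(y*f(expr)), x, y)
        expr = sum((c * x**i * y**j
                    for (i, j), c in poly.terms()
                    if i <= max_x and j <= max_y), sympy.Integer(0))
    return expr

print(g_truncated(2).coeff(x**2).coeff(y**4))   # 1/4, matching the example above

Because the expression is re-truncated after every step, it never grows beyond a handful of monomials, so g_truncated(10) is fast.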

Related

Exponentially distributed random generator (log function) in python?

I really need help as I am stuck at the beginning of the code.
I am asked to create a function to investigate the exponential distribution on a histogram. The function is x = −log(1−y)/λ. λ is a constant that I refer to as lamdr in the code, and I simply gave it the value 10. I also gave N (the number of random numbers) the value 10 and ran the code, yet the results and the generated random numbers gave me totally different values. Below you can find the code; I don't know what went wrong, hope you guys can help me!! (I use Python 2.)
import random
import math

N = raw_input('How many random numbers you request?: ')
N = int(N)
lamdr = raw_input('Enter a value:')
lamdr = int(lamdr)

def exprand(lamdr):
    y = []
    for i in range(N):
        y.append(random.uniform(0, 1))
    return y

y = exprand(lamdr)
print 'Randomly generated numbers:', (y)

x = []
for w in y:
    x.append((math.log((1 - w) / lamdr)) * -1)
print 'Results:', x
After viewing the code you provided, it looks like you have the pieces you need but you're not putting them together.
You were asked to write function exprand(lambdr) using the specified formula. Python already provides a function called random.expovariate(lambd) for generating exponentials, but what the heck, we can still make our own. Your formula requires a "random" value for y which has a uniform distribution between zero and one. The documentation for the random module tells us that random.random() will give us a uniform(0,1) distribution. So all we have to do is replace y in the formula with that function call, and we're in business:
def exprand(lambdr):
    return -math.log(1.0 - random.random()) / lambdr
An historical note: Mathematically, if y has a uniform(0,1) distribution, then so does 1-y. Implementations of the algorithm dating back to the 1950s would often leverage this fact to simplify the calculation to -math.log(random.random()) / lambdr. Mathematically this gives distributionally correct results, since P{X = c} = 0 for any continuous random variable X and constant c, but computationally it will blow up in Python for the 1-in-2^64 occurrence where you get a zero from random.random(). One historical basis for doing this was that when computers were many orders of magnitude slower than now, ditching the one additional arithmetic operation was considered worth the minuscule risk. Another was that Prime Modulus Multiplicative PRNGs, which were popular at the time, never yield a zero. These days it's primarily of historical interest, and an interesting example of where math and computing sometimes diverge.
Back to the problem at hand. Now you just have to call that function N times and store the results somewhere. Likely candidates to do so are loops or list comprehensions. Here's an example of the latter:
abuncha_exponentials = [exprand(0.2) for _ in range(5)]
That will create a list of 5 exponentials with λ=0.2. Replace 0.2 and 5 with suitable values provided by the user, and you're in business. Print the list, make a histogram, use it as input to something else...
Replacing exprand with expovariate in the list comprehension should produce equivalent results using Python's built-in exponential generator. That's the beauty of functions as an abstraction: once somebody writes them, you can just use them to your heart's content.
Note that because of the use of randomness, this will give different results every time you run it unless you "seed" the random generator to the same value each time.
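Putting the pieces together, here is a minimal, self-contained sketch (in Python 3 syntax, unlike the Python 2 code in the question; the rate λ = 10 and the sample count are just placeholder values):

import math
import random

def exprand(lambdr):
    # Inverse-CDF method: if U is uniform(0,1), then -log(1 - U)/lambda
    # is exponentially distributed with rate lambda.
    return -math.log(1.0 - random.random()) / lambdr

random.seed(12345)  # fix the seed so the run is reproducible
samples = [exprand(10.0) for _ in range(1000)]
print('sample mean:', sum(samples) / len(samples))  # should be near 1/10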
What @pjs wrote is true up to a point. While the statement that mathematically, if y has a uniform(0,1) distribution, then so does 1-y is correct, the proposal to replace the code with -math.log(random.random()) / lambdr is just wrong. Why? Because Python's random module provides U(0,1) in the range [0,1) (as mentioned here), which makes such a replacement non-equivalent.
In more layman's terms: if your U(0,1) actually generates numbers in the [0,1) range, then the code
import math
import random

def exprand(lambd):  # "lambda" is a reserved word in Python, so use another name
    return -math.log(1.0 - random.random()) / lambd
is correct, but the code
import math
import random

def exprand(lambd):
    return -math.log(random.random()) / lambd

is wrong: it will occasionally raise an exception, because log(0) will eventually be called when random.random() returns 0.

scipy.optimize.minimize chi squared python

So I am doing this assignment where I am supposed to minimize the chi-squared function. I saw someone doing this on the internet, so I just copied it:
Multiple variables in SciPy's optimize.minimize
I made a chi-squared function which is a function of 3 variables (x, y, sigma), where sigma is a random Gaussian fluctuation random.gauss(0, sigma). I did not print that code here because at first sight it might be confusing (I used a lot of recursion), but I can assure you that this function is correct.
Now this code just makes a list of the calculated minimizations (which are different every time because of the random Gaussian fluctuation). But here comes the main problem: if I did my calculation correctly, we should get a list with a mean of 2 (since I have 2 degrees of freedom, as you can see in this link: https://en.wikipedia.org/wiki/Chi-squared_test).
def Chi2(pos):
    return Chi(pos[0], pos[1], 1)

x_list = []
y_list = []
chi_list = []
for i in range(1000):
    result = scipy.optimize.minimize(Chi2, [5, 5]).x
    x_list.append(result[0])
    y_list.append(result[1])
    chi_list.append(Chi2(result))
But when I use this code I get a list with mean 4; however, if I add the method "Powell" I get a mean of 9!!
So my main question is: how is it possible these means are so different, and how do I know which method to use to get the best optimization?
Because I think the error might be in my chi-squared function, I will show that one as well. The story behind this assignment is that we need to find the position of a mobile device, and we have routers at the positions (0,0), (20,0), (0,20) and (20,20). We used a lot of recursion, and the graph of the chi-squared looked fine (it has a minimum at (5,5)).
def perfectsignal(x_m, y_m, x_r, y_r):
    return 20*np.log10(c / (4 * np.pi * f)) - 10 * np.log((x_m - x_r)**2 + (y_m - y_r)**2 + 2**2)

def signal(x_m, y_m, x_r, y_r, sigma):
    return perfectsignal(x_m, y_m, x_r, y_r) + random.gauss(0, sigma)

def res(x_m, y_m, x_r, y_r, sigma, sigma2):
    x = (signal(x_m, y_m, x_r, y_r, sigma) - perfectsignal(x_m, y_m, x_r, y_r)) / float(sigma2)
    return x

def Chi(x, y, sigma):
    return (res(x, y, 0, 0, sigma, 1)**2 + res(x, y, 20, 0, sigma, 1)**2 +
            res(x, y, 0, 20, sigma, 1)**2 + res(x, y, 20, 20, sigma, 1)**2)
Kees

Python how to get function formula given its inputs and results

Assume we have a function with an unknown formula. Given a few inputs and the results of this function, how can we recover the function's formula?
For example, we have inputs x and y and result r in the format (x, y, r):
[ (2,4,8) , (3,6,18) ]
And the desired function can be
f(x,y) = x * y
As posed, the problem is too generic: if you want to find any formula mapping the given inputs to the given results, there are simply too many possible formulas. In order to make sense of this, you need to somehow restrict the set of functions under consideration. For example, you could say that you're only interested in polynomial solutions, i.e. where
r = sum a_ij * x^i * y^j for i from 0 to n and j from 0 to n - i
then you have a system of equations, with the a_ij as parameters to solve for. The higher the degree n the more such parameters you'd have to find, so the more input-output combinations you'd need to know. Variations of this use rational functions (so you divide by another polynomial), or allow some trigonometric functions, or something like that.
If your setup were particularly easy, you'd have just linear equations, i.e. r = a*x + b*y + c. As you can see, even that has three parameters a,b,c so you can't uniquely find all three of them just given the two inputs you provided in your question. And even then the result would not be the r = x*y you were aiming for, since that's technically of degree 2.
If you want to point out that r = x*y is a particularly simple formula, and you would like to look for simple formulas, then one approach would be enumerating formulas in order of increasing complexity. But if you do this without parameters (since ugly parameters will make a simple formula like a*x + b*y + c appear complex), then it's hard to guide this enumeration towards the one you want, so you'd really have to enumerate all possible formulas, which becomes infeasible very quickly.
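To make the polynomial idea concrete, here is a minimal sketch (assuming NumPy, and assuming total degree at most 2) that sets up the linear system for the two sample points from the question. With six unknowns and only two equations, least squares returns just one of infinitely many exact fits, which illustrates why the problem is underdetermined:

import numpy as np

# Fit r ~ a00 + a10*x + a01*y + a11*x*y + a20*x**2 + a02*y**2 by least squares.
points = [(2, 4, 8), (3, 6, 18)]  # (x, y, r) samples from the question

A = np.array([[1, x, y, x*y, x**2, y**2] for x, y, r in points], dtype=float)
b = np.array([r for x, y, r in points], dtype=float)

# Two equations, six unknowns: lstsq returns the minimum-norm solution,
# one of infinitely many exact fits -- more samples would pin it down.
coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
print(coeffs)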

Optimization on a set of data using python

The following data sets are available:
x, y, f(x), f(y).
Function to be optimized (maximized):
f(x,y) = f(x)*y - f(y)*x
subject to the following constraints:
V >= sqrt(f(x)^2 + f(y)^2)
I >= sqrt(x^2 + y^2)
where V and I are constants.
Can anyone please let me know which optimization module I need to use? From what I understand, I need to perform a discrete optimization, as I have fixed sets of values for x, y, f(x) and f(y).
Using complex optimizers (http://docs.scipy.org/doc/scipy/reference/optimize.html) for such a problem is rather a bad idea.
It looks like a problem which can quite easily be solved in under O(n^2), where n = max(|x|, |y|). Simply:
1. Sort x, y, f(x), f(y), creating sorted(x), sorted(y), sorted(f(x)), sorted(f(y)).
2. For each x, find the positions in sorted(y) for which I^2 >= x^2 + y^2 holds, and similarly for f(x) and sorted(f(y)) with V^2 >= f(x)^2 + f(y)^2. These are two binary searches: since I^2 >= x^2 + y^2 <=> |y| <= sqrt(I^2 - x^2), you can find the "barrier" in constant time and then binary-search for the data points closest to it on the feasible side of the inequality.
3. Iterate through sorted(x), and for each x:
- iterate simultaneously through the elements of y and f(y), discarding (in this loop) points which are not in both intervals found in step 2 (linear complexity);
- record the argument pair x_max, y_max for which f(x_max, y_max) is maximized.
4. Return x_max, y_max.
Total complexity is under quadratic: step 1 takes O(n log n); each iteration of the loop in step 2 is O(log n), so the whole of step 2 takes O(n log n); the loop in step 3 is O(n), and its inner loop is also O(n) (though in practice it should be almost constant thanks to the constraints). This makes the whole algorithm O(n^2), and in most cases it will behave like O(n log n). It also does not depend on the definition of f(x,y) (which is used as a black box), so you can optimize an arbitrary function this way; see the sketch below.
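Here is a minimal Python sketch of the pruning idea, assuming the data arrive as parallel lists (xs[i] paired with fxs[i] = f(xs[i]), and ys[j] paired with fys[j] = f(ys[j]); the function and variable names are my own). It applies the binary-search bound on |y| from step 2 and checks the remaining V constraint directly in the inner loop:

import bisect
import math

def maximize(xs, fxs, ys, fys, V, I):
    data = sorted(zip(ys, fys))                 # sort the (y, f(y)) pairs by y
    ys_sorted = [p[0] for p in data]
    best, best_pair = -math.inf, None
    for x, fx in zip(xs, fxs):
        if x * x > I * I:
            continue                            # no feasible y for this x
        bound = math.sqrt(I * I - x * x)        # need |y| <= bound
        lo = bisect.bisect_left(ys_sorted, -bound)
        hi = bisect.bisect_right(ys_sorted, bound)
        for y, fy in data[lo:hi]:               # only candidates inside the bound
            if fx * fx + fy * fy <= V * V:      # remaining V constraint
                value = fx * y - fy * x         # objective f(x, y)
                if value > best:
                    best, best_pair = value, (x, y)
    return best, best_pair

xs, fxs = [1.0, 2.0, 3.0], [2.0, 3.0, 5.0]      # made-up sample data
ys, fys = [1.0, 2.0, 4.0], [1.0, 4.0, 6.0]
print(maximize(xs, fxs, ys, fys, V=6.0, I=4.0)) # best value and its (x, y) pair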

Optimizing Python polynomial evaluation

I have a function which evaluates terms of a polynomial in several variables. The inputs are lists of powers of each variable. For example, for two variables and 2nd order it looks like this:

def f(x, y):
    return [1, x[1], y[1], x[1]*y[1], x[2], y[2]]

x = [2**0, 2**1, 2**2]
y = [3**0, 3**1, 3**2]

>>> f(x, y)
[1, 2, 3, 6, 4, 9]
In reality the function is higher order and has many variables, so on average there are a few thousand terms (in fact, I create the function at run time with an eval statement, but that's not important). The function is in an innermost loop and is currently a speed bottleneck. The profiler tells me I spend most of the time in __times__.
Short of creating a C extension module, can anyone see any room for optimization?
Edit: The example above is trying to evaluate 1 + x + y + xy + x^2 + y^2 with x = 2 and y = 3, except without adding the terms, just putting each term in a list.
Adding them is fine (with some coefficients A, B, ...); i.e., all I'm trying to do is compute:
A + B*x + C*y + D*x*y + E*x^2 + F*y^2.
I'm not sure since which version, but numpy has a polyval2d(x, y, c) function in the polynomial module that applies perfectly to your example.
You seemed interested in extending your example to much higher dimensions.
In the same module there's a polyval3d(x, y, z, c); if that's not enough, I'd suggest (as I guess you're already doing) looking at the source code. It shouldn't be too hard to implement whatever best suits your needs, and you can always ask here on SO :)
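For instance, a small sketch using polyval2d on the 2nd-order example above (the coefficient values A..F are placeholders; c[i][j] holds the coefficient of x**i * y**j):

import numpy as np
from numpy.polynomial import polynomial as P

# Coefficients for A + B*x + C*y + D*x*y + E*x**2 + F*y**2, all set to 1 here.
A, B, C, D, E, F = 1, 1, 1, 1, 1, 1
c = np.array([[A, C, F],
              [B, D, 0],
              [E, 0, 0]])  # c[i][j] multiplies x**i * y**j

print(P.polyval2d(2, 3, c))  # 1 + 2 + 3 + 6 + 4 + 9 = 25.0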
The function is in an innermost loop and is currently a speed bottleneck.

You could try to get rid of the loop altogether by using NumPy and replacing your variables with arrays of higher dimension.
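For example, a sketch of that idea: evaluate every term for a whole batch of (x, y) points in one vectorized pass (the sample points and coefficients below are made up for illustration):

import numpy as np

xs = np.array([2.0, 5.0, 7.0])   # batch of x values
ys = np.array([3.0, 1.0, 2.0])   # batch of y values

# One row per term of A + B*x + C*y + D*x*y + E*x**2 + F*y**2.
terms = np.stack([np.ones_like(xs), xs, ys, xs*ys, xs**2, ys**2])
coeffs = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])  # A..F

values = coeffs @ terms  # polynomial evaluated at every point at once
print(values)            # first entry: 1 + 2 + 3 + 6 + 4 + 9 = 25.0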
