Cython vs Python: different results within scipy.optimize.fsolve

I cythonized a function that I call many times in my code. The Cython version and the original Python code give me the same answers (to within 1e-7, which I understand has something to do with Cython vs. Python types; not the question here, but it might be important).
I attempt to find the root of the function using scipy.optimize.fsolve(). The Python version works fine, but the Cython version diverges.
The code is pretty involved and has a big external file to prepare some of the arguments, so I can't post everything. I post the Cython code below; the full code is here.
def euler_outside(float b_prime, int index_b,
                  np.ndarray[np.double_t, ndim=1] b_grid, int index_y,
                  np.ndarray[np.double_t, ndim=1] y_grid,
                  np.ndarray[np.double_t, ndim=1] y_vec,
                  np.ndarray[np.double_t, ndim=2] pol_mat_b, float q,
                  np.ndarray[np.double_t, ndim=2] pol_mat_q,
                  np.ndarray[np.double_t, ndim=2] P, float beta,
                  int n_ygrid, int check=0):
    '''
    b_prime - the variable of interest. want to find b_prime that solves this
    function
    '''
    cdef double b, y, c, uc, e_ucp, eul_val
    cdef int i
    cdef np.ndarray[np.float64_t, ndim=1] uct, c_prime = np.zeros((n_ygrid,))

    b = b_grid[index_b]
    y = y_grid[index_y]

    # Get value of consumption today
    c = b + y - b_prime/q

    # Get possible values of consumption tomorrow
    if check:
        c_prime = b_prime + y_vec - b_grid[0]/q
    else:
        for i in range(n_ygrid):
            c_prime[i] = (b_prime + y_vec[i] -
                          (np.interp(b_prime, b_grid, pol_mat_b[:,i]) /
                           np.interp(b_prime, b_grid, pol_mat_q[:,i])))

    if c < 0:
        return 1e10

    uc = utility_prime(c)
    uct = utility_prime(c_prime)
    e_ucp = np.inner(uct, P[index_y,:])

    eul_val = uc - beta*q * e_ucp
    return eul_val
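For reference, the function is handed to fsolve roughly like this (a sketch only; b0 and the other argument values are placeholders for quantities prepared by the external set-up file):

from scipy.optimize import fsolve

# b0 is a hypothetical initial guess; the remaining arguments are assumed
# to be prepared by the external code mentioned above
b_star = fsolve(euler_outside, b0,
                args=(index_b, b_grid, index_y, y_grid, y_vec,
                      pol_mat_b, q, pol_mat_q, P, beta, n_ygrid))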
The Python code is the same, but without the cdef statements and the type information on the arguments. I've checked to make sure the output is the same for the same input values, and it is. My question is why SciPy's fsolve goes off the deep end for one and not the other. I assume it's a problem with my Cython?
Running Python 2.7 from Anaconda. Compiling the extension module via pyximport.

As mentioned in the comments above, the reason for the discrepancy between the results from the Python and Cython versions is that in the Cython function, several of the inputs are declared as float, whereas the actual Python variables are double precision.
The resulting increase in round-off error for the Cython function seems to be the reason why fsolve fails to converge: when these inputs are declared as double instead, the Python and Cython versions yield the exact same result, and fsolve converges correctly for both.
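Concretely, the fix is just to promote the float arguments in the signature to double:

# scalar arguments declared double so they match Python-float precision
def euler_outside(double b_prime, int index_b,
                  np.ndarray[np.double_t, ndim=1] b_grid, int index_y,
                  np.ndarray[np.double_t, ndim=1] y_grid,
                  np.ndarray[np.double_t, ndim=1] y_vec,
                  np.ndarray[np.double_t, ndim=2] pol_mat_b, double q,
                  np.ndarray[np.double_t, ndim=2] pol_mat_q,
                  np.ndarray[np.double_t, ndim=2] P, double beta,
                  int n_ygrid, int check=0):
    ...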
As an aside, cases where round-off error in the objective function prevents convergence are indicative of ill-conditioned problems. You might want to think about whether it's possible to re-formulate your model in order to improve its numerical stability.

Related

Compile Scipy function with Cython

I am running a simulation in Python 3.4 that involves a lot of dot products between a sparse array (in CSR format) and a dense vector. I am using SciPy for the sparse matrix and NumPy for everything else.
Using Cython gave me a massive boost (~6x speed increase), after making sure that I cdef everything properly and after minimizing Python interaction (by going through the HTML file that Cython gives me and modifying my code).
Now I've profiled the code, and 50% of the simulation time is spent on the line with the dot product. I am wondering if it is possible to somehow accelerate this line, say by compiling this one dot function in Cython?
I know I could write my own implementation of (CSR sparse 2D matrix) dot (dense vector), but I am trying to avoid that.
Edit: I have included a minimal example of the code. I am sorry, I can't see how to make it smaller. It is a textbook exercise in statistical mechanics: place marbles in pots until one of the pots exceeds its capacity, then start a cascade which propagates according to a (here sparse) matrix. I am using batch sampling.
Please focus on the line towards the end.
from __future__ import division
import numpy as np
import cython
cimport numpy as np
cimport cpython.array
import scipy.sparse as sps

@cython.cdivision(True)
@cython.nonecheck(False)
@cython.boundscheck(False)
@cython.wraparound(False)
def simulate(long[:] capacity_vec,
             int random_array_size,
             long n,
             int seed,
             int[:] A_col,
             int[:] A_row,
             long[:] A_data):

    #### Initialise ####################################################################################################
    # Initialise states
    cdef int[:] damage = np.random.randint(0, int(np.min(capacity_vec)/2), n).astype(np.int32)
    cdef int[:] dr_list = np.random.choice(n, 1000).astype(np.int32)
    cdef int[:] states = np.zeros(n).astype(np.int32)
    cdef int[:] states_ = np.zeros(n).astype(np.int32)
    cdef int[:] change = np.zeros(n).astype(np.int32)
    # Initialise counters
    cdef int k, violations, violations_, counter = 0, dr_id = 0, increment_index = 0
    # Build sparse adjacency matrix
    cA_sps = sps.csr_matrix((A_data, (A_row, A_col)), shape=(n, n)).astype(np.int32)

    while counter < 1000:
        #### Place damage until a cascade starts #######################################################################
        while damage[increment_index] <= capacity_vec[increment_index]:  # Check for violations
            increment_index = dr_list[dr_id]                       # Where do we place the marble?
            damage[increment_index] = damage[increment_index] + 1  # place the marble
            dr_id = dr_id + 1                                      # another random number used
            if dr_id == random_array_size - 1:                     # Check if we ran out of random numbers
                dr_list = np.random.choice(n, random_array_size).astype(np.int32)  # if so, pick new increment_index
                dr_id = 0                                          # and reset the counter

        #### Initialise cascade ########################################################################################
        violations, violations_ = 1, 0
        states[increment_index] = 1

        #### Propagate cascade #########################################################################################
        while violations > violations_:  # check for fixed point, propagate cascade
            for k in range(n): change[k] = states[k] - states_[k]
            ### THIS LINE IS THE PROBLEM. It takes up half of all simulation time.
            damage = damage + cA_sps.dot(change).astype(np.int32)  # spread violations
            states_ = states.copy()  # store previous states
            # Determine previous and current violations
            violations, violations_ = 0, violations
            for k in range(n):
                states_[k] = 0
                if damage[k] > capacity_vec[k]:
                    violations = violations + 1
                    states[k] = 1  # deactivate any node that has a violation

        for k in range(n): damage[k] = 0
        counter = counter + 1  # progress cascade id after storing
I'd discourage you from writing your own matrix multiplication. SciPy is written by smart people who know what they are doing; unless you're confident in numerical computing, just don't. Most of SciPy's code is already compiled.
However, what you might look at is the code for sparse.csr_matrix.dot. Going into the definition directly here and then here, you'll see that there are a few checks done in SciPy. If you know the exact form you want, you could write your own method (modify your SciPy copy) and compute your product directly. Not sure how much that would help, though.
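For illustration, bypassing those checks looks roughly like this (a sketch only: csr_matvec lives in a private SciPy module whose name has varied across versions, so treat this as an assumption to verify against your own SciPy copy):

import numpy as np
from scipy.sparse import _sparsetools  # private; older SciPy exposes scipy.sparse.sparsetools

def fast_csr_matvec(A, x):
    # A: csr_matrix, x: contiguous 1-D ndarray with a dtype matching A.data.
    # Skips the shape/format checks that A.dot(x) performs.
    y = np.zeros(A.shape[0], dtype=np.result_type(A.dtype, x.dtype))
    _sparsetools.csr_matvec(A.shape[0], A.shape[1],
                            A.indptr, A.indices, A.data, x, y)
    return y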
If you want to build SciPy yourself, it is as easy as checking out the whole project from GitHub and then running
python setup.py build
python setup.py install
For more direct instructions, check the build documentation.

Struct pointers in Cython with GSL Monte-Carlo minimization

I'm stuck on this exercise and am not good enough to resolve it. Basically, I am writing a Monte-Carlo maximum likelihood algorithm for the Bernoulli distribution. The problem is that I have to pass the data as a parameter to the GSL minimization (one-dimensional) algorithm, and I also need to pass the size of the data (since the outer loop runs over the different sample sizes of the "observed" data). So I'm attempting to pass these parameters as a struct. However, I'm running into seg faults, and I'm SURE it is coming from the portion of the code that concerns the struct and treating it as a pointer.
[EDIT: I have corrected for allocation of the struct and its components]
%%cython

#!python
#cython: boundscheck=False, wraparound=False, nonecheck=False, cdivision=True

from libc.stdlib cimport rand, RAND_MAX, calloc, malloc, realloc, free, abort
from libc.math cimport log

# Use the CythonGSL package to get the low-level routines
from cython_gsl cimport *

######################### Define the Data Structure ############################

cdef struct Parameters:
    # Pointer for Y data array
    double* Y
    # size of the array
    int* Size

################ Support Functions for Monte-Carlo Function ##################

# Create a function that allocates the memory and verifies integrity
cdef void alloc_struct(Parameters* data, int N, unsigned int flag) nogil:
    # allocate the data array initially
    if flag == 1:
        data.Y = <double*> malloc(N * sizeof(double))
    # reallocate the data array
    else:
        data.Y = <double*> realloc(data.Y, N * sizeof(double))

    # If the elements of the struct are not properly allocated, destroy it and return null
    if N != 0 and data.Y == NULL:
        destroy_struct(data)
        data = NULL

# Create the destructor of the struct to return memory to system
cdef void destroy_struct(Parameters* data) nogil:
    free(data.Y)
    free(data)

# This function fills in the Y observed variable with discrete 0/1
cdef void Y_fill(Parameters* data, double p_true, int* N) nogil:
    cdef:
        Py_ssize_t i
        double y

    for i in range(N[0]):
        y = rand() / <double>RAND_MAX
        if y <= p_true:
            data.Y[i] = 1
        else:
            data.Y[i] = 0

# Definition of the function to be maximized: LLF of Bernoulli
cdef double LLF(double p, void* data) nogil:
    cdef:
        # the sample structure (considered the parameter here)
        Parameters* sample
        # the total of the LLF
        double Sum = 0
        # the loop iterator
        Py_ssize_t i, n

    sample = <Parameters*> data
    n = sample.Size[0]

    for i in range(n):
        Sum += sample.Y[i]*log(p) + (1 - sample.Y[i])*log(1 - p)

    return (-(Sum/n))

########################## Monte-Carlo Function ##############################

def Monte_Carlo(int[::1] Samples, double[:,::1] p_hat,
                Py_ssize_t Sims, double p_true):

    # Define variables and pointers
    cdef:
        # Data Structure
        Parameters* Data
        # iterators
        Py_ssize_t i, j
        int status, GSL_CONTINUE, Iter = 0, max_Iter = 100
        # Variables
        int N = Samples.shape[0]
        double start_val, a, b, tol = 1e-6
        # GSL objects and pointer
        const gsl_min_fminimizer_type* T
        gsl_min_fminimizer* s
        gsl_function F

    # Set the GSL function
    F.function = &LLF

    # Allocate the minimization routine
    T = gsl_min_fminimizer_brent
    s = gsl_min_fminimizer_alloc(T)

    # allocate the struct
    Data = <Parameters*> malloc(sizeof(Parameters))

    # verify memory integrity
    if Data == NULL: abort()

    # set the starting value
    start_val = rand() / <double>RAND_MAX

    try:
        for i in range(N):
            if i == 0:
                # allocate memory to the data array
                alloc_struct(Data, Samples[i], 1)
            else:
                # reallocate the data array in the struct if
                # we are past the first run of the outer loop
                alloc_struct(Data, Samples[i], 2)

            # verify memory integrity
            if Data == NULL: abort()

            # pass the data size into the struct
            Data.Size = &Samples[i]

            for j in range(Sims):
                # fill in the struct
                Y_fill(Data, p_true, Data.Size)

                # set the parameters for the GSL function (the samples)
                F.params = <void*> Data

                a = tol
                b = 1

                # set the minimizer
                gsl_min_fminimizer_set(s, &F, start_val, a, b)

                # initialize conditions
                GSL_CONTINUE = -2
                status = -2

                while (status == GSL_CONTINUE and Iter < max_Iter):
                    Iter += 1
                    status = gsl_min_fminimizer_iterate(s)

                    start_val = gsl_min_fminimizer_x_minimum(s)
                    a = gsl_min_fminimizer_x_lower(s)
                    b = gsl_min_fminimizer_x_upper(s)

                    status = gsl_min_test_interval(a, b, tol, 0.0)

                    if (status == GSL_SUCCESS):
                        print("Converged:\n")

                p_hat[i,j] = start_val
    finally:
        destroy_struct(Data)
        gsl_min_fminimizer_free(s)
with the following Python code to run the above function:
import numpy as np
#Sample Sizes
N = np.array([5,50,500,5000], dtype='i')
#Parameters for MC
T = 1000
p_true = 0.2
#Array of the outputs from the MC
p_hat = np.empty((N.size,T), dtype='d')
p_hat.fill(np.nan)
Monte_Carlo(N, p_hat, T, p_true)
I have separately tested the struct allocation and it works, doing what it should do. However, while running the Monte Carlo, the kernel is killed with an abort call (per the output on my Mac) and the Jupyter output on my console is the following:
gsl: fsolver.c:39: ERROR: computed function value is infinite or NaN
Default GSL error handler invoked.
It seems that the solver is not working. I'm not familiar with the GSL package, having used it only once to generate random numbers from the Gumbel distribution (bypassing the SciPy commands).
I would appreciate any help on this! Thanks.
[EDIT: Changed the lower bound a]
Redoing the exercise with the exponential distribution, whose log-likelihood function contains just one log, I've narrowed the problem down to gsl_min_fminimizer_set initially evaluating at the lower bound a = 0 and yielding the -INF result (it evaluates the function before solving, to generate f(lower) and f(upper), where f is my function to optimize). When I set the lower bound to something other than 0 but really small (say, the tol variable of my defined tolerance), the solution algorithm works and yields the correct results.
Many thanks @DavidW for the hints that got me where I needed to go.
This is a somewhat speculative answer, since I don't have GSL installed and so struggle to test it (apologies if it's wrong!).
I think the issue is the line
Sum += sample.Y[i]*log(p) + (1-sample.Y[i])*log(1-p)
It looks like Y[i] can be either 0 or 1. When p is at either end of the range 0-1, this gives 0*-inf = nan. In the case where all the 'Y's are the same, this point is the minimum (so the solver will reliably end up at the invalid point). Fortunately, you should be able to rewrite the line to avoid getting a nan:
if sample.Y[i]:
    Sum += log(p)
else:
    Sum += log(1-p)
(The case which would generate the nan is the one not executed.)
There's a second minor issue I've spotted: in alloc_struct you do data = NULL in case of an error. This only affects the local pointer, so your test for NULL in Monte_Carlo is meaningless. You'd be better off returning a true or false flag from alloc_struct and checking that. I doubt you're hitting this error, though.
Edit: Another, better option would be to find the minimum analytically: the derivative of A log(p) + (1-A) log(1-p) is A/p - (1-A)/(1-p). Average all the sample.Y values to find A. Finding the place where the derivative is 0 gives p = A. (You'll want to double-check my working!) With this you can avoid having to use the GSL minimization routines.
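In code that analytic solution reduces to a one-liner (a plain-NumPy sketch; Y here is a hypothetical stand-in for the 0/1 array that Y_fill generates):

import numpy as np

# hypothetical sampled data; in the original code this is Parameters.Y
Y = (np.random.rand(500) <= 0.2).astype(np.float64)

# setting A/p - (1-A)/(1-p) to zero gives p = A, the sample mean
p_hat = Y.mean()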

Cython/numpy vs pure numpy for least square fitting

A T.A. at school showed me this code as an example of a least-squares fitting algorithm.
import numpy as np

# return the coefficients (a0,..aN) of the fit y=a0+a1*x+..an*x^n
# with associated sigma dy
# x,y,dy are all np.arrays with dtype=np.float64
def fit_poly(x, y, dy, n):
    V = np.asmatrix(np.diag(dy**2))
    M = []
    for k in range(n+1):
        M.append(x**k)
    M = np.asmatrix(M).T
    theta = (M.T*V.I*M).I*M.T*V.I*np.asmatrix(y).T
    cov_t = (M.T*V.I*M).I
    return np.asarray(theta.T)[0], np.asarray(cov_t)
I'm trying to optimize his code using Cython. I got this code:
cimport numpy as np
import numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef poly_c(np.ndarray[np.float64_t, ndim=1] x,
             np.ndarray[np.float64_t, ndim=1] y,
             np.ndarray[np.float64_t, ndim=1] dy, np.int n):
    cdef np.ndarray[np.float64_t, ndim=2] V, M
    V = np.asmatrix(np.diag(dy**2), dtype=np.float64)
    M = np.asmatrix([x**k for k in range(n+1)], dtype=np.float64).T
    return ((M.T*V.I*M).I*M.T*V.I*(np.asmatrix(y).T))[0], (M.T*V.I*M).I
But the runtime seems to be the same for both programs. I used an assert to make sure the outputs were the same. What am I missing/doing wrong?
Thank you for your time, and hopefully you can help me.
PS: this is the code I'm profiling with (not sure if I can call this profiling, but whatever):
import numpy as np
from polyC import poly_c
from time import time
from pancho_fit import fit_poly
#pancho's the T.A,sup pancho
x=np.arange(1,1000)
x=np.asarray(x,dtype=np.float64)
y=3*x+np.random.random(999)
y=np.asarray(y,dtype=np.float64)
dy=np.array([y.std() for i in range(1,1000)],dtype=np.float64)
t0=time()
a,b=poly_c(x,y,dy,4)
#a,b=fit_poly(x,y,dy,4)
print("time={}s".format(time()-t0))
Except for [x**k for k in range(n+1)], I don't see any iterations for Cython to improve. Most of the action is in matrix products, and those are already done with compiled code (np.dot for ndarrays).
And n is only 4, so not many iterations.
But why iterate this at all?
In [24]: x = np.arange(1, 1000.)
In [25]: M1 = x[:, None]**np.arange(5)
# np.matrix(M1)
does the same thing.
So no, this does not look like a good Cython candidate, not unless you are prepared to write out all those matrix products in compilable detail.
I'd also skip the asmatrix stuff and use regular dot, @, and einsum, but that's more a matter of style than speed.
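For instance, the whole weighted fit can be written with plain ndarrays and no asmatrix (a sketch of the same computation as fit_poly; the @ operator needs Python 3.5+, otherwise substitute np.dot):

import numpy as np

def fit_poly_arrays(x, y, dy, n):
    # Vandermonde-style design matrix, built without a Python-level loop
    M = x[:, None]**np.arange(n + 1)
    # V = diag(dy**2) is diagonal, so V^-1 M is just a row-wise scaling
    Vi_M = M / dy[:, None]**2
    cov_t = np.linalg.inv(M.T @ Vi_M)   # (M^T V^-1 M)^-1
    theta = cov_t @ (Vi_M.T @ y)        # (M^T V^-1 M)^-1 M^T V^-1 y
    return theta, cov_t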

Cython fails to recognize overloaded constructor

I'm trying to compile the cyrand project using cython, but am running into a bizarre compile error when testing overloaded constructors. See this gist for the files in question.
From the gist, I can compile and run example.pyx just fine, which uses the default constructor:
import numpy as np
cimport numpy as np
cimport cython
include "random.pyx"

@cython.boundscheck(False)
def example(n):
    cdef int N = n
    cdef rng r
    cdef rng_sampler[double] * rng_p = new rng_sampler[double](r)
    cdef rng_sampler[double] rng = deref(rng_p)
    cdef np.ndarray[np.double_t, ndim=1] result = np.empty(N, dtype=np.double)
    for i in range(N):
        result[i] = rng.normal(0.0, 2.0)
    print result
    return result
^ This works and runs fine. An example run produces the following output:
$ python test_example.py
[ 0.47237842 3.153744849 3.6854932057 ]
Yet when I try to compile and run the test which uses a constructor that takes an unsigned long as argument:
import numpy as np
cimport numpy as np
cimport cython
include "random.pyx"

@cython.boundscheck(False)
def example_seed(n, seed):
    cdef int N = n
    cdef unsigned long Seed = seed
    cdef rng r
    cdef rng_sampler[double] * rng_p = new rng_sampler[double](Seed)
    cdef rng_sampler[double] rng = deref(rng_p)
    cdef np.ndarray[np.double_t, ndim=1] result = np.empty(N, dtype=np.double)
    for i in range(N):
        result[i] = rng.normal(0.0, 2.0)
    print result
    return result
I get the following cython compiler error:
Error compiling Cython file:
-----------------------------------------------------------
...
cdef int N = n
cdef unsigned long Seed = seed
cdef rng_sampler[double] * rng_p = new rng_sampler[double](Seed)
----------------------------------------------------------
example/example_seed.pyx:15:67 Cannot assign type 'unsigned long' to 'mt19937'
I interpret this message, along with the fact that example.pyx compiles and produces a working example.so file, to mean that Cython cannot find (or manage) the rng_sampler constructor that takes an unsigned long as input. I've not used Cython before, and my C++ is middling at best. Can anyone shed light on how to fix this simple problem?
python: 2.7.10 (Anaconda 2.0.1)
cython: 0.22.1
I resolved the error; it had to do with how boost was installed. I had installed boost via apt-get. After downloading/untarring boost myself and changing the pointers to it in setup.py, it works.
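For reference, the kind of setup.py pointer change involved looks roughly like this (a sketch; the include path is hypothetical and depends on where the boost archive was unpacked):

from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize

ext = Extension(
    "example",
    sources=["example.pyx"],
    language="c++",
    # hypothetical path to the untarred boost headers
    include_dirs=["/path/to/boost_1_58_0"],
)

setup(ext_modules=cythonize(ext))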

Why does numba have worse optimization than Cython in this code?

I am trying to optimize some code with numba. The problem is that a simple Cython optimization (just specifying data types) is six times faster than using autojit, so I don't know if I'm doing something wrong.
The function to optimize is:
from numba import autojit

@autojit(nopython=True)
def get_energy(system, i, j, m):
    # system is an array, (i,j) some indices and m the size of the array
    up = i - 1; down = i + 1; left = j - 1; right = j + 1
    if up < 0: total = system[m, j]
    else: total = system[up, j]
    if down > m: total += system[0, j]
    else: total += system[down, j]
    if left < 0: total += system[i, m]
    else: total += system[i, left]
    if right > m: total += system[i, 0]
    else: total += system[i, right]
    return 2*system[i, j]*total
A simple run would be something like this:
import numpy as np
x=np.random.rand(50,50)
get_energy(x, 3, 5, 50)
I've understood that numba is good at loops but may not optimize other things very well. Anyhow, I would expect similar performance to Cython; is numba slower at accessing arrays or at conditional statements?
The .pyx file in Cython is:
import numpy as np
cimport cython
cimport numpy as np

def get_energy(np.ndarray[np.float64_t, ndim=2] system, int i, int j, unsigned int m):
    cdef int up
    cdef int down
    cdef int left
    cdef int right
    cdef np.float64_t total

    up = i - 1; down = i + 1; left = j - 1; right = j + 1
    if up < 0: total = system[m, j]
    else: total = system[up, j]
    if down > m: total += system[0, j]
    else: total += system[down, j]
    if left < 0: total += system[i, m]
    else: total += system[i, left]
    if right > m: total += system[i, 0]
    else: total += system[i, right]
    return 2*system[i, j]*total
Please comment if I need to give further information.
