Sum of multiple distributions - python

Background
I am trying to estimate the potential energy supply within a geographical area using spatially explicit data. For this purpose, I built a Bayesian network (HydeNet package) and attached it to a raster stack in R. The Bayesian network model reads the input data (e.g. resource supply, conversion efficiency) of each cell location from the raster stack and computes the corresponding energy supply (MCMC simulations). As a result, I obtain a new raster layer with a specific probability distribution of the expected energy supply for each raster cell.
However, I am equally interested in the total energy supply within the study area. That means I need to aggregate (sum) the potential supply of all the raster cells in order to get the overall supply potential within the area.
Click here for visual example
Research
The mathematical operation I want to do is called convolution. R provides a corresponding function called convolve that makes use of the Fast Fourier Transform.
The examples I found so far (e.g. example 1, 2) were limited to the addition of two distributions at a time. However, I would like to sum up multiple distributions (thousands, millions).
Question
How can I sum up (convolve) multiple probability distributions?
I have up to 18,000,000 probability distributions, so computational efficiency will certainly be a big issue.
Further, I am mainly interested in a solution in R, but other solutions (notably Python) are appreciated too.

I don't know if convolving multiple distributions at once would result in a speed increase. Wouldn't something like a123 = convolve(a1, a2, a3) behind the scenes simplify to a12 = convolve(a1, a2); a123 = convolve(a12, a3)? Regardless, in R you could try the foreach package and run all the convolutions in parallel; on a quad core that would (theoretically) speed up the calculations by a factor of 4. If you really want more speed, you could try the OpenCL package to see whether these calculations can be run in parallel on a GPU, but programming-wise that is not easy to get into. If I were you, I would focus on these kinds of solutions rather than on trying to speed up the convolution functions themselves.
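To illustrate the pairwise idea in Python (the question notes Python answers are welcome too): a minimal sketch that folds many discretized cell-level distributions together with scipy.signal.fftconvolve. It assumes every distribution has been discretized as a PMF on the same grid step with support starting at zero; the names (cell_pmfs, convolve_all) and the toy data are placeholders, not part of any package.

import numpy as np
from scipy.signal import fftconvolve

def convolve_all(pmfs):
    """Fold a list of 1-D PMFs (same grid step, support starting at zero)
    into the PMF of their sum by repeated pairwise FFT convolution."""
    pmfs = list(pmfs)
    while len(pmfs) > 1:
        nxt = []
        for i in range(0, len(pmfs) - 1, 2):
            c = fftconvolve(pmfs[i], pmfs[i + 1])
            c /= c.sum()              # renormalize to absorb FFT round-off
            nxt.append(c)
        if len(pmfs) % 2:             # odd one out is carried to the next round
            nxt.append(pmfs[-1])
        pmfs = nxt
    return pmfs[0]

# Toy example: 1,000 random discrete distributions on a 50-point grid.
rng = np.random.default_rng(0)
cell_pmfs = [rng.dirichlet(np.ones(50)) for _ in range(1000)]
total_pmf = convolve_all(cell_pmfs)   # distribution of the summed supply

This still performs n - 1 convolutions in total, but the convolutions within each round are independent of one another, so they can be distributed over cores (multiprocessing in Python, foreach in R) exactly as suggested above.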

Related

How to determine correlation values for angular data?

I have a number of time series of angular data. These values are not vectors (no magnitude), just angles. I need to determine how correlated the various time series are with each other (e.g., I would like to obtain a correlation matrix) over the duration of the data. For example, some are measured very close to each other and I expect they will be highly correlated, but I'm also interested in how correlated the more distant measurements are.
How would I go about adapting this angular data so that I can obtain a correlation matrix? I thought about just vectorizing it (i.e., with unit vectors), but then I'm not sure how to do the correlation analysis with this two-dimensional data, as I've only done it with one-dimensional data previously. Of course, I can't simply analyse the correlation of the angles themselves, due to the nature of angular data (the wrap-around at 0/360°).
I'm working in Python, so if anyone has any recommendations on relevant packages I would appreciate it.
I have found a solution in the Astropy Python package. The following function is suitable for circular correlation:
https://docs.astropy.org/en/stable/api/astropy.stats.circcorrcoef.html
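A small sketch of how circcorrcoef could be used to build a pairwise correlation matrix; it assumes the angles are in degrees and that all series have the same length (the array names and toy data are made up).

import numpy as np
from astropy import units as u
from astropy.stats import circcorrcoef

# series: shape (n_series, n_timesteps), angles in degrees
rng = np.random.default_rng(1)
series = rng.uniform(0, 360, size=(4, 200))

n = series.shape[0]
corr = np.empty((n, n))
for i in range(n):
    for j in range(n):
        # circcorrcoef handles the 0/360 wrap-around internally
        corr[i, j] = float(circcorrcoef(series[i] * u.deg, series[j] * u.deg))

print(np.round(corr, 3))   # circular correlation matrix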

What is the logic behind finding the Trend Component using Convolution Filters in STL Decomposition?

I'm trying to analyse the source code of STL decomposition using LOESS and identify the math behind splitting the observed data into seasonal, trend, and residual components. The link to the source code is below:
https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/seasonal.py
I'm able to work out the seasonal values, but to see how the trend is calculated I'm redirected to a convolution filter function, which in turn makes further calls to compute the trend values.
I need two pieces of information from this:
1. How are the filter values (an array) generated? (What is the logic behind them?)
2. How are the trend values calculated using convolution filters?
The underlying STL implementation appears to be a direct port of the original Fortran implementation available on netlib. I recommend reading the original paper. The authors show that the combination of filters can separate different scales of variation, with the "trend" component being the longest-scale variation. Stability of the iteration implies certain constraints on the smoothing widths specified for the various stages of the iteration. I've done a less literal translation to Java, available here, that might (or might not) be helpful in understanding the algorithm. It extends the Fortran implementation to allow for local quadratic LOESS interpolation. This was described in the paper but only implemented in the S version (the R version does not have this - it is a wrapper for the original Fortran).
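For the simpler decomposition in the linked seasonal.py (seasonal_decompose, not the LOESS-based STL), my reading is that the trend is just a centered moving average applied through a convolution filter. A rough sketch of that idea in plain NumPy; treat the exact weights as my interpretation of the source, not a specification.

import numpy as np

def trend_filter(period):
    """Symmetric moving-average weights; for an even period the two end
    points get half weight so that the window stays centered."""
    if period % 2 == 0:
        return np.array([0.5] + [1.0] * (period - 1) + [0.5]) / period
    return np.repeat(1.0 / period, period)

def trend_component(x, period):
    """Convolve the series with the filter; edges where the window does
    not fully overlap the data are left as NaN."""
    filt = trend_filter(period)
    valid = np.convolve(x, filt, mode="valid")
    pad = (len(x) - len(valid)) // 2
    out = np.full(len(x), np.nan)
    out[pad:pad + len(valid)] = valid
    return out

# Example: monthly data with period 12
t = np.arange(120)
x = 0.05 * t + np.sin(2 * np.pi * t / 12)
trend = trend_component(x, 12)

The NaN edges are the positions where the symmetric window runs off the data, which is why such a decomposition loses roughly half a period of trend values at each end of the series.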

How can I statistically compare a lightcurve data set with the simulated lightcurve?

With Python, I want to compare a simulated light curve with the real light curve. It should be mentioned that the measured data contain gaps and outliers, and the time steps are not constant. The model, however, has constant time steps.
In a first step I would like to compare with a statistical method how similar the two light curves are. Which method is best suited for this?
In a second step, I would like to fit the model to my measurement data. However, the model data is not calculated in Python but in independent software. Basically, the model data depends on four parameters, all of which are limited to a certain range, which I am currently feeding manually to the software (automation is planned).
What is the best method to create a suitable fit?
A "Brute-Force-Fit" is currently an option that comes to my mind.
This link "https://imgur.com/a/zZ5xoqB" provides three different plots. The simulated lightcurve, the actual measurement and lastly both together. The simulation is not good, but by playing with the parameters one can get an acceptable result. Which means the phase and period are the same, magnitude is in the same order and even the specular flashes should occur at the same period.
If I understand this correctly, you're asking a more foundational question that could be better answered in https://datascience.stackexchange.com/, rather than something specific to Python.
That said, as a data science layperson, this may be a problem suited for gradient descent with a mean-square-error cost function. You initialize the parameters of the curve (possibly randomly), then calculate the square error at your known points.
Then you make tiny changes to each parameter in turn, and calculate how the cost function is affected. Then you change all the parameters (by a tiny amount) in the direction that decreases the cost function. Repeat this until the parameters stop changing.
(Note that this might trap you in a local minimum and not work.)
More information: https://towardsdatascience.com/implement-gradient-descent-in-python-9b93ed7108d1
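A minimal sketch of that procedure, assuming the external simulator can be wrapped in a Python function (run_model below is a hypothetical stand-in, and the learning rate, step size, and iteration count are arbitrary):

import numpy as np

def mse(params, run_model, t_obs, y_obs):
    """Mean squared error between the model evaluated at the observed
    times and the measurements."""
    return np.mean((run_model(params, t_obs) - y_obs) ** 2)

def fit(params, run_model, t_obs, y_obs, lr=1e-3, eps=1e-4, n_iter=500):
    params = np.asarray(params, dtype=float)
    for _ in range(n_iter):
        base = mse(params, run_model, t_obs, y_obs)
        grad = np.zeros_like(params)
        for k in range(len(params)):
            bumped = params.copy()
            bumped[k] += eps                  # tiny change to one parameter
            grad[k] = (mse(bumped, run_model, t_obs, y_obs) - base) / eps
        params -= lr * grad                   # step downhill on the cost
    return params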
Edit: I overlooked this part
The simulation is not good, but by playing with the parameters one can get an acceptable result, meaning the phase and period are the same, the magnitude is of the same order, and even the specular flashes should occur at the same period.
Is the simulated curve just a sum of sine waves, and are the parameters just phase/period/amplitude of each? In this case what you're looking for is the Fourier transform of your signal, which is very easy to calculate with numpy: https://docs.scipy.org/doc/scipy/reference/tutorial/fftpack.html
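For example, a toy sketch of that suggestion, assuming an evenly sampled signal such as the simulated curve (all numbers are made up):

import numpy as np

dt = 1.0                                      # constant time step of the simulation
t = np.arange(0, 1000, dt)
y = 2.0 * np.sin(2 * np.pi * t / 50 + 0.3)    # toy light curve

freqs = np.fft.rfftfreq(len(t), d=dt)
spec = np.fft.rfft(y)

k = np.argmax(np.abs(spec[1:])) + 1           # skip the zero-frequency (mean) bin
period = 1.0 / freqs[k]                       # dominant period (about 50 here)
amplitude = 2.0 * np.abs(spec[k]) / len(t)    # its amplitude (about 2 here)
phase = np.angle(spec[k])                     # its phase
print(period, amplitude, phase)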

Python program to find the relation between two parameters

I have 16 experimental intensity values corresponding to 16 distances. I want to find the relation between these points as an approximate equation, so that I can tell the distance corresponding to a given intensity value without plotting the graph.
Is there a Python program for this?
I can share the values, if required.
Based on the values you have given us, I highly doubt that fitting a curve to this will work at all. The reason is this:
If you aren't concerned with minute changes (in the decimals), then you can simply take 5.9 as a fair estimate. If you are concerned with those changes, the data show seemingly erratic behaviour, and I highly doubt you will get an r^2 value sufficient for any practical use.
If you had significantly more points you might be able to derive a rule from this, or even apply a machine learning model (the data are simple enough that a basic feed-forward neural network would work; search for TensorFlow), but with just those points a guess of 5.9 is as good as any.
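If you want to attempt a fit despite that caveat, a minimal sketch with scipy.optimize.curve_fit is below; the model form (an exponential decay) and the data arrays are purely hypothetical placeholders for your 16 measurements.

import numpy as np
from scipy.optimize import curve_fit

# Placeholder data standing in for the 16 measured points
distance = np.linspace(1.0, 16.0, 16)
rng = np.random.default_rng(2)
intensity = 10.0 * np.exp(-0.3 * distance) + 5.9 + 0.1 * rng.normal(size=16)

def model(d, a, b, c):
    return a * np.exp(-b * d) + c             # hypothetical intensity-distance relation

(a, b, c), _ = curve_fit(model, distance, intensity, p0=[8.0, 0.2, 5.0])

# Invert the fitted relation to read off the distance for a target intensity
target = 7.0
d_est = -np.log((target - c) / a) / b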

How to convolve two distributions from the scipy library

I have seen (by researching) convolution being done via numpy, but if I wish to convolve two standard distributions (specifically a normal with a uniform) which are readily available in the scipy library, is there a direct way of doing it rather than creating two arrays via numpy and convolve?
In general, computing convolutions for distributions requires solving integrals. I worked on this problem as part of the work for my dissertation [1] and wrote some Java (rather idiosyncratic) to carry out the operations. Basically my approach was to make a catalog of distributions for which there are known results, and fall back on a numerical method (convolution via discretization and FFT) if there is no known result.
For the combination of a Gaussian and a uniform, the result is like a Gaussian bump split into two and pasted onto each end of a uniform distribution, when the uniform is wide enough, otherwise it just looks like a bump. I can try to find formulas for that if you are interested.
You can try to compute the integrals via a symbolic computation system such as Maxima. [2] For example, Maxima says the convolution of a unit Gaussian with a unit uniform is:
-(erf((sqrt(2)*s-sqrt(2))/2)-erf(s/sqrt(2)))/2
[1] http://riso.sourceforge.net/docs/dodier-dissertation.pdf (in particular section C.3.17)
[2] http://sourceforge.net/p/maxima
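As far as I know, scipy does not provide a direct convolution of two frozen distributions, so a concrete version of the discretization-and-FFT fallback mentioned above looks something like this (the grid limits and step are arbitrary choices):

import numpy as np
from scipy import stats
from scipy.signal import fftconvolve
from scipy.special import erf

dx = 0.001
x_norm = np.arange(-6, 6, dx)            # grid covering N(0, 1)
x_unif = np.arange(0, 1, dx)             # grid covering U(0, 1)

pdf_norm = stats.norm.pdf(x_norm)
pdf_unif = stats.uniform.pdf(x_unif)

pdf_sum = fftconvolve(pdf_norm, pdf_unif) * dx            # density of X + U
x_sum = x_norm[0] + x_unif[0] + dx * np.arange(len(pdf_sum))

# Cross-check against the closed form quoted above
closed = 0.5 * (erf(x_sum / np.sqrt(2)) - erf((x_sum - 1) / np.sqrt(2)))
print(np.max(np.abs(pdf_sum - closed)))                   # small discretization error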
