Given some coordinates in 3D (x-, y-, and z-axes), I would like to fit a fifth-order polynomial. I know how to do it in 2D (for example, just in the x- and y-directions) via numpy. So my question is: is it possible to do it with the third (z-) axis as well?
Sorry if I missed a question somewhere.
Thank you.
Numpy has functions for multi-variable polynomial evaluation in the polynomial package -- polyval2d, polyval3d -- the problem is getting the coefficients. For fitting, you need the polyvander2d and polyvander3d functions, which create the design matrices for the least squares fit. The multi-variable polynomial coefficients thus determined can then be reshaped and used in the corresponding evaluation functions. See the documentation of those functions for more details.
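For example, here is a minimal sketch of fitting z as a fifth-order polynomial in x and y with polyvander2d and numpy.linalg.lstsq; the sample data is fabricated for illustration, and the 3D case with polyvander3d/polyval3d is analogous:

import numpy as np
from numpy.polynomial import polynomial as P

# Fabricated scattered samples of z = f(x, y); replace with your data.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = rng.uniform(-1, 1, 200)
z = np.sin(x) * np.cos(y)

deg = [5, 5]                              # fifth order in each variable
V = P.polyvander2d(x, y, deg)             # design matrix, shape (200, 36)
coef, *_ = np.linalg.lstsq(V, z, rcond=None)
c = coef.reshape(deg[0] + 1, deg[1] + 1)  # reshape for the evaluator

z_fit = P.polyval2d(x, y, c)              # evaluate the fitted polynomial
print(np.max(np.abs(z_fit - z)))          # check the residuals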
I want to visualize the topic modeling done with the LDA algorithm. I use the Python module pyLDAvis in a Jupyter notebook environment.
import pyLDAvis.sklearn
...
pyLDAvis.sklearn.prepare(lda_tf, dtm_tf, tf_vectorizer)
pyLDAvis.sklearn.prepare(lda_tf, dtm_tf, tf_vectorizer, mds='mmds')
pyLDAvis.sklearn.prepare(lda_tf, dtm_tf, tf_vectorizer, mds='tsne')
It works fine, but I don't really understand the mds parameter, even after reading the documentation:
mds : function or a string representation of function
A function that takes topic_term_dists as an input and outputs a n_topics by 2 distance matrix. The output approximates the distance between topics. See js_PCoA() for details on the default function. A string representation currently accepts pcoa (or upper case variant), mmds (or upper case variant) and tsne (or upper case variant), if sklearn package is installed for the latter two.
Does somebody know the differences between mds='pcoa', mds='mmds', and mds='tsne'?
Thanks!
All three perform dimension reduction via Jensen-Shannon divergence combined with one of the following methods:
pcoa: Principal Coordinate Analysis (aka Classical Multidimensional Scaling)
mmds: Metric Multidimensional Scaling
tsne: t-distributed Stochastic Neighbor Embedding
Simply put: text data, when transformed into numeric tabular data, is usually high-dimensional. On the other hand, visualizations on a screen are two-dimensional (2D). Thus, a method of dimension reduction is required to bring the number of dimensions down to 2.
mds stands for multidimensional scaling. The possible values of that argument are:
mmds (Metric Multidimensional Scaling),
tsne (t-distributed Stochastic Neighbor Embedding), and
pcoa (Principal Coordinate Analysis).
All of them are dimension reduction methods.
Another dimension reduction method that may be more familiar to you, though not listed above, is PCA (principal component analysis). They all share the same basic idea of reducing dimensionality without losing too much information, each backed by a different theory and implementation.
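For intuition, here is a minimal sketch (not pyLDAvis internals; the toy matrix and the perplexity value are assumptions for illustration) showing each method reducing a topic-term matrix to 2D with scikit-learn:

import numpy as np
from sklearn.manifold import MDS, TSNE
from sklearn.decomposition import PCA

# Toy "topic-term" matrix: 8 topics described over 50 terms.
rng = np.random.default_rng(0)
topic_term = rng.random((8, 50))

# Each method maps the 8 topics from 50 dimensions down to 2.
xy_mmds = MDS(n_components=2, random_state=0).fit_transform(topic_term)
xy_tsne = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(topic_term)
xy_pca = PCA(n_components=2).fit_transform(topic_term)

print(xy_mmds.shape, xy_tsne.shape, xy_pca.shape)  # each (8, 2)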
I have an array of data and need to calculate the area under the curve, so I used the Numpy and Scipy libraries, which provide numpy.trapz and scipy.integrate.simps for numerical integration; both gave me a really nice result.
The problem now is that I need the error for each one, or at least the error for the trapezoidal rule. The thing is that the error formula asks for a function, which obviously I don't have. I have been researching a way to obtain the error but always return to the same point...
Here are the docs for scipy.integrate http://docs.scipy.org/doc/scipy/reference/integrate.html and trapz in Numpy http://docs.scipy.org/doc/numpy/reference/generated/numpy.trapz.html. I have tried and read a lot of code for numerical integration, but I prefer to use the existing functions...
Any ideas please?
While cel is right that you cannot determine an integration error if you don't know the function, there is something you can do.
You can use curve fitting to fit a function through the available data points. You can then use that function for error estimation.
If you expect the data to fit a certain kind of function like a sine, log or exponential it is good to use that as a basis for curve fitting.
For instance, if you are measuring the drag on a moving car, it is known that this is mostly proportional to the velocity squared because of air resistance.
However, if you do not have any knowledge about the applicable function, then assuming you have N data points, there is a polynomial of degree N-1 that fits exactly through all those data points. Determining such a polynomial from the data amounts to solving a system of linear equations; see e.g. polynomial interpolation. You could use this polynomial as an estimate for the unknown real function. Note, however, that outside the range of the data points this polynomial might be wildly inaccurate.
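A minimal sketch of that idea (the sine samples are a stand-in for your array): fit the interpolating polynomial, integrate it analytically, and compare with the trapezoidal result to estimate the error.

import numpy as np

# Stand-in data: y sampled at 9 points; replace with your array.
x = np.linspace(0.0, np.pi, 9)
y = np.sin(x)

# Trapezoidal estimate from the data alone.
I_trapz = np.trapz(y, x)

# Degree N-1 polynomial through all N points, integrated analytically
# as a proxy for the unknown function.
p = np.polyfit(x, y, len(x) - 1)
P = np.polyint(p)
I_poly = np.polyval(P, x[-1]) - np.polyval(P, x[0])

# The difference gives a rough estimate of the trapezoidal-rule error.
print(I_trapz, I_poly, abs(I_trapz - I_poly))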
I have a 3D array that I need to integrate numerically using Python. My array is a function of wavelength, depth and time. It is data that I have modelled numerically using another software package; I don't have an analytical form of the function, just the 3D array output from the other package. I need to find the triple integral of this array. In Matlab I use trapz(my_array, 3), where 3 is the dimension to integrate over. The Scipy trapz only seems to do a single integral.
I think I have two options, but I need some advice.
opt 1. use 3D interpolation in scipy that returns a function handle (do these exist? the 1D version returns a function), and then use scipy.integrate.tplquad to do the integration over the interpolated function, using the min and max values in my arrays as the integration limits.
opt 2. use three nested trapz calls, like this suggestion for 2D that I found on another site: sp.trapz(sp.trapz(f, y[np.newaxis,:], axis=1), x, axis=0)
I can't quite get my head around making either work. Any help/advice would be appreciated. I need to make sure that my integration error is as low as possible.
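For concreteness, here is a sketch of the nested-call structure I am aiming at for option 2, extended to three axes (the grids and data cube below are fabricated for illustration):

import numpy as np

# Fabricated grids: wavelength, depth, time.
wav = np.linspace(400e-9, 700e-9, 30)
dep = np.linspace(0.0, 50.0, 20)
t = np.linspace(0.0, 10.0, 40)

# Fabricated data cube with shape (len(wav), len(dep), len(t)).
W, D, T = np.meshgrid(wav, dep, t, indexing='ij')
f = np.exp(-D / 10.0) * np.sin(T) ** 2

# Integrate the innermost axis first, then work outward.
I = np.trapz(np.trapz(np.trapz(f, t, axis=2), dep, axis=1), wav, axis=0)
print(I)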
I'm a recent immigrant to Python and scientific computing with Python. This question is meant to avoid any duplication of code that might already exist.
I have a field that is sampled as a function of x and y on a regular grid. I want to interpolate the data to obtain not only the value of the field at any point on the grid, but also its first and second derivatives. If I interpolate with bicubic interpolation using interp2d, I can obtain the value of the field.
Does anyone have a suggestion on how to obtain the first and second derivatives of the field using an EXISTING numpy or scipy function?
Thanks!
The scipy.interpolate.interp2d.__call__ method has options dx and dy to evaluate higher derivatives at a point (at least since version 0.14.0).
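A minimal sketch under assumed sample data (note that interp2d has been deprecated in recent SciPy releases, but the dx/dy options work as described from 0.14.0 on):

import numpy as np
from scipy.interpolate import interp2d

# Assumed regular grid and field values for illustration.
x = np.linspace(0, 2 * np.pi, 25)
y = np.linspace(0, 2 * np.pi, 25)
X, Y = np.meshgrid(x, y)
Z = np.sin(X) * np.cos(Y)

f = interp2d(x, y, Z, kind='cubic')

val = f(1.0, 2.0)            # field value
dfdx = f(1.0, 2.0, dx=1)     # first derivative in x
d2fdy2 = f(1.0, 2.0, dy=2)   # second derivative in y
print(val, dfdx, d2fdy2)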
I have some data that are the integrals of an unknown curve within bins. For your interest, the data is ocean wave energy and the bins are for directions, e.g. 0-15 degrees. If possible, I would like to fit a curve on to the data that conserves the integrals within the bins. I've tried sketching it on a notepad with a pencil and it seems like it could be possible. Does anyone know of any curve-fitting tool in Python to do this, for example in the scipy interpolation sub-package?
Thanks in advance
Edit:
Thanks for the help. It looks like I will try the method recommended in section 4 of this paper: http://journals.ametsoc.org/doi/abs/10.1175/1520-0485%281996%29026%3C0136%3ATIOFFI%3E2.0.CO%3B2. In essence, it uses matrices to construct 'fake' data points from the known integrals within each band. When plotted, this data produces an interpolated line graph that preserves the integrals.
It's a little outside my bailiwick, but I can suggest having a look at SciKits to see if there's anything there that might be useful. Other packages to browse would be pandas and StatsModels. Good luck!
If you have a curve f(x) which is an approximation to the integral of another curve g(x), i.e. f(x) = int g(x) dx, then the two are related by the fundamental theorem of calculus: your original curve is the derivative, g = df/dx. As such, you can use numpy.diff, or any of the higher-order methods, to approximate df/dx and obtain an estimate of your original curve.
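A minimal sketch of that idea with np.gradient, which is second-order accurate, unlike np.diff; the cosine samples stand in for the integral curve:

import numpy as np

# f: the integral curve sampled at x (stand-in data, f = int sin dx).
x = np.linspace(0, 2 * np.pi, 100)
f = -np.cos(x)

# Differentiate numerically to recover g = df/dx (here, sin(x)).
g = np.gradient(f, x)
print(np.max(np.abs(g - np.sin(x))))  # small approximation error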
One possibility: calculate the cumulative sum of the bin volumes (np.cumsum), fit an interpolating spline to it, and then take the derivative to get the curve.
scipy splines have methods to calculate the derivatives.
The only limitation, in case it is relevant for you: the spline through the cumulative sum might not be monotonic, and the derivative might then be negative over some intervals.
I guess that the literature on smoothing a histogram looks at similar constraints on the volume of the integral/bin, but I don't have any references ready.
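A minimal sketch of that approach (the bin edges and integrals are fabricated for illustration):

import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

# Fabricated example: energy integrated over 15-degree direction bins.
edges = np.arange(0, 361, 15)
rng = np.random.default_rng(0)
bin_integrals = rng.random(len(edges) - 1)

# The cumulative integral is known exactly at every bin edge.
cum = np.concatenate([[0.0], np.cumsum(bin_integrals)])

# Interpolating spline through the cumulative sum; its derivative
# is a curve whose integral over each bin matches the data exactly.
F = InterpolatedUnivariateSpline(edges, cum, k=3)
f = F.derivative()

print(F(edges[1]) - F(edges[0]), bin_integrals[0])  # should match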
1/ fit2histogram
Your question is about fitting a histogram. I just came across the documentation of PyMVPA, a Python package for Multi-Variate Pattern Analysis, which offers a function for histogram fitting. An example is here: PyMVPA.
However, I suspect the set of available distributions is limited to the well-known ones.
2/ integral computation
As already mentioned, another solution is to approximate the integral values and fit a model to the resulting set of data. Either you know an explicit expression for the derivative, or you use computational differentiation: finite differences or an analytical method.