I am having an issue with the pycopula library.
The example (provided at https://github.com/blent-ai/pycopula) imports a CSV dataset and then passes it to the fitting function. I have generated two uniformly distributed random variables, combined them into a pd.DataFrame(), and then tried to estimate a Clayton copula.
import numpy as np
import pandas as pd
from pycopula.copula import ArchimedeanCopula
x1 = np.random.uniform(size=3000)
x2 = np.random.uniform(size=3000)
X = pd.DataFrame(); X[0]=x1; X[1]=x2
archimedean = ArchimedeanCopula(family="clayton", dim=2)
archimedean.fit(X, method="cmle")
I am getting TypeError: '(0, slice(None, None, None))' is an invalid key. If anyone has used this library before and knows what input the function takes, I would be grateful. The full documentation link provided on GitHub redirects me to a non-existent website (Error 404). Thanks!
I think the fit() method expects the data as a numpy array, so you can't pass a DataFrame into it. From its documentation:
X : numpy array (of size n * copula dimension)
Use DataFrame.to_numpy() to convert it to the right type. Hope it works.
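For example, a minimal sketch built on the same synthetic data as in the question (I have not re-run it against the current pycopula release, so treat it as an illustration):
import numpy as np
import pandas as pd
from pycopula.copula import ArchimedeanCopula

x1 = np.random.uniform(size=3000)
x2 = np.random.uniform(size=3000)
X = pd.DataFrame({0: x1, 1: x2})

archimedean = ArchimedeanCopula(family="clayton", dim=2)
# pass a plain numpy array instead of the DataFrame itself
archimedean.fit(X.to_numpy(), method="cmle")  # X.values also works on older pandas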
I am trying to use statsmodels for panel data and have an issue with the shape of my data. My model is a TVP-VAR for a panel, written as a normal linear state space model composed of a state equation and a measurement equation, which I have managed to write as in eq. (33) of Canova and Ciccarelli (2013).
The key model equation, which defines the stacked regressors $\mathcal{X}_t$ and the composite error $u_t \sim N\!\left(0,\ \sigma^2 (I + \sigma^2 X_t' X_t)\right)$, is attached as an image:
Key Model Equation
I am using exactly this class of models from your site: "TVP-VAR, MCMC, and sparse simulation smoothing".
https://www.statsmodels.org/devel/examples/notebooks/generated/statespace_tvpvar_mcmc_cfa.html
When I run the model locally, I get the attached graphs for "Simulations based on KFS approach, MLE parameters" and "Simulations based on CFA approach, MLE parameters", where some countries and years appear in an unexpected format.
KFS and CFA unexpected outcome format
I suspect it has to do with the data shape I am using. You can see my actual data shape in the attached screenshot.
When I run the "Simulations with alternative parameterization yielding a smoother trend" section, among the errors I get is:
"'value' must be an instance of str or bytes, not a tuple"
This is in addition to an earlier warning:
"An unsupported index was provided and will be ignored when, e.g., forecasting. self._init_dates(dates, freq)"
I suspect this has to do with my data shape and index. My dataset is in long format.
A screenshot here
Data shape
My question is a bit naive. How do I reshape my data in order to be compatible with statsmodels? How do I rewrite my code in order to bring my data into an acceptable shape to run the TVP-VAR, MCMC, and sparse simulation smoothing?
I hope it is clear what I am looking for. The code I am currently using to import the data is:
%matplotlib inline
from importlib import reload
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy.stats import invwishart, invgamma
# Read the Stata panel dataset
import pyreadstat
dtafile = 'panel.dta'
dta, meta = pyreadstat.read_dta(dtafile)
dta.tail()
labels = list(meta.column_labels)
column = list(meta.column_names)

# Panel data settings: index by (country, year), keep year as a categorical column
year = pd.Categorical(dta.year)
dta = dta.set_index(["country", "year"])
dta["year"] = year
dta.head()
I would appreciate any help getting the data into a shape/format acceptable to statsmodels.
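For reference, here is a minimal sketch of the kind of reshape that notebook expects: a wide DataFrame with one row per year (on a proper time index, which should also silence the "unsupported index" warning) and one column per country. The value column name "gdp" is hypothetical; substitute whichever variable you are actually modelling.
# Illustrative only: pivot the long panel into the wide layout used by the
# TVP-VAR notebook (rows = years, columns = countries). "gdp" is a placeholder
# for whatever value column the panel actually contains.
dta_long, meta = pyreadstat.read_dta(dtafile)
dta_wide = dta_long.pivot(index="year", columns="country", values="gdp")

# Give the result a proper annual PeriodIndex so statsmodels stops warning
# about an unsupported index.
dta_wide.index = pd.PeriodIndex(dta_wide.index.astype(int).astype(str), freq="Y")
print(dta_wide.shape)  # (number of years, number of countries)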
I'm trying to compute the OLS estimator $$(X^{T}X)^{-1}X^{T}y$$ which requires taking a matrix inverse.
However, when doing so, I get the following error: TypeError: No loop matching the specified signature and casting was found for ufunc inv
My code is as follows:
import pandas as pd
import numpy as np
from numpy.linalg import inv
data = pd.read_csv("Tbill10yr.csv")
X = data.as_matrix()[:,1]           # second column: the T-bill series
X1 = X[:730]                        # regressor: the series up to t-1
y_1 = X[1:].reshape((730,1))        # target: the series shifted one step ahead
Nobs = y_1.shape[0]
X1 = np.c_[ np.ones( (Nobs,1) ) , X1]  # add an intercept column
XX = np.dot(X1.T , X1)
Xy = np.dot(X1.T , y_1)
beta_hat = np.dot(inv(XX),Xy)       # OLS: (X'X)^{-1} X'y
I later figured out that I had to cast to float first, i.e. beta_hat = np.dot(inv(XX.astype(float)), Xy).
Why is it necessary to do this? Is there a proper way to go about it?
Any explanation is greatly appreciated.
Thanks
One possible reason may be that your data was not recognized as float. From data.as_matrix documentation:
The dtype will be a lower-common-denominator dtype (implicit
upcasting); that is to say if the dtypes (even of numeric types) are
mixed, the one that accommodates all will be chosen. Use this with
care if you are not dealing with the blocks.
You can check the type of your arrays by doing XX.dtype.
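If that is the case, casting the extracted column to float before building XX avoids the error; a minimal sketch (assuming, as in your code, that the series of interest is the second column):
# Cast the extracted column to float up front
X = data.as_matrix()[:,1].astype(float)
# On newer pandas, where as_matrix() no longer exists, the equivalent would be:
# X = data.iloc[:, 1].to_numpy(dtype=float)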
I want to fit the data
[0.54667, 0.471447, 0.826591, 0.330514, 0.7263, 0.496063, 0.520698, 0.321594, 0.351358, 0.894333]
to the distribution
'dgamma(a=0.91, loc=0.48, scale=0.15)'
How do I do this in Python?
First of all, you don't need to create a distribution object in advance. All you need are the distribution parameters, which you can estimate with the code below.
from scipy.stats import gamma
import numpy as np
data = [1,2,3,4,5] # your data
fit_alpha, fit_loc, fit_beta = gamma.fit(np.array(data), floc=0, fscale=1)  # floc/fscale hold loc and scale fixed; drop them to estimate all three
Then you can use the scipy.stats.gamma functions to get the PDF/CDF/etc., like:
print(gamma.pdf(0.9, fit_alpha))
Check out the documentation to find the calls you need.
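Since you mention dgamma specifically: scipy.stats also has a dgamma (double gamma) distribution with the same fit interface, so a sketch along these lines should give you parameters in the form you quoted (I have not checked that it reproduces a=0.91, loc=0.48, scale=0.15 on your data):
from scipy.stats import dgamma
import numpy as np

data = [0.54667, 0.471447, 0.826591, 0.330514, 0.7263,
        0.496063, 0.520698, 0.321594, 0.351358, 0.894333]

# estimate the shape, location and scale parameters of a double gamma
a, loc, scale = dgamma.fit(np.array(data))
print(a, loc, scale)
print(dgamma.pdf(0.5, a, loc=loc, scale=scale))  # evaluate the fitted density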
I have two input files of identical size/shape; however, the data they contain has a different resolution, and I am looking to perform a chi-squared test on them.
The input files are 500 lines long and contain 4 columns delimited by spaces; I am trying to test the second column of each input file against the other.
My code is as follows:
# Import statements
C = pl.loadtxt("input_1.txt")
D = pl.loadtxt("input_2.txt")
col2_C = C[:,1]
col2_D = D[:,1]
f_obs = np.array([col2_C])
f_exp = np.array([col2_D])
chisquare(f_obs, f_exp)
This gives me an error saying:
ValueError: df <= 0
I don't even understand what it is complaining about here.
I have tried several other syntaxes within the script, each of which also resulted in various errors:
This one was found here.
chisquare = f_obs=[col2_C], f_exp=[col2_D])
TypeError: chisquare() takes at least one positional argument
Then I tried
chisquare = f_obs(col2_C), F_exp=[col2_D)
NameError: name 'f_obs' is not defined
I also tried several other syntactical tweaks, but to no avail. If anybody could please help me get this running I would appreciate it greatly.
Thanks in advance.
First, be sure you are importing chisquare from scipy.stats. Numpy has the function numpy.random.chisquare, but that does not do a statistical test. It generates samples from a chi-square probability distribution.
So be sure you use:
from scipy.stats import chisquare
There is a second problem.
As slices of the two-dimensional array returned by loadtxt, col2_C and col2_D are one-dimensional numpy arrays, so there is no need to use, for example, np.array([col2_C]) when you pass these to chisquare. Just use col2_C and col2_D directly:
chisquare(col2_C, col2_D)
Wrapping the arrays with np.array like you did is causing the problem. chisquare accepts multidimensional arrays and an axis argument. When you do f_exp = np.array([col2_C]) (with the extra square brackets), f_exp is actually a two-dimensional array, with shape (1, 500). Similarly f_obs has shape (1, 500). The default axis argument of chisquare is 0. So when you called chisquare(f_obs, f_exp), you were asking chisquare to perform 500 chi-square tests, with each test having a single observed and expected value.
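With those two changes in place, a minimal sketch of the whole call looks like this; the return value carries the test statistic and the p-value:
from scipy.stats import chisquare

# one chi-square test over the two 500-element columns
result = chisquare(col2_C, col2_D)
print(result.statistic, result.pvalue)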
I've looked all over the place and am not finding a solution to this issue. I feel like it should be fairly straightforward, but we'll see.
I have a .FITS format data cube and I need to collapse it into a 2D FITS image. The data cube has two spatial dimensions and one spectral/velocity dimension.
Just looking for a simple python routine to load in the cube and flatten all these layers (i.e. integrate them along the spectral/velocity axis). Thanks for any help.
This tutorial on pyfits is a little old but still basically correct. The key point is that opening a FITS cube with pyfits (or astropy.io.fits) gives you a three-dimensional numpy array.
import pyfits
# if you are using astropy then for this example
# from astropy.io import fits as pyfits
data_cube, header_data_cube = pyfits.getdata("data_cube.fits", 0, header=True)
data_cube.shape
# (Z, X, Y)
You then have to decide how to flatten/integrate the cube along the Z axis, and there are plenty of resources out there to help you choose the right way (hopefully grounded in some analysis framework) to do that.
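For example, one simple option (a sketch, assuming the spectral axis is the first axis as shown above) is a plain sum over that axis:
import numpy as np

# crude collapse: sum over the spectral/velocity axis (axis 0), ignoring NaNs
collapsed = np.nansum(data_cube, axis=0)
collapsed.shape
# (X, Y)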
OK, this seems to work:
import pyfits
import numpy as np
hdulist = pyfits.open(filename)
header = hdulist[0].header
data = hdulist[0].data
data = np.nan_to_num(data)
new_data = data[0]
for i in range(1, 84):  # the upper bound depends on the number of layers/pages in the cube
    new_data += data[i]
hdu = pyfits.PrimaryHDU(new_data)
hdu.writeto(new_filename)
One problem with this routine is that WCS coordinates (which are attached to the original data cube) are lost during this conversion.
This is a bit of an old question, but spectral-cube now provides a better solution for this.
Example, based on Teachey's answer:
from spectral_cube import SpectralCube
cube = SpectralCube.read(filename)
summed_image = cube.sum(axis=0)
summed_image.hdu.writeto(new_filename)
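If you want the result to keep proper spectral units and WCS information (the concern raised above), the zeroth moment should do essentially the same job; a small sketch based on the spectral-cube documentation:
# integrate along the spectral axis; the resulting projection keeps the celestial WCS
moment0 = cube.moment(order=0)
moment0.hdu.writeto(new_filename)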