I am trying to use SUSI on hyperspectral data, but I am getting errors. I am sure that I am the problem and not SUSI.
import susi as su
import spectral as sp
import spectral.io.envi as envi
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
box = envi.open('C:/path/ref_16-2_22/normalised.hdr')
data = np.array(box.load())
som = su.SOMClassifier(n_rows=data.shape[0], n_columns=data.shape[1])
som.fit(data)
ValueError: estimator requires y to be passed, but the target y is None
som = su.SOMClustering(n_rows=data.shape[0], n_columns=data.shape[1])
som.fit(data)
ValueError: Found array with dim 3. None expected <= 2.
I am definitely the problem! Has anyone used SUSI on 3D data?
In general: the dimensions of the SOM (rows and columns) don't have to relate to the dimensions of your data.
For susi: you are using a classifier on data without class labels. In som.fit, you also need to pass the labels y:
som.fit(data, y)
data can be an n-D array, y would be a 1D array in your case I guess.
Alternatively, you can use unsupervised clustering:
som = su.SOMClustering()
som.fit(data)
[Disclaimer: I am the developer of susi.]
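The second error in the question ("Found array with dim 3") suggests the estimator wants a 2-D array of shape (samples, features). A minimal sketch, assuming the cube loaded from ENVI is laid out as (rows, columns, bands), is to flatten the two spatial axes so that each pixel becomes one sample. The cube below is random stand-in data, and `labels_flat` is only a placeholder for whatever per-pixel output the SOM produces.

```python
import numpy as np

# Stand-in for np.array(box.load()): a small (rows, cols, bands) cube.
rng = np.random.default_rng(0)
cube = rng.random((4, 5, 10))

n_rows, n_cols, n_bands = cube.shape

# One spectrum per pixel: shape (rows * cols, bands).
pixels = cube.reshape(-1, n_bands)

# A 2-D array like `pixels` is what would be passed to som.fit(pixels).
# Any per-pixel result of length rows * cols can be folded back
# into image shape afterwards:
labels_flat = np.arange(n_rows * n_cols)  # placeholder for per-pixel output
label_image = labels_flat.reshape(n_rows, n_cols)
```

With this layout, the SOM grid size (n_rows, n_columns of the estimator) can be chosen independently of the image size, as the answer above points out.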
I am trying to use statsmodels for panel data and have an issue with the shape of my data. My model is a TVP-VAR for a panel, written as a normal linear state space model composed of a state equation and a measurement equation, which I have managed to write as in eq. 33 of Canova and Ciccarelli (2013).
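The key model equation is attached. It defines the regressor matrix from $X_t$ and a composite error built from $X_t'$ and $u_t$, which is normally distributed with mean 0 and a variance involving $(I + \sigma^2 X_t' X_t)$.
[Image: key model equation]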
I use exactly this class of models from your site: TVP-VAR, MCMC, and sparse simulation smoothing.
https://www.statsmodels.org/devel/examples/notebooks/generated/statespace_tvpvar_mcmc_cfa.html
When I run the model locally, I get the attached graphs for 'Simulations based on KFS approach, MLE parameters' and 'Simulations based on CFA approach, MLE parameters', where some countries and years appear in an unexpected format.
[Image: KFS and CFA plots with the unexpected output format]
I suspect it has to do with the data shape I am using. You can see my actual data shape in the attached local screenshot.
When I run the simulations with the alternative parameterization that yields a smoother trend, one of the errors I get is
"
'value' must be an instance of str or bytes, not a tuple.
"
in addition to an earlier warning:
"An unsupported index was provided and will be ignored when, e.g. forecasting. self._init_dates(dates, freq) "
I suspect this has to do with my data shape and index. My dataset is in long format.
[Image: screenshot of the data shape]
My question is a bit naive. How do I reshape my data so that it is compatible with statsmodels? How do I rewrite my code to bring my data into a shape that the TVP-VAR, MCMC, and sparse simulation smoothing example accepts?
I hope it is clear what I am looking for. The code I am currently using to import the data is:
%matplotlib inline
from importlib import reload
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy.stats import invwishart, invgamma
#1
import pyreadstat
dtafile = 'panel.dta'
dta, meta = pyreadstat.read_dta(dtafile)
dta.tail()
labels=list(meta.column_labels)
column=list(meta.column_names)
# Panel data settings
year = dta.year
year = pd.Categorical(dta.year)
dta = dta.set_index([ "country", "year"])
dta["year"] = year
dta.head()
I would appreciate help getting the data into a shape that statsmodels accepts.
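One way to approach the reshaping, sketched here under assumptions: statsmodels time-series models generally take one column per series and a date-like index, so a long panel indexed by (country, year) can be pivoted to wide format and given an annual PeriodIndex. The column name `gdp` and the toy values are hypothetical, since the real variables are only visible in the screenshot.

```python
import pandas as pd

# Hypothetical long-format panel: one row per (country, year).
long_df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "gdp": [1.0, 1.1, 1.3, 2.0, 2.2, 2.1],
})

# Wide format: one row per year, one column per country.
wide = long_df.pivot(index="year", columns="country", values="gdp")

# Replace the integer years with an annual PeriodIndex so the index
# carries an explicit frequency. The years are converted to strings
# first so that they are parsed as dates.
wide.index = pd.PeriodIndex(wide.index.astype(str), freq="Y")
```

With several variables per country, pivoting produces tuple-valued (MultiIndex) column names; flattening them to plain strings such as "A_gdp" before modelling or plotting may be worth trying, though whether that is the source of the "not a tuple" error here is only a guess.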
I have a netCDF file with a spatial resolution of 0.05° and I want to regrid it to a spatial resolution of 0.01°, matching another netCDF file. I tried using scipy.interpolate.griddata, but I am not really getting there; I think I am missing something.
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset= xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
According to the scipy.interpolate.griddata documentation, I need to construct my interpolation call as follows:
grid = griddata(points, values, (grid_x_new, grid_y_new),
method='nearest')
So in my case, I assume it would be as follows:
#Saving in variables the old and new grids
grid_x_new = target_dataset['lon']
grid_y_new = target_dataset['lat']
grid_x_old = original_dataset ['lon']
grid_y_old = original_dataset ['lat']
points = (grid_x_old,grid_y_old)
values = original_dataset['analysed_sst'] #My variable in the netcdf is the sea surface temp.
Now, when I run griddata:
from scipy.interpolate import griddata
grid = griddata(points, values, (grid_x_new, grid_y_new),method='nearest')
I am getting the following error:
ValueError: shape mismatch: objects cannot be broadcast to a single shape
I assume it has something to do with the lat/lon array shapes. I am quite new to netCDF and don't really know what the issue could be. Any help would be greatly appreciated!
griddata needs one (lon, lat) pair for every data value, but in your original code grid_x_old and grid_y_old only hold the 1-D axis coordinates. Expanding them so that there is one entry per grid cell gets things working, for example:
import xarray as xr
from scipy.interpolate import griddata
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset= xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
#Saving in variables the old and new grids
old_df = original_dataset.to_dataframe().reset_index()
new_df = target_dataset.to_dataframe().reset_index()
grid_x_old = old_df["lon"]
grid_y_old = old_df["lat"]
grid_x_new = new_df["lon"]
grid_y_new = new_df["lat"]
values = old_df["analysed_sst"]
points = (grid_x_old,grid_y_old)
grid = griddata(points, values, (grid_x_new, grid_y_new),method='nearest')
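The same idea can be sketched without going through a DataFrame, using np.meshgrid to expand the 1-D axes into one coordinate pair per cell. Everything below is synthetic stand-in data (the axes, the field, and the grid sizes are assumptions), so it only illustrates the shapes griddata expects.

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic stand-ins for the coarse (0.05 degree) and fine (0.01 degree) axes.
lon_old = np.linspace(0.0, 0.45, 10)
lat_old = np.linspace(40.0, 40.45, 10)
lon_new = np.linspace(0.0, 0.45, 46)
lat_new = np.linspace(40.0, 40.45, 46)

# Expand the 1-D axes into one (lon, lat) pair per grid cell.
lon2d_old, lat2d_old = np.meshgrid(lon_old, lat_old)
lon2d_new, lat2d_new = np.meshgrid(lon_new, lat_new)

# Fake field on the coarse grid, shape (n_lat, n_lon).
values_old = lon2d_old + lat2d_old

# points has shape (n_cells, 2); values has shape (n_cells,).
points = np.column_stack([lon2d_old.ravel(), lat2d_old.ravel()])
regridded = griddata(points, values_old.ravel(),
                     (lon2d_new, lat2d_new), method="nearest")
```

The result has the shape of the new 2-D grid, so it can be wrapped back into an xarray object with the target lat/lon coordinates if needed.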
I recommend using xESMF for regridding xarray datasets. The code below will regrid your dataset:
import xarray as xr
import xesmf as xe
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset= xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
regridder = xe.Regridder(original_dataset, target_dataset, "bilinear")
dr_out = regridder(original_dataset)
Python t-sne implementation from this resource: https://lvdmaaten.github.io/tsne/
Btw I'm a beginner to scRNA-seq.
What I am trying to do: run t-SNE on a scRNA-seq data set, but using previously calculated principal components (I have PCA.score and PCA.load files).
Q1: I should be able to use my selected, precomputed PCs in t-SNE, but which file do I use, pca.score or pca.load, when running Y = tsne.tsne(X)?
Q2: I've tried removing or replacing parts of the PCA code to skip the PCA preprocessing, but it always seems to give an error. What should I change so that it uses my already PCA-transformed data and does not calculate PCA on it again?
The piece of PCA processing code is this in its raw form:
def pca(X=np.array([]), no_dims=50):
    """
        Runs PCA on the NxD array X in order to reduce its dimensionality to
        no_dims dimensions.
    """

    print("Preprocessing the data using PCA...")
    (n, d) = X.shape
    X = X - np.tile(np.mean(X, 0), (n, 1))
    (l, M) = np.linalg.eig(np.dot(X.T, X))
    Y = np.dot(X, M[:, 0:no_dims])
    return Y
You should use the PCA score.
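To make the distinction concrete, here is a small sketch on random stand-in data (the matrix sizes are arbitrary assumptions): the loadings have one row per gene, while the scores have one row per cell, which is the layout t-SNE needs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, n_pcs = 100, 30, 5
X = rng.normal(size=(n_cells, n_genes))  # stand-in expression matrix

# PCA via SVD of the centered data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

loadings = Vt[:n_pcs].T  # (n_genes, n_pcs): weight of each gene on each PC
scores = Xc @ loadings   # (n_cells, n_pcs): coordinates of each cell

# t-SNE embeds samples, so its input needs one row per cell: the scores.
```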
As for not running pca, you can just comment out this line:
X = pca(X, initial_dims).real
What I did was add a parameter do_pca and edit the function like this:
def tsne(X=np.array([]), no_dims=2, initial_dims=50, perplexity=30.0, do_pca=True):
    """
        Runs t-SNE on the dataset in the NxD array X to reduce its
        dimensionality to no_dims dimensions. The syntax of the function is
        `Y = tsne.tsne(X, no_dims, perplexity)`, where X is an NxD NumPy array.
    """

    # Check inputs
    if isinstance(no_dims, float):
        print("Error: array X should have type float.")
        return -1
    if round(no_dims) != no_dims:
        print("Error: number of dimensions should be an integer.")
        return -1

    # Initialize variables
    if do_pca:
        X = pca(X, initial_dims).real
    (n, d) = X.shape
    max_iter = 50
    # [.. rest stays the same..]
Using an example dataset, without commenting out that line:
import numpy as np
from sklearn.manifold import TSNE
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt
import sys
import os
from tsne import *
X,y = load_digits(return_X_y=True,n_class=3)
If we run the default:
res = tsne(X=X,initial_dims=20,do_pca=True)
plt.scatter(res[:,0],res[:,1],c=y)
If we pass it a precomputed PCA:
pc = pca(X)[:,:20]
res = tsne(X=pc,initial_dims=20,do_pca=False)
plt.scatter(res[:,0],res[:,1],c=y)
I am struggling to obtain summary statistics for a model I fitted with Bayesian regression. I first used Lasso and model selection to pick the best variables, then used pm.Model for the regression proper.
Of course, once the irrelevant explanatory variables were filtered out, the shape of the X matrix changed. The data I worked on is the load_boston dataset from sklearn.datasets. I used the data as the independent variables and the target as the dependent variable.
Having performed model selection with SelectFromModel, I used the get_support method to obtain the indices of the retained variables. I then looped over the indices of all variables and the indices in the support to store the names of the retained variables in an empty list I had created ad hoc. The code looks something like this:
import pandas as pd
import numpy as np
import pymc3 as pm
import matplotlib.pyplot as plt
np.random.seed(9)
# Load the boston dataset.
from sklearn.datasets import load_boston
boston = load_boston()
X, y = boston['data'], boston['target']
# Here is the code for the estimator LassoCV
# Here is the code for Model Selection
supp = sfm.get_support(indices=True) #to obtain the list of indices of retained variables
X_transform = sfm.transform(X) #to remove the unnecessary variables
#Here is the line for linear modeling
#I initialize some useful variables
m = y.shape[0]
n = X.shape[1]
c = supp.shape[0]
L = boston['feature_names']
varnames=[]
for i in range(0, n):
    for j in range(0, c):
        if i == supp[j]:
            varnames.append(L[i])
pm.summary(trace, varnames=varnames)
The console then displays 'KeyError: RM', which is the name of one of the variables used. One thing I noticed is that every element of varnames is a numpy str_ object, meaning that I can't read the names of the retained variables in the list unless I double-click on them.
How could I fix this? I have no clue what I am doing wrong.
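The numpy.str_ part can be addressed on its own. A minimal sketch, using stand-in arrays in place of boston['feature_names'] and the support indices: index the name array directly with the support and cast each entry to a plain str, which also replaces the nested loop.

```python
import numpy as np

# Stand-ins for boston['feature_names'] and sfm.get_support(indices=True).
feature_names = np.array(["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM"])
supp = np.array([0, 3, 5])

# Index directly with the support and cast numpy.str_ to plain str.
varnames = [str(name) for name in feature_names[supp]]
```

As for the KeyError itself, a likely cause (an assumption, since the model code is not shown) is that the names passed to pm.summary have to match the names of the random variables declared inside pm.Model. If the model declares, say, one coefficient vector rather than a variable called "RM", the trace has no entry under that name.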
How do I convert an image to a dataset or NumPy array and make predictions by fitting it to clf?
import PIL as pillow
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
infilename=input()
im=Image.open(infilename)
imarr=np.array(im)
flatim=imarr.flatten('F')
clf=svm.SVC(gamma=0.0001,C=100)
x,y=im.size
#how to fit the numpy array to clf
clf.fit(flatim[:-1],flatim[:-1])
print("prediction:",clf.predict(flatim[-1]))
plt.imshow(flatim,camp=plt.cm.gray_r,interpolation='nearest')
plt.show()
Anyone please and thanks!!!
There is no real reason to use an SVM on a single image other than the fun of doing it. Here are the fixes I made: 1) used .convert("L") to load the image as a 2D grayscale array; 2) created a dummy target variable y as a random 1D array; 3) fixed the typos in the call that displays the image (plt.imshow): cmap instead of camp, and im instead of flatim.
import PIL as pillow
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
im=Image.open("sample.jpg").convert("L")
imarr=np.array(im)
flatim=imarr.flatten('F')
clf=svm.SVC()
#X,y=im.size
X = imarr
y = np.random.randint(2, size=imarr.shape[0])
clf.fit(X, y)
#how to fit the numpy array to clf
#clf.fit(flatim[:-1],flatim[:-1])
# I HAVE NO IDEA WHAT I"M DOING HERE!
print("prediction:", clf.predict(X[-2:-1]))
plt.imshow(im,cmap=plt.cm.gray_r,interpolation='nearest')
plt.show()
There is a good example of using an SVM on the scikit-learn website. I guess this is what you are trying to copy, isn't it?
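For comparison, here is a short sketch in the spirit of that scikit-learn digits example (written from memory, so treat the details as assumptions rather than a copy of the official code). The key difference from the single-image attempt is that each image becomes one flattened row of X, and y holds a real label for every image.

```python
from sklearn import datasets, svm

# 8x8 grayscale digit images with their true labels.
digits = datasets.load_digits()
n_samples = len(digits.images)

# Flatten each image into one row: shape (n_samples, 64).
X = digits.images.reshape(n_samples, -1)
y = digits.target

# Train on the first half, predict on the second half.
half = n_samples // 2
clf = svm.SVC(gamma=0.001)
clf.fit(X[:half], y[:half])
predicted = clf.predict(X[half:])

accuracy = (predicted == y[half:]).mean()
print("accuracy on held-out half:", accuracy)
```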