I have a small issue running code from a tutorial: it isn't working as it should, and it's definitely not a syntax problem. I'm working with scikit-learn and matplotlib, and my IDE shows the warning "Cannot find reference 'gray_r' in 'cm.py'..." All my packages are installed properly (via pip) and have worked for every sample program except this one.
Any advice?
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
digits = datasets.load_digits()
print(digits.data)
print(digits.target)
print(digits.images[0])
clf = svm.SVC(gamma=0.001, C=100)
print(len(digits.data))
x, y = digits.data[:-1], digits.target[:-1]
clf.fit(x,y)
print('Prediction:', clf.predict(digits.data[-1])
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()
Well, for starters, you're missing a closing parenthesis on your last print statement; it should be print('Prediction:', clf.predict(digits.data[-1])). Other than that, this code runs on my computer with only a deprecation warning. What does the traceback say?
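Two side notes on the question's code, offered as a hedged sketch rather than a confirmed diagnosis. As far as I can tell, matplotlib registers its colormaps on the cm module at import time, so an IDE's static inspection may not see gray_r even though it exists at runtime; passing the colormap by name avoids the attribute lookup. Also, the deprecation warning is probably about passing a single 1-D sample to predict, which reshaping to one row avoids. The 8x8 array below is a synthetic stand-in for digits.images[-1], and the Agg backend is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for digits.images[-1] and digits.data[-1] (assumptions).
image = np.arange(64, dtype=float).reshape(8, 8)
sample = image.ravel()  # shape (64,): one flattened sample

# scikit-learn estimators expect a 2-D array of shape (n_samples, n_features),
# so a single sample is reshaped to one row before calling predict().
sample_2d = sample.reshape(1, -1)  # shape (1, 64)

# Passing the colormap by name avoids referencing plt.cm.gray_r directly.
fig, ax = plt.subplots()
artist = ax.imshow(image, cmap="gray_r", interpolation="nearest")
cmap_name = artist.get_cmap().name
plt.close(fig)
```

With the real data, the equivalent calls would be clf.predict(digits.data[-1].reshape(1, -1)) and plt.imshow(digits.images[-1], cmap="gray_r", interpolation="nearest").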
When I run any code using kmeans clustering from sklearn, my python crashes (e.g., the kernel dies in Jupyter). This is not a memory usage issue and from what I can tell sklearn is up to date (version 1.0.2).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
sns.set_style('white')
from sklearn.cluster import KMeans
import sklearn
print('The scikit-learn version is {}.'.format(sklearn.__version__))
# Sample data for clustering
data_file = 'cluster_data.csv'
df = pd.read_csv(data_file,index_col='id')
X = df[['x1','x2']]
# Plotting data for visual inspection of clusters
plt.figure(figsize = (10, 10)) # determines the size of the plot area
ax = sns.scatterplot(x='x1', y='x2',data=df,edgecolor='grey',alpha=0.5)
# Kmeans clustering
kmeans = sklearn.cluster.KMeans(n_clusters=3, init='random').fit(df) # This is where the kernel dies
kmeans_centroids = kmeans.cluster_centers_
kmeans_labels_k3 = kmeans.labels_
When running the 'sklearn.cluster.KMeans' I get the message:
'The kernel appears to have died. It will restart automatically.'
Any suggestions?
(Other sklearn packages work e.g., random forests)
Access to data can be found here:
https://github.com/JakeTufts/Health-Data-Science-Msc/blob/Stack-overflow/cluster_data.csv
Your input data is a pandas DataFrame; try passing a NumPy array instead:
df2 = df.to_numpy()
kmeans = sklearn.cluster.KMeans(n_clusters=3, init='random').fit(df2)
This should work; please let me know the result.
You may still see a warning message about libraries being incompatible with the tutorial data.
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
export MKL_THREADING_LAYER=GNU
Setting this stopped KMeans from killing my Jupyter kernel.
Ref = https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md#workarounds-for-intel-openmp-and-llvm-openmp-case
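If exporting the variable in the shell that launches Jupyter is awkward, the same setting can be applied from Python. This is a minimal sketch, assuming the variable only takes effect when it is set before NumPy (and therefore MKL) is first imported in the process; the helper name set_mkl_threading_layer is hypothetical.

```python
import os
import sys


def set_mkl_threading_layer(layer="GNU"):
    """Set MKL_THREADING_LAYER for this process.

    Returns True if NumPy had not been imported yet, i.e. the setting was
    applied early enough to be picked up (an assumption about when MKL
    reads the variable), and False otherwise.
    """
    numpy_already_loaded = "numpy" in sys.modules
    os.environ["MKL_THREADING_LAYER"] = layer
    return not numpy_already_loaded


# Call this in the first notebook cell, before importing numpy or sklearn.
applied_in_time = set_mkl_threading_layer("GNU")
if not applied_in_time:
    print("NumPy was already imported; restart the kernel and run this cell first.")
```

After this cell, import numpy and sklearn as usual and rerun the KMeans fit.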
I am trying to build a score matching using pymatch. Unfortunately I am getting the following error
Fitting Models on Balanced Samples: 1\200
Error: Unable to coerce to Series, length must be 1: given 1898
Here is my code
from sklearn.datasets import make_blobs
from pymatch.Matcher import Matcher
import pandas as pd
import numpy as np
X, y = make_blobs(n_samples=5000, centers=2, n_features=2, cluster_std=3.5)
df = pd.DataFrame(dict(x=X[:,0], y=X[:,1], label=y))
df['population'] = np.random.choice([1, 0], size=len(df), p=[0.8, 0.2])
control = df[df.label == 1]
test = df[df.label == 0]
m = Matcher(test, control, yvar="population", exclude=['label'])
m.fit_scores(balance=True, nmodels=200)
If I run this code, I get the error. I am quite sure I was able to run it before, but after changing some package versions it no longer works. Unfortunately, I wasn't able to fix it by going back to previous versions, so I'm not sure what's going on here...
Downgrading pandas did not work for me, but I found where the problem is.
It is an error in the method _scores_to_accuracy() of Matcher.py. I downloaded the source file, edited the function on my local machine, and now it works fine.
https://github.com/benmiroglio/pymatch/issues/23
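For context, the same error message can be reproduced in recent pandas by comparing a one-column DataFrame with a 1-D NumPy array. This is only a sketch of the likely mechanism, not the actual pymatch source: the names y and preds and the accuracy formula are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the true labels and the model predictions.
y = pd.DataFrame({"population": [1, 0, 1, 1]})
preds = np.array([1, 0, 0, 1])

# Recent pandas tries to align the 1-D array with the DataFrame's columns
# (one column here), so an array of length 4 cannot be coerced.
try:
    y == preds
    error_message = None  # older pandas versions may not raise here
except ValueError as exc:
    error_message = str(exc)

# Comparing plain NumPy arrays of the same shape sidesteps the alignment.
accuracy = float((y.to_numpy().ravel() == preds).mean())
print(error_message)
print(accuracy)
```

Converting the DataFrame to a flat array before comparing is one way to patch a local copy; check the linked issue for the change that was actually proposed.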
Please downgrade pandas to version 0.23.4.
Use the code:
pip install pandas==0.23.4
First I followed an iris tutorial and it worked great! The program ran fine and did everything it was supposed to do. Then I started working on a pickle tutorial to pickle data and open it again ... and everything went crazy. Now I have a __pycache__ folder in my code folder that wasn't there before, and I am getting the following error:
AttributeError: module 'numpy' has no attribute 'dtype'
So far I have tried completely wiping scipy, numpy, sklearn, and pandas from my computer and reinstalling them. Then I tried disabling apport (I'm on an Ubuntu machine), because part of the long error output kept mentioning it.
Below is the program I ran that I think caused this.
# Save Model Using Pickle
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
import pickle
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
test_size = 0.33
seed = 7
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=test_size, random_state=seed)
# Fit the model on the training split (67% of the data; 33% is held out for testing)
model = LogisticRegression()
model.fit(X_train, Y_train)
# save the model to disk
filename = 'finalized_model.sav'
with open(filename, 'wb') as f:  # the with-block closes the file after writing
    pickle.dump(model, f)
Upon further investigation, I realized I had saved the code as pickle.py on my computer (in the same folder where the __pycache__ was appearing). I renamed it to pickle1.py and now everything works. Lesson learned: don't name your scripts after modules...
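A quick way to spot this kind of shadowing is to check whether the script's folder contains a file named after a module you import, and to see where Python would load that module from. A minimal sketch; the helper names find_shadowing_file and module_origin are hypothetical.

```python
import importlib.util
import os


def find_shadowing_file(module_name, directory="."):
    """Return the path of directory/<module_name>.py if it exists, else None."""
    candidate = os.path.join(directory, module_name + ".py")
    return candidate if os.path.isfile(candidate) else None


def module_origin(module_name):
    """Return the location Python would import module_name from, or None."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


# Check the current folder for a local file that would shadow the stdlib module.
shadow = find_shadowing_file("pickle")
if shadow is not None:
    print("Local file shadows the standard library module:", shadow)
print("pickle would be imported from:", module_origin("pickle"))
```

If the printed origin points into your project folder instead of the Python installation, rename the local file and delete the stale __pycache__ folder.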
I might guess that your numpy installation somehow got stepped on. Maybe try "pip install --upgrade --force-reinstall numpy" in your command line?
Or maybe it's a line that says "numpy.dtype" somewhere is being used wrong. You'd have to share at least that line of code to see that though.
Just wild guesses without having your entire setup.
I'm trying to run a regression analysis with the below mentioned code. I encounter ImportError: No module named statsmodels.api and No module named matplotlib.pyplot. Any suggestions will be appreciated to overcome this error.
import pandas as pd
import numpy as np
import seaborn as sns
from scipy import stats, integrate
import matplotlib.pyplot as plt
import statsmodels.api as sm
data = pd.read_csv(r"F:\Projects\Poli_Map\DAT_OL\MASTRTAB.csv")
# Select the predictor columns (read_csv returns a DataFrame, which has no .data or .feature_names)
X = data[["HH_LATR", "COMM_TOILT", "PWS"]]
# The target (IMR) is assumed to be a column of the same CSV
y = data["IMR"]
model = sm.OLS(y, X).fit()
predictions = model.predict(X) # make the predictions by the model
# Print out the statistics
model.summary()
plt.scatter(predictions, y, s=30, c='r', marker='+', zorder=10) #Plot graph
plt.xlabel("Independent variables")
plt.ylabel("Outcome variables")
plt.show()
I highly recommend installing Anaconda. It sets up the environment for you, and many useful packages (e.g. numpy, sympy, scipy) come bundled with it.
Moreover, in my experience, using pip on Windows and compiling from source (which needs Visual Studio) can be a pain. That is the problem Anaconda was designed to solve.
See: https://www.anaconda.com/download/
Hope this helps.
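Whichever distribution you use, a common cause of "No module named ..." is that the packages were installed into a different Python than the one running the script. A minimal sketch for checking this, assuming top-level package names only; the helper names missing_modules and pip_command are hypothetical.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pip_command(names):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", *names]


required = ["statsmodels", "matplotlib"]
missing = missing_modules(required)
print("Interpreter:", sys.executable)
if missing:
    print("Missing:", missing)
    print("Try:", " ".join(pip_command(missing)))
```

Running the printed command installs into the same interpreter that reported the ImportError, which avoids mixing up several Python installations.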
I am a newbie and the following question may be dumb and not well written.
I tried the following block of code in IPython:
%pylab qt5
x = randn(100,100)
y = mean(x,0)
import seaborn
plot(y)
And it delivered a plot. Everything was fine.
However, when I copied and pasted those same lines of code into PyCharm and tried running them, syntax error messages appeared.
For instance,
%pylab was not recognized.
Then I tried to import numpy and matplotlib one by one. But then,
randn(.,.) was not recognized.
You can use IPython/Jupyter notebooks in PyCharm by following this guide:
https://www.jetbrains.com/help/pycharm/using-ipython-jupyter-notebook-with-pycharm.html
You may modify code like the snippet below in order to run in PyCharm:
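from numpy.random import randn
from numpy import mean
import matplotlib.pyplot as plt  # recent seaborn versions no longer expose seaborn.plt
import seaborn
x = randn(10, 10)
y = mean(x, 0)  # column means, as in the original IPython session
plt.plot(y)
plt.show()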