I am trying to plot a graph with the calculated linear regression, but I get the error "ValueError: x and y must have same first dimension".
This is a multivariate (2 variables) linear regression with 3 samples (x1,x2,x3).
1 - First, am I calculating the linear regression correctly?
2 - I know that the error comes from the plot lines. I just don't understand why I get this error. What are the right dimensions to pass to plot?
import numpy as np
import matplotlib.pyplot as plt
x1 = np.array([3,2])
x2 = np.array([1,1.5])
x3 = np.array([6,5])
y = np.random.random(3)
A = [x1,x2,x3]
m,c = np.linalg.lstsq(A,y)[0]
plt.plot(A, y, 'o', label='Original data', markersize=10)
plt.plot(A, m*A + c, 'r', label='Fitted line')
plt.legend()
plt.show()
$ python testNumpy.py
Traceback (most recent call last):
File "testNumpy.py", line 22, in <module>
plt.plot(A, m*A + c, 'r', label='Fitted line')
File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 2987, in plot
ret = ax.plot(*args, **kwargs)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 4137, in plot
for line in self._get_lines(*args, **kwargs):
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 317, in _grab_next_args
for seg in self._plot_args(remaining, kwargs):
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 295, in _plot_args
x, y = self._xy_from_xy(x, y)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 237, in _xy_from_xy
raise ValueError("x and y must have same first dimension")
ValueError: x and y must have same first dimension
The problem here is that you're creating a list A where you want an array instead. m*A is not doing what you expect.
This:
A = np.array([x1, x2, x3])
will get rid of the error.
NB: multiplying a list by an integer m gives you a new list with the original content repeated m times, e.g.:
>>> [1, 2] * 4
[1, 2, 1, 2, 1, 2, 1, 2]
Now, since m is a floating-point number, this should have raised a TypeError (you can only multiply lists by integers)... but m turns out to be a numpy.float64, and it seems that when you multiply it with some unexpected thing (such as a list), NumPy coerces it to an integer, at least in the NumPy version used here.
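A minimal sketch of the corrected script, assuming a recent NumPy and Matplotlib. The Agg backend and the seeded generator are additions so it runs headless and reproducibly, and rcond=None is passed to lstsq to select NumPy's newer default cutoff. It shows the shapes involved: with A as an ndarray, m*A + c is computed element-wise and has the same (3, 2) shape as A, so plot receives matching first dimensions. Note that lstsq(A, y) as written fits y ≈ m*A[:, 0] + c*A[:, 1] with no intercept column, so the model's own fitted values are A @ [m, c]; that line is an addition for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs without a display
import matplotlib.pyplot as plt

x1 = np.array([3, 2])
x2 = np.array([1, 1.5])
x3 = np.array([6, 5])
rng = np.random.default_rng(0)  # seeded only for reproducibility
y = rng.random(3)

A = np.array([x1, x2, x3])  # shape (3, 2): an ndarray, not a list
m, c = np.linalg.lstsq(A, y, rcond=None)[0]

# With an ndarray, m*A + c is element-wise (broadcasting) and has A's shape.
line_values = m * A + c
print(A.shape, line_values.shape)  # (3, 2) (3, 2)

# What lstsq actually fitted: y ~ m*A[:, 0] + c*A[:, 1]
fitted = A @ np.array([m, c])

# 2-D x with 1-D y (same first dimension): one line per column of A.
plt.plot(A, y, 'o', label='Original data', markersize=10)
plt.plot(A, line_values, 'r', label='Fitted line')
plt.legend()
```

For question 1, it is worth deciding whether a model without an intercept is what you want; if it is not, the design matrix needs an extra column of ones.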
# Python code to demonstrate SQL to fetch data.
# importing the module
import sqlite3
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from scipy.stats import chisquare
# connect to the database
connection = sqlite3.connect(r"C:\Users\Aidan\Desktop\INA_DB.db")
# cursor object
crsr = connection.cursor()
dog= crsr.execute("Select s, ei, ki FROM INa_VC WHERE s IN ('d') ")
ans= crsr.fetchall()
#x = [0]*len(ans); y = [0]*len(ans)
x= np.zeros(len(ans)); y= np.zeros(len(ans))
for i in range(0,len(ans)):
    x[i] = float(ans[i][1])
    y[i] = float(ans[i][2])
# Reshaping
x, y = x.reshape(-1,1), y.reshape(-1, 1)
# Linear Regression Object
lin_regression = LinearRegression()
# Fitting linear model to the data
lin_regression.fit(x,y)
# Get slope of fitted line
m = lin_regression.coef_
# Get y-Intercept of the Line
b = lin_regression.intercept_
# Get Predictions for original x values
# you can also get predictions for new data
predictions = lin_regression.predict(x)
chi= chisquare(predictions, y)
# following slope intercept form
print ("formula: y = {0}x + {1}".format(m, b))
print(chi)
plt.scatter(x, y, color='black')
plt.plot(x, predictions, color='blue',linewidth=3)
plt.show()
Error:
runfile('C:/Users/Aidan/.spyder-py3/temp.py', wdir='C:/Users/Aidan/.spyder-py3')
Traceback (most recent call last):
File "", line 1, in <module>
runfile('C:/Users/Aidan/.spyder-py3/temp.py', wdir='C:/Users/Aidan/.spyder-py3')
File "C:\Users\Aidan\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "C:\Users\Aidan\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Aidan/.spyder-py3/temp.py", line 28, in <module>
y[i] = float(ans[i][2])
ValueError: could not convert string to float:
I am 99 percent sure the issue is with the y values. In my data set some y values are purposely missing, and this is leading to the float error. Given my current script, what would be a quick fix to filter OUT the missing (NaN) y values?
This script works perfectly if y values are in there.
Probably the best answer is to store those values as the string "nan" in your db, which float parses just fine. Afterwards you can use, for example, np.isnan to find the values that are not defined.
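A minimal sketch of that approach, assuming the missing y values are stored as the string "nan". The rows below are hypothetical stand-ins for crsr.fetchall() output of the question's (s, ei, ki) query; the mask drops the undefined points before the reshape and fit.

```python
import numpy as np

# Hypothetical rows shaped like crsr.fetchall() for "Select s, ei, ki ...",
# with missing y values stored as the string "nan" in the database.
ans = [("d", "1.0", "2.1"), ("d", "2.0", "nan"),
       ("d", "3.0", "6.2"), ("d", "4.0", "7.9")]

x = np.array([float(row[1]) for row in ans])
y = np.array([float(row[2]) for row in ans])  # float("nan") parses fine

defined = ~np.isnan(y)          # True where y holds a real value
x, y = x[defined], y[defined]   # keep only the defined points
x, y = x.reshape(-1, 1), y.reshape(-1, 1)  # same reshape as in the question
```

The masked x and y can then go straight into lin_regression.fit(x, y).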
As an alternative, leave them at zero:
for i in range(0, len(ans)):
    try:
        x[i] = float(ans[i][1])
    except ValueError:
        pass
    try:
        y[i] = float(ans[i][2])
    except ValueError:
        pass
Or, leave them out entirely:
xy = np.array([tuple(map(float, values[1:])) for values in ans if values[2]])
x = xy[:, 0]
y = xy[:, 1]
I taught myself the Metropolis algorithm and decided to try to code it in Python. I chose to simulate the Ising model. I have an amateur understanding of Python, and with that, here is what I came up with:
import numpy as np, matplotlib.pyplot as plt, matplotlib.animation as animation
def Ising_H(x,y):
    s = L[x,y] * (L[(x+1) % l,y] + L[x, (y+1) % l] + L[(x-1) % l, y] + L[x,(y-1) % l])
    H = -J * s
    return H
def mcstep(*args): #One Monte-Carlo Step - Metropolis Algorithm
    x = np.random.randint(l)
    y = np.random.randint(l)
    i = Ising_H(x,y)
    L[x,y] *= -1
    f = Ising_H(x,y)
    deltaH = f - i
    if(np.random.uniform(0,1) > np.exp(-deltaH/T)):
        L[x,y] *= -1
    mesh.set_array(L.ravel())
    return mesh,
def init_spin_config(opt):
    if opt == 'h':
        #Hot Start
        L = np.random.randint(2, size=(l, l)) #lxl Lattice with random spin configuration
        L[L==0] = -1
        return L
    elif opt =='c':
        #Cold Start
        L = np.full((l, l), 1, dtype=int) #lxl Lattice with all +1
        return L
if __name__=="__main__":
    l = 15 #Lattice dimension
    J = 0.3 #Interaction strength
    T = 2.0 #Temperature
    N = 1000 #Number of iterations of MC step
    opt = 'h'
    L = init_spin_config(opt) #Initial spin configuration
    #Simulation Visualization
    fig = plt.figure(figsize=(10, 10), dpi=80)
    fig.suptitle("T = %0.1f" % T, fontsize=50)
    X, Y = np.meshgrid(range(l), range(l))
    mesh = plt.pcolormesh(X, Y, L, cmap = plt.cm.RdBu)
    a = animation.FuncAnimation(fig, mcstep, frames = N, interval = 5, blit = True)
    plt.show()
Apart from a KeyError from a Tkinter exception, and white bands when I try 16x16 or anything larger, it looks and works fine. Now what I want to know is whether this is right, because:
I am uncomfortable with how I have used FuncAnimation to both do the Monte Carlo simulation AND animate my mesh plot. Does that even make sense?
And how about that cold start? All I am getting is a red screen.
Also, please tell me about the KeyError and the white banding.
The KeyError came up as:
Exception in Tkinter callback
Traceback (most recent call last):
File "/usr/lib/python2.7/lib-tk/Tkinter.py", line 1540, in __call__
return self.func(*args)
File "/usr/lib/python2.7/lib-tk/Tkinter.py", line 590, in callit
func(*args)
File "/usr/local/lib/python2.7/dist-packages/matplotlib/backends/backend_tkagg.py", line 147, in _on_timer
TimerBase._on_timer(self)
File "/usr/local/lib/python2.7/dist-packages/matplotlib/backend_bases.py", line 1305, in _on_timer
ret = func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/matplotlib/animation.py", line 1049, in _step
still_going = Animation._step(self, *args)
File "/usr/local/lib/python2.7/dist-packages/matplotlib/animation.py", line 855, in _step
self._draw_next_frame(framedata, self._blit)
File "/usr/local/lib/python2.7/dist-packages/matplotlib/animation.py", line 873, in _draw_next_frame
self._pre_draw(framedata, blit)
File "/usr/local/lib/python2.7/dist-packages/matplotlib/animation.py", line 886, in _pre_draw
self._blit_clear(self._drawn_artists, self._blit_cache)
File "/usr/local/lib/python2.7/dist-packages/matplotlib/animation.py", line 926, in _blit_clear
a.figure.canvas.restore_region(bg_cache[a])
KeyError: <matplotlib.axes._subplots.AxesSubplot object at 0x7fd468b2f2d0>
You are asking a lot of questions at once.
KeyError: cannot be reproduced. It's strange that it should only occur for some array sizes and not others. Possibly something is wrong with the backend; you may try a different one by placing these lines at the top of the script, before pyplot is imported:
import matplotlib
matplotlib.use("Qt4Agg")
White bands: cannot be reproduced either, but they possibly come from automatic axes scaling. To avoid that, you can set the axes limits manually:
plt.xlim(0,l-1)
plt.ylim(0,l-1)
Using FuncAnimation to do the Monte Carlo simulation is perfectly fine. Of course it's not the fastest method, but if you want to follow your simulation on the screen, there is nothing wrong with it. One may, however, ask why only one spin flips per time step. But that is more a question about the physics than about programming.
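If one spin flip per frame feels too slow, a common alternative is one sweep per frame: l*l attempted flips, on average one per spin. A sketch under stated assumptions: metropolis_sweep is a hypothetical helper name, rng is a NumPy Generator (np.random.default_rng), and deltaH uses the closed form 2*J*s*(sum of neighbours), which equals the f - i difference computed in the question's mcstep.

```python
import numpy as np

def metropolis_sweep(L, J, T, rng):
    """One Metropolis sweep: l*l single-spin-flip attempts on the periodic
    l x l lattice L, which is modified in place and also returned."""
    l = L.shape[0]
    for _ in range(l * l):
        x, y = rng.integers(l), rng.integers(l)
        neighbours = (L[(x + 1) % l, y] + L[x, (y + 1) % l]
                      + L[(x - 1) % l, y] + L[x, (y - 1) % l])
        # Energy change of flipping L[x, y] (same value as f - i in mcstep)
        deltaH = 2 * J * L[x, y] * neighbours
        # Metropolis rule: accept with probability min(1, exp(-deltaH/T))
        if deltaH <= 0 or rng.random() < np.exp(-deltaH / T):
            L[x, y] *= -1
    return L
```

Inside the animation callback, this would be called as metropolis_sweep(L, J, T, rng) before mesh.set_array(L.ravel()), so each drawn frame shows the lattice after a full sweep.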
Red screen for cold start: In the case of the cold start, you initialize your grid with only 1s. That means the minimum and maximum values in the grid are both 1. Therefore the colormap of the pcolormesh is normalized to the range [1, 1] and everything is drawn red. In general you want the colormap to span [-1, 1], which can be done using the vmin and vmax arguments:
mesh = plt.pcolormesh(X, Y, L, cmap = plt.cm.RdBu, vmin=-1, vmax=1)
This should give you the expected behaviour also for the "cold start".
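A small check of that explanation, assuming a recent Matplotlib and the non-interactive Agg backend. It draws the cold-start lattice once without and once with vmin/vmax and prints the resulting colour limits. pcolormesh is called with the array alone (no X, Y) here only to sidestep version-dependent shading rules, and the colormap is passed by name.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

l = 15
L = np.full((l, l), 1, dtype=int)  # cold start: every spin is +1

fig, (ax1, ax2) = plt.subplots(1, 2)
auto = ax1.pcolormesh(L, cmap="RdBu")                    # limits taken from the data
fixed = ax2.pcolormesh(L, cmap="RdBu", vmin=-1, vmax=1)  # limits pinned to the spin values
fig.canvas.draw()  # render once so the colour limits are resolved

print(auto.get_clim())   # both limits equal 1: a degenerate colour range
print(fixed.get_clim())  # (-1.0, 1.0)

norm = Normalize(vmin=-1, vmax=1)
print(float(norm(1)), float(norm(-1)))  # 1.0 0.0: +1 and -1 map to opposite ends of RdBu
```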
This code returns the error "float() argument must be a string or a number, not 'interp2d'". I'm attempting to learn how to interpolate values to fill an array given a few of the values in the array (sorry, bad phrasing). Am I messing up the syntax for the interp2d function, or what?
import numpy as np
import matplotlib.pyplot as plt
from netCDF4 import Dataset
import scipy as sp
GCM_file = '/Users/Robert/Documents/Python Scripts/GCMfiles/ATM_echc0003_1979_2008.nc'
fh = Dataset(GCM_file, mode = 'r')
pressure = fh.variables['lev'][:]
lats = fh.variables['lat'][:]
temp = np.mean(fh.variables['t'][0,:,:,:,:], axis = (0, 3))
potential_temp = np.zeros((np.size(temp,axis=0), np.size(temp,axis=1)))
P0 = pressure[0]
#plt.figure(0)
for j in range(0, 96):
    potential_temp[:,j] = temp[:, j] * (P0/ pressure[:]) ** .238
potential_temp_view = potential_temp.view()
temp_view = temp.view()
combo_t_and_pt = np.dstack((potential_temp_view,temp_view))
combo_view = combo_t_and_pt.view()
pt_and_t_flat=np.reshape(combo_view, (26*96,2))
t_flat = temp.flatten()
pt_flat = potential_temp.flatten()
temp_grid = np.zeros((2496,96))
for j in range(0, 2496):
    if j <= 95:
        temp_grid[j,j] = t_flat[j]
    else:
        temp_grid[j, j % 96] = t_flat[j]
'''Now you have the un-interpolated grid of all your values of t as a function of potential temp and latitude, so you have to interpolate the rest somehow....?'''
xlist = lats
ylist = pt_flat
X,Y = np.meshgrid(xlist,ylist)
temp_cubic = sp.interpolate.interp2d(xlist,ylist, temp_grid, kind = 'cubic')
#temp_linear= griddata(temp_grid, (X,Y), method = 'linear')
#temp_quintic = griddata(temp_grid, (X,Y), method = 'cubic')
plt.figure(0)
plt.contourf(X,Y, temp_cubic, 20)
EDIT: The error with this was pointed out to me. I changed the code from the interpolating line down, and I'm still getting an error, which reads "ValueError: Invalid input data". Here's the traceback:
runfile('C:/Users/Robert/Documents/Python Scripts/attempt at defining potential temperature.py', wdir='C:/Users/Robert/Documents/Python Scripts')
Traceback (most recent call last):
File "<ipython-input-27-1ffd3fcc3aa1>", line 1, in <module>
runfile('C:/Users/Robert/Documents/Python Scripts/attempt at defining potential temperature.py', wdir='C:/Users/Robert/Documents/Python Scripts')
File "C:\Users\Robert\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "C:\Users\Robert\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 88, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "C:/Users/Robert/Documents/Python Scripts/attempt at defining potential temperature.py", line 62, in <module>
Z = temp_cubic(xlist,ylist)
File "C:\Users\Robert\Anaconda3\lib\site-packages\scipy\interpolate\interpolate.py", line 292, in __call__
z = fitpack.bisplev(x, y, self.tck, dx, dy)
File "C:\Users\Robert\Anaconda3\lib\site-packages\scipy\interpolate\fitpack.py", line 1048, in bisplev
raise ValueError("Invalid input data")
The changed code:
temp_cubic = sp.interpolate.interp2d(xlist, ylist, temp_grid, kind = 'cubic')
ylist = np.linspace(np.min(pt_flat), np.max(pt_flat), .01)
X,Y = np.meshgrid(xlist,ylist)
Z = temp_cubic(xlist,ylist)
plt.contourf(X,Y, Z, 20)
The problem is in the line plt.contourf(X,Y, temp_cubic, 20). interp2d returns an interpolation function. However, you used it in place of the Z argument to contourf, which is supposed to be a float matrix. See the contourf doc for details.
In particular:
contour(X,Y,Z,N)
make a contour plot of an array Z.
X, Y specify the (x, y) coordinates of the surface
X and Y must both be 2-D with the same shape as Z,
or they must both be 1-D such that
len(X) is the number of columns in Z and
len(Y) is the number of rows in Z.
contour up to N automatically-chosen levels.
In short, I believe that you want to apply the function to X and Y to generate the array you pass in as the third argument.
Credit to both the matplotlib documentation and kindall for showing the conceptual error of my other possibilities.
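A sketch of "apply the function to X and Y", with made-up data on a small regular grid. Because interp2d has been deprecated in recent SciPy releases (and removed in the newest ones), this swaps in scipy.interpolate.RegularGridInterpolator, which likewise returns a callable rather than an array; it uses linear interpolation. It also illustrates a detail relevant to the EDIT: the third argument of np.linspace is the number of samples, not the step size, so np.linspace(a, b, .01) does not build a grid with spacing 0.01, whereas np.arange(a, b, step) does.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs without a display
import matplotlib.pyplot as plt
from scipy.interpolate import RegularGridInterpolator

# Made-up coarse data: z is known only at a few (x, y) grid points.
x_coarse = np.linspace(-90.0, 90.0, 7)    # 7 samples (third argument = count)
y_coarse = np.linspace(250.0, 400.0, 5)   # 5 samples
Xc, Yc = np.meshgrid(x_coarse, y_coarse, indexing="ij")
z_coarse = np.cos(np.radians(Xc)) * Yc    # shape (7, 5): axis 0 = x, axis 1 = y

interp = RegularGridInterpolator((x_coarse, y_coarse), z_coarse)  # a callable

# Fine grid to evaluate on: np.arange takes a step size, np.linspace a count.
x_fine = np.linspace(-90.0, 90.0, 61)
y_fine = np.arange(250.0, 400.0, 2.5)
X, Y = np.meshgrid(x_fine, y_fine)        # shape (len(y_fine), len(x_fine))

# Apply the function to X and Y: the result is a float array shaped like X.
Z = interp(np.stack([X, Y], axis=-1))

plt.contourf(X, Y, Z, 20)
```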
I want to make a 2D histogram by passing two 2D arrays as arguments, Tx and alt_array, both of the same size (56000, 40):
def histo_2D(alt, Tx):
    u,v = 56000,40
    Tx = np.zeros((u,v))
    alt_array = np.zeros((u,v))
    alt,tx = np.zeros((v)), np.zeros((v))
    for i in range(0,v):
        alt[i] = i
        tx[i] = i
    alt_array[:][:] = alt
    Tx[:][:] = tx
    alt_array[:][:] = alt
    print np.shape(Tx), np.shape(alt_array)
    plt.hist2d(Tx , alt_array)
But when I try to execute my program, I get this error message:
Traceback (most recent call last):
File "goccp.py", line 516, in <module>
histo_2D(alt,Tx)
File "goccp.py", line 376, in histo_2D
plt.hist2d(Tx , alt_array)
File "/Code/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.py", line 2847, in hist2d
weights=weights, cmin=cmin, cmax=cmax, **kwargs)
File "/Code/anaconda/lib/python2.7/site-packages/matplotlib/axes.py", line 8628, in hist2d
normed=normed, weights=weights)
File "/Code/anaconda/lib/python2.7/site-packages/numpy/lib/twodim_base.py", line 650, in histogram2d
hist, edges = histogramdd([x, y], bins, range, normed, weights)
File "/Code/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 288, in histogramdd
N, D = sample.shape
ValueError: too many values to unpack
I've tried using flattened arrays, but the result is not really good...
The documentation for hist2d states:
matplotlib.pyplot.hist2d(x, y, bins=10, range=None, normed=False, weights=None, cmin=None, cmax=None, hold=None, **kwargs)
Parameters: x, y: array_like, shape (n, ) :
Thus x and y need to be one dimensional; your values are two dimensional.
Have a look at the example as well, given at the end of the documentation.
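A sketch of flattening both arrays before histogramming, using the question's synthetic data (every row holds 0..39) and NumPy's np.histogram2d, which the traceback shows plt.hist2d calling internally; bins=40 is an arbitrary choice. ravel() keeps each Tx[i, j] paired with alt_array[i, j] because both arrays have the same shape and are flattened in the same order. With this particular data Tx equals alt_array everywhere, so every count lands on the diagonal, which may partly explain why the flattened result did not look useful.

```python
import numpy as np

u, v = 56000, 40
# The question's synthetic data: every row of both arrays is 0, 1, ..., v-1.
Tx = np.tile(np.arange(v, dtype=float), (u, 1))
alt_array = np.tile(np.arange(v, dtype=float), (u, 1))

# hist2d / histogram2d need 1-D x and y of equal length; ravel() flattens both
# arrays in the same (row-major) order, so sample k pairs Tx[i, j] with alt_array[i, j].
x, y = Tx.ravel(), alt_array.ravel()
H, xedges, yedges = np.histogram2d(x, y, bins=v)
print(H.shape)  # (40, 40)

# The matplotlib equivalent: plt.hist2d(Tx.ravel(), alt_array.ravel(), bins=v)
```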
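I started with matplotlib a week ago; I'm trying to plot the function $f(x) = \sum_{i=1}^{n-1} \frac{\phi(2^i x)}{2^i}$ with $n = 50$,
where $\phi(x) = \min(\lceil x \rceil - x,\ x - \lfloor x \rfloor)$, as in the code below.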
I changed my code to:
from math import*
import numpy as np
import matplotlib.pyplot as plt
def phi(x):
    return min(ceil(x) - x, x - floor(x))
n=50
def f(x):
    return sum([phi(x*2.0**i)/(2.0**i) for i in range (1,n)])
t = np.arange(0.0, 3.0, 0.1)
plt.plot(t, map(f,t))
plt.show()
But it is not working. The error that I'm getting is:
File "C:\Documents and Settings\Macedo\Desktop\exem.py", line 15, in <module>
plt.plot(t, map(f,t))
File "C:\Python32\lib\site-packages\matplotlib\pyplot.py", line 2459, in plot
ret = ax.plot(*args, **kwargs)
File "C:\Python32\lib\site-packages\matplotlib\axes.py", line 3850, in plot
for line in self._get_lines(*args, **kwargs):
File "C:\Python32\lib\site-packages\matplotlib\axes.py", line 325, in _grab_next_args
for seg in self._plot_args(remaining, kwargs):
File "C:\Python32\lib\site-packages\matplotlib\axes.py", line 302, in _plot_args
x, y = self._xy_from_xy(x, y)
File "C:\Python32\lib\site-packages\matplotlib\axes.py", line 242, in _xy_from_xy
raise ValueError("x and y must have same first dimension")
ValueError: x and y must have same first dimension
The problem is how you are defining functions. For example, you wrote:
def phi(x):
    phi = lambda x: min(ceil(x) - x, x - floor(x))
You can either define it as
def phi(x):
    return min(ceil(x) - x, x - floor(x))
or
phi = lambda x: min(ceil(x) - x, x - floor(x))
Look up function definitions and lambda functions in Python.
The definition of f should not be inside a loop. So you need something like:
n=50
def f(x):
    return sum([phi(x*2.0**i)/(2.0**i) for i in range (1,n)])
To get rid of the "only length-1 arrays can be converted to Python scalars" error, use
plt.plot(t, map(f,t))
instead of
plt.plot(t, f(t))
The problem is that math.ceil needs a scalar and does not operate element-wise on arrays, which is what you want. With map, f is now applied element-wise to t.
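An alternative to map, assuming NumPy is acceptable: swapping math.ceil, math.floor and min for their element-wise counterparts np.ceil, np.floor and np.minimum lets f accept the whole array, so plt.plot(t, f(t)) works in Python 2 and 3 alike. The terms i = 1, ..., n-1 match the question's range(1, n); the extra axis added by x[..., None] holds one column per term.

```python
import numpy as np

def phi(x):
    # Distance to the nearest integer, computed element-wise on arrays.
    return np.minimum(np.ceil(x) - x, x - np.floor(x))

def f(x, n=50):
    x = np.asarray(x, dtype=float)
    scale = 2.0 ** np.arange(1, n)             # 2**1, ..., 2**(n-1), as in range(1, n)
    terms = phi(x[..., None] * scale) / scale  # one trailing column per term
    return terms.sum(axis=-1)

t = np.arange(0.0, 3.0, 0.1)
values = f(t)  # same shape as t; no map() needed
```

Summation order differs from Python's built-in sum, so results can differ from the map version in the last few bits.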
So finally, the code I am using is:
from math import *
import numpy as np
import matplotlib.pyplot as plt
def phi(x):
    return min(ceil(x) - x, x - floor(x))
n=50
def f(x):
    return sum([phi(x*2.0**i)/(2.0**i) for i in range (1,n)])
t = np.arange(0.0, 3.0, 0.1)
plt.plot(t, map(f,t))
plt.show()
And the output is: [figure: plot of f(t) for t from 0 to 3]
This is in Python 2.7.2. As suggested by @ThomasK, for Python 3 you might need list(map(f,t)).