matplotlib: not plotting a curve correctly - python

I am trying to plot this curve, and am a little confused on why it looks the way that it does. I would like to plot the curve seen below, but I don't want the lines in the middle and can't figure out why they're there. Could it be because there are 0's in the middle of the vector representing the y values?

This is just from my phone, so apologies if the formatting is off...
This is happening because your data has zeros in it. To get rid of the stray lines, you can either filter those points out when you read the data, or sort the data by x so the line is drawn from left to right. Something like this should suffice:
x, y = zip(*sorted(zip(x, y)))

It is already late, but I hope this helps someone. Taken from the answer to "why my curve fitting plot using matplotlib looks obscured?"
You need to sort your X values in ascending order before passing them to the plot function. Bear in mind that the x and y pairs must be kept together for the curve to be drawn correctly.
import numpy as np
sorted_indexes = np.argsort(X)
X = X[sorted_indexes]
y = y[sorted_indexes]
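A minimal, self-contained sketch of the same idea, assuming made-up sample data (points on y = x**2 listed in scrambled order) and a hypothetical helper name sort_by_x:

```python
import numpy as np


def sort_by_x(x, y):
    """Return x and y reordered so x is ascending, keeping each (x, y) pair together."""
    x = np.asarray(x)
    y = np.asarray(y)
    order = np.argsort(x, kind="stable")  # indices that put x in ascending order
    return x[order], y[order]


# Made-up sample data: points on y = x**2, deliberately out of order.
x_raw = [3.0, 0.0, 2.0, 1.0, 4.0]
y_raw = [9.0, 0.0, 4.0, 1.0, 16.0]
x_sorted, y_sorted = sort_by_x(x_raw, y_raw)


def plot_before_and_after():
    """Draw the unsorted and sorted data side by side."""
    import matplotlib.pyplot as plt  # imported here so the helper above works without a display

    fig, (left, right) = plt.subplots(1, 2, sharey=True)
    left.plot(x_raw, y_raw, "b.-")
    left.set_title("unsorted: line doubles back")
    right.plot(x_sorted, y_sorted, "b.-")
    right.set_title("sorted by x")
    plt.show()
```

Calling plot_before_and_after() should show the criss-crossing segments on the left and a single clean curve on the right.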

Related

Contour plot of 2D Numpy array in Python

My aim is to get a contour plot in Python for a (100,100) array that was created with Fortran and then imported.
I imported the array from Fortran in the following way:
x=np.linspace(0.02,10,100)
y=np.linspace(0.47,4,100)
f = np.fromfile('/path/result.dat', dtype=np.float64).reshape((len(x), len(y)), order="F")
So the result depends on x and y and gives a value for every combination of x and y.
How can I create a corresponding contour plot? So far what I tried was:
X, Y= np.meshgrid(x, y)
plt.contourf(X, Y, f, colors='black')
plt.show()
But the resulting contour plot shows values that don't make sense. I also tried imshow(), but it did not work. If you could help me, I would be very grateful!
The arrangement of X, Y, and f plays a role here. Without knowing how result.dat was generated, though, it is difficult to answer this question. My intuition is that the values of f(x,y) do not line up with the meshgrid.
The nonsensical values are probably arising because the entries of X and Y don't correspond to the entries of f. Try order="C" or order="A". Also, your x and y should really be defined before reshaping the data.
x=np.linspace(0.02,10,100)
y=np.linspace(0.47,4,100)
f = np.fromfile('/path/result.dat', dtype=np.float64).reshape((len(x), len(y)), order="C")  # or order="A"
Maybe try reordering X and Y if this doesn't work.
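To make the orientation issue concrete, here is a small sketch with synthetic data standing in for result.dat. The sin-based f and the shorter y grid are assumptions for illustration only, not the asker's data. By default np.meshgrid(x, y) returns arrays of shape (len(y), len(x)), so an array indexed as f[i, j] = f(x[i], y[j]) has to be transposed, or the grid has to be built with indexing="ij":

```python
import numpy as np

x = np.linspace(0.02, 10, 100)
y = np.linspace(0.47, 4, 80)  # assumption: a different length so a shape mismatch is visible

# Synthetic stand-in for the file contents: first index runs over x, second over y.
f = np.sin(x)[:, None] * y[None, :]  # shape (len(x), len(y))

# Option 1: default meshgrid has shape (len(y), len(x)), so plot f.T against it.
X, Y = np.meshgrid(x, y)

# Option 2: indexing="ij" gives shape (len(x), len(y)), which matches f directly.
X_ij, Y_ij = np.meshgrid(x, y, indexing="ij")


def plot_contours():
    """Draw both variants; they should look identical."""
    import matplotlib.pyplot as plt  # imported here so the arrays above can be checked without a display

    fig, (left, right) = plt.subplots(1, 2)
    left.contourf(X, Y, f.T)
    left.set_title("default meshgrid, f.T")
    right.contourf(X_ij, Y_ij, f)
    right.set_title('indexing="ij", f')
    plt.show()
```

When x and y have the same length, as in the question, a mismatch of this kind raises no error and simply produces a transposed picture, which may be why the values look wrong.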

Why I use matplotlib.pyplot(plt) lib to show some points but it cannot show line between the points

There is a for loop in part of my code, and at every step it generates a new tpr and fpr, printed like this:
0.05263157894736842 0.1896551724137931
0.06578947368421052 0.19540229885057472
0.07894736842105263 0.22988505747126436
0.07894736842105263 0.25862068965517243
0.07894736842105263 0.28735632183908044
I want to collect all these points and get a full plot, but it didn't work. My code is attached below:
for i in range(-30, 20):
    predicted = (np.sign(t+i*1e-4)+1)/2.
    vals, cm = re.get_CM_vals(y_test, predicted)
    tpr = re.TPR_CM(cm)
    fpr = re.FPR_CM(cm)
    #print(tpr, fpr)
    plt.plot(fpr, tpr,'b.-',linewidth=1)
plt.show()
Besides, I want right-angle lines between the points, like a staircase. Is there a function for that in matplotlib?
Using your current code, I suggest adding the x values to one array and the y values to another. You could also use something like ArrayName = [[],[]], then append the x and y values to ArrayName[0] and ArrayName[1], respectively. Each plt.plot call in your loop receives a single point, so there is nothing for the line to connect to; passing all the points in one call fixes that. Not only would this actually work, but it would also be slightly faster, since plt.plot and plt.scatter are quicker when plotting all the points at once than when called repeatedly in a for loop.
If you don't want to plot the points connected with lines, I still suggest using an array, since that would be faster. (It wouldn't be that much faster in this case, but it's a good habit to have.)
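A rough sketch of that suggestion follows. The (tpr, fpr) pairs are the five values printed in the question and stand in for whatever the real loop computes; the list names are made up. For the right-angle lines, plt.step is one option that draws a staircase between consecutive points:

```python
# (tpr, fpr) pairs copied from the question's printed output; in the real code
# they would be produced inside the loop instead of read from a list.
pairs = [
    (0.05263157894736842, 0.1896551724137931),
    (0.06578947368421052, 0.19540229885057472),
    (0.07894736842105263, 0.22988505747126436),
    (0.07894736842105263, 0.25862068965517243),
    (0.07894736842105263, 0.28735632183908044),
]

fpr_values = []
tpr_values = []
for tpr, fpr in pairs:
    # Collect the points first; plotting happens once, after the loop.
    fpr_values.append(fpr)
    tpr_values.append(tpr)


def plot_roc_steps():
    """Plot all collected points in one call, joined by right-angle segments."""
    import matplotlib.pyplot as plt  # imported here so the lists above can be built without a display

    # where="post" keeps each y value until the next x, then jumps vertically.
    plt.step(fpr_values, tpr_values, "b.-", where="post", linewidth=1)
    plt.xlabel("fpr")
    plt.ylabel("tpr")
    plt.show()
```

Replacing plt.step with plt.plot(fpr_values, tpr_values, "b.-") would give ordinary straight segments between the same points.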

Trying to plot a quadratic regression, getting multiple lines

I'm making a demonstration of different types of regression in numpy with ipython, and so far, I've been able to plot a simple linear regression without difficulty. Now, when I make a quadratic fit to my data and plot it, I don't get a quadratic curve but instead get many lines. Here's the code I'm running that generates the problem:
import numpy
from numpy import random
from matplotlib import pyplot as plt
import math
# Generate random data
X = random.random((100,1))
epsilon=random.randn(100,1)
f = 3+5*X+epsilon
# least squares system
A =numpy.array([numpy.ones((100,1)),X,X**2])
A = numpy.squeeze(A)
A = A.T
quadfit = numpy.linalg.solve(numpy.dot(A.transpose(),A),numpy.dot(A.transpose(),f))
# plot the data and the fitted parabola
qdbeta0,qdbeta1,qdbeta2 = quadfit[0][0],quadfit[1][0],quadfit[2][0]
plt.scatter(X,f)
plt.plot(X,qdbeta0+qdbeta1*X+qdbeta2*X**2)
plt.show()
What I get is this picture (zoomed in to show the problem):
You can see that rather than having a single parabola that fits the data, I have a huge number of individual lines doing something that I'm not sure of. Any help would be greatly appreciated.
Your X is ordered randomly, so it's not a good set of x values to use to draw one continuous line, because it has to double back on itself. You could sort it, I guess, but TBH I'd just make a new array of x coordinates and use those:
plt.scatter(X,f)
x = numpy.linspace(0, 1, 1000)
plt.plot(x,qdbeta0+qdbeta1*x+qdbeta2*x**2)
gives me a single smooth curve through the data.

How can I account for identical data points in a scatter plot?

I'm working with some data that has several identical data points. I would like to visualize the data in a scatter plot, but scatter plotting doesn't do a good job of showing the duplicates.
If I change the alpha value, then the identical data points become darker, which is nice, but not ideal.
Is there some way to map the color of a dot to how many times it occurs in the data set? What about size? How can I assign the size of the dot to how many times it occurs in the data set?
As was pointed out, whether this makes sense depends a bit on your dataset. If you have reasonably discrete points and exact matches make sense, you can do something like this:
import numpy as np
import matplotlib.pyplot as plt
test_x=[2,3,4,1,2,4,2]
test_y=[1,2,1,3,1,1,1] # I am just generating some test x and y values. Use your data here
#Generate a list of unique points
points=list(set(zip(test_x,test_y)))
#Generate a list of point counts
count=[len([x for x,y in zip(test_x,test_y) if x==p[0] and y==p[1]]) for p in points]
#Now for the plotting:
plot_x=[i[0] for i in points]
plot_y=[i[1] for i in points]
count=np.array(count)
plt.scatter(plot_x,plot_y,c=count,s=100*count**0.5,cmap='Spectral_r')
plt.colorbar()
plt.show()
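Notice: You will need to adjust the marker scale (the value 100 in the s argument) according to your point density. Note that s is the marker area in points squared, so scaling by the square root of the count makes the area grow with the square root of the count; use s=100*count if you want the area to be proportional to the count.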
Also note: If you have very dense points, it might be more appropriate to use a different kind of plot. Histograms for example (I personally like hexbin for 2d data) are a decent alternative in these cases.
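As a possible variation on the counting step, collections.Counter can tally the duplicates in a single pass instead of rescanning the data for every unique point. This is only a sketch using the same test values; the variable names and the choice of s=100*count (marker area proportional to the count) are assumptions:

```python
from collections import Counter

test_x = [2, 3, 4, 1, 2, 4, 2]
test_y = [1, 2, 1, 3, 1, 1, 1]

# Map each distinct (x, y) pair to the number of times it occurs.
counts = Counter(zip(test_x, test_y))

points = list(counts)                    # the unique (x, y) pairs
plot_x = [p[0] for p in points]
plot_y = [p[1] for p in points]
count = [counts[p] for p in points]      # occurrences, in the same order as points
sizes = [100 * c for c in count]         # marker area (points squared) proportional to count


def plot_counts():
    """Scatter the unique points, with colour and size showing how often each occurs."""
    import matplotlib.pyplot as plt  # imported here so the counting above works without a display

    plt.scatter(plot_x, plot_y, c=count, s=sizes, cmap="Spectral_r")
    plt.colorbar(label="occurrences")
    plt.show()
```

For large datasets this avoids the quadratic cost of comparing every unique point against every data point.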

Highlighting last data point in pandas plot

I have a number of graphs similar to this:
import numpy as np
import pandas as pd
dates = pd.date_range('2012-01-01','2013-02-22')
y = np.random.randn(len(dates))/365
Y = pd.Series(y, index=dates)
Y.plot()
The graph is great for showing the shape of the data, but I would like the latest value to stand out as well. I would like to highlight the last data point with a marker 'x' and with a different color. Any idea how I can do this?
Have added Dan Allan's suggestion. Works but I need something a bit more visible. As seen below the x is hardly visible. Any ideas?
Have added return of final answer to complete this. Changed the x to a D for a diamond for better visibility and increased the size of the marker.
Y.tail(1).plot(style='rD',markersize=10)
Add this line to your example to plot the last data point as a red X.
Y.tail(1).plot(style='rx')
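Put together, the whole thing might look like the sketch below. Passing the axes explicitly with ax= is an assumption about how you want to structure the code (the one-liner above relies on the current axes instead), and the seeded random generator is only there to make the example repeatable:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2012-01-01", "2013-02-22")
rng = np.random.default_rng(0)  # seeded only so the example is repeatable
Y = pd.Series(rng.standard_normal(len(dates)) / 365, index=dates)

last = Y.tail(1)  # one-element Series holding the most recent value


def plot_with_last_point():
    """Plot the series, then overlay the final value as a red diamond."""
    import matplotlib.pyplot as plt  # imported here so the data above can be built without a display

    ax = Y.plot()
    # Draw on the same axes so the marker shares the line's date axis.
    last.plot(ax=ax, style="rD", markersize=10)
    plt.show()
```

Calling plot_with_last_point() should draw the line with a single red diamond on its final point.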
