I have a plot with several data points that I would like to keep as it is.
One of the data points is 'better', because it comes not only with a value but also with a probability assigned to it.
I would like to show that probability by plotting the normal data points as usual and drawing a violin plot for the one with the measured PDF.
So far I've done this with an overplotted scatter plot, which looks somewhat like this (in an MWE):
import numpy as np
import matplotlib.colors
import matplotlib.pyplot as plt
def plot():
    x = np.linspace(0,20,20)
    data = x + np.random.rand(len(x))
    y = 2*x
    # measured PDF (histogram counts) of the 'better' point, at the values y_better
    histo = np.array([1,2,3,10,20,10,3,1])
    y_better = np.array([9.5,9.8,10,11.5,12,13,15,16])
    ax = plt.subplot()
    ax.plot(x,data,'o')
    # one marker per PDF sample, colored by its (log-scaled) weight
    ax.scatter(np.ones_like(histo)*x[10],y_better,c=histo,norm=matplotlib.colors.LogNorm(),s=100)
    plt.show()
plot()
This works and conveys the message, but it doesn't look very good.
Following the suggestion by @jadsq, I discovered violin plots, which look exactly like what I want!
The problem now is that the violinplot function expects raw data and conveniently estimates and draws the PDF from it. In my case I already have a measured PDF, which is what I want to plot. How can I make a plot that looks like a violin plot but uses my PDF, without the estimation step?
To me it looks like a way of indicating error bars on your point, so you could also try representing it with a box plot.
Regarding the colormap: just add cmap='inferno' to the scatter call, like so:
ax.scatter(np.ones_like(histo)*x[10],y_better,c=histo,norm=matplotlib.colors.LogNorm(),s=100,cmap='inferno')
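To answer the violin question directly, one option is to skip violinplot entirely and draw the shape yourself: mirror the measured PDF to the left and right of the point's x position and fill the area between with Axes.fill_betweenx. The sketch below assumes the PDF values (here the question's histo counts) are sampled at increasing y positions (y_better); the helper names pdf_violin and violin_halfwidths, the width of 1.5 and the 200-point resampling are illustrative choices, not part of any matplotlib API.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; in an interactive session drop this and call plt.show()
import matplotlib.pyplot as plt


def violin_halfwidths(pdf, width):
    """Half-widths of the violin: the peak of `pdf` maps to width / 2 (hypothetical helper)."""
    pdf = np.asarray(pdf, dtype=float)
    return 0.5 * width * pdf / pdf.max()


def pdf_violin(ax, position, coords, pdf, width=1.0, **fill_kwargs):
    """Draw a violin-shaped patch from an already-measured PDF (no density estimation).

    coords: y values where the PDF was measured (increasing)
    pdf:    PDF values (or histogram counts) at those y values
    """
    half = violin_halfwidths(pdf, width)
    # Mirror the PDF around x = position and fill between the two outlines.
    return ax.fill_betweenx(coords, position - half, position + half, **fill_kwargs)


x = np.linspace(0, 20, 20)
data = x + np.random.rand(len(x))
histo = np.array([1, 2, 3, 10, 20, 10, 3, 1])
y_better = np.array([9.5, 9.8, 10, 11.5, 12, 13, 15, 16])

# Optional: linearly resample the 8 measured values for a smoother outline.
y_fine = np.linspace(y_better.min(), y_better.max(), 200)
pdf_fine = np.interp(y_fine, y_better, histo)

fig, ax = plt.subplots()
ax.plot(x, data, 'o')
violin = pdf_violin(ax, x[10], y_fine, pdf_fine, width=1.5, alpha=0.5)
```

Because the PDF is rescaled so its maximum spans the requested width, counts and normalized densities give the same shape; only the relative values matter.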
I am trying to display a chart using matplotlib, but my labels are so long that they overlap each other. I want to show them cleanly, without overlapping. How can I do that? I am currently using the code below:
import matplotlib.pyplot as plt
x = ['jdwdw723#gmail.com' ,'emcast.test10#gmail.com', 'pbChinaTester#clp.com']
y = [10,25,6]
plt.plot(x,y)
plt.xlabel("loginId")
plt.ylabel("times appeared in the data")
plt.title("loginId Graph")
plt.tight_layout()
plt.show()
I tried your example code, and it doesn't seem to overlap there. There are many options; a commonly used one is to rotate the labels.
You can do it like this:
plt.xticks(rotation=45)
There are more ideas in the questions "Changing the “tick frequency” on x or y axis in matplotlib?" and "reducing number of plot ticks".
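A rough object-oriented variant of the rotation trick, using the question's data: Axes.tick_params with labelrotation rotates the x tick labels (including ticks created later), and right-aligning the rotated labels makes each one end at its own tick instead of being centered on it. Setting the alignment after a draw is an assumption made here so that the categorical tick labels already exist.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; use plt.show() interactively
import matplotlib.pyplot as plt

x = ['jdwdw723#gmail.com', 'emcast.test10#gmail.com', 'pbChinaTester#clp.com']
y = [10, 25, 6]

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("loginId")
ax.set_ylabel("times appeared in the data")
ax.set_title("loginId Graph")

# Rotate the x tick labels by 45 degrees.
ax.tick_params(axis="x", labelrotation=45)
fig.canvas.draw()  # make sure the tick labels exist before changing their alignment
# Right-align so the end of each rotated label sits under its tick.
plt.setp(ax.get_xticklabels(), ha="right")
fig.tight_layout()  # leave room at the bottom for the rotated labels
```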
I need to plot points on a Seaborn distplot at certain X values, such that they fall either on the density curve or below it. Here is a distplot from the distplot examples on the Seaborn site, together with its code:
[Figure: distplot example and code from the Seaborn documentation]
So, for example, in the plot shown above I need to determine programmatically the Y value of the density curve at X = 0. From the figure it seems to be somewhere around 0.37. How can I get that in my program?
Assuming that can be done, how can I then show the point in the plot, i.e., what line of code will draw it? I am translating a set of R visualizations to Python. The following plot in R shows what I am trying to achieve:
[Figure: R density plot with points drawn on the curve]
See the points shown on the curve? There are many points to be drawn, but if you help me draw one, I can try to do the rest. I am a beginner with both Matplotlib and Seaborn.
In order to obtain the y coordinate for a point on the kde curve of the distplot, you can use the underlying data of the curve. You can get the data from the line plot using the get_data method of the line.
You can then interpolate the data on the point(s) you are interested in, using e.g. numpy.interp.
import seaborn as sns  # seaborn.apionly was removed in seaborn 0.9
import numpy as np; np.random.seed(0)
import matplotlib.pyplot as plt
x = np.random.randn(100)
ax = sns.distplot(x, hist_kws={"ec":"k"})  # distplot is deprecated since seaborn 0.11
data_x, data_y = ax.lines[0].get_data()  # the kde curve is the first Line2D of the axes
xi = 0 # coordinate where to find the value of kde curve
yi = np.interp(xi,data_x, data_y)
print ("x={},y={}".format(xi, yi)) # prints x=0,y=0.3698
ax.plot([xi],[yi], marker="o")
plt.show()
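The same get_data plus numpy.interp technique also handles many points at once, since np.interp accepts an array of x values. The self-contained sketch below replaces seaborn with a hand-rolled Gaussian KDE (Scott's-rule bandwidth), which is an assumption made so the example runs with only NumPy and matplotlib; seaborn's own bandwidth choice may differ slightly, so the numbers will not match 0.3698 exactly. Dotted drop lines illustrate the "on or below the curve" part of the question.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt


def gaussian_kde_curve(sample, grid):
    """Gaussian KDE of a 1-D sample evaluated on `grid` (Scott's-rule bandwidth)."""
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    bw = sample.std(ddof=1) * n ** (-1 / 5)  # Scott's rule in one dimension
    z = (grid[:, None] - sample[None, :]) / bw
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))


rng = np.random.default_rng(0)
sample = rng.standard_normal(100)
grid = np.linspace(sample.min() - 1, sample.max() + 1, 200)

fig, ax = plt.subplots()
ax.hist(sample, bins=10, density=True, ec="k")
(kde_line,) = ax.plot(grid, gaussian_kde_curve(sample, grid))

# Same idea as the answer: read the curve back from the Line2D and interpolate.
data_x, data_y = kde_line.get_data()
xi = np.array([-1.0, 0.0, 1.0])       # several x positions at once
yi = np.interp(xi, data_x, data_y)    # heights of the curve at those x values
ax.plot(xi, yi, "o")                  # points on the curve
ax.vlines(xi, 0, yi, linestyles=":")  # dotted lines from the axis up to each point
```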
Asked in the comments how to arrive at this solution:
Start with the problem. We have a distplot and we want to draw a point at a certain point on its kde curve.
Look at the documentation; does the distplot function have an argument that does what we want? Unfortunately not.
Does the function return an object? Yes. Is it the curve? Unfortunately not. Instead, it's a matplotlib Axes (found out with type()).
Find out what a matplotlib Axes is by reading the documentation. Phew, it's pretty heavy, but we stumble upon the method Axes.get_lines(); since the curve should be a line, that should help.
Find out what those lines are: They are Line2D objects. Again looking at the documentation we find that there is a method get_data. So now we have the data of the curve. Great!
At this point it would be even better if we had a function we could call with our x value to receive the corresponding y value. It seems we need to build that function ourselves.
So given x and y data of the curve, how do we find the y value of a given x value? Since the data is discrete, we need to interpolate. Looking around for "interpolate" & "python" eventually brings us to numpy.interp. So this provides us with the coordinates we need to draw a point.
Find out how to draw a point. Well that's easy. Lots of examples for that around.
That's it.
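The exploration steps above can be replayed in a few lines. This sketch uses a plain ax.plot of a Gaussian bump as a stand-in for the distplot curve (an assumption, so it runs without seaborn); each comment names the step it corresponds to.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.lines import Line2D

grid = np.linspace(-3, 3, 61)
fig, ax = plt.subplots()
ax.plot(grid, np.exp(-grid ** 2 / 2))  # stand-in for the kde curve

# "Does the function return an object?" -> inspect it with type()
print(type(ax))
# "axes.get_lines()" -> the curve is one of the Axes' lines
lines = ax.get_lines()
print(type(lines[0]))
# "get_data" -> the x and y arrays of the curve
data_x, data_y = lines[0].get_data()
# "interpolate" -> evaluate the discrete curve at any x
y_at_0 = np.interp(0.0, data_x, data_y)
print(y_at_0)
```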
I'm trying to add a color bar to a graph, but I don't understand how it works. The problem is that I make my own color code like this:
import numpy as np
from matplotlib import cm
x = np.arange(11)
ys = [i+x+(i*x)**2 for i in range(11)]
colors = cm.rainbow(np.linspace(0, 1, len(ys)))
and colors[i] will give me a new color. Then I use (homemade) functions to select the relevant data and plot them accordingly. This would look something like this:
function(x,y,concentration,temperature,1,37,colors[0])
function(x,y,concentration,temperature,2,37,colors[1])
# etc
Now I want to add the colors in a color bar, with labels I can change. How do I do this?
I have seen several examples where you plot all the data as one array, with automated color bars, but here I plot the data one by one (by using functions to select the relevant data).
EDIT:
function(x,y,concentration,temperature,1,37,colors[0]) looks like this (simplified):
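import matplotlib.pyplot as plt
def function(x,y,c,T,condition1,condition2,colors):
    # plot (x, y) once for every entry whose concentration equals condition1
    # and whose temperature equals condition2
    for i, element in enumerate(c):
        if element == condition1 and T[i] == condition2:
            plt.plot(x,y,color=colors,linewidth=2)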
Drawing a colorbar beside a line plot
Please map my solution (I simply used 11 sines of different amplitudes) to your problem; as I told you, it is difficult to understand from what you wrote in your question.
import matplotlib
import numpy as np
from matplotlib import pyplot as plt
# an array of parameters, each of our curves depends on a specific
# value of parameters
parameters = np.linspace(0,10,11)
# norm is a class which, when called, can normalize data into the
# [0.0, 1.0] interval.
norm = matplotlib.colors.Normalize(
    vmin=np.min(parameters),
    vmax=np.max(parameters))
# choose a colormap
c_m = matplotlib.cm.cool
# create a ScalarMappable and initialize a data structure
s_m = matplotlib.cm.ScalarMappable(cmap=c_m, norm=norm)
s_m.set_array([])
# plotting 11 sines of varying amplitudes, the colors are chosen
# calling the ScalarMappable that was initialised with c_m and norm
x = np.linspace(0,np.pi,31)
for parameter in parameters:
    plt.plot(x,
             parameter*np.sin(x),
             color=s_m.to_rgba(parameter))
# having plotted the 11 curves we plot the colorbar, using again our
# ScalarMappable; since s_m is not attached to any Axes, recent Matplotlib
# versions must be told which Axes to take the space from
plt.colorbar(s_m, ax=plt.gca())
# That's all, folks
plt.show()
Acknowledgements: a similar problem, about a scatter plot.
Update — April 14, 2021
With recent versions of Matplotlib, the statement s_m.set_array([]) is not required any more. On the other hand, it does no harm.
When plotting, in place of color=s_m.to_rgba(parameter) one may want to use the (slightly) more obvious color=c_m(norm(parameter)).
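The question also asks for labels that can be changed. Here is a sketch under the assumption that curve i is drawn with colors[i] from the question's cm.rainbow(np.linspace(0, 1, len(ys))): a Normalize with vmin=0 and vmax=len(ys)-1 maps index i to the same colormap position, so the colorbar matches the hand-made colors. Colorbar.set_ticks and Colorbar.set_ticklabels then put one custom label per curve; the "curve i" strings and the plain ax.plot loop standing in for the question's function(...) are hypothetical.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.colors import Normalize

x = np.arange(11)
ys = [i + x + (i * x) ** 2 for i in range(11)]
colors = cm.rainbow(np.linspace(0, 1, len(ys)))  # the question's color code

cmap = cm.rainbow
norm = Normalize(vmin=0, vmax=len(ys) - 1)  # index i -> i / (len(ys) - 1)
s_m = cm.ScalarMappable(cmap=cmap, norm=norm)
s_m.set_array([])  # harmless on recent Matplotlib, needed on old versions

fig, ax = plt.subplots()
for i, y in enumerate(ys):
    ax.plot(x, y, color=colors[i])  # stand-in for function(..., colors[i])

cbar = fig.colorbar(s_m, ax=ax)
tick_positions = np.arange(len(ys))
cbar.set_ticks(tick_positions)
cbar.set_ticklabels([f"curve {i}" for i in tick_positions])  # hypothetical labels
```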
I am trying to plot data as a histogram or bar chart in Python. The data size (array length) is between 0 and 10000. The data itself (each entry of the array) depends on the input and ranges between 0 and 1e+20 (mostly the data is in the same range). So I want to make a histogram with matplotlib that shows how often the data falls into each interval (to illustrate the mean and deviation). Sometimes it works, like this:
[Figure: hist1]
But sometimes there is a problem with the interval size, like this:
[Figure: hist2]
In this plot I need more bars in the range 0-100 etc.
Can anyone help me with this?
The plots are just made with:
from numpy.linalg import *
import matplotlib.pyplot as plt
plt.hist(numbers,bins=100)
plt.show()
By default, hist produces a plot with an x range that covers the full range of your data.
If you have one outlier at a very high x compared with the other values, the bulk of the distribution is squeezed into a few bars at the left, giving this 'compressed' figure.
If you want to always have the same view, you can fix the limits with xlim; pass the same limits as range= to hist too, so that the bins are computed over that window rather than over the full data range.
Alternatively, if you want to see your distribution always centered and as nicely as possible, you can calculate the mean and the standard deviation of your data and set the x range accordingly (e.g. mean +/- 5 stdev).
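A sketch of choosing the window first and then binning only inside it with hist's range= argument. mean_std_range implements the mean +/- k stdev idea above. Because a single extreme outlier also inflates the standard deviation, a percentile window is shown as a more robust alternative; that is a swapped-in technique, and the helper names and the synthetic data (normal values plus one outlier) are assumptions for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt


def mean_std_range(data, k=5.0):
    """(low, high) window of mean +/- k standard deviations."""
    m, s = np.mean(data), np.std(data)
    return m - k * s, m + k * s


def percentile_range(data, lo=0.5, hi=99.5):
    """Robust alternative: window between two percentiles, ignoring a few extreme values."""
    low, high = np.percentile(data, [lo, hi])
    return low, high


rng = np.random.default_rng(1)
# Hypothetical data: mostly in one range, plus a single huge outlier.
numbers = np.concatenate([rng.normal(50.0, 10.0, 1000), [1e6]])

window = percentile_range(numbers)
fig, ax = plt.subplots()
# range= puts all 100 bins inside the window instead of spreading them over the full data span.
counts, edges, _ = ax.hist(numbers, bins=100, range=window)
ax.set_xlim(window)
```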
I am plotting things using matplotlib and Basemap (within a wxPython GUI). Currently, my plotting code looks something like this:
self.map = Basemap(llcrnrlon=lon_L, llcrnrlat=lat_D, urcrnrlon=lon_R,
urcrnrlat=lat_U, projection='lcc', lat_0=map_lat1, lon_0=map_lon1,
resolution='i', area_thresh=10000,ax=self.axes, fix_aspect=False)
m = Basemap(llcrnrlon=lon_L, llcrnrlat=lat_D, urcrnrlon=lon_R,
urcrnrlat=lat_U, projection='lcc', lat_0=map_lat1, lon_0=map_lon1,
resolution='i', area_thresh=10000,ax=self.axes)
x,y=m(some_x_data,some_y_data)
plot_handle, = self.map.plot(x,y,'bo')
plot_handle.set_xdata(x)
plot_handle.set_ydata(y)
self.figure.canvas.draw()
This plots it just fine. Now I want to take a single point (a single x and y within my data) and give it a different color. I still want to use the plot_handle because I am constantly updating the map/plot, so I don't want to just reset my data. Any help?
Thanks!
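If you use scatter instead, you can set and update the color of each point.
import numpy as np
x,y = m(some_x,some_y)
c = np.asarray(some_values, dtype=float)  # hypothetical: one number per point, mapped through the colormap
plt_handle = self.map.scatter(x,y,c=c)  # scatter returns a single collection, so no tuple unpacking
# some code
c[j] = new_value  # update your color values
plt_handle.set_array(c)  # update the colors in the graph
self.figure.canvas.draw()  # redraw the embedded canvas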
It may look a little strange to use set_array, but that is how matplotlib handles color-mapped scatter plots: the collection returned by scatter is a ScalarMappable, the same mixin that images use, so set_array sets the values that the norm and colormap turn into marker colors (colored markers instead of colored squares in a grid).
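A minimal check of the set_array approach on a plain matplotlib Axes (Basemap is left out as an assumption, since Basemap's scatter forwards to the Axes scatter). The colormap limits are fixed from the initial values when the scatter is created, so giving point j a new value inside that range recolors it on the next draw.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

x = np.arange(5.0)
y = np.arange(5.0)
c = np.arange(5.0)  # one value per point; the colormap turns values into colors

fig, ax = plt.subplots()
coll = ax.scatter(x, y, c=c, cmap="viridis")  # a single PathCollection
fig.canvas.draw()

j = 2
c[j] = 4.0          # give point j the same value as the last point
coll.set_array(c)   # store the new values in the collection
fig.canvas.draw()   # marker colors are recomputed from the array at draw time
```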
Alternatively, create a second plot handle for that specific point, with a different marker:
plot_handle1, = self.map.plot([x[i]], [y[i]], 'ro')
You'll then have to update this every time you want to change that point's position. It's not possible to use only one plot_handle and have points showing with different markers.
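A sketch of keeping the two handles in sync: one update function moves the whole series and the highlighted point together via Line2D.set_data. Plain matplotlib stands in for Basemap, and the data, the index i and the update helper are hypothetical. The red marker is drawn on top of the blue one at the same position because it was added later.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 11)
y = x ** 2
i = 4  # hypothetical index of the point to highlight

fig, ax = plt.subplots()
plot_handle, = ax.plot(x, y, 'bo')
highlight, = ax.plot([x[i]], [y[i]], 'ro')


def update(new_x, new_y):
    """Move all points, keeping the red marker on point i."""
    plot_handle.set_data(new_x, new_y)
    highlight.set_data([new_x[i]], [new_y[i]])
    fig.canvas.draw()


update(x, y + 1)
```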