Plot 2D histogram data with pcolormesh - python

I need to plot a binned statistic, as one would get from scipy.stats.binned_statistic_2d. Basically, that means I have edge values and within-bin data. This also means I cannot (to my knowledge) use plt.hist2d. Here's a code snippet to generate the sort of data I might need to plot:
import numpy as np
x_edges = np.arange(6)
y_edges = np.arange(6)
bin_values = np.random.randn(5, 5)
One would imagine that I could use pcolormesh for this, but the issue is that pcolormesh does not allow for bin edge values. The following will only plot the values in bins 1 through 4. The 5th value is excluded, since while pcolormesh "knows" that the value at 4.0 is some value, there is no later value to plot, so the width of the 5th bin is zero.
import matplotlib.pyplot as plt
X, Y = np.broadcast_arrays(x_edges[:5, None], y_edges[None, :5])
plt.figure()
plt.pcolormesh(X, Y, bin_values)
plt.show()
I can get around this with an ugly hack by adding an additional set of values equal to the last values:
import matplotlib.pyplot as plt
X, Y = np.broadcast_arrays(x_edges[:, None], y_edges[None, :])
dummy_bin_values = np.zeros([6, 6])
dummy_bin_values[:5, :5] = bin_values
dummy_bin_values[5, :] = dummy_bin_values[4, :]
dummy_bin_values[:, 5] = dummy_bin_values[:, 4]
plt.figure()
plt.pcolormesh(X, Y, dummy_bin_values)
plt.show()
However, this is an ugly hack. Is there any cleaner way to plot 2D histogram data with bin edge values? "No" is possibly the correct answer, but convince me that's the case if it is.

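I do not see a problem with either of the two options. pcolormesh accepts coordinate arrays that are one element longer than the data along each axis, so the bin edges can be passed directly. Here is simply some code that uses both: numpy-histogrammed data with pcolormesh, and plt.hist2d on its own.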
import numpy as np
import matplotlib.pyplot as plt
x_edges = np.arange(6)
y_edges = np.arange(6)
data = np.random.rand(340,2)*5
### using numpy.histogram2d
bin_values,_,__ = np.histogram2d(data[:,0],data[:,1],bins=(x_edges, y_edges) )
X, Y = np.meshgrid(x_edges,y_edges)
fig, (ax,ax2) = plt.subplots(ncols=2)
ax.set_title("numpy.histogram2d \n + plt.pcolormesh")
ax.pcolormesh(X, Y, bin_values.T)
### using plt.hist2d
ax2.set_title("plt.hist2d")
ax2.hist2d(data[:,0],data[:,1],bins=(x_edges, y_edges))
plt.show()
Of course this would equally work with scipy.stats.binned_statistic_2d.
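To make the edge handling explicit, here is a minimal sketch that passes pre-binned values and their edges straight to pcolormesh, without any padding. The non-square bin counts and the seeded random values are assumptions made for illustration, chosen so that the transpose is visible.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed example data: 5 bins along x, 3 bins along y.
x_edges = np.arange(6)   # 6 edges -> 5 bins
y_edges = np.arange(4)   # 4 edges -> 3 bins
rng = np.random.default_rng(0)
# Same layout that np.histogram2d and scipy.stats.binned_statistic_2d
# return: first axis indexes x bins, second axis indexes y bins.
bin_values = rng.normal(size=(len(x_edges) - 1, len(y_edges) - 1))

fig, ax = plt.subplots()
# pcolormesh expects the color array with shape (ny, nx), hence the transpose.
mesh = ax.pcolormesh(x_edges, y_edges, bin_values.T)
fig.colorbar(mesh, ax=ax, label="bin value")
ax.set_xlabel("x")
ax.set_ylabel("y")
# plt.show()  # uncomment when running interactively
```

Every bin, including the last one along each axis, gets its full width because the edges rather than the bin starts define the mesh.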

Related

Coloring a scatter plot from pandas itertuples?

I have a dataframe that has coordinates in one column, and an orientation in the other:
I'm trying to scatter plot the coordinates and then colour them by their orientation:
for row in df.itertuples():
    x, y = row.coords[:,1], row.coords[:,0]
    plt.scatter(x,y, c=df.orientation)
This plots the coordinates fine but not the orientation as it's still within the itertuple loop. Does anyone know how to get around this problem?
As each row has only one color, you need to explicitly set that color for the row. In order to get a color from a certain numeric orientation value, you need to create a colormap and a norm. The colormap can be any of your choice. The norm needs to be set using the complete range of the 'orientation' column.
Using the norm (to get a value between 0 and 1) you can index the colormap and obtain an rgb-value.
As plt.scatter tries to verify whether you are giving one single color for all points together or one color per point, rgb values can cause confusion. Therefore, it is safest to create an array around the color value (so c=[cmap(norm(row.orientation))] instead of just c=cmap(norm(row.orientation))).
The colormap and norm can also be used to create an accompanying colorbar.
Here is some example code to get you started:
from matplotlib import pyplot as plt
from matplotlib.cm import ScalarMappable
import numpy as np
import pandas as pd
N = 30
df = pd.DataFrame({'coords': [np.random.normal(0, 1, size=(np.random.randint(5, 50), 2)) + np.random.uniform(0, 50, 2)
                              for _ in range(N)],
                   'orientation': np.random.uniform(-1, 1, N)})
cmap = plt.get_cmap('magma')
norm = plt.Normalize(df.orientation.min(), df.orientation.max())
for row in df.itertuples():
    coords = np.array(row.coords)
    x, y = coords[:, 1], coords[:, 0]
    # wrap the RGBA tuple in a list so scatter treats it as one color for all points
    plt.scatter(x, y, c=[cmap(norm(row.orientation))])
# a standalone ScalarMappable is not attached to any Axes, so name the Axes explicitly
plt.colorbar(ScalarMappable(cmap=cmap, norm=norm), ax=plt.gca(), label='orientation')
plt.show()
You iterate by row, so you should use the same syntax as you did for x and y : c=row.orientation
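If one scatter call per row is not a requirement, the same coloring can be sketched in a single call: stack all coordinates and repeat each row's orientation once per point, then let scatter apply the colormap and norm itself. The names coords_list, orientation and per_point are made up for this sketch, and the random data is an assumed stand-in for the two DataFrame columns.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n_rows = 10
# Hypothetical stand-ins for the two columns, e.g.
# df["coords"].to_list() and df["orientation"].to_numpy().
coords_list = [rng.normal(0, 1, size=(rng.integers(5, 50), 2)) + rng.uniform(0, 50, 2)
               for _ in range(n_rows)]
orientation = rng.uniform(-1, 1, n_rows)

# One long (n_points, 2) array and one orientation value per point.
points = np.concatenate(coords_list)
per_point = np.repeat(orientation, [len(c) for c in coords_list])

fig, ax = plt.subplots()
# Numeric c plus cmap: scatter normalizes over the full orientation range.
sc = ax.scatter(points[:, 1], points[:, 0], c=per_point, cmap="magma")
fig.colorbar(sc, ax=ax, label="orientation")
# plt.show()  # uncomment when running interactively
```

Because the scatter artist carries its own colormap and norm, it can be handed to colorbar directly.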

Check if seaborn scatterplot function is sampling data

I have plotted a seaborn scatter plot. My data consists of 5000 data points. Looking at the plot, I definitely do not see 5000 points, so I'm pretty sure the seaborn scatterplot function performs some kind of sampling. How many data points does each point in the plot represent? In case it depends on the code, here it is:
g = sns.scatterplot(x=data['x'], y=data['y'],hue=data['P'], s=40, edgecolor='k', alpha=0.8, legend="full")
Nothing would really suggest to me that seaborn is sampling your data. However, you can check the data in your axes g to be sure. Query the children of the axes for a PathCollection (scatter plot) object:
g.get_children()
It's probably the first item in the list that is returned. From there you can use get_offsets to retrieve the data and check its shape.
g.get_children()[0].get_offsets().shape
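The check described above can be sketched without relying on the position of the scatter artist in the children list, by filtering the Axes collections by type. This sketch uses a plain matplotlib ax.scatter call as a stand-in for sns.scatterplot (which returns a matplotlib Axes), and the random data is an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PathCollection

rng = np.random.default_rng(2)
x = rng.normal(size=5000)   # assumed data with heavy overlap
y = rng.normal(size=5000)

fig, ax = plt.subplots()
# Stand-in for: ax = sns.scatterplot(x=..., y=..., ...)
ax.scatter(x, y, s=40, edgecolor="k", alpha=0.8)

# Scatter plots are stored on the Axes as PathCollection objects.
scatter_artists = [c for c in ax.collections if isinstance(c, PathCollection)]
# get_offsets() returns one (x, y) row per plotted point.
n_drawn = sum(len(c.get_offsets()) for c in scatter_artists)
print(n_drawn, len(x))
```

If the two printed numbers match, every input point is present in the figure and any missing-looking points are hidden by overlap.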
As far as I know, no sampling is performed. In the picture you posted, most of the data points simply overlap, which is probably why you cannot see 5000 points. Try with fewer points and you will see that all of them get plotted.
In order to check whether or not Seaborn's scatter removes points, here is a way to see 5000 different points. No points seem to be missing.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
x = np.linspace(1, 100, 100)
y = np.linspace(1, 50, 50)
X, Y = np.meshgrid(x, y)
Z = (X * Y) % 25
X = np.ravel(X)
Y = np.ravel(Y)
Z = np.ravel(Z)
sns.scatterplot(x=X, y=Y, s=15, hue=Z, palette=plt.cm.plasma, legend=False)
plt.show()

How to remove an histogram in Matplotlib

I am used to working with plots that change over time in order to show differences when a parameter is changed. Here is a simple example:
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111)
ax.grid(True)
x = np.arange(-3, 3, 0.01)
for j in range(1, 15):
    y = np.sin(np.pi*x*j) / (np.pi*x*j)
    line, = ax.plot(x, y)
    plt.draw()
    plt.pause(0.5)
    line.remove()
You can clearly see that as the parameter j increases, the plot becomes narrower and narrower.
Now if I want to do the same job with a contour plot, I just have to remove the comma after "line". From my understanding, this small modification comes from the fact that the contour plot is not an element of a tuple anymore, but just an attribute, as the contour plot completely "fills up" all the available space.
But it looks like there is no way to remove (and plot again) a histogram. In fact, if I type
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111)
ax.grid(True)
x = np.random.randn(100)
for j in range(15):
    hist, = ax.hist(x, 40)*j
    plt.draw()
    plt.pause(0.5)
    hist.remove()
It doesn't matter whether I type that comma or not, I just get an error message.
Could you help me with this, please?
ax.hist doesn't return what you think it does.
The returns section of the docstring of hist (access via ax.hist? in an ipython shell) states:
Returns
-------
n : array or list of arrays
    The values of the histogram bins. See **normed** and **weights**
    for a description of the possible semantics. If input **x** is an
    array, then this is an array of length **nbins**. If input is a
    sequence arrays ``[data1, data2,..]``, then this is a list of
    arrays with the values of the histograms for each of the arrays
    in the same order.
bins : array
    The edges of the bins. Length nbins + 1 (nbins left edges and right
    edge of last bin). Always a single array even when multiple data
    sets are passed in.
patches : list or list of lists
    Silent list of individual patches used to create the histogram
    or list of such list if multiple input datasets.
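So you need to unpack all three return values, then remove each bar. Also drop the "*j": it repeats the returned tuple instead of scaling the histogram, so the unpacking fails for every j other than 1.
counts, bins, bars = ax.hist(x, 40)
_ = [b.remove() for b in bars]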
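A quick way to see the three return values described in the docstring is to inspect their lengths. This is a small sketch with assumed random data; the variable names are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = rng.normal(size=100)   # assumed sample data

fig, ax = plt.subplots()
# hist returns (bin counts, bin edges, drawn patches), not a single artist.
counts, edges, bars = ax.hist(x, 40)

# One count and one patch per bin, and one more edge than bins.
print(len(counts), len(edges), len(bars))
# The counts add up to the number of samples.
print(int(counts.sum()))
```

Only the third value holds artists, which is why calling remove() on the whole return value fails.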
Here is one way to iteratively draw and delete histograms in matplotlib:
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(figsize = (20, 10))
ax = fig.add_subplot(111)
ax.grid(True)
for j in range(1, 15):
    x = np.random.randn(100)
    count, bins, bars = ax.hist(x, 40)
    plt.draw()
    plt.pause(1.5)
    t = [b.remove() for b in bars]
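The draw-and-remove pattern can also be wrapped in a small function so that the previous bars are always cleared before new ones are drawn. The helper name redraw_hist is hypothetical, and the widening random data is an assumption made for this sketch.

```python
import numpy as np
import matplotlib.pyplot as plt

def redraw_hist(ax, data, bins, old_bars=None):
    """Hypothetical helper: remove the previous bars, then draw a new histogram."""
    if old_bars is not None:
        for bar in old_bars:
            bar.remove()
    _, _, new_bars = ax.hist(data, bins)
    return new_bars

rng = np.random.default_rng(4)
fig, ax = plt.subplots()
bars = None
for j in range(1, 6):
    # Assumed data: the spread grows with j so each frame looks different.
    bars = redraw_hist(ax, rng.normal(scale=j, size=100), 40, bars)
    # plt.pause(0.5)  # uncomment for an interactive animation
```

After the loop only the bars of the last histogram remain on the Axes.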

Python - Line colour of 3D parametric curve

I have two lists, tab_x (containing the x values) and tab_z (containing the z values), which have the same length, and a single value of y.
I want to plot a 3D curve that is colored by the value of z. I know it can be plotted as a 2D plot, but I want to plot several of these curves with different values of y to compare them, so I need it to be 3D.
My tab_z also contains negative values.
I found code to color the curve by time (index) in this question, but I don't know how to adapt it to my case.
Thanks for the help.
I add my code to be more specific:
fig8 = plt.figure()
ax8 = fig8.gca(projection = '3d')
tab_y=[]
for i in range (0,len(tab_x)):
    tab_y.append(y)
ax8.plot(tab_x, tab_y, tab_z)
I have this for now
I've tried this code
for i in range (0,len(tab_t)):
    ax8.plot(tab_x[i:i+2], tab_y[i:i+2], tab_z[i:i+2],color=plt.cm.rainbow(255*tab_z[i]/max(tab_z)))
A total failure:
Your second attempt almost has it. The only change is that the input to the colormap cm.jet() needs to be on the range of 0 to 1. You can scale your z values to fit this range with Normalize.
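import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # only needed to register the 3D projection on older matplotlib
from matplotlib import colors
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')  # fig.gca(projection='3d') no longer works in recent matplotlib
N = 100
y = np.ones(N)
x = np.arange(1,N + 1)
z = 5*np.sin(x/5.)
cn = colors.Normalize(min(z), max(z)) # creates a Normalize object for these z values
for i in range(N-1):  # range replaces Python 2's xrange
    # draw each two-point segment in the color of its starting z value
    ax.plot(x[i:i+2], y[i:i+2], z[i:i+2], color=plt.cm.jet(cn(z[i])))
plt.show()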
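As an alternative to one plot call per segment, the segments can be collected into a single Line3DCollection per curve, which also makes it easy to draw several curves at different y values and share one colorbar. This is a sketch under assumptions: the sample data, the three y values and the axis limits are made up, and the limits are set by hand because add_collection3d does not necessarily rescale the axes.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors
from mpl_toolkits.mplot3d.art3d import Line3DCollection

N = 100
x = np.arange(1, N + 1)
z = 5 * np.sin(x / 5.0)   # assumed sample data, includes negative values
# Map the full z range, negatives included, onto 0..1 for the colormap.
norm = colors.Normalize(z.min(), z.max())

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
for y_value in (1.0, 2.0, 3.0):   # hypothetical y values to compare
    pts = np.column_stack([x, np.full(N, y_value), z])
    # Pair consecutive points: shape (N - 1, 2, 3), one row per segment.
    segments = np.stack([pts[:-1], pts[1:]], axis=1)
    lc = Line3DCollection(segments, cmap="jet", norm=norm)
    lc.set_array(z[:-1])   # color each segment by its starting z value
    ax.add_collection3d(lc)

ax.set_xlim(x.min(), x.max())
ax.set_ylim(0.5, 3.5)
ax.set_zlim(z.min(), z.max())
fig.colorbar(lc, ax=ax, label="z")
# plt.show()  # uncomment when running interactively
```

The shared Normalize object is what keeps the colors comparable between curves; scaling by 255*z/max(z), as in the question, pushes most values outside the 0 to 1 range that a colormap expects for float input.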

x axis with duplicate values (loading profile) plot in matplotlib

I have load profile data where the x axis is the load profile, such that for multiple identical values of x (constant load) I have different values of y.
Until now, in Excel, I would make a line plot of y, then right-click the graph -> Select Data -> change the horizontal axis data by giving it the range of the x-axis data, and that gave me the graph I wanted.
The problem is that when I call
plot(x,y), matplotlib plots y only for the unique values of x, i.e. it neglects all the remaining values of y for the same value of x.
And when I plot with plot(y), I get sequence numbers on the x axis.
I tried xticks([0,5,10,15]) to check, but couldn't get the required result.
My question is:
Is it possible to plot a graph in a similar fashion to Excel?
The other alternative I could think of was plotting plot(y) and plot(x) on the same horizontal axis, which at least gives a pictorial idea, but is there any way to do it the Excel way?
From your description, it sounds to me like you want to use the "scatter" plotting command instead of the "plot" plotting command. This will allow the use of redundant x-values. Sample code:
import numpy as np
import matplotlib.pyplot as plt
# Generate some data that has non-unique x-values
x1 = np.linspace(1,50)
y1 = x1**2
y2 = 2*x1
x3 = np.append(x1,x1)
y3 = np.append(y1,y2)
# Now plot it using the scatter command
# Note that some of the abbreviations that work with plot,
# such as 'ro' for red circles don't work with scatter
plt.scatter(x3,y3,color='red',marker='o')
As I mentioned in the comments, some of the handy "plot" shortcuts don't work with "scatter" so you may want to check the documentation: http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.scatter
If you want to plot the y-values for a given x-value, you need the indices that share that x-value. If you are working with numpy, you can try
import matplotlib.pyplot as plt
import numpy as np
x=np.array([1]*5+[2]*5+[3]*5)
y=np.array([1,2,3,4,5]*3)
idx=(x==1) # Get the index where x-values are 1
plt.plot(y[idx],'o-')
plt.show()
If you are working with lists you can get the index by
# Get the index where x-values are 1
idx=[i for i, j in enumerate(x) if j == 1]
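The same boolean-mask idea can be extended to every distinct x value at once, drawing one line per constant-load group. This is a sketch reusing the small arrays from the answer above; the dictionary name groups is illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1] * 5 + [2] * 5 + [3] * 5)
y = np.array([1, 2, 3, 4, 5] * 3)

# Collect the y values belonging to each distinct x value.
groups = {int(v): y[x == v] for v in np.unique(x)}

fig, ax = plt.subplots()
for value, ys in groups.items():
    # Each group is plotted against its own within-group sample index.
    ax.plot(ys, "o-", label=f"x = {value}")
ax.set_xlabel("sample index within group")
ax.legend()
# plt.show()  # uncomment when running interactively
```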
Just answering my own question; I found this some time after I posted the question years back :)
import matplotlib.pyplot as plt

def plotter(y1,y2,y1name,y2name):
    averageY1=float(sum(y1)/len(y1))
    averageY2=float(sum(y2)/len(y2))
    fig = plt.figure()
    ax1 = fig.add_subplot(111)
    # first series on the left y axis, plotted against the sample number
    ax1.plot(y1,'b-',linewidth=2.0)
    ax1.set_xlabel("SNo")
    # Make the y1-axis label and tick labels match the line color.
    ax1.set_ylabel(y1name, color='b')
    for tl in ax1.get_yticklabels():
        tl.set_color('b')
    ax1.axis([0,len(y2),0,max(y1)+50])
    # second series on a twin y axis that shares the same x axis
    ax2 = ax1.twinx()
    ax2.plot(y2, 'r-')
    ax2.axis([0,len(y2),0,max(y2)+50])
    ax2.set_ylabel(y2name, color='r')
    for tl in ax2.get_yticklabels():
        tl.set_color('r')
    plt.title(y1name + " vs " + y2name)
    #plt.fill_between(y2,1,y1)
    plt.grid(True,linestyle='-',color='0.75')
    plt.savefig(y1name+"VS"+y2name+".png",dpi=200)
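You can plot against the sample index and then label the ticks with the x values:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1, 1, 1, 2, 2, 2])
y = np.array([1, 2, 1, 5, 6, 7])
fig, ax = plt.subplots()
ax.plot(np.arange(len(x)), y)
# place one tick per sample first, otherwise the labels land on the default tick positions
ax.set_xticks(np.arange(len(x)))
ax.set_xticklabels(x)
plt.show()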
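With a longer load profile, one label per sample gets crowded. A possible variation, sketched here with made-up data, labels only the first sample of each run of constant x; the names positions and starts are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 1, 1, 2, 2, 2, 5, 5, 3, 3])   # assumed load values
y = np.array([1, 2, 1, 5, 6, 7, 9, 8, 4, 3])
positions = np.arange(len(x))

# Index of the first sample of every run of constant x.
starts = np.flatnonzero(np.r_[True, x[1:] != x[:-1]])

fig, ax = plt.subplots()
ax.plot(positions, y, "o-")
# Fix the tick positions before assigning labels to them.
ax.set_xticks(starts)
ax.set_xticklabels([str(v) for v in x[starts]])
ax.set_xlabel("load (labelled where it changes)")
# plt.show()  # uncomment when running interactively
```

The data are still plotted in sequence order, as in Excel; only the tick labels are thinned out.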
