I have a function that takes a string (the name of the dataframe we're visualizing) and returns two histograms of that data. The plot on the left shows the raw data; the one on the right shows the same data normalized (plotted with the matplotlib parameter density=True). But as you can see, this leads to transparency issues where the histograms overlap. This is my code for this particular plot:
plt.rcParams["figure.figsize"] = [12, 8]
plt.rcParams["figure.autolayout"] = True
ax0_1 = plt.subplot(121)
_,bins,_ = ax0_1.hist(filtered_0,alpha=1,color='b',bins=15,label='All apples')
ax0_1.hist(filtered_1,alpha=0.9,color='gold',bins=bins,label='Less than two apples')
ax0_1.set_title('Condition 0 vs Condition 1: '+'{}'.format(apple_data),fontsize=14)
ax0_1.set_xlabel('{}'.format(apple_data),fontsize=13)
ax0_1.set_ylabel('Frequency',fontsize=13)
ax0_1.grid(axis='y',linewidth=0.4)
ax0_1.tick_params(axis='x',labelsize=13)
ax0_1.tick_params(axis='y',labelsize=13)
ax0_1_norm = plt.subplot(122)
_,bins,_ = ax0_1_norm.hist(filtered_0,alpha=1,color='b',bins=15,label='All apples',density=True)
ax0_1_norm.hist(filtered_1,alpha=0.9,color='gold',bins=bins,label='Less than two apples',density=True)
ax0_1_norm.set_title('Condition 0 vs Condition 1: '+'{} - Normalized'.format(apple_data),fontsize=14)
ax0_1_norm.set_xlabel('{}'.format(apple_data),fontsize=13)
ax0_1_norm.set_ylabel('Frequency',fontsize=13)
ax0_1_norm.legend(bbox_to_anchor=(2, 0.95))
ax0_1_norm.grid(axis='y',linewidth=0.4)
ax0_1_norm.tick_params(axis='x',labelsize=13)
ax0_1_norm.tick_params(axis='y',labelsize=13)
plt.tight_layout(pad=0.5)
plt.show()
What my current plot looks like
Any ideas on how to make the colors blend a bit better would be helpful. Alternatively, if there are any other combinations you know of that would work instead, feel free to share. I'm not picky about the color choice. Thanks!
I think it is better to distinguish such histograms by their shape (outline versus fill) or by a difference in transparency rather than by color alone. I have coded an example from the official reference, with additional overlap.
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(20211021)
N_points = 100000
n_bins = 20
x = np.random.randn(N_points)
y = .4 * x + np.random.randn(N_points) + 2
fig, axs = plt.subplots(2, 2, sharey=True, tight_layout=True)
# We can set the number of bins with the `bins` kwarg
axs[0,0].hist(x, color='b', alpha=0.9, bins=n_bins, ec='b', fc='None')
axs[0,0].hist(y, color='gold', alpha=0.6, bins=21)
axs[0,0].set_title('edgecolor and facecolor None')
axs[0,1].hist(x, color='b', alpha=0.9, bins=n_bins)
axs[0,1].hist(y, color='gold', alpha=0.6, bins=21, ec='b')
axs[0,1].set_title('edgecolor and facecolor')
axs[1,0].hist(x, alpha=0.9, bins=n_bins, histtype='step', facecolor='b')
axs[1,0].hist(y, color='gold', alpha=0.6, bins=21)
axs[1,0].set_title('step')
axs[1,1].hist(x, color='b', alpha=0.9, bins=n_bins, histtype='bar', rwidth=0.8)
axs[1,1].hist(y, color='gold', alpha=0.6, bins=21, ec='b')
axs[1,1].set_title('bar')
plt.show()
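As a rough sketch of how the outline-versus-fill idea might carry over to the two-sample case in the question: the arrays all_apples and few_apples below are made-up stand-ins for filtered_0 and filtered_1, and computing shared bin edges with np.histogram_bin_edges is my own choice, not something from the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for filtered_0 and filtered_1 from the question
all_apples = rng.normal(0.0, 1.0, 5000)
few_apples = rng.normal(0.5, 0.8, 1500)

# One set of bin edges spanning both samples, so the bars line up
bins = np.histogram_bin_edges(np.concatenate([all_apples, few_apples]), bins=15)

fig, ax = plt.subplots(figsize=(6, 4))
# Translucent fill for one sample ...
n_few, _, _ = ax.hist(few_apples, bins=bins, density=True, histtype='stepfilled',
                      color='gold', alpha=0.6, label='Less than two apples')
# ... and an unfilled outline for the other, so overlap never hides either one
n_all, _, _ = ax.hist(all_apples, bins=bins, density=True, histtype='step',
                      color='b', linewidth=1.5, label='All apples')
ax.set_ylabel('Density')
ax.legend()
fig.tight_layout()
```

Because only one histogram is filled, the colors never have to blend; the outline stays readable wherever the two overlap.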
I'm trying to add the average value to each category in the plot. I have been trying to add these average values independently, per category, but without success. Is there a way for catplot to average the values from the data set and plot that extra value in a different color? My goal is to add the average value and differentiate it from the individual values so that it can be visually identified.
plt.rcParams["figure.figsize"] = [5.50, 5.50]
plt.rcParams["figure.autolayout"] = True
ax = sns.catplot(x="Sample Set", y="Values [%]", data=df)
ax.set_xticklabels(rotation=90)
ax.despine(right=True, top=True)
sp = 100
delta = 5
plt.axhline(y=sp, color='gray', linestyle='--', label='Target')
plt.axhline(y=sp*((100+(delta*2))/100), color='r', linestyle='--', label='10%')
plt.axhline(y=sp*((100-(delta*2))/100), color='r', linestyle='--')
plt.ylim(80, 120)
plt.title('Sample Location', fontsize = 14, y=1.05)
plt.legend(frameon=False, loc ="lower right")
plt.savefig(outputFileName, dpi=300, bbox_inches = 'tight')
plt.show()
plt.draw()
You probably run into strange error messages, as you named the return value of sns.catplot as ax. sns.catplot is a "figure-level" function and returns a FacetGrid, often assigned to a variable named g. A figure-level function can have one or more subplots, accessible via g.axes. When there is only one subplot, g.ax points to that subplot.
Also note that the catplot's figsize isn't set via the rcParams. The figure size comes from the height= parameter (height in inches of one subplot) and the aspect= parameter (ratio between width and height of a subplot), multiplied by the number of rows/columns of subplots.
Further, you seem to be mixing the "object-oriented" and the pyplot interface for matplotlib. For readability and code maintenance, it is preferred to stick to one interface.
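As a minimal sketch of what sticking to the object-oriented interface could look like, here are the reference lines from the question drawn on a plain matplotlib Axes. The catplot itself is left out, so this only illustrates the calling style; sp and delta are the values from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

sp = 100    # target value, as in the question
delta = 5   # the question draws lines at +/- 2*delta percent

fig, ax = plt.subplots(figsize=(5.5, 5.5))
# Every call goes through the Axes (or Figure) object instead of plt.*
ax.axhline(y=sp, color='gray', linestyle='--', label='Target')
ax.axhline(y=sp * (100 + 2 * delta) / 100, color='r', linestyle='--', label='10%')
ax.axhline(y=sp * (100 - 2 * delta) / 100, color='r', linestyle='--')
ax.set_ylim(80, 120)
ax.set_title('Sample Location', fontsize=14, y=1.05)
ax.legend(frameon=False, loc='lower right')
fig.tight_layout()
```

The pyplot calls plt.ylim, plt.title and plt.legend map onto ax.set_ylim, ax.set_title and ax.legend, which makes it explicit which subplot is being modified.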
To indicate the means, sns.pointplot without confidence interval might be suited. ax.axhspan might be used to visualize the range around the target.
Here is some example code starting from seaborn's iris dataset.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
iris = sns.load_dataset('iris')
g = sns.catplot(data=iris, x="species", y="sepal_length", height=5.50, aspect=1)
ax = g.ax
ax.tick_params(axis='x', rotation=0, length=0)
sns.pointplot(data=iris, x="species", y="sepal_length", estimator=np.mean,
join=False, ci=None, markers=['D'], color='black', size=20, zorder=3, ax=ax)
sns.despine(right=True, top=True)
sp = 6
delta = 10
ax.axhline(y=sp, color='gray', linestyle='--', label='Target')
ax.axhspan(ymin=sp * (100 - delta) / 100, ymax=sp * (100 + delta) / 100,
color='r', alpha=0.15, linestyle='--', label='10%')
ax.collections[-1].set_label('Mean')
ax.legend(frameon=False, loc="lower right")
# plt.savefig(outputFileName, dpi=300, bbox_inches='tight')
plt.tight_layout()
plt.show()
According to "plotting with seaborn using the matplotlib object-oriented interface", catplot is a Figure-level type of graph, so customizing it this way will be much harder than for some other types of graph.
The second group of functions (Figure-level) are distinguished by the
fact that the resulting plot can potentially include several Axes
which are always organized in a "meaningful" way. That means that the
functions need to have total control over the figure, so it isn't
possible to plot, say, an lmplot onto one that already exists. Calling
the function always initializes a figure and sets it up for the
specific plot it's drawing.
Currently, I have the first y axis (probability) of my subplots aligned. However, I am attempting to get the secondary y axis (sample size) of the subplots aligned. I've tried to simply set the y-axis limit, but this solution isn't very generalizable.
Here is my code:
attacks = 5
crit_rate = .5
idealdata = fullMatrix(attacks, crit_rate)
crit_rate = ("crit_%.0f" % (crit_rate*100))
actualdata = trueDataM(attacks, crit_rate)
fig, axs = plt.subplots(attacks+1, sharex=True, sharey=True)
axs2 = [ax.twinx() for ax in axs]
fig.text(0.5, 0.04, 'State', ha='center')
fig.text(0.04, 0.5, 'Probability', va='center', rotation='vertical')
fig.text(.95, .5, 'Sample Size', va='center', rotation='vertical')
fig.text(.45, .9, 'Ideal vs. Actual Critical Strike Rate', va='center')
cmap = plt.get_cmap('rainbow')
samplesize = datasample(attacks, 'crit_50')
fig.set_size_inches(18.5, 10.5)
for i in range(attacks+1):
axs[i].plot(idealdata[i], color=cmap(i/attacks), marker='o', lw=3)
axs[i].plot(actualdata[i], 'gray', marker='o', lw=3, ls='--')
axs2[i].bar(range(len(samplesize[i])), samplesize[i], width=.1, color=cmap(i/attacks), alpha = .6)
plt.show()
https://i.stack.imgur.com/HKJlE.png
Without data to confirm my assumptions it's hard to tell if this will be correct.
Your code makes no attempt to scale the left y-axes, so I assume that data already shares the same range. To ensure the right y-axes all have the same scale/limits, you need to determine the range (max and min) of all the data plotted on those axes and then apply it to every one of them.
It isn't clear whether samplesize is a NumPy ndarray or a list of lists; I'm also assuming that it is a 2-d structure with attacks+1 rows. Since you are making bar charts on the second y-axes, you only need to find the largest height in all the data.
# for a list of lists
biggest = max(max(row) for row in samplesize)
# or
biggest = max(map(max,samplesize))
# for an ndarray
biggest = samplesize.max()
Then apply that scale to all the right y-axes before they are shown
for ax in axs2:
ax.set_ylim(top=biggest)
If you determine biggest prior to the plot loop you can just add a line to that loop:
for i in range(attacks+1):
...
axs2[i].set_ylim(top=biggest)
You'll find plenty of related SO Q&As by searching with the terms: matplotlib subplots same y scale, matplotlib subplots y axis limits or something similar.
Here is a toy example:
from matplotlib import pyplot as plt
import numpy as np
lines = np.random.randint(0,200,(5,10))
# the inner upper bound starts at 1 so randint never gets an empty range
bars = [np.random.randint(0,np.random.randint(1,10000),10) for _ in range(5)]
fig, axs = plt.subplots(lines.shape[0], sharex=True, sharey=True)
axs2 = [ax.twinx() for ax in axs]
#xs = np.arange(lines.shape[1])
xs = np.arange(1,11)
biggest = max(map(max,bars))
for ax,ax2,line,row in zip(axs,axs2,lines,bars):
ax2.bar(xs,row)  # don't rebind `bars`, which is being iterated over
ax.plot(line)
ax2.set_ylim(top=biggest)
plt.show()
plt.close()
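A variation on the same idea, sketched under the assumption that the bars have already been drawn: instead of inspecting the data structure, read back the autoscaled upper limit of each twin axis and apply the largest one to all of them. The data in bar_rows is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
lines = rng.integers(0, 200, (5, 10))
# Made-up bar heights; each row has a very different magnitude
bar_rows = [rng.integers(0, top, 10) for top in (50, 500, 5000, 200, 9000)]

fig, axs = plt.subplots(len(lines), sharex=True, sharey=True)
axs2 = [ax.twinx() for ax in axs]
xs = np.arange(lines.shape[1])
for ax, ax2, line, row in zip(axs, axs2, lines, bar_rows):
    ax.plot(xs, line)
    ax2.bar(xs, row, alpha=0.6)

# Read back the autoscaled upper limits and apply the largest to every twin
top = max(ax2.get_ylim()[1] for ax2 in axs2)
for ax2 in axs2:
    ax2.set_ylim(0, top)
```

This works whether the heights come from a list of lists or an ndarray, since it only looks at what matplotlib has already computed.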
In pyplot, you can change the order of different graphs using the zorder option or by changing the order of the plot() commands. However, when you add an alternative axis via ax2 = twinx(), the new axis will always overlay the old axis (as described in the documentation).
Is it possible to change the order of the axis to move the alternative (twinned) y-axis to background?
In the example below, I would like to display the blue line on top of the histogram:
import numpy as np
import matplotlib.pyplot as plt
import random
# Data
x = np.arange(-3.0, 3.01, 0.1)
y = np.power(x,2)
y2 = 1/np.sqrt(2*np.pi) * np.exp(-y/2)
data = [random.gauss(0.0, 1.0) for i in range(1000)]
# Plot figure
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
ax2.hist(data, bins=40, density=True, color='g',zorder=0)  # density replaces the removed normed kwarg
ax2.plot(x, y2, color='r', linewidth=2, zorder=2)
ax1.plot(x, y, color='b', linewidth=2, zorder=5)
ax1.set_ylabel("Parabola")
ax2.set_ylabel("Normal distribution")
ax1.yaxis.label.set_color('b')
ax2.yaxis.label.set_color('r')
plt.show()
Edit: For some reason, I am unable to upload the image generated by this code. I will try again later.
You can set the zorder of an axes, ax.set_zorder(). One would then need to remove the background of that axes, such that the axes below is still visible.
ax2 = ax1.twinx()
ax1.set_zorder(10)
ax1.patch.set_visible(False)
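For completeness, here is a small self-contained sketch of that approach using data similar to the question's. The random sample is generated with NumPy rather than the random module, which is my own substitution.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
x = np.arange(-3.0, 3.01, 0.1)
data = rng.normal(0.0, 1.0, 1000)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax2.hist(data, bins=40, density=True, color='g')
ax1.plot(x, x**2, color='b', linewidth=2)

# Raise the original axes above its twin ...
ax1.set_zorder(ax2.get_zorder() + 1)
# ... and hide its background patch so the histogram underneath stays visible
ax1.patch.set_visible(False)
```

Note that zorder values passed to hist or plot only order artists within a single axes; the stacking between the two axes is decided by the axes' own zorder.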
I am trying to plot a large dataset with a scatter plot.
I want to use matplotlib to plot it with single pixel marker.
It seems to have been solved.
https://github.com/matplotlib/matplotlib/pull/695
But I cannot find a mention of how to get a single pixel marker.
My simplified dataset (data.csv)
Length,Time
78154393,139.324091
84016477,229.159305
84626159,219.727537
102021548,225.222662
106399706,221.022827
107945741,206.760239
109741689,200.153263
126270147,220.102802
207813132,181.67058
610704756,50.59529
623110004,50.533158
653383018,52.993885
659376270,53.536834
680682368,55.97628
717978082,59.043843
My code is below.
import pandas as pd
import os
import numpy
import matplotlib.pyplot as plt
inputfile='data.csv'
iplevel = pd.read_csv(inputfile)
base = os.path.splitext(inputfile)[0]
fig = plt.figure()
plt.yscale('log')
#plt.xscale('log')
plt.title(' My plot: '+base)
plt.xlabel('x')
plt.ylabel('y')
plt.scatter(iplevel['Time'], iplevel['Length'],color='black',marker=',',lw=0,s=1)
fig.tight_layout()
fig.savefig(base+'_plot.png', dpi=fig.dpi)
You can see below that the points are not single pixel.
Any help is appreciated
The problem
I fear that the bugfix discussed in the matplotlib pull request you're citing applies only to plt.plot() and not to plt.scatter().
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(4,2))
ax = fig.add_subplot(121)
ax2 = fig.add_subplot(122, sharex=ax, sharey=ax)
ax.plot([1, 2],[0.4,0.4],color='black',marker=',',lw=0, linestyle="")
ax.set_title("ax.plot")
ax2.scatter([1,2],[0.4,0.4],color='black',marker=',',lw=0, s=1)
ax2.set_title("ax.scatter")
ax.set_xlim(0,8)
ax.set_ylim(0,1)
fig.tight_layout()
print(fig.dpi)  # prints 80 in my case
fig.savefig('plot.png', dpi=fig.dpi)
The solution: Setting the markersize
The solution is to use a usual "o" or "s" marker, but set the markersize to be exactly one pixel. Since the markersize is given in points, one would need to use the figure dpi to calculate the size of one pixel in points. This is 72./fig.dpi.
For a plot, the markersize is given directly through ms:
ax.plot(..., marker="o", ms=72./fig.dpi)
For a scatter the markersize is given through the s argument, which is in square points,
ax.scatter(..., marker='o', s=(72./fig.dpi)**2)
Complete example:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(4,2))
ax = fig.add_subplot(121)
ax2 = fig.add_subplot(122, sharex=ax, sharey=ax)
ax.plot([1, 2],[0.4,0.4], marker='o',ms=72./fig.dpi, mew=0,
color='black', linestyle="", lw=0)
ax.set_title("ax.plot")
ax2.scatter([1,2],[0.4,0.4],color='black', marker='o', lw=0, s=(72./fig.dpi)**2)
ax2.set_title("ax.scatter")
ax.set_xlim(0,8)
ax.set_ylim(0,1)
fig.tight_layout()
fig.savefig('plot.png', dpi=fig.dpi)
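Applied to data like that in the question, the same recipe might look as follows. This is only a sketch: the five rows are copied from the question's data.csv, and writing to an in-memory buffer instead of a file is my own choice to keep it self-contained.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# A few (Time, Length) rows copied from the question's data.csv
time = [139.324091, 229.159305, 219.727537, 50.59529, 59.043843]
length = [78154393, 84016477, 84626159, 610704756, 717978082]

fig, ax = plt.subplots()
ax.set_yscale('log')

one_px = 72.0 / fig.dpi  # size of one pixel, expressed in points
sc = ax.scatter(time, length, color='black', marker='o', lw=0, s=one_px ** 2)
fig.tight_layout()

# Save at the same dpi used for the size calculation, otherwise the
# markers no longer correspond to exactly one pixel
buf = io.BytesIO()
fig.savefig(buf, format='png', dpi=fig.dpi)
```

The key point is that the dpi passed to savefig must match the dpi used to compute s.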
For anyone still trying to figure this out, the solution I found was to specify the s argument in plt.scatter.
The s argument refers to the area of the point you are plotting.
It doesn't seem to be quite perfect, since s=1 seems to cover about 4 pixels of my screen, but this definitely makes them smaller than anything else I've been able to find.
https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.scatter.html
s : scalar or array_like, shape (n, ), optional
size in points^2. Default is rcParams['lines.markersize'] ** 2.
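Since s is given in points squared, how many pixels a marker covers depends on the figure's dpi. The helper functions below are hypothetical names of my own, and they treat s as the area of a square marker, which is a simplification.

```python
import math

def marker_side_in_pixels(s, dpi):
    """Approximate side length in pixels of a marker with area s (points^2)."""
    return math.sqrt(s) * dpi / 72.0

def s_for_one_pixel(dpi):
    """Value of s that gives a marker side of one pixel at the given dpi."""
    return (72.0 / dpi) ** 2

# At 72 dpi one point equals one pixel, so s=1 is a single pixel
side_72 = marker_side_in_pixels(1, 72)
# At 144 dpi the same marker is 2 pixels per side, i.e. about 4 pixels in area
side_144 = marker_side_in_pixels(1, 144)
```

If the figure above was rendered at around 144 dpi (an assumption; the post does not state its dpi), that would be consistent with s=1 covering about 4 pixels.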
Set the plt.scatter() parameter to linewidths=0 and figure out the right value for the parameter s.
Source: https://stackoverflow.com/a/45803960/4063622
I am trying to set the x and y limits on a subplot but am having difficulty. I suspect that the difficulty stems from my fundamental lack of understanding of how figures and subplots work. I have read these two questions:
question 1
question 2
I tried to use that approach, but neither had any effect on the x and y limits. Here's my code:
fig = plt.figure(figsize=(9,6))
ax = plt.subplot(111)
ax.hist(sub_dict['b'], bins=30, color='r', alpha=0.3)
ax.set_ylim=([0,200])
ax.set_xlim=([0,100])
plt.xlabel('x')
plt.ylabel('y')
plt.title('title')
plt.show()
I am confused as to whether to apply commands to fig or ax. For instance, .xlabel and .title don't seem to be available for ax. Thanks
Why don't you do:
ax = fig.add_subplot(111)
import matplotlib.pyplot as plt
import numpy as np
mu, sigma = 100, 15
x = mu + sigma*np.random.randn(100)
fig = plt.figure(figsize=(9,6))
ax = fig.add_subplot(111)
ax.hist(x, bins=30, color='r', alpha=0.3)
ax.set_ylim(0, 200)  # call the method; assigning to it would only overwrite it
ax.set_xlim(0, 100)
plt.xlabel('x')
plt.ylabel('y')
plt.title('title')
plt.show()
I've run your code on some sample data, and I'm attaching the screenshot. I'm not sure this is the desired result, but this is what I got.
For a multiplot, where you have subplots in a single figure, you can have several xlabel and one title
fig.suptitle("foobar")
ax.set_xlabel("x")
This is explained in great detail here on the Matplotlib website.
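A short sketch of the multiplot case, with a figure-wide title set through the Figure's suptitle method and a separate x-label per subplot. The two-panel layout and the random data are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(9, 4))
title = fig.suptitle("foobar")          # one title for the whole figure

ax_left.hist(rng.normal(50, 15, 200), bins=30, color='r', alpha=0.3)
ax_left.set_xlabel("x (left)")          # each subplot gets its own label

ax_right.hist(rng.normal(60, 10, 200), bins=30, color='b', alpha=0.3)
ax_right.set_xlabel("x (right)")
```

Each Axes can also carry its own title via ax.set_title, in addition to the figure-wide one.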
In your case, you use a subplot for just a single plot. This is possible, but doesn't make a lot of sense. Plots like the one below are supposed to be created with the subplot feature:
To answer your question: you can set the x- and y-limits on a per-subplot and per-axis basis by addressing the respective subplot directly (ax for subplot 1) and then calling its set_xlim and set_ylim member functions; set_xlabel sets the label on the x-axis in the same way.
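A minimal sketch of the Axes-only version follows, with generated data standing in for sub_dict['b']. Note that writing ax.set_ylim = (0, 200) with an equals sign does not call the method; it merely rebinds the attribute, which is why such a line has no effect on the plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)
values = 50 + 15 * rng.standard_normal(500)  # stand-in for sub_dict['b']

fig, ax = plt.subplots(figsize=(9, 6))
ax.hist(values, bins=30, color='r', alpha=0.3)

# Limits are set by calling the methods, not by assigning to them
ax.set_xlim(0, 100)
ax.set_ylim(0, 200)

# The Axes equivalents of plt.xlabel, plt.ylabel and plt.title
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('title')
```

As a rule of thumb, most pyplot functions plt.something have an Axes counterpart named ax.set_something.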
EDIT
For your updated question:
Use this code as inspiration, I had to generate some data on my own so no guarantees:
import matplotlib.pyplot as plt
plt.hist(sub_dict['b'], bins=30, color='r', alpha=0.3)
plt.ylim(0,200)
plt.xlim(0,100)
plt.xlabel('x')
plt.ylabel('y')
plt.title('title')
plt.show()
After a bit more googling, I got the following, which worked:
sub_dict = subset(data_dict, 'b', 'a', greater_than, 10)
fig = plt.figure(figsize=(9,6))
ax = fig.add_subplot(111)
ax.hist(sub_dict['b'], bins=30, color='r', alpha=0.3)
plt.ylim(0,250)
plt.xlim(0,100)
plt.xlabel('x')
plt.ylabel('y')
plt.title('title')
plt.show()