I'm trying to create a histogram with seaborn, where the bins start at 0 and go to 1. However, there is only data in the range from 0.22 to 0.34. I want the empty space purely as a visual effect, to better present the data.
I create my sheet with
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
%matplotlib inline
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('svg', 'pdf')
df = pd.read_excel('test.xlsx', sheet_name='IvT')
Here I create a variable for my list and one that I think should define the range of the bins of the histogram.
st = pd.Series(df['Short total'])
a = np.arange(0, 1, 15, dtype=None)
And the histogram itself looks like this
sns.set_style("white")
plt.figure(figsize=(12,10))
plt.xlabel('Ratio short/total', fontsize=18)
plt.title ('CO3 In vitro transcription, Na+', fontsize=22)
ax = sns.distplot(st, bins=a, kde=False)
plt.savefig("hist.svg", format="svg")
plt.show()
[Image: histogram]
It creates a graph, but the range in x goes from 0 to 0.2050 and in y from -0.04 to 0.04, which is completely different from what I expect. I searched Google for quite some time but can't seem to find an answer to my specific problem.
Thanks in advance for your help.
There are a few approaches to achieve the desired result here. For example, you can change the x-axis limits after you have plotted the histogram, or adjust the range over which the bins are created.
import seaborn as sns
# Load sample data and create a column with values in the suitable range
iris = sns.load_dataset('iris')
iris['norm_sep_len'] = iris['sepal_length'] / (iris['sepal_length'].max()*2)
sns.distplot(iris['norm_sep_len'], bins=10, kde=False)
Change the x-axis limits (the bins are still created over the range of your data):
ax = sns.distplot(iris['norm_sep_len'], bins=10, kde=False)
ax.set_xlim(0,1)
Create the bins over the range 0 to 1:
sns.distplot(iris['norm_sep_len'], bins=10, kde=False, hist_kws={'range':(0,1)})
Since the range for the bins is larger, you now need to use more bins if you want to have the same bin width as when adjusting the xlim:
sns.distplot(iris['norm_sep_len'], bins=45, kde=False, hist_kws={'range':(0,1)})
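As a side note on the bin definition in the question: np.arange(0, 1, 15) treats 15 as the step, so it returns a single value instead of a set of bin edges. A minimal sketch of building evenly spaced edges on the 0 to 1 range with np.linspace instead (reading the question's 15 as the intended number of bins is an assumption):

```python
import numpy as np

# np.arange(start, stop, step): a step of 15 overshoots stop=1 immediately,
# so only the start value is returned.
bad_edges = np.arange(0, 1, 15)

# np.linspace(start, stop, num) includes both endpoints;
# n_bins bins need n_bins + 1 edges.
n_bins = 15  # assumed: the question's "15" read as the intended bin count
edges = np.linspace(0, 1, n_bins + 1)

print(bad_edges)
print(edges[:3], edges[-1])
```

The resulting array can be passed as bins=edges to sns.distplot (or plt.hist), which then creates the bins over the full 0 to 1 range.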
I am trying to plot a histogram with the proportion of the class (0/1) for each bin.
I have already plotted a barplot with stacked percentages (image below), but it doesn't look the way I want it to.
[Image: stacked percentage barplot]
I want something like the plot below (it was on this post, but it is coded in R and I want it in Python), if possible using the seaborn library:
[Image: stacked percentage histplot]
My dataset is super simple, it contains a column with the age and another one for classification (0/1):
df.head()
[Image: dataset]
With seaborn, you can use sns.histplot(..., multiple='fill').
Here is an example starting from the titanic dataset:
from matplotlib import pyplot as plt
from matplotlib.ticker import PercentFormatter
import seaborn as sns
import numpy as np
titanic = sns.load_dataset('titanic')
ax = sns.histplot(data=titanic, x='age', hue='alive', multiple='fill', bins=np.arange(0, 91, 10), palette='spring')
for bars in ax.containers:
    heights = [b.get_height() for b in bars]
    labels = [f'{h * 100:.1f}%' if h > 0.001 else '' for h in heights]
    ax.bar_label(bars, labels=labels, label_type='center')
ax.yaxis.set_major_formatter(PercentFormatter(1))
ax.set_ylabel('Percentage of age group')
plt.tight_layout()
plt.show()
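To see what multiple='fill' is plotting, the same kind of per-bin proportions can be computed by hand. A minimal sketch using pd.cut and pd.crosstab on a small made-up table (the column names age and target and all values are illustrative assumptions, not the asker's data):

```python
import numpy as np
import pandas as pd

# Made-up data: an age column and a binary class column (assumed names).
df = pd.DataFrame({
    "age":    [5, 8, 15, 17, 19, 25, 27, 33, 36, 38],
    "target": [0, 1,  0,  0,  1,  1,  1,  0,  1,  1],
})

# Assign each age to a 10-year bin: [0, 10), [10, 20), ...
age_bin = pd.cut(df["age"], bins=np.arange(0, 41, 10), right=False)

# Rows are bins, columns are classes; normalize="index" makes each row sum to 1.
proportions = pd.crosstab(age_bin, df["target"], normalize="index")
print(proportions)
```

Each row of the table corresponds to one stacked bar in the filled histogram, with the class shares adding up to 100%.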
I plot boxplots using sns.boxplot and pandas.DataFrame.boxplot in Python 3.x.
Is it possible to adjust the spacing between the boxes in a boxplot, so that the box of Group_b sits farther to the right of the box of Group_a than in the output figures below? Thanks.
Code:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
dict_a = {'value':[1,2,3,7,8,9],'name':['Group_a']*3+['Group_b']*3}
dataframe = pd.DataFrame(dict_a)
sns.boxplot( y="value" , x="name" , data=dataframe )
Output figure:
dataframe.boxplot("value" ,by = "name" )
Output figure 2:
The distance between the two boxes is determined by the x axis limits. The distance between the boxes is constant in data units, so what makes them appear more or less far apart is the fraction this distance takes up of the overall data range shown on the axis.
For example, in the seaborn case, the first box sits at x=0, the second at x=1. The difference is 1 unit. The maximal distance between the two boxplots is hence achieved by setting the x axis limits to those exact limits,
ax.set_xlim(0, 1)
Of course this will cut half of each box.
So a more useful value would be ax.set_xlim(0-val, 1+val) with val being somewhere in the range of the width of the boxes.
One needs to mention that pandas uses different units. The first box is at x=1, the second at x=2. Hence one would need something like ax.set_xlim(1-val, 2+val).
The following would add a slider to the plot to see the effect of different values.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
dict_a = {'value':[1,2,3,7,8,9],'name':['Group_a']*3+['Group_b']*3}
dataframe = pd.DataFrame(dict_a)
fig, (ax, ax2, ax3) = plt.subplots(nrows=3,
                                   gridspec_kw=dict(height_ratios=[4,4,1], hspace=1))
sns.boxplot( y="value" , x="name" , data=dataframe, width=0.1, ax=ax)
dataframe.boxplot("value", by = "name", ax=ax2)
from matplotlib.widgets import Slider
slider = Slider(ax3, "", valmin=0, valmax=3)
def update(val):
    ax.set_xlim(-val, 1+val)
    ax2.set_xlim(1-val, 2+val)
slider.on_changed(update)
plt.show()
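The role of val can also be made concrete with a little arithmetic: the boxes are always 1 data unit apart, while the axis spans 1 + 2*val data units, so the gap takes up 1 / (1 + 2*val) of the axis width. A minimal sketch of that relationship (the helper name gap_fraction is hypothetical):

```python
def gap_fraction(val):
    """Fraction of the x axis width separating two boxes that are 1 data
    unit apart when the limits are set to (0 - val, 1 + val)."""
    axis_span = (1 + val) - (0 - val)  # total width of the axis in data units
    return 1 / axis_span

for val in (0.0, 0.5, 1.0, 3.0):
    print(f"val={val}: boxes are {gap_fraction(val):.2f} of the axis width apart")
```

A smaller val pushes the boxes apart. The same holds for the pandas positions 1 and 2, since only the span of the axis matters.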
I am plotting 3 channels of my time series measurements which are more or less centered around (-80). Missing values are filled with (-50) so that they get a bright yellow color and contrast with the rest of the plot. It has no meaning numerically. See the figure and the code below:
f, ax = plt.subplots(figsize=(12.5, 12.5))
sns.heatmap(df.loc[:, ['Ch2', 'Ch3', 'Ch1']].fillna(-50)[:270], cmap='viridis', yticklabels=27, cbar=True, ax=ax)
How can I keep the color range but limit the displayed scale (i.e., the heatmap should stay the same but the color bar should range only from -70 to -90)?
(Note that the question of how to Set Max value for color bar on seaborn heatmap has already been answered and it is not what I am aiming at, I want vmin and vmax to stay just as they are).
You can set the limits of the colorbar axes similar to any other axes.
ax.collections[0].colorbar.ax.set_ylim(-90,-70)
Complete example:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
data = np.random.rand(82*3)*20-90
data[np.random.randint(1,82*3, size=20)] = np.nan
df = pd.DataFrame(data.reshape(82,3))
ax = sns.heatmap(df, vmin=-90, vmax=-50, cmap="viridis")
ax.set_facecolor("gold")
ax.collections[0].colorbar.ax.set_ylim(-90,-70)
plt.show()
I am doing a histogram plot of a bunch of data that goes from 0 to 1. When I plot I get this
As you can see, the histogram 'blocks' do not align with the y-axis.
Is there a way to set up my histogram so that the bars have a constant width of 0.1? Or should I try a different package?
My code is quite simple:
import pandas as pd
import numpy as np
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt
np.set_printoptions(precision=10,
                    threshold=10000,
                    linewidth=150,suppress=True)
E=pd.read_csv("FQCoherentSeparableBons5.csv")
E = E.iloc[:, 1:]  # drop the first column (.ix was removed in pandas 1.0)
E=np.array(E,float)
P0=E[:,0]
P0=pd.DataFrame(P0,columns=['P0'])
scatter_matrix(P0, alpha=0.2, figsize=(6, 6), diagonal='hist',color="red")
plt.suptitle('Distribucio p0')
plt.ylabel('Frequencia p0')
plt.show()
PS: If you are wondering about the data, it is just a random distribution from 0 to 1.
You can pass additional arguments to the pandas histogram using the hist_kwds argument of the scatter_matrix function. If you want ten bins of width 0.1, then your scatter_matrix call should look like
scatter_matrix(P0, alpha=0.2, figsize=(6, 6), diagonal='hist', color="red",
               hist_kwds={'bins':[i*0.1 for i in range(11)]})
Additional arguments for the pandas histogram can be found in the documentation.
Here is a simple example. I've added a grid to the plot so that you can see the bins align correctly.
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt
x = np.random.uniform(0,1,100)
scatter_matrix(pd.DataFrame(x), diagonal='hist',
               hist_kwds={'bins':[i*0.1 for i in range(11)]})
plt.xlabel('x')
plt.ylabel('frequency')
plt.grid()
plt.show()
By default, the histogram has 10 bins, but just because your data is distributed between 0 and 1 doesn't mean the bin edges will fall on multiples of 0.1. The default bins are spread evenly between the smallest and largest values in the data, so if, for example, you do not actually have a data point equal to 1, you will get a result similar to the one in your question.
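The default binning can be checked directly with np.histogram, which by default also spreads 10 bins between the data's minimum and maximum. A minimal sketch on made-up data whose extremes are 0.03 and 0.97 (illustrative values only):

```python
import numpy as np

# Made-up sample in [0, 1] that does not reach either endpoint exactly.
x = np.array([0.03, 0.2, 0.35, 0.5, 0.62, 0.8, 0.97])

# Default: 10 equal-width bins between x.min() and x.max().
_, default_edges = np.histogram(x, bins=10)

# Explicit edges: 10 bins of width 0.1 covering [0, 1].
_, fixed_edges = np.histogram(x, bins=np.linspace(0, 1, 11))

print(default_edges)  # runs from x.min() to x.max()
print(fixed_edges)    # runs from 0 to 1 in steps of 0.1
```

With the default, each bin is 0.094 wide here, which is why the bars drift away from the 0.1 tick marks.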
I am having trouble using the pyplot.hist function to plot 2 histograms on the same figure. For each binning interval, I want the 2 bars to be centered between the bins (Python 3.6 user). To illustrate, here is an example:
import numpy as np
from matplotlib import pyplot as plt
bin_width=1
A=10*np.random.random(100)
B=10*np.random.random(100)
bins=np.arange(0,np.round(max(A.max(),B.max())/bin_width)*bin_width+2*bin_width,bin_width)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.hist(A,bins,color='Orange',alpha=0.8,rwidth=0.4,align='mid',label='A')
ax.hist(B,bins,color='Orange',alpha=0.8,rwidth=0.4,align='mid',label='B')
ax.legend()
ax.set_ylabel('Count')
I get this:
[Image: Histogram_1]
The A and B series are overlapping, which is not good. Knowing there are only 3 options for 'align' (centered on left bin, middle of 2 bins, centered on right bin), I see no other option than modifying the bins, by adding:
bins-=0.25*bin_width
Before plotting A, and adding:
bins+=0.5*bin_width
Before plotting B. That gives me: Histogram
That's better! However, I had to modify the binning, so it is not the same for A and B.
I searched for a simple way to use the same bins, and then shift the 1st and 2nd plot so they are correctly displayed in the binning intervals, but I didn't find it. Any advice?
I hope I explained my problem clearly.
As mentioned in the comment above, you do not need the hist plotting function. Use NumPy's histogram function and plot its results with matplotlib's bar function.
From the number of bins and the number of data series you can calculate the bar width. The ticks can be adjusted with the xticks method:
import numpy as np
import matplotlib.pyplot as plt
A=10*np.random.random(100)
B=10*np.random.random(100)
bins=20
# calculate heights and bins for both lists
ahist, abins = np.histogram(A, bins)
bhist, bbins = np.histogram(B, abins)
fig = plt.figure()
ax = fig.add_subplot(111)
# calc bin width for two lists
w = (bbins[1] - bbins[0])/3.
# plot bars
ax.bar(abins[:-1]-w/2.,ahist,width=w,color='r')
ax.bar(bbins[:-1]+w/2.,bhist,width=w,color='orange')
# adjust xticks
plt.xticks(abins[:-1], np.arange(bins))
plt.show()
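If the bars should sit inside each binning interval rather than around its left edge, one option is to offset them from the bin centers. A minimal sketch of that variant (not the code from the answer above; the helper name grouped_bar_positions and the fill parameter are hypothetical), computing only the bar positions and width:

```python
import numpy as np

def grouped_bar_positions(edges, n_series, fill=0.8):
    """Return (positions, width): positions[k] holds the bar centers of
    series k, arranged side by side around each bin center."""
    edges = np.asarray(edges, dtype=float)
    centers = (edges[:-1] + edges[1:]) / 2
    bin_width = edges[1] - edges[0]          # assumes equal-width bins
    width = fill * bin_width / n_series      # width of one bar
    # Offsets are symmetric around 0, e.g. -w/2 and +w/2 for two series.
    offsets = (np.arange(n_series) - (n_series - 1) / 2) * width
    return centers[None, :] + offsets[:, None], width

rng = np.random.default_rng(0)
A = 10 * rng.random(100)
B = 10 * rng.random(100)
edges = np.arange(0, 11, 1)                  # shared bins for both series

a_counts, _ = np.histogram(A, edges)
b_counts, _ = np.histogram(B, edges)
positions, width = grouped_bar_positions(edges, n_series=2)
print(positions[:, 0], width)
```

The bars can then be drawn with ax.bar(positions[0], a_counts, width=width) and ax.bar(positions[1], b_counts, width=width), which keeps the same bins for A and B.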