Seaborn Box Plot X-Axis Too Crowded - python

Good Day,
See the attached image for reference. The x-axis on the Seaborn bar chart I created has overlapping text and is too crowded. How do I fix this?
The data source is on Kaggle and I was following along with this article: https://towardsdatascience.com/a-quick-guide-on-descriptive-statistics-using-pandas-and-seaborn-2aadc7395f32
Here is the code I used:
sns.set(style = 'darkgrid')
plt.figure(figsize = (20, 10))
ax = sns.countplot(x = 'Regionname', data = df)
[Image: Seaborn x-axis too crowded]
I'd appreciate any help.
Thanks!

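plt.figure(figsize=...) only affects the plot if seaborn ends up drawing on that same figure (it has no effect, for example, when the two calls run in separate notebook cells). Creating the figure and axes together and passing the axes to seaborn removes the ambiguity. Try: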
fig, ax = plt.subplots(figsize=(20, 10)) # generate a figure and return figure and axis handle
sns.countplot(x='Regionname', data=df, ax=ax) # passing the `ax` to seaborn so it knows about it
In addition, you can rotate the tick labels so they no longer overlap:
ax.set_xticklabels(ax.get_xticklabels(), rotation=60)
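As a rough, self-contained sketch of the same idea: the region names below are made up, and plain matplotlib bars stand in for sns.countplot so that the snippet runs without the Kaggle data. The Axes returned by sns.countplot accepts the same calls.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Kaggle data: one column of long category names.
df = pd.DataFrame({
    "Regionname": ["Northern Metropolitan"] * 5
    + ["Southern Metropolitan"] * 4
    + ["Eastern Victoria"] * 2
    + ["South-Eastern Metropolitan"] * 3
})
counts = df["Regionname"].value_counts()

# Create figure and axes together so the figure size certainly applies to this plot.
fig, ax = plt.subplots(figsize=(20, 10))
ax.bar(list(counts.index), counts.values)

# Rotate the labels and right-align them so each label ends under its own bar.
plt.setp(ax.get_xticklabels(), rotation=60, ha="right")
fig.tight_layout()
```

Right-aligning the rotated labels is optional, but it tends to make it clearer which bar a label belongs to.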

Related

How to show multiple already plotted matplotlib figures side-by-side or on-top in Python without re-plotting them?

I have already plotted two figures separately in a single jupyter notebook file, and exported them.
What I want is to show them side by side, but not plot them again by using matplotlib.pyplot.subplots.
For example, in Mathematica, it's easier to do this by just saving the figures into a Variable, and displaying them afterwards.
What I tried was saving the figures, using
fig1, ax1 = plt.subplots(1,1)
... #plotting using ax1.plot()
fig2, ax2 = plt.subplots(1,1)
... #plotting using ax2.plot()
Now, fig1 and fig2 are of type matplotlib.figure.Figure, which stores the figure as an 'image-type' instance. I can view each of them separately by evaluating just fig1 or fig2 in my notebook.
But I cannot show them together by doing something like
plt.show(fig1, fig2)
It shows nothing, since no figure is currently being plotted.
You may look at this link or this, which is a Mathematica version of what I was talking about.
Assuming you want to merge those subplots in the end, here is the code:
import numpy as np
import matplotlib.pyplot as plt
# example function to plot
x = np.linspace(0, 10)
y = np.exp(x)
# almost your code
figure, axes = plt.subplots(1, 1)
res_1, = axes.plot(x, y)  # keep the Line2D object that plot() returns
plt.show()
plt.close(figure)
figure, axes = plt.subplots(1, 1)
res_2, = axes.plot(x, -y)  # same as before
plt.show()
# restructure to merge
figure_2, (axe_1, axe_2) = plt.subplots(1, 2)  # one row, two columns
axe_1.plot(*res_1.get_data())  # reuse the data stored in the first line
axe_2.plot(*res_2.get_data())  # reuse the data stored in the second line
# show both in one figure
plt.show()
I am not quite sure what you mean by:
"but not plot them again by using matplotlib.pyplot.subplots."
But you can display two plots next to each other in a Jupyter notebook by using:
fig, ax = plt.subplots(nrows=1, ncols=2)
# draw the first plot on ax[0], e.g. ax[0].plot(...)
# draw the second plot on ax[1], e.g. ax[1].plot(...)
plt.show()
Or one above the other:
fig, ax = plt.subplots(nrows=2, ncols=1)
# draw the top plot on ax[0]
# draw the bottom plot on ax[1]
plt.show()
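If the goal really is to reuse the finished figures without repeating any plotting calls, one possible workaround (not taken from the answers above) is to render each figure to a pixel array and show those arrays side by side. This is only a sketch: it assumes an Agg-based canvas, which provides buffer_rgba(), and the combined result is a raster image rather than live, editable axes. The helper name figure_to_image is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: an Agg-based canvas, which offers buffer_rgba()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 50)

# Two figures that were plotted separately, as in the question.
fig1, ax1 = plt.subplots()
ax1.plot(x, np.exp(x))
fig2, ax2 = plt.subplots()
ax2.plot(x, -np.exp(x))


def figure_to_image(fig):
    """Render a finished figure and return its pixels as an RGBA array."""
    fig.canvas.draw()
    return np.asarray(fig.canvas.buffer_rgba()).copy()


images = [figure_to_image(fig1), figure_to_image(fig2)]

# Show the rendered images next to each other; no plotting call is repeated.
combined, axes = plt.subplots(1, 2, figsize=(12, 4))
for ax, image in zip(axes, images):
    ax.imshow(image)
    ax.axis("off")  # hide the outer axes; the images already contain their own
combined.tight_layout()
```

The trade-off is that the combined figure can no longer be restyled per line or per axis, so re-plotting from the stored data, as in the first answer, remains the more flexible route.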

set custom tick labels on heatmap color bar

I have a list of dataframes named merged_dfs that I am looping through to get the correlation and plot subplots of heatmap correlation matrix using seaborn.
I want to customize the colorbar tick labels, but I am having trouble figuring out how to do it with my example.
Currently, my colorbar scale values from top to bottom are
[1,0.5,0,-0.5,-1]
I want to keep these values, but change the tick labels to be
[1,0.5,0,0.5,1]
for my diverging color bar.
Here is the code and my attempt:
fig, ax = plt.subplots(nrows=6, ncols=2, figsize=(20,20))
for i, (title, merging) in enumerate(zip(new_name_data, merged_dfs)):
    graph = merging.corr()
    colormap = sns.diverging_palette(250, 250, as_cmap=True)
    a = sns.heatmap(graph.abs(), cmap=colormap, vmin=-1, vmax=1, center=0, annot=graph, ax=ax.flat[i])
    cbar = fig.colorbar(a)
    cbar.set_ticklabels(["1", "0.5", "0", "0.5", "1"])
fig.delaxes(ax[5,1])
plt.show()
plt.close()
I keep getting this error:
AttributeError: 'AxesSubplot' object has no attribute 'get_array'
Several things are going wrong:
fig.colorbar(...) would create a new colorbar, by default appended to the last subplot that was created.
sns.heatmap returns an ax (indicates a subplot). This is very different to matplotlib functions, e.g. plt.imshow(), which would return the graphical element that was plotted.
You can suppress the heatmap's colorbar (cbar=False), and then create it newly with the parameters you want.
fig.colorbar(...) needs a parameter ax=... when the figure contains more than one subplot.
Instead of creating a new colorbar, you can pass the colorbar parameters to sns.heatmap via cbar_kws=.... The colorbar itself can be found via ax.collections[0].colorbar. (ax.collections[0] is where matplotlib stores the graphical object that contains the heatmap.)
Looping over an index just to look up the matching subplot is discouraged in Python. It's usually more readable, easier to maintain and less error-prone to include everything, the axes as well, in the zip command.
Since your vmin is now -1, taking the absolute value for the coloring seems to be a mistake.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# dummy data standing in for the question's merged_dfs and new_name_data
merged_dfs = [pd.DataFrame(data=np.random.rand(5, 7), columns=[*'ABCDEFG']) for _ in range(5)]
new_name_data = [f'Dataset {i + 1}' for i in range(len(merged_dfs))]
ticks = [-1, -0.5, 0, 0.5, 1]  # colorbar tick positions, as listed in the question
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(12, 7))
for title, merging, ax in zip(new_name_data, merged_dfs, axes.flat):
    graph = merging.corr()
    colormap = sns.diverging_palette(250, 250, as_cmap=True)
    sns.heatmap(graph, cmap=colormap, vmin=-1, vmax=1, center=0, annot=True, ax=ax, cbar_kws={'ticks': ticks})
    # keep the tick positions, but label them with their absolute values
    ax.collections[0].colorbar.set_ticklabels([abs(t) for t in ticks])
    ax.set_title(title)
fig.delaxes(axes.flat[-1])  # five datasets, so the sixth subplot stays unused
fig.tight_layout()
plt.show()
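For comparison, the points about fig.colorbar(...) can be sketched with plain matplotlib, where imshow returns the plotted object itself. This is an illustrative example only: the random correlation matrix and the "coolwarm" colormap are assumptions, not part of the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = np.corrcoef(rng.normal(size=(5, 30)))  # 5x5 matrix with values in [-1, 1]

ticks = [-1, -0.5, 0, 0.5, 1]
labels = [f"{abs(t):g}" for t in ticks]  # same positions, absolute values as text

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax in axes:
    # imshow returns the image (a "mappable"), which is what fig.colorbar expects.
    image = ax.imshow(data, cmap="coolwarm", vmin=-1, vmax=1)
    # ax=... attaches the colorbar to this particular subplot.
    cbar = fig.colorbar(image, ax=ax, ticks=ticks)
    cbar.set_ticklabels(labels)
```

Passing the Axes that sns.heatmap returns to fig.colorbar fails because an Axes is not a mappable, which is the likely origin of the 'get_array' error in the question.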

Seaborn plot adds extra zeroes to x axis time-stamp labels

I am trying to plot my dataset as a combined bar plot and point plot using seaborn.
But the timestamps in the x-axis labels show additional zeroes at the end.
The code I use is
import matplotlib.pyplot as plt
import seaborn as sns
fig, ax1 = plt.subplots()
# Plot the barplot
sns.barplot(x='Date', y=y_value, hue='Sentiment', data=mergedData1, ax=ax1)
# Assign y axis label for bar plot
ax1.set_ylabel('No of Feeds')
# Position the legend on the right side outside the box
plt.legend(loc=2, bbox_to_anchor=(1.1, 1), ncol=1)
# Create a dual axis
ax2 = ax1.twinx()
# Plot the pointplot
sns.pointplot(x='Date', y='meanTRP', data=mergedData1, ax=ax2, color='r')
# Assign y axis label for point plot
ax2.set_ylabel('TRP')
# Hide the grid for secondary axis
ax2.grid(False)
# Give a chart title
plt.title(source+' Social Media Feeds & TRP for the show '+show)
# Automatically align the x axis labels
fig.autofmt_xdate()
fig.tight_layout()
Not sure what is going wrong. Please help me with this. Thanks
The easiest solution is to split each label at the letter "T", as the rest is probably not needed. (In the question's code the visible x-axis labels belong to ax1.)
ax1.set_xticklabels([t.get_text().split("T")[0] for t in ax1.get_xticklabels()])
For more control over the date format, parse the text of each label and format it as you like (this needs import pandas as pd):
ax1.set_xticklabels([pd.to_datetime(t.get_text()).strftime('%d-%m-%Y') for t in ax1.get_xticklabels()])
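Both label clean-ups can be tried on plain strings first. The sketch below assumes the labels look like full ISO timestamps with nanosecond zeroes; the dates and bar heights are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Assumed shape of the unwanted labels: ISO timestamps with trailing zeroes.
raw_labels = [
    "2017-08-01T00:00:00.000000000",
    "2017-08-02T00:00:00.000000000",
    "2017-08-03T00:00:00.000000000",
]

# Option 1: keep only the part before the "T".
short_labels = [text.split("T")[0] for text in raw_labels]

# Option 2: parse each string and choose the output format explicitly.
formatted_labels = [pd.to_datetime(text).strftime("%d-%m-%Y") for text in raw_labels]

# Apply the cleaned labels to a small bar chart with fixed tick positions.
fig, ax = plt.subplots()
positions = list(range(len(raw_labels)))
ax.bar(positions, [12, 7, 9])
ax.set_xticks(positions)
ax.set_xticklabels(formatted_labels)
```

Setting the tick positions before the labels keeps the two lists the same length, so each label stays attached to its bar.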

x axis label disappearing in matplotlib and basic plotting in python

I am new to matplotlib, and I am finding it very confusing. I have spent quite a lot of time on the matplotlib tutorial website, but I still cannot really understand how to build a figure from scratch. To me, this means doing everything manually... not using the plt.plot() function, but always setting figure, axis handles.
Can anyone explain how to set up a figure from the ground up?
Right now, I have this code to generate a double y-axis plot. But my x-axis labels are disappearing and I don't know why.
fig, ax1 = plt.subplots()
ax1.plot(yearsTotal,timeseries_data1,'r-')
ax1.set_ylabel('Windspeed [m/s]')
ax1.tick_params('y',colors='r')
ax2 = ax1.twinx()
ax2.plot(yearsTotal,timeseries_data2,'b-')
ax2.set_xticks(np.arange(min(yearsTotal),max(yearsTotal)+1))
ax2.set_xticklabels(ax1.xaxis.get_majorticklabels(), rotation=90)
ax2.set_ylabel('Open water duration [days]')
ax2.tick_params('y',colors='b')
plt.title('My title')
fig.tight_layout()
plt.savefig('plots/my_figure.png',bbox_inches='tight')
plt.show()
Because you are using a twinx, it makes sense to operate only on the original axes (ax1).
Further, the ticklabels are not defined at the point where you call ax1.xaxis.get_majorticklabels().
If you want to set the ticks and ticklabels manually, you can use your own data to do so (although I wouldn't know why you'd prefer this over using the automatic labeling) by specifying a list or array
ticks = np.arange(min(yearsTotal),max(yearsTotal)+1)
ax1.set_xticks(ticks)
ax1.set_xticklabels(ticks)
Since the ticklabels are the same as the tickpositions here, you may also just do
ax1.set_xticks(np.arange(min(yearsTotal),max(yearsTotal)+1))
plt.setp(ax1.get_xticklabels(), rotation=70)
Complete example:
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(1)
yearsTotal = np.arange(1977, 1999)
timeseries_data1 = np.cumsum(np.random.normal(size=len(yearsTotal)))+5
timeseries_data2 = np.cumsum(np.random.normal(size=len(yearsTotal)))+20
fig, ax1 = plt.subplots()
ax1.plot(yearsTotal,timeseries_data1,'r-')
ax1.set_ylabel('Windspeed [m/s]')
ax1.tick_params('y',colors='r')
ax1.set_xticks(np.arange(min(yearsTotal),max(yearsTotal)+1))
plt.setp(ax1.get_xticklabels(), rotation=70)
ax2 = ax1.twinx()
ax2.plot(yearsTotal,timeseries_data2,'b-')
ax2.set_ylabel('Open water duration [days]')
ax2.tick_params('y',colors='b')
plt.title('My title')
fig.tight_layout()
plt.show()
Based on your code, the labels do not disappear; they are overwritten by these two calls:
ax2.set_xticks(np.arange(min(yearsTotal),max(yearsTotal)+1))
ax2.set_xticklabels(ax1.xaxis.get_majorticklabels(), rotation=90)
set_xticks() sets the tick locations on the axes, and set_xticklabels() sets the tick labels from a list of strings. Here that list is built from tick labels whose text is still empty, so the axis ends up with empty labels.
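The split between locations and label strings can be seen in a small sketch. The years, the data and the short label format are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

years = np.arange(2000, 2006)

fig, ax1 = plt.subplots()
ax1.plot(years, years - 2000, "r-")
ax2 = ax1.twinx()  # second y-axis that shares the x-axis of ax1
ax2.plot(years, (years - 2000) ** 2, "b-")

# Work on the original axes only: first the locations, then the strings.
ax1.set_xticks(years)
ax1.set_xticklabels([f"'{int(y) % 100:02d}" for y in years], rotation=90)

fig.canvas.draw()  # render once so the label texts are final
shown = [label.get_text() for label in ax1.get_xticklabels()]
```

Because the label strings are built from the data rather than read back from the axis, they do not depend on whether the figure has been drawn yet.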

How to prevent overlapping x-axis labels in sns.countplot

For the plot
sns.countplot(x="HostRamSize",data=df)
I got the following graph, in which the x-axis labels overlap. How do I avoid this? Should I change the size of the graph to solve this problem?
Having a DataFrame ds like this
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(136)
l = "1234567890123"
categories = [ l[i:i+5]+" - "+l[i+1:i+6] for i in range(6)]
x = np.random.choice(categories, size=1000,
                     p=np.diff(np.array([0,0.7,2.8,6.5,8.5,9.3,10])/10.))
ds = pd.DataFrame({"Column" : x})  # a DataFrame, so that x="Column" can be looked up
there are several options to make the axis labels more readable.
Change figure size
plt.figure(figsize=(8,4)) # this creates a figure 8 inch wide, 4 inch high
sns.countplot(x="Column", data=ds)
plt.show()
Rotate the ticklabels
ax = sns.countplot(x="Column", data=ds)
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right")
plt.tight_layout()
plt.show()
Decrease Fontsize
ax = sns.countplot(x="Column", data=ds)
ax.set_xticklabels(ax.get_xticklabels(), fontsize=7)
plt.tight_layout()
plt.show()
Of course any combination of those would work equally well.
Setting rcParams
The figure size and the xlabel fontsize can be set globally using rcParams
plt.rcParams["figure.figsize"] = (8, 4)
plt.rcParams["xtick.labelsize"] = 7
This might be useful to put at the top of a Jupyter notebook so that those settings apply to any figure generated within it. Unfortunately, rotating the xticklabels is not possible using rcParams.
It is worth noting that the same strategies also apply to a seaborn barplot, a matplotlib bar plot or a pandas bar plot (DataFrame.plot.bar).
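If the global settings should only apply to some figures, matplotlib's rc_context can limit them to a block. This is a sketch of that variation, not something the answer itself proposes; the bar data is a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Same two settings as above, but only active inside the "with" block.
settings = {"figure.figsize": (8, 4), "xtick.labelsize": 7}

with plt.rc_context(settings):
    fig, ax = plt.subplots()
    ax.bar(["a", "b", "c"], [3, 1, 2])
    size_inside = tuple(fig.get_size_inches())
    label_size = ax.get_xticklabels()[0].get_fontsize()
```

Figures created after the block fall back to whatever rcParams were in effect before it.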
You can rotate the x-axis labels and increase their font size using the xticks function of matplotlib.pyplot.
For example:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(10, 5))
chart = sns.countplot(x="HostRamSize", data=df)
plt.xticks(
    rotation=45,
    horizontalalignment='right',
    fontweight='light',
    fontsize='x-large'
)
For more such modifications, you can refer to this link:
Drawing from Data
If you just want to make sure xticks labels are not squeezed together, you can set a proper fig size and try fig.autofmt_xdate().
This function will automatically align and rotate the labels.
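A minimal sketch of that call on a categorical bar chart follows. The category names and counts are invented; autofmt_xdate was designed for date labels, but it rotates and aligns any x tick labels.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical categories with labels long enough to collide.
names = ["4 GB - 8 GB", "8 GB - 16 GB", "16 GB - 32 GB", "32 GB - 64 GB", "64 GB - 128 GB"]
counts = [40, 120, 310, 95, 20]

fig, ax = plt.subplots(figsize=(8, 4))
positions = list(range(len(names)))
ax.bar(positions, counts)
ax.set_xticks(positions)
ax.set_xticklabels(names)

# Rotate and right-align all x tick labels in one call.
fig.autofmt_xdate(rotation=30, ha="right")

rotations = {label.get_rotation() for label in ax.get_xticklabels()}
```

The rotation and alignment arguments shown are the function's defaults, written out here only to make the effect explicit.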
plt.figure(figsize=(15,10)) #adjust the size of plot
ax=sns.countplot(x=df['Location'],data=df,hue='label',palette='mako')
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right") #it will rotate text on x axis
plt.tight_layout()
plt.show()
You can try this code and change the size and rotation according to your needs.
I don't know whether it is an option for you, but flipping the plot could be a solution: instead of plotting the categories on x=, plot them on y=, so that the labels are written horizontally, one below the other:
sns.countplot(y="HostRamSize",data=df)
