Seaborn heatmap auto sizing the cells - python

I am trying to build a heat map which is working great if there are lot of categorical variables but not looking great when there are two to three data points as shown in the second image. I am looking for a way to auto adjust based on the data points.
here is the function
def bivariate(col1,col2,Title,cbar_size):
temp2=modifiedloan.groupby([col1,col2,'loan_status']).id.agg('count').to_frame('count').reset_index()
temp3=temp2.pivot_table(index=(col1,col2), columns='loan_status', values='count').fillna(0)
temp3['default%']=(temp3[0]/(temp3[0]+temp3[1]))
temp3=temp3.reset_index()
temp4=temp3.pivot_table(index=col1, columns=col2, values='default%').fillna(0)
temp5=temp3.pivot_table(index=col1, columns=col2, values=[0]).fillna(0)
f, (ax1) = plt.subplots(nrows=1, ncols=1, figsize=(12,6))
cmap = sns.cm.rocket_r
sns.heatmap(temp4,linewidths=1, ax=ax1,annot=False, fmt='g',cmap=cmap,cbar=True,cbar_kws={"shrink": cbar_size})
sns.heatmap(temp5, annot=True, annot_kws={'va':'top'}, fmt="", cbar=False,ax=ax1)
sns.heatmap(temp4, annot=True, fmt=".1%",annot_kws={'va':'bottom'}, cbar=False,cmap=cmap)
plt.ylim(b, t) # update the ylim(bottom, top) values
ax1.set_title(Title)
plt.tight_layout()

I realized the problem is with the version of seaborn installed. The same functions worked well on one of my colleagues machine.

Related

Boxplot image combining

I am working now on plot my dataset by boxplot as in below code
plt.figure(figsize=(8,5))
fig = plt.figure()
num_list=Final_dataset.columns.values.tolist()
for i in range(len(num_list)):
column=num_list[i]
sns.boxplot(x="label", y=column, data=Final_dataset, palette='Set2')
plt.savefig('{}.png'. format(i))
plt.show()
I need to produce one image that combine all attributes figures as in this figure rather than several figures.
how Ican fix it?
thanks, a lot

Matplotlib scatter plot dual y-axis

I try to figure out how to create scatter plot in matplotlib with two different y-axis values.
Now i have one and need to add second with index column values on y.
points1 = plt.scatter(r3_load["TimeUTC"], r3_load["r3_load_MW"],
c=r3_load["r3_load_MW"], s=50, cmap="rainbow", alpha=1) #set style options
plt.rcParams['figure.figsize'] = [20,10]
#plt.colorbar(points)
plt.title("timeUTC vs Load")
#plt.xlim(0, 400)
#plt.ylim(0, 300)
plt.xlabel('timeUTC')
plt.ylabel('Load_MW')
cbar = plt.colorbar(points1)
cbar.set_label('Load')
Result i expect is like this:
So second scatter set should be for TimeUTC vs index. Colors are not the subject;) also in excel y-axes are different sites, but doesnt matter.
Appriciate your help! Thanks, Paulina
Continuing after the suggestions in the comments.
There are two ways of using matplotlib.
Via the matplotlib.pyplot interface, like you were doing in your original code snippet with .plt
The object-oriented way. This is the suggested way to use matplotlib, especially when you need more customisation like in your case. In your code, ax1 is an Axes instance.
From an Axes instance, you can plot your data using the Axes.plot and Axes.scatter methods, very similar to what you did through the pyplot interface. This means, you can write a Axes.scatter call instead of .plot and use the same parameters as in your original code:
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.scatter(r3_load["TimeUTC"], r3_load["r3_load_MW"],
c=r3_load["r3_load_MW"], s=50, cmap="rainbow", alpha=1)
ax2.plot(r3_dda249["TimeUTC"], r3_dda249.index, c='b', linestyle='-')
ax1.set_xlabel('TimeUTC')
ax1.set_ylabel('r3_load_MW', color='g')
ax2.set_ylabel('index', color='b')
plt.show()

Matplotlib Subplot axes sharing: Apply to every other plot?

I am trying to find a way to apply the shared axes parameters of subplot() to every other plot in a series of subplots.
I've got the following code, which uses data from RPM4, based on rows in fpD
fig, ax = plt.subplots(2*(fpD['name'].count()), sharex=True, figsize=(6,fpD['name'].count()*2),
gridspec_kw={'height_ratios':[5,1]*fpD['name'].count()})
for i, r in fpD.iterrows():
RPM4[RPM4['name'] == RPM3.iloc[i,0]].plot(x='date', y='RPM', ax=ax[(2*i)], legend=False)
RPM4[RPM4['name'] == RPM3.iloc[i,0]].plot(kind='area', color='lightgrey', x='date', y='total', ax=ax[(2*i)+1],
legend=False,)
ax[2*i].set_title('test', fontsize=12)
plt.tight_layout()
Which produces an output that is very close to what I need. It loops through the 'name' column in a table and produces two plots for each, and displays them as subplots:
As you can see, the sharex parameter works fine for me here, since I want all the plots to share the same axis.
However, what I'd really like is for all the even-numbered (bigger) plots to share the same y axis, and for the odd-numbered (small grey) plots to all share a different y axis.
Any help on accomplishing this is much appreciated, thanks!

Avoid overlapping on seaborn plots

I'm making some EDA using pandas and seaborn, this is the code I have to plot the histograms of a group of features:
skewed_data = pd.DataFrame.skew(data)
skewed_features =skewed_data.index
fig, axs = plt.subplots(ncols=len(skewed_features))
plt.ticklabel_format(style='sci', axis='both', scilimits=(0,0))
for i,skewed_feature in enumerate(skewed_features):
g = sns.distplot(data[column])
sns.distplot(data[skewed_feature], ax=axs[i])
This is the result I'm getting:
Is not readable, how can I avoid that issue?
I know you are concerning about the layout of the figures. However, you need to first decide how to represent your data. Here are two choices for your case
(1) Multiple lines in one figure and
(2) Multiple subplots 2x2, each subplot draws one line.
I am not quite familiar with searborn, but the plotting of searborn is based on matplotlib. I could give you some basic ideas.
To archive (1), you can first declare the figure and ax, then add all line to this ax. Example codes:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# YOUR LOOP, use the ax parameter
for i in range(3)
sns.distplot(data[i], ax=ax)
To archive (2), same as above, but with different number subplots, and put your line in the different subplot.
# Four subplots, 2x2
fig, axarr = plt.subplots(2,2)
# YOUR LOOP, use different cell
You may check matplotlib subplots demo. To do a good visualization is a very tough work. There are so many documents to read. Check the gallery of matplotlib or seaborn is a good and quick way to understand how some kinds of visualization are implemented.
Thanks.

x axis label disappearing in matplotlib and basic plotting in python

I am new to matplotlib, and I am finding it very confusing. I have spent quite a lot of time on the matplotlib tutorial website, but I still cannot really understand how to build a figure from scratch. To me, this means doing everything manually... not using the plt.plot() function, but always setting figure, axis handles.
Can anyone explain how to set up a figure from the ground up?
Right now, I have this code to generate a double y-axis plot. But my xlabels are disappearing and I dont' know why
fig, ax1 = plt.subplots()
ax1.plot(yearsTotal,timeseries_data1,'r-')
ax1.set_ylabel('Windspeed [m/s]')
ax1.tick_params('y',colors='r')
ax2 = ax1.twinx()
ax2.plot(yearsTotal,timeseries_data2,'b-')
ax2.set_xticks(np.arange(min(yearsTotal),max(yearsTotal)+1))
ax2.set_xticklabels(ax1.xaxis.get_majorticklabels(), rotation=90)
ax2.set_ylabel('Open water duration [days]')
ax2.tick_params('y',colors='b')
plt.title('My title')
fig.tight_layout()
plt.savefig('plots/my_figure.png',bbox_inches='tight')
plt.show()
Because you are using a twinx, it makes sense to operate only on the original axes (ax1).
Further, the ticklabels are not defined at the point where you call ax1.xaxis.get_majorticklabels().
If you want to set the ticks and ticklabels manually, you can use your own data to do so (although I wouldn't know why you'd prefer this over using the automatic labeling) by specifying a list or array
ticks = np.arange(min(yearsTotal),max(yearsTotal)+1)
ax1.set_xticks(ticks)
ax1.set_xticklabels(ticks)
Since the ticklabels are the same as the tickpositions here, you may also just do
ax1.set_xticks(np.arange(min(yearsTotal),max(yearsTotal)+1))
plt.setp(ax1.get_xticklabels(), rotation=70)
Complete example:
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(1)
yearsTotal = np.arange(1977, 1999)
timeseries_data1 = np.cumsum(np.random.normal(size=len(yearsTotal)))+5
timeseries_data2 = np.cumsum(np.random.normal(size=len(yearsTotal)))+20
fig, ax1 = plt.subplots()
ax1.plot(yearsTotal,timeseries_data1,'r-')
ax1.set_ylabel('Windspeed [m/s]')
ax1.tick_params('y',colors='r')
ax1.set_xticks(np.arange(min(yearsTotal),max(yearsTotal)+1))
plt.setp(ax1.get_xticklabels(), rotation=70)
ax2 = ax1.twinx()
ax2.plot(yearsTotal,timeseries_data2,'b-')
ax2.set_ylabel('Open water duration [days]')
ax2.tick_params('y',colors='b')
plt.title('My title')
fig.tight_layout()
plt.show()
Based on your code, it is not disappear, it is set (overwrite) by these two functions:
ax2.set_xticks(np.arange(min(yearsTotal),max(yearsTotal)+1))
ax2.set_xticklabels(ax1.xaxis.get_majorticklabels(), rotation=90)
set_xticks() on the axes will set the locations and set_xticklabels() will set the xtick labels with list of strings labels.

Categories

Resources