pandas plot replace xticks - python

I am plotting a grouped pandas DataFrame:
score = pd.DataFrame()
score['Score'] = svm_score
score['Wafer_Slot'] = desc.Wafer_Slot[test_index].tolist()
gscore = score.groupby('Wafer_Slot')
score_plot = [score for ws, score in gscore]
ax = gscore.boxplot(subplots=False)
ax.set_xticklabels(range(52)) # does not work
plt.xlabel('Wafer Slot')
plt.show()
It works, but the x axis is impossible to read because the many tick labels overlap. I would like the x axis to be a simple counter of the boxplots.
How can I do that?

The boxplot method doesn't return the axes object like the plot method of DataFrames and Series. Try this:
gscore.boxplot(subplots=False)
ax = plt.gca()
ax.set_xticklabels(range(52))
The boxplot method returns a dict or OrderedDict of dicts of line objects by the look of it.
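As a minimal sketch of the same idea on made-up data (the values and slot numbers below are invented for illustration, not the asker's), the current axes can be fetched with plt.gca() after the boxplot call and the tick labels replaced with a counter. The number of boxes is read from the axes rather than hard-coded, since it depends on how many groups and numeric columns end up in the plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 5 wafer slots, 20 scores each
rng = np.random.default_rng(0)
score = pd.DataFrame({
    "Score": rng.normal(size=100),
    "Wafer_Slot": np.repeat([101, 205, 317, 422, 538], 20),
})

score.groupby("Wafer_Slot").boxplot(subplots=False)
ax = plt.gca()  # the axes pandas just drew on

# One tick per box; relabel them 1, 2, 3, ...
n_boxes = len(ax.get_xticks())
labels = [str(i) for i in range(1, n_boxes + 1)]
ax.set_xticklabels(labels)
ax.set_xlabel("Wafer Slot")
```

Reading the tick count from the axes avoids a mismatch between the number of labels and the number of boxes, which is what a hard-coded range(52) risks.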

Related

How to display all legends when plotting using *args & seaborn

My data & code are as below
w = [1,2,3,4,5,6,7,8,9,10]
vals = [[1,2,3,4,5,6,7,8,9,10],[2,4,6,8,8,8,8,8,7,1],[1,4,2,4,8,9,8,8,7,2]]
def plot_compare(*id_nums):
    fig = plt.figure(figsize=(10, 5))
    leg=[]
    for id_num in id_nums:
        rel = vals[id_num]
        sns.lineplot(x=w, y=rel)
        leg.append(id_num)
    fig.legend(labels=[leg],loc=5,);
plot_compare(0,2)
The idea was to get multiple line plots with just one function (in my actual data I have a lot of values that need to be plotted).
When I run the code as above, I get the plot below.
The line plots are exactly as I want, but the legend has just one item instead of two (since I have plotted two line graphs).
I have tried moving the legend line inside the for loop, but to no avail. I want as many legend entries as there are line plots.
Can anyone help?
You are passing the legend labels as a list of lists (labels=[leg]), so the legend sees a single entry. Instead use fig.legend(labels=leg,loc=5)
Use:
w = [1,2,3,4,5,6,7,8,9,10]
vals = [[1,2,3,4,5,6,7,8,9,10],[2,4,6,8,8,8,8,8,7,1],[1,4,2,4,8,9,8,8,7,2]]
def plot_compare(*id_nums):
    fig = plt.figure(figsize=(10, 5))
    leg=[]
    for id_num in id_nums:
        rel = vals[id_num]
        sns.lineplot(x=w, y=rel)
        leg.append(id_num)
    fig.legend(labels=leg,loc=5)
    plt.show()
plot_compare(0,2)
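An alternative that avoids keeping a separate list of labels is to attach a label to each line as it is drawn and let the legend collect them. The sketch below uses plain matplotlib (ax.plot) instead of sns.lineplot so that it runs without seaborn; that swap is an assumption made for the example, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

w = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
vals = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [2, 4, 6, 8, 8, 8, 8, 8, 7, 1],
    [1, 4, 2, 4, 8, 9, 8, 8, 7, 2],
]

def plot_compare(*id_nums):
    fig, ax = plt.subplots(figsize=(10, 5))
    for id_num in id_nums:
        # The label travels with the line, so the legend cannot get out of sync
        ax.plot(w, vals[id_num], label=str(id_num))
    ax.legend(loc="center right")
    return fig, ax

fig, ax = plot_compare(0, 2)
```

Returning fig and ax from the function also makes it easy to adjust or save the plot afterwards.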

Plotting subplots using dataframes

I have a dictionary containing 9 dataframes. I want to create a 3x3 grid of subplots and plot a bar chart for each dataframe.
To plot a single chart I would do this (just a single plot, not considering subplots):
%matplotlib inline
with plt.style.context('bmh'):
    famd = satFAMD_obj['MODIS.NDVI']
    df_norm = satFAMD_dfNorm['MODIS.NDVI']
    df_cor =famd.column_correlations(df_norm)
    df_cor.columns = ['component 1','component 2', 'component 3']
    df_cor.plot(kind = 'bar',cmap = 'Set1', figsize = (10,6))
    plt.show()
where satFAMD_obj and satFAMD_dfNorm are two dictionaries containing trained factor analysis objects and dataframes, respectively. I then create a new dataframe called df_cor and plot it with df_cor.plot(kind = 'bar',cmap = 'Set1', figsize = (10,6)).
Now my problem is: how do I do this with multiple subplots?
I cannot simply do this:
fig,ax = plt.subplots(3,3, figsize = (12,8))
ax[0,0].df_cor.plot(kind = 'bar',cmap = 'Set1')
Any ideas?
I'm assuming that every key in your two dictionaries needs to be plotted.
You will:
declare subplots,
iterate over dictionaries,
iterate over the axes objects,
plot to each set of axes.
Using code like the below example:
fig,ax = plt.subplots(3,3, figsize = (12,8))
# pair each dictionary key with one axes so every dataframe gets its own panel
for k1,k2,axes in zip(satFAMD_obj.keys(),satFAMD_dfNorm.keys(),ax.flatten()):
    famd = satFAMD_obj[k1]
    df_norm = satFAMD_dfNorm[k2]
    df_cor = famd.column_correlations(df_norm)
    df_cor.columns = ['component 1','component 2', 'component 3']
    df_cor.plot(kind = 'bar',cmap = 'Set1',ax=axes)
    #                                      ^^^^^^^
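For a version that can be run without the factor analysis objects, here is a rough sketch with nine randomly generated dataframes standing in for the correlation tables (the names dataset_0 ... dataset_8 and the row labels are hypothetical). It pairs each dataframe with exactly one axes by zipping the dictionary items with the flattened axes array.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the nine correlation tables
rng = np.random.default_rng(0)
frames = {
    f"dataset_{i}": pd.DataFrame(
        rng.uniform(size=(2, 3)),
        index=["var_a", "var_b"],
        columns=["component 1", "component 2", "component 3"],
    )
    for i in range(9)
}

fig, axes = plt.subplots(3, 3, figsize=(12, 8))

# zip pairs the i-th dataframe with the i-th axes, so each panel gets one chart
for (name, df_cor), ax in zip(frames.items(), axes.flatten()):
    df_cor.plot(kind="bar", cmap="Set1", ax=ax, legend=False)
    ax.set_title(name)

fig.tight_layout()
```

Note that a nested loop (one loop over the dictionaries and another over the axes) would draw every dataframe onto every panel, which is why a single zip is used here.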

Wrangling x-axis datetime labels on matplotlib

I have a pandas DataFrame with a DateTime index.
I can plot a timeseries from it, and by default it looks fine.
But when I try to draw a bar chart from the same DataFrame, the x-axis labels are ruined (massive overlapping). The spacing of the data is also weird (big gaps between sets of bars).
I tried autofmt_xdate(), but that didn't help.
This is the simple code fragment I used to generate the charts
entire_df['predict'] = regr.predict(entire_df[X_cols])
entire_df['error'] = entire_df['predict']-entire_df['px_usd_mmbtu']
#entire_df['error'].plot(kind='hist')
fig=plt.figure()
entire_df[['px_usd_mmbtu', 'predict']].plot()
fig2 = plt.figure()
entire_df['error'].plot(kind='bar')
#fig2.autofmt_xdate() #doesn't help
print (type(error_df.index))
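Try this, which gives the bar chart its own large figure and rotates its tick labels:
entire_df['predict'] = regr.predict(entire_df[X_cols])
entire_df['error'] = entire_df['predict']-entire_df['px_usd_mmbtu']
#entire_df['error'].plot(kind='hist')
# line chart on its own figure
entire_df[['px_usd_mmbtu', 'predict']].plot(figsize=(15,15))
# bar chart on a separate, explicitly created axes so it does not land on the line chart
fig2, ax2 = plt.subplots(figsize=(15,15))
entire_df['error'].plot(kind='bar', ax=ax2, rot=90) # or change from 90 to 45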
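If rotation alone is not enough, another option is to label only every n-th bar. pandas draws bar charts at integer positions 0, 1, 2, ... and prints one label per bar, so the ticks can be thinned by hand. The sketch below uses an invented daily series (the data and the step of 10 are assumptions for illustration).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'error' column: 60 daily values
idx = pd.date_range("2020-01-01", periods=60, freq="D")
errors = pd.Series(np.random.default_rng(0).normal(size=60), index=idx)

fig, ax = plt.subplots(figsize=(10, 4))
errors.plot(kind="bar", ax=ax)

# Bars sit at positions 0..n-1, so keep a tick at every 10th bar only
step = 10
ticks = list(range(0, len(errors), step))
labels = [errors.index[i].strftime("%Y-%m-%d") for i in ticks]
ax.set_xticks(ticks)
ax.set_xticklabels(labels, rotation=45, ha="right")
fig.tight_layout()
```

Formatting the labels with strftime also drops the time-of-day part that pandas prints by default for a DatetimeIndex on a bar chart.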

Combining FacetGrid and dual Y-axis in Pandas

I am trying to plot two different variables (linked by a relation of causality), delai_jour and date_sondage on a single FacetGrid. I can do it with this code:
g = sns.FacetGrid(df_verif_sum, col="prefecture", col_wrap=2, aspect=2, sharex=True,)
g = g.map(plt.plot, "date_sondage", "delai_jour", color="m", linewidth=2)
g = g.map(plt.bar, "date_sondage", "impossible")
which gives me this FacetGrid (there are 33 facets in total).
I'm interested in comparing the patterns across the various prefectures, but due to the difference in magnitude I cannot see the changes in the line chart.
For this specific work, the best way to do it is to create a secondary y axis, but I can't seem to make anything work: it doesn't look like it's possible with FacetGrid, and I neither understood the code nor was able to replicate the examples I've seen with pure matplotlib.
How should I go about it?
I got this to work by iterating through the axes and plotting a secondary axis as in a typical Seaborn graph.
Using the OP example:
g = sns.FacetGrid(df_verif_sum, col="prefecture", col_wrap=2, aspect=2, sharex=True)
g = g.map(plt.plot, "date_sondage", "delai_jour", color="m", linewidth=2)
for ax, (_, subdata) in zip(g.axes, df_verif_sum.groupby('prefecture')):
    ax2=ax.twinx()
    subdata.plot(x='date_sondage',y='impossible', ax=ax2,legend=False,color='r')
If you do any formatting to the x-axis, you may have to do it to both ax and ax2.
Here's an example where you apply a custom mapping function to the dataframe of interest. Within the function, you can call plt.gca() to get the current axis at the facet being currently plotted in FacetGrid. Once you have the axis, twinx() can be called just like you would in plain old matplotlib plotting.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
def facetgrid_two_axes(*args, **kwargs):
    data = kwargs.pop('data')
    dual_axis = kwargs.pop('dual_axis')
    alpha = kwargs.pop('alpha', 0.2)
    kwargs.pop('color', None)  # discard the color FacetGrid passes in; colors are set below
    ax = plt.gca()
    if dual_axis:
        ax2 = ax.twinx()
        ax2.set_ylabel('Second Axis!')
    ax.plot(data['x'],data['y1'], **kwargs, color='red',alpha=alpha)
    if dual_axis:
        # use the facet's own subset (data), not the global df
        ax2.plot(data['x'],data['y2'], **kwargs, color='blue',alpha=alpha)
df = pd.DataFrame()
df['x'] = np.arange(1,5,1)
df['y1'] = 1 / df['x']
df['y2'] = df['x'] * 100
df['facet'] = 'foo'
df2 = df.copy()
df2['facet'] = 'bar'
df3 = pd.concat([df,df2])
win_plot = sns.FacetGrid(df3, col='facet', height=6)  # 'height' replaced the older 'size' argument
(win_plot.map_dataframe(facetgrid_two_axes, dual_axis=True)
    .set_axis_labels("X", "First Y-axis"))
plt.show()
This isn't the prettiest plot as you might want to adjust the presence of the second y-axis' label, the spacing between plots, etc. but the code suffices to show how to plot two series of differing magnitudes within FacetGrids.

Overlapping boxplots in python

I have the following dataframe:
Av_Temp Tot_Precip
278.001 0
274 0.0751864
270.294 0.631634
271.526 0.229285
272.246 0.0652201
273 0.0840059
270.463 0.0602944
269.983 0.103563
268.774 0.0694555
269.529 0.010908
270.062 0.043915
271.982 0.0295718
and want to draw a boxplot where the x-axis is 'Av_Temp' divided into equal-sized bins (say 2 in this case), and the y-axis shows the corresponding range of values for Tot_Precip. I have the following code (thanks to "Find pandas quartiles based on another column"); however, when I plot the boxplots, they are drawn one on top of another. Any suggestions?
expl_var = 'Av_Temp'
cname = 'Tot_Precip'
df[expl_var+'_Deciles'] = pandas.qcut(df[expl_var], 2)
grp_df = df.groupby(expl_var+'_Deciles').apply(lambda x: numpy.array(x[cname]))
fig, ax = plt.subplots()
for i in range(len(grp_df)):
    box_arr = grp_df[i]
    box_arr = box_arr[~numpy.isnan(box_arr)]
    stats = cbook.boxplot_stats(box_arr, labels = str(i))
    ax.bxp(stats)
ax.set_yscale('log')
plt.show()
Since you're using pandas already, why not use the boxplot method on dataframes?
expl_var = 'Av_Temp'
cname = 'Tot_Precip'
df[expl_var+'_Deciles'] = pandas.qcut(df[expl_var], 2)
ax = df.boxplot(by='Av_Temp_Deciles', column='Tot_Precip')
ax.set_yscale('log')
That produces this: http://i.stack.imgur.com/20KPx.png
If you don't like the labels, throw in a
plt.xlabel('');plt.suptitle('');plt.title('')
If you want a standard boxplot, the above should be fine. My understanding of the separation of boxplot into boxplot_stats and bxp is to allow you to modify or replace the stats generated and fed to the plotting routine. See https://github.com/matplotlib/matplotlib/pull/2643 for some details.
If you need to draw a boxplot with non-standard stats, you can use boxplot_stats on 2D numpy arrays, so you only need to call it once. No loops required.
expl_var = 'Av_Temp'
cname = 'Tot_Precip'
df[expl_var+'_Deciles'] = pandas.qcut(df[expl_var], 2)
# I moved your nan check into the df apply function
grp_df = df.groupby('Av_Temp_Deciles').apply(lambda x: numpy.array(x[cname][~numpy.isnan(x[cname])]))
# boxplot_stats can take a 2D numpy array of data, and a 1D array of labels
# stats is now a list of dictionaries of stats, one dictionary per quantile
stats = cbook.boxplot_stats(grp_df.values, labels=grp_df.index)
# now it's a one-shot plot, no loops
fig, ax = plt.subplots()
ax.bxp(stats)
ax.set_yscale('log')
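The one-shot approach can be tried end to end with the sample data from the question. This is a sketch rather than the answerer's exact code: the column name Temp_Bin is made up here, and the per-bin arrays are passed to boxplot_stats as a plain list instead of a Series of arrays.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import cbook

df = pd.DataFrame({
    "Av_Temp": [278.001, 274, 270.294, 271.526, 272.246, 273,
                270.463, 269.983, 268.774, 269.529, 270.062, 271.982],
    "Tot_Precip": [0, 0.0751864, 0.631634, 0.229285, 0.0652201, 0.0840059,
                   0.0602944, 0.103563, 0.0694555, 0.010908, 0.043915, 0.0295718],
})
df["Temp_Bin"] = pd.qcut(df["Av_Temp"], 2)  # two equal-sized temperature bins

# One array of precipitation values per temperature bin
grouped = df.groupby("Temp_Bin", observed=True)["Tot_Precip"]
labels = [str(interval) for interval, _ in grouped]
arrays = [values.to_numpy() for _, values in grouped]

# A list of stats dictionaries, one per bin
stats = cbook.boxplot_stats(arrays, labels=labels)

fig, ax = plt.subplots()
ax.bxp(stats)  # a single call places the boxes at positions 1, 2, ...
ax.set_ylabel("Tot_Precip")
```

Calling bxp once with all the stats is what keeps the boxes apart: each separate call with a single stats dictionary draws its box at the default position 1, which is why the boxes in the question overlapped.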
