Plotting two-series bar chart in python in a single expression - python

I'm trying to create a bar chart with two series from raw data. My data looks like this:
So what I want to do is group by prev_purchase_count, count unique customers, and split/colour by segment.
I've written a few lines of code which achieve what I want to do but I know there is an easier way - probably one line.
lv_purch = df_customers.loc[df_customers['segment']=='low-value','prev_purchase_count']
hv_purch = df_customers.loc[df_customers['segment']=='high-value','prev_purchase_count']
plt.hist([lv_purch,hv_purch], label=['low-value','high-value'])
plt.legend(loc='upper right')
plt.show()
Thanks a lot!

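You can group by both columns, e.g. df_customers.groupby(['prev_purchase_count', 'segment']), to count the customers for each purchase count within the low-value and high-value segments.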
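As a rough sketch of how a groupby could be turned into the chart in one expression: group by both columns, count distinct customers, and unstack the segment level so that each segment becomes its own bar series. The `customer_id` column and the sample rows are assumptions for illustration; the question does not show the real column names.

```python
import pandas as pd

# Hypothetical sample data; `customer_id` is an assumed column name.
df_customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "segment": ["low-value", "low-value", "high-value",
                "high-value", "low-value", "high-value"],
    "prev_purchase_count": [0, 1, 1, 2, 0, 2],
})

# Rows: purchase count, columns: segment, values: number of distinct customers.
counts = (
    df_customers
    .groupby(["prev_purchase_count", "segment"])["customer_id"]
    .nunique()
    .unstack(fill_value=0)
)

# One bar series per segment; call plt.show() to display it in a script.
ax = counts.plot.bar()
ax.legend(loc="upper right")
```

The intermediate `counts` table is also handy for checking the numbers before plotting.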

Found it!
import matplotlib.pyplot as plt
import seaborn as sns
ax = sns.countplot(x='prev_purchase_count', hue='segment', data=df_customers)
plt.legend(loc='upper right')
plt.show()

Related

how to fix overlap in xlabels using matplotlib

Hello all, I'm currently trying to plot some data using matplotlib. Unfortunately, every time I plot, the x-axis labels overlap each other. How can I space the labels out? Code snippet below. My dict has neighborhood names as keys and ints as values.
#do data visualization here
myList = dict.items()
#myList = sorted(myList)
x, y = zip(*myList)
plt.xlabel("Neighborhoods")
plt.ylabel("# of Public Art")
plt.title("Public Art Distribution in Pittsburgh")
plt.xscale('log', base=3)
plt.plot(x, y)
plt.show()
I'm trying to get the labels to not overlap here
EDIT: thank you for the two suggestions, rotating the labels and increasing the figure size. Using both, I was able to get it displaying correctly.
import matplotlib.ticker as ticker
ax = plt.gca()  # axes of the current figure
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))
You can try setting the tick interval to five or more, so that only every fifth position gets a label.
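The rotation and figure-size fixes mentioned in the EDIT could look roughly like this. The neighborhood names and counts are made up for illustration, and the log x-scale from the question is left out because the x values here are category names.

```python
import matplotlib.pyplot as plt

# Made-up data standing in for the question's dict of neighborhood -> count.
art_counts = {
    "Squirrel Hill South": 20,
    "Central Business District": 27,
    "Allegheny Center": 18,
    "Highland Park": 10,
    "South Side Flats": 9,
}
x, y = zip(*art_counts.items())

# A wider figure gives each label more horizontal room.
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(x, y)
ax.set_xlabel("Neighborhoods")
ax.set_ylabel("# of Public Art")
ax.set_title("Public Art Distribution in Pittsburgh")

# Rotate the tick labels so long names no longer run into each other.
ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()  # keep the rotated labels inside the figure
```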

How to plot a graph from 2 columns in a dataframe using plotnine?

I am trying to understand how this works
Dataframe is "result"
The columns in "result" are Time, Column1, Column2
I am able to plot only a single series, Column1, with the code below:
(p9.ggplot(data=result,
mapping=p9.aes(x='Time', y='Column1'))
+ p9.geom_point(alpha=0.5)
+ p9.scale_x_datetime(breaks=date_breaks('12 hour'))
)
How should I write the code to include Column2 as well, i.e. plot two lines in the chart?
Plotnine's current API (as of 2022-08-09) doesn't include a secondary axis, so this is not possible to do. It is a feature that has been requested several times though, and looks like it is in the roadmap for the next major release.
In any case, if you want to keep the ggplot2 style, you will have to move to R and use the sec.axis argument of scale_y_continuous (you can see several examples here).
If you want to stay in Python, you can just go directly to matplotlib, as secondary axes are supported using the .twinx() method. The working example for your case would be:
import matplotlib.pyplot as plt
ax = result.plot(x='Time', y='Column1', color='blue', legend=False)
sec_axis = ax.twinx()  # second y-axis sharing the same x-axis
result.plot(x='Time', y='Column2', ax=sec_axis, color='red', legend=False)
ax.figure.legend()
plt.show()
Why not:
(p9.ggplot(data=result.melt(id_vars = 'Time'),
mapping=p9.aes(x='Time', y='value', color='variable', group = 'variable'))
+ p9.geom_point(alpha=0.5)
+ p9.scale_x_datetime(breaks=date_breaks('12 hour'))
)
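To see why the melt approach produces one coloured series per column, it may help to look at the reshaped frame on its own. This is a small sketch with made-up values; only the column names Time, Column1 and Column2 come from the question.

```python
import pandas as pd

# Made-up values; the column names follow the question.
result = pd.DataFrame({
    "Time": pd.to_datetime(["2022-08-01 00:00", "2022-08-01 12:00"]),
    "Column1": [1.0, 2.0],
    "Column2": [10.0, 20.0],
})

# Wide -> long: each (Time, column) pair becomes its own row.
long_result = result.melt(id_vars="Time")
print(long_result)

# `variable` holds the original column name and `value` its measurement,
# which is what aes(y='value', color='variable') maps onto.
```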

Show scatter plot title from column value

I'm automating scatter plots with regression lines from my example dataframe.
I want to plot the correlation of Column2 with Column3 and of Column2 with Column4 in separate scatter plots, grouped by Column1. For example, there will be 3 scatter plots of Column2 against Column3, titled A, B, and C.
For the plotting I'm using the pandas scatter plot, for example:
df.groupby('Column1').plot.scatter(x='Column2',y='Column3')
This returns exactly 3 plots, but I want to know how to add a chart title based on the Column1 group, and also how to add the regression line. I haven't used seaborn or matplotlib yet because they're quite confusing for me, so I'd appreciate it if you could explain more :))
EDIT 1
Sorry for not being clear before. My code was already running fine, but with output like this.
It's OK, but it's hard to see which plot belongs to which group. Hence my intention is to create something like this
EDIT 2
Shoutout to everyone who kindly helped, and especially to Caina, who helped me a lot.
This is the code he wrote, based on several additions from the comments.
fig, axes = plt.subplots(1, df.Column1.nunique(), figsize=(12,8))
groups = df.groupby('Column1')
fig.tight_layout(pad=3)
# If `fig.tight_layout(pad=3)` does not work:
# plt.subplots_adjust(wspace=0.5)
for i, (gname, gdata) in enumerate(groups):
    sns.regplot(x='Column2', y='Column3', data=gdata, ax=axes[i])
    axes[i].set_title(gname)
    axes[i].set_ylim(0,)
And this is my result plot
The titles and axes work beautifully. This thread can be considered closed, as I got the help I needed for the plot.
But as you can see, the bottom plot looks odd: its x-axis runs from 0 to 600 and its y-axis from 0 to 25, although all of the plots should share the same y-axis format. The other charts are also stretched horizontally.
I tried the method here, but was not successful with the square or equal parameters.
Could you advise how to adjust the axes so that each plot is square? Thank you!
You can iterate over your groups, specifying the title:
fig, axes = plt.subplots(1, df.Column1.nunique(), figsize=(12,8))
groups = df.groupby('Column1')
for i, (gname, gdata) in enumerate(groups):
    sns.regplot(x='Column2', y='Column3', data=gdata, ax=axes[i]).set_title(gname)
You can also use seaborn.FacetGrid directly:
g = sns.FacetGrid(df, col='Column1')
g.map(sns.regplot, 'Column2', 'Column3')
Edit (further customization based on new requirements in comments):
fig, axes = plt.subplots(1, df.Column1.nunique(), figsize=(12,8))
groups = df.groupby('Column1')
fig.tight_layout(pad=3)
# If `fig.tight_layout(pad=3)` does not work:
# plt.subplots_adjust(wspace=0.5)
for i, (gname, gdata) in enumerate(groups):
    sns.regplot(x='Column2', y='Column3', data=gdata, ax=axes[i])
    axes[i].set_title(gname)
    axes[i].set_ylim(0,)
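For the follow-up about square panels with a common y-axis, one option is sketched below, assuming matplotlib 3.3 or newer for `set_box_aspect`. The data is made up, and a straight-line fit from `numpy.polyfit` stands in for `sns.regplot` to keep the sketch free of extra dependencies; the same two axis settings should carry over to the axes that regplot draws on.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data with the question's column names.
df = pd.DataFrame({
    "Column1": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "Column2": [1, 2, 3, 4, 10, 20, 30, 40, 100, 200, 300, 400],
    "Column3": [2, 4, 5, 8, 3, 6, 9, 11, 5, 12, 14, 22],
})

groups = df.groupby("Column1")
# sharey=True makes every panel use the same y-axis limits.
fig, axes = plt.subplots(1, groups.ngroups, figsize=(12, 4), sharey=True)

for ax, (gname, gdata) in zip(axes, groups):
    ax.scatter(gdata["Column2"], gdata["Column3"])
    # Least-squares line as a stand-in for the regression drawn by regplot.
    slope, intercept = np.polyfit(gdata["Column2"], gdata["Column3"], 1)
    xs = np.array([gdata["Column2"].min(), gdata["Column2"].max()])
    ax.plot(xs, slope * xs + intercept)
    ax.set_title(gname)
    ax.set_box_aspect(1)  # square plotting area regardless of the data range

axes[0].set_ylim(bottom=0)  # applies to all panels because y is shared
fig.tight_layout()
```

Each panel keeps its own x-range here; passing sharex=True as well would force a common x-axis, at the cost of squeezing the groups with small values.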
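You can also use seaborn's built-in functionality. If you just want to see the correlations between the numeric columns, you can call df.corr() (pass numeric_only=True on pandas 1.5 or later, since Column1 holds text).
import seaborn as sns
sns.heatmap(df.corr(numeric_only=True), annot=True)  # heatmap of the pairwise correlations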
You can also use pairplot.
sns.set_style('whitegrid')
sns.set(rc={'figure.figsize':(13.7,10.27)})
sns.set(font_scale=1.3)
sns.set_palette("cubehelix",8)
sns.pairplot(df)
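Since the question is about the Column2/Column3 relationship within each Column1 group, a per-group correlation may be more informative than one figure over the whole frame. A minimal sketch with made-up data:

```python
import pandas as pd

# Made-up data with the question's column names.
df = pd.DataFrame({
    "Column1": ["A", "A", "A", "B", "B", "B"],
    "Column2": [1, 2, 3, 1, 2, 3],
    "Column3": [2, 4, 6, 9, 6, 3],
})

# Pearson correlation of Column2 with Column3, computed separately per group.
group_corr = {
    name: group["Column2"].corr(group["Column3"])
    for name, group in df.groupby("Column1")
}
print(group_corr)
```

In this toy data group A rises with Column2 and group B falls, so the two coefficients have opposite signs, which a single correlation over the whole frame would hide.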
Here's what I did. You can try something like this.
fig, axes = plt.subplots(ncols=3,figsize=(12,6))
plt.subplots_adjust(wspace=0.5, hspace=0.3)
# Filter to one group per axes; plotting the whole groupby would draw every group on the same axes.
df[df['Column1'] == 'A'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[0])
axes[0].set_title("A")
df[df['Column1'] == 'B'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[1])
axes[1].set_title("B")
df[df['Column1'] == 'C'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[2])
axes[2].set_title("C")
plt.show()
The output of this is:
If you want the plots stacked vertically, change ncols to nrows and adjust the figsize.
fig, axes = plt.subplots(nrows=3,figsize=(3,12))
plt.subplots_adjust(wspace=0.2, hspace=.5)
# Filter to one group per axes; plotting the whole groupby would draw every group on the same axes.
df[df['Column1'] == 'A'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[0])
axes[0].set_title("A")
df[df['Column1'] == 'B'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[1])
axes[1].set_title("B")
df[df['Column1'] == 'C'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[2])
axes[2].set_title("C")
plt.show()
This will give you:

Plotting Bar Graph by Years in Matplotlib

I am trying to plot this DataFrame which records various amounts of money over a yearly series:
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import date2num
jp = pd.DataFrame([1000,2000,2500,3000,3250,3750,4500], index=['2011','2012','2013','2014','2015','2016','2017'])
jp.index = pd.to_datetime(jp.index, format='%Y')
jp.columns = ['Money']
I would simply like to make a bar graph out of this using PyPlot (i.e. pyplot.bar).
I tried:
plt.figure(figsize=(15,5))
xvals = date2num(jp.index.date)
yvals = jp['Money']
plt.bar(xvals, yvals, color='black')
ax = plt.gca()
ax.xaxis_date()
plt.show()
But the chart turns out like this:
Only by increasing the width substantially will I start seeing the bars. I have a feeling that this graph is attributing the data to the first date of the year (2011-01-01 for example), hence the massive space between each 'bar' and the thinness of the bars.
How can I plot this properly, knowing that this is a yearly series? Ideally the x-axis would contain only the years. Something tells me that I do not need to use date2num(), since this seems like a very common, ordinary plotting exercise.
My guess is that I'm not handling the years correctly. As of now I have them as a DatetimeIndex, but maybe there are other steps I need to take.
This has puzzled me for 2 days. All the solutions I found online seem to use DataFrame.plot, but I would rather learn how to use PyPlot properly. I also intend to add two more sets of bars, and it seems like the most common way to do that is through plt.bar().
Thanks everyone.
You can either do
jp.plot.bar()
which gives:
or plot against the actual years:
plt.bar(jp.index.year, jp.Money)
which gives:
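For the plan to add two more sets of bars, one common pattern with plt.bar is to offset each series by a fraction of a year on a numeric year axis. The `Savings` and `Debt` columns below are hypothetical, added only to illustrate the layout. Note that bar widths are in x-axis data units, which is why the default width of 0.8 looked razor-thin on the date axis in the question (0.8 of a day) but fills most of a slot when x is the integer year.

```python
import matplotlib.pyplot as plt
import pandas as pd

jp = pd.DataFrame(
    {
        "Money": [1000, 2000, 2500, 3000, 3250, 3750, 4500],
        # Hypothetical extra series, not from the question.
        "Savings": [200, 400, 450, 700, 800, 900, 1200],
        "Debt": [900, 800, 850, 600, 500, 300, 100],
    },
    index=pd.to_datetime(
        ["2011", "2012", "2013", "2014", "2015", "2016", "2017"], format="%Y"
    ),
)

years = jp.index.year.to_numpy()  # plain integers: 2011, 2012, ...
width = 0.25                      # in data units, i.e. a quarter of a year

fig, ax = plt.subplots(figsize=(15, 5))
# Shift each series left/right so the three bars sit side by side per year.
for offset, column in zip((-width, 0.0, width), jp.columns):
    ax.bar(years + offset, jp[column], width=width, label=column)

ax.set_xticks(years)  # one tick per year, centred on each group
ax.legend()
```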

How to Groupby and plot it

I have the following dataframe (with different campaigns)
When I use groupby and try to plot, I get several graphs
df.groupby("Campaign").plot(y=["Visits"], x = "Week")
I would like a single graph showing the weekly visits for every campaign. Also, because the graphs show up separately, I cannot tell which one belongs to which campaign.
I would appreciate any tips regarding this.
You could do this:
df.set_index(['Week','Campaign'])['Visits'].unstack().plot(title='Visits by Campaign')
If a Week/Campaign pair occurs more than once, aggregate the rows first, with sum to total the visits or mean to average them:
df.groupby(['Week','Campaign'])['Visits'].sum().unstack().plot(title='Visits by Campaign')
Output:
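The reshaping step is what turns several per-campaign plots into one chart: unstack moves Campaign from the index into the columns, and DataFrame.plot then draws one labelled line per column. A small sketch with made-up numbers:

```python
import pandas as pd

# Made-up data using the question's column names.
df = pd.DataFrame({
    "Week": [1, 1, 1, 2, 2],
    "Campaign": ["A", "A", "B", "A", "B"],
    "Visits": [10, 5, 7, 12, 9],
})

# Sum duplicate (Week, Campaign) rows, then pivot Campaign into columns.
wide = df.groupby(["Week", "Campaign"])["Visits"].sum().unstack()
print(wide)

# wide.plot(title="Visits by Campaign") would then draw one line per column.
```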
Another possible solution would be to use seaborn
import seaborn as sns
ax = sns.lineplot(x="Week",
y="Visits",
hue="Campaign",
estimator=None,
lw=1,
data=df)
The documentation is here
