Show scatter plot title from column value

I'm building automatic scatter plots with regression lines from a dataframe like my example.
I want to plot the correlation of Column2 against Column3, and of Column2 against Column4, as separate scatter plots grouped by Column1. For example, there would be 3 scatter plots of Column2 against Column3, titled A, B, and C.
For the plotting I'm using the pandas scatter plot, for example:
df.groupby('Column1').plot.scatter(x='Column2',y='Column3')
The scatter plot call returns exactly 3 plots, but I want to know how to add a chart title based on the Column1 group and how to add a regression line. I haven't used seaborn or matplotlib yet because they're quite confusing to me, so I'd appreciate it if you could explain in more detail :))
EDIT 1
Sorry for not being clear before. My code was already running fine, but with output like this.
It's OK, but it's hard to see which plot belongs to which group. Hence my intention is to create something like this.
EDIT 2
Shoutout to everyone who kindly helped. Caina especially helped me a lot.
This is the code he wrote, based on several additions from the comments.
fig, axes = plt.subplots(1, df.Column1.nunique(), figsize=(12,8))
groups = df.groupby('Column1')
fig.tight_layout(pad=3)
# If `fig.tight_layout(pad=3)` does not work:
# plt.subplots_adjust(wspace=0.5)
for i, (gname, gdata) in enumerate(groups):
    sns.regplot(x='Column2', y='Column3', data=gdata, ax=axes[i])
    axes[i].set_title(gname)
    axes[i].set_ylim(0,)
And this is my result plot.
The title and the axis work beautifully. This thread can be considered closed, as I got the help I needed for the plot.
But as you can see, the bottom plot looks odd: its x axis runs from 0 to 600 and its y axis from 0 to 25, although all of the plots should have the same y-axis format. The other charts are also stretched horizontally.
I tried the method here, but didn't have much success with the square or equal parameter.
Can I ask for further help on how to adjust the axes so that each plot is square? Thank you!

You can iterate over your groups, specifying the title:
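import matplotlib.pyplot as plt
import seaborn as sns
fig, axes = plt.subplots(1, df.Column1.nunique(), figsize=(12,8))
groups = df.groupby('Column1')
for i, (gname, gdata) in enumerate(groups):
    # regplot returns the axes it drew on, so the title can be set right away
    sns.regplot(x='Column2', y='Column3', data=gdata, ax=axes[i]).set_title(gname)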
You can also use seaborn.FacetGrid directly:
g = sns.FacetGrid(df, col='Column1')
g.map(sns.regplot, 'Column2', 'Column3')
Edit (further customization based on new requirements in comments):
fig, axes = plt.subplots(1, df.Column1.nunique(), figsize=(12,8))
groups = df.groupby('Column1')
fig.tight_layout(pad=3)
# If `fig.tight_layout(pad=3)` does not work:
# plt.subplots_adjust(wspace=0.5)
for i, (gname, gdata) in enumerate(groups):
    sns.regplot(x='Column2', y='Column3', data=gdata, ax=axes[i])
    axes[i].set_title(gname)
    axes[i].set_ylim(0,)
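For the follow-up about square panels with a common y axis, one possible approach is `sharey=True` plus `Axes.set_box_aspect(1)`. This is a sketch under assumptions: it needs Matplotlib 3.3 or newer for `set_box_aspect`, the data is made up, and it swaps seaborn's `regplot` for a plain `np.polyfit` straight line so that it only depends on matplotlib, numpy, and pandas.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up sample data using the question's column names
df = pd.DataFrame({
    "Column1": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "Column2": [1, 2, 3, 4, 20, 40, 60, 80, 100, 300, 500, 600],
    "Column3": [2, 4, 5, 8, 10, 13, 12, 15, 9, 17, 14, 22],
})

groups = df.groupby("Column1")
# sharey=True gives every panel the same y-axis limits
fig, axes = plt.subplots(1, groups.ngroups, figsize=(12, 4), sharey=True, squeeze=False)

for ax, (gname, gdata) in zip(axes.flat, groups):
    ax.scatter(gdata["Column2"], gdata["Column3"])
    # Least-squares straight line through this group's points
    slope, intercept = np.polyfit(gdata["Column2"], gdata["Column3"], deg=1)
    xs = np.linspace(gdata["Column2"].min(), gdata["Column2"].max(), 50)
    ax.plot(xs, slope * xs + intercept, color="red")
    ax.set_title(gname)
    ax.set_box_aspect(1)  # square plotting area regardless of the data ranges

# Set the lower limit once, after all data is drawn; with sharey=True it applies to every panel
axes.flat[0].set_ylim(bottom=0)
fig.tight_layout()
```

The y limit is set after the loop on purpose: `set_ylim` switches off autoscaling, so calling it inside the loop on shared axes could clip the groups that are drawn later.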

You can also use seaborn's built-in functionality. If you want to see the correlations, you can just call df.corr().
import seaborn as sns
sns.heatmap(df.corr(numeric_only=True), annot=True)  # heatmap of the correlations; numeric_only (pandas 1.5+) skips the text column Column1
You can also use pairplot.
sns.set_theme(style='whitegrid', font_scale=1.3, rc={'figure.figsize': (13.7, 10.27)})  # one call, because each sns.set() call resets the style to its default
sns.set_palette("cubehelix",8)
sns.pairplot(df)
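Note that df.corr() gives one correlation matrix for the whole dataframe, while the question asks for the relationship within each Column1 group. A possible way to get the per-group numbers is to combine groupby with corr. This is only a sketch with made-up data; the column names follow the question.

```python
import pandas as pd

# Made-up sample data using the question's column names
df = pd.DataFrame({
    "Column1": ["A"] * 4 + ["B"] * 4,
    "Column2": [1, 2, 3, 4, 1, 2, 3, 4],
    "Column3": [2, 4, 6, 8, 8, 6, 4, 2],
    "Column4": [1, 3, 2, 5, 2, 2, 3, 1],
})

# Pearson correlation of Column2 with Column3 and Column4, computed separately per group
per_group = (
    df.groupby("Column1")[["Column2", "Column3", "Column4"]]
    .corr()                      # one correlation matrix per group, stacked in a MultiIndex
    .xs("Column2", level=1)      # keep only the Column2 row of each matrix
    [["Column3", "Column4"]]
)
print(per_group)
```

In this sample Column3 rises with Column2 in group A and falls in group B, so the two groups get correlations of opposite sign, which a single df.corr() over all rows would hide.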

Here's what I did. You can try something like this.
fig, axes = plt.subplots(ncols=3, figsize=(12, 6))
plt.subplots_adjust(wspace=0.5, hspace=0.3)
# Draw each Column1 group on its own axes and use the group key as the title
for ax, (name, group) in zip(axes, df.groupby('Column1')):
    group.plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=ax)
    ax.set_title(name)
plt.show()
The output of this is:
If you want to stack the plots vertically, change ncols to nrows and adjust figsize.
fig, axes = plt.subplots(nrows=3, figsize=(3, 12))
plt.subplots_adjust(wspace=0.2, hspace=0.5)
# Draw each Column1 group on its own axes and use the group key as the title
for ax, (name, group) in zip(axes, df.groupby('Column1')):
    group.plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=ax)
    ax.set_title(name)
plt.show()
This will give you:

Related

How to affect a list of colors to histogram index bar in matplotlib?

I have the following dataframe "freqs2" with an index (SD to SD17) and associated values (frequencies):
freqs
SD 101
SD2 128
...
SD17 65
I would like to assign a specific list of colors (in order) to the index entries, one color per bar. I've tried the following code:
colors=['#e5243b','#DDA63A', '#4C9F38','#C5192D','#FF3A21','#26BDE2','#FCC30B','#A21942','#FD6925','#DD1367','#FD9D24','#BF8B2E','#3F7E44','#0A97D9','#56C02B','#00689D','#19486A']
freqs2.plot.bar(freqs2.index, legend=False,rot=45,width=0.85, figsize=(12, 6),fontsize=(14),color=colors )
plt.ylabel('Frequency',fontsize=(17))
As a result, all the bars in my chart are red (the first color in the list).
Based on similar questions, I tried passing "freqs2.index" to indicate that the list of colors applies to the index, but the problem stays the same.

It looks like a bug in pandas. Plotting directly with matplotlib, or using seaborn (which I recommend), works:
import matplotlib.pyplot as plt
import seaborn as sns
colors=['#e5243b','#dda63a', '#4C9F38','#C5192D','#FF3A21','#26BDE2','#FCC30B','#A21942','#FD6925','#DD1367','#FD9D24','#BF8B2E','#3F7E44','#0A97D9','#56C02B','#00689D','#19486A']
# Plotting directly with matplotlib works too:
# fig = plt.figure()
# ax = fig.add_axes([0, 0, 1, 1])
# ax.bar(x=freqs2.index, height=freqs2['freqs'], color=colors)
ax = sns.barplot(data=freqs2, x=freqs2.index, y='freqs', palette=colors)
ax.tick_params(axis='x', labelrotation=45)
plt.ylabel('Frequency',fontsize=17)
plt.show()
Edit: an issue already exists on GitHub.
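To make the matplotlib route concrete, here is a small self-contained sketch. The three-row frame is a hypothetical stand-in for the 17-row one in the question (only the rows shown there are used), and the colors are read back from the drawn bars to check the mapping.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the figure with plt.show()
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
import pandas as pd

# Hypothetical three-row stand-in for the question's "freqs2" dataframe
freqs2 = pd.DataFrame({"freqs": [101, 128, 65]}, index=["SD", "SD2", "SD17"])
colors = ["#e5243b", "#dda63a", "#4c9f38"]

fig, ax = plt.subplots(figsize=(12, 6))
# Passing a list to `color` gives one color per bar, in index order
bars = ax.bar(freqs2.index, freqs2["freqs"], color=colors, width=0.85)
ax.tick_params(axis="x", labelrotation=45, labelsize=14)
ax.set_ylabel("Frequency", fontsize=17)

# Read the colors back from the drawn bars to confirm the mapping
applied = [to_hex(bar.get_facecolor()) for bar in bars]
print(applied)
```

Calling `ax.bar` directly hands the color list straight to matplotlib, which assigns it bar by bar, so the pandas behavior described in the question does not come into play.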

How to plot a graph from 2 columns in a dataframe using plotnine?

I am trying to understand how this works.
My dataframe is "result".
Its columns are Time, Column1, and Column2.
With the code below I am able to plot only the points from Column1:
(p9.ggplot(data=result,
mapping=p9.aes(x='Time', y='Column1'))
+ p9.geom_point(alpha=0.5)
+ p9.scale_x_datetime(breaks=date_breaks('12 hour'))
)
How should I write the code if I want it to include Column2 as well, that is, to plot 2 lines in the chart?

Plotnine's current API (as of 2022-08-09) doesn't include a secondary axis, so this is not possible. It is a feature that has been requested several times, though, and it looks like it is on the roadmap for the next major release.
In any case, if you want to keep the ggplot2 style, you will have to move to R and use the sec.axis argument of scale_y_continuous (you can see several examples here).
If you want to stay in Python, you can go directly to matplotlib, where secondary axes are supported through the .twinx() method. A working example for your case would be:
import matplotlib.pyplot as plt
ax = result.plot(x='Time', y='Column1', color='blue', legend=False)
sec_axis = ax.twinx()  # second y axis sharing the same x axis
result.plot(x='Time', y='Column2', ax=sec_axis, color='red', legend=False)
ax.figure.legend()
plt.show()

Why not:
(p9.ggplot(data=result.melt(id_vars = 'Time'),
mapping=p9.aes(x='Time', y='value', color='variable', group = 'variable'))
+ p9.geom_point(alpha=0.5)
+ p9.scale_x_datetime(breaks=date_breaks('12 hour'))
)
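The key step in this answer is `melt`, which reshapes the frame from wide to long so that a single `y='value'` aesthetic covers both columns and `color='variable'` tells them apart. A quick sketch of what the reshaped frame looks like, using made-up values and the question's column names:

```python
import pandas as pd

# Made-up sample using the question's column names
result = pd.DataFrame({
    "Time": pd.to_datetime(["2022-01-01 00:00", "2022-01-01 12:00", "2022-01-02 00:00"]),
    "Column1": [1.0, 2.0, 3.0],
    "Column2": [10.0, 20.0, 30.0],
})

# Wide -> long: one row per (Time, variable) pair
long = result.melt(id_vars="Time")

print(long.columns.tolist())              # ['Time', 'variable', 'value']
print(long["variable"].unique().tolist()) # ['Column1', 'Column2']
print(len(long))                          # 6
```

Each original row becomes two rows, one per measured column, which is the layout ggplot-style libraries generally expect for grouped series.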

creating plot matrix with relplot in seaborn

I am trying to add multiple plots and create a matrix plot with seaborn. Unfortunately, Python gives me the following warning:
"relplot is a figure-level function and does not accept target axes. You may wish to try scatterplot"
fig, axes = plt.subplots(nrows=5,ncols=5,figsize=(20,20),sharex=True, sharey=True)
for i in range(5):
    for j in range(5):
        axes[i][j] = seaborn.relplot(x=col[i+2], y=col[j+2], data=df, ax=axes[i][j])
I would like to know if there's any method with which I can combine all the plots plotted with relplot.

Hi Kinto, welcome to StackOverflow!
relplot works differently from, for example, scatterplot. With relplot you don't need to define subplots and loop over them. Instead, you specify what you would like to vary across the rows or columns of the grid.
Here is an example from the documentation:
import seaborn as sns
sns.set(style="ticks")
tips = sns.load_dataset("tips")
g = sns.relplot(
x="total_bill", y="tip", hue="day",
col="time", row="sex", data=tips
)
This says: on each subplot, plot the total bill on the x-axis and the tip on the y-axis, and vary the hue within a subplot by day. Then, for each column, plot the data for one unique value of the "time" column of the tips dataset. In this case there are two unique times: "Lunch" and "Dinner". Finally, vary "sex" across the subplot rows. There are two values of "sex", "Male" and "Female", so one row shows the male tipping behavior and the other the female tipping behavior.
I'm not sure what your data looks like, but hopefully this explanation helps you.
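If you do want to keep your own grid of subplots, the warning itself points at the alternative: `scatterplot` is an axes-level function and accepts `ax`. A minimal sketch, assuming made-up numeric data and a hypothetical `cols` list standing in for `col[2:]` from the question:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up numeric data; `cols` plays the role of col[2:] in the question
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(30, 3)), columns=["a", "b", "c"])
cols = list(df.columns)

n = len(cols)
fig, axes = plt.subplots(nrows=n, ncols=n, figsize=(3 * n, 3 * n), squeeze=False)
for i in range(n):
    for j in range(n):
        # scatterplot is axes-level, so it can draw onto an existing subplot
        sns.scatterplot(x=cols[i], y=cols[j], data=df, ax=axes[i][j])
fig.tight_layout()
```

For a plain all-against-all matrix like this, `sns.pairplot(df)` may also be worth a look, since it builds the grid for you.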

How to use a 3rd dataframe column as x axis ticks/labels in matplotlib scatter

I'm struggling to wrap my head around matplotlib with dataframes today. I see lots of solutions but I'm struggling to relate them to my needs. I think I may need to start over. Let's see what you think.
I have a dataframe (ephem) with 4 columns - Time, Date, Altitude & Azimuth.
I produce a scatter for alt & az using:
chart = plt.scatter(ephem.Azimuth, ephem.Altitude, marker='x', color='black', s=8)
What's the most efficient way to set the values in the Time column as the labels/ticks on the x axis?
So:
the scale/gridlines etc all remain the same
the chart still plots alt and az
the y axis ticks/labels remain as is
only the x axis ticks/labels are changed to the Time column.
Thanks

This isn't by any means the cleanest piece of code, but the following works for me:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(ephem.Azimuth, ephem.Altitude, marker='x', color='black', s=8)
labels = list(ephem.Time)
ax.set_xticklabels(labels)
plt.show()
This explicitly sets the x tick labels to the values in the dataframe's Time column.

In other words, you want to change the x-axis tick labels using a list of values.
labels = ephem.Time.tolist()
# make your plot and before calling plt.show()
# insert the following two lines
ax = plt.gca()
ax.set_xticklabels(labels = labels)
plt.show()
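One caveat with calling `set_xticklabels` on its own: the labels are attached to whatever tick positions matplotlib already chose, which need not line up with the plotted points. A possible variation is to place the ticks at the data positions first with `set_xticks`. The sketch below uses made-up ephemeris values; only the column names come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Made-up ephemeris rows using the question's column names
ephem = pd.DataFrame({
    "Time": ["20:00", "21:00", "22:00", "23:00"],
    "Azimuth": [95.0, 110.0, 127.0, 146.0],
    "Altitude": [12.0, 24.0, 35.0, 44.0],
})

fig, ax = plt.subplots()
ax.scatter(ephem["Azimuth"], ephem["Altitude"], marker="x", color="black", s=8)

# Put one tick at each plotted azimuth, then label it with the matching time
ax.set_xticks(ephem["Azimuth"].tolist())
ax.set_xticklabels(ephem["Time"].tolist(), rotation=45)

labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

This moves the ticks (and any x gridlines) to the data points, so it trades the original even spacing for labels that sit directly under the observation they describe. With many rows you would probably label only every n-th point.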

Question on Matplotlib bar plot with a pandas dataframe?

I want to create a bar chart with SALES_COV_TYPE as X and ID_ev as Y. I would like the two bars to have different colors. I also want to rename the X axis values and to rename the label in the legend to something else, as can be seen in the image link: change ID_ev to 'Visits'.
This is my attempt at it and I'm not satisfied with it.
data = {'SALES_COV_TYPE':[84.0,88.0], 'ID_ev':[2360869,882791]}
df = pd.DataFrame(data)
df.plot(kind='bar',y='ID_ev',x='SALES_COV_TYPE')
ax=plt.gca()
my_xticks = ['CAM','GAM']
ax.set_xticklabels(my_xticks)
ax.set_xlabel('Sales Coverage Type')
ax.set_ylabel('Number of Customer Visits')
Plot
I want to create the bar chart using the fig, ax approach so that it becomes a template when I create subplots in the future, but I haven't been able to get it working. I would prefer not to use the pandas wrapper for matplotlib if possible. What do you all suggest?
Thanks!

Do you mean something like this:
import matplotlib.pyplot as plt
import pandas as pd
data = {'SALES_COV_TYPE':[84.0,88.0], 'ID_ev':[2360869,882791], 'SALES_COV_CLASS':['CAM','GAM']}
df = pd.DataFrame(data)
colors = ['green', 'red']
fig, ax = plt.subplots(1, 1, figsize=(4,3))
ax.bar(df['SALES_COV_CLASS'], df['ID_ev'], color = colors)
ax.set_title('Title')
ax.set_xlabel('Sales Coverage Type')
ax.set_ylabel('Number of Customer Visits')
plt.show()
I find it easier to add a column with the name of the group; that way you don't need to relabel the x axis to remove the empty columns.
