I'm an R programmer learning Python, and I'm finding plotting in Python much more difficult than in R.
I'm trying to write the following function but haven't been successful. Could anyone help?
import pandas as pd
#example data
df1 = pd.DataFrame({
'PC1':[-2.2,-2.0,2.04,0.97],
'PC2':[0.5,-0.6,0.9,-0.5],
'PC3':[-0.1,-0.2,0.2,0.8],
'f1':['a','a','b','b'],
'f2':['x','y','x','y'],
'f3':['k','g','g','k']
})
def drawPCA(df,**kwargs):
    """Produce a 1x3 grid of scatterplot subplots; each subplot shows two PCs with
    no legend, e.g. subplot 1 is PC1 vs PC2. The legend is at the upper middle of
    the figure.

    Parameters
    ----------
    df: Pandas DataFrame
        The first 3 columns are the PCs, followed by sample characteristics.
    kwargs
        To specify hue, style, size, etc. if the plotting uses seaborn.scatterplot;
        or c, s, etc. if using pyplot scatter

    Example
    ----------
    drawPCA(df1, hue="f1")
    drawPCA(df1, c="f1", s="f2") #if plotting uses plt.scatter
    drawPCA(df1, hue="f1", size="f2",style="f3")
    or more variables passable to the actual plotting function
    """
This is what I came up with. Just two questions:
Is there a parameter to make the legend horizontal, instead of using ncol?
How can I prevent the figure from being displayed when running the function like this?
fig,ax=drawPCA(df1,hue="f1",style="f2",size="f3")
#may do more changing on the figure.
Here is the function:
def drawPCA2(df,**kwargs):
    import matplotlib.pyplot as plt
    import seaborn as sns
    from matplotlib.figure import figaspect
    nUniVals=sum([df[i].unique().size for i in kwargs.values()])
    nKeys=len(kwargs.keys())
    w, h = figaspect(1/3)
    fig1, axs = plt.subplots(ncols=3,figsize=(w,h))
    fig1.suptitle("All the PCs")
    sns.scatterplot(x="PC1",y="PC2",data=df,legend=False,ax=axs[0],**kwargs)
    sns.scatterplot(x="PC1",y="PC3",data=df,legend=False,ax=axs[1],**kwargs)
    sns.scatterplot(x="PC2",y="PC3",data=df,ax=axs[2],label="",**kwargs)
    handles, labels = axs[2].get_legend_handles_labels()
    fig1.legend(handles, labels, loc='lower center',bbox_to_anchor=(0.5, 0.85), ncol=nUniVals+nKeys)
    axs[2].get_legend().remove()
    fig1.tight_layout(rect=[0, 0.03, 1, 0.9])
    return fig1,axs
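On the two questions, here is a minimal sketch rather than a definitive answer. As far as I know, the legend has no dedicated "horizontal" switch, so setting ncol to the number of entries is the usual way to get a single row. To keep a notebook from drawing the figure automatically, one option is to close it with plt.close(fig) before returning; the Figure object stays usable afterwards. The helper name draw_pcs_closed and its group argument are hypothetical, and plain matplotlib scatter is used instead of seaborn to keep the example self-contained.

```python
import matplotlib.pyplot as plt
import pandas as pd

df1 = pd.DataFrame({
    'PC1': [-2.2, -2.0, 2.04, 0.97],
    'PC2': [0.5, -0.6, 0.9, -0.5],
    'PC3': [-0.1, -0.2, 0.2, 0.8],
    'f1': ['a', 'a', 'b', 'b'],
})

def draw_pcs_closed(df, group):
    """Hypothetical helper: 1x3 PC scatterplots, one-row legend, figure closed."""
    fig, axs = plt.subplots(ncols=3, figsize=(12, 4))
    pairs = [("PC1", "PC2"), ("PC1", "PC3"), ("PC2", "PC3")]
    for ax, (x, y) in zip(axs, pairs):
        # One scatter call per group so that each group gets a legend entry.
        for name, sub in df.groupby(group):
            ax.scatter(sub[x], sub[y], label=name)
        ax.set_xlabel(x)
        ax.set_ylabel(y)
    handles, labels = axs[-1].get_legend_handles_labels()
    # One column per entry puts the whole legend on a single row.
    fig.legend(handles, labels, loc='upper center', ncol=len(labels))
    fig.tight_layout(rect=[0, 0, 1, 0.9])
    # Deregister the figure from pyplot so it is not drawn automatically.
    plt.close(fig)
    return fig, axs

fig, axs = draw_pcs_closed(df1, "f1")
axs[0].set_title("PC1 vs PC2")  # the closed figure can still be modified
```

In a notebook, evaluating fig in a later cell (or calling fig.savefig) should then display or save the finished figure.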
I tried to create a graph side by side using matplotlib.
I don't get any errors when I run my code; instead, I just get a blank window from Matplotlib.
Here's the link I used for my CSV.
https://ca.finance.yahoo.com/quote/%5EGSPTSE/history?p=%5EGSPTSE
Previously, I also created a graph that overlaid the two lines (which works as intended), but they are not displayed as separate graphs, which is what I am trying to do with my current code.
I tried following this video to learn how to create these graphs, but I can't replicate the graph shown in the video even when I copy the code.
https://www.youtube.com/watch?v=-2AMr95nUDw
from matplotlib import pyplot as mpl
import pandas as pd
data_better = pd.read_csv('What.csv')
# print(data_better.head()) #I used this part to find out what the headers were for x values
# print(data_better.columns[::])
mpl.axes([15000, 17000, 20000, 23000])
mpl.title("Open Values")
mpl.plot(data_better["Date"], data_better["Open"])
mpl.ylabel("Money")
mpl.axes([15000, 17000, 20000, 23000])
mpl.title("Close Values")
mpl.plot(data_better["Date"], data_better["Close"])
mpl.ylabel("Money")
mpl.show()
pyplot.axes accepts a 4-tuple of floats, (left, bottom, width, height), in normalized (0, 1) figure units to place the axes. You can look at the example "Make Room For Ylabel Using Axesgrid" to learn how to use it.
If you want to draw two plots in one figure, you need to use different axes:
from matplotlib import pyplot as plt
import pandas as pd
data_better = pd.read_csv('What.csv')
figure, (axes1, axes2) = plt.subplots(nrows=1, ncols=2)
axes1.set_title("Open Values")
axes1.plot(data_better["Date"], data_better["Open"])
axes1.set_ylabel("Money")
axes2.set_title("Close Values")
axes2.plot(data_better["Date"], data_better["Close"])
axes2.set_ylabel("Money")
plt.show()
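For comparison, this is roughly what the original approach looks like when pyplot.axes is given normalized rectangles instead of data values. It is only a sketch: the numbers below are hypothetical stand-ins for the columns read from What.csv, and the rectangle positions are arbitrary choices.

```python
import matplotlib.pyplot as plt

# Hypothetical stand-in values; the original reads them from 'What.csv'.
days = [1, 2, 3, 4]
open_values = [15000, 15400, 15200, 15900]
close_values = [15300, 15100, 15800, 16000]

fig = plt.figure(figsize=(8, 3))

# rect = (left, bottom, width, height), all as fractions of the figure size
left_ax = plt.axes((0.10, 0.15, 0.35, 0.70))
left_ax.set_title("Open Values")
left_ax.plot(days, open_values)
left_ax.set_ylabel("Money")

right_ax = plt.axes((0.60, 0.15, 0.35, 0.70))
right_ax.set_title("Close Values")
right_ax.plot(days, close_values)
right_ax.set_ylabel("Money")
# Call plt.show() to display the figure.
```

plt.subplots computes these rectangles for you, which is why it is usually the more convenient choice.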
Is there a way to adjust the axes limits of pairplot(), but not as individual plots? Maybe a setting to produce better axes limits?
I would like the plots to have a bigger range on the axes. The axes of my plots allow all the data to be seen, but they are too 'zoomed in'.
My code is:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
g = sns.pairplot(iris, hue = 'species', diag_kind = 'hist', palette = 'Dark2', plot_kws={"s": 20})
The link to my plot, and to what I would like the plot to look like, is here:
pairplot
To change the subplots, g.map(func, <parameters>) can be used. A small problem is that func needs to accept color as a parameter, and plt.margins() raises an error when color is passed. Moreover, map uses x and y to indicate the row and column variables. You could write a dummy function that simply calls plt.margins(), for example g.map(lambda *args, **kwargs: plt.margins(x=0.2, y=0.3)).
An alternative is to loop through g.axes.flat and call ax.margins() on each of them. Note that many axes are shared in x and/or y direction. The diagonal is treated differently; for some reason ax.margins needs to be called a second time on the diagonal.
To have the histogram for the different colors stacked instead of overlapping, diag_kws={"multiple": "stack"} can be set.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
iris = sns.load_dataset('iris')
g = sns.pairplot(iris, hue='species', diag_kind='hist', palette='Dark2',
                 plot_kws={"s": 20}, diag_kws={"multiple": "stack"})
# g.map(plt.margins, x=0.2, y=0.2) # gives an error
for ax in g.axes.flat:
    ax.margins(x=0.2, y=0.2)
for ax in g.diag_axes:
    ax.margins(y=0.2)
plt.show()
PS: yet another option is to change the rcParams, which affects all plots created later in the code:
import matplotlib as mpl
mpl.rcParams['axes.xmargin'] = 0.2
mpl.rcParams['axes.ymargin'] = 0.2
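If the global change is too broad, the same settings can be scoped with matplotlib's rc_context. The following is a small sketch using plain line plots rather than the pairplot; the variable names are only illustrative.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

# Axes created inside the block pick up the wider margins.
with mpl.rc_context({'axes.xmargin': 0.2, 'axes.ymargin': 0.2}):
    fig_padded, ax_padded = plt.subplots()
    ax_padded.plot([0, 10], [0, 5])

# Axes created afterwards use the regular defaults again.
fig_default, ax_default = plt.subplots()
ax_default.plot([0, 10], [0, 5])

print(ax_padded.get_xlim())   # limits padded by 20% of the data range
print(ax_default.get_xlim())  # limits with the default padding
```

The margins are stored on each axes when it is created, so the padded axes should keep its wider limits after the with block ends.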
Is there an easy way to align two subplots of a time series of different kinds (plot and barplot) in matplotlib? I use the pandas wrapper since I am dealing with pd.Series objects:
import pandas as pd
import matplotlib.pyplot as plt
series = pd._testing.makeTimeSeries()
fig, axes = plt.subplots(2, 1)
series.head(3).plot(marker='o', ax=axes[0])
series.head(3).plot.bar(ax=axes[1])
plt.tight_layout()
The result is not visually great. I would like to keep the code simple and:
Vertically align the data points in the top plot with the bars in the bottom plot
Share the x-axis of the bar plot with the top plot and hide the x-axis labels of the top plot altogether (but keep grids whenever present)
Based on the ideas thrown in the comments, I think that this is the simplest solution (giving up the pandas API), which is exactly what I needed:
import pandas as pd
import matplotlib.pyplot as plt
series = pd._testing.makeTimeSeries()
fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(series.head(3), marker='o')
axes[1].bar(series.head(3).index, series.head(3))
plt.tight_layout()
A fix to the xticks may be needed for cases with missing values, where the xticks are not placed on every day (e.g. plt.xticks(series.head(3).index)).
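Here is a sketch of the same idea including the xticks fix. Since pd._testing.makeTimeSeries() is a private helper that may not be available in every pandas version, a small hand-made series is used instead; its dates and values are hypothetical and include a weekend gap on purpose.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for pd._testing.makeTimeSeries(): three business days
# with a gap over the weekend (Friday, Monday, Tuesday).
index = pd.to_datetime(["2021-01-01", "2021-01-04", "2021-01-05"])
series = pd.Series([0.5, -1.2, 0.8], index=index)

fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(series.index, series.values, marker='o')
axes[0].grid(True)
axes[1].bar(series.index, series.values)
# Put one tick on each observed date instead of relying on the automatic ticks.
axes[1].set_xticks(series.index)
fig.autofmt_xdate()
fig.tight_layout()
# Call plt.show() to display the figure.
```

Because the x-axis is shared, the ticks set on the bottom axes also apply to the top one, and each marker sits above its bar.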
Thanks for the help!
I plot boxplots using sns.boxplot and pandas.DataFrame.boxplot in Python 3.x.
Is it possible to adjust the spacing between the boxes in a boxplot, so that the box of Group_b is farther to the right of the box of Group_a than in the output figures? Thanks.
Code:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
dict_a = {'value':[1,2,3,7,8,9],'name':['Group_a']*3+['Group_b']*3}
dataframe = pd.DataFrame(dict_a)
sns.boxplot( y="value" , x="name" , data=dataframe )
Output figure:
dataframe.boxplot("value" ,by = "name" )
Output figure 2:
The distance between the two boxes is determined by the x-axis limits. The boxes sit a constant distance apart in data units; what makes them appear more or less far apart is the ratio of this distance to the overall data range shown on the axis.
For example, in the seaborn case, the first box sits at x=0, the second at x=1. The difference is 1 unit. The maximal distance between the two boxplots is hence achieved by setting the x axis limits to those exact limits,
ax.set_xlim(0, 1)
Of course this will cut half of each box.
So a more useful value would be ax.set_xlim(0-val, 1+val) with val being somewhere in the range of the width of the boxes.
One needs to mention that pandas uses different units. The first box is at x=1, the second at x=2. Hence one would need something like ax.set_xlim(1-val, 2+val).
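As a minimal static sketch of the pandas case (the value 0.3 for val is an arbitrary assumption, chosen to be roughly the width of a box):

```python
import pandas as pd
import matplotlib.pyplot as plt

dict_a = {'value': [1, 2, 3, 7, 8, 9], 'name': ['Group_a']*3 + ['Group_b']*3}
dataframe = pd.DataFrame(dict_a)

val = 0.3  # hypothetical padding, roughly the width of a box

fig, ax = plt.subplots()
dataframe.boxplot("value", by="name", ax=ax)  # pandas puts the boxes at x=1 and x=2
# A smaller val pushes the boxes towards the edges, i.e. farther apart.
ax.set_xlim(1 - val, 2 + val)
# Call plt.show() to display the figure.
```

For the seaborn plot the equivalent call would be ax.set_xlim(0 - val, 1 + val), since its boxes sit at x=0 and x=1.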
The following would add a slider to the plot to see the effect of different values.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
dict_a = {'value':[1,2,3,7,8,9],'name':['Group_a']*3+['Group_b']*3}
dataframe = pd.DataFrame(dict_a)
fig, (ax, ax2, ax3) = plt.subplots(nrows=3,
                                   gridspec_kw=dict(height_ratios=[4,4,1], hspace=1))
sns.boxplot( y="value" , x="name" , data=dataframe, width=0.1, ax=ax)
dataframe.boxplot("value", by = "name", ax=ax2)
from matplotlib.widgets import Slider
slider = Slider(ax3, "", valmin=0, valmax=3)
def update(val):
    ax.set_xlim(-val, 1+val)
    ax2.set_xlim(1-val, 2+val)
slider.on_changed(update)
plt.show()
In the following code snippet:
import numpy as np
import pandas as pd
import pandas.rpy.common as com
import matplotlib.pyplot as plt
mtcars = com.load_data("mtcars")
df = mtcars.groupby(["cyl"]).apply(lambda x: pd.Series([x["cyl"].count(), np.mean(x["wt"])], index=["n", "wt"])).reset_index()
plt.plot(df["n"], range(len(df["cyl"])), "o")
plt.yticks(range(len(df["cyl"])), df["cyl"])
plt.show()
This code outputs the dot plot, but the result looks quite awful: neither the xticks nor the yticks have enough space, so it is hard to see the values for 4 and 8 of the cyl variable in the graph.
So how can I plot it with enough space from the start, much like you can without any hassle in R/ggplot2?
For your information, neither this code nor this one works in my case. Does anyone know the reason? And do I have to bother creating such subplots in the first place? Is it impossible to automatically adjust the ticks in response to the input values?
I can't quite tell what you're asking...
Are you asking why the ticks aren't automatically positioned or are you asking how to add "padding" around the inside edges of the plot?
If it's the former, it's because you've manually set the tick locations with yticks. This overrides the automatic tick locator.
If it's the latter, use ax.margins(some_percentage) (where some_percentage is between 0 and 1, e.g. 0.05 is 5%) to add "padding" to the data limits before they're autoscaled.
As an example of the latter, by default, the data limits can be autoscaled such that a point can lie on the boundaries of the plot. E.g.:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(range(10), 'ro')
plt.show()
If you want to avoid this, use ax.margins (or equivalently, plt.margins) to specify a percentage of padding to be added to the data limits before autoscaling takes place.
E.g.
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(range(10), 'ro')
ax.margins(0.04) # 4% padding, similar to R.
plt.show()
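Applied to a dot plot like the one in the question, this could look as follows. It is only a sketch: pandas.rpy is not used, and the cyl and n lists are illustrative stand-ins for the grouped mtcars frame.

```python
import matplotlib.pyplot as plt

# Stand-in for the grouped mtcars frame in the question (illustrative values).
cyl = [4, 6, 8]
n = [11, 7, 14]

fig, ax = plt.subplots()
positions = range(len(cyl))
ax.plot(n, positions, "o")
# Manual tick locations, as in the question; they do not fix the axis limits.
ax.set_yticks(list(positions))
ax.set_yticklabels(cyl)
ax.margins(0.1)  # 10% padding on both axes, so the outer points are not on the edge
# Call plt.show() to display the figure.
```

The manual yticks only control where the labels go; the padding around the points comes from ax.margins.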