Extracting data from a scatter plot in Matplotlib - python

I'm writing an interface to do scatter plots in Matplotlib, and I'd like to be able to access the data from a Python script.
Right now, my interface is doing:
scat = self.axes.scatter(x_data, y_data, label=label, s=size)
With a standard axes.plot I can do something like:
line = self.axes.plot(x_data, y_data)
data = line[0].get_data()
and that works. What I'd like is something similar, but with the scatter plot.
Can anyone suggest a similar method?

A scatter plot is drawn using PathCollection, so the x, y positions are called "offsets":
import numpy as np
import matplotlib.pyplot as plt
f, ax = plt.subplots()
scat = ax.scatter(np.random.randn(10), np.random.randn(10))
print(scat.get_offsets())
[[-0.17477838 -0.47777312]
[-0.97296068 -0.98685982]
[-0.18880346 1.16780445]
[-1.65280361 0.2182109 ]
[ 0.92655599 -1.40315507]
[-0.10468029 0.82269317]
[-0.09516654 -0.80651275]
[ 0.01400393 -1.1474178 ]
[ 1.6800925 0.16243422]
[-1.91496598 -2.12578586]]
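Positions are not the only thing the PathCollection returned by scatter can give back. Here is a small sketch with deterministic toy values (the arrays and the "demo" label are illustrative only). It pulls the x and y columns out of the offsets, reads the marker sizes and the label, and updates the points in place with set_offsets:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([0.0, 1.0, 2.0])
y = np.array([5.0, 3.0, 4.0])

fig, ax = plt.subplots()
scat = ax.scatter(x, y, s=[10, 20, 30], label="demo")

# (N, 2) array of x, y positions; np.array makes a plain, independent copy
offsets = np.array(scat.get_offsets())
x_back, y_back = offsets[:, 0], offsets[:, 1]

sizes = scat.get_sizes()   # the marker sizes passed via s=
label = scat.get_label()   # the label= argument

# The same collection can be moved in place (e.g. shift every point up by 1)
scat.set_offsets(np.column_stack([x, y + 1]))
```

This is the closest analogue to line.get_data(): the offsets hold the data coordinates, while sizes, colors and label live on the same collection object.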


Not able to plot box plot separately

I have a lot of features in my data and I want to make a box plot for each feature. So for that:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(25,20))
for data in train_df.columns:
    plt.subplot(7,4,i+1)
    plt.subplots_adjust(hspace = 0.5, wspace = 0.5)
    ax =sns.boxplot(train_df[data])
I did this, and in the output all the plots end up on one image. I want something like a separate plot for each feature (not skew graphs, but box plots).
What changes do I need to make?
In your code, I cannot see where the i is coming from and also it's not clear how ax was assigned.
Maybe try something like this. First, an example data frame:
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
train_df = pd.concat([pd.Series(np.random.normal(i,1,100)) for i in range(12)],axis=1)
Set up fig and a flattened ax for each subplot:
fig,ax = plt.subplots(4,3,figsize=(10,10))
ax = ax.flatten()
The most basic approach is to call sns.boxplot and pass each subplot's axes through its ax argument:
for i, data in enumerate(train_df.columns):
    sns.boxplot(x=train_df[data], ax=ax[i])
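The pattern above pairs every column with one Axes from a flattened grid. The sketch below is self-contained. It swaps in matplotlib's own Axes.boxplot for seaborn, sizes the grid from the number of columns, and hides the grid cells that are left over. The names boxplot_grid and example_df are hypothetical, chosen for illustration:

```python
import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def boxplot_grid(df, ncols=4):
    """Draw one box plot per column of df, each on its own subplot."""
    nrows = math.ceil(len(df.columns) / ncols)
    fig, axes = plt.subplots(nrows, ncols, figsize=(3 * ncols, 3 * nrows))
    axes = np.atleast_1d(axes).flatten()  # 1-D array of Axes, whatever the grid shape
    for ax, col in zip(axes, df.columns):
        ax.boxplot(df[col].dropna())      # one feature per Axes
        ax.set_title(str(col))
    for ax in axes[len(df.columns):]:
        ax.set_visible(False)             # hide the unused cells of the grid
    fig.tight_layout()
    return fig, axes


# Example data: 10 columns, so a 3x4 grid leaves two empty cells
example_df = pd.DataFrame(np.random.normal(size=(100, 10)),
                          columns=[f"feature_{i}" for i in range(10)])
fig, axes = boxplot_grid(example_df, ncols=4)
```

Call plt.show() afterwards to display the figure. The key point is the same as in the answer: every plotting call gets its own Axes instead of drawing on the current one.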

Drawing a surface 3D plot using "plotnine" library

Question: Using the Python library 'plotnine', can we draw an interactive 3D surface plot?
Background
What I'd like to do is create, in a Python environment, an interactive 3D plot using R-style plotting grammar, as we do with the ggplot2 library in R. This is because I have a hard time remembering the syntax of matplotlib and other libraries like seaborn.
An interactive 3D plot means a 3D plot that you can zoom in, zoom out, and scroll up and down, etc.
It seems like only JavaScript-backed plotting libraries such as bokeh or plotly can create interactive 3D plots. But I want to create it with the library 'plotnine' because it supports ggplot-like grammar, which is easy to remember.
For example, can I draw a 3D surface plot like the one below with the library 'plotnine'?
import plotly.graph_objects as go
import pandas as pd
# Read data from a csv
z_data = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/api_docs/mt_bruno_elevation.csv')
data = [
    go.Surface(
        z=z_data.values  # DataFrame.as_matrix() was removed in pandas 1.0
    )]
layout = go.Layout(
    title='Mt Bruno Elevation',
    autosize=False,
    width=500,
    height=500,
    margin=dict(
        l=65,
        r=50,
        b=65,
        t=90
    )
)
fig = go.Figure(data=data, layout=layout)
# plotly.plotly (py.iplot) moved to the separate chart_studio package in plotly 4;
# fig.show() renders the interactive figure locally instead
fig.show()
The code above makes a figure like the one below.
You can check out the complete interactive 3D surface plot at this link.
P.S. If I can draw an interactive 3D plot with ggplot-like grammar, it does not have to be the 'plotnine' library.
Thank you for taking the time to read this question!
It is possible if you are willing to extend plotnine a bit, though caveats apply. The final code is as simple as:
(
ggplot_3d(mt_bruno_long)
+ aes(x='x', y='y', z='height')
+ geom_polygon_3d(size=0.01)
+ theme_minimal()
)
And the result:
First, you need to transform your data into long format:
import numpy as np
import pandas as pd
z_data = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/api_docs/mt_bruno_elevation.csv', index_col=0)
z = z_data.values
nrows, ncols = z.shape
# x runs along the columns and y along the rows, so both grids share z's (nrows, ncols) shape
x, y = np.meshgrid(np.linspace(0, 1, ncols), np.linspace(0, 1, nrows))
mt_bruno_long = pd.DataFrame({'x': x.flatten(), 'y': y.flatten(), 'height': z.flatten()})
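The row/column bookkeeping is the part most likely to go wrong when the grid is not square. Here is a sketch of the same conversion on a tiny synthetic 2x3 grid, which makes the pairing easy to check. The to_long helper and the toy array are hypothetical and not part of the answer:

```python
import numpy as np
import pandas as pd


def to_long(z):
    """Convert a 2-D grid of heights (rows = y, columns = x) into long format."""
    nrows, ncols = z.shape
    # meshgrid's default "xy" indexing returns arrays of shape (len(y), len(x)) = z.shape
    x, y = np.meshgrid(np.linspace(0, 1, ncols), np.linspace(0, 1, nrows))
    return pd.DataFrame({"x": x.ravel(), "y": y.ravel(), "height": z.ravel()})


z = np.arange(6, dtype=float).reshape(2, 3)  # 2 rows, 3 columns
long_df = to_long(z)
print(long_df)
```

Because x, y and z all have the same shape and are flattened in the same (row-major) order, every height stays next to its own coordinates.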
Then, we need to create equivalents for ggplot and geom_polygon with awareness of the third dimension.
Since this answer was written, the code has become available in the plotnine3d package, so you could just:
from plotnine3d import ggplot_3d, geom_polygon_3d
But for completeness, this is how (relatively) simple it is:
import matplotlib.pyplot as plt
from plotnine import ggplot, geom_polygon
# _create_figure, _draw_labels and plotnine.utils are plotnine internals
# and may differ between plotnine versions
from plotnine.utils import to_rgba, SIZE_FACTOR

class ggplot_3d(ggplot):
    def _create_figure(self):
        figure = plt.figure()
        axs = [plt.axes(projection='3d')]
        figure._themeable = {}
        self.figure = figure
        self.axs = axs
        return figure, axs

    def _draw_labels(self):
        ax = self.axs[0]
        ax.set_xlabel(self.layout.xlabel(self.labels))
        ax.set_ylabel(self.layout.ylabel(self.labels))
        ax.set_zlabel(self.labels['z'])

class geom_polygon_3d(geom_polygon):
    REQUIRED_AES = {'x', 'y', 'z'}

    @staticmethod
    def draw_group(data, panel_params, coord, ax, **params):
        data = coord.transform(data, panel_params, munch=True)
        data['size'] *= SIZE_FACTOR
        grouper = data.groupby('group', sort=False)
        for i, (group, df) in enumerate(grouper):
            fill = to_rgba(df['fill'], df['alpha'])
            polyc = ax.plot_trisurf(
                df['x'].values,
                df['y'].values,
                df['z'].values,
                facecolors=fill if any(fill) else 'none',
                edgecolors=df['color'] if any(df['color']) else 'none',
                linestyles=df['linetype'],
                linewidths=df['size'],
                zorder=params['zorder'],
                rasterized=params['raster'],
            )
            # workaround for https://github.com/matplotlib/matplotlib/issues/9535
            if len(set(fill)) == 1:
                polyc.set_facecolors(fill[0])
For interactivity you can use any matplotlib backend you like; I went with ipympl (pip install ipympl, then %matplotlib widget in a Jupyter notebook cell).
The caveats are:
while shading works nicely, plot_trisurf does not handle facecolors well (there is a PR to fix it here)
you may want to add a parameter that allows disabling shading; see matplotlib's 3D shading examples
faceting, flipping axes, etc. will not work without further fiddling; this could, however, be addressed in the future, as discussed in this plotnine issue about bringing 3D plots to plotnine.
Edit: In case the dataset becomes unavailable, here is a self-contained example based on matplotlib's documentation:
import numpy as np
import pandas as pd
from plotnine import aes, theme_minimal
n_radii = 8
n_angles = 36
radii = np.linspace(0.125, 1.0, n_radii)
angles = np.linspace(0, 2*np.pi, n_angles, endpoint=False)[..., np.newaxis]
x = np.append(0, (radii*np.cos(angles)).flatten())
y = np.append(0, (radii*np.sin(angles)).flatten())
z = np.sin(-x*y)
df = pd.DataFrame(dict(x=x,y=y,z=z))
(
ggplot_3d(df)
+ aes(x='x', y='y', z='z')
+ geom_polygon_3d(size=0.01)
+ theme_minimal()
)
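The ggplot_3d subclass above overrides private plotnine methods, so it may break between plotnine versions. If it does, you can still draw the same surface with matplotlib's Axes3D.plot_trisurf, which is the call geom_polygon_3d makes internally. Here is a sketch that reuses the self-contained data above and bypasses plotnine entirely. It assumes matplotlib 3.2 or newer, where the "3d" projection is available without importing Axes3D:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same triangulated disc as the self-contained example above
n_radii, n_angles = 8, 36
radii = np.linspace(0.125, 1.0, n_radii)
angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)[..., np.newaxis]
x = np.append(0, (radii * np.cos(angles)).flatten())
y = np.append(0, (radii * np.sin(angles)).flatten())
df = pd.DataFrame(dict(x=x, y=y, z=np.sin(-x * y)))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")  # a 3D Axes, like the one ggplot_3d creates
surf = ax.plot_trisurf(df["x"].to_numpy(), df["y"].to_numpy(), df["z"].to_numpy(),
                       linewidth=0.2, antialiased=True)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
```

With an interactive backend such as ipympl, this figure can be rotated and zoomed just like the plotnine version, at the cost of giving up the ggplot grammar.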

Change Error Bar Markers (Caplines) in Pandas Bar Plot

So I am plotting error bars for a pandas DataFrame. The error bars have a weird arrow at the top, but what I want is a horizontal line, for example a figure like this:
But my error bars end with an arrow instead of a horizontal line.
Here is the code I used to generate it:
plot = meansum.plot(
kind="bar",
yerr=stdsum,
colormap="OrRd_r",
edgecolor="black",
grid=False,
figsize=(8, 2),
ax=ax,
position=0.45,
error_kw=dict(ecolor="black", elinewidth=0.5, lolims=True, marker="o"),
width=0.8,
)
So what should I change to get the error bars I want? Thanks.
Using plt.errorbar from matplotlib makes this easier, as it returns several objects, including the caplines, which contain the marker you want to change (the arrow is used automatically when lolims is set to True; see the docs).
With pandas, you just need to dig out the correct line from the children of plot and change its marker:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
fig, ax = plt.subplots()
df = pd.DataFrame({"val":[1,2,3,4],"error":[.4,.3,.6,.9]})
meansum = df["val"]
stdsum = df["error"]
plot = meansum.plot(kind='bar', yerr=stdsum, colormap='OrRd_r', edgecolor='black', grid=False,
                    figsize=(8,2), ax=ax, position=0.45,
                    error_kw=dict(ecolor='black', elinewidth=0.5, lolims=True), width=0.8)
for ch in plot.get_children():
    if isinstance(ch, Line2D):  # this is silly, but it appears that the first Line2D among the children holds the caplines...
        ch.set_marker('_')
        ch.set_markersize(10)  # to change its size
        break
plt.show()
The result looks like:
Just don't set lolims=True and you are good to go. Note that since matplotlib 2.0 the default cap size is 0, so pass capsize in error_kw to actually get the horizontal cap lines. An example with sample data:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({"val":[1,2,3,4],"error":[.4,.3,.6,.9]})
meansum = df["val"]
stdsum = df["error"]
plot = meansum.plot(kind='bar', yerr=stdsum, colormap='OrRd_r', edgecolor='black', grid=False,
                    figsize=(8,2), ax=ax, position=0.45,
                    error_kw=dict(ecolor='black', elinewidth=0.5, capsize=5), width=0.8)
plt.show()
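The first answer points out that errorbar returns the caplines directly. Building on that, here is a sketch that skips pandas: it draws the bars with Axes.bar and the error bars with Axes.errorbar, using the same toy data as the answers. fmt="none" suppresses the connecting data line, and the marker size 10 is arbitrary:

```python
import matplotlib.pyplot as plt

vals = [1, 2, 3, 4]
errs = [0.4, 0.3, 0.6, 0.9]
xs = range(len(vals))

fig, ax = plt.subplots(figsize=(8, 2))
ax.bar(xs, vals, edgecolor="black", width=0.8)

# errorbar returns a container that unpacks into (data_line, caplines, barlinecols)
data_line, caplines, barlinecols = ax.errorbar(
    xs, vals, yerr=errs, fmt="none", ecolor="black", elinewidth=0.5, lolims=True
)
for cap in caplines:
    cap.set_marker("_")     # replace the lolims arrow with a horizontal tick
    cap.set_markersize(10)
```

This avoids guessing which child of the Axes holds the caps, because the caplines are named explicitly in the return value.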

Python Seaborn Matplotlib setting line style as legend

I have the following plot built with seaborn using the factorplot() method.
Is it possible to use the line style as a legend to replace the legend based on line color on the right?
graycolors = sns.mpl_palette('Greys_r', 4)
g = sns.factorplot(x="k", y="value", hue="class", palette=graycolors,
                   data=df, linestyles=["-", "--"])
Furthermore I'm trying to get both lines in black color using the color="black" parameter in my factorplot method but this results in an exception "factorplot() got an unexpected keyword argument 'color'". How can I paint both lines in the same color and separate them by the linestyle only?
I have been looking for a solution trying to put the linestyle in the legend like matplotlib, but I have not yet found how to do this in seaborn. However, to make the data clear in the legend I have used different markers:
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# creating some data
n = 11
x = np.linspace(0,2, n)
y = np.sin(2*np.pi*x)
y2 = np.cos(2*np.pi*x)
data = {'x': np.append(x, x), 'y': np.append(y, y2),
        'class': np.append(np.repeat('sin', n), np.repeat('cos', n))}
df = pd.DataFrame(data)
graycolors = sns.mpl_palette('Greys_r', 4)  # palette from the question
# plot the data with the markers
# note: legend=False so the legend can be placed manually (otherwise it was blocking the graph)
# factorplot was renamed catplot in seaborn 0.9; kind="point" matches factorplot's default
g = sns.catplot(x="x", y="y", hue="class", kind="point", palette=graycolors,
                data=df, linestyles=["-", "--"], markers=['o','v'], legend=False)
# placing the legend up
g.axes[0][0].legend(loc=1)
# showing graph
plt.show()
You can try the following:
h = plt.gca().get_lines()
lg = plt.legend(handles=h, labels=['YOUR Labels List'], loc='best')
It worked fine for me.
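The second part of the question asks for both lines in the same color, told apart only by line style. The sketch below does this in plain matplotlib, bypassing seaborn's factorplot, so it is a swapped-in approach rather than a seaborn option. The sin/cos data mirror the first answer. When every line is black and has its own linestyle, the default legend already shows the style:

```python
import numpy as np
import matplotlib.pyplot as plt

n = 11
x = np.linspace(0, 2, n)
curves = {"sin": np.sin(2 * np.pi * x), "cos": np.cos(2 * np.pi * x)}
styles = {"sin": "-", "cos": "--"}  # one line style per class

fig, ax = plt.subplots()
for name, y in curves.items():
    # same color for every line, so only the line style tells them apart
    ax.plot(x, y, color="black", linestyle=styles[name], label=name)

# The legend copies each plotted line's color and style into its entry
legend = ax.legend(loc="upper right")
```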

python lineplot with color according to y values

I am quite a beginner at coding. I'm trying to plot curves from two-column xy data as a continuous line, not a scatter plot. I want the line to be colored according to the value of y.
I can make it work for scatter but not for line plot.
My code:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib
# read data ... (the data are two xy columns, so one can simply use two lists, say a and b)
a = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17]
b = [11,12,3,34,55,16,17,18,59,50,51,42,13,14,35,16,17]
fig = plt.figure()
ax = fig.add_subplot(111)
bnorm = []
for i in b:
    i = i/float(np.max(b))  ### normalizing the data
    bnorm.append(i)
plt.scatter(a, b, c = plt.cm.jet(bnorm))
plt.show()
With scatter it works...
How can I make it a line plot with colors? Something like this:
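A common matplotlib technique for this is to split the curve into short segments and color each one through a LineCollection. Here is a sketch using the sample lists a and b from the question. Coloring each segment by the mean y of its two endpoints, normalized between min(b) and max(b), is one reasonable choice among several:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

a = np.arange(1, 18, dtype=float)
b = np.array([11, 12, 3, 34, 55, 16, 17, 18, 59, 50, 51, 42, 13, 14, 35, 16, 17],
             dtype=float)

# One segment per consecutive pair of points: shape (n - 1, 2, 2)
points = np.column_stack([a, b]).reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

# Map y values onto the colormap; each segment gets the mean y of its endpoints
norm = plt.Normalize(b.min(), b.max())
lc = LineCollection(segments, cmap="jet", norm=norm)
lc.set_array((b[:-1] + b[1:]) / 2)
lc.set_linewidth(2)

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()  # make sure the view limits cover all segments
fig.colorbar(lc, ax=ax, label="y")
```

Unlike the bnorm list in the question, the Normalize object does the scaling, so the colorbar shows the original y values.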
