How do I reuse a plot layout in an IPython notebook? - python

The code below gives me the image even further below.
flowRates=[2,5,10,20,50]
flowRateTol=0.2
#sets the limits for the plot
xRange=(0,700)
yRange=(0,70)
ax=axes()
ax.set_xlabel('Time (s)')
#ax.set_ylabel('Reaction Force (lbf)')
ax.legend(loc=0)
#set up the second axis
ax.twinx()
ax.set_ylabel('10s Average Flow Rate')
ax.set_xlim(xRange)
ax.set_ylim(yRange)
#shade the acceptable tolerance bands
for flowRate in flowRates:
    rectX=[0,xRange[1],xRange[1],0]
    rectY=[ flowRate*(1-flowRateTol),
            flowRate*(1-flowRateTol),
            flowRate*(1+flowRateTol),
            flowRate*(1+flowRateTol)]
    ax.fill(rectX,rectY,'b', alpha=0.2, edgecolor='r')
However, what I'd like to do in my next IPython cell is actually plot data on the graph. The code I'm (unsuccessfully) using to do so is just a call to ax.plot(), but I can't get a graph to show up with my data.
Any thoughts? My goal is to have a workflow (that I will present) that goes something like this:
Look how I import my data!
This is how I set up my graph! (show the base plot)
This is how I plot all my data! (show the base plot with the data)
This is how I filter my data! (do some fancy filtering)
This is what the filtered data looks like! (show new data on same base plot)

I would suggest packaging the different ideas into functions, e.g.
This is how I import data:
def Import_Data(file_name, ...):
    # Stuff to import data
    return data
This is how I plot my data:
def Plot(data=None, ...):
    # Draw the base plot, then add the data unless data is None
Plotting just the base plot seems like a special case that you may only do once, but if you really want to be able to show it while minimising the amount of repeated code, let the plot function accept data=None and skip the data-plotting step in that case.
The great thing about splitting code up like this is that it is easy to change just one function, provided you only have to worry about its inputs and outputs. For instance, to filter you can either add a filter parameter to the plot function, or create new filtered data that is plotted in the same way!
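As a rough sketch of the function-based split suggested above: the names make_base_plot and plot_data, their parameters, and the sample data are illustrative assumptions, not part of the original code. The base-plot function rebuilds the layout from the question, and the plotting function accepts data=None to show the bare layout.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt


def make_base_plot(flow_rates=(2, 5, 10, 20, 50), tol=0.2,
                   x_range=(0, 700), y_range=(0, 70)):
    """Build the empty layout with shaded tolerance bands (hypothetical helper)."""
    fig, ax = plt.subplots()
    ax.set_xlabel('Time (s)')
    ax.set_ylabel('10s Average Flow Rate')
    ax.set_xlim(x_range)
    ax.set_ylim(y_range)
    # Shade the acceptable tolerance band around each nominal flow rate
    for rate in flow_rates:
        rect_x = [0, x_range[1], x_range[1], 0]
        rect_y = [rate * (1 - tol), rate * (1 - tol),
                  rate * (1 + tol), rate * (1 + tol)]
        ax.fill(rect_x, rect_y, 'b', alpha=0.2, edgecolor='r')
    return fig, ax


def plot_data(data=None, **base_kwargs):
    """Draw the base plot, then add data if any was given (hypothetical helper)."""
    fig, ax = make_base_plot(**base_kwargs)
    if data is not None:
        t, flow = data  # assumed shape: a (time, flow) pair of sequences
        ax.plot(t, flow, 'k-')
    return fig, ax


# One call per notebook cell: base plot, raw data, filtered data
fig_base, ax_base = plot_data()
fig_raw, ax_raw = plot_data(([0, 100, 200], [5, 10, 20]))
```

Each call builds a fresh figure, so every cell shows the same layout with different data. With the inline backend, a figure is normally rendered when it is the last expression of a cell, so returning fig also lets a later cell show it again.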

Related

How to plot unfilled markers in sns.scatterplot with 'hue' set?

I have two sets of x-y data, that I'd like to plot as a scatterplot, using sns.scatterplot. I want to highlight two different things:
the difference between different types of data
the difference between the first and the second set of x-y data
For the first, I'm using the inbuilt hue and style, for the second, I'd like to have filled vs. unfilled markers, but I'm wondering how to do so, without doing it all by hand with plt.scatter, where I would have to implement all the magic of sns.scatterplot by hand.
long version, with MWE:
I have X and Y data, and also have some type info for each point of data. I.e. I have a sample 1 which is of type A and yields X=11, Y=21 at the first sampling and X=10, Y=21 at the second sampling. And the same deal for sample 2 of type A, sample 3 of type B and so on (see example file at the end).
So I want to visualize the differences between two samplings, like so:
import pandas as pd
import seaborn as sns
data = pd.read_csv('testdata.csv', sep=';', index_col=0, header=0)
# data for the csv at the end of the question
sns.scatterplot(x=data['x1'], y=data['y1'])
sns.scatterplot(x=data['x2'], y=data['y2'])
Nice, I can easily see that the first sampling seems to show a linear relationship between X and Y, whereas the second one shows some differences. Now what interests me, is which type of data is affected the most by these differences and that's why I'm using seaborn, instead of pure matplotlib: sns.scatterplot has a lot of nice stuff built in, e.g. hue (and style, to get symbols for printing in b&w):
sizes = (200, 200) # to make stuff more visible
sns.scatterplot(x=data['x1'], y=data['y1'], hue=data['type'], style=data['type'],
                size=data['type'], sizes=sizes)
sns.scatterplot(x=data['x2'], y=data['y2'], hue=data['type'], style=data['type'],
                size=data['type'], sizes=sizes)
OK, so I can easily distinguish my data types, but I lost all information about which sample is which. The obvious solution to me seems to be to use filled markers for one and unfilled ones for the other.
However, I can't seem to do that.
I'm aware of this question/answer, which uses fc='none' (not documented in the sns.scatterplot documentation), but this fails when also using hue:
sns.scatterplot(x=data['x1'], y=data['y1'], hue=data['type'], style=data['type'],
                size=data['type'], sizes=sizes)
sns.scatterplot(x=data['x2'], y=data['y2'], hue=data['type'], style=data['type'],
                size=data['type'], sizes=sizes, fc='none')
As you can see, the second set of markers simply vanishes (there are some artifacts in the B data, where hints of a white cross are visible).
I can kinda fix that by setting ec=...:
sns.scatterplot(x=data['x1'], y=data['y1'], hue=data['type'], style=data['type'],
                size=data['type'], sizes=sizes)
sns.scatterplot(x=data['x2'], y=data['y2'], hue=data['type'], style=data['type'],
                size=data['type'], sizes=sizes, fc='none',
                ec=('b','b','y','y','y', 'y', 'g', 'g', 'g','r'))
# I would have to define the proper colors, but for this example, they're close enough
but that obviously has a few issues:
the markers in the legend aren't fitting anymore, neither color nor fill
and I'm already halfway in doing-it-all-by-hand territory anyways, e.g. my ec= would fail when I want to plot a new dataset with sample_no 11.
How can I do that with seaborn? Filled vs. unfilled seems quite an obvious flag for scatterplots, but I can't seem to find it.
data for testdata.csv:
sample_no;type;x1;y1;x2;y2
1;A;11;21;10;21
2;A;12;22;12;21
3;B;13;23;13.2;22.8
4;B;14;24;13.8;24
5;B;15;25;14.8;25.2
6;B;16;26;16.3;25.9
7;C;17;27;18;28
8;C;18;28;20;26
9;C;19;29;20;30
10;D;20;30;19;28

Matplotlib.pyplot - how to save a histogram in a variable for later access?

Due to data access patterns, I need to save various histograms in a Python list and then access them later to output as part of a multi-page PDF.
If I save the histograms to my PDF as soon as I create them, my code works fine:
def output_histogram_pdf(self, pdf):
    histogram = plt.hist(
        x=[values], bins=50)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title)
    if isinstance(pdf, PdfPages):
        pdf.savefig()
But if I instead save them to a list so I can later manipulate the order, I run into trouble.
histogram_list.append(histogram)
Then later
for histogram in histogram_list:
    plt.figure(histogram)
    pdf.savefig()
This does not work. I'm either saving the wrong thing, or I don't know how to properly open what I've saved.
I've spent quite some time fruitlessly googling for a working solution, but so many of the terms involved are sufficiently vague that I get tons of different types of issues in my search results. Any help would be greatly appreciated, thanks!
Short Answer
You can use plt.gcf()
When creating your graph, after setting xlabel, ylabel, and title, append the current figure to the histogram list.
histogram_list.append(plt.gcf())
You can then iterate over the list later and call savefig.
Long Answer
plt.hist doesn't return the figure object. However, the figure object can be obtained using gcf (Get Current Figure).
In case you do not want to use the current figure, you could always create the figure yourself, using plt.figure or plt.subplots.
Either way, since you are already plotting the histogram and setting the labels for the figure, you'd want to append the figure to the list.
Option 1: using gcf
histogram = plt.hist(
    x=[values], bins=50)
plt.xlabel(xlabel)
plt.ylabel(ylabel)
plt.title(title)
histogram_list.append(plt.gcf())
Option 2: create your own figure
figure = plt.figure(figsize=(a,b,))
# draw histogram on figure
histogram_list.append(figure)
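A minimal end-to-end sketch of the figure-list approach described above, assuming made-up random data and a temporary output path. The helper name make_histogram_figure is hypothetical. It stores Figure objects rather than the return value of plt.hist, then writes them to the PDF in a different order.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages


def make_histogram_figure(values, title, xlabel="value", ylabel="count"):
    """Draw one histogram on its own figure and return that figure."""
    fig, ax = plt.subplots()
    ax.hist(values, bins=50)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.set_title(title)
    return fig


rng = np.random.default_rng(0)  # placeholder data, not from the question
histogram_list = [
    make_histogram_figure(rng.normal(size=500), f"Sample {i}") for i in range(3)
]

# Later: reorder freely, then save each stored figure as one PDF page
pdf_path = os.path.join(tempfile.mkdtemp(), "histograms.pdf")
with PdfPages(pdf_path) as pdf:
    for fig in reversed(histogram_list):
        pdf.savefig(fig)  # pass the figure explicitly instead of relying on gcf
        plt.close(fig)    # free the figure once it has been written
```

Passing the figure to savefig avoids depending on which figure happens to be current when the PDF is written.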
Each histogram returned by plt.hist is a tuple (n, bins, patches), where n holds the counts for each bin, bins holds the bin edges (one more entry than n), and patches are the artists that draw the bars.
Most simply, try to plot each histogram as
for histogram in histogram_list:
    n = histogram[0]
    bins = histogram[1]
    # bins[:-1] are the left bin edges, so each count must extend to the right
    plt.plot(bins[:-1], n, '-', ds='steps-post')
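If the stored (n, bins, patches) tuples are all that is available, the bars can also be redrawn on a new figure from the counts and edges alone. The following is only a sketch with placeholder random data. It uses ax.bar with the left edges and the bin widths, which is one of several ways to do this.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)  # placeholder data, not from the question
values = rng.normal(size=500)

# First pass: compute and keep the (n, bins, patches) tuple
fig_first, ax_first = plt.subplots()
stored = ax_first.hist(values, bins=50)
plt.close(fig_first)

# Later: rebuild the bars on a fresh figure from the stored counts and edges
n, bins = stored[0], stored[1]
fig, ax = plt.subplots()
ax.bar(bins[:-1], n, width=np.diff(bins), align="edge")
```

Labels and titles are not part of the stored tuple, so they would have to be kept separately and set again on the new axes.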

matplotlib x-axis formatting if x-axis is pandas index

I'm using iPython notebook's %matplotlib inline and I'm having trouble formatting my plot.
As you can see, my first and last data point aren't showing up the way the other data points are showing up. I'd like to have the error bars visible and have the graph be "zoomed out" a bit.
df.plot(yerr=df['std dev'],color='b', ecolor='r')
plt.title('SpO2 Mean with Std Dev')
plt.xlabel('Time (s)')
plt.ylabel('SpO2')
I assume I have to use
matplotlib.pyplot.xlim()
but I'm not sure how to use it properly if my x-axis is a DataFrame index composed of strings:
index = ['-3:0','0:3','3:6','6:9','9:12','12:15','15:18','18:21','21:24']
Any ideas? Thanks!
The xlim documentation shows its usage. In this case, running plt.xlim() with no arguments would return (0.0, 8.0). Because your index uses text rather than numbers, the x values are just the integer positions of the entries in your index. So you only need to widen the limits by however many steps left and right you want the graph to extend. For example:
plt.xlim(-1,len(df))
This leaves one step of padding on each side, so the first and last points and their error bars are fully visible.
Hope that helps.
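A small self-contained sketch of the same idea. The mean and standard deviation values are made up for illustration, and the column name 'mean' is an assumption, since only the index labels and the 'std dev' column appear in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

index = ['-3:0', '0:3', '3:6', '6:9', '9:12', '12:15', '15:18', '18:21', '21:24']
df = pd.DataFrame(
    {
        'mean': [97, 96, 95, 96, 97, 98, 97, 96, 95],              # made-up values
        'std dev': [1.0, 1.5, 2.0, 1.2, 0.8, 1.1, 1.6, 1.3, 0.9],  # made-up values
    },
    index=index,
)

ax = df['mean'].plot(yerr=df['std dev'], color='b', ecolor='r')
ax.set_title('SpO2 Mean with Std Dev')
ax.set_xlabel('Time (s)')
ax.set_ylabel('SpO2')

# String labels sit at positions 0..len(df)-1, so pad by one position per side
ax.set_xlim(-1, len(df))
```

Using ax.set_xlim on the axes returned by the pandas plot call has the same effect as plt.xlim when that axes is the current one.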

How to put data on a graph like powerlaw.plot_pdf?

I need some basic help with the powerlaw package (https://pypi.python.org/pypi/powerlaw).
I have a list of data samples.
When I use powerlaw.plot_pdf(data), I get a graph (sorry, I can't upload the graphs here as I don't have enough reputation yet).
However, when trying to create the same graph on my own (with this code):
ax.plot(data)
ax.set_yscale('log')
ax.set_xscale('log')
I get a different graph.
Why is that?
Maybe I should normalize the data first (if so, how)?
Or am I missing something more crucial?
(If I understand correctly, using powerlaw.plot_pdf(data) means plotting the data before fitting.)
Another option would be to somehow get the x and y values that produce the graph of powerlaw.plot_pdf(data), but I did not succeed with that either.
Thanks for your kind help,
Alon
Solved.
After downloading the powerlaw code (https://code.google.com/p/powerlaw/source/checkout), things become clearer.
To get the values of both the x and y axes that produce the graph of powerlaw.plot_pdf(data):
edges, hist = powerlaw.pdf(data)
Now, instead of just plotting the data on log-log axes as described in the question, we should first compute the centers of the bins. These centers will be the values on the x axis. hist contains the values of the y axis:
bin_centers = (edges[1:]+edges[:-1])/2.0
Now plot on log-log axes:
plt.loglog(bin_centers, hist)
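To illustrate why plotting the raw samples differs from plotting a PDF, here is a NumPy-only sketch of the same steps. It does not call powerlaw. The log-spaced binning, the bin count of 30, and the synthetic Pareto data are assumptions for illustration and are not claimed to reproduce the package's output exactly.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.pareto(1.5, size=5000) + 1.0  # synthetic heavy-tailed samples, all >= 1

# Log-spaced bin edges, then a density-normalised histogram over those bins
edges = np.logspace(np.log10(data.min()), np.log10(data.max()), 30)
hist, edges = np.histogram(data, bins=edges, density=True)

# x values are the bin centers; y values are the estimated densities
bin_centers = (edges[1:] + edges[:-1]) / 2.0

# Empty bins have zero density, which cannot be drawn on a log axis
nonzero = hist > 0
fig, ax = plt.subplots()
ax.loglog(bin_centers[nonzero], hist[nonzero], 'o-')
```

The key difference from ax.plot(data) is that the y axis shows an estimated probability density per bin rather than the sample values themselves in list order.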

Changing the marker on the same set of data

I have a set of data that comes from two different sources, and I have multiple sets graphed together. So essentially 6 scatterplots with error bars (all different colors), and each scatterplot has two sources.
Basically I want the blue scatterplot to have two different markers, 'o' and 's'. I currently do this by plotting each point individually in a loop and checking whether the source is 1 or 2. If the source is 1 it plots an 's'; if the source is 2 it plots an 'o'.
However this method does not really allow for having a legend. (Data1, Data2,...Data6)
Is there a better way of doing this?
EDIT:
I want a cleaner method for this, something along the lines of
x=[1,2,3]
y=[4,5,6]
m=['o','s','^']
plt.scatter(x,y,marker=m)
But this raises the error "Unrecognized marker style".
A more pythonic way (but still a loop) might be something like
x=[1,2,3]
y=[4,5,6]
l=['data1','data2','data3']
m=['ob','sb','^b']
f,a = plt.subplots(1,1)
for data, lab in zip(zip(x, y, m), l):
    a.plot(*data, label=lab)
plt.legend(loc='lower right')
plt.xlim(0,4)
plt.ylim(3,7);
But I guess this is not the most efficient way if you have lots of datapoints.
If you want to use scatter try something like
m=['o','s','^']
f,a = plt.subplots(1,1)
for data, m1, l1 in zip(zip(x, y), m, l):
    a.scatter(*data, marker=m1, label=l1)
I'm pretty sure it is also possible to apply ** and dicts here.
UPDATE:
Instead of looping over the plot command, you can use the ability of matplotlib's plot function to read an arbitrary number of x, y, fmt groups (see the docs).
import numpy as np
x=np.random.random((3,6))
y=np.random.random((3,6))
l=['data1','data2','data3']
m=['ob','sb','^b']
plt.plot(*[i[j] for i in zip(x,y,m) for j in range(3)])
plt.legend(l,loc='lower right')
Calling plot in a loop is fine. You just need to keep the list of lines returned by plot and use fig.legend to create a legend for the whole figure. See http://matplotlib.org/examples/pylab_examples/figlegend_demo.html
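A short sketch of that suggestion, using random placeholder data: each call to plot returns a list of Line2D objects, and the kept handles are passed to fig.legend together with the labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # placeholder data
x = rng.random((3, 6))
y = rng.random((3, 6))
labels = ['data1', 'data2', 'data3']
formats = ['ob', 'sb', '^b']  # same color, different marker per data set

fig, ax = plt.subplots()
handles = []
for xi, yi, fmt in zip(x, y, formats):
    (line,) = ax.plot(xi, yi, fmt)  # plot returns a list with one Line2D here
    handles.append(line)

# One legend for the whole figure, built from the kept handles
legend = fig.legend(handles, labels, loc='lower right')
```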
Seconding @tcaswell's comments: .scatter() returns a collections.PathCollection, which provides a fast way of plotting a large number of identically shaped objects. You can use a loop to plot the data as many scatter plots (and many different datasets), but in my opinion it loses all the speed benefit provided by .scatter().
With that said, it is not true that the dots have to be identical in a scatter plot. You can have different linewidths, edgecolors and many other things. But the dots have to be the same shape. See this example, which assigns different colors (and only plots one dataset):
>>> sc=plt.scatter(x, y, label='test')
>>> sc.set_color(['r','g','b'])
>>> plt.legend()
See details in http://matplotlib.org/api/collections_api.html.
These were all alright, but not really what I was looking for. The problem was how I parsed through my data and how I could add a legend in a way that wouldn't mess that up. Since I used a for-loop and plotted each point individually, based on whether it was measured at observation location 1 or 2, any legend I made would have over 50 entries. So I plotted my data as full sets (invisibly and with no change in symbols), then again in color with the varying symbols. This worked better. Thanks though.
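One way to sketch that "one legend entry per data set, two markers per data set" idea: draw an empty, labeled proxy line for the legend, then draw the points grouped by source without labels. The data, the dictionary layout, and the names below are illustrative assumptions, and error bars are left out for brevity.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: each data set has points measured at source 1 or source 2
datasets = {
    'Data1': {'x': [1, 2, 3, 4], 'y': [4, 5, 6, 7], 'source': [1, 2, 1, 2], 'color': 'b'},
    'Data2': {'x': [1, 2, 3, 4], 'y': [2, 3, 2, 3], 'source': [2, 2, 1, 1], 'color': 'r'},
}
marker_for_source = {1: 's', 2: 'o'}

fig, ax = plt.subplots()
for name, d in datasets.items():
    x = np.asarray(d['x'])
    y = np.asarray(d['y'])
    source = np.asarray(d['source'])

    # Proxy artist: draws nothing, but contributes one legend entry per data set
    ax.plot([], [], linestyle='none', marker='o', color=d['color'], label=name)

    # Actual points, one scatter call per source, left unlabeled
    for src, marker in marker_for_source.items():
        mask = source == src
        ax.scatter(x[mask], y[mask], marker=marker, color=d['color'])

legend = ax.legend(loc='lower right')
```

This needs one scatter call per source rather than one call per point, so the number of legend entries stays equal to the number of data sets.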
