How to put data on a graph like powerlaw.plot_pdf? - python

I need some basic help with the powerlaw package (https://pypi.python.org/pypi/powerlaw).
I have a list of data samples.
When I use powerlaw.plot_pdf(data), I get a graph (sorry, I can't upload the graphs here as I don't have enough reputation yet).
However, when trying to create the same graph on my own (with this code):
ax.plot(data)
ax.set_yscale('log')
ax.set_xscale('log')
I get a different graph.
Why is that?
Maybe I should normalize the data first (and if so, how)?
Or am I missing something more crucial?
(If I understand correctly, powerlaw.plot_pdf(data) plots the data before fitting.)
Another option would be to get the values of both the x and y axes that produce the graph of powerlaw.plot_pdf(data) somehow, but did not succeed with that either.
Thanks for your kind help,
Alon

Solved.
After downloading the powerlaw code (https://code.google.com/p/powerlaw/source/checkout), things become clearer.
To get the values of both the x and y axes that produce the graph of powerlaw.plot_pdf(data):
edges, hist = powerlaw.pdf(data)
Now, instead of plotting the data directly on log-log axes as described in the question, we should first compute the centers of the bins. These centers are the values on the x axis; hist contains the values on the y axis:
bin_centers = (edges[1:]+edges[:-1])/2.0
Now plot on log-log axes:
plt.loglog(bin_centers, hist)
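To see why ax.plot(data) looks different: it draws each sample value against its position in the list, whereas a PDF plot draws a normalized histogram of the values. As far as I can tell from the source, powerlaw.pdf uses logarithmically spaced bins and normalizes the counts to a density. The sketch below is a plain-Python stand-in under that assumption, not the package's own code; log_binned_pdf, n_bins and the sample data are made up for illustration.

```python
import math


def log_binned_pdf(samples, n_bins=10):
    """Return (bin_edges, densities) for positive samples, using log-spaced bins.

    Hypothetical helper: it mimics the idea of a log-binned, normalized
    histogram and is not the powerlaw package's implementation.
    """
    lo, hi = min(samples), max(samples)
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / n_bins
    edges = [10 ** (log_lo + step * i) for i in range(n_bins + 1)]
    edges[0], edges[-1] = lo, hi  # pin the outer edges to the data range

    counts = [0] * n_bins
    for value in samples:
        # The largest sample falls exactly on the last edge, so clamp it
        # into the final bin.
        idx = min(int((math.log10(value) - log_lo) / step), n_bins - 1)
        counts[idx] += 1

    total = len(samples)
    # Density = count / (number of samples * bin width), so the area sums to 1.
    densities = [
        count / (total * (edges[i + 1] - edges[i]))
        for i, count in enumerate(counts)
    ]
    return edges, densities


data = [1, 1, 1, 2, 2, 3, 4, 5, 8, 13, 21, 50, 100]  # made-up samples
edges, hist = log_binned_pdf(data, n_bins=5)
bin_centers = [(left + right) / 2.0 for left, right in zip(edges[:-1], edges[1:])]
# With matplotlib available, plt.loglog(bin_centers, hist) would draw the curve.
```

So the normalization the question asks about is the division by the number of samples and by each bin's width, applied to counts rather than to the raw samples.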

Related

Matplotlib.pyplot - how to save a histogram in a variable for later access?

Due to data access patterns, I need to save various histograms in a Python list and then access them later to output as part of a multi-page PDF.
If I save the histograms to my PDF as soon as I create them, my code works fine:
def output_histogram_pdf(self, pdf):
    histogram = plt.hist(
        x=[values], bins=50)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title)
    if isinstance(pdf, PdfPages):
        pdf.savefig()
But if I instead save them to a list so I can later manipulate the order, I run into trouble.
histogram_list.append(histogram)
Then later
for histogram in histogram_list:
    plt.figure(histogram)
    pdf.savefig()
This does not work. I'm either saving the wrong thing, or I don't know how to properly open what I've saved.
I've spent quite some time fruitlessly googling for a working solution, but so many of the terms involved are sufficiently vague that I get tons of different types of issues in my search results. Any help would be greatly appreciated, thanks!
Short Answer
You can use plt.gcf()
When creating your graph, after setting xlabel, ylabel, and title, append the figure to histogram_list.
histogram_list.append(plt.gcf())
You can then iterate over the list later and call pdf.savefig(figure) for each stored figure.
Long Answer
plt.hist doesn't return the figure object. However, the figure object can be obtained using gcf (Get Current Figure).
In case you do not want to use the current figure, you could always create the figure yourself, using plt.figure or plt.subplot.
Either way, since you are already plotting the histogram and setting the labels for the figure, you'd want to append the figure to the list.
Option 1: using gcf
histogram = plt.hist(
    x=[values], bins=50)
plt.xlabel(xlabel)
plt.ylabel(ylabel)
plt.title(title)
histogram_list.append(plt.gcf())
Option 2: create your own figure
figure = plt.figure(figsize=(a,b,))
# draw histogram on figure
histogram_list.append(figure)
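Putting the pieces together, here is a minimal sketch of storing figures in a list and writing them to a multi-page PDF later. The dataset names, the values and the output path are made up for illustration, and the Agg backend is selected only so the sketch runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# Made-up data standing in for the real values.
datasets = {
    "first": [1, 2, 2, 3, 3, 3],
    "second": [5, 5, 6, 7],
    "third": [0, 1, 1, 4],
}

figure_list = []
for title, values in datasets.items():
    fig, ax = plt.subplots()  # one new figure per histogram
    ax.hist(values, bins=5)
    ax.set_xlabel("value")
    ax.set_ylabel("count")
    ax.set_title(title)
    figure_list.append(fig)  # keep the Figure, not the return value of hist

pdf_path = os.path.join(tempfile.mkdtemp(), "histograms.pdf")
with PdfPages(pdf_path) as pdf:
    # Any order works now; here the pages are written in reverse.
    for fig in reversed(figure_list):
        pdf.savefig(fig)
    page_count = pdf.get_pagecount()

for fig in figure_list:
    plt.close(fig)  # release the figures once they are saved
```

Creating a new figure for each histogram matters here: if every histogram were drawn on the same current figure, the list would hold the same object several times.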
Each histogram returned by plt.hist is a tuple (n, bins, patches), where n holds the values for each bin, bins holds the bin edges (one more than n), and patches holds the artists that draw the bars.
Most simply, try to plot each saved histogram as
for histogram in histogram_list:
    n = histogram[0]
    bins = histogram[1]
    plt.plot(bins[:-1], n, '-', ds='steps-pre')

matplotlib x-axis formatting if x-axis is pandas index

I'm using iPython notebook's %matplotlib inline and I'm having trouble formatting my plot.
As you can see, my first and last data point aren't showing up the way the other data points are showing up. I'd like to have the error bars visible and have the graph be "zoomed out" a bit.
df.plot(yerr=df['std dev'],color='b', ecolor='r')
plt.title('SpO2 Mean with Std Dev')
plt.xlabel('Time (s)')
plt.ylabel('SpO2')
I assume I have to use
matplotlib.pyplot.xlim()
but I'm not sure how to use it properly if my x-axis is a DataFrame index composed of strings:
index = ['-3:0','0:3','3:6','6:9','9:12','12:15','15:18','18:21','21:24']
Any ideas? Thanks!
See the matplotlib documentation for the usage of xlim. In this case, if you ran plt.xlim() you would get (0.0, 8.0). Because your index uses text rather than numbers, the values for xlim are just the positions of the entries in your index. So you only need to set the limits to however many steps left and right you want the graph to extend. For example:
plt.xlim(-1,len(df))
This moves the first and last points away from the edges of the plot, so their error bars become visible.
Hope that helps.
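A small sketch of the same idea without pandas, assuming (as described above) that string labels sit at integer positions 0 to n-1. The means and standard deviations are placeholder numbers, not real measurements.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt

index = ['-3:0', '0:3', '3:6', '6:9', '9:12', '12:15', '15:18', '18:21', '21:24']
# Placeholder values for illustration only.
means = [97.0, 96.5, 96.8, 95.9, 96.2, 97.1, 96.4, 96.0, 96.7]
std_dev = [0.5, 0.7, 0.4, 0.9, 0.6, 0.5, 0.8, 0.6, 0.4]

positions = list(range(len(index)))  # labels live at 0, 1, ..., n-1

fig, ax = plt.subplots()
ax.errorbar(positions, means, yerr=std_dev, color='b', ecolor='r')
ax.set_xticks(positions)
ax.set_xticklabels(index)

# One step of padding on each side of the first and last positions.
ax.set_xlim(-1, len(index))
xlim_after = ax.get_xlim()
```

Fractional values such as -0.5 and len(index) - 0.5 work as well if a full step is too much padding.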

How to increase plot y-range with matplotlib? [duplicate]

I would like to plot a set of points using pyplot in matplotlib but have none of the points be on the edge of my axes. The autoscale (or something) sets the xlim and ylim such that often the first and last points lie at x = xmin or xmax making it difficult to read in some situations.
This is more often problematic with loglog() or semilog() plots because the autoscale would like xmin and xmax to be exact powers of ten, but if my data contains only three points, e.g. at xdata = [10**2,10**3,10**4] then the first and last points will lie on the border of the plot.
Attempted Workaround
This is my solution to add a 10% buffer to either side of the graph. But is there a way to do this more elegantly or automatically?
from numpy import array, log10
from matplotlib.pyplot import *
xdata = array([10**2,10**3,10**4])
ydata = xdata**2
figure()
loglog(xdata,ydata,'.')
xmin,xmax = xlim()
xbuff = 0.1*log10(xmax/xmin)
xlim(xmin*10**(-xbuff),xmax*10**(xbuff))
I am hoping for a one- or two-line solution that I can easily use whenever I make a plot like this.
Linear Plot
To make clear what I'm doing in my workaround, I should add an example in linear space (instead of log space):
plot(xdata,ydata)
xmin,xmax = xlim()
xbuff = 0.1*(xmax-xmin)
xlim(xmin-xbuff,xmax+xbuff)
which is identical to the previous example but for a linear axis.
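For reference, both variants of the workaround can be wrapped in one small helper. This is only a sketch: padded_limits and its frac and log parameters are hypothetical names, not part of matplotlib.

```python
import math


def padded_limits(lo, hi, frac=0.1, log=False):
    """Return (new_lo, new_hi) widened by `frac` of the range on each side.

    Hypothetical helper. With log=True the padding is a fraction of the
    range measured in decades, which requires lo and hi to be positive.
    """
    if log:
        buff = frac * math.log10(hi / lo)
        return lo * 10 ** (-buff), hi * 10 ** buff
    buff = frac * (hi - lo)
    return lo - buff, hi + buff


linear_limits = padded_limits(0.0, 10.0)               # linear axis
log_limits = padded_limits(10**2, 10**4, log=True)     # log axis
```

With pyplot this would be used as xlim(*padded_limits(*xlim(), log=True)) right after the plot call.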
Limits too large
A related problem is that sometimes the limits are too large. Say my data is something like ydata = xdata**0.25 so that the variance in the range is much less than a decade but ends at exactly 10**1. Then, the autoscale ylim are 10**0 to 10**1 though the data are only in the top portion of the plot. Using my workaround above, I can increase ymax so that the third point is fully within the limits but I don't know how to increase ymin so that there is less whitespace at the lower portion of my plot. i.e., the point is that I don't always want to spread my limits apart but would just like to have some constant (or proportional) buffer around all my points.
@askewchan I just managed to change the matplotlib settings by editing the matplotlibrc configuration file and running python directly from the terminal. I don't know the reason yet, but matplotlibrc does not take effect when I run python from spyder3 (my IDE). Just follow the steps at matplotlib.org/users/customizing.html.
1) Solution one (default for all plots)
Try putting this in matplotlibrc and you will see the buffer increase:
axes.xmargin : 0.1 # x margin. See `axes.Axes.margins`
axes.ymargin : 0.1 # y margin See `axes.Axes.margins`
Values must be between 0 and 1.
Obs.: Due to bugs, the scaling does not work correctly yet. It should be fixed in matplotlib 1.5 (I am still on 1.4.3). More info:
axes.xmargin/ymargin rcParam behaves differently than pyplot.margins() #2298
Better auto-selection of axis limits #4891
2) Solution two (individually for each plot inside the code)
There is also the margins function (to put directly in the code). Example:
import numpy as np
from matplotlib import pyplot as plt
t = np.linspace(-6,6,1000)
plt.plot(t,np.sin(t))
plt.margins(x=0.1, y=0.1)
plt.savefig('plot.png')
Obs.: Here the scaling works (0.1 adds a 10% buffer before and after the x-range and the y-range).
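The same call can be tried on the log-log case from the question. This sketch assumes a matplotlib version in which margins are applied in the axis scale's own space (per the note above, that was not yet reliable in 1.4.3); the data are the three points from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt

xdata = [10**2, 10**3, 10**4]
ydata = [x**2 for x in xdata]

fig, ax = plt.subplots()
ax.loglog(xdata, ydata, '.')
ax.margins(x=0.1, y=0.1)  # 10% padding on each side of both axes

# Read back the autoscaled limits to check that no point sits on the border.
xlo, xhi = ax.get_xlim()
ylo, yhi = ax.get_ylim()
```

If the limits come back equal to the data range on an older version, the manual workaround from the question is still needed there.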
A similar question was posed to the matplotlib-users list earlier this year. The most promising solution involves implementing a Locator (based on MaxNLocator in this case) to override MaxNLocator.view_limits.

Changing the marker on the same set of data

I have a set of data that comes from two different sources, and I have multiple sets graphed together. So essentially 6 scatterplots with error bars (all different colors), and each scatterplot has two sources.
Basically I want the blue scatterplot to have two different markers, 'o' and 's'. I have currently done this by plotting each point individually in a loop and checking whether the source is 1 or 2. If the source is 1 it plots an 's'; if the source is 2 it plots an 'o'.
However this method does not really allow for having a legend. (Data1, Data2,...Data6)
Is there a better way of doing this?
EDIT:
I want a cleaner method for this, something along the lines of
x=[1,2,3]
y=[4,5,6]
m=['o','s','^']
plt.scatter(x,y,marker=m)
But this returns an error Unrecognized marker style
A more pythonic way (but still a loop) might be something like
x=[1,2,3]
y=[4,5,6]
l=['data1','data2','data3']
m=['ob','sb','^b']
f,a = plt.subplots(1,1)
[a.plot(*data, label=lab) for data,lab in zip(zip(x,y,m),l)]
plt.legend(loc='lower right')
plt.xlim(0,4)
plt.ylim(3,7);
But I guess this is not the most efficient way if you have lots of datapoints.
If you want to use scatter try something like
m=['o','s','^']
f,a = plt.subplots(1,1)
[a.scatter(*data, marker=m1, label=l1) for data,m1,l1 in zip(zip(x,y),m,l)]
I'm pretty sure, there is also a possibility to apply ** and dicts here.
UPDATE:
Instead of looping over the plot command, you can use the ability of matplotlib's plot function to read an arbitrary number of x,y,fmt groups (see the docs).
import numpy as np
x=np.random.random((3,6))
y=np.random.random((3,6))
l=['data1','data2','data3']
m=['ob','sb','^b']
plt.plot(*[i[j] for i in zip(x,y,m) for j in range(3)])
plt.legend(l,loc='lower right')
Calling plot in a loop is fine. You just need to keep the list of lines returned by plot and use fig.legend to create a legend for the whole figure. See http://matplotlib.org/examples/pylab_examples/figlegend_demo.html
Seconding @tcaswell's comments: .scatter() returns a collections.PathCollection, which provides a fast way of plotting a large number of identically shaped objects. You can use a loop to plot the data as many scatter plots (and many different datasets), but in my opinion that loses all the speed benefit provided by .scatter().
That said, it is not true that the dots have to be identical in a scatter plot. You can have different linewidths, edgecolors and many other things, but the dots have to be the same shape. See this example, which assigns different colors (and plots only one dataset):
>>> sc=plt.scatter(x, y, label='test')
>>> sc.set_color(['r','g','b'])
>>> plt.legend()
See details in http://matplotlib.org/api/collections_api.html.
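Since one scatter call draws one marker shape, a middle ground is one call per source rather than one per point. The sketch below assumes the points carry a source tag of 1 or 2, as in the question; the coordinates are made up, and only the first group is labelled so that the legend gets a single entry for the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt

# Made-up points for one dataset, each tagged with its source.
x = [1, 2, 3, 4, 5, 6]
y = [4, 5, 6, 5, 4, 6]
source = [1, 2, 1, 1, 2, 2]
marker_for_source = {1: 's', 2: 'o'}

fig, ax = plt.subplots()
for i, (src, marker) in enumerate(marker_for_source.items()):
    xs = [xi for xi, s in zip(x, source) if s == src]
    ys = [yi for yi, s in zip(y, source) if s == src]
    # Labels starting with an underscore are left out of the legend,
    # so only the first group contributes the 'Data1' entry.
    label = 'Data1' if i == 0 else '_nolegend_'
    ax.scatter(xs, ys, marker=marker, color='b', label=label)

legend = ax.legend(loc='lower right')
legend_labels = [text.get_text() for text in legend.get_texts()]
```

Repeating this per dataset with a different color gives two collections per dataset and one legend entry each, instead of one entry per point.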
These were all fine, but not really what I was looking for. The problem was how I parsed my data and how I could add a legend in a way that wouldn't mess that up. Since I used a for-loop and plotted each point individually, based on whether it was measured at observation location 1 or 2, any legend I made would contain over 50 entries. So I plotted my data as full sets (invisibly and with no change in symbols), then again in color with the varying symbols. This worked better. Thanks though.

How do I reuse a plot layout in iPython notebook?

The code below gives me the image even further below.
flowRates=[2,5,10,20,50]
flowRateTol=0.2
#sets the limits for the plot
xRange=(0,700)
yRange=(0,70)
ax=axes()
ax.set_xlabel('Time (s)')
#ax.set_ylabel('Reaction Force (lbf)')
ax.legend(loc=0)
#set up the second axis
ax.twinx()
ax.set_ylabel('10s Average Flow Rate')
ax.set_xlim(xRange)
ax.set_ylim(yRange)
#shade the acceptable tolerance bands
for flowRate in flowRates:
    rectX=[0,xRange[1],xRange[1],0]
    rectY=[ flowRate*(1-flowRateTol),
            flowRate*(1-flowRateTol),
            flowRate*(1+flowRateTol),
            flowRate*(1+flowRateTol)]
    ax.fill(rectX,rectY,'b', alpha=0.2, edgecolor='r')
However, what I'd like to do in my next iPython cell is to actually plot data on the graph. The code I'm using to do so (unsuccessfully) just has a call to ax.plot(), but I can't get a graph to show up with my data.
Any thoughts? My goal is to have a workflow (that I will present) that goes something like this:
Look how I import my data!
This is how I set up my graph! (show the base plot)
This is how I plot all my data! (show the base plot with the data)
This is how I filter my data! (do some fancy filtering)
This is what the filtered data looks like! (show new data on same base plot)
I would suggest packaging the different ideas into functions. For example:
This is how I import data:
def Import_Data(file_name,...):
    # Stuff to import data
    return data
This is how I plot my data:
def Plot(data, ...):
    # Stuff to plot data
Plotting just the base plot seems like a special case that you may do once, but if you really want to be able to show this, and minimise the amount of repeated code just allow data=None to ignore errors and not plot anything.
The great thing about splitting code up like this is that it is easy to make changes to just one function, provided you only have to worry about its inputs and outputs. For instance, to filter you can either add a filter parameter to the plot function, or create new filtered data that is plotted in the same way!
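As a rough sketch of that suggestion, the base layout from the question can live in a function that returns a fresh figure, and each later cell calls it before plotting. The names make_base_plot and plot_data and the sample data are made up for illustration; the Agg backend line is only there so the sketch runs outside a notebook and would be dropped when using %matplotlib inline.

```python
import matplotlib
matplotlib.use("Agg")  # only for running outside a notebook
import matplotlib.pyplot as plt

FLOW_RATES = [2, 5, 10, 20, 50]
FLOW_RATE_TOL = 0.2


def make_base_plot(x_range=(0, 700), y_range=(0, 70)):
    """Build the empty layout with shaded tolerance bands (hypothetical helper)."""
    fig, ax = plt.subplots()
    ax.set_xlabel('Time (s)')
    ax.set_ylabel('10s Average Flow Rate')
    ax.set_xlim(x_range)
    ax.set_ylim(y_range)
    for rate in FLOW_RATES:
        low = rate * (1 - FLOW_RATE_TOL)
        high = rate * (1 + FLOW_RATE_TOL)
        rect_x = [x_range[0], x_range[1], x_range[1], x_range[0]]
        rect_y = [low, low, high, high]
        ax.fill(rect_x, rect_y, 'b', alpha=0.2, edgecolor='r')
    return fig, ax


def plot_data(times, values, label):
    """Draw one data series on a fresh copy of the base layout."""
    fig, ax = make_base_plot()
    ax.plot(times, values, 'k-', label=label)
    ax.legend(loc=0)
    return fig, ax


# "This is how I set up my graph!"
base_fig, base_ax = make_base_plot()
# "This is how I plot all my data!" (made-up numbers)
data_fig, data_ax = plot_data([0, 100, 200, 300], [2.1, 4.8, 10.3, 19.5], 'raw')
```

Each cell then gets its own figure to display, and the filtered-data step reuses plot_data with a different series.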
