matplotlib x-axis formatting if x-axis is pandas index - python

I'm using iPython notebook's %matplotlib inline and I'm having trouble formatting my plot.
As you can see, my first and last data point aren't showing up the way the other data points are showing up. I'd like to have the error bars visible and have the graph be "zoomed out" a bit.
df.plot(yerr=df['std dev'],color='b', ecolor='r')
plt.title('SpO2 Mean with Std Dev')
plt.xlabel('Time (s)')
plt.ylabel('SpO2')
I assume I have to use
matplotlib.pyplot.xlim()
but I'm not sure how to use it properly if my x-axis is a DataFrame index composed of strings:
index = ['-3:0','0:3','3:6','6:9','9:12','12:15','15:18','18:21','21:24']
Any ideas? Thanks!

You can see the usage of xlim here. In this case, calling plt.xlim() with no arguments returns (0.0, 8.0). Because your index holds text rather than numbers, the x positions are simply the integer positions of the entries in the index. So you only need to widen the limits by however many steps you want to the left and right. For example:
plt.xlim(-1,len(df))
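This adds one step of padding on each side, so the first and last points and their error bars are no longer cut off at the edges of the axes (before and after images not reproduced here).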
Hope that helps.
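As a rough, self-contained sketch of the above: the index is the one from the question, but the 'mean' and 'std dev' values are made up for illustration, and the Agg backend is only selected so that the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed with %matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

index = ['-3:0', '0:3', '3:6', '6:9', '9:12', '12:15', '15:18', '18:21', '21:24']

# Hypothetical values, only here to make the example runnable.
df = pd.DataFrame(
    {'mean': [97.0, 96.5, 96.8, 95.9, 96.2, 97.1, 96.4, 96.0, 96.7],
     'std dev': [0.5, 0.7, 0.4, 0.9, 0.6, 0.5, 0.8, 0.6, 0.4]},
    index=index,
)

# With a string index the points sit at integer positions 0 .. len(df) - 1.
ax = df['mean'].plot(yerr=df['std dev'], color='b', ecolor='r')
ax.set_title('SpO2 Mean with Std Dev')
ax.set_xlabel('Time (s)')
ax.set_ylabel('SpO2')

# Pad by one position on each side so the outermost error bars are visible.
ax.set_xlim(-1, len(df))
```

Fractional padding such as ax.set_xlim(-0.5, len(df) - 0.5) works the same way if a full step looks too wide.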

Related

How to get rid of scientific notation in the y-axis of this .plot.bar() pandas figure?

I tried using plt.ticklabel_format(style='plain'), but since the figure has been created using pandas, df.plot.bar(), it is not working. How can I make this work?
Use the following code:
ax = df.plot.bar(x="Place", y="Amount", rot=30, title="Amount of ...", legend=False)
ax.ticklabel_format(style='plain', axis='y');
Of course, parameters passed to df.plot.bar are according to my source data.
Set them according to your environment.
The important points are:
save the result of df.plot.bar in a variable (the axes object),
set the tick label format only for the y axis.
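A minimal runnable version of this, assuming a made-up DataFrame with "Place" and "Amount" columns (the values are illustrative only):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, large enough that the default formatter would switch
# to scientific notation on the y axis.
df = pd.DataFrame({'Place': ['A', 'B', 'C'],
                   'Amount': [1500000, 2750000, 4200000]})

# Keep the Axes object returned by pandas ...
ax = df.plot.bar(x='Place', y='Amount', rot=30, legend=False)

# ... and restrict the format change to the y axis. The x axis of a bar
# plot carries fixed category labels, which ticklabel_format cannot handle.
ax.ticklabel_format(style='plain', axis='y')

ax.figure.canvas.draw()
y_labels = [t.get_text() for t in ax.get_yticklabels()]
```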

Colorbar based legend in python matplotlib

In the graphic below, I want to put in a legend for the calendar plot. The calendar plot was made using ax.plot(...,label='a') and drawing rectangles in a 52x7 grid (52 weeks, 7 days per week).
The legend is currently made using:
plt.gca().legend(loc="upper right")
How do I correct this legend to something more like a colorbar? Also, the colorbar should be placed at the bottom of the plot.
EDIT:
Uploaded code and data for reproducing this here:
https://www.dropbox.com/sh/8xgyxybev3441go/AACKDiNFBqpsP1ZttsZLqIC4a?dl=0
Aside - existing bugs
The code you put on the dropbox doesn't work "out of the box". In particular - you're trying to divide a datetime.timedelta by a numpy.timedelta64 in two places and that fails.
You do your own normalisation and colour mapping (calling into color_list based on an int() conversion of your normalised value). You subtract 1 from this and you don't need to - you already floor the value by using int(). The result of doing this is that you can get an index of -1 which means your very smallest values are incorrectly mapped to the colour for the maximum value. This is most obvious if you plot column 'BIOM'.
I've hacked this by adding a tiny value (0.00001) to the total range of the values that you divide by. It's a hack - I'm not sure that this method of mapping is at all the best use of matplotlib, but that's a different question entirely.
Solution adapting your code
With those bugs fixed, and adding a last subplot below all the existing ones (i.e. replacing 3 with 4 in all your calls to subplot2grid()), you can do the following:
Replace your
plt.gca().legend(loc="upper right")
with
# plot an overall colorbar type legend
# Grab the new axes object to plot the colorbar on
ax_colorbar = plt.subplot2grid((4,num_yrs), (3,0),rowspan=1,colspan=num_yrs)
mappableObject = matplotlib.cm.ScalarMappable(cmap = palettable.colorbrewer.sequential.BuPu_9.mpl_colormap)
mappableObject.set_array(numpy.array(df[col_name]))
col_bar = fig.colorbar(mappableObject, cax = ax_colorbar, orientation = 'horizontal', boundaries = numpy.arange(min_val,max_val,(max_val-min_val)/10))
# You can change the boundaries kwarg to either make the scale look less boxy (increase 10)
# or to get different values on the tick marks, or even omit it altogether to let
col_bar.set_label(col_name)
ax_colorbar.set_title(col_name + ' color mapping')
I tested this with two of your columns ('NMN' and 'BIOM') and on Python 2.7 (I assume you're using Python 2.x given the print statement syntax)
The finalised code that works directly with your data file is in a gist here
You get a horizontal colorbar in a new row beneath the existing plots (image not reproduced here).
How does it work?
It creates a ScalarMappable object that matplotlib can use to map values to colors. It sets the array that this map is based on to all the values in the column you are dealing with. It then uses Figure.colorbar() to add the colorbar, passing in the mappable object so that the labels are correct. I've added boundaries so that the minimum value is shown explicitly; you can omit that if you want matplotlib to sort that out for itself.
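The same mechanism can be shown in isolation. This is only a sketch under assumptions: it uses matplotlib's built-in 'BuPu' colormap instead of palettable, a made-up values array in place of the DataFrame column, and a plain two-row subplots layout rather than subplot2grid.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

values = np.array([3.0, 7.5, 12.0, 20.0])  # hypothetical stand-in for df[col_name]

# Tall main axes on top, thin axes underneath to hold the colorbar.
fig, (ax_main, ax_colorbar) = plt.subplots(
    2, 1, gridspec_kw={'height_ratios': [10, 1]})

# The mappable ties a value range (norm) to a colormap.
norm = Normalize(vmin=values.min(), vmax=values.max())
mappable = ScalarMappable(norm=norm, cmap='BuPu')
mappable.set_array(values)

# Draw the colorbar into the dedicated axes, laid out horizontally.
col_bar = fig.colorbar(mappable, cax=ax_colorbar, orientation='horizontal')
col_bar.set_label('value')
```

The height_ratios setting is also one way to keep the colorbar from becoming too tall.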
P.S. I've set the colormap to palettable.colorbrewer.sequential.BuPu_9.mpl_colormap, matching your get_colors() function which gets these colours as a 9 member list. I strongly recommend importing the colormap you want to use under a nice name to make the use of mpl_colors and mpl_colormap easier to understand. BuPu_9 is an object inside the sequential module rather than a module itself, so import it with from ... import, e.g.
from palettable.colorbrewer.sequential import BuPu_9 as color_scale
Then access it as
color_scale.mpl_colormap
That way, you can keep your code DRY and change the colors with only one change.
Layout (in response to comments)
The colorbar may be a little too big (certainly too tall) to look ideal. There are a few possible ways to fix that. I'll point you to two:
The "right" way to do it is probably to use a Gridspec
You could use your existing approach, but increase the number of rows and have the colorbar still in one row, while the other elements span more rows than they do currently.
I've implemented that with 9 rows, an extra column (so that the month labels don't get lost) and the colorbar on the bottom row, spanning 2 fewer columns than the main figure. I've also used tight_layout with w_pad=0.0 to avoid label clashes. You can play with this to get your exact preferred size. New code here.
This gives a shorter colorbar along the bottom row (image not reproduced here).
There are functions to do this in matplotlib.colorbar. With some specific code from your example, I could give you a better answer, but you'll use something like:
myColorbar = matplotlib.colorbar.ColorbarBase(myAxes, cmap=myColorMap,
                                              norm=myNorm,
                                              orientation='vertical')

Exponential values at X-axis in pyplot [duplicate]

I'm using Matplotlib in Python to plot simple x-y datasets. This produces nice-looking graphs, although when I "zoom in" too close on various sections of the plotted graph using the Figure View (which appears when you execute plt.show() ), the x-axis values change from standard number form (1050, 1060, 1070 etc.) to scientific form with exponential notation (e.g. 1, 1.5, 2.0 with the x-axis label given as +1.057e3).
I'd prefer my figures to retain the simple numbering of the axis, rather than using exponential form. Is there a way I can force Matplotlib to do this?
The formatting of tick labels is controlled by a Formatter object, which, assuming you haven't done anything fancy, will be a ScalarFormatter by default. This formatter will use a constant shift (an offset) if the fractional change of the visible values is very small. To avoid this, simply turn it off:
import numpy as np
import matplotlib.pyplot as plt
plt.plot(np.arange(0,100,10) + 1000, np.arange(0,100,10))
ax = plt.gca()
ax.get_xaxis().get_major_formatter().set_useOffset(False)
plt.draw()
If you want to avoid scientific notation in general,
ax.get_xaxis().get_major_formatter().set_scientific(False)
You can also control this globally via the axes.formatter.useoffset rcParam.
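A short sketch of the global route, using the same example data. The set_xlim call is only there to imitate zooming in on a narrow range, and the Agg backend is chosen so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Disable the offset for every axes created from here on.
matplotlib.rcParams['axes.formatter.useoffset'] = False

fig, ax = plt.subplots()
ax.plot(np.arange(0, 100, 10) + 1000, np.arange(0, 100, 10))

# Imitate zooming in: with the default settings this narrow range is
# where matplotlib would normally start showing an offset.
ax.set_xlim(1056.5, 1057.5)

fig.canvas.draw()
offset_text = ax.xaxis.get_major_formatter().get_offset()
```

After this, offset_text should be empty and the tick labels show the full values.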
You can use a simpler command to turn it off:
plt.ticklabel_format(useOffset=False)
You can use something like:
from matplotlib.ticker import ScalarFormatter, FormatStrFormatter
ax.xaxis.set_major_formatter(FormatStrFormatter('%.0f'))
Use the following command:
ax.ticklabel_format(useOffset=False, style='plain')
If you are using a subplot, you may experience the AttributeError: This method only works with the ScalarFormatter in which case you would add axis='y' like the below. You can change 'y' to the axis with the issues.
ax1.ticklabel_format(useOffset=False, style='plain', axis='y')
Source question and answer here. Note, the axis 'y' command use is hidden in the answer comments.
I used the code below before plotting the graphs, and it worked seamlessly for me:
plt.ticklabel_format(style='plain')
I likewise didn't want scientific notation to be shown when I zoom in, and the following worked in my case too. I am using Lat/Lon for labeling, where scientific form doesn't make sense.
plt.ticklabel_format(useOffset=False)

Changing the marker on the same set of data

I have a set of data that comes from two different sources, and I have multiple sets graphed together. So essentially 6 scatterplots with error bars (all different colors), and each scatterplot has two sources.
Basically I want the blue scatterplot to have two different markers, 'o' and 's'. I currently do this by plotting each point individually in a loop and checking whether the source is 1 or 2. If the source is 1 it plots an 's'; if the source is 2 it plots an 'o'.
However this method does not really allow for having a legend. (Data1, Data2,...Data6)
Is there a better way of doing this?
EDIT:
I want a cleaner method for this, something along the lines of
x=[1,2,3]
y=[4,5,6]
m=['o','s','^']
plt.scatter(x,y,marker=m)
But this returns an error Unrecognized marker style
A more pythonic way (but still a loop) might be something like
x=[1,2,3]
y=[4,5,6]
l=['data1','data2','data3']
m=['ob','sb','^b']
f,a = plt.subplots(1,1)
[a.plot(*data, label=lab) for data,lab in zip(zip(x,y,m),l)]
plt.legend(loc='lower right')
plt.xlim(0,4)
plt.ylim(3,7);
But I guess this is not the most efficient way if you have lots of datapoints.
If you want to use scatter try something like
m=['o','s','^']
f,a = plt.subplots(1,1)
[a.scatter(*data, marker=m1, label=l1) for data,m1,l1 in zip(zip(x,y),m,l)]
I'm pretty sure, there is also a possibility to apply ** and dicts here.
UPDATE:
Instead of looping over the plot command, you can use the ability of matplotlib's plot function to read an arbitrary number of x, y, fmt groups; see the docs.
x=np.random.random((3,6))
y=np.random.random((3,6))
l=['data1','data2','data3']
m=['ob','sb','^b']
plt.plot(*[i[j] for i in zip(x,y,m) for j in range(3)])
plt.legend(l,loc='lower right')
Calling plot in a loop is fine. You just need to keep the list of lines returned by plot and use fig.legend to create a legend for the whole figure. See http://matplotlib.org/examples/pylab_examples/figlegend_demo.html
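One possible way to keep the loop but still get a single legend entry per dataset is to loop over the sources rather than over the individual points, and label only the first group. This is a sketch with made-up data; the source array and the markers mapping are hypothetical names, and it uses the axes legend rather than fig.legend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical dataset: each point comes from source 1 or source 2.
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([4, 5, 6, 5, 4, 6])
yerr = np.full(6, 0.3)
source = np.array([1, 2, 1, 2, 2, 1])
markers = {1: 's', 2: 'o'}

fig, ax = plt.subplots()
for i, (src, marker) in enumerate(markers.items()):
    mask = source == src
    # Labels starting with an underscore are skipped by the automatic
    # legend, so only the first group contributes a 'Data1' entry.
    ax.errorbar(x[mask], y[mask], yerr=yerr[mask], fmt=marker, color='b',
                label='Data1' if i == 0 else '_nolegend_')

legend = ax.legend(loc='lower right')
```

Repeating this per dataset with a different color and label gives one legend entry per dataset, not one per point.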
Seconding @tcaswell's comments: .scatter() returns a collections.PathCollection, which provides a fast way of plotting a large number of identically shaped objects. You can use a loop to plot the data as many scatter plots (and many different datasets), but in my opinion that loses all the speed benefit provided by .scatter().
That said, it is not true that the dots in a scatter plot have to be identical. They can have different linewidths, edgecolors and many other properties, but they have to be the same shape. See this example, which assigns different colors (and plots only one dataset):
>>> sc=plt.scatter(x, y, label='test')
>>> sc.set_color(['r','g','b'])
>>> plt.legend()
See details in http://matplotlib.org/api/collections_api.html.
These were all alright, but not really what I was looking for. The problem was how I parsed through my data and how I could add a legend in a way that wouldn't mess that up. Since I used a for-loop and plotted each point individually, based on whether it was measured at observation location 1 or 2, any legend I made would contain over 50 entries. So I plotted my data as full sets (invisibly and with no change in symbols), then again in color with the varying symbols. This worked better. Thanks though.

How do I reuse a plot layout in iPython notebook?

The code below gives me the image even further below.
flowRates=[2,5,10,20,50]
flowRateTol=0.2
#sets the limits for the plot
xRange=(0,700)
yRange=(0,70)
ax=axes()
ax.set_xlabel('Time (s)')
#ax.set_ylabel('Reaction Force (lbf)')
ax.legend(loc=0)
#set up the second axis
ax.twinx()
ax.set_ylabel('10s Average Flow Rate')
ax.set_xlim(xRange)
ax.set_ylim(yRange)
#shade the acceptable tolerance bands
for flowRate in flowRates:
    rectX=[0,xRange[1],xRange[1],0]
    rectY=[ flowRate*(1-flowRateTol),
            flowRate*(1-flowRateTol),
            flowRate*(1+flowRateTol),
            flowRate*(1+flowRateTol)]
    ax.fill(rectX,rectY,'b', alpha=0.2, edgecolor='r')
However, what I'd like to do in my next iPython cell is to actually plot data on the graph. The code I'm using to do so (unsuccessfully) is just a call to ax.plot(), but I can't get a graph to show up with my data.
Any thoughts? My goal is to have a workflow (that I will present) that goes something like this:
Look how I import my data!
This is how I set up my graph! (show the base plot)
This is how I plot all my data! (show the base plot with the data)
This is how I filter my data! (do some fancy filtering)
This is what the filtered data looks like! (show new data on same base plot)
I would suggest packaging the different ideas into functions, e.g.
This is how I import data:
def Import_Data(file_name,...):
    # Stuff to import data
    return data
This is how I plot my data:
def Plot(data..)
Plotting just the base plot seems like a special case that you may do once, but if you really want to be able to show this while minimising the amount of repeated code, just allow data=None, in which case the function plots nothing rather than raising an error.
The great thing about splitting code up like this is that it is easy to make changes to just one function, provided you then only worry about its inputs and outputs. For instance, to filter you can either add a filter parameter to the plot function, or create new filtered data that is plotted in the same way!
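A rough sketch of that idea, using the limits and tolerance bands from the question. The function names make_base_plot and plot_data are hypothetical, axhspan is swapped in for the hand-built fill rectangles, and the sample data in the usage lines is made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt


def make_base_plot(flow_rates=(2, 5, 10, 20, 50), tol=0.2,
                   x_range=(0, 700), y_range=(0, 70)):
    """Build a fresh figure with the shaded tolerance bands and return it."""
    fig, ax = plt.subplots()
    ax.set_xlabel('Time (s)')
    ax.set_ylabel('10s Average Flow Rate')
    ax.set_xlim(x_range)
    ax.set_ylim(y_range)
    for rate in flow_rates:
        # Horizontal band covering rate +/- tol across the full width.
        ax.axhspan(rate * (1 - tol), rate * (1 + tol),
                   facecolor='b', edgecolor='r', alpha=0.2)
    return fig, ax


def plot_data(time=None, flow=None, label=None):
    """Draw the base plot and, if data is given, the data on top of it."""
    fig, ax = make_base_plot()
    if time is not None and flow is not None:
        ax.plot(time, flow, label=label)
        ax.legend(loc=0)
    return fig, ax


# One call per notebook cell: base plot only, then base plot with data.
base_fig, base_ax = plot_data()
data_fig, data_ax = plot_data([0, 100, 200, 300], [2, 5, 10, 20], label='raw')
```

Rebuilding the figure in each cell sidesteps the question of whether an Axes created in an earlier cell is still displayed, which depends on how the inline backend is configured.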
