I am trying to create a grouped bar chart with multiple subplots using matplotlib and pandas. I can create it by manually defining the plots according to the values in the dataframe, but I want to automate it with loops. I have tried many ways of writing the loop, but I run into one error or another every time. Being a beginner in both programming and Python, I am getting lost. Here's my data: sales3
The code I have written to get the expected output:
sales3 = sales.groupby(["Region","Tier"])[["Sales2015","Sales2016"]].sum().round().astype("int64")
sales3.reset_index(inplace=True)
fig,(ax1,ax2,ax3) = plt.subplots(nrows=1,ncols=3,sharex=True,sharey=True,figsize=(10,6))
sales3[sales3["Region"]=="Central"].plot(kind="bar",x="Tier",y=["Sales2015","Sales2016"],ax=ax1)
ax1.set_title("Central")
sales3[sales3["Region"]=="East"].plot(kind="bar",x="Tier",y=["Sales2015","Sales2016"],ax=ax2)
ax2.set_title("East")
sales3[sales3["Region"]=="West"].plot(kind="bar",x="Tier",y=["Sales2015","Sales2016"],ax=ax3)
ax3.set_title("West")
plt.tight_layout()
output:
expected output
Please guide me on how to write this using a loop or some other automated way. Say another region like "North" or "South" is added in the future, or a new Tier is introduced: what would be the best way to write the code so that it accommodates such additions?
You can iterate through the axes and regions:
sales3 = sales.groupby(["Region","Tier"])[["Sales2015","Sales2016"]].sum().round().astype("int64")
sales3.reset_index(inplace=True)
fig,axes = plt.subplots(nrows=1,ncols=3,sharex=True,sharey=True,figsize=(10,6))
# define regions to plot
regions = ["Central", "East", "West"]
# iterate over regions and axes using zip()
for region, ax in zip(regions,axes):
    sales3[sales3["Region"]==region].plot(kind="bar",x="Tier",y=["Sales2015","Sales2016"],ax=ax)
    ax.set_title(region)
plt.tight_layout()
I think the key is using Python's built-in zip function, which is documented here.
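To cope with regions added later, one option is to derive the list of regions from the data instead of hard-coding it. The following is only a minimal sketch: the sales3 frame is built from made-up numbers (including a hypothetical "North" region), and the figure width of 4 inches per region is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the aggregated sales3 frame (values are illustrative only)
sales3 = pd.DataFrame({
    "Region": ["Central", "Central", "East", "East", "West", "West", "North", "North"],
    "Tier": ["High", "Low"] * 4,
    "Sales2015": [10, 20, 30, 40, 50, 60, 70, 80],
    "Sales2016": [15, 25, 35, 45, 55, 65, 75, 85],
})

# Take the regions from the data, in order of first appearance
regions = sales3["Region"].unique()

# One column of subplots per region; squeeze=False keeps axes a 2D array
# even if there is only a single region
fig, axes = plt.subplots(nrows=1, ncols=len(regions), sharex=True, sharey=True,
                         figsize=(4 * len(regions), 6), squeeze=False)

for region, ax in zip(regions, axes.flat):
    subset = sales3[sales3["Region"] == region]
    subset.plot(kind="bar", x="Tier", y=["Sales2015", "Sales2016"], ax=ax)
    ax.set_title(region)

plt.tight_layout()
```

A new Tier needs no change at all, because each subset simply contains one more row and therefore one more group of bars.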
I am a first poster here, please bear with me as I try to present my issue :)
I have a for loop in which I generate a set of data (in this case a list, but it could be converted to a data frame if needed). Each time the loop iterates, I generate a histogram from the new data for that iteration, and I want to add it to my subplot. This is what I have tried:
well_ID = ["A2","A3","A4","B2","B3","B4"]
FOV = range(1,31)
for index_row,i in enumerate(well_ID):
    for j in FOV:
        #Here I have plenty of functions analysing images and thresholding them and eventually I get to the following:
        my_data_list = [...]  #huge dataset with lots of values
        #What I would like to do:
        fig, axes = plt.subplots(nrows=6, ncols=5)
        fig.subplots_adjust(hspace=0.5)
        axes[index_row,j].hist(my_data_list, bins = 50, rwidth=0.9,color = 'skyblue')
When I run the above code, each iteration produces a new figure in which only that iteration's dataset is plotted at position axes[index_row,j]; the data from previous iterations is not kept (I expected the same grid of subplots to be updated with each new dataset).
Can anybody help with this? Thank you very much!!! :D
(feel free to ask any specific questions to make it clearer, I just tried to simplify it because the actual code is very long)
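A likely cause is that plt.subplots() is called inside the inner loop, so every iteration starts a fresh, empty figure. A minimal sketch of the usual pattern is to create the figure once, before the loops, and only index into the existing axes inside them. The random data is a placeholder for the real analysis output, and FOV is assumed here to have 5 entries (0 to 4) so that it fits the 6x5 grid; with range(1,31) the column index would run past the grid.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

well_ID = ["A2", "A3", "A4", "B2", "B3", "B4"]
FOV = range(5)  # assumption: 5 fields of view per well, to match the 6x5 grid

rng = np.random.default_rng(0)

# Create the figure and the grid of axes ONCE, before the loops
fig, axes = plt.subplots(nrows=len(well_ID), ncols=len(FOV), figsize=(15, 12))
fig.subplots_adjust(hspace=0.5)

for index_row, well in enumerate(well_ID):
    for j in FOV:
        # Placeholder for the dataset produced by the image analysis
        my_data_list = rng.normal(size=1000)
        # Draw into the already existing axes for this well and field of view
        axes[index_row, j].hist(my_data_list, bins=50, rwidth=0.9, color="skyblue")
        axes[index_row, j].set_title(f"{well} / FOV {j}", fontsize=8)
```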
I'm new to Python, so I hope you'll forgive my silly questions. I have read a dataset from Excel with pandas. The dataset is composed of 3 functions (U22, U35, U55) that share the same index (called y/75). [image]
Now I would like to "turn" the graph so that the index "y/75" goes on the y-axis instead of the x-axis, keeping all the functions in the same graph. The result I want looks like the following picture: [image]
the code I've used is
var = pd.read_excel('path.xlsx','SummarySheet', index_col=0)
norm_vel=var[['U22',"U35","U55"]]
norm_vel.plot(figsize=(10,10), grid=True)
With this code I couldn't find a way to swap the axes. I then tried a different approach, which does turn the graph, but I could only plot the functions one at a time rather than all in the same graph:
var = pd.read_excel('path.xlsx','SummarySheet', index_col=False)
norm_vel2=var[['y/75','U22',"U35","U55"]]
norm_vel2.plot( x='U22', y='y/75', figsize=(10,10), grid=True )
plt.title("Velocity profiles")
plt.xlabel("Normalized velocity")
plt.ylabel("y/75")
which gives this: [image]
I am not very familiar with dataframe plotting. To be honest, I've been watching this question expecting that someone would give an obvious answer. Since no one has (for a one-hour-old question, it is already late for obvious answers), I can at least tell you how I would do it without the plot method of the dataframe:
plt.figure(figsize=(10,10))
plt.grid(True)
plt.plot(var[['U22',"U35","U55"]], var['y/75'])
plt.title("Velocity profiles")
plt.xlabel("Normalized velocity")
plt.ylabel("y/75")
If you are used to matplotlib, where you can pass multiple series as both x and y, instinct says that the pandas plotting helpers (which are just convenience functions that call matplotlib with the right parameters) should let you simply call
var.plot(x=['U22', 'U35', 'U55'], y='y/75')
Since after all,
var.plot(x='y/75', y=['U22', 'U35', 'U55'])
works as expected (3 lines: U22 vs y/75, U35 vs y/75, U55 vs y/75). So the first one should also have worked (3 lines: y/75 vs U22, y/75 vs U35, y/75 vs U55). But it doesn't. That is probably why the pandas documentation itself says that these matplotlib helpers are still a work in progress.
So all you have to do is call the matplotlib function yourself. After all, pandas is not doing much more than that when you call its .plot method anyway.
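For reference, here is a self-contained sketch of that direct matplotlib call. The velocity profiles are made up (simple power laws), since the original spreadsheet is not available, and the columns are converted with .to_numpy() to keep the shapes explicit.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data standing in for the Excel sheet: three profiles sharing one y/75 column
y = np.linspace(0, 1, 20)
var = pd.DataFrame({"y/75": y, "U22": y**0.5, "U35": y**0.4, "U55": y**0.3})

columns = ["U22", "U35", "U55"]
fig, ax = plt.subplots(figsize=(10, 10))
ax.grid(True)

# x is a 2D array with one column per function, y is 1D: matplotlib draws
# one line per column of x, all against the same y values
lines = ax.plot(var[columns].to_numpy(), var["y/75"].to_numpy())
ax.legend(lines, columns)

ax.set_title("Velocity profiles")
ax.set_xlabel("Normalized velocity")
ax.set_ylabel("y/75")
```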
In the graphic below, I want to put in a legend for the calendar plot. The calendar plot was made using ax.plot(...,label='a') and drawing rectangles in a 52x7 grid (52 weeks, 7 days per week).
The legend is currently made using:
plt.gca().legend(loc="upper right")
How do I correct this legend to something more like a colorbar? Also, the colorbar should be placed at the bottom of the plot.
EDIT:
Uploaded code and data for reproducing this here:
https://www.dropbox.com/sh/8xgyxybev3441go/AACKDiNFBqpsP1ZttsZLqIC4a?dl=0
Aside - existing bugs
The code you put on the dropbox doesn't work "out of the box". In particular - you're trying to divide a datetime.timedelta by a numpy.timedelta64 in two places and that fails.
You do your own normalisation and colour mapping (calling into color_list based on an int() conversion of your normalised value). You subtract 1 from this and you don't need to - you already floor the value by using int(). The result of doing this is that you can get an index of -1 which means your very smallest values are incorrectly mapped to the colour for the maximum value. This is most obvious if you plot column 'BIOM'.
I've hacked this by adding a tiny value (0.00001) to the total range of the values that you divide by. It's a hack - I'm not sure that this method of mapping is at all the best use of matplotlib, but that's a different question entirely.
Solution adapting your code
With those bugs fixed, and adding a last subplot below all the existing ones (i.e. replacing 3 with 4 in all your calls to subplot2grid()), you can do the following:
Replace your
plt.gca().legend(loc="upper right")
with
# plot an overall colorbar type legend
# Grab the new axes object to plot the colorbar on
ax_colorbar = plt.subplot2grid((4,num_yrs), (3,0),rowspan=1,colspan=num_yrs)
mappableObject = matplotlib.cm.ScalarMappable(cmap = palettable.colorbrewer.sequential.BuPu_9.mpl_colormap)
mappableObject.set_array(numpy.array(df[col_name]))
col_bar = fig.colorbar(mappableObject, cax = ax_colorbar, orientation = 'horizontal', boundaries = numpy.arange(min_val,max_val,(max_val-min_val)/10))
# You can change the boundaries kwarg to either make the scale look less boxy (increase 10)
# or to get different values on the tick marks, or even omit it altogether to let
# matplotlib work the boundaries out for itself
col_bar.set_label(col_name)
ax_colorbar.set_title(col_name + ' color mapping')
I tested this with two of your columns ('NMN' and 'BIOM') and on Python 2.7 (I assume you're using Python 2.x given the print statement syntax)
The finalised code that works directly with your data file is in a gist here
You get
How does it work?
It creates a ScalarMappable object that matplotlib can use to map values to colors. It sets the array that this map is based on to all the values in the column you are dealing with. It then uses Figure.colorbar() to add the colorbar, passing in the mappable object so that the labels are correct. I've added boundaries so that the minimum value is shown explicitly; you can omit that if you want matplotlib to sort that out for itself.
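Stripped of the calendar code, the mechanism can be sketched as below. This is an illustration under assumptions, not the code from the gist: the values are made up, matplotlib's built-in "BuPu" colormap stands in for the palettable one, and an explicit Normalize is used so the value range is visible in the code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.cm
import matplotlib.colors
import matplotlib.pyplot as plt
import numpy as np

values = np.array([3.0, 7.5, 12.0, 20.0])  # made-up stand-in for df[col_name]

# The mappable ties a value range (norm) to a colormap
norm = matplotlib.colors.Normalize(vmin=values.min(), vmax=values.max())
mappable = matplotlib.cm.ScalarMappable(norm=norm, cmap="BuPu")
mappable.set_array(values)

fig = plt.figure(figsize=(8, 4))
# Main plot in the top three rows, colorbar axes in the bottom row
ax_main = plt.subplot2grid((4, 1), (0, 0), rowspan=3)
ax_colorbar = plt.subplot2grid((4, 1), (3, 0))

# cax tells colorbar() to draw into the given axes instead of stealing space
col_bar = fig.colorbar(mappable, cax=ax_colorbar, orientation="horizontal")
col_bar.set_label("value")
```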
P.S. I've set the colormap to palettable.colorbrewer.sequential.BuPu_9.mpl_colormap, matching your get_colors() function, which gets these colours as a 9-member list. I strongly recommend importing the colormap you want to use under a nice name, to make the use of mpl_colors and mpl_colormap easier to understand, e.g.
from palettable.colorbrewer.sequential import BuPu_9 as color_scale
Then access it as
color_scale.mpl_colormap
That way, you can keep your code DRY and change the colors with only one change.
Layout (in response to comments)
The colorbar may be a little big (certainly too tall) to look ideal. There are a few possible ways to shrink it. I'll point you to two:
The "right" way to do it is probably to use a Gridspec
You could use your existing approach, but increase the number of rows and have the colorbar still in one row, while the other elements span more rows than they do currently.
I've implemented the second option with 9 rows, an extra column (so that the month labels don't get lost) and the colorbar on the bottom row, spanning two fewer columns than the main figure. I've also used tight_layout with w_pad=0.0 to avoid label clashes. You can play with this to get your exact preferred size. New code here.
This gives:
There are functions to do this in matplotlib.colorbar. With some specific code from your example, I could give you a better answer, but you'll use something like:
myColorbar = matplotlib.colorbar.ColorbarBase(myAxes, cmap=myColorMap,
norm=myNorm,
orientation='vertical')
I have a set of data that comes from two different sources, and I have multiple sets graphed together. So essentially 6 scatterplots with error bars (all different colors), and each scatterplot has two sources.
Basically I want the blue scatterplot to have two different markers, 'o' and 's'. I currently do this by plotting each point individually in a loop and checking whether the source is 1 or 2. If the source is 1 it plots an 's'; if the source is 2 it plots an 'o'.
However this method does not really allow for having a legend. (Data1, Data2,...Data6)
Is there a better way of doing this?
EDIT:
I want a cleaner method for this, something along the lines of
x=[1,2,3]
y=[4,5,6]
m=['o','s','^']
plt.scatter(x,y,marker=m)
But this returns the error "Unrecognized marker style".
A more pythonic way (but still a loop) might be something like
x=[1,2,3]
y=[4,5,6]
l=['data1','data2','data3']
m=['ob','sb','^b']
f,a = plt.subplots(1,1)
[a.plot(*data, label=lab) for data,lab in zip(zip(x,y,m),l)]
plt.legend(loc='lower right')
plt.xlim(0,4)
plt.ylim(3,7);
But I guess this is not the most efficient way if you have lots of datapoints.
If you want to use scatter try something like
m=['o','s','^']
f,a = plt.subplots(1,1)
[a.scatter(*data, marker=m1, label=l1) for data,m1,l1 in zip(zip(x,y),m,l)]
I'm pretty sure, there is also a possibility to apply ** and dicts here.
UPDATE:
Instead of looping over the plot command, you can use the ability of matplotlib's plot function to read an arbitrary number of x,y,fmt groups; see the docs.
x=np.random.random((3,6))
y=np.random.random((3,6))
l=['data1','data2','data3']
m=['ob','sb','^b']
plt.plot(*[i[j] for i in zip(x,y,m) for j in range(3)])
plt.legend(l,loc='lower right')
Calling plot in a loop is fine. You just need to keep the list of lines returned by plot and use fig.legend to create a legend for the whole figure. See http://matplotlib.org/examples/pylab_examples/figlegend_demo.html
Seconding @tcaswell's comments: .scatter() returns a collections.PathCollection, which provides a fast way of plotting a large number of identically shaped objects. You can use a loop to plot the data as many scatter plots (and many different datasets), but in my opinion that loses all the speed benefit provided by .scatter().
That said, it is not true that the dots in a scatter plot have to be identical. They can have different linewidths, edgecolors and many other properties, but they do have to be the same shape. See this example, which assigns different colors (and plots only one dataset):
>>> sc=plt.scatter(x, y, label='test')
>>> sc.set_color(['r','g','b'])
>>> plt.legend()
See details in http://matplotlib.org/api/collections_api.html.
These were all alright, but not really what I was looking for. The problem was how I parsed my data and how I could add a legend in a way that wouldn't mess that up. Since I used a for-loop and plotted each point individually, depending on whether it was measured at observation location 1 or 2, any legend I made had over 50 entries. So I plotted my data as full sets (invisibly and with no change in symbols) and then again in color with the varying symbols. This worked better. Thanks though.
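A related pattern, which is not the poster's exact approach, is to plot each dataset in one scatter call per source and give a real label to only one of those calls, so that each dataset gets a single legend entry. The sketch below uses made-up data; the source-to-marker mapping follows the question, and the "_nolegend_" label is matplotlib's way of hiding an artist from the legend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
markers = {1: "s", 2: "o"}  # source -> marker, as described in the question
colors = ["b", "g", "r"]    # one color per dataset (3 datasets for brevity)

fig, ax = plt.subplots()
for k, color in enumerate(colors):
    # Made-up data: 20 points per dataset, each tagged with source 1 or 2
    x = rng.random(20)
    y = rng.random(20) + k
    source = rng.integers(1, 3, size=20)
    for i, (src, marker) in enumerate(markers.items()):
        mask = source == src
        # Only the first call per dataset gets a visible legend label
        label = f"Data{k + 1}" if i == 0 else "_nolegend_"
        ax.scatter(x[mask], y[mask], color=color, marker=marker, label=label)

legend = ax.legend(loc="lower right")
```

The legend then shows one entry per dataset, drawn with the marker of the first source only.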
The code below gives me the image even further below.
flowRates=[2,5,10,20,50]
flowRateTol=0.2
#sets the limits for the plot
xRange=(0,700)
yRange=(0,70)
ax=axes()
ax.set_xlabel('Time (s)')
#ax.set_ylabel('Reaction Force (lbf)')
ax.legend(loc=0)
#set up the second axis
ax.twinx()
ax.set_ylabel('10s Average Flow Rate')
ax.set_xlim(xRange)
ax.set_ylim(yRange)
#shade the acceptable tolerance bands
for flowRate in flowRates:
    rectX=[0,xRange[1],xRange[1],0]
    rectY=[ flowRate*(1-flowRateTol),
            flowRate*(1-flowRateTol),
            flowRate*(1+flowRateTol),
            flowRate*(1+flowRateTol)]
    ax.fill(rectX,rectY,'b', alpha=0.2, edgecolor='r')
However, what I'd like to do in my next IPython cell is actually plot data on the graph. The code I'm using to do so (unsuccessfully) is just a call to ax.plot(), but I can't get a graph to show up with my data.
Any thoughts? My goal is to have a workflow (that I will present) that goes something like this:
Look how I import my data!
This is how I set up my graph! (show the base plot)
This is how I plot all my data! (show the base plot with the data)
This is how I filter my data! (do some fancy filtering)
This is what the filtered data looks like! (show new data on same base plot)
I would suggest packaging the different ideas into functions. E.g.
This is how I import data:
def Import_Data(file_name,...):
    # Stuff to import data
    return data
This is how I plot my data:
def Plot(data, ...):
    # Stuff to plot data
Plotting just the base plot seems like a special case that you may only do once. But if you really want to be able to show it, and to minimise the amount of repeated code, just allow data=None, in which case the function skips the data and plots only the base.
The great thing about splitting code up like this is that it is easy to make changes to just one function, provided you only have to worry about its inputs and outputs. For instance, to filter you can either add a filter parameter to the plot function, or create new filtered data that is plotted in the same way!
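As a rough sketch of that split, the base plot and the data plotting could look like the following. The function names make_base_plot and plot_data are hypothetical, the sample data is made up, and axhspan is used here as a shorter way to shade the tolerance bands than the fill() call in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

def make_base_plot(flow_rates=(2, 5, 10, 20, 50), tol=0.2,
                   x_range=(0, 700), y_range=(0, 70)):
    """Create the empty graph with the shaded tolerance bands."""
    fig, ax = plt.subplots()
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("10s Average Flow Rate")
    ax.set_xlim(x_range)
    ax.set_ylim(y_range)
    for rate in flow_rates:
        # Horizontal band from rate*(1-tol) to rate*(1+tol) across the full width
        ax.axhspan(rate * (1 - tol), rate * (1 + tol), color="b", alpha=0.2)
    return fig, ax

def plot_data(ax, data=None, **kwargs):
    """Plot (times, values) on ax; with data=None nothing is drawn."""
    if data is None:
        return []
    times, values = data
    return ax.plot(times, values, **kwargs)

fig, ax = make_base_plot()
nothing = plot_data(ax)  # base plot only
lines = plot_data(ax, ([0, 100, 200], [5, 10, 20]), label="raw")
```

In a notebook with inline plotting, a figure is rendered when its cell finishes, so to show the updated graph from a later cell you would keep the fig object and make fig the last expression of that cell.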