Plotting ordinal data with a marker in matplotlib


I have some data for which I have both experimental and simulated values. The data isn't really continuous without introducing a new definition (which I don't want to do), so I want to display it ordinally in a scatter-type plot with two markers for each set, and then label each set on the x-axis.
Basically I can't figure out how to do this with matplotlib (I'd prefer to use it for consistency with how I've presented other data).
An example of the data is presented below:
1cm square: 0.501, 0.505
1cm circle: 0.450, 0.448
1cm X 2cm rect: 0.665, 0.641

I may be misunderstanding the question, but it sounds like you're wanting something along these lines:
import matplotlib.pyplot as plt
# Using this layout to make the grouping clear
data = [('apples', [0.1, 0.25]),
        ('oranges', [0.6, 0.35]),
        ('pears', [0.1, 0.18]),
        ('bananas', [0.7, 0.98]),
        ('peaches', [0.6, 0.48])]
# Reorganize our data a bit
names = [item[0] for item in data]
predicted = [item[1][0] for item in data]
observed = [item[1][1] for item in data]
# It might make more sense to use a bar chart in this case.
# You could also use `ax.scatter` to plot the data instead of `ax.plot`
fig, ax = plt.subplots()
ax.plot(predicted, color='lightblue', marker='o', linestyle='none',
        markersize=12, label='Predicted')
ax.plot(observed, color='red', marker='s', linestyle='none',
        markersize=12, label='Observed')
ax.margins(0.05)
ax.set(xticks=range(len(names)), xticklabels=names, ylabel='Meaningless')
ax.legend(loc='best', numpoints=1)
ax.grid(axis='x')
plt.show()
The key part is setting the xticks and xticklabels to correspond to your data "groups". You could plot the data in other ways (e.g. bar plots), but setting the xticks/xticklabels works the same way in each case.
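Applied to the three shapes from the question, the same tick-labelling idea might look like the sketch below. It assumes that the first number in each pair is the experimental value and the second is the simulated one. The variable names are my own, and it uses ax.scatter instead of ax.plot.

```python
import matplotlib.pyplot as plt

# Data from the question; assumed order of each pair: (experimental, simulated)
data = {
    '1cm square': (0.501, 0.505),
    '1cm circle': (0.450, 0.448),
    '1cm X 2cm rect': (0.665, 0.641),
}

names = list(data)
positions = list(range(len(names)))  # one ordinal x position per shape
experimental = [pair[0] for pair in data.values()]
simulated = [pair[1] for pair in data.values()]

fig, ax = plt.subplots()
ax.scatter(positions, experimental, marker='o', s=80, label='Experimental')
ax.scatter(positions, simulated, marker='x', s=80, label='Simulated')

# Label each ordinal position with its shape name instead of a number
ax.set_xticks(positions)
ax.set_xticklabels(names)
ax.margins(x=0.2)  # keep the outer markers away from the axes edges
ax.legend()
# plt.show()
```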

Related

Matplotlib, 'Figure' object has no attribute 'figlegend' [duplicate]

I am plotting the same type of information, but for different countries, with multiple subplots in Matplotlib. That is, I have nine plots on a 3x3 grid, all with the same four lines (of course, with different values per line).
However, I have not figured out how to put a single legend on the figure just once (since all nine subplots have the same lines).
How do I do that?

There is also a nice function, get_legend_handles_labels(), that you can call on the last Axes (if you iterate over them). It collects everything you need from the label= arguments:
handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, loc='upper center')
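Here is a minimal sketch of that approach for the 3x3 case in the question, using made-up data. Every subplot draws the same four labelled lines, and the handles and labels are collected from the last Axes only. That works here because every Axes has the same lines. The names `countries` and `series` are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

countries = [f'Country {i}' for i in range(9)]  # placeholder titles
series = ['A', 'B', 'C', 'D']                   # the same four lines everywhere
x = np.arange(10)

fig, axes = plt.subplots(3, 3, sharex=True, figsize=(9, 8))
for i, (ax, country) in enumerate(zip(axes.flat, countries)):
    for j, name in enumerate(series):
        # Different values per country, same label per line
        ax.plot(x, (j + 1) * x + i, label=name)
    ax.set_title(country)

# Every Axes has the same four lines, so the last one is enough
handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, loc='upper center', ncol=len(series))
# plt.show()
```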

figlegend may be what you're looking for: matplotlib.pyplot.figlegend
An example is at Figure legend demo.
Another example:
plt.figlegend(lines, labels, loc='lower center', ncol=5, labelspacing=0.)
Or:
fig.legend(lines, labels, loc=(0.5, 0), ncol=5)

TL;DR
lines_labels = [ax.get_legend_handles_labels() for ax in fig.axes]
lines, labels = [sum(lol, []) for lol in zip(*lines_labels)]
fig.legend(lines, labels)
I have noticed that none of the other answers shows an image with a single legend referencing many curves in different subplots, so I have to show you one... to make you curious...
Now, if I've teased you enough, here is the code:
from numpy import linspace
import matplotlib.pyplot as plt
# each Axes has a brand new prop_cycle, so to have differently
# colored curves in different Axes, we need our own prop_cycle
# Note: we CALL the axes.prop_cycle to get an itertools.cycle
color_cycle = plt.rcParams['axes.prop_cycle']()
# I need some curves to plot
x = linspace(0, 1, 51)
functs = [x*(1-x), x**2*(1-x),
          0.25-x*(1-x), 0.25-x**2*(1-x)]
labels = ['$x-x²$', '$x²-x³$',
          '$\\frac{1}{4} - (x-x²)$', '$\\frac{1}{4} - (x²-x³)$']
# the plot,
fig, (a1, a2) = plt.subplots(2)
for ax, f, l, cc in zip((a1, a1, a2, a2), functs, labels, color_cycle):
    ax.plot(x, f, label=l, **cc)
    ax.set_aspect(2)  # superfluous, but nice
# So far, nothing special except the managed prop_cycle. Now the trick:
lines_labels = [ax.get_legend_handles_labels() for ax in fig.axes]
lines, labels = [sum(lol, []) for lol in zip(*lines_labels)]
# Finally, the legend (that maybe you'll customize differently)
fig.legend(lines, labels, loc='upper center', ncol=4)
plt.show()
If you want to stick with the official Matplotlib API, this is perfect; otherwise see Note 1 below (there is a private method...).
The two lines
lines_labels = [ax.get_legend_handles_labels() for ax in fig.axes]
lines, labels = [sum(lol, []) for lol in zip(*lines_labels)]
deserve an explanation; see Note 2 below.
I tried the method proposed by the most upvoted and accepted answer,
# fig.legend(lines, labels, loc='upper center', ncol=4)
fig.legend(*a2.get_legend_handles_labels(),
           loc='upper center', ncol=4)
and this is what I got: the legend lists only the two curves of a2, the last Axes.
Note 1
If you don't mind using a private function of the matplotlib.legend module... it's really much, much easier:
from matplotlib.legend import _get_legend_handles_labels
...
fig.legend(*_get_legend_handles_labels(fig.axes), ...)
Note 2
I have encapsulated the two tricky lines in a function: just four lines of code, but heavily commented.
def fig_legend(fig, **kwdargs):
    # Generate a sequence of tuples, each contains
    #  - a list of handles (lohand) and
    #  - a list of labels (lolbl)
    tuples_lohand_lolbl = (ax.get_legend_handles_labels() for ax in fig.axes)
    # E.g., a figure with two axes, ax0 with two curves, ax1 with one curve
    # yields:   ([ax0h0, ax0h1], [ax0l0, ax0l1]) and ([ax1h0], [ax1l0])
    # The legend needs a list of handles and a list of labels,
    # so our first step is to transpose our data,
    # generating two tuples of lists of homogeneous stuff (tolohs), i.e.,
    # we yield ([ax0h0, ax0h1], [ax1h0]) and ([ax0l0, ax0l1], [ax1l0])
    tolohs = zip(*tuples_lohand_lolbl)
    # Finally, we need to concatenate the individual lists in the two
    # lists of lists: [ax0h0, ax0h1, ax1h0] and [ax0l0, ax0l1, ax1l0]
    # a possible solution is to sum the sublists - we use unpacking
    handles, labels = (sum(list_of_lists, []) for list_of_lists in tolohs)
    # Call fig.legend with the keyword arguments, return the legend object
    return fig.legend(handles, labels, **kwdargs)
I recognize that sum(list_of_lists, []) is a really inefficient way to flatten a list of lists, but ① I love its compactness, ② there are usually only a few curves in a few subplots, and ③ Matplotlib and efficiency? ;-)
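If the cost of sum(list_of_lists, []) ever matters, you can do the flattening step with itertools.chain.from_iterable instead. The sketch below is a variant of the fig_legend function from Note 2, renamed fig_legend_chain so it doesn't clash. It is not the answer's original code.

```python
from itertools import chain

import matplotlib.pyplot as plt


def fig_legend_chain(fig, **kwargs):
    """Single figure legend built from the handles/labels of every Axes."""
    # One (handles, labels) pair per Axes
    pairs = [ax.get_legend_handles_labels() for ax in fig.axes]
    # Transpose into (all handle lists, all label lists); guard the no-Axes case
    handle_lists, label_lists = zip(*pairs) if pairs else ((), ())
    # Flatten each group of lists in linear time
    handles = list(chain.from_iterable(handle_lists))
    labels = list(chain.from_iterable(label_lists))
    return fig.legend(handles, labels, **kwargs)


# Usage: two Axes, three labelled curves in total
fig, (ax0, ax1) = plt.subplots(2)
ax0.plot([0, 1], [0, 1], label='ax0 line 0')
ax0.plot([0, 1], [1, 0], label='ax0 line 1')
ax1.plot([0, 1], [0.5, 0.5], label='ax1 line 0')
leg = fig_legend_chain(fig, loc='upper center', ncol=3)
```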

For the automatic positioning of a single legend in a figure with many axes, like those obtained with subplots(), the following solution works really well:
plt.legend(lines, labels, loc='lower center', bbox_to_anchor=(0, -0.1, 1, 1),
           bbox_transform=plt.gcf().transFigure)
With bbox_to_anchor and bbox_transform=plt.gcf().transFigure, you define a new bounding box the size of your figure as the reference for loc. Using (0, -0.1, 1, 1) moves this bounding box slightly downwards to prevent the legend from being placed over other artists.
OBS: Use this solution after you call fig.set_size_inches() and before you call fig.tight_layout().
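Here is a self-contained sketch of the bbox_to_anchor/bbox_transform trick on a 2x2 grid, with made-up data. It calls legend on the last Axes explicitly instead of relying on plt.legend to pick the current Axes. Note that with (0, -0.1, 1, 1) and loc='lower center', the legend sits just below the figure's bottom edge. It may therefore be clipped on screen; saving with bbox_inches='tight' should include it.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 50)
fig, axs = plt.subplots(2, 2)
fig.set_size_inches(8, 6)  # set the size first, as the note above says

for ax in axs.flat:
    ax.plot(x, np.sin(x), label='sin')
    ax.plot(x, np.cos(x), label='cos')

lines, labels = axs[0, 0].get_legend_handles_labels()

# The box (0, -0.1, 1, 1) is in figure coordinates because of
# bbox_transform; 'lower center' anchors the legend to the bottom
# centre of that box, i.e. 10% of the figure height below its bottom edge
leg = axs[-1, -1].legend(lines, labels, loc='lower center',
                         bbox_to_anchor=(0, -0.1, 1, 1),
                         bbox_transform=fig.transFigure, ncol=2)
# fig.savefig('out.png', bbox_inches='tight')
```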

You just have to ask for the legend once, outside of your loop.
For example, in this case I have 4 subplots, with the same lines, and a single legend.
from matplotlib.pyplot import *
ficheiros = ['120318.nc', '120319.nc', '120320.nc', '120321.nc']
fig = figure()
fig.suptitle('concentration profile analysis')
for a in range(len(ficheiros)):
    # dados is defined here
    level = dados.variables['level'][:]
    ax = fig.add_subplot(2, 2, a+1)
    xticks(range(8), ['0h', '3h', '6h', '9h', '12h', '15h', '18h', '21h'])
    ax.set_xlabel('time (hours)')
    ax.set_ylabel(r'CONC ($\mu g. m^{-3}$)')
    for index in range(len(level)):
        conc = dados.variables['CONC'][4:12, index] * 1e9
        ax.plot(conc, label=str(level[index])+'m')
    dados.close()
# outside the loop: a single legend, attached to the last axes
ax.legend(bbox_to_anchor=(1.05, 0), loc='lower left', borderaxespad=0.)
# it will place the legend on the outer right-hand side of the last axes
show()
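Because dados and the .nc files aren't available here, the sketch below replaces them with random data. That substitution is purely for illustration. It shows the same structure: four subplots drawn in a loop, and legend called once, after the loop, on the last Axes.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
levels = [10, 50, 100]  # hypothetical stand-ins for dados.variables['level']
hours = ['0h', '3h', '6h', '9h', '12h', '15h', '18h', '21h']

fig = plt.figure(figsize=(10, 6))
fig.suptitle('concentration profile analysis')
for a in range(4):
    ax = fig.add_subplot(2, 2, a + 1)
    ax.set_xticks(range(8))
    ax.set_xticklabels(hours)
    ax.set_xlabel('time (hours)')
    ax.set_ylabel(r'CONC ($\mu g. m^{-3}$)')
    for level in levels:
        conc = rng.random(8)  # random stand-in for the CONC slice
        ax.plot(conc, label=f'{level}m')

# Outside the loop: one legend, placed to the right of the last Axes
ax.legend(bbox_to_anchor=(1.05, 0), loc='lower left', borderaxespad=0.)
fig.subplots_adjust(right=0.85)
# plt.show()
```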

If you are using subplots with bar charts, with a different colour for each bar, it may be faster to create the legend artists yourself using mpatches.
Say you have four bars with different colours as r, m, c, and k, you can set the legend as follows:
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
labels = ['Red Bar', 'Magenta Bar', 'Cyan Bar', 'Black Bar']
#####################################
# Insert code for the subplots here #
#####################################
# Now, create an artist for each color
# (edgecolor gives each patch a black border; leave it out if you do not want the borders)
red_patch = mpatches.Patch(facecolor='r', edgecolor='#000000')
black_patch = mpatches.Patch(facecolor='k', edgecolor='#000000')
magenta_patch = mpatches.Patch(facecolor='m', edgecolor='#000000')
cyan_patch = mpatches.Patch(facecolor='c', edgecolor='#000000')
fig.legend(handles=[red_patch, magenta_patch, cyan_patch, black_patch], labels=labels,
           loc="center right",
           borderaxespad=0.1)
plt.subplots_adjust(right=0.85) # Adjust the subplot to the right for the legend
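Here is a runnable version with the subplot section filled in by two hypothetical bar charts. The bar heights are invented. It builds one proxy Patch per colour, in the same order as `labels`, and passes them to fig.legend.

```python
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt

labels = ['Red Bar', 'Magenta Bar', 'Cyan Bar', 'Black Bar']
colors = ['r', 'm', 'c', 'k']

# Hypothetical data: two subplots with one bar per colour
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.bar(range(4), [3, 5, 2, 4], color=colors, edgecolor='#000000')
ax2.bar(range(4), [1, 2, 6, 3], color=colors, edgecolor='#000000')

# One proxy artist per colour, in the same order as `labels`
handles = [mpatches.Patch(facecolor=c, edgecolor='#000000') for c in colors]
fig.legend(handles=handles, labels=labels, loc='center right', borderaxespad=0.1)
plt.subplots_adjust(right=0.75)  # make room for the legend
```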

To build on top of gboffi's and Ben Usman's answer:
In a situation where one has different lines in different subplots with the same color and label, one can do something along the lines of:
labels_handles = {
    label: handle for ax in fig.axes for handle, label in zip(*ax.get_legend_handles_labels())
}
fig.legend(
    labels_handles.values(),
    labels_handles.keys(),
    loc="upper center",
    bbox_to_anchor=(0.5, 0),
    bbox_transform=plt.gcf().transFigure,
)
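Here is a runnable sketch of that de-duplication, using invented data: two subplots that reuse the same colours and labels. The dict comprehension keeps one handle per label. For simplicity, the legend is placed at the top centre here rather than below the figure.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 1, 20)
fig, (ax0, ax1) = plt.subplots(1, 2)
# Same colour and label used in both subplots
for ax, scale in ((ax0, 1), (ax1, 2)):
    ax.plot(x, scale * x, color='tab:blue', label='linear')
    ax.plot(x, scale * x**2, color='tab:orange', label='quadratic')

# Later handles overwrite earlier ones with the same label, so each
# label appears once; dicts keep first-insertion order (Python 3.7+)
labels_handles = {
    label: handle
    for ax in fig.axes
    for handle, label in zip(*ax.get_legend_handles_labels())
}
leg = fig.legend(list(labels_handles.values()), list(labels_handles.keys()),
                 loc='upper center', ncol=len(labels_handles))
```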

Using Matplotlib 2.2.2, this can be achieved using the gridspec feature.
In the example below, the aim is to have four subplots arranged in a 2x2 fashion with the legend shown at the bottom. A 'faux' axis is created at the bottom to place the legend in a fixed spot; the 'faux' axis is then turned off so only the legend shows.
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
# Gridspec demo
fig = plt.figure()
fig.set_size_inches(8, 9)
fig.set_dpi(100)
rows = 17 # The larger the number here, the smaller the spacing around the legend
start1 = 0
end1 = int((rows-1)/2)
start2 = end1
end2 = int(rows-1)
gspec = gridspec.GridSpec(ncols=4, nrows=rows)
axes = []
axes.append(fig.add_subplot(gspec[start1:end1, 0:2]))
axes.append(fig.add_subplot(gspec[start2:end2, 0:2]))
axes.append(fig.add_subplot(gspec[start1:end1, 2:4]))
axes.append(fig.add_subplot(gspec[start2:end2, 2:4]))
axes.append(fig.add_subplot(gspec[end2, 0:4]))
line, = axes[0].plot([0, 1], [0, 1], 'b') # Add some data
axes[-1].legend((line,), ('Test',), loc='center') # Create legend on bottommost axis
axes[-1].set_axis_off() # Don't show the bottom-most axis
fig.tight_layout()
plt.show()

This answer is a complement to user707650's answer on the legend position.
My first attempt at user707650's solution failed because the legend overlapped the subplots' titles.
In fact, the overlaps are caused by fig.tight_layout(), which changes the subplots' layout without taking the figure legend into account. However, fig.tight_layout() is necessary.
To avoid the overlaps, we can tell fig.tight_layout() to leave space for the figure legend by calling fig.tight_layout(rect=(0,0,1,0.9)).
Description of tight_layout() parameters.
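Here is a minimal sketch of the rect idea with made-up data. The figure legend sits at the top centre, and tight_layout is told to fit the subplots, including their titles, into the bottom 90% of the figure.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
fig, axs = plt.subplots(2, 2, figsize=(8, 6))
for i, ax in enumerate(axs.flat):
    ax.plot(x, np.sin(x + i), label='sin')
    ax.plot(x, np.cos(x + i), label='cos')
    ax.set_title(f'Subplot {i}')

handles, labels = axs[0, 0].get_legend_handles_labels()
fig.legend(handles, labels, loc='upper center', ncol=2)

# rect=(left, bottom, right, top) in figure coordinates: the subplots are
# packed into the bottom 90% of the figure, leaving the top strip free
# for the figure legend
fig.tight_layout(rect=(0, 0, 1, 0.9))
```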

All of the previous answers are way over my head at this stage of my coding journey, so I just used another Matplotlib feature called patches:
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
first_leg = mpatches.Patch(color='red', label='1st plot')
second_leg = mpatches.Patch(color='blue', label='2nd plot')
third_leg = mpatches.Patch(color='green', label='3rd plot')
plt.legend(handles=[first_leg, second_leg, third_leg])
The patches put all the information I needed into the legend of my final plot (it was a line plot that combined three different line plots, all in the same cell in Jupyter Notebook).
(I changed the names from the ones in my own legend.)

How to specify space between matplotlib legend markers

I am looking through the matplotlib API and can't seem to find a way to change the space between legend markers. I came across a way to change the space between a marker and its respective label with handletextpad, but I want to change the space between each marker.
Ideally, I'd like to have the markers touching each other with the labels above (or on top of) the markers.
My legend:
What I am trying to model:
Is there a way to do this?

I am not sure if this matches your expectations. I used standard features to create a graph similar to your goal. Since I don't have your code and data, I customized the example from the official reference, using handletextpad and columnspacing. Because these values are in font-size units, I achieved the effect with negative values.
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(19680801)
fig, ax = plt.subplots(figsize=(8,8))
for color in ['tab:blue', 'tab:orange', 'tab:green']:
    n = 750
    x, y = np.random.rand(2, n)
    scale = 200.0 * np.random.rand(n)
    ax.scatter(x, y, c=color, s=scale, label=color.split(':')[1][0],
               alpha=0.5, edgecolors='none')
handlers, labels = ax.get_legend_handles_labels()
print(labels)
ax.legend(handletextpad=-1.2, columnspacing=-0.5, ncol=3, loc="upper left", bbox_to_anchor=(0.75, 1.08))
ax.grid(True)
plt.show()

Change each regression line styling using in a multiple regressions plot Python

I am currently trying to plot two regression lines for my data, split by a categorical attribute (either freedom or happiness scores). My current problem is that I need color to encode another, separate categorical attribute in my graph (GNI/capita brackets). A mix of colors seemed confusing, so I decided to distinguish the data points using different markers instead. However, I am having trouble changing just one of the regression lines to a dashed line, as the two lines are styled identically. I don't even want to think about how I am going to create a legend for all of this. If you think this is an ugly graph, I agree, but circumstances mandate that I encode four attributes in a single graph. By the way, I'm open to any suggestions on a better way to do this, if there is one. An example of my current graph is below; I would appreciate any help!
sns.lmplot(data=combined_indicators, x='x', y='y', hue='Indicator', palette=["#000620"], markers=['x', '.'], ci=None)
plt.axvspan(0,1025, alpha=0.5, color='#de425b', zorder=-1)
plt.axvspan(1025,4035, alpha=0.5, color='#fbb862', zorder=-1)
plt.axvspan(4035,12475, alpha=0.5, color='#afd17c', zorder=-1)
plt.axvspan(12475,100000, alpha=0.5, color='#00876c', zorder=-1)
plt.title("HFI & Happiness Regressed on GNI/capita")
plt.xlabel("GNI/Capita by Purchasing Power Parity (2017 International $)")
plt.ylabel("Standard Indicator Score (0-10)")
My current figure rears its ugly head

To my knowledge, there is no easy way to change the style of the regression line in lmplot. But you can achieve your goal if you use regplot instead of lmplot, the drawback being that you have to implement the hue splitting "by hand".
import matplotlib.pyplot as plt
import seaborn as sns
x_col = 'total_bill'
y_col = 'tip'
hue_col = 'smoker'
df = sns.load_dataset('tips')
markers = ['x', '.']
colors = ["#000620", "#000620"]
linestyles = ['-', '--']
plt.figure()
for (hue, gr), m, c, ls in zip(df.groupby(hue_col), markers, colors, linestyles):
    sns.regplot(data=gr, x=x_col, y=y_col, marker=m, color=c, line_kws={'ls': ls}, ci=None, label=f'{hue_col}={hue}')
plt.legend()
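If you don't need seaborn's confidence band (ci=None above), you can draw the same per-group marker and line style with plain Matplotlib by fitting each group with numpy.polyfit. This swaps in a different technique; it is not what regplot does internally. The data frame below is synthetic. Its column names x, y and Indicator are taken from the question's lmplot call.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for combined_indicators
rng = np.random.default_rng(1)
n = 50
df = pd.DataFrame({
    'x': np.tile(rng.uniform(0, 60000, n), 2),
    'Indicator': ['Freedom'] * n + ['Happiness'] * n,
})
df['y'] = 4 + df['x'] / 20000 + rng.normal(0, 0.5, 2 * n)

# (marker, regression line style) per group
styles = {'Freedom': ('x', '--'), 'Happiness': ('.', '-')}

fig, ax = plt.subplots()
for name, group in df.groupby('Indicator'):
    marker, linestyle = styles[name]
    ax.scatter(group['x'], group['y'], marker=marker, color='#000620', label=name)
    # Ordinary least-squares straight line for this group
    slope, intercept = np.polyfit(group['x'], group['y'], deg=1)
    xs = np.linspace(group['x'].min(), group['x'].max(), 2)
    ax.plot(xs, slope * xs + intercept, color='#000620', linestyle=linestyle)
ax.legend()
```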

Just wanted to add, in case anyone stumbles upon this post later: you can create a legend for this mess manually using Line2D. Mine looks something like this:
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
from matplotlib.lines import Line2D
legend_elements = [Line2D([0], [0], color='#000620', lw=2, label='Freedom', linestyle='--'),
                   Line2D([0], [0], color='#000620', lw=2, label='Happiness'),
                   Line2D([0], [0], marker='x', color='#000620', label='Freedom',
                          markerfacecolor='#000620', markersize=15),
                   Line2D([0], [0], marker='.', color='#000620', label='Happiness',
                          markerfacecolor='#000620', markersize=15),
                   Patch(facecolor='#de425b', label='Low-Income'),
                   Patch(facecolor='#fbb862', label='Lower Middle-Income'),
                   Patch(facecolor='#afd17c', label='Upper Middle-Income'),
                   Patch(facecolor='#00876c', label='High-Income')]
# pass the proxy artists to the legend of the current Axes
plt.legend(handles=legend_elements)
The end result looks like this:
Graph with custom legend

Density Plot Python Pandas

I want to create a plot that looks like the plot attached below.
My data frame is built at this format:
Playlist Type Streams
0 a classical 94
1 b hip-hop 12
2 c classical 8
The 'popularity' category can be replaced by 'streams'. The only thing is that the streams variable has a high variance (values go from 0 to 10,000+), so I believe the density graph might look weird.
However, my first question is how I can plot a graph similar to this in pandas, grouping by the 'Type' column and then creating the density graph.
I tried various methods but did not find a good one to establish my goal.

To augment the answer of @Student240, you could use the seaborn library, which makes it easy to fit kernel density estimates. In other words, you get smooth curves similar to those in your question, rather than a binned histogram. This is done with the kdeplot function. A related plot type is distplot, which gives the KDE estimate but also shows the histogram bins (distplot is deprecated since seaborn 0.11 in favour of histplot and displot).
Another difference in my answer is the use of the explicit object-oriented approach in matplotlib/seaborn. This involves first creating figure and axes objects with plt.subplots(), rather than using the implicit approach of plt.hist. See this really good tutorial for more details.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
## This block of code is copied from Student240's answer:
import random
categories = ['classical','hip-hop','indiepop','indierock','jazz'
              ,'metal','pop','rap','rock']
# NB I use a slightly different random variable assignment to introduce a bit more variety in my random numbers.
df = pd.DataFrame({'Type':[random.choice(categories) for _ in range(1000)],
                   'stream':[random.normalvariate(i,random.randint(0,15)) for i in
                             range(1000)]})
###split the data into groups based on types
g = df.groupby('Type')
## From here things change as I make use of the seaborn library
classical = g.get_group('classical')
hiphop = g.get_group('hip-hop')
indiepop = g.get_group('indiepop')
indierock = g.get_group('indierock')
fig, ax = plt.subplots()
ax = sns.kdeplot(data=classical['stream'], label='classical streams', ax=ax)
ax = sns.kdeplot(data=hiphop['stream'], label='hiphop streams', ax=ax)
ax = sns.kdeplot(data=indiepop['stream'], label='indiepop streams', ax=ax)
# for this final one I use the fill option (called shade before seaborn 0.11) just to show how it is done:
ax = sns.kdeplot(data=indierock['stream'], label='indierock streams', ax=ax, fill=True)
ax.set_xlabel('Count')
ax.set_ylabel('Density')
ax.set_title('KDE plot example from seaborn')
ax.legend()

Hi, you can try the following example. I have used random normals just for this example; obviously it wouldn't be possible to have negative streams. Anyway, disclaimer over, here is the code:
import random
import matplotlib.pyplot as plt
import pandas as pd
categories = ['classical','hip-hop','indiepop','indierock','jazz'
              ,'metal','pop','rap','rock']
df = pd.DataFrame({'Type':[random.choice(categories) for _ in range(10000)],
                   'stream':[random.normalvariate(0,random.randint(0,15)) for _ in
                             range(10000)]})
###split the data into groups based on types
g = df.groupby('Type')
###access the classical group
classical = g.get_group('classical')
plt.figure(figsize=(15,6))
plt.hist(classical.stream, histtype='stepfilled', bins=50, alpha=0.2,
         label="Classical Streams", color="#D73A30", density=True)
###hip hop
hiphop = g.get_group('hip-hop')
plt.hist(hiphop.stream, histtype='stepfilled', bins=50, alpha=0.2,
         label="hiphop Streams", color="#2A3586", density=True)
###indie pop
indiepop = g.get_group('indiepop')
plt.hist(indiepop.stream, histtype='stepfilled', bins=50, alpha=0.2,
         label="indie pop streams", color="#5D271B", density=True)
#indierock
indierock = g.get_group('indierock')
plt.hist(indierock.stream, histtype='stepfilled', bins=50, alpha=0.2,
         label="indie rock Streams", color="#30A9D7", density=True)
##jazz
jazz = g.get_group('jazz')
plt.hist(jazz.stream, histtype='stepfilled', bins=50, alpha=0.2,
         label="jazz Streams", color="#7FB03D", density=True)
####you can add other here if you wish
# one legend call, after all the histograms have been drawn
plt.legend(loc="upper left")
##modify this to control x-axis, possibly useful for high-variance data
plt.xlim([-20,20])
plt.title('Distribution of Streams by Genre')
plt.xlabel('Count')
plt.ylabel('Density')
plt.show()
You can search for a 'hex color picker' if you want a specific color in the '#000000' format I have used in this example.
Modify the 'alpha' argument to change how opaque the colors are displayed. You can also play around with 'bins': if 50 is too large or too small, changing it should make the plot look better.
I hope this helps, plotting in matplotlib can be a pain to learn, but it is surely worth it!!
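On the high variance of Streams (0 to 10,000+) mentioned in the question, one common option is to plot the densities of log10(1 + streams) instead of the raw counts. The sketch below does that with a loop over df.groupby('Type'), so adding genres needs no extra code. The lognormal data are an assumption. The column names Type and Streams come from the question's frame.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
categories = ['classical', 'hip-hop', 'indiepop', 'indierock', 'jazz']
df = pd.DataFrame({
    'Type': rng.choice(categories, size=5000),
    # Synthetic non-negative, heavily skewed counts
    'Streams': rng.lognormal(mean=5, sigma=1.5, size=5000).round(),
})

fig, ax = plt.subplots(figsize=(12, 5))
for genre, group in df.groupby('Type'):
    # log10(1 + x) compresses the 0 .. 10,000+ range and keeps zeros finite
    ax.hist(np.log10(1 + group['Streams']), bins=50, density=True,
            histtype='stepfilled', alpha=0.3, label=genre)

ax.set_xlabel('log10(1 + Streams)')
ax.set_ylabel('Density')
ax.set_title('Distribution of Streams by Genre')
ax.legend(loc='upper left')
```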

problem plotting on logscale in matplotlib in python

I am trying to plot the following numbers on a log scale as a scatter plot in matplotlib. The quantities on the x and y axes have very different scales: one of the variables has a huge dynamic range (from nearly 0 to roughly 12 million), while the other is between nearly 0 and 2. I think it might be good to plot both on a log scale.
I tried the following, for a subset of the values of the two variables:
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_yscale('log')
ax.set_xscale('log')
plt.scatter([1.341, 0.1034, 0.6076, 1.4278, 0.0374],
[0.37, 0.12, 0.22, 0.4, 0.08])
The x-axes appear log scaled but the points do not appear -- only two points appear. Any idea how to fix this? Also, how can I make this log scale appear on a square axes, so that the correlation between the two variables can be interpreted from the scatter plot?
thanks.

I don't know why you only get those two points. For this case, you can manually adjust the limits to make sure all your points fit. I ran:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(8, 8)) # You were missing the =
ax = fig.add_subplot(1, 1, 1)
ax.set_yscale('log')
ax.set_xscale('log')
plt.scatter([1.341, 0.1034, 0.6076, 1.4278, 0.0374],
[0.37, 0.12, 0.22, 0.4, 0.08])
plt.xlim(0.01, 10) # Fix the x limits to fit all the points
plt.show()
I'm not sure I understand what "Also, how can I make this log scale appear on a square axes, so that the correlation between the two variables can be interpreted from the scatter plot?" means. Perhaps someone else will understand, or maybe you can clarify?
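One possible reading of the 'square axes' part is to give both log axes the same limits and a square box. Then a decade has the same length on x and y, and a power law with exponent 1 appears as a 45-degree line. The sketch below assumes the limits 0.01 to 10 used above suit the data. Axes.set_box_aspect needs Matplotlib 3.3 or newer.

```python
import matplotlib.pyplot as plt

x = [1.341, 0.1034, 0.6076, 1.4278, 0.0374]
y = [0.37, 0.12, 0.22, 0.4, 0.08]

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(x, y)
ax.set_xscale('log')
ax.set_yscale('log')

# Same limits on both axes: each decade spans the same length
ax.set_xlim(0.01, 10)
ax.set_ylim(0.01, 10)
ax.set_box_aspect(1)  # square axes box regardless of figure shape
```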

You can also just do,
plt.loglog([1.341, 0.1034, 0.6076, 1.4278, 0.0374],
[0.37, 0.12, 0.22, 0.4, 0.08], 'o')
This produces the plot you want with properly scaled axes, though it doesn't have all the flexibility of a true scatter plot.
