Plotting dot plot with enough space of ticks in Python/matplotlib? - python

In the following code snippet:
import numpy as np
import pandas as pd
import pandas.rpy.common as com
import matplotlib.pyplot as plt
mtcars = com.load_data("mtcars")
df = mtcars.groupby(["cyl"]).apply(lambda x: pd.Series([x["cyl"].count(), np.mean(x["wt"])], index=["n", "wt"])).reset_index()
plt.plot(df["n"], range(len(df["cyl"])), "o")
plt.yticks(range(len(df["cyl"])), df["cyl"])
plt.show()
This code produces the dot plot, but the result looks quite awful: neither the xticks nor the yticks leave enough space around the data, so it is hard to see that cyl values 4 and 8 have points in the graph at all.
So how can I plot it with enough space from the start, the way R/ggplot2 does without any hassle?
For your information, neither this code nor this one works in my case. Does anyone know the reason? And do I really have to bother creating subplots in the first place? Is it impossible to adjust the ticks automatically in response to the input values?

I can't quite tell what you're asking...
Are you asking why the ticks aren't automatically positioned or are you asking how to add "padding" around the inside edges of the plot?
If it's the former, it's because you've manually set the tick locations with yticks. This overrides the automatic tick locator.
If it's the latter, use ax.margins(some_percentage) (where some_percentage is between 0 and 1, e.g. 0.05 is 5%) to add "padding" to the data limits before they're autoscaled.
As an example of the latter, by default, the data limits can be autoscaled such that a point can lie on the boundaries of the plot. E.g.:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(range(10), 'ro')
plt.show()
If you want to avoid this, use ax.margins (or equivalently, plt.margins) to specify a percentage of padding to be added to the data limits before autoscaling takes place.
E.g.
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(range(10), 'ro')
ax.margins(0.04) # 4% padding, similar to R.
plt.show()
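Applied to a dot plot like the one in the question, the same call might look like the sketch below. Because pandas.rpy is not assumed to be available, the three cyl groups and their counts are hard-coded as stand-in values, and the 0.2 margin is only an illustrative choice. The Agg backend is selected just so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Stand-in for the grouped DataFrame in the question (assumed values).
cyl = [4, 6, 8]
n = [11, 7, 14]

fig, ax = plt.subplots()
positions = list(range(len(cyl)))
ax.plot(n, positions, "o")

# Manual ticks: one per group, labelled with the cyl value.
ax.set_yticks(positions)
ax.set_yticklabels([str(c) for c in cyl])

# Pad the data limits by 20% on every side before autoscaling.
ax.margins(0.2)

xlim = ax.get_xlim()
ylim = ax.get_ylim()
print(xlim, ylim)
```

The manual yticks stay where they were set; only the axis limits grow, so the outermost points no longer sit on the frame.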

Related

Is there a way to adjust the axes limits of pairplot(), but not as individual plots?

Is there a way to adjust the axes limits of pairplot(), but not as individual plots? Maybe a setting to produce better axes limits?
I would like to have the plots with a bigger range for the axes. My plots axes allows all the data to be visualized, but it is too 'zoomed in'.
My code is:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
g = sns.pairplot(iris, hue = 'species', diag_kind = 'hist', palette = 'Dark2', plot_kws={"s": 20})
The link for my plot and what I would like to plot to look like is here:
pairplot
To change the subplots, g.map(func, <parameters>) can be used. A small problem is that func needs to accept color as a parameter, and plt.margins() raises an error when color is passed. Moreover, map uses x and y to indicate the row and column variables. You could write a dummy function that ignores those arguments and simply calls plt.margins(), for example g.map(lambda *args, **kwargs: plt.margins(x=0.2, y=0.3)).
An alternative is to loop through g.axes.flat and call ax.margins() on each of them. Note that many axes are shared in x and/or y direction. The diagonal is treated differently; for some reason ax.margins needs to be called a second time on the diagonal.
To have the histogram for the different colors stacked instead of overlapping, diag_kws={"multiple": "stack"} can be set.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
iris = sns.load_dataset('iris')
g = sns.pairplot(iris, hue='species', diag_kind='hist', palette='Dark2',
                 plot_kws={"s": 20}, diag_kws={"multiple": "stack"})
# g.map(plt.margins, x=0.2, y=0.2) # gives an error
for ax in g.axes.flat:
    ax.margins(x=0.2, y=0.2)
for ax in g.diag_axes:
    ax.margins(y=0.2)
plt.show()
PS: yet another option is to change the rcParams, which affects all plots created later in the code:
import matplotlib as mpl
mpl.rcParams['axes.xmargin'] = 0.2
mpl.rcParams['axes.ymargin'] = 0.2
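If a global change is too broad, one possible variation (not part of the original answer) is to scope the same two settings with mpl.rc_context, so that only axes created inside the with block pick them up. The data points below are arbitrary.

```python
import matplotlib as mpl
mpl.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Axes created inside the block read the temporary margin settings.
with mpl.rc_context({"axes.xmargin": 0.2, "axes.ymargin": 0.2}):
    fig, ax = plt.subplots()
    ax.plot([0, 10], [0, 5], "o")
    padded_xlim = ax.get_xlim()
    padded_ylim = ax.get_ylim()

# Axes created afterwards fall back to whatever the rcParams were before.
fig2, ax2 = plt.subplots()
ax2.plot([0, 10], [0, 5], "o")
default_xlim = ax2.get_xlim()

print(padded_xlim, padded_ylim, default_xlim)
```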

Change colour scheme label to log scale without changing the axis in matplotlib

I am quite new to Python programming. I have a script that plots a heat map using matplotlib. The range of X-axis values is (-180 to +180) and the range of Y-axis values is (0 to 180). The 2D heatmap colours areas in Rainbow according to the number of points occurring in a specified area of the x-y graph (defined by the 'bin' (see below)).
In this case, x = values_Rot and y = values_Tilt (see below for code).
As of now, this script colours the 2D-heatmap in the linear scale. How do I change this script such that it colours the heatmap in the log scale? Please note that I only want to change the heatmap colouring scheme to log-scale, i.e. only the number of points in a specified area. The x and y-axis stay the same in linear scale (not in logscale).
A portion of the code is here.
rot_number = get_header_number(headers, AngleRot)
tilt_number = get_header_number(headers, AngleTilt)
psi_number = get_header_number(headers, AnglePsi)
values_Rot = []
values_Tilt = []
values_Psi = []
for line in data:
    try:
        values_Rot.append(float(line.split()[rot_number]))
        values_Tilt.append(float(line.split()[tilt_number]))
        values_Psi.append(float(line.split()[psi_number]))
    except:
        print ('This line didnt work, it may just be a blank space. The line is:' + line)
# Change the values here if you want to plot something else, such as psi.
# You can also change how the data is binned here.
plt.hist2d(values_Rot, values_Tilt, bins=25,)
plt.colorbar()
plt.show()
plt.savefig('name_of_output.png')
You can use a LogNorm for the colors, using plt.hist2d(...., norm=LogNorm()). Here is a comparison.
To have the ticks in base 2, the developers suggest adding the base to the LogLocator and the LogFormatter. As in this case the LogFormatter seems to write the numbers with one decimal (.0), a StrMethodFormatter can be used to show the number without decimals. Depending on the range of numbers, sometimes the minor ticks (shorter marker lines) also get a string, which can be suppressed by assigning a NullFormatter to the minor colorbar ticks.
Note that base 2 and base 10 define exactly the same color transformation. The position and the labels of the ticks are different. The example below creates two colorbars to demonstrate the different look.
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter, StrMethodFormatter, LogLocator
from matplotlib.colors import LogNorm
import numpy as np
from copy import copy
# create some toy data for a standalone example
values_Rot = np.random.randn(100, 10).cumsum(axis=1).ravel()
values_Tilt = np.random.randn(100, 10).cumsum(axis=1).ravel()
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(15, 4))
cmap = copy(plt.get_cmap('hot'))
cmap.set_bad(cmap(0))
_, _, _, img1 = ax1.hist2d(values_Rot, values_Tilt, bins=40, cmap='hot')
ax1.set_title('Linear norm for the colors')
fig.colorbar(img1, ax=ax1)
_, _, _, img2 = ax2.hist2d(values_Rot, values_Tilt, bins=40, cmap=cmap, norm=LogNorm())
ax2.set_title('Logarithmic norm for the colors')
fig.colorbar(img2, ax=ax2) # default log 10 colorbar
cbar2 = fig.colorbar(img2, ax=ax2) # log 2 colorbar
cbar2.ax.yaxis.set_major_locator(LogLocator(base=2))
cbar2.ax.yaxis.set_major_formatter(StrMethodFormatter('{x:.0f}'))
cbar2.ax.yaxis.set_minor_formatter(NullFormatter())
plt.show()
Note that log(0) is minus infinity. Therefore, the zero values in the left plot (darkest color) are left empty (white background) on the plot with the logarithmic color values. If you just want to use the lowest color for these zeros, you need to set a 'bad' color. In order not to change a standard colormap, the latest matplotlib versions want you to first make a copy of the colormap.
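Two of the claims above can be checked numerically without drawing anything: that a zero count ends up masked under LogNorm, and that the base of the logarithm does not change the normalized value. The sketch below uses made-up counts and fixes vmin=1 and vmax=1024 purely for illustration.

```python
import numpy as np
from matplotlib.colors import LogNorm

# Made-up bin counts, including one empty bin.
counts = np.array([0.0, 1.0, 4.0, 32.0, 256.0, 1024.0])

norm = LogNorm(vmin=1, vmax=1024)
normed = norm(counts)  # masked array with values in [0, 1]

# The empty bin has no finite logarithm, so it comes back masked ("bad").
zero_is_masked = bool(np.ma.getmaskarray(normed)[0])

# With vmin=1 the normalization reduces to log(v) / log(vmax);
# the ratio is the same for any base.
positive = counts[1:]
via_log2 = np.log2(positive) / np.log2(1024)
via_log10 = np.log10(positive) / np.log10(1024)

print(zero_is_masked)
print(via_log2)
```

The masked entry is what cmap.set_bad(...) colours; the base only matters for where the colorbar ticks are placed.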
PS: When calling plt.savefig() it is important to call it before plt.show() because plt.show() clears the plot.
Also, try to avoid the 'jet' colormap, as it has a bright yellow region which is not at the extreme. It may look nice, but can be very misleading. This blog article contains a thorough explanation. The matplotlib documentation contains an overview of available colormaps.
Note that to compare two plots, plt.subplots() needs to be used, and instead of plt.hist2d, ax.hist2d is needed (see this post). Also, with two colorbars, the elements on which the colorbars are based need to be given as parameter. A minimal change to your code would look like:
from matplotlib.ticker import NullFormatter, StrMethodFormatter, LogLocator
from matplotlib.colors import LogNorm
from matplotlib import pyplot as plt
from copy import copy
# ...
# reading the data as before
cmap = copy(plt.get_cmap('magma'))
cmap.set_bad(cmap(0))
plt.hist2d(values_Rot, values_Tilt, bins=25, cmap=cmap, norm=LogNorm())
cbar = plt.colorbar()
cbar.ax.yaxis.set_major_locator(LogLocator(base=2))
cbar.ax.yaxis.set_major_formatter(StrMethodFormatter('{x:.0f}'))
cbar.ax.yaxis.set_minor_formatter(NullFormatter())
plt.savefig('name_of_output.png') # needs to be called prior to plt.show()
plt.show()

Remove grid lines, but keep frame (ggplot2 style in matplotlib)

Using Matplotlib I'd like to remove the grid lines inside the plot, while keeping the frame (i.e. the axes lines). I've tried the code below and other options as well, but I can't get it to work. How do I simply keep the frame while removing the grid lines?
I'm doing this to reproduce a ggplot2 plot in matplotlib. I've created a MWE below. Be aware that you need a relatively new version of matplotlib to use the ggplot2 style.
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import pylab as P
import numpy as np
if __name__ == '__main__':
    values = np.random.uniform(size=20)
    plt.style.use('ggplot')
    fig = plt.figure()
    _, ax1 = P.subplots()
    weights = np.ones_like(values)/len(values)
    plt.hist(values, bins=20, weights=weights)
    ax1.set_xlabel('Value')
    ax1.set_ylabel('Probability')
    ax1.grid(b=False)
    #ax1.yaxis.grid(False)
    #ax1.xaxis.grid(False)
    ax1.set_axis_bgcolor('white')
    ax1.set_xlim([0,1])
    P.savefig('hist.pdf', bbox_inches='tight')
OK, I think this is what you are asking (but correct me if I misunderstood):
You need to change the colour of the spines. You need to do this for each spine individually, using the set_color method:
for spine in ['left','right','top','bottom']:
    ax1.spines[spine].set_color('k')
You can see this example and this example for more about using spines.
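Put together with the question's histogram, the approach might look like the following sketch. It assumes a matplotlib version in which ax.set_facecolor has replaced the older set_axis_bgcolor used in the question, and it applies the ggplot style through plt.style.context so the style does not leak into later plots; both are choices made here, not something the original posts prescribe.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, as in the question
import matplotlib.pyplot as plt
import numpy as np

values = np.random.uniform(size=20)
weights = np.ones_like(values) / len(values)

with plt.style.context("ggplot"):
    fig, ax1 = plt.subplots()
    ax1.hist(values, bins=20, weights=weights)
    ax1.set_xlabel("Value")
    ax1.set_ylabel("Probability")

    ax1.grid(False)              # remove the grid lines
    ax1.set_facecolor("white")   # replace the grey ggplot background
    # The ggplot style draws the spines in white; make the frame black.
    for spine in ["left", "right", "top", "bottom"]:
        ax1.spines[spine].set_color("k")
    ax1.set_xlim(0, 1)

face = ax1.get_facecolor()
edge = ax1.spines["left"].get_edgecolor()
print(face, edge)
```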
However, if you have removed the grey background and the grid lines, and added the spines, this is not really in the ggplot style any more; is that really the style you want to use?
EDIT
To make the edge of the histogram bars touch the frame, you need to either:
Change your binning, so the bin edges go to 0 and 1
n,bins,patches = plt.hist(values, bins=np.linspace(0,1,21), weights=weights)
# Check, by printing bins:
print(bins[0], bins[-1])
# 0.0 1.0
Or, if you really want the bins to run from values.min() to values.max(), change your plot limits so they are no longer 0 and 1:
n,bins,patches = plt.hist(values, bins=20, weights=weights)
ax1.set_xlim(bins[0],bins[-1])

Add padding between bars and Y-Axis

I am building a bar chart using matplotlib using the code below. When my first or last column of data is 0, my first column is wedged against the Y-axis.
An example of this. Note that the first column is ON the x=0 point.
If I have data in this column, I get a huge padding between the Y-Axis and the first column as seen here. Note the additional bar, now at X=0. This effect is repeated if I have data in my last column as well.
My code is as follows:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator
binVals = [0,5531608,6475325,1311915,223000,609638,291151,449434,1398731,2516755,3035532,2976924,2695079,1822865,1347155,304911,3562,157,5,0,0,0,0,0,0,0,0]
binTot = sum(binVals)
binNorm = []
for v in range(len(binVals)):
    binNorm.append(float(binVals[v])/binTot)
fig = plt.figure(figsize=(6,4))
ax1 = fig.add_subplot(1,1,1)
ax1.bar(range(len(binNorm)),binNorm,align='center', label='Values')
plt.legend(loc=1)
plt.title("Demo Histogram")
plt.xlabel("Value")
plt.xticks(range(len(binLabels)),binLabels,rotation='vertical')
plt.grid(b=True, which='major', color='grey', linestyle='--', alpha=0.35)
ax1.xaxis.grid(False)
plt.ylabel("% of Count")
plt.subplots_adjust(bottom=0.15)
plt.tight_layout()
plt.show()
How can I set a constant margin between the Y-axis and my first/last bar?
Additionally, I realize it's labeled "Demo Histogram"; that is because I missed it when correcting problems discussed here.
I can't run the code snippet you gave, and even with some modification I couldn't replicate the big space. Aside from that, if you need to enforce a border in matplotlib, you can do something like this:
ax.set_xlim( min(your_data) - 10, None )
The first argument tells the axis to put the border 10 units below the minimum of your data; the None argument tells it to keep the current value.
To put it into context:
from collections import Counter
from pylab import *
data = randint(20,size=1000)
res = Counter(data)
vals = arange(20)
ax = gca()
ax.bar(vals-0.4, [ res[i] for i in vals ], width=0.8)
ax.set_xlim( min(data)-1, None )
show()
Searching around Stack Overflow, I just learned a new trick: you can call
ax.margins( margin_you_desire )
to let matplotlib automatically put that amount of space around your plot. It can also be configured differently for x and y.
In your case the best solution would be something like
ax.margins(0.01, None)
The little catch is that the margin is not in data units but a fraction of the extent of your plotted data, so a margin of 1 puts as much space on each side as the plot itself currently occupies.
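To make the "fraction of the extent" point concrete, here is a small sketch with made-up bar heights. With centred bars of width 0.8, the data span on x runs from the left edge of the first bar to the right edge of the last one, and a margin of 0.01 adds 1% of that span on each side.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

heights = [0.1, 0.3, 0.5, 0.2, 0.1]  # made-up normalized bin values
positions = list(range(len(heights)))

fig, ax = plt.subplots()
ax.bar(positions, heights, width=0.8, align="center")

# Pad only the x direction by 1% of the data span.
ax.margins(x=0.01)
xlim = ax.get_xlim()

# Bars are centred, so the data span is half a bar wider at each end.
data_left = positions[0] - 0.4
data_right = positions[-1] + 0.4
data_span = data_right - data_left
expected_left = data_left - 0.01 * data_span
expected_right = data_right + 0.01 * data_span

print(xlim, (expected_left, expected_right))
```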
The problem is align='center'. Remove it.

Overlapping y-axis tick label and x-axis tick label in matplotlib

If I create a plot with matplotlib using the following code:
import numpy as np
from matplotlib import pyplot as plt
xx = np.arange(0,5, .5)
yy = np.random.random( len(xx) )
plt.plot(xx,yy)
plt.show()
I get a result that looks like the attached image. The problem is the
bottom-most y-tick label overlaps the left-most x-tick label. This
looks unprofessional. I was wondering if there was an automatic
way to delete the bottom-most y-tick label, so I don't have
the overlap problem. The fewer lines of code, the better.
In the ticker module there is a class called MaxNLocator that can take a prune kwarg.
Using that you can remove the first tick:
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import numpy as np
xx = np.arange(0,5, .5)
yy = np.random.random( len(xx) )
plt.plot(xx,yy)
plt.gca().xaxis.set_major_locator(MaxNLocator(prune='lower'))
plt.show()
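The effect of prune can also be inspected without a figure by asking the locator for its tick values directly. The sketch below compares a pruned and an unpruned MaxNLocator over an arbitrary range of 0 to 1, then attaches the pruned locator to the y-axis instead of the x-axis, which is the variant that drops the bottom-most y-tick label; the y data are fixed values chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import numpy as np

# Same range, with and without pruning the lowest tick.
full = MaxNLocator().tick_values(0.0, 1.0)
pruned = MaxNLocator(prune="lower").tick_values(0.0, 1.0)
print(full)
print(pruned)

# Applying it to the y-axis removes the bottom-most y tick instead.
xx = np.arange(0, 5, .5)
yy = np.linspace(0.0, 1.0, len(xx))
fig, ax = plt.subplots()
ax.plot(xx, yy)
ax.yaxis.set_major_locator(MaxNLocator(prune="lower"))
```

Which axis to prune is a matter of taste; either one removes the overlap in the corner.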
You can pad the ticks on the x-axis:
ax.tick_params(axis='x', pad=15)
Replace ax with plt.gca() if you haven't stored the variable ax for the current figure.
You can also pad both the axes removing the axis parameter.
A very elegant way to fix the overlapping problem is increasing the padding of the x- and y-tick labels (i.e. the distance to the axis). Leaving out the corner most label might not always be wanted. In my opinion, in general it looks nice if the labels are a little bit farther from the axis than given by the default configuration.
The padding can be changed via the matplotlibrc file or in your plot script by using the commands
import matplotlib as mpl
mpl.rcParams['xtick.major.pad'] = 8
mpl.rcParams['ytick.major.pad'] = 8
Most times, a padding of 6 is also sufficient.
This is answered in detail here. Basically, you use something like this:
plt.xticks([list of tick locations], [list of tick labels])
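As a concrete version of that call, the sketch below sets the x tick locations explicitly and gives the left-most one an empty label so it cannot collide with the y-tick label in the corner. The locations, labels, and y data are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

xx = np.arange(0, 5, .5)
yy = np.linspace(0.0, 1.0, len(xx))
plt.plot(xx, yy)

# One label per location; the first label is left blank on purpose.
locations = [0, 1, 2, 3, 4]
labels = ["", "1", "2", "3", "4"]
plt.xticks(locations, labels)

shown = [tick.get_text() for tick in plt.gca().get_xticklabels()]
print(shown)
```

Note that fixed locations and labels do not adapt when the data range changes, unlike the locator-based approach.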
