I am trying to plot a collection of tens of thousands of line segments in a matplotlib interactive plot in a Jupyter notebook. The problem I have is that:
- the x-values are datetimes (datetime64[ns], basically POSIX timestamps);
- LineCollections can only be based on numbers;
- when leaving the x-axis of the plot to be numbers, when I zoom the plot, the x-axis nicely adjusts in scale to the zoom. However, the x-axis values are uninformative. When formatting the x-axis to informative datetime values, this information is lost when zooming.
Example:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import collections as mc
import matplotlib.dates as mdates
# interactive plot in jupyter notebook (the comment sits on its own line so it is not passed to the magic as an argument)
%matplotlib nbagg
x = np.array([['2018-03-19T07:01:00.000073810', '2018-03-19T07:01:00.632164618'],
['2018-03-19T07:01:00.000073811', '2018-03-19T07:01:00.742295898'],
['2018-03-19T07:01:00.218747698', '2018-03-19T07:01:00.260067814'],
['2018-03-19T07:01:01.218747698', '2018-03-19T07:01:02.260067814'],
['2018-03-19T07:01:02.218747698', '2018-03-19T07:01:02.260067814'],
['2018-03-19T07:01:02.218747698', '2018-03-19T07:01:02.260067814']],
dtype='datetime64[ns]')
y = np.array([[12355.5, 12355.5],
[12363. , 12363. ],
[12362.5, 12362.5],
[12355.5, 12355.5],
[12363. , 12363. ],
[12362.5, 12362.5]])
fig, ax = plt.subplots()
segs = np.zeros((x.shape[0], x.shape[1], 2))
segs[:, :, 1] = y
segs[:, :, 0] = mdates.date2num(x)
lc = mc.LineCollection(segs)
ax.set_xlim(segs[:,:,0].min(), segs[:,:,0].max())
ax.set_ylim(segs[:,:,1].min()-1, segs[:,:,1].max()+1)
ax.add_collection(lc)
Now, zooming works fine -- the x-axis scale adjusts with the zoom -- but the x-axis values don't tell me anything useful, i.e. the precise time I'm currently looking at. To remedy this I tried to e.g. do:
ax.xaxis.set_major_locator(mdates.SecondLocator())
#ax.xaxis.set_minor_locator(mdates.MicrosecondLocator()) # this causes the plot not to display
Fmt = mdates.DateFormatter("%S")
ax.xaxis.set_major_formatter(Fmt)
Now clearly zooming doesn't work fine, since matplotlib doesn't know how to format the finer ticks. So if I zoom in sufficiently -- which I need to do -- I basically have no ticks on the x-axis.
Is there a way to address this? One way I could think of is to set up a callback that gets called when the plot zooms, and adjust the format of the x-axis there. But as far as I could find, this is not possible.
It appears that the main problem is currently to get just any useful ticks and labels on your plot. The default way to do this would be
loc = mdates.AutoDateLocator()
fmt = mdates.AutoDateFormatter(loc)
ax.xaxis.set_major_locator(loc)
ax.xaxis.set_major_formatter(fmt)
This automatically chooses useful tick locations for you and is accurate down to a few microseconds; below that, ticking may become inaccurate due to floating point limitations.
So if you need customized or more accurate tick locations, you will need to write your own locator and/or change the units of your data (e.g. to "seconds since midnight").
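To illustrate the "change the units" option, here is a minimal sketch, not a definitive implementation. It assumes all timestamps fall on the same day, uses a shortened copy of the question's data, and relies on a hypothetical helper, format_seconds, to turn the numbers back into readable times. Because the default numeric locator stays in place, the ticks keep adapting when you zoom.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import collections as mc
from matplotlib.ticker import FuncFormatter

# Small sample in the same layout as the question: one row per segment.
x = np.array([['2018-03-19T07:01:00.000073810', '2018-03-19T07:01:00.632164618'],
              ['2018-03-19T07:01:01.218747698', '2018-03-19T07:01:02.260067814']],
             dtype='datetime64[ns]')
y = np.array([[12355.5, 12355.5],
              [12363.0, 12363.0]])

# Change units: float seconds since midnight of the first day in the data.
midnight = x.min().astype('datetime64[D]')
secs = (x - midnight) / np.timedelta64(1, 's')

def format_seconds(value, pos=None):
    """Hypothetical helper: seconds since midnight -> 'HH:MM:SS.ffffff'."""
    hours, rem = divmod(value, 3600)
    minutes, seconds = divmod(rem, 60)
    return "{:02d}:{:02d}:{:09.6f}".format(int(hours), int(minutes), seconds)

segs = np.stack([secs, y], axis=-1)  # shape (n_segments, 2, 2)

fig, ax = plt.subplots()
ax.add_collection(mc.LineCollection(segs))
ax.autoscale()  # add_collection does not rescale the view by itself
ax.xaxis.set_major_formatter(FuncFormatter(format_seconds))
```

The label format is fixed at microsecond precision here; if you zoom in further you would need more decimals in format_seconds.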
How do I make this graph fill the whole square around it? (I colored the part that I want to remove in yellow, for reference.)
Normally I use two methods to adjust axis limits depending on a situation.
When a graph is simple, axis.set_ylim(bottom, top) method is a quick way to directly change y-axis (you might know this already).
Another way is to use matplotlib.ticker. It gives you more utilities to adjust axis ticks in your graph.
https://matplotlib.org/3.1.1/gallery/ticks_and_spines/tick-formatters.html
I'm guessing you're using a list of strings to set the y-axis tick labels. You may want to set the locations (float numbers) and labels (strings) of the y-axis ticks separately, and then set the limits in terms of the locations, as in the following snippet.
import matplotlib.pyplot as plt
import matplotlib.ticker as mt
fig, ax = plt.subplots(1,1)
ax.plot([0,1,2], [0,1,2])
ax.yaxis.set_major_locator(mt.FixedLocator([0,1,2]))
ax.yaxis.set_major_formatter(mt.FixedFormatter(["String1", "String2", "String3"]))
ax.set_ylim(bottom=0, top=2)
It gives you this: (generated figure)
Try setting the min and max of your x and y axes.
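A minimal sketch of that suggestion, assuming the unwanted empty band comes from the default autoscaling margins. The data is made up for illustration.

```python
import matplotlib.pyplot as plt

# Made-up data for illustration.
xs = [0, 1, 2, 3]
ys = [2, 5, 3, 8]

fig, ax = plt.subplots()
ax.plot(xs, ys)

# Pin the axis limits to the data range so the line reaches the frame.
ax.set_xlim(min(xs), max(xs))
ax.set_ylim(min(ys), max(ys))
```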
I am trying to customize the xticks and yticks for my scatterplot with the simple code below:
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
y_ticks = np.arange(10, 41, 10)
x_ticks = np.arange(1000, 5001, 1000)
ax.set_yticks(y_ticks)
ax.set_xticks(x_ticks)
ax.scatter(some_x, some_y)
plt.show()
If we comment out the line ax.scatter(some_x, some_y), we get an empty plot with the correct result:
However if the code is run exactly as shown, we get this:
Finally, if we run the code with ax.set_yticks(y_ticks) and ax.set_xticks(x_ticks) commented out, we also get the correct result (just with the axes not in the ranges I want):
Note that I am using Python version 2.7. Additionally, some_x and some_y are omitted.
Any input on why the axes are changing in such an odd manner only after I try plotting a scatterplot would be appreciated.
EDIT:
If I run ax.scatter(some_x, some_y) before the xticks and yticks are set, I get odd results that are slightly different from before:
Matplotlib axes will always adjust themselves to the content. This is a desirable feature, because it means you always see the plotted data, no matter whether it ranges from -10 to -9 or from 1000 to 10000.
Setting the xticks will only change the tick locations. So if you set the ticks to locations between -10 and -9, but then plot data from 1000 to 10000, you would simply not see any ticks, because they do not lie in the shown range.
If the automatically chosen limits are not what you are looking for, you need to set them manually, using ax.set_xlim() and ax.set_ylim().
Finally, it should be clear that in order to have correct numbers appear on the axes, you need to actually use numbers. If some_x and some_y in ax.scatter(some_x, some_y) are strings, they will not obey any reasonable limits, but will simply be plotted one after the other.
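Put together, the advice above could look like the following sketch. The values of some_x and some_y are made up, since the question omits them, and they deliberately start out as strings to show the conversion to numbers.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-ins for the omitted some_x and some_y. If the real values
# arrive as strings, convert them to numbers first, as done here.
some_x = np.array(["1500", "3200", "4100", "2750"]).astype(float)
some_y = np.array(["12.5", "31", "24", "38"]).astype(float)

fig, ax = plt.subplots()
ax.scatter(some_x, some_y)

# Ticks only decide where labels go; limits decide what range is shown.
ax.set_xticks(np.arange(1000, 5001, 1000))
ax.set_yticks(np.arange(10, 41, 10))
ax.set_xlim(1000, 5000)
ax.set_ylim(10, 40)
```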
I am trying to plot data and a function with matplotlib 2.0 under Python 2.7.
The x values of the function evolve with time: x first decreases to a certain value, then increases again.
If the function is plotted against time, it looks like this: (plot of data against time)
I need the same x axis evolution for plotting against real x values. Unfortunately as the x values are the same for both parts before and after, both values are mixed together. This gives me the wrong data plot:
In this example it means I need the x-axis to start at 2.4, decrease to 1.0, and then increase again to 2.4. I am sure I once read that this is possible, but unfortunately I can't find any trace of it again.
A matplotlib axis is by default linearly increasing. More importantly, there must be an injective mapping of the number line to the axis units. So changing the data range is not really an option (at least when the aim is to keep things simple).
It would hence be good to keep the original numbers and only change the ticks and ticklabels on the axis. E.g. you could use a FuncFormatter to map the original numbers to
np.abs(x-tp)+tp
where tp would be the turning point.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
x = np.linspace(-10,20,151)
y = np.exp(-(x-5)**2/19.)
plt.plot(x,y)
tp = 5
fmt = lambda x,pos:"{:g}".format(np.abs(x-tp)+tp)
plt.gca().xaxis.set_major_formatter(matplotlib.ticker.FuncFormatter(fmt))
plt.show()
One option would be to use two axes and plot your two time spans separately, one on each.
For instance, if you have the following data:
import numpy as np
import matplotlib.pyplot as plt
myX = np.linspace(1,2.4,100)
myY1 = -1*myX
myY2 = -0.5*myX-0.5
plt.plot(myX,myY1, c='b')  # was myY, which is undefined
plt.plot(myX,myY2, c='g')
You can instead create two subplots with a shared y-axis and no space between the two axes, plot each time span independently, and finally adjust the limits of one of the x-axes to reverse the order of the points:
fig, (ax1,ax2) = plt.subplots(1,2, gridspec_kw={'wspace':0}, sharey=True)
ax1.plot(myX,myY1, c='b')
ax2.plot(myX,myY2, c='g')
ax1.set_xlim((2.4,1))
ax2.set_xlim((1,2.4))
In the following code snippet:
import numpy as np
import pandas as pd
import pandas.rpy.common as com
import matplotlib.pyplot as plt
mtcars = com.load_data("mtcars")
df = mtcars.groupby(["cyl"]).apply(lambda x: pd.Series([x["cyl"].count(), np.mean(x["wt"])], index=["n", "wt"])).reset_index()
plt.plot(df["n"], range(len(df["cyl"])), "o")
plt.yticks(range(len(df["cyl"])), df["cyl"])
plt.show()
This code outputs the dot plot, but the result looks quite awful: neither the xticks nor the yticks have enough space, so it is quite difficult to make out the values plotted for 4 and 8 of the cyl variable.
So how can I plot it with enough space in advance, much like you can do it without any hassles in R/ggplot2?
For your information, neither this code nor this one works in my case. Does anyone know the reason? Do I have to bother creating such subplots in the first place? Is it impossible to adjust the ticks automatically in response to the input values?
I can't quite tell what you're asking...
Are you asking why the ticks aren't automatically positioned or are you asking how to add "padding" around the inside edges of the plot?
If it's the former, it's because you've manually set the tick locations with yticks. This overrides the automatic tick locator.
If it's the latter, use ax.margins(some_percentage) (where some_percentage is between 0 and 1, e.g. 0.05 is 5%) to add "padding" to the data limits before they're autoscaled.
As an example of the latter, by default, the data limits can be autoscaled such that a point can lie on the boundaries of the plot. E.g.:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(range(10), 'ro')
plt.show()
If you want to avoid this, use ax.margins (or equivalently, plt.margins) to specify a percentage of padding to be added to the data limits before autoscaling takes place.
E.g.
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(range(10), 'ro')
ax.margins(0.04) # 4% padding, similar to R.
plt.show()
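Applied to a dot plot like the one in the question, this could look like the sketch below. It does not depend on rpy: the cyl counts are typed in by hand as illustrative values, and the margin sizes are arbitrary choices. ax.margins also accepts separate x and y keyword arguments.

```python
import matplotlib.pyplot as plt

# Illustrative stand-in for the grouped data (cyl value -> number of cars).
cyl = [4, 6, 8]
n = [11, 7, 14]

fig, ax = plt.subplots()
positions = list(range(len(cyl)))
ax.plot(n, positions, "o")

# Manual tick positions and labels, as in the question.
ax.set_yticks(positions)
ax.set_yticklabels([str(c) for c in cyl])

# Pad the data limits: 10% horizontally, 20% vertically.
ax.margins(x=0.1, y=0.2)
```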
If I create a plot with matplotlib using the following code:
import numpy as np
from matplotlib import pyplot as plt
xx = np.arange(0,5, .5)
yy = np.random.random( len(xx) )
plt.plot(xx,yy)
plt.show()
I get a result that looks like the attached image. The problem is the
bottom-most y-tick label overlaps the left-most x-tick label. This
looks unprofessional. I was wondering if there was an automatic
way to delete the bottom-most y-tick label, so I don't have
the overlap problem. The fewer lines of code, the better.
In the ticker module there is a class called MaxNLocator that can take a prune kwarg.
Using that you can remove the first tick:
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import numpy as np
xx = np.arange(0,5, .5)
yy = np.random.random( len(xx) )
plt.plot(xx,yy)
plt.gca().xaxis.set_major_locator(MaxNLocator(prune='lower'))
plt.show()
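The snippet above prunes the lowest tick on the x-axis. If you specifically want to drop the bottom-most y-tick, as the question asks, the same locator can be attached to the y-axis instead. A minimal sketch under that assumption:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

xx = np.arange(0, 5, .5)
yy = np.random.random(len(xx))

fig, ax = plt.subplots()
ax.plot(xx, yy)

# prune='lower' drops the lowest tick the locator would otherwise return.
ax.yaxis.set_major_locator(MaxNLocator(prune='lower'))
```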
You can pad the ticks on the x-axis:
ax.tick_params(axis='x', pad=15)
Replace ax with plt.gca() if you haven't stored the variable ax for the current figure.
You can also pad both axes by leaving out the axis parameter.
A very elegant way to fix the overlapping problem is to increase the padding of the x- and y-tick labels (i.e. their distance from the axis). Leaving out the corner-most label might not always be wanted. In my opinion, it generally looks nice if the labels are a little farther from the axis than in the default configuration.
The padding can be changed via the matplotlibrc file or in your plot script by using the commands
import matplotlib as mpl
mpl.rcParams['xtick.major.pad'] = 8
mpl.rcParams['ytick.major.pad'] = 8
Most times, a padding of 6 is also sufficient.
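If you would rather not change the padding globally, one option is to scope the setting with mpl.rc_context, so that it applies to a single figure only. This is a sketch of that variation and not part of the answer above. The figure is drawn inside the with block on the assumption that ticks pick up the padding when they are created.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

default_pad = mpl.rcParams['xtick.major.pad']

# The settings apply only inside the with block and are restored afterwards.
with mpl.rc_context({'xtick.major.pad': 8, 'ytick.major.pad': 8}):
    pad_inside = mpl.rcParams['xtick.major.pad']
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0.2, 0.9, 0.4])
    fig.canvas.draw()  # draw here so the ticks are created with these settings
```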
This is answered in detail here. Basically, you use something like this:
plt.xticks([list of tick locations], [list of tick labels])
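A concrete version of that call, with made-up locations and labels:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [10, 20, 15])

# First list: where the ticks go. Second list: the text shown at each one.
plt.xticks([0, 1, 2], ["low", "mid", "high"])
```

plt.xticks acts on the current axes, which here is the one just created by plt.subplots.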