Changing the count of numbers of the y-axis (Python, matplotlib.pyplot)

I just have a small problem with pyplot.
I am plotting my data in a way like:
import matplotlib.pyplot as plt
plt.subplot(i,3,j)
plt.plot(xy.data)
plt.show()
With several subplots. Now my problem is: when I have many subplots, the plots become very small and especially flat. So the numbers on my y-axis become impossible to read, because they overlap each other.
Is there an easy way to reduce the number of tick labels to something like 3, so that I just have the maximum, zero and the minimum? Not the minimum of my function, though: I would rather keep the minimum (and maximum) that the axis currently shows. I just want to drop every step in between the current min, 0 and max.
Thank You all, have a nice day.

from matplotlib.ticker import MaxNLocator
plt.gca().yaxis.set_major_locator(MaxNLocator(3))
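MaxNLocator limits the number of ticks: its first argument is the maximum number of intervals, so MaxNLocator(3) yields at most four ticks at "nice" positions. With plt.gca().yaxis you apply it to the y-axis of the current axes.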
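If exactly the current minimum, zero and the current maximum are wanted, as the question asks, one alternative is to set the ticks explicitly from the limits matplotlib has already chosen. The following is only a sketch under assumptions: the 4x3 grid and the sine data are made up for illustration, and set_yticks is swapped in here instead of MaxNLocator.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: a 4x3 grid of small, flat subplots.
fig, axes = plt.subplots(4, 3)
x = np.linspace(0, 10, 200)
for k, ax in enumerate(axes.flat):
    ax.plot(x, (k + 1) * np.sin(x))
    lo, hi = ax.get_ylim()   # limits matplotlib chose automatically
    # Keep only the current min and max, plus zero when it lies in between.
    ticks = [lo, 0.0, hi] if lo < 0.0 < hi else [lo, hi]
    ax.set_yticks(ticks)
    ax.set_ylim(lo, hi)      # make sure setting the ticks did not move the limits

# Call plt.show() here to view the result interactively.
```

Because the ticks are fixed, they will not adapt if the data or the limits change later; MaxNLocator stays the better choice when the axis should keep choosing its own positions.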

Related

Plotting for a large number of time series data points using matplotlib

I've collected sensor data every 5 minutes for a month (30 days).
That means I have time series data with 288*30 data points in total.
I'd like to draw a scatter plot of the data (x-axis: time, y-axis: sensor value).
The following code is for testing.
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
# generate time series randomly (length: 1 month)
rng = pd.date_range("2015-11-11", periods=288*30, freq="5min")
ts = pd.Series(np.random.randn(len(rng)), rng)
nr = 3
nc = 1
fig = plt.figure(1)
fig.subplots_adjust(left=0.04, top=1, bottom=0.02, right=0.98, wspace=0.1, hspace=0.1)
for i in range(3):
    ctr = i + 1
    ax = fig.add_subplot(nr, nc, ctr)
    ax.scatter(ts.index, ts.values)
    ax.set_xlim(ts.index.min(), ts.index.max())
plt.show()
I've generated random time series data with 288*30 observations and tried to draw it as a scatter plot. However, as you can see, it is impossible to analyze the figure.
I want to redraw it satisfying the following conditions:
I want a zoomed-in version of the figure. In other words, a part of data points of some time range (e.g., 2~3 hours) is shown at once. Then, there should be enough space between adjacent points.
I want to save the figure as a png or pdf file. Then, if I open the file, the image (or pdf) viewer has a horizontal scroll bar which enables me to explore the whole figure.
Is there anyone who can solve this?
I do not think it will be hard for a matplotlib expert, but it is quite hard for me, a beginner.

note to readers: answer changed significantly from v1 due to clarification of the question
I want a zoomed-in version of the figure. In other words, a part of data points of some time range (e.g., 2~3 hours) is shown at once. Then, there should be enough space between adjacent points.
Zooming in matplotlib is implemented with the x and y limits of the axes. So you can simply change the arguments to your call to ax.set_xlim so that the corresponding times differ by 2-3 hours, or however long you want. Since you have a sample every 5 minutes, 2 hours/(5 min/sample) = 24 samples, so you could use
ax.set_xlim(ts.index[0], ts.index[24])
to get a 2-hour range.
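The same window can also be expressed in time units rather than sample counts. This is a minimal sketch, assuming the random test series from the question; the names start, stop and window are introduced here only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same kind of test data as in the question: one sample every 5 minutes for 30 days.
rng = pd.date_range("2015-11-11", periods=288 * 30, freq="5min")
ts = pd.Series(np.random.randn(len(rng)), rng)

start = ts.index.min()
stop = start + pd.Timedelta(hours=2)   # a 2-hour window
window = ts.loc[start:stop]            # label-based slice, both ends included

fig, ax = plt.subplots()
ax.scatter(window.index, window.values)
ax.set_xlim(start, stop)

# Call plt.show() here to view the result interactively.
```

Slicing the series first is optional, since set_xlim alone is enough to zoom, but it avoids drawing the thousands of points that fall outside the window.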
I want to save the figure as a png or pdf file. Then, if I open the file, the image (or pdf) viewer has a horizontal scroll bar which enables me to explore the whole figure.
Use savefig to save the figure to a file. Note that if you have set the axis limits using set_xlim or xlim or equivalent, this will save only the portion of the figure that is visible within the given limits. So to save the entire figure (with all data points visible), you will need to set the axis limits to the minimum and maximum values, respectively.
When you open the image/PDF file in a viewer, whether it displays a scroll bar (and how much of the figure is shown) is entirely up to the viewer. You cannot control this in Python. But you can give it some chance of showing up with a horizontal scroll bar by making the figure very large in the horizontal direction. To do so, you can pass the figsize=(width, height) keyword argument when creating the figure, or use the set_size_inches(width, height) method on an existing Figure object. The measurements are in inches in both cases. Pass a value for width that is much larger than that for height and you will get a very wide figure; for example, 40 for width and 4 for height. You'll have to experiment with these values to find which ones give your figure the proportions you want.
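As a rough illustration of the wide-figure approach, the sketch below saves the whole month on a 40 x 4 inch canvas. The file name sensor_month_wide.png, the marker size and the dpi value are assumptions chosen for the example, not recommendations.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same kind of test data as in the question.
rng = pd.date_range("2015-11-11", periods=288 * 30, freq="5min")
ts = pd.Series(np.random.randn(len(rng)), rng)

# Much wider than tall, so that a viewer has something to scroll through.
fig, ax = plt.subplots(figsize=(40, 4))
ax.scatter(ts.index, ts.values, s=4)
ax.set_xlim(ts.index.min(), ts.index.max())   # show all data points

out_path = "sensor_month_wide.png"   # hypothetical output file name
fig.savefig(out_path, dpi=100)       # 40 in * 100 dpi = 4000 pixels wide
```

Increasing the width or the dpi spreads adjacent points further apart, at the cost of a larger file.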

Python, matplotlib: how to set tick label values to their logarithmic values

I have some data that I plot on a semi-log plot (log-lin style, with a logarithmic scale on the y-axis). Is there a way to change the y-axis tick labels from their actual values to their logarithmic values?
As an example, consider the following code:
import matplotlib.pyplot as plt
import numpy as np
x=np.array([1,2,3,4,5])
def f(x):
    return 10**(x-1)
plt.plot(x,f(x))
plt.yscale(u'log')
plt.show()
Which produces the following plot:
[Figure: semi-log plot of f(x), with y-axis tick labels 10^0 through 10^4]
In this plot the tick labels are shown as 10^0, 10^1, 10^2, etc.; however I would like them to display as their logarithmic values: 0, 1, 2, etc.
I realize I could go back and change plt.plot(x,f(x)) to plt.plot(x,np.log10(f(x))) and then make the y-axis linear again instead of logarithmic, but I want to know if there is any way matplotlib can just change the y-axis tick labels themselves without me having to put np.log10() in all my plt.plot() calls. My reason for this is two-fold: I have many plt.plot() lines in my code and would rather not go back and change all of them, and I would then lose the logarithmically spaced minor ticks (although I'm sure there's some way to get those even with a linear axis).
EDIT: I am aware of this question which has some similarities to mine but is not the same. The person in that question wants to change the tick labels from scientific form to "normal" decimal form. I want to change my tick labels from scientific form to the logarithmic (base 10) value of the number. I am sure the answer will be similar to the one I linked but it is not obvious to me how to do it. In fact, I looked at that question before posting mine but still decided to post mine because I did not know how to apply it to my problem. Perhaps to experienced programmers it is obvious how to apply the methods of the question I linked to my situation but it isn't obvious to me so please step me through it.
If you could show me a code sample (by copying my code sample and putting in the necessary lines) how this works I would much appreciate it.

You can use a custom formatter, for example:
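import math
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import numpy as np
x = np.array([1, 2, 3, 4, 5])
def f(x):
    return 10**(x-1)
plt.plot(x, f(x))
plt.yscale('log')
# Set custom tick formatting: label each major tick with log10 of its value.
# math.log10 avoids the rounding error of math.log(val, 10), which gives 2.9999999999999996 for 1000.
plt.gca().yaxis.set_major_formatter(FuncFormatter(lambda val, pos: '{:g}'.format(math.log10(val))))
plt.show()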

the axes control of mlab.axes

This is just a small question. I use the statement below to control the ranges of the three axes.
mlab.axes(xlabel='x', ylabel='y', zlabel='z',ranges=(0,10000,0,10000,0,22),nb_labels=10)
In fact, the real data ranges are (3000,4000), (5000,6000) and (0,22) respectively.
However, the axes of the figure I plot are labeled with the range (0,10000,0,10000,0,22).
I did not find a parameter of mlab.axes that controls this.
Do I have to calculate the data ranges every time? Without knowing the real data range in advance, is there a way to make the axis ranges follow the real data?
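If the ranges do have to be computed, a small helper keeps that to one line per plot. This is only a sketch: data_ranges is a hypothetical helper, the sample arrays are invented to match the ranges quoted above, and Mayavi itself is not imported here. The Mayavi documentation describes ranges as defaulting to the extents of the plotted object, so simply omitting the argument may already be enough.

```python
import numpy as np

def data_ranges(x, y, z):
    """Hypothetical helper: return (xmin, xmax, ymin, ymax, zmin, zmax)."""
    return tuple(float(v) for a in (x, y, z) for v in (np.min(a), np.max(a)))

# Invented sample coordinates spanning the ranges mentioned in the question.
x = np.linspace(3000, 4000, 50)
y = np.linspace(5000, 6000, 50)
z = np.linspace(0, 22, 50)

ranges = data_ranges(x, y, z)
# The tuple could then be passed on, e.g.
# mlab.axes(xlabel='x', ylabel='y', zlabel='z', ranges=ranges, nb_labels=10)
```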

Matplotlib slow with large data sets, how to enable decimation?

I use matplotlib for a signal processing application and I noticed that it chokes on large data sets. This is something that I really need to improve to make it a usable application.
What I'm looking for is a way to let matplotlib decimate my data. Is there a setting, property or other simple way to enable that? Any suggestions on how to implement this are welcome.
Some code:
import numpy as np
import matplotlib.pyplot as plt
n=100000 # more than 100000 points makes it unusably slow
plt.plot(np.random.random_sample(n))
plt.show()
Some background information
I used to work on a large C++ application where we needed to plot large datasets, and to solve this problem we took advantage of the structure of the data as follows:
In most cases, if we want a line plot, then the data is ordered and often even equidistant. If it is equidistant, then you can calculate the start and end index in the data array directly from the zoom rectangle and the inverse axis transformation. If it is ordered but not equidistant, a binary search can be used.
Next, the zoomed slice is decimated, and because the data is ordered we can simply iterate over blocks of points that each fall inside one pixel. For each block the mean, maximum and minimum are calculated. Instead of one pixel, we then draw a bar in the plot.
For example: if the x axis is ordered, a vertical line will be drawn for each block, possibly with the mean in a different color.
To avoid aliasing, the plot is oversampled by a factor of two.
In case it is a scatter plot, the data can be made ordered by sorting, because the sequence of plotting is not important.
The nice thing about this simple recipe is that the more you zoom in, the faster it becomes. In my experience, as long as the data fits in memory the plots stay very responsive. For instance, 20 plots of time-history data with 10 million points should be no problem.
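The first step of this recipe, finding the part of an ordered array that falls inside the zoom rectangle, can be sketched with a binary search. This is an illustration only, assuming sorted x values; visible_slice is a hypothetical helper name and np.searchsorted is used here as the binary search.

```python
import numpy as np

def visible_slice(x_sorted, xmin, xmax):
    """Hypothetical helper: slice of x_sorted whose values lie in [xmin, xmax]."""
    lo = np.searchsorted(x_sorted, xmin, side="left")    # first index with x >= xmin
    hi = np.searchsorted(x_sorted, xmax, side="right")   # one past last index with x <= xmax
    return slice(int(lo), int(hi))

# Ordered sample data with a spacing of 0.5.
x = np.arange(0.0, 100.0, 0.5)
s = visible_slice(x, 10.0, 20.0)
visible_x = x[s]   # only these points need to be decimated and drawn
```

For equidistant data the two indices could be computed directly from the spacing instead, as described above; the binary search covers the more general ordered case.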

It seems like you just need to decimate the data before you plot it:
import numpy as np
import matplotlib.pyplot as plt
n = 100000  # more than 100000 points makes it unusably slow
X = np.random.random_sample(n)
i = np.arange(0, n, 10)  # indices of every 10th sample
plt.plot(X[i])
plt.show()

Plain decimation is not always best: for example, if you decimate sparse data, it might all appear as zeros.
The decimation has to be smart, so that each horizontal screen pixel is plotted with the min and the max of the data between decimation points. Then, as you zoom in, you see more and more detail.
With zooming, this cannot easily be done outside matplotlib, and is thus better handled internally.
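A min/max decimation of this kind can be sketched in a few lines of NumPy. This is not a matplotlib feature but a pre-processing step under assumptions: minmax_decimate is a hypothetical helper, the block count stands in for the number of horizontal pixels, and samples left over after the last full block are simply dropped.

```python
import numpy as np

def minmax_decimate(y, n_bins):
    """Hypothetical helper: reduce y to 2*n_bins points (min and max per block)."""
    y = np.asarray(y)
    block = len(y) // n_bins                         # samples per block
    blocks = y[: block * n_bins].reshape(n_bins, block)
    out = np.empty(2 * n_bins, dtype=y.dtype)
    out[0::2] = blocks.min(axis=1)                   # block minimum first
    out[1::2] = blocks.max(axis=1)                   # then block maximum
    x = np.repeat(np.arange(n_bins) * block, 2)      # both share the block's start index
    return x, out

# Sparse data: a single spike that taking every 100th sample would miss.
y = np.zeros(100000)
y[12345] = 1.0
x_dec, y_dec = minmax_decimate(y, 1000)
```

Plotting x_dec against y_dec then draws a vertical stroke per block, so the spike stays visible. To follow zooming, the helper would have to be re-run on the visible slice whenever the limits change.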

How do I convert (or scale) axis values and redefine the tick frequency in matplotlib?

I am displaying a jpg image (I rotate this by 90 degrees, if this is relevant) and of course the axes display the pixel coordinates. I would like to convert the axis so that instead of displaying the pixel number, it will display my unit of choice - be it radians, degrees, or in my case an astronomical coordinate. I know the conversion from pixel to (eg) degree. Here is a snippet of what my code looks like currently:
import matplotlib.pyplot as plt
from PIL import Image
import matplotlib
thumb = Image.open(self.image)
thumb = thumb.rotate(90)
dpi = plt.rcParams['figure.dpi']
figsize = thumb.size[0]/dpi, thumb.size[1]/dpi
fig = plt.figure(figsize=figsize)
plt.imshow(thumb, origin='lower',aspect='equal')
plt.show()
...so following on from this, can I take each value that matplotlib would print on the axis, and change/replace it with a string to output instead? I would want to do this for a specific coordinate format - eg, rather than an angle of 10.44 (degrees), I would like it to read 10 26' 24'' (ie, degrees, arcmins, arcsecs)
Finally on this theme, I'd want control over the tick frequency, on the plot. Matplotlib might print the axis value every 50 pixels, but I'd really want it every (for example) degree.
It sounds like I would like to define some kind of array with the pixel values and their converted values (degrees etc) that I want to be displayed, with control over the sampling frequency across the xmin/xmax range.
Are there any matplotlib experts on Stack Overflow? If so, thanks very much in advance for your help! To make this a more learning experience, I'd really appreciate being prodded in the direction of tutorials etc on this kind of matplotlib problem. I've found myself getting very confused with axes, axis, figures, artists etc!
Cheers,
Dave

It looks like you're dealing with the matplotlib.pyplot interface, which means that you'll be able to bypass most of the dealing with artists, axes, and the like. You can control the values and labels of the tick marks by using the matplotlib.pyplot.xticks command, as follows:
tick_locs = [list of locations where you want your tick marks placed]
tick_lbls = [list of corresponding labels for each of the tick marks]
plt.xticks(tick_locs, tick_lbls)
For your particular example, you'll have to compute what the tick marks are relative to the units (i.e. pixels) of your original plot (since you're using imshow) - you said you know how to do this, though.
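As a sketch of how this could look for the degrees/arcminutes/arcseconds case from the question: the pixel scale PIXELS_PER_DEG, the offset DEG_AT_PIXEL0, the helper to_dms and the random stand-in image are all assumptions made up for the example, and the axes methods set_xticks/set_xticklabels are used as the counterpart of plt.xticks.

```python
import matplotlib.pyplot as plt
import numpy as np

PIXELS_PER_DEG = 100.0   # assumed image scale
DEG_AT_PIXEL0 = 10.0     # assumed coordinate of pixel column 0

def to_dms(deg):
    """Hypothetical helper: format decimal degrees as degrees, arcmin, arcsec."""
    total_arcsec = int(round(abs(deg) * 3600))
    d, rem = divmod(total_arcsec, 3600)
    m, s = divmod(rem, 60)
    sign = "-" if deg < 0 else ""
    return "{}{} {:02d}' {:02d}''".format(sign, d, m, s)

image = np.random.rand(200, 300)   # stand-in for the real picture
fig, ax = plt.subplots()
ax.imshow(image, origin="lower", aspect="equal")

tick_degs = np.arange(10.0, 13.0, 1.0)                     # one tick per degree
tick_locs = (tick_degs - DEG_AT_PIXEL0) * PIXELS_PER_DEG   # back to pixel units
ax.set_xticks(tick_locs)
ax.set_xticklabels([to_dms(d) for d in tick_degs])

# Call plt.show() here to view the result interactively.
```

The tick frequency is controlled entirely by the step used to build tick_degs, so a tick every half degree is just a different step in np.arange.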
I haven't dealt with images much, but you may be able to use a different plotting method (e.g. pcolor) that allows you to supply x and y information. That may give you a few more options for specifying the units of your image.
For tutorials, you would do well to look through the matplotlib gallery - find something you like, and read the code that produced it. One of the guys in our office recently bought a book on Python visualization - that may be worthwhile looking at.
The way that I generally think of all the various pieces is as follows:
A Figure is a container for all the Axes
An Axes is the space where what you draw (i.e. your plot) actually shows up
An Axis is the actual x and y axes
Artists? That's too deep in the interface for me: I've never had to worry about those yet, even though I rarely use the pyplot module in production plots.
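The containment described in this list can be checked directly on the objects. A minimal sketch, with variable names chosen only for illustration:

```python
import matplotlib.pyplot as plt
from matplotlib.axis import XAxis, YAxis

# One Figure holding two Axes side by side.
fig, (ax_left, ax_right) = plt.subplots(1, 2)

figure_holds_axes = fig.axes == [ax_left, ax_right]   # the Figure is the container
has_x_axis = isinstance(ax_left.xaxis, XAxis)         # each Axes owns an x Axis...
has_y_axis = isinstance(ax_left.yaxis, YAxis)         # ...and a y Axis

# Tick settings live on the Axis objects, reached through their Axes.
ax_left.set_xticks([0.0, 0.5, 1.0])
```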
