Python, matplotlib: how to set tick label values to their logarithmic values

I have some data that I plot on a semi-log plot (log-lin style, with a logarithmic scale on the y-axis). Is there a way to change the y-axis tick labels from their actual values to their logarithmic values?
As an example, consider the following code:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3, 4, 5])
def f(x):
    return 10**(x - 1)
plt.plot(x, f(x))
plt.yscale('log')
plt.show()
Which produces the following plot:
In this plot the tick labels are shown as 10^0, 10^1, 10^2, etc.; however I would like them to display as their logarithmic values: 0, 1, 2, etc.
I realize I could go back and change plt.plot(x,f(x)) to plt.plot(x,np.log10(f(x))) and then make the y-axis linear again instead of logarithmic, but I want to know if there is any way matplotlib can change the y-axis tick labels themselves without me having to put np.log10() in all my plt.plot() calls. My reason for this is two-fold. First, I have many plt.plot() lines in my code and would rather not go back and change all of them. Second, I would lose the logarithmically spaced minor ticks (although I'm sure there's some way to get them back even with a linear axis).
EDIT: I am aware of this question which has some similarities to mine but is not the same. The person in that question wants to change the tick labels from scientific form to "normal" decimal form. I want to change my tick labels from scientific form to the logarithmic (base 10) value of the number. I am sure the answer will be similar to the one I linked but it is not obvious to me how to do it. In fact, I looked at that question before posting mine but still decided to post mine because I did not know how to apply it to my problem. Perhaps to experienced programmers it is obvious how to apply the methods of the question I linked to my situation but it isn't obvious to me so please step me through it.
If you could show me a code sample (by copying my code sample and putting in the necessary lines) how this works I would much appreciate it.

You can use a custom formatter, for example:
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import numpy as np
import math
x=np.array([1,2,3,4,5])
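def f(x):
    return 10**(x-1)
plt.plot(x,f(x))
plt.yscale(u'log')
# Set custom tick formatting: label each major tick with its base-10 exponent.
# math.log10 avoids rounding artifacts such as math.log(1000, 10) == 2.9999999999999996.
plt.gca().yaxis.set_major_formatter(FuncFormatter(lambda value, pos: '{:g}'.format(math.log10(value))))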
plt.show()
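Since the question mentions many plt.plot() calls, it may help to wrap the formatter in a small reusable helper. The following is only a sketch: the names log10_label and use_exponent_labels are made up for illustration, and the Agg backend line is there just so the snippet runs without a display. The axis stays logarithmic, so the log-spaced minor ticks are kept and only the major tick labels change.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter


def log10_label(value, pos=None):
    """Return the base-10 logarithm of a tick value as a short string."""
    if value <= 0:
        return ""  # a log axis should not place ticks here; guard anyway
    return "{:g}".format(math.log10(value))


def use_exponent_labels(axis):
    """Hypothetical helper: label the major ticks of a log axis with exponents."""
    axis.set_major_formatter(FuncFormatter(log10_label))


x = np.array([1, 2, 3, 4, 5])
fig, ax = plt.subplots()
ax.plot(x, 10.0 ** (x - 1))
ax.set_yscale("log")         # data and minor ticks stay logarithmic
use_exponent_labels(ax.yaxis)
ax.set_ylabel("log10(y)")    # make clear what the labels now mean
fig.canvas.draw()            # use plt.show() in an interactive session
```

Calling use_exponent_labels(ax.yaxis) once per axes is enough; none of the existing plot calls need to change.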

Related

How do I use matplotlib to create a bar chart of a very large dataset?

The data I am working with is an array of 27,000 elements. It is a histogram of a few million data points, but the histogram is all I have, and I need to plot it in my program, preferably with vertical bars.
I've tried using the 'bar' function in matplotlib, but this takes a minute or two to plot, whereas a regular plot (with just points on the chart) is almost immediate but obviously does not achieve the effect I want (i.e. bars). I'm not sure why the bar function is so much slower, so I was wondering if there is a more efficient way to plot a histogram with vertical bars using matplotlib.
I've looked at matplotlib's hist function, but to my understanding its purpose is to take data, make a histogram, and then plot it. I already have a histogram, so I don't believe it works for my case. I greatly appreciate any help!
Here's a reference to the hist function documentation, maybe I missed something.
https://matplotlib.org/3.2.0/api/_as_gen/matplotlib.pyplot.hist.html
Thanks in advance! Let me know if you would like an example of the code I am working with but it is just your most generic my_axes.plot(my_data) or my_axes.bar(my_data) so I'm not sure how helpful it would be.
I've taken a look at this as well now: https://gist.github.com/pierdom/d639a1d3b8934ee31db8b2ab9997ae92.
This also works but has the same time issue as using bar so I suppose this is just an issue with rendering a lot of vertical bars? (though I still wonder why rendering 27000 points happens so quickly)

Apparently, this is a known and discussed limitation of the bar graph as it is currently implemented. See this issue and this discussion. Though there are questions about its usefulness, in my particular case I have a toolbar across the top that allows the user to zoom in and move around the data set (which is a very practical method for my use case).
However, a great alternative does exist in the form of stairs. Simply pass fill=True and you have an effective bar graph that is much faster to draw.
import matplotlib.pyplot as plt
import random
bins = range(27001) # Note that bins needs one more element than heights (it holds the bin edges)
heights = [random.randint(0, i) for i in range(27000)]
ax = plt.gca()
ax.stairs(heights, bins, fill=True)
plt.show()
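Axes.stairs was added in Matplotlib 3.4. On an older installation, a similar filled step outline can probably be obtained with fill_between and step="post". This is only a sketch with made-up heights; the idea is to repeat the last height once so that the y array is as long as the array of bin edges.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
heights = rng.integers(0, 100, size=27000)  # stand-in for the precomputed histogram
edges = np.arange(len(heights) + 1)         # one more edge than there are bins

# With step="post", each y value is held constant until the next x value,
# so y[i] covers the bin from edges[i] to edges[i + 1]. The final value is
# only padding that makes both arrays the same length.
step_y = np.append(heights, heights[-1])

fig, ax = plt.subplots()
ax.fill_between(edges, step_y, step="post")
# plt.show() would display the figure in an interactive session
```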

matplotlib's bar should be pretty fast to execute so I'm guessing you're passing all the data points to it (although you mention you have "histogram data", so if you can provide more details on the format, that'd help).
bar takes the x positions for the bars and the heights, so if you want the bar function to produce a histogram you need to bin and count.
This will produce something similar to matplotlib's hist:
import matplotlib.pyplot as plt
bins = [0, 1, 2, 3]      # x positions of the bars
heights = [1, 2, 3, 4]   # counts per bin
ax = plt.gca()
ax.bar(bins, heights, align='center', width=1)
plt.show()
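If you are starting from raw data points instead, the "bin and count" step could look roughly like the sketch below. The sample data is made up, and np.histogram is just one way to do the counting; it returns the counts together with the bin edges, which map directly onto the arguments of bar.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
samples = rng.normal(size=1000)  # made-up raw data points

# Bin and count: counts has 20 entries, edges has 21.
counts, edges = np.histogram(samples, bins=20)

fig, ax = plt.subplots()
# Draw one bar per bin, starting at the left edge and spanning the bin width.
ax.bar(edges[:-1], counts, width=np.diff(edges), align="edge")
# plt.show() would display the figure in an interactive session
```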

python multiple stacked plots along y axis

I have binned data: one n-length vector for the x-axis and three n-length vectors for the y-axis, giving three different histograms on the same x-axis.
Now I want a stacked bar plot like the one below, or anything similar.
The nearest I have found is QtiPlot (which is not Python). It can generate exactly this kind of histogram plot, but it computes the histogram by itself and requires the actual data samples, which I do not have (I only have the histogram itself).
Please note that I don't know Python very well, so I don't have a clue where to start, and I am not really in the mood to learn Python programming. I need this only to make a nice vector-graphics plot for my research thesis.
I have tagged python as I think python is the most obvious language. In case someone knows any better solution other than in python (but not Matlab, I cannot install that huge pile), I will thankfully add the proper tag.
Thanks in advance for any help.

Use the matplotlib package in Python:
import matplotlib.pyplot as plt
apple_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
banana_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
mango_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
fig=plt.figure()
ax1=fig.add_subplot(311)
ax2=fig.add_subplot(312)
ax3=fig.add_subplot(313)
ax1.hist(apple_weight)
ax2.hist(banana_weight)
ax3.hist(mango_weight)
plt.show()

import matplotlib.pyplot as plt
apple_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
banana_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
mango_weight=[3,3,3,10,10,1,1,1,4,4,4,4,7,7,7]
fig=plt.figure()
ax1=fig.add_subplot(111)
ax2=ax1.twinx()
# twinx() gives only two y-axes, so the third data set is added to either one
ax1.hist(apple_weight)
ax2.hist(banana_weight)
ax1.hist(mango_weight)
plt.show()
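The examples above call hist on raw samples. Since the question says only the binned values are available, a variant that draws precomputed counts with bar on three vertically stacked axes sharing one x-axis might look like this. It is a sketch only: the bin positions, counts, and names are invented for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

x = np.arange(10)  # shared bin positions (made up)
rng = np.random.default_rng(1)
# Three precomputed histograms on the same bins (made-up counts).
histograms = {name: rng.integers(1, 20, size=x.size)
              for name in ("first", "second", "third")}

# Three rows, one column, all panels sharing the x-axis.
fig, axes = plt.subplots(3, 1, sharex=True, figsize=(6, 6))
for ax, (name, counts) in zip(axes, histograms.items()):
    ax.bar(x, counts, width=1.0, align="center")
    ax.set_ylabel(name)
axes[-1].set_xlabel("bin")
fig.subplots_adjust(hspace=0.05)  # small gap so the panels read as one stack

# For vector graphics, save to a vector format, e.g. fig.savefig("stacked.pdf")
```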

Changing the count of numbers of the y-axis (Python, matplotlib.pyplot)

I just have a small problem with pyplot.
I am plotting my data in a way like:
import matplotlib.pyplot as plt
plt.subplot(i,3,j)
plt.plot(xy.data)
plt.show()
With several subplots. Now my problem is: when I have many subplots, the plots become very small and especially flat, so the numbers on my y-axis become impossible to read because they overlap each other.
Is there an easy way to reduce the count of the numbers to something like 3, so that I just have the maximum, zero and the minimum? I don't mean the minimum of my function; I would rather keep the minimum (and maximum) tick values that are currently there. I just want to drop every step in between the current min, 0 and max.
Thank You all, have a nice day.

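import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
plt.gca().yaxis.set_major_locator(MaxNLocator(3))
MaxNLocator limits the number of ticks: its argument is the maximum number of intervals, so MaxNLocator(3) gives at most four ticks at "nice" values. Applying it to plt.gca().yaxis makes this happen on the y-axis of the current subplot.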
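MaxNLocator picks its own tick positions, so it will not necessarily land on the current minimum, zero and maximum. If exactly those three ticks are wanted, one option is to read the current limits and set the ticks explicitly. A minimal sketch, using a made-up sine curve as data:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

data = np.sin(np.linspace(0, 2 * np.pi, 100))  # made-up data

fig, ax = plt.subplots()
ax.plot(data)

# Use the current axis limits (not the data extremes) as the outer ticks,
# and add zero only if it lies strictly inside the visible range.
low, high = ax.get_ylim()
ticks = [low, 0.0, high] if low < 0 < high else [low, high]
ax.set_yticks(ticks)
```

In a loop over subplots, the same three lines can be applied to each axes object after plotting.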

How to better fit seaborn violinplots?

The following code gives me a very nice violinplot (and boxplot within).
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
foo = np.random.rand(100)
sns.violinplot(foo)
plt.boxplot(foo)
plt.show()
So far so good. However, when I look at foo, the variable does not contain any negative values. The seaborn plot seems misleading here. The normal matplotlib boxplot gives something closer to what I would expect.
How can I make violinplots with a better fit (not showing false negative values)?

As the comments note, this is a consequence (I'm not sure I'd call it an "artifact") of the assumptions underlying gaussian KDE. As has been mentioned, this is somewhat unavoidable, and if your data don't meet those assumptions, you might be better off just using a boxplot, which shows only points that exist in the actual data.
However, in your response you ask about whether it could be fit "tighter", which could mean a few things.
One answer might be to change the bandwidth of the smoothing kernel. You do that with the bw argument, which is actually a scale factor; the bandwidth that will be used is bw * data.std():
data = np.random.rand(100)
sns.violinplot(y=data, bw=.1)
Another answer might be to truncate the violin at the extremes of the datapoints. The KDE will still be fit with densities that extend past the bounds of your data, but the tails will not be shown. You do that with the cut parameter, which specifies how many units of bandwidth past the extreme values the density should be drawn. To truncate, set it to 0:
sns.violinplot(y=data, cut=0)
By the way, the API for violinplot is going to change in 0.6, and I'm using the development version here, but both the bw and cut arguments exist in the current released version and behave more or less the same way.
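To see why the violin extends below zero, and what cut changes, here is a small NumPy-only sketch of a Gaussian KDE. It is not seaborn's implementation; the function name kde_curve is made up, the bandwidth follows the bw * data.std() description above, and cut=2 is used only as an illustrative non-zero value.

```python
import numpy as np


def kde_curve(data, bw, cut, n_points=200):
    """Evaluate a Gaussian KDE on a grid reaching `cut` bandwidths past the data."""
    bandwidth = bw * data.std(ddof=1)
    low = data.min() - cut * bandwidth
    high = data.max() + cut * bandwidth
    grid = np.linspace(low, high, n_points)
    # Sum one Gaussian bump per data point at every grid position.
    z = (grid[:, None] - data[None, :]) / bandwidth
    density = np.exp(-0.5 * z ** 2).sum(axis=1)
    density /= len(data) * bandwidth * np.sqrt(2 * np.pi)
    return grid, density


rng = np.random.default_rng(0)
data = rng.random(100)  # values in [0, 1), like foo in the question

grid_wide, dens_wide = kde_curve(data, bw=0.1, cut=2)  # drawn past the data range
grid_cut, dens_cut = kde_curve(data, bw=0.1, cut=0)    # truncated at min and max
```

With cut=0 the grid starts exactly at data.min() and ends at data.max(), so no part of the curve is drawn outside the observed range, even though the fitted density is still non-zero there.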

How do I convert (or scale) axis values and redefine the tick frequency in matplotlib?

I am displaying a jpg image (I rotate this by 90 degrees, if this is relevant) and of course the axes display the pixel coordinates. I would like to convert the axis so that instead of displaying the pixel number, it will display my unit of choice - be it radians, degrees, or in my case an astronomical coordinate. I know the conversion from pixel to (eg) degree. Here is a snippet of what my code looks like currently:
import matplotlib.pyplot as plt
from PIL import Image  # Pillow; the bare "import Image" only worked with the old PIL package
import matplotlib
thumb = Image.open(self.image)
thumb = thumb.rotate(90)
dpi = plt.rcParams['figure.dpi']
figsize = thumb.size[0]/dpi, thumb.size[1]/dpi
fig = plt.figure(figsize=figsize)
plt.imshow(thumb, origin='lower',aspect='equal')
plt.show()
...so following on from this, can I take each value that matplotlib would print on the axis, and change/replace it with a string to output instead? I would want to do this for a specific coordinate format - eg, rather than an angle of 10.44 (degrees), I would like it to read 10 26' 24'' (ie, degrees, arcmins, arcsecs)
Finally on this theme, I'd want control over the tick frequency, on the plot. Matplotlib might print the axis value every 50 pixels, but I'd really want it every (for example) degree.
It sounds like I would like to define some kind of array with the pixel values and their converted values (degrees etc) that I want to be displayed, having control over the sampling frequency over the range xmin/xmax range.
Are there any matplotlib experts on Stack Overflow? If so, thanks very much in advance for your help! To make this a more learning experience, I'd really appreciate being prodded in the direction of tutorials etc on this kind of matplotlib problem. I've found myself getting very confused with axes, axis, figures, artists etc!
Cheers,
Dave

It looks like you're dealing with the matplotlib.pyplot interface, which means that you'll be able to bypass most of the dealing with artists, axes, and the like. You can control the values and labels of the tick marks by using the matplotlib.pyplot.xticks command, as follows:
tick_locs = [0, 50, 100]           # locations where you want your tick marks placed (example values)
tick_lbls = ['0°', '1°', '2°']     # corresponding label for each of the tick marks (example values)
plt.xticks(tick_locs, tick_lbls)
For your particular example, you'll have to compute what the tick marks are relative to the units (i.e. pixels) of your original plot (since you're using imshow) - you said you know how to do this, though.
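As a rough sketch of that computation, assuming a simple linear pixel-to-degree conversion: the values deg_per_pixel and deg_at_pixel0, the helper deg_to_dms, and the blank stand-in image are all hypothetical. The ticks are placed at every whole degree and labelled in degrees, arcminutes and arcseconds.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt


def deg_to_dms(deg):
    """Format decimal degrees as degrees, arcminutes, arcseconds."""
    sign = "-" if deg < 0 else ""
    total_arcsec = int(round(abs(deg) * 3600))
    d, rem = divmod(total_arcsec, 3600)
    m, s = divmod(rem, 60)
    return f"{sign}{d}° {m:02d}' {s:02d}''"


# Hypothetical linear conversion: degree = deg_at_pixel0 + pixel * deg_per_pixel
deg_per_pixel = 0.02
deg_at_pixel0 = 10.0
image = np.zeros((100, 200))  # stand-in for the real image (100 rows, 200 columns)

# One tick per whole degree inside the image width.
deg_min = deg_at_pixel0
deg_max = deg_at_pixel0 + image.shape[1] * deg_per_pixel
tick_degs = np.arange(np.ceil(deg_min), np.floor(deg_max) + 1, 1.0)
tick_locs = (tick_degs - deg_at_pixel0) / deg_per_pixel  # back to pixel units
tick_lbls = [deg_to_dms(d) for d in tick_degs]

fig, ax = plt.subplots()
ax.imshow(image, origin="lower", aspect="equal")
ax.set_xticks(tick_locs)
ax.set_xticklabels(tick_lbls)
```

Changing the step in np.arange (for example to 0.5) changes the tick frequency without touching the image itself.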
I haven't dealt with images much, but you may be able to use a different plotting method (e.g. pcolor) that allows you to supply x and y information. That may give you a few more options for specifying the units of your image.
For tutorials, you would do well to look through the matplotlib gallery - find something you like, and read the code that produced it. One of the guys in our office recently bought a book on Python visualization - that may be worthwhile looking at.
The way that I generally think of all the various pieces is as follows:
A Figure is a container for all the Axes
An Axes is the space where what you draw (i.e. your plot) actually shows up
An Axis is the actual x and y axes
Artists? That's too deep in the interface for me: I've never had to worry about those yet, even though I rarely use the pyplot module in production plots.
