Set log xticks in matplotlib for a linear plot - python

Consider
xdata=np.random.normal(5e5,2e5,int(1e4))
plt.hist(np.log10(xdata), bins=100)
plt.show()
plt.semilogy(xdata)
plt.show()
Is there any way to display the xticks of the first plot (plt.hist) like the yticks of the second plot? For good reasons I want to histogram np.log10(xdata) rather than xdata, but I'd like the minor ticks to be displayed as usual on a log scale (even though the exponent itself is linear).
In other words, I want the x-axis of the first plot to look like the y-axis of the second plot, without changing the spacing between major ticks (e.g., adding log marks between 5.5 and 6.0 without altering these values).

Proper histogram plot with a logarithmic x-axis:
Explanation:
Cut off non-positive values. The randomly generated example data most likely still contains some negative values (uncomment the print lines at the beginning of the code to see them). The logarithm isn't defined for values <= 0. While the 2nd plot only applies log scaling to the y-axis (negative values are simply out of range), the 1st plot doesn't work with negative values inside the bin range. Real-world data probably won't be <= 0; otherwise, keep that in mind.
Align the bins to the log scale as well. Otherwise the distribution of bin widths looks off (swap the commented and uncommented plt.hist( statements in the 1st plot section to see the effect).
Plot xdata (not np.log10(xdata)) in the histogram. The workaround of plotting np.log10(xdata) was probably the root cause of the misunderstanding in the comments.
Code:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42) # just to have repeatable results for the answer
xdata=np.random.normal(5e5,2e5,int(1e4))
# MIN_xdata, MAX_xdata = np.min(xdata), np.max(xdata)
# print(f"{MIN_xdata}, {MAX_xdata}") # note the negative values
# cut off potential negative values (log function isn't defined for <= 0 )
xdata = np.ma.masked_less_equal(xdata, 0)
MIN_xdata, MAX_xdata = np.min(xdata), np.max(xdata)
# print(f"{MIN_xdata}, {MAX_xdata}")
# align the bins to fit a log scale (bins + 1 edges give `bins` bins)
bins = 100
bins_log_aligned = np.logspace(np.log10(MIN_xdata), np.log10(MAX_xdata), bins + 1)
# 1st plot
plt.hist(xdata, bins = bins_log_aligned) # note: xdata (not np.log10(xdata) )
# plt.hist(xdata, bins = 100)
plt.xscale('log')
plt.show()
# 2nd plot
plt.semilogy(xdata)
plt.show()
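To see why the bin edges should follow the log scale, here is a minimal sketch that is not part of the original answer. The range 1e3 to 1e6 and the number of edges are arbitrary values chosen for illustration. It compares how wide the bins appear on a log-scaled axis when the edges come from np.logspace versus np.linspace.

```python
import numpy as np

# Hypothetical data range and edge count, chosen only for illustration
lo, hi = 1e3, 1e6
n_edges = 7

log_edges = np.logspace(np.log10(lo), np.log10(hi), n_edges)
lin_edges = np.linspace(lo, hi, n_edges)

# Apparent bin widths on a log-scaled axis are differences of log10(edges)
log_widths = np.diff(np.log10(log_edges))
lin_widths = np.diff(np.log10(lin_edges))

print(log_widths)  # constant: every bin looks equally wide on a log axis
print(lin_widths)  # decreasing: linear bins look narrower toward the right
```

Note that np.logspace returns bin edges, so n edges produce n - 1 bins.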

Kept for now for clarification purposes only. Will be deleted when the question is revised.
Disclaimer:
As Lucas M. Uriarte already mentioned, this isn't an expected way of changing axis ticks. The x-axis ticks and labels don't represent the plotted data, so you should always provide that information along with such a plot.
Having seen the resulting plot, I kind of understand where the idea for this special plot comes from. Still, there should be a preferred way (e.g. converting the data in advance) to produce such a plot instead of 'faking' the axis.
How the special axis-transfer plot is done:
The original x-axis is hidden.
A twiny axis is added. Its y-axis is hidden by default, so that doesn't need handling.
The twiny x-axis is set to log and the y-axis limits of the 2nd plot are transferred to it. Subplots are used so the limits can be transferred directly; use variables instead if you need to stick with your two separate plots.
The twiny x-axis is moved from the top (the twiny default position) to the bottom (where the original x-axis was).
Code:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42) # just to have repeatable results for the answer
xdata=np.random.normal(5e5,2e5,int(1e4))
fig, axs = plt.subplots(2, figsize=(7,10), facecolor=(1, 1, 1))
# 1st plot
axs[0].hist(np.log10(xdata), bins=100) # plot the data on the normal x axis
axs[0].axes.xaxis.set_visible(False) # hide the normal x axis
# 2nd plot
axs[1].semilogy(xdata)
# 1st plot - twin axis
axs0_y_twin = axs[0].twiny() # set a twiny axis, note twiny y axis is hidden by default
axs0_y_twin.set(xscale="log")
# transfer the limits from the 2nd plot y axis to the twin axis
axs0_y_twin.set_xlim(axs[1].get_ylim()[0],
                     axs[1].get_ylim()[1])
# move the twin x axis from top to bottom
axs0_y_twin.tick_params(axis="x", which="both", bottom=True, top=False,
                        labelbottom=True, labeltop=False)
# Disclaimer
disclaimer_text = "Disclaimer: x axis ticks and labels don't represent the plotted data"
axs[0].text(0.5,-0.09, disclaimer_text, size=12, ha="center", color="red",
            transform=axs[0].transAxes)
plt.tight_layout()
plt.subplots_adjust(hspace=0.2)
plt.show()
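A different route, not taken from either answer above, is to keep the linear axis of log10 values and only add minor ticks at the positions where log-scale subdivisions would fall, i.e. at log10(k * 10**n) for k = 2..9. The sketch below assumes that this is what the question means by "adding log marks between 5.5 and 6.0"; the helper name log_minor_positions is hypothetical. The major ticks and their labels are left untouched, so the axis still represents the plotted data.

```python
import numpy as np
import matplotlib.pyplot as plt


def log_minor_positions(lo, hi):
    """Return log10(k * 10**n), k = 2..9, for all values inside [lo, hi].

    lo and hi are the limits of an axis showing log10 values on a linear scale.
    (Hypothetical helper, written for this illustration.)
    """
    decades = np.arange(np.floor(lo), np.ceil(hi))   # integer exponents n
    offsets = np.log10(np.arange(2, 10))             # log10(2) ... log10(9)
    positions = (decades[:, None] + offsets[None, :]).ravel()
    return positions[(positions >= lo) & (positions <= hi)]


rng = np.random.default_rng(42)
xdata = rng.normal(5e5, 2e5, int(1e4))
xdata = xdata[xdata > 0]            # log10 is undefined for values <= 0

fig, ax = plt.subplots()
ax.hist(np.log10(xdata), bins=100)  # linear axis; major ticks stay as they are

# Add log-style minor ticks between the existing major ticks
lo, hi = ax.get_xlim()
ax.set_xticks(log_minor_positions(lo, hi), minor=True)
ax.set_xlim(lo, hi)                 # restore the original limits explicitly
ax.set_xlabel("log10(xdata)")
# plt.show()  # uncomment to display the figure
```

Between 5.0 and 6.0 this places eight minor ticks, the first at log10(2e5) and the last at log10(9e5), which gives the familiar compressed spacing of a log axis.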

Related

Setting xticklabels and x-axis limits in a bar plot with matplotlib

I want to plot a bar graph with a variable amount of values along the x-axis. For the data, I have a set of labels which I want to show on the x-axis under the bars. I also want the x-axis limits to start at -1, since otherwise, only half of the first bar at index 0 would be visible. I've tried multiple alternatives for achieving that, none of them worked, because the xticklabels are always one or more off. And IF they work for a given set of data, with another set of data (with more or less bars) it does not work again. See minimum code example below
from matplotlib import pyplot as plt
from matplotlib import ticker
import numpy as np
randData = np.random.rand(100)
xValues = np.linspace(0, len(randData)-1, num=len(randData))
labels = []
for i in range(len(randData)):
    labels.append('label' + str(i))
fig, ax = plt.subplots()
ax.bar(np.linspace(0, len(randData)-1, num=len(randData)), randData)
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
# Alternative 1
# Use an empty string for index -1, set labels, then set new xlim
labels.insert(0, '')
ax.set_xticklabels(labels, size='x-small', rotation=90)
plt.xlim(-1, len(randData))
# Alternative 2
# Use an empty string for index -1, set new xlim, then set labels
labels.insert(0, '')
plt.xlim(-1, len(randData))
ax.set_xticklabels(labels, size='x-small', rotation=90)
# Alternative 3
# Setting limits with ax.set_xlim
ax.set_xticklabels(labels, size='x-small', rotation=90)
ax.set_xlim([-1, len(randData)])
# Alternative 4
# Setting limits with plt.xlim
ax.set_xticklabels(labels, size='x-small', rotation=90)
plt.xlim(-1, len(randData))
plt.show()
None of the variants worked so far. One part of the problem is that the pyplot automatically sets its xlimits depending on the amount of bar graphs (sometimes it starts at -1, with more values it might sometimes start at -4).
One of the faulty results is shown below:
Any help would be appreciated.
P.S.: If I may, I'd like to add a little side question: How can I remove the Warning "UserWarning: FixedFormatter should only be used together with FixedLocator" when setting the xticklabels? Nothing from this answer worked for me.
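One possible approach, offered here as a sketch rather than a confirmed answer from this thread, is to set the tick positions explicitly with ax.set_xticks before assigning the labels, so that each label is tied to exactly one bar regardless of how many bars there are. Setting the positions first should also avoid the FixedFormatter warning, since set_xticks installs a fixed locator. The variable names below are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
rand_data = rng.random(10)                  # any number of bars works
positions = np.arange(len(rand_data))       # bar centers 0, 1, 2, ...
labels = [f"label{i}" for i in positions]

fig, ax = plt.subplots()
ax.bar(positions, rand_data)

# Fix the tick positions first, then assign exactly one label per tick
ax.set_xticks(positions)
ax.set_xticklabels(labels, size="x-small", rotation=90)

# Set the limits last so that earlier calls cannot change them again
ax.set_xlim(-1, len(rand_data))
# plt.show()  # uncomment to display the figure
```

With this ordering no empty placeholder label for index -1 is needed, because there is no tick at -1.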

Seaborn heatmap with variating cell sizes

I have a heatmap with ticks which have non equal deltas between themselves:
For example, in the attached image, the deltas are between 0.015 to 0.13. The current scale doesn't show the real scenario, since all cell sizes are equal.
Is there a way to place the ticks in their realistic positions, such that cell sizes would also change accordingly?
Alternatively, is there another method to generate this figure such that it would provide a realistic representation of the tick values?
As mentioned in the comments, a Seaborn heatmap uses categorical labels. However, the underlying structure is a pcolormesh, which can have different sizes for each cell.
Also mentioned in the comments, is that updating the private attributes of the pcolormesh isn't recommended. Moreover, the heatmap can be directly created calling pcolormesh.
Note that if there are N cells, there will be N+1 boundaries. The example code below supposes you have x-positions for the centers of the cells. It then calculates boundaries in the middle between successive cells. The first and the last distance is repeated.
The ticks and tick labels for x and y axis can be set from the given x-values. The example code supposes the original values indicate the centers of the cells.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set()
N = 10
xs = np.random.uniform(0.015, 0.13, 10).cumsum().round(3) # some random x values
values = np.random.rand(N, N) # a random matrix
# set bounds in the middle of successive cells, add extra bounds at start and end
bounds = (xs[:-1] + xs[1:]) / 2
bounds = np.concatenate([[2 * bounds[0] - bounds[1]], bounds, [2 * bounds[-1] - bounds[-2]]])
fig, ax = plt.subplots()
ax.pcolormesh(bounds, bounds, values)
ax.set_xticks(xs)
ax.set_xticklabels(xs, rotation=90)
ax.set_yticks(xs)
ax.set_yticklabels(xs, rotation=0)
plt.tight_layout()
plt.show()
PS: In case the ticks are meant to be the boundaries, the code can be simplified. One extra boundary is needed, for example a zero at the start.
bounds = np.concatenate([[0], xs])
ax.tick_params(bottom=True, left=True)
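To make the boundary computation from the main example concrete, here is a small worked case with three hypothetical cell centers. It uses the same midpoint and extrapolation formulas as the code above, just with values that are easy to check by hand.

```python
import numpy as np

centers = np.array([1.0, 3.0, 4.0])          # hypothetical cell centers

# Boundaries halfway between successive centers: 2.0 and 3.5
inner = (centers[:-1] + centers[1:]) / 2

# Add one extra boundary on each side by repeating the neighboring distance
bounds = np.concatenate([[2 * inner[0] - inner[1]], inner,
                         [2 * inner[-1] - inner[-2]]])

print(bounds.tolist())   # [0.5, 2.0, 3.5, 5.0]
print(len(bounds))       # 4, i.e. N + 1 boundaries for N = 3 cells
```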

Seaborn distplot() won't display frequency in the y-axis

I am trying to display the weighted frequency in the y-axis of a seaborn.distplot() graph, but it keeps displaying the density (which is the default in distplot())
I read the documentation and also many similar questions here on Stack Overflow.
The common answer is to set norm_hist=False and also to assign the weights in a NumPy array as in a standard histogram. However, it keeps showing the density and not the probability/frequency of each bin.
My code is
plt.figure(figsize=(10, 4))
plt.xlim(-0.145,0.145)
plt.axvline(0, color='grey')
data = df['col1']
x = np.random.normal(data.mean(), scale=data.std(), size=(100000))
normal_dist =sns.distplot(x, hist=False,color="red",label="Gaussian")
data_viz = sns.distplot(data,color="blue", bins=31,label="data", norm_hist=False)
# I also tried adding the weights inside the argument
#hist_kws={'weights': np.ones(len(data))/len(data)})
plt.legend(bbox_to_anchor=(1, 1), loc=1)
And I keep receiving this output:
Does anyone have an idea of what could be the problem here?
Thanks!
[EDIT]: The problem is that the y-axis is showing the KDE values and not those from the weighted histogram. If I set kde=False then I can display the frequency in the y-axis. However, I still want to keep the kde, so I am not considering that option.
Keeping the kde and the frequency/count on one y-axis in one plot will not work because they have different scales. So it might be better to create a plot with two y-axes, one showing the kde and the other the histogram.
From the documentation of norm_hist: "If True, the histogram height shows a density rather than a count. This is implied if a KDE or fitted density is plotted."
versusnja in https://github.com/mwaskom/seaborn/issues/479 has a workaround:
# Plot hist without kde.
# Create another Y axis.
# Plot kde without hist on the second Y axis.
# Remove Y ticks from the second axis.
first_ax = sns.distplot(data, kde=False)
second_ax = first_ax.twinx()
sns.distplot(data, ax=second_ax, kde=True, hist=False)
second_ax.set_yticks([])
If you need this just for visualization it should be good enough.
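The same two-axis idea can be sketched with plain Matplotlib and NumPy, which avoids depending on distplot. This is an illustration under assumptions, not the workaround from the linked issue: the data is a random stand-in for df['col1'], and the KDE is a simple hand-written Gaussian kernel estimate with a Scott's-rule bandwidth rather than seaborn's implementation.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
data = rng.normal(0.0, 0.04, 500)          # stand-in for df['col1']

fig, ax_hist = plt.subplots(figsize=(10, 4))

# Left axis: relative frequency per bin (the weights make the bars sum to 1)
weights = np.ones(len(data)) / len(data)
heights, edges, _ = ax_hist.hist(data, bins=31, weights=weights,
                                 color="blue", alpha=0.5, label="data")
ax_hist.set_ylabel("relative frequency")

# Right axis: hand-written Gaussian KDE with a Scott's-rule bandwidth
grid = np.linspace(data.min(), data.max(), 200)
bandwidth = data.std(ddof=1) * len(data) ** (-1 / 5)
z = (grid[:, None] - data[None, :]) / bandwidth
density = np.exp(-0.5 * z**2).sum(axis=1) / (
    len(data) * bandwidth * np.sqrt(2 * np.pi))

ax_kde = ax_hist.twinx()
ax_kde.plot(grid, density, color="red", label="KDE")
ax_kde.set_yticks([])                      # hide the density scale
# plt.show()  # uncomment to display the figure
```

The left y-axis then reads as the fraction of samples per bin, while the KDE curve is only there for its shape.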

Python/Matplotlib - Colorbar indicating a mean value

Is it possible to indicate a mean value on the colorbar?
I have the following plots, showing the surface "temperature" (Radiance) of a sahara section, and now it would be nice to see the mean value on the colorbar indicated by an arrow or something.
The difference between the plots is the band/channel/wavelength the measurement was taken in and there is a slight difference. Especially, when I'm going to compare the data from season to season.
When you add a colorbar to the plot using plt.colorbar(), matplotlib creates a new axis for the colorbar and returns the colorbar object. That axis is available as the .ax property of the colorbar object. In the Matplotlib version this answer was written for, the colorbar axis is scaled from 0 to 1 in both x and y; newer versions may use data units along the colorbar instead, so the example draws in axes coordinates (transform=cb.ax.transAxes), where 0 to 1 always spans the colorbar. We can use the min and max values of the colorbar to map where on the axis the mean should be drawn.
import numpy as np
from matplotlib import pyplot as plt
data = np.random.normal(1, 4, 450).reshape(-1, 15)
plt.imshow(data)
# capture the colorbar object, rescale mean to the axis
cb = plt.colorbar()
mean_loc = (data.mean() - cb.vmin) / (cb.vmax - cb.vmin)
# add a horizontal line to the colorbar axis, positioned in axes coordinates (0 to 1)
cb.ax.hlines(mean_loc, 0, 1, transform=cb.ax.transAxes)
plt.show()

matplotlib: align y-ticks in twinx [duplicate]

I created a matplotlib plot that has 2 y-axes. The y-axes have different scales, but I want the ticks and grid to be aligned. I am pulling the data from excel files, so there is no way to know the max limits beforehand. I have tried the following code.
# creates double-y axis
ax2 = ax1.twinx()
locs = ax1.yaxis.get_ticklocs()
ax2.set_yticks(locs)
The problem now is that the ticks on ax2 do not have labels anymore. Can anyone give me a good way to align ticks with different scales?
Aligning the tick locations of two different scales would mean to give up on the nice automatic tick locator and set the ticks to the same positions on the secondary axes as on the original one.
The idea is to establish a relation between the two axes scales using a function and set the ticks of the second axes at the positions of those of the first.
import matplotlib.pyplot as plt
import matplotlib.ticker
fig, ax = plt.subplots()
# creates double-y axis
ax2 = ax.twinx()
ax.plot(range(5), [1,2,3,4,5])
ax2.plot(range(6), [13,17,14,13,16,12])
ax.grid()
l = ax.get_ylim()
l2 = ax2.get_ylim()
f = lambda x : l2[0]+(x-l[0])/(l[1]-l[0])*(l2[1]-l2[0])
ticks = f(ax.get_yticks())
ax2.yaxis.set_major_locator(matplotlib.ticker.FixedLocator(ticks))
plt.show()
Note that this is a solution for the general case and it might result in totally unreadable labels depending on the use case. If you happen to have more a priori information on the axes range, better solutions may be possible.
Also see this question for a case where automatic tick locations of the first axes is sacrificed for an easier setting of the secondary axes tick locations.
To anyone who's wondering (and for my future reference), the lambda function f in ImportanceofBeingErnest's answer maps the input left tick to a corresponding right tick through:
RHS tick = Bottom RHS tick + (% of LHS range traversed * RHS range)
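As a quick check of that formula, the mapping can be written as a small standalone function and applied to round numbers. The function name map_ticks and the limits used here are made up for illustration; the arithmetic is the same as in the lambda f.

```python
import numpy as np


def map_ticks(ticks, src_lim, dst_lim):
    """Map tick positions from one axis range to another, linearly.

    (Hypothetical helper that mirrors the lambda f from the answer.)
    """
    ticks = np.asarray(ticks, dtype=float)
    # Fraction of the source (LHS) range traversed by each tick
    frac = (ticks - src_lim[0]) / (src_lim[1] - src_lim[0])
    # Bottom of the target (RHS) range plus that fraction of the target range
    return dst_lim[0] + frac * (dst_lim[1] - dst_lim[0])


# LHS axis spans 0..10, RHS axis spans 100..200
print(map_ticks([0, 2.5, 10], (0, 10), (100, 200)).tolist())
# [100.0, 125.0, 200.0]
```

A left tick a quarter of the way up the left axis lands a quarter of the way up the right axis, which is what keeps the grid lines aligned.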
Refer to this question on tick formatting to truncate decimal places:
from matplotlib.ticker import FormatStrFormatter
ax2.yaxis.set_major_formatter(FormatStrFormatter('%.2f')) # ax2 is the RHS y-axis
