Problem with scaling two different y-axis on matplotlib - python

I want to plot a dataset on one x-axis and two y-axes (eV and nm). The two y-axes are linked by the equation: nm = 1239.8/eV.
As you can see from my picture output, the values are not in the correct positions. For instance, at eV = 0.5 I need to have nm = 2479.6, at eV = 2.9, nm = 423, etc…
How can I fix this?
My data.txt:
number eV nm
1 2.573 481.9
2 2.925 423.9
3 3.174 390.7
4 3.242 382.4
5 3.387 366.1
The code I am using:
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as tck
# data handling
file = "data.txt"
df = pd.read_csv(file, delimiter=" ") # generate a DataFrame with data
no = df[df.columns[0]]
eV = df[df.columns[1]].round(2) # first y-axis
nm = df[df.columns[2]].round(1) # second y-axis
# generate a subplot 1x1
fig, ax1 = plt.subplots(1,1)
# first Axes object, main plot (lollipop plot)
ax1.stem(no, eV, markerfmt=' ', basefmt=" ", linefmt='blue', label="Gas")
ax1.set_ylim(0.5,4)
ax1.yaxis.set_minor_locator(tck.MultipleLocator(0.5))
ax1.set_xlabel('Aggregation', labelpad=12)
ax1.set_ylabel('Transition energy [eV]', labelpad=12)
# adding second y-axis
ax2 = ax1.twinx()
ax2.set_ylim(2680,350) # set the corresponding ymax and ymin,
# but the values are not correct anyway
ax2.set_yticklabels(nm)
ax2.set_ylabel('Wavelength [nm]', labelpad=12)
# save
plt.tight_layout(pad=1.5)
plt.show()
The resulting plot is the following. I would just like to obtain the second axis by dividing 1239.8 by the values of the first one, and I don't know what else to look for!

You can use ax.secondary_yaxis, as described in this example. See the below code for an implementation for your problem. I have only included the part of the code relevant for the second y axis.
# adding second y-axis
def eV_to_nm(eV):
    return 1239.8 / eV
def nm_to_eV(nm):
    return 1239.8 / nm
ax2 = ax1.secondary_yaxis('right', functions=(eV_to_nm, nm_to_eV))
ax2.set_yticks(nm)
ax2.set_ylabel('Wavelength [nm]', labelpad=12)
Note that I am also using set_yticks instead of set_yticklabels. Furthermore, if you remove set_yticks, matplotlib will automatically determine y tick positions assuming a linear distribution of y ticks. However, because nm is inversely proportional to eV, this will lead to a (most likely) undesirable distribution of y ticks. You can manually change these using a different set of values in set_yticks.
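For reference, here is a self-contained sketch of the secondary_yaxis approach. It inlines the values from data.txt instead of reading the file, and placing the wavelength ticks at every 0.5 eV is only one possible choice, not something the answer prescribes. The Agg backend line is an assumption so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

HC_EV_NM = 1239.8  # conversion constant from the question: nm = 1239.8 / eV


def ev_to_nm(ev):
    """Convert eV to nm. The same formula converts nm back to eV."""
    ev = np.asarray(ev, dtype=float)
    with np.errstate(divide="ignore"):  # tolerate a zero if matplotlib passes one
        return HC_EV_NM / ev


# values copied from data.txt in the question
number = np.array([1, 2, 3, 4, 5])
energy_ev = np.array([2.573, 2.925, 3.174, 3.242, 3.387])

fig, ax1 = plt.subplots()
ax1.stem(number, energy_ev, linefmt="b-", markerfmt=" ", basefmt=" ")
ax1.set_ylim(0.5, 4)
ax1.set_xlabel("Aggregation")
ax1.set_ylabel("Transition energy [eV]")

# the conversion is its own inverse, so the same function is passed twice
ax2 = ax1.secondary_yaxis("right", functions=(ev_to_nm, ev_to_nm))
ev_ticks = np.arange(0.5, 4.01, 0.5)   # tick positions expressed in eV
ax2.set_yticks(ev_to_nm(ev_ticks))     # ...converted to nm for the right axis
ax2.set_ylabel("Wavelength [nm]")

fig.canvas.draw()  # render once; use fig.savefig(...) or plt.show() as needed
```

Because the right-hand ticks are derived from round eV values, they line up with the major ticks of the left axis.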

I figured out how to solve this problem (source of the hint here).
So, for anyone who needs to have one dataset with one x-axis but two y-axes (one mathematically related to the other), a working solution is reported. Basically, the problem is to have the same ticks as the main y-axis, but change them proportionally, according to their mathematical relationship (that is, in this case, nm = 1239.8/eV). The following code has been tested and it is working.
This method of course works if you have two x-axes and 1 shared y-axis, etc.
Important note: you must define a y-range (or an x-range if you want the opposite result), otherwise you might get some scaling problems.
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as tck
from matplotlib.text import Text
# data
file = "data.txt"
df = pd.read_csv(file, delimiter=" ") # generate a DataFrame with data
no = df[df.columns[0]]
eV = df[df.columns[1]].round(2) # first y-axis
nm = df[df.columns[2]].round(1) # second y-axis
# generate a subplot 1x1
fig, ax1 = plt.subplots(1,1)
# first Axes object, main plot (lollipop plot)
ax1.stem(no, eV, markerfmt=' ', basefmt=" ", linefmt='blue', label="Gas")
ax1.set_ylim(0.5,4)
ax1.yaxis.set_minor_locator(tck.MultipleLocator(0.5))
ax1.set_xlabel('Aggregation', labelpad=12)
ax1.set_ylabel('Transition energy [eV]', labelpad=12)
# function that correlates the two y-axes
def eV_to_nm(eV):
    return 1239.8 / eV
# adding a second y-axis
ax2 = ax1.twinx() # share x axis
ax2.set_ylim(ax1.get_ylim()) # set the same range over y
ax2.set_yticks(ax1.get_yticks()) # put the same ticks as ax1
ax2.set_ylabel('Wavelength [nm]', labelpad=12)
# change the labels of the second axis by apply the mathematical
# function that relates the two axis to each tick of the first
# axis, and then convert it to text
# This way you have the same axis as y1 but with the same ticks scaled
ax2.set_yticklabels([Text(0, yval, f'{eV_to_nm(yval):.1f}')
                     for yval in ax1.get_yticks()])
# show the plot
plt.tight_layout(pad=1.5)
plt.show()
data.txt is the same as above:
number eV nm
1 2.573 481.9
2 2.925 423.9
3 3.174 390.7
4 3.242 382.4
5 3.387 366.1
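The relabelling idea above can also be written with a tick formatter, so the labels are recomputed whenever the ticks change. This is a hedged variant, not the code from the answer: the helper name ev_label_as_nm is made up for illustration, the data values are inlined from data.txt, and the Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def ev_label_as_nm(ev, pos=None):
    """Label a tick located at `ev` (in eV) with the matching wavelength in nm."""
    return f"{1239.8 / ev:.1f}" if ev > 0 else ""


fig, ax1 = plt.subplots()
ax1.plot([1, 2, 3, 4, 5], [2.573, 2.925, 3.174, 3.242, 3.387], "o")
ax1.set_ylim(0.5, 4)
ax1.set_ylabel("Transition energy [eV]")

ax2 = ax1.twinx()
ax2.set_yticks(ax1.get_yticks())   # same tick positions as the eV axis
ax2.set_ylim(ax1.get_ylim())       # same range, set last so the ticks cannot widen it
ax2.yaxis.set_major_formatter(FuncFormatter(ev_label_as_nm))
ax2.set_ylabel("Wavelength [nm]")

fig.canvas.draw()
```

The right axis is still linear in eV; only its labels are converted to nm.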

Related

Set log xticks in matplotlib for a linear plot

Consider
xdata=np.random.normal(5e5,2e5,int(1e4))
plt.hist(np.log10(xdata), bins=100)
plt.show()
plt.semilogy(xdata)
plt.show()
Is there any way to display the xticks of the first plot (plt.hist) like the yticks of the second plot? For good reasons I want to histogram np.log10(xdata), but I'd like the minor ticks to be displayed as usual on a log scale (even though the exponent is linear).
In other words, I want the x-axis of the first plot to look like the y-axis of the second plot, without changing the spacing between major ticks (e.g., adding log marks between 5.5 and 6.0, without altering these values).
Proper histogram plot with logarithmic x-axis:
Explanation:
- Cut off negative values
  - the randomly generated example data likely still contains some negative values
  - activate the commented code lines at the beginning to see the effect
  - the logarithm isn't defined for values <= 0
  - while the 2nd plot just deals with y-axis log scaling (negative values are simply out of range), the 1st plot doesn't work with negative values in the range of the bins
  - real-world working data probably won't be <= 0; otherwise keep that in mind
- Bins should be aligned to the log scale as well
  - otherwise the distribution of the bin widths looks off
  - swap the commented and uncommented plt.hist( statements in the 1st plot section to see the effect
- xdata (not np.log10(xdata)) is to be plotted in the histogram
  - the 'workaround' of plotting np.log10(xdata) was probably the root cause of the misunderstanding in the comments
Code:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42) # just to have repeatable results for the answer
xdata=np.random.normal(5e5,2e5,int(1e4))
# MIN_xdata, MAX_xdata = np.min(xdata), np.max(xdata)
# print(f"{MIN_xdata}, {MAX_xdata}") # note the negative values
# cut off potential negative values (log function isn't defined for <= 0 )
xdata = np.ma.masked_less_equal(xdata, 0)
MIN_xdata, MAX_xdata = np.min(xdata), np.max(xdata)
# print(f"{MIN_xdata}, {MAX_xdata}")
# align the bins to fit a log scale
bins = 100
bins_log_aligned = np.logspace(np.log10(MIN_xdata), np.log10(MAX_xdata), bins)
# 1st plot
plt.hist(xdata, bins = bins_log_aligned) # note: xdata (not np.log10(xdata) )
# plt.hist(xdata, bins = 100)
plt.xscale('log')
plt.show()
# 2nd plot
plt.semilogy(xdata)
plt.show()
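To see what "aligned to a log scale" means for the bin edges, the following small check compares the bin widths in linear and in log10 units. It is only an illustrative sketch: it uses NumPy's default_rng rather than the seeded legacy generator above, and it builds 101 edges so that there are exactly 100 bins.

```python
import numpy as np

rng = np.random.default_rng(42)
xdata = rng.normal(5e5, 2e5, 10_000)
xdata = xdata[xdata > 0]  # the logarithm is not defined for values <= 0

# 101 edges give 100 bins that are equally wide in log10 units
edges = np.logspace(np.log10(xdata.min()), np.log10(xdata.max()), 101)
log_widths = np.diff(np.log10(edges))   # constant
lin_widths = np.diff(edges)             # grow from left to right

counts, _ = np.histogram(xdata, bins=edges)
print(f"log10 width of every bin: {log_widths[0]:.4f}")
print(f"linear width of first / last bin: {lin_widths[0]:.3g} / {lin_widths[-1]:.3g}")
```

On a log-scaled x-axis these bins are drawn with equal width, which is why the histogram looks regular there.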
Just kept for now for clarification purpose. Will be deleted when the question is revised.
Disclaimer:
As Lucas M. Uriarte already mentioned that isn't an expected way of changing axis ticks.
x axis ticks and labels don't represent the plotted data
You should at least always provide that information along with such a plot.
The plot
From seeing the result I kinda understand where that special plot idea is coming from - still there should be a preferred way (e.g. conversion of the data in advance) to do such a plot instead of 'faking' the axis.
Explanation how that special axis transfer plot is done:
- original x-axis is hidden
- a twiny axis is added
  - note that its y-axis is hidden by default, so that doesn't need handling
- twiny x-axis is set to log and the 2nd plot y-axis limits are transferred
  - subplots are used to directly transfer the 2nd plot y-axis limits
  - use variables if you need to stick with your two plots
- twiny x-axis is moved from top (twiny default position) to bottom (where the original x-axis was)
Code:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42) # just to have repeatable results for the answer
xdata=np.random.normal(5e5,2e5,int(1e4))
# plt.subplots creates the figure itself; a separate plt.figure() call would only open an extra empty figure
fig, axs = plt.subplots(2, figsize=(7,10), facecolor=(1, 1, 1))
# 1st plot
axs[0].hist(np.log10(xdata), bins=100) # plot the data on the normal x axis
axs[0].axes.xaxis.set_visible(False) # hide the normal x axis
# 2nd plot
axs[1].semilogy(xdata)
# 1st plot - twin axis
axs0_y_twin = axs[0].twiny() # set a twiny axis, note twiny y axis is hidden by default
axs0_y_twin.set(xscale="log")
# transfer the limits from the 2nd plot y axis to the twin axis
axs0_y_twin.set_xlim(axs[1].get_ylim()[0],
axs[1].get_ylim()[1])
# move the twin x axis from top to bottom
axs0_y_twin.tick_params(axis="x", which="both", bottom=True, top=False,
labelbottom=True, labeltop=False)
# Disclaimer
disclaimer_text = "Disclaimer: x axis ticks and labels don't represent the plotted data"
axs[0].text(0.5,-0.09, disclaimer_text, size=12, ha="center", color="red",
transform=axs[0].transAxes)
plt.tight_layout()
plt.subplots_adjust(hspace=0.2)
plt.show()
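A way to get log-style minor ticks without replacing the axis is to keep the linear axis that shows log10(x) and place minor ticks at log10(2·10^k), ..., log10(9·10^k). This is a sketch of that idea and not part of the answer above; the helper log_minor_positions is a hypothetical name, and negative samples are dropped before taking the logarithm.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np


def log_minor_positions(lo_decade, hi_decade):
    """log10 of 2..9 times 10**k for lo_decade <= k < hi_decade."""
    return np.concatenate([np.log10(np.arange(2, 10) * 10.0 ** k)
                           for k in range(lo_decade, hi_decade)])


rng = np.random.default_rng(42)
xdata = rng.normal(5e5, 2e5, 10_000)
logx = np.log10(xdata[xdata > 0])

fig, ax = plt.subplots()
ax.hist(logx, bins=100)
xlim = ax.get_xlim()  # remember the limits, because setting ticks may widen them

lo, hi = int(np.floor(logx.min())), int(np.ceil(logx.max()))
ax.set_xticks(log_minor_positions(lo, hi), minor=True)  # major ticks stay untouched
ax.set_xlim(xlim)
ax.set_xlabel("log10(x)")

fig.canvas.draw()
```

Here the tick labels still describe the plotted data (log10 of x), so no disclaimer is needed.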

Adding two smaller subplots to the side of my main plot in matplotlib subplots

Currently my chart is showing only the main big chart on the left.
However, I now want to add the two smaller plots to the right-hand side of my main plot; with each individual set of data.
I am struggling with subplots to figure out how to do this. My photo below shows my desired output.
import os
from glob import glob
import matplotlib.pyplot as plt
import pandas as pd
filenamesK = glob("C:/Users/Ke*.csv")
filenamesZ = glob("C:/Users/Ze*.csv")
K_Z_Averages = {'K':[], 'Z':[]}
# We will create a function for plotting, instead of nesting lots of if statements within a long for-loop.
def plot_data(filename, fig_ax, color):
    df = pd.read_csv(filename, sep=',', skiprows=24) # Read in the csv (use the argument, not the global loop variable f).
    df.columns=['sample','Time','ms','Temp1'] # Set the column names
    df=df.astype(str) # Set the data type as a string.
    # Strip the leading '+ ' and any spaces, then convert to float
    df["Temp1"] = df["Temp1"].str.replace(r'\+ ', '', regex=True).str.replace(' ', '').astype(float)
    # Take the average of the data from the Temp1 column, starting from sample 60 until sample 150.
    avg_Temp1 = df.iloc[60-1:150+1]["Temp1"].mean()
    # Append this average to K_Z_Averages, which holds one list of averages for the K files and one for the Z files.
    # glob returns the whole path, so the first letter of the base name (K or Z) is used as the key.
    K_Z_Averages[os.path.basename(filename)[0]].append(avg_Temp1)
    fig_ax.plot(df[["Temp1"]], color=color)
fig, ax = plt.subplots(figsize=(20, 15))
for f in filenamesK:
    plot_data(f, ax, 'blue')
for f in filenamesZ:
    plot_data(f, ax, 'red')
plt.show()
@max's answer is fine, but with matplotlib>=3.3 you can also do:
import matplotlib.pyplot as plt
fig = plt.figure(constrained_layout=True)
axs = fig.subplot_mosaic([['Left', 'TopRight'],['Left', 'BottomRight']],
gridspec_kw={'width_ratios':[2, 1]})
axs['Left'].set_title('Plot on Left')
axs['TopRight'].set_title('Plot Top Right')
axs['BottomRight'].set_title('Plot Bottom Right')
Note how the name 'Left' appears twice to indicate that this subplot takes up two slots in the layout. Also note the use of width_ratios.
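As a rough illustration of how the mosaic could be filled for this question, the sketch below plots both data sets in the large panel and one data set in each small panel. The arrays temp_k and temp_z are made-up stand-ins for the Temp1 columns of the K and Z files, and the Agg backend is only an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# hypothetical stand-ins for the Temp1 columns read from the K and Z files
temp_k = 20 + rng.normal(0, 0.2, 200).cumsum()
temp_z = 22 + rng.normal(0, 0.2, 200).cumsum()

fig = plt.figure(figsize=(10, 6), constrained_layout=True)
axs = fig.subplot_mosaic([["Left", "TopRight"], ["Left", "BottomRight"]],
                         gridspec_kw={"width_ratios": [2, 1]})

# main plot: both data sets together
axs["Left"].plot(temp_k, color="blue", label="K")
axs["Left"].plot(temp_z, color="red", label="Z")
axs["Left"].legend()
axs["Left"].set_title("K and Z")

# side plots: one data set each
axs["TopRight"].plot(temp_k, color="blue")
axs["TopRight"].set_title("K only")
axs["BottomRight"].plot(temp_z, color="red")
axs["BottomRight"].set_title("Z only")

fig.canvas.draw()
```

In the question's code, the same effect could be had by passing the relevant entries of axs to plot_data instead of a single ax.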
This is a tricky question. Essentially, you can place a grid on a figure (add_gridspec()) and then open subplots (add_subplot()) in and over different grid elements.
import matplotlib.pyplot as plt
# open figure
fig = plt.figure()
# add grid specifications
gs = fig.add_gridspec(2, 3)
# open axes/subplots
axs = []
axs.append( fig.add_subplot(gs[:,0:2]) ) # large subplot (2 rows, 2 columns)
axs.append( fig.add_subplot(gs[0,2]) ) # small subplot (1st row, 3rd column)
axs.append( fig.add_subplot(gs[1,2]) ) # small subplot (2nd row, 3rd column)

Change colour scheme label to log scale without changing the axis in matplotlib

I am quite new to Python programming. I have a script that plots a heat map using matplotlib. The range of X-axis values is (-180 to +180) and of Y-axis values is (0 to 180). The 2D heatmap colours areas in Rainbow according to the number of points occurring in a specified area of the x-y graph (defined by the 'bin' (see below)).
In this case, x = values_Rot and y = values_Tilt (see below for code).
As of now, this script colours the 2D-heatmap in the linear scale. How do I change this script such that it colours the heatmap in the log scale? Please note that I only want to change the heatmap colouring scheme to log-scale, i.e. only the number of points in a specified area. The x and y-axis stay the same in linear scale (not in logscale).
A portion of the code is here.
rot_number = get_header_number(headers, AngleRot)
tilt_number = get_header_number(headers, AngleTilt)
psi_number = get_header_number(headers, AnglePsi)
values_Rot = []
values_Tilt = []
values_Psi = []
for line in data:
    try:
        values_Rot.append(float(line.split()[rot_number]))
        values_Tilt.append(float(line.split()[tilt_number]))
        values_Psi.append(float(line.split()[psi_number]))
    except:
        print ('This line didnt work, it may just be a blank space. The line is:' + line)
# Change the values here if you want to plot something else, such as psi.
# You can also change how the data is binned here.
plt.hist2d(values_Rot, values_Tilt, bins=25,)
plt.colorbar()
plt.show()
plt.savefig('name_of_output.png')
You can use a LogNorm for the colors, using plt.hist2d(...., norm=LogNorm()). Here is a comparison.
To have the ticks in base 2, the developers suggest adding the base to the LogLocator and the LogFormatter. As in this case the LogFormatter seems to write the numbers with one decimal (.0), a StrMethodFormatter can be used to show the number without decimals. Depending on the range of numbers, sometimes the minor ticks (shorter marker lines) also get a string, which can be suppressed assigning a NullFormatter for the minor colorbar ticks.
Note that base 2 and base 10 define exactly the same color transformation. The position and the labels of the ticks are different. The example below creates two colorbars to demonstrate the different look.
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter, StrMethodFormatter, LogLocator
from matplotlib.colors import LogNorm
import numpy as np
from copy import copy
# create some toy data for a standalone example
values_Rot = np.random.randn(100, 10).cumsum(axis=1).ravel()
values_Tilt = np.random.randn(100, 10).cumsum(axis=1).ravel()
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(15, 4))
cmap = copy(plt.get_cmap('hot'))
cmap.set_bad(cmap(0))
_, _, _, img1 = ax1.hist2d(values_Rot, values_Tilt, bins=40, cmap='hot')
ax1.set_title('Linear norm for the colors')
fig.colorbar(img1, ax=ax1)
_, _, _, img2 = ax2.hist2d(values_Rot, values_Tilt, bins=40, cmap=cmap, norm=LogNorm())
ax2.set_title('Logarithmic norm for the colors')
fig.colorbar(img2, ax=ax2) # default log 10 colorbar
cbar2 = fig.colorbar(img2, ax=ax2) # log 2 colorbar
cbar2.ax.yaxis.set_major_locator(LogLocator(base=2))
cbar2.ax.yaxis.set_major_formatter(StrMethodFormatter('{x:.0f}'))
cbar2.ax.yaxis.set_minor_formatter(NullFormatter())
plt.show()
Note that log(0) is minus infinity. Therefore, the zero values in the left plot (darkest color) are left empty (white background) on the plot with the logarithmic color values. If you just want to use the lowest color for these zeros, you need to set a 'bad' color. In order not to change a standard colormap, the latest matplotlib versions want you to first make a copy of the colormap.
PS: When calling plt.savefig() it is important to call it before plt.show() because plt.show() clears the plot.
Also, try to avoid the 'jet' colormap, as it has a bright yellow region which is not at the extreme. It may look nice, but can be very misleading. This blog article contains a thorough explanation. The matplotlib documentation contains an overview of available colormaps.
Note that to compare two plots, plt.subplots() needs to be used, and instead of plt.hist2d, ax.hist2d is needed (see this post). Also, with two colorbars, the elements on which the colorbars are based need to be given as parameter. A minimal change to your code would look like:
from matplotlib.ticker import NullFormatter, StrMethodFormatter, LogLocator
from matplotlib.colors import LogNorm
from matplotlib import pyplot as plt
from copy import copy
# ...
# reading the data as before
cmap = copy(plt.get_cmap('magma'))
cmap.set_bad(cmap(0))
plt.hist2d(values_Rot, values_Tilt, bins=25, cmap=cmap, norm=LogNorm())
cbar = plt.colorbar()
cbar.ax.yaxis.set_major_locator(LogLocator(base=2))
cbar.ax.yaxis.set_major_formatter(StrMethodFormatter('{x:.0f}'))
cbar.ax.yaxis.set_minor_formatter(NullFormatter())
plt.savefig('name_of_output.png') # needs to be called prior to plt.show()
plt.show()
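The claim that base 2 and base 10 give the same colors can be checked numerically. The sketch below assumes that a logarithmic norm maps a value to (log v - log vmin) / (log vmax - log vmin); the helper log_fraction is a hypothetical name used only to show that the base cancels out, and the example counts are made up.

```python
import numpy as np
from matplotlib.colors import LogNorm

norm = LogNorm(vmin=1, vmax=100)
counts = np.array([0, 1, 10, 100])
fractions = norm(counts)  # position of each count within the colormap, 0..1
print(fractions)          # the entry for the zero count comes out masked ('bad')


def log_fraction(value, vmin, vmax, base):
    """Assumed mapping of a log norm, written out for an arbitrary base."""
    def log(v):
        return np.log(v) / np.log(base)
    return (log(value) - log(vmin)) / (log(vmax) - log(vmin))


base2 = log_fraction(10, 1, 100, base=2)
base10 = log_fraction(10, 1, 100, base=10)
print(base2, base10)
```

Only the tick locator and formatter of the colorbar depend on the base; the color assigned to each bin does not.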

How do I shift categorical scatter markers to left and right above xticks (multiple data sets per category)?

I have a simple pandas dataframe that I want to plot with matplotlib:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel('SAT_data.xlsx', index_col = 'State')
plt.figure()
plt.scatter(df['Year'], df['Reading'], c = 'blue', s = 25)
plt.scatter(df['Year'], df['Math'], c = 'orange', s = 25)
plt.scatter(df['Year'], df['Writing'], c = 'red', s = 25)
Here is what my plot looks like:
I'd like to shift the blue data points a bit to the left, and the red ones a bit to the right, so each year on the x-axis has three mini-columns of scatter data above it instead of all three datasets overlapping. I tried and failed to use the 'verts' argument properly. Is there a better way to do this?
Using an offset transform allows you to shift the scatter points by some amount in units of points instead of data units. The advantage is that they then always sit tight against each other, independent of the figure size, zoom level etc.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(0)
import matplotlib.transforms as transforms
year = np.random.choice(np.arange(2006,2017), size=(300) )
values = np.random.rand(300, 3)
plt.figure()
offset = lambda p: transforms.ScaledTranslation(p/72.,0, plt.gcf().dpi_scale_trans)
trans = plt.gca().transData
sc1 = plt.scatter(year, values[:,0], c = 'blue', s = 25, transform=trans+offset(-5))
plt.scatter(year, values[:,1], c = 'orange', s = 25)
plt.scatter(year, values[:,2], c = 'red', s = 25, transform=trans+offset(5))
plt.show()
(Images not shown: the result in a broad figure, in a normal figure, and zoomed in.)
Some explanation:
The problem is that we want to add an offset in points to some data in data coordinates. While data coordinates are automatically transformed to display coordinates using the transData (which we normally don't even see on the surface), adding some offset requires us to change the transform.
We do this by adding an offset. While we could just add an offset in pixels (display coordinates), it is more convenient to add the offset in points and thereby using the same unit as the size of the scatter points is given in (their size is points squared actually).
So we want to know how many pixels correspond to p points. This is found by dividing p by the ppi (points per inch) to obtain inches, and then multiplying by the dpi (dots per inch) to obtain display pixels. This calculation is done in the ScaledTranslation.
While the dots per inch are in principle variable (and taken care of by the dpi_scale_trans transform), the points per inch are fixed. Matplotlib uses 72 ppi, which is kind of a typesetting standard.
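The arithmetic described above can be checked directly on a transform. In this sketch the figure dpi values (100, then 200) and the offset of 5 points are arbitrary choices for illustration, and the Agg backend is assumed so that no display is needed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms

fig = plt.figure(dpi=100)
p = 5  # desired shift in points

# p points -> p / 72 inches -> (p / 72) * dpi pixels
offset = transforms.ScaledTranslation(p / 72.0, 0, fig.dpi_scale_trans)

shift_at_100 = offset.transform((0, 0))   # pixel shift applied to any point
expected_at_100 = p / 72.0 * fig.dpi

fig.set_dpi(200)                          # the same transform follows the new dpi
shift_at_200 = offset.transform((0, 0))

print(shift_at_100, shift_at_200)
```

Since the transform is tied to dpi_scale_trans, the shift stays at 5 points when the figure is saved at a different resolution.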
A quick and dirty way would be to create a small offset dx and subtract it from x values of blue points and add to x values of red points.
dx = 0.1
plt.scatter(df['Year'] - dx, df['Reading'], c = 'blue', s = 25)
plt.scatter(df['Year'], df['Math'], c = 'orange', s = 25)
plt.scatter(df['Year'] + dx, df['Writing'], c = 'red', s = 25)
One more option could be to use the stripplot function from the seaborn library. It would be necessary to melt the original dataframe into long form so that each row contains a year, a test and a score. Then make a stripplot specifying year as x, score as y and test as hue. The dodge keyword argument (named split in seaborn versions before 0.8) is what controls plotting categories as separate stripes for each x. There's also the jitter argument that will add some noise to x values so that they take up some small area instead of being on a single vertical line.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# make up example data
np.random.seed(2017)
df = pd.DataFrame(columns = ['Reading','Math','Writing'],
                  data = np.random.normal(540,30,size=(1000,3)))
df['Year'] = np.random.choice(np.arange(2006,2016),size=1000)
# melt the data into long form
df1 = pd.melt(df, var_name='Test', value_name='Score',id_vars=['Year'])
# make a stripplot
fig, ax = plt.subplots(figsize=(10,7))
sns.stripplot(data = df1, x='Year', y = 'Score', hue = 'Test',
              jitter = True, dodge = True, alpha = 0.7,
              palette = ['blue','orange','red'])
Output:
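To make the melt step concrete, here is a tiny made-up table in wide form and its long form. The scores are invented for illustration only.

```python
import pandas as pd

# wide form: one column per test (made-up scores)
wide = pd.DataFrame({
    "Year": [2006, 2007],
    "Reading": [540, 530],
    "Math": [550, 545],
    "Writing": [520, 525],
})

# long form: one row per (year, test) pair
long_form = pd.melt(wide, id_vars=["Year"], var_name="Test", value_name="Score")
print(long_form)
```

Each of the two wide rows becomes three long rows, one per test, which is the layout the hue argument expects.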
Here is how the given code can be adapted to work with multiple subplots, and also to a situation without "middle column".
To adapt the given code, ax[n,p].transData is needed instead of plt.gca().transData. plt.gca() refers to the last created subplot, while now you'll need the transform of each individual subplot.
Another problem is that when plotting only via a transform, matplotlib doesn't automatically set the lower and upper limits of the subplot. The given example plots the points "in the middle" without a specific transform, and the plot gets "zoomed out" around these points (orange in the example).
If you don't have points at the center, the limits need to be set in another way. The way I came up with is to plot some dummy points in the middle (which sets the zooming limits) and remove them again.
Also note that the size of the scatter dots is given as the square of their diameter (measured in "unit points"). To have touching dots, you'd need to use the square root for their offset.
import matplotlib.pyplot as plt
from matplotlib import transforms
import numpy as np
# Set up data for reproducible example
year = np.random.choice(np.arange(2006, 2017), size=(100))
data = np.random.rand(4, 100, 3)
data2 = np.random.rand(4, 100, 3)
# Create plot and set up subplot ax loop
fig, axs = plt.subplots(2, 2, figsize=(18, 14))
# Set up offset with transform
offset = lambda p: transforms.ScaledTranslation(p / 72., 0, plt.gcf().dpi_scale_trans)
# Plot data in a loop
for ax, q, r in zip(axs.flat, data, data2):
    temp_points = ax.plot(year, q, ls=' ')
    for pnt in temp_points:
        pnt.remove()
    ax.plot(year, q, marker='.', ls=' ', ms=10, c='b', transform=ax.transData + offset(-np.sqrt(10)))
    ax.plot(year, r, marker='.', ls=' ', ms=10, c='g', transform=ax.transData + offset(+np.sqrt(10)))
plt.show()

Scale colormap for contour and contourf

I'm trying to plot the contour map of a given function f(x,y), but since the function's output grows really fast, I'm losing a lot of information for lower values of x and y. I found a suggestion on the forums to handle this with vmax=vmax, and it actually worked, but only for a specific range of x and y and a specific number of colormap levels.
Say I have this plot:
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
u = np.linspace(-2,2,1000)
x,y = np.meshgrid(u,u)
z = (1-x)**2+100*(y-x**2)**2
cont = plt.contour(x,y,z,500,colors='black',linewidths=.3)
cont = plt.contourf(x,y,z,500,cmap="jet",vmax=100)
plt.colorbar(cont)
plt.show()
I want to uncover what's beyond the axis limits while keeping the same scale, but if I change the x and y limits to -3 and 3 I get:
See how I lost most of my levels since my max value for the function at these limits are much higher. A work around to this problem is to increase the levels to 1000, but that takes a lot of computational time.
Is there a way to plot only the contour levels that I need? That is, between 0 and 100.
An example of a desired output would be:
With the white space being the continuation of the plot without resizing the levels.
The code I'm using is the one given after the first image.
There are a few possible ideas here. The one I very much prefer is a logarithmic representation of the data. An example would be
from matplotlib import ticker
fig = plt.figure(1)
cont1 = plt.contourf(x,y,z,cmap="jet",locator=ticker.LogLocator(numticks=10))
plt.colorbar(cont1)
plt.show()
fig = plt.figure(2)
cont2 = plt.contourf(x,y,np.log10(z),100,cmap="jet")
plt.colorbar(cont2)
plt.show()
The first example uses matplotlib's LogLocator. The second one directly computes the logarithm of the data and plots that normally.
The third example simply caps all data above 100.
fig = plt.figure(3)
zcapped = z.copy()
zcapped[zcapped>100]=100
cont3 = plt.contourf(x,y,zcapped,100,cmap="jet")
cbar = plt.colorbar(cont3)
plt.show()
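Another possibility, not taken from the answer above, is to pass the wanted levels explicitly. With levels running from 0 to 100, contourf draws nothing where z exceeds the last level, so those regions stay blank. The sketch below uses a coarser grid than the question to keep it fast, swaps in the viridis colormap, and assumes the Agg backend for a headless run.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np

u = np.linspace(-3, 3, 300)
x, y = np.meshgrid(u, u)
z = (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

levels = np.linspace(0, 100, 51)  # 50 filled bands between 0 and 100 only

fig, ax = plt.subplots()
cont = ax.contourf(x, y, z, levels=levels, cmap="viridis")
fig.colorbar(cont, ax=ax)

fig.canvas.draw()
```

Because the levels are fixed, widening the x and y range does not change the band spacing or the color scale.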
