I have 2 lists, each has 128 elements
x = [1,2,3,...,128]
y = [y1,y2,...,y128]
How should I use matplotlib to plot (x,y) with x axis appearing as shown in this screenshot?
To replicate the graph, I have (1) created 2 additional lists from the original lists, and (2) used set_xticklabels:
f, ax1 = plt.subplots(1,1,figsize=(16,7))
x1 = [1, 2, 4, 8, 16, 32, 64, 128]
y1 = [y[0],y[1],y[3],y[7],y[15],y[31],y[63],y[127]]
line1 = ax1.plot(x1,y1,label="Performance",color='b',linestyle="-")
ax1.set_xticklabels([0,1,2,4,8,16,32,64,128])
ax1.set_xlabel('Time Period',fontsize=15)
ax1.set_ylabel("Value",color='b',fontsize=15)
The problem with this approach is that only 8 pairs of values are plotted, and the other 120 pairs are omitted.
If my comments aren't clear enough, please, ask. :)
from matplotlib import pyplot as plt
# Instantiating my lists...
f = lambda x:x**2
x = [nb for nb in range(1, 129)]
y = [f(nb) for nb in x]
# New values you want to plot, with linear spacing.
indexes_to_keep = [1, 2, 4, 8, 16, 32, 64, 128]
y_to_use = [y[nb - 1] for nb in indexes_to_keep]
# First plot that shows the 128 points as a whole.
fig = plt.figure(figsize=(10, 5.4))
ax1 = fig.add_subplot(121)
ax1.plot(x, y)
ax1.set_title('Former values')
# Second plot that shows only the indexes you wish to keep.
ax2 = fig.add_subplot(122)
# my_ticks = [0, 1, 2, 3, 4, 5, 6, 7]
# meaning: my_ticks will be linearly spaced values.
my_ticks = [i for i in range(len(indexes_to_keep))]
# We set the ticks we want to show, meaning : all our list
# instead of some linear spacing matplotlib will show by default
ax2.set_xticks(my_ticks)
# Then, we manually change the name of the X ticks.
ax2.set_xticklabels(indexes_to_keep)
# We will then, plot the LINEAR x axis,
# but with respect to the y-axis values pre-processed.
ax2.plot(my_ticks, y_to_use)
ax2.set_title('New selected values with linear spacing')
plt.show()
What you are looking for is a logarithmic scale with base 2. matplotlib provides logarithmic scales and you can define any base you want:
from matplotlib import pyplot as plt
from matplotlib.ticker import ScalarFormatter
#sample data
x = list(range(1, 130))
y = list(range(3, 260, 2))
f, ax1 = plt.subplots(1,1,figsize=(16,7))
x1 = [ 1, 2, 4, 8, 16, 32, 64, 128]
y1 = [y[0],y[1],y[3],y[7],y[15],y[31],y[63],y[127]]
#just the points, where the ticks are
ax1.plot(x1, y1,"bo-", label = "Performance")
#all other points to contrast this
ax1.plot(x, [270 - i for i in y], "rx-", label = "anti-Performance")
#transform x axis into logarithmic scale with base 2
#(the keyword is `base` since Matplotlib 3.3; older versions used `basex`)
plt.xscale("log", base = 2)
#modify x axis ticks from exponential representation to float
ax1.get_xaxis().set_major_formatter(ScalarFormatter())
ax1.set_xlabel('Time Period',fontsize=15)
ax1.set_ylabel("Value",color='b',fontsize=15)
plt.legend()
plt.show()
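If all 128 points should stay on the plot, the same base-2 scale can be applied to the full lists. A minimal sketch, assuming Matplotlib 3.3 or newer (where the keyword is `base`); the `y` values are made-up sample data, and the `Agg` backend is chosen only so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen use
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter

# Made-up sample data standing in for the two 128-element lists
x = list(range(1, 129))
y = [v ** 0.5 for v in x]

fig, ax = plt.subplots(figsize=(16, 7))
(line,) = ax.plot(x, y, "b-", label="Performance")  # all 128 pairs are drawn

ax.set_xscale("log", base=2)                      # powers of two become evenly spaced
ax.set_xticks([1, 2, 4, 8, 16, 32, 64, 128])      # one major tick per power of two
ax.xaxis.set_major_formatter(ScalarFormatter())   # plain numbers instead of 2^k labels
ax.set_xlabel("Time Period", fontsize=15)
ax.set_ylabel("Value", color="b", fontsize=15)
ax.legend()
fig.canvas.draw()
```

Setting the scale first matters here, since changing the scale resets the tick locator and formatter.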
I want to create a heatmap in seaborn, and have a nice way to see the labels.
With ax.figure.tight_layout(), I am getting
which is obviously bad.
Without ax.figure.tight_layout(), the labels get cropped.
The code is
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sn
n_classes = 10
confusion = np.random.randint(low=0, high=100, size=(n_classes, n_classes))
label_length = 20
label_ind_by_names = {
"A"*label_length: 0,
"B"*label_length: 1,
"C"*label_length: 2,
"D"*label_length: 3,
"E"*label_length: 4,
"F"*label_length: 5,
"G"*label_length: 6,
"H"*label_length: 7,
"I"*label_length: 8,
"J"*label_length: 9,
}
# confusion matrix
df_cm = pd.DataFrame(
confusion,
index=label_ind_by_names.keys(),
columns=label_ind_by_names.keys()
)
plt.figure()
sn.set(font_scale=1.2)
ax = sn.heatmap(df_cm, annot=True, annot_kws={"size": 16}, fmt='d')
# ax.figure.tight_layout()
plt.show()
I would like to create an extra legend based on label_ind_by_names, then post an abbreviation on the heatmap itself, and be able to look up the abbreviation in the legend.
How can this be done in seaborn?
You can define your own legend handler, e.g. for integers:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sn
n_classes = 10
confusion = np.random.randint(low=0, high=100, size=(n_classes, n_classes))
label_length = 20
label_ind_by_names = {
"A"*label_length: 0,
"B"*label_length: 1,
"C"*label_length: 2,
"D"*label_length: 3,
"E"*label_length: 4,
"F"*label_length: 5,
"G"*label_length: 6,
"H"*label_length: 7,
"I"*label_length: 8,
"J"*label_length: 9,
}
# confusion matrix
df_cm = pd.DataFrame(
confusion,
index=label_ind_by_names.values(),
columns=label_ind_by_names.values()
)
fig, ax = plt.subplots(figsize=(10, 5))
fig.subplots_adjust(left=0.05, right=.65)
sn.set(font_scale=1.2)
sn.heatmap(df_cm, annot=True, annot_kws={"size": 16}, fmt='d', ax=ax)
class IntHandler:
    def legend_artist(self, legend, orig_handle, fontsize, handlebox):
        x0, y0 = handlebox.xdescent, handlebox.ydescent
        text = plt.matplotlib.text.Text(x0, y0, str(orig_handle))
        handlebox.add_artist(text)
        return text
ax.legend(label_ind_by_names.values(),
          label_ind_by_names.keys(),
          handler_map={int: IntHandler()},
          loc='upper left',
          bbox_to_anchor=(1.2, 1))
plt.show()
Explanation of the hard-coded figures: the first two are the left and right extreme positions of the Axes in the figure (0.05 = 5% of the figure width, etc.). 1.2 and 1 give the location of the upper left corner of the legend box relative to the Axes ((1, 1) is the upper right corner of the Axes; we add 0.2 to 1 to account for the space used by the colorbar). Ideally one would use a constrained layout instead of fiddling with the parameters, but it doesn't (yet) support figure legends, and when using an Axes legend it places it between the Axes and the colorbar.
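An alternative that avoids a custom legend handler is to write the lookup table as a plain text block next to the Axes. A rough sketch, assuming plain Matplotlib (`imshow` stands in for the seaborn heatmap) and a shortened, hypothetical label dictionary; the 1.25 offset plays the same role as the 1.2 in the legend call and is a guess that will need tuning:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical long class names mapped to the short index shown on the heatmap
label_ind_by_names = {"A" * 20: 0, "B" * 20: 1, "C" * 20: 2}
n = len(label_ind_by_names)
rng = np.random.default_rng(0)
confusion = rng.integers(low=0, high=100, size=(n, n))

fig, ax = plt.subplots(figsize=(10, 5))
fig.subplots_adjust(left=0.05, right=0.65)  # leave room on the right for the table
im = ax.imshow(confusion)
fig.colorbar(im, ax=ax)
ax.set_xticks(range(n))
ax.set_yticks(range(n))

# One "index: long name" line per class, placed in Axes coordinates
lookup = "\n".join(f"{idx}: {name}" for name, idx in label_ind_by_names.items())
text = ax.text(1.25, 1.0, lookup, transform=ax.transAxes,
               va="top", family="monospace")
fig.canvas.draw()
```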
EDIT: My question was closed because someone thought another question was responding to it (but it doesn't: Matplotlib different size subplots). To clarify what I want:
I would like to replicate something like what is done on this photo: having a 3rd dataset plotted on top of 2 subplots, with its y-axis displayed on the right.
I have 3 datasets spanning the same time interval (speed, position, precipitation). I would like to plot the speed and position in 2 horizontal subplots, and the precipitation spanning the 2 subplots.
For example in the code below, instead of having the twinx() only on the first subplot, I would like to have it overlap the two subplots (ie. on the right side have a y-axis with 0 at the bottom right of the 2nd subplot, and 20 at the top right of the 1st subplot).
How could I achieve that?
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(2,1,figsize=(20,15), dpi = 600)
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
ax[0].plot(x,y, label = 'speed')
plt.legend()
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
ax[1].plot(x,y, label = 'position')
plt.legend()
#plot 3:
x = np.array([0, 1, 2, 3])
y = np.array([10, 0, 4, 20])
ax2=ax[0].twinx()
ax2.plot(x,y, label = 'precipitation')
plt.legend(loc='upper right')
plt.show()
Best way I found is not very elegant but it works:
# Prepare 2 subplots
fig, ax = plt.subplots(2,1,figsize=(20,15), dpi = 600)
#plot 1:
# Dummy values for plotting
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
ax[0].plot(x,y, label = 'speed')
# Prints the legend
plt.legend()
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
ax[1].plot(x,y, label = 'position')
plt.legend()
#plot 3:
x = np.array([0, 1, 2, 3])
y = np.array([10, 0, 4, 20])
# Manually add a 3rd Axes that lies on top of the 2 others
ax2 = fig.add_subplot(111, label="new subplot", facecolor="none")
# Put the y-axis label on the right so it does not overlap with the ones on the left
ax2.yaxis.set_label_position("right")
# Hide the left and bottom ticks and labels of this 3rd plot; show the y ticks on the right
ax2.tick_params(left=False, right=True, labelleft=False, labelright=True,
                bottom=False, labelbottom=False)
# Share the x axis of the 1st and 3rd plot. Because fig.add_subplot(111)
# spans the whole figure, the y-axis of the 3rd plot covers both subplots
# instead of just 1. (Older code used ax[0].get_shared_x_axes().join(ax[0], ax2),
# which recent Matplotlib versions no longer allow.)
ax2.sharex(ax[0])
ax2.plot(x,y, label = 'precipitation')
# Prints the legend in the upper right corner
plt.legend(loc='upper right')
plt.show()
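The reason `fig.add_subplot(111)` works as an overlay is that a 1x1 grid uses the same figure margins as the 2x1 grid, so its Axes box runs from the bottom of the lower subplot to the top of the upper one. A quick check of that claim, assuming default layout settings (the name `overlay` is just illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the check runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(2, 1)
overlay = fig.add_subplot(111, label="overlay", facecolor="none")

top = ax[0].get_position()      # bounding box of the upper subplot (figure fractions)
bottom = ax[1].get_position()   # bounding box of the lower subplot
big = overlay.get_position()    # bounding box of the overlay Axes

# The overlay starts at the bottom of the lower subplot and ends at the top of the upper one
spans_both = abs(big.y1 - top.y1) < 1e-9 and abs(big.y0 - bottom.y0) < 1e-9
print(spans_both)
```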
This is a follow-up to my previous couple of questions. Here's the code I'm playing with:
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats
import numpy as np
dictOne = {'Name':['First', 'Second', 'Third', 'Fourth', 'Fifth', 'Sixth', 'Seventh', 'Eighth', 'Ninth'],
"A":[1, 2, -3, 4, 5, np.nan, 7, np.nan, 9],
"B":[4, 5, 6, 5, 3, np.nan, 2, 9, 5],
"C":[7, np.nan, 10, 5, 8, 6, 8, 2, 4]}
df2 = pd.DataFrame(dictOne)
column = 'B'
df2[df2[column] > -999].hist(column, alpha = 0.5)
param = stats.norm.fit(df2[column].dropna()) # Fit a normal distribution to the data
print(param)
pdf_fitted = stats.norm.pdf(df2[column], *param)
plt.plot(pdf_fitted, color = 'r')
I'm trying to make a histogram of the numbers in a single column in the dataframe -- I can do this -- but with an overlaid normal curve...something like the last graph on here. I'm trying to get it working on this toy example so that I can apply it to my much larger dataset for real. The code I've pasted above gives me this graph:
Why doesn't pdf_fitted match the data in this graph? How can I overlay the proper PDF?
You should plot the histogram with density=True if you hope to compare it to a true PDF. Otherwise your normalization (amplitude) will be off.
Also, you need to specify the x-values (as an ordered array) when you plot the pdf:
fig, ax = plt.subplots()
df2[df2[column] > -999].hist(column, alpha = 0.5, density=True, ax=ax)
param = stats.norm.fit(df2[column].dropna())
x = np.linspace(*df2[column].agg([min, max]), 100) # x-values
plt.plot(x, stats.norm.pdf(x, *param), color = 'r')
plt.show()
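Why `density=True` matters can be checked numerically: with it, the bar heights times the bin widths sum to 1, which is the scale a PDF lives on; without it, the bars sum to the number of samples. A small sketch using NumPy only, with the non-missing values of column B from the question:

```python
import numpy as np

# Column "B" from the question, without the NaN entry
values = np.array([4, 5, 6, 5, 3, 2, 9, 5], dtype=float)

counts, edges = np.histogram(values, bins=10)
density, _ = np.histogram(values, bins=10, density=True)
widths = np.diff(edges)

total_count = int(counts.sum())           # raw counts add up to the sample size
area = float((density * widths).sum())    # density bars enclose unit area
print(total_count, area)
```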
As an aside, using a histogram to compare a continuous variable with a distribution isn't always the best approach. (Your sample data are discrete, but the link uses a continuous variable.) The choice of bins can alias the shape of your histogram, which may lead to incorrect inference. Instead, the ECDF is a much better (choice-free) illustration of the distribution of a continuous variable:
def ECDF(data):
    n = sum(data.notnull())
    x = np.sort(data.dropna())
    y = np.arange(1, n+1) / n
    return x,y
fig, ax = plt.subplots()
plt.plot(*ECDF(df2.loc[df2[column] > -999, 'B']), marker='o')
param = stats.norm.fit(df2[column].dropna())
x = np.linspace(*df2[column].agg([min, max]), 100) # x-values
plt.plot(x, stats.norm.cdf(x, *param), color = 'r')
plt.show()
I'm having a problem adding a colorbar to a plot of many lines corresponding to a power-law.
To create the color-bar for a non-image plot, I added a dummy plot (from answers here: Matplotlib - add colorbar to a sequence of line plots).
The colorbar ticks do not correspond to the colors of the plot.
I have tried changing the norm of the colorbar, and I can fine-tune it to be semi-accurate for a particular case, but I can't do that in general.
def plot_loglog_gauss():
    from matplotlib import cm as color_map
    import matplotlib as mpl
    """Creating the data"""
    time_vector = [0, 1, 2, 4, 8, 16, 32, 64, 128, 256]
    amplitudes = [t ** 2 * np.exp(-t * np.power(np.linspace(-0.5, 0.5, 100), 2)) for t in time_vector]
    """Getting the non-zero minimum of the data"""
    data = np.concatenate(amplitudes).ravel()
    data_min = np.min(data[np.nonzero(data)])
    """Creating K-space data"""
    k_vector = np.linspace(0,1,100)
    """Plotting"""
    number_of_plots = len(time_vector)
    color_map_name = 'jet'
    my_map = color_map.get_cmap(color_map_name)
    colors = my_map(np.linspace(0, 1, number_of_plots, endpoint=True))
    # plt.figure()
    # dummy_plot = plt.contourf([[0, 0], [0, 0]], time_vector, cmap=my_map)
    # plt.clf()
    norm = mpl.colors.Normalize(vmin=time_vector[0], vmax=time_vector[-1])
    cmap = mpl.cm.ScalarMappable(norm=norm, cmap=color_map_name)
    cmap.set_array([])
    for i in range(number_of_plots):
        plt.plot(k_vector, amplitudes[i], color=colors[i], label=time_vector[i])
    c = np.arange(1, number_of_plots + 1)
    plt.xlabel('Frequency')
    plt.ylabel('Amplitude')
    plt.yscale('symlog', linthreshy=data_min)
    plt.xscale('log')
    plt.legend(loc=3)
    ticks = time_vector
    plt.colorbar(cmap, ticks=ticks, shrink=1.0, fraction=0.1, pad=0)
    plt.show()
By comparing with the legend you see the ticks values don't match the actual colors. For example, 128 is shown in green in the colormap while red in the legend.
The actual result should be a colorbar with linear colors and ticks at regular intervals (corresponding to irregular time intervals...), and of course the correct color for each tick value.
(Eventually the plot contains many plots (len(time_vector) ~ 100), I lowered the number of plots to illustrate and to be able to show the legend.)
To clarify, this is what I want the result to look like.
The most important principle is to keep the colors from the line plots and the ScalarMappable in sync. This means, the color of the line should not be taken from an independent list of colors, but rather from the same colormap and using the same normalization as the colorbar to be shown.
One major problem is then deciding what to do with 0, which cannot be part of a logarithmic normalization. The following is a workaround assuming a linear scale between 0 and 2, and a log scale above, using a SymLogNorm.
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
"""Creating the data"""
time_vector = [0, 1, 2, 4, 8, 16, 32, 64, 128, 256]
amplitudes = [t ** 2 * np.exp(-t * np.power(np.linspace(-0.5, 0.5, 100), 2)) for t in time_vector]
"""Getting the non-zero minimum of the data"""
data = np.concatenate(amplitudes).ravel()
data_min = np.min(data[np.nonzero(data)])
"""Creating K-space data"""
k_vector = np.linspace(0,1,100)
"""Plotting"""
cmap = plt.get_cmap("jet")
norm = mpl.colors.SymLogNorm(2, vmin=time_vector[0], vmax=time_vector[-1])
sm = mpl.cm.ScalarMappable(norm=norm, cmap=cmap)
sm.set_array([])
for i in range(len(time_vector)):
    # take each line color from the same colormap and norm as the colorbar
    plt.plot(k_vector, amplitudes[i], color=cmap(norm(time_vector[i])), label=time_vector[i])
#c = np.arange(1, number_of_plots + 1)
plt.xlabel('Frequency')
plt.ylabel('Amplitude')
# `linthresh` replaces the older `linthreshy` keyword (Matplotlib 3.3+)
plt.yscale('symlog', linthresh=data_min)
plt.xscale('log')
plt.legend(loc=3)
# recent Matplotlib needs an explicit `ax` for a mappable not attached to any Axes
cbar = plt.colorbar(sm, ax=plt.gca(), ticks=time_vector, format=mpl.ticker.ScalarFormatter(),
                    shrink=1.0, fraction=0.1, pad=0)
plt.show()
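The synchronisation principle can be verified directly: a color computed as `cmap(norm(value))` is the same one the `ScalarMappable` (and hence the colorbar) assigns to that value. A minimal check, assuming Matplotlib 3.5 or newer for the `matplotlib.colormaps` registry:

```python
import matplotlib as mpl
import matplotlib.cm
import matplotlib.colors
import numpy as np

time_vector = [0, 1, 2, 4, 8, 16, 32, 64, 128, 256]

cmap = mpl.colormaps["jet"]
norm = mpl.colors.SymLogNorm(2, vmin=time_vector[0], vmax=time_vector[-1])
sm = mpl.cm.ScalarMappable(norm=norm, cmap=cmap)

line_colors = [cmap(norm(t)) for t in time_vector]  # colors used for the lines
bar_colors = [sm.to_rgba(t) for t in time_vector]   # colors the colorbar shows

# Both lists hold one RGBA tuple per time value; they should agree element-wise
in_sync = bool(np.allclose(line_colors, bar_colors))
print(in_sync)
```

Drawing the colors from an independent `np.linspace(0, 1, n)` list, as in the question, only agrees with the colorbar if the time values happen to be evenly spaced under the chosen norm.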
Consider the following plot:
fig, ax = plt.subplots(figsize = (14, 6))
ax.set_facecolor('k')
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)
xs = np.arange(60, 70) # xs = np.linspace(60, 70, 100)
ys = np.arange(0, 100, .5) # ys = np.linspace(0, 100, 100)
v = [[[x, y] for x in xs] for y in ys]
lines = LineCollection(v, linewidth = 1, cmap = plt.cm.Greys_r)
lines.set_array(xs)
ax.add_collection(lines)
How can I change the color of the lines according to their x coordinates (horizontally) so as to create a "shading" effect like this:
Here, the greater x is, the "whiter" the LineCollection is.
Following this reasoning, I thought that specifying lines.set_array(xs) would do the trick but as you can see in my plot the color gradation is still following the y axis. Strangely the pattern is repeating itself, from black to white (every 5) over and over (up to 100).
I think (not sure at all) the problem lies in the v variable that contains the coordinates. The concatenation of x and y might be improper.
The shape of the list v you supply to the LineCollection is indeed not suitable for creating a gradient in the desired direction. This is because each line in a LineCollection can only have a single color. Here the lines range from x=60 to x=70, and each of those lines has one color.
What you need to do instead is create a line collection where each line is divided into several segments, each of which can then have its own color.
To this end an array of dimensions (n, m, l), where n is the number of segments, m is the number of points per segment, and l is the dimension (2D, hence l=2) needs to be used.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection
fig, ax = plt.subplots(figsize = (14, 6))
ax.set_facecolor('k')
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)
xs = np.linspace(60, 70, 100)
ys = np.linspace(0, 100, 100)
X,Y = np.meshgrid(xs,ys)
s = X.shape
segs = np.empty(((s[0])*(s[1]-1),2,2))
segs[:,0,0] = X[:,:-1].flatten()
segs[:,1,0] = X[:,1:].flatten()
segs[:,0,1] = Y[:,:-1].flatten()
segs[:,1,1] = Y[:,1:].flatten()
lines = LineCollection(segs, linewidth = 1, cmap = plt.cm.Greys_r)
lines.set_array(X[:,:-1].flatten())
ax.add_collection(lines)
plt.show()
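The `(n, m, l)` layout is perhaps easier to see on a single polyline. A small NumPy sketch (the five sample points are made up) that turns consecutive points into two-point segments, so that each segment can receive its own color value:

```python
import numpy as np

# Five made-up points along one horizontal line at y = 10
xs = np.linspace(60, 70, 5)
points = np.column_stack([xs, np.full_like(xs, 10.0)])  # shape (5, 2)

# Pair each point with its successor: segment i runs from points[i] to points[i + 1]
segs = np.stack([points[:-1], points[1:]], axis=1)      # shape (4, 2, 2)

# One scalar per segment (here its starting x), suitable for LineCollection.set_array
color_values = segs[:, 0, 0]
print(segs.shape, color_values)
```

Here n = 4 segments, m = 2 points per segment and l = 2 coordinates; the meshgrid version above builds the same structure for every row at once.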