I have two sets of arrays, x1, y1, t1 and x2, y2, t2: x data, y data and time measurements.
I would like to plot these two sets as lines with time as the x argument in plot(), so that the lines are aligned with respect to the time precedence of events.
However, I would also like to see the corresponding x1 and x2 on the plot in the form of xlabels (say at the top and at the bottom of the plot), as well as have two scales for the y values (i.e. on the left and on the right of the figure).
import numpy as np
t1 = np.linspace(0, 10, 10)
y1 = np.arange(10)
x1 = (np.cumsum(np.random.rand(10)) * 1000000000).astype(int)
x1 = (x1 / 100000).astype(int) * 10
x2 = (np.cumsum(np.random.rand(10)) * 1000000000).astype(int)
x2 = (x2 / 1000000).astype(int)
y2 = 2 * np.arange(10)
t2 = np.linspace(0, 10, 10) + 2
from matplotlib import pyplot as plt
fig, ax1 = plt.subplots()
ax1.plot(t1, y1)
ax1.set_ylabel("y1 label")
ax1.set_xticklabels(x1)
ax1.set_xlabel("x1 label")
ax2 = ax1.twinx()
ax2.plot(t2, y2, c='r')
ax2.set_ylabel("y2 label")
ax3 = ax2.twiny()
ax3.xaxis.set_ticks_position('top')
ax3.set_xticklabels(x2);
ax3.set_xlabel("x2 label")
The code produces a plot that is good, but has two problems:
xlabels are not aligned with the data: blue line on the plot starts at x1 ticklabel 104500, while x1[0] = 29380.
I am unable to apply sci format for the x1 and x2 ticks, i.e. the line
ax1.ticklabel_format(style='sci', axis='x', scilimits=(0,0))
fails with This method only works with the ScalarFormatter, which is reasonable, since I have replaced labels of ticks, not ticks themselves. On the other hand, I cannot assign x1 to xticks, since this will change limits of xaxis.
How could I overcome these two problems?
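One possible way around both problems is to keep time as the real x coordinate and change only how the ticks are labelled. Below is a minimal sketch, assuming x grows monotonically with t so that np.interp can map a tick's time position back to an x value. The helper name x_label_formatter and the stand-in data are mine, not from the question. Because the labels are produced by a FuncFormatter, the sci-style format comes from the format string rather than from ticklabel_format.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def x_label_formatter(t, x):
    """Hypothetical helper: label a tick placed at time `tick` with the
    x value linearly interpolated from the (t, x) samples."""
    def _format(tick, pos=None):
        # np.interp clamps to x[0] / x[-1] outside the sampled time range
        return "%.1e" % np.interp(tick, t, x)
    return FuncFormatter(_format)


# Deterministic stand-in data (assumption: x increases with t)
t1 = np.linspace(0, 10, 10)
x1 = np.linspace(1e4, 1e5, 10)
y1 = np.arange(10)
t2 = t1 + 2
x2 = np.linspace(2e2, 2e3, 10)
y2 = 2 * np.arange(10)

fig, ax1 = plt.subplots()
ax1.plot(t1, y1)
ax1.set_ylabel("y1 label")
ax1.set_xlabel("x1 label")

ax2 = ax1.twinx()              # second y scale, same time axis
ax2.plot(t2, y2, c='r')
ax2.set_ylabel("y2 label")

ax3 = ax2.twiny()              # second x axis, drawn at the top
ax3.set_xlim(ax1.get_xlim())   # keep both x axes on the same time scale
ax3.set_xlabel("x2 label")

# Ticks stay at time positions; only their text changes
ax1.xaxis.set_major_formatter(x_label_formatter(t1, x1))
ax3.xaxis.set_major_formatter(x_label_formatter(t2, x2))
```

Call plt.show() or fig.savefig(...) afterwards to look at the result. Ticks that fall outside the measured time range repeat the first or last x value, which may or may not be what you want.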
I have a dataset that looks similar to the one simulated in the code below. There are two sets of observations, one for those at X=0 and another for those at X>0.
import numpy as np
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
X1 = np.random.normal(0, 1, 100)
X1 = X1 - np.min(X1)
Y1 = X1 + np.random.normal(0, 1, 100)
X0 = np.zeros(100)
Y0 = np.random.normal(0, 1.2, 100) + 2
X = np.concatenate((X1, X0))
Y = np.concatenate((Y1, Y0))
sns.distplot(Y0, color="orange")
plt.show()
sns.scatterplot(x=X, y=Y, hue=(X == 0), legend=False)
plt.show()
There are two plots: a histogram with KDE and a scatterplot.
I want to take the histogram with KDE, rotate it, and orient it appropriately with respect to the scatter plot. I would also like to add a trend line for each respective set of observations.
The ideal result would look something like this:
How do you do this in python, either using seaborn or matplotlib?
This can be done by combining plt.subplots with a shared y-axis (to keep the scale) and the sns plots. The trend line needs some additional computation, but you can use np for a quick fit. Here is an example of how to achieve your goal, and here is a Jupyter notebook to play with.
import numpy as np
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
# Prepare some data
np.random.seed(2020)
mean_Y1 = 0
std_Y1 = 1
size_Y1 = 100
X1 = np.random.normal(mean_Y1, std_Y1, size_Y1)
X1 = X1 - np.min(X1)
Y1 = X1 + np.random.normal(mean_Y1, std_Y1, size_Y1)
# this for computing trend line
Z = np.polyfit(X1, Y1, 1)
Y_ = np.poly1d(Z)(X1)
mean_Y0 = 2
std_Y0 = 1.2
size_Y0 = 100
X0 = np.zeros(100)
Y0 = np.random.normal(mean_Y0, std_Y0, size_Y0)
X = np.concatenate((X1, X0))
Y = np.concatenate((Y1, Y0))
# Now time for plotting
fig, axs = plt.subplots(1, 2,
                        sharey=True,
                        figsize=(10, 5),
                        gridspec_kw={'width_ratios': (1, 2)}
                        )
# control space between plots
fig.subplots_adjust(wspace=0.1)
# set the ticks for y-axis:
axs[0].yaxis.set_tick_params(left=False, labelleft=False, labelright=True)
# if you wish you can rotate xticks on the histogram with:
axs[0].xaxis.set_tick_params(rotation=90)
# plot histogram
dist = sns.distplot(Y0, color="orange", vertical=True, ax=axs[0])
# now we need to get the coordinate of the peak, we need this for mean line
line_data = dist.get_lines()[0].get_data()
max_Y0 = np.max(line_data[0])
# plotting the mean line
axs[0].plot([0, max_Y0], [mean_Y0, mean_Y0], '--', c='orange')
# inverting xaxis
axs[0].invert_xaxis()
# Plotting scatterplot (x and y passed by keyword, as recent seaborn versions require)
sns.scatterplot(x=X, y=Y, hue=(X == 0), legend=False, ax=axs[1])
# Plotting trend line
sns.lineplot(x=X1, y=Y_, ax=axs[1])
# Plotting mean again
axs[1].plot([0, max(X1)], [mean_Y0, mean_Y0], '--', c='orange')
plt.show()
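The trend-line step in the code above can be checked in isolation. This is a small sketch on made-up, exactly linear data (the helper name trend_line is mine), showing that np.polyfit with degree 1 returns the slope and intercept, and that np.poly1d evaluates the fitted line:

```python
import numpy as np


def trend_line(x, y):
    """Hypothetical helper: least-squares straight line through (x, y)."""
    coeffs = np.polyfit(x, y, 1)          # [slope, intercept]
    return coeffs, np.poly1d(coeffs)(x)   # fitted y at the original x


# Made-up data lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

coeffs, fitted = trend_line(x, y)
slope, intercept = coeffs
```

With noisy data such as X1 and Y1 above, the same call gives the best-fit line instead of an exact one.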
I have a data set with two y values associated with each x value. How can I divide the data into "upper" and "lower" values?
Below, I show an example with such a data set. I show an image of the desired "top" and "bottom" groupings (the red is the top and the purple is the bottom). My best idea so far is to find a line dividing the top and bottom data using an iterative approach. This solution is complicated and does not work very well, so I did not include it.
import matplotlib.pyplot as plt
import numpy as np
# construct data using piecewise functions
x1 = np.linspace(0, 0.7, 70)
x2 = np.linspace(0.7, 1, 30)
x3 = np.linspace(0.01, 0.999, 100)
y1 = 4.164 * x1 ** 3
y2 = 1 / x2
y3 = x3 ** 4 - 0.1
# concatenate data
x = np.concatenate([x1, x2, x3])
y = np.concatenate([y1, y2, y3])
# I want to be able to divide the data by top and bottom,
# like shown in the chart. The black is the unlabeled data
# and the red and purple show the top and bottom
plt.scatter(x, y, marker='^', s=10, c='k')
plt.scatter(x1, y1, marker='x', s=0.8, c='r')
plt.scatter(x2, y2, marker='x', s=0.8, c='r')
plt.scatter(x3, y3, marker='x', s=0.8, c='purple')
plt.show()
You can create a dividing line by re-ordering your data. Sort everything by x then apply a Gaussian filter. The two data sets are strictly above or below the results of the Gaussian filter:
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter1d
import numpy as np
# construct data using piecewise functions
x1 = np.linspace(0, 0.7, 70)
x2 = np.linspace(0.7, 1, 30)
x3 = np.linspace(0.01, 0.999, 100)
y1 = 4.164 * x1 ** 3
y2 = 1 / x2
y3 = x3 ** 4 - 0.1
# concatenate data
x = np.concatenate([x1, x2, x3])
y = np.concatenate([y1, y2, y3])
# I want to be able to divide the data by top and bottom,
# like shown in the chart. The black is the unlabeled data
# and the red and purple show the top and bottom
idx = np.argsort(x)
newy = y[idx]
newx = x[idx]
gf = gaussian_filter1d(newy, 5)
plt.scatter(x, y, marker='^', s=10, c='k')
plt.scatter(x1, y1, marker='x', s=0.8, c='r')
plt.scatter(x2, y2, marker='x', s=0.8, c='r')
plt.scatter(x3, y3, marker='x', s=0.8, c='purple')
plt.scatter(newx, gf, c='orange')
plt.show()
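To actually split the points rather than only draw the dividing curve, one option is to compare each sorted y value with the filtered value at the same index. This is a minimal sketch, assuming the smoothed curve stays between the two branches everywhere. The names upper, x_top, y_top, x_bottom and y_bottom are mine.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# same example data as in the question
x1 = np.linspace(0, 0.7, 70)
x2 = np.linspace(0.7, 1, 30)
x3 = np.linspace(0.01, 0.999, 100)
y1 = 4.164 * x1 ** 3
y2 = 1 / x2
y3 = x3 ** 4 - 0.1
x = np.concatenate([x1, x2, x3])
y = np.concatenate([y1, y2, y3])

# sort by x, then smooth y so that the result runs between the two branches
idx = np.argsort(x)
newx, newy = x[idx], y[idx]
gf = gaussian_filter1d(newy, 5)

# points above the smoothed curve are "top", the rest are "bottom"
upper = newy > gf
x_top, y_top = newx[upper], newy[upper]
x_bottom, y_bottom = newx[~upper], newy[~upper]
```

The filter width (5 here, as above) has to be large enough to average over both branches but small compared to the gap between them, so it may need tuning for other data.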
I would try as follows:
sort the points by increasing X if necessary;
maintain two indexes to the upper and lower subsets;
moving from left to right, for every new point assign it to the closest subset and update the corresponding index.
Initialization of the process seems a little tricky. Start with the first two points (they have a high chance of belonging to the same subset). Progress until the two points have a significant separation, so that you are sure they belong to different subsets. Then backtrack to the left.
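A rough sketch of the left-to-right assignment described above. Note that it sidesteps the initialization and backtracking step: it simply assumes the first two sorted points lie on different branches, which happens to hold for the example data in the question but not in general. The function name split_by_tracking is made up for illustration.

```python
import numpy as np


def split_by_tracking(x, y):
    """Hypothetical helper: return a boolean mask (in the original order)
    that is True for points assigned to the upper subset."""
    order = np.argsort(x, kind="stable")   # step 1: sort by increasing x
    ys = y[order]
    is_upper = np.zeros(len(ys), dtype=bool)

    # Seed (assumption): the first two points belong to different subsets
    first_is_upper = ys[0] >= ys[1]
    is_upper[0] = first_is_upper
    is_upper[1] = not first_is_upper
    last_up = max(ys[0], ys[1])    # most recent y of the upper subset
    last_low = min(ys[0], ys[1])   # most recent y of the lower subset

    # steps 2 and 3: assign each new point to the closest subset and
    # update that subset's most recent value
    for i in range(2, len(ys)):
        if abs(ys[i] - last_up) <= abs(ys[i] - last_low):
            is_upper[i] = True
            last_up = ys[i]
        else:
            last_low = ys[i]

    mask = np.empty(len(y), dtype=bool)
    mask[order] = is_upper         # undo the sort
    return mask


# example data from the question
x1 = np.linspace(0, 0.7, 70)
x2 = np.linspace(0.7, 1, 30)
x3 = np.linspace(0.01, 0.999, 100)
x = np.concatenate([x1, x2, x3])
y = np.concatenate([4.164 * x1 ** 3, 1 / x2, x3 ** 4 - 0.1])

mask = split_by_tracking(x, y)
```

Because each decision depends on the previous ones, a single wrong assignment can throw off everything to its right, so this only suits data where the step between neighbouring points on a branch is clearly smaller than the gap between the branches.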
I want to fill the maximized area defined by the inequalities below after plotting them in matplotlib.
I have tried everything I could think of but could not fill the desired area.
import numpy as np
import matplotlib.pyplot as plt
A = np.linspace(0, 100, 2000)
# 3A+4B≤30
y1 = (30 - A * 3 ) /4
# 5A+6B≤60
y2 = (60 - A * 5)/6
# 1.5A+3B≤21
y3 = (21 - A * 1.5)/3.0
plt.plot(A, y1, label=r'$3A+4B\leq30$')
plt.plot(A, y2, label=r'$5A+6B\leq60$')
plt.plot(A, y3, label=r'$1.5A+3B\leq21$')
plt.xlim((0, 20))
plt.ylim((0, 15))
plt.xlabel(r'$x values$')
plt.ylabel(r'$y values$')
plt.fill_between(A, y3, where = y2<y3,color='grey', alpha=0.5)
plt.legend(bbox_to_anchor=(.80, 1), loc=2, borderaxespad=0.1)
plt.show()
I want to fill the area of the maximum, which is at x = 2.0 and y = 6.0.
This is one solution based on this link. The only difference from the linked solution is that, for your case, I had to use fill_betweenx to cover the whole x-axis common to the curves and switch the order of x and Y. The idea is to first find the intersection point within some tolerance and then take the values from one curve lying to the left of that point and from the other curve lying to the right of it. I also had to add an additional [0] to ind to get it working.
import numpy as np
import matplotlib.pyplot as plt
A = np.linspace(0, 100, 2000)
y1 = (30 - A * 3 ) /4
y2 = (60 - A * 5)/6
y3 = (21 - A * 1.5)/3.0
plt.plot(A, y1, label=r'$3A+4B\leq30$')
plt.plot(A, y2, label=r'$5A+6B\leq60$')
plt.plot(A, y3, label=r'$1.5A+3B\leq21$')
plt.xlim((0, 20))
plt.ylim((0, 12))
plt.xlabel(r'$x values$')
plt.ylabel(r'$y values$')
plt.legend(bbox_to_anchor=(.65, 0.95), loc=2, borderaxespad=0.1)
def fill_below_intersection(x, S, Z):
    """
    fill the region below the intersection of S and Z
    """
    # find the intersection point
    ind = np.nonzero(np.absolute(S - Z) == min(np.absolute(S - Z)))[0][0]
    # compute a new curve which we will fill below
    Y = np.zeros(S.shape)
    Y[:ind] = S[:ind]  # Y is S up to the intersection
    Y[ind:] = Z[ind:]  # and Z beyond it
    plt.fill_betweenx(Y, x, facecolor='gray', alpha=0.5)  # <--- Important line

fill_below_intersection(A, y3, y1)
plt.show()
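As a quick sanity check of the intersection lookup, the index of the closest approach of y1 and y3 on the grid can be compared with the exact solution of the two equalities. This is only a sketch: np.argmin is used here as a shorthand for the nonzero(... == min(...)) expression, and np.linalg.solve is my addition for the exact point.

```python
import numpy as np

A = np.linspace(0, 100, 2000)
y1 = (30 - A * 3) / 4       # 3A + 4B = 30
y3 = (21 - A * 1.5) / 3.0   # 1.5A + 3B = 21

# grid index where the two lines are closest to each other
ind = int(np.argmin(np.abs(y3 - y1)))
A_grid, B_grid = A[ind], y1[ind]

# exact intersection of the two lines, solved as a 2x2 linear system
exact = np.linalg.solve(np.array([[3.0, 4.0], [1.5, 3.0]]),
                        np.array([30.0, 21.0]))
```

The grid point only approximates the exact intersection (A = 2, B = 6) to within the spacing of A, which is why the lookup works "within some tolerance".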
I am assuming you want to fill the area between y1 and y3 until they intersect with each other, because you specified (2, 6) as a point? Then use:
plt.fill_between(A, y1, y3, where=y1 > y3)
Analogously, replace y3 with y2 if you meant the other curve. "Maximized area" is a bit misleading, as @gmds already commented.
I'm trying to get the output of a saved figure, with two side-by-side subplots, to have an equal looking spacing on the left and right of the plots, so that when the figure is included in a document, it looks like the plots are collectively centred.
In other words, add padding of width 'a' from the figure to the right hand side of the figure.
Here is some sample code (with the data replaced):
import numpy as np
import matplotlib.pyplot as plt
x1 = np.linspace(0.0, 5.0)
x2 = np.linspace(0.0, 2.0)
y1 = np.cos(2 * np.pi * x1) * np.exp(-x1)
y2 = np.cos(2 * np.pi * x2)
fig, axes = plt.subplots(nrows=1,
                         ncols=2,
                         dpi=220,
                         squeeze=False,
                         figsize=(3.45, 2.1)
                         )
axes[0][0].plot(x1, y1, 'o-')
axes[0][0].set_ylabel('Damped oscillation')
axes[0][0].set_xlabel('time (s)')
axes[0][1].plot(x2, y2, '.-')
axes[0][1].set_xlabel('time (s)')
for i in [0, 1]:
    x0, x1 = axes[0][i].get_xlim()
    x_diff = x1 - x0
    y0, y1 = axes[0][i].get_ylim()
    y_diff = y1 - y0
    axes[0][i].set_aspect(x_diff / y_diff)
    # hide tick labels and tick marks (booleans rather than 'off' strings)
    axes[0][i].tick_params(labelbottom=False, labelleft=False, axis=u'both', which=u'both', length=0)
    axes[0][i].xaxis.labelpad = 7
    axes[0][i].yaxis.labelpad = 7
fig.savefig('clustering.pdf',
            bbox_inches='tight',
            pad_inches=0,
            dpi='figure')
Given 3 arrays:
X1 = 10.00, 30.10, 50.20, 70.30 ...
X2 = 1.9976433815311, 2.0109630315475, 2.0372702369401, 2.0665284897891 ...
Y = -0.0000008764356, -0.0000149459573, -0.0000326996870, -0.0000513717121 ...
There is a one-to-one correspondence between X1, X2 and Y, i.e.
the i-th element of X1 has an associated i-th value of X2 and an i-th value of Y.
The following is the plot of Y as a function of X1 (blue dots).
I would need the X2 axis to show all the corresponding X2 values for each X1 value.
Following the second answer on this post,
I have partially accomplished this through the ticker.FixedFormatter strategy,
by which: the X2 array needs to be transformed to a tuple, and each element of this tuple needs to be a string.
As can be seen, not all red values of X2 are displayed for each value of X1, e.g. for X1 = 10.0 the corresponding X2 = 2.00 appears to be displaced.
I do not understand very well why this is occurring. I would appreciate it if you could help me.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import sys
X1 = np.array([10.0000000000000, 30.1000000000000, 50.2000000000000, 70.3000000000000, 90.4000000000000, 110.5100000000000, 130.6100000000000, 150.7100000000000, 170.8100000000000, 190.9100000000000, 211.0100000000000, 231.1100000000000, 251.2100000000000, 271.3100000000000, 291.4100000000000, 311.5200000000000, 331.6200000000000 ])
Y = np.array([-0.0000008764356, -0.0000149459573, -0.0000326996870, -0.0000513717121, -0.0000652350399, -0.0000842214902, -0.0001003825474, -0.0001214363281, -0.0001376971422, -0.0001572720132, -0.0001971891337, -0.0002203926200, -0.0002747064193, -0.0003217228112, -0.0003764577474, -0.0004657478828, -0.0006232016207])
X2 = np.array([1.9976433815311, 2.0109630315475, 2.0372702369401, 2.0665284897891, 2.0995743328944, 2.1392386324550, 2.1789200955649, 2.2290243968267, 2.2872281293691, 2.3180577547912, 2.4100643103912, 2.4826981368480, 2.5794602952095, 2.6764219232389, 2.7963983991814, 2.9740753305878, 3.3107035136072])
##### Plotting:
fig, ax1 = plt.subplots()
ax1.plot(X1, Y, linestyle='--', marker="o", markersize=6, color='blue')
ax1.set_ylabel('Y', fontsize=20)
# Make the ax1-ticks and ax1-tick-labels match the line color (blue):
ax1.set_xlabel('X1', fontsize=20, color='blue')
plt.setp(ax1.get_xticklabels(), rotation=45) # rotate them
# Create a new axis:
ax2 = ax1.twiny()
# Make the ax2-ticks and ax2-tick-labels match the red color:
ax2.set_xlabel('X2', fontsize=20, color='red')
ax2.tick_params('x', colors='red')
fig.tight_layout()
ax2.set_xlim(1.9, 3.4)
ax1.set_ylim(-0.0007, 1.1e-5)
ax2.set_ylim(-0.0007, 1.1e-5)
ax1.grid()
# Convert all X2 elements to a list of strings:
X2_string_all = []
for i in X2:
    aux = "%.2f" % i
    X2_string = str(aux)
    X2_string_all.append(X2_string)
# Convert that list into a tuple:
X2_string_all_tuple = tuple(X2_string_all)
ax1.xaxis.set_major_locator(ticker.FixedLocator((X1)))
ax2.xaxis.set_major_formatter(ticker.FixedFormatter((X2_string_all_tuple)))
plt.show()
Something like this would be the desired plot (the red lines that come across the plot are not necessary):
In your code ax2 does not know that it should behave exactly as ax1, just with different labels. So you need to tell it,
ax2.set_xlim(ax1.get_xlim())
Then just use the same tick locations for both axes,
ax1.set_xticks(X1)
ax2.set_xticks(X1)
and label the ticks of ax2 with values from X2
ax2.set_xticklabels(["%.2f" % i for i in X2])
Complete code:
import numpy as np
import matplotlib.pyplot as plt
X1 = np.array([10., 30.1, 50.2, 70.3, 90.4, 110.510, 130.610, 150.710, 170.810,
190.910, 211.010, 231.110, 251.210, 271.310, 291.410, 311.52, 331.62])
Y = np.array([-0.00000087, -0.0000149, -0.0000326, -0.0000513, -0.00006523, -0.0000842,
-0.0001003, -0.0001214, -0.00013769, -0.0001572, -0.0001971, -0.0002203,
-0.00027470, -0.0003217, -0.0003764, -0.0004657, -0.00062320])
X2 = np.array([1.997, 2.0109, 2.0372, 2.0665, 2.099, 2.1392, 2.1789, 2.2290,
2.287, 2.3180, 2.4100, 2.4826, 2.579, 2.6764, 2.7963, 2.9740, 3.310])
##### Plotting:
fig, ax1 = plt.subplots()
ax1.grid()
ax2 = ax1.twiny()
ax1.plot(X1, Y, linestyle='--', marker="o", markersize=6, color='blue')
ax1.set_ylabel('Y', fontsize=20)
ax1.set_xlabel('X1', fontsize=20, color='blue')
plt.setp(ax1.get_xticklabels(), rotation=45) # rotate them
ax2.set_xlabel('X2', fontsize=20, color='red')
plt.setp(ax2.get_xticklabels(), rotation=45, color='red')
# Set xlimits of ax2 the same as ax1
ax2.set_xlim(ax1.get_xlim())
# Set ticks at desired position
ax1.set_xticks(X1)
ax2.set_xticks(X1)
# Label ticks of ax2 with values from X2
ax2.set_xticklabels(["%.2f" % i for i in X2])
fig.tight_layout()
plt.show()