Given a certain dataset, I would like to create three histograms in one plot. The data (just a small snippet of a huge dataset, far too large to post here) looks like this:
x, y1, y2, y3
2.0466115, 0, 0, 0
2.349824, 0, 0, 0
2.697959, 0, 0, 0
3.097671, 0.195374, 0.191008, 0.167979
3.5566025, 0.522926, 0.511492, 0.426324
4.083526, 0.691916, 0.6774083, 0.5790586666666666
4.688515, 0.8181206, 0.801901, 0.6795873333333334
5.3831355, 0.8489766, 0.833376, 0.707486
6.1806665, 0.809022, 0.795524, 0.6750806666666667
All my x values are the same; y1, y2 and y3 represent the three different y values. I'm creating a separate list for each column and passing each one as an argument to pyplot.hist. You can see my code here:
import numpy as np
from matplotlib import pyplot
from excel_to_csv import coordinates
y1 = coordinates(1) #another method, which creates the list out of the column
y2 = coordinates(2)
y3 = coordinates(3)
bins = np.linspace(0, 10, 150)
pyplot.hist(y1, bins, alpha=0.5, label='y1')
pyplot.hist(y2, bins, alpha=0.5, label='y2')
pyplot.hist(y3, bins, alpha=0.5, label='y3')
pyplot.legend(loc='upper right')
pyplot.show()
This code results in the following plot (regarding the actual dataset):
As far as I understand, the bins are created over the range of the x axis. Instead of doing so, I would like to use my own x values there.
My goal is the histogram looking like this, but as a histogram (once again - regarding the huge dataset):
You can use np.histogram and then plot the values of the histogram:
import numpy as np
import matplotlib.pyplot as plt
# Generate sample data
y1 = np.random.normal(3,1,10000)
y2 = np.random.normal(5,1,10000)
y3 = np.random.normal(7,1,10000)
bins = np.linspace(0, 10, 150)
x = np.linspace(0,10000,149)
# Plot regular histograms
plt.figure()
plt.hist(y1, bins, alpha=0.5, label='y1')
plt.hist(y2, bins, alpha=0.5, label='y2')
plt.hist(y3, bins, alpha=0.5, label='y3')
plt.ylabel('Frequency')
plt.xlabel('Bins')
plt.legend(loc='upper right')
plt.show()
# Compute histogram data
h1 = np.histogram(y1, bins)
h2 = np.histogram(y2, bins)
h3 = np.histogram(y3, bins)
# Compute bin centers (left edge plus half the bin width)
bin_avg = bins[0:-1] + (bins[1] - bins[0]) / 2
# Plot histogram data as a line with markers
plt.figure()
plt.plot(bin_avg, h1[0], alpha=0.5, label='y1', marker='o')
plt.plot(bin_avg, h2[0], alpha=0.5, label='y2', marker='o')
plt.plot(bin_avg, h3[0], alpha=0.5, label='y3', marker='o')
plt.ylabel('Frequency')
plt.xlabel('Bins')
plt.legend(loc='upper right')
plt.show()
It wouldn't make sense to plot the binned data versus x because once the data has been transformed by the histogram the relationship it had with x is no longer the same.
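The bin positions used for the line plot above can also be taken as the midpoints of neighbouring edges. The following is a minimal sketch of that step only; the seeded generator and the `samples` array are stand-ins chosen for illustration, not the asker's data.

```python
import numpy as np

rng = np.random.default_rng(0)           # seeded only to make the sketch repeatable
samples = rng.normal(5, 1, 10_000)       # stand-in data, not the original dataset
bins = np.linspace(0, 10, 150)           # 150 edges give 149 bins

counts, edges = np.histogram(samples, bins)
centers = (edges[:-1] + edges[1:]) / 2   # midpoint of each bin

# counts and centers have one entry per bin, so they can be passed
# directly to plt.plot(centers, counts) or plt.bar(centers, counts, width=...)
```

Using the midpoints keeps each marker in the middle of the bin it summarises, whatever the bin width is.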
I have a dataset that looks similar to the one simulated in the code below. There are two sets of observations, one for those at X=0 and another for those at X>0.
import numpy as np
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
X1 = np.random.normal(0, 1, 100)
X1 = X1 - np.min(X1)
Y1 = X1 + np.random.normal(0, 1, 100)
X0 = np.zeros(100)
Y0 = np.random.normal(0, 1.2, 100) + 2
X = np.concatenate((X1, X0))
Y = np.concatenate((Y1, Y0))
sns.distplot(Y0, color="orange")
plt.show()
sns.scatterplot(x=X, y=Y, hue=(X == 0), legend=False)
plt.show()
There are two plots: a histogram with KDE and a scatterplot.
I want to take the histogram with KDE, rotate it, and orient it appropriately with respect to the scatter plot. I would also like to add a trend line for each respective set of observations.
The ideal result would look something like this:
How do you do this in python, either using seaborn or matplotlib?
This can be done by combining plt.subplots with a shared y-axis (to keep the scale consistent) and seaborn plots. The trend line needs some additional computation, but NumPy's polyfit gives a quick fit. Here is an example of how to achieve your goal, and here is a Jupyter notebook to play with.
import numpy as np
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
# Prepare some data
np.random.seed(2020)
mean_Y1 = 0
std_Y1 = 1
size_Y1 = 100
X1 = np.random.normal(mean_Y1, std_Y1, size_Y1)
X1 = X1 - np.min(X1)
Y1 = X1 + np.random.normal(mean_Y1, std_Y1, size_Y1)
# this for computing trend line
Z = np.polyfit(X1, Y1, 1)
Y_ = np.poly1d(Z)(X1)
mean_Y0 = 2
std_Y0 = 1.2
size_Y0 = 100
X0 = np.zeros(100)
Y0 = np.random.normal(mean_Y0, std_Y0, size_Y0)
X = np.concatenate((X1, X0))
Y = np.concatenate((Y1, Y0))
# Now time for plotting
fig, axs = plt.subplots(1, 2,
                        sharey=True,
                        figsize=(10, 5),
                        gridspec_kw={'width_ratios': (1, 2)}
                        )
# control space between plots
fig.subplots_adjust(wspace=0.1)
# set the ticks for y-axis:
axs[0].yaxis.set_tick_params(left=False, labelleft=False, labelright=True)
# if you wish you can rotate xticks on the histogram with:
axs[0].xaxis.set_tick_params(rotation=90)
# plot histogram
dist = sns.distplot(Y0, color="orange", vertical=True, ax=axs[0])
# now we need to get the coordinate of the peak, we need this for mean line
line_data = dist.get_lines()[0].get_data()
max_Y0 = np.max(line_data[0])
# plotting the mean line
axs[0].plot([0, max_Y0], [mean_Y0, mean_Y0], '--', c='orange')
# inverting xaxis
axs[0].invert_xaxis()
# Plotting scatterplot
sns.scatterplot(x=X, y=Y, hue=(X == 0), legend=False, ax=axs[1])
# Plotting trend line
sns.lineplot(x=X1, y=Y_, ax=axs[1])
# Plotting mean again
axs[1].plot([0, max(X1)], [mean_Y0, mean_Y0], '--', c='orange')
plt.show()
Out:
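As a side note on the trend line in the code above: it comes from a degree-1 least-squares fit. A minimal sketch of just that step follows, using hypothetical points that lie exactly on y = 2x + 1 so the result is easy to check; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical points lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

slope, intercept = np.polyfit(x, y, 1)   # degree-1 least-squares fit
trend = np.poly1d([slope, intercept])    # callable polynomial

# Evaluate on a sorted grid so the plotted line runs left to right
x_line = np.linspace(x.min(), x.max(), 50)
y_line = trend(x_line)
```

Evaluating the fit on a sorted grid rather than on the raw, unsorted x values avoids depending on the plotting function to order the points.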
As the title says, I am trying to plot a system of linear equations to get the intersection point of the 2 equations.
8a-b = 9
4a+9b = 7.
Below is the code I have tried.
import matplotlib.pyplot as plt
from numpy.linalg import inv
import numpy as np
a = np.array([[8,-1],[4,9]])
b = np.array([9,7])
c = np.linalg.solve(a,b)
plt.figure()
# Set x-axis range
plt.xlim((-10,10))
# Set y-axis range
plt.ylim((-10,10))
# Draw lines to split quadrants
plt.plot([-10,-10],[10,10], linewidth=4, color='blue' )
#draw the equations
plt.plot(a[0][0],a[0][1], linewidth=2, color='red' )
plt.plot(a[1][0],a[1][1], linewidth=2, color='red' )
plt.plot(c[0],c[1], marker='x', color="black")
plt.title('Quadrant plot')
plt.show()
I get only the intersection point, but not the lines on the 2D plane as shown in the below graph.
I want something like this.
To plot the lines, it is easiest to rearrange your equations in terms of b. This way 8a-b=9 becomes b=8a-9 and 4a+9b=7 becomes b=(7-4a)/9.
It also looks like you were trying to draw the axes of the graph; I've fixed this in the code below too.
The following should do the trick:
import matplotlib.pyplot as plt
import numpy as np
a = np.array([[8,-1],[4,9]])
b = np.array([9,7])
c = np.linalg.solve(a,b)
plt.figure()
# Set x-axis range
plt.xlim((-10,10))
# Set y-axis range
plt.ylim((-10,10))
# Draw lines to split quadrants
plt.plot([-10, 10], [0, 0], color='C0')
plt.plot([0, 0], [-10, 10], color='C0')
# Draw line 8a-b=9 => b=8a-9
x = np.linspace(-10, 10)
y = 8 * x - 9
plt.plot(x, y, color='C2')
# Draw line 4a+9b=7 => b=(7-4a)/9
y = (7 - 4*x) / 9
plt.plot(x, y, color='C2')
# Add solution
plt.scatter(c[0], c[1], marker='x', color='black')
# Annotate solution
plt.annotate('({:0.3f}, {:0.3f})'.format(c[0], c[1]), c+0.5)
plt.title('Quadrant plot')
plt.show()
This gave me the following plot:
x1 = np.arange(-10, 10, 0.01) # between -10 and 10, 0.01 stepsize
y1 = 8*x1-9
x2 = np.arange(-10, 10, 0.01) # between -10 and 10, 0.01 stepsize
y2 = (7-4*x2)/9
These are the equations of your lines.
Now plot them using plt.plot(x1, y1) and so on.
plt.figure()
# Set x-axis range
plt.xlim((-10,10))
# Set y-axis range
plt.ylim((-10,10))
# Draw lines to split quadrants
plt.plot([-10, 10], [0, 0], linewidth=4, color='blue')
plt.plot([0, 0], [-10, 10], linewidth=4, color='blue')
# Draw the equations
plt.plot(x1, y1, linewidth=2, color='red')
plt.plot(x2, y2, linewidth=2, color='red')
# Mark the solution (c comes from np.linalg.solve(a, b) in the question)
plt.plot(c[0], c[1], marker='x', color="black")
plt.title('Quadrant plot')
plt.show()
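Both answers rearrange each equation by hand. As a rough sketch, the same rearrangement can be read off the coefficient matrix, assuming no line is vertical (the coefficient of b is nonzero). The helper `line_y` is a hypothetical name introduced here, not part of NumPy or matplotlib.

```python
import numpy as np

def line_y(coeffs, rhs, x):
    """Return y on the line p*x + q*y = rhs; assumes q != 0 (not vertical)."""
    p, q = coeffs
    return (rhs - p * x) / q

A = np.array([[8.0, -1.0], [4.0, 9.0]])   # coefficients of 8a-b=9 and 4a+9b=7
b = np.array([9.0, 7.0])
solution = np.linalg.solve(A, b)          # intersection point (a, b)

x = np.linspace(-10, 10, 200)
lines = [line_y(row, rhs, x) for row, rhs in zip(A, b)]
```

Each entry of `lines` can then be drawn with plt.plot(x, entry), and the intersection with plt.scatter(solution[0], solution[1]).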
I wrote some code a while ago that used gaussian kde to make simple density scatter plots. However, for datasets larger than about 100,000 points, it just ran 'forever' (I killed it after a few days). A friend gave me some code in R that could create such a density plot in seconds (plot_fun.R), and it seems like matplotlib should be able to do the same thing.
I think the right place to look is 2d histograms, but I am struggling to get the density to be 'right'. I modified code I found at this question to accomplish this, but the density is not showing; it looks like only the densest possible points are getting any color.
Here is approximately the code I am using:
import numpy as np
import matplotlib.pyplot as plt
# initial data
x = -np.log10(np.random.random_sample(10000))
y = -np.log10(np.random.random_sample(10000))
#histogram definition
bins = [1000, 1000] # number of bins
thresh = 3 #density threshold
#data definition
mn = min(x.min(), y.min())
mx = max(x.max(), y.max())
mn = mn-(mn*.1)
mx = mx+(mx*.1)
xyrange = [[mn, mx], [mn, mx]]
# histogram the data
hh, locx, locy = np.histogram2d(x, y, range=xyrange, bins=bins)
posx = np.digitize(x, locx)
posy = np.digitize(y, locy)
#select points within the histogram
ind = (posx > 0) & (posx <= bins[0]) & (posy > 0) & (posy <= bins[1])
hhsub = hh[posx[ind] - 1, posy[ind] - 1] # values of the histogram where the points are
xdat1 = x[ind][hhsub < thresh] # low density points
ydat1 = y[ind][hhsub < thresh]
hh[hh < thresh] = np.nan # fill the areas with low density by NaNs
f, a = plt.subplots(figsize=(12,12))
c = a.imshow(
    np.flipud(hh.T), cmap='jet',
    extent=np.array(xyrange).flatten(), interpolation='none',
    origin='upper'
)
f.colorbar(c, ax=a, orientation='vertical', shrink=0.75, pad=0.05)
s = a.scatter(
    xdat1, ydat1, color='darkblue', edgecolor='none', label=None,
    picker=True, zorder=2
)
That produces this plot:
The KDE code is here:
import scipy.stats as sts
f, a = plt.subplots(figsize=(12,12))
xy = np.vstack([x, y])
z = sts.gaussian_kde(xy)(xy)
# Sort the points by density, so that the densest points are
# plotted last
idx = z.argsort()
x2, y2, z = x[idx], y[idx], z[idx]
s = a.scatter(
    x2, y2, c=z, s=50, cmap='jet',
    edgecolor='none', label=None, picker=True, zorder=2
)
That produces this plot:
The problem is, of course, that this code is unusable on large data sets.
My question is: how can I use the 2d histogram to produce a scatter plot like that? ax.hist2d does not produce a useful output, because it colors the whole plot, and all my efforts to get the above 2d histogram data to actually color the dense regions of the plot correctly have failed, I always end up with either no coloring or a tiny percentage of the densest points being colored. Clearly I just don't understand the code very well.
Your histogram code assigns a single color (color='darkblue') to every point, so what are you expecting?
I think you are also overcomplicating things. This much simpler code works fine:
import numpy as np
import matplotlib.pyplot as plt
x, y = -np.log10(np.random.random_sample((2,10**6)))
#histogram definition
bins = [1000, 1000] # number of bins
# histogram the data
hh, locx, locy = np.histogram2d(x, y, bins=bins)
# Sort the points by density, so that the densest points are plotted last
z = np.array([hh[np.argmax(a<=locx[1:]),np.argmax(b<=locy[1:])] for a,b in zip(x,y)])
idx = z.argsort()
x2, y2, z2 = x[idx], y[idx], z[idx]
plt.figure(1,figsize=(8,8)).clf()
s = plt.scatter(x2, y2, c=z2, cmap='jet', marker='.')
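The per-point lookup above runs a Python loop over every sample, which can get slow for millions of points. One possible vectorised variant, swapped in here as an alternative and not taken from the answer, uses np.digitize to find each point's bin index. The seeded generator and the sample size are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# 1 - random() lies in (0, 1], so the logarithm is always finite
x, y = -np.log10(1.0 - rng.random((2, 100_000)))

hh, locx, locy = np.histogram2d(x, y, bins=[1000, 1000])

# np.digitize returns 1-based bin numbers for increasing edges;
# subtract 1 and clip so points on the last edge land in the last bin.
ix = np.clip(np.digitize(x, locx) - 1, 0, hh.shape[0] - 1)
iy = np.clip(np.digitize(y, locy) - 1, 0, hh.shape[1] - 1)
z = hh[ix, iy]                 # count of the bin each point fell in

order = z.argsort()            # densest points last
x2, y2, z2 = x[order], y[order], z[order]
```

The sorted arrays can be passed to plt.scatter(x2, y2, c=z2, cmap='jet', marker='.') exactly as in the answer.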
Given 3 arrays:
X1 = 10.00, 30.10, 50.20, 70.30 ...
X2 = 1.9976433815311, 2.0109630315475, 2.0372702369401, 2.0665284897891 ...
Y = -0.0000008764356, -0.0000149459573, -0.0000326996870, -0.0000513717121 ...
There is a one-to-one correspondence between X1, X2 and Y, i.e.
the i-th element of X1 has an associated i-th value of X2 and an i-th value of Y.
The following is the plot of Y as a function of X1 (blue dots).
I would need the X2 axis to show all the corresponding X2 values for each X1 value.
Following the second answer on this post,
I have partially accomplished this through the ticker.FixedFormatter strategy,
which requires transforming the X2 array into a tuple whose elements are all strings.
As can be seen, not all red values of X2 are displayed for each value of X1, e.g. for X1 = 10.0 the corresponding X2 = 2.00 appears to be displaced.
I do not understand very well why this is occurring. I would appreciate if you could help me.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import sys
X1 = np.array([10.0000000000000, 30.1000000000000, 50.2000000000000, 70.3000000000000, 90.4000000000000, 110.5100000000000, 130.6100000000000, 150.7100000000000, 170.8100000000000, 190.9100000000000, 211.0100000000000, 231.1100000000000, 251.2100000000000, 271.3100000000000, 291.4100000000000, 311.5200000000000, 331.6200000000000 ])
Y = np.array([-0.0000008764356, -0.0000149459573, -0.0000326996870, -0.0000513717121, -0.0000652350399, -0.0000842214902, -0.0001003825474, -0.0001214363281, -0.0001376971422, -0.0001572720132, -0.0001971891337, -0.0002203926200, -0.0002747064193, -0.0003217228112, -0.0003764577474, -0.0004657478828, -0.0006232016207])
X2 = np.array([1.9976433815311, 2.0109630315475, 2.0372702369401, 2.0665284897891, 2.0995743328944, 2.1392386324550, 2.1789200955649, 2.2290243968267, 2.2872281293691, 2.3180577547912, 2.4100643103912, 2.4826981368480, 2.5794602952095, 2.6764219232389, 2.7963983991814, 2.9740753305878, 3.3107035136072])
##### Plotting:
fig, ax1 = plt.subplots()
ax1.plot(X1, Y, linestyle='--', marker="o", markersize=6, color='blue')
ax1.set_ylabel('Y', fontsize=20)
# Make the ax1-ticks and ax1-tick-labels match the line color (blue):
ax1.set_xlabel('X1', fontsize=20, color='blue')
plt.setp(ax1.get_xticklabels(), rotation=45) # rotate them
# Create a new axis:
ax2 = ax1.twiny()
# Make the ax2-ticks and ax2-tick-labels match the red color:
ax2.set_xlabel('X2', fontsize=20, color='red')
ax2.tick_params('x', colors='red')
fig.tight_layout()
ax2.set_xlim(1.9, 3.4)
ax1.set_ylim(-0.0007, 1.1e-5)
ax2.set_ylim(-0.0007, 1.1e-5)
ax1.grid()
# Convert all X2 elements to a list of strings:
X2_string_all = []
for i in X2:
    aux = "%.2f" % i
    X2_string = str(aux)
    X2_string_all.append(X2_string)
# Convert that list into a tuple:
X2_string_all_tuple = tuple(X2_string_all)
ax1.xaxis.set_major_locator(ticker.FixedLocator((X1)))
ax2.xaxis.set_major_formatter(ticker.FixedFormatter((X2_string_all_tuple)))
plt.show()
Something like this would be the desired plot (the red lines that come across the plot are not necessary):
In your code ax2 does not know that it should behave exactly as ax1, just with different labels. So you need to tell it,
ax2.set_xlim(ax1.get_xlim())
Then just use the same tick locations for both axes,
ax1.set_xticks(X1)
ax2.set_xticks(X1)
and label the ticks of ax2 with values from X2
ax2.set_xticklabels(["%.2f" % i for i in X2])
Complete code:
import numpy as np
import matplotlib.pyplot as plt
X1 = np.array([10., 30.1, 50.2, 70.3, 90.4, 110.510, 130.610, 150.710, 170.810,
190.910, 211.010, 231.110, 251.210, 271.310, 291.410, 311.52, 331.62])
Y = np.array([-0.00000087, -0.0000149, -0.0000326, -0.0000513, -0.00006523, -0.0000842,
-0.0001003, -0.0001214, -0.00013769, -0.0001572, -0.0001971, -0.0002203,
-0.00027470, -0.0003217, -0.0003764, -0.0004657, -0.00062320])
X2 = np.array([1.997, 2.0109, 2.0372, 2.0665, 2.099, 2.1392, 2.1789, 2.2290,
2.287, 2.3180, 2.4100, 2.4826, 2.579, 2.6764, 2.7963, 2.9740, 3.310])
##### Plotting:
fig, ax1 = plt.subplots()
ax1.grid()
ax2 = ax1.twiny()
ax1.plot(X1, Y, linestyle='--', marker="o", markersize=6, color='blue')
ax1.set_ylabel('Y', fontsize=20)
ax1.set_xlabel('X1', fontsize=20, color='blue')
plt.setp(ax1.get_xticklabels(), rotation=45) # rotate them
ax2.set_xlabel('X2', fontsize=20, color='red')
plt.setp(ax2.get_xticklabels(), rotation=45, color='red')
# Set xlimits of ax2 the same as ax1
ax2.set_xlim(ax1.get_xlim())
# Set ticks at desired position
ax1.set_xticks(X1)
ax2.set_xticks(X1)
# Label ticks of ax2 with values from X2
ax2.set_xticklabels(["%.2f" % i for i in X2])
fig.tight_layout()
plt.show()
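If labelling every data point makes the top axis crowded, one option is to keep only every n-th pair while still taking positions from X1 and labels from X2. This goes beyond what the question asked for; the helper name `thinned_ticks` and its `step` parameter are hypothetical, and the values are the first five from the question.

```python
def thinned_ticks(x1, x2, step=2, fmt="%.2f"):
    """Return (positions, labels) using every `step`-th (X1, X2) pair."""
    positions = list(x1[::step])
    labels = [fmt % value for value in x2[::step]]
    return positions, labels

X1 = [10.0, 30.1, 50.2, 70.3, 90.4]
X2 = [1.9976433815311, 2.0109630315475, 2.0372702369401,
      2.0665284897891, 2.0995743328944]

positions, labels = thinned_ticks(X1, X2, step=2)
```

The result would be used as ax1.set_xticks(positions), ax2.set_xticks(positions) and ax2.set_xticklabels(labels), so both axes keep the same tick locations.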
I need to center the bars of a histogram.
x = array
y = [0,1,2,3,4,5,6,7,8,9,10]
num_bins = len(array)
n, bins, patches = plt.hist(x, num_bins, facecolor='green', alpha=0.5)
barWidth=20
x.bar(x, y, width=barWidth, align='center')
plt.show()
What I need is for it to look like the one in this picture
I tried almost everything, but still can't get it to work.
Thank you all
For your task, I think it's better to calculate the histogram with NumPy and plot it with the bar function. Please refer to the following code and see how to use bin_edges.
import matplotlib.pyplot as plt
import numpy as np
num_samples = 100
num_bins = 10
lb, ub = 0, 10 # lower bound, upper bound
# create samples
y = np.random.random(num_samples) * ub
# calculate histogram
hist, bin_edges = np.histogram(y, num_bins, range=(lb, ub))
width = (bin_edges[1] - bin_edges[0])
# plot histogram
plt.bar(bin_edges[:-1], hist, align='center',
        width=width, edgecolor='k', facecolor='green', alpha=0.5)
plt.xticks(range(num_bins))
plt.xlim([lb-width/2, ub-width/2])
plt.show()
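When the data are integers, another way to get centred bars is to put the bin edges on the half-integers, so that each bin is centred on one integer value. This is a sketch of that idea with a small hypothetical `values` array, not the asker's data.

```python
import numpy as np

values = np.array([0, 1, 1, 2, 2, 2, 5, 9, 10, 10])  # hypothetical integer data

lo, hi = values.min(), values.max()
edges = np.arange(lo - 0.5, hi + 1.5, 1.0)   # -0.5, 0.5, ..., 10.5
counts, _ = np.histogram(values, bins=edges)
centers = (edges[:-1] + edges[1:]) / 2       # 0, 1, ..., 10
```

The bars can then be drawn with plt.bar(centers, counts, width=1.0, align='center'), which places each bar directly over its integer tick.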