Shift plots of different lengths in the same x-axis - python

I want to do something very specific with matplotlib, and I don't even know if it's possible, but I figured it was worth asking. Maybe the answers will give me an alternative idea about how to go about it.
I have 4 arrays of similar, but different, lengths that I want to plot on the same x-axis. This question suggests creating the values for x using range(), and it worked:
plt.figure(figsize=(8, 6), dpi=300)
x_5 = range(len(all_data_float[0]))
plt.plot(x_5, all_data_float[0], color='b', marker='.')
x_10 = range(len(all_data_float[1]))
plt.plot(x_10, all_data_float[1], color='r', marker='.')
x_15 = range(len(all_data_float[2]))
plt.plot(x_15, all_data_float[2], color='g', marker='.')
x_20 = range(len(all_data_float[3]))
plt.plot(x_20, all_data_float[3], color='c', marker='.')
plt.show()
But I want to go one step further: I want to plot a single vertical line in the middle, with every plot aligned on a chosen point. For example, there are 4 plots with:
plot1: 101 points
plot2: 99 points
plot3: 100 points
plot4: 101 points
So for plot1, that point would be index 51, which means 51 points before and 49 after with the line crossing point 51. For plot2, that middle point is index 49, which means 49 points before and 50 after, and so forth.
My difficulty is that the vertical line has a different index for each plot. I know plt.vlines() accepts an array, but in this case it plots multiple lines, and I wanted a single line.
Is there a way to "shift" each plot relative to the x-axis, so that index 51 of plot1 aligns with index 49 of plot2, etc.? Or is there a better strategy to do this?

From the setup of the question I am going to assume that the x-values do not have any numerical meaning, so it is safe from a data point of view to shift them around. Instead of plotting your data against range(len(...)), do the shift there!
import matplotlib.pyplot as plt
import numpy as np
def synthetic_data(length):
    "make some variable length synthetic data to plot."
    return np.exp(-((np.linspace(-5, 5, length)) ** 2))
data = [synthetic_data(51), synthetic_data(75), synthetic_data(105)]
fig, ax = plt.subplots(constrained_layout=True)
for d in data:
    # shift each x vector so that the middle sample sits at x = 0
    x_vector = np.arange(len(d)) - len(d) // 2
    ax.plot(x_vector, d)
ax.axvline(0, color="k", ls="--")
ax.set_xlabel("delta from center")
ax.set_ylabel("synthetic data!")
plt.show()
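If the alignment point is not the exact middle (as with index 51 of 101 points and index 49 of 99 points in the question), the same shift should work with an explicit index per series. The following is a minimal sketch of that idea; the helper name shifted_x is hypothetical and only the first two plots from the question are used.

```python
import numpy as np

def shifted_x(n_points, align_index):
    """Return x values for n_points samples so that align_index lands on x = 0."""
    return np.arange(n_points) - align_index

# Lengths and alignment indices taken from the question's example (plot1, plot2).
lengths = [101, 99]
align = [51, 49]
xs = [shifted_x(n, k) for n, k in zip(lengths, align)]
```

Each series can then be drawn with ax.plot(x, d), and a single ax.axvline(0) marks the shared alignment point.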

Related

Python: scatter plot with non-linear x axis

I have data with lots of x values around zero and only a few as you go up to around 950.
I want to create a plot with a non-linear x axis so that the relationship can be seen in a 'straight line' form, as in this example.
I have tried using plt.xscale('log') but it does not achieve what I want.
I have not been able to use the log scale function with a scatter plot, as it then only shows 3 values rather than the thousands that exist.
I have tried to work around it using
plt.plot(retper, aep_NW[y], marker='o', linewidth=0)
to replicate the scatter function which plots but does not show what I want.
plt.figure(1)
plt.scatter(rp,aep,label="SSI sum")
plt.show()
Image 3:
plt.figure(3)
plt.scatter(rp, aep)
plt.xscale('log')
plt.show()
Image 4:
plt.figure(4)
plt.plot(rp, aep, marker='o', linewidth=0)
plt.xscale('log')
plt.show()
ADDITION:
Hi, thank you for the response.
I think you are right that my x axis is truncated, but I'm not sure why or how...
I'm not really sure what to post code-wise, as the data is large and comes from a server, so I can't really give you the data to try it with.
Basically aep_NW is a one-dimensional array with 951 elements, values from 0 to ~140, with most values being small and only a few larger values. The data represents a storm severity index for 951 years.
Then I want the x axis to be the return period for these values, so I made an rp array of the same size, filled with values from 951 downwards, halving each time.
I then sort the aep_NW values from lowest to highest, with the highest value being associated with the largest return period (951), then the second highest aep_NW value associated with the second largest return period (475.5), etc.
So when I plot it I need the x axis scale to be similar to the example you showed above or the first image I attached originally.
rp = [0]*numseas.shape[0]
i = numseas.shape[0] - 1
rp[i] = numseas.shape[0]
i = i - 1
while i != 0:
    rp[i] = rp[i+1]/2
    i = i - 1
y = np.argsort(aep_NW)
fig, ax = plt.subplots()
ax.scatter(rp,aep_NW[y],label="SSI sum")
ax.set_xscale('log')
ax.set_xlabel("Return period")
ax.set_ylabel("SSI score")
plt.title("AEP for NW Europe: total loss per entire extended winter season")
plt.show()
It looks like in your "Image 3" the x axis is truncated, so that you don't see the data you are interested in. It appears this is due to there being 0's in your 'rp' array: the while loop stops before it reaches index 0, so rp[0] keeps its initial value of 0, which cannot be placed on a log axis. I updated the examples to show the error you are seeing, one way to exclude the zeros, and one way to clip them and show them on a different scale.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
n = 100
numseas = np.logspace(-5, 3, n)
aep_NW = np.linspace(0, 140, n)
rp = [0]*numseas.shape[0]
i = numseas.shape[0] - 1
rp[i] = numseas.shape[0]
i = i - 1
while i != 0:
    rp[i] = rp[i+1] /2
    i = i - 1
y = np.argsort(aep_NW)
fig, axes = plt.subplots(1, 3, figsize=(14, 5))
ax = axes[0]
ax.scatter(rp, aep_NW[y], label="SSI sum")
ax.set_xscale('log')
ax.set_xlabel("Return period")
ax.set_ylabel("SSI score")
ax = axes[1]
rp = np.array(rp)  # rp is already ordered to pair with the sorted aep_NW[y], so it is not re-indexed
mask = rp > 0
ax.scatter(rp[mask], aep_NW[y][mask], label="SSI sum")
ax.set_xscale('log')
ax.set_xlabel("Return period (0 values excluded)")
ax = axes[2]
log2_clipped_rp = np.log2(rp.clip(2**-100, None))
ax.scatter(log2_clipped_rp, aep_NW[y], label="SSI sum")
xticks = list(range(-110, 11, 20))
xticklabels = [f'$2^{{{i}}}$' for i in xticks]
ax.set_xticks(xticks)
ax.set_xticklabels(xticklabels)
ax.set_xlabel("log$_2$ Return period (values clipped to 2$^{-100}$)")
plt.show()
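The zero can also be avoided at the source by building rp without a loop. The sketch below follows the description in the question (951, 475.5, and so on, halving each step); the function name halving_return_periods is hypothetical, and whether halving is the right return-period definition for the data is left open here.

```python
import numpy as np

def halving_return_periods(n):
    """Return n values ending at n, each entry half of the one after it."""
    exponents = np.arange(n - 1, -1, -1)  # n-1, ..., 1, 0
    return n * 0.5 ** exponents

rp = halving_return_periods(951)
```

Because every entry is strictly positive, ax.set_xscale('log') has no zero to drop. The smallest entries are extremely close to zero, though, so the axis limits may still need to be set by hand.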

Matplotlib: incorrect histograms

So I just learned about histograms on Khan Academy:
When I go plot something similar in Matplotlib, it is plotted differently. Why?
Shouldn't bins be completely filled? And since bin 5-6 has 3 counts (5, 6, 6), shouldn't it consist of a single bar of value 3? I'm confused.
By default, plt.hist() creates 10 bins (that is, 11 edges). The default value is found in the documentation, and is taken from your rc parameter rcParams["hist.bins"] = 10.
So if you provide data in the range [1–6], hist will count the number of values in the bins: [1.–1.5), [1.5–2.), [2–2.5), [2.5–3.), [3–3.5), [3.5–4.), [4–4.5), [4.5–5.), [5.–5.5), [5.5–6.]. You can tell that that's the case by looking at the text output by hist() (in addition to the graph).
hist() returns 3 objects when called:
the height of each bar (that is the number of items in each bin), equivalent to the column "#" in that Khan video
the edges of the bins, which is roughly equivalent to the column "Bucket" in the video
a list of matplotlib objects that you can use to tweak their appearance when needed.
In summary:
If you want to have bars of width 1, then you need to specify either the number of bins (5), or the edges of your bins.
These two calls provide the same result:
plt.hist(counts, bins=5)
plt.hist(counts, bins=[1,2,3,4,5,6])
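The binning can be checked without drawing anything, since Matplotlib's hist() uses numpy.histogram to bin the data. This is a small sketch; the sample values are an assumption chosen to match the question's description (one each of 1 to 5, and two 6s).

```python
import numpy as np

data = [1, 2, 3, 4, 5, 6, 6]  # assumed sample values

# 10 equal-width bins over [1, 6]: each bin is 0.5 wide, so every other bin is empty.
counts_10, edges_10 = np.histogram(data, bins=10)

# 5 bins, or the equivalent explicit edges: each bin is 1 wide.
counts_5, edges_5 = np.histogram(data, bins=5)
counts_edges, _ = np.histogram(data, bins=[1, 2, 3, 4, 5, 6])
```

With 5 bins the last bin is the closed interval [5, 6], so it collects the values 5, 6 and 6 as a single bar of height 3.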
EDIT
Here is a function that can help you see the "buckets" chosen by hist:
import numpy as np
import matplotlib.pyplot as plt
def hist_and_bins(x, ax=None, **kwargs):
    ax = ax or plt.gca()
    counts, edges, patches = ax.hist(x, **kwargs)
    # pair up consecutive edges: one [left, right] pair per bin
    bin_edges = [[a,b] for a,b in zip(edges, edges[1:])]
    ticks = np.mean(bin_edges, axis=1)
    tick_labels = ['[{}-{})'.format(l,r) for l,r in bin_edges]
    tick_labels[-1] = tick_labels[-1][:-1]+']' # last bin is a closed interval
    ax.set_xticks(ticks)
    ax.set_xticklabels(tick_labels)
    return counts, edges, patches, ax.get_xticks()
fig, (ax1, ax2, ax3) = plt.subplots(1,3, figsize=(9,3))
ax1.hist([1,2,3,4,5,6,6])
hist_and_bins([1,2,3,4,5,6,6], ax=ax2)
hist_and_bins([1,2,3,4,5,6,6], ax=ax3, bins=5, ec='w')
fig.autofmt_xdate()

making 45 degree vectors in matplotlib

I am trying to make 45 degree vector arrows in a chart using the snippet of matplotlib python code:
soa_tau = []
def my_range(start, end, step):
    while start <= end:
        yield start
        start += step
fig, axes = plt.subplots(nrows=1, ncols=6, figsize=(11, 8.5))
plt.subplots_adjust(bottom=0.2, top=0.8, left=0.07, right=0.97, wspace=0.4)
a=0
for tau in range(0,121,24):
    if (tau==0):
        depth_count= []
    soa = []
    for h in my_range(0,600,100):
        if (tau==0):
            depth_count.append(-h)
        xvect=0.125
        yvect=0.125
        result = [0.2,-h,xvect,yvect]
        soa.append(result)
        a=a+1;
    soa_tau.append(soa)
axes[0].set_ylabel('depth (m)')
aa=0
for ax in axes:
    soa=soa_tau[aa]
    X,Y,U,V = zip(*soa)
    ax.quiver(X,Y,U,V,angles='xy',width=0.003,headwidth=0.4,color='r',scale=1,zorder=2)
    ax.tick_params(axis='both',which='major', labelsize=6)
    ax.set_xlabel('Speed (m/s)')
    aa=aa+1
plt.savefig('test.pdf')
Now I am trying to produce the following figure:
However as you can see in the figure the vectors are flat lines. I am trying to make the lines be at 45 degree angles which is what I would expect from the x and y components of the vectors being equal. My guess is having the multiple figures complicated things. However I still need to have these figures available. Is there any way that this code can be tweaked so that the vectors display at 45 degrees instead of 0 degrees as shown in the figure? I also want to be able to maintain the length of the vectors as shown in the figure.

Can't get rid of leading zeros on y axis

I am trying to plot graphs in Matplotlib and embed them into pyqt5 GUI. Everything is working fine, except for the fact that my y axis has loads of leading zeros which I cannot seem to get rid of.
I have tried googling how to format the axis, but nothing seems to work! I can't set the ticks directly because there's no way of determining what they will be, as I am going to be working with varying sized data sets.
num_bins = 50
# create an axis
ax = self.figure.add_subplot(111)
# discards the old graph
ax.clear()
##draws the bars and legend
colours = ['blue','red']
ax.hist(self.histoSets, num_bins, density=True, histtype='bar', color=colours, label=colours)
ax.legend(prop={'size': 10})
##set x ticks
min,max = self.getMinMax()
scaleMax = math.ceil((max/10000))*10000
scaleMin = math.floor((min/10000))*10000
scaleRange = scaleMax - scaleMin
ax.xaxis.set_ticks(np.arange(scaleMin, scaleMax+1, scaleRange/4))
# refresh canvas
self.draw()
All those numbers on your y-axis are tiny, i.e. on the order of 1e-5. This is because the integral of the density is defined to be 1 and your x-axis spans such a large range.
I can mostly reproduce your plot with:
import matplotlib.pyplot as plt
import numpy as np
y = np.random.normal([190000, 220000], 20000, (5000, 2))
a, b, c = plt.hist(y, 40, density=True)
giving me:
The tuple returned from hist contains useful information: the first element (a above) holds the densities, and the second element (b above) holds the bin edges that it picked. You can see this all sums to one by doing:
sum(a[0] * np.diff(b))
and getting 1 back.
As ImportanceOfBeingErnest says, you can use tight_layout() to resize the plot if it doesn't fit into the area.
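The normalization described above can be checked numerically without plotting. This is a minimal sketch using numpy.histogram with density=True; the synthetic values are an assumed stand-in for the real data, chosen only to span a similarly wide x range.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the data: values spread over a wide x range (assumed).
values = rng.normal(200_000, 20_000, size=5_000)

density, edges = np.histogram(values, bins=40, density=True)
widths = np.diff(edges)
total_area = float(np.sum(density * widths))  # bar height times bar width, summed
```

Because the bars are thousands of units wide, their heights have to be tiny for the total area to be 1. If the long tick labels are the main concern, one option that may help is ax.ticklabel_format(axis='y', style='sci', scilimits=(0, 0)), which factors out a common exponent.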

How can I plot ca. 20 million points as a scatterplot?

I am trying to create a scatterplot with matplotlib that consists of ca. 20 million data points. Even with alpha set to the lowest value that still leaves any data visible, the result is just a completely black plot.
plt.scatter(timedPlotData, plotData, alpha=0.01, marker='.')
The x-axis is a continuous timeline of about 2 months and the y-axis consists of 150k consecutive integer values.
Is there any way to plot all the points so that their distribution over time is still visible?
Thank you for your help.
There's more than one way to do this. A lot of folks have suggested a heatmap/kernel-density-estimate/2d-histogram. @Bucky suggested using a moving average. In addition, you can fill between a moving min and moving max, and plot the moving mean over the top. I often call this a "chunkplot", but that's a terrible name. The implementation below assumes that your time (x) values are monotonically increasing. If they're not, it's simple enough to sort y by x before "chunking" in the chunkplot function.
Here are a couple of different ideas. Which is best will depend on what you want to emphasize in the plot. Note that this will be rather slow to run, but that's mostly due to the scatterplot. The other plotting styles are much faster.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
np.random.seed(1977)
def main():
    x, y = generate_data()
    fig, axes = plt.subplots(nrows=3, sharex=True)
    for ax in axes.flat:
        ax.xaxis_date()
    fig.autofmt_xdate()
    axes[0].set_title('Scatterplot of all data')
    axes[0].scatter(x, y, marker='.')
    axes[1].set_title('"Chunk" plot of data')
    chunkplot(x, y, chunksize=1000, ax=axes[1],
              edgecolor='none', alpha=0.5, color='gray')
    axes[2].set_title('Hexbin plot of data')
    axes[2].hexbin(x, y)
    plt.show()
def generate_data():
    # Generate a very noisy but interesting timeseries
    x = mdates.drange(dt.datetime(2010, 1, 1), dt.datetime(2013, 9, 1),
                      dt.timedelta(minutes=10))
    num = x.size
    y = np.random.random(num) - 0.5
    y.cumsum(out=y)
    y += 0.5 * y.max() * np.random.random(num)
    return x, y
def chunkplot(x, y, chunksize, ax=None, line_kwargs=None, **kwargs):
    if ax is None:
        ax = plt.gca()
    if line_kwargs is None:
        line_kwargs = {}
    # Wrap the array into a 2D array of chunks, truncating the last chunk if
    # chunksize isn't an even divisor of the total size.
    # (This part won't use _any_ additional memory)
    numchunks = y.size // chunksize
    ychunks = y[:chunksize*numchunks].reshape((-1, chunksize))
    xchunks = x[:chunksize*numchunks].reshape((-1, chunksize))
    # Calculate the max, min, and means of chunksize-element chunks...
    max_env = ychunks.max(axis=1)
    min_env = ychunks.min(axis=1)
    ycenters = ychunks.mean(axis=1)
    xcenters = xchunks.mean(axis=1)
    # Now plot the bounds and the mean...
    fill = ax.fill_between(xcenters, min_env, max_env, **kwargs)
    line = ax.plot(xcenters, ycenters, **line_kwargs)[0]
    return fill, line
main()
For each day, tally up the frequency of each value (a collections.Counter will do this nicely), then plot a heatmap of the values, one per day. For publication, use a grayscale for the heatmap colors.
My recommendation would be to use a sorting and moving average algorithm on the raw data before you plot it. This should leave the mean and trend intact over the time period of interest while providing you with a reduction in clutter on the plot.
Group values into bands on each day and use a 3d histogram of count, value band, day.
That way you can get the number of occurrences in a given band on each day clearly.
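The per-day banding idea can be sketched with numpy.histogram2d, which counts how many points fall into each (day, value band) cell. Everything about the data here is an assumption for illustration: the synthetic arrays, the 60-day span, the 150,000 value range and the 300 bands.

```python
import numpy as np

rng = np.random.default_rng(1977)
n = 200_000  # kept small for the sketch; the question has about 20 million points
t = rng.uniform(0.0, 60.0, n)        # time in days (assumed span of about 2 months)
v = rng.integers(0, 150_000, n)      # integer values (assumed range)

# One bin per day on the time axis, 300 value bands on the other axis.
counts, t_edges, v_edges = np.histogram2d(
    t, v, bins=(60, 300), range=[[0, 60], [0, 150_000]]
)
log_counts = np.log1p(counts)  # compress the dynamic range for display
```

The result can be shown as a grayscale heatmap with, for example, ax.pcolormesh(t_edges, v_edges, log_counts.T, cmap='gray_r'). The grid size stays fixed no matter how many points go in, so the plot does not saturate the way an alpha-blended scatter does.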
