So, I am generating plots on the same figure inside a loop. Since there are many curves, I want the early ones to start faded and the later ones to get bolder and bolder as the iteration goes on. It is important that all curves keep the same color.
Here is some made-up code in case you want to try it yourself:
from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np
i = np.array([1,1.2,1.4,1.6,1.8,2.0,3.0,4.0,5.0,6.0,7.0])
x = np.arange(-10, 10, .1)
for i in range(0,len(i)):
    rv1 = norm(loc = 0., scale = 1.0*i)
    plt.plot(x,rv1.pdf(x), color ='b')
plt.show()
I want my plot to resemble something like this:
You can do that using the alpha argument to plot, e.g.:
from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np
idx = np.array([1,1.2,1.4,1.6,1.8,2.0,3.0,4.0,5.0,6.0,7.0])
xxx = np.arange(-10, 10, .1)
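for i in range(0, len(idx)):
    # use the i-th value of idx as the standard deviation
    rv1 = norm(loc=0., scale=idx[i])
    # alpha grows from 1/len(idx) to 1.0, so later curves are more opaque
    plt.plot(xxx, rv1.pdf(xxx), color='b', alpha=(i + 1) / len(idx))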
plt.show()
Note that I renamed some variables.
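If the number of curves is known up front, the alpha ramp can also be computed with np.linspace, so that the first curve is faint and the last one fully opaque while every curve keeps the same color. A minimal sketch under that assumption; plot_fading and gaussian_pdf are hypothetical helper names, and the normal pdf is written out in NumPy instead of using scipy.stats.norm:

```python
import numpy as np
import matplotlib.pyplot as plt

def gaussian_pdf(x, scale):
    # Normal pdf with mean 0 and standard deviation `scale`, written out with NumPy
    return np.exp(-0.5 * (x / scale) ** 2) / (scale * np.sqrt(2 * np.pi))

def plot_fading(ax, x, scales, color="b", alpha_min=0.15, alpha_max=1.0):
    # One alpha per curve, increasing linearly, so later curves are more opaque
    alphas = np.linspace(alpha_min, alpha_max, len(scales))
    lines = []
    for scale, alpha in zip(scales, alphas):
        # Same color for every curve; only the transparency changes
        (line,) = ax.plot(x, gaussian_pdf(x, scale), color=color, alpha=alpha)
        lines.append(line)
    return lines

scales = np.array([1, 1.2, 1.4, 1.6, 1.8, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
x = np.arange(-10, 10, 0.1)
fig, ax = plt.subplots()
lines = plot_fading(ax, x, scales)
```

Call plt.show() afterwards to display the figure. Raising alpha_min makes the earliest curves easier to see.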
You can use a variable line width and the transparency parameter alpha as follows: start with some initial values, then at each iteration increase the line width by 20% (for example) and decrease alpha by 10%.
P.S.: Following IOBE's comment, I updated my for loop for readers: don't give the loop variable and the array the same name, i. The loop now uses ii.
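from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np
i = np.array([1,1.2,1.4,1.6,1.8,2.0,3.0,4.0,5.0,6.0,7.0])  # scales from the question
x = np.arange(-10, 10, .1)
wid = 0.8
alpha = 0.9
for ii in range(0, len(i)):
    rv1 = norm(loc=0., scale=i[ii])  # ii-th scale from the array
    plt.plot(x, rv1.pdf(x), color='b', lw=wid, alpha=alpha)
    wid *= 1.2    # each curve 20% thicker than the previous one
    alpha *= 0.9  # and 10% more transparent
plt.show()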
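Because both updates are multiplicative, curve k (counting from 0) is drawn with lw = 0.8·1.2**k and alpha = 0.9·0.9**k. A small sketch that precomputes that schedule; style_schedule is a hypothetical helper, and the clipping is an added safeguard, since recent Matplotlib versions reject alpha values outside [0, 1], which matters if you grow alpha instead of shrinking it:

```python
import numpy as np

def style_schedule(n, wid0=0.8, alpha0=0.9, wid_factor=1.2, alpha_factor=0.9):
    # Curve k (0-based) gets wid0 * wid_factor**k and alpha0 * alpha_factor**k,
    # i.e. the values the loop reaches by repeated multiplication
    k = np.arange(n)
    widths = wid0 * wid_factor ** k
    # Keep alpha inside [0, 1] even if alpha_factor > 1
    alphas = np.clip(alpha0 * alpha_factor ** k, 0.0, 1.0)
    return widths, alphas

widths, alphas = style_schedule(11)
```

With 11 curves, the last line ends up about 4.95 points wide with an alpha of about 0.31.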
I am trying to rebuild an image that I previously decomposed with SVD. The image is this:
I successfully decomposed the image with this code:
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
img = Image.open('steve.jpg')
img = np.mean(img, 2)
U,s,V = np.linalg.svd(img)
s is an array of the singular values of the image. The more singular values I take, the more the reconstructed image resembles the original one.
For example, if I take 20 singular values:
n = 20
S = np.zeros(np.shape(img))
for i in range(0, n):
    S[i, i] = s[i]
recon_img = U @ S @ V
plt.imshow(recon_img)
plt.axis('off')
plt.show()
I would like to find the minimum number of singular values needed for a good result, i.e. an image pretty similar to the original one. I would also like to see how much the result changes when I take more singular values. I tried an animation, without success:
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
img = Image.open('steve.jpg')
img = np.mean(img, 2)
U,s,V = np.linalg.svd(img)
fig = plt.figure()
def update(i):
    S = np.zeros(np.shape(img))
    n = 20
    for i in range(0, n):
        S[i, i] = s[i]
    recon_img = U @ S @ V
    plt.imshow(recon_img)
    plt.axis('off')
ani = FuncAnimation(fig = fig, func = update, frames = 20, interval = 10)
plt.show()
If you plot the singular values s, you can see a very steeply decreasing curve; it is easier to read with a logarithmic y axis:
plt.semilogy(s, 'k-')
As you can see, the first 50 singular values are the most important ones: almost all of them are greater than 1000. Values from about the 50th to the 250th are an order of magnitude lower and decrease slowly: the curve is fairly flat there (remember that the y scale is logarithmic). That being said, I would take the first 50 elements to rebuild your image.
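As a more quantitative complement to reading the log plot (a swapped-in heuristic, not what this answer does), one can pick the smallest n whose singular values capture a chosen share of sum(s**2), the squared Frobenius norm of the image. The 0.99 threshold, the rank_for_energy helper and the random stand-in image are assumptions for illustration:

```python
import numpy as np

def rank_for_energy(s, fraction=0.99):
    # Cumulative share of sum(s**2) captured by the first 1, 2, ... singular values
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    # First position where the share reaches `fraction`, turned into a count;
    # min() guards against rounding leaving energy[-1] slightly below 1.0
    return min(int(np.searchsorted(energy, fraction)) + 1, len(s))

rng = np.random.default_rng(0)
img = rng.random((60, 80))  # stand-in for the grayscale image
U, s, V = np.linalg.svd(img)
n = rank_for_energy(s, 0.99)
# Rank-n reconstruction from the first n singular triplets only
recon = (U[:, :n] * s[:n]) @ V[:n, :]
rel_err = np.linalg.norm(img - recon) / np.linalg.norm(img)
```

Because the Frobenius error of the rank-n approximation equals sqrt(sum(s[n:]**2)) (Eckart-Young), a 0.99 energy threshold bounds the relative error at 10%.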
Regarding the animation:
While the animation updates frame by frame, the counter i is increased by 1. In your code, you mistakenly reuse i to slice s and fill S; you should rename the counter.
Moreover, as the animation goes on, you need to take an increasing number of singular values. That number is set by n, which you keep constant from frame to frame. You need to update n at each frame, so you can use it as the counter.
Furthermore, you need to erase the previously plotted image, so add a plt.gca().cla() at the beginning of the update function.
Check the code below for reference:
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
img = Image.open('steve.jpg')
img = np.mean(img, 2)
U,s,V = np.linalg.svd(img)
fig, ax = plt.subplots(1, 2, figsize = (4, 4))
ax[0].imshow(img)
ax[0].axis('off')
ax[0].set_title('Original')
def init():
    ax[1].cla()
    ax[1].imshow(np.zeros(np.shape(img)))
    ax[1].axis('off')
    ax[1].set_title('Reconstructed\nn = 00')
def update(n):
    ax[1].cla()
    S = np.zeros(np.shape(img))
    for i in range(0, n):
        S[i, i] = s[i]
    recon_img = U @ S @ V
    ax[1].imshow(recon_img)
    ax[1].axis('off')
    ax[1].set_title(f'Reconstructed\nn = {n:02}')
ani = FuncAnimation(fig = fig, func = update, frames = 50, init_func = init, interval = 10)
ani.save('ani.gif', writer = 'imagemagick')
plt.show()
which gives this animation:
As you can see, the first 50 elements are enough to rebuild your image pretty well. The remaining elements add some noise and slightly change the background.
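Clearing the axes with cla() on every frame works; an alternative (not from the answer above) is to create the image once and only swap its pixel data with AxesImage.set_data inside update. The sketch below assumes a random stand-in array instead of steve.jpg and builds each frame with the rank-n product (U[:, :n] * s[:n]) @ V[:n, :], which avoids filling a full S matrix:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

rng = np.random.default_rng(0)
img = rng.random((60, 80))  # stand-in for the grayscale image
U, s, V = np.linalg.svd(img)

fig, ax = plt.subplots()
ax.axis('off')
# Create the image artist once; fixed color limits keep frames comparable,
# because set_data does not rescale the color normalization
im = ax.imshow(np.zeros_like(img), vmin=img.min(), vmax=img.max())
title = ax.set_title('n = 00')

def update(n):
    # Frame n shows the rank-n reconstruction (n = 0 gives all zeros)
    recon = (U[:, :n] * s[:n]) @ V[:n, :]
    im.set_data(recon)
    title.set_text(f'n = {n:02}')
    return im, title

ani = FuncAnimation(fig, update, frames=50, interval=100)
```

Display it with plt.show(), or save it with ani.save('ani.gif', writer='pillow') if ImageMagick is not installed.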
My data consists mostly of numbers below 60, plus a few outliers in the 2000s.
I want to display it in a histogram with the following bin ranges:
0-1, 1-2, 2-3, 3-4, ..., 59-60, 60-max
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.axes as axes
b = list(range(61)) + [2000] # will make [0, 1, ..., 60, 2000]
plt.hist(b, bins=b, edgecolor='black')
plt.xticks(b)
plt.show()
This shows the following:
Essentially what you see is all the numbers 0 .. 60 squished together on the left, and the 2000 on the right. This is not what I want.
So I remove the [2000] and get something like what I am looking for:
As you can see now it is better, but I still have the following problems:
How do I fix this so that the graph has no white space around it (there is a big gap before 0 and after 60)?
How do I fix this so that, after 60, a 2000 tick shows at the very end, while keeping roughly the same spacing (unlike the first plot)?
Here is one hacky solution using some random data. I still don't fully understand your second question, but I tried something based on your wording:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.axes as axes
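fig, ax = plt.subplots(figsize=(12, 6))
data = np.random.normal(10, 5, 5000)
upper = 31
outlier = 2000
# Stand-in outliers: 100 values placed at the upper edge so they land in the last bin
data = np.append(data, 100*[upper])
b = list(range(upper)) + [upper]
plt.hist(data, bins=b, edgecolor='black')
plt.xticks(b)
# Relabel the last tick (drawn at `upper`) with the outlier value
b[-1] = outlier
ax.set_xticklabels(b)
# Limit the x axis to the bins so there is no empty margin
plt.xlim(0, upper)
plt.show()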
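A slightly less hacky variant of the same idea, sketched under assumptions (hypothetical uniform data, a cutoff of 60, ticks every 10 units): clip the outliers into an extra unit-width bin with np.clip, fix the x limits to the bin edges so there is no white space, and label that last bin with the real maximum:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical data: mostly below 60, plus a few outliers in the 2000s
data = np.concatenate([rng.uniform(0, 60, 1000), [2010.0, 2050.0, 2100.0]])

upper = 60
edges = np.arange(upper + 2)  # 0, 1, ..., 61; the last bin [60, 61] stands for "60-max"
# Clip outliers down to `upper` so they all fall into that last bin
clipped = np.clip(data, 0, upper)
counts, _ = np.histogram(clipped, bins=edges)

fig, ax = plt.subplots(figsize=(12, 6))
ax.hist(clipped, bins=edges, edgecolor='black')
ax.set_xlim(edges[0], edges[-1])  # no empty margin before 0 or after the last bin
# Regular ticks below the cutoff, plus one tick at the centre of the "60-max" bin
tick_pos = list(range(0, upper, 10)) + [upper + 0.5]
tick_labels = [str(t) for t in range(0, upper, 10)] + [f'{upper}-{data.max():.0f}']
ax.set_xticks(tick_pos)
ax.set_xticklabels(tick_labels)
```

Every bin, including the outlier bin, keeps the same width, so the spacing stays uniform.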
I want to plot a histogram with Matplotlib, but I'd like the bins' values to represent the percentage of the total observations. A MWE would be like this:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import seaborn as sns
import numpy
sns.set(style='dark')
imagen2 = plt.figure(1, figsize=(5, 2))
imagen2.suptitle('StackOverflow Matplotlib histogram demo')
luminance = numpy.random.randn(1000, 1000)
# "Luminance" should range from 0.0...1.0 so we normalize it
luminance = (luminance - luminance.min())/(luminance.max() - luminance.min())
top_left = plt.subplot(121)
top_left.imshow(luminance)
bottom_left = plt.subplot(122)
sns.distplot(luminance.flatten(), kde_kws={"cumulative": True})
# plt.savefig("stackoverflow.pdf", dpi=300)
plt.tight_layout(rect=(0, 0, 1, 0.95))
plt.show()
The CDF here is OK (range: [0, 1]), but the resulting histogram doesn't match my expectations:
Why are the histogram's results in the range [0, 4]? Is there any way to fix this?
What you think you want
Here's how to plot the histogram such that the bins sum to 1:
import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import seaborn as sns
import numpy as np
sns.set(style='dark')
imagen2 = plt.figure(1, figsize=(5, 2))
imagen2.suptitle('StackOverflow Matplotlib histogram demo')
luminance = np.random.randn(1000, 1000)
# "Luminance" should range from 0.0...1.0 so we normalize it
luminance = (luminance - luminance.min())/(luminance.max() - luminance.min())
# get the histogram values
heights,edges = np.histogram(luminance.flat, bins=30)
binCenters = (edges[:-1] + edges[1:])/2
# norm the heights
heights = heights/heights.sum()
# get the cdf
cdf = heights.cumsum()
left = plt.subplot(121)
left.imshow(luminance)
right = plt.subplot(122)
right.plot(binCenters, cdf, binCenters, heights)
# plt.savefig("stackoverflow.pdf", dpi=300)
plt.tight_layout(rect=(0, 0, 1, 0.95))
plt.show()
# confirm that the hist vals sum to 1
print('heights sum: %.2f' % heights.sum())
output:
heights sum: 1.00
The actual answer
This one is actually super easy. Just do
sns.distplot(luminance.flatten(), kde_kws={"cumulative": True}, norm_hist=True)
Here's what I get when I run your script with the above modification:
Surprise twist!
So it turns out that your histogram was normalized all along, as per the formal identity:
In plain(er) English, the general practice is to normalize histograms of continuous values (i.e. observations expressed as floating-point numbers) as densities. So in this case the sum of the bin widths times the bin heights will be 1.0, as you can see by running this simplified version of your script:
import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import numpy as np
imagen2 = plt.figure(1, figsize=(4,3))
imagen2.suptitle('StackOverflow Matplotlib histogram demo')
luminance = np.random.randn(1000, 1000)
luminance = (luminance - luminance.min())/(luminance.max() - luminance.min())
heights,edges,patches = plt.hist(luminance.ravel(), density=True, bins=30)
widths = edges[1:] - edges[:-1]
totalWeight = (heights*widths).sum()
# plt.savefig("stackoverflow.pdf", dpi=300)
plt.tight_layout(rect=(0, 0, 1, 0.95))
plt.show()
print(totalWeight)
And the totalWeight will indeed be exactly equal to 1.0, give or take a smidge of rounding error.
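The same check explains the [0, 4] range in the question: the luminance values are squeezed into [0, 1], so their standard deviation is only roughly a tenth of that interval, and a density that integrates to 1 must then peak at about 1/(0.1·sqrt(2π)) ≈ 4. A minimal NumPy sketch, assuming a fresh random sample and 30 bins, that also shows the per-bin fractions (which do sum to 1):

```python
import numpy as np

rng = np.random.default_rng(0)
# Same preprocessing as the question: standard normal samples rescaled to [0, 1]
luminance = rng.standard_normal(1_000_000)
luminance = (luminance - luminance.min()) / (luminance.max() - luminance.min())

density, edges = np.histogram(luminance, bins=30, density=True)
widths = np.diff(edges)
area = np.sum(density * widths)  # integral of the density histogram
fractions = density * widths     # share of observations in each bin
```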
tel's answer is great! I just want to provide an alternative that gives you the histogram you want with fewer lines. The key idea is to use the weights argument of matplotlib's hist function to normalize the counts. You can replace your sns.distplot(luminance.flatten(), kde_kws={"cumulative": True}) with the following three lines of code:
lf = luminance.flatten()
sns.kdeplot(lf, cumulative=True)
sns.distplot(lf, kde=False,
             hist_kws={'weights': numpy.full(len(lf), 1/len(lf))})
If you want to see the histogram on a second y-axis (for a clearer visual), add ax=bottom_left.twinx() to the sns.distplot call.
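If seaborn is not needed, the same weights idea works with plain Matplotlib, and matplotlib.ticker.PercentFormatter can then print the fractions as percentages, which is what the question originally asked for. A sketch assuming a random stand-in for luminance.flatten():

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as tck

rng = np.random.default_rng(0)
lf = rng.random(10_000)  # stand-in for luminance.flatten()

fig, ax = plt.subplots()
# Each observation contributes 1/N, so the bar heights are fractions summing to 1
heights, edges, patches = ax.hist(lf, bins=30, weights=np.full(len(lf), 1 / len(lf)))
# Show the fractions on the y axis as percentages (xmax=1 means 1.0 -> 100%)
ax.yaxis.set_major_formatter(tck.PercentFormatter(xmax=1))
```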
I am trying to plot some data, using a for loop to plot distributions. Now I want to label those distributions with the loop counter as a subscript in math notation. This is where I am at the moment:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.mlab as mlab
mean = [10,12,16,22,25]
variance = [3,6,8,10,12]
x = np.linspace(0,40,1000)
for i in range(4):
    sigma = np.sqrt(variance[i])
    y = mlab.normpdf(x,mean[i],sigma)
    plt.plot(x,y,label=$v_i$) # where i is the variable i want to use to label. I should also be able to use elements from an array, say array[i] for the same.
    plt.xlabel("X")
    plt.ylabel("P(X)")
    plt.legend()
    plt.axvline(x=15, ymin=0, ymax=1,ls='--',c='black')
plt.show()
This doesn't work, and I can't keep the variable between the $ $ signs of the math notation, as it is interpreted as text. Is there a way to put the variable in the $ $ notation?
The original question has been edited; this answer has been updated to reflect this.
When working with LaTeX formatting in matplotlib, you should use raw strings, denoted by r"", so that backslashes (as in \alpha) are not treated as escape sequences.
The code below iterates over range(4) and plots using the i-th mean and variance (as you originally did). It also sets the label for each plot using label=r'$v_{}$'.format(i+1). The string formatting simply replaces the {} with the argument passed to format, in this case i+1. In this way you can automate the labels for your plots.
I have moved plt.axvline(...), plt.xlabel(...) and plt.ylabel(...) out of the for loop, as you only need to call them once. I have also moved plt.legend() out of the for loop for the same reason and removed its arguments: if you supply the keyword argument label to plt.plot(), you can label your plots individually as you plot them.
import matplotlib.pyplot as plt
import numpy as np
# matplotlib.mlab.normpdf was deprecated in Matplotlib 3.0 and has since been removed; scipy provides the same pdf
from scipy.stats import norm
mean = [10,12,16,22,25]
variance = [3,6,8,10,12]
x = np.linspace(0,40,1000)
for i in range(4):
    sigma = np.sqrt(variance[i])
    y = norm.pdf(x, mean[i], sigma)
    plt.plot(x,y, label=r'$v_{}$'.format(i+1))
plt.xlabel("X")
plt.ylabel("P(X)")
plt.axvline(x=15, ymin=0, ymax=1,ls='--',c='black')
plt.legend()
plt.show()
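One caveat with r'$v_{}$'.format(i+1): str.format consumes the braces, so for i+1 = 10 the label becomes $v_10$ and mathtext subscripts only the 1. A sketch that keeps the braces by doubling them in an f-string; subscript_label is a hypothetical helper name:

```python
def subscript_label(symbol, sub):
    # In an f-string, "{{" and "}}" are literal braces, so the subscript ends up
    # inside {...} and multi-character subscripts such as 10 stay together
    return rf'${symbol}_{{{sub}}}$'

labels = [subscript_label('v', i + 1) for i in range(12)]
# Array elements work the same way, e.g. subscript_label('v', names[i])
# for a hypothetical list `names`
```

In the loop above this would be plt.plot(x, y, label=subscript_label('v', i + 1)).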
So it turns out that you edited your question based on my answer. However, you're still not quite there. If you want to do it the way I think you want to, it should look like this:
import matplotlib.pyplot as plt
import numpy as np
# matplotlib.mlab.normpdf was deprecated in Matplotlib 3.0 and has since been removed; scipy provides the same pdf
from scipy.stats import norm
mean = [10, 12, 16, 22, 25]
variance = [3, 6, 8, 10, 12]
x = np.linspace(0, 40, 1000)
for i in range(4):
    sigma = np.sqrt(variance[i])
    y = norm.pdf(x, mean[i], sigma)
    plt.plot(x, y, label = "$v_{" + str(i) + "}$")
plt.xlabel("X")
plt.ylabel("P(X)")
plt.legend()
plt.axvline(x = 15, ymin = 0, ymax = 1, ls = '--', c = 'black')
plt.show()
This code generates the following figure:
If you want the first plot to start with v_1 instead of v_0, all you need to do is change str(i) to str(i+1). This way the subscripts are 1, 2, 3 and 4 instead of 0, 1, 2 and 3.
Hope this helps!
If I generate a colorbar for an imshow plot, I sometimes end up with only one tick mark, which makes the scale fairly ambiguous. Is there a way to ensure that at least two tick marks are present, for example by making sure that both ends of the scale are labeled?
For example:
Code to reproduce:
import numpy as np
import matplotlib as mpl
from matplotlib import pyplot as plt
SIZE = [100,100]
MIN = 0.2
tt = np.square(np.random.uniform(size=SIZE))
for ii in range(SIZE[0]):
    for jj in range(SIZE[1]):
        while( tt[ii,jj] < MIN ): tt[ii,jj] = np.random.uniform()
ran = [ np.min(tt), np.max(tt) ]
print(ran)
use_norm = mpl.colors.LogNorm()
use_norm.vmin = ran[0]
use_norm.vmax = ran[1]
plt.imshow(tt, norm=use_norm)
plt.colorbar()
plt.show()
which produces something like: