I want to visualize the Birthday Problem for different values of n. My aim is to plot multiple graphs in the same figure, but it does not work: only the last graph is plotted and the others are ignored. I am using Jupyter Notebook.
This is my code:
from decimal import Decimal

import numpy
import pylab


def calc_p_distinct(n):
    # p_distinct[i] is the probability that i people all have distinct birthdays
    p_distinct = numpy.arange(0, n.size, dtype=Decimal)
    for i in n:
        p_distinct[i] = Decimal(1.0)
    for i in n:
        for person in range(i):
            p_distinct[i] = Decimal(p_distinct[i]) * Decimal(((Decimal(365-person))/Decimal(365)))
    return p_distinct


# n is the number of people
n = numpy.arange(0, 20)
n2 = numpy.arange(0, 50)
n3 = numpy.arange(0, 100)

# plot the probability distribution
p_distinct = calc_p_distinct(n)
pylab.plot(n, p_distinct, 'r')
p_distinct2 = calc_p_distinct(n2)
pylab.plot(n2, p_distinct2, 'g')
p_distinct3 = calc_p_distinct(n3)
pylab.plot(n3, p_distinct3, 'b')

# set the labels of the axis and title
pylab.xlabel("n", fontsize=18)
pylab.ylabel("probability", fontsize=18)
pylab.title("birthday problem", fontsize=20)

# show grid
pylab.grid(True)

# show the plot
pylab.show()
When I replace one of the calc_p_distinct() function calls with another built-in function (e.g. numpy.sin(n)), it will show me two graphs. So, I conclude that it must have something to do with my function. What am I doing wrong here?
This isn't a problem with matplotlib; all the lines are there, just on top of each other (which makes perfect sense; for 100 people, the probability for only the first 20 is the same as for a group of just 20 people).
If I quickly plot them with a different line width:
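A minimal sketch of such a plot, assuming plain floats instead of Decimal. The helper names `p_distinct` and `plot_overlapping` are made up for this illustration. The widest line is drawn first so the narrower ones stay visible on top of it:

```python
def p_distinct(max_n, days=365):
    """Return [P(all birthdays distinct) for 0, 1, ..., max_n - 1 people]."""
    probs = []
    p = 1.0
    for people in range(max_n):
        probs.append(p)
        # One more person joins: they must miss the `people` birthdays already taken.
        p *= (days - people) / days
    return probs


def plot_overlapping(ax, sizes=(100, 50, 20), widths=(6, 4, 2), colors=("b", "g", "r")):
    """Draw one curve per group size on a matplotlib Axes, widest line first."""
    for size, width, color in zip(sizes, widths, colors):
        ax.plot(range(size), p_distinct(size), color, lw=width, label="n < %d" % size)
    ax.legend()


# The curve for 100 people starts with exactly the curve for 20 people,
# which is why the three lines coincide wherever they overlap.
print(p_distinct(100)[:20] == p_distinct(20))
```

To draw it, create an Axes with `fig, ax = matplotlib.pyplot.subplots()`, call `plot_overlapping(ax)`, and show the figure as usual.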
I'm trying to use matplotlib to plot the coordinates of most cities in the USA on a graph, but it isn't working. The code plots all the points on a single diagonal line, and both axes are completely out of order (you can see it clearly on the y-axis). Below are an image of the result and the matplotlib code I'm using:
Image of multiple points on a single diagonal line with no order to either axis
Code:
def demand(places, distfact):
    demandList = []
    fig = plt.figure()
    print(len(places))
    for origin in places:
        for destination in places:
            if origin != destination:
                dist = haversine(float(origin.lat), float(origin.lon), float(destination.lat),
                                 float(destination.lon))
                result = (int(origin.population) * int(destination.population)) / (dist * distfact)
                line1 = [origin.name, destination.name, result,
                         [origin.lat, origin.lon], [destination.lat, destination.lon]]
                line2 = [destination.name, origin.name, result,
                         [destination.lat, destination.lon], [origin.lat, origin.lon]]
                if line2 not in demandList and result >= 75000000 and dist >= 30.0:
                    demandList.append(line1)
    demandList.sort(key=lambda row: (row[2]), reverse=True)
    for i in range(0, 20):
        print(demandList[i][0], "->", demandList[i][1], ":", "{:,}".format(demandList[i][2]))
    print("\n")
    print(len(demandList) - 30, "routes")
    print("\n")
    for i in range(0,10):
        ind = len(demandList) - i - 1
        print(demandList[ind][0], "->", demandList[ind][1], ":", "{:,}".format(demandList[ind][2]))
    for i in range(len(demandList)):
        xpoints = np.array([demandList[i][3][0], demandList[i][4][0]])
        ypoints = np.array([demandList[i][3][1], demandList[i][4][1]])
        plt.plot(xpoints, ypoints, "o")
"places" is a list of objects. Each object contains a townID, name, population, latitude and longitude. distfact is simply a number; in this example it's set to 5.
Just a guess, as I can't see what's in your variables: try casting the values in your xpoints and ypoints variables to whatever makes sense in your case, be it float or int. It seems to me that pyplot is treating them as strings.
Edit: Not a guess anymore. You're missing the casts to float when populating line1 with latitudes and longitudes. You do cast in your haversine computation, but not a few lines below.
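To illustrate the fix: recent matplotlib versions treat string values as categories and place them in order of appearance, which would explain the scrambled axes. Below is a small sketch that converts the coordinates once, when the route row is built. The `Place` class and its values are hypothetical stand-ins for the asker's objects, and `coords` and `route_row` are illustrative helper names:

```python
from dataclasses import dataclass


@dataclass
class Place:
    # Hypothetical stand-in for the asker's objects; fields arrive as strings.
    name: str
    lat: str
    lon: str


def coords(place):
    """Return [lat, lon] as floats so they plot on a numeric axis."""
    return [float(place.lat), float(place.lon)]


def route_row(origin, destination, result):
    """Build one demand row with numeric coordinates."""
    return [origin.name, destination.name, result, coords(origin), coords(destination)]


a = Place("A", "40.71", "-74.01")
b = Place("B", "34.05", "-118.24")
row = route_row(a, b, 1.0)
print(row[3], row[4])
```

Converting at the point where the row is created means the plotting loop can use the stored values directly, with no further casts.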
I have modeled Brownian motion in both the x and y directions as random walks. I have plotted the data on a 2-d plot but, while it is not so difficult to trace the simulated particle's path from the origin, I want to be able to see the time-evolution of the particle's path visually represented on the plot, whether it be by changing the color of the line over time, or by adding a third dimension to the plot to represent time, or by using some sort of dynamic graph type.
I haven't tried implementing anything, but I have tried to look at what options are available to me. I want to avoid using a 3d plot if possible. That said, I am open to using something other than matplotlib if it makes sense for this situation (like pyqtgraph).
Here is my code:
import random
import numpy as np
import matplotlib.pyplot as plt

#n is how many trajectory evaluations
n = 1000
t= np.linspace(0,10000,num=n)

def brownianMotion(time):
    B = [0]
    for t in range(len(time)-1):
        nrand = random.gauss(0,(time[t+1] - time[t])**.5)
        B.append(B[t]+nrand)
    return B

xpath = brownianMotion(t)
ypath = brownianMotion(t)

def plot(x,y):
    plt.figure()
    xplot = np.insert(x,0,0)
    yplot = np.insert(y,0,0)
    plt.plot(xplot,yplot,'go-',lw=1,ms=.1)
    #np.arange(0,n+1),'go-', lw=1, ms = .1)
    plt.xlim([-150,150])
    plt.ylim([-150,150])
    plt.title('Brownian Motion')
    plt.xlabel('xDisplacement')
    plt.ylabel('yDisplacement')
    plt.show()

plot(xpath,ypath)
All in all, this is just for fun and something I did while bored at work. All suggestions are welcome! Thank you for your time!
Please let me know if I should post a picture of my code's output.
Edit: Additionally, if I wanted to represent multiple particles in the same graph, how could I do that so that the individual paths are distinguishable? I have modified my code for this purpose, as shown below, but it currently outputs a messy green mixture of particles.
import random
import numpy as np
import matplotlib.pyplot as plt

nparticles = 20
#n is how many trajectory evaluations
n = 100
t= np.linspace(0,1000,num=n)

def brownianMotion(time):
    B = [0]
    for t in range(len(time)-1):
        nrand = random.gauss(0,(time[t+1] - time[t])**.5)
        B.append(B[t]+nrand)
    return B

xs = []
ys = []
for i in range(nparticles):
    xs.append(brownianMotion(t))
    ys.append(brownianMotion(t))
#xpath = brownianMotion(t)
#ypath = brownianMotion(t)

def plot(x,y):
    plt.figure()
    for xpath, ypath in zip(x,y):
        xplot = np.insert(xpath,0,0)
        yplot = np.insert(ypath,0,0)
        plt.plot(xplot,yplot,'go-',lw=1,ms=.1)
        #np.arange(0,n+1),'go-', lw=1, ms = .1)
    plt.xlim([np.amin(x),np.amax(x)])
    plt.ylim([np.amin(y),np.amax(y)])
    plt.title('Brownian Motion')
    plt.xlabel('xDisplacement')
    plt.ylabel('yDisplacement')
    plt.show()

plot(xs,ys)
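One possible way to show time on a 2-D plot, offered as a sketch rather than the only option, is to draw the path faintly and overlay points whose color encodes the time stamp. The function names below are made up for this illustration, and the plotting function expects an existing matplotlib Axes:

```python
import random


def brownian_path(times, rng=random):
    """Cumulative sum of Gaussian steps whose variance equals each time step."""
    path = [0.0]
    for k in range(len(times) - 1):
        step = rng.gauss(0.0, (times[k + 1] - times[k]) ** 0.5)
        path.append(path[-1] + step)
    return path


def plot_colored_by_time(ax, x, y, times, cmap="viridis"):
    """Draw the path in light grey, then overlay points colored by time."""
    ax.plot(x, y, color="0.8", lw=0.5, zorder=1)
    points = ax.scatter(x, y, c=times, cmap=cmap, s=4, zorder=2)
    return points  # pass this to fig.colorbar(points, ax=ax, label="time")


times = [i * 10.0 for i in range(1000)]
xpath = brownian_path(times)
ypath = brownian_path(times)
```

For several particles, the green mixture comes from the fixed 'g' in the format string 'go-'. Using 'o-' instead lets matplotlib cycle through its default colors, one per `plot` call.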
I was trying to simulate the "Sampling Distribution of Sample Proportions" in Python, using a Bernoulli variable as in the example here.
The crux is that, out of a large number of gumballs, the true proportion of yellow ones is 0.6. If we repeatedly take samples (of some size, say 10), compute the mean of each, and plot those means, we should get an approximately normal distribution.
I tried to do this in Python, but I always get a uniform distribution (or one that flattens out in the middle). I can't figure out what I am missing.
Program:
from SDSP import create_bernoulli_population, get_frequency_df
from random import shuffle, choices
from bi_to_nor_demo import get_metrics, bare_minimal_plot
import matplotlib.pyplot as plt
N = 10000 # 10000 balls
p = 0.6 # probability of yellow ball is 0.6, and others (1-0.6)=>0.4
n_pickups = 1000 # sample size
n_experiments = 100 # I dont know what this is called
# generate population
population = create_bernoulli_population(N,p)
theor_df = get_frequency_df(population)
theor_df
# choose sample, take mean and add to X_mean_list. Do this for n_experiments times
X_hat = []
X_mean_list = []
for each_experiment in range(n_experiments):
    X_hat = choices(population, k=n_pickups) # this method is with replacement
    shuffle(population)
    X_mean = sum(X_hat)/len(X_hat)
    X_mean_list.append(X_mean)
# plot X_mean_list as bar graph
stats_df = get_frequency_df(X_mean_list)
fig, ax = plt.subplots(1,1, figsize=(5,5))
X = stats_df['x'].tolist()
P = stats_df['p(x)'].tolist()
ax.bar(X, P, color="C0")
plt.show()
Dependent functions:
bi_to_nor_demo
SDSP
Output:
Update:
I even tried a uniform distribution, as below, but I get similar output; it does not converge to normal :(. (I used the function below in place of create_bernoulli_population.)
def create_uniform_population(N, Y=[]):
    """
    Given the total size of population N,
    this function generates list of those outcomes uniformly distributed
    population list
    N - Population size, eg N=10000
    Y - list of possible outcomes (must be non-empty)
    Returns the outcomes spread out in population as a list
    """
    uniform_p = 1/len(Y)
    print(uniform_p)
    total_pops = []
    for i in range(0,len(Y)):
        each_o = [i]*(int(uniform_p*N))
        total_pops += each_o
    shuffle(total_pops)
    return total_pops
Can you please share your matplotlib settings? I think your plot is being truncated. You are correct that the sampling distribution of the sample proportion of a Bernoulli variable should be approximately normal around the population's expected value.
Perhaps try something like:
plt.tight_layout()
to rule out any graph issues.
import numpy as np
import matplotlib.pyplot as plt

f = plt.figure()  # figure that holds the six subplots

def plotHist(nr, N, n_):
    ''' plots the RVs'''
    x = np.zeros((N))
    sp = f.add_subplot(3, 2, n_ )
    for i in range(N):
        for j in range(nr):
            x[i] += np.random.binomial(10, 0.6)/10
        x[i] *= 1/nr
    # density= replaces the normed= argument that newer matplotlib no longer accepts
    plt.hist(x, 100, density=True, color='#348ABD', label=" %d RVs"%(nr));
    plt.setp(sp.get_yticklabels(), visible=False)

N = 1000000 # number of samples taken
nr = ([1, 2, 4, 8, 16, 32])
for i in range(np.size(nr)):
    plotHist(nr[i], N, i+1)
Above is a code sample based on a general blog post I wrote on the CLT: https://rajeshrinet.github.io/blog/2014/central-limit-theorem/
Essentially, I generate several random numbers (nr) from a distribution in the range (0,1), sum them, and divide by nr. Then I look at how the distribution of the result converges as I increase the number of random numbers.
Here is a screenshot of the code and the result.
Solution:
I think I have arrived at the solution. By reverse engineering Rajesh's approach, and taking the hint from Daniel that the graph itself could be the issue, I finally found the culprit: the default bar width of 0.8 is far too wide for my data, so the bars overlap and the graph looks flattened on top. Below are the modified code and output.
from SDSP import create_bernoulli_population, get_frequency_df
from random import shuffle, choices
from bi_to_nor_demo import get_metrics, bare_minimal_plot
import matplotlib.pyplot as plt
N = 10000 # 10000 balls
p = 0.6 # probability of yellow ball is 0.6, and others (1-0.6)=>0.4
n_pickups = 10 # sample size
n_experiments = 2000 # I dont know what this is called
# THEORETICAL PDF
# generate population and calculate theoretical bernoulli pdf
population = create_bernoulli_population(N,p)
theor_df = get_frequency_df(population)
# STATISTICAL PDF
# choose sample, take mean and add to X_mean_list. Do this for n_experiments times.
X_hat = []
X_mean_list = []
for each_experiment in range(n_experiments):
    X_hat = choices(population, k=n_pickups) # choose, say 10 samples from population (with replacement)
    X_mean = sum(X_hat)/len(X_hat)
    X_mean_list.append(X_mean)
stats_df = get_frequency_df(X_mean_list)
# plot both theoretical and statistical outcomes
fig, (ax1,ax2) = plt.subplots(2,1, figsize=(5,10))
from SDSP import plot_pdf
mu,var,sigma = get_metrics(theor_df)
plot_pdf(theor_df, ax1, mu, sigma, p, title='True Population Parameters')
mu,var,sigma = get_metrics(stats_df)
plot_pdf(stats_df, ax2, mu, sigma, p=mu, bar_width=round(0.5/n_pickups,3),title='Sampling Distribution of\n a Sample Proportion')
plt.tight_layout()
plt.show()
Output:
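The width issue can be checked without the helper modules. Below is a rough sketch using only the standard library. The function names are illustrative and not taken from SDSP, and it draws Bernoulli outcomes directly instead of sampling a stored population. With samples of size 10 the possible sample proportions are 0.1 apart, so bars 0.8 wide overlap several neighbours on each side and merge into a plateau, while a width below that spacing keeps them separate.

```python
import random
from collections import Counter


def sample_proportions(p, n_pickups, n_experiments, rng):
    """Proportion of successes in each of n_experiments samples of size n_pickups."""
    return [
        sum(rng.random() < p for _ in range(n_pickups)) / n_pickups
        for _ in range(n_experiments)
    ]


def relative_frequencies(values):
    """Map each distinct value to its share of the observations, sorted by value."""
    counts = Counter(values)
    total = len(values)
    return {x: counts[x] / total for x in sorted(counts)}


rng = random.Random(0)
freq = relative_frequencies(sample_proportions(0.6, 10, 2000, rng))
spacing = 1 / 10            # gap between neighbouring proportions for n_pickups = 10
bar_width = 0.5 * spacing   # half the gap, so adjacent bars cannot touch
# With matplotlib: ax.bar(list(freq), list(freq.values()), width=bar_width)
```

Here `bar_width` comes out as 0.05, the same value the solution obtains from `round(0.5/n_pickups, 3)`.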
How can I use annotate() (or any other command for that matter) to add a second "ylabel" to the right of a figure which makes the text "scale" the same way as the other texts (axis x,y-label and title)? With scaling I mean that I don't want to hack text offsets manually or have a solution which fails as soon as I rescale the figure/add more plots/add a colorbar or similar. I don't want to use twinx, because I'm not plotting any additional data, and I don't need another axis.
Here's an image of what I want to achieve:
Here is my code to produce this image, I want to change the ax.annotate part:
import numpy as np
import matplotlib.pyplot as plt

numPlotsY = 3
numPlotsX = 3
f, ax_grid = plt.subplots(numPlotsY,numPlotsX,sharex=True,sharey=True)
A = np.arange(numPlotsY)+1.0 # Amplitude
phi = np.arange(numPlotsX) # Phase shift
x = np.linspace(0,2.0,100) # x
for y_i in range(0,numPlotsY):
    for x_i in range(0,numPlotsX):
        ax = ax_grid[y_i,x_i]
        y = A[y_i]*np.sin(x*np.pi + phi[x_i])
        ax.plot(x,y,lw=2.0)
        # Add ylabel to the left column
        if x_i == 0:
            ax.set_ylabel(r'$y$')
            ax.set_yticks([-4,-2,0,2,4])
        # Add xlabel below bottom row
        if y_i == numPlotsY-1:
            ax.set_xlabel(r'$x/\pi$')
            ax.set_xticks([0.5,1.0,1.5])
        # Add Phi label above top row
        if y_i == 0:
            ax.set_title(r'$\phi=%s$' % phi[x_i])
        # Add amplitude label to the right... how??
        # (the amplitude varies by row, so index A with y_i)
        if x_i == numPlotsX-1:
            ax.annotate(r'$A=%d$' % A[y_i], xy=(1.1,0.5), rotation=90,
                        ha='center',va='center',xycoords='axes fraction')
f.subplots_adjust(wspace=0,hspace=0)
plt.suptitle(r'$A\cdot\sin\left(2\pi x + \phi\right)$',fontsize=18)
plt.show()
I've seen this topic discussed several times without an elegant solution. There's always so much hacking involved. I really think this boils down to the way matplotlib treats the axes. Why can't there be one label for each of the four sides of the figure, that behave the same way?
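One approach that may fit here, sketched under the assumption that the right-most axes in each row has no y label of its own: move that axes' y label to the right with `yaxis.set_label_position("right")`. The label is then laid out like any other axis label, with no manual offsets. The `build_grid` function is a made-up name for this illustration:

```python
import matplotlib.pyplot as plt
import numpy as np


def build_grid(amplitudes, phases):
    """Grid of sine plots with the amplitude labelled on the right of each row."""
    fig, ax_grid = plt.subplots(len(amplitudes), len(phases),
                                sharex=True, sharey=True, squeeze=False)
    x = np.linspace(0, 2.0, 100)
    for row, amp in enumerate(amplitudes):
        for col, phi in enumerate(phases):
            ax_grid[row, col].plot(x, amp * np.sin(x * np.pi + phi), lw=2.0)
        # Reuse the (otherwise unused) y label of the right-most axes in this row.
        right_ax = ax_grid[row, -1]
        right_ax.yaxis.set_label_position("right")
        right_ax.set_ylabel(r"$A=%d$" % amp)
    fig.subplots_adjust(wspace=0, hspace=0)
    return fig, ax_grid


fig, ax_grid = build_grid([1, 2, 3], [0, 1, 2])
```

The limitation is that each axes has only one y label, so this does not work when the same axes also needs a label on the left, for example in a grid with a single column.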
I have a code for a pcolormesh heatmap and dendrogram which works pretty great, except that if I have a prime number (or sometimes not a prime number) of samples and/or genes, the mesh no longer fits the subplot. After playing around a lot, I realized this had to do with the way that pcolor/pcolormesh divides its squares (rounding errors), but I'm not familiar enough with the API to even begin to fix the problem. I would really like to have this code be generalizable for ALL numbers of samples/top genes. Btw, I didn't write this code alone, it was cobbled together from tons of SO questions, so thanks guys (whoever you are).
import numpy as np
from numpy import arange
import matplotlib
import matplotlib.pyplot as plt
import scipy
import scipy.cluster.hierarchy as hier
import scipy.spatial.distance as dist
# xl is number of patients, yl is number of genes
# slicing: array[rows,cols]
xl = 20
yl = 50
X = np.transpose(np.random.uniform(-5,5,(100,100)))
#X = np.transpose(Ximp)
X = X[0:yl,0:xl]
fig = plt.figure()
plt.subplot2grid((10,1), (0,1))
X = np.transpose(X)
distMatrix = dist.pdist(X)
distSquareMatrix = dist.squareform(distMatrix)
linkageMatrix = hier.linkage(distSquareMatrix)
dendro = hier.dendrogram(linkageMatrix)
leaves = dendro['leaves']
plt.gca().set_xticklabels([])
plt.gca().set_yticklabels([])
plt.subplot2grid((10,1), (2,0), rowspan=8)
X = np.transpose(X)
X = X[:,leaves]
plt.pcolormesh(X, cmap=matplotlib.cm.RdBu_r, vmin=-5, vmax=5)
xlabels = [item[0:2] for item in demos[0]][0:xl]
relabelx = dict(zip(range(xl),xlabels))
ylabels = glist[0:yl]
plt.xticks(arange(0.5, xl+0.5, 1))
plt.yticks(arange(0.5, yl+0.5, 1))
plt.gca().set_xticklabels([relabelx[xval] for xval in leaves])
plt.gca().set_yticklabels(ylabels)
fig.subplots_adjust(right=0.8)
cbar_ax = fig.add_axes([0.85, 0.15, 0.05, 0.7])
plt.colorbar(cax=cbar_ax)
plt.show()
This code produces this image:
However, if I change xl to 22 and yl to 51 (and yes, I know 22 is not prime, but I'm trying to show that even though my problem usually occurs with primes, it's not exclusive to them), I get this monstrosity:
Does anyone have any clue how to fix this?
Adding
plt.xlim(xmax=22) #or xl
plt.ylim(ymax=51) #or yl
after
plt.pcolormesh(X, cmap=matplotlib.cm.RdBu_r, vmin=-5, vmax=5)
should do it.
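A self-contained sketch of the same idea, using random data in place of the asker's matrix and the object-oriented calls instead of `plt.xlim`/`plt.ylim`. The variable names are illustrative. Presumably the gap appears because autoscaling rounds the limits up to a convenient tick value, so the sketch pins both axes to the mesh extent:

```python
import matplotlib.pyplot as plt
import numpy as np

n_genes, n_samples = 51, 22          # rows and columns of the heat map
data = np.random.uniform(-5, 5, (n_genes, n_samples))

fig, ax = plt.subplots()
mesh = ax.pcolormesh(data, cmap="RdBu_r", vmin=-5, vmax=5)
# Pin both axes to the mesh extent so no blank margin is left around it.
ax.set_xlim(0, data.shape[1])
ax.set_ylim(0, data.shape[0])
fig.colorbar(mesh, ax=ax)
```

Taking the limits from `data.shape` rather than hard-coding 22 and 51 keeps the fix valid for any number of samples or genes.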