Generating a uniform distribution of integers with Python

I tried to generate a uniform distribution of random integers on a given interval (it does not matter whether the interval contains its upper limit) with Python. I used the following snippet of code to do so and to plot the result:
import numpy as np
import matplotlib.pyplot as plt
from random import randint
propsedPython = np.random.randint(0,32767,8388602)%2048
propsedPythonNoMod = np.random.randint(0,2048,8388602)
propsedPythonNoModIntegers = np.random.random_integers(0,2048,8388602)
propsedPythonNoModRandInt = np.empty(8388602)
for i in range(8388602):
    propsedPythonNoModRandInt[i] = randint(0,2048)
plt.figure(figsize=[16,10])
plt.title(r'distribution $\rho_{prop}$ of all the python simulated proposed indices')
plt.xlabel(r'indices')
plt.ylabel(r'$\rho_{prop}$')
plt.yscale('log')
plt.hist(propsedPython,bins=1000,histtype='step',label=r'np.random.randint(0,32767,8388602)%2048')
plt.hist(propsedPythonNoMod,bins=1000,histtype='step',label=r'np.random.randint(0,2048,8388602)')
plt.hist(propsedPythonNoModIntegers,bins=1000,histtype='step',label=r'np.random.random_integers(0,2048,8388602)')
plt.hist(propsedPythonNoModRandInt,bins=1000,histtype='step',label=r'for i in range(8388602):propsedPythonNoModRandInt[i] = randint(0,2048)')
plt.legend(loc=0)
The resulting plot shows spikes for every variant.
Could somebody point me in the right direction as to why these spikes appear in all the different cases, and/or give some advice on which routine to use to get uniformly distributed random integers?
Thanks a lot!

Hmm...
I used the new NumPy random generator facility (np.random.default_rng), and the graph looks OK to me.
Code:
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng()
N = 1024*500
hist = np.zeros(2048, dtype=np.int32)
q = rng.integers(0, 2048, dtype=np.int32, size=N, endpoint=False)
for k in range(0, N):
    hist[q[k]] += 1
x = np.arange(0, 2048, dtype=np.int32)
fig, ax = plt.subplots()
ax.stem(x, hist, markerfmt=' ')
plt.show()
and the graph: [figure: stem plot of the counts for each value from 0 to 2047]
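One likely reason the original plot shows spikes while the per-value count above looks flat is the binning: 2048 distinct integers cannot be split evenly across 1000 equal-width bins, so some bins collect three integer values and others only two. The sketch below illustrates this with perfectly uniform synthetic data (an assumption made for the demonstration, not the asker's random draws), so any unevenness comes from the bins alone.

```python
import numpy as np

# Perfectly uniform data: every integer 0..2047 appears exactly 100 times.
values = np.tile(np.arange(2048), 100)

# 1000 equal-width bins over 2048 integers: bin width is 2.047,
# so each bin covers either 2 or 3 distinct integer values.
coarse_counts, _ = np.histogram(values, bins=1000)

# One counter per integer value, as in the answer's manual loop.
per_value_counts = np.bincount(values, minlength=2048)

print("distinct counts with 1000 bins:", np.unique(coarse_counts))
print("distinct counts per value:     ", np.unique(per_value_counts))
```

With one bin per integer, for example bin edges at np.arange(2049) - 0.5 passed to plt.hist, every bin covers exactly one value and the artificial spikes disappear.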

Related

Hist chart from list

import matplotlib as plp
cube = []
z = 0
while not z == 50:
    x = random.randint(1, 6)
    cube.append(x)
    z = z + 1
print(cube)
plp.plot(cube[1])
plp.show()
How can I fix this code so that it shows a histogram of the values in my list cube?
The comments already suggest most of the fixes you can make, but you can also improve this code a lot in other ways. Here is what I would propose:
import matplotlib.pyplot as plt
import random
cube = []
for _ in range(50):
    x = random.randint(1, 6)
    cube.append(x)
plt.hist(cube)
plt.show()
First, since you only use z as an iteration counter, a for loop is better here (the while loop still works, but it is more error-prone). I also imported matplotlib.pyplot instead of matplotlib, because the plotting functions live in the pyplot submodule, and added the missing import of random. Renaming plp to plt is optional, but plt is the convention. You can then use plt.hist(cube) to plot a histogram.
Note that if you want to use numpy, you can make this even simpler:
import matplotlib.pyplot as plt
import numpy as np
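cube = np.random.randint(1, 7, size=50)  # upper bound is exclusive, so 7 gives rolls 1..6
plt.hist(cube)
plt.show()
This works because numpy lets you specify the size of the array of random numbers you want. Note that, unlike random.randint, np.random.randint excludes its upper bound.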
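For a die, the default ten bins of plt.hist do not line up with the six faces. A minimal sketch of one way to get one bar per face, assuming NumPy's Generator API and a fixed seed chosen only for reproducibility; the names rolls and edges are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed is an arbitrary choice
rolls = rng.integers(1, 7, size=50)     # upper bound exclusive: values 1..6

# Bin edges at half-integers, so each bin holds exactly one face.
edges = np.arange(0.5, 7.5, 1.0)        # 0.5, 1.5, ..., 6.5
counts, _ = np.histogram(rolls, bins=edges)

for face, count in zip(range(1, 7), counts):
    print(f"face {face}: {count} rolls")
```

The same edges can be passed to plt.hist(rolls, bins=edges) to draw the chart.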

How to generate normal distribution samples (with specific mean and variance) in Python?

I'm new to Python and I would like to generate 1000 samples having normal distribution with specific mean and variance. I got to know a library called NumPy and I am wondering if I used it in the correct way. Here's my code:
import numpy
a = numpy.random.normal(0, 1, 1000)
print(a)
where 0 is the mean, 1 is the standard deviation (which is square root of variance), and 1000 is the size of the population.
Is this the correct way, or is there a better way to do it?
Yes, that's the way to generate 1000 samples from a normal distribution N(0,1).
You can see that these 1000 samples fall mostly between -3 and 3, since 99.73% of values lie within plus/minus 3 standard deviations of the mean: [figure: scatter plot of the 1000 samples in random colors]
The colorful graph was produced with the code below:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
np.random.seed(7)
y = np.random.normal(0, 1, 1000)
colors = cm.rainbow(np.linspace(0, 1, 11))
# use separate loop names so the sample array y is not shadowed
for i, value in enumerate(y):
    plt.scatter(i, value, color=colors[np.random.randint(0,10)])
However it's faster to generate a single-color chart:
np.random.seed(7)
x = [i for i in range(1, 1001)]
y = np.random.normal(0, 1, 1000)
plt.scatter(x, y, color='navy')
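For a mean and variance other than 0 and 1, the same call applies, with the standard deviation passed as the square root of the variance. A minimal sketch using NumPy's Generator API; the target values 5.0 and 4.0 and the seed are illustrative assumptions, not part of the question.

```python
import numpy as np

mean = 5.0       # illustrative target mean (assumption)
variance = 4.0   # illustrative target variance (assumption)

rng = np.random.default_rng(7)
# scale is the standard deviation, so take the square root of the variance
samples = rng.normal(loc=mean, scale=np.sqrt(variance), size=1000)

# The sample statistics should land close to the targets, not exactly on them.
print("sample mean:    ", samples.mean())
print("sample variance:", samples.var())
```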

Plot size = 1/{N∗⌈log2N⌉∗[(1/70)/60]} in matplotlib in python?

Similar to: Plot size = 1/{N∗⌈log2N⌉∗[(1/70)/60]} in R?
But with matplotlib in Python (I guess it will be better to plot the function with matplotlib):
size = 1/{N∗⌈log_2(N)⌉∗[(a)/60]}
a = [1/70, 1/60, 1/50, 1/40]
How can I plot this function with matplotlib in Python, with one graph for every value in a?
(⌈⌉ = ceil)
For example: [figure: example plot]
The y-axis should be labeled "size" and the x-axis "N".
N >= 2, where N is a natural number (2, 3, 4, 5, 6, ...), but it is not necessary to implement this restriction (see the picture above).
I have tried this as a first approach:
import matplotlib.pyplot as plt
import numpy as np
n = np.arange(3,50,0.1)
size = 1/(n*np.ceil(np.log2(n))*((1/70)/60))
plt.plot(n,size)
plt.axis([3,50,0,550])
plt.show()
If you want to plot the distinct segments separately rather than as one continuous line, one way is to look for discontinuities in the derivative. Within each segment the slope increases as n increases (n > 0), so you can look for the points where this condition is violated and split the line there.
import matplotlib.pyplot as plt
import numpy as np
from numpy import diff
n = np.arange(3,50,0.1)
a = [1/70,1/60,1/50,1/40]
discont = np.ones(len(n)-1) #array to show discontinuities
discont[1] = 0
for i in a:
    size = 1/(n*np.ceil(np.log2(n))*(i/60))
    derivs = diff(size)
    for k in range(len(derivs)-2):
        if derivs[k+1] > derivs[k]:
            discont[k+2] = 0
    segments = np.squeeze(np.asarray(discont.nonzero()))
    for j in range(len(segments)-1):
        start, stop = segments[j], segments[j+1]
        plt.plot(n[start:stop],size[start:stop], 'b')
plt.axis([0,20,0,300])
plt.xlabel('N')
plt.ylabel('Size')
plt.grid()
plt.show()
This will produce the following plot: [figure: size versus N, drawn as separate segments]
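Since the question states that N is a natural number, another option is to evaluate the formula only at integer N, which avoids drawing across the jumps at all. The sketch below is one way to do that under this assumption; the helper names ceil_log2 and size_at are hypothetical, and the ceiling of log2 is computed with integer arithmetic to sidestep floating-point rounding at exact powers of two.

```python
def ceil_log2(n: int) -> int:
    """Return ceil(log2(n)) for an integer n >= 1, using exact integer arithmetic."""
    return (n - 1).bit_length()


def size_at(n: int, a: float) -> float:
    """Evaluate size = 1 / (N * ceil(log2(N)) * (a / 60)) for an integer N >= 2."""
    return 1.0 / (n * ceil_log2(n) * (a / 60.0))


a_values = [1/70, 1/60, 1/50, 1/40]
n_values = list(range(2, 21))            # N = 2, 3, ..., 20

# One list of sizes per value of a, ready to be plotted as points or steps.
curves = {a: [size_at(n, a) for n in n_values] for a in a_values}

for a, sizes in curves.items():
    print(f"a = {a:.4f}: first sizes {sizes[:3]}")
```

Each curve can then be drawn with, for example, plt.plot(n_values, sizes, 'o') so that only the integer points appear.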

Normal distribution appears too dense when plotted in matplotlib

I am trying to estimate the probability density function of my data. In my case, the data is a satellite image with a shape of 8200 x 8100.
Below is the code for the PDF (the function 'is_outlier' is borrowed from someone who posted it here). As figure 1 shows, the PDF plot is too dense. I guess this is due to the thousands of pixels that make up the satellite image. It looks very ugly.
My question is: how can I plot a PDF that is not too dense, something like the one shown in figure 2?
lst = 'satellite_img.tif' #import the image
lst_flat = lst.flatten() #create 1D array
#the function below removes the outliers
def is_outlier(points, thres=3.5):
    if len(points.shape) == 1:
        points = points[:,None]
    median = np.median(points, axis=0)
    diff = np.sum((points - median)**2, axis=-1)
    diff = np.sqrt(diff)
    med_abs_deviation = np.median(diff)
    modified_z_score = 0.6745 * diff / med_abs_deviation
    return modified_z_score > thres
lst_flat = np.r_[lst_flat]
lst_flat_filtered = lst_flat[~is_outlier(lst_flat)]
fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
plt.plot(lst_flat_filtered, fit)
plt.hist(lst_flat_filtered, bins=30, density=True)  # 'normed' is no longer accepted by recent Matplotlib releases
plt.show()
figure 1
figure 2
The issue is that the x values in the PDF plot are not sorted, so the plotted line is going back and forwards between random points, creating the mess you see.
Two options:
1. Don't plot the line, just plot points (not great if you have lots of points, but will confirm if what I said above is right or not):
plt.plot(lst_flat_filtered, fit, 'bo')
2. Sort the lst_flat_filtered array before calculating the PDF and plotting it:
lst_flat = np.r_[lst_flat]
lst_flat_filtered = np.sort(lst_flat[~is_outlier(lst_flat)]) # Changed this line
fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
plt.plot(lst_flat_filtered, fit)
Here are some minimal examples showing these behaviours:
Reproducing your problem:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
lst_flat_filtered = np.random.normal(7, 5, 1000)
fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
plt.hist(lst_flat_filtered, bins=30, density=True)  # 'normed' is no longer accepted by recent Matplotlib releases
plt.plot(lst_flat_filtered, fit)
plt.show()
Plotting points
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
lst_flat_filtered = np.random.normal(7, 5, 1000)
fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
plt.hist(lst_flat_filtered, bins=30, density=True)  # 'normed' is no longer accepted by recent Matplotlib releases
plt.plot(lst_flat_filtered, fit, 'bo')
plt.show()
Sorting the data
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
lst_flat_filtered = np.sort(np.random.normal(7, 5, 1000))
fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
plt.hist(lst_flat_filtered, bins=30, density=True)  # 'normed' is no longer accepted by recent Matplotlib releases
plt.plot(lst_flat_filtered, fit)
plt.show()
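A third possibility, not part of the answer above, is to evaluate the fitted PDF on an evenly spaced grid rather than on the data points. The grid is sorted by construction, and for a large image it keeps the plotted line to a few hundred points. A minimal sketch, assuming synthetic data in place of the satellite image and writing the normal density out by hand so that only NumPy is needed; the names grid and pdf are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(7, 5, 1000)            # stand-in for the filtered pixel values

mu, sigma = data.mean(), data.std()

# Evenly spaced x values spanning the data: already sorted, only 200 points.
grid = np.linspace(data.min(), data.max(), 200)

# Normal density with the fitted mean and standard deviation.
pdf = np.exp(-0.5 * ((grid - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# To draw it next to the histogram:
#   plt.hist(data, bins=30, density=True)
#   plt.plot(grid, pdf)
print("peak of fitted PDF at x =", grid[np.argmax(pdf)])
```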

Graph only shows the data in 1 graph but not the other

I'm having trouble with my code. It only shows the data in one graph, and the other graph is blank. I can't figure out why it's not working.
I'm using the subplot() function, and my guess is that the reason might be the way my functions are formatted.
import numpy as np
import matplotlib.pyplot as plt
import cvxopt as opt
from cvxopt import blas, solvers
import pandas as pd
import mpld3
from mpld3 import plugins
np.random.seed(123)
solvers.options['show_progress'] = False
n_assets = 4
n_obs = 1000 # original 1000
return_vec = np.random.randn(n_assets, n_obs)
def rand_weights(n):
    k = np.random.rand(n)
    return k / sum(k)
print(rand_weights(n_assets))
print(rand_weights(n_assets))
def random_portfolio(returns):
    p = np.asmatrix(np.mean(returns,axis=1))
    w = np.asmatrix(rand_weights(returns.shape[0]))
    C = np.asmatrix(np.cov(returns))
    mu = w * p.T
    sigma = np.sqrt(w * C * w.T)
    #this recursion reduces outlier to keep the graph nice
    if sigma > 2:
        return random_portfolio(returns)
    return mu, sigma
n_portfolios = 500
means, stds = np.column_stack([random_portfolio(return_vec) for _ in range(n_portfolios)])
plt.plot(return_vec.T, alpha=.4);
plt.xlabel('time')
plt.ylabel('returns')
plt.figure(1)
plt.subplot(212)
plt.plot(stds, means, 'o', markersize = 5)
plt.xlabel('std')
plt.ylabel('mean')
plt.title('Mean and standard deviation of returns of randomly generated portfolios')
plt.subplot(211)
plt.figure(1)
plt.show()
You need to move the line plt.subplot(211) before your first call to plt.plot. This is because calls to plt.subplot must precede the actual plotting in that subplot.
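The ordering described above can be sketched as follows: select each subplot first, then issue the plotting calls that belong to it. The random placeholder arrays stand in for return_vec, stds and means from the question and are assumptions made only to keep the example self-contained.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(123)
returns = rng.standard_normal((4, 100))   # placeholder for return_vec
stds = rng.random(50)                     # placeholder for the portfolio stds
means = rng.standard_normal(50)           # placeholder for the portfolio means

fig = plt.figure(1)

ax_top = plt.subplot(211)                 # select the top subplot first...
plt.plot(returns.T, alpha=.4)             # ...then plot into it
plt.xlabel('time')
plt.ylabel('returns')

ax_bottom = plt.subplot(212)              # select the bottom subplot...
plt.plot(stds, means, 'o', markersize=5)  # ...then plot into it
plt.xlabel('std')
plt.ylabel('mean')

# In an interactive session, plt.show() displays the figure with both panels filled.
```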
