How to generate uniform random complex numbers in Python

For my research, I need to generate uniformly distributed random complex numbers. How can I do this in Python, given that there is no module dedicated to generating random complex numbers?

Your question is underspecified: you need to say from which region of the complex plane you want to draw your uniformly distributed numbers.
This is true for uniformly sampling real numbers as well.
However, in the real case there is a very obvious choice, namely the interval [0, 1).
You can see, for example, that numpy.random.uniform samples from this interval by default.
I will present solutions for some regions of the complex plane that could be useful, but ultimately the choice that is right for you will depend on your application.
Assume np is numpy and that we want to generate an array of many such random numbers with shape shape.
A square centered at the origin
I.e. sampling uniformly from all complex numbers z whose real and imaginary parts both lie in [-1, 1]. You can generate such complex numbers e.g. via
np.random.uniform(-1, 1, shape) + 1.j * np.random.uniform(-1, 1, shape)
A disc centered at the origin
I.e. sampling uniformly from all complex numbers with absolute value in [0,1]. You can generate them e.g. as
np.sqrt(np.random.uniform(0, 1, shape)) * np.exp(1.j * np.random.uniform(0, 2 * np.pi, shape))
Explanation: We can parametrize points in the disc as z = r * exp(i a).
A uniform distribution of z over the disc means that the angle a is uniform in [0, 2pi], but the radius is not (intuition: the disc contains more points with a large radius than with a small one).
The radius has probability density p(r) = 2r on the interval [0, 1], and hence a CDF (the integral of p(r)) of F(r) = r^2. Inverse CDF sampling then lets us draw such radii as X = F^(-1)(Y) = sqrt(Y), where Y is uniformly distributed on [0, 1].
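If you want to check that the square root on the radius is really needed, a small sketch (assuming matplotlib is available) compares a histogram of the sampled radii abs(z) against the density p(r) = 2r:
import numpy as np
import matplotlib.pyplot as plt
shape = 100_000
z = np.sqrt(np.random.uniform(0, 1, shape)) * np.exp(1.j * np.random.uniform(0, 2 * np.pi, shape))
# the radii abs(z) should follow the density p(r) = 2r on [0, 1]
plt.hist(np.abs(z), bins=100, density=True, label='sampled radii')
r = np.linspace(0, 1, 200)
plt.plot(r, 2 * r, label='p(r) = 2r')
plt.legend()
plt.show()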

Is it not enough to do:
a = np.random.uniform(1,10,10)
b = a + a * <some constant>j
I think this stays uniform.
array([7.51553061 +9.01863673j, 1.53844779 +1.84613735j,
2.33666459 +2.80399751j, 9.44081138+11.32897366j,
7.47316887 +8.96780264j, 6.96193206 +8.35431847j,
9.13933486+10.96720183j, 2.10023098 +2.52027718j,
4.70705458 +5.6484655j , 8.02055689 +9.62466827j])

Related

Simulating expectation of continuous random variable

Currently I want to generate some samples and estimate the expectation and variance from them.
Given the probability density function: f(x) = {2x, 0 <= x <= 1; 0 otherwise}
I already found analytically that E(X) = 2/3 and Var(X) = 1/18; my detailed solution is here: https://math.stackexchange.com/questions/4430163/simulating-expectation-of-continuous-random-variable
But here is what I have when simulating using python:
import numpy as np
N = 100_000
X = np.random.uniform(size=N, low=0, high=1)
Y = [2*x for x in X]
np.mean(Y) # 1.00221 <- not equal to 2/3
np.var(Y) # 0.3323 <- not equal to 1/18
What am I doing wrong here? Thank you in advance.
You are generating the mean and variance of Y = 2X, when you want the mean and variance of the X's themselves. You know the density, but the CDF is more useful than the PDF for random variate generation. For your problem, the density is f(x) = 2x for 0 <= x <= 1 (and 0 otherwise), so the CDF is F(x) = x^2 on [0, 1].
Given that the CDF is an easily invertible function on the range [0, 1], you can use inverse transform sampling to generate X values by setting F(X) = U, where U is a Uniform(0,1) random variable, and inverting the relationship to solve for X. For your problem, this yields X = U^(1/2).
In other words, you can generate X values with
import numpy as np
N = 100_000
X = np.sqrt(np.random.uniform(size = N))
and then do anything you want with the data, such as calculate mean and variance, plot histograms, use in simulation models, or whatever.
A histogram will confirm that the generated data have the desired density:
import matplotlib.pyplot as plt
plt.hist(X, bins = 100, density = True)
plt.show()
produces a histogram that closely matches the density f(x) = 2x.
The mean and variance estimates can then be calculated directly from the data:
print(np.mean(X), np.var(X)) # => 0.6661509538922444 0.05556962913014367
But wait! There’s more...
Margin of error
Simulation generates random data, so estimates of mean and variance will be variable across repeated runs. Statisticians use confidence intervals to quantify the magnitude of the uncertainty in statistical estimates. When the sample size is sufficiently large to invoke the central limit theorem, an interval estimate of the mean is calculated as (x-bar ± half-width), where x-bar is the estimate of the mean. For a so-called 95% confidence interval, the half-width is 1.96 * s / sqrt(n) where:
s is the estimated standard deviation;
n is the number of samples used in the estimates of mean and standard deviation; and
1.96 is a scaling constant derived from the normal distribution and the desired level of confidence.
The half-width is a quantitative measure of the margin of error, a.k.a. precision, of the estimate. Note that as n gets larger, the estimate has a smaller margin of error and becomes more precise, but there are diminishing returns to increasing the sample size due to the square root. Increasing the precision by a factor of 2 would require 4 times the sample size if independent sampling is used.
In Python:
var = np.var(X)
print(np.mean(X), var, 1.96 * np.sqrt(var / N))
produces results such as
0.6666763186360812 0.05511848269208021 0.0014551397290634852
where the third column is the confidence interval half-width.
Improving precision
Inverse transform sampling can yield greater precision for a given sample size if we use a clever trick based on fundamental properties of expectation and variance. In intro prob/stats courses you probably were told that Var(X + Y) = Var(X) + Var(Y). The true relationship is actually Var(X + Y) = Var(X) + Var(Y) + 2Cov(X,Y), where Cov(X,Y) is the covariance between X and Y. If they are independent, the covariance is 0 and the general relationship becomes the one we learn/teach in intro courses, but if they are not independent the more general equation must be used. Variance is always a positive quantity, but covariance can be either positive or negative. Consequently, it’s easy to see that if X and Y have negative covariance the variance of their sum will be less than when they are independent. Negative covariance means that when X is above its mean Y tends to be below its mean, and vice-versa.
So how does that help? It helps because we can use the inverse transform, along with a technique known as antithetic variates, to create pairs of random variables which are identically distributed but have negative covariance. If U is a random variable with a Uniform(0,1) distribution, U' = 1 - U also has a Uniform(0,1) distribution. (In fact, flipping any symmetric distribution will produce the same distribution.) As a result, X = F^(-1)(U) and X' = F^(-1)(U') are identically distributed since they're defined by the same CDF, but they will have negative covariance because they fall on opposite sides of their shared median and thus strongly tend to fall on opposite sides of their mean. If we average each pair to get A = (F^(-1)(u_i) + F^(-1)(1 - u_i)) / 2, the expected value is E[A] = E[(X + X')/2] = 2E[X]/2 = E[X], while the variance is Var(A) = [Var(X) + Var(X') + 2Cov(X,X')]/4 = 2[Var(X) + Cov(X,X')]/4 = [Var(X) + Cov(X,X')]/2. In other words, we get a random variable A whose average is an unbiased estimate of the mean of X but which has less variance.
To fairly compare antithetic results head-to-head with independent sampling, we take the original sample size and allocate it with half the data being generated by the inverse transform of the U’s, and the other half generated by antithetic pairing using 1-U’s. We then average the paired values and generate statistics as before. In Python:
U = np.random.uniform(size = N // 2)
antithetic_avg = (np.sqrt(U) + np.sqrt(1.0 - U)) / 2
anti_var = np.var(antithetic_avg)
print(np.mean(antithetic_avg), anti_var, 1.96*np.sqrt(anti_var / (N / 2)))
which produces results such as
0.6667222935263972 0.0018911848781598295 0.0003811869837216061
Note that the half-width produced with independent sampling is nearly 4 times as large as the half-width produced using antithetic variates. To put it another way, we would need more than an order of magnitude more data for independent sampling to achieve the same precision.
To approximate the integral of some function of x, say, g(x), over S = [0, 1], using Monte Carlo simulation, you
generate N random numbers in [0, 1] (i.e. draw from the uniform distribution U[0, 1])
calculate the arithmetic mean of g(x_i) over i = 1 to i = N where x_i is the ith random number: i.e. (1 / N) times the sum from i = 1 to i = N of g(x_i).
The result of step 2 is the approximation of the integral.
The expected value of continuous random variable X with pdf f(x) and set of possible values S is the integral of x * f(x) over S. The variance of X is the expected value of X-squared minus the square of the expected value of X.
Expected value: to approximate the integral of x * f(x) over S = [0, 1] (i.e. the expected value of X), set g(x) = x * f(x) and apply the method outlined above.
Variance: to approximate the integral of (x * x) * f(x) over S = [0, 1] (i.e. the expected value of X-squared), set g(x) = (x * x) * f(x) and apply the method outlined above. Subtract the result of this by the square of the estimate of the expected value of X to obtain an estimate of the variance of X.
Adapting your method:
import numpy as np
N = 100_000
X = np.random.uniform(size = N, low = 0, high = 1)
Y = [x * (2 * x) for x in X]
E = [(x * x) * (2 * x) for x in X]
# mean
print((a := np.mean(Y)))
# variance
print(np.mean(E) - a * a)
Output
0.6662016482614397
0.05554821798023696
Instead of making Y and E lists, a much better approach is
Y = X * (2 * X)
E = (X * X) * (2 * X)
Y and E in this case are numpy arrays. This approach is much more efficient. Try making N = 100_000_000 and compare the execution times of both methods; the second should be much faster.
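To see the difference concretely, here is a rough timing sketch (N is smaller than 100_000_000 so that the list version finishes quickly; the exact numbers depend on your machine):
import time
import numpy as np
N = 10_000_000
X = np.random.uniform(size=N)
t0 = time.perf_counter()
Y_list = [x * (2 * x) for x in X]   # list comprehension: a pure-Python loop
t1 = time.perf_counter()
Y_arr = X * (2 * X)                 # vectorized numpy expression
t2 = time.perf_counter()
print(f'list comprehension: {t1 - t0:.2f} s')
print(f'vectorized numpy:   {t2 - t1:.2f} s')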

Generating a vector with a random uniform direction and a Gaussian-distributed magnitude

I want to add 2D Gaussian noise to each (x,y) point of a list that I have.
To do that, I want to create a noise vector with a direction drawn uniformly from [0, 2pi) and a magnitude drawn from a Gaussian distribution N(0, sigma^2).
How can I generate a vector in Python only specifying the direction and its magnitude?
Well, this is not hard to do:
import numpy as np
n = 100
sigma = 1.0
phi = 2.0 * np.pi * np.random.random(n)
r = np.random.normal(loc=0.0, scale=sigma, size=n)
x = r*np.cos(phi)
y = r*np.sin(phi)
You can generate two vectors, one for the magnitude and another for the phase. Then you use both to get what you want.
import numpy as np
import math
sigma_squared = 0.01 # Change to whatever value you want
num_elements = 10 # Size of the vector you want
magnitude = math.sqrt(sigma_squared) * np.random.randn(num_elements)
phase = 2 * np.pi * np.random.random_sample(num_elements)
# This will give you a vector with a Gaussian magnitude and a random phase between 0 and 2PI
noise = magnitude * np.exp(1j*phase)
I find it easier to work with a single vector of complex numbers, but since you have individual x and y values, you can get a noise_x and a noise_y vector with
noise_x = noise.real
noise_y = noise.imag
Note: I'm assuming you can use the numpy library, which makes things much easier. If that is not the case, you will need a loop to generate each element. To generate a single sample for the magnitude you can use random.gauss(0, sigma), while 2*math.pi*random.random() can be used to generate a sample for the phase. Then you do the same as before to get a complex number from which you can take the real and imaginary parts, as sketched below.
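For completeness, a minimal standard-library-only sketch of that loop (no numpy; the values of sigma and num_elements are just example placeholders):
import cmath
import math
import random
sigma = 0.1          # noise standard deviation (example value)
num_elements = 10    # number of noise samples (example value)
noise_x = []
noise_y = []
for _ in range(num_elements):
    magnitude = random.gauss(0, sigma)          # Gaussian-distributed magnitude
    phase = 2 * math.pi * random.random()       # uniform phase in [0, 2*pi)
    sample = magnitude * cmath.exp(1j * phase)  # complex noise sample
    noise_x.append(sample.real)
    noise_y.append(sample.imag)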

Generating random vectors of Euclidean norm <= 1 in Python?

More specifically, given a natural number d, how can I generate random vectors in R^d such that each vector x has Euclidean norm <= 1?
Generating random vectors via numpy.random.rand(1, d) is no problem, but the likelihood of such a random vector having norm <= 1 is predictably bad for even not-so-small d. For example, even for d = 10 only about 0.2% of such random vectors have a sufficiently small norm. So that seems like a silly solution.
EDIT: Re: Walter's comment, yes, I'm looking for a uniform distribution over vectors in the unit ball in R^d.
Based on the Wolfram Mathworld article on hypersphere point picking and Nate Eldredge's answer to a similar question on math.stackexchange.com, you can generate such a vector by generating a vector of d independent Gaussian random variables and a random number U uniformly distributed over the closed interval [0, 1], then normalizing the vector to norm U^(1/d).
Based on the answer by user2357112, you need something like this:
import numpy as np
...
inv_d = 1.0 / d
for ...:
    gauss = np.random.normal(size=d)
    length = np.linalg.norm(gauss)
    if length == 0.0:
        x = gauss
    else:
        r = np.random.rand() ** inv_d
        x = np.multiply(gauss, r / length)
        # conceptually: divide by length, then multiply by r
    # do something with x
(this is my second Python program, so don't shoot at me...)
The tricks are that
the combination of d independent Gaussian variables with the same σ is a Gaussian distribution in d dimensions, which, remarkably, has spherical symmetry,
the Gaussian distribution in d dimensions can be projected onto the unit sphere by dividing by the norm, and
the uniform distribution in a d-dimensional unit ball has cumulative radial distribution r^d (which is what you need to invert).
A loop-free version of the same approach is sketched below.
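Here is that sketch, assuming numpy as np; d and n_vectors are placeholder values you would choose yourself:
import numpy as np
d = 10            # dimension
n_vectors = 1000  # number of points to draw
gauss = np.random.normal(size=(n_vectors, d))           # isotropic Gaussian directions
lengths = np.linalg.norm(gauss, axis=1, keepdims=True)  # norms, shape (n_vectors, 1)
radii = np.random.rand(n_vectors, 1) ** (1.0 / d)       # radii with CDF r**d
points = gauss / lengths * radii                        # uniformly distributed in the unit ball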
This is the Python / NumPy code I am using. Since it does not use loops, it is much faster:
n_vectors=1000
d=2
rnd_vec=np.random.uniform(-1, 1, size=(n_vectors, d)) # the initial random vectors
unif=np.random.uniform(size=n_vectors) # a second array random numbers
scale_f=np.expand_dims(np.linalg.norm(rnd_vec, axis=1)/unif, axis=1) # the scaling factors
rnd_vec=rnd_vec/scale_f # the random vectors in R^d
The second array of random numbers (unif) is needed as a second scaling factor because otherwise all the vectors would have Euclidean norm equal to one.

Can I get data spread (noise) from singular value decomposition?

I was hoping to use singular value decomposition to estimate the standard deviation of ellipsoidal data. I'm not sure if this is the best approach, and I may be overthinking the entire process, so I need some help.
I simulated some data using the following script...
from matplotlib import pyplot as plt
import numpy
def svd_example():
    # simulate some data...
    # x values have standard deviation 3000
    xdata = numpy.random.normal(0, 3000, 5000).reshape(-1, 1)
    # y values have standard deviation 300
    ydata = numpy.random.normal(0, 300, 5000).reshape(-1, 1)
    # apply some rotation
    ydata_rotated = ydata + (xdata * 0.5)
    data = numpy.hstack((xdata, ydata_rotated))
    # get singular values
    left_singular_matrix, singular_values, right_singular_matrix = numpy.linalg.svd(data)
    print('singular values', singular_values)
    # plot data....
    plt.scatter(data[:, 0], data[:, 1], s=5)
    plt.ylim(-15000, 15000)
    plt.show()
svd_example()
I get singular values of...
>>> singular values [ 234001.71228678 18850.45155942]
My data looks like this...
I was under the assumption that the singular values would give me some indication of the spread of the data regardless of its rotation, right? But these values, [234001.71228678 18850.45155942], make no sense to me. My standard deviations were 3000 and 300. Do these singular values represent variance? How do I convert them?
The singular values indeed give some indication of the spread. In fact, they are related to the standard deviations in these directions. However, they are not normalized. If you divide them by the square root of the number of samples, you will get values that closely resemble the standard deviations used for creating the data:
singular_values / np.sqrt(5000)
# array([ 3398.61320614, 264.00975837])
Why do you get 3400 and 264 instead of 3000 and 300? That is because ydata + (xdata * 0.5) is not a rotation but a shearing operation. A real rotation would preserve the original standard deviations.
For example, the following code would rotate the data by 40 degrees:
# apply some rotation
s = numpy.sin(40 * numpy.pi / 180)
c = numpy.cos(40 * numpy.pi / 180)
data = numpy.hstack((xdata, ydata)).dot([[c, s], [-s, c]])
With such a rotation you will get normalized singular values that are pretty close to the original standard deviations.
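Putting the pieces together, a self-contained sketch (assuming numpy as np; the exact values vary from run to run) recovers roughly 3000 and 300:
import numpy as np
n = 5000
xdata = np.random.normal(0, 3000, n).reshape(-1, 1)
ydata = np.random.normal(0, 300, n).reshape(-1, 1)
# rotate the point cloud by 40 degrees instead of shearing it
angle = np.deg2rad(40)
s, c = np.sin(angle), np.cos(angle)
data = np.hstack((xdata, ydata)).dot([[c, s], [-s, c]])
_, singular_values, _ = np.linalg.svd(data, full_matrices=False)
print(singular_values / np.sqrt(n))   # approximately [3000, 300]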
Edit:
On Normalization
I have to admit, normalization is probably not the correct term to apply here. I did not mean scaling the values to a certain fixed range; what I meant was bringing the values to a scale that is independent of the number of samples.
To understand where the division by sqrt(5000) comes from, let's talk about the standard deviation. Let x be a data vector of n samples with zero mean. Then the standard deviation is computed as sqrt(sum(x**2) / n), which is the same as sqrt(sum(x**2)) / sqrt(n). Now, you can think of the singular value decomposition as computing only the sqrt(sum(x**2)) part, so we have to divide by sqrt(n) ourselves.
I'm afraid this is not a very mathematical explanation, but hopefully it conveys the idea.
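A quick numerical check of that identity (a sketch assuming numpy as np and a zero-mean data vector):
import numpy as np
x = np.random.normal(0, 3000, 5000)
x = x - x.mean()                                 # enforce zero mean exactly
print(np.sqrt(np.sum(x**2)) / np.sqrt(len(x)))   # sqrt(sum(x**2)) / sqrt(n)
print(np.std(x))                                 # identical to the line above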

Power Spectrum and Autocorrelation of Data in Numpy

I am interested in computing the power spectrum of a system of particles (~100,000) in 3D space with Python. What I have found so far is a group of functions in Numpy (fft,fftn,..) which compute the discrete Fourier transform, of which the square of the absolute value is the power spectrum. My question is a matter of how my data are being represented - and truthfully may be fairly simple to answer.
The data structure I have is an array with a shape of (n, 3), n being the number of particles I have, and the columns representing the x, y, and z coordinates of the n particles. The function I believe I should be using is the fftn() function, which takes the discrete Fourier transform of an n-dimensional array, but it says nothing about the expected format. How should the data be represented as a data structure to be fed into fftn?
Here is what I've tried so far to test the function:
import numpy as np
import random
import matplotlib.pyplot as plt
DATA = np.zeros((100,3))
for i in range(len(DATA)):
    DATA[i, 0] = random.uniform(-1, 1)
    DATA[i, 1] = random.uniform(-1, 1)
    DATA[i, 2] = random.uniform(-1, 1)
FFT = np.fft.fftn(DATA)
PS = abs(FFT)**2
plt.plot(PS)
plt.show()
The array entitled DATA is a mock array; ultimately the real one will be 100,000 by 3 in shape. The output of the code is a plot (not reproduced here) of what appear to be three 1D power spectra (one for each column of my data), but really I'd like a power spectrum as a function of radius.
Does anybody have any advice or know of alternative methods/packages to compute the power spectrum? (I'd even settle for the two-point autocorrelation function.)
It doesn't quite work the way you are setting it out...
You need a function, let's call it f(x, y, z), that describes the density of mass in space. In your case, you can consider the particles as point masses, so you will have a delta function centered at the location of each particle. It is for this function that you can calculate the three-dimensional autocorrelation, from which you can calculate the power spectrum.
If you want to use numpy to do that for you, you are first going to have to discretize your function. A possible mock example would be:
import numpy as np
import matplotlib.pyplot as plt
space = np.zeros((100, 100, 100), dtype=np.uint8)
x, y, z = np.random.randint(100, size=(3, 1000))
space[x, y, z] += 1
space_ps = np.abs(np.fft.fftn(space))
space_ps *= space_ps
space_ac = np.fft.ifftn(space_ps).real.round()
space_ac /= space_ac[0, 0, 0]
And now space_ac holds the three-dimensional autocorrelation function for the data set. This is not quite what you are after; to get the one-dimensional correlation function you have to average the values on spherical shells around the origin:
dist = np.minimum(np.arange(100), np.arange(100, 0, -1))
dist *= dist
dist_3d = np.sqrt(dist[:, None, None] + dist[:, None] + dist)
distances, indices = np.unique(dist_3d, return_inverse=True)
values = np.bincount(indices.ravel(), weights=space_ac.ravel()) / np.bincount(indices.ravel())
plt.plot(distances[1:], values[1:])
There is another issue with doing things yourself this way: when you compute the power spectrum as above, mathematically it is as if your three-dimensional array wrapped around the borders, i.e. point [99, y, z] is a neighbour of [0, y, z]. So your autocorrelation could show two very distant particles as close neighbours. The simplest way to deal with this is to make your array twice as large along every dimension, pad with extra zeros, and then discard the extra data.
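A minimal sketch of that zero-padding approach (assuming numpy as np; it recreates the mock space array from above and uses the s argument of fftn to pad each axis with zeros to twice its length):
import numpy as np
# recreate the mock data from above
space = np.zeros((100, 100, 100))
x, y, z = np.random.randint(100, size=(3, 1000))
space[x, y, z] += 1
# zero-pad each axis to length 200 via fftn's s argument
padded_ps = np.abs(np.fft.fftn(space, s=(200, 200, 200)))**2
padded_ac = np.fft.ifftn(padded_ps).real.round()
padded_ac /= padded_ac[0, 0, 0]
# only lags up to 99 along each axis are meaningful; discard the rest
space_ac_nowrap = padded_ac[:100, :100, :100]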
Alternatively you could use scipy.ndimage.filters.correlate with mode='constant' to do all the dirty work for you.
