Make a chi-square test and a 3D spectral test with a Python LCG

I'm working on a math assignment but I'm stuck. Here is the statement:
We denote by U and V two sequences of random floats, obtained using a random-number generator on the interval [0,1].
If we combine the U and V sequences according to the transformation Z = m + σ√(−2 ln U) sin(2πV), we obtain a variable Z following the law N(m, σ).
I want to obtain a normal distribution with mean m = 3 and standard deviation σ = 1.
So I've generated my array of 30,000 Z floats with:
import random
from math import sqrt, log, sin, pi

def generateFloat(tr):
    list = []
    for i in range(tr):
        list.append(random.uniform(0,1))
    return list

def generateZ(nb, m, o):
    U = generateFloat(nb)
    V = generateFloat(nb)
    tab = []
    for i in range(nb):
        Z = m + (o * sqrt(-2*log(U[i])) * sin(2*pi*V[i]))
        tab.append(Z)
    return tab
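As an aside, the same Box-Muller transformation can be written in vectorized form with NumPy. This is only an illustrative sketch (the function name generate_z_np is hypothetical, not part of the original code):

import numpy as np

def generate_z_np(nb, m, o):
    # Draw U and V uniformly; keep U away from 0 so that log(U) is defined.
    U = np.random.uniform(1e-12, 1.0, nb)
    V = np.random.uniform(0.0, 1.0, nb)
    return m + o * np.sqrt(-2.0 * np.log(U)) * np.sin(2.0 * np.pi * V)

# Example: 30,000 samples that should follow N(3, 1).
Z = generate_z_np(30000, 3.0, 1.0)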
I would like to make the chi-square test and do the 3D spectral test. Actually, my chi-square test gives me weird values...
def khi2(x):
    S = 0
    E = len(x)/(max(x))
    print(max(x)-1)
    for i in range(max(x)):
        O = x[i]
        S += ((O-E)**2)/E
    return S
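For comparison, a chi-square goodness-of-fit test against N(m, σ) is usually done by binning the samples and comparing the observed bin counts with the counts expected from the normal CDF. Here is a minimal sketch of that idea (the number of bins and the use of scipy.stats are assumptions on my part, not part of the original code):

import numpy as np
from scipy.stats import norm, chi2

def chi_square_normal(samples, m, sigma, n_bins=20):
    samples = np.asarray(samples)
    # Observed counts in equal-width bins over the sample range.
    observed, edges = np.histogram(samples, bins=n_bins)
    # Expected counts from the N(m, sigma) CDF over the same bins
    # (mass outside the sample range is ignored, so this is approximate).
    cdf = norm.cdf(edges, loc=m, scale=sigma)
    expected = len(samples) * np.diff(cdf)
    stat = np.sum((observed - expected) ** 2 / expected)
    # Degrees of freedom: number of bins minus one (minus any estimated parameters).
    p_value = chi2.sf(stat, df=n_bins - 1)
    return stat, p_value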
and for my 3D spectral test I have this code:
from mpl_toolkits import mplot3d
from matplotlib import pyplot
import numpy as np

fig = pyplot.figure()
ax = pyplot.axes()
num_bars = 30000
x_pos = generateFloat(num_bars)
y_pos = generateFloat(num_bars)
ax.scatter(x_pos, y_pos)
pyplot.show()
I'm not sure whether the values I obtained look right on my graph... And this isn't in 3D, so I don't think it's a good graph.
I don't know if I'm doing this right. If someone can help me with anything, I'll take any help I can get :/
Thank you.
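As an aside, a 3D spectral test is usually drawn by plotting triples of successive generator outputs (u_i, u_{i+1}, u_{i+2}) as points in the unit cube. A minimal sketch, assuming matplotlib's mplot3d toolkit and reusing generateFloat from above:

from mpl_toolkits import mplot3d  # registers the 3D projection
from matplotlib import pyplot

n = 30000
u = generateFloat(n)
# Successive triples reveal the lattice structure of an LCG, if any.
xs = u[:-2]
ys = u[1:-1]
zs = u[2:]

fig = pyplot.figure()
ax = pyplot.axes(projection='3d')
ax.scatter(xs, ys, zs, s=1)
pyplot.show()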

Related

scipy fft returns null imaginary part

First of all, I apologize for being an absolute beginner in both python and signal processing.
I am trying to simulate an impulse signal (a delta function) propagating along the spatial x-axis over time. Then I would like to perform a Fourier transform on amplitude vs. the x-axis for each time, and then on amplitude vs. the t-axis for each point in space. The problem I'm facing is that the Fourier coefficients are all real valued. If I imshow the imaginary part over the spatial and temporal axes, all of them are shown to be zero. However, my understanding was that only the impulse signal at t = 0, x = 0 should have a null imaginary coefficient; for all the other t and/or x, there should be a non-zero imaginary coefficient.
Please refer to this site http://madebyevan.com/dft/ where one can interactively make waveforms and observe the Fourier Transformation. In the f(x) box, please put "spike(x-0)", "spike(x-1)" etc. to simulate my problem and expected result.
I have tried the following code using scipy.fftpack. There are some extra lines to analyze how the impulse signal travels along the x-axis and in the x-t plane.
import numpy as np
from numpy import pi
import matplotlib.pyplot as plt
from scipy import signal
import math
import scipy.fftpack
from scipy import ndimage
L = 10
k = np.pi/L
w = np.pi*2
n = 5
# Number of samplepoints
Nx = 1000
Nt = 500
# sample spacing
l = 1.0/Nx
T = 1.0/Nt
x = np.linspace(0, Nx*l*L, Nx)
t = np.linspace(0, Nt*T*L, Nt)
x = np.round(x,2)
t = np.round(t,2)
# function to produce impulse
def gw(xx, tt):
    if xx == tt:
        kk = 1
    else:
        kk = 0
    return (kk)
fig = plt.figure()
yg = np.array([gw(i, j) for j in t for i in x])
YG = yg.reshape(Nt, Nx)
# how impulse propagate in x-t plane
plt.imshow(YG, interpolation='bilinear',aspect='auto')
plt.colorbar();
# how impulse propagate in x-axis for t = 2 and t = 100
fig, ax = plt.subplots()
ax.plot(x, YG[2,:], x, YG[100,:])
plt.show()
# FFT in x-axis at each point in time
yxf = np.zeros((Nt, Nx))
for i in range(Nt):
    yx = YG[i,:]
    yxf[i,:] = scipy.fftpack.fft(yx)
plt.imshow(np.imag(yxf[:,:Nx]), interpolation='bilinear',aspect='auto')
plt.colorbar();
plt.show()
# FFT in t-axis at each point in space
ytf = np.zeros((Nt, Nx))
for i in range(Nx):
    yt = YG[:,i]
    ytf[:,i] = scipy.fftpack.fft(yt)
plt.imshow(np.imag(ytf[:Nt,:]), interpolation='bilinear',aspect='auto')
plt.colorbar();
plt.show()
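One detail worth noting about the code as posted: yxf and ytf are created with np.zeros((Nt, Nx)), which is a real (float64) array, so assigning the complex output of scipy.fftpack.fft into it keeps only the real part (NumPy emits a ComplexWarning), and np.imag of the result is then identically zero. A minimal sketch of the same two loops with a complex dtype instead:

# Sketch: allocate complex arrays so the imaginary part of the FFT is kept.
yxf = np.zeros((Nt, Nx), dtype=complex)
for i in range(Nt):
    yxf[i, :] = scipy.fftpack.fft(YG[i, :])

ytf = np.zeros((Nt, Nx), dtype=complex)
for i in range(Nx):
    ytf[:, i] = scipy.fftpack.fft(YG[:, i])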

bifurcation diagram with python

I'm a beginner and I don't speak English very well, so sorry about that.
I'd like to draw the bifurcation diagram of the sequence
x(n+1) = u·x(n)·(1 − x(n)), with x(0) = 0.7 and u between 0.7 and 4.
I am supposed to get something like this:
So, for each value of u, I'd like to calculate the accumulation points of this sequence. That's why I'd like to write code that displays every point (u, x1001), (u, x1002), ..., (u, x1050) for each value of u.
I did this:
import matplotlib.pyplot as plt
import numpy as np

P = np.linspace(0.7, 4, 10000)
m = 0.7
Y = [m]
l = np.linspace(1000, 1050, 51)
for u in P:
    X = [u]
    for n in range(1001):
        m = (u*m)*(1-m)
        break
    for l in range(1051):
        m = (u*m)*(1-m)
        Y.append(m)
plt.plot(X, Y)
plt.show()
And I get a blank graph.
This is the first thing I've tried to code and I don't know anything in Python yet, so I need help please.
There are a few issues in your code. Although the problem you have is a code-review problem, generating bifurcation diagrams is a problem of general interest (it might need to be moved to scicomp, but I don't know how to request that formally).
import matplotlib.pyplot as plt
import numpy as np

P = np.linspace(0.7, 4, 10000)
m = 0.7
# Initialize your data containers identically
X = []
Y = []
# l is never used, I removed it.
for u in P:
    # Add one value to X instead of resetting it.
    X.append(u)
    # Start with a random value of m instead of remaining stuck
    # on a particular branch of the diagram
    m = np.random.random()
    for n in range(1001):
        m = (u*m)*(1-m)
    # The break is harmful here as it prevents completion of
    # the loop and collection of data in Y
    for l in range(1051):
        m = (u*m)*(1-m)
    # Collection of data in Y must be done once per value of u
    Y.append(m)
# Remove the line between successive data points, this renders
# the plot illegible. Use a small marker instead.
plt.plot(X, Y, ls='', marker=',')
plt.show()
Also, X is useless here as it contains a copy of P.
To save the bifurcation diagram in PNG format, you can try this simple code.
# Bifurcation diagram of the logistic map
import math
from PIL import Image

imgx = 1000
imgy = 500
image = Image.new("RGB", (imgx, imgy))
xa = 2.9
xb = 4.0
maxit = 1000

for i in range(imgx):
    r = xa + (xb - xa) * float(i) / (imgx - 1)
    x = 0.5
    for j in range(maxit):
        x = r * x * (1 - x)
        if j > maxit / 2:
            image.putpixel((i, int(x * imgy)), (255, 255, 255))

image.save("Bifurcation.png", "PNG")

I am getting two plots for one data set in python

I am working through Example 8.1, titled Euler's Method, from Mark Newman's book Computational Physics. I rewrote the example as a function using NumPy arrays, but when I plot it I get two plots on the same figure and I'm not sure how to correct that. Also, is there a better way to convert my two 1D arrays into one 2D array to use for plotting in Matplotlib? Thanks.
Newman's example:
from math import sin
from numpy import arange
from pylab import plot,xlabel,ylabel,show

def f(x,t):
    return -x**3 + sin(t)

a = 0.0      # Start of the interval
b = 10.0     # End of the interval
N = 1000     # Number of steps
h = (b-a)/N  # Size of a single step
x = 0.0      # Initial condition

tpoints = arange(a,b,h)
xpoints = []
for t in tpoints:
    xpoints.append(x)
    x += h*f(x,t)

plot(tpoints,xpoints)
xlabel("t")
ylabel("x(t)")
show()
My modifications:
from pylab import plot,show,xlabel,ylabel
from numpy import linspace,exp,sin,zeros,vstack,column_stack

def f(x,t):
    return (-x**(3) + sin(t))

def Euler(f,x0,a,b):
    N = 1000
    h = (b-a)/N
    t = linspace(a,b,N)
    x = zeros(N,float)
    y = x0
    for i in range(N):
        x[i] = y
        y += h*f(x[i],t[i])
    return column_stack((t,x))  #vstack((t,x)).T

plot(Euler(f,0.0,0.0,10.0))
xlabel("t")
ylabel("x(t)")
show()
The reason you get two lines is that t as well as x are plotted against their index, instead of x being plotted against t.
I don't see why you'd want to stack the two arrays. Just keep them separate, which will also solve the problem of the two plots.
The following works fine.
import numpy as np
import matplotlib.pyplot as plt

f = lambda x,t: -x**3 + np.sin(t)

def Euler(f,x0,a,b):
    N = 1000
    h = (b-a)/N
    t = np.linspace(a,b,N)
    x = np.zeros(N,float)
    y = x0
    for i in range(N):
        x[i] = y
        y += h*f(x[i],t[i])
    return t,x

t,x = Euler(f,0.0,0.0,10.0)

plt.plot(t,x)
plt.xlabel("t")
plt.ylabel("x(t)")
plt.show()

How does one implement a subsampled RBF (Radial Basis Function) in Numpy?

I was trying to implement a radial basis function in Python and NumPy as described in the Caltech lecture here. The mathematics seems clear to me, so I find it strange that it's not working (or at least seems not to work). The idea is simple: one chooses a subsampled set of centers for the Gaussians, forms the kernel matrix, and finds the best coefficients, i.e. solves Kc = y with least squares, where K is the Gaussian kernel (Gram) matrix. For that I did:
beta = 0.5*np.power(1.0/stddev,2)
Kern = np.exp(-beta*euclidean_distances(X=X,Y=subsampled_data_points,squared=True))
#(C,_,_,_) = np.linalg.lstsq(K,Y_train)
C = np.dot( np.linalg.pinv(Kern), Y )
but when I try to plot my interpolation against the original data, they don't look alike at all:
with 100 random centers (from the data set). I also tried 10 centers, which produces essentially the same graph, as does using every data point in the training set. I assumed that using every data point should more or less perfectly reproduce the curve (i.e. overfit), but it didn't. It produces:
which doesn't seem correct. I will provide the full code (which runs without error):
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
from scipy.interpolate import Rbf
import matplotlib.pyplot as plt

## Data sets
def get_labels_improved(X,f):
    N_train = X.shape[0]
    Y = np.zeros( (N_train,1) )
    for i in range(N_train):
        Y[i] = f(X[i])
    return Y

def get_kernel_matrix(x,W,S):
    beta = get_beta_np(S)
    #beta = 0.5*tf.pow(tf.div( tf.constant(1.0,dtype=tf.float64),S), 2)
    Z = -beta*euclidean_distances(X=x,Y=W,squared=True)
    K = np.exp(Z)
    return K

N = 5000
low_x =-2*np.pi
high_x=2*np.pi
X = low_x + (high_x - low_x) * np.random.rand(N,1)
# f(x) = 2*(2(cos(x)^2 - 1)^2 -1
f = lambda x: 2*np.power( 2*np.power( np.cos(x) ,2) - 1, 2) - 1
Y = get_labels_improved(X , f)

K = 2 # number of centers for RBF
indices=np.random.choice(a=N,size=K) # choose numbers from 0 to D^(1)
subsampled_data_points=X[indices,:] # M_sub x D
stddev = 100
beta = 0.5*np.power(1.0/stddev,2)
Kern = np.exp(-beta*euclidean_distances(X=X,Y=subsampled_data_points,squared=True))
#(C,_,_,_) = np.linalg.lstsq(K,Y_train)
C = np.dot( np.linalg.pinv(Kern), Y )
Y_pred = np.dot( Kern , C )

plt.plot(X, Y, 'o', label='Original data', markersize=1)
plt.plot(X, Y_pred, 'r', label='Fitted line', markersize=1)
plt.legend()
plt.show()
Since the plots look strange, I decided to read the docs for the plotting functions, but I couldn't find anything obvious that was wrong.
Scaling of interpolating functions
The main problem is unfortunate choice of standard deviation of the functions used for interpolation:
stddev = 100
The features of your functions (its humps) are of size about 1. So, use
stddev = 1
Order of X values
The mess of red lines is there because plt from matplotlib connects consecutive data points, in the order given. Since your X values are in random order, this results in chaotic left-right movements. Use sorted X:
X = np.sort(low_x + (high_x - low_x) * np.random.rand(N,1), axis=0)
Efficiency issues
Your get_labels_improved method is inefficient, looping over the elements of X. Use Y = f(X), leaving the looping to low-level NumPy internals.
Also, the least-squares solution of an overdetermined system should be computed with lstsq instead of computing the pseudoinverse (which is computationally expensive) and multiplying by it.
Here is the cleaned-up code; using 30 centers gives a good fit.
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
import matplotlib.pyplot as plt
N = 5000
low_x =-2*np.pi
high_x=2*np.pi
X = np.sort(low_x + (high_x - low_x) * np.random.rand(N,1), axis=0)
f = lambda x: 2*np.power( 2*np.power( np.cos(x) ,2) - 1, 2) - 1
Y = f(X)
K = 30 # number of centers for RBF
indices=np.random.choice(a=N,size=K) # choose numbers from 0 to D^(1)
subsampled_data_points=X[indices,:] # M_sub x D
stddev = 1
beta = 0.5*np.power(1.0/stddev,2)
Kern = np.exp(-beta*euclidean_distances(X=X, Y=subsampled_data_points,squared=True))
C = np.linalg.lstsq(Kern, Y)[0]
Y_pred = np.dot(Kern, C)
plt.plot(X, Y, 'o', label='Original data', markersize=1)
plt.plot(X, Y_pred, 'r', label='Fitted line', markersize=1)
plt.legend()
plt.show()

Exercise on calculating and plotting the cumulative empirical distribution

I was trying to finish an exercise in John Stachurski's book (a textbook devoted to teaching economists how to use Python). One of the exercises is about how to calculate and plot the cumulative empirical distribution. The book provides a class called ECDF to calculate the empirical distribution function:
# Filename: ecdf.py
# Author: John Stachurski
# Date: December 2008
# Corresponds to: Listing 6.3

class ECDF:
    def __init__(self, observations):
        self.observations = observations

    def __call__(self, x):
        counter = 0.0
        for obs in self.observations:
            if obs <= x:
                counter += 1
        return counter / len(self.observations)
And the exercise reads:
【Exercise 6.1.12】 Add a method to the ECDF class that uses Matplotlib to plot the empirical distribution over a specified interval. Replicate the four graphs in figure 6.3 (modulo randomness).
The figure to be replicated is:
and an illustration of the algorithm:
The following is my initial attempt
from ecdf import ECDF
import numpy as np
import matplotlib.pyplot as plt
from srs import SRS
from math import sqrt
from random import lognormvariate

# =========================
# parameters and arguments
# =========================
alpha, sigma2, s, delta = 0.3, 0.2, 0.5, 0.1
# numbers of draws
n = 1000
# length of each markov chain
t = 20
num_simu = [4,25,100,5000]
# Define F(k, z) = s k^alpha z + (1 - delta) k
F = lambda k, z: s * (k**alpha) * z + (1 - delta) * k
lognorm = lambda: lognormvariate(0, sqrt(sigma2))

# =====================
# create empirical distribution
# =====================
# different draw numbers
k = np.linspace(0,25,500)
for n in num_simu:
    for x in range(n):
        # list used to store capital stock (kt) in the last period (t=20)
        kt = []
        solow_srs = SRS(F=F, phi=lognorm, X=1.0)
        px = solow_srs.sample_path(t)
        kt.append(px[-1])
    # generate the empirical distribution function
    F = ECDF(kt)
    prob_kt_n = [F(i) for i in k]  # need to determine range
    # n refers to the n-th draw

# ==================================
# use for-loop to create subplots
# ==================================
#k = np.linspace(0,25,500)
#num_rows,num_cols = 2,2
The difficulties for me are: 1) How can I store the lists/arrays of empirical distribution results for the different numbers of draws shown in the given graph? 2) How do I create the subplots using a for-loop? I also ran into some other small errors.
Thank you for your suggestions.
About (1), my advice is to create a dictionary (i.e. something like d = {} and then d[n] = ECDF(data) for each number n of observations).
Dunno about (2).
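For what it's worth, here is a minimal sketch of how the dictionary idea and a subplot for-loop might fit together; the data-generation step is stubbed out with np.random.lognormal because the SRS simulation from the book is not reproduced here:

import numpy as np
import matplotlib.pyplot as plt
from ecdf import ECDF

num_simu = [4, 25, 100, 5000]
k = np.linspace(0, 25, 500)

# (1) store one ECDF per number of draws in a dictionary
ecdfs = {}
for n in num_simu:
    # placeholder data: replace with the final capital stocks from the SRS simulation
    kt = np.random.lognormal(mean=1.0, sigma=0.5, size=n)
    ecdfs[n] = ECDF(kt)

# (2) create the four panels with a for-loop over the subplot axes
fig, axes = plt.subplots(2, 2)
for ax, n in zip(axes.flat, num_simu):
    ax.plot(k, [ecdfs[n](i) for i in k])
    ax.set_title("n = %d" % n)
plt.show()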
