Simultaneously fit linearly every line of a 2d numpy array


I am working in Python on image analysis. I have an image (2d numpy array) with some intensity drift in it. I want to level it.
To remove the increasing/decreasing intensity over the width of the image, I want to fit every row of the 2d numpy array with a line. I however do not want to loop through every row index.
MWE:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
width=1500
height=2500
fill_fun = lambda x,a,b : a*x+b
play_image = fill_fun(np.tile(np.arange(width),(height,1)),0.15,2)+np.random.random( (height,width) )
#For representation purposes:
#plt.imshow(play_image,cmap='Greys_r')
#plt.show()
#1) Fit every row and kill the intensity decrease/increase tendency
fit_func = lambda p,x: p[0]*x+p[1]
errfunc = lambda p, x, y: abs(fit_func(p, x) - y) # Distance to the target function
x_axis=np.linspace(0,width,width)
for i in range(height):
    row_val=play_image[i,:]
    p0=[(row_val[-1]-row_val[0])/float(width),row_val[0]] #guess
    p1, success = optimize.leastsq(errfunc, p0[:], args=(x_axis,row_val))
    play_image[i,:]-= fit_func(p1,x_axis)-p1[1]
By doing this I effectively level my image intensity horizontally. Is there any way I can replace the loop with a matrix operation, to somehow fit all the lines at the same time with a (height, 2) parameter array?
Thanks for the help

Fitting a line has a simple closed-form formula that you can use directly, and it takes about three short lines in numpy (most of the code below is just making and plotting the data and fits):
import numpy as np
import matplotlib.pyplot as plt
# make the data as sequential sections of a circle
theta = np.linspace(np.pi, 0, 120)
y = np.reshape(np.sin(theta), (10,12))
x = np.repeat(np.arange(12)[None,:], 10, axis=0)
# fit the line
m = lambda x: np.mean(x, axis=1)
beta = ( m(y*x) - m(x)*m(y) )/(m(x*x) - m(x)**2)
alpha = m(y) - beta*m(x)
# plot the data and fits
plt.plot([y[:,i] for i in range(12)], ".") # plot the data
plt.gca().set_prop_cycle(None) # reset the color cycle
fits = alpha[:,None] + beta[:,None]*x # make lines from the fits for the plots
plt.plot(fits.T)
plt.show()
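As a rough sketch of how the same mean-based formula could be applied to the leveling problem in the question, the block below computes one slope per row of a synthetic drifting image and subtracts only the slope term. The image size, the drift of 0.15 and the variable names are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic image with a linear drift along each row (sizes are arbitrary).
rng = np.random.default_rng(0)
height, width = 50, 200
x = np.arange(width)
image = 0.15 * x + 2 + rng.random((height, width))

# Closed-form least-squares slope of every row at once.
mean_x = x.mean()
mean_y = image.mean(axis=1)
slope = ((image * x).mean(axis=1) - mean_x * mean_y) / ((x * x).mean() - mean_x**2)

# Subtract only the slope term, so each row keeps its own offset.
leveled = image - slope[:, None] * x
```

Because x is the same for every row, its mean and variance are scalars and only the terms involving the image need a per-row reduction.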

You can implement the normal equations and their solution pretty easily. The main challenge is keeping track of the appropriate dimensions so all the vectorized operations work correctly. Here's one method:
import numpy as np
# image size
m = 100
n = 125
# A random image to work with.
np.random.seed(123)
img = np.random.randint(0, 100, size=(m, n))
# X is the design matrix. It is the same for each row. It has shape (n, 2).
X = np.column_stack((np.ones(n), np.arange(n)))
# A is X.T.dot(X), but in this case we can use an explicit formula for each term.
s1 = 0.5*n*(n - 1) # Sum of integers
s2 = n*(n - 0.5)*(n - 1)/3.0 # Sum of squared integers
A = np.array([[n, s1], [s1, s2]])
# Y has shape (2, m). Each column is a vector on the right-hand-side of the
# normal equations.
Y = X.T.dot(img.T)
# Solve the normal equations. beta has shape (2, m). Each column gives the
# coefficients of the linear fit for each row of img.
beta = np.linalg.solve(A, Y)
# Create an array that holds the linear drift for each row.
# X has shape (n, 2) and beta has shape (2, m), so row_drift has shape (m, n),
# the same as img.
row_drift = X.dot(beta).T
# Remove the drift from img.
img2 = img - row_drift
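If you would rather not form the normal equations by hand, one alternative (a sketch, not part of the answer above) is numpy.linalg.lstsq, which accepts a right-hand side with several columns and solves every least-squares problem in one call. The image here is random data of the same shape as above.

```python
import numpy as np

m, n = 100, 125
rng = np.random.default_rng(123)
img = rng.integers(0, 100, size=(m, n)).astype(float)

# Same design matrix as above: a column of ones and a column of pixel indices.
X = np.column_stack((np.ones(n), np.arange(n)))

# Each column of img.T is one row of the image; beta has shape (2, m).
beta, residuals, rank, sv = np.linalg.lstsq(X, img.T, rcond=None)

# Linear drift of every row, then the leveled image.
row_drift = (X @ beta).T
img2 = img - row_drift
```

lstsq works from a decomposition of X rather than from X.T.dot(X), which tends to be the more numerically robust route when the design matrix is poorly conditioned.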

Related

matplotlib doesn't display the correct data

I am new to Python. For some reason when I look at the plot it displays all the data as if Y = 0 but the last one, which is weird since when I ask it to print Y it displays the right values. What am I doing wrong?
import math
import numpy as np
import matplotlib.pyplot as plt
y0=2 # [m]
g=9.81 # [m/s^2]
v=20 # initial speed [m/s]
y_target=1 # [m]
x=35 # [m]
n_iter=50
theta=np.linspace(0,0.5*math.pi,n_iter) # theta input [rad]
Y=np.zeros(n_iter) # y output [m]
for i in range(n_iter):
    Y[i]=math.tan(theta[i])*x-g/(2*(v*math.cos(theta[i]))**2)*x**2+y0
plt.plot(theta,Y)
plt.ylabel('y [m]')
plt.xlabel('theta [rad]')
plt.ylim(top=max(Y),bottom=min(Y))
plt.show()

The problem is that the function blows up a bit as theta approaches π/2. Notice the little 1e33 at the top of the y-axis in the plot: the scale of that axis is huge, because the last value of y is essentially minus infinity (because of dividing by almost zero). If you change the limits of the y-axis, e.g. to (-1000, +1000), the plot looks correct.
But I can't resist helping you with something you didn't ask for help on... You are not using NumPy correctly. NumPy gives you two things: n-dimensional arrays as a data structure, and fast, optimized code for 'vectorized' computing with those arrays. In essence, you never need a loop in NumPy — you just compute with everything at once. Try doing 10 * np.array([1, 2, 3]) and you will get the idea.
So I would write your code like this:
import numpy as np
import matplotlib.pyplot as plt
# Problem parameters.
y0 = 2 # [m]
g = 9.81 # [m/s^2]
v = 20 # initial speed [m/s]
x = 35 # [m]
# Make theta [rad].
steps = 50
theta = np.linspace(0, 0.5*np.pi, steps)
# Compute y.
y = np.tan(theta) * x - g / (2 * (v * np.cos(theta))**2) * x**2 + y0
# Plot.
plt.plot(theta, y)
plt.ylabel('y [m]')
plt.xlabel('theta [rad]')
plt.ylim(-1000, 1000)
plt.show()
Notice that there's no loop — you just use the vector theta as if it were a scalar. And the math library (which can't handle NumPy's arrays, only scalars) is not needed at all when you're using NumPy.
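To see the scale problem numerically, here is a small sketch using the same parameters. It compares the last value of y with all the others and derives axis limits from the well-behaved points only. Taking the limits from y[:-1] is just one possible choice, not something the answer above prescribes.

```python
import numpy as np

y0, g, v, x = 2, 9.81, 20, 35
theta = np.linspace(0, 0.5 * np.pi, 50)
y = np.tan(theta) * x - g / (2 * (v * np.cos(theta))**2) * x**2 + y0

# The final point sits at theta = pi/2, where cos(theta) is almost zero.
print(y[-1])
# Every other point is many orders of magnitude smaller.
print(np.abs(y[:-1]).max())

# Assumed approach: take the y-limits from all points except the last one.
y_lim = (y[:-1].min(), y[:-1].max())
```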

How to estimate motion with FFT and Cross-Correlation?

I'm working on estimating cloud displacement for wind energy purposes using RGB GOES satellite images. I am following the methodology from the paper "An Automated Technique for Obtaining Cloud Motion From Geosynchronous Satellite Data Using Cross Correlation" to do this, but I don't know if it is a good way to compute it. The code basically obtains the cross-correlation via the Fourier transform to calculate the cloud displacement between the roi_a and roi_b images.
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
img_a = cv.imread('2019.1117.1940.goes-16.rgb.tif', 0)
img_b = cv.imread('2019.1117.1950.goes-16.rgb.tif', 0)
roi_a = img_a[700:900, 1900:2100]
roi_b = img_b[700:900, 1900:2100]
def Fou(image):
    fft_roi= np.fft.fft2(image)
    return fft_roi
def inv_Fou(C_w):
    c_t = np.fft.ifft2(C_w)
    c_t = np.abs(c_t)
    return c_t
#Step 1: gets the FFT
G_t0 = Fou(roi_a)##t_0
fft_roiA_conj = np.conj(G_t0) #Conjugate
G_t1 = Fou(roi_b)##t_1
#Step 2: Compute C(m, v)
prod = np.dot(fft_roiA_conj, G_t1)
#Step 3: Perform the inverse FFT
inv = inv_Fou(prod)
plt.imshow(inv, cmap = 'gray', )
plt.title('C (m,v) --> Cov(p,q)')
plt.xticks([])
plt.yticks([])
plt.show()
#Step 4: Compute cross correlation coefficient and the maximum cross correlation coefficient
def rms(sigma):
    "Compute the standard deviation of an image"
    rms = np.std(sigma)
    return rms
R_t = inv / (rms(roi_a) * rms(roi_b))
This is the first time I have used the FFT on images, so I have some questions about it:
I don't apply fftshift; can this affect the result?
What is the difference between using np.dot in step 2 and a simple '*', like prod = fft_roiA_conj * G_t1?
How should I interpret the resulting image (C(m, v) -> Cov(p, q)) from step 3?
How can I obtain the maximum coefficients p' and q' (the maximum in the x and y directions) from R_t?

1 - fftshift is just a circular rotation of the output. The correlation you compute is two-sided, so without fftshift the negative lags wrap around to the end of each axis. What matters is that you map indices to displacements correctly, with or without fftshift.
2 - numpy.dot is the matrix product (equivalent to the @ operator in recent Python versions), while the * operator does element-wise multiplication. As I understand it, you want the element-wise product in step 2.
3 - Once you correct step 2, you will have an image such that inv[i, j] is the correlation of the image roi_a with the image roi_b rolled by i rows and j columns.
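To make point 2 concrete, here is a tiny sketch with two made-up 2x2 arrays showing that * and np.dot give different results for the same inputs:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])

# Element-wise product: each entry is A[i, j] * B[i, j].
elementwise = A * B

# Matrix product: rows of A combined with columns of B.
matrix = np.dot(A, B)

print(elementwise)
print(matrix)
```

The correlation theorem multiplies the two spectra frequency by frequency, which is why step 2 needs the element-wise form.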
To answer the last question I will work through an example.
I will use the image scipy.misc.face. It is an RGB image, so it provides three matrices that are highly correlated.
import scipy
import numpy as np
import matplotlib.pyplot as plt
f = scipy.misc.face()
plt.figure(figsize=(12, 4))
plt.subplot(131), plt.imshow(f[:,:, 0])
plt.subplot(132), plt.imshow(f[:,:, 1])
plt.subplot(133), plt.imshow(f[:,:, 2])
The function img_corr combines the three steps of the cross-correlation (for images of the same size). Notice that I am using rfft2 and irfft2; these are the FFTs for real data, which take advantage of symmetry in the frequency domain.
def img_corr(foi_a, foi_b):
    return np.fft.irfft2(np.fft.rfft2(foi_a) * np.conj(np.fft.rfft2(foi_b)))
C = img_corr(f[:,:,1], f[:,:,2])
plt.figure(figsize=(12, 4))
plt.subplot(121), plt.imshow(C), plt.title('FFT indices')
plt.subplot(122), plt.imshow(np.fft.fftshift(C, (0, 1))), plt.title('fftshift ed version')
To retrieve the position
# this returns the index in the flattened vector of all pixels
best_corr = np.argmax(C)
# unravel index gives the 2D index
best_pos = np.unravel_index(best_corr, C.shape)
# this get the positions as a fraction of the image size
relative_pos = [np.fft.fftfreq(size)[index] for index, size in zip(best_pos, C.shape)]
I hope this completes the answer.
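As a sanity check on the index-to-displacement mapping, the sketch below circularly shifts a random test image by a known amount and recovers that shift from the correlation peak. The test image, the shift of (3, -5) and passing s= to irfft2 (so that odd image sizes also work) are assumptions added for illustration.

```python
import numpy as np

def img_corr(a, b):
    # Circular cross-correlation of two real images of the same shape.
    return np.fft.irfft2(np.fft.rfft2(a) * np.conj(np.fft.rfft2(b)), s=a.shape)

rng = np.random.default_rng(0)
ref = rng.random((64, 48))
true_shift = (3, -5)
moved = np.roll(ref, true_shift, axis=(0, 1))  # circularly shifted copy

C = img_corr(moved, ref)
peak = np.unravel_index(np.argmax(C), C.shape)

# Indices past the midpoint of an axis correspond to negative displacements.
shift = tuple(int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, C.shape))
print(shift)
```

With this argument order the peak lands at the displacement of the first image relative to the second. Swapping the arguments flips the sign of the recovered shift.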

Calculating cross-correlation with fft returning backwards output

I'm trying to cross correlate two sets of data, by taking the fourier transform of both and multiplying the conjugate of the first fft with the second fft, before transforming back to time space. In order to test my code, I am comparing the output with the output of numpy.correlate. However, when I plot my code, (restricted to a certain window), it seems the two signals go in opposite directions/are mirrored about zero.
This is what my output looks like
My code:
import numpy as np
import matplotlib.pyplot as plt
phl_data = np.sin(np.arange(0, 10, 0.1))
mlac_data = np.cos(np.arange(0, 10, 0.1))
N = phl_data.size
zeroes = np.zeros(N-1)
phl_data = np.append(phl_data, zeroes)
mlac_data = np.append(mlac_data, zeroes)
# cross-correlate x = phl_data, y = mlac_data:
# take FFTs:
phl_fft = np.fft.fft(phl_data)
mlac_fft = np.fft.fft(mlac_data)
# fft of cross-correlation
Cw = np.conj(phl_fft)*mlac_fft
#Cw = np.fft.fftshift(Cw)
# transform back to time space:
Cxy = np.fft.fftshift(np.fft.ifft(Cw))
dt = 1 # lag step in samples (assumed value)
times = np.append(np.arange(-N+1, 0, dt),np.arange(0, N, dt))
plt.plot(times, Cxy)
plt.xlim(-250, 250)
# test against convolving:
c = np.correlate(phl_data, mlac_data, mode='same')
plt.plot(times, c)
plt.show()
(both data sets have been padded with N-1 zeroes)

The documentation for numpy.correlate explains this:
This function computes the correlation as generally defined in signal processing texts:
c_{av}[k] = sum_n a[n+k] * conj(v[n])
and:
Notes
The definition of correlation above is not unique and sometimes correlation may be defined differently. Another common definition is:
c'_{av}[k] = sum_n a[n] conj(v[n+k])
which is related to c_{av}[k] by c'_{av}[k] = c_{av}[-k].
Thus, there is not a unique definition, and the two common definitions lead to a reversed output.
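The reversal can be checked directly. In this sketch (random real-valued test signals, names chosen for illustration), swapping the arguments of numpy.correlate mirrors the output, and an FFT-based correlation that conjugates the second input matches numpy.correlate(a, v) once the negative lags are rolled to the front.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
a = rng.normal(size=N)
v = rng.normal(size=N)

# Swapping the arguments reverses the lag axis (for real inputs).
c_av = np.correlate(a, v, mode="full")
c_va = np.correlate(v, a, mode="full")
print(np.allclose(c_av, c_va[::-1]))

# FFT version following numpy's convention: conjugate the second input.
size = 2 * N - 1  # zero-pad so the circular correlation equals the linear one
c_fft = np.fft.ifft(np.fft.fft(a, size) * np.conj(np.fft.fft(v, size))).real
c_fft = np.roll(c_fft, N - 1)  # move the negative lags to the front
print(np.allclose(c_fft, c_av))
```

Conjugating the first spectrum instead, as in the question, corresponds to the other definition, so its output comes out mirrored relative to numpy.correlate(a, v).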

Principal component analysis dimension reduction in python

I have to implement my own PCA function, Y, V = PCA(data, M, whitening), that computes the first M principal components and transforms the data so that y_n = U^T x_n. The function should also return V, the amount of variance explained by the transformation.
I have to reduce the dimension of the data from D=4 to M=2. The given function skeleton is below:
def PCA(data,nr_dimensions=None, whitening=False):
    """ perform PCA and reduce the dimension of the data (D) to nr_dimensions
    Input:
        data... samples, nr_samples x D
        nr_dimensions... dimension after the transformation, scalar
        whitening... False -> standard PCA, True -> PCA with whitening
    Returns:
        transformed data... nr_samples x nr_dimensions
        variance_explained... amount of variance explained by the first nr_dimensions principal components, scalar"""
    if nr_dimensions is not None:
        dim = nr_dimensions
    else:
        dim = 2
What I have done so far is the following:
import numpy as np
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import scipy.stats as stats
from scipy.stats import multivariate_normal
import pdb
import sklearn
from sklearn import datasets
#covariance matrix
mean_vec = np.mean(data, axis=0) # per-feature mean
cov_mat = (data - mean_vec).T.dot((data - mean_vec)) / (data.shape[0] - 1)
print('Covariance matrix \n%s' % cov_mat)
#now the eigendecomposition of the cov matrix
cov_mat = np.cov(data.T)
eig_vals, eig_vecs = np.linalg.eig(cov_mat)
print('Eigenvectors \n%s' % eig_vecs)
print('\nEigenvalues \n%s' % eig_vals)
# Make a list of (eigenvalue, eigenvector) tuples
eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:,i]) for i in range(len(eig_vals))]
This is the point where I don't know what to do now and how to reduce dimension.
Any help would be welcome! :)

Here is a simple example for the case where the initial matrix A that contains the samples and features has shape=[samples, features]
from numpy import array
from numpy import mean
from numpy import cov
from numpy.linalg import eig
# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# calculate the mean of each column, assuming that each column is a variable/feature
M = mean(A.T, axis=1)
print(M)
# center columns by subtracting column means
C = A - M
print(C)
# calculate covariance matrix of centered matrix
V = cov(C.T)
print(V)
# eigendecomposition of covariance matrix
values, vectors = eig(V)
print(vectors)
print(values)
# project data
P = vectors.T.dot(C.T)
print(P.T)
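The example above projects onto every eigenvector, so it does not yet reduce the dimension. A minimal sketch of the missing step, assuming the same [samples, features] layout, sorts the eigenpairs by eigenvalue and keeps only the leading ones. The function name pca_reduce and the random test data are hypothetical.

```python
import numpy as np

def pca_reduce(data, n_components):
    """Project data (n_samples x D) onto its first n_components principal axes."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eig_vals, eig_vecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
    order = np.argsort(eig_vals)[::-1]        # largest eigenvalue first
    eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]
    W = eig_vecs[:, :n_components]            # D x n_components projection matrix
    transformed = centered @ W
    variance_explained = eig_vals[:n_components].sum() / eig_vals.sum()
    return transformed, variance_explained

# Hypothetical data: 150 samples, D=4, with very different spreads per feature.
rng = np.random.default_rng(0)
data = rng.normal(size=(150, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
Y, V = pca_reduce(data, 2)
print(Y.shape, V)
```

Here the explained variance is reported as a fraction of the total. Returning the raw eigenvalues instead would be an equally reasonable reading of the assignment.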

PCA amounts to a singular value decomposition of the centered data matrix, so you can use numpy.linalg.svd directly:
import numpy as np
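def PCA(U,ndim,whitening=False):
    # U has shape (n_samples, n_features) and is assumed to be centered.
    L,G,R=np.linalg.svd(U,full_matrices=False) # U = (L * G) @ R
    if not whitening:
        L=L * G # scale each left singular vector by its singular value
    Y=L[:,:ndim] # coordinates along the first ndim principal axes
    return Y,G[:ndim]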
If you want to solve the eigenvalue problem instead, then assuming that the number of samples is higher than the number of features (or your data would be underfit), it is inefficient to calculate the left eigenvectors directly. Instead, compute the right eigenvectors from the small features-by-features matrix and reconstruct the left ones from them:
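def PCA(U,ndim,whitening=False):
    # U has shape (n_samples, n_features) and is assumed to be centered.
    K=U.T @ U # Calculating right eigenvectors from the small (n_features, n_features) matrix
    G,R=np.linalg.eigh(K) # eigenvalues (squared singular values) in ascending order
    G=G[::-1] # largest eigenvalues first
    R=R[:,::-1] # reorder the eigenvector columns to match
    L=U @ R[:,:ndim] # reconstructing left ones (not yet normalized)
    if whitening:
        L=L/np.linalg.norm(L,axis=0,keepdims=True) # normalizing them
    return L,G[:ndim]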

Healpy: From Data to Healpix map

I have a data grid where the rows represent theta (0, pi) and the columns represent phi (0, 2*pi) and where f(theta,phi) is the density of dark matter at that location. I wanted to calculate the power spectrum for this and have decided to use healpy.
What I can not understand is how to format my data for healpy to use. If someone could provide code (in python for obvious reasons) or point me to a tutorial, that would be great! I have tried my hand at doing it with the following code:
#grid dimensions are Nrows*Ncols (subject to change)
theta = np.linspace(0, np.pi, num=grid.shape[0])[:, None]
phi = np.linspace(0, 2*np.pi, num=grid.shape[1])
nside = 512
print("Pixel area: %.2f square degrees" % hp.nside2pixarea(nside, degrees=True))
pix = hp.ang2pix(nside, theta, phi)
healpix_map = np.zeros(hp.nside2npix(nside), dtype=np.double)
healpix_map[pix] = grid
But when I try to execute the code that computes the power spectrum, specifically:
cl = hp.anafast(healpix_map[pix], lmax=1024)
I get this error:
TypeError: bad number of pixels
If anyone could point me to a good tutorial or help edit my code that would be great.
More specifications:
my data is in a 2d np array and I can change the numRows/numCols if I need to.
Edit:
I have solved this problem by first changing the args of anafast to healpix_map.
I also improved the spacing by making my Nrows*Ncols=12*nside*nside.
But, my power spectrum is still giving errors. If anyone has links to good documentation/tutorial on how to calculate the power spectrum (condition of theta/phi args), that would be incredibly helpful.

There you go, hope it's what you're looking for. Feel free to comment with questions :)
import healpy as hp
import numpy as np
import matplotlib.pyplot as plt
# Set the number of sources and the coordinates for the input
nsources = int(1.e4)
nside = 16
npix = hp.nside2npix(nside)
# Coordinates and the density field f
thetas = np.random.random(nsources) * np.pi
phis = np.random.random(nsources) * np.pi * 2.
fs = np.random.randn(nsources)
# Go from HEALPix coordinates to indices
indices = hp.ang2pix(nside, thetas, phis)
# Initiate the map and fill it with the values
hpxmap = np.zeros(npix, dtype=float)
for i in range(nsources):
    hpxmap[indices[i]] += fs[i]
# Inspect the map
hp.mollview(hpxmap)
Since the map above contains nothing but noise, the power spectrum should just contain shot noise, i.e. be flat.
# Get the power spectrum
Cl = hp.anafast(hpxmap)
plt.figure()
plt.plot(Cl)

There is a faster way to do the map initialization using numpy.add.at, following this answer.
This is several times faster on my machine as compared to the first section of Daniel's excellent answer:
import healpy as hp
import numpy as np
import matplotlib.pyplot as plt
# Set the number of sources and the coordinates for the input
nsources = int(1e7)
nside = 64
npix = hp.nside2npix(nside)
# Coordinates and the density field f
thetas = np.random.uniform(0, np.pi, nsources)
phis = np.random.uniform(0, 2*np.pi, nsources)
fs = np.random.randn(nsources)
# Go from HEALPix coordinates to indices
indices = hp.ang2pix(nside, thetas, phis)
# Baseline, from Daniel Lenz's answer:
# time: ~5 s
hpxmap1 = np.zeros(npix, dtype=float)
for i in range(nsources):
    hpxmap1[indices[i]] += fs[i]
# Using numpy.add.at
# time: ~0.6 ms
hpxmap2 = np.zeros(npix, dtype=float)
np.add.at(hpxmap2, indices, fs)
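Another vectorized option worth trying is numpy.bincount with weights, which also accumulates repeated indices. The sketch below uses random integers as a stand-in for the output of hp.ang2pix so that it runs without healpy. Whether it beats numpy.add.at will depend on your NumPy version and data size.

```python
import numpy as np

rng = np.random.default_rng(0)
npix = 12 * 64**2      # number of HEALPix pixels for nside = 64
nsources = 100_000
indices = rng.integers(0, npix, size=nsources)  # stand-in for hp.ang2pix output
fs = rng.normal(size=nsources)

# Accumulate with numpy.add.at, as above.
map_add_at = np.zeros(npix)
np.add.at(map_add_at, indices, fs)

# Same accumulation with bincount; minlength keeps empty trailing pixels.
map_bincount = np.bincount(indices, weights=fs, minlength=npix)
print(np.allclose(map_add_at, map_bincount))
```

Note that plain fancy indexing, hpxmap[indices] += fs, would not work here: when an index repeats, only one of the contributions is kept.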
