Resampling a 2D numpy array - Python

I have a 2D array of size (3,2) and I have to resample it using nearest-neighbor, linear and bicubic interpolation so that the size becomes (4,3).
I am using Python, numpy and scipy for this.
How can I achieve resampling of the input array?

There is a good tutorial on re-sampling using convolution here.
For integer factor up-scaling:
import numpy
import scipy
from scipy import signal
# Scale factor (it can differ per axis)
factor = 2
# Input image
a = numpy.arange(16).reshape((4, 4))
# Empty image enlarged by the scale factor
b = numpy.zeros((a.shape[0] * factor, a.shape[1] * factor))
# Fill the new array with the original values
b[::factor, ::factor] = a
# Define the convolution kernel: a box (rectangular) window of length `factor`
# (on newer SciPy the window functions live in scipy.signal.windows)
kernel_1d = scipy.signal.boxcar(factor)
kernel_2d = numpy.outer(kernel_1d, kernel_1d)
# Apply the kernel by convolution (the 1-D kernel could also be applied
# separately along each axis)
c = scipy.signal.convolve(b, kernel_2d, mode="valid")
Note that the factor can be different for each axis, and that you can also apply the convolution sequentially, along each axis. The kernels for bilinear and bicubic interpolation are also shown in the link: the bilinear case uses a triangular window (scipy.signal.triang) and the bicubic case a piecewise function.
You should also mind which portion of the interpolated image is valid; along the edges there is not sufficient support for the kernel.
Bicubic interpolation is the best option of the three, as far as satellite imagery goes.
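For instance, here is a minimal sketch of the bilinear case under the same zero-stuffing scheme (an illustration, not code from the linked tutorial; it uses a triangular window of length 2*factor - 1, taken from scipy.signal.windows, where the window functions live in newer SciPy):
import numpy
from scipy import signal
factor = 2
a = numpy.arange(16, dtype=float).reshape((4, 4))
b = numpy.zeros((a.shape[0] * factor, a.shape[1] * factor))
b[::factor, ::factor] = a
# A tent (triangular) kernel of width 2*factor - 1 linearly interpolates
# between the zero-stuffed samples
kernel_1d = signal.windows.triang(2 * factor - 1)
kernel_2d = numpy.outer(kernel_1d, kernel_1d)
c = signal.convolve(b, kernel_2d, mode="same")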

There is a simpler solution for this: scipy.ndimage.zoom (https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.zoom.html).
Nearest neighbor interpolation is order=0, bilinear interpolation is order=1, and bicubic is order=3 (default).
import numpy as np
import scipy.ndimage
x = np.arange(6).reshape(3, 2).astype(float)
z = (4/3, 3/2)
print('Original array:\n{0}\n\n'.format(x))
methods = ['nearest-neighbor', 'bilinear', 'biquadratic', 'bicubic']
for o in range(4):
    print('Resampled with {0} interpolation:\n{1}\n\n'
          .format(methods[o], scipy.ndimage.zoom(x, z, order=o)))
This results in:
Original array:
[[0. 1.]
 [2. 3.]
 [4. 5.]]

Resampled with nearest-neighbor interpolation:
[[0. 1. 1.]
 [2. 3. 3.]
 [2. 3. 3.]
 [4. 5. 5.]]

Resampled with bilinear interpolation:
[[0.         0.5        1.        ]
 [1.33333333 1.83333333 2.33333333]
 [2.66666667 3.16666667 3.66666667]
 [4.         4.5        5.        ]]

Resampled with biquadratic interpolation:
[[1.04083409e-16 5.00000000e-01 1.00000000e+00]
 [1.11111111e+00 1.61111111e+00 2.11111111e+00]
 [2.88888889e+00 3.38888889e+00 3.88888889e+00]
 [4.00000000e+00 4.50000000e+00 5.00000000e+00]]

Resampled with bicubic interpolation:
[[5.55111512e-16 5.00000000e-01 1.00000000e+00]
 [1.03703704e+00 1.53703704e+00 2.03703704e+00]
 [2.96296296e+00 3.46296296e+00 3.96296296e+00]
 [4.00000000e+00 4.50000000e+00 5.00000000e+00]]
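If you prefer to specify the target shape rather than the zoom factors, the factors can be derived from it (a small sketch; the rounding of output dimensions works out exactly here but may need care for other shapes):
import numpy as np
import scipy.ndimage
x = np.arange(6).reshape(3, 2).astype(float)
target_shape = (4, 3)
# Per-axis zoom factors computed from the desired output shape
z = np.array(target_shape) / np.array(x.shape)
print(scipy.ndimage.zoom(x, z, order=1).shape)  # (4, 3)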

Related

Why is np.corrcoef not working as expected with two vectors of dimension two?

I am trying to calculate the Pearson correlation coefficient between two vectors in 2 dimensions using np.corrcoef. When the dimension of the vectors is different from two, it works fine, see for example:
import numpy as np
x = np.random.uniform(-10, 10, 3)
y = np.random.uniform(-10, 10, 3)
print(x, y)
print(np.corrcoef(x,y))
Output:
[-6.59840638 -1.81100446 5.6158669 ] [ 6.7200348 -7.0373677 -2.11395157]
[[ 1. -0.53299763]
[-0.53299763 1. ]]
However, when the dimension is exactly two, the correlation is wrong, with only the values 1 or -1:
import numpy as np
x = np.random.uniform(-10, 10, 2)
y = np.random.uniform(-10, 10, 2)
print(x, y)
print(np.corrcoef(x,y))
Output 1:
[-2.61268708 8.32602293] [6.42020314 3.43806504]
[[ 1. -1.]
[-1. 1.]]
Output 2:
[ 5.04249697 -3.6599369 ] [6.12936665 3.15827974]
[[1. 1.]
[1. 1.]]
Output 3:
[7.33503682 7.7145613 ] [-9.54304108 7.43840944]
[[1. 1.]
[1. 1.]]
Question: What's happening and how to solve it?
There are a couple of misunderstandings leading to your confusion:
I'll use row-major order, as numpy does: "Each row of x represents a variable, and each column a single observation of all those variables."
The Pearson correlation coefficient describes the linear relationship between 2 variables. If you only have 2 observations for each variable, you can always fit a straight line through the 2 points, so after normalization you will always get 1 or -1.
A covariance or correlation matrix is usually calculated amongst the components of a random vector X = (X1, ..., Xn).T. When you say you want the correlation between 2 vectors, it is unclear whether you mean the cross-correlation between X and Y, in which case you need np.correlate.
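To illustrate the second point, here is a quick sketch showing that any two 2-observation vectors give a correlation of exactly 1 or -1 (the seed and loop are illustrative only):
import numpy as np
rng = np.random.default_rng(0)
for _ in range(3):
    x = rng.uniform(-10, 10, 2)
    y = rng.uniform(-10, 10, 2)
    # With two points per variable, the fit through them is always a perfect line
    print(np.corrcoef(x, y)[0, 1])  # prints 1.0 or -1.0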

Python Numpy Linspace function for bidimensional array

I know it is possible to create numpy arrays using the linspace function. For example, given a range [x,y] I can make a vector of z elements equally spaced in [x,y]:
v = np.linspace(x, y, z, retstep=True)
What if one needs more dimensions? Is it possible to use the same function to generate a 3x4 array? I tried creating simple arrays and then merging them, but I don't think that is an efficient way to do it.
You can use arrays for the start and stop points of linspace:
x=np.linspace((0,0,0), (3,5,14), 4, axis=1)
print(x)
This will give the output:
[[ 0. 1. 2. 3. ]
[ 0. 1.66666667 3.33333333 5. ]
[ 0. 4.66666667 9.33333333 14. ]]
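As a side note (a small sketch of the same call with the default axis), omitting axis=1 stacks the samples along the first axis, so you get the transposed (4, 3) layout:
import numpy as np
x = np.linspace((0, 0, 0), (3, 5, 14), 4)  # default axis=0
print(x.shape)  # (4, 3)
print(x.T)      # same values as the axis=1 call above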

AgglomerativeClustering on precomputed Sparse Matrix

In my current approach, I have
from scipy.sparse import csr_matrix
from sklearn.cluster import AgglomerativeClustering
import pandas as pd
s = pd.DataFrame([[0.8, 0. , 3. ],
                  [1. , 1. , 2. ],
                  [0.3, 3. , 4. ]], columns=['dist', 'v1', 'v2'])
# N (the number of points) is assumed to be defined earlier in the original code
sparseD = csr_matrix((1-s['dist'], (s['v1'].values, s['v2'].values)), shape=(N, N))
agg = AgglomerativeClustering(n_clusters=None, affinity='precomputed', linkage='complete', distance_threshold=.25)
agg.fit_predict(sparseD)
The last line raises
TypeError: cannot take a sparse matrix.
If I cast the data with toarray(), the code works and produces the expected output, but it uses a lot of memory and is slow: the real data is of size 61K x 61K.
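For reference, a minimal sketch of that dense workaround, continuing the variables defined in the snippet above (it materializes the full N x N matrix, which is exactly what becomes too expensive at 61K x 61K):
# Dense fallback: convert the sparse matrix before fitting
dense = sparseD.toarray()
labels = agg.fit_predict(dense)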
I am wondering if there is another library (or scikit API) that can do the same linkage clustering on a precomputed, sparse Distance matrix -- if there were no entry for a given (element1, element2) pair, the API would not link them and everything else would be the same.

Iterate through columns of an array to standardize data

So I wrote a function to standardize my data, but I'm having trouble making it work. I want to iterate through an array of my data and standardize it.
Here's my function.
I've tried transposing my arr, but it still doesn't work:
def Scaling(arr,data):
    scaled=[[]]
    for a in arr.T:
        scaled = ((a-data.mean())/(data.std()))
    scaled = np.asarray(scaled)
    return scaled
When I run my code I only get a 1D array as the output instead of 10D.
Because data.mean() and data.std() are aggregated constants (scalars), consider running the needed arithmetic directly on the entire array, without any for loops. Each scalar is applied to every element of the array in a vectorized operation:
def Scaling(arr,data):
    return (arr.T-data.mean())/(data.std())
Your current for loop only returns the last assignment of the loop. You initialize an empty nested list but never append to it; in fact you re-assign scaled to a new array on each iteration. Ideally you would append the arrays to a collection and concatenate them outside the loop, as sketched below. Nonetheless, this type of operation is not needed with simple matrix algebra.
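For completeness, a sketch of that loop-and-append approach (the helper name Scaling_loop is hypothetical):
import numpy as np
def Scaling_loop(arr, data):
    scaled = []
    for a in arr.T:                                    # iterate over columns of arr
        scaled.append((a - data.mean()) / data.std())  # scale one column at a time
    return np.asarray(scaled)                          # stack the columns into a 2D array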
To demonstrate with seeded data (this can be revised with the OP's actual data), see below, with an exaggerated sequential input array to show the end calculations:
import numpy as np
np.random.seed(12919)
data = np.arange(10)
arr = np.concatenate([np.ones((5, 1)),
                      np.ones((5, 1))+1,
                      np.ones((5, 1))+2,
                      np.ones((5, 1))+3,
                      np.ones((5, 1))+4], axis=1)
def Scaling(arr,data):
    return (arr.T-data.mean())/(data.std())
new_arr = Scaling(arr, data)
print(arr)
# [[1. 2. 3. 4. 5.]
# [1. 2. 3. 4. 5.]
# [1. 2. 3. 4. 5.]
# [1. 2. 3. 4. 5.]
# [1. 2. 3. 4. 5.]]
print(new_arr)
# [[-1.21854359 -1.21854359 -1.21854359 -1.21854359 -1.21854359]
# [-0.87038828 -0.87038828 -0.87038828 -0.87038828 -0.87038828]
# [-0.52223297 -0.52223297 -0.52223297 -0.52223297 -0.52223297]
# [-0.17407766 -0.17407766 -0.17407766 -0.17407766 -0.17407766]
# [ 0.17407766 0.17407766 0.17407766 0.17407766 0.17407766]]

Why do I get rows of zeros in my 2D fft?

I am trying to replicate the results from a paper.
"Two-dimensional Fourier Transform (2D-FT) in space and time along sections of constant latitude (east-west) and longitude (north-south) were used to characterize the spectrum of the simulated flux variability south of 40degS." - Lenton et al(2006)
The figures published show "the log of the variance of the 2D-FT".
I have tried to create an array consisting of the seasonal cycle of similar data as well as the noise. I have defined the noise as the original array minus the signal array.
Here is the code that I used to plot the 2D-FT of the signal array averaged in latitude:
import numpy as np
from numpy import ma
from matplotlib import pyplot as plt
from Scientific.IO.NetCDF import NetCDFFile
### input directory
indir = '/home/nicholas/data/'
### get the flux data which is in
### [time(5day ave for 10 years),latitude,longitude]
nc = NetCDFFile(indir + 'CFLX_2000_2009.nc','r')
cflux_southern_ocean = nc.variables['Cflx'][:,10:50,:]
cflux_southern_ocean = ma.masked_values(cflux_southern_ocean,1e+20) # mask land
nc.close()
cflux = cflux_southern_ocean*1e08 # change units of data from mmol/m^2/s
### create an array that consists of the seasonal signal for each pixel
year_stack = np.split(cflux, 10, axis=0)
year_stack = np.array(year_stack)
signal_array = np.tile(np.mean(year_stack, axis=0), (10, 1, 1))
signal_array = ma.masked_where(signal_array > 1e20, signal_array) # need to mask
### average the array over latitude(or longitude)
signal_time_lon = ma.mean(signal_array, axis=1)
### do a 2D Fourier Transform of the time/space image
ft = np.fft.fft2(signal_time_lon)
mgft = np.abs(ft)
ps = mgft**2
log_ps = np.log(mgft)
log_mgft= np.log(mgft)
Every second row of the ft consists completely of zeros. Why is this?
Would it be acceptable to add a small random number to the signal to avoid this?
signal_time_lon = signal_time_lon + np.random.randint(0,9,size=(730, 182))*1e-05
EDIT: Adding images and clarifying meaning
The output of rfft2 still appears to be a complex array. Using fftshift shifts the edges of the image to the centre; I still have a power spectrum regardless. I expect that the reason that I get rows of zeros is that I have re-created the timeseries for each pixel. The ft[0, 0] pixel contains the mean of the signal. So the ft[1, 0] corresponds to a sinusoid with one cycle over the entire signal in the rows of the starting image.
Here is the starting image, produced with the following code:
plt.pcolormesh(signal_time_lon); plt.colorbar(); plt.axis('tight')
Here is the result, using the following code:
ft = np.fft.rfft2(signal_time_lon)
mgft = np.abs(ft)
ps = mgft**2
log_ps = np.log1p(mgft)
plt.pcolormesh(log_ps); plt.colorbar(); plt.axis('tight')
It may not be clear in the image, but only every second row contains all zeros. Every tenth pixel (log_ps[10, 0]) has a high value; the other pixels (log_ps[2, 0], log_ps[4, 0], etc.) have very low values.
Consider the following example:
In [59]: from scipy import absolute, fft
In [60]: absolute(fft([1,2,3,4]))
Out[60]: array([ 10. , 2.82842712, 2. , 2.82842712])
In [61]: absolute(fft([1,2,3,4, 1,2,3,4]))
Out[61]:
array([ 20. , 0. , 5.65685425, 0. ,
4. , 0. , 5.65685425, 0. ])
In [62]: absolute(fft([1,2,3,4, 1,2,3,4, 1,2,3,4]))
Out[62]:
array([ 30. , 0. , 0. , 8.48528137,
0. , 0. , 6. , 0. ,
0. , 8.48528137, 0. , 0. ])
If X[k] = fft(x) for a signal x of length N, and Y[k] = fft([x x]) (the signal repeated twice), then Y[2k] = 2*X[k] for k in {0, 1, ..., N-1}, and the odd-indexed entries of Y are zero.
Therefore, I would look into how your signal_time_lon is being tiled. That may be where the problem lies.
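For what it's worth, a NumPy-only version of the same check (recent SciPy exposes scipy.fft as a submodule rather than a function, so the session above will not run verbatim there):
import numpy as np
x = np.array([1, 2, 3, 4])
# Repeating the signal doubles the even-indexed bins and zeroes the odd-indexed ones
print(np.abs(np.fft.fft(x)))              # [10.  2.828  2.  2.828]
print(np.abs(np.fft.fft(np.tile(x, 2))))  # [20.  0.  5.657  0.  4.  0.  5.657  0.]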
