I have a 3D NumPy array of size (9,9,200) and a 2D array of size (200,200).
For each output channel, I want to multiply all 200 input channels of shape (9, 9) element-wise by the 200 scalars in the corresponding row of the 2D array, then average them so that the result for that channel is again (9, 9).
Basically, if there are n channels in the input array, each output channel should be the average of all n input channels weighted by n scalars, and this should happen for every channel. Is there an efficient way to do this?
So far what I have is this -
import numpy as np

arr = np.random.rand(9, 9, 200)
nchannel = arr.shape[-1]
transform = np.random.uniform(low=0.0, high=1.0, size=(nchannel, nchannel))

for channel in range(nchannel):
    # The line below needs optimization
    temp = [arr[:, :, i] * transform[channel][i] for i in range(nchannel)]
    arr[:, :, channel] = np.sum(temp, axis=0) / nchannel
Edit:
A sample image demonstrating what I am looking for, with nchannel = 3. The input image is arr; the final image is the transformed arr.
EDIT:

import numpy as np

n_channels = 3
scalar_size = 2
t = np.ones((n_channels, scalar_size, scalar_size))  # scalar array
m = np.random.random((n_channels, n_channels))       # letters array
print(m)
print(t)

m_av = np.mean(m, axis=1)  # average each row of m
print(m_av)

for i in range(n_channels):
    t[i] = t[i] * m_av[i]
print(t)
output:

[[0.04601533 0.05851365 0.03893352]
 [0.7954655  0.08505869 0.83033369]
 [0.59557455 0.09632997 0.63723506]]
[[[1. 1.]
  [1. 1.]]

 [[1. 1.]
  [1. 1.]]

 [[1. 1.]
  [1. 1.]]]
[0.04782083 0.57028596 0.44304653]
[[[0.04782083 0.04782083]
  [0.04782083 0.04782083]]

 [[0.57028596 0.57028596]
  [0.57028596 0.57028596]]

 [[0.44304653 0.44304653]
  [0.44304653 0.44304653]]]
What you're asking for is a simple matrix multiplication along the last axis; dividing the transform matrix by the number of channels folds the averaging into the product:
import numpy as np

arr = np.random.rand(9, 9, 200)
transform = np.random.uniform(size=(200, 200)) / 200  # the /200 performs the averaging
arr = arr @ transform
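As a quick sanity check, here is a sketch with random data comparing the matmul against an explicit loop. Note that the loop writes into a separate output array: the loop in the question overwrites arr while iterating, so later channels would be computed from already-transformed ones. Also note the orientation: with arr @ transform, column c of transform weights output channel c, so a transform whose rows index output channels (as in the question) would be applied as arr @ transform.T.

import numpy as np

arr = np.random.rand(9, 9, 200)
transform = np.random.uniform(size=(200, 200)) / 200

# Explicit loop that always reads the original channels,
# writing into a separate output array.
out_loop = np.empty_like(arr)
for channel in range(arr.shape[-1]):
    out_loop[:, :, channel] = sum(
        arr[:, :, i] * transform[i, channel] for i in range(arr.shape[-1])
    )

out_matmul = arr @ transform  # (9, 9, 200) @ (200, 200) -> (9, 9, 200)
print(np.allclose(out_loop, out_matmul))  # True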
Related
So I wrote a function to standardize my data, but I'm having trouble making it work. I want to iterate through an array of my data and standardize it.
Here's my function. I've tried transposing my arr, but it still doesn't work:
def Scaling(arr, data):
    scaled = [[]]
    for a in arr.T:
        scaled = ((a - data.mean()) / (data.std()))
    scaled = np.asarray(scaled)
    return scaled
When I run my code I only get a 1D array as the output instead of 10D.
Because data.mean() and data.std() are scalar aggregates, consider running the arithmetic directly on the entire array without any for loop. Each scalar is applied to every column of the array in a single vectorized operation:
def Scaling(arr, data):
    return (arr.T - data.mean()) / data.std()
Your current for loop only returns the last assignment of the loop: you initialize an empty nested list but never append to it, re-assigning scaled on every iteration instead. Normally you would append each array to a collection and concatenate them outside the loop. With simple matrix algebra, however, no loop is needed at all.
To demonstrate with seeded data (which can be replaced with the OP's actual data), the example below uses an exaggerated sequential input array so the end calculations are easy to verify:
import numpy as np

np.random.seed(12919)

data = np.arange(10)
arr = np.concatenate([np.ones((5, 1)),
                      np.ones((5, 1)) + 1,
                      np.ones((5, 1)) + 2,
                      np.ones((5, 1)) + 3,
                      np.ones((5, 1)) + 4], axis=1)

def Scaling(arr, data):
    return (arr.T - data.mean()) / data.std()

new_arr = Scaling(arr, data)
print(arr)
# [[1. 2. 3. 4. 5.]
# [1. 2. 3. 4. 5.]
# [1. 2. 3. 4. 5.]
# [1. 2. 3. 4. 5.]
# [1. 2. 3. 4. 5.]]
print(new_arr)
# [[-1.21854359 -1.21854359 -1.21854359 -1.21854359 -1.21854359]
# [-0.87038828 -0.87038828 -0.87038828 -0.87038828 -0.87038828]
# [-0.52223297 -0.52223297 -0.52223297 -0.52223297 -0.52223297]
# [-0.17407766 -0.17407766 -0.17407766 -0.17407766 -0.17407766]
# [ 0.17407766 0.17407766 0.17407766 0.17407766 0.17407766]]
I am doing logistic regression on the iris dataset from sklearn. I know the math and am trying to implement it myself. At the final step, I get a prediction vector that represents the probability of each data point belonging to class 1 or class 2 (binary classification).
Now I want to turn this prediction vector into a target vector: if the probability is greater than 50%, the corresponding data point belongs to class 1, otherwise class 2. I use 0 to represent class 1 and 1 for class 2.
I know there is a for loop version of this that just loops through the whole vector, but when the size gets large, a for loop is very expensive. I want to do it more efficiently, like numpy's matrix operations, which are faster than the same operations in a for loop.
Any suggestion for a faster method?
import numpy as np
a = np.matrix('0.1 0.82')
print(a)
a[a > 0.5] = 1
a[a <= 0.5] = 0
print(a)
Output:
[[ 0.1 0.82]]
[[ 0. 1.]]
Update: np.where does the same thing in a single vectorized expression:
import numpy as np
a = np.matrix('0.1 0.82')
print(a)
a = np.where(a > 0.5, 1, 0)
print(a)
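One practical difference worth noting (a small sketch): np.where builds a fresh integer label array and leaves the probabilities untouched, whereas the in-place masking above overwrites the float array:

import numpy as np

a = np.array([0.1, 0.82])
labels = np.where(a > 0.5, 1, 0)  # new integer array; a is unchanged
print(labels)                     # [0 1]
print(a)                          # [0.1  0.82]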
A more general solution for a 2D array containing many vectors with many classes:
import numpy as np

a = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.2, 0.7],
              [1.0, 0.0, 0.0]])

idx = np.argmax(a, axis=-1)        # winning class per row
a = np.zeros(a.shape)
a[np.arange(a.shape[0]), idx] = 1  # set the winner in each row to 1
print(a)
Output:
[[1. 0. 0.]
[0. 0. 1.]
[1. 0. 0.]]
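An equivalent sketch that builds the same one-hot rows by indexing an identity matrix with the argmax indices:

import numpy as np

a = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.2, 0.7],
              [1.0, 0.0, 0.0]])

# Row k of the identity matrix is the one-hot vector for class k,
# so fancy-indexing it with the argmax picks the right row per sample.
one_hot = np.eye(a.shape[1])[np.argmax(a, axis=-1)]
print(one_hot)  # same output as above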
Option 1: If you are doing binary classification and have a 1d prediction vector, then your solution is numpy.round:
prob = model.predict(X_test)
Y = np.round(prob)
Option 2: If you have an n-dimensional one-hot prediction matrix but want labels, you can use numpy.argmax. This returns a 1d vector of labels:
prob = model.predict(X_test)
y = np.argmax(prob, axis=1)
In case you want to proceed with a confusion matrix etc. afterwards and get back the original format of a target variable in scikit, array([1 0 ... 1]), you can use:
a = clf.predict_proba(X_test)[:,1]
a = np.where(a>0.5, 1, 0)
The [:, 1] refers to the second class (in my case: 1); the first class in my case was 0.
For multi-class, or as a more generalized solution, use
np.argmax(y_hat, axis=1)
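For reference, a self-contained sketch of both options, with dummy arrays standing in for the model.predict output:

import numpy as np

# Option 1: 1-D probabilities from a binary classifier.
prob = np.array([0.1, 0.82, 0.4, 0.9])
print(np.round(prob))            # [0. 1. 0. 1.]

# Option 2: per-class probability matrix -> label vector.
prob2 = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.2, 0.7]])
print(np.argmax(prob2, axis=1))  # [0 2]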
When generating a linspace array in NumPy, we get an array of shape (len(array),), i.e. it doesn't have any 2nd dimension. How do I generate a similar array and initialize it using NumPy zeros? np.zeros seems to take a 2nd shape argument, like 1, so I get (len(array), 1), which I wanted to avoid if possible.
E.g. np.linspace(0, 10, 5) gives [0, 2.5, 5, 7.5, 10], and its dimension is (5,).
On the other hand, a zeros array defined as np.zeros((5, 1)) produces the column vector [0 0 0 0 0]^T. I want a flat array, not a column vector.
Is there a way?
Your first argument (5, 1) explicitly defines the shape of the array as 2-D, 5x1. Just pass (5,), or more explicitly:
import numpy as np
z = np.zeros(shape=(5,), dtype=float)
print(z)
print(z.shape)
output is:
[ 0. 0. 0. 0. 0.]
(5,)
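Relatedly, if the linspace array is already in hand, np.zeros_like gives a flat zeros array with the same shape and dtype (a small sketch):

import numpy as np

x = np.linspace(0, 10, 5)  # shape (5,)
z = np.zeros_like(x)       # zeros matching x's shape and dtype
print(z, z.shape)          # [0. 0. 0. 0. 0.] (5,)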
I am trying to save the results from a loop in a np.array.
import numpy as np

p = np.array([])
points = np.array([[3, 0, 0], [-1, 0, 0]])

for i in points:
    for j in points:
        if j[0] != 0:
            n = i + j
            p = np.append(p, n)
However, the resulting array is a 1D array with 6 elements:
[ 2.  0.  0. -2.  0.  0.]
Instead I am looking for, but have been unable to produce:
[[2,0,0],[-2,0,0]]
Is there any way to get the result above?
Thank you.
One possibility is to turn p into a list, and convert it into a NumPy array right at the end:
p = []
for i in points:
    ...
    p.append(n)
p = np.array(p)
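Filled in with the loops from the question, a minimal runnable sketch of the list-based version looks like this:

import numpy as np

points = np.array([[3, 0, 0], [-1, 0, 0]])

p = []
for i in points:
    for j in points:
        if j[0] != 0:
            p.append(i + j)

p = np.array(p)  # one conversion at the end; p has shape (len(p), 3)

Appending to a Python list is cheap (amortized O(1)), whereas np.append copies the whole array on every call, so this version also scales better.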
What you're looking for is vertically stacking your results:
import numpy as np

p = np.empty((0, 3))
points = np.array([[3, 0, 0], [-1, 0, 0]])

for i in points:
    for j in points:
        if j[0] != 0:
            n = i + j
            p = np.vstack((p, n))

print(p)
which gives:
[[ 2. 0. 0.]
[-2. 0. 0.]]
Although you could also reshape your result afterwards:
import numpy as np

p = np.array([])
points = np.array([[3, 0, 0], [-1, 0, 0]])

for i in points:
    for j in points:
        if j[0] != 0:
            n = i + j
            p = np.append(p, n)

p = np.reshape(p, (-1, 3))
print(p)
which gives the same result.
I must warn you, however, that your code fails if j[0] == 0, as that would leave n undefined...
Relevant documentation: np.vstack, np.empty, np.reshape.
I am trying to replicate the results from a paper.
"Two-dimensional Fourier Transform (2D-FT) in space and time along sections of constant latitude (east-west) and longitude (north-south) were used to characterize the spectrum of the simulated flux variability south of 40degS." - Lenton et al(2006)
The figures published show "the log of the variance of the 2D-FT".
I have tried to create an array consisting of the seasonal cycle of similar data as well as the noise. I have defined the noise as the original array minus the signal array.
Here is the code that I used to plot the 2D-FT of the signal array averaged in latitude:
import numpy as np
from numpy import ma
from matplotlib import pyplot as plt
from Scientific.IO.NetCDF import NetCDFFile
### input directory
indir = '/home/nicholas/data/'
### get the flux data which is in
### [time(5day ave for 10 years),latitude,longitude]
nc = NetCDFFile(indir + 'CFLX_2000_2009.nc','r')
cflux_southern_ocean = nc.variables['Cflx'][:,10:50,:]
cflux_southern_ocean = ma.masked_values(cflux_southern_ocean,1e+20) # mask land
nc.close()
cflux = cflux_southern_ocean*1e08 # change units of data from mmol/m^2/s
### create an array that consists of the seasonal signal for each pixel
year_stack = np.split(cflux, 10, axis=0)
year_stack = np.array(year_stack)
signal_array = np.tile(np.mean(year_stack, axis=0), (10, 1, 1))
signal_array = ma.masked_where(signal_array > 1e20, signal_array) # need to mask
### average the array over latitude(or longitude)
signal_time_lon = ma.mean(signal_array, axis=1)
### do a 2D Fourier Transform of the time/space image
ft = np.fft.fft2(signal_time_lon)
mgft = np.abs(ft)
ps = mgft**2
log_ps = np.log(mgft)
log_mgft = np.log(mgft)
Every second row of ft consists entirely of zeros. Why is this?
Would it be acceptable to add a small random number to the signal to avoid this?
signal_time_lon = signal_time_lon + np.random.randint(0,9,size=(730, 182))*1e-05
EDIT: adding images and clarifying meaning.
The output of rfft2 still appears to be a complex array. Using fftshift shifts the edges of the image to the centre; I still have a power spectrum regardless. I suspect the rows of zeros appear because I have re-created the time series for each pixel. The ft[0, 0] pixel contains the mean of the signal, so ft[1, 0] corresponds to a sinusoid with one cycle over the entire signal along the rows of the starting image.
Here is the starting image, produced with the following code:
plt.pcolormesh(signal_time_lon); plt.colorbar(); plt.axis('tight')
Here is the result, using the following code:
ft = np.fft.rfft2(signal_time_lon)
mgft = np.abs(ft)
ps = mgft**2
log_ps = np.log1p(mgft)
plt.pcolormesh(log_ps); plt.colorbar(); plt.axis('tight')
It may not be clear in the image, but only every second row contains all zeros. Every tenth pixel (log_ps[10, 0]) has a high value; the other pixels (log_ps[2, 0], log_ps[4, 0], etc.) have very low values.
Consider the following example:
In [59]: from scipy import absolute, fft
In [60]: absolute(fft([1,2,3,4]))
Out[60]: array([ 10. , 2.82842712, 2. , 2.82842712])
In [61]: absolute(fft([1,2,3,4, 1,2,3,4]))
Out[61]:
array([ 20. , 0. , 5.65685425, 0. ,
4. , 0. , 5.65685425, 0. ])
In [62]: absolute(fft([1,2,3,4, 1,2,3,4, 1,2,3,4]))
Out[62]:
array([ 30. , 0. , 0. , 8.48528137,
0. , 0. , 6. , 0. ,
0. , 8.48528137, 0. , 0. ])
If X[k] = fft(x) and Y[k] = fft([x x]), then Y[2k] = 2*X[k] for k in {0, 1, ..., N-1}, and Y is zero at the odd bins. More generally, repeating a signal m times leaves only every m-th DFT bin nonzero.
Therefore, I would look into how your signal_time_lon is being tiled; that may be where the problem lies.
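To connect this back to the question, here is a sketch with random data standing in for the flux field: tiling one year 10 times along the time axis, as the np.tile call in the question does, concentrates all of the energy in every 10th row of the 2D-FT, matching the observation that log_ps[10, 0] is large while the rows in between are numerically zero.

import numpy as np

# Stand-in for one seasonal cycle: 73 five-day steps x 182 longitudes.
one_year = np.random.rand(73, 182)
signal = np.tile(one_year, (10, 1))  # 10 identical years -> (730, 182)

ft = np.fft.fft2(signal)
row_energy = np.abs(ft).sum(axis=1)

# Only every 10th row carries energy; the rest are numerically zero.
nonzero_rows = np.nonzero(row_energy > 1e-8 * row_energy.max())[0]
print(nonzero_rows[:5])  # [ 0 10 20 30 40]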