Iterate through columns of an array to standardize data

I wrote a function to standardize my data, but I'm having trouble making it work. I want to iterate through the columns of an array of my data and standardize each one. I've tried transposing my arr, but it still doesn't work.
Here's my function:
    def Scaling(arr,data):
        scaled=[[]]
        for a in arr.T:
            scaled = ((a-data.mean())/(data.std()))
            scaled = np.asarray(scaled)
        return scaled
When I run my code I only get a 1D array as the output instead of 10D.

Because data.mean() and data.std() are aggregated constants (scalars), consider running the arithmetic directly on the entire array without any for loop. Each constant is applied to every element of the array in a single vectorized operation:
    def Scaling(arr,data):
        return (arr.T-data.mean())/(data.std())
Your current for loop only returns the result of its last iteration. You initialize an empty nested list but never append to it; instead, you re-assign scaled to a new array on each iteration. To make the loop work, you would append each array to a collection and concatenate them after the loop. Nonetheless, this type of operation is not needed with simple matrix algebra.
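For comparison, the loop can be made to work by collecting each scaled column and stacking the results afterwards. This is only a minimal sketch, not the OP's actual code: the name scaling_loop and the sample arrays are illustrative, and data is assumed to be a NumPy array.

```python
import numpy as np

def scaling_loop(arr, data):
    """Scale each column of arr by the overall mean and std of data."""
    mean, std = data.mean(), data.std()
    scaled_columns = []
    for column in arr.T:          # iterating over arr.T yields the columns of arr
        scaled_columns.append((column - mean) / std)
    # Stack the 1D results side by side so the output has the same shape as arr
    return np.column_stack(scaled_columns)

data = np.arange(10)
arr = np.arange(15, dtype=float).reshape(5, 3)

looped = scaling_loop(arr, data)
vectorized = (arr - data.mean()) / data.std()
print(looped.shape)                      # (5, 3)
print(np.allclose(looped, vectorized))   # True
```

Note that this sketch keeps the orientation of arr, whereas a function that operates on arr.T returns the transposed result.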
To demonstrate with sample data (can be replaced with the OP's actual data), see below, using an exaggerated sequential input array to show the end calculations:
    import numpy as np

    np.random.seed(12919)
    data = np.arange(10)
    arr = np.concatenate([np.ones((5, 1)),
                          np.ones((5, 1))+1,
                          np.ones((5, 1))+2,
                          np.ones((5, 1))+3,
                          np.ones((5, 1))+4], axis=1)

    def Scaling(arr,data):
        return (arr.T-data.mean())/(data.std())

    new_arr = Scaling(arr, data)
    print(arr)
    # [[1. 2. 3. 4. 5.]
    #  [1. 2. 3. 4. 5.]
    #  [1. 2. 3. 4. 5.]
    #  [1. 2. 3. 4. 5.]
    #  [1. 2. 3. 4. 5.]]
    print(new_arr)
    # [[-1.21854359 -1.21854359 -1.21854359 -1.21854359 -1.21854359]
    #  [-0.87038828 -0.87038828 -0.87038828 -0.87038828 -0.87038828]
    #  [-0.52223297 -0.52223297 -0.52223297 -0.52223297 -0.52223297]
    #  [-0.17407766 -0.17407766 -0.17407766 -0.17407766 -0.17407766]
    #  [ 0.17407766  0.17407766  0.17407766  0.17407766  0.17407766]]
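If the goal is instead to standardize each column by its own mean and standard deviation (an assumption about the OP's intent, since the question does not say what data holds), the statistics can be computed per column with axis=0 and broadcast across the rows. The function name standardize_columns and the random sample array are hypothetical.

```python
import numpy as np

def standardize_columns(arr):
    """Subtract each column's mean and divide by each column's std."""
    col_mean = arr.mean(axis=0)          # shape (n_columns,)
    col_std = arr.std(axis=0)            # shape (n_columns,)
    return (arr - col_mean) / col_std    # broadcasts across the rows

rng = np.random.default_rng(0)
arr = rng.normal(loc=5.0, scale=2.0, size=(100, 10))
scaled = standardize_columns(arr)

print(scaled.shape)                            # (100, 10)
print(np.allclose(scaled.mean(axis=0), 0.0))   # True
print(np.allclose(scaled.std(axis=0), 1.0))    # True
```

A column with zero standard deviation would cause a division by zero here, so real data may need a guard for constant columns.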

Related

How to multiply two arrays of different shape in numpy to get a matrix [duplicate]

I have, for example, two arrays, a and b. Array a has a length of 3. Array b has an arbitrary length.
I would like to do the following with a numpy approach:
    tem_res = 0
    for i in range(3):
        tem_res += a[i] * b
a can be treated as a vector of scalar values for multiplication. Basically, I want a matrix with 3 rows, each the same length as b, where each row is b multiplied by a's value at the corresponding index. However, because of the different shapes, I do not see any way to do this without a loop (or list comprehension).
How can the example above be implemented purely with numpy (and without any Python loop)? I already checked the documentation, but the same shape always seems to be a condition.

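You need to read about NumPy broadcasting. Putting a 1 in the first dimension of b forces broadcasting along that axis; the reshape only changes the shape and strides and does not copy the data.
    tem_res = a * b.reshape([1,-1])
This can also be written as follows, which also works when b has more than one dimension:
    tem_res = a * b[None,:]
Note that this works as written when a is 2D with a last dimension matching b, as in the example below. For a 1D a of length 3 and a b of arbitrary length, a also needs a trailing axis of size 1, i.e. a[:,None] * b[None,:].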
Example:
    import numpy as np
    a = np.ones([3,4]) # 3x4 array of ones
    b = np.zeros([4]) # 1D array of 4 zeros
    c = a * b.reshape([1,-1]) # b.reshape is now 1x4, it can be multiplied by 3x4
    print(c) # confirm it is a 3x4 array
    # [[0. 0. 0. 0.]
    #  [0. 0. 0. 0.]
    #  [0. 0. 0. 0.]]
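For the case described in the question (a 1D a of length 3 and a 1D b of arbitrary length), a small sketch of the broadcasting approach follows. The sample values are made up; np.outer is shown as an equivalent for 1D inputs.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])   # length 3
b = np.arange(5, dtype=float)   # arbitrary length

# Add a trailing axis to a and a leading axis to b: (3, 1) * (1, 5) -> (3, 5)
matrix = a[:, None] * b[None, :]

# np.outer computes the same thing for 1D inputs
same = np.outer(a, b)

# Summing the rows reproduces the accumulation loop from the question
loop_result = np.zeros_like(b)
for i in range(3):
    loop_result += a[i] * b

print(matrix.shape)                                  # (3, 5)
print(np.array_equal(matrix, same))                  # True
print(np.allclose(matrix.sum(axis=0), loop_result))  # True
```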

Multiply same numpy array with scalars multiple times

I have a 3D NumPy array of size (9,9,200) and a 2D array of size (200,200).
I want to take each channel of shape (9,9,1) and generate an array (9,9,200), every channel multiplied 200 times by 1 scalar in a single row, and average it such that the resultant array is (9,9,1).
Basically, if there are n channels in an input array, I want each channel multiplied n times and averaged - and this should happen for all channels. Is there an efficient way to do so?
So far what I have is this:
    import numpy as np
    arr = np.random.rand(9,9,200)
    nchannel = arr.shape[-1]
    transform = np.array([np.random.uniform(low=0.0, high=1.0, size=(nchannel,)) for i in range(nchannel)])
    for channel in range(nchannel):
        # The below line needs optimization
        temp = [arr[:,:,i] * transform[channel][i] for i in range(nchannel)]
        arr[:,:,channel] = np.sum(temp, axis=0)/nchannel
Edit :
A sample image demonstrating what I am looking for. Here nchannel = 3.
The input image is arr. The final image is the transformed arr.

EDIT:
    import numpy as np
    n_channels = 3
    scalar_size = 2
    t = np.ones((n_channels,scalar_size,scalar_size)) # scalar array
    m = np.random.random((n_channels,n_channels)) # letters array
    print(m)
    print(t)
    m_av = np.mean(m, axis=1)
    print(m_av)
    for i in range(n_channels):
        t[i] = t[i]*m_av[i]
    print(t)
output:
    [[0.04601533 0.05851365 0.03893352]
     [0.7954655  0.08505869 0.83033369]
     [0.59557455 0.09632997 0.63723506]]
    [[[1. 1.]
      [1. 1.]]
     [[1. 1.]
      [1. 1.]]
     [[1. 1.]
      [1. 1.]]]
    [0.04782083 0.57028596 0.44304653]
    [[[0.04782083 0.04782083]
      [0.04782083 0.04782083]]
     [[0.57028596 0.57028596]
      [0.57028596 0.57028596]]
     [[0.44304653 0.44304653]
      [0.44304653 0.44304653]]]

What you're asking for is a simple matrix multiplication along the last axis:
    import numpy as np
    arr = np.random.rand(9,9,200)
    transform = np.random.uniform(size=(200, 200)) / 200
    arr = arr @ transform
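The matrix-product approach can be checked against the loop from the question. The sketch below uses a smaller channel count for speed and writes the loop result into a separate array (an assumption about the intent, since the question's loop overwrites arr while still reading from it). Because the question indexes transform[channel][i], the equivalent product uses the transpose of transform.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random((9, 9, 4))
nchannel = arr.shape[-1]
transform = rng.random((nchannel, nchannel))

# Loop version, writing into a separate output so the input is not
# overwritten while it is still being read
expected = np.empty_like(arr)
for channel in range(nchannel):
    weighted = [arr[:, :, i] * transform[channel, i] for i in range(nchannel)]
    expected[:, :, channel] = np.sum(weighted, axis=0) / nchannel

# Matrix product over the last axis; transform[channel, i] weights input
# channel i for output channel `channel`, hence the transpose
result = arr @ transform.T / nchannel

print(result.shape)                   # (9, 9, 4)
print(np.allclose(result, expected))  # True
```

With a purely random transform, as in the answer, the transpose makes no statistical difference, but it matters for a fixed set of weights.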

How to vectorize passing a function to two numpy arrays: 3D and 2D?

I have two multidimensional numpy arrays: x is 3D and y is 2D.
If I have a function foo(a, b), which takes as inputs two 2D arrays, how can I pass to foo my multidimensional arrays and iterate over x's 3rd dimension in a vectorized way in order to get a list of foo's results?
I have been trying to do this with np.vectorize, but it iterates through the rows of the arrays and yields an error, so I am stuck.

You can specify the function's signature using the signature keyword. This will, however, try to use the last dimensions of each input, so you'd have to transpose manually. Example:
    import numpy as np
    F = np.vectorize(np.matmul, signature='(m,n),(n,l)->(m,l)', otypes=(float,))
    A = np.arange(12).reshape(2, 2, 3)
    B = np.diag((1.5, 2.5))
    F(A.transpose(2,0,1), B)
    # array([[[ 0. ,  7.5],
    #         [ 9. , 22.5]],
    #
    #        [[ 1.5, 10. ],
    #         [10.5, 25. ]],
    #
    #        [[ 3. , 12.5],
    #         [12. , 27.5]]])
As pointed out by @hpaulj in the comments, vectorize is a convenience function, not a performance enhancer.
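Since np.vectorize does not improve performance, two alternatives are sketched here: an explicit loop over the third axis, and, for the specific case where the function is a matrix product, NumPy's built-in broadcasting of matmul over leading axes. The function foo is a hypothetical stand-in for the question's function; for an arbitrary foo only the loop version applies.

```python
import numpy as np

def foo(a, b):
    """Hypothetical stand-in for the question's function of two 2D arrays."""
    return a @ b

x = np.arange(12, dtype=float).reshape(2, 2, 3)   # slices along the 3rd axis
y = np.diag((1.5, 2.5))

# Explicit loop over the third axis, stacked into one (3, 2, 2) array
looped = np.stack([foo(x[:, :, k], y) for k in range(x.shape[2])])

# np.matmul broadcasts over leading axes, so moving the third axis to the
# front gives the same result without np.vectorize
broadcast = np.moveaxis(x, 2, 0) @ y

print(looped.shape)                    # (3, 2, 2)
print(np.allclose(looped, broadcast))  # True
```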

Reading 2d arrays into a 3d array in python

I searched stackoverflow but could not find an answer to this specific question. Sorry if it is a naive question, I am a newbie to python.
I have several 2d arrays (or lists) that I would like to read into a 3d array (list) in python. In Matlab, I can simply do
    for i=1:N
        # read 2d array "a"
        newarray(:,:,i)=a(:,:)
    end
so newarray is a 3d array with "a" being the 2d slices arranged along the 3rd dimension.
Is there a simple way to do this in python?
Edit: I am currently trying the following:
    for file in files:
        img=mpimg.imread(file)
        newarray=np.array(0.289*cropimg[:,:,0]+0.5870*cropimg[:,:,1]+0.1140*cropimg[:,:,2])
        i=i+1
I tried newarray[:,:,i] and it gives me an error:
    NameError: name 'newarray' is not defined
Seems like I have to define newarray as a numpy array? Not sure.
Thanks!

If you're familiar with MATLAB, translating that into using NumPy is fairly straightforward.
Let's say you have a couple of arrays:
    a = np.eye(3)
    b = np.arange(9).reshape((3, 3))
    print(a)
    # [[ 1.  0.  0.]
    #  [ 0.  1.  0.]
    #  [ 0.  0.  1.]]
    print(b)
    # [[0 1 2]
    #  [3 4 5]
    #  [6 7 8]]
If you simply want to put them into another dimension, pass them both to the array constructor in an iterable (e.g. a list) like so:
    x = np.array([a, b])
    print(x)
    # [[[ 1.  0.  0.]
    #   [ 0.  1.  0.]
    #   [ 0.  0.  1.]]
    #
    #  [[ 0.  1.  2.]
    #   [ 3.  4.  5.]
    #   [ 6.  7.  8.]]]
NumPy is smart enough to recognize that the arrays are all the same size and creates a new dimension to hold them all.
    print(x.shape)
    # (2, 3, 3)
You can loop through it, but if you want to apply the same operations across some dimensions, I would strongly suggest you use broadcasting so that NumPy can vectorize the operation and it runs a whole lot faster.
For example, across one dimension, let's multiply one slice by 2 and another by 3. (If the multiplier is not a pure scalar, we need to reshape it to the same number of dimensions as the array in order to broadcast; the size along each dimension then needs to either match the array or be 1.) Note that I'm working along the 0th axis; your image is probably different. I don't have a handy image to load up to toy with.
    y = x * np.array([2, 3]).reshape((2, 1, 1))
    print(y)
    # [[[  2.   0.   0.]
    #   [  0.   2.   0.]
    #   [  0.   0.   2.]]
    #
    #  [[  0.   3.   6.]
    #   [  9.  12.  15.]
    #   [ 18.  21.  24.]]]
Then we can add them up:
    z = np.sum(y, axis=0)
    print(z)
    # [[  2.   3.   6.]
    #  [  9.  14.  15.]
    #  [ 18.  21.  26.]]
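Applied to the question's setting, where the 2D slices should end up along the third axis as in MATLAB, a rough sketch could look like the following. The random arrays stand in for images read from files, and the weights are the coefficients that appear in the question's code; everything else is illustrative.

```python
import numpy as np

# Stand-ins for images loaded from files: three RGB arrays of equal size
rng = np.random.default_rng(0)
images = [rng.random((4, 5, 3)) for _ in range(3)]

# Channel weights copied from the question's code
weights = np.array([0.289, 0.5870, 0.1140])

# Weighted sum over the colour axis turns each (4, 5, 3) image into (4, 5)
gray_slices = [img @ weights for img in images]

# Stack the 2D slices along a new last axis, like newarray(:,:,i) in MATLAB
newarray = np.stack(gray_slices, axis=-1)

print(newarray.shape)   # (4, 5, 3)
```

Collecting the slices in a list and stacking once avoids having to define newarray before the loop, which is what triggered the NameError in the question.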

If you're using NumPy arrays, you can translate almost directly from Matlab:
    for i in range(1, N+1):
        # read 2d array "a"
        newarray[:, :, i] = a[:, :]
Of course you'd probably want to use range(N), because arrays use 0-based indexing. And obviously you're going to need to pre-create newarray in some way, just as you'd have to in Matlab, but you can translate that pretty directly too. (Look up the zeros function if you're not sure how.)
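A minimal sketch of the pre-allocation approach with np.zeros and 0-based indexing is shown here. The sizes and the np.full call that stands in for reading each 2D array are made up for illustration.

```python
import numpy as np

N = 4                      # number of 2D slices (illustrative)
rows, cols = 3, 3          # shape of each slice (illustrative)

# Pre-create the 3D array, then fill one slice per iteration
newarray = np.zeros((rows, cols, N))
for i in range(N):         # 0-based, so i runs from 0 to N-1
    a = np.full((rows, cols), float(i))   # stand-in for reading 2D array "a"
    newarray[:, :, i] = a

print(newarray.shape)      # (3, 3, 4)
print(newarray[0, 0, :])   # [0. 1. 2. 3.]
```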
If you're using lists, you can't do this directly—but you probably don't want to anyway. A better solution would be to build up a list of 2D lists on the fly:
    newarray = []
    for i in range(N):
        # read 2d list of lists "a"
        newarray.append(a)
Or, more simply:
newarray = [read_next_2d_list_of_lists() for i in range(N)]
Or, even better, make that read function a generator, then just:
newarray = list(read_next_2d_list_of_lists())
If you want to transpose the order of the axes, you can use the zip function for that.
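The generator and zip ideas can be sketched with plain lists as follows. The generator read_2d_lists is hypothetical and simply fabricates small 2D lists in place of real file reads.

```python
def read_2d_lists(n):
    """Hypothetical generator standing in for reading n 2D lists of lists."""
    for i in range(n):
        yield [[i, i + 10], [i + 20, i + 30]]

newarray = list(read_2d_lists(3))   # indexed as newarray[i][row][col]

# zip(*...) swaps the first two axes: result is indexed as [row][i][col]
swapped = [list(group) for group in zip(*newarray)]

print(newarray[2][0])   # [2, 12]
print(swapped[0][2])    # [2, 12]
```

Here zip only exchanges the first two axes; moving the slice index to the last position would need a further zip at the inner level.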

How to change chunks of data in a numpy array

I have a large 1-dimensional numpy array of data in Python and want entries x (500) to y (520) to be changed to equal 1. I could use a for loop, but is there a neater, faster numpy way of doing this?
    for x in range(500,520):
        numpyArray[x] = 1.
Here is the for loop that could be used, but it seems like there could be a function in numpy that I'm missing. I'd rather not use the masked arrays that numpy offers.

You can use slice notation inside [] to access a range of elements and assign to all of them at once:
    import numpy as np
    a = np.ones((10))
    print(a) # Original array
    # [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
    startindex = 2
    endindex = 4
    a[startindex:endindex] = 0
    print(a) # modified array
    # [ 1.  1.  0.  0.  1.  1.  1.  1.  1.  1.]
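Applied to the indices from the question, the same slice assignment might look like this. The array name and size are illustrative.

```python
import numpy as np

numpy_array = np.zeros(1000)
x, y = 500, 520

# Slices exclude the end index, so this sets entries 500 through 519,
# the same entries that range(500, 520) visits
numpy_array[x:y] = 1.0

print(numpy_array.sum())       # 20.0
print(numpy_array[499:501])    # [0. 1.]
```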
