I searched Stack Overflow but could not find an answer to this specific question. Sorry if it is a naive question; I am a newbie to Python.
I have several 2D arrays (or lists) that I would like to read into a 3D array (list) in Python. In MATLAB, I can simply do
for i = 1:N
    % read 2d array "a"
    newarray(:,:,i) = a(:,:)
end
so newarray is a 3D array with "a" being the 2D slices arranged along the 3rd dimension.
Is there a simple way to do this in python?
Edit: I am currently trying the following:
for file in files:
    img = mpimg.imread(file)
    newarray = np.array(0.289*cropimg[:,:,0] + 0.5870*cropimg[:,:,1] + 0.1140*cropimg[:,:,2])
    i = i + 1
I tried newarray[:,:,i] and it gives me an error
NameError: name 'newarray' is not defined
Seems like I have to define newarray as a numpy array? Not sure.
Thanks!
If you're familiar with MATLAB, translating that into using NumPy is fairly straightforward.
Let's say you have a couple of arrays:
a = np.eye(3)
b = np.arange(9).reshape((3, 3))
print(a)
# [[ 1. 0. 0.]
# [ 0. 1. 0.]
# [ 0. 0. 1.]]
print(b)
# [[0 1 2]
# [3 4 5]
# [6 7 8]]
If you simply want to put them into another dimension, pass them both to the array constructor in an iterable (e.g. a list) like so:
x = np.array([a, b])
print(x)
# [[[ 1. 0. 0.]
# [ 0. 1. 0.]
# [ 0. 0. 1.]]
#
# [[ 0. 1. 2.]
# [ 3. 4. 5.]
# [ 6. 7. 8.]]]
NumPy is smart enough to recognize that the arrays are all the same shape and creates a new dimension to hold them all.
print(x.shape)
# (2, 3, 3)
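If you need the new dimension to be the third axis instead, as in your MATLAB snippet, np.stack lets you choose where it goes. A minimal sketch, reusing a and b as stand-ins for your 2D slices:
newarray = np.stack([a, b], axis=2)  # slices arranged along the 3rd dimension
print(newarray.shape)
# (3, 3, 2)
print(newarray[:, :, 0])  # recovers "a", like newarray(:,:,1) in MATLAB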
You can loop through the stacked array, but if you want to apply the same operation across some dimension, I would strongly suggest you use broadcasting so that NumPy can vectorize the operation and it runs a whole lot faster.
For example, across one dimension, let's multiply one slice by 2 and the other by 3. (If the multiplier isn't a pure scalar, we need to reshape it to the same number of dimensions so it broadcasts; the size along each axis then needs to either match the array or be 1.) Note that I'm working along the 0th axis; your image is probably arranged differently, and I don't have a handy image to load up and toy with.
y = x * np.array([2, 3]).reshape((2, 1, 1))
print(y)
#[[[ 2. 0. 0.]
# [ 0. 2. 0.]
# [ 0. 0. 2.]]
#
# [[ 0. 3. 6.]
# [ 9. 12. 15.]
# [ 18. 21. 24.]]]
Then we can add them up
z = np.sum(y, axis=0)
print(z)
#[[ 2. 3. 6.]
# [ 9. 14. 15.]
# [ 18. 21. 26.]]
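Tying this back to your grayscale conversion: with the colour channels on the last axis, the same broadcasting idea collapses the weighted RGB channels in one vectorized step. A sketch using the weights from your snippet (imgs is a hypothetical stack of images):
imgs = np.random.rand(2, 4, 4, 3)            # 2 RGB images of size 4x4
weights = np.array([0.289, 0.5870, 0.1140])  # your channel weights
gray = (imgs * weights).sum(axis=-1)         # broadcasts over the channel axis
print(gray.shape)
# (2, 4, 4)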
If you're using NumPy arrays, you can translate almost directly from MATLAB:
for i in range(1, N+1):
    # read 2d array "a"
    newarray[:, :, i] = a[:, :]
Of course you'd probably want to use range(N), because arrays use 0-based indexing. And obviously you're going to need to pre-create newarray in some way, just as you'd have to in MATLAB, but you can translate that pretty directly too. (Look up the zeros function if you're not sure how; a sketch follows.)
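For instance, a minimal sketch of that pre-allocation (rows, cols, and N are hypothetical placeholders for your actual sizes):
import numpy as np
rows, cols, N = 3, 3, 5
newarray = np.zeros((rows, cols, N))  # like zeros(rows, cols, N) in MATLAB
for i in range(N):
    a = np.eye(rows)                  # stand-in for reading your 2D array
    newarray[:, :, i] = a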
If you're using lists, you can't do this directly—but you probably don't want to anyway. A better solution would be to build up a list of 2D lists on the fly:
newarray = []
for i in range(N):
    # read 2d list of lists "a"
    newarray.append(a)
Or, more simply:
newarray = [read_next_2d_list_of_lists() for i in range(N)]
Or, even better, make that read function a generator, then just:
newarray = list(read_next_2d_list_of_lists())
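For example, a sketch of what such a generator might look like (the filenames and whitespace-separated format are hypothetical stand-ins for however you actually read your data):
def read_2d_lists(filenames):
    # Lazily yield one 2D list of lists per file.
    for name in filenames:
        with open(name) as f:
            yield [[float(v) for v in line.split()] for line in f]

newarray = list(read_2d_lists(["a.txt", "b.txt"]))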
If you want to transpose the order of the axes, you can use the zip function for that.
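A minimal sketch of that zip-based transpose, swapping the first two axes of a nested list:
grid = [[1, 2, 3], [4, 5, 6]]
transposed = [list(row) for row in zip(*grid)]
print(transposed)
# [[1, 4], [2, 5], [3, 6]]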
I understand the concept of vectorization and how you can avoid looping over elements when you want to adjust each one individually. What I can't figure out is how to do this when there is a conditional based on the neighbouring values of a pixel.
For example, if I have a mask:
mask = np.array([[0,0,0,0],
                 [1,0,0,0],
                 [0,0,0,1],
                 [1,0,0,0]])
And I wanted to change an element by evaluating neighboring components in the mask, like so:
if sum(mask[j-1:j+2, i-1:i+2].flatten()) > 1 and mask[j,i] != 1:
    out[j,i] = 1
How can I vectorize the operation when I specifically need to access the neighboring elements?
Thanks in advance.
Full loop:
import numpy as np
mask = np.array([[0,0,0,0], [1,0,0,0], [0,0,0,1], [1,0,0,0]])
out = np.zeros(mask.shape)
for j in range(len(mask)):
    for i in range(len(mask[0])):
        if sum(mask[j-1:j+2, i-1:i+2].flatten()) > 1 and mask[j,i] != 1:
            out[j,i] = 1
Output:
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 0. 0.]]
Such a 'neighborhood sum' operation is often called a 2D convolution. In your case, since you don't have any weighting, it is efficiently implemented by the (IMO somewhat poorly named) scipy.ndimage.uniform_filter, which computes the mean of a neighborhood (and the sum is just the mean multiplied by the neighborhood size).
import numpy as np
from scipy.ndimage import uniform_filter
mask = np.array([[0,0,0,0], [1,0,0,0], [0,0,0,1], [1,0,0,0]])
# The mean of each 3x3 neighborhood times 9 gives the neighborhood sum.
neighbor_sum = 9 * uniform_filter(mask.astype(np.float32), 3, mode="constant")
# Round away floating-point error before comparing.
neighbor_sum = np.rint(neighbor_sum).astype(int)
out = ((neighbor_sum > 1) & (mask != 1)).astype(int)
print(out)
Output (which differs from your example, but checking it by hand shows it is correct, assuming you don't want the edges to wrap around):
[[0 0 0 0]
[0 0 0 0]
[1 1 0 0]
[0 0 0 0]]
If you do want the edges to wrap around (or other edge behavior), look at the mode argument of uniform_filter.
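For example, a quick sketch of the wrap-around variant (identical to the code above except for the mode):
neighbor_sum_wrap = 9 * uniform_filter(mask.astype(np.float32), 3, mode="wrap")
out_wrap = ((np.rint(neighbor_sum_wrap).astype(int) > 1) & (mask != 1)).astype(int)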
I know it is possible to create NumPy arrays using the linspace function. For example, given a range [x, y] I can make a vector of z elements equally spaced in [x, y]:
v = np.linspace(x, y, z, retstep=True)
What if one needs more dimensions? Is it possible to use the same function to generate a 3x4 array? I tried creating simple arrays and then merging them, but I don't think that is an efficient way to do it.
You can use arrays for the start and stop points of linspace:
x = np.linspace((0, 0, 0), (3, 5, 14), 4, axis=1)
print(x)
This will give the output:
[[ 0. 1. 2. 3. ]
[ 0. 1.66666667 3.33333333 5. ]
[ 0. 4.66666667 9.33333333 14. ]]
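If instead you want a single run of equally spaced values arranged as a 3x4 array, one sketch is to generate them in 1D and reshape:
y = np.linspace(0, 11, 12).reshape(3, 4)
print(y)
# [[ 0.  1.  2.  3.]
#  [ 4.  5.  6.  7.]
#  [ 8.  9. 10. 11.]]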
So I wrote a function to standardize my data, but I'm having trouble making it work. I want to iterate through an array of my data and standardize it.
Here's my function.
I've tried transposing my arr, but it still doesn't work.
def Scaling(arr, data):
    scaled = [[]]
    for a in arr.T:
        scaled = ((a - data.mean()) / (data.std()))
    scaled = np.asarray(scaled)
    return scaled
When I run my code I only get a 1D array as the output instead of 10D.
Because data.mean() and data.std() are aggregated scalars, consider running the arithmetic directly on the entire array without any for loop. Each scalar is broadcast against every element of the array in a single vectorized operation:
def Scaling(arr, data):
    return (arr.T - data.mean()) / data.std()
Your current for loop keeps only the last assignment made inside the loop: you initialize an empty nested list but never append to it, and instead re-assign scaled to a new array on each iteration. Normally you would append the arrays to a collection and concatenate them outside the loop. But this type of operation isn't needed here, since simple matrix algebra suffices.
To demonstrate with simple, reproducible data (which can be replaced with your actual data), see below, using an exaggerated sequential input array to make the end calculations easy to follow:
import numpy as np
data = np.arange(10)
arr = np.concatenate([np.ones((5, 1)),
                      np.ones((5, 1)) + 1,
                      np.ones((5, 1)) + 2,
                      np.ones((5, 1)) + 3,
                      np.ones((5, 1)) + 4], axis=1)
def Scaling(arr, data):
    return (arr.T - data.mean()) / data.std()
new_arr = Scaling(arr, data)
print(arr)
# [[1. 2. 3. 4. 5.]
# [1. 2. 3. 4. 5.]
# [1. 2. 3. 4. 5.]
# [1. 2. 3. 4. 5.]
# [1. 2. 3. 4. 5.]]
print(new_arr)
# [[-1.21854359 -1.21854359 -1.21854359 -1.21854359 -1.21854359]
# [-0.87038828 -0.87038828 -0.87038828 -0.87038828 -0.87038828]
# [-0.52223297 -0.52223297 -0.52223297 -0.52223297 -0.52223297]
# [-0.17407766 -0.17407766 -0.17407766 -0.17407766 -0.17407766]
# [ 0.17407766 0.17407766 0.17407766 0.17407766 0.17407766]]
When you know the number of dimensions of your lattice ahead of time, it is straightforward to use meshgrid to evaluate a function over a mesh.
from pylab import *
lattice_points = linspace(0,3,4)
xs,ys = meshgrid(lattice_points,lattice_points)
zs = xs+ys # <- stand-in function, to be replaced by something more interesting
print(zs)
Produces
[[ 0. 1. 2. 3.]
[ 1. 2. 3. 4.]
[ 2. 3. 4. 5.]
[ 3. 4. 5. 6.]]
But I would like to have a version of something similar, for which the number of dimensions is determined during runtime, or is passed as a parameter.
from pylab import *
@np.vectorize
def fn(listOfVars):
    return sum(listOfVars)  # <- stand-in function, to be replaced
                            #    by something more interesting
n_vars = 2
lattice_points = linspace(0,3,4)
indices = meshgrid(*(n_vars*[lattice_points]))  # this works fine
zs = fn(indices)  # <-- this line is wrong, but I don't
                  #     know what would work instead
print(zs)
Produces
[[[ 0. 1. 2. 3.]
[ 0. 1. 2. 3.]
[ 0. 1. 2. 3.]
[ 0. 1. 2. 3.]]
[[ 0. 0. 0. 0.]
[ 1. 1. 1. 1.]
[ 2. 2. 2. 2.]
[ 3. 3. 3. 3.]]]
But I want it to produce the same result as above.
There is probably a solution where you can find the indices of each dimension and use itertools.product to generate all of the possible combinations of indices etc. etc., but is there not a nice pythonic way of doing this?
Joe Kington and user2357112 have helped me to see the error in my ways. For those of you that would like to see a complete solution:
from pylab import *
## 2D "preknown case" (for testing / to compare output)
lattice_points = linspace(0,3,4)
xs,ys = meshgrid(lattice_points,lattice_points)
zs = xs+ys
print('2-D Case')
print(zs)
## 3D "preknown case" (for testing / to compare output)
lattice_points = linspace(0,3,4)
ws,xs,ys = meshgrid(lattice_points,lattice_points,lattice_points)
zs = ws+xs+ys
print('3-D Case')
print(zs)
## Solution, thanks to comments from Joe Kington and user2357112
def fn(listOfVars):
    return sum(listOfVars)
n_vars = 3 ## can change to 2 or 3 to compare to example cases above
lattice_points = linspace(0,3,4)
indices = meshgrid(*(n_vars*[lattice_points]))
zs = np.apply_along_axis(fn,0,indices)
print('adaptable n-D Case')
print(zs)
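As an aside, for this particular stand-in fn you can skip apply_along_axis entirely, because summing the list of meshgrid arrays works element-wise. This sketch is equivalent for fn = sum, though not for arbitrary functions:
zs = np.sum(indices, axis=0)  # same result as np.apply_along_axis(fn, 0, indices)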
I have a large one-dimensional NumPy array of data in Python and want entries x (500) to y (520) to be set to 1. I could use a for loop, but is there a neater, faster NumPy way of doing this?
for x in range(500, 520):
    numpyArray[x] = 1.
Here is the for loop that could be used, but it seems like there could be a function in NumPy that I'm missing. I'd rather not use the masked arrays that NumPy offers.
You can use slice notation with [] to access and assign a range of elements:
import numpy as np
a = np.ones(10)
print(a) # Original array
# [ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
startindex = 2
endindex = 4
a[startindex:endindex] = 0
print(a) # modified array
# [ 1. 1. 0. 0. 1. 1. 1. 1. 1. 1.]
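Applied to your case, a single slice assignment replaces the whole loop (note that, like range(500, 520), the slice stops before index 520):
numpyArray[500:520] = 1.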