I'm trying to find a way to perform operations on each element across multiple 2D arrays without having to loop over them, or at least without needing two for loops. My code calculates the standard deviation of each pixel over a series of images (arrays). The number of images is not the problem; it is the size of the arrays that makes the code extremely slow. The following is a working example of what I have.
import numpy as np
# reshape(# of images (arrays), # of rows, # of cols)
a = np.arange(32).reshape(2,4,4)
stddev_arr = np.array([])
for i in range(4):
    for j in range(4):
        pixel = a[0:, i, j]
        stddev = np.std(pixel)
        stddev_arr = np.append(stddev_arr, stddev)
My actual data is 2000x2000, making this code loop 4000000 times. Is there a better way to do this?
Any advice is extremely appreciated.
You're already using numpy. numpy's std() function takes an axis argument that tells it which axis you want it to operate on (in this case the zeroth axis). Because this offloads the calculation to numpy's C backend (possibly using SIMD optimizations that vectorize a lot of operations), it is much faster than iterating in Python. Another time-consuming operation in your code is appending to stddev_arr: appending to numpy arrays is slow because the entire array is copied into new memory before the new element is added. Since you already know how big that array needs to be, you might as well preallocate it.
a = np.arange(32).reshape(2, 4, 4)
stdev = np.std(a, axis=0)
This gives a 4x4 array
array([[8., 8., 8., 8.],
[8., 8., 8., 8.],
[8., 8., 8., 8.],
[8., 8., 8., 8.]])
To flatten this into a 1D array, do flat_stdev = stdev.flatten().
Comparing the execution times:
# Using only numpy
def fun1(arr):
    return np.std(arr, axis=0).flatten()
# Your function
def fun2(arr):
    stddev_arr = np.array([])
    for i in range(arr.shape[1]):
        for j in range(arr.shape[2]):
            pixel = arr[0:, i, j]
            stddev = np.std(pixel)
            stddev_arr = np.append(stddev_arr, stddev)
    return stddev_arr
# Your function, but pre-allocating stddev_arr
def fun3(arr):
    stddev_arr = np.zeros((arr.shape[1] * arr.shape[2],))
    x = 0
    for i in range(arr.shape[1]):
        for j in range(arr.shape[2]):
            pixel = arr[0:, i, j]
            stddev = np.std(pixel)
            stddev_arr[x] = stddev
            x += 1
    return stddev_arr
First, let's make sure all these functions are equivalent:
a = np.random.random((3, 10, 10))
assert np.all(fun1(a) == fun2(a))
assert np.all(fun1(a) == fun3(a))
Yup, all give the same result. Now, let's try with a bigger array.
import timeit
a = np.random.random((3, 100, 100))
x = timeit.timeit('fun1(a)', setup='from __main__ import fun1, a', number=10)
# x: 0.003302899989648722
y = timeit.timeit('fun2(a)', setup='from __main__ import fun2, a', number=10)
# y: 5.495519500007504
z = timeit.timeit('fun3(a)', setup='from __main__ import fun3, a', number=10)
# z: 3.6250679999939166
Wow! We get a ~1.5x speedup just by preallocating.
Even more wow: using numpy's std() with the axis argument gives a > 1000x speedup, and this is just for the 100x100 array! With bigger arrays, you can expect an even bigger speedup.
Based on what you have provided, you can also reshape your array to vectorize the computation and replace your two loops. Then you only have to call np.std once on the axis you want.
a = np.arange(32).reshape(2, 4, 4)
a = a.reshape(2, -1).transpose()
stddev_arr = np.std(a, axis=1)
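As a quick sanity check (a minimal sketch), this reshape-and-transpose route gives the same per-pixel values as np.std(a, axis=0) from the first answer, just already flattened:
import numpy as np
a = np.arange(32).reshape(2, 4, 4)
# reshape-and-transpose route: one row per pixel, one column per image
flat = np.std(a.reshape(2, -1).transpose(), axis=1)
# direct axis route, flattened for comparison
direct = np.std(a, axis=0).ravel()
assert np.allclose(flat, direct)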
I have a 3-D array G whose size changes in a loop. In Matlab, I first create an empty array G = [];
then I create the first element of G from another existing array D of size 256x256. It is simple to do that in Matlab as follows:
G(:,:,1) = D(:,:)
How can I do the same thing in Python?
Consider preallocating.
In Python's numpy you can preallocate like this:
G = np.zeros([depth, height, width])
Then you can slice G in a way similar to Matlab and substitute matrices into it. If you still want an array whose size changes, you can create a list of your 2-D arrays and then convert it to a np.array, like so:
G = list()
for i in range(N):
    G.append(D)
G = np.array(G)
An empty 3-D array built from plain Python lists would look something like this:
n = 256
threeD = [[[0 for k in range(n)] for j in range(n)] for i in range(n)]
Or, if you want just one 256x256 2-D array inside the larger 3-D structure (which I think is what you are trying to do in Matlab):
threeD = [[[0 for k in range(n)] for j in range(n)]]
where n is the size of each "dimension".
This gives you a structure full of 0s; you can replace 0 with None if that is more desirable as an "empty" value.
Also, this isn't really an "array" in Python, it's a list.
You can use
G[:,:,1]=D[:,:]
Example:
>>> G=np.zeros((2,2,2))
>>> D=np.ones((2,2))
>>> G[:,:,1]=D[:,:]
>>> G
array([[[ 0., 1.],
[ 0., 1.]],
[[ 0., 1.],
[ 0., 1.]]])
I'm trying to get the data values along a line (like in this hint). That example uses imshow(), but I'm currently using pcolormesh() to plot.
I'm finding that get_array(), the function to grab plotted data from pcolormesh(), returns a 1-D, flattened array of my data instead of the original (or truncated) 2-D data.
For example:
D = np.genfromtxt(DataFilePath, skip_header=4, delimiter=',', unpack=True)
print( D.shape )
: (500, 500)
...more code...
img = ax[0].pcolormesh( np.arange( len(D[0,:]) ), np.arange(len(D[:,0])), D)
>>> D
: array([[ 42.38, 41.93, 41.92, ..., 41.73, 41.74, 41.51],
[ 41.88, 42.24, 42.21, ..., 41.88, 41.67, 41.64],
[ 42.4 , 41.47, 41.49, ..., 41.92, 42.07, 41.49],
...,
[ 44.24, 44.14, 44.17, ..., 40.2 , 40.68, 40.67],
[ 44.59, 44.24, 44.3 , ..., 40.91, 40.92, 40.95],
[ 44.2 , 44.27, 44.27, ..., 40.82, 40.91, 40.94]])
>>> img.get_array()
: array([ 42.38, 41.93, 41.92, ..., 40.85, 40.91, 40.92])
Since I'm trying to grab user-clicks on the plot and then re-plot using the clicked data values (like in this hint), I would like to use a function/class which won't have global access to the original data, but does have access to the img object.
Any idea how I can get the 2-D data from pcolormesh() using only the img (QuadMesh) object? It doesn't even seem to have the x/y length/shape values that I would need to reconstruct the data from the 1-D get_array().
Thanks!
The shape of the array is stored in the private attributes _meshWidth and _meshHeight. Nevertheless, since these attributes are not part of the public API, it would be better to save the shape of the original data than to rely on them, if possible.
import matplotlib.pyplot as plt
import numpy as np
D = np.random.uniform(0, 100, size=(5, 5))
fig, ax = plt.subplots()
h, w = D.shape
img = ax.pcolormesh(np.arange(w + 1), np.arange(h + 1), D)
D2 = img.get_array().reshape(img._meshHeight, img._meshWidth)
assert np.array_equal(D, D2)
Note also that if you wish to recover the original array D, the coordinate arrays np.arange(w + 1) and np.arange(h + 1) must be one element longer than the corresponding dimensions of D. Otherwise pcolormesh drops the last row and column, and img.get_array() only holds the values of a (499, 499) subset when D has shape (500, 500).
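If you control the plotting code, a minimal sketch of the safer route is to record the shape yourself and attach it to the artist (the original_shape attribute below is my own invention, not part of matplotlib's API), so it is available anywhere img is:
import matplotlib.pyplot as plt
import numpy as np
D = np.random.uniform(0, 100, size=(5, 5))
fig, ax = plt.subplots()
h, w = D.shape
img = ax.pcolormesh(np.arange(w + 1), np.arange(h + 1), D)
img.original_shape = D.shape  # custom attribute, not matplotlib API
# ...later, e.g. inside a click callback that only sees img:
D2 = img.get_array().reshape(img.original_shape)
assert np.array_equal(D, D2)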
Yes, it does ravel the inputs:
https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes/_axes.py
# convert to one dimensional arrays
C = C.ravel()
X = X.ravel()
Y = Y.ravel()
If you know the desired 2d shape, you can unravel with a simple reshape call.
If the result should have the same shape as D use:
img.get_array().reshape(D.shape)
If the size of the raveled C can change, then this won't work.
If I make a D array that is (10,20), and plot it
img = pyplot.pcolormesh(D)
img._A is (200,), the array that img.get_array() returns.
img._meshHeight, img._meshWidth
# 10, 20
So the array can be reshaped with:
img._A.reshape(img._meshHeight, img._meshWidth)
img._coordinates is an (11, 21, 2) array: the x and y coordinates of the mesh vertices, one more point than the data along each axis. So you could get the C reshaping information from _coordinates as well. I don't see any public API method for retrieving these attributes, but that doesn't stop 'serious' Python programmers. In this test case, pcolormesh generated the coordinates from the shape of D.
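For instance, a minimal sketch of recovering the reshape information from _coordinates (again a private attribute, so treat it as fragile):
# the mesh has one more vertex than quad along each axis,
# so subtracting 1 recovers the shape of C
n_rows = img._coordinates.shape[0] - 1  # 10 in this test case
n_cols = img._coordinates.shape[1] - 1  # 20 in this test case
C2 = img.get_array().reshape(n_rows, n_cols)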
This Quadmesh was created with:
coords = np.zeros(((Nx * Ny), 2), dtype=float)
coords[:, 0] = X
coords[:, 1] = Y
collection = QuadMesh(
    Nx - 1, Ny - 1, coords, ...)
....
collection.set_array(C)
A search for get_array in the matplotlib github repository does not get many hits.
I dug into the pcolor code a bit. It returns a PolyCollection img rather than a QuadMesh. It contains the information for drawing a collection of quadrilaterals.
For example, in my test case with a 10x20 input, img._paths is a list of 200 Path objects:
In [486]: img1._paths[0]
Out[486]:
Path(array([[ 0., 0.],
[ 0., 1.],
[ 1., 1.],
[ 1., 0.],
[ 0., 0.],
[ 0., 0.]]), array([ 1, 2, 2, 2, 2, 79], dtype=uint8))
It holds the xy points needed to draw the boundary of the quad: the four corners, a repeat of the first point, and a close-polygon code. That quad is drawn with the color value corresponding to C[0] (in the raveled form).
So all the X/Y grid information is now encoded in these Path objects. Instead of plotting a mesh, it plots 200 colored squares (quads). The PolyCollection code does not assume that the squares are in any order or even touching each other; the big picture has been replaced with a bunch of independent small pictures.
You might be able to reassemble those quads into a mesh by looking for matching vertices, etc., but it would be a lot of work.
I am working on a real-time application. For this I need to store around 20 arrays per second. Each array consists of n points with their respective x and y coordinates (z may follow as well in the future).
What I came up with is some kind of ring buffer, which takes the total number of arrays (they are frames of a video, by the way) and the number of tracked points with their coordinates (this doesn't change within one execution, but can vary between executions).
My buffer is initialized with a numpy array filled with zeros: np.zeros((lengthOfSlices,numberOfTrackedPoints))
However, this seems to be problematic, because I write all the points for a slice into the array at once, not one after another. That means I can't broadcast into the array, as the shape is not correct.
Is there a numpythonic way to initialize the array with zeros and store vector-wise afterwards?
Below you can find what I have now:
class Buffer():
    def __init__(self, lengthOfSlices, numberOfTrackedPoints):
        self.data = np.zeros((lengthOfSlices, numberOfTrackedPoints))
        self.index = 0

    def extend(self, x):
        'adds array x to ring buffer'
        x_index = (self.index + np.arange(x.size)) % self.data.size
        self.data[x_index] = x
        self.index = x_index[-1] + 1

    def get(self):
        'returns the first-in-first-out data in the ring buffer'
        idx = (self.index + np.arange(self.data.size)) % self.data.size
        return self.data[idx]
You need to reshape the array based on the length of the frame.
Simple example:
>>> import numpy as np
>>> A = np.zeros(100)
>>> B = np.reshape(A, (10,10))
>>> B[0]
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
So that's probably something like self.data = np.reshape(self.data, (lengthOfAFrame, 20))
EDIT:
Apparently reshaping is not your (only?) problem; you might check out collections.deque for a Python implementation of a circular buffer (source and example).
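For reference, a minimal sketch of a deque-based circular buffer (the sizes and variable names here are made up for illustration, not taken from the question):
from collections import deque
import numpy as np

length_of_slices = 100           # capacity: keep the last 100 frames
number_of_tracked_points = 20    # points per frame

buf = deque(maxlen=length_of_slices)  # the oldest frame is dropped automatically
for frame in range(250):
    points = np.random.random((number_of_tracked_points, 2))  # stand-in for real x/y data
    buf.append(points)

data = np.array(buf)  # shape (100, 20, 2): the last 100 frames, oldest first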
Is there way to initialize a numpy array of a shape and add to it? I will explain what I need with a list example. If I want to create a list of objects generated in a loop, I can do:
a = []
for i in range(5):
    a.append(i)
I want to do something similar with a numpy array. I know about vstack, concatenate etc. However, it seems these require two numpy arrays as inputs. What I need is:
big_array # Initially empty. This is where I don't know what to specify
for i in range(5):
    array i of shape (2,4) created
    add to big_array
The big_array should have a shape (10,4). How to do this?
EDIT:
I want to add the following clarification. I am aware that I can define big_array = numpy.zeros((10,4)) and then fill it up. However, this requires specifying the size of big_array in advance. I know the size in this case, but what if I do not? When we use the .append function for extending the list in python, we don't need to know its final size in advance. I am wondering if something similar exists for creating a bigger array from smaller arrays, starting with an empty array.
numpy.zeros
Return a new array of given shape and type, filled with zeros.
or
numpy.ones
Return a new array of given shape and type, filled with ones.
or
numpy.empty
Return a new array of given shape and type, without initializing entries.
However, the mentality in which we construct an array by appending elements to a list is not much used in numpy, because it's less efficient (numpy datatypes are much closer to the underlying C arrays). Instead, you should preallocate the array to the size that you need it to be, and then fill in the rows. You can use numpy.append if you must, though.
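A minimal sketch of the preallocate-and-fill pattern just described, with random (2, 4) blocks standing in for whatever you generate in the loop:
import numpy as np

big_array = np.zeros((10, 4))             # final size known up front
for i in range(5):
    block = np.random.random((2, 4))      # stand-in for the array generated each iteration
    big_array[2 * i:2 * (i + 1), :] = block  # fill two rows per iteration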
The way I usually do that is by creating a regular list, appending my stuff to it, and finally transforming the list into a numpy array as follows:
import numpy as np
big_array = [] # empty regular list
for i in range(5):
    arr = i * np.ones((2, 4))  # for instance
    big_array.append(arr)
big_np_array = np.array(big_array) # transformed to a numpy array
Of course, your final object takes twice the space in memory at the creation step, but appending to a Python list is very fast, and so is creation with np.array().
Introduced in numpy 1.8:
numpy.full
Return a new array of given shape and type, filled with fill_value.
Examples:
>>> import numpy as np
>>> np.full((2, 2), np.inf)
array([[ inf, inf],
[ inf, inf]])
>>> np.full((2, 2), 10)
array([[10, 10],
[10, 10]])
The numpy analogue of Python's
a = []
for i in range(5):
    a.append(i)
is:
import numpy as np
a = np.empty((0))
for i in range(5):
    a = np.append(a, i)
You do want to avoid explicit loops as much as possible when doing array computing, as that reduces the speed gain from that form of computing. There are multiple ways to initialize a numpy array. If you want it filled with zeros, do as katrielalex said:
big_array = numpy.zeros((10,4))
EDIT: What sort of sequence is it you're making? You should check out the different numpy functions that create arrays, like numpy.linspace(start, stop, size) (equally spaced numbers) or numpy.arange(start, stop, inc). Where possible, these functions will make arrays substantially faster than doing the same work in explicit loops.
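For example, a small illustration of those helpers:
import numpy as np

evenly_spaced = np.linspace(0.0, 1.0, 5)  # array([0.  , 0.25, 0.5 , 0.75, 1.  ])
stepped = np.arange(0, 10, 2)             # array([0, 2, 4, 6, 8])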
To initialize a numpy array with a specific matrix:
import numpy as np
mat = np.array([[1, 1, 0, 0, 0],
[0, 1, 0, 0, 1],
[1, 0, 0, 1, 1],
[0, 0, 0, 0, 0],
[1, 0, 1, 0, 1]])
print(mat.shape)
print(mat)
output:
(5, 5)
[[1 1 0 0 0]
[0 1 0 0 1]
[1 0 0 1 1]
[0 0 0 0 0]
[1 0 1 0 1]]
For your first array example use,
a = numpy.arange(5)
To initialize big_array, use
big_array = numpy.zeros((10,4))
This assumes you want to initialize with zeros, which is pretty typical, but there are many other ways to initialize an array in numpy.
Edit:
If you don't know the size of big_array in advance, it's generally best to first build a Python list using append, and when you have everything collected in the list, convert this list to a numpy array using numpy.array(mylist). The reason for this is that lists are meant to grow very efficiently and quickly, whereas numpy.concatenate would be very inefficient since numpy arrays don't change size easily. But once everything is collected in a list, and you know the final array size, a numpy array can be efficiently constructed.
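A minimal sketch of that list-first approach, assuming the number of rows is not known in advance:
import numpy as np

rows = []
for i in range(5):           # pretend the loop count is not known up front
    rows.append([i, i * i])  # any per-iteration values
big_array = np.array(rows)   # shape (5, 2), built once at the end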
numpy.fromiter() is what you are looking for:
big_array = numpy.fromiter(range(5), dtype="int")
It also works with generator expressions, e.g.:
big_array = numpy.fromiter((i * (i + 1) // 2 for i in range(5)), dtype="int")
If you know the length of the array in advance, you can specify it with an optional 'count' argument.
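For example, a small sketch of the count argument, using the same generator (and assuming numpy is imported as above):
big_array = numpy.fromiter((i * (i + 1) // 2 for i in range(5)), dtype="int", count=5)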
I realize that this is a bit late, but I did not notice any of the other answers mentioning indexing into the empty array:
big_array = numpy.empty((10, 4))
for i in range(5):
    array_i = numpy.random.random((2, 4))
    big_array[2 * i:2 * (i + 1), :] = array_i
This way, you preallocate the entire result array with numpy.empty and fill in the rows as you go using indexed assignment.
It is perfectly safe to preallocate with empty instead of zeros in the example you gave since you are guaranteeing that the entire array will be filled with the chunks you generate.
I'd suggest defining shape first.
Then iterate over it to insert values.
big_array = np.zeros(shape=(6, 2))
for it in range(6):
    big_array[it] = (it, it)  # For example
>>>big_array
array([[ 0., 0.],
[ 1., 1.],
[ 2., 2.],
[ 3., 3.],
[ 4., 4.],
[ 5., 5.]])
Whenever you are in the following situation:
a = []
for i in range(5):
    a.append(i)
and you want something similar in numpy, several previous answers have pointed out ways to do it, but as #katrielalex pointed out these methods are not efficient. The efficient way to do this is to build a long list and then reshape it the way you want after you have a long list. For example, let's say I am reading some lines from a file and each row has a list of numbers and I want to build a numpy array of shape (number of lines read, length of vector in each row). Here is how I would do it more efficiently:
long_list = []
counter = 0
with open('filename', 'r') as f:
    for row in f:
        row_list = row.split()
        long_list.extend(row_list)
        counter += 1
# now we have a long list and we are ready to reshape
result = np.array(long_list, dtype=float).reshape(counter, len(row_list))  # desired numpy array
Maybe something like this will fit your needs..
import numpy as np
N = 5
res = []
for i in range(N):
    res.append(np.cumsum(np.ones(shape=(2, 4))))
res = np.array(res).reshape((10, 4))
print(res)
Which produces the following output
[[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]]
If you want to append items along the first axis of a multi-dimensional array, here is one solution.
import numpy as np
big_array = np.ndarray(shape=(0, 2, 4))  # empty: length 0 along the first axis, (2, 4) blocks
for i in range(5):
    block = np.full((1, 2, 4), i)  # each appended block must match the trailing dimensions
    big_array = np.concatenate((big_array, block))
Here is the official numpy documentation for reference.
# https://thispointer.com/create-an-empty-2d-numpy-array-matrix-and-append-rows-or-columns-in-python/
# Create an empty Numpy array with 4 columns or 0 rows
empty_array = np.empty((0, 4), int)
# Append a row to the 2D numpy array
empty_array = np.append(empty_array, np.array([[11, 21, 31, 41]]), axis=0)
# Append 2nd rows to the 2D Numpy array
empty_array = np.append(empty_array, np.array([[15, 25, 35, 45]]), axis=0)
print('2D Numpy array:')
print(empty_array)
Note that each appended np.array must be 2-dimensional.