Say I have a numpy array with shape (2,3) filled with floats.
I also need an array of all possible combinations of X and Y values (their corresponding positions in the array). Is there something like a simple function to get the indices as tuples from a numpy array, without for-loops iterating through the array?
Example Code:
import numpy as np

arr = np.array([[1.0, 1.1, 1.2],
                [1.0, 1.1, 1.2]])
indices = np.zeros([arr.shape[0]*arr.shape[1], 2], dtype=int)
#I want an array of length 6 like np.array([[0,0],[0,1],[0,2],[1,0],[1,1], [1,2]])
#Code so far, iterates through the array :(
ik = 0
for i in np.arange(arr.shape[0]):
    for k in np.arange(arr.shape[1]):
        indices[ik] = np.array([i, k])
        ik += 1
After this, I also want an array of the same length as 'indices' containing "XYZ coordinates": each element holds the XY 'indices' and the corresponding Z value from 'arr'. Is there an easier way (ideally without iterating through the arrays again) than this:
xyz = np.zeros([indices.shape[0], 3])
for i in range(indices.shape[0]):
    xyz[i] = np.array([indices[i, 0], indices[i, 1], arr[indices[i, 0], indices[i, 1]]])
You can use np.ndindex:
indices = np.ndindex(arr.shape)
This will give an iterator rather than an array, but you can easily convert it to a list:
>>> list(indices)
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
Then you can stack the indices with the original array along the second axis (note that the iterator above has already been consumed, so create a fresh one):
np.hstack((list(np.ndindex(arr.shape)), arr.reshape((arr.size, 1))))
For your indices, you can also use np.meshgrid:
i, j = np.meshgrid(np.arange(arr.shape[0]), np.arange(arr.shape[1]), indexing='ij')
indices = np.column_stack((i.ravel(), j.ravel()))
There are probably many ways to achieve this ... A possible solution is the following.
The first problem can be solved using np.unravel_index
max_it = arr.shape[0]*arr.shape[1]
indices = np.vstack(np.unravel_index(np.arange(max_it),arr.shape)).T
The second array can then be constructed with
xyz = np.column_stack((indices,arr[indices[:,0],indices[:,1]]))
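For the (2, 3) example array from the question, these produce:
>>> indices
array([[0, 0],
       [0, 1],
       [0, 2],
       [1, 0],
       [1, 1],
       [1, 2]])
>>> xyz
array([[0. , 0. , 1. ],
       [0. , 1. , 1.1],
       [0. , 2. , 1.2],
       [1. , 0. , 1. ],
       [1. , 1. , 1.1],
       [1. , 2. , 1.2]])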
Timings
On your array, timeit gives for my code: 10000 loops, best of 3: 27.7 µs per loop (grc's solution needs: 10000 loops, best of 3: 39.6 µs per loop).
On larger arrays with shape=(50,60) I get: 1000 loops, best of 3: 247 µs per loop (grc's solution needs: 100 loops, best of 3: 2.17 ms per loop).
Related
I am using numpy. I have a matrix with 1 column and N rows and I want to get an array from it with N elements.
For example, if I have M = matrix([[1], [2], [3], [4]]), I want to get A = array([1,2,3,4]).
To achieve it, I use A = np.array(M.T)[0]. Does anyone know a more elegant way to get the same result?
Thanks!
If you'd like something a bit more readable, you can do this:
A = np.squeeze(np.asarray(M))
Equivalently, you could also do: A = np.asarray(M).reshape(-1), but that's a bit less easy to read.
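A quick check with the matrix from the question:
>>> import numpy as np
>>> M = np.matrix([[1], [2], [3], [4]])
>>> np.squeeze(np.asarray(M))
array([1, 2, 3, 4])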
result = M.A1
https://numpy.org/doc/stable/reference/generated/numpy.matrix.A1.html
matrix.A1: return self as a flattened ndarray (the 1-d base array)
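For example:
>>> M = np.matrix([[1], [2], [3], [4]])
>>> M.A1
array([1, 2, 3, 4])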
A, = np.array(M.T)
It depends what you mean by elegance, I suppose, but that's what I would do.
You can try the following variant:
result=np.array(M).flatten()
If you care about speed:
np.array(M).ravel()
But if you care about memory:
np.asarray(M).ravel()
Or you could try to avoid some temps with
A = M.view(np.ndarray)
A.shape = -1
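This works because the view shares the matrix's memory, and assigning shape = -1 reshapes it in place, without any copy:
>>> A = M.view(np.ndarray)
>>> A.shape = -1
>>> A
array([1, 2, 3, 4])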
First, Mv = numpy.asarray(M.T), which gives you a 1x4, but still 2D, array.
Then, perform A = Mv[0,:], which gives you what you want. You could put them together, as numpy.asarray(M.T)[0,:].
This will convert the matrix into a 1-D array:
A = np.ravel(M)
(The trailing .T in np.ravel(M).T does nothing to a 1-D array, so it can be dropped.)
The ravel() and flatten() functions from numpy are two techniques that I would try here. I would like to add to the posts made by Joe, Siraj, bubble and Kevad.
Ravel:
M = np.array([[1], [2], [3], [4]])
A = M.ravel()
print(A, A.shape)
>>> [1 2 3 4] (4,)
Flatten:
A = M.flatten()
print(A, A.shape)
>>> [1 2 3 4] (4,)
numpy.ravel() is faster, since it is a library-level function that does not make a copy of the array when it can return a view. However, any change in array A will carry over to the original array M if you are using numpy.ravel().
numpy.flatten() is slower than numpy.ravel(), but if you use numpy.flatten() to create A, changes in A will not carry over to the original array M.
numpy.squeeze() and M.reshape(-1) are slower than numpy.flatten() and numpy.ravel().
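A minimal check of the view-versus-copy behaviour described above:
A = M.ravel()     # view: shares memory with M
A[0] = 99
print(M[0, 0])    # 99 -- the change carried over
A = M.flatten()   # independent copy
A[0] = 1
print(M[0, 0])    # still 99 -- the copy is detached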
%timeit M.ravel()
>>> 1000000 loops, best of 3: 309 ns per loop
%timeit M.flatten()
>>> 1000000 loops, best of 3: 650 ns per loop
%timeit M.reshape(-1)
>>> 1000000 loops, best of 3: 755 ns per loop
%timeit np.squeeze(M)
>>> 1000000 loops, best of 3: 886 ns per loop
Came in a little late; hope this helps someone:
np.array(M.flat)
Consider the following setup:
import numpy as np
import itertools as it
A = np.random.rand(3,3,3,16,3,3,3,16) # sum elements of A to arrive at...
B = np.zeros((4,4)) # a 4x4 array (output)
I have a large array 'A' that I want to sum over, but in a very specific way. 'A' has a shape of (x,x,x,16,x,x,x,16) where the 'x' is some integer.
The desired result is a 4x4 matrix 'B', which I can calculate via a for-loop like so:
%%timeit
for x1,y1,z1,s1 in it.product(range(3), range(3), range(3), range(16)):
    for x2,y2,z2,s2 in it.product(range(3), range(3), range(3), range(16)):
        B[s1%4, s2%4] += A[x1,y1,z1,s1,x2,y2,z2,s2]
>> 134 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
where the elements of B are indexed "modulo 4" along the two size-16 axes of 'A', here indexed by s1 and s2.
How can I achieve the same by broadcasting, or otherwise? Obviously with larger 'x' (dimensions in 'A'), the for-loop will take far longer to compute (the number of iterations grows as x^6), which is not ideal.
EDIT:
C = np.zeros((4,4))
for i,j in it.product(range(4), range(4)):
    C[i,j] = A[:,:,:,i::4,:,:,:,j::4].sum()
This seems to work as well. But still involves 1 for-loop. Is there a way to make this any faster?
Here are a cleaner and a faster solution. Unfortunately, they are not the same ...
n = 3  # the x dimension of A from the setup above

def clean(A):
    return A.reshape(4*n*n*n, 4, 4*n*n*n, 4).sum(axis=(0, 2))

def fast(A):
    return np.bincount(np.tile(np.arange(16).reshape(4, 4), (4, 4)).ravel(),
                       A.sum((0, 1, 2, 4, 5, 6)).ravel(), minlength=16).reshape(4, 4)
At n==6 fast is about three times faster.
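A quick sanity check with the A from the question's setup:
print(np.allclose(clean(A), fast(A)))  # True: both produce the same 4x4 B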
I have a list of 3D arrays that are all different shapes, but I need them to all be the same shape. Also, that shape needs to be the smallest shape in the list.
For example, if my_list has three arrays with the shapes (115,115,3), (111,111,3), and (113,113,3), then they all need to be (111,111,3). They are all square color images, so they will be of shape (x,x,3).
So I have two main problems:
How do I find the smallest shape array without looping or keeping a variable while creating the list?
How do I efficiently set all arrays in a list to the smallest shape?
Currently I am keeping a variable for smallest shape while creating my_list so I can do this:
for idx, img in enumerate(my_list):
    img = img[:smallest_shape, :smallest_shape]
    my_list[idx] = img
I just feel like this is not the most efficient way, and I do realize I'm losing values by slicing, but I expect that.
I constructed a sample list with
In [513]: alist=[np.ones((512,512,3)) for _ in range(100)]
and did some timings.
Collecting shapes is fast:
In [515]: timeit [a.shape for a in alist]
10000 loops, best of 3: 31.2 µs per loop
Taking the min takes more time:
In [516]: np.min([a.shape for a in alist],axis=0)
Out[516]: array([512, 512, 3])
In [517]: timeit np.min([a.shape for a in alist],axis=0)
1000 loops, best of 3: 344 µs per loop
Slicing is faster:
In [518]: timeit [a[:500,:500,:] for a in alist]
10000 loops, best of 3: 133 µs per loop
Now try to isolate the min step:
In [519]: shapes=[a.shape for a in alist]
In [520]: timeit np.min(shapes, axis=0)
The slowest run took 5.75 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 136 µs per loop
When you have lists of objects, iteration is the only way to deal with all elements. Look at the code for np.hstack and np.vstack (and others). They do one or more list comprehensions to massage all the input arrays into the correct shape. Then they do np.concatenate which iterates too, but in compiled code.
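Putting it together, a compact sketch of the whole crop for the square (x, x, 3) images from the question (it still iterates, just concisely):
smallest = min(a.shape[0] for a in my_list)           # smallest side length
my_list = [a[:smallest, :smallest] for a in my_list]  # crop every image to it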
How can I preallocate arrays of arrays so I can do appending a bit more efficiently? In Matlab there is a function called cell(required_length) which preallocates 'cells' that can store arrays.
I have an array that currently looks like:
a=np.array([[1,2],[1],[2,3,4]])
b=np.array([[20,2]])
However, I wish to append thousands more arrays, like the 'b' shown, but varying in size.
This isn't just a question of preallocating an array, such as np.empty((100,), dtype=int). It's as much a question about how to collect a large number of lists into one structure, whether it be a list or a numpy array. The comparison with MATLAB cells is enough, in my opinion, to warrant further discussion.
I think you should be using Python lists. They can contain lists or other objects (including arrays) of varying sizes. You can easily append more items (or use extend to add multiple objects). Python has had them forever; MATLAB added cells to approximate that flexibility.
np.arrays with dtype=object are similar - arrays of pointers to objects such as lists. For the most part they are just lists with an array wrapper. You can initialize an array to some large size, and insert/set items.
A = np.empty((10,),dtype=object)
produces an array with 10 elements, each None.
A[0] = [1,2,3]
A[1] = [2,3]
...
You can also concatenate elements to an existing array, but the result is new one. There is a np.append function, but it is just a cover for concatenate; it should not be confused with the list append.
If it must be an array, you can easily construct it from the list at the end. That's what your np.array([[1,2],[1],[2,3,4]]) does.
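Note that newer numpy versions (1.24 and later) refuse to build such a ragged array implicitly, so the object dtype must be spelled out:
A = np.array([[1, 2], [1], [2, 3, 4]], dtype=object)  # explicit dtype required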
Related: How to add to numpy array entries of different size in a for loop (similar to Matlab's cell arrays)?
On the issue of speed, let's try simple time tests:
def witharray(n):
    result = np.empty((n,), dtype=object)
    for i in range(n):
        result[i] = list(range(i))
    return result

def withlist(n):
    result = []
    for i in range(n):
        result.append(list(range(i)))
    return result
which produce
In [111]: withlist(4)
Out[111]: [[], [0], [0, 1], [0, 1, 2]]
In [112]: witharray(4)
Out[112]: array([[], [0], [0, 1], [0, 1, 2]], dtype=object)
In [113]: np.array(withlist(4))
Out[113]: array([[], [0], [0, 1], [0, 1, 2]], dtype=object)
Time tests:
In [108]: timeit withlist(400)
1000 loops, best of 3: 1.87 ms per loop
In [109]: timeit witharray(400)
100 loops, best of 3: 2.13 ms per loop
In [110]: timeit np.array(withlist(400))
100 loops, best of 3: 8.95 ms per loop
Simply constructing a list of lists is fastest. But if the result must be an object type array, then assigning values to an empty array is faster.
I have a numpy array, say, [a,b,c,d,e,...], and would like to compute an array that would look like [x*a+y*b, x*b+y*c, x*c+y*d, ...]. The idea I have is to first split the original array into something like [[a,b],[b,c],[c,d],[d,e],...] and then attack this creature with np.average, specifying the axis and weights (x+y=1 in my case), or even use np.dot. Unfortunately, I don't know how to create such an array of [a,b],[b,c],... pairs. Any help, or even a completely different idea for accomplishing the major task, is much appreciated :-)
The quickest, simplest would be to manually extract two slices of your array and add them together:
>>> arr = np.arange(5)
>>> x, y = 10, 1
>>> x*arr[:-1] + y*arr[1:]
array([ 1, 12, 23, 34])
This will turn into a pain if you want to generalize it to triples, quadruples... But you can create your array of pairs from the original array with as_strided in a much more general form:
>>> from numpy.lib.stride_tricks import as_strided
>>> arr_pairs = as_strided(arr, shape=(len(arr)-2+1,2), strides=arr.strides*2)
>>> arr_pairs
array([[0, 1],
[1, 2],
[2, 3],
[3, 4]])
Of course the nice thing about using as_strided is that, just like with the array slices, there is no data copying involved, just messing with the way memory is viewed, so creating this array is virtually costless.
And now probably the fastest is to use np.dot:
>>> xy = [x, y]
>>> np.dot(arr_pairs, xy)
array([ 1, 12, 23, 34])
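If your numpy is new enough (1.20+), the same no-copy windowing is available with bounds checking via sliding_window_view, which also generalizes directly to triples and quadruples:
>>> from numpy.lib.stride_tricks import sliding_window_view
>>> arr_pairs = sliding_window_view(arr, 2)  # same pairs, read-only view
>>> np.dot(arr_pairs, xy)
array([ 1, 12, 23, 34])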
This looks like a correlate problem: np.correlate slides b across a without reversing it (unlike np.convolve), so with b = [x, y] each output element is exactly x*a[i] + y*a[i+1].
a
Out[61]: array([0, 1, 2, 3, 4, 5, 6, 7])
b
Out[62]: array([1, 2])
np.correlate(a,b,mode='valid')
Out[63]: array([ 2, 5, 8, 11, 14, 17, 20])
Depending on array size and BLAS, dot can be faster; your mileage will vary greatly:
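(jamie_dot below is assumed to be the as_strided plus np.dot approach from @Jaime's answer above, wrapped in a function:)
from numpy.lib.stride_tricks import as_strided

def jamie_dot(a, b):
    # no-copy (len(a)-1, 2) view of consecutive pairs, then a matrix product
    pairs = as_strided(a, shape=(len(a) - 1, 2), strides=a.strides * 2)
    return pairs.dot(b)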
arr = np.random.rand(int(1e6))  # rand needs an integer size, not 1E6
b = np.random.rand(2)
np.allclose(jamie_dot(arr,b),np.convolve(arr,b[::-1],mode='valid'))
True
%timeit jamie_dot(arr,b)
100 loops, best of 3: 16.1 ms per loop
%timeit np.correlate(arr,b,mode='valid')
10 loops, best of 3: 28.8 ms per loop
This is with an Intel MKL BLAS and 8 cores; np.correlate will likely be faster for most implementations.
Also an interesting observation from @Jaime's post:
%timeit b[0]*arr[:-1] + b[1]*arr[1:]
100 loops, best of 3: 8.43 ms per loop
His comment also reduced np.convolve(a, b[::-1], mode='valid') to the simpler correlate syntax.
If you have a small array, I would create a shifted copy:
shifted_array=numpy.append(original_array[1:],0)
result_array=x*original_array+y*shifted_array
Here you have to store your array twice in memory, so this solution is very memory inefficient, but you can get rid of the for loops.
If you have large arrays, you really need a loop (or rather, a list comprehension):
result_array = [x*original_array[i] + y*original_array[i+1] for i in range(len(original_array)-1)]
This gives you the same result as a Python list, minus the last item, which the first solution computes against the padded zero and which should be treated differently anyway.
Based on some random trials, for arrays smaller than 2000 items the first solution seems to be faster than the second one, but it runs into MemoryError even for relatively small arrays (a few tens of thousands of items on my PC).
So generally, use a list comprehension, but if you know for sure that you will run this only on small (at most one or two thousand item) arrays, the first solution is a better shot.
Creating a new list like [[a,b],[b,c],[c,d],[d,e],...] would be both memory and time inefficient, as you also need a for loop (or similar) to create it, and you have to store every old value in a new array twice, so you would end up storing your original array three times.
Another way is to create the right pairs in the array a = np.array([a,b,c,d,e,...]), reshape according to the size of array b = np.array([x, y, ...]) and then compute a matrix product with np.dot:
a = np.arange(8)
b = np.array([1, 2])
a = a.repeat(2)[1:-1]
ans = a.reshape(-1, b.shape[0]).dot(b)
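For the a and b above this yields the same result as the correlate approach:
>>> ans
array([ 2,  5,  8, 11, 14, 17, 20])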
Timings (on my computer):
@Ophion's solution:
# 100000 loops, best of 3: 4.67 µs per loop
This solution:
# 100000 loops, best of 3: 9.78 µs per loop
So it is slower. @Jaime's solution is better, since it does not copy the data like this one.