How to reshape multiple arrays in a list in Python

I have a list of 3D arrays that are all different shapes, but I need them to all be the same shape. Also, that shape needs to be the smallest shape in the list.
For example, if my_list has three arrays with the shapes (115,115,3), (111,111,3), and (113,113,3), then they all need to become (111,111,3). They are all square color images, so they will always have shape (x,x,3).
So I have two main problems:
How do I find the smallest shape array without looping or keeping a variable while creating the list?
How do I efficiently set all arrays in a list to the smallest shape?
Currently I am keeping a variable for smallest shape while creating my_list so I can do this:
for idx, img in enumerate(my_list):
    img = img[:smallest_shape, :smallest_shape]
    my_list[idx] = img
I just feel like this is not the most efficient way, and I do realize I'm losing values by slicing, but I expect that.

I constructed a sample list with
In [513]: alist=[np.ones((512,512,3)) for _ in range(100)]
and did some timings.
Collecting shapes is fast:
In [515]: timeit [a.shape for a in alist]
10000 loops, best of 3: 31.2 µs per loop
Taking the min takes more time:
In [516]: np.min([a.shape for a in alist],axis=0)
Out[516]: array([512, 512, 3])
In [517]: timeit np.min([a.shape for a in alist],axis=0)
1000 loops, best of 3: 344 µs per loop
Slicing is faster:
In [518]: timeit [a[:500,:500,:] for a in alist]
10000 loops, best of 3: 133 µs per loop
Now try to isolate the min step:
In [519]: shapes=[a.shape for a in alist]
In [520]: timeit np.min(shapes, axis=0)
The slowest run took 5.75 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 136 µs per loop
When you have lists of objects, iteration is the only way to deal with all elements. Look at the code for np.hstack and np.vstack (and others). They do one or more list comprehensions to massage all the input arrays into the correct shape. Then they do np.concatenate which iterates too, but in compiled code.
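Putting those pieces together, a minimal sketch of the whole operation (assuming a list my_list of square (x, x, 3) images, as in the question) could be:
import numpy as np

# Sample data: square color images of slightly different sizes
my_list = [np.ones((n, n, 3)) for n in (115, 111, 113)]

# Smallest side length across all images; one pass over the shapes
smallest = min(a.shape[0] for a in my_list)            # 111

# Crop every image down to the smallest shape (values outside are discarded)
my_list = [a[:smallest, :smallest, :] for a in my_list]

print({a.shape for a in my_list})                      # {(111, 111, 3)}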

Related

Numpy.dot nests vector when multiplying [duplicate]

I am using numpy. I have a matrix with 1 column and N rows, and I want to get an array with N elements from it.
For example, if I have M = matrix([[1], [2], [3], [4]]), I want to get A = array([1,2,3,4]).
To achieve it, I use A = np.array(M.T)[0]. Does anyone know a more elegant way to get the same result?
Thanks!
If you'd like something a bit more readable, you can do this:
A = np.squeeze(np.asarray(M))
Equivalently, you could also do: A = np.asarray(M).reshape(-1), but that's a bit less easy to read.
result = M.A1
https://numpy.org/doc/stable/reference/generated/numpy.matrix.A1.html
matrix.A1
1-d base array
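For reference, a quick check of the A1 attribute with the matrix from the question (a small sketch):
import numpy as np

M = np.matrix([[1], [2], [3], [4]])
A = M.A1           # the matrix's data as a flat 1-d ndarray
print(A, A.shape)  # [1 2 3 4] (4,)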
A, = np.array(M.T)
Depends what you mean by elegance, I suppose, but that's what I would do.
You can try the following variant:
result=np.array(M).flatten()
If you care about speed:
np.array(M).ravel()
But if you care about memory:
np.asarray(M).ravel()
Or you could try to avoid some temps with
A = M.view(np.ndarray)
A.shape = -1
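A quick check of that view trick (a sketch; assumes the same 4x1 matrix M as in the question):
import numpy as np

M = np.matrix([[1], [2], [3], [4]])
A = M.view(np.ndarray)   # reinterpret the matrix as a plain ndarray, no copy
A.shape = -1             # reshape in place to 1-d
print(A)                 # [1 2 3 4]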
First, Mv = numpy.asarray(M.T), which gives you a 1x4 array that is still 2D.
Then, perform A = Mv[0,:], which gives you what you want. You could put them together, as numpy.asarray(M.T)[0,:].
This will convert the matrix into an array:
A = np.ravel(M).T
The ravel() and flatten() functions from numpy are two techniques that I would try here. I would like to add to the posts made by Joe, Siraj, bubble and Kevad.
Ravel:
M = np.array([[1], [2], [3], [4]])
A = M.ravel()
print(A, A.shape)
>>> [1 2 3 4] (4,)
Flatten:
A = M.flatten()
print(A, A.shape)
>>> [1 2 3 4] (4,)
numpy.ravel() is faster, since it is a library-level function which does not make any copy of the array (it returns a view when it can). However, any change in array A will carry over to the original array M if you are using numpy.ravel().
numpy.flatten() is slower than numpy.ravel(), but if you use numpy.flatten() to create A, changes in A will not carry over to the original array M.
numpy.squeeze() and M.reshape(-1) are slower than numpy.flatten() and numpy.ravel().
%timeit M.ravel()
>>> 1000000 loops, best of 3: 309 ns per loop
%timeit M.flatten()
>>> 1000000 loops, best of 3: 650 ns per loop
%timeit M.reshape(-1)
>>> 1000000 loops, best of 3: 755 ns per loop
%timeit np.squeeze(M)
>>> 1000000 loops, best of 3: 886 ns per loop
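The view-versus-copy behaviour described above is easy to verify directly (a small sketch with the same array M):
import numpy as np

M = np.array([[1], [2], [3], [4]])

A_view = M.ravel()     # a view: shares memory with M here
A_copy = M.flatten()   # always a copy

A_view[0] = 99
A_copy[1] = 99

print(M.ravel())       # [99  2  3  4] -> the ravel change reached M, the flatten change did not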
Came in a little late, hope this helps someone,
np.array(M.flat)

Fast counts of elements of numpy array by value thresholds in another array

Given a numpy array of threshold values, what is the most efficient way to produce an array of the counts of another array meeting these values?
Assume the threshold value array is small and sorted, and the array of values to be counted is large-ish and unsorted.
Example: for each element of valueLevels, count the elements of values greater than or equal to it:
import numpy as np
n = int(1e5) # size of example
# example levels: the sequence 0, 1., 2.5, 5., 7.5, 10, 25, ... 50000, 75000
valueLevels = np.concatenate(
    [np.array([0.]),
     np.concatenate([[x*10**y for x in [1., 2.5, 5., 7.5]]
                     for y in range(5)])]
)
np.random.seed(123)
values = np.random.uniform(low=0, high=1e5, size=n)
So far I have tried the list comprehension approach.
np.array([sum(values>=x) for x in valueLevels]) was unacceptably slow
np.array([len(values[values>=x]) for x in valueLevels]) was an improvement
sorting values did speed up the comprehension (in the example, from ~7 to 0.5 ms), but the cost of sort (~8 ms) exceeded the savings for one-time use
The best I have right now is a comprehension of this approach:
%%timeit
np.array([np.count_nonzero(values>=x) for x in valueLevels])
# 1000 loops, best of 3: 1.26 ms per loop
which is acceptable for my purposes, but out of curiosity,
What I would like to know is:
If list comprehension is the way to go, can it be sped up? Or,
Are other approaches faster? (I have a vague sense that this could be done by broadcasting the values array over the thresholds array, but I can't figure out how to get the dimensions right for np.broadcast_arrays().)
The fastest I have so far is
%timeit count_nonzero(values >= atleast_2d(valueLevels).T, axis=1)
# 1000 loops, best of 3: 860 µs per loop
sum is slower:
%timeit sum(values >= atleast_2d(valueLevels).T, axis=1)
# 100 loops, best of 3: 2.5 ms per loop
Divakar's version is even slower:
%timeit count_nonzero(values[:, None] >= valueLevels, axis=1)
# 100 loops, best of 3: 3.86 ms per loop
However, I would probably still use your list comprehension, which is not much slower and does not create a big 2D boolean array as an intermediate step:
%timeit np.array([np.count_nonzero(values>=x) for x in valueLevels])
# 1000 loops, best of 3: 987 µs per loop
Approach #1 Using np.searchsorted -
values.size - np.searchsorted(values,valueLevels,sorter=values.argsort())
Approach #2 Using NumPy broadcasting -
(values[:,None]>=valueLevels).sum(0)
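A quick consistency check of both approaches against the count_nonzero comprehension, using the setup from the question (a sketch):
import numpy as np

np.random.seed(123)
values = np.random.uniform(low=0, high=1e5, size=int(1e5))
valueLevels = np.concatenate(
    [np.array([0.]),
     np.concatenate([[x*10**y for x in [1., 2.5, 5., 7.5]]
                     for y in range(5)])]
)

baseline = np.array([np.count_nonzero(values >= x) for x in valueLevels])

# Approach #1: count elements below each level via searchsorted, then subtract
counts1 = values.size - np.searchsorted(values, valueLevels,
                                        sorter=values.argsort())

# Approach #2: broadcasting builds an (n, len(valueLevels)) boolean array
counts2 = (values[:, None] >= valueLevels).sum(0)

assert (counts1 == baseline).all()
assert (counts2 == baseline).all()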

python set comprehension for 2.6

I was trying set comprehensions for 2.6, and came across the following two ways. I thought the first method would be faster than the second, but timeit suggested otherwise. Why is the second method faster even though it has an extra list instantiation followed by a set instantiation?
Method 1:
In [16]: %timeit set(node[0] for node in pwnodes if node[1].get('pm'))
1000000 loops, best of 3: 568 ns per loop
Method 2:
In [17]: %timeit set([node[0] for node in pwnodes if node[1].get('pm')])
1000000 loops, best of 3: 469 ns per loop
where pwnodes = [('e1', dict(pm=1, wired=1)), ('e2', dict(pm=1, wired=1))].
Iteration is simply faster when using a list comprehension:
In [23]: from collections import deque
In [24]: %timeit deque((node[0] for node in pwnodes if node[1].get('pm')), maxlen=0)
1000 loops, best of 3: 305 µs per loop
In [25]: %timeit deque([node[0] for node in pwnodes if node[1].get('pm')], maxlen=0)
1000 loops, best of 3: 246 µs per loop
The deque is used to illustrate iteration speed; a deque with maxlen set to 0 discards all elements taken from the iterable so there are no memory allocation differences to skew the results.
That's because in Python 2, list comprehensions don't use a separate namespace, while a generator expression does (it has to, by necessity). That extra namespace requires a new frame on the stack, and this is expensive. The major advantage of generator expressions is their low memory footprint, not their speed.
In Python 3, list comprehensions have a separate namespace as well, and list comprehension and generator iteration speed is comparable. You also have set comprehensions, which are fastest still, even on Python 2.
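For reference, the set-comprehension form mentioned above (available from Python 2.7 onward, so not an option on 2.6) looks like this:
pwnodes = [('e1', dict(pm=1, wired=1)), ('e2', dict(pm=1, wired=1))]

# Builds the set directly, without an intermediate list or a generator frame
result = {node[0] for node in pwnodes if node[1].get('pm')}
print(result)  # {'e1', 'e2'} (set order may vary)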
My guess is that it's because the first one involves a generator and the second one doesn't. Generators are generally slower than the equivalent list if the equivalent list fits in memory.
In [4]: timeit for i in [i for i in range(1000)]: pass
10000 loops, best of 3: 47.2 µs per loop
In [5]: timeit for i in (i for i in range(1000)): pass
10000 loops, best of 3: 57.8 µs per loop

Creating Numpy-Arrays without iterating in Python

Say I have a numpy array with shape (2,3) filled with floats.
I also need an array of all possible combinations of X and Y values (their corresponding positions in the array). Is there something like a simple function to get the indices as tuples from a numpy array, so that I don't need for-loops to iterate through the array?
Example Code:
arr = np.array([np.array([1.0, 1.1, 1.2]),
                np.array([1.0, 1.1, 1.2])])
indices = np.zeros([arr.shape[0]*arr.shape[1], 2])
# I want an array of length 6 like np.array([[0,0],[0,1],[0,2],[1,0],[1,1],[1,2]])
# Code so far, iterates though :(
ik = 0
for i in np.arange(arr.shape[0]):
    for k in np.arange(arr.shape[1]):
        indices[ik] = np.array([i, k])
        ik += 1
Now after this, I want to also make an array with the length of the 'indices' array containing "XYZ coordinates" as in each element containing the XY 'indices' and a Z Value from 'arr'. Is there an easier way (and if possible without iterating through the arrays again) than this:
xyz = np.zeros((indices.shape[0], 3))
for i in range(indices.shape[0]):
    xyz[i] = np.array([indices[i, 0], indices[i, 1],
                       arr[int(indices[i, 0]), int(indices[i, 1])]])
You can use np.ndindex:
indices = np.ndindex(arr.shape)
This will give an iterator rather than an array, but you can easily convert it to a list:
>>> list(indices)
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
Then you can stack the indices with the original array along the 2nd dimension:
np.hstack((list(indices), arr.reshape((arr.size, 1))))
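Since np.ndindex returns a one-shot iterator, a self-contained version of the steps above might look like this (a sketch with the 2x3 arr from the question):
import numpy as np

arr = np.array([[1.0, 1.1, 1.2],
                [1.0, 1.1, 1.2]])

indices = list(np.ndindex(arr.shape))   # [(0, 0), (0, 1), ..., (1, 2)]
xyz = np.hstack((indices, arr.reshape((arr.size, 1))))
print(xyz)
# [[0.  0.  1. ]
#  [0.  1.  1.1]
#  [0.  2.  1.2]
#  [1.  0.  1. ]
#  [1.  1.  1.1]
#  [1.  2.  1.2]]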
For your indices:
indices = np.concatenate(np.meshgrid(range(arr.shape[0]), range(arr.shape[1])))
There are probably many ways to achieve this ... A possible solution is the following.
The first problem can be solved using np.unravel_index
max_it = arr.shape[0]*arr.shape[1]
indices = np.vstack(np.unravel_index(np.arange(max_it),arr.shape)).T
The second array can then be constructed with
xyz = np.column_stack((indices,arr[indices[:,0],indices[:,1]]))
Timings
On your array, timeit gives 10000 loops, best of 3: 27.7 µs per loop for my code (grc's solution needs 10000 loops, best of 3: 39.6 µs per loop).
On larger arrays with shape=(50,60) I get 1000 loops, best of 3: 247 µs per loop (grc's solution needs 100 loops, best of 3: 2.17 ms per loop).
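For the 2x3 example array, a quick check of this approach (a sketch):
import numpy as np

arr = np.array([[1.0, 1.1, 1.2],
                [1.0, 1.1, 1.2]])

max_it = arr.shape[0]*arr.shape[1]
indices = np.vstack(np.unravel_index(np.arange(max_it), arr.shape)).T
xyz = np.column_stack((indices, arr[indices[:, 0], indices[:, 1]]))

print(indices.tolist())  # [[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2]]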

Efficiently generate numpy array from list comprehension output?

Is there a more efficient way than using numpy.asarray() to generate an array from output in the form of a list?
This appears to be copying everything in memory, which doesn't seem like it would be that efficient with very large arrays.
(Updated) Example:
import numpy as np
a1 = np.array([1,2,3,4,5,6,7,8,9,10]) # pretend this has thousands of elements
a2 = np.array([3,7,8])
results = np.asarray([np.amax(np.where(a1 > element)) for element in a2])
I usually use np.fromiter:
results = np.fromiter((np.amax(np.where(a1 > element)) for element in a2), dtype=int, count=len(a2))
You don't need to specify count but it allows numpy to preallocate the array. Here are some timings I did on https://www.pythonanywhere.com/try-ipython/:
In [8]: %timeit np.asarray([np.amax(np.where(a1 > element)) for element in a2])
1000 loops, best of 3: 161 us per loop
In [10]: %timeit np.frompyfunc(lambda element: np.amax(np.where(a1 > element)),1,1)(a2,out=np.empty_like(a2))
10000 loops, best of 3: 123 us per loop
In [13]: %timeit np.fromiter((np.amax(np.where(a1 > element)) for element in a2),dtype=int, count=len(a2))
10000 loops, best of 3: 111 us per loop
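As a self-contained check of the fromiter version with the example arrays from the question (a sketch):
import numpy as np

a1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
a2 = np.array([3, 7, 8])

# For each element of a2: the largest index in a1 whose value exceeds it
results = np.fromiter((np.amax(np.where(a1 > element)) for element in a2),
                      dtype=int, count=len(a2))
print(results)  # [9 9 9]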
np.vectorize won't work the way you want, because it doesn't respect an out parameter. However, the lower-level np.frompyfunc will:
np.frompyfunc(lambda element: np.amax(np.where(a1 > element)),
              1, 1)(a2, out=np.empty_like(a2))
