Summing over numpy array with modulo - python

Consider the following setup:
import numpy as np
import itertools as it
A = np.random.rand(3,3,3,16,3,3,3,16) # sum elements of A to arrive at...
B = np.zeros((4,4)) # a 4x4 array (output)
I have a large array 'A' that I want to sum over, but in a very specific way. 'A' has a shape of (x,x,x,16,x,x,x,16) where the 'x' is some integer.
The desired result is a 4x4 matrix 'B', which I can calculate via a for-loop like so:
%%timeit
for x1,y1,z1,s1 in it.product(range(3), range(3), range(3), range(16)):
    for x2,y2,z2,s2 in it.product(range(3), range(3), range(3), range(16)):
        B[s1%4, s2%4] += A[x1,y1,z1,s1,x2,y2,z2,s2]
>> 134 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
where the elements of B are "modulo-4" of the two axes with 16 elements in that dimension in 'A', here indexed by s1 and s2.
How can I achieve the same by broadcasting, or otherwise? Obviously with larger 'x' (dimensions in 'A'), the for-loop takes rapidly longer to compute (the iteration count grows as x^6), which is not ideal.
EDIT:
C = np.zeros((4,4))
for i,j in it.product(range(4), range(4)):
    C[i,j] = A[:,:,:,i::4,:,:,:,j::4].sum()
This seems to work as well. But still involves 1 for-loop. Is there a way to make this any faster?

Here are a cleaner and a faster solution. Unfortunately, they are not the same ...
(Here n is the common size of the x axes, so n = 3 in the setup above.)
def clean(A):
    return A.reshape(4*n*n*n, 4, 4*n*n*n, 4).sum(axis=(0, 2))
def fast(A):
    return np.bincount(np.tile(np.arange(16).reshape(4, 4), (4, 4)).ravel(),
                       A.sum((0, 1, 2, 4, 5, 6)).ravel(),
                       minlength=16).reshape(4, 4)
At n==6 fast is about three times faster.
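The reshape idea is easy to verify against the original loop; here is a minimal sanity-check sketch (using a small n so the reference loop finishes quickly):

```python
import numpy as np
import itertools as it

n = 2  # small so the reference loop stays fast; the check works for any n
A = np.random.rand(n, n, n, 16, n, n, n, 16)

# Reference: the original double loop
B = np.zeros((4, 4))
for x1, y1, z1, s1 in it.product(range(n), range(n), range(n), range(16)):
    for x2, y2, z2, s2 in it.product(range(n), range(n), range(n), range(16)):
        B[s1 % 4, s2 % 4] += A[x1, y1, z1, s1, x2, y2, z2, s2]

# Vectorized: split each 16-axis into (4, 4) so the trailing sub-axis is s % 4,
# then sum over everything except the two "% 4" axes
C = A.reshape(n**3, 4, 4, n**3, 4, 4).sum(axis=(0, 1, 3, 4))
assert np.allclose(B, C)
```

This works because s = 4*q + r with r = s % 4, so reshaping the 16-element axis to (4, 4) puts s % 4 on the trailing sub-axis.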

numpy array assignment is slower than python list

numpy -
arr = np.array([[1, 2, 3, 4]])
row = np.array([1, 2, 3, 4])
%timeit arr[0] = row
466 ns ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
python list -
arr = [[1, 2, 3, 4]]
row = [1, 2, 3, 4]
%timeit arr[0] = row
59.3 ns ± 2.94 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Shouldn't numpy be the faster version here?
Here's what I'm aiming to get done -
arr = np.empty((150, 4))
while True:
    row = get_row_from_api()
    arr[-1] = row
Yep, using python lists this way would definitely be faster: when you assign something to a python list element, nothing is copied; a reference is simply reassigned (https://developers.google.com/edu/python/lists). Numpy instead copies every element from the source container into the target one. I'm not sure you need numpy arrays here at all, because creating them is not free, and python lists are not that slow at creation (nor, as we see, at assignment).
The underlying semantics of the two operations are very different. Python lists are arrays of references. Numpy arrays are arrays of the data itself.
The line row = get_row_from_api() implies that a fresh list has already been allocated.
Assigning to a list as lst[-1] = row just writes an address into lst. That's generally 4 or 8 bytes.
Placing in an array as arr[i] = row is copying data. It's a shorthand for arr[i, :] = row. Every element of row gets copied to the buffer of arr. If row was a list, that incurs additional overhead to convert from python objects to native numerical types.
Remember that premature optimization is pointless. Your time savings for one method vs the other are likely to be negligible. At the same time, if you need an array later down the line anyway, it's likely faster to pre-allocate and take a small speed hit rather than calling np.array on the final list. In the former case, you allocate a buffer of predetermined size and dtype. In the latter, you've merely deferred the overhead of copying the data, but also incurred the overhead of having to figure out the array size and dtype.
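The copy-vs-reference distinction described above can be demonstrated in a few lines:

```python
import numpy as np

# A python list stores references: assigning rebinds a slot to the same object
lst = [[0, 0, 0]]
row = [1, 2, 3]
lst[0] = row
row[0] = 99          # mutating `row` is visible through the list
print(lst[0][0])     # 99

# A numpy array stores the data itself: assigning copies the values
arr = np.zeros((1, 3))
nrow = [1, 2, 3]
arr[0] = nrow
nrow[0] = 99         # later changes to `nrow` do not affect the array
print(arr[0, 0])     # 1.0
```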

Python: Creating a function that compares 2 arrays and inserts the larger elements between the 2 arrays

I am trying to create a function that compares 2 arrays and creates a new list containing the element-wise maxima of the two lists, without using numpy. I managed to create a manual version, but am having issues implementing this as a function.
Task: Create a function maximum_arrays(a,b) that compares both arrays a and b element-wise and returns a new array containing the larger elements. Use the insert2 function to add new elements to a list.
Example: from applying the function to the arrays a=[12,5,8,19,6] and b=[3,6,2,12,4] the result should be c=[12,6,8,19,6].
Current Code:
list_a = [12,5,8,19,6]
list_b = [3,6,2,12,4]
maximum_arrays = []
for item in list_a:
    if list_b[item] > list_a[item]:
        maximum_arrays.insert(list_b[item])
    else:
        maximum_arrays.insert(list_a[item])
print(maximum_arrays)
Manual Version:
list_a = [12,5,8,19,6]
list_b = [3,6,2,12,4]
#answer example
c = [12,6,8,19,6]
#empty list
maximum_arrays = []
#for each part of the list, choose the highest number of the other list and insert
maximum_arrays.insert(0, max(list_a[0],list_b[0]))
maximum_arrays.insert(1, max(list_a[1],list_b[1]))
maximum_arrays.insert(2, max(list_a[2],list_b[2]))
maximum_arrays.insert(3, max(list_a[3],list_b[3]))
maximum_arrays.insert(4, max(list_a[4],list_b[4]))
print(maximum_arrays)
Use max in a list comprehension over the zipped lists, or numpy.maximum.
list_a = [12,5,8,19,6]
list_b = [3,6,2,12,4]
max_array = [max(i) for i in zip(list_a, list_b)]
print(max_array)
The explanation here is: zip turns n iterables into an iterator over tuples, where each tuple has n items. So, in the two-list case, zip([1, 2, 3], [4, 5, 6]) turns into ((1, 4), (2, 5), (3, 6)). Taking the max of all of these tuples gives you your list.
An important caveat, and one that has burned me several times, is that the number of tuples generated is the length of the shortest iterable in the zip. In other words, zip does not throw an exception when passed iterables of different lengths; it just stops when one of the input lists runs out. In this respect it differs from numpy.maximum, which does throw an error when given lists of different lengths.
Are you looking for something like this:
list_a = [12,5,8,19,6]
list_b = [3,6,2,12,4]
l = []
for i, j in enumerate(zip(list_a, list_b)):
    l.insert(i, max(j))
print(l)
Other way using itertools.starmap:
from itertools import starmap

list(starmap(max, zip(list_a, list_b)))
Output:
[12, 6, 8, 19, 6]
This is about 1.4x faster than list comprehension:
%timeit list(starmap(max, zip(list_a, list_b)))
# 1.19 µs ± 49.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [max(i) for i in zip(list_a, list_b)]
# 1.69 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
You can solve this many ways: with a loop like your function, or with a list comprehension using the zip function.
The solution below follows your approach: if both lists have the same length, iterate over the indices of one list and append (or insert) the larger value into the result list.
list_a = [12,5,8,19,6]
list_b = [3,6,2,12,4]
maximum_arrays = []
list_length = len(list_a)
for item in range(list_length):
    if list_b[item] > list_a[item]:
        maximum_arrays.append(list_b[item])
    else:
        maximum_arrays.append(list_a[item])
print(maximum_arrays)
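For completeness: if numpy were allowed after all, the element-wise comparison is a one-liner with numpy.maximum (shown here on the example lists from the question):

```python
import numpy as np

list_a = [12, 5, 8, 19, 6]
list_b = [3, 6, 2, 12, 4]

# np.maximum compares the inputs element-wise and keeps the larger entry
c = np.maximum(list_a, list_b)
print(c.tolist())  # [12, 6, 8, 19, 6]
```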

Python command to check if an element is NOT in the list [duplicate]

I have a list of tuples in Python, and I have a conditional where I want to take the branch ONLY if the tuple is not in the list (if it is in the list, then I don't want to take the if branch)
if curr_x - 1 > 0 and (curr_x - 1, curr_y) not in myList:
    # Do Something
This is not really working for me though. What have I done wrong?
The bug is probably somewhere else in your code, because it should work fine:
>>> 3 not in [2, 3, 4]
False
>>> 3 not in [4, 5, 6]
True
Or with tuples:
>>> (2, 3) not in [(2, 3), (5, 6), (9, 1)]
False
>>> (2, 3) not in [(2, 7), (7, 3), "hi"]
True
How do I check if something is (not) in a list in Python?
The cheapest and most readable solution is using the in operator (or in your specific case, not in). As mentioned in the documentation,
The operators in and not in test for membership. x in s evaluates to
True if x is a member of s, and False otherwise. x not in s returns
the negation of x in s.
Additionally,
The operator not in is defined to have the inverse truth value of in.
y not in x is logically the same as not y in x.
Here are a few examples:
'a' in [1, 2, 3]
# False
'c' in ['a', 'b', 'c']
# True
'a' not in [1, 2, 3]
# True
'c' not in ['a', 'b', 'c']
# False
This also works with tuples, since tuples are compared element-wise by value:
(1, 2) in [(3, 4), (1, 2)]
# True
If the object on the RHS defines a __contains__() method, in will internally call it, as noted in the last paragraph of the Comparisons section of the docs.
... in and not in,
are supported by types that are iterable or implement the
__contains__() method. For example, you could (but shouldn't) do this:
[3, 2, 1].__contains__(1)
# True
in short-circuits, so if your element is at the start of the list, in evaluates faster:
lst = list(range(10001))
%timeit 1 in lst
%timeit 10000 in lst # Expected to take longer time.
68.9 ns ± 0.613 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
178 µs ± 5.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
If you want to do more than just check whether an item is in a list, there are options:
list.index can be used to retrieve the index of an item. If that element does not exist, a ValueError is raised.
list.count can be used if you want to count the occurrences.
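A short sketch of both methods:

```python
lst = ['a', 'b', 'c', 'b']

# list.index returns the first position, or raises ValueError if absent
print(lst.index('b'))       # 1
try:
    lst.index('z')
except ValueError:
    print('z not found')

# list.count tallies occurrences; 0 means "not in the list"
print(lst.count('b'))       # 2
print(lst.count('z'))       # 0
```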
The XY Problem: Have you considered sets?
Ask yourself these questions:
do you need to check whether an item is in a list more than once?
Is this check done inside a loop, or a function called repeatedly?
Are the items you're storing on your list hashable? IOW, can you call hash on them?
If you answered "yes" to these questions, you should be using a set instead. An in membership test on lists is O(n) time complexity. This means that python has to do a linear scan of your list, visiting each element and comparing it against the search item. If you're doing this repeatedly, or if the lists are large, this operation will incur an overhead.
set objects, on the other hand, hash their values for constant time membership check. The check is also done using in:
1 in {1, 2, 3}
# True
'a' not in {'a', 'b', 'c'}
# False
(1, 2) in {('a', 'c'), (1, 2)}
# True
If you're unfortunate enough that the element you're searching/not searching for is at the end of your list, python will have scanned the list up to the end. This is evident from the timings below:
l = list(range(100001))
s = set(l)
%timeit 100000 in l
%timeit 100000 in s
2.58 ms ± 58.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
101 ns ± 9.53 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
As a reminder, this is a suitable option as long as the elements you're storing and looking up are hashable. IOW, they would either have to be immutable types, or objects that implement __hash__.
One can also use the count method of the list class.
Let's say we have a list:
x = [10, 20, 30, 40, 50]
To check whether an element (e.g. 10) is in the list, and how often it occurs:
if x.count(10):
    print(x.count(10))
else:
    print(10, "not in the list")
I know this is a very old question but in the OP's actual question of "What have I done wrong?", the problem seems to be in how to code:
take the branch ONLY if the tuple is not in the list
This is logically equivalent to (as OP observes)
IF tuple in list THEN don't take the branch
It's, however, entirely silent on what should happen IF tuple not in list. In particular, it doesn't follow that
IF tuple not in list THEN take the branch
So OP's rule never mentions what to do IF tuple not in list. Apart from that, as the other answers have noted, not in is the correct syntax to check if an object is in a list (or any container really).
my_tuple not in my_list # etc.

Creating Numpy-Arrays without iterating in Python

Say I have a numpy array with shape (2,3) filled with floats.
I also need an array of all possible combinations of X and Y values (their corresponding positions in the array). Is there a simple function to get the indices as tuples from a numpy array, so that I don't need for-loops to iterate through the array?
Example Code:
arr = np.array([[1.0, 1.1, 1.2],
                [1.0, 1.1, 1.2]])
indices = np.zeros([arr.shape[0]*arr.shape[1], 2])
#I want an array of length 6 like np.array([[0,0],[0,1],[0,2],[1,0],[1,1], [1,2]])
#Code so far, iterates though :(
ik = 0
for i in np.arange(arr.shape[0]):
    for k in np.arange(arr.shape[1]):
        indices[ik] = np.array([i, k])
        ik += 1
Now after this, I want to also make an array with the length of the 'indices' array containing "XYZ coordinates" as in each element containing the XY 'indices' and a Z Value from 'arr'. Is there an easier way (and if possible without iterating through the arrays again) than this:
xyz = np.zeros((indices.shape[0], 3))
for i in range(indices.shape[0]):
    xyz[i] = np.array([indices[i,0], indices[i,1], arr[int(indices[i,0]), int(indices[i,1])]])
You can use np.ndindex:
indices = np.ndindex(arr.shape)
This will give an iterator rather than an array, but you can easily convert it to a list:
>>> list(indices)
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
Then you can stack the indices with the original array along the 2nd dimension:
np.hstack((list(indices), arr.reshape((arr.size, 1))))
For your indices:
i, k = np.meshgrid(range(arr.shape[0]), range(arr.shape[1]), indexing='ij')
indices = np.column_stack((i.ravel(), k.ravel()))
There are probably many ways to achieve this ... A possible solution is the following.
The first problem can be solved using np.unravel_index
max_it = arr.shape[0]*arr.shape[1]
indices = np.vstack(np.unravel_index(np.arange(max_it),arr.shape)).T
The second array can then be constructed with
xyz = np.column_stack((indices,arr[indices[:,0],indices[:,1]]))
Timings
On your array timeit gives for my code 10000 loops, best of 3: 27.7 µs per loop (grc's solution needs 10000 loops, best of 3: 39.6 µs per loop)
On larger arrays with shape=(50,60) I have 1000 loops, best of 3: 247 µs per loop (grc's solution needs 100 loops, best of 3: 2.17 ms per loop)
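Putting the unravel_index approach together on the example array from the question (a sketch; variable names follow the question):

```python
import numpy as np

arr = np.array([[1.0, 1.1, 1.2],
                [1.0, 1.1, 1.2]])

# All (row, col) index pairs, without a Python loop
indices = np.vstack(np.unravel_index(np.arange(arr.size), arr.shape)).T
print(indices.tolist())  # [[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2]]

# Append the value at each index as a third column -> (x, y, z) triples
xyz = np.column_stack((indices, arr[indices[:, 0], indices[:, 1]]))
```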

Numpy matrix to array

I am using numpy. I have a matrix with 1 column and N rows, and I want to get an array from it with N elements.
For example, if i have M = matrix([[1], [2], [3], [4]]), I want to get A = array([1,2,3,4]).
To achieve it, I use A = np.array(M.T)[0]. Does anyone know a more elegant way to get the same result?
Thanks!
If you'd like something a bit more readable, you can do this:
A = np.squeeze(np.asarray(M))
Equivalently, you could also do: A = np.asarray(M).reshape(-1), but that's a bit less easy to read.
result = M.A1
https://numpy.org/doc/stable/reference/generated/numpy.matrix.A1.html
matrix.A1
1-d base array
A, = np.array(M.T)
Depends what you mean by elegance, I suppose, but that's what I would do.
You can try the following variant:
result=np.array(M).flatten()
np.array(M).ravel()
if you care about speed; but if you care about memory:
np.asarray(M).ravel()
Or you could try to avoid some temps with
A = M.view(np.ndarray)
A.shape = -1
First, Mv = numpy.asarray(M.T), which gives you a 1x4 but 2D array.
Then, perform A = Mv[0,:], which gives you what you want. You could put them together, as numpy.asarray(M.T)[0,:].
This will convert the matrix into array
A = np.ravel(M).T
ravel() and flatten() functions from numpy are two techniques that I would try here. I would like to add to the posts made by Joe, Siraj, bubble and Kevad.
Ravel:
M = np.array([[1], [2], [3], [4]])
A = M.ravel()
print(A, A.shape)
>>> [1 2 3 4] (4,)
Flatten:
M = np.array([[1], [2], [3], [4]])
A = M.flatten()
print(A, A.shape)
>>> [1 2 3 4] (4,)
numpy.ravel() is faster, since it is a library level function which does not make any copy of the array. However, any change in array A will carry itself over to the original array M if you are using numpy.ravel().
numpy.flatten() is slower than numpy.ravel(). But if you are using numpy.flatten() to create A, then changes in A will not get carried over to the original array M.
numpy.squeeze() and M.reshape(-1) are slower than numpy.flatten() and numpy.ravel().
%timeit M.ravel()
>>> 1000000 loops, best of 3: 309 ns per loop
%timeit M.flatten()
>>> 1000000 loops, best of 3: 650 ns per loop
%timeit M.reshape(-1)
>>> 1000000 loops, best of 3: 755 ns per loop
%timeit np.squeeze(M)
>>> 1000000 loops, best of 3: 886 ns per loop
Came in a little late, hope this helps someone,
np.array(M.flat)
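For comparison, a quick sketch running a few of the approaches above side by side; all of them produce the same flat 1-D ndarray:

```python
import numpy as np

M = np.matrix([[1], [2], [3], [4]])

a1 = np.squeeze(np.asarray(M))  # asarray drops the matrix subclass, squeeze the size-1 axis
a2 = M.A1                       # matrix.A1: the flattened base array
a3 = np.asarray(M).ravel()      # view-based flattening (no copy when contiguous)
print(a1.tolist())  # [1, 2, 3, 4]
```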
