Numpy subtraction of two arrays - python

I have two numpy arrays like below:
a = np.array([11, 12])
b = np.array([9])
# a - b should be [2, 12]
I want to subtract b from a such that the result is [2, 12]. How can I achieve this?

You can zero-pad the shorter array:
import numpy as np
n = max(len(a), len(b))
a_pad = np.pad(a, (0, n - len(a)), 'constant')
b_pad = np.pad(b, (0, n - len(b)), 'constant')
ans = a_pad - b_pad
Here np.pad's second argument is the pad width, given as (number of zeros to add on the left, number to add on the right).
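To illustrate that pad-width tuple, here is a minimal demonstration using the b from the question (just standard np.pad behavior):
>>> np.pad(np.array([9]), (0, 1), 'constant')
array([9, 0])
>>> np.pad(np.array([9]), (2, 0), 'constant')
array([0, 0, 9])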

A similar method to @BlownhitherMa's would be to create an array of zeros the size of a (call it c), then put b's values in where appropriate:
c = np.zeros_like(a)
c[np.indices(b.shape)] = b
>>> c
array([9, 0])
>>> a-c
array([ 2, 12])

You could use zip_longest from itertools:
import numpy as np
from itertools import zip_longest
a = np.array([11, 12])
b = np.array([9])
result = np.array([ai - bi for ai, bi in zip_longest(a, b, fillvalue=0)])
print(result)
Output
[ 2 12]

Here is a longer, fully spelled-out solution.
diff = []
n = min(len(a), len(b))
# subtract where both arrays have elements
for i in range(n):
    diff.append(a[i] - b[i])
# append the leftover tail of the longer array
if len(a) > n:
    for i in range(n, len(a)):
        diff.append(a[i])
elif len(b) > n:
    for i in range(n, len(b)):
        diff.append(-b[i])  # 0 - b[i], since a has no element at this index
diff = np.array(diff)
print(diff)

We can avoid unnecessary padding / temporaries by copying a and then subtracting b in-place:
# let numpy determine an appropriate result dtype
dtp = (a[:0] - b[:0]).dtype
# copy a, cast to that dtype
d = a.astype(dtp)
# subtract b in-place over the overlapping prefix
d[:b.size] -= b
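As a quick sanity check with the arrays from the question (my own addition):
>>> a = np.array([11, 12]); b = np.array([9])
>>> dtp = (a[:0] - b[:0]).dtype
>>> d = a.astype(dtp); d[:b.size] -= b
>>> d
array([ 2, 12])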

Related

Outer product, vectorial operation and loop - numpy

I have two arrays of size 15: A = [a_0, ..., a_14] and B = [b_0, ..., b_14].
Goal: obtain the array C of size 8 resulting from
C = [a_0] * [b_7, ..., b_14] + [a_1, a_2] * [b_3, b_4, b_5, b_6] + [a_3, a_4, a_5, a_6] * [b_1, b_2] + [a_7, ..., a_14] * [b_0]
where * is the outer product np.outer. Note that:
each sub-array is of length 2^i for i between 0 and 3.
from the outer products we obtain two vectors of size (8,) and two matrices of shapes (2, 4) and (4, 2). Each product is flattened immediately, so that the four products can be summed into a single vector of size 8.
My implementation is the following:
inds = [0, 1, 3, 7, 15]
C = np.zeros(8)
d = 4
for i in range(d):
    left = A[inds[i]:inds[i+1]]
    right = B[inds[d-i-1]:inds[d-i]]
    C += (left[:, None]*right[None, :]).ravel()  # same as np.outer(left, right).ravel()
Question: what is the fastest way to obtain C? That is, is there a way to avoid this for loop when performing the summation?
If not, what are my options? Code in C++? Cython?
NB: this is to be generalized to loops of range(L+1) with L any integer. The example above illustrates the case L = 3 for better comprehension. FYI, the generalized code would look like this:
L = 3
inds = np.cumsum([2**k for k in range(0, L+1)])
inds = np.concatenate(([0], inds))
# Input arrays A and B are of size inds[-1]
C = np.zeros(2**L)
d = L+1
for i in range(d):
    left = A[inds[i]:inds[i+1]]
    right = B[inds[d-i-1]:inds[d-i]]
    C += (left[:, None]*right[None, :]).ravel()  # same as np.outer(left, right).ravel()
I think you can simply do:
C = np.outer(A[0], B[7:]).ravel()+\
    np.outer(A[[1, 2]], B[3:7]).ravel()+\
    np.outer(A[3:7], B[[1, 2]]).ravel()+\
    np.outer(A[7:], B[0]).ravel()
Am I wrong?
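For what it's worth, here is a quick check (with random inputs of my own) that this direct expression matches the asker's loop:
A, B = np.random.random(15), np.random.random(15)
inds = [0, 1, 3, 7, 15]
C_loop = np.zeros(8)
for i in range(4):
    C_loop += np.outer(A[inds[i]:inds[i+1]], B[inds[3-i]:inds[4-i]]).ravel()
C_direct = (np.outer(A[0], B[7:]).ravel() + np.outer(A[[1, 2]], B[3:7]).ravel()
            + np.outer(A[3:7], B[[1, 2]]).ravel() + np.outer(A[7:], B[0]).ravel())
print(np.allclose(C_loop, C_direct))  # expect True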

Faster alternative to np.where for a sorted array

Given a large array a which is sorted along each row, is there a faster alternative to numpy's np.where for finding the indices where min_v <= a <= max_v? I would imagine that leveraging the sorted nature of the array should be able to speed things up.
Here's an example of a setup using np.where to find the given indices in a large array.
import numpy as np
# Initialise an example of an array in which to search
r, c = int(1e2), int(1e6)
a = np.arange(r*c).reshape(r, c)
# Set up search limits
min_v = (r*c/2)-10
max_v = (r*c/2)+10
# Find indices of occurrences
idx = np.where(((a >= min_v) & (a <= max_v)))
You can use np.searchsorted:
import numpy as np
r, c = 10, 100
a = np.arange(r*c).reshape(r, c)
min_v = ((r * c) // 2) - 10
max_v = ((r * c) // 2) + 10
# Old method
idx = np.where(((a >= min_v) & (a <= max_v)))
# With searchsorted
i1 = np.searchsorted(a.ravel(), min_v, 'left')
i2 = np.searchsorted(a.ravel(), max_v, 'right')
idx2 = np.unravel_index(np.arange(i1, i2), a.shape)
print((idx[0] == idx2[0]).all() and (idx[1] == idx2[1]).all())
# True
When I use np.searchsorted with the 100 million numbers from the original example on the outdated NumPy version 1.12.1 (I can't tell about newer versions), it is not much faster than np.where:
>>> import timeit
>>> timeit.timeit('np.where(((a >= min_v) & (a <= max_v)))', number=10, globals=globals())
6.685825735330582
>>> timeit.timeit('np.searchsorted(a.ravel(), [min_v, max_v])', number=10, globals=globals())
5.304438766092062
But although the NumPy docs for searchsorted say "This function uses the same algorithm as the builtin python bisect.bisect_left and bisect.bisect_right functions", the latter are a lot faster:
>>> import bisect
>>> timeit.timeit('bisect.bisect_left(a.base, min_v), bisect.bisect_right(a.base, max_v)', number=10, globals=globals())
0.002058468759059906
Therefore, I'd use this (note that a.base is the flat 1-D array of which the reshaped a is a view):
idx = np.unravel_index(range(bisect.bisect_left(a.base, min_v),
                             bisect.bisect_right(a.base, max_v)), a.shape)
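Wrapped up as a small helper (my own sketch, under the same assumption that the flattened array is sorted; the function name is hypothetical). Using a.ravel() instead of a.base avoids depending on how a was constructed, and for a C-contiguous array it returns a view rather than a copy:
import bisect
import numpy as np

def sorted_where(a, min_v, max_v):
    # indices where min_v <= a <= max_v, assuming a.ravel() is sorted
    flat = a.ravel()
    lo = bisect.bisect_left(flat, min_v)
    hi = bisect.bisect_right(flat, max_v)
    return np.unravel_index(np.arange(lo, hi), a.shape)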

How to use numpy to generate random numbers on segmented intervals

I am using the numpy module in Python to generate random numbers. When I need to generate random numbers in a continuous interval such as [a, b], I use
(b-a)*np.random.rand(1)+a
but now I need to generate a uniform random number over the union of the intervals [a, b] and [c, d]. What should I do?
I want the number to be uniform over the total length of the intervals. I do not want to select an interval with equal probability and then generate a number inside it: if [a, b] and [c, d] are equal in length there is no problem with that approach, but when the lengths are not equal, the numbers it generates are not uniform over the union.
You could do something like
a, b, c, d = 1, 2, 7, 9
N = 10
r = np.random.uniform(a-b, d-c, N)
r += np.where(r < 0, b, c)
r
# array([7.30557415, 7.42185479, 1.48986144, 7.95916547, 1.30422703,
#        8.79749665, 8.19329762, 8.72669862, 1.88426196, 8.33789181])
This draws uniformly from an interval of total length (b - a) + (d - c); the negative part is then shifted into [a, b) and the non-negative part into [c, d), so each interval receives samples in proportion to its length.
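A quick empirical check (my own addition, not part of the answer) that the proportions come out right:
samples = np.random.uniform(a - b, d - c, 100000)
samples += np.where(samples < 0, b, c)
print(((samples >= a) & (samples < b)).mean())  # roughly 1/3, the share of [a, b]
print(((samples >= c) & (samples < d)).mean())  # roughly 2/3, the share of [c, d]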
You can use
np.random.uniform(a,b)
for your random numbers between a and b (including a but excluding b).
So for a random number in [a, b] or [c, d], you can use
np.random.choice( [np.random.uniform(a,b) , np.random.uniform(c,d)] )
Note, however, that this picks each interval with probability 1/2, so it is only uniform over the union when the two intervals have equal length, which is exactly the case the asker wants to go beyond.
Here's a recipe:
def random_multiinterval(*intervals, shape=(1,)):
    # FIXME assert intervals are valid and non-overlapping
    size = sum(i[1] - i[0] for i in intervals)
    v = size * np.random.rand(*shape)
    res = np.zeros_like(v)
    for i in intervals:
        res += (0 < v) * (v < (i[1] - i[0])) * (i[0] + v)
        v -= i[1] - i[0]
    return res
In [11]: random_multiinterval((1, 2), (3, 4))
Out[11]: array([1.34391171])
In [12]: random_multiinterval((1, 2), (3, 4), shape=(3, 3))
Out[12]:
array([[1.42936024, 3.30961893, 1.01379663],
       [3.19310627, 1.05386192, 1.11334538],
       [3.2837065 , 1.89239373, 3.35785566]])
Note: This is uniformly distributed over N (non-overlapping) intervals, even if they have different sizes.
You can just assign a probability for how likely the number is to come from [a, b] versus [c, d] and then generate accordingly:
import numpy as np
import random
random_roll = random.random()
a = 1
b = 5
c = 7
d = 10
if random_roll > .5:  # half the time we will use [a, b]
    my_num = (b - a) * np.random.rand(1) + a
else:  # the other half we will use [c, d]
    my_num = (d - c) * np.random.rand(1) + c
print(my_num)
To match the asker's requirement of uniformity over the combined length, set the threshold to (d - c) / ((b - a) + (d - c)) instead of .5, so that each interval is chosen with probability proportional to its length.

Vectorizing outer and inner loop when these contain calculations and deletes

I've been looking into how to vectorize an outer and an inner for loop. These contain some calculations and also a delete inside them, which seems to make vectorization much less straightforward.
How would this be vectorized best?
import numpy as np
flattenedArray = np.ndarray.tolist(someNumpyArray)
#flattenedArray is a python list of lists.
c = flattenedArray[:]
for a in range(len(flattenedArray)):
    for b in range(a+1, len(flattenedArray)):
        if a == b:
            continue
        i0 = flattenedArray[a][0]
        j0 = flattenedArray[a][1]
        z0 = flattenedArray[a][2]
        i1 = flattenedArray[b][0]
        j1 = flattenedArray[b][1]
        z1 = flattenedArray[b][2]
        if (np.square(z0-z1)) <= (np.square(i0-i1) + (np.square(j0-j1))):
            if (np.square(i0-i1) + (np.square(j0-j1))) <= (np.square(z0+z1)):
                c.remove(flattenedArray[b])
@MSeifert is, of course, as so often, right. So the following full vectorisation is only to show "how it's done":
import numpy as np
N = 4
data = np.random.random((N, 3))
# vectorised code
j, i = np.tril_indices(N, -1) # chose tril over triu to have contiguous columns
# useful later
sqsum = np.square(data[i,0]-data[j,0]) + np.square(data[i,1]-data[j,1])
cond = np.square(data[i, 2] + data[j, 2]) >= sqsum
cond &= np.square(data[i, 2] - data[j, 2]) <= sqsum
# because equal 'b's are grouped together we can use reduceat:
cond = np.r_[False, np.logical_or.reduceat(
    cond, np.add.accumulate(np.arange(N-1)))]
left = data[~cond, :]
# original code (modified to make it run)
flattenedArray = np.ndarray.tolist(data)
#flattenedArray is a python list of lists.
c = flattenedArray[:]
for a in range(len(flattenedArray)):
    for b in range(a+1, len(flattenedArray)):
        if a == b:
            continue
        i0 = flattenedArray[a][0]
        j0 = flattenedArray[a][1]
        z0 = flattenedArray[a][2]
        i1 = flattenedArray[b][0]
        j1 = flattenedArray[b][1]
        z1 = flattenedArray[b][2]
        if (np.square(z0-z1)) <= (np.square(i0-i1) + (np.square(j0-j1))):
            if (np.square(i0-i1) + (np.square(j0-j1))) <= (np.square(z0+z1)):
                try:
                    c.remove(flattenedArray[b])
                except ValueError:
                    pass
# check they are the same
print(np.alltrue(c == left))
Vectorizing the inner loop isn't much of a problem if you work with a mask:
import numpy as np
# I'm using a random array
flattenedArray = np.random.randint(0, 100, (10, 3))
mask = np.zeros(flattenedArray.shape[0], bool)
for idx, row in enumerate(flattenedArray):
    # Calculate the broadcasted elementwise addition/subtraction of this row
    # with all following rows
    added_squared = np.square(row[None, :] + flattenedArray[idx+1:])
    subtracted_squared = np.square(row[None, :] - flattenedArray[idx+1:])
    # Check the conditions
    col1_col2_added = subtracted_squared[:, 0] + subtracted_squared[:, 1]
    cond1 = subtracted_squared[:, 2] <= col1_col2_added
    cond2 = col1_col2_added <= added_squared[:, 2]
    # Update the mask
    mask[idx+1:] |= cond1 & cond2
# Apply the mask
flattenedArray[mask]
If you also want to vectorize the outer loop, you have to do it by broadcasting; that, however, will use O(n**2) memory instead of O(n). Given that the critical inner loop is already vectorized, vectorizing the outer one won't bring much additional speedup.
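For completeness, here is a sketch of what the fully broadcast version might look like (my own illustration, not from the answers; it materializes all O(n**2) pairwise terms at once):
import numpy as np
arr = np.random.randint(0, 100, (10, 3))
diff_sq = np.square(arr[:, None, :] - arr[None, :, :])  # all pairwise squared differences, shape (n, n, 3)
add_sq = np.square(arr[:, None, 2] + arr[None, :, 2])   # pairwise squared sums of column 2, shape (n, n)
planar = diff_sq[..., 0] + diff_sq[..., 1]
cond = (diff_sq[..., 2] <= planar) & (planar <= add_sq)
# keep only pairs (a, b) with a < b, mirroring the loop ranges
cond &= np.tri(len(arr), k=-1, dtype=bool).T
result = arr[~cond.any(axis=0)]  # drop every b flagged by some earlier a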

mean of elements i and i+1 in a numpy array

Out of curiosity, is there a specific numpy function to do the following (which would presumably be faster)?
a = np.array((0, 2, 4))
b = np.zeros(len(a) - 1)
for i in range(len(b)):
    b[i] = a[i:i+2].mean()
print(b)
# prints [1. 3.]
Cheers
You could use
b = (a[1:] + a[:-1]) / 2.
to avoid the Python loop.
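If you want a named numpy routine instead of the arithmetic expression, np.convolve with a length-2 averaging window gives the same result, and sliding_window_view (NumPy >= 1.20) generalizes it to wider windows. A minimal sketch:
b = np.convolve(a, [0.5, 0.5], mode='valid')  # array([1., 3.]), pairwise means

# generalization to any window size w:
from numpy.lib.stride_tricks import sliding_window_view
b = sliding_window_view(a, 2).mean(axis=1)    # same result for w = 2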
