Numpy: Subtract array element by element - python

The title might be ambiguous, didn't know how else to word it.
I have gotten fairly far with my particle simulator in Python using numpy and matplotlib. I have managed to implement Coulomb, gravity and wind forces, and now I want to add temperature and pressure, but I have a pre-optimization question (root of all evil). I want to detect when particles crash:
Q: Is it in numpy possible to take the difference of an array with each of its own element based on a bool condition? I want to avoid looping.
Eg: (x - any element in x) < a
Should return something like
[True, True, False, True]
If element 0,1 and 3 in x meets the condition.
Edit:
The loop equivalent would be:
for i in range(len(x)):
    for j in range(len(x)):
        #!= not so important
        ##earlier question I asked lets me figure that one out
        if i!=j:
            if x[j] - x[i] < a:
                True
I notice numpy operations are far faster than if tests and this has helped me speed things up a lot.
Here is a sample code if anyone wants to play with it.
#Simple circular box simulator, part of part_sim
#Restructure to import into gravity() or coloumb () or wind() or pressure()
#Or to use all forces: sim_full()
#Note: Implement crashing as backbone to all forces
import numpy as np
import matplotlib.pyplot as plt
N = 1000 #Number of particles
R = 8000 #Radius of box
r = np.random.randint(0,R/2,2*N).reshape(N,2)
v = np.random.randint(-200,200,r.shape)
v_limit = 10000 #Speedlimit
plt.ion()
line, = plt.plot([],'o')
plt.axis([-10000,10000,-10000,10000])
while True:
    r_hit = np.sqrt(np.sum(r**2,axis=1))>R #Who let the dogs out, who, who?
    r_nhit = ~r_hit
    N_rhit = r_hit[r_hit].shape[0]
    r[r_hit] = r[r_hit] - 0.1*v[r_hit] #Get the dogs back inside
    r[r_nhit] = r[r_nhit] +0.1*v[r_nhit]
    #Dogs should turn tail before they crash!
    #---
    #---crash code here....
    #---crash end
    #---
    vmin, vmax = np.min(v), np.max(v)
    #Give the particles a random kick when they hit the wall
    v[r_hit] = -v[r_hit] + np.random.randint(vmin, vmax, (N_rhit,2))
    #Slow down honey
    v_abs = np.abs(v) > v_limit
    #Hit the wall at too high v honey? You are getting a speed reduction
    v[v_abs] *=0.5
    line.set_ydata(r[:,1])
    line.set_xdata(r[:,0])
    plt.draw()
I plan to add colors to the datapoints above once I figure out how...such that high velocity particles can easily be distinguished in larger boxes.

Eg: x - any element in x < a Should return something like
[True, True, False, True]
If element 0,1 and 3 in x meets the condition. I notice numpy operations are far faster than if tests and this has helped me speed up things ALOT.
Yes, it's just m < a. For example:
>>> m = np.array((1, 3, 10, 5))
>>> a = 6
>>> m2 = m < a
>>> m2
array([ True, True, False, True], dtype=bool)
Now, to the question:
Q: Is it in numpy possible to take the difference of an array with each of its own element based on a bool condition? I want to avoid looping.
I'm not sure what you're asking for here, but it doesn't seem to match the example directly below it. Are you trying to, e.g., subtract 1 from each element that satisfies the predicate? In that case, you can rely on the fact that False==0 and True==1 and just subtract the boolean array:
>>> m3 = m - m2
>>> m3
array([ 0, 2, 10, 4])
From your clarification, you want the equivalent of this pseudocode loop:
for i in range(len(x)):
    for j in range(len(x)):
        #!= not so important
        ##earlier question I asked lets me figure that one out
        if i!=j:
            if x[j] - x[i] < a:
                True
I think the confusion here is that this is the exact opposite of what you said: you don't want "the difference of an array with each of its own element based on a bool condition", but "a bool condition based on the difference of an array with each of its own elements". And even that only really gets you to a square matrix of len(m)*len(m) bools, but I think the part left over is the "any".
At any rate, you're asking for an implicit cartesian product, comparing each element of m to each element of m.
You can easily reduce this from two loops to one (or, rather, implicitly vectorize one of them, gaining the usual numpy performance benefits). For each value, create a new array by subtracting that value from each element and comparing the result with a, and then join those up:
>>> a = -2
>>> comparisons = np.array([m - x < a for x in m])
>>> flattened = np.any(comparisons, 0)
>>> flattened
array([ True, True, False, True], dtype=bool)
But you can also turn this into a simple matrix operation pretty easily. Subtracting every element of m from every other element of m is just m - m.T. (You can make the product more explicit, but the way numpy handles adding row and column vectors, it isn't necessary.) And then you just compare every element of that to the scalar a, and reduce with any, and you're done:
>>> a = -2
>>> m = np.matrix((1, 3, 10, 5))
>>> subtractions = m - m.T
>>> subtractions
matrix([[ 0, 2, 9, 4],
[-2, 0, 7, 2],
[-9, -7, 0, -5],
[-4, -2, 5, 0]])
>>> comparisons = subtractions < a
>>> comparisons
matrix([[False, False, False, False],
[False, False, False, False],
[ True, True, False, True],
[ True, False, False, False]], dtype=bool)
>>> np.any(comparisons, 0)
matrix([[ True, True, False, True]], dtype=bool)
Or, putting it all together in one line:
>>> np.any((m - m.T) < a, 0)
matrix([[ True, True, False, True]], dtype=bool)
If you need m to be an array rather than a matrix, you can replace the subtraction line with m - np.matrix(m).T.
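For plain arrays, the same pairwise table can be sketched with broadcasting instead of np.matrix. This is only a minimal sketch that reuses m and a from above; the names pairwise and flags are illustrative and not part of the original answer.

```python
import numpy as np

m = np.array([1, 3, 10, 5])
a = -2

# pairwise[i, j] = m[j] - m[i]: a row vector minus a column vector
pairwise = m[np.newaxis, :] - m[:, np.newaxis]

# flags[j] is True if m[j] - m[i] < a for at least one i
flags = np.any(pairwise < a, axis=0)
print(flags)  # [ True  True False  True]
```

The intermediate table has len(m)**2 entries, so memory grows quadratically with the number of elements.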
For higher dimensions, you actually do need to work in arrays, because you're trying to cartesian-product a 2D array with itself to get a 4D array, and numpy doesn't do 4D matrices. So, you can't use the simple "row vector - column vector = matrix" trick. But you can do it manually:
>>> m = np.array([[1,2], [3,4]]) # 2x2
>>> m4d = m.reshape(1, 1, 2, 2) # 1x1x2x2
>>> m4d
array([[[[1, 2],
[3, 4]]]])
>>> mt4d = m4d.T # 2x2x1x1
>>> mt4d
array([[[[1]],
[[3]]],
[[[2]],
[[4]]]])
>>> subtractions = m - mt4d # 2x2x2x2
>>> subtractions
array([[[[ 0, 1],
[ 2, 3]],
[[-2, -1],
[ 0, 1]]],
[[[-1, 0],
[ 1, 2]],
[[-3, -2],
[-1, 0]]]])
And from there, the remainder is the same as before. Putting it together into one line:
>>> np.any((m - m.reshape(1, 1, 2, 2).T) < a, 0)
(If you remember my original answer, I'd somehow blanked on reshape and was doing the same thing by multiplying m by a column vector of 1s, which obviously is a much stupider way to proceed.)
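Note that reducing over axis 0 alone leaves a 3D result. If the goal is one flag per element of m, one possible reading is to reduce over both of the "other element" axes. The following is a sketch under that assumption, using np.newaxis rather than reshape plus transpose; the value of a is chosen only for illustration.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])
a = -1  # illustrative threshold

# diffs[i, j, k, l] = m[k, l] - m[i, j], shape 2x2x2x2
diffs = m[np.newaxis, np.newaxis, :, :] - m[:, :, np.newaxis, np.newaxis]

# Reduce over both leading axes: one bool per element of m
flags = np.any(diffs < a, axis=(0, 1))
# flags is [[True, True], [False, False]]: only 1 and 2 are more than 1 below some element
print(flags)
```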
One last quick thought: If your algorithm really is "the bool result of (for any element y of m, x - y < a) for each element x of m", you don't actually need "for any element y", you can just use "for the maximal element y". So you can simplify from O(N^2) to O(N):
>>> (m - m.max()) < a
Or, if a is positive, that's always true (m - m.max() is never greater than 0), so you can simplify to O(1):
>>> np.ones(m.shape, dtype=bool)
But I'm guessing your real algorithm is actually using abs(x - y), or something more complicated, which can't be simplified in this way.
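If the real test is a distance check between 2D particle positions, a broadcasting version might look like the sketch below. It assumes positions are stored as an (N, 2) array like r in the question; the function name colliding and the sample coordinates are made up for illustration.

```python
import numpy as np

def colliding(r, a):
    """Return a bool array: True where a particle is closer than a to any other."""
    # diff[i, j] = r[i] - r[j], shape (N, N, 2)
    diff = r[:, np.newaxis, :] - r[np.newaxis, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Every particle is at distance 0 from itself, so exclude the diagonal
    np.fill_diagonal(dist, np.inf)
    return (dist < a).any(axis=1)

r = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [0.0, 1.5]])
print(colliding(r, 2.0))  # [ True  True False  True]
```

This builds an N x N distance table, so for large N a spatial data structure would likely scale better than the full table.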

Related

Assigning to all entries whose indices sum to some value

I have an array X of binary numbers and shape (2, 2, ..., 2), and would like to assign the value 1 to all entries whose indices sum to 0 modulo 2 and the value 0 to the rest.
For example, if we had X.shape = (2, 2, 2) then I would like to assign 1 to X[0, 0, 0], X[0, 1, 1], X[1, 0, 1], X[1, 1, 0] and 0 to the other 4 entries.
What is the most efficient way of doing this? I assume I should create this array with the np.bool datatype, so the solution should work with that in mind.
Here are a direct method and a tricksy one. The tricksy one uses bit packing and exploits certain repetitive patterns. For large n this gives a considerable speedup (>50x at n=19).
import functools as ft
import numpy as np
def direct(n):
    I = np.arange(2, dtype='u1')
    return ft.reduce(np.bitwise_xor, np.ix_(I[::-1], *(n-1)*(I,)))
def smartish(n):
    assert n >= 6
    b = np.empty(1<<(n-3), 'u1')
    b[[0, 3, 5, 6]] = 0b10010110
    b[[1, 2, 4, 7]] = 0b01101001
    i = b.view('u8')
    jp = 1
    for j in range(0, n-7, 2):
        i[3*jp:4*jp] = i[:jp]
        i[jp:3*jp].reshape(2, -1)[...] = 0xffff_ffff_ffff_ffff ^ i[:jp]
        jp *= 4
    if n & 1:
        i[jp:] = 0xffff_ffff_ffff_ffff ^ i[:jp]
    return np.unpackbits(b).reshape(n*(2,))
from timeit import timeit
assert np.all(smartish(19) == direct(19))
print(f"direct {timeit(lambda: direct(19), number=100)*10:.3f} ms")
print(f"smartish {timeit(lambda: smartish(19), number=100)*10:.3f} ms")
Sample run on a 2^19 box:
direct 5.408 ms
smartish 0.079 ms
Please note that these return uint8 arrays, for example:
>>> direct(3)
array([[[1, 0],
[0, 1]],
[[0, 1],
[1, 0]]], dtype=uint8)
But these can be view-cast to bool at virtually zero cost:
>>> direct(3).view('?')
array([[[ True, False],
[False, True]],
[[False, True],
[ True, False]]])
Explainer:
direct method: One straight-forward way of checking bit parity is to xor the bits together. We need to do this in a "reducing" way, i.e. we have to apply the binary operation xor to the first two operands, then to the result and the third operand, then to that result and the fourth operand and so forth. This is what functools.reduce does.
Also, we don't want to do this just once but on each point of a 2^n grid. The numpy way of doing this are open grids. These can be generated from 1D axes using np.ix_ or in simple cases using np.ogrid. Note that we flip the very first axis to account for the fact that we want inverted parity.
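As a plain cross-check of what the direct method computes, the index sums can also be built explicitly with np.indices. This is a simple sketch rather than a fast method (it materializes n full-size index arrays); the helper name parity_mask is hypothetical.

```python
import numpy as np

def parity_mask(n):
    """True where the indices of a (2,)*n array sum to an even number."""
    # np.indices returns one full-size index array per axis
    index_sum = np.indices((2,) * n).sum(axis=0)
    return index_sum % 2 == 0

X = parity_mask(3)
print(X.astype(np.uint8))
```

For n = 3 this gives the same pattern as direct(3) shown above.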
smartish method. We make two main optimizations. 1) xor is a bitwise operation meaning that it does "64-way parallel computation" for free if we pack our bits into a 64 bit uint. 2) If we flatten the 2^n hypercube then position n in the linear arrangement corresponds to cell (bit1, bit2, bit3, ...) in the hypercube where bit1, bit2 etc. is the binary representation (with leading zeros) of n. Now note that if we have computed the parities of positions 0 .. 0b11..11 = 2^k-1 then we can get the parities of 2^k..2^(k+1)-1 by simply copying and inverting the already computed parities. For example k = 2:
0b000, 0b001, 0b010, 0b011 would be what we have and
0b100, 0b101, 0b110, 0b111 would be what we need to compute
  ^      ^      ^      ^
Since these two sequences differ only in the marked bit it is clear that indeed their cross digit sums differ by one and the parities are inverted.
As an exercise work out what can be said in a similar vein about the next 2^k entries and the 2^k entries after those.
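The copy-and-invert step described above can be sketched on unpacked bits (one byte per entry, without the 64-bit packing that makes the smartish method fast). The function name parity_by_doubling is illustrative only.

```python
import numpy as np

def parity_by_doubling(n):
    """Bit parity of 0 .. 2**n - 1, built by repeated copy-and-invert."""
    p = np.zeros(1, dtype=np.uint8)       # parity of position 0
    for _ in range(n):
        # The next block has one extra leading 1 bit, so its parity is inverted
        p = np.concatenate([p, p ^ 1])
    return p

p = parity_by_doubling(3)
print(p)  # [0 1 1 0 1 0 0 1]

# The question asks for 1 where the index sum is even, i.e. inverted parity
X = (1 - p).reshape((2,) * 3)
```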

Creating a "bitmask" from several boolean numpy arrays

I'm trying to convert several masks (boolean arrays) to a bitmask with numpy. While that works in theory, I feel that I'm doing too many operations.
For example to create the bitmask I use:
import numpy as np
flags = [
    np.array([True, False, False]),
    np.array([False, True, False]),
    np.array([False, True, False])
]
flag_bits = np.zeros(3, dtype=np.int8)
for idx, flag in enumerate(flags):
    flag_bits += flag.astype(np.int8) << idx # equivalent to flag * 2 ** idx
Which gives me the expected "bitmask":
>>> flag_bits
array([1, 6, 0], dtype=int8)
>>> [np.binary_repr(bit, width=7) for bit in flag_bits]
['0000001', '0000110', '0000000']
However, I feel that the casting to int8 and the addition to the flag_bits array in particular are too complicated. Therefore I wanted to ask if there is any NumPy functionality that I missed that could be used to create such a "bitmask" array?
Note: I'm calling an external function that expects such a bitmask, otherwise I would stick with the boolean arrays.
>>> x = np.array([2**i for i in range(np.shape(flags)[0])])
>>> np.dot(x, flags)
array([1, 6, 0])
How it works: in a bit mask, every bit is effectively an original array element multiplied by a power of 2 according to its position, e.g. 6 = False * 1 + True * 2 + True * 4. Effectively this can be represented as matrix multiplication, which is really efficient in numpy.
So, the first line is a list comprehension to create these weights: x = [1, 2, 4, 8, ... 2^(n-1)], one weight per flag array.
Then, each row in flags is multiplied by the corresponding element in x and everything is summed up (this is how matrix multiplication works). At the end, we get the bitmask.
How about this (added conversion to int8, if desired):
flag_bits = (np.transpose(flags) << np.arange(len(flags))).sum(axis=1)\
.astype(np.int8)
#array([1, 6, 0], dtype=int8)
Here's an approach to directly get to the string bitmask with boolean-indexing -
out = np.repeat('0000000',3).astype('S7')
out.view('S1').reshape(-1,7)[:,-3:] = np.asarray(flags).astype(int)[::-1].T
Sample run -
In [41]: flags
Out[41]:
[array([ True, False, False], dtype=bool),
array([False, True, False], dtype=bool),
array([False, True, False], dtype=bool)]
In [42]: out = np.repeat('0000000',3).astype('S7')
In [43]: out.view('S1').reshape(-1,7)[:,-3:] = np.asarray(flags).astype(int)[::-1].T
In [44]: out
Out[44]:
array([b'0000001', b'0000110', b'0000000'],
dtype='|S7')
Using the same matrix-multiplication strategy as discussed in detail in @Marat's solution, but using a vectorized scaling array that gives us flag_bits -
np.dot(2**np.arange(3),flags)
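The shift-based idea can also be wrapped in a small helper that keeps everything in an unsigned integer dtype. This is only a sketch: the name flags_to_bitmask and the default dtype are assumptions, and with the default uint8 it is limited to 8 flag arrays.

```python
import numpy as np

def flags_to_bitmask(flags, dtype=np.uint8):
    """Combine equally shaped boolean arrays into one bitmask array.

    Flag number k ends up in bit k of every output element.
    """
    stacked = np.asarray(flags, dtype=dtype)        # shape (n_flags, n_items)
    shifts = np.arange(len(stacked), dtype=dtype)   # bit position of each flag
    # Shift each row into place, then OR the rows together
    return np.bitwise_or.reduce(stacked << shifts[:, np.newaxis], axis=0)

flags = [
    np.array([True, False, False]),
    np.array([False, True, False]),
    np.array([False, True, False]),
]
print(flags_to_bitmask(flags))  # [1 6 0]
```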

Replace values in specific columns of a numpy array

I have an N x M numpy array (matrix). Here is an example with a 3 x 6 array:
x = numpy.array([[0,1,2,3,4,5],[0,-1,2,3,-4,-5],[0,-1,-2,-3,4,5]])
I'd like to scan all the columns of x and replace the values of each column if they are equal to a specific value.
This code, for example, aims to replace every value that equals the negative of its column number with 100:
for i in range(1,6):
    x[:,i == -(i)] = 100
This code obtains this warning:
DeprecationWarning: using a boolean instead of an integer will result in an error in the future
I'm using numpy 1.8.2. How can I avoid this warning without downgrading numpy?
I don't follow what your code is trying to do:
the i == -(i)
will evaluate to something like this:
x[:, True]
x[:, False]
I don't think this is what you want. You should try something like this:
for i in range(1, 6):
    mask = x[:, i] == -i
    x[:, i][mask] = 100
Create a mask over the whole column, and use that to change the values.
Even without the warning, the code you have there will not do what you want. i is the loop index and will equal minus itself only if i == 0, which is never. Your test will always return false, which is cast to 0. In other words your code will replace the first element of each row with 100.
To get this to work I would do
for i in range(1, 6):
    col = x[:,i]
    col[col == -i] = 100
Notice that you use the name of the array for the masking and that you need to separate the conventional indexing from the masking
If you are worried about the warning spewing out text, then ignore it as a Warning/Exception:
import numpy
import warnings
warnings.simplefilter('default') # this enables DeprecationWarnings to be thrown
x = numpy.array([[0,1,2,3,4,5],[0,-1,2,3,-4,-5],[0,-1,-2,-3,4,5]])
with warnings.catch_warnings():
    warnings.simplefilter("ignore") # and this ignores them
    for i in range(1,6):
        x[:,i == -(i)] = 100
print(x) # just to show that you are actually changing the content
As you can see in the comments, some people are not getting DeprecationWarning. That is probably because python suppresses developer-only warnings since 2.7
As others have said, your loop isn't doing what you think it is doing. I would propose you change your code to use numpy's fancy indexing.
# First, create the "test values" (column index):
>>> test_values = numpy.arange(6)
# test_values is array([0, 1, 2, 3, 4, 5])
#
# Now, we want to check which columns have value == -test_values:
#
>>> mask = (x == -test_values) & (x < 0)
# mask is True wherever a value in the i-th column of x is negative i
>>> mask
array([[False, False, False, False, False, False],
[False, True, False, False, True, True],
[False, True, True, True, False, False]], dtype=bool)
#
# Now, set those values to 100
>>> x[mask] = 100
>>> x
array([[ 0, 1, 2, 3, 4, 5],
[ 0, 100, 2, 3, 100, 100],
[ 0, 100, 100, 100, 4, 5]])
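If column 0 should be left out entirely, as in the original range(1, 6) loop, a variant of the same idea can work on a slice of x. This is a sketch that relies on basic slicing returning a view; the names cols and sub are illustrative.

```python
import numpy as np

x = np.array([[0, 1, 2, 3, 4, 5],
              [0, -1, 2, 3, -4, -5],
              [0, -1, -2, -3, 4, 5]])

cols = np.arange(1, x.shape[1])   # column numbers 1..5, skipping column 0
sub = x[:, 1:]                    # a view, so writes go through to x
sub[sub == -cols] = 100           # compare each column with minus its number
print(x)
```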

Python: Elementwise comparison of same shaped arrays

I have n matrices of the same size and want to see how many cells are equal to each other across all matrices. Code:
import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[5,6,7], [4,2,6], [7, 8, 9]])
c = np.array([[2,3,4],[4,5,6],[1,2,5]])
#Intuition is below but is wrong
a == b == c
How do I get Python to return a value of 2 (cells 2,1 and 2,3 match in all 3 matrices) or an array of [[False, False, False], [True, False, True], [False, False, False]]?
You can do:
(a == b) & (b==c)
[[False False False]
[ True False True]
[False False False]]
For n items in, say, a list like x=[a, b, c, a, b, c], one could do:
r = x[0] == x[1]
for temp in x[2:]:
    r &= x[0]==temp
The result is now in r.
If the structure is already in a 3D numpy array, one could also use:
np.amax(x,axis=2)==np.amin(x,axis=2)
The idea for the above line is that although it would be ideal to have an equal function with an axis argument, there isn't one so this line notes that if amin==amax along the axis, then all elements are equal.
If the different arrays to be compared aren't already in a 3D numpy array (or won't be in the future), looping the list is a fast and easy approach. Although I generally agree with avoiding Python loops for Numpy arrays, this seems like a case where it's easier and faster (see below) to use a Python loop since the loop is only along a single axis and it's easy to accumulate the comparisons in place. Here's a timing test:
from timeit import timeit
def f0(x):
    r = x[0] == x[1]
    for y in x[2:]:
        r &= x[0]==y
def f1(x): # from @Divakar
    r = ~np.any(np.diff(np.dstack(x),axis=2),axis=2)
def f2(x):
    x = np.dstack(x)
    r = np.amax(x,axis=2)==np.amin(x,axis=2)
# speed test
for n, size, reps in ((1000, 3, 1000), (10, 1000, 100)):
    x = [np.ones((size, size)) for i in range(n)]
    print(n, size, reps)
    print("f0: ", timeit("f0(x)", "from __main__ import x, f0", number=reps))
    print("f1: ", timeit("f1(x)", "from __main__ import x, f1", number=reps))
    print("f2: ", timeit("f2(x)", "from __main__ import x, f2", number=reps))
    print()
1000 3 1000
f0: 1.14673900604 # loop
f1: 3.93413209915 # diff
f2: 3.93126702309 # min max
10 1000 100
f0: 2.42633581161 # loop
f1: 27.1066679955 # diff
f2: 25.9518558979 # min max
If arrays are already in a single 3D numpy array (eg, from using x = np.dstack(x) in the above) then modifying the above function defs appropriately and with the addition of the min==max approach gives:
def g0(x):
    r = x[:,:,0] == x[:,:,1]
    for iy in range(2, x.shape[2]): # remaining slices along the third axis
        r &= x[:,:,0]==x[:,:,iy]
def g1(x): # from @Divakar
    r = ~np.any(np.diff(x,axis=2),axis=2)
def g2(x):
    r = np.amax(x,axis=2)==np.amin(x,axis=2)
which yields:
1000 3 1000
g0: 3.9761030674 # loop
g1: 0.0599548816681 # diff
g2: 0.0313589572906 # min max
10 1000 100
g0: 10.7617051601 # loop
g1: 10.881870985 # diff
g2: 9.66712999344 # min max
Note also that for a list of large arrays f0 = 2.4 and for a pre-built array g0, g1, g2 ~= 10, so if the input arrays are large, then the fastest approach, by about 4x, is to store them separately in a list. I find this a bit surprising and guess that this might be due to cache swapping (or bad code?), but I'm not sure anyone really cares so I'll stop this here.
Concatenate along the third axis with np.dstack and take differences along that axis with np.diff, so that identical values show up as zeros. Then, check for cases where all differences are zero with ~np.any. Thus, you would have a one-liner solution like so -
~np.any(np.diff(np.dstack((a,b,c)),axis=2),axis=2)
Sample run -
In [39]: a
Out[39]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [40]: b
Out[40]:
array([[5, 6, 7],
[4, 2, 6],
[7, 8, 9]])
In [41]: c
Out[41]:
array([[2, 3, 4],
[4, 5, 6],
[1, 2, 5]])
In [42]: ~np.any(np.diff(np.dstack((a,b,c)),axis=2),axis=2)
Out[42]:
array([[False, False, False],
[ True, False, True],
[False, False, False]], dtype=bool)
Try this:
z1 = a == b
z2 = a == c
z = np.logical_and(z1,z2)
print("count:", np.sum(z))
You can do this in a single statement:
count = np.sum( np.logical_and(a == b, a == c) )
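For an arbitrary number of arrays, the pairwise comparisons can be folded together with np.logical_and.reduce. This is a minimal sketch that compares every array against the first one; the names arrays and all_equal are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

arrays = [a, b, c]
# A cell matches everywhere if every other array agrees with the first one
all_equal = np.logical_and.reduce([arr == arrays[0] for arr in arrays[1:]])
count = int(all_equal.sum())
print(all_equal)
print(count)  # 2
```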

numpy argmin elegant solution required.

In python to find the index of the minimum value of the array I use y = numpy.argmin(someMat)
Can I find the minimum value of this matrix such that it does not lie within a specified range in a neat way?
"Can i find the minimum value of this matrix such that it does not lie within a specified range in a neat way?"
If you only care about the minimum value satisfying some condition and not the location, then
>>> numpy.random.seed(1)
>>> m = numpy.random.randn(5, 5)
>>> m
array([[ 1.62434536, -0.61175641, -0.52817175, -1.07296862, 0.86540763],
[-2.3015387 , 1.74481176, -0.7612069 , 0.3190391 , -0.24937038],
[ 1.46210794, -2.06014071, -0.3224172 , -0.38405435, 1.13376944],
[-1.09989127, -0.17242821, -0.87785842, 0.04221375, 0.58281521],
[-1.10061918, 1.14472371, 0.90159072, 0.50249434, 0.90085595]])
>>> m[~ ((m < 0.5) | (m > 0.8))].min()
0.50249433890186823
If you do want the location via argmin, then that's a bit trickier, but one way is to use masked arrays:
>>> numpy.ma.array(m,mask=((m<0.5) | (m > 0.8))).argmin()
23
>>> m.flat[23]
0.50249433890186823
Note that the condition here is flipped, as the mask is True for the excluded values, not the included ones.
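An alternative to masked arrays, sketched here on a small made-up matrix, is to replace the excluded values with infinity before calling argmin. The names allowed and candidates and the sample values are assumptions for illustration. Note that if no value is allowed, argmin simply returns 0, so that case needs a separate check.

```python
import numpy as np

m = np.array([[0.9, 0.2, 0.7],
              [0.6, 1.5, 0.55]])

# Only values between 0.5 and 0.8 take part in the search
allowed = (m >= 0.5) & (m <= 0.8)

# Excluded entries become +inf so they can never be the minimum
candidates = np.where(allowed, m, np.inf)
flat_index = candidates.argmin()
row, col = np.unravel_index(flat_index, m.shape)
print(flat_index, (row, col), m[row, col])
```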
Update: it appears that by "within a specified range" you don't mean the minimum value isn't within some bounds, but that you want to exclude portions of the matrix from the search based on the x,y coordinates. Here's one way (same matrix as before):
>>> xx, yy = numpy.indices(m.shape)
>>> points = ((xx == 0) & (yy == 0)) | ((xx > 2) & (yy < 3))
>>> points
array([[ True, False, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False],
[ True, True, True, False, False],
[ True, True, True, False, False]], dtype=bool)
>>> m[points]
array([ 1.62434536, -1.09989127, -0.17242821, -0.87785842, -1.10061918,
1.14472371, 0.90159072])
>>> m[points].min()
-1.1006191772129212
with the corresponding masked array variant if you need the locations. [Edited to use indices instead of mgrid; I'd actually forgotten about it until it was used in another answer today!]
If I'm still wrong :^) and this also isn't what you're after, please edit your question to include a 3x3 example of your desired input and output.
I'm guessing this is what you are trying to achieve:
Argmin with arrays:
>>> from numpy import *
>>> a = array( [2,3,4] )
>>> argmin(a)
0
>>> print a[argmin(a)]
2
Argmin with matrices:
>>> b=array( [[6,5,4],[3,2,1]] )
>>> argmin(b)
5
>>> print b[argmin(b)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index out of bounds
The same indexing approach doesn't work for matrices (2-D arrays). The reason is that argmin (as well as argmax) returns an index into the flattened array, so for a matrix you first need to flatten it into a 1-dimensional array before indexing with that value.
In order to do this, you need to call ravel :
>>> print b
[[6 5 4]
[3 2 1]]
>>> ravel(b)
array([6, 5, 4, 3, 2, 1])
When you combine ravel with argmin, you must write:
>>> print ravel(b)[argmin(b)]
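If the row and column of the minimum are wanted rather than the flat position, np.unravel_index converts between the two. A short sketch using the same b as above:

```python
import numpy as np

b = np.array([[6, 5, 4], [3, 2, 1]])

flat_index = np.argmin(b)                         # position in the flattened array
row, col = np.unravel_index(flat_index, b.shape)  # back to 2-D coordinates
print(flat_index, row, col)
print(b[row, col])  # 1
```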
