I have an N x M numpy array (matrix). Here is an example with a 3 x 6 array:
x = numpy.array([[0,1,2,3,4,5],[0,-1,2,3,-4,-5],[0,-1,-2,-3,4,5]])
I'd like to scan all the columns of x and replace the values of each column if they are equal to a specific value.
This code, for example, aims to replace every negative value that is equal to minus its column number with 100:
for i in range(1,6):
    x[:,i == -(i)] = 100
Running this code produces the following warning:
DeprecationWarning: using a boolean instead of an integer will result in an error in the future
I'm using numpy 1.8.2. How can I avoid this warning without downgrading numpy?
I don't follow what your code is trying to do:
the expression i == -(i)
evaluates to a boolean, so the indexing becomes something like this:
x[:, True]
x[:, False]
I don't think this is what you want. You should try something like this:
for i in range(1, 6):
    mask = x[:, i] == -i
    x[:, i][mask] = 100
Create a mask over the whole column, and use that to change the values.
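For reference, here is a quick sketch of what that loop does to the example array from the question (output shown as comments):
import numpy

x = numpy.array([[0,1,2,3,4,5],[0,-1,2,3,-4,-5],[0,-1,-2,-3,4,5]])
for i in range(1, 6):
    mask = x[:, i] == -i
    x[:, i][mask] = 100

print(x)
# [[  0   1   2   3   4   5]
#  [  0 100   2   3 100 100]
#  [  0 100 100 100   4   5]]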
Even without the warning, the code you have there will not do what you want. i is the loop index and equals minus itself only if i == 0, which never happens here. Your test therefore always evaluates to False, which is cast to 0. In other words, your code replaces the first element of each row (i.e. the whole first column) with 100.
To get this to work I would do
for i in range(1, 6):
    col = x[:,i]
    col[col == -i] = 100
Notice that the mask is built from the column itself, and that you need to separate the conventional indexing (selecting the column) from the boolean masking.
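The reason this writes back into x is that basic slicing returns a view rather than a copy; a minimal sketch, assuming a fresh copy of the example x:
col = x[:, 1]         # basic slicing: col is a view into x
col[col == -1] = 100  # so this assignment modifies x itself
print(x[:, 1])        # [  1 100 100] -- column 1 of x now holds the replaced values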
If you are worried about the warning spewing out text, then ignore it as a Warning/Exception:
import numpy
import warnings
warnings.simplefilter('default')  # this enables DeprecationWarnings to be thrown
x = numpy.array([[0,1,2,3,4,5],[0,-1,2,3,-4,-5],[0,-1,-2,-3,4,5]])
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # and this ignores them
    for i in range(1,6):
        x[:,i == -(i)] = 100
print(x)  # just to show that you are actually changing the content
As you can see in the comments, some people are not getting the DeprecationWarning. That is probably because Python has suppressed developer-only warnings (including DeprecationWarning) by default since 2.7.
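If you would rather silence only this particular category instead of every warning, a slightly narrower variant of the same idea (a sketch, not part of the original answer) is to pass the category to simplefilter:
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)  # only DeprecationWarnings are ignored
    for i in range(1, 6):
        x[:, i == -(i)] = 100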
As others have said, your loop isn't doing what you think it is doing. I would propose you change your code to use numpy's fancy indexing.
# First, create the "test values" (column index):
>>> test_values = numpy.arange(6)
# test_values is array([0, 1, 2, 3, 4, 5])
#
# Now, we want to check which entries have value == -test_values (i.e. minus their column index):
#
>>> mask = (x == -test_values) & (x < 0)
# mask is True wherever the value in column i of x is the negative value -i
>>> mask
array([[False, False, False, False, False, False],
       [False,  True, False, False,  True,  True],
       [False,  True,  True,  True, False, False]], dtype=bool)
#
# Now, set those values to 100
>>> x[mask] = 100
>>> x
array([[  0,   1,   2,   3,   4,   5],
       [  0, 100,   2,   3, 100, 100],
       [  0, 100, 100, 100,   4,   5]])
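Putting the same idea into a single statement (a sketch that assumes x has one test value per column, as above):
x[(x == -numpy.arange(x.shape[1])) & (x < 0)] = 100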
Let's say I have a NumPy truth (boolean) array that looks something like the following:
truths = [True, False, False, False, True, True]
and I have another array of values that looks something like:
nums = [1, 2, 3]
I want to create a loop that will replace all the True values in the truths array with the next number from the nums array and replace all the False values with 0.
I want to end up with something that looks like:
array = [1, 0, 0, 0, 2, 3]
I would recommend numpy.putmask(). Since the result should hold integers rather than booleans, we need to do a conversion from bool to int64 first.
First, initialization:
truths = np.array([ True, False, False, False, True, True])
nums = np.array([1, 2, 3])
Then we convert and replace based on our mask (if element of truth is True):
truths = truths.astype('int64')  # implicitly changes all the "False" values to 0
np.putmask(truths, truths, nums)
The end result:
>>> truths
array([1, 0, 0, 0, 2, 3])
Note that we just pass in truths into the "mask" argument of numpy.putmask(). This will simply check to see if each element of array truths is truthy; since we converted the array to type int64, it will replace only elements that are NOT 0, as required.
If we wanted to be more pedantic, or needed to replace some arbitrary value, we would need numpy.putmask(truths, truths==<value we want to replace>, nums) instead.
If we want to go EVEN more pedantic and not assume that we can easily convert types (as we can from bool to int64), then as far as I'm aware we'd need some sort of mapping to a different numpy.array where that conversion is possible. The way I'd personally do that is to convert my numpy.array into a boolean array where this easy conversion works, but there may be a better way.
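As a concrete sketch of that second form, here is a hypothetical integer array whose 7s we want to replace in order (the array a and the value 7 are made up for illustration):
import numpy as np

a = np.array([7, 0, 0, 0, 7, 7])
nums = np.array([1, 2, 3])
np.putmask(a, a == 7, nums)  # nums is repeated to the length of a, then written where the mask is True
# a is now array([1, 0, 0, 0, 2, 3])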
You can use cycle from itertools to cycle through your nums list, pulling the next number only when the corresponding entry of truths is True, in a ternary list comprehension:
>>> from itertools import cycle
>>> nums_iter = cycle(nums)
>>> [next(nums_iter) if boolean else 0 for boolean in truths]
[1, 0, 0, 0, 2, 3]
You could use itertools here as you said you want a loop.
from itertools import cycle, chain, repeat
import numpy as np
truths = np.array([True, False, False, False, True, True])
nums = np.array([1, 2, 3])
# You have two options here.
# Either cycle over nums repeatedly:
iter_nums = cycle(nums)
# or, once nums is exhausted, put a default value in its place:
iter_nums = chain(nums, repeat(0))
masked = np.array([next(iter_nums) if v else v for v in truths])
print(masked)
# [1 0 0 0 2 3]
I have an (N, 2) array representing some image coordinates. I want to extract only those rows where both values are zero.
For example in this array:
aux = np.array([[0.,-0.0001], [0.0,0.0], [0.0,0.0], [123,0.0]])
As a result I want a numpy array with the indices of the rows that are entirely zero:
result: np.array([1,2])
So far, I'm trying with np.where:
np.where(aux==0)
(array([0, 1, 1, 2, 2, 3]), array([0, 0, 1, 0, 1, 1]))
But I don't understand the output, which is a tuple. What is the second array?
I think you can do this using np.all:
np.all(aux == 0, axis=1)
This returns a boolean array that is True where both values in the row are 0:
array([False, True, True, False], dtype=bool)
You can extract an array of the corresponding row indices (matching your desired output) using np.where. (To answer your question about the tuple: calling np.where directly on the 2D comparison aux == 0 returns a tuple of row indices and column indices, one pair per matching element; reducing with np.all first leaves only the row dimension.)
np.where(np.all(aux == 0, axis=1))
(array([1, 2]),)
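If you want the plain index array (matching the result in the question) rather than the one-element tuple, take its first element:
np.where(np.all(aux == 0, axis=1))[0]
# array([1, 2])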
Using a lambda in Python you can solve it like this:
aux = np.array([[0., -0.0001], [0.0, 0.0], [0.0, 0.0], [123, 0.0]])
First, define the condition you are looking for as a lambda expression:
f = lambda x: x[0] == 0 and x[1] == 0
map() is a Python function that applies the lambda expression to each element (here, each row):
map(f, aux)
The output is a boolean for each row, True where the condition has been met:
[False, True, True, False]
This works as shown in Python 2.7, where map() returns a list. In Python 3.6, map() returns an iterator, so you need an additional step:
results = map(f, aux)
for item in results:
    print(item)
and you will get the same result:
False
True
True
False
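Alternatively, in Python 3 you can materialize the map object in one step (a minimal sketch):
print(list(map(f, aux)))
# [False, True, True, False]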
I need to copy elements from one numpy array to another, but only if a condition is met. Let's say I have two arrays:
x = ([1,2,3,4,5,6,7,8,9])
y = ([])
I want to add numbers from x to y, but only if they match a condition, let's say check whether they are divisible by two. I know I can do the following:
y = x%2 == 0
which makes y an array of True and False values. This is not what I am trying to accomplish, however; I want the actual values (2, 4, 6, 8), and only those that evaluate to True.
You can get the values you want like this:
import numpy as np
x = np.array([1,2,3,4,5,6,7,8,9])
# array([1, 2, 3, 4, 5, 6, 7, 8, 9])
y = x[x%2==0]
# y is now: array([2, 4, 6, 8])
And, you can sum them like this:
np.sum(x[x%2==0])
# 20
Explanation: As you noticed, x%2==0 gives you a boolean array array([False, True, False, True, False, True, False, True, False], dtype=bool). You can use this as a "mask" on your original array, by indexing it with x[x%2==0], returning the values of x where your "mask" is True. Take a look at the numpy indexing documentation for more info.
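The same masking pattern works for any boolean condition; for instance, a small sketch with the same x:
x[x > 5]
# array([6, 7, 8, 9])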
I would like to determine the sum of a two dimensional numpy array. However, elements with a certain value I want to exclude from this summation. What is the most efficient way to do this?
For example, here I initialize a two dimensional numpy array of 1s and replace several of them with 2:
import numpy
data_set = numpy.ones((10, 10))
data_set[4][4] = 2
data_set[5][5] = 2
data_set[6][6] = 2
How can I sum over the elements in my two dimensional array while excluding all of the 2s? Note that with the 10 by 10 array the correct answer should be 97 as I replaced three elements with the value 2.
I know I can do this with nested for loops. For example:
elements = []
for idx_x in range(data_set.shape[0]):
    for idx_y in range(data_set.shape[1]):
        if data_set[idx_x][idx_y] != 2:
            elements.append(data_set[idx_x][idx_y])
data_set_sum = numpy.sum(elements)
However on my actual data (which is very large) this is too slow. What is the correct way of doing this?
Use numpy's capability of indexing with boolean arrays. In the below example data_set!=2 evaluates to a boolean array which is True whenever the element is not 2 (and has the correct shape). So data_set[data_set!=2] is a fast and convenient way to get an array which doesn't contain a certain value. Of course, the boolean expression can be more complex.
In [1]: import numpy as np
In [2]: data_set = np.ones((10, 10))
In [4]: data_set[4,4] = 2
In [5]: data_set[5,5] = 2
In [6]: data_set[6,6] = 2
In [7]: data_set[data_set != 2].sum()
Out[7]: 97.0
In [8]: data_set != 2
Out[8]:
array([[ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       ...
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True]], dtype=bool)
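To illustrate the "more complex" boolean expression mentioned above, here is a small sketch continuing the same session (my own example, not from the original answer):
In [9]: data_set[(data_set != 2) & (data_set > 0)].sum()
Out[9]: 97.0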
Without numpy, the solution is not much more complex:
x = [1,2,3,4,5,6,7]
sum(y for y in x if y != 7)
# 21
Works for a list of excluded values too:
# set is faster for resolving `in`
exl = set([1,2,3])
sum(y for y in x if y not in exl)
# 22
Using np.sum's where= argument, we avoid the array copy that would otherwise be triggered by advanced indexing:
>>> import numpy as np
>>> data_set = np.ones((10,10))
>>> data_set[(4,5,6),(4,5,6)] = 2
>>> np.sum(data_set, where=data_set != 2)
97.0
>>> data_set.sum(where=data_set != 2)
97.0
https://numpy.org/doc/stable/reference/generated/numpy.sum.html
Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).
https://numpy.org/doc/stable/user/basics.indexing.html#advanced-indexing
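The where= argument also combines with axis=, in case you want per-row sums that skip the excluded value (a sketch under the same setup):
>>> np.sum(data_set, axis=1, where=data_set != 2)
array([10., 10., 10., 10.,  9.,  9.,  9., 10., 10., 10.])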
How about this approach, which makes use of numpy's boolean capabilities?
We simply set all the values that meet the specification to zero before taking the sum. That way we don't change the shape of the array, as we would if we filtered them out of the array.
The other benefit of this is that it means we can sum along axis after the filter is applied.
import numpy
data_set = numpy.ones((10, 10))
data_set[4][4] = 2
data_set[5][5] = 2
data_set[6][6] = 2
print "Sum", data_set.sum()
another_set = numpy.array(data_set) # Take a copy, we'll need that later
data_set[data_set == 2] = 0 # Set all the values that are 2 to zero
print "Filtered sum", data_set.sum()
print "Along axis", data_set.sum(0), data_set.sum(1)
Equally we could use any other boolean to set the data we wish to exclude from the sum.
another_set[(another_set > 1) & (another_set < 3)] = 0
print "Another filtered sum", another_set.sum()
The title might be ambiguous; I didn't know how else to word it.
I have gotten a fair way with my particle simulator in Python using numpy and matplotlib. I have managed to implement Coulomb, gravity and wind; now I just want to add temperature and pressure, but I have a pre-optimization question (root of all evil). I want to see when particles crash:
Q: Is it possible in numpy to take the difference of an array with each of its own elements, based on a bool condition? I want to avoid looping.
Eg: (x - any element in x) < a
Should return something like
[True, True, False, True]
If elements 0, 1 and 3 in x meet the condition.
Edit:
The loop equivalent would be:
for i in range(len(x)):
    for j in range(len(x)):
        # != is not so important
        # (an earlier question I asked lets me figure that one out)
        if i != j:
            if x[j] - x[i] < a:
                True
I notice numpy operations are far faster than if tests, and this has helped me speed things up a lot.
Here is a sample code if anyone wants to play with it.
#Simple circular box simulator, part of part_sim
#Restructure to import into gravity() or coloumb () or wind() or pressure()
#Or to use all forces: sim_full()
#Note: Implement crashing as backbone to all forces
import numpy as np
import matplotlib.pyplot as plt
N = 1000 #Number of particles
R = 8000 #Radius of box
r = np.random.randint(0,R//2,2*N).reshape(N,2)
v = np.random.randint(-200,200,r.shape)
v_limit = 10000 #Speedlimit
plt.ion()
line, = plt.plot([],'o')
plt.axis([-10000,10000,-10000,10000])
while True:
    r_hit = np.sqrt(np.sum(r**2,axis=1))>R  #Who let the dogs out, who, who?
    r_nhit = ~r_hit
    N_rhit = r_hit[r_hit].shape[0]
    r[r_hit] = r[r_hit] - 0.1*v[r_hit]  #Get the dogs back inside
    r[r_nhit] = r[r_nhit] + 0.1*v[r_nhit]
    #Dogs should turn tail before they crash!
    #---
    #---crash code here....
    #---crash end
    #---
    vmin, vmax = np.min(v), np.max(v)
    #Give the particles a random kick when they hit the wall
    v[r_hit] = -v[r_hit] + np.random.randint(vmin, vmax, (N_rhit,2))
    #Slow down honey
    v_abs = np.abs(v) > v_limit
    #Hit the wall at too high v honey? You are getting a speed reduction
    v[v_abs] *= 0.5
    line.set_ydata(r[:,1])
    line.set_xdata(r[:,0])
    plt.draw()
I plan to add colors to the datapoints above once I figure out how...such that high velocity particles can easily be distinguished in larger boxes.
Eg: (x - any element in x) < a
Should return something like
[True, True, False, True]
if elements 0, 1 and 3 in x meet the condition. I notice numpy operations are far faster than if tests and this has helped me speed things up a lot.
Yes, it's just m < a. For example:
>>> m = np.array((1, 3, 10, 5))
>>> a = 6
>>> m2 = m < a
>>> m2
array([ True, True, False, True], dtype=bool)
Now, to the question:
Q: Is it in numpy possible to take the difference of an array with each of its own element based on a bool condition? I want to avoid looping.
I'm not sure what you're asking for here, but it doesn't seem to match the example directly below it. Are you trying to, e.g., subtract 1 from each element that satisfies the predicate? In that case, you can rely on the fact that False==0 and True==1 and just subtract the boolean array:
>>> m3 = m - m2
>>> m3
array([ 0,  2, 10,  4])
From your clarification, you want the equivalent of this pseudocode loop:
for i in range(len(x)):
    for j in range(len(x)):
        # != is not so important
        # (an earlier question I asked lets me figure that one out)
        if i != j:
            if x[j] - x[i] < a:
                True
I think the confusion here is that this is the exact opposite of what you said: you don't want "the difference of an array with each of its own elements based on a bool condition", but "a bool condition based on the difference of an array with each of its own elements". And even that only really gets you to a square matrix of len(m)*len(m) bools; the part still missing is the "any".
At any rate, you're asking for an implicit cartesian product, comparing each element of m to each element of m.
You can easily reduce this from two loops to one (or, rather, implicitly vectorize one of them, gaining the usual numpy performance benefits). For each value, create a new array by subtracting that value from each element and comparing the result with a, and then join those up:
>>> a = -2
>>> comparisons = np.array([m - x < a for x in m])
>>> flattened = np.any(comparisons, 0)
>>> flattened
array([ True, True, False, True], dtype=bool)
But you can also turn this into a simple matrix operation pretty easily. Subtracting every element of m from every other element of m is just m - m.T. (You can make the product more explicit, but the way numpy handles adding row and column vectors, it isn't necessary.) And then you just compare every element of that to the scalar a, and reduce with any, and you're done:
>>> a = -2
>>> m = np.matrix((1, 3, 10, 5))
>>> subtractions = m - m.T
>>> subtractions
matrix([[ 0,  2,  9,  4],
        [-2,  0,  7,  2],
        [-9, -7,  0, -5],
        [-4, -2,  5,  0]])
>>> comparisons = subtractions < a
>>> comparisons
matrix([[False, False, False, False],
        [False, False, False, False],
        [ True,  True, False,  True],
        [ True, False, False, False]], dtype=bool)
>>> np.any(comparisons, 0)
matrix([[ True, True, False, True]], dtype=bool)
Or, putting it all together in one line:
>>> np.any((m - m.T) < a, 0)
matrix([[ True,  True, False,  True]], dtype=bool)
If you need m to be an array rather than a matrix, you can replace the subtraction line with m - np.matrix(m).T.
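For completeness, a sketch of the same computation with plain ndarrays, using broadcasting against reshaped views instead of np.matrix (my own variant, not part of the original answer):
>>> m = np.array((1, 3, 10, 5))
>>> np.any(m[None, :] - m[:, None] < a, axis=0)
array([ True,  True, False,  True], dtype=bool)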
For higher dimensions, you actually do need to work in arrays, because you're trying to cartesian-product a 2D array with itself to get a 4D array, and numpy doesn't do 4D matrices. So, you can't use the simple "row vector - column vector = matrix" trick. But you can do it manually:
>>> m = np.array([[1,2], [3,4]]) # 2x2
>>> m4d = m.reshape(1, 1, 2, 2) # 1x1x2x2
>>> m4d
array([[[[1, 2],
         [3, 4]]]])
>>> mt4d = m4d.T # 2x2x1x1
>>> mt4d
array([[[[1]],
        [[3]]],
       [[[2]],
        [[4]]]])
>>> subtractions = m - mt4d # 2x2x2x2
>>> subtractions
array([[[[ 0,  1],
         [ 2,  3]],
        [[-2, -1],
         [ 0,  1]]],
       [[[-1,  0],
         [ 1,  2]],
        [[-3, -2],
         [-1,  0]]]])
And from there, the remainder is the same as before. Putting it together into one line:
>>> np.any((m - m.reshape(1, 1, 2, 2).T) < a, 0)
(If you remember my original answer, I'd somehow blanked on reshape and was doing the same thing by multiplying m by a column vector of 1s, which obviously is a much stupider way to proceed.)
One last quick thought: If your algorithm really is "the bool result of (for any element y of m, x - y < a) for each element x of m", you don't actually need "for any element y", you can just use "for the maximal element y". So you can simplify from O(N^2) to O(N):
>>> (m - m.max()) < a
Or, if a is positive, that's always true (every element minus the maximum is at most 0, which is less than a positive a), so you can simplify to O(1):
>>> np.ones(m.shape, dtype=bool)
But I'm guessing your real algorithm is actually using abs(x - y), or something more complicated, which can't be simplified in this way.
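As a quick sanity check on the example values (a sketch), the O(N) shortcut agrees with the full pairwise version:
>>> m = np.array((1, 3, 10, 5))
>>> a = -2
>>> np.array_equal(np.any(m[None, :] - m[:, None] < a, axis=0), (m - m.max()) < a)
True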