how to implement multiple ifelse in numpy - python

I have an array like this and need to replace every 1 with 2, every 3 with 4, every 4 with 1. Is there a way to do this just with np and not loops?
import numpy as np
np.random.seed(2)
arr=np.random.randint(1,5,(3,3),int)
arr
array([[1, 4, 2],
[1, 3, 4],
[3, 4, 1]])
If I apply array masks sequentially, I don't get the expected outcome, which is:
array([[2, 1, 2],
[2, 4, 1],
[4, 1, 2]])
The replacement is based on conditional logic, not on a mathematical formula.

If the array values don't necessarily range between 1 and 4, you can use np.select:
import numpy as np
a = np.random.randint(1,5, (3,3))
condlist = [np.logical_or(a==1, a==2), a==3, a==4]
choicelist= [2, 4, 1]
b = np.select(condlist, choicelist, default=a)  # default=a keeps unmatched values; the default of 0 would zero them
Every condition is evaluated against the original array a, so one replacement cannot interfere with another.

Here's one with np.searchsorted for performance efficiency -
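def map_values(arr, old_val, new_val):
    sidx = old_val.argsort()
    idx = np.searchsorted(old_val, arr, sorter=sidx)
    idx = np.minimum(idx, len(old_val) - 1)  # values above max(old_val) have no match
    matched = sidx[idx]  # positions in the (possibly unsorted) old_val
    return np.where(old_val[matched] == arr, new_val[matched], arr)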
Sample run -
In [40]: arr
Out[40]:
array([[1, 4, 2],
[1, 3, 4],
[3, 4, 1]])
In [41]: old_val = np.array([1,3,4])
...: new_val = np.array([2,4,1])
In [42]: map_values(arr, old_val, new_val)
Out[42]:
array([[2, 1, 2],
[2, 4, 1],
[4, 1, 2]])

Could do this with a lambda function and np.vectorize():
import numpy as np
np.random.seed(2)
arr=np.random.randint(1,5,(3,3),int)
f = lambda x: x%4 + 1 if x in [1,3,4] else x
vfunc = np.vectorize(f)
Usage:
>>> vfunc(arr)
array([[2, 1, 2],
[2, 4, 1],
[4, 1, 2]])

You have to be careful about the order of assignments. For example, if you do
arr[arr == 4] = 1
arr[arr == 1] = 2
Now all of the elements that were originally 4 will be 2, not 1 as you intend.
One solution is to carefully craft the order of assignments:
arr[arr == 1] = 2
arr[arr == 4] = 1
However, this is very brittle and will fall apart as you introduce more of them. It would be better to create the masks up front from the original array:
ones = arr == 1
fours = arr == 4
arr[ones] = 2
arr[fours] = 1
Now the order of the assignments won't matter because the masks are determined before modifying the array.
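Applied to the full mapping from the question (1 to 2, 3 to 4, 4 to 1), the mask-first idea might look like the sketch below. The `mapping` dict and the `masks` and `result` names are illustrative and not part of the original answer.

```python
import numpy as np

arr = np.array([[1, 4, 2],
                [1, 3, 4],
                [3, 4, 1]])

# Hypothetical mapping taken from the question: old value -> new value.
mapping = {1: 2, 3: 4, 4: 1}

# Build every mask from the untouched array before writing anything.
masks = {old: arr == old for old in mapping}

# Write into a copy so the original array stays available.
result = arr.copy()
for old, new in mapping.items():
    result[masks[old]] = new

print(result)
```

Because all masks are computed first, the loop can visit the mapping in any order.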

You want arr % 4 + 1, except in the case of 2, which stays the same. So use np.where to find all the 2s. Then do arr % 4 + 1, then reset all the 2s.
import numpy as np
np.random.seed(2)
arr=np.random.randint(1,5,(3,3),int)
twos = np.where(arr == 2)
arr = arr % 4 + 1
arr[twos] = 2
print(arr)

Related

element wise multiplication of a vector and a matrix with numpy

Given python code with numpy:
import numpy as np
a = np.arange(6).reshape(3, 2) # a = [[0, 1], [2, 3], [4, 5]]; a.shape = (3, 2)
b = np.arange(3) + 1 # b = [1, 2, 3] ; b.shape = (3,)
How can I multiply each value in b with each corresponding row ('vector') in a? So here, I want the result as:
result = [[0, 1], [4, 6], [12, 15]] # result.shape = (3, 2)
I can do this with a loop, but I am wondering about a vectorized approach. I found an Octave solution here. Apart from this, I didn't find anything else. Any pointers for this?
Thank you in advance.
Probably the simplest is to do the following.
import numpy as np
a = np.arange(6).reshape(3, 2) # a = [[0, 1], [2, 3], [4, 5]]; a.shape = (3, 2)
b = np.arange(3) + 1
ans = np.diag(b) @ a
Here's a method that exploits numpy multiplication broadcasting:
ans = (b*a.T).T
These two solutions basically take the same approach
ans = np.tile(b,(2,1)).T*a
ans = np.vstack([b for _ in range(a.shape[1])]).T*a
In [123]: a = np.arange(6).reshape(3, 2)  # a = [[0, 1], [2, 3], [4, 5]]; a.shape = (3, 2)
     ...: b = np.arange(3) + 1  # b = [1, 2, 3]; b.shape = (3,)
In [124]: a
Out[124]:
array([[0, 1],
[2, 3],
[4, 5]])
A (3,1) will multiply a (3,2) via broadcasting:
In [125]: a*b[:,None]
Out[125]:
array([[ 0, 1],
[ 4, 6],
[12, 15]])
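As a quick sanity check, the broadcasting version can be compared against the diagonal-matrix and transpose approaches. This is only a sketch; the names `col`, `via_broadcast`, `via_diag` and `via_transpose` are made up for the illustration.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)   # shape (3, 2)
b = np.arange(3) + 1             # shape (3,)

col = b[:, None]                 # shape (3, 1): one scale factor per row of a
via_broadcast = a * col          # (3, 1) broadcasts against (3, 2)

via_diag = np.diag(b) @ a        # same result through a matrix product
via_transpose = (b * a.T).T      # same result by scaling columns of a.T

print(via_broadcast)
```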

Creating an array element-wise from product of two arrays

I have a project wherein, after multiplying arrays, I have to arrange them into a separate array (element-wise) and get their sums.
As an example:
a = [1, 0, 1]
b = [[3,5,2], [5,4,3], [5,2,2]]
c = a*b
c = [ [3, 5, 2]
[0, 0, 0]
[5, 2, 2] ]
Now, I want to put the answers in an individual array element wise such as:
r1 = [3, 0, 5]
r2 = [5, 0, 2]
r3 = [2, 0, 2]
Then, get its sum.
sum_r1 = [8]
sum_r2 = [7]
sum_r3 = [4]
So far, I am only able to code the multiplication. I am still working out the appropriate code for the succeeding steps. My code looks like this:
[EDIT]
import numpy
def fitness_score(a, b):
    c = numpy.multiply(a, b)
    trns = numpy.transpose(c)
    s = numpy.sum(trns, axis=1)
    return s
This gives the answer, but I also get an error like this: ValueError: operands could not be broadcast together with shapes (500,3) (3,3). Note that the values in a are generated randomly.
Any help would be appreciated! Thank you in advance!
You can use NumPy, just use transpose on the second matrix to get the desired result.
import numpy as np
a = [1, 0, 1]
b = [[3,5,2], [5,4,3], [5,2,2]]
a = np.array(a)
b = np.array(b)
mul = a*b.T
#array([[3, 0, 5],
# [5, 0, 2],
# [2, 0, 2]])
s = np.sum(a*b.T, axis=1)
#array([8, 7, 4])
If a has shape (500, 3), you can try this:
import numpy as np
a = [[1, 0, 1] for _ in range(500)]
b = [[3,5,2], [5,4,3], [5,2,2]]
a = np.array(a)
b = np.array(b)
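mul = [a_c*b.T for a_c in a]
# mul is a list of 500 arrays; with these inputs each one is
# array([[3, 0, 5],
# [5, 0, 2],
# [2, 0, 2]])
s = np.sum(mul, axis=-1)  # shape (500, 3): one row of sums per row of a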
print(s)
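Summing `a_c * b.T` over its last axis is the same arithmetic as a matrix product, so the list comprehension could probably be replaced by `a @ b`. The sketch below checks that on random 0/1 rows; the random generator, its seed and the names `loop_sums` and `matmul_sums` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)            # hypothetical seed, for repeatability
a = rng.integers(0, 2, size=(500, 3))     # 500 random rows of 0s and 1s
b = np.array([[3, 5, 2], [5, 4, 3], [5, 2, 2]])

# Row-by-row version, as in the answer above.
loop_sums = np.array([np.sum(row * b.T, axis=1) for row in a])

# Vectorized version: entry [n, i] is sum_j a[n, j] * b[j, i].
matmul_sums = a @ b

print(np.array_equal(loop_sums, matmul_sums))
```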

NumPy apply function to groups of rows corresponding to another numpy array

I have a NumPy array with each row representing some (x, y, z) coordinate like so:
a = array([[0, 0, 1],
[1, 1, 2],
[4, 5, 1],
[4, 5, 2]])
I also have another NumPy array with unique values of the z-coordinates of that array like so:
b = array([1, 2])
How can I apply a function, let's call it "f", to each of the groups of rows in a which correspond to the values in b? For example, the first value of b is 1 so I would get all rows of a which have a 1 in the z-coordinate. Then, I apply a function to all those values.
In the end, the output would be an array the same shape as b.
I'm trying to vectorize this to make it as fast as possible. Thanks!
Example of an expected output (assuming that f is count()):
c = array([2, 2])
because there are 2 rows in array a which have a z value of 1 in array b and also 2 rows in array a which have a z value of 2 in array b.
A trivial solution would be to iterate over array b like so:
c = []
for val in b:
    # apply the function to the rows of a whose z-coordinate equals val
    c.append(f(a[a[:, 2] == val]))
My attempt:
I tried doing something like this, but it just returns an empty array.
func(a[a[:, 2]==b])
The problem is that the groups of rows with the same Z can have different sizes so you cannot stack them into one 3D numpy array which would allow to easily apply a function along the third dimension. One solution is to use a for-loop, another is to use np.split:
a = np.array([[0, 0, 1],
[1, 1, 2],
[4, 5, 1],
[4, 5, 2],
[4, 3, 1]])
a_sorted = a[a[:,2].argsort()]
inds = np.unique(a_sorted[:,2], return_index=True)[1]
a_split = np.split(a_sorted, inds)[1:]
# [array([[0, 0, 1],
# [4, 5, 1],
# [4, 3, 1]]),
# array([[1, 1, 2],
# [4, 5, 2]])]
f = np.sum # example of a function
result = list(map(f, a_split))
# [19, 15]
But imho the best solution is to use pandas and groupby as suggested by FBruzzesi. You can then convert the result to a numpy array.
EDIT: For completeness, here are the other two solutions
List comprehension:
b = np.unique(a[:,2])
result = [f(a[a[:,2] == z]) for z in b]
Pandas:
import pandas as pd
df = pd.DataFrame(a, columns=list('XYZ'))
result = df.groupby(['Z']).apply(lambda x: f(x.values)).tolist()
This is the performance plot I got for a = np.random.randint(0, 100, (n, 3)):
As you can see, approximately up to n = 10^5 the "split solution" is the fastest, but after that the pandas solution performs better.
If you are allowed to use pandas:
import pandas as pd
df=pd.DataFrame(a, columns=['x','y','z'])
df.groupby('z').agg(f)
Here f can be any custom function working on grouped data.
Numeric example:
a = np.array([[0, 0, 1],
[1, 1, 2],
[4, 5, 1],
[4, 5, 2]])
df=pd.DataFrame(a, columns=['x','y','z'])
df.groupby('z').size()
z
1 2
2 2
dtype: int64
Note that .size() is the way to count the number of rows per group.
To keep it in pure numpy, maybe this can suit your case:
tmp = np.array([a[a[:,2]==i] for i in b])
tmp
array([[[0, 0, 1],
[4, 5, 1]],
[[1, 1, 2],
[4, 5, 2]]])
which is an array holding each group of rows. This only forms a regular 3D array when every group has the same number of rows.
c = np.array([])
for x in np.nditer(b):
    c = np.append(c, np.where((a[:,2] == x))[0].shape[0])
Output:
[2. 2.]
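For the specific cases of counting or summing per group, a loop may not be needed at all. The sketch below uses `np.unique` with `return_counts=True` for the counts and `np.add.reduceat` on the z-sorted rows for per-group sums; choosing sum as the example function, and the names `order`, `starts` and `group_sums`, are assumptions for illustration.

```python
import numpy as np

a = np.array([[0, 0, 1],
              [1, 1, 2],
              [4, 5, 1],
              [4, 5, 2]])

# Unique z values and how many rows carry each one.
b, counts = np.unique(a[:, 2], return_counts=True)

# Per-group sum of all entries: sort rows by z, then reduce each run.
order = np.argsort(a[:, 2], kind="stable")
a_sorted = a[order]
starts = np.unique(a_sorted[:, 2], return_index=True)[1]  # first row of each group
group_sums = np.add.reduceat(a_sorted.sum(axis=1), starts)

print(b, counts, group_sums)
```

This only covers functions that NumPy can express as a reduction; an arbitrary `f` still needs one of the per-group approaches above.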

Assigning to slices of 2D NumPy array

I want to assign 0 to different length slices of a 2d array.
Example:
import numpy as np
arr = np.array([[1,2,3,4],
[1,2,3,4],
[1,2,3,4],
[1,2,3,4]])
idxs = np.array([0,1,2,0])
Given the above array arr and indices idxs how can you assign to different length slices. Such that the result is:
arr = np.array([[0,2,3,4],
[0,0,3,4],
[0,0,0,4],
[0,2,3,4]])
These don't work
slices = np.array([np.arange(i) for i in idxs])
arr[slices] = 0
arr[:, :idxs] = 0
You can use broadcasted comparison to generate a mask, and index into arr accordingly:
arr[np.arange(arr.shape[1]) <= idxs[:, None]] = 0
print(arr)
array([[0, 2, 3, 4],
[0, 0, 3, 4],
[0, 0, 0, 4],
[0, 2, 3, 4]])
This does the trick:
import numpy as np
arr = np.array([[1,2,3,4],
[1,2,3,4],
[1,2,3,4],
[1,2,3,4]])
idxs = [0,1,2,0]
for i,j in zip(range(arr.shape[0]),idxs):
    arr[i,:j+1]=0
import numpy as np
arr = np.array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
idxs = np.array([0, 1, 2, 0])
for i, idx in enumerate(idxs):
    arr[i,:idx+1] = 0
Here is a sparse solution that may be useful in cases where only a small fraction of places should be zeroed out:
>>> idx = idxs+1
>>> I = idx.cumsum()
>>> cidx = np.ones((I[-1],), int)
>>> cidx[0] = 0
>>> cidx[I[:-1]]-=idx[:-1]
>>> cidx=np.cumsum(cidx)
>>> ridx = np.repeat(np.arange(idx.size), idx)
>>> arr[ridx, cidx]=0
>>> arr
array([[0, 2, 3, 4],
[0, 0, 3, 4],
[0, 0, 0, 4],
[0, 2, 3, 4]])
Explanation: We need to construct the coordinates of the positions we want to put zeros in.
The row indices are easy: we just need to go from 0 to 3 repeating each number to fill the corresponding slice.
The column indices start at zero and most of the time are incremented by 1. So to construct them we use cumsum on mostly ones. Only at the start of each new row we have to reset. We do that by subtracting the length of the corresponding slice such as to cancel the ones we have summed in that row.
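The construction described above can be wrapped in a small helper to make the intermediate steps easier to inspect. This is a sketch that assumes every entry of `idxs` is non-negative; the function name `slice_coords` and its variable names are invented for the example.

```python
import numpy as np

def slice_coords(idxs):
    """Return (rows, cols) covering columns 0..idxs[i] of each row i.

    Assumes every entry of idxs is >= 0, so each row has at least one column.
    """
    lengths = np.asarray(idxs) + 1           # slice length per row
    ends = lengths.cumsum()                  # where each row's run ends
    steps = np.ones(ends[-1], dtype=int)     # column index grows by 1 ...
    steps[0] = 0                             # ... starting from column 0
    steps[ends[:-1]] -= lengths[:-1]         # reset to 0 at each new row
    cols = steps.cumsum()
    rows = np.repeat(np.arange(lengths.size), lengths)
    return rows, cols

arr = np.array([[1, 2, 3, 4]] * 4)
rows, cols = slice_coords([0, 1, 2, 0])
arr[rows, cols] = 0
print(arr)
```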

Numpy - add row to array

How does one add rows to a numpy array?
I have an array A:
A = array([[0, 1, 2], [0, 2, 0]])
I wish to add rows to this array from another array X if the first element of each row in X meets a specific condition.
Numpy arrays do not have a method 'append' like that of lists, or so it seems.
If A and X were lists I would merely do:
for i in X:
    if i[0] < 3:
        A.append(i)
Is there a numpythonic way to do the equivalent?
Thanks,
S ;-)
You can do this:
newrow = [1, 2, 3]
A = numpy.vstack([A, newrow])
What is X? If it is a 2D-array, how can you then compare its row to a number: i < 3?
EDIT after OP's comment:
A = array([[0, 1, 2], [0, 2, 0]])
X = array([[0, 1, 2], [1, 2, 0], [2, 1, 2], [3, 2, 0]])
add to A all rows from X where the first element < 3:
import numpy as np
A = np.vstack((A, X[X[:,0] < 3]))
# returns:
array([[0, 1, 2],
[0, 2, 0],
[0, 1, 2],
[1, 2, 0],
[2, 1, 2]])
This question is seven years old. With the versions I am using (numpy 1.13 and Python 3), adding a row to a matrix works the same way, but remember to put double brackets around the second argument; otherwise it raises a dimension error.
Here I am adding to matrix A
1 2 3
4 5 6
with a row
7 8 9
np.r_ is used the same way:
A = [[1, 2, 3], [4, 5, 6]]
np.append(A, [[7, 8, 9]], axis=0)
>> array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
#or
np.r_[A,[[7,8,9]]]
In case someone is interested, if you would like to add a column:
array = np.c_[A, np.zeros(len(A))]  # one zero per row of A
Following what we did before on matrix A, adding a column to it:
np.c_[A, [2,8]]
>> array([[1, 2, 3, 2],
[4, 5, 6, 8]])
If you want to prepend, you can just flip the order of the arguments, i.e.:
np.r_[[[7, 8, 9]], A]
>> array([[7, 8, 9],
[1, 2, 3],
[4, 5, 6]])
If no calculations are necessary after every row, it's much quicker to add rows in python, then convert to numpy. Here are timing tests using python 3.6 vs. numpy 1.14, adding 100 rows, one at a time:
import numpy as np
from time import perf_counter, sleep
def time_it():
    # Compare performance of two methods for adding rows to numpy array
    py_array = [[0, 1, 2], [0, 2, 0]]
    py_row = [4, 5, 6]
    numpy_array = np.array(py_array)
    numpy_row = np.array([4,5,6])
    n_loops = 100
    start_clock = perf_counter()
    for count in range(0, n_loops):
        numpy_array = np.vstack([numpy_array, numpy_row]) # 5.8 micros
    duration = perf_counter() - start_clock
    print('numpy 1.14 takes {:.3f} micros per row'.format(duration * 1e6 / n_loops))
    start_clock = perf_counter()
    for count in range(0, n_loops):
        py_array.append(py_row) # .15 micros
    numpy_array = np.array(py_array) # 43.9 micros
    duration = perf_counter() - start_clock
    print('python 3.6 takes {:.3f} micros per row'.format(duration * 1e6 / n_loops))
    sleep(15)
#time_it() prints:
numpy 1.14 takes 5.971 micros per row
python 3.6 takes 0.694 micros per row
So, the simple solution to the original question, from seven years ago, is to use vstack() to add a new row after converting the row to a numpy array. But a more realistic solution should consider vstack's poor performance under those circumstances. If you don't need to run data analysis on the array after every addition, it is better to buffer the new rows to a python list of rows (a list of lists, really), and add them as a group to the numpy array using vstack() before doing any data analysis.
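A minimal sketch of that buffering pattern, using the A and X arrays from the accepted answer's example. The `buffered` list name is an assumption, and the condition is the one from the question.

```python
import numpy as np

A = np.array([[0, 1, 2], [0, 2, 0]])
X = np.array([[0, 1, 2], [1, 2, 0], [2, 1, 2], [3, 2, 0]])

# Collect qualifying rows in a plain Python list first.
buffered = []
for row in X:
    if row[0] < 3:
        buffered.append(row)

# Pay the array-allocation cost once, after the loop.
if buffered:
    A = np.vstack([A] + buffered)

print(A)
```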
You can also do this:
newrow = [1,2,3]
A = numpy.concatenate((A, [newrow]))  # wrap newrow in a list so it is 2D like A
import numpy as np
array_ = np.array([[1,2,3]])
add_row = np.array([[4,5,6]])
array_ = np.concatenate((array_, add_row), axis=0)
I use 'np.vstack' which is faster, EX:
import numpy as np
input_array=np.array([1,2,3])
new_row= np.array([4,5,6])
new_array=np.vstack([input_array, new_row])
I use numpy.insert(arr, i, the_object_to_be_added, axis) in order to insert object_to_be_added at the i'th row(axis=0) or column(axis=1)
import numpy as np
a = np.array([[1, 2, 3], [5, 4, 6]])
# array([[1, 2, 3],
# [5, 4, 6]])
np.insert(a, 1, [55, 66], axis=1)
# array([[ 1, 55, 2, 3],
# [ 5, 66, 4, 6]])
np.insert(a, 2, [50, 60, 70], axis=0)
# array([[ 1, 2, 3],
# [ 5, 4, 6],
# [50, 60, 70]])
Too old discussion, but I hope it helps someone.
If you can do the construction in a single operation, then something like the vstack-with-fancy-indexing answer is a fine approach. But if your condition is more complicated or your rows come in on the fly, you may want to grow the array. In fact the numpythonic way to do something like this - dynamically grow an array - is to dynamically grow a list:
A = np.array([[1,2,3],[4,5,6]])
Alist = [r for r in A]
for i in range(100):
    newrow = np.arange(3)+i
    if i%5:
        Alist.append(newrow)
A = np.array(Alist)
del Alist
Lists are highly optimized for this kind of access pattern; you don't have convenient numpy multidimensional indexing while in list form, but for as long as you're appending it's hard to do better than a list of row arrays.
You can use numpy.append() to append a row to a numpy array and reshape it to a matrix later on.
import numpy as np
a = np.array([1,2])
a = np.append(a, [3,4])
print(a)
# [1 2 3 4]
# in your example
A = [1,2]
for row in X:
    A = np.append(A, row)
