Mask array based on other array? - python

I have a numpy array arrayBig
10 5 27 30 34
2 34 23 2 3
2 3 43 12 23
2 24 34 2 34
And I have a numpy array arraySmall
1 0 0
0 1 1
1 1 0
What I want is a numpy array arrayNew
34 0 0
0 43 12
24 34 0
I know my arraySmall has the shape (3, 3) and is located at index (1, 1) in arrayBig. How can I get arrayNew with NumPy?

>>> import numpy as np
>>> arrayBig = np.array([
... [10, 5, 27, 30, 34],
... [2, 34, 23, 2, 3],
... [2, 3, 43, 12, 23],
... [2, 24, 34, 2, 34],
... ])
>>> arraySmall = np.array([
... [1, 0, 0],
... [0, 1, 1],
... [1, 1, 0],
... ])
>>> arrayBig[1:4, 1:4] * arraySmall
array([[34, 0, 0],
[ 0, 43, 12],
[24, 34, 0]])
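
If the offset is not fixed, the same slice-and-mask idea can be wrapped in a small helper. This is only a sketch: the name mask_window and its row/col parameters are made up for illustration and are not part of NumPy. It uses np.where instead of multiplication, so the mask is treated as a boolean and the original array is not modified.

```python
import numpy as np

def mask_window(big, mask, row, col):
    """Return the window of `big` at (row, col) with masked-out cells set to 0.

    `mask_window`, `row` and `col` are hypothetical names used for this sketch.
    """
    h, w = mask.shape
    window = big[row:row + h, col:col + w]
    # np.where builds a new array, so `big` is left untouched.
    return np.where(mask.astype(bool), window, 0)

arrayBig = np.array([
    [10, 5, 27, 30, 34],
    [2, 34, 23, 2, 3],
    [2, 3, 43, 12, 23],
    [2, 24, 34, 2, 34],
])
arraySmall = np.array([
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
])
arrayNew = mask_window(arrayBig, arraySmall, 1, 1)
print(arrayNew)
```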

I recently learned about advanced boolean indexing. I'm not sure if it's any better than the other answer but you can do:
>>> a = np.array([[1,2,3],[4,5,6]])
>>> b = np.array([[1,0],[0,1]])
>>> c = b == 0
>>> d = a[0:b.shape[0],0:b.shape[1]]
>>> d[c] = 0
>>> d
array([[1, 0],
[0, 5]])
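
One thing to keep in mind with the approach above: a basic slice is a view, so assigning through d also changes a. If the original array must stay intact, a copy of the slice can be masked instead. A small sketch (the names a_before and d_copy are illustrative only):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[1, 0], [0, 1]])
a_before = a.copy()

# Copy the window first so the boolean assignment does not write into `a`.
d_copy = a[0:b.shape[0], 0:b.shape[1]].copy()
d_copy[b == 0] = 0

print(d_copy)
print(np.array_equal(a, a_before))  # the original array is unchanged
```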

Related

Vectorize with a list of lists in Python

My final goal is to use a vectorized numpy solution for a for-loop. This loop creates for each element a random sample from another list if its elements are not given in the original element. However, the for-loops' input is a list of lists. I do not know how to apply a numpy vectorization for a list of lists. A reproducible example is here:
import random
list_of_all_items = [1, 2, 3, 4, 12, 21, 23, 42, 93]
seen_formats = [[1, 2, 3, 4], [2,23, 21, 3], [12, 42, 93, 1]]
not_seen_formats = []
for seen in seen_formats:
    not_seen_formats.append(random.sample([format_ for format_ in list_of_all_items if format_ not in seen],
                                          len(seen) * 1))
What I tried so far is:
import numpy as np
np.where(np.in1d(np.random.choice(list_of_all_items, 2, replace = False), np.asarray(seen_formats)))
>> (array([0, 1], dtype=int64),)
This sadly makes no sense. What I would like to have returned is an array which should contain random samples for the given list of lists, like:
>> array([[12, 21], # those numbers should be random numbers
[ 1, 4],
[ 2, 3]])
import numpy as np
np.random.seed(42)
list_of_all_items = np.array([1, 2, 3, 4, 12, 21, 23, 42, 93])
seen_formats = np.array([[1, 2, 3, 4], [2,23, 21, 3], [12, 42, 93, 1]])
print(list_of_all_items, '\n')
print(seen_formats, '\n')
def select(a, b):
    return np.random.choice(a=np.setdiff1d(b, a), size=a.size, replace=False)
selection = np.apply_along_axis(func1d=select, axis=1, arr=seen_formats, b=list_of_all_items)
print(selection)
# Alternatively:
# select_vect = np.vectorize(select, excluded=['b'], signature='(m),(n)->(m)')
# selection2 = select_vect(seen_formats, list_of_all_items)
# print(selection2)
Output:
[ 1 2 3 4 12 21 23 42 93]
[[ 1 2 3 4]
[ 2 23 21 3]
[12 42 93 1]]
[[21 93 23 12]
[42 4 12 1]
[ 3 2 21 23]]
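
Note that np.apply_along_axis and np.vectorize are mainly conveniences; they still call the Python function once per row. A plain comprehension with NumPy's Generator API does the same work and also handles rows of different lengths if the final np.array call is dropped. This is a sketch under those assumptions; the helper name sample_unseen and the seed are arbitrary choices, not from the question.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility

list_of_all_items = np.array([1, 2, 3, 4, 12, 21, 23, 42, 93])
seen_formats = [[1, 2, 3, 4], [2, 23, 21, 3], [12, 42, 93, 1]]

def sample_unseen(seen, pool, rng):
    """Draw len(seen) distinct items from `pool` that do not occur in `seen`."""
    candidates = np.setdiff1d(pool, seen)
    return rng.choice(candidates, size=len(seen), replace=False)

not_seen_formats = np.array(
    [sample_unseen(seen, list_of_all_items, rng) for seen in seen_formats]
)
print(not_seen_formats)
```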

Transpose Numpy Array (Vector)

a = np.array([0,1,2])
b = np.array([3,4,5,6,7])
...
c = np.dot(a,b)
I want to transpose b so I can calculate the dot product of a and b.
You can use numpy's broadcasting for this:
import numpy as np
a = np.array([0,1,2])
b = np.array([3,4,5,6,7])
In [3]: a[:,None]*b
Out[3]:
array([[ 0, 0, 0, 0, 0],
[ 3, 4, 5, 6, 7],
[ 6, 8, 10, 12, 14]])
This has nothing to do with a dot product, though, but in the comments you said that this is the result you want.
You could also use the numpy function outer:
In [4]: np.outer(a, b)
Out[4]:
array([[ 0, 0, 0, 0, 0],
[ 3, 4, 5, 6, 7],
[ 6, 8, 10, 12, 14]])
What you want here is the outer product of the two arrays. The function to use for this is np.outer:
a = np.array([0,1,2])
b = np.array([3,4,5,6,7])
np.outer(a,b)
array([[ 0, 0, 0, 0, 0],
[ 3, 4, 5, 6, 7],
[ 6, 8, 10, 12, 14]])
So with NumPy you could reshape swapping axes:
a = np.swapaxes([a], 1, 0)
# [[0]
# [1]
# [2]]
Then
print(a * b)
# [[ 0 0 0 0 0]
# [ 3 4 5 6 7]
# [ 6 8 10 12 14]]
Reshaping b instead requires transposing the product, see below.
Or usual NumPy reshape:
a = np.array([0,1,2])
b = np.array([3,4,5,6,7]).reshape(5,1)
print((a * b).T)
# [[ 0 0 0 0 0]
# [ 3 4 5 6 7]
# [ 6 8 10 12 14]]
Reshape is like b = np.array([ [bb] for bb in [3,4,5,6,7] ]) then b becomes:
# [[3]
# [4]
# [5]
# [6]
# [7]]
When reshaping a instead, there is no need to transpose:
a = np.array([0,1,2]).reshape(3,1)
b = np.array([3,4,5,6,7])
print(a * b)
# [[ 0 0 0 0 0]
# [ 3 4 5 6 7]
# [ 6 8 10 12 14]]
Just out of curiosity, good old list comprehension:
a = [0,1,2]
b = [3,4,5,6,7]
print( [ [aa * bb for bb in b] for aa in a ] )
#=> [[0, 0, 0, 0, 0], [3, 4, 5, 6, 7], [6, 8, 10, 12, 14]]
Others have provided the outer and broadcasted solutions. Here's the dot one(s):
np.dot(a.reshape(3,1), b.reshape(1,5))
a[:,None].dot(b[None,:])
a[None].T.dot( b[None])
Conceptually I think it's a bit of overkill, but due to implementation details, it is actually the fastest.
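
The reason a plain transpose does not help here is that b is 1-D, and transposing a 1-D array leaves its shape unchanged. The short check below (variable names outer, broadcast and dotted are just for illustration) shows this and confirms that the outer, broadcasting and dot variants from the answers above agree:

```python
import numpy as np

a = np.array([0, 1, 2])
b = np.array([3, 4, 5, 6, 7])

# Transposing a 1-D array is a no-op: the shape stays (5,).
print(b.T.shape)

outer = np.outer(a, b)                 # explicit outer product
broadcast = a[:, None] * b             # (3, 1) * (5,) broadcasts to (3, 5)
dotted = a[:, None].dot(b[None, :])    # (3, 1) dot (1, 5) -> (3, 5)

print(outer)
print(np.array_equal(outer, broadcast) and np.array_equal(outer, dotted))
```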

Multidimensional cumulative sum in numpy

I want to be able to calculate the cumulative sum of a large n-dimensional numpy array. The value of each element in the final array should be the sum of all elements which have indices greater than or equal to the current element.
2D: xᶦʲ = ∑xᵐⁿ ∀ m ≥ i and n ≥ j
3D: xᶦʲᵏ = ∑xᵐⁿᵒ ∀ m ≥ i and n ≥ j and o ≥ k
Examples in 2D:
1 1 0       2 1 0
1 1 1  ->   5 3 1
1 1 1       8 5 2

1 2 3       6  5  3
4 5 6  ->  21 16  9
7 8 9      45 33 18
Example in 3D:
1 1 1       3  2  1
1 1 1       6  4  2
1 1 1       9  6  3

1 1 1       6  4  2
1 1 1  ->  12  8  4
1 1 1      18 12  6

1 1 1       9  6  3
1 1 1      18 12  6
1 1 1      27 18  9
Flip along the last axis, cumsum along the same, flip it back and finally cumsum along the second last axis onwards until the first axis -
def multidim_cumsum(a):
    out = a[...,::-1].cumsum(-1)[...,::-1]
    for i in range(2,a.ndim+1):
        np.cumsum(out, axis=-i, out=out)
    return out
Sample 2D case run -
In [107]: a
Out[107]:
array([[1, 1, 0],
[1, 1, 1],
[1, 1, 1]])
In [108]: multidim_cumsum(a)
Out[108]:
array([[2, 1, 0],
[5, 3, 1],
[8, 5, 2]])
Sample 3D case run -
In [110]: a
Out[110]:
array([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1]]])
In [111]: multidim_cumsum(a)
Out[111]:
array([[[ 3, 2, 1],
[ 6, 4, 2],
[ 9, 6, 3]],
[[ 6, 4, 2],
[12, 8, 4],
[18, 12, 6]],
[[ 9, 6, 3],
[18, 12, 6],
[27, 18, 9]]])
For those who want a "numpy-like" cumsum where the top-left corner is smallest:
def multidim_cumsum(a):
    out = a.cumsum(-1)
    for i in range(2,a.ndim+1):
        np.cumsum(out, axis=-i, out=out)
    return out
Modified from @Divakar's answer (thanks to him!)
Here is a general solution. I'm going by the description, not the examples, i.e. order of vertical display is top down not bottom up:
import itertools as it
import functools as ft
ft.reduce(np.cumsum, it.chain((a[a.ndim*(np.s_[::-1],)],), range(a.ndim)))[a.ndim*(np.s_[::-1],)]
Or in-place:
for i in range(a.ndim):
    b = a.swapaxes(0, i)[::-1]
    b.cumsum(axis=0, out=b)

Writing a 3d numpy array that is readable in matlab

I'm trying to save a 3D numpy array to my disk so that I can later read it in matlab. I've had some difficulty using numpy.savetxt() on a 3D array, so my solution has been to first convert it to a 1D array using the following code:
import numpy
array = numpy.array([[0, 1, 2, 3],
[0, 1, 1, 3],
[3, 1, 3, 1]])
ndarray = numpy.dstack((array, array, array))
darray = ndarray.reshape(36,1)
numpy.savetxt('test.txt', darray, fmt = '%i')
Then in matlab it can be read with the following code:
file = fopen('test.txt')
array = fscanf(file, '%f')
My issue now is converting it back to the original shape. Using reshape(array, 3,4,3) yields the following:
ans(:,:,1) =
0 1 2 3
0 1 2 3
0 1 2 3
ans(:,:,2) =
0 1 1 3
0 1 1 3
0 1 1 3
ans(:,:,3) =
3 1 3 1
3 1 3 1
3 1 3 1
I've tried to transpose the 1D matlab array, then use reshape() but get the same array.
What matlab function can I apply to achieve my original python array?
You want to permute the dimensions. In numpy this is transpose. There are two complications - the 'F' order of MATLAB matrices, and the display pattern, using blocks on the last dimension (which is the outer one with F order). Jump to the end of this answer for details.
===
In [72]: arr = np.array([[0, 1, 2, 3],
...: [0, 1, 1, 3],
...: [3, 1, 3, 1]])
...:
In [80]: np.dstack((arr,arr+1))
Out[80]:
array([[[0, 1],
[1, 2],
[2, 3],
[3, 4]],
[[0, 1],
[1, 2],
[1, 2],
[3, 4]],
[[3, 4],
[1, 2],
[3, 4],
[1, 2]]])
In [81]: np.dstack((arr,arr+1)).shape
Out[81]: (3, 4, 2)
In [75]: from scipy.io import loadmat, savemat
In [76]: pwd
Out[76]: '/home/paul/mypy'
In [83]: savemat('test3',{'arr':arr, 'arr3':arr3})
In Octave
>> load 'test3.mat'
>> arr
arr =
0 1 2 3
0 1 1 3
3 1 3 1
>> arr3
arr3 =
ans(:,:,1) =
0 1 2 3
0 1 1 3
3 1 3 1
ans(:,:,2) =
1 2 3 4
1 2 2 4
4 2 4 2
>> size(arr3)
ans =
3 4 2
back in numpy I can display the array as 2 3x4 blocks with:
In [95]: arr3[:,:,0]
Out[95]:
array([[0, 1, 2, 3],
[0, 1, 1, 3],
[3, 1, 3, 1]])
In [96]: arr3[:,:,1]
Out[96]:
array([[1, 2, 3, 4],
[1, 2, 2, 4],
[4, 2, 4, 2]])
These arrays, ravelled to 1d (showing in effect the layout of values in the underlying databuffer):
In [100]: arr.ravel()
Out[100]: array([0, 1, 2, 3, 0, 1, 1, 3, 3, 1, 3, 1])
In [101]: arr3.ravel()
Out[101]:
array([0, 1, 1, 2, 2, 3, 3, 4, 0, 1, 1, 2, 1, 2, 3, 4, 3, 4, 1, 2, 3, 4, 1, 2])
The corresponding ravel in Octave:
>> arr(:).'
ans =
0 0 3 1 1 1 2 1 3 3 3 1
>> arr3(:).'
ans =
0 0 3 1 1 1 2 1 3 3 3 1 1 1 4 2 2 2 3 2 4 4 4 2
MATLAB uses F (Fortran) order, with the first dimension changing fastest. Thus it is natural to display blocks arr(:,:,i). You can specify order='F' when creating and working with numpy arrays. But it can be tricky keeping the order straight, especially when working with 3d. loadmat/savemat try to do some of the reordering for us. For example a 2d MATLAB matrix loads as an order F array in numpy.
In [107]: np.array([0,0,3,1,1,1,2,1,3,3,3,1])
Out[107]: array([0, 0, 3, 1, 1, 1, 2, 1, 3, 3, 3, 1])
In [108]: np.array([0,0,3,1,1,1,2,1,3,3,3,1]).reshape(4,3)
Out[108]:
array([[0, 0, 3],
[1, 1, 1],
[2, 1, 3],
[3, 3, 1]])
In [109]: np.array([0,0,3,1,1,1,2,1,3,3,3,1]).reshape(4,3).T
Out[109]:
array([[0, 1, 2, 3],
[0, 1, 1, 3],
[3, 1, 3, 1]])
In [111]: np.array([0,0,3,1,1,1,2,1,3,3,3,1]).reshape((3,4),order='F')
Out[111]:
array([[0, 1, 2, 3],
[0, 1, 1, 3],
[3, 1, 3, 1]])
It might be easier to keep track of shapes with this array:
In [112]: arr3 = np.arange(2*3*4).reshape(2,3,4)
In [113]: arr3f = np.arange(2*3*4).reshape(2,3,4, order='F')
In [114]: arr3
Out[114]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [115]: arr3f
Out[115]:
array([[[ 0, 6, 12, 18],
[ 2, 8, 14, 20],
[ 4, 10, 16, 22]],
[[ 1, 7, 13, 19],
[ 3, 9, 15, 21],
[ 5, 11, 17, 23]]])
In [116]: arr3f.ravel()
Out[116]:
array([ 0, 6, 12, 18, 2, 8, 14, 20, 4, 10, 16, 22, 1, 7, 13, 19, 3,
9, 15, 21, 5, 11, 17, 23])
In [117]: arr3f.ravel(order='F')
Out[117]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23])
In [118]: savemat('test3',{'arr3':arr3, 'arr3f':arr3f})
In Octave:
>> arr3
arr3 =
ans(:,:,1) =
0 4 8
12 16 20
ans(:,:,2) =
1 5 9
13 17 21
....
>> arr3f
arr3f =
ans(:,:,1) =
0 2 4
1 3 5
ans(:,:,2) =
6 8 10
7 9 11
...
>> arr3.ravel()'
error: int32 matrix cannot be indexed with .
>> arr3(:)'
ans =
Columns 1 through 20:
0 12 4 16 8 20 1 13 5 17 9 21 2 14 6 18 10 22 3 15
Columns 21 through 24:
7 19 11 23
>> arr3f(:)'
ans =
Columns 1 through 20:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Columns 21 through 24:
20 21 22 23
arr3f still looks 'messed up' when printed by blocks, but when raveled we see that the values are in the same F order. That's also evident if we print a block along the last dimension of the numpy array:
In [119]: arr3f[:,:,0]
Out[119]:
array([[0, 2, 4],
[1, 3, 5]])
So to match up numpy and matlab we have to keep 2 things straight - the order, and the block display style.
My MATLAB is rusty, but I found permute, which is similar to np.transpose. Using that to reorder the dimensions:
>> permute(arr3,[3,2,1])
ans =
ans(:,:,1) =
0 4 8
1 5 9
2 6 10
3 7 11
ans(:,:,2) =
12 16 20
13 17 21
14 18 22
15 19 23
>> permute(arr3,[3,2,1])(:)'
ans =
Columns 1 through 20:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Columns 21 through 24:
20 21 22 23
The equivalent transpose in numpy
In [121]: arr3f.transpose(2,1,0).ravel()
Out[121]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23])
(Sorry for the rambling answer. I may go back and edit it. Hopefully it gives you something to work with.)
===
Let's try to apply that rambling more explicitly to your case
In [122]: x = np.array([[0, 1, 2, 3],
...: [0, 1, 1, 3],
...: [3, 1, 3, 1]])
...:
In [123]: x3 = np.dstack((x,x,x))
In [125]: dx3 = x3.reshape(36,1)
In [126]: np.savetxt('test3.txt',dx3, fmt='%i')
In [127]: cat test3.txt
0
0
0
....
3
3
1
1
1
In Octave
>> file = fopen('test3.txt')
file = 21
>> array = fscanf(file,'%f')
array =
0
0
....
>> reshape(array,3,4,3)
ans =
ans(:,:,1) =
0 1 2 3
0 1 2 3
0 1 2 3
ans(:,:,2) =
0 1 1 3
0 1 1 3
0 1 1 3
ans(:,:,3) =
3 1 3 1
3 1 3 1
3 1 3 1
and with the permutation
>> permute(reshape(array,3,4,3),[3,2,1])
ans =
ans(:,:,1) =
0 1 2 3
0 1 1 3
3 1 3 1
ans(:,:,2) =
0 1 2 3
0 1 1 3
3 1 3 1
ans(:,:,3) =
0 1 2 3
0 1 1 3
3 1 3 1
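
An alternative, sketched here in NumPy only, is to flatten the array in F order before saving, so that a plain reshape(array, 3, 4, 3) on the MATLAB side should be enough and no permute is needed. The check below emulates MATLAB's column-major reshape with order='F'. The file name test3_f.txt and the use of distinct layers (x, x+1, x+2 instead of three identical copies) are assumptions made for this example so that ordering mistakes would be visible.

```python
import os
import tempfile
import numpy as np

x = np.array([[0, 1, 2, 3],
              [0, 1, 1, 3],
              [3, 1, 3, 1]])
x3 = np.dstack((x, x + 1, x + 2))  # shape (3, 4, 3), layers differ on purpose

# Flatten column-major (first index fastest), the order MATLAB's reshape fills in.
flat_f = x3.ravel(order='F')
path = os.path.join(tempfile.mkdtemp(), 'test3_f.txt')  # hypothetical file name
np.savetxt(path, flat_f, fmt='%i')

# Emulate MATLAB's reshape(array, 3, 4, 3) to check the round trip.
loaded = np.loadtxt(path, dtype=int)
restored = loaded.reshape(x3.shape, order='F')

# The C-order file from the question needs the reversed shape plus a permute.
via_permute = x3.ravel().reshape(x3.shape[::-1], order='F').transpose(2, 1, 0)

print(np.array_equal(restored, x3), np.array_equal(via_permute, x3))
```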

Indexing second dimension of Tensor using indices

I select elements of my tensor using a tensor of indices. In the code below I use the list of indices 0, 3, 2, 1 to select 11, 15, 2, 5:
>>> import torch
>>> a = torch.Tensor([5,2,11, 15])
>>> torch.randperm(4)
0
3
2
1
[torch.LongTensor of size 4]
>>> i = torch.randperm(4)
>>> a[i]
11
15
2
5
[torch.FloatTensor of size 4]
Now, I have
>>> b = torch.Tensor([[5, 2, 11, 15],[5, 2, 11, 15], [5, 2, 11, 15]])
>>> b
5 2 11 15
5 2 11 15
5 2 11 15
[torch.FloatTensor of size 3x4]
Now, I want to use indices to select columns 0, 3, 2, 1. In other words, I want a tensor like this:
>>> b
11 15 2 5
11 15 2 5
11 15 2 5
[torch.FloatTensor of size 3x4]
If using pytorch version v0.1.12
For this version there isn't an easy way to do this. Even though pytorch promises tensor manipulation to be exactly like numpy's, there are some capabilities that are still lacking. This is one of them.
Typically you would be able to do this relatively easily if you were working with numpy arrays. Like so.
>>> i = [2, 1, 0, 3]
>>> a = np.array([[5, 2, 11, 15],[5, 2, 11, 15], [5, 2, 11, 15]])
>>> a[:, i]
array([[11, 2, 5, 15],
[11, 2, 5, 15],
[11, 2, 5, 15]])
But the same thing with Tensors will give you an error:
>>> i = torch.LongTensor([2, 1, 0, 3])
>>> a = torch.Tensor([[5, 2, 11, 15],[5, 2, 11, 15], [5, 2, 11, 15]])
>>> a[:,i]
The error:
TypeError: indexing a tensor with an object of type torch.LongTensor. The only supported types are integers, slices, numpy scalars and torch.LongTensor or torch.ByteTensor as the only argument.
What that TypeError is telling you is, if you plan to use a LongTensor or a ByteTensor for indexing, then the only valid syntax is a[<LongTensor>] or a[<ByteTensor>]. Anything other than that will not work.
Because of this limitation, you have two options:
Option 1: Convert to numpy, permute, then back to Tensor
>>> i = [2, 1, 0, 3]
>>> a = torch.Tensor([[5, 2, 11, 15],[5, 2, 11, 15], [5, 2, 11, 15]])
>>> np_a = a.numpy()
>>> np_a = np_a[:,i]
>>> a = torch.from_numpy(np_a)
>>> a
11 2 5 15
11 2 5 15
11 2 5 15
[torch.FloatTensor of size 3x4]
Option 2: Move the dim you want to permute to 0 and then do it
You move the dim that you want to permute (in your case dim=1) to position 0, perform the permutation, and move it back. It's a bit hacky, but it gets the job done.
def hacky_permute(a, i, dim):
    a = torch.transpose(a, 0, dim)
    a = a[i]
    a = torch.transpose(a, 0, dim)
    return a
And use it like so:
>>> i = torch.LongTensor([2, 1, 0, 3])
>>> a = torch.Tensor([[5, 2, 11, 15],[5, 2, 11, 15], [5, 2, 11, 15]])
>>> a = hacky_permute(a, i, dim=1)
>>> a
11 2 5 15
11 2 5 15
11 2 5 15
[torch.FloatTensor of size 3x4]
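
The move-to-front, index, move-back idea is not specific to torch. As a sanity check, here is the same pattern in NumPy, where direct a[:, i] indexing is available for comparison. The function name permute_along is a hypothetical analogue of hacky_permute, introduced only for this sketch.

```python
import numpy as np

def permute_along(a, i, dim):
    """Reorder `a` along axis `dim` by moving that axis to the front first.

    `permute_along` is a hypothetical NumPy analogue of hacky_permute.
    """
    a = np.swapaxes(a, 0, dim)
    a = a[i]  # index along the leading axis only
    return np.swapaxes(a, 0, dim)

i = [2, 1, 0, 3]
a = np.array([[5, 2, 11, 15], [5, 2, 11, 15], [5, 2, 11, 15]])
result = permute_along(a, i, dim=1)
print(result)
print(np.array_equal(result, a[:, i]))  # matches direct column indexing
```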
If using pytorch version v0.2.0
Direct indexing using a tensor now works in this version, i.e.
>>> i = torch.LongTensor([2, 1, 0, 3])
>>> a = torch.Tensor([[5, 2, 11, 15],[5, 2, 11, 15], [5, 2, 11, 15]])
>>> a[:,i]
11 2 5 15
11 2 5 15
11 2 5 15
[torch.FloatTensor of size 3x4]
