Replicating elements in numpy array

I have a numpy array say
a = array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
I have an array 'replication' of the same shape, where replication[i,j] (>= 0) denotes how many times a[i][j] should be repeated along the row. The replication array satisfies the invariant that np.sum(replication[i]) has the same value for all i.
For example, if
replication = array([[1, 2, 1],
[1, 1, 2],
[2, 1, 1]])
then the final array after replicating is:
new_a = array([[1, 2, 2, 3],
[4, 5, 6, 6],
[7, 7, 8, 9]])
Presently, I am doing this to create new_a:
##allocate new_a
h = a.shape[0]
w = a.shape[1]
for row in range(h):
    # build the repeated values of this row as nested lists, then flatten them
    ll = [[a[row][j]] * replication[row][j] for j in range(w)]
    new_a[row] = np.array([item for sublist in ll for item in sublist])
However, this seems too slow because it goes through Python lists. Can I do this entirely in numpy, without using Python lists?

You can flatten out your replication array, then use the .repeat() method of a:
import numpy as np
a = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
replication = np.array([[1, 2, 1],
[1, 1, 2],
[2, 1, 1]])
new_a = a.repeat(replication.ravel()).reshape(a.shape[0], -1)
print(repr(new_a))
# array([[1, 2, 2, 3],
# [4, 5, 6, 6],
# [7, 7, 8, 9]])
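The one-liner works because repeat() without an axis operates on the flattened (row-major) array, so each row's repeated values stay contiguous, and the reshape is valid only when every row of replication sums to the same width. A minimal sketch that wraps this in a helper and checks that invariant; the name replicate_along_rows is hypothetical and not part of NumPy:

```python
import numpy as np

def replicate_along_rows(a, replication):
    """Repeat a[i, j] replication[i, j] times along row i (hypothetical helper)."""
    a = np.asarray(a)
    replication = np.asarray(replication)
    row_widths = replication.sum(axis=1)
    # The reshape below is only valid if every output row has the same width.
    if not np.all(row_widths == row_widths[0]):
        raise ValueError("each row of replication must sum to the same value")
    # repeat() without an axis works on the flattened (row-major) array.
    flat = a.repeat(replication.ravel())
    return flat.reshape(a.shape[0], int(row_widths[0]))

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
replication = np.array([[1, 2, 1], [1, 1, 2], [2, 1, 1]])
new_a = replicate_along_rows(a, replication)
```

Here new_a should match the array printed in the answer above.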

Related

Split a NumPy array into subarrays according to the values (not sorted, but grouped) of another array

Suppose I have two NumPy arrays
x = [[1, 2, 8],
[2, 9, 1],
[3, 8, 9],
[4, 3, 5],
[5, 2, 3],
[6, 4, 7],
[7, 2, 3],
[8, 2, 2],
[9, 5, 3],
[10, 2, 3],
[11, 2, 4]]
y = [0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0]
Note:
(The values in x are not sorted in any way; I chose this example only to make the illustration clearer.)
(These are just examples of x and y. x can hold arbitrary values and y can contain arbitrarily many distinct labels, but x always has as many rows as y has values.)
I want to efficiently split the array x into sub-arrays according to the values in y.
My desired outputs would be
z_0 = [[1, 2, 8],
[2, 9, 1],
[4, 3, 5],
[10, 2, 3],
[11, 2, 4]]
z_1 = [[3, 8, 9],
[5, 2, 3],
[6, 4, 7]]
z_2 = [[7, 2, 3],
[8, 2, 2],
[9, 5, 3]]
Assuming that y starts with zero and is not sorted but grouped, what is the most efficient way to do this?
Note: This question is the unsorted version of this question:
Split a NumPy array into subarrays according to the values (sorted in ascending order) of another array

One way to solve this (assuming x is a NumPy array rather than a plain list) is to build a list of indexes for each y value and then select those rows of x. For example:
z_0 = x[[i for i, v in enumerate(y) if v == 0]]
z_1 = x[[i for i, v in enumerate(y) if v == 1]]
z_2 = x[[i for i, v in enumerate(y) if v == 2]]
Output
array([[ 1, 2, 8],
[ 2, 9, 1],
[ 4, 3, 5],
[10, 2, 3],
[11, 2, 4]])
array([[3, 8, 9],
[5, 2, 3],
[6, 4, 7]])
array([[7, 2, 3],
[8, 2, 2],
[9, 5, 3]])
If you want to be more generic and support different sets of numbers in y, you could use a comprehension to produce a list of arrays e.g.
z = [x[[i for i, v in enumerate(y) if v == m]] for m in set(y)]
Output:
[array([[ 1, 2, 8],
[ 2, 9, 1],
[ 4, 3, 5],
[10, 2, 3],
[11, 2, 4]]),
array([[3, 8, 9],
[5, 2, 3],
[6, 4, 7]]),
array([[7, 2, 3],
[8, 2, 2],
[9, 5, 3]])]
If y is also an np.array and the same length as x you can simplify this to use boolean indexing:
z = [x[y==m] for m in set(y)]
Output is the same as above.

Just use list comprehension and boolean indexing
x = np.array(x)
y = np.array(y)
z = [x[y == i] for i in range(y.max() + 1)]
z
Out[]:
[array([[ 1, 2, 8],
[ 2, 9, 1],
[ 4, 3, 5],
[10, 2, 3],
[11, 2, 4]]),
array([[3, 8, 9],
[5, 2, 3],
[6, 4, 7]]),
array([[7, 2, 3],
[8, 2, 2],
[9, 5, 3]])]
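The comprehension above builds one boolean mask per label, so it scans y once for every distinct value. When there are many labels, a single stable sort followed by np.split may be worth trying instead; this is a sketch of that alternative, not a benchmarked claim, and the helper name split_by_label is hypothetical:

```python
import numpy as np

def split_by_label(x, y):
    """Group rows of x by the labels in y, keeping original row order (hypothetical helper)."""
    x = np.asarray(x)
    y = np.asarray(y)
    # A stable sort keeps rows with the same label in their original order.
    order = np.argsort(y, kind="stable")
    # first[k] is the position in the sorted labels where the k-th label starts.
    labels, first = np.unique(y[order], return_index=True)
    groups = np.split(x[order], first[1:])
    return labels, groups

x = [[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5], [5, 2, 3], [6, 4, 7],
     [7, 2, 3], [8, 2, 2], [9, 5, 3], [10, 2, 3], [11, 2, 4]]
y = [0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0]
labels, groups = split_by_label(x, y)
```

Unlike range(y.max() + 1), this does not assume the labels are consecutive integers starting at zero; groups[k] holds the rows for labels[k].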

Slight variation.
from operator import itemgetter
label = itemgetter(1)
Associate the implied information with the label ... (index,label)
y1 = [thing for thing in enumerate(y)]
Sort on the label
y1.sort(key=label)
Group by label and construct the results
import itertools
d = {}
for key,group in itertools.groupby(y1,label):
    d[f'z{key}'] = [x[i] for i,k in group]
Pandas solution:
>>> import pandas as pd
>>> df = pd.DataFrame({'points':[thing for thing in x],'cat':y})
>>> z = df.groupby('cat').agg(list)
>>> z
points
cat
0 [[1, 2, 8], [2, 9, 1], [4, 3, 5], [10, 2, 3], ...
1 [[3, 8, 9], [5, 2, 3], [6, 4, 7]]
2 [[7, 2, 3], [8, 2, 2], [9, 5, 3]]

Numpy array without for loop

array = np.empty((8, 4))
for I in range(8):
    array[I] = I
Can this be implemented without a for loop? I would like to know other approaches. The expected result is:
[0,0,0,0]
[1,1,1,1]
.
.
.
[7,7,7,7]

One easy way is to just use np.repeat:
array = np.repeat(np.arange(8), 4).reshape(8, 4)
array([[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4],
[5, 5, 5, 5],
[6, 6, 6, 6],
[7, 7, 7, 7]])
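Broadcasting is another option worth considering here: a column vector of row indices can be stretched across the columns without an explicit loop. A minimal sketch, assuming the 8 x 4 shape from the question (the variable names are illustrative):

```python
import numpy as np

rows, cols = 8, 4
# Column vector of row indices, shape (8, 1).
col = np.arange(rows)[:, None]
# broadcast_to returns a read-only view; copy() makes a normal writable array.
array = np.broadcast_to(col, (rows, cols)).copy()
# np.tile builds the same result by repeating the column 4 times side by side.
tiled = np.tile(col, (1, cols))
```

Both array and tiled should equal the np.repeat result shown above.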

Filling an array with arrays or vectors in python using numpy without a loop

I'm trying to find a way to fill an array with rows of values. It's much easier to express my desired output with an example. Given the input of an N x M matrix, array1,
array1 = np.array([[2, 3, 4],
[4, 8, 3],
[7, 6, 3]])
I would like to output a 3D array in which each entry is a 2D block made by repeating the respective row of array1 N times. The output would be
[[[2, 3, 4],
[2, 3, 4],
[2, 3, 4]],
[[4, 8, 3],
[4, 8, 3],
[4, 8, 3]],
[[7, 6, 3],
[7, 6, 3],
[7, 6, 3]]]

You can reshape the array from 2d to 3d, then use numpy.repeat() along the desired axis:
np.repeat(array1[:, None, :], 3, axis=1)
#array([[[2, 3, 4],
# [2, 3, 4],
# [2, 3, 4]],
# [[4, 8, 3],
# [4, 8, 3],
# [4, 8, 3]],
# [[7, 6, 3],
# [7, 6, 3],
# [7, 6, 3]]])
Or equivalently you can use numpy.tile:
np.tile(array1[:, None, :], (1,3,1))

Another solution which is sometimes useful is the following
out = np.empty((3,3,3), dtype=array1.dtype)
out[...] = array1[:, None, :]
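The assignment above relies on broadcasting: array1[:, None, :] has shape (3, 1, 3) and is stretched along the middle axis. The same idea can be written without hard-coding the sizes by using np.broadcast_to; this is a sketch under the assumption that each row should be repeated as many times as array1 has rows:

```python
import numpy as np

array1 = np.array([[2, 3, 4],
                   [4, 8, 3],
                   [7, 6, 3]])
n_rows, n_cols = array1.shape
# Read-only view of shape (n_rows, n_rows, n_cols); no data is copied yet.
view = np.broadcast_to(array1[:, None, :], (n_rows, n_rows, n_cols))
# copy() materializes the view into an independent, writable array.
out = view.copy()
```

If the result is only read and never modified, the view alone may be enough and avoids allocating the full 3D array.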

Identify vectors with same value in one column with numpy in python

I have a large 2d array of vectors. I want to split this array into several arrays according to one of the vectors' elements or dimensions. I would like to receive one such small array if the values along this column are consecutively identical. For example considering the third dimension or column:
orig = np.array([[1, 2, 3],
[3, 4, 3],
[5, 6, 4],
[7, 8, 4],
[9, 0, 4],
[8, 7, 3],
[6, 5, 3]])
I want to turn into three arrays consisting of rows 1,2 and 3,4,5 and 6,7:
>>> a
array([[1, 2, 3],
[3, 4, 3]])
>>> b
array([[5, 6, 4],
[7, 8, 4],
[9, 0, 4]])
>>> c
array([[8, 7, 3],
[6, 5, 3]])
I'm new to python and numpy. Any help would be greatly appreciated.
Regards
Mat
Edit: I reformatted the arrays to clarify the problem

Using np.split:
>>> a, b, c = np.split(orig, np.where(orig[:-1, 2] != orig[1:, 2])[0]+1)
>>> a
array([[1, 2, 3],
[3, 4, 3]])
>>> b
array([[5, 6, 4],
[7, 8, 4],
[9, 0, 4]])
>>> c
array([[8, 7, 3],
[6, 5, 3]])
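To see why the np.split call works, it can help to look at the intermediate values for the orig array from the question. This sketch spells out the same steps with named variables (the names are illustrative) and uses np.flatnonzero in place of np.where(...)[0]:

```python
import numpy as np

orig = np.array([[1, 2, 3],
                 [3, 4, 3],
                 [5, 6, 4],
                 [7, 8, 4],
                 [9, 0, 4],
                 [8, 7, 3],
                 [6, 5, 3]])
col = orig[:, 2]
# True wherever a row's value differs from the next row's value.
changed = col[:-1] != col[1:]
# Index of the first row of each new group (hence the +1).
boundaries = np.flatnonzero(changed) + 1
parts = np.split(orig, boundaries)
```

For this input, boundaries is [2, 5], so np.split cuts before rows 2 and 5 and returns three sub-arrays.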

Nothing fancy here, but this good old-fashioned loop should do the trick
import numpy as np
a = np.array([[1, 2, 3],
[1, 2, 3],
[1, 2, 4],
[1, 2, 4],
[1, 2, 4],
[1, 2, 3],
[1, 2, 3]])
groups = []
rows = a[:1]  # keep the first row 2D so every group ends up as a 2D array
prev = a[0][-1]  # here I assume that the grouping is based on the last column; change the index accordingly if that is not the case.
for row in a[1:]:
    if row[-1] == prev:
        rows = np.vstack((rows, row))
    else:
        groups.append(rows)
        rows = row.reshape(1, -1)  # start a new group with this row
        prev = row[-1]
groups.append(rows)
print(groups)
## [array([[1, 2, 3],
## [1, 2, 3]]),
## array([[1, 2, 4],
## [1, 2, 4],
## [1, 2, 4]]),
## array([[1, 2, 3],
## [1, 2, 3]])]

if a looks like this:
array([[1, 1, 2, 3],
[2, 1, 2, 3],
[3, 1, 2, 4],
[4, 1, 2, 4],
[5, 1, 2, 4],
[6, 1, 2, 3],
[7, 1, 2, 3]])
then this
col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
indices = np.concatenate(([0], indices, [len(a)]))
res = [a[start:end] for start, end in zip(indices[:-1], indices[1:])]
print(res)
results in:
[array([[1, 1, 2, 3],
[2, 1, 2, 3]]), array([[3, 1, 2, 4],
[4, 1, 2, 4],
[5, 1, 2, 4]]), array([[6, 1, 2, 3],
[7, 1, 2, 3]])]
Update: np.split() is much nicer. No need to add first and last index:
col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
res = np.split(a, indices)

Insert 1D NumPy array as column in existing 2D array

I have a 2D NumPy array:
>>> import numpy as np
>>> a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> a
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
and a 1D array:
>>> b = np.arange(3)
>>> b
array([0, 1, 2])
Is there an elegant way to insert b into a as a new first column?
So that:
>>> a
array([[0, 1, 2, 3],
[1, 4, 5, 6],
[2, 7, 8, 9]])

You could use column_stack()
In [256]: np.column_stack((b, a))
Out[256]:
array([[0, 1, 2, 3],
[1, 4, 5, 6],
[2, 7, 8, 9]])
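column_stack() treats the 1D array b as a column automatically. If you prefer to make that step explicit, b can be reshaped to a column first and then joined with a; a short sketch of two equivalent spellings:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.arange(3)
# b[:, None] has shape (3, 1), so it can be stacked next to a horizontally.
stacked = np.hstack((b[:, None], a))
# The same operation written with concatenate along the column axis.
joined = np.concatenate((b.reshape(-1, 1), a), axis=1)
```

Note that none of these modify a in place; each returns a new array, so assign the result back to a if that is what you want.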
