Sort array with repeated values

I have to sort an array whose values, from 0 to 9, are repeated, and also obtain the original index of each element in the sorted result. The input array is:
[3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]
I would like to obtain the following order:
array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9], dtype=uint8)
Instead of:
array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 9, 9])
which is given by:
import numpy as np
a = [3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]
np.argsort(a)
Is there a way to manipulate this function?

l = [1,2,3,4,5,6,7,8,9,4,3,5]
l_ordered = []
while len(l) != 0:
    # Take one copy of every value still present, in ascending order
    unique_nums = list(set(l))
    unique_nums.sort()
    l_ordered.extend(unique_nums)
    # Remove the copies that were just emitted
    for num in unique_nums:
        l.remove(num)
print(l_ordered)
This results in:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 5]
You can apply the same idea with NumPy, or convert the final result into a NumPy array.

For an array with a small number of elements, @Mr.O's answer is faster. The code below is faster if there are more than around 100 ints in arr.
import numpy as np
def sort_groups( arr ):
    ct = np.ones( len(arr), dtype = np.int64 )
    for i in set( arr ):
        ct[arr == i] = ct[arr == i ].cumsum()
    # ct holds the occurrence rank of each int in arr (1 for the first copy, 2 for the second, ...)
    tosort = ( arr.max() + 1 ) * ct + arr
    # tosort orders by ct first, then by arr where the ct values are equal
    return arr[ np.argsort( tosort ) ]
a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
sort_groups( a )
# array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])
Breaking the function out to see what's happening:
arr = a
ct = np.ones( len(arr), dtype = np.int64 )
for i in set( arr ):
    ct[arr == i] = ct[arr == i ].cumsum()
arr, ct
# (array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]),
# array([1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2]))
tosort = ( arr.max() + 1 ) * ct + arr # Assumes all values in arr are >= 0
tosort
# array([13, 11, 12, 10, 14, 15, 16, 17, 21, 20, 19, 25, 23, 29, 22, 27, 26, 24])
arr[ np.argsort( tosort ) ]
# array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])
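The question also asks for the original indices, which the function above does not return. A minimal sketch of one way to get them, assuming a 1-D integer array: compute each element's occurrence rank from a stable sort, then order by rank first and value second with np.lexsort. The name round_robin_argsort is made up for this illustration and is not part of NumPy.

```python
import numpy as np

def round_robin_argsort(arr):
    """Return indices that order arr as repeated ascending passes over its values."""
    arr = np.asarray(arr)
    n = arr.size
    if n == 0:
        return np.empty(0, dtype=np.intp)
    # A stable sort groups equal values while keeping their input order.
    order = np.argsort(arr, kind="stable")
    sorted_arr = arr[order]
    # Mark where each run of equal values starts in the sorted array.
    is_start = np.r_[True, sorted_arr[1:] != sorted_arr[:-1]]
    run_start = np.maximum.accumulate(np.where(is_start, np.arange(n), 0))
    # rank is 0 for the first occurrence of a value, 1 for the second, and so on.
    rank = np.empty(n, dtype=np.intp)
    rank[order] = np.arange(n) - run_start
    # lexsort treats the LAST key as the primary key: rank first, then value.
    return np.lexsort((arr, rank))

a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
idx = round_robin_argsort(a)   # positions in the original array
result = a[idx]                # values in the requested order
print(idx)
print(result)
```

Unlike the multiplication trick, this does not require the values to be non-negative, because the rank and the value are kept as separate sort keys.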

Very interesting task! Here is my attempt at solving the problem:
import numpy as np
def groupsort(a: np.ndarray):
    uniques, counts = np.unique(a, return_counts=True)
    min_count = np.min(counts)
    counts -= min_count
    n_easy = min_count * len(uniques)
    # Pre allocate array
    values = np.empty(n_easy + counts.sum(), dtype=a.dtype)
    # Set easy values
    temp = values[:n_easy].reshape(min_count, len(uniques))
    temp[:] = uniques
    # Set hard values
    i = n_easy
    while np.any(mask := counts > 0): # Python 3.8 syntax
        masksum = mask.sum()
        values[i : i + masksum] = uniques[mask]
        counts -= mask
        i += masksum
    return values
a = np.array(list(range(4)) * 2 + [0, 1, 2, 0, 1, 2, 0, 1, 1, 1])
np.random.shuffle(a)
print(a)
# [3 0 1 0 1 0 2 1 2 1 0 2 1 1 2 0 3 1]
print(groupsort(a))
# [0 1 2 3 0 1 2 3 0 1 2 0 1 2 0 1 1 1]
# Your input
a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
print(groupsort(a))
# [0 1 2 3 4 5 6 7 9 0 1 2 3 4 5 6 7 9]
The idea is to split the problem into two cases: an easy case and a hard case. The easy case handles inputs like a = [0,1,2,3,0,1,2,3], where the counts for each unique value are equal. Then you can simply count the number n of occurrences of a specific value (e.g. 0) and do list(range(max(a) + 1)) * n.
The hard case handles inputs such as a = [1,1,1,1,1,0,0,0,2,2]. Then the idea is to get the counts of each value, in this case counts = [3,5,2]. Then do:
values = np.empty(counts.sum())
i = 0
while np.any(mask := counts > 0): # Python 3.8 syntax
    masksum = mask.sum()
    values[i : i + masksum] = uniques[mask]
    counts -= mask
    i += masksum
In my solution I have combined the two cases to optimize for speed. Assuming np.unique has a linear average time complexity, this algorithm also has a linear average runtime complexity.

Here's a 2-liner:
unique, counts = np.unique(a, return_counts=True)
b = [x for y in [[u for i, u in enumerate(unique) if counts[i] > n] for n in range(counts.max())] for x in y]
Output:
>>> b
[0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9, 1, 5, 9]
#^ reset ^ reset ^ reset

I prefer np.bincount over np.unique, np.sort or np.argsort because it is much faster when the maximum value in the data is small.
import numpy as np
def count_out(arr, N):
    bins = np.bincount(arr, minlength=N)
    threshold_idx = np.unique(bins[bins!=0])
    counts = np.diff(threshold_idx, prepend=0)
    mask = (bins >= threshold_idx[:, None])
    full_mask = np.repeat(mask, counts, axis=0)
    blocks = np.repeat([np.arange(N)], np.sum(counts), axis=0)
    return blocks[full_mask]
N = 10
X = np.array([3, 5, 1, 3, 9, 7, 5, 9, 0, 9, 9, 0, 0, 6, 8, 8, 8, 9, 7, 0, 5, 9, 7, 8, 3, 1, 5, 8, 8, 1, 0, 7, 1, 9, 9, 8])
print(X)
print(count_out(X, N))
# [3 5 1 3 9 7 5 9 0 9 9 0 0 6 8 8 8 9 7 0 5 9 7 8 3 1 5 8 8 1 0 7 1 9 9 8]
# [0 1 3 5 6 7 8 9 0 1 3 5 7 8 9 0 1 3 5 7 8 9 0 1 5 7 8 9 0 8 9 8 9 8 9 9]
The key idea is to count how many times each block repeats, then create a unique mask for each block:
Blocks:
[[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]]
Unique masks:
[[1 1 0 1 0 1 1 1 1 1]
[1 1 0 1 0 1 0 1 1 1]
[1 1 0 0 0 1 0 1 1 1]
[1 0 0 0 0 0 0 0 1 1]
[0 0 0 0 0 0 0 0 1 1]
[0 0 0 0 0 0 0 0 0 1]]
Finally, reconstruct all the masks by the counts we've got.
Counts: [1 2 1 1 2 1]
Full masks:
[[1 1 0 1 0 1 1 1 1 1]
[1 1 0 1 0 1 0 1 1 1]
[1 1 0 1 0 1 0 1 1 1]
[1 1 0 0 0 1 0 1 1 1]
[1 0 0 0 0 0 0 0 1 1]
[0 0 0 0 0 0 0 0 1 1]
[0 0 0 0 0 0 0 0 1 1]
[0 0 0 0 0 0 0 0 0 1]]
By the way, it seems this can be optimised further. First, creating the repeated blocks is redundant, since there should be a way to refer to one single block. Second, it is slow when the full mask is sparse. In that case you should consider implementing your own way to repeat blocks without masking.
I hope this is helpful.
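On the first optimisation point, one possible sketch (an assumption about what could work, not the answerer's own code): because every block row is just np.arange(N), the column index of each True entry in the full mask is already the value to emit, so the repeated blocks array can be skipped. The function name count_out_no_blocks is made up for this illustration.

```python
import numpy as np

def count_out_no_blocks(arr, N):
    # Same counting steps as count_out above.
    bins = np.bincount(arr, minlength=N)
    thresholds = np.unique(bins[bins != 0])
    counts = np.diff(thresholds, prepend=0)
    mask = bins >= thresholds[:, None]
    full_mask = np.repeat(mask, counts, axis=0)
    # np.nonzero walks the mask in row-major order, so the column indices
    # come out pass by pass, and each column index equals the value itself.
    return np.nonzero(full_mask)[1]

X = np.array([3, 5, 1, 3, 9, 7, 5, 9, 0, 9, 9, 0, 0, 6, 8, 8, 8, 9,
              7, 0, 5, 9, 7, 8, 3, 1, 5, 8, 8, 1, 0, 7, 1, 9, 9, 8])
out = count_out_no_blocks(X, 10)
print(out)
```

This still builds the full mask, so it does not address the second point about sparse masks.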

IIUC, you want to have a duplicated, sorted, array.
Remove the duplicated values using numpy.unique, sort, and tile to the expected size:
a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
b = np.unique(a)
b = np.tile(b, len(a)//len(b))
output:
array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])
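Note that tiling the unique values only reproduces the requested order when every value occurs the same number of times, as in the question's input. A small sketch that makes this assumption explicit (the helper name tile_if_balanced is hypothetical, and it expects a non-empty array):

```python
import numpy as np

def tile_if_balanced(a):
    # Assumes a is non-empty; counts[0] would fail on an empty array.
    values, counts = np.unique(a, return_counts=True)
    if not np.all(counts == counts[0]):
        raise ValueError("values do not all occur equally often; tiling would be wrong")
    return np.tile(values, counts[0])

a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
b = tile_if_balanced(a)
print(b)
```

For unbalanced inputs, one of the other answers is needed.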

Related

Find size of numpy array within a Panda dataframe in python

I would like to get the size of each numpy array within a pandas DataFrame. How do I do this?
I have:
              x          y                      z
0  [1, 2, 3, 4]  [8, 9, 7]              [8, 9, 7]
1  [2, 3, 4, 8]  [9, 8, 1]  [9, 8, 1, 6, 7, 8, 9]
2     [5, 6, 7]  [3, 4, 1]              [3, 4, 1]
cars = pd.DataFrame({'x': [[1,2,3,4],[2,3,4,8],[5,6,7]],
                     'y': [[8,9,7],[9,8,1],[3,4,1]],
                     'z': [[8,9,7],[9,8,1,6,7,8,9],[3,4,1]]})
I want:
   x  y  z
0  4  3  3
1  4  3  7
2  3  3  3
I know how to get the shape and size of the entire DataFrame, but not how to combine them with size of each block.
print(cars)
print(cars.size)
print(cars.shape)

Use Series.str.len in DataFrame.apply to process all columns:
df = cars.apply(lambda x: x.str.len())
print (df)
   x  y  z
0  4  3  3
1  4  3  7
2  3  3  3
If there are no missing values, use DataFrame.applymap to apply len element-wise (in pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map):
df = cars.applymap(len)

Getting the index of the minimum value in each slice of `ndarray`

I am trying to do something that should be straightforward and can be accomplished in a for-loop but I am trying to avoid that.
I would like to get the index of the minimum value in each slice along a certain axis of a numpy.ndarray, a. I am more interested in the index than the value itself. I use the index to get a value from another 2D array with shape equal to the first 2 dimensions of a.
Here is a naive implementation using a for-loop:
a = np.random.randint(0, 10, 60).reshape(3, 4, 5)
print(a)
for i in range(a.shape[-1]):
    idx = a[..., i].argmin()
    print('Slice:', i, '| Index:', idx, '| min value:',
          a[..., i].flat[idx])
Out:
[[[1 9 4 0 7]
[6 3 1 6 8]
[7 8 2 0 2]
[8 6 1 6 5]]
[[8 7 0 6 9]
[7 2 6 4 5]
[3 4 9 2 9]
[1 4 8 0 7]]
[[1 4 6 6 2]
[9 9 5 6 7]
[6 2 8 9 9]
[3 9 8 5 4]]]
Slice: 0 | Index: 0 | min value: 1
Slice: 1 | Index: 5 | min value: 2
Slice: 2 | Index: 4 | min value: 0
Slice: 3 | Index: 0 | min value: 0
Slice: 4 | Index: 2 | min value: 2
I realise I can pass an axis keyword argument to argmin but that does not produce the result I am looking for.

For the specific case given in your question, you can reshape your array, then use argmin:
>>> import numpy as np
>>> a = np.array([[[1, 9, 4, 0, 7],
... [6, 3, 1, 6, 8],
... [7, 8, 2, 0, 2],
... [8, 6, 1, 6, 5]],
...
... [[8, 7, 0, 6, 9],
... [7, 2, 6, 4, 5],
... [3, 4, 9, 2, 9],
... [1, 4, 8, 0, 7]],
...
... [[1, 4, 6, 6, 2],
... [9, 9, 5, 6, 7],
... [6, 2, 8, 9, 9],
... [3, 9, 8, 5, 4]]])
>>> a.reshape(-1, a.shape[2]).min(axis=0)
array([1, 2, 0, 0, 2])
>>> a.reshape(-1, a.shape[2]).argmin(axis=0)
array([0, 5, 4, 0, 2])
>>>
shape[2] is the size of the axis you do not want to reduce over (here the last, innermost axis): the minimum is calculated across the first two dimensions, which the reshape flattens into one. The returned values are therefore flat indices into those first two dimensions.
You also need the slice number, which is just the index along the last axis. That is easy, since it is sequential:
slices = np.arange(a.shape[2])
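Since the question wants to use the index to look up values in another 2D array with the shape of the first two dimensions, the flat indices can be converted back to (row, column) pairs with np.unravel_index. A short sketch, where `other` is a made-up stand-in for that second array:

```python
import numpy as np

a = np.array([[[1, 9, 4, 0, 7],
               [6, 3, 1, 6, 8],
               [7, 8, 2, 0, 2],
               [8, 6, 1, 6, 5]],
              [[8, 7, 0, 6, 9],
               [7, 2, 6, 4, 5],
               [3, 4, 9, 2, 9],
               [1, 4, 8, 0, 7]],
              [[1, 4, 6, 6, 2],
               [9, 9, 5, 6, 7],
               [6, 2, 8, 9, 9],
               [3, 9, 8, 5, 4]]])

# Flat index of the minimum within each slice a[..., i].
flat_idx = a.reshape(-1, a.shape[2]).argmin(axis=0)

# Convert flat indices into (row, column) indices of the first two axes.
rows, cols = np.unravel_index(flat_idx, a.shape[:2])

# Hypothetical second array with shape equal to a.shape[:2].
other = np.arange(12).reshape(a.shape[:2])
picked = other[rows, cols]

# Cross-check: indexing a with these positions recovers the minima.
slices = np.arange(a.shape[2])
mins = a[rows, cols, slices]
print(rows, cols, picked, mins)
```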

Writing a 3d numpy array that is readable in matlab

I'm trying to save a 3D numpy array to my disk so that I can later read it in matlab. I've had some difficulty using numpy.savetxt() on a 3D array, so my solution has been to first convert it to a 1D array using the following code:
import numpy
array = numpy.array([[0, 1, 2, 3],
[0, 1, 1, 3],
[3, 1, 3, 1]])
ndarray = numpy.dstack((array, array, array))
darray = ndarray.reshape(36,1)
numpy.savetxt('test.txt', darray, fmt = '%i')
Then in matlab it can be read with the following code:
file = fopen('test.txt')
array = fscanf(file, '%f')
My issue now is converting it back to the original shape. Using reshape(array, 3,4,3) yields the following:
ans(:,:,1) =
0 1 2 3
0 1 2 3
0 1 2 3
ans(:,:,2) =
0 1 1 3
0 1 1 3
0 1 1 3
ans(:,:,3) =
3 1 3 1
3 1 3 1
3 1 3 1
I've tried to transpose the 1D matlab array, then use reshape() but get the same array.
What matlab function can I apply to achieve my original python array?

You want to permute the dimensions. In numpy this is transpose. There are two complications - the 'F' order of MATLAB matrices, and the display pattern, using blocks on the last dimension (which is the outer one with F order). Jump to the end of this answer for details.
===
In [72]: arr = np.array([[0, 1, 2, 3],
...: [0, 1, 1, 3],
...: [3, 1, 3, 1]])
...:
In [80]: arr3 = np.dstack((arr,arr+1)); arr3
Out[80]:
array([[[0, 1],
[1, 2],
[2, 3],
[3, 4]],
[[0, 1],
[1, 2],
[1, 2],
[3, 4]],
[[3, 4],
[1, 2],
[3, 4],
[1, 2]]])
In [81]: np.dstack((arr,arr+1)).shape
Out[81]: (3, 4, 2)
In [75]: from scipy.io import loadmat, savemat
In [76]: pwd
Out[76]: '/home/paul/mypy'
In [83]: savemat('test3',{'arr':arr, 'arr3':arr3})
In Octave
>> load 'test3.mat'
>> arr
arr =
0 1 2 3
0 1 1 3
3 1 3 1
>> arr3
arr3 =
ans(:,:,1) =
0 1 2 3
0 1 1 3
3 1 3 1
ans(:,:,2) =
1 2 3 4
1 2 2 4
4 2 4 2
>> size(arr3)
ans =
3 4 2
back in numpy I can display the array as 2 3x4 blocks with:
In [95]: arr3[:,:,0]
Out[95]:
array([[0, 1, 2, 3],
[0, 1, 1, 3],
[3, 1, 3, 1]])
In [96]: arr3[:,:,1]
Out[96]:
array([[1, 2, 3, 4],
[1, 2, 2, 4],
[4, 2, 4, 2]])
These arrays, ravelled to 1d (showing in effect the layout of values in the underlying databuffer):
In [100]: arr.ravel()
Out[100]: array([0, 1, 2, 3, 0, 1, 1, 3, 3, 1, 3, 1])
In [101]: arr3.ravel()
Out[101]:
array([0, 1, 1, 2, 2, 3, 3, 4, 0, 1, 1, 2, 1, 2, 3, 4, 3, 4, 1, 2, 3, 4, 1, 2])
The corresponding ravel in Octave:
>> arr(:).'
ans =
0 0 3 1 1 1 2 1 3 3 3 1
>> arr3(:).'
ans =
0 0 3 1 1 1 2 1 3 3 3 1 1 1 4 2 2 2 3 2 4 4 4 2
MATLAB uses F (Fortran) order, with the first dimension changing fastest. Thus it is natural to display blocks arr(:,:,i). You can specify order='F' when creating and working with numpy arrays, but it can be tricky keeping the order straight, especially when working with 3d. loadmat/savemat try to do some of the reordering for us. For example, a 2d MATLAB matrix loads as an order F array in numpy.
In [107]: np.array([0,0,3,1,1,1,2,1,3,3,3,1])
Out[107]: array([0, 0, 3, 1, 1, 1, 2, 1, 3, 3, 3, 1])
In [108]: np.array([0,0,3,1,1,1,2,1,3,3,3,1]).reshape(4,3)
Out[108]:
array([[0, 0, 3],
[1, 1, 1],
[2, 1, 3],
[3, 3, 1]])
In [109]: np.array([0,0,3,1,1,1,2,1,3,3,3,1]).reshape(4,3).T
Out[109]:
array([[0, 1, 2, 3],
[0, 1, 1, 3],
[3, 1, 3, 1]])
In [111]: np.array([0,0,3,1,1,1,2,1,3,3,3,1]).reshape((3,4),order='F')
Out[111]:
array([[0, 1, 2, 3],
[0, 1, 1, 3],
[3, 1, 3, 1]])
It might be easier to keep track of shapes with this array:
In [112]: arr3 = np.arange(2*3*4).reshape(2,3,4)
In [113]: arr3f = np.arange(2*3*4).reshape(2,3,4, order='F')
In [114]: arr3
Out[114]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [115]: arr3f
Out[115]:
array([[[ 0, 6, 12, 18],
[ 2, 8, 14, 20],
[ 4, 10, 16, 22]],
[[ 1, 7, 13, 19],
[ 3, 9, 15, 21],
[ 5, 11, 17, 23]]])
In [116]: arr3f.ravel()
Out[116]:
array([ 0, 6, 12, 18, 2, 8, 14, 20, 4, 10, 16, 22, 1, 7, 13, 19, 3,
9, 15, 21, 5, 11, 17, 23])
In [117]: arr3f.ravel(order='F')
Out[117]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23])
In [118]: savemat('test3',{'arr3':arr3, 'arr3f':arr3f})
In Octave:
>> arr3
arr3 =
ans(:,:,1) =
0 4 8
12 16 20
ans(:,:,2) =
1 5 9
13 17 21
....
>> arr3f
arr3f =
ans(:,:,1) =
0 2 4
1 3 5
ans(:,:,2) =
6 8 10
7 9 11
...
>> arr3.ravel()'
error: int32 matrix cannot be indexed with .
>> arr3(:)'
ans =
Columns 1 through 20:
0 12 4 16 8 20 1 13 5 17 9 21 2 14 6 18 10 22 3 15
Columns 21 through 24:
7 19 11 23
>> arr3f(:)'
ans =
Columns 1 through 20:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Columns 21 through 24:
20 21 22 23
arr3f still looks 'messed up' when printed by blocks, but when raveled we see that the values are in the same F order. That's also evident if we print the first 'block' along the last axis of the numpy array:
In [119]: arr3f[:,:,0]
Out[119]:
array([[0, 2, 4],
[1, 3, 5]])
So to match up numpy and MATLAB we have to keep 2 things straight: the order, and the block display style.
My MATLAB is rusty, but I found permute, which is similar to np.transpose. Using that to reorder the dimensions:
>> permute(arr3,[3,2,1])
ans =
ans(:,:,1) =
0 4 8
1 5 9
2 6 10
3 7 11
ans(:,:,2) =
12 16 20
13 17 21
14 18 22
15 19 23
>> permute(arr3,[3,2,1])(:)'
ans =
Columns 1 through 20:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Columns 21 through 24:
20 21 22 23
The equivalent transpose in numpy
In [121]: arr3f.transpose(2,1,0).ravel()
Out[121]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23])
(Sorry for the rambling answer. I may go back and edit it. Hopefully it gives you something to work with.)
===
Let's try to apply that rambling more explicitly to your case
In [122]: x = np.array([[0, 1, 2, 3],
...: [0, 1, 1, 3],
...: [3, 1, 3, 1]])
...:
In [123]: x3 = np.dstack((x,x,x))
In [125]: dx3 = x3.reshape(36,1)
In [126]: np.savetxt('test3.txt',dx3, fmt='%i')
In [127]: cat test3.txt
0
0
0
....
3
3
1
1
1
In Octave
>> file = fopen('test3.txt')
file = 21
>> array = fscanf(file,'%f')
array =
0
0
....
>> reshape(array,3,4,3)
ans =
ans(:,:,1) =
0 1 2 3
0 1 2 3
0 1 2 3
ans(:,:,2) =
0 1 1 3
0 1 1 3
0 1 1 3
ans(:,:,3) =
3 1 3 1
3 1 3 1
3 1 3 1
and with the permutation
>> permute(reshape(array,3,4,3),[3,2,1])
ans =
ans(:,:,1) =
0 1 2 3
0 1 1 3
3 1 3 1
ans(:,:,2) =
0 1 2 3
0 1 1 3
3 1 3 1
ans(:,:,3) =
0 1 2 3
0 1 1 3
3 1 3 1
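The same reshape-then-permute step can be checked without leaving numpy. This is only a sketch of the reasoning: it assumes that MATLAB's reshape fills values in column-major order, which corresponds to order='F' in numpy, and that permute(..., [3,2,1]) corresponds to transpose(2, 1, 0).

```python
import numpy as np

x = np.array([[0, 1, 2, 3],
              [0, 1, 1, 3],
              [3, 1, 3, 1]])
x3 = np.dstack((x, x, x))            # shape (3, 4, 3)

# What savetxt wrote: the values in C (row-major) order, one per line.
flat = x3.reshape(-1)

# Emulate MATLAB's reshape: column-major fill, using the REVERSED numpy shape.
# Here the reversed shape (3, 4, 3) happens to equal the original shape.
as_matlab = flat.reshape(x3.shape[::-1], order='F')

# Emulate permute(..., [3,2,1]) to put the axes back in numpy order.
restored = as_matlab.transpose(2, 1, 0)

print(np.array_equal(restored, x3))
print(restored[:, :, 0])
```

For an array whose first and last dimensions differ, the reversed shape matters: the MATLAB reshape would need the dimensions in reverse order before the permute.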

Python NumPy: Performing different column operations over every N rows

I have a large NumPy array (OriginalArray) with many rows and 8 columns.
I want to create a new array (NewArray) in which each row has the following properties:
Columns 1, 3, 5, and 7 of NewArray are the sum over N rows of columns 1, 3, 5, and 7 of OriginalArray
Columns 2, 4, 6, and 8 of NewArray are the mean over N rows of columns 2, 4, 6, and 8 of OriginalArray
So, the NewArray has 1/N as many rows as the OriginalArray.
For example:
Original Array = [1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 ]
with N = 2
NewArray = [2 1 2 1 2 1 2 1
2 1 2 1 2 1 2 1]
Please excuse the messy formatting. I'm still very new at this (my first question here, actually).
Thanks!

Here's a vectorized approach making heavy usage of slicing -
nrows = a.shape[0]//N # a is input array
out = np.empty((nrows,8))
out[:,::2] = a[:,::2].reshape(-1,N,4).sum(1)
out[:,1::2] = a[:,1::2].reshape(-1,N,4).mean(1)
Sample run -
In [64]: a # Input array
Out[64]:
array([[5, 1, 5, 8, 5, 0, 3, 1],
[0, 7, 8, 7, 0, 3, 5, 1],
[8, 6, 6, 4, 1, 6, 1, 2],
[4, 5, 5, 7, 5, 2, 1, 2]])
In [65]: N = 2 # Summing/averaging length
In [66]: a[:,::2] # Select [1,3,5,7] cols
Out[66]:
array([[5, 5, 5, 3],
[0, 8, 0, 5],
[8, 6, 1, 1],
[4, 5, 5, 1]])
In [67]: a[:,::2].reshape(-1,N,4).sum(1) # Sum N rows by splitting axis
Out[67]:
array([[ 5, 13, 5, 8],
[12, 11, 6, 2]])
In [68]: a[:,1::2] # Select [2,4,6,8] cols
Out[68]:
array([[1, 8, 0, 1],
[7, 7, 3, 1],
[6, 4, 6, 2],
[5, 7, 2, 2]])
In [69]: a[:,1::2].reshape(-1,N,4).mean(1) # Similarly average across N rows
Out[69]:
array([[ 4. , 7.5, 1.5, 1. ],
[ 5.5, 5.5, 4. , 2. ]])
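The slicing steps above can be wrapped into one function. This is a sketch under two assumptions: the number of rows is a multiple of N, and the output should be floating point because of the means. The name block_sum_mean is made up for this illustration.

```python
import numpy as np

def block_sum_mean(a, N):
    """Sum the 1st, 3rd, 5th, ... columns and average the others over every N rows."""
    a = np.asarray(a)
    nrows, ncols = a.shape
    if nrows % N != 0:
        raise ValueError("number of rows must be a multiple of N")
    # Split the row axis into (number of blocks, N).
    blocks = a.reshape(nrows // N, N, ncols)
    out = np.empty((nrows // N, ncols), dtype=float)
    out[:, ::2] = blocks[:, :, ::2].sum(axis=1)     # 0-based even columns: sum
    out[:, 1::2] = blocks[:, :, 1::2].mean(axis=1)  # 0-based odd columns: mean
    return out

a = np.array([[5, 1, 5, 8, 5, 0, 3, 1],
              [0, 7, 8, 7, 0, 3, 5, 1],
              [8, 6, 6, 4, 1, 6, 1, 2],
              [4, 5, 5, 7, 5, 2, 1, 2]])
out = block_sum_mean(a, 2)
print(out)
```

Reshaping the whole array once, instead of each column selection separately, also removes the hard-coded 4 and 8, so the function works for any column count.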

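I'm assuming that your original_array (note the PEP8 style) is already formatted in rows and columns. By this I mean, original_array = np.array([[1,1...],[1,...],[1,...],[1,...]])
A simple list comprehension creates a single row of new_array from one block of N rows (0-based even indices are columns 1, 3, 5, 7, which are summed; the others are averaged):
import numpy as np
block = original_array[:N]
row = [np.sum(block[:,x]) if x%2==0 else np.mean(block[:,x]) for x in range(len(block[0]))]
And then, to build new_array, repeat this for every block of N rows:
new_array = [[np.sum(original_array[i:i+N,x]) if x%2==0 else np.mean(original_array[i:i+N,x]) for x in range(len(original_array[0]))] for i in range(0, len(original_array), N)]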

numpy.tile did not work as Matlab repmat

According to What is the equivalent of MATLAB's repmat in NumPy, I tried to build 3x3x5 array from 3x3 array using python.
In Matlab, this works as I expected.
a = [1,1,1;1,2,1;1,1,1];
a_= repmat(a,[1,1,5]);
size(a_) = 3 3 5
But for numpy.tile
b = numpy.array([[1,1,1],[1,2,1],[1,1,1]])
b_ = numpy.tile(b, [1,1,5])
b_.shape = (1, 3, 15)
If I want to generate the same array as in Matlab, what is the equivalent?
Edit 1
The output I would expect to get is
b_(:,:,1) =
1 1 1
1 2 1
1 1 1
b_(:,:,2) =
1 1 1
1 2 1
1 1 1
b_(:,:,3) =
1 1 1
1 2 1
1 1 1
b_(:,:,4) =
1 1 1
1 2 1
1 1 1
b_(:,:,5) =
1 1 1
1 2 1
1 1 1
but what @farenorth's answer and numpy.dstack give is
[[[1 1 1 1 1]
[1 1 1 1 1]
[1 1 1 1 1]]
[[1 1 1 1 1]
[2 2 2 2 2]
[1 1 1 1 1]]
[[1 1 1 1 1]
[1 1 1 1 1]
[1 1 1 1 1]]]

NumPy functions are not, in general, 'drop-in' replacements for Matlab functions. Often there are subtle differences in how the 'equivalent' functions are used. It does take time to adapt, but I've found the transition to be very worthwhile.
In this case, the np.tile documentation indicates what happens when you try to tile an array to more dimensions than it has:
numpy.tile(A, reps)
Construct an array by repeating A the number of times given by reps.
If reps has length d, the result will have dimension of max(d, A.ndim).
If A.ndim < d, A is promoted to be d-dimensional by prepending new axes. So a shape (3,) array is promoted to (1, 3) for 2-D replication, or shape (1, 1, 3) for 3-D replication. If this is not the desired behavior, promote A to d-dimensions manually before calling this function.
In this case, your array is being promoted to shape (1, 3, 3) and then tiled. So, to get your desired behavior, just be sure to append a new singleton dimension to the array where you want it:
>>> b_ = numpy.tile(b[..., None], [1, 1, 5])
>>> print(b_.shape)
(3, 3, 5)
Note here that I've used None (i.e. np.newaxis) and ellipsis notation to specify a new dimension at the end of the array. You can find out more about these capabilities in the NumPy indexing documentation.
Another option, which is inspired by the OP's comment would be:
b_ = np.dstack((b, ) * 5)
In this case, I've used tuple multiplication to 'repmat' the array, which is then constructed by np.dstack.
As @hpaulj indicated, Matlab and NumPy display matrices differently. To replicate the Matlab output you can do something like:
>>> for idx in range(b_.shape[2]):
...     print('b_[:, :, {}] = \n{}\n'.format(idx, b_[:, :, idx]))
...
b_[:, :, 0] =
[[1 1 1]
[1 2 1]
[1 1 1]]
b_[:, :, 1] =
[[1 1 1]
[1 2 1]
[1 1 1]]
b_[:, :, 2] =
[[1 1 1]
[1 2 1]
[1 1 1]]
b_[:, :, 3] =
[[1 1 1]
[1 2 1]
[1 1 1]]
b_[:, :, 4] =
[[1 1 1]
[1 2 1]
[1 1 1]]
Good luck!
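A quick self-contained check of the options above. It also includes np.repeat and np.broadcast_to, which are additions for comparison and were not part of the original answer; all of them give a (3, 3, 5) array whose every slice along the last axis equals b.

```python
import numpy as np

b = np.array([[1, 1, 1],
              [1, 2, 1],
              [1, 1, 1]])

# Append a singleton axis first, then tile along it.
tiled = np.tile(b[..., None], (1, 1, 5))

# Stack five copies along a new third axis.
stacked = np.dstack((b,) * 5)

# Repeat the singleton axis five times.
repeated = np.repeat(b[:, :, np.newaxis], 5, axis=2)

# Read-only view without copying the data.
view = np.broadcast_to(b[..., None], b.shape + (5,))

for name, arr in [("tile", tiled), ("dstack", stacked),
                  ("repeat", repeated), ("broadcast_to", view)]:
    print(name, arr.shape, np.array_equal(arr[:, :, 0], b))
```

The broadcast_to result is a read-only view, so it suits cases where the repeated array is only read; the other three produce independent copies.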

Let's try the comparison, taking care to diversify the shapes and values.
octave:7> a=reshape(0:11,3,4)
a =
0 3 6 9
1 4 7 10
2 5 8 11
octave:8> repmat(a,[1,1,2])
ans =
ans(:,:,1) =
0 3 6 9
1 4 7 10
2 5 8 11
ans(:,:,2) =
0 3 6 9
1 4 7 10
2 5 8 11
numpy equivalent - more or less:
In [61]: a=np.arange(12).reshape(3,4)
In [62]: np.tile(a,[2,1,1])
Out[62]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]])
numpy again, but with order F to better match the MATLAB Fortran-derived layout
In [63]: a=np.arange(12).reshape(3,4,order='F')
In [64]: np.tile(a,[2,1,1])
Out[64]:
array([[[ 0, 3, 6, 9],
[ 1, 4, 7, 10],
[ 2, 5, 8, 11]],
[[ 0, 3, 6, 9],
[ 1, 4, 7, 10],
[ 2, 5, 8, 11]]])
I'm adding the new numpy dimension at the start, because in many ways it better replicates the MATLAB practice of adding it at the end.
Try adding the new dimension at the end. The shape is (3,4,2), but you might not like the display.
np.tile(a[:,:,None],[1,1,2])
Another consideration - what happens when you flatten the tile?
octave:10> repmat(a,[1,1,2])(:).'
ans =
0 1 2 3 4 5 6 7 8 9 10 11
0 1 2 3 4 5 6 7 8 9 10 11
with the order F a
In [78]: np.tile(a[:,:,None],[1,1,2]).flatten()
Out[78]:
array([ 0, 0, 3, 3, 6, 6, 9, 9, 1, 1, 4, 4, 7, 7, 10, 10, 2,
2, 5, 5, 8, 8, 11, 11])
In [79]: np.tile(a,[2,1,1]).flatten()
Out[79]:
array([ 0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11, 0, 3, 6, 9, 1,
4, 7, 10, 2, 5, 8, 11])
with a C order array:
In [80]: a=np.arange(12).reshape(3,4)
In [81]: np.tile(a,[2,1,1]).flatten()
Out[81]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11])
This last one matches the Octave layout.
So does:
In [83]: a=np.arange(12).reshape(3,4,order='F')
In [84]: np.tile(a[:,:,None],[1,1,2]).flatten(order='F')
Out[84]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11])
Confused yet?
