Inspired by the post "How to create a sequence of sequences of numbers in R?".
Question:
I would like to make the following sequence in NumPy.
[1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]
I have tried the following:
Non-generic, hard-coded version using np.r_:
np.r_[1:6, 2:6, 3:6, 4:6, 5:6]
# array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])
Pure Python to generate the desired array.
n = 5
a = np.r_[1:n+1]
[i for idx in range(a.shape[0]) for i in a[idx:]]
# [1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]
Create a 2D array and take the upper triangle from it.
n = 5
a = np.r_[1:n+1]
arr = np.tile(a, (n, 1))
print(arr)
# [[1 2 3 4 5]
# [1 2 3 4 5]
# [1 2 3 4 5]
# [1 2 3 4 5]
# [1 2 3 4 5]]
o = np.triu(arr).flatten()
# array([1, 2, 3, 4, 5,
#        0, 2, 3, 4, 5,
#        0, 0, 3, 4, 5,   # this is a 1D array
#        0, 0, 0, 4, 5,
#        0, 0, 0, 0, 5])
out = o[o > 0]
# array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])
The above solution is generic but I want to know if there's a more efficient way to do it in NumPy.
I'm not sure if this is a good idea, but I benchmarked it against your pure-Python method and it seems to be faster.
np.concatenate([np.arange(i, n+1) for i in range(1, n+1)])
Here is the full code:
import numpy as np
from time import time
n = 5000
t = time()
c = np.concatenate([np.arange(i, n+1) for i in range(1, n+1)])
print(time() - t)
# 0.039876699447631836
t = time()
a = np.r_[1:n+1]
b = np.array([i for idx in range(a.shape[0]) for i in a[idx:]])
print(time() - t)
# 2.0875167846679688
print(all(b == c))
# True
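As an aside, one-shot time.time() measurements are noisy; timeit averages over repeated runs. A minimal sketch of the same measurement (my addition, not part of the original benchmark):
import timeit
import numpy as np

n = 5000
t = timeit.timeit(
    "np.concatenate([np.arange(i, n + 1) for i in range(1, n + 1)])",
    globals={"np": np, "n": n},
    number=10,
)
print(t / 10)  # average seconds per run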
A really plain Python (no numpy) way is:
n = 5
a = [r for start in range(1, n+1) for r in range(start, n+1)]
This will be faster for small n (up to ~150) but slower than @tangolin's solution for larger n. It is still faster than the OP's "pure Python" way.
A faster implementation prepares the data in advance, avoiding the creation of a new range on every iteration:
source = np.arange(1, n+1)
d = np.concatenate([source[i: n+1] for i in range(0, n)])
NOTE
My original implementation both allocated space for the return value and prepared the data in advance, but it was not Pythonic. I changed it to use concatenate after reading @tangolin's answer, since I noticed that concatenate does the same thing.
Original implementation:
e = np.empty((n*(n+1)//2, ), dtype='int64')
source = np.arange(1, n+1)
for i in range(n):
    init = n * i - i*(i-1)//2
    end = n - i + init
    e[init:end] = source[i:n]
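For completeness, the whole sequence can also be produced with no Python-level loop at all: the target values are exactly the (1-based) column indices of the upper triangle of an n-by-n matrix, which np.triu_indices returns directly. A minimal sketch (my addition, not from the original answers):
import numpy as np
n = 5
# Column indices of the upper triangle, row by row: 0,1,2,3,4, 1,2,3,4, ..., 4
out = np.triu_indices(n)[1] + 1
print(out)
# [1 2 3 4 5 2 3 4 5 3 4 5 4 5 5]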
Related
I have a 1d array of ids, for example:
a = [1, 3, 4, 7, 9]
Then another 2d array:
b = [[1, 4, 7, 9], [3, 7, 9, 1]]
I would like to have a third array with the same shape as b where each item is the index of the corresponding item from a, that is:
c = [[0, 2, 3, 4], [1, 3, 4, 0]]
What's a vectorized way to do that using numpy?
This may not seem to make sense at first, but you can use np.interp to do it:
a = [1, 3, 4, 7, 9]
sorting = np.argsort(a)
positions = np.arange(0,len(a))
xp = np.array(a)[sorting]
fp = positions[sorting]
b = [[1, 4, 7, 9], [3, 7, 9, 1]]
c = np.rint(np.interp(b, xp, fp))  # rint is better than astype(int), because floats are tricky
# astype(int) would be faster for small len(a), but it is not recommended
This should work as long as len(a) is smaller than the largest integer exactly representable by a float (16,777,217 for single precision). The algorithm runs in O(n log n) time, or more precisely O(len(b) * log(len(a))).
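For reference, running the snippet above on the example data and casting back to integers gives the expected indices:
print(c.astype(int))
# [[0 2 3 4]
#  [1 3 4 0]]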
Effectively, this solution is a one-liner. The only catch is that you need to reshape the array first, and then reshape the result back again:
import numpy as np
a = np.array([1, 3, 4, 7, 9])
b = np.array([[1, 4, 7, 9], [3, 7, 9, 1]])
original_shape = b.shape
c = np.where(b.reshape(b.size, 1) == a)[1]
c = c.reshape(original_shape)
This results in:
[[0 2 3 4]
[1 3 4 0]]
Broadcasting to the rescue!
>>> ((np.arange(1, len(a) + 1)[:, None, None]) * (a[:, None, None] == b)).sum(axis=0) - 1
array([[0, 2, 3, 4],
[1, 3, 4, 0]])
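Note that the broadcasting trick materializes a len(a) x b.size boolean temporary, so memory grows quickly with input size. A sorting-based sketch (my addition, not from the original answers) that avoids the temporary:
import numpy as np
a = np.array([1, 3, 4, 7, 9])
b = np.array([[1, 4, 7, 9], [3, 7, 9, 1]])
order = np.argsort(a)                      # sort a once
pos = np.searchsorted(a, b, sorter=order)  # locate each value of b in the sorted a
c = order[pos]                             # map sorted positions back to indices in a
print(c)
# [[0 2 3 4]
#  [1 3 4 0]]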
I am trying to implement 1D-convolution for signals.
It should have the same output as:
ary1 = np.array([1, 1, 2, 2, 1])
ary2 = np.array([1, 1, 1, 3])
conv_ary = np.convolve(ary2, ary1, 'full')
# [1 2 4 8 8 9 7 3]
I came up with this approach:
def convolve_1d(signal, kernel):
    n_sig = signal.size
    n_ker = kernel.size
    n_conv = n_sig - n_ker + 1

    # Reverse the kernel once, outside the loop.
    rev_kernel = kernel[::-1].copy()
    result = np.zeros(n_conv, dtype=np.double)
    for i in range(n_conv):
        result[i] = np.dot(signal[i: i + n_ker], rev_kernel)
    return result
But my result is [8., 8.]; I might have to zero-pad my array instead and change its indexing.
Is there a smoother way to achieve the desired outcome?
Here is a possible solution:
def convolve_1d(signal, kernel):
    kernel = kernel[::-1]
    return [
        np.dot(
            signal[max(0, i): min(i + len(kernel), len(signal))],
            kernel[max(-i, 0): len(signal) - i * (len(signal) - len(kernel) < i)],
        )
        for i in range(1 - len(kernel), len(signal))
    ]
Here is an example:
>>> convolve_1d([1, 1, 2, 2, 1], [1, 1, 1, 3])
[1, 2, 4, 8, 8, 9, 7, 3]
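Alternatively, following the OP's own hunch, zero-padding the signal by len(kernel) - 1 on each side and reusing the original loop reproduces np.convolve(..., 'full'). A sketch (my addition, assuming the same inputs as above):
import numpy as np

def convolve_1d_padded(signal, kernel):
    signal = np.asarray(signal, dtype=np.double)
    kernel = np.asarray(kernel)
    n_ker = kernel.size
    # Pad so that every partial overlap of kernel and signal is covered ('full' mode).
    padded = np.pad(signal, n_ker - 1)
    rev_kernel = kernel[::-1]
    n_conv = padded.size - n_ker + 1  # == n_sig + n_ker - 1
    result = np.zeros(n_conv, dtype=np.double)
    for i in range(n_conv):
        result[i] = np.dot(padded[i: i + n_ker], rev_kernel)
    return result

print(convolve_1d_padded([1, 1, 2, 2, 1], [1, 1, 1, 3]))
# [1. 2. 4. 8. 8. 9. 7. 3.]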
If I have a function which randomly returns 2D lists of the same size, how would I "tile" them together?
For example, if I generate 4 2D lists which are 3 by 3 in size, how would I combine them in a 2 by 2 arrangement into a 6 by 6 2D list?
[[0,0,0], [[1,1,1], [[2,2,2], [[3,3,3],
[0,0,0], + [1,1,1], + [2,2,2], + [3,3,3],
[0,0,0]] [1,1,1]] [2,2,2]] [3,3,3]]
Arranged h=2 by w=2 makes:
[[0,0,0,1,1,1],
[0,0,0,1,1,1],
[0,0,0,1,1,1],
[2,2,2,3,3,3],
[2,2,2,3,3,3],
[2,2,2,3,3,3]]
In my case the individual lists are generated randomly and returned by a function which takes width and height as arguments.
I need to specify some dimensions (h and w) and arrange (h*w) random sub-grids into an h by w super-grid. The order/specific arrangement of the sub-grids doesn't matter, one after the other or any other arrangement is fine.
How would I go about doing this if I want a function that takes as arguments width and height of the super-grid, and width and height of the sub-grids?
It's pretty much just a matter of grinding through the appropriate subsets of the list of subgrids. (There might be some elegant way to do this in a single nested comprehension, but I'm not seeing it, lol.)
from typing import List, TypeVar

_GridItem = TypeVar('_GridItem')

def tile(
    h: int,
    w: int,
    subgrids: List[List[List[_GridItem]]]
) -> List[List[_GridItem]]:
    tiled_grid: List[List[_GridItem]] = []
    for _ in range(h):
        tile_row = subgrids[:w]
        subgrids = subgrids[w:]
        for i in range(len(tile_row[0])):
            tiled_grid.append([x for subgrid in tile_row for x in subgrid[i]])
    return tiled_grid
This is sufficiently flexible that it'll work with any h, w, and conforming subgrids (i.e. len(subgrids) == w*h, and the dimensions of all subgrids are the same -- it wouldn't be too hard to add a couple of checks to the function to enforce this). Here's an example of using this function to tile 8 4x2 subgrids in a 2x4 layout:
print("\n".join(map(str, tile(2, 4, [
[[n for _ in range(2)] for _ in range(4)] for n in range(8)
]))))
yields:
[0, 0, 1, 1, 2, 2, 3, 3]
[0, 0, 1, 1, 2, 2, 3, 3]
[0, 0, 1, 1, 2, 2, 3, 3]
[0, 0, 1, 1, 2, 2, 3, 3]
[4, 4, 5, 5, 6, 6, 7, 7]
[4, 4, 5, 5, 6, 6, 7, 7]
[4, 4, 5, 5, 6, 6, 7, 7]
[4, 4, 5, 5, 6, 6, 7, 7]
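If NumPy arrays are acceptable instead of plain lists, np.block performs the same assembly from a nested list of blocks. A sketch (my addition, with constant subgrids standing in for the random ones):
import numpy as np
h, w = 2, 2                                   # super-grid dimensions
subgrids = [np.full((3, 3), n) for n in range(h * w)]
# np.block stitches a nested list of equally-shaped blocks into one array.
super_grid = np.block([[subgrids[r * w + c] for c in range(w)]
                       for r in range(h)])
print(super_grid)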
I want to replace all the items of sequence with ids that tell which list of labeller they are in. Assume that all the values are distinct in both sequence and labeller, and that the union of the lists of labeller has the same items as sequence. lsizes corresponds to the sizes of the lists in labeller; it is redundant for the Pythonic solution but might be compulsory for a fully vectorised one.
sequence = [1, 2, 10, 5, 6, 4, 3, 8, 7, 9]
labeller = [[1, 2, 10], [3, 4, 5, 6, 7], [8, 9]]
lsizes = [3, 5, 2]
I know how to solve it in a simple way:
idx = {u:i for i, label in enumerate(labeller) for u in label}
tags = [idx[u] for u in sequence]
And the output is:
tags = [0, 0, 0, 1, 1, 1, 1, 2, 1, 2]
After that I put a lot of effort into doing it in a vectorised way. It's quite complicated for me. This is my attempt, built rather by guesswork, but unfortunately it doesn't pass all my tests. I hope I'm close:
sequence = np.array(sequence)
cl = np.concatenate(labeller)
_, cl_idx = np.unique(cl, return_index=True)
_, idx = np.unique(sequence[cl_idx], return_index=True)
tags = np.repeat(np.arange(len(lsizes)), lsizes)[idx]
#output: [0 0 1 1 0 1 1 1 2 2]
How can I finish it? I would also like to see a rigorous explanation of what it does and how to understand it better. Any sources are welcome as well.
Approach #1
For tracing-back problems like this, searchsorted seems to be the way to go, and it works here too, re-using your cl:
cl = np.concatenate(labeller)
sidx = cl.argsort()
idx = np.searchsorted(cl, sequence, sorter=sidx)
idx0 = sidx[idx]
l = list(map(len, labeller))
r = np.repeat(np.arange(len(l)), l)
out = r[idx0]
Using lsizes for l makes it fully vectorized. But I suspect the concatenation step might be heavy; whether this is worth it would depend a lot on the lengths of the subarrays.
Approach #2
For positive numbers, here's one with array-indexing as a hashing mechanism:
N = max(map(max, labeller))+1
id_ar = np.zeros(N, dtype=int) # use np.empty for perf. boost
for i, l in enumerate(labeller):
    id_ar[l] = i
out = id_ar[sequence]
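On the example data, this reproduces the Pythonic result (a quick check, assuming the variables defined above):
print(out.tolist())
# [0, 0, 0, 1, 1, 1, 1, 2, 1, 2]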
sequence = [1, 2, 10, 5, 6, 4, 3, 8, 7, 9]
labeller = [[1, 2, 10], [3, 4, 5, 6, 7], [8, 9]]
lsizes = [3, 5, 2]
sequence_array = np.array(sequence)
# Summing an object array of lists concatenates them; ragged input needs dtype=object.
labeller_array = np.array(labeller, dtype=object).sum()
index_array = np.repeat(list(range(len(lsizes))), lsizes)
np.apply_along_axis(lambda num: index_array[np.where(labeller_array == num)[0]], 0, sequence_array[None, :])
# output: array([[0, 0, 0, 1, 1, 1, 1, 2, 1, 2]])
Alternative:
import pandas as pd

label_df = pd.DataFrame({'label': labeller_array, 'index': index_array})
seq_df = pd.DataFrame({'seq': sequence_array})
seq_df.merge(label_df, left_on='seq', right_on='label')['index'].tolist()
#output: [0, 0, 0, 1, 1, 1, 1, 2, 1, 2]
In differential evolution, the mutation formula that is used most often is
arr[a] = (arr[b] + M * (arr[c] - arr[d])) % arr.shape[1]
Where
arr is a 2d array consisting of non-negative integers such that all elements in each row are unique,
a represents each row of arr,
M is the mutation constant ranging between 0 and 2 and
b, c and d are 3 unique random numbers.
However, on using this formula, I see that arr[a] ends up with duplicate values, depending on the values of arr[b], arr[c] and arr[d]. I wish to have only unique numbers in arr[a]. How is this possible using NumPy?
e.g.
arr[a] = [2, 8, 4, 9, 1, 6, 7, 3, 0, 5]
arr[b] = [3, 5, 1, 2, 9, 8, 0, 6, 7, 4]
arr[c] = [2, 3, 8, 4, 5, 1, 0, 6, 9, 7]
arr[d] = [6, 1, 9, 2, 7, 5, 8, 0, 3, 4]
On applying the formula, arr[a] becomes [9, 7, 0, 4, 7, 4, 2, 2, 3, 7]. But I want it to have only unique numbers between 0 and arr.shape[1]. I am open to modifying the mutation function if needed, as long as M, arr[b], arr[c] and arr[d] are all used meaningfully.
This is a rather different approach to the problem, but since you seem to be working with permutations, I am not sure numerical differences are that meaningful. You can, however, see the problem in terms of permutations, that is, reorderings of vectors. Instead of the difference between two vectors, you may consider the permutation that takes you from one vector to the other; and instead of the addition of two vectors, you may consider applying a permutation to a vector. If you want to have an M parameter, maybe that could be the number of times you apply the permutation (assuming it is a non-negative integer).
Here is the basic idea of how you could implement this:
import numpy as np
# Finds the permutation that takes you from vector a to vector b.
# Returns a vector p such that a[p] = b.
def permutation_diff(a, b):
    p = np.zeros_like(a)
    p[a] = np.arange(len(p), dtype=p.dtype)
    return p[b]

# Applies permutation p to vector a, m times.
def permutation_apply(a, p, m=1):
    out = a.copy()
    for _ in range(m):
        out = out[p]
    return out

# Combination function
def combine(b, c, d, m):
    return permutation_apply(b, permutation_diff(d, c), m)
# Test
b = np.array([3, 5, 1, 2, 9, 8, 0, 6, 7, 4])
c = np.array([2, 3, 8, 4, 5, 1, 0, 6, 9, 7])
d = np.array([6, 1, 9, 2, 7, 5, 8, 0, 3, 4])
m = 1
a = combine(b, c, d, m)
print(a)
# [2 7 0 4 8 5 6 3 1 9]
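A quick sanity check (with hypothetical vectors, my addition) that permutation_diff really returns a p satisfying a[p] == b:
a0 = np.array([2, 0, 1])
b0 = np.array([1, 2, 0])
p = permutation_diff(a0, b0)
print(a0[p])  # [1 2 0] -- equal to b0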
Since you are working with many vectors arranged in a matrix, you may prefer vectorized versions of the above functions. You can have that with something like this (here I assume M is a fixed parameter for the whole algorithm, not per individual):
import numpy as np
# Finds the permutations that take you from vectors in a to vectors in b.
def permutation_diff_vec(a, b):
    p = np.zeros_like(a)
    i = np.arange(len(p))[:, np.newaxis]
    p[i, a] = np.arange(p.shape[-1], dtype=p.dtype)
    return p[i, b]

# Applies permutations in p to vectors in a, m times.
def permutation_apply_vec(a, p, m=1):
    out = a.copy()
    i = np.arange(len(out))[:, np.newaxis]
    for _ in range(m):
        out = out[i, p]
    return out

# Combination function
def combine_vec(b, c, d, m):
    return permutation_apply_vec(b, permutation_diff_vec(d, c), m)
# Test
np.random.seed(100)
arr = np.array([[2, 8, 4, 9, 1, 6, 7, 3, 0, 5],
                [3, 5, 1, 2, 9, 8, 0, 6, 7, 4],
                [2, 3, 8, 4, 5, 1, 0, 6, 9, 7],
                [6, 1, 9, 2, 7, 5, 8, 0, 3, 4]])
n = len(arr)
b = arr[np.random.choice(n, size=n)]
c = arr[np.random.choice(n, size=n)]
d = arr[np.random.choice(n, size=n)]
m = 1
arr[:] = combine_vec(b, c, d, m)
print(arr)
# [[3 6 0 2 5 1 4 7 8 9]
# [6 1 9 2 7 5 8 0 3 4]
# [6 9 2 3 5 0 4 1 8 7]
# [2 6 5 4 1 9 8 0 7 3]]
Try to do this:
list(set(arr[a]))
Here is an example of what it could do:
array = np.array([9, 7, 0, 4, 7, 4, 2, 2, 3, 7])
shape = array.shape[0]
array = list(set(array))
for i in range(shape):
    if i not in array:
        array.append(i)
array = np.array(array)
If you want to fill the replacement values in at the positions where the duplicates were, the logic is a little different, but the idea is the same. I hope this helps.
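A more NumPy-flavoured sketch of the same repair (my addition, not from the answer) that keeps the first occurrence of each value in its original order and then appends whatever is missing:
import numpy as np

arr_a = np.array([9, 7, 0, 4, 7, 4, 2, 2, 3, 7])
# Keep the first occurrence of each value, in original order.
_, first_idx = np.unique(arr_a, return_index=True)
kept = arr_a[np.sort(first_idx)]
# Append the values in 0..n-1 that are missing entirely.
missing = np.setdiff1d(np.arange(arr_a.size), kept)
repaired = np.concatenate([kept, missing])
print(repaired)
# [9 7 0 4 2 3 1 5 6 8]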