If I have a function which randomly returns 2D lists of the same size, how would I "tile" them together?
For example, if I generate 4 2D lists which are 3 by 3 in size, how would I combine them in a 2 by 2 arrangement into a 6 by 6 2D list?
[[0,0,0], [[1,1,1], [[2,2,2], [[3,3,3],
[0,0,0], + [1,1,1], + [2,2,2], + [3,3,3],
[0,0,0]] [1,1,1]] [2,2,2]] [3,3,3]]
Arranging them h=2 by w=2 gives:
[[0,0,0,1,1,1],
[0,0,0,1,1,1],
[0,0,0,1,1,1],
[2,2,2,3,3,3],
[2,2,2,3,3,3],
[2,2,2,3,3,3]]
In my case the individual lists are generated randomly and returned by a function which takes width and height as arguments.
I need to specify some dimensions (h and w) and arrange (h*w) random sub-grids into an h by w super-grid. The order/specific arrangement of the sub-grids doesn't matter, one after the other or any other arrangement is fine.
How would I go about doing this if I want a function that takes as arguments width and height of the super-grid, and width and height of the sub-grids?
It's pretty much just a matter of grinding through the appropriate subsets of the list of subgrids. (There might be some elegant way to do this in a single nested comprehension, but I'm not seeing it, lol.)
from typing import List, TypeVar
_GridItem = TypeVar('_GridItem')
def tile(
        h: int,
        w: int,
        subgrids: List[List[List[_GridItem]]]
) -> List[List[_GridItem]]:
    tiled_grid: List[List[_GridItem]] = []
    for _ in range(h):
        tile_row = subgrids[:w]
        subgrids = subgrids[w:]
        for i in range(len(tile_row[0])):
            tiled_grid.append([x for subgrid in tile_row for x in subgrid[i]])
    return tiled_grid
This is sufficiently flexible that it'll work with any h, w, and conforming subgrids (i.e. len(subgrids) == w*h, and the dimensions of all subgrids are the same -- it wouldn't be too hard to add a couple of checks to the function to enforce this). Here's an example of using this function to tile 8 4x2 subgrids in a 2x4 layout:
print("\n".join(map(str, tile(2, 4, [
    [[n for _ in range(2)] for _ in range(4)] for n in range(8)
]))))
yields:
[0, 0, 1, 1, 2, 2, 3, 3]
[0, 0, 1, 1, 2, 2, 3, 3]
[0, 0, 1, 1, 2, 2, 3, 3]
[0, 0, 1, 1, 2, 2, 3, 3]
[4, 4, 5, 5, 6, 6, 7, 7]
[4, 4, 5, 5, 6, 6, 7, 7]
[4, 4, 5, 5, 6, 6, 7, 7]
[4, 4, 5, 5, 6, 6, 7, 7]
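The question also asks for a function that takes the super-grid and sub-grid dimensions directly. A thin wrapper over tile() can do that; here is a sketch, assuming a hypothetical random_grid(w, h) generator standing in for your own random-grid function:
import random

def random_grid(w: int, h: int) -> List[List[int]]:
    # Stand-in for the asker's generator: returns an h-row by w-column grid of random digits.
    return [[random.randint(0, 9) for _ in range(w)] for _ in range(h)]

def tile_random(super_h: int, super_w: int, sub_h: int, sub_w: int) -> List[List[int]]:
    # Generate super_h * super_w random sub-grids and tile them with tile() above.
    subgrids = [random_grid(sub_w, sub_h) for _ in range(super_h * super_w)]
    return tile(super_h, super_w, subgrids)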
Related
I want to solve the following problem, but the problem statement itself isn't my issue; I only include it for context.
"You are given an integer array height of length n. There are n vertical lines drawn such that the two endpoints of the ith line are (i, 0) and (i, height[i]).
Find two lines that together with the x-axis form a container, such that the container contains the most water.
Return the maximum amount of water a container can store."
The above vertical lines are represented by the array [1,8,6,2,5,4,8,3,7]. In this case, the maximum amount of water the container can contain is 49.
I made a simple nested for loop to solve this problem:
maxim = 0
for i in range(0, len(height)):
    for j in range(0, len(height)):
        maxim = max(min(height[i], height[j]) * abs(j - i), maxim)
But this solution takes too long for a bigger array. So I tried to do this with List Comprehension:
mxm = [min(height[i], height[j] * abs(j - i)) for i in range(0, len(height)) for j in range(0, len(height))]
maxim = max(mxm)
The problem is that I get two different outputs: the nested for loop works (it returns 49), but the second one returns 8. (The mxm array has these elements: [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 6, 4, 8, 8, 8, 8, 8, 2, 6, 0, 2, 6, 6, 6, 6, 6, 2, 2, 2, 0, 2, 2, 2, 2, 2, 4, 5, 5, 2, 0, 4, 5, 5, 5, 4, 4, 4, 4, 4, 0, 4, 4, 4, 6, 8, 8, 6, 8, 4, 0, 3, 8, 3, 3, 3, 3, 3, 3, 3, 0, 3, 7, 7, 7, 7, 7, 7, 7, 3, 0])
Why are they different? And how can I make my solution faster?
In the first example you're applying the min function to just the height values:
min(height[i], height[j])
In the second, the multiplication by the absolute distance between the index positions is inside the min call, so it always applies to height[j] rather than to the actual minimum:
min(height[i], height[j] * abs(j - i))
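The fix is to move the closing parenthesis so the distance multiplies the minimum, exactly as in the loop version; with that change the comprehension gives the same result as the nested loops:
mxm = [min(height[i], height[j]) * abs(j - i) for i in range(len(height)) for j in range(len(height))]
maxim = max(mxm)  # 49 for [1, 8, 6, 2, 5, 4, 8, 3, 7]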
Also, regarding making your solution faster: I believe I've seen this problem before. I think what you're looking for is a sliding window.
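As a sketch of that idea (the classic two-pointer formulation, not code from the original answer): keep one pointer at each end and always move the pointer at the shorter line inward, which reduces the work from O(n^2) to O(n).
def max_area(height):
    # Two pointers: start at both ends, move the shorter side inward.
    i, j = 0, len(height) - 1
    best = 0
    while i < j:
        best = max(best, min(height[i], height[j]) * (j - i))
        if height[i] < height[j]:
            i += 1
        else:
            j -= 1
    return best

print(max_area([1, 8, 6, 2, 5, 4, 8, 3, 7]))  # 49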
Inspired by the post How to create a sequence of sequences of numbers in R?.
Question:
I would like to make the following sequence in NumPy.
[1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]
I have tried the following:
Non-generic, hard-coded approach using np.r_.
np.r_[1:6, 2:6, 3:6, 4:6, 5:6]
# array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])
Pure Python to generate the desired array.
n = 5
a = np.r_[1:n+1]
[i for idx in range(a.shape[0]) for i in a[idx:]]
# [1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]
Create a 2D array and take the upper triangle from it.
n = 5
a = np.r_[1:n+1]
arr = np.tile(a, (n, 1))
print(arr)
# [[1 2 3 4 5]
# [1 2 3 4 5]
# [1 2 3 4 5]
# [1 2 3 4 5]
# [1 2 3 4 5]]
o = np.triu(arr).flatten()
# array([1, 2, 3, 4, 5,
# 0, 2, 3, 4, 5,
# 0, 0, 3, 4, 5, # This is 1D array
# 0, 0, 0, 4, 5,
# 0, 0, 0, 0, 5])
out = o[o > 0]
# array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])
The above solution is generic but I want to know if there's a more efficient way to do it in NumPy.
I'm not sure if this is a good idea but I tried running it against your python method and it seems to be faster.
np.concatenate([np.arange(i, n+1) for i in range(1, n+1)])
Here is the full code:
import numpy as np
from time import time
n = 5000
t = time()
c = np.concatenate([np.arange(i, n+1) for i in range(1, n+1)])
print(time() - t)
# 0.039876699447631836
t = time()
a = np.r_[1:n+1]
b = np.array([i for idx in range(a.shape[0]) for i in a[idx:]])
print(time() - t)
# 2.0875167846679688
print(all(b == c))
# True
A really plain Python (no numpy) way is:
n = 5
a = [r for start in range(1, n+1) for r in range(start, n+1)]
This will be faster for small n (~150) but slower than @tangolin's solution for larger n. It is still faster than the OP's "pure Python" way.
A faster implementation prepares the data in advance, avoiding creating a new range each time:
source = np.arange(1, n+1)
d = np.concatenate([source[i: n+1] for i in range(0, n)])
NOTE
My original implementation both allocated space for the return value and prepared the data in advance, but it was not pythonic. I changed it to use concatenate after reading @tangolin's answer and noticing that concatenate does the same.
Original implementation:
e = np.empty((n*(n+1)//2, ), dtype='int64')
source = np.arange(1, n+1)
for i in range(n):
    init = n * i - i*(i-1)//2
    end = n - i + init
    e[init:end] = source[i:n]
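For larger n it may also be worth avoiding the Python-level loop over the ranges entirely. Here is a sketch of one fully vectorized construction of the same sequence (my own variation on the ideas above, not from the original answers):
import numpy as np

n = 5
i = np.arange(n)
lengths = n - i                                            # run lengths: n, n-1, ..., 1
starts = i + 1                                             # first value of each run: 1, 2, ..., n
offsets = np.concatenate(([0], np.cumsum(lengths)[:-1]))   # start position of each run in the output
pos = np.arange(lengths.sum())
run_id = np.repeat(i, lengths)                             # which run each output element belongs to
out = starts[run_id] + (pos - offsets[run_id])
# array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])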
I'm playing a bingo and I choose 15 balls out of 50 possible balls (without replacement).
Then 30 balls get drawn (without replacement), and if my 15 balls are in this set of 30 balls drawn, I get a prize.
I wanted to calculate the probability of winning a prize by simulating this many times, preferably vectorized.
So here's my code until now:
import numpy as np
my_chosen_balls = np.random.choice(range(1,51), 15)
samples_30_balls = np.random.choice(range(1,51), (1_000_000, 30))
How do I compare the 15 balls that I chose with each of these 30-ball samples and see if all my balls were picked?
So compare my 15 balls to every one of the samples separately.
Here is a smaller example to visualize with:
my_chosen_balls = np.array([7, 4, 3])
sample_5_balls = np.array([[5, 5, 5, 6, 6, 4, 1, 1, 1, 8, 2, 3, 2, 8, 8],
[1, 9, 1, 3, 4, 8, 5, 4, 7, 2, 8, 6, 5, 6, 4],
[7, 3, 6, 9, 8, 3, 6, 9, 3, 1, 6, 5, 3, 1, 7],
[8, 4, 3, 2, 9, 5, 3, 8, 4, 6, 9, 2, 6, 5, 9],
[3, 2, 8, 5, 1, 9, 2, 5, 8, 4, 5, 1, 7, 4, 6]])
There are a couple of ways of doing this. Since you have only a single selection of 15, you can use np.isin:
mask = np.isin(sample_5_balls, my_chosen_balls).sum(0) == my_chosen_balls.size
If you want the percentage of successes:
np.count_nonzero(mask) / sample_5_balls.shape[1]
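As a quick check on the small example above (each column of sample_5_balls is one sample of five drawn balls), only column 8 contains all three chosen balls:
my_chosen_balls = np.array([7, 4, 3])
# sample_5_balls as defined above
mask = np.isin(sample_5_balls, my_chosen_balls).sum(0) == my_chosen_balls.size
print(mask.nonzero()[0])                                  # [8]
print(np.count_nonzero(mask) / sample_5_balls.shape[1])   # 1/15 ≈ 0.0667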
The problem is that you can't easily generate an array like samples_30_balls or sample_5_balls using tools like np.random.choice or np.random.Generator.choice. There are some solutions available, like "Numpy random choice, replacement only along one axis", but they only work for a small number of items.
Instead, you can use sorting and slicing to get what you want, as shown here and here:
sample_30_balls = np.random.rand(50, 100000).argsort(0)[:30, :]
You will want to add 1 to the numbers for display, but it will be much easier to go zero-based for the remainder of the answer.
If your population size stays at 64 or under, you can use bit twiddling to make everything work much faster. First convert the data to a single array of numbers:
sample_30_bits = (1 << sample_30_balls).sum(axis=0)
These two operations are equivalent to
sample_30_bits = np.bitwise_or.reduce((2**sample_30_balls), axis=0)
A single sample is a single integer with this scheme:
my_chosen_bits = (1 << np.random.rand(50).argsort()[:15]).sum()
np.isin is now infinitely simpler: it's just bitwise AND (&). You can use the fast bit_count function I wrote here (copied verbatim):
def bit_count(arr):
    # Make the values type-agnostic (as long as it's integers)
    t = arr.dtype.type
    mask = t(-1)
    s55 = t(0x5555555555555555 & mask)  # Add more digits for 128bit support
    s33 = t(0x3333333333333333 & mask)
    s0F = t(0x0F0F0F0F0F0F0F0F & mask)
    s01 = t(0x0101010101010101 & mask)
    arr = arr - ((arr >> 1) & s55)
    arr = (arr & s33) + ((arr >> 2) & s33)
    arr = (arr + (arr >> 4)) & s0F
    return (arr * s01) >> (8 * (arr.itemsize - 1))
pct = (bit_count(my_chosen_bits & sample_30_bits) == 15).sum() / sample_30_bits.size
But there's more: now you can generate a large number of samples not just for the 30 balls, but for the 15 as well. One alternative is to generate identical numbers of samples, and compare them 1-to-1:
N = 100000
sample_15_bits = (1 << np.random.rand(50, N).argsort(0)[:15, :]).sum(0)
sample_30_bits = (1 << np.random.rand(50, N).argsort(0)[:30, :]).sum(0)
pct = (bit_count(sample_15_bits & sample_30_bits) == 15).sum() / N
Another alternative is to generate potentially different arrays of samples for each quantity, and compare all of them against each other. This will require a lot more space in the result, so I will show it for smaller inputs:
M = 100
N = 5000
sample_15_bits = (1 << np.random.rand(50, M).argsort(0)[:15, :]).sum(0)
sample_30_bits = (1 << np.random.rand(50, N).argsort(0)[:30, :]).sum(0)
pct = (bit_count(sample_15_bits[:, None] & sample_30_bits) == 15).sum() / (M * N)
If you need to optimize for space (e.g., using truly large sample sizes), keep in mind that all the operations here use ufuncs except np.random.rand and argsort. You can therefore do most of the work in-place without creating temporary arrays. That will be left as an exercise for the reader.
Also, I recommend that you draw histograms of bit_count(sample_15_bits & sample_30_bits) to adjust your expectations. Here is a histogram of the counts for the last example above:
import matplotlib.pyplot as plt

y = np.bincount(bit_count(sample_15_bits[:, None] & sample_30_bits).ravel())
x = np.arange(y.size)
plt.bar(x, y)
Notice how tiny the bar at 15 is. I've seen values of pct around 7e-5 while writing this answer, but am too lazy to figure out the theoretical value.
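For reference, the exact probability is easy to compute with math.comb, and it agrees with the simulated values: the 30 drawn balls contain all 15 chosen balls exactly when the remaining 15 drawn balls come from the other 35.
from math import comb

# P(all 15 chosen balls are among the 30 drawn out of 50)
p = comb(35, 15) / comb(50, 30)
print(p)  # ~6.89e-05, consistent with pct values around 7e-5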
Using isin, count the intersecting values and compare the count with 15. I changed the data generation to sample without replacement.
import numpy as np
np.random.seed(10)
my_chosen_balls = np.random.choice(range(0,50), 15, replace=False)
samples_30_balls = np.random.rand(1_000_000,50).argsort(1)[:,:30]
(np.isin(samples_30_balls, my_chosen_balls).sum(1) == 15).sum()
Output
74
So about 0.007% chance.
How generating a sample without replacement works
Generate random values in [0, 1) with shape (samples, range). Here: 10 samples from [0, 1, 2, 3, 4].
np.random.rand(10,5)
Out
array([[0.37216438, 0.16884495, 0.05393551, 0.68189535, 0.30378455],
[0.63428637, 0.6566772 , 0.16162259, 0.16176099, 0.74568611],
[0.81452942, 0.10470267, 0.89547322, 0.60099124, 0.22604322],
[0.16562083, 0.89936513, 0.89291548, 0.95578207, 0.90790727],
[0.11326867, 0.18230934, 0.44912596, 0.65437732, 0.78308136],
[0.72693801, 0.22425798, 0.78157525, 0.93485338, 0.84097546],
[0.96751432, 0.57735756, 0.48147214, 0.22441829, 0.53388467],
[0.95415338, 0.07746658, 0.93875458, 0.21384035, 0.26350969],
[0.39937711, 0.35182801, 0.74707871, 0.07335893, 0.27553172],
[0.80749372, 0.40559599, 0.33654045, 0.14802479, 0.71198915]])
'Convert' to integers with argsort
np.random.rand(10,5).argsort(1)
Out
array([[4, 2, 1, 0, 3],
[0, 1, 3, 2, 4],
[1, 3, 2, 4, 0],
[4, 0, 2, 3, 1],
[2, 3, 0, 1, 4],
[1, 4, 3, 2, 0],
[4, 3, 2, 0, 1],
[1, 0, 2, 3, 4],
[4, 1, 2, 3, 0],
[1, 4, 0, 2, 3]])
Slice to the desired sample size
np.random.rand(10,5).argsort(1)[:,:3]
Out
array([[2, 3, 4],
[0, 4, 3],
[3, 0, 4],
[2, 0, 3],
[2, 3, 4],
[3, 4, 2],
[2, 0, 1],
[0, 4, 3],
[0, 2, 3],
[2, 3, 4]])
Say we are given a set of paths P (of same length) between a source and a sink and an edge e. In python, I represent this by a list of lists and a pair, i.e.,
# source = 0, sink = 9
# Path i is given by P[i]: P[i][j] is node j.
# Path i is then given by the edges (P[i][0], P[i][1]), (P[i][1], P[i][2]), (P[i][2], P[i][3]), ...
P = [[0, 1, 3, 5, 7, 9],
[0, 1, 4, 6, 8, 9],
[0, 1, 3, 6, 8, 9],
[0, 1, 3, 5, 8, 9],
[0, 2, 4, 6, 8, 9]]
# The edge we are looking for is (1, 3)
e = (1, 3)
Since e=(1, 3) is contained in 3 paths, P[0], P[2], and P[3], the result is 3.
Here is my solution:
def count_paths(edge, paths):
    count = 0
    for path in paths:
        edges = [(path[i], path[i + 1]) for i in range(len(path) - 1)]
        if edge in edges:
            count += 1
    return count
When the number of paths is large, this function gives tottime of 16.245 using cProfile. Can we make it run faster, using numpy for example?
Convert to an array, slice it with offsets of one to look for the edge's start and end values along each row, and then simply sum the matches for our desired output, all in a vectorized manner -
In [43]: P = np.array(P)
In [44]: ((P[:,:-1]==1) & (P[:,1:]==3)).sum()
Out[44]: 3
If you need the valid paths too, mask the array with ANY reduced row-mask -
In [16]: P[((P[:,:-1]==1) & (P[:,1:]==3)).any(1)]
Out[16]:
array([[0, 1, 3, 5, 7, 9],
[0, 1, 3, 6, 8, 9],
[0, 1, 3, 5, 8, 9]])
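Wrapped into a drop-in replacement for count_paths (a small sketch based on the expressions above; any(axis=1) makes sure each path is counted at most once even if the edge were repeated within it):
import numpy as np

def count_paths_np(edge, paths):
    # Count the rows (paths) that contain `edge` as a consecutive node pair.
    P = np.asarray(paths)
    start, end = edge
    hit = (P[:, :-1] == start) & (P[:, 1:] == end)
    return int(hit.any(axis=1).sum())

print(count_paths_np((1, 3), P))  # 3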
In case of differential evolution, during mutation, the formula that is used most often is
arr[a] = (arr[b] + M * (arr[c] - arr[d])) % arr.shape[1]
Where
arr is a 2d array consisting of non-negative integers such that all elements in each row are unique,
a represents each row of arr,
M is the mutation constant ranging between 0 and 2 and
b, c and d are 3 unique random numbers.
However, on using this formula, I see that arr[a] ends up with duplicate values, depending on the values of arr[b], arr[c] and arr[d]. I want arr[a] to contain only unique numbers. How can I do this with NumPy?
e.g.
arr[a] = [2, 8, 4, 9, 1, 6, 7, 3, 0, 5]
arr[b] = [3, 5, 1, 2, 9, 8, 0, 6, 7, 4]
arr[c] = [2, 3, 8, 4, 5, 1, 0, 6, 9, 7]
arr[d] = [6, 1, 9, 2, 7, 5, 8, 0, 3, 4]
On applying the formula, arr[a] becomes [9, 7, 0, 4, 7, 4, 2, 2, 3, 7]. But I want it to have only unique numbers between 0 and arr.shape[1]. I am open to modifying the mutation function if needed, as long as M, arr[b], arr[c] and arr[d] are all used meaningfully.
This is a rather different approach to the problem, but since you seem to be working with permutations, I am not sure numerical differences are that meaningful. You can however see the problem in terms of permutations, that is, reordering of vectors. Instead of the difference between two vectors, you may consider the permutation that takes you from one vector to the other, and instead of the addition of two vectors, you may consider applying a permutation to a vector. If you want to have an M parameter, maybe that could be the number of times you apply the permutation? (assuming that is a non-negative integer)
Here is the basic idea of how you could implement this:
import numpy as np
# Finds the permutation that takes you from vector a to vector b.
# Returns a vector p such that a[p] = b.
def permutation_diff(a, b):
    p = np.zeros_like(a)
    p[a] = np.arange(len(p), dtype=p.dtype)
    return p[b]

# Applies permutation p to vector a, m times.
def permutation_apply(a, p, m=1):
    out = a.copy()
    for _ in range(m):
        out = out[p]
    return out

# Combination function
def combine(b, c, d, m):
    return permutation_apply(b, permutation_diff(d, c), m)
# Test
b = np.array([3, 5, 1, 2, 9, 8, 0, 6, 7, 4])
c = np.array([2, 3, 8, 4, 5, 1, 0, 6, 9, 7])
d = np.array([6, 1, 9, 2, 7, 5, 8, 0, 3, 4])
m = 1
a = combine(b, c, d, m)
print(a)
# [2 7 0 4 8 5 6 3 1 9]
Since you are working with many vectors arranged in a matrix, you may prefer vectorized versions of the above functions. You can have that with something like this (here I assume M is a fixed parameter for the whole algorithm, not per individual):
import numpy as np
# Finds the permutations that takes you from vectors in a to vectors in b.
def permutation_diff_vec(a, b):
    p = np.zeros_like(a)
    i = np.arange(len(p))[:, np.newaxis]
    p[i, a] = np.arange(p.shape[-1], dtype=p.dtype)
    return p[i, b]

# Applies permutations in p to vectors a, m times.
def permutation_apply_vec(a, p, m=1):
    out = a.copy()
    i = np.arange(len(out))[:, np.newaxis]
    for _ in range(m):
        out = out[i, p]
    return out

# Combination function
def combine_vec(b, c, d, m):
    return permutation_apply_vec(b, permutation_diff_vec(d, c), m)
# Test
np.random.seed(100)
arr = np.array([[2, 8, 4, 9, 1, 6, 7, 3, 0, 5],
[3, 5, 1, 2, 9, 8, 0, 6, 7, 4],
[2, 3, 8, 4, 5, 1, 0, 6, 9, 7],
[6, 1, 9, 2, 7, 5, 8, 0, 3, 4]])
n = len(arr)
b = arr[np.random.choice(n, size=n)]
c = arr[np.random.choice(n, size=n)]
d = arr[np.random.choice(n, size=n)]
m = 1
arr[:] = combine_vec(b, c, d, m)
print(arr)
# [[3 6 0 2 5 1 4 7 8 9]
# [6 1 9 2 7 5 8 0 3 4]
# [6 9 2 3 5 0 4 1 8 7]
# [2 6 5 4 1 9 8 0 7 3]]
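A quick sanity check (a small sketch, not part of the original answer) that every mutated row is still a permutation of 0 .. n-1:
assert all(np.array_equal(np.sort(row), np.arange(arr.shape[1])) for row in arr)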
Try to do this:
list(set(arr[a]))
Here is an example of what it could do:
array = np.array([9, 7, 0, 4, 7, 4, 2, 2, 3, 7])
shape = array.shape[0]
array = list(set(array))
for i in range(shape):
    if i not in array:
        array.append(i)
array = np.array(array)
If you want to fill in values at the indices where the numbers were duplicated, the logic is a little different, but that is the idea. I hope this helps.