Problem:
Let's say I have a 2D array from which I want to randomly sample (using Monte-Carlo) smaller 2D sub-arrays as shown by the black patches in the figure below. I am looking for an efficient method of doing this.
Prospective (but partial) solution:
I came across one function that partially achieves what I am trying to do after several hours of search, but it lacks the ability to sample a patch at a random location. At least I don't think it can sample from random locations based on its arguments, although it does have one random_state argument that I do not understand.
sklearn.feature_extraction.image.extract_patches_2d(image, patch_size, max_patches=None, random_state=None)
Question:
Select random patch coordinates (2D sub-array) and use them to slice a patch from the bigger array as shown in figure above. The randomly sampled patches are allowed to overlap.
Here is a sampler that creates a sample cut from an array of any dimensionality. It uses functions to control where to start the cut and for how wide the cut should be along any axis.
Here is an explanation of the parameters:
arr - the input numpy array.
loc_sampler_fn - this is the function you want to use to set the corner of the box. If you want the corner of the box to be sampled uniformly from the anywhere along the axis, use np.random.uniform. If you want the corner to be closer to the center of the array, use np.random.normal. However, we need to tell the function what range to sample over. This brings us to the next parameter.
loc_dim_param - this passes the size of each axis to loc_sampler_fn. If we are using np.random.uniform for the location sampler, we want to sample from the entire range of the axis. np.random.uniform has two parameters: low and high, so by passing the length of the axis to high it samples uniformly over the entire axis. In other words, if the axis has length 120 we want np.random.uniform(low=0, high=120), so we would set loc_dim_param='high'.
loc_params - this passes any additional parameters to loc_sampler_fn. Keeping with the example, we need to pass low=0 to np.random.uniform, so we pass the dictionary loc_params={'low':0}.
From here, it is basically identical for the shape of the box. If you want the box height and width to be uniformly sampled from 3 to 10, pass in shape_sampler_fn=np.random.uniform, with shape_dim_param=None since we are not using the size of the axis for anything, and shape_params={'low':3, 'high':11}.
def box_sampler(arr,
loc_sampler_fn,
loc_dim_param,
loc_params,
shape_sampler_fn,
shape_dim_param,
shape_params):
'''
Extracts a sample cut from `arr`.
Parameters:
-----------
loc_sampler_fn : function
The function to determine the where the minimum coordinate
for each axis should be placed.
loc_dim_param : string or None
The parameter in `loc_sampler_fn` that should use the axes
dimension size
loc_params : dict
Parameters to pass to `loc_sampler_fn`.
shape_sampler_fn : function
The function to determine the width of the sample cut
along each axis.
shape_dim_param : string or None
The parameter in `shape_sampler_fn` that should use the
axes dimension size.
shape_params : dict
Parameters to pass to `shape_sampler_fn`.
Returns:
--------
(slices, x) : A tuple of the slices used to cut the sample as well as
the sampled subsection with the same dimensionality of arr.
slice :: list of slice objects
x :: array object with the same ndims as arr
'''
slices = []
for dim in arr.shape:
if loc_dim_param:
loc_params.update({loc_dim_param: dim})
if shape_dim_param:
shape_params.update({shape_dim_param: dim})
start = int(loc_sampler_fn(**loc_params))
stop = start + int(shape_sampler_fn(**shape_params))
slices.append(slice(start, stop))
return slices, arr[slices]
Example for a uniform cut on a 2D array with widths between 3 and 9:
a = np.random.randint(0, 1+1, size=(100,150))
box_sampler(a,
np.random.uniform, 'high', {'low':0},
np.random.uniform, None, {'low':3, 'high':10})
# returns:
([slice(49, 55, None), slice(86, 89, None)],
array([[0, 0, 1],
[0, 1, 1],
[0, 0, 0],
[0, 0, 1],
[1, 1, 1],
[1, 1, 0]]))
Examples for taking 2x2x2 chunks from a 10x20x30 3D array:
a = np.random.randint(0,2,size=(10,20,30))
box_sampler(a, np.random.uniform, 'high', {'low':0},
np.random.uniform, None, {'low':2, 'high':2})
# returns:
([slice(7, 9, None), slice(9, 11, None), slice(19, 21, None)],
array([[[0, 1],
[1, 0]],
[[0, 1],
[1, 1]]]))
Update based on the comments.
For your specific purpose, it looks like you want a rectangular sample where the starting corner is uniformly sampled from anywhere in the array, and the the width of the sample along each axis is uniformly sampled, but can be limited.
Here is a function that generates these samples. min_width and max_width can accept iterables of integers (such as a tuple) or a single integer.
def uniform_box_sampler(arr, min_width, max_width):
'''
Extracts a sample cut from `arr`.
Parameters:
-----------
arr : array
The numpy array to sample a box from
min_width : int or tuple
The minimum width of the box along a given axis.
If a tuple of integers is supplied, it my have the
same length as the number of dimensions of `arr`
max_width : int or tuple
The maximum width of the box along a given axis.
If a tuple of integers is supplied, it my have the
same length as the number of dimensions of `arr`
Returns:
--------
(slices, x) : A tuple of the slices used to cut the sample as well as
the sampled subsection with the same dimensionality of arr.
slice :: list of slice objects
x :: array object with the same ndims as arr
'''
if isinstance(min_width, (tuple, list)):
assert len(min_width)==arr.ndim, 'Dimensions of `min_width` and `arr` must match'
else:
min_width = (min_width,)*arr.ndim
if isinstance(max_width, (tuple, list)):
assert len(max_width)==arr.ndim, 'Dimensions of `max_width` and `arr` must match'
else:
max_width = (max_width,)*arr.ndim
slices = []
for dim, mn, mx in zip(arr.shape, min_width, max_width):
fn = np.random.uniform
start = int(np.random.uniform(0,dim))
stop = start + int(np.random.uniform(mn, mx+1))
slices.append(slice(start, stop))
return slices, arr[slices]
Example of generating a box cut that starts uniformly anywhere in the array, the height is a random uniform draw from 1 to 4 and the width is a random uniform draw from 2 to 6 (just to show). In this case, the size of the box was 3 by 4, starting at the 66th row and 19th column.
x = np.random.randint(0,2,size=(100,100))
uniform_box_sampler(x, (1,2), (4,6))
# returns:
([slice(65, 68, None), slice(18, 22, None)],
array([[1, 0, 0, 0],
[0, 0, 1, 1],
[0, 1, 1, 0]]))
So it seems like your issue with sklearn.feature_extraction.image.extract_patches_2d is that it forces you to to specify a single patch size, whereas you are looking for different patches of random size.
One thing to note here is that your result can't be a NumPy array (unlike the result of the sklearn function) because arrays have to have uniform-length rows/columns. So your output needs to be some other data structure that contains differently-shaped arrays.
Here's a workaround:
from itertools import product
def random_patches_2d(arr, n_patches):
# The all possible row and column slices from `arr` given its shape
row, col = arr.shape
row_comb = [(i, j) for i, j in product(range(row), range(row)) if i < j]
col_comb = [(i, j) for i, j in product(range(col), range(col)) if i < j]
# Pick randomly from the possible slices. The distribution will be
# random uniform from the given slices. We can't use
# np.random.choice because it only samples from a 1d array.
a = np.random.choice(np.arange(len(row_comb)), size=n_patches)
b = np.random.choice(np.arange(len(col_comb)), size=n_patches)
for i, j in zip(a, b):
yield arr[row_comb[i][0]:row_comb[i][1],
col_comb[i][0]:col_comb[i][1]]
Example:
np.random.seed(99)
arr = np.arange(49).reshape(7, 7)
res = list(random_patches_2d(arr, 5))
print(res[0])
print()
print(res[3])
[[0 1]
[7 8]]
[[ 8 9 10 11]
[15 16 17 18]
[22 23 24 25]
[29 30 31 32]]
Condensed:
def random_patches_2d(arr, n_patches):
row, col = arr.shape
row_comb = [(i, j) for i, j in product(range(row), range(row)) if i < j]
col_comb = [(i, j) for i, j in product(range(col), range(col)) if i < j]
a = np.random.choice(np.arange(len(row_comb)), size=n_patches)
b = np.random.choice(np.arange(len(col_comb)), size=n_patches)
for i, j in zip(a, b):
yield arr[row_comb[i][0]:row_comb[i][1],
col_comb[i][0]:col_comb[i][1]]
Addressing your comment: you could successively add 1 patch and check the area after each.
# `size` is just row x col
area = arr.size
patch_area = 0
while patch_area <= area: # or while patch_area <= 0.1 * area:
patch = random_patches_2d(arr, n_patches=1)
patch_area += patch
Related
I have an array of one-dimensional values of Yk, where each subsequent value of U = random number [0,1], k =1,2,3..N . I need to build an array of size NxN, which will be filled with a different condition, which depends on the index of the row and column.
If the index of the column and row for the array cell are equal (i=m) 2π+0.1*U_k, i=m, then the array cell is filled by the upper condition sin(i-k) * cosπ(i-k), i≠m, and otherwise by the lower one.
I did not deal with the fact that the rows and columns of the array participated in the filling of this array, so I got confused with the implementation. I only need a working basis for padding, so a finite array of NxN is enough for 4x4.
enter image description here
simplifying the task, I got a semblance of what I need to get. However, some of the problems remained.
1: instead of the variable U, there should be an element of an array of random numbers within [0,1], this element is U_k, where k is its index in a one-dimensional array. For example, in the array U_k = [0.11, 0.5, 0.66]: U1=0.11, U2=0.5, U3=0.66.
2: Also, instead of the constant output of the variable A, I need to form a one-dimensional array.
In other words, I still have problems with outputting the value from the previously set array U_k and packing the results of the loop execution into a one-dimensional array.
import numpy as np
import math
i=0
k=1
N=5
U=10
while k < N:
while i < N:
i= i+1
k = k
if i==k:
A = 2* math.pi + 0.1 * U
if i != k:
A = math.sin(i-k)* math.cos(math.pi*(i-k))
print(A)
else:
i= 1
k= k+1
I am not sure about your U_ks but the question you asked in the title has the following easy solution:
np.fromfunction(lambda i, j: i + 10*j, (4, 4))
yields
array([[ 0, 10, 20, 30],
[ 1, 11, 21, 31],
[ 2, 12, 22, 32],
[ 3, 13, 23, 33]])
And obviously you can replace my return line with your U_k/Sine things.
Notice that you can achieve something similar with pure python like so
[[i + 10*j for j in range(n)] for i in range(n)]
but numpy is ruffly 50 times faster.
My goal is to interpolate the discretized continuous 2D Fourier transform of a function. The problem seems to be that the frequencies in each dimension are not output in strictly ascending order (see here).
The fft.fft2 function accepts a 2D array, where in my case the array (let's call it A) is structured such that A[i][j] = fun(x[i], y[j]), fun being the function to be transformed. After applying fft.fft2 to A, output is an array F of the same dimensions as the original array, such that the frequency coordinate corresponding to F[i][j] is (w_x[i], w_y[j]), where w_x = fft.fftfreq(F.shape[0]) and w_y = fft.fftfreq(F.shape[1]), both of these being 1D arrays which are not in ascending order.
Over wx and wy I am wanting to interpolate F (say to a function finterp) such that the interpolated value is returned upon calling finterp(w_x, w_y), w_x and w_y being within the domain of wx and range of wy, but otherwise arbitrary. I've looked into the varieties of interpolation available through scipy.interpolate, but it doesn't seem to me that any of them can deal with this type of data structure (the coordinate axes being defined as out-of-order 1D arrays and the function values being in a 2D array).
This is a little abstract, so here I've made up a simple example which is similar in structure to the above. Suppose we are wishing to construct a continuous function f(x, y) = x + y over the region x = [-1, 1] and y = [-1, 1] given the following data:
import numpy as np
# note that below z[i][j] corresponds to what we want f(x[i], y[j]) to be
x = np.array([0, 1, -1])
y = np.array([0, 1, -1])
z = np.array([0, 1, -1],[1, 2, 0],[-1, 0, -2])
z[i][j] we know corresponds to the function evaluated at x[i], y[j]. How can one either (a) interpolate this data directly, given its original structure, or (b) rearrange the data so that x and y are in ascending order, and the arranged z is such that z[i][j] is equal to the function evaluated at the rearranged x[i], y[j]?
The following code shows how to use fftshift to change the output of fft2 and fftfreq so that the frequency axes are monotonically increasing. After applying fftshift, you can use the arrays for interpolation. I've added display of the arrays so that you can verify that the data itself is unchanged. The origin is shifted from the top-left corner to the middle of the array, moving the negative frequencies from the right side to the left side.
import numpy as np
import matplotlib.pyplot as pp
x = np.array([0, 1, -1])
y = np.array([0, 1, -1])
z = np.array([[0, 1, -1],[1, 2, 0],[-1, 0, -2]])
f = np.fft.fft2(z)
w_x = np.fft.fftfreq(f.shape[0])
w_y = np.fft.fftfreq(f.shape[1])
pp.figure()
pp.imshow(np.abs(f))
pp.xticks(np.arange(0,len(w_x)), np.round(w_x,2))
pp.yticks(np.arange(0,len(w_y)), np.round(w_y,2))
f = np.fft.fftshift(f)
w_x = np.fft.fftshift(w_x)
w_y = np.fft.fftshift(w_y)
pp.figure()
pp.imshow(np.abs(f))
pp.xticks(np.arange(0,len(w_x)), np.round(w_x,2))
pp.yticks(np.arange(0,len(w_y)), np.round(w_y,2))
pp.show()
An alternative approach is to not use fftfreq to determine your frequencies, but compute them by hand. The FFT, by default, computes the DFT for k=[0..N-1]. Because of the periodicity, with the DFT at k equal to the DFT at k+N and k-N, its output is often interpreted to have k=[N//2...(N-1)//2] instead (but arranged differently to match k=[0..N-1]); this is the k that fftfreq returns (it returns k/N).
Thus, you can instead say
N = f.shape[0]
w_x = np.linspace(0, N, N, endpoint=False) / N
Now you don't have any negative frequencies, and instead have frequencies in the range [0,N-1]/N.
Prompt:
Given a 2D integer matrix M representing the gray scale of an image, you need to design a smoother to make the gray scale of each cell becomes the average gray scale (rounding down) of all the 8 surrounding cells and itself. If a cell has less than 8 surrounding cells, then use as many as you can.
Example:
Input:
[[1,1,1],
[1,0,1],
[1,1,1]]
Output:
[[0, 0, 0],
[0, 0, 0],
[0, 0, 0]]
Explanation:
For the point (0,0), (0,2), (2,0), (2,2) -> floor(3/4) = floor(0.75) = 0
For the point (0,1), (1,0), (1,2), (2,1) -> floor(5/6) = floor(0.83333333) = 0
For the point (1,1): floor(8/9) = floor(0.88888889) = 0
Solution:
class Solution:
def imageSmoother(self, grid):
"""
:type M: List[List[int]]
:rtype: List[List[int]]
"""
rows, cols = len(grid), len(grid[0])
#Go through each cell
for r in range(rows):
for c in range(cols):
#Metrics for calculating average, starting inputs are zero since the loop includes the current cell, grid[r][c]
total = 0
n = 0
#Checking the neighbors
for ri in [-1,0,1]:
for ci in [-1,0,1]:
if (r + ri >= 0 and r + ri <= rows-1 and c + ci >=0 and c + ci <= cols-1):
total += grid[r+ri][c+ci]
n += 1
#Now we convert the cell value to the average
grid[r][c] = int(total/n)
return grid
My solution is incorrect. It passes some test cases, but for this one I fail.
Input: [[2,3,4],[5,6,7],[8,9,10],[11,12,13],[14,15,16]]
Output: [[4,4,5],[6,6,6],[8,9,9],[11,11,12],[12,12,12]]
Expected: [[4,4,5],[5,6,6],[8,9,9],[11,12,12],[13,13,14]]
As you can see, my solution is really close. I'm not sure where I'm messing up since when I changed the parameters around I started failing other basic test cases. The solutions I see online use other packages which I'd prefer not to use since I want to approach this problem more intuitively.
How do you check where you're going wrong with 2D array problems? Thanks!
Leetcode solution:
def imageSmoother(self, M):
R,C=len(M),len(M[0])
M2=[[0]*C for i in range(R)]
for i in range(R):
for j in range(C):
temp=[M[i+x][j+y] for x,y in list(itertools.product([-1,0,1],[-1,0,1])) if 0<=i+x<R and 0<=j+y<C ]
M2[i][j]=(sum(temp)//len(temp))
return M2
The problem with your code is that you're modifying grid as you go along. So, for each cell, you're using the input values for the down/right neighbors, but the output values for the up/left neighbors.
So, for your given example, when you're computing the neighbors of grid[1][0], you've already replaced two of the neighbors, grid[0][0] and grid[0][1], so they're now 4, 4 instead of 2, 3. Which means you're averaging 4, 4, 5, 6, 8, 9 instead of 2, 3, 5, 6, 8, 9. So, instead of getting a 5.5 that you round down to 5, you get a 6.0 that you round down to 6.
The simplest fix is to just build up a new output grid as you go along, then return that:
rows, cols = len(grid), len(grid[0])
outgrid = []
#Go through each cell
for r in range(rows):
outrow = []
for c in range(cols):
# … same code as before, but instead of the grid[r][c] =
outrow.append(int(total/n))
outgrid.append(outrow)
return outgrid
If you need to modify the grid in place, you can instead copy the original grid, and iterate over that copy:
rows, cols = len(grid), len(grid[0])
ingrid = [list(row) for row in grid]
#Go through each cell
for r in range(rows):
for c in range(cols):
# … same code as before, but instead of total += grid[r+ri][c+ci]
total += ingrid[r+ri][c+ci]
If you used a 2D NumPy array instead of a list of lists, you could solve this at a higher level.
NumPy lets you add entire arrays all at once, divide them by scalars, etc., so you can get rid of those loops over r and c and just do the work array-wide. But you still have to think about your boundaries. You can't just add arr and arr[:-1] and arr[1:] and so on, you need to pad them out to the same size. And if you just pad with 0s, you'll end up averaging 0, 4, 4, 0, 5, 6, 0, 8, 9, which is no good. But if you pad them with NaN values, so you're averaging NaN, 4, 4, NaN, 5, 6, NaN, 8, 9, then you can use the nanmean function, which ignores those NaN values and averages the 6 real values.
So, this is still a few lines of code to iterate over the 9 directions, pad the 9 arrays, and nanmean the results. (Or you could cram it into a giant expression with product, like the leetcode answer, but that isn't exactly more readable or easier to understand.)
But if you can drag in SciPy, a collection of algorithms for almost anything you'd ever want to build on top of NumPy, it has a function in its ndimage library called generic_filter that can do every conceivable variation of "gather the N neighbors, padding like X, and run function Y on the resulting arrays".
In our case, we want to gather the 3-per-axis neighbors, pad with the constant value NaN, and run the nanmean function, so this one-liner will do everything you need:
scipy.ndimage.generic_filter(grid, function=np.nanmean, size=3, mode='constant', cval=np.NaN)
I am working on a genetic algorithm code. I am fairly new to python.
My code snippet is as follows:
import numpy as np
pop_size = 10 # Population size
noi = 2 # Number of Iterations
M = 2 # Number of Phases in the Data
alpha = [np.random.randint(0, 64, size = pop_size)]* M
phi = [np.random.randint(0, 64, size = pop_size)]* M
reduced_tensor = [np.zeros((pop_size,3,3))]* M
for n_i in range(noi):
alpha_en = [(2*np.pi*alpha/63.00) for alpha in alpha]
phi_en = [(phi/63.00) for phi in phi]
for i in range(M):
for j in range(pop_size):
reduced_tensor[i][j] = [[1, 0, 0],
[0, phi_en[i][j], 0],
[0, 0, 0]]
Here I have a list of numpy arrays. The variable 'alpha' is a list containing two numpy arrays. How do I use list comprehension in this case? I want to create a similar list 'alpha_en' which operates on every element of alpha. How do I do that? I know my current code is wrong, it was just trial and error.
What does 'for alpha in alpha' mean (line 11)? This line doesn't give any error, but also doesn't give the desired output. It changes the dimension and value of alpha.
The variable 'reduced_tensor' is a list of an array of 3x3 matrix, i.e., four dimensions in total. How do I differentiate between the indexing of a list comprehension and a numpy array? I want to perform various operations on a list of matrices, in this case, assign the values of phi_en to one of the elements of the matrix reduced_tensor (as shown in the code). How should I do it efficiently? I think my current code is wrong, if not just confusing.
There some questionable programming in these 2 lines
alpha = [np.random.randint(0, 64, size = pop_size)]* M
...
alpha_en = [(2*np.pi*alpha/63.00) for alpha in alpha]
The first makes an array, and then makes a list with M pointers to the same thing. Note, M copies of the random array. If I were to change one element of alpha, I'd change them all. I don't see the point to this type of construction.
The [... for alpha in alpha] works because the 2 uses of alpha are different. At least in newer Pythons the i in [i*3 for i in range(3)] does not 'leak out' of the comprehension. That said, I would not approve of that variable naming. At the very least is it confusing to readers.
The arrays in alpha_en are separate. Values are derived from the array in alpha, but they are new.
for a in alphas:
a *= 2
would modify each array in alphas; how ever due to how alphas is constructed this ends up multiplying the array many times.
reduced_tensor = [np.zeros((pop_size,3,3))]* M
has the same problem; it's a list of M references to the same 3d array.
reduced_tensor[i][j]
references the i reference in that list, and the j 'row' of that array. I like to use
reduced_tensor[i][j,:,:]
to make it clearer to me and my reader the expected dimensions of the result.
The iteration over M does nothing for you; it just repeats the same assignment M times.
At the root of your problems is that use of list replication.
In [30]: x=[np.arange(3)]*3
In [31]: x
Out[31]: [array([0, 1, 2]), array([0, 1, 2]), array([0, 1, 2])]
In [32]: [id(i) for i in x]
Out[32]: [3036895536, 3036895536, 3036895536]
In [33]: x[0] *= 10
In [34]: x
Out[34]: [array([ 0, 10, 20]), array([ 0, 10, 20]), array([ 0, 10, 20])]
I want to use numpy.ix_ to generate an multi-dimensional index for a 2D space of values. However, I need to use a subindex to look up the indices for one dimension. For example,
assert subindex.shape == (ny, nx)
data = np.random.random(size=(ny,nx))
# Generator returning the index tuples
def get_idx(ny,nx,subindex):
for y in range(ny):
for x in range(nx):
yi = y # This is easy
xi = subindex[y,x] # Get the second index value from the subindex
yield (yi,xi)
# Generator returning the data values
def get_data_vals(ny,nx,data,subindex):
for y in range(ny):
for x in range(nx):
yi = y # This is easy
xi = subindex[y,x] # Get the second index value from the subindex
yield data[y,subindex[y,x]]
So instead of the for loops above, I'd like to use a multi-dimensional index to index data Using numpy.ix_, I guess I would have something like:
idx = numpy.ix_([np.arange(ny), ?])
data[idx]
but I don't know what the second dimension argument should be. I'm guessing it should be something involving numpy.choose?
What you actually seem to want is:
y_idx = np.arange(ny)[:,np.newaxis]
data[y_idx, subindex]
BTW, you could achieve the same thing with y_idx = np.arange(ny).reshape((-1, 1)).
Let's look at a small example:
import numpy as np
ny, nx = 3, 5
data = np.random.rand(ny, nx)
subindex = np.random.randint(nx, size=(ny, nx))
Now
np.arange(ny)
# array([0, 1, 2])
are just the indices for the "y-axis", the first dimension of data. And
y_idx = np.arange(ny)[:,np.newaxis]
# array([[0],
# [1],
# [2]])
adds a new axis to this array (after the existing axis) and effectively transposes it. When you now use this array in an indexing expression together with the subindex array, the former gets broadcasted to the shape of the latter. So y_idx becomes effectively:
# array([[0, 0, 0, 0, 0],
# [1, 1, 1, 1, 1],
# [2, 2, 2, 2, 2]])
And now for each pair of y_idx and subindex you look up an element in the data array.
Here you can find out more about "fancy indexing"
It sounds like you need to do two things:
Find all indices into the data array and
Translate the column indices according to some other array, subindex.
The code below therefore generates indices for all array positions (using np.indices), and reshapes it to (..., 2) -- a 2-D list of coordinates representing each position in the array. For each coordinate, (i, j), we then translate the column coordinate j using the subindex array provided, and then use that translated index as the new column index.
With numpy, it is not necessary to do that in a for-loop--we can simply pass in all the indices at once:
i, j = np.indices(data.shape).reshape((-1, 2)).T
data[i, subindex[i, j]]