I have basic 2-D numpy arrays and I'd like to "downsample" them to a more coarse resolution. Is there a simple numpy or scipy module that can easily do this? I should also note that this array is being displayed geographically via Basemap modules.
scikit-image implements a working downsampler, block_reduce, although the authors avoid calling it downsampling because, if I understand correctly, it is not downsampling in the DSP sense:
http://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.block_reduce
It works very well, and it is the only downsampler I found in Python that can handle np.nan in the image. I have downsampled gigantic images with it very quickly.
When downsampling, interpolation is the wrong thing to do. Always use an aggregated approach.
I use block means to do this, using a "factor" to reduce the resolution.
import numpy as np
from scipy import ndimage
def block_mean(ar, fact):
    assert isinstance(fact, int), type(fact)
    sx, sy = ar.shape
    X, Y = np.ogrid[0:sx, 0:sy]
    regions = sy//fact * (X//fact) + Y//fact
    res = ndimage.mean(ar, labels=regions, index=np.arange(regions.max() + 1))
    res.shape = (sx//fact, sy//fact)
    return res
E.g., a (100, 200) shape array using a factor of 5 (5x5 blocks) results in a (20, 40) array result:
ar = np.random.rand(20000).reshape((100, 200))
block_mean(ar, 5).shape # (20, 40)
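The same block averaging can also be sketched with plain NumPy reshaping. This is an alternative to the ndimage.mean version above, not a rewrite of it: the function name block_mean_reshape and the ignore_nan flag are made up for illustration, and trailing rows or columns that do not fill a complete block are simply dropped.

```python
import numpy as np

def block_mean_reshape(ar, fact, ignore_nan=False):
    """Mean over non-overlapping fact x fact blocks of a 2-D array (hypothetical helper)."""
    sx, sy = ar.shape
    # Drop trailing rows/columns that do not fill a complete block.
    trimmed = ar[:sx - sx % fact, :sy - sy % fact]
    # Split each axis into (number of blocks, block size).
    blocks = trimmed.reshape(sx // fact, fact, sy // fact, fact)
    reducer = np.nanmean if ignore_nan else np.mean
    # Average over the two "within block" axes.
    return reducer(blocks, axis=(1, 3))

ar = np.arange(24, dtype=float).reshape(4, 6)
result = block_mean_reshape(ar, 2)
print(result.shape)  # (2, 3)

# With a NaN in the data, ignore_nan=True averages the remaining values.
ar_nan = ar.copy()
ar_nan[0, 0] = np.nan
result_nan = block_mean_reshape(ar_nan, 2, ignore_nan=True)
```

Swapping np.mean for np.nanmean is one way to keep a single missing value from turning a whole block into NaN.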
imresize and ndimage.interpolation.zoom look like they do what you want.
I haven't tried imresize before, but here is how I have used ndimage.interpolation.zoom (also available as ndimage.zoom):
a = np.arange(64).reshape(8,8)
a = ndimage.zoom(a,.5) #decimate resolution
a is then a 4x4 matrix with interpolated values in it.
Easiest way:
You can use the array[0::2] notation, which only considers every second index.
E.g.
array= np.array([[i+j for i in range(0,10)] for j in range(0,10)])
down_sampled=array[0::2,0::2]
print("array \n", array)
print("array2 \n",down_sampled)
has the output:
array
[[ 0 1 2 3 4 5 6 7 8 9]
[ 1 2 3 4 5 6 7 8 9 10]
[ 2 3 4 5 6 7 8 9 10 11]
[ 3 4 5 6 7 8 9 10 11 12]
[ 4 5 6 7 8 9 10 11 12 13]
[ 5 6 7 8 9 10 11 12 13 14]
[ 6 7 8 9 10 11 12 13 14 15]
[ 7 8 9 10 11 12 13 14 15 16]
[ 8 9 10 11 12 13 14 15 16 17]
[ 9 10 11 12 13 14 15 16 17 18]]
array2
[[ 0 2 4 6 8]
[ 2 4 6 8 10]
[ 4 6 8 10 12]
[ 6 8 10 12 14]
[ 8 10 12 14 16]]
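One thing to keep in mind with strided slicing is that it discards the skipped values entirely, so fine detail can vanish or alias. The small sketch below uses an artificial checkerboard (my own test pattern, not data from the question) to contrast slicing with a 2x2 block mean.

```python
import numpy as np

# 8x8 checkerboard of 0s and 1s: the value is (row + column) mod 2.
board = np.indices((8, 8)).sum(axis=0) % 2

# Strided slicing keeps only the (even, even) positions, which are all 0.
decimated = board[0::2, 0::2]

# A 2x2 block mean sees two 0s and two 1s in every block.
averaged = board.reshape(4, 2, 4, 2).mean(axis=(1, 3))

print(decimated.max())  # 0
print(averaged.min(), averaged.max())  # 0.5 0.5
```

For smooth data the two results are usually close; the difference matters when the array contains structure at the scale of the sampling step.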
Because the OP just wants a coarser resolution, I thought I would share my way of reducing the number of pixels by half in each dimension. It takes the mean of 2x2 blocks. This can be applied multiple times to reduce by powers of 2.
from scipy.ndimage import convolve
array_downsampled = convolve(array,
np.array([[0.25,0.25],[0.25,0.25]]))[:array.shape[0]:2,:array.shape[1]:2]
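For comparison, here is a NumPy-only sketch of the same 2x2 averaging. It is not the convolve call above: the helper name halve_resolution is hypothetical, an odd trailing row or column is dropped instead of being handled by the filter's boundary mode, and the input is converted to float first because, as far as I know, scipy.ndimage.convolve returns the input dtype by default, which would truncate the means of an integer array.

```python
import numpy as np

def halve_resolution(a):
    """Mean of non-overlapping 2x2 blocks (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    # Keep an even number of rows and columns.
    rows, cols = a.shape[0] // 2 * 2, a.shape[1] // 2 * 2
    a = a[:rows, :cols]
    # Sum the four corners of every 2x2 block and divide by 4.
    return (a[0::2, 0::2] + a[1::2, 0::2] + a[0::2, 1::2] + a[1::2, 1::2]) / 4.0

array = np.array([[i + j for i in range(10)] for j in range(10)])
halved = halve_resolution(array)
print(halved.shape)  # (5, 5)
```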
xarray's "coarsen" method can downsample a xarray.Dataset or xarray.DataArray
http://xarray.pydata.org/en/stable/generated/xarray.DataArray.coarsen.html
http://xarray.pydata.org/en/stable/computation.html#coarsen-large-arrays
For example:
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15,5))
# Create a 10x10 array of random numbers
a = xr.DataArray(np.random.rand(10,10)*100, dims=['x', 'y'])
# "Downscale" the array, mean of blocks of size (2x2)
b = a.coarsen(x=2, y=2).mean()
# "Downscale" the array, mean of blocks of size (5x5)
c = a.coarsen(x=5, y=5).mean()
# Plot and cosmetics
a.plot(ax=ax1)
ax1.set_title("Full Data")
b.plot(ax=ax2)
ax2.set_title("mean of (2x2) boxes")
c.plot(ax=ax3)
ax3.set_title("mean of (5x5) boxes")
This might not be what you're looking for, but I thought I'd mention it for completeness.
You could try installing scikits.samplerate (docs), which is a Python wrapper for libsamplerate. It provides nice, high-quality resampling algorithms -- BUT as far as I can tell, it only works in 1D. You might be able to resample your 2D signal first along one axis and then along another, but I'd think that might counteract the benefits of high-quality resampling to begin with.
This takes an image of any resolution and keeps every 4th pixel along each axis, so the result has a quarter of the original width and height.
import cv2
import numpy as np
def quarter_res_drop(im):
    resized_image = im[0::4, 0::4]
    cv2.imwrite('resize_result_image.png', resized_image)
    return resized_image
im = cv2.imread('Your_test_image.png', 1)
quarter_res_drop(im)
Related
I can't find a question/answer that fits these exact criteria, but if this is a duplicate question I will delete it. Is there a numpy equivalent to the following code, or is it better to just keep my code as is / use xrange?
x = [i for i in range (50)]
y = [i for i in range (120)]
for i in x:
    foo = [i+z for z in y]
    print(foo)
This is a toy example, but the data set I am working with can range from something like this to 1000x the size in the example. I have tried np.nditer but don't see much of a performance increase, and as I gathered from bmu's answer here, using range to iterate over a numpy array is the worst. But I cannot see how ufuncs and indexing can reproduce the same results as above, which is my desired result.
This is a classic application of broadcasting:
import numpy as np
x = np.arange(0,5).reshape(5,1)
y = np.arange(0,12).reshape(1,12)
foos = x + y
print(foos)
[[ 0 1 2 3 4 5 6 7 8 9 10 11]
[ 1 2 3 4 5 6 7 8 9 10 11 12]
[ 2 3 4 5 6 7 8 9 10 11 12 13]
[ 3 4 5 6 7 8 9 10 11 12 13 14]
[ 4 5 6 7 8 9 10 11 12 13 14 15]]
Obviously a binary operation like addition can't emit multiple arrays, but it can emit a higher dimensional array containing all the output arrays as rows or columns of that higher dimensional array.
As pointed out in comments, there is also a generalization of the outer product which is functionally identical to the broadcasting approach I have shown.
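A minimal sketch of that outer-product form, assuming the comment refers to the ufunc method np.add.outer (the variable names are mine):

```python
import numpy as np

x = np.arange(5)
y = np.arange(12)

# Broadcasting: a (5, 1) column plus a (1, 12) row gives a (5, 12) array.
via_broadcast = x[:, None] + y[None, :]

# The same table built with the ufunc's outer method.
via_outer = np.add.outer(x, y)

print(via_outer.shape)  # (5, 12)
print(np.array_equal(via_broadcast, via_outer))  # True
```

Every binary ufunc has an outer method, so the same pattern works for np.multiply.outer, np.subtract.outer and so on.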
I'm supposed to create code that will simulate rolling a 20-sided die (d20) 25 times using np.random.choice.
I tried this:
np.random.choice(20,25)
but this still includes 0's, which wouldn't appear on a die.
How do I account for the 0's?
Use np.arange:
import numpy as np
np.random.seed(42) # for reproducibility
result = np.random.choice(np.arange(1, 21), 50)
print(result)
Output
[ 7 20 15 11 8 7 19 11 11 4 8 3 2 12 6 2 1 12 12 17 10 16 15 15
19 12 20 3 5 19 7 9 7 18 4 14 18 9 2 20 15 7 12 8 15 3 14 17
4 18]
The above code draws numbers from 1 to 20, both inclusive. To understand why, you could check the documentation of np.random.choice, in particular on the first argument:
a : 1-D array-like or int
If an ndarray, a random sample is generated from its elements. If an
int, the random sample is generated as if a was np.arange(n)
np.random.choice() takes as its first argument an array of possible choices (if an int is given it works like np.arange), so you can use list(range(1, 21)) to get the output you want.
+1
np.random.choice(20,25) + 1
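If np.random.choice is not a hard requirement, NumPy's integer samplers accept an explicit lower bound, which avoids the shift by one. This is only a sketch of that alternative; the seed value and variable names are arbitrary.

```python
import numpy as np

# Newer Generator API: low is inclusive, high is exclusive.
rng = np.random.default_rng(42)
rolls = rng.integers(1, 21, size=25)

# Legacy equivalent with the same inclusive/exclusive convention.
legacy_rolls = np.random.randint(1, 21, size=25)

print(rolls.min() >= 1, rolls.max() <= 20)  # True True
```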
I have a 2D array of shape (50,50). I need to subtract a value from each column of this array (skipping the first), which is calculated based on the index of the column. For example, using a for loop it would look something like this:
for idx in range(1, A[0, :].shape[0]):
    A[0, idx] -= idx * (...) # simple calculations with idx
Now, of course this works fine, but it's very slow and performance is critical for my application. I've tried computing the values to be subtracted using np.fromfunction() and then subtracting them from the original array, but the results are different from those obtained by the iterative for-loop subtraction:
func = lambda i, j: j * (...) #some simple calculations
subtraction_matrix = np.fromfunction(np.vectorize(func), (1,50))
A[0, 1:] -= subtraction_matrix
What am I doing wrong? Or is there some other method that would be better? Any help is appreciated!
All your code snippets indicate that you require the subtraction to happen only in the first row of A (though you've not explicitly mentioned that). So, I'm proceeding with that understanding.
Referring to your use of np.fromfunction(), you can use the subtraction_matrix as below:
A[0,1:] -= subtraction_matrix[1:]
Testing it out (assuming shape (5,5) instead of (50,50)):
import numpy as np
A = np.arange(25).reshape(5,5)
print (A)
func = lambda j: j * 10 #some simple calculations
subtraction_matrix = np.fromfunction(np.vectorize(func), (5,), dtype=A.dtype)
A[0,1:] -= subtraction_matrix[1:]
print (A)
Output:
[[ 0 1 2 3 4] # print(A), before subtraction
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
[[ 0 -9 -18 -27 -36] # print(A), after subtraction
[ 5 6 7 8 9]
[ 10 11 12 13 14]
[ 15 16 17 18 19]
[ 20 21 22 23 24]]
If you want the subtraction to happen in all the rows of A, you just need to use the line A[:,1:] -= subtraction_matrix[1:], instead of the line A[0,1:] -= subtraction_matrix[1:]
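When the per-column value is a simple arithmetic function of the index, the index array can also be built directly with np.arange, which skips np.vectorize (essentially a Python-level loop). A sketch, using j * 10 as a stand-in for the elided calculation, just as in the test above:

```python
import numpy as np

A = np.arange(25).reshape(5, 5)

# Column indices 1..n-1, matching the slice A[0, 1:].
cols = np.arange(1, A.shape[1])

# Stand-in for the real per-column calculation (assumption: j * 10).
A[0, 1:] -= cols * 10
print(A[0])  # [  0  -9 -18 -27 -36]

# Applying the same offsets to every row relies on broadcasting.
B = np.arange(25).reshape(5, 5)
B[:, 1:] -= cols * 10
```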
I'm trying to do a simple linear regression using the pyfinance package, using PandasRollingOLS to get a rolling regression beta.
It works, but I would like to have a min_window option in the function.
I would like min_window in the rolling OLS function because, with a window of 90, it does not perform OLS on the first 90 values. I would like to perform an expanding OLS, starting when there are at least 12 observations (min_window) and continuing until there are 90 observations, and then a rolling OLS of 90 (window).
I tried to understand the code of the package, but I'm not able to include min_window in the code.
I would like this kind of function (this is the __init__ of the PandasRollingOLS class):
def __init__(self, y, x=None, window=None, min_window=None, has_const=False, use_const=True):
I think I should update the code of utils.rolling_windows posted below. Can someone help me, please?
def rolling_windows(a, window):
    """Creates rolling-window 'blocks' of length `window` from `a`.
    Note that the orientation of rows/columns follows that of pandas.
    Example
    -------
    import numpy as np
    onedim = np.arange(20)
    twodim = onedim.reshape((5,4))
    print(twodim)
    [[ 0 1 2 3]
    [ 4 5 6 7]
    [ 8 9 10 11]
    [12 13 14 15]
    [16 17 18 19]]
    print(rwindows(onedim, 3)[:5])
    [[0 1 2]
    [1 2 3]
    [2 3 4]
    [3 4 5]
    [4 5 6]]
    print(rwindows(twodim, 3)[:5])
    [[[ 0 1 2 3]
    [ 4 5 6 7]
    [ 8 9 10 11]]
    [[ 4 5 6 7]
    [ 8 9 10 11]
    [12 13 14 15]]
    [[ 8 9 10 11]
    [12 13 14 15]
    [16 17 18 19]]]
    """
    if window > a.shape[0]:
        raise ValueError('Specified `window` length of {0} exceeds length of'
                         ' `a`, {1}.'.format(window, a.shape[0]))
    if isinstance(a, (Series, DataFrame)):
        a = a.values
    if a.ndim == 1:
        a = a.reshape(-1, 1)
    shape = (a.shape[0] - window + 1, window) + a.shape[1:]
    strides = (a.strides[0],) + a.strides
    windows = np.squeeze(np.lib.stride_tricks.as_strided(a, shape=shape,
                                                         strides=strides))
    # In cases where window == len(a), we actually want to "unsqueeze" to 2d.
    # I.e., we still want a "windowed" structure with 1 window.
    if windows.ndim == 1:
        windows = np.atleast_2d(windows)
    return windows
thank you all!
Alessandro
I am struggling with this myself at the moment using PandasRollingOLS. I came to the temporary conclusion to simply take care of it before the regression, i.e. delete every column with below min_window value before running regressions.
min_window = 3
df.loc[:,~(df.rolling(min_window).count() < min_window).all()]
Note that it requires that your dataframe has NaNs (which is why I guess you want to have a min_window):
NaN NaN
0.5 NaN
0.8 NaN
0.7 0.5
0.6 0.4
This might be a temporary (ugly) solution until a Python guru stumbles upon your post.
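Another option, sketched here under stated assumptions, is to compute the betas outside pyfinance with a plain loop. The function expanding_then_rolling_beta below is hypothetical and is not part of pyfinance or a patch to rolling_windows; it handles a single regressor with an intercept, and the synthetic data exists only to show the call. Windows grow from min_window up to window observations and then roll with fixed length.

```python
import numpy as np

def expanding_then_rolling_beta(y, x, window, min_window):
    """Hypothetical helper: OLS slope of y on x (with intercept) over windows
    that expand from min_window to window observations, then roll."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    betas = np.full(len(y), np.nan)  # NaN until min_window observations exist
    for end in range(min_window, len(y) + 1):
        start = max(0, end - window)  # expanding while end <= window
        # Design matrix with a constant column for the intercept.
        X = np.column_stack([np.ones(end - start), x[start:end]])
        coef, *_ = np.linalg.lstsq(X, y[start:end], rcond=None)
        betas[end - 1] = coef[1]  # slope on x
    return betas

# Synthetic example data (assumption): true slope of 2 plus small noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.1, size=200)
betas = expanding_then_rolling_beta(y, x, window=90, min_window=12)
print(np.isnan(betas[:11]).all())  # True
```

A Python loop like this is slower than a strided solution, but it keeps the min_window logic easy to verify before trying to push it into the library.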
Some numerical processes output 2D arrays of numbers with shapes 1x1, 3x3, 5x5, ..., which correspond to different resolutions.
At one stage an average needs to be produced, i.e. a single 2D array of shape nxn.
If the outputs all had the same shape, say 11x11, the solution would be obvious:
element_wise_mean_of_all_arrays.
In this problem, however, the arrays have different shapes, so the obvious way does not work!
I thought the kron function might help, but it didn't. For example, if an array has shape 17x17, how do I make it 21x21? The same goes for all the others, from 1x1, 3x3, ..., in order to build an array of constant shape, say 21x21.
The arrays can also be either smaller or bigger than the target shape, e.g. an array of 31x31 has to be shrunk to 21x21.
You can think of the problem as the very common image task of shrinking or enlarging.
What are efficient approaches to do the same job on 2D arrays in Python, using numpy, scipy, etc.?
Updates:
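Here is a slightly optimized version of the accepted answer below:
def resize(X, shape=None):
    if shape is None:
        return X
    m, n = shape
    Y = np.zeros((m, n), dtype=X.dtype)
    k = len(X)  # assumes X is square (k x k)
    p, q = k / m, k / n  # source-to-target scale factors (float division)
    for i in range(m):  # range replaces Python 2's xrange
        # truncate the fractional source indices to integers
        Y[i, :] = X[int(i * p), np.int_(np.arange(n) * q)]
    return Y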
It works perfectly, but do you all agree it is the best choice in terms of efficiency? If not, how could it be improved?
# Expanding ---------------------------------
>>> X = np.array([[1,2,3],[4,5,6],[7,8,9]])
[[1 2 3]
[4 5 6]
[7 8 9]]
>>> resize(X,[7,11])
[[1 1 1 1 2 2 2 2 3 3 3]
[1 1 1 1 2 2 2 2 3 3 3]
[1 1 1 1 2 2 2 2 3 3 3]
[4 4 4 4 5 5 5 5 6 6 6]
[4 4 4 4 5 5 5 5 6 6 6]
[7 7 7 7 8 8 8 8 9 9 9]
[7 7 7 7 8 8 8 8 9 9 9]]
# Shrinking ---------------------------------
>>> X = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[13 14 15 16]]
>>> resize(X,(2,2))
[[ 1 3]
[ 9 11]]
Final note: the code above could easily be translated to Fortran for the highest possible performance.
I'm not sure I understand exactly what you are trying to do, but if it is what I think, the simplest way would be:
import numpy
wanted_size = 21
a = numpy.array([[1,2,3],[4,5,6],[7,8,9]])
b = numpy.zeros((wanted_size, wanted_size))
for i in range(wanted_size):
    for j in range(wanted_size):
        # integer division so the results are valid indices
        idx1 = i * len(a) // wanted_size
        idx2 = j * len(a) // wanted_size
        b[i][j] = a[idx1][idx2]
You could maybe replace the b[i][j] = a[idx1][idx2] with some custom function like the average of a 3x3 matrix centered in a[idx1][idx2] or some interpolation function.
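The double loop above can also be written with integer index arrays, which removes the Python-level iteration. The sketch below is one possible vectorized form, not the accepted answer itself: the helper name resize_nearest is made up, it does nearest-neighbour selection only (no averaging or interpolation), and the constant-valued test arrays are placeholders for the real outputs.

```python
import numpy as np

def resize_nearest(X, shape):
    """Hypothetical helper: nearest-neighbour resize of a 2-D array."""
    m, n = shape
    # Source row/column for each target row/column, using integer arithmetic.
    rows = np.arange(m) * X.shape[0] // m
    cols = np.arange(n) * X.shape[1] // n
    # np.ix_ builds the open mesh so every (row, col) pair is selected.
    return X[np.ix_(rows, cols)]

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
expanded = resize_nearest(X, (7, 11))
print(expanded.shape)  # (7, 11)

# Bring arrays of different shapes to a common 21x21 grid, then average.
outputs = [np.full((k, k), float(k)) for k in (1, 3, 5)]
common = np.stack([resize_nearest(a, (21, 21)) for a in outputs])
mean_array = common.mean(axis=0)
print(mean_array.shape)  # (21, 21)
```

Because the row and column indices are computed separately, this version also handles non-square inputs and targets.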