Generate image data from three numpy arrays - python

I have three numpy arrays, X, Y, and Z.
X and Y are the coordinates of a spatial grid, and each grid point (X, Y) has an intensity Z. I would like to save a PNG image from this data. Interpolation is not needed, as X and Y are guaranteed to cover every grid point between their minima and maxima.
I'm guessing the solution lies in numpy's meshgrid() function, but I can't figure out how to reshape the Z array into NxM intensity data.
How can I do that?
To clarify the input data structure, this is what it looks like:
X   | Y   | Z
-----------------------------
0.1 | 0.1 | something..
0.1 | 0.2 | something..
0.1 | 0.3 | something..
...
0.2 | 0.1 | something..
0.2 | 0.2 | something..
0.2 | 0.3 | something..
...
0.3 | 0.1 | something..
0.3 | 0.2 | something..
0.3 | 0.3 | something..
...

To begin with, you should run this piece of code:
import numpy as np
X = np.asarray(<X data>)
Y = np.asarray(<Y data>)
Z = np.asarray(<Z data>)
Xu = np.unique(X)
Yu = np.unique(Y)
Then you could apply any of the following approaches. It is worth noting that all of them would work fine even if the data are NOT sorted (in contrast to the currently accepted answer):
1) A for loop and numpy.where() function
This is perhaps the simplest and most readable solution:
Zimg = np.zeros((Xu.size, Yu.size), np.uint8)
for i in range(X.size):
    Zimg[np.where(Xu == X[i]), np.where(Yu == Y[i])] = Z[i]
2) A list comprehension and numpy.sort() function
This solution - which is a bit more involved than the previous one - relies on Numpy's structured arrays:
data_type = [('x', float), ('y', float), ('z', np.uint8)]  # np.float was removed in NumPy 1.24; plain float works
XYZ = [(X[i], Y[i], Z[i]) for i in range(len(X))]
table = np.array(XYZ, dtype=data_type)
Zimg = np.sort(table, order=['y', 'x'])['z'].reshape(Xu.size, Yu.size)
3) Vectorization
Using lexsort is an elegant and efficient way of performing the required task:
Zimg = Z[np.lexsort((Y, X))].reshape(Xu.size, Yu.size)
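As a quick sanity check that sorting is indeed not required, here is the lexsort approach on deliberately shuffled toy data (hypothetical values, for illustration only):
import numpy as np

# Deliberately unsorted toy data (hypothetical values)
X = np.array([0.2, 0.1, 0.2, 0.1])
Y = np.array([0.2, 0.1, 0.1, 0.2])
Z = np.array([40, 10, 30, 20], dtype=np.uint8)
Xu, Yu = np.unique(X), np.unique(Y)

# lexsort treats its last key as the primary one, so this sorts by X, then Y
Zimg = Z[np.lexsort((Y, X))].reshape(Xu.size, Yu.size)
print(Zimg)  # [[10 20]
             #  [30 40]]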
4) Pure Python, not using NumPy
A pure Python solution, without any third-party dependencies, is also possible.
Finally, you have several options for saving Zimg as an image:
from PIL import Image
Image.fromarray(Zimg).save('z-pil.png')

import matplotlib.pyplot as plt
plt.imsave('z-matplotlib.png', Zimg)

import cv2
cv2.imwrite('z-cv2.png', Zimg)

import scipy.misc
scipy.misc.imsave('z-scipy.png', Zimg)  # removed in SciPy >= 1.2; use imageio.imwrite instead

You said no interpolation is needed since every grid point is covered, so I assume the points are equally spaced.
If your table is already sorted primarily by increasing x and secondarily by y, you can simply take the Z array directly and save it using PIL:
import numpy as np
# Find out what shape your final array has (if you already know just hardcode these)
x_values = np.unique(X).size
y_values = np.unique(Y).size
img = np.reshape(Z, (x_values, y_values))
# Maybe you need to cast the dtype to fulfill PNG restrictions
#img = img.astype(np.uint8)  # adjust as needed
# Save the image
from PIL import Image
Image.fromarray(img).save('filename.png')
If your input isn't sorted (it looks like it is, but who knows), you have to sort it before you start. Depending on your input this can be easy or really hard.

np.ufunc.at is a good tool for handling duplicates in a vectorized way.
Suppose these data:
In [3]: X, Y, Z = np.random.rand(3, 10).round(1)
(array([0.4, 0.2, 0.1, 0.8, 0.4, 0.1, 0.5, 0.2, 0.6, 0.2]),
 array([0.5, 0.3, 0.5, 0.9, 0.9, 0.5, 0.3, 0.6, 0.4, 0.4]),
 array([0.4, 0.6, 0.6, 0.4, 0.1, 0.1, 0.2, 0.6, 0.9, 0.8]))
First scale the image (scale = 3 here):
In [4]: indices = tuple((3 * c).astype(int) for c in (X, Y))
(array([1, 0, 0, 2, 1, 0, 1, 0, 1, 0]), array([1, 0, 1, 2, 2, 1, 0, 1, 1, 1]))
Note the tuple: recent NumPy no longer accepts a list of index arrays as a multidimensional index.
Make an empty image, sized according to the index bounds: image = np.zeros((3, 3)).
Then build it. Here we keep the maximum among duplicates:
In [5]: np.maximum.at(image, indices, Z)  # in place

In [6]: image
Out[6]:
array([[ 0.6,  0.8,  0. ],
       [ 0.2,  0.9,  0.1],
       [ 0. ,  0. ,  0.4]])
Finally, save it as a PNG: matplotlib.pyplot.imsave('img.png', image).
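Putting the pieces together as a plain script (a sketch; the seed is arbitrary, and the image shape is derived from the index bounds as described above):
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # arbitrary seed, for reproducibility only
X, Y, Z = np.random.rand(3, 10).round(1)

scale = 3
indices = tuple((scale * c).astype(int) for c in (X, Y))

# Empty image sized according to the index bounds
image = np.zeros((indices[0].max() + 1, indices[1].max() + 1))

np.maximum.at(image, indices, Z)  # keep the maximum among duplicates, in place
plt.imsave('img.png', image)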

Why can't RegularGridInterpolator return several values (for a function that outputs in $R^d$)?

MRE (with working output, and output that doesn't work although I would like it to, as it would be the intuitive thing to do):
import numpy as np
from scipy.interpolate import RegularGridInterpolator, griddata
def f(x1, x2, x3):
    return x1 + 2*x2 + 3*x3, x1**2, x2
# Define the input points
xi = [np.linspace(0, 1, 5), np.linspace(0, 1, 5), np.linspace(0, 1, 5)]
# Mesh grid
x1, x2, x3 = np.meshgrid(*xi, indexing='ij')
# Outputs
y = f(x1, x2, x3)
assert (y[0][1][1][3] == (0.25 + 2*0.25 + 3*0.75))
assert (y[1][1][1][3] == (0.25**2))
assert (y[2][1][1][3] == 0.25)
#### THIS WORKS BUT I CAN ONLY GET THE nth (with n integer in [1, d]) VALUE RETURNED BY f
# Interpolate at point 0.3, 0.3, 0.4
interp = RegularGridInterpolator(xi, y[0])
print(interp([0.3, 0.3, 0.4])) # outputs 2.1 as expected
#### THIS DOESN'T WORK (I WOULD'VE EXPECTED A LIST OF TUPLES FOR EXAMPLE)
# Interpolate at point 0.3, 0.3, 0.4
interp = RegularGridInterpolator(xi, y)
print(interp([0.3, 0.3, 0.4])) # doesn't output array([2.1, 0.1, 0.3])
What is intriguing is that griddata does support functions that output values in R^d:
# Same with griddata
grid_for_griddata = np.array([x1.flatten(), x2.flatten(), x3.flatten()]).T
assert (grid_for_griddata.shape == (125, 3))
y_for_griddata = np.array([y[0].flatten(), y[1].flatten(), y[2].flatten()]).T
assert (y_for_griddata.shape == (125, 3))
griddata(grid_for_griddata, y_for_griddata, [0.3, 0.3, 0.4], method='linear')[0] # outputs array([2.1, 0.1, 0.3]) as expected
Am I using RegularGridInterpolator the wrong way?
I know someone might say "just use griddata", but because my data is on a rectilinear grid, I should use RegularGridInterpolator so that it's faster, right?
If I define y with the 3 as the last dimension:
In [196]: yarr = np.stack(y,axis=3); yarr.shape
Out[196]: (5, 5, 5, 3)
Setup works (no complaints about 3 not matching 5):
In [197]: interp = RegularGridInterpolator(xi, yarr)
And the interpolation:
In [198]: interp([.3,.3,.4])
Out[198]: array([[2.1, 0.1, 0.3]])
and for multiple points:
In [202]: interp([[.3, .3, .4], [.31, .31, .41], [.5, .4, .4]])
Out[202]:
array([[2.1   , 0.1   , 0.3   ],
       [2.16  , 0.1075, 0.31  ],
       [2.5   , 0.25  , 0.4   ]])
While the above was just a guess that works, I see that the docs can be interpreted this way:
values: array_like, shape (m1, …, mn, …)
The ... at the end suggests that the array may have 0 or more trailing dimensions (beyond the n that match the points dimensions). But this flexibility may apply more to the linear and nearest methods; others seem to have problems.
This is clearer on the doc page for its __call__:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.RegularGridInterpolator.__call__.html#scipy.interpolate.RegularGridInterpolator.__call__
Returns
values_x : ndarray, shape xi.shape[:-1] + values.shape[ndim:]
interpn also documents this.
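For completeness, a minimal sketch of the same trailing-dimension behaviour through interpn (rebuilding the grid and stacked values from the question; the second value is approximate since x**2 is only linearly interpolated):
import numpy as np
from scipy.interpolate import interpn

# Rebuild the (5, 5, 5) grid and stack the three outputs along a trailing axis
xi = [np.linspace(0, 1, 5)] * 3
x1, x2, x3 = np.meshgrid(*xi, indexing='ij')
yarr = np.stack((x1 + 2*x2 + 3*x3, x1**2, x2), axis=3)  # shape (5, 5, 5, 3)

# interpn follows the same shape rule: xi.shape[:-1] + values.shape[ndim:]
print(interpn(xi, yarr, [0.3, 0.3, 0.4]))  # ~[2.1 0.1 0.3]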

Interpolating an xarray DataArray at N points and getting a list of N interpolated values using dask

Sorry if the title isn't very descriptive, but what I want is the following.
I have a DataArray with coordinates x, y and t. I also have a list of N coordinates and I'd like to interpolate to get a list of N interpolated values. However, I don't quite know how to do that with xarray while still taking advantage of the parallelism of dask. Here's an example with random values:
import numpy as np
import xarray as xr
x = np.linspace(0, 1, 10)
datar = xr.DataArray(np.random.randn(10, 10, 10), dims=('x', 'y', 't'),
                     coords=dict(x=x, y=x, t=x))
datar = datar.chunk(dict(t=1))
points = np.array([(0.1, 0.1, 0.1),
                   (0.2, 0.3, 0.3),
                   (0.6, 0.6, 0.6)])
ivals = []
for point in points:
    x0, y0, t0 = point
    interp_val = datar.interp(x=x0, y=y0, t=t0)
    ivals.append(float(interp_val))
print(ivals)
This gives me the correct result of [-1.7047738779949937, 0.9568015637947849, 0.04437392968785547].
Is there any way to achieve the same result but taking advantage of dask?
If I naively pass lists to the interpolating function, I get a 3×3×3 cube instead:
In [35]: x0s, y0s, t0s = points.T
...: print(datar.interp(x=x0s, y=y0s, t=t0s))
...:
<xarray.DataArray (x: 3, y: 3, t: 3)>
dask.array<dask_aware_interpnd, shape=(3, 3, 3), dtype=float64, chunksize=(3, 3, 3), chunktype=numpy.ndarray>
Coordinates:
* x (x) float64 0.1 0.2 0.6
* y (y) float64 0.1 0.3 0.6
* t (t) float64 0.1 0.3 0.6
A bit late, but in order to interpolate the way you want, and not get a cube as a result, you should cast your coordinates as xarray DataArrays with a fictitious dimension points:
import numpy as np
import xarray as xr
np.random.seed(1234)
x = np.linspace(0, 1, 10)
datar = xr.DataArray(np.random.randn(10, 10, 10), dims=('x', 'y', 't'), coords=dict(x=x, y=x, t=x))
datar = datar.chunk(dict(t=1))
points = np.array([(0.1, 0.1, 0.1),
                   (0.2, 0.3, 0.3),
                   (0.6, 0.6, 0.6)])
x = xr.DataArray(points[:, 0], dims="points")
y = xr.DataArray(points[:, 1], dims="points")
t = xr.DataArray(points[:, 2], dims="points")
datar.interp(x=x, y=y, t=t).values
It gives you the three values you want. Two remarks:
you should time the executions of the two methods, your for loop and my solution, to check whether xarray really takes advantage of the multiple points given to interp (see the sketch below),
you gave the values you expect, but they depend on your random data. You should fix the seed beforehand in order to give reproducible examples ;)
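For the first remark, a rough timing sketch, continuing from the snippet above (actual numbers depend on your machine and chunking):
import time

# for loop: one interp call per point
start = time.perf_counter()
vals_loop = [float(datar.interp(x=x0, y=y0, t=t0)) for x0, y0, t0 in points]
t_loop = time.perf_counter() - start

# single vectorized call over the "points" dimension
start = time.perf_counter()
vals_vec = datar.interp(x=x, y=y, t=t).values
t_vec = time.perf_counter() - start

print(f"loop: {t_loop:.3f}s, vectorized: {t_vec:.3f}s")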

How to generate a 0-1 sequence according to different probabilities efficiently?

I want to get a random 0-1 sequence. Now I generate the number one by one. My code is as follows:
p_arr = [0.1, 0.5, 0.3, 0.8]
seq = []
for pb in p_arr:
    seq.append(np.random.choice(2, 1, p=[1-pb, pb]))
It's very time-consuming when the length of p_arr is very large (e.g. 10000). I wonder if there is a faster way to do this.
If you want to use Numpy, do:
import numpy as np
p_arr = [0.1, 0.5, 0.3, 0.8]
seq = (np.random.rand(len(p_arr)) > p_arr).astype(np.uint8)
print(seq)
If you don't want to use Numpy then you can still make your code simpler:
import random
p_arr = [0.1, 0.5, 0.3, 0.8]
seq = []
for pb in p_arr:
    seq.append(1 if random.random() > pb else 0)
print(seq)
Also note that in both cases I assumed that your p_arr contains the probability of 0 (not 1). If you want to invert the logic, replace > with < in both of my snippets.
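If instead p_arr should be the probability of 1, as in the question's np.random.choice(2, 1, p=[1-pb, pb]), the flipped comparison looks like this:
import numpy as np

p_arr = [0.1, 0.5, 0.3, 0.8]
# rand() < pb happens with probability pb, so here pb is the probability of drawing 1
seq = (np.random.rand(len(p_arr)) < p_arr).astype(np.uint8)
print(seq)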

Summing three consecutive numbers when equal to or greater than 0 - Python

I am using numpy in Python
I have an array of numbers, for example:
arr = np.array([0.1, 1, 1.2, 0.5, -0.3, -0.2, 0.1, 0.5, 1])
If i is a position in the array, I want to create a function which creates a running sum of i and the two previous numbers, but only accumulating the number if it is equal to or greater than 0.
In other words, negative numbers in the array become equal to 0 when calculating the three number running sum.
For example, the answer I would be looking for here is
2.3, 2.7, 1.7, 0.5, 0.1, 0.6, 1.6
The new array has two elements fewer than the original array, as the calculation can't be completed for the first two numbers.
Thank you!
As Dani Mesejo answered, you can use stride tricks. You can either use clip or boolean indexing to handle the elements below 0. I have explained how the stride tricks work below:
arr[arr<0] = 0 sets all elements below 0 to 0.
as_strided takes the array, the expected shape of the view (7, 3), and the strides in the respective axes, (8, 8). These are the numbers of bytes you have to move along axis 0 and axis 1 respectively to reach the next element. E.g. if you wanted to move two elements at a time, you could set the strides to (16, 8): move 16 bytes each time to get the next element along axis 0 (0.1 -> 1.2 -> 0 -> 0.1 -> ..., until the shape of 7 is filled) and 8 bytes each time to get the next element along axis 1 (0.1 -> 1 -> 1.2, until the shape of 3 is filled).
Use this function with caution! Always use x.strides to define the strides parameter to avoid corrupting memory!
Lastly, sum this array view over axis=1 to get your rolling sum.
arr = np.array([0.1, 1, 1.2, 0.5, -0.3, -0.2, 0.1, 0.5, 1])
w = 3 #rolling window
arr[arr<0]=0
shape = arr.shape[0]-w+1, w #Expected shape of view (7,3)
strides = arr.strides[0], arr.strides[0] #Strides (8,8) bytes
rolling = np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)
rolling_sum = np.sum(rolling, axis=1)
rolling_sum
array([2.3, 2.7, 1.7, 0.5, 0.1, 0.6, 1.6])
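If your NumPy is 1.20 or newer, sliding_window_view builds the same (7, 3) view without hand-written strides; a minimal sketch of the same computation:
import numpy as np

arr = np.array([0.1, 1, 1.2, 0.5, -0.3, -0.2, 0.1, 0.5, 1])
# sliding_window_view constructs the windowed view safely, no manual strides needed
windows = np.lib.stride_tricks.sliding_window_view(np.clip(arr, 0, None), 3)
print(windows.sum(axis=1))  # [2.3 2.7 1.7 0.5 0.1 0.6 1.6]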
You could clip, roll and sum:
import numpy as np
def rolling_window(a, window):
    """Recipe from https://stackoverflow.com/q/6811183/4001592"""
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
a = np.array([0.1, 1, 1.2, 0.5, -0.3, -0.2, 0.1, 0.5, 1])
res = rolling_window(np.clip(a, 0, a.max()), 3).sum(axis=1)
print(res)
Output
[2.3 2.7 1.7 0.5 0.1 0.6 1.6]
You may use np.correlate to sweep a window of 3 ones over the clipped arr to get the desired output:
In [20]: np.correlate(arr.clip(0), np.ones(3), mode='valid')
Out[20]: array([2.3, 2.7, 1.7, 0.5, 0.1, 0.6, 1.6])
arr = np.array([0.1, 1, 1.2, 0.5, -0.3, -0.2, 0.1, 0.5, 1])

def sum_3(x):
    collector = []
    for i in range(len(x) - 2):
        collector.append(sum(x[i:i+3][x[i:i+3] > 0]))
    return collector
#output
[2.3, 2.7, 1.7, 0.5, 0.1, 0.6, 1.6]
This is the easiest and most comprehensible way: the collector appends the sum of each window of three consecutive numbers, with the boolean mask dropping the negative entries (equivalent to turning them into 0s).
The method is written for 3 consecutive elements, but you can generalize it:
def sum_any(x, n):
    collector = []
    for i in range(len(x) - (n-1)):
        collector.append(sum(x[i:i+n][x[i:i+n] > 0]))
    return collector
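Called on the example array, both give the same result (a quick sketch; floating-point rounding may show extra digits):
print(sum_3(arr))       # ~[2.3, 2.7, 1.7, 0.5, 0.1, 0.6, 1.6]
print(sum_any(arr, 3))  # same values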
Masked arrays and view_as_windows (which uses numpy strides under the hood) are built for this purpose:
import numpy as np
from skimage.util import view_as_windows

windows = view_as_windows(arr, 3)  # (7, 3) sliding-window view, numpy strides under the hood
arr2 = np.ma.masked_array(windows, windows < 0).sum(-1)
print(arr2)
output:
[2.3 2.7 1.7 0.5 0.1 0.6 1.6]

numpy *= not working

I use numpy to do matrix multiplication.
If I use t = t * x, it works just fine, but if I use t *= x, it doesn't.
Do I need to use t = t * x?
import numpy as np

if __name__ == '__main__':
    x = [
        [0.9, 0.075, 0.025],
        [0.15, 0.8, 0.05],
        [0.25, 0.25, 0.5]
    ]
    t = [1, 0, 0]
    x = np.matrix(x)
    t = np.matrix(t)
    t = t * x  # works: [[ 0.9  0.075  0.025]]
    # t *= x   # doesn't work: always [[0 0 0]]
    print(t)
You filled t with ints rather than floats, so NumPy decides you want a matrix of integer dtype. When you do t *= x, this requests that the operation be performed in place, reusing the t object to store the result. This forces the results to be cast to integers, so they can be stored in t.
Initialize t with floats:
t = np.matrix([1.0, 0.0, 0.0])
I would also recommend switching to plain arrays, rather than matrices. The convenience of * over dot isn't worth the inconsistencies matrix causes. If you're on Python 3.5 or later, you can even use @ for matrix multiplication with regular arrays.
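A minimal sketch of that recommendation, with plain arrays and the @ operator (same numbers as the question):
import numpy as np

x = np.array([[0.9, 0.075, 0.025],
              [0.15, 0.8, 0.05],
              [0.25, 0.25, 0.5]])
t = np.array([1.0, 0.0, 0.0])  # float dtype from the start

t = t @ x   # matrix product on plain ndarrays
print(t)    # [0.9   0.075 0.025]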
