Is there any way to give mayavi a list of tuples, or maybe a numpy array of size number_of_points x 3, such that I can specify a different colour for each point?
So, I have the following data:
x of size Nx1 (contains x coordinates of N points)
y of size Nx1 (contains y coordinates of N points)
z of size Nx1 (contains z coordinates of N points)
R of size Nx1 (contains the values for the R channel of N points)
G of size Nx1 (contains the values for the G channel of N points)
B of size Nx1 (contains the values for the B channel of N points)
I want to somehow give this RGB data to Mayavi so that it uses the actual colour of each point, i.e. I would like something like this:
from mayavi import mlab
plt = mlab.points3d(x, y, z, color = (R, G, B))
This works if N = 1, in other words only if I give Mayavi a single point; otherwise it doesn't. I can therefore iterate over the points, but that is very slow and, for some reason, heavy on memory.
I've tried many things, but I can't seem to find a single approach (apart from doing it in a loop) that does what I need. Any ideas on how to do it?
One way is to put your RGB arrays into a lookup table that you then tell your points3d object to use. For example:
import numpy as np
import mayavi.mlab as mlab
# Fake data from:
# http://docs.enthought.com/mayavi/mayavi/auto/mlab_helper_functions.html#points3d
t = np.linspace(0, 2 * np.pi, 20)
x = np.sin(2 * t)
y = np.cos(t)
z = np.cos(2 * t)
# Create a [0..len(t)) index that we'll pass as 's'
s = np.arange(len(t))
# Create and populate the lookup table (the integer index in s corresponding
# to a point will be used as that point's row in the lookup table)
lut = np.zeros((len(s), 4))
# A simple lookup table that transitions from red (at index 0) to
# blue (at index len(data)-1)
for row in s:
    f = row / len(s)
    lut[row, :] = [255 * (1 - f), 0, 255 * f, 255]
# Plot the points, update its lookup table
p3d = mlab.points3d(x, y, z, s, scale_mode='none')
p3d.module_manager.scalar_lut_manager.lut.number_of_colors = len(s)
p3d.module_manager.scalar_lut_manager.lut.table = lut
mlab.draw()
mlab.show()
This produces a set of points whose colours transition from red to blue.
Reference:
Enthought docs: Custom Colormap
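Applying the same lookup-table idea to the question's per-point R, G, B arrays might look like the sketch below (with fake data standing in for the real arrays, and assuming the channels are already in the 0-255 range):
import numpy as np
import mayavi.mlab as mlab
# Fake data standing in for the question's x, y, z, R, G, B arrays
N = 50
x, y, z = np.random.rand(3, N)
R, G, B = np.random.randint(0, 256, size=(3, N))
s = np.arange(N)                                   # one lookup-table row per point
lut = np.column_stack((R, G, B, np.full(N, 255))).astype(np.uint8)
p3d = mlab.points3d(x, y, z, s, scale_mode='none')
p3d.module_manager.scalar_lut_manager.lut.number_of_colors = N
p3d.module_manager.scalar_lut_manager.lut.table = lut
mlab.draw()
mlab.show()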
I'm new to Python so please be patient. I appreciate any help!
What I have: three 1D lists (xr, yr, zr), one containing x-values, the other two y- and z-values
What I want to do: create a 3D contour plot in matplotlib
I realized that I need to convert the three 1D lists into three 2D lists, by using the meshgrid function.
Here's what I have so far:
xr = np.asarray(xr)
yr = np.asarray(yr)
zr = np.asarray(zr)
X, Y = np.meshgrid(xr,yr)
znew = np.array([zr for x,y in zip(np.ravel(X), np.ravel(Y))])
Z = znew.reshape(X.shape)
Running this gives me the following error (for the last line I entered above):
total size of new array must be unchanged
I went digging around stackoverflow, and tried using suggestions from people having similar problems. Here are the errors I get from each of those suggestions:
Changing the last line to:
Z = znew.reshape(X.shape[0])
Gives the same error.
Changing the last line to:
Z = znew.reshape(X.shape[0], len(znew))
Gives the error:
Shape of x does not match that of z: found (294, 294) instead of (294, 86436).
Changing it to:
Z = znew.reshape(X.shape, len(znew))
Gives the error:
an integer is required
Any ideas?
Well, the sample code below works for me:
import numpy as np
import matplotlib.pyplot as plt
xr = np.linspace(-20, 20, 100)
yr = np.linspace(-25, 25, 110)
X, Y = np.meshgrid(xr, yr)
#Z = 4*X**2 + Y**2
zr = []
for i in range(0, 110):
    y = -25.0 + (50./110.)*float(i)
    for k in range(0, 100):
        x = -20.0 + (40./100.)*float(k)
        v = 4.0*x*x + y*y
        zr.append(v)
Z = np.reshape(zr, X.shape)
print(X.shape)
print(Y.shape)
print(Z.shape)
plt.contour(X, Y, Z)
plt.show()
TL;DR
import matplotlib.pyplot as plt
import numpy as np
def get_data_for_mpl(X, Y, Z):
    result_x = np.unique(X)
    result_y = np.unique(Y)
    # matplotlib expects Z with shape (len(y), len(x)), so y is the first axis
    result_z = np.zeros((len(result_y), len(result_x)))
    # result_z[:] = np.nan
    for x, y, z in zip(X, Y, Z):
        i = np.searchsorted(result_x, x)
        j = np.searchsorted(result_y, y)
        result_z[j, i] = z
    return result_x, result_y, result_z
xr, yr, zr = np.genfromtxt('data.txt', unpack=True)
plt.contourf(*get_data_for_mpl(xr, yr, zr), 100)
plt.show()
Detailed answer
First, you need to find out for which values of x and y the graph is being plotted. This can be done using the numpy.unique function:
result_x = numpy.unique(X)
result_y = numpy.unique(Y)
Next, you need to create a numpy.ndarray with the function values for each point (x, y) from zip(X, Y). Note that matplotlib expects this array to have shape (len(result_y), len(result_x)), so the y index comes first:
result_z = numpy.zeros((len(result_y), len(result_x)))
for x, y, z in zip(X, Y, Z):
    i = search(result_x, x)
    j = search(result_y, y)
    result_z[j, i] = z
If the array is sorted, the search can be performed in logarithmic rather than linear time, so it is enough to use the numpy.searchsorted function. To use it, the arrays result_x and result_y must be sorted; fortunately, sorting is already part of what numpy.unique does, so no additional work is needed. It is enough to replace the search method (which is not implemented anywhere and is given simply as an intermediate step) with np.searchsorted.
Finally, to get the desired image, it is enough to call the matplotlib.pyplot.contour or matplotlib.pyplot.contourf method.
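For example, continuing with the result_x, result_y and result_z arrays built above, a minimal call might be:
import matplotlib.pyplot as plt
plt.contourf(result_x, result_y, result_z, 100)  # 100 filled contour levels
plt.show()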
If the function value does not exist for every combination of x from result_x and y from result_y, and you simply want nothing to be drawn there, it is enough to replace the missing values with NaN. Or, more simply, create result_z as a numpy.ndarray filled with NaN and then fill it in:
result_z = numpy.zeros((len(result_y), len(result_x)))
result_z[:] = numpy.nan
If I want to interpolate the data below:
from scipy.interpolate import RectBivariateSpline, interp2d
import numpy as np
x1 = np.linspace(0,5,10)
y1 = np.linspace(0,20,20)
xx, yy = np.meshgrid(x1, y1)
z = np.sin(xx**2+yy**2)
with interp2d this works:
f = interp2d(x1, y1, z, kind='cubic')
however if I use RectBivariateSpline with the same x1, y1 parameters:
f = RectBivariateSpline(x1, y1, z)
I get this error:
TypeError Traceback (most recent call last)
<ipython-input-9-3da046e1ebe0> in <module>()
----> 1 f = RectBivariateSpline(x, y, z)
C:\...\Local\Continuum\Anaconda\lib\site-packages\scipy\interpolate\fitpack2.pyc in __init__(self, x, y, z, bbox, kx, ky, s)
958 raise TypeError('y must be strictly ascending')
959 if not x.size == z.shape[0]:
--> 960 raise TypeError('x dimension of z must have same number of '
961 'elements as x')
962 if not y.size == z.shape[1]:
TypeError: x dimension of z must have same number of elements as x
I'd have to switch the sizes of x, y like this to have it work:
x2 = np.linspace(0,5,20)
y2 = np.linspace(0,20,10)
f = RectBivariateSpline(x2, y2, z)
Is there a reason for this behavior - or something I am not understanding?
Well, the reason is that the parameters to the two functions are, as you have noted, different. Yes, this makes it really hard to just switch out one for the other, as I well know.
Why? In general it was a clear design decision to break backward compatibility with the new object-oriented spline functions, or at least not worry about it. Certainly, for large grid sizes there is significant space savings with not having to pass x and y as 2D objects. Frankly, I have found in my code that once this initial barrier is overcome, I'm much happier using the spline objects. For example, with the UnivariateSpline object, getting the derivative(s) is easy, as is the integral.
It would appear that, going forward, the SciPy folks will focus on the new objects, so you might contemplate just moving to them now. They are the same base functionality, and have additional methods that provide nice benefits.
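For instance, a minimal sketch of that convenience with UnivariateSpline (using s=0 for an interpolating spline):
import numpy as np
from scipy.interpolate import UnivariateSpline
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)
spl = UnivariateSpline(x, y, s=0)  # s=0 -> interpolating spline
dspl = spl.derivative()            # new spline object representing dy/dx
print(dspl(np.pi))                 # ~ cos(pi) = -1
print(spl.integral(0, np.pi))      # ~ 2, the integral of sin(x) over [0, pi]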
EDIT - clarify what 'broke' between the two.
From the SciPy manual on interp2d you get the code snippet:
from scipy import interpolate
x = np.arange(-5.01, 5.01, 0.25)
y = np.arange(-5.01, 5.01, 0.25)
xx, yy = np.meshgrid(x, y)
z = np.sin(xx**2+yy**2)
f = interpolate.interp2d(x, y, z, kind='cubic')
Unfortunately, this is potentially misleading, since both x and y here are the same length, so z will be a square matrix. So, let's play with this a bit:
x = np.linspace(0,5,11)
y = np.linspace(0,20,21) # note different lengths
z = x[None,:].T + y*y # need broadcasting
xx,yy = np.meshgrid(x,y) # this is from the interp2d example to compare
zz = xx + yy*yy
These now have different shapes: shape(z) is (11,21) and shape(zz) is (21,11). In fact, they are the transpose of each other, z == zz.T. Once you realize this, it all becomes clearer - going from interp2d to RectBivariateSpline swapped the expected axes. Pick one instantiation of the splines (I've opted for the newer ones), and you have picked a particular set of axes to keep clear in your head. To mix them together, a simple transpose will work as well, but can get to be a headache when you go back through your code a month or more from now.
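To make the swap concrete, here is a small sketch under the same setup (note that interp2d is deprecated in recent SciPy releases):
from scipy.interpolate import RectBivariateSpline, interp2d
import numpy as np
x = np.linspace(0, 5, 11)
y = np.linspace(0, 20, 21)
xx, yy = np.meshgrid(x, y)               # both have shape (21, 11)
zz = xx + yy*yy                          # interp2d layout: zz[j, i], j over y, i over x
f_old = interp2d(x, y, zz, kind='cubic')
f_new = RectBivariateSpline(x, y, zz.T)  # RectBivariateSpline wants z[i, j], hence the transpose
print(f_old(2.5, 7.5))                   # both ~ 2.5 + 7.5**2 = 58.75
print(f_new(2.5, 7.5))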
I have a very large array similar to elevation data of the format:
triplets = ((x0, y0, z0),
(x1, y1, z1),
... ,
(xn, yn, zn))
where x, y, z are all floats in metres. You can create suitable test data matching this format with:
x = np.arange(20, 40, dtype=np.float64)
y = np.arange(30, 50, dtype=np.float64)
z = np.random.random(20) * 25.0
triplets = np.vstack((x, y, z)).T  # shape (len(x), 3): rows of (x, y, z)
I want to be able to efficiently find the corresponding z-value for a given (x,y) pair. My research so far leads to more questions. Here's what I've got:
Iterate through all of the triplets:
query = (a, b) # where a, b are the x and y coordinates we're looking for
for i in triplets:
    if i[0] == query[0] and i[1] == query[1]:
        result = i[2]
Drawbacks: slow; a, b must exist, which is a problem with comparing floats.
Use scipy.spatial.cKDTree to find nearest:
points = triplets[:,0:2] # drops the z column
tree = cKDTree(points)
idx = tree.query((a, b))[1] # this returns a tuple, we want the index
query = tree.data[idx]
result = triplets[idx, 2]
Drawbacks: returns nearest point rather than interpolating.
Using interp2d as per comment:
f = interp2d(x, y, z)
result = f(a, b)
Drawbacks: doesn't work on a large dataset. I get OverflowError: Too many data points to interpolate when run on real data. (My real data is some 11 million points.)
So the question is: is there any straightforward way of doing this that I'm overlooking? Are there ways to reduce the drawbacks of the above?
If you want to interpolate the result, rather than just find the z value for the nearest neighbour, I would consider doing something like the following:
Use a k-d tree to partition your data points according to their (x, y) coordinates
For a given (xi, yi) point to interpolate, find its k nearest neighbours
Take the average of their z values, weighted according to their distance from (xi, yi)
The code might look something like this:
import numpy as np
from scipy.spatial import cKDTree
# some fake (x, y, z) data
XY = np.random.rand(10000, 2) - 0.5
Z = np.exp(-((XY ** 2).sum(1) / 0.1) ** 2)
# construct a k-d tree from the (x, y) coordinates
tree = cKDTree(XY)
# a random point to query
xy = np.random.rand(2) - 0.5
# find the k nearest neighbours (say, k=3)
distances, indices = tree.query(xy, k=3)
# the z-values for the k nearest neighbours of xy
z_vals = Z[indices]
# take the average of these z-values, weighted by 1 / distance from xy
dw_avg = np.average(z_vals, weights=(1. / distances))
It's worth playing around a bit with the value of k, the number of nearest neighbours to take the average of. This is essentially a crude form of kernel density estimation, where the value of k controls the degree of 'smoothness' you're imposing on the underlying distribution of z-values. A larger k results in more smoothness.
Similarly, you might want to play around with how you weight the contributions of points according to their distance from (xi, yi), depending on how you think similarity in z decreases with increasing x, y distance. For example you might want to weight by (1 / distances ** 2) rather than (1 / distances).
In terms of performance, constructing and searching k-d trees are both very efficient. Bear in mind that you only need to construct the tree once for your dataset, and if necessary you can query multiple points at a time by passing (N, 2) arrays to tree.query().
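For instance, continuing the snippet above, a batch query might look like this (assuming no query point coincides exactly with a data point, so no distance is zero):
queries = np.random.rand(5, 2) - 0.5     # five (x, y) locations to interpolate at
dists, idxs = tree.query(queries, k=3)   # both results have shape (5, 3)
z_interp = np.average(Z[idxs], axis=1, weights=1.0 / dists)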
Tools for approximate nearest neighbour searches, such as FLANN, might potentially be quicker, but these are usually more helpful in situations when the dimensionality of your data is very high.
I don't understand your cKDTree code: you already have idx, so why do the loop again? You can get the result just with result = triplets[idx, 2].
import numpy as np
from scipy.spatial import cKDTree
x = np.arange(20, 40, dtype=np.float64)
y = np.arange(30, 50, dtype=np.float64)
z = np.random.random(20) * 25.0
triplets = np.vstack((x, y, z)).T  # shape (len(x), 3): rows of (x, y, z)
a = 30.1
b = 40.5
points = triplets[:,0:2] # drops the z column
tree = cKDTree(points)
idx = tree.query((a, b))[1] # this returns a tuple, we want the index
result = triplets[idx, 2]
You can create a sparse matrix and use simple indexing. Note that the x and y values are used directly as row and column indices, so this only works when they are non-negative integer values (cast them to int if necessary).
In [1]: import numpy as np
In [2]: x = np.arange(20, 40, dtype=np.float64)
In [3]: y = np.arange(30, 50, dtype=np.float64)
In [4]: z = np.random.random(20) * 25.0
In [9]: from scipy.sparse import coo_matrix
In [12]: m = coo_matrix((z, (x, y))).tolil()
In [17]: m[25,35]
Out[17]: 17.410532044604292
I want to calculate the speed of a vehicle and plot a graph with time in seconds on the x axis and speed in km/h on the y axis. To do that, I need to get the previously calculated y value.
Example: y[x] = y[x-1] * a
a = 0,11768
x = np.arange(0, 100, 1) # 0 to 100 seconds
y = a * y[x-1] ??
plt.plot(x, y)
plt.show()
Is that possible with numpy, or should I do a loop to iterate over all indexes?
v = v0 + a*t. Assuming your acceleration is constant and v0 = 0, there's no need to do what you want; simply:
import numpy as np
import matplotlib.pyplot as plt
a = 0.11768  # is it in m/s^2? I've used m/s^2...
v = []  # velocity at a given time ‹
x = np.arange(0, 100, 1)  # 0 to 100 seconds
for i in x:  # ‹
    v.append(i*a)  # read it as a*t, since i is the time; use i*a*3.6 if you want km/h ‹
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, v)
plt.ylabel(r'Velocity $(m/sec)$')  # note: if you want km/h use v.append(i*a*3.6) above
plt.xlabel(r'Time $(sec)$')
plt.show()
The result is a straight-line plot: velocity increasing linearly with time.
EDIT:
As suggested by Joe in his comment, you should use v = a*x, deleting the lines marked with ‹ in my code, for a more efficient way to do this!
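A minimal sketch of that vectorized version, using the same a and time axis as above:
import numpy as np
import matplotlib.pyplot as plt
a = 0.11768
x = np.arange(0, 100, 1)  # 0 to 100 seconds
v = a * x                 # m/s at each second; multiply by 3.6 for km/h
plt.plot(x, v)
plt.ylabel(r'Velocity $(m/sec)$')
plt.xlabel(r'Time $(sec)$')
plt.show()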
Your calculation for y is wrong. Instead of multiplying the previous speed by the acceleration, you have to add the acceleration to that speed. An alternative way would be to multiply the acceleration by the time and add that to some initial speed. This way, you can use a simple list comprehension for y.
import numpy
a = 0.11768  # acceleration (note the dot instead of comma!)
y0 = 0  # initial speed at time x = 0
X = numpy.arange(0, 100, 1)
Y = numpy.array([y0 + a * x for x in X])
When using Numpy, there's an even simpler way -- thanks to #JoeKington for pointing this out:
Y = y0 + a * X # multiplies each value of X with a and adds y0
I don't know if it's possible in numpy but I know how you can easily achieve it using pandas:
import pandas as pd
import numpy as np
a = 0.11768
df = pd.DataFrame(np.arange(0, 100, 1), columns=['X'])
df['Y'] = a * df['X'].shift(1)
import numpy as np
a = 0.11768
x = np.arange(0, 100, 1)
y = [1]  # starting value y[0] = 1 (assumed)
for i in range(1, len(x)):
    y.append(a * y[i-1])  # y[i] = a * y[i-1]
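For what it's worth, the recurrence y[x] = a * y[x-1] is a geometric sequence, so the loop above can be replaced by a closed-form, vectorized expression (assuming the same starting value y[0] = 1):
import numpy as np
a = 0.11768
x = np.arange(0, 100, 1)
y = 1.0 * a ** np.arange(len(x))  # y[i] = y[0] * a**i, equivalent to the loop above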