I have a pair of numpy arrays; here's a simple equivalent example:
import numpy as np
t = np.linspace(0,1,100)
data = ((t % 0.1) * 50).astype(np.uint16)
I want these to be columns in a numpy recarray of dtype f8, i2. This is the only way I have found to get what I want:
# zip() returns an iterator in Python 3, so it has to be wrapped in list()
X = np.array(list(zip(t, data)), dtype=[('t','f8'),('data','i2')])
But is this the right way if my arrays are large? I want to minimize the unnecessary overhead of shuffling data around.
This seems like it should be an easy problem but I can't find a good example.
A straightforward way to do this is with numpy.rec.fromarrays. In your case:
np.rec.fromarrays([t, data], dtype=[('t','f8'),('data','i2')])
or simply
np.rec.fromarrays([t, data], names='t,data', formats='f8,i2')
would work.
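A small self-contained sketch of both calls, plus a second option that preallocates a structured array with np.empty and fills each field by assignment, which avoids building per-row Python tuples altogether. Field names and dtypes are taken from the question. One caveat: data is uint16 while the field is 'i2' (int16), so values above 32767 would wrap around; 'u2' would match the source dtype exactly.

```python
import numpy as np

t = np.linspace(0, 1, 100)
data = ((t % 0.1) * 50).astype(np.uint16)

dt = [('t', 'f8'), ('data', 'i2')]

# Option 1: build a record array directly from the column arrays.
X = np.rec.fromarrays([t, data], dtype=dt)

# Same thing, giving names and formats separately.
X_alt = np.rec.fromarrays([t, data], names='t,data', formats='f8,i2')

# Option 2: preallocate a structured array and copy each column in.
# No per-row tuples are created; each assignment is one vectorized copy.
# Caution: data is uint16 but the field is int16, so values > 32767 would wrap.
Y = np.empty(len(t), dtype=dt)
Y['t'] = t
Y['data'] = data
```

X.t and X['data'] give the columns back. Y is a plain structured ndarray; Y.view(np.recarray) adds attribute access if you want it.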
Alternative approaches are also given in the question "Converting a 2D numpy array to a structured array".
Related
I have two numpy arrays, with just the 3-dimensional coordinates of two molecules.
I need to implement the following equation, and I'm having trouble subtracting each coordinate of one array from the corresponding coordinate of the other and then squaring the result.
I have tried the following, but since I'm still learning I feel that I am making some major mistake. The simple code I use is:
a = [math.sqrt(1/3*((i[:,0]-j[:,0])**2) + ((i[:,1] - j[:,1])**2) + ((i[:,2]-j[:,2])**2) for i, j in zip(coordenates_2, coordenates_1))]
Since these are numpy arrays, you can do it easily without a loop, for example:
import numpy as np
x1 = np.random.randn(3,3,3)
x2 = np.random.randn(3,3,3)
res = np.sqrt(np.mean(np.power(x1-x2,2)))
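If the missing equation is the usual RMSD between two molecules with the same N atoms (an assumption, since the equation itself is not shown), the average is taken over atoms rather than over every single coordinate, so the x, y, z squared differences are summed first. A sketch assuming both coordinate arrays have shape (N, 3); the function name rmsd is chosen for illustration:

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation, assuming both inputs have shape (N, 3)."""
    diff = coords_a - coords_b
    # Squared distance per atom: sum over the x, y, z axis.
    sq_dist = np.sum(diff ** 2, axis=1)
    # Average over the N atoms, then take the square root.
    return np.sqrt(np.mean(sq_dist))

# Usage with made-up coordinates: every coordinate shifted by 0.1.
coords_1 = np.random.default_rng(0).normal(size=(5, 3))
coords_2 = coords_1 + 0.1
print(rmsd(coords_2, coords_1))  # about 0.1732, i.e. sqrt(3 * 0.1**2)
```

Taking np.mean over all 3N coordinates, as in the snippet above, divides by 3N instead of N and therefore gives a value smaller by a factor of sqrt(3).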
I would like to create a numpy array without creating a list first.
At the moment I've got this:
import pandas as pd
import numpy as np
dfa = pd.read_csv('csva.csv')
dfb = pd.read_csv('csvb.csv')
pa = np.array(dfa['location'])
pb = np.array(dfb['location'])
ra = [(pa[i+1] - pa[i]) / float(pa[i]) for i in range(9999)]
rb = [(pb[i+1] - pb[i]) / float(pb[i]) for i in range(9999)]
ra = np.array(ra)
rb = np.array(rb)
Is there an elegant way to fill these numpy arrays in one step, without creating the lists first?
Thanks
You can compute this with vectorized numpy operations, without any lists:
ra = (pa[1:] - pa[:-1]) / pa[:-1]
rb = (pb[1:] - pb[:-1]) / pb[:-1]
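The same slice arithmetic can also be written with np.diff, which returns pa[1:] - pa[:-1]. A quick check on a small array of made-up values (they stand in for dfa['location'] and are not from the question's CSV files):

```python
import numpy as np

# Illustrative values standing in for dfa['location'].
pa = np.array([100.0, 110.0, 99.0, 99.0])

# Relative change between consecutive elements, via slicing ...
ra_slices = (pa[1:] - pa[:-1]) / pa[:-1]
# ... and via np.diff, which computes pa[1:] - pa[:-1].
ra_diff = np.diff(pa) / pa[:-1]

# The question's loop for comparison; len(pa) - 1 replaces the hard-coded 9999.
ra_loop = np.array([(pa[i + 1] - pa[i]) / float(pa[i]) for i in range(len(pa) - 1)])

print(ra_diff)  # approximately [0.1, -0.1, 0.0]
```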
The title of your question and what you need to do in your specific case are actually two slightly different things.
To create a numpy array without "casting" a list (or other iterable), you can use one of the several routines numpy provides that return arrays directly:
np.empty, np.zeros, np.ones, np.full to create arrays of given size with fixed values
np.random.* (where * can be various distributions, like normal, uniform, exponential ...), to create arrays of given size with random values
In general, see the numpy documentation page "Array creation routines".
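A short tour of the routines listed above. The random examples use the Generator API (np.random.default_rng, available since NumPy 1.17) rather than the legacy np.random.normal-style functions; either way the array of the requested shape is produced directly, with no intermediate list.

```python
import numpy as np

shape = (2, 3)

e = np.empty(shape)                 # allocated but NOT initialized: contents are arbitrary
z = np.zeros(shape)                 # all 0.0 (float64 by default)
o = np.ones(shape, dtype=np.int32)  # all 1, with an explicit dtype
f = np.full(shape, 7.5)             # every element set to 7.5

rng = np.random.default_rng(seed=0)             # seeded for reproducibility
n = rng.normal(loc=0.0, scale=1.0, size=shape)  # normal samples
u = rng.uniform(low=0.0, high=1.0, size=shape)  # uniform samples in [0, 1)
x = rng.exponential(scale=1.0, size=shape)      # exponential samples
```

np.empty is only a good choice when every element will be overwritten afterwards, since it does not clear the memory it hands back.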
In your case, you already have numpy arrays (pa and pb), so you don't need lists to calculate the new arrays (ra and rb); you can operate directly on the numpy arrays. That is the entire point of numpy: operations on whole arrays are much faster than iterating over each element. Copied from @Daniel's answer:
ra = (pa[1:] - pa[:-1]) / pa[:-1]
rb = (pb[1:] - pb[:-1]) / pb[:-1]
This will be much faster than your current implementation, not only because you avoid converting a list to an ndarray, but because vectorized numpy operations are typically orders of magnitude faster than Python-level iteration for mathematical and batch operations.
numpy.zeros
Return a new array of given shape and type, filled with zeros.
or
numpy.ones
Return a new array of given shape and type, filled with ones.
or
numpy.empty
Return a new array of given shape and type, without initializing entries.
I am trying to create a 3D array in Python using Numpy by repeating a 2D array along a third dimension. I am quite new to Numpy multidimensional arrays and am probably missing something important here.
In this example I am trying to make a 10x10x20 3D array from a base 2D array (10x10) by copying it 20 times.
My starting 2D array:
from numpy import zeros
a = zeros((10, 10))  # the shape must be passed as a tuple
for i in range(0, 9):
    a[i+1, i] = 1
What I tried to create 3D array:
b = zeros(20)
for i in range(0, 19):
    b[i] = a
This approach is probably stupid. So what is correct way to approach construction of 3D arrays from base 2D arrays?
Cheers.
Edit
Well, I was probably doing things wrong because of my R background.
Here is how I did it finally
b = zeros(20*10*10)
b = b.reshape((20, 10, 10))
for i in b:
    for m in range(0, 9):
        i[m+1, m] = 1
Are there any other ways to do the same?
There are many ways to construct multidimensional arrays.
If you want to construct a 3D array from given 2D arrays you can do something like
import numpy
# just some 2D arrays with shape (10,20)
a1 = numpy.ones((10,20))
a2 = 2* numpy.ones((10,20))
a3 = 3* numpy.ones((10,20))
# creating 3D array with shape (3,10,20)
b = numpy.array((a1,a2,a3))
Depending on the situation, there are other, faster ways. However, as long as you use built-in constructors instead of Python loops, you are on the fast side.
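One such built-in is numpy.stack, which joins same-shaped arrays along a new axis. With the default axis=0 it gives the same result as numpy.array((a1, a2, a3)), and the axis argument chooses where the new dimension goes. A sketch reusing the arrays above:

```python
import numpy

# The same 2D arrays with shape (10, 20) as above.
a1 = numpy.ones((10, 20))
a2 = 2 * numpy.ones((10, 20))
a3 = 3 * numpy.ones((10, 20))

b0 = numpy.stack((a1, a2, a3))          # shape (3, 10, 20), same as numpy.array((a1, a2, a3))
b2 = numpy.stack((a1, a2, a3), axis=2)  # shape (10, 20, 3): the new axis goes last
```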
For your concrete example in the Edit, I would use numpy.tri:
c = numpy.zeros((20,10,10))
c[:] = numpy.tri(10,10,-1) - numpy.tri(10,10,-2)
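Another option for the same example, offered as a sketch: numpy.eye with k=-1 produces the 10x10 matrix with ones on the first subdiagonal directly, and numpy.tile or numpy.broadcast_to repeats it along a new leading axis. broadcast_to returns a read-only view without copying, which is enough if the 3D array is only read; tile makes a writable copy for when each slice must be changed independently.

```python
import numpy

# Ones at [i+1, i], i.e. the first subdiagonal (same as the loop in the question).
base = numpy.eye(10, k=-1)

# Writable copy with shape (20, 10, 10).
c_tile = numpy.tile(base, (20, 1, 1))

# Read-only view with shape (20, 10, 10); the data of base is not duplicated.
c_view = numpy.broadcast_to(base, (20, 10, 10))

# The numpy.tri construction from above, for comparison.
c_tri = numpy.zeros((20, 10, 10))
c_tri[:] = numpy.tri(10, 10, -1) - numpy.tri(10, 10, -2)
```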
I came across a similar problem.
I needed to turn a 2D array into a 3D array like so:
(y, x) -> (y, x, 3).
Here are a couple of solutions to this problem.
Solution 1
Using a Python loop
array_3d = numpy.zeros(list(array_2d.shape) + [3], 'f')
for z in range(3):
    array_3d[:, :, z] = array_2d.copy()
Solution 2
Using numpy functions
array_3d = numpy.stack([array_2d.copy(), ]*3, axis=2)
That is what I came up with. If someone who knows numpy better has a better solution, I would love to see it! Both work, but I suspect there is a better way performance-wise.
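One loop-free alternative, as a sketch: add a trailing axis with array_2d[:, :, numpy.newaxis] (shape (y, x, 1)) and repeat it three times along that axis with numpy.repeat. If the result is only read, numpy.broadcast_to gives the same values as a read-only view without copying. The .copy() calls in both solutions above are not needed, since assignment into array_3d and numpy.stack already copy the data. The small input array below is made up for illustration.

```python
import numpy

array_2d = numpy.arange(6, dtype='f').reshape(2, 3)  # illustrative (y, x) = (2, 3) array

# (y, x) -> (y, x, 1) -> (y, x, 3): a new, writable array.
array_3d = numpy.repeat(array_2d[:, :, numpy.newaxis], 3, axis=2)

# Read-only view with the same values and shape; nothing is copied.
array_3d_view = numpy.broadcast_to(array_2d[:, :, numpy.newaxis], array_2d.shape + (3,))
```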
I'm trying to optimize an algorithm to reduce memory usage, and I've identified this particular operation as a pain point.
I have a symmetric matrix, an index array along the rows, and another index array along the columns (which is just all values that I wasn't selecting in the row index). I feel like I should just be able to pass in both indexes at the same time, but I find myself being forced to select along one axis and then the other, which is causing some memory issues because I don't actually need the copy of the array that's returned, just statistics I'm calculating from it. Here's what I am trying to do:
from scipy.spatial.distance import pdist, squareform
from sklearn import datasets
import numpy as np
iris = datasets.load_iris().data
dx = pdist(iris)
mat = squareform(dx)
outliers = [41,62,106,108,109,134,135]
inliers = np.setdiff1d( range(iris.shape[0]), outliers)
# What I want to be able to do:
scores = mat[inliers, outliers].min(axis=0)
Here's what I'm actually doing to make this work:
# What I'm being forced to do:
s1 = mat[:,outliers]
scores = s1[inliers,:].min(axis=0)
Because I'm using fancy indexing, s1 is a new array instead of a view. I only need this array for one operation, so if I could avoid returning a copy here, or at least make the new array smaller (i.e. by applying the second fancy index during the first selection instead of doing two separate fancy-index operations), that would be preferable.
Broadcasting also applies to indexing. You can convert inliers into a column vector (e.g. inliers.reshape(-1,1) or inliers[:, np.newaxis], so it has shape (m,1)) and use it as the first index of mat; it then broadcasts against outliers and selects the whole len(inliers) x len(outliers) block in one step:
s1 = mat[inliers.reshape(-1,1), outliers]
scores = s1.min(axis=0)
There's a better way in terms of readability:
result = mat[np.ix_(inliers, outliers)].min(0)
https://docs.scipy.org/doc/numpy/reference/generated/numpy.ix_.html#numpy.ix_
Try:
outliers = np.array(outliers) # just to be sure they are arrays
result = mat[inliers[:, np.newaxis], outliers[np.newaxis, :]].min(0)
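A small check that the three single-step forms above agree with the two-step version. To keep it self-contained, a random symmetric distance matrix built with broadcasting stands in for squareform(pdist(iris)), and the outlier indices are made up. The combined index only materializes the len(inliers) x len(outliers) block rather than the full-height mat[:, outliers] intermediate; it is still a copy, since fancy indexing never returns a view.

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.normal(size=(12, 4))
# Symmetric pairwise Euclidean distances (stand-in for squareform(pdist(...))).
mat = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))

outliers = np.array([2, 5, 9])  # hypothetical outlier rows
inliers = np.setdiff1d(np.arange(mat.shape[0]), outliers)

# Two separate fancy-index operations (the question's workaround).
two_step = mat[:, outliers][inliers, :].min(axis=0)

# Single-step forms from the answers above.
via_reshape = mat[inliers.reshape(-1, 1), outliers].min(axis=0)
block = mat[np.ix_(inliers, outliers)]  # shape (len(inliers), len(outliers))
via_ix = block.min(axis=0)
via_newaxis = mat[inliers[:, np.newaxis], outliers[np.newaxis, :]].min(axis=0)
```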
I have a 2D numpy array and a list of lists of indices for which I wish to compute the sum of the corresponding 1D vectors from the numpy array. This can be easily done through a for loop or via list comprehension, but I wonder if it's possible to vectorize it. With similar code I gain about 40x speedups from the vectorization.
Here's sample code:
import numpy as np
indices = [[1,2],[1,3],[2,0,3],[1]]
array_2d = np.array([[0.5, 1.5],[1.5,2.5],[2.5,3.5],[3.5,4.5]])
soln = [np.sum(array_2d[x], axis=-1) for x in indices]
(edit): Note that the indices are not (x,y) coordinates for array_2d; instead, indices[0] = [1,2] selects rows 1 and 2 (zero-based) of array_2d. The number of elements in each list in indices can vary.
This is what I would hope to be able to do:
vectorized_soln = np.sum(array_2d[indices[:]], axis=-1)
Does anybody know if there are any ways of achieving this?
First of all, I think you have a typo in the third element of indices...
The easy way to do that is to build a sub-array with two arrays of indices:
i = np.array([1,1,2])
j = np.array([2,3,?])
sub_arr2d = array_2d[i,j]
Finally, you can take the sum of sub_arr2d.
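Following the question's edit (each inner list holds a variable number of row indices), a loop-free sketch flattens the lists once with np.concatenate and then works on the flat index array. Because the comprehension as written sums along axis=-1, it returns a ragged list of per-row sums; part (a) reproduces that with np.split. Part (b) uses np.add.reduceat for the case where the selected rows should be added together into one vector per group, which is an assumption about the intended meaning. Only the small setup step (taking len of each inner list) still iterates in Python.

```python
import numpy as np

indices = [[1, 2], [1, 3], [2, 0, 3], [1]]
array_2d = np.array([[0.5, 1.5], [1.5, 2.5], [2.5, 3.5], [3.5, 4.5]])

# Flatten the ragged index lists once.
lengths = np.array([len(x) for x in indices])
flat = np.concatenate(indices)                            # array([1, 2, 1, 3, 2, 0, 3, 1])
starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))  # start offset of each group

# (a) What the question's comprehension computes: per-row sums, grouped.
row_sums = array_2d.sum(axis=-1)
grouped_row_sums = np.split(row_sums[flat], starts[1:])

# (b) If the selected rows should be added together: one vector per group.
# Assumes every inner list is non-empty; reduceat does not give zeros for empty groups.
group_vector_sums = np.add.reduceat(array_2d[flat], starts, axis=0)
```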