I have a multidimensional numpy array.
The first array indicates the quality of the data. 0 is good, 1 is not so good.
For a first check I only want to use good data.
How do I split the array into two new ones?
My own idea does not work:
good_data = [x for x in data[0,:] if x = 1.0]
bad_data = [x for x in data[0,:] if x = 0.0]
Here is a small example indicating my problem:
import numpy as np
flag = np.array([0., 0., 0., 1., 1., 1.])
temp = np.array([300., 310., 320., 300., 222., 333.])
pressure = np.array([1013., 1013., 1013., 900., 900., 900.])
data = np.array([flag, temp, pressure])
good_data = data[0,:][data[0,:] == 1.0]
bad_data = data[0,:][data[0,:] == 0.0]
print(good_data)
The print statement gives me [1., 1., 1.].
But I am looking for [[1., 1., 1.], [300., 222., 333.], [900., 900., 900.]].
Is this what you are looking for?
good_data = data[0,:][data[0,:] == 1.0]
bad_data = data[0,:][data[0,:] == 0.0]
This returns a numpy.array.
Alternatively, you can do as you suggested, but convert the resulting list to numpy.array:
good_data = np.array([x for x in data[0,:] if x == 1.0])
Notice the comparison operator == in place of the assignment operator =.
For your particular example, subset data using flag == 1 while iterating over the first index:
good_data = [data[n,:][flag == 1] for n in range(data.shape[0])]
If you really want the elements of good_data to be lists, convert inside the comprehension:
good_data = [data[n,:][flag == 1].tolist() for n in range(data.shape[0])]
Thanks to Jaime who pointed out that the easy way to do this is:
good_data = data[:, data[0] == 1]
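As a sketch, assuming the flag row is data[0] and that the columns flagged 1.0 are the ones you want (as in the expected output), one boolean mask and its negation give both halves in one go:

```python
import numpy as np

flag = np.array([0., 0., 0., 1., 1., 1.])
temp = np.array([300., 310., 320., 300., 222., 333.])
pressure = np.array([1013., 1013., 1013., 900., 900., 900.])
data = np.array([flag, temp, pressure])

# Boolean mask over the columns, built from the flag row
mask = data[0] == 1.0

# Keep every row; select only the columns where the mask is True / False
good_data = data[:, mask]
bad_data = data[:, ~mask]

print(good_data)
print(bad_data)
```

Both results stay 2-D arrays, here with shape (3, 3): one row per quantity, one column per measurement.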
I was trying to speed up my distance calculation with this code:
import numpy as np
import pandas as pd
import torch
from numpy import genfromtxt
points = genfromtxt(path, delimiter='\t', usecols=[0, 1])
max_id = len(points)
points = torch.tensor(points)
d = pd.DataFrame(np.zeros((max_id, max_id)))
dis = torch.cdist(points, points)
n = 0
for i in range(max_id):
    print(i)
    for j in range(i + 1, max_id):
        d.at[i, j] = dis[n]
        d.at[j, i] = d.at[i, j]
        n += 1
I got:
d.at[i, j] = dis[n]
ValueError: only one element tensors can be converted to Python scalars
import torch
points = torch.randint(1,5,(4,2)).float() # You already have this
print(points)
dis = torch.cdist(points,points)
print(dis)
Output
tensor([[2., 1.],
[1., 2.],
[3., 4.],
[4., 4.]])
tensor([[0.0000, 1.4142, 3.1623, 3.6056],
[1.4142, 0.0000, 2.8284, 3.6056],
[3.1623, 2.8284, 0.0000, 1.0000],
[3.6056, 3.6056, 1.0000, 0.0000]])
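The snippet above shows how cdist works: it already returns the full pairwise distance matrix, so dis[i, j] is the scalar distance between points[i] and points[j]. In your loop, dis[n] is an entire row of that matrix, not a single number, which is why the conversion to a Python scalar fails. You don't need to do anything extra; you already have the distance matrix in dis.
If you want the dis tensor as a NumPy array, simply call its numpy() method: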
dis_np = dis.numpy()
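If torch isn't needed elsewhere, here is a NumPy-only sketch (a swapped-in technique, not what the answer uses) that computes the same matrix torch.cdist returns with its default p=2, and shows that each distance is a plain 2-D lookup. The helper name pairwise_euclidean is hypothetical:

```python
import numpy as np

def pairwise_euclidean(points):
    """Full (N, N) matrix of Euclidean distances between the rows of points."""
    # diff[i, j] = points[i] - points[j], shape (N, N, D) via broadcasting
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Same four points as in the torch example above
points = np.array([[2., 1.], [1., 2.], [3., 4.], [4., 4.]])
dis = pairwise_euclidean(points)

# dis[i, j] is already the scalar distance between points[i] and points[j];
# no loop with a running counter n is needed to look it up.
print(dis[0, 2])
```

To get the DataFrame from the question, wrap the whole matrix at once with pd.DataFrame(dis), or pd.DataFrame(dis.numpy()) for the torch tensor, instead of filling d cell by cell.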
How can I stack arrays in an alternating fashion? Consider the following example with three arrays:
import numpy as np
one = np.ones((5, 2, 2))
two = np.ones((5, 2, 2))*2
three = np.ones((5, 2, 2))*3
I would like to create a new array result with shape (15, 2, 2) which is formed by alternately taking a slice from each of the given arrays, i.e. the result should look like:
result[0] = one[0]
result[1] = two[0]
result[2] = three[0]
result[3] = one[1]
result[4] = two[1]
result[5] = three[1]
result[6] = one[2]
etc...
The arrays above are just an example to illustrate the question; I am not looking for a way to create this specific result array. What is the easiest way to achieve this, ideally with a way to specify the stacking axis?
Of course, it is possible to do some loops but it seems rather inconvenient...
You may want to take a look at np.stack(), i.e.:
np.stack([one, two, three], axis=1).reshape(15, 2, 2)
With np.hstack and then reshape (using -1 for the first axis, followed by the lengths along the last two axes, for a generic solution):
np.hstack([one,two,three]).reshape((-1,)+one.shape[1:])
I think you are looking for np.vstack
np.vstack((one,two,three))
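Read more about it here: np.vstack. Note that vstack concatenates the arrays block-wise (all slices of one, then all of two, then all of three) rather than alternating them, so it gives the requested shape but not the requested order.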
With selectable axis:
# example arrays
a,b,c = np.multiply.outer([1,2,3],np.ones((5,2,2)))
# axis
k = 1
np.stack([a,b,c],k+1).reshape(*(-(k==j) or s for j,s in enumerate(a.shape)))
# array([[[1., 1.],
# [2., 2.],
# [3., 3.],
# [1., 1.],
# [2., 2.],
# [3., 3.]],
#
# [[1., 1.],
...
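A small generic helper along the lines of the answers above, assuming all input arrays share the same shape and axis is non-negative; the name interleave is hypothetical. It stacks along a new axis just after the chosen one and then merges the two, so the slices alternate:

```python
import numpy as np

def interleave(arrays, axis=0):
    """Alternate slices of equally shaped arrays along `axis` (non-negative)."""
    # New axis right after `axis`: shape (..., n, len(arrays), ...)
    stacked = np.stack(arrays, axis=axis + 1)
    # Merging those two axes makes slices alternate: a[0], b[0], c[0], a[1], ...
    shape = list(arrays[0].shape)
    shape[axis] *= len(arrays)
    return stacked.reshape(shape)

one = np.ones((5, 2, 2))
two = np.ones((5, 2, 2)) * 2
three = np.ones((5, 2, 2)) * 3

result = interleave([one, two, three])
print(result.shape)
```

With axis=1 the same helper gives shape (5, 6, 2), matching the selectable-axis answer above.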
I have an array:
MDP= [[0.705,.655,0.614,0.388],[0.762,None,0.660,-1],[0.812,.868,0.918,+1]]
How can I apply np.around to the array above without getting an error for the None and -1, +1 values?
TIA
Make sure that you work with a numpy array, not lists of lists:
np.around(np.array(MDP).astype(float))
#array([[ 1., 1., 1., 0.],
# [ 1., nan, 1., -1.],
# [ 1., 1., 1., 1.]])
You can convert the result back to a nested list with .tolist(), if needed.
My solution is to special-case values that are None (NoneType), which can be done fairly compactly with a lambda function.
If your array is 1D:
flex_round = lambda array: [None if x is None else np.round(x) for x in array]
If your array is 2D:
flex_round = lambda array: [[None if x is None else np.round(x) for x in y] for y in array]
Do not forget to add the decimals argument to the np.round call to specify how many digits should remain after the decimal point.
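Combining the two answers, here is a sketch that rounds the whole array in one call (decimals=1 is an arbitrary choice for illustration) and then turns nan back into None for a nested-list result:

```python
import numpy as np

MDP = [[0.705, .655, 0.614, 0.388], [0.762, None, 0.660, -1], [0.812, .868, 0.918, +1]]

# dtype=float maps None to nan, so the whole array can be rounded at once
arr = np.array(MDP, dtype=float)
rounded = np.around(arr, decimals=1)

# Back to nested lists, restoring None where the input had None
result = [[None if np.isnan(v) else v for v in row] for row in rounded.tolist()]
print(result)
```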
I am very new to Python (in the past I used Mathematica, Maple, or Matlab scripts). I am very impressed by how NumPy can evaluate functions over arrays, but I am having problems implementing this in several dimensions. My question is very simple (please don't laugh): is there a more elegant and efficient way to evaluate some function f (which is defined over R^2) without using loops?
import numpy
def evaluate_on_grid(f):  # wrapper (name is illustrative) so that `return M` is valid
    M = numpy.zeros((10, 10))
    for i in range(0, 10):
        for j in range(0, 10):
            M[i, j] = f(i, j)
    return M
The goal when coding with numpy is to implement your computation on the whole array, as much as possible. So if your function is, for example, f(x,y) = x**2 + 2*y and you want to apply it to all integer pairs x,y in [0,10)x[0,10) (the same grid as your loop), do:
import numpy as np
x,y = np.mgrid[0:10, 0:10]
fxy = x**2 + 2*y
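NumPy can also build the index grids for you: np.fromfunction calls f once with whole index arrays (floats by default), so it only works when f is written with array operations, as above. A minimal sketch with the same example function:

```python
import numpy as np

def f(x, y):
    # Written with array arithmetic, so x and y may be whole arrays
    return x**2 + 2*y

# f is called a single time with x, y of shape (10, 10), not once per cell
fxy = np.fromfunction(f, (10, 10))

# The same values from the explicit double loop in the question
M = np.zeros((10, 10))
for i in range(10):
    for j in range(10):
        M[i, j] = f(i, j)
```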
If you don't find a way to express your function in such a way, then:
Ask how to do it (and state explicitly the function definition)
use numpy.vectorize
Same example using vectorize:
def f(x,y): return x**2 + 2*y
x,y = np.mgrid[0:10, 0:10]
fxy = np.vectorize(f)(x.ravel(),y.ravel()).reshape(x.shape)
Note that in practice I only use vectorize, similarly to Python's map, when the contents of the arrays are not numbers. A typical example is computing the length of every list in an array of lists:
# construct a sample object array whose elements are lists of different lengths
list_of_lists = np.empty(1000, dtype=object)
for i in range(1000):
    list_of_lists[i] = list(range(i))
print(np.vectorize(len)(list_of_lists))
# lengths 0, 1, ..., 999
Yes, many numpy functions operate on N-dimensional arrays. Take this example:
>>> M = numpy.zeros((3,3))
>>> M[0][0] = 1
>>> M[2][2] = 1
>>> M
array([[ 1., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 1.]])
>>> M > 0.5
array([[ True, False, False],
[False, False, False],
[False, False, True]], dtype=bool)
>>> numpy.sum(M)
2.0
Note the difference between numpy.sum, which operates on N-dimensional arrays, and the built-in sum, which only iterates over the first axis (one level deep):
>>> sum(M)
array([ 1., 0., 1.])
So if you build your function f() out of operations that work on n-dimensional arrays, then f() itself will work on n-dimensional arrays.
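For example, a piecewise function written with an if statement fails on arrays, because the truth value of an array is ambiguous; expressing the branch with np.where keeps it array-friendly. A sketch with a made-up function f (hypothetical, for illustration only):

```python
import numpy as np

def f(x, y):
    # Hypothetical piecewise function: x*y where x > y, otherwise x + y.
    # np.where picks elementwise, so this works for scalars and N-d arrays alike.
    return np.where(x > y, x * y, x + y)

x, y = np.mgrid[0:10, 0:10]
M = f(x, y)       # whole 10x10 grid in one call
print(f(3, 2))    # scalars still work
```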
You can also use numpy multi-dimension slicing, like below. You just provide slices for each dimension:
import numpy as np
arr = np.zeros((5,5)) # 5 rows, 5 columns
# update only first column
arr[:,0] = 1
# update only last row ... same as arr[-1] = 1
arr[-1,:] = 1
# update center
arr[1:-1, 1:-1] = 1
print(arr)
output:
[[1. 0. 0. 0. 0.]
 [1. 1. 1. 1. 0.]
 [1. 1. 1. 1. 0.]
 [1. 1. 1. 1. 0.]
 [1. 1. 1. 1. 1.]]
A pure-Python answer, not depending on NumPy tools, is to loop over the Cartesian product of the two index ranges:
from itertools import product
for i, j in product(range(0, 10), range(0, 10)):
    M[i,j]=f(i,j)
Edit: Actually, I should have read the question more carefully. This still uses a loop, just one fewer level of nesting.
I'm using genfromtxt to import essentially a 2D array that has all its values listed in a text file of the form (x's and y's are integers):
x1 y1 z1
x2 y2 z2
: : :
I'm using the for loop below but I'm pretty sure there must be a one line way to do it. What would be a more efficient way to do this conversion?
from numpy import genfromtxt, zeros
raw = genfromtxt(file, skip_header=6)
xrange = (int(raw[:,0].min()), int(raw[:,0].max()))
yrange = (int(raw[:,1].min()), int(raw[:,1].max()))
Z = zeros((xrange[1] - xrange[0] + 1, yrange[1] - yrange[0] + 1))
for row in raw:
    Z[int(row[0]) - xrange[0], int(row[1]) - yrange[0]] = row[2]
You can replace the for loop with the following:
xidx = (raw[:,0]-xrange[0]).astype(int)
yidx = (raw[:,1]-yrange[0]).astype(int)
Z[xidx, yidx] = raw[:,2]
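A self-contained sketch of that replacement, assuming integer x and y columns; an in-memory io.StringIO with a made-up two-line header stands in for the real file (so skip_header=2 here instead of 6):

```python
import io
import numpy as np

# Made-up sample input: two header lines, then "x y z" rows
text = """header line 1
header line 2
3 10 1.5
4 10 2.5
3 11 3.5
"""
raw = np.genfromtxt(io.StringIO(text), skip_header=2)

# Integer offsets from the smallest x and y
xmin, ymin = raw[:, 0].min(), raw[:, 1].min()
xidx = (raw[:, 0] - xmin).astype(int)
yidx = (raw[:, 1] - ymin).astype(int)

# Size the grid from the largest index, then fill it in one fancy-index assignment
Z = np.zeros((xidx.max() + 1, yidx.max() + 1))
Z[xidx, yidx] = raw[:, 2]
print(Z)
```

Cells with no matching row in the input simply stay 0.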
To import a matrix from a file, you can just split each line and then convert the fields to int:
with open('myfile') as fh:
    matrix = [[int(i) for i in line.split()] for line in fh]
Of course, this assumes your file contains only the matrix.
Finally, you can convert this nested list to a NumPy array with numpy.array(matrix).
You may try something like this:
>>> from numpy import array, zeros
>>> Z = zeros((3, 3))
>>> test = array([[0, 1, 2], [1, 1, 6], [2, 0, 4]])
>>> Z[tuple(test[:, 0:2].T)]
array([0., 0., 0.])
>>> Z[tuple(test[:, 0:2].T)] = test[:, 2]
>>> Z
array([[ 0., 2., 0.],
[ 0., 6., 0.],
[ 4., 0., 0.]])
In your case:
Z[tuple((raw[:, 0:2] - raw[:, 0:2].min(axis=0)).astype(int).T)] = raw[:, 2]
You could also go with numpy.searchsorted, which also allows for non-equally spaced / float data:
import numpy
raw = numpy.genfromtxt(file, skip_header=6)
xvalues = numpy.unique(raw[:,0])  # sorted distinct x values
xidx = numpy.searchsorted(xvalues, raw[:,0])
yvalues = numpy.unique(raw[:,1])  # sorted distinct y values
yidx = numpy.searchsorted(yvalues, raw[:,1])
Z = numpy.zeros((len(xvalues), len(yvalues)))
Z[xidx, yidx] = raw[:,2]
Otherwise, I would be following Simon's answer.
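A close variant of the searchsorted approach: np.unique with return_inverse=True returns the sorted distinct values and, for every input entry, its position among them, which is the same index searchsorted computes. A sketch with made-up, unevenly spaced float coordinates:

```python
import numpy as np

# Made-up rows of (x, y, z) with unevenly spaced float coordinates
raw = np.array([
    [0.0, 1.5, 10.0],
    [0.7, 1.5, 20.0],
    [0.0, 4.0, 30.0],
    [2.3, 4.0, 40.0],
])

# Sorted distinct coordinates, plus each row's index into them
xvalues, xidx = np.unique(raw[:, 0], return_inverse=True)
yvalues, yidx = np.unique(raw[:, 1], return_inverse=True)

Z = np.zeros((len(xvalues), len(yvalues)))
Z[xidx, yidx] = raw[:, 2]
print(Z)
```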