In the documentation for NumPy's multivariate normal sampling function, the following example is given:
x,y = np.random.multivariate_normal(mean,cov,5000).T
My question is rather basic: what does the final .T actually do?
Thanks a lot. I know it is fairly simple, but it is hard to search Google for ".T".
The .T accesses the attribute T of the object, which happens to be a NumPy array. The T attribute is the transpose of the array; see the documentation.
Apparently you are creating random coordinates in the plane. The output of multivariate_normal() might look like this:
>>> np.random.multivariate_normal([0, 0], [[1, 0], [0, 1]], 5)
array([[ 0.59589335, 0.97741328],
[-0.58597307, 0.56733234],
[-0.69164572, 0.17840394],
[-0.24992978, -2.57494471],
[ 0.38896689, 0.82221377]])
The transpose of this matrix is:
array([[ 0.59589335, -0.58597307, -0.69164572, -0.24992978, 0.38896689],
[ 0.97741328, 0.56733234, 0.17840394, -2.57494471, 0.82221377]])
which can be conveniently separated into x and y parts by sequence unpacking.
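To make the unpacking step concrete, here is a minimal sketch. The mean and covariance values are assumed for illustration only, and the seed is there just to make the run repeatable.

```python
import numpy as np

mean = [0, 0]                  # assumed example values
cov = [[1, 0], [0, 1]]

np.random.seed(0)              # only so the run is repeatable
samples = np.random.multivariate_normal(mean, cov, 5000)
print(samples.shape)           # (5000, 2): one row per sample

# Transposing gives shape (2, 5000), so unpacking yields one row for x and one for y.
x, y = samples.T
print(x.shape, y.shape)        # (5000,) (5000,)
```

Without the .T, the unpacking would try to split 5000 rows into two names and fail.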
.T is just np.transpose().
Best of luck
Example
import numpy as np
a = [[1, 2, 3]]
b = np.array(a).T  # ndarray.T is the transposed array: [[1, 2, 3]] -> [[1], [2], [3]]
print("a=", a, "\nb=", b)
for i in range(3):
    print(" a=", a[0][i])  # prints 1 2 3
for i in range(3):
    print(" b=", b[i][0])  # prints 1 2 3
Related
How to convert this numpy array:
array = np.array([[np.array([1]),2],[np.array([1]),2],[np.array([1]),2]], dtype=object)
print (array)
[[array([1]) 2]
[array([1]) 2]
[array([1]) 2]]
to this numpy array:
print(array)
[[1 2]
[1 2]
[1 2]]
How can I achieve this without a for loop?
This is what I tried but it doesn't work:
first_col = array[:,0]
first_col = np.array([i[0] for i in first_col])
I don't even know if answering this is a good idea, since there must be a fundamental flaw in the design to even end up in a situation like this. The correct solution would be to fix that, rather than trying to mitigate the issue by converting the output.
Nevertheless, given the data, it is interestingly enough possible to 'unpack' the structure using the numpy array method .astype():
import numpy as np
array = np.array([[np.array([1]),2],[np.array([1]),2],[np.array([1]),2]], dtype=object)  # dtype=object: recent NumPy may refuse to build this mixed array implicitly
array = array.astype(int) # alt array = array.astype(float)
But, as stated above, this is treating the symptom of the problem, rather than the problem itself.
Try using map to convert it to a list (in Python 3, map returns an iterator, so wrap it in list()):
list(map(lambda x: [list(x[0]), x[1]], array))
This is one way.
import numpy as np
arr = np.array([[np.array([1]),2],
                [np.array([1]),2],
                [np.array([1]),2]], dtype=object)
np.array([[i[0][0], i[1]] for i in arr])
# array([[1, 2],
# [1, 2],
# [1, 2]])
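For a conversion step without a Python-level loop, one possible sketch is shown below. It assumes every entry in the first column is a length-1 array and every entry in the second column is a plain integer. The object array is built explicitly, since recent NumPy versions may refuse to create such a mixed array implicitly. The name flat is made up for this example.

```python
import numpy as np

# Build the mixed array explicitly as an object array (setup only).
arr = np.empty((3, 2), dtype=object)
for row in range(3):
    arr[row, 0] = np.array([1])
    arr[row, 1] = 2

# Loop-free conversion:
first = np.concatenate(arr[:, 0])   # joins the length-1 arrays into one 1-D array
second = arr[:, 1].astype(int)      # plain Python ints become an integer array
flat = np.column_stack([first, second])
print(flat)
```

As noted in the answers above, this treats the symptom; avoiding the mixed array in the first place is the cleaner fix.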
I'm trying to create a function that will calculate the lattice distance (number of horizontal and vertical steps) between elements in a multi-dimensional numpy array. For this I need to retrieve the actual numbers from the indexes of each element as I iterate through the array. I want to store those values as numbers that I can run through a distance formula.
For the example array A
A=np.array([[1,2,3],[4,5,6],[7,8,9]])
I'd like to create a loop that iterates through each element and for the first element 1 it would retrieve a=0, b=0 since 1 is at A[0,0], then a=0, b=1 for element 2 as it is located at A[0,1], and so on...
My envisioned output is two numbers (corresponding to the two index values for that element) for each element in the array. So in the example above, it would be the two values that I am assigning to be a and b. I only will need to retrieve these two numbers within the loop (rather than save separately as another data object).
Any thoughts on how to do this would be greatly appreciated!
As I've become more familiar with the numpy and pandas ecosystem, it's become clearer to me that explicit iteration is usually the wrong tool because it is so much slower, and that a vectorized operation is best whenever possible. The style is less obvious and less Pythonic at first, but I've (anecdotally) gained ridiculous speedups with vectorized operations: more than 1000x in one case where I replaced a row-wise .apply(lambda) with a vectorized form.
@MSeifert's answer covers this much better and will be significantly more performant on a dataset of any real size.
A more general answer by @cs95 covers and compares alternatives to iteration in Pandas.
Original Answer
You can iterate through the values in your array with numpy.ndenumerate to get the indices of the values in your array.
Using the documentation above:
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
for index, values in np.ndenumerate(A):
    print(index, values)  # operate here
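Since the question asks for the two index values as separate numbers inside the loop, the index tuple can be unpacked directly. The sketch below assumes a hypothetical reference cell, target, chosen only for illustration, and stores the lattice distance of each element to it.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
target = (2, 1)   # hypothetical reference cell, chosen only for illustration

distances = {}
for (a, b), value in np.ndenumerate(A):
    # a and b are the row and column index of the current element
    distances[int(value)] = abs(a - target[0]) + abs(b - target[1])

print(distances)
```

Here a and b play exactly the role described in the question: a=0, b=0 for element 1, a=0, b=1 for element 2, and so on.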
You can do it using np.ndenumerate but generally you don't need to iterate over an array.
You can simply create a meshgrid (or open grid) to get all indices at once and you can then process them (vectorized) much faster.
For example
>>> x, y = np.mgrid[slice(A.shape[0]), slice(A.shape[1])]
>>> x
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
>>> y
array([[0, 1, 2],
[0, 1, 2],
[0, 1, 2]])
and these can be processed like any other array. So if your function that needs the indices can be vectorized you shouldn't do the manual loop!
For example to calculate the lattice distance for each point to a point say (2, 3):
>>> abs(x - 2) + abs(y - 3)
array([[5, 4, 3],
[4, 3, 2],
[3, 2, 1]])
For distances an ogrid would be faster. Just replace np.mgrid with np.ogrid:
>>> x, y = np.ogrid[slice(A.shape[0]), slice(A.shape[1])]
>>> np.hypot(x - 2, y - 3) # cartesian distance this time! :-)
array([[ 3.60555128, 2.82842712, 2.23606798],
[ 3.16227766, 2.23606798, 1.41421356],
[ 3. , 2. , 1. ]])
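As an alternative spelling of the same idea, np.indices builds both index grids from the shape in one call. This is only a sketch that repeats the lattice-distance calculation to the point (2, 3) used above; the names rows, cols and lattice are made up for the example.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# np.indices returns one index array per dimension, each shaped like A.
rows, cols = np.indices(A.shape)

# Lattice (Manhattan) distance of every cell to the point (2, 3).
lattice = np.abs(rows - 2) + np.abs(cols - 3)
print(lattice)
```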
Another possible solution:
import numpy as np
A=np.array([[1,2,3],[4,5,6],[7,8,9]])
for _, val in np.ndenumerate(A):
    ind = np.argwhere(A == val)
    print(val, ind)
If a value appears more than once in the array, ind contains the indices of every occurrence.
Possible Duplicate:
How to initialize a two-dimensional array in Python?
In solving a simple problem regarding a two-dimensional array, I came across a solution on this site that explained how to declare one in Python using the overloaded * operator on lists.
Example:
Myarray = [[0]*3]*3
this would produce the following array (list)
[[0,0,0],[0,0,0],[0,0,0]]
This seems fine until you use it:
if you assign an element for example:
Myarray [0][0] = 1
you get the unexpected output:
[[1,0, 0],[1,0,0] , [1,0,0]]
In effect assigning Myarray [1][0] and Myarray[2][0] at the same time
My solution:
Myarray = [[], [], []]
for i in range(0, 3):
    for j in range(0, 3):
        Myarray[i].append(0)
This solution works as intended:
Myarray[0][0] = 1
gives you
[[1,0, 0],[0,0,0] , [0,0,0]]
Is there a simpler way to do this? This was a solution to an A level Cambridge question and seems too long-winded for students compared to other languages.
With vanilla Python, you could use this, a nested list comprehension
>>> m = [[0 for y in range(3)] for x in range(3)]
>>> m
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
Unlike the multiplied list you showed in your example, it has the desired behavior
>>> m[1][0] = 99
>>> m
[[0, 0, 0], [99, 0, 0], [0, 0, 0]]
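The difference between the two forms comes down to object identity, which the following sketch makes visible. The variable names are made up for the example.

```python
rows = 3
cols = 3

# The outer * copies references to ONE inner list, so all rows are the same object.
shared = [[0] * cols] * rows
print(shared[0] is shared[1])             # True

# The comprehension evaluates [0] * cols once per row, giving separate lists.
independent = [[0] * cols for _ in range(rows)]
print(independent[0] is independent[1])   # False

independent[0][0] = 1
print(independent)                        # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```

Using [0] * cols for the inner list is safe because integers are immutable; only the outer multiplication causes the aliasing.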
However, for serious use of multidimensional arrays and/or numerical programming, I'd suggest you use Numpy arrays.
If I have a 1D numpy.ndarray a and a Python function f that I want to vectorize, this is very easy using the numpy.vectorize function:
c = numpy.vectorize(f)(a)
But if f returns a 1D numpy.ndarray instead of a scalar, how can I build a 2D numpy.ndarray instead? (That is, I want every 1D numpy.ndarray returned from f to become a row in the new 2D numpy.ndarray.)
Example:
def f(x):
    return x * x
a = numpy.array([1,2,3])
c = numpy.vectorize(f)(a)
def f_1d(x):
    return numpy.array([x, x])
a = numpy.array([1,2,3])
d = ???(f_1d)(a)
In the above example c would become array([1, 4, 9]). What should ??? be replaced with if d should become array([[1, 1], [2, 2], [3, 3]])?
Could do this instead:
def f_1d(x):
    return (x,x)
d = numpy.column_stack(numpy.vectorize(f_1d)(a))
will output:
array([[1, 1],
[2, 2],
[3, 3]])
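Another option that keeps the original f_1d unchanged is the signature argument of numpy.vectorize, available in NumPy 1.12 and later. The sketch below assumes f_1d takes a scalar and returns a 1D array, which the signature "()->(n)" expresses.

```python
import numpy

def f_1d(x):
    # Returns a 1D array for a scalar input, as in the question.
    return numpy.array([x, x])

a = numpy.array([1, 2, 3])

# "()->(n)": scalar in, 1D array of some length n out; results are stacked as rows.
d = numpy.vectorize(f_1d, signature="()->(n)")(a)
print(d)
```

Like plain numpy.vectorize, this is a convenience wrapper around a Python loop, so it is not expected to be faster than one.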
I think you're looking for reshape and repeat
def f(x):
    return x * x
a = numpy.array([1,2,3])
b = numpy.vectorize(f)(a)
c = numpy.repeat(b.reshape((-1, 1)), 2, axis=1)
print(c)
output:
[[1 1]
[4 4]
[9 9]]
You can also just set the array.shape tuple directly.
It may be worthwhile to know that you can accomplish the same as vectorize using map, if you ever need to write pure Python. b = numpy.vectorize(f)(a) would become b = list(map(f, a)); in Python 3, map alone returns an iterator rather than a list.
Using this kind of approach, it becomes unnecessary to have your f_1d at all, since all it seems to do is duplicate information, which is done best by numpy.repeat.
Also, this version is a bit faster, but this only matters if you're dealing with large arrays.
I have this piece of Python code that fills up a 2d matrix in a for loop
img = zeros((len(bins_x), len(bins_y)))
for i in arange(0, len(ix)):
    img[ix[i]][iy[i]] = dummy[i]
Is it possible to use a vectorial operation for the last two lines of code? Is there also something that might speed up the calculation?
If ix, iy are index sequences:
img[ix, iy] = dummy
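A small check that the indexed assignment matches the loop is sketched below. The data is made up; only the names ix, iy, dummy and img mirror the question.

```python
import numpy as np

# Made-up sample data; the names mirror those in the question.
ix = np.array([0, 2, 1])
iy = np.array([1, 0, 3])
dummy = np.array([10.0, 20.0, 30.0])

# Original loop version.
img_loop = np.zeros((3, 4))
for i in range(len(ix)):
    img_loop[ix[i]][iy[i]] = dummy[i]

# Vectorized version: each (ix[k], iy[k]) pair receives dummy[k].
img_vec = np.zeros((3, 4))
img_vec[ix, iy] = dummy

print(np.array_equal(img_loop, img_vec))
```

This requires ix and iy to be integer sequences of equal length, as the answer's condition states.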
It might be useful to use numpy. In particular, the reshape method might be useful. Here is an example (adapted from the second link):
>>> import numpy as np
>>> a = np.array([1,2,3,4,5,6])
>>> np.reshape(a, (3,2))
array([[1, 2],
[3, 4],
[5, 6]])