Reshape and indexing in MATLAB and Python

I have code in MATLAB that I need to translate to Python. Shapes and indices really matter here, since the code works with tensors. I thought it would be enough to use order='F' in Python's reshape(), but when I work with 3D data I noticed that it does not. For example, if A is an array of the numbers 1 to 27 in Python
array([[[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9]],
[[10, 11, 12],
[13, 14, 15],
[16, 17, 18]],
[[19, 20, 21],
[22, 23, 24],
[25, 26, 27]]])
if I perform A.reshape(3, 9, order='F') I get
[[ 1 4 7 2 5 8 3 6 9]
[10 13 16 11 14 17 12 15 18]
[19 22 25 20 23 26 21 24 27]]
In MATLAB, for A = 1:27 reshaped to [3, 3, 3] and then to [3, 9], it seems that I get a different array:
1 4 7 10 13 16 19 22 25
2 5 8 11 14 17 20 23 26
3 6 9 12 15 18 21 24 27
And the SVD in MATLAB and Python gives different results. So, is there a way to fix this?
More generally, what is the correct way to move multidimensional arrays between MATLAB and Python? For example, should arange(1, 13).reshape(3, 4) in Python and reshape(1:12, [3, 4]) in MATLAB give the same SVD? Can I swap axes somehow in Python, or change the order of the axes passed to reshape(x1, x2, x3, ...), to get the same results as in MATLAB?

I was having the same issues until I found the Wikipedia article on row- and column-major order.
Python (and C) organizes data arrays in row-major order. As you can see in your first example, the elements first increase along the columns:
array([[[ 1, 2, 3],
- - - -> increasing
Then along the rows:
array([[[ 1, 2, 3],
[ 4, <--- new element
When all columns and rows are full, it moves to the next page.
array([[[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9]],
[[10, <-- new element in next page
In MATLAB (as in Fortran), the rows increase first, then the columns, and so on.
For N-dimensional arrays it looks like this:
Python (row major -> last dimension is contiguous): [dim1, dim2, ..., dimN]
MATLAB (column major -> first dimension is contiguous): the same tensor in memory looks the other way around: [dimN, ..., dim2, dim1]
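You can see which axis is contiguous in NumPy by looking at the strides (shown here assuming 8-byte integers; the exact numbers depend on the dtype):
import numpy as np
A = np.arange(27, dtype=np.int64).reshape(3, 3, 3)   # C (row-major) order
print(A.strides)           # (72, 24, 8)  -> last axis is contiguous
F = np.asfortranarray(A)   # column-major copy of the same data
print(F.strides)           # (8, 24, 72)  -> first axis is contiguous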
If you want to export n-dimensional arrays from Python to MATLAB, the easiest way is to permute the dimensions first:
(in Python)
import numpy as np
import scipy.io as sio

A = np.arange(1, 28).reshape(3, 3, 3)   # same values as range(1, 28), C (row-major) order
sio.savemat('A', {'A': A})              # writes A.mat (savemat appends the extension)
(in MATLAB)
load('A.mat')
A = permute(A, [3 2 1]);   % dimensions in reverse order
reshape(A, 9, 3)'          % gives the same result as A.reshape([3, 9]) in Python
Just notice that the (9, 3) and the (3, 9) are intentionally given in reverse order.
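If you only need MATLAB-compatible behaviour inside Python (without exporting), a minimal sketch is to use order='F' consistently, in the initial reshape as well as in any later one:
import numpy as np
# column-major throughout, mirroring MATLAB's reshape(1:27, 3, 3, 3)
A = np.arange(1, 28).reshape(3, 3, 3, order='F')
# mirrors MATLAB's reshape(A, 3, 9)
B = A.reshape(3, 9, order='F')
print(B)
# [[ 1  4  7 10 13 16 19 22 25]
#  [ 2  5  8 11 14 17 20 23 26]
#  [ 3  6  9 12 15 18 21 24 27]]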

In MATLAB
A = 1:27;
A = reshape(A,3,3,3);
B = reshape(A,9,3)'
B =
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27
size(B)
ans =
3 9
In Python
A = np.array(range(1,28))
A = A.reshape(3,3,3)
B = A.reshape(3,9)
B
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24, 25, 26, 27]])
np.shape(B)
(3, 9)

Find nearest index in one dataframe to another

I am new to Python and its libraries. I searched all the forums but could not find a proper solution. This is the first time I am posting a question here. Sorry if I did something wrong.
So, I have two DataFrames like the ones below, containing X Y Z coordinates (UTM) and other features.
In [2]: a = {
...: 'X': [1, 2, 5, 7, 10, 5, 2, 3, 24, 21],
...: 'Y': [3, 4, 8, 15, 20, 12, 23, 22, 14, 7],
...: 'Z': [12, 4, 9, 16, 13, 1, 8, 17, 11, 19],
...: }
...:
In [3]: b = {
...: 'X': [1, 8, 20, 7, 32],
...: 'Y': [6, 4, 17, 45, 32],
...: 'Z': [52, 12, 6, 8, 31],
...: }
In [4]: df1 = pd.DataFrame(data=a)
In [5]: df2 = pd.DataFrame(data=b)
In [6]: print(df1)
X Y Z
0 1 3 12
1 2 4 4
2 5 8 9
3 7 15 16
4 10 20 13
5 5 12 1
6 2 23 8
7 3 22 17
8 24 14 11
9 21 7 19
In [7]: print(df2)
X Y Z
0 1 6 52
1 8 4 12
2 20 17 6
3 7 45 8
4 32 32 31
I need to find the closest point (by distance) in df1 to each point of df2 and create a new DataFrame.
So I wrote the code below, which does find the closest point to df2.iloc[0].
In [8]: x = (
...: np.sqrt(
...: ((df1['X'].sub(df2["X"].iloc[0]))**2)
...: .add(((df1['Y'].sub(df2["Y"].iloc[0]))**2))
...: .add(((df1['Z'].sub(df2["Z"].iloc[0]))**2))
...: )
...: ).idxmin()
In [9]: x1 = df1.iloc[[x]]
In[10]: print(x1)
X Y Z
3 7 15 16
So, I guess I need a loop to iterate through df2 and apply the above code to each row. As a result I need a new, updated df1 containing all the closest points to each point of df2. But I couldn't make it work. Please advise.
This is actually a great example of a case where numpy's broadcasting rules have distinct advantages over pandas.
Manually aligning df1's coordinates as column vectors (by referencing df1[[col]].to_numpy()) and df2's coordinates as row vectors (df2[col].to_numpy()), we can get the distance from every element in each dataframe to each element in the other very quickly with automatic broadcasting:
In [26]: dists = np.sqrt(
...: (df1[['X']].to_numpy() - df2['X'].to_numpy()) ** 2
...: + (df1[['Y']].to_numpy() - df2['Y'].to_numpy()) ** 2
...: + (df1[['Z']].to_numpy() - df2['Z'].to_numpy()) ** 2
...: )
In [27]: dists
Out[27]:
array([[40.11234224, 7.07106781, 24.35159132, 42.61455151, 46.50806382],
[48.05205511, 10. , 22.29349681, 41.49698784, 49.12229636],
[43.23193264, 5.83095189, 17.74823935, 37.06750599, 42.29657197],
[37.58989226, 11.74734012, 16.52271164, 31.04834939, 33.74907406],
[42.40283009, 16.15549442, 12.56980509, 25.67099531, 30.85449724],
[51.50728104, 13.92838828, 16.58312395, 33.7934905 , 45.04442252],
[47.18050445, 20.32240143, 19.07878403, 22.56102835, 38.85871846],
[38.53569774, 19.33907961, 20.85665361, 25.01999201, 33.7194306 ],
[47.68647607, 18.89444363, 7.07106781, 35.48239 , 28.0713377 ],
[38.60051813, 15.06651917, 16.43167673, 41.96427052, 29.83286778]])
Argmin will now give you the correct vector of positional indices:
In [28]: dists.argmin(axis=0)
Out[28]: array([3, 2, 8, 6, 8])
Or, to select the appropriate values from df1:
In [29]: df1.iloc[dists.argmin(axis=0)]
Out[29]:
X Y Z
3 7 15 16
2 5 8 9
8 24 14 11
6 2 23 8
8 24 14 11
Edit
An answer popped up just after mine, then was deleted, which made reference to scipy.spatial.distance_matrix, computing dists with:
distance_matrix(df1[list('XYZ')].to_numpy(), df2[list('XYZ')].to_numpy())
Not sure why that answer was deleted, but this seems like a really nice, clean approach to getting the array I produced manually above!
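For reference, a minimal sketch of that approach, reusing df1 and df2 from above:
from scipy.spatial import distance_matrix
# rows index df1, columns index df2, same orientation as the manual dists above
dists = distance_matrix(df1[list('XYZ')].to_numpy(), df2[list('XYZ')].to_numpy())
df1.iloc[dists.argmin(axis=0)]   # same selection as In [29]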
Performance Note
Note that if you are just trying to get the closest value, there's no need to take the square root, as this is a costly operation compared to addition, subtraction, and powers, and sorting on dist**2 is still valid.
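Concretely, a sketch like this selects the same rows without the square root:
# squared distances: argmin picks the same rows because sqrt is monotonic
sq_dists = (
    (df1[['X']].to_numpy() - df2['X'].to_numpy()) ** 2
    + (df1[['Y']].to_numpy() - df2['Y'].to_numpy()) ** 2
    + (df1[['Z']].to_numpy() - df2['Z'].to_numpy()) ** 2
)
df1.iloc[sq_dists.argmin(axis=0)]   # identical result to the sqrt version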
First, you define a function that returns the closest point using numpy.where. Then you use the apply function to run through df2.
import pandas as pd
import numpy as np
a = {
'X': [1, 2, 5, 7, 10, 5, 2, 3, 24, 21],
'Y': [3, 4, 8, 15, 20, 12, 23, 22, 14, 7],
'Z': [12, 4, 9, 16, 13, 1, 8, 17, 11, 19]
}
b = {
'X': [1, 8, 20, 7, 32],
'Y': [6, 4, 17, 45, 32],
'Z': [52, 12, 6, 8, 31]
}
df1 = pd.DataFrame(a)
df2 = pd.DataFrame(b)
dist = lambda dx, dy, dz: np.sqrt(dx**2 + dy**2 + dz**2)

def closest(row):
    darr = dist(df1['X'] - row['X'], df1['Y'] - row['Y'], df1['Z'] - row['Z'])
    idx = np.where(darr == np.amin(darr))[0][0]
    return df1['X'][idx], df1['Y'][idx], df1['Z'][idx]

df2['closest'] = df2.apply(closest, axis=1)
print(df2)
Output:
X Y Z closest
0 1 6 52 (7, 15, 16)
1 8 4 12 (5, 8, 9)
2 20 17 6 (24, 14, 11)
3 7 45 8 (2, 23, 8)
4 32 32 31 (24, 14, 11)
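As a small variation (a sketch reusing dist and df1 from above): since darr is a pandas Series, Series.idxmin returns the index of the minimum directly, so the np.where lookup can be shortened:
def closest(row):
    darr = dist(df1['X'] - row['X'], df1['Y'] - row['Y'], df1['Z'] - row['Z'])
    idx = darr.idxmin()   # index label of the smallest distance
    return df1['X'][idx], df1['Y'][idx], df1['Z'][idx]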

Cumulative addition in a loop

I am trying to cumulatively add a value to the previous value and, each time, store the result in an array.
This code is just part of a larger project. For simplicity I am going to define my variables as follows:
ele_ini = [12]
smb = [2, 5, 7, 8, 9, 10]
val = ele_ini
for i in range(len(smb)):
    val += smb[i]
    print(val)
    elevation_smb.append(val)
Problem
Each time, the previous value stored in elevation_smb is replaced by the current value, such that the result I obtain is:
elevation_smb = [22, 22, 22, 22, 22, 22]
The result I am expecting, however, is
elevation_smb = [14, 19, 26, 34, 43, 53]
NOTE:
ele_ini is a vector with n elements. I am only using 1 element just for simplicity.
Don't use loops, because they are slow. The fast vectorized solution below is better.
I think you need numpy.cumsum, adding the vector ele_ini to get a 2D numpy array:
import numpy as np

ele_ini = [12, 10, 1, 0]
smb = [2, 5, 7, 8, 9, 10]

elevation_smb = np.cumsum(np.array(smb)) + np.array(ele_ini)[:, None]
print(elevation_smb)
[[14 19 26 34 43 53]
[12 17 24 32 41 51]
[ 3 8 15 23 32 42]
[ 2 7 14 22 31 41]]
It seems that in your case every append stores a reference to the same array, which is why new values never appear. Try appending a copy of the value:
elevation_smb.append(val.copy())
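For example, a minimal sketch of the fixed loop, assuming ele_ini is a NumPy array (the in-place += keeps updating the same object, which is why the copy matters):
import numpy as np
ele_ini = np.array([12])
smb = [2, 5, 7, 8, 9, 10]
elevation_smb = []
val = ele_ini.copy()
for s in smb:
    val += s                          # in-place: val stays the same object
    elevation_smb.append(val.copy())  # store a snapshot, not the reference
print(elevation_smb)
# [array([14]), array([19]), array([26]), array([34]), array([43]), array([53])]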
You can also do this with reduce:
In [6]: reduce(lambda c, x: c + [c[-1] + x], smb, ele_ini)
Out[6]: [12, 14, 19, 26, 34, 43, 53]
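In Python 3, reduce has to be imported from functools, and you can slice off the seed value if you only want the accumulated sums:
from functools import reduce
ele_ini = [12]
smb = [2, 5, 7, 8, 9, 10]
result = reduce(lambda c, x: c + [c[-1] + x], smb, ele_ini)
print(result[1:])   # [14, 19, 26, 34, 43, 53]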

Manipulating 3D arrays in python

As a disclaimer, I'm very new to Python and numpy arrays. Reading some of the answers to similar questions and trying their solutions for my own data hasn't been very helpful, so I thought I'd just post my own question. For example, Reshaping 3D Numpy Array to a 2D array. It's entirely believable, though, that I've just implemented those other solutions wrong.
I have a 3D numpy array "C"
C = np.reshape(np.arange(3*3*4),(3,3,4))
print(C)
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]
[[24 25 26 27]
[28 29 30 31]
[32 33 34 35]]]
I would like to reshape it into something like:
[0, 12, 24], [1, 13, 25], [2, 14, 26], ... etc.
where the first elements of each of the 3 arrays get put into their own array, then the second elements of each array get put into a new array, and so on.
It seems trivial, but I'm stumped. I've tried different combinations of .reshape, for example,
output=C.reshape(12,3)
I've tried changing the order from "C" to "F" and playing around with different .reshape() parameters, but I can't seem to get the final result in the desired structure.
Any tips would be much appreciated.
I think this is what you want:
C = np.reshape(np.arange(3*3*4),(3,3,4))
C.reshape(3,12).T
array([[ 0, 12, 24],
[ 1, 13, 25],
[ 2, 14, 26],
[ 3, 15, 27],
[ 4, 16, 28],
[ 5, 17, 29],
[ 6, 18, 30],
[ 7, 19, 31],
[ 8, 20, 32],
[ 9, 21, 33],
[10, 22, 34],
[11, 23, 35]])
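A couple of equivalent ways to write the same thing, as a sketch (reusing C and numpy as np from above; both give the array shown):
out = C.reshape(3, -1).T                    # -1 lets NumPy infer the 12
out2 = C.transpose(1, 2, 0).reshape(-1, 3)  # move the first axis to the end, then flatten
print(np.array_equal(out, out2))            # True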

Interpreting numpy multidimensional array [duplicate]

This question already has answers here:
Explain this 4D numpy array indexing intuitively
(2 answers)
Closed 5 years ago.
I am new to numerical computation using numpy. I am having a hard time understanding arrays with more than 2 dimensions. Is there any way to interpret a multidimensional array?
e.g:
>>> import numpy as np
>>> arr1 = np.arange(24).reshape(2,3,2,2)
>>> arr1
array([[[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]]],
[[[12, 13],
[14, 15]],
[[16, 17],
[18, 19]],
[[20, 21],
[22, 23]]]])
Any explanation or reference to build intuition?
Edited:
I wanted to know how to interpret the output of .shape together with the printed array, i.e. in the above example, what is the rightmost 2 in (2, 3, 2, 2) referring to, or the 3, or the other 2? How does numpy handle this?
This isn't a direct answer, but when I started working with multidimensional arrays, my biggest difficulty was visualizing what the big long streaming list and brackets were all about. I had a picture in my mind of what a 3D and 4D array looked like, but the default representations didn't match what I 'pictured'. To assist me in viewing the data structures, I wrote a couple of functions to rearrange the structure into a form I could understand.
My question to you then is... do any of the presentations below help you in understanding or visualizing the structure any better? I can provide supporting code in an edit if any of these are useful.
arr1 = np.arange(24).reshape(2,3,2,2)
Sample array...
-shape (2, 3, 2, 2), ndim 4
-------------------------
-(0, + (3, 2, 2)
. 0 1 4 5 8 9
. 2 3 6 7 10 11
-------------------------
-(1, + (3, 2, 2)
. 12 13 16 17 20 21
. 14 15 18 19 22 23
Or presentation option 2
Alternate format
Main array...
shape: (2, 3, 2, 2)
[0,...] (3, 2, 2)
.[[[ 0 1]
. [ 2 3]]
. [[ 4 5]
. [ 6 7]]
. [[ 8 9]
. [10 11]]]
[1,...] (3, 2, 2)
.[[[12 13]
. [14 15]]
. [[16 17]
. [18 19]]
. [[20 21]
. [22 23]]]
Sorry not to be able to answer your question directly, but often the 'direct' answer isn't what is really needed to get to the root of the problem.
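To complement this, a small sketch in plain indexing, using the arr1 from the question: the shape is read left to right, so the leftmost 2 selects one of the two (3, 2, 2) blocks, the 3 selects a 2x2 matrix inside that block, and the rightmost two 2s are the rows and columns of that matrix:
import numpy as np
arr1 = np.arange(24).reshape(2, 3, 2, 2)
print(arr1.shape)        # (2, 3, 2, 2)
print(arr1[0].shape)     # (3, 2, 2)  -> one of the two outer blocks
print(arr1[0, 1])        # one 2x2 matrix inside that block:
# [[4 5]
#  [6 7]]
print(arr1[0, 1, 1, 0])  # 6 -> row 1, column 0 of that matrix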

Python 3 range iteration uncertain

I'm using Python 3.4.0 on an Ubuntu 14.04 system.
This is a bubble sort procedure, but no, this is NOT homework. I'm having problems with a range. This is just stemming from an interest in learning Python.
The program statement is:
for i in range(siz):
siz has been set to 20, but it's iterating from 0 to 18. Here's the output.
======================================
Traditional Bubble Sort
20 54 7 array('I', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
---> 
siz = 20
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
=========================================
and here's the code segment that produced this.
print(siz, rows, colw, a)
s = input('---> ')
print("siz =", siz)
for i in range(siz):
    print(i)
========================================
I thought range was supposed to iterate from
zero to siz-1, given the constraints of the statement.
What am I missing?
Cordially,
Dave
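For reference, range(siz) on its own does produce 0 through siz - 1; a standalone check, independent of the surrounding sort code:
siz = 20
print(list(range(siz)))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
print(len(range(siz)))   # 20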
