Print two arrays side by side using numpy - python

I'm trying to create a table of cosines using numpy in python. I want to have the angle next to the cosine of the angle, so it looks something like this:
0.0 1.000 5.0 0.996 10.0 0.985 15.0 0.966
20.0 0.940 25.0 0.906 and so on.
I'm trying to do it using a for loop but I'm not sure how to get this to work.
Currently, I have .
Any suggestions?

Let's say you have:
>>> d = np.linspace(0, 360, 10, endpoint=False)
>>> c = np.cos(np.radians(d))
If you don't mind having some brackets and such on the side, then you can simply concatenate column-wise using np.c_, and display:
>>> print(np.c_[d, c])
[[ 0.00000000e+00 1.00000000e+00]
[ 3.60000000e+01 8.09016994e-01]
[ 7.20000000e+01 3.09016994e-01]
[ 1.08000000e+02 -3.09016994e-01]
[ 1.44000000e+02 -8.09016994e-01]
[ 1.80000000e+02 -1.00000000e+00]
[ 2.16000000e+02 -8.09016994e-01]
[ 2.52000000e+02 -3.09016994e-01]
[ 2.88000000e+02 3.09016994e-01]
[ 3.24000000e+02 8.09016994e-01]]
But if you care about removing them, one possibility is to use a simple regex:
>>> import re
>>> print(re.sub(r' *\n *', '\n',
...       np.array_str(np.c_[d, c]).replace('[', '').replace(']', '').strip()))
0.00000000e+00 1.00000000e+00
3.60000000e+01 8.09016994e-01
7.20000000e+01 3.09016994e-01
1.08000000e+02 -3.09016994e-01
1.44000000e+02 -8.09016994e-01
1.80000000e+02 -1.00000000e+00
2.16000000e+02 -8.09016994e-01
2.52000000e+02 -3.09016994e-01
2.88000000e+02 3.09016994e-01
3.24000000e+02 8.09016994e-01
I'm removing the brackets, and then passing it to the regex to remove the spaces on either side in each line.
np.array_str also lets you set the precision. For more control, you can use np.array2string instead.
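For instance, a minimal sketch of np.array2string with a per-element formatter (the 8.3f width here is an illustrative choice, not from the original answer):

```python
import numpy as np

d = np.linspace(0, 360, 10, endpoint=False)
c = np.cos(np.radians(d))

# array2string accepts a formatter dict keyed by element kind,
# giving full control over how each float is rendered.
s = np.array2string(np.c_[d, c],
                    formatter={'float_kind': lambda x: f'{x:8.3f}'})
print(s)
```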

Side-by-Side Array Comparison using Numpy
A built-in Numpy approach is the column_stack((...)) method.
numpy.column_stack((A, B)) stacks 1-D or 2-D arrays as the columns of a single 2-D array, which lets you compare two or more matrices/arrays side by side.
Call numpy.column_stack((A, B)) with a tuple: the inner parentheses make the sequence of arrays a single argument, and the tuple can contain as many matrices/arrays as you want.
import numpy as np
A = np.random.uniform(size=(10,1))
B = np.random.uniform(size=(10,1))
C = np.random.uniform(size=(10,1))
np.column_stack((A, B, C)) ## <-- Compare Side-by-Side
The result looks like this:
array([[0.40323596, 0.95947336, 0.21354263],
[0.18001121, 0.35467198, 0.47653884],
[0.12756083, 0.24272134, 0.97832504],
[0.95769626, 0.33855075, 0.76510239],
[0.45280595, 0.33575171, 0.74295859],
[0.87895151, 0.43396391, 0.27123183],
[0.17721346, 0.06578044, 0.53619146],
[0.71395251, 0.03525021, 0.01544952],
[0.19048783, 0.16578012, 0.69430883],
[0.08897691, 0.41104408, 0.58484384]])
Numpy's column_stack is useful in AI/ML applications for comparing predicted results with expected answers, which shows how effective the neural-net training was. It is a quick way to spot where the errors are in the network's calculations.
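A small sketch of that idea (the expected/predicted values are made up for illustration):

```python
import numpy as np

# Put expected targets, model predictions, and the absolute error
# side by side so the bad rows stand out at a glance.
expected = np.array([0.0, 1.0, 0.0, 1.0])
predicted = np.array([0.1, 0.9, 0.4, 0.8])
report = np.column_stack((expected, predicted, np.abs(expected - predicted)))
print(report)
```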

Pandas is a very convenient module for such tasks:
In [174]: import pandas as pd
...:
...: x = pd.DataFrame({'angle': np.linspace(0, 355, 355//5+1),
...:                   'cos': np.cos(np.deg2rad(np.linspace(0, 355, 355//5+1)))})
...:
...: pd.options.display.max_rows = 20
...:
...: x
...:
Out[174]:
angle cos
0 0.0 1.000000
1 5.0 0.996195
2 10.0 0.984808
3 15.0 0.965926
4 20.0 0.939693
5 25.0 0.906308
6 30.0 0.866025
7 35.0 0.819152
8 40.0 0.766044
9 45.0 0.707107
.. ... ...
62 310.0 0.642788
63 315.0 0.707107
64 320.0 0.766044
65 325.0 0.819152
66 330.0 0.866025
67 335.0 0.906308
68 340.0 0.939693
69 345.0 0.965926
70 350.0 0.984808
71 355.0 0.996195
[72 rows x 2 columns]
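If you want every row printed rather than the truncated view, DataFrame.to_string is one option; a sketch (the float format below is an assumed choice):

```python
import numpy as np
import pandas as pd

angles = np.linspace(0, 355, 72)
x = pd.DataFrame({'angle': angles, 'cos': np.cos(np.deg2rad(angles))})

# to_string ignores display.max_rows and renders the full table.
print(x.to_string(index=False, float_format=lambda v: f'{v:8.3f}'))
```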

You can use python's zip function to go through the elements of both lists simultaneously.
import numpy as np
degreesVector = np.linspace(0.0, 360.0, 73)  # num must be an integer, not 73.0
cosinesVector = np.cos(np.radians(degreesVector))
for d, c in zip(degreesVector, cosinesVector):
    print(d, c)
And if you want to make a numpy array out of the degrees and cosine values, you can modify the for loop in this way:
table = []
for d, c in zip(degreesVector, cosinesVector):
    table.append([d, c])
table = np.array(table)
And now on one line!
np.array([[d, c] for d, c in zip(degreesVector, cosinesVector)])
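As a loop-free alternative to the same pairing, a quick sketch:

```python
import numpy as np

degreesVector = np.linspace(0.0, 360.0, 73)
cosinesVector = np.cos(np.radians(degreesVector))

# column_stack pairs the two 1-D arrays as columns of one (73, 2) array.
table = np.column_stack((degreesVector, cosinesVector))
print(table.shape)  # → (73, 2)
```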

You were close - but if you iterate over angles, just generate the cosine for that angle:
In [293]: for angle in range(0, 60, 10):
     ...:     print('{0:8}{1:8.3f}'.format(angle, np.cos(np.radians(angle))))
     ...:
0 1.000
10 0.985
20 0.940
30 0.866
40 0.766
50 0.643
To work with arrays, you have lots of options:
In [294]: angles=np.linspace(0,60,7)
In [295]: cosines=np.cos(np.radians(angles))
iterate over an index:
In [297]: for i in range(angles.shape[0]):
     ...:     print('{0:8}{1:8.3f}'.format(angles[i], cosines[i]))
Use zip to dish out the values 2 by 2:
for a, c in zip(angles, cosines):
    print('{0:8}{1:8.3f}'.format(a, c))
A slight variant on that:
for ac in zip(angles, cosines):
    print('{0:8}{1:8.3f}'.format(*ac))
You could concatenate the arrays together into a 2d array, and display that:
In [302]: np.vstack((angles, cosines)).T
Out[302]:
array([[ 0. , 1. ],
[ 10. , 0.98480775],
[ 20. , 0.93969262],
[ 30. , 0.8660254 ],
[ 40. , 0.76604444],
[ 50. , 0.64278761],
[ 60. , 0.5 ]])
In [318]: print(np.vstack((angles, cosines)).T)
[[ 0. 1. ]
[ 10. 0.98480775]
[ 20. 0.93969262]
[ 30. 0.8660254 ]
[ 40. 0.76604444]
[ 50. 0.64278761]
[ 60. 0.5 ]]
np.column_stack can do that without the transpose.
And you can pass that array to your formatting with:
for ac in np.vstack((angles, cosines)).T:
    print('{0:8}{1:8.3f}'.format(*ac))
or you could write that to a csv style file with savetxt (which just iterates over the 'rows' of the 2d array and writes with fmt):
In [310]: np.savetxt('test.txt', np.vstack((angles, cosines)).T, fmt='%8.1f %8.3f')
In [311]: cat test.txt
0.0 1.000
10.0 0.985
20.0 0.940
30.0 0.866
40.0 0.766
50.0 0.643
60.0 0.500
Unfortunately savetxt requires the old-style % formatting. And trying to write to sys.stdout runs into bytes-vs-unicode string issues in Py3.
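One workaround sketch for that stdout issue: hand savetxt an in-memory binary buffer and decode it yourself (the BytesIO approach is an assumption, not from the original answer):

```python
import io
import numpy as np

angles = np.linspace(0, 60, 7)
cosines = np.cos(np.radians(angles))

# savetxt accepts file-like objects; a BytesIO buffer sidesteps the
# bytes-vs-unicode problem when you want the formatted text in memory.
buf = io.BytesIO()
np.savetxt(buf, np.vstack((angles, cosines)).T, fmt='%8.1f %8.3f')
print(buf.getvalue().decode())
```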

Just in numpy, with some formatting ideas, using @MaxU's data:
a = np.array([[i, np.cos(np.deg2rad(i)), np.sin(np.deg2rad(i))]
              for i in range(0, 361, 30)])
args = ["Angle", "Cos", "Sin"]
frmt = ("{:>8.0f}" + "{:>8.3f}"*2)
print(("{:^8}"*3).format(*args))
for i in a:
    print(frmt.format(*i))
Angle Cos Sin
0 1.000 0.000
30 0.866 0.500
60 0.500 0.866
90 0.000 1.000
120 -0.500 0.866
150 -0.866 0.500
180 -1.000 0.000
210 -0.866 -0.500
240 -0.500 -0.866
270 -0.000 -1.000
300 0.500 -0.866
330 0.866 -0.500
360 1.000 -0.000
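The same header idea also works with savetxt's header parameter; a sketch (the file name is illustrative):

```python
import numpy as np

a = np.array([[i, np.cos(np.deg2rad(i)), np.sin(np.deg2rad(i))]
              for i in range(0, 361, 30)])

# header= writes a first line; comments='' drops the default '# ' prefix.
np.savetxt('trig_table.txt', a, fmt='%8.0f%8.3f%8.3f',
           header='{:^8}{:^8}{:^8}'.format('Angle', 'Cos', 'Sin'),
           comments='')
print(open('trig_table.txt').read())
```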


Get the Transformation Matrix From the SciPy Procrustes Implementation

The Procrustes library has an example demonstrating how to get the transformation matrix of two matrices by solving the Procrustes problem. The library seems to be old and doesn't work in Python 3.
I was wondering if there's any way to use the SciPy implementation of the Procrustes problem to solve the exact problem discussed in the library's example.
Another StackOverflow question seems to ask for exactly what I need here, but I can't get it to give me the proper transformation matrix that would transform the source matrix to nearly the target matrix.
In summary, I'd like to be able to implement this example using the SciPy library.
You could use scipy.linalg.orthogonal_procrustes. Here's a demonstration. Note that the function generateAB only exists to generate the arrays A and B for the demo. The key steps of the calculation are to center A and B, and then call orthogonal_procrustes.
import numpy as np
from scipy.stats import ortho_group
from scipy.linalg import orthogonal_procrustes
def generateAB(shape, noise=0, rng=None):
    # Generate A and B for the example.
    if rng is None:
        rng = np.random.default_rng()
    m, n = shape
    # Random matrix A
    A = 3 + 2*rng.random(shape)
    Am = A.mean(axis=0, keepdims=True)
    # Random orthogonal matrix T
    T = ortho_group.rvs(n, random_state=rng)
    # Target matrix B
    B = ((A - Am) @ T + rng.normal(scale=noise, size=A.shape)
         + 3*rng.random((1, n)))
    # Include T in the return, but in a real problem, T would not be known.
    return A, B, T
# For reproducibility, use a seeded RNG.
rng = np.random.default_rng(0x1ce1cebab1e)
A, B, T = generateAB((7, 5), noise=0.01, rng=rng)
# Find Q. Note that `orthogonal_procrustes` does not include
# dilation or translation. To handle translation, we center
# A and B by subtracting the means of the points.
A0 = A - A.mean(axis=0, keepdims=True)
B0 = B - B.mean(axis=0, keepdims=True)
Q, scale = orthogonal_procrustes(A0, B0)
with np.printoptions(precision=3, suppress=True):
    print('T (used to generate B from A):')
    print(T)
    print('Q (computed by orthogonal_procrustes):')
    print(Q)
    print('\nCompare A0 @ Q with B0.')
    print('A0 @ Q:')
    print(A0 @ Q)
    print('B0 (should be close to A0 @ Q if the noise parameter was small):')
    print(B0)
Output:
T (used to generate B from A):
[[-0.873 0.017 0.202 -0.44 -0.054]
[-0.129 0.606 -0.763 -0.047 -0.18 ]
[ 0.055 -0.708 -0.567 -0.408 0.088]
[ 0.024 0.24 -0.028 -0.168 0.955]
[ 0.466 0.272 0.235 -0.78 -0.21 ]]
Q (computed by orthogonal_procrustes):
[[-0.871 0.022 0.203 -0.443 -0.052]
[-0.129 0.604 -0.765 -0.046 -0.178]
[ 0.053 -0.709 -0.565 -0.409 0.087]
[ 0.027 0.239 -0.029 -0.166 0.956]
[ 0.47 0.273 0.233 -0.779 -0.21 ]]
Compare A0 @ Q with B0.
A0 @ Q:
[[-0.622 0.224 0.946 1.038 0.578]
[ 0.263 0.143 -0.031 -0.949 0.492]
[-0.49 0.758 0.473 -0.221 -0.755]
[ 0.205 -0.74 0.065 -0.192 -0.551]
[-0.295 -0.434 -1.103 0.444 0.547]
[ 0.585 -0.378 -0.645 -0.233 0.651]
[ 0.354 0.427 0.296 0.113 -0.963]]
B0 (should be close to A0 @ Q if the noise parameter was small):
[[-0.627 0.226 0.949 1.032 0.576]
[ 0.268 0.135 -0.028 -0.95 0.492]
[-0.493 0.765 0.475 -0.201 -0.75 ]
[ 0.214 -0.743 0.071 -0.196 -0.55 ]
[-0.304 -0.433 -1.115 0.451 0.551]
[ 0.589 -0.375 -0.645 -0.235 0.651]
[ 0.354 0.426 0.292 0.1 -0.969]]
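As a quick self-contained sanity check of the same recipe (the rotation and data here are made up): generate B by rotating A exactly, and confirm that orthogonal_procrustes recovers the rotation.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
A = rng.random((6, 3))
theta = 0.3  # rotation about the z-axis
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
B = A @ R  # noise-free target, so the recovery should be exact

Q, _ = orthogonal_procrustes(A, B)
print(np.allclose(Q, R))  # → True
```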

Array Manipulation to DataFrame

I have the following array:
(array([[5.8205872e+07, 2.0200601e+07, 1.6700000e+02, 2.1500000e+02,
5.0000000e+01, 5.0000000e+00],
[5.7929117e+07, 2.0200601e+07, 1.6700000e+02, 1.5000000e+02,
5.0000000e+01, 5.0000000e+00],
[5.8178782e+07, 2.0200601e+07, 1.6700000e+02, 1.5750000e+02,
5.0000000e+01, 5.0000000e+00],
[5.7936230e+07, 2.0210228e+07, 1.6700000e+02, 1.8000000e+02,
4.0000000e+01, 5.0000000e+00],
[5.8213574e+07, 2.0210228e+07, 1.6700000e+02, 6.9500000e+02,
4.0000000e+01, 5.0000000e+00],
[2.5693916e+07, 2.0210228e+07, 1.6700000e+02, 4.8518000e+02,
4.0000000e+01, 5.0000000e+00]]),
array([[ 0.46666667, 7.16666667],
[ 0.51724138, 5.17241379],
[ 0.73333333, 5.25 ],
[ 0.34285714, 5.14285714],
[ 1.18918919, 18.78378378],
[ 1.26315789, 12.76789474]]))
I would like to transform it to a data frame that has 8 columns and six rows in total.
I tried to do: pd.DataFrame(my_array) but the result is just two rows like this:
0 [[58205872.0, 20200601.0, 167.0, 30.0, 1.0, 10...
1 [[0.4666666666666667, 7.166666666666667], [0.5...
How can I achieve what is described above?
It looks like you want to concatenate your two arrays (indeed you do have two arrays assigned to my_array) and then turn the result into a dataframe. What about first using numpy.hstack
>>> your_two_arrays = (..., ...)
>>> a = np.hstack(your_two_arrays)
>>> a.shape
(6, 8)
and finally pandas.DataFrame
>>> pd.DataFrame(data=a)
0 1 2 3 4 5 6 7
0 58205872.0 20200601.0 167.0 215.00 50.0 5.0 0.466667 7.166667
1 57929117.0 20200601.0 167.0 150.00 50.0 5.0 0.517241 5.172414
2 58178782.0 20200601.0 167.0 157.50 50.0 5.0 0.733333 5.250000
3 57936230.0 20210228.0 167.0 180.00 40.0 5.0 0.342857 5.142857
4 58213574.0 20210228.0 167.0 695.00 40.0 5.0 1.189189 18.783784
5 25693916.0 20210228.0 167.0 485.18 40.0 5.0 1.263158 12.767895
[...] the result is just two rows like this: [...]
The data that you were providing to pd.DataFrame when doing pd.DataFrame(my_array) was a tuple of two objects. Hence the two rows you got (and one column), i.e. one per array.
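A compact sketch of the same two steps with toy arrays (shapes and column names are illustrative):

```python
import numpy as np
import pandas as pd

left = np.arange(12, dtype=float).reshape(6, 2)       # stands in for the first array
right = np.arange(12, 24, dtype=float).reshape(6, 2)  # stands in for the second array

# hstack joins the arrays column-wise, then DataFrame wraps the result.
df = pd.DataFrame(np.hstack((left, right)), columns=['a', 'b', 'c', 'd'])
print(df.shape)  # → (6, 4)
```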

Get the average value of the pixel across all the frames

I have 14400 values saved in a list which represent a 4 pixels vertical by 6 pixels horizontal and 600 frames.
Here are the values if anyone is interested
# len of blurry values 14400
data = np.array(blurry_values)
#shape of data (4, 6, 600)
shape = ( 4,6,600 )
data= data.reshape(shape)
#print(np.round(np.mean(data, axis=2),2))
[[0.89 0.37 0.45 0.44 0.51 0.52]
[0.5 0.47 0.53 0.48 0.48 0.53]
[0.49 0.5 0.5 0.53 0.48 0.54]
[0.48 0.51 0.45 0.55 0.5 0.49]]
However, when I confirm the sanity of the first average by doing the following
list1 = blurry_values[::23]
np.round(np.mean(list1),2)
I get 0.51 instead of 0.89
I am trying to get the average value of the pixel across all the frames. Why are these values different?
I don't know exactly why, but:
list1 = blurry_values[:600]
gives 0.89
list1 = blurry_values[600:1200]
gives 0.37
NumPy reshapes by filling the last dimension first (row-major order), I believe.
Let us tackle this with a smaller array:
import numpy as np
np.random.seed(42)
values = np.random.randint(low=0, high=100, size=48)
shape = (2,4,6)
data = values.reshape(shape) # 2 frames of 4 pixels by 6 pixels each
print(data, '\n')
print(np.round(np.mean(data, axis=0),2), '\n') # average values across frames
list1 = values[::24]
print(np.round(np.mean(list1),2)) # average of first pixel across frames
Output:
[[[51 92 14 71 60 20]
[82 86 74 74 87 99]
[23 2 21 52 1 87]
[29 37 1 63 59 20]]
[[32 75 57 21 88 48]
[90 58 41 91 59 79]
[14 61 61 46 61 50]
[54 63 2 50 6 20]]]
[[41.5 83.5 35.5 46. 74. 34. ]
[86. 72. 57.5 82.5 73. 89. ]
[18.5 31.5 41. 49. 31. 68.5]
[41.5 50. 1.5 56.5 32.5 20. ]]
41.5
Since I haven't seen the code that produced blurry_values, I can't be 100% sure, but I'm guessing that you're re-shaping blurry_values wrongly.
In most programming scenarios, I would expect the pixel-height and pixel-width to be represented by the last two axes, and the frame to be represented by an axis preceding these two.
So, I'm guessing that your shape should have been shape = (600, 4, 6) instead of shape = (4, 6, 600).
In that case, you should be doing np.round(np.mean(data, axis=0),2) rather than np.round(np.mean(data, axis=2),2). BTW, that would also produce a shape of (4, 6).
Then, for your sanity check, you should be doing this:
list1 = blurry_values[::24] # Note that it's 24, not 23
np.round(np.mean(list1),2)
You should be comparing the first value of np.round(np.mean(data, axis=0), 2) with np.round(np.mean(list1), 2). (I haven't tested it myself, though.)
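A quick way to convince yourself of the layout point, sketched with synthetic data (the 600×4×6 layout is the assumed correct one):

```python
import numpy as np

values = np.arange(14400, dtype=float)
frames_first = values.reshape(600, 4, 6)    # frame axis first

per_pixel_mean = frames_first.mean(axis=0)  # shape (4, 6)
first_pixel = values[::24]                  # pixel (0, 0) recurs every 4*6 = 24 values

print(np.isclose(per_pixel_mean[0, 0], first_pixel.mean()))  # → True
```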

fast read less structure ascii data file in numpy

I would like to read a data grid (3D array of floats) from .xsf file. (format documentation is here http://www.xcrysden.org/doc/XSF.html the BEGIN_BLOCK_DATAGRID_3D block )
The problem is that the data are in 5 columns, and if the number of elements Nx*Ny*Nz is not divisible by 5, then the last line can have any length.
For this reason I'm not able to use numpy.genfromtxt() or numpy.loadtxt().
I made a subroutine which solves the problem, but it is terribly slow (probably because it uses tight loops). The files I want to read are large (>200 MB; 200x200x200 = 8,000,000 numbers in ASCII).
Is there any really fast way how to read such unfriendly formats in python / numpy into ndarray?
xsf datagrids looks like this (example for shape=(3,3,3))
BEGIN_BLOCK_DATAGRID_3D
BEGIN_DATAGRID_3D_this_is_3Dgrid
3 3 3 # number of elements Nx Ny Nz
0.0 0.0 0.0 # grid origin in real space
1.0 0.0 0.0 # grid size in real space
0.0 1.0 0.0
0.0 0.0 1.0
0.000 1.000 2.000 5.196 8.000 # data in 5 columns
1.000 1.414 2.236 5.292 8.062
2.000 2.236 2.828 5.568 8.246
3.000 3.162 3.606 6.000 8.544
4.000 4.123 4.472 6.557 8.944
1.000 1.414 # this is the problem
END_DATAGRID_3D
END_BLOCK_DATAGRID_3D
I got something working with Pandas and Numpy. Pandas will fill in nan values for the missing data.
import pandas as pd
import numpy as np
df = pd.read_csv("xyz.data", header=None, sep=r'\s+', dtype=float,
                 skiprows=7, skipfooter=2, engine='python')  # skipfooter needs the python engine; np.float is removed in modern NumPy
data = df.values.flatten()
data = data[~np.isnan(data)]
result = data.reshape((data.size // 3, 3))  # integer division so the shape is an int
Output
>>> result
array([[ 0. , 1. , 2. ],
[ 5.196, 8. , 1. ],
[ 1.414, 2.236, 5.292],
[ 8.062, 2. , 2.236],
[ 2.828, 5.568, 8.246],
[ 3. , 3.162, 3.606],
[ 6. , 8.544, 4. ],
[ 4.123, 4.472, 6.557],
[ 8.944, 1. , 1.414]])
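Another possible sketch, since whitespace-separated floats don't really need column structure: split the whole data region and convert it in one shot (the inline string stands in for the slice of lines between the header and END_DATAGRID_3D, which you would extract from the file yourself):

```python
import numpy as np

data_block = """\
0.000 1.000 2.000 5.196 8.000
1.000 1.414 2.236 5.292 8.062
2.000 2.236 2.828 5.568 8.246
3.000 3.162 3.606 6.000 8.544
4.000 4.123 4.472 6.557 8.944
1.000 1.414
"""

# str.split ignores line lengths entirely, so the ragged last line is harmless.
grid = np.array(data_block.split(), dtype=float).reshape(3, 3, 3)
print(grid.shape)  # → (3, 3, 3)
```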

Python programming - numpy polyfit saying NAN

I am having some issues with a pretty simple code I have written. I have 4 sets of data, and want to generate polynomial best-fit lines using numpy polyfit. Three of the data sets yield coefficients when using polyfit, but the third data set yields NaN. Below is the code and the printout. Any ideas?
Code:
All of the 'ind_#'s are the lists of data. The code below converts them into numpy arrays from which polynomial best-fit lines can be generated:
ind_1 = np.array(ind_1, float)  # np.float was removed in NumPy 1.24; plain float works
dep_1 = np.array(dep_1, float)
x_1 = np.arange(min(ind_1)-1, max(ind_1)+1, .01)
ind_2 = np.array(ind_2, float)
dep_2 = np.array(dep_2, float)
x_2 = np.arange(min(ind_2)-1, max(ind_2)+1, .01)
ind_3 = np.array(ind_3, float)
dep_3 = np.array(dep_3, float)
x_3 = np.arange(min(ind_3)-1, max(ind_3)+1, .01)
ind_4 = np.array(ind_4, float)
dep_4 = np.array(dep_4, float)
x_4 = np.arange(min(ind_4)-1, max(ind_4)+1, .01)
Below prints off the arrays generated above, as well as the contents of the polyfit list, which are usually the coefficients of the polynomial equation, but for the third case below, all of the polyfit contents print off as NAN
print(ind_1)
print(dep_1)
print(np.polyfit(ind_1,dep_1,2))
print(ind_2)
print(dep_2)
print(np.polyfit(ind_2,dep_2,2))
print(ind_3)
print(dep_3)
print(np.polyfit(ind_3,dep_3,2))
print(ind_4)
print(dep_4)
print(np.polyfit(ind_4,dep_4,2))
Print out:
[ 1.405 1.871 2.713 ..., 5.367 5.404 2.155]
[ 0.274 0.07 0.043 ..., 0.607 0.614 0.152]
[ 0.01391925 -0.00950728 0.14803846]
[ 0.9760001 2.067 8.8 ..., 1.301 1.625 2.007 ]
[ 0.219 0.05 0.9810001 ..., 0.163 0.161 0.163 ]
[ 0.00886807 -0.00868727 0.17793324]
[ 1.143 0.9120001 2.162 ..., 2.915 2.865 2.739 ]
[ 0.283 0.3 0.27 ..., 0.227 0.213 0.161]
[ nan nan nan]
[ 0.167 0.315 1.938 ..., 2.641 1.799 2.719]
[ 0.6810001 0.7140001 0.309 ..., 0.283 0.313 0.251 ]
[ 0.00382331 0.00222269 0.16940372]
Why are the polyfit constants from the third case listed as NAN? All the data sets have same type of data, and all of the code is consistent. Please help.
Just looked at your data. This is happening because you have a NaN in dep_3 (element 713). You can make sure that you only use finite values in the fit like this:
idx = np.isfinite(ind_3) & np.isfinite(dep_3)
print(np.polyfit(ind_3[idx], dep_3[idx], 2))
As for finding bad values in large datasets, numpy makes that really easy. You can find the indices like this:
print(np.where(~np.isfinite(dep_3)))
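A toy reproduction of both steps (the data here is made up):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.1, np.nan, 10.2, 17.0])  # one bad sample

# Locate the non-finite entries, then fit only on the finite pairs.
print(np.where(~np.isfinite(y)))  # → (array([2]),)
idx = np.isfinite(x) & np.isfinite(y)
coeffs = np.polyfit(x[idx], y[idx], 2)
print(coeffs)
```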
