I am having some issues with a fairly simple piece of code I have written. I have 4 sets of data, and I want to generate polynomial best-fit lines using numpy polyfit. Three of the data sets yield numbers when using polyfit, but the third data set yields NaN. Below are the code and the printout. Any ideas?
Code:
All of the 'ind_#' and 'dep_#' variables are lists of data. The code below converts them into NumPy arrays that can then be used to generate a polynomial best-fit line:
import numpy as np
ind_1=np.array(ind_1, float)
dep_1=np.array(dep_1, float)
x_1=np.arange(min(ind_1)-1, max(ind_1)+1, .01)
ind_2=np.array(ind_2, float)
dep_2=np.array(dep_2, float)
x_2=np.arange(min(ind_2)-1, max(ind_2)+1, .01)
ind_3=np.array(ind_3, float)
dep_3=np.array(dep_3, float)
x_3=np.arange(min(ind_3)-1, max(ind_3)+1, .01)
ind_4=np.array(ind_4, float)
dep_4=np.array(dep_4, float)
x_4=np.arange(min(ind_4)-1, max(ind_4)+1, .01)
The code below prints the arrays generated above, as well as the output of polyfit, which is normally the list of coefficients of the polynomial. In the third case, however, all of the polyfit coefficients print as NaN:
print(ind_1)
print(dep_1)
print(np.polyfit(ind_1,dep_1,2))
print(ind_2)
print(dep_2)
print(np.polyfit(ind_2,dep_2,2))
print(ind_3)
print(dep_3)
print(np.polyfit(ind_3,dep_3,2))
print(ind_4)
print(dep_4)
print(np.polyfit(ind_4,dep_4,2))
Printout:
[ 1.405 1.871 2.713 ..., 5.367 5.404 2.155]
[ 0.274 0.07 0.043 ..., 0.607 0.614 0.152]
[ 0.01391925 -0.00950728 0.14803846]
[ 0.9760001 2.067 8.8 ..., 1.301 1.625 2.007 ]
[ 0.219 0.05 0.9810001 ..., 0.163 0.161 0.163 ]
[ 0.00886807 -0.00868727 0.17793324]
[ 1.143 0.9120001 2.162 ..., 2.915 2.865 2.739 ]
[ 0.283 0.3 0.27 ..., 0.227 0.213 0.161]
[ nan nan nan]
[ 0.167 0.315 1.938 ..., 2.641 1.799 2.719]
[ 0.6810001 0.7140001 0.309 ..., 0.283 0.313 0.251 ]
[ 0.00382331 0.00222269 0.16940372]
Why are the polyfit coefficients from the third case NaN? All the data sets contain the same type of data, and all of the code is consistent. Please help.
Just looked at your data. This is happening because you have a NaN in dep_3 (element 713). You can make sure that you only use finite values in the fit like this:
idx = np.isfinite(ind_3) & np.isfinite(dep_3)
print(np.polyfit(ind_3[idx], dep_3[idx], 2))
As for finding bad values in large datasets, NumPy makes that really easy. You can find the indices like this:
print(np.where(~np.isfinite(dep_3)))
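The same idea on a small, self-contained example: the data below are hypothetical (an exact quadratic with one value replaced by NaN), so the fit on the finite points should recover the true coefficients. It locates the bad entry with np.isfinite and fits only the points that are finite in both arrays.

```python
import numpy as np

# Hypothetical data: an exact quadratic 2x^2 - 3x + 1 with one bad reading.
x = np.linspace(0.0, 5.0, 50)
y = 2.0 * x**2 - 3.0 * x + 1.0
y[13] = np.nan

# Indices of non-finite values (NaN or +/-inf).
bad = np.flatnonzero(~np.isfinite(y))
print(bad)

# Keep only the points that are finite in both x and y, then fit.
good = np.isfinite(x) & np.isfinite(y)
coeffs = np.polyfit(x[good], y[good], 2)
print(coeffs)
```

Because a single NaN contaminates the least-squares sums, masking before calling polyfit is usually simpler than trying to clean up the result afterwards.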
The Procrustes library has an example where it demonstrates how to get the Transformation Matrix of two matrices by solving the Procrustes problem. The library seems to be old and doesn't work in Python 3.
I was wondering if there's any way to use the SciPy implementation of the Procrustes problem and be able to solve the exact problem discussed in the library's example.
Another StackOverflow question seems to need the exact thing that I'm asking here but I can't get it to give me the proper Transformation Matrix that would transform the Source Matrix to nearly
Using
In summary, I'd like to be able to implement this example using the SciPy library.
You could use scipy.linalg.orthogonal_procrustes. Here's a demonstration. Note that the function generateAB only exists to generate the arrays A and B for the demo. The key steps of the calculation are to center A and B, and then call orthogonal_procrustes.
import numpy as np
from scipy.stats import ortho_group
from scipy.linalg import orthogonal_procrustes
def generateAB(shape, noise=0, rng=None):
    # Generate A and B for the example.
    if rng is None:
        rng = np.random.default_rng()
    m, n = shape
    # Random matrix A
    A = 3 + 2*rng.random(shape)
    Am = A.mean(axis=0, keepdims=True)
    # Random orthogonal matrix T
    T = ortho_group.rvs(n, random_state=rng)
    # Target matrix B
    B = ((A - Am) @ T + rng.normal(scale=noise, size=A.shape)
         + 3*rng.random((1, n)))
    # Include T in the return, but in a real problem, T would not be known.
    return A, B, T
# For reproducibility, use a seeded RNG.
rng = np.random.default_rng(0x1ce1cebab1e)
A, B, T = generateAB((7, 5), noise=0.01, rng=rng)
# Find Q. Note that `orthogonal_procrustes` does not include
# dilation or translation. To handle translation, we center
# A and B by subtracting the means of the points.
A0 = A - A.mean(axis=0, keepdims=True)
B0 = B - B.mean(axis=0, keepdims=True)
Q, scale = orthogonal_procrustes(A0, B0)
with np.printoptions(precision=3, suppress=True):
    print('T (used to generate B from A):')
    print(T)
    print('Q (computed by orthogonal_procrustes):')
    print(Q)
    print('\nCompare A0 @ Q with B0.')
    print('A0 @ Q:')
    print(A0 @ Q)
    print('B0 (should be close to A0 @ Q if the noise parameter was small):')
    print(B0)
Output:
T (used to generate B from A):
[[-0.873 0.017 0.202 -0.44 -0.054]
[-0.129 0.606 -0.763 -0.047 -0.18 ]
[ 0.055 -0.708 -0.567 -0.408 0.088]
[ 0.024 0.24 -0.028 -0.168 0.955]
[ 0.466 0.272 0.235 -0.78 -0.21 ]]
Q (computed by orthogonal_procrustes):
[[-0.871 0.022 0.203 -0.443 -0.052]
[-0.129 0.604 -0.765 -0.046 -0.178]
[ 0.053 -0.709 -0.565 -0.409 0.087]
[ 0.027 0.239 -0.029 -0.166 0.956]
[ 0.47 0.273 0.233 -0.779 -0.21 ]]
Compare A0 @ Q with B0.
A0 @ Q:
[[-0.622 0.224 0.946 1.038 0.578]
[ 0.263 0.143 -0.031 -0.949 0.492]
[-0.49 0.758 0.473 -0.221 -0.755]
[ 0.205 -0.74 0.065 -0.192 -0.551]
[-0.295 -0.434 -1.103 0.444 0.547]
[ 0.585 -0.378 -0.645 -0.233 0.651]
[ 0.354 0.427 0.296 0.113 -0.963]]
B0 (should be close to A0 @ Q if the noise parameter was small):
[[-0.627 0.226 0.949 1.032 0.576]
[ 0.268 0.135 -0.028 -0.95 0.492]
[-0.493 0.765 0.475 -0.201 -0.75 ]
[ 0.214 -0.743 0.071 -0.196 -0.55 ]
[-0.304 -0.433 -1.115 0.451 0.551]
[ 0.589 -0.375 -0.645 -0.235 0.651]
[ 0.354 0.426 0.292 0.1 -0.969]]
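To go all the way from the source matrix to (approximately) the target matrix, as the question asks, the removed means have to be added back: the full map is (X - mean(A)) @ Q + mean(B). The sketch below shows that using only NumPy's SVD; to my understanding this is the same closed-form solution that scipy.linalg.orthogonal_procrustes computes (Q = U @ Vt from the SVD of A0.T @ B0). The helper names fit_rigid and apply_rigid are hypothetical, and, like orthogonal_procrustes, Q may include a reflection and no scaling is estimated.

```python
import numpy as np


def fit_rigid(A, B):
    """Return Q, mean_A, mean_B minimizing ||(A - mean_A) @ Q - (B - mean_B)||_F."""
    a_mean = A.mean(axis=0, keepdims=True)
    b_mean = B.mean(axis=0, keepdims=True)
    A0, B0 = A - a_mean, B - b_mean
    # Orthogonal Procrustes solution: Q = U @ Vt with U, S, Vt = svd(A0.T @ B0).
    U, _, Vt = np.linalg.svd(A0.T @ B0)
    return U @ Vt, a_mean, b_mean


def apply_rigid(X, Q, a_mean, b_mean):
    """Map points X (rows) from the source frame to the target frame."""
    return (X - a_mean) @ Q + b_mean


# Synthetic example: B is an orthogonal transform of A plus a shift.
rng = np.random.default_rng(0)
A = rng.random((7, 5))
T, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # random orthogonal matrix
B = A @ T + rng.random((1, 5))

Q, a_mean, b_mean = fit_rigid(A, B)
B_est = apply_rigid(A, Q, a_mean, b_mean)
print(np.abs(B_est - B).max())
```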
I'm trying to filter an image so that the value of each pixel is equal to the value of the median of the pixels within a 50x50 square around it, excluding any masked pixels. This is my latest attempt:
Read an image from a FITS file
Apply a mask from another FITS file
Pass a 50x50 pixel window (I think this is the best way to do it...open to suggestions) across the masked image
Create a filtered copy of the masked image, with the value of each pixel being equal to the value of the median of the pixels within a 50x50 square around it, excluding any masked pixels
In the code here, I've used some methods from the documentation of skimage.util.view_as_windows to produce the filtered image:
It looks to me like it's ignoring the masked pixels. My question is twofold:
Is this the best way to do it?
If so, why does it look like it's ignoring the mask?
import numpy as np
from astropy.io import fits
from skimage.util.shape import view_as_windows
# Use the fits files as input image and mask
hdulist = fits.open('xbulge-w1.fits')
image = hdulist[0].data
hdulist3 = fits.open('xbulge-mask.fits')
mask = 1 - hdulist3[0].data
imagemasked = np.ma.masked_array(image, mask = mask)
side = 50
window_shape = (side, side)
Afiltered = view_as_windows(imagemasked, window_shape)
# collapse the last two dimensions in one
flatten_view = Afiltered.reshape(Afiltered.shape[0], Afiltered.shape[1], -1)
# resampling the image by taking median
median_view = np.ma.median(flatten_view, axis=2)
Note: using side = 50 results in quite a long runtime, so for testing purposes I've tended to decrease it to, say, 10 to 25.
There are many filters in Python, and they behave differently when the data contain NaNs. For example, for a mean filter:
import numpy as np
from scipy.ndimage import uniform_filter
x=np.array([[0.1,0.8,.2],
[0.5,0.2,np.nan],
[0.7,0.2,0.9],
[0.4,0.7,1],
[np.nan,0.14,1]])
print(uniform_filter(x, size=3, mode='constant'))
[[ 0.17777778 nan nan]
[ 0.27777778 nan nan]
[ 0.3 nan nan]
[ nan nan nan]
[ nan nan nan]]
or
import numpy as np
from skimage.filters.rank import mean
from skimage.morphology import square
from skimage import img_as_float
x=np.array([[0.1,0.8,.2],
[0.5,0.2,np.nan],
[0.7,0.2,0.9],
[0.4,0.7,1],
[np.nan,0.14,1]])
print(mean(x, square(3)))
[[102 76 76]
[106 102 97]
[114 130 127]
[ 90 142 167]
[ 79 137 181]]
print(img_as_float(mean(x, square(3))))
[[ 0.4 0.29803922 0.29803922]
[ 0.41568627 0.4 0.38039216]
[ 0.44705882 0.50980392 0.49803922]
[ 0.35294118 0.55686275 0.65490196]
[ 0.30980392 0.5372549 0.70980392]]
skimage does not support NaNs and masking: reference
or
import numpy as np
# from scipy.signal import convolve
from scipy.signal import convolve2d
x=np.array([[0.1,0.8,.2],
[0.5,0.2,np.nan],
[0.7,0.2,0.9],
[0.4,0.7,1],
[np.nan,0.14,1]])
core = np.full((3,3),1/3**2)
# convolve(x, core, mode='same')
convolve2d(x, core, mode='same')
[[ 0.17777778 nan nan]
[ 0.27777778 nan nan]
[ 0.3 nan nan]
[ nan nan 0.43777778]
[ nan nan 0.31555556]]
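Building on the NaN idea above, one way to get a median that skips masked pixels is to set masked pixels to NaN and take np.nanmedian over each window. This is a minimal sketch, not the asker's pipeline: it assumes NumPy 1.20+ (for sliding_window_view), a boolean mask where True means "exclude", and an odd window size (e.g. 49 or 51 rather than 50) so each window is centred on its pixel; the function name masked_median_filter is hypothetical. For large windows it builds a big temporary array, so scipy.ndimage.generic_filter(data, np.nanmedian, size=...) is a lower-memory, but slow, alternative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def masked_median_filter(image, mask, size):
    """Median of each size x size window, ignoring pixels where mask is True."""
    if size % 2 != 1:
        raise ValueError("this sketch assumes an odd window size")
    data = np.array(image, dtype=float)          # copy; input is left untouched
    data[np.asarray(mask, dtype=bool)] = np.nan  # masked pixels become NaN
    half = size // 2
    # Pad with NaN so neighbours outside the image are ignored too.
    padded = np.pad(data, half, mode="constant", constant_values=np.nan)
    windows = sliding_window_view(padded, (size, size))  # shape (H, W, size, size)
    # nanmedian skips NaNs; a window that is entirely NaN yields NaN (with a warning).
    return np.nanmedian(windows, axis=(-2, -1))


# Usage: the centre pixel (value 0) is masked out.
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 0.0, 6.0],
              [7.0, 8.0, 9.0]])
result = masked_median_filter(x, x == 0, 3)
print(result)
```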
I'm trying to create a table of cosines using numpy in python. I want to have the angle next to the cosine of the angle, so it looks something like this:
0.0 1.000 5.0 0.996 10.0 0.985 15.0 0.966
20.0 0.940 25.0 0.906 and so on.
I'm trying to do it using a for loop but I'm not sure how to get this to work.
Currently, I have .
Any suggestions?
Let's say you have:
>>> d = np.linspace(0, 360, 10, endpoint=False)
>>> c = np.cos(np.radians(d))
If you don't mind having some brackets and such on the side, then you can simply concatenate column-wise using np.c_, and display:
>>> print(np.c_[d, c])
[[ 0.00000000e+00 1.00000000e+00]
[ 3.60000000e+01 8.09016994e-01]
[ 7.20000000e+01 3.09016994e-01]
[ 1.08000000e+02 -3.09016994e-01]
[ 1.44000000e+02 -8.09016994e-01]
[ 1.80000000e+02 -1.00000000e+00]
[ 2.16000000e+02 -8.09016994e-01]
[ 2.52000000e+02 -3.09016994e-01]
[ 2.88000000e+02 3.09016994e-01]
[ 3.24000000e+02 8.09016994e-01]]
But if you care about removing them, one possibility is to use a simple regex:
>>> import re
>>> print(re.sub(r' *\n *', '\n',
np.array_str(np.c_[d, c]).replace('[', '').replace(']', '').strip()))
0.00000000e+00 1.00000000e+00
3.60000000e+01 8.09016994e-01
7.20000000e+01 3.09016994e-01
1.08000000e+02 -3.09016994e-01
1.44000000e+02 -8.09016994e-01
1.80000000e+02 -1.00000000e+00
2.16000000e+02 -8.09016994e-01
2.52000000e+02 -3.09016994e-01
2.88000000e+02 3.09016994e-01
3.24000000e+02 8.09016994e-01
I'm removing the brackets, and then passing it to the regex to remove the spaces on either side in each line.
np.array_str also lets you set the precision. For more control, you can use np.array2string instead.
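For the exact layout shown in the question (four angle/cosine pairs per line), one option is to stack the pairs and reshape so each row holds four of them, then format each row explicitly. A sketch, assuming a 5 degree step from 0 to 355 so that the 72 pairs divide evenly into rows of four:

```python
import numpy as np

angles = np.arange(0, 360, 5.0)             # 0, 5, ..., 355 degrees
cosines = np.cos(np.radians(angles))
pairs = np.column_stack((angles, cosines))  # shape (72, 2)

# Four (angle, cosine) pairs per printed line -> shape (18, 8).
rows = pairs.reshape(-1, 8)
lines = [" ".join(f"{a:5.1f} {c:6.3f}" for a, c in row.reshape(-1, 2))
         for row in rows]
print("\n".join(lines))
```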
Side-by-Side Array Comparison using Numpy
A built-in NumPy approach using the column_stack((...)) function.
numpy.column_stack((A, B)) stacks 1D or 2D arrays as columns, which lets you compare two or more matrices/arrays side by side.
Pass the arrays to numpy.column_stack as a single tuple: the inner () parentheses make the tuple one argument, and it can hold as many matrices/arrays as you want.
import numpy as np
A = np.random.uniform(size=(10,1))
B = np.random.uniform(size=(10,1))
C = np.random.uniform(size=(10,1))
np.column_stack((A, B, C)) ## <-- Compare Side-by-Side
The result looks like this:
array([[0.40323596, 0.95947336, 0.21354263],
[0.18001121, 0.35467198, 0.47653884],
[0.12756083, 0.24272134, 0.97832504],
[0.95769626, 0.33855075, 0.76510239],
[0.45280595, 0.33575171, 0.74295859],
[0.87895151, 0.43396391, 0.27123183],
[0.17721346, 0.06578044, 0.53619146],
[0.71395251, 0.03525021, 0.01544952],
[0.19048783, 0.16578012, 0.69430883],
[0.08897691, 0.41104408, 0.58484384]])
NumPy's column_stack is useful in AI/ML applications for comparing predicted results with the expected answers. This helps judge how well a neural net has been trained, and it is a quick way to spot where the network's outputs go wrong.
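As an illustration of that use, here is a small sketch with made-up expected and predicted values (the numbers and the 0.25 tolerance are hypothetical): it puts expected, predicted and absolute error side by side and lists the rows where the error exceeds the tolerance.

```python
import numpy as np

# Hypothetical expected labels and model outputs, for illustration only.
expected = np.array([1.0, 0.0, 1.0, 1.0])
predicted = np.array([0.9, 0.2, 0.4, 0.95])
error = np.abs(predicted - expected)

# One row per sample: expected | predicted | absolute error
table = np.column_stack((expected, predicted, error))
print(table)

# Row indices where the prediction is off by more than the chosen tolerance.
bad_rows = np.flatnonzero(error > 0.25)
print(bad_rows)
```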
Pandas is a very convenient module for such tasks:
In [174]: import pandas as pd
...:
...: x = pd.DataFrame({'angle': np.linspace(0, 355, 355//5+1),
...: 'cos': np.cos(np.deg2rad(np.linspace(0, 355, 355//5+1)))})
...:
...: pd.options.display.max_rows = 20
...:
...: x
...:
Out[174]:
angle cos
0 0.0 1.000000
1 5.0 0.996195
2 10.0 0.984808
3 15.0 0.965926
4 20.0 0.939693
5 25.0 0.906308
6 30.0 0.866025
7 35.0 0.819152
8 40.0 0.766044
9 45.0 0.707107
.. ... ...
62 310.0 0.642788
63 315.0 0.707107
64 320.0 0.766044
65 325.0 0.819152
66 330.0 0.866025
67 335.0 0.906308
68 340.0 0.939693
69 345.0 0.965926
70 350.0 0.984808
71 355.0 0.996195
[72 rows x 2 columns]
You can use Python's zip function to go through the elements of both arrays simultaneously.
import numpy as np
degreesVector = np.linspace(0.0, 360.0, 73)
cosinesVector = np.cos(np.radians(degreesVector))
for d, c in zip(degreesVector, cosinesVector):
    print(d, c)
And if you want to make a numpy array out of the degrees and cosine values, you can modify the for loop in this way:
table = []
for d, c in zip(degreesVector, cosinesVector):
    table.append([d, c])
table = np.array(table)
And now on one line!
np.array([[d, c] for d, c in zip(degreesVector, cosinesVector)])
You were close - but if you iterate over angles, just generate the cosine for that angle:
In [293]: for angle in range(0,60,10):
     ...:     print('{0:8}{1:8.3f}'.format(angle, np.cos(np.radians(angle))))
...:
0 1.000
10 0.985
20 0.940
30 0.866
40 0.766
50 0.643
To work with arrays, you have lots of options:
In [294]: angles=np.linspace(0,60,7)
In [295]: cosines=np.cos(np.radians(angles))
Iterate over an index:
In [297]: for i in range(angles.shape[0]):
     ...:     print('{0:8}{1:8.3f}'.format(angles[i],cosines[i]))
Use zip to dish out the values 2 by 2:
for a,c in zip(angles, cosines):
    print('{0:8}{1:8.3f}'.format(a,c))
A slight variant on that:
for ac in zip(angles, cosines):
    print('{0:8}{1:8.3f}'.format(*ac))
You could concatenate the arrays together into a 2d array, and display that:
In [302]: np.vstack((angles, cosines)).T
Out[302]:
array([[ 0. , 1. ],
[ 10. , 0.98480775],
[ 20. , 0.93969262],
[ 30. , 0.8660254 ],
[ 40. , 0.76604444],
[ 50. , 0.64278761],
[ 60. , 0.5 ]])
In [318]: print(np.vstack((angles, cosines)).T)
[[ 0. 1. ]
[ 10. 0.98480775]
[ 20. 0.93969262]
[ 30. 0.8660254 ]
[ 40. 0.76604444]
[ 50. 0.64278761]
[ 60. 0.5 ]]
np.column_stack can do that without the transpose.
And you can pass that array to your formatting with:
for ac in np.vstack((angles, cosines)).T:
    print('{0:8}{1:8.3f}'.format(*ac))
Or you could write it to a CSV-style file with savetxt (which just iterates over the 'rows' of the 2D array and writes each one with fmt):
In [310]: np.savetxt('test.txt', np.vstack((angles, cosines)).T, fmt='%8.1f %8.3f')
In [311]: cat test.txt
0.0 1.000
10.0 0.985
20.0 0.940
30.0 0.866
40.0 0.766
50.0 0.643
60.0 0.500
Unfortunately, savetxt requires old-style (%) formatting. And trying to write to sys.stdout runs into bytes vs. unicode string issues in Py3.
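Depending on the NumPy version, the stdout problem may no longer apply: to my knowledge, newer releases of savetxt accept text-mode file objects such as io.StringIO (and sys.stdout). A sketch under that assumption, writing the same table into an in-memory buffer and then printing it:

```python
import io

import numpy as np

angles = np.linspace(0, 60, 7)
cosines = np.cos(np.radians(angles))

buf = io.StringIO()  # text-mode, in-memory "file"
np.savetxt(buf, np.column_stack((angles, cosines)), fmt="%8.1f %8.3f")
text = buf.getvalue()
print(text, end="")
```

If this raises a TypeError on an older NumPy, writing to a real file as above and reading it back is the fallback.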
Using only NumPy, with some formatting ideas, and borrowing @MaxU's syntax:
a = np.array([[i, np.cos(np.deg2rad(i)), np.sin(np.deg2rad(i))]
for i in range(0,361,30)])
args = ["Angle", "Cos", "Sin"]
frmt = ("{:>8.0f}"+"{:>8.3f}"*2)
print(("{:^8}"*3).format(*args))
for i in a:
    print(frmt.format(*i))
Angle Cos Sin
0 1.000 0.000
30 0.866 0.500
60 0.500 0.866
90 0.000 1.000
120 -0.500 0.866
150 -0.866 0.500
180 -1.000 0.000
210 -0.866 -0.500
240 -0.500 -0.866
270 -0.000 -1.000
300 0.500 -0.866
330 0.866 -0.500
360 1.000 -0.000
I'm trying to learn PCA through and through, but interestingly enough, when I use numpy and sklearn I get different covariance matrix results.
The numpy results match this explanatory text here, but the sklearn results differ from both.
Is there any reason why this is so?
d = pd.read_csv("example.txt", header=None, sep = " ")
print(d)
0 1
0 0.69 0.49
1 -1.31 -1.21
2 0.39 0.99
3 0.09 0.29
4 1.29 1.09
5 0.49 0.79
6 0.19 -0.31
7 -0.81 -0.81
8 -0.31 -0.31
9 -0.71 -1.01
Numpy Results
print(np.cov(d, rowvar = 0))
[[ 0.61655556 0.61544444]
[ 0.61544444 0.71655556]]
sklearn Results
from sklearn.decomposition import PCA
clf = PCA()
clf.fit(d.values)
print(clf.get_covariance())
[[ 0.5549 0.5539]
[ 0.5539 0.6449]]
Because for np.cov,
Default normalization is by (N - 1), where N is the number of observations given (unbiased estimate). If bias is 1, then normalization is by N.
Setting bias=1 gives the same result as PCA:
In [9]: np.cov(d, rowvar=0, bias=1)
Out[9]:
array([[ 0.5549, 0.5539],
[ 0.5539, 0.6449]])
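The only difference between the two matrices is the divisor applied to the same sum of centred outer products, which is easy to check by hand with the data from the question. Whether scikit-learn's get_covariance() corresponds to the N or the N - 1 version depends on how it normalizes the explained variance; to my knowledge recent scikit-learn releases divide by N - 1, so with a current version the two results may already agree, and it is worth checking against your installed version.

```python
import numpy as np

# Data from the question: ten observations of two variables.
X = np.array([[0.69, 0.49], [-1.31, -1.21], [0.39, 0.99], [0.09, 0.29],
              [1.29, 1.09], [0.49, 0.79], [0.19, -0.31], [-0.81, -0.81],
              [-0.31, -0.31], [-0.71, -1.01]])
n = X.shape[0]

Xc = X - X.mean(axis=0)   # centre each column
scatter = Xc.T @ Xc       # sum of outer products of the centred rows

cov_unbiased = scatter / (n - 1)  # np.cov default normalization
cov_biased = scatter / n          # np.cov(..., bias=True)

print(np.round(cov_unbiased, 8))
print(np.round(cov_biased, 4))
```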
So I've encountered the same issue, and I think it returns different values because the covariance is calculated in a different way. According to the sklearn documentation, the get_covariance() method uses the noise variances to obtain the covariance matrix.
I would like to read a data grid (3D array of floats) from an .xsf file (the format is documented at http://www.xcrysden.org/doc/XSF.html; see the BEGIN_BLOCK_DATAGRID_3D block).
The problem is that the data are in 5 columns, and if the number of elements Nx*Ny*Nz is not divisible by 5, then the last line can have any length.
For this reason I'm not able to use numpy.genfromtxt() or numpy.loadtxt().
I wrote a subroutine that does solve the problem, but it is terribly slow (probably because it uses tight loops). The files I want to read are large (>200 MB; 200x200x200 = 8,000,000 numbers in ASCII).
Is there any really fast way how to read such unfriendly formats in python / numpy into ndarray?
XSF datagrids look like this (example for shape=(3,3,3)):
BEGIN_BLOCK_DATAGRID_3D
BEGIN_DATAGRID_3D_this_is_3Dgrid
3 3 3 # number of elements Nx Ny Nz
0.0 0.0 0.0 # grid origin in real space
1.0 0.0 0.0 # grid size in real space
0.0 1.0 0.0
0.0 0.0 1.0
0.000 1.000 2.000 5.196 8.000 # data in 5 columns
1.000 1.414 2.236 5.292 8.062
2.000 2.236 2.828 5.568 8.246
3.000 3.162 3.606 6.000 8.544
4.000 4.123 4.472 6.557 8.944
1.000 1.414 # this is the problem
END_DATAGRID_3D
END_BLOCK_DATAGRID_3D
I got something working with Pandas and Numpy. Pandas will fill in nan values for the missing data.
import pandas as pd
import numpy as np
df = pd.read_csv("xyz.data", header=None, delimiter=r'\s+', dtype=float, skiprows=7, skipfooter=2, engine='python')
data = df.values.flatten()
data = data[~np.isnan(data)]
result = data.reshape((-1, 3))
Output
>>> result
array([[ 0. , 1. , 2. ],
[ 5.196, 8. , 1. ],
[ 1.414, 2.236, 5.292],
[ 8.062, 2. , 2.236],
[ 2.828, 5.568, 8.246],
[ 3. , 3.162, 3.606],
[ 6. , 8.544, 4. ],
[ 4.123, 4.472, 6.557],
[ 8.944, 1. , 1.414]])
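A pandas-free alternative is to join all the data lines and split on whitespace: the ragged last line then no longer matters, and a single vectorized float conversion is typically far faster than per-value loops (at the cost of holding all the tokens in memory at once). The sketch below assumes the block layout shown in the question (a dimensions line, an origin line, three spanning-vector lines, then the values) and assumes the values are stored with the x index varying fastest, hence order="F" in the final reshape; please verify that ordering against the XSF documentation. The function name read_xsf_datagrid is hypothetical.

```python
import numpy as np


def read_xsf_datagrid(text):
    """Parse the first BEGIN_DATAGRID_3D... block in XSF text into a 3D array."""
    lines = text.splitlines()
    start = next(i for i, ln in enumerate(lines)
                 if ln.strip().startswith("BEGIN_DATAGRID_3D"))
    end = next(i for i in range(start + 1, len(lines))
               if lines[i].strip().startswith("END_DATAGRID_3D"))
    nx, ny, nz = (int(v) for v in lines[start + 1].split()[:3])
    # Skip the origin and the 3 spanning vectors; drop any '#' comments.
    body = " ".join(ln.split("#", 1)[0] for ln in lines[start + 6:end])
    # Splitting on whitespace makes the short last line irrelevant.
    values = np.array(body.split(), dtype=float)
    if values.size != nx * ny * nz:
        raise ValueError(f"expected {nx * ny * nz} values, got {values.size}")
    # Assumption: x varies fastest (column-major), hence order="F".
    return values.reshape((nx, ny, nz), order="F")


# Usage on a synthetic 3x3x3 grid holding 0..26, five values per line.
data_lines = [" ".join(str(v) for v in range(i, min(i + 5, 27)))
              for i in range(0, 27, 5)]
sample = "\n".join(["BEGIN_BLOCK_DATAGRID_3D",
                    "BEGIN_DATAGRID_3D_this_is_3Dgrid",
                    "3 3 3",
                    "0.0 0.0 0.0",
                    "1.0 0.0 0.0",
                    "0.0 1.0 0.0",
                    "0.0 0.0 1.0",
                    *data_lines,
                    "END_DATAGRID_3D",
                    "END_BLOCK_DATAGRID_3D"])
grid = read_xsf_datagrid(sample)
print(grid.shape)
```

For a real file, read_xsf_datagrid(open(path).read()) would be the corresponding call.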