I am using the parameters and formulas below to generate a signal.
Python code:
import numpy as np
fs=15e6
dt=1/fs
f0=1e6
pri=400e-6
t=np.arange(0,pri,dt)
i=64
fd=5/(i*pri)
xt=0.1*np.exp(2j*np.pi*f0*t)
xf=np.fft.fft(xt)
The MATLAB code is very similar to the Python code:
fs=15e6
dt=1/fs
f0=1e6
pri=400e-6
t=0:dt:pri-dt
i=64
fd=5/(i*pri)
xt=0.1*exp(2j*pi*f0*t)
xf=fft(xt)
This code generates an array of length 6000 on which the FFT is performed. I then compute the result in MATLAB using the same method. The results are exactly the same when the FFT length is less than 6000, but they differ slightly when the FFT length is 6000.
The result of xf in Python is:
xf[:5] = [4.68819428e-12-2.53650626e-12j,
6.55886345e-12+4.51937973e-13j,
5.91758655e-12+4.48215898e-12j,
2.07297400e-12+6.37992397e-12j,
-1.44454940e-12+5.60550355e-12j]
The result of xf in MATLAB is:
xf(1:5) = 5.165829569664382e-12+1.503743771929872e-12j
4.389776854811194e-12+5.127317569216533e-12j
1.067288620484369e-12+7.191186166371298e-12j
-3.058138112418996e-12+6.189531470616248e-12j
-5.288313073640339e-12+2.908982377132765e-12j
If I use length 5999 for the FFT, like this in Python:
xf=np.fft.fft(xt, 5999)
or in MATLAB:
xf=fft(xt, 5999)
then the results are exactly identical.
In Python:
xf[:5] = [-0.09135455+0.04067366j,
-0.09160153+0.04072616j,
-0.09184974+0.04077892j,
-0.09209917+0.04083194j,
-0.09234986+0.04088522j]
In MATLAB:
xf(1:5) = -9.135455e-02+4.067366e-2j
-9.160153e-02+4.072616e-2j
-9.184974e-02+4.077892e-2j
-9.209917e-02+4.083194e-2j
-9.234986e-02+4.088522e-2j
I am confused. Can anybody explain this phenomenon? Thanks for your help.
PS: Python 3.8.5, NumPy 1.19.2, MATLAB 2014
demio, I think the different values you are getting are due to floating-point rounding errors. For small values, on the order of 1e-15, the values are rounded towards zero, which introduces an error of the same order as the value being rounded. The same happens for very large values. A related post with a pretty good explanation of this is: https://es.mathworks.com/matlabcentral/answers/475494-unexpected-results-due-to-floating-point-rounding-errors-by-performing-arithmetic-calculations-on-la.
It is also worth noting that, even though these floating-point rounding errors always occur, you have to decide whether they are significant given your data and the result you expect. Sometimes the absolute differences do not mean anything because the relative differences are marginal. If you wish to avoid this behavior in MATLAB, you can use the sym function, which makes MATLAB use a symbolic representation; among other things, this represents the numbers more accurately. More on this subject can be found here: https://es.mathworks.com/help/symbolic/create-symbolic-numbers-variables-and-expressions.html#buyfu27.
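To get a feel for how small these differences really are, here is a quick sketch (reusing the code from your question) that compares the near-zero bins with the largest FFT coefficient; the values you are comparing are round-off noise relative to the peak:
import numpy as np

fs = 15e6
dt = 1 / fs
f0 = 1e6
pri = 400e-6
t = np.arange(0, pri, dt)        # 6000 samples: exactly 400 periods of f0
xt = 0.1 * np.exp(2j * np.pi * f0 * t)
xf = np.fft.fft(xt)

peak = np.abs(xf).max()          # 600.0 (0.1 * 6000), at the f0 bin
print(np.abs(xf[:5]) / peak)     # roughly 1e-14: machine-precision noise relative to the peak
Two FFT implementations (MATLAB's FFTW versus numpy's) can order the floating-point operations differently, so values at this level need not agree digit for digit.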
Related
I have a matrix that looks like the one below. It is always a square matrix (up to 1000 x 1000) with values between 0 and 1:
data = np.array([[0.0308, 0.07919, 0.05694, 0.00662, 0.00927],
[0.07919, 0.00757, 0.00720, 0.00526, 0.00709],
[0.05694, 0.00720, 0.00518, 0.00707, 0.00413],
[0.00662, 0.00526, 0.00707, 0.01612, 0.00359],
[0.00927, 0.00709, 0.00413, 0.00359, 0.00870]])
When I try to take the natural log of this matrix, using scipy.linalg.logm, it gives me the following result.
print(logm(data))
>> [[-2.3492917 +1.42962407j 0.15360003-1.26717846j 0.15382223-0.91631624j 0.15673496+0.0443927j 0.20636448-0.01113953j]
[ 0.15360003-1.26717846j -3.75764578+2.16378501j 1.92614937-0.60836013j -0.13584605+0.27652444j 0.27819383-0.25190565j]
[ 0.15382223-0.91631624j 1.92614937-0.60836013j -5.08018989+2.52657239j 0.37036433-0.45966441j -0.03892575+0.36450564j]
[ 0.15673496+0.0443927j -0.13584605+0.27652444j 0.37036433-0.45966441j -4.22733838+0.09726189j 0.26291385-0.07980921j]
[ 0.20636448-0.01113953j 0.27819383-0.25190565j -0.03892575+0.36450564j 0.26291385-0.07980921j -4.91972246+0.06594195j]]
First of all, why is this happening? Based on another post I found here, pertaining to a different scipy.linalg method, this is due to truncation and rounding issues caused by floating point errors.
If that is correct, then how am I able to fix it? The second answer on that same linked post suggested this:
(2) All imaginary parts returned by numpy's linalg.eig are close to the machine precision. Thus you should consider them zero.
Is this correct? I can use numpy.real(data) to simply discard the complex portion of the values, but I don't know if that is a mathematically (or scientifically) robust thing to do.
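For reference, one small check I could run (a sketch, assuming the data matrix above) to see whether the imaginary parts are at machine-precision level before discarding them:
import numpy as np
from scipy.linalg import logm

L = logm(data)
rel_imag = np.abs(L.imag).max() / np.abs(L).max()
print(rel_imag)   # only if this is around 1e-16 would the imaginary parts be mere round-off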
Additionally, I attempted to use TensorFlow's linalg.logm method, but got the exact same complex results, which suggests this isn't unexpected behavior?
The following code is from a reference notebook for the Kaggle House Price Prediction competition:
X=train_df.drop(['SalePrice'],axis=1)
y=train_df.SalePrice
X_pwr=power_transformer.fit_transform(X)
test_std=std_scaler.fit_transform(test_df)
test_rbst=rbst_scaler.fit_transform(test_df)
test_pwr=power_transformer.fit_transform(test_df)
gb_reg = GradientBoostingRegressor(n_estimators=1792,
learning_rate=0.01005, max_depth=4, max_features='sqrt',
min_samples_leaf=15, min_samples_split=14, loss='huber', random_state =42)
gb_reg.fit(X_pwr, y)
y_head=gb_reg.predict(X_test)
test_pred_gb=gb_reg.predict(test_pwr)
test_pred_gb=pd.DataFrame(test_pred_gb,columns=['SalePrice'])
test_pred_gb.SalePrice =np.floor(np.expm1(test_pred_gb.SalePrice))
sample_sub.iloc[:,1]=(0.5 * test_pred_gb.iloc[:,0])+(0.5 * old_prediction.iloc[:,1])
# here old_prediction is the sample prediction given by Kaggle
I want to know the reason for the line that applies np.expm1. Why are they taking the exponential of the predicted values?
Also, that line gives a runtime warning: overflow encountered in expm1. I also want to know how to solve this overflow problem because, after this step, all the SalePrice values are replaced by NaN.
For the first question, it is hard to say without seeing more code, though I doubt there is a good reason, as the numbers you are feeding np.expm1 are apparently large (which makes sense if they're the sale prices of houses). This brings me to the second question:
expm1 is a special function for computing exp(x) - 1. It gives greater precision for very small x than computing exp(x) - 1 directly. I don't know exactly how numpy performs the calculation, though typically it is done with a Taylor series: you start with the Taylor series for exp(x) and move the initial term of 1 over to the other side, giving exp(x) - 1 as a polynomial sum of terms involving x^n and n!, where n is the number of terms the series is taken to (i.e., the level of precision). For large x the numbers get unwieldy very quickly; in other words, you soon exceed the largest value that can be represented in a 64-bit float. To show this, just try the following:
import numpy as np
import warnings

warnings.filterwarnings('error')

for i in range(200000):
    try:
        np.expm1(i)
    except Warning:
        print(i)
        break
This prints 710 on my system. As a workaround, you may be able to get away with making large numbers small (e.g., a price of $200,000 would really be 0.2 mega-dollars).
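To illustrate both points, here is a small additional sketch (mine, not part of the notebook in the question) showing the extra precision of expm1 for tiny arguments and where exp overflows a 64-bit float:
import numpy as np

x = 1e-10
print(np.expm1(x))          # ~1.00000000005e-10, accurate
print(np.exp(x) - 1.0)      # ~1.000000082740371e-10, digits lost to cancellation

print(np.finfo(np.float64).max)   # ~1.8e308, the largest double
print(np.exp(709.0))              # ~8.2e307, still representable
print(np.exp(710.0))              # inf, with an overflow RuntimeWarning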
I noticed that numpy has a built-in function linalg.norm(vector), which computes the magnitude. For small values I get the desired output:
>>> import numpy as np
>>> np.linalg.norm([0,2])
2.0
However for large values:
>>> np.linalg.norm([0,149600000000])
2063840737.6330884
This is a huge error. Writing my own function produces the same error. What is the problem here? Is a rounding error really this big, and what can I do instead?
Your number is written as an integer, and yet it is too big to fit into a numpy.int32. This problem can happen even in Python 3, where native integers are arbitrary-precision, because numpy still converts them to a fixed-width integer dtype when building the array.
In numerical work I try to make everything floating point unless it is an index. So I tried:
In [3]: np.linalg.norm([0.0,149600000000.0])
Out[3]: 149600000000.0
To elaborate: in this case, adding the .0 was an easy way of turning the integers into doubles. In more realistic code, you might have incoming data of uncertain type. The safest (but not always the right) thing to do is to coerce it to a floating-point array at the top of your function.
def do_something_with_array(arr):
    arr = np.double(arr)  # or np.float32 if you prefer
    # ... do something ...
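As a quick check with the numbers from your question (a minimal sketch), coercing to float64 before calling norm gives the expected magnitude:
import numpy as np

v = np.asarray([0, 149600000000], dtype=np.float64)
print(np.linalg.norm(v))   # 149600000000.0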
I have MATLAB code that I'm trying to translate into Python.
I'm new to Python, but I have been able to answer a lot of my questions by googling a little.
Now, however, I'm trying to figure out the following:
I have a for loop in which I apply different things to each column, but I don't know the number of columns in advance. For example,
in MATLAB, nothing is easier than this:
for n = 1:size(x,2); y(n) = mean(x(:,n)); end
But I have no idea how to do it in Python when, for example, the number of columns is 1, because I can't do x[:,1] in Python.
Any ideas?
Thanks
Yes, if you use numpy you can use x[:,1], and you also get other data structures (vectors instead of lists). The main difference between MATLAB and numpy is that MATLAB uses matrices for calculations while numpy uses vectors, but you get used to it. I think this guide will help you out.
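For instance, here is a minimal sketch (my own example) showing that column indexing still works on a numpy array even when there is only one column:
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])   # a 2-D array with a single column
print(x.shape)                        # (3, 1)
print(x[:, 0].mean())                 # 2.0 -- per-column indexing works fine

# the MATLAB loop from the question translates directly:
y = [x[:, n].mean() for n in range(x.shape[1])]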
Try numpy. It is a Python binding for a high-performance math library written in C. I believe it has the same concepts of matrix slice operations, and it is significantly faster than the same code written in pure Python (in most cases).
Regarding your example, I think the closest would be something using numpy.mean.
In pure Python it is harder to calculate the mean of a column, but if you are able to transpose the matrix you could do it using something like this:
# there is no builtin avg function
def avg(lst):
    return sum(lst) / len(lst)

rows = [avg(row) for row in a]
This is one way to do it:
import numpy as np

x = np.matrix([[1, 2, 3], [2, 3, 4]])
[np.mean(x[:, n]) for n in range(np.shape(x)[1])]
# [1.5, 2.5, 3.5]
I wonder if it is possible to exactly reproduce the whole sequence of MATLAB's randn() with NumPy. I coded my own routine in Python/NumPy, and it gives slightly different results from the MATLAB code somebody else wrote, and I am having a hard time finding out where the difference comes from because of the different random draws.
I have found a numpy.random.seed value that produces the same number for the first draw, but from the second draw on, the sequences are completely different. I'm making multivariate normal draws about 20,000 times, so I don't want to just save the MATLAB draws and read them into Python.
The user asked if it was possible to reproduce the output of MATLAB's randn(), not rand(). I have not been able to set the algorithm or seed to reproduce the exact numbers from randn(), but the solution below works for me.
In Matlab: Generate your normal distributed random numbers as follows:
rng(1);
norminv(rand(1,5),0,1)
ans =
-0.2095 0.5838 -3.6849 -0.5177 -1.0504
In Python: Generate your normal distributed random numbers as follows:
import numpy as np
from scipy.stats import norm
np.random.seed(1)
norm.ppf(np.random.rand(1,5))
array([[-0.2095, 0.5838, -3.6849, -0.5177,-1.0504]])
It is quite convenient to have functions that can reproduce the same random numbers when moving from MATLAB to Python or vice versa.
If you set the random number generator to the same seed, it will theoretically create the same numbers in MATLAB and numpy. I am not quite sure how best to do it, but this seems to work. In MATLAB do:
rand('twister', 5489)
and correspondingly in numpy:
np.random.seed(5489)
to (re)initialize your random number generators. This gives me the same numbers for rand() and np.random.random(), however not for randn; I am not sure if there is an easy method for that.
With newer MATLAB versions you can probably set up a RandStream with the same properties as numpy; for older ones you can reproduce numpy's randn in MATLAB (or vice versa). Numpy uses the polar form to create normally distributed numbers from np.random.random() (the second algorithm given here: http://www.taygeta.com/random/gaussian.html). You could just write that algorithm in MATLAB to create the same randn numbers as numpy does, starting from MATLAB's rand function.
If you don't need a huge amount of random numbers, though, you could just save them in a .mat file and read them with scipy.io...
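For example, a minimal sketch of that last option (the file and variable names here are just placeholders): save the draws in MATLAB with save('draws.mat', 'draws'), then read them back in Python:
from scipy.io import loadmat

draws = loadmat('draws.mat')['draws']   # numpy array holding the MATLAB draws
print(draws.shape)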
Just wanted to further clarify the twister/seeding method: MATLAB and numpy generate the same sequence with this seeding, but fill matrices with it differently.
MATLAB fills a matrix down its columns, while numpy fills it along rows. So in order to get the same matrices in both, you have to transpose:
MATLAB:
rand('twister', 1337);
A = rand(3,5)
A =
Columns 1 through 2
0.262024675015582 0.459316887214567
0.158683972154466 0.321000540520167
0.278126519494360 0.518392820597537
Columns 3 through 4
0.261942925565145 0.115274226683149
0.976085284877434 0.386275068634359
0.732814552690482 0.628501179539712
Column 5
0.125057926335599
0.983548605143641
0.443224868645128
Python:
import numpy as np
np.random.seed(1337)
A = np.random.random((5,3))
A.T
array([[ 0.26202468, 0.45931689, 0.26194293, 0.11527423, 0.12505793],
[ 0.15868397, 0.32100054, 0.97608528, 0.38627507, 0.98354861],
[ 0.27812652, 0.51839282, 0.73281455, 0.62850118, 0.44322487]])
Note: I also placed this answer on this similar question: Comparing Matlab and Numpy code that uses random number generation