Different results using the welch function in Matlab and Python

When I run the welch function on the same data in Matlab and Python, I get slightly different PSD estimates, while the sample frequencies are identical.
Here are the parameters I used in both Matlab and Python:
Matlab:
winlength=512;
overlap=0;
fftlength=1024;
fs=127.886;
[PSD, freqs] = pwelch(data, winlength, overlap, fftlength, fs);
Python:
freqs, PSD = welch(data, fs=127.886, window='hamming', nperseg=512,
noverlap=None, nfft=1024)
Here's a plot presenting the difference: (plot image omitted)
Does anyone have any idea what I should change to get the same PSD results?

In the Matlab documentation (https://se.mathworks.com/help/signal/ref/pwelch.html) it says that the overlap parameter has to be a positive integer, so 0 is not a valid value.
If you omit the overlap value (or pass an invalid one), the parameter is automatically set to 50% overlap, which changes the curve.
Try setting the Python function to 50% overlap and see if they match.
BTW, you rarely want zero overlap, as this is likely to cause transients in the signal.
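Following that suggestion, here is a minimal sketch of the Python call with the overlap set explicitly to 50% of the segment length (the random data is only a stand-in for the array actually used in Matlab):
import numpy as np
from scipy.signal import welch

fs = 127.886     # sampling frequency from the question
nperseg = 512    # Matlab winlength
nfft = 1024      # Matlab fftlength

data = np.random.randn(10 * nperseg)   # stand-in signal; use your own array here

# Request 50% overlap explicitly, matching Matlab's behaviour when noverlap
# is omitted (pwelch with a scalar window also defaults to a Hamming window).
freqs, PSD = welch(data, fs=fs, window='hamming',
                   nperseg=nperseg, noverlap=nperseg // 2, nfft=nfft)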

Related

How can I compress an image in Python using a range of singular values?

I'm working on an assignment for college where we need to find the SVD of an image and then calculate the min. and max. singular values of the Sigma matrix. All good up to this point. Where I'm having trouble is the second part of the assignment, where I'm asked to build a function that receives as an argument a value between the min. and max. I calculated before and then returns an image made up from an SVD that uses the singular values equal or higher than the one given in the argument.
I've found some functions online but all I've found are functions that take as an argument a certain quantity of singular values to be used for the SVD, not a range of them.
So far I have this:
def new_image(k):
    # ae is the image as a 2-D NumPy array; k is the number of singular values kept
    U, s, V = np.linalg.svd(ae, full_matrices=False)
    matrix_reconst = np.dot(U[:, :k], np.dot(np.diag(s[:k]), V[:k, :]))
    return matrix_reconst
Here k represents the number of singular values that will be used to build the image.
However, I need to build a script in which, given a value between the minimum and maximum singular values of the SVD of the image, it takes all the singular values equal to or higher than that given value and rebuilds the image accordingly.
So, for example, if my singular values range from 1 to 500, I can pass 49 to my function and it will "rebuild" the image using the singular values equal to or higher than 49, up to 500.
I'm sorry for the confusing explanation, I'm not sure how to explain it in a better way :(
Thank you so much in advance for your answers!!
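For what it's worth, a rough sketch of the thresholded reconstruction described above might look like this (ae is assumed to be the image as a 2-D NumPy array, and the function name is made up for illustration):
import numpy as np

def reconstruct_above(ae, threshold):
    # Rebuild the image from every singular value >= threshold.
    U, s, V = np.linalg.svd(ae, full_matrices=False)
    # np.linalg.svd returns the singular values in descending order, so
    # counting how many reach the threshold gives the rank k to keep.
    k = int(np.sum(s >= threshold))
    return U[:, :k] @ np.diag(s[:k]) @ V[:k, :]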

How to filter out useless data in a dataset using Python?

I have a dataset of temperature and pressure values in different ranges.
I want to filter out all data that deviates more than x% from the "normal" value; this data occurs during process failures.
Extra: the normal value can change over a longer time, so what is an exception at timestamp1 can be normal at timestamp2.
I looked into some noise filters, but I'm not sure this is noise.
You asked two questions.
1.
Tack on a derived column, so it's easy to filter.
For "x%", like five percent, you might use
avg = np.mean(df.pressure)
df['pres_deviation'] = abs(df.pressure - avg) / avg
print(df[df.pres_deviation < .05])
But rather than working with a percentage, you might find it more natural to work with standard deviations, filtering out e.g. values more than three standard deviations from the mean.
See https://en.wikipedia.org/wiki/Standard_score and sklearn's StandardScaler.
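A minimal sketch of that standard-deviation variant, reusing the df.pressure column from the snippet above:
# Keep only the rows whose pressure lies within three standard deviations of the mean.
z = (df.pressure - df.pressure.mean()) / df.pressure.std()
print(df[z.abs() < 3])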
2.
(Extra: the normal value can change over time.)
You could use a window of the "most recent 100 samples" to define a smoothed average, store it as an extra column, and use it in place of the avg scalar in the calculations above.
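A rough sketch of that rolling baseline with pandas (the window of 100 samples and the 5% threshold are just the values mentioned above):
# Rolling mean over the most recent 100 samples as a time-varying baseline.
baseline = df.pressure.rolling(window=100, min_periods=1).mean()
df['pres_deviation'] = (df.pressure - baseline).abs() / baseline
print(df[df.pres_deviation < .05])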
More generally you could manually set high / low thresholds as a time series in your data.
The area you're describing is called "change point detection", and there is an extensive literature on it; see e.g. https://paperswithcode.com/task/change-point-detection .
I have used ruptures to good effect, and I recommend it to you.

Python: p value from scipy is not correct

I want to calculate a p value with the package scipy. This is the code I used:
x = st.ttest_1samp(df_efw.stack(),np.round(np.mean(df_lw).mean(),2))
This is my output:
Ttest_1sampResult(statistic=-1.3939917717040629, pvalue=0.16382682901590806)
I also calculated it manually; my statistic value is correct, but the p value is not. The p value can be read from the standard normal distribution table.
So the problem is: if you read the table you will see that -1.39399 has a p value of 0.0823 and not 0.1638. So I am thinking that either I did the code wrong or I am interpreting something wrong. Which is it?
By default, ttest_1samp returns the two-sided or two-tailed p-value, which is twice the single-sided p-value due to the symmetry about 0 of the t distribution. Consistent with this, your manually computed single-sided p-value is (roughly) half of SciPy's p-value.
One solution is just to divide the two-sided p-value from ttest_1samp by 2. In SciPy 1.6.0 and later, you can pass the argument alternative='greater' or alternative='less' to get a single-sided p-value.
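A rough sketch of both options, reusing df_efw and df_lw from the question (the alternative argument requires SciPy 1.6.0 or later):
import numpy as np
import scipy.stats as st

# df_efw and df_lw are the DataFrames from the question.
popmean = np.round(np.mean(df_lw).mean(), 2)

# Option 1: halve the two-sided p-value reported by default.
res = st.ttest_1samp(df_efw.stack(), popmean)
print(res.pvalue / 2)

# Option 2: request the one-sided test directly (SciPy >= 1.6.0);
# 'less' matches the negative statistic here, 'greater' is the other direction.
res_less = st.ttest_1samp(df_efw.stack(), popmean, alternative='less')
print(res_less.pvalue)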
Further Reading
ttest_1samp documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_1samp.html
The pull request that added the alternative argument: https://github.com/scipy/scipy/pull/12597

Why is Python randint() generating bizarre grid effect when used in script to transform image?

I am playing around with images and the random number generator in Python, and I would like to understand the strange output picture my script is generating.
The larger script iterates through each pixel (left to right, then next row down) and changes the color. I used the following function to offset the given input red, green, and blue values by a randomly determined integer between 0 and 150 (so this formula is invoked 3 times for R, G, and B in each iteration):
def colCh(cVal):
    # Re-seed, draw an offset in [0, 150], shift by -75, and clamp to [0, 255].
    random.seed()
    rnd = random.randint(0, 150)
    newVal = max(min(cVal - 75 + rnd, 255), 0)
    return newVal
My understanding is that random.seed() without arguments uses the system clock as the seed value. Given that it is invoked prior to the calculation of each offset value, I would have expected a fairly random output.
When reviewing the numerical output, it does appear to be quite random:
Scatter plot of every 100th R value as x and R' as y:
However, the picture this script generates has a very peculiar grid effect:
Output picture with grid effect hopefully noticeable:
Furthermore, fwiw, this grid effect seems to appear or disappear at different zoom levels.
I will be experimenting with new methods of creating seed values, but I can't just let this go without trying to get an answer.
Can anybody explain why this is happening? THANKS!!!
Update: Per Dan's comment about possible issues from JPEG compression, the input file format is .jpg and the output file format is .png. I would assume only the output file format would potentially create the issue he describes, but I admittedly do not understand how JPEG compression works at all. In order to try and isolate JPEG compression as the culprit, I changed the script so that the colCh function that creates the randomness is excluded. Instead, it merely reads the original R,G,B values and writes those exact values as the new R,G,B values. As one would expect, this outputs the exact same picture as the input picture. Even when, say, multiplying each R,G,B value by 1.2, I am not seeing the grid effect. Therefore, I am fairly confident this is being caused by the colCh function (i.e. the randomness).
Update 2: I have updated my code to incorporate Sascha's suggestions. First, I moved random.seed() outside of the function so that it is not reseeding based on the system time in every iteration. Second, while I am not quite sure I understand how there is bias in my original function, I am now sampling from a positive/negative distribution. Unfortunately, I am still seeing the grid effect. Here is my new code:
random.seed()

def colCh(cVal):
    rnd = random.uniform(-75, 75)
    newVal = int(max(min(cVal + rnd, 255), 0))
    return newVal
Any more ideas?
As imgur is down for me right now, some guessing:
Your usage of PRNGs is a bit scary. Don't use time-based seeds in very frequently called loops; it's quite possible that the same seeds are generated, and of course this will produce patterns (the granularity of the clock and the number of random bits used matter here).
So: seed your PRNG once! Don't do this every time, don't do this for every channel. Seed one global PRNG and use it for all operations.
There should be no pattern then.
(If there is, also check the effect of interpolation, i.e. image-size changes.)
Edit: Now that imgur is back up, I recognize the macro-block-like patterns Dan mentioned in the comments. Please change your PRNG usage first before further analysis, and maybe show more complete code.
It may be that you recompressed the output and JPEG compression emphasized the effects observed before.
Another thing is:
newVal = max(min(cVal - 75 + rnd,255),0)
There is a bit of a bias here (a better approach: sample from a symmetric negative/positive distribution and clip to [0, 255]), which can also emphasize some effect (how did those macroblocks look before?).
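Putting both suggestions together, a rough sketch of the per-channel offset with one PRNG instance that is seeded only once (the function name follows the question; the rest is illustrative):
import random

rng = random.Random()   # one PRNG for the whole script, seeded once at construction

def colCh(cVal):
    # Symmetric offset in [-75, 75], clipped to the valid 8-bit channel range.
    offset = rng.uniform(-75, 75)
    return int(max(min(cVal + offset, 255), 0))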

Fitting on a semi-logarithmic scale and transferring it back to normal?

I am working with an IFFT and have a set of real and imaginary values with their respective frequencies (x-axis). The frequencies are not equidistant, so I can't use a discrete IFFT, and I am unable to fit my data correctly because the values are so jumpy at the beginning. So my plan is to "stretch out" my frequency data points on a log10 scale, fit them (with polyfit), and then return, somehow, to the normal scale.
f = data[0:27, 0]    # x-values (frequencies)
re = data[0:27, 5]   # y-values (real part)
lgf = p.log10(f)     # 'p' is presumably NumPy or pylab
polylog_re = p.poly1d(p.polyfit(lgf, re, 6))
The fit works definitely better (http://imgur.com/btmC3P0), but is it possible to then transform my polynomial back to the normal x-scaling? Right now I'm using those logarithmic fits for my IFFT and taking the log10 of my transformed values for plotting etc., but that probably defies all mathematical logic and results in errors.
Your fit is perfectly valid, but it is not a regular polynomial fit in x. By using log10(x) you are fitting a different model function, something like y(x) = sum_i a_i * (log10(x))^i. If this is okay for you, you are done. When you want to do some more maths with it, I would suggest using the natural logarithm instead of base 10.
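In practice, "going back to the normal scale" just means evaluating the fitted polynomial at log10 of the frequencies you care about; a rough sketch reusing f and polylog_re from the question (np is the same library as p):
import numpy as np

# Evaluate the log10-domain polynomial back on the ordinary (linear) frequency axis.
f_fine = np.linspace(f.min(), f.max(), 500)   # frequencies on the normal scale
re_fit = polylog_re(np.log10(f_fine))         # no inverse transform of the output is needed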
