I have power data streamed in real time from my electric meter, and when I look at the load I can tell by eye which kind of appliance is on.
Currently I'm using a sliding window of ten points and calculating the standard deviation to detect appliances turning on or off. The aim is to know how much each appliance consumes via an integral calculation. I need help performing signal disaggregation in real time, so I can calculate the integral for each appliance and avoid the cross-attributed consumption values that can occur, as in the attached image.
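For reference, a minimal sketch of my current detection step (the window size matches what I described; the threshold and the function name are just placeholders):

```python
from collections import deque
from statistics import pstdev

def detect_events(samples, window=10, threshold=50.0):
    """Flag indices where the rolling standard deviation of the power
    stream exceeds a threshold, i.e. a candidate on/off transition."""
    buf = deque(maxlen=window)
    events = []
    for i, p in enumerate(samples):
        buf.append(p)
        if len(buf) == window and pstdev(buf) > threshold:
            events.append(i)
    return events

# A flat 100 W baseline that jumps to 1600 W (e.g. a kettle turning on):
stream = [100.0] * 20 + [1600.0] * 20
print(detect_events(stream))   # flags the indices around the step
```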
Thanks in advance for any help you can provide!
If it's just about distinguishing between on and off states, a naive Bayes classifier might do the job (https://machinelearningmastery.com/naive-bayes-classifier-scratch-python/; there are several interesting links at the end of that page).
If you want to disaggregate several consumers, an artificial neural network might be a possible solution, for example using TensorFlow: https://www.tensorflow.org/tutorials/
One issue here is generating the labeled training data from scratch.
A fast Fourier analysis is used, for example, to detect hi-fi equipment, as each device has a specific spectrum.
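The FFT idea can be sketched quickly with NumPy; the sample rate and the two tones below are made up, standing in for a device-specific signature:

```python
import numpy as np

fs = 1000                         # assumed sample rate, Hz
t = np.arange(0, 1.0, 1 / fs)
# Toy "appliance" signal: a 50 Hz mains component plus a 120 Hz
# harmonic standing in for a device-specific signature.
x = np.sin(2 * np.pi * 50 * t) + 0.4 * np.sin(2 * np.pi * 120 * t)

spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), d=1 / fs)

# The strongest bins reveal the signature frequencies:
peaks = freqs[np.argsort(spectrum)[-2:]]
print(sorted(peaks))              # the two dominant components
```

Matching the observed peak pattern against known per-device spectra is then an ordinary classification problem.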
I'm dealing with identifying model dynamics for human movement mistakes from one attempt (trial) to the next, so my time-series data is discrete in nature, unlike most sampled data; also, depending on the variable, an input of this kind may exist (system identification) or not (time-series estimation). I come from a control engineering background and am well aware of the importance of sampling-time selection for analysis, system identification and controller design for sampled-data systems. I'm confused about what to select as the sampling period in either MATLAB or Python when dealing with a discrete-time system or signal for which a sampling time is not defined. I'd appreciate it if someone could point me to a paper or book that discusses non-sampled discrete-time system identification or time-series estimation.
In the MATLAB System Identification Toolbox I can enter 1 as the sampling period, but MATLAB will then assume a 1 s sampling period rather than treating the signal as non-sampled discrete time.
I am trying to use wavelet coefficients as features for neural networks on time-series data, and I am a bit confused about how to use them. Do I need to compute the coefficients on the entire time series at once, or use a sliding window? That is, will computing the coefficients on the entire time series at once include future data points in those coefficients? What is the right approach to using wavelets on time-series data without look-ahead bias, if any exists?
It is hard to provide you with a detailed answer without knowing what you are trying to achieve.
In a nutshell, you first need to decide whether you want to apply a discrete (DWT) or a continuous (CWT) wavelet transform to your time series.
A DWT will allow you to decompose your input data into a set of discrete levels, providing you with information about the frequency content of the signal i.e. determining whether the signal contains high frequency variations or low frequency trends. Think of it as applying several band-pass filters to your input data.
I do not think that you should apply a DWT to your entire time series at once. Since you are working with financial data, maybe decomposing your input signal into 1-day windows and applying a DWT on these subsets would do the trick for you.
In any case, I would suggest:
Installing the pywt toolbox and playing with a dummy time series to understand how wavelet decomposition works.
Checking out the abundant literature available about wavelet analysis of financial data. For instance, if you are interested in financial time-series forecasting, you might want to read this paper.
Posting your future questions on the DSP Stack Exchange, unless you have a specific coding-related question.
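To see the no-look-ahead point concretely, here is a minimal sketch using a hand-rolled single-level Haar transform (pywt's `pywt.dwt(x, 'haar')` computes the same thing, plus many other wavelets); the window length is an arbitrary choice:

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT: approximation (low-pass) and
    detail (high-pass) coefficients, each half the input length."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def windowed_features(series, window=8):
    """Compute coefficients per trailing window, so the features at time t
    only ever see points up to t (no look-ahead bias)."""
    feats = []
    for end in range(window, len(series) + 1):
        a, d = haar_dwt(series[end - window:end])
        feats.append(np.concatenate([a, d]))
    return np.array(feats)

series = np.sin(np.linspace(0, 6, 64))
print(windowed_features(series).shape)   # one 8-coefficient row per window
```

Transforming the whole series in one pass would mix future points into earlier coefficients; the trailing-window version avoids that at the cost of recomputation.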
I have some audio files recorded from wind turbines, and I'm trying to do anomaly detection. The general idea is that if a blade has a fault (e.g. a crack), its sound will differ from that of the other two blades, so we can basically find a way to extract each blade's sound signal and compare the similarity/distance between them; if one of these signals differs significantly, we can say the turbine is going to fail.
I have only some faulty samples, and labels are lacking.
However, no one seems to be doing this kind of work, and I ran into a lot of trouble while attempting it.
I've tried using the STFT to convert the signal to a power spectrum, and some spikes show up. How can I identify each blade from the raw data? (Some related work uses autoencoders to detect anomalies in audio, but in this task we want to use a similarity-based method.)
Does anyone have a good idea, or related work/papers to recommend?
Well...
If your shaft is rotating at, say, 1200 RPM or 20 Hz, then all the significant sound produced by that rotation should be at harmonics of 20 Hz.
If the turbine has 3 perfect blades, however, then it will be in exactly the same configuration 3 times for every rotation, so all of the sound produced by the rotation should be confined to multiples of 60 Hz.
Energy at the other harmonics of 20 Hz -- 20, 40, 80, 100, etc. -- that is above the noise floor would generally result from differences between the blades.
This of course ignores noise from other sources that are also synchronized to the shaft, which can mess up the analysis.
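A quick synthetic check of this argument (shaft speed, sample rate and amplitudes are all made up):

```python
import numpy as np

fs = 1000                                # assumed sample rate, Hz
t = np.arange(0, 2.0, 1 / fs)            # 2 s of "audio"

# Three identical blades: the sound repeats 3x per 20 Hz revolution,
# so all energy sits at 60 Hz and its harmonics.
healthy = np.sin(2 * np.pi * 60 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
# One faulty blade breaks the 3-fold symmetry: energy appears at other
# multiples of the 20 Hz shaft frequency.
faulty = healthy + 0.3 * np.sin(2 * np.pi * 20 * t)

def level_at(x, f):
    """Normalized FFT magnitude at frequency f."""
    spec = np.abs(np.fft.rfft(x)) / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return spec[np.argmin(np.abs(freqs - f))]

print(level_at(healthy, 20))   # essentially zero: no asymmetry
print(level_at(faulty, 20))    # clearly above the noise floor
```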
Assuming that the audio you got is from a location where one can hear individual blades as they pass by, there are two subproblems:
1) Estimate each blade position, and extract the audio for each blade.
2) Compare the signal from each blade to the others, and determine whether one of them is different enough to be considered an anomaly.
Estimating the blade position can be done with a sensor that detects the rotation directly, for example based on the magnetic field of the generator. Ideally you would have this kind of known-good sensor data, at least while developing your system. It may also be possible to estimate it from audio alone, using some sort of periodicity detection; autocorrelation is a commonly used technique for that.
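A minimal autocorrelation-based period estimate might look like this (sample rate, rotation frequency and noise level are assumptions):

```python
import numpy as np

def estimate_period(x, min_lag=1):
    """Estimate the dominant period of x, in samples, via autocorrelation."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0 .. N-1
    # Skip very short lags, where the signal trivially matches itself.
    return min_lag + int(np.argmax(ac[min_lag:len(x) // 2]))

fs = 1000                                  # assumed sample rate, Hz
t = np.arange(0, 3.0, 1 / fs)
rng = np.random.default_rng(0)
# Toy 20 Hz blade-pass signal plus some noise:
x = np.sin(2 * np.pi * 20 * t) + 0.2 * rng.standard_normal(t.size)
period = estimate_period(x, min_lag=10)
print(period / fs)                         # ≈ 0.05 s, one period at 20 Hz
```

With the period known, the audio can be sliced into per-revolution (and then per-blade) segments for comparison.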
To detect differences between blades, you can try a standard distance function on a standard feature representation, such as Euclidean distance on MFCCs. You will still need some samples of both known-faulty and known-good/acceptable examples to evaluate your solution.
There is however a risk that this will not be good enough. Then try to compute some better features as basis for the distance computation. Perhaps using an AutoEncoder. You can also try some sort of Similarity Learning.
If you have a good amount of both good and faulty data, you may be able to use a triplet loss setup to learn the similarity metric. Feed in data for two good blades as objects that should be similar, and the known-bad as something that should be dissimilar.
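The triplet criterion itself is just a hinge on two distances; here is a sketch with made-up per-blade feature vectors standing in for, e.g., averaged MFCCs:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that pushes d(anchor, positive) below
    d(anchor, negative) by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical per-blade feature vectors:
good_a = np.array([1.0, 0.9, 1.1])   # known-good blade
good_b = np.array([1.1, 1.0, 1.0])   # another known-good blade
bad    = np.array([3.0, 0.2, 2.5])   # known-faulty blade

print(triplet_loss(good_a, good_b, bad))
```

In a real similarity-learning setup this loss would be backpropagated through a feature-extraction network, rather than applied to fixed vectors as here.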
For my master's thesis I am using a 3rd party program (SExtractor) in addition to a python pipeline to work with astronomical image data. SExtractor takes a configuration file with numerous parameters as input, which influences (after some intermediate steps) the statistics of my data. I've already spent way too much time playing around with the parameters, so I've looked a little bit into machine learning and have gained a very basic understanding.
What I am wondering now is: is it reasonable to use a machine learning algorithm to optimize the parameters of SExtractor when the only way to judge the performance or quality of the parameters is via the final statistics of the analysis run (which takes at least an hour on my machine), and more than 6 parameters influence the statistics?
As an example, I have included two different versions of the statistics I am referring to, made from slightly different sets of SExtractor parameters. The red line in the left image is the median of the standard deviation (as it should be). The blue line is the median of the standard deviation as I get it. The right images display the differences between the objects in the two data sets.
I know this is a very specific question, but as I am new to machine learning, I can't really judge whether this is feasible. It would be great if someone could tell me whether this is a pointless endeavor, and point me in the right direction.
You can try an educated guess based on the data that you already have. You are trying to optimize the parameters such that the median of the standard deviation has the desired value. You could assume various models and try to estimate the parameters based on the model and the observed data. But I think you need a good understanding of machine learning to do so; by "good" I mean beyond an undergraduate course.
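To make the idea concrete, here is a sketch of the simplest possible black-box loop, plain random search; `run_pipeline` is a hypothetical stand-in for a full SExtractor + pipeline run (faked here with a quadratic so the sketch is runnable), and the parameter names and ranges are invented:

```python
import random

def run_pipeline(params):
    """Hypothetical stand-in for a real run: returns a score such as
    |median std dev - expected value| (lower is better). Faked here
    with a quadratic bowl so the sketch executes instantly."""
    target = {"detect_thresh": 1.5, "back_size": 64.0}
    return sum((params[k] - target[k]) ** 2 for k in params)

search_space = {
    "detect_thresh": (0.5, 5.0),
    "back_size": (16.0, 256.0),
}

random.seed(0)
best_params, best_score = None, float("inf")
for _ in range(50):   # each real evaluation costs ~1 hour, so budget carefully
    candidate = {k: random.uniform(lo, hi) for k, (lo, hi) in search_space.items()}
    score = run_pipeline(candidate)
    if score < best_score:
        best_params, best_score = candidate, score

print(best_params, best_score)
```

With roughly an hour per evaluation, a sample-efficient method such as Bayesian optimization would replace the random sampling, but the loop structure stays the same.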
I wanted to plot a graph of average power spectral density (in dBm) against frequency (2.4 GHz to 2.5 GHz).
The basic procedure I used earlier for the power vs. frequency plot was to store the data generated by "usrp_spectrum_sense.py" for some time period and then take the average.
Can I calculate the PSD from the power values used in "usrp_spectrum_sense.py"?
Is there any way to calculate the PSD directly from USRP data?
Is there any other approach that can be used to calculate the PSD with a USRP over the desired frequency range?
PS: I recently found out about psd() in matplotlib; can it be used to solve my problem?
I wasn't 100% sure whether or not to mark this question as a duplicate of Retrieve data from USRP N210 device; however, since the poster of that question was very confused, and so was his question, let's answer this concisely:
What an SDR device like the USRP does is give you digital samples. These are nothing more or less than what the ADC (analog-to-digital converter) makes out of the voltages it sees. Those numbers are then subject to a DSP chain that does frequency shifting, decimation and appropriate filtering. In other words, the discrete complex signal's envelope coming from the USRP should be proportional to the voltages observed by the ADC. Thanks to physics, that means the magnitude squared of these samples should be proportional to the signal power as seen by the ADC.
Thus, the values you get are "dBFS" (dB relative to Full Scale), which is an arbitrary measure relative to the maximum value the signal processing chain might produce.
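Computing that dBFS figure from raw complex samples is straightforward, assuming the usual ±1.0 full-scale convention for float samples:

```python
import numpy as np

def power_dbfs(samples, full_scale=1.0):
    """Average power of complex baseband samples, in dB relative to full scale."""
    mean_sq = np.mean(np.abs(samples) ** 2)
    return 10 * np.log10(mean_sq / full_scale ** 2)

# A full-scale complex tone sits at 0 dBFS; half amplitude is about -6 dBFS:
n = np.arange(1024)
tone = np.exp(2j * np.pi * 0.1 * n)
print(power_dbfs(tone))
print(power_dbfs(0.5 * tone))
```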
Now, notice two things:
"As seen by the ADC" is important. Prior to the ADC there's
an unknown antenna with a) an unknown efficiency and b) unknown radiation pattern illuminated from an unknown direction,
connected to a cable that might or might not perfectly match the antenna's impedance, and that might or might not perfectly match the USRP's RF front end's impedance,
potentially a bank of preselection filters with different attenuations,
a low-noise frontend amplifier, depending on the device/daughterboard with adjustable gain, with non-perfectly flat gain over frequency
a mixer with frequency-dependent gain,
baseband and/or IF gain stages and attenuators, adjustable,
baseband filters, might be adjustable,
component variances in PCBs, connectors, passives and active components, temperature-dependent gain and intermodulation, as well as
ADC non-linearity, frequency-dependent behaviour.
"Proportional" is important here, since after sampling, there will be
I/Q imbalance correction,
DC/LO leakage cancellation,
anti-aliasing filtering prior to
decimation,
and bit-width and numerical type changing operations.
All in all, USRPs are not calibrated measurement devices. They are pretty nice, and if you chose the right one for your specific application, you might just need to calibrate once with a known external power source feeding exactly your system, from the antenna to the sample stream coming out at the end, at exactly the frequency you want to observe. After establishing "OK, when I feed in x dBm of power, I see y dBFS, so there's a factor of (x − y) dB between dBm and dBFS", you have calibrated your device for exactly one configuration, consisting of
hardware models and individual units used, including antennas and cables,
center frequency,
gain,
filter settings,
decimation/sampling rate.
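Applying such a calibration afterwards is a single offset; the numbers below are hypothetical, and the offset is valid only for that one exact configuration:

```python
def calibrate_offset(ref_dbm, measured_dbfs):
    """Offset such that dBm = dBFS + offset, for one fixed configuration."""
    return ref_dbm - measured_dbfs

# Hypothetical calibration: a -30 dBm reference tone reads as -47.5 dBFS.
offset = calibrate_offset(-30.0, -47.5)

def dbfs_to_dbm(dbfs):
    """Convert a later dBFS reading to absolute power, same configuration."""
    return dbfs + offset

print(dbfs_to_dbm(-60.0))   # -42.5 dBm, valid only for this exact setup
```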
Note that doing such calibrations, especially in the 2.4 GHz ISM band, will require an "RF-silent" room – it'll be hard to find an office or lab with no 2.4 GHz devices these days, and the reason these frequencies are free to use is that microwave ovens interfere; and then there's the fact that these frequencies tend to diffract and reflect off building structures, PC cases, furniture with metal parts... In other words: get access to an anechoic chamber, a reference transmit antenna and transmit power source, and do the whole antenna-system calibration dance that normally results in a directivity diagram, but instead produce a "digital value relative to transmit power" measurement. Whether that measurement is really representative of how you'll be using your USRP in a lab environment is very much up for your consideration.
That is a problem with any microwave equipment, not only USRPs – RF propagation isn't easy to predict in complex environments, and the power characteristics of a receiving system aren't determined by a single component, but by the system as a whole in exactly its intended operational environment. Thus, calibration requires that you either know your antenna, cable, measurement front end, digitizer and DSP exactly and can do the math including error margins, or that you calibrate the system as a whole and change as little as possible afterwards.
So: no. No MATLAB or matplotlib function in this world can give meaning to numbers that isn't in those numbers; for absolute power, you'll need to calibrate against a reference.
Another word on linearity: a USRP's analog hardware at full gain is pretty sensitive, so sensitive that operating e.g. a WiFi device in the same room would be like screaming in its ear, blanking out weaker signals and driving the analog signal chain into non-linearity. In that case, not only do the voltages observed by the ADC lose their linear relation to the voltages at the antenna port, but also, and this is usually worse, amplifiers become mixers, so unwanted intermodulation introduces energy in spectral places where there was none. So make sure you operate your device in a way that makes the most of your signal's dynamic range without running into non-linearities.